Sustainable Computing: Informatics and Systems 23 (2019) 120–135

journal homepage: www.elsevier.com/locate/suscom
A novel clustering approach and adaptive SVM classifier for intrusion detection in WSN: A data mining concept
Gautam M. Borkar a,∗, Leena H. Patil b, Dilip Dalgade c, Ankush Hutke d
a DY Patil, Department of Information Technology, Ramrao Adik Institute of Technology, Nerul, Navi Mumbai 400706, Maharashtra, India
b Department of Computer Science and Engineering, Priyadarshini Institute of Engineering and Technology, Priyadharshini Campus, Digdal Hill Hingna Road, Nagpur, 440019, Maharashtra, India
c MCT’s Department of Computer Engineering, Rajiv Gandhi Institute of Technology, Juhu Versovalink Road, Behind HDFC Bank Versova Andheri(w), Mumbai, 400053, Maharashtra, India
d MCT’s Department of Information Technology, Rajiv Gandhi Institute of Technology, Juhu Versovalink Road, Behind HDFC Bank Versova Andheri(w), Mumbai, 400053, Maharashtra, India
Article info

Article history:
Received 2 August 2018
Received in revised form 16 March 2019
Accepted 12 June 2019
Available online 23 June 2019

Keywords:
Wireless sensor network (WSN)
Intrusion detection system (IDS)
Security
Chicken swarm optimization (CSO)
Rotated random forest (RRF)
Support vector machine (SVM)
Clustering
High-level security

Abstract

Wireless Sensor Networks (WSNs) mainly face security issues during packet transmission between different sensor nodes in a network combined with data mining. To overcome this challenge, an efficient clustering technique called the adaptive chicken swarm optimization algorithm is proposed for cluster head (CH) selection. This adaptive method greatly reduces time consumption and also improves the lifetime and scalability of the network. Additionally, a two-stage classification technique known as adaptive SVM classification, a supervised learning technique, is proposed with an Intrusion Detection System (IDS), in which an acknowledgement-based method is used for reporting malicious sensor nodes. With this acknowledgement, different types of attacks such as DOS, probe, U2R and R2L are detected in cooperation with the IDS. Once an attack is detected, a high-level security mechanism along with an intrusion response is provided to the other sensor nodes, so that packets are transmitted securely between different sensor nodes. The proposed methodology is implemented on the Python platform, and a comparison with existing methods shows better results.

© 2019 Published by Elsevier Inc.
∗ Corresponding author.
E-mail address: gautamborkar2@gmail.com (G.M. Borkar).
https://doi.org/10.1016/j.suscom.2019.06.002
2210-5379/© 2019 Published by Elsevier Inc.

1. Introduction

Wireless sensor networks (WSNs) are infrastructure-less, distributed and dynamic in nature [1]. The rich capabilities of WSNs have carried them into emerging technology areas, of which Fog computing is an excellent example. To satisfy the mobility support, geo-distribution, location awareness and low latency needs of IoT applications, the Fog node assists the user in executing IoT applications. Due to the vulnerable nature of WSNs, these networks are always exposed to severe types of threats which can vitiate their whole functionality. Authentication protocols and secure routing protocols use cryptographic keys to ensure secure transmission of data but cannot protect against inside attacks, known as passive attacks. These protocols scramble the valuable data to protect it from intruders who try to access it from outside, but a passive attack from a node inside cannot be avoided. According to Mehmood et al. [2], there are different types of possible attacks on WSNs, such as routing attacks, Sybil attacks and denial of service (DoS). Intrusion detection systems (IDS) can be used in WSNs to detect the suspicious behaviour of nodes inside the WSNs [3]. Cluster-based WSNs can reduce the performance load by reducing the aggregate computation and energy consumption of all the nodes [4]. Due to technological development, WSNs have become visible and are used for various purposes in our daily life. Therefore, security in such networks is mainly focused on ensuring reliable performance of nodes in the network. IDS-based systems are very effective for detecting irregular actions of inner nodes of networks, protecting the whole network from various types of malicious attacks. The IDS agents collect and analyze the abnormal behaviour of nodes over a time period and then apply appropriate actions. The work in [5] discusses various detection mechanisms for analysis. There are three possible ways of implementing IDS agents: centralized, distributed and hybrid. These agents are more efficient if installed at base stations (the centralized approach), because this does not affect the performance of small nodes in the network. According to [6], the term Situation
Awareness (SA) is defined as the perception of some elements in the environment within a space and time period, which can be used for projecting the near future. There are four levels of SA: perception, comprehension, projection and resolution [7]. Implementing security measures on cluster heads (CHs) in a cluster-based network is very beneficial because its partially centralized approach helps prevent various threats. The purpose of CHs is to collect knowledge on the nodes' behavior and to determine their operation in the near future. The knowledge base in knowledge-based systems (KBSs) is used to gather and store data in symbolized form from various scenarios [8]. A knowledge base can be created for context awareness to overcome possible security threats from both internal and external intruders [9,10].

Data freshness is very important in wireless sensor networks, because an attacker can send an expired packet to waste network resources and decrease the network lifetime [11].

There are four major types of attacks, namely Denial of Service (DOS), User to Root (U2R), Remote to Local (R2L) and probe, and they occur mainly in four layers of the OSI model. These four layers are the transport layer, network layer, data link layer and physical layer, as shown in Fig. 1 [12].

Fig. 1. Layers of OSI.

A DOS attack occurs in both the network and physical layers. An attack is said to be DOS if any one of three conditions is satisfied: selective forwarding (the attacker selectively drops packets based on some predefined criterion), tampering (which arises if there is no encryption mechanism) or jamming (interference with the radio frequencies used by network nodes) [13]. A U2R attack occurs mainly in the network layer. When an illegal node sends a Hello flood request to any legitimate node in the network, it is said to be a U2R attack. An R2L attack occurs in the network layer [14]. When a Sybil attack (a node having multiple identities), a wormhole attack (the attacking node captures packets from one location and transmits them to another, distant node which distributes them locally) or acknowledgement spoofing (one node successfully acts as another by falsifying data) occurs, it is said to be an R2L attack [15]. A probe attack also occurs in the network layer, and it occurs when there is spoofed routing information (data sent along a fake path), altered routing information (data is altered and sent along the network to the base station), replayed routing information (valid data transmission is maliciously repeated or delayed) or a sinkhole attack (which occurs when there is communication from multiple nodes to the base station) [16].

Security of data is considered to be one of the most important concerns in today's world. Data is vulnerable to various types of intrusion attacks that may reduce the utility of any network or system. Identifying and preventing such attacks is the task of an Intrusion Detection System, and it is one of the most challenging tasks. An Intrusion Detection System is a type of security management system for computers and networks. It gathers and analyzes information from various areas within a computer or a network to identify possible security breaches, which include both intrusions (attacks from outside the organization) and misuse (attacks from within the organization). Intrusion detection (ID) uses vulnerability assessment, developed to assess the security of a computer system or network. Data is considered to be the most important aspect of any organization. Only if the organization's data is secure can it successfully carry out its operations. In this paper, an efficient classifier based on a data mining concept is introduced to detect intrusions accurately and in less time.

The detection of malicious nodes during the transmission of packets to the base station is currently an active research topic. To achieve this, the occurrence of an attack should be detected efficiently and in less time. As a result, this paper proposes an efficient classifier-based data mining technique to solve the problem of intrusion detection. This paper is organized as follows. Section 2 examines the research related to our proposed strategy. Section 3 describes the proposed methodology, which includes the adaptive clustering mechanism, RRF and the High-level Security Mechanism. Clustering is performed first, followed by CH selection with the aid of the adaptive chicken swarm optimization algorithm. Unwanted features associated with the sensor nodes are then reduced by RRF, and the malicious nodes are classified by a two-stage classification process using the adaptive SVM classifier. Finally, a high-level security mechanism is implemented to protect the sensor nodes that are not under attack. Section 4 describes the results, both simulation results and comparison results. Section 5 presents the conclusion, followed by references.

2. Related Researches

This section provides an overview of the techniques available in the literature for classification, clustering and data mining. Several techniques have been presented in the last decade, but there are considerable differences among them with respect to overall time, classification accuracy, dataset reduction, etc. A summary of the different approaches and classification techniques for WSN is presented below.

P. Kuila and P.K. Jana [17] presented two algorithms based on particle swarm optimization (PSO) for Linear/Nonlinear Programming (LP/NLP) formulations of the problems. Energy-efficient clustering and routing are two well-known optimization problems which have been studied widely to extend the lifetime of wireless sensor networks (WSNs). Here the routing algorithm was developed with an efficient particle encoding scheme and a multi-objective fitness function. The clustering algorithm was designed to conserve the energy of the nodes through load balancing. This algorithm was implemented extensively.

V. Gupta and S.K. Sharma [18] discussed an approach for selecting the cluster head in a WSN using swarm intelligence. This approach was based on the LEACH clustering algorithm. A modified version of ant colony optimization, using residual energy as a parameter, was employed over the LEACH algorithm for cluster head selection. This approach reduced the amount of energy consumed. It works in three stages: cluster members (which transmit their data directly to their cluster heads), cluster heads (which transmit their data to the leader) and the leader (which transmits data to the base station).
I. Ahmad et al. [19] presented a genetic algorithm for searching the genetic principal components that offer a subset of features with optimal sensitivity and the highest discriminatory power. Selecting an appropriate number of principal components was a critical problem in subset selection. Before classification, the raw dataset was pre-processed in three ways: discarding symbolic values, feature transformation using PCA and optimal feature subset selection using GA. The support vector machine (SVM) was used for the classification process. This research work used the knowledge discovery and data mining (KDD) cup dataset for experimentation. The performance of this approach was analyzed and compared with existing approaches.

Q. Ni et al. [20] discussed a solution to the problem of cluster head selection, which is an important step in WSNs. This solution was based on fuzzy clustering preprocessing and particle swarm optimization. More specifically, a fuzzy clustering algorithm was used for the initial clustering of sensor nodes according to geographical location, where a sensor node belongs to a cluster with a determined probability, and the number of initial clusters was analyzed and discussed. Furthermore, the fitness function was designed with consideration of both the energy consumption and distance factors of the wireless sensor network. Finally, the cluster head nodes in the hierarchical topology were determined based on the improved particle swarm optimization.

G. Wang et al. [21] presented an approach called FC-ANN, based on ANN and fuzzy clustering, to solve the problem and help IDS achieve a higher detection rate, a lower false positive rate and stronger stability. Through the fuzzy clustering technique, the heterogeneous training set was divided into several homogeneous subsets. The general procedure of FC-ANN was as follows: first, the fuzzy clustering technique was employed to generate different training subsets. Subsequently, based on the different training subsets, different ANN models were trained to formulate different base models. Finally, a meta-learner, the fuzzy aggregation module, was employed to aggregate these results.

K. Kalaiselvi et al. [22] stated that the wireless body area network (WBAN) is a promising methodology in present health care systems to monitor, detect, predict and diagnose disease in people. The performance of a WBAN is affected by un-trusted nodes, which are formed due to attackers from outside. That work proposed a sensor node classification algorithm incorporating an ANFIS classifier to detect and classify trusted and un-trusted sensor nodes in order to improve the efficiency of WBANs. The proposed system consists of feature extraction and classification modules. The trust features are extracted from sensor nodes and these extracted features are optimized using a genetic algorithm. The performance of the WBAN is analyzed in terms of classification rate, packet delivery ratio and latency.

Walid Balid et al. [23] stated that real-time traffic surveillance is essential in today's intelligent transportation systems and will surely play a vital role in tomorrow's smart cities. That work reports on the development and implementation of a novel smart wireless sensor for traffic monitoring. Computationally efficient and reliable algorithms for vehicle detection, speed and length estimation, classification, and time-synchronization were fully developed, integrated, and evaluated. Comprehensive system evaluation and extensive data analysis were performed to tune and validate the system for reliable and robust operation. Several field studies conducted on highway and urban roads for different scenarios and under various traffic conditions resulted in 99.98% detection accuracy, 97.11% speed estimation accuracy, and 97% length-based vehicle classification accuracy. The developed system is portable, reliable, and cost-effective.

3. Proposed methodology

Security of data is considered to be one of the most important concerns in today's world. Data is vulnerable to various types of intrusion attacks that may reduce the utility of any network (Mobile Ad-hoc NETwork (MANET)) or system. Identifying and preventing such attacks is done by an Intrusion Detection System (IDS), and it is one of the most challenging tasks. An Intrusion Detection System is a type of security management system for computers and networks. It gathers and analyzes information from various areas within a computer or a network to identify possible security breaches, which include both intrusions (attacks from outside the organization) and misuse (attacks from within the organization). Intrusion Detection (ID) uses vulnerability assessment, developed to assess the security of a computer system or network. Data is considered to be the most important aspect of any organization. Only if the organization's data is secure can it successfully carry out its operations. In this work, an efficient classifier based on a data mining concept is introduced to detect intrusions accurately and in less time.

Currently, WSNs suffer mainly from security issues while transferring packets from one sensor node to another in a network. To overcome this problem, an efficient classification technique with data mining is proposed. First, a group of sensor nodes is given as input and clustered, because clustering improves the lifetime and scalability of the network. Here clustering is performed by a novel clustering process known as stratified sampling based on the nodes' weight. The advantage of this sampling technique is that it has a higher degree of representativeness than other sampling techniques. After that, cluster head selection is performed using the adaptive chicken swarm optimization (ACSO) algorithm. The main advantage of this algorithm over traditional CSO is the sampling step, which is absent in traditional CSO. This adaptiveness mainly aims to reduce the time consumed in selecting the best cluster head.

After the selection of the cluster head (CH), an ensemble known as Rotated Random Forest (RRF) is employed to reduce the features in the database. The advantage of RRF is that it performs more accurately and in less time than an ordinary random forest. Finally, the reduced features are given to the 2-stage classification process. For classification, the adaptive SVM (Support Vector Machine) classifier, a machine learning technique, is used. In the first stage, an acknowledgement-based method determines whether or not a sensor node has been attacked. In the second stage, the malicious sensor nodes are processed to obtain the type of attack based on the given conditions. There are four types of attack, namely DOS, probe, U2R and R2L. After an attack is detected in the WSN, it is rectified by a High-Level Security Mechanism comprising a cryptographic function to offer security. The overall process flow of the proposed technique is illustrated in Fig. 2. With the addition of a data mining technique, the location and topology information is inferred without explicitly knowing other network management data and localization.

3.1. Dataset

This work is tested with the KDD Cup 1999 dataset, which comprises 41 features, with each record labelled as attack or normal. These features have continuous and symbolic forms and fall into four categories: intrinsic features, content features, same host features and similar service features. This dataset also describes the types of attack, as in Table 1.

In a User-to-Root (U2R) attack, the attacker tries to access a normal user account and gains root access to the system. U2R attacks exploit several vulnerabilities, such as password sniffing, dictionary attacks and social engineering attacks.
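The grouping of KDD Cup 1999 attack labels listed in Table 1 can be expressed as a simple lookup. The sketch below is illustrative only: the underscore spellings and the trailing dot on labels are assumptions about how the labels appear in the dataset files, and `ATTACK_CATEGORIES` and `categorize` are hypothetical helper names, not part of the proposed system.

```python
from typing import Dict, Tuple

# Attack names follow Table 1; the underscore spellings are an assumption
# about how the labels are written in the dataset files.
ATTACK_CATEGORIES: Dict[str, Tuple[str, ...]] = {
    "DOS": ("back", "land", "neptune", "pod", "smurf", "teardrop"),
    "U2R": ("buffer_overflow", "loadmodule", "perl", "rootkit"),
    "R2L": ("ftp_write", "guess_passwd", "imap", "multihop", "phf",
            "spy", "warezclient", "warezmaster"),
    "PROBE": ("satan", "ipsweep", "nmap", "portsweep"),
}

# Invert the table so that each attack label maps to its category.
LABEL_TO_CATEGORY: Dict[str, str] = {
    label: category
    for category, labels in ATTACK_CATEGORIES.items()
    for label in labels
}


def categorize(label: str) -> str:
    """Map a raw record label to NORMAL, one of the four categories, or UNKNOWN."""
    key = label.strip().rstrip(".").lower()  # tolerate 'smurf.' style labels
    if key == "normal":
        return "NORMAL"
    return LABEL_TO_CATEGORY.get(key, "UNKNOWN")
```

A lookup of this kind keeps the second-stage class labels in one place, so any label not covered by Table 1 is reported as UNKNOWN rather than silently assigned to a category.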
Fig. 2. Proposed Approach Illustration.

Table 1
Attacks falling into four major categories.
| Category | Attacks |
|---|---|
| Denial of Service Attacks | Back, land, neptune, pod, smurf, teardrop |
| User to Root Attacks | Buffer overflow, loadmodule, perl, rootkit |
| Remote to Local Attacks | Ftp write, guess passwd, imap, multihop, phf, spy, warezclient, warezmaster |
| Probes | Satan, ipsweep, nmap, portsweep |

Table 2
Simulation parameters.
| Parameters | Symbol | Value |
|---|---|---|
| Nodes | N | 5-200 in steps of 20 |
| Simulation area | X*Y | 100*100 |
| Transmission range | r | 5-200 in steps of 20 |
| Weights | (w1, w2, w3, w4, w5, w6) | (0.1, 0.05, 0.1, 0.05, 0.3, 0.4) |
| Bits | K | 256 |
| Electronics energy | E_elec | 50 nJ/bit |
| Amplifier energy | E_amp | 100 pJ/bit/m² |
| Energy for aggregation | E_DA | 50 nJ/bit |

A denial-of-service (DoS) attack is any type of attack where the attackers (hackers) attempt to prevent legitimate users from accessing the service. In a DoS attack, the attacker usually sends excessive messages asking the network or server to authenticate requests that have invalid return addresses. The network or server will not be able to find the return address of the attacker when sending the authentication approval, causing the server to wait before closing the connection. When the server closes the connection, the attacker sends more authentication messages with invalid return addresses. Hence, the process of authentication and server wait begins again, keeping the network or server busy.

A remote attack is a malicious action that targets one computer or a network of computers. The remote attack does not affect the computer the attacker is using. Instead, the attacker finds vulnerable points in a computer's or network's security software to access the machine or system.

Probe-response attacks are a new threat for collaborative intrusion detection systems. A probe is an attack which is deliberately crafted so that its target detects and reports it with a recognizable "fingerprint" in the report.

3.2. Clustering based on sensor nodes weight

Initially, the sensor nodes are clustered. Clustering is mostly used in topology control and hierarchical sensor routing, where the base station receives and forwards the packet or data to the MANET sink. Clustering is performed by stratified sampling based on the sensor nodes' weight. Consider a WSN with a set of sensor nodes S, described as in (1)

$$S = \{S_1, S_2, \ldots, S_m\} \tag{1}$$

Where m is the total number of sensor nodes in the WSN. Before clustering, the lifetime, energy consumption, transmission power, distance or communication range, delay of the sensor node and trust value are estimated. First, the lifetime of each sensor node is given as in (2)

$$L(m) = \frac{E_{net}}{E_{con}} \tag{2}$$

Where $L(m)$ denotes the lifetime of each sensor node, $E_{net}$ is the energy in the MANET and $E_{con}$ is the energy consumed. The distance between two nodes (say 1 and 2) is given by $dis(S_1, S_2)$, along with an estimate of the hop count. The hop count denotes the number of next hops required to reach the base station from the sink or destination and is given as in (3)

$$Hopcount = \begin{cases} 1 & \text{if next hop = base station} \\ 1 + Hopcount & \text{if next hop = other than base station} \end{cases} \tag{3}$$
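The lifetime ratio in (2) and the hop count in (3) can be illustrated with a short sketch. It rests on stated assumptions: the second case of (3) is read as one plus the hop count of the next hop, routing is represented by a hypothetical `next_hop` table, and `BASE_STATION`, `lifetime` and `hop_count` are illustrative names rather than the authors' implementation.

```python
from typing import Dict

BASE_STATION = "BS"  # hypothetical identifier for the base station


def lifetime(e_net: float, e_con: float) -> float:
    """Eq. (2): energy in the network divided by the energy consumed."""
    if e_con <= 0:
        raise ValueError("consumed energy must be positive")
    return e_net / e_con


def hop_count(node: str, next_hop: Dict[str, str]) -> int:
    """Eq. (3), unrolled: count hops until the next hop is the base station."""
    hops = 0
    current = node
    visited = set()
    while current != BASE_STATION:
        if current in visited:
            raise ValueError("routing loop detected")  # guard added for the sketch
        visited.add(current)
        current = next_hop[current]  # follow the route one hop
        hops += 1
    return hops
```

The recursion is written as a loop so that a malformed routing table raises an error instead of recursing without end.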
A delay $De$ occurs between the departure of a collected packet from a source and its arrival at the base station, and is given as in (4)

$$De = (De_{que} + De_{tra}) \times Hopcount \tag{4}$$

Where $De_{que}$ is the queuing delay and $De_{tra}$ is the transmission delay. Clustering is then performed so that the lifetime of the MANET, which is the time from deployment of the WSN, is maximized while the energy consumption of the sensor nodes is minimized. The movable destination sensor node is called the MANET sink. The energy consumption of sensor nodes is minimized by minimizing the distance between sensor nodes. Consequently, the scalability of the MANET is improved, since an increase in the number of sensor nodes will not affect the performance of the WSN, and network traffic is reduced. Cluster formation then takes into account characteristics such as cluster count, cluster density, message count, stability, intra-cluster topology, etc. When a larger number of small clusters is distributed, better energy consumption is obtained since the transmission distance is minimized. When the total energy consumption is minimized, the lifetime of the MANET increases, and the energy consumed is given as in (5).

$$EC_m = IC_m \times EPR + IC_m \times EPA + EPT(NextHop) \tag{5}$$

Where $EC_m$ is the energy consumed, $EPR$ is the energy consumption due to packets received, $EPA$ is the energy consumption due to packet aggregation, $EPT$ is the energy consumption due to packets transmitted and $IC_m$ is the number of sensor nodes due to inter-cluster topology. The transmission power $TP_m$ is also measured in joules in order to estimate the weight of the nodes by stratified sampling based on node weight, as in (6)

$$We_m = \frac{x\,NN_m + y\,EC_m + z}{TP_m}; \quad (x + y + z \le 1) \tag{6}$$

Where $We_m$ is the weight of the sensor nodes. Based on this weight, clustering is performed in the sensor network, resulting in the formation of a minimum number of clusters C (1–10), each with a maximum number of sensor nodes, given in (7) as

$$C = \{C_1, C_2, \ldots, C_n\} \tag{7}$$

Where n is the total number of clusters. The procedure for determining clusters is given in algorithm 1 and the sensor nodes before and after clustering are shown in Fig. 3.

Algorithm 1. Trust evaluation
Input: Array of nodes, node ID value, list of neighbors
Output: Trust value is set for all the nodes.
Parameters used: node forwarded, node dropped, node misrouted, node falsely injected
Step 1: Collect data for $R_p$, $S_p$, $f$, $d$, $m$, $i$.
Step 2: Assign the threshold values associated with each behavior: $f_n$, $d_n$, $m_n$, $i_n$.
Step 3: Calculate the ratios $f_s$, $d_s$, $m_s$, $i_s$ of each behavior relative to $R_p$ and $S_p$, the total sent or received packets, accordingly.
Step 4: Calculate the deviations $f_d$, $d_d$, $m_d$, $i_d$ from the corresponding thresholds:
$f_s = f/R_p$ and $f_d = f_n/f_s$
$d_s = d/R_p$ and $d_d = d_n/d_s$
$m_s = m/R_p$ and $m_d = m_n/m_s$
$i_s = i/S_p$ and $i_d = i_n/i_s$
Step 5: Calculate the corresponding direct trust value using the formula
$$Trust(t) = (w_1 f_d) - (w_2 d_d) + (w_3 m_d) + (w_4 i_d)$$
Where $w_1, w_2, w_3, w_4$ are pre-defined weights.

Algorithm 2. Clustering by stratified sampling based on nodes weight
Input: A set of sensor nodes, each with the same transmission radius $R_v$, degree difference $\Delta_v$, sum $D_v$ of the distances between node v and all its neighbors, mobility speed $M_v$, its individual cumulative time $T_v$, the initial energy $E_v$, the distance between the base station and each sensor node $BS_v$, and the six coefficients $w_1$ to $w_6$.
STEP 1: Find the neighbors (node degree) $N_v$ of each node v within $R_v$.
$$N_v = \{v' \mid distance(v, v') \le R_v\}$$
STEP 2: Compute the degree difference $\Delta_v = |d_v - M|$ for every node v, where M is the maximum node degree.
STEP 3: Compute the sum $D_v$ of the distances between node v and all its neighbors.
$$D_v = \sum_{v' \in N_v} distance(v, v')$$
STEP 4: Compute the mobility speed of every node v by
$$M_v = \frac{1}{T} \sum_{t=1}^{T} \sqrt{(X_t - X_{t-1})^2 + (Y_t - Y_{t-1})^2}$$
Where $(X_t, Y_t)$ and $(X_{t-1}, Y_{t-1})$ are the coordinate positions of node v at time t and t − 1.
STEP 5: Assume the cumulative time $T_v$ in which node v has acted as a cluster head. A larger $T_v$ value for node v implies that it has spent more resources (such as energy).
STEP 6: Assume the initial energy $E_v$ of each sensor node.
STEP 7: Calculate the distance between the base station and each sensor node (the term $BS_v$ in STEP 8).
$$M_{BS-v} = \sqrt{(X_{BS} - X_v)^2 + (Y_{BS} - Y_v)^2}$$
Where $(X_{BS}, Y_{BS})$ and $(X_v, Y_v)$ are the coordinate positions of the base station and each sensor node respectively.
STEP 8: Calculate the combined weight
$$W_v = w_1 \Delta_v + w_2 D_v + w_3 M_v + w_4 T_v + w_5 E_v + w_6 BS_v$$
Where,
$w_1$ = 0.1: weight of the degree difference ($\Delta_v$)
$w_2$ = 0.05: weight of the sum $D_v$ of the distances between node v and all its neighbors
$w_3$ = 0.1: weight of the mobility speed of every node ($M_v$)
$w_4$ = 0.05: weight of the cumulative time ($T_v$)
$w_5$ = 0.3: weight of the initial energy ($E_v$)
$w_6$ = 0.4: weight of the distance between the base station and each sensor node ($BS_v$).
STEP 9: Choose the node with the minimum $W_v$ as the cluster head.
STEP 10: Consider the nodes within the transmission range as member/follower nodes of that cluster.
STEP 11: First cluster formation.
STEP 12: Eliminate the chosen cluster head and its neighbors from the set of original sensor nodes.
STEP 13: Repeat steps 1–12 for the remaining nodes until each node is assigned to a cluster.

3.3. Adaptive chicken swarm optimization for CH selection

After the clusters C are formed, the CH for each cluster is determined with the aid of the ACSO algorithm. With this adaptive algorithm, the time taken to select the CH is reduced because stratified sampling weights are used to perform the clustering. Formally, the fitness value for determining the CH is given in (8)

$$Fitness = ER_m + (m - n) + \frac{IC}{n} + \frac{Distance}{n} \tag{8}$$

Where $ER_m$ is the remaining energy. The trust value is estimated as in (9)

$$Trust\ value = \frac{\text{packets correctly forwarded}}{\text{total packets forwarded}} \tag{9}$$
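Steps 1 to 3 and 7 to 9 of Algorithm 2 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the maximum node degree M is taken as the largest degree observed in the current node set, the mobility, cumulative time and initial energy are supplied as precomputed inputs, and the `Node` class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple

# Coefficients w1..w6 as listed in Algorithm 2 and Table 2.
WEIGHTS = (0.1, 0.05, 0.1, 0.05, 0.3, 0.4)


@dataclass
class Node:
    x: float
    y: float
    mobility: float  # M_v, assumed precomputed (Step 4)
    ch_time: float   # T_v, cumulative time as cluster head (Step 5)
    energy: float    # E_v, initial energy (Step 6)


def distance(a: Node, b: Node) -> float:
    return math.hypot(a.x - b.x, a.y - b.y)


def combined_weights(nodes: Dict[str, Node], radius: float,
                     base_station: Tuple[float, float]) -> Dict[str, float]:
    # Step 1: neighbours of each node within the transmission radius.
    neighbours = {
        v: [u for u in nodes if u != v and distance(nodes[v], nodes[u]) <= radius]
        for v in nodes
    }
    # Assumption: M is the largest degree observed among the given nodes.
    max_degree = max(len(n) for n in neighbours.values())
    w1, w2, w3, w4, w5, w6 = WEIGHTS
    weights = {}
    for v, node in nodes.items():
        delta = abs(len(neighbours[v]) - max_degree)  # Step 2
        dist_sum = sum(distance(node, nodes[u]) for u in neighbours[v])  # Step 3
        bs_dist = math.hypot(base_station[0] - node.x,
                             base_station[1] - node.y)  # Step 7
        # Step 8: combined weight W_v.
        weights[v] = (w1 * delta + w2 * dist_sum + w3 * node.mobility
                      + w4 * node.ch_time + w5 * node.energy + w6 * bs_dist)
    return weights


def select_cluster_head(weights: Dict[str, float]) -> str:
    # Step 9: the node with the minimum combined weight becomes cluster head.
    return min(weights, key=weights.get)
```

Steps 10 to 13 would then remove the chosen head and its neighbours from `nodes` and repeat the two calls until no nodes remain.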
Fig. 3. Clustering based on Nodes Weight.

Table 3
Features in KDD Cup 1999 Dataset.
| Feature index | Feature name | Feature index | Feature name |
|---|---|---|---|
| 1 | duration | 22 | is guest login |
| 2 | protocol type | 23 | count |
| 3 | service | 24 | srv count |
| 4 | flag | 25 | serror rate |
| 5 | src bytes | 26 | srv serror rate |
| 6 | dst bytes | 27 | rerror rate |
| 7 | land | 28 | srv rerror rate |
| 8 | wrong fragment | 29 | same srv rate |
| 9 | urgent | 30 | diff srv rate |
| 10 | hot | 31 | srv diff host rate |
| 11 | num failed logins | 32 | dst host count |
| 12 | logged in | 33 | dst host srv count |
| 13 | num compromised | 34 | dst host same srv rate |
| 14 | root shell | 35 | dst host diff srv rate |
| 15 | su attempted | 36 | dst host same src port rate |
| 16 | num root | 37 | dst host srv diff host rate |
| 17 | num file creations | 38 | dst host serror rate |
| 18 | num shells | 39 | dst host srv serror rate |
| 19 | num access files | 40 | dst host rerror rate |
| 20 | num outbound cmds | 41 | dst host srv rerror rate |
| 21 | is host login | | |

Fig. 4. CH selection by adaptive Chicken Swarm Optimization Algorithm.

Finally, the updated fitness value is given as in (10)

$$updated\ fitness = \frac{\text{degree of node} + \frac{distance}{n} + EC + \text{trust value}}{\text{number of factors}} \tag{10}$$

With this estimate the CH is finally determined. The CH is responsible for distributing energy evenly among the sensor nodes throughout the MANET. The steps involved in CH estimation by the adaptive Chicken Swarm Optimization algorithm are given in algorithm 3, and the CH for the WSN is shown diagrammatically in Fig. 4.
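The trust ratio in (9) and the updated fitness in (10) are straightforward to compute once the per-node quantities are known. The sketch below is illustrative and rests on assumptions: the "number of factors" is taken to be the four terms in the numerator of (10), a node that has forwarded no packets is given a trust value of zero, and the function names are hypothetical.

```python
def trust_value(correctly_forwarded: int, total_forwarded: int) -> float:
    """Eq. (9): share of forwarded packets that were forwarded correctly."""
    if total_forwarded == 0:
        return 0.0  # assumption: no forwarding history means no trust yet
    return correctly_forwarded / total_forwarded


def updated_fitness(degree: float, distance: float, n_clusters: int,
                    energy_consumed: float, trust: float) -> float:
    """Eq. (10): mean of the four factors used to rank CH candidates."""
    if n_clusters <= 0:
        raise ValueError("number of clusters n must be positive")
    factors = (degree, distance / n_clusters, energy_consumed, trust)
    return sum(factors) / len(factors)  # assumption: number of factors = 4
```

For example, a candidate with degree 3, distance 10 over n = 5 clusters, consumed energy 1.2 and trust 0.8 contributes the factors 3, 2, 1.2 and 0.8, which are then averaged.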
Algorithm 3. Algorithm for cluster head selection

3.3.1. Energy computation algorithm

STEP 1: Initialize all the first order radio model parameters.
STEP 2: Calculate the energy consumed by a CH node in receiving data signals from its members
$$E_{CH} = (E_{elec}\, k\, CH_{degree} + E_{DA}\, k) + (E_{elec}\, k + E_{amp}\, d_{toNextCH}^{2}\, k)$$
STEP 3: Energy used in all CH nodes is
$$E_{TOT\,CH} = (E_{elec}\, k) + (E_{amp}\, R_{TX}^{2}\, k)$$
STEP 5: Energy used in all non-CH nodes is
$$E_{TOT\,nonCH} = (N - NC)\, E_{r\,nonCH}$$
where N is the number of nodes.
STEP 6: Total energy consumption is the sum of the energy used in all CH nodes and the energy used in all non-CH nodes.
$$E_{TOT} = E_{TOT\,CH} + E_{TOT\,nonCH}$$

3.4. Rotated random forest for discarding features

After CH determination, classification is performed using only selected features, and the other features are discarded. To discard the features other than the important ones, feature reduction is performed with the aid of the RRF algorithm. The 41 different features available in the dataset are given in Table 3.

RRF is responsible for reconstructing a large dataset into a dataset with only the required features, and this is done on the training set. In the first stage, the feature set is split into different subsets by a transformation algorithm, and after that the extracted feature set is reconstructed by preserving all components. Distinct feature set splits lead to distinct rotations and, as a consequence, distinct classifiers are obtained. Hence the information about the scatter of the data is preserved in the extracted feature space, and thus RRF achieves both diversity and accuracy. With the RRF algorithm, the dataset with 41 features is reduced by considering only the important features for a particular attack, as described in Table 4. All attributes other than these prescribed attributes are neglected by RRF. Since only selective features are
utilized for further processing, the time consumption is greatly reduced.

Table 4
Attributes after Feature Reduction by RRF.
Attack Categories   Selected Attributes
NORMAL              {1-41}
DOS                 {2-8, 10, 12, 13, 22-41}
PROBE               {1, 2, 3, 5, 6, 10, 12, 23, 24, 25, 26-41}
U2R                 {1-6, 10-14, 16-19, 23-25, 27-30, 32-37, 39-41}
R2L                 {1, 2, 3, 4, 5, 6, 9-19, 22-28, 31-41}

Table 5
Attack Classes.
Attack class   Attack techniques
DOS            Jamming, Selected Forwarding, Tampering
U2R            Hello Floods attack
R2L            Wormholes, Sybil, Acknowledgement Spoofing
Probe          Altered Routing Information Attacks, Replayed Routing Information, Spoofed Routing Information Attack, Sinkhole

Table 6
Estimated Values for Proposed System.
Parameters    Value
Specificity   0.9756
Sensitivity   0.22
Accuracy      0.84
FPR           0.0243
FDR           0.33
Error Rate    0.16
Precision     0.666
Prevalence    0.18

3.5. Classification of malicious node by adaptive SVM classifier
After discarding irrelevant features, the updated feature set from RRF undergoes a two-stage classification algorithm that is responsible for detecting different attacks. A supervised machine learning classification algorithm called the adaptive SVM classifier is utilized to classify nodes as attacked or attack free. This SVM classifier produces better results when compared to other classifiers. In this work a two-stage SVM classifier is implemented. In the first stage an acknowledgement message is sent to detect whether the corresponding sensor node is malicious or not. Based on the acknowledgement received, in the second stage classification is done to detect the different types of attacks. Initially each node in the dataset is classified as either a normal node or an attacked node. If it is an attacked node, further processing is done to detect the type of attack and to perform the corresponding action based on the attack identified.
For this classification the two-stage SVM classifier utilizes an Intrusion Detection System (IDS) to detect the attacks. Two different types of IDS are identified: anomaly based detection and signature based detection. In this work the anomaly based detection method is utilized since it detects both the known and unknown attacks identified in the dataset, whereas the latter cannot detect unknown attacks. The four attack classes recognized, namely Probe, DOS, U2R and R2L, are given in Table 5.
The attack techniques recognized are described as follows.
Altered/Replayed or Spoofed Routing Information Attack: These types of attack target the routing information transferred between the sensor nodes. The attacker may alter, replay or spoof routing information with the objective of interrupting network traffic. These interruptions comprise attraction or repulsion of network traffic, creation of routing loops, source route extension or shortening, partitioning of the network, fake error message generation and increased end-to-end latency, which leads to increased congestion in traffic.
Sinkhole Attack: This attack works by constructing a compromised sensor node that seems attractive to the surrounding sensor nodes, leading the neighboring sensor nodes to assume that the compromised sensor node is the best path to their destinations.
Selected Forwarding: This attack occurs when a compromised sensor node drops a packet bound for a particular destination. In this way the attacker selectively filters traffic from a specific part of the network. When all packets are dropped it is referred to as a “black hole” attack.
Hello Flood Attack: Many protocols require sensor nodes to broadcast “Hello” packets to announce themselves to their neighbors, and a sensor node receiving such a packet assumes that it is within the radio range of the sender. An attacker with a high power antenna can convince every sensor node in the network that it is their neighbor.
Wormholes Attack: This is a critical attack where the attacker records the packet at one location in the network and forwards it to another location in the network.
Sybil Attack: In this attack, the malicious sensor node assumes several identities, pretending to be a group of many sensor nodes instead of one. Apart from being a routing attack, this is also considered a cryptographic attack that undermines trust between multiple parties.
Acknowledgement Spoofing: Many sensor network routing algorithms depend on implicit or explicit link layer acknowledgements. These acknowledgements can be spoofed so that other sensor nodes believe a weak link to be strong, resulting in packet loss while travelling over these links.
DOS Attack: This attack tries to exhaust the resources available to the target sensor node by sending superfluous redundant packets, and thus prevents genuine network users from accessing services to which they are entitled.
To detect these different attack types identified in MANET, the anomaly based IDS with two-stage SVM classifier comprises four stages:

1 Intrusion detection
2 Attack identification
3 Intruder identification
4 Intrusion response

3.5.1. Intrusion detection
Initially in this phase the CH, in coordination with the network characteristics and the anomaly based IDS, identifies intrusion in the MANET. This anomaly based IDS tests the distance measure and stores the observed values for every time interval. For every time interval the CH performs hypothesis testing with a null hypothesis for every node parameter. The hypothesis testing of all parameters is given by Eq. (11).

$\chi^{2}[w] = \forall w \left( \sum_{k=1}^{M} \left( \frac{\left( wY_{k}^{i} - \overline{wY}_{k}^{i} \right)^{2}}{\overline{wY}_{k}^{i}} \right) \right)$ (11)

Where $wY_{k}^{i}$ is a set of random variables in the ith time interval and k represents the features of the nodes. An expected value is predicted for the overall null hypothesis; if it is rejected then it is assumed that intrusion has occurred and the process moves to attack identification, and if not a moving average value is updated (Table 6).

3.5.2. Attack identification
If network intrusion is detected, the CH proceeds to attack identification, which utilizes a rule-based approach to identify the attack.
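The hypothesis test of Eq. (11) that triggers attack identification can be sketched as follows. This is a minimal illustration only, assuming the statistic is a Pearson-style chi-square sum over the node features and that the expected values are maintained as an exponential moving average; the function names, the decision `threshold` and the smoothing factor `alpha` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def chi_square_statistic(observed: Sequence[float], expected: Sequence[float]) -> float:
    """Sum of (observed - expected)^2 / expected over all node features."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def check_interval(
    observed: Sequence[float],
    expected: Sequence[float],
    threshold: float,
    alpha: float = 0.2,  # assumed smoothing factor, not from the paper
) -> Tuple[bool, List[float]]:
    """Return (intrusion_detected, expected values for the next interval).

    If the statistic exceeds the assumed threshold, the null hypothesis is
    rejected and the expected values are kept unchanged. Otherwise the
    expected values are updated as a moving average of the observations.
    """
    stat = chi_square_statistic(observed, expected)
    if stat > threshold:
        return True, list(expected)
    updated = [(1 - alpha) * e + alpha * o for o, e in zip(observed, expected)]
    return False, updated
```

In this sketch a rejected null hypothesis would hand control to the rule-based attack identification step, while an accepted one only refreshes the moving average.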
The attacks are identified by the set of rules given in Table 1. Based on the attack features, the different types of attack such as DOS, U2R, R2L and Probe are identified.

3.5.3. Intruder identifications
After identification of the attack, the CH initiates intruder identification and applies intruder identification rules that are specific to the known attack. The dataset comprises the known attacks listed in Table 1 and some additional unknown attacks such as apache 2, httptunnel, mailbomb, mscan, named etc.

3.5.3.1. DOS attack. The main features of the DOS attack are: point-to-point nodes are used to stop avoidance of the jammed region; wormholes are utilized to avoid jamming; and in selected forwarding a compromised sensor node drops packets bound for a particular destination. The dataset comprises the known training DOS attacks such as apache, back, land, Neptune, pod, smurf, teardrop, processtable, dosnuke, mailbomb, ping of death, sshprocesstable, syslogd, tcpreset, udpstorm, jamming and selected forwarding. If an attack is one of these known attacks, it is considered a DOS attack.
Some DOS attack descriptions are

• Apache-2

The Apache2 attack is a denial of service attack against an Apache web server where a client sends a request with many HTTP headers. If the server receives many of these requests it will slow down, and may eventually crash.

• Back

In this denial of service attack against the Apache web server, an attacker submits requests with URLs containing many front slashes. As the server tries to process these requests it will slow down and become unable to process other requests.

• Land Attack

The Land attack occurs when an attacker sends a spoofed SYN packet in which the source address is the same as the destination address.

• Smurf attack

The Smurf attack can be identified by an intrusion detection system that notices that there are a large number of 'echo replies' being sent to a particular victim machine from many different places, but no 'echo requests' originating from the victim machine.

• Mail bomb

A Mail bomb is an attack in which the attacker sends many messages to a server, overflowing that server's mail queue and possibly causing system failure.

3.5.3.2. U2R attack. Hello flood: Two-directional verification, multiple base station routing and multi-routing are used. It also adopts a secret, probabilistic sharing compartment.
The training attacks are buffer overflow, load module, Perl, rootkit, anypw, casesen, eject, ffbconfig, fdformat, load module, ntfsdos, perl, ps, sechole, xterm, yaga.
Some feature descriptions of U2R attacks are

• Black-hole attacks

Uses geographic routing and takes advantage of being the sender to see the nearer transmission and detect black-hole attacks.

• Sybil attack

Sybil entity attacks are detected by using radio resources, random key pre-distribution, a registration procedure, verification of position, and code testing.

3.5.3.3. R2L attack. Information or data spoofing: efficient use of the resources; protects the network even if part of the network is compromised. Attacks on information in transit: provides flexibility in the network; protects the network even if part of the network is compromised; provides authentication measures for sensor nodes; centered on providing message authenticity, integrity and confidentiality; works in the link layer; semantic security, replay protection, data authentication, low communication overhead.
The training data set for the R2L attack comprises guess password, imap, multihop, phf, spy, warezmaster, dictionary, ftpwrite, guest, httptunnel, imap, named, netbus, phf, ppmacro, sendmail, sshtrojan, xsnoop. If any of these satisfies the condition in the classification technique, it is said to be an R2L attack.

3.5.3.4. Probe attack. Altered Routing Information Attack, Replayed Routing Information, Spoofed Routing Information Attack.
The training features dataset for probe attacks comprises inside sniffer, IPsweep, IPdomain, mscan, NTinfoscan, nmap, queso, saint, Satan.
The four major attack classes above are DOS, R2L, U2R and Probe. The features of the various attacks are trained and tested on the known attacks, and the proposed classification technique finally determines whether a node is attacked or normal. Suppose an unknown attack is found; consider, for example, a looping attack, which causes data to circulate in a particular region of the network. This attack stops data from reaching the destination node and makes it revolve in the same region, which increases network traffic and causes latency. Such a node is considered malicious; it sends random values that may or may not coincide with the values sent by a good node. Since this work deals with unknown attacks, the clustering algorithms are trained with data that have no traces of attacks; this paper has already addressed the performance when attacks are present during training. Attacks that are unknown are treated as malicious. Compromised nodes (Table 1) that do not perform any malicious activity should, based on the result, remain isolated until the security response system deals with the attack (e.g. until the base station changes their ID). Following intruder identification, the anomaly IDS responds to the intrusion. However, deficiencies occur and therefore, to improve the overall effectiveness of the protection mechanism, an intrusion response scheme is involved.

3.5.4. Intrusion response
Consequently, a flexible intrusion response scheme is given, describing a set of intrusion response actions for MANETs. The intrusion response is also implemented by the CH, where initially the confidence level of the attack is determined based on the detection information. Then the network performance degradation is evaluated by utilizing the percentage change in parameters, which provides a measure of the severity of the attack. Finally a response action is selected and the necessary actions for intrusion response are taken. This information is given by means of a decision table. To enhance the effectiveness of intrusion response and to reduce its adverse effects on the MANET, the effect of the intrusion is analyzed. After detection of the effect the following actions are performed.

3.5.4.1. Isolation. In this response action all nodes in the network completely isolate the intruder from the network (MANET) immediately by imposing restrictions on data forwarding and routing services. Sensor nodes do not forward any data packets or routing packets originating
from or destined to the intruding node, and ignore all routing packets originating from the intruding node. Probabilistic isolation is also done, where nodes do not isolate the intruder completely; instead some restrictions are applied in terms of forwarding its data. Here the sensor nodes forward only some of the intruding node's data packets, with a specified probability, and do not send any routing packets through the intruder.

3.5.4.2. Route around attacker. Here nodes route data packets around the intruding node to stop attacks from the intruding node while allowing the intruder to forward data packets for other nodes. For this intrusion response, nodes allow the intruder to forward data packets for other nodes in the network on existing routes. Nodes process these data packets so that they will reach their destinations and will include intruder in new route discoveries. They also ignore all routing packets generated and forwarded by the intruder.

3.5.4.3. Service denial. By this mechanism sensor nodes deny the services provided by the intruder while utilizing it as an intermediate router. Here the sensor nodes do not forward any data packets to the intruding node, and they ignore any further services it provides to other nodes in the network. Finally, data packets are allowed to be routed through intruder nodes, i.e. the intruder is used as an intermediate router in the network. In some cases, when the attack is not severe, the attack is simply ignored.

3.5.4.4. Relocation. In another response action a node is physically moved so that it is closer to the intruder node before isolating the intruder. This approach requires network topology information to identify critical nodes in the network, and also requires the network to be able to command its nodes to move as required.

Algorithm 4. Intrusion Response Mechanism

Thus the known attacks are provided a better solution based on these predictions given by intrusion response. Then, to detect the unknown attacks, their effect is first analyzed, such as dropping of packets or other criteria for defect, and based on that a particular solution is given following the steps in intrusion response. After detection of attacks an additional security mechanism is carried out in the other sensor nodes to protect the nodes.

3.6. High level security mechanism enhancing security

The non-malicious sensor nodes recognized from the two-stage classification process are then additionally secured with the aid of a high level security mechanism that includes

• Secure group management
• Secure data aggregation

3.6.1. Secure group management
After clustering and detection of attacks in WSN it is necessary to incorporate a security mechanism for secure communication between the sensor nodes. The CH which is receiving data from another cluster has to authenticate the data by group key management. The key management technique is done by a cryptographic function including encryption and decryption with the aid of keys, usually 16 bytes in length, that provide secure transmission of data or packets.

3.6.2. Secure data aggregation
Data or packet aggregation (fusion) is necessary in a sensor network to reduce the amount of data transmitted to the base station. The aggregator technique is responsible for creating proof of the neighbor's data that verifies the purity of the collected data to the base station. Therefore, by the implementation of this efficient technique the overall time consumption, energy consumption and bagging error are reduced with accurate classification performance. Subsequently, this proposed technique shows better improvement in performance than the existing techniques.

4. Results
The security issues in MANET combined with data mining are analyzed, where initially clustering is done with the aid of stratified sampling based on node weight. By this algorithm the time consumption is reduced along with increased scalability; subsequently CH selection is done with the aid of the adaptive chicken swarm optimization algorithm. Then the rotated random forest algorithm is implemented to discard additional features. Then a two-stage supervised learning classification method is executed to provide an acknowledgement for detection of sensor nodes, followed by a high level security mechanism for providing secure routing. The proposed methodology is implemented on the PYTHON platform. The system configuration is given as
Operating System: Windows 7 Ultimate
Processor: Intel® Pentium® CPU G2030 @ 3.00 GHz
RAM: 4 GB
Node Placement: Random
Number of Nodes: 50

4.1. Simulation results
Initially the sensor nodes are created in a MANET (say 50), and parameters such as the lifetime, energy consumption, distance between the nodes, hop count, delay, intercluster topology, weight, transmission power etc. are estimated with the aid of the equations specified.

4.1.1. Encryption time
The encryption time of a packet is the time taken during encryption, i.e. during the conversion of plain text to cipher text by the High Level Security algorithm. The encryption time is calculated by using the formula in (12)

$\text{time} = \frac{\text{data size}}{\text{speed}}$ (12)
4.1.2. Decryption time

The decryption time of a packet is the time taken during decryption, i.e. during the conversion of cipher text to plain text by the proposed High Level Security algorithm. The decryption time is also calculated by using the formula in (12).
4.1.3. Intrusion detection parameters

The accuracy of the proposed system is given based upon the True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN), predicted yes value (N) and actual yes value (P), in terms of different parameters such as specificity, sensitivity, accuracy, False Positive Rate, Positive Predictive Value, Negative Predictive Value, Fault Detection Rate and Matthews Correlation Coefficient, as in (13–20).

4.1.3.1. Specificity. Specificity is a statistical measure of the performance of a classification test. It measures the accuracy provided that a specific class is set. The specificity is the probability of getting a negative result when the attack is truly negative. It is defined as in (13)

$\text{Specificity} = \frac{TN}{TN + FP}$ (13)

4.1.3.2. Sensitivity. Sensitivity is another statistical measure of the performance of a classification method. Sensitivity or recall is a measure of the ability of a prediction model to select instances of a certain class from a dataset. The sensitivity is the probability of providing a positive result when the attack is definitely positive. It is defined as the rate of correctly classified events among all events identified and is given as in (14)

$\text{Sensitivity} = \frac{TP}{TP + FN}$ (14)

4.1.3.3. Accuracy. Accuracy is the most essential metric for defining classification system performance. It is also used as a statistical measure of how well a classification test correctly identifies or excludes a condition. It is taken as the ratio of the number of correctly classified samples to the total number of samples, i.e. accuracy is the proportion of true results (both TP and TN) among the total number of cases examined, and is given in (15)

$\text{Accuracy} = \frac{TP + TN}{P + N}$ (15)

4.1.3.4. False positive rate (FPR). FPR is estimated as the ratio of FP to the sum of False Positive and True Negative and is given as in (16)

$\text{FPR} = \frac{FP}{FP + TN}$ (16)

4.1.3.5. Fault detection rate (FDR). FDR is used to predict the detection rate and is estimated as the ratio of FP to the sum of False Positive and True Positive, and is given as in (17)

$\text{FDR} = \frac{FP}{FP + TP}$ (17)

4.1.3.6. Misclassification rate (error rate). Error rate or misclassification rate is used to predict the error that occurred and is estimated as the ratio of the sum of False Positive and False Negative to the sum of positive and negative values, and is given as in (18)

$\text{Error rate} = \frac{FP + FN}{P + N}$ (18)

4.1.3.7. Precision. Precision is estimated as the ratio of True Negative to negative value and is given as in (19)

$\text{Precision} = \frac{TN}{N}$ (19)

4.1.3.8. Prevalence. Prevalence is estimated as the ratio of positive value to the sum of positive value and negative value and is given as in (20)

$\text{Prevalence} = \frac{P}{P + N}$ (20)

Fig. 5. Performance parameters under black hole and wormhole attack.

Fig. 5 describes the performance comparison for accuracy, misclassification rate, False Positive Rate and False Detection Rate for black hole attack and wormhole attack.

4.2. Malicious attack before prevention and after prevention using different parameter strategy

For malicious node attacks (i.e. known and unknown attacks), different parameter values are calculated before prevention and after prevention, such as Throughput, End to End Delay, Transmit Energy, Distance, channel load, buffer occupancy, Bandwidth, Bit Error Rate, Packet Delivery Ratio and QOS.

4.2.1. Throughput
Throughput is the rate of successful message delivery over a communication channel. Throughput is usually measured in bits per second (bit/s or bps), and sometimes in data packets per second (p/s or pps) or data packets per time slot. The throughput formula is given below

$\text{Throughput} = \frac{\sum \left( (\text{no. of successful packets}) \times (\text{avg. packet size}) \right)}{\text{Time}}$ (21)

Table 7 gives the throughput values for different numbers of nodes before prevention of malicious nodes, covering known and unknown attacks. After prevention of malicious nodes using the prevention mechanism the value of throughput is increased, as plotted in Fig. 6.
Table 7
Parameter value of Throughput.
Attack types: known attacks (DOS, U2R, R2L, PROBE) and unknown attack. Last column: improved AODV with attack prevention but improvement in overhead.

Parameter    No. of nodes   DOS      U2R      R2L      PROBE    Unknown attack   Improved AODV
Throughput   20             56.888   51.2     53.895   53.89    56.895           56.888
Throughput   40             53.895   51.2     56.889   56.888   54.742           56.511
Throughput   60             51.2     51.2     56.888   53.895   55.315           56.882
Throughput   80             60.235   51.895   51.221   51.895   56.889           56.818
Throughput   100            53.895   50.96    51.895   51.2     53.895           56.38
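The throughput formula in (21) can be sketched in a few lines. This is a minimal illustration, assuming each traffic flow is summarized by a count of successfully delivered packets and an average packet size in bits; the function name and the per-flow tuple layout are hypothetical and are not taken from the paper.

```python
from typing import Iterable, Tuple


def throughput_bps(flows: Iterable[Tuple[int, float]], duration_s: float) -> float:
    """Throughput as in Eq. (21): sum(successful packets * avg packet size) / time.

    Each flow is an assumed (successful_packets, avg_packet_size_bits) pair.
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    delivered_bits = sum(count * size_bits for count, size_bits in flows)
    return delivered_bits / duration_s


# Example: one flow of 100 packets of 512 bytes delivered in 10 seconds.
rate = throughput_bps([(100, 512 * 8)], duration_s=10.0)
```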
Fig. 6. Performance value of Throughput before and after prevention.

4.2.2. End to end delay
End-to-end delay refers to the time taken for a packet to be transmitted across a network from source to destination. It is a common term in IP network monitoring, and differs from round-trip time (RTT) in that only the path in one direction, from source to destination, is measured. The formulation of end to end delay is given below

$d_{end-end} = N \left( d_{trans} + d_{prop} + d_{proc} + d_{queue} \right)$ (22)

Where $d_{end-end}$ is the end-to-end delay, $d_{trans}$ is the transmission delay, $d_{prop}$ is the propagation delay, $d_{proc}$ is the processing delay and $d_{queue}$ is the queuing delay.
Table 8 gives the value of End to end delay for different numbers of nodes before prevention of malicious nodes, covering known and unknown attacks. After prevention of malicious nodes using the prevention mechanism of the AODV protocol the value of End to end delay is increased, as plotted in Fig. 7.

Fig. 7. Performance value of End to End Delay before and after prevention.

4.2.3. Transmit energy

When sensor nodes communicate with sink nodes, data from the sensor nodes is first delivered to key nodes, which are linked directly with the sink nodes. Key nodes have to pass on the data from the sensor nodes. The total energy cost per key node for one inter-node communication is calculated as

Transmit Energy = node weight × number of Execution time (23)

Table 9 gives the value of Transmit Energy for different numbers of nodes before prevention of malicious nodes, covering known and unknown attacks. After prevention of malicious nodes using the prevention mechanism of the Intrusion Detection System (IDS) in our proposed AODV protocol, the value of Transmit Energy is increased, as plotted in Fig. 8.

Fig. 8. Performance value of Transmit Energy with and without IDS.

4.2.4. Distance
The distance formula is used to find the distance between two points. These points can be in any dimension. Here it indicates the distance between the starting node and the ending node for data transmission of the packet in the AODV protocol. The distance formula is given below

$\text{Distance} = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}$ (24)

Table 10 gives the distance between the starting node and the ending node used for communication. The values are calculated before prevention of malicious nodes, covering known and unknown attacks, for different numbers of nodes. After prevention of malicious nodes using the prevention mechanism the value of distance is increased, as described in Fig. 9.

4.2.5. Channel load
Channel load concerns the distribution of workloads across multiple computing resources and network links. Channel load aims to optimize resource use; using multiple components with load balancing instead of a
Table 8
Parameter value of End to End Delay.

Parameter          No. of nodes   DOS      U2R      R2L       PROBE    Unknown attack   Improved AODV
End to end delay   20             1.7578   1.7578   1.7578    1.7578   1.8554           1.8514
End to end delay   40             1.8554   1.8554   1.6601    1.7575   1.7578           1.8325
End to end delay   60             1.8554   1.8554   1.75785   1.6601   1.6601           1.8354
End to end delay   80             1.6601   1.7578   1.7578    1.7578   1.7578           1.8421
End to end delay   100            1.8554   1.6601   1.6601    1.7578   1.7578           1.8395
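The end-to-end delay of Eq. (22) can be illustrated with a short sketch. It assumes that N is the number of links (hops) on the path and that every hop has the same per-hop delays, which is how this formula is usually read; the function names and the helper for transmission delay (packet size divided by link rate) are hypothetical and are not taken from the paper.

```python
def transmission_delay(packet_size_bits: float, link_rate_bps: float) -> float:
    """Assumed helper: time to push one packet onto a link."""
    if link_rate_bps <= 0:
        raise ValueError("link_rate_bps must be positive")
    return packet_size_bits / link_rate_bps


def end_to_end_delay(n_links: int, d_trans: float, d_prop: float,
                     d_proc: float, d_queue: float) -> float:
    """Eq. (22): N * (d_trans + d_prop + d_proc + d_queue), in seconds."""
    if n_links < 0:
        raise ValueError("n_links must be non-negative")
    return n_links * (d_trans + d_prop + d_proc + d_queue)


# Example: 3 hops, 1000-bit packets on 500 kbit/s links (2 ms per hop),
# plus assumed propagation, processing and queuing delays per hop.
delay = end_to_end_delay(
    n_links=3,
    d_trans=transmission_delay(1000, 500_000),
    d_prop=0.001,
    d_proc=0.0005,
    d_queue=0.0015,
)
```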
Table 9
Parameter value of Transmit Energy.

Parameter         No. of nodes   DOS      U2R      R2L      PROBE    Unknown attack   Improved AODV
Transmit Energy   20             0.1343   0.4032   0.2688   0.2687   0.3095           0.4102
Transmit Energy   40             0.1344   0.2688   0.1547   0.3096   0.3547           0.4012
Transmit Energy   60             0.4032   0.3096   0.3096   0.2688   0.4032           0.4124
Transmit Energy   80             0.1344   0.3095   0.4032   0.3095   0.1343           0.4084
Transmit Energy   100            0.1343   0.2688   0.2687   0.4032   0.2032           0.4124
Table 10
Parameter value of Distance.

Parameter   No. of nodes   DOS       U2R      R2L      PROBE    Unknown attack   Improved AODV
Distance    20             0.9314    0.9314   0.2624   0.2623   1.1417           1.0125
Distance    40             1.0952    1.0952   0.1475   0.2952   0.9798           1.1214
Distance    60             1.0952    1.1315   0.2952   0.2624   0.9798           1.1417
Distance    80             0.73781   0.9798   0.3936   0.2951   0.9314           1.1417
Distance    100            1.0952    0.9314   0.2951   0.3936   0.7378           1.1417
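The distance between a starting node and an ending node, as used in Section 4.2.4, can be sketched as below. This assumes the intended formula is the ordinary Euclidean distance between node coordinates; the function name and the coordinate tuples are hypothetical. Python's standard `math.dist` handles points of any dimension, which matches the remark that the points can be in any dimension.

```python
import math
from typing import Sequence


def node_distance(start: Sequence[float], end: Sequence[float]) -> float:
    """Euclidean distance between two node positions of equal dimension."""
    if len(start) != len(end):
        raise ValueError("both points must have the same number of coordinates")
    return math.dist(start, end)


# Example: a starting node at (0, 0) and an ending node at (3, 4).
d = node_distance((0.0, 0.0), (3.0, 4.0))
```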
single component may increase reliability and availability through redundancy. The channel load formula is

$\text{channel load} = \frac{\text{no. of requests}}{\text{no. of slotted nodes}}$ (25)

Fig. 10 represents the channel load for different numbers of nodes before prevention of malicious attack nodes, covering known and unknown attacks. After prevention of the malicious nodes using the prevention technique, i.e. IDS in the AODV protocol, the performance is increased when compared to before prevention of the malicious attack. Table 11 gives the values of channel load before prevention and after prevention.

Fig. 9. Performance value of Distance with and without IDS before and after prevention.
Fig. 10. Performance value of Channel Load before and after prevention.

Table 11
Parameter value of Channel Load.

Parameter      No. of nodes   DOS      U2R      R2L      PROBE    Unknown attack   Improved AODV
Channel load   20             0.1312   0.3936   0.2624   0.2623   0.2952           0.2952
Channel load   40             0.1312   0.2624   0.1475   0.2952   0.1475           0.2912
Channel load   60             0.3936   0.2952   0.2952   0.2624   0.1475           0.2834
Channel load   80             0.1311   0.2951   0.3936   0.2951   0.1311           0.2971
Channel load   100            0.1311   0.2624   0.2951   0.3936   0.2624           0.2907

4.2.6. Buffer occupancy
The source node broadcasts an RREQ packet to the destination through the intermediate neighbor nodes. In response to the RREQ packet the destination sends an RREP packet along with the buffer occupancy to the intermediate node. Then the source node chooses the best path using shortest distance and buffer occupancy. Urgent data packets are routed through the shortest path, and for the rest the best disjoint paths are chosen for normal data packet transmission. The formula for buffer occupancy is

$\text{Buffer occupancy} = \frac{\text{Transmit Energy}}{\text{channel load}}$ (26)

Fig. 11 indicates the performance comparison before prevention and after prevention of malicious nodes, covering known and unknown attacks. After prevention of malicious nodes using the prevention mechanism, the AODV protocol has increased its performance value of Buffer Occupancy; the values are indicated in Table 12.

Fig. 11. Performance value of Buffer Occupancy before and after prevention.

Table 12
Parameter value of Buffer Occupancy.

Parameter          No. of nodes   DOS      U2R      R2L      PROBE    Unknown attack   Improved AODV
Buffer occupancy   20             0.5122   0.5123   0.5121   0.5121   0.5244           0.5643
Buffer occupancy   40             0.5121   0.5121   0.5243   0.5243   0.5123           0.5243
Buffer occupancy   60             0.5121   0.5243   0.5243   0.5121   0.5123           0.5554
Buffer occupancy   80             0.5121   0.5243   0.5123   0.5243   0.5121           0.5264
Buffer occupancy   100            0.5121   0.5121   0.5121   0.5123   0.5121           0.5263

4.2.7. Bandwidth
Bandwidth is defined as the amount of data that can be transmitted in a fixed amount of time. The bandwidth is usually expressed in bits per second (bps) or bytes per second. For analog devices, the bandwidth is expressed in cycles per second, or Hertz (Hz). The formula for bandwidth is

$\text{Bandwidth} = \frac{\text{Transmit Energy}}{\text{Channel Load}}$ (27)

Table 13 gives the Bandwidth values for different numbers of nodes before prevention of malicious nodes, covering known and unknown attacks. After prevention using the prevention mechanism the value of Bandwidth is increased when compared to before prevention. The comparison graph of the bandwidth values is shown in Fig. 12.

Fig. 12. Performance value of Bandwidth before and after prevention.

4.2.8. Bit error rate (BER)
The bit error ratio (also BER) is the number of bit errors divided by the total number of transferred bits during a studied time interval. Bit error ratio is a unitless performance measure, often expressed as a percentage

$\text{Bit Error Rate (BER)} = Q\left( \sqrt{\frac{2E_b}{N_0}} \right)$ (28)

Where $N_0$ is the noise spectral density and $E_b$ is the energy per bit.
Table 14 gives the Bit Error Rate values for different numbers of nodes before prevention of malicious nodes, covering known and unknown attacks. After prevention using the prevention mechanism the value of Bit Error Rate is decreased, as plotted in Fig. 13.

4.2.9. Packet delivery ratio
The calculation of Packet Delivery Ratio (PDR) is based on the received and generated packets as recorded in the trace file. In general, PDR is defined as the ratio between the packets received by the destination and the packets generated by the source. Packet Delivery Ratio is calculated using the formula

$\text{packet delivery ratio} = \frac{\text{received data}}{\text{sent data}} \times 100$ (29)

Table 15 gives the Packet Delivery Ratio values for different numbers of nodes before prevention of malicious nodes, covering known and
Table 13
Parameter value of Bandwidth.

Parameter   No. of nodes   DOS       U2R       R2L       PROBE     Unknown attack   Improved AODV
Bandwidth   20             32.3624   32.3624   32.3624   32.3635   29.0467          36.6725
Bandwidth   40             29.0467   29.0467   36.2825   32.3624   36.2825          36.2832
Bandwidth   60             29.0467   29.0467   32.3624   36.2825   36.2825          36.2825
Bandwidth   80             36.2825   32.3135   32.3624   32.3435   32.3635          36.2847
Bandwidth   100            29.0467   29.0467   29.0321   32.3624   36.2825          36.2754
Table 14
Parameter value of Bit Error Rate.

Parameter              No. of nodes   DOS      U2R      R2L       PROBE    Unknown attack   Improved AODV
Bit error rate (BER)   20             1.8629   1.8629   1.8629    1.8628   2.2835           1.4124
Bit error rate (BER)   40             2.1905   2.1905   1.57717   1.9598   1.4756           1.4214
Bit error rate (BER)   60             2.1905   2.2835   1.9598    1.4756   1.8629           1.4124
Bit error rate (BER)   80             1.4756   1.9597   1.8629    1.9597   1.8687           1.4745
Bit error rate (BER)   100            2.1905   1.8629   1.9598    1.8629   1.4756           1.4612
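The BER expression of Eq. (28) can be evaluated with the standard library alone. The sketch below assumes the formula is the usual Q-function form, Q(sqrt(2 Eb/N0)), and uses the identity Q(x) = 0.5 erfc(x / sqrt(2)); the function names and the dB conversion helper are hypothetical and are not taken from the paper.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x), expressed through erfc."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def db_to_linear(value_db: float) -> float:
    """Assumed helper: convert a ratio in decibels to a linear ratio."""
    return 10 ** (value_db / 10)


def bit_error_rate(eb_over_n0: float) -> float:
    """Eq. (28): BER = Q(sqrt(2 * Eb / N0)) for a linear Eb/N0 ratio."""
    if eb_over_n0 < 0:
        raise ValueError("eb_over_n0 must be non-negative")
    return q_function(math.sqrt(2 * eb_over_n0))


# Example: BER at Eb/N0 = 0 dB and at 10 dB; the error rate falls as Eb/N0 grows.
ber_0db = bit_error_rate(db_to_linear(0.0))
ber_10db = bit_error_rate(db_to_linear(10.0))
```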
Table 15
Parameter value of Packet Delivery Ratio.
Attack overhead
DOS U2R R2L PROBE
Parameter: packet delivery ratio (the first value in each row is the number of nodes)
20   89.5404  92.9934  89.937   92.8806  88.025  97.8618
40   93.0311  95.484   90.4276  95.3712  93.211  97.8512
60   95.5216  87.9746  92.9182  97.8618  94.723  97.8325
80   88.0122  91.4652  95.4088  93.3524  94.601  97.8741
100  90.5028  92.9558  89.8994  82.843   94.783  97.7422
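The packet delivery ratio defined in Section 4.2.9 (packets received by the destination over packets generated by the source, expressed as a percentage) can be sketched as follows. The function name and the packet counts are hypothetical. In the paper these counts come from the simulation trace file, whose format is not described in this section.

```python
def packet_delivery_ratio(received: int, generated: int) -> float:
    """Return the packet delivery ratio as a percentage."""
    if generated <= 0:
        raise ValueError("generated packet count must be positive")
    if received < 0 or received > generated:
        raise ValueError("received count must lie between 0 and generated")
    # Multiply first so that whole-number percentages stay exact.
    return 100.0 * received / generated


# Hypothetical counts (not from the paper): 940 of 1000 packets arrive.
print(packet_delivery_ratio(received=940, generated=1000))
```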
Fig. 13. Performance value of Bit Error Rate before and after prevention.
Fig. 14. Performance value of Packet Delivery Ratio before and after prevention.
unknown attacks. After the prevention mechanism is applied, the packet delivery ratio increases compared with the values before prevention, as plotted in Fig. 14.

4.2.10. QOS (quality of service)
In packet-switched telecommunication networks, quality of service refers to traffic prioritization and resource reservation control mechanisms rather than the achieved service quality. Quality of service is the ability to provide different priorities to different applications, users, or data flows, or to guarantee a certain level of performance to a data flow.
p (z × d) z
QOS = exp
1
× (30)
c zi
Table 16 lists the quality of service for different numbers of nodes before prevention, when malicious nodes carry out known and unknown attacks. After the prevention mechanism is applied, the quality of service increases, as plotted in Fig. 15.

4.3. Comparison results
The efficiency of the proposed methodology is compared with existing algorithms on parameters such as encryption time, decryption time and detection rate. The encryption and decryption times of the proposed high-level security mechanism are compared with those of existing security algorithms
Table 16
Parameter value of Quality of Service.
Attack overhead
DOS U2R R2L PROBE
Parameter: QoS analysis (the first value in each row is the number of nodes)
20   68.7686  68.7442  65.8615  64.6751  96.7508  96.8452
40   58.7501  68.9857  54.8479  65.985   54.8479  96.7961
60   73.0694  67.2781  65.9859  76.5521  68.7686  96.7234
80   54.6271  65.9923  68.7442  65.9923  65.8732  96.7214
100  68.7686  78.7442  76.3547  68.7442  76.5592  96.7124
Table 17
Comparison for Encryption Time and Decryption Time.
Methods                                    Encryption time (sec)   Decryption time (sec)
High Level Security Mechanism (Proposed)   0.17476                 0.17472
ABE                                        0.237                   0.246
IBE                                        0.498                   0.399
PBE                                        0.475                   0.376
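To make the comparison in Table 17 concrete, the short sketch below computes how much lower the proposed mechanism's encryption and decryption times are relative to each baseline. The timing values are copied from Table 17. The helper name `percent_reduction` and the choice of relative reduction as the comparison measure are illustrative assumptions, not metrics reported in the paper.

```python
# (encryption time in s, decryption time in s), copied from Table 17.
TIMES = {
    "Proposed": (0.17476, 0.17472),
    "ABE": (0.237, 0.246),
    "IBE": (0.498, 0.399),
    "PBE": (0.475, 0.376),
}


def percent_reduction(baseline: float, proposed: float) -> float:
    """Return how much lower `proposed` is than `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline time must be positive")
    return (baseline - proposed) / baseline * 100.0


proposed_enc, proposed_dec = TIMES["Proposed"]
for method, (enc, dec) in TIMES.items():
    if method == "Proposed":
        continue
    enc_gain = percent_reduction(enc, proposed_enc)
    dec_gain = percent_reduction(dec, proposed_dec)
    print(f"{method}: encryption {enc_gain:.1f}% lower, decryption {dec_gain:.1f}% lower")
```

By this measure the proposed mechanism's encryption time is roughly a quarter lower than that of ABE and more than half lower than those of IBE and PBE.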
such as Attribute Based Encryption (ABE), Identity Based Encryption (IBE) and Password Based Encryption (PBE). The comparison results are shown in Figs. 16–18 and Table 17, which demonstrate the efficiency of the proposed security mechanism. Further, the proposed algorithm is compared with existing classification methods such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO) and the Artificial Bee Colony (ABC) algorithm. These parameters show that our intrusion detection system is efficient compared with the existing methods, as seen in Figs. 19–21.

Fig. 15. Performance value of Quality of Service before and after prevention.
Fig. 16. Comparison for Encryption Time.
Fig. 17. Comparison for Decryption Time.
Fig. 18. Comparison for Sensitivity.
Fig. 19. Comparison for Specificity.
Fig. 20. Comparison for FDR.
Fig. 21. Comparison for Accuracy.

5. Conclusion

A WSN combined with data mining is described, in which clustering is performed with the aid of stratified sampling based on node weight. To select the CH from the analyzed cluster, the ACSO algorithm is used, which greatly reduces time consumption because the traditional CSO does not use the sampling process. To discard the additional features within the sensor nodes, the RRF algorithm is used, and classification is performed by a supervised learning algorithm called the adaptive SVM classifier. The classifier works with the aid of an anomaly-based IDS, which uses an acknowledgement method in a two-stage process to detect whether an attack is present. To provide secure transmission of packets between the sensor networks, a High-Level Security Mechanism is used. The efficiency of the proposed algorithm is compared with the existing security mechanisms: the encryption time of the proposed mechanism is 0.17476 s and the decryption time is 0.17472 s. The detection rate obtained from the proposed two-stage classification process is also compared with the existing approaches and gives a better result of 0.5. Furthermore, the accuracy-related parameters such as specificity, sensitivity, FPR and FDR show the effectiveness of the proposed method relative to other existing methods. The main reason for this effectiveness is that the proposed system uses a set of classifiers, collectively known as RRF, which can increase prediction accuracy in a way that is not possible with a conventional machine learning algorithm. In addition, the proposed algorithm is shown to be efficient by comparison with other existing approaches.
