
Partial Consensus and Incremental Learning Based Intrusion Detection System

Mohd Mohtashim Nawaz
Indian Institute of Information Technology, Allahabad
Prayagraj, Uttar Pradesh, India
nawazmohtashim@gmail.com

Vineet Gupta
Indian Institute of Information Technology, Allahabad
Prayagraj, Uttar Pradesh, India
vineet.gupta224a@gmail.com

Aditya Raj
Indian Institute of Information Technology, Allahabad
Prayagraj, Uttar Pradesh, India
rajadityagolu@gmail.com

Sangeeta Meena
Indian Institute of Information Technology, Allahabad
Prayagraj, Uttar Pradesh, India
smiiit.1999@gmail.com

Abstract: The tremendous growth of the internet and the easy availability of computer systems have brought with them a serious threat of malicious cyber activities. Intrusion, the malicious attempt to compromise the security of a computer system, is one such threat. As computational resources have increased, the problem has attracted researchers to build efficient systems to thwart intrusion attempts. This paper provides insight into intrusion detection techniques and technologies and analyzes different techniques used in building intrusion detection systems. We also propose a distributed partial consensus and incremental learning based intrusion detection system.

Index Terms: Intrusion, distributed intrusion detection systems, incremental learning, weighted random selection, consensus

I. INTRODUCTION

Broadly speaking, intrusion is an attempt to compromise the security of a computer system. The intruder may try to compromise security properties such as confidentiality, integrity and availability of the system, user policies and system-defined security measures. Intrusion detection, as the name suggests, is the process of detecting an intrusion and attempting to stop it. Since computer systems serve a large community over widespread networks, intrusion detection is critically important in today's world.

Intrusion detection plays a very important role in cyber security. Network Intrusion Detection Systems (NIDSs) are a category of IDSs which deal with violations of security aspects in the network. NIDSs monitor and analyze the network traffic to detect any kind of intrusion and raise intrusion alarms for the network administrator. An anomaly-based Network Intrusion Detection System consists of two main parts. The first is the model or architecture that defines the operational structure. The second is the detection engine, which is used to detect anomalies in the network. Several intrusion detection techniques and methods to implement them exist, but machine learning and data mining techniques have been particularly prominent in the development of intrusion detection systems. However, the combination of machine learning techniques and distributed systems is relatively unexplored. It is interesting to note that the application of incremental learning algorithms and consensus to intrusion detection is exceptionally rare, if it has been implemented at all. We have analyzed the machine learning and data mining techniques used to build intrusion detection systems and propose a possible solution using the incremental learning algorithm Learn++ proposed by R. Polikar, L. Udpa, S. S. Udpa and V. Honavar [1]. The proposed solution makes use of concepts of decentralization and makes decisions based on consensus among the nodes.

II. LITERATURE REVIEW

Intrusion detection systems are divided into two categories in terms of where the components are placed: "Host-based Intrusion Detection Systems (HIDSs)" and "Network-based Intrusion Detection Systems (NIDSs)".

A host-based intrusion detection system is placed at each node in the network, and packets are scanned at the host level to check their behaviour.

A network-based intrusion detection system is placed at suitable locations in the network to strategically monitor the traffic and perform checks on it.

Distributed Intrusion Detection Systems (DIDSs) are designed for networks where the centralized approach fails. Centralized systems have a single or a few analysis nodes in the network, which receive the network information they require from data collecting nodes spread all over the network. Centralized systems have security issues such as failure of the network, crash of the central node, etc. Moreover, if the central node is compromised, the whole system fails. In networks like wireless ad-hoc networks, a node is not physically linked with the network but can communicate with it when in radio range. These networks have certain limitations like fixed transmission range, limited bandwidth, etc. Also, mobile nodes have limited battery and memory. Therefore, different solutions and distributed approaches exist for providing better security in wireless networks. As wireless networks are flexible, nodes may leave and join the network
frequently, so there should be a mechanism to provide security to such nodes in the network. Approaches like the multi-agent based hierarchical approach, the semi-centralized cluster based approach and distributed consensus based approaches are used to provide security to wireless ad-hoc networks.

There are several intrusion detection techniques, such as signature based detection and anomaly based detection. The more commonly used of these is signature-based intrusion detection. In this type of intrusion detection system, whenever a new attack is encountered, the attack pattern or signature is defined and stored. Network security specialists can design a defence against the new attack after the attack signature is studied. Accordingly, the IDS is updated so that it can detect the new signature pattern and respond to it. This method is useful only for known patterns. If attacks are slightly changed or altered, they cannot be identified.

Anomaly based Intrusion Detection Systems are another type of IDSs. These detection systems deal with attacks whose signatures and patterns are not defined in the database. Network behaviour is the major parameter on which the detection system is based: if the network behaviour is within the predefined behaviour then it is recognized as normal traffic, else it is recognized as abnormal traffic. Instead of using signatures and patterns, these detection systems use rules and heuristics to differentiate normal behaviour from abnormal behaviour.

Machine learning and data mining techniques and algorithms can provide a simple yet effective solution for detection and classification of anomalies in the network. Several approaches, such as probabilistic and model based machine learning algorithms, have been used in intrusion detection systems. Model based learning techniques have the advantage that they predict faster and, once the model is trained, there is no need to keep the data unless the model needs to be re-trained on the same data. However, probabilistic techniques like Bayesian classifiers are more efficient and provide better accuracy. A few studies have combined model based and probabilistic techniques to build such classifiers. One such approach uses Probabilistic Neural Networks for classification. The interested reader can find more details in the paper published by Atay, İbrahim [2].

IDSs that can scan the network's large data, are robust and have good accuracy are always in demand within the industry. Tools like sniffers, firewalls and IDSs have become a necessity due to the shortcomings of the underlying protocols used in communication.

NSL-KDD is a data set used worldwide for evaluating intrusion detection systems. This data set contains four major categories of attacks: denial of service attacks, user-to-root attacks, remote-to-local attacks and probing attacks.

The evaluation criteria used to compare the performance of different techniques and methods of intrusion detection include: i) accuracy, ii) false positive rate (FPR), iii) false negative rate (FNR), iv) time complexity, v) memory complexity, and vi) correlation measures. False negative rate and false positive rate play a major role in cases where the class imbalance problem exists. When instances of the desired class are rare, it is called a class imbalance problem. Class imbalance is especially common in intrusion detection because the number of non-anomalous instances is much higher than the number of anomalous instances. However, not all of these criteria are often used. The most widely used of these are:

1) Accuracy: It is the measure of the ability of the system to classify new instances correctly. In essence, it is the percentage of instances classified as positive if they are really positive and as negative if they are really negative. It is one of the basic evaluation criteria.
2) False Negative Rate: The percentage of actually positive (anomalous) samples reported as negative (non-anomalous) is called the false negative rate. A high false negative rate indicates that the system is not able to detect actually anomalous instances correctly.
3) False Positive Rate: The percentage of actually negative (non-anomalous) instances misclassified as positive (anomalous) instances.

III. RESEARCH ADVANCES

Different architectures of Network Intrusion Detection Systems and different algorithms for anomaly detection are the subject of research for more effective and efficient intrusion detection.

In [3], Sen, Jaydip proposed a cluster based semi-centralized intrusion detection system for wireless networks. In the proposed model the network is divided into clusters using a particular clustering algorithm [4]. Each cluster has a cluster head that manages the remaining members of that cluster; cluster heads also initiate the cooperative intrusion detection process. These cluster heads are chosen on the basis of the output of an election algorithm which is based on the trust level of nodes. Also, each node in the network maintains a database containing information on known attacks, and for new attacks in the network some thresholds are set based on an anomaly detection algorithm. When a network-wide intrusion is detected, cluster heads relay the information to other clusters using a gateway. In the case of a local intrusion, however, the member node handles it locally and the cluster head adds the information about the detected intrusion to the database so that it can be used in future for intrusion detection. The cluster head manages mobile agents for collecting intrusion related data and relaying the response of the intrusion detection system to cluster members; these mobile agents reduce the load of cooperative detection. Though the approach provides a seemingly good solution at the initial level, there are some security concerns related to mobile agents, which can be found in [5].

A major downfall of this approach becomes visible as the network topology changes and cluster heads begin to leave the network at a higher rate. In such a case elections must be held frequently to save the system from going into an inconsistent state and to maintain availability. Since the wireless network topology is prone to frequent and sometimes sudden changes, many more elections may be necessary, which adds to the overhead. However, as the algorithm uses the concept of
local and global detection, it helps to manage the latency in detection as most of the attacks can be handled locally. Hence, the algorithm has a few weaknesses but may prove to be beneficial in designing intrusion detection systems for wireless networks.

In [6], Said OUIAZZANE, Malika ADDOU and Fatimazahra BARRAMOU proposed a model of a distributed network intrusion detection system which can detect intrusions in a large network using a multi-agent system and the Hadoop Distributed File System (HDFS). The proposed model uses agents to perform its tasks. First, the DIDS uses a capturing agent to collect the network data. The collected data is then filtered by a filtering agent to separate the categorized data (normal and abnormal) from un-categorized data that is not yet recognized. This un-categorized data is stored in the HDFS cluster for further processing by the Load Balancing agent. Finally, the Decision Maker agent processes the data stored on Hadoop to detect whether it is an intrusion. If an intrusion is found, it stores the detected intrusion in the intrusion database. Here, the Decision Maker agent can use any machine learning algorithm for anomaly detection.

The model has good processing speed in comparison to other distributed NIDSs because the Hadoop file system uses parallel distributed computing. Filtering the data before storing it in HDFS also adds to the speed. The model is scalable to large networks as more agents can be added to the DIDS if needed. An apparent downfall of this model is a crash of the file system. To overcome it, the authors presented an approach of replicating the whole database and storing it at three different sites. But again, the replication not only gives rise to data redundancy but also consumes a lot of storage as the data can be large. Although modern big data techniques and file systems like HDFS are the backbone of this model, they also add to its complexity.

In [7], Michel Toulouse, Bùi Quang Minh and Philip Curtis proposed a fully distributed network intrusion detection system based on a consensus algorithm called the "average consensus algorithm". This algorithm, like all other consensus algorithms, aims to reach a common decision among all nodes which execute the detection algorithm. The detection algorithm used in the proposed model is a Naive Bayes classifier for anomaly detection. The work flow of the model can be seen as a process of multiple steps where, initially, all n NIDS modules are trained for Naive Bayes classification. Each module then computes two initial variables, and the average consensus algorithm is applied, which uses those initial variables to reach two final variables for the whole network. Next, the probability of the hypothesis that the network behaviour is normal and the probability of the hypothesis that the network behaviour is abnormal are calculated using the variables obtained in the above step. To reach the final variables, 1000 iterations are executed for each module.

The model has good accuracy and is a better solution for ad-hoc wireless networks as the decision is not taken by a single node which may leave or lose connection. This model is scalable to large networks but will have a communication overhead and may become time inefficient, as for 9 NIDS modules there will be 9000 iterations, for 81 modules 81000 iterations and so on. The model suffers from the problems of the consensus algorithm [8], which may result in an inconsistent state in case of network faults, Byzantine faults and crash faults. The model relies on the Naive Bayes classifier, which has the fundamental problem of assuming that the features are independent, which may not be the case in real world scenarios. Also, the algorithm requires that all the data is kept in a database, i.e. data shall not be deleted. Moreover, Naive Bayes needs to scan the whole database each time to make predictions. All these requirements add to the cons of the algorithm. Therefore, even though the algorithm may be beneficial for some scenarios, it comes at the price of large memory needs, complexity and latency.

In [10], Krishnan Subramanian, Sachin Senthilkumar and Balasubramanian Thiagrajan proposed an intelligent intrusion detection system which uses tools such as neural networks and support vector machines. The model first categorizes different types of attacks and then individual neural networks are trained for a particular attack type. It uses Bayesian neural networks for denial of service (DoS) or distributed denial of service (DDoS) attacks, a re-circulation neural network for U2R attacks, a fuzzy neural network for R2L attacks and an evolving fuzzy neural network for probe attacks. A feed-forward multi-layer perceptron with the back propagation learning algorithm is used as a committee machine. The machine assigns a weighted average to each separate attack and then voting is done to detect the anomaly.

The model is a good approach for detecting attacks as each type of attack is handled by a different algorithm specialised for that attack, and then the overall decision is made by voting. The chances of detecting attacks are very high due to the specialised machines for each type of attack.

The model suffers from the problems of the Naive Bayes model as already discussed. In addition, it is computationally expensive, as for each attack four predictions are made. Also, training the model requires a large amount of classified data to train the different neural networks. Despite its high performance, the model isn't viable for small capacity machines due to its high computational expense. A distributed system can be used to implement the model so that it works efficiently, but this will introduce risks of network faults. The model suffers from obvious high complexity and non-viability in some real world scenarios. Intrusion data often suffers from the class imbalance problem, and a large amount of classified data with a sufficient number of positive instances for each class of attack may not be available. Therefore, the model is not as robustly applicable to real world scenarios as needed.

In [11], Lukas Iffländer et al. proposed two algorithms, namely adaptive blacklisting and adaptive whitelisting, to improve the performance of intrusion detection systems. It uses the basic principle that most attacks occur within the first
few packets after establishing a new connection.

Currently most intrusion detection systems use selective filtering, which is a static approach. In selective filtering, the destination machine needs knowledge of the application workload running on different hosts and of the intrusion detection system for protection. Selective filtering uses a flow entry in the switch for each server and application, which redirects the susceptible traffic to the intrusion detection system using an SDN based traffic analysis.

In adaptive blacklisting as proposed by the authors, connections are removed from the blacklist after some time when they do not raise an alarm for a certain period of time. In this approach, when network traffic arrives, the model first confirms whether the requested connection is a new connection. If a new connection is detected, a controller creates two flows in the network with different durations. The network traffic is forwarded to the intrusion detection system by the first flow and a small timeout duration is set. Once the time is over, the traffic is sent directly to the destination host. If the intrusion detection system raises any alert before the timeout, the first flow is used in the network and all traffic is passed through the intrusion detection system.

In adaptive whitelisting, the model does not require any knowledge about the configured signatures in the intrusion detection system. Explicitly whitelisted traffic types are optional. In this approach, for every connection, the host initially queries the SDN controller, which creates a flow via the intrusion detection system. If, after a certain predetermined number of packets, the intrusion detection system has generated fewer alerts than a threshold, the traffic is flagged as whitelisted. The remaining traffic of the whitelisted connection is routed directly to the host. After a certain time limit, the traffic of the connection is tested for alerts again. If more alerts than the threshold value are generated by the intrusion detection system, then all remaining network traffic for the particular connection is passed through the intrusion detection system.

This model has the limitation that it only counts the number of detected attacks. To account for false positive and false negative results, the model proposed in [10] can be used with the model proposed in [11] to adjust threshold values and increase efficiency.

IV. PROPOSED SOLUTION: PCIL-IDS

In traditional IDSs the intercepted data is sent to a central node where the decisions are taken. Such a centralized system is prone to failure in case of network faults or a crash of the central decision making node. Also, such a system cannot tolerate Byzantine faults. Moreover, most of the traditional systems using data mining and machine learning concepts train the model once in a while as a batch learning process and need to keep the full database so as to retain the knowledge when the model is re-trained. Thus, nodes with limited memory cannot train such models on their own and have to rely on pre-trained models or a central node. Although proposals have been made to build IDSs with distributed systems and with incremental learning, the two have seldom been combined. The proposed solution uses the combination of both in a unique way.

Here, we propose a solution for developing an intrusion detection system using incremental learning and partial consensus. Incremental learning is a form of machine learning where the model is able to learn incrementally without losing the learned information. Such learning algorithms are effective when there isn't enough memory to load the full data set into memory for training. Also, they can be applied to systems where the training data becomes available incrementally, i.e. in chunks. Consensus is a mechanism to achieve the necessary agreement on a single data value in a decentralized or distributed system. It is necessary in a distributed system to reach consensus so that the state of the system remains consistent. There exist a number of incremental learning algorithms, each with their own pros and cons. Losing, Viktor and Hammer, Barbara and Wersing, Heiko published a paper in 2016 [9] providing an in-depth analysis of incremental machine learning algorithms. The paper not only provides the basic definitions of each incremental machine learning algorithm but also reports the accuracy obtained by different models on different datasets. It compares the most widely used algorithms like Incremental Support Vector Machine (ISVM), On-line Random Forest (ORF), Incremental Learning Vector Quantization (ILVQ), Learn++ (LPP), Incremental Extreme Learning Machine (IELM), Gaussian Naive Bayes (GNB), etc. The paper helps in choosing the appropriate algorithm according to the needs of the application. The interested reader can find more in paper [9].

R. Polikar, L. Udpa, S. S. Udpa and V. Honavar in their paper [1] proposed an incremental learning algorithm for neural networks called 'Learn++'. The algorithm uses an ensemble of classifiers learned on data sets sampled from a database. It uses a weak classifier as the base learning algorithm, and the results of the different learned classifiers are combined using a weighted majority voting process. Neural networks are often used as the base learning algorithm as they can be easily manipulated to simulate a weak learner. However, the Learn++ algorithm recommends not using strong classifiers.

Learn++ uses an ensemble of classifiers by generating multiple hypotheses using training data sampled according to carefully chosen distributions. The outputs of the resulting classifiers are combined using a weighted majority voting procedure. As input, Learn++ requires a data set sampled from a database D, a weak base classifier and an integer specifying the number of iterations to be performed. It performs complex calculations and generates a final hypothesis H which can be used for prediction. As stated in the published paper, "Learn++ guarantees convergence on any given training data set, by reducing the classification error with each added hypothesis". The interested reader can find more details in paper [1]. The choice of Learn++ is justified because, according to paper [9], it provides very good accuracy over large datasets having either a large number of features and classes or a small number of features and classes. Moreover, it does not need to
store previous data, which makes it suitable for low storage devices.

A. General Approach

Initially, a classifier model is trained over the pre-accumulated data set and kept at each node. Let N be the total number of nodes in the network and let i be the node at which the packet/request arrived. The algorithm works as follows:

• A packet/request arrives at node i.
• Node i relays the request to m neighbour nodes and applies its own classifier to the received instance.
• The requested neighbours return to i the probabilities obtained from their classifiers.
• Node i makes the decision based on the weighted average of the probabilities received from the different peers and its own prediction.
• The results are displayed.
• The weights of the nodes that were queried are readjusted using the formulas given in (1) and (2).
• The readjusted weights are relayed to the neighbours, which relay them further. Thus the weights reach all the nodes in the network.
• Node i stores the received instance in the database, and when a sufficient number of instances has accumulated in a batch, the learning algorithm is applied again. Thus, over a period of time, different nodes have their own classifiers, different from the others.

Fig. 1 shows the process flow through a flow diagram.

B. Learning

• The learning algorithm used is the Learn++ [1] incremental learning algorithm with a neural network as the base weak classifier.
• Whenever a sufficient number of new instances has accumulated, the model is re-trained.

C. Selection of neighbour

• The m neighbours, where m should be much less than N, are selected randomly. The random selection is a weighted random selection so that the same nodes are not selected each time.
• The weights in the random selection are calculated based on the distance in the network and the predictive weights of individual nodes.
• The neighbours to be requested are selected beforehand, i.e. after the decision making process is finished, the neighbours for the next time are selected, to avoid the overhead of the neighbour selection procedure when a packet arrives.

D. Updating the weights

As discussed, decision making over the returned probabilities requires weights; therefore, updating the weights is one of the crucial steps. The following operations occur at node i, i.e. the requesting node:

• Initially, all the nodes are given a constant weight, say W_init.
• If a node does not return any value, its weight is set to zero.


• Otherwise weights are updated as:

$\epsilon$ : error tolerance factor
$\alpha, \beta$ : weight control parameters
$v$ : agreed value
$c_i$ : proposed probability value

If $|v - c_i| \le \epsilon$:
$$W_i = W_i + \alpha \, e^{-x/\epsilon} \qquad (1)$$
else:
$$W_i = W_i - \beta \, \left(e - e^{-x/\epsilon}\right) \qquad (2)$$
• Equation (1) gives the weight update when the error in prediction is less than or equal to the tolerance factor.
• Equation (2) gives the weight update in the other case.
• The values of α and β are to be taken in the order of ε/10 and ε/100 respectively. This rewards and penalizes the nodes effectively while ensuring that the weights do not change too drastically for each unit of error.
• If a node recovers after failure, it needs to sync the set of weights from its neighbours.

Fig. 1. Process flow diagram.
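The decision step of the general approach and the update rules (1) and (2) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the decision is taken to be a normalized weighted average of the returned probabilities, and x, which the text does not define, is assumed to be the prediction error |v − c_i|. The defaults for α and β follow the suggested orders of ε/10 and ε/100.

```python
import math


def weighted_decision(probabilities, weights):
    """Weighted average of the probabilities returned by the queried nodes.

    Assumption: the average is normalized by the sum of the weights.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one node must have a non-zero weight")
    return sum(p * w for p, w in zip(probabilities, weights)) / total


def update_weight(w_i, v, c_i, eps, alpha=None, beta=None):
    """Apply equation (1) or (2) to the weight of one queried node.

    v is the agreed value, c_i the probability proposed by node i and
    eps the error tolerance factor.
    """
    alpha = eps / 10 if alpha is None else alpha    # order suggested in the text
    beta = eps / 100 if beta is None else beta      # order suggested in the text
    x = abs(v - c_i)  # assumption: x is the prediction error
    if x <= eps:
        # Equation (1): reward a node whose proposal is within tolerance.
        return w_i + alpha * math.exp(-x / eps)
    # Equation (2): penalize a node whose proposal is outside tolerance.
    return w_i - beta * (math.e - math.exp(-x / eps))


# Example: node 0 proposed a value equal to the agreed value, node 1 is far off.
agreed = 0.8
weights = [1.0, 1.0]
proposals = [0.8, 0.3]
new_weights = [update_weight(w, agreed, c, eps=0.1)
               for w, c in zip(weights, proposals)]
```

With these assumptions the reward is largest when the error is zero, while the penalty grows with the error and stays below β·e, so a single bad prediction cannot change a weight drastically.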

E. Consensus
Once the weights are updated, they are propagated to the
neighbours. Each weight update message contains an epoch
number to identify the current epoch. The message is signed
by the node that sends it and includes only the weights that
need to be updated, along with the IDs of the nodes to which
the respective updated weights belong.
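The weight update message described above might be represented as in the following sketch. The paper does not specify a message layout or a signature scheme, so everything here is an assumption for illustration: the class and function names are hypothetical, and an HMAC with a shared key stands in for whatever digital signature a real deployment would use.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightUpdate:
    sender_id: str      # node that computed and signed the update
    epoch: int          # identifies the current epoch
    weights: dict       # node ID -> updated weight (changed weights only)
    signature: bytes = b""


def _payload(sender_id, epoch, weights):
    # Canonical byte encoding of the signed fields.
    body = {"sender": sender_id, "epoch": epoch, "weights": weights}
    return json.dumps(body, sort_keys=True).encode("utf-8")


def sign_update(sender_id, epoch, weights, key):
    """Build a signed update (HMAC-SHA256 used as a stand-in signature)."""
    sig = hmac.new(key, _payload(sender_id, epoch, weights),
                   hashlib.sha256).digest()
    return WeightUpdate(sender_id, epoch, dict(weights), sig)


def verify_update(msg, key):
    """Return True if the signature matches the message contents."""
    expected = hmac.new(key, _payload(msg.sender_id, msg.epoch, msg.weights),
                        hashlib.sha256).digest()
    return hmac.compare_digest(expected, msg.signature)


# Example: node "n1" announces a new weight for node "n2" in epoch 3.
key = b"shared-secret"  # hypothetical key material
msg = sign_update("n1", 3, {"n2": 1.01}, key)
tampered = WeightUpdate("n1", 3, {"n2": 9.9}, msg.signature)
```

A receiving node would call `verify_update` before relaying the message; a message whose weights were altered in transit, such as `tampered` above, fails the check.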
• When a node receives a weight change message, it checks the signature.
• If the signature is valid, the node further broadcasts the message to its neighbours.
• If the message contains the ID of the receiving node, then before updating its own weight the node checks that it sent a prediction to the node from which the update message originated and that the epoch number of the message is greater than its own epoch number. If so, it updates its own weights.
• If not, it rejects the message and does not propagate it further, as this indicates that something malicious is going on. Moreover, in the former case, when the node did not send a prediction but the weight update message contains its ID and a new weight, it broadcasts an alert message to indicate to all nodes that something malicious has happened and that every node should revert to the previous weights.
• Such an alert message contains the ID of the weight change message and its epoch number, so nodes that have already updated their weights should revert them, and nodes that have not yet updated should discard the message.
• Each node needs to keep a few previous sets of weights so that the changes can be undone.

In this way, every node in the network reaches consensus. Fig. 2 explains the process in a simple way.

Fig. 2. Consensus diagram.

V. CONCLUSION

Machine learning and data mining techniques have been used extensively in developing anomaly based intrusion detection systems, and they have proved to be highly accurate in some scenarios. Combined with data science techniques, concepts of decentralization and distributed systems have helped in building robust and accurate systems that deal with some inherent problems of centralized systems. Since the demand for such systems is ever increasing, systems with high accuracy, robustness and the ability to deal with large data are welcomed by the industry. We have proposed a solution to build such a system using the concepts of consensus and an incremental learning algorithm, which eliminates certain issues of batch learning and centralized systems. Also, the proposed solution is simple and lightweight yet effective, though network latency may be a drawback of the proposed solution when tested in a large network.

REFERENCES

[1] R. Polikar, L. Udpa, S. S. Udpa and V. Honavar, "Learn++: an incremental learning algorithm for supervised neural networks," in IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), vol. 31, no. 4, pp. 497-508, Nov. 2001.
[2] Atay, İbrahim. (2018). Intrusion Detection with Probabilistic Neural Network (PNN) - Comparative Analysis.
[3] Sen, Jaydip. "An intrusion detection architecture for clustered wireless ad hoc networks." Computational Intelligence, Communication Systems and Networks (CICSyN), 2010 Second International Conference on. IEEE, 2010.
[4] C.R. Lin, M. Gerla, "Adaptive clustering for mobile wireless networks", IEEE Journal on Selected Areas in Communications, Vol. 15, No. 7, September 1997, pp. 1265-1275.
[5] W. Farmer, J. Guttman, and V. Swarup. "Security for Mobile Agents: Authentication and State Appraisal", Proc. of the European Symposium on Research in Computer Security (ESORICS), LNCS, September 1996.
[6] S. OUIAZZANE, M. ADDOU and F. BARRAMOU, "A Multi-Agent Model for Network Intrusion Detection," 2019 1st International Conference on Smart Systems and Data Science (ICSSD), Rabat, Morocco, 2019, pp. 1-5.
[7] Michel Toulouse, Bùi Quang Minh and Philip Curtis. "A consensus based network intrusion detection system", 2015 5th International Conference on IT Convergence and Security (ICITCS).
[8] N. A. Lynch, Distributed Algorithms. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1996.
[9] Losing, Viktor & Hammer, Barbara & Wersing, Heiko. (2016). Choosing the Best Algorithm for an Incremental On-line Learning Task.
[10] Krishnan Subramanian, Sachin Senthilkumar and Balasubramanian Thiagrajan. (2016). Intelligent Intrusion Detection System using a Committee of Experts.
[11] Lukas Iffländer, Jonathan Stoll, Nishant Rawtani, Veronika Lesch, Klaus-Dieter Lange and Samuel Kounev. (2019). Performance Oriented Dynamic Bypassing for Intrusion Detection Systems.
