
Computer Networks 192 (2021) 108076
LIO-IDS: Handling class imbalance using LSTM and improved one-vs-one technique in intrusion detection system
Neha Gupta a,*, Vinita Jindal b, Punam Bedi a
a Department of Computer Science, University of Delhi, India
b Keshav Mahavidyalaya, University of Delhi, India
ARTICLE INFO

Keywords: Cybersecurity; Network security; Network-based intrusion detection system (NIDS); Class imbalance problem; Long short-term memory (LSTM); Improved one-vs-one technique (I-OVO)

ABSTRACT

Network-based Intrusion Detection Systems (NIDSs) are deployed in computer networks to identify intrusions. NIDSs analyse network traffic to detect malicious content generated from different types of cyber-attacks. Though NIDSs can classify frequent attacks correctly, their performance declines on infrequent network intrusions. This paper proposes LIO-IDS based on Long Short-Term Memory (LSTM) classifier and Improved One-vs-One technique for handling both frequent and infrequent network intrusions. LIO-IDS is a two-layer Anomaly-based NIDS (A-NIDS) that detects different network intrusions with high Accuracy and low computational time. Layer 1 of LIO-IDS identifies intrusions from normal network traffic by using the LSTM classifier. Layer 2 uses ensemble algorithms to classify the detected intrusions into different attack classes. This paper also proposes an Improved One-vs-One (I-OVO) technique for performing multi-class classification at the second layer of the proposed LIO-IDS. In contrast to the traditional OVO technique, the proposed I-OVO technique uses only three classifiers to test each sample, thereby reducing the testing time significantly. Also, oversampling techniques have been used at Layer 2 to enhance the detection ability of the proposed LIO-IDS. The performance of the proposed system has been evaluated in terms of Accuracy, Recall, Precision, F1-score, Receiver Operating Characteristic (ROC) curve, Area Under ROC (AUC) values, training time and testing time for the NSL-KDD, CIDDS-001, and CICIDS2017 datasets. The proposed LIO-IDS shows significant improvement in the results as compared to its counterparts. High attack detection rates and short computational times make the proposed LIO-IDS suitable to be deployed in the real world for network-based intrusion detection.
1. Introduction

In the present world, the Internet has become a ubiquitous part of people’s lives. From personal communications to professional activities, a significant amount of a person’s day is spent on the Internet. Computer networks form the backbone of this digitally connected world, and with the advancements in technology, a larger amount of network traffic traverses over these networks than ever before. This network traffic is either benign in nature or malicious in nature. Benign network traffic is also known as normal traffic, whereas malicious network traffic is referred to as attack traffic. If undetected, attack traffic can cause severe damage to the confidentiality, integrity, and availability of network data and devices. To safeguard computer systems and networks against cyber-attacks or intrusions, Intrusion Detection Systems (IDSs) are deployed in the real world. IDSs analyse the activities taking place on a computer system or a network to identify malicious traffic from benign data. Based on its deployment position, IDS can be classified as a Host-based IDS (HIDS) or a Network-based IDS (NIDS).

HIDS secures a single host by analysing its logs and system calls, whereas a NIDS secures all the devices on a network by analysing the communications taking place on the networked devices. Further, an IDS can be deployed as an isolated system or as a Collaborative IDS (CIDS). CIDS consists of several sensor nodes that collect data from different devices and provide it to analysis nodes for intrusion detection. Though CIDSs achieve better detection Accuracy than isolated IDSs, they are complex and costly devices [1]. In this paper, we propose an isolated NIDS that analyses network traffic to identify intrusions. NIDSs can detect intrusions using two different mechanisms, either by matching the current network traffic with the signatures of known intrusions or by creating a profile of normal network behaviour and comparing it with the current network traffic. The first approach is used to develop Signature-based NIDSs (S-NIDSs), whereas the second approach is used
* Corresponding author.
E-mail addresses: neha.phd.2018@gmail.com (N. Gupta), vjindal@keshav.du.ac.in (V. Jindal), pbedi@cs.du.ac.in (P. Bedi).
https://doi.org/10.1016/j.comnet.2021.108076
Received 6 December 2020; Received in revised form 27 February 2021; Accepted 29 March 2021
Available online 7 April 2021
1389-1286/© 2021 Elsevier B.V. All rights reserved.
to build Anomaly-based NIDSs (A-NIDSs).

S-NIDSs identify previously known attacks accurately, but they cannot identify unknown network attacks. In contrast, A-NIDSs can identify both known and unknown intrusions because any network traffic that deviates from the normal profile is marked malicious by the A-NIDS [2]. This paper proposes LIO-IDS, an A-NIDS that identifies intrusions using Long Short-Term Memory (LSTM) and Improved One-vs-One technique. An A-NIDS must be trained using network traffic samples to identify intrusions. In any network, a large number of samples are present for both normal traffic and for those intrusions that have been frequently witnessed in the past. However, fewer samples are available for infrequent intrusions. Hence, there is an uneven distribution of samples in different classes (normal class and different attack classes) of network traffic. This unequal representation is also reflected in intrusion detection datasets constructed from real or synthetic network traffic.

A dataset having a significant disproportion in the number of samples of different classes is called an imbalanced dataset. In such a dataset, the class having a higher number of samples is the majority class, whereas the class containing fewer samples is the minority class. Due to the uneven distribution of normal and attack samples, network traffic is also imbalanced in nature, and classifying this network traffic is an example of the imbalanced classification problem. The lack of training samples in the minority attack classes makes the attack identification process difficult for A-NIDSs. In such cases, intrusions either go undetected (i.e., get misclassified as benign traffic) or may be classified in the wrong attack category. Unidentified intrusions pose a higher risk to the network’s security, its users, and their data [3]. Hence, to develop effective A-NIDSs, there is a need to address the class imbalance problem existing in network intrusion detection.

There are four main approaches to handle the class imbalance problem: Data-centric approaches, Algorithm-centric approaches, Cost-sensitive approaches, and Ensemble approaches [4]. Data-centric methods either increase or decrease the samples in different classes of the imbalanced dataset before classification. Algorithm-centric techniques create or modify algorithms that can perform accurate classification irrespective of the existing imbalance in the classes. Cost-sensitive approaches modify the data by assigning different costs to the classes and also modify the algorithm by incorporating costs in the learning process of the algorithm. However, assigning costs for multiple classes is not an easy task. Ensemble approaches combine one of the three techniques mentioned above with an ensemble algorithm (Bagging algorithm or Boosting algorithm) to classify imbalanced classes correctly. This paper uses data-centric approaches and ensemble algorithms to handle the class imbalance problem in network intrusion detection.

The main contributions of this paper are as follows:

1. The paper proposes LIO-IDS to handle the class imbalance problem using LSTM and Improved One-vs-One technique in Intrusion Detection System. LIO-IDS is an Anomaly-based NIDS that identifies and classifies network intrusions using two layers: Layer 1 and Layer 2. The first layer separates normal traffic from abnormal traffic (intrusions) using the LSTM classifier. The second layer classifies the abnormal traffic into respective attack classes using the novel Improved One-vs-One technique and ensemble classifiers. By combining the advantages of LSTM, ensembles, and I-OVO technique, the proposed LIO-IDS detects both majority attacks and minority attacks with high Accuracy and low computational time.
2. The paper also proposes the Improved One-vs-One technique for performing multi-class classification. The traditional OVO technique uses ${}^{N}C_{2} = N(N-1)/2$ classifiers for training and testing, where N is the total number of classes. However, the proposed I-OVO technique uses only $2 + m \times n$ classifiers for training (where m is the number of majority classes, and n is the number of minority classes such that $m + n = N$) and only three classifiers for testing each attack sample. Since $(2 + m \times n) + 3 \le 2 \times {}^{N}C_{2}$, the total computational time of the I-OVO technique is significantly lower than the traditional OVO technique.
3. The proposed LIO-IDS identifies both majority and minority classes present in the network traffic with high Accuracy, thus handling the class imbalance problem of network intrusion detection. For this purpose, three data-centric approaches, namely, Random Oversampling (ROS), Borderline-Synthetic Minority Oversampling Technique (Borderline-SMOTE), and Support Vector Machine-Synthetic Minority Oversampling Technique (SVM-SMOTE), have been used at the second layer of the proposed system.
4. The performance of the proposed LIO-IDS was evaluated using Accuracy, Recall, Precision, F1-score, Receiver Operating Characteristics (ROC) curve, Area Under Curve (AUC), training time, and average testing time. The proposed system shows significant improvement in the results as compared to its counterparts. High attack detection rates and short computational times make the proposed LIO-IDS an accurate and time-efficient A-NIDS. Thus, LIO-IDS is a suitable candidate for deployment in real-world networks.

The remaining paper is organized as follows: Section 2 presents the details of different classifiers and data balancing techniques used to develop the proposed LIO-IDS. It also presents a review of recent research works on IDSs. Section 3 describes the proposed LIO-IDS, Section 4 explains the experimental study, Section 5 presents the results obtained from the experiments, followed by Section 6 which concludes the paper.

2. Related work

This section discusses the techniques that have been used for developing the proposed LIO-IDS. It also presents a review of recent research works on intrusion detection.

2.1. Background information

In this section, the details of the classifiers and data-balancing techniques used to develop the proposed LIO-IDS have been presented. The first sub-section explains the LSTM classifier, the second and third sub-sections describe the Bagging and Random Forest ensemble classifiers, the fourth sub-section describes AdaBoost, the fifth sub-section gives details of the Random Oversampling technique, and the sixth and seventh sub-sections describe the SVM-SMOTE and Borderline-SMOTE techniques, respectively.

2.1.1. Long short-term memory (LSTM)

Long Short-Term Memory is a Deep Learning algorithm that can capture long-term dependencies in the data. Deep Learning algorithms such as Deep Artificial Neural Networks (DNNs) consider their inputs to be independent of each other. They only accept fixed-sized inputs and produce fixed-sized outputs. They are unable to work with sequences of varying lengths as input or output data. This drawback is overcome by Recurrent Neural Networks (RNNs), which were developed to process sequences of fixed and variable lengths. In contrast to DNNs, RNNs utilize all information from previously seen inputs to compute the next output, i.e., information from previous timestamps influences the prediction for the current timestamp. Therefore, RNNs can capture short-term dependencies between input and output.

RNNs perform well when information from the recent past must be used to predict the next output. However, when the gap between the relevant information and the place where it must be used becomes large, RNNs cannot generate the correct output. This is due to the vanishing gradient problem, which arises when the backpropagated error decays exponentially [5]. LSTM is an advanced form of RNN that utilizes the input at the current timestamp and the information from previous timestamps to generate its output. It consists of neuron layers that capture the long-term dependencies in data by selectively remembering information for long periods. LSTMs decide what previous information
to retain and what to discard from memory. Three gates, namely the Input gate, Forget gate, and the Output gate, are used by LSTM for adding information, removing information, and producing the output. Eqs. (1)–(6) define the working of the three gates and different states of LSTM:

Input Gate: $i_t = \sigma(w_i \cdot [h_{t-1}, x_t] + b_i)$    (1)

Intermediate Cell state: $\tilde{C}_t = \tanh(w_C \cdot [h_{t-1}, x_t] + b_C)$    (2)

Forget Gate: $f_t = \sigma(w_f \cdot [h_{t-1}, x_t] + b_f)$    (3)

Output Gate: $o_t = \sigma(w_o \cdot [h_{t-1}, x_t] + b_o)$    (4)

Cell state: $C_t = f_t C_{t-1} + i_t \tilde{C}_t$    (5)

Hidden state: $h_t = \tanh(C_t) \times o_t$    (6)

where $\sigma$ is the sigmoid activation function; $\tanh$ is the hyperbolic tangent activation function; $x_t$ is the input at time t; $h_{t-1}$ is the hidden state at time t − 1; $w_i, w_C, w_f, w_o$ are the weights and $b_i, b_C, b_f, b_o$ are the biases. Due to their architecture, LSTMs can address the vanishing gradient problem in RNNs through the gating mechanism mentioned above [6,7].

2.1.2. Bagging

Bagging is an acronym for Bootstrap Aggregating, which was introduced by Breiman. It is an ensemble technique that constructs multiple base classifiers in parallel and trains each of these classifiers using different subsets of training data. Decision Trees (DTs) are generally used as base classifiers in Bagging. To select the training subset for each base classifier, samples are chosen randomly (with replacement) from the original training dataset. By training each classifier with a different dataset, the resulting classifiers differ in their detection capabilities. This prevents overfitting and improves the classification process because a test sample misclassified by one classifier can be correctly classified by other bagged classifiers. To classify each input sample, the predictions of all the classifiers are considered, and the final prediction is computed using the majority voting technique [8,9].

2.1.3. Random forest (RF)

Random Forest is an ensemble of DTs, which was proposed by Breiman. It is a Bagging ensemble method that combines predictions from multiple uncorrelated DTs to reduce the variance. Since DTs are very sensitive to the training data, RF constructs each DT by taking a random sample of the dataset with replacement. Moreover, each tree is built using a subset of features present in the dataset. Both these strategies result in the construction of significantly different trees. This approach of constructing multiple independent classifiers handles the mistakes of each classifier and reduces the overall prediction error. During testing, each DT classifies the test sample, and the predictions from all the trees are considered for computing the final prediction. This final computation is either performed through majority voting or weighted voting. The main advantages of RFs include tuning very few hyperparameters, resistance to overfitting, and reduced variance [9,10].

2.1.4. AdaBoost

AdaBoost is an acronym for Adaptive Boosting. This algorithm takes an iterative approach of building strong classifiers by learning from the mistakes of weak learners. Any classifier that is slightly better than random guessing is known as a weak learner. AdaBoost works by assigning higher weights to the samples misclassified by the previous weak classifier, before feeding them into the next weak classifier. By default, Decision Stumps are used as weak classifiers by the AdaBoost algorithm to build a strong prediction model. For constructing the first Decision Stump, all samples are assigned equal weights. Based on the predictions made by AdaBoost in the first iteration, weights are reassigned to samples in the subsequent iterations. In this way, AdaBoost ensures that future iterations reduce the misclassifications of the previous iterations [11,12].

2.1.5. Random oversampling (ROS)

Random Oversampling is a resampling technique that is used to handle the class imbalance problem in datasets. It is a data-centric approach that modifies the class distribution of the imbalanced dataset without changing the underlying algorithm used for classification. ROS increases the number of samples in the minority class. Depending on the amount of oversampling to be performed, a certain number of minority class samples are randomly selected and duplicated. This increases the number of minority class samples and reduces the imbalance in the dataset. The reduction in imbalance alleviates the impact of skewed class distributions in the learning process of the algorithm and leads to better classification results [3,4,13].

2.1.6. SVM-SMOTE

Support Vector Machine-Synthetic Minority Oversampling Technique (SVM-SMOTE) was developed by Nguyen et al. to perform oversampling on the borderline samples of the minority class. It uses SVM to find the support vectors that can approximate the borderline region of the dataset. The authors observed that the learned decision boundary between majority and minority classes tends to be skewed towards the minority class due to two reasons. First, minority samples lie far away from the ideal decision boundary, and second, SVM is biased towards the majority class in the regions where majority and minority class samples overlap. To handle both these issues, SVM-SMOTE oversamples the minority class using extrapolation as well as interpolation.

Extrapolation is used to expand the boundary of the minority class by introducing new samples in selected boundary regions. This causes the learned decision boundary to be close to the ideal one. Moreover, extrapolation tends to be better than the interpolation in SMOTE because expansion occurs from inside to outside, i.e., from the neighbour minority instance to the borderline minority instance being considered. To resolve the second issue, interpolation is performed by SVM-SMOTE on those minority class samples which overlap the majority class, i.e., the minority samples that have many majority class samples surrounding them [14,15].

2.1.7. Borderline-SMOTE

To perform accurate predictions, classifiers must efficiently learn the boundary or the borderline of each class in the sample space. This is because the samples on the borderline, or close to it, are important for generalization, but they are more prone to misclassification. To learn the borderline for minority classes of an imbalanced dataset, the Borderline-SMOTE algorithm was developed. This technique first identifies the borderline minority samples and then generates synthetic minority samples which are then added to the dataset. For each minority sample m, its k nearest neighbours are calculated from the entire dataset. If the number of m’s majority nearest neighbours exceeds its minority neighbours, then sample m is a borderline sample of the minority class, which can be easily misclassified. For each borderline sample, the algorithm computes its k nearest neighbours from the minority class. Out of these, s neighbours are randomly selected, where s is an integer between 1 and k. The algorithm then calculates the difference between the feature vectors of m and its s neighbours, which is further multiplied by a random number between 0 and 1. In this way, s new minority samples are created between sample m and its nearest neighbours [16].

The next sub-section presents an overview of research works used for developing intrusion detection systems in the recent past.

2.2. Literature review

In literature, several contributions have been made by researchers
Fig. 1. The working of the proposed LIO-IDS.
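Layer 1 of the system shown in Fig. 1 relies on the LSTM classifier of Section 2.1.1. The following is a minimal sketch of one LSTM time step that follows Eqs. (1)–(6) term by term. It is only an illustration in plain Python: the function names (`sigmoid`, `affine`, `lstm_step`), the `params` dictionary layout, and the toy weights are assumptions made here, not the implementation used for LIO-IDS.

```python
import math


def sigmoid(z):
    """Logistic function used by the input, forget and output gates."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(w, v, b):
    """Return w . v + b for a weight matrix w (list of rows) and bias vector b."""
    return [sum(w_jk * v_k for w_jk, v_k in zip(row, v)) + b_j
            for row, b_j in zip(w, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Eqs. (1)-(6).

    params maps the keys "i", "C", "f", "o" to (weights, bias) pairs
    (hypothetical layout); every weight matrix acts on [h_{t-1}, x_t].
    """
    v = list(h_prev) + list(x_t)                                  # [h_{t-1}, x_t]
    i_t = [sigmoid(z) for z in affine(*params["i"][:1], v, params["i"][1])]      # Eq. (1)
    c_tilde = [math.tanh(z) for z in affine(params["C"][0], v, params["C"][1])]  # Eq. (2)
    f_t = [sigmoid(z) for z in affine(params["f"][0], v, params["f"][1])]        # Eq. (3)
    o_t = [sigmoid(z) for z in affine(params["o"][0], v, params["o"][1])]        # Eq. (4)
    # Eq. (5): keep part of the old cell state, add part of the candidate state.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # Eq. (6): the output gate scales the squashed cell state.
    h_t = [math.tanh(c) * o for c, o in zip(c_t, o_t)]
    return h_t, c_t


if __name__ == "__main__":
    # Toy cell with one hidden unit and one input feature; all weights and
    # biases are zero, so every gate evaluates to sigmoid(0) = 0.5.
    zero = ([[0.0, 0.0]], [0.0])
    toy_params = {"i": zero, "C": zero, "f": zero, "o": zero}
    h, c = lstm_step([1.0], [0.0], [2.0], toy_params)
    print(h, c)
```

With all-zero parameters the forget gate halves the previous cell state and the candidate state contributes nothing, which makes the effect of Eq. (5) easy to check by hand.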
for developing effective IDSs. Authors have used Machine Learning (ML), Deep Learning (DL), and ensemble classifiers for improving the detection rate of these systems. Masdari et al. [17] presented a comprehensive review of fuzzy S-IDSs that were developed using different ML and Data Mining (DM) techniques. The authors also discussed various fuzzy membership functions and fuzzy logic controllers in their paper. Aljbali et al. [18] developed an IDS using LSTM and tested it on the UNSW dataset. The authors compared the performance of their proposed model with SVM and RF using Accuracy, Precision, and F-score calculated for binary classification. Althubiti et al. [19] implemented an IDS using LSTM on the CIDDS dataset. The authors computed the overall Recall, Precision, False Positive Rate, and Accuracy for their model while comparing the performance of LSTM with SVM, Naïve Bayes (NB), and Multi-Layer Perceptron (MLP).

Chuan-long et al. [20] proposed an IDS using RNN. The authors performed binary classification as well as multi-class classification using their proposed RNN-IDS. Their model achieved an Accuracy of 83% for binary classification and an Accuracy of 81% for multi-class classification on the NSL-KDD dataset. Dhaliwal et al. [21] presented an IDS based on eXtreme Gradient Boosting (XGBoost) algorithm. The authors utilized the NSL-KDD dataset to assess the performance of their proposed system. The authors also evaluated the importance of different features present in the dataset in their work. Tang et al. [22] developed an IDS using DNN for securing the Software-Defined Networking (SDN) environment. Their system achieved an Accuracy of 75% on the NSL-KDD dataset in the case of binary classification. Ferrag et al. [5] presented a comparative study of DL techniques used for intrusion detection. The authors also surveyed various intrusion detection datasets and studied the performance of DL approaches on different datasets.

Vinayakumar et al. [23] utilized DNN to develop a flexible and efficient IDS. The authors tested different configurations of DNN to obtain optimal results. The authors used different host-based and network-based intrusion datasets to test their proposed system. The DNN with five layers achieved the highest Accuracy of 78.9% on the NSL-KDD dataset. The authors of [24] used two DL algorithms, namely Convolutional Neural Network (CNN) and LSTM, for feature extraction and network traffic classification. CNN was used to extract spatial features, while LSTM was used to find temporal features. Besides, the authors also performed weight optimization on the training dataset to handle the class imbalance issue. Along similar lines, Wang et al. [25] performed hierarchical extraction of spatial and temporal features using a combination of deep CNN and LSTM. They evaluated their technique using DARPA and ISCX datasets by calculating Accuracy, Detection Rate, and False Alarm Rate. Zhou et al. [26] developed an IDS using an ensemble of RF, Forest PA, and C4.5 DT classifiers. The predictions from these classifiers were combined by a voting method based on the average of probabilities. The authors also performed a correlation-based feature selection using the Bat algorithm.

Salo et al. [27] proposed a NIDS that utilized an ensemble classifier with a dimensionality reduction technique. Information Gain and Principal Component Analysis were used to reduce the number of features in NSL-KDD, Kyoto, and ISCX 2012 datasets, while SVM, MLP, and instance-based learning algorithms were combined to create an ensemble classifier. In [28], the authors proposed a hybrid approach for performing intrusion detection. In their work, feature extraction was performed using the Artificial Bee Colony algorithm, while classification was performed using the AdaBoost algorithm on NSL-KDD and ISCX IDS 2017 datasets. Al-Obeidat et al. [29] presented a hybrid ML method to classify network traffic using multicriteria fuzzy DTs and feature selection. The authors compared their proposed system with NB, SVM, and K-Nearest Neighbor (KNN) using multiple datasets. Kaur [30] evaluated different ensemble techniques in a distributed environment using Apache Spark. The author used two clustering techniques to reduce the training dataset size. The reduced dataset was then used as an input to RF for classifying network traffic.

Su et al. [31] proposed an anomaly-based intrusion detection model named the BAT-MC model. The authors utilized Bidirectional LSTM and attention mechanism to address the problem of low Accuracy in IDSs. The authors also used several convolutional layers to capture the local features of the input traffic. Chaabouni et al. [32] presented a survey of various intrusion detection techniques and threats for Internet-of-Things (IoT) networks. The authors also focused on several network sniffing tools for the development of N-IDSs. A survey of IDSs for Cyber-Physical Systems (CPSs) was presented by Mitchell et al. [33]. The authors classified modern IDSs developed for CPSs based on two dimensions, namely audit material and detection technique. The authors also identified several research gaps and future research directions in their paper. CPSs like the smart grids were studied in detail in [34]. The authors performed a comparative analysis of different Intrusion Detection and Prevention Systems developed in the literature for securing smart grids. They also highlighted several shortcomings in existing works and provided recommendations for future research work.

Hindy et al. [35] presented a survey of different intrusion detection datasets that have been used in literature for the development of IDSs. The strengths and shortcomings of each dataset were highlighted, and their impact on the performance of IDSs was studied. The authors also provided a taxonomy of network attacks and the tools that can be used to perform those attacks. Buczak et al. [9] also conducted a literature review of different DM and ML techniques for developing S-IDSs and A-IDSs. The authors also discussed different types of intrusion detection datasets and various evaluation metrics to assess the performance of IDSs. On similar lines, Aldweesh et al. [7] presented a taxonomy of DL-based IDSs based on different criteria, namely input data strategy, detection strategy, deployment strategy, and evaluation strategy. The
Fig. 2. Algorithm for the testing phase of proposed LIO-IDS.
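The testing phase in Fig. 2 is where the I-OVO technique is stated to save work: traditional OVO consults ${}^{N}C_{2}$ binary classifiers per sample, whereas I-OVO trains $2 + m \times n$ classifiers and consults only three per attack sample. The sketch below simply evaluates those stated counts. The function names are hypothetical, and the example splits (two majority and two minority attack classes, or three and three) are an assumption read off the classifier pairs listed in Table 2.

```python
from math import comb


def ovo_classifiers(num_classes: int) -> int:
    """Binary classifiers used by traditional OVO: one per pair of classes."""
    return comb(num_classes, 2)


def iovo_training_classifiers(num_majority: int, num_minority: int) -> int:
    """Classifiers trained by I-OVO as stated in the text: 2 + m * n."""
    return 2 + num_majority * num_minority


# Stated in the text: I-OVO consults three classifiers per attack sample.
IOVO_CLASSIFIERS_PER_TEST_SAMPLE = 3


def compare(num_majority: int, num_minority: int) -> dict:
    """Classifier counts for OVO and I-OVO with N = m + n attack classes."""
    num_classes = num_majority + num_minority
    return {
        "ovo_train": ovo_classifiers(num_classes),
        "ovo_test_per_sample": ovo_classifiers(num_classes),
        "iovo_train": iovo_training_classifiers(num_majority, num_minority),
        "iovo_test_per_sample": IOVO_CLASSIFIERS_PER_TEST_SAMPLE,
    }


if __name__ == "__main__":
    # Assumed split: two majority and two minority attack classes.
    print(compare(2, 2))
    # Assumed split: three majority and three minority attack classes.
    print(compare(3, 3))
```

Under these assumed splits, four attack classes need six trained classifiers in both schemes, so the saving comes entirely from testing (three classifiers instead of six). With six attack classes, I-OVO also trains fewer classifiers (11 instead of 15).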
authors also highlighted various challenges and outlined several future research directions for developing more accurate and efficient IDSs using DL algorithms. Vasilomanolakis et al. [1] presented a detailed taxonomy of CIDSs. The authors described different architectures for CIDSs, various characteristics of an effective CIDS, and discussed the attacks that can target a CIDS.

Many solutions have also been proposed to handle the class imbalance problem of network intrusion detection. Bedi et al. [36] used the Siamese Neural Network (SNN) to develop an IDS, named Siam-IDS, for handling the class imbalance problem. The authors evaluated their system’s performance using multi-class classification on the NSL-KDD dataset and compared it with IDSs built using DNN and CNN. Though Siam-IDS achieved acceptable Recall values for the minority classes, its Precision values were low. An improvement to Siam-IDS was presented in [37], in which the authors proposed a two-layer ensemble model, named I-SiamIDS, to alleviate the class imbalance problem using the NSL-KDD and CIDDS-001 datasets. I-SiamIDS utilized binary XGBoost, DNN, and SNN to identify attacks at the first layer and multi-class XGBoost at the second layer to classify the attacks in different classes.

Zhang et al. [38] proposed a flow-based IDS that utilized the Gaussian Mixture Model (GMM) and SMOTE to handle the class imbalance problem in network data. The authors also used one-dimensional CNN and tested their proposed model on UNSW-NB15 and CICIDS2017 datasets. Gonzalez-Cuautle et al. [39] proposed a new method for sampling an imbalanced dataset. In their work, the authors combined the SMOTE algorithm with the grid search technique to optimize hyperparameter tuning. An ensemble-based pre-processing method was presented in [40] to alleviate the imbalance problem encountered during multi-class classification. In that technique, the minority classes were first oversampled using normalized probability, and then stacked generalization was used for identifying the different classes of the dataset. Mikhail et al. [41] developed a NIDS using the one-vs-all technique. Their system consisted of multiple sub-ensembles that were nested together to perform intrusion detection. Class weights were also assigned to the sub-ensemble of each class, and this weight was based on the true positive rate metric.

Alabdallah et al. [42] handled the class imbalance of NIDSs by combining a cost function with weighted SVM. The authors performed a One-vs-One (OVO) classification using a Gaussian radial basis kernel in C-SVM. Further, stratified sampling was applied on a 50% NSL-KDD dataset to test their proposed system. The traditional OVO technique transforms a multi-class classification problem into multiple binary classification problems. It requires ${}^{N}C_{2}$ binary classifiers for classifying each test sample, where N is the number of classes. As the number of classes increases, more binary classifiers need to be trained and tested. This is a major drawback of the OVO technique. To improve this technique, several strategies have been proposed in the literature. Zhang et al. [43] proposed the Dynamic Ensemble Selection procedure to enhance the OVO technique by dynamically selecting multiple ensemble classifiers for each sub-problem. In their modified OVO technique, the
authors combined predictions from the selected ensembles using the majority voting approach.
Liu et al. [44] proposed an improved OVO strategy in which the relative weight of a binary classifier was selected using KNN and the class centers of the training subset. The authors utilized their improved OVO method for performing multi-class sentiment analysis. Galar et al. [45] developed a distance-based strategy to combine the predictions of various binary classifiers. The authors decided on the importance of classifiers by measuring the closeness of the test sample with each class. Our paper proposes the Improved OVO (I-OVO) technique that consists of three types of classifiers: C1, C2, and C3. Classifier C1 is trained to distinguish majority classes, C2 is trained to distinguish minority classes, while C3 distinguishes between a majority and a minority class. The proposed I-OVO technique dynamically selects classifier C3 from a set of pre-trained binary classifiers based on the outputs obtained from C1 and C2. The proposed I-OVO technique reduces the number of classifiers for both testing and training, reducing the overall computational time.
Our paper uses the proposed I-OVO technique for developing LIO-IDS, which is a two-layer A-NIDS for the identification and classification of network intrusions. At Layer 1, the proposed LIO-IDS performs binary classification using the LSTM classifier to separate attack samples from normal network traffic. The identified attacks are sent to Layer 2 for classification into respective attack classes. Layer 2 uses the proposed I-OVO technique to perform multi-class classification using RF and balanced Bagging (henceforth referred to as Bagging) ensemble classifiers. Also, the proposed LIO-IDS utilizes oversampling techniques viz. ROS, Borderline-SMOTE (BSMOTE), and SVM-SMOTE (SSMOTE) to handle the class imbalance problem and correctly identify both majority and minority attacks in network traffic. The performance of the proposed LIO-IDS has been evaluated using Recall, Precision, F1-score, Accuracy, Receiver Operating Characteristic (ROC) curve, Area Under the ROC Curve (AUC), training time, and average testing time. The working of the proposed system has been described in the following section.

Table 1
Hyperparameters for LSTM, classifier C1, and classifier C2 used in the proposed LIO-IDS.
Classifier      Hyperparameter      NSL-KDD dataset   CIDDS-001 dataset   CICIDS2017 dataset
LSTM            Layers              1                 4                   4
LSTM            Neurons/layer       8                 4                   16
LSTM            Learning Rate       0.01              0.005               0.001
LSTM            Batch Size          128               32                  1024
LSTM            Epochs              60                20                  100
Classifier C1   No. of Estimators   99                5                   40
Classifier C1   Random State        86                48                  0
Classifier C2   No. of Estimators   60                60                  22
Classifier C2   Random State        86                6                   0

Table 2
Hyperparameters for binary classifiers used in Layer 2 of the proposed LIO-IDS.
Dataset      Binary Classifiers (C3)   No. of Estimators   Random State
NSL-KDD      DoS-R2L                   5                   0
NSL-KDD      DoS-U2R                   16                  70
NSL-KDD      Probe-R2L                 29                  77
NSL-KDD      Probe-U2R                 67                  86
CIDDS-001    DoS-Ping Scan             28                  0
CIDDS-001    DoS-Brute Force           10                  45
CIDDS-001    Port Scan-Ping Scan       4                   89
CIDDS-001    Port Scan-Brute Force     2                   75
CICIDS2017   DoS-Web Attack            40                  0
CICIDS2017   DoS-Brute Force           10                  0
CICIDS2017   DoS-Infiltration          80                  0
CICIDS2017   Port-Web Attack           60                  0
CICIDS2017   Port-Brute Force          10                  0
CICIDS2017   Port-Infiltration         15                  0
CICIDS2017   Patator-Web Attack        10                  0
CICIDS2017   Patator-Brute Force       10                  0
CICIDS2017   Patator-Infiltration      10                  0
The next section presents the details of the experiments that were performed for developing the proposed LIO-IDS.

Table 3
Description of NSL-KDD dataset.
Class         Training Samples   Training %   Testing Samples   Testing %
Normal        67,343             53.45        9711              43.07
DoS           45,927             36.45        7458              33.08
Probe         11,656             9.25         2421              10.73
R2L           995                0.007        2887              12.80
U2R           52                 0.0004       67                0.002

Table 4
Description of CIDDS-001 dataset.
Class         Training Samples   Training %   Testing Samples   Testing %
Normal        53,000             53.17        15,000            56.77
DoS           36,000             36.12        6604              24.99
Port Scan     9117               9.15         3250              12.30
Ping Scan     500                0.50         765               2.90
Brute Force   1055               1.06         803               3.04

3. The proposed LIO-IDS

The proposed LIO-IDS handles the class imbalance problem of intrusion detection using LSTM and the Improved One-vs-One technique. LIO-IDS is an Anomaly-based NIDS consisting of two layers: Layer 1 and Layer 2. The first layer separates network attacks from normal network data using the LSTM classifier. The intrusions identified by Layer 1 are sent to the second layer. Layer 2 performs multi-class classification to categorize these attacks into their respective attack classes using I-OVO and ensemble classifiers. Thus, the proposed LIO-IDS can identify and classify attacks present in network traffic. It must be noted that attack classification performed by Layer 2 is as crucial as attack identification performed by Layer 1. This is because different defence mechanisms are required to handle different kinds of attacks, and an appropriate defence technique can be selected only when the exact class of the attack is known. Fig. 1 shows the working of the proposed LIO-IDS. The detailed working of the proposed system has been discussed below.
In the training phase of LIO-IDS, the training dataset is pre-processed to quantize and normalize the data. The pre-processed training dataset is then used to train different classifiers used by the proposed LIO-IDS. For Layer 1, the LSTM classifier is trained on the pre-processed dataset by assigning binary labels (Normal and Attack) to the training samples. For Layer 2, three types of classifiers, namely C1, C2, and C3, are trained to perform multi-class classification of the attacks identified by Layer 1. To train classifier C1, only the majority attack classes of the pre-processed training dataset are selected, and (m-1) unique labels are assigned to the samples of these (m-1) majority attack classes. If m is the total number of majority classes (including the Normal class), then there are m-1 majority attack classes. To train classifier C2, only minority attack classes are used for training, and n unique class labels are assigned to the
samples of these n minority classes. Further, ((m-1)*n) binary classifiers are trained using subsets of the pre-processed training dataset, such that each subset contains one majority attack class and one minority attack class. One of these pre-trained binary classifiers is selected as classifier C3 in the testing phase.
In the testing phase, each test sample t is first passed through Layer 1. If the first layer classifies t as benign, then t does not undergo further assessment by the proposed system. On the contrary, if Layer 1 identifies t as an attack, t is passed to Layer 2 of the proposed LIO-IDS. In the second layer, t is classified using the proposed I-OVO technique that utilizes different ensemble classifiers. In this technique, t is first given as an input to two classifiers, C1 and C2, simultaneously. These classifiers are trained on mutually exclusive attack classes of the dataset: C1 is trained only on the majority attack classes, while C2 is trained only on the minority attack classes of the dataset. C1 analyses sample t and outputs the majority attack class that sample t resembles the most. Similarly, C2 outputs the minority attack class that t resembles the most. Since sample t can only belong to one attack class, a third classifier, C3, is used by the I-OVO technique to obtain the final prediction for sample t. Fig. 2 presents the algorithm for the testing phase of the proposed LIO-IDS.
Classifier C3 is selected from a set of pre-trained binary classifiers at the time of testing. In LIO-IDS, each of these pre-trained classifiers performs binary classification between a majority attack class and a minority attack class of the dataset. The pre-trained classifier that was trained on the two attack classes output by C1 and C2 is selected as classifier C3 at the testing time. Finally, C3 outputs the predicted attack class of sample t. The proposed I-OVO method used in Layer 2 of LIO-IDS improves the traditional OVO technique of performing multi-class classification. If a dataset consists of N classes, then the traditional OVO technique tests each input sample using NC2 binary classifiers to obtain one prediction from each classifier. These NC2 predictions are combined using a combining algorithm, such as majority voting, to get the final prediction.
In the traditional OVO technique, as the number of classes increases, the number of classifiers required to classify each sample also increases significantly. This, in turn, increases the testing time per traffic sample. In contrast, the proposed I-OVO technique uses only (2 + m*n) classifiers for training (where m denotes the number of majority classes, and n represents the number of minority classes, such that m + n = N) and only three classifiers for testing. The value (2 + m*n) includes classifiers C1 and C2 and the m*n binary classifiers developed for each pair of majority-minority classes. It must be noted that in the proposed LIO-IDS, the I-OVO technique is applied to the attack classes only, i.e., the normal samples are separated at Layer 1, and only attack samples are classified using I-OVO. This eliminates the need to develop binary classifiers for comparing the normal class with each attack class separately, and LIO-IDS requires only (2+(m-1)*n) classifiers.
The proposed I-OVO technique compares all the majority classes using a single classifier C1, and all minority classes are compared using a single classifier C2. Therefore, the binary classifiers distinguishing two majority classes in the traditional OVO technique need not be trained in the I-OVO technique. Similarly, the binary classifiers that were classifying two minority classes are also not required in the proposed system. Hence, the proposed I-OVO not only reduces the number of classifiers to be used for training ((2 + m*n) ≤ NC2), it also reduces the number of classifiers that are needed for testing (3 ≤ NC2). The same has been proved mathematically and shown in Appendix 1 towards the end of the paper. Moreover, the traditional OVO technique requires an algorithm to combine the predictions obtained from individual NC2 classifiers, whereas I-OVO does not require any such algorithm to get the final prediction. This reduction in classifiers decreases the training and testing times needed by the proposed I-OVO to perform multi-class classification compared to the traditional OVO technique.
To perform binary classification at Layer 1, the proposed LIO-IDS uses a DL-based LSTM classifier. The intrusions identified by LSTM are passed to classifiers C1 and C2 simultaneously. C1 uses the RF algorithm to distinguish the majority attack classes, while C2 uses the combination of SVM-SMOTE and Bagging or Borderline-SMOTE and RF technique to distinguish the minority attack classes of the dataset. To create the pre-trained classifiers in Layer 2, RF, Bagging, and AdaBoost classifiers have been applied depending upon the pair of attack classes to be distinguished. ROS and Borderline-SMOTE techniques have also been used with these classifiers to handle the class imbalance problem of NIDSs. The same has been explained in detail in the next section. It must be noted that the presence of two different classifiers, C1 and C2, in the I-OVO technique used at Layer 2 is essential because some classifiers excel at classifying majority classes, while other classifiers classify minority classes accurately.
The use of separate classifiers for the majority and minority classes increases the Accuracy of the proposed LIO-IDS. The selection of classifiers and data-balancing algorithms for the proposed LIO-IDS has been made after extensive experimentation. Table 1 shows the optimal hyperparameters selected for developing LSTM, classifier C1 and classifier C2 for the NSL-KDD, CIDDS-001, and CICIDS2017 datasets. Similarly, Table 2 depicts the optimal hyperparameters used to create binary classifiers for Layer 2 of the proposed system. Out of all the binary classifiers developed for a dataset, one classifier is dynamically selected as classifier C3 during the testing of each test sample. It can be seen from Table 2 that in the case of the CICIDS2017 dataset, the value of random state was chosen to be 0 for all the binary classifiers. This was because in each case, optimal results were achieved using this value of random state together with an appropriate value of the number of estimators.

Table 5
Description of CICIDS2017 dataset.
Class          Training Samples   Training %   Testing Samples   Testing %
Normal         72,000             52.48        55,000            53.17
DoS            41,799             30.47        30,800            29.78
Port Scan      13,000             9.48         10,000            9.67
Patator        7997               5.83         5838              5.64
Web Attack     1367               1.00         813               0.79
Bot            1000               0.73         966               0.93
Infiltration   20                 0.01         16                0.02

4. Experimental study

The proposed LIO-IDS was developed using an Intel® Core™ i7-8750H processor with Windows 10 operating system. Python programming language was utilized for implementing the proposed system. The following sub-section describes the three datasets that have been used to assess the performance of the proposed LIO-IDS.

4.1. Datasets

This section discusses the three intrusion detection datasets that have been used in this paper for experimentation purposes. This includes NSL-KDD, CIDDS-001, and CICIDS2017 datasets.

4.1.1. NSL-KDD dataset
The NSL-KDD (Network Socket Layer – Knowledge Discovery in Databases) dataset was developed in 2009 as the successor of the KDD 1999 dataset [46]. The NSL-KDD dataset overcame the drawbacks of the KDD dataset by removing several redundant and duplicate samples in training and testing datasets. It was created to maximize prediction difficulty, and this characteristic makes it a preferred choice by researchers even today [47]. NSL-KDD consists of separate training and testing datasets containing network traffic samples represented by 41
Table 6
Detection rates obtained from the binary classification on the NSL-KDD dataset.
Classifiers Classes DoS vs. Probe R2L vs. U2R DoS vs. R2L DoS vs. U2R Probe vs. R2L Probe vs. U2R
DoS Probe R2L U2R DoS R2L DoS U2R Probe R2L Probe U2R
RF 96 90 94 61 92 75 100 73 97 81 100 84
ROS+RF 90 81 98 37 96 46 100 73 100 69 100 82
SMOTE+RF 89 84 96 55 94 65 99 78 99 74 100 81
BSMOTE+RF 91 85 97 61 91 64 100 82 99 76 100 84
SSMOTE+RF 90 85 96 64 92 65 100 75 99 65 100 70
Bagging 82 82 75 67 88 97 82 97 82 88 94 100
ROS + Bagging 74 83 95 37 90 74 96 78 96 90 100 70
SMOTE + Bagging 73 85 84 55 83 75 86 85 99 85 100 70
BSMOTE + Bagging 87 79 79 54 91 72 96 78 98 88 100 70
SSMOTE + Bagging 81 77 98 70 91 71 86 90 91 88 100 72
AdaBoost 98 89 87 33 87 93 91 90 92 69 100 78
Table 7
Detection rates obtained from the binary classification on the CIDDS-001 dataset.
Classifier DoS vs. Port Ping vs. Brute DoS vs. Ping DoS vs. Brute Port vs. Ping Port vs. Brute
Classes DoS Port Ping Brute DoS Ping DoS Brute Port Ping Port Brute
RF 100 99 98 98 100 98 100 98 98 58 98 95

ROS+RF 88 99 98 98 100 98 100 97 97 56 97 92
SMOTE+RF 98 99 98 98 100 98 100 56 96 68 97 93
BSMOTE+RF 100 89 99 97 99 97 100 100 96 66 99 96
SSMOTE+RF 99 98 99 98 100 98 100 97 96 68 98 96
Bagging 99 0 98 99 100 99 92 100 96 69 96 96
ROS + Bagging 98 0 98 99 100 98 100 26 95 81 97 94
SMOTE + Bagging 98 5 98 98 100 98 100 26 96 51 97 96
BSMOTE + Bagging 98 11 98 98 100 98 100 99 97 58 97 78
SSMOTE + Bagging 99 11 99 99 100 99 100 91 97 65 96 95
AdaBoost 98 100 0 2 100 98 100 96 99 19 4 99
Table 8
Detection rates obtained for classifiers C1 and C2 on the CICIDS2017 dataset.
Classifiers        DPP                       WBI
                   DoS    Port   Patator     Web Attack   Bot    Infiltration
RF                 1      1      1           1            0.49   1
ROS+RF             1      1      1           1            0.49   1
SMOTE+RF           1      1      0.98        1            0.49   1
BSMOTE+RF          1      1      1           1            1      1
SSMOTE+RF          1      1      1           1            0.49   1
Bagging            1      1      1           1            0.49   1
ROS + Bagging      0.74   1      1           1            1      1
SMOTE + Bagging    0.74   1      0.92        1            1      1
BSMOTE + Bagging   0.74   1      1           1            1      1
SSMOTE + Bagging   0.74   1      0.91        1            1      1
AdaBoost           0.5    0.99   0.96        1            1      1

attributes. Each instance has a label corresponding to the normal class or one of the 22 attack types. These attack types are grouped into four major attack classes, namely Denial of Service (DoS), Probe, Remote to Local (R2L), and User to Root (U2R). Table 3 shows the number of samples present in various classes of the NSL-KDD dataset. The uneven distribution of samples in different classes of this dataset makes it an appropriate choice for testing the proposed LIO-IDS.

4.1.2. CIDDS-001 dataset
The CIDDS-001 (Coburg Intrusion Detection Data Set) dataset was developed in 2017 by Ring et al. [48]. It is a unidirectional NetFlow dataset generated by emulating a business environment using an OpenStack virtual environment and an External Server connected to the Internet. The training and testing datasets of CIDDS-001 consist of traffic samples represented by 11 attributes. Each sample is assigned one of the five class labels: Normal, Denial of Service (DoS), Port Scan, Ping Scan, and Brute Force. The four attack classes were formed by combining 70 sub-attack types present in the dataset. Since the original dataset consists of more than 30 million flows captured in the OpenStack environment, this paper utilizes only a subset of the original CIDDS-001 dataset. The selected subset reflects the imbalanced nature of the original CIDDS-001 and includes all the 70 sub-attack categories captured in the OpenStack environment. Table 4 depicts the number of samples present in various classes of the CIDDS-001 dataset. Since this dataset is one of the newest intrusion detection datasets with significant disproportion in its classes, it is an appropriate choice for testing the proposed LIO-IDS.

4.1.3. CICIDS2017 dataset
The CICIDS2017 dataset was developed by Sharafaldin et al. [49] by generating and capturing network traffic for a duration of five days. The dataset consists of normal traffic samples and traffic samples generated from fourteen different types of attacks. The authors utilized the B-profile system to imitate benign human activities on the web and generate normal traffic from HTTP, HTTPS, FTP, and SSH protocols. Different categories of attacks were generated using various tools available on the Internet. The original CICIDS2017 dataset consists of eight CSV files containing 2,273,097 normal samples and 557,646 attack samples. Each traffic sample consists of 80 features that were captured using the CICFlowMeter tool. Due to the huge size of the original dataset, a subset of the CICIDS2017 dataset was selected for experimentation in this paper. The details of the selected subsets have been shown in Table 5.
To create the training and testing subsets of the CICIDS2017 dataset, the attack samples were grouped into six main attack classes: DoS, Port
Scan, Patator, Web Attack, Bot, and Infiltration. The DoS attack class consists of DoS Hulk, DoS Goldeneye, DoS Slowloris, DoS Slowhttptest, DDoS, and Heartbleed attacks; the Patator class comprises FTP-Patator and SSH-Patator attacks, while the Web Attack class consists of WA-Brute Force, WA-SQL Injection, and WA-XSS attacks. These subsets contain all the samples for the Bot, Infiltration, Patator, Web Attacks, and all DoS attacks except DDoS and DoS Hulk. As shown in Table 5, the distribution of the CICIDS2017 dataset reflects the imbalanced nature of network traffic and makes it suitable for evaluating the proposed LIO-IDS.
In any dataset, the number of samples belonging to each class is referred to as the class distribution of that dataset. When the distribution of different dataset classes is neither equal nor close to equal, then such datasets are said to be imbalanced datasets. In an imbalanced dataset, a few classes occupy a large proportion of the dataset (known as majority classes), while the remaining classes occupy a very small proportion of the dataset (known as minority classes). Class imbalance can vary from a slight imbalance to a severe imbalance when there is one sample in the minority class for hundreds, thousands, or millions of samples in the majority class [50–52]. The degree of imbalance can be measured using the class imbalance ratio, which is defined as the number of majority class samples divided by the number of minority class samples.
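As a worked illustration of this definition, assuming the Normal class is taken as the majority class and using the NSL-KDD training counts listed in Table 3, the ratios for the U2R and R2L classes would be roughly as follows. The symbol IR is introduced here only for illustration, and the ratios reported in Appendix 2 may have been computed on a different basis.

$$
\mathrm{IR}_{\mathrm{U2R}} = \frac{67{,}343}{52} \approx 1295, \qquad \mathrm{IR}_{\mathrm{R2L}} = \frac{67{,}343}{995} \approx 67.7
$$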

Real-world network traffic and intrusion detection datasets exhibit a severe imbalance where certain attack classes have a class imbalance ratio of 50:1, 100:1, and even 1000:1 (considering the Normal class as the majority class). Such attack classes are designated as minority attack classes of the dataset. For all the classes of the datasets used in this paper, the class imbalance ratios have been shown in Table 14, Table 15, and Table 16 in Appendix 2. The attack classes that have a class imbalance ratio of 50:1 or more are considered to be minority classes in the intrusion detection datasets used in this paper. These are the R2L and U2R classes in the NSL-KDD dataset; the Ping Scan and Brute Force classes in the CIDDS-001 dataset; and the Web Attack, Bot, and Infiltration classes in the CICIDS2017 dataset. The remaining classes are considered to be the majority classes of the dataset.
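The 50:1 rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the ratios are computed on the training counts of Tables 3 to 5 with the Normal class as the reference, and the names `TRAIN_COUNTS`, `imbalance_ratios`, and `minority_classes` are hypothetical.

```python
# Training-set sample counts copied from Tables 3 to 5 of the paper.
TRAIN_COUNTS = {
    "NSL-KDD": {"Normal": 67343, "DoS": 45927, "Probe": 11656,
                "R2L": 995, "U2R": 52},
    "CIDDS-001": {"Normal": 53000, "DoS": 36000, "Port Scan": 9117,
                  "Ping Scan": 500, "Brute Force": 1055},
    "CICIDS2017": {"Normal": 72000, "DoS": 41799, "Port Scan": 13000,
                   "Patator": 7997, "Web Attack": 1367, "Bot": 1000,
                   "Infiltration": 20},
}


def imbalance_ratios(counts, reference="Normal"):
    """Return reference-class count divided by each attack-class count."""
    ref = counts[reference]
    return {cls: ref / n for cls, n in counts.items() if cls != reference}


def minority_classes(counts, threshold=50.0, reference="Normal"):
    """Attack classes whose imbalance ratio is at least `threshold`:1."""
    ratios = imbalance_ratios(counts, reference)
    return sorted(cls for cls, ratio in ratios.items() if ratio >= threshold)


for name, counts in TRAIN_COUNTS.items():
    print(name, minority_classes(counts))
```

Under these assumptions the sketch selects the same minority classes that the text lists for each dataset.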
The intrusion detection datasets selected in this paper consist of categorical as well as numerical attribute values. To bring these values into a uniform format, dataset pre-processing was performed on all of them. This process has been explained in the following sub-section.
Table 9
Detection rates obtained from binary classification on the CICIDS2017 dataset.
Class pairs (columns): DoS vs. Web Attack (DW), DoS vs. Bot (DB), DoS vs. Infiltration (DI), Port vs. Web Attack (PoW), Port vs. Bot (PoB), Port vs. Infiltration (PoI), Patator vs. Web Attack (PaW), Patator vs. Bot (PaB), Patator vs. Infiltration (PaI).
Classifiers (rows): RF, ROS+RF, SMOTE+RF, BSMOTE+RF, SSMOTE+RF, Bagging, ROS + Bagging, SMOTE + Bagging, BSMOTE + Bagging, SSMOTE + Bagging, AdaBoost.

4.2. Dataset pre-processing

In this paper, the intrusion detection datasets are pre-processed in two steps: Quantization and Normalization. Quantization converts categorical values into numerical values by assigning a unique number to each category of the attribute. The NSL-KDD dataset consists of three categorical attributes (protocol, service, and flag), while the CIDDS-001 dataset consists of five categorical attributes (Date first seen, Proto, Src IP Addr, Dst IP Addr, and Flags). All categorical attributes and class labels of the two datasets were quantized by assigning a unique number to each category. The CICIDS2017 dataset does not contain any categorical values, and hence, only the class labels were quantized for this dataset. Quantization is important because ML algorithms cannot directly process nominal features. The cat.codes accessor of Python's pandas library was used to perform this conversion.
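The quantization step can be illustrated with a small, dependency-free sketch. It only imitates the effect of category codes, numbering categories in sorted order as pandas does for categories it infers; the function name `quantize` and the toy protocol values are assumptions made for this example and are not taken from the datasets.

```python
def quantize(values):
    """Assign an integer code to each distinct category.

    Codes follow the sorted order of the categories, which mirrors how
    pandas numbers the categories it infers before exposing `cat.codes`.
    """
    codes = {category: index
             for index, category in enumerate(sorted(set(values)))}
    return [codes[value] for value in values]


# Toy values for a protocol-like attribute (illustrative, not dataset rows).
protocol = ["tcp", "udp", "icmp", "tcp"]
protocol_codes = quantize(protocol)
print(protocol_codes)  # [1, 2, 0, 1]
```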
After quantization, all the numerical attributes were normalized to bring their values into a uniform range [0,1]. Normalization was performed using the Normalizer() function of Python's sklearn library. It is important to perform normalization so that the learning process of Machine Learning algorithms is not affected by the range of values in different dataset attributes. After performing dataset pre-processing, the next step was to select appropriate algorithms for each layer of the proposed LIO-IDS. All the features of the three datasets were used to train and test the algorithms for the proposed system. The details of training and testing have been discussed in the following sub-sections.
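To make the target range [0,1] concrete, the sketch below rescales one attribute with per-attribute min-max scaling. This is an assumption chosen for illustration only: the paper names sklearn's Normalizer(), which by default rescales each sample to unit norm and is therefore a different operation, and the exact configuration used by the authors is not given. The name `min_max_scale` and the toy values are hypothetical.

```python
def min_max_scale(column):
    """Rescale one attribute so its smallest value maps to 0 and largest to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant attribute carries no range information; map it to 0.
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


# Toy values for a duration-like attribute (illustrative, not dataset rows).
duration = [0, 5, 20, 10]
scaled_duration = min_max_scale(duration)
print(scaled_duration)  # [0.0, 0.25, 1.0, 0.5]
```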
Fig. 3. (a): Accuracy and Attack Detection Rate on NSL-KDD dataset; (b): Accuracy and Attack Detection Rate on CIDDS-001 dataset; (c): Accuracy and Attack
Detection Rate on CICIDS2017 dataset.
4.3. Layer 1 of the proposed LIO-IDS

To perform binary classification between the normal and attack classes, the LSTM classifier was used at Layer 1 of the proposed system. LSTM is a DL algorithm that contains different hyperparameters such as the number of layers, number of neurons in each layer, learning rate, batch size, and number of epochs. Selecting the optimal values of these hyperparameters is crucial for improving the performance of the LSTM classifier. Hence, different configurations of LSTM were tested by selecting the number of layers as 1, 2, 3, 4, and 5. The number of neurons and the batch size were chosen to be 8, 16, 32, 64, 128, 256, 512, and 1024. The value of learning rate was selected as 0.0001, 0.001, 0.01, 0.1, and 0.5. Besides, the number of epochs was varied between 10 and 1000. The optimal hyperparameters for LSTM on each of the three datasets have been listed in Table 1 in Section 3.

4.4. Layer 2 of the proposed LIO-IDS

The second layer of the proposed system consists of classifiers C1, C2 and a set of pre-trained classifiers for selecting classifier C3. For the NSL-KDD dataset, classifier C1 distinguishes majority attack classes DoS and Probe, while C2 distinguishes minority attack classes R2L and U2R. The set of pre-trained classifiers for this dataset consists of four binary classifiers distinguishing DoS-R2L, DoS-U2R, Probe-R2L, and Probe-U2R attack classes. In the case of the CIDDS-001 dataset, C1 distinguishes DoS and Port Scan attacks, while C2 distinguishes Ping Scan and Brute Force attack classes. The pre-trained binary classifiers differentiate between DoS-Ping Scan, DoS-Brute Force, Port Scan-Ping Scan, and Port Scan-Brute Force attack pairs. For the CICIDS2017 dataset, C1 distinguishes three majority classes, namely DoS, Port Scan, and Patator attacks, while C2 classifies the minority attacks viz. Bot, Web Attack, and Infiltration. The set of binary classifiers for the CICIDS2017 dataset consists of nine classifiers distinguishing DoS-Web Attack, DoS-Bot, DoS-Infiltration, Port Scan-Web Attack, Port Scan-Bot, Port Scan-Infiltration, Patator-Web Attack, Patator-Bot, and Patator-Infiltration attack classes. To select appropriate algorithms for the binary classifiers of all the three datasets, several combinations of ensemble algorithms and data-balancing techniques were tested. Each of these combinations has been implemented using the sklearn and imblearn libraries of the Python programming language. The detection rates obtained by different combinations for NSL-KDD, CIDDS-001, and CICIDS2017 datasets have been shown in Tables 6–9, respectively. In these tables, the cells highlighted in dark yellow colour depict the best classification performance achieved out of all the combinations tested for a given set of attack classes.
As shown in Tables 6–9, classifier C1 was developed using the RF classifier for all three datasets. For constructing C2, the Bagging classifier was selected to distinguish the minority attack classes of the NSL-KDD and CIDDS-001 datasets. To enhance the detection rate of the minority classes, these two datasets were oversampled using SVM-SMOTE before training the Bagging classifier. In the case of the CICIDS2017 dataset, classifier C2 was developed using RF and Borderline-SMOTE algorithms.
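The Layer 2 arrangement described above for the NSL-KDD dataset can be sketched as a simple lookup: C1 and C2 each propose one class, and the pair of proposals selects the binary classifier that acts as C3. This is a minimal sketch under stated assumptions, not the authors' implementation: the classifiers are replaced by hand-written toy rules, and every function name and decision threshold below is hypothetical.

```python
from itertools import product
from math import comb

MAJORITY_ATTACKS = ["DoS", "Probe"]  # classes handled by C1 (NSL-KDD)
MINORITY_ATTACKS = ["R2L", "U2R"]    # classes handled by C2 (NSL-KDD)


def c1(sample):
    """Toy stand-in for C1: picks the closest majority attack class."""
    return "DoS" if sample["count"] > 100 else "Probe"


def c2(sample):
    """Toy stand-in for C2: picks the closest minority attack class."""
    return "U2R" if sample["root_shell"] else "R2L"


def make_pair_classifier(majority_cls, minority_cls):
    """Toy stand-in for one pre-trained majority-vs-minority classifier."""
    def classify(sample):
        return minority_cls if sample["logged_in"] else majority_cls
    return classify


# One binary classifier per (majority, minority) pair of attack classes.
C3_BANK = {pair: make_pair_classifier(*pair)
           for pair in product(MAJORITY_ATTACKS, MINORITY_ATTACKS)}


def iovo_predict(sample):
    """Test a sample with exactly three classifiers: C1, C2, and the chosen C3."""
    pair = (c1(sample), c2(sample))
    return C3_BANK[pair](sample)


iovo_total = 2 + len(C3_BANK)  # C1 + C2 + pairwise classifiers
ovo_total = comb(5, 2)         # traditional OVO over all five NSL-KDD classes
print(iovo_total, ovo_total)   # 6 10
print(iovo_predict({"count": 5, "root_shell": True, "logged_in": True}))  # U2R
```

With two majority and two minority attack classes the bank holds four pairwise classifiers, matching the four NSL-KDD pairs listed above, and no voting step is needed because only the selected C3 produces the final label.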
Fig. 4. (a): Recall values on the NSL-KDD dataset; (b): Recall values on the CIDDS-001 dataset; (c): Recall values on the CICIDS2017 dataset.
The third classifier, C3, was selected from a set of pre-trained binary classifiers based on the majority and minority class predicted by classifiers C1 and C2. For the NSL-KDD dataset, the Bagging classifier was selected to distinguish the DoS and R2L attacks, whereas a combination of ROS and Bagging classifier was used to differentiate between Probe and R2L classes. The remaining two pre-trained binary classifiers used to distinguish DoS-U2R and Probe-U2R attacks were developed using the RF after applying the Borderline-SMOTE technique for handling class imbalance.
In the CIDDS-001 dataset, a binary classifier for distinguishing DoS-Ping Scan classes was implemented using the Bagging classifier, while Port Scan-Ping Scan attacks were distinguished using a combination of ROS and Bagging classifier. The pre-trained classifiers for DoS-Brute Force and Port Scan-Brute Force attack classes were developed using the RF after applying the Borderline-SMOTE technique. Similarly, for the CICIDS2017 dataset, the RF algorithm was used to distinguish DoS-Infiltration and Port Scan-Web Attack classes. A combination of RF and ROS was used to develop a binary classifier for classifying DoS-Web Attack classes, while Port Scan and Infiltration attacks were classified using RF and Borderline-SMOTE algorithm. The binary classifier for distinguishing Port Scan and Brute Force attacks was developed using the AdaBoost classifier, while the remaining classifiers were trained using the Bagging algorithm.
The performance of the proposed LIO-IDS was evaluated using different evaluation metrics. These have been described in the following sub-section.

4.5. Evaluation metrics

To evaluate the effectiveness of the proposed LIO-IDS, Accuracy, Recall, Precision, and F1-score were calculated for all the three datasets considered in this paper. Accuracy depicts the proportion of samples that have been classified correctly. The formula for calculating Accuracy is given by Eq. (7).

$$\text{Accuracy} = \frac{TP + TN}{TN + TP + FP + FN} \tag{7}$$

Recall denotes the number of samples of a class that were correctly identified out of all the samples belonging to that class. When Recall is calculated for attack classes of intrusion detection datasets, it is also referred to as the Attack Detection Rate. If class A contains n samples, out of which only m samples were identified correctly, then Recall can be calculated as shown in Eq. (8).

$$\text{Recall} = \frac{m}{n} \tag{8}$$

Precision denotes the number of samples that belong to a class out of all the samples that were predicted as belonging to that class. If $n_{pA}$ represents the number of samples that were predicted as class A samples, and out of these, only $m_{aA}$ samples belong to class A, then the formula for Precision can be written as in Eq. (9).

$$\text{Precision} = \frac{m_{aA}}{n_{pA}} \tag{9}$$

F1-score refers to the harmonic mean of Recall and Precision values. It is an evaluation metric that gives equal weightage to both Recall and
Fig. 5. (a): Precision values on NSL-KDD dataset; (b): Precision values on CIDDS-001 dataset; (c): Precision values on CICIDS2017 dataset.
Precision scores. Its formula is given by Eq. (10).

$$\text{F1-score} = \frac{2}{\dfrac{1}{\text{Recall}} + \dfrac{1}{\text{Precision}}} \tag{10}$$

In addition to the metrics mentioned above, Receiver Operating Characteristic (ROC) curves were plotted, and their corresponding Area Under the ROC Curve (AUC) values were calculated by performing binary classification on the NSL-KDD, CIDDS-001, and CICIDS2017 datasets. The ROC curve is plotted using the False Positive Rate (FPR) on the x-axis and the True Positive Rate (TPR) on the y-axis. In intrusion detection, FPR refers to benign network traffic incorrectly classified as malicious, while TPR is another name for Recall. The AUC value obtained from the ROC curve reflects how accurately a classifier can perform binary classification. Higher values of the AUC depict the high detection ability of the classifier. For any ROC curve, the corresponding AUC value can be calculated by integrating the areas of small vertical trapezoids in the ROC curve. The area for each trapezoid can be obtained using the formula shown in Eq. (11).

$$A = \text{Area of a trapezoid} = \frac{(a + b) \cdot h}{2} \tag{11}$$

where a represents the long base, b represents the short base, and h represents the height of the trapezoid. If a ROC curve can be divided into l non-overlapping trapezoids, then the AUC is given by Eq. (12)

$$\text{Area Under Curve (AUC)} = \sum_{i=1}^{l} A_i \tag{12}$$

where $A_i$ represents the area of trapezoid i. The proposed system's performance was evaluated using the evaluation metrics mentioned above, and the results have been presented in the next section.

5. Results

This section presents the results obtained by comparing the proposed LIO-IDS with state-of-the-art algorithms used for developing NIDSs. These algorithms include DNN, CNN, XGBoost, RF, Siam-IDS [36], and I-SiamIDS [37]. Fig. 3(a), (b), and (c) depict the Accuracy and Attack Detection Rate achieved by LIO-IDS and its counterparts on the NSL-KDD, CIDDS-001, and CICIDS2017, respectively. As it can be observed, the proposed LIO-IDS achieves the highest attack detection rate on all three datasets as compared to other algorithms in consideration. LIO-IDS also achieves the highest Accuracy values for the NSL-KDD and CIDDS-001 dataset and the second-highest Accuracy value for the CICIDS2017 dataset. These results highlight two important points: first, the proposed LIO-IDS accurately identifies attack traffic from benign network traffic, and second, the proposed system detects a much higher number of network attacks than its counterparts. Fig. 4(a-c), 5(a-c), and 6(a-c) show the Recall, Precision, and F1-values achieved by the proposed LIO-IDS and its counterparts on NSL-KDD, CIDDS-001, and CICIDS2017 datasets, respectively.
It can be seen from Fig. 4(a) that the proposed LIO-IDS achieved the highest Recall values for Probe, R2L, and U2R attack classes of the NSL-KDD dataset. It also obtained the third highest Recall value for the DoS attack class as compared to its counterparts. Similarly, for the CIDDS-001 dataset, Fig. 4(b) shows that LIO-IDS achieved the highest Recall values corresponding to DoS and Brute Force attack classes. For Ping Scan and Port Scan attack classes, the proposed system obtained the
Fig. 6. (a): F1-values on NSL-KDD dataset; (b): F1-values on CIDDS-001 dataset; (c): F1-values on CICIDS2017 dataset.
second-highest and the third-highest Recall values, respectively, third-best Precision values, respectively. For the CICIDS2017 dataset,
compared to other algorithms in consideration. From Fig. 4(c), it can be Fig. 5(c) shows that the proposed LIO-IDS attains the best Precision
observed that the graph for all the algorithms, except LIO-IDS, is very values for the Web Attack and the second-highest Precision values for
sparse. This is because most of the ML and DL algorithms used for the Normal class. The Precision values obtained by LIO-IDS for the
developing IDSs can only identify few attack classes with very high Patator and Bot attack classes rank third when compared to their
Recall values, and their performance is extremely low on the remaining counterparts. For the remaining attack classes, the values of Precision
classes. In contrast to its counterparts, the proposed LIO-IDS achieves achieved by LIO-IDS were close behind its counterparts. It must be noted
non-zero Recall values for all the classes of the CICIDS2017 dataset. that Precision is the ratio of the number of samples that truly belong to
Further, the proposed system achieves the highest Recall values for DoS, class A to the total number of samples predicted as belonging to class A.
Port Scan, and Web Attack classes and the second-highest Recall value Therefore, the Precision value can be high even when the denominator
for the Bot attack class. value (i.e., the total number of predictions made for a class) is very low.
These high Recall values depict that the proposed LIO-IDS identifies This is an important reason to consider when assessing the performance
almost all intrusions traversing the network. However, while ensuring of a system in terms of Precision values.
high Attack Detection Rates, some benign traffic samples that seem In terms of F1-values, the proposed LIO-IDS outperformed its coun
suspicious to the proposed system are also categorized as malicious. This terparts for Normal, Probe, and R2L classes of the NSL-KDD dataset as
causes the Recall values attained by LIO-IDS for the Normal class to be depicted in Fig. 6(a). For the DoS and U2R class, the F1-values obtained
slightly lower than its counterparts. Since attack detection is more by LIO-IDS were close to the values obtained by other algorithms in
crucial than a few misclassified benign samples, LIO-IDS is an effective consideration. For the CIDDS-001 dataset, Fig. 6(b) shows that the
NIDS. In terms of Precision, the proposed LIO-IDS achieved the best proposed system achieved the best F1-values for the DoS attack class. In
Precision value for the Normal class of the NSL-KDD dataset as shown in the remaining classes of this dataset, the proposed system ranked second
Fig. 5(a). For the same dataset, the third-highest Precision value was in terms of F1-values compared to its counterparts. In the case of the
obtained for the Probe attack class. For the remaining attack classes, the CICIDS2017 dataset, the proposed system obtained the best F1-score for
Precision values attained by LIO-IDS are close behind the values ach the Web Attack class and the second-highest F1 values for Patator, DoS,
ieved by its counterparts. Port Scan, and Bot attack classes. For the Normal class, LIO-IDS attained
For the CIDDS-001 dataset, the proposed LIO-IDS obtained the the third-highest F1 score as compared to other algorithms in consid
highest Precision values for Normal, Port Scan, and DoS classes, as eration. Fig. 7(a-f) present the ROC curves obtained for the binary
shown in Fig. 5(b). For the Brute Force and Ping Scan minority classes of classification performed by the proposed LIO-IDS and its six
this dataset, the proposed system achieved the second-best and the counterparts.
Fig. 7. (a): ROC curve for normal samples of NSL-KDD dataset; (b): ROC curve for attack samples of NSL-KDD dataset; (c): ROC curve for normal samples of CIDDS-001 dataset; (d): ROC curve for attack samples of CIDDS-001 dataset; (e): ROC curve for normal samples of CICIDS2017 dataset; (f): ROC curve for attack samples of CICIDS2017 dataset.
A classifier is said to be effective if its ROC curve runs along the y-axis up to the point (0,1) and, from there, extends parallel to the x-axis up to the point (1,1), where points are written as (False Positive Rate, True Positive Rate). This is because the point (0,1) represents an ideal situation in which the classifier has the least (0) False Positive Rate and the highest (1) True Positive Rate. As shown in Fig. 7(a-f), the ROC curves corresponding to the proposed LIO-IDS trace a near-ideal path for the three datasets considered in this paper. This highlights the effectiveness of the proposed LIO-IDS in identifying attacks from benign network traffic. This is also verified by the AUC values provided in Table 10. The proposed LIO-IDS achieves the highest AUC value for the Normal class and the third-highest AUC for the attack class of the NSL-KDD dataset. For the CIDDS-001 dataset, the proposed system achieved the highest AUC values for both the Normal class (tied with I-SiamIDS at 0.99) and the attack class of the dataset. Similarly, for the CICIDS2017 dataset, LIO-IDS achieved the highest AUC values for the Normal class and Attack class of the dataset. These results highlight that the proposed LIO-IDS is more effective in identifying intrusions from benign network traffic than all other algorithms in consideration.
In addition to the afore-mentioned evaluation metrics, the efficiency of the proposed LIO-IDS was tested by calculating the training time and
the average time required for testing any traffic sample. Table 11 shows the training times required by the proposed LIO-IDS and its counterparts on the NSL-KDD, CIDDS-001, and CICIDS2017 datasets. It can be seen from Table 11 that the training times vary greatly across different datasets. This is because, for each dataset, a different configuration of the classifier is required to identify different classes of the dataset accurately. It can be seen from Table 11 that the proposed LIO-IDS requires approximately seven minutes (391.13 s) for training on the NSL-KDD dataset, six minutes (345.10 s) for training on the CIDDS-001 dataset, and three minutes (153.25 s) for training on the CICIDS2017 dataset.

Table 10
AUC values for binary classification on NSL-KDD, CIDDS-001, and CICIDS2017 datasets.
Classifier    NSL-KDD            CIDDS-001          CICIDS2017
              Normal   Attack    Normal   Attack    Normal   Attack
DNN           0.85     0.93      0.93     0.91      0.93     0.93
CNN           0.89     0.94      0.93     0.93      0.92     0.92
XGBoost       0.89     0.89      0.93     0.93      0.93     0.93
RF            0.84     0.84      0.97     0.97      0.72     0.72
Siam-IDS      0.80     0.81      0.65     0.65      0.63     0.70
I-SiamIDS     0.81     0.95      0.99     0.93      0.80     0.86
LIO-IDS       0.93     0.93      0.99     0.99      0.94     0.94

Table 11
Training times for LIO-IDS and its counterparts.
Classifier    Training Times (seconds)
              NSL-KDD       CIDDS-001     CICIDS2017
DNN           7345.18       8375.83       309.18
CNN           26,082.16     21,743.61     4511.48
XGB           104.19        272.39        132.11
RF            619.05        30.02         47.64
Siam-IDS      2913.32       3619.86       516.22
I-SiamIDS     11,147.97     19,681.18     2943.86
LIO-IDS       391.13        345.10        153.25

To compute the average testing time for a sample, ten normal samples and ten attack samples were randomly selected from each dataset. The time taken to test each sample was noted for LIO-IDS and all its six counterparts. For each classifier, the testing times of all the normal samples were averaged to obtain the average normal time, and the testing times of all the attack samples were averaged to obtain the average attack time. This process was carried out for all three datasets, and the results are shown in Table 12. It can be seen from Table 12 that the average time required by LIO-IDS for testing any traffic sample is less than the time required by most of its counterparts considered in this paper. Furthermore, the performance of the proposed system was also compared with four recent works that developed NIDS using the NSL-KDD dataset. To ensure a fair comparison, the techniques proposed by the authors in [20–22], and [23] were implemented on the same hardware that was used to develop the proposed system. The Accuracy, training time, and testing time achieved by LIO-IDS and each of these works are shown in Table 13.

Table 12
Average Testing times for LIO-IDS and its counterparts.
Classifier    Average Testing Times (seconds)
              NSL-KDD            CIDDS-001          CICIDS2017
              Normal   Attack    Normal   Attack    Normal   Attack
DNN           0.0020   0.0020    0.0020   0.0020    0.0020   0.0020
CNN           0.0044   0.0034    0.0042   0.0030    0.0032   0.0031
XGB           1.0379   1.0346    0.9635   1.0387    1.0261   1.0618
RF            0.1287   0.1198    0.0020   0.0020    0.0063   0.0063
Siam-IDS      0.0157   0.0170    0.1065   0.1050    0.0069   0.0068
I-SiamIDS     0.4396   0.9551    0.4346   1.0720    1.1092   0.9913
LIO-IDS       0.0040   0.0061    0.0060   0.0076    0.0031   0.0053

Table 13
Comparison of proposed LIO-IDS with other related works.
Work                Accuracy    Training Time (seconds)    Testing Time (seconds)
[20]                0.57        385.85                     0.0072
[21]                0.77        1.3907                     0.0047
[22]                0.43        3053.38                    0.0016
[23]                0.43        113,078.38                 0.0020
Proposed LIO-IDS    0.87        391.13                     0.0061

Table 13 shows that the proposed system achieves the highest Accuracy value compared to all the four related works considered in this paper. Though LIO-IDS achieves the third-best training time, this time is required only once when the proposed system is deployed for the first time. Further, LIO-IDS attains an acceptable testing time per sample which is comparable to other related works. In addition, the performance of the proposed LIO-IDS was also compared with the NIDS developed using LSTM only and the NIDS developed using I-OVO only. The results obtained from the comparison are shown in Fig. 8.
Fig. 8. Accuracy comparison of LIO-IDS with LSTM and I-OVO techniques.
It can be seen from Fig. 8 that the proposed LIO-IDS achieves the highest Accuracy values on all the three datasets compared to the NIDSs developed using LSTM and I-OVO separately. Therefore, it can be seen from the experimentation that the proposed LIO-IDS performs accurate intrusion detection on all three datasets, each of which contains a different number and variety of attack classes. These results highlight that the proposed system not only performs accurate attack identification but does so in a time-efficient manner. Hence, the proposed LIO-IDS is an efficient NIDS that can be deployed in the real world for performing network intrusion detection.

6. Conclusion

In the present digital era, Network-based Intrusion Detection Systems (NIDSs) are extensively used to identify malicious traffic traversing computer networks. Though NIDSs can easily identify frequent attack types, they cannot identify infrequent intrusions accurately. This paper proposed LIO-IDS, which identifies network intrusions and classifies them into different attack classes. The proposed LIO-IDS separates benign and malicious network traffic using a Deep Learning-based Long Short-Term Memory (LSTM) classifier at Layer 1. The second layer further classifies these attacks into different attack classes using Random Forest and Bagging ensembles. To perform accurate multi-class
classification, this paper also proposed the Improved One-vs-One (I-OVO) technique, which is used at Layer 2 of LIO-IDS for the dynamic selection of ensemble classifiers. The proposed LIO-IDS also addresses the class imbalance problem of NIDSs by using Random Oversampling, Borderline-SMOTE, and SVM-SMOTE techniques. The performance of the proposed LIO-IDS was evaluated using Accuracy, Recall, Precision, F1-measure, Receiver Operating Characteristics (ROC) curve, Area Under ROC curve (AUC), training time, and average testing time. The results highlight that the proposed system performs accurate intrusion detection in a time-efficient manner. Thus, the proposed LIO-IDS is suitable for deployment in real-world networks to perform network-based intrusion detection.

CRediT authorship contribution statement

Neha Gupta: Conceptualization, Methodology, Software, Writing – original draft, Writing – review & editing, Data curation, Visualization, Formal analysis, Investigation, Resources, Funding acquisition. Vinita Jindal: Conceptualization, Project administration, Supervision, Writing – review & editing, Resources, Validation. Punam Bedi: Conceptualization, Project administration, Supervision, Writing – review & editing, Resources, Validation.

Declaration of Competing Interest

The authors declare that they do not have any conflict of interest.

Acknowledgment

The first author would like to acknowledge University Grants Commission for partially funding this work via Junior Research Fellowship Ref. No. 3505/(NET-NOV-2017).

Table 14
Class imbalance ratio for NSL-KDD dataset.
NSL-KDD Dataset    Training Samples    CI Ratio
Normal             67,343              1.00
DoS                45,927              1.47
Probe              11,656              5.78
R2L                995                 67.68
U2R                52                  1295.06

Table 15
Class imbalance ratio for CIDDS-001 dataset.
CIDDS-001 Dataset    Training Samples    CI Ratio
Normal               53,000              1.00
DoS                  36,000              1.47
Port Scan            9117                5.81
Ping Scan            500                 106.00
Brute Force          1055                50.24

Table 16
Class imbalance ratio for CICIDS2017 dataset.
CICIDS2017     Training Samples    CI Ratio
Normal         72,000              1.00
DoS/DDoS       41,799              1.72
Port Scan      13,000              5.54
Patator        7997                9.00
Web attack     1367                52.67
Bot            1000                72.00
Infiltration   20                  3600.00
Appendix 1
Here, the time-efficiency of the proposed I-OVO technique is proved using the Principle of Mathematical Induction. The aim is to show that the number of classifiers required by the proposed I-OVO technique for training (2 + m*n) and testing (3) is always less than or equal to the number of classifiers required by the traditional OVO technique for both training and testing (${}^{N}C_{2}$ each). The proof is given as follows:
Let m represent the number of majority classes, n represent the number of minority classes in the dataset, and N represent the total number of classes such that N = m + n. Then we need to prove that
\[ (2 + m \ast n) + 3 \le 2 \ast {}^{N}C_{2} \tag{i} \]
It suffices to prove equations (ii) and (iii).
Number of classifiers required at training time:
\[ (2 + m \ast n) \le {}^{N}C_{2} \tag{ii} \]
Number of classifiers required at testing time:
\[ 3 \le {}^{N}C_{2} \tag{iii} \]
Since OVO and I-OVO are techniques for performing multi-class classification, the proof considers cases when N > 2.
Proving Equation (ii) – for training
Case 1: N = 3
For N = 3, either m = 1 and n = 2; or m = 2 and n = 1. In either of these two scenarios, only one of the two classifiers C1 and C2 will exist. In that case, equation (ii) will change to $(1 + m \ast n) \le {}^{N}C_{2}$. Then
LHS: (1 + 2 × 1) = 3
RHS: ${}^{3}C_{2}$ = 3
Hence, LHS = RHS. Therefore, equation (ii) is True for N = 3.
Case 2: N = 4
For N = 4, there can be various combinations for the number of majority and minority classes, (m,n) as (1,3) or (2,2) or (3,1). Out of these three cases, the maximum number of classifiers is obtained for (2,2). The same has been shown as follows:
LHS: (2 + 2 × 2) = 6
RHS: ${}^{4}C_{2}$ = 6
Hence, LHS = RHS. Therefore, equation (ii) is True for N = 4.
Case 3: N > 4
Proving by Mathematical Induction:
Base case: m = 3, n = 2
Total number of classes N = m + n = 3 + 2 = 5
LHS: 2 + m*n = 2 + 3 × 2 = 8
RHS: ${}^{N}C_{2}$ = ${}^{5}C_{2}$ = 10
Since LHS < RHS, equation (ii) is True for the base case.
Assumption: Assume that equation (ii) is True for m = k, n = l, and total number of classes N = k + l, i.e.
\[
\begin{aligned}
& 2 + k \ast l < {}^{(k+l)}C_{2} \\
\Rightarrow\; & 2 + k \ast l < \frac{(k + l) \ast (k + l - 1)}{2} \\
\Rightarrow\; & 2 \ast (2 + k \ast l) < (k + l) \ast (k + l - 1) \\
\Rightarrow\; & 4 + 2 \ast k \ast l < k^{2} + 2 \ast k \ast l + l^{2} - k - l \qquad \text{(iv)} \\
\Rightarrow\; & 4 < k^{2} + l^{2} - k - l
\end{aligned}
\]
Proving that condition (ii) holds for m = k + 1, n = l + 1, and N = k + 1 + l + 1 = k + l + 2, i.e. prove that
\[
\begin{aligned}
& 2 + (k + 1) \ast (l + 1) < {}^{(k+l+2)}C_{2} \\
\Rightarrow\; & 2 + k \ast l + k + l + 1 < \frac{(k + l + 2) \ast (k + l + 1)}{2} \\
\Rightarrow\; & 2 \ast (3 + k \ast l + k + l) < k^{2} + 2 \ast k \ast l + k + l^{2} + l + 2 \ast k + 2 \ast l + 2 \\
\Rightarrow\; & 6 + 2 \ast k \ast l + 2 \ast k + 2 \ast l < k^{2} + l^{2} + 2 \ast k \ast l + 3 \ast k + 3 \ast l + 2 \qquad \text{(v)} \\
\Rightarrow\; & 4 < k^{2} + l^{2} + k + l \\
\Rightarrow\; & 4 < k^{2} + l^{2} - k - l + 2 \ast k + 2 \ast l
\end{aligned}
\]
From equation (iv), $k^{2} + l^{2} - k - l > 4$. Since k and l will always be positive, the additional term $2 \ast k + 2 \ast l$ is positive, and therefore equation (v) holds True. Hence, $(2 + m \ast n) \le {}^{N}C_{2}$ will always be True.
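As a supplementary cross-check that does not rely on increasing m and n together, inequality (ii) can also be verified directly for every split of N ≥ 4 classes. The derivation below is a sketch added for illustration and is not part of the original proof; it assumes only that m, n ≥ 1 and N = m + n.

```latex
\begin{aligned}
m \ast n &\le \left(\frac{m + n}{2}\right)^{2} = \frac{N^{2}}{4}
  && \text{(AM--GM inequality)} \\
{}^{N}C_{2} - (2 + m \ast n) &\ge \frac{N(N-1)}{2} - 2 - \frac{N^{2}}{4}
  = \frac{N^{2} - 2N - 8}{4}
  = \frac{(N-4)(N+2)}{4} \ge 0
  && \text{for } N \ge 4
\end{aligned}
```

Under these assumptions, equality occurs only at N = 4 with m = n = 2, which agrees with Case 2 above.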
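Proving Equation (iii) – for testing:
Base case: N = 3
LHS: 3
RHS: ${}^{3}C_{2}$ = 3
Hence, LHS = RHS. Therefore, equation (iii) is True for N = 3.
Assumption: Assume that equation (iii) is True for N = X, i.e.
\[
\begin{aligned}
& 3 < {}^{X}C_{2} \\
\Rightarrow\; & 3 < \frac{X \ast (X - 1)}{2} \qquad \text{(vi)} \\
\Rightarrow\; & 3 < \frac{X^{2}}{2} - \frac{X}{2} \\
\Rightarrow\; & 3 + \frac{X}{2} < \frac{X^{2}}{2}
\end{aligned}
\]
Proving that condition (iii) holds for N = X + 1, i.e. prove that
\[
3 < {}^{(X+1)}C_{2} = \frac{(X + 1) \ast X}{2} = \frac{X^{2}}{2} + \frac{X}{2} \qquad \text{(vii)}
\]
From equation (vi), $3 + \frac{X}{2} < \frac{X^{2}}{2}$.
Adding X/2 on both sides of this inequality,
\[ 3 + \frac{X}{2} + \frac{X}{2} < \frac{X^{2}}{2} + \frac{X}{2} \]
Since X is positive, the left-hand side is greater than 3, so
\[ 3 < \frac{X^{2}}{2} + \frac{X}{2} \]
Hence, equation (iii) holds True.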

Since both equations (ii) and (iii) always hold True, equation (i) is also always True. Further, it is observed that in equation (i) equality holds only for N = 3, while strict inequality holds for all N >= 4. Hence, as the number of classes increases, the proposed I-OVO technique requires progressively fewer classifiers than the existing OVO technique. Therefore, the number of classifiers required by the proposed I-OVO technique is always less than or equal to the number of classifiers required in the traditional OVO technique.
It must be noted that the proposed LIO-IDS uses the I-OVO technique only for attack classification at Layer 2. Hence, it does not consider the
Normal class while creating the classifiers C1, C2 and C3. This further reduces the number of classifiers required by the proposed system.
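The classifier counts compared in this appendix can also be spot-checked numerically. The Python sketch below is illustrative only: the helper names are hypothetical, it assumes m, n ≥ 1, and it applies the N = 3 special case (1 + m*n training classifiers) described in Case 1 of the proof.

```python
from math import comb


def iovo_training_classifiers(m, n):
    """Training classifiers used by I-OVO for m majority and n minority classes.

    Assumption taken from Case 1 of the proof: when N = 3 only one of the
    classifiers C1 and C2 exists, so the count is 1 + m*n instead of 2 + m*n.
    """
    return (1 if m + n == 3 else 2) + m * n


def ovo_classifiers(total_classes):
    """Pairwise classifiers needed by traditional OVO: N choose 2."""
    return comb(total_classes, 2)


def check_counts(max_classes=30):
    """Return every (N, m, n) split that violates inequality (ii) or (iii)."""
    violations = []
    for total in range(3, max_classes + 1):
        for m in range(1, total):
            n = total - m
            train_ok = iovo_training_classifiers(m, n) <= ovo_classifiers(total)
            test_ok = 3 <= ovo_classifiers(total)  # I-OVO tests with 3 classifiers
            if not (train_ok and test_ok):
                violations.append((total, m, n))
    return violations


violations = check_counts()
```

An empty `violations` list for N up to 30 is consistent with the proof; it is a spot check over a finite range, not a replacement for the argument above.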
Appendix 2
Class Imbalance Ratio for NSL-KDD, CIDDS-001 and CICIDS2017 datasets
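The class imbalance (CI) ratios reported in Tables 14 to 16 are the majority-class sample count divided by each class's own sample count. A minimal Python sketch using the NSL-KDD training counts from Table 14 follows; the function name is hypothetical, and the 50:1 minority threshold is the one stated in the paper's dataset description.

```python
def class_imbalance_ratios(class_counts):
    """Map each class to (largest class count) / (its own count), to 2 decimals."""
    majority = max(class_counts.values())
    return {name: round(majority / count, 2) for name, count in class_counts.items()}


# NSL-KDD training sample counts as listed in Table 14.
nsl_kdd_counts = {"Normal": 67343, "DoS": 45927, "Probe": 11656, "R2L": 995, "U2R": 52}
nsl_kdd_ratios = class_imbalance_ratios(nsl_kdd_counts)

# Classes at or above a 50:1 ratio are treated as minority classes.
nsl_kdd_minority = [name for name, ratio in nsl_kdd_ratios.items() if ratio >= 50]
```

With these counts the sketch reproduces the CI Ratio column of Table 14 and flags R2L and U2R as the minority classes.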
Appendix 3
Confusion Matrix obtained at Layer 1 of the proposed LIO-IDS

Tables 17-19.
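To illustrate how Recall, Precision, F1-score, and Accuracy follow from a binary confusion matrix, the Python sketch below uses the Layer 1 NSL-KDD counts from Table 17. It assumes that rows are actual classes and columns are predicted classes, which the source does not state explicitly; the function name is hypothetical.

```python
def binary_metrics(tp, fn, fp, tn):
    """Recall, Precision and F1 for the positive class, plus overall Accuracy."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {"recall": recall, "precision": precision, "f1": f1, "accuracy": accuracy}


# Table 17 (NSL-KDD, Layer 1), treating Attack as the positive class.
# Assumed orientation: rows = actual class, columns = predicted class.
attack_metrics = binary_metrics(tp=11635, fn=1198, fp=1674, tn=8037)
```

Under this orientation the overall Accuracy is 19,672 / 22,544, which rounds to 0.87 and is consistent with the value listed for LIO-IDS in Table 13.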
Table 17
Layer 1 confusion matrix for NSL-KDD dataset.
NSL-KDD    Normal    Attack
Normal     8037      1674
Attack     1198      11,635

Table 18
Layer 1 confusion matrix for CIDDS-001 dataset.
CIDDS-001    Normal    Attack
Normal       14,680    320
Attack       116       11,306

Table 19
Layer 1 confusion matrix for CICIDS2017 dataset.
CICIDS2017    Normal    Attack
Normal        42,927    12,073
Attack        2051      46,382

Confusion Matrix obtained at Layer 2 of the proposed LIO-IDS
Tables 20-22.

Table 20
Layer 2 confusion matrix for NSL-KDD dataset.
NSL-KDD    DoS     Probe    R2L     U2R
DoS        6276    72       692     0
Probe      156     2105     132     0
R2L        153     257      1703    27
U2R        0       2        22      38

Table 21
Layer 2 confusion matrix for CIDDS-001 dataset.
CIDDS-001      Port Scan    DoS     Ping Scan    Brute Force
Port Scan      2869         20      216          85
DoS            6            6572    0            26
Ping Scan      243          3       504          3
Brute Force    22           17      0            720

Table 22
Layer 2 confusion matrix for CICIDS2017 dataset.
CICIDS2017      Patator    DoS       Web Attack    Infiltration    Bot    Port Scan
Patator         4405       24        0             0               0      0
DoS             0          30,215    3             0               0      3
Web Attack      0          66        656           0               0      40
Infiltration    0          2         0             5               0      0
Bot             0          1         0             0               474    490
Port Scan       0          13        13            0               0      9972

References
[1] E. Vasilomanolakis, S. Karuppayah, M. Mühlhäuser, M. Fischer, Taxonomy and survey of collaborative intrusion detection, ACM Comput. Surv. 47 (4) (2015) 1–33.
[2] J. Lee, K. Park, GAN-based Imbalanced Data Intrusion Detection System, Pers Ubiquit Comput, 2019, pp. 1–8.
[3] G. Haixiang, L. Yijing, J. Shang, G. Mingyun, H. Yuanyue, G. Bing, Learning from class-imbalanced data: review of methods and applications, Expert. Syst. Appl. 73 (2017) 220–239.
[4] A. Fernández, S. García, M. Galar, R.C. Prati, B. Krawczyk, F. Herrera, Learning from Imbalanced Data Sets, Springer, 2018, pp. 1–377.
[5] M.A. Ferrag, L. Maglaras, S. Moschoyiannis, H. Janicke, Deep learning for cyber security intrusion detection: approaches, datasets, and comparative study, J. Inf. Sec. Appl. 50 (2020), 102419.
[6] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural. Comput. 9 (15 November (8)) (1997) 1735–1780.
[7] A. Aldweesh, A. Derhab, A.Z. Emam, Deep learning approaches for anomaly-based intrusion detection systems: a survey, taxonomy, and open issues, Knowl. Based Syst. 189 (2020), 105124.
[8] L. Breiman, Bagging predictors, Mach. Learn. 24 (1996) 123–140.
[9] A.L. Buczak, E. Guven, A survey of data mining and machine learning methods for cyber security intrusion detection, IEEE Commun. Surveys Tutor. 18 (2) (2015) 1153–1176.
[10] L. Breiman, Random forests, Mach Learn 45 (1) (October 2001) 5–32.
[11] Y. Freund, R.E. Schapire, A short introduction to boosting, J. Japan. Soc. Artif. Intell. 14 (September (5)) (1999) 771–780.
[12] J. Zhu, H. Zou, S. Rosset, T. Hastie, Multi-class AdaBoost, Stat. Interface 2 (3) (2009) 349–360.
[13] G. Rekha, A.K. Tyagi, Necessary information to know to solve class imbalance problem: from a user's perspective, in: Proceedings of International Conference on Recent Innovations in Computing 2019, Jammu & Kashmir, India, 2020.
[14] N.V. Chawla, K.W. Bowyer, L.O. Hall, W.P. Kegelmeyer, SMOTE: synthetic minority over-sampling technique, J. Artif. Intell. Res. 16 (2002) 321–357.
[15] H.M. Nguyen, E.W. Cooper, K. Kamei, Borderline over-sampling for imbalanced data classification, Int. J. Knowl. Eng. Soft Data Paradigms 3 (1) (2011) 4–21.
[16] H. Han, W.-Y. Wang, B.-H. Mao, Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning. Advances in Intelligent Computing (ICIC 2005), Lecture Notes in Computer Science, Hefei, China, 2005.
[17] M. Masdari, H. Khezri, A survey and taxonomy of the fuzzy signature-based intrusion detection systems, Appl. Soft Comput. J. 92 (2020), 106301.
[18] S. Aljbali, K. Roy, Anomaly detection using bidirectional LSTM. Intelligent Systems and Applications. IntelliSys 2020. Advances in Intelligent Systems and Computing, 2020. London, United Kingdom.
[19] S.A. Althubiti, E.M. Jones Jr., K. Roy, LSTM for anomaly-based network intrusion detection, in: 2018 28th International Telecommunication Networks and Applications Conference (ITNAC), Sydney, NSW, Australia, 2018.
[20] Y. Chuan-long, Y.-f. Zhu, J.-l. Fei, X.-z. He, A deep learning approach for intrusion detection using recurrent neural networks, IEEE Access 5 (12 October) (2017) 21954–21961.
[21] S.S. Dhaliwal, A.-A. Nahid, R. Abbas, Effective intrusion detection system using XGBoost, Information 9 (21 June (7)) (2018) 149.
[22] T.A. Tang, L. Mhamdi, D. McLernon, S.A.R. Zaidi, M. Ghogho, Deep learning approach for network intrusion detection in software defined networking, in: International Conference on Wireless Networks and Mobile Communications (WINCOM), Fez, Morocco, 2016.
[23] R. Vinayakumar, M. Alazab, K.P. Soman, P. Poornachandran, A. Al-Nemrat, S. Venkatraman, Deep learning approach for intelligent intrusion detection system, IEEE Access 7 (2019) 41525–41550.
[24] P. Sun, P. Liu, Q. Li, C. Liu, X. Lu, R. Hao, J. Chen, DL-IDS: extracting features using CNN-LSTM hybrid network for intrusion detection system, Sec. Commun. Netw. 2020 (2020) 1–11.
[25] W. Wang, Y. Sheng, J. Wang, X. Zeng, X. Ye, Y. Huang, M. Zhou, HAST-IDS: learning hierarchical spatial-temporal features using deep neural networks to improve intrusion detection, IEEE Access 6 (2017) 1792–1806.
[26] Y. Zhou, G. Cheng, S. Jiang, M. Dai, Building an efficient intrusion detection system based on feature selection and ensemble classifier, Comput. Netw. 174 (2020), 107247.
[27] F. Salo, A.B. Nassif, A. Essex, Dimensionality reduction with IG-PCA and ensemble classifier for network intrusion detection, Comput. Netw. 148 (15 January) (2019) 164–175.
[28] M. Mazini, B. Shirazi, I. Mahdavi, Anomaly network-based intrusion detection system using a reliable hybrid artificial bee colony and AdaBoost algorithms, J. King Saud Univ. - Comput. Inf. Sci. 31 (4) (2019) 541–553.
[29] F. Al-Obeidat, E.-S.M. El-Alfy, Hybrid multicriteria fuzzy classification of network traffic patterns, anomalies, and protocols, Pers Ubiquit Comput. 23 (16 November) (2017) 777–791.
[30] G. Kaur, A comparison of two hybrid ensemble techniques for network anomaly detection in spark distributed environment, J. Inf. Sec. Appl. 55 (102601) (2020) 1–14.
[31] T. Su, H. Sun, J. Zhu, S. Wang, Y. Li, BAT: deep learning methods on network intrusion detection using NSL-KDD dataset, IEEE Access 8 (2020) 29575–29585.
[32] N. Chaabouni, M. Mosbah, A. Zemmari, C. Sauvignac, P. Faruki, Network intrusion detection for IoT security based on learning techniques, IEEE Commun. Surv. Tutor. 21 (3) (2019) 2671–2701.
[33] R. Mitchell, I.-R. Chen, A survey of intrusion detection techniques for cyber-physical systems, ACM Comput. Surv. 46 (4) (2014).
[34] P. Radoglou-Grammatikis, P. Sarigiannidis, Securing the smart grid: a comprehensive compilation of intrusion detection and prevention systems, IEEE Access 7 (2019) 46595–46620.
[35] H. Hindy, D. Brosset, E. Bayne, A.K. Seeam, C. Tachtatzis, R. Atkinson, X. Bellekens, A taxonomy of network threats and the effect of current datasets on intrusion detection systems, IEEE Access 8 (2020) 104650–104675.
[36] P. Bedi, N. Gupta, V. Jindal, Siam-IDS: handling class imbalance problem in intrusion detection systems using siamese neural network, in: Third International Conference on Computing and Network Communications, Trivandrum, Kerala, India, 2019.
[37] P. Bedi, N. Gupta, V. Jindal, I-SiamIDS: an improved Siam-IDS for handling class imbalance in network-based intrusion detection systems, Appl. Intell. (2020) 1–19.
[38] H. Zhang, L. Huang, C.Q. Wu, Z. Li, An effective convolutional neural network based on SMOTE and Gaussian mixture model for intrusion detection in imbalanced dataset, Comput. Netw. 177 (2020), 107315.
[39] D. Gonzalez-Cuautle, A. Hernandez-Suarez, G. Sanchez-Perez, L.K. Toscano-Medina, J. Portillo-Portillo, J. Olivares-Mercado, H.M. Perez-Meana, A.L. Sandoval-Orozco, Synthetic minority oversampling technique for optimizing classification tasks in botnet and intrusion-detection-system datasets, Appl. Sci. 10 (3) (2020) 1–19.
[40] M.R. Pavan Kumar, P. Jayagopal, A preprocessing method combined with an ensemble framework for the multiclass imbalanced data classification, Int. J. Comput. Appl. (2019) 1–8.
[41] J.W. Mikhail, J.M. Fossaceca, R. Iammartino, A semi-boosted nested model with sensitivity-based weighted Binarization for multi-domain network intrusion detection, ACM Trans. Intell. Syst. Technol. 10 (3) (2019) 1–27.
[42] A. Alabdallah, M. Awad, Using weighted support vector machine to address the imbalanced classes problem of intrusion detection system, KSII Trans. Internet Inf. Syst. (TIIS) 12 (10) (2018) 5143–5158.
[43] Z.-L. Zhang, X.-G. Luo, Y. Yu, B.-W. Yuan, J.-F. Tang, Integration of an improved dynamic ensemble selection approach to enhance one-vs-one scheme, Eng. Appl. Artif. Intell. 74 (2018) 43–53.
[44] Y. Liu, J.-W. Bi, Z.-P. Fan, A method for multi-class sentiment classification based on an improved one-vs-one (OVO) strategy and the support vector machine (SVM) algorithm, Inf Sci (Ny) 394 (July 2017) (2017) 38–52.
[45] M. Galar, A. Fernández, E. Barrenechea, F. Herrera, DRCW-OVO: distance-based relative competence weighting combination for One-vs-One strategy in multi-class problems, Pattern Recognit. 48 (1) (2015) 28–42.
[46] M. Tavallaee, E. Bagheri, W. Lu and A.A. Ghorbani, "NSL-KDD dataset," 2009. [Online]. Available: https://www.unb.ca/cic/datasets/nsl.html. [Accessed 7 9 2019].
[47] D. Chou, M. Jiang, Data-Driven Network Intrusion Detection: A Taxonomy of Challenges and Methods, 2020, pp. 1–38, arXiv preprint arXiv:2009.07352.
[48] M. Ring, S. Wunderlich, D. Grüdl, D. Landes, A. Hotho, Flow-based benchmark data sets for intrusion detection, in: Proceedings of the 16th European Conference on Cyber Warfare and Security (ECCWS), Dublin, Ireland, 2017.
[49] I. Sharafaldin, A.H. Lashkari, A.A. Ghorbani, Toward generating a new intrusion detection dataset and intrusion traffic characterization, in: Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP 2018), Funchal, Madeira - Portugal, 2018.
[50] S. Bagui, K. Li, Resampling imbalanced data for network intrusion detection datasets, J Big Data 8 (6 January (1)) (2021) 1–41.
[51] K. Bartosz, Learning from imbalanced data: open challenges and future directions, Prog. Artif. Intell. 5 (4) (2016) 221–232.
[52] J.L. Leevy, T.M. Khoshgoftaar, R.A. Bauder, N. Seliya, A survey on addressing high-class imbalance in big data, J. Big Data 5 (1) (2018) 1–30.

Neha Gupta: Neha Gupta is a research scholar at Department of Computer Science, University of Delhi. She completed her MCA (Masters in Computer Application) from Department of Computer Science, University of Delhi in 2017 and later did 6 months internship at Proptiger Realty Pvt Ltd in analytics and search engine optimization. Before her post-graduation, she did her BSc. (Computer Science) from Keshav Mahavidyalaya, University of Delhi. Her areas of interest include Cybersecurity, Intrusion Detection Systems, Dark Web, Blockchain and Machine Learning.

Dr. Vinita Jindal: Dr. Vinita Jindal is an Assistant Professor in the Department of Computer Science, Keshav Mahavidyalaya, University of Delhi since August 2001. She was Head of Department of Computer Science, Keshav Mahavidyalaya, University of Delhi from June 2017 till May 2019. Before joining the Department of Computer Science, Keshav Mahavidyalaya, University of Delhi, she worked as a Manager/ Sr. Faculty in the PCTI Ltd. from July 1999 to July 2001. She did her Doctorate in Computer Science from University of Delhi in 2018. She did her M.Phil. in Computer Science from Madurai Kamaraj University in 2007, MCA from IGNOU in 2000 and Bachelor in Mathematics from University of Delhi in 1997. She is mainly working in the area of Artificial Intelligence and Networks. Her areas of interest include Cybersecurity, Intrusion Detection Systems, Dark Web, Deep Learning, Recommender Systems and Vehicular Adhoc Networks to name a few.

Prof. Punam Bedi: Punam Bedi is a Professor in the Department of Computer Science, University of Delhi since March 2007. She worked as officiating Director, Delhi University Computer Centre from Oct. 20, 2017 to April 16, 2018. She was the Head of Department of Computer Science, University of Delhi during Oct. 2005 - Oct 2008. She also worked as the acting Director, Delhi University Computer Centre from June 26 to Oct. 23, 2009. Before joining the Department of Computer Science, University of Delhi, she worked as a Lecturer/ Reader in the Deshbandhu College, University of Delhi from January 1987 to January 2002. She did her Doctorate in Computer Science from University of Delhi in 1999. She did her M.Tech. in Computer Science from IIT Delhi in 1986 and M.Sc. in Mathematics from IIT Delhi in 1984. Her areas of interest include Cybersecurity, Intrusion Detection Systems, Recommender Systems, Deep Learning, Artificial Intelligence for Healthcare, and Artificial Intelligence for Agriculture.