You are on page 1of 63

ABSTRACT

As a tremendous amount of service being streamed online to their users along with massive
digital privacy information transmitted in recent years, the internet has become the backbone
of most people's everyday workflow. The extending usage of the internet, however, also
expands the attack surface for cyberattacks. If no effective protection mechanism is
implemented, the internet will only be much vulnerable, and this will raise the risk of data
getting leaked or hacked. The focus of this paper is to propose an Intrusion Detection System
(IDS) based on the Convolutional Neural Network (CNN) to reinforce the security of the
internet. The proposed IDS model is aimed at detecting network intrusions by classifying all
the packet traffic in the network as benign or malicious classes. The Canadian Institute for
Cybersecurity Intrusion Detection System (CICIDS2017) dataset has been used to train and
validate the proposed model. The model has been evaluated in terms of the overall accuracy,
attack detection rate, false alarm rate, and training overhead. A comparative study of the
proposed model's performance against nine other well-known classifiers has been presented.
CHAPTER -1
INTRODUCTION

In recent years, the amount of applications that stream services to their users has increased
explosively. This type of service requires minimal installations and computing power on the
user terminal because the applications are operating at the service carrier's cloud servers
instead of the local terminal; all the inputs and outputs are streamed to the users via the
internet. Seeing the obvious advantage of providing high-end service to customers, who are
not able to access high-end devices, many corporations have started to develop their
streaming services. For instance, entertaining service such as Google Stadia makes high-end
gaming, which is typically hardware demanding, now possible on any portable devices with
good internet connectivity. The game is processed and rendered at Google's cloud server with
user's inputs in real time, then the video is streamed back to the user's terminal via the
internet. However, the extensive data exchange at the network between the cloud servers and
local user terminals also expand the attack surface for intrusions. Malicious hackers may
deploy various types of attacks, such as Distributed Denial-of-Service (DDoS), Port Scan and
Infiltration attack to hijack valuable data or make servers unavailable to users. To stop these
cyberattacks from happening, the development of a reliable and effective Intrusion Detection
System (IDS) for cybersecurity has become an urgent issue to be solved.

The idea of the IDS is not new. In 1980, James P. Anderson delineated a type of cyber-
security device or application to monitor the network traffic with the intention to warn the
administrators of any suspicious activities or violations of the system policy [1]. The IDS was
proposed to include tools for the administrators to inspect the audit trails of the network
traffic in a system. By analysing the statistical information of audit trails, system
administrators could hunt for malicious traffic and take further actions to secure the system.
Most traditional IDS use an attack signature database that is created from expert's knowledge,
to cooperate with some predefined decision rules in the program to find out intrusions [2].
In [3], the author claimed that it is easy to develop and understand a signature-based IDS if
the network behavior of the target anomaly activities is known. However, in recent years,
cyberattacks have become more sophisticated, especially the attacks on the systems that are
storing or manipulating sensitive information.
The hand-crafted attack database by experts will obsolete quickly without constant updates.
Another major problem with the signature-based IDS is that they fail to be generalized to
detect innovative attacks that have signatures that are not logged in the signature database. In
short, the attack signature database causes a high storage overhead in order to include the
signature of all the known attacks, making them hard to get implemented or distributed. Also,
matching the incoming data flow with the signatures logged in the dataset could be
computationally expensive.

The Artificial Neural Network (ANN) architecture is a popular solution nowadays for
prediction and classification tasks. There are many advantages of ANN which make it
especially suitable for network intrusion detection [4]. Firstly, ANN performs great at
modelling non-linear data with a large number of input features; for example, network
packets. Secondly, once an ANN is trained, its forward propagation, in other words, the
predictions, are fast. This is crucial for network's speed if the IDS model is placed in-line
with the network traffic. In brief, an ANN is trained by a big dataset to become a generalized
solution for a specific task. The traditional signature-based IDS, on the other hand, is using
manually defined and human-understandable rules for intrusion detection. The ANN
approach relies on a solid mathematics backbone in how the error is backpropagated with the
stochastic gradient descent method to optimize the model [5], [6]. In addition, no predefined
rules are required during the ANN's training process. Meaning that the developers are not
required to possess much expertise in the field of cybersecurity to train an ANN-based IDS.
In addition, since the decision mechanisms of the ANN-based IDS are generalized from the
features of all the known attacks, they have the potential to detect the innovative attacks
sharing similar traits with the known attacks. On the other hand, the signature-based IDS will
fail to detect innovative attacks due to the lack of knowledge on their specific signature.

In this paper, an IDS for cybersecurity based on the Convolutional Neural Network (CNN)
architecture is proposed. Unlike some previously proposed CNN based IDS, which mostly
aiming at one-class or a subset of classes classification [7]–[9]. The proposed IDS has high
performance at multi-class classification, for identifying innovative and all the known attack
classes in a dataset. And compared to some state-of the-art CNN based IDS's that are aiming
at multi-class classification, such as Potluri's work in [10] for classifying the UNSW-NB 15
dataset [11]. The Canadian Institute for Cybersecurity Intrusion Detection System
(CICIDS2017) dataset, used in this paper, is more challenging for a classifier with its larger
feature set and more attack classes that are 59% and 55% more than the UNSW-NB 15
dataset. This paper is structured to include a brief review of some recent research on deep
learning applications at cybersecurity. Then in the model design section, CICIDS2017
database is explored to reveal the design decisions of the custom training and test dataset of
the proposed model. Also, the architecture of the CNN, which is the deep learning model on
which the proposed IDS is based, is discussed and the mathematical details are provided.
Finally, the architecture of the proposed IDS is presented along with its parameter selections.
And the model is evaluated by the defined benchmarks in the simulation result section to
provide performance validation and comparison.

Fig: 1.1 Cyber security


CHAPTER-2

LITERATURE REVIEW

Prior to the rise of machine and deep learning, the design of the IDS is usually based on
network experts’ knowledge of the attacks. In [12], Ansam Khraisat et al. classified various
kinds of IDS models based on their detecting techniques. In short, the IDS with statistics-
based techniques builds a distribution model for the benign traffic and flag the low
probability events as potential attacks. The IDS with knowledge-based techniques, on the
other hand, creates a knowledge base to reflect the legitimate traffic profile. Afterward, any
action that differs from the standard profile is flagged as an intrusion. Finally, there are IDS
with machine learning techniques. These models mine characteristics of each type of attack
from large quantities of data and classify traffic based on the learned characteristics. In
addition, a survey of the datasets for intrusion detection systems is presented by the authors
in [12]. They explored some public datasets such as Knowledge Discovery Databases (KDD)
Cup’99, Center for Applied Internet Data Analysis (CAIDA), Network Security Laboratory-
Knowledge Discovery Databases (NSL-KDD), and CICIDS2017, and a comparative study of
those IDS datasets, in terms of their feature selection and type of computer attacks, is
presented as well. Finally, the authors provided classification results of the selected datasets
based on their prior research. Specifically, on CICIDS2017, their model that combinedly uses
a Multiplayer Perceptron (MLP) neural network and payload classifier reaches an accuracy of
95.2%.

Machine learning is a popular solution for classification tasks due to its relatively simple
architecture and low computation overhead. Many studies that are applying machine learning
techniques on the CICIDS2017 dataset for attack classification have been proposed [13]–
[15]. Many of which have reached decent accuracy at one-class classifications for a certain
attack class, such as DDoS, in the dataset. However, to reach a usable multi-class
classification accuracy in detecting modern network intrusions, many data preprocessing
methods have been proposed to improve the model's performance. In [13], Yonghao Gu et
al. proposed a semi-supervised K-means model to detect the DDoS attacks in CICIDS2017.
Besides, they implemented a hybrid feature selection algorithm to exclude the unreasonable
features from being the input of the model to prevent “the curse of dimensionality”. Their
feature selection algorithm takes the available features as input. The features are processed by
a series of procedures including the data normalization, feature ranking, and feature subset
searching, and selected features are output by the algorithm. Finally, with their proposed
feature selection method, they reached a detection rate of 96.50% and a false positive rate of
30.5%.

In recent years, deep learning models such as neural networks have become more effective
solutions for classification tasks because of their ability to generalize more complex patterns
of features of tasks. In [16], the authors provided a study of anomaly analysis for intrusion
detection with K-Nearest Neighbors (KNN) and Deep Neural Network (DNN) to compare the
classification performance of a machine learning model to a deep learning model. They used
CICIDS2017 as the database for the simulations of the model's performance in the study.
They concluded that DNN performed significantly better than KNN. In specific, their DNN
yields an accuracy of 96.427%, which is much higher than 90.913% accuracy by the KNN. In
addition, they also compared the computation time overhead of the two models. The 110 (s)
CPU time of DNN is shorter than the 130 (s) CPU time of KNN, therefore, indicated that
DNN has a lower time overhead than KNN.

In [7], another study of using deep learning models for cybersecurity in the Internet of Things
(IoT) networks is proposed by Monika Roopak, Gui Yun Tian and Jonathon Chambers. The
authors used the DDoS attack samples in CICIDS2017 to evaluate the performance of MLP,
Long Short-Term Memory (LSTM), CNN, and a hybrid model of LSTM and CNN. They
reached a 98.44% precision by the LSTM model, followed by a 98.14% precision by the
CNN model and a 97.41% precision by the hybrid model. Lastly, the MLP model reached an
88.47% accuracy in their simulation. The authors also compared their results to some
machine learning models. After the simulation, they concluded that all the tested deep
learning models except MLP outperform the machine learning models such as SVM, Bayes
and Random forest.

In [8], Sungwoog Yeom and Kyungbeak Kim tested the performance of Naïve bayes, SVM
and CNN based classifier on the CICIDS2017 dataset. In the study, it was focus more on the
model's binary classification performance on each attack class in the dataset. The raw
CICIDS2017 dataset, which included separate sub-datasets of the network traffic in a day,
was used to train the models. The authors trained and evaluated a CNN based classifier based
on each sub-dataset, which mostly included only 1∼3 attack classes. And the accuracy,
precision, recall and F-measure of the models were record. After the evaluation, the authors
concluded that CNN and SVM generally had high detection rates. In addition, CNN was
better than SVM in term of the processing time. However, they also observed that CNN had
mediocre performance with datasets with many labels. In other words, it was challenging for
the CNN models if there were many labels need to be classified.

In [9], a CNN based IDS was proposed by Jiyeon Kim et al. In this paper, the authors
employed deep learning techniques and developed a CNN model for the CICIDS2018
dataset, which was a dataset sharing the same feature set with CICIDS2017 but with larger
sample counts. The training and test of the models in the study were performed on sub-
datasets which included a subset of types of network traffic from CICIDS2018. Therefore,
the models were simulated for multi-class classification for certain classes in the dataset, not
all of them at once. In this study, the experimental results showed that the performance of the
CNN based IDS could be higher than that of the recurrent neural network (RNN), which is
another deep learning model that is popular in the cases of time series data being used as
input. The CNN model proposed in this study reached a 96.77% accuracy in the sub-dataset
which was composed by the benign and DoS samples from CICIDS2018. On the other hand,
the RNN model tested in this study reached a 82.84% accuracy in the same dataset, which
was significantly lower than of the CNN model.

In [17], Ahmed Ahmim et al. proposed a novel hierarchical IDS that is based on Decision
Tree and Rules-based models. They also used CICIDS2017 as the dataset to evaluate the
performance of their model. Their proposed model combines Reduced Error Pruning Tree
(REP Tree) and JRip algorithm at its first stage. The input features of the dataset are used as
input at this stage to classify traffic as attacks or benign. Then a Forest PA classifier takes the
output of the two classifiers at the first stage, in combination with the input features of the
initial dataset as input to generate the final classification result. Their model reached decent
performance on almost every traffic class in CICIDS2017. They also provided a performance
comparison of their proposed model with 11 well-known classifiers to validate its
classification power. Within the 12 classifiers models, their model had the best classification
performance at seven attack classes and the lowest false alarm rate for benign traffic. This
model is competitive for its great overall classification performance on CICIDS2017.
Therefore, in the result section of this paper, the proposed IDS model has been compared
against their novel hierarchical IDS to assess the proposed model's performance.

CHAPTER-3

MODEL DESIGN

This section specifies many design elements of the proposed IDS model. At first, the
cybersecurity database CICIDS2017 that is used to train the proposed model is introduced
and analyzed. The data preprocessing methods and the design of the training dataset are also
presented. Then, the architecture and mathematical model of the CNN are introduced and
discussed. Finally, the architecture, operating flow of the proposed model and input data
collecting methods are presented in details.

A. CICIDS2017 Security Dataset

The training, validating, and testing of the proposed model are all accomplished with the data
from CICIDS2017 database. CICIDS2017 database was proposed in 2018 by Iman
Sharafaldin et al. [18]. The database was developed to combat the sophisticated and ever-
growing network attacks in the modern era. The creators of the database had the ambition to
replace outdating security databases such as DARPA98, KDD99, ISC2012, and ADFA13.
These databases were vastly used for the evaluation of intrusion detection and prevention
models in many papers in the past decades, but were in fact obsolete and unreliable to
use [19]. Therefore, for the proposed model, which aims at detecting modern cyberattacks,
CICIDS2017 is a top-notch option for the training database.

The CICIDS2017 database emulates the network traffic of a real environment for one week.
The network traffic contains normal network packets and a diverse set of attack scenarios that
are injected by the attack profiles created by developers. In total, the database contains
2830,743 samples, and the composition of the traffic classes is presented in Table 1.
TABLE 1: CIDIDS2017 With Sample Size and Class Composition

Each sample in the database contains 78 features that are extracted from the traffic of two
nodes in the network, and a label to indicate the class of the traffic. A few examples of the
extracted features are the destination port, flow duration, total forward packets, and total
backward packets. Among the 78 features, the proposed model takes all of them as input
except for the destination port. The destination port of the network traffic may be a useful
audit trail for the network admin to track down the place where a cyberattack is initiated or
target towards. However, it is a method of encoding the network ports into identification
numbers. The destination port is not a quantitative measure of network traffic that our neural
network-based IDS is expected for input. Accordingly, the inclusion of it, as an input feature,
can cause unexpected issues at the training process of the proposed neural network model.
The rest of the 77 features are all quantitative measure, thus, they are all valid inputs for the
proposed model.

B. Design of the Training and Test Dataset

Albeit CICIDS2017 is an advanced modern security database with merits, it does have certain
shortcomings that need to be considered and addressed to unlock its full potential. There are
three main issues of the database: scattered presence, missing values and class imbalance.
The network traffic is logged into eight separate files corresponding to the time window and
class of the traffic sample. The scattered presence is not ideal for the training of the proposed
model since the model aims to classify all the cyberattacks in the database instead of
particular types of attack. Further, according to [19], CICIDS2017 has a total of 288,602
samples with an absent class label, and 203 instances of missing information. The samples
with an absent class label need to be removed, and the missing information needs to be
restored.

Table 1 shows the class composition of the original data in CICIDS2017. Among all the
samples, the benign samples have a proportion of 80.301%, which is the highest in the
database. On the opposite, the Heartbleed samples have a proportion of only 0.001%. This
high imbalance of in-class sample size typically leads to biased classification results. It needs
to be avoided in the proposed model to prevent a drop at overall performance.

The three issues in the database raise the difficulty for the neural network model to reach
decent performance. However, they are not necessarily defects for a database, which is aimed
at simulating sophisticated network traffic. In fact, the three issues, mentioned above, do have
a high chance to appear in the data that is collected from a real-world environment.
Therefore, by solving these shortcomings in CICIDS2017, the proposed model is also
prepared to be implemented in real environments.

The first issue of missing values is solved in the data preprocessing phase by a data imputer.
In this case, the missing values are all replaced by 0 to prevent value errors. In addition, the
rest of the two issues, namely, class imbalance and scattered presence, are solved by building
a custom database that concatenates all the separate files of CICIDS2017 with balanced class
composition. This custom database is called α-Dataset.

Table 2 presents a few design decisions of the α-Dataset, such as the size of samples for each
class, splitting strategy of training and test set, and the class composition of the training set.
Balancing the samples for a high imbalance database is a problem with trade-offs. Most
neural network models work better with databases with balanced samples in each class.
However, neural network models also require a big dataset for better generalization of the
task assigned to it. Therefore, it is not feasible to simply trim down the number of samples in
the classes with a higher proportion to match with the minority classes. For the best of both
worlds, the database is divided into two categories. Namely, the normal attack samples, and
minority attack samples that are marked grey in Table 2.

TABLE 2: α-Dataset With Sample Size, Training/Test Distribution and Class Composition
in Training Dataset
The normal attack samples include the benign samples from the Monday file, DDoS, DoS,
and Port Scan. Each of the four classes is split into training and test samples with a distinct
ratio. These ratios are tuned to generate a training set with a balanced size of samples across
the four classes. Table 2 shows that each of the four classes has roughly 102,000 samples to
form a balanced training set. On the other hand, the minority samples contain the rest of the
attack samples in the database. Given their small size of samples, 80% of the samples, in each
class, become training samples, and the whole class is also used as test samples to provide
sufficient data for the training and testing process.
C. Convolutional Neural Network

CNN is a popular neural network that is typically used in image processing tasks. In recent
years, due to its popularity, CNN is also extensively used in natural language processing,
video analyzing, and even some models with sequential inputs. The unique convolution and
pooling process of CNN allow the model to learn complicated patterns of features form a
high-dimensional data space while maintaining reasonable storage and computation time
overhead. This advantage is significant when a CNN model is trained with a network traffic
dataset, given that these datasets typically have a large feature set. For example, there are 78
features recorded for each traffic sample in the CICIDS2017 dataset, which can lead to high
computational complexity for other deep learning models such as DNN [20]. Also, for any
network IDS, it is exceptionally crucial to have the ability to capture the complicated features
of network traffic within a short amount of time. Therefore, it is believed that CNN is the
priority neural network model for developing the prototype of the proposed
model. Fig. 1 shows the architecture of a CNN model with the input size of 8x8 matrix. The
whole model could be separated into three main sections: convolution layer, pooling layer,
and fully connected layer. In terms of functionality, the convolution layer and pooling layer
together handle the feature extraction of the training samples, and the fully connected layer
processes the final classification. At the convolution layer, CNN model convolves the input
sample with some specific matrices called “feature detector” or “kernel map”. The result of
the convolution is called “feature map”. Feature detectors are matrices used to extract, from
the input sample, specific features, such as patterns, shape and lines. Their value is initially
randomized and gradually updated by the optimizer during the training process. After the
input sample is convolved with the feature detectors, the feature maps are generated. For
CNN model with K feature detectors, this process could be mathematically denoted as:

featuremap{i}=input⊗featuredetector{i};i∈K.(1)
Fig 3.1: Architecture of a basic convolutional neural network.

Then, the feature map calculated by (1) is mapped to a non-linear activation function to break
the linearity of the model. In Fig. 1, the whole process of generating the feature maps and
mapping them to the activation function is represented by the green connections at the
convolution layer.

The second section of the model is the pooling layer. At this layer, the feature maps will be
pooled to bring down their dimension. The purpose of this process is to eliminate noise and
model's dependent on the spatial location of the learned patterns.

In short, pooling mitigates the effects when a specific pattern that the model is trained to
recognize appears at different locations or angles at the input sample. Furthermore, to
eliminate noise, pooling brings down the dimension of the feature maps, while preserving the
valid information carried by them. This also eases the calculation of any further processing to
the feature maps.

After the feature maps are calculated and pooled, they will undergo a “flattening” process
before being passed to the fully connected layer. The flattening process converts the feature
maps from 2-D matrices to 1-D arrays in row-major order, which is the format expected by
the fully connected layer. The fully connected layer is composed of input neurons, optional
hidden neurons and output neurons. Each neuron is connected to every neuron at the adjacent
layers with distinct weights and biases as the parameter. The output neurons, which are also
the final output of the CNN model, determine to which class each input sample belongs.

In the case of cybersecurity, the model can use the output neurons to represent the benign
class and every class of attack that the model is trained to classify. When a network traffic
sample is processed by the model, it will be classified as the class that is represented by the
output neuron that holds the highest value at the output layer.

One of the characteristics that distinguish CNN from other neural networks is weight sharing.
CNN has shared weights, which means the model uses the same weight vectors to do the
convolution. In other words, the feature detectors, which contain the weight vectors, are
replicated at every convolution process at the convolution layer. There are some advantages
to the implementation of shared weights. First, it makes the model have substantially fewer
parameters to be optimized. This means the optimizer can potentially converge the model to
optimal loss with less time overhead. Secondly, it slightly lowers the degrees of freedom of
the model. This could help the model to avoid overfitting, which happens when the model
makes too much of an effort to fit itself to a limited set of samples.

The constraint on weights also enables the model to achieve better generalization on the tasks
assigned to it. Briefly, CNN has merits that are suitable for the development of IDS for
cybersecurity in the modern era. CNN requires less time overhead at the testing process.
Besides, with its convolution mechanism, CNN could potentially learn the much more
complex characteristics of some modern cyberattacks, which other neural network models
struggle to capture. Lastly, CNN can achieve better generalization on the classification of
cyberattack samples. This enables the IDS to potentially detect innovative attacks that share
similar traits with the known attacks.

However, from [7]–[9], it is observed that CNN, albeit with its merits at generalize complex
patterns of features in the dataset, has some potential issues at multi-class classifications. The
CNN models proposed in these studies are tested on classification tasks of either one-class or
a subset of classes in the dataset. This is not a surprise since neural networks depend a lot on
training data to reach the full potential and prevent issues such as biased classification results
or over-fitting [4]. Therefore, to access the advantage of the neural network models, it is
important to train the model with a dataset that is not only rich in its sample counts and
feature set but also has a balance class composition.

In this study, a novel data-preprocessing method is proposed in the Model Design section to
generate a relatively balance sub-dataset, which still includes all the available attack classes
from the CICIDS2017 dataset, for the training and test of the proposed IDS model. And the
proposed IDS is trained to perform multi-class classifications on every types of attacks that
are known in the CICIDS2017 dataset.

D. Development of the Proposed IDS Model

The proposed IDS model, in this paper, is developed on Python 3. PyTorch is used as the
framework to build the neural network layers of the proposed model [21].

CNN has many layers to be implemented and some parameters to be selected before training
one. Empirically, the number of convolution and fully connected layers for a CNN for
network intrusion detection tasks could be selected from the range of 1∼3 layers
each [8], [9], [22]. In general, the more complex the task assigned to a neural network, the
more layers are added. For example, the ResNet-50, which is a large CNN that classify
images into 1000 object classes, has 50 layers in total. However, a large network size does
not always guarantee the performance of a CNN [22]. The input of the model is batches of
matrices. Each matrix has a size of 9 × 9, which is composed of 77 extracted features from
the network traffic samples and four zero pads for format concern. The model contains two
convolution layers and two fully connected layers. The decision of hyperparameters, such as
the number of layers, the kernel size for convolution, and the number of neurons in a fully
connected layer, are all determined by exhaustive grid search. In this method, all the
hyperparameters are selected from a subset of values that are manually specified, and the
model will be evaluated iteratively to search for the best combination in the parameter values.
The architecture of the proposed model is presented in Fig. 2. At the first convolution layer,
the kernel of the convolution is designed to have a size of 3 × 3, and the stride and padding of
the convolution to be one. In addition, the number of output feature maps is selected to be 16,
and the activation function to be a rectified linear unit (ReLU) function at this layer.
Accordingly, for each matrix in a training batch, the model generates 16 feature maps at the
first convolution layer. Each feature map has a size of 9x9 and is mapped to a ReLU function.
Then, the feature maps are pooled at the first pooling layer by the Max Pooling algorithm
with a kernel size of 2x2, stride of 2, and padding of 1. This generates 16 pooled feature
maps, each has a size of 5x5. Afterward, the pooled feature maps are passed to the second
convolution layer and pooling layer, which both have the same kernel size, stride, and
padding configuration as the first two layers.

Fig 3.2: Architecture of the proposed IDS model.

The second convolution layer is designed to output 32 feature maps, and they are also
individually mapped to a ReLU activation function. Eventually, after the two convolution
layers and two pooling layers, the proposed model generates 32 pooled 3 × 3 feature maps for
each input matrix. At the flattening layer, all the elements of the pooled feature maps are
converted into a 1-D array in row-major order, which is the format expected by the fully
connected layer. This generates an array of 288 elements for each input matrix. Then the
array becomes an input sample of the fully connected layer, which has 288 input neurons, a
hidden layer and nine output neurons.
Consequently, the proposed model is trained to classify network traffic into nine different
classes represented by the nine output neurons, namely, Benign, Brute Force, Bot, DDoS,
DoS, Heartbleed, Infiltration, Port Scan, and Web Attack. The number of classes is designed
this way for the model to reach a better generalization for the cyberattacks.

For a certain dataset with a limited number of samples, when the number of classes increases,
the number of samples for each class decreases inevitably. It raises the difficulty of the model
to generalize the characteristics of each class. Therefore, the proposed model has a custom
dataset function to merge some of the samples, based on their type of attack, into a bigger
class.

For instance, from CICIDS2017 database, samples with the four labels of DoS attacks,
namely, DoS GoldenEye, DoS Hulk, DoS Slowhttptest, and DoS slowloris are merged into a
single DoS class. The same merging strategy is also implemented at various labels of Brute
Force and Web Attack samples in the database to form bigger classes of the two types of
attacks.

Eventually, the samples from CICIDS2017 are assorted into nine distinct classes with some
greater sizes of in-class samples to facilitate the model's generalization of cyberattacks.
Hence the selection of nine output neurons at the final output layer of the model. Meanwhile,
the original label is still attached to each sample for label-wise performance evaluation of the
proposed model.
CHAPTER-4

SIMULATION RESULTS

This section presents in detail the performance of the proposed IDS model based on the
simulation results with some benchmarks. The benchmarks include the Detection Rate (DR),
True Negative Rate (TNR), and overall accuracy, which are all explicitly defined and
calculated. In addition, a comparative study of the proposed model with nine other well-
known classifiers is provided. Finally, the proposed model's performance on innovative
attacks is also validated. All the simulations in this section are done on a workstation of the
Networked Embedded Systems (NES) Laboratory at California State University, Long
Beach. This workstation runs a 64-bit Windows 10 OS and is equipped with an Intel Xeon
W-2125 processor and 64 GB of RAM.

A. Performance Benchmarks

To evaluate the classification performance of the proposed model on cyberattacks, it has been
simulated with several benchmarks that are derived from the confusion matrix generated by
the model. Confusion matrix, as shown in Table 3, is a matrix showing all the possible
scenarios of classification results. In terms of cybersecurity, the positive class is defined to be
the attack samples, and the negative class contains the benign samples. There are four
scenarios in a confusion matrix, namely, True Negative (TN), True Positive (TP), False
Negative (FN) and False Positive (FP). Among the four scenarios, TN and TP contribute to
the overall accuracy of the classification outcomes. They represent the instances when a
model successfully predicts a sample to be its true case. On the other hand, instances of FN
and FP deteriorate the performance of a classification model. They occur when a model
predicts a sample incorrectly. In a sense of cybersecurity, FP occurs when a benign sample is
classified as an attack, which is also known as a false alarm.
TABLE 3: Confusion Matrix

Confusion matrix is a great tool to record the simulation results of a classification model.
However, it is not necessarily easy to read nor handy enough to quickly evaluate the
performance of IDS models. Thus, for comparative studies, some performance benchmarks
are derived from the confusion matrix, which are universally used. The first benchmark is
DR, which could be understood as the ratio of the samples in an attack class, which are
correctly classified as attacks by an IDS model. Mathematically, DR could be derived from
the confusion matrix and denoted as:

DRclass=TPclassTPclass+FNclass.(2)

In general, when an IDS model has a high DR at an attack class, it implies the model
performs great at recognizing this type of attack and not confusing them with the benign
samples.

TNR is a benchmark of an IDS model's classification power on the benign samples. When an
IDS model yields a high TNR, it suggests that the model performs well on isolating the
benign samples from the attack samples. TNR is also derived from the confusion matrix.
Mathematically, it could be denoted as:

TNR=TNBENIGNTNBENIGN+FPBENIGN.(3)

Another performance benchmark, which is closely related to the TNR, is the False Alarm
Rate (FAR). As its name implies, FAR gives the probability of a false alarm, which occurs
when a benign sample is falsely classified as an attack, initiated by the IDS model. It is very
important to evaluate a classifier's performance at FAR, especially in the cases of anomaly
detections. A classifier for network intrusion detections is not trustworthy when it has a high
DR for attacks but also a high FAR. In such case, the classifier will perform poorly with data
from the real world. Given that the benign traffic is the majority in an ordinary network
environment, a high FAR means the IDS can falsely warn the network admin on benign
traffic very frequently, making the IDS not usable. In other words, for an IDS model, the
higher the FAR, the lower the TNR. FAR could be denoted as:

FAR=FPBENIGNTNBENIGN+FPBENIGN=1−TNR.(4)

Finally, the proposed IDS model's overall accuracy is also provided. The overall accuracy
reflects a classifier's ability of classifying each sample into its true class. In the case of
network intrusion detections, it is a comprehensive performance metric of an IDS's
classification power at both benign and attack samples. It is defined to be the ratio of the total
correct predictions made by the classifier to the total number of the input samples. The
overall accuracy is calculated as follows:

Accuracy=TNoverall+TPoverallTNoverall+TPoverall+FNoverall+FPoverall.(5)

B. Performance on the α-Dataset

The proposed IDS model is trained on the training set from the α-Dataset. The Adam
optimizer [23] is used in the training process. The learning rate is set to 0.0002 for the first 50
epochs and decreased to its 1/10 for refine tuning. Some performance benchmarks of the
trained model are shown in Table 4. In Table 4, DR and TNR of the proposed model on the
training set and test set are presented. The proposed model reaches over 90% in DR at 12
attack classes out of the total 14 classes. In addition, at 10 attack classes, the proposed model
successfully yields over 99% at DR. Also, TNR of the benign class reaches 98.984%. The
proposed IDS model performs relatively poorly at the Botnet and Sql Injection class when
compared to the other attack classes. This is due to the small size of in-class samples of the
two classes. Botnet and Sql Injection only make proportions of 0.07% and 0.001%,
respectively, in the original CICIDS2017 database. Accordingly, the proposed IDS model
struggles to capture the characteristics of these two types of network traffic due to their
insufficient samples. This also happened at some other IDS models using CICIDS2017 as the
training database, as shown in Table 5.
TABLE 4: Performance of the Proposed Model on α -Dataset
TABLE 5: Class-Wise Performance of the Proposed IDS Model and Other Classifiers

C. Performance Comparison

The proposed IDS model is also evaluated by comparing to some well-known classifiers.
In Table 5, the class-wise DR and TNR of the following nine models: Hierarchical [17],
WISARD [24], Forest PA [25], J48, LIBSVM [26], FURIA [27], Random Forest, MLP, and
Naive Bayes are presented. In addition, the figures in Table 5 that are bolded represent the
best performer in a class. For example, the 98.984% TNR yielded by the proposed IDS is
bolded to indicate it has the highest TNR performance among the competitors.

Among the selected models, the proposed IDS model has the highest DR at 10 attack classes
out of the total 14 classes. These 10 classes are the Bot, DDoS, DoS GoldenEye, DoS Hulk,
DoS Slowhttptest, DoS slowloris, Heartbleed, Port Scan, Web Attack-Brute Force, and Web
Attack-XSS. For the rest of the four attack classes, in terms of DR, the proposed model is
ranked the second at FTP-Patator, the seventh at SSH-Patator, the third at Infiltration, and the
fourth at Web Attack-Sql Injection. In addition, the proposed model has the highest TNR for
benign samples compared to the nine selected models.

This also means that this model has the lowest FAR, which can be calculated by (19).
Accordingly, out of the total 15 traffic classes in CICIDS2017, the proposed model has the
best classification performance at 11 classes when compared to the rest of the nine classifier
models. Finally, in Figs. 3 and 4, the overall attack DR, FAR and accuracy of the ten selected
classifiers are presented to provide a visually intuitive performance comparison. From the
figures, it is easily observable that the proposed model has the highest overall DR and
accuracy and the lowest FAR within the ten models.

Fig 4.1: Overall detection rate and accuracy of the ten selected classifiers.

Fig 4.2: False alarm rate of the ten selected classifiers.


In addition, it has a minimal performance gap between the overall DR and accuracy. Noted
that the overall DR is calculated by the ratio of the attack samples that are not classified as
benign by the model. And the overall accuracy is calculated based on the ratio of the samples
that are correctly classified into its true case. Therefore, the small gap between overall DR
and accuracy indicates that the proposed model is not only highly capable of detecting
cyberattacks, but also precisely classifies the attacks into their true class.

In addition, the proposed IDS is compared with two other CNN based IDS that are recently
published. The compared models have been introduced in the literature review
chapter [8], [9]. Both studies perform intrusion detections at the CICIDS dataset with a
similar CNN architecture yet different data pre-processing methods. In [8], the raw
CICIDS2017 dataset, which included separate sub-datasets of the network traffic in a day,
was used to train the models. The authors trained and evaluated a CNN based IDS based on
each of the sub-dataset, which mostly included only 1∼3 attack classes. In [9], the authors
manually separated the CICIDS2018 dataset into ten sub-datasets, and each sub-dataset
included up to three attack classes. Therefore, the CNN based IDS in [8], [9] both performed
one-class or a subset of class classification, and the model's accuracy, DR, and TNR were
provided. In Table 6, the DR and TNR of the proposed IDS and the compared CNN based
IDS are provided. The proposed IDS is superior to the other models in many aspects. First, in
the comparison, the proposed IDS is the only model performing true multi-class classification
on all the 15 classes that are available in the CICIDS dataset. The other two CNN based IDS,
while performing well on binary classification on one attack class, have poor performance if
there are many attack classes need to be classified. Second, in term of class-wise
classification performance, the proposed IDS also has the highest TNR, and a higher or
comparable DR at each attack class. In summary, the proposed IDS, thanks to the innovative
dataset pre-processing method, not only can classify more classes at once with one CNN
classifier but also yields a competitive benchmark at each class-wise classification category
compared with other previously proposed CNN based IDS.
TABLE 6: Class-Wise Performance of Other CNN Models on CICIDS2017

D. Performance on Innovative Attacks

To test how the proposed model performs on innovative attacks, the data from CICIDS2017
is used to design some custom training and testing datasets for simulation. More specifically,
the DoS attacks is selected to be the test subject due to its diversity.

CICIDS2017 has four DoS varieties, namely, the DoS GoldenEye, DoS Hulk, DoS
Slowhttptest and DoS slowloris. During the simulation, four IDS models are generated, each
has the same architecture of the proposed model, with four combinations of training and test
dataset. Each time a model is trained, only one type of the DoS samples are included in the
training dataset to represent the known attacks. Meanwhile, the rest of the three DoS varieties
are only used in the test dataset to represent the innovative attacks. In this scenario, since the
model never encounters the innovative DoS samples in the test dataset, some traditional
signature-based IDS models may fail easily due to the lack of signature information in the
database. On the other hand, the proposed model has the potential to recognize these
innovative DoS samples.

The proposed IDS model, which is a neural network-based model, captures the generalized
characteristics of each type of cyberattacks. Therefore, if some innovative DoS samples share
similar traits with the DoS samples in the training dataset, they might be successfully
classified as the DoS class by the model.

The four models are trained separately and their DR on the innovative DoS attacks in the test
dataset is recorded. The simulation results of all the four models are presented in Table 7.
From Table 7, one can easily observe that when only the DoS GoldenEye samples are used in
the model's training process, the model has the potential to also detect DoS Hulk at a decent
87.29% DR.

TABLE 7: Performance of the Proposed IDS Model At Innovative DoS Attacks

Besides, when the model is only trained on the DoS slowloris samples, it has the potential to
also detect DoS Slowhttptest at an 85.73% DR. These simulation results confirm that the
proposed model is superior to the traditional IDS at detecting innovative attacks.
CHAPTER-5

IOT INTRUSION DETECTION SYSTEMS TECHNIQUES

IoT Interruption is characterized as an unapproved activity or movement that hurts the IoT
biological system. For instance, an assault that will make the PC administrations inaccessible
to its real clients is viewed as an interruption. An IDS is characterized as a product or
equipment framework that keeps up with the security of the framework by recognizing
vindictive exercises on the PC frameworks. The primary point of IDS is to distinguish
unapproved PC utilization and vindictive organization traffic which is preposterous while
utilizing a customary firewall. This outcomes in making the PC frameworks exceptionally
defensive against the noxious activities that compromise the accessibility, respectability, or
secrecy of PC frameworks. A. Signature-based intrusion detection systems (SIDS) Signature
interruption location frameworks (SIDS) use design matching procedures to track down a
referred to assault; these are otherwise called Information based Recognition. In SIDS,
matching techniques are utilized to track down a past interruption. As such, when an
interruption signature matches the mark of a past interruption that as of now exists in the
mark data set, an alert sign is set off. For SIDS, the host's logs are reviewed to observe
arrangements of orders or activities which have recently been distinguished as malware.
SIDS has likewise been named in the writing as Information Based Discovery or Abuse
Recognition. Customary strategies for SIDS experience issues in distinguishing assaults that
length different parcels as they inspect network bundles and perform matching against an
information base of marks. With the expanded refinement of current malware, separating
mark data from different bundles might be required. With this, IDS needs to bring the
substance of prior parcels also. For making a mark for SIDS, by and large, there have been a
few strategies where marks are made as state machines, formal language string designs or
semantic circumstances. B. Anomaly-based intrusion detection system (AIDS) © 2022
IJRAR September 2022, Volume 9, Issue 3 www.ijrar.org (E-ISSN 2348-1269, P- ISSN
2349-5138) IJRAR22C2601 International Journal of Research and Analytical Reviews
(IJRAR) www.ijrar.org 932 Helps has drawn in a great deal of researchers due to its element
to beat the constraint of SIDS. In Helps, a typical model of the conduct of a PC framework is
made utilizing AI, measurable based or information based techniques. Any huge deviation
between the noticed conduct and the model is viewed as an irregularity, which can be
deciphered as an interruption. This sort of strategy chips away at the way that pernicious
conduct is not quite the same as commonplace client conduct. The conduct of unusual clients
that separates from the standard conduct is characterized as an interruption. There are two
stages in the advancement of Helps: the preparation stage and the testing stage. In the
preparation stage, the typical traffic profile is utilized to gain proficiency with a model of
ordinary conduct. In the testing stage, another informational index is utilized to foster the
framework's ability to sum up to beforehand inconspicuous interruptions. Helps can be sub-
arranged in light of the strategy utilized for preparing, for example, factual based, information
based and AI based. The primary benefit of Helps is the capacity to distinguish zero-day
assaults on the grounds that perceiving the strange client movement doesn't depend on a mark
information base. Helps sets off a risk signal when the inspected conduct goes amiss from
ordinary conduct. Moreover, Helps has various advantages. To begin with, they can find
inside malignant exercises. Assuming an interloper begins making exchanges in a taken
record that are unidentified in the average client movement, it makes a caution. Second, it is
trying for a cybercriminal to perceive what a typical client conduct is without delivering a
ready as the framework is developed from redid profiles. C. Machine Learning based
Technique AI is the most common way of separating information from huge amounts of
information. AI models include a bunch of rules, techniques, or complex "move works" that
can be applied to observe intriguing information designs or to perceive or anticipate conduct.
AI procedures have been applied broadly in the space of Helps. To extricate the information
from interruption datasets, various calculations and strategies, for example, grouping, brain
organizations, affiliation rules, choice trees, hereditary calculations, and closest neighbor
techniques are used. Some earlier examination has analyzed the utilization of various
strategies to assemble AIDSs. Analyzed the presentation of two element determination
calculations including Bayesian organizations (BN) and Characterization Relapse Trees
(CRC) and consolidated these strategies for higher exactness. Procedures of component
determination utilizing a mix of element choice calculations like Data Gain (IG) and
Connection Characteristic assessment. They tried the presentation of the chose highlights by
applying different order calculations like C4.5, guileless Bayes, NB-Tree and Multi-facet
Perceptron. A hereditary fluffy rule mining strategy has been utilized to assess the
significance of IDS highlights. NIDS by utilizing the Arbitrary Tree model to further develop
exactness and diminish the misleading problem rate. The primary point of utilizing AI
strategies is to make IDS that requires less human information and further develop exactness.
The amount of Helps which utilizes AI procedures has been expanding over the most recent
couple of years. The fundamental target of IDS in light of AI research is to distinguish
examples and fabricate an interruption discovery framework in view of the dataset. For the
most part, there are two classes of AI strategies, regulated and unaided. Internet of Things
(IoT) are interconnected systems of devices that facilitate seamless information exchange
between physical devices. These devices could be medical and healthcare devices, driverless
vehicles, industrial robots, smart TVs, wearables and smart city infrastructures; and they can
be remotely monitored and regulated. IoT devices are expected to become more prevalent
than mobile devices and will have access to the most sensitive information, such as personal
information. This will result in increasing attack surface area and probabilities of attacks will
increase. As security will be a vital supporting element of most IoT applications, IoT
intrusion detection systems need also be developed to secure communications enabled by
such IoT technologies (Granjal et al., 2015).
In the last few years, advancement in Artificial Intelligent (AI) such as machine learning and
deep learning techniques has been used to improve IoT IDS (Intrusion Detection System).
The current requirement is to do an up-to-date, thorough taxonomy and critical review of this
recent work. Numerous related studies applied different machine learning and deep learning
techniques through various datasets to validate the development of IoT IDS. But, it’s still not
clear that which dataset, machine learning or deep learning techniques are more effective for
building an efficient IoT IDS. Secondly, the time consumed in building and testing IoT IDS
is not considered in the evaluation of some IDSs techniques, despite being a critical factor for
the effectiveness of ‘on-line’ IDSs (Khraisat et al., 2019a).
This paper provides an up to date taxonomy, together with a critical review of the significant
research works on IoT IDSs up to the present time; and a classification of the proposed
systems according to the taxonomy. It provides a structured and comprehensive overview of
the existing IoT IDSs so that a researcher can become quickly familiar with the key aspects of
IoT IDS. This paper also provides a critical review of machine learning and deep learning
techniques applied to build IoT IDS. The detection techniques, validation strategies,
deployment strategies are reviewed, along with several techniques used in each method. The
complexity of different detection techniques, intrusion deployment strategy, and their
evaluation techniques are discussed, followed by a set of suggestions identifying the best
techniques, depending on the nature of the IoT IDS. Challenges for the current IoT IDSs are
also discussed. Compared to previous survey publications (Khraisat et al., 2019a; Benkhelifa
et al., 2018; Chaabouni et al., 2019; Zarpelao et al., 2017; Hindy et al., 2018) this paper
presents a discussion on IoT techniques, IoT deployment strategy and IDS dataset problems
which are of main concern to the research community in the area of IoT intrusion detection
systems (IDS). Prior studies such as (Yang et al., 2017; Yar & Steinmetz, 2019) have not
completely reviewed IoT IDSs in terms of the datasets, challenges and techniques. In this
paper, we provide a structured and contemporary, wide-ranging study on IDS in terms of
techniques, IoT attacks and datasets; and also highlight challenges of the IoT techniques and
then make recommendations.
During the last few years, several surveys on IoT IDS have been published. Table 1 shows
the IDS techniques and datasets covered by this survey and previous survey papers. The
comparison that in this table discusses the contributions of each survey related to the develop
intrusion detection system for IoT. The survey on intrusion detection systems and taxonomy
by Axelsson (Axelsson, 2000) classified intrusion detection systems based on the detection
methods. The highly cited survey by Debar et al. (Debar et al., 2000) surveyed detection
methods based on the behaviour and knowledge profiles of the attacks. A taxonomy of IoT
intrusion systems by Liao et al. (Liao et al., 2013a), has presented a classification of five
subclasses with an in-depth perspective on their characteristics: Statistics-based, Pattern-
based, Rule-based, State-based and Heuristic-based.
The highly cited survey by Alvarenga et al. (Zarpelao et al., 2017) reviews the IoT security
issues in general. Attacks against IoT devices are not discussed in their studies, such as
Denial of Service (DoS) Attack and attack on RPL (Routing Protocol for Low-Power and
Lossy Networks). Critical Infrastructure such as power systems, transport, the internet, air
traffic control, railways and power plants could all be disrupted by an IoT attacker. The
authors reviewed intrusion detection in IoT, and they presented a great taxonomy to classify
the IoT IDSs based on detection method, IDS placement strategy, and security threat and
validation strategy. It was also indicated by Alvarenga et al. (Zarpelao et al., 2017) in 2017
that intrusion detection for IoT is still in an initial stage and that the existing IDSs do not
enough for a wide variety of IoT attacks. This paper explored and discussed if the recent IoT
IDSs are enough to deal with different IoT attacks.
Existing review articles (e.g., such as (Chaabouni et al., 2019; da Costa et al., 2019; Buczak
& Guven, 2016; Lunt, 1988; Agrawal & Agrawal, 2015)) focus on intrusion detection
techniques or dataset issue or type of computer attack and IDS evasion. No articles
comprehensively reviewed IoT IDS, dataset problems, deployment strategies, IoT Intrusion
techniques, and different kinds of attack altogether. In addition, the development of IoT IDS
has been such that several different systems have been proposed in the meantime, and so
there is a need for an up-to-date. The updated survey of the taxonomy of IoT IDS discipline is
presented in this paper further enhances taxonomies given in (Khraisat et al., 2019a;
Benkhelifa et al., 2018; Chaabouni et al., 2019; Liao et al., 2013a).
Given the discussion on prior surveys, this article focuses on the following:
 Classifying various kinds of IoT IDS based on intrusion techniques, deployment
strategy, and validation strategy.
 Presenting a recent works effort to improve IoT security IDS.
 Taxonomy of IoT attacks.
 Discussion of available IDS datasets.
 The challenges of IoT IDS.
Intrusion detection in the internet of things

In this section, a review of the existing IDS research for IoT is presented. Each research was
categorized by considering the following characteristics: IDS placement strategy, detection
method, and validation strategy. Figure 1 shows the classification of IDS for IoT networks,
while Table 1 provides some recent related research.

Fig: 5.1 Classification of IDSs for IoT

Figure 1 shows the IDS techniques, deployment strategy, validation strategy, attacks on IoT
and datasets covered by this paper and previous research papers. The variety in the IoT IDS
surveys indicates that a study of IDS for IoT must be reviewed. Specifically, none of these
surveys cover all detection methods of IoT, which is considered crucial because of the
heterogeneous nature of the IoT ecosystem. For that reason, this survey review IDS for IoT
from a broad technological scale.

IoT intrusion detection systems methods

IoT Intrusion is defined as an unauthorised action or activity that harms the IoT ecosystem. In
other words, an attack that results in any kind of damage to the confidentiality, integrity or
availability of information is considered an intrusion. For example, an attack that will make
the computer services unavailable to its legitimate users is considered an intrusion. An IDS is
defined as a software or hardware system that maintains the security of the system by
identifying malicious activities on the computer systems (Liao et al., 2013a). The main aim of
IDS is to identify unauthorised computer usage and malicious network traffic which is not
possible while using a traditional firewall. This results in making the computer systems
highly protective against the malicious actions that compromise the availability, integrity, or
confidentiality of computer systems. IDS system has two main sub-categories: Signature-
based Intrusion Detection System (SIDS) and Anomaly-based Intrusion Detection System
(AIDS).

Signature-based intrusion detection systems (SIDS)

Signature intrusion detection systems (SIDS) utilize pattern matching techniques to find a
known attack; these are also known as Knowledge-based Detection or Misuse Detection
(Khraisat et al., 2018). In SIDS, matching methods are used to find a previous intrusion. In
other words, when an intrusion signature matches the signature of a previous intrusion that
already exists in the signature database, an alarm signal is triggered. For SIDS, the host’s logs
are inspected to find sequences of commands or actions which have previously been
identified as malware. SIDS has also been labelled in the literature as Knowledge-Based
Detection or Misuse Detection (Modi et al., 2013).
Figure 2 demonstrates the conceptual working of SIDS approaches. The main idea is to build
a database of intrusion signatures and to compare the current set of activities against the
existing signatures and raise the alarm if a match is found. For example, a rule in the form of
“if: antecedent -then: consequent” may lead to “if (source IP address=destination IP address)
then label as an attack “.
Fig: 5.2 Conceptual working of SIDS approache
SIDS usually gives an excellent detection accuracy for previously known intrusions (Kreibich
& Crowcroft, 2004). However, SIDS is unable to detect zero-day attacks as the database does
not contain a matching signature until the signature of the new attack is extracted and stored.
SIDS is employed in a number of common tools, such as Snort (Roesch, 1999) and NetSTAT
(Vigna & Kemmerer, 1999).
Traditional methods of SIDS have difficulty in identifying attacks that span multiple packets
as they examine network packets and perform matching against a database of signatures.
With the increased sophistication of modern malware, extracting signature information from
multiple packets may be required. With this, IDS needs to bring the contents of earlier
packets as well. For creating a signature for SIDS, generally, there have been several methods
where signatures are created as state machines (Meiners et al., 2010), formal language string
patterns or semantic conditions (Lin et al., 2011).
With the increasing rate of zero-day attacks (Symantec, 2017), SIDS techniques have become
progressively less effective because of the absence of signature for any such attacks. The
other factors such as the polymorphic variants of the malware and the rising amount of
targeted attacks also add up in compromising the adequacy of this traditional model. A
potential solution to this problem would be to use AIDS techniques. AIDS works by
differentiating between acceptable and unacceptable behaviour rather than profiling what is
anomalous, as described in the next section, as described in the next section.

Anomaly-based intrusion detection system (AIDS)

AIDS has attracted a lot of scholars because of its feature to overcome the limitation of SIDS.
In AIDS, a normal model of the behavior of a computer system is created using machine
learning, statistical-based or knowledge-based methods. Any significant deviation between
the observed behavior and the model is regarded as an anomaly, which can be interpreted as
an intrusion. This kind of technique works on the fact that malicious behaviour is different
from typical user behaviour. The behaviour of abnormal users that differentiates from the
standard behaviour is defined as an intrusion. There are two phases in the development of
AIDS: the training phase and the testing phase. In the training phase, the normal traffic
profile is used to learn a model of normal behaviour. In the testing phase, a new data set is
used to develop the system’s capacity to generalise to previously unseen intrusions. AIDS can
be sub-categorized based on the method used for training, for instance, statistical-based,
knowledge-based and machine learning-based (Butun et al., 2014).
The main advantage of AIDS is the ability to identify zero-day attacks because recognizing
the abnormal user activity does not rely on a signature database (Alazab et al., 2012). AIDS
triggers a danger signal when the examined behavior deviates from normal behavior.
Furthermore, AIDS has a number of benefits. First, they can discover internal malicious
activities. If an intruder starts making transactions in a stolen account that are unidentified in
the typical user activity, it creates an alarm. Second, it is challenging for a cybercriminal to
recognize what is a normal user behavior without producing an alert as the system is
constructed from customized profiles.
Table 2 presents the differences between signature-based detection and anomaly-based
detection. The main difference between these two is that AIDS can discover zero-day attacks,
whereas SIDS can only detect previously known intrusions. However, AIDS can result in a
high false-positive rate because anomalies may just be new normal activities rather than
genuine intrusions.
Since there is a lack of a taxonomy for anomaly-based intrusion detection systems, we have
identified five subclasses based on their features: Statistics-based, Pattern-based, Rule-based,
State-based and Heuristic-based as shown in Table 3.

Techniques for implementing AIDS

This section presents an overview of AIDS approaches proposed in recent years for
improving detection accuracy and reducing false alarms.
AIDS methods can be categorized into four main groups: supervised learning (Chao et
al., 2015), unsupervised learning (Elhag et al., 2015; Can & Sahingoz, 2015), reinforcement
learning and deep learning (Buczak & Guven, 2016; Meshram & Haas, 2017). Supervised
learning involves collecting and examining every input variable and an output variable and
you use an algorithm to learn the normal user behaviour from the input to the output. The
objective is to approximate the mapping function so well that when a new input record is
collected that predicts the output variables for that record. On the other hand, Unsupervised
learning tries to identify the requested actions from existing system data such as protocol
specifications and network traffic instances where you only have input data and no
corresponding output variables, while reinforcement learning methods enable an agent to
learn in an interactive environment by trial and error using feedback from its own actions and
experiences. In reinforcement learning, the aim is to obtain an appropriate action model that
would maximize the total cumulative reward of the agent. Deep learning models are based on
artificial neural networks, specifically convolutional neural networks (CNN)s. These four
classes along with examples of their subclasses are shown in Fig. 3.

Fig: 5.3 Classification of AIDS methods

Machine learning is the process of extracting knowledge from large quantities of data.
Machine learning models comprise of a set of rules, methods, or complex “transfer functions”
that can be applied to find interesting data patterns or to recognise or predict behaviour (Dua
& Du, 2016). Machine learning techniques have been applied extensively in the area of
AIDS. To extract the knowledge from intrusion datasets, different algorithms and techniques
such as clustering, neural networks, association rules, decision trees, genetic algorithms, and
nearest neighbour methods are utilized.
Some prior research has examined the use of different techniques to build AIDSs. Chebrolu
et al. examined the performance of two feature selection algorithms involving Bayesian
networks (BN) and Classification Regression Trees (CRC) and combined these methods for
higher accuracy (Chebrolu et al., 2005).
Bajaj et al. proposed a technique for feature selection using a combination of feature selection
algorithms such as Information Gain (IG) and Correlation Attribute evaluation. They tested
the performance of the selected features by applying different classification algorithms such
as C4.5, naïve Bayes, NB-Tree and Multi-Layer Perceptron (Khraisat et al., 2018; Bajaj &
Arora, 2013). A genetic-fuzzy rule mining method has been used to evaluate the importance
of IDS features (Elhag et al., 2015). Thaseen et al. proposed NIDS by using the Random Tree
model to improve accuracy and reduce the false alarm rate (Thaseen & Kumar, 2013).
Subramanian et al. proposed classifying the NSL-KDD dataset using decision tree algorithms
to construct a model for their metric data and studying the performance of decision tree
algorithms (Subramanian et al., 2012).
Various AIDSs have been created based on machine learning techniques as shown in Fig. 4.
The main aim of using machine learning methods is to create IDS that requires less human
knowledge and improve accuracy. The quantity of AIDS which makes use of machine
learning techniques has been increasing in the last few years. The main objective of IDS
based on machine learning research is to detect patterns and build an intrusion detection
system based on the dataset. Generally, there are two categories of machine learning
methods, supervised and unsupervised.

Fig: 5.4 Conceptual working of AIDS approaches based on machine learning

Supervised learning in intrusion detection system

This subsection presents various supervised learning techniques for IDS. Each technique is
presented in detail, and references to important research publications are presented.
Supervised learning-based IDS techniques detect intrusions by using labeled training data. A
supervised learning approach usually consists of two stages, namely, training and testing. In
the training stage, relevant features and classes are identified and then the algorithm learns
from these data samples. In supervised learning IDS, each record is a pair, containing a
network or host data source and an associated output value (i.e., label), namely intrusion or
normal. Next, feature selection can be applied to eliminating unnecessary features. Using the
training data for selected features, a supervised learning technique is then used to train a
classifier to learn the inherent relationship that exists between the input data and the labelled
output value. A wide variety of supervised learning techniques have been explored in the
literature, each with its advantages and disadvantages. In the testing stage, the trained model
is used to classify the unknown data into intrusion or normal class. The resultant classifier
then becomes a model that, given a set of feature values, predicts the class to which the input
data might belong. Figure 5 shows a general approach for applying classification techniques.
The most existing IDSs proposed are trained in a supervised way. It implies that the
cybersecurity professional need to label the network traffic and revise the model manually
from time to time.

Fig: 5.5 Classification as the task

There are many classification methods such as decision trees, rule-based systems, neural
networks, support vector machines, naïve Bayes and k-nearest neighbour. Each technique
uses a learning method to build a classification model. However, a suitable classification
approach should not only handle the training data, but it should also identify accurately the
class of records it has not ever seen before. Creating classification models with reliable
generalization ability is an important task of the learning algorithm.

Decision trees

A decision tree has three basic components. The first component is a decision node, which is
used to identify a test attribute. The second is a branch, where each branch represents a
possible decision based on the value of the test attribute. The third is a leaf that comprises the
class to which the instance belongs (Rutkowski et al., 2014). There are many different
decision tree algorithms, including ID3 (Quinlan, 1986), C4.5 (Quinlan, 2014) and CART
(Breiman, 1996).

Naïve Bayes

This approach is based on applying Bayes' principle with robust independence assumptions
among the attributes. Naïve Bayes answers questions such as “what is the probability that a
particular kind of attack is occurring, given the observed system activities?” by applying
conditional probability formulae. Naïve Bayes relies on the features that have different
probabilities of occurring in attacks and normal behavior. The naïve Bayes classification
model is one of the most prevalent models in IDS due to its ease of use and calculation
efficiency, both of which are taken from its conditional independence assumption property
(Yang & Tian, 2012). However, the system does not operate well if this independence
assumption is not valid, as was demonstrated on the KDD’99 intrusion detection dataset,
which has complex attribute dependencies (Koc et al., 2012). The results also reveal that the
Naïve Bayes model has reduced accuracy for large datasets. A further study showed that the
more sophisticated Hidden Naïve Bayes (HNB) model can be applied to IDS tasks that
involve high dimensionality, extremely interrelated attributes and high-speed networks (Koc
et al., 2012).

Genetic algorithms (GA)

Genetic algorithms are a heuristic approach to optimization, based on the principles of


evolution. Each possible solution is represented as a series of bits (genes) or chromosomes,
and the quality of the solutions improves over time by the application of selection and
reproduction operators, biased to favour fitter solutions. In applying a genetic algorithm to
the intrusion classification problem, there are typically two types of chromosome encoding:
one is according to clustering to generate binary chromosome coding method; another is
specifying the cluster center (clustering prototype matrix) by an integer coding chromosome.
Murray et al. have used GA to evolve simple rules for network traffic (Murray et al., 2014).
Every rule is represented by a genome and the primary population of genomes is a number of
random rules. Each genome is comprised of different genes that correspond to characteristics
such as IP source, IP destination, port source, port destination and 1 protocol type (Hoque &
Bikas, 2012).
Artificial neural network (ANN)

ANN is one of the most broadly applied machine-learning methods and has been shown to be
successful in detecting different malware. The most frequent learning technique employed for
supervised learning is the backpropagation (BP) algorithm. The BP algorithm assesses the
gradient of the network’s error with respect to its modifiable weights. However, for ANN-
based IDS, detection precision, particularly for less frequent attacks, and detection accuracy
still need to be improved. The training dataset for less-frequent attacks is small compared to
that of more-frequent attacks, and this makes it difficult for the ANN to learn the properties
of these attacks correctly. As a result, detection accuracy is lower for less frequent attacks. In
the information security area, huge damage can occur if low-frequency attacks are not
detected. For instance, if the User to Root (U2R) attacks evade detection, a cybercriminal can
gain the authorization privileges of the root user and thereby carry out malicious activities on
the victim’s computer systems. In addition, less common attacks are often outliers (Wang et
al., 2010). ANNs often suffer from local minima and thus learning can become very time-
consuming. The strength of ANN is that, with one or more hidden layers, it can produce
highly nonlinear models that capture complex relationships between input attributes and
classification labels. With the development of many variants such as recurrent and
convolutional NNs, ANNs are powerful tools in many classification tasks including IDS.

Fuzzy logic

This technique is based on the degrees of uncertainty rather than the typical true or false
Boolean logic on which the contemporary PCs are created. Therefore, it presents a
straightforward way of arriving at a conclusion based upon unclear, ambiguous, noisy,
inaccurate or missing input data. With a fuzzy domain, fuzzy logic permits an instance to
belong, possibly partially, to multiple classes at the same time. Therefore, fuzzy logic is a
good classifier for IDS problems as the security itself includes vagueness, and the borderline
between the normal and abnormal states is not well identified. In addition, the intrusion
detection problem contains various numeric features in the collected data and several derived
statistical metrics. Building IDSs based on numeric data with hard thresholds produces high
false alarms. An activity that deviates only slightly from a model could not be recognized, or
a minor change in normal activity could produce false alarms. With fuzzy logic, it is possible
to model this minor abnormality to keep the false rates low. Elhag et al. showed that with
fuzzy logic, the false alarm rate in determining intrusive actions could be decreased. They
outlined a group of fuzzy rules to describe the normal and abnormal activities in a computer
system, and a fuzzy inference engine to define intrusions (Elhag et al., 2015).

Support vector machines (SVM)

SVM is a discriminative classifier defined by a splitting hyperplane. SVMs use a kernel


function to map the training data into a higher-dimensioned space so that intrusion is linearly
classified. SVMs are well known for their generalization capability and are mainly valuable
when the number of attributes is large and the number of data points is small. Different types
of separating hyperplanes can be achieved by applying a kernel, such as linear, polynomial,
Gaussian Radial Basis Function (RBF), or hyperbolic tangent. In IDS datasets, many features
are redundant or less influential in separating data points into correct classes. Therefore,
feature selection should be considered during SVM training. SVM can also be used for
classification into multiple classes. In the work by Li et al., an SVM classifier with an RBF
kernel was applied to classify the KDD 1999 dataset into predefined classes (Li et al., 2012).
From a total of 41 attributes, a subset of features was carefully chosen by using a feature
selection method.

Hidden Markov model (HMM)

HMM is a statistical Markov model in which the system being modeled is assumed to be a
Markov process with unseen data. Prior research has shown that HMM analysis can be
applied to identify particular kinds of malware (Annachhatre et al., 2015). In this technique, a
Hidden Markov Model is trained against known malware features (e.g., operation code
sequence) and once the training stage is completed, the trained model is applied to score the
incoming traffic. The score is then contrasted to a predefined threshold, and a score greater
than the threshold indicates malware. Likewise, if the score is less than the threshold, the
traffic is identified as normal.
K-Nearest Neighbors (KNN) classifier: The k-Nearest Neighbor (k-NN) technique is a
typical non-parametric classifier applied in machine learning (Lin et al., 2015). The idea of
these techniques is to name an unlabelled data sample to the class of its k nearest neighbors
(where k is an integer defining the number of neighbors to be considered). Figure 6 illustrates
a K-Nearest Neighbors classifier where k = 5. The point X represents an instance of
unlabelled data that needs to be classified. Amongst the five nearest neighbors of X, there are
three similar patterns from the class Intrusion and two from the class Normal. Taking a
majority vote enables the assignment of X to the Intrusion class.

Fig: 5.6 An example of classification by k-Nearest Neighbour for k = 5

k-NN can be appropriately applied as a benchmark for all the other classifiers because it
provides a good classification performance in most IDSs (Lin et al., 2015).

Ensemble methods

Multiple machine learning algorithms can be used to obtain better predictive performance
than any of the constituent learning algorithms alone (Vasan et al., 2020a). Training several
classifiers at the same stage to detect different attacks, and then uniting their result to increase
the detection rate. Typically, the ensemble’s ability is better than a single classifier’s, as it can
enhance weak classifiers to produce better results than can a solitary classifier (Aburomman
& Reaz, 2017). Several different ensemble methods have been proposed, such as Boosting,
Bagging and Stacking. Boosting refers to a family of algorithms that can transform weak
learners into strong learners. Bagging means training the same classifier on different subsets
of the same dataset. Stacking combines various classification via a meta-classifier
(Aburomman & Ibne Reaz, 2016). The base-level models are built based on a whole training
set, and then the meta-model is trained on the outputs of the base level model as attributes.
Researchers have revealed that the combination of different classifier techniques is an
effective way to resolve the shortcomings traditional IDSs have when they are applied for
IoT. Jabbar et al. proposed an ensemble classifier that is built using Random Forest and also
the Average One-Dependence Estimator (AODE which solves the attribute dependency
problem in the Naïve Bayes classifier. Random Forest (RF) enhances precision and reduces
false alarms (Jabbar et al., 2017). It is combining both approaches in ensemble results in
improved accuracy over either technique applied independently.
More recently, Khraisat, et al. (Khraisat et al., 2019b) proposed a stacking ensemble method
that combined the C5 decision tree classifier and one-class support vector machine. The
reported classification accuracy of detection of malware is 94% on the IoT intrusion dataset
C5 decision tree classifier, while it is 92.5% in stage two. They reported in the stacking
ensemble, and the classification accuracy was 99.97%.

Unsupervised learning in intrusion detection system

Unsupervised learning is a kind of machine learning that makes use of input datasets without
class labels to extract interesting information. The input data points are normally treated as a
set of random variables. A joint density model is then created for the data set. In supervised
learning, the output labels are given and used to train the machine to get the required results
for an unseen data point. In contrast, in unsupervised learning, no labels are given, and
instead, the data is grouped automatically into various classes through the learning process. In
the context of developing an IDS, unsupervised learning means, use of a mechanism to
identify intrusions by using unlabelled data to train the model. IoT network traffic is clustered
into groups, based on the similarity of the traffic, without the need to pre-define these groups.
As shown in Fig. 7, once records are clustered, all of the cases that appear in small clusters
are labeled as an intrusion because the normal occurrences should produce sizable clusters
compared to the anomalies. In addition, malicious intrusions and normal instances are
dissimilar, thus they do not fall into an identical cluster.
Fig: 5.7 Using Clustering for Intrusion Detection

K-means

The K-means technique is one of the most prevalent techniques of clustering analysis that
aims to separate ‘n’ data objects into ‘k’ clusters in which each data object is selected in the
cluster with the nearest mean. K means it is an iterative clustering algorithm that aids to
obtain the highest value for every iteration. It is a distance-based clustering technique and it
does not need to compute the distances between all combinations of records. It applies a
Euclidean metric as a similarity measure. The number of clusters is determined by the user in
advance. Typically, several solutions will be tested before accepting the most appropriate
one. Annachhatre et al. used the K-means clustering algorithm to identify different host
behaviour profiles (Annachhatre et al., 2015). They have proposed new distance metrics that
can be used in the k-means algorithm to relate the clusters closely. They have clustered data
into several clusters and associated them with known behaviour for evaluation. Their
outcomes have revealed that k-means clustering is a better approach to classify the data using
unsupervised methods for intrusion detection when several kinds of datasets are available.
Clustering could be used in IDS for reducing intrusion signatures, generate a high-quality
signature or similar group intrusion.

Probabilistic clustering

This technique uses probability distribution to create the clusters.


Principal component analysis

is a common method for obtaining a set of low dimensional features from the largest set of
features.

Hierarchical clustering

This is a clustering technique that aims to create a hierarchy of clusters. Approaches for
hierarchical clustering are normally classified into two categories:
1. (i)
Agglomerative- bottom-up clustering techniques where clusters have sub-clusters,
which in turn have sub-clusters and pairs of clusters are combined as one moves up
the hierarchy.
2. (ii)
Divisive - hierarchical clustering algorithms where iteratively the cluster with the
largest diameter in feature space is selected and separated into binary sub-clusters
with a lower range.

Singular value decomposition

It is a method of decomposing a matrix into other matrices as a series of linear


approximations that expose the underlying meaning-structure of the matrix. The goal of
Singular Value Decomposition is to uncover the optimal set of features that best predict the
detection.

Independent component analysis

It is used for showing hidden factors that underlie sets of random features.
A lot of work has been done in the area of the cyber-physical control system (CPCS) with
attack detection and reactive attack mitigation by using unsupervised learning. For example, a
redundancy-based resilience approach was proposed by Alcara (Alcaraz, 2018). He proposed
a dedicated network sublayer that can handle the context by regularly collecting consensual
information from the driver nodes controlled in the control network itself, and discriminating
view differences through data mining techniques such as k-means and k-nearest neighbor.
Chao Shen et al. proposed Hybrid-Augmented device fingerprinting for IDS in Industrial
Control System Networks. They used different machine learning techniques to analyse
network packets to filter anomaly traffic to detect intrusions in ICS networks (Shen et
al., 2018).
Likewise, Khraisat, et al. (Khraisat et al., 2020) experimented with both single and ensemble
classifiers composed of the decision tree, and SVM, for classification of the NLS KDD
intrusion detection evaluation data set. They found that an ensemble of all three classifiers,
based on majority voting, marginally out-performed all other classifiers.

Reinforcement learning

Deep Reinforcement learning utilizes deep learning and reinforcement learning principles for
building IDS. Reinforcement learning involves an agent interacting with an environment. The
agent is trying to achieve a goal of some kind within the environment. The purpose of the
agent is to learn how to interact with its environment in such a way that allows it to achieve
its goals.
Deep reinforcement learning is the application of reinforcement learning to train deep neural
networks. It has an input layer, an output layer, and multiple hidden layers same as prior deep
neural networks. However, our input is the state of the environment. For instance, a bus is
trying to get its passengers to their destination. The inputs are the position, speed, and
direction; our output is a series of possible actions like speed up, slow down, turn left, or turn
right. In addition, we’re feeding our rewards signal into the network so that we can learn to
associate what actions produce positive results given a specific state of the environment.

Deep Q-network

It is combined reinforcement learning and deep neural networks at scale. The algorithm was
developed by enhancing a classic RL algorithm called Q-Learning with deep neural
networks.

Double Q-learning

It is an off-policy reinforcement learning algorithm that utilises double estimation to


counteract overestimation problems with traditional Q-learning.

Deep learning

Deep learning is a form of machine learning where a computer uses a hierarchy of data based
on experience and form multiple layers as an output. Deep learning can be supervised as well
as unsupervised. In the case of supervised deep learning, data can be classified whereas in the
case of unsupervised deep learning data patterns are analyzed. Deep learning is directly
related to artificial intelligence where machines will acquire knowledge by learning with
experience and will replace human intelligence. Deep learning works on the platform of
artificial neural networks by studying massive amounts of data with the help of algorithms
prepared by human intelligence. It is referred to as ‘deep learning’ as the artificial neural
networks possess different deep layers that enables them to learn. Table 4 shows a
Comparison of Machine learning and deep learning. Table 5 shows a summary of the deep
learning model techniques.
In neural networks, each neural node of every single hidden layer calculates the weighted
values receiving from the previous layer and passes on the output values to the subsequent
layer. The result value of the last layer can be considered as the final results achieved by the
neural networks from the raw data.

Fully connected neural networks (FCNN)

Fully Connected Feedforward Neural Networks are the standard network architecture applied
in mainly basic neural network applications. Fully connected denotes that an individual
neuron in the earlier layer is linked to every neuron in the subsequent layer. Feedforward
indicates that neurons in any preceding layer are only ever connected to the neurons in a
subsequent layer. Fully Connected Neural Networks can be used for feature extraction (Wang
et al., 2020).

Recurrent neural network (RNN)

The recurrent neural network can function efficiently on a series of data with variable input
length. This means that RNNs use the information of its prior state as an input for their
current prediction, and we can repeat this process for an arbitrary number of steps allowing
the network to propagate information via its hidden state through time. This is essentially like
giving a neural network a short-term memory. This feature makes RNNs very effective for
working with sequences of data that occur over time. Yin, et al. (Yin et al., 2017) proposed a
deep learning approach for intrusion detection using recurrent neural networks (RNN-IDS).
their experimental results show that RNN-IDS is very appropriate for creating IDS with high
accuracy and that its performance is superior to that of traditional machine learning
classification methods in both binary and multiclass classification.
Generative adversarial networks (GAN)

The Generative Adversarial Network is an integration of two deep learning neural networks:
Generator Network, and a Discriminator Network. The Generator Network produces
synthetic data, and the Discriminator Network tries to detect if the data that it’s seeing is real
or synthetic. These two networks are adversaries in the sense that they’re both competing to
beat one another.

Convolutional neural network (CNN)

A Convolutional Neural Network is contained of one or more convolutional layers and then
linked by one or more fully connected layers as in a standard multilayer neural network
(Vasan et al., 2020b). A convolutional neural network contains an input and an output layer,
as well as multiple hidden layers. The hidden layers of a CNN typically contain a sequence of
convolutional layers that convolve with a multiplication. A CNN receives a 2-D input and
abstracts high-level features via a sequence of hidden layers. CNN’s, which better upon the
architecture of the common neural networks, benefit from spatial features (Vasan et
al., 2020c). Spatial features are usually applied types of traffic features in the area of IDS.
When applying spatial features, network traffic is reformed into traffic images; it follows that
the image classification technique is used to categorize the traffic images, which also
ultimately achieves the objective of detecting the intrusion traffic. This technique is
comparatively recent, but then numerous recent research results prove its great potential. For
example, Vasan, et al. (Vasan et al., 2020b) adopted CNNS techniques and transformed the
raw malware binary into both grayscale and colour images and apply the fine-tuned CNN
architecture.

Autoencoder

An autoencoder is trained to restructure its inputs. Autoencoders have been used for
developing online IoT IDS (Mirsky et al., 2018). In general, an autoencoder trained on X
gains the capability to restructure unobserved instances from the identical data distribution as
X. If an instance does not be appropriate to the model learned from X, then it is expected the
restructure to have a high error.
IoT IDS deployment strategies

IDS can also be classified based on the deployment used to detect IoT attacks. In IDS
Deployment strategies, IDS can be classified as distributed, centralized or hybrid.

Distributed IDS

In distributed placement, the IoT devices could be responsible for checking other IoT
devices. Distributed IDS be made up of several IDS over a big IoT ecosystem, all of which
communicate with each other, or with a central server that assists advanced intrusion
detection systems, packet analysis, and incident response.
Several IDS deploy distributed architectures. This includes a subset of the network checking
the other nodes. Distributed IDS offers the incident analyst many advantages over centralized
IDS. The main benefit is the capability to identify attack forms across a whole IoT
ecosystem. This might increase prompt IoT attack prevention and detection. The additional
supported benefit is to allow early detection of an IoT Botnet creating its way through
corporate IoT devices. This data could then be used to detect and clean systems that have
been infected by the IoT Botnet and stop further spread of the Botnet into the IoT ecosystem
consequently take down any IoT devices damaged that would otherwise have occurred.
Furthermore, the advantage of distributed IDS rather than centralized IDS computing
resources also implies reduced control over those resources.

Centralized IDS

In the centralized IDS location, the IDS is placed in central devices, for instance, in the
boundary switch or a nominated device. All the information that the IoT devices collect and
then send to the network boundary switch passes through the boundary switch (Benkhelifa et
al., 2018). Consequently, the IDS positioned in a boundary switch can check the packets
switched between the IoT devices and the network. Despite this, checking the network
packets that pass through the boundary switch is not adequate to identify anomalies that
affect the IoT devices. The network traffic is monitored in centralized IDS. This traffic is
extracted from the network through different network data sources such as packet capture,
NetFlow, etc. The computers connected in a network can be monitored by Network-based
IDS. Moreover, NIDS is also capable of monitoring the external malicious activities that
could have been commenced from an external threat at an earlier stage, before these threats
expand to other computer systems. However, NIDS comes with some limitations such as its
restricted ability to inspect the whole data in a high bandwidth network because of the
volume of data passing through modern high-speed communication networks (Bhuyan et
al., 2014). NIDS deployed at several positions within a particular network topology, together
with HIDS and firewalls, can provide a concrete, resilient, and multi-tier protection against
both external and insider attacks. shows a summary of comparisons between IDS deployment
strategies.
Data source consists of system calls, application program interfaces, log files, data packets
that are extracted from well-known attacks. These data sources can be useful to classify
intrusion behaviors from abnormal actions.

Hierarchical IDS

In Hierarchical IDS, the network is separated into clusters. The sensor nodes that are adjacent
to each other typically belong to the same cluster. Each cluster is assigned a leader, the so-
called cluster head that screens the member nodes and plays a part in network-wide analyses.

IDS validation strategies

IDS Validation is the process for determining whether the IoT IDS model is an accurate
enough representation of the system, for detecting IoT attacks. To validate the effectiveness
of IDSs, researchers have used different techniques such as theoretical, empirical, and
hypothetical strategies for validating their techniques.
There are many classification metrics for IDS, some of which are known by multiple names.
shows the confusion matrix for a two-class classifier which can be used for evaluating the
performance of an IDS. Each column of the matrix represents the instances in a predicted
class, while each row represents the instances in an actual class.
IDS are typically evaluated based on the following standard performance measures:
 True Positive Rate (TPR): It is calculated as the ratio between the number of correctly
predicted attacks and the total number of attacks. If all intrusions are detected then the
TPR is 1 which is extremely rare for an IDS. TPR is also called a Detection Rate (DR)
or the Sensitivity. The TPR can be expressed mathematically as
���=����+��

False Positive Rate (FPR): It is calculated as the ratio between the number of normal
instances incorrectly classified as an attack and the total number of normal instances.
���=����+��
 False Negative Rate (FNR): False negative means when a detector fails to identify an
anomaly and classifies it as normal. The FNR can be expressed mathematically as:
���=����+��

 Classification rate (CR) or Accuracy: The CR measures how accurate the IDS is in
detecting normal or anomalous traffic behavior. It is described as the percentage of all
those correctly predicted instances to all instances:
��������=��+����+��+��+��

Receiver Operating Characteristic (ROC) curve: ROC has FPR on the x-axis and TPR on the
y-axis. In the ROC curve, the TPR is plotted as a function of the FPR for different cut-off
points. Each point on the ROC curve represents an FPR and TPR pair corresponding to a
certain decision threshold. As the threshold for classification is varied, a different point on the
ROC is selected with different False Alarm Rate (FAR) and different TPR. A test with
perfect discrimination (no overlap in the two distributions) has a ROC curve that passes
through the upper left corner (100% sensitivity, 100% specificity).

State-of-the-art intrusion detection in IoT

Table 8 shows a summary of the proposed research to IDSs for IoT.


Table 8 Summary of the proposed research to IDSs for IoT

Cho et al. proposed a methodology for checking packets that are passing through the border
router for communication between physical and network devices. Their methodology is based
on the botnet attacks by checking the packet length (Cho et al., 2009). However, no
information is presented about the technique employed to create a normal behaviour profile.
It is also not clear how the proposed IDS techniques would work on resource constraints
nodes in the IoT.
Rathore et al. proposed semi-supervised Fuzzy learning-based distributed attack detection
framework for IoT (Rathore & Park, 2018). The evaluation was done on the NSL-KDD
dataset and consequently suffered from the same limitations concerning the dataset as
mentioned above.
Hodo et al. use an Artificial Neural Network (ANN) to detect DDoS and DoS attacks against
legitimate IoT network traffic. The proposed ANN model was tested with the use of a
simulated IoT network. Hoda et al. proposed a threat analysis of IoT using ANN to detect
DDoS/DoS attacks. A multi-level perceptron, a type of supervised ANN, is trained using
internet packet traces and then the model is assessed on its ability to thwart (DDoS/DoS)
attacks (Hodo et al., 2016). Hoda et al. did not consider effectiveness after the deployment of
the proposed IDS in the IoT ecosystem on low-capacity devices. According to their
experimentation, the system achieved an accuracy of 99.4% for DDoS/DoS. However, no
details of the dataset are provided.
Diro et al. developed an IoT network attack detection system based on distributed deep
learning. Their work showed that distributed attack detection could identify IoT attacks better
than a centralized strategy with a 96% detection rate. Their approach was evaluated using the
NLS-KDD dataset. Even though this dataset is another version of the KDD data set, it still
suffers from various issues reviewed by McHugh (McHugh, 2000). We believe this dataset
should not be used as a practical benchmark dataset in the IoT as this data was collected from
the traditional network (Diro & Chilamkurti, 2018). This leads us to develop IDSs that take
into consideration the specific requirement of IoT protocol such as (Low-power Wireless
Personal Area Networks) 6LowPAN. Hence, the Intrusion detection system that is created for
the IoT ecosystem should operate under rigorous settings of low processing ability, high-
speed connection, and big capacity data processing.
Moustafa et al. proposed an ensemble of IDSs to detect abnormal activities, in specific botnet
attacks against Domain Name System (DNS), Hypertext Transfer Protocol (HTTP) and
Message Queue Telemetry Transport (MQTT) (Moustafa et al., 2019). Their ensemble
methods are based on the AdaBoost learning method and they used three machine learning
techniques: Artificial Neural Networks (ANN), Decision Tree (DT) and Naive Bayes (NB) to
evaluate their methodology (Moustafa et al., 2019). The proposed IDS result in significant
overhead which degrades its performance.
Cervantes, et al. proposed IDS for detecting sinkhole attacks on 6LoWPAN for the IoT. Their
IDS approach applies a combination of anomaly detection and support vector machine
(SVM). IDS during the training process, each IDS agent trains the SVM and executes a
majority voting decision to mark the infected nodes (Cervantes et al., 2015). Their simulation
results show that their IDS achieve a sinkhole detection rate of up to 92% on the fixed
scenario and 75% in a mobile scenario. However, their approach has not been evaluated for
other types of attacks in the IoT.
Khraisat, et al. (Khraisat et al., 2019b) proposed an ensemble Hybrid Intrusion Detection
System (HIDS) by combining a C5 classifier and a One-Class Support Vector Machine
classifier. C5 classifier is used to detect well know intrusion. One-Class Support Vector
Machine classifier is used to detect a new attack.
Attacks on IoT ecosystem

As IoT technology involves many devices like sensors, processors and many other
technologies, the purpose of sharing the data and connecting to other networks has been
served successfully. As it involves many devices connected, the data shared may not be
secure and the security concern raises. IoT Security refers to protect the information shared
among different networks through IoT devices using IoT technology. These devices are
connected to others using the internet which allows vulnerabilities to take place by allowing
the hacker to hack the data. Data without the security will lead to many concerns and brings
huge loss for many industries and even to the individuals ending with the loss of the data
from their systems (Khraisat et al., 2019b).
IoT grabbed the attention of the people and the organizations from many sectors onto it, by
providing extreme benefits to them. Along with its tremendous growth, some security issues
have risen by which IoT attacks have taken place by preventing people to use many of its
upcoming applications. Hence, this section report discusses the concept of IoT security, the
Challenge of IoT security, the impacts of them followed by the IoT attack and its types. IoT
devices can be accessed from any place within a trusted network. So, there are chances of lots
of malicious attacks in the IoT network. Hence, security, privacy, and confidentiality issues
must be appropriately addressed in the IoT to protect it from malicious attacks. For example,
the attacking of traffic lights and driverless vehicles not only reasons chaos and rises
contamination, but also can initiate harm and severe collisions leading to wounded.
Different devices and equipment of home and office can be virtually connected with the help
of the internet to left them they can perform their activities by monitoring the device’s
remote.
Figure 8 shows the IoT system architecture with layers where attacks can occur. An IoT
system can comprise three fundamental layers which are the perception layer, network layer,
and application layer (Liao et al., 2013a). The perception layer is the lowest layer of the
conventional architecture of IoT. This layer consists of devices, sensors, and controllers. This
layer’s fundamental task is to gather valuable information from IoT sensors systems.
Fig: 5.8 IoT architecture and layer attacks

In the network layer, IoT involves a variety of diverse networks such as WSNs, wireless
mesh networks, WLAN, etc. These networks help sensors in IoT exchange information. A
gateway can simplify the communication of several sensors over the network. Thus, a
gateway could be beneficial to handle many complex aspects involved in communication on
the network. The network layer ensures the successful transmission of data while the
application layer is the highest layer that processes the data for visualization.
In the application layer, the data source can be obtained from Internet Service Provider (ISP)
and mobile network providers’ web-based services, virtual online identities, edge network,
devices logs, Radio-Frequency Identification (RFID) tags, and readers, etc.
Most of the attackers’ target IoT devices and equipment rather than a single PC. IoT has an
interconnection of various devices and equipment along with some embedded devices as
well. The major causes of IoT as a malware target can be summarized below:
 All the devices and equipment in an IoT need to be always on and it is easy for
attackers to assess that equipment where the power mode is on at any point in time.
 Devices and equipment interconnected in an IoT are always connected and the
attackers may access the interconnected devices from a single device.
 In most cases, proper security measures and knowledge to defend and tackle attack in
a whole set of interconnected devices is difficult than in a single PC.
 Lack of proper encryption features in the interconnecting devices and weak passwords
is another cause of malware target in IoT.
 The level of sophistication for the exploitation of the IoT is much lower and easy as
compared to a single device.
 Twenty-four hours of internet exposure of the IoT devices and equipment is another
cause of IoT as a malware target. Due to the unlimited internet connection, the
devices will accept the incoming traffic signals and become vulnerable to attacks.
The attributes and features of malware differ in a single device and a set of interconnected
devices and equipment.
Table 9 shows the different security attributes of a single device that is PC and the set of
devices that is IoT about malware. Cyber-attacks on IoT applications can be both internal and
external attacks. The attacker is a compromised node of the network in an inside attack
whereas the attacker is not a part of the network in an outside attack. Figure 9 shows the
significant types of cyber-attack that target IoT applications. The types of attacks as well as
how the attack will impact the IoT network and their implications are described.
Fig: 5.9 Taxonomy of Security attacks within IoT

Physical/perception layer

Attacks are based on hidden aspects of devices and equipment. These attacks can take control
of the device by tampering with hardware. IoT physical attacks are launched when an attack
is close to the network or IoT device. Some of the significant threats at the
physical/perception layer include:
In this paper, an Intrusion
Detection System (IDS) for
cybersecurity based on a
Convolutional Neural Network
(CNN) classifier is proposed. The
convolution and pooling
process of CNN allow the
proposed IDS model to learn
complicated patterns of features form
network traffic, while
maintaining reasonable storage and
computation overhead.
This feature differentiates the
proposed IDS model with the
traditional one, which requires a
signature database that is
usually handmade by security
experts to perform
classification. Also, an innovative
dataset preprocessing
method is used to boost the multi-
class classification
performance of the proposed CNN
based IDS. In our
evaluation using CICIDS2017
security database, the
proposed IDS model outperforms
nine other well-known
classifier models in most multi-
class classification
categories. More specifically, out of
the 14 available attack
classes in CICIDS2017, the
proposed model reaches the
highest Detection Rate (DR) at 10
attack classes compared
to the nine selected models. This
IDS model also has the
highest True Negative Rate (TNR)
and the lowest False
Alarm Rate (FAR) for benign
network traffic. Also, the
proposed IDS provides better
multi-class classification
results than some other recently
proposed CNN based IDS,
which only capable of only one-
class classification or
classifying a few classes. In addition,
it shows the potential
at detecting innovative attacks,
which the traditional IDS
models struggle to perform. Some
custom datasets were used
to evaluate this model’s
performance at innovative DoS
attacks. The simulation results
indicate that the proposed
model is capable of detecting some
DoS samples that the
model never encounters in its training
process.
The proposed IDS reaches a new
altitude at detecting
cyberattacks, which the traditional
IDS models never
achieve. This model not only
offers protection against the
known attacks with a high DR and
low FAR, but also shows
CHAPTER-6
CONCLUSION

In this paper, an Intrusion Detection System (IDS) for cybersecurity based on a


Convolutional Neural Network (CNN) classifier is proposed. The convolution and pooling
process of CNN allow the proposed IDS model to learn complicated patterns of features
form network traffic, while maintaining reasonable storage and computation overhead.
This feature differentiates the proposed IDS model with the traditional one, which requires
a signature database that is usually handmade by security experts to perform
classification. Also, an innovative dataset preprocessing method is used to boost the
multi-class classification performance of the proposed CNN based IDS. In our
evaluation using CICIDS2017 security database, the proposed IDS model outperforms
nine other well-known classifier models in most multi-class classification categories.
More specifically, out of the 14 available attack classes in CICIDS2017, the proposed
model reaches the highest Detection Rate (DR) at 10 attack classes compared to the nine
selected models. This IDS model also has the highest True Negative Rate (TNR) and
the lowest False Alarm Rate (FAR) for benign network traffic. Also, the proposed IDS
provides better multi-class classification results than some other recently proposed CNN
based IDS, which only capable of only one-class classification or classifying a few
classes. In addition, it shows the potential at detecting innovative attacks, which the
traditional IDS models struggle to perform. Some custom datasets were used to evaluate
this model’s performance at innovative DoS attacks. The simulation results indicate
that the proposed model is capable of detecting some DoS samples that the model
never encounters in its training process. The proposed IDS reaches a new altitude at
detecting cyberattacks, which the traditional IDS models never achieve. This model not
only offers protection against the known attacks with a high DR and low FAR, but also
shows
CHAPTER-7
REFERENCES

[1] Anderson, James P., Computer Security Threat Monitoring and Surveillance,
Washing, PA, James P. Anderson Co., 1980.
[2] Wenke Lee et al., "Real time data mining-based intrusion detection," In Proceedings
DARPA Information Survivability Conference and Exposition II. DISCEX'01, Anaheim,
CA, USA, pp. 89-100, 2001.
[3] Jyothsna, V. V. R. P. V., VV Rama Prasad, and K. Munivara Prasad, "A review of
anomaly based intrusion detection systems," International Journal of Computer
Applications, vol. 28, no. 7, pp. 26-35, 2011.
[4] G. Ciaburro and B. Venkateswaran. “Neural networks with R: smart models using
CNN, RNN, deep learning, and artificial intelligence principles.” Birmingham, UK: Packt
Publishing, 2017.
[5] J. Kiefer and J. Wolfowitz, “Stochastic Estimation of the Maximum of a Regression
Function,” The Annals of Mathematical Statistics, vol. 23, no. 3, pp. 462-466, 1952.
[6] Bottou, Léon, Frank E. Curtis, and Jorge Nocedal, "Optimization methods for large-
scale machine learning," Siam Review, vol. 60, no. 2, pp. 223-311, 2018.
[7] M. Roopak, G. Yun Tian and J. Chambers, "Deep learning models for cyber security in
IoT networks," In Proceedings of IEEE 9th Annual Computing and Communication
Workshop and Conference, Las Vegas, NV, USA, pp. 0452-0457, 2019.
[8] Yeom, Sungwoong, and Kyungbaek Kim, "Detail analysis on machine learning based
malicious network traffic classification." In Proceedings of the International Conference
on Smart Media & Applications, pp. 49-53, 2019.
[9] Kim, Jiyeon, Yulim Shin, and Eunjung Choi, "An Intrusion Detection Model based on a
Convolutional Neural Network," Journal of Multimedia Information System, vol. 6, no. 4,
pp. 165-172, 2019.
[10] Khraisat, Ansam, et al. "Survey of intrusion detection systems: techniques, datasets
and challenges," Cybersecurity, vol. 2, no. 1, p. 20, 2019.
[11] Y. Gu, K. Li, Z. Guo, and Y. Wang, “Semi-supervised K-means DDoS detection
method using hybrid feature selection algorithm,” IEEE Access, vol. 7, pp. 64351–64365,
2019.

You might also like