
Journal Pre-proof

Deep Recurrent Neural Network For IoT Intrusion Detection System

Muder Almiani, Alia AbuGhazleh, Amer Al-Rahayfeh, Saleh Atiewi, Abdul Razaque

PII: S1569-190X(19)30162-5
DOI: https://doi.org/10.1016/j.simpat.2019.102031
Reference: SIMPAT 102031

To appear in: Simulation Modelling Practice and Theory

Please cite this article as: Muder Almiani, Alia AbuGhazleh, Amer Al-Rahayfeh, Saleh Atiewi,
Abdul Razaque, Deep Recurrent Neural Network For IoT Intrusion Detection System, Simulation
Modelling Practice and Theory (2019), doi: https://doi.org/10.1016/j.simpat.2019.102031

This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition
of a cover page and metadata, and formatting for readability, but it is not yet the definitive version of
record. This version will undergo additional copyediting, typesetting and review before it is published
in its final form, but we are providing this version to give early visibility of the article. Please note that,
during the production process, errors may be discovered which could affect the content, and all legal
disclaimers that apply to the journal pertain.

© 2019 Published by Elsevier B.V.


Deep Recurrent Neural Network For IoT Intrusion Detection System
Muder Almiani (A*), Alia AbuGhazleh (A*), Amer Al-Rahayfeh (B), Saleh Atiewi (C), Abdul Razaque (D)
(A) Al-Hussein Bin Talal University, Ma’an, Jordan.
(A) Jordan University of Science and Technology, Irbid, Jordan.
(B) Al-Hussein Bin Talal University, Ma’an, Jordan.
(C) Al-Hussein Bin Talal University, Ma’an, Jordan.
(D) Department of Computer Engineering and Telecommunication, International IT University, Almaty, Kazakhstan.
(A*) Al-Hussein Bin Talal University, Computer Information Systems Department, Ma’an, Jordan.
Email: malmiani@my.bridgeport.edu, Muder.m.almiani@ahu.edu.jo
(A*) Jordan University, 11942, P.O. Box 13375, Amman, Jordan.
E-mail address: aabughazleh03@eng.just.edu.jo
--------------------------------------------------------------------------------------------------------------------------------------------

ABSTRACT

As a result of the large-scale development of the Internet of Things (IoT), cloud computing capabilities including networking, data storage, management, and analytics are brought very close to the edge of networks, forming Fog computing and enhancing the transfer and processing of tremendous amounts of data. As the Internet becomes more deeply integrated into our business operations through IoT platforms, the desire for reliable and efficient connections increases as well. Fog and Cloud security is a topical issue associated with every data storage, management, or processing paradigm. Attacks, once they occur, have ineradicable and disastrous effects on the development of IoT, Fog, and Cloud computing. Therefore, many security systems and models have been proposed or implemented for the sake of Fog security. Intrusion detection systems are one of the premier choices, especially those designed using artificial intelligence. In this paper, we present a fully automated, artificial intelligence-based intrusion detection system for Fog security against cyber-attacks. The proposed model uses multi-layered recurrent neural networks designed to be implemented for Fog computing security, very close to the end-users and IoT devices. We demonstrate our proposed model using a balanced version of the challenging NSL-KDD dataset. The performance of our model was measured using a variety of typical metrics, and we add two additional metrics, the Matthews correlation and Cohen’s Kappa coefficients, for deeper insight. The experimental results and simulations demonstrate the stability and robustness of the proposed model in terms of a variety of performance metrics.
Keywords: Internet of Things, intrusion detection, Kalman filter, IoT, Recursive Network

1 INTRODUCTION

The Internet of Things (IoT) is considered the next evolution of the internet, where the capability to connect to the internet is given to every entity [1,2]. Cisco-IBSG predicts that more than 50 billion devices will be connected to the internet by 2020 [3]. This huge number of connected devices implies a correspondingly gigantic amount of traffic and digital data generated and transferred. From the Megabyte (10^6 bytes) to the Brontobyte (10^27 bytes) and Geopbyte (10^30 bytes), these units will be used to describe the tremendous digital pool formed by the IoT platform. As a matter of fact, 40% of IoT-created data is stored, processed, analyzed and acted upon close to the network edge, where the shortcomings of the cloud in meeting IoT requirements manifest. These shortcomings and the necessities of accelerated IoT development oil the wheels for the development of the Fog computing paradigm. On the other hand, as the depth of this digital pool grows, it is likely to be disturbed by various types of attacks and penetrations [4]. Accordingly, various approaches and techniques, such as firewalls, data encryption, and user authentication, have been designed and implemented to protect the IoT platform through the Fog computing paradigm. Attack vectors and threats keep evolving, leaving classical security techniques inefficient and ineffective in addressing the problem of IoT security and opening the door for a new generation of intrusion detection systems built using machine learning and artificial neural networks.
A large body of research has been conducted on finding the best intelligent intrusion detection system in IoT-based environments for different types of applications [5,6]. As intrusion detection systems are one of the major remedies applied for the sake of IoT security, there is a tendency to use more than one technique concurrently, as proposed by Alharbi et al. [7], who demonstrated a proof-of-concept system for IoT security implemented in the Fog computing layer. The proposed system is composed of a VPN server, a traffic analysis engine, a challenge-response unit, and a firewall. Each unit thwarts specific types of attacks. The VPN server secures the communication channels between IoT systems against sniffing, spoofing, and man-in-the-middle attacks. The intrusion detection system of the traffic analysis unit was used to detect DoS and DDoS attacks, with a decision-tree machine learning technique as the classification engine. In order to authenticate the response of the intrusion detection system, a challenge message is sent by the challenge-response unit when an intrusion is detected. If no response to this message is received, the system blocks the connection through the firewall unit.
Pajouh et al. [8] proposed a novel layered intrusion detection system for IoT backbone networks using a two-tier dimension reduction engine and a two-tier classification engine. The dimension reduction engine is composed of component analysis and linear discriminant analysis units, whereas the classification engine is composed of cascaded Naïve Bayes and certainty factor version of K-nearest neighbor (CF-KNN) units. The Naïve Bayes classifier was used to classify attack records, which were in turn refined by the CF-KNN classifier as a second filtering layer. Using the NSL-KDD [9] dataset, the proposed model achieved competitive detection performance for hard-to-catch attacks, i.e., the U2R and R2L classes.
Using Wireshark to capture traffic on an IoT testbed network for four consecutive days and applying machine learning techniques to it, Anthi et al. [10] proposed a predictive and adaptive intrusion detection system for IoT systems. The proposed system consists of two main phases. During the first phase, they built a real IoT smart-home testbed and monitored the normal activities of each device connected to the IoT network. Then, in the second phase, malicious activities were applied to these devices, producing anomalous network traffic. These phases fed a supervised machine learning technique, which forms the core of the intrusion detection model, with proper training data.
Dovom et al. [11] employed fuzzy and fast fuzzy pattern tree methods for intrusion detection and malware categorization in IoT networks. This type of fuzzy-based technique is composed of a tree-like fuzzy top-down induction structure, where the inner nodes are fuzzy logic arithmetic operators and the leaves are associated with fuzzy predicates applied to input features. Using the Vx-Heaven, IoT, Kaggle and Ransomware datasets, their proposed model achieved high detection accuracy within reasonable run-times. For improved detection capability, Wang et al. [12] implemented a logarithmic marginal density ratios transformation to transform NSL-KDD dataset features into new, better-quality representative ones. Using the Support Vector Machine (SVM) as a classification engine, the empirical results showed robust performance in terms of detection rate and detection accuracy.
Using a comprehensive representation of modern IoT attack scenarios, Zhang et al. [13] used the UNSW-NB [14] benchmark dataset to demonstrate the efficiency of machine learning-based intrusion detection. Although they used a simple multi-layer perceptron as a classifier, they introduced a novel feature selection engine that applies a Denoising Autoencoder (DAE) based on a weighted loss function. This feature selection technique focuses on attack-representative features. As another application of the UNSW-NB dataset, an IoT network forensic architecture composed of decision tree C4.5, Naïve Bayes, Association Rule Mining (ARM) and Artificial Neural Network (ANN) machine learning techniques was proposed by Koroniotis et al. [15] to identify and track novel and complicated forms of current botnet attacks.
As an example of the integration of SDN and IoT, Dawoud et al. [16] presented a deep learning-based intrusion detection system for SDN-based IoT architecture, where SDN modeling was used to enhance IoT security, scalability and resilience, whereas a Restricted Boltzmann Machine (RBM) was used as the engine for intrusion detection. The proposed model was evaluated and validated using the KDD Cup’99 dataset, where it achieved a competitive performance higher than 94% in terms of precision and accuracy.
Hodo et al. [17] proposed a simple multi-layered perceptron neural network trained with feedforward and backward learning algorithms for detecting DoS/DDoS attacks in IoT networks. The IoT structure was composed of five sensor nodes, one of which acted as a server relay node for data analysis while the others acted as clients. The traffic of the IoT network was captured using a network tap, avoiding any modification of the live traffic. DoS attacks were conducted by sending over 10 million UDP packets to a single host, whereas DDoS attacks were conducted by sending over 10 million UDP packets to three hosts at wire speed, overflowing the server node. The proposed model successfully caught DoS/DDoS attacks with a highly competitive overall accuracy of up to 99.4%.
For use in computer networks, Mohammadi and Sabokrou [18] proposed a semi-supervised intrusion detection model built using deep structured neural networks trained by generative adversarial learning. The model comprises two major phases: training and testing. The training phase, which was performed using only the packet flow of normal connections of the NSL-KDD dataset, is composed mainly of two cascaded modules. The first module consists of an encoder-decoder network, whereas the second module consists of a fully-connected neural network followed by a SoftMax classifier. The packet flow of anomalous network traffic was generated from the normal one using adversarial training by reconstructing the normal packets via an optimized encoder-decoder network. The testing phase, on the other hand, uses the neural network resulting from the training phase and the complete KDDTest+ set. The proposed model achieved a detection accuracy of 91.39%.
Another semi-supervised intrusion detection system was proposed by Kumari and Varma [19], where the classification engine is composed of a hybrid combination of active learning Support Vector Machine (SVM) and Fuzzy c-means (FCM) clustering techniques. Besides the large amount of unlabeled data, the active SVM technique uses a small subset of labeled data; it was shown that after N iterations, active learning SVM exhibits detection performance comparable to that of the classical support vector machine. The FCM classifier, on the other hand, was applied to data items around the support vectors for the sake of multi-class categorization. In this model, intrusion detection was conducted using both classifier engines, SVM and FCM. If both classifiers labeled an input instance as normal, then it is considered normal with confidence. However, if the input instance was labeled as an anomaly by the SVM engine and its sub-category was determined by the FCM engine, then the instance is considered abnormal and the nearest circle to the support vectors with higher fuzzy membership is chosen as the sub-class.
Other researchers used ensemble learning for robust IoT security, which is a technique of using multiple techniques, models, or experts to solve a particular artificial intelligence-based problem. In the problem of intrusion detection, ensemble learning promotes better generalization, and voting between the different techniques of the ensemble provides higher detection accuracy than the individual models, as proposed by Illy et al. [20].
In this paper, we develop an intrusion detection system composed of cascaded filtering stages, where a deep multi-layered recursive neural network is used for each filter and tuned to catch specific types of attacks that are well-known in IoT environments, such as DoS, Probe, R2L, and U2R. The remainder of this paper is organized as follows. Section 2 describes the details of our proposed intrusion detection model, followed by a thorough experimental validation of the proposed model in Section 3.

2 THE PROPOSED MODEL


In this section, the architecture, concept and design principles of our proposed model are presented. Figure 1 shows
the general architecture of our proposed model implemented in Fog computation layer.

Fig. 1 General framework of proposed intelligent IoT security model.


As shown in Figure 1, the proposed intelligent intrusion detection model is composed of two major engines: a traffic analysis engine and a classification engine. Traffic connection records are pre-processed in the traffic processing unit, yielding traffic data in a format suitable for the deep neural network of the classification engine, and these connections are then classified as normal or attack by the intelligent intrusion detection engine. The proposed model can be implemented in Fog computing, very close to end-users and IoT devices. The model adopts a recurrent neural network trained by an adaptive version of the backpropagation algorithm for enhanced prediction capability in normal/attack classification. A recursive structure from the outputs of the nonlinear parts of neurons to the linear parts enables fast response and reliable real-time security protection for the IoT system. The recursive network represents the major engine of classification-based traffic analysis: it analyses the network traffic that attempts to access the IoT system and raises a security alarm when an intrusion is detected. These two basic units are elaborated in the following subsections.
2.1 TRAFFIC PROCESSING ENGINE
We used the NSL-KDD dataset [9] for model training, testing, and validation. Data features that represent the input traffic of a networking system are naturally inconsistent. Thus, traffic data pre-treatment is a necessary gate for the classification engine [21]. The traffic pre-processor engine applies four pre-processing steps to raw traffic data: (1) symbolic-to-numeric transformation, (2) features reduction, (3) data min-max normalization, and (4) data oversampling.
2.1.1 Symbolic-to-Numeric Transformation and Labels Encapsulation
As shown in Figure 2, a sample of off-line traffic data records shows that the first numeric field is followed by three symbolic fields that represent the protocol, service, and flag features of connection records, respectively. In our work, we codify these fields as shown in Table 1. This step is equivalent to codifying the symbolic-valued fields (attributes) of the NSL-KDD dataset into numeric ones, so that the protocol, service, and flag features are each represented by a set of numeric codes. The numeric value (attribute weight) for each attribute value was selected based on the frequency of that value: as the frequency increases, the corresponding numeric value decreases. This way, the attributes of the lowest frequency will not be overwhelmed by the values of the highest-frequency attributes.
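The frequency-based codification described above can be sketched as follows. This is only an illustration: the paper states that more frequent values receive smaller numeric codes but does not give the exact codes, so the rank-based assignment, the function name `frequency_rank_codes`, and the sample protocol column are all assumptions.

```python
from collections import Counter


def frequency_rank_codes(values):
    """Map each symbolic value to an integer code.

    Assumption: codes are frequency ranks, so the most frequent value
    gets the smallest code (1), the next most frequent gets 2, and so on.
    """
    counts = Counter(values)
    # Sort by descending frequency; ties are broken alphabetically
    # so that the mapping is deterministic.
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: rank for rank, value in enumerate(ordered, start=1)}


# Hypothetical protocol column taken from a handful of connection records.
protocols = ["tcp", "tcp", "udp", "tcp", "icmp", "udp"]
codes = frequency_rank_codes(protocols)
encoded = [codes[p] for p in protocols]
print(codes)
print(encoded)
```

The same helper could be applied separately to the service and flag columns, giving each symbolic feature its own code table.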
Fig. 2 Snapshot of NSL-KDD dataset associated with zoomed box of symbolic attributes.

As a final step of dataset codification, the different labels of attack sub-categories are encapsulated and codified into their main categories as shown in Table 1.
Table 1. NSL-KDD class categories: main and sub-class. †

Sub-class | Category | Assigned Numeric Code
Back, Land, Neptune, Pod, Smurf, Teardrop, Mailbomb, Processtable, Udpstorm, Apache2, Worm | DoS | 1
Satan, Ipsweep, Nmap, Portsweep, Mscan, Saint | Probe | 2
Guess_password, Guess-passwd, ftp-write, Imap, Phf, Multihop, Warezmaster, Xlock, Xsnoop, Snmpguess, Snmpgetattack, Httptunnel, Sendmail, Named, Warezclient, Spy | R2L | 3
Buffer-overflow, Loadmodule, Rootkit, Perl, Sqlattack, Xterm, Ps | U2R | 4

† normal classes are given ‘0’ as a numeric code.


Referring to Table 1, we have two types of encapsulation. For the binary engine (normal/attack classification), all records of the training dataset are encapsulated into normal and attack. For the second engine, the 40 attack labels (classes) are encapsulated into their four major categories as shown in Table 1.
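The two encapsulations can be sketched as simple lookups. The category codes follow Table 1; the mapping below lists only a few of the sub-classes for brevity, and the function names, the lower-casing of labels, and the use of 1 as the binary attack code are assumptions made for illustration.

```python
# Category codes as given in Table 1 (normal records are coded 0).
CATEGORY_CODES = {"normal": 0, "DoS": 1, "Probe": 2, "R2L": 3, "U2R": 4}

# Partial sub-class to category mapping (a subset of Table 1).
SUBCLASS_TO_CATEGORY = {
    "normal": "normal",
    "neptune": "DoS", "smurf": "DoS", "teardrop": "DoS",
    "satan": "Probe", "ipsweep": "Probe", "nmap": "Probe",
    "guess_password": "R2L", "warezmaster": "R2L",
    "rootkit": "U2R", "perl": "U2R",
}


def multiclass_code(label):
    """Encapsulate an attack sub-class into its main category code."""
    return CATEGORY_CODES[SUBCLASS_TO_CATEGORY[label.lower()]]


def binary_code(label):
    """Encapsulate a label into normal (0) or attack (1, assumed code)."""
    return 0 if label.lower() == "normal" else 1


print(multiclass_code("Neptune"), binary_code("Neptune"))
```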
2.1.2 Features Reduction
In this step, we remove all attributes that have a constant value across all records in the traffic data, since they have no effect on the analytical results of the neural network. In our work, the features that are zero-valued for all records have been removed, which reduces the data from 41 to 26 features.
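A minimal sketch of this reduction step is shown below, assuming the records are held as a list of equal-length rows. The function name `drop_constant_features` and the toy records are hypothetical; the paper does not describe its implementation.

```python
def drop_constant_features(records):
    """Remove every column whose value is identical in all records.

    Returns the reduced records and the indices of the columns kept.
    """
    n_features = len(records[0])
    keep = [
        j for j in range(n_features)
        if len({row[j] for row in records}) > 1  # more than one distinct value
    ]
    reduced = [[row[j] for j in keep] for row in records]
    return reduced, keep


# Toy example: column 0 is zero everywhere and is dropped.
toy_records = [[0, 1.0, 5], [0, 2.0, 5], [0, 3.0, 7]]
reduced, kept = drop_constant_features(toy_records)
print(kept, reduced)
```

The indices returned for the training data would then be reused to select the same columns from the test data.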
2.1.3 Data Min-Max Normalization
To obtain a data range suitable for neural network inputs, the attributes of traffic data are scaled so as to fall within a small specified range. In our work, we applied a linear transformation to the data, represented by min-max normalization.
Suppose that $\min_A$ and $\max_A$ represent the minimum and maximum values of feature $A$, respectively. Then min-max normalization maps a value $v$ of feature $A$ to the new value $v'$ in the new range $[\text{new\_min}_A, \text{new\_max}_A]$ using (1):

$$v' = \frac{v - \min_A}{\max_A - \min_A}\,\big(\text{new\_max}_A - \text{new\_min}_A\big) + \text{new\_min}_A \qquad (1)$$
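As a hedged illustration of min-max normalization, the sketch below scales one feature column into a target range. The default target range of [0, 1], the handling of constant columns, and the name `min_max_normalize` are assumptions; the target range used in the paper is not recoverable from the text.

```python
def min_max_normalize(column, new_min=0.0, new_max=1.0):
    """Linearly rescale a feature column into [new_min, new_max]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column carries no information; map it to new_min.
        return [new_min for _ in column]
    return [
        (v - lo) / (hi - lo) * (new_max - new_min) + new_min
        for v in column
    ]


# Hypothetical feature values.
print(min_max_normalize([0.0, 5.0, 10.0]))
print(min_max_normalize([0.0, 5.0, 10.0], -1.0, 1.0))
```

A range of [-1, 1] may be a natural choice when the network uses a symmetrical sigmoid, but that is a design consideration rather than something the paper states.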

2.1.4 Data Oversampling


This step is essential in dataset pre-processing to address the issue of dataset imbalance. The NSL-KDD dataset consists of about 125,000 records: the numbers of normal, DoS, Probe, R2L and U2R records are 67,343, 45,927, 11,656, 995 and 52, respectively. Graphically represented, Figure 3 shows that normal records represent about 50% of the dataset, followed by DoS and Probe records, and the rest of the classes compose less than 5% of the training dataset. As a result of this obvious imbalance, our neural network will show biased classification behaviour towards normal and DoS records and a weak classification response to the other, least-frequent attack classes.
Owing to the low frequency of the R2L and U2R attack classes, the neural network will treat them as if they were noise, because these attacks have a negligible effect on weight updating, yielding noticeably weak detection of these particular types of attacks. As a remedy, both R2L and U2R attacks are oversampled by inserting repeated blocks of U2R and R2L records at different sites of the data body. Oversampling resulted in new statistics and a new distribution of these rare types of attacks, as can be noted in Figure 3.
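The block-insertion oversampling can be sketched as below. The number of repeated blocks, the random choice of insertion sites, and the name `oversample_blocks` are assumptions; the paper does not state how many copies were inserted or how the sites were chosen.

```python
import random


def oversample_blocks(records, labels, rare_classes, copies, seed=0):
    """Insert `copies` repeated blocks of the rare-class records.

    Each block contains every record whose label is in `rare_classes`
    and is inserted at a randomly chosen site of the data body.
    """
    rng = random.Random(seed)
    data = list(zip(records, labels))
    rare_block = [pair for pair in data if pair[1] in rare_classes]
    for _ in range(copies):
        site = rng.randrange(len(data) + 1)
        data[site:site] = rare_block  # splice the whole block in place
    new_records = [record for record, _ in data]
    new_labels = [label for _, label in data]
    return new_records, new_labels


# Toy data: six normal records and one record of each rare class.
toy_labels = ["normal"] * 6 + ["R2L", "U2R"]
recs, labs = oversample_blocks(list(range(8)), toy_labels, {"R2L", "U2R"}, copies=2)
print(len(recs), labs.count("R2L"), labs.count("U2R"))
```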

Fig. 3 Graphical comparison of attacks distribution for raw NSLKDDTrain+ and our proposed balanced version.
2.2 INTRUSION DETECTION ENGINE


Our proposed model consists of two cascaded detection tiers that use two deep recursive neural networks with different internal structures, setup parameters, and hyperparameters, as shown in Figure 4.

Fig. 4 Pipeline of proposed IoT intrusion detection model.

As shown in Figure 4, the first layer performs DoS attack detection, as DoS is considered one of the major attack types that threaten IoT systems, besides detecting other types of attacks. For a high level of security, the normal response of the first filter is re-filtered using another network in the second layer with a different internal structure, recursive gain and parameter setup, tuned to catch the attacks that leak through the first layer, especially the hard-to-detect U2R and R2L attacks. For this purpose, the second layer of the proposed model was trained on the same training dataset as the first filter network, except that DoS attacks were excluded for more focused training.
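The cascade logic of the two tiers can be sketched independently of the networks themselves. In the sketch below the two tiers are stand-in threshold rules, not the recursive neural networks of the paper; `cascade_classify`, `toy_tier1`, `toy_tier2`, and the record fields are hypothetical names used only for illustration.

```python
def cascade_classify(record, tier1, tier2):
    """Pass a record through two cascaded detection tiers.

    Records flagged by the first tier are reported immediately; records
    the first tier considers normal are re-filtered by the second tier.
    """
    verdict = tier1(record)
    if verdict != "normal":
        return verdict
    return tier2(record)


# Hypothetical stand-ins for the two trained networks.
def toy_tier1(record):
    return "DoS" if record["count"] > 100 else "normal"


def toy_tier2(record):
    return "R2L" if record["failed_logins"] > 3 else "normal"


print(cascade_classify({"count": 500, "failed_logins": 0}, toy_tier1, toy_tier2))
print(cascade_classify({"count": 2, "failed_logins": 5}, toy_tier1, toy_tier2))
```

Under this arrangement a record is reported as normal only if both tiers agree, which is the purpose of re-filtering the normal response of the first filter.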
We used a deep proportional recursive network structure and a modified version of the backpropagation algorithm as the training algorithm to develop a stable, intelligent, multi-layered intrusion detection model. Originally, Scalero and Tepedelenlioglu [22] applied a modified version of the backpropagation algorithm to a simple feedforward neural network, where the target mean-squared error was defined between the desired and actual outputs of the linear parts (the summation outputs) of the neural network. In our work, rather than using a simple feedforward structure, we applied the modified backpropagation algorithm to a deep proportional recursive network structure, as shown in Figure 5.

Fig. 5 Illustrative 3-layered neural network architecture of proposed intrusion detection model.

The deep recursive structure creates a non-linear proportional embedding of the previous state in the current state, where ξ represents the recursive gain. Traditional neural network structures trained by the traditional backpropagation algorithm suffer from the exploding gradient problem, where the setup parameters or the hyperparameters in the hidden layers do not force the change as expected or may force the neural network into an unstable state. Therefore, adding a proportional feedback from the previous state to the current state alleviates this problem and enhances the stability of the neural network response.
Referring to Figure 5, the feedforward path consists primarily of two major parts: linear and non-linear. The weighted edges and the output of the summation compose the linear part, while the non-linear activation function composes the non-linear one.
The proposed training algorithm can be decomposed into the following four steps:
(1) Feedforward computation.
(2) Backpropagation to the output layer.
(3) Backpropagation to the hidden layer.
(4) Weights update.
The algorithm is stopped when the mean-squared value of the error function has become sufficiently small or when the maximum number of iterations is reached. These major steps are illustrated mathematically in the following subsections.
2.2.1 Feedforward Computation.
Without loss of generality, in order to simplify the presentation of the training algorithm, we deal with a three-layered network, ℒ = 3, as {Input, Hidden, Output}. The major goal of training is to attain the optimal set of network parameters/hyperparameters for the hidden and output layers for the sake of optimal classification performance. Consider a network with an input vector $\mathbf{x}_1(p) = [x_{1,1}(p), x_{1,2}(p), \dots, x_{1,N}(p)]$ for pattern $p$, $H$ hidden nodes and $K$ output nodes. The weights between the input layer and hidden node $j$ will be called $w_{2,ji}$, whereas the weights between the hidden nodes and output unit $j$ will be called $w_{\mathcal{L},ji}$. The ξ-weighted response to the previous pattern is returned to the input of the summation unit, and it is implemented as an additional weighted edge. Taking the bias into account, the length of the input vector is extended by two, for the bias and the ξ-weighted edge, and this is applied to all layers.
The excitation, or the output of the linear part, of neuron $j$ of layer $l$ is given by (2), where the sum runs over the extended input vector of the layer:

$$y_{l,j}(p) = \sum_{i} w_{l,ji}\, x_{l-1,i}(p) \qquad (2)$$

We choose the symmetrical sigmoid as the transfer function for all nodes of the network; the output of the hidden layer is thus given by (3):

$$x_{2,j}(p) = f\Big(\sum_{i} w_{2,ji}\, x_{1,i}(p)\Big) \qquad (3)$$

where $f$ represents the response of the nonlinear parts of neurons, chosen to be a symmetrical sigmoid as in (4):

$$f(y) = \frac{1 - e^{-a y}}{1 + e^{-a y}} \qquad (4)$$

where $a$ is the sigmoid slope.
The outputs of all nodes of the hidden layer can be computed with a vector-matrix multiplication as in (5):

$$\mathbf{x}_2(p) = f\big(\mathbf{W}_2\, \mathbf{x}_1(p)\big) \qquad (5)$$

The same applies to the output layer, as in (6):

$$\mathbf{x}_{\mathcal{L}}(p) = f\big(\mathbf{W}_{\mathcal{L}}\, \mathbf{x}_2(p)\big) \qquad (6)$$

These formulas can be generalized to any number of layers.

Within the feedforward step, the vector $\mathbf{x}_1(p)$ is presented to the network. Consequently, the vectors $\mathbf{x}_2(p)$ and $\mathbf{x}_{\mathcal{L}}(p)$ are computed as in Figure 5. The values of the activation function $f$, the derivative of the activation function $f'$, and the inverse of the activation function $f^{-1}$ are also computed at each unit.
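The three quantities computed at each unit (activation, derivative, inverse) can be sketched for a symmetrical sigmoid. The sketch assumes the bipolar form (1 - e^(-a*y)) / (1 + e^(-a*y)) with slope a, which is the form used in the Scalero and Tepedelenlioglu algorithm; whether the paper uses exactly this form is an assumption, and the function names are illustrative. The ξ-weighted feedback edge is not modelled here.

```python
import math


def f(y, a=1.0):
    """Symmetrical (bipolar) sigmoid with slope a.

    (1 - exp(-a*y)) / (1 + exp(-a*y)) equals tanh(a*y/2); tanh is used
    because it does not overflow for large negative y.
    """
    return math.tanh(0.5 * a * y)


def f_prime(y, a=1.0):
    """Derivative of f with respect to y: (a/2) * (1 - f(y)^2)."""
    fy = f(y, a)
    return 0.5 * a * (1.0 - fy * fy)


def f_inverse(x, a=1.0):
    """Inverse of f for -1 < x < 1: (1/a) * ln((1 + x) / (1 - x))."""
    return math.log((1.0 + x) / (1.0 - x)) / a


def layer_forward(weights, inputs, a=1.0):
    """Summation outputs and activations of one layer.

    `weights` holds one row of incoming weights per neuron.
    """
    sums = [sum(w * x for w, x in zip(row, inputs)) for row in weights]
    return sums, [f(y, a) for y in sums]


sums, outputs = layer_forward([[1.0, -1.0], [0.5, 0.5]], [2.0, 2.0])
print(sums, outputs)
```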
2.2.2 Backpropagation to The Output Layer
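In this step, we are looking for the first set of partial derivatives of the error signal $E_j$ with respect to the output-layer weights $w_{\mathcal{L},ji}$, i.e., $\partial E_j / \partial w_{\mathcal{L},ji}$.

However, our error signal is different from the one used by conventional versions of backpropagation neural networks. The error signal is the total mean-squared difference between the actual summation outputs $y_{\mathcal{L},j}(p)$ and the desired summation outputs $d_j(p)$. For the output layer (7):

$$E_j = \sum_{p=1}^{M} \big(d_j(p) - y_{\mathcal{L},j}(p)\big)^2 \qquad (7)$$

where:
$E_j$: error signal of the $j$-th neuron of the output layer.
$p$: pattern index.
$M$: total number of patterns.
$d_j(p)$: desired summation output of the $j$-th neuron at the $p$-th pattern.
$y_{\mathcal{L},j}(p)$: actual (estimated) summation output of the $j$-th neuron of the output layer at the $p$-th pattern.
The error signal can be minimized by taking the partial derivative of $E_j$ with respect to each weight $w_{\mathcal{L},ji}$ and equating it to zero, as in (8):

$$\frac{\partial E_j}{\partial w_{\mathcal{L},ji}} = -2 \sum_{p=1}^{M} \big(d_j(p) - y_{\mathcal{L},j}(p)\big)\, x_{2,i}(p) = 0 \qquad (8)$$

where $y_{\mathcal{L},j}(p) = \sum_{i=1}^{H} w_{\mathcal{L},ji}\, x_{2,i}(p)$, $H$ is the total number of neurons in the hidden layer, and $x_{2,i}(p)$ is the output of the $i$-th neuron of the hidden layer for the $p$-th pattern.
For each neuron of the output layer, substituting $y_{\mathcal{L},j}(p) = \sum_{i=1}^{H} w_{\mathcal{L},ji}\, x_{2,i}(p)$ in (8) and following the derivation steps of [22], we end up with the following solution (9):

$$\mathbf{c}_j = \mathbf{R}\,\mathbf{w}_j \qquad (9)$$

where:
$\mathbf{c}_j$: cross-correlation vector between training instances and corresponding desired summation outputs.
$\mathbf{R}$: autocorrelation matrix of training instances, $\mathbf{R} = \sum_{p=1}^{M} \mathbf{x}_2(p)\,\mathbf{x}_2^T(p)$.
$\mathbf{w}_j$: weights of the $j$-th output neuron (the $j$-th row of the weight matrix).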
According to (9), the solution of (8) is $\mathbf{w}_j = \mathbf{R}^{-1}\mathbf{c}_j$, where $\mathbf{R}^{-1}$ is the inverse of the autocorrelation matrix $\mathbf{R}$ and $\mathbf{c}_j$ is the cross-correlation vector. For the sake of implementation, Scalero and Tepedelenlioglu [22] adopted on-line training. This mode of training poses a problem, since the estimate for one layer's output depends on the data received from the previous layer. At the beginning of training, the previous layer is still untrained, which yields inaccurate correlation estimates $\mathbf{R}$ and $\mathbf{c}_j$ in contrast to those attained at the end of training, owing to the accumulating nature of the weight-correction approach: as the number of patterns involved in training increases, the estimates of the network tend to be more accurate. To solve this dependency, Scalero and Tepedelenlioglu [22] added a forgetting factor to the recursive (pattern-dependent) form, which forces the effect of previous training on the current estimates of the network to be negligible, as illustrated in (10) and (11):

$$\mathbf{R}(p) = b\,\mathbf{R}(p-1) + \mathbf{x}_2(p)\,\mathbf{x}_2^T(p) \qquad (10)$$

$$\mathbf{c}_j(p) = b\,\mathbf{c}_j(p-1) + d_j(p)\,\mathbf{x}_2(p) \qquad (11)$$

where $b$ is a forgetting factor, $0 \le b \le 1$, $\mathbf{x}_2(p)$ is the output of the hidden layer for the $p$-th pattern, and $d_j(p)$ is the desired summation output of the $j$-th output neuron.
If the correlations of the network are specified in this way, the problem of passing large data into the network is solved. However, since attacks evolve and change in their attributes, frequency, and complexity, on-line training with a forgetting factor to speed up the process is not sufficient for this type of pattern classification. Therefore, in our work, we build our algorithm to work in batch mode for accurate attack detection, we set the forgetting factor in equations (10) and (11) to 0.99, and we add internal recursive units that are independent of dataset correlations. These recursive units are of internal dependence: instead of depending on the response of the whole network to the previous input pattern, our recursive units enable a neuron-level dependency on the previous ξ-weighted response of each neuron to the previous input pattern in batch mode.
To sum up, we have two recursive forms in this network. One recursive form determines the dependence on the correlations of previous input patterns, represented by the autocorrelation matrix $\mathbf{R}(p)$ and the cross-correlation vector $\mathbf{c}_j(p)$, and the other determines the dependence on the weighted output of each neuron for previous patterns, represented by the recursive gain ξ. In contrast to [22], we run both recursive forms in batch mode. $\mathbf{R}(p)$ and $\mathbf{c}_j(p)$ represent recursive forms that accelerate the rate of learning, whereas the ξ-weighted recursive form enables discriminative weight updating against attack instances that have overlapping features.
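Returning to Figure 5, we note that the recursive edge of gain ξ has no associated weight value. Thus, it is not involved in the weight updating steps. In other words, we can regard the ξ-weighted edge as the basic catalyst for enhanced discriminative weight updating. The problem is now reduced to solving for $\mathbf{R}^{-1}$. Since this type of problem belongs to recursive least squares filtering [23], the weight matrix $\mathbf{W}_l$ can be found using Kalman filtering. For the $p$-th pattern, $\mathbf{R}_l^{-1}(p)$ is given as in (12):

$$\mathbf{R}_l^{-1}(p) = \frac{1}{b}\Big[\mathbf{R}_l^{-1}(p-1) - \mathbf{K}_l(p)\,\mathbf{x}_{l-1}^T(p)\,\mathbf{R}_l^{-1}(p-1)\Big] \qquad (12)$$

where

$$\mathbf{K}_l(p) = \frac{\mathbf{R}_l^{-1}(p-1)\,\mathbf{x}_{l-1}(p)}{b + \mathbf{x}_{l-1}^T(p)\,\mathbf{R}_l^{-1}(p-1)\,\mathbf{x}_{l-1}(p)} \qquad (13)$$

$\mathbf{R}_l^{-1}(p)$: inverse of matrix $\mathbf{R}$ of layer $l$ for the $p$-th input pattern.
$\mathbf{K}_l(p)$: Kalman gain of layer $l$ for the $p$-th input pattern.
$\mathbf{x}_{l-1}(p)$: output of layer $l-1$ for the $p$-th input pattern.
$b$: forgetting factor.
$\mathbf{x}_{l-1}^T(p)$: transposed output of layer $l-1$ for the $p$-th input pattern.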
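Based on equations (12) and (13), the backpropagated signal of the $j$-th node of the output layer ℒ is:

$$e_{\mathcal{L},j}(p) = f'\big(y_{\mathcal{L},j}(p)\big)\,\big(o_j(p) - x_{\mathcal{L},j}(p)\big) \qquad (14)$$

And the partial derivative we are looking for is:

$$\frac{\partial E_j}{\partial w_{\mathcal{L},ji}} = -\,e_{\mathcal{L},j}(p)\; x_{\mathcal{L}-1,i}(p) \qquad (15)$$

where $f'(y)$ is the first derivative of the neuron transfer function with respect to $y$, $o_j(p)$ is the desired output of node $j$, and $x_{\mathcal{L},j}(p)$ is its actual output.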
2.2.3 Backpropagation to the hidden layer
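Now, our task is to compute the partial derivative $\partial E / \partial w_{l,ji}$ for the hidden layers, where each neuron $j$ is connected to each neuron $k$ in the output layer with a weighted edge $w_{\mathcal{L},kj}$, for all $k$, as shown in Figure 5. Thus, the backpropagated error up to unit $j$ in the hidden layer $l$ is computed as in (16):

$$e_{l,j}(p) = f'\big(y_{l,j}(p)\big) \sum_{k} e_{\mathcal{L},k}(p)\, w_{\mathcal{L},kj} \qquad (16)$$

And the partial derivative of the error function with respect to $w_{l,ji}$ is given by (17):

$$\frac{\partial E}{\partial w_{l,ji}} = -\,e_{l,j}(p)\; x_{l-1,i}(p) \qquad (17)$$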

2.2.4 Weights update


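Since all partial derivatives have been computed, the network weights are updated in the negative gradient direction using the Kalman gain. For the output layer ℒ, the update is as in (18):

$$\mathbf{w}_{\mathcal{L},j}(p) = \mathbf{w}_{\mathcal{L},j}(p-1) + \mathbf{K}_{\mathcal{L}}(p)\,\big(d_j(p) - y_{\mathcal{L},j}(p)\big) \qquad (18)$$

where $\mathbf{K}_{\mathcal{L}}(p)$ is the Kalman gain of the output layer, $y_{\mathcal{L},j}(p)$ is the actual summation output of node $j$, and $d_j(p)$ is the inverse function value of the network target output, as given mathematically by (19):

$$d_j(p) = f^{-1}\big(o_j(p)\big) = \frac{1}{a}\,\ln\frac{1 + o_j(p)}{1 - o_j(p)} \qquad (19)$$

where:
$o_j(p)$: desired network output (target) of node $j$ of the output layer ℒ.
$d_j(p)$: the desired summation of node $j$ of the output layer ℒ.
$a$: sigmoid slope.
On the other hand, for the hidden layer $l$, weight updating proceeds as in (20):

$$\mathbf{w}_{l,j}(p) = \mathbf{w}_{l,j}(p-1) + \mu_l\,\mathbf{K}_l(p)\,e_{l,j}(p) \qquad (20)$$

where $e_{l,j}(p)$ is the backpropagated error of hidden neuron $j$ and $\mu_l$ is the backpropagation step size of the hidden layer.
This proposed algorithm can be extended to any number of hidden layers. In our proposed model, we adopted the Mean-Squared Error (MSE) and the number of iterations as measures of network output convergence.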
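The Kalman-gain recursion and the output-layer weight correction can be illustrated with a single linear neuron. The sketch below assumes the textbook recursive least-squares form with a forgetting factor b; it is not the authors' MATLAB implementation, it omits the ξ-weighted feedback edge and the hidden-layer update, and the function names, the initial value of the inverse matrix, and the toy data are all assumptions.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def rls_step(r_inv, w, x, d, b=0.99):
    """One recursive least-squares update for a single linear neuron.

    r_inv : current inverse autocorrelation matrix
    w     : current weight vector of the neuron
    x     : input vector of the layer for this pattern
    d     : desired summation output for this pattern
    b     : forgetting factor
    """
    rx = mat_vec(r_inv, x)                           # R^-1 x
    denom = b + sum(xi * ri for xi, ri in zip(x, rx))
    k = [ri / denom for ri in rx]                    # Kalman gain
    xr = mat_vec(list(zip(*r_inv)), x)               # x^T R^-1 as a vector
    n = len(x)
    new_r_inv = [
        [(r_inv[i][j] - k[i] * xr[j]) / b for j in range(n)]
        for i in range(n)
    ]
    y = sum(wi * xi for wi, xi in zip(w, x))         # actual summation output
    new_w = [wi + ki * (d - y) for wi, ki in zip(w, k)]
    return new_r_inv, new_w


# Toy problem: recover d = 2*x1 - 1*x2 from noiseless samples.
samples = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0),
           ([1.0, 1.0], 1.0), ([2.0, 1.0], 3.0)]
r_inv = [[1e6, 0.0], [0.0, 1e6]]   # large initial inverse (assumed start)
w = [0.0, 0.0]
for _ in range(5):                  # a few passes over the data
    for x, d in samples:
        r_inv, w = rls_step(r_inv, w, x, d)
print(w)
```

In a full network this step would be applied per layer, with the desired summation output obtained from the inverse activation for the output layer and the backpropagated error scaled by a step size for the hidden layers.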

3 SIMULATION RESULTS

This section presents the experiments carried out and the simulation results obtained from running our intrusion detection model with different running parameters. It also discusses these results and provides a comparison with previous works. To build our model, we used an Intel® Core™ i7 4Due 2.4, 1.8 GHz CPU and 8.0 GB RAM configured with Windows 10. The model was developed in the MATLAB® 2018b [24] environment.
Regarding our dataset, as stated in our model exposition, the NSL-KDD dataset was chosen as our training and testing benchmark dataset. Although we could have used the KDD Cup’99 dataset for this purpose, the immensity of the KDD Cup’99 dataset posed a serious issue. Due to the high redundancy of records, many machine-learning and artificial intelligence-based intrusion models run on the KDD Cup’99 dataset showed high performance, reaching up to 99% in all aspects of performance measurement, without notable trade-offs or consolidated tuning operations. Therefore, it is inequitable to use the KDD Cup’99 dataset as a basis for comparing different machine learning models in terms of detection performance. According to [9], the redundancy of the KDD Cup’99 dataset is 78% and 75% in the train and test datasets, respectively. Thus, intelligent systems/models are trained using duplicated records. To make it worse, these systems/models are validated and tested using duplicated records as well. Thus, the problem of inaccurate and unreliable detection performance is amplified. On the other hand, despite the merits of the NSL-KDD dataset, removing the redundancy of the KDD Cup’99 dataset aggravates the imbalance between high-frequency (DoS, Probe) and low-frequency (U2R, R2L) attacks, which was solved in our work using the data oversampling tactic stated earlier in our model exposition. In this layered model, the confusion matrix, which was generated for each layer, is the building block of all performance metrics. It includes significant information about actual and predicted output classes. Based on the confusion matrix, the following quantities are computed:
• True Positive (TP): the number of attack records correctly classified as attacks.
• True Negative (TN): the number of normal records correctly classified as normal.
• False Positive (FP) and False Negative (FN): these values count incorrect classifications. If an attack
record is classified as normal, an FN is recorded; this is a critical problem for the confidentiality and
availability of network resources, since the attacker passes through the intrusion detection system
undetected. An FP, on the other hand, is recorded when a normal record is classified as an attack. A false
positive is thus an alarm raised on acceptable behaviour, i.e. a false alarm. Table 2 places these concepts
in the framework of the confusion matrix.
Table 2: Typical confusion matrix for binary classification.

                     Predicted Normal    Predicted Attack
Actual Normal        TN                  FP
Actual Attack        FN                  TP
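
The four cells of Table 2 can be counted directly from paired label sequences. The following is a minimal sketch, not the authors' MATLAB implementation; the function name `confusion_counts` and the label strings "normal" and "attack" are illustrative assumptions.

```python
from typing import Sequence, Tuple


def confusion_counts(actual: Sequence[str],
                     predicted: Sequence[str],
                     positive: str = "attack") -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN), treating `positive` (assumed: "attack") as the positive class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1          # attack correctly flagged
        elif a != positive and p != positive:
            tn += 1          # normal correctly passed
        elif a != positive and p == positive:
            fp += 1          # false alarm on normal traffic
        else:
            fn += 1          # attack that slipped through as normal
    return tp, tn, fp, fn


# Hypothetical toy labels, only to exercise the function.
actual = ["normal", "attack", "attack", "normal", "attack"]
predicted = ["normal", "attack", "normal", "attack", "attack"]
tp, tn, fp, fn = confusion_counts(actual, predicted)
print(tp, tn, fp, fn)
```

In the layered model, one such set of counts would be produced per detection layer.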

Based on the confusion matrix defined in Table 2, we define the performance metrics specified
mathematically in (21)-(26) and fully described in [25]:

(21)

(22)

(23)

(24)

(25)

(26)
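
For reference, the standard confusion-matrix definitions of the six metrics reported in Table 4 and Table 8 are written out below. This is a sketch under the convention of Table 2 (attack as the positive class); which formula corresponds to which of the equation numbers (21)-(26) is an assumption, so no numbers are attached.

```latex
\begin{align*}
\mathrm{DR}  &= \frac{TP}{TP + FN}, &
\mathrm{ACC} &= \frac{TP + TN}{TP + TN + FP + FN}, \\
\mathrm{PRE} &= \frac{TP}{TP + FP}, &
\mathrm{FPR} &= \frac{FP}{FP + TN}, \\
\mathrm{FNR} &= \frac{FN}{FN + TP}, &
F_1          &= \frac{2 \cdot \mathrm{PRE} \cdot \mathrm{DR}}{\mathrm{PRE} + \mathrm{DR}}.
\end{align*}
```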
While most performance measures focus on detection rate and detection accuracy, in our work we adopt
two additional performance metrics: Cohen’s Kappa and the Matthews Correlation Coefficient. The main reason
for adding them is to measure the stability of the recursive network's performance.
The Matthews Correlation Coefficient (MCC) ranges between -1 and 1, where -1 indicates a completely wrong
binary classification and 1 a completely correct one. This metric gauges how robustly our classification
engine performs, and it is given by (27):

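\[
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{27}
\]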

In prediction modelling, the performance metrics in (21)-(26) do not provide a complete picture of the
classification, especially on a highly imbalanced dataset such as the one we work with. Cohen’s Kappa
coefficient κ is a powerful alternative because it handles imbalanced classes effectively; it is given by (28):

(28)

Where,

(29)

(30)
Where,

(31)

(32)
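
For a binary confusion matrix as in Table 2, the standard form of Cohen's Kappa can be written as follows. This is a sketch of the textbook definition with N = TP + TN + FP + FN; how it is split across equations (28)-(32), and the symbol names p_o, p_e, p_normal and p_attack, are assumptions.

```latex
\begin{align*}
\kappa &= \frac{p_o - p_e}{1 - p_e}, \\
p_o &= \frac{TP + TN}{N}, \qquad p_e = p_{\mathrm{normal}} + p_{\mathrm{attack}}, \\
p_{\mathrm{normal}} &= \frac{TN + FP}{N} \cdot \frac{TN + FN}{N}, \\
p_{\mathrm{attack}} &= \frac{TP + FN}{N} \cdot \frac{TP + FP}{N}.
\end{align*}
```

Here p_o is the observed agreement between actual and predicted classes, and p_e is the agreement expected if predictions followed only the class frequencies.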

Essentially, the Kappa coefficient indicates how much better our classification engine performs than a
classifier that merely follows the random frequency of the classes. Depending on the numeric value of κ, it
therefore reflects the robustness and stability of the classification engine against least-frequent,
difficult-to-catch attacks. Landis and Koch (1977) [26] considered κ values ≤ 0 an indicator of a useless
classifier, (0-0.2) slight agreement, (0.21-0.4) fair, (0.41-0.60) moderate, (0.61-0.80) substantial
and (0.81-1) almost perfect agreement.
Figure 6 shows the 1st detection layer of our proposed layered intrusion detection model. This layer is
considered the first line of defence; therefore, binary detection rate, binary detection accuracy and
response time are of high priority.

Fig. 6. Block diagram of first detection layer of proposed model associated with network free parameters.

For the first-layer simulations, we used 68,000 training records, while 40,000 records were used for testing
purposes. The training dataset was composed of {Normal = 33,901 | DoS = 23,390 | Probe = 5356 | R2L = 4640 |
U2R = 713}, whereas the testing dataset was composed of {Normal = 19,657 | DoS = 13,855 | Probe = 3446 |
R2L = 2079 | U2R = 277}. As can be noted from Figure 6, the classification performance of the recursive neural
network is affected by various free network parameters. To obtain performance as close to optimal as
possible, the initial weights and the initial value of the autocorrelation matrix were the first parameters
that had to be set. After 500 training iterations, these values were fixed and the other parameters were
varied, starting with the recursive gain ξ, until we obtained the most satisfactory performance, as
illustrated in Table 3 and Table 4.
Table 3: Confusion matrix for binary classification of 1st detection layer.

                     Predicted Normal    Predicted Attack
Actual Normal        19369               948
Actual Attack        2084                17573

Table 4. Performance measurements of the 1st detection layer.

Performance Metric                   Value
Detection Rate                       95.34%
Accuracy                             92.42%
Precision                            90.30%
False-positive Rate                  10.06%
Matthews correlation coefficient     0.8496
Cohen’s Kappa κ                      0.8482
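
As a worked check, accuracy, MCC and Cohen's Kappa can be recomputed from the four counts of Table 3, and the resulting κ mapped to the Landis and Koch bands quoted earlier. This is a minimal sketch using the standard formulas; the function names are illustrative assumptions, and treating each band's upper bound as inclusive is also an assumption.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def cohen_kappa(tp: int, tn: int, fp: int, fn: int) -> float:
    n = tp + tn + fp + fn
    p_o = (tp + tn) / n                                   # observed agreement
    p_e = ((tn + fp) * (tn + fn) + (tp + fn) * (tp + fp)) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0


def landis_koch_band(kappa: float) -> str:
    """Map kappa to the bands quoted from Landis and Koch (upper bounds taken as inclusive)."""
    if kappa <= 0:
        return "useless"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Counts copied from Table 3 (first detection layer).
TN, FP, FN, TP = 19369, 948, 2084, 17573
acc = accuracy(TP, TN, FP, FN)
mcc = matthews_corrcoef(TP, TN, FP, FN)
kappa = cohen_kappa(TP, TN, FP, FN)
print(f"ACC={acc:.4f}  MCC={mcc:.4f}  kappa={kappa:.4f}  band={landis_koch_band(kappa)}")
```

With these counts the three values come out at roughly 0.924, 0.849 and 0.848, in line with the accuracy, MCC and κ rows of Table 4. All three are unchanged if the two class labels are swapped, so they do not depend on which class is treated as positive.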

Further analysis of the normal and anomaly responses of the 1st detection layer reveals an important result:
the deep recursive neural network was able to detect DoS attacks with 0% FPR. Equally important, the DoS
detection rate reached 97.83% at the 1st detection layer and 98.27% after the records were re-filtered
through the second detection layer, as depicted in Figure 7.

Fig. 7. Graphical pipeline represents detection performance of DoS attack.

Consequently, anomalous traffic that the first detection layer catches and identifies as DoS corresponds to
a correct detection in about 97% of cases, which is a very important detection capability for an IoT security
system. DoS attacks are considered prominent attacks because they exhaust bandwidth, IoT network resources
(devices), CPU, etc., so that IoT devices are no longer accessible to legitimate users.
Besides DoS detection, the first detection layer can detect the Probe attack category. Although the probe
attack is not as well known as the DoS attack in IoT networks, the first layer of our proposed system detects
this type of attack with an 84.38% detection rate, and probe attacks that leak from the first layer are
detected by the second layer, raising the overall detection rate to 97.36%, as shown in Figure 8.

Fig. 8. Graphical pipeline represents detection performance of Probe attack.
Although the first detection layer shows high detection performance against DoS and Probe attacks, it shows a
considerable performance degradation against R2L and U2R attacks, as shown in Figure 9. A closer examination
of the normal response of the first detection layer reveals {R2L, U2R} attacks that were only weakly detected
by the first recursive network, which underlines the major role of the second filtering layer.
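
The contribution of the second layer to the per-category figures quoted above can be made explicit with a little arithmetic. The sketch below assumes that the second layer only sees traffic the first layer passed as normal and that an attack counts as detected if either layer flags it; the function names are illustrative.

```python
def cascaded_detection_rate(dr_first: float, dr_second_on_leaked: float) -> float:
    """Overall detection rate when layer 2 re-examines the attacks missed by layer 1."""
    return dr_first + (1.0 - dr_first) * dr_second_on_leaked


def leaked_fraction_caught(dr_first: float, dr_overall: float) -> float:
    """Share of layer-1 misses that layer 2 must catch to reach dr_overall."""
    return (dr_overall - dr_first) / (1.0 - dr_first)


# Detection rates quoted in the text for the first layer and for the full cascade.
probe_share = leaked_fraction_caught(0.8438, 0.9736)
dos_share = leaked_fraction_caught(0.9783, 0.9827)
print(f"Probe: second layer catches about {probe_share:.1%} of leaked attacks")
print(f"DoS:   second layer catches about {dos_share:.1%} of leaked attacks")
```

Under these assumptions, the second layer recovers a large share of the Probe attacks that leak through the first layer, while for DoS most of the detection already happens in the first layer.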

Fig. 9. Graphical pipeline represents detection performance of R2L and U2R attacks.

Fig. 10. MSE profiles versus number of iterations for (a), (b), (c) bad weight initializations and (d) a proper weight
initialization, which results in a smooth, monotonically decreasing MSE profile.
The performance experimental results presented in Table 3 and Table 4 are obtained for | | and
| |≤ applied on deep layered structure [ ] where the dataset was normalized to
range. The recursive gain was ξ = 2.25, with b = 0.99 and α = 0.2. The backpropagation step size µ was set to
1.5 for the first hidden layer and 0.2 for successive ones. As with all intelligent learning models, the
initial weights have a strong impact on network convergence, as demonstrated in Figure 10 (a), (b), and (c)
for different improper weight initializations.

On the other hand, adding a recursive gain ξ affects the stability and detection performance of 1st detection layer
as elaborated in Figure 11, Figure 12 and Figure 13.

Fig. 11. Effect of recursive gain ξ on detection performance in terms of false positive rate (alarm rate).

Fig. 12. Effect of recursive gain ξ on detection performance in terms of detection rate, detection accuracy and the precision of
detection.

Fig. 13. Effect of recursive gain ξ on detection performance in terms of detection rate, detection accuracy and the precision of
detection.

The main duty of the second detection layer is to detect the hidden attacks that leaked into the normal
response of the first detection layer. The focus is on the capability of this layer to filter
difficult-to-catch attacks {R2L, U2R} in a robust way, without depending on the statistical distribution of
these attacks, in addition to detecting the other attack types that leaked from the first filter. A detailed
block diagram of the second filtering layer, along with the parameters/hyperparameters of the recursive
network model, is depicted in Figure 14.

Fig. 14. Pictorial Block diagram of second detection layer associated with network parameters/hyperparameters.

Since the first layer was able to detect DoS attacks with 0% FPR, we excluded the 23,390 DoS records from the
training dataset, as shown in the general block diagram of the proposed system in Figure 4, and used the rest
to train the recursive neural network of the second layer. Applying | | for hidden layers and | |
and | | applying for the deep structure [ ]
where dataset was normalized in the range of . The recursive gain ξ was set to 0.15, with b = 0.99 and α = 0.2.
The backpropagation step size was set to 1.5 for the first hidden layer and 0.2 for successive ones. Table 5
lists the binary confusion matrix of the second filtering layer.
Table 5: Confusion matrix for binary classification of second detection layer.

                     Predicted Normal    Predicted Attack
Actual Normal        18341               1055
Actual Attack        1123                961

Complementary to the results reported in Table 5, the recursive gain ξ, as well as the number of nodes
(neurons) in the deep-layered structure of the second recursive network, has a major effect on various
aspects of detection performance. Not only the sensitivity, false positive rate and precision of the second
layer, but also the F-measure and the Matthews and Kappa coefficients, are dramatically altered by the value
of the recursive gain ξ, as demonstrated in Figure 15, Figure 16, and Figure 17.
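
To make the role of the recursive gain concrete, the toy example below feeds a fraction ξ of a unit's previous state back into its input. This is only an illustrative single-unit sketch: the update rule, the tanh activation, the weight value and the function name are all assumptions and do not reproduce the network equations of the proposed model.

```python
import math
from typing import List, Sequence


def run_recurrent_unit(inputs: Sequence[float], weight: float, xi: float) -> List[float]:
    """Toy unit: state_t = tanh(weight * (x_t + xi * state_{t-1})), starting from state 0."""
    state = 0.0
    states: List[float] = []
    for x in inputs:
        # xi scales how much of the previous state is returned to the input.
        state = math.tanh(weight * (x + xi * state))
        states.append(state)
    return states


constant_input = [0.5, 0.5, 0.5]
no_feedback = run_recurrent_unit(constant_input, weight=1.0, xi=0.0)
with_feedback = run_recurrent_unit(constant_input, weight=1.0, xi=0.15)
print(no_feedback)
print(with_feedback)
```

With ξ = 0 the unit has no memory and responds identically to identical inputs; with ξ = 0.15 each output also depends on the preceding state, which is the kind of dependence the gain controls.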

Fig. 15. Effect of recursive gain ξ on the predictive performance of second detection layer in terms of false positive rate.

Fig. 16. Effect of recursive gain ξ on the predictive performance of second detection layer in terms of F1-score, MCC and Kappa
coefficients.

Fig. 17. Effect of recursive gain ξ on the predictive performance of second detection layer in terms of Detection Rate,
Accuracy, Precision.

Because the previous state of the recursive network partially controls its current state, a question arises
as to whether the detection of rare and hard-to-detect attacks by the second detection layer is affected by
the number of iterations applied in the training phase. Figure 18 and Figure 19 demonstrate the effect of the
number of iterations on detection performance, in terms of detection rate, detection accuracy and precision
as the first group, and in terms of F1-score, MCC and κ coefficients as the second group.

Fig. 18. Effect of number of training iterations on system performance in terms of detection rate, accuracy and precision metrics
of 2nd detection layer.

Fig. 19. Effect of number of training iterations on system performance in terms of F1-score, Kappa and MCC metrics of 2nd
detection layer.

As can be seen from Figure 18 and Figure 19, after 200 training iterations the model shows relatively higher
false positive rates at ξ = 0.15 and ξ = 0.20. Nevertheless, it also shows the best performance there in terms
of detection rate, accuracy, precision, F1-score, and the MCC and κ coefficients. As the goal is to optimize
the predictive accuracy of the proposed model, we adopted ξ = 0.15 as the optimal setting. In other words,
returning more than 35% of the previous network state back to the inputs results in a noticeable degradation
in predictive performance, whereas removing the recursive paths, i.e. setting ξ = 0, results in worse
performance. Therefore, returning 15% of the previous state was adopted as the optimal setting. Moreover, the
network fails to converge at other ξ values such as ξ = {0.5, 0.9, 1, 1.5, 2} and ξ ≥ 2.5.
Table 6 compares the proposed detection model with other intrusion detection models/systems in terms of
computational overhead.
Table 6. Performance comparison with other IoT security models in terms of execution time.

Author, Year                          Time                                     Method

Illy et al. (2019) [20]               Training time: 7 seconds for all         Voting ensemble using KNN, random
                                      records of KDDTrain+ dataset.            forest and boosting of decision trees.
                                      Prediction time: less than 3 seconds
                                      for all records of KDDTest+ dataset.

Mohammadi and Sabokrou (2019) [18]    Each input record needs 45 µsec on       Semi-supervised learning composed of:
                                      average to be processed.                 encoder, decoder and neural networks
                                                                               both trained using adversarial learning
                                                                               algorithm.

Kumari and Varma (2017) [19]          Training time 3.42 seconds for           Hybrid combination of active SVM and
                                      18,939 records.                          FCM.

Proposed Model                        Each input record requires 66 µsec       Multi-layered deep recursive neural
                                      on average to be processed.              networks.
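
The per-record latencies in Table 6 translate directly into throughput figures. The sketch below is simple arithmetic that assumes records are processed one after another with no batching or parallelism; the function names are illustrative.

```python
def records_per_second(latency_us: float) -> float:
    """Sustained throughput implied by a per-record latency given in microseconds."""
    return 1_000_000 / latency_us


def batch_seconds(n_records: int, latency_us: float) -> float:
    """Time to process n_records sequentially at the given per-record latency."""
    return n_records * latency_us / 1_000_000


proposed_rate = records_per_second(66)          # proposed model, 66 µsec per record
test_set_time = batch_seconds(40_000, 66)       # the 40,000 testing records used above
print(f"{proposed_rate:.0f} records/s, {test_set_time:.2f} s for 40,000 records")
```

At 66 µsec per record this corresponds to roughly 15,000 records per second, or about 2.6 seconds for the 40,000 testing records.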

The overall model performance is illustrated in Figure 20, whereas Table 7 and Table 8 present a
comparison of the proposed detection model with other intrusion detection models/systems.

Fig. 20. Overall performance of proposed model.

Table 7. Multi-class overall model performance.

Attack Category      Detection Rate
DoS                  98.27%
Probe                97.35%
R2L                  64.93%
U2R                  77.25%

In our comparative analysis, we concentrate on works that specifically target the security of IoT systems
and networks, as well as works that apply artificial intelligence and machine learning techniques developed
using the NSL-KDD dataset.
Table 8. Performance comparison of different classifiers with binary class NSL-KDD dataset.

Author, Year                          DR        ACC       PRE       FPR      F1-Score   MCC       Kappa κ   FNR

Illy et al. (2019) [20]               -         85.81%    -         -        -          -         -         -
Pajouh et al. (2019) [8]              84.86%    -         -         4.86%    -          -         -         -
Mohammadi and Sabokrou (2019) [18]    -         91.39%    -         -        -          -         -         -
Kumari and Varma (2017) [19]          99.6%     -         -         5.86%    -          -         -         0.5%
Proposed Model                        94.27%    92.18%    90.23%    9.8%     92.29%     84.44%    84.36%    5.7%

† DR: Detection Rate. ACC: Accuracy. PRE: Precision. FPR: False Positive Rate. MCC: Matthews correlation
coefficient. Kappa κ: Cohen’s Kappa coefficient. FNR: False Negative Rate.
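
Two entries in the "Proposed Model" row of Table 8 can be cross-checked from the others, since the F1-score is the harmonic mean of precision and detection rate and the FNR is the complement of the detection rate. The following is a minimal sketch using these standard relations; the function names are illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (detection rate)."""
    return 2 * precision * recall / (precision + recall)


def false_negative_rate(detection_rate: float) -> float:
    """Share of attacks that are missed: FNR = 1 - DR."""
    return 1.0 - detection_rate


# DR and PRE of the proposed model, copied from Table 8.
f1 = f1_score(precision=0.9023, recall=0.9427)
fnr = false_negative_rate(0.9427)
print(f"F1 = {f1:.4f}, FNR = {fnr:.4f}")
```

These relations give an F1-score of about 92.2% and an FNR of about 5.7%, close to the 92.29% and 5.7% reported in the table.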

As shown in Table 8, our cascaded multilayered filtering using recursive neural networks produced better
detection accuracy than the other methods, and comparable performance in terms of detection rate and false
positive rate. Kumari and Varma (2017) [19] perform better in terms of FNR, and also report a higher
detection rate and a lower false positive rate, but do not report the remaining metrics. On the other hand,
our proposed model achieved almost perfect agreement, as reflected in the MCC and κ coefficients, which
reached 84.44% and 84.36% respectively; this, in turn, indicates the robustness of the model against
low-frequency attacks. Moreover, owing to the addition of recurrence elements, the detection performance of
the model is almost independent of the random frequency of the attacks, which makes it suitable for
real-time IoT security applications.

4 CONCLUSION

In this paper, we have proposed a Fog computing-based intrusion detection model for IoT network security. The
proposed model adopts a recurrent neural network trained by an enhanced version of the backpropagation
algorithm. The performance evaluation shows the effectiveness of adaptive cascaded filtering using the
recursive structure of neural networks, where each network is tuned to different parameters/hyperparameters
to enhance the detection of specific intrusion types. As a result, the model shows high sensitivity to DoS
attacks, one of the prominent attacks that thwart the development of IoT networks, and also detects other
attack categories such as Probe, R2L, and U2R at a competitive computational overhead, as each record
requires 66 µsec on average to be processed. Thus, the proposed model is capable of working properly and
efficiently in real-time environments.

5 ACKNOWLEDGEMENT

This work is fully supported by the Deanship of Scientific Research and Graduate Studies at Al-Hussein Bin
Talal University, Jordan.

REFERENCES

[1] V. Balasubramanian, S. Otoum, M. Aloqaily, I. Al Ridhawi, Y. Jararweh, Low-latency vehicular edge: A
vehicular infrastructure model for 5G, Simul. Model. Pract. Theory. 98 (2020) 101968.
doi:10.1016/j.simpat.2019.101968.
[2] I. Al Ridhawi, M. Aloqaily, Y. Kotb, Y. Jararweh, T. Baker, A Profitable and Energy-Efficient Cooperative
Fog Solution for IoT Services, IEEE Trans. Ind. Informatics. 3203 (2019) 1–1.
doi:10.1109/tii.2019.2922699.
[3] D. Evans, The Internet of Things: How the Next Evolution of the Internet Is Changing Everything, (2011).
[4] I. Al Ridhawi, M. Aloqaily, B. Kantarci, Y. Jararweh, H.T. Mouftah, A continuous diversified vehicular
cloud service availability framework for smart cities, Comput. Networks. 145 (2018) 207–218.
[5] K.A.P. da Costa, J.P. Papa, C.O. Lisboa, R. Munoz, V.H.C. de Albuquerque, Internet of Things: A survey
on machine learning-based intrusion detection approaches, Comput. Networks. 151 (2019) 147–157.
doi:https://doi.org/10.1016/j.comnet.2019.01.023.
[6] M. Quwaider, Y. Jararweh, A cloud supported model for efficient community health awareness, Pervasive
Mob. Comput. 28 (2016) 35–50.
[7] S. Alharbi, P. Rodriguez, R. Maharaja, P. Iyer, N. Bose, Z. Ye, FOCUS: A Fog computing-based security
system for the Internet of Things, CCNC 2018 - 2018 15th IEEE Annu. Consum. Commun. Netw. Conf.
2018-Janua (2018) 1–5. doi:10.1109/CCNC.2018.8319238.
[8] H.H. Pajouh, R. Javidan, R. Khayami, A. Dehghantanha, K.R. Choo, A Two-Layer Dimension Reduction
and Two-Tier Classification Model for Anomaly-Based Intrusion Detection in IoT Backbone Networks,
IEEE Trans. Emerg. Top. Comput. 7 (2019) 314–323. doi:10.1109/TETC.2016.2633228.
[9] M. Tavallaee, E. Bagheri, W. Lu, A.A. Ghorbani, A Detailed Analysis of the KDD CUP 99 Data Set, (2009).
[10] E. Anthi, L. Williams, P. Burnap, Pulse: an adaptive intrusion detection for the internet of things,
(2018) 35 (4 pp.). doi:10.1049/cp.2018.0035.
[11] E.M. Dovom, A. Azmoodeh, A. Dehghantanha, D.E. Newton, R.M. Parizi, H. Karimipour, Fuzzy pattern
tree for edge malware detection and categorization in IoT, J. Syst. Archit. 97 (2019) 1–7.
doi:10.1016/j.sysarc.2019.01.017.
[12] H. Wang, J. Gu, S. Wang, An effective intrusion detection framework based on SVM with feature
augmentation, Knowledge-Based Syst. 136 (2017) 130–139. doi:10.1016/j.knosys.2017.09.014.
[13] H. Zhang, C.Q. Wu, S. Gao, Z. Wang, Y. Xu, Y. Liu, An Effective Deep Learning Based Scheme for
Network Intrusion Detection, Proc. - Int. Conf. Pattern Recognit. 2018-Augus (2018) 682–687.
doi:10.1109/ICPR.2018.8546162.
[14] N. Moustafa, J. Slay, UNSW-NB15: a comprehensive data set for network intrusion detection systems
(UNSW-NB15 network data set), in: 2015 Mil. Commun. Inf. Syst. Conf., 2015: pp. 1–6.
[15] N. Koroniotis, N. Moustafa, E. Sitnikova, J. Slay, towards developing network forensic mechanism for
botnet activities in the IoT based on machine learning techniques, Lect. Notes Inst. Comput. Sci. Soc.
Telecommun. Eng. LNICST. 235 (2018) 30–44. doi:10.1007/978-3-319-90775-8_3.
[16] A. Dawoud, S. Shahristani, C. Raun, Deep learning and software-defined networks: Towards secure IoT
architecture, Internet of Things. 3–4 (2018) 82–89. doi:10.1016/j.iot.2018.09.003.
[17] E. Hodo, X. Bellekens, A. Hamilton, P.-L. Dubouilh, E. Iorkyase, C. Tachtatzis, R. Atkinson, Threat
analysis of IoT networks Using Artificial Neural Network Intrusion Detection System, (2016) 4–9.
[18] B. Mohammadi, M. Sabokrou, End-to-End Adversarial Learning for Intrusion Detection in Computer
Networks, (2019) 1–4. http://arxiv.org/abs/1904.11577.
[19] V.V. Kumari, P.R.K. Varma, A semi-supervised intrusion detection system using active learning SVM and
fuzzy c-means clustering, in: 2017 Int. Conf. I-SMAC (IoT Soc. Mobile, Anal. Cloud)(I-SMAC), 2017: pp.
481–485.
[20] P. Illy, G. Kaddoum, C.M. Moreira, K. Kaur, S. Garg, Securing Fog-to-Things Environment Using Intrusion
Detection System Based On Ensemble Learning, (2019) 15–18. http://arxiv.org/abs/1901.10933.
[21] M. Aloqaily, I. Al Ridhawi, H. Bany Salameh, Y. Jararweh, Data and service management in densely
crowded environments: Challenges, opportunities, and recent developments, IEEE Commun. Mag. 57 (4)
(2019) 81–87.
[22] N. Tepedelenlioglu, A Fast New Algorithm for Training Feedforward Neural Networks, IEEE Trans. Signal
Process. 40 (1992) 202–210. doi:10.1109/78.157194.
[23] S.S. Haykin, Kalman filtering and neural networks, Wiley Online Library, 2001.
[24] MATLAB, (2018). https://www.mathworks.com.
[25] M. Almiani, A. AbuGhazleh, A. Al-Rahayfeh, A. Razaque, Cascaded hybrid intrusion detection model
based on SOM and RBF neural networks, Concurr. Comput. Pract. Exp. (2019) e5233.
https://doi.org/10.1002/cpe.5233.
[26] J.R. Landis, G.G. Koch, The Measurement of Observer Agreement for Categorical Data, Biometrics. 33
(1977) 159. doi:10.2307/2529310.
