Received: 1 August 2019 Revised: 21 September 2019 Accepted: 14 October 2019

DOI: 10.1002/ett.3803

SPECIAL ISSUE ARTICLE

DL-IDS: a deep learning–based intrusion detection framework for securing IoT
Yazan Otoum1 Dandan Liu2 Amiya Nayak1

1School of Electrical Engineering & Computer Science (EECS), University of Ottawa, ON, Canada
2School of Computer Science, Wuhan University, Wuhan, China

Correspondence
Yazan Otoum, School of Electrical Engineering & Computer Science, University of Ottawa, ON K1N 6N5, Canada.
Email: yazan.otoum@uottawa.ca

Abstract
The Internet of Things (IoT) is comprised of numerous devices connected through wired or wireless networks, including sensors and actuators. Recently, the number of IoT applications has increased dramatically, including smart homes, vehicular ad hoc networks (VANETs), health care, smart cities, and wearables. As reported by IHS Markit (see https://technology.ihs.com), the number of connected devices is projected to jump from approximately 27 billion in 2017 to 125 billion in 2030, an average annual increment of 12%. Security is a critical issue in today's IoT field because of the nature of the architecture, the types of devices, the different methods of communication (mainly wireless), and the volume of data being transmitted over the network. Security becomes even more important as the number of devices connected to the IoT increases. To overcome the challenges of securing IoT devices, we propose a new deep learning–based intrusion detection system (DL-IDS) to detect security threats in IoT environments. There are many IDSs in the literature, but they lack optimal feature learning and data set management, which are significant issues that affect the accuracy of attack detection. Our proposed module combines the spider monkey optimization (SMO) algorithm and the stacked-deep polynomial network (SDPN) to achieve optimal detection recognition: SMO selects the optimal features in the data sets, and SDPN classifies the data as normal or anomalous. The types of anomalies detected by DL-IDS include denial-of-service (DoS), user-to-root (U2R), probe, and remote-to-local (R2L) attacks. Extensive analysis indicates that the proposed DL-IDS achieves better performance in terms of accuracy, precision, recall, and F-score.

1 INTRODUCTION

The IoT is made up of many different physical endpoints or sensors that are connected to one another and the Internet.
They can collect data from surrounding environments and share it between themselves. Small devices with low power and
limited resources can also send data to IoT gateways, where it is aggregated and connected to endpoint networks. Security
in the IoT field is becoming very challenging as the industry develops and grows. This is largely due to the heterogeneity
of IoT architecture, the different types of accessed devices, multiple approaches of communication,1-4 and the enormous
volume of data being transmitted throughout the network. As shown in Figure 1, IoT attacks increased during 2018 by
217.5%, from 10.3 million in 2017 to 32.7 million, according to the 2019 Cyber Threat Report logged by SonicWall.5
Security issues typically involve authentication, data privacy, availability, confidentiality, integrity, energy efficiency, single points of failure, and more.6,7 Most security threats can be categorized as follows:
• high-level threats: insecure interfaces, insecure software/firmware, and middleware security;

Trans Emerging Tel Tech. 2022;33:e3803. wileyonlinelibrary.com/journal/ett © 2019 John Wiley & Sons, Ltd. https://doi.org/10.1002/ett.3803

FIGURE 1 IoT attacks volume year-over-year comparison4

• intermediate-level threats: routing disruptions, replay attacks, insecure neighbor discovery, buffer reservation, sinkholes, authentication, session establishment, and privacy violations; and
• low-level threats: jamming, Sybil, spoofing, insecure initialization, and sleep deprivation attacks.
There are many intrusion detection systems (IDSs) in the literature that discuss IoT security problems. With traditional
IDS, data are collected by sensors and sent to an analysis engine, which examines the data and detects intrusions. Once
an intrusion is detected, the reporting system generates an alert for the network administrator. Intrusion detection system approaches can be further classified into signature-based and anomaly-based methods, according to whether system or network behaviors match a known attack signature or deviate from normal behavior beyond a threshold. Machine
learning (ML) and deep learning (DL) methods have recently been proposed8-11 to detect and mitigate security threats. For
attack detection, ML algorithms such as support vector machines (SVMs), K-nearest neighbor (KNN) algorithm, decision
trees (DTs), Bayesian algorithms, random forest, and K-means algorithm are typically applied in IDS. However, because
traditional methods usually lack optimal feature learning and data set management, the accuracy of attack detection cannot be guaranteed. Dealing with irrelevant features in high-dimensional data generated from thousands of IoT sensors and devices can increase the potential for over-fitting, making decisions based on noise and extending training time.
In this proposed model, we show how spider monkey optimization (SMO) benefits our system by choosing the optimal features in Section 5. Similarly, DL approaches such as deep belief network (DBN), convolutional neural network
(CNN), recurrent neural network (RNN), restricted Boltzmann machine (RBM), and deep AutoEncoder (AE) are also
used in IDSs. Notably, DL approaches have achieved better performance than ML, particularly on large data sets. Several
IoT systems produce substantial volumes of data, and DL methods are suitable for these systems because they support
high-dimensional features. They also enable deep linking, which allows IoT-based devices and their applications to interact with one another without human intervention. Though security and privacy are the primary goals of DL IoT data
processing, it has also been widely used for image processing, big data analytics, video processing and speech processing
in IoT, and smart city applications.12 Therefore, to address the challenges of protecting the IoT, we propose a novel, deep
learning–based framework for IoT environments. The main contributions of this paper are as follows:
• A novel deep learning–based IDS (DL-IDS) design to secure IoT environments with optimal attack detection efficiency;
• A model that can handle data sets with uncertain or missing data and redundant values;
• An SMO algorithm that improves on other optimization algorithms in terms of convergence time is proposed for feature learning. The optimal packet features are identified by the SMO algorithm and extracted from the cleaned data set;
• Based on learned optimal features, stacked-deep polynomial network (SDPN) classifies data as normal or anomalous
(DoS, U2R, probe, and R2L). Data set cleaning and optimal feature learning help SDPN achieve higher accuracy and
shorter training times compared to other methods.

The balance of this paper is organized as follows: Section 2 addresses the background of IoT security and deep learning,
whereas state-of-the-art research in IoT security using deep learning is discussed in Section 3. Section 4 explains the proposed DL-IDS with the algorithms, and Section 5 evaluates the performance of DL-IDS in terms of performance metrics.
In the final section, we summarize our contributions and highlight potential future research in DL-IDS IoT security.

2 BACKGROUND

2.1 IoT security


Internet-of-Things architecture consists of three layers (Figure 2), known as application, network, and perception.13 The
application layer is the top level in the architecture, and it provides services to users such as smart cities and smart grids.
The network layer is the most critical, as it integrates the connected devices using the infrastructure, including gateways
and switches, with different communication technologies and protocols, such as Wi-Fi, ZigBee, and Bluetooth. The perception layer, also known as the sensor layer, collects and processes the data from sensors, and can also control physical
objects in the near environment, such as actuators. The perception layer has two sections: sensor nodes like RFID tags
and wireless sensor networks that are self-organizing and have many sensor nodes deployed in large areas.
The IoT gateway is also very important, as it is responsible for data forwarding, node control, and communication protocol compatibility. It has a wide range of access capabilities, including seamless manageability and protocol interworking between traditional networks and WSNs. Each layer in an IoT system is susceptible to threats and should be secured by one or more techniques of
effective anomaly detection, such as authentication and encryption.14
Table 1 indicates the IoT security threats at each layer. Here, we discuss network layer threats and attacks on IoT
systems. Denial-of-service attacks are the most common, as they render network services unavailable to authorized users
by overwhelming all resources with a massive amount of traffic.15 The man-in-the-middle attack is a malicious device
controlled by the attacker that can intercept communications between two authorized devices by presenting itself as a
store. Then, it can forward data with the intent to modify, eavesdrop, and control the communication between the users. In
a spoofing attack, the attackers can control the address of a valid IoT device and, thereby, interfere with communications
between other network devices; they can also send malicious data that appear valid.

FIGURE 2 IoT architecture11



TABLE 1 Attack classifications

• Perception layer — Threats: node capture attacks, malicious code injection attacks, false data injection attacks, replay attacks, cryptanalysis attacks and side channel attacks, eavesdropping and interference, sleep deprivation attacks. Description: the security threats in the perception layer focus on forging the data aggregated from the IoT devices and on destroying perception devices.13

• Networks layer — Threats: DoS attacks, spoofing attacks, sinkhole attacks, wormhole attacks, man-in-the-middle attacks, routing information attacks, Sybil attacks, unauthorized access. Description: the security threats in this layer focus on affecting the availability of network resources, because the main purpose of the layer is to transmit aggregated data and most devices in the IoT environment are connected wirelessly.14

• Application layer — Threats: phishing attacks, malicious viruses/worms, malicious scripts. Description: the security threats in the application layer focus on software attacks, because the main purpose of the layer is to support services requested by the users.13

2.2 Deep learning for IoT


Deep learning (DL) is a state-of-the-art machine learning (ML) field with powerful analytics capabilities, used for appli-
cations such as computer vision, bioinformatics, and natural language processing. It is seen to demonstrate significant
performance improvement over some ML algorithms and has recently been used in some IoT applications in edge/fog
computing. These applications require substantial volumes of data to be transferred over the network, and DL gives better
performance with extensive data than ML. In addition, DL can learn new features that can be used to solve problems without human interaction.16

3 LITERATURE SURVEY ON DL APPROACHES

DL has been used for many IoT applications for the reasons discussed above. Lane et al17 used two of the most common
deep learning algorithms (ie, CNNs and deep neural networks [DNNs]) to build a model that processes sensor data using
three different types of IoT hardware employed in wearables and smartphones. A deep learning–based authentication approach using long short-term memory (LSTM), which learns the hardware imperfections of low-power radios, has also been proposed for IoT environments.18 Long short-term memory was designed to be sensitive to signal imperfections in device
identification. However, the method is only helpful for device identification as it cannot detect other significant attacks.
Long short-term memory has also been applied for botnet detection in IoT.19 In addition, an LSTM-based IoT threat
hunting approach involving OpCode extraction, feature extraction, and deep threat classification was proposed. Though optimal feature selection based on term frequency–inverse document frequency (TF-IDF) was also tested, it is only usable with small data sets and is unsuitable for real-time applications. In the work of Tama and Rhee,20 a DNN was used
for attack classifications in IoT environments, and its performance was evaluated by cross-validation and subsampling.
A grid search method was used to initialize the parameters of DNN; the parameter learning using grid search takes a
long time to initialize optimal parameters. The learning-based deep-Q-network was proposed to maintain security and
privacy in IoT healthcare environments21 by managing authentication, access control, and intermediate attacks in IoT.
However, attack detection based on nonoptimal features leads to degradation of classification accuracy. Botnet detection
on the Internet of Battlefield Things (IoBT) was performed using the deep eigenspace learning approach.22 The operation
code (OpCode) sequence of devices was used for malware detection before classification, and a feature-based graph was
constructed for each sample. As the involvement of large computations results in high computational complexity, this is
ineffective for IDS. For botnet detection, a bidirectional LSTM-RNN method was used in IoT,23 and word embedding was
applied for text recognition and to convert the attack packet into integer format. However, this method only performs well
with a limited number of attack vectors; when the number increases, it becomes ineffective. Stacked AutoEncoder (AE)
is an unsupervised DL method proposed to detect intrusion in fog-enabled IoT environments,24 and its attack detection
latency and scalability were improved by using fog computing in IoT. However, the running time of AE is relatively long,
and the detection accuracy is low because of nonoptimal features.

This literature survey reveals the following research challenges:


• Dealing with uncertain data limits the accuracy of DL modules, yet there are huge volumes of uncertain data in real-time environments.
• Processing irrelevant features (nonoptimal) degrades the accuracy of IDS and extends training time.
• A deep learning approach that can overcome all the above issues in LSTM and DBN is required.
All these problems are resolved by our proposed DL-IDS, which combines optimization algorithms and DL approaches.

4 PROPOSED DL-IDS

This section provides detailed explanations about using DL-IDS to secure IoT environments. The overall process of DL-IDS
is depicted in Figure 3 and shows that it begins with preprocessing to remove uncertainties in the normalized data set.
Preprocessing applies two effective processes: redundancy elimination and missing value replacement. The cleaned data
set is then processed with an optimal feature selection algorithm to extract relevant features, and based on these, the data
are classified into normal and anomalous data.
Here, we consider four categories of anomalies (DoS, Probing, U2R, and R2L), which can be defined as follows:
1. DoS: An attacker tries to make a service unavailable to legitimate users by uploading enormous unwanted packets.
Denial-of-service attacks include Apache2, Back, Land, Udpstorm, and Smurf.
2. Probing: In this type of attack, an attacker monitors the remote victim and tries to collect information by performing port scanning. Features such as the duration of the connection and source bytes can characterize probing attacks.
3. R2L: In this type of attack, an unauthorized attacker attempts to access the system without a system account.
Remote-to-local attacks include Ftpwrite, Guesspasswd, and SNMP.
4. U2R: In this scenario, the attacker has local access to a victim's machine and tries to obtain legitimate user privileges.
Buffer overflow, HTTP tunnel, and rootkit attacks are types of U2R attacks.

4.1 Preprocessing phase


This process is initialized with similarity measurements of the data in the data set, using the Minkowski distance to
compute the distance between each pair of data. Duplicate and redundant data are then removed from the data set and
fed into the next stage, where the missing values are replaced by nearest neighbor computation to avoid the classifier

FIGURE 3 The proposed DL-IDS framework



to bias toward more frequent records. Missing value replacement or imputation is the process of predicting the relevant
value of missing attributes in the data. This method identifies the nearest neighbors of the record with missing values and replaces each missing value with the mean of that attribute over those neighbors. Consider a data set with m records
in which each record has n attributes, and x attributes are missing in the data record R1 . Then, the K nearest neighbors of
R1 are determined according to the Euclidean distance (ED) as follows:


ED = Σ_{i=2}^{m} |R1 − Ri|². (1)

Based on the distance value, the K nearest neighbor records are identified for R1 , and the attribute x value of R1 is computed
from the attribute value of its K neighbors. The mean value of x is computed for the x values presented in neighbor records,
and the missing value is replaced by this mean value. This step eliminates all redundant data, and the missing values are
replaced to remove all uncertainties from the data set.
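As an illustration, the imputation step described above can be sketched as follows. This is a minimal sketch under our own assumptions (the function name, the default k, and the choice of complete records as neighbor candidates are ours, not the paper's):

```python
import numpy as np

def knn_impute(data, k=3):
    """Replace NaN entries with the mean of that attribute over the
    k nearest complete records (Euclidean distance on observed attributes)."""
    data = data.astype(float).copy()
    # candidate neighbors: records with no missing attributes
    complete = data[~np.isnan(data).any(axis=1)]
    for row in data:
        missing = np.isnan(row)
        if not missing.any():
            continue
        # distance to every complete record, using only the observed attributes
        dist = np.sqrt(((complete[:, ~missing] - row[~missing]) ** 2).sum(axis=1))
        neighbors = complete[np.argsort(dist)[:k]]
        # impute each missing attribute with the neighbors' mean for it
        row[missing] = neighbors[:, missing].mean(axis=0)
    return data
```

The redundancy-elimination step that precedes this can be realized, for example, with `np.unique(data, axis=0)` to drop duplicate records before imputation.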

4.2 Optimal feature selection phase


In the preprocessing step, all uncertainties are removed from the data set, and in this step, we select the optimal features
from the data set. Optimal feature selection minimizes feature learning time for our SDPN classifier. For optimal feature
selection,25 we adapt the SMO algorithm, which is a food foraging–based optimization algorithm inspired by fission-fusion
social systems. The algorithm has been used in many engineering domains because it requires fewer control parameters,
which makes it easier to apply in complex optimization problems. The following are the main control parameters with
their values:
• LocalLeaderLimit = S × D,
• GlobalLeaderLimit ∈ [S/2, 2 × S],
• maximum number of groups MG = S/10, and
• perturbation rate PR ∈ [0.1, 0.9],
where S is the population size and D is the dimension. Algorithm 1 explains the process of optimal feature selection with
the SMO algorithm.

The details of the SMO steps follow.
Initialization: At this step, all solutions are initialized as spider monkeys. In our work, the 41 features are initialized as a population of N spider monkeys, where each monkey SMi (i = 1, 2, … , N) is a D-dimensional vector, and SMi indicates the ith spider monkey (SM) in this population. Each dimension j of SMi is initialized as follows:

SMij = SMminj + rand[0, 1](SMmaxj − SMminj), (2)

where SMminj and SMmaxj are limits of SMij in the jth direction, and rand[0, 1] is a random value in the range of 0 to 1.
Local Leader Phase (LLP): In this stage, each SM updates its current position based on the experience of the local leader and the local group members, and the fitness value of the new position is computed. If the fitness value of the new position is better than that of the current one (higher being better), the spider monkey updates its position with the new one, as follows:

SMnewij = SMij + rand[0, 1](LLkj − SMij) + rand[−1, 1](SMrj − SMij), (3)

where the jth dimension of the ith SM is indicated by SMij, and the jth dimension of the kth local group leader position is denoted by LLkj. SMrj indicates the jth dimension of the rth SM, which is selected randomly from the kth local group, where r ≠ i.
Global Leader Phase (GLP): After concluding the local leader phase, all spider monkeys can update their positions
based on the experience of the global leader and local group members, as follows:

SMnewi𝑗 = SMi𝑗 + rand[0, 1](GL𝑗 − SMi𝑗 ) + rand[−1, 1](SMr𝑗 − SMi𝑗 ), (4)

where GLj is the jth dimension of the GLP and j ∈ {1, 2, … , D} is a randomly selected index. The positions of the spider
monkeys (SMi ) are updated based on their probability Probi , which is defined by the means of their fitness. In this way,
the best candidate will have higher probability to be the leader in the next phase. The probability Probi is calculated as
follows (or by any other function of fitness):
Probi = Fitnessi / Σ_{i=1}^{N} Fitnessi, (5)

where Fitnessi refers to the fitness value of the ith SM. The fitness of the last position is defined and compared to the
previous one, and the best is then selected. In this work, the evaluation of each feature (spider monkey) is analyzed by an
SDPN classifier to calculate the efficiency of the signified feature subset (ie, the accuracy).
Global Leader Learning phase (GLL): In this phase, the global leader modifies its location by applying a greedy
global selection to all the population sets, and the SM with the highest fitness value becomes the global leader. A counter GlobalLimitCount is incremented by 1 if the global leader position is not updated.
Local Leader Learning phase (LLL): In this phase, the local leader in each group updates its position by applying the
greedy selection within its group, and the SM with the highest fitness value in each group becomes the local leader. A counter LocalLimitCount is incremented by 1 if the local leader position is not updated.
Local Leader Decision (LLD): In this phase, if the local limit count reaches its threshold value, the group members change their positions using information from the global and local leaders, according to the equation:

SMnewi𝑗 = SMi𝑗 + U(0, 1) ∗ (GL𝑗 − SMi𝑗 ) + U(0, 1) ∗ (SMi𝑗 − LLk𝑗 ). (6)

Global Leader Decision (GLD): In this phase, if the global leader position is not updated within the allowed number of iterations (ie, the GlobalLeaderLimit), the leader divides the population into smaller local groups until the maximum number of groups (MG) is reached. If the global leader position is still not updated, it combines all the groups into one.
The best subset of features with high classification accuracy becomes the optimal result. Thus, the algorithm simulates the fission-fusion social system of SMs, and in the process, the optimal feature set is selected by the SMO algorithm. Table 2 shows the NSL-KDD features selected by SMO that give the best accuracy. Spider monkey optimization helps reach the highest accuracy values (as will be discussed in Section 4.3) by choosing a combination of optimal features that can achieve these values (ie, 20 features out of the full feature set). By contrast, the best accuracy we obtained without SMO was 96%.
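To make the search loop above concrete, here is a much-simplified, single-group sketch of SMO-style feature-subset selection. The binarization threshold (0.5), the greedy acceptance rule, and the fitness interface are our illustrative assumptions; the full algorithm additionally maintains multiple groups and the LLD/GLD phases described above:

```python
import random

def smo_feature_select(n_features, fitness, iters=30, pop=10, seed=0):
    """Single-group spider-monkey-style search over feature subsets.
    Positions live in [0, 1]^n; feature j is selected when its coordinate
    exceeds 0.5. `fitness` maps a tuple of selected feature indices to a
    score (eg, validation accuracy of a classifier); higher is better."""
    rng = random.Random(seed)

    def mask(pos):                      # position -> selected feature indices
        return tuple(j for j, v in enumerate(pos) if v > 0.5)

    def fit(pos):
        return fitness(mask(pos))

    # initialization (cf. Eq. (2)): random positions inside the bounds
    monkeys = [[rng.random() for _ in range(n_features)] for _ in range(pop)]
    leader = max(monkeys, key=fit)      # global leader = best position so far
    for _ in range(iters):
        for i, sm in enumerate(monkeys):
            partner = monkeys[rng.randrange(pop)]    # random group member
            # global-leader-phase update (cf. Eq. (4))
            new = [v + rng.random() * (leader[j] - v)
                     + rng.uniform(-1, 1) * (partner[j] - v)
                   for j, v in enumerate(sm)]
            new = [min(1.0, max(0.0, v)) for v in new]   # clamp to [0, 1]
            if fit(new) >= fit(sm):     # greedy acceptance
                monkeys[i] = new
        leader = max(monkeys, key=fit)
    return mask(leader)
```

In our setting, `fitness` would train and validate the classifier on the candidate subset and return its accuracy, with `n_features=41` for NSL-KDD.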

TABLE 2 Selected features by SMO

No.  Selected by SMO  Type        Feature              No.  Selected by SMO  Type        Feature
1                     Continuous  duration             23                    Continuous  count
2                     Symbolic    protocol_type        24                    Continuous  srv_count
3    √                Symbolic    service              25   √                Continuous  serror_rate
4                     Symbolic    flag                 26                    Continuous  srv_serror_rate
5                     Continuous  src_bytes            27                    Continuous  rerror_rate
6                     Continuous  dst_bytes            28                    Continuous  srv_rerror_rate
7    √                Symbolic    land                 29   √                Continuous  same_srv_rate
8    √                Continuous  wrong_fragment       30   √                Continuous  diff_srv_rate
9                     Continuous  urgent               31                    Continuous  srv_diff_host_rate
10                    Continuous  hot                  32                    Continuous  dst_host_count
11                    Continuous  num_failed_logins    33                    Continuous  dst_host_srv_count
12                    Symbolic    logged_in            34                    Continuous  dst_host_same_srv_rate
13                    Continuous  num_compromised      35                    Continuous  dst_host_diff_srv_rate
14                    Continuous  root_shell           36                    Continuous  dst_host_same_src_port_rate
15                    Continuous  su_attempted         37                    Continuous  dst_host_srv_diff_host_rate
16                    Continuous  num_root             38                    Continuous  dst_host_serror_rate
17                    Continuous  num_file_creations   39                    Continuous  dst_host_srv_serror_rate
18                    Continuous  num_shells           40                    Continuous  dst_host_rerror_rate
19                    Continuous  num_access_files     41                    Continuous  dst_host_srv_rerror_rate
20                    Continuous  num_outbound_cmds
21                    Symbolic    is_host_login
22                    Symbolic    is_guest_login

4.3 Classification phase


In the classification phase, we adopted an SDPN, which has been used with numerous small and large data sets and has shown high-performance results in the works of Shi et al26 and Zheng et al.27 In SDPN, multiple basic DPNs are stacked
to form a deep hierarchy to develop a new, supervised DNN algorithm with efficient layer-by-layer learning. The output
of each node is a quadratic function of its inputs, and the highest-level output representation is then fed to an SDPN
classifier. The elimination of irrelevant features by selecting optimal features minimizes SDPN training time, and the
optimal technique of building deep networks in SDPN allows data representation learning on small finite samples. The
algorithm starts with a simple network that could have significant bias but will not tend to over-fit (ie, low variance), and
as the network becomes deeper, the bias gradually decreases while the variance increases. Thus, this algorithm naturally traces a curve of solutions that can be used to control the bias-variance trade-off. Construction of the layers in DPN begins by assigning the degree-1 (linear) polynomial functions over the training data set, where the basis is built using singular value decomposition (SVD) to generate an independent set of vectors. The outputs of the first layer extend all
the values obtained by the linear functions on the training instances. Then, using the same concept, the basis for degree-2 and degree-3 polynomials can be derived. This means that the vector of values achieved by any degree-2 polynomial extends
the vector of values obtained by nodes in the first layer and the products of the outputs of every two nodes in the first layer.
Letting R = {r1, r2, … , rm} ∈ R^(m×d) gives a set of m training samples, where ri is a d-dimensional sample. Then, the
construction of the first layer in DPN begins by assigning the degree-1 polynomial function (linear) over the training data
as follows:

{(⟨w, [1 r1]⟩, … , ⟨w, [1 rm]⟩) : w ∈ R^(d+1)}, (7)

which is a (d+1)-dimensional linear subspace of R^m. Therefore, to build the basis, we need to find d+1 vectors w1, … , wd+1 such that {(⟨wj, [1 r1]⟩, … , ⟨wj, [1 rm]⟩)}_{j=1}^{d+1} are linearly independent vectors. These can be generated using SVD to obtain a linear transformation, specified by a matrix W, that maps [1 R] onto the generated basis, where the columns of W specify the d+1 linear functions that form the first layer of the network. The jth node of the first-layer functions is

n¹j(r) = ⟨Wj, [1 r]⟩, (8)



where {(n¹j(r1), … , n¹j(rm))}_{j=1}^{d+1} is the basis for all values obtained by degree-1 polynomials over the training data set. If F¹ refers to the m × (d + 1) matrix whose columns are these basis vectors, then F¹_{i,j} = n¹j(ri). This is how the network of the first layer, whose outputs extend all the values obtained by linear functions on the training data set, is constructed.
The basis of degree-2 and degree-3 can be determined in the same way; then, any degree s polynomial can be expressed as

Σ_i gi(r)hi(r) + k(r), (9)

where gi (r) are degree-1 polynomials, hi (r) are (s-1)-degree polynomials, and k (r) is the polynomial of degree at most s-1.
The nodes in layer 1 extend all degree-1 polynomials, and hence the polynomials gi, hi, and k; thus, we can express any degree-2 polynomial as
Σ_i (Σ_j α_j^(gi) n¹j(r)) (Σ_x α_x^(hi) n¹x(r)) + Σ_j α_j^(k) n¹j(r) = Σ_{j,x} n¹j(r) n¹x(r) (Σ_i α_j^(gi) α_x^(hi)) + Σ_j n¹j(r) α_j^(k), (10)

where the α's are scalar factors. This means the vector of values attained by any degree-2 polynomial extends both the vector of values from the first-layer nodes and the products of the outputs of every two nodes in the first layer. For algebraic representation, when the first layer was constructed, a matrix F¹ was formed whose columns extend all values obtained by degree-1 polynomials. Then, the matrix F̃² can be expressed as
polynomials. Then, the matrix [F F ̃ 2 ] can be expressed as

F̃² = [(F¹_1 ∘ F¹_1) … (F¹_1 ∘ F¹_{|F¹|}) … (F¹_{|F¹|} ∘ F¹_1) … (F¹_{|F¹|} ∘ F¹_{|F¹|})], (11)

where ∘ refers to the direct matrix product. We can define F² in the same way as F¹; then, the new matrix [F¹ F̃²] extends all possible values of degree-2 polynomials, and [F¹ F²]'s columns are a linearly independent basis for [F¹ F̃²]'s columns.

To construct layer 3, we simply repeat the previous process: at each iteration $s$, we maintain a matrix $F$ whose columns form a basis for the values obtained by all polynomials of degree $\leq s-1$. We can then write the new matrix as

$\tilde{F}^s = \left[\left(F^{s-1}_1 \circ F^1_1\right) \;\ldots\; \left(F^{s-1}_1 \circ F^1_{|F^1|}\right) \;\ldots\; \left(F^{s-1}_{|F^{s-1}|} \circ F^1_1\right) \;\ldots\; \left(F^{s-1}_{|F^{s-1}|} \circ F^1_{|F^1|}\right)\right].$  (12)

After some $\Delta - 1$ iterations, a matrix $F$ is constructed whose columns form a basis for all values obtained by polynomials of degree $\leq \Delta - 1$ over the training data set. Algorithm 2 summarizes the steps for building the DPN.
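The layer-by-layer construction above can be sketched numerically: starting from the degree-1 matrix $F^1$, each iteration forms all column-wise products with $F^1$ (the $\circ$ product of Equations (11) and (12)) and keeps a maximal linearly independent subset of columns. The function names and the greedy rank test below are our own illustrative choices, not the authors' implementation.

```python
import numpy as np

def independent_columns(M, tol=1e-8):
    """Greedily keep a maximal linearly independent subset of M's columns."""
    basis = []
    for col in M.T:
        trial = np.column_stack(basis + [col])
        if np.linalg.matrix_rank(trial, tol=tol) > len(basis):
            basis.append(col)
    return np.column_stack(basis)

def build_dpn_basis(R, max_degree):
    """R: m x d training matrix. Returns F whose columns form a basis for the
    values of all polynomials of degree <= max_degree over the rows of R."""
    m = R.shape[0]
    # Degree-1 basis: the constant function plus the coordinate functions.
    F1 = independent_columns(np.hstack([np.ones((m, 1)), R]))
    F = F1
    for _ in range(max_degree - 1):
        # The "circ" product of Eq. (12): element-wise products of every
        # column of the current basis with every column of the degree-1 basis.
        prods = np.column_stack([f * g for f in F.T for g in F1.T])
        F = independent_columns(np.hstack([F, prods]))
    return F
```

For example, with generic points in $d = 2$ dimensions the degree-2 basis has 6 columns ($1, x, y, x^2, xy, y^2$), matching the dimension of the space of bivariate quadratics.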

To verify that the model performs well on both training and test data (ie, to avoid over-fitting), we use the L2 regularization technique. This extends the loss function by adding a regularization term weighted by a regularization parameter known as the lambda value (𝜆), which controls how strongly the features are penalized. It is a hyperparameter with no fixed value; when its value increases, the features are heavily penalized and, as a result, the model becomes simpler. When its

FIGURE 4 Accuracy relating to SDPN layers

FIGURE 5 Comparison of time cost to build SDPN layers

value decreases, the features are penalized only minimally and the model becomes more complex. Figure 4 shows the accuracy related to the SDPN layers using different values of lambda (𝜆). This accuracy refers to the percentage of correct predictions, as discussed in detail in Section 5.2, and shows that each layer achieves a different accuracy. Through our experiments, we determined that the highest accuracy is achieved in layer 3 with a lambda value equal to 0.001. Moreover, with the SDPN algorithm it is not necessary to specify the number of iterations in advance. We can simply create the layers one by one, check the performance of the resulting network on a validation set, and stop once we reach satisfactory performance.
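The lambda-controlled penalty and the grow-until-satisfied strategy can be sketched as follows. This is a minimal illustration using a ridge (L2-penalized) least-squares readout per layer; the function names, the 0.5 decision threshold, and the target accuracy are our own assumptions, not the paper's implementation.

```python
import numpy as np

def ridge_fit(F, y, lam):
    # Minimize ||F w - y||^2 + lam * ||w||^2: a larger lam penalizes the
    # feature weights more heavily, yielding a simpler model.
    d = F.shape[1]
    return np.linalg.solve(F.T @ F + lam * np.eye(d), F.T @ y)

def accuracy(F, w, y):
    return float(np.mean((F @ w > 0.5) == (y > 0.5)))

def grow_layers(layer_features, y_train, y_val, lam=0.001, target=0.99):
    """layer_features: list of (F_train, F_val) pairs, one per SDPN layer.
    Fit a readout on each layer in turn and stop once validation accuracy
    is satisfactory, instead of fixing the number of layers in advance."""
    result = None
    for depth, (Ftr, Fva) in enumerate(layer_features, start=1):
        w = ridge_fit(Ftr, y_train, lam)
        result = (depth, accuracy(Fva, w, y_val))
        if result[1] >= target:
            break
    return result
```

With a small lambda such as 0.001 the penalty barely perturbs the fit; increasing it shrinks the weights toward zero and simplifies the decision function.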
With this methodology, all optimal features selected by the SMO algorithm are learned in each layer of the SDPN, and high-level, optimal features are obtained for classification. Building on these important features, the SDPN classifies the data into normal and anomalous classes based on the kernel function. Our proposed DL-IDS system performs preprocessing, feature selection, and classification for intrusion detection, and the elimination of uncertainties in the data set improves attack detection performance. Optimal feature selection improves the accuracy of the classifier, and the SDPN maps the optimal low-level features to high-level features.
Figure 5 shows comparisons of the time cost to build the SDPN layers with different values of lambda (𝜆) before and after using SMO, and highlights the time saved by SMO-based optimal feature selection. SMO reduced training time by an average of 66.59% when the dimension of the data set was reduced from 41 features to 20 features. This also increased the classification speed and decreased the number of required calculations, lowering the algorithm's running time and indicating that the model can be used in real-time environments.
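The reported 66.59% figure is the average, over the lambda settings, of the relative training-time reduction. For a single configuration, the reduction can be computed as in this small sketch (the function name is our own):

```python
def percent_reduction(t_before, t_after):
    """Relative training-time reduction (%) after SMO feature selection,
    eg, comparing layer-build time with 41 features vs 20 features."""
    return 100.0 * (t_before - t_after) / t_before
```

For instance, a build that drops from 3.0 s to 1.0 s is a reduction of about 66.7%.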

TABLE 3 Data set analysis

Traffic            Training    Testing
Normal               67 343      9 711
Attacks: DoS         45 927      7 458
         Probe       11 656      2 754
         R2L             995      2 421
         U2R              52        200
Total               125 973     22 544

5 PERFORMANCE EVALUATION

In this section, we analyze the performance of the proposed DL-IDS system based on significant performance metrics such as accuracy, precision, recall, and F1-score.

5.1 Data set description


To evaluate the performance of DL-IDS, we used the NSL-KDD benchmark data set, which is the most recent version of the KDD'99 data set.28 This data set contains DoS, R2L, probe, and U2R cyber-attacks and consists of 125,973 records for training and 22,544 records for testing. It involves 41 features, including duration, protocol, service, flag, source bytes, and destination bytes.
A brief analysis of the NSL-KDD data set is provided in Table 3. In DL-IDS, the data set first undergoes the preprocessing phase, in which redundant data are eliminated and missing values are replaced. Then, optimal features are selected by the SMO algorithm, and further feature learning and classification with the optimal features is performed by the SDPN. We used Python-TensorFlow on a CPU Core i7–based machine to simulate our proposed framework; for optimal feature selection, a population of size 41 was initialized in the SMO algorithm and 100 iterations were performed.
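The preprocessing step described above (eliminating redundant records and replacing missing values) can be sketched with pandas. The imputation rules below (median for numeric columns, mode for categorical ones) are our own illustrative assumptions; the paper does not specify how missing values are replaced.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop duplicate records, then fill missing values:
    numeric columns with the median, categorical columns with the mode."""
    df = df.drop_duplicates().copy()
    num = df.select_dtypes(include="number").columns
    df[num] = df[num].fillna(df[num].median())
    cat = df.columns.difference(num)
    if len(cat) > 0:
        df[cat] = df[cat].fillna(df[cat].mode().iloc[0])
    return df
```

The column names used in any call (eg, "duration", "protocol_type") would come from the 41 NSL-KDD features; the function itself is schema-agnostic.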

5.2 Performance metrics


In this work, we considered performance metrics such as accuracy, precision, recall, and F1-score for comparison, which are defined as follows.
Accuracy: This is defined as the percentage of correct predictions; that is, the percentage of traffic that is classified correctly. It is the ratio of correct detections to the total number of records in the data set and can be computed as follows.

Accuracy = (TP + TN)/(TP + TN + FP + FN)  (13)

Accuracy is computed from the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts.
Precision: This metric describes the classifier's ability to raise alerts only for genuine attacks; that is, to avoid misclassifying normal data. It is defined as the number of TPs divided by the sum of the number of TPs and the number of FPs, as follows.

Precision = TP/(TP + FP)  (14)

Recall: Recall is the ratio of the number of attack records correctly classified to the number of all actual attack events and can be computed as follows.

Recall = TP/(TP + FN)  (15)

Here, FN represents the false negatives.
F1-Score: This is defined as the harmonic mean of recall and precision, which can be computed as follows.

F1-Score = 2TP/(2TP + FP + FN)  (16)

All these metrics are significant for evaluation of the classifiers.
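Equations (13) to (16) can be collected into a small helper; the confusion-matrix counts used in the usage note below are arbitrary illustrative values, and the function name is our own.

```python
def metrics(tp, tn, fp, fn):
    """Compute Equations (13)-(16) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (13)
    precision = tp / (tp + fp)                   # Eq. (14)
    recall = tp / (tp + fn)                      # Eq. (15)
    f1 = 2 * tp / (2 * tp + fp + fn)             # Eq. (16)
    return accuracy, precision, recall, f1
```

Note that Equation (16) is algebraically identical to the harmonic mean 2 · Precision · Recall / (Precision + Recall); for example, with TP = 8, TN = 7, FP = 2, FN = 3, both forms give the same F1.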

5.3 Comparative analysis


Figure 6 shows comparisons of our model with four deep feature embedding learning (DFEL) classifiers: GBT, KNN, DT, and SVM. The DFEL architecture has been proposed for intrusion detection in IoT environments.29 The most

FIGURE 6 Comparison of our model with DFEL

important performance evaluation metrics for attack detection include accuracy, precision, recall, and F1-score, and these are used here to compare our model with the DFEL versions. Accuracy, as expressed in Equation (13), evaluates the ability of the classifiers to correctly detect intrusions. DFEL is based on different machine learning classifiers and was intended to reduce the data dimensions through the use of edge-of-deep and transfer learning. Though DFEL minimizes training time, it does not improve the detection accuracy of its classifiers, whereas our DL-IDS system benefits from an effective preprocessing process and optimal feature selection. Another metric that is significantly improved is precision, as shown in Figure 6. Precision is important because it expresses the rate at which FPs occur relative to the number of TPs. Traffic classified as malicious requires an analyst's investigation, and high precision results in fewer FPs per alert for a security analyst to manage. The ideal precision is 1, which means every alert shown to an analyst is a TP. Our model's precision advantage arises because DFEL cannot predict the attack class of the data, as it is designed to classify network traffic as normal or abnormal only. It also cannot deal with uncertain data sets, which is why the DFEL-KNN and DFEL-DT classifiers each report identical values across their performance metrics. For such a model, the accuracy equals the sensitivity (recall, or TP rate) and equals the selectivity (true negative rate), meaning the model is balanced and classifies positive and negative instances correctly at the same rate. Though sensitivity and specificity are important in IDS settings, being balanced is not necessarily positive. Our model, in contrast, has higher precision than recall, which means it returned substantially more relevant results than irrelevant ones.
Similarly, we compared our model with the deep learning–based attack detection mechanism (D-DL),30 the results of
which are shown in Figure 7. In D-DL, a deep learning–based cyber-attack detection mechanism for IoT was distributed
over fog nodes using the NSL-KDD data set. It was concluded that though the DL approach outperforms shallow learning
methods, the detection efficiency is limited and requires further analysis with respect to payload-based detection. Based
FIGURE 7 Comparison of our model with D-DL

TABLE 4 Performance comparison on the NSL-KDD data set

Model       Accuracy (%)   Precision (%)   Recall (%)   F1-Score (%)
DL-IDS      99.02          99.38           98.91        99.14
D-DL        98.27          88.85           96.50        92.52
DFEL GBT    98.54          98.54           98.53        98.53
DFEL KNN    98.82          98.82           98.82        98.82
DFEL DT     98.77          98.77           98.77        98.77
DFEL SVM    98.86          98.86           98.87        98.86

on the calculated average values of each performance metric of the deep model over the five classes (ie, Normal, DoS, Probe, R2L, and U2R), it is evident that DL-IDS achieves better accuracy, precision, recall, and F1-score performance, because D-DL is trained on nonoptimal features. Figure 7 also highlights the trade-off between precision and recall in D-DL: high recall with low precision means most positive instances were correctly recognized (low FN), but the IDS also flags many instances of acceptable behavior, producing a large number of FPs (ie, false alarms). For any classifier, the F1-score is critical for measuring performance because it gives the harmonic mean of the precision and recall metrics. In Figures 6 and 7, we compared the DL-IDS F1-score with those of the other models and achieved better F1-scores. Therefore, our proposed DL-IDS, with its nearest neighbor–based preprocessing, SMO-based feature selection, and SDPN-based classification, provides better security for IoT environments than previous models. In Table 4, we compare the performance metric values of our proposed DL-IDS with the existing works DFEL and distributed DL (D-DL).

6 CO N C LU S I O N

Due to the growing number of IoT-based networks (see the report by IHS Markit31), this paper proposed a novel DL-IDS to identify severe anomalies. In DL-IDS, an SMO algorithm is used to extract the most relevant features from the data set. An SDPN is then applied to learn these optimal features and classify the data as normal or anomalous across different attack categories (eg, DoS, U2R, R2L, and probe). We evaluated our DL-IDS system using the NSL-KDD data set, and it achieved superior results in accuracy (99.02%), precision (99.38%), recall (98.91%), and F1-score (99.14%). For future research, we intend to evaluate DL-IDS with different classifiers, including Naïve Bayes, DT, and random forest, and with various data sets, such as KDD-99 and UNSW-NB15.

ORCID
Yazan Otoum https://orcid.org/0000-0002-5500-3060

REFERENCES
1. Balasubramanian V, Zaman F, Aloqaily M, Al Ridhawi I, Jararweh Y, Salameh HB. A mobility management architecture for seamless
delivery of 5G-IoT services. In: Proceedings of the 2019 IEEE International Conference on Communications (ICC); 2019; Shanghai, China.
2. Al Ridhawi I, Aloqaily M, Kotb Y, Jararweh Y, Baker T. A profitable and energy-efficient cooperative fog solution for IoT services. IEEE
Trans Ind Inform. 2019.
3. Abbas N, Asim M, Tariq N, Baker T, Abbas S. A mechanism for securing IoT-enabled applications at the fog layer. J Sens Actuator Netw.
2019;8(1):16.
4. Kotb Y, Al Ridhawi I, Aloqaily M, Baker T, Jararweh Y, Tawfik H. Cloud-based multi-agent cooperation for IoT devices using
workflow-nets. J Grid Comput. 2019:1-26.
5. SonicWall Inc. 2019 SonicWall cyber threat report. 2019. https://www.sonicwall.com. Accessed June 2019.
6. Kouicem DE, Bouabdallah A, Lakhlef H. Internet of Things security: a top-down survey. Computer Networks. 2018;141:199-221.
7. Tariq N, Asim M, Al-Obeidat F, et al. The security of big data in fog-enabled IoT applications including blockchain: a survey. Sensors.
2019;19(8):1788.
8. Cañedo J, Skjellum A. Using machine learning to secure IoT systems. In: Proceedings of the 2016 14th Annual Conference on Privacy,
Security and Trust (PST); 2016; Auckland, New Zealand.
9. AL-Hawawreh M, Moustafa N, Sitnikova E. Identification of malicious activities in industrial Internet of Things based on deep learning
models. J Inf Secur Appl. 2018;41:1-11.
10. Asad M, Asim M, Javed T, Beg MO, Mujtaba H, Abbas S. DeepDetect: detection of distributed denial of service attacks using deep learning.
Comput J. 2019.
11. Zafar S, Jangsher S, Bouachir O, Aloqaily M, Othman JB. QoS enhancement with deep learning-based interference prediction in mobile
IoT. Computer Communications. 2019;148:86-97.
12. Aloqaily M, Otoum S, Al Ridhawi I, Jararweh Y. An intrusion detection system for connected vehicles in smart cities. Ad Hoc Netw. 2019.

13. Lin J, Yu W, Zhang N, Yang X, Zhang H, Zhao W. A survey on Internet of Things: architecture, enabling technologies, security and privacy,
and applications. IEEE Internet Things J. 2017;4(5):1125-1142.
14. Adat V, Gupta BB. Security in Internet of Things: issues, challenges, taxonomy, and architecture. Telecommunication Systems.
2018;67(3):423-441.
15. Doshi R, Apthorpe N, Feamster N. Machine learning DDoS detection for consumer Internet of Things devices. In: Proceedings of the 2018
IEEE Security and Privacy Workshops (SPW); 2018; San Francisco, CA.
16. Li H, Ota K, Dong M. Learning IoT in edge: deep learning for the Internet of Things with edge computing. IEEE Network. 2018;32(1):96-101.
17. Lane ND, Bhattacharya S, Georgiev P, Forlivesi C, Kawsar F. An early resource characterization of deep learning on wearables, smart-
phones and Internet-of-Things devices. In: Proceedings of the 2015 International Workshop on Internet of Things towards Applications;
2015; Seoul, South Korea.
18. Das R, Gadre A, Zhang S, Moura JMF, Kumar S. A deep learning approach to IoT authentication. In: Proceedings of the 2018 IEEE
International Conference on Communications (ICC); 2018; Kansas City, MO.
19. HaddadPajouh H, Dehghantanha A, Khayami R, Choo K. A deep Recurrent Neural Network based approach for Internet of Things
malware threat hunting. Future Gener Comput Syst. 2018;85:88-96.
20. Tama BA, Rhee KH. Attack classification analysis of IoT network via deep learning approach. Res Br Inf Commun Technol Evol. 2017;3.
21. Shakeel PM, Baskar S, Dhulipala VRS, Mishra S, Jaber MM. Maintaining security privacy in health care system using learning based
deep-q-networks. J Med Syst. 2018;42:186.
22. Azmoodeh A, Dehghantanha A, Choo K-K. Robust malware detection for internet of (battlefield) things devices using deep eigenspace
learning. IEEE Trans Sustain Comput. 2019;4:88-95.
23. Mcdermott CD, Majdani F, Petrovski AV. Botnet detection in the Internet of Things using deep learning approaches. In: Proceedings of
the 2018 International Joint Conference on Neural Networks (IJCNN); 2018; Rio de Janeiro, Brazil.
24. Abeshu A, Chilamkurti N. Deep learning: the frontier for distributed attack detection in fog-to-things computing. IEEE Commun Mag.
2018;56(2):169-175.
25. Bansal JC, Sharma H, Jadon SS, Clerc M. Spider monkey optimization algorithm for numerical optimization. Memetic Computing.
2014;6(1):31-47.
26. Shi J, Zhou S, Liu X, Zhang Q, Lu M, Wang T. Stacked deep polynomial network based representation learning for tumor classification
with small ultrasound image dataset. Neurocomputing. 2016;194:87-94.
27. Zheng X, Shi J, Li Y, Liu X, Zhang Q. Multi-modality stacked deep polynomial network based feature learning for Alzheimer's disease
diagnosis. In: Proceedings of the 2016 IEEE 13th International Symposium on Biomedical Imaging (ISBI); 2016; Prague, Czech Republic.
28. Tavallaee M, Bagheri E, Lu W, Ghorbani AA. A detailed analysis of the KDD CUP 99 data set. In: Proceedings of the 2009 IEEE Symposium
on Computational Intelligence for Security and Defense Applications; 2009; Ottawa, Canada.
29. Zhou Y, Han M, Liu L, He JS, Wang Y. Deep learning approach for cyberattack detection. In: Proceedings of the IEEE Conference on
Computer Communications Workshops (INFOCOM WKSHPS); 2018; Honolulu, HI.
30. Diro A, Chilamkurti N. Distributed attack detection scheme using deep learning approach for Internet of Things. Future Gener Comput
Syst. 2018;82:761-768.
31. Howell J. Number of connected IoT devices will surge to 125 billion by 2030, IHS Markit says. https://technology.ihs.com. Accessed June
2019.

How to cite this article: Otoum Y, Liu D, Nayak A. DL-IDS: a deep learning–based intrusion detection framework for securing IoT. Trans Emerging Tel Tech. 2022;33:e3803. https://doi.org/10.1002/ett.3803
