Deep Learning Approach For Intelligent Intrusion Detection System

Received December 27, 2018, accepted January 3, 2019, date of current version April 11, 2019.
Digital Object Identifier 10.1109/ACCESS.2019.2895334
Deep Learning Approach for Intelligent Intrusion

Detection System
R. VINAYAKUMAR 1 , MAMOUN ALAZAB2 , (Senior Member, IEEE), K. P. SOMAN1 ,
PRABAHARAN POORNACHANDRAN3 , AMEER AL-NEMRAT4 ,
AND SITALAKSHMI VENKATRAMAN5
1 Center for Computational Engineering and Networking, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore 641112, India
2 College of Engineering, IT & Environment, Charles Darwin University, Casuarina, NT 0810, Australia
3 Centre for Cyber Security Systems and Networks, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India
4 School of Architecture, Computing, and Engineering, University of East London, London E16 2RD, U.K.
5 Department of IT, Melbourne Polytechnic, Prahran Campus, Melbourne, VIC 3181, Australia
Corresponding author: R. Vinayakumar (vinayakumarr77@gmail.com)

This work was supported in part by the Paramount Computer Systems, in part by the Lakhshya Cyber Security Labs, and in part by the
Department of Corporate and Information Services, Northern Territory Government of Australia.
ABSTRACT Machine learning techniques are being widely used to develop an intrusion detection sys-
tem (IDS) for detecting and classifying cyberattacks at the network-level and the host-level in a timely
and automatic manner. However, many challenges arise since malicious attacks are continually changing
and are occurring in very large volumes requiring a scalable solution. There are different malware datasets
available publicly for further research by cyber security community. However, no existing study has shown
the detailed analysis of the performance of various machine learning algorithms on various publicly available
datasets. Due to the dynamic nature of malware with continuously changing attacking methods, the malware
datasets available publicly are to be updated systematically and benchmarked. In this paper, a deep neural
network (DNN), a type of deep learning model, is explored to develop a flexible and effective IDS to detect
and classify unforeseen and unpredictable cyberattacks. The continuous change in network behavior and
rapid evolution of attacks makes it necessary to evaluate various datasets which are generated over the
years through static and dynamic approaches. This type of study facilitates to identify the best algorithm
which can effectively work in detecting future cyberattacks. A comprehensive evaluation of experiments of
DNNs and other classical machine learning classifiers are shown on various publicly available benchmark
malware datasets. The optimal network parameters and network topologies for DNNs are chosen through the
following hyperparameter selection methods with KDDCup 99 dataset. All the experiments of DNNs are run
till 1,000 epochs with the learning rate varying in the range [0.01–0.5]. The DNN model which performed
well on KDDCup 99 is applied on other datasets, such as NSL-KDD, UNSW-NB15, Kyoto, WSN-DS, and
CICIDS 2017, to conduct the benchmark. Our DNN model learns the abstract and high-dimensional feature
representation of the IDS data by passing them into many hidden layers. Through a rigorous experimental
testing, it is confirmed that DNNs perform well in comparison with the classical machine learning classifiers.
Finally, we propose a highly scalable and hybrid DNNs framework called scale-hybrid-IDS-AlertNet which
can be used in real-time to effectively monitor the network traffic and host-level events to proactively alert
possible cyberattacks.
INDEX TERMS Cyber security, intrusion detection, malware, big data, machine learning, deep learning,
deep neural networks, cyberattacks, cybercrime.
I. INTRODUCTION prone by various attacks from both internal and external

Information and communications technology (ICT) systems intruders [1]. These attacks can be manual and machine
and networks handle various sensitive user data that are generated, diverse and are gradually advancing in obfusca-
tions resulting in undetected data breaches. For instance,
The associate editor coordinating the review of this manuscript and the Yahoo data breach had caused a loss of $350 M and
approving it for publication was Junaid Arshad. Bitcoin breach resulted in a rough estimate of $70M loss [2].
2169-3536
2019 IEEE. Translations and content mining are permitted for academic research only.
VOLUME 7, 2019 Personal use is also permitted, but republication/redistribution requires IEEE permission. 41525
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
R. Vinayakumar et al.: Deep Learning Approach for Intelligent IDS
Such cyberattacks are constantly evolving with very sophis- required to persevere today’s rapidly increasing high-speed
ticated algorithms with the advancement of hardware, soft- network size, speed and dynamics. These challenges form
ware, and network topologies including the recent devel- the prime motivation for this work with a research focus on
opments in the Internet of Things (IoT) [4]. Malicious evaluating the efficacy of various classical machine learning
cyberattacks pose serious security issues that demand the classifiers and deep neural networks (DNNs) applied to NIDS
need for a novel, flexible and more reliable intrusion detec- and HIDS. This work assumes the following;
tion system (IDS). An IDS is a proactive intrusion detection • An attacker aims at pretence as normal user to remain
tool used to detect and classify intrusions, attacks, or viola- hidden from the IDS. However, the patterns of intru-
tions of the security policies automatically at network-level sive behaviors differ in some aspect. This is due to the
and host-level infrastructure in a timely manner. Based specific objective of an attacker for example getting an
on intrusive behaviors, intrusion detection is classified unauthorized access to computer and network resources.
into network-based intrusion detection system (NIDS) and • The usage pattern of network resources can be captured,
host-based intrusion detection system (HIDS) [5]. An IDS however the existing methods ends up in high false
system which uses network behavior is called as NIDS. The positive rate.
network behaviors are collected using network equipment via • The patterns of intrusions exist in normal traffic with a
mirroring by networking devices, such as switches, routers, very low profile over long time interval.
and network taps and analyzed in order to identify attacks and Overall, this work has made the following contributions to
possible threats concealed within in network traffic. An IDS the cyber security domain:
system which uses system activities in the form of various • By combining both NIDS and HIDS collaboratively,
log files running on the local host computer in order to detect an effective deep learning approach is proposed by
attacks is called as HIDS. The log files are collected via modeling a deep neural network (DNN) to detect
local sensors. While NIDS inspects each packet contents in cyberattacks proactively. In this study, the efficacy
network traffic flows, HIDS relies on the information of log of various classical machine learning algorithms and
files which includes sensors logs, system logs, software logs, DNNs are evaluated on various NIDS and HIDS datasets
file systems, disk resources, users account information and in identifying whether network traffic behavior is either
others of each system. Many organizations use a hybrid of normal or abnormal due to an attack that can be classi-
both NIDS and HIDS. fied into corresponding attack categories.
Analysis of network traffic flows is done using misuse • The advanced text representation methods of natural
detection, anomaly detection and stateful protocol analysis. language processing (NLP) are explored with host-level
Misuse detection uses predefined signatures and filters to events, i.e. system calls with the aim to capture the
detect the attacks. It relies on human inputs to constantly contextual and semantic similarity and to preserve the
update the signature database. This method is accurate in sequence information of system calls. The comparative
finding the known attacks but is completely ineffective in the performance of these methods is conducted with the
case of unknown attacks. Anomaly detection uses heuristic ADFA-LD and ADFA-WD datasets.
mechanisms to find the unknown malicious activities. In most • This study uses various benchmark datasets to conduct a
of the scenarios, anomaly detection produces a high false comparative experimentation. This is mainly due to the
positive rate [5]. To combat this problem, most organizations reason that each dataset suffers from various issues such
use the combination of both the misuse and anomaly detec- as data corruptions, traffic variety, inconsistencies, out
tion in their commercial solution systems. Stateful protocol of date and contemporary attacks.
analysis is most powerful in comparison to the aforemen- • A scalable hybrid intrusion detection framework called
tioned detection methods due to the fact that stateful protocol SHIA is introduced to process large amount of
analysis acts on the network layer, application layer and network-level and host-level events to automatically
transport layer. This uses the predefined vendors specification identify malicious characteristics in order to provide
settings to detect the deviations of appropriate protocols and appropriate alerts to the network admin. The proposed
applications. Though deep learning approaches are being framework is highly scalable on commodity hardware
considered more recently to enhance the intelligence of such server and by joining additional computing resources to
intrusion detection techniques, there is a lack of study to the existing framework, the performance can be further
benchmark such machine learning algorithms with publicly enhanced to handle big data in real-time systems.
available datasets. The most common issues in the exist- The code and detailed results are made publicly
ing solutions based on machine learning models are: firstly, available [7] for further research. The remainder of the chap-
the models produce high false positive rate [3], [5] with wider ter is organized as follows. Section II discusses various
range of attacks; secondly, the models are not generalizable stages of compromise according to attackers perspective.
as existing studies have mainly used only a single dataset Section III discusses the related works of similar research
to report the performance of the machine learning model; work done to NIDS and HIDS. Information of scalable
thirdly, the models studied so far have completely unseen framework, the mathematical details of DNNs and text
today’s huge network traffic; and finally the solutions are representation methods for intrusion detection is placed
41526 VOLUME 7, 2019

in Section IV. Section V includes information related to over a network or locally within a system. Now an attack can
major shortcomings of IDS datasets, problem formulation be defined as a set of actions that potentially compromises
and statistical measures. Section VI includes description of the confidentiality, data integrity, availability, or any kind of
datasets. Section VII and Section VIII includes experimental security policy of a resource. Primarily, an IDS system aims
analysis and a brief overview of proposed system and archi- at detecting all these types of attacks to prevent the com-
tecture design. Section IX presents the experimental results. puters and networks from malicious activities. In this work,
Conclusion, future work directions and discussions are placed we focus towards the categorization scheme as suggested by
in Section X. the DARPA Intrusion Detection Evaluation.
II. STAGES OF COMPROMISE: AN ATTACKER’S VIEW III. RELATED WORKS

Mostly, intrusions are initiated by unauthorized users named The research on security issues relating to NIDS and HIDS
as attackers. An attacker can attempt to access a computer exists since the birth of computer architectures. In recent
remotely via the Internet or to make a service remotely days, applying machine learning based solutions to NIDS and
unusable. Detection of intrusion accurately requires under- HIDS is of prime interest among security researchers and
standing the method to successfully attack a system. Gener- specialists. A detailed survey on existing machine learning
ally, an attack can be classified into five phases. They are based solutions is discussed in detail by [5]. This section
reconnaissance, exploitation, reinforcement, consolidation, discusses the panorama of largest study to date that explores
and pillage. An attack can be detected during the first three the field of machine learning and deep learning approaches
phases however once it reaches the fourth or fifth phase then applied to enhance NIDS and HIDS.
the system will be fully compromised. Thus, it is very difficult
to distinguish between a normal behavior and an attack. A. NETWORK-BASED INTRUSION DETECTION
During the reconnaissance phase, an attacker tries to collect SYSTEMS (NIDS)
information related to reachable hosts and services, as well Commercial NIDS primarily use either statistical measures
as the versions of the operating systems and applications or computed thresholds on feature sets such as packet length,
that are running. During the exploitation phase, an attacker inter-arrival time, flow size and other network traffic param-
utilizes a particular service with the aim to access the target eters to effectively model them within a specific time-
computer. A service may be identified as abusing, subverting, window [6]. They suffer from high rate of false positive and
or breaching. An abusing service includes stolen password or false negative alerts. A high rate of false negative alerts
dictionary attacks and subversion includes an SQL injection. indicates that the NIDS could fail to detect attacks more
After an illegal forced entry to a system, an attacker follows frequently, and a high rate of false positive alerts means the
camouflage activity and then installs supplementary tools and NIDS could unnecessarily alert when no attack is actually
services to take advantage of the privileges gained during taking place. Hence, these commercial solutions are ineffec-
the reinforcement phase. Based on the misused user account, tive for present day attacks.
an attacker tries to gain full system access. Finally, an attacker Self-learning system is one of the effective methods to
utilizes the applications that are accessible from the available deal with the present day attacks. This uses supervised,
user account. An attacker obtains a complete control over the semi-supervised and unsupervised mechanisms of machine
system in the consolidation phase and the installed backdoor learning to learn the patterns of various normal and malicious
which is used for communication purposes during the consol- activities with a large corpus of Normal and Attack net-
idation phase. The final phase is pillage where an attacker’s work and host-level events. Though various machine learning
possible malicious activities include theft of data and CPU based solutions are found in the literature, the applicability
time, and impersonation. to commercial systems is in early stages [9]. The existing
Since computers and networks are assembled and pro- machine learning based solutions outputs high false posi-
grammed by humans, there are possibilities for bugs in both tive rate with high computational cost [3]. This is because
the hardware and software. These human errors and bugs can machine learning classifiers learn the characteristic of sim-
lead to vulnerabilities [8]. Confidentiality, data integrity and ple TCP/IP features locally. Deep learning is a complex
availability are main pillars of information security. Authen- subnet of machine learning that learns hierarchical fea-
ticity and accountability are also plays an important role ture representations and hidden sequential relationships by
in information security. Generally attacks against the confi- passing the TCP/IP information on several hidden layers.
dentiality addresses passive attacks for example eavesdrop- Deep learning has achieved significant results in long stand-
ping, integrity addresses active attacks for example system ing Artificial intelligence (AI) tasks in the field of image
scanning attacks i.e. ’Probe’ and availability addresses the processing, speech recognition, natural language process-
attacks related to making network resources down so these ing (NLP) and many others [10]. Additionally, these per-
will be unavailable for normal users for example denial of formances have been transformed to various cyber security
service (’DoS’) and distributed denial of service (’DDoS’). tasks such as intrusion detection, android malware classifica-
IDS systems have limited capability to detect attacks related tion, traffic analysis, network traffic prediction, ransomware
to eavesdropping. ’Probe’ attack can be launched over either detection, encrypted text categorization, malicious URL
VOLUME 7, 2019 41527

detection, anomaly detection, and malicious domain name the feature relevance in terms of information gain. In addi-
detection [11]. This work focuses towards analyzing the tion, they presented the most relevant features for each class
effectiveness of various classical machine learning and deep label. Reference [21] discussed random forest techniques in
neural networks (DNNs) for NIDS with the publicly avail- misuse detection by learning patterns of intrusions, anomaly
able network-based intrusion datasets such as KDDCup 99, detection with outlier detection mechanism, and hybrid detec-
NSL-KDD, Kyoto, UNSW-NB15, WSN-DS and CICIDS tion by combining both the misuse and anomaly detection.
2017. They reported that the misuse approach worked better than
A large study of academic research used the de facto stan- winning entries of KDDCup 99 challenge results, and in
dard benchmark data, KDDCup 99 to improve the efficacy of addition anomaly detection worked better compared to other
intrusion detection rate. KDDCup 99 was used for the third published unsupervised anomaly detection methods. Overall,
International Knowledge Discovery and Data Mining Tools it was concluded that the hybrid system enhances the perfor-
Competition and the data was created as the processed form mance with the advantage of combining both the misuse and
of tcpdump data of the 1998 DARPA intrusion detection (ID) anomaly detection approaches [22], [23], [72]. In [24], an ID
evaluation network. The aim of the contest was to create algorithm using AdaBoost technique was proposed that used
a predictive model to classify the network connections into decision stumps as weak classifiers. Their system performed
two classes: Normal or Attack. Attacks were categorized into better than other published results with a lower false alarm
denial of service (’DoS’), ’Probe’, remote-to-local (’R2L’), rate, a higher detection rate, and a computationally faster
user-to-root (’U2R’) categories. The mining audit data for algorithm. However, the drawback is that it failed to adopt
automated models for ID (MADAMID) was used as feature the incremental learning approach. In [25], the performance
construction framework in KDDCup 99 competition [17]. of the shared nearest neighbor (SNN) based model in ID was
MADAMID outputs 41 features: first 9 features are basic studied and reported as the best algorithm with a high detec-
features of a packet, 10-22 are content features, 23-31 are traf- tion rate. With the reduced dataset they were able to conclude
fic features, and 32-41 are host-based features. The choices that SNN performed well in comparison to the K-means for
of available datasets are: (1) full dataset and (2) comple- ’U2R’ attack category. However, their work failed to show
mentary 10% data. The detailed evaluation results of KDD- the results on the entire testing dataset.
Cup 98 and KDDCup 99 challenge was published in [3]. In [26], Bayesian networks for ID was explored using
Totally, 24 entries were submitted in the KDDCup 98, in that Naive Bayesian networks with a root node to represent a
3 winning entries used variants of decision tree to whom class of a connection and leaf nodes to represent features
they showed only the marginal statistics significance in per- of a connection. Later, [27] investigated the application of
formance. The 9th winning entry in the contest used the Naive Bayes network to ID and through detailed experi-
1-nearest neighbor classifier. The first significant difference mental analysis, they showed that Bayesian networks per-
in performance was found between 17th and 18th entries. formed equally well and sometimes even better in ’U2R’ and
This inferred that the first 17 submissions method were robust ’Probe’ categories in comparison with the winning entries
and were profiled by [3]. The Third International Knowledge of KDDCup 99 challenge. In [28], a non-parametric den-
Discovery and Data Mining Tools Competition task remained sity estimation method based on Parzen-window estimators
as a baseline work and after this contest many machine learn- was studied with Gaussian kernels and Normal distribution.
ing solutions have been found. Most of the published results Without the intrusion data, their system was comparatively
took only the 10% data of training and testing and few of favorable to the existing winning entries that was based on
them used custom-built datasets. Recently, a comprehensive ensemble of decision trees. In [29], a genetic algorithm based
literature survey on machine learning based ID with KDDCup NIDS was proposed that facilitates to model both tempo-
99 dataset was conducted [18]. ral and spatial information to identify complex anomalous
After the challenge, most of the published results of behavior. An overview of ensemble learning techniques for
KDDCup 99 have used several feature engineering meth- ID was given in [30], and swarm intelligence techniques for
ods for dimensionality reduction [18]. While few studies ID using ant colony optimization, ant colony clustering and
employed custom-built datasets, majority used the same particle swarm optimization of systems were studied in [31].
dataset for newly available machine learning classifiers [18]. A comparative study in such research works show that the
These published results are partially comparable to the results descriptive statistics was predominantly used.
of the KDDCup 99 contest. Overall, a comprehensive literature review shows very
In [19], the classification model consists of two-stages: few studies use modern deep learning approaches for NIDS
i) P-rules stage to predict the presence of the class, and and the commonly used benchmark datasets for experimen-
ii) N-rules stage to predict the absence of the class. This tal analysis are KDDCup 99 and NSL-KDD [3], [32]–[34].
performed well in comparison with the aforementioned The IDS based on recurrent neural network (RNN) out-
KDDCup 99 results except for the user-to-root (’U2R’) cat- performed other classical machine learning classifiers in
egory. In [20], the significance of feature relevance analysis identifying intrusion and intrusion type on the NSL-KDD
was investigated for IDS with the most widely used dataset, dataset [32]. Two level approach proposed for IDS in which
KDDCup 99. For each feature they were able to express the first level extracts the optimal features using sparse
41528 VOLUME 7, 2019

autoencoder in an unsupervised way and classified using communication with the operating system via system calls,
softmax regression [33]. The application of stacked autoen- their behavior, ordering, type and length generates a unique
coder was proposed for optimal feature extraction in an trace. This can be used to distinguish between the known
unsupervised way where the proposed method is completely and unknown applications [12]. System calls of normal and
non-symmetric and classification was done using Random intrusive process are entirely different. Thus analysis of those
forest. Novel long short-term memory (LSTM) architecture system calls provides significant information about the pro-
was proposed and by modeling the network traffic informa- cesses of a system. Various feature engineering approaches
tion in time series obtained better performance. The proposed have been used for system call based process classification.
method performed well compared to all the existing methods They are N-gram [12], [13], sequence gram [14] and pair
and as well as KDDCup 98 and 99 challenge entries [3]. The gram [15]. An important advantage of HIDS is that it provides
performance of various RNN types were evaluated by [34]. detailed information about the attacks.
Various deep learning architectures and classical machine The three main components of HIDS, namely the data
learning algorithms were evaluated for anomaly based ID source, the sensor, and the decision engine play an important
on NSL-KDD dataset [74]. The configuration of SVM was role in detecting security intrusions. The sensor component
formulated as bi-objective optimization problem and solved monitors the changes in data source, and the decision engine
using hyper-heuristic framework. The performance was eval- uses the machine learning module to implement to the intru-
uated for malware and anomaly ID. The proposed framework sion detection mechanism. However, the benchmarking the
is very suitable for big data cyber security problems [75]. data source component requires much investigation.
To enhance the anomaly based ID rate, the spatial and Compiling the KDDCup 99 dataset involved the
temporal features were extracted using convolutional neu- data source component with system calls and Sequence
ral network and long short-term memory architecture. The Time-Delay Embedding (STIDE) approach used to analyze
performance was shown on both KDDCup 99 and ISCX the fixed length pattern of system calls to distinguish between
2012 datasets [76]. Two step attack detection method was normal and anomalous behaviors [13]. A large number of
proposed along with a secure communication protocol for decision engines have been used to analyze patterns of
big data systems to identify insider attack. In the first step, system calls to detect intrusions. Such a data source is
process profiling was done independently at each node and most commonly used among cyber security research com-
in second step using hash matching and consensus, process munity. Apart from system calls, since Windows operating
matching was done [77]. An online detection and estimation system (OS) does not provide a direct access to system calls,
method was proposed for smart grid system [78]. The method log entries [35] and registry entry manipulations [36] form
specifically designed for identifying false data injection and the other two most commonly used data sources. This work
jamming attacks in real-time and additionally provides online focuses on the decision engine component to benchmark the
estimates of the unknown and time-varying attack parameters data source.
and recovered state estimates [78]. A scalable framework Classical methods aim to find information about the nature
for ID over vehicular ad hoc network was proposed. The of the host activity by analyzing the patterns in the sequence
framework uses distributed machine learning i.e. alternating of system calls. While STIDE was most commonly used
direction method of multipliers (ADMM) to train a machine simple algorithm, Support Vector Machines (SVMs), Hid-
learning model in a distributed way to learn whether an den Markov Models (HMMs) and Artificial Neural Net-
activity normal or attack [79]. works (ANNs) are more recently adopted complex meth-
ods. In [37], N-gram feature extraction approach was used
B. HOST-BASED INTRUSION DETECTION SYSTEMS (HIDS) for compiling the ADFA-LD system call data and N-gram
Various software tools such as Metasploit, Sqlmap, Nmap, features were passed to different classical machine learning
Browser Exploitation provide the necessary framework to classifiers to identify and categorize attacks. In [39], in order
examine and gather information from target system vulner- to reduce the dimensions of system calls, K-means and KNN
abilities. Malicious attackers use such information to launch were experimented using a frequency based model. A revised
attacks to various applications like FTP server, web server, version of N-gram model was used in [38] to represent system
SSH server, etc. Existing methods such as firewall, cryptogra- calls with various classical machine learning classifiers for
phy methods and authentications aim to defend host systems both Binary and Multi-class categories. An approach for
against such attacks. However, these solutions have limita- HIDS based on N-gram system call representations with
tions and malicious attackers are able to gain unauthorized various classical machine learning classifiers was proposed
access to the system. To address this, a typical HIDS operates in [40]. To reduce the dimensions of N-gram, dimensionality
at host-level by analyzing and monitoring all traffic activities reduction methods were employed. In [41], frequency dis-
on the system application files, system calls and operating tribution based feature engineering approach with machine
system [73]. These types of traffic activities are typically learning algorithms was explored to handle the zero-day and
called as audit trials. A system call of an operating system is stealth attacks in Windows OS. In [42], an ensemble approach
a key feature that interacts between the core kernel functions for HIDS was proposed using language modeling to reduce
and low level system applications. Since an application makes the false alarm rates which is a drawback in classical methods.
VOLUME 7, 2019 41529

This method leveraged the semantic meaning and communi- The proposed scalable architecture employs distributed
cations of system call. The effectiveness of their methods was and parallel machine learning algorithms with various opti-
evaluated on three different publicly available datasets. mization techniques that makes it capable of handling very
Overall, the published results are limited in detecting the high volume of network and host-level events. The scalable
intrusions and cyberattacks using HIDS. Studies that show an architecture also leverages the processing capability of the
increase in detection rate of intrusions and cyberattacks also general purpose graphical processing unit (GPGPU) cores for
show an increase in false alarm rate. faster and parallel analysis of network and host-level events.
The pros and cons of NIDS and HIDS with its efficacy are The framework contains two types of analytic engines, they
discussed in detail by [16]. Major advantages of HIDSs are: are real-time and non-real-time. The purpose of analytic
HIDSs facilitate to detect local attacks and are unaffected by engine is to monitor network and host-level events to generate
the encryption of network traffic. Major disadvantage is that an alert for an attack. The developed framework can be scaled
they need all the configuration files to identify attack, but it out to analyze even larger volumes of network event data by
is a daunting task due to the huge amount of data. Allowing adding additional computing resources. The scalability and
access to big data technology in the domain of cyber security real-time detection of malicious activities from early warning
is of paramount importance, particularly IDS. The motivation signals makes the developed framework stand out from any
of this research is to develop a novel scalable platform with system of similar kind.
hybrid framework of NIDS and HIDS, which is capable of
handling large amount of data with the aim to detect the B. TEXT REPRESENTATION METHODS
intrusions and cyberattacks more accurately. System calls are essential in any operating system depict-
ing the computer processes and they constitute a humon-
gous amount of unstructured and fragmented texts that a
IV. PROPOSED SCALABLE FRAMEWORK typical HIDS uses to detect intrusions and cyberattacks.
Today’s ICT system is considerably more complex, con- In this research we consider text representation methods
nected and involved in generating extremely large volume to classify the process behaviors using system call trace.
of data, typically called as big data. This is primarily due to Classical machine learning approaches adopt feature extrac-
the advancement in technologies and rapid deployments of tion, feature engineering and feature representation meth-
large number of applications. Big data is a buzzword which ods. However, with advanced machine learning embedded
contains techniques to extract important information from approach such as deep learning, the necessity of the feature
large volume of data. Allowing access to big data technology engineering and feature extraction steps can be completely
in the domain cyber security particularly IDS is of paramount avoided. We adopt such advanced deep learning along with
importance [44]. The advancement in big data technology text representation methods to capture the contextual and
facilitates to extract various patterns of legitimate and mali- sequence related information from system calls. The follow-
cious activities from large volume of network and system ing feature representation methods in the field of NLP are
activities data in a timely manner that in turn facilitates used to convert the system calls into feature vectors in this
to improve the performance of IDS. However, processing study.
of big data by using the conventional technologies is often
• Bag-of-Words (BoW): This classical and most com-
difficult [43]. The purpose of this section is to describe the
monly used representation method is used to form
computing architecture and the advanced methods adopted in
a dictionary by assigning a unique number for each
the proposed framework, such as text representation methods,
system call. Term document matrix (TDM) and term
deep neural networks (DNNs) and the training mechanisms
frequency-inverse document frequency (TF-IDF) are
employed in DNNs.
employed to estimate the feature vectors. The drawback
is that it cannot capture the sequence information of
A. SCALABLE COMPUTING ARCHITECTURE system calls [46].
The technologies such as Hadoop Map reduce and Apache • N-grams: An N-gram text representation method has
Spark in the field of high performance computing is found the capability to preserve the sequence information of
to be an effective solution to process the big data and to pro- system calls. The size of N can be 1 (uni-gram), 2 (bi-
vide timely actions. We have developed scalable framework gram), 3 (tri-gram), 4 (four-gram), etc., which can be
based on big data techniques, Apache Spark cluster com- employed appropriately depending on the context.
puting platform [45]. Due to the confidential nature of the • Keras Embedding: This follows a sequential represen-
research, the scalable framework details cannot be disclosed. tation method to convert the system calls into a numeric
The Apache spark cluster computing framework is setup over form of vocabulary by simply assigning a unique num-
Apache Hadoop Yet Another Resource Negotiator (YARN). ber for each system call. The size of vocabulary defines
This framework facilitates to efficiently distribute, execute the number of unique system calls and their frequency
and harvest tasks. Each system has specifications(32 GB of occurrence places them in an ascending order within a
RAM, 2 TB hard disk, Intel(R) Xeon(R) CPU E3-1220 v3 lookup table. Each system call in a vector is transformed
@ 3.10GHz) running over 1 Gbps Ethernet network. to a numeric using the lookup table for assigning a
41530 VOLUME 7, 2019

corresponding index. We adopt a fixed length vector

method by transforming all vectors to the same length.
C. DEEP NEURAL NETWORK (DNN)

We employ an artificial neural network (ANN) approach
as the computational model since it is influenced by the
characteristics of biological neural networks to incorporate
intelligence in our proposed method. Feed forward neural
network (FFN), a type of ANN is represented as a directed
graph to pass various system information along edges from
one node to another without forming a cycle. We adopt a
multilayer perceptron (MLP) model which is a type of FFN
having three or more layers with one input layer, one or more
hidden layers and an output layer in which each layer has
many neurons or units in mathematical notation. We select
FIGURE 1. Architecture of a deep neural network (DNN).
the number of hidden layers by following a hyper parameter
selection method. The information is transformed from one
layer to another layer in a forward direction with neurons in
each layer being fully connected. MLP is defined mathemat- We employ DNNs as a more advanced model of the clas-
ically as O : Rm × Rn where m is the size of the input vector sical FFN with each hidden layer using the non-linear activa-
x = x1 , x2 , · · · , xm−1 , xm and n is the size of the output vector tion function, ReLU as it helps to reduce the state of vanishing
O(x) respectively. The computation of each hidden layer hi is and error gradient issue [47]. The advantage of ReLU is that
mathematically defined as it is faster than other non-linear activation functions and
facilitates training the MLP model with the large number of
hi (x) = f (wi T x + bi ) (1) hidden layers. The hidden layers define the depth of the neural
network and the maximum neurons define the width of the
where hi : Rdi −1 → Rdi , f : R → R, wi ∈ Rd×di−1 , b ∈ Rdi , neural network.
di denotes the size of the input, f is the non-linear activation The uniqueness of our method is in modeling the loss func-
function, which is either a sigmoid (values in the range [0, 1]) tions and the ReLU to maximize deep learning efficiently.
or a tangent function (values in the range [1, −1]). For These are described in detail.
the classification problem of Multi-class, our MLP model 1) Loss functions: In modeling an MLP, finding an
uses softmax function as the non-linear activation function. optimal parameter is essential towards achieving good
softmax function outputs the probabilities of each class and performance. This includes the loss function as an ini-
selects the largest value among probability values to give a tial step. A loss function is used to calculate the amount
more accurate value. The mathematical formulae for sigmoid, of difference between the predicted and target values.
tangent and softmax activation function are given below. This is defined mathematically as:
1
sigmoid = (2) d(t, p) =k t − p k22 (6)
1 + e−x
e2x − 1 where t denotes the target value and p denotes the
tan gent = 2x (3)
e +1 predicted value.
e xi Multi-class classification uses the negative log prob-
softmax(xi ) = Pn xj (4)
j=1 e ability with t as the target class and p(pad) as the
probability distributions as represented below:
where x defines an input.
Three-layer MLP with a softmax function in output layer d(t, p(pd)) = − log p(pd)t (7)
is same as a Multi-class logistic regression model. In general
terms, for many hidden layers, MLP is formulated as follows: However, the network receives a list of corrected
input-output set i_o = (i1 , o1 ), (i2 , o2 ), · · · , (in, on) in
H (x) = Hl (Hl−1 (Hl−2 (· · · (H1 (x))))) (5) the training process. Then, we aim to decrease the mean
of losses as defined below:
This way of stacking hidden layers is typically called deep
n
neural networks (DNNs). The architecture of deep neural 1X
loss(in, on) = d(oil, f (iris)) (8)
network (DNN) as shown in Figure 1 contains 1 hidden n
i=1
layer. It takes inputs x = x1 , x2 , · · · , xm−1 , xm and outputs
o = o1 , o2 , · · · , oc−1 , oc . However, all the connections and The loss function has to be minimized to get bet-
hidden layers along with its units are not shown in Figure 1. ter results in a neural network. A loss functions is
VOLUME 7, 2019 41531

defined as; well-defined times with well-defined protocols. Each con-

n nection record includes 100 bytes of information and labeled
1 X
Traini_o (θ ) ≡ Li_o (θ) = d(oi , fθ (ii )) (9) as either Normal or as an Attack with exactly one particular
n attack type. Each connection record has a vector and defined
i=1
where θ = (w1 , b1 , · · · , wn , bn ) as follows
Loss function minimization Li_o (θ) is done by follow- CV = (f1 , f2 , · · · , fn , cl) (14)
ing a right selection of the value θ ∈ Rd and inherently
includes the estimation of fθ (pi ) and ∇fθ (ii ) at the cost where f denotes features of length n, values of each f ∈ R
|i_o| and cl denotes a class label.
min L(θ ) (10) B. PROBLEM FORMULATION FOR HIDS

θ
In general, all the system events which are the system calls
Various optimization techniques exist and we adopt are collected for each process. Each process p is composed of
Gradient descent as it is most commonly used. Gradient sequence of system calls S = sp1 , sp2 , · · · spn where sp ∈ S,
descent uses the following rule to calculate and update sp is a finite set of system calls and S is the set of system
parameter repeatedly: calls used by the host. A sequence of system call information
θ new = θold − α∇θ L(θ) (11) is used to distinguish the behavior between the Normal and
Attack categories. The sp along with the label such as Normal
where α denotes learning rate and it is selected based or Attack can be used to learn the behaviors of Normal and
on a hyper parameter selection approach. To find a Attack activities.
derivative of L, backpropagation or backward propa-
gation of errors algorithm is adopted. Backpropagation C. DATASET LIMITATIONS
uses chain rule to compute θ ∈ Rd with the aim to Most of the datasets which represents the current network
minimize the loss function Li_o (θ). However, most of traffic attacks are private due to privacy and security issues.
the neural network uses an extension of backpropaga- On the other direction, the datasets which are publicly
tion called as stochastic gradient descent (SGD) for available are laboriously anonymized and suffer from var-
finding minimum θ. SGD uses a mini batch of train- ious issues. In particular they failed to validate that their
ing samples im_om, in which training samples i_o are datasets typically exhibit the real-world network traffic pro-
chosen randomly instead of using the entire training set file. KDDCup 99 is one of the most commonly used publicly
im_om ⊆ i_o. SGD update rule is given as: available datasets. Although with some known harsh criti-
θ new = θold − α∇θ J (θ; im(i) , om(i) ) (12) cisms, it has been continually used as an effective benchmark
dataset for many of the research study towards NIDS over the
where im(i) , om(i) denotes input-output pair training years. In contrast to critiques of strategy to create dataset, [50]
samples. revealed the detailed analysis of the contents and located
2) Rectified Linear Unit (ReLU ): Rectified linear unit the non-uniformity and simulated artifacts in the simulated
(ReLU ) is found to have a great proficiency and has the network traffic data. They strived to scale the performance
tendency to accelerate the training process [47]. ReLU of network anomaly detection between the KDDCup 99 and
was the main breakthrough in the neural network his- varied KDDCup 99. They reported that many of the net-
tory for reducing the vanishing and exploding gradient work attributes particularly, remote client address, TTL, TCP
issue. It’s found as the most efficient method in terms options and TCP window size are indicated as small and
of time and cost for training huge data in comparison limited range in KDDCup 99 datasets but actually exhibit to
to the classical non-linear activation function such as be of large and growing range in real world network traffic
sigmoid and tangent function [47]. We refer to neurons environment.
with this non linearity following [47]. The mathemati- In [51], discussions indicate why the machine learning
cal formula for ReLU is defined as follows classifiers have limited capability in detecting the attacks that
f (x) = max(0, x) (13) belong to content (’R2L’ and ’U2R’) category in KDDCup 99
dataset. With this dataset, none of the machine learning clas-
where x defines input. sifiers were able to improve the attack detection rate. They
admitted that the possibility of getting high attack detection
V. PROBLEM FORMULATION, DATASET LIMITATIONS rate in most of the cases is by producing new dataset with a
AND STATISTICAL MEASURES combination of training and testing datasets. In addition, [52]
A. PROBLEM FORMULATION FOR NIDS found that many ’snmpgetattack’ belongs to ’R2L’ category
Generally, the network traffic data is collected and stored attacks. As a result, in most of the cases, machine learning
in raw TCP dump format. Later, this data can be prepro- classifier poorly performs with this data.
cessed and converted into connection records. A connection DARPA / KDDCup 88 failed to evaluate the classical IDS
is simply a sequence of TCP packets starting and ending at and it was one of the major of many criticisms. To mitigate
41532 VOLUME 7, 2019

this [53] used Snort ID system on DARPA / KDDCup 98 tcp- the case of Binary classification. Let L and A be the number
dump traces. The system performed poorly, the accuracy and of Normal and Attack connection records in the test dataset,
the false positive rates were impermissible. This is mainly due respectively and the following terms are used for determining
to the system failed to detect the attacks belongs to the ’DoS’ the quality of the classification models:
and ’Probe’ category with a fixed signature. In contrast to this, • True Positive (TP) - the number of connection records
the detection performance of ’R2L’ and ’U2R’ is much better. correctly classified to the Normal class.
Despite of the harsh criticisms yet, KDDCup 99 is been • True Negative (TN ) - the number of connection records
the most widely used reliable benchmark dataset in most of correctly classified to the Attack class.
the study related to ID system evaluation and other security • False Positive (FP) - the number of Normal connec-
related tasks [55]. To resolve the inherent issues that are exists tion records wrongly classified to the Attack connection
in KDDCup 99, [55] proposed a most refined version called record.
NSL-KDD. They removed the redundant connection records • False Negative (FN ) - the number of Attack connection
in the entire train and test data and in addition the invalid records wrongly classified to the Normal connection
records, numbered 136,489 and 136,497 were removed from record.
test data. Thus it protects the classifier not to be biased Based on the aforementioned terms, the following most
in the direction of the more frequent connection records. commonly used evaluation metrics are considered.
NSL-KDD is also not a real world representative of network
1) Accuracy: It estimates the ratio of the correctly recog-
traffic data. Still, this refined version failed to entirely solve
nized connection records to the entire test dataset. If the
the issues reported by [57]. To enhance the performances in
accuracy is higher, the machine learning model is better
detecting attacks, 10 more extra features were added with
(Accuracy ∈ [0, 1]). accuracy serves as a good measure
14 important features from KDDCup 99 [58]. The Kyoto
for the test dataset that contains balanced classes and
dataset was generated using honeypots. Thus, each flow of
defined as follows
network traffic was done automatically. The normal traf-
fic of Kyoto dataset was not captured from the real world TP + TN
Accuracy = (15)
network traffic. Moreover, the dataset doesn’t contain false TP + TN + FP + FN
positives that help to minimize the number of alerts to the 2) Precision: It estimates the ratio of the correctly iden-
network admin [58]. In [56] generated a new dataset follow- tified attack connection records to the number of
ing two different profile system in which one system was all identified attack connection records. If the Preci-
for generating attacks and other one was for normal activi- sion is higher, the machine learning model is better
ties. This dataset doesn’t contain network traffic of HTTPS (Precision ∈ [0, 1]). Precision is defined as follows
protocol. Most of the attacks were simulated and failed to
TP
preserve the characteristics of real world statistics. In [57] Precision = (16)
proposed UNSW-NB15. They adopted the notion of profiles TP + FP
that contains the comprehensive information of intrusions and 3) F1-Score: F1-Score is also called as F1-Measure. It is
applications, protocols, or lower level network entities from the harmonic mean of Precision and Recall. If the
the modern network traffic and detailed information about F1-Score is higher, the machine learning model is better
the network traffic. Recently, to provide benchmark dataset (F1−Score ∈ [0, 1]). F1-Score is defined as follows
to the research community, [59] generated reliable dataset.
Pr ecision × Recall

This meets the real world benign and attacks of network F1 − Score = 2 × (17)
Pr ecision + Recall
activities. Moreover, the detailed evaluation of network traffic
features was done by them and detailed experiments towards 4) True Positive Rate (TPR): It is also called as Recall.
the importance of features to detect various attacks were It estimates the ratio of the correctly classified Attack
done. connection records to the total number of Attack con-
Most widely used dataset for HIDS is KDDCup 98, nection records. If the TPR is higher, the machine
KDDCup 99 and University of New Maxico (UNM). These learning model is better (TPR ∈ [0, 1]). TPR is defined
datasets were compiled decades ago and most of them are as follows
irrelevant for today’s operating system. Recently, [62] made TP
the dataset to be publicly available. Thus, this dataset has TPR = (18)
TP + FN
been used as new benchmark for evaluating system call
5) False Positive Rate (FPR): It estimates the ratio of
based HIDS. The dataset comprises of modern vulnerability
the Normal connection records flagged as Attacks to
exploits and attacks.
the total number of Normal connection records. If the
FPR is lower, the machine learning model is better
D. STATISTICAL MEASURES
(FPR ∈ [0, 1]). FPR is defined as follows
In evaluation to estimate the various statistical measures the
ground truth value is required. The ground truth composed of FP
FPR = (19)
set of connection records labeled either Normal or Attack in FP + TN
VOLUME 7, 2019 41533

TABLE 1. Training and testing connection records from KDDCup 99 and NSL-KDD datasets.
6) Receiver Operating Characteristics (ROC) curve: •Basic features [1-9]: The packet capture (Pcap)
ROC is plotted based on the trade-off between the TPR files of tcpdump are used to extract the basic
on the y axis to FPR on the x axis across different features from the packet headers, TCP segments,
thresholds. Area Under the ROC Curve (AUC) is the and UDP datagram instead of payload. This task
size of the area under the ROC curve used along with was carried out using a remodeled network analy-
ROC as a comparison metric for the machine learning sis framework, Bro IDS.
models. If the AUC is higher, the machine learning • Content features [10-22]: Content features are
model is better. extracted from the full payload of TCP/IP packets
rooted on domain knowledge in tcpdump files.
Z 1 TP FP The feature analysis of payload has remained
AUC = d
0 TP + FN TN + FP as research area for the last years. Recently,
in [65], a deep learning approach was introduced
to analyze the entire payload data instead of fol-
VI. MODELLING THE DATASET
lowing the feature extraction process. Content
Due to security and privacy issues, most of the datasets
features are mainly used to identify ’R2L’ and
are not publicly available. Additionally, the data which are
’U2R’ category attacks. For example, many failed
publicly available are laboriously anonymized and do not
login attempts is the most prominent feature to
contemplate today’s network traffic variety. Due to these
indicate the malicious behavior in the entire pay-
issues, the exemplary dataset is yet to be discerned [64]. The
load. Unlike other category attacks, ’R2L’ and
details of various IDS datasets are discussed in [59] and [66].
’U2R’ category do not have the prominent sequen-
A detailed overview of available datasets between 1998 and
tial patterns due to events happening in a single
2016 is discussed in detail by [66]. We consider the pros and
connection.
cons of existing datasets used in NIDS and HIDS and discuss
• Time-based traffic features [23-41]: Time-based
how our datasets were modeled.
traffic features are extracted with a specific tem-
poral window of two seconds. These are grouped
A. DATASETS USED IN NIDS into ’same host’ and ’same service’ based on the
1) KDDCup 99: KDDCup 99 dataset was built by pro- connection characteristics in the past 2 seconds.
cessing tcpdump data of the 1998 DARPA intrusion To handle slow probing attacks, the aforemen-
detection challenge dataset. The Mining Audit data tioned characteristics are recalculated based on a
for automated models for ID (MADMAID) frame- connection window of 100 connections to the same
work was used to extract features from raw tcpdump host. These are typically termed as connection
data. The detailed statistics of the dataset is reported based or host-based traffic features.
in Table 1. KDDCup 1998 dataset was created by MIT 2) NSL-KDD: NSL-KDD is the distilled version of
Lincon laboratory using 1000’s of UNIX machines and KDDCup 99 intrusion data. The filters are used to
100’s users accessing those machines. The network remove redundant connection records in KDDCup
traffic data was captured and stored in tcpdump format 99 and connection records numbered 136,489 and
for 10 weeks. The data of first seven weeks was used 136,497 are removed from the test data. NSL-KDD can
as training dataset and rest used as testing dataset. protect machine learning algorithms not to be biased.
KDDCup 99 dataset is available in two forms. They This can suits well for misuse detection in compared to
are full dataset and 10% dataset. The dataset contains the KDDCup 99 dataset. This also suffers from repre-
41 features and 5 classes (’Normal’, ’DoS’, ’Probe’, senting the real-time network traffic profile character-
’R2L’, ’U2R’). These features are grouped into differ- istics. The detailed statistics of NSL-KDD is reported
ent categories as given below: in Table 1.
41534 VOLUME 7, 2019
TABLE 2. Training and testing connection records of partial dataset of UNSW-NB15.
3) UNSW-NB15: The cyber security research team of 4) Kyoto: The honeypot systems of Kyoto university
Australian Centre for Cyber Security (ACCS) has intro- network traffic data have 24 statistical features. Among
duced a new data called as UNSW-NB15 to resolve 24 features, 14 features are from KDDCup 99. These
the issues found in the KDDCup 99 and NSL-KDD features are important as they are collected from raw
datasets. This data is generated in a hybrid way, con- traffic data of Kyoto university honeypot systems.
taining the normal and attack behaviors of a live net- Additionally, 10 more features are identified with the
work traffic using IXIA Perfect Storm tool that has a honeypot’s network traffic system. In this work, the net-
repository of new attacks and common vulnerability work logs of the year 2015 are considered. The logs
exposures (CVE), a storehouse containing information are preprocessed and divided into training and test-
regarding security vulnerabilities and exposures, which ing datasets. The detailed statistics of the connection
are known publicly. Two servers were used in IXIA records are reported in Table 5.
traffic generator tool where one server generated the 5) WSN-DS: It is an IDS dataset developed for wireless
normal activities, whereas the other generated mali- sensor networks (WSN). This composed of four dif-
cious activities in the network. Tcpdump tool used for ferent types of ’DoS’ attacks: Blackhole, Grayhole,
capturing network packet traces which took several Flooding, and Scheduling. They used Low-energy
hours to compile the whole data of 100 GBs that were adaptive clustering hierarchy (LEACH) protocol to
divided into 1,000 MB pcaps using tcpdump. From collect data from network Simulator 2 (NS-2) and then
pcap files, the features were extracted using Argus and preprocessed to generate 23 features. This dataset was
Bro-IDS in Linux Ubuntu 14.0.4. In addition to the termed as WSN-DS and its detailed statistics is reported
above methods, depth analysis of each packet was done in Table 3.
with 12 algorithms which are developed using C#. The 6) CICIDS2017: This dataset includes the contemporary
data is accessible in two forms as follows: activities of benign and attacks which depicts the real-
a) Full connection records consisting of 2 million time network traffic. The main interest is given towards
connection records collecting the real-time background traffic during cre-
b) A partition of full connection records which is ating this dataset. Using B-profile system, benign back-
composed of 82,332 train connection records ground traffic was collected. This benign traffic con-
and 175,341 test connection records confined tains the characteristics of 25 users based on the HTTP,
with 10 attacks. The partitioned dataset consists HTTPS, FTP, SSH, and email protocols. The network
of 42 features with their parallel class labels traffic for five days was collected and dumped with
which are Normal and nine different Attacks. The normal activity traffic on one day, and attacks injected
information regarding simulated attacks category on other days. The various attacks injected were Brute
and its detailed statistics are described in Table 2. Force FTP, Brute Force SSH, ’DoS’, Heartbleed, Web
VOLUME 7, 2019 41535

TABLE 3. Training and testing WSN-DS dataset.
TABLE 4. Training and testing CICIDS 2017 dataset.
Attack, Infiltration, Botnet and ’DDoS’. They also

claimed that their dataset covers 11 important criteria
which were discussed by [66]. The detailed informa-
tion of CICIDS 2017 dataset is reported in Table 4.
We have randomly chosen 20,000 connection records in
NIDS dataset and passed them into t-SNE [63] and their
visual representations are given in Figure 2 for KDDCup
99 and Figure 3 for CICIDS 2017. Almost both the datasets
are non-linearly separable and the connection records of
CICIDS 2017 is considered as more complex in comparison
with KDDCup 99. Also, CICIDS 2017 dataset is released
recently and contain attacks which have occurred recently.
Moreover, the CICIDS 2017 dataset has the characteristics of
a real-time network traffic.
B. DATASETS USED IN HIDS FIGURE 2. t-SNE visualization of KDDCup 99.
The windows and Linux operating system are the most

well-known and most commonly used operating system (OS).
Most commonly used datasets for host-based intrusion
datasets were compiled decades ago and do not include the
detection are KDDCup 98, KDDCup 991 and UNM.2 These
attacks of modern computer systems. The issues of these
1 http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html datasets were discussed in detail by [62] and proposed a new
2 https://www.cs.unm.edu/ immsec/systemcalls.htm dataset called as ADFA-LD and ADFA-WD.
41536 VOLUME 7, 2019

TABLE 7. Types of attacks in ADFA-LD dataset.
The ADFA-WD comprises of system calls and dynamic

link library (DLL) for various attacks. This was
collected in Windows XP SP2 host using Procmon
FIGURE 3. t-SNE visualization of CICIDS 2017. program [62]. The system setup enabled default fire-
wall and Norton AV 2013. The system was open for
file sharing and to run different applications like FTP
TABLE 5. Training and testing Kyoto dataset.
server, streaming media server, database server, web
server, PDF server etc. Using Metasploit framework
and other custom approaches 12 different known vul-
nerabilities were exploited for installed applications.
The detailed statistics of the ADFA-WD dataset is
reported in Table 6.
TABLE 6. ADFA-LD and ADFA-WD datasets. VII. EXPERIMENTAL DESIGN

All the experiments were implemented using Python on an
Ubuntu 14.0.4 LTS. All classical machine learning algo-
rithms were implemented using Scikit-learn.3 Deep neural
networks (DNNs) was implemented using GPU enabled Ten-
sorFlow4 as backend with Keras5 higher level framework.
The GPU was NVidia GK110BGL Tesla K40 and CPU had
a configuration (32 GB RAM, 2 TB hard disk, Intel(R)
Xeon(R) CPU E3-1220 v3 @ 3.10GHz) running over 1 Gbps
Ethernet network. To evaluate the performance of DNNs
1) ADFA Linux (ADFA-LD)/ADFA Windows(ADFA- and various classical machine learning classifiers on various
WD): ADFA-LD is a sequence of system calls dataset NIDS and HIDS datasets, the following different test cases
which was collected in networks of modern operating were considered.
systems [62]. These hosts were connected to a Linux 1) Classifying the network connection record as either
local server. This comprised of several traces of system benign or attack with all features.
calls which were collected under different situations. 2) Classifying the network connection record as either
This impersonated the real-time situations. These sys- benign or attack and categorizing an attack into its
tem call traces represented system level vulnerabili- categories with all features.
ties and attacks. This server permitted many services 3) Classifying the network connection record as either
such as remote access, web server, database on an benign or attack and categorizing an attack into its
operating system Ubuntu 11.04 (Linux kernel 2.6.38). categories with minimal features.
The SSH, FTP and MySQL 14.14 services used their
default ports. Apache 2.2.17 and PHP 5.3.5 and Tiki- A. FINDING OPTIMAL PARAMETERS IN DNNs
Wiki 8.1 are installed as web based services and web
As DNNs are parametrized, the performance depends on the
based collaborative tools respectively. The detailed
optimal parameters. The optimal parameter determination for
statistics of ADFA-LD is reported in Table 6. The attack
DNNs network parameter and DNNs network topologies was
types used in ADFA-LD dataset collection is reported
done only for KDDCup 99 dataset. To identify the ideal
in Table 7. The attack dataset of ADFA-LD is randomly
split into 55% training and 45% testing. The normal 3 https://scikit-learn.org/stable/
traces of training and validation data are merged and 4 https://www.tensorflow.org/
randomly split for the 55% training and 45% testing. 5 https://keras.io/
VOLUME 7, 2019 41537

parameter for the DNNs, a medium sized architecture was In order to find an optimal learning rate, three trials of
used for experiments with a specific hidden units, learning experiments for 500 epochs with learning rate varying in
rate and activation function. A medium sized DNN contains the range [0.01-0.5] were run. These experiments used a
3 layers. One is input layer, second one is hidden layer or fully medium sized DNN with 1,024 units. Learning rate has a
connected layer and third one is output layer. For KDDCup strong impact on the training speed, thus we had selected
99, the input layer contains 41 neurons, hidden layer contains the range [0.01-0.5]. The peak value for attack detection rate
128, 256, 384, 512, 640, 768, 896 and 1,024 units and output was obtained when the learning rate was 0.1. There was
layer contains 1 neuron in classifying the connection record a quick decrease in attack detection rate when the learn-
as either normal or attack. It contains 5 neurons in classi- ing rate was 0.2 and reached to peak accuracy at learn-
fying the connection record as either normal or attack and ing rates of 0.35, 0.45 and 0.45 compared to learning rate
categorizing attack into corresponding attack categories. The 0.1. This attack detection rate was intensified by running
connection between the units between input layer and hidden the experiments till 1,000 epochs. As we had examined
layer and hidden layer to output layer are fully connected. more complex architectures for this experiment, it showed
Initially, the train and test datasets were normalized using L2 less performance for epochs less than 500, henceforth we
normalization. Two trials of experiments were run for hidden decided to use the learning rate 0.1 for the rest of the
units 128, 256, 384, 512, 640, 768, 896 and 1,024 with a experimentation process, because learning rate greater than
medium sized DNN. The experiment was run for each param- 0.1 was found to be time consuming. To find the opti-
eter with appropriate units and for 300 epochs. The DNN mal learning rate for Multi-class classification of KDDCup
with various units have learnt the patterns of normal con- 99, we run two trials of experiments for learning rate in
nection records with epochs 200 in comparison to the those the range [0.01-0.5]. The experiments with lower learning
with attacks. To capture the significant features which can rate 0.01 showed better performance for attacks ’DoS’ and
distinguish the attack connection record by DNN, 200 epochs ’Probe’. When we increase the learning rate from 0.01,
were required. After 200 epochs, the performance of nor- the attack detection rate remained same. Experiments with
mal connection records fluctuated due to overfitting. Each learning rate 0.1 has showed optimal performance in detec-
number of units with DNN took varied number of epochs to tion of ’R2L’ and ’U2R’ attacks category. Based on the
attain considerable performance. It was found that the layer observations of performance of detection of attacks in both
containing 1,024 units had shown highest number of attack Binary and Multi-class classification, the learning rate was set
detection rates. When we increased the number of hidden to 0.1
units from 1,024 to 2,048, the performance in attack detection In third trial of experiments, we had also run experi-
rate deteriorated. Hence, we decided to use 1,024 units for ments with the sigmoid and tanh activation functions for
the rest of the experiments. The medium sized DNN with both the Binary and Multi-class classification. These func-
1,024 units in hidden layer was used for experiments with tions achieved good attack detection rates than the ReLU for
Multi-class classification of KDDCup 99 using three trials 100 epochs in Binary classification. When the same set of
of experiments for each hidden units until 500 epochs as experiments were run for 500 epochs the experiments with
the DNN performed well in comparison to the other units. ReLU activation function performed better than the sigmoid
A medium sized DNN with less number of units, 128, 256 and and tanh activation functions. In the case of Multi-class
384 learned the patterns of high frequency attack, ’DoS’. The classification, the performance of ReLU was good in com-
performance in detection of ’DoS’ attack remained same for parison to the sigmoid and tanh activation functions. Thus,
other variants in the number of units of a medium sized DNN. we decided to set the ReLU activation function for the rest
Due to the large number of connection records of ’DoS’, of the experiments. All the models were trained using adam
DNN network with less number of units was able to achieve optimizer with a batch size of 64 for 500 epochs to monitor
optimal detection rate. The acceptable detection rate for validation accuracy.
’Probe’ category of attacks was found with units 512 and 640.
A medium sized DNN with hidden units of 896 performed B. FINDING AN OPTIMAL NETWORK TOPOLOGY OF DNN
well for detection of ’R2L’ attacks in comparison to the other The following network topologies were used to choose
units. A medium sized DNN required 1,024 units to detect the best network topology for training an IDS model with
the attacks of ’U2R’. Once the number of units increased KDDCup 99.
from 1,024, the performance of DNN network deteriorated.
By considering all these test experiments, 1,024 was set as 1) DNN 1 layer
the ideal hidden layer units. To achieve a considerable perfor- 2) DNN 2 layers
mance, each DNN network topology required varied number 3) DNN 3 layers
of epochs. DNN network with less number of parameters 4) DNN 4 layers
have achieved good performance till 100 epochs but when it 5) DNN 5 layers
reaches 500 epochs, complex DNN networks have performed For all the above network topologies, we had run 3 trials
well in comparison to the DNN network with less number of of experimentation for 300 epochs each. We observed that
parameters. most of the deep learning architectures learnt the Normal
41538 VOLUME 7, 2019

TABLE 8. Configuration of proposed DNN model.
category patterns of input data for epochs less than 400, C. PROPOSED DNN ARCHITECTURE
while the number of epochs required for discovering Attack This work proposes a unique DNN architecture for NIDS
category was fluctuating. Both the DNN 1 layer and DNN and HIDS composed of an input layer, 5 hidden layers and
2 layers networks have completely failed to learn the attack an output layer. The hierarchical layers in the DNN facili-
categories of ’R2L’ and ’U2R’. The performance of ’DoS’ tate to extract highly complex features and do better pattern
and ’Probe’ attack categories was good with DNN 3 layers in recognition capabilities in IDS data. Each layer estimates
comparison to the DNN 2 layers and DNN 1 layer. The com- non-linear features that are passed to the next layer and the
plex network architectures required a large number of epochs last layer in the DNN performs the classification. An input
in order to reach the optimum accuracy. The performance layer contains 41 neurons for KDDCup 99, 41 neurons for
of DNN 5 layers for various Attack categories and Normal NSL-KDD, 43 neurons for UNSW-NB15, 17 neurons for
category was good as compared to other DNNs network WSN-DS and 77 neurons for CICIDS 2017. An output layer
topologies. By considering all these factors, we’ve decided to contains 1 neuron for Binary classification for all types of
use 5 layer DNNs network for the remaining experimentation datasets and 5 neurons for Multi-class classification in KDD-
process. Cup 99, 5 neurons for NSL-KDD, 10 neurons for UNSW-
In order to increase the speed of training and to avert NB15, 5 neurons for WSN-DS and 8 neurons for CICIDS
over fitting, we used batch normalization and dropout 2017. The detailed information and configuration details of
(0.01) approach. When we run experiments without dropout, the DNN architecture is shown in Table 8. The DNN is
the models ends up in over fitting. Also, the experiments with trained using the backpropogation mechanism [60]. Gener-
batch normalization achieved better results in comparison to ally, the units in input to hidden layer and hidden to output
the networks without batch normalization [10]. For HIDS, layer are fully connected. The DNN is composed of various
the best performed DNNs in NIDS as such used. In HIDS, components, a brief description of each component is given
we had followed hyper parameter tuning methods only in below.
conversion of system calls into numeric representation. For Fully connected layer: This layer is called as fully con-
N-gram, the 3 trials of experiments were run with 1-gram, nected layer since the units in this layer have connection to
2-gram, 3-gram and 4-gram with different DNNs network every other unit in the succeeding layer. Generally, the fully
topologies. DNNs network topologies with 3-gram system connected layers map the data into high dimensions. The
call representation performed well in comparison to the other output will be more accurate, when the dimension of data is
N-gram system call representation. more. It uses ReLU as the non-linear activation function.
VOLUME 7, 2019 41539

Batch Normalization and Regularization: Dropout system. The SHIA framework is basically decomposed into
(0.01) and Batch Normalization [10] was used in between two modules described below:
fully connected layers to obviate overfitting and speedup
1) Packet and system call processing module: In this
the DNN model training. A dropout removes neurons with
module, the networks which are to be monitored are
their connections randomly. In our alternative architectures,
connected to a single port mirroring switch, which
the DNNs could easily overfit the training data without regu-
larization even when trained on large number samples. replicates the flow of the entire network traffic of all
Classification: The last layer is a fully connected layer the switches. The proposed SHIA IDS is required to
monitor a network composed of different subnets, each
which uses sigmoid activation function for Binary classifi-
with n number of different machines. The monitored
cation and softmax activation function for Multi-class classi-
networks are hosts composed of computer machines
fication. The prediction loss function for sigmoid is defined
using Binary cross entropy and the prediction loss for softmax which allow users to communicate and transfer data.
is defined using the Categorical cross entropy as follows: All the traffic generated by the internet were collected,
without considering the internal traffic between net-
The prediction loss for Binary classification is estimated
works.
using Binary cross entropy given by,
The SHIA framework with the network where one of
N
1 X the switches is used as a port mirroring switch and
loss(pd, ed) = − [edi log pdi +(1−edi ) log(1−pdi )] connected to a traffic collector. This module collects
N
i=1 network traffic data using Netmap packet capturing
(20) tool and stores it in NoSQL database. Feature vectors
where pd is a vector of predicted probability for all samples are passed into the DNN module, and a copy of data
in testing dataset, ed is a vector of expected class label, values is passed into NoSQL database. Likewise, the system
are either 0 or 1. calls and configuration files are collected in a dis-
The prediction loss for Multi-class classification is esti- tributed manner by following the methodology pro-
mated using Categorical cross entropy given by, vided in [62]. These are passed to NoSQL database and
X followed by the text representation method to map the
loss(pd, ed) = − pd(x) log(ed(x)) (21) system calls to feature vectors. A copy of these feature
x
vectors are dumped into NoSQL database and these
where ed is true probability distribution, pd is predicted
feature vectors are again passed into the DNN module
probability distribution. We have used adam as an optimizer
for classification.
to minimize the loss of Binary cross entropy and Categorical
2) DNN module: From the experimental analysis,
cross entropy.
we found that the DNN performed well over other
VIII. SCALE-HYBRID-IDS-ALERTNET (SHIA) FRAMEWORK
algorithms in all cases of HIDS and NIDS. Thus, DNN
is used to model the network activities with the aim
An IDS has become an indispensable tool for any type of
organization or industry due to the wide growing nature of to detect the attacks more accurately. DNNs require
data and internet. There are many attempts that have been large volume of network and host-level events to learn
made to develop solutions for IDS. These methods used single the behaviors of legitimate and malicious activities.
To make the classifier more generalizable, the network
host for storage and computational resources and the algo-
traffic activities can be collected in different time with
rithms are not distributed. Using these legacy solutions, it is
different users in an isolated network. Finally, the DNN
entirely difficult to monitor and identify intrusions in today’s
network. This is due to high speed networks and attacks are module outputs are passed to Front End Broker. This
more and occurring rapidly. The legacy intrusion detection displays the results to the network admin. The imple-
methods which are existing in internet struggle to keep a mentation details regarding how to conduct a compre-
hensive experimental analysis will be considered as one
watch on the networks more efficiently. To address this, based
of the significant work directions towards future work.
on [61] our work leverages the distributed computing and dis-
tributed machine learning models to propose Scale-Hybrid-
IDS-AlertNet (SHIA) framework for an effective identifica- IX. RESULTS
tion of intrusions and attacks at both the network-level and Publicly available NIDS and HIDS datasets were used to
host-level. The framework provides a scalable design and acts evaluate the performance of classical machine learning and
as a distributed monitoring and reporting system. The SHIA DNNs in order to identify a baseline method. These datasets
enhances the computational performance using the charac- were separated into train and test datasets, and normalized
teristics of distributed computing and distributed machine using L2 normalization. Train datasets were used to train
learning algorithms using a hybrid system placed in different machine learning model and test datasets were used to eval-
locations in the network. The deployed system is designed uate the trained machine learning models. Train accuracy
to use the computational resource optimally and at the same of Multi-class using DNN for KDDCup 99, NSL-KDD and
time with low latency in response while monitoring a critical UNSW NB-15, WSN-DS are shown in Figure 4a and 4b
41540 VOLUME 7, 2019

FIGURE 4. Train accuracy. (a) KDDCup 99 and NSL-KDD. (b) UNSW-NB-15 and WSN-DS. (c) Visualization of 100 connection records with their
corresponding activation values of the last hidden layer neurons from Kyoto.
FIGURE 5. ROC curves of (a) KDDCup 99-using classical machine learning classifiers, (b) KDDCup 99-using DNNs, (c) NSL-KDD-using classical
machine learning classifiers, (d) NSL-KDD-using DNNs.
FIGURE 6. ROC curves of (a) UNSW-NB 15-using classical machine learning classifiers, (b) UNSW-NB 15-using DNNs, (c) Kyoto-using classical
machine learning classifiers, (d) Kyoto-using DNNs.
respectively. For KDDCup 99 and NSLKDD datasets, most obtained in terms of FPR is less in comparison to other
of the DNN network topologies showed train accuracy in classical machine learning classifiers in all the datasets. The
the range 95% to 99%. For UNSW-NB15 and WSN-DS, experiments with 3-gram representation performed well as
all the DNN network topologies showed train accuracy in compared to 1-gram and 2-gram.
the range 65% to 75%. Several observations were extracted
including ROC curve. The ROC curve for KDDCup 99, A. PERFORMANCE COMPARISONS
NSL-KDD, UNSW NB-15, Kyoto, WSN-DS is shown in The detailed results for Binary as well as Multi-class clas-
Figure 5a, Figure 5b, Figure 5c, Figure 5d, Figure 6a, sification of various classical machine learning classifiers
Figure 6b, Figure 6c, Figure 6d, Figure 7a, Figure 7b and and DNNs are reported in Table 9, Table 10 and Table 11,
Figure 7c, Figure 7d respectively. In most of the cases, Table 12 respectively. In terms of accuracy noted that the DT,
DNN performed well in comparison to the classical machine AB and RF classifiers performed better than the other classi-
learning classifiers with AUC used as the standard metric. fiers namely LR, NB, KNN and SVM-rbf. Additionally, the
This indicates that the DNN obtained a highest TPR and a performance of DT, AB and RF classifiers remains the same
lowest FPR and in some cases close to 0. The performance range across different datasets. However, the performance
VOLUME 7, 2019 41541

FIGURE 7. ROC curves of (a) WSN-DS-using classical machine learning classifiers, (b) WSN-DS-using DNNs, (c) CICIDS 2017-using classical
machine learning classifiers, (d) CICIDS 2017-using DNNs.
TABLE 9. Test results of DNNs for Binary class classification. TABLE 10. Test results of DNNs for Multi-class classification.
case of multi-class classification the performance of AB is

less in compared to DT and RF but performed better than
the LR, NB, KNN and SVM-rbf. This is due to the fact that
both AB and SVM are not directly applicable for multi-class
classification problems. In multi-class, we deal with strength-
of LR, NB, KNN and SVM-rbf are varied across different ening the classifier in identifying each individual attacks.
datasets. This indicates that the DT, AB and RF classifiers Experiments on KDDCup 99 and NSL-KDD, all the classical
are generalizable and can detect new attacks. While in the machine learning classifiers obtained less TPR for both ’R2L’
41542 VOLUME 7, 2019

TABLE 11. Test results of various classical machine learning classifiers for TABLE 12. Test results of various classical machine learning classifiers for
binary class classification. Multi-class classification.
Thus during training, the classifiers gives less preference for

these attack categories. In terms of accuracy, the performance
of the DNN is clearly superior to that of classical machine
learning algorithms, often by a large margin in both Binary
and Multi-class classification. Moreover, with DNN network
topologies, the performance in terms of accuracy is closer
to each other’s. For Multi-class classification, accuracy, true
and ’U2R’ in compared to the other categories such as ’DoS’ positive rate (TPR) and false positive rate (FPR) are estimated
and ’Probe’. The primary reason is that both the categories of for each class. The detailed results are shown in Table 15 for
attacks contain very less number of samples in training sets. KDDCup 99, Table 16 for NSL-KDD, Table 17 for WSN-DS,
VOLUME 7, 2019 41543

FIGURE 8. Sailency map for a randomly chosen connection record from (a) KDDCup 99, (b) CICIDS 2017 and visualization of 100 connection
records with their corresponding activation values of the last hidden layer neurons from (c) KDDCup 99, and (d) NSLKDD.
TABLE 13. Test results using minimal feature sets. TABLE 14. Test results of host-based IDS.
Table 18 for UNSW-NB15 and Table 19 for CICIDS 2017.

The proposed method, DNN with 3 layer showed an accuracy
of 93.5% that varies -0.32 from the top performed method [3].
However, the proposed method contains less number of
parameters. Thus, the proposed method is computationally into the higher dimension. Each layer facilitates to under-
inexpensive. Moreover, the top performed method [3] uses stand the significant features towards classifying the network
LSTM, primarily LSTM has been used on raw data [10]. connection into either Normal or Attack and in categorizing
There may be chance that the performance of LSTM can the attack into their attack categories. To understand, visual-
degrade when it sees the real-time datasets or completely ize and analyze the results, the activation values are passed
unseen samples. The performance obtained in terms of FPR into t-SNE [63]. This converts the high dimensional activa-
is less comparatively to other classical machine learning clas- tion values into low dimensional using Principal Component
sifiers in all the datasets. Analysis (PCA). The low dimensional feature representation
False positive occurs when the IDS identifies a connection for KDDCup 99, NSLKDD and Kyoto is represented in
record as an attack when it is actually a normal traffic. False Figure 8c, Figure 8d and Figure 4c respectively. The connec-
negative occurs when the IDS fails to interpret a malicious tion records of ’Normal’, ’DoS’ and ’Probe’ have appeared
connection record as an attack. In these cases, IDS must completely in a different cluster in KDDCup 99. This shows
be carefully tuned to ensure that these are kept very low. that DNN has learnt the patterns which can distinguish the
The reported results can be further enhanced by carefully connection records of ’Normal’, ’DoS’ and ’Probe’. It has
following a hyper parameter selection method with highly not completely learnt the optimal features to distinguish the
complex DNN architecture. In experiments with HIDS, Keras connection records of ’U2R’ and ’R2L’. This is one of the
embedding performed better than the N-gram and tf-idf text reasons why few connection records of ’U2R’ and ’R2L’ have
representation on both HIDS datasets, as shown in Table 14. appeared in ’Probe’ attack cluster. This shows that the attacks
Generally, in DNNs, the network connection records are of ’Probe’, ’U2R’ and ’R2L’ have common characteristics.
propagated through more than one hidden layers to learn the For NSL-KDD, the DNN had same issue as KDDCup 99.
optimal features. Each hidden layer aims at mapping the data For Kyoto dataset, few attacks have appeared in clusters of
41544 VOLUME 7, 2019

TABLE 15. Detailed test results for Multi-class classification- KDDCup 99.
TABLE 16. Detailed test results for Multi-class classification- NSL-KDD.
TABLE 17. Detailed test results for Multi-class classification- WSN-DS.
normal connection records. This shows that they have similar has been followed. This uses Taylor expansion for the fea-
characteristics and it requires additional features to classify it tures of penultimate layer and finds the first order partial
correctly. derivative of the classification results before placing them
To know the importance of each feature and as well as through the softmax function. This helps to detect the signif-
to identify the significant features, the methodology of [69] icant features to distinguish the connection record as either
VOLUME 7, 2019 41545

TABLE 18. Detailed test results for Multi-class classification- UNSW-NB15.
Normal or Attack and categorize an Attack into its categories. KDDCup 99 dataset. Thus, the dataset has unique set of
The connection record which belongs to ’R2L’ is shown train and test connection records. Moreover, the connection
in Figure 8a. The features have similar characteristics as of records of NSL-KDD is highly non-linearly separable in
features of ’U2R’. For CICIDS 2017, the connection record comparison to the KDDCup 99. Moreover, the performance
which belongs to ’DoS’ is shown in Figure 8b. The features of both the classical machine learning classifiers and DNNs
have similar characteristics between the ’DoS’ and ’DDoS’. considerably less in comparison to the KDDCup 99 and
This shows that the dataset requires few more additional fea- NSL-KDD. On subset of CICIDS 2017, both the classi-
tures to classify the connection record to ’DoS’ and ’DDoS’ cal machine learning classifiers and DNNs performed well.
correctly. Thus, the proposed SHIA architecture in this work can work
Both classical machine learning classifiers and DNNs net- well in real-time. The performance of DNNs can be enhanced
works have performed well on KDDCup 99 in comparison by carefully following a hyper parameter techniques for
to the NSL-KDD. The NSL-KDD is a refined version of NSL-KDD and UNSW-NB 15.
41546 VOLUME 7, 2019

TABLE 19. Detailed test results for Multi-class classification- CICIDS 2017.
The detailed test results of ADFA-LD and ADFA-WD B. IMPORTANCE OF MINIMAL FEATURE SETS
datasets are reported in Table 14. N-gram and Keras embed- Feature selection is an important step for intrusion detection.
ding text representation methods are used to transform sys- It is an important step in order to identify the various types of
tem call into numeric vectors with DNNs. For comparative attacks more accurately. Without feature selection, there may
study, tf-idf is used as system call representation method with be a possibility in misclassification of attacks and it would
classical machine learning classifier, SVM. The performance take a large time to train a model [70]. The significance of
of N-gram and Keras embedding is good in compared to feature selection method was discussed in detail for intru-
tf-idf. This is due to the fact that both N-gram and Keras sion detection using NLS-KDD dataset [71]. They reported
embedding have the capability to preserve the sequence infor- that the feature selection method significantly reduces the
mation of the system calls. Moreover, the Keras embedding training and testing time and also showed improved intru-
performed well over N-gram representation method. This sion detection rate. To evaluate the performance of vari-
is due to the fact that it facilitates to capture the relation ous DNN topologies and static machine learning classifiers,
among system calls. To choose the best value for N in two trials of experiments are run on minimal feature sets
N-gram, the experiments are run with 1, 2 and 3 gram. on KDDCup 99 and NSL-KDD [3]. The detailed results are
When the N is increased from 3 to 4, the performance reported in Table 13. The experiments with 11 and 8 fea-
reduced. The experiments with 3-gram representation per- ture sets performed well in comparison to the experiments
formed well in compared to 1-gram and 2-gram. However, with 4 feature set. Moreover, experiments with 11 feature
the N-gram and tf-idf representation produces very large sets performed well in comparison to the 8 feature set. The
matrix and in some cases this type of matrix can be sparse. difference in performance between 11 and 8 minimal feature
Thus there may be chance that using any classifier it is very sets is marginal.
difficult to achieve best performance on the sparse repre-
sentation. An additional advantage of Keras embedding is X. CONCLUSIONS AND FUTURE WORK
that the weights of the embedding layer is updated during In this paper, we proposed a hybrid intrusion detection alert
backpropogation. system using a highly scalable framework on commodity
VOLUME 7, 2019 41547

hardware server which has the capability to analyze the [8] M. Tang, M. Alazab, Y. Luo, and M. Donlon, ‘‘Disclosure of cyber secu-
network and host-level activities. The framework employed rity vulnerabilities: time series modelling,’’ Int. J. Electron. Secur. Digit.
Forensics, vol. 10, no. 3, pp. 255–275, 2018.
distributed deep learning model with DNNs for handling and [9] V. Paxson, ‘‘Bro: A system for detecting network intruders in real-
analyzing very large scale data in real-time. The DNN model time,’’ Comput. Netw., vol. 31, nos. 23–24, pp. 2435–2463, 1999.
was chosen by comprehensively evaluating their performance doi: 10.1016/S1389-1286(99)00112-7.
[10] Y. LeCun, Y. Bengio, and G. Hinton, ‘‘Deep learning,’’ Nature, vol. 521,
in comparison to classical machine learning classifiers on no. 7553, p. 436, 2015.
various benchmark IDS datasets. In addition, we collected [11] Y. Xin et al., ‘‘Machine learning and deep learning methods for cyberse-
curity,’’ IEEE Access, vol. 6, pp. 35365–35381, 2018.
host-based and network-based features in real-time and [12] S. A. Hofmeyr, S. Forrest, and A. Somayaji, ‘‘Intrusion detection using
employed the proposed DNN model for detecting attacks and sequences of system calls,’’ J. Comput. Secur., vol. 6, no. 3, pp. 151–180,
intrusions. In all the cases, we observed that DNNs exceeded 1998.
[13] S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff, ‘‘A sense of
in performance when compared to the classical machine self for unix processes,’’ in Proc. IEEE Symp. Secur. Privacy, May 1996,
learning classifiers. Our proposed architecture is able to per- pp. 120–128.
form better than previously implemented classical machine [14] N. Hubballi, S. Biswas, and S. Nandi, ‘‘Sequencegram: n-gram modeling
of system calls for program based anomaly detection,’’ in Proc. 3rd Int.
learning classifiers in both HIDS and NIDS. To the best of Conf. Commun. Syst. Netw. (COMSNETS), Jan. 2011, pp. 1–10.
our knowledge this is the only framework which has the [15] N. Hubballi, ‘‘Pairgram: Modeling frequency information of lookahead
capability to collect network-level and host-level activities pairs for system call based anomaly detection,’’ in Proc. 4th Int. Conf.
Commun. Syst. Netw. (COMSNETS), Jan. 2012, pp. 1–10.
in a distributed manner using DNNs to detect attack more [16] H. Kozushko, ‘‘Intrusion detection: Host-based and network-based intru-
accurately. sion detection systems,’’ Independ. Study, New Mexico Inst. Mining Tech-
The performance of the proposed framework can be fur- nol., Socorro, NM, USA, 2003.
[17] W. Lee and S. J. Stolfo, ‘‘A framework for constructing features and models
ther enhanced by adding a module for monitoring the DNS for intrusion detection systems,’’ ACM Trans. Inf. Syst. Secur., vol. 3, no. 4,
and BGP events in the networks. The execution time of the 2000, Art. no. 227261. doi: 10.1145/382912.382914.
proposed system can be enhanced by adding more nodes to [18] A. Ozgur and H. Erdem, ‘‘A review of KDD99 dataset usage in intrusion
detection and machine learning between 2010 and 2015,’’ PeerJ PrePrints,
the existing cluster. In addition, the proposed system does vol. 4, Apr. 2016, Art. no. e1954.
not give detailed information on the structure and charac- [19] R. Agarwal and M. V. Joshi, ‘‘PNrule: A new framework for learning
teristics of the malware. Overall, the performance can be classifier models in data mining,’’ Dept. Comput. Sci., Univ. Minnesota,
Minneapolis, MN, USA, Tech. Rep. TR 00-015, 2000.
further improved by training complex DNNs architectures [20] H. G. Kayacik, A. N. Zincir-Heywood, and M. I. Heywood, ‘‘Selecting
on advanced hardware through distributed approach. Due to features for intrusion detection: A feature relevance analysis on KDD 99
extensive computational cost associated with complex DNNs intrusion detection datasets,’’ Proc. 3rd Annu. Conf. Privacy, Secur. Trust,
2005, pp. 12–14.
architectures, they were not trained in this research using the [21] J. Zhang, M. Zulkernine, and A. Haque, ‘‘Random-forests-based network
benchmark IDS datasets. This will be an important task in intrusion detection systems,’’ IEEE Trans. Syst., Man, Cybern. C, Appl.
Rev., vol. 38, no. 5, pp. 649–659, Sep. 2008.
an adversarial environment and is considered as one of the [22] S. Huda, J. Abawajy, M. Alazab, M. Abdollalihian, R. Islam, and
significant directions for future work. J. Yearwood, ‘‘Hybrids of support vector machine wrapper and filter based
framework for malware detection,’’ Future Gener. Comput. Syst., vol. 55,
pp. 376–390, Feb. 2016.
ACKNOWLEDGMENT [23] M. Alazab et al., ‘‘A hybrid wrapper-filter approach for Malware detec-
The authors would like to thank NVIDIA India, for the GPU tion,’’ J. Netw., vol. 9, no. 11, pp. 2878–2891, 2014.
[24] W. Hu, W. Hu, and S. Maybank, ‘‘AdaBoost-based algorithm for network
hardware support to research grant. They would also like intrusion detection,’’ IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 38,
to thank Computational Engineering and Networking (CEN) no. 2, pp. 577–583, Apr. 2008.
department for encouraging the research. [25] L. Ertöz, M. Steinbach, and V. Kumar, ‘‘Finding clusters of different sizes,
shapes, and densities in noisy, high dimensional data,’’ in Proc. SIAM Int.
Conf. Data Mining, 2013, pp. 47–58.
REFERENCES [26] N. B. Amor, S. Benferhat, and Z. Elouedi, ‘‘Naive Bayesian networks
in intrusion detection systems,’’ in Proc. 23rd Workshop Probabilistic
[1] B. Mukherjee, L. T. Heberlein, and K. N. Levitt, ‘‘Network intrusion Graph. Models Classification, 14th Eur. Conf. Mach. Learn. (ECML) 7th
detection,’’ IEEE Netw., vol. 8, no. 3, pp. 26–41, May 1994. Eur. Conf. Princ. Pract. Knowl. Discovery Databases (PKDD), Cavtat-
[2] D. Larson, ‘‘Distributed denial of service attacks–holding back the flood,’’ Dubrovnik, Croatia, 2003, p. 11.
Netw. Secur., vol. 2016, no. 3, pp. 5–7, 2016. [27] A. Valdes and K. Skinner, ‘‘Adaptive, model-based monitoring for cyber
[3] R. C. Staudemeyer, ‘‘Applying long short-term memory recurrent neural attack detection,’’ in Proc. Int. Workshop Recent Adv. Intrusion Detection.
networks to intrusion detection,’’ South Afr. Comput. J., vol. 56, no. 1, Berlin, Germany: Springer, Oct. 2000, pp. 80–93.
pp. 136–154, 2015. [28] D.-Y. Yeung and C. Chow, ‘‘Parzen-window network intrusion detectors,’’
[4] S. Venkatraman and M. Alazab, ‘‘Use of data visualisation for in Proc. 16th Int. Conf. Pattern Recognit., vol. 4, Aug. 2002, pp. 385–388.
zero-day Malware detection,’’ Secur. Commun. Netw., vol. 2018, [29] W. Li, ‘‘Using genetic algorithm for network intrusion detection,’’ in Proc.
Dec. 2018, Art. no. 1728303. [Online]. Available: https://doi.org/10.1155/ United States Dept. Energy Cyber Secur. Group Training Conf., 2004,
2018/1728303 pp. 24–27.
[5] P. Mishra, V. Varadharajan, U. Tupakula, and E. S. Pilli, ‘‘A detailed [30] L. Didaci, G. Giacinto, and F. Roli, ‘‘Ensemble learning for intrusion detec-
investigation and analysis of using machine learning techniques for intru- tion in computer networks,’’ in Proc. Workshop Mach. Learn. Methods
sion detection,’’ IEEE Commun. Surveys Tuts., to be published. doi: Appl., Siena, Italy, 2002, pp. 1–11.
10.1109/comst.2018.2847722. [31] C. Kolias, G. Kambourakis, and M. Maragoudakis, ‘‘Swarm intelligence in
[6] A. Azab, M. Alazab, and M. Aiash, ‘‘Machine learning based botnet intrusion detection: A survey,’’ Comput. Secur., vol. 30, no. 8, pp. 625–642,
identification traffic,’’ in Proc. 15th IEEE Int. Conf. Trust, Secur. Privacy 2011. doi: 10.1016/j.cose.2011.08.009.
Comput. Commun. (Trustcom), Tianjin, China, Aug. 2016, pp. 1788–1794. [32] C. yin, Y. Zhu, J. Fei, and X. He, ‘‘A deep learning approach for intru-
[7] R. Vinayakumar. (Jan. 2, 2019). Vinayakumarr/Intrusion-Detection V1 sion detection using recurrent neural networks,’’ IEEE Access, vol. 5,
(Version V1). [Online]. Available: http://doi.org/10.5281/zenodo.2544036 pp. 21954–21961, 2017.
41548 VOLUME 7, 2019

[33] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, ‘‘A deep learning [55] M. Tavallaee, E. Bagheri, W. Lu, and A. A. Ghorbani, ‘‘A detailed analysis
approach for network intrusion detection system,’’ in Proc. 9th EAI of the KDD CUP 99 data set,’’ in Proc. 2nd IEEE Symp. Comput. Intell.
Int. Conf. Bio-Inspired Inf. Commun. Technol. (BIONETICS), 2016, Secur. Defence Appl., Jul. 2009, pp. 1–6.
pp. 21–26. [56] A. Shiravi, H. Shiravi, M. Tavallaee, and A. A. Ghorbani, ‘‘Toward
[34] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, ‘‘Long short term developing a systematic approach to generate benchmark datasets for
memory recurrent neural network classifier for intrusion detection,’’ intrusion detection,’’ Comput. Secur., vol. 31, no. 3, pp. 357–374,
in Proc.Int. Conf. Platform Technol. Service (PlatCon), Feb. 2016, 2012.
pp. 1–5. [57] N. Moustafa and J. Slay, ‘‘UNSW-NB15: A comprehensive data set for
[35] F. A. B. H. Ali and Y. Y. Len, ‘‘Development of host based intrusion network intrusion detection systems (UNSW-NB15 network data set),’’
detection system for log files,’’ in Proc. IEEE Symp. Bus., Eng. Ind. Appl. in Proc. IEEE Mil. Commun. Inf. Syst. Conf. (MilCIS), Nov. 2015,
(ISBEIA), Sep. 2011, pp. 281–285. pp. 1–6.
[36] M. Topallar, M. O. Depren, E. Anarim, and K. Ciliz, ‘‘Host-based intrusion [58] J. Song et al., ‘‘Statistical analysis of honeypot data and building of Kyoto
detection by monitoring Windows registry accesses,’’ in Proc. IEEE 12th 2006+ dataset for NIDS evaluation,’’ in Proc. 1st Workshop Building Anal.
Signal Process. Commun. Appl. Conf., Apr. 2004, pp. 728–731. Datasets Gathering Exper. Returns Secur., 2011, pp. 29–36.
[37] E. Aghaei and G. Serpen, ‘‘Ensemble classifier for misuse detection using [59] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, ‘‘Toward generat-
N-gram feature vectors through operating system call traces,’’ Int. J. ing a new intrusion detection dataset and intrusion traffic characteriza-
Hybrid Intell. Syst., vol. 14, no. 3, pp. 141–154, 2017. tion,’’ in Proc. 4th Int. Conf. Inf. Syst. Secur. Privacy (ICISSP), 2018,
[38] B. Borisaniya and D. Patel, ‘‘Evaluation of modified vector space repre- pp. 1–8.
sentation using ADFA-LD and ADFA-WD datasets,’’ J. Inf. Secur., vol. 6, [60] H. Leung and S. Haykin, ‘‘The complex backpropagation algorithm,’’
no. 3, p. 250, 2015. IEEE Trans. Signal Process., vol. 39, no. 9, pp. 2101–2104, Sep. 1991.
[39] M. Xie, J. Hu, X. Yu, and E. Chang, ‘‘Evaluating host-based anomaly [61] M. Alazab, ‘‘Profiling and classifying the behavior of malicious codes,’’
detection systems: Application of the frequency-based algorithms to J. Syst. Softw., vol. 100, pp. 91–102, Feb. 2015.
ADFA-LD,’’ in Proc. Int. Conf. Netw. Syst. Secur. Cham, Switzerland: [62] G. Creech and J. Hu, ‘‘A semantic approach to host-based intrusion detec-
Springer, 2014, pp. 542–549. tion systems using contiguousand discontiguous system call patterns,’’
[40] B. Subba, S. Biswas, and S. Karmakar, ‘‘Host based intrusion detection IEEE Trans. Comput., vol. 63, no. 4, pp. 807–819, Apr. 2014.
system using frequency analysis of n-gram terms,’’ in Proc. IEEE Region [63] L. Van der Maaten and G. Hinton, ‘‘Visualizing data using t-SNE,’’
10 Conf. (TENCON), Nov. 2017, pp. 2006–2011. J. Mach. Learn. Res., vol. 9, pp. 2579–2605, Nov. 2008.
[41] W. Haider, G. Creech, Y. Xie, and J. Hu, ‘‘Windows based data sets for [64] J. O. Nehinbe, ‘‘A critical evaluation of datasets for investigating IDSs and
evaluation of robustness of host based intrusion detection systems (IDS) IPSs researches,’’ in Proc. IEEE 10th Int. Conf. Cybern. Intell. Syst. (CIS),
to zero-day and stealth attacks,’’ Future Internet, vol. 8, no. 3, p. 29, Sep. 2011, pp. 92–97.
2016. [65] Z. Wang, ‘‘The applications of deep learning on traffic identification,’’
[42] G. Kim, H. Yi, J. Lee, Y. Paek, and S. Yoon. (2016). ‘‘LSTM- BlackHat USA, pp. 21–26, 2015.
based system-call language modeling and robust ensemble method for [66] A. Gharib, I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani,
designing host-based intrusion detection systems.’’ [Online]. Available: ‘‘An evaluation framework for intrusion detection dataset,’’ in Proc. Int.
https://arxiv.org/abs/1611.01726 Conf. Inf. Sci. Secur. (ICISS), Dec. 2016, pp. 1–6.
[43] M. Kezunovic, L. Xie, and S. Grijalva, ‘‘The role of big data in improving [67] S. Ioffe and C. Szegedy, ‘‘Batch normalization: Accelerating deep network
power system operation and protection,’’ in in Proc. IREP Symp. Bulk training by reducing internal covariate shift,’’ in Proc. Int. Conf. Mach.
Power Syst. Dyn. Control-IX Optim., Secur. Control Emerg. Power Grid Learn., 2015, pp. 448–456.
(IREP), Aug. 2013, pp. 1–9. [68] N. Srivastava, G. E. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdi-
[44] M. Tang, M. Alazab, and Y. Luo, ‘‘Big data for cybersecurity: Vulnera- nov, ‘‘Dropout: A simple way to prevent neural networks from overfitting,’’
bility disclosure trends and dependencies,’’ IEEE Trans. Big Data, to be J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.
published. doi: 10.1109/tbdata.2017.2723570. [69] K. Simonyan, A. Vedaldi, and A. Zisserman. (2013). ‘‘Deep inside con-
[45] R. Vinayakumar, P. Poornachandran, and K. P. Soman, ‘‘Scalable frame- volutional networks: Visualising image classification models and saliency
work for cyber threat situational awareness based on domain name systems maps.’’ [Online]. Available: https://arxiv.org/abs/1312.6034
data analysis,’’ in Big Data in Engineering Applications (Studies in Big [70] M. Alazab, S. Venkatraman, P. Watters, and M. Alazab, ‘‘Zero-day mal-
Data), vol. 44, S. Roy, P. Samui, R. Deo, and S. Ntalampiras, Eds. Singa- ware detection based on supervised learning algorithms of API call sig-
pore: Springer, 2018. natures,’’ in Proc. 9th Australas. Data Mining Conf., vol. 121, 2011,
[46] C. D. Manning, P. Raghavan, and H. Schutze, Introduction to Informa- pp. 171–182.
tion Retrieval. Cambridge, U.K.: Cambridge Univ. Press, 2008, Ch. 20, [71] A. Alazab, M. Hobbs, J. Abawajy, and M. Alazab, ‘‘Using feature selection
pp. 405–416. for intrusion detection system,’’ in Proc. Int. Symp. Commun. Inf. Technol.
[47] X. Glorot, A. Bordes, and Y. Bengio, ‘‘Deep sparse rectifier neural net- (ISCIT), Oct. 2012, pp. 296–301.
works,’’ in Proc. 14th Int. Conf. Artif. Intell. Statist., 2011, pp. 315–323. [72] T. Kim, B. Kang, M. Rho, S. Sezer, and E. G. Im, ‘‘A multimodal deep
[48] A. L. Maas, A. Y. Hannun, and A. Y. Ng, ‘‘Rectifier nonlinearities improve learning method for Android Malware detection using various features,’’
neural network acoustic models,’’ in Proc. ICML, 2013, vol. 30, no. 1, IEEE Trans. Inf. Forensics Secur., vol. 14, no. 3, pp. 773–788, Mar. 2019.
pp. 1–3. [73] A. Saracino, D. Sgandurra, G. Dini, and F. Martinelli, ‘‘Madam: Effective
[49] V. Nair and G. E. Hinton, ‘‘Rectified linear units improve restricted boltz- and efficient behavior-based android malware detection and prevention,’’
mann machines,’’ in Proc. 27th Int. Conf. Mach. Learn. (ICML), 2010, IEEE Trans. Dependable Secure Comput., vol. 15, no. 1, pp. 83–97,
pp. 807–814. Jan. 2018.
[50] M. V. Mahoney and P. K. Chan, ‘‘An analysis of the 1999 DARPA/Lincoln [74] S. Naseer et al., ‘‘Enhanced network anomaly detection based on deep
Laboratory evaluation data for network anomaly detection,’’ in Recent neural networks,’’ IEEE Access, vol. 6, pp. 48231–48246, 2018.
Advances in Intrusion Detection (Lecture Notes in Computer Science), [75] N. R. Sabar, X. Yi, and A. Song, ‘‘A bi-objective hyper-heuristic sup-
vol. 2820. Berlin, Germany: Springer, 2003, pp. 220–237. port vector machines for big data cyber-security,’’ IEEE Access, vol. 6,
[51] M. Sabhnani and G. Serpen, ‘‘Why machine learning algorithms fail in pp. 10421–10431, 2018.
misuse detection on KDD intrusion detection data set,’’ Intell. Data Anal., [76] W. Wang et al., ‘‘HAST-IDS: Learning hierarchical spatial-temporal fea-
vol. 8, no. 4, pp. 403–415, 2004. tures using deep neural networks to improve intrusion detection,’’ IEEE
[52] Y. Bouzida and F. Cuppens, ‘‘Neural networks vs. decision trees for intru- Access, vol. 6, pp. 1792–1806, 2018.
sion detection,’’ in Proc. IEEE/IST Workshop Monitoring, Attack Detection [77] S. Aditham and N. Ranganathan, ‘‘A system architecture for the detection
Mitigation (MonAM), Sep. 2006, pp. 1–29. of insider attacks in big data systems,’’ IEEE Trans. Dependable Secure
[53] S. T. Brugger and J. Chow, ‘‘An assessment of the DARPA IDS evaluation Comput., vol. 15, no. 6, pp. 974–987, Nov. 2018.
dataset using snort,’’ Dept. Comput. Sci., Univ. California, Davis, Davis, [78] M. N. Kurt, Y. Yılmaz, and X. Wang, ‘‘Real-time detection of hybrid and
CA, USA, Tech. Rep. CSE-2007-1, 2005. stealthy cyber-attacks in smart grid,’’ IEEE Trans. Inf. Forensics Security,
[54] J. McHugh, ‘‘Testing intrusion detection systems: A critique of the 1998 vol. 14, no. 2, pp. 498–513, Feb. 2019.
and 1999 DARPA intrusion detection system evaluations as performed [79] T. Zhang and Q. Zhu, ‘‘Distributed privacy-preserving collaborative intru-
by Lincoln Laboratory,’’ ACM Trans. Inf. Syst. Secur., vol. 3, no. 4, sion detection systems for VANETs,’’ IEEE Trans. Signal Inf. Process.
pp. 262–294, 2000. doi: 10.1145/382912.382923. Netw., vol. 4, no. 1, pp. 148–161, Mar. 2018.
VOLUME 7, 2019 41549

R. VINAYAKUMAR received the B.C.A. degree PRABAHARAN POORNACHANDRAN is

from the JSS College of Arts, Commerce and Sci- currently a Professor with Amrita Vishwa
ences, Mysore, in 2011, and the M.C.A. degree Vidyapeetham. He has more than two decades
from Amrita Vishwa Vidyapeetham, Mysore, of experience in computer science and security
in 2014. He is currently pursuing the Ph.D. degree areas. His areas of interests are malware, critical
in computational engineering and networking with infrastructure security, complex binary analysis,
the Amrita School of Engineering, Amrita Vishwa AI, and machine learning.
Vidyapeetham, Coimbatore, India. His Ph.D. the-
sis is on the application of machine learning
(sometimes deep learning) for cyber security and
also discusses the importance of natural language processing, image process-
ing, and big data analytics for cyber security. He has published several papers
in machine learning applied to cyber security. He has participated in several
international shared tasks and organized a shared task on detecting malicious AMEER AL-NEMRAT is currently a Senior Lec-
domain names (DMD 2018) as part of SSCC’18 and ICACCI’18. turer (Associate Professor) with the School of
Architecture, Computing and Engineering, Uni-
versity of East London (UEL). He is the Leader
of the Professional Doctorate in IS and M.Sc.
MAMOUN ALAZAB received the Ph.D. degree
Information Security and Computer Forensics Pro-
in computer science from the School of Science,
grams. He is also the Founder and the Director of
Information Technology and Engineering, Feder-
the Electronic Evidence Laboratory, UEL, where
ation University of Australia. He is currently an
he is involved in cybercrime projects with different
Associate Professor with the College of Engineer-
U.K. law enforcement agencies. His research inter-
ing, IT and Environment, Charles Darwin Univer-
ests include security, cybercrime, and digital forensics, where he has been
sity, Australia. He is a Cyber-Security Researcher
publishing research papers in peer-reviewed conferences and internationally
and Practitioner with industry and academic expe-
reputed journals.
rience. He works closely with government and
industry on many projects. He has published more
than 100 research papers. He has delivered many invited and keynote
speeches, 22 events in 2018 alone. His research interest is multidisci-
plinary that focuses on cyber security and digital forensics of computer
SITALAKSHMI VENKATRAMAN received the
systems, including current and emerging issues in the cyber environment like
M.Sc. degree in mathematics and the M.Tech.
cyber-physical systems and the Internet of Things with a focus on cybercrime
degree in computer science from IIT Madras,
detection and prevention. He convened and chaired more than 50 conferences
Madras, in 1985 and 1987, respectively, the M.Ed.
and workshops. He is an Editor on multiple Editorial Boards, including an
degree from The University of Sheffield, in 2001,
Associate Editor of the IEEE ACCESS (2017 Impact Factor 3.5), an Editor of
and the Ph.D. degree in computer science from
the SECURITY AND COMMUNICATION NETWORKS (2016 Impact Factor: 1.067), and
the National Institute of Industrial Engineering,
a Book Review Section Editor of the Journal of Digital Forensics, Security
in 1993. Her Ph.D. thesis was entitled Efficient
and Law.
Parallel Algorithms for Pattern Recognition.
She has more than 30 years of work experience
both in industry and academics and has been developing turnkey projects for
K. P. SOMAN has 25 years of research and teach- IT industry and teaching a variety of IT courses for tertiary institutions in
ing experience at the Amrita School of Engineer- India, Singapore, New Zealand, and Australia, since 2007. She is currently
ing, Coimbatore. He has around 150 publications the Discipline Leader and a Senior Lecturer in information technology
in national and international journals and confer- with Melbourne Polytechnic. She is specialized in applying efficient com-
ence proceedings. He has organized a series of puting models and data mining techniques for various industry problems
workshops and summer schools in advanced signal and recently in the e-health, e-security, and e-business domains through
processing using wavelets, kernel methods for pat- collaborations with industry and universities in Australia. She has published
tern classification, deep learning, and big-data ana- seven book chapters and more than 130 research papers in internationally
lytics for industry and academia. He has authored well-known refereed journals and conferences. She is a Senior Member
the books Insight Into Wavelets, Insight Into Data of professional societies and editorial boards of international journals and
Mining, Support Vector Machines and Other Kernel Methods, and Signal and serves as a Program Committee Member of several international conferences
Image Processing—The Sparse Way, published by Prentice Hall, New Delhi, every year.
and Elsevier.
41550 VOLUME 7, 2019

Deep Learning Approach For Intelligent Intrusion Detection System

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Deep Learning Approach For Intelligent Intrusion Detection System

Uploaded by

Copyright:

Available Formats

Received December 27, 2018, accepted January 3, 2019, date of current version April 11, 2019.

Digital Object Identifier 10.1109/ACCESS.2019.2895334

Deep Learning Approach for Intelligent Intrusion

Corresponding author: R. Vinayakumar (vinayakumarr77@gmail.com)

I. INTRODUCTION prone by various attacks from both internal and external

41526 VOLUME 7, 2019

II. STAGES OF COMPROMISE: AN ATTACKER’S VIEW III. RELATED WORKS

VOLUME 7, 2019 41527

41528 VOLUME 7, 2019

VOLUME 7, 2019 41529

41530 VOLUME 7, 2019

corresponding index. We adopt a fixed length vector

C. DEEP NEURAL NETWORK (DNN)

VOLUME 7, 2019 41531

defined as; well-defined times with well-defined protocols. Each con-

min L(θ ) (10) B. PROBLEM FORMULATION FOR HIDS

41532 VOLUME 7, 2019

VOLUME 7, 2019 41533

TABLE 2. Training and testing connection records of partial dataset of UNSW-NB15.

VOLUME 7, 2019 41535

TABLE 3. Training and testing WSN-DS dataset.

TABLE 4. Training and testing CICIDS 2017 dataset.

Attack, Infiltration, Botnet and ’DDoS’. They also

B. DATASETS USED IN HIDS FIGURE 2. t-SNE visualization of KDDCup 99.

The windows and Linux operating system are the most

41536 VOLUME 7, 2019

TABLE 7. Types of attacks in ADFA-LD dataset.

The ADFA-WD comprises of system calls and dynamic

TABLE 6. ADFA-LD and ADFA-WD datasets. VII. EXPERIMENTAL DESIGN

VOLUME 7, 2019 41537

41538 VOLUME 7, 2019

TABLE 8. Configuration of proposed DNN model.

VOLUME 7, 2019 41539

41540 VOLUME 7, 2019

VOLUME 7, 2019 41541

case of multi-class classification the performance of AB is

41542 VOLUME 7, 2019

Thus during training, the classifiers gives less preference for

VOLUME 7, 2019 41543

Table 18 for UNSW-NB15 and Table 19 for CICIDS 2017.

41544 VOLUME 7, 2019

TABLE 16. Detailed test results for Multi-class classification- NSL-KDD.

TABLE 17. Detailed test results for Multi-class classification- WSN-DS.

VOLUME 7, 2019 41545

TABLE 18. Detailed test results for Multi-class classification- UNSW-NB15.

41546 VOLUME 7, 2019

VOLUME 7, 2019 41547

41548 VOLUME 7, 2019

VOLUME 7, 2019 41549

R. VINAYAKUMAR received the B.C.A. degree PRABAHARAN POORNACHANDRAN is

41550 VOLUME 7, 2019

You might also like