
Machine learning in cybersecurity: a comprehensive survey

Journal of Defense Modeling and Simulation: Applications, Methodology, Technology, 1–50
© The Author(s) 2020
DOI: 10.1177/1548512920951275
journals.sagepub.com/home/dms
Dipankar Dasgupta1, Zahid Akhtar2 and Sajib Sen1

Abstract
Today’s world is highly network interconnected owing to the pervasiveness of small personal devices (e.g., smartphones)
as well as large computing devices or services (e.g., cloud computing or online banking), and thereby each passing minute
millions of data bytes are being generated, processed, exchanged, shared, and utilized to yield outcomes in specific appli-
cations. Thus, securing the data, machines (devices), and user’s privacy in cyberspace has become an utmost concern for
individuals, business organizations, and national governments. In recent years, machine learning (ML) has been widely
employed in cybersecurity, for example, intrusion or malware detection and biometric-based user authentication.
However, ML algorithms are vulnerable to attacks both in the training and testing phases, which usually leads to remark-
able performance decreases and security breaches. Comparatively, limited studies have been conducted to understand
the essence and degree of the vulnerabilities of ML techniques against security threats and their defensive mechanisms. It
is imperative to systematize recent works related to cybersecurity using ML to seek the attention of researchers, scien-
tists, and engineers. Therefore, in this paper, we provide a comprehensive survey of the works that have been carried
out most recently (from 2013 to 2018) on ML in cybersecurity, describing the basics of cyber-attacks and corresponding
defenses, the basics of the most commonly used ML algorithms, and proposed ML and data mining schemes for cyberse-
curity in terms of features, dimensionality reduction, and classification/detection techniques. In this context, this article
also provides an overview of adversarial ML, including the security characteristics of deep learning methods. Finally, open
issues and challenges in cybersecurity are highlighted and potential future research directions are discussed.

Keywords
Cybersecurity, machine learning, intrusion detection, deep neural network, adversarial examples, adversarial learning,
defensive techniques

1. Introduction
The advent of technologies ranging from smartphones to large-scale communication systems has resulted in an exceptionally interconnected digital society and humongous usage of the internet. It is estimated that there are more than 5 billion smart devices and 3 billion internet users in the world as of today.1 This cyber connectivity is widely being used in a diverse set of applications, such as online banking and shopping, email, documents or critical information sharing, video chatting, and gaming, to name a few. Consequently, lots of data, in terabytes per second, are being created, processed, exchanged, and stored by different applications as well as the Internet of Things (IoT). In fact, it is believed that 90% of the data in the world today has been generated in the last two years alone.1

The increase in the use of the internet and related services has also raised the number of cyber-attack events each year. For instance, in 2015, the United States Office of Personnel Management (OPM) was attacked and information for approximately 21.5 million government employees, such as names, social security numbers, addresses, etc., was stolen.2 Yahoo, the email providing system, suffered a cyber-attack in 2013, and almost 3 billion Yahoo email addresses were affected. In 2017, WannaCry, a ransomware attack, encrypted the contents

1 Center for Information Assurance (CfIA), University of Memphis, USA
2 Department of Network and Computer Security, State University of New York (SUNY) Polytechnic Institute, USA

Corresponding author:
Dipankar Dasgupta, Department of Computer Science, University of Memphis, 333 Dunn Hall, 38152, USA.
Email: dasgupta@memphis.edu

of devices and demanded a payment in Bitcoin.3 A total of US$7.4 million in Bitcoin-style cryptocurrencies, Ethereum and Ether, was stolen in 3 minutes from the Ethereum app in 2017. Equifax, a credit rating agency, announced in 2017 that their server was attacked and 150 million people's personal information was breached. Also, GitHub, the popular version control hosting service, suffered a massive Denial of Service (DoS) attack in 2018.2

Although cyber-attacks do not use any physical weapons, they are among the most dangerous and harmful weapons, as they may cause the revelation of the topmost classified information of government organizations through espionage, or of sensitive personal information through phishing. According to cybersecurity experts, in 2017 alone cyber-attacks might have caused US$5 billion worth of damage, and this figure will grow in the future; for example, damage may hit US$6 trillion annually by 2021 (https://www.hackmageddon.com/). Several counter-measures against cyber-attacks have been introduced in recent decades, generally known as intrusion detection systems (IDSs).4–7 In recent years, computational intelligence techniques, including machine learning (ML), deep learning (DL), and data mining (DM), have been utilized to ensure cybersecurity.8,9 Despite remarkable progress in the use of computational intelligence techniques, and the subsequent increase in performance, robustness against cyber-attacks, and insights into malicious samples and attacks, computational intelligence in cybersecurity still needs to advance greatly, besides overcoming many challenges, such as zero-day attacks. Moreover, there is also a growing concern about the security and vulnerabilities of ML techniques against attacks.

Over the years, several survey papers (e.g., Kwon et al.,3 Tong et al.,4 Gardiner and Nagaraja,5 Shanbhogue and Beena,8 Liu et al.,9 Bou-Harb et al.,10 Luh et al.,11 Shabut et al.,12 Humayed et al.,13 Wang et al.,14 Deng et al.,15 Bou-Harb,16 Beasley et al.,17 Ucci et al.,18 Ye et al.,19 Bazrafshan et al.,20 Souri and Hosseini,21 Barriga and Yoo,22 Bontupalli and Taha,23 Buczak and Guven,24 Resende and Drummond,25 KishorWagh et al.,26 Liu et al.,27 Sultana et al.,28 Gupta et al.,29 and Wang et al.33) and case studies (e.g., Al-Enezi et al.31 and Alotaibi et al.32) on the use of computational intelligence in cybersecurity have been published, but with limited scope. For instance, Humayed et al.,13 Wang et al.,14 Deng et al.,15 Bou-Harb,16 and Beasley et al.17 discussed only cybersecurity related to cyber–physical systems (e.g., smart grid33) or specific applications (e.g., block-chain34), while Ucci et al.,18 Ye et al.,19 Bazrafshan et al.,20 Souri and Hosseini,21 and Barriga and Yoo23 gave details about DM and ML techniques used for malware detection. Similarly, Bontupalli and Taha,23 Buczak and Guven,24 Resende and Drummond,25 and KishorWagh et al.26 focused only on the IDS and its related ML approaches, whereas Liu et al.35 and Kwon et al.3 studied, respectively, insider threats and DL-based anomaly detection. Furthermore, Sultana et al.28 provided a review on ML and DL security techniques in the IoT, Gupta et al.29 surveyed the literature on phishing attacks, and Wang et al.30 elucidated the game-theoretic approach used for cybersecurity. All in all, most prior survey papers did not cover all facets of ML and cybersecurity, such as details of widely adopted algorithms, cyber-attacks and respective defense methods, and details of cybersecurity datasets, challenges, and adversarial ML. This paper significantly differs from the previous articles as it provides a comprehensive overview of the basic cyber-attacks and their defenses, the basics of ML algorithms, research works (from 2013 to 2018) on ML in cybersecurity, adversarial ML, datasets, and current challenges and future research directions. Namely, this article is an attempt to systematize the knowledge on ML in cybersecurity by following guidelines such as what is the objective or problem domain, what feature representations have been utilized, which (and why) algorithms have been adopted, what kind of datasets have been employed, and what potential adversarial attacks would be faced by the algorithms. Among the significant contributions of this survey article, we can cite the following:

• a description of the main cybersecurity attacks and corresponding defenses;
• a general overview of well-adopted ML and DL algorithms;
• a survey of a wide range of ML in cybersecurity following a systematic categorization with the use of algorithms, and feature extraction and output mapping schemes;
• an overview of work related to the security of ML (i.e., adversarial ML), including the vulnerability of DL methods against adversarial examples;
• a synopsis of publicly available databases for cybersecurity; and
• a discussion of open issues and future research directions for cybersecurity.

The rest of the paper is structured as follows. Section 2 summarizes possible cyber-attacks and corresponding defense methods. The most widely adopted ML/DL algorithms for cybersecurity are described in Section 3. Section 4 provides a comprehensive review of all significant works (from 2013 to 2018) on ML in cybersecurity. Section 5 presents a survey of adversarial ML, including DL under adversarial examples. In Section 6, publicly available databases for cybersecurity research are described. Future research directions and conclusions are described in Sections 7 and 8, respectively.

Figure 1. The cyber-attack taxonomy considered in this paper.36

2. Cyber-attacks and their defenses

This section introduces the key notions related to cyber-attacks and defenses.

2.1. Cyber-attacks

A cyber-attack is a way of compromising computer functions on a victim's network, or of gaining unauthorized digital access to a victim's computer by removing its barricades. According to the definition by the Institute for Security Technology Studies at Dartmouth College, a cyber-attack is considered as an attack on a computer system that compromises the confidentiality, integrity, or availability of the information in that system. Cyber-attacks can be classified into different categories from different perspectives. In this article, the cyber-attacks are grouped (shown in Figure 1), like in Abdulkareem et al.,36 based on the effects they cause to a system or its architecture.

2.1.1. Misuse of resources attack. Inadvertently unaware or over-trusting employees in an organization trigger security breaches and provide access to organization data to attackers. Sometimes, employees with the best intentions open an email, perform a wire transfer to individuals, or access the virtual private network (VPN) through a company's network resources, which opens a backdoor for attackers to create mass destruction in the organization. Although external attacks seem to make headlines all the time, insider misuse of resources is still a dominant problem for companies and organizations throughout the world. According to the Ponemon Institute, more than 25% and 28% of security breaches in the world, respectively, are caused by misuse of resources by employees and by system glitches inside the organization.37 At least 53% of global security breaches last year were found within organizations.37 The Man-in-the-Middle (MitM) attack is the best-known example of a misuse of resources attack.

MitM attack: this attack happens when a hacker inserts itself into a trusted connection between a trusted client and its server. A common example of a MitM attack is session hijacking. In this type of attack, an attacker hijacks or inserts itself in a session between the victim (the trusted client for the server) and the server. Here, the attacker replaces the victim's Internet Protocol (IP) address with their chosen IP address and continues the session with the server, where the server treats the attacker's IP as its trusted client. In that process, the attacker's computer disconnects the victim's computer, spoofs the victim's sequence number, and accesses organization information. A pictorial view of a MitM attack is given in Figure 2.

Figure 2. An illustration of a Man-in-the-Middle attack.27 IP: Internet Protocol.

2.1.2. User access compromise. Compromising a user's personal information, for example, a password, is a commonly used attack type. Some popular ways of gaining a user's personal information are sniffing network connections to obtain unencrypted passwords, social engineering attacks to gain access to the password database, or brute-force guessing and dictionary attacks. Other popular ways of compromising user information are phishing and spear-phishing attacks.29 A phishing attack is a trick38 to deceive users into trusting an email, in order to obtain personal credentials or to provoke the user to perform some actions. The tricks involve social engineering or some technical trickery, such as a link to a legitimate-looking website attached to the email, to download malware or to hand over personal credentials to the visited website. Spear phishing, in turn, is a more targeted type of attack than a phishing attack. For this type of attack, attackers take time to do research about the targets and create personal and plausible messages. One of the simplest ways of conducting a spear-phishing attack is email spoofing. Here, the attackers send emails to the victims posing as one of their management officials or as someone known, such as their company partners. To create more credibility, attackers clone legitimate websites and take time to gain personally identifiable information. The whole phishing attack taxonomy is provided in Figure 3.

Figure 3. An illustration of the phishing attack taxonomy.39 DNS: Domain Name System.

2.1.3. Root access compromise. This attack is similar to a user compromise attack, but the difference here is that, instead of getting access to an individual host, attackers get access to the administrator's account, which has some special privileges compared with others on the system.

2.1.4. Web access compromise. This attack is performed by exploiting vulnerabilities on websites. Some common ways of web compromising attacks are the Structured Query Language (SQL) injection attack and cross-site scripting (XSS).

a) SQL injection attack: this type of attack happens in database-driven websites, where attackers insert SQL queries into the database through the input data (e.g., data for login credentials) sent from the client to the server. So, instead of using the expected data for a post request, attackers use a predefined SQL command. This command executes and reads sensitive data from the database, and can occasionally modify sensitive data and/or run administrative operations in case the database is missing read-only access. For example, a dynamic SQL query for a website on the internet may ask a user to input their account number to pull up the associated information for that account from the database, for example, "SELECT * FROM users WHERE account = '" + AccountNumber + "';". Although this command works perfectly for a valid account number, it leaves a hole for attackers. For instance, if an attacker provides the input "' or '2'='2", this input will result in the SQL query "SELECT * FROM users WHERE account = '' or '2'='2';". As '2'='2' always evaluates to true, instead of providing the details for a single account, the database will provide the data of all users (a defensive coding sketch is given at the end of this subsection).
b) XSS attack: this type of attack makes victims run or download scripts from the attacker's (third-party) web resources in the victim's web browser. Generally, attackers insert some JavaScript code with a payload into different websites. When a victim searches or requests information from one of such websites, the website processes the attacker's script with its payload in the victim's web browser. The malicious script of the attacker can unfold the victim's cookie for the session, hijacking it to steal information such as logged keystrokes. It also allows the attacker to access the victim's machine remotely. An overview of XSS is given in Figure 4.
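To make the defense concrete, the following minimal Python sketch (our illustration, not from the cited works; it uses the standard library's sqlite3 module, and the users table and account values are hypothetical) contrasts the vulnerable string-concatenated query from item a) with a parameterized query, the standard preventive coding practice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (account TEXT, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("1001", "alice"), ("1002", "bob")])

def lookup_vulnerable(account_number):
    # Builds the query by string concatenation, exactly the pattern
    # criticized above: the input "' or '2'='2" turns the WHERE clause
    # into a tautology and leaks every row.
    query = "SELECT * FROM users WHERE account = '" + account_number + "'"
    return conn.execute(query).fetchall()

def lookup_parameterized(account_number):
    # The driver binds the value as data, never as SQL text.
    return conn.execute("SELECT * FROM users WHERE account = ?",
                        (account_number,)).fetchall()

payload = "' or '2'='2"
print(lookup_vulnerable(payload))     # both rows leak
print(lookup_parameterized(payload))  # [] (no such account)
```

The parameterized version binds the attacker's payload purely as data, so the tautology '2'='2' never reaches the SQL parser.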

Figure 4. An overview of a cross-site scripting attack.27

2.1.5. Malware attack. Malware (short for malicious software) is an unwanted software program. Malware has been used by cyber criminals for many years to achieve their goals, such as shutting down or destroying a cyber–physical system, stealing sensitive large-scale data, compromising a network or systems, injecting malicious scripts, etc. Based on the goals of the attackers and their propagation frequency, malware may be categorized into several types.40,41 Among them, viruses, worms, trojans, spyware, ransomware, scareware, bots, and rootkits are common.

a) Viruses: like a live virus in the human body, viruses, as pieces of malicious code, come with other host programs and end up corrupting files in the host machine as well as in a shared network. Some examples of malicious viruses are the Creeper virus and the Melissa virus.
b) Worms: worms are different to viruses regarding their propagation principle. Unlike viruses, worms do not need a host machine to propagate. Worms are self-replicating and generally come with email attachments. Moreover, worms do not corrupt files in the host machine. Worms can create a DoS attack by replicating themselves to all contacts of the victim's email and using up the available network resources. Love Gate, CodeRed, SQL Slammer, MyDoom, and the Storm worm are common examples of worms.
c) Trojan: a trojan is completely different to viruses and worms, based on its purpose of actions. Attackers use social engineering tricks to deceive users into installing a trojan on their machine. Compared to viruses and worms, a trojan does not infect files in the host machine or replicate itself; rather, it creates a backdoor for attackers to launch a malicious program when necessary. Zeus, DarkComet, and the Shedun Android malware are some examples of trojans.
d) Spyware: spyware is used to spy on user activities instead of launching an attack immediately. This software is used to steal sensitive user information, such as login credentials, collected keystrokes, etc., without the user's consent or knowledge.
e) Ransomware: this kind of malware is different to the others because its payload not only infects the victim's files but also creates a process to obtain a ransom from the victim. Generally, ransomware attacks are performed through a trojan. Some examples of ransomware are WannaCry, TorrentLocker, etc.

2.1.6. Denial of Service. The main purpose of this type of cyber-attack is to completely destroy the normal operating condition of a system or network. A DoS attack has three main categories.

a) Host based: in host-based attacks, malware or worms are installed in host machines to execute their payload or operation to flood the complete network, starting from the host, with an infinite number of host requests.
b) Network based: instead of a host machine, attackers target a complete network to run its

Figure 5. Potential defenses against cyber-attacks, which have been depicted in Figure 1.31

payload and consequently halt the normal operation of the network.
c) Distributed: a Distributed Denial of Service (DDoS) attack is performed generally from a host machine as well as from a network, to shut down the victim's network completely.

2.2. Cyber-attack defenses

In the previous section, we saw how attackers can spread their cyber-attacks through networks or the internet to serve their purpose. To defend against such attacks, or to restrict those attacks to minimal damage, several defensive mechanisms specific to particular attack areas have been developed. The system where the defensive mechanisms are installed is termed the IDS.23 Broadly, based on attack areas and their defensive types, the IDS is subdivided into three categories. Before discussing any particular defensive mechanism (shown in Figure 5) for the several attack areas, a brief overview of the IDS is given below.

2.2.1. Intrusion Detection System. As a broader class of technology, the IDS consists of an intrusion detection as well as an intrusion prevention mechanism.23 The IDS is generally built with a combination of software and hardware to observe and control network activities within a network. Based on the purpose and detection mechanism, the IDS can be classified into two different groups.42 There are two types of IDS classification methods: the detection-based method and the data source-based method. Detection-based methods are sub-classified into misuse-based and anomaly-based detection, whereas data source-based methods are classified into network-based and host-based methods.

a) Detection-based method: the other name for signature-based detection is misuse detection. The idea behind this method is keeping the known or seen attack behaviors as signatures in a database. The advantage of the misuse detection method is that it is fast and has a low false alarm rate; the disadvantages are the high false alarm rate for unknown or zero-day attack types and the need to maintain a large database. On the other hand, anomaly-based detection keeps a profile for all normal behaviors and detects anomalies based on the degree of deviation from this profile. The advantage of anomaly-based detection is its ability to detect unknown attacks; the disadvantages are the high false alarm rate and the inability to provide possible reasons for an irregularity.
b) Data source-based method: intrusion from a particular host can easily be detected by host-based IDSs. The advantage of this technique is that the IDSs can detect the behavior of network objects, such as programs, files, and ports, very precisely; the limitations are that this system is dependent on host resources and is unable to detect network anomalies or attacks. On the other hand, network-based IDSs are not dependent on host resources and are deployed on major switches, such as routers. The advantages of this system are that it is operating system (OS) independent and is able to identify specific types of network protocol. The main limitation is that this system is only dedicated to monitoring data passing through a specific network.

Although the whole taxonomy of IDSs, shown in Figure 6, is outside the scope of our discussion, the details can be found in Liu and Lang.42 However, the classifications and concepts related to detection-based method IDSs are directly related to our article.

2.2.2. Defense against misuse of resources attack. To avoid and secure a network against a misuse of resources attack (e.g., a MitM attack), an anomaly-based IDS27 needs to be installed. This system will monitor the network flows and raise an alarm if anyone tries to hijack any network session. Although this technique performs well for a known attack, for a zero-day attack it provides a high false alarm rate. Thus, to be safe from a zero-day attack, it is advised to adopt preventive methods prior to the IDS. For example,

Figure 6. Taxonomy of intrusion detection systems (IDSs).42 The blue colored arrows indicate the topics of our interest. (Color
online only.)

using a VPN inside an organization's network to access resources.

2.2.3. Defense against user and root compromise attacks. A phishing attack is a common example of a user and root compromise attack. So, defense and prevention against phishing attacks can secure user and root access to a system. Several related defensive mechanisms can be found in the literature.39,43–45 In particular, Khonji et al.43 and Almomani et al.44 proposed two defensive mechanisms against user and root access attacks. Another defensive mechanism was proposed by Almomani et al.,45 where the authors presented an email-based filtering method as a defensive mechanism for phishing attacks on both the server side and the client side. The overall taxonomy of detecting a phishing attack is given in Figure 7 for clarification.

Figure 7. A taxonomy of phishing attack detection methodologies.39

2.2.4. Defense against a web compromise attack. Website vulnerability is a honey spot for attackers to launch a web compromise attack (such as a SQL injection attack, XSS, etc.). Both anomaly- and signature-based detection

mechanisms can be employed to defend against a web compromise attack. Besides, up-to-date knowledge about website vulnerabilities, proper patches for applications, and secure coding practices to block database vulnerabilities are also preventive measures for web compromise attacks.

2.2.5. Defenses against installed malware. Malware is now an epidemic in the cyber world and is a preferred means of attack for attackers. To defend against malware attacks, malware detection techniques come as the first line of defense. There are several malware classification techniques found in the literature.40,46,47 Based on how malware will be handled for detection, the detection mechanism has three categories.

Figure 8. A schematic overview of the behavior-based malware detector.41
a) Signature based: this approach to detecting malware is very common and has been around for a long time. Several anti-malware companies (e.g., products from Kaspersky, McAfee) analyze malware and create signatures (a short sequence of bytes) from them. These signatures are used to provide safety to their users based on a pattern-matching algorithm. However, the main limitation of this technique is that attackers are now obfuscating code or changing some portion of previous malware to bypass signature-based detection systems. Moreover, this approach is not suitable for zero-day attacks.
b) Behavior based: although the principle of behavior-based malware detection is quite similar to that of signature-based detection, the behavior-based technique has a different method of feature extraction. In this technique, the detection mechanism considers what the malware does instead of what the malware says (i.e., a short sequence of bytes). This technique is suitable for detecting obfuscated or mutant malware. Instead of providing different signatures for different byte code patterns, malware with similar behaviors are grouped into a single signature, which significantly reduces the false alarm rate of the signature-based approach. According to Bazrafshan et al.,20 the behavior-based detection method has three components. The first component is the data collector, which collects static/dynamic information about the executable element. The next component works as an intermediate medium to convert the collected data into an intermediate representation. In the final step, those representations are matched with the behavior signature database to provide output. The overall architecture of the behavior-based detector is provided in Figure 8.
c) Heuristic based: although the behavior-based detection method is more powerful than the signature-based one, attackers still can bypass this approach through strong counter-measures (e.g., encryption, obfuscation, polymorphism, etc.). To overcome this limitation, researchers nowadays employ a DM- and ML-based approach, termed the heuristic method. It is used to define rules/patterns chosen by experts to differentiate benign and malware executables. Generally, behaviors like application programming interface (API) call sequences, instructional opcodes, or n-grams are chosen as features to build the ML model (a feature-extraction sketch is given below). A detailed explanation of all available learning models is provided in Section 4.
2.2.6. Defense against a Denial of Service attack. A defense mechanism against a DoS attack is a broad area of research. Defense against a DoS attack can be subdivided into two main categories,48 namely attack prevention and attack detection.

a) Attack prevention: attack prevention is employed generally in the routers of a network to identify malicious traffic based on signatures, and it is the first line of defense against a DoS attack. Some of the common ways of filtering packets are given below.
(i) Ingress/egress filtering: this filtering provides access to network traffic if and only if the ingress traffic (incoming traffic to the local network) or egress traffic (outgoing network traffic from the local network) matches its expected source IP.
(ii) Router-based packet filtering: this filtering is done based on the routing information of the incoming packets with respect to their source and destination IP.

(iii) Packet filtering based on hop count: the hop count is the difference between the initial and observed Time To Live (TTL) values for a packet. In this technique, a router in a network prepares a database table for every valid user and its corresponding hop count to any particular destination. Thus, if the router finds any anomaly in the expected hop counts, it drops the packet or raises an alarm to protect the network from a potential attack, as sketched below.
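A minimal sketch of this hop-count check is given below (our illustration; the initial TTL values are common operating system defaults, and the per-source table is hypothetical):

```python
# Common initial TTL values used by major operating systems; the sender's
# initial TTL is assumed to be the smallest of these >= the observed TTL.
INITIAL_TTLS = (64, 128, 255)

def hop_count(observed_ttl):
    """Infer hop count as (assumed initial TTL) - (observed TTL)."""
    initial = min(t for t in INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl

# Hypothetical table built by the router: expected hop counts per source IP.
expected_hops = {"203.0.113.7": 12, "198.51.100.9": 4}

def check_packet(src_ip, observed_ttl, tolerance=1):
    hops = hop_count(observed_ttl)
    expected = expected_hops.get(src_ip)
    if expected is None or abs(hops - expected) > tolerance:
        return "drop-or-alarm"   # hop count inconsistent with the table
    return "accept"

print(check_packet("203.0.113.7", observed_ttl=52))   # 64 - 52 = 12: accept
print(check_packet("198.51.100.9", observed_ttl=50))  # 64 - 50 = 14: drop-or-alarm
```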
b) Attack detection: the attack detection mechanism for a DoS attack is mainly categorized into the following classes.
(i) Signature-based detection: this technique is similar to signature-based malware detection, where malicious traffic is recognized and differentiated based on signatures of the attack traffic data.
(ii) Anomaly-based DoS detection: anomaly-based detection mechanisms are mostly employed nowadays for DoS detection, as attack patterns are becoming more complex than ever before. ML techniques are largely employed for anomaly-based detection. This approach consists of two main parts. Firstly, features like IP packet length, TTL, etc., are extracted from the network traffic data using DM techniques, and then a detection model is built on that feature representation. Secondly, incoming traffic is passed through this model and, based on a pre-selected threshold value, the model decides whether the traffic is malicious or not. A minimal sketch of this two-part pipeline follows.
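The following sketch illustrates the two-part pipeline under simplifying assumptions (a z-score profile stands in for the trained detection model; the traffic records are hypothetical):

```python
import statistics

def extract_features(packet):
    # Part 1: DM-style feature extraction from raw traffic records;
    # here just the IP packet length and TTL mentioned above.
    return (packet["length"], packet["ttl"])

# Profile of normal traffic built from hypothetical training records.
normal = [{"length": 60, "ttl": 64}, {"length": 1500, "ttl": 64},
          {"length": 576, "ttl": 128}, {"length": 60, "ttl": 128}]
feats = [extract_features(p) for p in normal]
means = [statistics.mean(col) for col in zip(*feats)]
stdevs = [statistics.stdev(col) for col in zip(*feats)]

def is_malicious(packet, threshold=3.0):
    # Part 2: score incoming traffic by its deviation from the normal
    # profile and flag it when the z-score exceeds the chosen threshold.
    score = max(abs(x - m) / s for x, m, s in
                zip(extract_features(packet), means, stdevs))
    return score > threshold

print(is_malicious({"length": 60, "ttl": 64}))     # normal-looking: False
print(is_malicious({"length": 65535, "ttl": 1}))   # flood-like: True
```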
3. Basics of machine learning and deep learning

ML is an umbrella term used for computational methods that try to imitate human learning activities through computers in order to automatically discover and acquire knowledge. It is a multidisciplinary research area, which includes statistics, psychology, computer science, and neuroscience. Nowadays, learning algorithms have advanced significantly in practice because of new advancements in processor speed and big data.49 To provide readers with some background knowledge of both traditional ML and neural network-based algorithms, we provide a separate section for each of them. Based on learning techniques, ML algorithms are broadly categorized into supervised, unsupervised, and reinforcement learning (RL).50

In supervised learning algorithms, models are trained in such a way that the provided true output labels are mapped49 to learn the relationship with their corresponding feature values. Decision trees (DTs), neural networks, support vector machines (SVMs), etc., are examples of supervised learning algorithms. On the other hand, unsupervised learning algorithms learn the information and make clusters based on the whole training dataset, without knowing the output of every input. The difference between supervised and unsupervised learning is that the latter has no category labels in its training data. Examples of unsupervised learning algorithms are k-means clustering, k-nearest neighbors (k-NN), etc. RL, a trial-and-error-based learning algorithm, aims to learn an environment,51 given an agent.52 Training data in RL is a mixture of the supervised and unsupervised approaches: instead of providing training data with the correct labels, RL explores actions until it gets them right.49

In the following, we provide a brief background of some widely used algorithms in cybersecurity, for both traditional ML algorithms and neural network-based algorithms (i.e., DL). Firstly, we will discuss the traditional ML algorithms and their applications. Then, we will dive into neural network-based algorithms and their associated applications. Moreover, Table 1 briefly shows the working principles of the algorithms and their advantages and disadvantages.

3.1. Traditional ML algorithms

3.1.1. Decision tree. A DT is a rule-based tree-structured classification model where each vertex (node) of the tree represents an attribute and each branch determines the values that attribute can have.53 The topmost vertex in a tree is called the root, which has the most information gain (difference in entropy) among all features and is used to optimally split all training data. The bottom nodes are called leaves, and each leaf represents a class. During classification, the DT is traversed in a top-down manner according to the attribute values of the instance to be classified. The equation of the information gain used in a DT to optimally split instances in a tree-structured manner is given below:

Gain(P, Q) = Entropy(P) - \sum_{v \in D_Q} \frac{|P_v|}{|P|} Entropy(P_v)    (1)

Here, Gain(P, Q) is the reduction in entropy obtained by sorting P on attribute Q. Features with the highest information gain values are chosen as nodes in a top-down manner. During DT construction, researchers54 suggested some important points (e.g., pre-pruning, post-pruning, etc.) to keep the model from being over-fitted or under-fitted. Finally, the tree structure is converted to a set of rules to classify or predict new instances. An example of a DT used for intrusion detection is given in Figure 9. The main

Table 1. Summary of popular machine learning and deep learning algorithms used in cybersecurity.

| Method | Working principle | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Decision tree (DT) | A rule-based tree-structured classification model, trained on the basis of the information gain of all features in the training data | Computational cost is low and it is easy to implement | Needs to save all the information of the trained model; space complexity is high |
| Support vector machine (SVM) | Aims to find a separating hyperplane in the feature space among its classes so that the distance between the hyperplane and its nearest data points is maximized | Suitable for small sample sizes but large feature dimensions | Selecting the optimal kernel is difficult |
| Naive Bayes (NB) classifier | Calculates the posterior probability of a class given the inputs, based on Bayes' rule | Robust to noisy training data, easy to implement, performance does not degrade with low sample size | Assumes all features contribute independently during learning, but in practice this hardly happens |
| Artificial neural network (ANN) | Consists of one or more hidden layers between the input and output layers; stores input data information as weights in the hidden layers using the back-propagation algorithm | Suitable for pattern recognition problems, with high accuracy | Computational complexity is high compared to other algorithms |
| k-means clustering | Makes clusters or groups among training data points based on similarity measures | Easy to implement; suitable for problems where labeling data is very difficult | Selecting the k-value at the beginning requires domain knowledge |
| Convolutional neural network (CNN) | The convolution layer extracts features from training data in a generative fashion using several hidden layers, and a pooling layer pools that information to predict the output | Very useful for image classification and pattern recognition | Computationally complex; performance degrades with low sample size |
| Recurrent neural network (RNN) | Processes sequential data by integrating a temporal layer | Excellent performance for sequential data analysis, such as speech and text | Vanishing or exploding gradients are the main disadvantage of the RNN |
| Restricted Boltzmann machine (RBM) | Unsupervised generative learning model; restricts the connections between nodes of the same layer (i.e., visible layer and hidden layer) | The feedback mechanism allows the RBM to extract important features in an unsupervised learning environment | Computational cost is very high |
| Deep belief network (DBN) | Comprises stacked RBMs that execute greedy layer-by-layer training to get robust performance | Shows better performance than a single RBM, as it stacks an RBM on top of each trained layer | Computational cost is very high, as it takes a high number of parameters |

advantage of the DT algorithm is that it is easy to implement as well as providing high classification accuracy. Computational complexity is one of the main disadvantages of the DT classifier. In addition to being used as a single classifier in security applications, the DT is also used as a collaborative classifier in intrusion detection.55,56
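A compact sketch of Equation (1), using hypothetical connection records, shows how the attribute with the higher information gain would be selected as the root node:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(P) = -sum over classes of p_c * log2(p_c)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Gain(P, Q) from Equation (1): the entropy reduction obtained by
    splitting the records P on the values of attribute Q."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attribute], []).append(label)
    remainder = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder

# Toy connection records: the 'flag' attribute separates attack traffic
# perfectly, so it yields the higher gain and would become the root node.
records = [{"proto": "tcp", "flag": "S0"}, {"proto": "tcp", "flag": "SF"},
           {"proto": "udp", "flag": "S0"}, {"proto": "udp", "flag": "SF"}]
labels = ["attack", "normal", "attack", "normal"]
print(information_gain(records, labels, "flag"))   # 1.0
print(information_gain(records, labels, "proto"))  # 0.0
```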
3.1.2. Support vector machines. The SVM, one of the most popular algorithms in cybersecurity, is a supervised learning algorithm that aims to find a separating hyperplane in the feature space among its classes.57 The hyperplane is chosen in such a way that the distance between the hyperplane and its closest data points is maximized. For example, Figure 10 shows a hyperplane defined by a weight vector w and a bias b for N data points (x_1, y_1), (x_2, y_2), ..., (x_n, y_n). Here, x is an element of the real values R, and y ∈ {+1, -1} are the labels. The goal of the SVM is to correctly classify the training data using wx_i + b \geq 1 when y = +1 and wx_i + b \leq -1 when y = -1, so that, for all i, y_i(wx_i + b) \geq 1, with the margin M given by the following:

M = \frac{2}{|w|}    (2)
Figure 11 shows the basic principle of the SVM in the two-dimensional (2D) and three-dimensional (3D) planes, and a simple example of using the SVM for the two-class classification problem is given in Figure 10. The advantages of the SVM are that it is simple to implement and it has also shown higher accuracy than other algorithms when the number of features (m) is very much larger than the number of samples (n), that is, m >> n.24 Besides security applications, the SVM is also used widely in medicine, biology, pattern recognition, etc. Another advantage of using the SVM is that it can create a hyperplane with time complexity O(N^2).24 A minimal usage sketch is given below.
Figure 9. An example of a decision tree for intrusion
detection systems.24 3.1.3. Naive Bayes’ classifier. The Naive Bayes’ classifier is
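For illustration, the following sketch trains a linear SVM on a toy two-class problem in the spirit of Figure 10 (assuming the scikit-learn library is available; the feature values, e.g., connection duration and failed-login count, are hypothetical):

```python
from sklearn.svm import SVC

# Toy two-class problem: labels y in {+1, -1}, as in the text above.
X = [[1.0, 0.0], [1.5, 0.5], [2.0, 0.2],   # normal traffic
     [6.0, 4.0], [6.5, 5.0], [7.0, 4.5]]   # attack traffic
y = [1, 1, 1, -1, -1, -1]

clf = SVC(kernel="linear")  # linear kernel: find w, b maximizing margin 2/|w|
clf.fit(X, y)

print(clf.support_vectors_)                   # points on the margin boundaries
print(clf.predict([[1.2, 0.1], [6.2, 4.2]]))  # -> [ 1 -1]
```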
a probabilistic-based supervised learning algorithm, which
provides the probability of a class given all features as
input. The Naive Bayes’ classifier is based on Bayes’
rule.59 It is also called the generative model. To calculate
the posterior probability of a class p(bja), this classifier
calculates the conditional probability of all features given
a class, that is, p(ajb), with prior probability of all classes
p(b). The term ‘‘naı̈ve’’ is used because in calculating the
posterior probability of a class, all features contribute
individually:

p(a, b) p(ajb)p(b)
p(bja) = = ð3Þ
p(a) p(a)

Figure 10. A simple illustration of the support vector machine where a and b, respectively, are the input vector and the
concept.1 class vector. The main advantage of the Naive Bayes’ clas-
sifier is that it is robust with noisy training data. As this
classifier is based on the probabilistic value of all features,
low training samples with this classifier do not degrade

Figure 11. An illustration of the support vector machine hyperplane in two dimensions and three dimensions.58

performance. On the other hand, the main limitation of this algorithm is that all features are considered to be independent, although in practice this hardly ever happens. A minimal implementation sketch of Equation (3) follows.
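The sketch below implements Equation (3) for categorical features (our illustration; Laplace smoothing is added so unseen feature values do not zero out the product, and the two-value smoothing denominator is an assumption of this toy setup):

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Estimate the priors p(b) and per-feature likelihoods p(a_k | b)
    needed by Equation (3); features contribute independently ("naive")."""
    priors = Counter(labels)
    likelihood = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for k, value in enumerate(row):
            likelihood[(label, k)][value] += 1
    return priors, likelihood

def classify(priors, likelihood, row):
    total = sum(priors.values())
    scores = {}
    for label, count in priors.items():
        p = count / total                        # prior p(b)
        for k, value in enumerate(row):          # product of p(a_k | b)
            c = likelihood[(label, k)]
            p *= (c[value] + 1) / (sum(c.values()) + 2)  # Laplace smoothing
        scores[label] = p                        # proportional to p(b | a)
    return max(scores, key=scores.get)

rows = [("tcp", "S0"), ("tcp", "SF"), ("udp", "S0"), ("udp", "SF")]
labels = ["attack", "normal", "attack", "normal"]
priors, likelihood = train_nb(rows, labels)
print(classify(priors, likelihood, ("tcp", "S0")))  # -> "attack"
```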
Figure 12. An example of three clusters obtained by the k-means clustering algorithm.1

3.1.4. k-means clustering. k-means clustering is one of the unsupervised ML algorithms, which aims at figuring out defined clusters in the dataset, given k as the number of cluster groups. Clusters are formed based on the similarity characteristics among all data points in the dataset. Firstly, k centroids are estimated among the m data points. Next, based on the Euclidean distance measure (Equation (4)), the m data points x_1, x_2, ..., x_m are assigned to their nearest centroids:

distance = \sum_{i=1}^{m} d(x_i, centroid(x_i))    (4)

Here, centroid(x_i) means the centroid to which data point x_i belongs. In the later steps, the centroids are recalculated based on the mean distance from all the data points assigned to those centroids. These steps iterate throughout the algorithm until no data point can modify any cluster centroid. The goal is to minimize the distance from each centroid to its corresponding data points within a cluster. A simulation result of the k-means clustering algorithm is given in Figure 12. Clustering algorithms are largely used to find data patterns or data clusters in a big data framework, where data labeling becomes a difficult task. One of the disadvantages of k-means clustering is defining the k value at the beginning. Using feature similarity calculations, k-means clustering has been used largely in security applications.60 A compact implementation sketch is given below.
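The steps above can be condensed into the following sketch (pure Python; the two-dimensional traffic features, e.g., packets per second and mean packet size, are hypothetical):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means following the steps above: assign each point to its
    nearest centroid (Equation (4)), then recompute centroids as means."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: sum((a - b) ** 2
                    for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        # Keep the old centroid if a cluster happens to be empty.
        centroids = [tuple(sum(xs) / len(xs) for xs in zip(*cl)) if cl
                     else centroids[i] for i, cl in enumerate(clusters)]
    return centroids, clusters

points = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.1),   # one behavior profile
          (8.0, 8.0), (8.5, 7.7), (7.8, 8.2)]   # another profile
centroids, clusters = kmeans(points, k=2)
print(centroids)
```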
3.2. Neural network-based algorithms

Figure 13. The general architecture of an artificial neural network. The input nodes take input in the input layer, which is processed by the hidden layer nodes, and output is given to the output layer.1

3.2.1. Artificial neural networks. Artificial neural networks (ANNs) are composed of nodes (perceptrons), which are inspired by the neurons of the brain. There are three layers in an ANN, namely the input layer, hidden layer, and output layer. There can be more than one hidden layer, based on the algorithmic design. The input layer passes its output to the hidden layer and, similarly, each layer passes its output to the next layer until, finally, the output layer outputs the result. ANNs were very popular until the SVM was invented in the 1990s. With the development of recurrent, feed-forward, and convolutional neural networks, the ANN has again gained popularity in the cybersecurity field.

In the ANN, inputs (x_1, x_2, ..., x_n) are given with an output label y, where the information from the input is weighted by a weight vector (w_1, w_2, ..., w_n) during the learning process. Throughout the learning process, the weights are adjusted in such a way that they minimize the learning error E = \sum_{i=1}^{n} |d_i - y_i|, where the error is the difference between the desired output (d_i) and the actual output (y_i) of the neuron. This adjustment is done by a gradient algorithm named back-propagation, where the learning process iterates back and forth until the model obtains an error less than its threshold value. The weight vector is adjusted according to the following equation:

w_{i,j} \leftarrow w_{i,j} + \Delta w_{i,j}    (5)

where \Delta w_{i,j} = \eta \delta_j x_{i,j}, i is the input node, and j is the hidden node. An example of the ANN architecture is provided in Figure 13, and a single-node sketch of this update rule follows.
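The following single-node sketch applies the update rule of Equation (5) to a toy separable task (our illustration; a full ANN would back-propagate the error terms delta_j through its hidden layers):

```python
def step(x):
    """Simple threshold activation for a single node."""
    return 1.0 if x >= 0 else 0.0

def train(samples, eta=0.1, threshold=0.0, epochs=100):
    """Iterate w_ij <- w_ij + eta * delta_j * x_ij (Equation (5))
    until the error E = sum |d_i - y_i| falls below the threshold."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        error = 0.0
        for x, d in samples:
            y = step(w[0] * x[0] + w[1] * x[1] + b)
            delta = d - y                 # desired minus actual output
            w = [wi + eta * delta * xi for wi, xi in zip(w, x)]
            b += eta * delta
            error += abs(delta)
        if error <= threshold:            # stop once E is small enough
            break
    return w, b

# Toy task (e.g., "alert only when both suspicious indicators are present"):
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train(samples))
```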
3.2.2. Convolutional neural networks. DL, a subclass of ML algorithms, is largely used to handle large training datasets using hierarchical feature abstraction and representation. The performance of traditional ML algorithms degrades when datasets are very large, as well as due to the dimensionality of the data. To handle this problem, DL

Figure 14. An example of a convolutional neural network for image classification. The convolution layer convolutes the input data
with the help of multiple same-size kernels, the pooling layer is used for down-sampling the feature sizes, and the fully connected
layer stores the generated weight value to predict output in the output layer.61,62

is being used with the help of graphics processing units (GPUs) to process big data. Among all DL algorithms, the convolutional neural network (CNN) is being utilized extensively in cybersecurity applications. In the CNN there are two main layers: the convolution layer and the pooling layer. The convolution layer convolutes the input data with the help of multiple same-size kernels. The convolution operation retrieves features from the input data by providing a high value for a given position if the desired feature is present in that location, and vice versa.63 As an example, for image data, the convolution kernel takes the element-wise multiplication of each kernel cell value and the corresponding overlapping image pixel value. The following formula is used to calculate the exact value (where m is the kernel width and height, h is the convolution output, x is the input, and w is the convolution kernel):

h_{i,j} = \sum_{k=1}^{m} \sum_{l=1}^{m} w_{k,l} x_{i+k-1, j+l-1}    (6)

The pooling layer is used for down-sampling the feature sizes through two types of pooling techniques (i.e., max-pooling and average pooling). In particular, max-pooling chooses the maximum value among the features calculated in the previous layer, while average pooling takes the average value. In short, within a kernel, the pooling mechanism at a particular position outputs the maximum value of the given input that falls under the kernel.63 Mathematically:

h_{i,j} = \max\{x_{i+k-1, j+l-1} : 1 \leq k \leq m, 1 \leq l \leq m\}    (7)

Different activation layers are used in the CNN, the most commonly used of which is the rectified linear unit, which includes nodes with the activation function f(x) = max(0, x). Figure 14 shows a pictorial view of the working principle of the CNN algorithm, and a minimal sketch of Equations (6) and (7) is given below. Despite its many advantages, computational cost is one of the disadvantages of the CNN.
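A direct transcription of Equations (6) and (7) is sketched below (assuming NumPy is available; the input and kernel values are toy choices, using 0-based indices):

```python
import numpy as np

def conv2d(x, w):
    """Valid convolution following Equation (6):
    h[i, j] = sum over k, l of w[k, l] * x[i + k, j + l]."""
    m = w.shape[0]
    rows, cols = x.shape[0] - m + 1, x.shape[1] - m + 1
    h = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            h[i, j] = np.sum(w * x[i:i + m, j:j + m])
    return h

def max_pool(x, m=2):
    """Max-pooling per Equation (7): the maximum inside each m-by-m kernel."""
    rows, cols = x.shape[0] // m, x.shape[1] // m
    return x[:rows * m, :cols * m].reshape(rows, m, cols, m).max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)   # toy 4x4 input "image"
w = np.array([[1.0, 0.0], [0.0, -1.0]])        # toy 2x2 kernel
h = np.maximum(conv2d(x, w), 0.0)              # ReLU activation f(x) = max(0, x)
print(max_pool(h))
```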
Figure 15. An illustration of recurrent neural networks (RNNs).62 Similar to the feed-forward neural network, the RNN consists of three units: input units, hidden units, and output units.

3.2.3. Recurrent neural networks. In applications where the output of the present state depends on the states of several previous states (i.e., sequential data), traditional ML algorithms fail to provide good performance, as in traditional ML algorithms there is no inter-dependency between the inputs and the outputs. The recurrent neural network (RNN), another DL algorithm, particularly handles such sequential data (e.g., speech, text, sensory data) and shows better performance than any other algorithm.64

Similar to the feed-forward neural network, the RNN consists of three units: input units, hidden units, and output units. Information flow in the RNN happens only one way:

from input units to hidden units. This one-way flow passes information from the hidden units of the previous time steps to the hidden units of the current time step. All of the information throughout the RNN is saved in the hidden units. The working principle of the RNN is shown in Figure 15. The RNN computes the hidden unit vector sequence h = (h_1, h_2, ..., h_T) to calculate the output unit vector y = (y_1, y_2, ..., y_T) through t = 1 to T iterations of the following equations:

h_t = H(W_{xh} x_t + W_{hh} h_{t-1} + b_h)    (8)

y_t = W_{hy} h_t + b_y    (9)

where W represents a weight matrix, b is a bias vector, and H represents the recurrent hidden layer function.65

3.2.4. Restricted Boltzmann machines. The restricted Boltzmann machine (RBM) is an updated version of the Boltzmann machine (BM), which reduces the complexity of the BM. In other words, the RBM increases the learning speed of the algorithm by restricting the connections between all units in the same layer (i.e., the visible layer and the hidden layer).66 The differences between the BM and RBM architectures are shown in Figure 16. Training and reconstruction are the two main objectives of the RBM.67 An RBM network with visible layer units v_i, hidden layer units h_j, weights w_{ij} between the ith visible unit and the jth hidden unit, and biases (a, b) has the following energy function over the visible and hidden units68:

E(v, h) = -\sum_{i=1}^{n} a_i v_i - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} v_i h_j w_{ij}    (10)

Figure 16. An example of the Boltzmann machine (BM) and restricted Boltzmann machine (RBM). The RBM restricts the connections between all units in the same layer (i.e., visible layer and hidden layer) compared to the BM.3

Here, v_i and h_j are the binary states of the ith visible unit of n visible units and the jth hidden unit of m hidden units. The joint probability distribution p(v, h) over all visible and hidden layers is defined as follows67:

p(v, h) = \frac{1}{Z} e^{-E(v, h)}    (11)

Here, Z is the normalization factor,67 which is computed by summing over all possible pairs of visible and hidden vectors. Besides Equations (10) and (11), the conditional probabilities can also be computed as follows:

p(h_j = 1 | v) = \sigma\left(\sum_{i=1}^{n} w_{ij} v_i + b_j\right)    (12)

p(v_i = 1 | h) = \sigma\left(\sum_{j=1}^{m} w_{ij} h_j + a_i\right)    (13)

Here, \sigma is the sigmoid function. Another important issue, adjusting the weights, is addressed in studies (e.g., Alom et al.67) with a simple formula given below:

\Delta w_{ij} = \varepsilon(\langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{model})    (14)

Here, \varepsilon is the learning rate. Obtaining an unbiased sample of \langle v_i h_j \rangle_{data} and \langle v_i h_j \rangle_{model} and updating all visible and hidden layers is computationally inefficient. To overcome this, a faster learning algorithm, contrastive divergence (CD) learning, was proposed in a prior research study:69

\Delta w_{ij} = \varepsilon(\langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{reconstruction})    (15)

3.2.5. Deep belief networks. Another popular DL algorithm in cybersecurity is the deep belief network (DBN), which is also a generative model, consisting of multiple layers of hidden variables. The DBN uses the RBM in its architecture: it comprises stacked RBMs, which perform a layer-by-layer greedy learning activity in an unsupervised learning environment during training. In the DBN, each RBM is trained on top of the previously trained layer; that is, the DBN executes its training layer by layer. This concept is shown in Figure 17 for clarification. For example, if a DBN comprises visible units v_i and hidden units h_j, the Gaussian–Bernoulli RBM for continuous-valued data can be defined as follows70:

E(v, h) = -\sum_{i=1}^{n} \sum_{j=1}^{m} \frac{v_i}{\sigma_i} w_{ij} h_j + \sum_{i=1}^{n} \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_{j=1}^{m} b_j h_j    (16)

Here, \sigma_i is associated with the standard deviation of v_i. The mathematical symbols shown here are similar to those of

the Bernoulli–Bernoulli RBM, so the conditional probabilities for the visible and hidden units can be defined as follows:

p(v_i = v | h) = N\left(v \mid a_i + \sum_{j} h_j w_{ij}, \sigma_i^2\right)    (17)

p(h_j = 1 | v) = f\left(b_j + \sum_{i=1}^{n} w_{ij} \frac{v_i}{\sigma_i^2}\right)    (18)

A compact sketch of the RBM sampling and CD updates is given below.

Figure 17. An overview of the deep belief network (DBN) concept. The DBN comprises stacked restricted Boltzmann machines (RBMs), which perform layer-by-layer greedy learning activity.3
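Closing this section, the following sketch performs one contrastive divergence (CD-1) weight update for a binary (Bernoulli–Bernoulli) RBM per Equations (12)-(15) (our illustration, assuming NumPy; the layer sizes and training vector are hypothetical). A DBN would stack such RBMs and run this greedy training layer by layer:

```python
import numpy as np

rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
a = np.zeros(n_visible)   # visible biases
b = np.zeros(n_hidden)    # hidden biases

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, eps=0.1):
    """One CD-1 step: sample h from p(h|v) (Equation (12)), reconstruct
    v from p(v|h) (Equation (13)), and move the weights toward
    <v h>_data - <v h>_reconstruction (Equation (15))."""
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(n_hidden) < ph0).astype(float)
    pv1 = sigmoid(h0 @ W.T + a)
    v1 = (rng.random(n_visible) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    return eps * (np.outer(v0, ph0) - np.outer(v1, ph1))

v = rng.integers(0, 2, n_visible).astype(float)  # a binary training vector
W += cd1_update(v)
print(W.shape)  # (6, 3): one small CD-1 weight update applied
```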
4. An overview of cybersecurity methods based on machine learning and data mining: 2013–2018

In this section, a comprehensive overview of recent works on ML in cybersecurity is presented. We also focused our search on the algorithm details and efficiency, the feature extraction and selection methods (if any), and the relevant dataset employed to solve a particular problem(s). A brief overview of all the techniques is also presented in Tables 2–4.

4.1. Decision tree

From the cybersecurity perspective, tracks of legal or illegal activity can be found when network logs are inspected. To differentiate illegal activities from legal ones, a classification strategy needs to be installed in the network. Among the different classification techniques in ML, the DT is the simplest one. The main principle of the DT is to split the dataset according to the information gain of the features.

Prajapati et al.71 showed that, using the KDD Cup 1999 dataset, the DT algorithm performed better compared to the ANN, SVM, and biological neural network in detecting DoS, probing, User-to-Root (U2R), and Remote-to-Local (R2L) attacks. There are several DT algorithms (e.g., ID3, J48, C4.5, etc.) used by researchers to ensure security. The detection rate, as well as the execution speed, is one of the main concerns for the DT algorithm, as it saves all input information into its memory. To overcome this problem for the C4.5 algorithm, Relan and Patil72 demonstrated a method using the KDDcup99 and NSL-KDD datasets. The method solved the problem of the C4.5 algorithm being biased toward multiple-valued features, as well as unequal fragmentation of the dataset. The authors applied pruning techniques to the nodes of a DT to avoid over-fitting and to get better accuracy (98.45%). However, pruning techniques have two types of limitations: pre-pruning forces the algorithm to stop before the due time, while with post-pruning the DT at first grows completely and then undergoes pruning to remove the branches. To overcome this problem, Wang and Chen73 demonstrated a multi-strategy pruning algorithm to obtain the optimal results with the optimal tree size. This method reduced error rates more than traditional pruning algorithms. In turn, combining two or more algorithms with the DT to overcome the detection rate problem was proposed by Elekar,74 where different categories of attack detection are performed using different combinations of algorithms, such as the J48 DT in combination with the Random Forest, J48 with the Random Tree, and the Random Tree in collaboration with the Random Forest. The results showed that J48 combined with the Random Forest improved the detection rate (92.62%) for DoS, U2R, and R2L attacks, with a low false positive rate for probe attacks.

There are other security applications where the DT has been used successfully as a collaborative classifier for intrusion detection.55,56 Goeschel55 proposed a model that combines the J48 DT, LibSVM, and the Naive Bayes' classifier to decrease the false positive rate significantly and also improve the computational efficiency. This model was built with WEKA's (an open-source software for ML) knowledge flow Graphical User Interface (GUI). The author's model has three cascading steps: firstly, a SVM classifier is trained to identify whether an instance is normal or an attack type; secondly, the instance is directed through a DT algorithm to predict the output; and thirdly, the Naive Bayes' classifier is used with a DT algorithm to make a decision for any unclassified attack traffic. The
Table 2. Summary of popular machine learning and deep learning techniques used in cybersecurity.

| Study | Problem domain | Feature representation | Classification/detection ML technique | Dataset | Figure of merit | Year |
| --- | --- | --- | --- | --- | --- | --- |
| Wang et al.14 | Detect DoS, probe, U2R, and R2L attacks | From 8729 data records of 11 attack types and normal, frequency of illegal host requests | Decision tree (used multi-strategy pruning techniques) | KDDcup99 and NSL-KDD datasets | Improvement of error rate = 0.4% | 2013 |
| Kim et al.56 | Detect DoS, probe, U2R, and R2L attacks | Frequency of attack host type attributes in dataset | Combination of decision tree and SVM | NSL-KDD dataset | Training time = 50% and testing time = 60% compared to conventional method | 2014 |
| Relan and Patil72 | Detect DoS, probe, U2R, and R2L attacks | Frequency of discrete-valued attributes (e.g., is_host_login) of KDD Cup 99 dataset | Decision tree (used with pruning techniques) | KDDcup99, NSL-KDD datasets | Accuracy = 98.45% | 2015 |
| Elekar74 | Detect DoS, probe, U2R, and R2L attacks | Illegal host requests of 41 network features particular to 4 attack types | Decision tree (J48 with Random Forest, J48 with Random Tree, and Random Tree with Random Forest) | KDDcup99 and NSL-KDD datasets | Accuracy = 92.62% | 2015 |
| Sahu and Mehtre76 | Detect intrusion | Combination of illegal network calls with 24 features | J48 decision tree algorithm | Kyoto 2006+ dataset | True positive rate (TPR) = 97.2%, false positive rate (FPR) = 4.7% | 2015 |
| Ganeshkumar and Pandesswari80 | Detect DoS, probe, U2R, and R2L attacks | Frequency of attack host type attributes in dataset | Integrating fuzzy system with neural network | KDD Cup 1999 database | Accuracy = 99.87% (DoS), 78.61% (probe), 95.52% (R2L), 85.30% (U2R) | 2016 |
| Guha et al.82 | Detect cyber-attacks in cloud infrastructures | Frequency of attack signatures (e.g., invalid host requests) | Artificial neural network (ANN) combined with genetic algorithm (GA) as feature selection method | NSL-KDD, UNSW-NB15 datasets | Accuracy = 91.98% with NSL-KDD and 95.46% with UNSW-NB15 | 2016 |
| Saied et al.83 | Fight against zero-day DDoS attacks | Signature of invalid host requests | ANN with "detection, defense and cooperative mechanism" | Snort data | Accuracy = 100% for known DDoS attacks and 95% for unknown DDoS attacks | 2016 |
| Kosek85 | Discover malicious voltage control actions in the low voltage grid | Frequency of malicious voltage control actions | ANN with contextual anomaly detection technique | Private distributed energy resources data | Improvement = 55% for control detection, 56% for malicious control detection | 2016 |
| Alharbi et al.75 | Detect DoS attack in IoT environment | Flow of packet count, arrival rate, and bursty behavior of network | Decision tree taking advantage of VPN | Private dataset | Improvement of performance = 41% over cloud | 2017 |
| Dash81 | Detect intrusion | Frequency of attack signatures | ANN combined with gravitational search (GS) and particle swarm optimization (GSPSO) techniques as feature selection methods | NSL-KDD dataset | Accuracy = 98.13%, false positive rate less than 2% | 2017 |
| Villaluna and Cruz84 | Detect DoS, probe, U2R, and R2L attacks | Connection protocol, login attempts, service ports, network services, etc. | ANN with soft computing | NSL-KDD and KDD99 datasets | Accuracy = 89.74% with fuzzy logic, 96.09% with ANN, and 96.19% with fuzzy neural network | 2017 |
| Teoh et al.86 | Detect malware attack | The characteristics of attackers' IP addresses extracted from datasets | ANN with fuzzy k-means (FKM) as feature selection method | Private dataset | - | 2017 |
| Shenfield et al.79 | Identify shellcode patterns in network data | Frequency of low-level shellcode with other conventional network traffic data | ANN classifier | Random files from local network | Average accuracy = 98%, ROC = 0.98, FPR < 2% | 2018 |

DoS: Denial of Service; U2R: User-to-Root; R2L: Remote-to-Local; SVM: support vector machine; DDoS: Distributed Denial of Service; ROC: Receiver Operating Characteristic; VPN: virtual private network; IoT: Internet of Things; IP: Internet Protocol.
17
Table 3. Summary of popular machine learning and deep learning techniques used in cybersecurity.

| Study | Problem domain | Feature representation | Classification/detection ML technique | Dataset | Figure of merit | Year |
|---|---|---|---|---|---|---|
| Al-Jarrah and Arafat118 | Network attacks | Frequency of attack signatures in the dataset | Recurrent neural network (RNN) with embedded temporal behavior of the network attacks to maximize the detection rate | DARPA 1998 dataset | Detection rate = 100% | 2014 |
| Liu and Pi91 | DoS and probe attacks | - | SVM algorithm with game theory | DARPA dataset | Accuracy = 99.8857% | 2015 |
| Senthilnayaki et al.92 | DoS and probe attacks | 10 features extracted by GA from 41 network data features | SVM with GA as feature selection method | KDD Cup dataset | Accuracy = 99.15% for DoS attacks, 99.08% for probe attacks | 2015 |
| Shailendra Singh97 | DDoS attack detection | - | Improved support vector machine (iSVM) that modifies the Gaussian kernel to enlarge the spatial resolution around the support margin, with generalized discriminant analysis (GDA) as feature selection technique | KDDCup2009 dataset | Accuracy = 100% | 2015 |
| Malhotra et al.119 | Anomaly detection in the network | - | Long short-term memory (LSTM) RNN to detect anomalies using time series | Private datasets | Accuracy = 93% | 2015 |
| Shin et al.100 | Cybersecurity risk detection | - | Bayesian network | - | - | 2015 |
| Dalmazo et al.93 | Detect intrusion in cloud computing environment | Poisson moving average predictor | SVM with Poisson moving average predictor as feature selection method | DARPA and KDDcup99 datasets, CAIDA DDoS Attack 2007 dataset | Accuracy = 98.56%, false negative rate (FNR) = 8% | 2016 |
| Bali and Kumar104 | Data integrity of vehicular cyber–physical systems (VCPS) | Trust metrics of each vehicle | Efficient data dissemination between different devices in VCPS environment | Private dataset | - | 2016 |
| Bezemskij et al.101 | Attacks on autonomous vehicle systems | Processing and communication module feature frequency | - | - | Accuracy = 99.5% | 2017 |
| Kolini and Janczewski103 | Compare national cybersecurity strategies (NCSs) | Similarity metrics between datapoints | Clustering techniques with topic modeling for feature selection | - | - | 2017 |
| Omrani et al.94 | Detect attacks in TCP network connections | Frequency of three selected features (flag, protocol, and service) of dataset | Hybrid approach of ANN and SVM algorithms | NSL-KDD dataset | True positive rate (TPR) = 79.71%, false positive rate (FPR) = 0.92% | 2017 |
| Ghanem et al.95 | Intrusion detection | Frequency of attack signatures in the dataset | SVM algorithm | Private data at Loughborough University | Accuracy = 100% | 2017 |
| Terai et al.96 | Intrusion detection in industrial control systems | Distinctive communication characteristics in the data | SVM algorithm | Private dataset | Error rate less than 5% | 2017 |
| Goh et al.65 | Detect cyber-attacks in cyber–physical systems (CPS) | - | Unsupervised learning method using RNN | SWaT dataset | - | 2017 |
| Wu et al.98 | Cyber identity detection | Features of keystrokes for individual typing patterns | SVM algorithm-based software: a two-factor, pressure-enhanced keystroke dynamics-based security system to authenticate and identify users | - | - | 2018 |
| Li et al.124 | Detect software vulnerability | - | RNN | MNIST and CIFAR (binary) datasets | Detection rate = 100% | 2018 |

DoS: Denial of Service; GA: genetic algorithm; SVM: support vector machine; DDoS: Distributed Denial of Service; TCP: Transmission Control Protocol.
Table 4. Summary of popular machine learning and deep learning techniques used in cybersecurity.

| Study | Problem domain | Feature representation | Classification/detection ML technique | Dataset | Figure of merit | Year |
|---|---|---|---|---|---|---|
| Dahl et al.111 | Malware detection | - | Convolutional neural network (CNN) on random projections | Private dataset | Error rate = 0.49% for single neural network and 0.42% for ensemble of neural networks | 2013 |
| Gao et al.121 | Intrusion detection | Frequency of attack signatures in the dataset | Deep belief network (DBN) model | NSL-KDD dataset | Accuracy = 95.25% | 2014 |
| Kosek and Gehrke125 | Identify unauthorized control actions in cyber–physical systems | Frequency of attack signatures in cyber–physical system | Ensemble machine learning methods | Private dataset | Improvement in precision = 75.7% and accuracy = 9.2%, respectively, over a classic model-based anomaly detection | 2016 |
| He et al.122 | Detect false data injection attacks in real time for cyber–physical systems | High-dimensional temporal behavior features | Conditional deep belief network (CDBN) using conditional Gaussian–Bernoulli restricted Boltzmann machine (RBM) to extract temporal features | KDD Cup '99 dataset | Accuracy = 93.73% for compromised attack, 98.51% for normal attack | 2017 |
| Kumar et al.126 | Defend against tailored malware attacks | Features related to connections, packets, and flags | Ensemble machine learning methods | Private dataset | Detection accuracy = 98.2% | 2017 |
| Musman and Turner127 | Minimize cybersecurity risk | - | Game-theoretical analysis | - | - | 2017 |
| Yazdankhah and Honarvar128 | Detect and prevent denial-of-service attacks in Internet of Things (IoT) networks | - | Game-theoretical analysis | - | - | 2017 |
| Miller and Busby-Earle129 | Identify unauthorized control actions in cyber–physical systems | Frequency of abnormal traffic data | Multi-perspective machine learning (MPML) approach | NSL-KDD dataset | Improvement = 4% over other ensemble methods | 2017 |
| Su et al.105 | DDoS attack detection in IoT | - | CNN | IoTPOT dataset | Accuracy = 94.0% | 2018 |
| Cakir and Dogdu114 | Malware attack detection | Opcode representation of malware | Shallow deep learning method with word2vec as feature selection technique | BIG 2015 dataset | Accuracy = 96% | 2018 |
| Neupane et al.130 | Defend against flooding attacks | Frequency of attack signatures in cloud environment | Two-stage ensemble learning algorithm | Private dataset using GENI cloud testbed | Accuracy = 100% | 2018 |
| Feng et al.131 | Android malware detection | Extracted features containing redundant or irrelevant information | Ensemble machine learning methods with chi-square method as feature selection technique | Private dataset | Accuracy = 96.56%, false positive rate (FPR) = 1.85% | 2018 |
| Zainal and Jali132 | Risk classification of spam messages in short message service (SMS) format | - | Artificial immune systems (AISs) | UCI Machine Learning repository | True positive rate (TPR) = 80% | 2018 |
| Mehare and Thakur133 | Detect anomalies in the network | - | Negative selection algorithm (NSA) and improved positive selection algorithm (PSA) | Private dataset | - | 2018 |
| Wu et al.134 | Provide optimal security detection strategy | - | Game-theoretical analysis with Bayesian network | - | - | 2018 |
| Kim135 | Analyze passive eavesdropping attacks in wireless personal area networks | - | Game-theoretical analysis | - | - | 2018 |

DDoS: Distributed Denial of Service.
Another collaborative learning algorithm was proposed by Kim et al.,56 which is also based on the combination of two algorithms (i.e., a DT and an SVM). In particular, the authors proposed a method that combines misuse-based detection and anomaly-based detection hierarchically in a decomposed manner. In the first stage, the C4.5 DT algorithm is used to build a misuse-based detection method, in which the normal training data are decomposed into smaller subsets. In the next stage, several one-class SVM models are generated for the decomposed subsets from the previous stage, which not only use known attack information directly but also build a meticulous profile of normal behavior. The study was conducted on the NSL-KDD dataset, a modified version of the KDDcup99 dataset, to evaluate the proposed model. Their experiments showed that a hybrid IDS can improve the detection rate for unknown attacks as well as significantly improve the detection speed: the model required only 50% of the training time and 60% of the testing time of conventional detection models. A similar research study was reported by Alharbi et al.,75 where the authors took advantage of a VPN to secure network communication for IoT devices while simultaneously using a DT to analyze network traffic and detect malicious attack sources as well as DDoS attacks. Similar work on a network intrusion detection system (NIDS) was proposed by Sahu and Mehtre,76 where the authors used the Kyoto 2006+ dataset, an updated version of the KDDcup99 dataset in which all samples are labeled as normal, known attack, or unknown attack. They applied the J48 DT algorithm over this dataset to generate classification rules using the information gain theorem with 134,665 given samples. The rules generated during the training process identified network connections correctly (attack, no attack, or unknown attack) with 97.2% accuracy.

4.2. Artificial neural networks

Due to the advent of hardware technologies and public datasets, ANNs are again being used in diverse cybersecurity applications. For instance, the ANN has been applied in several software domains, such as finding software design flaws.77 Although ANN applications for detecting multiple attack types have also been shown to be useful,78 applications related to shellcode detection are a newer line of research. Shenfield et al.79 proposed an IDS that utilizes an ANN classifier to identify shellcode patterns in network data among other conventional network traffic data, such as images, logs, and dynamic link library (DLL) files, which significantly improves the performance of the signature-based intrusion detection method. To evaluate the performance of this proposed method, the authors took 400,000 random files as data, consisting of a mixture of log files, text files, executables, music, etc., in a similar format. To feed the ANN model, the authors converted all the byte-level data into integer values. The experimental results using 10-fold cross-validation demonstrated 98.00% average accuracy with a less than 2.00% false positive rate.

ANNs have also shown promising performance in the cloud computing environment, both when used individually and when combined with other algorithms. Detection of unauthorized access is one of the challenging issues for cloud service providers and cloud users. Related to the cloud environment, an anomaly detection system at the hypervisor layer, named the hypervisor detector, was developed by Ganeshkumar and Pandesswari80 to detect malicious activities. Since a fuzzy-based IDS shows poor performance if it is designed for target-based models, they developed an adaptive method named the Adaptive Neural Fuzzy Inference System (ANFIS), integrating the fuzzy system with neural networks. To evaluate their method, they used the KDD database. Their method showed 99.87% accuracy for DoS attacks and 78.61%, 95.52%, and 85.30% accuracy for probe, R2L, and U2R attacks, respectively. Dash81 also proposed a similar hybrid IDS, in which the gravitational search (GS) and a collaboration of the GS and particle swarm optimization (GSPSO) techniques were used to train a neural network, and the resulting GS-ANN and GSPSO-ANN models were then used for the intrusion detection process. To evaluate the performance of the method, the author compared these approaches with other optimization algorithms, such as the genetic algorithm (GA), PSO, and an ANN based on gradient descent (GD-ANN). The author claimed that this approach is more suitable for unbalanced datasets. Using the NSL-KDD dataset, the presented method achieved 94.90% and 98.13% accuracy with the GS and GSPSO, respectively.

To detect cyber-attacks in a cloud infrastructure, Guha et al.82 used an ANN trained with network traffic data to model the connecting links of the cloud infrastructure, with the GA utilized to select features. The proposed collaborative approach was evaluated using the NSL-KDD and UNSW-NB15 datasets. According to the reported results, the method attained 91.98% accuracy on the NSL-KDD dataset and 95.46% accuracy on the UNSW-NB15 dataset. A sketch of such GA-driven feature selection is given below.
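The following is a minimal sketch of GA-based feature selection wrapped around a classifier, in the spirit of the approaches above. The population size, mutation rate, and the small MLP used as a stand-in for the ANN are illustrative assumptions, not the settings of Guha et al.82

```python
# Hedged sketch: a genetic algorithm evolving feature bitmasks, scored by
# cross-validated accuracy of a small neural network on the selected features.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    # Empty masks are worthless; otherwise score the classifier on the subset.
    if not mask.any():
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300)
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

def ga_select(X, y, pop_size=20, generations=15, mutation_rate=0.05):
    n = X.shape[1]
    pop = rng.random((pop_size, n)) < 0.5             # random feature bitmasks
    for _ in range(generations):
        scores = np.array([fitness(ind, X, y) for ind in pop])
        parents = pop[np.argsort(scores)[-pop_size // 2:]]  # keep fittest half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)                  # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child = child ^ (rng.random(n) < mutation_rate)  # bit-flip mutation
            children.append(child)
        pop = np.vstack([parents, children])
    best = pop[np.argmax([fitness(ind, X, y) for ind in pop])]
    return np.flatnonzero(best)                       # indices of chosen features
```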
A "zero-day attack" is one of the main reasons for degraded IDS performance. As ML algorithms cannot identify instances that are not close to the training data, the zero-day attack is now a burning issue in cybersecurity. Saied et al.83 proposed the interesting concept of a "detection, defense and cooperative mechanism" to fight against zero-day DDoS attacks. The authors presented their study in two ways: first, they evaluated their method with old datasets, in which new attack types were not updated; later, their method was evaluated on a new dataset. The authors created their own dataset by launching several DDoS attacks, and a Java Neural Network Simulator (JNNS) was used to process and prepare the dataset to train the algorithm. The authors evaluated and compared their solutions with Snort-AI and other related research, finding 92% detection accuracy without knowledge sharing about zero-day attacks and 98% detection accuracy when using up-to-date datasets. From their study, they concluded that the more the intrusion detection database is updated, the greater the accuracy for unknown attacks will be. Villaluna and Cruz84 proposed an information security system that shows slightly better performance than that of Saied et al.83 for detecting unknown attacks. To identify zero-day attacks, the authors used soft computing so that new attack features are not misclassified and can be detected based on some common features. Their system can detect DoS, probe, U2R, and R2L attacks. The authors compared performance on network data analysis among fuzzy logic, the ANN, and fuzzy neural networks; they found that the fuzzy neural network takes less detection time and shows better performance (96.19% accuracy and 98.60% detection rate) than the other two.

Significant research studies have also been conducted on the use of the ANN to detect cyber-attacks in the cyber–physical domain. For instance, Kosek85 proposed an anomaly detection technique for the low voltage grid to detect malicious voltage control actions. The method used the ANN to model the behavior of control actions and to find any anomalies in the distributed energy resources (DERs). The presented approach was evaluated in a co-simulated testbed set-up. The experimental results provided 56.00% better accuracy with respect to anomalous control detection.

For malware detection, Teoh et al.86 very recently published a study in which the authors used a semi-supervised learning algorithm to detect malware. First, the authors extracted features from network data and assigned a weight value to each feature, finally annotating the log history and building a scaling system. The log history in the network data is classified into known, unknown, and attack classes using the fuzzy k-means (FKM) clustering algorithm. The authors evaluated their algorithm's performance on their private dataset and claimed that their approach achieves a lower false positive rate than usual anomaly detection technology. Moreover, Saroare et al.87 proposed a modified fuzzy clustering algorithm that, if used with the approach of Teoh et al.,86 can improve the accuracy of classifying the log history.

4.3. Support vector machine

Network intrusion detection (NID) is viewed as a problem of pattern or signature recognition/classification. Although there are many existing pattern recognition methods,88,89 their performance largely depends on the size of the training samples. However, modern cyber-attacks in practice are complex and diverse and are constantly updated,90 which tends to make training samples smaller and unrepresentative. Although the SVM performs better than other algorithms with a smaller data size, selecting, constructing, and improving its kernel functions is the key factor for improving its performance. Liu and Pi91 proposed a novel method for NID in which a novel kernel SVM algorithm is combined with game theory (named GTNID-SVM) to integrate the advantages of the radial basis function (RBF) kernel and the polynomial kernel function. To evaluate the effectiveness of the method, the authors performed a series of experiments comparing it with conventional methods on the DARPA dataset. Their experiments found a total detection rate of 98.75% and 89.99% for DoS and probe attacks, respectively, compared to 96.9% and 87.88% for the RBF kernel SVM. A similar hybrid intrusion detection approach to improve the false alarm rate of IDSs was proposed by Goeschel,55 where the SVM, DT, and Naive Bayes' classifiers were collectively used. Another hybrid method on the KDDcup99 dataset was proposed by Senthilnayaki et al.,92 where the authors employed the GA to extract 10 important features from the 41 total features of the dataset and then applied the SVM for classification. The experiments were carried out on 100,000 samples, with 90% of them used for training. The SVM classifier with the GA attained 95.26% accuracy, compared to 55.75% for Naive Bayes, 67.87% for a multi-layer perceptron (MLP), and 76.75% for a linear algorithm; Liu and Pi91 and Senthilnayaki et al.92 separately showed 98.75% and 95.26% accuracy on the DARPA and KDDcup99 datasets, respectively. Dalmazo et al.93 showed 98.56% detection accuracy for both datasets with their proposed method to detect intrusion in the cloud computing environment. Instead of using principal component analysis (PCA) or the GA as a feature extraction algorithm, the authors used the Poisson moving average predictor in their method; the SVM was then used for anomaly detection. A recent research study by Omrani et al.94 combined ANN and SVM algorithm decisions in a data fusion approach to detect attacks in Transmission Control Protocol (TCP) network connections. The authors took the percentage of correct classes from the decisions of both algorithms and combined them one by one with a Naive Bayes' probabilistic model to evaluate the effectiveness of the approach. They found a true positive rate of 79.58% on the NSL-KDD dataset. The kernel-combination idea behind GTNID-SVM is sketched below.
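The kernel-combination idea can be illustrated in a few lines: scikit-learn's SVC accepts a callable kernel, and a convex combination of RBF and polynomial kernels is itself a valid kernel. The mixing weight alpha and the hyper-parameters below are illustrative assumptions, not the GTNID-SVM formulation of Liu and Pi.91

```python
# Hedged sketch: an SVM whose kernel blends a local (RBF) kernel with a
# global (polynomial) kernel, echoing the idea of combining their advantages.
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel, polynomial_kernel

def mixed_kernel(X, Y, alpha=0.6, gamma=0.1, degree=3):
    # Convex combination of two positive semi-definite kernels is PSD.
    return (alpha * rbf_kernel(X, Y, gamma=gamma)
            + (1.0 - alpha) * polynomial_kernel(X, Y, degree=degree))

clf = SVC(kernel=mixed_kernel)  # SVC accepts a callable returning a Gram matrix
# Usage: clf.fit(X_train, y_train); y_pred = clf.predict(X_test)
```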
Besides the above hybrid approaches, the SVM has also been used as a second line of detection to complement the performance of an IDS. For example, Ghanem et al.95 utilized the SVM as an alternative method in order to improve the performance of their IDS. To evaluate the performance, they compared their IDS in linear and non-linear forms with both the one-class SVM method and the two-class SVM method. Their experimental results found that the two-class SVM performed better than the one-class SVM and showed 98.78% accuracy in detecting an anomaly.

SVMs have also been used in security applications of cyber–physical systems. Terai et al.96 developed a discriminant model for anomaly detection in industrial control systems using the SVM algorithm. The authors used the ICS communication profile (packet interval and length) as the basis for detection. Further, they proposed to integrate their discriminant model with the existing IDS to complement it. They evaluated their method on a cybersecurity testbed through penetration testing. Their experimental results demonstrated that the existing IDS with the integration of the SVM algorithm provides a 5% lower false alarm rate compared to the IDS alone. To reduce the false alarm rate of previous works, Shailendra Singh97 designed an improvement of the SVM algorithm by modifying the Gaussian kernel to enlarge the spatial resolution around the support margin and thereby increase the separation between classes. Shailendra Singh's97 approach is divided into two steps: in the first phase, generalized discriminant analysis (GDA) is used to reduce the feature dimension, and in the next phase the improved SVM is utilized to detect anomalies. The evaluation results provided 100% detection accuracy for the normal traffic and DDoS classes.

Recently, a few research studies have been performed on cyber person identity detection via ML. For example, Wu et al.98 demonstrated an interesting and novel approach to identify authentic users by developing a two-factor authentication mechanism using pressure-enhanced keystroke dynamics to signify and validate users based on their keystroke patterns. The authors designed a triboelectric keystroke device to convert the typing motions and patterns of a user into electrical signals. Furthermore, they installed the SVM algorithm in their software to classify and authenticate users. This novel and promising application of ML to cyber identity detection can improve the performance of multi-factor authentication technology to prevent users' access from being compromised.

4.4. Bayesian network

Bayesian networks have shown significantly good performance for malware and spam detection. Recently, a spam detection study was performed by Rathore and Yadav,99 where the authors proposed a hybrid Bayesian approach with the Bayes' rule that scans contents for spam emails. To improve the system's performance, the authors integrated swarm intelligence for recognizing and blocking spam emails. Instead of an enterprise system, Shin et al.100 proposed a cybersecurity risk model to protect a nuclear facility. The authors proposed a Bayesian network model to evaluate and ensure the cybersecurity of a nuclear facility. Based on an action performed in a nuclear facility, the presented model is supposed to provide a security risk value, the corresponding vulnerability, and the necessary mitigation techniques to ensure the security of that facility. To evaluate their model's effectiveness, the authors applied the same model to gauge the cyber risk factor of the reactor protection system of a research reactor.

A different line of research with the Bayesian network was performed by Bezemskij et al.101 to detect cyber-attacks in autonomous vehicle systems. Their method is effective in identifying not only the status of a cyber-attack but also the attacker's exact location. To evaluate the feasibility of their proposed method, the authors generated simulated attacks on an autonomous vehicle system, such as magnetic interference attacks, false command injection attacks, etc.

In fact, there are many security areas where Bayesian networks are being used to ensure security. To determine which algorithm should be used for a particular domain area, Chockalingam et al.102 examined 17 standard Bayesian network models in cybersecurity. They examined these models on eight different criteria and finally recognized decisive patterns about using these models. The outcomes of their study provide great insight into how the Bayesian network is being used to solve core critical problems of cybersecurity and thus illuminated the key research areas and corresponding gaps.

4.5. k-means clustering

Several research studies can be found in the literature related to cybersecurity where k-means clustering has demonstrated reliable performance. For instance, to cluster and compare national cybersecurity strategies (NCSs), Kolini and Janczewski103 employed k-means clustering and topic modeling to figure out the similarities as well as the differences between all NCSs. For their study, the authors collected 60 NCSs, which were developed between 2003 and 2016. The study concluded that membership of international institutions acts as a determining factor in the assimilation between NCSs. The authors also used a hierarchical clustering algorithm and found that NCSs established by members of the European Union (EU) or the North Atlantic Treaty Organization (NATO) have very similar characteristics. Besides those outcomes, the authors also suggested using topic modeling to develop national policies and strategies. A minimal sketch of such document clustering is given below.
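The sketch below clusters strategy documents with TF-IDF features and k-means and lists the top terms per cluster. The placeholder documents, cluster count, and preprocessing are hypothetical illustrations, not the pipeline of Kolini and Janczewski.103

```python
# Illustrative sketch: cluster policy documents by their term statistics,
# then inspect the dominant terms that drive each cluster's "topic".
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = ["...text of NCS 1...", "...text of NCS 2...", "...text of NCS 3..."]

tfidf = TfidfVectorizer(stop_words="english", max_features=5000)
X = tfidf.fit_transform(documents)           # sparse document-term matrix

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                            # cluster assignment per strategy

terms = tfidf.get_feature_names_out()
for c, center in enumerate(km.cluster_centers_):
    top = center.argsort()[-5:][::-1]        # five highest-weight terms
    print(c, [terms[i] for i in top])
```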
Similar research was also performed by Bali and Kumar,104 where the authors proposed a novel clustering mechanism to disseminate data efficiently between different devices in a vehicular cyber–physical system (VCPS) environment. To secure the established cluster in their proposed model, the authors also suggested an algorithm for secure clustering and establishing trust; their trust metrics are utilized to establish the security level of vehicles. To evaluate the feasibility of their model, the authors performed experiments in different network scenarios.

4.6. Convolutional neural network

In the most recent years, the CNN has been widely used for pattern matching and other tasks. The IoT, owing to its use of the internet, has created big data on the user side as well as opened many vulnerabilities for attackers to launch attacks. The lack of security solutions and detection methods for emerging IoT environments has recently enabled many DDoS attacks, for example, the Mirai and Brickerbot botnets. To detect such DDoS malware attacks over large user datasets, CNN-based algorithms are being widely employed. For example, in 2018, Su et al.105 proposed a novel light-weight malware detection approach, where first the malware binary is converted into a one-channel gray-scale image for generative feature representation and then a light-weight CNN is used to identify malware and its families. Their experimental results showed 94.0% accuracy for two-class classification and 81.8% accuracy for three-class classification. Comparing previous approaches, Yue106 proposed a similar malware classification approach, which uses a similar light-weight CNN and malware images to identify different PC malware families. The main differences between the approaches are that Yue106 used a very deep neural network (DNN) with 10 different layers and a very complex pre-processing methodology, whereas Su et al.105 utilized only raw features for classification. Although the method of Su et al.105 can detect only two malware families effectively, its simplicity and classification accuracy make it preferable as the first layer of a malware detection mechanism in IoT devices.

For malware detection, a great deal of research can be found based on the use of neural networks.107–110 For instance, Dahl et al.111 used a large-scale approach to classify malware via neural networks on random projections. Their work showed that an increasing number of hidden layers did not improve accuracy significantly. Saxe and Berlin112 used feed-forward DNNs for static malware classification; although their work provides a significant accuracy improvement, the result for dynamic analysis is missing in their study. On the other hand, Huang and Stokes113 evaluated multi-task learning by using feed-forward neural networks with four hidden layers. Most recently, Cakir and Dogdu114 devised a feature extraction method (word2vec) using shallow DL to represent any malware based on its opcodes. For classification, a gradient boosting algorithm with k-fold cross-validation was employed to validate the performance. With their limited sample data, the authors found 96% accuracy, which is better than that of similar studies.
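A minimal sketch of the binary-to-grayscale-image malware pipeline with a light-weight CNN, in the spirit of Su et al.,105 is shown below. The 64x64 image size and layer shapes are illustrative assumptions, not the published architecture.

```python
# Hedged sketch: raw malware bytes -> one-channel gray-scale image -> small CNN.
import numpy as np
import torch
import torch.nn as nn

def binary_to_image(path, side=64):
    """Read raw bytes and reshape them into a one-channel gray-scale image;
    np.resize tiles or truncates the byte stream to the fixed length."""
    data = np.fromfile(path, dtype=np.uint8)
    data = np.resize(data, side * side)
    return torch.tensor(data, dtype=torch.float32).view(1, side, side) / 255.0

class LightweightCNN(nn.Module):
    def __init__(self, n_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, n_classes)  # 64 -> 32 -> 16

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Usage: logits = LightweightCNN()(binary_to_image("sample.bin").unsqueeze(0))
```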
4.7. Recurrent neural network

The most recent work using the RNN was proposed by Li et al.,115 where the network was trained with library/API function call code snippets to detect vulnerabilities related to those function calls. Li et al.115 are among the first to use DL to detect software vulnerabilities. The authors aimed at several objectives: firstly, they proposed to replace human experts with their method to relieve humans from that tiresome work; secondly, they wanted to reduce the false negative rate related to manually defined features. Although software defect detection is different from software vulnerability detection, somewhat related work using DL to predict software defects was proposed by Wang et al.116 and Yang et al.117

The RNN has also been used for anomaly detection in cyber–physical systems. Al-Jarrah and Arafat118 proposed a time delay neural network (TDNN) structure that embeds the temporal behavior of network attacks to maximize their detection rate. Similarly, Malhotra et al.119 used a long short-term memory (LSTM) RNN to detect anomalies in time series. The idea is to compute the prediction error over a number of time steps and use this error to fit a multivariate Gaussian distribution that scores anomalous behavior. The approach was tested on an electrocardiogram, a power demand, and a space shuttle dataset. A similar but better approach was proposed by Goh et al.65 to detect anomalies in cyber–physical systems by learning pattern sequences in time series.

Use of the RNN in a cyber–physical system was also recently reported in a paper where the authors used the RNN with an unsupervised learning method to detect anomalies in cyber–physical testbeds.65 For their experiment, the authors collected complex datasets from the Secure Water Treatment Testbed (SWaT). The outcome of their study is that their method not only identified the status of a cyber-attack but could also locate the device or sensors where the attack was actually generated. Although they did not perform their study with a publicly available dataset, their controlled-environment experiment provided significant accuracy with a lower false positive rate.

4.8. Deep belief network

An intrusion detection technique using DL was presented by Kang and Kang120 for vehicle network security. The authors used DBNs as their classification algorithm and found that pre-training the classification algorithm enhances the performance of the DBN. Meanwhile, Gao et al.121 used the DBN model for intrusion detection on the KDD Cup '99 dataset and achieved 93.49% accuracy. Most recently, He et al.122 proposed a novel scheme to detect false data injection attacks in real time for cyber–physical systems by extending the DBN architecture, called the conditional deep belief network (CDBN), which uses the conditional Gaussian–Bernoulli RBM to extract high-dimensional temporal features.123 The authors designed the CDBN architecture to analyze temporal attacks in cyber–physical systems, for example, a smart grid, by providing real-time measurement data from distributed sensors/meters. In fact, instead of using a time series generative model for future actions, the authors used the CDBN as a classifier, which reduced the complexity of the training and the execution time of the CDBN architecture.

4.9. Other machine learning methods

Because of their reliability and scalability attributes, cloud-hosted services are being used extensively in major consumer fields, such as retail, healthcare, etc. Based on these beneficial attributes, such cloud-based systems can be termed software-defined everything infrastructure (SDxI). Given the enormous benefits of SDxI, IDSs are also evolving rapidly to protect these environments. For example, to protect networks from DDoS attacks, software-defined network-based (SDN-based) detection mechanisms136–138 using ML and/or in the cloud environment have been proposed, which migrate targeted virtual machines (replicas of users' computers) to safe virtual machines. Similar research was also performed by Neupane et al.,130 where the authors proposed a "defense by pretense" mechanism to protect the network from flooding attacks. The proposed method utilized a two-stage ensemble learning algorithm that employed features to detect the origin and type of attack. They used the principle of a honeypot, where the presented scheme provides attackers with the impression of a successful attack and thereby blocks the traffic flows. The method's performance was evaluated on a GENI cloud testbed. Berman et al.139 found that their approach was effective in filtering and detecting DDoS attacks in an SDxI-based network infrastructure.

Due to the obfuscation techniques used by attackers to bypass detection methods, traditional malware detection systems are becoming ineffective. To counteract such problems, Kumar et al.126 proposed an ensemble ML method to analyze the network flows of malware communication for threat detection. One reason the authors chose the ensemble ML method is that their algorithm adapts itself to changing network behaviors and protects itself from over-fitting. Specifically, the authors combined the outputs of five ML algorithms, namely RF, PART, JRip, J48, and Ridor, to detect malware on an Android-based mobile system. Their method was evaluated with network data, achieving 98.2% threat detection accuracy. Similar work on Android malware detection was proposed by Feng et al.,131 who also employed ensemble methods as the detection mechanism. The authors proposed a novel dynamic analysis framework named EnDroid, which automatically extracts features to train the model. To reduce the dimensionality of the features, the authors used the chi-square method. The reduced dataset was then used to train the ML models and, finally, all of these models were combined to predict the data. Stacking the different ML algorithms provided 96.56% accuracy with a 1.85% false positive rate for Android malware detection.

Ensemble learning methods are also being used for intrusion detection in cyber–physical systems. Kosek and Gehrke125 developed a detection method to identify unauthorized control actions or intrusions in DER operations. The proposed model employed an ensemble ML approach over several non-linear artificial neural network-based DER models. To evaluate the approach, measurement data from a distributed energy system were used, attaining 97.6% accuracy in detecting unauthorized control actions, a 9.2% improvement compared to classic anomaly-based models. To further show the advantages of using ensemble ML algorithms, Miller and Busby-Earle129 designed an approach, called multi-perspective ML (MPML), where similar types of features are grouped together to form perspectives, which increases the variety among the classifiers during the ensemble process. The performance of the system was evaluated on the NSL-KDD dataset, and the authors claimed that the method was capable of attaining at least a 4% improvement compared to conventional ensemble algorithms.

Although biologically inspired methods are used extensively with other ML algorithms, there are comparatively limited research studies using the artificial immune system (AIS). One very recent research work was presented by Zainal and Jali132 regarding risk classification of spam in the short message service (SMS). The authors proposed an improved risk assessment approach for text spam by redefining the dendritic cell algorithm (DCA) of AISs. To evaluate their method, a dataset from the UCI ML repository was employed, showing an 80% true positive rate. In Zainal et al.,140 the authors performed several simulations using the same dataset with the DCA and deterministic DCA algorithms to assess the risk level of text spam. Similar methods were also proposed by Mehare and Thakur133 and Hońko,141 using the negative selection algorithm (NSA) and the improved positive selection algorithm (PSA), respectively, to detect anomalies in the network.
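As an illustration of the stacking idea used by EnDroid and the five-classifier ensemble of Kumar et al.,126 the sketch below stacks generic scikit-learn base learners behind a logistic regression meta-learner. The base learners are stand-ins, since WEKA-style PART, JRip, and Ridor have no direct scikit-learn equivalents, and the chi-square reduction step assumes non-negative feature values.

```python
# Hedged sketch: chi-square feature reduction followed by a stacked ensemble.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline

base_learners = [
    ("rf", RandomForestClassifier(n_estimators=100)),
    ("dt", DecisionTreeClassifier()),
    ("svm", LinearSVC(max_iter=5000)),
]
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())

# Reduce dimensionality first (chi2 requires non-negative features, e.g. counts),
# then let the meta-learner combine the base predictions.
model = make_pipeline(SelectKBest(chi2, k=100), stack)
# Usage: model.fit(X_train, y_train); model.predict(X_test)
```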
Applying game theory to network security is another popular approach, although the body of work combining game theory with advanced ML is still incomplete. A game-theoretic approach in the wireless personal area network domain was proposed by Kim135 to analyze and detect passive eavesdropping attacks; in particular, modeling the game between eavesdroppers and transmitters is an emerging research area. Recently, Wu et al.134 proposed a game theory-based IDS, in which a Bayesian network and robust Nash equilibrium analysis model the game with the informers as players, and Bayesian Nash equilibria are used to select optimal game strategies between these players. Also, an application of analyzing cybersecurity risk through game theory was proposed by Musman and Turner.127 The authors proposed a system called the cybersecurity game (CSG) that is employed to minimize the system risk by maximizing the system's ability. This approach favors the defender in such a way that the defense strategies minimize the maximum cyber risk (MiniMax). A similar approach in a different area, using game theory, was designed by Turner and Musman142 to reduce the cybersecurity risk of a point-of-sale system. To prevent DoS attacks in the IoT environment, Yazdankhah and Honarvar128 proposed a game-theoretic algorithm, simulating the approach with NS2 (a network simulator) to create a network environment, and found that game-theoretic schemes perform better than other algorithms in respect of energy consumption (25–30%), operational throughput (10–15%), and latency.

5. Adversarial machine learning

Despite the exceptional success of ML and DL methods in various real-world applications, including self-driving cars, facial recognition, and cybersecurity (e.g., malware or intrusion detection), these methods (and their training datasets) are vulnerable to different security menaces that usually lead to a notable decrease in performance.36,143,144 The adversarial nature of ML is raising serious concerns, especially in safety-, security- and privacy-sensitive domains, since attacks could cause severe ramifications. For instance, malicious attackers can exploit the vulnerabilities of biometric recognition systems and steal sensitive data,145,146 or can seize control of autonomous vehicles to force wrong decisions or commands that may give rise to accidents.147 This problem occurs because traditional ML techniques were not originally designed to deal with clever and adaptive attacks. Thereby, the security of ML-based systems may be undermined by exploiting certain of their vulnerabilities.

In response to security and privacy threats to ML techniques, an emerging research sub-field, called adversarial ML, is now gaining much more momentum. The adversarial ML field resides at the junction of ML and computer security, and aims at enhancing the robustness of ML algorithms in adversarial settings such as cybersecurity, biometric recognition, and human–computer interaction.148 In particular, the adversarial ML field explores three main directions: (a) recognizing the prospective vulnerabilities of ML techniques both at the training and inference stages; (b) developing corresponding attacks, along with an evaluation to estimate their impact on ML algorithms; and (c) devising counter-measures to improve the security of ML systems against these attacks. In this section, we provide an overview of the recent works on the security evaluation of ML algorithms, attacks at training and testing times, and associated defensive techniques. Moreover, we also discuss vulnerabilities and defenses of DNNs against adversarial samples.

5.1. Adversarial machine learning attack types

The security threats to ML systems can be divided into three categories based on the influence on classifiers, the security violation, and the attack specificity perspectives.149

5.1.1. Influence on classifiers perspective. Under the perspective of the influence on classifiers, the ML security threats can be classified as causative attacks and exploratory attacks. In a causative attack, the attacker can alter the training data and model parameters, thereby resulting in a remarkable drop in ML algorithm performance. Causative attacks are performed at the training stage, such that the attacker can either inject adversarial samples into the existing training data or directly modify the training data. Likewise, the adversary can tamper with the learning algorithm's parameters in a RL environment, which is also known as logic corruption. In an exploratory attack, by contrast, the attackers do not tamper with the trained model; instead, they aim to cause misclassification or to collect evidence about the training data and model characteristics. Exploratory attacks are performed at the testing (inference) stage. A toy demonstration of a causative attack is given below.
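The toy example below demonstrates the causative (training-time) effect: flipping a fraction of training labels measurably degrades a classifier. The data and the 30% flip rate are synthetic illustrations, not taken from any study surveyed here.

```python
# Synthetic demonstration of a causative attack via training label flipping.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clean = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("clean accuracy:", clean.score(X_te, y_te))

# Causative attack: the adversary flips 30% of the training labels.
rng = np.random.default_rng(0)
flip = rng.choice(len(y_tr), size=int(0.3 * len(y_tr)), replace=False)
y_poisoned = y_tr.copy()
y_poisoned[flip] = 1 - y_poisoned[flip]

poisoned = LogisticRegression(max_iter=1000).fit(X_tr, y_poisoned)
print("poisoned accuracy:", poisoned.score(X_te, y_te))  # noticeably lower
```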
5.1.2. Security violation perspective. Under the perspective of security violation, the ML security threats can be grouped into integrity attacks, availability attacks, and privacy violation attacks. Integrity attacks attempt to undermine the integrity of the inference process, without damaging normal system operation, with the goal of inducing increased false negatives by the ML model when classifying adversarial samples. Availability attacks attempt to compromise the normal system functionalities available to genuine users by reducing the quality (e.g., confidence), performance (e.g., speed), or access (e.g., DoS), with the goal of inducing increased false positives by the ML model when classifying benign samples. Privacy violation attacks attempt to extract confidential and private information from the training data or the ML model by reverse-engineering. For instance, if the model is intellectual property (e.g., a financial market system) or if the training data are sensitive (e.g., medical records), they require confidentiality and privacy preservation, respectively, and should not be exposed under attacks.

5.1.3. Attack specificity perspective. Under the perspective of attack specificity, the ML security threats can be categorized as non-targeted attacks and targeted attacks, which cause the target ML model to produce, respectively, any incorrect output (by reducing the probability of the true output/class) or a specific incorrect output/class. In addition, the attacks can be white-box, gray-box, or black-box attacks. In white-box (perfect-knowledge) attacks, the adversary is assumed to have knowledge of or access to the target model and its parameters, for example, the full training data. Since the adversary has knowledge of the model, they may be able to produce very powerful attacks. In gray-box (limited-knowledge) attacks, the adversary is assumed to have knowledge only of the feature representation and learning algorithm type (e.g., whether the system is using the SVM or a DNN for classification) but not of the training data and algorithm parameters. However, the adversary can assemble a surrogate dataset and create a substitute model, also known as an auxiliary or surrogate model, to launch attacks. In black-box (zero-knowledge) attacks, the adversary has no knowledge about the ML model except its predictions and utilizes past inputs to infer the model's vulnerability. Black-box attacks are a more realistic threat, and most ML models are vulnerable regardless of their underlying structure, for example, to an oracle attack.148

All in all, attacks at the training and testing stages are, respectively, called poisoning and evasion attacks in general. Like traditional ML models, DNNs have also recently been proven to be vulnerable to inputs with small, carefully crafted manipulations, which are introduced with the intention of fooling systems or leading to misclassification. Such slightly perturbed samples or inputs to the models are called adversarial examples.150

5.2. Security and robustness evaluation of cybersecurity ML algorithms

The security evaluation of ML systems focuses on systematic and empirical robustness assessment against various attacks, as well as the development of carefully targeted attacks. The seminal study to evaluate the security of ML-based cybersecurity systems was conducted by Dalvi et al.151 in 2004. In particular, the authors demonstrated that spam filtering systems based on linear classifiers can be easily fooled by a few changes in the content of spam emails without influencing the readability of the spam text. Yu et al.152 analyzed the robustness of malware and intrusion detection in large-scale networks. In turn, PDF file malware detection frameworks using learning-based classifiers were investigated by Rndic and Laskov153 and Wu et al.154 Since clustering-based algorithms are widely used in the information security domain, Zhao et al.155 examined the effect of poisoning attacks against clustering algorithms and active learning, whereas Li et al.124 experimentally showed that generative classifiers are more robust than discriminative ones against adversarial attacks. Akhtar et al.,156 Akhtar,145 and Biggio et al.157 respectively evaluated biometric systems against spoofing attacks, and intrusion as well as spam filtering algorithms against evasion attacks, and proposed corresponding frameworks for the empirical evaluation of cybersecurity techniques under attacks.

Figure 18. Conceptual representation of (a) reactive and (b) proactive defensive mechanisms for machine learning in computer security.157

Recently, DL-based systems have been tested for their security against adversarial attacks and found not to be robust. For instance, Szegedy et al.158 showed that DNNs can be easily deceived by a slight perturbation in the input data. Fawzi et al.159 probed the robustness of different well-known DNN-based classifiers against random noise and proved that their security suffers from both random and well-crafted adversarial noise. In turn, Kurakin et al.160 investigated the security of DNNs against adversarial samples obtained from a cell-phone camera and demonstrated a decline in the accuracy of the system.
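A common way to craft such slightly perturbed inputs is the fast gradient sign method (FGSM), named here as a standard illustrative technique rather than the specific attack of the studies above. The hedged sketch below shows the core step for any differentiable PyTorch classifier.

```python
# Sketch of crafting a small-perturbation adversarial example with FGSM.
import torch
import torch.nn.functional as F

def fgsm_example(model, x, label, epsilon=0.03):
    """Return x perturbed by epsilon * sign(gradient of loss w.r.t. x).
    x: batch of inputs scaled to [0, 1]; label: tensor of class indices."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()     # slight, worst-case-direction shift
    return x_adv.clamp(0.0, 1.0).detach()   # keep values in a valid range
```

The perturbation is imperceptibly small for typical epsilon values, yet it moves the input in the direction that most increases the model's loss, which is exactly why well-trained DNNs can still be deceived.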
Most quintessential assessment methods for ML-based cybersecurity algorithms are not adequate because they mainly evaluate performance quantitatively under normal operation rather than robustness.161 Nonetheless, a few researchers have proposed robustness and security ML evaluation frameworks, considering the problem as either a reactive or a proactive arms race.157 In a reactive arms race (Figure 18(a)), the attacker and the system developer aim to attain their objectives by adjusting their actions with regard to the contender, namely learning from the past. Specifically, a potential attacker assesses the defense framework and implements attacks, while in succession the system developer investigates the newly added attacks and devises some novel defense methods, for example, retraining the system with new features to enhance security. It is worth noting that the reactive strategy is not capable of averting never-before-seen attacks. In a proactive arms race (Figure 18(b)), the system developer anticipates the adversary and evaluates the security of the system under adversarial attacks. Then, the developer designs appropriate counter-measures against the adversarial attacks. This procedure is repeated until the system is installed.

5.3. Security threats and defenses in adversarial ML

Here we present a summary of adversarial attacks at the training and testing phases of computer security ML systems, including DL.

5.3.1. Training phase attacks (poisoning attacks)
5.3.1.1. Attacks. The training dataset plays a vital role in setting the upper bound of the ML system's performance. Therefore, numerous adversaries attempt to inject a small portion of poisoning samples into the training dataset with the aim of significantly dropping the ML model's overall performance at test time. The poisoning attack is the typical kind of attack in the training phase. Owing to the fact that training data in computer security are guarded with high confidentiality and cannot be accessed by attackers, adversaries alternatively take advantage of the retraining stage of the model. For example, adaptive ransomware detection, spam classification, or face recognition system decision models are periodically upgraded by retraining the system to adapt to changing operating scenarios.149 Biggio et al.162 showed how an attacker exploits this periodic update attribute to inject poisoning samples into the training data utilized to retrain the decision model. This injection of malicious samples shifts the genuine data classification centroid (C_genuine) to an abnormal one (C_abnormal), as depicted in Figure 19. When the system is retrained and deployed, the adversary may utilize adversarial samples rather than genuine ones to fool the ML security system. One of the best-known examples of poisoning attacks on a system that keeps retraining periodically is Microsoft Tay (https://www.wired.com/2017/02/keep-ai-turning-racist-monster/), a Twitter conversation bot for youngsters released on 23 March 2016. After 16 hours, Microsoft took it offline as it had started posting highly offensive content that was racist, sexual, drug-related, and abusive in nature, caused by a coordinated attack by a subset of people. Also, a prominent antivirus company, Kaspersky Lab, was accused (although they rejected any wrongdoing) of poisoning rival antivirus products via the injection of false positive samples into VirusTotal (http://virustotal.com).

Figure 19. Illustration of poisoning attacks.35

Poisoning attacks can be performed by input or label manipulations, that is, when adversaries modify the input samples/features or the label information, respectively. Furthermore, input manipulations can be direct or indirect poisoning. In direct poisoning, the adversarial samples are directly injected into the training data. For instance, Mei and Zhu163 targeted convex loss-based ML models (e.g., linear and logistic regression or SVMs) and developed a poisoning attack framework that searches for the optimal changes to the training data with regard to cardinality or the Frobenius norm. In particular, the attacks are fabricated through two nested optimization problems using the inner problem's Karush–Kuhn–Tucker conditions and gradient descent. Meanwhile, Li et al.164 and Zhao et al.165 developed attacks using projected gradient ascent algorithms that increase the empirical loss, respectively, for recommendation/collaborative filtering and multi-task learning tasks. The centroid shift sketched in Figure 19 is illustrated numerically below.
Besides these attacks on supervised ML systems in computer security, attacks on unsupervised learning-based systems (e.g., clustering analysis) have also been devised. For example, Biggio et al.166,167 proposed poisoning attacks against single-linkage and complete-linkage hierarchical clustering. Specifically, in Biggio et al.,166 the most effective adversarial samples are selected via a heuristic bridge method, which estimates the effectiveness of the injected adversarial data on community discovery, singular value decomposition (SVD), and node2vec clustering accuracy. In turn, Chen et al.168 proposed two targeted poisoning attacks, called small community and noise injection, to evade graph clustering techniques. It is worth noting that these attacks require limited adversarial knowledge and have low cost. In indirect input poisoning attacks, the adversaries poison the training data indirectly before the pre-processing stage of the system. For example, Xiao et al.169 utilized a gradient ascent routine to attack feature selection algorithms, such as LASSO, ridge regression, and the elastic net.

In label poisoning attacks, the basic scheme arbitrarily perturbs the labels, where adversaries select a new label for a portion of the training data using a random distribution. For instance, Mozaffari-Kermani et al.170 proposed an ML algorithm-independent approach for label poisoning attacks in healthcare and demonstrated that flipping only around 40% of the training labels will lead to a remarkable drop in performance. Paudice et al.171 and Wang et al.172 proposed in-flip and out-flip random classification noise probability-based label manipulation attacks, respectively, for binary and multi-class classification systems.

5.3.1.2. Defenses. The counter-measures against poisoning attacks can be clustered into two groups: data sanitization and robust learning. In data sanitization, the purity of the training data is ensured by detecting and removing adversarial attack samples.35 For instance, Paudice et al.171 presented a poisoning attack sample detection algorithm using k-NN that reassigns the correct label to each instance in the training data (sketched below), while an empirical risk minimization-based defense mechanism proposed by Steinhardt et al.173 detects outlier samples and removes them with the aid of upper bounds on the loss across different attacks.

In robust learning, techniques based on robust statistics that are inherently less sensitive to outlier training samples are employed to improve the robustness of the system, for example, kernel functions, bootstrap aggregating, and the random subspace method (RSM).148 Feng et al.,174 Liu et al.,175 Bootkrajang and Kabán,176 and Jagielski et al.177 respectively devised robust systems by means of logistic regression (using a simple linear programming procedure to optimize a robustified linear correlation between response and linear measures), linear regression (assuming that the feature matrix can be approximated by a low-rank projection matrix), regression learning (estimating the regression parameters iteratively with a trimmed loss function to erase data samples with large residuals), and kernel logistic regression (adapting the multiple kernel learning approach with a Bayesian regularization scheme), whereas Xu et al.178 developed robust SVMs with a rescaled hinge loss function that is monotonic, bounded, and nonconvex.
Dasgupta et al. 31

performed by digitally manipulated facial samples, which iteratively retraining the system on the adversarial attacks.
are fake facial images/videos obtained by swapping the Namely, such a trained classifier on training data contain-
face of one individual for the face of another individual ing evasion attack samples becomes capable of thwarting
using DL-based methods.188 Other digital face manipula- attacks at test time. For instance, Kloft and Laskov192 pro-
tion involves modifying facial attributes (e.g., age and posed adding adversarial attacks along with their labels to
gender), swapping/morphing two faces, adding impercep- training, which yields a robust system. Similar approaches
tible perturbations (i.e., adversarial examples), syntheti- were suggested by Miyato et al.193 and Cai et al.194 for
cally generating faces, or animating/reenacting facial semi-supervised text classification and image-based secu-
expressions in the face images/videos.184 rity ML systems, respectively. Another technique to
In membership attacks, the adversary tries to find out improve the resilience of computer security systems
whether a particular sample was part of the training data. against evasion attack is feature transformation, that is,
In this way the adversary may learn the trained model’s feature selection, insertion, and rescaling. Bhagoji et al.195
parameters. For instance, Shokri et al.189 employed a presented principal component analysis and a data ‘‘anti-
shadow model (similar to the original one) to determine if whitening’’-based evasion defense framework to enhance
the data sample was part of the original model’s training. the robustness of SVM and DNN systems.
A random sample with the hill climbing technique was Game theory or robust optimization, also known as
used as a query to the original model to obtain strong class smoothing model outputs, is an efficient tool that mainly
confidence. Recently, Long et al.190 investigated the rea- treats the adversarial ML as a min–max problem between
son for membership vulnerability and suggested that over- attackers and defenders. More explicitly, the training loss
fitting is a major reason for information leakage, but is not is maximized by the inner problem by manipulating the
the fundamental cause of the issue. training samples with worst-case and bounded modifica-
In training data extraction attacks, the adversary’s aim tion, whereas the training algorithm is trained by the outer
is to obtain samples, and their information is utilized for problem, minimizing the adversarial sample training
training the computer security system. Fredrikson et al.149 loss.196 It has been empirically found that different regu-
proposed a regression-based attack method for medicine larizers realize different kinds of bounded attacks, and
dosage prediction tasks. The authors showed that their therefore there is an equivalency between regularized
attack model can recover genomic data about the patient, learning problems and robust optimization that leads to
which can be used to reproduce training samples via the secured ML systems,197 including systems with structured
observed model predictions. However, the reproduced learning.198 A similar effect may also be achieved by pun-
sample is not an actual sample point in the training, but an ishing the input gradient with regularization terms.199
average representation. Apart from the above-mentioned categories, defenses
In model extraction attacks, the adversary attempts to based on classifier ensemble and detection and rejection of
accumulate some information about the target computer adversarial samples have also been proposed. For instance,
security ML model. The extracted information is fed to a in Jordaney et al.,200 malware samples residing far from
reverse analysis, which may lead to privacy leakage data the training centroids in the feature spaces were uncovered
and users, for example, survey data of customers. The and rebuffed. Meanwhile, Corona et al.201 and Wild
model extraction attacks could be white-box attacks et al.202 employed classifier fusion for phishing webpages
(adversary can easily access and download models or and secure biometric fusion against spoofing attacks,
information) or black-box attacks (adversary can access respectively. Also, privacy-preserving-based defense
only the APIs of the target model). Fredrikson et al.149 methods that preserve the privacy of training data as well
implemented model extraction attacks using gradient- as learning algorithms (via differential privacy203) and data
descent techniques against online face recognition and security and privacy in the cloud environment (via homo-
DTs. Likewise, Tramèr et al.191 exploited the output confi- morphic encryption204) have been explored. The differen-
dence values of ML cloud service platforms (e.g., Amazon tial privacy is a data encryption technique whose output is
AWS) such that by equation-solving attacks one can invariant to one data point change, and thereby suppres-
extract ML models, such as the DNN, SVM, and multi- sion of privacy leakage even after new data addition is
class logistic regression. attained,203 while the homomorphic encryption encrypts
data in such a fashion that ML systems can process it with-
5.3.2.2. Defenses. The defenses for inference time out decryption.204
attacks can be broadly classified into adversarial training, Also, many research works have been conducted to
feature transformation, and game theory or robust optimi- counteract biometric spoofing and DeepFakes attacks. A
zation classes. typical counter-measure to spoofing and DeepFakes
Adversarial training is an active defense technique in attacks is anti-spoofing and the DeepFakes detection tech-
which an adversary-aware ML system is procured by nique, which aims at disambiguating human bona fide face
32 Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 00(0)

Figure 20. An adversarial example formulated using the fast gradient sign method.150 VGG16213 is able to recognize the original
sample correctly with high confidence. However, when imperceptible perturbations are combined, the system produces incorrect
labels with high confidence.

samples from spoofing or DeepFakes artifacts.184,205 For xadv can be defined as a minimization problem in a norm
instance, Akhtar and Foresti206 developed a face anti- ball:
spoofing detection method using seven novel methods to
obtain discriminative patches in a face image. The features min jjxadv  xjj
of the selected discriminative image patches are fed to a xadv

specific classifier (i.e., the SVM, Naive Bayes, Quadratic s:t: f (xadv ) = yadv ð19Þ
Discriminant Analysis, or Ensemble), which is followed f (x) = y
by a majority voting scheme to determine the attack.
y 6¼ yadv
Sengur et al.207 evaluated the efficacy of the pre-trained
DL features for detecting face spoofing, whereas a trait- where y and yadv are output labels of x and xadv , respec-
independent anti-spoofing framework that can detect face, tively, and jj · jj is typically some Lp norm.
fingerprint, and iris spoofing attacks was proposed by 5.3.3.1.1. Fast gradient sign method (FGSM): the
Akhtar et al.186 Korshunov and Marcel208 and Akhtar and FGSM150 computes adversarial attacks by perturbing each
Dasgupta209 analyzed, respectively, image quality and bit of information (e.g., a pixel in the image) of a clean
local image features for DeepFakes detection, while a new sample x by  on the direction of the gradient of the train-
adversarial learning algorithm that can improve the secu- ing loss J , given the true label y with respect to the input:
rity of multimodal systems against spoof attacks was pro-
posed by Akhtar and Alfarid.210 xadv = x + sign · ½rx J (x, y) ð20Þ

5.3.3.1.2. Iterative gradient sign method (IGSM): the


5.3.3. Attacks against deep neural networks. As mentioned IGSM214 is an iterative version of the FGSM. xadv at itera-
above, DNNs are vulnerable to adversarial examples (i.e., tion i will be given by the following:
samples with deliberate perturbations).211,212 Usually, the
added perturbations are so minimal that they are impercep-
tible to humans, but DNNs yield errors with higher confi- 0 = x,
xadv
ð21Þ
i = xi1 +  · sign½rxadv
xadv adv
dence. For example, as shown in Figure 20, the deep i1
J (xadv
i1 , y)
adversarial example can be flawless without noticing per-
turbations, yet it is misclassified by a widely used convo- 5.3.3.1.3. Jacobian saliency map attack (JSMA): a
lutional classifier, VGG16.213 JSMA215 consists of a greedy iterative technique for tar-
geted attacks. The Jacobian of the outputs with respect to
inputs is used to determine which bits of information (e.g.,
5.3.3.1. Attacks. In recent years, several methods to gen- pixels) yield higher variations on the outputs after
erate adversarial samples for DNNs have been proposed. perturbations:
Here, we summarize a few representative approaches.
Problem formulation: let (x, y) and n be a pair corre- ∂t X ∂j
sponding to an original sample and its class label and a st = ; s0 = ; s(xi ) = st js0 j:(st < 0):(s0 > 0) ð22Þ
∂xi jt
∂xi
trained neural network. Generating an adversarial attack
Dasgupta et al. 33

In Equation (22), st represents the Jacobian of target class processing or denoising input, architecture alteration, net-
t with respect to input image x. Also, so represents the work verification, ensembling counter-measures, and
sum of Jacobian values of all non-target class. As a result, adversarial detection, as shown in Table 5.
changing the selected pixel of the image will also increase Adversarial training methods train the system with
the likelihood of labeling it as the target class. The per- adversarial examples to amplify the regularization and loss
turbed value of each bit of information p of a given input functions, thereby forcing the DNNs to be more resilient.
x belonging to class y, given model f and target class t, Goodfellow et al.150 and Wu et al.223 added adversarial
will then be as follows: examples in the training data and empirically demonstrated
the improvement in robustness and precision. Moreover,
padv = p + Goodfellow et al.150 argued that adversarial training should
8 P be used only to avoid over-fitting. Kurakin et al.224 con-
∂ft (x) ∂fy (x)
>
< 0,
> ; if ∂p < 0 or
jt
∂p >0
ducted a comprehensive evaluation on the ImageNet data-
> ∂ft (x) P ∂fy (x) set of adversarial training techniques, the results of which
>
: ∂p j ∂p j, otherwise
j6¼t
showed that adversarial training is robust against one-step
attacks (e.g., FGSM), but not for iterative attacks (e.g.,
ð23Þ
IGSM).
5.3.3.1.4. DeepFool: DeepFool216 computes attacks by In defensive distillation, the outputs of the first DNN
determining the minimal distance from original sample to system are fed to the second DNN system to prevent the
the decision boundary. A linear approximation of the affine framework from being too well-fitting on the training sam-
classifier is computed iteratively. The minimal perturba- ples. The softmax outcomes of the first and second DNNs
tion (ηadv (x) =  jjwjj
f (x)
2 w) is estimated as follows:
can be expressed as follows:
2

arg min jjηi jj2 exp( zi )


ηi qi = P T zj ð26Þ
ð24Þ j exp T
s:t: f (xi ) + rf (xi )T ηi = 0
where T is a temperature parameter that controls the
5.3.3.1.5. AdvGAN: Zilong Lin and Shi217 and Zhao degree of knowledge distillation. Papernot et al.228 pro-
et al.218 proposed adversarial examples of generation tech- posed using network distillation as a defense technique.
niques using generative adversarial networks (GANs) for The results reported by Papernot et al.239 showed not only
intrusion detection and machine translation, respectively. In that high-temperature softmax is more efficient against
particular, in Zhao et al.,219 firstly, a Wasserstein Generative adversarial examples with small perturbations, but also
Adversarial Network (WGAN) model is trained on the data- that distillation enhances the generalization of the system.
set to map random noise to the input through the ‘‘genera- Liu et al.240 devised a feature distillation defense for the
tor’’ (G), which is followed by an ‘‘inverter’’ (I) training to DNN-oriented JPEG compression scheme by maximizing
map the input sample to the dense inner representations of z the malicious feature loss of adversarial examples.
space. Then, the adversarial noise is computed by minimiz- The methods in the pre-processing or denoising cate-
ing the distance of the inner representations. The adversarial gory either suppress adversarial noise in the input data or
examples are obtained using the generator as follows: transform the data before feeding to the DNNs. For
instance, the autoencoder network in Gu and Rigazio241
xadv = G(zadv ) and bit-depth reduction, total variance minimization, and
min jjz  I(x)jj ð25Þ JPEG compression in Guo et al.242 were utilized to remove
s:t: f (G(z)) 6¼ f (x) adversarial perturbations. The PixelDefend approach in
Song et al.243 purifies the adversarial examples by chang-
The AdvGAN and JSMA techniques were employed to ing every information bit (e.g., each pixel in an image) to
attack domain generation220 and Android malware detec- project it back toward the original training distribution.
tion221 systems, respectively. Besides evading attacks, a Moreover, Song et al.243 suggested that adversarial exam-
few poisoning attack methods for DNNs have been pre- ples usually reside in low-probability region training data.
sented. For example, different backdoor poisoning attacks Architecture alteration schemes alter the general neural
schemes for user authentication frameworks were studied network architectures, for example, appending an extra
by Chen et al.222 robust layer. Borkar and Karam244 applied a few convolu-
tional layers with residual connections that remarkably
5.3.3.2. Defenses. The published DNN defenses against enhance the DNN security against attacks. Bradshaw
adversarial examples can be broadly grouped into seven et al.233 developed a robust DNN architecture that
categories: adversarial training, defensive distillation, pre- employed Gaussian processes with RBF kernels in DNNs,
34 Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 00(0)

Table 5. Summary of counter-measures against adversarial examples.

Defense technique Approach/scheme

Adversarial training Ensemble adversarial training, a training methodology that incorporates perturbed inputs
transferred from other pre-trained models225
Extended adversarial and virtual adversarial training as a means of regularizing a text classifier by
stabilizing the classification function226
Training state-of-the-art speech emotion recognition on the mixture of clean and adversarial
examples to help regularization227
Defensive distillation The main idea used is training the model twice, initially using the one-hot ground truth labels but
ultimately using the initial model probability as outputs to enhance robustness228,229
Pre-processing defense Using PCA, low-pass filtering, JPEG compression, soft thresholding techniques as pre-processing
technique to improve robustness230
Use of use two randomization operations: (a) random resizing of input images and (b) random
padding with zeros around the input images231
Architecture alteration Synonym encoding method that inserts an encoder before the input layer of the model and then
trains the model to eliminate adversarial perturbations232
An architecture using Bayesian classifiers (Gaussian processes with RBF kernels) to build more
robust neural networks233
Network verification A verification algorithm for DNNs with the ReLU function was proposed by Katz et al.234 verified
the neural networks utilizing Satisfiability Modulo Theory (SMT) solver
The method in Katz et al.234 was modified in max(x,y) = ReLU(x  y) + y and jjxjj = ReLu(2x)  x
to reduce the computational time
Ensembling counter-measures The proposed strategy used an ensemble of classifiers with weighted/unweighted average of their
prediction to increase robustness against attacks235
A probabilistic ensemble framework against adversarial examples that capitalizes on intrinsic
depth properties (e.g., probability divergence) of DNNs236
Adversarial detection Firstly, the features are squeezed either by decreasing each pixel’s color bit depth or smoothing
the sample using a spatial filter. Then, a binary classifier that uses as features the predictions of a
target model before and after squeezing of the input sample237
A framework that utilizes 10 nonintrusive image quality features to distinguish between legitimate
and adversarial attack samples212
Multiversion programming based an audio adversarial example detection approach, which utilizes
multiple off-the-shelf automatic speech recognition systems to determine whether an audio input
is an adversarial example238

PCA: principle component analysis; RBF: radial basis function; DNN: deep neural network.

called Gaussian process hybrid deep neural networks incorporating adversarial examples transferred from other
(GPDNNs), to attain comparable results both with and models for ensemble adversarial training purposes. Adam
without adversarial example scenarios. et al.236 introduced a probabilistic ensemble framework
Network verification strategies probe the sample to against adversarial examples that capitalizes on the intrinsic
determine if it is complying or infringing on the properties depth properties (e.g., probability divergence) of DNNs. He
of the DNN systems. A verification algorithm for DNNs et al.247 analyzed many defensive systems and demonstrated
with a ReLU function was proposed by Katz et al.234 that that ensembling counter-measures are not always able to
verified the neural networks utilizing the Satisfiability enhance the security of DNNs against adversarial attacks.
Modulo Theory (SMT) solver. Although the proposed Adversarial detection techniques focus on detecting the
method was robust under small perturbation, it has a attacks as anomaly detection or binary classification prob-
higher computational cost/time. The method was modified lems to determine whether the input sample is adversarial
by Carlini et al.245 by replacing ReLU with or not. Lu et al.248 trained a DNN-based binary detector
max(x, y) = ReLU(x  y) + y and jjxjj = ReLu(2x)  x to that labeled each input as legitimate or attack. Similarly,
reduce the computational time. Another method to attacks were detected by Grosse et al.249 (using maximum
decrease the computational cost was designed by mean discrepancy and energy distance), Feinman et al.250
Gopinath et al.246 that produces sage regions for targeted (using the Bayesian uncertainty view), Meng and Chen251
classes and also does not examine each point individually. (using Jensen–Shannon divergence probability diver-
Ensembling counter-measures combines multiple gence), and Pang et al.252 (using reverse cross-entropy),
defenses in parallel or serial fashion for improved security while Miller et al.253 and Paudice et al.254 designed anom-
and generalization capacity. Tramèr et al.225 recommended aly detection frameworks against evasion (based on null
Dasgupta et al. 35

Table 6. Overview of the KDD Cup 1999 dataset. Table 7. Overview of Kyoto 2006 + dataset.

Attack type Training dataset (%) Testing dataset (%) Session Session quantity

DoS 79.24 73.90 Unknown 425,719


R2L 0.23 5.21 Attack 43,043,225
Probe 0.83 1.34 Normal 50,033,015
U2R 0.01 0.07
Normal 19.69 19.48

DoS: Denial of Service; U2R: User-to-Root; R2L: Remote-to-Local.


are very popular for performance evaluation in a NIDS,
based on the KDD Cup 99 dataset. In addition to these 14
features, 10 more features were added to the dataset for
hypothesis density models) and poisoning (based on data
further and detailed analysis of NIDSs. The Kyoto 2006 +
pre-filtering and outlier detection) attacks.
dataset was captured using darknet sensors, honeypots, and
email servers, as well as a web crawler.259 An overview of
6. Databases for cybersecurity domains the dataset is given in Table 7.
and applications
Databases of diverse cybersecurity applications and 6.3. NSL-KDD dataset
domains constitute essential ground truth for training, test- The NSL-KDD dataset was generated in 2009, which is an
ing, and benchmarking methods for cyberspace. Over the updated version of the KDD Cup 99 dataset that not only
years, several databases have been released in the public solved the inherent redundant record problems but also made
domain. In this section, we present an overview of a few the training and testing record in such a fashion that algo-
representative datasets that have been used in the literature. rithms are not biased to any redundant records. The dataset
consists of the KDDTrain+ dataset as the training data and
KDDTest+ and KDDTest21 datasets as the testing data,
6.1. KDD Cup 1999 Dataset (DARPA1998) where the KDDTest21 data have different normal attack
To evaluate NIDSs, the Defense Advanced Research records as well as four different attack records. Like KDD
Projects Agency (DARPA) and Air Force Research Cup 99, the NSL-KDD dataset also has four attack types. A
Laboratory (AFRL) sponsored MIT Lincoln Laboratory in brief overview of the NSL-KDD dataset is given in Table 8.
1998 to distribute the first standard dataset. KDD Cup
1999 is a program that collected data from the MIT
Lincoln Laboratory, including TCP dump and BSM list
6.4. ECML-PKDD 2007 dataset
files. This data was prepared for the DARP’98 IDS eva- In 2007, the ECML-PKDD 2007 dataset was created for a
luation program and after that prepared by Fraley and DM competition named the ECML/PKDD Discovery
Cannady255 Although this dataset was prepared at least a Challenge, which was held in concurrence with the 18th
decade ago, it is one of the most popular datasets available European Conference on Machine Learning (ECML). The
to evaluate the performance of NIDSs. As of today, at characteristics of the ECML/PKDD 2007 dataset are given
least 30 researchers are using this dataset for their research in Table 9. This dataset is presented in extensible markup
purposes.256–258 This dataset normally deals with four language (XML), where all samples have their own unique
main attack types, namely DoS, probing, U2R, and R2L ID and comprise context, class, and query parts.260 The
attacks. Moreover, there are 38 numerical features with context part provides information about the OS running on
three content features. The features are mainly based on the system, the requested Hypertext Transfer Protocol
basic TCP connections; the content features collected by (HTTP) server, and the existence of Lightweight Directory
domain knowledge within a connection and, using a 2-sec- Access Protocol (LDAP), XML Path Language (XPATH),
ond time window, traffic features were captured. An over- and SQL database technology on the server. The proto-
view of the KDD Cup 1999 dataset is provided in Table 6. cols, packet header, body, Uniform Resource Identifier
(URI), etc., can also be found in the query parts.

6.2. Kyoto 2006 + dataset


6.5. Information Security and Object Technology
The Kyoto 2006 + dataset is a new evaluation dataset built
on 3 years of real traffic data (November 2006–August dataset
2009) by Kyoto University. This dataset consists of 24 sta- The Information Security and Object Technology (ISOT)
tistical features, where 14 features were extracted, which dataset is a publicly available new dataset that is a
36 Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 00(0)

Table 8. Overview of NSL-KDD dataset.

Total Normal DoS Probe R2L U2R

KDDTest+ 22544 9711 7458 2421 2754 200


KDDTest21 11850 2152 4342 2402 2754 200
KDDTrain+ 125973 67343 45927 11656 995 52

DoS: Denial of Service; U2R: User-to-Root; R2L: Remote-to-Local.

traffic data also involves five big datasets, which made a


Table 9. ECML/PKDD dataset features.
big dataset for the Ericson lab as well as for researchers.261
Training dataset Testing dataset

Attacks 15,110 (30%) 28,137 (40%) 6.6. HTTP CSIC 2010 dataset
Path traversal 20% 18%
LDAP injection 15% 16% The HTTP CSIC 2010 dataset is a newly formed big data-
Cross-site scripting 12% 11% set that has been widely used to evaluate web attack pro-
SQL Injection 17% 18% tection systems. This dataset was developed by the
XPATH Injection 15% 16% Information Security Institute of the CSIC (Spanish
Command Execution 23% 23%
SSI 13% 12%
Research National Council) and consists of automatically
Total Request 50.116 70,143 generated several thousand web traffic requests. This data-
Valid Request 35,006 (70%) 42,006 (60%) set has in total 25,000 anomalous web requests and 6000
normal requests. Anomalous web request in this dataset
LDAP: Lightweight Directory Access Protocol; SQL: Structured Query
mainly refers to a wide variety of application-layer attacks.
Language; XPATH: Extensible Markup Language Path Language; SSI:
Server Side Includes. Static, dynamic, and unintentional illegal requests are the
three types of attacks that are mainly in the dataset. For
example, SQL injection attacks, XSS, buffer overflows,
etc., comprise dynamic attacks, whereas static attacks
combination of various botnet data with normal datasets, compromise hidden resources, such as a session ID
and contains in total 1,675,424 traffic flows. From the rewrite, configuration file modification, etc. The uninten-
French chapter of the honeynet project, malicious traffic tional illegal request type is not actually a malicious type;
for the ISOT dataset was collected that consists of Storm rather, it is a type that does not follow the behavior and
and Waledac botnets. Traffic Lab Ericson Research in structure of normal network data.
Hungary provided normal traffic for the ISOT dataset.
After that this, normal data were combined with data pro-
vided by the Lawrence Berkeley National Lab (LBNL), 6.7. CTU-13 (Czech Technical University) dataset
which provides the normal traffic data collected from sev- The CTU-13 dataset was created by the Czech Technical
eral applications and created diversity. Besides, LBNL University (CTU) of Prague, Czech Republic, in 2011.

Table 10. Data distribution for each botnet scenario.

Dataset scenarios Duration (h) Network flows Size (GB) Bot name No. of bots Botnet flow

1 16.36 1,925,150 34 Virut 1 38,791 (2.01%)


2 66.85 4,710,639 121 Rbot 1 26,759 (0.56%)
3 11.63 129,833 37.6 Virut 1 695 (0.53%)
4 2.18 558,920 30 Menti 1 4431 (0.79%)
5 6.15 2,824,637 52 Neris 1 39,933 (1.41%)
6 0.38 114,078 5.8 Sogou 1 37 (0.03%)
7 4.75 1,309,792 73 Rbot 10 106,315 (8.11%)
8 19.5 2,954,231 123 Murlo 1 5052 (0.17%)
9 4.21 1,808,123 60 Neris 1 18,839 (1.04%)
10 0.26 107,252 5.2 Rbot 3 8161 (7.6%)
11 5.18 2,753,885 94 Neris 10 179,880 (6.5%)
12 4.21 1,121,077 53 Rbot 1 1719 (0.15%)
13 1.21 325,472 8.3 NSIS.ay 3 2143 (0.65%)
Dasgupta et al. 37

Table 11. The composition of Australian Defence Force Table 12. UNSW-NB15 dataset features.
Academy Linux datasets.
Category Training set Testing set
System call trace type Number Label
Normal 56,000 37,000
Training 833 normal Analysis 2000 677
Webshell 118 attack Backdoor 1746 583
Hydra-SSH 148 attack DoS 12,264 4089
Java-Meterpreter 125 attack Exploits 33,393 11,132
Meterpreter 75 attack Fuzzers 18,184 6062
Hydra-FTP 162 attack Generic 40,000 18,871
Adduser 91 attack Reconnaissance 10,491 3496
Validation 4373 normal Shellcode 1133 378
Worms 130 44
SSH: Secure Shell; FTP: File Transfer Protocol. Total records 175,341 82,332

DoS: Denial of Service.


This dataset is one of the largest and more labeled existing
datasets in the cybersecurity field for botnet detection. The
aim of this dataset is to help the researcher to capture real
mixed botnet traffic. This dataset consists of 13 different Table 13. UNB-CIC android botnet dataset features.
scenarios, capturing different botnet samples. The details
of different botnet samples with their properties are Family Year of discovery No. of samples
described in Table 10. The advantage of using this dataset
for botnet detection research is that this dataset had been AnserverBot 2011 244
Bmaster 2012 6
prepared in a controlled environment as well as being DroidDream 2011 363
carefully labeled, ensuring fewer error data. Geinimi 2010 264
MisoSMS 2013 100
6.8. ADFA Linux datasets NickySpy 2011 199
Not Compatible 2014 76
Although in the field of NIDSs there are some existing PJapps 2011 244
yardstick datasets, such as DARPA, which was prepared Pletor 2014 85
more than a decade ago, they are not integrated with mod- RootSmart 2012 28
Sandroid 2014 44
ern computer characteristics. The aim of this dataset is to
TigerBot 2012 96
take the place of these standard datasets, reflecting all Wroba 2014 100
modern computer characteristics. Keeping this into mind, Zitmo 2010 80
in 2013, the Australian Defence Force Academy (ADFA)
at the University of New South Wales released the ADFA
Linux dataset. The dataset was prepared on an Ubuntu
Linux 11.04 host OS with Apache 2.2.17 running the File categorized into five different groups, namely flow, con-
Transfer Protocol (FTP), Secure Shell (SSH), MySQL tent, time, basic, and additionally generated features.
14.14, and PHP 5.3.5. The dataset consists of Linux-based Compared to other benchmark datasets, such as DARPA
system call traces of the normal and attack types. The and ADFD, the UNSW-NB15 dataset has many attack
details of the attack types are presented in Table 11. families that perfectly reflect modern low-footprint
attacks. A brief overview of attack types with their corre-
sponding data is given in Table 12.
6.9. UNSW-NB15 dataset
Another modern dataset with several attack families,
UNSW-NB15, was created by the Australian Centre for
6.10. UNB-CIC dataset
Cyber Security (ACCS). The ACCS in their Cyber Range The University of New Brunswick (UNB) publishes data-
Lab used the IXIA PerfectStorm (https://www.ixiacom. sets related to cybersecurity almost every year, collaborat-
com/products/perfectstorm) tool to create this dataset. This ing with the Canadian Institute for Cybersecurity (CIC;
dataset contains anonymized traffic traces from DDoS https://www.unb.ca/cic/datasets/index.html). These data-
attacks in 2007 for approximately 1 hour. This dataset sets are being used by many independent researchers, pub-
contains, in total, nine major types of attack. Moreover, lic and private industries, and universities around the
there are 49 attack features in the dataset. Argus and Bro- world. The available datasets that are currently available
IDS tools as well as 12 different models were used to have been categorized and are mentioned below based on
extract these features. All of these 49 features are the attack types.
38 Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 00(0)

Table 14. UNB-CIC botnet dataset features.

Botnet name Type Portion of flows in dataset

Neris IRC (Internet Relay Chat) 21,159 (12%)


Rbot IRC 39,316 (22%)
Virut HTTP (Hypertext Transfer Protocol) 1638 (0.94 %)
NSIS P2P (peer-to-peer) 4336 (2.48%)
SMTP Spam P2P 11,296 (6.48%)
Zeus P2P 31 (0.01%)
Zeus control (C&C) P2P 20 (0.01%)

6.10.1. Android botnet dataset. A large collection of not reflect current trends or are highly anonymized, and
Android botnet samples from the Android Genome others lack particular statistical features. Researchers at
Malware project, malware security blog, VirusTotal, and ISCX focused on these above-mentioned issues and also
well-known anti-malware vendors are accumulated in this worked on generating datasets dynamically instead of sta-
dataset. The dataset represents a total of 14 botnet fami- tic and one-time datasets. Moreover, they also focused on
lies. Moreover, the dataset has been enriched from 2010 to the aspects of modifiable, reproducible, and extensible
2014 to include 1929 samples, as depicted in Table 13. datasets. The UNB ISCX IDS 2012 dataset includes a full
packet payload in pcap format, including all labeled net-
work traces. This dataset is also publicly available for
6.10.2. Botnet dataset. Diversity in data is the key to simu- researchers.
late real traffic to an acceptable level in terms of botnet
detection approaches. To create a repository of botnet
datasets heterogeneous enough to simulate a real network, 6.10.5. Tor-nonTor dataset (ISCXTor2016). The researchers
the authors emphasize generality, realism, and representa- used seven traffic categories (browsing, email, chat, audio-
tiveness features of a dataset. The authors used overlay streaming, video-streaming, FTP, Voice over Internet
methodology, one of the popular methods for creating syn- Protocol (VoIP), peer-to-peer (P2P)) for Tor traffic, with
thetic datasets, to combine all different collections of data- their benign traffic from the VPN project for non-Tor. The
sets into one. Their dataset is divided into training and test researchers used Wireshark and tcpdump to capture the
datasets, including seven and 16 types of total botnet cate- traffic, generating a total of 22 GB of data. They facilitate
gories, respectively. Moreover, 43.92% and 44.97% of their labeling process for the outgoing traffic by collecting
malicious traffic flow is contained in the training and test- a set of pairs of pcap files at the workstation (for non-Tor)
ing datasets, respectively. A simple distribution of botnet and also at the gateway (for Tor), as detailed in Table 15.
types in the training dataset is given in Table 14.
6.10.6. UNB ISCX VPN-nonVPN dataset. The researchers
collected data over 14 categories (VoIP, VPN-VoIP, P2P,
6.10.3. CIC DoS dataset. The UNB-CIC created a dataset
VPN-P2P, etc.) in regular sessions and also a session over
with application-layer DoS attacks in the response adver-
VPN for the UNB ISCX Network Traffic (VPN-nonVPN)
sity of this type of dataset. Apache Linux v.2.2.22, PHP5,
dataset. To capture the data, the researchers used
and Drupal v.7 were used as a content management system
Wireshark and tcpdump and collected a total amount of 28
in the testbed environment. The authors focused on select-
GB of data. To expedite the labeling of the data, research-
ing the most common types of application-layer DoS
ers closed all unnecessary services and applications when
attacks in their dataset. The authors also intermixed their
capturing the network traffic. These datasets consist of
generated application-layer DoS attack with the attack-free
labeled network traffic in pcap and csv formats; these
traces from the ISCX-IDS dataset (where ISCX stands for
datasets are also available in public.
Information Security Centre of Excellence). In total, the
authors produced four different types of attack using differ-
ent tools, resulting in eight different types of application- 7. Challenges and future research
layer DoS attack traces. directions
Although years of research and development have resulted
6.10.4. UNB ISCX-IDS-2012. The scarcity of adequate data- in impactful cybersecurity systems, cyberspace security
sets creates inaccurate evaluation, comparison, and deploy- mechanisms against ever-evolving threats still have a long
ment of anomaly-based systems. Some datasets in IDSs way to go. In this section, we briefly describe a few open
are internal, some have privacy issues to share, some do issues and research directions.
Dasgupta et al. 39

Table 15. UNB-CIC Tor network traffic dataset content. samples, attacks, datasets, contents, or contexts in which
the model(s) have not been tuned or trained, thus leading
Traffic categories Content to low generalization capability, for instance, the perfor-
Web browsing Firefox and Chrome mance drop reported on novel malware samples in
Email SMPTS, POP3S, and IMAPS Yousefi-Azar et al.262 Similarly, the security model pro-
Chat ICQ, AIM, Skype, Facebook, and posed for ransomware attacks will not necessarily work
Hangouts for spyware-related attacks. Cybersecurity models are
Streaming Vimeo and YouTube often based on a priori known threats, which limits their
File transfer Skype, FTP over SSH (SFTP) and FTP
over SSL (FTPS) using Filezilla and an capacity in real-world domains, where the nature of
external service attacks is unpredictable (especially zero-day attacks).
VoIP Facebook, Skype and Hangouts voice Developing generalized mathematical security models by
calls considering a wide range of network infrastructure (e.g.,
P2P uTorrent and Transmission (Bittorrent) bandwidth, jitter), varying attacks (e.g., worm, virus), attri-
SSH: Secure Shell; FTP: File Transfer Protocol; SSL: Secure Sockets butes (e.g., bitrate, temporal information), contents (e.g.,
Layer; VoIP: Voice over Internet Protocol; P2P: peer-to-peer. data information), contexts (e.g., government, private),
and devices/terminals (e.g., mobile, cloud computers) will
crucially push the state-of-the-art in cybersecurity.
7.1. Performance evaluation framework Moreover, since collecting and predicting attacks are
remarkably challenging, developing new models to gener-
Cybersecurity system performance or accuracy lacks stan-
ate advanced attacks with varying strengths should also be
dardization as a whole, thereby raising the question, ‘‘What
explored in order to improve generalization ability.
is the best way to evaluate, configure or compare different
cybersecurity systems?’’ or ‘‘Do we have a universal metho-
dology to evaluate the robustness and performance under all 7.3. Security by design
or different scenarios?’’ There is a strong requirement of a
Each step in a conventional cybersecurity system design,
comprehensive performance evaluation framework to rank ranging from data collection to final detection/classifica-
current and future schemes. To this end, research and indus- tion (including feature extraction, selection, and security/
trial communities may explore three directions, that is, devis- accuracy evaluation), requires a revisit to consider the
ing tools and protocols with and without attack scenarios, presence of adversaries. Such an approach is usually
producing documents for standardized common criteria, and known as the security by design paradigm. For example,
developing an open online platform. For protocols and tools, feature extraction must have not only great generalization
participants should be urged to propose new performance capability but also feature vulnerability to attacks. Another
method-, matrix-, privacy-, and security-relevant error rates potential solution might be updating systems periodically by
as well as a unified scheme and common vocabulary to adding new features and/or samples; such solutions should
explain cybersecurity system performances, as the majority be computationally fast, efficient, and (if possible) auto-
of current metrics do not cover all aspects of various cyber- mated. Techniques like concept drift and reinforcement/
security domains. Also, different metrics have been utilized unsupervised/active learning might be useful in proactively
in different studies for the same or different domains, thus embedding ‘‘security by design’’ into the models.
making it difficult to compare the systems across domains.
Given the diverse and multivariate nature of cybersecurity
systems, common criteria for evaluating robustness, attack 7.4. Advanced machine learning algorithms for
sophistication, decision making, and policies would help cybersecurity
report baseline operations without giving a false sense of Quintessential cyberspace security systems are built on
progress. In addition, resources like large public databases, modeling non-linear adversary behaviors with untrust-
open-source software, and experimental setups will encour- worthy legacy functions and features, which makes them
age ‘‘reproducible research’’ on scalability and challenges in prone to over-fitting and lower overall reliability. In con-
real-world applications, since most reported results in trast, security systems based on advanced machine learn-
research publications are based on comparatively small lab- ing (AML) try to mimic adversary (with different content,
based datasets. contexts, and human thinking) instead of developing an
explicit model(s) of an attack or system. Comparatively,
limited works have been conducted on the use of AML,
7.2. Generalization to unknown such as open set recognition, dictionary learning, and DL
The security and performance of most cybersecurity sys- for cybersecurity. In the future, AML should be employed
tems degrade remarkably when they encounter novel for robust feature extraction/representation/selection/
40 Journal of Defense Modeling and Simulation: Applications, Methodology, Technology 00(0)

classification and finding temporal correlations within and 7.8. Hardware-based counter-measures against
between different adversaries and domains to attain higher cyber-attacks
interoperability and generalization.
Day-by-day cyber-attacks are expanding not only in num-
bers but also in sophistication, and the vast majority of
7.5. Robustness of DL-based security systems their counter-measures are software-based techniques.
Regardless of the high accuracies of DNN systems in a There is an emerging strategy in thwarting cyber-attacks,
variety of security tasks, recent studies have shown them that is, the (re)design of the micro-architecture of hard-
to be vulnerable to inputs with subtle perturbations. ware processors; for instance, the addition of hardware
Almost all kinds of DNNs (e.g., CNNs, MLPs, RNNs) can performance counters in central processing units (CPUs).
be fooled by adversarial examples.243 Thus, we can state However, this research topic is still in its infancy and the
that adversarial examples are a real threat to DL systems state-of-the-art is nascent. Moreover, the potential of
in the real-world. A few researchers have attempted to find hardware-based together with software-based counter-
the scientific reasons for DNNs’ vulnerability. For measures needs to be fully realized and investigated.
instance, in Goodfellow et al. the authors argued that the
linear nature of the DNN-classifier is the source of sus- 7.9. Interdisciplinary research
ceptibility. Anyway, the issue of security and privacy due
to adversarial attacks is compounded owing to the trans- To further cybersecurity technologies, the research and
ferability property, that is, the adversarial examples gener- industrial communities should promote and support inter-
ated against a neural network can also fool other neural disciplinary basic science research, including contributions
networks with discrete layouts. More systematic and dedi- from computer science, ML, psychology, and geometry, to
cated research efforts are needed to improve the DNN name a few. Multi-disciplinary joint researches would lead
robustness, as most defenses proposed in the literature to attain reliable, natural, and efficient methodologies
have been broken.247 All in all, there is ample room to against various security threats, attacks, and surfaces.
develop new adversarial examples of generation and
defense schemes. 8. Conclusion
Because of digital omni-connectivity and the ubiquitous
7.6. Privacy preserving in cybersecurity presence of small (e.g., smart-watches) to large computing
Recent works have experimentally demonstrated that ML- devices (e.g., smart metering systems), an enormous
and DL-based cybersecurity systems, including distributed, amount of data scaled from public to classified by individ-
decentralized, and privacy-preserving schemes, can leak uals and government organizations are being produced,
data and/or privacy. Moreover, the addition of privacy- processed, stored, and traded throughout cyber-enabled
preserving algorithms in the system affects the system’s networks. Therefore, securing the data and cyber networks
efficiency. Therefore, further research and development on has become of paramount importance for small-to-large
high-performing privacy-preserving technology without organizations as well as from individuals to nations.
affecting the accuracy of the systems is essential to find Nowadays, the use of ML to secure cyber-space has shown
productive solutions. great improvement by ensuring the robustness of a network
as well as maintaining the integrity of the data. On the
other hand, attackers have also figured out the adversarial
7.7. Encyclopedic datasets way of using ML to twist the performance of cybersecurity
Since cybersecurity is highly data-driven, there exist sev- measures, for example, the malware detection mechanism,
eral public cybersecurity datasets. Nonetheless, there is IDS, cyber identity detection, etc. Comparatively, limited
lack of large-scale comprehensive datasets that have prop- studies have been conducted to analyze the ML vulnerabil-
erties such as being fully labeled, structured, and complete ity issues and their corresponding defensive techniques.
(with all related information, e.g., original network Being responsive to that urge, we in this paper organized
dumps), having diversified normal samples, attacks, proto- the most recent (2013–2018) works related to cybersecur-
cols, domains, and usages, and that are real (without being ity, where ML is being used. In this comprehensive survey,
artificially generated traces in labs) and reliable (100% we presented the most commonly used ML algorithms in
correct labels). Data and user privacy, difficulties creating cybersecurity by considering the basics of the algorithms,
true labels, and access to an application’s real environ- DM techniques used in those algorithms, and the applica-
ments are some of the biggest hurdles in database creation tions. In addition, works on adversarial ML have been also
and public release. described in detail, including the robustness of DL against
Dasgupta et al. 41

attacks. A comprehensive overview of cybersecurity data- 13. Humayed A, Lin J, Li F, et al. Cyber-physical systems
sets is also presented. Finally, open issues, challenges, and security—a survey. IEEE Internet Things J 2017; 4: 1802–
future research directions have been provided for budding 1831.
researchers and engineers. 14. Wang D, Guan X, Liu T, et al. A survey on bad data injection
attack in smart grid. In: IEEE PES Asia-Pacific power and
energy engineering conference (APPEEC), IEEE, Kowloon,
Funding
China, 8–11 December 2013, pp.1–6.
This research received no specific grant from any funding 15. Deng R, Xiao G, Lu R, et al. False data injection on state
agency in the public, commercial, or not-for-profit sectors. estimation in power systems—attacks, impacts, and defense:
a survey. IEEE Trans Ind Informat 2017; 13: 411–423.
16. Bou-Harb E. A brief survey of security approaches for cyber-
ORCID iD physical systems. In: 8th IFIP international conference on
Zahid Akhtar https://orcid.org/0000-0002-5026-5416 new technologies, mobility and security (NTMS), IEEE,
Larnaca, Cyprus, 21–23 November 2016, pp.1–5.
17. Beasley C, Zhong X, Deng J, et al. A survey of electric
References power synchrophasor network cyber security. In: IEEE PES
1. Al-Garadi MA, Mohamed A, Al-Ali A, et al. A survey of innovative smart grid technologies, IEEE, Istanbul, Turkey,
machine and deep learning methods for Internet of Things 12-15 October 2014, pp.1–5.
(IoT) security. arXiv.org, vol. arXiv:1807.11023, 2018, 18. Ucci D, Aniello L and Baldoni R. Survey on the usage of
pp.1–42. machine learning techniques for malware analysis. ACM
2. Thomas J. Individual cyber security: empowering employees Trans Web 2017; 1: 1–34.
to resist spear phishing to prevent identity theft and ransom- 19. Ye Y, Li T, Adjeroh D, et al. A survey on malware detection
ware attacks. Int J Business Manag 2018; 13: 1–24. using data mining techniques. ACM Comput Surv 2017; 50:
3. Kwon D, Kim H, Kim J, et al. A survey of deep learning- 41:1–41:40.
based network anomaly detection. Cluster Comput 2017; 22: 20. Bazrafshan Z, Hashemi H, Fard SMH, et al. A survey on
949–-961. heuristic malware detection techniques. In: The 5th confer-
4. Tong W, Lu L, Li Z, et al. A survey on intrusion detection ence on information and knowledge technology, Shiraz, Iran,
system for advanced metering infrastructure. In: Sixth inter- 28–30 May 2013, pp.113–120.
national conference on instrumentation & measurement, 21. Souri A and Hosseini R. A state-of-the-art survey of mal-
computer, communication and control (IMCCC), IEEE, ware detection approaches using data mining techniques.
Harbin, China, 21–23 July 2016, pp.33–37. Hum Centric Comput Inform Sci 2018; 8: 1–3.
5. Gardiner J and Nagaraja S. On the security of machine learn- 22. Barriga J and Yoo SG. Malware detection and evasion with
ing in malware c & c detection: a survey. ACM Comput Surv machine learning techniques: a survey. Int J Appl Eng Res
2016; 49: 59:1–59:39. 2017; 12: 7207–7214.
6. Siddique K, Akhtar Z, Aslam Khan F, et al. KDD cup 99 data 23. Bontupalli V and Taha TM. Comprehensive survey on intru-
sets: a perspective on the role of data sets in network intru- sion detection on various hardware and software. In: 2015
sion detection research. Computer 2019; 52: 41–51. national aerospace and electronics conference (NAECON),
7. Siddique K, Akhtar Z, Khan MA, et al. Developing an intru- IEEE, Dayton, OH, USA, 15–19 June 2015, pp.267–272.
sion detection framework for high-speed big data networks: 24. Buczak AL and Guven E. A survey of data mining and
a comprehensive approach. KSII Trans Internet Inform Syst machine learning methods for cyber security intrusion detec-
2018; 12: 4021–4037. tion. IEEE Commun Surv Tutorial 2016; 18: 1153–1176.
8. Shanbhogue RD and Beena BM. Survey of data mining 25. Resende PAA and Drummond AC. A survey of random for-
(DM) and machine learning (ML) methods on cyber secu- est based methods for intrusion detection systems. ACM
rity. Ind J Sci Technol 2017; 10: 1153–1176. Comput Surv 2018; 51: 48:1–48:36.
9. Liu Q, Li P, Zhao W, et al. A survey on security threats and 26. KishorWagh S, Pachghare V and Kolhe S. Survey on intru-
defensive techniques of machine learning: a data driven sion detection system using machine learning techniques. Int
view. IEEE Access 2018; 6: 12103–12117. J Comput Appl 2013; 78: 30–37.
10. Bou-Harb E, Debbabi M and Assi C. Cyber scanning: a com- 27. Liu L, Vel OD, Han Q, et al. Detecting and preventing cyber
prehensive survey. IEEE Commun Surv Tutorial 2014; 16: insider threats: a survey. IEEE Commun Surv Tutorial 2018;
1496–1519. 20: 1397–1417.
11. Luh R, Marschalek S, Kaiser M, et al. Semantics-aware 28. Sultana N, Chilamkurti N, Peng W, et al. Survey on SDN
detection of targeted attacks: a survey. J Comput Virol Hack based network intrusion detection system using machine learn-
Techniq 2017; 13: 47–85. ing approaches. Peer-to-Peer Netw Appl 2019; 12: 493–501.
12. Shabut AM, Lwin KT and Hossain MA. Cyber attacks, coun- 29. Gupta S, Singhal A and Kapoor A. A literature survey on
termeasures, and protection schemes – a state of the art survey. social engineering attacks: Phishing attack. In: 2016 interna-
Author biographies

Dipankar Dasgupta is a full professor in the Department of Computer Science at the University of Memphis, Memphis, USA. His research covers broad areas of computational intelligence (including artificial intelligence (AI) and machine learning) for the design and development of intelligent solutions. He is one of the founding fathers of the field of AISs, making major contributions in developing tools for digital immunity and survivable systems. He has published a number of books and edited volumes including Advances in User Authentication (2017), Immunological Computation (2008), Artificial Immune Systems (1999), and another book on GAs (1996). His current research interests are bio-inspired computing, cybersecurity, and trustworthy AI.

Zahid Akhtar is an assistant professor in the Department of Network and Computer Security at the State University of New York (SUNY) Polytechnic Institute, USA. He received the PhD degree in electronic and computer engineering from the University of Cagliari, Italy. Prior to that he was a Research Assistant Professor with the University of Memphis, USA, and Postdoctoral Fellow with INRS-EMT, University of Quebec, Canada, University of Udine, Italy, Bahcesehir University, Turkey, and the University of Cagliari. His research interests include the areas of computer vision and machine learning with applications to cybersecurity, biometrics, affect recognition, image and video processing, and audiovisual multimedia quality assessment. He is also a senior member of the IEEE.

Sajib Sen is a graduate research assistant in the Department of Computer Science at the University of Memphis, Memphis, USA. His research interests include algorithms, AI, cybersecurity, and cyber–physical systems.