Alhaidari, 2021

Review
International Journal of Distributed Sensor Networks
2021, Vol. 17(3)
© The Author(s) 2021
DOI: 10.1177/15501477211000287
journals.sagepub.com/home/dsn

A simulation work for generating a novel dataset to detect distributed denial of service attacks on Vehicular Ad hoc NETwork systems
Fahd A Alhaidari and Alia Mohammed Alrehan
Abstract
Vehicular Ad hoc NETwork is a promising technology providing important facilities for modern transportation systems.
It has garnered much interest from researchers studying the mitigation of attacks including distributed denial of service
attacks. Machine learning techniques, which mainly rely on the quality of the datasets used, play a role in detecting many
attacks with a high level of accuracy. We conducted a comprehensive literature review and found many limitations in
the datasets available for distributed denial of service attacks on Vehicular Ad hoc NETwork, including the following:
unavailability of online versions, an absence of distributed denial of service traffic, a lack of representativeness of Vehicular Ad hoc
NETwork, and no information regarding the network configurations. Therefore, in this article, we proposed a novel
simulation technique to generate a valid dataset called Vehicular Ad hoc NETwork distributed denial of service dataset,
which is dedicated to Vehicular Ad hoc NETworks. Vehicular Ad hoc NETwork distributed denial of service dataset holds
information on distributed denial of service attack traffic considering Vehicular Ad hoc NETwork architecture, traffic
density, attack intensity, and node mobility. Well-known simulation tools such as SUMO, OMNeT++, Veins, and INET
were used to ensure that all the properties of Vehicular Ad hoc NETwork were captured. We then compared
Vehicular Ad hoc NETwork distributed denial of service dataset with several studies to demonstrate its novelty and evaluated
the dataset using several machine learning models. We confirmed that the studied models using this dataset achieved
accuracy above 99.5%, except for the support-vector machine, which achieved 97.3%.
Keywords
Vehicular Ad hoc NETwork, ad hoc network, distributed denial of service, machine learning, OMNeT++, Veins, dataset
Date received: 11 February 2021; accepted: 14 February 2021
Handling Editor: Ashish Kr Luhach
College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, Dammam, Saudi Arabia

Corresponding author:
Fahd A Alhaidari, College of Computer Science and Information Technology, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia.
Email: faalhaidari@iau.edu.sa

Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License
(https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work
without further permission provided the original work is attributed as specified on the SAGE and Open Access pages
(https://us.sagepub.com/en-us/nam/open-access-at-sage).

Introduction
Globally, car accidents represent a high proportion of road traffic deaths, as shown by a 2018 World Health Organization1 report on road safety, which reported a road traffic death rate of 18.2 per 100,000 population around the world and about 26.6 per 100,000 population in some regions. Thus, determining ways to utilize technologies to enhance and improve traffic safety is driven by a mandatory need to save people’s lives. In this context, several studies have proposed employing new technologies to reduce the traffic accident rate. Among these technologies are the Vehicular Ad hoc NETworks (VANETs), a very
popular and important technology due to its link to and impact on the safety of people and communities. The system is a combination of several wireless and sensor technologies including intelligent transport system (ITS), mobile ad hoc network (MANET), and an Internet of things (IoT) application.2
VANET comprises nodes that use wireless networking technologies as a means of communication. The vehicle nodes communicate with each other and with the road side units (RSUs) through a communication unit in each vehicle called the on-board unit (OBU), which in turn is connected to the application unit (AU) to provide an application interface.3
VANET supports both safety and non-safety applications. The main goal of its safety applications is to minimize accidents and improve driving safety by alerting drivers regarding collision avoidance, road sign notifications, and alarms for incident management. By contrast, non-safety applications are divided into two subsections: traffic coordination and infotainment applications. Traffic coordination leverages vehicular communications to broadcast traffic information between vehicles on the road; this optimizes traffic flow and improves driver experience. Infotainment applications aim to provide drivers with contextual information such as pertinent advertisements and parking assistance, in addition to entertainment during their journey.4 However, safety and non-safety applications are not completely separated from each other; hence, all aspects should be considered when designing VANET applications.5
In VANET, routing is a challenging factor owing to the unique characteristics of the network, especially the rapid mobility of nodes that causes quick variations in topology. In addition, diverging density and speed of the vehicles on the road may lead to either overhead or poor connectivity due to sparse distribution. The current VANET routing protocols are categorized based on topology, position, cluster, broadcast, and geocast-based protocols.6 The unique characteristics of VANET expose it to many threats that may compromise and corrupt the whole VANET system.7 Threats may originate owing to vulnerabilities such as those related to communication protocols,8 energy flow and authentication,9 integration,10 and information privacy and integrity.11 Among these attacks is the distributed denial of service (DDoS) attack, which aims to deny network availability by flooding either the vehicle, the infrastructure, or both, with spurious messages.12
There are two possible scenarios when targeting VANET with DDoS attacks: from vehicle to vehicle and from vehicle to infrastructure.13 The first scenario occurs when a number of attacker vehicles target a vehicle by flooding it with fake messages, so that the victim vehicle is unable to send/receive legitimate requests. The second scenario occurs when a number of attacker vehicles target the RSU, so that it becomes unavailable to legitimate nodes. Figure 1 explains a vehicle to infrastructure DDoS attack, where the red vehicles represent attackers targeting the RSU2 by flooding it with fake messages, eventually disabling it.

Figure 1. Vehicle to infrastructure (V2I) DDoS.

In this article, we explored several studies related to DDoS attack detection in VANET systems to highlight the existing limitations in the datasets used as a basis for training the prediction models. Based on the found limitations and gap, we proposed and conducted a simulation framework to generate a suitable dataset that fits the VANET architecture considering a common routing protocol in VANET, namely, ad hoc on-demand distance vector (AODV). Moreover, the generated dataset is used to evaluate numerous machine learning (ML) models for the purpose of detecting DDoS attacks targeting VANET nodes.
The organization of this study is as follows: we introduce the literature review in section "Literature review" with a focus on the studies that involve the usage of datasets in evaluating their DDoS attack detection techniques. The simulation work is presented in section "Simulation work for generating the dataset." The process of generating and adopting the dataset is presented in section "Dataset specifications." Finally, section "Conclusion" summarizes the main points as a conclusion, and section "Future work" presents the future work.

Literature review
Several studies have been proposed to mitigate the effects of DDoS attacks targeting VANET or ad hoc networks. Some of the studies used statistical methods to detect the attack and others relied on ML techniques to detect and classify the attack traffic. Others still introduced frameworks on generating datasets for evaluating network attack detection techniques. Figure 2 shows the hierarchy and taxonomy of the explored studies in this article. We categorized the studies into six classes based on
network environment (VANET or non-VANET), the inclusion of DDoS attack, the presence of a dataset (dataset-based or statistical method), and the evaluation method (ML-based or generation framework).

Figure 2. Categories of studies investigated in the literature review.

The obtained classes as shown in Figure 2 are as follows: (1) Category 1: studies that used statistical methods rather than ML techniques for detecting attacks on networks.13–20 (2) Category 2: studies that used ML techniques trained on datasets of VANET to evaluate detection of network attacks including DDoS attacks.21–25 (3) Category 3: studies satisfying the filtering criteria for Category 2, but not including DDoS attacks.26–34 (4) Category 4: studies that used ML techniques trained on datasets to evaluate detection of different attacks including DDoS, but not on a VANET environment architecture.35–40 (5) Category 5: studies introducing frameworks for generating datasets for the VANET environment.41,42 (6) Finally, Category 6: studies introducing the generation of datasets for detecting attacks but not considering the specifications of VANETs.43,44

Statistical-based studies for DDoS attack
The first category of these studies includes statistical-based techniques for detecting DDoS/denial of service (DoS) attacks targeting smart communication networks such as VANET.13–15 However, these studies do not use datasets for validating the proposed techniques; they rely only on statistical methods.
In Kolandaisamy et al.,13 multivariant stream analysis (MVSA) was proposed as a method to detect and prevent DDoS attacks targeting VANET. Similarly, Haydari and Yilmaz14 used a statistical anomaly detection technique to detect the attack by applying an online discrepancy test (ODIT) at the detection phase for both low- and high-rate DDoS attacks and then blocking the traffic generated by the attackers based on their locations.
The study presented in Shabbir et al.15 proposed a threshold-based framework for detecting and preventing attacks by utilizing communication time as a communication characteristic to be compared with a specific threshold to decide about alerting the other nodes to avoid further communication with the attacker nodes. This type of study was excluded as it did not involve datasets for training the proposed models.
Mirsadeghi et al.16 introduced a cryptography method based on certificates issued by a trusted authority. To have a trusted clustered vehicular network, they proposed estimating a trust degree for each node considering the trust between vehicles and RSUs. Then, based on the estimated trust degree and other mobility measures, the appropriate cluster head is selected which in turn checks the trust degree of abnormal nodes. Any abnormal nodes will be in the blacklist of the certification authority and thus unable to communicate with either other vehicles or control units.
Bhushan and Gupta17 discussed the features of software-defined networking (SDN) and proposed a novel flow table sharing technique to mitigate DDoS attacks that target the network by overloading the flow table, which usually has a limited size. For the proposed approach, they modeled the flow table space as an M/G/S/C queuing model and then applied several rule-based methods to detect and prevent DDoS on SDNs by utilizing the flow table status for all the switches and the blacklist database that holds Internet protocol (IP) addresses of attack sources.
Kolandaisamy et al.18 proposed an analysis model that is capable of detecting DDoS attacks on a VANET
environment with less time needed to identify the attack compared with other techniques discussed in their related work. The main idea is to calculate different measures through different stages as follows. Based on the clustering score of the incoming packets, a stream position analysis was used to calculate specific determined features for the nodes including the volume of the communicated data, payload, and message rates. Then, these computed features were used to calculate the conflict field, conflict data, and attack signature sample rate, which are finally used in a statistically based model to decide on the legitimacy of a node.
Kolandaisamy et al.19 proposed using an analytical approach that utilizes the measures gathered by packet marking based on an adapted stream region scheme. The proposed approach involves extracting the neighbor log file, calculating the node’s value, deciding on the source region, identifying the routes for each region, and computing the circulation rate. Finally, the identification of a DDoS attack is based on the deviation of the circulation from the current rate. Similarly, Bensalah et al.20 proposed a statistical method for detecting and controlling malicious nodes in a VANET using a variable control chart as a model to monitor the quality of the communication for each node. Then, based on the taken measurements, a node is considered a malicious node when its statistical quality violates the control limit.

ML-based studies for DDoS attacks on VANET
ML techniques are promising techniques for providing accurate detection and prediction mechanisms used in many areas and domains including VANETs.45 There are several ML studies applied to detect numerous malicious behaviors on VANETs. Since our focus in this study is on the dataset that fits the VANET environment, we explored several studies that implemented ML techniques to detect DDoS attacks on VANET systems.21–25 Moreover, we have highlighted the datasets, simulation tools, and ML techniques used in such studies.
Singh et al.21 conducted an analysis on the impact of DDoS attacks on the vehicle to infrastructure (V2I) communication under an SDN architecture. They simulated Software-Defined Vehicular Networking (SDVN) using Mininet-WiFi and the scikit-learn library for ML classifiers. In addition, eight supervised classifiers were used including gradient boost, random forest (RF), logistic regression (LR), nearest neighbors, decision tree (DT), Support-Vector Machine (SVM), naïve Bayes (NB), and neural network. The gradient boost classifier gave the best accuracy among the models used. However, the study did not provide details of the generated dataset or the simulation scenarios such as the number of attacker nodes and the attack rate.
Aneja et al.22 introduced a hybrid Intrusion Detection System (IDS) to detect the RREQ Flooding attack in a VANET environment where they used SUMO, MOVE, and NS-2 tools in their conducted simulation experiments. They combined Artificial Neural Networks (ANNs) with a Genetic Algorithm (GA) model as a detection model where ANN performs the classification and GA tunes the selected input features. The dataset generated in Aneja et al.22 has a detailed description of the simulation tools used along with the related steps and parameters. However, it has certain limitations: the dataset itself is not available for other researchers, the network configuration was not presented in the study, and there is no report on the dataset features.
The dataset generated in Karagiannis and Argyriou23 is not available online, the study did not show a clear procedure on processing the data, and there is no report on either the simulation tools or the network configuration parameters. Similarly, the dataset introduced in Belenko et al.24 is not available online and has no description of the configuration of the network and simulation environments. Thus, the generated datasets introduced in Singh et al.,21 Aneja et al.,22 Karagiannis and Argyriou,23 and Belenko et al.24 have been excluded from this study owing to the unavailability of both the dataset and the configuration of simulation work used to generate the datasets.
The study in Zeng et al.25 proposed a deep learning technique as an IDS that can perform feature extraction and classification of different attacks on VANET including DDoS, wormhole, and Sybil attacks. The dataset used to train the detection models was generated using an NS-3 simulator considering only the raw packets and logs as the output from the simulator. In addition, the ISCX IDS dataset46 was used to regenerate and extract samples for different types of attacks such as DDoS attacks. The main observed limitations of the generated dataset are related to the generalization of the configuration parameters related to VANETs as well as the lack of enough scenarios on the experimental models. Moreover, the ISCX IDS dataset is a traditional dataset for IDSs and is not designed to capture the structure and features of VANETs.

ML-based studies for malicious attacks on VANET
Many studies have presented different ML techniques trained on datasets for detecting different types of attacks; however, they have not included or considered DDoS attacks.26–34
Ghaleb et al.26 used an ANN model to detect malicious traffic in VANET. They trained the ANN on a next generation simulation (NGSIM) dataset using MATLAB tools, and the results showed an accuracy of 99%. They used real traffic along with injected dynamic noises to
generate a dataset that had many attacks. However, there were no DDoS attacks; moreover, the dataset did not give any details on the network configuration.
Another study in Li et al.27 presented the usage of SVM to detect nodes with suspicious behavior in a VANET environment by considering several input parameters such as the movement speed and transmission range. The dataset was generated by the GloMoSim simulation framework. However, the study did not report the dataset specification, nor is the dataset available online.
Grover et al.28 conducted an experimental work using the NCTUns-5.0 simulator to generate a dataset that was used to train and evaluate several ML techniques on detecting malicious nodes in VANET where Weka had been used to evaluate the specified classifiers. Although it presented the simulation work to generate the dataset, it did not explain the procedure. Moreover, it did not involve a DDoS attack in the generated dataset.
Ali Alheeti et al.29 proposed a smart security framework to protect the outside communication system for autonomous and semi-autonomous vehicles by detecting gray hole and rushing attacks in real time. They simulated the environment using SUMO, MOVE, and NS-2 which generates a trace file to produce a set of features that can be used to differentiate legitimate from malicious behavior. Two ML classifier algorithms were applied, SVM and FeedForward Neural Networks (FFNNs), and the results showed that the FFNN model had a lower false negative rate than SVM. However, SVM showed a high performance in terms of detection time and is faster than FFNN. To summarize, a detailed procedure for generating a dataset was presented, but without involving DDoS attack traffic. Moreover, there was insufficient information regarding network configuration parameters.
Aloqaily et al.30 proposed a framework called D2H-IDS as an IDS in vehicle nodes connected through a cloud network. The effectiveness of this solution was validated through simulations where they generated normal traces using NS-3 and NSL-KDD dataset47 for generating several attacks including a DoS attack. The features were selected by applying a Deep Belief Network (DBN), and a DT was used for the classification of attacks where results showed high accuracy and low false rates. However, generally, the datasets proposed and generated in Li et al.,27 Grover et al.,28 Ali Alheeti et al.,29 and Aloqaily et al.30 did not fulfill the required criteria due to the unavailability of the online dataset, not reporting the network configurations, and not considering DDoS attacks as a part of the generated dataset.
Singh et al.31 generated a synthetic dataset using an NS-3 simulator and mobility traces produced by a SUMO traffic simulator. In the network simulator, they used about 40 vehicles assuming movements of fixed speed during the simulation time. The simulation was designed to generate the traffic holding features of a wormhole attack. Then, the data were preprocessed and used as a dataset for training both K-nearest neighbor (KNN) and SVM models for detecting wormhole attacks. However, they presented no details on the procedure and methodology of simulating the environment and generating the dataset. Moreover, the dataset can only be used for wormhole attacks.
A study by Singh et al.32 proposed using SVM and LR as ML techniques to detect false position data generated by malicious VANET nodes, known as a false position attack. The evaluation of the detection model was conducted on the VeReMi dataset,41 and the results showed a high accuracy of about 97%. However, the dataset does not involve traces for DDoS attacks.
The VeReMi dataset has been used in Gyawali and Qian33 to validate different ML techniques (LR, K-nearest, DT, bagging, and RF) on detecting misbehavior attacks including both false alert generation and position falsification attacks. Similarly, a study presented in So et al.34 proposed ML-based techniques (KNN and SVM) for detecting misbehavior attacks on VANETs using the VeReMi dataset for training the proposed models. However, the evaluation and prediction of DDoS attacks were not a part of their studies; moreover, the dataset does not involve traces for such attacks.

ML-based studies for DDoS attacks on non-VANET
Several studies have been proposed for detecting DDoS attacks using either existing datasets or simulations to generate their own datasets for the purpose of training and validating different ML classifiers.35–40 The main limitation of these studies is that the datasets used do not capture the characteristics of VANETs or the environment.
Kim et al.35 used the KDD CUP 1999 intrusion detection dataset to train SVM for detecting several attacks including DDoS attacks, and the results showed an effective classification of different attacks with an accuracy of about 85%. Although the KDD CUP 1999 used in Kim et al.35 is available online and contains DDoS attacks, it was not designed for VANETs.
Yu et al.36 proposed a framework to detect DDoS attacks on SDVN environments by implementing three different detection models including a trigger detection model for inbound packets, a flow table feature-based detection model utilizing the features of OpenFlow protocol, and an attack detection model based on SVM. They used a combination of real and generated network traffic to generate a dataset considering different types of DDoS attacks using the Scapy and hping3 tools, and simulation results showed an accuracy of greater than
97%. However, the dataset used is not available online, and the simulation was conducted on virtual machines, which do not reflect VANET characteristics as nodes are mobile in VANET and at different speeds resulting in quick changes to the topology.
Luong et al.37 presented a simulation work to generate a training dataset to be used with KNN classifiers for detecting flooding attacks on MANETs by considering the frequency of route request packets. Similarly, a study presented in Reddy and Thilagam38 conducted a simulation work using Network Simulator (NS-2) for ad hoc networks to evaluate a proposed DDoS attack mitigation technique that relies on the usage of an NB classifier. Gao et al.39 proposed an IDS for DDoS attacks using an RF classifier utilizing big data technologies such as Spark and Hadoop distributed file system for implementing the proposed approach. They used both NSL-KDD47 and UNSW-NB1548 datasets for evaluating the proposed method for detecting DDoS attacks. However, the datasets used in Luong et al.37 and Reddy and Thilagam38 were generated by considering only general ad hoc network characteristics, and are not available online. Similarly, the NSL-KDD and UNSW-NB15 datasets used in Gao et al.39 include traffic for DDoS attacks, but are not properly representative of VANET as they were designed for general network traffic and thus do not capture the characteristics of a VANET environment.
Ali Alheeti and McDonald-Maier40 proposed a hybrid IDS for detecting malicious intrusion attacks such as DDoS and network scanning attacks on autonomous vehicles. They used multi-layer perceptron (MLP) with fuzzy logic techniques trained on the Kyoto dataset,49 showing an accuracy of 99%. Although the Kyoto dataset holds traffic and features from real network communications, it does not have traffic obtained from VANET communication systems that use different types of protocols and work on a specific style of communication.

Framework-based studies for generating datasets
The last group of studies explored and evaluated in this study consists of studies that presented frameworks for generating datasets that can be used for training and evaluating intrusion detection techniques against several types of attacks. Some of these studies were dedicated for VANET41,42 and others were not.43,44
Lyamin et al.42 proposed a heuristic approach derived from data mining methods for real-time detection of radio jamming DoS attacks in a VANET communication environment. To train the proposed detection model, they conducted a simulation experiment using MATLAB to generate a dataset holding cooperative awareness message (CAM) transmissions in the IEEE 802.11p protocol. However, the training model focused only on the CAM transmission, leading to short training sequences of 100 s. Moreover, details on the configuration of the simulated environment and the obtained traces are not available online for further studies. A recently published dataset on VANET environments presented in Van der Heijden et al.41 held many misbehaviors and attacks but did not include DDoS attacks, which are the focus of this study.
A study presented in Damasevicius et al.43 proposed a dataset called LITNET-2020 generated using LITNET NetFlow topology and holding different attacking scenarios including DoS, DDoS, worms, land, and fragmentation attacks. They considered data flow for different protocols including IPv6, transmission control protocol (TCP), user datagram protocol (UDP), and Internet control message protocol (ICMP). The dataset is available online and the study provided a description about the dataset features and the network configuration parameters. However, the proposed dataset is not dedicated for VANETs as it does not capture the properties of VANETs nor the VANET protocols.
A framework was proposed in Al-Hadhrami and Hussain44 for dataset generation that can be used to train and validate IDS models on IoT networks. The dataset is called IoT-DDoS and involves different types of traffic including normal traffic, flooding attacks, selective forwarding attacks, and blackhole attacks considering different protocols such as the RPL routing protocol, ICMPv6, IEEE 802.15.4, 6LoWPAN, and UDP. However, the framework does not take into account the specifications and protocols of VANETs.
As a summary, we believe that there are limited existing solutions for detecting DDoS attacks in VANETs. The limitations are due to the lack of real or synthetic DDoS datasets designed or generated for VANET environments. Furthermore, applying traditional network solutions on VANETs without considering the VANETs’ unique characteristics may lead to inaccurate results. Even though some studies have generated datasets considering the VANET environment, they did not illustrate the features available in their datasets and others did not demonstrate the methods they followed to generate the datasets. Thus, it is difficult for other researchers to utilize these datasets as well as to validate and compare their results with such solutions. Consequently, we believe that these datasets cannot be used for further studies owing to one or more of the following reasons: (1) dataset is not available online, (2) dataset does not contain a DDoS attack, (3) dataset is not designed for VANET environments, and (4) information regarding the network configuration is unavailable. Table 1 compares these explored datasets and studies based on the fulfillment of these four criteria. Moreover, the table presents other aspects of the evaluated studies including the type of attacks being
Table 1. Comparison of different datasets used for attack detection in VANET.
Reference | Used dataset | Online availability | Involving DDoS attack | Dedicated for VANET | Network configuration availability | Involved attacks | Trained models | Performance ratio
Singh et al.21 Generated dataset ß ß DDoS RF, DT, SVM, Above 90%
boosting, others
Aneja et al.22 Generated dataset ß ß Flooding attack ANN 99%
Karagiannis and Argyriou23 Generated dataset ß ß RF jamming attack K-means NA.
Belenko et al.24 Generated dataset ß ß Including DDoS None NA.
Zeng et al.25 Generated dataset for ß ß DoS, DDoS, blackhole, CNN, LSTM 96.9%
VANET and ISCX IDS wormhole, Sybil
Ghaleb et al.26 NGSIM ß ß Malicious nodes ANN 99%
Li et al.27 Generated dataset ß ß ß Malicious nodes SVM Above 95%
Grover et al.28 Generated dataset ß ß ß Malicious nodes NB, J48, RF 97%
Ali Alheeti et al.29 Generated dataset ß ß ß Gray hole SVM, FFNN 90%
Aloqaily et al.30 NSL-KDD for ß ß ß DoS, Probe, R2L, U2R Deep belief, DT 99.43%
attack traffic
Singh et al.31 Generated dataset ß ß Wormhole attack KNN and SVM 99%
Singh et al.32 VeReMi ß Position falsification LR and SVM 97%
attack
Van der Heijden et al.41 Generated dataset ß Position falsification Non-machine Close to 1
(VeReMi) attack learning
Gyawali and Qian33 VeReMi ß False alert, position LR, K-N, DT, 97%
falsification bagging, RF
So et al.34 VeReMi ß Location spoofing KNN and SVM 94%
Kim et al.35 KDD CUP 1999 ß ß DoS attack SVM 85%
Yu et al.36 Generated dataset ß ß ß DDoS SVM 98.56%
Luong et al.37 Generated dataset ß ß ß Flooding attack KNN Above 99%
Reddy and Thilagam38 Generated dataset ß ß DDoS NB 80%
Gao et al.39 NSL-KDD and ß DDoS RF, SVM, NB 99.9% and 98.7%
UNSW-NB15
Ali Alheeti and McDonald-Maier40 Kyoto dataset ß ß Malicious intrusion MLP 99%
Lyamin et al.42 Generated dataset ß ß ß Radio jamming DoS Heuristic 95%
approach
Damasevicius et al.43 LITNET-2020 ß DDoS, worms, land, and None NA
others
Al-Hadhrami Generated dataset ß ß ß Flooding, forwarding, None NA
and Hussain44 called IoT-DDoS blackhole attacks
VDDD Generated dataset DDoS J48, SVM, ANN, 99.7%
KNN, RF, NB
VANET: Vehicular Ad hoc NETworks; RF: random forest; DT: decision tree; SVM: support-vector machine; ANN: artificial neural network; NA: not applicable; IDS: intrusion detection system; CNN:
convolutional neural network; LSTM: long short-term memory; NGSIM: next generation simulation; NB: naïve Bayes; FFNN: feedforward neural network; LR: logistic regression; K-N: K-nearest; KNN:
K-nearest neighbor; MLP: multi-layer perceptron; VDDD: Vehicular Ad hoc NETwork distributed denial of service dataset.
Table 2. Simulation tools and versions used in this work.
Category | Tool | Version
Network interface | OMNeT++ | V 5.1
Model library | INET | V 3.6
Network mobility framework | Veins | V 4.7
Traffic generator | SUMO | V 0.30.0
Machine learning evaluation | Weka | V 3.8.3
Operating system | Windows | Windows 7 64 bits
considered, the detection techniques used, and the percentage of accuracy achieved.
To solve the limitations of the existing approaches
discussed here, we aim to generate a novel dataset that
can be used for DDoS attack detection in VANET
environments called VDDD, which can be used by the
researchers working in this field. To make our dataset
available for others, we present with enough details the
procedure we followed for generating this dataset start-
ing from the selection of the simulation tools until the
generation of the complete dataset in the VANET envi- Figure 3. Simulation map showing the King Fahad Highway.
ronment. Moreover, we analyze and evaluate the qual-
ity and performance of the generated dataset by Ethernet, IEEE 802.11, and many other protocols and
applying commonly used ML techniques. components.
Veins (vehicles in network simulation) is an open-
source framework for running vehicular network simu-
Simulation work for generating the lations. It allows dynamic interaction between
OMNeT++ and SUMO through implementing TraCI
dataset
(traffic control interface). Veins was selected due to the
When simulating a VANET environment with many unique features it has such as supporting realistic maps,
vehicles conceivably broadcasting several messages per realistic traffic, and the use of different protocols
second, the selection of simulation tools becomes a cru- including the routing protocols provided by INET.
cial task. Some important parameters should be consid- Finally, SUMO is an open-source traffic generator
ered such as user-friendliness, scalability, and the ability which creates mobility scenarios on real road maps
to connect network communication simulators and based on user-specified parameters. SUMO provides
road traffic. We implemented our model using a combi- an application programming interface (API) called
nation of four frameworks: OMNeT++,50 SUMO,51 TraCI, which stands for traffic control interface. TraCI
INET,52 and Veins.53 The simulation tools and versions allows accessing and synchronizing retrieved values of
used in this study are shown in Table 2. To create a rea- the SUMO simulated scenarios by managing the TCP
listic testbed, we simulated the traffic on King Fahad connections under the client/server architecture.
Highway, which is located in the Eastern Province of The selection of the aforementioned simulation tools
the Kingdom of Saudi Arabia, and connects between was a result of evaluating various simulation tools in
two cities Dammam and Al-Khobar, as shown in two stages. The first stage was exploring the simulation
Figure 3. tools in the literature review and their features as well
OMNeT++ records simulation results as scalar val- as how the researchers use these tools to simulate their
ues, vector values, and histograms. In addition, it sup- network. In the second stage, we selected some tools
ports exporting the results to different formats based on the following criteria: model development,
including Python files, SQLite files, and others. INET customization, ability to generate different types of net-
Framework is an open-source model library for the work traffic, scalability, integration with other tools,
OMNeT++ and it supports the implementation of and capability to record and analyze the generated
many transport layer protocols such as TCP, UDP, events. The combination of the used frameworks satis-
and stream control transmission protocol (SCTP). In fies all requirements that we need to simulate VANET
addition, it supports wired/wireless interfaces like including normal and UDP flooding attacks as well as
Figure 4. Proposed work diagram.
recording all the traffic events within the configured geo-coordinates to metric coordinates of the OSM map;
environment. these metric coordinates are utilized in the next step by
SUMO. The following command does this task:
Overview of proposed work netconvert - -osm-files *.osm -o *.net.xml
The main idea of the proposed work is to simulate a
VANET environment considering both normal and Besides the network file, we need to consider the
DDoS traffic for the purpose of generating a synthetic obstacles found within the scenario such as buildings
dataset based on several simulated scenarios to be used and parks. OSM files have the advantage of providing
as an input for ML methods. The proposed work such information in addition to other information like
involves three main stages as shown in Figure 4. streets, lanes, junctions, and the maximum speed for
Starting from the bottom, the first stage is to generate each street. We used the poly-convert utility to generate
realistic network mobility traffic using SUMO. The sec- a poly file, which can be used in Veins to identify all
ond stage is to import the SUMO mobility traffic into the obstacles using the following command:
OMNeT++ to generate the network traffic (normal
and DDoS) utilizing both Veins and INET. The final polyconvert - -net-file *.net.xml - -osm-files *.osm - -type-
stage is to for collect and prepare the dataset that will *.xml -o *.poly.xml
be used for evaluating and studying the performance of
several ML algorithms. After generating the obstacles file, the SUMO net-
We started using SUMO to prepare the network and work is established and we proceed to generate the net-
generate the traffic. The first step is to export the simu- work traffic. There are two options to generate traffic
lation area which is the King Fahad Highway from for the vehicles in SUMO. The first one is to generate a
OpenStreetMap (OSM).54 Then, the OSM file is pro- random trip and the second one is to design a custom
cessed with SUMO’s net-convert utility that transforms trip in a specific route. In this study, we selected the
second choice and assumed that all vehicles considered in our scenarios cross the King Fahad Highway from Dammam to Al-Khobar. To simulate traffic in our network, we generated four files as follows:

1. Traffic Analysis Zone (TAZ) file which contains the edges for our route.
2. Origin/Destination (OD) matrix file that includes the origin point, the destination point, and the number of vehicles passed while taking the route.
3. Od2trips file that takes the TAZ and OD files as an input. Before generating the fourth file, we combined these three files to generate od_file.odtrips.xml, by running the following command:

od2trips -c PATH\od2trips.config.xml -n PATH\taz_file.taz.xml -d PATH\OD_file.od -o PATH\od_file.odtrips.xml

4. SUMO configuration file that takes both the network and the OD trip files as input and then generates the route file as an output. By applying the following command, we generate trips and route files. The trip file contains the trip for each vehicle and other information like departure time and speed. Route files look like trip files except that the route file contains all the intermediate edges from origin to destination.

duarouter -c PATH\duarcfg_file.trips2routes.duarcfg

In OMNeT++, we imported two frameworks, INET and Veins, as shown in Figure 4. Veins uses protocols and applications provided by INET to simulate both normal and attack traffic. INET provides and supports different transport layer protocols like TCP, UDP, and SCTP. Moreover, it provides several routing protocols that can be used within the simulation. In this work, we used UDP as it is widely used in VANET owing to its ability to rapidly transport data compared to TCP.55 Two types of UDP applications were used in this work: the first one is UDP Basic App and the second one is the UDP sink. UDP Basic App sends UDP packets to the given IP addresses in each time interval where the IP address could be a Wireless Local Area Network (WLAN) or a node IP. The UDP sink App binds a UDP socket to a given local port and prints the received packets' information such as the source, destination, and length of the packet.

Table 3. Simulation parameters.
Parameter | Value
Routing protocol | AODV
PHY model | IEEE 802.11p
Channel | Wireless
Mobility scenario | Highway (18 km)
Threat | DDoS
Transport protocol | UDP
Vehicle communication range | 550 m
RSU communication range | 600 m
Packet size | 100 bytes
RSU | 3
Number of vehicles | 20, 60
Speed | Maximum of road speed
Number of attackers | 2
Attack duration | 25 s
Attack rate | 10 and 50 pps
Normal rate | 1–5 pps
Run time | 500 s
AODV: ad hoc on-demand distance vector; DDoS: distributed denial of service; UDP: user datagram protocol; RSU: road side unit.

Figure 5. KingFahadHighway.launchd.xml.

In the Veins subproject, we edited and built the simulation in three steps to meet our scenario's parameters as shown in Table 3. The first step is to replace the square files (square.net.xml, square.poly.xml, and square.rou.xml) with the SUMO files we generated previously, which have the same extensions. Figure 5 shows the contents created as a result of this first step. The second step is to edit the "scenario.ned" file to meet our scenario's parameters and other required network configuration like the life cycle Controller which manages general
operations such as shutdown, restart, suspend, and crash. Figure 6 shows the design of "scenario.ned." Furthermore, we have added the AODV routing protocol to the vehicle node "car.ned" to be connected with the network layer as shown in Figure 7. The final step is to simulate both the normal and DDoS traffic through the usage of "omnetpp.ini."

Figure 6. Scenario design.
Figure 7. Internal node design.

OMNeT++ provides all the requirements to simulate different security attacks. Several researchers used OMNeT++ to simulate different types of DDoS attacks in traditional networks.56–58 In this work, we generated normal and DDoS traffic for VANET scenarios. For the normal traffic, each node (vehicle or RSU) broadcasts UDP packets at a rate of 1–5 packets per second. Conversely, the DDoS traffic is based on two key attributes: attack intensity and the number of attacker nodes. The attack intensity is either 10 or 50 packets per second (pps) and the number of attackers is either 2 or 6 according to the designed scenario.

Figures 8 and 9 show the configuration parameters for both UDP normal traffic and DDoS attack traffic, respectively. The parameters include the IP addresses, port numbers, the multicasting group, start time, end time, and the traffic rate.

Figure 8. Configuration parameters for UDP normal traffic.
Figure 9. Configuration parameters for DDoS traffic.

Scenarios

In this study, the implemented topology includes three RSUs and N vehicles along a highway of 18 km, where we considered a low traffic density of 20 nodes (N = 20) and a high traffic density of 60 nodes (N = 60). For each density, we used two levels of attack rate, 10 and 50 pps, resulting in four different scenarios. In addition, we configured one of the RSUs to be the victim unit targeted by the attack traffic.
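As a rough illustration of how the four scenarios arise from two traffic densities and two attack rates, the following Python sketch enumerates them and computes an upper bound on the number of attack packets in each. The attacker counts per density (2 and 6) and the 25 s attack duration are taken from Table 5; the function and variable names are hypothetical, and the bound assumes that every attack packet sent is also recorded.

```python
from itertools import product

ATTACK_DURATION_S = 25                 # attack duration reported in Table 5
ATTACKERS_BY_DENSITY = {20: 2, 60: 6}  # attackers per vehicle count, read off Table 5


def build_scenarios():
    """Enumerate the density x attack-rate grid (hypothetical helper)."""
    scenarios = []
    for vehicles, rate_pps in product((20, 60), (10, 50)):
        attackers = ATTACKERS_BY_DENSITY[vehicles]
        scenarios.append({
            "vehicles": vehicles,
            "attack_rate_pps": rate_pps,
            "attackers": attackers,
            # Upper bound: every attacker sends at the full rate for the whole window.
            "expected_attack_packets": attackers * rate_pps * ATTACK_DURATION_S,
        })
    return scenarios


for number, scenario in enumerate(build_scenarios(), start=1):
    print(number, scenario)
```

Under these assumptions the bounds are 500, 2500, 1500, and 7500 attack packets; the attack instance counts reported in Table 6 equal the bound for the first three scenarios and fall slightly below it (7260) for the fourth.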
Normal and attack traffic was generated using OMNeT++ where each node broadcasts requests to all reachable nodes. All nodes send normal packets in a random manner where the transmission interval is between 1 and 5 s. The attack traffic was generated by specific vehicles to target the victim RSU with two different rates (10 and 50 pps).

We designed these scenarios with their related parameters based on recent studies, which simulated a VANET environment either to study the impact of some attacks or to generate a dataset for VANET environments. Table 4 shows the simulation parameters used by several studies, from which we adapted our parameters shown in Table 5.

Table 4. Simulation parameters used by recent studies.
Reference | Simulation time (s) | Number of attackers | Number of vehicles | Number of RSUs | Transmission rate | Speed
Li et al.27 | 900 | 5, 10, 15, 20, 25, 30, 35, 40 | 50, 100, 200 | – | – | 5, 10, 20, 30 m/s
Ali Alheeti et al.29 | 499 | 4 | 40 | 9 | – | 30 m/s
Aloqaily et al.30 | 600 | – | 40 | – | 8 pps | 20 m/s
Aneja et al.22 | 200 | 2 | 20 | – | – | 30 m/s
Belenko et al.24 | 100 | 2 | 30 | – | 250 Kbps | 30 m/s
Haydari and Yilmaz14 | 200 | – | 250 | – | 1 pps | –
Siddiqui and Boukerche59 | 120 | 1 | 20, 60 | 3 | 10, 50, 100 pps | 15 m/s
RSU: road side unit.

Table 5. Details of simulated scenarios.
Number of scenarios | First scenario | Second scenario | Third scenario | Fourth scenario
Simulation time | 500 s | 500 s | 500 s | 500 s
Attacker | Node [1...2] | Node [4...5] | Node [1...6] | Node [8...13]
Victim | RSU2 | RSU2 | RSU2 | RSU2
Attack duration | 25 s | 25 s | 25 s | 25 s
Attack time | 180–205 s | 180–205 s | 180–205 s | 180–205 s
Number of vehicles | 20 | 20 | 60 | 60
Number of RSUs | 3 | 3 | 3 | 3
Attack rate | Low rate | High rate | Low rate | High rate
Number of pps | 10 | 50 | 10 | 50
RSU: road side unit.

Dataset specifications

Generally, evaluating an intrusion detection–based ML model depends on more than the classification accuracy result, as many other dimensions should also be considered, such as the characteristics of the simulation area, the dataset used, and how the normal and attack traffic is generated.

In this section, we explore the procedure to create VDDD. The following sections illustrate the steps to generate a synthetic dataset in a VANET environment. We start with the data collection and data preparation steps, proceed to the data pre-processing step, and finally present the dataset's feature selection step.

Data collection

In this stage, we collect our data from OMNeT++ for further analysis. Two files are mainly required to generate the dataset: the trace file (log file) and the simulation results (vector file). Figure 10 illustrates the workflow we followed, starting with collecting the raw data and proceeding until we obtained a complete and informative dataset.

The log file holds the message transmission events taking place among modules during the simulation. Among the information recorded in this file are event number, time, source and destination, packet
name, source and destination port, and packet length. The vector file records data values as a time series, that is, each value carries a timestamp, which is necessary to calculate the features in the upcoming steps. These data values are recorded and captured based on several categories or features. Moreover, OMNeT++ provides several analysis and validation tools that can be used to validate the accuracy of such data vectors. For example, Figure 11 shows a vector plot for all the transmission rates that happened during the simulation.

Figure 10. Data workflow.

Data preparation

After collecting the raw data in the previous step, the data are ready to be prepared and processed in such a way that they can be used for evaluating ML techniques. As shown in Figure 12, the raw data go through several stages until we get an informative dataset in a suitable format to be read and analyzed. These stages involve processing the log file obtained from the log viewer, processing the vector file generated by OMNeT++, merging the log and vector files using both Python functions and Jupyter Notebook, and labeling traffic instances using queries in SQLite. The purpose of processing log files is to keep only the important information in the log as well as to clean the data by removing redundant information, thus making it ready to be merged with the vector data. Figure 13 shows the final version of the log file.

The vector file has been exported from OMNeT++ to a SQLite database browser.60 This exported vector file contains 12 tables, some of which contain general information such as the simulation run information. We only focused on three tables as shown in Figure 14. The vector table contains information about all modules along with many statistics such as Min, Max, Sum, and others. Figure 15 shows a part of the vector table. The preparation step for this file involves correcting some errors in the vector data table, such as correcting the data types of some of the fields and validating the data values exported from OMNeT++.

In order to have an informative dataset, it is necessary to merge the log file with the vector file. To achieve that, we wrote Python functions and used Jupyter Notebook61 to merge these two files by exploring each event in the log file, and then for each event calculating the current, previous, and next time for each node to obtain the 16 selected features' values in the interval between the previous and next time. Figure 16 shows the workflow of the functions that extract the features' values from both the log and vector files. The main idea is to conduct some queries on the data files
to accumulate the values related to each feature according to the given time interval that takes place between the previous and next time events. As shown in Figure 16, we developed several functions and queries to handle different features based on their natures. The functions perform queries to extract the instances related to a specific node based on the event's timestamp, previous time, and next time, and then perform procedures to cumulatively calculate the features' values.

Figure 11. Vector plot for transmission state.
Figure 12. Data preparation flowchart.
Figure 13. Log file.
Figure 14. Schema of exported vector file.

Labeling the dataset is an indispensable stage of data pre-processing. From previous steps, we have full information about each traffic item/event. In this stage, we labeled all traffic as normal or DDoS based on the attack details such as source IP, destination IP, attack times, and duration. Labeling the dataset was done by applying queries in the SQLite DB browser. Table 6 shows the number of instances and their label class in each dataset.

Data pre-processing

Usually, in the ML field, raw data may contain wrong data or missing values. So, data pre-processing is required before applying any classifiers. The pre-processing stage in our proposed architecture involves three steps: data normalization, feature selection, and balancing. In this section, we leveraged Weka's capabilities for pre-processing the data, and the following sections give a detailed explanation of each data pre-processing step.
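The labeling rule described earlier in this section (source IP, destination IP, and attack time window) could be expressed roughly as the following sketch, written with Python's built-in sqlite3 module rather than the SQLite DB browser. The table name `events`, the column name `label`, and the attacker and victim IP addresses are hypothetical placeholders; only the 180–205 s attack window comes from Table 5.

```python
import sqlite3

ATTACK_START_S, ATTACK_END_S = 180.0, 205.0  # attack window from Table 5
ATTACKER_IPS = ("10.0.0.169",)               # hypothetical attacker address
VICTIM_IP = "10.0.0.5"                       # hypothetical victim RSU address


def label_events(conn, attacker_ips, victim_ip, start, end):
    """Mark every event as Normal, then flag attacker-to-victim events in the window."""
    conn.execute("UPDATE events SET label = 'Normal'")
    placeholders = ",".join("?" for _ in attacker_ips)
    conn.execute(
        "UPDATE events SET label = 'DDoS' "
        f"WHERE SourceIP IN ({placeholders}) AND DestIP = ? "
        "AND Time BETWEEN ? AND ?",
        (*attacker_ips, victim_ip, start, end),
    )
    conn.commit()


# Tiny in-memory example with made-up events.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (Time REAL, SourceIP TEXT, DestIP TEXT, label TEXT)"
)
conn.executemany(
    "INSERT INTO events (Time, SourceIP, DestIP) VALUES (?, ?, ?)",
    [
        (100.0, "10.0.0.169", "10.0.0.5"),  # attacker node, before the window
        (190.0, "10.0.0.169", "10.0.0.5"),  # attacker node, inside the window
        (190.0, "10.0.0.7", "10.0.0.5"),    # benign node, inside the window
    ],
)
label_events(conn, ATTACKER_IPS, VICTIM_IP, ATTACK_START_S, ATTACK_END_S)
labels = [row[0] for row in conn.execute("SELECT label FROM events ORDER BY rowid")]
print(labels)
```

Labeling by address and time window, as opposed to by packet rate, keeps the class attribute independent of the features that the classifiers are later trained on.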
Figure 15. Vector table.
Figure 16. Workflow of merging and calculating the values for dataset features.

Data normalization. Data normalization is the process of rescaling the dataset attributes to lie in one particular range, for example, between 0 and 1 or −1 and 1, according to equation (1)

$$X = \frac{x - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}} \tag{1}$$

Normalizing data often makes the dataset ready for applying any classifier. In addition, to increase accuracy results, we applied normalization to our dataset using Weka.

Feature selection. Feature selection is one of the data reduction methods where selecting features significantly influences the performance as it reduces the training time and improves the accuracy. Conversely, keeping irrelevant or partially relevant features can negatively affect performance. Various feature selection techniques are available today such as correlation-based feature selection (CFS), information gain (IG)–based feature selection, and gain ratio (GR) feature selection.49 A brief description of each of these techniques is presented as follows.
Table 6. Details of instances in each dataset.
 | First dataset | Second dataset | Third dataset | Fourth dataset
Number of instances | 4195 | 6186 | 11,556 | 17,375
Number of normal traffic | 3695 | 3686 | 10,056 | 10,115
Number of attack traffic | 500 | 2500 | 1500 | 7260

Table 7. Ranks of first dataset attributes.
Number | Ranked | Attribute name | Number | Ranked | Attribute name
1 | 0.53783 | throughput | 2 | 0.23248 | rcvdPkFromHL
3 | 0.4611 | passedUpPk | 4 | 0.05142 | droppedPkNotForUs
5 | 0.46028 | rcvdPkSeqNo | 6 | 0.00982 | radioMode
7 | 0.46028 | endToEndDelay | 8 | 0.00875 | sentPk
9 | 0.46028 | passedUpPkCount | 10 | 0.0055 | queueingTime
11 | 0.45724 | rcvdPk | 12 | 0.0051 | transmissionState
13 | 0.37168 | rcvdPkFromLL | 14 | 0.00485 | sentDownPk
15 | 0.3252 | receptionState | 16 | 0.00359 | queueLength

Table 8. Ranks of second dataset attributes.

CFS. CFS is a popular technique for estimating the correlation between a subset of attributes and their corresponding classes, as well as the inter-correlations among the features. A high correlation between the features and the classes indicates that the group of features is more relevant, whereas a high inter-correlation among the features indicates lower relevance.62,63 The measure of CFS is presented in equation (2)

$$M_s = \frac{k\, r_{cf}}{\sqrt{k + k(k - 1)\, r_{ff}}} \tag{2}$$

where $M_s$ refers to the heuristic merit of a subset containing $k$ features, $r_{cf}$ is the mean correlation between the features and the classes, and $r_{ff}$ is the average correlation between features. After calculating CFS, we selected only those attributes that have a high positive or negative correlation. In other words, these attributes
must be close to −1 or 1. We discarded the low correlation attributes that were close to zero.

IG–based feature selection. IG, which is based on entropy, is another popular feature selection technique as it measures the information contributed by each feature about the class. The value varies from 0 to 1, where highly informative features get the highest values and 0 means that the feature has no information or impact on the classes.64 The measure of IG is presented in equation (3)

$$IG = H(Y) - H(Y \mid X) = H(X) - H(X \mid Y) \tag{3}$$

X and Y in equation (3) represent the random variables, and the entropy of a random variable X is written as H(X).

GR feature selection. The GR is a ratio of IG to the intrinsic information, which can be obtained by dividing IG over the entropy of X as shown in equation (4)

$$GR = \frac{IG}{H(X)} \tag{4}$$

When the data of X completely forecast Y, then the value of GR = 1. However, when there is no relation between Y and X, then the value of GR = 0. The GR favors variables with small values, which is in conflict with IG.64 Usually, with supervision information, feature significance is assessed via its correlation with the class labels.65 Based on that, we used CFS, which is supported by Weka.

Tables 7–10 show the ranked attributes for each dataset obtained by Weka. Based on the ranked attributes, we set our cutoff at 0.2. Thus, if an attribute has a rank value equal to or greater than 0.2, it is considered an important feature to be included in the evaluation process. Otherwise, it is discarded.

Table 9. Ranks of third dataset attributes.
Table 10. Ranks of fourth dataset attributes.

Balancing. In our scenarios, we simulated both normal and attack traffic, and according to a real-world
environment, the majority of traffic is normal traffic rather than attack traffic. This leads to having an unbalanced dataset. Strictly speaking, we do not have 50% normal and 50% attack or 60% to 40% traffic in our datasets. To handle this problem, we applied the Synthetic Minority Oversampling Technique (SMOTE),66 which is supported by Weka. SMOTE is a popular balancing technique as it creates synthetic examples between existing real minority instances. The main idea of SMOTE is to increase the samples of the minority class by generating new instances in a random fashion among minority class samples using the KNN method. For example, before the balancing technique, the first scenario dataset had 3695 normal samples and only 500 attack samples. After applying the SMOTE balancing and sampling technique, the dataset holds 3695 normal samples and 3500 attack samples.

For more validation on managing the imbalance within the generated dataset, we applied other techniques besides SMOTE to do the balancing. The techniques we used include ClassBalancer, CostSensitiveClassifier, and ThresholdSelector, which are available with the Weka tool.67 However, with these balancing techniques, there was no significant increase in the accuracy of the evaluated models. Thus, we apply only the SMOTE sampling method as it gives the best accuracy among the meta-classifiers used in this study.

Dataset description

In each record of the dataset, there are 29 different features including one class attribute as either a DDoS class or a normal one. The features in bold in Table 11 are the ones chosen after applying CFS. Note that some non-qualified features, such as IP addresses, protocol type, and times (the first 12 features in Table 11), were excluded from the initial feature set to ensure that the classification model is not reliant on particular acquisition biases. Overall, 10 features were selected from the original 29 for the next stage. Table 11 shows all features alongside their descriptions and an example of each feature.

Dataset evaluation

One of the main objectives of this work is to generate VDDD from a VANET environment and to share it with other researchers. To evaluate the validity and quality of our generated dataset, VDDD, we followed the 11 criteria proposed by the Canadian Institute for Cybersecurity as a framework to evaluate datasets.68 VDDD fulfilled nine out of eleven criteria: the two criteria that our dataset did not satisfy are heterogeneity and attack diversity. However, VDDD contains different attack scenarios that have diversity in the attack rates and attack sources. Table 12 demonstrates how VDDD achieves/does not achieve each criterion.
Table 11. Dataset features.
Number | Feature | State | Description | Example value
1 | No | Not selected | Sequence number | 1
2 | Event | Not selected | Id of event | 56
3 | Time | Not selected | Event time | 1.921908541
4 | PreviousValue | Not selected | Previous event time | 0
5 | NextValue | Not selected | Next event time | 4.112046962
6 | SourceName | Not selected | Name of source/sender node | Node [1]
7 | Packet | Not selected | Name of the packet | AODV-RREQ
8 | PacketType | Not selected | Type of packet (UDP, TCP) | Udp
9 | SourceIP | Not selected | Source IP address | 10.0.0.169
10 | SourcePort | Not selected | Source port number | 9003
11 | DestIP | Not selected | Destination IP address | 10.0.0.5
12 | DestPort | Not selected | Destination port number | 9002
13 | transmissionState | Not selected | Transmission state of the radio | 1
14 | throughput | Selected | Throughput | 116
15 | sentPk | Not selected | Number of sent bytes | 208
16 | sentDownPk | Not selected | Packets sent to lower layer | 128
17 | receptionState | Selected | Reception state of the radio | 48
18 | rcvdPkSeqNo | Selected | Sequence number of received packets | 1
19 | rcvdPkFromLL | Selected | Packets received from lower layer | 256
20 | rcvdPkFromHL | Selected | Packets received from higher layer | 256
21 | rcvdPk | Selected | Number of received bytes | 336
22 | radioMode | Not selected | Requested radio operational mode | 1
23 | queueingTime | Not selected | Queueing time | 1
24 | queueLength | Not selected | Queue length | 3
25 | passedUpPk | Selected | Packets bytes passed to higher layer | 128
26 | passedUpPkCount | Selected | Packets count passed to higher layer | 1
27 | endToEndDelay | Selected | End to end delay | 1
28 | droppedPkNotForUs | Not selected | Drop packet not addressed to us | 0
29 | class | Label | Represent traffic type | Normal, DDoS
UDP: user datagram protocol; TCP: transmission control protocol; IP: Internet protocol; DDoS: distributed denial of service.
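The 0.2 cutoff described in the feature selection step can be checked against the ranks reported for the first dataset in Table 7. The sketch below is only an illustration in Python (the ranking itself was obtained with Weka); the dictionary copies the values of Table 7, and the function name is hypothetical.

```python
# Attribute ranks copied from Table 7 (first dataset).
TABLE7_RANKS = {
    "throughput": 0.53783, "passedUpPk": 0.4611, "rcvdPkSeqNo": 0.46028,
    "endToEndDelay": 0.46028, "passedUpPkCount": 0.46028, "rcvdPk": 0.45724,
    "rcvdPkFromLL": 0.37168, "receptionState": 0.3252, "rcvdPkFromHL": 0.23248,
    "droppedPkNotForUs": 0.05142, "radioMode": 0.00982, "sentPk": 0.00875,
    "queueingTime": 0.0055, "transmissionState": 0.0051, "sentDownPk": 0.00485,
    "queueLength": 0.00359,
}
CUTOFF = 0.2  # attributes ranked at or above this value are kept


def select_features(ranks, cutoff=CUTOFF):
    """Return the attribute names whose rank meets the cutoff, best first."""
    kept = [name for name, score in ranks.items() if score >= cutoff]
    return sorted(kept, key=lambda name: -ranks[name])


selected = select_features(TABLE7_RANKS)
print(len(selected), selected)
```

With these values, nine attributes pass the cutoff, which together with the class attribute gives the 10 features mentioned in the dataset description.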
Table 12. Evaluated VDDD.
Number | Criteria | Status | Reasons
1 | Information of network configuration | Satisfied | The simulation scenario that we used to generate VDDD contains all VANET components such as RSU, vehicles, and router.
2 | Complete traffic | Satisfied | VDDD is generated based on complete traffic captured from all the nodes.
3 | Labeled dataset | Satisfied | All the traffic in VDDD is labeled as normal or DDoS.
4 | Complete interaction | Satisfied | VDDD captures the whole network interactions including V2V, V2I, and I2I.
5 | Complete capture | Satisfied | VDDD has captured all the traffic without removing any set of normal or attack traffic.
6 | Anonymity | Satisfied | IP is provided for each node in the VDDD.
7 | Available protocols | Satisfied | VDDD has both normal and anomalous traffic considering several protocols involved in generating this traffic.
8 | Feature set | Satisfied | We provided and presented all features along with the way we used to extract those features.
9 | Metadata | Satisfied | In this study, we provided all metadata about the generated dataset.
10 | Heterogeneity | Not satisfied | We only generated VDDD from one source, which is the simulation log.
11 | Attack diversity | Not satisfied | In VDDD, we focused only on the UDP flood attack, which is a type of DDoS attack.
VDDD: Vehicular Ad hoc NETwork distributed denial of service dataset; DDoS: distributed denial of service; VANET: Vehicular Ad hoc NETworks; RSU: road side unit; UDP: user datagram protocol; IP: Internet protocol.
Table 13. Used features by other studies.
Reference Number of features Features Environment
Yu et al.36 7 H (srcIP) SDVN

H (flows)
Average number of packets
Average number of bytes
Rate of flow table entries (Rf)
Percentage of pair flows (Ppf)
Ports generating speed (Pgs)
Karagiannis and Argyriou23 4 Received signal strength indicator (RSSI) VANET
Signal to noise and interference ratio (SINR)
Packet delivery ratio (PDR)
Relative speed variations (RSVs)
Singh et al.21 8 Source IP SDVN
Destination IP
Source port
Destination port
Protocol: the layer 4 protocol used such as TCP, UDP, ICMP
Byte count
Packet count
Time duration
Aneja et al.22 18 Not mentioned VANET
VDDD 9 Throughput VANET
Reception state of the radio
Sequence number of received packets
Packets received from lower layer
Packets received from higher layer
Number of received bytes
Bytes count
Packets count
Delay
SDVN: software-defined vehicular networking; VANET: Vehicular Ad hoc NETworks; TCP: transmission control protocol; UDP: user datagram protocol; ICMP: Internet control message protocol; VDDD: Vehicular Ad hoc NETwork distributed denial of service dataset.
One of the important points to be considered is the feature set. In this study, we provided all the available features from the simulation experiments and then applied feature selection techniques to select the feature set that gives the best accuracy. Table 13 summarizes the features recently studied and used in detecting DDoS attacks on VANETs and other environments using ML techniques.

To evaluate the validity of VDDD, we examined the performance and accuracy of the selected features with different ML techniques including J48, SVM, RF, KNN, ANN, and NB, which are commonly used for DDoS attack detection as shown in Section "Literature review." Here, we used the VDDD generated for the fourth scenario discussed earlier in section "Scenarios." The experimental results have been evaluated in terms of accuracy, precision, recall, and F1-score. The accuracy reflects the percentage of correctly classified instances in the test dataset. The precision criterion measures the ratio of samples correctly classified as positive out of all the samples that are predicted as positive. The recall criterion measures the ratio of samples correctly classified as positive out of all samples that are actually positive. Finally, the F1-score, also called the F-score, combines both recall and precision to reflect the test's accuracy. The experiments were conducted using the Weka tool with the classifiers' parameters shown in Table 14 for all the applied ML classifiers.

The results of the classification on VDDD presented in Table 15 show that all the applied classifiers achieved high detection accuracies, greater than 99%, except for SVM, which shows an accuracy of 97%. The RF classifier achieved the highest accuracy at 99.7% compared to the other applied ML classifiers. Table 16 presents the confusion matrices for each classifier. Based on these statistics, Figure 17 presents the false rates for the classifiers, where SVM shows a higher false rate compared to the other classifiers.

To calculate the computing time for our proposed approach, we excluded the time taken to generate the whole dataset. The focus was on the other computational aspects, namely, the classifier's building time and feature weight calculation time.7 We conducted several experiments on VDDD starting with the feature selection method and then executing all the studied ML
Table 14. Main parameters of the classifiers.
Classifier | Parameters | Value
J48 | Confidence factor | 0.25
J48 | Minimum number of objects | 2
J48 | Number of folds | 3
KNN | KNN | 1
KNN | Used distance algorithm | Euclidean distance
KNN | Window size | 0
SVM | Cache size | 40
SVM | Coef0 | 0.0
SVM | Cost | 1.0
SVM | Degree | 3
SVM | Eps | 0.001
SVM | Gamma | 0.0
SVM | Loss | 0.1
SVM | Nu | 0.5
ANN | Hidden layers | a
ANN | Learning rate | 0.3
ANN | Momentum | 0.2
ANN | Training time | 500
ANN | Validation threshold | 20
RF | Bag size percent | 100
RF | Maximum depth | 0
RF | Number of execution slots | 1
RF | Number of features | 0
RF | Number of iterations | 100
NB | Use kernel estimator | False
NB | Use supervised discretization | False
NB | Display model | False
KNN: K-nearest neighbor; SVM: support-vector machine; ANN: artificial neural network; RF: random forest; NB: naïve Bayes.
Table 15. Evaluating results of ML classifiers on VDDD fourth Table 16. Confusion matrix for fourth scenario.
scenario.
J48 KNN
Fourth scenario
10,069 46 10,072 43
Classifier Accuracy Precision Recall F1 11 10,153 11 10,153
J48 99.7189% 0.995 0.999 0.997 SVM ANN

SVM 97.3667% 0.951 0.999 0.974
RF 99.7534% 0.996 0.999 0.998 9587 528 10,058 57
KNN 99.7337% 0.996 0.999 0.997 6 10,158 8 10,156
ANN 99.6795% 0.994 0.999 0.997
RF NB
NB 99.5266% 0.995 0.996 0.995
10,077 38 10,064 51
SVM: support-vector machine; RF: random forest; KNN: K-nearest 12 10,152 45 10,119
neighbor; ANN: artificial neural network; NB: naı̈ve Bayes.
Bold values represent the best values among the others. KNN: K-nearest neighbor; SVM: support-vector machine; ANN: artificial
neural network; RF: random forest; NB: naı̈ve Bayes.
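As a cross-check of how the scores in Table 15 follow from the counts in Table 16, the sketch below recomputes accuracy, precision, recall, and F1-score from each confusion matrix. It is only an illustration under stated assumptions: the source does not label the matrix rows and columns, so the code assumes that rows are actual classes, columns are predicted classes, and the second class is the positive (attack) class. The names CONFUSION and metrics are hypothetical and are not part of the original study.

```python
# Confusion matrices copied from Table 16, written as [[a, b], [c, d]].
# ASSUMPTION (not stated in the source): rows are actual classes, columns are
# predicted classes, and the second class is the positive (attack) class.
CONFUSION = {
    "J48": [[10069, 46], [11, 10153]],
    "KNN": [[10072, 43], [11, 10153]],
    "SVM": [[9587, 528], [6, 10158]],
    "ANN": [[10058, 57], [8, 10156]],
    "RF": [[10077, 38], [12, 10152]],
    "NB": [[10064, 51], [45, 10119]],
}


def metrics(matrix):
    """Return (accuracy, precision, recall, f1) for a 2x2 confusion matrix."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total          # correctly classified / all instances
    precision = tp / (tp + fp)            # true positives / predicted positives
    recall = tp / (tp + fn)               # true positives / actual positives
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Print one line per classifier in the same column order as Table 15.
for name, matrix in CONFUSION.items():
    acc, prec, rec, f1 = metrics(matrix)
    print(f"{name:4s} {acc * 100:8.4f}% {prec:.3f} {rec:.3f} {f1:.3f}")
```

Under this assumed layout, the recomputed values agree with Table 15 at the reported precision (for example, SVM precision is 10,158 / 10,686, about 0.951), which is why this orientation was chosen for the sketch.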
methods to get the average of the computation time for both building the classifier and ranking and selecting the features. We took the average of these computation measures over a total of seven runs on a computer running Windows 10 with a 2.6 GHz CPU and 8.00 GB of RAM. The results are as follows: the average time taken to perform feature selection using the IG ranking method was about 0.23 s. The average time taken to build an ANN model was the longest at about 2.54 s, followed by SVM at 1.8 s; the rest of the models had comparable times of no more than 0.12 s.

Moreover, we presented a receiver operating characteristic (ROC) curve [69], as shown in Figure 18, which reflects the quality of the decisions made by the classifiers. ROC is one of the best measures to evaluate the performance of classification models across threshold settings, as it reflects the model's ability to predict classes. Under VDDD, the experimental results showed that all models have an area under the ROC curve of around 99%, except for the SVM model, which showed 97%. These results indicate the effectiveness of all models in predicting classes with very low false rates.

Figure 17. False rates for applied ML models on VDDD.

Figure 18. AUC–ROC curve for applied ML models on VDDD.

Conclusion

An insecure VANET can lead to fatal accidents, physical disability, and even deaths. Accordingly, the security concerns regarding VANET require the attention of researchers and developers, with special consideration of the unique characteristics of the network. In addition, VANET must be capable of accurately detecting and preventing possible threats that might occur on the network. In this article, we explored several aspects of the VANET system, including its architecture and characteristics, and presented a literature review of recent studies on securing VANETs against DDoS attacks. Due to the lack of available DDoS attack datasets that fit the VANET environment, we simulated a VANET environment involving a real highway scenario using several tools, including OMNeT++, INET, Veins, and SUMO. The simulated scenarios were used to generate and build a dataset for detecting DDoS attacks in the VANET environment. The dataset records were processed following the principles of data preparation and pre-processing identified in the literature review. The proposed dataset, VDDD, fulfills the majority of the requirements of a valid dataset, as it overcomes the issues with existing VANET datasets, such as ignoring the VANET characteristics, dissimilar network configurations, and unavailability of the datasets to the public. Several ML models were trained on the generated dataset, and all showed high accuracy in detecting DDoS attack traffic.

Future work

As future work, the study can be extended by considering other aspects when simulating VANET, such as context and weather conditions. Moreover, the generated dataset, VDDD, currently contains only UDP flooding attacks. It can be extended to be more generic by adding more types of DDoS attacks, as well as other types of attacks. More ML techniques can be trained on the dataset as a further evaluation of the VDDD dataset. Generating more attack traffic to balance legitimate and malicious traffic can also be considered as an extension to this work.

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) received no financial support for the research, authorship, and/or publication of this article.

ORCID iD

Fahd A Alhaidari https://orcid.org/0000-0003-4383-0269

References

1. World Health Organization (WHO). Global status report on road safety 2018. WHO, 2018, https://www.who.int/violence_injury_prevention/road_safety_status/2018/English-Summary-GSRRS2018.pdf (accessed 20 August 2019).
2. Jabbarpour MR, Nabaei A and Zarrabi H. Intelligent guardrails: an IoT application for vehicle traffic congestion reduction in smart city. In: Proceedings of the 2016 IEEE international conference on Internet of Things (iThings) and IEEE green computing and communications (GreenCom) and IEEE cyber, physical and social computing (CPSCom) and IEEE smart data (SmartData), Chengdu, China, 15–18 December 2016, pp.7–13. New York: IEEE.
3. Jain M and Saxena R. VANET: security attacks, solution and simulation. In: Bhateja V, Tavares JMR, Rani BP, et al. (eds) Proceedings of the second international
conference on computational intelligence and informatics. Singapore: Springer, 2018, pp.457–466.
4. Ghebleh R. A comparative classification of information dissemination approaches in vehicular ad hoc networks from distinctive viewpoints: a survey. Comput Netw 2018; 131: 15–37.
5. Gruebler A, McDonald-Maier KD and Alheeti KMA. An intrusion detection system against black hole attacks on the communication network of self-driving cars. In: Proceedings of the 2015 6th international conference on emerging security technologies (EST), Braunschweig, 3–5 September 2015, pp.86–91. New York: IEEE.
6. Brendha R and Prakash VSJ. A survey on routing protocols for vehicular Ad Hoc networks. In: Proceedings of the 2017 4th international conference on advanced computing and communication systems (ICACCS), Coimbatore, India, 6–7 January 2017, pp.1–7. New York: IEEE.
7. Arif M, Wang G, Geman O, et al. SDN-based VANETs, security attacks, applications, and challenges. Appl Sci 2020; 10(9): 3217.
8. Arif M, Wang G and Balas VE. Secure VANETs: trusted communication scheme between vehicles and infrastructure based on fog computing. Stud Inform Control 2018; 27(2): 235–246.
9. Irshad A, Usman M, Chaudhry SA, et al. A provably secure and efficient authenticated key agreement scheme for energy Internet-based vehicle-to-grid technology framework. IEEE T Ind Appl 2020; 56(4): 4425–4435.
10. Hussain R, Hussain F and Zeadally S. Integration of VANET and 5G security: a review of design and implementation issues. Future Gener Comp Sy 2019; 101: 843–864.
11. Khelifi H, Luo S, Nour B, et al. Security and privacy issues in vehicular named data networks: an overview. Mob Inf Syst 2018; 2018: 5672154.
12. Pathre A, Agrawal C and Jain A. A novel defense scheme against DDOS attack in VANET. In: Proceedings of the 2013 10th international conference on wireless and optical communications networks (WOCN), Bhopal, India, 26–28 July 2013, pp.1–5. New York: IEEE.
13. Kolandaisamy R, Noor RM, Ahmedy I, et al. A multivariant stream analysis approach to detect and mitigate DDoS attacks in vehicular ad hoc networks. Wirel Commun Mob Com 2018; 2018: 2874509.
14. Haydari A and Yilmaz Y. Real-time detection and mitigation of DDoS attacks in intelligent transportation systems. In: Proceedings of the 2018 21st international conference on intelligent transportation systems (ITSC), Maui, HI, 4–7 November 2018, pp.157–163. New York: IEEE.
15. Shabbir M, Khan MA, Khan US, et al. Detection and prevention of distributed denial of service attacks in VANETs. In: Proceedings of the 2016 international conference on computational science and computational intelligence (CSCI), Las Vegas, NV, 15–17 December 2016, pp.970–974. New York: IEEE.
16. Mirsadeghi F, Rafsanjani MK and Gupta BB. A trust infrastructure based authentication method for clustered vehicular ad hoc networks. Peer Peer Netw Appl. Epub ahead of print 24 October 2020. DOI: 10.1007/s12083-020-01010-4.
17. Bhushan K and Gupta BB. Distributed denial of service (DDoS) attack mitigation in software defined network (SDN)-based cloud computing environment. J Amb Intel Hum Comp 2019; 10(5): 1985–1997.
18. Kolandaisamy R, Noor RM, Z'aba MR, et al. Adapted stream region for packet marking based on DDoS attack detection in vehicular ad hoc networks. J Supercomput 2020; 76: 5948–5970.
19. Kolandaisamy R, Noor RM, Kolandaisamy I, et al. A stream position performance analysis model based on DDoS attack detection for cluster-based routing in VANET. J Amb Intel Hum Comp. Epub ahead of print 3 July 2020. DOI: 10.1007/s12652-020-02279-2.
20. Bensalah F, Elkamoun N and Baddi Y. SDNStat-Sec: a statistical defense mechanism against DDoS attacks in SDN-based VANET. In: Saeed F, Al-Hadhrami T, Mohammed F, et al. (eds) Advances on smart and soft computing, vol. 1188. Singapore: Springer, 2020, pp.527–540.
21. Singh PK, Jha SK, Nandi SK, et al. ML-based approach to detect DDoS attack in V2I communication under SDN architecture. In: Proceedings of the TENCON 2018 - 2018 IEEE region 10 conference, Jeju, South Korea, 28–31 October 2019, pp.144–149. New York: IEEE.
22. Aneja MJS, Bhatia T, Sharma G, et al. Artificial intelligence based intrusion detection system to detect flooding attack in VANETs. In: Shrivistava G, Kumar P, Gupta BB, et al. (eds) Handbook of research on network forensics and analysis techniques. Hershey, PA: IGI Global, 2018, pp.87–100.
23. Karagiannis D and Argyriou A. Jamming attack detection in a pair of RF communicating vehicles using unsupervised machine learning. Veh Commun 2018; 13: 56–63.
24. Belenko V, Krundyshev V and Kalinin M. Synthetic datasets generation for intrusion detection in VANET. In: Proceedings of the 11th international conference on security of information and networks (SIN'18), Cardiff, 10–12 September 2018, pp.1–6. New York: ACM.
25. Zeng Y, Qiu M, Zhu D, et al. DeepVCM: a deep learning based intrusion detection method in VANET. In: Proceedings of the IEEE 5th international conference on big data security on cloud, Washington, DC, 27–29 May 2019, pp.288–293. New York: IEEE.
26. Ghaleb FA, Zainal A, Rassam MA, et al. An effective misbehavior detection model using artificial neural network for vehicular ad hoc network applications. In: Proceedings of the 2017 IEEE conference on application, information and network security (AINS), Miri, Malaysia, 13–14 November 2017, pp.13–18. New York: IEEE.
27. Li W, Joshi A and Finin T. SVM-CASE: an SVM-based context aware security framework for vehicular ad-hoc networks. In: Proceedings of the 2015 IEEE 82nd vehicular technology conference (VTC2015-Fall), Boston, MA, 6–9 September 2015, pp.1–5. New York: IEEE.
28. Grover J, Prajapati NK, Laxmi V, et al. Machine learning approach for multiple misbehavior detection in VANET. Comm Com Inf Sc 2011; 192: 644–653.
29. Ali Alheeti KM, Gruebler A and McDonald-Maier K. Intelligent intrusion detection of grey hole and rushing
attacks in self-driving vehicular networks. Computers 2016; 5(3): 16.
30. Aloqaily M, Otoum S, Al Ridhawi I, et al. An intrusion detection system for connected vehicles in smart cities. Ad Hoc Netw 2019; 90: 101842.
31. Singh PK, Gupta RR, Nandi SK, et al. Machine learning based approach to detect wormhole attack in VANETs. In: Proceedings of the workshops of the international conference on advanced information networking and applications, Matsue, Japan, 27–29 March 2019, pp.651–661. Cham: Springer.
32. Singh PK, Gupta S, Vashistha R, et al. Machine learning based approach to detect position falsification attack in VANETs. In: Proceedings of the international conference on security and privacy, Jaipur, India, 9–11 January 2019, pp.166–178. Singapore: Springer.
33. Gyawali S and Qian Y. Misbehavior detection using machine learning in vehicular communication networks. In: Proceedings of the ICC 2019 - 2019 IEEE international conference on communications (ICC), Shanghai, China, 20–24 May 2019, pp.1–6. New York: IEEE.
34. So S, Sharma P and Petit J. Integrating plausibility checks and machine learning for misbehavior detection in VANET. In: Proceedings of the 2018 17th IEEE international conference on machine learning and applications (ICMLA), Orlando, FL, 17–20 December 2018, pp.564–571. New York: IEEE.
35. Kim M, Jang I, Choo S, et al. Collaborative security attack detection in software-defined vehicular networks. In: Proceedings of the 2017 19th Asia-Pacific network operations and management symposium (APNOMS), Seoul, South Korea, 27–29 September 2017, pp.19–24. New York: IEEE.
36. Yu Y, Guo L, Liu Y, et al. An efficient SDN-based DDoS attack detection and rapid response platform in vehicular networks. IEEE Access 2018; 6: 44570–44579.
37. Luong NT, Vo TT and Hoang D. FAPRP: a machine learning approach to flooding attacks prevention routing protocol in mobile ad hoc networks. Wirel Commun Mob Com 2019; 2019: 6869307.
38. Reddy KG and Thilagam PS. Naïve Bayes classifier to mitigate the DDoS attacks severity in ad-hoc networks. Int J Comm Network Inform Secur 2020; 12(2): 221–226.
39. Gao Y, Wu H, Song B, et al. A distributed network intrusion detection system for distributed denial of service attacks in vehicular ad hoc network. IEEE Access 2019; 7: 154560–154571.
40. Ali Alheeti KM and McDonald-Maier K. Intelligent intrusion detection in external communication systems for autonomous vehicles. Syst Sci Control Eng 2018; 6(1): 48–56.
41. Van der Heijden RW, Lukaseder T and Kargl F. VeReMi: a dataset for comparable evaluation of misbehavior detection in VANETs. In: Beyah R, Chang B, Li Y, et al. (eds) Security and privacy in communication networks (SecureComm 2018; Lecture notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering), vol. 254. Cham: Springer, 2018, pp.318–337.
42. Lyamin N, Kleyko D, Delooz Q, et al. Real-time jamming DoS detection in safety-critical V2V C-ITS using data mining. IEEE Commun Lett 2019; 23(3): 442–445.
43. Damasevicius R, Venckauskas A, Grigaliunas S, et al. LITNET-2020: an annotated real-world network flow dataset for network intrusion detection. Electronics 2020; 9(5): 800.
44. Al-Hadhrami Y and Hussain FK. Real time dataset generation framework for intrusion detection systems in IoT. Future Gener Comp Sy 2020; 108: 414–423.
45. Alrehan AM and Alhaidari FA. Machine learning techniques to detect DDoS attacks on VANET system: a survey. In: Proceedings of the 2019 2nd international conference on computer applications and information security (ICCAIS), Riyadh, Saudi Arabia, 1–3 May 2019, pp.1–6. New York: IEEE.
46. Shiravi A, Shiravi H, Tavallaee M, et al. Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput Secur 2012; 31(3): 357–374.
47. Tavallaee M, Bagheri E, Lu W, et al. A detailed analysis of the KDD CUP 99 dataset. In: Proceedings of the 2009 IEEE symposium on computational intelligence for security and defense applications, Ottawa, ON, Canada, 8–10 July 2009, pp.1–6. New York: IEEE.
48. Moustafa N and Slay J. UNSW-NB15: a comprehensive dataset for network intrusion detection systems (UNSW-NB15 network dataset). In: Proceedings of the 2015 military communications and information systems conference (MilCIS), Canberra, ACT, Australia, 10–12 November 2015, pp.1–6. New York: IEEE.
49. Kyoto dataset, 2016, https://www.takakura.com/Kyoto_data/BenchmarkData-Description-v5.pdf (accessed 16 November 2020).
50. Varga A. OMNeT++. In: Wehrle K, Güneş M and Gross J (eds) Modeling and tools for network simulation. 1st ed. Berlin, Heidelberg: Springer, 2010, pp.35–59.
51. Lopez PA, Behrisch M, Bieker-Walz L, et al. Microscopic traffic simulation using SUMO. In: Proceedings of the 2018 21st international conference on intelligent transportation systems (ITSC), Maui, HI, USA, 4–7 November 2018, pp.2575–2582. New York: IEEE.
52. INET. INET framework, https://inet.omnetpp.org/ (accessed 14 July 2019).
53. Sommer C, German R and Dressler F. Bidirectionally coupled network and road traffic simulation for improved IVC analysis. IEEE T Mobile Comput 2011; 10(1): 3–15.
54. Haklay M and Weber P. OpenStreetMap: user-generated street maps. IEEE Pervas Comput 2008; 7(4): 12–18.
55. Fathy M, Firouzjaee SG and Raahemifar K. Improving QoS in VANET using MPLS. Procedia Comput Sci 2012; 10: 1018–1025.
56. Kotenko I and Ulanov A. Agent-based simulation of DDOS attacks and defense mechanisms. Int J Comput 2014; 4(2): 113–123.
57. Kaur R, Sangal AL and Kumar K. Modeling and simulation of DDoS attack using Omnet++. In: Proceedings of the 2014 international conference on signal processing and integrated networks (SPIN), Noida, India, 20–21 February 2014, pp.220–225. New York: IEEE.
58. Alzahrani S and Hong L. Generation of DDoS attack dataset for effective IDS development and evaluation. J Inf Secur 2018; 9(4): 225–241.
59. Siddiqui AJ and Boukerche A. On the impact of DDoS attacks on software-defined Internet-of-vehicles control plane. In: Proceedings of the 2018 14th international wireless communications and mobile computing conference (IWCMC), Limassol, Cyprus, 25–29 June 2018, pp.1284–1289. New York: IEEE.
60. SQLite Browser. DB Browser for SQLite, https://sqlitebrowser.org/ (accessed 14 July 2019).
61. Project Jupyter. Project Jupyter home, https://jupyter.org/index.html (accessed 14 July 2019).
62. Karegowda AG, Manjunath AS and Jayaram MA. Comparative study of attribute selection using gain ratio and correlation based feature selection. Int J Inf Technol Knowl Manag 2010; 2(2): 271–277.
63. Hall MA. Correlation-based feature selection for machine learning. PhD Dissertation, Department of Computer Science, Waikato University, Hamilton, New Zealand, 1999.
64. Novaković J, Strbac P and Bulatović D. Toward optimal feature selection using ranking methods and classification algorithms. Yugosl J Oper Res 2011; 21(1): 119–135.
65. Li J, Cheng K, Wang S, et al. Feature selection: a data perspective. ACM Comput Surv 2018; 50(6): 94.
66. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002; 16: 321–357.
67. Jain S, Kotsampasakou E and Ecker GF. Comparing the performance of meta-classifiers-a case study on selected imbalanced datasets relevant for prediction of liver toxicity. J Comput Aid Mol Des 2018; 32(5): 583–590.
68. Sharafaldin I, Gharib A, Lashkari AH, et al. Towards a reliable intrusion detection benchmark dataset. Softw Netw 2017; 2017(1): 177–200.
69. Omar L and Ivrissimtzis I. Using theoretical ROC curves for analysing machine learning binary classifiers. Pattern Recogn Lett 2019; 128(6): 447–451.

Date received: 11 February 2021; accepted: 14 February 2021

Handling Editor: Ashish Kr Luhach
