Abstract
Vehicular Ad hoc NETwork is a promising technology that provides important facilities for modern transportation systems.
It has garnered much interest from researchers studying the mitigation of attacks, including distributed denial of service
attacks. Machine learning techniques, whose performance depends mainly on the quality of the datasets used, play a role in
detecting many attacks with a high level of accuracy. We conducted a comprehensive literature review and found many
limitations in the datasets available for distributed denial of service attacks on Vehicular Ad hoc NETwork, including the
following: unavailability of online versions, an absence of distributed denial of service traffic, traffic that is
unrepresentative of Vehicular Ad hoc NETwork, and no information regarding the network configurations. Therefore, in this
article, we propose a novel simulation technique to generate a valid dataset called the Vehicular Ad hoc NETwork distributed
denial of service dataset, which is dedicated to Vehicular Ad hoc NETworks. The dataset holds information on distributed
denial of service attack traffic considering Vehicular Ad hoc NETwork architecture, traffic density, attack intensity, and
node mobility. Well-known simulation tools such as SUMO, OMNeT++, Veins, and INET were used to ensure that all the
properties of Vehicular Ad hoc NETwork were captured. We then compared the Vehicular Ad hoc NETwork distributed denial of
service dataset with the datasets of several studies to demonstrate its novelty and evaluated it using several machine
learning models. The studied models achieved accuracy above 99.5% on this dataset, except the support-vector machine,
which achieved 97.3%.
Keywords
Vehicular Ad hoc NETwork, ad hoc network, distributed denial of service, machine learning, OMNeT++, Veins, dataset
Creative Commons CC BY: This article is distributed under the terms of the Creative Commons Attribution 4.0 License
(https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work
without further permission provided the original work is attributed as specified on the SAGE and Open Access pages
(https://us.sagepub.com/en-us/nam/open-access-at-sage).
2 International Journal of Distributed Sensor Networks
network environment (VANET or non-VANET), the inclusion of DDoS attack, the presence of a dataset (dataset-based or statistical method), and the evaluation method (ML-based or generation framework).
The obtained classes as shown in Figure 2 are as follows: (1) Category 1: studies that used statistical methods rather than ML techniques for detecting attacks on networks.13–20 (2) Category 2: studies that used ML techniques trained on datasets of VANET to evaluate detection of network attacks including DDoS attacks.21–25 (3) Category 3: studies satisfying the filtering criteria for Category 2, but not including DDoS attacks.26–34 (4) Category 4: studies that used ML techniques trained on datasets to evaluate detection of different attacks including DDoS, but not on a VANET environment architecture.35–40 (5) Category 5: studies introducing frameworks for generating datasets for the VANET environment.41,42 (6) Finally, Category 6: studies introducing the generation of datasets for detecting attacks but not considering the specifications of VANETs.43,44
Statistical-based studies for DDoS attack
The first category of these studies includes statistical-based techniques for detecting DDoS/denial of service (DoS) attacks targeting smart communication networks such as VANET.13–15 However, these studies do not use datasets for validating the proposed techniques; they rely only on statistical methods.
In Kolandaisamy et al.,13 multivariant stream analysis (MVSA) was proposed as a method to detect and prevent DDoS attacks targeting VANET. Similarly, Haydari and Yilmaz14 used a statistical anomaly detection technique to detect the attack by applying an online discrepancy test (ODIT) at the detection phase for both low- and high-rate DDoS attacks and then blocking the traffic generated by the attackers based on their locations.
The study presented in Shabbir et al.15 proposed a threshold-based framework for detecting and preventing attacks by utilizing communication time as a communication characteristic to be compared with a specific threshold to decide about alerting the other nodes to avoid further communication with the attacker nodes. This type of study was excluded as it did not involve datasets for training the proposed models.
Mirsadeghi et al.16 introduced a cryptography method based on certificates issued by a trusted authority. To have a trusted clustered vehicular network, they proposed estimating a trust degree for each node considering the trust between vehicles and RSUs. Then, based on the estimated trust degree and other mobility measures, the appropriate cluster head is selected which in turn checks the trust degree of abnormal nodes. Any abnormal nodes will be in the blacklist of the certification authority and thus unable to communicate with either other vehicles or control units.
Bhushan and Gupta17 discussed the features of software-defined networking (SDN) and proposed a novel flow table sharing technique to mitigate DDoS attacks that target the network by overloading the flow table, which usually has a limited size. For the proposed approach, they modeled the flow table space as an M/G/S/C queuing model and then applied several rule-based methods to detect and prevent DDoS on SDNs by utilizing the flow table status for all the switches and the blacklist database that holds Internet protocol (IP) addresses of attack sources.
Kolandaisamy et al.18 proposed an analysis model that is capable of detecting DDoS attacks on a VANET
environment with less time needed to identify the attack compared with other techniques discussed in their related work. The main idea is to calculate different measures through different stages as follows. Based on the clustering score of the incoming packets, a stream position analysis was used to calculate specific features for the nodes including the volume of the communicated data, payload, and message rates. Then, these computed features were used to calculate the conflict field, conflict data, and attack signature sample rate, which are finally used in a statistically based model to decide on the legitimacy of a node.
Kolandaisamy et al.19 proposed using an analytical approach that utilizes the measures gathered by packet marking based on an adapted stream region scheme. The proposed approach involves extracting the neighbor log file, calculating the node's value, deciding on the source region, identifying the routes for each region, and computing the circulation rate. Finally, the identification of a DDoS attack is based on the deviation of the circulation from the current rate. Similarly, Bensalah et al.20 proposed a statistical method for detecting and controlling malicious nodes in a VANET using a variable control chart as a model to monitor the quality of the communication for each node. Then, based on the taken measurements, a node is considered a malicious node when its statistical quality violates the control limit.
ML-based studies for DDoS attacks on VANET
ML techniques are promising techniques for providing accurate detection and prediction mechanisms used in many areas and domains including VANETs.45 There are several ML studies applied to detect numerous malicious behaviors on VANETs. Since our focus in this study is on the dataset that fits the VANET environment, we explored several studies that implemented ML techniques to detect DDoS attacks on VANET systems.21–25 Moreover, we have highlighted the datasets, simulation tools, and ML techniques used in such studies.
Singh et al.21 conducted an analysis on the impact of DDoS attacks on the vehicle to infrastructure (V2I) communication under an SDN architecture. They simulated Software-Defined Vehicular Networking (SDVN) using Mininet-WiFi and used the scikit-learn library for ML classifiers. In addition, eight supervised classifiers were used including gradient boost, random forest (RF), logistic regression (LR), nearest neighbors, decision tree (DT), Support-Vector Machine (SVM), naïve Bayes (NB), and neural network. The gradient boost classifier gave the best accuracy among the models used. However, the study did not provide details of the generated dataset or the simulation scenarios such as the number of attacker nodes and the attack rate.
Aneja et al.22 introduced a hybrid Intrusion Detection System (IDS) to detect the RREQ Flooding attack in a VANET environment where they used SUMO, MOVE, and NS-2 tools in their simulation experiments. They combined Artificial Neural Networks (ANNs) with a Genetic Algorithm (GA) model as a detection model where ANN performs the classification and GA tunes the selected input features. The dataset generated in Aneja et al.22 has a detailed description of the simulation tools used along with the related steps and parameters. However, it has certain limitations: the dataset itself is not available for other researchers, the network configuration was not presented in the study, and there is no report on the dataset features.
The dataset generated in Karagiannis and Argyriou23 is not available online, the study did not show a clear procedure on processing the data, and there is no report on either the simulation tools or the network configuration parameters. Similarly, the dataset introduced in Belenko et al.24 is not available online and has no description of the configuration of the network and simulation environments. Thus, the generated datasets introduced in Singh et al.,21 Aneja et al.,22 Karagiannis and Argyriou,23 and Belenko et al.24 have been excluded from this study owing to the unavailability of both the dataset and the configuration of simulation work used to generate the datasets.
The study in Zeng et al.25 proposed a deep learning technique as an IDS that can perform feature extraction and classification of different attacks on VANET including DDoS, wormhole, and Sybil attacks. The dataset used to train the detection models was generated using an NS-3 simulator considering only the raw packets and logs as the output from the simulator. In addition, the ISCX IDS dataset46 was used to regenerate and extract samples for different types of attacks such as DDoS attacks. The main observed limitations of the generated dataset are related to the generalization of the configuration parameters related to VANETs as well as the lack of enough scenarios on the experimental models. Moreover, the ISCX IDS dataset is a traditional dataset for IDSs and is not designed to capture the structure and features of VANETs.
ML-based studies for malicious attacks on VANET
Many studies have presented different ML techniques trained on datasets for detecting different types of attacks; however, they have not included or considered DDoS attacks.26–34
Ghaleb et al.26 used an ANN model to detect malicious traffic in VANET. They trained the ANN on a next generation simulation (NGSIM) dataset using MATLAB tools, and the results showed an accuracy of 99%. They used real traffic along with injected dynamic noises to
Alhaidari and Alrehan 5
generate a dataset that had many attacks. However, there were no DDoS attacks; moreover, the dataset did not give any details on the network configuration.
Another study in Li et al.27 presented the usage of SVM to detect nodes with suspicious behavior in a VANET environment by considering several input parameters such as the movement speed and transmission range. The dataset was generated by the GloMoSim simulation framework. However, the study did not report the dataset specification, nor is the dataset available online.
Grover et al.28 conducted an experimental work using the NCTUns-5.0 simulator to generate a dataset that was used to train and evaluate several ML techniques on detecting malicious nodes in VANET where Weka had been used to evaluate the specified classifiers. Although it presented the simulation work to generate the dataset, it did not explain the procedure. Moreover, it did not involve a DDoS attack in the generated dataset.
Ali Alheeti et al.29 proposed a smart security framework to protect the outside communication system for autonomous and semi-autonomous vehicles by detecting gray hole and rushing attacks in real time. They simulated the environment using SUMO, MOVE, and NS-2, which generates a trace file to produce a set of features that can be used to differentiate legitimate from malicious behavior. Two ML classifier algorithms were applied, SVM and FeedForward Neural Networks (FFNNs), where results showed that the FFNN model had a lower false negative rate than SVM. However, SVM showed a high performance in terms of detection time and is faster than FFNN. To summarize, a detailed procedure for generating a dataset was presented, but without involving DDoS attack traffic. Moreover, there was insufficient information regarding network configuration parameters.
Aloqaily et al.30 proposed a framework called D2H-IDS as an IDS in vehicle nodes connected through a cloud network. The effectiveness of this solution was validated through simulations where they generated normal traces using NS-3 and used the NSL-KDD dataset47 for generating several attacks including a DoS attack. The features were selected by applying a Deep Belief Network (DBN), and a DT was used for the classification of attacks where results showed high accuracy and low false rates. However, generally, the datasets proposed and generated in Li et al.,27 Grover et al.,28 Ali Alheeti et al.,29 and Aloqaily et al.30 did not fulfill the required criteria due to the unavailability of the online dataset, not reporting the network configurations, and not considering DDoS attacks as a part of the generated dataset.
Singh et al.31 generated a synthetic dataset using an NS-3 simulator and mobility traces produced by a SUMO traffic simulator. In the network simulator, they used about 40 vehicles assuming movements of fixed speed during the simulation time. The simulation was designed to generate the traffic holding features of a wormhole attack. Then, the data were preprocessed and used as a dataset for training both K-nearest neighbor (KNN) and SVM models for detecting wormhole attacks. However, they presented no details on the procedure and methodology of simulating the environment and generating the dataset. Moreover, the dataset can only be used for wormhole attacks.
A study by Singh et al.32 proposed using SVM and LR as ML techniques to detect false position data generated by malicious VANET nodes, known as a false position attack. The evaluation of the detection model was conducted on the VeReMi dataset,41 and the results showed a high accuracy of about 97%. However, the dataset does not involve traces for DDoS attacks.
The VeReMi dataset has been used in Gyawali and Qian33 to validate different ML techniques (LR, K-nearest, DT, bagging, and RF) on detecting misbehavior attacks including both false alert generation and position falsification attacks. Similarly, a study presented in So et al.34 proposed ML-based techniques (KNN and SVM) for detecting misbehavior attacks on VANETs using the VeReMi dataset for training the proposed models. However, the evaluation and prediction of DDoS attacks were not a part of their studies; moreover, the dataset does not involve traces for such attacks.
ML-based studies for DDoS attacks on non-VANET
Several studies have been proposed for detecting DDoS attacks using either existing datasets or simulations to generate their own datasets for the purpose of training and validating different ML classifiers.35–40 The main limitation of these studies is that the datasets used do not capture the characteristics of VANETs or the environment.
Kim et al.35 used the KDD CUP 1999 intrusion detection dataset to train SVM for detecting several attacks including DDoS attacks, and the results showed an effective classification of different attacks with an accuracy of about 85%. Although the KDD CUP 1999 used in Kim et al.35 is available online and contains DDoS attacks, it was not designed for VANETs.
Yu et al.36 proposed a framework to detect DDoS attacks on SDVN environments by implementing three different detection models including a trigger detection model for inbound packets, a flow table feature-based detection model utilizing the features of OpenFlow protocol, and an attack detection model based on SVM. They used a combination of real and generated network traffic to generate a dataset considering different types of DDoS attacks using the Scapy and hping3 tools, and simulation results showed an accuracy of greater than
97%. However, the dataset used is not available online, and the simulation was conducted on virtual machines, which do not reflect VANET characteristics as nodes are mobile in VANET and at different speeds resulting in quick changes to the topology.
Luong et al.37 presented a simulation work to generate a training dataset to be used with KNN classifiers for detecting flooding attacks on MANETs by considering the frequency of route request packets. Similarly, a study presented in Reddy and Thilagam38 conducted a simulation work using Network Simulator (NS-2) for ad hoc networks to evaluate a proposed DDoS attack mitigation technique that relies on the usage of an NB classifier. Gao et al.39 proposed an IDS for DDoS attacks using an RF classifier utilizing big data technologies such as Spark and Hadoop distributed file system for implementing the proposed approach. They used both NSL-KDD47 and UNSW-NB15 (ref. 48) datasets for evaluating the proposed method for detecting DDoS attacks. However, the datasets used in Luong et al.37 and Reddy and Thilagam38 were generated by considering only general ad hoc network characteristics, and are not available online. Similarly, the NSL-KDD and UNSW-NB15 datasets used in Gao et al.39 include traffic for DDoS attacks, but are not properly representative of VANET as they were designed for general network traffic and thus do not capture the characteristics of a VANET environment.
Ali Alheeti and McDonald-Maier40 proposed a hybrid IDS for detecting malicious intrusion attacks such as DDoS and network scanning attacks on autonomous vehicles. They used a multi-layer perceptron (MLP) with fuzzy logic techniques trained on the Kyoto dataset,49 showing an accuracy of 99%. Although the Kyoto dataset holds traffic and features from real network communications, it does not have traffic obtained from VANET communication systems that use different types of protocols and work on a specific style of communication.
Framework-based studies for generating datasets
The last group of studies explored and evaluated in this study comprises those that presented frameworks for generating datasets that can be used for training and evaluating intrusion detection techniques against several types of attacks. Some of these studies were dedicated for VANET41,42 and others were not.43,44
Lyamin et al.42 proposed a heuristic approach derived from data mining methods for real-time detection of radio jamming DoS attacks in a VANET communication environment. To train the proposed detection model, they conducted a simulation experiment using MATLAB to generate a sort of dataset holding cooperative awareness message (CAM) transmissions in the IEEE 802.11p protocol. However, the training model focused only on the CAM transmission, leading to short training sequences of 100 s. Moreover, details on the configuration of the simulated environment and the obtained traces are not available online for further studies. A recently published dataset on VANET environments presented in Van der Heijden et al.41 held many misbehaviors and attacks but did not include DDoS attacks, which are the focus of this study.
A study presented in Damasevicius et al.43 proposed a dataset called LITNET-2020 generated using LITNET NetFlow topology and holding different attacking scenarios including DoS, DDoS, worms, land, and fragmentation attacks. They considered data flow for different protocols including IPv6, transmission control protocol (TCP), user datagram protocol (UDP), and Internet control message protocol (ICMP). The dataset is available online and the study provided a description about the dataset features and the network configuration parameters. However, the proposed dataset is not dedicated for VANETs as it captures neither the properties of VANETs nor the VANET protocols.
A framework was proposed in Al-Hadhrami and Hussain44 for dataset generation that can be used to train and validate IDS models on IoT networks. The dataset is called IoT-DDoS and involves different types of traffic including normal traffic, flooding attacks, selective forwarding attacks, and blackhole attacks considering different protocols such as the RPL routing protocol, ICMPv6, IEEE 802.15.4, 6LoWPAN, and UDP. However, the framework does not take into account the specifications and protocols of VANETs.
As a summary, we believe that there are limited existing solutions for detecting DDoS attacks in VANETs. The limitations are due to the lack of real or synthetic DDoS datasets designed or generated for VANET environments. Furthermore, applying traditional network solutions on VANETs without considering the VANETs' unique characteristics may lead to inaccurate results. Even though some studies have generated datasets considering the VANET environment, they did not illustrate the features available in their datasets and others did not demonstrate the methods they followed to generate the datasets. Thus, it is difficult for other researchers to utilize these datasets as well as to validate and compare their results with such solutions. Consequently, we believe that these datasets cannot be used for further studies owing to one or more of the following reasons: (1) dataset is not available online, (2) dataset does not contain a DDoS attack, (3) dataset is not designed for VANET environments, and (4) unavailability of information regarding network configuration. Table 1 compares these explored datasets and studies based on the fulfillment of these four criteria. Moreover, the table presents other aspects of the evaluated studies including the type of attacks being
Table 1. Comparison of different datasets used for attack detection in VANET.
Reference Used dataset Online Involving Dedicated Network Involved attacks Trained models Performance ratio
availability DDoS for VANET configuration
attack availability
Singh et al.21 Generated dataset ß ß DDoS RF, DT, SVM, Above 90%
boosting, others
Aneja et al.22 Generated dataset ß ß Flooding attack ANN 99%
Karagiannis and Argyriou23 Generated dataset ß ß RF jamming attack K-means NA.
Belenko et al.24 Generated dataset ß ß Including DDoS None NA.
Zeng et al.25 Generated dataset for ß ß DoS, DDoS, blackhole, CNN, LSTM 96.9%
VANET and ISCX IDS wormhole, Sybil
Ghaleb et al.26 NGSIM ß ß Malicious nodes ANN 99%
Li et al.27 Generated dataset ß ß ß Malicious nodes SVM Above 95%
Grover et al.28 Generated dataset ß ß ß Malicious nodes NB, J48, RF 97%
Ali Alheeti et al.29 Generated dataset ß ß ß Gray hole SVM, FFNN 90%
Aloqaily et al.30 NSL-KDD for ß ß ß DoS, Probe, R2L, U2R Deep belief, DT 99.43%
attack traffic
Singh et al.31 Generated dataset ß ß Wormhole attack KNN and SVM 99%
Singh et al.32 VeReMi ß Position falsification LR and SVM 97%
attack
Van der Heijden et al.41 Generated dataset ß Position falsification Non-machine Close to 1
(VeReMi) attack learning
Gyawali and Qian33 VeReMi ß False alert, position LR, K-N, DT, 97%
falsification bagging, RF
So et al.34 VeReMi ß Location spoofing KNN and SVM 94%
Kim et al.35 KDD CUP 1999 ß ß DoS attack SVM 85%
Yu et al.36 Generated dataset ß ß ß DDoS SVM 98.56%
Luong et al.37 Generated dataset ß ß ß Flooding attack KNN Above 99%
Reddy and Thilagam38 Generated dataset ß ß DDoS NB 80%
Gao et al.39 NSL-KDD and ß DDoS RF, SVM, NB 99.9% and 98.7%
UNSW-NB15
Ali Alheeti and Kyoto dataset ß ß Malicious intrusion MLP 99%
McDonald-Maier40
Lyamin et al.42 Generated dataset ß ß ß Radio jamming DoS Heuristic 95%
approach
Damasevicius et al.43 LITNET-2020 ß DDoS, worms, land, and None NA
others
Al-Hadhrami Generated dataset ß ß ß Flooding, forwarding, None NA
and Hussain44 called IoT-DDoS blackhole attacks
VDDD Generated dataset DDoS J48, SVM, ANN, 99.7%
KNN, RF, NB
VANET: Vehicular Ad hoc NETworks; RF: random forest; DT: decision tree; SVM: support-vector machine; ANN: artificial neural network; NA: not applicable; IDS: intrusion detection system; CNN:
convolutional neural network; LSTM: long short-term memory; NGSIM: next generation simulation; NB: naïve Bayes; FFNN: feedforward neural network; LR: logistic regression; K-N: K-nearest;
KNN: K-nearest neighbor; MLP: multi-layer perceptron; VDDD: Vehicular Ad hoc NETwork distributed denial of service dataset.
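The four screening criteria used in this review (online availability, presence of DDoS traffic, VANET-specific design, and reported network configuration) can be expressed as a small check. The sketch below is only illustrative: the class and field names are hypothetical, and the example records carry only the properties stated explicitly in the text, with `None` marking anything the text does not state.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    """One reviewed dataset; None means the text does not state the property."""
    name: str
    online: Optional[bool]           # (1) dataset is available online
    has_ddos: Optional[bool]         # (2) dataset contains DDoS traffic
    vanet_specific: Optional[bool]   # (3) dataset is designed for VANET
    config_reported: Optional[bool]  # (4) network configuration is reported

    def _criteria(self):
        return {
            "online": self.online,
            "has_ddos": self.has_ddos,
            "vanet_specific": self.vanet_specific,
            "config_reported": self.config_reported,
        }

    def unmet(self) -> List[str]:
        """Criteria that are known to be unmet (unknown values are skipped)."""
        return [k for k, v in self._criteria().items() if v is False]

    def usable(self) -> bool:
        """True only when all four criteria are known to be satisfied."""
        return all(v is True for v in self._criteria().values())


# Example records limited to what the review text states explicitly.
litnet = DatasetRecord("LITNET-2020", True, True, False, True)
kdd99 = DatasetRecord("KDD CUP 1999", True, True, False, None)
veremi = DatasetRecord("VeReMi", None, False, None, None)
```

With these records, `litnet.unmet()` reports only the VANET criterion, which matches the reason given in the text for excluding LITNET-2020.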
recording all the traffic events within the configured environment.
Overview of proposed work
The main idea of the proposed work is to simulate a VANET environment considering both normal and DDoS traffic for the purpose of generating a synthetic dataset based on several simulated scenarios to be used as an input for ML methods. The proposed work involves three main stages as shown in Figure 4. Starting from the bottom, the first stage is to generate realistic network mobility traffic using SUMO. The second stage is to import the SUMO mobility traffic into OMNeT++ to generate the network traffic (normal and DDoS) utilizing both Veins and INET. The final stage is to collect and prepare the dataset that will be used for evaluating and studying the performance of several ML algorithms.
We started by using SUMO to prepare the network and generate the traffic. The first step is to export the simulation area, which is the King Fahad Highway, from OpenStreetMap (OSM).54 Then, the OSM file is processed with SUMO's netconvert utility, which transforms the geo-coordinates of the OSM map to metric coordinates; these metric coordinates are utilized in the next step by SUMO. The following command does this task:
netconvert --osm-files *.osm -o *.net.xml
Besides the network file, we need to consider the obstacles found within the scenario such as buildings and parks. OSM files have the advantage of providing such information in addition to other information like streets, lanes, junctions, and the maximum speed for each street. We used the polyconvert utility to generate a poly file, which can be used in Veins to identify all the obstacles, using the following command:
polyconvert --net-file *.net.xml --osm-files *.osm --type-file *.xml -o *.poly.xml
After generating the obstacles file, the SUMO network is established and we proceed to generate the network traffic. There are two options to generate traffic for the vehicles in SUMO. The first one is to generate a random trip and the second one is to design a custom trip in a specific route. In this study, we selected the
second choice and assumed that all vehicles considered in our scenarios cross the King Fahad Highway from Dammam to Al-Khobar. To simulate traffic in our network, we generated four files as follows:
1. Traffic Analysis Zone (TAZ) file, which contains the edges for our route.
2. Origin/Destination (OD) matrix file, which includes the origin point, the destination point, and the number of vehicles passed while taking the route.
3. Od2trips file, which takes the TAZ and OD files as an input. Before generating the fourth file, we combined these three files to generate Od_file.odtrips.xml by running the following command:
od2trips -c PATH\od2trips.config.xml -n PATH\taz_file.taz.xml -d PATH\OD_file.od -o PATH\od_file.odtrips.xml
Figure 5. KingFahadHighway.launchd.xml.
Table 3. Simulation parameters.
Parameter                      Value
Routing protocol               AODV
PHY model                      IEEE 802.11p
Channel                        Wireless
Mobility scenario              Highway (18 km)
Threat                         DDoS
Transport protocol             UDP
Vehicle communication range    550 m
RSU communication range        600 m
Packet size                    100 bytes
RSU                            3
Number of vehicles             20, 60
Speed                          Maximum of road speed
Number of attackers            2
Attack duration                25 s
Attack rate                    10 and 50 pps
Normal rate                    1–5 pps
Run time                       500 s
AODV: ad hoc on-demand distance vector; DDoS: distributed denial of service; UDP: user datagram protocol; RSU: road side unit.
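The three SUMO command-line steps described in this section (netconvert, polyconvert, and od2trips) can be chained from a script. The following is a minimal sketch, not the authors' tooling: the function name, the example file names, and the output naming scheme are assumptions made for illustration, and `--type-file` is assumed to be the polyconvert option that receives the type-map XML file. The function only builds the argument lists and does not run SUMO.

```python
from pathlib import Path


def sumo_commands(osm_file, type_file, od_config, taz_file, od_matrix, out_dir="."):
    """Build the three SUMO command lines as argument lists (nothing is executed).

    All file names are placeholders chosen for illustration.
    """
    out = Path(out_dir)
    stem = Path(osm_file).stem
    net_file = out / f"{stem}.net.xml"        # road network produced by netconvert
    poly_file = out / f"{stem}.poly.xml"      # obstacles (buildings, parks) for Veins
    trips_file = out / "od_file.odtrips.xml"  # trips produced by od2trips

    return [
        # Step 1: convert the OSM export into a SUMO network.
        ["netconvert", "--osm-files", str(osm_file), "-o", str(net_file)],
        # Step 2: extract obstacle polygons from the same OSM export.
        ["polyconvert", "--net-file", str(net_file), "--osm-files", str(osm_file),
         "--type-file", str(type_file), "-o", str(poly_file)],
        # Step 3: turn the TAZ and OD matrix files into vehicle trips.
        ["od2trips", "-c", str(od_config), "-n", str(taz_file),
         "-d", str(od_matrix), "-o", str(trips_file)],
    ]


commands = sumo_commands(
    "highway.osm", "typemap.xml", "od2trips.config.xml", "taz_file.taz.xml", "OD_file.od"
)
```

Each list in `commands` can be passed to `subprocess.run(cmd, check=True)` on a machine where SUMO is installed; building the lists first makes it easy to print and inspect the pipeline before running it.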
Normal and attack traffic was generated using OMNeT++ where each node broadcasts requests to all reachable nodes. All nodes send normal packets in a random manner where the transmission interval is between 1 and 5 s. The attack traffic was generated by specific vehicles to target the victim RSU with two different rates (10 and 50 pps).
We designed these scenarios with their related parameters based on recent studies, which simulated a VANET environment to either study the impact of some attacks or to generate a dataset for VANET environments. Table 4 shows the simulation parameters used by several studies from which we have adapted our parameters shown in Table 5.
Dataset specifications
Generally, evaluating an intrusion detection-based ML model depends on more than the classification accuracy result as many other dimensions should also be considered such as characteristics of the simulation area, the used dataset, and how the normal and attack traffic is being generated.
In this section, we explore the procedure to create VDDD. The following sections illustrate the steps to generate a synthetic dataset on VANET environment. This section starts with the data collection and data preparation steps. After that, we proceeded to the data pre-processing step. Finally, we presented the dataset's feature selection step.
Data collection
In this stage, we collect our data from OMNeT++ for further analysis. Two files are mainly required to generate the dataset, which are the trace file (log file) and simulation results (vector file). Figure 10 illustrates the workflow we followed starting with collecting the raw data and proceeding until we obtained a complete and informative dataset.
The log file holds the events of messages' transmission taking place among modules during the simulation. Among the information recorded in this file are event number, time, source and destination, packet
Li et al.27 900 5, 10, 15, 20, 50, 100, 200 – – 5, 10, 20, 30 m/s
25, 30, 35, 40
Ali Alheeti et al.29 499 4 40 9 – 30 m/s
Aloqaily et al.30 600 – 40 – 8 pps 20 m/s
Aneja et al.22 200 2 20 – – 30 m/s
Belenko et al.24 100 2 30 – 250 Kbps 30 m/s
Haydari and Yilmaz14 200 – 250 – 1 pps –
Siddiqui and Boukerche59 120 1 20, 60 3 10, 50, 100 pps 15 m/s
Number of scenarios First scenario Second scenario Third scenario Fourth scenario
name, source and destination port, and packet length. The vector file records data values as a time series, that is, each value is stored with a timestamp, which is necessary to calculate the features in the upcoming steps. These data values are recorded and captured based on several categories or features. Moreover, OMNeT++ provides several analysis and validation tools that can be used to validate the accuracy of such data vectors. For example, Figure 11 shows a vector plot for all the transmission rates that happened during the simulation.
Data preparation
After collecting the raw data in the previous step, the data are ready to be prepared and processed in such a way that they can be used for evaluating ML techniques. As shown in Figure 12, the raw data go through several stages until we get an informative dataset in a suitable format to be read and analyzed. These stages involve processing the log file obtained from the log viewer, processing the vector file generated by OMNeT++, merging the log and vector files using both Python functions and Jupyter Notebook, and labeling traffic instances using queries in SQLite. The purpose of processing log files is to keep only the important information in the log as well as to clean the data by removing redundant information, thus making it ready to be merged with the vector data. Figure 13 shows the final version of the log file.
The vector file has been exported from OMNeT++ to a SQLite database browser.60 This exported vector file contains 12 tables, with some containing general information such as the simulation run information. We only focused on three tables as shown in Figure 14. The vector table contains information about all modules along with many statistics such as Min, Max, Sum, and others. Figure 15 shows a part of the vector table. The preparation step for this file involves correcting some errors in the vector data table such as correcting the data types of some of the fields and validating the data values exported from OMNeT++.
In order to have an informative dataset, it is necessary to merge the log file with the vector file. To achieve that, we wrote Python functions and used Jupyter Notebook61 to merge these two files by exploring each event in the log file, and then for each event calculating the current, previous, and next time for each node to obtain the 16 selected features' values in the interval between the previous and next time. Figure 16 shows the workflow of the functions that extract the features' values from both the log and vector files. The main idea is to conduct some queries on the data files
14 International Journal of Distributed Sensor Networks
Data normalization. Data normalization is the process of rescaling the dataset attributes to lie in one particular range, for example, between 0 and 1 or −1 and 1, according to equation (1)

$$X = \frac{x - \mathrm{Min}}{\mathrm{Max} - \mathrm{Min}} \qquad (1)$$

Normalizing data often makes the dataset ready for applying any classifier. In addition, to increase accuracy results, we applied normalization to our dataset using Weka.

Feature selection. Feature selection is one of the data reduction methods where selecting features significantly influences the performance as it reduces the training time and improves the accuracy. Conversely, keeping irrelevant or partially relevant features can negatively affect performance. Various feature selection techniques are available today such as correlation-based feature selection (CFS), information gain (IG)–based feature selection, and gain ratio (GR) feature selection.49 A brief description of each of these techniques is presented as follows.
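The min-max rescaling in equation (1) can be sketched in a few lines of Python. This is only an illustration of the formula, not the Weka filter used in the study; the function name `min_max_normalize`, the optional target range, and the sample values are assumptions made for the example.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale a list of numbers with X = (x - Min) / (Max - Min).

    new_min/new_max (hypothetical parameters) stretch the result to another
    range such as [-1, 1]; the defaults reproduce equation (1) exactly.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no spread; map it to the lower bound.
        return [new_min for _ in values]
    span = new_max - new_min
    return [new_min + (v - lo) / (hi - lo) * span for v in values]


# Illustrative attribute values only (not taken from the dataset).
sent_bytes = [208, 128, 336]
scaled = min_max_normalize(sent_bytes)
signed = min_max_normalize(sent_bytes, -1.0, 1.0)
```

Guarding against Max = Min avoids a division by zero for attributes that never change during a run.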
Figure 16. Workflow of merging and calculating the values for dataset features.
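The merging idea described in the text (for each log event, find the previous and next event time of the same node, then collect vector values that fall inside that interval) can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' functions: the names `add_prev_next` and `merge_log_with_vectors`, the dictionary layout, and the choice to sum the values inside the interval are all hypothetical. Using 0 as the previous time of a node's first event follows the example row in the feature table; reusing the event's own time as the next time of a node's last event is an assumption.

```python
from collections import defaultdict


def add_prev_next(events):
    """Attach PreviousValue and NextValue (per node) to each log event."""
    by_node = defaultdict(list)
    for ev in events:
        by_node[ev["node"]].append(ev)
    out = []
    for node_events in by_node.values():
        ordered = sorted(node_events, key=lambda e: e["time"])
        for i, ev in enumerate(ordered):
            prev_t = ordered[i - 1]["time"] if i > 0 else 0.0
            # Assumption: the last event of a node closes its own interval.
            next_t = ordered[i + 1]["time"] if i + 1 < len(ordered) else ev["time"]
            out.append({**ev, "PreviousValue": prev_t, "NextValue": next_t})
    out.sort(key=lambda e: e["time"])
    return out


def merge_log_with_vectors(events, samples):
    """Sum vector samples (node, name, time, value) inside each event interval."""
    merged = []
    for ev in add_prev_next(events):
        row = dict(ev)
        for node, name, t, value in samples:
            if node == ev["node"] and ev["PreviousValue"] <= t <= ev["NextValue"]:
                row[name] = row.get(name, 0) + value  # aggregation is an assumption
        merged.append(row)
    return merged


# Toy data for illustration only.
events = [{"node": "A", "time": 1.0}, {"node": "B", "time": 2.0}, {"node": "A", "time": 3.0}]
samples = [("A", "sentPk", 0.5, 100), ("A", "sentPk", 2.0, 50),
           ("A", "sentPk", 3.5, 10), ("B", "rcvdPk", 2.0, 7)]
merged = merge_log_with_vectors(events, samples)
```

The nested loop keeps the sketch short; for simulation-sized data the same interval lookup would more reasonably be done with an indexed query or a sorted search.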
1 No h Sequence number 1
2 Event h Id of event 56
3 Time h Event time 1.921908541
4 PreviousValue h Previous event time 0
5 NextValue h Next event time 4.112046962
6 SourceName h Name of source/sender node Node [1]
7 Packet h Name of the packet AODV-RREQ
8 PacketType h Type of packet (UDP, TCP) Udp
9 SourceIP h Source IP address 10.0.0.169
10 SourcePort h Source port number 9003
11 DestIP h Destination IP address 10.0.0.5
12 DestPort h Destination port number 9002
13 transmissionState h Transmission state of the radio 1
14 throughput Throughput 116
15 sentPk h Number of sent bytes 208
16 sentDownPk h Packets sent to lower layer 128
17 receptionState Reception state of the radio 48
18 rcvdPkSeqNo Sequence number of received packets 1
19 rcvdPkFromLL Packets received from lower layer 256
20 rcvdPkFromHL Packets received from higher layer 256
21 rcvdPk Number of received bytes 336
22 radioMode h Requested radio operational mode 1
23 queueingTime h Queueing time 1
24 queueLength h Queue length 3
25 passedUpPk Packets bytes passed to higher layer 128
26 passedUpPkCount Packets count passed to higher layer 1
27 endToEndDelay End to end delay 1
28 droppedPkNotForUs h Drop packet not addressed to us 0
29 class Label Represent traffic type Normal, DDoS
UDP: user datagram protocol; TCP: transmission control protocol; IP: Internet protocol; DDoS: distributed denial of service.
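The class column listed above takes the values Normal and DDoS, and the text states that traffic instances were labeled using queries in SQLite. The exact queries are not given, so the following is only a sketch of one plausible approach: it assumes a hypothetical table `traffic` holding the merged records and assumes that the attacker nodes are known from the simulation configuration, so every record sent by such a node is marked as DDoS.

```python
import sqlite3


def label_traffic(conn, attacker_nodes):
    """Mark rows sent by attacker nodes as 'DDoS' and all other rows as 'Normal'."""
    conn.execute("UPDATE traffic SET class = 'Normal'")
    if attacker_nodes:
        marks = ",".join("?" for _ in attacker_nodes)
        conn.execute(
            f"UPDATE traffic SET class = 'DDoS' WHERE SourceName IN ({marks})",
            list(attacker_nodes),
        )
    conn.commit()


# Hypothetical in-memory table with a few illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traffic (Event INTEGER, SourceName TEXT, class TEXT)")
conn.executemany(
    "INSERT INTO traffic (Event, SourceName) VALUES (?, ?)",
    [(56, "Node[1]"), (57, "Node[2]"), (58, "Node[1]"), (59, "Node[3]")],
)
label_traffic(conn, ["Node[1]"])
labels = conn.execute("SELECT Event, class FROM traffic ORDER BY Event").fetchall()
```

Passing the node names as bound parameters rather than formatting them into the SQL string keeps the query safe for names that contain brackets or quotes.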
VDDD: Vehicular Ad hoc NETwork distributed denial of service dataset; DDoS: distributed denial of service; VANET: Vehicular Ad hoc NETworks; RSU: road side unit; UDP: user datagram protocol; IP: Internet protocol.
SDVN: software-defined vehicular networking; VANET: Vehicular Ad hoc NETworks; TCP: transmission control protocol; UDP: user datagram protocol; ICMP: Internet control message protocol; VDDD: Vehicular Ad hoc NETwork distributed denial of service dataset.
One of the important points to be considered is the feature set. In this study, we provided all the available features from the simulation experiments and then we applied feature selection techniques to select the feature set that gives the best accuracy. Table 13 summarizes the features recently studied and used in detecting DDoS attacks on VANETs and other environments using ML techniques.
To validate VDDD, we examined the performance and accuracy of the selected features with different ML techniques including J48, SVM, RF, KNN, ANN, and NB, which are commonly used with DDoS attack detection as shown in Section “Literature review.” Here, we used the VDDD generated for the fourth scenario discussed earlier in section “Scenarios.” The experimental results have been evaluated in terms of the accuracy, precision, recall, and F1-score. The accuracy reflects the percentage of correctly classified instances recorded in the test dataset. The precision criterion measures the ratio of total relevant results that are correctly classified as positives out of all the samples that are predicted as positives. The recall criterion measures the ratio of total relevant results that are correctly classified as positives out of all samples that are actually positive. Finally, the F1-score, also called the F-score, combines both recall and precision to reflect the test's accuracy. The experiments were conducted using the Weka tool with classifiers' parameters shown in Table 14 for all the applied ML classifiers.
The results of the classification on VDDD presented in Table 15 show that all the applied classifiers achieved high detection accuracies generally greater than 99% except for SVM, which shows an accuracy of 97%. The RF classifier achieved the highest accuracy at 99.7% compared to other applied ML classifiers. Table 16 presents the confusion matrices for each classifier. Based on these statistics, Figure 17 presents the false rates for classifiers, where SVM shows a higher false rate compared to other classifiers.
To calculate the computing time for our proposed approach, we subtracted the time taken to generate the whole dataset. The focus was on the other computational aspects, namely, the classifier's building time and feature weight calculation time.7 We conducted several experiments on VDDD starting with the feature selection method and then executing all the studied ML
KNN: K-nearest neighbor; SVM: support-vector machine; ANN: artificial neural network; RF: random forest; NB: naïve Bayes.
Table 15. Evaluating results of ML classifiers on VDDD fourth scenario.
Fourth scenario
Classifier    Accuracy    Precision    Recall    F1

Table 16. Confusion matrix for fourth scenario.
J48                    KNN
10,069    46           10,072    43
11    10,153           11    10,153
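The four evaluation criteria defined in the text can be computed directly from a two-class confusion matrix. The sketch below applies them to the J48 counts printed in Table 16. The function name `classification_metrics` is hypothetical, and reading the first row as (true positives, false negatives) and the second row as (false positives, true negatives) is an assumption, since the row and column labels of the table are not reproduced here.

```python
def classification_metrics(tp, fn, fp, tn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # correct among predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # correct among actual positives
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0  # harmonic mean of the two
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# J48 counts from Table 16; the assignment of cells to TP/FN/FP/TN is an assumption.
j48 = classification_metrics(tp=10069, fn=46, fp=11, tn=10153)
for name, value in j48.items():
    print(f"{name}: {value:.4f}")
```

Swapping which class is treated as positive changes precision and recall but leaves the accuracy unchanged, so the accuracy is the one figure that does not depend on the layout assumption.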
conference on computational intelligence and informatics. Singapore: Springer, 2018, pp.457–466.
4. Ghebleh R. A comparative classification of information dissemination approaches in vehicular ad hoc networks from distinctive viewpoints: a survey. Comput Netw 2018; 131: 15–37.
5. Gruebler A, McDonald-Maier KD and Alheeti KMA. An intrusion detection system against black hole attacks on the communication network of self-driving cars. In: Proceedings of the 2015 6th international conference on emerging security technologies (EST), Braunschweig, 3–5 September 2015, pp.86–91. New York: IEEE.
6. Brendha R and Prakash VSJ. A survey on routing protocols for vehicular Ad Hoc networks. In: Proceedings of the 2017 4th international conference on advanced computing and communication systems (ICACCS), Coimbatore, India, 6–7 January 2017, pp.1–7. New York: IEEE.
7. Arif M, Wang G, Geman O, et al. SDN-based VANETs, security attacks, applications, and challenges. Appl Sci 2020; 10(9): 3217.
8. Arif M, Wang G and Balas VE. Secure VANETs: trusted communication scheme between vehicles and infrastructure based on fog computing. Stud Inform Control 2018; 27(2): 235–246.
9. Irshad A, Usman M, Chaudhry SA, et al. A provably secure and efficient authenticated key agreement scheme for energy Internet-based vehicle-to-grid technology framework. IEEE T Ind Appl 2020; 56(4): 4425–4435.
10. Hussain R, Hussain F and Zeadally S. Integration of VANET and 5G security: a review of design and implementation issues. Future Gener Comp Sy 2019; 101: 843–864.
11. Khelifi H, Luo S, Nour B, et al. Security and privacy issues in vehicular named data networks: an overview. Mob Inf Syst 2018; 2018: 5672154.
12. Pathre A, Agrawal C and Jain A. A novel defense scheme against DDOS attack in VANET. In: Proceedings of the 2013 10th international conference on wireless and optical communications networks (WOCN), Bhopal, India, 26–28 July 2013, pp.1–5. New York: IEEE.
13. Kolandaisamy R, Noor RM, Ahmedy I, et al. A multivariant stream analysis approach to detect and mitigate DDoS attacks in vehicular ad hoc networks. Wirel Commun Mob Com 2018; 2018: 2874509.
14. Haydari A and Yilmaz Y. Real-time detection and mitigation of DDoS attacks in intelligent transportation systems. In: Proceedings of the 2018 21st international conference on intelligent transportation systems (ITSC), Maui, HI, 4–7 November 2018, pp.157–163. New York: IEEE.
15. Shabbir M, Khan MA, Khan US, et al. Detection and prevention of distributed denial of service attacks in VANETs. In: Proceedings of the 2016 international conference on computational science and computational intelligence (CSCI), Las Vegas, NV, 15–17 December 2016, pp.970–974. New York: IEEE.
16. Mirsadeghi F, Rafsanjani MK and Gupta BB. A trust infrastructure based authentication method for clustered vehicular ad hoc networks. Peer Peer Netw Appl. Epub ahead of print 24 October 2020. DOI: 10.1007/s12083-020-01010-4.
17. Bhushan K and Gupta BB. Distributed denial of service (DDoS) attack mitigation in software defined network (SDN)-based cloud computing environment. J Amb Intel Hum Comp 2019; 10(5): 1985–1997.
18. Kolandaisamy R, Noor RM, Z'aba MR, et al. Adapted stream region for packet marking based on DDoS attack detection in vehicular ad hoc networks. J Supercomput 2020; 76: 5948–5970.
19. Kolandaisamy R, Noor RM, Kolandaisamy I, et al. A stream position performance analysis model based on DDoS attack detection for cluster-based routing in VANET. J Amb Intel Hum Comp. Epub ahead of print 3 July 2020. DOI: 10.1007/s12652-020-02279-2.
20. Bensalah F, Elkamoun N and Baddi Y. SDNStat-Sec: a statistical defense mechanism against DDoS attacks in SDN-based VANET. In: Saeed F, Al-Hadhrami T, Mohammed F, et al. (eds) Advances on smart and soft computing, vol. 1188. Singapore: Springer, 2020, pp.527–540.
21. Singh PK, Jha SK, Nandi SK, et al. ML-based approach to detect DDoS attack in V2I communication under SDN architecture. In: Proceedings of the TENCON2018 - 2018 IEEE region 10 conference, Jeju, South Korea, 28–31 October 2019, pp.144–149. New York: IEEE.
22. Aneja MJS, Bhatia T, Sharma G, et al. Artificial intelligence based intrusion detection system to detect flooding attack in VANETs. In: Shrivistava G, Kumar P, Gupta BB, et al. (eds) Handbook of research on network forensics and analysis techniques. Hershey, PA: IGI Global, 2018, pp.87–100.
23. Karagiannis D and Argyriou A. Jamming attack detection in a pair of RF communicating vehicles using unsupervised machine learning. Veh Commun 2018; 13: 56–63.
24. Belenko V, Krundyshev V and Kalinin M. Synthetic datasets generation for intrusion detection in VANET. In: Proceedings of the 11th international conference on security of information and networks (SIN'18), Cardiff, 10–12 September 2018, pp.1–6. New York: ACM.
25. Zeng Y, Qiu M, Zhu D, et al. DeepVCM: a deep learning based intrusion detection method in VANET. In: Proceedings of the IEEE 5th international conference on big data security on cloud, Washington, DC, 27–29 May 2019, pp.288–293. New York: IEEE.
26. Ghaleb FA, Zainal A, Rassam MA, et al. An effective misbehavior detection model using artificial neural network for vehicular ad hoc network applications. In: Proceedings of the 2017 IEEE conference on application, information and network security (AINS), Miri, Malaysia, 13–14 November 2017, pp.13–18. New York: IEEE.
27. Li W, Joshi A and Finin T. SVM-CASE: an SVM-based context aware security framework for vehicular ad-hoc networks. In: Proceedings of the 2015 IEEE 82nd vehicular technology conference (VTC2015-Fall), Boston, MA, 6–9 September 2015, pp.1–5. New York: IEEE.
28. Grover J, Prajapati NK, Laxmi V, et al. Machine learning approach for multiple misbehavior detection in VANET. Comm Com Inf Sc 2011; 192: 644–653.
29. Ali Alheeti KM, Gruebler A and McDonald-Maier K. Intelligent intrusion detection of grey hole and rushing
attacks in self-driving vehicular networks. Computers 2016; 5(3): 16.
30. Aloqaily M, Otoum S, Al Ridhawi I, et al. An intrusion detection system for connected vehicles in smart cities. Ad Hoc Netw 2019; 90: 101842.
31. Singh PK, Gupta RR, Nandi SK, et al. Machine learning based approach to detect wormhole attack in VANETs. In: Proceedings of the workshops of the international conference on advanced information networking and applications, Matsue, Japan, 27–29 March 2019, pp.651–661. Cham: Springer.
32. Singh PK, Gupta S, Vashistha R, et al. Machine learning based approach to detect position falsification attack in VANETs. In: Proceedings of the international conference on security and privacy, Jaipur, India, 9–11 January 2019, pp.166–178. Singapore: Springer.
33. Gyawali S and Qian Y. Misbehavior detection using machine learning in vehicular communication networks. In: Proceedings of the ICC 2019 - 2019 IEEE international conference on communications (ICC), Shanghai, China, 20–24 May 2019, pp.1–6. New York: IEEE.
34. So S, Sharma P and Petit J. Integrating plausibility checks and machine learning for misbehavior detection in VANET. In: Proceedings of the 2018 17th IEEE international conference on machine learning and applications (ICMLA), Orlando, FL, 17–20 December 2018, pp.564–571. New York: IEEE.
35. Kim M, Jang I, Choo S, et al. Collaborative security attack detection in software-defined vehicular networks. In: Proceedings of the 2017 19th Asia-Pacific network operations and management symposium (APNOMS), Seoul, South Korea, 27–29 September 2017, pp.19–24. New York: IEEE.
36. Yu Y, Guo L, Liu Y, et al. An efficient SDN-based DDoS attack detection and rapid response platform in vehicular networks. IEEE Access 2018; 6: 44570–44579.
37. Luong NT, Vo TT and Hoang D. FAPRP: a machine learning approach to flooding attacks prevention routing protocol in mobile ad hoc networks. Wirel Commun Mob Com 2019; 2019: 6869307.
38. Reddy KG and Thilagam PS. Naïve Bayes classifier to mitigate the DDoS attacks severity in ad-hoc networks. Int J Comm Network Inform Secur 2020; 12(2): 221–226.
39. Gao Y, Wu H, Song B, et al. A distributed network intrusion detection system for distributed denial of service attacks in vehicular ad hoc network. IEEE Access 2019; 7: 154560–154571.
40. Ali Alheeti KM and McDonald-Maier K. Intelligent intrusion detection in external communication systems for autonomous vehicles. Syst Sci Control Eng 2018; 6(1): 48–56.
41. Van der Heijden RW, Lukaseder T and Kargl F. VeReMi: a dataset for comparable evaluation of misbehavior detection in VANETs. In: Beyah R, Chang B, Li Y, et al. (eds) Security and privacy in communication networks (SecureComm 2018; Lecture notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering), vol. 254. Cham: Springer, 2018, pp.318–337.
42. Lyamin N, Kleyko D, Delooz Q, et al. Real-time jamming DoS detection in safety-critical V2V C-ITS using data mining. IEEE Commun Lett 2019; 23(3): 442–445.
43. Damasevicius R, Venckauskas A, Grigaliunas S, et al. LITNET-2020: an annotated real-world network flow dataset for network intrusion detection. Electronics 2020; 9(5): 800.
44. Al-Hadhrami Y and Hussain FK. Real time dataset generation framework for intrusion detection systems in IoT. Future Gener Comp Sy 2020; 108: 414–423.
45. Alrehan AM and Alhaidari FA. Machine learning techniques to detect DDoS attacks on VANET system: a survey. In: Proceedings of the 2019 2nd international conference on computer applications and information security (ICCAIS), Riyadh, Saudi Arabia, 1–3 May 2019, pp.1–6. New York: IEEE.
46. Shiravi A, Shiravi H, Tavallaee M, et al. Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Comput Secur 2012; 31(3): 357–374.
47. Tavallaee M, Bagheri E, Lu W, et al. A detailed analysis of the KDD CUP 99 dataset. In: Proceedings of the 2009 IEEE symposium on computational intelligence for security and defense applications, Ottawa, ON, Canada, 8–10 July 2009, pp.1–6. New York: IEEE.
48. Moustafa N and Slay J. UNSW-NB15: a comprehensive dataset for network intrusion detection systems (UNSW-NB15 network dataset). In: Proceedings of the 2015 military communications and information systems conference (MilCIS), Canberra, ACT, Australia, 10–12 November 2015, pp.1–6. New York: IEEE.
49. Kyoto dataset, 2016, https://www.takakura.com/Kyoto_data/BenchmarkData-Description-v5.pdf (accessed 16 November 2020).
50. Varga A. OMNeT++. In: Wehrle K, Güneş M and Gross J (eds) Modeling and tools for network simulation (1st ed.) Berlin, Heidelberg: Springer, 2010, pp.35–59.
51. Lopez PA, Behrisch M, Bieker-Walz L, et al. Microscopic traffic simulation using SUMO. In: 2018 21st International conference on intelligent transportation systems (ITSC), Maui, HI, USA, 4–7 November 2018, pp.2575–2582. New York: IEEE.
52. INET. INET framework, https://inet.omnetpp.org/ (accessed 14 July 2019).
53. Sommer C, German R and Dressler F. Bidirectionally coupled network and road traffic simulation for improved IVC analysis. IEEE T Mobile Comput 2011; 10(1): 3–15.
54. Haklay M and Weber P. OpenStreetMap: user-generated street maps. IEEE Pervas Comput 2008; 7(4): 12–18.
55. Fathy M, Firouzjaee SG and Raahemifar K. Improving QoS in VANET using MPLS. Procedia Comput Sci 2012; 10: 1018–1025.
56. Kotenko I and Ulanov A. Agent-based simulation of DDOS attacks and defense mechanisms. Int J Comput 2014; 4(2): 113–123.
57. Kaur R, Sangal AL and Kumar K. Modeling and simulation of DDoS attack using Omnet++. In: Proceedings of the 2014 international conference on signal processing and integrated networks (SPIN), Noida, India, 20–21 February 2014, pp.220–225. New York: IEEE.
58. Alzahrani S and Hong L. Generation of DDoS attack dataset for effective IDS development and evaluation. J Inf Secur 2018; 9(4): 225–241.
59. Siddiqui AJ and Boukerche A. On the impact of DDoS attacks on software-defined Internet-of-vehicles control plane. In: Proceedings of the 2018 14th international wireless communications and mobile computing conference (IWCMC), Limassol, Cyprus, 25–29 June 2018, pp.1284–1289. New York: IEEE.
60. SQLite Browser. DB Browser for SQLite, https://sqlitebrowser.org/ (accessed 14 July 2019).
61. Project Jupyter. Project Jupyter home, https://jupyter.org/index.html (accessed 14 July 2019).
62. Karegowda AG, Manjunath AS and Jayaram MA. Comparative study of attribute selection using gain ratio and correlation based feature selection. Int J Inf Technol Knowl Manag 2010; 2(2): 271–277.
63. Hall MA. Correlation-based feature selection for machine learning. PhD Dissertation, Department of Computer Science, Waikato University, Hamilton, New Zealand, 1999.
64. Novaković J, Strbac P and Bulatović D. Toward optimal feature selection using ranking methods and classification algorithms. Yugosl J Oper Res 2011; 21(1): 119–135.
65. Li J, Cheng K, Wang S, et al. Feature selection: a data perspective. ACM Comput Surv 2018; 50(6): 94.
66. Chawla NV, Bowyer KW, Hall LO, et al. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002; 16: 321–357.
67. Jain S, Kotsampasakou E and Ecker GF. Comparing the performance of meta-classifiers - a case study on selected imbalanced datasets relevant for prediction of liver toxicity. J Comput Aid Mol Des 2018; 32(5): 583–590.
68. Sharafaldin I, Gharib A, Lashkari AH, et al. Towards a reliable intrusion detection benchmark dataset. Softw Netw 2017; 2017(1): 177–200.
69. Omar L and Ivrissimtzis I. Using theoretical ROC curves for analysing machine learning binary classifiers. Pattern Recogn Lett 2019; 128(6): 447–451.