Professional Documents
Culture Documents
DATASET
DATASET
Emulation
Arvin Hekmati Eugenio Grippo Bhaskar Krishnamachari
hekmati@usc.edu egrippo@usc.edu bkrishna@usc.edu
University of Southern California University of Southern California University of Southern California
Los Angeles, California, USA Los Angeles, California, USA Los Angeles, California, USA
We present in this work first a data set obtained from a real urban
IoT system in a large city consisting of more than 4000 spatially
distributed sensors. The data consists of the binary activity status
of each node at a granularity of 30 seconds over a period of one
month under a benign (non-attacked) setting.
To make the dataset useful for training machine learning tools
for DDoS detection, we need to augment it through synthetic attack
emulation. Therefore, in addition to the raw (benign) activity data,
we also provide a dedicated script that generates attacks in the
proposed dataset synthetically. Our script allows the setting of
multiple parameters: number of nodes to be processed, total attack
duration, attack ratio, starting time of the attack; and each particular
node is equipped with a time-stamp and an output that varies
binary between 0 (no attack) and 1 (attacked node). To illustrate
the utility of our dataset and attack emulator, we design, train,
and implement a simple supervised feed-forward NN model to
detect malicious attacks utilizing the provided dataset. We make the
dataset and the attack emulation script along with our illustrative Figure 1: Active Nodes Percentage vs Time
NN model available as an open-source repository online at https:
//github.com/ANRGUSC/Urban_IoT_DDoS_Data.
The paper is structured as follows: section 2 presents the original
dataset and some statistics about it. The attack and defense mecha-
nism are presented in section 3. Finally, we conclude the paper in
section 4.
(a) Histogram of mean activity time per node - From 8 AM to 8 (b) Histogram of mean inactivity time per node - From 8 AM to
PM 8 PM
(c) Histogram of mean activity time per node - From 8 PM to 8 (d) Histogram of mean inactivity time per node - From 8 PM to
AM 8 AM
mean active times and mean inactive times tend to be higher in the even by focusing on activity status only, we are effectively emulat-
nights compared to the days. ing more challenging futuristic DDoS attacks that may be hard to
detect from a single node’s traffic.
A script is provided that can be used for generating attacks over
3 ATTACK AND DEFENSE MECHANISM the dataset. Three parameters can be set in generating the attacks:
This section presents how synthetic DDoS attacks are generated the start time of the attack, the duration of the attack, and the
on the IoT nodes. Furthermore, here we define the training dataset percentage of the nodes that go under attack. In our experiments,
features and also the detection mechanism. we used one week’s worth of dataset for generating attacks. The
attacks are started at 2 AM on each day of the week over all of the
3.1 Generating Attack Dataset nodes with durations of 1, 2, 4, 8, 16 hours.
time windows, namely as 1, 10, 30, minutes and 1, 2, 4, 8, 16, 24, 30,
36, 42, 48 hours, to calculate the mean activity time of the nodes.
[4] Yang Lu and Li Da Xu. Internet of things (iot) cybersecurity research: A review [31] Iman Sharafaldin, Arash Habibi Lashkari, Saqib Hakak, and Ali A Ghorbani. De-
of current research topics. IEEE Internet of Things Journal, 6(2):2103–2115, 2018. veloping realistic distributed denial of service (ddos) attack dataset and taxonomy.
[5] Dhuha Khalid Alferidah and NZ Jhanjhi. Cybersecurity impact over bigdata and In 2019 International Carnahan Conference on Security Technology (ICCST), pages
iot growth. In 2020 International Conference on Computational Intelligence (ICCI), 1–8. IEEE, 2019.
pages 103–108, 2020. [32] DARPA intrusion detection evaluation dataset, Lincoln Laboratory, Mas-
[6] Ibrahim Alrashdi, Ali Alqazzaz, Esam Aloufi, Raed Alharthi, Mohamed Zohdy, sachusetts Institute of Technology. https://www.ll.mit.edu/r-d/datasets/2000-
and Hua Ming. Ad-iot: Anomaly detection of iot cyberattacks in smart city darpa-intrusion-detection-scenario-specific-datasets. Accessed: 09-12-2021.
using machine learning. In 2019 IEEE 9th Annual Computing and Communication [33] University of New Brunswick, “DDoS Evaluation Dataset (CICDDoS2019)”,unb.ca,
Workshop and Conference (CCWC), pages 0305–0310, 2019. 2019. https://www.unb.ca/cic/datasets/ddos-2019.html. Accessed: 09-13-2021.
[7] Elisa Bertino and Nayeem Islam. Botnets and internet of things security. Com- [34] University of New Brunswick, “CSE-CIC-IDS2018 on AWS,” 2018. https://www.
puter, 50(2):76–79, 2017. unb.ca/cic/datasets/ids-2018.html. Accessed: 09-13-2021.
[8] Constantinos Kolias, Georgios Kambourakis, Angelos Stavrou, and Jeffrey Voas. [35] Canadian Institute for Cybersecurity, “CICIDS2017,” unb.ca, 2017. https://www.
Ddos in the iot: Mirai and other botnets. Computer, 50(7):80–84, 2017. unb.ca/cic/datasets/ids-2017.html. Accessed: 09-13-2021.
[9] Roger Hallman, Josiah Bryan, Geancarlo Palavicini, Joseph Divita, and Jose [36] Frank Beer, Tim Hofer, David Karimi, and Ulrich Bühler. A new attack composi-
Romero-Mariona. Ioddos-the internet of distributed denial of sevice attacks. tion for network security. In Paul Müller, Bernhard Neumair, Helmut Raiser, and
In 2nd international conference on internet of things, big data and security. Gabi Dreo Rodosek, editors, 10. DFN-Forum Kommunikationstechnologien, pages
SCITEPRESS, pages 47–58, 2017. 11–20, Bonn, 2017. Gesellschaft für Informatik e.V.
[10] Bruno Bogaz Zarpelo, Rodrigo Sanches Miani, Cludio Toshio Kawakani, and [37] Derya Erhan and Emin Anarım. Boğaziçi university distributed denial of service
Sean Carlisto de Alvarenga. A survey of intrusion detection in internet of things. dataset. Data in brief, 32:106187, 2020.
J. Netw. Comput. Appl., 84(C):25–37, April 2017. [38] A Scheme for Generating a Dataset for Anomalous Activity Detection in IoT Net-
[11] Sebastián García, Alejandro Zunino, and Marcelo Campo. Survey on network- works. https://sites.google.com/view/iot-network-intrusion-dataset. Accessed:
based botnet detection methods. Security and Communication Networks, 7(5):878– 09-12-2021.
903, 2014. [39] University of California Irvine, Machine Learning Repository. https://archive.
[12] Saman Taghavi Zargar, James Joshi, and David Tipper. A survey of defense ics.uci.edu/ml/datasets/detection_of_IoT_botnet_attacks_N_BaIoT. Accessed:
mechanisms against distributed denial of service (ddos) flooding attacks. IEEE 09-12-2021.
communications surveys & tutorials, 15(4):2046–2069, 2013. [40] University of New South Wales, The Bot-IoT Dataset. https://research.unsw.edu.
[13] Manos Antonakakis, Tim April, Michael Bailey, Matt Bernhard, Elie Bursztein, au/projects/bot-iot-dataset. Accessed: 09-12-2021.
Jaime Cochran, Zakir Durumeric, J Alex Halderman, Luca Invernizzi, Michalis [41] Yair Meidan, Michael Bohadana, Yael Mathov, Yisroel Mirsky, Asaf Shabtai, Do-
Kallitsis, et al. Understanding the mirai botnet. In 26th {USENIX } security minik Breitenbacher, and Yuval Elovici. N-baiot—network-based detection of iot
symposium ( {USENIX } Security 17), pages 1093–1110, 2017. botnet attacks using deep autoencoders. IEEE Pervasive Computing, 17(3):12–22,
[14] Hamdija Sinanović and Sasa Mrdovic. Analysis of mirai malicious software. In 2018.
2017 25th International Conference on Software, Telecommunications and Computer
Networks (SoftCOM), pages 1–5. IEEE, 2017.
[15] K Vengatesan, Abhishek Kumar, M Parthibhan, Achintya Singhal, and R Rajesh.
Analysis of mirai botnet malware issues and its prediction methods in internet of
things. In International conference on Computer Networks, Big data and IoT, pages
120–126. Springer, 2018.
[16] Constantinos Kolias, Georgios Kambourakis, Angelos Stavrou, and Jeffrey Voas.
Ddos in the iot: Mirai and other botnets. Computer, 50(7):80–84, 2017.
[17] Christopher D. McDermott, Farzan Majdani, and Andrei V. Petrovski. Botnet
detection in the internet of things using deep learning approaches. In 2018
International Joint Conference on Neural Networks (IJCNN), pages 1–8, 2018.
[18] Joel Margolis, Tae Tom Oh, Suyash Jadhav, Young Ho Kim, and Jeong Neyo Kim.
An in-depth analysis of the mirai botnet. In 2017 International Conference on
Software Security and Assurance (ICSSA), pages 6–12, 2017.
[19] Artur Marzano, David Alexander, Osvaldo Fonseca, Elverton Fazzion, Cristine
Hoepers, Klaus Steding-Jessen, Marcelo H. P. C. Chaves, Ítalo Cunha, Dorgival
Guedes, and Wagner Meira. The evolution of bashlite and mirai iot botnets. In
2018 IEEE Symposium on Computers and Communications (ISCC), pages 00813–
00818, 2018.
[20] Niharika Sharma, Amit Mahajan, and V Malhotra. Machine learning techniques
used in detection of dos attacks: a literature review. International Journal of
Advance Research in Computer Science and Software Engineering, 6(3):100–105,
2016.
[21] Kevin Gurney. An introduction to neural networks. CRC press, 2018.
[22] Ethem Alpaydin. Introduction to machine learning, 3rd editio. ed, 2014.
[23] Amer Zayegh and N Bassam. Neural network principles and applications, vol-
ume 10. London: Pearson, 2018.
[24] University of New Brunswick, Canadian Institute for Cybersecurity. https:
//www.unb.ca/cic/datasets/. Accessed: 09-11-2021.
[25] University of California Irvine, KDD Archive. https://www.kdd.org/kdd-cup/
view/kdd-cup-1999/Data. Accessed: 09-11-2021.
[26] University of New South Wales, The UNSW-NB15 Dataset. https://research.unsw.
edu.au/projects/unsw-nb15-dataset. Accessed: 09-11-2021.
[27] Dilara Gümüşbaş, Tulay Yıldırım, Angelo Genovese, and Fabio Scotti. A compre-
hensive survey of databases and deep learning methods for cybersecurity and
intrusion detection systems. IEEE Systems Journal, 15(2):1717–1731, 2021.
[28] Markus Ring, Sarah Wunderlich, Dominik Grüdl, Dieter Landes, and Andreas
Hotho. Flow-based benchmark data sets for intrusion detection. In Proceedings of
the 16th European Conference on Cyber Warfare and Security. ACPI, pages 361–369,
2017.
[29] Markus Ring, Sarah Wunderlich, Dominik Grüdl, Dieter Landes, and Andreas
Hotho. Creation of flow-based data sets for intrusion detection. Journal of
Information Warfare, 16:40–53, 2017.
[30] The CAIDA UCSD "DDoS Attack 2007" Datasett. https://www.caida.org/catalog/
datasets/ddos-20070804_dataset. Accessed: 09-12-2021.