
DATA DRIVEN SKILL ASSESSMENT FOR CYBERSECURITY EXERCISES

Saif Hassan¹, Sarang Shaikh², Muhammad Mudassar Yamin², Basel Katt², Ali Shariq Imran², Mohib Ullah²

¹Department of Computer Science, Sukkur IBA University, Pakistan.
²Norwegian University of Science and Technology, Gjøvik, Norway.

ABSTRACT

Measuring an individual's cybersecurity competence in cybersecurity training is a challenging and complex task. It involves a wide range of skill assessments, ranging from reconnaissance to post-exploitation, in order to achieve some objectives on a target. Traditionally, flags (correct answer submissions) were used to measure individuals' competence, which provides a holistic view of their performance in accomplishing a particular cybersecurity task. In this work, we investigate a deep learning-based diffusion model for measuring individual competence based on observed skills during a cybersecurity exercise. Our method helped identify weaknesses in an individual's skill set, so a targeted intervention can be suggested to overcome the weaknesses and raise overall cybersecurity competence.

Index Terms— Cybersecurity, Education, Skill assessment

1. INTRODUCTION

Due to the ever-increasing digitization of the world's infrastructure, the need for trained individuals is rising to enable its security. Due to cybersecurity's practical nature, conventional teaching techniques like lectures and seminars cannot adequately educate people about it [1]. Practical operational cybersecurity exercises are one suitable way to teach hands-on cybersecurity skills. These exercises use different evaluation techniques to assess the individuals' performance, challenge difficulty, and skill set. Flags were primarily used for assessing individual performance [2]. They are usually text strings that the exercise participant needs to access and submit to receive score points according to the challenge difficulty [3]. The participant's level of competence was determined by the number of flags they submitted. It is important to note that all challenges carry equal weight in this regard. Challenge difficulty is assigned based on the creator's subjective experience, which makes the difficulty variable: some participants may find a given task easy, while others struggle to solve it. Dynamic scoring was created to address this issue. Flag score points are derived from the number of solutions with respect to time: points for a flag decreased as more exercise participants captured it, indicating an easier challenge. Dynamic scoring helped to assess the challenge difficulty. However, it cannot map the challenges and their difficulties to the skill set required to solve the challenges and the corresponding level of competence. Researchers employed pre- and post-exercise surveys to collect qualitative data about the exercise participants' skills and their skill improvement before and after the exercise [4, 5]. Such qualitative data only provides a holistic view of participants' skill sets. Cybersecurity exercises involve complex hands-on cybersecurity operations, which can be executed through sets of different system commands [6]. Researchers are carrying out a multitude of research activities [7, 8] to identify the skill set of exercise participants using textual data extracted from logs of command execution. This makes cybersecurity skill assessment through assigned tasks subject to the challenge creator's judgment, as a single task can be completed in multiple creative ways. Therefore, we focus on the steps that an individual takes to complete the task and then map them to known attack models like the Cyber kill chain to identify their concrete skills and deficiencies. In this research work, we investigate the following research questions:

1. How can we measure the skill set of an exercise participant based on the Cyber kill chain?

2. How can we visualize the measured skill set into a model for the identification of skills and deficiencies?

3. How can we validate the developed model for improving the skill set of exercise participants?

To answer the first research question, we use a machine learning-based approach [9] to classify the commands used by an exercise participant into Cyber kill chain steps. We collected data from various cybersecurity exercises and trained a diffusion model to find similarities in commands. We used this number to populate a heptagon (see figure 5) for visualizing the skill set based on the seven Cyber kill chain steps, which addressed our second research question. This identifies which skills the participants have and which skills they lack [10]. We used the produced visualization to create a skill profile of exercise participants and conduct cybersecurity exercises after training on the deficiencies for validating
the research results, which provided the answer to our third research question.

The rest of the paper is organized in the following order. Section 2 gives the background knowledge related to the Cyber kill chain and security awareness assessment. Details of the proposed assessment system are given in section 3. The implementation details, including data collection, data pre-processing, and feature extraction, are discussed in section 4. The proposed diffusion model for classification and the quantitative analysis are elaborated in section 5. Section 6 concludes the paper with final remarks.

2. BACKGROUND AND RELATED WORK

2.1. The Cyber kill Chain

The Cyber kill chain is a model to explain cybersecurity attacks in order to develop attack response and attack analysis capabilities. It is used to model and analyze the offensive actions of a cyber-attacker [11]. Figure 1 shows the details of the seven stages of the Cyber kill chain process.

Fig. 1. Seven Phases of the Cyber kill Chain [12]

The Cyber kill chain provides an easy-to-understand intrusion mechanism of a system. It maps both attacker and defender steps in the kill chain matrix in order to avoid the attack and assist the defenders in performing relevant defensive manoeuvres. Despite being simple, it provides a powerful way of presenting the skill sets of the attacker and defender at a particular stage of the attack. Considering this, we can focus on each attack stage and what kind of operational commands an attacker uses to model their behaviour and assess their skill set. For instance, in most reconnaissance phases, the attacker uses reconnaissance tools, such as packet sniffers, to identify the relevant systems. Similarly, in the case of active scanning or discovery, attackers widely use Nmap and other scanning utilities, which can be used to map their skill set and the relevant stage of the attack.

2.2. Cybersecurity Awareness Assessment

Samaher et al. [13] worked on awareness of cybersecurity among students, academic staff, and researchers by focusing on educational settings in the Middle East. The authors concluded that the overall security plans of the Middle East should include security awareness and training programs for all kinds of users. In the context of cybersecurity awareness, Slusky et al. [14] analyzed cybersecurity awareness for students' assessment at California State University, Los Angeles, USA. The authors identified that the major problem with the students is not the availability of required information but rather the approach used in practical circumstances. Similarly, Aljeaid et al. [15] worked on the assessment of student knowledge related to phishing attacks, focusing on understanding cybersecurity threats. The authors identified that users with limited knowledge could be easily deceived or targeted by this kind of cybersecurity attack. Our study focuses on the skill assessment of students in cybersecurity exercises. From [13, 14, 15], it is indicated that there is a lack of awareness about cybersecurity-related issues among university students, which can be addressed through cybersecurity exercises [16]. However, assessing the skill improvement in such exercises is a challenging task, which we discuss in the following subsection (2.3).

2.3. Student skills assessment in cybersecurity exercises

Valdemar et al. [17] explored the dataset from 18 cybersecurity training exercises with state-of-the-art machine learning techniques. They analyzed 8834 commands using the proposed automatic techniques and showed that the proposed data mining techniques are suitable for analyzing cybersecurity training data. Similarly, Maennel [18] conducted a literature review of several data sources which can be used for students' skill assessment in cybersecurity awareness. These data sources include command-line data, input logs, and timing information. Our study investigates the use of command-line data for skill assessment of students mapped into the seven stages of the Cyber kill chain process. The dataset was developed by Švábenský et al. in [19]. Weiss et al. [20] explained that command-line data is valuable for student assessment in cybersecurity exercises. The authors analyzed the students' progress by using their exact steps along with the success or failure score. Mirkovic et al. [21] collected command-line data from participants in different hands-on cybersecurity exercises. The authors used an automatic system to compare the collected data with pre-defined milestones and visualize the participants' progress. This helped the authors to understand the areas where students need assistance.
Margus et al. [22] presented a method for meaningful analysis of participants' skill sets in a cybersecurity exercise. The researchers used an extensive logging mechanism to collect raw exercise data and created a layer of abstraction to link the collected data with the exercise participant's competence. From the raw data, they collected logs that are associated with a particular task in the cybersecurity exercise. They linked the task with the skill requirements to identify cybersecurity competencies. The exercise participants have to fill out a form to complete the task, organized in groups or in hierarchical order, to be evaluated. The input from the form is then correlated with the collected data for assigning the score. Based on the diamond model for intrusion analysis [23], we can see that the attacker steps and tasks are not linear: an attacker can move from reconnaissance to delivery in the Cyber kill Chain [12] and MITRE ATT&CK [24]. For instance, an attacker can start with the reconnaissance phase of the Cyber kill chain to gather information about the potential target and then move to the delivery stage by sending a phishing email to a victim. If the victim falls for the phishing email, the attacker can then exploit a known vulnerability in the victim's system to gain access and install malware. Once the malware is installed, the attacker can establish command and control over the compromised system and use it to carry out further actions on objectives.

3. PROPOSED ASSESSMENT SYSTEM

In cybersecurity exercises, it is crucial to measure the skill set of individuals in order to assess their capabilities and identify areas for improvement. One way to measure an individual's skill set is by looking at each phase of the attack they are performing in the exercise. As mentioned earlier, the Cyber kill chain is a framework that breaks down the various stages of a cyber attack into seven phases, each utilizing different sets of commands from an attacker's perspective. In order to measure an individual's skill set, we propose a system that can fetch the attacker's command-line history from a system and use different machine learning classification algorithms to classify the attacker's use of commands in solving particular challenges. By classifying an individual's use of commands during each phase of the Cyber kill Chain, we can measure their skill set in cybersecurity exercises and identify areas for improvement. The system works by collecting the command-line history of an attacker during a cybersecurity exercise and then using machine-learning algorithms to classify the commands used during each phase of the Cyber kill Chain. Once the commands have been classified, the system can generate a skill profile for solving the challenge that provides insights into the individual's skill set and areas for improvement.

4. SYSTEM IMPLEMENTATION

This section explains the steps involved in implementing the classification model to classify the command-line data into the cybersecurity kill chain stages. Figure 3 provides an overview of the six significant steps involved in the process. The first step is data collection, which involves gathering command-line data from participants' systems during cybersecurity exercises. The second step is pre-processing, which includes data cleaning, filtering, and normalization to prepare the data for analysis. The third step is feature engineering, which involves selecting relevant features that will be used in the classification model. In this study, trigram features were used to classify the command-line data. The fourth step is data splitting, where the dataset is divided into training and testing sets. The training set is used to train the classification model, while the testing set is used to evaluate the model's performance. The fifth step is model construction, where the selected machine learning algorithms are trained on the training set and optimized using a grid search approach. We employed five baseline machine learning algorithms, including K-Nearest Neighbors (KNN), Random Forest (RF), Multinomial Naive Bayes (MNB), Logistic Regression (LR), and an Ensemble classifier for benchmarking, in addition to our diffusion model for classification. Finally, the sixth step involves evaluating the performance of the trained model on the testing set. The evaluation metrics used in this study included accuracy, precision, recall, and F1-score. The trained model predicts which of the seven stages (figure 5) of the Cyber kill Chain process the command-line data belongs to. These stages include reconnaissance, weaponization, delivery, exploitation, installation, command and control, and action on objectives.

Fig. 2. Distributions of Commands According to Class Label

4.1. Data Collection

To identify the skill sets of the participants, more than 40,000 commands were collected from several participants involved in cybersecurity exercises. Only ten thousand commands were labelled using the Cyber kill chain classes. The dataset was developed by Švábenskỳ et al. in [19].
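The assessment flow described in section 3 (fetch the participant's command-line history, classify each command into a Cyber kill chain stage, then aggregate the per-stage counts into a skill profile) can be sketched as follows; the keyword rules in classify_command are a purely illustrative stand-in for the trained model:

```python
from collections import Counter

STAGES = ["reconnaissance", "weaponization", "delivery", "exploitation",
          "installation", "command and control", "action on objectives"]

def classify_command(cmd):
    """Stand-in for the trained classifier: map one shell command to a
    Cyber kill chain stage. These keyword rules are illustrative only."""
    if cmd.startswith(("nmap", "ping", "whois")):
        return "reconnaissance"
    if cmd.startswith(("msfvenom",)):
        return "weaponization"
    if cmd.startswith(("nc", "ssh")):
        return "command and control"
    return "exploitation"  # fallback for the toy example

def skill_profile(history):
    """Aggregate per-stage command counts for one participant."""
    counts = Counter(classify_command(c) for c in history)
    return {stage: counts.get(stage, 0) for stage in STAGES}

history = ["nmap -sV 10.0.0.5",
           "msfvenom -p linux/x86/shell_reverse_tcp",
           "nc -lvp 4444"]
profile = skill_profile(history)
```

The resulting per-stage counts are exactly the numbers later used to populate the heptagon visualization: a stage with a count of zero marks a skill deficiency.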
Fig. 3. Proposed Methodology for Cybersecurity Kill Chain Classification

4.2. Data Preprocessing

We preprocessed the data by removing the IP addresses from the commands. Additionally, Bash histories containing no commands were discarded [25]. Furthermore, we transformed the raw text into vector form. This step extracts the features from the raw data and represents them as a numeric vector. We used techniques like trigrams with TF-IDF [26] for the data vectorization.

4.3. Data Splitting

We used the Pareto Principle, "80% of effects come from 20% of causes" [27], also known as the 80:20 ratio, to split our data. Using this principle, 80% and 20% of the data are used for training and testing, respectively. Table 1 shows the splitting of the data on the basis of the given classes, while Figure 4 shows the class-wise instances of the train and test sets.

Fig. 4. Distributions of Classes with Respect to Train Test Split

5. PROPOSED MODEL

5.1. Classification And Regression Diffusion (CARD) Model

The CARD model [28] is a robust framework for tackling regression and classification tasks with an emphasis on estimating prediction uncertainty. It leverages deep neural networks and the diffusion model to provide accurate predictions along with reliable measures of uncertainty. For regression tasks, the CARD model estimates the conditional distribution of the target variable y given the input data x and the diffusion time t. The noise estimation loss Lε is defined as follows:

Lε = ∥ε − εθ(x, √ᾱt y0 + √(1 − ᾱt) ε + (1 − √ᾱt) fφ(x), fφ(x), t)∥²,   (1)

where εθ is the function approximator parameterized by a deep neural network, ᾱt is the noise schedule coefficient at diffusion time t, fφ(x) represents the output of a pre-trained model (e.g., ResNet-18), ε is the true noise, and ∥·∥² denotes the squared Euclidean distance. Below, we provide a simple representation of the diffusion noise process (equation 2) and the denoising process (equation 3) in the CARD model for text classification.

5.1.1. Diffusion Noise Process

The diffusion noise process introduces noise to the input text at each timestep. It can be represented as:

xt = xt−1 + ϵt · mt   (2)
where xt is the noisy input text at timestep t, xt−1 is the noisy input text from the previous timestep, ϵt is a noise term sampled from a noise distribution, and mt is a diffusion noise function that transforms the noise term. The diffusion noise function mt can be parameterized by learnable parameters and applied element-wise to the noise term ϵt.

Table 1. Details of Dataset after Data Splitting
Sequence  Class                 Instances  Train Set  Test Set
1         Reconnaissance        1179       937        242
2         Weaponization         3937       3127       810
3         Delivery              218        183        35
4         Exploitation          3436       2785       651
5         Installation          301        245        56
6         Command & control     770        602        168
7         Action on objectives  455        357        98
Total                           10296      8236       2060
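As a toy numerical illustration of equation (2), one forward noising step applied element-wise to a vectorized text can be written as below; the noise values and the mt outputs are invented for the example and are not from the trained model:

```python
# One forward step of the diffusion noise process in equation (2):
# x_t = x_{t-1} + eps_t * m_t, applied element-wise to the feature vector.
# All numeric values here are illustrative only.

def noise_step(x_prev, eps_t, m_t):
    """Add the transformed noise term to each feature of the vector."""
    return [x + e * m for x, e, m in zip(x_prev, eps_t, m_t)]

x0 = [1.0, 0.5, 0.0]      # vectorized "text", e.g., trigram TF-IDF weights
eps1 = [0.2, -0.1, 0.3]   # noise sampled from a noise distribution
m1 = [0.5, 0.5, 0.5]      # output of the diffusion noise function m_t

x1 = noise_step(x0, eps1, m1)
```

Repeating the step for t = 1, ..., T yields an increasingly noisy sequence that the denoising process in section 5.1.2 learns to invert.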
5.1.2. Denoising Process
The denoising process aims to recover the clean and accurate representation of the original input text from the noisy input. It involves estimating the denoised output at each timestep based on the noisy input. The denoising process can be represented as follows:

yt = Dt(xt)   (3)

where yt is the denoised output at timestep t and Dt(·) is the denoising function applied to the noisy input xt. The denoising function Dt(·) can be a complex neural network or any other function that learns to remove the noise and capture the essential information from the input text. These equations capture the general idea of the diffusion noise and denoising process in the CARD model for text classification. By minimizing the noise estimation loss, the CARD model learns to estimate the conditional distribution of the target variable accurately while accounting for the inherent uncertainty in the predictions. For classification tasks, the CARD model formulates the problem in a similar fashion, but with a one-hot encoded label vector yt replacing the continuous response variable y. The noise estimation loss Lε is computed using the same equation as in the regression case, with the appropriate modifications for classification. The CARD model thus provides a flexible and practical framework for regression and classification tasks, offering reliable uncertainty estimates for predictions. By leveraging the diffusion model and deep neural networks, it achieves accurate conditional distribution estimation, making it a valuable tool in various machine learning applications. Specifically for text classification tasks using the diffusion CARD model, such as in our task, we utilized a linear classifier instead of ResNet-18 and leveraged CountVectorizer to convert the text into numerical representations. The architecture for text classification using the linear classifier can be represented as:

TextClassifier(x) = fc2(ReLU(fc1(x)))

where:

• x: input feature vector obtained from CountVectorizer

• fc1: first fully-connected layer with input size 1265 and hidden size 512

• ReLU: rectified linear unit activation function

• fc2: second fully-connected layer with output size equal to the number of classes (7)

To prepare the dataset, we define the TextDataset class, which consists of:

• a set of texts X and their corresponding labels Y

• a vectorizer that transforms the text into numerical representations (CountVectorizer)

• a label encoder that maps the class labels to numerical values

Given a specific index i, the dataset can be accessed as follows:

TextDataset[i] = (vectorized_text_i, encoded_label_i)

where:

• vectorized_text_i represents the vectorized representation of the text at index i, obtained by applying the CountVectorizer and converting it to a PyTorch tensor

• encoded_label_i is the numerical encoding of the corresponding label at index i

By training the diffusion CARD model using the TextClassifier and TextDataset, we aim to optimize the model's parameters to minimize the classification loss and improve the overall performance of the CARD model. This optimization is achieved using the Adam optimizer with a learning rate lr = 0.0001. The diffusion CARD model for text classification leverages the principles of the CARD framework to incorporate uncertainty estimation and provide accurate predictions for classification tasks based on text data.

5.2. Model Evaluation

In classification, accuracy, precision, recall, and F-measure are commonly used metrics to evaluate the performance of trained models. As Figure 2 shows, the instances of the given dataset are not equally distributed. Therefore, we used precision, recall, and F-measure to evaluate the performance of the constructed models. The details of these performance metrics and their relationship are discussed in [29].

5.3. Classification Models Results

In addition to the CARD model, we also trained five different machine-learning models, namely LR [30], MNB [31], SVM [32], RF [33], and an Ensemble classifier [34], using the given vector of train data for evaluating the performance. Table 2 shows the results obtained by the different models using the test data, i.e., 20% of the data. The CARD model achieved the highest precision, recall, and F-measure, i.e., 0.98, followed by the Ensemble, SVM, RF, and LR classifiers with a precision of 0.97. The CARD model achieved an impressive result of 98% on the dataset. This high performance demonstrates the effectiveness and reliability of the model in accurately classifying text data. With such a high level of performance, the CARD model can be considered highly capable and proficient in its classification abilities. The achieved results indicate that the model is successful in capturing and learning the underlying patterns and features within the text data, allowing it to make accurate predictions and classifications with a high degree of confidence. The robust performance of the CARD model, as reflected in its exceptional precision, recall, and F-measure, makes it a valuable tool for various text classification tasks, providing reliable and precise results.
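Because the class distribution is imbalanced, precision, recall, and F-measure are the metrics of interest in section 5.2. A minimal from-scratch sketch of the per-class computation (independent of any particular library; the toy labels are invented for the example) is:

```python
def per_class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1 for a single kill-chain class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy gold and predicted stage labels, illustrative only.
y_true = ["recon", "recon", "delivery", "exploit"]
y_pred = ["recon", "delivery", "delivery", "exploit"]

p, r, f1 = per_class_metrics(y_true, y_pred, "recon")
# precision = 1/1, recall = 1/2 for the "recon" class in this toy example
```

Averaging these per-class values over the seven kill-chain classes gives the macro-averaged figures of the kind reported in Table 2.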
Table 2. CARD model comparison with traditional machine learning models.
Method         Precision  Recall  F-Measure
LR [30]        0.97       0.91    0.94
MNB [31]       0.94       0.76    0.82
RF [33]        0.97       0.95    0.96
SVM [32]       0.97       0.94    0.95
Ensemble [34]  0.98       0.94    0.96
Proposed       0.98       0.98    0.98

Fig. 5. Heptagon of the skill set

5.4. Class-wise Performance of Constructed Models

The quantitative results of the CARD classification model are presented in Table 3. The table provides a comprehensive overview of the model's accuracy across multiple classes: the total number of instances in each class, the number of correct predictions, the number of incorrect predictions, and the class-wise accuracy. Let Ni denote the total number of instances in class i, Ci the number of correct predictions in class i, and Ii the number of incorrect predictions in class i. The class-wise accuracy Ai for class i is calculated using the following equation:

Ai = (Ci / Ni) × 100%   (4)

Examining the results, we find that Class reconnaissance had a total of 220 instances. The model achieved an accuracy of 96.8% by correctly predicting 213 instances, with seven incorrect predictions. In Class weaponization, which comprised 803 instances, the model exhibited exceptional performance, achieving an accuracy rate of 99.6% with only three misclassifications. Class delivery, consisting of 53 instances, had a class-wise accuracy of 75.4%: the model correctly classified 40 instances and misclassified 13. Class exploitation, with a total of 688 instances, demonstrated high accuracy, with 98.8% of instances predicted correctly and eight instances misclassified. In Class installation, the model achieved a class-wise accuracy of 95.9% out of 49 instances, successfully classifying 47 instances and misclassifying two. Similarly, in Class command, out of 152 instances, the model attained an accuracy rate of 98.7%, correctly predicting 150 instances and misclassifying two. Lastly, Class control had 95 instances, and the model achieved an accuracy of 94.7%, correctly classifying 90 instances and misclassifying five. Overall, the CARD model, along with the linear classifier, predicted 2020 instances correctly, while only 40 instances were misclassified. The quantitative results are used to populate a heptagon (see figure 5) for visualizing the skill set based on the seven Cyber kill chain steps. The heptagon identifies which skills the participants have and which skills they lack. Table 3 provides a detailed analysis of the model's accuracy across the different classes, enabling a comprehensive evaluation of its performance.

Table 3. Results of CARD model with linear classifier
Class name      Total instances  Correct Predictions  Incorrect Predictions  Class-wise Accuracy
reconnaissance  220              213                  7                      96.8
weaponization   803              800                  3                      99.6
delivery        53               40                   13                     75.4
exploitation    688              680                  8                      98.8
installation    49               47                   2                      95.9
command         152              150                  2                      98.7
control         95               90                   5                      94.7
Total           2060             2020                 40                     -

6. CONCLUSION

In this study, we used the diffusion model to classify command-line data according to the seven stages of the Cyber kill Chain process. We aimed to assess students' performance in cybersecurity exercises. We compared the results of our model against five classical machine-learning algorithms. The results showed that our model, utilizing a linear classifier and vectorizer, outperformed the other algorithms, indicating its effectiveness for text classification. Conversely, the Multinomial Naive Bayes (MNB) model exhibited the lowest performance, suggesting it may not be suitable for this specific task. These findings have practical and scientific value, serving as a baseline for future research in similar domains.

ACKNOWLEDGEMENT

We would like to express our gratitude for the support from the SDDE research group at NTNU and the Open Cyber Range Project. The financial support provided by this project has played a significant role in enabling the research and development efforts presented in this work. We would also like to thank Valdemar Švábenskỳ for sharing part of the dataset from the KYPO cyber range.

7. REFERENCES

[1] Mohib Ullah, Sareer Ul Amin, Muhammad Munsif, Utkurbek Safaev, Habib Khan, Salman Khan, and Habib Ullah, "Serious games in science education. a systematic literature review," Virtual Reality & Intelligent Hardware, vol. 4, no. 3, pp. 189–209, 2022.

[2] Lucas McDaniel, Erik Talvi, and Brian Hay, "Capture the flag as cyber security introduction," in 2016 49th Hawaii International Conference on System Sciences (HICSS). IEEE, 2016, pp. 5479–5486.

[3] Stela Kucek and Maria Leitner, "An empirical survey of functions and configurations of open-source capture the flag (CTF) environments," Journal of Network and Computer Applications, vol. 151, pp. 102470, 2020.

[4] Erik Moore, Steven Fulton, and Dan Likarish, "Evaluating a multi agency cyber security training program using pre-post event assessment and longitudinal analysis," in Information Security Education for a Global Digital Society: 10th IFIP WG 11.8 World Conference, WISE 10, Rome, Italy, May 29-31, 2017, Proceedings 10. Springer, 2017, pp. 147–156.

[5] Muhammad Mudassar Yamin, Ankur Shukla, Mohib Ullah, and Basel Katt, "Adapt-automated defence training platform in a cyber range," in International Conference on Information Systems and Management Science. Springer, 2022, pp. 184–203.

[6] Muhammad Mudassar Yamin and Basel Katt, "Detecting malicious windows commands using natural language processing techniques," in Innovative Security Solutions for Information Technology and Communications: 11th International Conference, SecITC 2018. Springer, 2019, pp. 157–169.

[7] William Aubrey Labuschagne and Marthie Grobler, "Developing a capability to classify technical skill levels within a cyber range," in ECCWS 2017 16th European Conference on Cyber Warfare and Security. Academic Conferences and Publishing Limited, 2017, p. 224.

[8] Muhammad Mudassar Yamin, Mohib Ullah, Habib Ullah, Basel Katt, Mohammad Hijji, and Khan Muhammad, "Mapping tools for open source intelligence with cyber kill chain for adversarial aware security," Mathematics, vol. 10, no. 12, pp. 2054, 2022.

[9] Muhammad Mudassar Yamin, Mohib Ullah, Habib Ullah, and Basel Katt, "Weaponized AI for cyber attacks," Journal of Information Security and Applications, vol. 57, pp. 102722, 2021.

[10] Muhammad Mudassar Yamin and Basel Katt, "Cyber security skill set analysis for common curricula development," in 14th International Conference on Availability, Reliability and Security, 2019, pp. 1–8.

[11] Tarun Yadav and Arvind Mallari Rao, "Technical aspects of cyber kill chain," in Security in Computing and Communications: Third International Symposium, SSCC 2015, Kochi, India, August 10-13, 2015. Proceedings 3. Springer, 2015, pp. 438–452.

[12] Eric M Hutchins, Michael J Cloppert, Rohan M Amin, et al., "Intelligence-driven computer network defense informed by analysis of adversary campaigns and intrusion kill chains," Leading Issues in Information Warfare & Security Research, vol. 1, no. 1, pp. 80, 2011.

[13] Samaher Al-Janabi and Ibrahim Al-Shourbaji, "A study of cyber security awareness in educational environment in the Middle East," Journal of Information & Knowledge Management, vol. 15, no. 01, pp. 1650007, 2016.

[14] Ludwig Slusky and Parviz Partow-Navid, "Students information security practices and awareness," Journal of Information Privacy and Security, vol. 8, no. 4, pp. 3–26, 2012.

[15] Dania Aljeaid, Amal Alzhrani, Mona Alrougi, and Oroob Almalki, "Assessment of end-user susceptibility to cybersecurity threats in Saudi Arabia by simulating phishing attacks," Information, vol. 11, no. 12, pp. 547, 2020.

[16] Muhammad Mudassar Yamin and Basel Katt, "Modeling attack and defense scenarios for cyber security exercises," in 5th Interdisciplinary Cyber Research Conference, 2019, p. 7.

[17] Valdemar Švábenskỳ, Jan Vykopal, Pavel Čeleda, Kristián Tkáčik, and Daniel Popovič, "Student assessment in cybersecurity training automated by pattern mining and clustering," Education and Information Technologies, vol. 27, no. 7, pp. 9231–9262, 2022.

[18] Kaie Maennel, "Learning analytics perspective: Evidencing learning from digital datasets in cybersecurity exercises," in 2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW). IEEE, 2020, pp. 27–36.

[19] Valdemar Švábenskỳ, Jan Vykopal, Pavel Seda, and Pavel Čeleda, "Dataset of shell commands used by participants of hands-on cybersecurity training," Data in Brief, vol. 38, pp. 107398, 2021.

[20] Richard Weiss, Franklyn Turbak, Jens Mache, and Michael E Locasto, "Cybersecurity education and assessment in EDURange," IEEE Security & Privacy, vol. 15, no. 03, pp. 90–95, 2017.

[21] Jelena Mirkovic, Aashray Aggarwal, David Weinman, Paul Lepe, Jens Mache, and Richard Weiss, "Using terminal histories to monitor student progress on hands-on exercises," in Proceedings of the 51st ACM Technical Symposium on Computer Science Education, 2020, pp. 866–872.

[22] Margus Ernits, Kaie Maennel, Sten Mäses, Toomas Lepik, and Olaf Maennel, "From simple scoring towards a meaningful interpretation of learning in cybersecurity exercises," in ICCWS 2020 15th International Conference on Cyber Warfare and Security. Academic Conferences and Publishing Limited, 2020, p. 135.

[23] Sergio Caltagirone, Andrew Pendergast, and Christopher Betz, "The diamond model of intrusion analysis," Tech. Rep., Center for Cyber Intelligence Analysis and Threat Research, Hanover, MD, 2013.

[24] Blake E Strom, Andy Applebaum, Doug P Miller, Kathryn C Nickels, Adam G Pennington, and Cody B Thomas, "MITRE ATT&CK: Design and philosophy," Technical report, 2018.

[25] Sindhu Abro, Sarang Shaikh, Zahid Hussain Khand, Ali Zafar, Sajid Khan, and Ghulam Mujtaba, "Automatic hate speech detection using machine learning: A comparative study," International Journal of Advanced Computer Science and Applications, vol. 11, no. 8, 2020.

[26] Juan Ramos et al., "Using TF-IDF to determine word relevance in document queries," in Proceedings of the First Instructional Conference on Machine Learning. Citeseer, 2003, vol. 242, pp. 29–48.

[27] Rosie Dunford, Quanrong Su, and Ekraj Tamang, "The Pareto principle," 2014.

[28] Xizewen Han, Huangjie Zheng, and Mingyuan Zhou, "CARD: Classification and regression diffusion models," arXiv preprint arXiv:2206.07275, 2022.

[29] Naeem Seliya, Taghi M Khoshgoftaar, and Jason Van Hulse, "A study on the relationships of classifier performance metrics," in 2009 21st IEEE International Conference on Tools with Artificial Intelligence. IEEE, 2009, pp. 59–66.

[30] L Mary Gladence, M Karthi, and V Maria Anu, "A statistical comparison of logistic regression and different bayes classification methods for machine learning," ARPN Journal of Engineering and Applied Sciences, vol. 10, no. 14, pp. 5947–5953, 2015.

[31] David D Lewis, "Naive (Bayes) at forty: The independence assumption in information retrieval," in European Conference on Machine Learning. Springer, 1998, pp. 4–15.

[32] Thorsten Joachims, "Text categorization with support vector machines: Learning with many relevant features," in European Conference on Machine Learning. Springer, 1998, pp. 137–142.

[33] Baoxun Xu, Xiufeng Guo, Yunming Ye, and Jiefeng Cheng, "An improved random forest classifier for text categorization," J. Comput., vol. 7, no. 12, pp. 2913–2920, 2012.

[34] Omer Sagi and Lior Rokach, "Ensemble learning: A survey," Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, vol. 8, no. 4, pp. e1249, 2018.