Aezeden Mohamed
Department of Mechanical Engineering, PNG University of Technology, Papua New Guinea
aezeden.mohamed@pnguot.ac.pg

Janne Heilala
University of Turku, Technology, Finland
janne.p.heilala@utu.fi

Nelson Sizwe Madonsela
Department of Quality and Operations Management, University of Johannesburg
nmadonsela@uj.ac.za
Abstract—Intrusion detection systems (IDS) powered by machine learning have become an important tool for enhancing online security because they can detect and respond quickly to potential attacks, which is one reason IDS have grown so popular. In this article, we investigate how machine learning techniques can be applied to intrusion detection using the UNSW-NB15 dataset. The dataset, compiled by researchers at the University of New South Wales, contains examples of both benign and malicious network activity. We use it throughout this work to conduct an in-depth analysis and comparison of several machine learning strategies: decision trees, support vector machines (SVM), random forests, and deep learning models. The network traffic data is first preprocessed, after which feature engineering methods are used to extract the features of interest. The constructed models are evaluated with conventional metrics, including accuracy, precision, recall, and F1 score. The findings provide evidence that a number of different machine learning algorithms can detect the many types of attacks represented in the UNSW-NB15 dataset. The study also analyses how different feature engineering strategies affect detection accuracy. An important outcome of this research is the identification of the best machine learning algorithms and feature engineering techniques for improving cybersecurity with the UNSW-NB15 dataset, which will help move intrusion detection systems forward. The findings can be used to improve IDS and thereby strengthen the defences of vital systems and networks against new cyber-attacks.

Keywords— Machine learning, Intrusion detection systems, Cybersecurity, UNSW-NB15 dataset, Emerging cyber threats.

I. INTRODUCTION

In today's interconnected world, strong cybersecurity measures are of critical importance. Detecting and responding to new cyberattacks is becoming increasingly difficult with conventional rule-based intrusion detection systems (IDS). As a solution, intrusion detection systems powered by machine learning have become increasingly popular[1].

Using machine learning techniques in intrusion detection systems has the potential to revolutionise the field. Machine learning algorithms allow an IDS to learn and adapt from massive volumes of network data and to detect patterns and anomalies suggestive of hostile activity[2]. This improves the speed and accuracy with which businesses can detect and respond to security threats. This article focuses on improving cybersecurity with machine learning-based intrusion detection systems. Using the UNSW-NB15 dataset developed at the University of New South Wales, we study how machine learning techniques can be applied in a range of contexts. The UNSW-NB15 dataset covers a wide variety of network traffic, some of it malicious and some of it benign. This makes the dataset highly valuable for evaluating the efficacy and precision of machine learning algorithms for cyber threat identification and mitigation[3]. On the UNSW-NB15 dataset, this study conducts an in-depth evaluation of several machine learning techniques, including decision trees, support vector machines (SVMs), random forests, and deep learning models. We first preprocess the network traffic data and then apply feature engineering to extract the relevant features, which capture the distinctive characteristics of both normal and attack events. We analyse the performance of the resulting models using common metrics such as accuracy, precision, recall, and F1 score, in order to determine how well the various machine learning approaches identify the many types of attacks contained in the UNSW-NB15 dataset and, therefore, whether these approaches are effective in recognising them[4].

In addition, we explore how alternative feature engineering strategies affect overall detection performance. We aim to improve cybersecurity through intrusion detection by identifying the most effective combinations of existing algorithms and feature engineering techniques. The results of this study add to the development of IDSs by shedding light on how effective machine learning algorithms and feature engineering methods are at improving network security. Once the most effective methods have been identified, organizations can better protect their most vital systems and networks from evolving cyber threats by implementing and optimizing machine learning-based intrusion detection systems (IDS). In the following sections, we examine our methodology, experimental setup, results, and discussion. We also discuss the implications of our findings and some possible implementations, and we offer suggestions for future research into machine learning-based intrusion detection systems aimed at enhancing online safety[5].
2023 Second International Conference on Smart Technologies for Smart Nation (SmartTechCon 2023)
Authorized licensed use limited to: Staffordshire University. Downloaded on March 28,2024 at 16:28:07 UTC from IEEE Xplore. Restrictions apply.
models. Explore model fusion techniques to combine the best features of various algorithms for improved detection accuracy and system robustness.

E. Testing for Efficiency and Effectiveness in Real Time
To detect anomalies in network traffic in real time, deploy the trained intrusion detection model in a real-time setting. Examine how well the system identifies and counteracts different kinds of attacks to gauge its performance. Maintain the system's efficacy over time by continuously monitoring and updating the model to account for new types of attacks[11].

F. Verification and Evaluation by Experiment
Use standard datasets to verify the proposed intrusion detection system's accuracy and to compare its effectiveness with that of other methods. Through experimentation and analysis, assess the system's precision, efficiency, scalability, and robustness across various attack scenarios and network settings.

G. Datasets
This research uses UNSW-NB15, an openly available intrusion detection dataset. Of the data, 70% was used for training, 15% for testing, and 15% for validation. The experimental analysis was carried out in Matlab.

Fig. 1 Proposed architecture for Intrusion Detection

IV. IMPLEMENTATION

Acquire the UNSW-NB15 dataset, which can be used freely for scientific research. It is deposited in the official repository and is available for download from the UNSW website. Learn the dataset's layout and characteristics, as well as the attacks and normal traffic it contains. UNSW-NB15 consists of simulated network traffic encompassing both malicious attacks and everyday operations. Clean and organise the dataset before using it to train machine learning models. This process may include filling in missing values, filtering out irrelevant information, and reformatting the data so that it can be analysed effectively. Apply feature engineering to the dataset to extract useful features that capture important aspects of network traffic and attacks. Methods for improving the model's ability to distinguish between benign and malicious actions include selecting features using domain expertise, employing dimensionality reduction techniques (such as principal component analysis), and developing derived features[12].

Split the collected data into a training set, a validation set, and a test set. The training set is used to train the machine learning models, the validation set is used to check their accuracy during development, and the test set is used at the end to determine their overall effectiveness. Train the machine learning models on the UNSW-NB15 training set. Determine the requirements and objectives of the IDS deployment before selecting the appropriate algorithms. Decision trees, random forests, support vector machines (SVM), neural networks, and anomaly detection approaches are all extensively used in intrusion detection systems (IDS)[13].

Assess the trained models' efficacy using the validation set or cross-validation methods, examining indicators such as recall, precision, accuracy, and F1-score. Adjust the models' hyperparameters until they give the best results; methods such as grid search and random search can be used to find the optimal hyperparameter settings. Finally, apply the trained models to the UNSW-NB15 test set. This check is necessary to make sure the models can be applied successfully in the real world.

V. ALGORITHMS

A. Decision Trees

B. Random Forest
Random Forest is an ensemble learning method that combines the results of numerous decision trees in order to increase classification accuracy and reduce the likelihood of overfitting. It constructs a number of decision trees and arrives at the final classification by aggregating their predictions. Random Forests are well known for their reliability and for their capacity to deal with high-dimensional data[14].

C. Support Vector Machines (SVM)
SVMs are effective in binary classification tasks and can be applied to IDS to differentiate between normal and malicious network activity. SVMs are also effective in identifying anomalies in network traffic. The goal of an SVM is to locate a hyperplane that separates the data points of the different classes with the largest possible margin.
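The paper does not specify its SVM formulation, kernel, or solver, and its experiments were run in Matlab. Purely as an illustration of the separating-hyperplane idea, the Python sketch below fits a soft-margin linear SVM by stochastic subgradient descent on the L2-regularised hinge loss. The function names, the hyperparameters `lam`, `lr` and `epochs`, the label convention (-1 benign, +1 malicious), and the six two-feature records are all hypothetical choices for this sketch, not values from UNSW-NB15 or from the authors.

```python
import random


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Fit a soft-margin linear SVM (weights w, bias b) by stochastic
    subgradient descent on  lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w.x_i + b))).

    X: list of feature vectors; y: labels in {-1, +1}
    (assumed here: -1 = benign, +1 = malicious).
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            if y[i] * score < 1:
                # Inside the margin or misclassified: the hinge term is active,
                # so step along -(lam * w - y_i * x_i) and move the bias toward y_i.
                w = [wj - lr * (lam * wj - y[i] * xj) for wj, xj in zip(w, X[i])]
                b += lr * y[i]
            else:
                # Correct side with margin >= 1: only the L2 penalty shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    """Return +1 (malicious) or -1 (benign) from the sign of w.x + b."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Hypothetical, already-normalised two-feature records (not UNSW-NB15 data).
X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.30],   # benign
     [0.90, 0.80], [0.80, 0.95], [0.85, 0.70]]   # malicious
y = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(X, y)
print([predict(w, b, x) for x in X])
```

This sketch is linear only; kernelised SVMs, which handle non-linear separation, need a different (dual or kernel-approximation) solver than the one shown here.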
Through the use of kernel functions, SVMs can handle non-linear as well as linear separation.

D. Neural Networks
In the field of IDS, neural networks, and more specifically deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), have shown promising results. CNNs are suited to capturing spatial dependencies in network traffic data, whereas RNNs excel at processing sequential data, which makes them appropriate for analysing network packet sequences or log entries. The two types of neural network can also be used in combination[15].

VI. RESULTS AND DISCUSSION

Conduct an analysis to determine the extent to which the IDS can distinguish between normal and malicious network traffic. Several metrics can be applied to evaluate this, including accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC-ROC). A system's detection accuracy reflects its ability to differentiate between routine operations and potential intrusions; a high detection accuracy implies that the system is effective.

Evaluate how well the IDS detects and categorises attacks that are already known and have distinct signatures or patterns, such as DoS (Denial of Service), DDoS (Distributed Denial of Service), attempted intrusions, port scans, or SQL injections. Discuss the percentage of well-known attacks that the IDS labels correctly.

traffic while keeping its detection accuracy. Compared with traditional rule-based or signature-based IDS, machine learning-based IDS has been shown to achieve superior results. In this article, we highlight some of the advantages of adopting machine learning approaches, such as the capacity to handle unexpected attacks, to adapt to developing threats, and to improve detection accuracy over time.

Discuss the more practical aspects of implementing and operating the intrusion detection system, such as the required processing resources, the complexity of deployment, and the need for regular updates and maintenance. Determine the viability and applicability of the system to real-world cybersecurity problems by taking into account the training time, the size of the model, and its real-time processing capabilities.

TABLE 1 COMPARISON OF ALGORITHMS

Algorithm         Accuracy   Precision   Recall   F1 Score
Decision Tree     0.956      0.923       0.812    0.789
Random Forest     0.976      0.935       0.856    0.806
SVM               0.952      0.956       0.875    0.825
Neural Network    0.983      0.978       0.896    0.836
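Section VI names the evaluation metrics without defining them. As a minimal sketch, assuming a binary attack-versus-normal labelling with hypothetical labels (1 = attack, 0 = normal), the standard definitions are precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 = 2PR/(P+R), the harmonic mean of precision and recall. The function names `f1_score` and `binary_metrics` are illustrative, not from the paper or any library.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for one 'positive' (attack) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Hypothetical labels: 1 = attack, 0 = normal traffic (illustrative only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
# {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75, 'f1': 0.75}
```

Plugging the precision and recall reported in Table 1 into `f1_score` gives larger values than the F1 column (for the decision tree, 2 × 0.923 × 0.812 / (0.923 + 0.812) ≈ 0.864, against 0.789 reported). The paper does not say how its scores were averaged; if they are macro-averages over several attack classes (an assumption here), such a gap is possible, because the mean of per-class F1 scores never exceeds the F1 computed from the mean precision and mean recall.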
engineering approaches, selecting appropriate algorithms, and optimising hyperparameters. In addition, the overall performance of the intrusion detection system (IDS) may be improved by using ensemble approaches or by combining several detection strategies.

In conclusion, applying machine learning-based intrusion detection systems (IDS) to the UNSW-NB15 dataset has shown the potential to improve cybersecurity by increasing the accuracy and efficiency of intrusion detection. With continued improvement and research, these systems can play a vital part in protecting networks and systems against evolving cyber threats.

REFERENCES
[1] R. Latha and R. M. Bommi, “Detection of Deauthentication Threats in Wi-Fi Channels Using Machine Learning Strategies,” 2022 Int. Conf. Data Sci. Agents Artif. Intell. ICDSAAI 2022, pp. 4–9, 2022, doi: 10.1109/ICDSAAI55433.2022.10028874.
[2] A. M. Sauber, P. M. El-Kafrawy, A. F. Shawish, M. A. Amin, and I. M. Hagag, “A New Secure Model for Data Protection over Cloud Computing,” Comput. Intell. Neurosci., vol. 2021, 2021, doi: 10.1155/2021/8113253.
[3] R. Latha, “Deauthentication Attack Detection in the Wi-Fi network by Using ML Techniques,” 2022.
[4] S. Caleb and S. J. J. Thangaraj, “Data-driven ML Approaches for the concept of Self-healing in CWN, Including its Challenges and Possible Solutions,” 2023 Eighth Int. Conf. Sci. Technol. Eng. Math., pp. 1–7, doi: 10.1109/ICONSTEM56934.2023.10142451.
[5] S. Caleb and S. J. J. Thangaraj, “Secured Node Identification Approach Based on Artificial Neural Network Infrastructure for Wireless Sensor Networks,” pp. 646–651.
[6] H. Du, J. Chen, M. Chen, C. Peng, and D. He, “A Lightweight Authenticated Searchable Encryption without Bilinear Pairing for Cloud Computing,” Wirel. Commun. Mob. Comput., vol. 2022, 2022, doi: 10.1155/2022/2336685.
[7] A. Abdulridha, D. Salama, and K. M, “NHCA: Developing New Hybrid Cryptography Algorithm for Cloud Computing Environment,” Int. J. Adv. Comput. Sci. Appl., vol. 8, no. 11, pp. 479–486, 2017, doi: 10.14569/ijacsa.2017.081158.
[8] I. No, “ANALYSIS OF MACHINE LEARNING ALGORITHM FOR PREDICTION OF,” vol. 11, no. 3, pp. 42–47, 2020.
[9] T. J. Nandhini and K. Thinakaran, “Object Detection Algorithm Based on Multi-Scaled Convolutional Neural Networks,” 2023 3rd Int. Conf. Artif. Intell. Signal Process., pp. 1–5, doi: 10.1109/AISP57993.2023.10134980.
[10] V. Dilli Ganesh and R. M. Bommi, “Prediction of Tool Wear by Using RGB Techniques in Comparison with Experimental Analysis,” 2022 Int. Conf. Data Sci. Agents Artif. Intell. (ICDSAAI), Chennai, India, 2022.
[11] A. Elgammal, D. Harwood, and L. Davis, “Non-parametric Model for Background Subtraction,” pp. 751–767, 2000.
[12] T. Wu and N. Sun, “A reliable and evenly energy consumed routing protocol for underwater acoustic sensor networks,” 2015 IEEE 20th Int. Work. Comput. Aided Model. Des. Commun. Links Networks, CAMAD 2015, pp. 299–302, 2016, doi: 10.1109/CAMAD.2015.7390528.
[13] T. Subhash Bora and M. D. Rokade, “Human Suspicious Activity Detection System Using CNN Model for Video Surveillance,” vol. 7, no. 3, p. 2021, 2021, [Online]. Available: www.ijariie.com
[14] M. A. Khan, N. Javaid, A. Majid, M. Imran, and M. Alnuem, “Dual sink efficient balanced energy technique for underwater acoustic sensor networks,” Proc. - IEEE 30th Int. Conf. Adv. Inf. Netw. Appl. Work. WAINA 2016, pp. 551–556, 2016, doi: 10.1109/WAINA.2016.156.
[15] A. Kannappan and R. M. Bommi, “Energy-Efficient Routing using the Hybrid Bilevel-Litechenbery-Optimization Algorithm in Comparison with Ant-colony Optimization,” ICDCS 2022 - 2022 6th Int. Conf. Devices, Circuits Syst., pp. 464–466, Apr. 2022, doi: 10.1109/ICDCS54290.2022.9780826.