
Machine Learning-Based Intrusion Detection Systems for Enhancing Cybersecurity


2023 Second International Conference On Smart Technologies For Smart Nation (SmartTechCon) | 979-8-3503-0541-8/23/$31.00 ©2023 IEEE | DOI: 10.1109/SmartTechCon57526.2023.10391626

Aezeden Mohamed
Department of Mechanical Engineering, PNG University of Technology, Papua New Guinea
aezeden.mohamed@pnguot.ac.pg

Janne Heilala
University of Turku, Technology, Finland
janne.p.heilala@utu.fi

Nelson Sizwe Madonsela
Department of Quality and Operations Management, University of Johannesburg
nmadonsela@uj.ac.za

Abstract: Intrusion detection systems (IDS) powered by machine learning have become an important tool for enhancing online security because they can detect and respond quickly to potential attacks, which is one reason for their growing popularity. In this article, we investigate how machine learning techniques can be applied to intrusion detection using the UNSW-NB15 dataset. The dataset, compiled by researchers at the University of New South Wales, includes examples of both benign and malicious network activity. We use it throughout this work to analyse and compare several machine learning strategies: decision trees, support vector machines (SVM), random forests, and deep learning models. The network traffic data is first preprocessed, after which feature engineering methods are used to extract the features of interest. The constructed models are evaluated with conventional metrics, including accuracy, precision, recall, and F1 score. The findings provide evidence that several machine learning algorithms can detect the many types of attacks represented in the UNSW-NB15 dataset. The study also analyses how different feature engineering strategies affect detection accuracy. Identifying the best machine learning algorithms and feature engineering techniques for the UNSW-NB15 dataset is an important outcome of this research that will help to move intrusion detection systems forward. The findings can be used to improve IDS, which helps strengthen the defenses of vital systems and networks against new cyber-attacks.

Keywords: Machine learning, Intrusion detection systems, Cybersecurity, UNSW-NB15 dataset, Emerging cyber threats.

I. INTRODUCTION

In today's interconnected world, strong cybersecurity measures are of critical importance. Detecting and responding to new cyberattacks is becoming increasingly difficult with conventional rule-based intrusion detection systems (IDS). As a solution, intrusion detection systems powered by machine learning have become increasingly popular[1].

Using machine learning techniques in intrusion detection systems has the potential to advance the field considerably. With machine learning algorithms, an IDS can learn from massive volumes of network data and adapt, detecting patterns and abnormalities suggestive of hostile activity[2]. This improves the speed and accuracy with which businesses can detect and respond to security threats. This article focuses on improving cybersecurity with machine learning-based intrusion detection systems. Using the UNSW-NB15 dataset developed at the University of New South Wales, we examine how machine learning techniques might be applied in a range of settings. The UNSW-NB15 dataset includes a wide variety of network traffic, some of which is malicious and some of which is not. For evaluating the efficacy and precision of machine learning algorithms for cyber threat identification and mitigation, this dataset is highly valuable[3].

On the UNSW-NB15 dataset, this study conducts an in-depth evaluation of several machine learning techniques, such as decision trees, support vector machines (SVMs), random forests, and deep learning models. We first preprocess the network traffic data and then use feature engineering to extract the relevant features, which lets us capture the distinctive characteristics of both normal and attack traffic. We analyse the performance of the resulting models using common metrics such as accuracy, precision, recall, and F1 score to determine how well the various machine learning approaches identify the many types of attacks contained in the UNSW-NB15 dataset[4].

In addition, we explore how alternative feature engineering strategies affect overall detection performance. We aim to improve cybersecurity through intrusion detection by identifying the most effective combinations of existing algorithms and feature engineering techniques. The results of this study add to the development of IDSs by shedding light on how effective machine learning algorithms and feature engineering methods are at improving network security. Once the most effective methods have been identified, organizations can better protect their most vital systems and networks from evolving cyber threats by implementing and optimizing machine learning-based IDS. In the following sections, we examine our findings by covering the methodology, experimental setup, results, and discussion. We also suggest directions for future research into machine learning-based intrusion detection systems and discuss the implications of our findings and some possible applications of them[5].

II. RELATED WORKS

M. Alsheikh and colleagues published a paper entitled "A Deep Learning Framework for Network Intrusion Detection System". The study establishes a foundation for applying deep learning to the detection of intrusions into computer networks. Its core emphasis is on applying recurrent neural networks (RNNs) and convolutional neural networks (CNNs) to automatically learn representations of network traffic data and identify abnormalities. The framework is evaluated on a real-world dataset, and the results are encouraging in terms of accuracy and false positive rate. These findings highlight the potential of deep learning strategies for enhancing intrusion detection systems[6].

"Towards Reliable Intrusion Detection Systems: A Comparative Study of Machine Learning Techniques," by S. Pervez and others, was published in 2021. This comparative study investigates the reliability of various machine learning approaches for intrusion detection systems. Several algorithmic frameworks, such as decision trees, support vector machines (SVM), random forests, and deep learning models, are subjected to performance analysis using a variety of assessment measures. The paper analyses the benefits and drawbacks of each technique, offers insights into their application to real-world situations, and explores the implications for improving cybersecurity. These recent studies contribute to developing machine learning-based intrusion detection systems by investigating deep learning frameworks, contrasting various machine learning approaches, and discussing the problems and opportunities associated with improving cybersecurity. They demonstrate the current research efforts to produce intrusion detection systems that are more accurate, resilient, and dependable using machine learning methodologies[7].

The article "An Overview of Machine Learning Techniques for Intrusion Detection Systems," authored by Kaur and others, offers an overview of various machine learning algorithms that may be applied to intrusion detection systems to identify potential security breaches. It covers feature selection and extraction methods as well as supervised, unsupervised, and semi-supervised learning algorithms. Ensemble approaches and deep learning strategies are also examined. The article then explores the problems that need to be overcome and the future research directions needed to improve intrusion detection systems using machine learning[8].

The article "A Comprehensive Study on Machine Learning-Based Intrusion Detection Systems," written by V. Sharma and K. Kumar, was published in the Journal of Ambient Intelligence and Humanized Computing. The purpose of the study is to provide an in-depth analysis of intrusion detection systems that are based on machine learning. It introduces various machine learning techniques, such as decision trees, support vector machines (SVM), neural networks, and deep learning models. The effectiveness of these algorithms is assessed using a wide variety of indicators and data sets. In addition, the study investigates how the effectiveness of intrusion detection systems can be improved through ensemble approaches, feature selection, and dimensionality reduction. These recent studies contribute to the understanding and advancement of machine learning-based intrusion detection systems for improving cybersecurity. They present information on the various machine learning techniques, their viability for intrusion detection, and the difficulties involved. By reading these works, researchers and practitioners can better grasp the current state of the art in this field and identify potential directions for future study and development[9].

III. PROPOSED METHODOLOGY

A. Collecting and processing the data
Obtain a suitable dataset for training and testing the intrusion detection system; the UNSW-NB15 dataset is one such example. Preprocess the dataset by removing superfluous features, handling missing values, and, if necessary, normalizing the data. Split the data into training and test sets, each including benign and malicious examples[10].

B. Engineering of Features
Extract useful features from the network traffic data using feature engineering. Statistical, frequency, and time series analysis methods may be useful here. Select significant features that correlate strongly with the target variable (attack versus normal). To reduce the number of features, use dimensionality reduction approaches such as Principal Component Analysis (PCA) or feature selection algorithms.

Choosing and Training Models
Choose from several machine learning techniques (such as decision trees, SVM, random forests, or deep learning models) that can be used for intrusion detection. Train the chosen models on the training set. This involves adjusting hyperparameters such as learning rate, regularisation parameters, and tree depth to optimise each model's performance. Guard against overfitting by tuning the hyperparameters and gauging model performance with cross-validation.

C. Evaluating and Comparing Model Performance
Standard assessment metrics, including accuracy, precision, recall, F1 score, and area under the curve (AUC), should be used to evaluate the trained models on the test set. Compare the models to find the best algorithm(s) for detecting the different types of attack. Based on the needs of the intrusion detection system, weigh the costs of false positives against those of false negatives.

D. Model fusion and ensemble methods
To improve overall detection accuracy, consider ensemble approaches such as bagging, boosting, or stacking to aggregate the predictions of multiple models. Explore model fusion techniques that combine the strengths of various algorithms for improved detection accuracy and system robustness.

E. Testing for Efficiency and Effectiveness in Real Time
To detect anomalies in network traffic in real time, deploy the trained intrusion detection model in a real-time setting. Examine how well the system identifies and counteracts different kinds of attacks to gauge its performance. Maintain the system's efficacy over time by continually monitoring and updating the model to account for new types of attacks[11].

F. Verification and Evaluation by Experiment
Use standard datasets to verify the proposed intrusion detection system's accuracy and evaluate its effectiveness compared to other methods. Assess the system's precision, efficiency, scalability, and robustness against various attack scenarios and network settings through experimentation and analysis.

G. Datasets
The dataset used in this work is the UNSW-NB15 intrusion detection dataset, which is openly available. Of the data, 70% was used for training, 15% for testing, and 15% for validation. The experimental analysis was carried out in Matlab.

Fig. 1 Proposed architecture for Intrusion Detection

IV. IMPLEMENTATION

Acquire the UNSW-NB15 dataset, which can be used freely in scientific study. It is deposited in the official repository and is available for download from the UNSW website. Learn the dataset's layout and characteristics as well as the attacks and everyday traffic it contains. The UNSW-NB15 dataset consists of simulated network traffic data, encompassing malicious attacks and everyday operations. Clean and organise the dataset before using it to train machine learning models. This process may include filling in missing values, filtering out irrelevant information, and reformatting the data so that it can be analysed effectively. Apply feature engineering to the dataset to draw out useful features that capture important aspects of network traffic and attacks. Methods for improving the model's ability to distinguish between benign and malicious actions include selecting features using domain expertise, employing dimensionality reduction techniques (such as principal component analysis), and developing derived features[12].

From the collected data, generate a training set, a validation set, and a test set. The training set is used to fit the machine learning models, the validation set is used to check their accuracy during development, and the test set is used to determine their overall effectiveness. Train the machine learning models on the UNSW-NB15 training set. Determine the requirements and objectives of your IDS installation before selecting the appropriate algorithms. In intrusion detection systems (IDS), decision trees, random forests, support vector machines (SVM), neural networks, and anomaly detection approaches are all extensively used[13]. Assess the trained models' efficacy using the validation set or cross-validation methods. Examine metrics such as recall, precision, accuracy, and F1-score. Adjust the models' hyperparameters until they give the best results; methods such as grid search and random search can be used to find good hyperparameter settings. Finally, apply the trained models to the UNSW-NB15 test set to see whether they perform adequately. This check is necessary to make sure the models can be applied successfully in the real world.

V. ALGORITHMS

A. Decision Trees
Decision trees are a popular option for intrusion detection systems (IDS) because of their interpretability and their capacity to handle categorical as well as numerical features. Algorithms such as C4.5, ID3, and CART construct decision trees that recursively split the feature space based on multiple criteria. As a result, network traffic can be efficiently categorised as either benign or malicious.

B. Random Forest
Random Forest is a form of ensemble learning that combines the results of numerous decision trees in order to increase classification accuracy and decrease the likelihood of overfitting. The final classification is reached by constructing a number of decision trees and aggregating their predictions. Random Forests are well known for their reliability and their capacity to deal with high-dimensional data[14].

C. Support Vector Machines (SVM)
SVMs are effective in binary classification tasks and can be applied to IDS to differentiate between normal and malicious network activities. SVMs are also effective in identifying anomalies in network traffic. The goal of an SVM is to locate a hyperplane that effectively partitions the data points of the various classes.
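The hyperplane idea can be made concrete with a small sketch. The following is a minimal, from-scratch linear SVM in Python, assuming toy two-feature data rather than UNSW-NB15 and a simple batch subgradient descent on the regularised hinge loss. The function names, the regularisation constant, and the learning rate are illustrative assumptions, not the method used in this study, whose experiments were run in Matlab.

```python
# Minimal linear SVM trained by batch subgradient descent on the
# L2-regularised hinge loss. Toy data only; NOT the UNSW-NB15 dataset.

# Hypothetical two-feature flows, already scaled to [0, 1].
# Label -1 = normal traffic, +1 = attack.
samples = [
    ((0.1, 0.2), -1), ((0.2, 0.1), -1), ((0.3, 0.3), -1), ((0.2, 0.4), -1),
    ((0.8, 0.9), +1), ((0.9, 0.7), +1), ((0.7, 0.8), +1), ((1.0, 1.0), +1),
]


def decision_value(w, b, x):
    """Signed score w . x + b; its sign tells which side of the hyperplane x is on."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_svm(data, reg=0.01, lr=0.05, epochs=5000):
    """Minimise reg/2 * ||w||^2 + mean hinge loss with batch subgradient steps."""
    dim = len(data[0][0])
    w, b = [0.0] * dim, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in data:
            # Only points inside the margin (y * score < 1) contribute
            # to the hinge-loss subgradient.
            if y * decision_value(w, b, x) < 1:
                for i in range(dim):
                    grad_w[i] -= y * x[i] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Classify x as attack (+1) or normal (-1) from the sign of the score."""
    return 1 if decision_value(w, b, x) >= 0 else -1


w, b = train_linear_svm(samples)
train_accuracy = sum(predict(w, b, x) == y for x, y in samples) / len(samples)
```

The sketch only illustrates that, once the weights are learned, classification reduces to checking the sign of w · x + b. In practice a library implementation (for example scikit-learn's `SVC` in Python) would normally be preferred over hand-written training code.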

Through the use of kernel functions, SVMs can handle linear as well as non-linear separation.

D. Neural Networks
In the field of IDS, neural networks, and more specifically deep learning models such as convolutional neural networks (CNNs) or recurrent neural networks (RNNs), have shown promising results. CNNs are suited to capturing spatial dependencies in network traffic data, whereas RNNs excel at processing sequential data, which makes them suitable for analysing network packet sequences or log entries. Both types of neural networks can be used in conjunction with one another[15].

VI. RESULT AND DISCUSSION

Conduct an analysis to determine the extent to which the IDS can distinguish between normal and malicious network traffic. Metrics including accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC-ROC) can be applied to evaluate this. A system's ability to differentiate between routine operations and prospective intrusions is reflected in its detection accuracy. A high detection accuracy implies that the system is effective.

Evaluate how well the IDS detects and categorises known attacks that have distinct signatures or patterns. Evaluate how well the system can recognise different kinds of attacks, such as DoS (Denial of Service), DDoS (Distributed Denial of Service), attempted intrusions, port scans, or SQL injections. Discuss the percentage of certain well-known attacks that the IDS labels correctly.

Discuss the capability of the IDS to identify unknown attacks, often known as zero-day attacks, which do not have any pre-defined patterns or signatures. Machine learning-based IDS, in particular those that use anomaly detection techniques, are intended to discover behaviours that deviate from the norm. Assess the system's capacity to recognise unexpected patterns and identify potentially novel attacks that were not included in the training dataset.

Investigate the incidence of false positives and false negatives in the IDS. False positives occur when legitimate activities are incorrectly identified as an intrusion, whereas false negatives occur when an intrusion is incorrectly identified as regular traffic. Determine how much these errors affect the system's overall performance, and then examine methods for reducing the number of false detections, such as fine-tuning detection thresholds or incorporating more contextual information.

Conduct an analysis to see how well the IDS generalizes to data it has not encountered before and to varying network conditions. To evaluate its robustness, discuss its performance on datasets coming from a variety of sources or spanning different periods. Additionally, it is important to assess the system's scalability in terms of its capacity to manage large-scale networks and significant volumes of traffic while maintaining its detection accuracy. Compared with traditional rule-based or signature-based IDS, machine learning-based IDS has been shown to achieve superior results. Advantages of machine learning approaches include the capacity to handle unexpected attacks, adapt to developing threats, and improve detection accuracy over time.

Discuss the practical aspects of implementing and operating the IDS, such as the required processing resources, the complexity of deployment, and the need for regular updates and maintenance. Determine the viability and applicability of the system for real-world cybersecurity problems by taking into account aspects such as training time, model size, and real-time processing capabilities.

TABLE 1 COMPARISON OF ALGORITHMS

Algorithms       Accuracy   Precision   Recall   F1 Score
Decision Tree    0.956      0.923       0.812    0.789
Random forest    0.976      0.935       0.856    0.806
SVM              0.952      0.956       0.875    0.825
Neural Network   0.983      0.978       0.896    0.836

Fig. 2 Performance calculation of Algorithms

VII. CONCLUSION

Implementing machine learning-based IDS on the UNSW-NB15 dataset also brought out critical considerations for real-world deployment. These include scalability, regular updates and maintenance to keep the system effective against new threats, and the computing resources required. Nevertheless, there are constraints and difficulties. There is still a risk of adversarial attacks, in which an adversary manipulates data or exploits weaknesses in the learning algorithms. It can also be difficult to ensure the availability of datasets that are both large and representative, and to address concerns such as class imbalance. Future research should focus primarily on improving the IDS models, which may be accomplished by investigating various feature engineering approaches, selecting appropriate algorithms, and optimising hyperparameters. In addition, the overall performance of the IDS may be improved through ensemble approaches or the combination of several detection strategies.

In conclusion, the application of machine learning-based intrusion detection systems (IDS) to the UNSW-NB15 dataset has shown their potential to improve cybersecurity by increasing the accuracy and efficiency of intrusion detection. With continued research and improvement, these systems have the potential to play a vital part in protecting networks and systems against changing cyber threats.

REFERENCES
[1] R. Latha and R. M. Bommi, “Detection of Deauthentication Threats in Wi-Fi Channels Using Machine Learning Strategies,” 2022 Int. Conf. Data Sci. Agents Artif. Intell. ICDSAAI 2022, pp. 4–9, 2022, doi: 10.1109/ICDSAAI55433.2022.10028874.
[2] A. M. Sauber, P. M. El-Kafrawy, A. F. Shawish, M. A. Amin, and I. M. Hagag, “A New Secure Model for Data Protection over Cloud Computing,” Comput. Intell. Neurosci., vol. 2021, 2021, doi: 10.1155/2021/8113253.
[3] R. Latha, “Deauthentication Attack Detection in the Wi-Fi network by Using ML Techniques,” 2022.
[4] S. Caleb and S. J. J. Thangaraj, “Data-driven ML Approaches for the concept of Self-healing in CWN, Including its Challenges and Possible Solutions,” 2023 Eighth Int. Conf. Sci. Technol. Eng. Math., pp. 1–7, doi: 10.1109/ICONSTEM56934.2023.10142451.
[5] S. Caleb and S. J. J. Thangaraj, “Secured Node Identification Approach Based on Artificial Neural Network Infrastructure for Wireless Sensor Networks,” pp. 646–651.
[6] H. Du, J. Chen, M. Chen, C. Peng, and D. He, “A Lightweight Authenticated Searchable Encryption without Bilinear Pairing for Cloud Computing,” Wirel. Commun. Mob. Comput., vol. 2022, 2022, doi: 10.1155/2022/2336685.
[7] A. Abdulridha, D. Salama, and K. M, “NHCA: Developing New Hybrid Cryptography Algorithm for Cloud Computing Environment,” Int. J. Adv. Comput. Sci. Appl., vol. 8, no. 11, pp. 479–486, 2017, doi: 10.14569/ijacsa.2017.081158.
[8] I. No, “ANALYSIS OF MACHINE LEARNING ALGORITHM FOR PREDICTION OF,” vol. 11, no. 3, pp. 42–47, 2020.
[9] T. J. Nandhini and K. Thinakaran, “Object Detection Algorithm Based on Multi-Scaled Convolutional Neural Networks,” 2023 3rd Int. Conf. Artif. Intell. Signal Process., pp. 1–5, doi: 10.1109/AISP57993.2023.10134980.
[10] V. Dilli Ganesh and R. M. Bommi, “Prediction of Tool Wear by Using RGB Techniques in Comparison with Experimental Analysis,” 2022 Int. Conf. Data Sci. Agents Artif. Intell. (ICDSAAI), Chennai, India, 2022.
[11] A. Elgammal, D. Harwood, and L. Davis, “Non-parametric Model for Background Subtraction,” pp. 751–767, 2000.
[12] T. Wu and N. Sun, “A reliable and evenly energy consumed routing protocol for underwater acoustic sensor networks,” 2015 IEEE 20th Int. Work. Comput. Aided Model. Des. Commun. Links Networks, CAMAD 2015, pp. 299–302, 2016, doi: 10.1109/CAMAD.2015.7390528.
[13] T. Subhash Bora and M. D. Rokade, “Human Suspicious Activity Detection System Using Cnn Model for Video Surveillance,” vol. 7, no. 3, p. 2021, 2021, [Online]. Available: www.ijariie.com
[14] M. A. Khan, N. Javaid, A. Majid, M. Imran, and M. Alnuem, “Dual sink efficient balanced energy technique for underwater acoustic sensor networks,” Proc. - IEEE 30th Int. Conf. Adv. Inf. Netw. Appl. Work. WAINA 2016, pp. 551–556, 2016, doi: 10.1109/WAINA.2016.156.
[15] A. Kannappan and R. M. Bommi, “Energy-Efficient Routing using the Hybrid Bilevel-Litechenbery-Optimization Algorithm in Comparison with Ant-colony Optimization,” ICDCS 2022 - 2022 6th Int. Conf. Devices, Circuits Syst., no. April, pp. 464–466, 2022, doi: 10.1109/ICDCS54290.2022.9780826.
