Abstract: An intrusion detection system (IDS) is a software system that keeps track of network traffic and looks for anomalies. Abnormal or unusual network changes could be signs of fraud at any phase, from the start of an attempt to a complete intrusion. Since data sharing primarily depends on the internet, it must be safe. For internet security, data encryption and authentication are insufficient, and firewalls are unable to identify fraudulent packets that are fragmented. Moreover, attackers frequently vary their strategy, equipment, methods and tactics, which can have disastrous effects such as lost productivity, financial loss and data loss. So, it becomes essential to put in place an effective intrusion detection system, which is a very challenging task. Various supervised Machine Learning (ML) algorithms are applied in this paper, namely J48, Random Forest, Random Tree, Hoeffding Tree and Logistic Model, to evaluate the prediction accuracy of an IDS. The analysis was performed on the basis of three categories of data split, and the algorithm that gives the best accuracy is suggested for future predictions. Performance measures such as accuracy, execution time, precision, F-measure and ROC curve are also analyzed. Random Forest exhibits the best accuracy of 99.84% at a split ratio of 80:20 as compared to the other ML algorithms. The execution time to build and test the model is lower for Random Tree. As accuracy is the prime concern for an IDS, Random Forest is suggested as the best solution since it provides the highest accuracy.

Keywords: Anomaly based Intrusion detection model, Network Security, Machine Learning techniques, Network Attack.

I. INTRODUCTION

Data protection is of paramount importance in today's world. The vast amount of data flowing between corporations and consumers needs to be secured, considering the trust placed in them. A company can spend millions of dollars on the most secure servers, but it takes only a single hacker to ruin the goodwill between organizations. To prevent these malicious attacks, many automated security systems have been developed, which are known as Intrusion Detection Systems (IDS) [1] [2]. An IDS is a host or system that is inserted into a network to record traffic and detect malicious activity based on predetermined rules. This malicious conduct is then recorded and a notification of an intrusion is sent to the appropriate parties, thus identifying attempts to compromise the integrity, confidentiality or accessibility of resources and assets [3].

Threats to computer networks, infrastructure and equipment have grown rapidly in recent years. Hence, cyber security has proven to be a significant issue. Malicious activity detection is a serious issue that needs to be solved, and using an Intrusion Detection System we can deal with cyber-attacks. These systems continuously monitor a network to find anomalous activity or intrusions [4]. Various Machine Learning (ML) techniques, a subset of artificial intelligence capable of finding hidden patterns or trends in data and making precise predictions, can be used for intrusion detection [5].

The paper is organized as follows: Section II addresses the various types of IDS methods and attacks and the problem statement. Section III describes the material and methods used. Section IV shows the implementation and result analysis of the proposed IDS, and Section V highlights the various challenges and the future scope of the intrusion detection system. Section VI concludes the overall paper with a discussion on future scope.

II. ANOMALY BASED INTRUSION DETECTION

A. Different types of intrusion detection methods and attacks:

The network size and associated data have significantly increased as a result of the fast developments in the internet and communication areas. The resulting growth of various unique threats has made it difficult for network security to properly identify attacks [6]. The different IDS methods for detecting attacks are broadly categorised as:

• Signature based detection
• Anomaly based detection
• Hybrid based detection

The specific types of attacks are categorized into four groups, which are used as a benchmark by various researchers to compare the performance of their intrusion detection systems [7]. These attacks are summarized in TABLE I.

Traditional intrusion detection is losing effectiveness as new threats appear and communication protocols grow. Malicious activity detection is a crucial issue that needs to be addressed in future IDS, particularly for undiscovered threats. Additionally, the presence of intruders who intend to launch a variety of attacks within the network cannot be neglected and must be dealt with immediately. In order to effectively detect breaches across the network, a variety of ML algorithms are being implemented. Fig. 1 shows the block diagram of an IDS that makes use of the ML model to predict an attack. It uses two labels, normal and anomaly, to classify each connection.
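The evaluation setup mentioned in the abstract, where labelled records are divided at a ratio such as 80:20 before a classifier is built and tested, can be illustrated with a short Python sketch. This is not the authors' implementation, which is not shown in the paper. The function name `stratified_split`, the toy records and the 70/30 class mix are assumptions made purely for illustration. The split is stratified so that the normal and anomaly labels keep roughly the same proportions in both parts.

```python
import random
from collections import defaultdict


def stratified_split(records, labels, train_fraction=0.8, seed=42):
    """Split (record, label) pairs into train and test lists.

    Each label is shuffled and cut separately, so both parts keep
    roughly the same class proportions. Hypothetical helper.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    rng = random.Random(seed)

    # Group record indices by their label.
    by_label = defaultdict(list)
    for index, label in enumerate(labels):
        by_label[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])

    train = [(records[i], labels[i]) for i in train_idx]
    test = [(records[i], labels[i]) for i in test_idx]
    return train, test


# Toy data (assumed): 70 "normal" and 30 "anomaly" records with dummy features.
records = [(i, i % 7) for i in range(100)]
labels = ["normal"] * 70 + ["anomaly"] * 30

train, test = stratified_split(records, labels, train_fraction=0.8)
print(len(train), len(test))
```

Passing a different `train_fraction` gives other split ratios, so the same helper could be reused to compare several splits.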
Authorized licensed use limited to: Staffordshire University. Downloaded on March 28,2024 at 16:28:34 UTC from IEEE Xplore. Restrictions apply.
classify which approach yields the best results [16]. The algorithm that has low execution time but high precision, recall and accuracy is considered the best.

Fig. 3 provides a step by step description, in the form of a flowchart, of the methodology adopted, and it also shows the type of result produced by the intrusion detection system. As shown in the figure, the ML based algorithms are applied to test the accuracy and classify whether there is an attack or not. Moreover, it provides the best ML algorithm based on the highest accuracy value.

The following figure shows the results in the form of a line graph and a bar graph for the above table to provide a clear visualization of the results.

Fig. 4. Line graph showing highest accuracy for Random Forest at the split ratio of 80:20
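The selection criteria described in this section (high accuracy, precision and recall, together with low execution time) can be made concrete with a small Python sketch. It is only an illustration under stated assumptions. The helper names `binary_metrics` and `pick_best`, the toy label lists and the placeholder classifiers `model_a` to `model_c` are hypothetical and do not come from the paper. The metrics here treat anomaly as the positive class, which may differ from the averaging behind the figures reported in the paper. Using execution time only as a tie-break is likewise an assumption.

```python
def binary_metrics(actual, predicted, positive="anomaly"):
    """Compute accuracy, precision, recall and F-measure for one positive class."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(actual, predicted):
        if guess == positive and truth == positive:
            tp += 1  # attack correctly flagged
        elif guess == positive:
            fp += 1  # normal traffic flagged as attack
        elif truth == positive:
            fn += 1  # attack missed
        else:
            tn += 1  # normal traffic correctly passed
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def pick_best(results):
    """Return the name with the highest accuracy; lower time breaks ties."""
    return max(results, key=lambda n: (results[n]["accuracy"], -results[n]["seconds"]))


# Toy labels (assumed): 4 attacks and 6 normal connections.
actual = ["anomaly"] * 4 + ["normal"] * 6
predicted = ["anomaly", "anomaly", "anomaly", "normal", "anomaly",
             "normal", "normal", "normal", "normal", "normal"]
metrics = binary_metrics(actual, predicted)
print(metrics)

# Placeholder results for three hypothetical classifiers (not measured values).
results = {
    "model_a": {"accuracy": 0.95, "seconds": 2.0},
    "model_b": {"accuracy": 0.95, "seconds": 0.5},
    "model_c": {"accuracy": 0.90, "seconds": 0.1},
}
best = pick_best(results)
print(best)
```

In this toy run the counts are 3 true positives, 1 false positive, 1 false negative and 5 true negatives, so accuracy is 0.8 and precision, recall and F-measure are all 0.75.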
Fig. 6. Bar graph showing time taken to build and test the model for different classifiers at 80:20 ratio

Other factors such as precision, recall, F-measure and ROC area have been used to evaluate the ML algorithms used in the IDS at a split ratio of 80:20, as it was found to be the best split ratio. The performance of a model mainly depends upon its accuracy, especially in the case of an IDS; however, the other parameters are also important and can be taken into consideration, as these values provide better insight into the prediction rate of the model. Values close to 1 indicate better performance in terms of precision, recall and F-measure. As shown in the table below, Random Forest shows the highest precision, recall and F-measure values.

TABLE IV. PERFORMANCE METRICS SHOWN FOR DIFFERENT CLASSIFIERS AT 80:20 RATIO

ML Algorithm     Precision   Recall   F-Measure   ROC Area
J48              0.997       0.997    0.997       0.998
Random Forest    0.998       0.998    0.998       1.000
Random Tree      0.981       0.981    0.981       0.992
Hoeffding Tree   0.975       0.975    0.975       0.995
Logistic         0.967       0.967    0.967       0.981

Finally, the paper performs classification using various supervised ML techniques, namely J48, Random Forest, Random Tree, Hoeffding Tree and Logistic. It has been observed that, among all classification models used to classify an attack, Random Forest shows the highest accuracy of 99.8412% as well as the highest precision (0.998) and F-measure (0.998) at the 80:20 split ratio.

V. RESEARCH CHALLENGES AND FUTURE SCOPE

There are many research challenges that need to be addressed in this area of cyber security, particularly for intrusion detection. Despite the tremendous efforts by researchers, an IDS still faces a number of challenges in detecting new and unexpected intrusions. Since the majority of the suggested approaches are examined and validated in a lab environment utilising freely accessible and publicly available datasets, their results are not so practical when employed in a real time scenario [19]. So, the biggest challenge for the proposed IDS is to be efficient enough to verify its effectiveness for modern networks as demonstrated in the results. The datasets which are available are imbalanced in nature, which is also a major problem that needs to be considered. The conventional IDS cannot process large volumes of data and may not respond accurately to new threats [20].

Future research should focus on developing Deep Learning (DL) based intrusion systems that are compact, effective and capable of quickly identifying network intrusions. An IDS can be dispersed among the sensor nodes or deployed at the places where network traffic from the internet enters the IoT network, and for this a lightweight IDS model can be used [21][22]. To increase the number of minority attack instances, it is necessary to provide an updated, real-time and balanced dataset on which effective methods and techniques can be applied to detect every kind of intrusion, thus resulting in a safe network.

VI. CONCLUSION

An IDS is a software application that is used to detect network intrusion. It uses various machine learning algorithms to detect whether there is an attack or not. An IDS keeps an eye out for malicious activities on a network or system and guards against unwanted access from anyone, including insiders. The results from the experiment show that an IDS works more accurately at the split ratio of 80:20 for most of the supervised machine learning algorithms. The best result in terms of accuracy is produced by Random Forest among all the five ML algorithms that are taken into consideration. The main task of the IDS is to build a predictive model that is capable of distinguishing between a bad connection (i.e., an attack), represented by the label anomaly, and a good connection (i.e., not an attack), represented by the label normal. One of the biggest challenges in developing an IDS is building a lightweight IDS model for IoT devices that can be more effective in terms of attack detection rate.

REFERENCES

[1] Razan Abdulhammed, Miad Faezipour, Khaled M. Elleithy, "Network intrusion detection using hardware techniques: A review," Systems Applications and Technology Conference (LISAT) IEEE, Long Island, pp. 1-7, 2016.
[2] S. Ustebay, Z. Turgut, and M. A. Aydin, "Intrusion detection system with recursive feature elimination by using random Forest and deep learning classifier," International Congress on Big Data, Deep Learning and Fighting Cyber Terrorism (IBIGDELFT), pp. 71-76, 2018.
[3] S. Kumar, S. Gupta and S. Arora, "Research Trends in Network-Based Intrusion Detection Systems: A Review," IEEE Access, vol. 9, pp. 157761-157779, 2021.
[4] Khraisat, A., Alazab, A., "A critical review of intrusion detection systems in the internet of things: techniques, deployment strategy, validation strategy, attacks, public datasets and challenges," Cybersecurity, vol. 4, no. 18, 2021.
[5] Hasan, M., Islam, M. M., Zarif, M. I. I., & Hashem, M. M. A., "Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches," Internet of Things, vol. 7, no. 1, pp. 100059, 2019.
[6] B. Singh and S. N. Panda, "An Adaptive Approach to Mitigate Ddos Attacks in Cloud," Int. J. Adv. Comput. Sci. Appl., vol. 6, no. 10, pp. 47-52, 2015.
[7] Giovanni Vigna and Richard A. Kemmerer, "NetSTAT: A network-based intrusion detection system," Journal of Computer Security, vol. 7, pp. 37-71, Jan. 1999.
[8] Moustafa, N., & Slay, J., "The evaluation of Network Anomaly Detection Systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set," Information Security Journal: A Global Perspective, vol. 25, no. 1-3, pp. 18-31, 2016.
[9] https://www.kaggle.com/datasets/hassan06/nslkdd, accessed online on 10 June 2022.
[10] Miriam Seoane Santos, Jastin Pompeu Soares, Pedro Henriques Abreu, Helder Araujo, and Joao Santos, "Cross-validation for imbalanced datasets: avoiding overoptimistic and overfitting approaches," IEEE Computational Intelligence Magazine, vol. 13, no. 4, pp. 59-76, 2018.
[11] Umesh Kumar Lilhore, Sarita Simaiya, Devendra Prasad, Kalpna Guleria, "A Hybrid Tumour Detection and Classification Based on Machine Learning," Journal of Computational and Theoretical Nanoscience, vol. 17, no. 6, pp. 2539-2544, 2020.
[12] Amandeep Sharma, Kalpna Guleria, Nitin Goyal, "Prediction of Diabetes Disease using Machine Learning Models," International Conference on Communication, Computing and Electronics Systems, vol. 733, pp. 683-690, 2021.
[13] Remco R. Bouckaert, Eibe Frank, Mark Hall, Richard Kirkby, Peter Reutemann, Alex Seewald, and David Scuse, WEKA manual version 3-8-1. University of Waikato, Hamilton, New Zealand, Dec. 2016.
[14] Zanariah Zainudin, Siti Mariyam Shamsuddin, and Shafaatunnur Hasan, "Deep learning for image processing in WEKA environment," International Journal of Advances in Soft Computing and its Applications, vol. 11, no. 1, pp. 1-21, 2019.
[15] Pradeepta Kumar Sarangi, Kalpna Guleria, Devendra Prasad, Deepak Kumar Verma, "Stock Movement Prediction Using Neuro Genetic Hybrid Approach and Impact on Growth Trend due to COVID-19," International Journal of Networking and Virtual Organisations, vol. 25, no. 3-4, 2021.
[16] S. Goel, K. Guleria, and S. N. Panda, "Machine Learning Techniques for Precision Agriculture Using Wireless Sensor Networks," ECS Transactions, vol. 25, no. 3, pp. 9229-9238, 2022.
[17] Belouch, M., Hadaj, S. E., & Idhammad, M., "Performance evaluation of intrusion detection based on machine learning using Apache Spark," Procedia Computer Science, vol. 127, pp. 1-6, 2018.
[18] Belavagi, M. C., & Muniyal, B., "Performance evaluation of supervised machine learning algorithms for intrusion detection," Procedia Computer Science, vol. 89, no. 1, pp. 117-123, 2016.
[19] Sridevi, S., Parthasarathy, S., & Rajaram, S., "An Effective Prediction System for Time Series Data Using Pattern Matching Algorithms," International Journal of Industrial Engineering, vol. 25, no. 2, pp. 123-136, 2018.
[20] SB Atham, K Guleria, "Smart City in Underwater Wireless Sensor Networks," Energy-Efficient Underwater Wireless Communications and Networking, pp. 287-301, 2021.
[21] S. Badotra, Di. Nagpal, S. N. Panda, S. Tanwar, and S. Bajaj, "IoT-Enabled Healthcare Network with SDN," ICRITO 2020 - IEEE 8th Int. Conf. Reliab. Infocom Technol. Optim. Trends Futur. Dir., pp. 38-42, 2020.
[22] Elsaeidy, A., Munasinghe, K. S., Sharma, D., & Jamalipour, A., "Intrusion detection in smart cities using Restricted Boltzmann Machines," Journal of Network and Computer Applications, vol. 135, no. 1, pp. 76-83, 2019.