
Anomaly Based Intrusion Detection on IOT Devices using Logistic Regression

Sasikala K
Assistant Professor
Department of Computer Science Engg.,
Saveetha Engineering College
ksasikala1792@gmail.com

Vasuhi S
Associate Professor
Department of Electronics Engg.,
Madras Institute of Technology, India
vasuhi_s@annauniv.edu
2023 International Conference on Networking and Communications (ICNWC) | 979-8-3503-3600-9/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICNWC57852.2023.10127375

Abstract
The Internet of Things (IoT) will soon make it possible to collect and exchange information without
human intervention. The rapid growth in the number of connected devices is raising a number of
challenges for IoT technology, including those related to diversity, scalability, quality of service,
security requirements, and many more. Technological developments such as machine learning have
helped IoT technology advance. Feature selection, also called factor selection, is crucial for reducing
the complexity of learning, especially for a very large data set such as network traffic. Although
recent selection methods are simple to apply, selecting features correctly remains a difficult task.
Intrusion detection and prevention systems (IDPSs) are the most popular technology for spotting
suspicious behaviour and defending diverse infrastructures against network intrusions. On the
UNSW-NB15 (University of New South Wales) data set, our proposed logistic regression algorithm
predicts anomalies with an accuracy of 98% using an automated feature selection approach, since the
accuracy of the model depends on the features. A dimensionality reduction approach is used to
reduce misleading data.

Keywords: Internet of things, Anomaly, Factor selection, logistic regression, Detection rate.

I Introduction

With smart sensors offering enhanced connectivity in applications such as patient monitoring,
environmental sensing, flood control, smart farming, and smart homes, the Internet of Things (IoT) is
transforming industry and making life smarter. More precisely, without the need for human or
device-to-device interfaces, the IoT enables a variety of dissimilar physical items to cooperate and
interact with one another in order to share data over a wide range of networks [1]. On the other hand,
malicious traffic is growing more sophisticated and more frequent. Organized cyber terrorists, script
kiddies, hacktivists, and nation-backed hackers employ a range of strategies to attack their targets.
Detecting, blocking, and limiting unwanted traffic is challenging and expensive as businesses strive to
comply with national regulations, organization-specific requirements, and compliance obligations. To
defend internal operations from outside threats, a secured network architecture should incorporate
front-line systems such as firewalls, intrusion prevention systems, online content screening, and URL
filtering [2]. Because attack strategies keep evolving and organized crime is complex, it is difficult to
properly protect sensitive information from theft, disclosure, and denial-of-service (DoS) attacks.
Researchers are investigating anomalous-traffic detection techniques based on machine learning to
improve on conventional signature-based intrusion detection systems. An intrusion detection system can
identify both attacks and suspicious activity. However, the intrusion protection system could miss
certain frames if the network is congested. Such systems audit files, monitor real-time traffic, and
identify breaches. Depending on how the intrusion detection system interprets the traffic, each outcome
is categorized as a true positive, a false positive (false alarm), a true negative, or a false negative.

II. RELATED WORK

The data set used in the study by Adeel et al. for the classification problem is CICIDS2017. In their
work [1], they employ six distinct supervised ML classification methods for intrusion detection.
Decision tree (DT), naive Bayes (NB; Gaussian and multi-variable), random forest (RF), logistic
regression (LR), linear SVM, and a stochastic gradient descent classifier (SGD Classifier) are among
the techniques used, with a stacked classifier as the ensemble approach. Using an ensemble paradigm, it
gives high accuracy with less processing power,

XXX-X-XXXX-XXXX-X/XX/$XX.00 ©20XX IEEE


Authorized licensed use limited to: ANNA UNIVERSITY. Downloaded on June 15,2023 at 04:53:44 UTC from IEEE Xplore. Restrictions apply.
fewer resources, and a lower false alarm rate by using ML algorithms instead of ANN and DL techniques.
The NB, DT, and LR classifiers reach multi-class accuracies of 83.93%, 93.50%, and 87.60%,
respectively, a notable improvement in accuracy.

Fekadu et al. experimented with five machine learning algorithms to find an effective classifier that
detects anomalous traffic in the NSL-KDD data set with high precision and a minimal error rate [2].
Five classifiers (Stochastic Gradient Descent, Random Forest, Logistic Regression, Support Vector
Machine, and a Sequential Model) were examined and validated. The results show that the Random
Forest classifier performs better than the other four classifiers both with and without applying
normalisation to the data set.

Christiana et al. assessed the viability of operating a lightweight detection system in a sensor or IoT
node with limited resources [3]. They employed mIDS, a statistical analysis technique based on Binary
Logistic Regression (BLR), to monitor and detect attacks. mIDS creates a normal-behaviour model that
recognizes abnormalities inside the constrained node, using only local node characteristics for both
malicious and benign activity as input. Proper functioning is verified by validating mIDS in a scenario
with active network-layer attacks. Critical information from the network layer is acquired and used in
this system as the foundation for profiling sensor activity.

Rough Set Theory (RST) and Support Vector Machine (SVM) were used by Vipin et al. to detect network
intrusions [4]. After the packets are first captured from the network, RST is used to preprocess the
data and reduce its dimensionality. The features selected by RST are passed to the SVM model for
training and testing. The technique is effective at reducing the space density of the data. Experiments
comparing the results with principal component analysis (PCA) indicate that the RST and SVM schema can
lower the number of false positives and improve accuracy.

Makiya et al. investigated the difficulties of automated feature selection for network anomaly
detection. By combining existing methods, the authors proposed an ensemble approach that reaps their
benefits. One of the proposed ensemble techniques, based on greedy search, performs very consistently
and produces results comparable to those of the existing techniques [5]. The authors also discuss the
problem of knowing when to stop the feature removal process and offer a variety of techniques for
deciding how many features should remain in the reduced feature set. Experimental findings on two
current network data sets demonstrate that the proposed ensemble and stopping approaches consistently
achieve their performance with fewer features than traditional selection strategies for the feature
sets found.

Yakub et al. [5] proposed machine learning-based intrusion detection for resource-constrained IoT
contexts. To prevent information from leaking into the test data, the proposed approach applies
min-max normalization to the UNSW-NB15 data set. The XGBoost, CatBoost, k-nearest neighbours (KNN),
SVM, QDA, and NB classifiers are then all trained on the reduced data set. In terms of accuracy, F1,
and MCC on the test data, the experimental results of the proposed models exceeded the state of the
art, reaching 99.99%.

Ahmed et al.'s evaluation of the issues with machine learning-based intrusion detection systems for the
Internet of Things can be found in [6]. They investigated high dimensionality, computational
complexity, and concept drift as the three key issues of learning algorithms when working with an IDS
for the IoT. They discussed each of these difficulties in general and their connection to machine
learning specifically. Additionally, the review featured the KDD99, NSL, and Kyoto data sets as its
three core data sets. The key issues with IoT and IDSs were then discussed, along with solutions based
on the existing literature.

D. Wu et al. [14] proposed the RBM-LSTM framework, which serves as a driver warning because it can
identify how the driver's actions affect the vehicle while driving. To better perform its function as
a driving assistant, an advanced driver assistance system (ADAS) should incorporate the proposed
RBM-LSTM framework. RBM-LSTM also increases the security of ADAS itself by helping ADAS understand
the effects of each operation: which actions are appropriate and which behaviours would cause problems.

Jyothsna et al. [7] explored the operational architectures and theoretical underpinnings of the main
anomaly-based network intrusion detection methods, in addition to classifying them by processing type
in relation to the "behavioural" model of the target system. The key characteristics of numerous IDS
systems and platforms that are currently available are also briefly described in that study. The most
significant open issues in anomaly-based network intrusion detection are addressed, with special
attention given to assessment.
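
The leakage concern raised in the discussion of Yakub et al. [5] earlier in this section can be illustrated with a minimal Python sketch. It assumes plain min-max scaling in which the per-feature minimum and maximum are computed from the training split only and then reused on the test split. The function names `fit_min_max` and `apply_min_max` and the toy values are hypothetical; they are not taken from [5] or from the UNSW-NB15 data set.

```python
from typing import List, Tuple

Row = List[float]


def fit_min_max(train: List[Row]) -> Tuple[Row, Row]:
    """Return per-feature minima and maxima computed from the training rows only."""
    columns = list(zip(*train))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows: List[Row], mins: Row, maxs: Row) -> List[Row]:
    """Scale rows with previously fitted statistics; constant features map to 0.0."""
    scaled = []
    for row in rows:
        scaled.append([
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, mins, maxs)
        ])
    return scaled


# Hypothetical toy data: two numeric features per flow (illustrative values only).
train = [[10.0, 200.0], [20.0, 400.0], [30.0, 600.0]]
test = [[25.0, 700.0]]

mins, maxs = fit_min_max(train)                 # statistics come from the training split only
train_scaled = apply_min_max(train, mins, maxs)
test_scaled = apply_min_max(test, mins, maxs)   # test rows reuse the training statistics
```

Because the test rows are scaled with training statistics, a test value can fall outside [0, 1] (here the second feature becomes 1.25). That is expected under this assumption and keeps information about the test split out of the fitted scaler.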

Any discussion of research and development in the field of IDS must start from the material presented
above. Faster and more effective countermeasures are needed to deal with the increase in attacks.

III PROPOSED WORK

An IDS is a security system that integrates host and network activity [11]. It examines the network
packets being exchanged, looks for unusual activity, and raises an alert. Network behaviour is the
cornerstone of anomaly-based detection. Network activity is permitted if it fits the expected
behaviour; otherwise an anomaly event is triggered. The expected network behaviour is defined or
learned by the network administrators. Anomaly detection is the process of identifying deviations from
the regular pattern of network traffic. During non-attack periods, the network's typical profile is
recorded, mostly in the form of statistical information [6]. A manager who normally accesses the
network only during the day but uses his account after hours is an example of this type of deviation.
Even if it is not related to an attack, such a deviation is unusual and might be a sign of one. It can
therefore result in a false alarm. To avoid false alarms, the user behaviour patterns on the network
can be updated regularly. The three categories of anomaly-based intrusion detection are statistical,
knowledge-based, and machine learning-based.

Statistical-based anomaly IDS
A statistical-based anomaly IDS compares the observed statistical properties of the traffic with a
stochastic model of the normal behaviour of the traffic. A gap between the two is taken as a sign of an
attack.

Knowledge-based anomaly IDS
In knowledge-based anomaly detection, experts supply a set of rules, in the form of an expert system or
fuzzy system, to characterise the behaviour of normal connections and attacks. The rule-based technique
is coupled to inputs for soft anomaly detection. In addition to allowing some of the rules to be based
on input values, heuristics or a syntactic description of the attack's behaviour may be provided.

Machine learning-based anomaly IDS
In an anomaly IDS based on machine learning, the analysed patterns are modelled either explicitly or
implicitly. These models are updated frequently to improve the efficacy of intrusion detection based on
prior findings [12]. Extracting features from the traffic is a crucial stage in the training of the ML
models. Here, the IDS was designed using features whose values fluctuate during attack phases as
opposed to periods of regular operation. Even the best algorithm will fail to detect an intrusion or an
unusual state if the selected feature does not change during the attacks [13]. Choosing
hyperparameters in order to find the ideal parameter values is a time-consuming and computationally
expensive operation. As a result, several selection approaches are available, including grid search,
random search, and Bayesian optimization techniques.

Figure 1. Pie chart of the labels for normal and abnormal traffic.

Figure 1 displays the normal and abnormal values of the labels. The multi-class label distribution
chart is displayed.

i. Logistic Regression (LR)

LR is a supervised machine learning method for classifying observations into a given set of classes.
Although it is called "regression", LR is mostly used for binary classification tasks. LR can also be
applied to multi-class classification tasks when the learning algorithm uses the one-vs-rest technique
[15]. Logistic regression uses the sigmoid function, also called the logistic function, to convert
predictions into probabilities [9]. The probability that an event occurs is predicted by fitting the
data to a linear model. The sigmoid function's formula is:

$$F(x) = \frac{1}{1 + e^{-x}} \tag{1}$$

where F(x) is an output between 0 and 1, x is the input to the function, and e is the base of the
natural logarithm. Logistic regression is one such model that has been widely used as a general
data-processing tool for binary classification and prediction. Binary classification data can be
modelled using a number of learning strategies. However, one linear model, logistic regression, is the
subject of this work [10].

$$P(y = 1 \mid x) = F(w^{\top} x + b) = \frac{1}{1 + e^{-(w^{\top} x + b)}} \tag{2}$$

The logistic regression model for a binary response can be stated by combining a linear combination of
the input features, the corresponding weights (w), and a bias term (b) for each instance, as shown in
equation (2).

Figure 2. Binary classification with logistic regression

Figure 2 shows the binary classification in our proposed logistic regression approach. The response
variable in logistic regression has a numerical value. Logistic regression uses the log of the odds of
being assigned to the i-th group of a binary or multi-class response as the response variable [9].
Logistic regression makes a number of assumptions, including independent, normally distributed
responses (logits) at every level of each subgroup of the explanatory variable, with constant variance
between the responses and all values of the explanatory variable.

Figure 3. Multi-class classification with logistic regression

Figure 3 displays the multi-class classification using logistic regression in our proposed approach.

IV Results

When assessing the effectiveness of the complete model, accuracy, precision (positive predictive
value), false positive rate, and recall were the primary criteria taken into account. Common
classification metrics include accuracy, true positive rate, and false positive rate. Accuracy,
precision, and recall are computed from the counts of true positives (TP), true negatives (TN), false
positives (FP), and false negatives (FN):

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{4}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{5}$$

Figure 4. Multi-label correlation matrix

The accuracy of logistic regression is 97.80% for binary classification and 97.58% for multi-class
classification.

V CONCLUSION

To defend internal operations from outside threats, a protected network design should incorporate
front-line systems including firewalls, intrusion prevention and detection systems, online media
filtering, and URL filtering. In this work, we proposed an anomaly-based intrusion detection system for
IoT devices using logistic regression. Logistic regression was applied to both binary and multi-class
classification. The accuracy of logistic regression is 97.80% for binary classification and 97.58% for
multi-class classification.

References

[1] Abbas, Adeel & Khattak, Muazzam & Latif, Shahid & Ajaz, Maria & Shah, Awais & Ahmad, Jawad.
(2021). A New Ensemble-Based Intrusion Detection System for the Internet of Things. Arabian Journal for
Science and Engineering. 47. 10.1007/s13369-021-06086-5.

[2] F. Yihunie, E. Abdelfattah and A. Regmi, "Applying Machine Learning to Anomaly-Based Intrusion
Detection Systems," 2019 IEEE Long Island Systems, Applications and Technology Conference (LISAT),
Farmingdale, NY, USA, 2019, pp. 1-5, doi: 10.1109/LISAT.2019.8817340.

[3] Christiana Ioannou and Vasos Vassiliou. 2018. An Intrusion Detection System for Constrained WSN
and IoT Nodes Based on Binary Logistic Regression. Proceedings of the 21st ACM International
Conference on Modeling, Analysis and Simulation of Wireless and Mobile Systems (MSWIM '18).
Association for Computing Machinery, New York, NY, USA, 259-263.
https://doi.org/10.1145/3242102.3242145

[4] Vipin, Das & Vijaya, Pathak & Sattvik, Sharma & Sreevathsan, & MVVNS.Srikanth, & T, Gireesh.
(2010). Network Intrusion Detection System Based On Machine Learning Algorithms. International Journal
of Computer Science & Information Technology. 2. 10.5121/ijcsit.2010.2613.

[5] Yakub Kayode Saheed, Aremu Idris Abiodun, Sanjay Misra, Monica Kristiansen Holone, Ricardo
Colomo-Palacios, A machine learning-based intrusion detection for detecting internet of things network
attacks, Alexandria Engineering Journal, Volume 61, Issue 12, 2022, Pages 9395-9409, ISSN 1110-0168,
https://doi.org/10.1016/j.aej.2022.02.063.
(https://www.sciencedirect.com/science/article/pii/S1110016822001570)

[6] Adnan, A.; Muhammed, A.; Abd Ghani, A.A.; Abdullah, A.; Hakim, F. An Intrusion Detection System for
the Internet of Things Based on Machine Learning: Review and Challenges. Symmetry 2021, 13, 1011.
https://doi.org/10.3390/sym13061011

[7] Ansam Khraisat, Iqbal Gondal, Peter Vamplew, Joarder Kamruzzaman and Ammar Alazab, A Novel Ensemble
of Hybrid Intrusion Detection System for Detecting Internet of Things Attacks, 2020.

[8] Valerio Morfino and Salvatore Rampone, Towards Near-Real-Time Intrusion Detection for IoT Devices
using Supervised Learning and Apache Spark, 2020.

[9] Veeramreddy, Jyothsna & Prasad, V. & Prasad, Koneti. (2011). A Review of Anomaly based Intrusion
Detection Systems. International Journal of Computer Applications. 28. 26-35. 10.5120/3399-4730.

[10] M. A. Siddiqi and W. Pak, "An Agile Approach to Identify Single and Hybrid Normalization for
Enhancing Machine Learning-Based Network Intrusion Detection," in IEEE Access, vol. 9,
pp. 137494-137513, 2021, doi: 10.1109/ACCESS.2021.3118361.

[11] P. L. S. Jayalaxmi, R. Saha, G. Kumar, M. Conti and T.-H. Kim, "Machine and Deep Learning Solutions
for Intrusion Detection and Prevention in IoTs: A Survey," in IEEE Access, vol. 10, pp. 121173-121192,
2022, doi: 10.1109/ACCESS.2022.3220622.

[12] M. Zolanvari, M. A. Teixeira, L. Gupta, K. M. Khan and R. Jain, "Machine Learning-Based Network
Vulnerability Analysis of Industrial Internet of Things," in IEEE Internet of Things Journal, vol. 6,
no. 4, pp. 6822-6834, Aug. 2019, doi: 10.1109/JIOT.2019.2912022.

[13] G. Abdelmoumin, D. B. Rawat and A. Rahman, "On the Performance of Machine Learning Models for
Anomaly-Based Intelligent Intrusion Detection Systems for the Internet of Things," in IEEE Internet of
Things Journal, vol. 9, no. 6, pp. 4280-4290, 15 March 2022, doi: 10.1109/JIOT.2021.3103829.

[14] Di Wu, Hanlin Zhu, Yongxin Zhu, Victor Chang, Cong He, Ching-Hsien Hsu, Hui Wang, Songlin Feng,
Li Tian, and Zunkai Huang. 2020. Anomaly Detection Based on RBM-LSTM Neural Network for CPS in Advanced
Driver Assistance System. ACM Trans. Cyber-Phys. Syst. 4, 3, Article 27 (July 2020), 17 pages.
https://doi.org/10.1145/3377408

[15] Kasongo, S.M., Sun, Y. Performance Analysis of Intrusion Detection Systems Using a Feature
Selection Method on the UNSW-NB15 Dataset. J Big Data 7, 105 (2020).
https://doi.org/10.1186/s40537-020-00379-6
