Article Elmo Ud

Mouhsine Elmoudir
IDSD
Proposed Intrusion Detection Model using Decision Tree Classifier with Feature Engineering
Abstract
• One of the main problems facing devices connected to the internet is security, and there is a
huge need for security and privacy to make the transfer of data safer. One solution is the IDS
(intrusion detection system), software that provides more reliability and the ability to
distinguish normal information from abnormal information.
• Keywords— Network security; IDS; ML; Decision tree; Data Quality; Entropy decision;
Feature selection
Introduction
• This is a classification problem, which belongs to supervised learning, like the detection of
normal and spam emails in Gmail. The challenge is to implement the IDS and maintain it in real
time in order to keep attacks away.
• The way to achieve this is to create a model that is able to detect intrusions and to keep it
active.
• Recently, ML methods have been integrated to enhance intrusion detection and reinforce computer
security. Numerous research contributions explore how to incorporate ML techniques in
intrusion detection to obtain reliable IDS with accurate performance by enhancing data
quality and training. Several techniques are in use. Among them, the decision tree is an
induction algorithm that has been used for classification in many problems. It is based on
splitting features and testing the value of each one. The decision tree is more than an
equivalent representation of the training set.
• On the other side, data is not always obtained in a structured form. For relevant analysis,
unstructured data has to be pre-processed. This stage is performed to enhance data quality and
make accurate decisions.
• To study this problem, we use two datasets (NSL-KDD and CICIDS2017)
• Use of KDD
• Feature engineering (aims to improve data quality)

Data transformation: this step takes the collected network traffic and turns it into input for
building the decision tree model
Data normalization: a particular coding is applied to enumerate feature values and establish a
pattern of activities that makes the activities easier to distinguish. We used a function
called MinMax
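A minimal sketch of MinMax normalization, assuming it means the usual rescaling of each numeric feature to the range [0, 1]. The function name `min_max_scale` and the sample "duration" values are hypothetical and only illustrate the idea.

```python
from typing import List


def min_max_scale(column: List[float]) -> List[float]:
    """Rescale one numeric feature column to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


# Hypothetical "duration" values for five connections (not taken from the datasets).
durations = [0.0, 2.0, 5.0, 10.0, 4.0]
scaled = min_max_scale(durations)
print(scaled)
```

After scaling, the smallest value maps to 0 and the largest to 1, so features measured in very different units become comparable.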
Feature selection: a desirable process that aims to select the useful features, both to reduce the
computational cost of modelling and to improve the performance of the predictive model. It starts with
data transformation, to which this selection process is applied
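One common way to select features with entropy is to rank them by information gain. The sketch below assumes that reading. The exact entropy decision procedure is not given here, and the feature names and toy records are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature: Sequence[str], labels: Sequence[int]) -> float:
    """Entropy reduction obtained by grouping the labels by one feature."""
    total = len(labels)
    groups: Dict[str, List[int]] = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records: 1 = normal, 0 = attack.
labels = [1, 1, 0, 0, 1, 0]
features = {
    "protocol": ["tcp", "tcp", "udp", "udp", "tcp", "udp"],  # separates the classes
    "flag": ["SF", "S0", "SF", "S0", "SF", "S0"],            # weakly informative
}
gains = {name: information_gain(col, labels) for name, col in features.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
print(ranked)
```

Features at the top of `ranked` would be kept and low-gain features dropped. Where to put the cut-off is a design choice this sketch leaves open.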
• Machine learning algorithms
After completing all of these feature engineering steps, we move on to the techniques
that carry out the analysis:
Decision tree classifier: a machine learning method. Once the decision tree model is built, it
defines a zone that separates normal messages from intrusions, classifying normal data as
YES == 1 and abnormal data as NO == 0. This zone is called anomaly detection
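To make the idea concrete, here is a small decision tree sketched in plain Python. It assumes numeric features, binary threshold splits and entropy as the split criterion. It is not the model from the paper: the helper names, the depth limit and the toy data are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (gain, feature_index, threshold) of the best binary split, or None."""
    base, best = entropy(labels), None
    for j in range(len(rows[0])):
        for threshold in sorted({row[j] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[j] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[j] > threshold]
            if not left or not right:
                continue
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if best is None or gain > best[0]:
                best = (gain, j, threshold)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a tree: a leaf is a class label, an inner node is a dict."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or depth == max_depth:
        return majority
    split = best_split(rows, labels)
    if split is None or split[0] <= 0:
        return majority
    _, j, threshold = split
    left = [i for i, row in enumerate(rows) if row[j] <= threshold]
    right = [i for i, row in enumerate(rows) if row[j] > threshold]
    return {
        "feature": j,
        "threshold": threshold,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree


# Hypothetical scaled features [connection rate, error rate]; 1 = normal, 0 = attack.
X = [[0.1, 0.0], [0.2, 0.1], [0.15, 0.05], [0.9, 0.8], [0.8, 0.9], [0.95, 0.7]]
y = [1, 1, 1, 0, 0, 0]
tree = build_tree(X, y)
print(predict(tree, [0.12, 0.02]), predict(tree, [0.85, 0.75]))
```

Each inner node tests one feature against a threshold, which matches the description of splitting features and testing the value of each one.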
Confusion matrix: the usual criterion for estimating the performance of a classifier is predictive accuracy,
the proportion of a set of unseen instances that it correctly classifies. For numerical performance evaluation
of the proposed model, we used these equations:
ACC is obtained from equation 2. It is the ratio of instances that are correctly predicted as normal or attack
to the overall number of instances in the test set. DR is calculated using equation 3 and indicates the ratio of the
number of instances that are correctly classified as attack to the total number of attack instances present in
the test set. FAR is obtained from equation 4 and represents the ratio of instances that are categorized as attack
to the overall number of instances of normal behavior
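The three measures can be computed from the confusion matrix counts. The sketch below assumes the standard definitions: ACC = (TP + TN) / (TP + TN + FP + FN), DR = TP / (TP + FN) and FAR = FP / (FP + TN), with attack as the positive class. It reads FAR as the share of normal instances wrongly flagged as attack. The function names and the sample predictions are hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int], attack: int = 0) -> Dict[str, int]:
    """Count TP, TN, FP and FN, treating `attack` as the positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, guess in zip(y_true, y_pred):
        if truth == attack:
            counts["TP" if guess == attack else "FN"] += 1
        else:
            counts["FP" if guess == attack else "TN"] += 1
    return counts


def ids_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Accuracy, detection rate and false alarm rate from confusion counts."""
    acc = (c["TP"] + c["TN"]) / (c["TP"] + c["TN"] + c["FP"] + c["FN"])
    dr = c["TP"] / (c["TP"] + c["FN"])   # attacks that were caught
    far = c["FP"] / (c["FP"] + c["TN"])  # normal traffic wrongly flagged
    return {"ACC": acc, "DR": dr, "FAR": far}


# Hypothetical test set, using the convention above: 1 = normal, 0 = attack.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]
counts = confusion_counts(y_true, y_pred)
metrics = ids_metrics(counts)
print(counts, metrics)
```

A good IDS combines a high DR with a low FAR. Accuracy alone can look strong on a dataset that is mostly normal traffic.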
• Training and test set
As we know, the training set takes a larger portion of the data than the test set when we
have to process it in real time
Training and test set: we start with data transformation, applying feature selection using
entropy decision to the original traffic collected within the network to obtain a good training set. This is
a critical step aiming to improve the accuracy of our approach. It also aims to overcome training complexity by
reducing the data to analyze, and to obtain a model with the best performance in terms of accuracy, detection rate
and real-time detection
For the validation step of our model, there are various strategies for splitting the data into a training and test set.
In this case, we use the efficient and recommended one, k-fold cross validation
Splitting the data: how the data is split is very important, so we used cross validation, following its
rules: take one group as the hold-out or test data set, take the remaining groups as the training data set, fit a model on
the training set, evaluate it on the test set and record its score
To validate our proposed intrusion detection model, we use the 10-fold cross validation technique to obtain
the training and test sets. We randomly split the full dataset into ten parts of the same size. Nine parts are used for
training and the last part for testing. Finally, the performance of the model is reported by repeating this
procedure ten times
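The 10-fold procedure can be sketched in plain Python as below. The helper names `k_fold_indices` and `cross_validate`, the fixed seed and the trivial majority-class model used for the demonstration are assumptions for illustration only.

```python
import random
from collections import Counter
from typing import List, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and return k (train, test) index pairs."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k groups of near-equal size
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def cross_validate(X, y, fit, score, k=10):
    """Fit on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train, test in k_fold_indices(len(X), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        scores.append(score(model, [X[i] for i in test], [y[i] for i in test]))
    return sum(scores) / len(scores)


# Hypothetical stand-in model: always predict the majority class of the training fold.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def accuracy(model, X_test, y_test):
    return sum(1 for label in y_test if label == model) / len(y_test)


# Hypothetical data: 14 normal (1) and 6 attack (0) records.
X = [[float(i)] for i in range(20)]
y = [1] * 14 + [0] * 6
mean_accuracy = cross_validate(X, y, fit_majority, accuracy, k=10)
print(round(mean_accuracy, 3))
```

Every record is used for testing exactly once and for training nine times, so the averaged score depends less on one lucky or unlucky split.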
• Evaluation and performance
i. The performances are evaluated on two datasets: NSL-KDD and CICIDS2017.
ii. The experiments were run and evaluated on a computer with a Core-i7
2700K CPU @ 2.50 GHz and 32 GB of DDR3, running Windows 7 Professional 64-bit. The entropy
feature selection and decision tree model training are implemented using Python version 3.8.0
iii. We randomly split the full dataset into ten parts of the same size. Nine parts are used for training and the
last part for testing. Finally, the performance of the model is reported by repeating this procedure
ten times
• Deployments
