
Mouhsine Elmoudir

IDSD

Proposed Intrusion Detection Model using Decision Tree Classifier with Feature Engineering

Abstract
• One of the most common problems for devices connected to the internet is security: there is a
strong need for security and privacy to make the transfer of data safer. One solution is the
intrusion detection system (IDS), software that adds reliability by distinguishing normal
information from anomalous information.
• Keywords— Network security; IDS; ML; Decision tree; Data Quality; Entropy decision;
Feature selection

Introduction
• This is a classification problem, which belongs to supervised learning, like the detection of
normal and spam emails in Gmail. The challenge is to implement the IDS and keep it running in
real time in order to keep attacks away.

• The way to do this is to build a model that is able to detect intrusions and to keep it active.
• Recently, ML methods have been integrated to enhance intrusion detection and reinforce computer
security. Numerous research contributions explore how to incorporate ML techniques in
intrusion detection to obtain a reliable IDS with accurate performance by enhancing data
quality and training. Among these techniques, the decision tree is an induction algorithm
that has been used for classification in many domains. It is based on splitting on features
and testing the value of each one. The resulting decision tree is an equivalent
representation of the training set.
• On the other hand, data is not always obtained in a structured form. For relevant analysis,
unstructured data has to be pre-processed. This stage is performed to enhance data quality
and support accurate decisions.
• To address this, we use two datasets (NSL-KDD and CICIDS2017).

• Use of KDD

• Feature engineering (aims to improve data quality)


Data transformation  this step comes from implementing the traffic network and use it as
process to touch the building of decision tree model

Data normalization  is performed to implement a particular coding to enumerate feature values and
establish a pattern of activities facilitating the distinction between the activities , we used the a function
called MinMax
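
The source only names the function "MinMax", so the following is a minimal sketch assuming the usual min-max formula x' = (x - min) / (max - min), which rescales each feature to the range [0, 1]. The helper name `min_max_scale` and the sample values are hypothetical.

```python
def min_max_scale(column):
    """Rescale a list of numbers to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Avoid division by zero when every value is identical.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical feature values (for example, bytes sent per connection).
src_bytes = [0, 200, 1000, 500]
scaled = min_max_scale(src_bytes)
print(scaled)
```

After scaling, features with very different ranges contribute on a comparable scale.
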
Feature selection  is a desirable process aiming to select the useful features to both reduce the
computational cost of modelling and to improve the performance of the predictive model , it starts first with
data transformation by applying this process
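
The keywords mention an entropy-based decision for feature selection, but the source does not give the exact criterion. As an illustration only, the sketch below ranks categorical features by information gain (the reduction in Shannon entropy of the class label). The feature names, toy records and function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Reduction in label entropy obtained by grouping on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records: 1 = normal, 0 = anomalous.
labels = [1, 1, 0, 0]
features = {
    "protocol": ["tcp", "tcp", "udp", "udp"],  # separates the two classes
    "flag": ["SF", "S0", "SF", "S0"],          # carries no class information
}
gains = {name: information_gain(vals, labels) for name, vals in features.items()}
ranked = sorted(gains, key=gains.get, reverse=True)
print(ranked)
```

Features at the end of the ranking are candidates for removal before training.
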

• Machine learning algorithms

After all of these feature engineering steps, we move to the techniques that perform the
actual analysis:
Decision tree classifier  such as method used in machine learning to describe the part of moving
from building of decision tree model and create like a zone which can make a difference between
the message by using the intrusion and then classify the normal data to YES ==1 or anormal data
to NO==0 , so this zone called anomaly detection
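
To make the idea concrete, here is a minimal pure-Python sketch of a decision tree that splits numeric features on thresholds chosen by entropy reduction. It is not the authors' implementation. The toy rows, the feature meanings and the function names are hypothetical; the labels follow the text (1 = normal, 0 = anomalous).

```python
from collections import Counter
from math import log2


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_split(rows, labels):
    """Return (gain, feature_index, threshold) of the best binary split, or None."""
    base, best = entropy(labels), None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - weighted
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best


def build_tree(rows, labels):
    """Grow the tree until a node is pure or no split reduces entropy."""
    if len(set(labels)) == 1:
        return labels[0]  # pure leaf
    split = best_split(rows, labels)
    if split is None or split[0] <= 0:
        return Counter(labels).most_common(1)[0][0]  # majority-class leaf
    _, f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left]),
            build_tree([r for r, _ in right], [y for _, y in right]))


def predict(tree, row):
    """Follow threshold tests from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Hypothetical scaled features: [duration, failed_login_rate].
rows = [[0.1, 0.0], [0.2, 0.0], [0.9, 0.8], [0.8, 1.0], [0.15, 0.9], [0.3, 0.1]]
labels = [1, 1, 0, 0, 0, 1]
tree = build_tree(rows, labels)
print(predict(tree, [0.5, 0.95]), predict(tree, [0.5, 0.05]))
```

On this toy data a single test on the second feature is enough to separate the two classes.
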

Confusion matrix  for estimating the performances of a classifier is predictive accuracy. The proportion
of a set of unseen instances that it correctly classifies. For numerical performances evaluation of the
proposed model, we used equations :
ACC is obtained from equation 2. It is the ratio of instances that are correctly predicted as normal or attack
to the overall number of instances in test set. DR is calculated using equation 3 and indicates the ratio of the
number of instances that are correctly classified as attack to the total number of attack instances present in
test set. FAR is obtained from equation 4 and represents the ratio of instances which is categorized as attack
to the overall number of instances of normal behavior
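
Equations 2 to 4 are not reproduced in this summary. The sketch below therefore assumes the standard confusion-matrix definitions that match the descriptions above, with attack as the positive class: ACC = (TP + TN) / (TP + TN + FP + FN), DR = TP / (TP + FN), FAR = FP / (FP + TN). The function name and the sample predictions are hypothetical; the label coding follows the text (0 = anomalous, 1 = normal).

```python
def ids_metrics(y_true, y_pred, attack_label=0):
    """Compute ACC, DR and FAR, treating `attack_label` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == attack_label and p == attack_label for t, p in pairs)
    tn = sum(t != attack_label and p != attack_label for t, p in pairs)
    fp = sum(t != attack_label and p == attack_label for t, p in pairs)
    fn = sum(t == attack_label and p != attack_label for t, p in pairs)
    return {
        "ACC": (tp + tn) / len(pairs),
        "DR": tp / (tp + fn) if tp + fn else 0.0,   # share of attacks detected
        "FAR": fp / (fp + tn) if fp + tn else 0.0,  # share of normal traffic flagged
    }


# Hypothetical test set: 4 attacks (0) and 6 normal instances (1).
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 1]
metrics = ids_metrics(y_true, y_pred)
print(metrics)
```
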
• Training and test set

As we know, the “training set” takes a larger portion of the data than the “test set” when we
have to process it in real time.

Training and test set  As we see first with data transformation by applying feature selection using
entropy decision on original traffic collected within network traffic to obtain a good training set. In fact, it is
a critical step aiming to improve accuracy of our approach. It aims also to overcome training complexity by
reducing analysis data and obtain a great model with best performances in terms of accuracy, detection rate
and real-time detection
For validation step of our model, there are various strategies used to split the data into a training and test set.
In this case, we use the efficient and recommended one, k-fold

Sharing the data  the sharing is very important, so we used the cross validation by completing the
rules (take the group as a hold out or test data set, take the remaining groups as training data set, fit a model on
the training set and evaluate it on the test set and make a score to it
To validate our proposed intrusion detection model, we use the 10-fold cross validation technique to obtain
training and test set. Hence, we split randomly full dataset into ten parts with same size. Nine parts are used in
the training and the last part in the test step. Finally,
the performances of model are presented by repeating this procedure ten times
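
The 10-fold procedure described above can be sketched with the standard library alone. This is an illustrative split, not the authors' code: the function name `k_fold_indices`, the fixed seed and the sample count are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample is in the test part exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # random split, reproducible via seed
    folds = [indices[i::k] for i in range(k)]  # k parts of (near) equal size
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Hypothetical dataset of 100 records: each round trains on 90 and tests on 10.
splits = list(k_fold_indices(100, k=10))
for train_idx, test_idx in splits:
    # A model would be fitted on train_idx and scored on test_idx here.
    pass
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Averaging the ten per-fold scores gives a single performance estimate for the model.
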

• Evaluation and performance

i. Performance is evaluated on two datasets: NSL-KDD and CICIDS2017.

ii. The experiments were performed and evaluated on a computer with a Core-i7 2700K CPU @ 2.50 GHz
and 32 GB of DDR3 running Windows 7 Professional 64-bit. The entropy feature selection and
decision tree model training are implemented using Python version 3.8.0.

iii. We randomly split the full dataset into ten parts of the same size. Nine parts are used for training and
the last part for testing. Finally, the performance of the model is obtained by repeating this procedure
ten times.
• Deployments
