G82 Major Midsem

A Network Intrusion Detection System using Neural Networks
MAJOR PROJECT REPORT

(G82)
DEPARTMENT OF COMPUTER SCIENCE AND I.T.
Mentor Name: Dr. Manju
Submitted by:
Shubham Sharma (18103327)
Mayuri Sahai (18104039)
Damyanti Singh (18104032)
Introduction
In view of the increasing number of attacks targeting information systems, a defence system is
essential. Intrusion Detection attempts to detect computer attacks by examining data records
observed by processes on the same network. These attacks are typically split into two categories,
host-based attacks and network-based attacks. Host-based attack detection routines normally use
system call data from an audit process that tracks all system calls made on behalf of each user on
a particular machine. These audit processes usually run on each monitored machine. Network-
based attack detection routines typically use network traffic data from a network packet sniffer.
Many computer networks, including the widely accepted Ethernet network, use a shared medium
for communication. Therefore, the packet sniffer only needs to be on the same shared subnet as
the monitored machines.
Problem statement:
Intrusion detection is an important concept in the security of modern computer networks. Rather
than blocking network connections that are known to be malicious, as Intrusion Prevention
Systems do, intrusion detection is intended to analyze the current state of the network in real
time and to identify potential anomalies, reporting them as soon as they are detected. With the
remarkable advancement of technology, criminals have been developing ever more sophisticated
malware attacks, making intrusion detection a very difficult task. In this context, traditional
analytical tools face daunting challenges in identifying and mitigating these threats. In this
work, we present a novel intrusion detection system (IDS) driven by statistical analysis and an
autoencoder (AE). Specifically, the proposed IDS combines data analysis and statistical
techniques with recent developments in machine learning theory to extract more informative,
strongly correlated features. The proposed IDS is tested using the NSL-KDD benchmark dataset.
The purpose of this work, in particular, is to train a NIDS on labeled data so that it can detect
suspicious behavior on the network and classify individual connections as normal or anomalous.

Significance/ novelty of the problem:
Comparative test results show that the statistical analysis and AE-based IDS achieves better
classification performance compared to conventional deep and shallow machine learning and
other recently proposed strategies.
Intrusion Detection Systems are generally divided into the following categories [2]:
• Misuse detection compared to anomaly detection: In misuse detection, each event in the
data set is labeled ‘normal’ or ‘dangerous’ and the learning algorithm is trained with labeled data.
Anomaly detection methods, on the other hand, build models of normal data and detect deviations
from the normal model in the observed data.
• Network-based compared to host-based: Network intrusion detection systems (NIDS) are placed
at a strategic point or points within the network to monitor traffic to and from all network
devices, while host-based intrusion detection systems (HIDS) run on individual hosts or on
network devices.

Background Study - Literature Survey
Title of paper: Network Intrusion Detection System: A Machine Learning Approach
Authors:
• Mrutyunjaya Panda
• Ajith Abraham
• Swagatam Das
• Manas Ranjan Patra
Year of publication: 2011
Journal: IEEE
Summary: The authors report an empirical analysis showing that their proposed methodology,
which employs a variety of machine learning approaches on the NSL-KDD dataset, performs well
in all categories of majority attacks. However, owing to the substantial bias present in the
sample, a low detection rate for the U2R and R2L minority attacks is unavoidable. The approach
is notable for the low false positive rate achieved in identifying both attacks and normal
traffic. The authors also ran a cost-sensitive classification and found that the models have a
low cost of misclassification. Finally, even when the best available techniques are used to
protect resources from network attacks, no system can be guaranteed to be completely secure. As
a result, computer security remains an active and challenging research topic.
Title of paper: Detailed analysis of Intrusion detection using machine learning algorithms
Authors:
• Samriddhi Verma
• Nithyanandam P.
Year of publication: 2020
Journal: IEEE
Summary: In this research, several classification algorithms were applied to intrusion
detection, including Naive Bayes, Decision Tree, Random Forest, K-Nearest Neighbors, Logistic
Regression, and Support Vector Machine, along with voting classifiers built using the soft
voting and hard voting approaches. The models were observed and compared on test data using
several evaluation metrics, among them accuracy, precision, false positives, and false
negatives. All of the models performed well: the classifiers were capable of dealing with a
large amount of high-dimensional data and yielded satisfactory precision. The voting classifier
using the hard voting method fared better than the other models, with an accuracy of 84.233
percent and a precision of 0.83647.
Title of paper: Network intrusion detection system: A systematic study of machine learning and
deep learning approaches
Authors:
• Zeeshan Ahmed
• Adnan Shahid Khan
Year of publication: 2020
Journal: IEEE
Summary: The performance of an AI-powered NIDS depends mainly on training with an appropriate
dataset. ML models may give better results when trained with a limited dataset; however, unless
the dataset is labelled, ML is not suitable for larger datasets. Since labelling is an expensive
and time-consuming operation, DL approaches are recommended for larger datasets: these
algorithms learn and uncover important patterns from raw data. To be effective in detecting
zero-day attacks, a NIDS must be updated on a regular basis with new data obtained from network
traffic monitoring. The large datasets and the depth of deep learning algorithms make the
learning process more computationally intensive. The challenges identified include the
unavailability of a systematic dataset, low detection accuracy due to imbalanced datasets, and
the resources consumed by complex models.
Title of paper: Design and Development of an Efficient Network Intrusion Detection System Using
Machine Learning Techniques
Authors:
• Thomas Rincy N
• Roopam Gupta
Year of publication: 2021
Journal: Hindawi
Summary: In this paper, an efficient hybrid NID-Shield NIDS is proposed. CAPPER, a hybrid
feature selection method, is used to obtain accurate, high-merit feature subsets. The proposed
hybrid NID-Shield NIDS classifies the UNSW-NB15 and NSL-KDD 20% datasets according to attack
types and attack names. Some attacks, such as R2L and U2R, may have very few network
connections, whilst others, such as Probe and DoS, may have a huge number of network
connections or be a combination of any of them. Furthermore, the hybrid NID-Shield NIDS derives
individual performance metrics for the attack names identified in the NSL-KDD 20% dataset (DoS,
Probe, U2R, and R2L) and the UNSW-NB15 dataset. This strategy also aids us.
Title of paper: Network Intrusion Detection System Based On Machine Learning Algorithms
Authors:
• Vipin Das
• Vijaya Pathak
• Sattvik Sharma
• Sreevathsan
Year of publication: 2010
Journal: IEEE
Summary: The authors propose an IDS intended to provide basic detection techniques to secure
systems in networks that are connected to the Internet, either directly or indirectly.
Ultimately, it remains the network administrator's responsibility to ensure that the network is
secure. Although an IDS does not entirely protect the network from intruders, it helps the
network administrator track down attackers whose sole objective is to infiltrate the network and
make it vulnerable to attacks. To reduce the number of features from 41 to 29, the authors
devised an intrusion detection approach that combines Rough Set Theory (RST) with an SVM-based
system. They also examined the performance of RST.
Title of paper: Survey of intrusion detection systems: techniques, datasets and challenges
Authors:
• Ansam Khraisat
• Iqbal Gondal
• Peter Vamplew
• Joarder Kamruzzaman
Year of publication: 2019
Journal: Springer Open
Summary: Cybercriminals utilise sophisticated tactics and social engineering strategies to
target computer users, and some are becoming more sophisticated and motivated as time goes on.
Cybercriminals have demonstrated their capacity to conceal their identities, mask their
communications, keep their identities separate from unlawful gains, and use secure
infrastructure. As a result, advanced intrusion detection systems capable of identifying
contemporary malware are becoming increasingly vital for safeguarding computer systems. To
develop and build such IDS systems, a thorough understanding of the strengths and limitations of
current IDS research is required.
The paper includes an overview of intrusion detection system approaches, types, and
technologies, as well as their benefits and drawbacks. There are several machine learning
approaches available.
Title of paper: Intrusion Detection System (IDS): Anomaly Detection Using Outlier Detection
Approach
Authors:
• J. Jabez
• B. Muthukumar
Year of publication: 2015
Journal: ScienceDirect
Summary: Cloud computing is an internet-based technology that is employed all over the world.
This survey looked at privacy and security challenges in cloud data storage, as well as
potential privacy options for dealing with privacy concerns in cloud computing's untrusted data
stores. In this paper, encryption-based approaches and auditability systems are employed as
methodology. The correctness of data integrity in cloud storage may be efficiently confirmed in
this work.
TPA cannot learn any data content during public auditing, preserving identification while
performing numerous auditing tasks at the same time. Data owners will be able to track any
concerns with their data.
The user was recognised as a problematic user based on metadata verification and auditing. The
findings section provides detailed information about the data.
Neural Networks
Artificial neural networks are supervised machine learning algorithms inspired by
the human brain. The main idea is to have many simple units, called neurons,
organized into layers. In particular, in a feed-forward artificial neural network,
all neurons of a layer are connected to all neurons of the next layer, and so on until the
last layer, which contains the output of the neural network.
This type of network is a popular choice among data mining techniques in modern
times, and has proven to be an important option for intrusion detection.
In this work we use feed-forward neural networks trained on the NSL-KDD dataset
to classify network connections as one of two possible classes: normal or
abnormal. The goal of this work is to increase accuracy in classifying new data
samples, while also avoiding overfitting, which occurs when the algorithm is fitted too
closely to the training data and is unable to generalize to previously unseen data.
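As a rough illustration of the fully connected structure described above, the following sketch pushes a small batch through a two-layer feed-forward network using NumPy only. The hidden layer size, the ReLU and sigmoid activations, and the random weights are assumptions made for the example; the input width of 93 mirrors the number of input attributes reported later in this report, and no training is performed here.

```python
import numpy as np

def relu(x):
    # Hidden-layer activation (an assumption for this sketch).
    return np.maximum(0.0, x)

def sigmoid(x):
    # Squashes the output into (0, 1) so it can be read as P(abnormal).
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, layers):
    """Propagate a batch through fully connected layers.

    `layers` is a list of (weights, bias) pairs. Every neuron of one layer
    feeds every neuron of the next layer via the matrix product.
    """
    a = x
    for W, b in layers[:-1]:
        a = relu(a @ W + b)
    W, b = layers[-1]
    return sigmoid(a @ W + b)

rng = np.random.default_rng(0)
n_features, n_hidden = 93, 16  # n_hidden is a hypothetical choice
layers = [
    (rng.normal(scale=0.1, size=(n_features, n_hidden)), np.zeros(n_hidden)),
    (rng.normal(scale=0.1, size=(n_hidden, 1)), np.zeros(1)),
]

batch = rng.normal(size=(4, n_features))  # four synthetic connections
probs = forward(batch, layers)            # one probability per connection
```

In practice the weights would be learned from the training set, and the gap between training and validation accuracy would be monitored to detect overfitting.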
NSL-KDD data set
The dataset had 43 attributes; the attribute 'difficulty_level' was dropped.
The dataset used to train and validate the neural network is NSL-KDD,
which is an improved version of the KDD CUP ’99 dataset. This data set is a well-known
benchmark in the field of network intrusion detection, and provides 42
features per example and a large number of examples.
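A minimal sketch of the loading step, assuming the data is available as a header-less CSV file. To keep the example self-contained it reads three made-up rows from an in-memory string and uses only a hypothetical subset of the column names; the real files carry the full 43 columns.

```python
import io
import pandas as pd

# Hypothetical subset of the NSL-KDD column names, for illustration only.
columns = ["duration", "protocol_type", "service", "flag", "label", "difficulty_level"]

# Stand-in for the dataset file; the real data has many more rows and columns.
raw = io.StringIO(
    "0,tcp,ftp_data,SF,normal,20\n"
    "0,udp,other,SF,normal,15\n"
    "0,tcp,private,S0,neptune,19\n"
)

data = pd.read_csv(raw, header=None, names=columns)

# Drop the attribute that is not used as a model input or target.
data = data.drop(columns=["difficulty_level"])

# Quick sanity checks comparable to the summary table: shape and null count.
n_rows, n_cols = data.shape
n_nulls = int(data.isnull().sum().sum())
```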

Summary
train rows 125973
test rows 22544
total rows 148517
columns 42
duplicates 629
null values None
As for the features of this dataset, they can be broken down into 4 types (excluding
the target column):
4 Multi-Class
6 Binary
16 Discrete
15 Continuous
The NSL-KDD dataset contains a few binary and multi-class categorical features,
which are used to label each connection.

Multi-Class Features
Feature              Distinct Values
protocol_type        3
service              70
flag                 11
su_attempted         3

Binary Features
Feature              Percentage of '0' values
land                 99.98%
logged_in            59.72%
root_shell           99.85%
num_outbound_cmds    100.00%
is_host_login        99.99%
is_guest_login       98.77%

Data Normalization
A standard practice is to rescale data from its original range so that all values fall
within the new range of 0 to 1.
Normalization requires that you know, or are able to accurately estimate, the minimum and
maximum observable values. You can estimate these values from your available data.
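The rescaling to the range 0 to 1 can be sketched as follows. The function name `min_max_scale` is hypothetical, and the minimum and maximum are estimated from the data passed in, as the paragraph above suggests.

```python
import numpy as np

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    values = np.asarray(values, dtype=float)
    lo, hi = values.min(), values.max()
    if hi == lo:
        # A constant column carries no information; map it to all zeros.
        return np.zeros_like(values)
    return (values - lo) / (hi - lo)

scaled = min_max_scale([2.0, 4.0, 10.0])  # -> [0.0, 0.25, 1.0]
```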
Standardizing a dataset involves rescaling the distribution of values so that the
mean of the observed values is 0 and the standard deviation is 1.
This can be thought of as subtracting the mean, or centering the data.
As a general rule, standardization can be useful, and is even required for some machine
learning algorithms, when your data has input values with different scales.
Centering and scaling are the most straightforward and common data transformations for
predictor variables. To center a predictor variable, the average predictor value is
subtracted from all the values. As a result of centering, the predictor has a zero mean.
Similarly, to scale the data, each value of the predictor variable is divided by its
standard deviation. Scaling the data forces the values to have a standard deviation of one.
• The 38 numeric columns of the DataFrame are scaled using StandardScaler.
• We use the default configuration, which centers the values by subtracting the mean (setting
it to 0.0) and divides by the standard deviation to give a standard deviation of 1.0.
First, a StandardScaler instance is defined with default hyperparameters.
• Once defined, we call the function fit_transform() and pass it our dataset
to create a transformed version of our dataset.
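The standardization steps listed above can be sketched as follows, assuming scikit-learn and pandas are installed. The toy DataFrame and its values are made up for illustration; only the numeric columns are passed to the scaler.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy frame with two numeric columns and one categorical column.
df = pd.DataFrame({
    "duration": [0.0, 2.0, 4.0],
    "src_bytes": [100.0, 200.0, 600.0],
    "protocol_type": ["tcp", "udp", "tcp"],
})

# Select the numeric columns only; categorical columns are handled separately.
numeric_cols = df.select_dtypes(include="number").columns

# Default hyperparameters: subtract the mean, divide by the standard deviation.
scaler = StandardScaler()
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])

col_means = df[numeric_cols].mean()
col_stds = df[numeric_cols].std(ddof=0)  # population std, as StandardScaler uses
```

When a separate test set is used, the usual practice is to call `fit_transform` on the training data and `transform` on the test data, so that test statistics do not leak into training.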

One-hot-encoding
One-hot encoding is one way to transform data so that an algorithm can make better
predictions. With one-hot encoding, we convert each categorical value into a new
column and assign a binary value of 1 or 0 to those columns. Each categorical value
is represented as a binary vector: all values are zero, except the index of that
category, which is marked with a 1.
One-hot encoding is useful for data whose categories have no ordinal relationship to one
another. Machine learning algorithms treat the order of numbers as a significant
attribute. In other words, they will read a higher number as better or more important
than a lower number.
While this is useful in some ordinal situations, some input data does not have any
ranking of category values, and this can lead to problems with predictions and poor
performance. That is where one-hot encoding saves the day.
One-hot encoding makes our training data more useful and expressive, and it can be easily
rescaled. By using numeric values, we more easily determine a probability for our
values. In particular, one-hot encoding is used for our output values, since it provides
more nuanced predictions than single labels.
• The columns 'protocol_type', 'service', 'flag' are one-hot encoded using
pd.get_dummies().
• The DataFrame 'categorical' had 84 attributes after one-hot encoding.
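A small sketch of one-hot encoding with `pd.get_dummies()`, assuming pandas is installed. The four rows are made up; on the full dataset the same call would yield one indicator column per distinct value of each of the three categorical columns.

```python
import pandas as pd

df = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "icmp", "tcp"],
    "service": ["http", "domain_u", "ecr_i", "ftp"],
    "flag": ["SF", "SF", "SF", "REJ"],
})

# Each distinct value becomes its own 0/1 column, named <column>_<value>.
categorical = pd.get_dummies(
    df, columns=["protocol_type", "service", "flag"], dtype=int
)

# 3 protocol values + 4 service values + 2 flag values = 9 indicator columns.
n_encoded_columns = categorical.shape[1]
```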

Binary Classification
A copy of the DataFrame was created for binary classification.
The attack label ('label') is divided into two categories, 'normal' and 'abnormal'.
'label' is encoded using LabelEncoder(), and the encoded labels are saved in a separate
attribute.
Multi-class Classification
A copy of the DataFrame was created for multi-class classification.
The attack label ('label') is divided into five categories: 'normal', 'U2R', 'R2L',
'Probe', 'Dos'.
'label' is encoded using LabelEncoder(), and the encoded labels are saved in a separate
attribute.
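The two labelling schemes can be sketched as follows, assuming scikit-learn and pandas are installed. The six labels and the small `attack_class` mapping are illustrative only; the full dataset contains many more attack names, each of which would need an entry in the mapping.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

labels = pd.Series(
    ["normal", "neptune", "normal", "satan", "rootkit", "guess_passwd"]
)

# Binary scheme: every label other than 'normal' becomes 'abnormal'.
binary = labels.where(labels == "normal", "abnormal")
bin_encoder = LabelEncoder()
binary_encoded = bin_encoder.fit_transform(binary)

# Multi-class scheme: map each attack name to its category.
# Partial, illustrative mapping (hypothetical helper dictionary).
attack_class = {
    "normal": "normal",
    "neptune": "Dos",
    "satan": "Probe",
    "rootkit": "U2R",
    "guess_passwd": "R2L",
}
multi = labels.map(attack_class)
multi_encoder = LabelEncoder()
multi_encoded = multi_encoder.fit_transform(multi)

# LabelEncoder assigns integers in sorted order of the class names,
# which can be inspected through the classes_ attribute.
binary_classes = list(bin_encoder.classes_)
multi_classes = list(multi_encoder.classes_)
```

Keeping the fitted encoders around makes it possible to translate predictions back into class names with `inverse_transform`.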
Feature Extraction
In machine learning, features are individual independent variables that act as inputs to
your system. While making predictions, models use such features. With the use of
feature engineering, new features can also be derived from existing features.
Number of attributes of 'bin_data' - 45
Number of attributes of 'multi_data' - 48

The attributes of 'bin_data' and 'multi_data' are selected using the Pearson correlation
coefficient.
Attributes with a correlation coefficient greater than 0.5 with the target 'intrusion'
attribute are selected.
9 attributes were selected, including 'count', 'srv_serror_rate', 'serror_rate',
'dst_host_serror_rate', 'dst_host_srv_serror_rate', 'logged_in', 'dst_host_same_srv_rate',
'dst_host_srv_count'.
Number of attributes of 'bin_data' after feature selection and joining the 'categorical'
DataFrame - 97
Number of attributes of 'multi_data' after feature selection and joining the 'categorical'
DataFrame - 100
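Correlation-based selection with a 0.5 threshold can be sketched as follows, assuming pandas is installed. The toy values and the column name `target` are made up for the example, and taking the absolute value of the coefficient (so that strong negative correlations also qualify) is an assumption rather than something stated in this report.

```python
import pandas as pd

df = pd.DataFrame({
    "serror_rate": [0.1, 0.2, 0.0, 0.9, 1.0, 0.8],  # tracks the target closely
    "duration": [1.0, 2.0, 3.0, 1.0, 2.0, 3.0],     # unrelated to the target
    "target": [0, 0, 0, 1, 1, 1],
})

# Pearson correlation of every numeric column with the target column.
corr = df.corr(method="pearson")["target"].drop("target").abs()

# Keep only the attributes whose correlation exceeds the threshold.
threshold = 0.5
selected = corr[corr > threshold].index.tolist()
```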
Dataset Splitting
The dataset is split into test and training sets in a 1:4 ratio.
For binary classification, 93 of the 97 attributes were selected, excluding the target
attributes (encoded, one-hot-encoded, original).
The 'intrusion' attribute is selected as the target attribute.
For multi-class classification, 93 of the 100 attributes were selected, excluding the target
attributes (encoded, one-hot-encoded, original).
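A 1:4 test-to-training ratio corresponds to holding out 20% of the rows. The sketch below shows one way to do this with scikit-learn's `train_test_split`; the synthetic arrays, the fixed `random_state`, and the use of `stratify` to preserve class proportions are assumptions for the example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-ins: 100 rows of 3 input attributes and a balanced 0/1 target.
X = np.arange(100 * 3, dtype=float).reshape(100, 3)
y = np.array([0, 1] * 50)

# test_size=0.2 gives 20 test rows and 80 training rows (ratio 1:4).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
```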

Tools and Technologies:
Python: It is an interpreted, high-level, general-purpose programming language. It is
powerful, fast and open source, with rich support for libraries and
frameworks related to Deep Learning & Computer Vision.
Recurrent Neural Networks: A recurrent neural network (RNN) is a class of
artificial neural networks where connections between nodes form a directed graph
along a temporal sequence. This allows it to exhibit temporal dynamic behavior.
Keras: Keras is a high-level neural networks library written in Python, which makes
it extremely simple and intuitive to use. It works as a wrapper
around low-level libraries like TensorFlow or Theano.
Pickle: The Python pickle module is used for serializing and de-serializing Python
object structures. The process of converting any kind of Python object (list, dict, etc.)
into a byte stream (0s and 1s) is called pickling, serialization, flattening, or
marshalling.
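A minimal round trip with the standard-library pickle module is sketched below. The dictionary `model_state` is a hypothetical stand-in for whatever object needs to be saved, such as a fitted scaler or label encoder.

```python
import pickle

# Hypothetical object to persist.
model_state = {"weights": [0.1, -0.2, 0.3], "classes": ["abnormal", "normal"]}

# Pickling: convert the Python object into a byte stream.
payload = pickle.dumps(model_state)

# Unpickling: rebuild an equal object from the byte stream.
restored = pickle.loads(payload)
```

To store the bytes on disk, `pickle.dump(obj, file)` and `pickle.load(file)` do the same with a file opened in binary mode. Pickle data should only be loaded from trusted sources, since unpickling can execute arbitrary code.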
Word Embeddings: An embedding is a relatively low-dimensional space into
which you can translate high-dimensional vectors. Embeddings make it easier to do
machine learning on large inputs like sparse vectors representing words. Word
embeddings are computed by applying dimensionality reduction techniques to
datasets of co-occurrence statistics between words in a corpus of text.
References:
• Jiang, Kaiyuan, et al. "Network intrusion detection combined hybrid
sampling with deep hierarchical network." IEEE Access 8 (2020):
32464-32476.
• Jiang, Hui, et al. "Network intrusion detection based on PSO-
XGBoost model." IEEE Access 8 (2020): 58392-58401.
• Zaman, Marzia, and Chung-Horng Lung. "Evaluation of machine
learning techniques for network intrusion detection." NOMS 2018-
2018 IEEE/IFIP Network Operations and Management Symposium.
IEEE, 2018.
• Choudhury, Sumouli, and Anirban Bhowal. "Comparative analysis
of machine learning algorithms along with classifiers for network
intrusion detection." 2015 International Conference on Smart
Technologies and Management for Computing, Communication,
Controls, Energy and Materials (ICSTM). IEEE, 2015.
• Taher, Kazi Abu, Billal Mohammed Yasin Jisan, and Md Mahbubur
Rahman. "Network intrusion detection using supervised machine
learning technique with feature selection." 2019 International
conference on robotics, electrical and signal processing techniques
(ICREST). IEEE, 2019.
• Abdulhammed, Razan, et al. "Features dimensionality reduction
approaches for machine learning based network intrusion
detection." Electronics 8.3 (2019): 322.
• Chowdhury, Md Nasimuzzaman, Ken Ferens, and Mike Ferens.
"Network intrusion detection using machine learning." Proceedings
of the International Conference on Security and Management
(SAM). The Steering Committee of The World Congress in
Computer Science, Computer Engineering and Applied Computing
(WorldComp), 2016.
