You are on page 1of 20

A Network Intrusion Detection System using Neural Networks

MAJOR PROJECT REPORT


(G82)

DEPARTMENT OF COMPUTER SCIENCE AND I.T.

Mentor Name: Submitted by:


Dr. Manju Shubham Sharma
(18103327)
Mayuri Sahai
(18104039)
Damyanti Singh
(18104032)
Introduction
In view of the increasing number of attacks targeting information systems, a defence system is

essential. Intrusion Detection attempts to detect computer attacks by examining data records

observed by processes on the same network. These attacks are typically split into two categories,

host-based attacks and network-based attacks. Host based attack detection routines normally use

system call data from an audit process that tracks all system calls made on behalf of each user on

a particular machine. These audit processes usually run on each monitored machine. Network-

based attack detection routines typically use network traffic data from a network packet sniffer.

Many computer networks, including the widely accepted Ethernet network, use a shared medium

for communication. Therefore, the packet sniffer only needs to be on the same shared subnet as

the monitored machines.

Problem statement:
Accessibility Login is an important concept in the security of modern computer networks. Rather

than protecting the network from malicious computer programs that are known to block network

connections, such as Intrusion Prevention Systems, Intrusion Detection is intended to analyze the

current state of the network in real time and to identify potential confusing processes, reporting

them as soon as they are detected. With the remarkable advancement of technology, criminals

have been developing even more sophisticated malware attacks making login detection a very

difficult task. In this context, traditional analytical tools face the daunting challenges of

identifying and mitigating these threats. In this work, we present a novel analysis analysis and

intelligent intelligence acquisition system driven by autoencoder (AE) (IDS). Specifically, the

proposed IDS incorporates data analysis and statistical techniques and the latest developments in

machine learning theory to exclude highly improved, highly correlated features. The proposed

IDS is tested using the NSL-KDD benchmark website. The purpose of this work, in particular, is

the production of labeled NIDS data labels that can detect suspicious behavior on the network and

classify individual communications as normal or bizarre.


Significance/ novelty of the problem:

Comparative test results show that the statistical analysis and IDE-based IDS achieve better

classification performance compared to conventional deep and shallow machine learning and

other recent proposed strategies.

Crime Detection Systems are generally divided into the following categories [2]:

• Uncertainty Detection Compared to Acquisition: In the detection of misuse, each event in the

data set is labeled ‘normal’ or ‘dangerous’ and the learning algorithm is trained with labeled data.

Conflict detection methods, on the other hand, create normal data models and detect deviations

from the normal data model observed.

• Network-based compared to host-based: NIDS login systems are located in a strategic location

or points within the network to monitor the return traffic to all network devices, while Hosting

(HIDS) operating systems are operational. on individual hosts or on network devices.


Background Study - Literature Survey

Title of paper Network Intrusion Detection System:


A Machine Learning Approach

Authors
• Mrutyunjaya Panda
• Ajith Abraham
• Swagatam Das
• Manas Ranjan Patra
Year of publication
2011

Journal IEEE

Summary In this paper the authors propose the results of this


study's empirical analysis that show our suggested
methodology, which employs a variety of machine
learning approaches and the NSL-KDD dataset,
performs admirably in all categories of majority
attacks. However, owing of the substantial bias present
in the sample, a low detection rate for U2R and R2L
minority attacks is unavoidable. Our approach is
particularly fascinating because of the low false
positive rate we achieved in identifying both assaults
and routine situations. We also ran a cost-sensitive
classification and discovered that the models have a
low cost of misclassification. Finally, although using
the best available techniques to protect our resources
from network attacks, no system can be guaranteed to
be completely secure. As a result, computer security is
constantly a lively and hard study topic
Title of paper Detailed analysis of Intrusion detection using machine
learning algorithms

Authors
• Samriddhi Verma
• Nithyanandam P.

Year of publication
2020

Journal
IEEE

Summary In this research, seven distinct machine learning models


were used to apply eight various types of classification
algorithms, including Nave-Bayes, Decision Tree, Random
Forest, K Nearest Neighbors, Logistic Regression, and
Support Vector Machine Learning Using the Soft Voting
approach, a machine and a voting classifier are created. For
the intrusion detection system, a hard voting approach is
used. The All of these machine learning models performed
well. Observed and compared using various standards
Accuracy, Precision, False-Positive, and False-Negative are
some of the test data evaluation factors. These are the
classifiers were capable of dealing with such a large number
of high-dimensional data which yields satisfactory
precision. It was discovered that the Hard-voting method is
used in the Voting Classifier algorithm fared better than
other models, with a precision of With an accuracy of 84.233
percent and a precision of 0.83647, The total number of
False-Negatives and False-Positives.
Tittle of paper Network intrusion detection system: A systematic study of machine
learning and deep learning approaches

Authors
• Zeeshan Ahmed
• Adnan Shahid Khan

Year of publication
2020

Journal IEEE

Summary
In this research the AI-powered NIDS' performance is mainly
dependent on training with an appropriate dataset. To improve the
results of ML models, the algorithm might be trained with a limited
dataset. However, unless the dataset is labelled in nature, ML is not
suitable for larger datasets. For larger datasets, DL approaches are
recommended since labelling is an expensive and time-consuming
operation. From raw datasets, these algorithms will learn and uncover
important patterns. To make NIDS effective in detecting zero-day
attacks, it must be updated on a regular basis with new data received
from network traffic monitoring. The vast dataset and deep nature of
the deep learning algorithms will make the learning process more
computationally intensive. There are also some challenges which
they faced like unavailability of systematic dataset, Low detection
accuracy due to imbalance dataset, Resources consumed by complex
models.
Title of paper

Design and Development of an Efficient Network Intrusion

Detection System Using Machine Learning Techniques

Authors • Thomas Rincy N


• Roopam Gupta

Year of
publication 2021

Journal
Hindawi

Summary In this paper, an efficient hybrid NID-Shield NIDS is proposed.


Furthermore, for precise and highly merit feature subsets, CAPPER,
a successful hybrid feature selection method, is used. According to
attack kinds and names, the proposed hybrid NID-Shield NIDS
classifies the UNSW-NB15 and NSL-KDD 20% datasets. Some
attacks, such as R2L and U2R, may have very few N/W connections,
whilst others, such as Probe and DoS, may have a huge number of
N/W connections or be a combination of any of them. Furthermore,
the hybrid NID-Shield NIDS derives individual performance metrics
for attack names identified in the NSL-KDD 20% dataset (DoS,
Probe, U2R, and R2L) and the UNSW-NB15 dataset. This strategy
also aids us.
Title of the Network Intrusion Detection System Based On Machine Learning
Algorithms
paper

Authors
Vipin Das
Vijaya Pathak
Sattvik Sharma

Sreevathsan

Year of
publication 2010

Journal IEEE

Summary In this paper the authors propose that the IDS is intended to provide
basic detection techniques in order to secure systems in networks that
are connected to the Internet either directly or indirectly. But, at the
end of the day, it is the Network Administrator's responsibility to
ensure that his network is secure. Although IDS does not entirely
protect the network from intruders, it does assist the Network
Administrator in tracking down bad guys on the Internet whose sole
objective is to infiltrate your network and make it vulnerable to
attacks. To reduce the number of features from 41 to 29, we devised
an intrusion detection approach based on an SVM-based system on a
RST. We also looked into RST's performance.
Title of the paper Survey of intrusion detection systems: techniques, datasets and
challenges

Authors • Ansam Karisat


• Iqbal Gondail
• Peter Vamplew
• Joarder Kamruzzman
Year of publication 2019

Journal Springer Open

Summary In this research paper Cybercriminals utilise sophisticated


tactics and social engineering strategies to target computer
users. Some cybercriminals are becoming more sophisticated
and motivated as time goes on. Cybercriminals have
demonstrated their capacity to conceal their identities, mask
their communications, keep their identities separate from
unlawful gains, and use secure infrastructure. As a result,
advanced intrusion detection systems capable of identifying
contemporary malware are becoming increasingly vital for
computer systems to be safeguarded. To develop and build such
IDS systems, a thorough understanding of the strengths and
limitations of current IDS research is required.

We have included an overview of intrusion detection system


approaches, types, and technologies, as well as their benefits and
drawbacks, in this work. There are several machine learning
approaches available.
Title of the Paper Intrusion Detection System (IDS): Anomaly Detection Using
Outlier Detection Approach

Authors • J. Jabez
• B. Muthukumar

Year of publication 2015

Journal ScienceDirect

Summary Cloud computing is an internet-based technology that is


employed all over the world. This survey looked at privacy and
security challenges in cloud data storage, as well as potential
privacy options for dealing with privacy concerns in cloud
computing's untrusted data stores. In this paper, encryption-
based approaches and auditability systems are employed as
methodology. The correctness of data integrity in cloud storage
may be efficiently confirmed in this work.
TPA cannot learn any data content during public auditing,
preserving identification while performing numerous auditing
tasks at the same time. Data owners will be able to track any
concerns with their data.
The user was recognised as a problematic user based on meta
data verification and auditing. The findings section provides
detailed information about the data.
Neuroscent Networks
Artificial Neural Networks supervised machine learning algorithms are inspired by

the human brain. The main idea is to have many simple units, called neurons,

organized into categories. In particular, in the artificial neural network feed-forward

all layer neurons are connected to all neurons of the next layer, and so on until the

last layer, which contains the output of the neural network.

This type of network is a popular choice among Data Mining Strategies in modern

times, and has proven to be an important Import and Export option.

In this work we use feed-forward neural networks trained in the NSL-KDD database

to classify network connectivity as one of two possible possibilities: normal or

abnormal. The goal of this work is to increase accuracy in identifying new data

samples, while also avoiding overuse, which occurs when the algorithm is heavily

attached to the read data and is unable to perform normally on previously unseen data.

NSL-KDD data set

Dataset had 43 attributes, attribute 'difficulty_level' was dropped.

The database used to train and validate the neural network is the NSL-KDD database,

which is an improved version of the KDD CUP ’99 database. This data set is a well-

known symbol in the field of Network Acquisition strategies, which provides 42

features per example and many amazing examples.


Summary

train rows 125973

test rows 22544

total rows 148517

columns 42

duplicates 629

null values None

As for the features of this dataset, they can be broken down into 4 types (excluding

the target column):

4 Multi-Class

6 Binary

16 Discrete

15 Continuous

The NSL-KDD dataset contains a few binary and multi-class categorical features,

which are used to label each connection


Multi-Class Features

Feature Distinct Values

protocol type 3

service 70

flag 11

su attempted 3

Binary Features

Feature Number of ’0’s

land 99.98%

logged in 59.72%

root shell 99.85%

num outbound cmds 100.00%

is host login 99.99%

is guest login 98.77%


Data Normalization

A standard practice is to re-measure data from the actual range so that all values are

within the new range of 0 and 1.

Familiarity requires that you know or be able to accurately measure minimum and

maximum visible values. You can estimate these values in your available data.

Database measurement involves re-measuring the distribution of values so that the

mean value of the target is 0 and the standard deviation is 1.

This can be thought of as subtracting average or placing data in the center.

As a general rule, configuration can be useful, and is required for some machine

learning algorithms where your data has input values with different scales.

Direct and standard data conversions to stop predictive variables. To stop the forecast

variance, the forecast value is deducted from all values. As a result of placement, the

prediction has a zero meaning. Similarly, to measure data, each value of the prediction

variance is divided by its standard deviation. Data measurement forces values to have

a standard deviation of one.

• 38 DataFrame Number Columns are measured using Standard Scale.

• We will use the default configuration and scale values to remove the definition to

set it to 0.0 and divide by the standard deviation to give a standard deviation of 1.0.

First, the StandardScaler model is defined by default hyperparameters.

• Once defined, we can call the function fit_transform () and transfer it to our database

to create a modified version of our database.


One-hot-encoding

One hot code coding is one way to convert data to optimize the algorithm and get a

better prediction. With one heat, we convert each category value into a new category

column and assign a binary value of 1 or 0 to those columns. The value of each

number is represented as a binary vector. All values are zero, and the index is marked

One hot code is useful for unrelated data to another. Machine learning algorithms

treat numerical order as a value attribute. In other words, they will learn a higher

number as better or more important than a low number.

While this is useful in some ordinal situations, some input data does not have a

category value level, and this can lead to problems with guessing and poor

performance. That’s where one hot code copy saves the day.

One hot code coding makes our training data very useful and clear, and can be easily

modified. By using numerical values, we easily determine the probability of our

values. In particular, a single hot code text is used for our output values, as it provides

predictions with a very different perspective than a single label.

• The columns 'protocol_type', 'service', 'flag' have one hot code using

pd.get_dummies ().

• Dataframe 'category' had 84 attributes after one hot coding.


Binary separation

A copy of DataFrame was created for Binary Partition.

Attack label ('label') is divided into two 'normal' and 'unusual' categories.

'label' is encoded using LabelEncoder (), encoded labels are saved 'in'.

'label' is coded.

Divide into several categories

A copy of DataFrame was created for Multi-Category.

Attack label (label 'label') is divided into five 'normal' categories, 'U2R', 'R2L',

'Probe', 'Dos'.

'label' is encoded using LabelEncoder (), encoded labels are saved 'in'. 'label encoded'.

Feature Domain

In machine learning, features are unique independent features that work as part of

your system. In fact, while making predictions, models use such features to make

predictions. And with the use of feature engineering, new features can be found in

older machine learning features.

Number of features of 'bin_data' - 45

Number of 'big_data' features - 48


The 'bin_data' and 'many_data' attributes are selected using the 'Pearson Correlation

Coefficient'.

Characters with an interaction coefficient greater than 0.5 and the target 'entry'

attribute are selected.

9 adjectives 'count', 'srv_serror_rate', 'serror_rate', 'dst_host_serror_rate',

'dst_host_srv_serror_rate', 'logged_in', 'dst_host_same_srv_rate',

'dst_host_srv_count'.

Number of 'bin_data' attributes after selecting a feature and joining 'Categorical'

DataFrame - 97

Number of 'data_more' adjectives after selecting a feature and joining 'Categorical'

DataFrame - 100

Database division

Divide the database into a 1: 4 Test and Training Rate.

93 out of 97 attributes were selected, to exclude targeted (coded, single-coded, real)

Dividend attribute.

The 'login' attribute is selected as the target attribute.

93 attributes were selected out of 100 attributes, so as not to include the targeted

(coded, hot-coded, real) attribute of Multi-Category.


Tools and Technologies:

Python: It is an interpreted, high-level, general programming language. It is


powerful, fast and open source. With a rich support for tons of libraries and
frameworks related to Deep Learning & Computer Vision.

Recurrence Neural Networks: A recurrent neural network (RNN) is a class of


artificial neural networks where connections between nodes form a directed graph
along a temporal sequence. This allows it to exhibit temporal dynamic behavior.

Keras: Keras is a neural networks library written in Python that is high-level in


nature – which makes it extremely simple and intuitive to use. It works as a wrapper
to low-level libraries like TensorFlow or Theano high-level neural networks library,
written in Python that works as a wrapper to TensorFlow or Theano.

Pickle: Python pickle module is used for serializing and de-serializing python
object structures. The process to converts any kind of python objects (list, dict, etc.)
into byte streams (0s and 1s) is called pickling or serialization or flattening or
marshalling.

Word Embeddings: An embedding is a relatively low-dimensional space into


which you can translate high-dimensional vectors. Embeddings make it easier to do
machine learning on large inputs like sparse vectors representing words. Word
embeddings are computed by applying dimensionality reduction techniques to
datasets of co-occurrence statistics between words in a corpus of text.
References:

• Jiang, Kaiyuan, et al. "Network intrusion detection combined hybrid


sampling with deep hierarchical network." IEEE Access 8 (2020):
32464-32476.
• Jiang, Hui, et al. "Network intrusion detection based on PSO-
XGBoost model." IEEE Access 8 (2020): 58392-58401.
• Zaman, Marzia, and Chung-Horng Lung. "Evaluation of machine
learning techniques for network intrusion detection." NOMS 2018-
2018 IEEE/IFIP Network Operations and Management Symposium.
IEEE, 2018.
• Choudhury, Sumouli, and Anirban Bhowal. "Comparative analysis
of machine learning algorithms along with classifiers for network
intrusion detection." 2015 International Conference on Smart
Technologies and Management for Computing, Communication,
Controls, Energy and Materials (ICSTM). IEEE, 2015.
• Taher, Kazi Abu, Billal Mohammed Yasin Jisan, and Md Mahbubur
Rahman. "Network intrusion detection using supervised machine
learning technique with feature selection." 2019 International
conference on robotics, electrical and signal processing techniques
(ICREST). IEEE, 2019.
• Abdulhammed, Razan, et al. "Features dimensionality reduction
approaches for machine learning based network intrusion
detection." Electronics 8.3 (2019): 322.
• Chowdhury, Md Nasimuzzaman, Ken Ferens, and Mike Ferens.
"Network intrusion detection using machine learning." Proceedings
of the International Conference on Security and Management
(SAM). The Steering Committee of The World Congress in
Computer Science, Computer Engineering and Applied Computing
(WorldComp), 2016.

You might also like