
Information Analysis and Audit

(CSE-3501)
Submitted to: Sir B. Chandra Mohan

Name – Rajvansh Singh Chhabra


Reg no 20BCE2689
Topic: Intrusion Detection System
Abstract:
Cyber-attacks are becoming more and more sophisticated and therefore present greater challenges in accurately detecting intrusions. Failure to prevent intrusions could degrade the credibility of security services, for example data confidentiality, integrity and availability. Many intrusion detection methods have been proposed in the literature to tackle computer security threats; they can be broadly classified into Signature-based Intrusion Detection Systems (SIDS) and Anomaly-based Intrusion Detection Systems (AIDS). This overview presents a taxonomy of current IDS, a comprehensive overview of notable recent work and an overview of the datasets commonly used for evaluation purposes. It also presents evasion techniques used by attackers to avoid detection and discusses future research challenges in countering such techniques to make computer systems more secure.

Introduction:
The development of malicious software (malware) poses a fundamental challenge for the design of intrusion detection systems (IDS). Malicious attacks have become more sophisticated, and the main challenge is to identify unknown and obfuscated malware, because malware authors use various evasion techniques to hide information and avoid IDS detection. In addition, there has been an increase in security threats such as zero-day attacks targeting Internet users. Computer security has therefore become essential as information technology has become a part of our daily lives. Various countries, such as Australia and the USA, have been significantly affected by zero-day attacks.

High-profile cases of cybercrime have demonstrated how easily cyber threats can spread internationally, since a single compromise can disrupt a business's essential services or facilities. A large number of cybercriminals around the world are motivated to steal information, obtain illegitimate income and seek new targets. Malware is deliberately created to compromise computer systems and exploit any weaknesses in intrusion detection systems. In 2017, the Australian Cyber Security Centre (ACSC) critically examined the level of sophistication used by attackers (Australian, 2017). There is therefore a need to develop an effective IDS that can detect new, sophisticated malware. The goal of an IDS is to identify different types of malware as soon as possible, which cannot be achieved by a traditional firewall. With the growing volume of computer malware, improved IDS have become extremely important.

Intrusion Detection System


An intrusion can be defined as any type of unauthorized activity that causes damage to an information system. This means that any attack that could pose a possible threat to the confidentiality, integrity or availability of information is considered an intrusion. For example, activities that would make computer services unresponsive to legitimate users are considered an intrusion. An IDS is a software or hardware system that identifies malicious actions on computer systems so that system security can be maintained. The goal of an IDS is to identify different types of malicious network traffic and computer usage that cannot be identified by a traditional firewall. This is necessary to achieve high protection against actions that threaten the availability, integrity, or confidentiality of computer systems. IDS can generally be divided into two groups: Signature-based Intrusion Detection Systems (SIDS) and Anomaly-based Intrusion Detection Systems (AIDS).

Types:
Signature-based intrusion detection systems (SIDS)
Signature intrusion detection systems (SIDS) are based on pattern matching techniques to find a known attack. They are also known as Knowledge-based Detection or Misuse Detection. In SIDS, matching methods are used to find a previous intrusion. In other words, an alarm signal is triggered when an intrusion signature matches the signature of a previous intrusion that already exists in the signature database. For host-based SIDS, the host's logs are checked to find sequences of commands or actions that were previously identified as malware.

The main idea is to build a database of intrusion signatures and to compare the
current set of activities against the existing signatures and raise an alarm if a
match is found.
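The database-and-compare idea can be sketched in a few lines. This is a minimal illustration, not a real SIDS engine. The signature IDs, the regular expressions and the `scan` helper are all hypothetical. Each activity (here, one log line) is compared against every stored signature, and an alarm is recorded when any of them matches.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """A known-attack signature: an identifier plus a regular expression."""
    sig_id: str
    pattern: re.Pattern


# Hypothetical signature database; real SIDS rule sets are far larger and richer.
SIGNATURE_DB = [
    Signature("SQLI-01", re.compile(r"(?i)union\s+select")),
    Signature("TRAVERSAL-01", re.compile(r"\.\./\.\./")),
    Signature("SHELL-01", re.compile(r"/bin/sh")),
]


def match_signatures(event, db=SIGNATURE_DB):
    """Return the IDs of every stored signature that matches the event text."""
    return [sig.sig_id for sig in db if sig.pattern.search(event)]


def scan(events, db=SIGNATURE_DB):
    """Compare each activity against the database; collect (index, matches) alarms."""
    alarms = []
    for i, event in enumerate(events):
        hits = match_signatures(event, db)
        if hits:
            alarms.append((i, hits))
    return alarms


if __name__ == "__main__":
    log = [
        "GET /index.html",
        "GET /search?q=1 UNION SELECT password FROM users",
        "GET /../../etc/passwd",
    ]
    for index, hits in scan(log):
        print(f"ALARM on event {index}: {hits}")
```

The sketch also shows the weakness discussed next. An attack whose text matches none of the stored patterns raises no alarm at all.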

SIDS typically provide excellent detection accuracy for previously known intrusions. However, SIDS have difficulty detecting zero-day attacks, because no corresponding signature exists in the database until a signature for the new attack has been extracted and stored.

The increasing rate of zero-day attacks has made SIDS techniques progressively
less effective because there is no prior signature for any such attacks. Polymorphic
variants of malware and the increasing number of targeted attacks may further
undermine the adequacy of this traditional paradigm. A potential solution to this problem would be the use of AIDS (anomaly-based) techniques, which work by profiling what is acceptable behavior rather than what is anomalous.

Anomaly-based intrusion detection system (AIDS)


AIDS has attracted the interest of many scholars because of its ability to overcome the limitations of SIDS. In AIDS, a model of normal computer system behavior is created using machine learning, statistical or knowledge-based methods. Any significant deviation between the observed behavior and the model is regarded as an anomaly, which can be interpreted as an intrusion. The assumption behind this group of techniques is that malicious behavior differs from typical user behavior, so user behavior that does not resemble standard behavior is classified as an intrusion. The development of AIDS involves two phases: the training phase and the testing phase. In the training phase, the normal traffic profile is used to learn a model of normal behavior. In the testing phase, a new dataset is used to establish the system's ability to generalize to previously unseen intrusions. AIDS can be classified into a number of categories according to the training method, for example statistics-based, knowledge-based and machine learning-based.
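To make the two phases concrete, here is a minimal sketch that assumes the simplest possible normal profile: the per-feature mean and standard deviation. The feature names (packets per second, bytes per packet), the threshold of 3 standard deviations and the helper names are illustrative assumptions, not part of any specific AIDS.

```python
from statistics import mean, pstdev


def train_profile(normal_samples):
    """Training phase: learn per-feature (mean, std) from normal traffic only."""
    columns = list(zip(*normal_samples))
    return [(mean(col), pstdev(col)) for col in columns]


def anomaly_score(profile, sample):
    """Largest absolute z-score of the sample across all features."""
    score = 0.0
    for (mu, sigma), value in zip(profile, sample):
        # A constant feature (zero variance) is maximally suspicious if it changes.
        if sigma > 0:
            z = abs(value - mu) / sigma
        else:
            z = 0.0 if value == mu else float("inf")
        score = max(score, z)
    return score


def detect(profile, samples, threshold=3.0):
    """Testing phase: flag samples that deviate significantly from the profile."""
    return [anomaly_score(profile, s) > threshold for s in samples]


if __name__ == "__main__":
    # Hypothetical features: (packets per second, bytes per packet).
    normal = [(10, 500), (12, 520), (11, 480), (9, 510), (10, 490)]
    profile = train_profile(normal)
    print(detect(profile, [(11, 505), (95, 1400)]))  # unseen burst is flagged
```

No attack data is needed for training, which is why the second, unseen sample can be flagged without any signature.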

The main advantage of AIDS is the ability to identify zero-day attacks, because recognizing abnormal user activity does not rely on a signature database. AIDS triggers an alarm when the examined behavior differs from the usual behavior. AIDS also has further benefits. First, it can detect malicious insider activities: if an intruder starts making transactions in a stolen account that do not match the typical user activity, an alarm is generated. Second, because the system is built from customized profiles, it is very difficult for a cybercriminal to learn what normal user behavior looks like without triggering an alert.

Detection Methodologies:
• Statistics-based: analyzes the network traffic using complex statistical algorithms to process the information.
• Pattern-based: identifies the characters, forms, and patterns in the data.
• Rule-based: uses an attack "signature" to detect a potential attack in suspicious network traffic.
• State-based: examines a stream of events to identify any possible attack.
• Heuristic-based: identifies any activity that is out of the ordinary.

Statistics-based techniques - A statistics-based IDS creates a distribution model for the normal behavior profile, then detects low-probability events and flags them as potential intrusions. Statistical AIDS essentially takes into account statistical metrics such as the median, mean, mode and standard deviation of packets. In other words, rather than inspecting the data traffic as a whole, each packet is monitored, which produces a fingerprint of the flow. Statistical AIDS is used to identify any type of difference between current behavior and normal behavior.
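The "distribution model, then low-probability events" idea can be sketched with an empirical frequency table. This is a minimal illustration under assumed inputs. Destination ports stand in for the observed event type, and the cut-off `min_prob = 0.05` is arbitrary. Real statistical AIDS use richer models over many features.

```python
from collections import Counter


def build_distribution(normal_events):
    """Estimate the probability of each event type from normal traffic (the profile)."""
    counts = Counter(normal_events)
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}


def flag_low_probability(distribution, observed, min_prob=0.05):
    """Flag observed events whose probability under the normal model is below min_prob."""
    # Events never seen while profiling get probability 0 and are always flagged.
    return [e for e in observed if distribution.get(e, 0.0) < min_prob]


if __name__ == "__main__":
    # Hypothetical profile: destination ports seen in 100 normal connections.
    dist = build_distribution([80] * 60 + [443] * 38 + [22] * 2)
    print(flag_low_probability(dist, [80, 22, 443, 6667]))
```

Here port 22 is flagged because it is rare in the normal profile, and port 6667 because it never appeared. This also illustrates why rare but legitimate activity can cause false positives.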

Knowledge-based techniques - This group of techniques is also referred to as the expert system method. This approach requires the creation of a knowledge base that reflects the legitimate traffic profile. Actions that deviate from this standard profile are considered intrusions. Unlike other classes of AIDS, the standard profile model is usually created from human knowledge, in the form of a set of rules that try to define the normal operation of the system. The main benefit of knowledge-based techniques is the ability to reduce false positives, since the system has knowledge of all normal behaviors. However, in a dynamically changing computing environment, this kind of IDS needs regular knowledge updates for expected normal behaviors. This is a time-consuming task, because gathering information about all normal behaviors is very difficult.
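A knowledge-based profile can be pictured as a list of expert-written rules that describe normal operation. Any event violating a rule is treated as an intrusion. The `Event` fields, the rule names and the thresholds below are hypothetical, chosen only to show the mechanism. The need to keep such rules up to date is exactly the maintenance burden mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Event:
    user: str
    action: str
    hour: int           # local hour of day, 0-23
    failed_logins: int  # recent failed logins for this user


# Hypothetical expert-written rules describing *normal* operation of the system.
NORMAL_RULES: List[Tuple[str, Callable[[Event], bool]]] = [
    ("working hours", lambda e: 8 <= e.hour < 18),
    ("permitted action", lambda e: e.action in {"read", "write", "login"}),
    ("few failed logins", lambda e: e.failed_logins < 3),
]


def violations(event, rules=NORMAL_RULES):
    """Names of the normal-behavior rules that this event breaks."""
    return [name for name, rule in rules if not rule(event)]


def is_intrusion(event, rules=NORMAL_RULES):
    """Any deviation from the standard profile is considered an intrusion."""
    return bool(violations(event, rules))


if __name__ == "__main__":
    print(violations(Event("alice", "read", 10, 0)))   # no rule broken
    print(violations(Event("bob", "delete", 2, 5)))    # several rules broken
```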

Machine Learning techniques - Machine learning is the process of extracting knowledge from large amounts of data. Machine learning models consist of a set of rules, methods, or complex "transfer functions" that can be used to find interesting data patterns, or to recognize or predict behavior. Machine learning techniques have been widely used in the field of AIDS. Several algorithms and techniques, such as clustering, neural networks, association rules, decision trees, genetic algorithms and nearest-neighbor methods, have been used to discover knowledge from intrusion datasets.

Various AIDS have been built on machine learning techniques. The goal of using machine learning is to create an IDS with improved accuracy and less need for human expertise. In the last few years, the number of AIDS that use machine learning methods has been increasing.

The key focus of machine learning-based IDS research is to detect patterns and build an intrusion detection system from a dataset. Generally, there are two kinds of machine learning methods: supervised and unsupervised.

Supervised
Decision Tree:

A decision tree consists of three basic components. The first component is the decision node, which identifies the test attribute. The second is the branch: each branch represents a possible decision based on the value of the test attribute. The third is the leaf, which holds the class to which the instance belongs. There are many different decision tree algorithms, including ID3 (Quinlan, 1986), C4.5 (Quinlan, 2014) and CART (Breiman, 1996).
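The three components map directly onto a small data structure. The sketch below builds a tree by hand instead of learning one; ID3, C4.5 and CART are the algorithms that would learn such a tree from data. The attributes (`protocol`, `rate`, `flag`), their values and the fallback class `"normal"` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf: holds the class to which an instance belongs."""
    label: str


@dataclass
class DecisionNode:
    """Decision node: the attribute to test; each branch maps one value to a subtree."""
    attribute: str
    branches: Dict[str, Union["DecisionNode", Leaf]] = field(default_factory=dict)
    default: str = "normal"  # assumed class when no branch matches the value


def classify(node, instance):
    """Follow the branch chosen by each test attribute until a leaf is reached."""
    while isinstance(node, DecisionNode):
        value = instance.get(node.attribute)
        if value not in node.branches:
            return node.default
        node = node.branches[value]
    return node.label


# Hypothetical hand-built tree over illustrative connection attributes.
EXAMPLE_TREE = DecisionNode("protocol", {
    "icmp": DecisionNode("rate", {"high": Leaf("dos"), "low": Leaf("normal")}),
    "tcp": DecisionNode("flag", {"SYN_only": Leaf("probe"), "established": Leaf("normal")}),
})

if __name__ == "__main__":
    print(classify(EXAMPLE_TREE, {"protocol": "icmp", "rate": "high"}))
```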

Naive Bayes:

This approach is based on applying Bayes' theorem with strong independence assumptions between attributes. Naive Bayes answers questions such as "what is the probability that a particular kind of attack is occurring, given the observed system activities?" by applying conditional probability formulas. Naive Bayes relies on features that have different probabilities of occurrence in attacks and in normal behavior. The Naive Bayes classification model is one of the most prevalent models in IDS because of its ease of use and computational efficiency.
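The conditional-probability reasoning can be sketched as a categorical Naive Bayes classifier. The score for each class is log P(class) plus the sum of log P(feature value | class), which relies on the independence assumption. Laplace (add-one) smoothing is an added assumption so that unseen values do not zero out a class. The toy records use KDD-style field values (protocol, service, flag) but are invented for illustration.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes over categorical features with Laplace (add-one) smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        # feature_counts[label][j][value]: how often feature j took `value` in class `label`
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values observed for each feature
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.feature_counts[label][j][v] += 1
                self.values[j].add(v)
        return self

    def log_posterior(self, row, label):
        """log P(label) + sum_j log P(x_j | label), up to a shared constant."""
        lp = math.log(self.class_counts[label] / self.n)
        for j, v in enumerate(row):
            count = self.feature_counts[label][j][v]
            lp += math.log((count + 1) / (self.class_counts[label] + len(self.values[j])))
        return lp

    def predict(self, row):
        """Return the class with the highest posterior probability."""
        return max(self.class_counts, key=lambda c: self.log_posterior(row, c))


if __name__ == "__main__":
    X = [("tcp", "http", "SF"), ("tcp", "http", "SF"), ("udp", "dns", "SF"),
         ("tcp", "private", "S0"), ("tcp", "private", "S0"), ("icmp", "ecr_i", "SF")]
    y = ["normal", "normal", "normal", "attack", "attack", "attack"]
    nb = CategoricalNaiveBayes().fit(X, y)
    print(nb.predict(("tcp", "private", "S0")))
```

Training is a single counting pass over the data. This is one concrete reason for the computational efficiency noted above.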

Genetic Algorithms:

Genetic algorithms are a heuristic approach to optimization based on the principles of evolution. Each possible solution is represented as a series of bits (genes), called a chromosome, and the quality of the solutions improves over time through selection and reproduction operators that are biased towards fitter solutions. When a genetic algorithm is applied to an intrusion classification problem, there are usually two types of chromosome encoding. One uses clustering to generate a binary chromosome encoding; the other specifies the cluster centers (the clustering prototype matrix) using an integer-coded chromosome.
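The core loop of bit-string chromosomes, fitness-biased selection, crossover and mutation can be sketched as follows. The fitness function here is a toy assumption ("OneMax", the number of 1-bits). In an IDS it would instead score something like the detection accuracy of the rule set or feature subset that the chromosome encodes. Tournament selection, single-point crossover and elitism are common choices but only one of many possible designs.

```python
import random


def fitness(chromosome):
    # Toy objective (assumption): count of 1-bits. Replace with an IDS-specific score.
    return sum(chromosome)


def tournament(population, rng, k=3):
    """Selection biased toward fitter solutions: best of k randomly drawn individuals."""
    return max(rng.sample(population, k), key=fitness)


def crossover(a, b, rng):
    """Single-point crossover producing one child."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(chromosome, rng, rate=0.01):
    """Flip each bit (gene) independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]


def evolve(n_bits=32, pop_size=30, generations=60, seed=0):
    """Return the best chromosome and the best fitness recorded per generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(map(fitness, population))]
    for _ in range(generations):
        # Elitism: keep the current best unchanged so the best fitness never decreases.
        children = [max(population, key=fitness)]
        while len(children) < pop_size:
            parent_a = tournament(population, rng)
            parent_b = tournament(population, rng)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = children
        history.append(max(map(fitness, population)))
    return max(population, key=fitness), history


if __name__ == "__main__":
    best, history = evolve()
    print(history[0], "->", history[-1])
```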

Artificial Neural Networks:

ANN is one of the most widespread machine learning methods and has been shown to be successful in detecting a variety of malware. The most frequently used training technique for supervised learning is the backpropagation (BP) algorithm. The BP algorithm computes the gradient of the network's error with respect to its modifiable weights. However, the detection accuracy of ANN-based IDS, especially for less frequent attacks, still needs to be improved. The training data for less frequent attacks is small compared to that for more frequent attacks, which makes it difficult for an ANN to properly learn the properties of these attacks. As a result, detection accuracy is lower for less frequent attacks. In the field of information security, enormous damage can occur if low-frequency attacks are not detected. For example, if User to Root (U2R) attacks evade detection, a cybercriminal can gain root user privileges and thereby perform malicious activities on the victim's computer systems.
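To show what "the error gradient with respect to the modifiable weights" means in practice, here is a deliberately tiny network trained with backpropagation in plain Python. Its assumptions are a single hidden layer of sigmoid units, a sigmoid output interpreted as P(attack), binary cross-entropy loss and full-batch gradient descent. The two input features and the six training points are hypothetical and already normalized to [0, 1]. Production IDS would use a proper framework and far more data.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer of sigmoid units and a sigmoid output, trained by backpropagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        """Return hidden activations and the output probability of 'attack'."""
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def loss(self, X, T):
        """Mean binary cross-entropy over the dataset."""
        eps = 1e-12
        total = 0.0
        for x, t in zip(X, T):
            _, y = self.forward(x)
            total -= t * math.log(y + eps) + (1 - t) * math.log(1 - y + eps)
        return total / len(X)

    def train(self, X, T, lr=0.5, epochs=2000):
        n = len(X)
        for _ in range(epochs):
            g_w1 = [[0.0] * len(row) for row in self.w1]
            g_b1 = [0.0] * len(self.b1)
            g_w2 = [0.0] * len(self.w2)
            g_b2 = 0.0
            for x, t in zip(X, T):
                h, y = self.forward(x)
                d_out = y - t  # error gradient at the output pre-activation
                g_b2 += d_out
                for j, hj in enumerate(h):
                    g_w2[j] += d_out * hj
                    # Backpropagate through w2[j] and the hidden sigmoid's derivative.
                    d_h = d_out * self.w2[j] * hj * (1 - hj)
                    g_b1[j] += d_h
                    for i, xi in enumerate(x):
                        g_w1[j][i] += d_h * xi
            # Gradient-descent step on every modifiable weight (batch-averaged).
            for j in range(len(self.w2)):
                self.w2[j] -= lr * g_w2[j] / n
                self.b1[j] -= lr * g_b1[j] / n
                for i in range(len(self.w1[j])):
                    self.w1[j][i] -= lr * g_w1[j][i] / n
            self.b2 -= lr * g_b2 / n


if __name__ == "__main__":
    X = [(0.1, 0.1), (0.2, 0.1), (0.1, 0.3), (0.9, 0.8), (0.8, 0.9), (0.7, 0.9)]
    T = [0, 0, 0, 1, 1, 1]
    net = TinyMLP(n_in=2, n_hidden=3)
    print("loss before:", net.loss(X, T))
    net.train(X, T)
    print("loss after:", net.loss(X, T))
```

With only three examples per class the net learns easily. If one class had just one or two examples among thousands, its gradient contribution would be tiny. That is the low-frequency attack problem described above.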

KNN:

The k-nearest-neighbor (k-NN) technique is a typical non-parametric classifier used in machine learning. The idea behind it is to assign an unlabeled sample to the class held by the majority of its k nearest labeled neighbors (where k is an integer defining the number of neighbors to consider).
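A minimal k-NN classifier follows directly from that description. It uses Euclidean distance and a majority vote, and there is no training step because the labeled data itself is the model. The two-feature toy points and the choice k = 3 are assumptions for illustration.

```python
import math
from collections import Counter


def knn_classify(train_X, train_y, sample, k=3):
    """Label `sample` with the majority class among its k nearest training points."""
    # Non-parametric: no model is fitted, every training point is kept and searched.
    by_distance = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], sample))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    X = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.8), (0.8, 0.9), (0.85, 0.95)]
    y = ["normal"] * 3 + ["attack"] * 3
    print(knn_classify(X, y, (0.8, 0.85)))
```

Choosing an odd k avoids ties in the two-class case. Because every query scans all stored points, classification cost grows with the size of the training set.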

Unsupervised
K-means:

K-means is one of the most prevalent cluster analysis techniques. It aims to partition 'n' data objects into 'k' clusters, in which each data object is assigned to the cluster with the nearest mean. It is a distance-based clustering technique, and it does not need to calculate the distances between all combinations of records. It applies the Euclidean metric as its similarity measure. The number of clusters is predetermined by the user, and usually several solutions are tested before the most suitable one is accepted. One study in the literature used the k-means clustering algorithm to identify different host behavior profiles and proposed new distance metrics that can be used in the k-means algorithm to relate clusters more closely.
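The standard way to compute k-means is Lloyd's algorithm. It alternates between assigning each point to its nearest mean (Euclidean distance to the k centroids only, not between all pairs of records) and moving each mean to the centre of its cluster. The sketch below seeds the centroids with k random data points, which is an assumption; k-means++ seeding is a common alternative. The toy points are hypothetical two-dimensional host profiles.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k distinct data points as the initial means
    for _ in range(iterations):
        # Assignment step: each point joins the cluster whose centroid is nearest.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:  # converged: assignments will not change
            break
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    pts = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]
    centroids, clusters = kmeans(pts, k=2)
    print(centroids)
```

Because the result depends on the initial centroids, running it with several seeds and keeping the best solution matches the practice of testing several solutions described above.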

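Hierarchical Clustering:

This is a clustering technique that aims to create a hierarchy of clusters. Approaches for hierarchical clustering are normally classified into two categories:

• Agglomerative (bottom-up): every record starts in its own cluster, and the two closest clusters are merged repeatedly.
• Divisive (top-down): all records start in a single cluster, which is split recursively.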
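The agglomerative variant can be sketched directly. Start with singleton clusters and repeatedly merge the closest pair until the requested number of clusters remains. The sequence of merges recorded along the way is the hierarchy. Single linkage (the distance between the closest members) is an assumed choice; complete or average linkage would work the same way with a different distance. The divisive variant is not shown.

```python
import math
from itertools import combinations


def single_linkage(a, b):
    """Cluster distance = distance between their closest members."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points, n_clusters):
    """Bottom-up clustering; returns the final clusters and the list of merges made."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (combinations yields i < j).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: single_linkage(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, merges


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
    final, merges = agglomerative(pts, n_clusters=3)
    print(final)
```

This naive version recomputes all pairwise cluster distances at each step, so it only suits small datasets.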

Evaluation Metrics:

Performance Metrics for IDS:


IDS are typically evaluated based on the following standard performance measures:

• True Positive Rate (TPR): the ratio between the number of correctly predicted attacks and the total number of attacks.
• False Positive Rate (FPR): the ratio between the number of normal instances incorrectly classified as attacks and the total number of normal instances.
• False Negative Rate (FNR): a false negative occurs when a detector fails to identify an anomaly and classifies it as normal. The FNR is the ratio between the number of attacks classified as normal and the total number of attacks.
• Classification rate (CR) or Accuracy: the ratio between the number of correctly classified instances (attack or normal) and the total number of instances. It measures how accurately the IDS detects normal or anomalous traffic behavior.
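All four measures can be computed from the four confusion-matrix counts. The helper below is a small illustrative sketch. The name `ids_metrics` and the example counts are hypothetical, and no zero-division handling is included.

```python
def ids_metrics(tp, fp, tn, fn):
    """Standard IDS measures from confusion-matrix counts.

    tp: attacks flagged as attacks            fn: attacks classified as normal
    fp: normal instances flagged as attacks   tn: normal instances classified as normal
    """
    return {
        "TPR": tp / (tp + fn),                  # detected attacks / all attacks
        "FPR": fp / (fp + tn),                  # false alarms / all normal instances
        "FNR": fn / (tp + fn),                  # missed attacks / all attacks (= 1 - TPR)
        "CR": (tp + tn) / (tp + fp + tn + fn),  # correct decisions / all instances
    }


if __name__ == "__main__":
    # Hypothetical run: 100 attacks and 900 normal connections.
    print(ids_metrics(tp=90, fp=5, tn=895, fn=10))
```

In this example the CR is 0.985 even though one attack in ten is missed. When normal traffic dominates a dataset, accuracy alone can hide a poor TPR, so the rates are usually reported together.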

Intrusion Detection Datasets:

1. DARPA.
2. KDD.
3. NSL-KDD.
4. ADFA-LD.

Types of Computer Attacks:


Cyber-attacks can be categorized based on the activities and targets of the
attacker. Each attack type can be classified into one of the following four classes:

• Denial-of-Service (DoS) attacks have the objective of blocking or restricting the services that a network or computer delivers to its users.
• Probing attacks have the objective of acquiring information about the network or the computer system.
• User-to-Root (U2R) attacks have the objective of a non-privileged user acquiring root or administrator access on a specific computer or system on which the intruder had user-level access.
• Remote-to-Local (R2L) attacks involve an attacker who has no account on the victim machine sending packets to it over the network in order to gain local access.
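In the KDD-family datasets listed earlier, each record carries a specific attack name that is grouped into one of these four classes. The sketch below maps a few of those labels to classes and counts a label list per class. Only a subset of the labels is included. The handling of a trailing "." reflects the original KDD Cup 99 label format, and the "unknown" fallback is an assumption.

```python
from collections import Counter

# A subset of KDD Cup 99 / NSL-KDD attack labels grouped into the four classes.
ATTACK_CLASS = {
    "smurf": "DoS", "neptune": "DoS", "teardrop": "DoS", "back": "DoS",
    "portsweep": "Probe", "ipsweep": "Probe", "nmap": "Probe", "satan": "Probe",
    "buffer_overflow": "U2R", "rootkit": "U2R", "loadmodule": "U2R", "perl": "U2R",
    "guess_passwd": "R2L", "ftp_write": "R2L", "imap": "R2L", "phf": "R2L",
    "warezclient": "R2L",
}


def attack_class(label):
    """Map a raw record label (e.g. 'smurf.') to DoS/Probe/U2R/R2L or 'normal'."""
    name = label.rstrip(".")  # original KDD Cup 99 labels end with a period
    if name == "normal":
        return "normal"
    return ATTACK_CLASS.get(name, "unknown")


def class_distribution(labels):
    """Count records per class; the rare U2R/R2L classes are usually the hardest to learn."""
    return dict(Counter(attack_class(l) for l in labels))


if __name__ == "__main__":
    print(class_distribution(["smurf.", "neptune.", "nmap.", "normal.", "rootkit."]))
```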

Challenges for IDS:


Although there has been a lot of research on IDS, many essential issues remain. IDS need to be more accurate and able to detect a wide range of intrusions with fewer false alarms, among other challenges.

Conclusion:
Cybercriminals target computer users with sophisticated techniques as well as social engineering strategies. Some cybercriminals are becoming increasingly sophisticated and motivated. Cybercriminals have shown their ability to hide their identity, hide their communications, distance their identity from illicit profits and use infrastructure that is resistant to compromise. It is therefore increasingly important that computer systems are protected by advanced intrusion detection systems capable of detecting modern malware. Designing and building such IDS requires a complete overview of the strengths and limitations of current IDS research.

In addition, the most popular public datasets used for IDS research were discussed, along with their data collection techniques, evaluation results and limitations. Because routine activities change frequently and may not remain effective over time, there is a need for newer and more comprehensive datasets that contain a wide range of malware activities. A new malware dataset is needed because most existing machine learning techniques are trained and evaluated on knowledge provided by legacy datasets such as DARPA/KDD, which do not include more recent malware activity. Testing is still done with these datasets, collected in 1999, because they are publicly available and no other suitable alternative datasets exist. Although these datasets are widely accepted as benchmarks, they no longer represent current zero-day attacks. The ADFA dataset contains many new attacks, but it is not adequate either. For that reason, testing AIDS on these datasets does not offer a true evaluation and could result in inaccurate claims about their effectiveness.

References:
1. INTRUSION DETECTION SYSTEM (PDF), researchgate.net
2. Survey of intrusion detection systems: techniques, datasets and challenges, Cybersecurity (full text), springeropen.com

Thanking you,

With regards,

Rajvansh.
