(CSE-3501)
Submitted to: Sir B. Chandra Mohan
Introduction:
The development of malicious software (malware) represents a fundamental
challenge for the design of intrusion detection systems (IDS). Malicious attacks
have become more sophisticated, and the main challenge is to identify unknown and
obfuscated malware, because malware authors use various evasion techniques to
hide information and avoid detection by an IDS. In addition, there has been an
increase in security threats such as zero-day attacks targeting Internet users.
Computer security has therefore become essential as the use of information
technology has become part of our daily lives. Various countries, such as
Australia and the USA, have been significantly affected by zero-day attacks.
High-profile cases of cybercrime have demonstrated the ease with which cyber
threats can spread internationally, as a simple compromise can disrupt a business's
essential services or facilities. A large number of cybercriminals around the
world are motivated to steal information, obtain income illegitimately, and seek
new targets. Malware is deliberately created to compromise computer systems and
exploit any weaknesses in intrusion detection systems. In 2017, the Australian
Cyber Security Centre (ACSC) critically examined the level of sophistication used
by attackers (Australian, 2017). There is therefore a need to develop an effective
IDS that can detect new, sophisticated malware. The goal of an IDS is to identify
different types of malware as soon as possible, which cannot be achieved by a
traditional firewall. With the growing volume of computer malware, improved IDS
have become extremely important.
Types:
Signature-based intrusion detection systems (SIDS)
Signature-based intrusion detection systems (SIDS) rely on pattern matching
techniques to find a known attack. They are also known as Knowledge-based
Detection or Misuse Detection. In SIDS, matching methods are used to find a
previous intrusion. In other words, when an intrusion signature matches the
signature of a previous intrusion that already exists in the signature database,
an alarm signal is triggered. For SIDS, the host's logs are inspected to find
sequences of commands or actions that were previously identified as malware.
The main idea is to build a database of intrusion signatures and to compare the
current set of activities against the existing signatures and raise an alarm if a
match is found.
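
The matching step described above can be sketched as follows. This is only a
minimal illustration, assuming that each signature is stored as a short sequence
of commands and that host activity is available as a list of command strings. The
names SIGNATURES, contains_sequence and match_signatures, and the example
commands, are hypothetical and not taken from any real IDS.

```python
# Hypothetical signature database: name -> sequence of commands that was
# previously identified as malicious (illustrative values only).
SIGNATURES = {
    "download-and-run": ["wget", "chmod", "exec"],
    "log-wipe": ["rm", "history -c"],
}


def contains_sequence(activity, pattern):
    """Return True if `pattern` occurs as a contiguous run in `activity`."""
    n, m = len(activity), len(pattern)
    return any(activity[i:i + m] == pattern for i in range(n - m + 1))


def match_signatures(activity, signatures=SIGNATURES):
    """Return the sorted names of all stored signatures found in the activity."""
    return sorted(name for name, pattern in signatures.items()
                  if contains_sequence(activity, pattern))


# Hypothetical host log: the current set of activities to be checked.
log = ["ls", "wget", "chmod", "exec", "cd"]
alerts = match_signatures(log)
if alerts:
    print("ALARM:", alerts)
```

An activity whose pattern is not stored in the database returns an empty list,
so no alarm is raised for it.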
However, SIDS has difficulty detecting a new attack, because no corresponding
signature exists in the database until a new attack signature is extracted and
stored.
The increasing rate of zero-day attacks has made SIDS techniques progressively
less effective because there is no prior signature for any such attacks. Polymorphic
variants of malware and the increasing number of targeted attacks may further
undermine the adequacy of this traditional paradigm. A potential solution to this
problem is the use of anomaly-based intrusion detection system (AIDS) techniques,
which work by profiling what is acceptable behavior rather than what is anomalous.
The main advantage of AIDS is the ability to identify zero-day attacks, because
recognition of abnormal user activity does not rely on a signature database. AIDS
triggers a danger signal when the examined behavior differs from the usual
behavior. In addition, AIDS has further benefits. First, it can detect internal
malicious activities: if an intruder starts making transactions in a stolen
account that do not match the typical user activity, an alarm is generated.
Second, it is very difficult for a cybercriminal to learn what normal user
behavior is without triggering an alert, because the system is built from
customized profiles.
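
The idea of raising an alarm when behavior departs from a learned profile can be
sketched as below. This is one simple statistical approach, assumed here for
illustration only: the profile is the mean and standard deviation of a single
numeric feature, and a value is flagged when it lies more than a chosen number of
standard deviations from the mean. The function names, the threshold of 3.0 and
the transaction counts are all hypothetical.

```python
from statistics import mean, stdev


def build_profile(history):
    """Summarise normal behavior as (mean, standard deviation)."""
    return mean(history), stdev(history)


def is_anomalous(value, profile, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    mu, sigma = profile
    if sigma == 0:
        # No variation was observed, so any different value is unusual.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical daily transaction counts for one user account.
normal_days = [4, 5, 6, 5, 4, 6, 5, 5]
profile = build_profile(normal_days)

print(is_anomalous(5, profile))   # typical activity for this user
print(is_anomalous(40, profile))  # far outside the learned profile
```

No signature of the unusual activity is needed: only the user's own history is
consulted.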
Detection Methodologies:
Statistics-based: analyzes the network traffic using complex statistical
algorithms to process the information.
Pattern-based: identifies the characters, forms, and patterns in the data.
Rule-based: uses an attack “signature” to detect a potential attack on the
suspicious network traffic.
State-based: examines a stream of events to identify any possible attack.
Heuristic-based: identifies any activity that departs from ordinary
activity.
Machine learning techniques have been widely used in the field of AIDS. Several
algorithms and techniques, such as clustering, neural networks, association rules,
decision trees, genetic algorithms, and nearest neighbor methods, have been used
to discover knowledge from intrusion datasets.
Various AIDS have been created based on machine learning techniques. The goal of
using machine learning techniques is to create an IDS with improved accuracy and
less need for human expertise. In the last few years, the number of AIDS that use
machine learning methods has been increasing.
The key focus of IDS based on machine learning research is to detect patterns and
build an intrusion detection system based on the data set. Generally, there are two
kinds of machine learning methods, supervised and unsupervised.
Supervised
Decision Tree:
The decision tree consists of three basic components. The first component is the
decision node, which is used to identify the test attribute. The second is a branch
where each branch represents a possible decision based on the value of the test
attribute. The third is a leaf which includes the class to which the instance
belongs. There are many different decision tree algorithms, including ID3
(Quinlan, 1986), C4.5 (Quinlan, 2014) and CART (Breiman, 1996).
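
The three components can be illustrated with a small hand-written tree. This
sketch only shows how a finished tree classifies an instance; it does not learn
the tree from data as ID3, C4.5 or CART do. The attributes (protocol,
failed_logins), their values and the class labels are hypothetical.

```python
# A decision node is a dict: it names the test attribute and maps each
# attribute value (a branch) to a subtree. A leaf is a plain class label.
# The attributes and values below are hypothetical, for illustration only.
TREE = {
    "attribute": "protocol",
    "branches": {
        "icmp": "attack",
        "tcp": {
            "attribute": "failed_logins",
            "branches": {"low": "normal", "high": "attack"},
        },
        "udp": "normal",
    },
}


def classify(instance, node=TREE):
    """Follow the branch matching each test attribute until a leaf is reached."""
    while isinstance(node, dict):
        value = instance[node["attribute"]]
        node = node["branches"][value]
    return node


print(classify({"protocol": "tcp", "failed_logins": "high"}))
```

Each pass through the loop handles one decision node, and the loop ends when the
selected branch leads to a leaf.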
Naive Bayes:
Genetic Algorithms:
selection and reproduction operators, biased to prefer fitter solutions. When a
genetic algorithm is applied to an intrusion classification problem, there are
usually two types of chromosome encoding: one uses clustering to generate a
binary chromosome encoding; the other specifies the cluster centers (a clustering
prototype matrix) using an integer-coded chromosome.
The artificial neural network (ANN) is one of the most widely used machine
learning methods and has been shown to be successful in detecting a variety of
malware. The most frequently used training technique for supervised learning is
the backpropagation (BP) algorithm. The BP algorithm assesses the error gradient
of the network with respect to its modifiable weights. However, for ANN-based
IDS, detection accuracy, especially for less frequent attacks, still needs to be
improved. The training dataset for less frequent attacks is small compared to
that for more frequent attacks, which makes it difficult for the ANN to learn the
properties of these attacks properly. As a result, detection accuracy is lower
for less frequent attacks. In the field of information security, enormous damage
can occur if low-frequency attacks are not detected. For example, if User to Root
(U2R) attacks evade detection, a cybercriminal can gain root user privileges and
thereby perform malicious activities on victims' computer systems.
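
How BP obtains the error gradient with respect to the weights can be sketched on
a very small network. The sketch assumes two inputs, two sigmoid hidden units,
one sigmoid output, no bias terms and a squared-error loss; these choices, the
starting weights, the learning rate and the single training example are all
hypothetical and chosen only to keep the example short.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    """Return the hidden activations and the network output for input x."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden)))
    return hidden, output


def loss(x, target, w_hidden, w_out):
    """Squared error for one training example."""
    _, output = forward(x, w_hidden, w_out)
    return 0.5 * (output - target) ** 2


def backprop(x, target, w_hidden, w_out):
    """Return the error gradient with respect to every weight."""
    hidden, output = forward(x, w_hidden, w_out)
    # Error signal at the output unit (chain rule through the sigmoid).
    delta_out = (output - target) * output * (1.0 - output)
    grad_out = [delta_out * h for h in hidden]
    # Propagate the error signal back to each hidden unit.
    grad_hidden = []
    for j, h in enumerate(hidden):
        delta_h = delta_out * w_out[j] * h * (1.0 - h)
        grad_hidden.append([delta_h * xi for xi in x])
    return grad_hidden, grad_out


def gradient_step(x, target, w_hidden, w_out, lr=0.5):
    """Return new weights after one gradient-descent update."""
    grad_hidden, grad_out = backprop(x, target, w_hidden, w_out)
    new_hidden = [[w - lr * g for w, g in zip(row, grad_row)]
                  for row, grad_row in zip(w_hidden, grad_hidden)]
    new_out = [w - lr * g for w, g in zip(w_out, grad_out)]
    return new_hidden, new_out


# Hypothetical starting weights and one labelled example (1.0 = attack).
w_hidden = [[0.1, -0.2], [0.4, 0.3]]
w_out = [0.5, -0.6]
x, target = [1.0, 0.5], 1.0

before = loss(x, target, w_hidden, w_out)
w_hidden, w_out = gradient_step(x, target, w_hidden, w_out)
after = loss(x, target, w_hidden, w_out)
print(before, after)
```

Training repeats such steps over many examples. When a class contributes only a
few examples, it contributes only a few updates, which is consistent with the
lower accuracy on less frequent attacks noted above.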
KNN:
Unsupervised
K-means:
K-means is one of the most prevalent cluster analysis techniques. It aims to
separate 'n' data objects into 'k' clusters in which each data object is assigned
to the cluster with the nearest mean. It is a distance-based clustering technique
and does not need to compute the distances between all combinations of records.
It applies the Euclidean metric as the similarity measure. The number of clusters
is predetermined by the user. Usually several solutions are tested before the
most suitable one is accepted. The K-means clustering algorithm has been used to
identify different host behavior profiles, and new distance metrics have been
proposed for use in the k-means algorithm to relate the clusters more closely.
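
The assignment of each object to the nearest mean can be sketched as follows.
This is a minimal version of the standard iterative procedure, with one
simplifying assumption: the first k points are used as the initial means, whereas
practical implementations usually choose them at random and try several starts.
The host feature values are hypothetical.

```python
import math


def kmeans(points, k, iterations=100):
    """Cluster `points` into `k` groups; returns (centroids, labels)."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest
        # mean, measured with the Euclidean distance.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the means are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, labels


# Hypothetical two-feature measurements for five hosts.
hosts = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (0.9, 1.1), (8.2, 7.9)]
centroids, labels = kmeans(hosts, k=2)
print(labels)
```

Each point is compared only with the k centroids, not with every other point,
which is why distances between all pairs of records are never computed.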
Hierarchical Clustering:
Agglomerative.
Divisive.
Evaluation Metrics:
True Positive Rate (TPR): It is calculated as the ratio between the number of
correctly predicted attacks and the total number of attacks.
False Positive Rate (FPR): It is calculated as the ratio between the number of
normal instances incorrectly classified as an attack and the total number of normal
instances.
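False Negative Rate (FNR): A false negative occurs when a detector fails to
identify an anomaly and classifies it as normal. The FNR is the ratio between the
number of attacks classified as normal and the total number of attacks.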
Classification rate (CR) or Accuracy: The CR measures how accurate the IDS is in
detecting normal or anomalous traffic behavior.
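
The four measures above can be computed from the counts of a confusion matrix, as
in the sketch below. It assumes the usual definitions of those counts: tp
(attacks detected), fn (attacks missed), fp (normal instances flagged as attacks)
and tn (normal instances passed as normal), with CR taken as the fraction of all
instances classified correctly. The function name and the example counts are
hypothetical.

```python
def evaluation_metrics(tp, fp, tn, fn):
    """Compute TPR, FPR, FNR and CR (accuracy) from confusion-matrix counts."""
    attacks = tp + fn   # all actual attack instances
    normals = tn + fp   # all actual normal instances
    return {
        "TPR": tp / attacks,
        "FPR": fp / normals,
        "FNR": fn / attacks,
        "CR": (tp + tn) / (attacks + normals),
    }


# Hypothetical counts from testing a detector on 1000 records.
metrics = evaluation_metrics(tp=90, fp=45, tn=855, fn=10)
print(metrics)
```

With these counts, TPR and FNR sum to 1 because both are measured over the same
100 attack instances.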
1. DARPA.
2. KDD.
3. NSL-KDD.
4. ADFA-LD.
Conclusion:
Cybercriminals target computer users with sophisticated techniques as well as
social engineering strategies. Some cybercriminals are becoming increasingly
sophisticated and motivated. Cybercriminals have shown their ability to hide
their identity, conceal their communications, distance their identity from illicit
profits, and use infrastructure that is resistant to compromise. It is therefore
increasingly important that computer systems are protected by advanced
intrusion detection systems capable of detecting modern malware. To design and
build such IDS, a complete overview of the strengths and limitations of current
IDS research is needed.
In addition, the most popular public datasets used for IDS research were
discussed, together with their data collection techniques, evaluation results,
and limitations. Because normal activities change frequently and may not remain
valid over time, there is a need for newer and more comprehensive datasets that
contain a wide range of malware activities. A new malware dataset is needed
because most existing machine learning techniques are trained and evaluated on
legacy datasets such as DARPA and KDD, which do not include more recent malware
activity. Testing is still done with these datasets, collected in 1999, because
they are publicly available and no other acceptable alternative datasets exist.
Although these datasets are widely accepted as benchmarks, they no longer
represent current zero-day attacks. Although the ADFA dataset contains many new
attacks, it is not adequate. For that reason, testing AIDS with these datasets
does not offer a true evaluation and could result in inaccurate claims about
their effectiveness.
Reference:
(PDF) INTRUSION DETECTION SYSTEM (researchgate.net)
Thanking you,
With regards,
Rajvansh.