INTRODUCTION
Intrusion detection systems (IDS) are efficient security tools used to improve the security of communication and information systems, alongside
antivirus software and access control schemes. IDS are classified into signature-based and anomaly-based detection systems.
Signature-based detection systems identify traffic patterns or application data as malicious by matching them
against known attack signatures, which requires a continually updated database
storing all new attack signatures, whereas anomaly detection systems
compare all activity against a defined model of normal behavior (Agrawal, 2015).
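To make the contrast between the two families concrete, here is a minimal Python sketch. The signature database, the byte patterns, the use of bytes per connection as the monitored quantity and the three-standard-deviation threshold are all illustrative assumptions, not part of the systems cited above.

```python
import statistics

# Hypothetical signature database: byte patterns assumed to indicate known attacks.
SIGNATURES = {
    b"/etc/passwd": "path traversal attempt",
    b"' OR '1'='1": "SQL injection attempt",
}


def signature_match(payload: bytes) -> list:
    """Signature-based detection: name every known signature found in the payload."""
    return [name for pattern, name in SIGNATURES.items() if pattern in payload]


class AnomalyDetector:
    """Anomaly detection: learn a profile of normal values, flag large deviations."""

    def __init__(self, normal_samples, k=3.0):
        self.mean = statistics.mean(normal_samples)
        self.std = statistics.pstdev(normal_samples)
        self.k = k  # standard deviations tolerated before raising an alarm

    def is_anomalous(self, value) -> bool:
        return abs(value - self.mean) > self.k * self.std


# Usage: bytes transferred per connection observed during normal operation.
detector = AnomalyDetector([500, 520, 480, 510, 490])
print(signature_match(b"GET /../../etc/passwd HTTP/1.0"))  # ['path traversal attempt']
print(detector.is_anomalous(505), detector.is_anomalous(5000))  # False True
```

The signature matcher can only recognise patterns already in its table, while the anomaly detector flags the unseen 5000-byte connection without any signature, which is exactly the trade-off the paragraph describes.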
The main objective of an IDS is to detect an attack on the network and then raise an alarm. The best IDS detects new or highly malicious
attacks within a short time and carries out the necessary actions. Current IDS do not achieve 100% accuracy; hence, this study was
carried out to improve IDS accuracy (Sheta
and Alamleh, 2017). Many machine learning techniques have been used to
help detect network attacks and improve detection accuracy.
IDS are classified according to the detection methods used to identify
main task of the technology expert is to provide secure data in terms of data
Vaghamshi, 2013).
dangerous effects, such as information theft. Information obtained
from a private network, for example government information, can be used against
the progress of society, which can lead to pandemonium and panic in
society at large.
avoiding programming errors and firewalls are used as the first line of defence for
(Summers, 1997). They are generally unable to fully protect against malicious
mobile code, insider attacks and all other forms of intrusion. Programming errors
computer systems are likely to remain unsecured for the foreseeable future.
systems despite the prevention techniques. Intrusion detection is useful not only
measures (Sundaram, 1996).
importance as the rate at which the said IDS can effectively and efficiently
domains. Of these, anomalies and outliers are two terms used most commonly in
finds extensive use in a wide variety of applications such as fraud detection for
credit cards, insurance or healthcare, intrusion detection for cyber-security, fault
detection in safety critical systems, and military surveillance for enemy activities
(Varun, Arindam, & Vipin, 2009). Misuse detection method using a rule-based
regularly, because if the signature is not included in its library this type of IDS is
from the normal behaviour profile. Despite being able to detect unknown attacks, the
probability of a high false alarm rate is considerable (Muda, Yassin, Sulaiman & Udzir,
such as Naïve Bayesian (NB), Neural Network (NN), Support Vector Machine
(SVM), K-Nearest Neighbors (KNN), Fuzzy Logic model, and Genetic Algorithm
(Chiba, 2017) have been widely used in recent decades. However, problems remain:
imbalanced detection rates for different types of attacks, high false alarm rates,
and redundancy of input attributes in the training data (Rida and Omri, 2016)
also suggested that removing noise and redundancy from the dataset would be useful in
features may be redundant since the information they add is contained in other
features. Extra features can increase computation time and can impact the accuracy of the IDS.
These gaps, mainly in detection accuracy and false alarm rate, are what this
project seeks to address. Hence, how to improve the detection accuracy has
algorithm) to improve the detection rate and accuracy and to reduce the false alarm rate.
sniffers and access control lists aid in preventing attackers from gaining easy
factors make the need for a proper security framework even more urgent.
Intrusion Detection Systems can play such a role. Therefore, there is a need to develop
effective and efficient algorithms and intrusion detection systems for this
purpose.
Data mining: The term data mining is frequently used to designate the process of
extracting useful knowledge from large data sets. Data mining, by contrast, refers
to one particular step in this process. Specifically, the data mining step applies so
modeling.
program which helps to identify malicious programs that enter our system or
network.
This project is organized as follows. Chapter two reviews the literature
and the important concepts that are relevant to the
Chapter three presents the methodology. In Chapter four the system design and
LITERATURE REVIEW
the biological point of view is the principle of distinction between self-cells and
trying to enter a system. They chose the backpropagation algorithm
(BPN) as the learning algorithm for their artificial neural network of type
Erza Aminanto et al. (2017) have proposed an anomaly detection system to detect
Tao Ma et al. (2018) proposed a novel approach called KDSVM, which
utilizes the K-means clustering technique together with the feature-learning ability of a
deep neural network (DNN) model and the strong classifier of support vector machines (SVM).
Panda et al. (2018) stated that a hybrid intelligent scheme, in which several
different classifiers are combined, would improve detection and
make it more reliable, thereby improving the quality of the results. In their paper, the
researchers applied a 2-class classification
strategy based on 10-fold cross-validation, which would
increase the intrusion detection rate and also decrease the false alarm rate.
Aburomman and Reaz (2016) carried out a study which described the different
algorithms used for classifying intrusions based on a popular machine
learning method. They studied different homogeneous and heterogeneous systems
along with various hybrid techniques, and stated that implementing
ensemble-based techniques helped in solving pattern classification
problems.
Security measures have failed in many cases to stop the wide variety of possible
automatically scan network activity and detect such intrusion attacks, providing
action. A strong case can be made for the use of data mining techniques to
Salvatore, 1998). Intrusion is a type of malicious activity that tries to deny the
security aspects of a computer system. Intrusion detection is a process of
events and inspecting them for signs of malicious acts (Maharaj & Khanna, 2014).
Maharaj and Khanna (2014) state ‘the primary goal of intrusion detection is to
peculiar effects without raising too many false alarms’. Intrusion detection is an
area growing in significance as more and more sensitive data are stored and
monitors information systems and raises alarms when security violations are
computer activities for the purpose of finding security violations. The security of
have been used to protect computer systems as a first line of defense. Intrusion
prevention alone is not sufficient because as systems become ever more complex,
there are always exploitable weaknesses in the systems due to design and
programming errors. Nowadays, intrusion detection is one of the high priority
computer systems play increasingly vital roles in modern society, they have
functions:
properly authorized.
the moment they are transmitted to the moment they are actually
versus normal traffic. There are two useful ways of classifying intrusion
detection systems; one is according to data source. Each has a distinct approach to
monitoring and securing data and systems. There are two general
Hemalatha, 2011).
2.4 INTRUSION DETECTION APPROACHES
The signatures of some attacks are known, whereas other attacks only
reflect some deviation from normal patterns. Consequently, two main approaches
always reflect some deviations from normal patterns. Anomaly detection may be
divided into static and dynamic anomaly detection. A static anomaly detector is
based on the assumption that there is a portion of the system being monitored that
does not change. The static portion of a system is the code for the system and the
constant portion of data upon which the correct functioning of the system
depends. For example, the operating system software and the data used to bootstrap a
computer never change. If the static portion of the system ever deviates from its
original form, an error has occurred or an intruder has altered the static portion of
record all events they only record events of interest. Therefore only behavior that
results in an event that is recorded in the audit will be observed, and these events may
occur in a sequence.
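The static anomaly detector described above can be sketched as an integrity check: record a cryptographic digest of each static component while the system is known to be clean, then flag any component whose digest later changes or that disappears. The component names and the use of SHA-256 are illustrative assumptions; the text does not prescribe a particular mechanism.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class StaticAnomalyDetector:
    """Record a digest of each static component, then report components that changed."""

    def __init__(self, static_parts: dict):
        # Baseline taken while the system is assumed to be in a known-good state.
        self.baseline = {name: fingerprint(blob) for name, blob in static_parts.items()}

    def check(self, current_parts: dict) -> list:
        """Return names of static components that are missing or differ from the baseline."""
        return [
            name
            for name, digest in self.baseline.items()
            if name not in current_parts or fingerprint(current_parts[name]) != digest
        ]


clean = {"bootloader": b"\x01\x02\x03", "kernel": b"kernel-image-v1"}
detector = StaticAnomalyDetector(clean)
print(detector.check(clean))                                            # []
print(detector.check({"bootloader": b"\x01\x02\x03", "kernel": b"x"}))  # ['kernel']
```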
and known attack patterns. Misuse detection is concerned with finding intruders
known vulnerabilities and eliminate them. The term intrusion scenario is used as a
scenarios to ensure that one or more attackers are not attempting to exploit known
modeled.
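Misuse detection matches observed behaviour against modelled intrusion scenarios. A minimal sketch, assuming a scenario is represented as an ordered list of audit event names (the event names and the scenario itself are hypothetical), checks whether those events occur in order within an audit trail, allowing unrelated events in between.

```python
# Hypothetical intrusion scenario: audit events that, in this order, indicate a known attack.
SCENARIOS = {
    "password guessing followed by privilege escalation": [
        "login_failed", "login_failed", "login_ok", "setuid_root",
    ],
}


def occurs_in_order(events, steps):
    """True if every step appears in `events` in order; unrelated events may be interleaved."""
    remaining = iter(events)
    # `step in remaining` consumes the iterator up to the first match, preserving order.
    return all(step in remaining for step in steps)


def match_scenarios(events):
    return [name for name, steps in SCENARIOS.items() if occurs_in_order(events, steps)]


audit = ["login_failed", "ls", "login_failed", "login_ok", "cat", "setuid_root"]
print(match_scenarios(audit))  # ['password guessing followed by privilege escalation']
```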
Detection
detect only the attacks they are trained to detect. Novel attacks or
particularly if they fit the established profile of the user. Once detected, it is often
difficult to characterize the nature of the attack for forensic purposes. Finally, a
high false positive rate may result for a narrowly trained detection algorithm, or
conversely, a high false negative rate may result for a broadly trained anomaly
detection approach.
approach and the misuse detection approach (Lunt, 1989). These techniques seek
advantage over the singular use of either method separately, the use of a
of two knowledge bases for the intrusion detection system will increase the amount
of system resources which must be dedicated to the system (Cannady & Harrell,
1996). Additional disk space will be required for the storage of the profiles, and
user activities with information in the dual knowledge bases. In addition, the
technique will share the disadvantage of either method individually in its inability
capable of identifying attacks which may occur over an extended period of time, a
series of user sessions, or by multiple attackers working in concert. This approach
is effective in reducing the need to review a potentially large amount of audit data
Figure 2.3 shows a taxonomy of Intrusion Detection Systems. More details and
information on the various IDS systems and the way they work can be found in
(Mitchell, 2005).
Almost all authors and researchers categorize intrusion attacks into four
different types. Jaiganesh et al. (2013) detailed four different types of
attacks made on a network-based intrusion detection
system.
1. Denial of service attack (DoS): an attack in which the
attacker makes a computing or memory resource too busy or too full to handle
legitimate requests.
network of computers.
Patel et al. (2013), however, stated that the classes in the KDD’99 dataset can be
categorized into five main
classes (one normal class and four main intrusion classes: PROBE, DOS, U2R,
and R2L).
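Mapping the individual KDD'99 attack labels onto the four intrusion classes plus normal is a common preprocessing step. The sketch below covers the 22 attack types that appear in the KDD'99 training data; labels outside this table (for example, attacks that appear only in the test data) are reported as "unknown" rather than guessed.

```python
KDD99_CATEGORY = {
    # Denial of service
    "back": "dos", "land": "dos", "neptune": "dos", "pod": "dos",
    "smurf": "dos", "teardrop": "dos",
    # Probing
    "ipsweep": "probe", "nmap": "probe", "portsweep": "probe", "satan": "probe",
    # User to root
    "buffer_overflow": "u2r", "loadmodule": "u2r", "perl": "u2r", "rootkit": "u2r",
    # Remote to local
    "ftp_write": "r2l", "guess_passwd": "r2l", "imap": "r2l", "multihop": "r2l",
    "phf": "r2l", "spy": "r2l", "warezclient": "r2l", "warezmaster": "r2l",
}


def to_category(label: str) -> str:
    """Map a raw KDD'99 label such as 'smurf.' to normal/dos/probe/u2r/r2l."""
    label = label.strip().rstrip(".")  # raw labels in the data files end with a dot
    if label == "normal":
        return "normal"
    return KDD99_CATEGORY.get(label, "unknown")


print(to_category("smurf."), to_category("guess_passwd."), to_category("normal."))  # dos r2l normal
```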
violations. These policy violations range from external attackers trying to gain
unauthorized access to insiders abusing their access. Current IDS have a number
of significant drawbacks:
1. Current IDS are usually tuned to detect known service level network
attacks.
2. Data overload: Another aspect which does not relate directly to
tools employed by a company and its size there is the possibility for
4. False negatives: This is the case where an IDS does not generate an
Data mining can help improve intrusion detection by addressing each and every
real attacks.
To accomplish these tasks, data miners employ one or more of the following
techniques:
I. Data summarization with statistics, including finding outliers
belongs
1996). The main function of the model that we are interested in is classification,
Schatz, 1999). We are also interested in link and sequence analysis (Eric
Bloedorn et al., 2001). Additionally, data mining systems provide the means to
easily perform data summarization and visualization, aiding the security analyst
in identifying areas of concern (Eric Bloedorn et al., 2001). The models must be
include rules, decision trees, linear and non-linear functions (including neural
finished and, therefore, we can compute all the features and check the
Many real-time IDSs will start to drop packets when flooded with data
real time (Lee et al., 1998). Ghosh, Schwartzbard and Schatz (1999) were
detection models. They also developed efficient approaches that use statistics
called “Judge”, was also developed to test and evaluate the use of those
activities.
The use of multiple sensors to collect data from various sources has been
of an IDS.
1. Lee et al. (1998) state that using multiple sensors for ID should
three layers:
manual programming.
3. Software applications that customize to the individual user’s
There are several reasons why data mining approaches play a role in these three
domains. First of all, for the classification of security incidents, a vast amount of
data, including historical data, has to be analyzed. It is difficult for human beings
seems well-suited to overcome this problem and can therefore be used to discover
those patterns.
Reddy, 2011).
2.10 THE DATA MINING PROCESS OF BUILDING INTRUSION
DETECTION MODELS
techniques and process frameworks that can support systematic data analysis on
the vast amount of audit data that can be made available. The process of using
data mining approaches to build intrusion detection models is shown in Fig 2.4.
1999).
Here raw (binary) audit data is first processed into ASCII network packet
information (or host event data), which is in turn summarized into connection
features, e.g., service, duration, flag (indicating the normal or error status
according to the protocols), etc. Data mining programs are then applied to the
connection records to compute the frequent patterns, i.e., association rules and
frequent episodes, which are then analyzed to construct additional features for the
connection records. Classification programs, for example, RIPPER, are then used
to inductively learn the detection models. This process is of course iterative. For
example, poor performance of the classification models often indicates that more pattern mining and feature construction are needed.
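Connection records are enriched with derived features before classification. As a hedged illustration, the sketch below computes one time-based traffic feature in the spirit of KDD'99's count feature: for each connection, the number of earlier connections to the same destination host within the previous two seconds. The Connection record type and the choice to exclude the current connection from its own count are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Connection:
    timestamp: float  # connection start time, in seconds
    dst_host: str


def add_count_feature(conns, window=2.0):
    """Pair each connection with the number of earlier connections to the same host within `window` s."""
    conns = sorted(conns, key=lambda c: c.timestamp)
    result = []
    for i, c in enumerate(conns):
        count = sum(
            1
            for p in conns[:i]
            if p.dst_host == c.dst_host and c.timestamp - p.timestamp <= window
        )
        result.append((c, count))
    return result


conns = [
    Connection(0.0, "10.0.0.1"),
    Connection(0.5, "10.0.0.1"),
    Connection(1.0, "10.0.0.2"),
    Connection(2.4, "10.0.0.1"),
    Connection(5.0, "10.0.0.1"),
]
print([count for _, count in add_count_feature(conns)])  # [0, 1, 0, 1, 0]
```

The quadratic scan keeps the sketch readable; a real pipeline over millions of records would use a sliding window per host instead.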
Data Mining is the automated process of going through large amounts of data
with the intention to discover useful information about the data that is not
obvious. Useful information may include special relations between the data,
specific models of the data that repeat themselves, specific patterns, and ways of
classifying the data or discovering specific values that fall outside the “normal” pattern or
model (Agrawal & Srikant, 1994). In order to understand how data mining can
help advance intrusion detection, it is important to know how current IDS work to
known or novel attack (Chittur, 2001). However, the search for the ideal IDS
continues and the amount of network data is increasing. Besides the issue of data
overload facing network analysts due to increasing complexity and large size of
The signature database has to be manually revised for each new type of intrusion
cannot detect emerging cyber threats. In addition, once a new attack is discovered
and its signature developed, often there is a substantial latency in its deployment
across networks.
The central theme of our approach is to apply data mining techniques for
intrusion detection in network-based systems. Data mining generally refers to the
process of (automatically) extracting models from large stores of data. The recent
learning, and database. Several types of algorithms (Lee, Stolfo, & Mok, 1999)
classifier that can label or predict new unseen audit data as belonging to
the normal class or the abnormal class environment (Narayana, Prasad,
Finding out the correlations in audit data will provide insight for selecting
and/or correlation relationships among a large set of data items. The mining
minsupport.
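Association rules are mined from the itemsets whose support reaches minsupport. A minimal brute-force sketch (not Apriori's candidate pruning, and limited to item pairs) computes support and confidence for connection records written as hypothetical attribute=value items.

```python
from itertools import combinations


def support(itemset, records):
    """Fraction of records containing every item of `itemset`."""
    return sum(1 for r in records if itemset <= r) / len(records)


def frequent_itemsets(records, minsupport, max_size=2):
    """Brute force: every itemset of up to `max_size` items whose support is >= minsupport."""
    items = sorted(set().union(*records))
    frequent = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(frozenset(combo), records)
            if s >= minsupport:
                frequent[frozenset(combo)] = s
    return frequent


def pair_rules(frequent, minconfidence):
    """Rules X -> Y from frequent pairs; confidence = support(X and Y) / support(X)."""
    rules = []
    for itemset, s in frequent.items():
        if len(itemset) != 2:
            continue
        for x in itemset:
            # {x} is frequent too, because support can only shrink as itemsets grow.
            confidence = s / frequent[frozenset([x])]
            if confidence >= minconfidence:
                (y,) = itemset - {x}
                rules.append((x, y, s, confidence))
    return rules


records = [
    {"service=http", "flag=SF"},
    {"service=http", "flag=SF"},
    {"service=http", "flag=S0"},
    {"service=ftp", "flag=SF"},
]
freq = frequent_itemsets(records, minsupport=0.5)
for x, y, s, c in sorted(pair_rules(freq, minconfidence=0.6)):
    print(f"{x} -> {y}  support={s:.2f} confidence={c:.2f}")
```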
into clusters where each cluster consists of members that are quite similar.
Members from different clusters are different from each other. Hence
clustering methods can be useful for classifying network data for detecting
Misuse detection.
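A minimal k-means sketch shows how connection records can be grouped so that members of a cluster are similar and members of different clusters are not. The two numeric features, the deterministic initialisation from the first k points and the iteration cap are illustrative assumptions; a production system would use a library implementation with better seeding.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of numbers; returns (centroids, cluster label per point)."""
    centroids = [points[i] for i in range(k)]  # simple deterministic initialisation
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        updated = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if updated == centroids:  # assignments are stable, so stop early
            break
        centroids = updated
    labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
    return centroids, labels


# (duration, bytes in thousands) for five hypothetical connections.
points = [(1, 1), (100, 100), (1.5, 2), (101, 99), (2, 1)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0]
```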
supervised learning technique. A classification based IDS will classify all the
attributes.
iii. After mapping, it outputs a classifier that can accurately predict the
unknown object. For example, a loan officer requires data analysis to determine
which loan applicants are "safe" or "risky". The data analysis task is
(categorical) labels, such as “safe” or “risky” for the loan application data. These
values have no meaning. Because the class labels of the training data are already known,
i. Training and
ii. Testing.
data containing class labels. The second process, testing, examines a
classifier (using testing data) for accuracy (in which case the test data contains
the class labels) or its ability to classify unknown objects (records) for
Hebat, Sherif, and Mohamed (2012) in their work reveal that subsequent to
preprocessing of data, the features of the data set are identified as either being
correlated with one or more other features. As a result, omitting them from the
intrusion detection process does not degrade classification accuracy. In fact, the
accuracy may improve due to the resulting data reduction, and removal of noise
and measurement errors associated with the omitted features. Therefore, choosing
the system.
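One simple way to act on this observation is to drop a feature when it is almost perfectly correlated with a feature already kept. The sketch below uses Pearson correlation with an assumed threshold of 0.95; it illustrates correlation-based redundancy removal and is not necessarily the procedure Hebat, Sherif, and Mohamed used. The feature names are hypothetical.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column has no linear correlation with anything
    return cov / (sx * sy)


def drop_redundant(columns, threshold=0.95):
    """Keep a feature only if |r| with every already-kept feature is below `threshold`."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


columns = {
    "src_bytes": [100, 200, 300, 400],
    "src_kbytes": [0.1, 0.2, 0.3, 0.4],  # the same information in different units
    "duration": [5, 1, 4, 2],
}
print(drop_redundant(columns))  # ['src_bytes', 'duration']
```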
1) Information Gain: In this method, the features are filtered to create the
most prominent feature subset before the start of the learning process.
2) Gain ratio: a modification of the information gain that solves the issue
spread and small when all data belong to one branch attribute.
Gain ratio takes the number and size of branches into account when choosing an
split into account (i.e. how much information do we need to tell which branch an
Krzystof and Nobert (2007) state that effective and versatile classification cannot be
models comprising a feature selection stage. Similarly to other data analysis tasks,
algorithms. The authors further stressed that one of the most efficient heuristics
used for decision tree construction is the Separability of Split Value (SSV)
criterion. Its basic advantage is that it can be applied to both continuous and
that the SSV criterion has been successfully used not only for building
discrete and in the opposite direction) and as the discretization part of feature
selection methods, which finally rank the features according to indices such as
Mutual Information. It is known that extra features can increase computation time
and can impact the accuracy of an IDS, so feature selection is a very good means of improving IDS accuracy.
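The SSV criterion can be sketched for a continuous feature. Assuming the usual definition, SSV(s) = 2 Σc |LS ∩ Dc|·|RS ∩ (D − Dc)| − Σc min(|LS ∩ Dc|, |RS ∩ Dc|), where LS holds the records whose feature value is below s, RS the rest and Dc the records of class c, the best split separates the most pairs of records from different classes while dividing as few same-class records as possible. Discrete features, which SSV handles through value subsets, are not covered by this sketch.

```python
def ssv(values, labels, split):
    """SSV value of splitting a continuous feature at `split` (left side: value < split)."""
    left = [y for v, y in zip(values, labels) if v < split]
    right = [y for v, y in zip(values, labels) if v >= split]
    classes = set(labels)
    # Pairs of records from different classes that the split separates.
    separated = sum(left.count(c) * sum(1 for y in right if y != c) for c in classes)
    # Penalty: for each class, the smaller of its left and right counts.
    divided = sum(min(left.count(c), right.count(c)) for c in classes)
    return 2 * separated - divided


def best_split(values, labels):
    """Try midpoints between consecutive distinct values and keep the highest SSV."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(candidates, key=lambda s: ssv(values, labels, s))


durations = [1, 2, 3, 10, 11, 12]
labels = ["normal", "normal", "normal", "attack", "attack", "attack"]
print(best_split(durations, labels), ssv(durations, labels, 6.5))  # 6.5 18
```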
The approach used in this project is to apply the machine learning algorithm known as decision trees.
Decision Trees are a class of very powerful Machine Learning models capable of
achieving high accuracy in many tasks while being highly interpretable. What
makes decision trees special in the realm of ML models is really their clarity of
and displays the knowledge in such a way that it can easily be understood, even by
non-experts.
The approach used in this project is to discuss extensively the intended algorithms
data, the features of the data set are identified as either being significant to the
process does not degrade classification accuracy. In fact, the accuracy may
improve due to the resulting data reduction, and removal of noise and
the system.
They thus define Information Gain: In this method, the features are filtered to
create the most prominent feature subset before the start of the learning process.
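Mathematically, Xindong et al. (2008) stated the information gain for a dataset S as follows. If S contains records from k classes C_1, ..., C_k and an attribute test T partitions S into subsets S_1, ..., S_n, then

$$\mathrm{Info}(S) = -\sum_{j=1}^{k} \frac{\mathrm{freq}(C_j, S)}{|S|}\,\log_2 \frac{\mathrm{freq}(C_j, S)}{|S|}$$

$$\mathrm{Gain}(S, T) = \mathrm{Info}(S) - \sum_{i=1}^{n} \frac{|S_i|}{|S|}\,\mathrm{Info}(S_i)$$

where freq(C_j, S) is the number of records in S that belong to class C_j.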
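The two formulas translate directly into code. This minimal sketch computes Info(S) and Gain(S, T) for a single nominal attribute; the attribute values and labels in the usage lines are made up for illustration. Ranking every feature by this gain and keeping the top ones gives the filter-style selection described above.

```python
import math
from collections import Counter


def entropy(labels):
    """Info(S): expected number of bits needed to identify the class of a record in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Gain(S, T) = Info(S) - sum_i |S_i|/|S| * Info(S_i), with S_i the records sharing value i."""
    n = len(labels)
    partitions = {}
    for v, y in zip(values, labels):
        partitions.setdefault(v, []).append(y)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


labels = ["attack", "attack", "normal", "normal"]
print(information_gain(["icmp", "icmp", "tcp", "tcp"], labels))  # 1.0
print(information_gain(["SF", "S0", "SF", "S0"], labels))        # 0.0
```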
Since 1999, KDD’99 (Yimin, 2004) has been the most widely used data set for
the evaluation of anomaly detection methods. This data set is built based on the
network traffic. The two weeks of test data have around 2 million connection
normal or an attack, with exactly one specific attack type. The simulated attacks
makes some computing or memory resource too busy or too full to handle
(2) User to Root Attack (U2R): a class of exploit in which the attacker
starts out with access to a normal user account on the system (perhaps
and is able to exploit some vulnerability to gain root access to the system.
(3) Remote to Local Attack (R2L): occurs when an attacker who has the
ability to send packets to a machine over a network but who does not have
an account on that machine exploits some vulnerability to gain local
each category.
(Figure: workflow of the proposed system: Import Dataset → Data preprocessing → Result)
process. Features may contain false correlations, which hinder the process of
information they add is contained in other features. Extra features can increase
computation time, and can impact the accuracy of IDS. Feature selection
bagging approach to create a collection of decision trees, each trained on a random subset of the
data. It is considered to be one of the most effective algorithms for almost any
prediction task. It can be used for both classification and regression
values of a random vector sampled independently with the same distribution for
The pseudocode for the random forest algorithm can be split into two stages. The first is the creation stage, in
which ‘n’ random trees are created to form the random forest. In the second
stage, the outputs of all the decision trees for the same test features are combined.
The final prediction is then derived either by aggregating the results of the individual decision trees
or simply by taking the prediction that appears most often among the decision trees.
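The two stages can be sketched in plain Python. To stay short, each tree here is a depth-1 decision stump rather than a fully grown tree, so the per-node feature sampling of a real random forest reduces to sampling features once per tree. The sqrt(d) feature count, 25 trees and fixed seed are assumptions of this sketch, not values from this project.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature_ids):
    """Depth-1 tree: the (feature, threshold) among `feature_ids` with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def train_forest(rows, labels, n_trees=25, seed=0):
    """Stage 1: build `n_trees` trees, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    d = len(rows[0])
    n_features = max(1, int(d ** 0.5))  # sqrt(d) features per tree, a common default
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(len(rows)) for _ in rows]  # bagging: draw with replacement
        features = rng.sample(range(d), n_features)
        forest.append(train_stump([rows[i] for i in sample], [labels[i] for i in sample], features))
    return forest


def predict(forest, row):
    """Stage 2: combine every tree's output and return the most frequent prediction."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]


rows = [(1, 10), (2, 12), (3, 11), (20, 100), (22, 110), (21, 105)]
labels = ["normal"] * 3 + ["attack"] * 3
forest = train_forest(rows, labels)
print(predict(forest, (1, 10)), predict(forest, (22, 110)))  # normal attack
```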
The random forest algorithm maintains accuracy even when the data are inconsistent, and it
is simple to use. It also gives estimates of which variables
are important for the classification. It runs efficiently on large databases while
provides methods for balancing error in class population unbalanced data sets but
another drawback that is, it does not predict beyond the range of the response
In general, the more trees in the forest, the more robust the forest. In
the same way, in the random forest classifier, the higher the number of trees in
      if Xt ∈ A then
        UpdateEstimationStatistics(A, (Xt, Yt))
      end if
    end for
  end for
else if It = structure then
  if At has fewer than m candidate split points then
WEKA as J48 using Java. All of them adopt a greedy, top-down approach to
decision tree making. It is used for classification in which new data is labelled
induction begins with a dataset (training set) which is partitioned at every node
attributes is also passed. An object can be an event or an activity, and the attributes are
the information related to that object. Every tuple in the data set is associated with a
class label, which identifies whether an object belongs to a particular class or not.
A node can be split further only if its tuples fall in different classes. The
partition of the dataset uses heuristics that choose an attribute that best partitions a
selection measures are responsible for the type of branching that occurs on a
node. Gini index and information gain are some examples which partition a node into
trees into strict binary can be used if need be. C4.5 uses gain ratio as the attribute
selection measure, which has an advantage over the information gain used in its
predecessor ID3. Information gain favours attributes with many distinct values: if the attribute on
which the data are partitioned has a unique value for every record, ID3 produces an n-ary
split with one branch per record, which cannot be used for classification.
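A small worked example shows why gain ratio matters. A hypothetical connection-ID attribute with a unique value per record and a protocol attribute can have equal information gain, but gain ratio divides by the split information (the entropy of the partition sizes) and so prefers the attribute that generalises.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_and_ratio(values, labels):
    """Return (information gain, gain ratio) of splitting `labels` on attribute `values`."""
    n = len(labels)
    parts = {}
    for v, y in zip(values, labels):
        parts.setdefault(v, []).append(y)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = entropy(values)  # SplitInfo: entropy of the partition sizes themselves
    return gain, (gain / split_info if split_info > 0 else 0.0)


labels = ["attack", "attack", "normal", "normal"]
print(gain_and_ratio([1, 2, 3, 4], labels))                    # (1.0, 0.5)
print(gain_and_ratio(["icmp", "icmp", "tcp", "tcp"], labels))  # (1.0, 1.0)
```

Both attributes have a gain of 1.0, so information gain alone cannot tell them apart; the unique-valued ID attribute is penalised by its split information of 2 bits.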
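INPUT:
Dataset // Training data
OUTPUT:
Tree // Decision tree
BUILD(*DataSet)
{
  Tree = ∅;
  Tree = Create root node and label it with the splitting attribute;
  Tree = Add arc to root node for each split predicate and label;
  For each arc do
    DataSet' = Dataset created by applying the splitting predicate to DataSet;
    If stopping point reached for this path, then
      Tree' = Create leaf node and label it with the appropriate class;
    Else
      Tree' = BUILD(DataSet');
    Tree = Add Tree' to arc;
}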
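The BUILD procedure can be sketched as a recursive Python function. This is a simplified, C4.5-flavoured sketch under stated assumptions: attributes are nominal, each arc is one attribute value, gain ratio picks the splitting attribute, and the stopping point is a pure node, no remaining attributes, or zero gain ratio. J48 additionally handles numeric thresholds, missing values and pruning, which are omitted here; the records are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Gain ratio of splitting on nominal attribute `attr` (0.0 if the split is trivial)."""
    n = len(rows)
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr], []).append(y)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def build(rows, labels, attrs):
    """Return a class label (leaf) or a node (attr, {value: subtree}, majority_label)."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping point: pure node or no attribute left to split on.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0.0:
        return majority  # no attribute separates the classes any further
    children = {}
    for value in sorted({row[best] for row in rows}):  # one arc per split predicate best == value
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        children[value] = build(
            [rows[i] for i in idx],
            [labels[i] for i in idx],
            [a for a in attrs if a != best],
        )
    return (best, children, majority)


def classify(tree, row):
    while isinstance(tree, tuple):
        attr, children, majority = tree
        tree = children.get(row[attr], majority)  # unseen value: fall back to majority label
    return tree


rows = [
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "udp", "flag": "SF"},
    {"protocol": "icmp", "flag": "SF"},
]
labels = ["normal", "attack", "normal", "attack"]
tree = build(rows, labels, ["protocol", "flag"])
print(classify(tree, {"protocol": "icmp", "flag": "SF"}))  # attack
```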
3.8.2 Pseudocode of J48
Step 1: All the rows in the dataset are passed to the root node.
Step 2: Based on the values of the rows in the node considered, each of the
Step 3: At each split point, the parent node is split into binary nodes (child nodes)
by separating the rows with values lower than or equal to the split point and
values higher than the split point for the considered predictor variable. For
turn.
Step 4: The predictor variable and split point with the highest value of I are selected for the split.
where PL and PR are the probabilities of a sample lying in the left and right
sub-trees respectively, and P(Cj|tL) and P(Cj|tR) are the probabilities that a sample is in class Cj given that it lies in the left or the right sub-tree respectively.
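Steps 3 and 4 can be sketched as follows. The equation defining I did not survive in this copy; the quantities described (PL, PR and the per-class probabilities in each sub-tree) match the twoing criterion, I = (PL·PR/4)·(Σj |P(Cj|tL) − P(Cj|tR)|)², so the sketch assumes that form. Note that WEKA's J48 itself ranks splits by gain ratio rather than twoing. The column names are hypothetical.

```python
def twoing(values, labels, split):
    """Twoing value I for the binary split `value <= split` versus `value > split`."""
    left = [y for v, y in zip(values, labels) if v <= split]
    right = [y for v, y in zip(values, labels) if v > split]
    if not left or not right:
        return 0.0  # a split that sends every row one way separates nothing
    n = len(labels)
    p_l, p_r = len(left) / n, len(right) / n
    diff = sum(abs(left.count(c) / len(left) - right.count(c) / len(right)) for c in set(labels))
    return p_l * p_r / 4 * diff ** 2


def best_binary_split(columns, labels):
    """Steps 3-4: try every split point of every predictor and keep the highest I."""
    best = (0.0, None, None)
    for name, values in columns.items():
        for split in sorted(set(values)):
            score = twoing(values, labels, split)
            if score > best[0]:
                best = (score, name, split)
    return best


columns = {"duration": [0, 1, 2, 8, 9], "src_bytes": [5, 100, 7, 300, 6]}
labels = ["normal", "normal", "normal", "attack", "attack"]
score, predictor, split_point = best_binary_split(columns, labels)
print(predictor, split_point, round(score, 2))  # duration 2 0.24
```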
WEKA is free software released under the GNU (GNU's Not Unix) General Public License. The WEKA workbench contains a
collection of visualization tools and algorithms for data analysis and predictive
modelling, together with graphical user interfaces for easy access to this
functionality. The WEKA toolkit is widely used for machine learning and has
become very popular with academic and industrial researchers, and is also widely
used for teaching purposes. To use WEKA, the collected data need to be prepared
and converted to CSV file format to be compatible with the WEKA data mining
toolkit.
3.10 System Configuration
- RAM - 4 GB (min)
- Hard Disk - 20 GB
Software Configuration
The goal of this simulation is to show how efficiently and effectively the two classification algorithms
are able to detect intrusions. This study used two
performed using the Weka data mining tool. The dataset used in this project is the
KDD dataset. Next, we discuss the data used to train and test the
following steps: data sets, data mining tools and performance measurement terms.
The dataset used in this research is the KDD dataset. KDD is a data set suggested
to solve some of the inherent problems of the KDD Cup'99 data set. It is basically
a processed version of the KDD Cup'99 dataset. This dataset enables researchers
to train their algorithms on the full dataset (because of its smaller number of
records) instead of using a portion of the full dataset, as in the case of the KDD
Cup'99 data set.
The experiments were done using Weka 3.6.7. Weka (Waikato Environment for
system, with 2 GB of RAM and a Pentium (R) Dual-core CPU at 2.20 GHz per core.
Due to the iterative nature of the experiments and the resulting processing power
required, the Java heap size for Weka 3.6.7 was set to 1024 MB.
To assess the effectiveness of the algorithms, each one of them was trained on the
KDD data set using the ten-fold cross-validation test mode in Weka (Waikato
algorithms we use 10-fold cross-validation. In this process the data set is divided
into 10 subsets. Each time, one of the 10 subsets is used as the test set and the
other k − 1 = 9 subsets form the training set. Performance statistics are calculated
across all 10 trials. This provides a good indication of how well the classifier will
instances show the percentage of test instances that were correctly and
agreement between the classifications and the true classes. It is calculated
greater than 0 means that the classifier is doing better than chance.
(3) Mean Absolute Error, Root Mean Squared Error, Relative Absolute
Error: The error rates are used for numeric prediction rather than
the error has a magnitude, and these measures reflect that. Detection of
(i) True positive (TP): corresponds to the number of detected attacks that
are in fact attacks.
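The measures above can be computed from a confusion matrix. A minimal sketch, assuming two classes with "attack" as the positive class: TP rate is the detection rate and FP rate the false alarm rate reported in this study, and kappa compares the observed accuracy with the agreement expected by chance from the row and column totals. The label values are illustrative.

```python
def detection_metrics(y_true, y_pred, positive="attack"):
    """Binary confusion-matrix measures with `positive` as the attack class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    n = len(pairs)
    tp_rate = tp / (tp + fn) if tp + fn else 0.0  # detection rate (recall)
    fp_rate = fp / (fp + tn) if fp + tn else 0.0  # false alarm rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = 2 * precision * tp_rate / (precision + tp_rate) if precision + tp_rate else 0.0
    accuracy = (tp + tn) / n
    # Kappa: agreement beyond what the row/column totals would produce by chance.
    chance = ((tp + fn) / n) * ((tp + fp) / n) + ((tn + fp) / n) * ((tn + fn) / n)
    kappa = (accuracy - chance) / (1 - chance) if chance < 1 else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn, "tp_rate": tp_rate,
            "fp_rate": fp_rate, "precision": precision, "f_measure": f_measure,
            "accuracy": accuracy, "kappa": kappa}


y_true = ["attack"] * 4 + ["normal"] * 6
y_pred = ["attack", "attack", "attack", "normal", "attack"] + ["normal"] * 5
m = detection_metrics(y_true, y_pred)
print(m["tp_rate"], round(m["fp_rate"], 3), round(m["kappa"], 3))  # 0.75 0.167 0.583
```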
not applicable due to several specific details that include dealing with skewed
class distribution, learning from data streams and labeling network connections.
very apparent, since intrusion, the class of interest, is much smaller (i.e., rarer) than
the class representing normal network behavior. In such scenarios, where
normal behavior may typically represent 98-99% of the entire population, a trivial
classifier that labels everything with the majority class can achieve 98-99%
accuracy.
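The arithmetic behind this warning is easy to reproduce. With a hypothetical set of 99 normal connections and one attack, a classifier that always answers with the majority class scores 99% accuracy while detecting none of the attacks.

```python
from collections import Counter


def majority_baseline(y_true):
    """Predict the most common class for every record; return (accuracy, detection rate)."""
    majority = Counter(y_true).most_common(1)[0][0]
    y_pred = [majority] * len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    attack_preds = [p for t, p in zip(y_true, y_pred) if t == "attack"]
    detection_rate = (
        sum(p == "attack" for p in attack_preds) / len(attack_preds) if attack_preds else 0.0
    )
    return accuracy, detection_rate


labels = ["normal"] * 99 + ["attack"]
print(majority_baseline(labels))  # (0.99, 0.0)
```

This is why the study reports detection rate and false alarm rate alongside accuracy.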
10-fold cross-validation is used in machine learning to estimate
how accurately a learning algorithm will be able to predict data that it was not
trained on.
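Ten-fold cross-validation can be sketched as follows. The round-robin fold assignment and the pluggable train_fn/predict_fn callables are assumptions of this sketch; unlike Weka's default, the folds here are not stratified by class.

```python
def k_fold_indices(n, k=10):
    """Yield (train_idx, test_idx) pairs; each record lands in exactly one test fold."""
    folds = [list(range(i, n, k)) for i in range(k)]  # round-robin assignment to folds
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def cross_validate(rows, labels, train_fn, predict_fn, k=10):
    """Accuracy pooled over k folds: train on k-1 folds, test on the held-out one."""
    correct = 0
    for train, test in k_fold_indices(len(rows), k):
        model = train_fn([rows[j] for j in train], [labels[j] for j in train])
        correct += sum(predict_fn(model, rows[j]) == labels[j] for j in test)
    return correct / len(rows)


# Usage with a trivial majority-class learner standing in for RF or J48.
rows = list(range(20))
labels = ["normal"] * 20
acc = cross_validate(rows, labels, lambda X, y: max(set(y), key=y.count), lambda m, r: m)
print(acc)  # 1.0
```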
Figure 4.1. KDD dataset loaded into the model
Figure 4.2. J48 Classifier error
Figure 4.3. J48 classifier tree generated
Figure 4.4 J48 Margin curve
Figure 4.5 J48 Threshold curve
Figure 4.6 J48 cost curve
Figure 4.7 J48 Cost benefit analysis curve
Figure 4.8. J48 classifier performance
As can be seen from Table 4.2, RF and J48 have the same TP rate, precision,
recall and F-measure. RF has the higher ROC area. J48 has the higher FP
Figure 4.12 reveals that RF outperformed J48 in terms of ROC area while
they have the same TP rate, recall and F-measure. RF also performed better in terms
of FP rate, with the same TP rate, precision and F-measure. Both proposed
algorithms achieved a detection accuracy of 99%, with a false alarm rate
of 4% for RF and 6% for J48. This revealed that RF performed
Finally, the conclusions of the study and the summary of the results obtained from
all the experiments carried out using the RF and J48 algorithms for anomaly
In today’s day and age, the prevention of security breaches with the help of
very important feature in network security. Furthermore, misuse detection
methods are unable to detect unknown attacks; hence, anomaly detection
needs to be used for identifying such attacks. The data mining technique is
In this project, the J48 and RF classification algorithms were developed and proposed
for intrusion and anomaly detection. The proposed
algorithms performed well, with RF performing better than J48.
This new method was very effective for detection of many attacks and showed
5.2 CONCLUSION
detection system
using data mining and machine learning algorithms. In this project, two decision
tree algorithms, RF and J48, were chosen to determine which of the two algorithms
is more accurate and efficient. The intrusion detection system was implemented
using the well-known J48 and RF machine learning algorithms. The results obtained
from the experiments revealed that RF performs well
algorithm. The detection rate of the RF algorithm was 99%, and J48
also achieved a 99% detection rate, but the false alarm rate was 4% for RF compared with 6% for
J48.
5.3 RECOMMENDATION
Finally, for future work, the intrusion detection accuracy rate of the IDS and the
implemented in real network environments. Future work can also further
explore the features of the J48 algorithm and improve the split value and the
technique.
may be developed.
Abstract
Nowadays it is very important to maintain a high level of security to ensure safe and
data communication over the internet and any other network is always under threat of
protect network-based systems, but there are still many undetected intrusions.
However, over the past years, a growing number of research projects have applied
Many of these approaches resulted in high detection rates and accuracy, but the
majority of them encounter a high false alarm rate, which is a result of falsely
area is in desperate need of focusing not only on the detection rate and
accuracy but also on a way to remove noise from the dataset while retaining its
value and false alarm rate to properly identify such intrusions. This project
presents classification algorithms based on Random Forest (RF) and J48. The first stage
of the process is data preprocessing based on feature selection with Principal Component
Analysis (PCA) before network classification with RF and J48. The KDDCup 99
data set is used for the experiment and the experiment was performed in WEKA. The
results showed that RF gave a detection accuracy of 99% with a 4% false
alarm rate. J48 gave a detection accuracy of 99% with a 6% false alarm rate.
Experimental results showed that the RF algorithm has higher detection rate with low
Agrawal, R., Imielinski, T., and Swami, A. (1993). Mining Associations between
207–216.
Ajayi, A., Idowu, S. A., and Anyaehie, A. (2013). Comparative study of selected
Books, NJ.
detection”, ACM Trans. Information and System Security 3 (3), pp. 186–205.
Barbarà, D., Couto, J., Jajodia, S., Popyack, L., and Wu, N., ADAM: Testbed for
Berry, M. J. A. and Linoff, G. (1997). Data Mining Techniques. John Wiley and
Sons, Inc.
Publications.
Chen, W.H., Hsu, S.H., and Shen, H.P. (2005). Application of SVM and ANN for
Chittur, A., “Model generation for an intrusion detection system using genetic
Wesley
Crosbie, M. and Spafford, E. H., “Active defense of a computer system using
D’silva, M., Deepali, V., (2013) “Comparative Study of Data Mining Techniques
February 2013.
2000.
Didaci, L., Giacinto, A. & Roli, F. (2002). “Ensemble learning for intrusion
ICIQ-03, 2008.
Bloedorn, E., et al., “Data Mining for Network Intrusion Detection: How to Get Started,”
Eskin, E., Arnold, A., Prerau, M., Portnoy, L., and Stolfo, S. J., A Geometric
Fayyad, U. M., Piatetsky-Shapiro, G., Smyth, P., and Uthurusamy, R., editors
Press/MIT Press.
Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. (1996a). From Data Mining to
Klir, G. J., “Fuzzy arithmetic with requisite constraints”, Fuzzy Sets and Systems,
91:165–175, 1997.
Han, J. and Kamber, M. (2000). Data Mining: Concepts and Techniques, Morgan
Kaufmann Publisher.
October 2012.
October, 2013.
Jaiganesh, V., Mangayarkarasi, S., and Sumathi, P., (2013) “Intrusion Detection
Detection Technologies.
http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html, December
2009.
Krzystof, G., and Nobert, J., (2012). “Feature selection with Decision Tree
Criterion”. 2012.
Landwehr, C. E., Bull, A. R., McDermott, J. P., and Choi, W. S. (1994). A taxonomy
vol. 26, no. 3, pp. 211–254, 1994.
Lafayette, IN.
Lee, W (1999). A Data Mining Framework for Constructing Features and Models
Lee, W., Stolfo, S.J. & Mok, K.W. (1999). “Mining in a data-flow environment:
Discovery and Data Mining (KDD-99) (pp. 114-124), San Diego, CA:
ACM,
1998.
Lee, W., Stolfo, S. J., et al., “A data mining and CIDF based approach for detecting
France.
Discovery and Data Mining (KDD-99), San Diego, CA, pp. 114–124.
Lunt, T. F. (1989). Real-Time Intrusion Detection. Proceedings from IEEE
COMPCON.
Mannila, H., Smyth, P., and Hand, D. J. (2001). Principles of Data Mining. MIT
Discovery, :259-289.
Namur (Belgium).
Mukkamala, S., Janoski, G., and Sung, A. (2002). Intrusion Detection Using Neural
Mukkamala, S., Sung, A. H., and Abraham, A. (2003). Intrusion detection using
48.
42.
Mukkamala, S., Sung, A. H., Abraham, A., and Ramos, V. (2004b). Intrusion detection
2004b, pp. 26–33 [ISBN: 972-8865-00-7].
Narayana, M.S., Prasad, B. V. V. S., Srividhya, A., Pandu Ranga R.K., (Issue 6,
Neri, F., “Comparing local search with respect to genetic evolution to detect
Noel, S., Wijesekera, D., and Youman, C., Modern Intrusion Detection, Data
Patel, H., Sarkhedi, B., and Vaghamshi, H. (2013). “Intrusion Detection in Data
2013.
Patel, R., Thakkar, A., and Ganatra, A. (2012). “A Survey and Comparative Analysis
resources/idfaq/data_mining.php
Shyu, M., Chen, S., Sarinnapakorn, K. and Chang, L. (2003). A novel Anomaly
Hill; 1997.
1996;2(4).
Vapnik, V. N. (1995). The Nature of Statistical Learning Theory, Springer,
1995.
Zhao, J., Chen, M., and Lou, Q. (2011). Research of intrusion detection system