3 views

Uploaded by Ahmar Taufik Safaat

Farid - Hybrid Decision Tree and Naïve Bayes Classifiers for Multi-class - 2014

save

- 10. Data mining and data warehouse.pdf
- Ijitce June 2011
- Machine Learning Revision Notes
- computer graphics Q & A
- Decision Trees
- encog-3_3-quickstart.pdf
- Ge Presentation(1)
- Predictive Modelling of Crime Dataset Using Data Mining
- fvcnn.pdf
- tutorial
- u07a1 PROBABILITY MODELING Hal Hagood .docx
- T03a.BinaryTreeTheorems
- Data Structure and Algorithms
- Paper on Data Mining
- First Task 2
- CS301-Lec12 handout.doc
- GeeksforGeeks placement questions
- TournamentTrees.pdf
- Synchronization Barriers
- UGC Computer Science Paper 2 Solved December 2012
- Greedy Part 2
- EAP Menggunakan Zachman Framework (Studi Kasus PT. Jasaraharja Putera Cabang Pekanbaru)
- jurnal_15299
- Bab 1 Pengertian Data Mining
- Aplikasi Enkripsi Pesan SMS Dengan Algoritma Kriptografi Block Chiper DES Berbasis Android
- Modul Praktikum SIstem Digital
- Analisa dan Perancangan Sistem Informasi Penerimaan Siswa Baru dan Pembayaran SPP Menggunakan Zachman Framework.pdf
- Modul Praktikum Algoritma Dan Struktur Data
- A Simple Dynamic Mind-map Framework to Discover Associative Relationships in Transactional Data Streams
- Franklin County Hurricane Irma Information as of noon on September 9th
- FEMA - High Water Mark Collection for Hurricane Katrina in Alabama
- 01 Meteorologia Basica Resumo - avlan.net (1).pdf
- CHEM 335 Aldol Reaction
- Nerzic 2007 Wind:Waves:Current
- Bayesian Computation for the Prediction of Football Match Using Artificial Neural Network (1)
- Sage X3 - User Guide - HTG-SmartForecast V8 Setup.pdf
- Chapter 01- Essentials of Geography
- Army Aviation Digest - Feb 1963
- Course Project C
- 2011 NIPCC Interim Report
- TRANSMISSION LINE INSULATION
- Design of Flexible Pavements
- 98308348-Complete-PET-Workbook.pdf
- below - lesson 3
- Decomposition Method
- Wind Thesis2
- 1986 Paterson
- Safety First 24
- Q1
- Actividad Integradora Podemos Saberlo M17S3
- Chapter 09 - Forecasting Exchange Rate
- Inhofe's Anti-global Warming Report
- Flugwetterinformationen 2007 Nov d
- Heat Flow Basics
- masas de aire.docx
- unit 2 social science 4º primaria.pdf
- Time Seriesforcasting and Index Number
- 1 Ano B. Exercicio Revisao. 1
- J. 2008 Indoor Air

You are on page 1of 10

Contents lists available at ScienceDirect

**Expert Systems with Applications
**

journal homepage: www.elsevier.com/locate/eswa

**Hybrid decision tree and naïve Bayes classiﬁers for multi-class
**

classiﬁcation tasks

Dewan Md. Farid a, Li Zhang a,⇑, Chowdhury Moﬁzur Rahman b, M.A. Hossain a, Rebecca Strachan a

a

Computational Intelligence Group, Department of Computer Science and Digital Technology, Northumbria University, Newcastle upon Tyne, UK

b

Department of Computer Science & Engineering, United International University, Bangladesh

a r t i c l e i n f o a b s t r a c t

**Keywords: In this paper, we introduce two independent hybrid mining algorithms to improve the classiﬁcation
**

Data mining accuracy rates of decision tree (DT) and naïve Bayes (NB) classiﬁers for the classiﬁcation of multi-class

Classiﬁcation problems. Both DT and NB classiﬁers are useful, efﬁcient and commonly used for solving classiﬁcation

Hybrid problems in data mining. Since the presence of noisy contradictory instances in the training set may

Decision tree

cause the generated decision tree suffers from overﬁtting and its accuracy may decrease, in our ﬁrst

Naïve Bayes classiﬁer

proposed hybrid DT algorithm, we employ a naïve Bayes (NB) classiﬁer to remove the noisy troublesome

instances from the training set before the DT induction. Moreover, it is extremely computationally expen-

sive for a NB classiﬁer to compute class conditional independence for a dataset with high dimensional

attributes. Thus, in the second proposed hybrid NB classiﬁer, we employ a DT induction to select a

comparatively more important subset of attributes for the production of naïve assumption of class con-

ditional independence. We tested the performances of the two proposed hybrid algorithms against those

of the existing DT and NB classiﬁers respectively using the classiﬁcation accuracy, precision, sensitivity-

speciﬁcity analysis, and 10-fold cross validation on 10 real benchmark datasets from UCI (University of

California, Irvine) machine learning repository. The experimental results indicate that the proposed

methods have produced impressive results in the classiﬁcation of real life challenging multi-class prob-

lems. They are also able to automatically extract the most valuable training datasets and identify the

most effective attributes for the description of instances from noisy complex training databases with

large dimensions of attributes.

Ó 2013 Elsevier Ltd. All rights reserved.

**1. Introduction construction of overﬁtting or fragile classiﬁers. Thus, data prepro-
**

cessing techniques are needed, where the data are prepared for

During the past decade, a sufﬁcient number of data mining mining. It can improve the quality of the data, thereby helping to

algorithms have been proposed by the computational intelligence improve the accuracy and efﬁciency of the mining process. There

researchers for solving real world classiﬁcation and clustering are a number of data preprocessing techniques available such as

problems (Farid et al., 2013; Liao, Chu, & Hsiao, 2012; Ngai, Xiu, (a) data cleaning: removal of noisy data, (b) data integration:

& Chau, 2009). Generally, classiﬁcation is a data mining function merging data from multiple sources, (c) data transformations: nor-

that describes and distinguishes data classes or concepts. The goal malization of data, and (d) data reduction: reducing the data size

of classiﬁcation is to accurately predict class labels of instances by aggregating and eliminating redundant features.

whose attribute values are known, but class values are unknown. This paper presents two independent hybrid algorithms for

Clustering is the task of grouping a set of instances in such a way scaling up the classiﬁcation accuracy of decision tree (DT) and

that instances within a cluster have high similarities in comparison naïve Bayes (NB) classiﬁers in multi-class classiﬁcation problems.

to one another, but are very dissimilar to instances in other clus- DT is a classiﬁcation tool commonly used in data mining tasks such

ters. It analyzes instances without consulting a known class label. as ID3 (Quinlan, 1986), ID4 (Utgoff, 1989), ID5 (Utgoff, 1988), C4.5

The instances are clustered based on the principle of maximizing (Quinlan, 1993), C5.0 (Bujlow, Riaz, & Pedersen, 2012), and CART

the intraclass similarity and minimizing the interclass similarity. (Breiman, Friedman, Stone, & Olshen, 1984). The goal of DT is to

The performance of data mining algorithms in most cases depends create a model that predicts the value of a target class for an un-

on dataset quality, since low-quality training data may lead to the seen test instance based on several input features (Loh & Shih,

1997; Safavian & Landgrebe, 1991; Turney, 1995). Amongst other

⇑ Corresponding author. Tel.: +44 191 243 7089.

data mining methods, DTs have various advantages: (a) simple to

E-mail address: li.zhang@northumbria.ac.uk (L. Zhang).

understand, (b) easy to implement, (c) requiring little prior

**0957-4174/$ - see front matter Ó 2013 Elsevier Ltd. All rights reserved.
**

http://dx.doi.org/10.1016/j.eswa.2013.08.089

Section 6 concludes accuracy. DTFS ex- missing attribute values. It is also an important mining classi. The inﬂuential of instances from a large amount of noisy training data with high factor of attributes was found in their work. (b) strong (naïve) independence assumptions. Carrasco-Ochoa. Their one-against-all method constructed M of class conditional independence.5 decision tree classiﬁer and a one-against-all meth- most important attributes selected by DT with their corresponding od to improve the classiﬁcation accuracy for multi-class classiﬁca- weights are employed for the calculation of the naïve assumption tion problems. each of which sep- of the proposed hybrid algorithms against those of existing DT arated one class from all of the rest. The ith C4. 2010). then DTFS updated the va- to use. The G-FDT tree used the Gini Index as the split measure In this section. (2009) proposed a method to resolve one of the exceptions in basic ically extract the most representative high quality training datasets decision tree induction algorithms when the class prediction at a and identify the most important attributes for the characterization leaf node cannot be determined by majority voting. ﬁer for data mining and applied in many real world classiﬁcation DTFS expended or updated the leaf node according to the class la- problems because of its high classiﬁcation performance.1. such as the bagging approach of a DT classiﬁer and a categorize the training dataset. presents our proposed two hybrid algorithms for the multi-class Chen and Hung (2009) presented an associative classiﬁcation classiﬁcation problems respectively based on DT and NB classiﬁers. (c) handling lue of the input tree branch of this leaf node. fast splitting attribute selection (DTFS). 2010). the NB classiﬁer also has several advantages such as (a) easy training instances from only one class. sensitivity-speciﬁcity analysis. dependability of the attribute value on the class label. it is also noted that to com. the number of instances stored in a leaf node reached its limit. If the leaf node of DT contained to DT. we review recent research on decision trees and to choose the most appropriate splitting attribute for each node naïve Bayes classiﬁers for various real world multi-class classiﬁca- in the decision tree. Sanchez. Chandra and Varghese (2009) proposed a fuzzy decision tree Gini Index based (G-FDT) algorithm to fuzzify the decision boundary 2. Otherwise. bute with the highest gain measure as the splitting attribute. DT may suffer found similarities between instances using clustering algorithms from overﬁtting due to the presence of such noisy instances and and also selected target attributes. (b) only one scan of the training data required. repository (Frank & Asuncion. Gini Index was computed using the fuzzy-membership values of the attribute corresponding to a split value and fuzzy-membership 2. Otherwise. The benchmark datasets from UCI (University of California. Moreover. for classifying instances in large datasets with a large number of where a large number of variables in datasets were being consid- variables. When a threshold for the pute class conditional independence using a NB classiﬁer is extre. When classifying new instances overview of the work related to DT and NB classiﬁers. tree (ACT) that combined the advantages of both associative clas- Section 5 provides experimental results and a comparison against siﬁcation and decision trees. Related work without converting the numeric attributes into fuzzy linguistic terms. Decision trees values of the instances.5 decision tree clas- and NB classiﬁers using the classiﬁcation accuracy. to improve the classiﬁ- superﬂuous nodes and branches are removed in order to improve cation accuracy. It the learning tree for decision making. A naïve DTs for large datasets. an algorithm for building (e) robust. In order to avoid storing all the instances into the DT. which gave the dimensional attributes. Rahman. 2010). The experi. and (f) dealing with large and noisy datasets. mental way. Section 3 using this pruned tree. The ﬁrst proposed hybrid DT algorithm ﬁnds the trouble. most crucial subset of attributes using a DT induction. duced a decision tree construction method based on adjusted clus- moves these instances from the training set before constructing ter analysis classiﬁcation called classiﬁcation by clustering (CbC). the tion problems. siﬁer was trained with all the training instances of the ith class sitivity–speciﬁcity analysis. (d) able to handle both numerical and categorical data. Irvine) ma. for a DT classiﬁer and a NB classiﬁer for multi-class classiﬁcation DTFS always considered a small number of instances in the main tasks. & Rahman. and 10-fold cross validation on 10 real with positive labels and all the others with negative labels. memory for the building of a DT. For the construction of the decision tree. ACT followed a simple heuristic which selected the attri- the ﬁndings and proposes directions for future work. Section 2 gives an was pruned based on this factor. number of instances stored in a cluster was reached.1938 D. and (c) independent feature models (Farid. sen. / Expert Systems with Applications 41 (2014) 1937–1946 knowledge. and 10-fold cross mental results prove that the proposed methods have produced validation on three datasets taken from the UCI machine learning very promising results in the classiﬁcation of real world challeng. Farid et al. They proposed a novel combination of DTs and evolutionary decision trees: (a) the growth of the tree to enable it to accurately methods. Our second proposed hybrid NB algorithm ﬁnds the ate value of the target attribute. cation accuracy. The split-points were chosen as the mid- points of attribute values where the class information changed. The weights Polat and Gunes (2009) proposed a hybrid classiﬁcation system of the selected attributes by DT are also calculated. whereby back-propagation neural network method. DTFS used this attribute selection technique Bayes (NB) classiﬁer is a simple probabilistic classiﬁer based on: to expand nodes and process all the training instances in an incre- (a) Bayes theorem. When 2011. Balamurugan and Rajaram ing multi-class problems. Franco-Arcega. Aviad and Roy (2011) also intro- some instances in the training data using a NB classiﬁer and re. performance of this hybrid classiﬁer was tested using the classiﬁ- chine learning repository (Frank & Asuncion. The ACT tree was built using a set of existing DT and NB algorithms using 10 real benchmark datasets associative classiﬁcation rules with high classiﬁcation predictive from UCI machine learning repository. DTFS stored at most N number of instances in a leaf node. Decision tree classiﬁcation provides a rapid and useful solution Aitkenhead (2008) presented a co-evolving decision tree method. Then it calculated the target its accuracy may decrease. In this approach.Md. There are two common issues for the construction of ered. all the in- mely computationally expensive for a dataset with many stances in each cluster were classiﬁed with respect to the appropri- attributes. Section 4 rately than the basic assignment by traditional DT algorithms. We evaluate the performances number of binary C4. panded the leaf node by choosing a splitting attribute and created In this paper. tree and also handled comparatively a wider range of values and Diaz. and (b) the pruning stage. The DT The rest of the paper is organized as follows. Then only these based on a C4. These methods also allow us to automat. Such methods evolved the structure of a decision classiﬁcation accuracy.5 decision tree classiﬁers. Finally. the class labels can be assigned more accu- introduces the basic DT and NB classiﬁcation techniques. 2010. precision. we propose two hybrid algorithms respectively one branch for each class of the stored instances. Similar bels of the instances stored in it. and Martinez-Trinidad (2011) presented decision trees using data types. Lee & Isa. and (d) continuous data. attributes distribution for each cluster. .

InfoA ðDÞ ¼ InfoðDj Þ ð2Þ jDj conditional independent component analysis (PC-ICA) method for j¼1 naïve Bayes classiﬁcation in microarray data analysis. pj. (3).5 classiﬁer is a successor of ID3 method called extended naïve Bayes (ENB) for the classiﬁcation classiﬁer (Quinlan. socioeconomic status and experience) and performance probability that xi 2 D. the expected information needed to correctly model was tested using socio-demographic (age. Di A subset of a training dataset T A decision tree The HNB method was based on the idea of creating another layer that represented a hidden parent of each attribute. partition. gender. Fan. Koc. Aij An attribute’s value Wi The weight of attribute Ai tacks. D. The goal of DT is to iteratively and over-ﬁtting for the classiﬁcation of gene expression datasets. Tavallaee. xi 2 D. and Ruz (2012) also presented an approach selection measure (Quinlan. where all instances in R-NBC used logarithms of probabilities rather than multiplying each Di belong to the same class. . In Eq. Jamain & Hand. xi 2 D. The HNB classiﬁer Ci A class label was an extended version of a basic NB classier. / Expert Systems with Applications 41 (2014) 1937–1946 1939 2. Poh. A. & Ghor. (1). The results showed that the socio-demographic attributes were not suitable for predicting sale X m InfoðDÞ ¼ pi log 2 ðpi Þ ð1Þ performances. Farid et al. When handling numeric attributes. effective contacts and ﬁnished records) jCi. Given a exclusively to sales and telemarketing based on a NB classiﬁer. training dataset. ENB used a normal NB algorithm to calcu- gously with Info(D) as shown in Eq. The root node of DT is chosen to predict the performance of sales agents of a call centre dedicated based on the highest information gain of the attribute.e. xi. underﬂow to identify Ci of an instance. from D based estimate approach for providing solutions to over-ﬁtting problems. Valle. Dichotomiser). error rate and the training datasets into smaller subsets until all the subsets be- misclassiﬁcation cost compared with the traditional NB classiﬁer on long to a single class. on the partitioning by attributes. In this section. The mixed types of data included categor- information gain using a ‘‘split information’’ value deﬁned analo- ical and numeric data. and Zhou (2010) proposed a partition. (1). and is estimated by (logged hours.Dj/jDj. . Classiﬁcation is one of the most popular data mining techniques GainðAÞ GainRatioðAÞ ¼ ð5Þ that can be used for intelligent decision making. It applies a kind of normalization to of mixed types of data. Huang. Generally. Hsu. Table 1 summarizes the most com- quency of class Cj 2 D. 1986). each DT is a rule set. ent partitions so that independent component analysis (ICA) can be done within each partition. The naïve Bayes classiﬁer is also widely used for classiﬁcation Symbol Term problems in data mining and machine learning ﬁelds because of xi A data point or instance its simplicity and impressive classiﬁcation accuracy. monly used symbols and terms throughout the paper. (4). Eq. which uses information theory as its attribute bani. Naïve Bayes classiﬁers Table 1 Commonly used symbols and terms. we SplitInfoðAÞ discuss some basic techniques for data classiﬁcation using decision Eq. (2) shows InfoA(D) jD j It did not require any prior feature selection approaches in the ﬁeld calculation. belongs to a class. (7). It relaxed the condi. DT recursively partitions a superior overall performance in terms of accuracy. late the probabilities of categorical attributes. This training dataset. X m GiniðDÞ ¼ 1 p2j ð6Þ 3. InfoA(D) is the expected infor- probabilities to handle the underﬂow problem and employed an mation required to correctly classify an instance. The attribute with the maximum Gain Ratio is selected as the 3. which is deﬁned in Eq. . Chandra and Gupta (2011) proposed a robust naïve Bayes clas. Info(D) is the average amount of information needed siﬁer (R-NBC) to overcome two major limitations i. D. where pi is the status. D A training dataset tional independence assumption imposed on the basic NB classiﬁer. Ci. PC-ICA spited the small-size data samples into differ- is shown in Eq. talked hours. D. information as attributes of individuals.Md. .. Ci. is given in Eq. 2009. D2. marital classify an instance. PC-ICA also attempted to do ICA-based GainðAÞ ¼ InfoðDÞ InfoA ðDÞ ð3Þ feature extraction within each partition that may consist of several The Gain Ratio is an extension to the information gain approach. Supervised classiﬁcation splitting attribute. 2000. It further ex- Information gain is deﬁned as the difference between the tended the class-conditional independent component analysis (CC- original information requirement and the new requirement that ICA) method. which provides a rapid and effective method for classifying Eq. (6) deﬁnes the gini for a dataset. Decision tree induction j¼1 A decision tree classiﬁer is typically a top-down greedy ap. Varas. (5). Bagheri. A C4. and Chang (2008) presented a classiﬁcation also used in DT such as C4. Dn}. The common DT algorithm is ID3 (Iterative the KDD 99 dataset (McHugh. Lu.5. The goodness of a split of D into subsets D1 and D2 is deﬁned by proach. The HNB multiclass classiﬁcation model exhibited 2008). . The inﬂuences from all of the other attributes can thus be easily combined through conditional probabilities by estimating the attributes from the data instances (Chandra & Paul Varghese. where jDjj acts as the weight of the jth partition. It especially signiﬁcantly improved the accuracy for the C Total number of classes in a training dataset detection of denial-of-services (DoS) attacks. X A set of instances and Sarkani (2012) applied a hidden naïve Bayes (HNB) classiﬁer to Ai An attribute a network intrusion detection system (NIDS) to classify network at. 2009). but operational records proved to be useful for the i¼1 prediction of the performances of sales agents. into subsets. it adopted the statistical theory to discrete X n jDj j jDj j the numeric attributes into symbols by considering the average S plitInfoA ðDÞ ¼ log 2 ð4Þ j¼1 jDj jDj and variance of numeric values. classes. is the fre- tree and naïve Bayes classiﬁers. 1993).2. where. {D1. of microarray data analysis where a large number of attributes X n jDj j were considered.1. D. Mazzuchi.

(12). the class Ci for which P(XjCi)P(Ci) is the maximum probability. the NB classiﬁer predicts that the instance X belongs to the class Ci. rC i Þ ð11Þ class Ci for which P(CijX) is maximized is called the Maximum 1 ðxpÞ2 2 Posteriori Hypothesis. lC i is the mean and rC i is the standard deviation of PðC i jXÞ ¼ ð8Þ PðXÞ the values of the attribute Ak for all training instances in the class Ci. (9).Md. if and therefore maximize P(XjCi).j – i. PðXjC i Þ ¼ Pðxk jC i Þ ð9Þ k¼1 sponding probabilities for those attributes when calculating the likelihood of membership for each class. Therefore. if and only the classes are equally likely. P(C1) = P(C2) = = P(Cm). . lC i . Sunny Mild Normal Strong Yes Overcast Mild High Strong Yes Probability Value Overcast Hot Normal Weak Yes P(Play = Yes) 9/14 = 0. Moreover. Tables 3 and 4 respectively tabulate the prior probabilities for each class and conditional probabilities for each attribute value gener- Table 2 ated using the playing tennis dataset shown in Table 2. P(xnjCi) can be Given a training dataset. .375 . xn}. that is.. The attribute values can be dis. To compute P(XjCi) in a dataset with many attributes In Eq. the classiﬁer will predict that X belongs sumed to have a Gaussian distribution with a mean l and standard to the class with the highest posterior probability. X2. PðXjC i Þ ¼ Pðx1 jC i Þ Pðx2 jC i Þ Pðxn jC i Þ ð10Þ quires the class conditional independence.. . the attri- represented as. divided by crete or continuous. A2. X. . (13). The NB classiﬁer probabilities are not known. independent of one another. butes in training datasets can be categorical or continuous-valued. in order to estimate P(xkjCi). . we consider a small dataset in Table 2 described by four attributes namely Outlook. has a particular class label If Ak is a continuous-valued attribute. Aih}. rÞ ¼ pﬃﬃﬃﬃﬃﬃﬃ e 2r ð12Þ 2pr PðXjC i ÞPðC i Þ In Eq. x2.Dj/jDj. if and only if P(CijX) > P(CjjX) for 1 6 j 6 m. . 3. X 2 D. l.2. Tian. A NB classiﬁer can easily Y n handle missing attribute values by simply omitting the corre. . Temperature. each data record is easily estimated from the training instances. these probabilities P(x1jCi). 2010). Xn}. (8). Naïve Bayes classiﬁcation tion of class conditional independence is made in order to reduce A naïve Bayes classiﬁer is a simple probabilistic based method. . . In Eq. advantages: (a) easy to use. 1 shows the decision tree model constructed using the playing tennis dataset shown in Table 2. Xi = {x1. then P(xkjCi) is the number attribute values {Ai1. Ak.P(x2jCi). . Humid- ity. . butes {A1. maximize P(XjCi)P(Ci). Each attribute has several unique attribute values. The attributes are conditionally which can predict the class membership probabilities (Chen. 2009. C2. Fig. (11). . . The class prior probabilities are calculated by P(Ci) = jCi. i. is categorical. bute on a given class is independent of those of other attributes. To predict the class label of instance classes. D = {X1. . then Ak is typically as- Ci. An} and each attribute Ai contains the following If the attribute value. given the class label of the instance. If the class prior X. The Play column in Table 2 represents the class category of each instance. together with In Bayes theorem shown in Eq. i. as P(X) is a constant for all xk. It indicates whether a particular weather condition is suitable or not for playing tennis. .e. conditioned deviation r. The proposed hybrid learning algorithms Sunny Hot High Weak No Sunny Hot High Strong No In this paper. Thus. PðXjC i ÞPðC i Þ > PðXjC j ÞPðC j Þ ð13Þ where jCi. Farid et al. the number of instances belonging to the class Ci 2 D. of instances in the class Ci 2 D with the value xk for Ak. gðx. Otherwise. . . The NB classiﬁer also re. the effect of an attri. Cm}. The playing tennis dataset. xk refers to the value of attribute Ak for instance X. For a test instance. The Pðxk jC i Þ ¼ gðxk . Ai2.Dj. Each training instance. Fig. That is the predicted class label is is extremely computationally expensive. A decision tree generated using the playing tennis dataset. only P(XjCi)P(Ci) needs to be maximized. D contains the following attri. D also contains a set of classes C = {C1. then it is commonly assumed that predicts that the class label of instance X is the class Ci. Farid & Rahman.Dj is the number of training instances belonging to the class Ci in D. deﬁned respectively by the following two equations: on X. the split with the best gini value is chosen. and Wind.P(XjCi)P(Ci) is evaluated for each class Ci 2 D.642 Rain Mild High Strong No P(Play = No) 5/14 = 0. . . These proposed algorithms are described in the following Overcast Cool Normal Strong Yes Sunny Mild High Weak No Sunny Cool Normal Weak Yes Table 3 Rain Mild Normal Weak Yes Prior probabilities for each class generated using the playing tennis dataset. Outlook Temperature Humidity Wind Play 4. the naïve assump. we have proposed two independent hybrid Overcast Hot High Weak Yes algorithms respectively for decision tree and naïve Bayes classiﬁers Rain Mild High Weak Yes Rain Cool Normal Weak Yes to improve the classiﬁcation accuracy in multi-class classiﬁcation Rain Cool Normal Strong No tasks. . 1 6 j 6 m and j – i. computation in evaluating P(XjCi). 1. & Qu. / Expert Systems with Applications 41 (2014) 1937–1946 n1 n2 ginisplit ðDÞ ¼ þ ð7Þ nðginiðD1 ÞÞ nðginiðD2 ÞÞ In this way.1940 D. . jCi. Huang. To illustrate the operation of DT. which represent the weather condition of a particu- lar day. . .e.. It has several Thus. Eqs. . and (b) only one scan of the training data required for probability generation. That is. Now we can bring these two quantities to Eq. (9) and (10) are used to produce P(XjCi).

each instance. the number P(Wind = WeakjPlay = No) 2/5 = 0. do attributes to improve efﬁciency. Ai.4 noise free data.222 splitting attribute with the maximum information gain value as P(Temperature = HotjPlay = No) 2/5 = 0. .666 a DT algorithm depends on the size of training dataset. xn}. training dataset. we select the best P(Temperature = HotjPlay = Yes) 2/9 = 0. Cm}. For the decision tree generation. Decision tree.333 The algorithm continues recursively by adding new subtrees to P(Temperature = CooljPlay = No) 1/5 = 0. .444 mined. It is developed based on a basic C4.333 P(Wind = StrongjPlay = No) 3/5 = 0. We have found some in. bility. P(AijjCi). xi2. A decision tree is a classiﬁcation attribute. attributes {A1. . . The time and space complexity of P(Humidity = NormaljPlay = No) 1/5 = 0. . P(AijjCi). and the size of the generated tree. and thus decrease its accuracy. attribute values {Ai1. . We calculate the prior prob- 23: end if ability.3 making using the updated training dataset D with those purely P(Outlook = RainjPlay = No) 2/5 = 0. The result is a 19: if stopping point reached for this path. Then we re- move all the misclassiﬁed training instances from the dataset D. We the integration of a decision tree in order to ﬁnd a subset of attri- calculate the prior and class conditional probabilities using this butes with attribute weighting. instance based on these probabilities. 2: Find the prior probabilities. Each attribute. xi 2 D. for each class.0 P(Outlook = RainjPlay = Yes) 3/9 = 0. it is used to classify each test instance. xn} // Training dataset. P(Ci). .333 in the reduced training set all belong to the same class. x2. .1 and 4. C2. P(Temperature = MildjPlay = No) 2/5 = 0. Each attribute con- indicate that they belong to ‘‘Class = yes’’. each training instance is 12: end for represented as xi = {xi1. . for each attribute value (even if it is numeric) in D. some of these examples either contain contradictory characteristics. This class P(Humidity = HighjPlay = No) 4/5 = 0. An}. Decision tree induction Input: D = {x1. It seems that there is some C2.4 of attributes. Ai2.6 After removing those misclassiﬁed/troublesome instances from P(Outlook = OvercastjPlay = Yes) 4/9 = 0. do noisy training data at an initial stage to avoid overﬁtting. a predicate that can be applied to the attribute associated with the 17: for each arc do parent. Then we calculate the P(ClassjD) for each class determination. . brid DT induction. Algorithm 1 is used to describe the proposed Output: T. A2.2. xi 2 D. The proposed hybrid algorithm for a naïve Bayes classiﬁer blesome training examples. we discuss the proposed Algorithm 1 of the hy- 10: Remove xi from D. . D = {x1.2 P(Wind = WeakjPlay = Yes) 6/9 = 0. The presence of Probability Value such noisy training instances is more likely to cause a DT classiﬁer P(Outlook = SunnyjPlay = Yes) 2/9 = 0. with the highest posterior probability. where each . we ﬁrst apply a basic NB classiﬁer to 22: T0 = DTBuild(D). xih} and D contains the following 13: T = . 21: else For the training dataset. Once the root node of DT has been deter- P(Temperature = MildjPlay = Yes) 4/9 = 0. D = {A1. xi.444 the training dataset. . xih}. Ai2. An}. / Expert Systems with Applications 41 (2014) 1937–1946 1941 Table 4 noise within these data.2. 3: end for It embeds a DT classiﬁer to identify a subset of most important 4: for each attribute value. . The algorithm terminates when the instances P(Humidity = HighjPlay = Yes) 3/9 = 0.2 each branching arc. Aih}.4 P(Temperature = CooljPlay = Yes) 3/9 = 0. 5: Find the class conditional probabilities. P(Wind = StrongjPlay = Yes) 3/9 = 0. . There are two basic steps for the 20: T0 = Create a leaf node and label it with an appropriate development of a DT based application: (a) building the DT from a class. or carry exceptional In this section. xi. The proposed hybrid decision tree algorithm 8: Find the posterior probability. . is built. D. A set of classes C = {C1. . . Thus these misclassiﬁed in- dataset. . .4 the root node of the tree. . x2.- dataset they are labeled as ‘‘Class = no’’. . and (b) applying the DT to a test dataset. D. D. Ai. Ci. (b) each arc labeled with label. There is a set of attributes used to stances where the probabilities calculated using the NB classiﬁer describe the training data. Farid et al. The training data also belong to a 15: T = Create the root node and label it with the splitting set of classes C = {C1. stances are regarded as troublesome examples. Aij 2 D. . 11: end if Given a training dataset. these misclassiﬁed instances tend to be the trou. . xi 2 D. Ci. we present a hybrid naïve Bayes classiﬁer with features. classify each training instance. . which employs a NB classiﬁer to remove any Method: 1: for each class. For example. P(Ci). In our experiments. and (c) each leaf node labeled with a class. Sections 4. Ci 2 D and the class conditional proba- 24: T = Add T0 to arc. xi 2 D. is selected as the ﬁnal classiﬁcation for the instance.1. . . Algorithm 2 is used for the construction of a hybrid NB classiﬁer. The class. . tree associated with D that has the following properties: (a) each 16: T = Add arc to the root node for each split predicate and internal node labeled with an attribute. 4. .666 lines the proposed DT algorithm. .. 25: end for Then we classify each training instance. D. xi. D. hybrid DT induction. Aik}. P(Cijxi) 9: if xi is misclassiﬁed. . using these proba- bilities. A2. . Cn} is also used to label the training instances. Ci 2 D. do In this section. the child nodes and its arcs are created and added to the DT. . do 4. Once the tree 18: D = Dataset created by applying splitting predicate to D. which contains a set of training instances and their associated class labels.6 Algorithm 1. P(Cijxi). However in the training tains attribute values Ai = {Ai1. Suppose there is a training dataset with two classes. . . 6: end for 7: for each training instance.222 to become overﬁtting. which leads to contradictory results in Conditional probabilities for each attribute value calculated using the playing tennis comparison with the original labels.8 is then used to label the corresponding leaf node. which play more important roles in example training dataset. classiﬁcation for that instance. In a given training dataset. we subsequently build a DT for decision P(Outlook = OvercastjPlay = No) 0/5 = 0. . contains values {xi1.5 algorithm. . contains the following 14: Determine best splitting attribute. P(Outlook = SunnyjPlay = No) 3/5 = 0. . . .Md. Algorithm 2 out- P(Humidity = NormaljPlay = Yes) 6/9 = 0. . xi.

For example. then crucial roles in the ﬁnal classiﬁcation. 2. we calculate the prior probability.. sponding weights. by counting how often Ci occurs in D.1. P(Ci). Each dataset is roughly equivalent to a two-dimensional Pðxi jC i Þ ¼ PðAij jC i ÞW i ð14Þ spreadsheet or a database table. First of all. Wi. The weights of the se. fects on class conditional probability calculation as an exponential 3. xi 2 D. d. which ef. proposed hybrid NB algorithm. 26: Find the class conditional probabilities. proposed hybrid decision tree and naïve Bayes classiﬁers. Naïve Bayes classiﬁer 9.1942 D. Wi. For each attribute. Dataset descriptions. we calculate the minimum 19: end if depth. 10. Table 5 Method: 1: T = .. P(Cijxi). 16: Wi = 0. the The performances of both of the proposed hybrid decision tree weights of the selected attributes also inﬂuence their class condi- and naïve Bayes algorithms are tested on 10 real benchmark data- tional probability calculation as exponential parameters. If the attri. Ai. Image Segmentation Data (Image seg. Ai 2 T. Wi. 18: d as the minimum depth of Ai 2 T. Moreover. . Breast cancer 9 Nominal 286 2 Contact lenses 4 Nominal 24 3 4: T = Add arc to the root node for each split predicate and Diabetes 8 Real 768 2 label. 19 Real 1500 7 8: T0 = Create a leaf node and label it with an appropriate Tic-Tac-Toe 9 Nominal 958 2 class. 2: Determine the best splitting attribute. do Subsequently. Fitting Contact Lenses Database (Contact lenses). Breast Cancer Data (Breast cancer). Aij 2 xi. from the training dataset.e. Glass Identiﬁcation Database (Glass). That is we use the DT classiﬁer to 14: for each attribute. Wi – 0) and classify each instance xi 2 D using these probabilities. Ci2. (14). in this 10: T0 = DTBuild(D). Att. . we generate a basic decision tree. This is done only for those attributes that appear in the DT. . Wi refers to the weight of the attribute. 11: end if T. D. and present the evaluation results for both of the can be estimated by counting how often each Aij occurs in Ci 2 D. parameter. This is conducted by combining the effects of different attribute values. xn} // Training data. in the tree. lected attributes by the DT are also used as exponential parameters 27: end for (see Eq. Output: A classiﬁcation Model..Md. probability. then Image seg. Input: D = {x1. . and collect the attributes appearing 12: T = Add T0 to arc. (14). Ai. Aij 2 Ai. x2. To classify an instance. Datasets make the prediction. can be In this section. do ﬁnd the subset of the attributes in the training dataset which play 15: if Ai is not tested in T. Large Soybean Database (Soybean). Tic-Tac-Toe Endgame Data (Tic-Tac-Toe).e. Iris Plants Database (Iris plants). Datasets No of Att.P(Ci) and P(AijjCi) from D are used to 5. we calculate the class conditional probabilities using 25: for each attribute value. xi 2 D. Ai. xi. 1984 United States Congressional Voting Records Database the proposed hybrid algorithm for the naïve Bayes classiﬁer. Soybean 35 Nominal 683 19 Vote 16 Nominal 435 2 7: ifstopping point reached for this path. Farid et al. class conditional probabilities of those unselected attributes by 31: end for DT will not be generated and employed in the classiﬁcation result production. (Vote). P(Cijxi). where the attribute. Pima Indians Diabetes Database (Diabetes). The posterior 5. jCi). is tested in the DT and initial- 20: end for ize the weight. Otherwise. Cik} also has some values. . . Ai 2 D. we need P(Ci) for each Ci. We esti- sets from UCI machine learning repository (Frank & Asuncion. d bute. The class. the root node of the tree has a 23: end for higher weight value in comparison with those of its child nodes. 2010). Table 5 describes the datasets used in experimental analy- Y n sis. do way. we initialize the weight. As mentioned earlier. and P(xi. for each 5. is then found for Ci. . P(Ci). Ci. for each attribute. PðAij j C i ÞW i . the 30: Find the posterior probability. Similarly. the probability P(AijjCi) also environments. 4. and estimate the likelihood that xi is in each Ci. 17: else bute.. DT is applied as an attribute selection 13: end for and attribute weighting method. Ai. of the attri. . 8. Ci 2 D. Ai 2 D and Wi – 0. the number of occurrences of each attribute value. NSL-KDD 41 Real & Nominal 25192 23 9: else . After the tree construction. we describe the test datasets and experimental counted to determine P(Ai). / Expert Systems with Applications 41 (2014) 1937–1946 class Ci = {Ci1. do only those attributes selected by DT (i. Ci. Aij. In Eq. Ai 2 D. Experiments class. the importance of the attributes is measured by their corre- 22: Find the prior probabilities. In this 21: for each class. In this approach. NSL-KDD Dataset (NSL-KDD).) Algorithm 2. is not tested in the DT then the weight. est probability is used to label the instance. Types Instances Classes 3: T = Create the root node and label it with the splitting attribute. Wi = 0) will not 29: for each instance. Glass 9 Real 214 7 5: for each arc do Iris plants 4 Real 150 3 6: D = Dataset created by applying splitting predicate to D. T. is initialized to zero. 24: for each attribute. with the value of p1ﬃﬃd. 6. The 10 datasets are: j¼1 1. Ai 2 D. of the attribute. with the high.e. Algorithm 2 outlines 7. Other 28: end for attributes which are not selected by the DT (i. and W i ¼ p1ﬃﬃ. I. (14)) for class conditional probability calculation. mate P(xijCi) by Eq. do be considered in the ﬁnal result probability calculation. To calculate P(Cijxi).

We consider the weighted average for pre- accuracy ¼ ð16Þ TP þ TN þ FP þ FN cision. The detailed results for the weighted average of precision. since it is able to identify the most jXj important and discriminative subset of attributes for the If classify(x) = x c then assess(x) = 1 else assess(x) = 0.6 86.33 precision ‘‘%’’ of positive predictions that are correct Image seg. The classiﬁcation accuracy is measured either algorithm. x.66 79.66 73. This repeats 10 times and we take a mean accuracy rate. sensitivity and speciﬁcity analysis for each dataset. true negatives. The TP weighted average is similar to an arithmetic mean. and 10-fold cross validation to measure the performances of the proposed algorithms (Algorithms 1 and TP þ TN 2) using all 10 datasets.33 93. and 10- generated directly from the original training instances using C4. dataset by 4. and the remaining partitions are collectively used to mances of the classic C4.60 97.26 83.94 77.83 FP xi predicted to be in Ci but is not actually in it Iris plants 98 96 100 98.. the accuracy estimate is the overall number of correct classiﬁcations from the k iterations. Farid et al.34 98.98%. where X is production of naïve assumption of class conditional independence the set of instances to be classiﬁed. 10-fold cross validation breaks data into 10 sets on 10 datasets using 10-fold cross validation.5% using tioned into k mutually exclusive subsets or ‘‘folds’’. x 2 X and x c is the class of in.63 accuracy ‘‘%’’ of predictions that are correct Vote 97. (15) or by Eq. and visualization.83 98.0 81. or the recall rate). .24 90. Training and test.55 TN xi not predicted to be in Ci and is not actually in it Glass 96. Table 7 divided by the total number of instances in the initial dataset.1 in accuracy rates of the basic C4. and speciﬁcity (also 8. Dk. regression. the tradi- In k-fold cross-validation. by Eq.7 99.5 fold cross validation.83 95. and false negatives.44 . This DT classiﬁer gen- rules. in Table 7.5 classiﬁer and the proposed DT algorithm train the classiﬁer. It is a collection of machine learning 4.83 100 TP xi predicted to be in Ci and is actually in it Diabetes 84.7 98. rizes the symbols and terms used throughout in Eqs. We implement both of the proposed algo- datasets shown in Table 5. precision. respectively. Moreover. sensitivity-speciﬁcity analysis. NetBeans IDE is the ﬁrst posed hybrid algorithms for each of the 10 training datasets. Evaluated using all the instances in the 10 datasets.5 DT achieved an average accuracy rate of 83.66 99. . the tic-tac-toe dataset by called the true positive rate. the tic-tac-toe dataset by ing software (Hall et al. the partition Di is reserved of the 10 datasets. Table 6 summa. the sensitiv ity ¼ ð18Þ TP þ FN weighting for the evaluation of this research is calculated using TN the number of instances belonging to one class divided by the total specificity ¼ ð19Þ TN þ FP number of instances in one dataset. (16)–(19) and their meanings. (16).82 95. xi 2 X ð15Þ of all 10 training datasets.0 GHz processor (2 MB Cache.87% and the contact lenses called the true negative rate). the image seg. Weka3 contains tools for data by 3.D2. D.81 speciﬁcity ‘‘%’’ of negative instances that are predicted as negative NSL-KDD 73.11 76. the initial data are randomly parti. dataset by 4. The proposed DT algorithm (Algorithm each of which has an approximately equal size. which is open source data min- tion of the diabetes dataset by 7. Among all the 10 datasets. 1) obtained an average accuracy rate of 88. of size N/10.66 FN xi not predicted to be in Ci but is actually in it Soybean 96.2. ..Md.html). The code for the basic versions of the DT and NB (Algorithm 1) outperforms the C4. It trains the classiﬁer on 9 datasets and tests it using the remaining one dataset.5 DT classiﬁer for the classiﬁca- classiﬁers is adopted from Weka3. the contact lenses dataset by 4. (17)– proposed NB algorithm respectively improves the classiﬁcation (19) are used for the calculations of precision. IDE providing support for JDK 7 and Java EE 6 (http://netbeans. association from each dataset before the DT induction. sensitivity. 10-fold cross validation. In iteration i. 800 MHz the classiﬁcation accuracy on the training sets of the 10 benchmark FSB) and 1 GB of RAM. clustering. sensitivity and speciﬁcity analysis where.9 Symbol Intuitive Meaning Contact lenses 91.85 96.85 93. Eqs. positives. false for the experiments conducted are presented in Tables 8–11.5 and NB classiﬁers.org/ The results in Table 7 indicate that the proposed DT algorithm index. TP some data points contribute more than others. we evaluated the performances of proposed algorithms The experiments were conducted using a machine with an Intel (Algorithms 1 and 2) against existing DT and NB classiﬁers using Core 2 Duo Processor 2. Tables 8 and 9 respectively tabulate the perfor- as the test set. tional C4. We use NetBeans IDE 7. comparing to the basic NB classiﬁer. classify(x) returns the classiﬁcation of x. Secondly. Experimental setup Firstly.26 55. D1.55%. the proposed NB algorithm (Algorithm 2) PjXj also outperforms the traditional NB classiﬁer for the classiﬁcation i¼1 assessðxi Þ accuracy ¼ . Table 7 summarizes the classiﬁcation rithms (Algorithms 1 and 2) in Java. scribed as 100% sensitivity and 100% speciﬁcity. TN. precision. we have used the classi- to carry more generalization capabilities comparing to the DT ﬁcation accuracy.89%. Also.17% and the NSL-KDD dataset algorithms for data mining tasks. Training C4.3% for the classiﬁcation ing are performed k times. / Expert Systems with Applications 41 (2014) 1937–1946 1943 5. TP.53 sensitivity ‘‘%’’ of positive instances that are predicted as positive Tic-Tac-Toe 93. 99. classiﬁcation. Algorithm 1 is capable of identifying the noisy instances pre-processing. where instead precision ¼ ð17Þ TP þ FP of each of the data points contributing equally to the ﬁnal average.17%. Results and discussion 5. For example.7%. training dataset is less likely to become overﬁtting and thus able To test the proposed hybrid methods. speciﬁcity. we have used classiﬁcation accuracy.5 DT NB Proposed DT Proposed NB Table 6 datasets classiﬁer classiﬁer classiﬁer (%) classiﬁer(%) Symbols used in Eqs.37 79.23%. (16)–(19). The classiﬁcation accuracies of classiﬁers using training datasets.3. For classiﬁcation. sensitivity (also rates of the glass dataset by 18. FP and FN denote true positives. The mining algorithms in Weka3 can be erated from the updated noise free high quality representative either applied directly to a dataset or called from our own coding.33 93.30 91.43 78. and the pro- Redhat enterprise Linux 5 for Java coding. 2009). (%) (%) (Algorithm 1) (Algorithm 2) Breast cancer 96. the stance.73 69.66 95. A perfect classiﬁer would be de.

742 0.67%.042 Image seg.758 0. the results shown in Tables 8 and 9 indicate that the proposed DT algorithm outperforms the traditional C4.036 Vote 94.3% using 10-fold cross validation. Overall.055 Image seg.735 0.5 DT and the proposed DT classiﬁ- Tic-Tac-Toe 88. sensitivity and speciﬁcity values for the sets using 10-fold cross validation.957 0. Breast cancer 71.185 the classiﬁcation accuracy rates of the contact lenses dataset by Contact lenses 91.5 DT classi- ﬁer in all the test cases.716 0.96 0.96 0.02 Soybean 91.842 0.91 0.098 Image seg.46 0.891 0.87 0.241 Contact lenses 83.149 Tic-Tac-Toe 78. / Expert Systems with Applications 41 (2014) 1937–1946 Table 8 Table 11 The classiﬁcation accuracies and precision. Fig.82 0.) Breast cancer 75. the proposed NB algorithm has respectively improved Breast cancer 81.70 0.67 0.145 Glass 66.819 0.) avg.755 0.057 by 9.853 0.931 0.866 0.5 DT and the siﬁer (Algorithm 2) obtained an average accuracy rate of 86.1 0.06 0. 2 and 3). Algorithm 1 is able to automatically remove noisy Table 10 instances from training datasets for DT generation to avoid overﬁt- The classiﬁcation accuracies and precision. and between a basic NB and the proposed NB classiﬁers for NSL-KDD 81.87 0.738 0.19 0.291 classiﬁers in all the test cases (see Figs.708 0.836 0. 2 and 3 respectively show the comparison of classiﬁcation Image seg. Comparing to the traditional avg. proposed DT classiﬁers on 10 datasets with 10-fold cross validation.826 0.15 0.944 0.691 0.767 0.) avg.764 0.851 0.909 0.5 The classiﬁcation accuracies and precision.79 0.96 0.485 0. 95.07 0.245 Breast cancer 75.176 Tables 10 and 11 respectively tabulate the performances of the Table 9 traditional NB classiﬁer and the proposed NB algorithm on 10 data- The classiﬁcation accuracies and precision.022 Figs.762 0.851 0.) avg.514 Iris plants 96 0.763 0.39 0.977 0.04 Iris plants 98 0.83 0.33 0.12%.3 NSL-KDD 76.11 0.826 0.8% on average comparing to the traditional DT classiﬁer.533 0.261 Diabetes 85.945 0. Furthermore. 2. proposed hybrid DT classiﬁer with 10-fold cross validation. 96.849 0. Algorithm 2 is capable of identifying the most discriminative Datasets Classiﬁcation Precision Sensitivity Speciﬁcity subset of attributes for classiﬁcation.055 Vote 90.083 16.166 Contact lenses 87.849 0. Overall. 85. In comparison with the traditional DT.62 0. the contact lenses dataset by 8. and the NSL-KDD dataset by 6.901 0.81%.) avg.758 0.283 They respectively outperform the traditional C4.66 0.21 NSL-KDD 71.013 Soybean 92.875 0.67 0.696 0. Diabetes 76.66 0.789 0. the glass dataset by 9.209 9.18 each dataset with 10-fold cross validation.50 0.986 0.965 0.814 0.905 0.48 0.236 Glass 48.189 Tic-Tac-Toe 69.523 0. proposed hybrid NB classiﬁer with 10-fold cross validation.) avg.034 accuracy rates between the C4.98 0.819 0.92 0. The results shown in Tables 10 and 11 indicate that the Datasets Classiﬁcation Precision Sensitivity Speciﬁcity proposed NB algorithm (Algorithm 2) also outperforms the basic accuracy (%) (weighted (weighted (weighted NB algorithm for all the test cases. and the breast cancer dataset by 5.496 0.94%.27 0.) and 2) for the classiﬁcation of challenging real benchmark datasets.711 0.752 0.83 0. Datasets Classiﬁcation Precision Sensitivity Speciﬁcity Datasets Classiﬁcation Precision Sensitivity Speciﬁcity accuracy (%) (weighted (weighted (weighted accuracy (%) (weighted (weighted (weighted avg. while the proposed NB clas.682 0.872 0.97 0. It thus possesses more robustness and generalization capabil- classiﬁer with 10-fold cross validation.75 0.Md. Fig.977 0.893 0. the tic-tac-toe dataset by Diabetes 79.96 0.82 0.5 DT and NB Contact lenses 70.331 Glass 52.52 0.73 0.1944 D. 4 also shows the comparison of classiﬁcation accuracy rates of all classiﬁers on 10 datasets with 10-fold cross validation.98 0.708 0.916 0.834 0.786 0. The evaluation results prove accuracy (%) (weighted (weighted (weighted the efﬁciency of the proposed DT and NB algorithms (Algorithm 1 avg.30 0.59 0.237 improved the classiﬁcation accuracy rates for all the 10 datasets Iris plants 98. Vote 97.88 0.854 0.963 0.788 0.45%. it has improved the classiﬁcation accu- racy rates for the classiﬁcation of the above 10 datasets by 4.11%.7%. Algorithm 2 has Glass 76.) avg.881 0. sensitivity and speciﬁcity values for a C4. the diabetes dataset by 9.33 0.) NB classiﬁer.711 0.03 0.148 Tic-Tac-Toe 85.) avg.81 0. sensitivity and speciﬁcity values for a NB ting. Farid et al.986 0. ities.047 Vote 96.) avg.32 0.) avg.963 0. The comparison of classiﬁcation accuracy rates between the C4. sensitivity and speciﬁcity values for the classiﬁer with 10-fold cross validation.33%.4% on average in comparison to the classic NB classiﬁer.125 Diabetes 73.288 NSL-KDD 82.476 Iris plants 96 0.53 0.703 0.762 0. . 81.065 Soybean 94.29%. we have also evaluated the traditional NB classi- ﬁer using all the 10 datasets and achieved an average accuracy rate of 77.50 0.668 0. the proposed DT classiﬁer has respectively improved the classiﬁcation accuracy rates of the NSL-KDD dataset by 10.823 0.118 ers.852 0.27 0.871 0.41 0.237 Moreover.04 Soybean 92.833 0.11 0.

A. Hall. McHugh. Partition-conditional ICA for Bayesian classiﬁcation of microarray data. 12–19. (2008).. & Sarkani. Testing intrusion detection systems: a critique of the 1998 and algorithms were tested against those of the traditional DT and NB 1999 darpa intrusion detection system evaluations as performed by lincoln classiﬁers using the classiﬁcation accuracy. D. J. 36. 1080–1083. Application of data mining techniques mark datasets from UCI machine learning repository. & Asuncion.. 81–106. J. Jamain. In future work. & Rajaram.. Expert Systems with Applications.0 machine learning algorithm. 1059–1069.. sensitivity. & Hsiao. Strachan..ics.. Y.5: Programs for machine learning. C. 23–31. & Hand. B. 1293–1298. Riaz. 39. Frank. Farid. Tavallaee. T. A survey of decision tree classiﬁer methodology. http:// www. (1995). A.. 8549–8559. B. 24. Carrasco-Ochoa. & Varghese. of the International Conference on Computing.. 815–840. Harbi.. 4. Turney. M. Hsu. Safavian. Expert Systems with Applications. D. (2012). based on a hidden naïve Bayes multiclass classiﬁer. Farid. Chandra. Using decision trees to summarize associative classiﬁcation rules. 13492–13500. & Qu. M. K.. precision. W. (2011).. & Ghorbani. 5895–5906. A. UCI machine learning repository. C. 35. L. & Gunes. Effective solution for unhandled exception in decision tree induction algorithms. Z. The in customer relationship management: A literature review and classiﬁcation review article.. E. & Isa. A. 39. Friedman.edu/ml.-L. 25. (2009). Rahman. C4.H. R. P. T.. J. L. G.. Journal of Computers..-P. (2008). & Rahman. Frank. IEEE Transactions on Systems.. Rahman. 660–674. classiﬁcation problems. C. T. 34. Classiﬁcation and regression trees. International Journal of Network Security & Its Applications. R. Feature selection for text classiﬁcation with naïve Bayes. T. S.ac. Sexton. (2011). (1993). Induction of decision tree. Chen. 87–112. Expert Systems with Applications. L.. (2009). S. K. the proposed hybrid NB classiﬁers on 10 datasets with 10-fold cross validation.-C. speciﬁcity analysis. Holmes. & Rahman. 38. A. L. R. (2012). R. factor facility for na’’ıve Bayes classiﬁcation. Adaptive intrusion detection based on boosting and naïve Bayesian classiﬁer. The comparison of classiﬁcation accuracy rates between the basic NB and Chandra. 2. rithms. Hossain.. 11303–11311.. Robust approach for estimating probabilities in naïve Bayesian classiﬁer for gene expression data. (2009). Expert Systems with Applications. Farid... Conclusions mixed data. (2010). In Proc.. A novel hybrid intelligent method based on C4.-Y. R. I. A. Mining supervised classiﬁcation performance studies: A meta-analytic investigation. 7. 38. R.. Expert Systems with Applications. E. multi-class classiﬁcation tasks under dynamic feature sets. L.. In Proc. Anomaly network intrusion detection based on improved self adaptive Bayesian algorithm. & Gupta. 12113–12119. J. Y. P.. P.. A. J. genetic algorithms. Machine Learning. D.. laboratory.-L. Franco-Arcega. & Shih. A co-evolving decision tree classiﬁcation method. J. M.-H. Chu. An adaptive ensemble classiﬁer for mining concept drifting data streams. S. instances from the training set before the DT induction. 369–409. 8220–8228.Md. Expert Systems with Applications. S. (2009). L. 2645). D. (1991).. Lu.T. 5432–5435. G. M. 37. 262–294. Information Sciences.. Reutemann.-W. A. Quinlan. Expert Systems with Applications. ACM Transaction on Information and System Security. Combining naïve Bayes and decision tree for adaptive intrusion detection.. & Roy. & Chang. J.. Bujlow. J. Lee. P. Bagheri.. S. Split selection methods for classiﬁcation tree. Mazzuchi.cs. The performances of the proposed Sininca. 37. Networking and Knowl- edge) project (Grant No.. (1986). M. J. of the 2nd IEEE Int.2013. H. (2008).waikato. M. J. 11. Extended naïve Bayes classiﬁer for 6. W. Poh. experimental results showed that the proposed methods have pro. Huang. 1587–1592.-H. 14290–14300. Man and Cybernetics. Chandra. & Hung. W. (2013). on Computational Intelligence in Security and Defense Applications (pp.. Stone. & Chau. 21. & Pedersen. Classsiﬁcation by clustering decision tree-like classiﬁer based on adjusted clusters. Zhang. The comparison of classiﬁcation accuracy rates of all classiﬁers on each archive.. (2010). Chapman and Hall/CRC. Data mining techniques and applications – a decade review from 2000 to 2011. & Paul Varghese. Expert Systems with Applications. / Expert Systems with Applications 41 (2014) 1937–1946 1945 of Excellence for Learning.. Decision tree induction using a fast splitting attribute selection for large datasets. 2592–2602. 12–15. .. Cost-sensitive classiﬁcation: Empirical evaluation of a hybrid We appreciate the support for this research received from the genetic decision tree induction algorithm. P. 36. M. Y. Xiu..07.nz/ml/weka/. M. Expert Systems with proved the classiﬁcation accuracy rates of both DT and NB classiﬁers Applications.2013. (2009). 53–58). Expert Systems with Applications. (2009). (2011). Expert Systems with Applications. other classiﬁcation algo. A method for classiﬁcation of network trafﬁc based on C5.. Expert Systems with Applications. The proposed methods im. Aviad.. & Rahman. 179. Farid. A detailed analysis of the Acknowledgment KDD CUP 99 data set. Sanchez-Diaz. 36.. D. A. M. Expert Systems with second proposed hybrid NB classiﬁer used a DT induction to select Applications. P. D. will be used to deal with real-time Publishers Inc. Networking and Communications (ICNC) (pp.. 3. J. (2010). Expert Systems with Applications. 40. 3. References Aitkenhead. D. H. 38. 36.5 duced impressive results for the classiﬁcation of real life challenging decision tree classiﬁer and one-against-all approach for multi-class multi-class problems.. X. Z. (2000). L. D. The ﬁrst proposed hybrid DT algo. rough Quinlan. while the Liao. Conf. B. A.. The weka data mining software: An update. & Witten.. 237–241). Chen. 5. Tian.. 2338–2351. 1. such as naïve Bayes tree (NBTree). & Olshen. Z. Pfahringer. & Martinez-Trinidad. A network intrusion detection system algorithms for DT and NB classiﬁers. M. Ngai. M. G.A. D. International Journal of Computer Applications. D. M. B. Moving towards efﬁcient decision tree construction.. (2009). Fuzzifying gini index based decision trees. Statistica conditional independence. Expert Systems with Applications. Accessed 29. Balamurugan.. Expert Systems with Applications. and 10-fold cross validation on 10 real bench. Morgan Kaufman set approaches and fuzzy logic.07. Y. 18–25.uci. & Zhou. we have proposed two independent hybrid Koc. & Landgrebe. P. (2012).. SIGKDD Explorations.. C.M. (2011). http:// Fig. Accessed 29. a subset of attributes for the production of naïve assumption of class Loh. (2010). M. E. Polat. 36. A. Breiman. (1997).-H. Fan. dataset with 10-fold cross validation. N. (1984). 36. In this paper. 8188–8192. K. M.. Huang. B. Journal of Artiﬁcial Intelligence European Union (EU) sponsored (Erasmus Mundus) cLINK (Centre Research. rithm used a NB classiﬁer to remove the noisy troublesome 8471–8478. (2009). Automatically computed document dependent weighting in multi-class classiﬁcation tasks. Fig. F. Farid et al. G.. C. K. S. (2010). (2009).. Innovation. et al. Journal of Classiﬁcation.

. 4. Varas. E. using a naïve Bayes classiﬁer. Ann Arbor. of the ﬁfth National Conference on Machine Learning. 107–120). Farid et al. / Expert Systems with Applications 41 (2014) 1937–1946 Utgoff. Michigan. & Ruz. USA (pp.Md. Valle. P.E. M. P. Expert Systems with Applications. Incremental induction of decision trees. Machine Learning. (1988). (2012). S.. Utgoff. G.. A. Job performance prediction in a call center 161–186. In Proc. (1989). A.1946 D. 9939–9945. ID5: An incremental ID3. 39.

- 10. Data mining and data warehouse.pdfUploaded byMaryus Steaua
- Ijitce June 2011Uploaded bysivala
- Machine Learning Revision NotesUploaded byGaurav Tendolkar
- computer graphics Q & AUploaded byksi12345
- Decision TreesUploaded byRatul Kumar
- encog-3_3-quickstart.pdfUploaded bybinkyfish
- Ge Presentation(1)Uploaded bySav Sui
- Predictive Modelling of Crime Dataset Using Data MiningUploaded byLewis Torres
- fvcnn.pdfUploaded byRajat Govekar
- tutorialUploaded byAl Oy
- u07a1 PROBABILITY MODELING Hal Hagood .docxUploaded byHalHagood
- T03a.BinaryTreeTheoremsUploaded byBogdan Florescu
- Data Structure and AlgorithmsUploaded byShanmugha Priya Balasubramanian
- Paper on Data MiningUploaded byHaveit12
- First Task 2Uploaded byJon Arnold Grey
- CS301-Lec12 handout.docUploaded byJʋŋaiɗ Aĸɓar
- GeeksforGeeks placement questionsUploaded byRadhika Parik
- TournamentTrees.pdfUploaded byИРадојичић
- Synchronization BarriersUploaded bySamir Najar
- UGC Computer Science Paper 2 Solved December 2012Uploaded bytanyagoswami11
- Greedy Part 2Uploaded byjay desai

- EAP Menggunakan Zachman Framework (Studi Kasus PT. Jasaraharja Putera Cabang Pekanbaru)Uploaded byAhmar Taufik Safaat
- jurnal_15299Uploaded byMuhammad Robby Dharmawan
- Bab 1 Pengertian Data MiningUploaded byArif Febrian Arif
- Aplikasi Enkripsi Pesan SMS Dengan Algoritma Kriptografi Block Chiper DES Berbasis AndroidUploaded byAhmar Taufik Safaat
- Modul Praktikum SIstem DigitalUploaded byAhmar Taufik Safaat
- Analisa dan Perancangan Sistem Informasi Penerimaan Siswa Baru dan Pembayaran SPP Menggunakan Zachman Framework.pdfUploaded byAhmar Taufik Safaat
- Modul Praktikum Algoritma Dan Struktur DataUploaded byAhmar Taufik Safaat
- A Simple Dynamic Mind-map Framework to Discover Associative Relationships in Transactional Data StreamsUploaded byAhmar Taufik Safaat

- Franklin County Hurricane Irma Information as of noon on September 9thUploaded byMichael Allen
- FEMA - High Water Mark Collection for Hurricane Katrina in AlabamaUploaded byPhilee
- 01 Meteorologia Basica Resumo - avlan.net (1).pdfUploaded byLuiza Helena Salamoni Stravalacci
- CHEM 335 Aldol ReactionUploaded bytyler_4
- Nerzic 2007 Wind:Waves:CurrentUploaded bybrian_dutra
- Bayesian Computation for the Prediction of Football Match Using Artificial Neural Network (1)Uploaded byGeorge Chiu
- Sage X3 - User Guide - HTG-SmartForecast V8 Setup.pdfUploaded bycaplusinc
- Chapter 01- Essentials of GeographyUploaded byAsmawi Bin Abdullah
- Army Aviation Digest - Feb 1963Uploaded byAviation/Space History Library
- Course Project CUploaded byBankole Oni
- 2011 NIPCC Interim ReportUploaded byMattParke
- TRANSMISSION LINE INSULATIONUploaded byapi-19757160
- Design of Flexible PavementsUploaded bykyugu
- 98308348-Complete-PET-Workbook.pdfUploaded byEvaMolinaRuiz
- below - lesson 3Uploaded byapi-355630171
- Decomposition MethodUploaded byAbdirahman Deere
- Wind Thesis2Uploaded bypcsekhar111112509
- 1986 PatersonUploaded byJanet Lee
- Safety First 24Uploaded byVincenzo Parisi
- Q1Uploaded byShEikh Azlan
- Actividad Integradora Podemos Saberlo M17S3Uploaded byEdgar Roel Acosta Carrillo
- Chapter 09 - Forecasting Exchange RateUploaded byJeffrey Strachan
- Inhofe's Anti-global Warming ReportUploaded byBigMamaTEA
- Flugwetterinformationen 2007 Nov dUploaded byChristian Mueller
- Heat Flow BasicsUploaded bychandirandelhi
- masas de aire.docxUploaded byMayeli Muñoz Caja
- unit 2 social science 4º primaria.pdfUploaded byAnonymous 3noAFIag
- Time Seriesforcasting and Index NumberUploaded byleziel
- 1 Ano B. Exercicio Revisao. 1Uploaded byReginaldo Campos
- J. 2008 Indoor AirUploaded bySureshSurya