This action might not be possible to undo. Are you sure you want to continue?

# ICIT – IEEE 2007

**Medical datamining with probabilistic classifiers
**

Ranjit Abraham1, Jay B.Simha2, Iyengar S.S3

Ejyothi Services Pvt. Ltd, Kurisupally Road, Cochin. ranjit.abraham@ascellatech.com 2 Abiba Systems, Bangalore, INDIA. jbsimha@gmail.com 3 Department of Computer Science, Louisiana State University, Baton Rouge, USA iyengar@bit.cse.lsu.edu

1

**Abstract: - Statistical classifiers typically build (parametric) probabilistic models of the training data, and
**

compute the probability that an unknown sample belongs to each of the possible classes using these models. We utilize two established measures to compare the performance of statistical classifiers namely; classification accuracy (or error rate) and the area under ROC. Naïve Bayes has obtained much relevance in data classification for machine learning and datamining. In our work, a comparative analysis of the accuracy performance of statistical classifiers namely Naïve Bayes (NB), MDL discretized NB, 4 different variants of NB and 8 popular non-NB classifiers was carried out on 21 medical datasets using classification accuracy and true positive rate. Our results indicate that the classification accuracy of Naïve Bayes (MDL discretized) on the average is the best performer. The significance of this work through the results of the comparative analysis, we are of the opinion that medical datamining with generative methods like Naïve Bayes is computationally simple yet effective and are to be used whenever possible as the benchmark for statistical classifiers.

**Keywords: Bayesian networks, Naïve Bayes classifier, discretization, Minimum Description Length
**

(MDL)

1. Introduction

In the last few years, the digital revolution has provided relatively inexpensive and available means to collect and store large amounts of patient data in databases containing rich medical information and made available through the Internet for Health services globally. For a Physician who is guided by empirical observation and clinical trials, this data becomes appropriate if it is provided in terms of generalized knowledge such as information pertaining to patient history, diseases, medications, and clinical reports. Several computer programs have been developed to carry out optimal management of data for extraction of knowledge or patterns contained in the data. One such program approach has been data classification utilizing statistical algorithms with the goal of providing information such as if the patient is suffering from the illness or not from a case or collection of symptoms. Naïve Bayes (NB) is a simple yet consistently performing probabilistic model based on the theory of Bayesian networks. Data classification with naïve Bayes is the task of predicting the class of an instance from a set of attributes describing that instance and assumes that all the attributes are conditionally independent given the class. This assumption grossly violates real-world problems and much effort has been focused in the name of naïve Bayes variants by relaxing the independence assumptions to improve classification accuracy. Research shows Naïve Bayes still performs well in spite of strong dependencies among attributes [19]. Research work show that naïve Bayes classification works best for discretized attributes and the application of Fayyad & Irani’s Minimum discretization length (MDL) discretization gives on the average best classification accuracy performance [22].

Formally a BN for X is a pair B= <G. popular variants of NB and with state-of-the-art classifiers such as k-nearest neighbor. Data attributes are either numeric or categorical. Decision trees.xn where x1 is the value of the attribute X1. Neural networks. The technique evaluates a candidate cut point between each successive pair of sorted values. Research study shows that naïve Bayes classification works best for discretized attributes and discretization effectively approximates a continuous variable [2].θ>. Here nodes represent dataset attributes. x 2 .. X2. a special form of Bayesian network (BN) has been widely used for data classification in that its predictive performance is competitive with state-of-the-art classifiers such as C4.ICIT – IEEE 2007 In this paper we compare the accuracy performance of non-discretized NB with MDL discretized NB. C X1 X2 X3 X4 Figure 1. a much more powerful and flexible representation of probabilistic dependence generally did not lead to improvements in accuracy and in some cases reduced accuracy for some domains [19]. While categorical attributes are discrete. Support vector machines. X2. The NB classifier represented as a BN has the simplest structure..Xn and whose edges represent direct dependencies between the attributes. 3.. then the class c of the example E (c(E)) can be represented as a classifier by the BN [11] as c( E ) = arg max p (c) p ( x1.. The most likely class of a test example can be easily estimated and surprisingly effective [6]. x2. x n | c) . numerical attributes are either discrete or continuous. A BN is an annotated directed acyclic graph that encodes a joint probability distribution over a set of attributes X. Here the assumption made is that all attributes are independent given the class and equation 1 takes the form n c( E ) = arg max p (c) c∈C ∏ p( x | c) i i =1 .. Assuming that X1. Let C represent the class variable and c its value corresponding to the class node in the BN. The BN can be used to compute the conditional probability of a node given values assigned to the other nodes. RIPPER. Comparing NB to BN.(1) c∈C Although BN can represent arbitrary dependencies it is intractable to learn it from data. RIDOR and Decision Tables. The Minimum Description Length (MDL) discretization is Entropy based heuristic given by Fayyad and Irani [9]. the data are discretized into two intervals and the class information entropy is .(2) The structure of NB is graphically shown in Figure 1. The Bayesian Network can be used as a classifier where the learner attempts to construct a classifier from a given set of training examples with class labels..Xn are the n attributes corresponding to the nodes of the BN and say an example E is represented by a vector x1.. For each candidate cut point.. Naïve Bayes (NB) Naïve Bayes (NB). Hence learning restricted structures such as Naïve Bayes is more practical. NB is best understood from the perspective of Bayesian networks. where G represents the directed acyclic graph whose nodes represent the attributes X1. MDL Discretized Naïve Bayes Discretization is the process of transforming data containing a quantitative attribute so that the attribute in question is replaced by a qualitative attribute [25]. Structure of Naïve Bayes Accordingly each attribute has a class node as its parent only.5 [26]. Bayesian networks graphically represent the joint probability distribution of a set of random variables. 2. Logistic regression.

A graphical representation for Boosted Naïve Bayes (BAN) is shown in Figure 3. which is N-P hard. where each classifier in the series learns more attention to the examples that have been misclassified by its predecessors. TAN is a special case of Augmented Naïve Bayes (ANB). The final boosted classifier outputs a weighted sum of the outputs of each individual classifier series with each weighted according to its accuracy on its training set. MDL discretized datasets show good classification accuracy performance with naïve Bayes [22]. The hidden nodes ψ correspond to the outputs of the NB classifier after each iteration of boosting. C X2 X1 X3 X4 Figure 4. An example structural representation for SNB is shown in Figure 5. By applying the algorithm [12] incorporating Kruskal’s Maximum Spanning Tree algorithms an optimal Augmented Bayes Network can be found. C Ψ1 Ψ3 Ψ2 X1 X2 X3 Figure 3. Structural representation for Forest augmented Naïve Bayes (FAN) The Selective Naïve Bayesian classifier (SNB) uses only a subset of the given attributes in making the prediction [17]. The technique is applied recursively to the two sub-intervals until the criteria of the Minimum Description Length (MDL). Boosting requires only linear time and constant space and hidden nodes are learned incrementally starting with the most important [8].ICIT – IEEE 2007 calculated. A graphical structural representation for the Forest augmented NB is shown in Figure 4. Hence each next classifier learns from the reweighed examples. The candidate cut point that provides the minimum entropy is chosen as the cut point. Variants of Naïve Bayes Classifier The Tree Augmented Naïve Bayes (TAN) is an extended NB [10] where with a less restricted structure in which the class node directly points to all attribute nodes and an attribute node can have only one parent attribute node. With sample datasets BAN shows comparable accuracy with TAN. Experiments with sample datasets reveal that SNB appears to overcome the weakness of NB classifier. The structure of TAN is shown in Figure 2. TAN has shown to maintain NB robustness and computational complexity and at the same time displaying better accuracy. C X1 X2 X3 X4 Figure 2. . Structural representation for the Boosted Augmented Naïve Bayes (BAN) The Forest augmented Naïve Bayes (FAN) represents an Augmented Bayes Network defined by a Class variable as parent to every attribute and an attribute can have at most one other attribute as its parent [11][24]. The model enables to exclude redundant. irrelevant variables so that they do not reflect any differences for classification purposes. Structural representation of Tree Augmented Naïve Bayes (TAN) Boosting involves learning a series of classifiers. which is equivalent to learning an optimal BN. 4.

Given a query point.5 rules yielding faster training and lower error rates [3]. we find K number of objects or (training points) closest to the query point. LR is often referred to as a discriminative classifier unlike NB which is referred to as a generative classifier. and C4. from a set of variables that may be continuous. Any ties can be broken at random. and used to modify the networks algorithm the second time around. at the outset. is largely arbitrary) with the known actual classification of the record. A Decision Table (DTab) is essentially a hierarchical table in which each entry in a higher level table gets broken down by the values of a pair of additional attributes to form another table. For an unlabelled instance. Popular non-NB statistical classifiers The idea of a Decision Tree (DT) [21] is to partition the input space into small segments. This method offers modifications to IREP. discrete. If no instances are found. Each test results in branches. which represent different outcomes of the test. Artificial neural networks (NN) are relatively crude electronic networks of "neurons" based on the neural structure of the brain. k-NN is a supervised learning algorithm where the result of new instance query is classified based on majority of K-nearest neighbor category [6]. otherwise the majority class matching all the instances is returned. Structural representation for Selective Naïve Bayes (SNB) For the above given model. A DT is a k-ary tree where each of the internal nodes specifies a test on some attributes from the input feature set used to represent the data. C4. The classification is using majority vote among the classification of the K objects. X3. They process records one at a time. This approach constructs hyperplanes in a multidimensional space that separates cases of different class labels. The class probability of an example is estimated by the proportion of the examples of that class in the leaf into which the example falls.5. will be assigned to the class c( E ) = arg max p(c) p( x1 | c) P ( x 2 | c) P ( x 4 | c) c ∈C 5. Logistic regression (LR) is part of a category of statistical models called generalized linear models. A decision plane is one that separates between a set of objects having different class memberships. The errors from the initial classification of the first record is fed back into the network. K-nearest neighborhood may be influenced by the density of the neighboring data points. a decision table classifier searches for exact matches in the decision table. x4 >. then the majority class from the decision table is returned. The Support Vector Machine (SVM) classification is based on the concept of decision planes that define decision boundaries. SVM can handle multiple continuous and categorical variables [5]. such as group membership. The basic algorithm for DT induction is a greedy algorithm that constructs decision trees in a top-down recursive divide-and-conquer manner. x2. . and an example given by E=< x1. or a mix of any of these [17]. LR allows one to predict a discrete outcome. The purpose of this algorithm is to classify a new object based on attributes and training samples. and label these small segments with one of the various output categories.ICIT – IEEE 2007 C X1 X2 X4 Figure 5. and so on for many iterations [14]. Each branch descending from a node corresponds to one of the possible values of the feature specified at that node. The Repeated Incremental Pruning to Produce Error Reduction (RIPPER) is a decision-tree learning algorithm developed by William Cohen of AT&T Laboratories. dichotomous. and "learn" by comparing their classification of the record (which. [16]. k-NN algorithm uses neighborhood classification as the prediction value of the new query instance. The classifiers do not use any model to fit and only based on memory.

k -Nearest Neighbor (k-NN). Tree augmented naïve Bayes (TAN) and Forest augmented naïve Bayes (FAN). The wins at the bottom of Table 2 and Table 3 provides the ratio of medical datasets where the accuracy is highest among the considered classifiers to the total number of datasets used for our experiments. No. Here a default rule is generated and then the exceptions to the default rule (least weighted error rate). The second measure is based on true positive rate which is used in Reciever Operator Characteristic Curve (ROC). Support Vector Machine (SVM). status attr. . We have used 10-fold cross validation test method to all the Medical datasets [15]. Repeated Incremental Pruning to Produce Error Reduction (RIPPER). Logistic Regression (LR). Experimental Evaluation Table 1 provides the specification for the 21 medical datasets used for the experimental evaluation. Accordingly the area under the curve (AUROC) becomes a single-number performance measure for comparing classifiers. [4] 6. The 4 variants of naïve Bayes chosen for our experiments are Selective naïve Bayes (SNB). The dataset was divided into 10 parts of which 9 parts were used as training sets and the remaining one part as the testing set.ICIT – IEEE 2007 The RIpple DOwn Rule learner (RIDOR) is an approach to building knowledge based systems (KBS) incrementally. Decision Table (DTab) and RIpple Down Rules Learner (RIDOR). The 8 popular non-NB statistical classifiers are Decision Tree (DT). Table 1: Specifications for the Medical datasets SL. Table 2 shows the accuracy results for non-discetized NB. status Wisconsin Breast Cancer [1] 699 10 2 Yes No Pima Diabetes [1] 768 9 2 No No Bupa Liver Disorders [1] 345 7 2 No No Cleveland Heart Disease [1] 303 14 2 Yes No Hepatitis [1] 155 20 2 Yes No Thyroid (ann-train) [1] 3772 22 3 No No Statlog. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 Total Total Number Missing Noisy Instances attributes of Classes attr. The best exception rules are generated iteratively to predict classes other than the default. Clearly the MDL discretized NB classifier on the average is the best performer compared to the other variants of NB as well as the 8 popular non-NB statistical classifiers that have been considered. Neural Network (NN). MDL discretized NB and some popular non-NB classifiers. MDL discretized NB and variants of NB. To further substantiate the results obtained in Table 2 and 3. Boosted naïve Bayes (BNB). Table 3 shows the accuracy performance with non-discetized NB. while the KBS is in routine use. we have tabled the results for the Area under the Receiver Operator Characteristics (AUROC) for the above mentioned statistical classifiers. The first one is the typical one namely Classification accuracy. The classification accuracy was taken as the average of the 10 predictive accuracy values. Clearly the wins obtained by MDL discretized NB classifier proves that it is the best performer.heart [1] 270 14 2 No No Hepatobiliary disorders [13] 536 10 4 No No Appendicitis [23]6 106 9 2 Yes No Stover Audiology [20] 1848 6 (5) 2 No No Leisenring neo audiology [20] 3152 8 (7) 2 No No Norton neonatal audiology [20] 5058 9 (7) 2 Yes No CARET PSA [20] 683 6 (5) 2 No No Ultrasound hepatic mets [20] 96 3 2 No No Pancreatic Ca biomarkers [20] 141 3 2 No No Laryngeal 1 [18] 213 17 2 No No Laryngeal 2 [18] 692 17 2 No No Laryngeal 3 [18] 353 17 3 No No RDS [18] 85 18 2 No No Voice_3 [18] 238 11 3 No No Voice_9 [18] 428 11 9 (2) No No Medical Dataset We have used two established measures of classifier performance in our experiments.

2197 70. MDL discretized NB and non-NB classifiers Sl No.375 63.4553 81.0588 76.1418 84.4851 87.7589 75.9355 96.5087 68.1159 84.0983 72.3711 91. SVM – Support Vector Machine.5882 74.8185 82.306 79.422 72.1418 81.8986 75.2772 77. FAN – Forest Augmented Naïve Bayes Table 3: Classification Accuracy with Naïve Bayes (NB).8148 49.3623 83.8215 95.2543 75.2135 68.1347 84.0542 72.9751 84.9076 80.7239 82.7925 100 100 96.4051 87. NN-Neural Network.9057 100 100 97.8646 63.0841 2/21 FAN 95.4118 69.2963 71.5368 100 96.4493 84.5794 9/21 Popular non-NB Classifiers DT 94.0376 96.8298 84.279 75.8505 3/21 DTab 95.7589 75.2258 99.1667 72.6714 90.Naïve Bayes.3284 87.7383 2/21 NB (MDL) 96.8512 75 63.6792 100 100 96.7647 80.8598 3/21 LR 96.8545 94.9583 68.8284 84.3507 86.7224 84.5161 98.5651 73.5722 82.5794 10/21 Variants of NB SNB 96. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 Medical Dataset Wisconsin Breast Cancer Pima Diabetes Bupa Liver Disorders Cleveland Heart Disease Hepatitis Thyroid (ann-train) Statlog.6714 92.2521 89.3697 88.3881 90.9943 76.0588 78.5942 80. BNB.1596 96.6983 85.0542 79.4553 78.6266 80.4701 88.5161 95.6452 92.963 42.9538 73.7096 77.7899 91.7096 74.0806 80.1262 75.0672 88.Decision Table.5869 84.3333 68.1874 81.1884 83.7358 100 100 97.2903 99.7407 65.8908 87.041 81.2197 70.0189 99.0854 74.8491 100 100 97.7407 45.9478 84.ICIT – IEEE 2007 Table 2: Classification Accuracy with Naïve Bayes (NB).513 100 94.8487 94.2354 83.2207 95.5637 73.9478 84.0821 86.0833 61.3704 79.7479 78.4706 89.4983 83.6094 56.9538 71.0854 76.Tree Augmented Naïve Bayes.125 78.8148 47.9076 83.807 83.1092 94.25 73.7084 76.7234 86.8581 81.7095 74.6018 81.3637 80.5318 71.0968 95.8824 75.9957 77.25 70.7479 78.3059 89.2083 72.6619 77.2656 57.7358 100 100 96.7925 100 100 97.3333 68.6822 3/21 TAN 96.2609 85.375 78. RIPPER.2353 78.922 84.8284 84.087 83.0344 83.8284 84.2319 83.8545 94.486 3/21 SVM 96.8148 47.8122 95.25 70.807 83.6303 88.6377 74.3404 84.4074 67.5202 69.0739 83.Logistic Regression.3777 82.6792 100 100 96.8281 68.871 99.3209 83.3623 83.1302 71.5161 95.4983 83.8646 63.2941 76.6983 85.3587 84.8717 83.7239 86.9195 81.8487 86.5869 84. MDL discretized NB and variants of NB.0659 80. TAN.5806 93.4118 71.5368 100 96. DTab. RIDORRIpple Down Rules Learner .8284 84.2817 95.7589 85.0588 75.7082 75.6819 80.5493 77.9957 77.2083 65.9422 65.2523 9/21 BNB 95.375 73.1884 83.0376 96.6812 76.507 97.1302 64.2917 80.7925 99.9957 77.6452 99.349 66.7925 100 100 97.2258 99.3333 60.8224 2/21 RIDOR 95.3021 55.0867 71. DT – Decision Tree.7234 86.1493 84.2101 91.6821 64.2941 76.8908 84.6792 100 100 96.7964 82.1884 72.2804 6/21 RIPPER 95.3757 70.7383 2/21 NB (MDL) 96.3587 84.25 80.922 82.4983 82.3059 89.4701 88.0376 95.0755 100 100 96.0867 66.7943 80.3021 55.9943 76.1215 4/21 k-NN 94.7358 100 100 96.5665 77.4744 84.5651 74.1485 85.9928 70.Naïve Bayes.2917 73.9057 96.heart Hepatobiliary disorders Appendicitis Stover Audiology Leisenring neo audiology Norton neonatal audiology CARET PSA Ultrasound hepatic mets Pancreatic Ca biomarkers Laryngeal 1 Laryngeal 2 Laryngeal 3 RDS Voice_3 Voice_9 Wins NB 95.5714 85. k-NN.1823 62.k -Nearest Neighbor.5882 76.3333 59. LR.5382 85.9412 73.0708 87.6819 77.4074 67.1513 87.7512 95.1613 93.8598 7/21 Abbreviations Used: NB.1495 4/21 NN 95.5556 73.3107 80.25 68.3777 82.5552 89.5806 96.9751 84.8215 95.Repeated Incremental Pruning to Produce Error Reduction.6957 75.2917 73.4884 87.063 82. Sl No.1045 86.6212 100 97.3529 71.6821 64.4194 99.7407 63.8908 84.4118 69.4328 85.9751 85.0146 83. NB (MDL) – Naïve Bayes with MDL discretization. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 Medical Dataset Wisconsin Breast Cancer Pima Diabetes Bupa Liver Disorders Cleveland Heart Disease Hepatitis Thyroid (ann-train) Statlog.3404 79.9057 96.9855 83.0496 78.375 78.2376 81.8385 87.5196 84. NB (MDL) – Naïve Bayes with MDL discretization.3438 58.5196 84.446 95.4038 94.9175 80.Boosted Naïve Bayes.1596 94.0854 74.9574 82.5161 98.8209 87.8045 88.9553 84.2428 68.7850 2/21 Abbreviations Used: NB.6257 84.0542 84.7234 80. SNB – Selective Naïve Bayes.heart Hepatobiliary disorders Appendicitis Stover Audiology Leisenring neo audiology Norton neonatal audiology CARET PSA Ultrasound hepatic mets Pancreatic Ca biomarkers Laryngeal 1 Laryngeal 2 Laryngeal 3 RDS Voice_3 Voice_9 Wins NB 95.

45 98.08 80.24 96.9 90.43 79.08 90.44 77.46 89.07 79.01 76.64 82.87 95.83 56.64 96.93 77.k -Nearest Neighbor.17 77.83 74. NB (MDL) – Naïve Bayes with MDL discretization.58 70.04 76.04 57.91 61.48 100 100 100 100 100 100 100 100 100 62. TAN.87 92. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 Medical Dataset Wisconsin Breast Cancer Pima Diabetes Bupa Liver Disorders Cleveland Heart Disease Hepatitis Thyroid (ann-train) Statlog.81 100 100 100 100 58.15 82.84 74.91 84.33 98.82 96.32 90.97 81.50 93.69 90. SVM – Support Vector Machine.06 85.19 96.92 86.62 62.76 74.73 87.41 75.44 88.28 54.46 95.28 86.53 73.56 88.33 99.50 59.36 89.18 79.87 97.11 77.78 68.84 84.98 90.75 92.2 95.62 75.Naïve Bayes.95 92.28 95.67 95.33 79.82 99. k-NN.92 90.71 85.50 98.41 95.72 99.65 100 60.96 80. MDL discretized NB and non-NB classifiers Sl No.95 61.61 83. FAN – Forest Augmented Naïve Bayes Table 5: AUROC (in percentage) with Naïve Bayes (NB).66 90.64 75.86 64.30 2/21 2/21 2/21 3/21 1/21 5/21 3/21 1/21 Abbreviations Used: NB.70 96.02 90.18 61.2 99.27 78.20 72.51 83.14 79.39 89.37 80.49 95.00 50.09 71.61 96.36 91.6 97.09 81.48 77.36 73.39 75.75 75.06 97.20 77.78 90.20 91.33 99.34 65.62 98.95 99.78 89.96 78.66 80.21 79.31 78.01 90.94 78.68 84.6 91.43 76.56 99.75 84.11 97.36 72.44 51.75 81.heart Hepatobiliary disorders Appendicitis Stover Audiology Leisenring neo audiology Norton neonatal audiology CARET PSA Ultrasound hepatic mets Pancreatic Ca biomarkers Laryngeal 1 Laryngeal 2 Laryngeal 3 RDS Voice_3 Voice_9 Wins NB 98. BNB.95 99.02 78.17 88.95 96.63 75.91 2/21 NB (MDL) DT 99.71 85.42 80.66 84.46 100 51.91 72.85 85.60 91.56 97.91 2/21 Variants of NB NB (MDL) SNB BNB TAN FAN 99.53 91.75 15/21 k-NN 97.98 89.79 80.67 95.38 83.94 89.68 91. RIDORRIpple Down Rules Learner .Decision Table.74 93.33 84.07 83.36 59.75 74.75 77.68 87.28 65.68 80.01 62.31 65.Naïve Bayes.51 83.12 50.12 72.29 14/21 2/21 3/21 3/21 6/21 Abbreviations Used: NB.73 90.67 91.30 84.84 95.94 63.97 91 76.15 90.27 70.06 95. DT – Decision Tree.95 73.21 81.94 70.Boosted Naïve Bayes.54 87.31 94.16 49.60 100 100 100 100 100 100 100 100 100 100 58.33 96.66 72.95 99.48 71.54 70.75 81.86 88.44 91.97 91.28 85.47 84.16 76.78 84.41 51.60 73.16 50.54 99.16 91.31 93.48 79. Sl No.95 96.11 83.23 90.42 90.01 90.60 98.94 85.54 90.15 71.82 89.10 77.24 84.47 88.55 82.94 99.39 89.83 56.16 73.9 90.95 66.53 80.11 77.66 81.22 95.99 51.02 89.Tree Augmented Naïve Bayes.99 87.14 55.89 98.54 80.35 79.42 55.88 99.75 78.48 78.22 75.97 81.47 98.90 78.38 80.50 75.94 98.78 96. DTab. NB (MDL) – Naïve Bayes with MDL discretization.07 78.5 91.65 100 60.82 88.09 80.01 68.24 96.94 99.54 84. RIPPER.11 97.68 84.85 85.82 85.Logistic Regression.43 83.03 93.67 82.15 87.65 83.04 100 100 99.28 94.82 86.20 87.heart Hepatobiliary disorders Appendicitis Stover Audiology Leisenring neo audiology Norton neonatal audiology CARET PSA Ultrasound hepatic mets Pancreatic Ca biomarkers Laryngeal 1 Laryngeal 2 Laryngeal 3 RDS Voice_3 Voice_9 Wins NB 98.67 78.69 75.70 86.11 87.20 77.01 90.82 97.68 92.27 88.75 94.54 83. SNB – Selective Naïve Bayes.81 78. MDL discretized NB and variants of NB.44 90.62 91.45 80.35 79.69 96.34 54.26 81.44 60.63 Popular non-NB Classifiers LR NN SVM RIPPER DTab RIDOR 99.47 85.91 84.83 74.53 98.51 50.ICIT – IEEE 2007 Table 4: AUROC (in percentage) with Naïve Bayes (NB).43 99.99 99.23 98.97 91 87.83 87.05 79.01 91. NN-Neural Network.00 79.73 90. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 Medical Dataset Wisconsin Breast Cancer Pima Diabetes Bupa Liver Disorders Cleveland Heart Disease Hepatitis Thyroid (ann-train) Statlog.24 97.04 63.19 76.86 64.Repeated Incremental Pruning to Produce Error Reduction.95 79.19 98.15 90. LR.85 98.35 82.87 78.86 90.

4553 SNB 90.79 NB 81.8646 71 72 73 74 75 76 77 78 TAN 84.23 k-NN 65.44 NB 95.24 TAN 89.06 SNB 84.0376 FAN 84.4744 .83 NB 90.56 SVM k-NN 73.4118 88.95 79. we show 4 medical datasets used in our experiments where MDL discretized Naïve Bayes classification accuracy provides the best results compared to all other statistical classifiers.36 91.38 75.2 90.69 NB 82.0588 87.Average Performance Average Performance Average Performance Average Performance 86 85 87 MDL (MDL) 75 MDL (MDL) 84.2656 DT 75.39 NN 94.RDS Classifier AUROC Performance .4118 89.02 90.7647 90.1302 75.1596 NN FAN 78.6094 75 75.0588 RIDOR 87.0376 SVM 82.2941 91.0833 77.27 Statistical Classifier Statistical Classifier Statistical Classifier NN 87.8281 k-NN 77.86 TAN 80.3777 k-NN 79.0588 83.28 SVM 88.0376 84.1823 Average Performance Average Performance 92 90 88 86 84 82 80 78 76 74 72 70 68 66 64 62 60 58 Average Performance 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 Average Performance 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 79 MDL (MDL) NN FAN DTab TAN BNB LR DT SNB NB RIPPER RIDOR k-NN SVM 59.5882 89.CARET PSA Statistical Classifier Statistical Classifier Statistical Classifier LR 92.18 SNB 82.7512 74.Pima Diabetes Classifier AUROC Performance .446 LR 84.2 91.43 RIPPER RIDOR 73.RDS Classifier Accuracy Performance .7095 76.97 85.349 DTab 71.2135 77.5882 90.507 82 81 84 83 70 MDL (MDL) SVM LR SNB NB NN RIPPER RIDOR TAN BNB FAN DT DTab k-NN 74.87 91. 7: Classification AUROC (in percentage) of Statistical classifiers In Fig.62 85 87 89 91 93 95 64 66 68 70 72 LR 83.Laryngeal 1 Fig.75 BNB 80.306 DT 89.19 SNB 91.CARET PSA Classifier AUROC Performance .93 89.4553 RIPPER 95.2817 83.Pima Diabetes Classifier AUROC Performance .1347 RIDOR 85.97 BNB 89. Statistical Classifier DTab 76.82 83 93.1596 Classifier Accuracy Performance .6266 84.3529 74.22 BNB 96.98 88.33 FAN 96.01 NB 70.9195 84.3587 85.11 89 91 93 95 97 99 M DL (M DL) FAN RIPPER NN BNB TAN DTab DT 85.64 77 MDL (MDL) 86.8824 79.4038 79.6 90.56 Statistical Classifier Classifier Accuracy Performance .3021 77.063 77.8545 79 81 FAN 91.43 80.07 RIPPER 83.2353 83 85 87 MDL (MDL) 98.68 83.6018 83.31 DTab 91.8122 81.5869 DT 72.041 83.9583 RIDOR 70.08 74 76 78 80 82 84 86 76 75 78 77 80 79 SNB 85.78 TAN 97.02 ICIT – IEEE 2007 LR 90.Laryngeal 1 Classifier Accuracy Performance .14 DTab 80.19 78.78 LR k-NN 85.1874 84. 6: Classification accuracy of Statistical classifiers Fig.66 RIPPER 81. Fig 7 shows the same 4 medical datasets that gave the best results for the AUROC compared to all the other statistical classifiers.28 87.0659 84.54 84.9412 95.47 81 MDL (MDL) FAN SNB TAN RIPPER NB BNB SVM LR NN DTab RIDOR DT k-NN 92.54 DT 75.15 78.1302 76.44 SVM 72. 6.84 RIDOR 75.09 BNB NN 82.2207 SVM 73.3438 77.42 79.

Su J. http://www.. Bangor. [9] Fayyad U. In Machine Learning: Proceedings of the Twelfth International Conference. [3] Cohen W.net/JP_Sacha_PhD_dissertation. [20] Pepe M. “A study of cross-validation and bootstrap for accuracy estimation and model selection”. We show using two established measures for comparing performance of statistical classifiers. 1995 [6] David W.. “Boosting and Naive Bayesian learning”.S. 2006 [5] Cortes C. Pazzani M. Volume 19. California. Tech Report TR-CS-2004-11. Issue 5. R.ICIT – IEEE 2007 7.G. References [1] Blake C.sourceforge. “Learning Optimal Augmented Bayes Networks”. Hung-Ju Huang. Iyengar S. 191-201. Computer Science Department. Irvine. “Support Vector Networks”. Merz C. Japan pp 159-173. “Bayesian network classifiers”. Peters L... [2] Chun-Nan Hsu. Oxford Statistical Science Series. In Proceedings of PKAW. Vol. UK. Machine Learning. [14] Herve Abdi. B. “The Statistical Evaluation of Medical Tests for Classification and Prediction”. “Multi-interval discretization of continuous-valued attributes for classification learning”. pp.fhcrc... The work is presently under progress to explore feature selection methods in achieving better naïve Bayes classification performance. pp.5. [15] Kohavi R. 1995. [13] Hayashi Y. In Proceedings of the 13th International Joint Conference on Artificial Intelligence.ac. Bangor Gwynedd LL57 1UT. 1994. “Why Discretization works for Naïve Bayesian Classifiers”. “Instance-Based learning algorithms”.edu /~mlearn/ MLRepository. 356-362. 17th ICML..School of Informatics. 1991. 2002. Programs for MachineLearning”. 47-58.org /science/ labs/pepe/book/. 1992. [24] Saha. .. In Proceedings of the 14th International Joint Conference on Artificial Intelligence. Proceedings of the Seventh International Workshop on Artificial Intelligence and Statistics: 225-230.. 1994. Machine Learning. 3766. Heraclion. “Experience with Ripple-Down Rules”. [16] Kohavi R. AAAI Press. pp. J. Machine Learning: ECML-95: 8th European Conference on Machine Learning. “Learning Augmented Bayesian Classifiers: A Comparison of Distribution-based and Classification-based Approaches”.. Geoffrey I Webb.html].uk/ ~kuncheva/ activities/ real_data_full_set. [26] Zhang H.S. . [11] Keogh E. University of Wales. “A Comparative Study of Discretization Methods for Naïve Bayes Classifiers”. Lake Tahoe. pp. [22] Ranjit Abraham.. Machine Learning. Dennis Kibler. ‘http://jbnc.J. Information Sciences Applications. 1022–1027.. Journal of Biological Systems. Edwards G. Helman P. van Houwelingen J. “Hidden Naive Bayes” . Irani K. San Diego. 9th International Conference for Information Technology (ICIT2006). Kuncheva. Jay B. 6. “Neural expert system using fuzzy teaching input and its application to medical diagnosis”. pp. pp. 20(3).. “The Power of Decision Tables”. [7] Domingos P.. 2000. “Beyond independence: Conditions for the optimality of the simple Bayesian classifier”. [17] le Cessie S... 1997. (for the Medical dataset on Appendicitis). 1.bangor. 1995.J. Hence forth we are of the opinion that generative methods like naïve Bayes discretized with MDL is simple yet effective and are to be used whenever possible to set the benchmark for other statistical classifiers..htm [19] Pearl J. Knowledge-Based Systems. 1997. Lavers T. 2006.. pp. 29. Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05) . Crete.pdf’. M. Aha.. Vapnik V. Jiang L. [21] Quinlan. [8] Elkan C. Morgan Kaufmann. India..L. “A comparative analysis of discretization methods for Medical datmining with Naïve Bayesian classifiers”. pages 131163. 1997. “ A Neural Network Primer”. 2004.ics.uci. 2003. [23] Shalom M. 1137–1145. 2005. Morgan Kaufmann Publishers. [18] Ludmila I. Mark C Albert. Machine Learning. [10] Friedman N. 1988. “Ridge estimators in logistic regression”. (Technical Report) University of California. pp.J. Applied Statistics. Tsu-Tsung Wong.Simha. Geiger D. 273-297.. . University of California.. Bhubaneshwar. no 1. 1995. [4] Compton P. Vol 41. naïve Minimum Description Length (MDL) discretization seems to be the best performer compared to the considered various naïve Bayes and non-naïve Bayes classifiers. [12] Hamine V. Oxford University Press. http://www. 1993. “C4. Dec 18-21. University of New Mexico. “Probabilistic Reasoning in Intelligent Systems”.. Vol 2(3). Department of Information and Computer Science. Dean Street. [25] Ying Yang. 247-283. Pazzani M. 1999. 1993.. San Mateo. pp 309-406. “Fast Effective Rule Induction”... on an average. 29:103—130. Weiss.. vol. “UCI repository of machine learning databases”. Conclusions In this research work an attempt was made to evaluate various probabilistic classifiers that could be used for medical datamining.informatics.. [http://www.. Goldszmidt M. Ca. Greece.