instance, only if the instances are described in terms of features that are correlated with the target concept.
2. The methods are efficient: computation is proportional to the number of observed training instances.
3. The resulting decision tree provides a representation of the concept that appeals to humans because it renders the classification process self-evident.
In this paper, we have focused on the problem of minimizing test cost while maximizing accuracy. In some settings, it is more appropriate to minimize misclassification costs instead of maximizing accuracy. For the two-class problem, Elkan gives a method to minimize misclassification costs given classification probability estimates. Bradford et al. compare pruning algorithms that minimize misclassification costs. As both of these methods act independently of the decision tree growing process, they can be incorporated with our algorithms (although we leave this as future work). Ling et al. propose a cost-sensitive decision tree algorithm that optimizes both accuracy and cost. However, the cost-insensitive version of their algorithm (i.e., the algorithm run if all feature costs are zero) reduces to a splitting criterion that maximizes accuracy, which is well known to be inferior to the information gain and gain ratio criteria. Integrating machine learning with program understanding is an active area of current research. Systems that analyze root-cause errors in distributed systems and systems that find bugs using dynamic predicates may both benefit from cost-sensitive learning to decrease monitoring overhead.
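A minimal sketch of the kind of cost-sensitive decision rule discussed above, for the two-class case: given a probability estimate for the positive class and asymmetric misclassification costs, predict the class with the lower expected cost. This is an illustrative example of the general idea, not Elkan's exact formulation; the function name and cost values are assumptions.

```python
def min_cost_class(p_positive, cost_fp, cost_fn):
    """Pick the class with the lower expected misclassification cost.

    p_positive: estimated probability that the example is positive.
    cost_fp: cost of predicting positive when the truth is negative.
    cost_fn: cost of predicting negative when the truth is positive.
    """
    expected_cost_pos = (1.0 - p_positive) * cost_fp  # cost risked by saying "positive"
    expected_cost_neg = p_positive * cost_fn          # cost risked by saying "negative"
    return "positive" if expected_cost_pos <= expected_cost_neg else "negative"

# With a false negative five times as costly as a false positive, even a
# modest probability estimate tips the decision toward "positive".
print(min_cost_class(0.3, cost_fp=1.0, cost_fn=5.0))  # prints "positive"
```

Note that the rule acts only on the classifier's probability estimates, which is why methods of this kind can be layered on top of a tree-growing algorithm after the fact.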
2. Classification by Decision Tree Learning
This section briefly describes the machine learning and data mining problem of classification and ID3, a well-known algorithm for it. The presentation here is brief and simplified; we refer the reader to Mitchell for an in-depth treatment of the subject. The ID3 algorithm for generating decision trees was first introduced by Quinlan and has since become a very popular learning tool.
2.1 The Classification Problem
The aim of a classification problem is to classify transactions into one of a discrete set of possible categories. The input is a structured database comprised of attribute-value pairs. Each row of the database is a transaction and each column is an attribute taking on different values. One of the attributes in the database is designated as the class attribute; the set of possible values for this attribute are the classes. We wish to predict the class of a transaction by viewing only the non-class attributes. This can then be used to predict the class of new transactions for which the class is unknown.

For example, the weather problem is a toy data set which we will use to understand how a decision tree is built. It is reproduced with slight modifications in Witten and Frank (1999), and concerns the conditions under which some hypothetical outdoor game may be played. In this dataset, there are five categorical attributes: outlook, temperature, humidity, windy, and play. We are interested in building a system which will enable us to decide whether or not to play the game on the basis of the weather conditions, i.e., we wish to predict the value of play using outlook, temperature, humidity, and windy. We can think of the attribute we wish to predict, i.e., play, as the output attribute, and the other attributes as inputs.
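The weather data set just described can be sketched in Python as a list of attribute-value records. The rows below are a hypothetical subset for illustration, not the full table from Witten and Frank:

```python
# A small illustrative slice of the weather toy dataset (not the full table).
weather = [
    {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "windy": False, "play": "no"},
    {"outlook": "overcast", "temperature": "hot",  "humidity": "high",   "windy": False, "play": "yes"},
    {"outlook": "rainy",    "temperature": "mild", "humidity": "high",   "windy": False, "play": "yes"},
    {"outlook": "rainy",    "temperature": "cool", "humidity": "normal", "windy": True,  "play": "no"},
]

# play is the designated class (output) attribute; the rest are inputs.
inputs = [{k: v for k, v in row.items() if k != "play"} for row in weather]
labels = [row["play"] for row in weather]
print(labels)  # prints ['no', 'yes', 'yes', 'no']
```

Separating the class attribute from the input attributes in this way mirrors the problem statement: a classifier sees only the non-class columns and must predict the class column.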
2.2 Decision Trees and the ID3 Algorithm
The main ideas behind the ID3 algorithm are:
1. Each non-leaf node of a decision tree corresponds to an input attribute, and each arc to a possible value of that attribute. A leaf node corresponds to the expected value of the output attribute when the input attributes are described by the path from the root node to that leaf node.
2. In a "good" decision tree, each non-leaf node should correspond to the input attribute which is the most informative about the output attribute among all the input attributes not yet considered in the path from the root node to that node. This is because we would like to predict the output attribute using the smallest possible number of questions on average.
The ID3 algorithm assumes that each attribute is categorical, that is, containing discrete data only, in contrast to continuous data such as age, height, etc. The principle of the ID3 algorithm is as follows. The tree is constructed top-down in a recursive fashion. At the root, each attribute is tested to determine how well it alone classifies the transactions. The "best" attribute (to be discussed below) is then chosen and the remaining transactions are partitioned by it. ID3 is then recursively called on each partition (which is a smaller database containing only the appropriate transactions and without the splitting attribute).
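The recursive procedure above can be sketched as follows, using information gain as the measure of the "best" attribute. This is an illustrative implementation under the assumptions stated in the text (all attributes categorical), not the paper's own code:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def information_gain(rows, labels, attr):
    """Reduction in entropy from partitioning the rows by attr."""
    total = len(rows)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    """Grow a decision tree top-down; returns a nested dict or a class label."""
    if len(set(labels)) == 1:          # pure partition: make a leaf
        return labels[0]
    if not attrs:                      # no attributes left: majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    # Choose the most informative attribute, then recurse on each partition
    # without the splitting attribute, as described above.
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    tree = {}
    for value in set(row[best] for row in rows):
        sub_rows = [r for r in rows if r[best] == value]
        sub_labels = [lab for r, lab in zip(rows, labels) if r[best] == value]
        tree[value] = id3(sub_rows, sub_labels, remaining)
    return {best: tree}
```

Each recursive call sees a strictly smaller database and one fewer candidate attribute, so the recursion terminates; the leaves record the expected class for the attribute values along the path from the root.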
2.2.1 The ID3 algorithm is best suited for problems where:
1. Instances are represented as attribute-value pairs.
2. The target function has discrete output values.