Implementation of ID3 – Decision Tree Algorithm
*sharadverma@live.in, Truba College of Engineering & Science / Computer Engineering, Indore, India
**nikitaj.01@gmail.com, Truba College of Engineering & Science / Computer Engineering, Indore, India
Abstract
In this paper we address decision tree learning, an algorithm family that has been used successfully in expert systems to capture knowledge. The main task performed in these systems is applying inductive methods to the given attribute values of an unknown object in order to determine its classification according to decision tree rules. We focus on the problem of decision tree learning with the popular ID3 algorithm. Such algorithms have a wide range of applications, including churn prediction, fraud detection, artificial intelligence, and credit card rating. Although many classification algorithms are available in the literature, decision trees are the most commonly used because they are easier to implement and to understand than other classification algorithms.
Keywords:

Data mining, Decision trees & ID3 Algorithm.
1. Introduction
A decision tree is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision. Decision trees are commonly used for gaining information for the purpose of decision making. A decision tree starts with a root node on which users take actions. From this node, each node is split recursively according to the decision tree learning algorithm. The final result is a decision tree in which each branch represents a possible scenario of a decision and its outcome. We demonstrate this on ID3, a well-known and influential algorithm for the task of decision tree learning. We note that extensions of ID3 are widely used in real market applications.

ID3 is a simple decision tree learning algorithm developed by Ross Quinlan (1983). The basic idea of the ID3 algorithm is to construct the decision tree by employing a top-down, greedy search through the given sets to test each attribute at every tree node. In order to select the attribute that is most useful for classifying a given set, we introduce a metric: information gain. To find an optimal way to classify a learning set, we need to minimize the number of questions asked (i.e., minimize the depth of the tree). Thus, we need a function that can measure which questions provide the most balanced splitting. The information gain metric is such a function.
1.1 Entropy
In information theory, entropy is a measure of the uncertainty about a source of messages. The more uncertain a receiver is about a source of messages, the more information that receiver will need in order to know what message has been sent.
Entropy(S) = − Σ_{i=1..c} p_i log2 p_i

where c is the number of classes and p_i is the proportion of instances in S belonging to class i.
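The entropy measure above can be computed directly from a list of class labels. The following is a minimal Python sketch (the function name and list-of-labels representation are our own, not from the paper):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    # Each class contributes -(p_i) * log2(p_i), where p_i = count / n.
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

# A 50/50 split is maximally uncertain for two classes: 1 bit.
print(entropy(["yes", "no", "yes", "no"]))  # prints 1.0
# A pure set carries no uncertainty.
print(entropy(["yes", "yes", "yes"]))       # prints 0.0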
1.2 Information Gain
Information gain measures the expected reduction in entropy. As we mentioned before, to minimize the decision tree depth we need to select, at each tree node, the optimal attribute for splitting, and we can easily see that the attribute yielding the greatest entropy reduction is the best choice. We define the information gain of an attribute as the expected reduction of entropy when a decision tree node is split on that attribute.
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)

where Values(A) is the set of possible values of attribute A, and S_v is the subset of S for which attribute A has value v.
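The gain formula above can be sketched in Python as follows; `rows` is assumed to be a list of dicts mapping attribute names to values, a representation we chose for illustration (it is not specified in the paper):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Expected reduction in entropy from splitting `rows` on `attr`."""
    n = len(rows)
    total = entropy([r[target] for r in rows])
    # Weight each partition's entropy by its relative size |S_v| / |S|.
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return total - remainder
```

For example, an attribute that perfectly separates the classes has a gain equal to the full entropy of the set, while an attribute that is independent of the class has a gain of zero.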
For inductive learning, decision tree learning is attractive for three reasons:
1. A decision tree generalizes well to unobserved instances, provided the instances are described in terms of features that are correlated with the target concept.
2. The methods are computationally efficient, with cost proportional to the number of observed training instances.
3. The resulting decision tree provides a representation of the concept that appeals to humans because it renders the classification process self-evident.
1.3 Related Work
In this paper, we have focused on the problem of minimizing test cost while maximizing accuracy. In some settings, it is more appropriate to minimize misclassification costs instead of maximizing accuracy. For the two-class problem, Elkan gives a method to minimize misclassification costs given classification probability estimates. Bradford et al. compare pruning algorithms that minimize misclassification costs. As both of these methods act independently of the decision tree growing process, they can be incorporated with our algorithms (although we leave this as future work). Ling et al. propose a cost-sensitive decision tree algorithm that optimizes both accuracy and cost. However, the cost-insensitive version of their algorithm (i.e., the algorithm run when all feature costs are zero) reduces to a splitting criterion that maximizes accuracy, which is well known to be inferior to the information gain and gain ratio criteria. Integrating machine learning with program understanding is an active area of current research. Systems that analyze root-cause errors in distributed systems and systems that find bugs using dynamic predicates may both benefit from cost-sensitive learning to decrease monitoring overhead.
2. Classification by Decision Tree Learning
This section briefly describes the machine learning and data mining problem of classification, and ID3, a well-known algorithm for it. The presentation here is rather simplistic and very brief, and we refer the reader to Mitchell [12] for an in-depth treatment of the subject. The ID3 algorithm for generating decision trees was first introduced by Quinlan in [15] and has since become a very popular learning tool.
2.1 The Classification Problem
The aim of a classification problem is to classify transactions into one of a discrete set of possible categories. The input is a structured database comprised of attribute-value pairs. Each row of the database is a transaction and each column is an attribute taking on different values. One of the attributes in the database is designated as the class attribute, and the set of possible values for this attribute forms the set of classes. We wish to predict the class of a transaction by viewing only the non-class attributes. This can then be used to predict the class of new transactions for which the class is unknown. For example, the weather problem is a toy data set which we will use to understand how a decision tree is built. It is reproduced with slight modifications in Witten and Frank (1999), and concerns the conditions under which some hypothetical outdoor game may be played. In this dataset there are five categorical attributes: outlook, temperature, humidity, windy, and play. We are interested in building a system which will enable us to decide whether or not to play the game on the basis of the weather conditions, i.e., we wish to predict the value of play using outlook, temperature, humidity, and windy. We can think of the attribute we wish to predict, i.e., play, as the output attribute, and the other attributes as input attributes.
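In attribute-value form, each transaction of the weather data is a row mapping the four input attributes and the class attribute to their values. A few illustrative rows (the exact values shown here are representative, not copied from the paper):

```python
# A few illustrative rows of the weather dataset; "play" is the
# designated class (output) attribute.
weather = [
    {"outlook": "sunny",    "temperature": "hot",  "humidity": "high",   "windy": False, "play": "no"},
    {"outlook": "overcast", "temperature": "hot",  "humidity": "high",   "windy": False, "play": "yes"},
    {"outlook": "rainy",    "temperature": "mild", "humidity": "high",   "windy": False, "play": "yes"},
    {"outlook": "rainy",    "temperature": "cool", "humidity": "normal", "windy": True,  "play": "no"},
]

# The input attributes are every column except the class attribute.
input_attrs = [a for a in weather[0] if a != "play"]
print(input_attrs)  # prints ['outlook', 'temperature', 'humidity', 'windy']
```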
2.2 Decision Trees and the ID3 Algorithm
The main ideas behind the ID3 algorithm are:
1. Each non-leaf node of a decision tree corresponds to an input attribute, and each arc to a possible value of that attribute. A leaf node corresponds to the expected value of the output attribute when the input attributes are described by the path from the root node to that leaf node.
2. In a "good" decision tree, each non-leaf node should correspond to the input attribute which is the most informative about the output attribute amongst all the input attributes not yet considered in the path from the root node to that node. This is because we would like to predict the output attribute using the smallest possible number of questions on average.

The ID3 algorithm assumes that each attribute is categorical, that is, containing discrete values only, in contrast to continuous data such as age, height, etc. The principle of the ID3 algorithm is as follows. The tree is constructed top-down in a recursive fashion. At the root, each attribute is tested to determine how well it alone classifies the transactions. The "best" attribute (to be discussed below) is then chosen and the remaining transactions are partitioned by it. ID3 is then recursively called on each partition (which is a smaller database containing only the appropriate transactions, and without the splitting attribute).
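The recursive procedure just described can be sketched as follows. This is a minimal illustration under our own representation choices (rows as dicts, trees as nested pairs), not the authors' implementation:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Expected entropy reduction from splitting `rows` on `attr`."""
    n = len(rows)
    total = entropy([r[target] for r in rows])
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == v]
        remainder += len(subset) / n * entropy(subset)
    return total - remainder

def id3(rows, attrs, target):
    """Build a decision tree top-down.

    Returns either a class label (leaf) or a pair
    (attribute, {value: subtree}) for an internal node."""
    labels = [r[target] for r in rows]
    # Base case 1: all transactions in this partition share one class.
    if len(set(labels)) == 1:
        return labels[0]
    # Base case 2: no attributes left to test; predict the majority class.
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Choose the "best" attribute: the one with the highest information gain.
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    branches = {}
    for v in {r[best] for r in rows}:
        partition = [r for r in rows if r[best] == v]
        # Recurse on the smaller database, without the splitting attribute.
        branches[v] = id3(partition, [a for a in attrs if a != best], target)
    return (best, branches)
```

Note that each recursive call removes the splitting attribute from consideration, matching the requirement that a path from the root never tests the same attribute twice.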
2.2.1 The ID3 algorithm is best suited for problems where:
1. Instances are represented as attribute-value pairs.
2. The target function has discrete output values.