There is, in fact, a trivial way to construct a decision tree that is consistent with all the examples.

We simply add one complete path to a leaf for each example, with the appropriate attribute values and leaf value. This trivial tree fails to extract any pattern from the examples, so we can't expect it to be able to extrapolate to examples it hasn't seen. Finding a pattern means being able to describe a large number of cases in a concise way; that is, it means finding a small, consistent tree. This is an example of a general principle of inductive learning often called Ockham's razor: the most likely hypothesis is the simplest one that is consistent with all observations. Unfortunately, finding the smallest tree is an intractable problem, but with some simple heuristics we can do a good job of finding a smallish one.

The basic idea of decision-tree algorithms such as ID3 is to test the most important attribute first. By most important, we mean the one that makes the most difference to the classification of an example. (Various measures of importance are used, based on either the information gain (Quinlan, 1986) or the minimum description length criterion (Wallace & Patrick, 1993).) In this way, we hope to get to the correct classification with the smallest number of tests, meaning that all paths in the tree will be short and the tree will be small.

ID3 chooses the best attribute as the root of the tree, then splits the examples into subsets according to their value for that attribute. Each of the subsets obtained by splitting on an attribute is essentially a new (but smaller) learning problem in itself, with one fewer attribute to choose from. The subtree along each branch is therefore constructed by calling ID3 recursively on the subset of examples.
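To make the recursion concrete, here is a minimal Python sketch of an ID3-style learner that uses information gain as the importance measure. The representation (each example as a dict mapping attribute names to values, with the label stored under 'class'), the toy data, and all function names are illustrative assumptions, not taken from the text; handling of attribute values unseen in a subset is also omitted for brevity.

import math
from collections import Counter

def entropy(examples):
    """Entropy of the class labels in a set of examples."""
    counts = Counter(e['class'] for e in examples)
    total = len(examples)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def info_gain(examples, attr):
    """Expected reduction in entropy from splitting on attr (Quinlan, 1986)."""
    total = len(examples)
    remainder = 0.0
    for value in set(e[attr] for e in examples):
        subset = [e for e in examples if e[attr] == value]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(examples) - remainder

def id3(examples, attributes):
    """Grow a tree by always testing the most important attribute first."""
    classes = [e['class'] for e in examples]
    if len(set(classes)) == 1:       # pure subset: return a leaf
        return classes[0]
    if not attributes:               # no tests left: majority-vote leaf
        return Counter(classes).most_common(1)[0][0]
    best = max(attributes, key=lambda a: info_gain(examples, a))
    tree = {best: {}}
    for value in set(e[best] for e in examples):
        subset = [e for e in examples if e[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, rest)  # smaller recursive problem
    return tree

# Illustrative use on a toy data set (values invented for the example):
examples = [
    {'outlook': 'sunny', 'windy': 'false', 'class': 'no'},
    {'outlook': 'sunny', 'windy': 'true',  'class': 'no'},
    {'outlook': 'rain',  'windy': 'false', 'class': 'yes'},
    {'outlook': 'rain',  'windy': 'true',  'class': 'no'},
]
print(id3(examples, ['outlook', 'windy']))

Because each recursive call sees one fewer attribute and a smaller set of examples, the recursion is guaranteed to terminate: it bottoms out either at a pure subset (a class leaf) or when the attributes run out (a majority-vote leaf).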