Table of Contents

1 General Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . .  2
  1.1 Taxonomy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  2
  1.2 Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
  1.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  3
2 Theory of Decision Trees . . . . . . . . . . . . . . . . . . . . . . . . . .  5
  2.0.1 Splitting Criterion . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3 Summary & Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . .  7
  3.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  7
A Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  8

Title page graphic: Copyright by Benni from The Noun Project


1 General Introduction
1.1 Taxonomy
Before we approach the theory behind decision trees, a brief overview of the taxonomy shall be given.

Decision tree classification is very often used in the context of data mining and machine learning. These terms are not synonyms, although they are frequently used as such. Machine learning cannot be seen as a true subset of data mining, as it also contains fields that are not utilized for data mining (e.g. the theory of learning, computational learning theory, and reinforcement learning).

The figure below shows the machine learning context for decision trees.

[Figure: tree diagram — machine learning algorithms divide into supervised, unsupervised, and reinforcement learning; supervised learning divides into classification and regression; decision trees, artificial neural networks, and support vector machines are listed among the classification methods.]

Figure 1: The context of decision trees in machine learning.

1.2 Definitions
Definition 1. A tree is a directed, connected graph with one root node. Every other node has a single predecessor (parent) and zero or more successors (children). Nodes without successors are called leaves. All nodes are connected by edges. The depth of a node is the number of edges on the path to the root. The height of the whole tree is the number of edges on the longest path from the root to any leaf.
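As a purely illustrative aid (the class and method names below are not from the paper), this definition can be sketched in a few lines of Python:

    # A minimal sketch of Definition 1: a node holds zero or more children;
    # nodes without children are leaves. Names are illustrative only.
    class Node:
        def __init__(self, children=None):
            self.children = children or []

        def height(self):
            # number of edges on the longest path from this node to any leaf
            if not self.children:
                return 0
            return 1 + max(child.height() for child in self.children)

For example, Node([Node(), Node([Node()])]).height() evaluates to 2.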


Definition 2. A decision tree is a tree with the following equivalents:


Tree   Decision tree equivalent
Root   Initial decision node
Node   Internal decision node for testing on an attribute
Edge   Rule to follow
Leaf   Terminal node representing the resulting classification

As mentioned in subsection 1.1, machine learning is a set of algorithms that extract models representing patterns from data and then evaluate those models. Let us define four relevant terms which are important for understanding the following algorithm descriptions: instance, attribute, class, and dataset.

Definition 3. The input of a machine learning algorithm consists of a set of instances (e.g. rows, examples, or observations). Each instance is described by a fixed number of attributes (i.e. columns), which are assumed to be either nominal or numeric, and a label, which is called the class (in the case of a classification task). The set of all instances is called the dataset.

Following this definition, we get a table containing the dataset: each decision becomes an attribute (all binary relations), all leaves are classes, and each row represents an instance of the dataset.

Normally, the transformation goes the other way: the data is collected in table form (e.g. databases) and a decision tree has to be generated from it.
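To make these terms concrete, the following minimal Python sketch shows one possible in-memory representation of a dataset; the attribute names and values are invented for illustration and are not taken from the paper.

    # One possible representation of Definition 3: each instance maps attribute
    # names to nominal or numeric values; class labels are kept in a parallel
    # list; the dataset is the set of all labelled instances.
    instances = [
        {"outlook": "sunny",    "humidity": 85, "windy": False},
        {"outlook": "rainy",    "humidity": 70, "windy": True},
        {"outlook": "overcast", "humidity": 65, "windy": False},
    ]
    classes = ["no", "no", "yes"]               # class label per instance
    dataset = list(zip(instances, classes))     # all labelled instances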

1.3 Applications
Decision trees have a wide field of applications. In this subsection, some examples of these applications are listed.

Astronomy [salzberg1995decision] applied decision tree learning to the task of distinguishing between stars and cosmic rays in images collected by the Hubble Space Telescope.

Chemistry The relationship between the research octane number and molecular substructures was explored in [blurock1995automatic].

Medicine In [vlahou2003diagnosis], decision trees are applied to the diagnosis of ovarian cancer.

Economy The results of a research project on decision trees used in stock trading were published in [wu2006effective].


Geography [lagacherie1997addressing] used classification trees to predict and correct errors in topographical and geological data.

2 Theory of Decision Trees
2.0.1 Splitting Criterion
The comparison of splitting criteria is a frequently visited research topic. Although there is no extraordinary difference between them, each splitting criterion is superior in some cases and inferior in others, so no criterion is best in general.

Information Gain The idea of information gain is based on information theory, which was introduced by Claude Elwood Shannon in 1948. Let node N represent or hold the tuples of partition D.

Definition 4. The entropy and the information gain are defined as

\[
\mathrm{Gain}(A) = -\sum_{i=1}^{m} p_i \log_2(p_i) \;-\; \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \left( -\sum_{i=1}^{m} p_i \log_2(p_i) \right) . \tag{1}
\]

where $p_i$ is the nonzero probability that an arbitrary tuple in D belongs to class $C_i$, and $D_j$ contains those tuples in D that have outcome $a_j$ of A. The first term in the expression above is the entropy, and the second term is the weighted entropy of the child nodes. The difference thus reflects the decrease in entropy, i.e. the information gained, from splitting on attribute A.
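As an informal illustration of Equation (1), the following Python sketch estimates the probabilities $p_i$ as relative class frequencies; the function names and the (instance, class) pair representation of the dataset (as sketched in subsection 1.2) are assumptions, not part of the paper.

    from collections import Counter
    from math import log2

    def entropy(labels):
        # -sum(p_i * log2(p_i)) over the class distribution of a partition
        total = len(labels)
        return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

    def information_gain(dataset, attribute):
        # Gain(A): entropy of D minus the weighted entropy of the partitions D_j
        # induced by the outcomes of attribute A (Equation 1).
        labels = [cls for _, cls in dataset]
        partitions = {}
        for instance, cls in dataset:
            partitions.setdefault(instance[attribute], []).append(cls)
        weighted = sum(len(part) / len(dataset) * entropy(part)
                       for part in partitions.values())
        return entropy(labels) - weighted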

Gain Ratio The information gain measure is biased toward tests with many outcomes. The gain ratio is an extension of information gain which attempts to overcome this bias. It applies a kind of normalization to the information gain using a split information value, defined analogously to Entropy(D) as:

Definition 5.
\[
\mathrm{SplitEntropy}_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\!\left(\frac{|D_j|}{|D|}\right) . \tag{2}
\]

This value represents the potential information generated by splitting the training data
set, D, into v partitions, corresponding to the v outcomes of a test on attribute A. The
gain ratio is defined as:


Definition 6.
\[
\mathrm{GainRatio}(A) = \frac{\mathrm{Gain}(A)}{\mathrm{SplitEntropy}_A(D)} . \tag{3}
\]

The attribute with the maximum gain ratio is selected as the splitting attribute.
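Continuing the sketch above and reusing its imports and the information_gain helper, the split entropy of Equation (2) and the gain ratio of Equation (3) could be computed as follows; the names are again illustrative only.

    def split_entropy(dataset, attribute):
        # SplitEntropy_A(D): potential information of the split itself (Equation 2)
        counts = Counter(instance[attribute] for instance, _ in dataset)
        total = len(dataset)
        return -sum((n / total) * log2(n / total) for n in counts.values())

    def gain_ratio(dataset, attribute):
        # GainRatio(A) = Gain(A) / SplitEntropy_A(D) (Equation 3)
        return information_gain(dataset, attribute) / split_entropy(dataset, attribute)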

Gini Index Using the notation previously described, the Gini index measures the
impurity of D, a data partition or set of training tuples, as

Definition 7.
\[
\mathrm{Gini}(D) = 1 - \sum_{i=1}^{m} p_i^2 . \tag{4}
\]

The Gini index considers a binary split for each attribute.
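A corresponding sketch for the Gini index of a partition, again estimating $p_i$ by relative class frequencies (Equation 4), might look as follows:

    def gini(labels):
        # Gini(D) = 1 - sum(p_i^2) over the class distribution of a partition
        total = len(labels)
        return 1 - sum((n / total) ** 2 for n in Counter(labels).values())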

3 Summary & Conclusion
3.1 Summary
In the scope of this paper, a brief introduction to decision trees was given. The introductory example showed the working principle and advantages of decision trees. An overview of machine learning approaches helped to see the bigger picture.

The theory part started with some necessary definitions which are used in the following parts. The basic top-down induction of decision trees algorithm was introduced, and the options for improving this framework were described mathematically.

Four decision tree algorithms were selected and presented to the reader: CHAID, ID3, CART, and C4.5. All facts from the previous sections were compared and traded off against one another. Due to the limited scope of this seminar paper, some parts were not considered in detail, but a small outlook on these interesting topics is given.

The last part shows some examples where decision trees find their application in the real world and how many-faceted this field is. The programming example with a small code documentation completes the picture of decision trees.

A Bibliography
