
Decision Trees

Arun Kumar

IIT Ropar

Outline

1 Elements of Information Theory

2 Decision Tree Classification for Categorical Data

3 Decision Tree Regression

History

• Information theory was introduced in 1948 by Shannon.

• The theory came into existence in connection with the problem of transmitting information along communication channels.

• “Information” in itself is a very general, qualitative, subjective and not very precise concept.

• However, information theory has developed into a quantitative, precise, objective and very useful theory.

Shannon’s Information

• Let the PMF (p1, . . . , pn) of the random variable X be given.

• The question posed by Shannon is the following: “Can we find a measure of how uncertain we are of the outcome?”

• Shannon then assumed that if such a function, denoted H(p1, · · · , pn), exists, it is reasonable to expect that it will have the following properties.
  a H should be continuous in all the pi.
  b If all the pi are equal, i.e., pi = 1/n, then H should have a maximum value, and this maximum value should be a monotonically increasing function of n.
  c If a choice is broken down into successive choices, the quantity H should be the weighted sum of the individual values of H.

• The entropy function is defined by

$$H(p_1, p_2, \dots, p_n) = -\sum_{i=1}^{n} p_i \log(p_i).$$

The sketch below checks property (b) numerically.
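A minimal sketch in Python (the helper name `entropy` is illustrative, not from the slides): for the uniform PMF pi = 1/n, H equals log2(n), which grows monotonically with n, exactly as property (b) requires.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p1, ..., pn) = -sum_i p_i log2(p_i)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # by convention, 0 * log 0 = 0
    return -np.sum(p * np.log2(p))

# Property (b): for the uniform PMF p_i = 1/n, H = log2(n),
# which increases monotonically with n.
for n in (2, 4, 8):
    print(n, entropy(np.full(n, 1.0 / n)))   # 1.0, 2.0, 3.0
```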

Measure of Impurity

Entropy
Entropy for a set S is given by

$$H(S) = -\sum_{c \in C} p(c)\,\log_2 p(c),$$

where C is the set of classes in S and p(c) is the proportion of class c in S.

Gini
Gini impurity for a set S, where the target variable takes N different labels, is given by

$$\mathrm{Gini}(S) = \sum_{i \neq j} p(i)\,p(j) = 1 - \sum_{i=1}^{N} p(i)^2,$$

where p(i) is the proportion of label i in the set S.
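Both measures can be computed directly from class proportions. A minimal sketch, assuming the labels are given as a plain Python list (the 9-versus-5 toy split below is illustrative):

```python
from collections import Counter
import math

def impurities(labels):
    """Return (entropy, Gini impurity) of a list of class labels."""
    n = len(labels)
    props = [count / n for count in Counter(labels).values()]
    entropy = -sum(p * math.log2(p) for p in props)
    gini = 1.0 - sum(p * p for p in props)
    return entropy, gini

# Toy set: 9 positive and 5 negative examples
labels = ["yes"] * 9 + ["no"] * 5
print(impurities(labels))   # approx. (0.940, 0.459)
```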

Decision Tree Introduction

• The decision tree algorithm belongs to the family of supervised learning algorithms.

• Decision trees can be used for solving classification as well as regression problems.

• Decision trees classify instances by sorting them down the tree from the root node to some leaf node; the leaf node assigns the class label (see the sketch after this list).

• Each internal node of the tree specifies a test of some attribute of the instance, and each branch emanating from that node corresponds to one of the possible values of that attribute.
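A minimal sketch of this sorting-down process, assuming a nested-dict tree representation (the tree and attribute names below are hypothetical, loosely following the weather example referenced later):

```python
# Internal nodes test one attribute; branches are attribute values;
# leaves are plain class labels.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attribute": "wind",
                 "branches": {"weak": "yes", "strong": "no"}},
    },
}

def classify(node, instance):
    """Sort an instance down the tree until a leaf label is reached."""
    while isinstance(node, dict):
        node = node["branches"][instance[node["attribute"]]]
    return node

print(classify(tree, {"outlook": "sunny", "humidity": "normal"}))  # yes
```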

Sample Decision Tree

Algorithms to build decision trees

• ID3 (Iterative Dichotomiser 3): entropy and information gain as metrics

• CART (Classification and Regression Trees): Gini index as metric

• Decision tree regression: recursive binary splitting with RSS as the metric

• Others

ID3 Algorithm based on Weather Data

Based on “Machine Learning” by T. Mitchell, Ch. 3.
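The slide's worked table is not reproduced here. As a minimal sketch of the quantity ID3 optimizes, assuming the weather data is encoded as a list of dicts with a "play" target (hypothetical encoding; the full recursive tree construction is omitted):

```python
from collections import Counter
import math

def entropy_of(rows, target="play"):
    """Entropy of the target-label distribution over the given rows."""
    n = len(rows)
    return -sum((c / n) * math.log2(c / n)
                for c in Counter(r[target] for r in rows).values())

def information_gain(rows, attribute, target="play"):
    """Gain(S, A) = H(S) - sum over values v of |S_v|/|S| * H(S_v)."""
    n = len(rows)
    gain = entropy_of(rows, target)
    for v in set(r[attribute] for r in rows):
        subset = [r for r in rows if r[attribute] == v]
        gain -= (len(subset) / n) * entropy_of(subset, target)
    return gain

# ID3 greedily splits on the attribute with the highest information gain,
# then recurses on each branch until the subsets are pure.
rows = [{"outlook": "sunny", "play": "no"},
        {"outlook": "sunny", "play": "no"},
        {"outlook": "overcast", "play": "yes"},
        {"outlook": "rain", "play": "yes"}]
print(information_gain(rows, "outlook"))   # 1.0 for this toy split
```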
Final Decision Tree

Recursive Binary Tree Splitting Algorithm

• Suppose there are p predictors.

• We find the predictor Xj and the cutpoint s such that splitting the predictor space into the regions {X | Xj < s} and {X | Xj ≥ s} leads to the greatest reduction in the Residual Sum of Squares (RSS).

• In detail, for any j and s, define the pair of half-planes R1(j, s) = {X | Xj < s} and R2(j, s) = {X | Xj ≥ s}, and search for the pair (j, s) that minimizes the RSS given by

$$\sum_{i:\, x_i \in R_1(j,s)} (y_i - \hat{y}_{R_1})^2 + \sum_{i:\, x_i \in R_2(j,s)} (y_i - \hat{y}_{R_2})^2,$$

where ŷR1 is the mean response for the training observations in R1(j, s), and ŷR2 is the mean response for the training observations in R2(j, s).
Based on “An Introduction to Statistical Learning with Applications in R”, Chapter 8, Page 306.
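A minimal sketch of this exhaustive search, assuming numeric features in a NumPy array (the helper name `best_split` and the synthetic data are illustrative):

```python
import numpy as np

def best_split(X, y):
    """Search every (predictor j, cutpoint s) pair and return the one that
    minimizes RSS over the half-planes R1 = {Xj < s} and R2 = {Xj >= s}."""
    best_j, best_s, best_rss = None, None, np.inf
    for j in range(X.shape[1]):
        for s in np.unique(X[:, j]):
            left, right = y[X[:, j] < s], y[X[:, j] >= s]
            if len(left) == 0 or len(right) == 0:
                continue
            rss = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
            if rss < best_rss:
                best_j, best_s, best_rss = j, s, rss
    return best_j, best_s, best_rss

# Synthetic data with a step in predictor 0 at 0.5
rng = np.random.default_rng(0)
X = rng.uniform(size=(50, 2))
y = np.where(X[:, 0] < 0.5, 1.0, 3.0) + rng.normal(scale=0.1, size=50)
print(best_split(X, y))   # recovers predictor 0 with a cutpoint near 0.5
```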
Data

Final Decision Tree Based On Python Sklearn
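The fitted tree from the original slide is not reproduced here. A minimal sketch of fitting and printing such a tree with scikit-learn, on hypothetical synthetic data (the slide's actual dataset is not shown):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical one-dimensional regression data
rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(scale=0.1, size=100)

# Each split is chosen by recursive binary splitting to minimize RSS
# (criterion="squared_error" is the default); max_depth limits the tree.
model = DecisionTreeRegressor(max_depth=2).fit(X, y)
print(export_text(model, feature_names=["x0"]))
```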

References

• Beazley, D. M. (2009). Python: Essential Reference (4th ed.). Pearson Education, Inc.

• James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. Springer New York.

• Mitchell, T. M. (2017). Machine Learning. McGraw Hill Education.

• https://www.superdatascience.com/

