Predictive Modelling

Decision Tree

Machine Learning
Topics: Decision Tree, Naïve Bayes Classifier

Partha Basuchowdhuri

Assistant Professor,
Department of Computer Science and Engineering,
Heritage Institute of Technology,
Kolkata, INDIA

August 11, 2015

Partha Basuchowdhuri Machine Learning Topics: Decision Tree, Naïve Bayes Classifier

Outline for Part I

1 Predictive Modelling

2 Decision Tree
Introduction, Impurity Measures
Decision Tree Construction
Overfitting in Decision Tree


Machine Learning - What is it?

The idea comes from the Turing test:

whether a machine, in conversation with a human, would generate human-like responses.

Definition: Machine Learning (Mitchell, 1997)

A computer program is said to learn from experience E with
respect to some class of tasks T and performance measure P, if its
performance at tasks in T, as measured by P, improves with
experience E.


Machine Learning - How do we use it?

We use it to predict future patterns or the future behavior of a system.

How can we predict the future?

By learning from the present and the past. This knowledge from prior instances is known as a priori knowledge.

The model is known as a hypothesis. It is learned from the a priori knowledge. This phase is known as training.


Machine Learning - Significance of dataset


Usually the data is divided into three parts: training data,
validation data and test data.

Training data: The part of the dataset used to build the model.

Validation data: The model may still fail to account for a few
cases. Validation data is used to improve the model further by
taking corrective measures.

Test data: The part of the dataset kept aside to test the predictive
model once it is ready.

Analysis of the prediction results on the test data tells us how
accurate the predictive model is.
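
The three-way division can be sketched in a few lines of Python. This is only an illustration: the function name `split_dataset`, the 60/20/20 proportions and the fixed seed are assumptions made for the example, not values given in the slides.

```python
import random

def split_dataset(rows, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle rows and cut them into (train, validation, test) lists.

    The fractions are illustrative assumptions; whatever is left after
    the training and validation parts becomes the test part.
    """
    if not 0 < train_frac + val_frac < 1:
        raise ValueError("train_frac + val_frac must lie strictly between 0 and 1")
    shuffled = list(rows)                  # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Shuffling before cutting keeps the three parts from inheriting any ordering present in the original data, and the test part is never touched while the model is being built or corrected.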

How good is your predictive model?


Problem
Given a picture of 7 cats and 9 dogs, you are asked to build a
computational model (maybe image-processing based) that can
identify the dogs.

Say your model identifies 8 animals as dogs, of which 3 are actually
cats.

\[ \text{Precision} = \frac{\text{TRUE+}}{\text{Predicted TRUE}} = \frac{\text{TRUE+}}{(\text{TRUE+}) + (\text{FALSE+})} \]

\[ \text{Recall} = \frac{\text{TRUE+}}{\text{Actual TRUE}} = \frac{\text{TRUE+}}{(\text{TRUE+}) + (\text{FALSE-})} \]

                  Actual TRUE   Actual FALSE
Predicted TRUE    TRUE+         FALSE+
Predicted FALSE   FALSE-        TRUE-
Here TRUE+ = 5, FALSE+ = 3 and FALSE- = 9 - 5 = 4, so

\[ \text{Precision} = \frac{5}{5+3} = \frac{5}{8} \]
(what is the significance?)

\[ \text{Recall} = \frac{5}{5+4} = \frac{5}{9} \]
(what is the significance?)

                  Actual TRUE   Actual FALSE
Predicted TRUE    5             3
Predicted FALSE   4             4
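
The arithmetic for the cats-and-dogs problem can be checked with a short Python sketch. The function and variable names are illustrative assumptions; the counts come from the problem statement.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp)  # share of predicted dogs that really are dogs
    recall = tp / (tp + fn)     # share of actual dogs that the model found
    return precision, recall

actual_dogs = 9
predicted_dogs = 8
cats_called_dogs = 3

tp = predicted_dogs - cats_called_dogs  # dogs correctly identified
fp = cats_called_dogs                   # cats wrongly identified as dogs
fn = actual_dogs - tp                   # dogs the model missed

p, r = precision_recall(tp, fp, fn)
print(f"precision={p:.3f} recall={r:.3f}")  # precision=0.625 recall=0.556
```

Precision tells us how far a "dog" prediction can be trusted; recall tells us how many of the dogs in the picture were actually found.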


Outline for Part II

1 Predictive Modelling

2 Decision Tree
Introduction, Impurity Measures
Decision Tree Construction
Overfitting in Decision Tree


Motivation - An example that requires making a decision


Decision Tree - Dataset


Decision Tree - Disjunction of Conjunctions

Observations:
Every path from the root to a leaf node forms a rule (a conjunction
of attribute tests) that leads to the target.

The tree can be seen as a disjunction of such conjunctive rules.
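
To make the observation concrete, the sketch below walks a small tree and prints it as a disjunction of conjunctions. The tree is an assumption: it is the well-known PlayTennis tree from Mitchell (1997), used here because the slide's own figure is not available in the text. The nested-dict encoding and the name `paths_to` are also illustrative choices.

```python
# Hypothetical tree: an internal node is {attribute: {value: subtree}}, a leaf is a class label.
tree = {"Outlook": {
    "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
    "Overcast": "Yes",
    "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
}}

def paths_to(node, target, tests=()):
    """Yield every root-to-leaf path ending in `target` as a tuple of (attribute, value) tests."""
    if not isinstance(node, dict):      # reached a leaf
        if node == target:
            yield tests
        return
    (attribute, branches), = node.items()  # each internal node tests exactly one attribute
    for value, subtree in branches.items():
        yield from paths_to(subtree, target, tests + ((attribute, value),))

rules = list(paths_to(tree, "Yes"))
dnf = " OR ".join(
    "(" + " AND ".join(f"{a}={v}" for a, v in rule) + ")" for rule in rules
)
print(dnf)
# (Outlook=Sunny AND Humidity=Normal) OR (Outlook=Overcast) OR (Outlook=Rain AND Wind=Weak)
```

Each parenthesised group is one path (a conjunction of tests); joining the groups with OR gives the disjunction that the whole tree represents for the target class "Yes".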


Decision Tree - How to make decisions?


Purity in Decisions - How to understand?

We try to find pure decisions in order to build the decision tree.

How do we find pure decisions?

We need a purity/impurity measure to evaluate prospective decisions.

Three such impurity measures are:

1 Entropy
2 Gini index
3 Misclassification error


Measuring Impurity - Entropy


Entropy -
A probabilistic measure of the impurity of decisions based on a certain
combination of attributes (and attribute values). It is measured as
below,

\[ E(D) = -\sum_{i=1}^{c} p_i \log_2(p_i) \quad \text{(I)} \qquad \text{or,} \qquad E(D) = -\sum_{i=1}^{c} p_i \log_c(p_i) \quad \text{(II)} \]

c is the number of target classes and \(p_i\) is the proportion of instances in D that belong to class i.

A predictive model with c classes can have a maximum entropy of:

1 \(\log_2(c)\) for equation I.
2 1 for equation II.

Example:

\[ E([9+, 5-]) = -\frac{9}{14}\log_2\left(\frac{9}{14}\right) - \frac{5}{14}\log_2\left(\frac{5}{14}\right) = 0.940 \]
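
A minimal Python sketch of the entropy formula follows. The function name `entropy` and its `base` parameter are assumptions made for illustration: `base=2` corresponds to equation I, and passing the number of classes as `base` corresponds to equation II.

```python
import math

def entropy(counts, base=2):
    """Entropy of a node, given the number of instances in each class."""
    total = sum(counts)
    # Empty classes are skipped: the term 0 * log(0) is taken to be 0.
    return sum(-(n / total) * math.log(n / total, base) for n in counts if n > 0)

print(f"{entropy([9, 5]):.3f}")             # 0.940, the [9+, 5-] example
print(f"{entropy([7, 7]):.3f}")             # 1.000, maximum for two classes
print(f"{entropy([4, 4, 4]):.3f}")          # 1.585, i.e. log2(3), maximum under equation I
print(f"{entropy([4, 4, 4], base=3):.3f}")  # 1.000, maximum under equation II
```

The last two lines show why equation II is sometimes preferred: with the logarithm taken to base c, the maximum is 1 regardless of the number of classes.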

Nature of Entropy


Other Impurity Measures -


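Gini Index -

\[ G(D) = 1 - \sum_{i=1}^{c} p_i^2 \]

c is the number of target classes.

Misclassification Error -

\[ ME(D) = 1 - \max_{i=1}^{c}(p_i) \]

Example:

\[ G([9+, 5-]) = 1 - \left( \left(\frac{5}{14}\right)^2 + \left(\frac{9}{14}\right)^2 \right) = 0.46 \]

\[ ME([9+, 5-]) = 1 - \max\left(\frac{9}{14}, \frac{5}{14}\right) = 0.357 \]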
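
Both measures are short enough to sketch directly in Python. The function names are illustrative assumptions; the inputs are per-class instance counts, as in the [9+, 5-] example.

```python
def gini(counts):
    """Gini index: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1 - sum((n / total) ** 2 for n in counts)

def misclassification_error(counts):
    """Misclassification error: 1 minus the proportion of the largest class."""
    total = sum(counts)
    return 1 - max(counts) / total

print(f"{gini([9, 5]):.3f}")                     # 0.459
print(f"{misclassification_error([9, 5]):.3f}")  # 0.357
```

Both functions return 0 for a pure node and reach their two-class maximum of 0.5 when the classes are evenly split.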

Impurity Measures -

Example 1: Node N1, with Class 0 count = 0 and Class 1 count = 6
\(E([0+, 6-]) = -(0/6)\log_2(0/6) - (6/6)\log_2(6/6) = 0\) (taking \(0 \log_2 0 = 0\))
\(G([0+, 6-]) = 1 - (0/6)^2 - (6/6)^2 = 0\)
\(ME([0+, 6-]) = 1 - \max(0/6, 6/6) = 0\)

Example 2: Node N1, with Class 0 count = 1 and Class 1 count = 5
\(E([1+, 5-]) = -(1/6)\log_2(1/6) - (5/6)\log_2(5/6) = 0.65\)
\(G([1+, 5-]) = 1 - (1/6)^2 - (5/6)^2 = 0.278\)
\(ME([1+, 5-]) = 1 - \max(1/6, 5/6) = 0.167\)

Example 3: Node N1, with Class 0 count = 3 and Class 1 count = 3
\(E([3+, 3-]) = -(3/6)\log_2(3/6) - (3/6)\log_2(3/6) = 1\)
\(G([3+, 3-]) = 1 - (3/6)^2 - (3/6)^2 = 0.5\)
\(ME([3+, 3-]) = 1 - \max(3/6, 3/6) = 0.5\)


Nature of Impurity Measures


How to Build a Decision Tree -


Choosing the best attribute -

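Find the attribute that leads to the maximum gain, i.e. the largest reduction in entropy:

\[ Gain(D, A) = E(D) - \sum_{v \in Values(A)} \frac{|D_v|}{|D|} E(D_v) \]

\(D_v\) is the subset of D in which attribute A takes the value v.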


Choosing the best attribute -

We eventually get the following -


Gain(D, Outlook) = 0.246 Gain(D, Humidity) = 0.151
Gain(D, Wind) = 0.048 Gain(D, Temperature) = 0.029
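
The gain values can be reproduced with a short Python sketch. The per-value class counts below are an assumption: the dataset slide is not available in the text, so the counts are taken from Mitchell's (1997) PlayTennis table, whose gains agree with the figures above to within rounding in the third decimal. The function names are illustrative.

```python
import math

def entropy(counts):
    """Base-2 entropy from per-class instance counts (0 * log 0 is taken as 0)."""
    total = sum(counts)
    return sum(-(n / total) * math.log2(n / total) for n in counts if n > 0)

def gain(parent_counts, child_counts):
    """Gain(D, A) = E(D) - sum over v of |D_v|/|D| * E(D_v), computed from class counts."""
    total = sum(parent_counts)
    remainder = sum(sum(c) / total * entropy(c) for c in child_counts)
    return entropy(parent_counts) - remainder

# ASSUMED [Yes, No] counts per attribute value (Mitchell's PlayTennis table, not given in the text).
splits = {
    "Outlook": [[2, 3], [4, 0], [3, 2]],      # Sunny, Overcast, Rain
    "Humidity": [[3, 4], [6, 1]],             # High, Normal
    "Wind": [[6, 2], [3, 3]],                 # Weak, Strong
    "Temperature": [[2, 2], [4, 2], [3, 1]],  # Hot, Mild, Cool
}

gains = {attribute: gain([9, 5], children) for attribute, children in splits.items()}
for attribute, g in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"{attribute:12s} {g:.4f}")

best = max(gains, key=gains.get)
print("best attribute:", best)
```

Under these assumed counts Outlook has the largest gain, so it would be chosen as the root of the tree.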


Iteratively continue the process -

Consider the extracted subset to be a new dataset and repeat the
same procedure on it.

On the extracted subset, the maximum gain is obtained with Humidity.
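
The repeat-on-each-subset procedure is naturally recursive. The sketch below is a minimal ID3-style builder, not a definitive implementation. The 14 rows are an assumption (Mitchell's PlayTennis table, since the dataset slide is missing from the text), and the names `id3`, `partition` and `ATTRIBUTES` are illustrative.

```python
import math
from collections import Counter

# ASSUMED data: Mitchell's PlayTennis table; the last column is the class label.
ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]
ROWS = [row.split() for row in """
Sunny Hot High Weak No
Sunny Hot High Strong No
Overcast Hot High Weak Yes
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Overcast Cool Normal Strong Yes
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Overcast Mild High Strong Yes
Overcast Hot Normal Weak Yes
Rain Mild High Strong No
""".strip().splitlines()]

def entropy(rows):
    counts = Counter(r[-1] for r in rows)
    return sum(-(n / len(rows)) * math.log2(n / len(rows)) for n in counts.values())

def partition(rows, col):
    """Group rows by their value in column `col`."""
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r)
    return groups

def gain(rows, col):
    parts = partition(rows, col).values()
    return entropy(rows) - sum(len(p) / len(rows) * entropy(p) for p in parts)

def id3(rows, cols):
    labels = Counter(r[-1] for r in rows)
    if len(labels) == 1 or not cols:
        return labels.most_common(1)[0][0]  # leaf: pure node, or no attributes left
    best = max(cols, key=lambda c: gain(rows, c))
    rest = [c for c in cols if c != best]
    # Each extracted subset is treated as a new dataset and the procedure is repeated on it.
    return {ATTRIBUTES[best]: {value: id3(subset, rest)
                               for value, subset in partition(rows, best).items()}}

tree = id3(ROWS, list(range(len(ATTRIBUTES))))
print(tree)
```

With these assumed rows, the Sunny subset is split on Humidity and the Rain subset on Wind, while the Overcast subset is already pure and becomes a leaf.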


Final Decision Tree

The final Decision Tree looks like this.


Overfitting - Decision Tree


Definition - Overfitting
Given a hypothesis space H, a hypothesis h ∈ H is said to overfit
the training data if there exists some alternative hypothesis h′ ∈ H,
such that h has smaller error than h′ over the training examples,
but h′ has a smaller error than h over the entire distribution of
instances.
With too few nodes, the model may underfit, whereas with too
many nodes the model may overfit.


How to correct overfitting - Pruning


We use a part of the initial dataset (30% of the training data) to
correct the model. This dataset is known as validation data.
Start from the leaf nodes and move bottom-up.
If the entropy at the leaf level is more than that of the parent,
then prune the child nodes of that parent.
This method is known as Reduced Error Pruning. As usually stated
(Mitchell, 1997), a subtree is removed only if the pruned tree
performs no worse on the validation data. There are many other
pruning methods as well (Rule Post-Pruning, etc.).
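
A bottom-up pruning pass can be sketched as follows. Note the swap: this sketch uses the validation-error criterion from the usual statement of Reduced Error Pruning rather than an entropy comparison. The tree encoding (`attr`, `children`, `majority`), the toy tree and the validation rows are all hypothetical, chosen only to show one subtree being pruned.

```python
def predict(node, row):
    """Follow the attribute tests in `node` until a leaf label is reached."""
    while isinstance(node, dict):
        # An attribute value with no branch falls back to the node's majority class.
        node = node["children"].get(row[node["attr"]], node["majority"])
    return node

def prune(node, rows):
    """Prune bottom-up; `rows` are the validation rows that reach `node`."""
    if not isinstance(node, dict):
        return node
    for value, child in list(node["children"].items()):
        reaching = [r for r in rows if r[node["attr"]] == value]
        node["children"][value] = prune(child, reaching)  # children first (bottom-up)
    subtree_errors = sum(predict(node, r) != r["label"] for r in rows)
    leaf_errors = sum(node["majority"] != r["label"] for r in rows)
    # Replace the subtree by a majority-class leaf when that is no worse on validation data.
    return node["majority"] if leaf_errors <= subtree_errors else node

# Hypothetical tree; "majority" is the most common training class at that node.
tree = {"attr": "A", "majority": "pos", "children": {
    0: "neg",
    1: {"attr": "B", "majority": "pos", "children": {0: "pos", 1: "neg"}},
}}
# Hypothetical validation rows.
validation = [
    {"A": 1, "B": 1, "label": "pos"},
    {"A": 1, "B": 0, "label": "pos"},
    {"A": 0, "B": 0, "label": "neg"},
    {"A": 0, "B": 1, "label": "neg"},
]

pruned = prune(tree, validation)
print(pruned)
```

In this toy case the test on B is pruned because a single "pos" leaf makes fewer validation errors than the subtree, while the test on A is kept because replacing it with a leaf would misclassify both "neg" rows. The sketch modifies the tree in place, and it also prunes a subtree that no validation row reaches, since a leaf is then trivially no worse.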
