
Decision Trees

Outline
• What is a Decision Tree (DT)
• Attribute Measure
  • Entropy, information gain
• ID3 (Decision Tree Algorithm)
• Decision Tree Construction Examples

Classification Problem

Goal: Given a dataset D, learn a function that assigns each record to one of several predefined classes.
Classification Problem Example

Goals and Requirements


Goals:
• To produce an accurate classifier function
• To understand the structure of the problem
Requirements on the model:
• High accuracy
• Fast construction for very large training databases

Decision Trees
• Decision Trees (DTs) are a non-parametric supervised learning method used for classification and regression.
• A method for approximating discrete-valued target functions.
• The learned function is represented by a decision tree.

(Figure: a generic decision tree. Each internal node tests an attribute, each branch corresponds to one of that attribute's values, and each leaf assigns a class.)

Decision Trees: Dataset

(Table: the play-tennis training set — examples D1–D14 with attributes Outlook, Temperature, Humidity, Wind, and the class Play = Yes/No.)

Decision Trees
Let's start building the tree from scratch. We first need to decide which attribute to split on. Let's say we selected “humidity”.

Humidity
  high   → {D1, D2, D3, D4, D8, D12, D14}
  normal → {D5, D6, D7, D9, D10, D11, D13}

(In the original figures, violet marks "No" examples and green marks "Yes" examples.)

Decision Trees
Now let's split the first subset {D1, D2, D3, D4, D8, D12, D14} using attribute “wind”.

Humidity
  high   → {D1, D2, D3, D4, D8, D12, D14}
  normal → {D5, D6, D7, D9, D10, D11, D13}

Decision Trees
Subset {D1, D2, D3, D4, D8, D12, D14} split by attribute “wind”.

Humidity
  high → wind
    strong → {D2, D12, D14}
    weak   → {D1, D3, D4, D8}
  normal → {D5, D6, D7, D9, D10, D11, D13}

Decision Trees
Now let's split the subset {D2, D12, D14} using attribute “outlook”.

Humidity
  high → wind
    strong → {D2, D12, D14}
    weak   → {D1, D3, D4, D8}
  normal → {D5, D6, D7, D9, D10, D11, D13}


Decision Trees
Subset {D2, D12, D14} split by attribute “outlook”.

Humidity
  high → wind
    strong → outlook
      Sunny → No, Rain → No, Overcast → Yes
    weak → {D1, D3, D4, D8}
  normal → {D5, D6, D7, D9, D10, D11, D13}

Decision Trees
Now let's split the subset {D1, D3, D4, D8} using attribute “outlook”.

Humidity
  high → wind
    strong → outlook
      Sunny → No, Rain → No, Overcast → Yes
    weak → {D1, D3, D4, D8}
  normal → {D5, D6, D7, D9, D10, D11, D13}

Decision Trees
Subset {D1, D3, D4, D8} split by “outlook”.

Humidity
  high → wind
    strong → outlook
      Sunny → No, Rain → No, Overcast → Yes
    weak → outlook
      Sunny → No, Rain → Yes, Overcast → Yes
  normal → {D5, D6, D7, D9, D10, D11, D13}

Decision Trees
Now let's split the subset {D5, D6, D7, D9, D10, D11, D13} using attribute “outlook”.

Humidity
  high → wind
    strong → outlook
      Sunny → No, Rain → No, Overcast → Yes
    weak → outlook
      Sunny → No, Rain → Yes, Overcast → Yes
  normal → {D5, D6, D7, D9, D10, D11, D13}

Decision Trees
Subset {D5, D6, D7, D9, D10, D11, D13} split by “outlook”.

Humidity
  high → wind
    strong → outlook
      Sunny → No, Rain → No, Overcast → Yes
    weak → outlook
      Sunny → No, Rain → Yes, Overcast → Yes
  normal → outlook
    Sunny → Yes, Rain → {D5, D6, D10}, Overcast → Yes

Decision Trees
Finally, split the subset {D5, D6, D10} by “wind”.

Humidity
  high → wind
    strong → outlook
      Sunny → No, Rain → No, Overcast → Yes
    weak → outlook
      Sunny → No, Rain → Yes, Overcast → Yes
  normal → outlook
    Sunny → Yes, Rain → {D5, D6, D10}, Overcast → Yes

Decision Trees
Subset {D5, D6, D10} split by “wind”; the tree is now complete.

Humidity
  high → wind
    strong → outlook
      Sunny → No, Rain → No, Overcast → Yes
    weak → outlook
      Sunny → No, Rain → Yes, Overcast → Yes
  normal → outlook
    Sunny → Yes
    Rain → wind
      strong → No, weak → Yes
    Overcast → Yes

Decision Trees
Note: the decision tree can be expressed as a logical expression or as if-then-else statements.

Humidity
  high → wind
    strong → outlook
      Sunny → No, Rain → No, Overcast → Yes
    weak → outlook
      Sunny → No, Rain → Yes, Overcast → Yes
  normal → outlook
    Sunny → Yes
    Rain → wind
      strong → No, weak → Yes
    Overcast → Yes

Decision Trees
(humidity=high ∧ wind=strong ∧ outlook=overcast) ∨
(humidity=high ∧ wind=weak ∧ outlook=rain) ∨
(humidity=high ∧ wind=weak ∧ outlook=overcast) ∨
(humidity=normal ∧ outlook=sunny) ∨
(humidity=normal ∧ outlook=overcast) ∨
(humidity=normal ∧ outlook=rain ∧ wind=weak)

⇒ play=yes
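As a quick illustration (not part of the original slides), the same tree can be written directly as a nested if/else function; a minimal Python sketch (the function name play_tennis is ours):

def play_tennis(outlook, temperature, humidity, wind):
    # The tree above as nested if/else; note that temperature is never tested.
    if humidity == "high":
        if wind == "strong":
            return "yes" if outlook == "overcast" else "no"
        else:  # wind == "weak"
            return "no" if outlook == "sunny" else "yes"
    else:  # humidity == "normal"
        if outlook == "rain":
            return "yes" if wind == "weak" else "no"
        return "yes"  # sunny or overcast

print(play_tennis("sunny", "hot", "normal", "weak"))  # -> yes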

Decision Trees
Now classify the new instance <sunny, hot, normal, weak> = ?

Humidity
  high → wind
    strong → outlook
      Sunny → No, Rain → No, Overcast → Yes
    weak → outlook
      Sunny → No, Rain → Yes, Overcast → Yes
  normal → outlook
    Sunny → Yes
    Rain → wind
      strong → No, weak → Yes
    Overcast → Yes

Decision Trees
Classifying <sunny, hot, normal, weak>: humidity = normal → outlook = sunny → Yes.

Quality of a Decision Tree

• Accuracy
• Size
• Complexity

Decision Trees
Another tree can be built from the same training data by choosing different attributes.

Question: What is the maximal number of (unique) decision trees we could possibly build?

Which attribute should we choose at each branch?

Outline
• What is a Decision Tree (DT)
• Attribute Measure
  • Entropy, information gain
• ID3 (Decision Tree Algorithm)
• Decision Tree Construction Example

Entropy
A measure of the disorder or randomness of a system.

Entropy(S) = −p+ log2(p+) − p− log2(p−)

In our dataset we have 9 positive and 5 negative examples:

p+ = 9/14 (ratio of positive examples to all examples)
p− = 5/14 (ratio of negative examples to all examples)

So the entropy of our dataset is

Entropy(S) = H(S) = −(9/14)·log2(9/14) − (5/14)·log2(5/14) = 0.940
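A small Python check of this number (a sketch, not from the slides):

import math

def entropy(counts):
    # Shannon entropy (base 2) of a class distribution given as a list of counts.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 3))   # 0.94, matching Entropy(S) above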

Entropy
If there are more than two classes, the entropy generalizes to

Entropy(S) = −Σᵢ pᵢ log2(pᵢ),

where the sum runs over all n classes (its maximum value is log2(n)). To compute base-2 logarithms on a calculator, use the change of base: log2(n) = log10(n) / log10(2).

Conditional Entropy
Conditional entropy is the entropy of the system given an attribute X.
We can evaluate each attribute by calculating how much it changes the entropy. For example, consider the attribute “Temperature”.
It has 3 values (Hot, Mild, Cool), so we create one subset per value:
S_hot = {D1, D2, D3, D13}
S_mild = {D4, D8, D10, D11, D12, D14}
S_cool = {D5, D6, D7, D9}

Let's find the entropy of each subset: Entropy(S_hot), Entropy(S_mild), Entropy(S_cool).

Conditional Entropy
S_hot = {D1, D2, D3, D13} (2 positive and 2 negative examples)
p+ = 0.5
p− = 0.5

Entropy(S_hot) = −0.5·log2(0.5) − 0.5·log2(0.5) = 1

Conditional Entropy
S_mild = {D4(+), D8(−), D10(+), D11(+), D12(+), D14(−)}
p+ = 4/6 ≈ 0.667
p− = 2/6 ≈ 0.333

Entropy(S_mild) = −0.667·log2(0.667) − 0.333·log2(0.333) = 0.918

Conditional Entropy
S_cool = {D5(+), D6(−), D7(+), D9(+)}

p+ = 0.75
p− = 0.25

Entropy(S_cool) = −0.75·log2(0.75) − 0.25·log2(0.25) = 0.811

Conditional Entropy H(Play | Temperature)
The entropy of the system when we split on attribute “Temperature” is the weighted average of the subset entropies:

H(Play | Temperature) = (|S_hot|/|S|)·Entropy(S_hot) + (|S_mild|/|S|)·Entropy(S_mild) + (|S_cool|/|S|)·Entropy(S_cool)
                      = (4/14)·1 + (6/14)·0.918 + (4/14)·0.811 = 0.911

Information Gain
We define Gain as the difference between the entropy of the system before the split and the expected (conditional) entropy of the system after the split: Gain(S, A) = H(S) − H(S | A).

• Gain(S, Outlook) = 0.246  ← chosen attribute (highest gain)
• Gain(S, Humidity) = 0.151
• Gain(S, Wind) = 0.048
• Gain(S, Temperature) = 0.029 (0.940 − 0.911)
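To make the computation concrete, here is a Python sketch that reproduces Gain(S, Temperature) ≈ 0.029; the Play labels and Temperature values for D1–D14 are assumed to be the standard play-tennis table that the slides' subsets imply:

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    # Gain = H(S) - sum over values v of |S_v|/|S| * H(S_v)
    gain = entropy(labels)
    for v in set(attribute_values):
        subset = [lab for lab, a in zip(labels, attribute_values) if a == v]
        gain -= len(subset) / len(labels) * entropy(subset)
    return gain

# Play (yes/no) and Temperature for D1..D14 (assumed standard play-tennis data):
play = ["no","no","yes","yes","yes","no","yes","no","yes","yes","yes","yes","yes","no"]
temp = ["hot","hot","hot","mild","cool","cool","cool","mild","cool","mild","mild","mild","hot","mild"]

print(round(information_gain(play, temp), 3))   # ~0.029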

Outline
• What is a Decision Tree (DT)
• Attribute Measure
  • Entropy, information gain
• ID3 (Decision Tree Algorithm)
• Decision Tree Construction Example
• Bias and Overfitting
• Other Decision Trees
ID3 (Decision Tree Algorithm)
DTL(Examples, TargetAttribute, Attributes)
  Create a Root node for the tree
  If all Examples are positive, return the single-node tree Root with label = Yes
  If all Examples are negative, return the single-node tree Root with label = No
  If Attributes is empty,
    return the single-node tree Root with label = most common value of TargetAttribute in Examples
  Else
    A ← the attribute from Attributes with the highest information gain with respect to Examples
    Make A the decision attribute for Root
    For each possible value v of A
      Add a new tree branch below Root, corresponding to the test A = v
      Let Examples_v be the subset of Examples that have value v for attribute A
      If Examples_v is empty Then
        Add a leaf node below this new branch with label = most common value of TargetAttribute in Examples
      Else
        Add the subtree DTL(Examples_v, TargetAttribute, Attributes − {A})
    End For
  End
  Return Root
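A compact Python sketch of the same algorithm (our own illustration, with examples stored as dicts; for simplicity it branches only on attribute values that actually occur in the current subset):

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, target, attr):
    labels = [ex[target] for ex in examples]
    gain = entropy(labels)
    for v in set(ex[attr] for ex in examples):
        subset = [ex[target] for ex in examples if ex[attr] == v]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain

def dtl(examples, target, attributes):
    labels = [ex[target] for ex in examples]
    if len(set(labels)) == 1:                      # all examples share one class
        return labels[0]
    if not attributes:                             # no attributes left
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, target, a))
    tree = {best: {}}
    for v in set(ex[best] for ex in examples):     # one branch per observed value of `best`
        subset = [ex for ex in examples if ex[best] == v]
        tree[best][v] = dtl(subset, target, [a for a in attributes if a != best])
    return tree

Calling dtl(...) on the play-tennis examples with attributes ['outlook', 'temperature', 'humidity', 'wind'] returns a nested dict representing the learned tree.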

Decision Tree Example

A robot can turn left, turn right, and move forward.

(Figure: a world with obstacles and the robot.)

Decision Tree Example

     Left Sensor  Right Sensor  Forward Sensor  Back Sensor  Previous Action  Action
X1   Obstacle     Free          Obstacle        Free         MoveForward      TurnRight
X2   Free         Free          Obstacle        Free         TurnLeft         TurnLeft
X3   Free         Obstacle      Free            Free         MoveForward      MoveForward
X4   Free         Obstacle      Free            Obstacle     TurnLeft         MoveForward
X5   Obstacle     Free          Free            Free         TurnRight        MoveForward
X6   Free         Free          Free            Obstacle     TurnRight        MoveForward

Decision Tree Example

Entropy = −(1/6)·log2(1/6) − (1/6)·log2(1/6) − (4/6)·log2(4/6) = 1.25

Gain(LeftSensor) = Entropy − (2/6)·Entropy(LeftSensor=obstacle) − (4/6)·Entropy(LeftSensor=free)
                 = 1.25 − (2/6)·1 − (4/6)·0.811 ≈ 0.38

Gain(RightSensor) = Entropy − (2/6)·Entropy(RightSensor=obstacle) − (4/6)·Entropy(RightSensor=free)
                  = 1.25 − (2/6)·0 − (4/6)·1.5 = 0.25

Gain(ForwardSensor) = Entropy − (2/6)·Entropy(ForwardSensor=obstacle) − (4/6)·Entropy(ForwardSensor=free)
                    = 1.25 − (2/6)·1 − (4/6)·0 = 0.917

Gain(BackSensor) = Entropy − (2/6)·Entropy(BackSensor=obstacle) − (4/6)·Entropy(BackSensor=free)
                 = 1.25 − (2/6)·0 − (4/6)·1.5 = 0.25

Gain(PreviousAction) = Entropy − (2/6)·Entropy(PreviousAction=MoveForward)
                       − (2/6)·Entropy(PreviousAction=TurnLeft)
                       − (2/6)·Entropy(PreviousAction=TurnRight)
                     = 1.25 − (2/6)·1 − (2/6)·1 − (2/6)·0 = 0.58
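These numbers can be double-checked with a short script (our own sketch; the rows are copied from the table on the previous slide):

import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, col):
    labels = [r[-1] for r in rows]                 # last column is the Action (class)
    g = entropy(labels)
    for v in set(r[col] for r in rows):
        sub = [r[-1] for r in rows if r[col] == v]
        g -= len(sub) / len(rows) * entropy(sub)
    return g

# columns: Left, Right, Forward, Back, PreviousAction, Action
rows = [
    ("obstacle", "free", "obstacle", "free", "MoveForward", "TurnRight"),   # X1
    ("free", "free", "obstacle", "free", "TurnLeft", "TurnLeft"),           # X2
    ("free", "obstacle", "free", "free", "MoveForward", "MoveForward"),     # X3
    ("free", "obstacle", "free", "obstacle", "TurnLeft", "MoveForward"),    # X4
    ("obstacle", "free", "free", "free", "TurnRight", "MoveForward"),       # X5
    ("free", "free", "free", "obstacle", "TurnRight", "MoveForward"),       # X6
]
for i, name in enumerate(["LeftSensor", "RightSensor", "ForwardSensor", "BackSensor", "PreviousAction"]):
    print(name, round(gain(rows, i), 3))
# ForwardSensor has the highest gain (~0.92), so it becomes the root node.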

Decision Tree Example

Select ForwardSensor (highest gain) as the root:

ForwardSensor
  free     → MoveForward
  obstacle → {X1, X2}

Decision Tree Example

Entropy({X1, X2}) = −(1/2)·log2(1/2) − (1/2)·log2(1/2) = 1

Gain(LeftSensor) = 1 − (1/2)·Entropy(LeftSensor=free) − (1/2)·Entropy(LeftSensor=obstacle)
                 = 1 − (1/2)·0 − (1/2)·0 = 1
Gain(RightSensor) = 1 − 1·Entropy(RightSensor=free) = 1 − 1 = 0
Gain(BackSensor) = 0
Gain(PreviousAction) = 1

Either LeftSensor or PreviousAction can be chosen next (both have gain 1); which one is picked depends on the tie-breaking order.

Decision Tree Example

The two resulting trees:

ForwardSensor
  free     → MoveForward
  obstacle → LeftSensor
    obstacle → TurnRight (X1)
    free     → TurnLeft (X2)

ForwardSensor
  free     → MoveForward
  obstacle → PreviousAction
    MoveForward → TurnRight (X1)
    TurnLeft    → TurnLeft (X2)

Convert A Tree To Rules


Regression Trees Predict Continuous Values

• Example: use slope and elevation in the Himalayas to predict average precipitation (a continuous value).
• Values at the leaves are the averages of their members.

(Figure: a regression tree. The root splits on Elevation < 7900 ft, a second node splits on Slope < 2.5°, and the leaves predict 13.67 in., 48.50 in., and 55.42 in.)
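A minimal scikit-learn sketch of the same idea, using made-up elevation/slope/precipitation numbers purely for illustration:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: [elevation (ft), slope (degrees)] -> precipitation (inches)
X = np.array([[6000, 1.0], [7000, 2.0], [6500, 3.0], [8500, 3.5], [8200, 1.5], [9000, 4.0]])
y = np.array([14.0, 13.0, 50.0, 54.0, 55.0, 57.0])

reg = DecisionTreeRegressor(max_depth=2)     # a shallow tree, like the two-split example above
reg.fit(X, y)

# The prediction for a new point is the mean target value of the training rows in its leaf.
print(reg.predict(np.array([[6100, 1.2]])))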
Classification Error vs Entropy vs Gini

• Classification error is a flat (piecewise-linear) function with its maximum at the center.
• The center represents maximum ambiguity: a 50/50 split.
• Splitting metrics favor results that are furthest away from the center.

(Figure: classification error plotted against purity on [0, 1]; it peaks at 0.5.)

E(t) = 1 − max_i [ p_i(t) ]
Classification Error vs Entropy

• Entropy has the same maximum (at a 50/50 split) but is a curved function of purity.

(Figure: classification error and cross entropy plotted against purity on [0, 1].)
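The three impurity measures for a two-class node can be compared directly (a sketch, not from the slides):

import numpy as np

def classification_error(p):
    return 1 - np.maximum(p, 1 - p)              # E(t) = 1 - max_i p_i(t)

def gini(p):
    return 1 - p**2 - (1 - p)**2                 # Gini impurity for two classes

def entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)             # avoid log(0)
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

p = np.linspace(0, 1, 11)                        # proportion of class 1 at the node
for name, f in [("error", classification_error), ("gini", gini), ("entropy", entropy)]:
    print(f"{name:8s}", np.round(f(p), 3))
# All three peak at p = 0.5; error is piecewise linear, Gini and entropy are curved.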
The CART algorithm produces only binary trees: non-leaf nodes always have exactly two children (i.e., questions only have yes/no answers). In contrast, other tree algorithms such as ID3 can produce decision trees whose nodes have more than two children.

CART with an example using Gini impurity (not entropy):

https://www.youtube.com/watch?v=B4I6Im35jkE&list=PLS9ZE8KmjrmOzmdG8lI8-X4eytgLnSyCA&index=13
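scikit-learn's tree module uses an optimized CART-style learner, so its splits are always binary; a small sketch that prints the yes/no questions of a shallow tree (dataset choice is ours):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=0).fit(X, y)

# Every internal node asks a single "feature <= threshold?" question with two children.
print(export_text(clf, feature_names=load_iris().feature_names))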
Decision Trees are High Variance

• Problem: decision trees tend to overfit.
• Small changes in the data can greatly change the prediction (high variance).
• Solution: prune trees.
Pruning Decision Trees

• How do we decide which leaves to prune?
• Solution: prune based on a classification error threshold.

E(t) = 1 − max_i [ p_i(t) ]
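scikit-learn does not prune on a raw classification-error threshold; its built-in alternative is minimal cost-complexity pruning via the ccp_alpha parameter. A hedged sketch (dataset and alpha value chosen only for illustration):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

full   = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01).fit(X_train, y_train)

# Pruning trades a little training fit for a much smaller (often better-generalizing) tree.
print("leaves:", full.get_n_leaves(), "->", pruned.get_n_leaves())
print("test accuracy:", full.score(X_test, y_test), "->", pruned.score(X_test, y_test))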
Strengths of Decision Trees

• Easy to interpret and implement: "if … then … else" logic.
• Handle any data category: binary, ordinal, continuous.
• No preprocessing or scaling required.
DecisionTreeClassifier: The Syntax
Import the class containing the classification method
from sklearn.tree import DecisionTreeClassifier

To use the Intel® Extension for Scikit-learn* variant of this algorithm:
• Install the Intel® oneAPI AI Analytics Toolkit (AI Kit)
• Add the following two lines before the scikit-learn imports:
from sklearnex import patch_sklearn
patch_sklearn()
DecisionTreeClassifier: The Syntax

Import the class containing the classification method
from sklearn.tree import DecisionTreeClassifier

Create an instance of the class (the keyword arguments are the tree parameters)
DTC = DecisionTreeClassifier(criterion='gini', max_features=10, max_depth=5)

Fit the instance on the data and then predict the expected value
DTC = DTC.fit(X_train, y_train)
y_predict = DTC.predict(X_test)

Tune parameters with cross-validation. Use DecisionTreeRegressor for regression problems.
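Putting the syntax together, a runnable end-to-end sketch on a built-in dataset (our choice; max_features is dropped because this dataset has only four features):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

DTC = DecisionTreeClassifier(criterion='gini', max_depth=5)
DTC.fit(X_train, y_train)
y_predict = DTC.predict(X_test)

print("test accuracy:", DTC.score(X_test, y_test))
print("5-fold CV accuracy:", cross_val_score(DTC, X_train, y_train, cv=5).mean())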
