8. Decision Trees
Outline
• What is a Decision Tree (DT)
• Attribute Measure
Entropy, information gain
• ID3 (Decision Tree Algorithm)
• Decision Tree Construction Examples
Classification problem
Decision Trees
• Decision Trees (DTs) are a non-parametric supervised learning method
used for classification and regression.
• A method for approximating discrete-valued target functions.
• The learned function is represented by a decision tree.
(Figure: a generic decision tree. Each internal node tests an attribute, each branch corresponds to one of that attribute's values, and each leaf assigns a class label.)
Decision Trees
Let's start building the tree from scratch. We first need to decide which attribute to split on first. Let's say we select "humidity".
Humidity
  high → {D1, D2, D3, D4, D8, D12, D14}
  normal → {D5, D6, D7, D9, D10, D11, D13}

(In the figure, violet marks the No examples and green marks the Yes examples.)
Decision Trees
Now let's classify the first subset {D1, D2, D3, D4, D8, D12, D14} using the attribute "wind".

(Figure: the tree so far, unchanged from the previous step.)
Decision Trees
Subset {D1, D2, D3, D4, D8, D12, D14} classified by the attribute "wind":

Humidity
  high → Wind
    strong → {D2, D12, D14}
    weak → {D1, D3, D4, D8}
  normal → {D5, D6, D7, D9, D10, D11, D13}
Decision Trees
Now let's classify the subset {D2, D12, D14} using the attribute "outlook".

(Figure: the tree so far, unchanged from the previous step.)
Decision Trees
Subset {D2, D12, D14} classified by the attribute "outlook":

Humidity
  high → Wind
    strong → Outlook
      sunny → No
      rain → No
      overcast → Yes
    weak → {D1, D3, D4, D8}
  normal → {D5, D6, D7, D9, D10, D11, D13}
Decision Trees
Now let's classify the subset {D1, D3, D4, D8} using the attribute "outlook".

(Figure: the tree so far, unchanged from the previous step.)
Decision Trees
Subset {D1, D3, D4, D8} classified by the attribute "outlook":

Humidity
  high → Wind
    strong → Outlook
      sunny → No
      rain → No
      overcast → Yes
    weak → Outlook
      sunny → No
      rain → Yes
      overcast → Yes
  normal → {D5, D6, D7, D9, D10, D11, D13}
Decision Trees
Now classify the subset {D5, D6, D7, D9, D10, D11, D13} using the attribute "outlook".

(Figure: the tree so far, unchanged from the previous step.)
Decision Trees
Subset {D5, D6, D7, D9, D10, D11, D13} classified by the attribute "outlook":

Humidity
  high → Wind
    strong → Outlook
      sunny → No
      rain → No
      overcast → Yes
    weak → Outlook
      sunny → No
      rain → Yes
      overcast → Yes
  normal → Outlook
    sunny → Yes
    overcast → Yes
    rain → {D5, D6, D10}
Decision Trees
Finally, classify the subset {D5, D6, D10} using the attribute "wind".

(Figure: the tree so far, unchanged from the previous step.)
Decision Trees
Subset {D5, D6, D10} classified by the attribute "wind":

Humidity
  high → Wind
    strong → Outlook
      sunny → No
      rain → No
      overcast → Yes
    weak → Outlook
      sunny → No
      rain → Yes
      overcast → Yes
  normal → Outlook
    sunny → Yes
    overcast → Yes
    rain → Wind
      strong → No
      weak → Yes
Decision Trees
Note: the decision tree can be expressed as a logical expression or as if-then-else statements.

(Figure: the final decision tree, as in the previous step.)
Decision Trees
(humidity = high ∧ wind = strong ∧ outlook = overcast)
∨ (humidity = normal ∧ outlook = sunny)
∨ (humidity = normal ∧ outlook = overcast)
⇒ play = yes
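As a quick illustration, the same tree can also be written directly as nested if/else statements in Python. This is a sketch: the function name play_tennis and the lower-case string encoding of the attribute values are assumptions made here, not part of the slides.

def play_tennis(outlook, humidity, wind):
    # The decision tree built above, expressed as if-then-else rules.
    # Temperature is omitted because the tree never splits on it.
    if humidity == "high":
        if wind == "strong":
            return "Yes" if outlook == "overcast" else "No"
        # wind == "weak"
        return "No" if outlook == "sunny" else "Yes"
    # humidity == "normal"
    if outlook == "rain":
        return "Yes" if wind == "weak" else "No"
    return "Yes"  # sunny or overcast

# The query posed on the next slide: <sunny, hot, normal, weak>
print(play_tennis(outlook="sunny", humidity="normal", wind="weak"))  # Yes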
Decision Trees
Now classify the new instance <sunny, hot, normal, weak>.

(Figure: the final decision tree, as above.)
Decision Trees
Following the tree for <sunny, hot, normal, weak>: humidity is normal, so take the normal branch; outlook is sunny, so the answer is Play = Yes.
Decision Trees
Another tree can be built from the same training data by choosing different attributes.
Question: What is the maximal number of (unique) decision trees we can possibly build?
Outline
• What is a Decision Tree (DT)
• Attribute Measure
Entropy, information gain
• ID3 (Decision Tree Algorithm)
• Decision Tree Construction Example
Entropy
A measure of the disorder or randomness in a closed system.

Entropy(S) = -p+ log2(p+) - p- log2(p-)

In our system we have 9 positive examples and 5 negative examples:

Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940
Entropy
If there are more than two classes:

Entropy(S) = -Σi pi log2(pi)   (summing over all classes i)

(Base-2 logarithms can be computed by change of base: log2(n) = log10(n) / log10(2).)
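A minimal Python sketch of this formula (it handles two or more classes), checked against the 9-positive / 5-negative system above; the helper name entropy is an illustrative choice.

import math

def entropy(counts):
    # Entropy in bits of a class distribution given as a list of class counts.
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 3))     # 0.940: the 9+/5- system above
print(round(entropy([7, 7]), 3))     # 1.0: a 50/50 split is maximally disordered
print(round(entropy([1, 1, 4]), 2))  # 1.25: the same formula with three classes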
Conditional Entropy
The entropy of the system, given attribute X:

Entropy(S | X) = Σv (|Sv| / |S|) Entropy(Sv),   summing over the values v of X.

We can evaluate each attribute by calculating how much it changes the entropy. For example, consider the attribute "Temperature". It has three values (Hot, Mild, Cool), so we create one subset per value:

Shot = {D1, D2, D3, D13}
Smild = {D4, D8, D10, D11, D12, D14}
Scool = {D5, D6, D7, D9}
Conditional Entropy
Shot = {D1, D2, D3, D13}   (2 positive and 2 negative examples)
p+ = 0.5
p- = 0.5
Entropy(Shot) = -0.5 log2(0.5) - 0.5 log2(0.5) = 1
Conditional Entropy
Smild = {D4(+), D8(-), D10(+), D11(+), D12(+), D14(-)}
p+ = 4/6 ≈ 0.667
p- = 2/6 ≈ 0.333
Entropy(Smild) = -0.667 log2(0.667) - 0.333 log2(0.333) = 0.918
Conditional Entropy
Scool = {D5(+), D6(-), D7(+), D9(+)}
p+ = 0.75
p- = 0.25
Entropy(Scool) = -0.75 log2(0.75) - 0.25 log2(0.25) = 0.811
Conditional Entropy H(Play | Temperature)
We can now express the entropy of the system when we use the attribute "Temperature":

H(Play | Temperature) = (4/14)(1) + (6/14)(0.918) + (4/14)(0.811) ≈ 0.911
Information Gain
We define the gain as the difference between the entropy of the system before the split and the expected (conditional) entropy of the system after the split:

Gain(S, A) = Entropy(S) - Entropy(S | A)

For the attribute Temperature: Gain = 0.940 - 0.911 ≈ 0.029.
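The same calculation as a short Python sketch, reusing the entropy helper from the earlier snippet; the class counts per Temperature value (2+/2-, 4+/2-, 3+/1-) are read off the subsets Shot, Smild and Scool above.

import math

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def gain(parent_counts, child_counts):
    # Entropy before the split minus the expected (weighted) entropy after it.
    total = sum(parent_counts)
    expected = sum(sum(child) / total * entropy(child) for child in child_counts)
    return entropy(parent_counts) - expected

# 9 Yes / 5 No overall; Temperature splits this into hot, mild and cool subsets.
print(round(gain([9, 5], [[2, 2], [4, 2], [3, 1]]), 3))  # 0.029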
Outline
• What is a Decision Tree (DT)
• Attribute Measure
Entropy, information gain
• ID3 (Decision Tree Algorithm)
• Decision Tree Construction Example
• Bias and Overfitting
• Other Decision Trees
ID3 (Decision Tree Algorithm)
• DTL(Examples, TargetAttribute, Attributes)
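Below is a compact Python sketch of the DTL/ID3 recursion under common assumptions: each example is a dict of attribute values plus a target label, and the splitting attribute is chosen by information gain. The helper names and the nested-dict tree representation are choices made for this sketch, not part of the lecture.

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())

def information_gain(examples, attribute, target):
    labels = [e[target] for e in examples]
    remainder = 0.0
    for v in set(e[attribute] for e in examples):
        subset = [e[target] for e in examples if e[attribute] == v]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, target, attributes):
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:        # all examples agree: return a leaf
        return labels[0]
    if not attributes:               # no attributes left: return the majority label
        return Counter(labels).most_common(1)[0][0]
    # Pick the attribute with the highest information gain and recurse on each value.
    best = max(attributes, key=lambda a: information_gain(examples, a, target))
    tree = {best: {}}
    for v in set(e[best] for e in examples):
        subset = [e for e in examples if e[best] == v]
        tree[best][v] = id3(subset, target, [a for a in attributes if a != best])
    return tree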
(Figure: a robot in an environment with obstacles. The robot can turn left, turn right, and move forward.)
Entropy = -(1/6) log2(1/6) - (1/6) log2(1/6) - (4/6) log2(4/6) = 1.25

Gain(LeftSensor) = Entropy - (2/6) Entropy(LeftSensor=obstacle) - (4/6) Entropy(LeftSensor=free)
                 = 1.25 - (2/6)(1) - (4/6)(0.811) = 0.326

Gain(RightSensor) = Entropy - (2/6) Entropy(RightSensor=obstacle) - (4/6) Entropy(RightSensor=free)
                  = 1.25 - (2/6)(0) - (4/6)(1.5) = 0.25

Gain(ForwardSensor) = Entropy - (2/6) Entropy(ForwardSensor=obstacle) - (4/6) Entropy(ForwardSensor=free)
                    = 1.25 - (2/6)(1) - (4/6)(0) = 0.917

Gain(BackSensor) = Entropy - (2/6) Entropy(BackSensor=obstacle) - (4/6) Entropy(BackSensor=free)
                 = 1.25 - (2/6)(0) - (4/6)(1.5) = 0.25

Gain(PreviousAction) = Entropy - (2/6) Entropy(PreviousAction=MoveForward)
                               - (2/6) Entropy(PreviousAction=TurnLeft)
                               - (2/6) Entropy(PreviousAction=TurnRight)
                     = 1.25 - (2/6)(1) - (2/6)(1) - (2/6)(0) = 0.58
Select ForwardSensor (highest gain):

ForwardSensor
  free → MoveForward
  obstacle → {X1, X2}
Entropy({X1, X2}) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1

Gain(LeftSensor) = 1 - (1/2) Entropy(LeftSensor=free) - (1/2) Entropy(LeftSensor=obstacle)
                 = 1 - (1/2)(0) - (1/2)(0) = 1

Gain(RightSensor) = 1 - (1) Entropy({X1, X2}) = 1 - 1 = 0   (both examples have the same RightSensor value)

Gain(BackSensor) = 0
Gain(PreviousAction) = 1
Two possible final trees (LeftSensor and PreviousAction both have gain 1 on the obstacle branch):

ForwardSensor
  free → MoveForward
  obstacle → split further on LeftSensor

ForwardSensor
  free → MoveForward
  obstacle → split further on PreviousAction
Error

(Figure: impurity measures versus node purity. The center of the plot, a 50/50 split, represents maximum ambiguity; both classification error and cross-entropy peak there and fall to 0 at pure nodes, and cross-entropy is curved rather than piecewise linear.)
The CART algorithm produces only binary trees: non-leaf nodes always have exactly two children (i.e., questions only have yes/no answers). In contrast, other tree algorithms such as ID3 can produce decision trees whose nodes have more than two children.

The classification error at a node t is

E(t) = 1 - max_i [ p(i | t) ]
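A small Python sketch of the two impurity measures from the figure above, evaluated at a few class proportions; the function names are illustrative.

import math

def classification_error(p):
    # E(t) = 1 - max_i p(i|t), for a binary node with positive-class proportion p.
    return 1 - max(p, 1 - p)

def cross_entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(p, round(classification_error(p), 3), round(cross_entropy(p), 3))
# Both measures are 0 at pure nodes and largest at the 50/50 split;
# cross-entropy is curved while classification error is piecewise linear.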
Strengths of Decision Trees
• No preprocessing or scaling of the data is required
DecisionTreeClassifier: The Syntax
Import the class containing the classification method
from sklearn.tree import DecisionTreeClassifier
Create an instance of the class
DTC = DecisionTreeClassifier()   # hyperparameters such as max_depth can be set here

Fit the instance on the data and then predict the expected value
DTC = DTC.fit(X_train, y_train)
y_predict = DTC.predict(X_test)
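Put together as a runnable example; the dataset (scikit-learn's built-in iris), the train/test split, and the max_depth value are illustrative choices, not part of the slide.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Illustrative data: the built-in iris dataset, split into train and test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create an instance, fit it on the training data, then predict on the test data.
DTC = DecisionTreeClassifier(max_depth=3, random_state=0)
DTC = DTC.fit(X_train, y_train)
y_predict = DTC.predict(X_test)

print(accuracy_score(y_test, y_predict))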