
AI Foundations and Applications

5. Decision Trees

Thien Huynh-The
HCM City Univ. Technology and Education
Jan, 2023
Decision Trees

• Previous techniques have involved real-valued (or discrete-valued) feature vectors and natural measures of distance (e.g., Euclidean).
• Consider a classification problem that involves nominal data – data described by a
list of attributes (e.g., categorizing people as short or tall using gender, height, age,
and ethnicity).
• How can we use such nominal data for
classification?
• How can we learn the categories of such
data?
• Nonmetric methods such as decision trees
provide a way to deal with such data.



Decision Trees

• Decision trees attempt to classify a pattern through a sequence of questions. For example, attributes such as gender and height can be used to classify people as short or tall. But the best threshold for height is gender-dependent.
• A decision tree consists of nodes and leaves, with each leaf denoting a class.
• Classes (tall or short) are the outputs of the tree.
• Attributes (gender and height) are a set of features that describe the data.

• The input data consists of values of the different attributes. Using these attribute values, the decision tree generates a class as the output for each input.

[Figure: a decision tree that first splits on gender (male / female), then on height (>1m7 vs ≤1m7 for male, >1m55 vs ≤1m55 for female); the leaves are the classes tall and short.]
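As a minimal illustration (not part of the original slides), the tree in the figure can be written as nested conditionals in Python; the 1m7 and 1m55 thresholds are the ones shown in the figure:

def classify(gender, height_m):
    """Classify a person as 'tall' or 'short' using the tree in the figure above."""
    if gender == "male":
        return "tall" if height_m > 1.70 else "short"
    else:  # female
        return "tall" if height_m > 1.55 else "short"

print(classify("male", 1.75))    # tall
print(classify("female", 1.50))  # short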



Basic Principles

• The top, or first, node is called the root node.
• The last level of nodes consists of the leaf nodes, which contain the final classification.
• The intermediate nodes are the descendant (or “hidden”) nodes.
• Binary trees, like the one shown below, are the most popular type of tree. However, M-ary trees (M branches at each node) are possible.

[Figure: the same gender/height decision tree as on the previous slide.]



Basic Principles

• Nodes can contain one or more questions. In a binary tree, by convention, if the answer to a question is “yes”, the left branch is selected. Note that the same question can appear in multiple places in the tree.
• Decision trees have several benefits over neural network-type approaches,
including interpretability and data-driven learning.
• Key questions include how to grow the tree, how to stop growing, and how to prune
the tree to increase generalization.
• Decision trees are very powerful and can give excellent performance on closed-set
testing. Generalization is a challenge.



ID3 Algorithm

• Benefits
• Can represent any Boolean function.
• Can be viewed as a way to compactly represent a lot of data.
• Natural representation (20 questions).
• Evaluating a decision tree classifier is easy.
• Clearly, given data, there are many ways to represent it as a decision tree.
• Learning a good representation from data is the challenge.

[Figure: the play-tennis decision tree with Outlook at the root (Sunny / Overcast / Rain); Sunny → Humidity (High → No, Normal → Yes); Overcast → Yes; Rain → Wind (Strong → No, Weak → Yes).]



ID3 Algorithm

• Given data, you can always represent it using a decision tree; so what makes a "good" decision tree?
• Consider the following example: Will I play tennis today?
• Features
• Outlook: {Sun, Overcast, Rain}
• Temperature: {Hot, Mild, Cool}
• Humidity: {High, Normal, Low}
• Wind: {Strong, Weak}
• Labels
• Binary classification task: Y = {+, -}
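For concreteness (an illustration added here, not from the slides), one hypothetical training example can be represented in Python as a mapping from attributes to values together with a label:

# one hypothetical example for "Will I play tennis today?"
example = {"Outlook": "Sunny", "Temperature": "Hot", "Humidity": "High", "Wind": "Weak"}
label = "-"  # a binary label drawn from Y = {+, -}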



Example

• Outlook: S(unny)
O(vercast)
R(ainy)
• Temperature: H(ot),
M(edium),
C(ool)
• Humidity: H(igh)
N(ormal)
L(ow)
• Wind: S(trong)
W(eak)



Example

• Data is processed in batch (i.e., all the data is available).
• Recursively build a decision tree top-down.

[Figure: the play-tennis decision tree built from the data: Outlook at the root; Sunny → Humidity (High → No, Normal → Yes); Overcast → Yes; Rain → Wind (Strong → No, Weak → Yes).]



Example

• Let S be the set of examples
• Label is the target attribute (the prediction)
• Attributes is the set of measured attributes

• ID3(S, Attributes, Label)
   If all examples in S have the same label, return a single-node tree with that label
   Otherwise begin
      Create a Root node for the tree
      A = the attribute in Attributes that best classifies S
      For each possible value v of A
         Add a new tree branch corresponding to A = v
         Let Sv be the subset of examples in S with A = v
         If Sv is empty: add a leaf node with the most common value of Label in S
         Else: below this branch add the subtree ID3(Sv, Attributes - {A}, Label)
   End
   Return Root
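A compact Python sketch of this recursion (an illustration, not the course implementation; attribute selection by information gain is defined on the following slides):

import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Expected reduction in entropy from splitting on one attribute."""
    remainder = 0.0
    for value in set(ex[attribute] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, labels, attributes):
    """Build a decision tree as nested dicts; leaves are labels."""
    if len(set(labels)) == 1:            # all examples share one label
        return labels[0]
    if not attributes:                   # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    tree = {best: {}}
    for value in set(ex[best] for ex in examples):
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        tree[best][value] = id3([examples[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != best])
    return tree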
Picking the Root Attribute

• The goal is to have the resulting decision tree as small as possible (Occam’s Razor)
• But, finding the minimal decision tree consistent with the data is NP-hard
• The recursive algorithm is a greedy heuristic search for a simple tree, but cannot
guarantee optimality.
• The main decision of the algorithm is the selection of the next attribute to condition
on.
• Consider data with two Boolean attributes (A,B).
< (A=0, B=0), - >: 50 examples
< (A=0, B=1), - >: 50 examples
< (A=1, B=0), - >: 0 examples
< (A=1, B=1), + >: 100 examples
• What should be the first attribute we select?
Picking the Root Attribute

• Consider data with two Boolean attributes (A, B).
< (A=0, B=0), - >: 50 examples
< (A=0, B=1), - >: 50 examples
< (A=1, B=0), - >: 0 examples
< (A=1, B=1), + >: 100 examples
• What should be the first attribute we select?
• Splitting on A: we get purely labeled nodes.
• Splitting on B: we don’t get purely labeled nodes.
• What if we have < (A=1, B=0), - >: 3 examples?
• (One way to think about it: # of queries required to label a random data point.)

[Figure: splitting on A gives the leaves A=1 → + and A=0 → -; splitting on B gives B=0 → - and B=1 → a further split on A (A=1 → +, A=0 → -).]



• Consider data with two Boolean attributes (A, B), now with 3 examples of < (A=1, B=0), - >:
< (A=0, B=0), - >: 50 examples
< (A=0, B=1), - >: 50 examples
< (A=1, B=0), - >: 3 examples
< (A=1, B=1), + >: 100 examples
• What should be the first attribute we select?
• The trees look structurally similar; which attribute should we choose?
• One way to think about it: # of queries required to label a random data point.
• If we choose A we have less uncertainty about the labels.

[Figure: splitting on A: A=0 → - (100 examples); A=1 → split on B (B=1 → + with 100 examples, B=0 → - with 3). Splitting on B: B=0 → - (53 examples); B=1 → split on A (A=1 → + with 100 examples, A=0 → - with 50).]
Picking the Root Attribute

• The goal is to have the resulting decision tree as small as possible (Occam’s
Razor)
• The main decision in the algorithm is the selection of the next attribute to
condition on.
• We want attributes that split the examples into sets that are relatively pure in one label; this way we are closer to a leaf node.
• The most popular heuristic is based on information gain and originated with Quinlan’s ID3 system.



Entropy

• Entropy (impurity, disorder) of a set of examples, S, relative to a binary classification is:

   Entropy(S) = -p+ log2(p+) - p- log2(p-)

   where p+ is the proportion of positive examples in S and p- is the proportion of negative examples in S.
   – If all the examples belong to the same category [(1,0) or (0,1)]: Entropy = 0
   – If all the examples are equally mixed (0.5, 0.5): Entropy = 1
   – Entropy = level of uncertainty.
• In general, when pi is the fraction of examples labeled i:

   Entropy(S) = - Σ_i pi log2(pi)

• Entropy can be viewed as the number of bits required, on average, to encode the class label of an example. If the probability of + is 0.5, a single bit is required for each example; if it is 0.8, we can use less than 1 bit.
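A quick Python check (illustration only) of the two cases mentioned in the last bullet:

import math

def binary_entropy(p_plus):
    """Entropy in bits of a binary label distribution (p+, 1 - p+)."""
    return -sum(p * math.log2(p) for p in (p_plus, 1.0 - p_plus) if p > 0)

print(binary_entropy(0.5))  # 1.0 bit per example
print(binary_entropy(0.8))  # about 0.722 bits per example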
Entropy

• Entropy (impurity, disorder) of a set of examples, S, relative to a binary classification is:

   Entropy(S) = -p+ log2(p+) - p- log2(p-)

   where p+ is the proportion of positive examples in S and p- is the proportion of negative examples in S.
   – If all the examples belong to the same category: Entropy = 0
   – If all the examples are equally mixed (0.5, 0.5): Entropy = 1
   – Entropy = level of uncertainty.

Test yourself: assign high, medium, and low entropy to each of the label distributions shown on the slide. Which one has the lowest entropy (i.e., the least uncertainty)?

[Figure: three label distributions (not recovered from the slide).]


Exercise

Calculate the entropy for series 1 and series 2 using the example distributions given on the slide.



Information Gain

• The information gain of an attribute a is the expected reduction in entropy caused by partitioning on this attribute:

   Gain(S, a) = Entropy(S) - Σ_{v ∈ values(a)} (|Sv| / |S|) × Entropy(Sv)

• Where:
   – Sv is the subset of S for which attribute a has value v, and
   – the entropy of the partitioned data is calculated by weighting the entropy of each partition by its size relative to the original set.
• Partitions of low entropy (nearly pure subsets) lead to high gain.
• Go back to check which of the A, B splits is better (a quick numeric check follows below).
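A small sketch (illustration only) that re-checks the A/B example from the earlier slides, using the variant with 3 examples of < (A=1, B=0), - >:

import math

def entropy(pos, neg):
    """Entropy in bits of a node with pos positive and neg negative examples."""
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c > 0)

# counts from the slides: 100 positive and 103 negative examples in total
root = entropy(100, 103)

# splitting on A: A=1 -> (100+, 3-), A=0 -> (0+, 100-)
split_A = (103 / 203) * entropy(100, 3) + (100 / 203) * entropy(0, 100)

# splitting on B: B=1 -> (100+, 50-), B=0 -> (0+, 53-)
split_B = (150 / 203) * entropy(100, 50) + (53 / 203) * entropy(0, 53)

print(root - split_A)  # gain of A, about 0.90
print(root - split_B)  # gain of B, about 0.32, so A is the better split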



Example

• Outlook: S(unny)
O(vercast)
R(ainy)
• Temperature: H(ot),
M(edium),
C(ool)
• Humidity: H(igh)
N(ormal)
L(ow)
• Wind: S(trong)
W(eak)



Example - Entropy

Calculate the current entropy of the full data set (9 positive and 5 negative examples):

   p+ = 9/14 ;  p- = 5/14

   Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.94

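A one-line check (illustration only):

from math import log2
print(-(9/14) * log2(9/14) - (5/14) * log2(5/14))  # about 0.940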


Example – Information Gain of Outlook

Gain(S, a) = Entropy(S) - Σ_{v ∈ values(a)} (|Sv| / |S|) × Entropy(Sv)
Outlook = sunny:
Entropy(O = S) = 0.971
Outlook = overcast:
Entropy(O = O) = 0
Outlook = rainy:
Entropy(O = R) = 0.971

Expected entropy
= (5/14)×0.971 + (4/14)×0 + (5/14)×0.971 = 0.694

Information gain = 0.940 – 0.694 = 0.246



Example – Information Gain of Humidity

Gain(S, a) = Entropy(S) - Σ_{v ∈ values(a)} (|Sv| / |S|) × Entropy(Sv)
Humidity = high:
Entropy(H = H) = 0.985
Humidity = Normal:
Entropy(H = N) = 0.592

Expected entropy
= (7/14)×0.985 + (7/14)×0.592 = 0.7885

Information gain = 0.940 – 0.7885 = 0.1515



Which feature to split on?

Information gain:
Outlook: 0.246
Humidity: 0.151
Wind: 0.048
Temperature: 0.029

→ Split on Outlook
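The gains above can be reproduced with a short script. The sketch below assumes the standard 14-example play-tennis data set (Quinlan/Mitchell); this assumption is consistent with the entropy (0.94) and the gain values shown on these slides:

import math
from collections import Counter

# assumed data: (Outlook, Temperature, Humidity, Wind, Play)
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),     ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),  ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    """Entropy in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(column):
    """Information gain of splitting on attribute column 0..3."""
    labels = [row[-1] for row in data]
    expected = 0.0
    for value in set(row[column] for row in data):
        subset = [row[-1] for row in data if row[column] == value]
        expected += len(subset) / len(data) * entropy(subset)
    return entropy(labels) - expected

for i, name in enumerate(attributes):
    # prints roughly 0.247, 0.029, 0.152, 0.048; the slide rounds these to 0.246, 0.029, 0.151, 0.048
    print(name, round(gain(i), 3))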



Complete the tree

• Students should complete and draw the tree that can make a decision based on the values of the given attributes.



Student’s tasks

• Stopping criteria in decision trees
• Overfitting in decision trees and pruning techniques
• Read here: https://machinelearningcoban.com/2018/01/14/id3/
• Random forest algorithm
• Read here: https://machinelearningcoban.com/tabml_book/ch_model/random_forest.html
  https://www.mathworks.com/help/stats/framework-for-ensemble-learning.html



Python
from __future__ import print_function
import numpy as np
import pandas as pd

class TreeNode(object):
    def __init__(self, ids=None, children=[], entropy=0, depth=0):
        self.ids = ids                    # indices of the data points in this node
        self.entropy = entropy            # entropy of this node, will be filled later
        self.depth = depth                # distance to the root node
        self.split_attribute = None       # which attribute is chosen for splitting, if non-leaf
        self.children = children          # list of its child nodes
        self.order = None                 # order of values of split_attribute in children
        self.label = None                 # label of the node if it is a leaf

    def set_properties(self, split_attribute, order):
        self.split_attribute = split_attribute  # split at which attribute
        self.order = order                      # order of this node's children

    def set_label(self, label):
        self.label = label                      # set label if the node is a leaf



Python
def entropy(freq):
    # drop zero counts so that 0 * log(0) does not appear
    freq_0 = freq[np.array(freq).nonzero()[0]]
    prob_0 = freq_0 / float(freq_0.sum())
    return -np.sum(prob_0 * np.log(prob_0))

# DecisionTreeID3 is defined in the tutorial linked below
df = pd.read_csv('weather.csv', index_col=0)  # pd.DataFrame.from_csv was removed in newer pandas
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
tree = DecisionTreeID3(max_depth=3, min_samples_split=2)
tree.fit(X, y)
print(tree.predict(X))

https://machinelearningcoban.com/2018/01/14/id3/



Matlab

• Students practice by following the examples:

https://www.mathworks.com/help/stats/train-decision-trees-in-classification-learner-app.html

https://www.mathworks.com/help/stats/view-decision-tree.html

