
Supervised Machine Learning

CSE4107: Artificial Intelligence

References
• These slides are adapted from CS583, Bing Liu, UIC
• https://medium.com/deep-math-machine-learning-ai/chapter-4-decision-trees-algorithms-b93975f7a1f1

Decision Tree
• Decision tree learning is one of the most widely used
techniques for classification.
• Its classification accuracy is competitive with other
methods, and
• it is very efficient.
• The classification model is a tree, called a decision tree.

Decision Tree
A decision tree is a tree where
• each node represents a feature (attribute)
• each link (branch) represents a decision (rule) and
• each leaf represents an outcome (a categorical or continuous value)

[Figure: a root node A (a feature), with links/branches (decision rules) to child nodes B and C, and leaves labeled Yes/No (outcomes).]

Decision Tree

Training data:

Age   Car Type   Class
20    Family     High
23    Family     High
17    Sports     High
43    Sports     High
32    Truck      Low
68    Family     Low

Rules learned:
  Age < 27.5 → High
  Age >= 27.5 and CarType = Sports → High
  Age >= 27.5 and CarType ≠ Sports → Low

[Figure: tree with root test Age < 27.5; the Yes branch leads to leaf High; the No branch leads to a CarType node, whose Sports branch leads to leaf High and whose Not Sports branch leads to leaf Low.]
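The same tree can be read as nested conditionals. Here is a minimal sketch in Python (the function name is our own, not from the slides):

```python
def classify(age, car_type):
    """Walk the decision tree above and return the risk class."""
    if age < 27.5:                # root test
        return "High"
    if car_type == "Sports":      # second-level test on the No branch
        return "High"
    return "Low"

# Reproduces the table, e.g.:
# classify(20, "Family") -> "High";  classify(32, "Truck") -> "Low"
```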

Decision Tree — How to build?
• Classification and Regression Trees (CART): Gini index
• Iterative Dichotomiser 3 (ID3): entropy and information gain
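For reference, CART's Gini index measures the impurity of a data set D as follows (a standard formula, not derived in these slides):

$$\mathrm{Gini}(D) = 1 - \sum_{j=1}^{|C|} \Pr(c_j)^2$$

ID3's entropy-based criterion is developed over the following slides.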

Why Decision Tree?

Decision Tree (Math)
• Four features (X):
  ✓ outlook
  ✓ temp
  ✓ humidity
  ✓ windy
• Class (y):
  ✓ play

Play Tennis Dataset

outlook    temp   humidity   windy   play
sunny      hot    high       FALSE   no
sunny      hot    high       TRUE    no
overcast   hot    high       FALSE   yes
rainy      mild   high       FALSE   yes
rainy      cool   normal     FALSE   yes
rainy      cool   normal     TRUE    no
overcast   cool   normal     TRUE    yes
sunny      mild   high       FALSE   no
sunny      cool   normal     FALSE   yes
rainy      mild   normal     FALSE   yes
sunny      mild   normal     TRUE    yes
overcast   mild   high       TRUE    yes
overcast   hot    normal     FALSE   yes
rainy      mild   high       TRUE    no

• All of the feature values are categorical.
• This is a binary classification problem.
• We will apply the ID3 algorithm.

Decision Tree (Math)
• First, we need to select an attribute/feature node
  ✓ it will be the root of the tree
  ✓ choose the attribute that best classifies the training data
  ✓ repeat this approach for each branch

How do we select the attribute to be the root of the tree?

Information Gain (ID3 Algorithm)

Information theory: Entropy measure
• The entropy formula:

$$\mathrm{entropy}(D) = -\sum_{j=1}^{|C|} \Pr(c_j)\,\log_2 \Pr(c_j), \qquad \sum_{j=1}^{|C|} \Pr(c_j) = 1$$

• Pr(cj) is the probability of class cj in data set D


• We use entropy as a measure of impurity or disorder of
data set D. (Or, a measure of information in a tree)
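To make the definition concrete, here is a minimal sketch in Python (the function name and the list-of-labels representation are our own, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# 9 "yes" and 5 "no" labels give about 0.940 bits (used later for play tennis)
print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))  # 0.94
```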

Entropy Measure

For a binary classification problem


• If all examples are positive or all are negative, then entropy is zero, i.e., low.
• If half of the examples are positive and half are negative, then entropy is one, i.e., high.
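For a binary problem with positive-class probability p, the general formula above specializes to the following worked form (taking 0·log2 0 = 0):

$$H(p) = -p\log_2 p - (1-p)\log_2(1-p), \qquad H(0) = H(1) = 0, \quad H(0.5) = 1$$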

Entropy measure: Examples

◼ As the data becomes purer and purer, the entropy value becomes smaller and smaller. This is useful to us!
Information gain
• Given a set of examples D, we first compute its entropy using the formula above.
• If we make attribute Ai, with v values, the root of the current tree, this will partition D into v subsets D1, D2, …, Dv. The expected entropy if Ai is used as the current root is:

$$\mathrm{entropy}_{A_i}(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \mathrm{entropy}(D_j)$$

Information gain (cont …)

• Information gained by selecting attribute Ai to branch or to partition the data is:

$$\mathrm{gain}(D, A_i) = \mathrm{entropy}(D) - \mathrm{entropy}_{A_i}(D)$$


• We choose the attribute with the highest gain to
branch/split the current tree.
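A minimal sketch of these two formulas in Python, reusing the entropy function above (the list-of-dicts data representation and function names are our own choices):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def expected_entropy(rows, attr, target):
    """entropy_Ai(D): weighted entropy of the subsets induced by attr."""
    n = len(rows)
    result = 0.0
    for value in set(row[attr] for row in rows):
        subset = [row[target] for row in rows if row[attr] == value]
        result += len(subset) / n * entropy(subset)
    return result

def gain(rows, attr, target):
    """gain(D, Ai) = entropy(D) - entropy_Ai(D)."""
    return entropy([row[target] for row in rows]) - expected_entropy(rows, attr, target)
```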

Decision Tree (Math)
• Calculate the entropy of the Play Tennis dataset (table above)

Decision Tree (Math)
• Entropy of the play class
• There are 14 instances
  ✓ Nine (9) are classified as yes
  ✓ Five (5) are classified as no

$$\mathrm{entropy}(D) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.940$$

Decision Tree (Math)
• Calculate Entropy and Information Gain for every feature

Decision Tree (Math)
For example, for outlook = overcast all four instances are yes, so the entropy of that subset is

$$-\frac{4}{4}\log_2\frac{4}{4} + 0 = 0$$

*** Calculate the entropy and information gain for the 'temp', 'humidity', and 'windy' features yourself.
Decision Tree (Math)
Feature: outlook
  Expected entropy: 0.693
  Information gain: 0.940 − 0.693 = 0.247

Feature: temp
  Expected entropy: 0.911
  Information gain: 0.940 − 0.911 = 0.029

Feature: humidity
  Expected entropy: 0.788
  Information gain: 0.940 − 0.788 = 0.152

Feature: windy
  Expected entropy: 0.892
  Information gain: 0.940 − 0.892 = 0.048
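These numbers can be reproduced with a short, self-contained sketch in Python (our own code, not from the slides), run over the Play Tennis table above:

```python
import math
from collections import Counter

data = [  # (outlook, temp, humidity, windy, play)
    ("sunny","hot","high",False,"no"),     ("sunny","hot","high",True,"no"),
    ("overcast","hot","high",False,"yes"), ("rainy","mild","high",False,"yes"),
    ("rainy","cool","normal",False,"yes"), ("rainy","cool","normal",True,"no"),
    ("overcast","cool","normal",True,"yes"),("sunny","mild","high",False,"no"),
    ("sunny","cool","normal",False,"yes"), ("rainy","mild","normal",False,"yes"),
    ("sunny","mild","normal",True,"yes"),  ("overcast","mild","high",True,"yes"),
    ("overcast","hot","normal",False,"yes"),("rainy","mild","high",True,"no"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

base = entropy([row[-1] for row in data])  # entropy(D) ~ 0.940
for i, name in enumerate(["outlook", "temp", "humidity", "windy"]):
    expected = 0.0
    for v in set(row[i] for row in data):
        subset = [row[-1] for row in data if row[i] == v]
        expected += len(subset) / len(data) * entropy(subset)
    print(name, round(base - expected, 3))  # 0.247, 0.029, 0.152, 0.048
```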

Decision Tree (Math)
Feature: outlook
  Expected entropy: 0.693
  Information gain: 0.940 − 0.693 = 0.247

outlook has the highest information gain of the four features, so the root node is outlook.

The loan data (reproduced)
Approved or not?

[Table: 15 loan-application examples; attributes include Age (young/middle/old) and Own_house; class: 9 approved (Yes), 6 not approved (No).]
A decision tree from the loan data
◼ Decision nodes and leaf nodes (classes)

Use the decision tree

[Figure: a test example is traced from the root down to a leaf; the predicted class here is No.]

Is the decision tree unique?
◼ No. Here is a simpler tree.
◼ We want a tree that is both small and accurate.
◼ Such a tree is easy to understand and tends to perform better.

◼ Finding the best tree is NP-hard.
◼ All current tree-building algorithms are heuristic algorithms.

From a decision tree to a set of rules
◼ A decision tree can be converted to a set of rules.
◼ Each path from the root to a leaf is a rule (see the example below).
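For instance, the loan-data tree has a branch whose subset is pure: the Own_house split with entropy 0, computed in the example that follows. Assuming, consistent with those counts, that the pure subset is Own_house = yes, the corresponding root-to-leaf path yields the rule:

  IF Own_house = yes THEN class = Yes (approved)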

An example
D = the class attribute of the loan data
• 9 positive (Yes) examples
• 6 negative (No) examples

$$\mathrm{entropy}(D) = -\frac{6}{15}\log_2\frac{6}{15} - \frac{9}{15}\log_2\frac{9}{15} = 0.971$$

An example
$$\mathrm{entropy}_{Age}(D) = \frac{5}{15}\,\mathrm{entropy}(D_1) + \frac{5}{15}\,\mathrm{entropy}(D_2) + \frac{5}{15}\,\mathrm{entropy}(D_3) = \frac{5}{15}(0.971) + \frac{5}{15}(0.971) + \frac{5}{15}(0.722) = 0.888$$

Age      Yes   No   entropy(Di)
young     2     3    0.971
middle    3     2    0.971
old       4     1    0.722

An example
$$\mathrm{entropy}_{Own\_house}(D) = \frac{6}{15}\,\mathrm{entropy}(D_1) + \frac{9}{15}\,\mathrm{entropy}(D_2) = \frac{6}{15}\times 0 + \frac{9}{15}\times 0.918 = 0.551$$

Comparing the two gains:

$$\mathrm{gain}(D, Own\_house) = 0.971 - 0.551 = 0.420 \qquad \mathrm{gain}(D, Age) = 0.971 - 0.888 = 0.083$$

◼ Own_house is the best choice for the root.
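A quick check of these numbers in Python (our own sketch; the per-subset class counts are read off the slides, and we assume the pure Own_house subset is the 6 all-Yes examples, which is consistent with the totals):

```python
import math

def H(pos, neg):
    """Entropy of a subset with pos/neg class counts."""
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c)

base  = H(9, 6)                                   # entropy(D)         ~0.971
e_age = (5*H(2, 3) + 5*H(3, 2) + 5*H(4, 1)) / 15  # entropy_Age        ~0.888
e_own = (6*H(6, 0) + 9*H(3, 6)) / 15              # entropy_Own_house  ~0.551
print(round(base - e_own, 3), round(base - e_age, 3))  # 0.42 0.083
```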
We build the final tree

◼ We can use the information gain ratio to evaluate the impurity as well (see the handout).
