
Supervised Machine Learning

CSE4107: Artificial Intelligence

References
• These slides are adapted from CS583, Bing Liu, UIC
• https://medium.com/deep-math-machine-learning-ai/chapter-4-decision-trees-algorithms-b93975f7a1f1

Decision Tree
• Decision tree learning is one of the most widely used
techniques for classification.
• Its classification accuracy is competitive with other
methods, and
• it is very efficient.
• The classification model is a tree, called a decision tree.

Decision Tree
A decision tree is a tree where
• each node represents a feature (attribute)
• each link (branch) represents a decision (rule) and
• each leaf represents an outcome (a categorical or continuous value)

[Figure: a root node A (a feature), with links/branches (decision rules) to child nodes B and C, and leaves labeled Yes/No (outcomes).]

Decision Tree

Training data:

Age   Car Type   Class
20    Family     High
23    Family     High
17    Sports     High
43    Sports     High
32    Truck      Low
68    Family     Low

Rules learned:
  Age < 27.5 → High
  Age >= 27.5 and CarType = Sports → High
  Age >= 27.5 and CarType ≠ Sports → Low

[Figure: tree with root test Age < 27.5; the Yes branch leads to leaf High; the No branch leads to a CarType node, whose Sports branch leads to leaf High and whose Not Sports branch leads to leaf Low.]
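The same tree can be read as nested conditionals. Here is a minimal sketch in Python (the function name is our own, not from the slides):

```python
def classify(age, car_type):
    """Walk the decision tree above and return the risk class."""
    if age < 27.5:                # root test
        return "High"
    if car_type == "Sports":      # second-level test on the No branch
        return "High"
    return "Low"

# Reproduces the table, e.g.:
# classify(20, "Family") -> "High";  classify(32, "Truck") -> "Low"
```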

Decision Tree — How to build?
• Classification and Regression Trees (CART): Gini index
• Iterative Dichotomiser 3 (ID3): entropy and information gain
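For reference, CART's Gini index measures the impurity of a data set D as follows (a standard formula, not derived in these slides):

$$\mathrm{Gini}(D) = 1 - \sum_{j=1}^{|C|} \Pr(c_j)^2$$

ID3's entropy-based criterion is developed over the following slides.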

Why Decision Tree?

Decision Tree (Math)
• Four features (X):
  ✓ outlook
  ✓ temp
  ✓ humidity
  ✓ windy
• Class (y):
  ✓ play

Play Tennis Dataset

outlook    temp   humidity   windy   play
sunny      hot    high       FALSE   no
sunny      hot    high       TRUE    no
overcast   hot    high       FALSE   yes
rainy      mild   high       FALSE   yes
rainy      cool   normal     FALSE   yes
rainy      cool   normal     TRUE    no
overcast   cool   normal     TRUE    yes
sunny      mild   high       FALSE   no
sunny      cool   normal     FALSE   yes
rainy      mild   normal     FALSE   yes
sunny      mild   normal     TRUE    yes
overcast   mild   high       TRUE    yes
overcast   hot    normal     FALSE   yes
rainy      mild   high       TRUE    no

• All of the feature values are categorical.
• This is a binary classification problem.
• We will apply the ID3 algorithm.

Decision Tree (Math)
• First, we need to select an attribute/feature node
  ✓ it will be the root of the tree
  ✓ choose the attribute that best classifies the training data
  ✓ repeat this approach for each branch

How do we select the attribute to be the root of the tree?

Information Gain (ID3 Algorithm)

Information theory: Entropy measure
• The entropy formula:

$$\mathrm{entropy}(D) = -\sum_{j=1}^{|C|} \Pr(c_j)\,\log_2 \Pr(c_j), \qquad \sum_{j=1}^{|C|} \Pr(c_j) = 1$$

• Pr(cj) is the probability of class cj in data set D


• We use entropy as a measure of impurity or disorder of
data set D. (Or, a measure of information in a tree)
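To make the definition concrete, here is a minimal sketch in Python (the function name and the list-of-labels representation are our own, not from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# 9 "yes" and 5 "no" labels give about 0.940 bits (used later for play tennis)
print(round(entropy(["yes"] * 9 + ["no"] * 5), 3))  # 0.94
```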

Entropy Measure

For a binary classification problem


• If all examples are positive or all are negative, then entropy is zero, i.e., low.
• If half of the examples are positive and half are negative, then entropy is one, i.e., high.
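For a binary problem with positive-class probability p, the general formula above specializes to the following worked form (taking 0·log2 0 = 0):

$$H(p) = -p\log_2 p - (1-p)\log_2(1-p), \qquad H(0) = H(1) = 0, \quad H(0.5) = 1$$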

Entropy measure: Examples

◼ As the data becomes purer and purer, the entropy value becomes smaller and smaller. This is useful to us!
Information gain
• Given a set of examples D, we first compute its entropy using the formula above.
• If we make attribute Ai, with v values, the root of the current tree, this will partition D into v subsets D1, D2, …, Dv. The expected entropy if Ai is used as the current root is:

$$\mathrm{entropy}_{A_i}(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \mathrm{entropy}(D_j)$$

Information gain (cont …)

• Information gained by selecting attribute Ai to branch or to partition the data is:

$$\mathrm{gain}(D, A_i) = \mathrm{entropy}(D) - \mathrm{entropy}_{A_i}(D)$$


• We choose the attribute with the highest gain to
branch/split the current tree.
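A minimal sketch of these two formulas in Python, reusing the entropy function above (the list-of-dicts data representation and function names are our own choices):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def expected_entropy(rows, attr, target):
    """entropy_Ai(D): weighted entropy of the subsets induced by attr."""
    n = len(rows)
    result = 0.0
    for value in set(row[attr] for row in rows):
        subset = [row[target] for row in rows if row[attr] == value]
        result += len(subset) / n * entropy(subset)
    return result

def gain(rows, attr, target):
    """gain(D, Ai) = entropy(D) - entropy_Ai(D)."""
    return entropy([row[target] for row in rows]) - expected_entropy(rows, attr, target)
```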

Decision Tree (Math)
• Calculate the entropy of the Play Tennis dataset (table above)

Decision Tree (Math)
• Entropy of the play class
• There are 14 instances
  ✓ Nine (9) are classified as yes
  ✓ Five (5) are classified as no

$$\mathrm{entropy}(D) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} \approx 0.940$$

Decision Tree (Math)
• Calculate Entropy and Information Gain for every feature

Decision Tree (Math)
For example, for outlook = overcast all four instances are yes, so the entropy of that subset is

$$-\frac{4}{4}\log_2\frac{4}{4} + 0 = 0$$

*** Calculate the entropy and information gain for the 'temp', 'humidity', and 'windy' features yourself.
Decision Tree (Math)
Feature: outlook
  Expected entropy: 0.693
  Information gain: 0.940 − 0.693 = 0.247

Feature: temp
  Expected entropy: 0.911
  Information gain: 0.940 − 0.911 = 0.029

Feature: humidity
  Expected entropy: 0.788
  Information gain: 0.940 − 0.788 = 0.152

Feature: windy
  Expected entropy: 0.892
  Information gain: 0.940 − 0.892 = 0.048
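These numbers can be reproduced with a short, self-contained sketch in Python (our own code, not from the slides), run over the Play Tennis table above:

```python
import math
from collections import Counter

data = [  # (outlook, temp, humidity, windy, play)
    ("sunny","hot","high",False,"no"),     ("sunny","hot","high",True,"no"),
    ("overcast","hot","high",False,"yes"), ("rainy","mild","high",False,"yes"),
    ("rainy","cool","normal",False,"yes"), ("rainy","cool","normal",True,"no"),
    ("overcast","cool","normal",True,"yes"),("sunny","mild","high",False,"no"),
    ("sunny","cool","normal",False,"yes"), ("rainy","mild","normal",False,"yes"),
    ("sunny","mild","normal",True,"yes"),  ("overcast","mild","high",True,"yes"),
    ("overcast","hot","normal",False,"yes"),("rainy","mild","high",True,"no"),
]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

base = entropy([row[-1] for row in data])  # entropy(D) ~ 0.940
for i, name in enumerate(["outlook", "temp", "humidity", "windy"]):
    expected = 0.0
    for v in set(row[i] for row in data):
        subset = [row[-1] for row in data if row[i] == v]
        expected += len(subset) / len(data) * entropy(subset)
    print(name, round(base - expected, 3))  # 0.247, 0.029, 0.152, 0.048
```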

Decision Tree (Math)
Feature: outlook
  Expected entropy: 0.693
  Information gain: 0.940 − 0.693 = 0.247

outlook has the highest information gain of the four features, so the root node is outlook.

The loan data (reproduced)
Approved or not?

[Table: 15 loan-application examples; attributes include Age (young/middle/old) and Own_house; class: 9 approved (Yes), 6 not approved (No).]
A decision tree from the loan data
◼ Decision nodes and leaf nodes (classes)

Use the decision tree

[Figure: a test example is traced from the root down to a leaf; the predicted class here is No.]

Is the decision tree unique?
◼ No. Here is a simpler tree.
◼ We want a tree that is both small and accurate.
◼ Such a tree is easy to understand and tends to perform better.

◼ Finding the best tree is NP-hard.
◼ All current tree-building algorithms are heuristic algorithms.

From a decision tree to a set of rules
◼ A decision tree can be converted to a set of rules.
◼ Each path from the root to a leaf is a rule (see the example below).
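For instance, the loan-data tree has a branch whose subset is pure: the Own_house split with entropy 0, computed in the example that follows. Assuming, consistent with those counts, that the pure subset is Own_house = yes, the corresponding root-to-leaf path yields the rule:

  IF Own_house = yes THEN class = Yes (approved)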

An example
D = the class attribute of the loan data
• 9 positive (Yes) examples
• 6 negative (No) examples

$$\mathrm{entropy}(D) = -\frac{6}{15}\log_2\frac{6}{15} - \frac{9}{15}\log_2\frac{9}{15} = 0.971$$

An example
$$\mathrm{entropy}_{Age}(D) = \frac{5}{15}\,\mathrm{entropy}(D_1) + \frac{5}{15}\,\mathrm{entropy}(D_2) + \frac{5}{15}\,\mathrm{entropy}(D_3) = \frac{5}{15}(0.971) + \frac{5}{15}(0.971) + \frac{5}{15}(0.722) = 0.888$$

Age      Yes   No   entropy(Di)
young     2     3    0.971
middle    3     2    0.971
old       4     1    0.722

An example
$$\mathrm{entropy}_{Own\_house}(D) = \frac{6}{15}\,\mathrm{entropy}(D_1) + \frac{9}{15}\,\mathrm{entropy}(D_2) = \frac{6}{15}\times 0 + \frac{9}{15}\times 0.918 = 0.551$$

Comparing the two gains:

$$\mathrm{gain}(D, Own\_house) = 0.971 - 0.551 = 0.420 \qquad \mathrm{gain}(D, Age) = 0.971 - 0.888 = 0.083$$

◼ Own_house is the best choice for the root.
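A quick check of these numbers in Python (our own sketch; the per-subset class counts are read off the slides, and we assume the pure Own_house subset is the 6 all-Yes examples, which is consistent with the totals):

```python
import math

def H(pos, neg):
    """Entropy of a subset with pos/neg class counts."""
    total = pos + neg
    return -sum(c / total * math.log2(c / total) for c in (pos, neg) if c)

base  = H(9, 6)                                   # entropy(D)         ~0.971
e_age = (5*H(2, 3) + 5*H(3, 2) + 5*H(4, 1)) / 15  # entropy_Age        ~0.888
e_own = (6*H(6, 0) + 9*H(3, 6)) / 15              # entropy_Own_house  ~0.551
print(round(base - e_own, 3), round(base - e_age, 3))  # 0.42 0.083
```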
We build the final tree

◼ We can use the information gain ratio to evaluate the impurity as well (see the handout).
