Instructor: Yi Yang
Department of ISOM
Spring 2023
• Last lecture: Data preparation
• This lecture: Decision tree
Data Mining Process
Supervised Learning
Training vs. Testing
2. Split data into two parts, learn the model on one part, and
evaluate on the other part.
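The split-and-evaluate step can be sketched in pure Python (the helper name and toy data below are illustrative assumptions; in practice a library routine such as scikit-learn's `train_test_split` is typically used):

```python
import random

def split_train_test(rows, test_frac=0.3, seed=42):
    """Shuffle the rows, then hold out test_frac of them for evaluation."""
    rows = rows[:]                        # copy; leave caller's list intact
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_frac)
    return rows[n_test:], rows[:n_test]   # (train, test)

data = list(range(10))                    # stand-in for 10 labeled examples
train, test = split_train_test(data)
print(len(train), len(test))              # 7 3
```

The model is then fit on `train` only, and its accuracy is measured on the held-out `test` rows.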
Decision Tree
Decision Tree
Example dataset (Employed is categorical; Balance and Age are continuous):

Employed   Balance   Age   Default
Yes        123,000   50    No
No          51,100   40    Yes
No          68,000   55    No
Yes         34,000   46    Yes
Yes         50,000   44    No
No         100,000   50    Yes
Decision Tree
• An upside-down if-else tree, starting with all the training data at the root node.
• Each node has an if-else condition on one feature. (Which feature?)
(Figure: the root node tests Employed with Yes/No branches; one branch ends in a leaf node with Class = Not Default, the other leads to a node testing Balance.)
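The if-else view can be made literal. A minimal hand-written sketch of such a tree (the branch directions and the Balance threshold are illustrative assumptions, not values learned from the data):

```python
def predict_default(employed, balance):
    """Walk a tiny hand-built decision tree for the Default label.

    The branch directions and the 60,000 threshold are illustrative
    assumptions; a real tree learns them from the training data.
    """
    if employed == "Yes":
        return "No"              # leaf: Class = Not Default
    if balance >= 60_000:        # hypothetical threshold on Balance
        return "No"
    return "Yes"

print(predict_default("No", 20_000))   # Yes
```

Learning a tree amounts to choosing, at each node, which feature (and threshold) to test; the next slides quantify that choice.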
The essence of Decision Tree
Entropy
Entropy(dataset) $= -\sum_i p_i \log_2 p_i$, where $p_i$ is the fraction of examples in the dataset belonging to class $i$.
Entropy Exercise
Tip: $0 \log_2 0 = 0$.
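For checking the exercise by hand, a minimal entropy helper (the function name is our own; the `0 log 0 = 0` convention is handled by skipping empty classes):

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a label distribution given class counts.
    Classes with zero count are skipped, i.e. 0 * log2(0) = 0."""
    total = sum(counts)
    return sum(-c / total * log2(c / total) for c in counts if c > 0)

print(entropy([3, 3]))   # maximally mixed two-class set: 1.0
print(entropy([6, 0]))   # pure set: 0.0
```

Entropy is highest when the classes are evenly mixed and zero when the set is pure.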
Let’s play a game. I have someone in mind, and your job is to guess this person. You can only ask yes/no questions.
Go!
Information Gain
• Information gain is the decrease in entropy after a dataset is split on a feature: $\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v} \frac{|S_v|}{|S|}\, \mathrm{Entropy}(S_v)$, where $S_v$ is the subset of $S$ with value $v$ for feature $A$.
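The formula above can be sketched directly from class counts (pure Python; the helper names are our own):

```python
from math import log2

def entropy(counts):
    """Entropy in bits from class counts, with 0 * log2(0) taken as 0."""
    total = sum(counts)
    return sum(-c / total * log2(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted entropy of the children.

    parent   -- class counts before the split, e.g. [9, 5]
    children -- class counts of each branch after the split
    """
    n = sum(parent)
    after = sum(sum(child) / n * entropy(child) for child in children)
    return entropy(parent) - after

# A perfectly separating split recovers all of the entropy:
print(information_gain([8, 8], [[8, 0], [0, 8]]))   # 1.0
```

A split that leaves each branch as mixed as the parent has zero gain; the tree greedily picks the feature with the largest gain.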
Information Gain Example
Two candidate splits: A1 (has credit card?) and A2 (is student?). Compare the entropy before each split with the weighted entropy after it; the split with the larger decrease has the higher information gain.
Example dataset:
• Outlook: Sunny, Overcast, Rain
• Humidity: High, Normal
• Wind: Weak, Strong
• Decision: Yes (9), No (5)
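Before any split, the entropy of the Decision labels follows directly from the 9/5 counts (a quick check in pure Python):

```python
from math import log2

yes, no = 9, 5                    # Decision counts from the slide
total = yes + no
ent = sum(-c / total * log2(c / total) for c in (yes, no))
print(round(ent, 3))              # 0.94
```

Each candidate feature's information gain is then measured against this baseline.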
• Demonstration on the board
Recap: The essence of Decision Tree