Today’s Agenda
• What is a Decision Tree?
• ID3 Algorithm
• Entropy and Information Gain
• Tree Construction
• Examples
• Summary
Decision Trees
● A decision tree consists of
Nodes:
● test for the value of a certain attribute
Edges:
● correspond to the outcome of a test
Leaves:
● terminal nodes that predict the outcome
To classify an example:
1. start at the root
2. perform the test
3. follow the edge corresponding to the outcome
4. go to step 2 unless at a leaf
5. predict the outcome associated with the leaf
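The loop above can be made concrete with a small sketch (illustrative only, not from the slides): the tree is a nested dict, and classification is exactly steps 1-5.

# A minimal sketch, assuming a nested-dict tree representation:
# internal nodes are {"attribute": ..., "branches": {value: subtree}},
# leaves are plain class labels. All names are illustrative.

golf_tree = {
    "attribute": "Outlook",
    "branches": {
        "sunny": {"attribute": "Humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attribute": "Windy",
                 "branches": {True: "no", False: "yes"}},
    },
}

def classify(tree, example):
    """Start at the root, follow the edge matching the tested
    attribute's value, repeat until a leaf, then predict its class."""
    while isinstance(tree, dict):          # steps 2-4: test and descend
        value = example[tree["attribute"]]
        tree = tree["branches"][value]
    return tree                            # step 5: the leaf's class

print(classify(golf_tree, {"Outlook": "sunny", "Humidity": "high", "Windy": False}))
# prints: no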
Decision Tree Learning
(figure: a new example is fed to the learned tree and receives a classification)
A Sample Task (Training & Testing Data)
Day Temperature Outlook Humidity Windy Play Golf?
07-05 hot sunny high false no
07-06 hot sunny high true no
07-07 hot overcast high false yes
07-09 cool rain normal false yes
07-10 cool overcast normal true yes
07-12 mild sunny high false no
07-14 cool sunny normal false yes
07-15 mild rain normal false yes
07-20 mild sunny normal true yes
07-21 mild overcast high true yes
07-22 hot overcast normal false yes
07-23 mild rain high true no
07-26 cool rain normal true no
07-30 mild rain high false yes
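For use in the sketches further below, the same table can be written down as plain Python records (a sketch; the field names are simply the column headers, and the Day column is dropped since it only identifies the example):

# The 14 training examples from the table, as (attributes, label) records.
golf_data = [
    ({"Temperature": "hot",  "Outlook": "sunny",    "Humidity": "high",   "Windy": False}, "no"),
    ({"Temperature": "hot",  "Outlook": "sunny",    "Humidity": "high",   "Windy": True},  "no"),
    ({"Temperature": "hot",  "Outlook": "overcast", "Humidity": "high",   "Windy": False}, "yes"),
    ({"Temperature": "cool", "Outlook": "rain",     "Humidity": "normal", "Windy": False}, "yes"),
    ({"Temperature": "cool", "Outlook": "overcast", "Humidity": "normal", "Windy": True},  "yes"),
    ({"Temperature": "mild", "Outlook": "sunny",    "Humidity": "high",   "Windy": False}, "no"),
    ({"Temperature": "cool", "Outlook": "sunny",    "Humidity": "normal", "Windy": False}, "yes"),
    ({"Temperature": "mild", "Outlook": "rain",     "Humidity": "normal", "Windy": False}, "yes"),
    ({"Temperature": "mild", "Outlook": "sunny",    "Humidity": "normal", "Windy": True},  "yes"),
    ({"Temperature": "mild", "Outlook": "overcast", "Humidity": "high",   "Windy": True},  "yes"),
    ({"Temperature": "hot",  "Outlook": "overcast", "Humidity": "normal", "Windy": False}, "yes"),
    ({"Temperature": "mild", "Outlook": "rain",     "Humidity": "high",   "Windy": True},  "no"),
    ({"Temperature": "cool", "Outlook": "rain",     "Humidity": "normal", "Windy": True},  "no"),
    ({"Temperature": "mild", "Outlook": "rain",     "Humidity": "high",   "Windy": False}, "yes"),
]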
Decision Tree Example

Outlook
  sunny    -> Humidity
                high   -> No
                normal -> Yes
  overcast -> Yes
  rain     -> Windy
                true   -> No
                false  -> Yes
Decision Tree Learning
Divide-And-Conquer Algorithms
● Family of decision tree learning algorithms
TDIDT: Top-Down Induction of Decision Trees
● Learn trees in a top-down fashion:
  divide the problem into subproblems
  solve each subproblem
Decision Tree (ID3 Pseudocode)
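A minimal Python sketch of the TDIDT/ID3 recursion described above — not the original pseudocode from this slide; the names id3, gain and entropy are illustrative. Attribute selection by information gain is defined on the following slides.

from collections import Counter
from math import log2

def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return sum(-c / total * log2(c / total) for c in counts.values())

def gain(examples, attribute):
    """Information gain of splitting `examples` (a list of
    (attribute-dict, label) pairs) on `attribute`."""
    labels = [label for _, label in examples]
    remainder = 0.0
    for value in {ex[attribute] for ex, _ in examples}:
        subset = [label for ex, label in examples if ex[attribute] == value]
        remainder += len(subset) / len(examples) * entropy(subset)
    return entropy(labels) - remainder

def id3(examples, attributes):
    """Top-down induction: pick the attribute with the highest gain,
    split the examples on its values, and recurse on each subset."""
    labels = [label for _, label in examples]
    if len(set(labels)) == 1:                      # all examples agree -> leaf
        return labels[0]
    if not attributes:                             # nothing left to test -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a))
    branches = {}
    for value in {ex[best] for ex, _ in examples}:
        subset = [(ex, label) for ex, label in examples if ex[best] == value]
        branches[value] = id3(subset, [a for a in attributes if a != best])
    return {"attribute": best, "branches": branches}

Under these assumptions, id3(golf_data, ["Outlook", "Temperature", "Humidity", "Windy"]) with the records listed after the table should reproduce a tree with Outlook at the root, like the example tree shown earlier.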
A Different Decision Tree
A criterion for attribute selection
• Which is the best attribute?
  – The one which will result in the smallest tree
  – Heuristic: choose the attribute that produces the “purest” nodes
• Popular impurity criterion: information gain
  – Information gain increases with the average purity of the subsets that an attribute produces
• Strategy: choose the attribute that results in the greatest information gain
Computing information
• Information is measured in bits
  – Given a probability distribution, the info required to predict an event is the distribution’s entropy
  – entropy(p1, ..., pn) = - p1 · log2(p1) - ... - pn · log2(pn)
Consider entropy H(p)
Entropy of a split
• Information in a split with x items of one class and y items of the second class:

  info([x, y]) = entropy( x/(x+y), y/(x+y) )
               = - x/(x+y) · log2( x/(x+y) ) - y/(x+y) · log2( y/(x+y) )
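A quick numeric check of this formula (a sketch, not from the slides; log2 denotes the base-2 logarithm used throughout):

from math import log2

def info(x, y):
    """info([x, y]) = entropy(x/(x+y), y/(x+y)), in bits."""
    probs = (x / (x + y), y / (x + y))
    return sum(-p * log2(p) for p in probs if p > 0)   # 0 · log2(0) is taken as 0

print(round(info(2, 3), 3))   # 0.971 -- the "Sunny" split on the next slide
print(round(info(4, 0), 3))   # 0.0   -- a pure split carries no information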
Example: attribute “Outlook”
• “Outlook” = “Sunny”: 2 and 3 split
  info([2,3]) = entropy(2/5, 3/5) = - 2/5 · log2(2/5) - 3/5 · log2(3/5) = 0.971 bits
Outlook = Overcast
• “Outlook” = “Overcast”: 4/0 split

  info([4,0]) = entropy(4/4, 0/4) = - 1 · log2(1) - 0 · log2(0) = 0 bits
  (0 · log2(0) is taken to be 0)
Outlook = Rainy
• “Outlook” = “Rainy”:
  info([3,2]) = entropy(3/5, 2/5) = - 3/5 · log2(3/5) - 2/5 · log2(2/5) = 0.971 bits
Expected Information
Expected information for attribute “Outlook”, weighted over its three branches:

  info([2,3], [4,0], [3,2]) = 5/14 · 0.971 + 4/14 · 0 + 5/14 · 0.971 = 0.693 bits
Computing the information gain
• Information gain:
  (information before the split) – (information after the split)

  gain("Outlook") = info([9,5]) - info([2,3], [4,0], [3,2]) = 0.940 - 0.693 = 0.247 bits
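The same numbers can be reproduced from the table with a short, self-contained sketch; the (yes, no) counts per attribute value below are read off the 14 training examples, and the variable names are illustrative:

from math import log2

def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return sum(-c / total * log2(c / total) for c in counts if c > 0)

# (yes, no) counts per attribute value, read off the 14-example table.
splits = {
    "Outlook":     [(2, 3), (4, 0), (3, 2)],   # sunny, overcast, rain
    "Temperature": [(2, 2), (4, 2), (3, 1)],   # hot, mild, cool
    "Humidity":    [(3, 4), (6, 1)],           # high, normal
    "Windy":       [(3, 3), (6, 2)],           # true, false
}

before = entropy((9, 5))                        # info([9,5]) = 0.940 bits
for attribute, parts in splits.items():
    n = sum(sum(p) for p in parts)              # 14 examples in total
    after = sum(sum(p) / n * entropy(p) for p in parts)
    print(f"gain({attribute}) = {before - after:.3f} bits")

# Expected output (Outlook has the largest gain, so it becomes the root):
# gain(Outlook) = 0.247 bits
# gain(Temperature) = 0.029 bits
# gain(Humidity) = 0.152 bits
# gain(Windy) = 0.048 bits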
• Outlook is selected as the root node
• The “sunny” and “rain” branches still need further splitting
• “Outlook” = “overcast” contains only examples of class “yes”, so it becomes a leaf
Example (Ctd.)
Example (Ctd.)
• Humidity is selected where further splitting is necessary
• Pure leaves → no further expansion necessary
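Repeating the gain calculation inside the five “sunny” examples shows why Humidity is chosen there (again a sketch; the counts are read off the table above):

from math import log2

def entropy(counts):
    total = sum(counts)
    return sum(-c / total * log2(c / total) for c in counts if c > 0)

# (yes, no) counts inside the 5 "sunny" examples, per candidate attribute.
sunny_splits = {
    "Humidity":    [(0, 3), (2, 0)],           # high, normal -> both pure
    "Temperature": [(0, 2), (1, 1), (1, 0)],   # hot, mild, cool
    "Windy":       [(1, 1), (1, 2)],           # true, false
}

before = entropy((2, 3))                        # 0.971 bits in the sunny subset
for attribute, parts in sunny_splits.items():
    after = sum(sum(p) / 5 * entropy(p) for p in parts)
    print(f"gain({attribute}) = {before - after:.3f} bits")

# gain(Humidity) = 0.971 bits   <- pure children, so Humidity is picked
# gain(Temperature) = 0.571 bits
# gain(Windy) = 0.020 bits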
Final decision tree