
Decision Tree Learning

Today’s Agenda
• What is a Decision Tree?
• ID3 Algorithm
• Entropy and Information Gain
• Tree Construction
• Examples
• Summary
Decision Trees
● A decision tree consists of:
  Nodes:
  ● test the value of a certain attribute
  Edges:
  ● correspond to the outcomes of the test
  ● connect to the next node or leaf
  Leaves:
  ● terminal nodes that predict the outcome (class label)

To classify an example:
1. start at the root
2. perform the test at the current node
3. follow the edge corresponding to the outcome
4. go to 2. unless the node is a leaf
5. predict the outcome associated with the leaf
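
A minimal Python sketch of this traversal, assuming the tree is stored as nested dictionaries (each internal node maps an attribute name to a dictionary of outcome → subtree; anything else is a leaf label). The representation and names are illustrative assumptions, not a fixed format.

```python
def classify(node, example):
    """Classify an example by walking the tree from the root to a leaf."""
    # An internal node is a one-entry dict {attribute: {outcome: subtree, ...}};
    # anything else is a leaf holding the predicted class.
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))  # 2. perform the test
        node = branches[example[attribute]]             # 3. follow the matching edge
    return node                                         # 5. predict the leaf's outcome
```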
Decision Tree Learning

In Decision Tree Learning, a new example is classified by submitting it to a series of tests that determine the class label of the example. These tests are organized in a hierarchical structure called a decision tree.

The training examples are used for choosing appropriate tests in the decision tree. Typically, a tree is built from top to bottom, where tests that maximize the information gain about the classification are selected first.

[Figure: training examples are used to build the decision tree, which then classifies a new example.]

A Sample Task (Training & Testing Data)
Day Temperature Outlook Humidity Windy Play Golf?
07-05 hot sunny high false no
07-06 hot sunny high true no
07-07 hot overcast high false yes
07-09 cool rain normal false yes
07-10 cool overcast normal true yes
07-12 mild sunny high false no
07-14 cool sunny normal false yes
07-15 mild rain normal false yes
07-20 mild sunny normal true yes
07-21 mild overcast high true yes
07-22 hot overcast normal false yes
07-23 mild rain high true no
07-26 cool rain normal true no
07-30 mild rain high false yes

today cool sunny normal false ?


tomorrow mild sunny normal false ?
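
For the sketches that follow, one convenient (purely illustrative) way to hold this training data in Python is a list of (attribute dictionary, class label) pairs:

```python
# Training examples from the table above, as (attributes, play_golf) pairs.
weather_data = [
    ({"Temperature": "hot",  "Outlook": "sunny",    "Humidity": "high",   "Windy": False}, "no"),
    ({"Temperature": "hot",  "Outlook": "sunny",    "Humidity": "high",   "Windy": True},  "no"),
    ({"Temperature": "hot",  "Outlook": "overcast", "Humidity": "high",   "Windy": False}, "yes"),
    ({"Temperature": "cool", "Outlook": "rain",     "Humidity": "normal", "Windy": False}, "yes"),
    ({"Temperature": "cool", "Outlook": "overcast", "Humidity": "normal", "Windy": True},  "yes"),
    ({"Temperature": "mild", "Outlook": "sunny",    "Humidity": "high",   "Windy": False}, "no"),
    ({"Temperature": "cool", "Outlook": "sunny",    "Humidity": "normal", "Windy": False}, "yes"),
    ({"Temperature": "mild", "Outlook": "rain",     "Humidity": "normal", "Windy": False}, "yes"),
    ({"Temperature": "mild", "Outlook": "sunny",    "Humidity": "normal", "Windy": True},  "yes"),
    ({"Temperature": "mild", "Outlook": "overcast", "Humidity": "high",   "Windy": True},  "yes"),
    ({"Temperature": "hot",  "Outlook": "overcast", "Humidity": "normal", "Windy": False}, "yes"),
    ({"Temperature": "mild", "Outlook": "rain",     "Humidity": "high",   "Windy": True},  "no"),
    ({"Temperature": "cool", "Outlook": "rain",     "Humidity": "normal", "Windy": True},  "no"),
    ({"Temperature": "mild", "Outlook": "rain",     "Humidity": "high",   "Windy": False}, "yes"),
]
```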

Decision Tree Example

Outlook
  sunny    → Humidity
               high   → No
               normal → Yes
  overcast → Yes
  rain     → Windy
               true   → No
               false  → Yes
Decision Tree Learning

tomorrow mild sunny normal false ?

Following the tree: Outlook = sunny → Humidity = normal → Yes, play golf tomorrow.
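
The same walk can be spelled out in Python, encoding the example tree above in the nested-dictionary form used earlier (an assumed representation, not a prescribed one):

```python
decision_tree = {
    "Outlook": {
        "sunny":    {"Humidity": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain":     {"Windy": {True: "no", False: "yes"}},
    }
}

tomorrow = {"Temperature": "mild", "Outlook": "sunny", "Humidity": "normal", "Windy": False}

node = decision_tree
while isinstance(node, dict):                        # stop when a leaf is reached
    attribute, branches = next(iter(node.items()))   # test at the current node
    node = branches[tomorrow[attribute]]             # follow the matching edge
print(node)  # -> "yes"
```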

Divide-And-Conquer Algorithms
● Family of decision tree learning algorithms
  TDIDT: Top-Down Induction of Decision Trees
● Learn trees in a top-down fashion:
  divide the problem into subproblems, then solve each subproblem

Basic Divide-And-Conquer Algorithm:

1. select a test for the root node; create a branch for each possible outcome of the test
2. split the instances into subsets, one for each branch extending from the node
3. repeat recursively for each branch, using only the instances that reach the branch
4. stop the recursion for a branch if all of its instances have the same class

Decision Tree (ID3 Pseudocode)
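
A minimal Python sketch of the basic divide-and-conquer procedure described above; the attribute-selection rule (for ID3, the information gain developed on the following slides) is passed in as a parameter, and names such as build_tree and select_attribute are illustrative assumptions.

```python
from collections import Counter

def build_tree(examples, attributes, select_attribute):
    """Top-down induction of a decision tree (TDIDT sketch).

    examples:         list of (attribute_dict, class_label) pairs
    attributes:       attribute names still available for testing
    select_attribute: function(examples, attributes) -> attribute to test next
    """
    labels = [label for _, label in examples]
    if len(set(labels)) == 1 or not attributes:        # step 4: pure node (or nothing left to test)
        return Counter(labels).most_common(1)[0][0]    # leaf: the (majority) class
    best = select_attribute(examples, attributes)      # step 1: choose the test for this node
    branches = {}
    for value in {ex[best] for ex, _ in examples}:     # step 2: one subset per outcome
        subset = [(ex, label) for ex, label in examples if ex[best] == value]
        rest = [a for a in attributes if a != best]
        branches[value] = build_tree(subset, rest, select_attribute)  # step 3: recurse
    return {best: branches}
```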
A Different Decision Tree

● also explains all of the training data



will it generalize well to new data?
Which attribute to select as the root?

A criterion for attribute selection
• Which is the best attribute?
  – The one that will result in the smallest tree
  – Heuristic: choose the attribute that produces the "purest" nodes
• Popular impurity criterion: information gain
  – Information gain increases with the average purity of the subsets that an attribute produces
• Strategy: choose the attribute that results in the greatest information gain

Computing information
• Information is measured in bits
  – Given a probability distribution, the info required to predict an event is the distribution's entropy
  – Entropy gives the information required in bits (this can involve fractions of bits!)

Consider entropy H(p)

[Figure: entropy H(p) plotted against the proportion p of "yes" examples]
• pure, 100% yes → entropy is 0 bits
• not pure at all, 40% yes → almost 1 bit of information is required to distinguish yes and no
Entropy
• Formula for computing the entropy:

  $\mathrm{entropy}(p_1, p_2, \ldots, p_n) = -p_1 \log p_1 - p_2 \log p_2 - \cdots - p_n \log p_n$
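
As a small illustration, the formula can be evaluated directly in Python using base-2 logarithms, so the result is in bits; the convention that 0 · log(0) counts as zero (used on a later slide) is applied here as well. The function name is illustrative.

```python
import math

def entropy(probabilities):
    """entropy(p1, ..., pn) = -sum(p_i * log2(p_i)), treating 0*log(0) as 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))  # 1.0 bit: an evenly mixed node
print(entropy([1.0, 0.0]))  # -0.0, i.e. 0 bits: a pure node
```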

Entropy of a split
• Information in a split with x items of one class and y items of the second class:

  $\mathrm{info}([x, y]) = \mathrm{entropy}\left(\frac{x}{x+y}, \frac{y}{x+y}\right) = -\frac{x}{x+y}\log\frac{x}{x+y} - \frac{y}{x+y}\log\frac{y}{x+y}$

Example: attribute "Outlook"
• "Outlook" = "Sunny": 2 and 3 split

  $\mathrm{info}([2,3]) = \mathrm{entropy}(2/5, 3/5) = -\frac{2}{5}\log\frac{2}{5} - \frac{3}{5}\log\frac{3}{5} = 0.971 \text{ bits}$

Outlook = Overcast
• "Outlook" = "Overcast": 4/0 split

  $\mathrm{info}([4,0]) = \mathrm{entropy}(1, 0) = -1\log(1) - 0\log(0) = 0 \text{ bits}$

  Note: log(0) is not defined, but we evaluate 0 · log(0) as zero.

Outlook = Rainy
• "Outlook" = "Rainy":

  $\mathrm{info}([3,2]) = \mathrm{entropy}(3/5, 2/5) = -\frac{3}{5}\log\frac{3}{5} - \frac{2}{5}\log\frac{2}{5} = 0.971 \text{ bits}$

Expected Information
Expected information for attribute:

  $\mathrm{info}([3,2],[4,0],[3,2]) = \frac{5}{14} \times 0.971 + \frac{4}{14} \times 0 + \frac{5}{14} \times 0.971 = 0.693 \text{ bits}$
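
A short sketch of this weighted average; expected_info and info are illustrative helper names.

```python
import math

def info(x, y):
    total = x + y
    return -sum(c / total * math.log2(c / total) for c in (x, y) if c > 0)

def expected_info(splits):
    """Average information over the subsets, weighted by subset size."""
    total = sum(x + y for x, y in splits)
    return sum((x + y) / total * info(x, y) for x, y in splits)

print(expected_info([(3, 2), (4, 0), (3, 2)]))  # ~0.6935, the 0.693 bits above
```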

Computing the information gain
• Information gain: (information before split) - (information after split)

  $\mathrm{gain}(\mathrm{Outlook}) = \mathrm{info}([9,5]) - \mathrm{info}([2,3],[4,0],[3,2]) = 0.940 - 0.693 = 0.247 \text{ bits}$

• Information gain for the attributes from the weather data:

  gain(Outlook)     = 0.247 bits
  gain(Temperature) = 0.029 bits
  gain(Humidity)    = 0.152 bits
  gain(Windy)       = 0.048 bits
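
These numbers can be reproduced with a small sketch that combines the ideas above; it assumes the weather_data list defined earlier, and the function names are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    counts = Counter(labels)
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def information_gain(examples, attribute):
    """(information before the split) minus (weighted information after the split)."""
    labels = [label for _, label in examples]
    before = entropy(labels)                                  # e.g. info([9,5]) = 0.940 bits
    after = 0.0
    for value in {ex[attribute] for ex, _ in examples}:
        subset = [label for ex, label in examples if ex[attribute] == value]
        after += len(subset) / len(examples) * entropy(subset)
    return before - after

for attribute in ["Outlook", "Temperature", "Humidity", "Windy"]:
    print(attribute, round(information_gain(weather_data, attribute), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048
```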
Example (Ctd.)

Outlook is selected as the root node.
• For the sunny and rain branches, further splitting is necessary.
• Outlook = overcast contains only examples of class yes.
Example (Ctd.)

Splitting the Outlook = sunny branch further:

  Gain(Temperature) = 0.571 bits
  Gain(Humidity)    = 0.971 bits
  Gain(Windy)       = 0.020 bits

→ Humidity is selected.
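
These gains can be checked with the same sketch, restricted to the examples that reach the sunny branch (again assuming the weather_data list and information_gain function from the earlier sketches):

```python
# Re-score the remaining attributes on the Outlook = sunny subset.
sunny = [(ex, label) for ex, label in weather_data if ex["Outlook"] == "sunny"]
for attribute in ["Temperature", "Humidity", "Windy"]:
    print(attribute, round(information_gain(sunny, attribute), 3))
# Temperature 0.571, Humidity 0.971, Windy 0.02
```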

Example (Ctd.)

Humidity is selected for the sunny branch.
• Its leaves (high → no, normal → yes) are pure → no further expansion necessary.
• The rain branch still requires further splitting.
Final decision tree

[Figure: the final decision tree, identical to the Decision Tree Example shown earlier: Outlook at the root, Humidity under the sunny branch, Windy under the rain branch.]
