
Decision Tree Classification

A decision tree builds classification or regression models in the form of a tree structure. It breaks
down a dataset into smaller and smaller subsets while, at the same time, an associated decision
tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.

A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy).

A leaf node (e.g., Play) represents a classification or decision.

The topmost decision node in the tree, which corresponds to the best predictor, is called the root node.
Decision trees can handle both categorical and numerical data.

Information theory

Information theory was introduced by Claude Shannon. It quantifies information through entropy, a key
measure that is usually expressed as the average number of bits needed to store or communicate one
symbol in a message.

Information theory measures information in bits; for example, the outcome of a fair coin toss carries one bit of information.

ID3 Algorithm:

Based on a greedy search through the space of all possible branches, with no backtracking

Includes all predictors

Entropy

A decision tree is built top-down from a root node and involves partitioning the data into
subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to
calculate the homogeneity of a sample. If the sample is completely homogeneous the entropy is
zero, and if the sample is equally divided it has an entropy of one.

entropy(p1, p2, …, pn) = −p1 log2(p1) − p2 log2(p2) − ⋯ − pn log2(pn)
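
As a minimal sketch (assuming log base 2, as in all of the calculations below), this formula can be written in Python as follows:

from math import log2

def entropy(probabilities):
    # Entropy in bits of a class distribution given as probabilities summing to 1.
    # Terms with p = 0 or p = 1 contribute nothing, so they are skipped.
    return -sum(p * log2(p) for p in probabilities if 0 < p < 1)

print(entropy([0.5, 0.5]))   # 1.0 bit  (equally divided sample)
print(entropy([1.0]))        # 0 bits   (completely homogeneous sample)
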
To build a decision tree, we need to calculate two types of entropy using frequency tables as
follows:

a) Entropy using the frequency table of one attribute:


b) Entropy using the frequency table of two attributes:

Information Gain
The information gain is based on the decrease in entropy after a dataset is split on an attribute.
Constructing a decision tree is all about finding the attribute that returns the highest information
gain (i.e., the most homogeneous branches).

Step 1: Calculate entropy of the target.


Step 2: The dataset is then split on the different attributes. The entropy for each branch is
calculated and added proportionally to get the total entropy for the split. The resulting
entropy is subtracted from the entropy before the split. The result is the Information Gain, or
decrease in entropy.

Step 3: Choose the attribute with the largest information gain as the decision node, divide the
dataset by its branches and repeat the same process on every branch.
Step 4a: A branch with entropy of 0 is a leaf node.

Step 4b: A branch with entropy more than 0 needs further splitting.

Step 5: The ID3 algorithm is run recursively on the non-leaf branches, until all data is classified.
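
A rough Python sketch of Steps 1-3, assuming each record is a dictionary keyed by attribute names (e.g., a hypothetical 'Outlook' key) and carrying the class label under a target key such as 'Play':

from collections import Counter
from math import log2

def entropy_of(labels):
    # Entropy in bits of a list of class labels
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(records, attribute, target):
    # Entropy of the target before the split
    before = entropy_of([r[target] for r in records])
    # Weighted (proportional) entropy of the branches after splitting on the attribute
    after = 0.0
    for value in set(r[attribute] for r in records):
        branch = [r[target] for r in records if r[attribute] == value]
        after += (len(branch) / len(records)) * entropy_of(branch)
    return before - after

For the weather data worked out below, information_gain(records, 'Outlook', 'Play') should come out to roughly 0.247, and the decision node is then the attribute that maximizes this value.
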
Decision Tree to Decision Rules
A decision tree can easily be transformed into a set of rules by mapping the paths from the root node to
the leaf nodes one by one.
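
For instance, the tree derived in the calculations below (Outlook at the root, then Windy on the Rainy branch and Humidity on the Sunny branch) maps to rules such as:

IF Outlook = Overcast THEN Play = Yes
IF Outlook = Rainy AND Windy = False THEN Play = Yes
IF Outlook = Rainy AND Windy = True THEN Play = No
IF Outlook = Sunny AND Humidity = High THEN Play = No
IF Outlook = Sunny AND Humidity = Normal THEN Play = Yes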

Detailed Explanation about Calculations

Overall Entropy

E(9,5) = E(9/14, 5/14) = -[(9/14) log2 (9/14) + (5/14) log2 (5/14)] = 0.940

Expected new entropy for each attribute

Outlook

The Outlook attribute contains 3 distinct values: Overcast, Rainy, Sunny

Overcast: 4 records; 4 are “Yes”


E(4,0) = -[(4/4) log2 (4/4)] = 0

Rainy: 5 Records; 3 are “Yes”, 2 are “No”

E(3,2) = E(0.6, 0.4)

= -[(0.6) log2 (0.6) + (0.4) log2 (0.4)]

= -[(0.6) (-0.7369) + (0.4) (-1.3219)]

= 0.971

Sunny: 5 records; 2 are “Yes”, 3 are “No”

E(2,3) = E(0.4, 0.6)

= -[(0.4) log2 (0.4) + (0.6) log2 (0.6) ]

= 0.971

Thus, Expected New Entropy =

(4/14) * 0 + (5/14) * 0.971 + (5/14) * 0.971

= 0.693

Gain (Outlook): 0.940 – 0.693 = 0.247
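
As a quick check of this arithmetic, a small Python sketch using only the class counts above (9 “Yes” and 5 “No” overall, and 4/0, 3/2, 2/3 per Outlook value):

from math import log2

def entropy_counts(*counts):
    # Entropy in bits computed from raw class counts
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

overall = entropy_counts(9, 5)                      # about 0.940
outlook = (4/14) * entropy_counts(4, 0) \
        + (5/14) * entropy_counts(3, 2) \
        + (5/14) * entropy_counts(2, 3)             # about 0.693
print(round(overall - outlook, 3))                  # 0.247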

Temp

The Temp attribute contains 3 distinct values: Hot, Mild, Cool

Hot: 4 records; 2 are “Yes”, 2 are “No”

E(2,2) = E(0.5, 0.5) = -[(0.5) log2 (0.5) + (0.5) log2 (0.5)]

= 1.0 {substituting log2 (0.5) = -1}

Mild: 6 records; 4 are “Yes”, 2 are “No”

E(4,2) = E(0.666, 0.333) = -[(0.666) log2 (0.666) + (0.333) log2 (0.333)]

= -[(0.666) (-0.586) + (0.333) (-1.586)] = 0.918

Cool: 4 records; 3 are “Yes”, 1 is “No”

E(3,1) = E(0.75, 0.25)= -[(0.75) log2 (0.75) + (0.25) log2 (0.25)]

= -[(0.75) (-0.4150) + (0.25) (-2)]


= 0.81125

Thus, Expected New Entropy =

(4/14) * (1) + (6/14) * 0.918 + (4/14) * 0.81125

= 0.9109

Gain (Temp): 0.940 – 0.9109 = 0.029

Similarly, you can compute

Gain (Humidity) = .152

Gain (Windy) = .048

The gain from splitting at Outlook is the largest, so we take Outlook as the decision node and divide the dataset by
its branches.

Repeat the same steps for the Rainy and Sunny branches (the Overcast branch has entropy 0, so it becomes a leaf node)

(Calculation sheets circulated)

Rainy

Overall Entropy

E(Rainy): Yes - 3; No - 2

E(3,2) = E(0.6, 0.4) = -[(0.6) log2 (0.6) + (0.4) log2 (0.4)] = -[(0.6) (-0.737) + (0.4) (-1.322)] = 0.971

Now compute for temp, humidity and windy

Temp

2 distinct values: mild, cool


Mild: yes: 2; no: 1

E(2,1) = E(0.67, 0.33) = -[(0.67) log2 (0.67) + (0.33) log2 (0.33)] = -[(0.67) (-0.577) + (0.33) (-1.599)] = 0.914

Cool: yes: 1; no: 1

E(1,1) = E(0.5, 0.5) = -[(0.5) log2 (0.5) + (0.5) log2 (0.5)] = -[(0.5) (-1) + (0.5) (-1)] = 1.0

Expected New Entropy for Temp= (3/5) (0.914) + (2/5)(1.0)= 0.9484

Gain (Temp) = 0.971 – 0.9484 = .0226

Humidity

2 distinct values: high, normal

high: yes: 1; no: 1

E(1,1) = E(0.5, 0.5) = 1.0

normal: yes: 2; no: 1

E(2,1) = E(0.67, 0.33) = 0.914

Expected New Entropy for humidity= (2/5) (1.0) + (3/5) (0.914)= 0.9484

Gain (humidity) = 0.971 – 0.9484 = .0226

Windy

2 distinct values: false, true

false: yes: 3; no: 0

E(3,0) = -[(3/3) log2 (3/3)] = 0

true: yes: 0; no: 2

E(0,2) = -[(2/2) log2 (2/2)] = 0

Expected New Entropy for Windy = (3/5) (0) + (2/5) (0) = 0

Gain (windy) = 0.971 – 0 = .971 (Maximum)

Thus, split at Windy

Sunny

Overall Entropy
E(Sunny): Yes - 2; No - 3

E(2,3) = E(0.4, 0.6) = 0.971

Now compute expected new entropy for temp, humidity and windy

Temp

3 distinct values: hot, mild, cool

Hot: yes: 0; no: 2

E(0,2) = 0

Mild: yes: 1; no: 1

E(1,1) = 1

Cool: yes: 1; no: 0

E(1,0) = 0

Expected New Entropy for Temp= (2/5) (0) + (2/5)(1.0) + (1/5)(0)= 0.40

Gain (Temp) = 0.971 – 0.400 = .571

Humidity

2 distinct values: high, normal

high: yes: 0; no: 3

E(0,3) = 0

normal: yes: 2; no: 0

E(2,0) = 0

Expected New Entropy for humidity= (3/5) (0) + (2/5) (0)= 0

Gain (humidity) = 0.971 – 0 = .971 (Maximum)

Windy

2 distinct values: false, true

false: yes: 1; no: 2

E(1,2) = 0.914

true: yes: 1; no: 1

E(1,1) = 1

Expected New Entropy for Windy = (3/5) (0.914) + (2/5) (1) = 0.9484

Gain (windy) = 0.971 – 0.9484= .0226

Thus, split at Humidity
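
Putting the three splits together, the finished tree can be written down and applied to a new record. A minimal sketch in Python (the nested-dictionary encoding and the classify helper are illustrative conventions, not part of ID3 itself):

# Tree derived above: Outlook at the root, then Windy (Rainy branch) and Humidity (Sunny branch)
tree = {
    "Outlook": {
        "Overcast": "Yes",
        "Rainy": {"Windy": {"False": "Yes", "True": "No"}},
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
    }
}

def classify(node, record):
    # Walk down the tree until a leaf (a plain class label) is reached
    while isinstance(node, dict):
        attribute = next(iter(node))
        node = node[attribute][record[attribute]]
    return node

print(classify(tree, {"Outlook": "Sunny", "Humidity": "High", "Windy": "False"}))  # No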
