• Node - a feature (attribute)
• Branch - a decision (rule)
• Leaf - an outcome (categorical or continuous)
There are many algorithms for building decision trees; here we are going to discuss the ID3 algorithm with an example.
ID3 is a classification algorithm that follows a greedy approach, selecting at each step the attribute that yields the maximum Information Gain (IG) or, equivalently, the minimum Entropy (H).
H(S) = \sum_{c \in C} -p(c) \log_2 p(c)
Where,
• S - The current dataset for which entropy is being calculated (it changes at every iteration of the ID3 algorithm).
• C - The set of classes in S (for example, C = {yes, no}).
• p(c) - The proportion of the number of elements in class c to the number of elements in set S.
In ID3, entropy is calculated for each remaining attribute. The attribute with the smallest
entropy is used to split the set S on that particular iteration.
An entropy of 0 implies a pure class, i.e., all elements belong to the same category.
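As a quick illustration of this formula, here is a minimal Python sketch (the function name and sample labels are my own, not from the notes):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(S) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

print(entropy(["yes", "yes", "yes"]))       # 0.0 -> a pure class
print(entropy(["yes", "no", "yes", "no"]))  # 1.0 -> maximum uncertainty for two classes
```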
Information Gain IG(A) tells us how much uncertainty in S was reduced after splitting set S on attribute A. The mathematical representation of information gain is:
IG(A, S) = H(S) - \sum_{t \in T} p(t) H(t)
Where,
S = \bigcup_{t \in T} t, i.e., T is the set of subsets created by splitting S on attribute A.
• p(t) - The proportion of the number of elements in t to the number of elements in set
S.
• H(t) - Entropy of subset t.
In ID3, information gain can be calculated (instead of entropy) for each remaining attribute.
The attribute with the largest information gain is used to split the set S on that particular
iteration.
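As a rough illustration of this computation (a sketch only, assuming categorical attributes; the helper names and toy data are my own):

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr_index):
    """IG(A, S) = H(S) - sum over subsets t of p(t) * H(t), splitting on one attribute."""
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr_index], []).append(label)
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Toy example: a single attribute that perfectly separates the two classes.
rows = [["a"], ["a"], ["b"], ["b"]]
print(information_gain(rows, ["yes", "yes", "no", "no"], 0))  # 1.0
```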
We'll discuss it here mathematically and later see its implementation in Python.
So, let's take an example to make it clearer.
Here, the dataset has two classes (yes and no): 9 out of 14 examples are "yes" and 5 out of 14 are "no".
The complete entropy of the dataset is:
H(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940
Again, I recommend that you perform the further calculations yourself and then cross-check them by scrolling down...
Now, finding the best attribute for splitting the data with Outlook = Sunny (dataset rows = [1, 2, 8, 9, 11]).
Here, when Outlook = Sunny and Humidity = High, it is a pure class of category "no", and when Outlook = Sunny and Humidity = Normal, it is again a pure class of category "yes". Therefore, we don't need to do any further calculations.
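As a quick numeric check, the sketch below recomputes this split. The Humidity and Play values for these five rows are the usual Play Tennis ones and are an assumption here, since the raw table is not reproduced in the text:

```python
import math

# Sunny subset (rows 1, 2, 8, 9, 11): (Humidity, Play) pairs, values assumed.
sunny = [("High", "no"), ("High", "no"), ("High", "no"),
         ("Normal", "yes"), ("Normal", "yes")]

def entropy(labels):
    total = len(labels)
    return -sum(labels.count(c) / total * math.log2(labels.count(c) / total)
                for c in set(labels))

base = entropy([label for _, label in sunny])     # 0.971 for 3 "no" / 2 "yes"
for value in ("High", "Normal"):
    subset = [label for h, label in sunny if h == value]
    print(value, entropy(subset))                 # both subsets are pure: entropy 0
print("IG(Humidity | Sunny) =", round(base, 3))   # 0.971 - 0 = 0.971
```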
Now, finding the best attribute for splitting the data with Outlook = Rain (dataset rows = [4, 5, 6, 10, 14]).
Here, when Outlook = Rain and Wind = Strong, it is a pure class of category "no", and when Outlook = Rain and Wind = Weak, it is again a pure class of category "yes".
And this is our final desired tree for the given dataset.
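Since the notes promise a Python implementation, here is a minimal, self-contained sketch of the recursive ID3 procedure. It assumes each example is a dict of categorical attribute values; the function names and the tiny toy dataset at the end are my own, not from the original text:

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attr):
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    weighted = sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

def id3(rows, labels, attributes):
    """Return a nested-dict decision tree; leaves are class labels."""
    if len(set(labels)) == 1:          # pure node -> leaf
        return labels[0]
    if not attributes:                 # no attributes left -> majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [a for a in attributes if a != best])
    return tree

# Tiny illustrative run (attribute values assumed, not from the original text):
rows = [{"Outlook": "Sunny", "Wind": "Weak"},
        {"Outlook": "Sunny", "Wind": "Strong"},
        {"Outlook": "Rain",  "Wind": "Weak"}]
print(id3(rows, ["no", "no", "yes"], ["Outlook", "Wind"]))
# -> {'Outlook': {'Sunny': 'no', 'Rain': 'yes'}}
```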
Example 2: Build a decision tree using the ID3 algorithm to determine whether a person will buy a computer or not, based on their age and income.
Solution:
To build a decision tree using the ID3 algorithm, we'll follow these steps:
Step 1: Calculate the entropy of the target variable (Buys Computer) for the entire dataset.
Step 2: Calculate the information gain for each feature (Age and Income).
Step 3: Select the feature with the highest information gain as the root node.
Step 4: Create branches for each unique value of the selected feature and repeat the process for each branch.
Step 5: Continue expanding the tree until all leaves are pure (i.e., all examples in a leaf node belong to the same class) or a stopping criterion is met.
Step 1: Calculate the entropy of the target variable (Buys Computer) for the entire dataset.
Where:
• Number of Yes = 8
• Number of No = 4
Entropy(S) = -8/12 * log2(8/12) - 4/12 * log2(4/12) ≈ 0.918
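A throwaway Python check of this value (not part of the original solution):

```python
import math
# 8 "yes" and 4 "no" out of 12 examples.
p_yes, p_no = 8 / 12, 4 / 12
print(round(-p_yes * math.log2(p_yes) - p_no * math.log2(p_no), 3))  # 0.918
```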
Step 2: Calculate the information gain for each feature (Age and Income).
To calculate the weighted average entropy for Age, you need to consider each unique Age
value (Young, Middle, Old) and calculate the entropy for each subset of data.
• For Young:
o Number of Yes = 2
o Number of No = 2
o Entropy(Age=Young) = -2/4 * log2(2/4) - 2/4 * log2(2/4) = 1.0
• For Middle:
o Number of Yes = 4
o Number of No = 2
o Entropy(Age=Middle) = -4/6 * log2(4/6) - 2/6 * log2(2/6) ≈ 0.918
• For Old:
o Number of Yes = 2
o Number of No = 0
o Entropy(Age=Old) = -2/2 * log2(2/2) - 0 = 0.0
Weighted Average Entropy(Age) = (4/12) * 1.0 + (6/12) * 0.918 + (2/12) * 0.0 ≈ 0.792
Weighted Average Entropy(Income) = (5/12) * 0.971 + (4/12) * 0.0 + (3/12) * 0.918 ≈ 0.634
So IG(Age) = 0.918 - 0.792 ≈ 0.126 and IG(Income) = 0.918 - 0.634 ≈ 0.284. Since Income has the higher information gain, it will be our root node.
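A small sketch verifying these numbers (the Income subset entropies are taken as stated above, since the raw counts for Income are not shown; the helper name is my own):

```python
import math

def entropy_counts(yes, no):
    total = yes + no
    return -sum(n / total * math.log2(n / total) for n in (yes, no) if n)

h_s = entropy_counts(8, 4)                                   # 0.918
h_age = (4/12) * entropy_counts(2, 2) + (6/12) * entropy_counts(4, 2) \
        + (2/12) * entropy_counts(2, 0)                      # 0.792
h_income = (5/12) * 0.971 + (4/12) * 0.0 + (3/12) * 0.918    # 0.634
print(round(h_s - h_age, 3), round(h_s - h_income, 3))       # gains: 0.126 0.284
```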
• Within the Low Income branch, three examples result in "Yes" and two result in "No." Since there is still some impurity, further splitting is needed.
Step 4: Calculate Information Gain for the Age attribute within the Low Income subset.
Weighted Average Entropy(Age) for Low Income = (3/5) * 0.918 + (2/5) * 0.0 + (0/5) * 0.0
≈ 0.551
Age is therefore the best attribute to split on within the Low Income subset.
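A rough check of the gain within the Low Income branch, reusing the entropies stated above:

```python
import math
# Low Income branch: 3 "yes" and 2 "no" examples.
h_low = -(3/5) * math.log2(3/5) - (2/5) * math.log2(2/5)   # about 0.971
h_age_low = (3/5) * 0.918 + (2/5) * 0.0 + (0/5) * 0.0      # about 0.551
print(round(h_low - h_age_low, 3))                         # gain of Age here, about 0.420
```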
Step 5: Continue building the tree. In this case, further splitting based on the Age attribute
within the Low Income subset provides the highest information gain. Continue this process
for the remaining branches, considering the values of Income and Age.