
Decision Tree For Classification (ID3 | Information Gain | Entropy)
A Decision Tree is a beginner-friendly machine learning algorithm that is easy to interpret and implement, and can be applied to both classification and regression tasks.
It is a non-parametric supervised machine learning algorithm which, when drawn on paper, looks like a hierarchical flowchart with a root node, branches, internal nodes, and leaf nodes, something like an upside-down tree.

💡 Non-parametric ML algorithms are models that do not make any
assumptions about the functional form or distribution of the underlying
data, unlike parametric ML algorithms, which have a fixed number of
parameters and assume a particular distribution or shape of the data.
This is why non-parametric models are flexible but computationally
heavier to fit.

Consider linear regression (a parametric supervised model): it will
always have a slope and an intercept as its parameters, which the model
uses to represent the relationships in the data.

A decision tree (a non-parametric supervised model), in contrast, does
not have a fixed number of parameters that it learns from the training
data. Instead, the number of parameters (splits and conditions) can grow
with the complexity of the data, which can be heavy on compute!
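To make the contrast concrete, here is a minimal sketch, assuming scikit-learn and NumPy are available; the synthetic sine dataset is made up purely for illustration:

```python
# A minimal sketch contrasting parameter counts of a parametric model
# (linear regression) with a non-parametric one (a decision tree).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# Parametric: linear regression always learns exactly one slope and one
# intercept here, no matter how many samples we feed it.
lin = LinearRegression().fit(X, y)
print("linear params:", lin.coef_.size + 1)  # slope + intercept = 2

# Non-parametric: the tree's "parameters" are its split conditions,
# and their number grows with the size/complexity of the data.
for n in (20, 200):
    tree = DecisionTreeRegressor().fit(X[:n], y[:n])
    print(f"tree nodes with {n} samples:", tree.tree_.node_count)
```

The linear model stays at two parameters regardless of sample size, while the tree's node count grows as more data arrives.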

Decision tree learning employs a recursive splitting strategy using greedy search
to identify the optimal split points within the tree. This process of splitting is
repeated from top to bottom until all records get classified under a specific class
label or value. The complexity of the tree decides how well a decision tree can
classify the data into homogeneous classes. A more complex tree can split the
data into smaller, more specific subsets/classes, but can cause overfitting, so
finding the right balance between tree depth and accuracy is important. To reduce
complexity and prevent overfitting, a training dataset of suitable size should be
used. One can also use pruning to overcome overfitting, which involves removing
the branches that split on features with low importance (which may happen since
decision trees assume all features to be important).
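As an illustration, here is a minimal sketch of pruning, assuming scikit-learn: its `ccp_alpha` parameter applies cost-complexity (post-)pruning, which collapses branches that contribute little to purity. The Iris dataset and the 0.02 value are arbitrary choices for the example:

```python
# A minimal sketch of controlling tree complexity via pruning.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# An unconstrained tree grows until its leaves are pure (risking overfit);
# a nonzero ccp_alpha prunes away low-importance branches after growing.
full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X_tr, y_tr)

print("full tree depth:", full.get_depth(), "test acc:", full.score(X_te, y_te))
print("pruned depth:", pruned.get_depth(), "test acc:", pruned.score(X_te, y_te))
```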

There are different types of decision tree models to choose from, based on their
learning and node-splitting techniques. ID3 (Iterative Dichotomiser 3), C4.5, CART
and Chi-Square are popular ones.

As node splitting is the key step in the decision tree algorithm, let's look at it in
detail. There are multiple ways to split a node, and they can be broadly divided into
two categories based on the type of target variable.

1. Continuous Target Variable:

Variance Reduction:

1. For each candidate split (the feature on which you want to try the split),
individually calculate the variance of each child node.

2. Calculate the variance of the split as the weighted average variance of the
child nodes (each child's variance multiplied by its share of the parent node's
records).

3. Select the split with the lowest variance.

4. Repeat steps 1-3 until completely homogeneous nodes are achieved (see the
sketch below).
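Here is a minimal sketch of steps 1-3 for a single feature, written with NumPy only; the helper names `split_variance` and `best_threshold` are illustrative, not from any library:

```python
# A minimal sketch of variance reduction for one feature.
import numpy as np

def split_variance(y_left, y_right):
    """Weighted average variance of the two child nodes."""
    n = len(y_left) + len(y_right)
    return (len(y_left) / n) * np.var(y_left) + (len(y_right) / n) * np.var(y_right)

def best_threshold(x, y):
    """Scan candidate thresholds on one feature, keep the lowest-variance split."""
    best_t, best_v = None, np.inf
    for t in np.unique(x)[:-1]:        # every unique value except the max
        mask = x <= t
        v = split_variance(y[mask], y[~mask])
        if v < best_v:
            best_t, best_v = t, v
    return best_t, best_v

x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 5.5, 5.2, 20.0, 21.0, 19.5])
print(best_threshold(x, y))   # splits cleanly between 3.0 and 10.0
```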

2. Categorical Target Variable: a) Information Gain, b) Gini Impurity


Information Gain:

1. For each candidate split (the feature on which you want to try the split),
individually calculate the entropy of each child node.

2. Calculate the entropy of the split as the weighted average entropy of the
child nodes (each child's entropy multiplied by its share of the parent node's
records).

3. Select the split with the lowest weighted entropy, i.e. the highest information
gain (information gain = entropy of the parent node minus the weighted average
entropy of the child nodes).

4. Repeat steps 1-3 until you achieve homogeneous nodes (see the sketch below).
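A minimal sketch of entropy and information gain, assuming NumPy; the function names are illustrative:

```python
# Entropy(S) = -sum_i p_i * log2(p_i), over the class proportions p_i.
# Information gain = entropy(parent) - weighted average entropy(children).
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children):
    """Parent entropy minus the weighted average entropy of the children."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

parent = np.array(["yes"] * 5 + ["no"] * 5)     # perfectly mixed: entropy = 1.0
left   = np.array(["yes"] * 5)                  # pure child: entropy = 0.0
right  = np.array(["no"] * 5)                   # pure child: entropy = 0.0
print(information_gain(parent, [left, right]))  # 1.0, a perfect split
```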

Gini Impurity:

1. Similar to what we did for information gain: for each candidate split,
individually calculate the Gini impurity of each child node.

2. Calculate the Gini impurity of the split as the weighted average Gini
impurity of the child nodes.

3. Select the split with the lowest weighted Gini impurity.

4. Repeat steps 1-3 until you achieve homogeneous nodes (see the sketch below).
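A minimal sketch of Gini impurity for one split, again assuming NumPy with illustrative function names:

```python
# Gini(S) = 1 - sum_i p_i^2, over the class proportions p_i.
import numpy as np

def gini(labels):
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_gini(children):
    """Weighted average Gini impurity of the child nodes."""
    n = sum(len(c) for c in children)
    return sum(len(c) / n * gini(c) for c in children)

mixed = np.array(["yes", "no", "yes", "no"])
print(gini(mixed))                              # 0.5, maximally impure
print(split_gini([np.array(["yes", "yes"]),
                  np.array(["no", "no"])]))     # 0.0, pure children
```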


