
Decision Trees

What is a decision tree?


• The Decision Tree algorithm belongs to the family of supervised learning algorithms. Unlike many other supervised learning algorithms, decision trees can be used to solve both regression and classification problems.
• The goal of using a decision tree is to create a training model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data (the training data).
Terminology

1. Root Node: It represents the entire population or sample, and it further gets divided into two or more homogeneous sets.
2. Splitting: The process of dividing a node into two or more sub-nodes.
3. Decision Node: When a sub-node splits into further sub-nodes, it is called a decision node.
4. Leaf / Terminal Node: Nodes that do not split are called leaf or terminal nodes.
5. Pruning: Removing the sub-nodes of a decision node is called pruning; it is the opposite of splitting.
6. Branch / Sub-Tree: A subsection of the entire tree is called a branch or sub-tree.
7. Parent and Child Node: A node that is divided into sub-nodes is called the parent node of those sub-nodes, and the sub-nodes are the children of that parent node.
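To make the terminology concrete, here is a minimal sketch of how a tree node could be represented in Python. The class and field names are illustrative assumptions, not part of any particular library.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """One node of a decision tree (illustrative sketch)."""
    feature: Optional[str] = None      # attribute tested at a decision node
    threshold: Optional[float] = None  # split point for that attribute
    children: List["Node"] = field(default_factory=list)  # child sub-nodes
    prediction: Optional[str] = None   # class label if this is a leaf

    @property
    def is_leaf(self) -> bool:
        # A leaf / terminal node has no children and carries a prediction.
        return not self.children
```

In this sketch the root is simply the topmost Node, and pruning corresponds to replacing a decision node's children with a single leaf prediction.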
As an example, imagine that you are trying to decide whether or not you should go surfing; you might use a few simple decision rules to make that choice:
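A hypothetical rule set for this example, written as plain nested if/else; the specific features (wave height, wind direction) are illustrative assumptions rather than rules taken from the slides.

```python
def should_go_surfing(wave_height_ft: float, wind: str) -> bool:
    """Toy decision rules for the surfing example (assumed features)."""
    if wave_height_ft < 2:    # too flat to be worth the trip
        return False
    if wind == "onshore":     # onshore wind ruins the waves
        return False
    return True               # otherwise, go surf

# Example: 4 ft waves with offshore wind -> True
print(should_go_surfing(4, "offshore"))
```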
Decision tree learning employs a divide and conquer strategy by conducting a greedy search to
identify the optimal split points within a tree.

This process of splitting is then repeated in a top-down, recursive manner until all, or the majority of, records have been classified under specific class labels.

Whether or not all data points end up in homogeneous sets is largely dependent on the complexity of the decision tree.

Smaller trees are more easily able to attain pure leaf nodes, i.e. nodes whose data points all belong to a single class. However, as a tree grows in size, it becomes increasingly difficult to maintain this purity, and it usually results in too little data falling within a given subtree.

When this occurs, it is known as data fragmentation, and it can often lead to overfitting. As a result, decision trees have a preference for small trees.
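As a sketch of how tree size is controlled in practice (assuming scikit-learn is available), parameters such as max_depth and min_samples_leaf keep the tree small and help avoid the data fragmentation and overfitting described above; the iris dataset is used only as a convenient stand-in.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# A shallow tree: each leaf must hold enough samples to stay meaningful.
clf = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5, random_state=0)
clf.fit(X, y)

# Print the learned decision rules (root, decision nodes, leaves).
print(export_text(clf, feature_names=load_iris().feature_names))
```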
Types of Decision Trees

ID3
• Ross Quinlan is credited with the development of ID3, which is shorthand for "Iterative Dichotomiser 3."
• This algorithm leverages entropy and information gain as metrics to evaluate candidate splits.

C4.5
• This algorithm is considered a later iteration of ID3 and was also developed by Quinlan.
• It can use information gain or gain ratios to evaluate split points within the decision trees.

CART
• The term CART is an abbreviation for "classification and regression trees" and was introduced by Leo Breiman.
• This algorithm typically utilizes Gini impurity to identify the ideal attribute to split on.
• Gini impurity measures how often a randomly chosen data point would be misclassified if labeled according to the class distribution; when evaluating splits with Gini impurity, a lower value is more desirable.
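scikit-learn's tree implementation is CART-based; as a sketch, its criterion parameter switches the split metric between Gini impurity (the default) and entropy/information gain (the metric used by ID3 and C4.5). The iris dataset here is just a convenient placeholder.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Compare the two split criteria with 5-fold cross-validation.
for criterion in ("gini", "entropy"):
    clf = DecisionTreeClassifier(criterion=criterion, random_state=0)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{criterion:>8}: mean CV accuracy = {score:.3f}")
```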
How to choose the best attribute at each node

Two common metrics are used:
• Entropy and Information Gain
• Gini Impurity
Entropy and Information Gain

• Entropy is a concept that stems from information theory and measures the impurity of the sample values. It is defined by the following formula:

Entropy(S) = − Σ p(c) · log2 p(c), summed over the classes c in S

where:
• S represents the data set for which entropy is calculated
• c represents the classes in set S
• p(c) represents the proportion of data points that belong to class c out of the total number of data points in set S
Entropy Values
• Entropy values fall between 0 and 1 for a two-class problem (with more classes, the maximum is log2 of the number of classes).
• If all samples in data set S belong to one class, then entropy will equal zero.
• If half of the samples are classified as one class and the other half as another class, entropy is at its highest value of 1.
• In order to select the best feature to split on and find the optimal decision tree, the attribute that yields the smallest amount of entropy should be used.
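A minimal sketch of the entropy formula above, assuming only the Python standard library and a list of class labels; the function name is illustrative.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy(S) = -sum over classes c of p(c) * log2(p(c))."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((k / n) * log2(k / n) for k in counts.values())

print(entropy(["yes"] * 10))               # pure set        -> 0.0
print(entropy(["yes"] * 5 + ["no"] * 5))   # 50/50 two-class -> 1.0
print(entropy(["yes"] * 9 + ["no"] * 1))   # mostly pure     -> ~0.47
```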
Information gain

• Information gain represents the difference in entropy before and after a split on a given attribute.
• The attribute with the highest information gain will produce the best split as it’s doing the best job at
classifying the training data according to its target classification.

• a represents a specific attribute or class label


• Entropy(S) is the entropy of dataset, S
• |Sv|/ |S| represents the proportion of the values in S v to the number of values in dataset, S
• Entropy(Sv) is the entropy of dataset, Sv
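As a sketch, information gain for a split can be computed directly from the formula above. The helper below assumes the data is given as (attribute value, class label) pairs; the toy weather/play records are made up for illustration.

```python
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows):
    """rows: list of (attribute_value, class_label) pairs.
    Gain(S, a) = Entropy(S) - sum over v of (|Sv| / |S|) * Entropy(Sv)."""
    labels = [label for _, label in rows]
    groups = defaultdict(list)          # Sv: labels sharing attribute value v
    for value, label in rows:
        groups[value].append(label)
    weighted = sum(len(sv) / len(rows) * entropy(sv) for sv in groups.values())
    return entropy(labels) - weighted

# Splitting a toy "play / don't play" set on an outlook-like attribute.
data = [("sunny", "no"), ("sunny", "no"), ("overcast", "yes"),
        ("rain", "yes"), ("rain", "yes"), ("rain", "no"),
        ("overcast", "yes"), ("sunny", "no")]
print(information_gain(data))           # about 0.66 bits of gain
```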
Gini Impurity

• Gini impurity is the probability of incorrectly classifying a random data point in the dataset if it were labeled based on the class distribution of the dataset.
• Similar to entropy, if set S is pure (i.e., all points belong to one class), then its Gini impurity is zero.
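A minimal sketch of Gini impurity using only the standard library; the slide does not spell out the formula, so this uses the standard textbook definition of 1 minus the sum of squared class proportions.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini(S) = 1 - sum over classes c of p(c)^2; 0 for a pure set."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["yes"] * 10))              # pure set  -> 0.0
print(gini_impurity(["yes"] * 5 + ["no"] * 5))  # 50/50     -> 0.5
```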
Example
• https://www.saedsayad.com/decision_tree.htm
Entropy and Information Gain: Summary

• Entropy is the measure of uncertainty/randomness in the data; the more the randomness, the higher the entropy.
• The entropy of a dataset before and after a split is used to calculate information gain.
• Information gain uses entropy to make decisions: the lower the entropy after a split, the more information the split provides.
• Thus, the higher the information gain, the better the split, which also means the lower the resulting entropy.
• Information gain is used in decision trees and random forests to decide the best split.
• The effort is to reduce the entropy and maximize the information gain.
• The feature providing the most information is considered the most important by the algorithm and is used for training the model.
