
ZETECH UNIVERSITY

DATA SCIENCE PROGRAMMING WITH PYTHON

Lesson 7: Supervised Methods (Decision Tree Algorithms)

What is a Decision Tree?


 A decision tree is one of the supervised machine learning algorithms. It can be
used for both regression and classification problems, but it is most often used
for classification.
 A decision tree follows a set of if-else conditions to split the data and
classify it according to those conditions.

Important terminology
Root Node: the topmost node of the tree. Its attribute is used for dividing the
data into two or more sets and is selected based on Attribute Selection
Techniques.
Branch or Sub-Tree: a part of the entire decision tree is called a branch or
sub-tree.

Splitting: Dividing a node into two or more sub-nodes based on if-else
conditions.
Decision Node: a sub-node that is split into further sub-nodes is called a
decision node.
Leaf or Terminal Node: This is the end of the decision tree where it
cannot be split into further sub-nodes.
Pruning: Removing a sub-node from the tree is called pruning.

Working of Decision Tree

 The root node feature is selected based on the results of the Attribute Selection
Measure (ASM).
 The ASM is applied repeatedly until a leaf (terminal) node is reached that cannot
be split into further sub-nodes.

What is an Attribute Selection Measure (ASM)?

An Attribute Selection Measure is a technique used in the data mining process for
data reduction. This reduction is necessary for better analysis and prediction of
the target variable.

The two main ASM techniques are:

1. Gini index
2. Information Gain (ID3)

Gini index
 The Gini index, or Gini impurity, measures the probability of a particular
variable being wrongly classified when it is chosen at random. A Gini index of 0
means the data at a node belongs entirely to one class, while higher values mean
the classes are more evenly mixed.

Gini = 1 − Σ (Pi)²

where Pi is the probability of an object being classified into a particular class.
When the Gini index is used as the criterion for selecting the root-node feature,
the feature with the least Gini index is selected.
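As an illustration (the function name gini_index and the example labels below are
made up for this sketch, not part of any library), the Gini index of a set of
class labels can be computed in Python as follows:

from collections import Counter

def gini_index(labels):
    """Gini impurity: 1 - sum(p_i^2) over the classes present in `labels`."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())

# A perfectly pure node versus an evenly mixed node
print(gini_index(["yes", "yes", "yes", "yes"]))   # 0.0  (pure)
print(gini_index(["yes", "yes", "no", "no"]))     # 0.5  (evenly mixed)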

2. Information Gain (ID3)
 Entropy is the main concept behind this algorithm.
o The measure that identifies the feature or attribute giving the maximum
information about a class is called information gain; the algorithm built
around it is the ID3 algorithm.
o By using this method, we can reduce the level of entropy from the
root node to the leaf nodes.

Algorithms used while training decision trees
ID3, C4.5, CART and Pruning

1. ID3: Ross Quinlan is credited with the development of ID3, which is shorthand
for “Iterative Dichotomiser 3.” This algorithm leverages entropy and information
gain as metrics to evaluate candidate splits. Quinlan's original research on the
algorithm was published in 1986.

2. C4.5: This algorithm is considered a later iteration of ID3, which was also
developed by Quinlan. It can use information gain or gain ratios to evaluate split
points within the decision trees.

3. CART: The term CART is an abbreviation for “classification and regression
trees” and was introduced by Leo Breiman. This algorithm typically utilizes Gini
impurity to identify the ideal attribute to split on. Gini impurity measures how
often a randomly chosen record would be misclassified. When evaluating using Gini
impurity, a lower value is more ideal.
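In Python, scikit-learn's DecisionTreeClassifier implements a CART-style learner,
and its criterion parameter switches between Gini impurity and entropy (the
measure behind ID3 and C4.5). A minimal sketch, assuming scikit-learn is installed
and using its built-in iris dataset:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# CART-style split selection with Gini impurity (the default criterion)
gini_tree = DecisionTreeClassifier(criterion="gini", random_state=42)
gini_tree.fit(X_train, y_train)

# Entropy / information gain, the measure used by ID3 and C4.5
entropy_tree = DecisionTreeClassifier(criterion="entropy", random_state=42)
entropy_tree.fit(X_train, y_train)

print("Gini accuracy:   ", gini_tree.score(X_test, y_test))
print("Entropy accuracy:", entropy_tree.score(X_test, y_test))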

Decision Tree - Classification

A decision tree builds classification or regression models in the form of a tree
structure.

It breaks down a dataset into smaller and smaller subsets while at the same time
an associated decision tree is incrementally developed. The final result is a tree
with decision nodes and leaf nodes.

A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast
and Rainy). A leaf node (e.g., Play) represents a classification or decision. The
topmost decision node in a tree, which corresponds to the best predictor, is
called the root node. Decision trees can handle both categorical and numerical data.
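Because scikit-learn trees expect numeric inputs, categorical attributes such as
Outlook are usually one-hot encoded first. The sketch below uses a tiny made-up
weather table; the column names and values are illustrative only:

import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# A tiny, made-up "play tennis" style dataset
data = pd.DataFrame({
    "Outlook": ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Overcast"],
    "Windy":   [False, True, False, False, True, True],
    "Play":    ["No", "No", "Yes", "Yes", "No", "Yes"],
})

# One-hot encode the categorical predictors; keep the target column as-is
X = pd.get_dummies(data[["Outlook", "Windy"]])
y = data["Play"]

tree = DecisionTreeClassifier(criterion="entropy").fit(X, y)

# Predict for a new day: Sunny and not windy
new_day = pd.DataFrame([{"Outlook": "Sunny", "Windy": False}])
new_day = pd.get_dummies(new_day).reindex(columns=X.columns, fill_value=0)
print(tree.predict(new_day))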

Algorithm

The core algorithm for building decision trees, called ID3, was developed by
J. R. Quinlan and employs a top-down, greedy search through the space of possible
branches with no backtracking. ID3 uses entropy and information gain to construct
a decision tree. In the ZeroR model there is no predictor; in the OneR model we
try to find the single best predictor; naive Bayes includes all predictors using
Bayes' rule and the independence assumption between predictors; a decision tree,
in contrast, includes all predictors with dependence assumptions between predictors.
Entropy
Entropy is the measure of uncertainty in the data. The effort is to reduce the
entropy and maximize the information gain.
o The feature carrying the most information is considered important by the
algorithm and is used for training the model.
o Information gain is computed from entropy, so by using information gain you
are in fact using entropy.
A decision tree is built top-down from a root node and involves partitioning the
data into subsets that contain instances with similar values (homogeneous).
The ID3 algorithm uses entropy to calculate the homogeneity of a sample.

o NB: If the sample is completely homogeneous the entropy is zero, and if the
sample is equally divided between classes it has an entropy of one.

To build a decision tree, we need to calculate two types of entropy using
frequency tables, as follows:
a) Entropy using the frequency table of one attribute (the target):
E(S) = − Σ p_i log2(p_i), where p_i is the proportion of class i in the sample.
b) Entropy using the frequency table of two attributes (the target and a
predictor): E(T, X) = Σ P(c) × E(c), the weighted sum of the entropies of the
branches produced by each value c of the predictor.
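As a sketch of the one-attribute case (the function name entropy and the sample
labels below are illustrative, not from the lesson), E(S) can be computed in
Python from the frequency table of the target:

import math
from collections import Counter

def entropy(labels):
    """E(S) = -sum(p_i * log2(p_i)) over the classes present in `labels`."""
    total = len(labels)
    counts = Counter(labels)              # frequency table of one attribute
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# e.g. 9 "Yes" and 5 "No" labels, as in the classic play-tennis example
play = ["Yes"] * 9 + ["No"] * 5
print(round(entropy(play), 3))            # about 0.940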

Information Gain
The information gain is based on the decrease in entropy after a dataset is split
on an attribute. Constructing a decision tree is all about finding the attribute
that returns the highest information gain (i.e., the most homogeneous branches).

Step 1: Calculate entropy of the target.

Step 2: The dataset is then split on the different attributes. The entropy for each
branch is calculated and added proportionally (weighted by branch size) to get the
total entropy for the split. The resulting entropy is subtracted from the entropy
before the split. The result is the information gain, or decrease in entropy.
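A minimal sketch of Steps 1 and 2 in plain Python; the helper names and the toy
data are illustrative, not taken from the lesson:

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(attribute, target):
    """Gain(T, X) = E(T) - sum over values v of X of P(v) * E(T | X = v)."""
    total = len(target)
    split_entropy = 0.0
    for value in set(attribute):
        branch = [t for a, t in zip(attribute, target) if a == value]
        split_entropy += (len(branch) / total) * entropy(branch)
    return entropy(target) - split_entropy

# Illustrative example: how much does splitting on Outlook reduce the entropy
# of Play?
outlook = ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Overcast"]
play    = ["No",    "No",    "Yes",      "Yes",   "No",    "Yes"]
print(round(information_gain(outlook, play), 3))   # 0.667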

Step 3: Choose the attribute with the largest information gain as the decision
node, divide the dataset by its branches and repeat the same process on every
branch.

Step 4a: A branch with entropy of 0 is a leaf node.

Step 4b: A branch with entropy more than 0 needs further splitting.

Step 5: The ID3 algorithm is run recursively on the non-leaf branches until all
data is classified.
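Putting Steps 3 to 5 together, the sketch below builds the tree recursively as
nested Python dictionaries. It is an illustrative toy version of ID3 (the function
names and the sample rows are made up), not a production implementation:

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain(rows, attr, target):
    total = len(rows)
    split_entropy = 0.0
    for value in set(row[attr] for row in rows):
        branch = [r[target] for r in rows if r[attr] == value]
        split_entropy += (len(branch) / total) * entropy(branch)
    return entropy([r[target] for r in rows]) - split_entropy

def id3(rows, attributes, target):
    labels = [r[target] for r in rows]
    # Step 4a: a branch with entropy 0 (only one class left) becomes a leaf;
    # if no attributes remain, fall back to the majority class
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 3: choose the attribute with the largest information gain
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]
    node = {best: {}}
    # Steps 4b and 5: split on each value and recurse on the non-leaf branches
    for value in set(r[best] for r in rows):
        branch = [r for r in rows if r[best] == value]
        node[best][value] = id3(branch, remaining, target)
    return node

# Illustrative toy data: each row is a dict of attribute values plus the target
rows = [
    {"Outlook": "Sunny",    "Windy": "No",  "Play": "No"},
    {"Outlook": "Sunny",    "Windy": "Yes", "Play": "No"},
    {"Outlook": "Overcast", "Windy": "No",  "Play": "Yes"},
    {"Outlook": "Rainy",    "Windy": "No",  "Play": "Yes"},
    {"Outlook": "Rainy",    "Windy": "Yes", "Play": "No"},
]
print(id3(rows, ["Outlook", "Windy"], "Play"))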

Decision Tree to Decision Rules

A decision tree can easily be transformed into a set of rules by mapping each path
from the root node to a leaf node, one by one.
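With scikit-learn, a fitted tree can be printed as nested if-else rules using
export_text, which traces each root-to-leaf path. A small sketch on the built-in
iris dataset:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(criterion="entropy", max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Each root-to-leaf path is printed as one readable rule
print(export_text(tree, feature_names=list(iris.feature_names)))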

To understand the concepts discussed above, watch the video linked below.

Decision Trees ID3 Solved (video):

http://youtube.com/watch?v=pRaKQC_DKLM

