
Decision Tree

23MSD7027-Kunal Kaustav Nath
23MSD7038-T.Kiran Adithya
23MSD7054-B.Lakshmi Priya
23MSD7055-D.Suvarna
23MSD7062-B.Supriya
23MSD7061-Sreelakshmi T
Decision Tree

Introduction: What is Supervised & Unsupervised learning, What is a decision tree, Components of a tree
Classification: Entropy, Information Gain, Gini index
Regression
Pruning: Pre-pruning, Post-pruning
Conclusion: Advantages, Limitations
Introduction

Supervised Learning

 Supervised learning is a type of machine learning algorithm that learns from labeled data.
Labeled data is data that has been tagged with a correct answer or classification.

 In other words, we teach or train the machine using data that is well-labelled, meaning each
example is already tagged with the correct answer.

 After training, the machine is provided with a new set of examples (data); the supervised
learning algorithm analyses the training data (the set of training examples) and uses what it has
learned to produce the correct outcome for the new inputs.

 Examples: Classification and regression problems are common examples of supervised learning.
Unsupervised Learning
 In unsupervised learning, the algorithm is given unlabeled data and is tasked with finding
patterns, structures, or relationships within the data without explicit guidance. The learning
algorithm explores the inherent structure of the data without predefined output labels.

 Examples: Clustering, where the algorithm groups similar data points together based on shared
characteristics, and dimensionality reduction, where the algorithm reduces the number of
features while preserving relevant information.
What is Decision Tree

 A decision tree is a popular machine learning algorithm that is used for both classification and regression
tasks. It is a tree-like model where each internal node represents a decision based on a feature, each branch
represents the outcome of the decision, and each leaf node represents the final predicted output for a given
input.
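
As a quick, illustrative sketch (an addition to the slides, assuming scikit-learn is installed; the dataset and the max_depth value are arbitrary choices), this is roughly how a decision tree classifier is trained and evaluated in Python:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a small labeled dataset (features X, class labels y)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Each internal node of the fitted tree tests one feature;
# each leaf stores the predicted class for inputs that reach it.
clf = DecisionTreeClassifier(max_depth=3, random_state=42)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))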
Components of Decision Tree
A decision tree consists of several key components that together define its structure and behavior. Here are
the main components of a decision tree:

 Root Node:
- The topmost node in the tree, representing the initial decision or test based on a specific feature. It is the
starting point for the decision-making process.

 Internal Nodes:
- Nodes within the tree, excluding the leaf nodes. Each internal node corresponds to a decision based on a
specific feature and serves as a branching point in the tree.

 Terminal nodes/Leaf nodes:
- Nodes that do not have any children. Each leaf node represents the final predicted output (or class
in the case of classification) for a given input. The leaf nodes are the endpoints of the decision-making
process.

 Decision Node:
- In a decision tree, a decision node is a point where the tree makes a decision about the input data. It
represents a test or condition that is applied to the input features, and based on the outcome of that test, the
input is routed along one of the branches to the next node.
 Edges:
- Edges connect nodes in the tree and represent the outcome of a decision. Each edge leads from
one node to another and corresponds to a particular outcome of the decision associated with the parent
node.

 Parent and Child Nodes:
- In the context of a decision tree, an internal node is a parent node to its child nodes. Child nodes are
connected to their parent node through branches.

 Subtree:
- A subtree is a portion of the entire decision tree that is itself a valid decision tree. Internal nodes and
leaf nodes, along with their connecting branches, form subtrees within the larger tree.
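
To see these components concretely, here is a small illustrative sketch (an addition, again assuming scikit-learn) that prints a fitted tree as text; the first test shown is the root node, nested tests are internal/decision nodes, and the lines ending in "class: ..." are the leaf nodes:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Root node = first test printed; internal nodes = nested tests;
# leaf nodes = lines that end in "class: ...".
print(export_text(clf, feature_names=list(iris.feature_names)))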
ID3 VS CART
 ID3 (Iterative Dichotomiser 3) and CART (Classification and Regression Trees) are both algorithms
used in machine learning for building decision trees, but they have some differences in terms of their
approach and applications.

 Objective:

• ID3: Primarily designed for classification tasks. It builds a decision tree by recursively splitting the
dataset based on the most informative attribute at each step, aiming to create branches that result in
pure subsets.

• CART: Can be used for both classification and regression tasks. It constructs binary trees by
recursively splitting the dataset based on the feature that provides the best separation according to a
specified criterion.
 Categorical vs. Numeric Attributes:

• ID3: Primarily handles categorical attributes. It selects the attribute that maximizes information gain or
minimizes entropy.

• CART: Can handle both categorical and numeric attributes, and both categorical and numeric targets. It uses
different splitting criteria depending on the type of task (e.g., Gini impurity for classification and mean squared
error for regression).

 Splitting Criteria:

• ID3: Uses information gain and entropy as the criterion for selecting the best attribute to split the data.

• CART: Uses Gini impurity for classification tasks (to measure the node's impurity) and mean squared error for
regression tasks.
 Tree Structure:

• ID3: Tends to create deeper trees, which can be more prone to overfitting.

• CART: Typically results in shallower trees, and it employs pruning techniques to control overfitting.

 Handling Missing Values:

• ID3: Generally does not handle missing values well.

• CART: Can handle missing values by using surrogate splits.

 Output:

• ID3: Outputs categorical labels.

• CART: Can output both categorical labels and numeric values, making it versatile for classification
and regression tasks.
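
For orientation (an addition, not from the slides): scikit-learn's DecisionTreeClassifier implements an optimised CART-style algorithm, but its criterion parameter can be set to "entropy", which mimics the information-gain idea used by ID3. A minimal sketch, assuming scikit-learn:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# CART-style default: Gini impurity
cart_like = DecisionTreeClassifier(criterion="gini").fit(X, y)

# ID3-like criterion: entropy / information gain
id3_like = DecisionTreeClassifier(criterion="entropy").fit(X, y)

print(cart_like.get_depth(), id3_like.get_depth())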
Decision Tree

A decision tree can be used for two kinds of tasks: Classification and Regression.

Classification
 Classification in decision trees involves building a tree structure that helps classify instances or data
points into different classes or categories. The decision tree is constructed based on features or attributes
of the data, and each internal node of the tree represents a decision based on a specific attribute.

 If the output of the data is categorical (a class label), we use the classification technique to split the nodes of
the decision tree.
 Entropy

• Entropy is used for checking the impurity or uncertainty present in the data. It is used to evaluate the quality
of a split.

• Formula of Entropy:

• Let us assume that the resulting decision tree classifies instances into two categories, which we will call
positive and negative. Given a set S containing positive and negative examples, the entropy of S is

Entropy(S) = -p(+) log2 p(+) - p(-) log2 p(-)

where p(+) and p(-) are the proportions of positive and negative examples in S.
• The accompanying example table (not reproduced here) forecasts whether a match will be played or not
according to the weather conditions; a similar table appears in the PROBLEM slide below.

• Graph of Entropy (not reproduced here): for two classes, entropy is 0 when the set is pure and reaches its
maximum value of 1 when the two classes are equally mixed.
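
To make the entropy formula concrete, here is a small illustrative Python sketch (an addition to the slides; the labels are made up):

from collections import Counter
import math

def entropy(labels):
    # Entropy of a list of class labels, in bits
    n = len(labels)
    counts = Counter(labels)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())

print(entropy(["YES"] * 4 + ["NO"] * 4))  # 1.0 -> maximally impure two-class set
print(entropy(["YES"] * 5))               # 0.0 -> pure set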
 Gini Impurity

• Gini impurity is a measure used in decision tree algorithms to quantify a dataset's impurity level
or disorder.

• Formula for Gini Impurity:

Gini(S) = 1 - Σ p(i)^2

where p(i) is the proportion of examples in S belonging to class i.
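
A matching illustrative sketch for Gini impurity (again an addition, with made-up labels):

from collections import Counter

def gini(labels):
    # Gini impurity of a list of class labels
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

print(gini(["YES"] * 4 + ["NO"] * 4))  # 0.5 -> maximum for two classes
print(gini(["YES"] * 5))               # 0.0 -> pure set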


 Information Gain

• Information gain indicates how much information a particular feature or variable gives us about the final outcome.

• To minimize the depth of the decision tree when we traverse a path, we need to select the optimal attribute for
splitting each tree node, i.e. the attribute with the highest information gain:

Gain(S, A) = Entropy(S) - Σ_v ( |S_v| / |S| ) * Entropy(S_v)

where the sum runs over the values v of attribute A and S_v is the subset of S for which A = v.
Steps to make a Decision Tree
 Take the entire dataset as the input.

 Calculate the entropy of the target variable, as well as of the predictor attributes.

 Calculate the information gain of all attributes.

 Choose the attribute with the highest information gain as the root node.

 Repeat the same procedure on every branch until the decision node of each branch is
finalised.
PROBLEM
Instance Outlook Temperature Humidity Outcomes

1 Sunny Hot High NO

2 Sunny Hot High NO

3 Rain Hot High YES

4 Rain Cool Normal YES

5 Rain Cool Normal YES

6 Sunny Cool High NO

7 Sunny Hot High NO

8 Sunny Hot Normal YES
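
To connect this table to the steps above, here is an illustrative Python sketch (an addition, not part of the original slides) that computes the information gain of each attribute for these 8 instances; the attribute with the largest gain would be chosen as the root node:

from collections import Counter
import math

def entropy(labels):
    n = len(labels)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

# The 8 instances from the table: (Outlook, Temperature, Humidity, Outcome)
data = [
    ("Sunny", "Hot",  "High",   "NO"),
    ("Sunny", "Hot",  "High",   "NO"),
    ("Rain",  "Hot",  "High",   "YES"),
    ("Rain",  "Cool", "Normal", "YES"),
    ("Rain",  "Cool", "Normal", "YES"),
    ("Sunny", "Cool", "High",   "NO"),
    ("Sunny", "Hot",  "High",   "NO"),
    ("Sunny", "Hot",  "Normal", "YES"),
]
outcomes = [row[-1] for row in data]

def information_gain(attr_index):
    gain = entropy(outcomes)
    for value in {row[attr_index] for row in data}:
        subset = [row[-1] for row in data if row[attr_index] == value]
        gain -= len(subset) / len(data) * entropy(subset)
    return gain

for i, name in enumerate(["Outlook", "Temperature", "Humidity"]):
    print(name, round(information_gain(i), 3))
# Outlook and Humidity tie for the largest gain (about 0.549) on this small
# dataset, so either could serve as the root node; Temperature gains far less.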


Regression

 Regression refers to using a decision tree algorithm to predict a continuous numeric output rather
than class labels.

 While decision trees are commonly associated with classification tasks, they can also be adapted for
regression tasks.

 The process of building a decision tree for regression is similar to that for classification, but instead
of predicting discrete classes at the leaf nodes, it predicts continuous values.

 Continuous data is a type of quantitative data that can take any value within a given range.

 For example: Height, Weight, Time, Temperature.
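
As an illustrative sketch (an addition, assuming scikit-learn is available), a regression tree can be fitted to numeric targets in much the same way as a classification tree; the numbers below are taken from the Experience and Salary columns of the problem on the next slide, and max_depth is an arbitrary choice:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Years of experience -> salary (in thousands)
X = np.array([[2.0], [2.5], [3.0], [4.0], [4.5]])
y = np.array([40, 42, 52, 60, 56])

# The default split criterion for regression trees is squared error
reg = DecisionTreeRegressor(max_depth=2, random_state=0)
reg.fit(X, y)
print(reg.predict([[3.5]]))  # a continuous value, not a class label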
Regression Problem

Experience Gap Salary


2 Yes 40K
2.5 Yes 42K
3 No 52K
4 No 60K
4.5 Yes 56K
STEPS TO SOLVE A DECISION TREE REGRESSION

• First, take any one of the input attributes as a candidate root node.

• Split the output values and arrange them according to the condition of that root node.

• Next, calculate the Variance (or MSE/SSR) of the target values in the root node and in each split.

• Formula of Variance/MSE:

Variance = (1/n) * Σ (y_i - ȳ)^2   (SSR is the same sum of squares without dividing by n)

• Next, calculate the variance reduction for that candidate root node.

• Variance Reduction formula:

Variance Reduction = Var(parent) - Σ (n_i / n) * Var(child_i)

where the sum runs over the child nodes created by the split.

• Next, calculate the variance reduction for the other candidate root nodes in the same way.

• After calculating the variance reduction for all candidates, select as the main root node of the decision tree
the attribute whose split gives the largest variance reduction (a worked sketch follows this list).

• After selecting the root node, split the node into two branches according to the condition.

• This process continues until the decision tree has been built.
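
A small illustrative Python sketch (an addition to the slides) of the variance-reduction calculation for one candidate split of the salary data above, namely Experience <= 2.5 versus Experience > 2.5:

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

experience = [2.0, 2.5, 3.0, 4.0, 4.5]
salary = [40, 42, 52, 60, 56]          # in thousands

# Candidate split for the root node: Experience <= 2.5
left  = [s for x, s in zip(experience, salary) if x <= 2.5]
right = [s for x, s in zip(experience, salary) if x > 2.5]

parent_var = variance(salary)
weighted_child_var = (len(left) / len(salary)) * variance(left) \
                   + (len(right) / len(salary)) * variance(right)

print("Variance reduction:", parent_var - weighted_child_var)
# The candidate split with the largest reduction is the one chosen.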
 Key Differences from Classification Trees:

• Output at Leaf Nodes: Instead of class labels, regression trees output continuous values representing the
predicted outcome.

• Splitting Criteria: The decision criteria for selecting features and thresholds are typically based on
minimizing the variance or mean squared error of the target variable within the subsets created by the
splits.

• Evaluation Metrics: Common evaluation metrics for regression trees include mean squared error (MSE),
mean absolute error (MAE), or other measures of prediction accuracy.
PRUNING

 Pruning consists of a set of techniques that can be used to simplify a decision tree and enable it to
generalise better.

 Pruning Decision Trees falls into 2 general forms: Pre-Pruning and Post-Pruning.

 Post-Pruning:

• This technique is applied after the construction of the decision tree.

• It is used when the decision tree has grown to a very large depth and shows overfitting of the model.

• It is also known as backward pruning.

• It is applied to a fully grown decision tree.

• Here we cut back the branches of the grown tree using cost-complexity pruning, instead of fixing limits such
as max_depth and min_samples_split in advance (a scikit-learn sketch follows).
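
A minimal illustrative sketch of post-pruning with scikit-learn's cost-complexity pruning (assuming scikit-learn; the dataset and the choice of alpha are arbitrary):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Grow a full tree, then inspect the effective alphas along the pruning path
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

# Refit with a non-zero ccp_alpha: larger alpha -> more aggressive pruning
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]  # a mid-range alpha, chosen arbitrarily
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
print("Unpruned leaves:", full_tree.get_n_leaves(), "Pruned leaves:", pruned.get_n_leaves())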
 Pre-Pruning:

• This technique is applied before (i.e., during) the construction of the decision tree.

• Pre-pruning can be done using hyperparameter tuning (e.g., max_depth, min_samples_split).

• It helps overcome the overfitting issue (a sketch follows below).
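
An illustrative sketch of pre-pruning via hyperparameter tuning (again assuming scikit-learn; the parameter grid is just an example):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Stop the tree from growing too deep or splitting too eagerly
param_grid = {"max_depth": [2, 3, 4, 5], "min_samples_split": [2, 10, 20]}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print("Best pre-pruning parameters:", search.best_params_)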

 The benefits of pruning include:

• Improved Generalization: Pruning helps create a simpler and more generalizable tree, reducing the risk
of overfitting to the training data.

• Reduced Complexity: A pruned tree is often smaller and easier to interpret than an unpruned one,
making it more suitable for practical applications.

• Faster Prediction: Smaller trees typically lead to faster prediction times for new instances.

• Increased Robustness: Pruned trees are less sensitive to noise in the training data.
 Advantages of the Decision Tree

• It is simple to understand, as it follows the same process which a human follows while making any
decision in real life.

• It can be very useful for solving decision-related problems.

• It helps to think about all the possible outcomes for a problem.

• There is less requirement of data cleaning compared to other algorithms.

 Disadvantages of the Decision Tree

• The decision tree contains lots of layers, which makes it complex.

• It may have an overfitting issue, which can be resolved using the Random Forest algorithm.

• For more class labels, the computational complexity of the decision tree may increase.
 Limitations of the Decision Tree

1. Overfitting

2. Instability

3. Sensitivity to noise

4. Not suitable for regression with high-variance data

 Conclusion

Decision trees are powerful tools for classification and regression tasks, providing a clear and interpretable
way to make predictions. Their ability to handle both categorical and numerical data, along with their simplicity
and visual representation, makes them widely used in various fields. However, decision trees can be prone to
overfitting, and the choice of hyperparameters, such as tree depth, is crucial. Ensemble methods like
Random Forests and Gradient Boosting can enhance performance and mitigate overfitting. Ultimately, the
suitability of a decision tree depends on the specific characteristics of the dataset and the goals
of the analysis.
THANK YOU
