
DECISION TREE

Mosharaaf Hossain
Submitted to:
Fozilatunnesa Masuma
AGENDA
Introduction
Primary goals
Areas of focus
Summary

INTRODUCTION TO DECISION TREES IN MACHINE LEARNING:
What is a Decision Tree?
A Decision Tree is a powerful and popular algorithm used for both classification and
regression tasks in machine learning. It is a supervised learning algorithm that works
by recursively partitioning the input space into regions and assigning a specific
prediction to each region. Decision Trees are particularly useful for their simplicity
and interpretability.

How Does a Decision Tree Work?


At a high level, the decision tree algorithm works by making a series of binary
decisions based on the input features. The tree is constructed in a hierarchical manner,
with each internal node representing a decision based on a specific feature, and each
leaf node representing the final output or prediction. The decision-making process
involves selecting the feature that provides the best split, effectively partitioning the
data into subsets.
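To make this concrete, here is a minimal sketch of fitting and inspecting a tree with scikit-learn (the library choice and the Iris dataset are assumptions for illustration). export_text prints the learned hierarchy of feature/threshold decisions described above.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small tree and print its hierarchy of feature/threshold decisions.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X_train, y_train)
print(export_text(clf))                 # internal nodes and leaf predictions
print("test accuracy:", clf.score(X_test, y_test))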

INTRODUCTION TO DECISION TREES IN MACHINE LEARNING:
Key Components of a Decision Tree:
1. Root Node: The topmost decision in the tree, representing the best feature on which to split the data initially.
2. Internal Nodes: Nodes in the middle of the tree that represent decisions based on specific features.
3. Leaf Nodes: The endpoints of the tree where the final prediction is made. Each leaf node corresponds to a class label in classification or a numerical value in regression.
4. Splitting Criteria: The decision tree algorithm uses a splitting criterion to determine the best feature and value at which to split the data at each internal node. Common criteria include Gini impurity for classification tasks and mean squared error for regression tasks (a short sketch follows this list).
5. Pruning: Decision trees are prone to overfitting, meaning they may capture noise in the training data. Pruning is a technique used to remove parts of the tree that do not provide significant predictive power, improving the model's generalization to new, unseen data.
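To make the splitting criterion concrete, here is a small sketch of Gini impurity in NumPy; the function name gini_impurity is illustrative. A 50/50 mix of two classes scores 0.5, while a pure node scores 0.

import numpy as np

def gini_impurity(labels):
    # Gini = 1 minus the sum of squared class proportions.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity(np.array([0, 0, 1, 1])))  # 0.5, maximally mixed two-class node
print(gini_impurity(np.array([0, 0, 0, 0])))  # 0.0, pure node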

ADVANTAGES OF DECISION
TREES:
1.Interpretability: Decision Trees are easy to understand and interpret, making them
suitable for explaining the reasoning behind a particular prediction.
2.No Feature Scaling Required: Unlike some algorithms, decision trees do not
require feature scaling, as they make decisions based on the relative ordering of
features.
3.Handle Non-Linearity: Decision Trees can capture complex relationships and non-
linear patterns in the data.
4.Versatility: Decision Trees can be applied to both classification and regression
tasks.

Challenges and Considerations:

1. Overfitting: Decision Trees are susceptible to overfitting, especially if the tree is deep and complex. Pruning and limiting the tree depth are common strategies to address this issue (a short sketch follows this list).
2. Instability: Small changes in the input data can lead to different tree structures, making decision trees somewhat unstable.
3. Biased Toward Dominant Classes: In classification tasks, Decision Trees may be biased toward classes with a large number of instances.
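As a sketch of the overfitting point (assuming scikit-learn and a synthetic dataset), the comparison below contrasts an unconstrained tree with a depth-limited one; the unconstrained tree typically fits the training set perfectly while generalizing worse.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)                  # no depth limit
shallow = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_train, y_train)  # pre-pruned

print("deep:    train %.2f, test %.2f" % (deep.score(X_train, y_train), deep.score(X_test, y_test)))
print("shallow: train %.2f, test %.2f" % (shallow.score(X_train, y_train), shallow.score(X_test, y_test)))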
PRIMARY GOALS
Decision tree

PRIMARY GOALS
The primary goals of decision trees in machine learning are to facilitate effective decision-making and prediction. Here are the key objectives associated with decision trees:
1. Classification: One of the primary goals of decision trees is to perform classification, assigning instances to classes or categories based on the features provided.
2. Regression: Decision trees can be used for regression tasks as well. Instead of predicting a class label, the tree predicts a continuous value, making decision trees versatile and applicable to both classification and regression problems.
3. Interpretability: Decision trees are designed to be interpretable. The structure of the tree is easy to understand, allowing users to follow the decision-making process and gain insight into the factors influencing predictions.
4. Feature Importance: Decision trees can help identify the most important features in a dataset. By examining the splits in the tree and the features used at each node, it becomes apparent which features contribute most significantly to the decision-making process (a short sketch follows this list).
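A small sketch of feature-importance inspection, assuming scikit-learn and the Iris dataset: feature_importances_ reports how much each feature contributed to the tree's splits.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

data = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# Importance scores sum to 1; higher means the feature drove more of the splits.
for name, importance in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {importance:.3f}")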

PRIMARY GOALS
5. Handling Non-Linearity: Decision trees are capable of capturing non-linear relationships in the data. They can model complex decision boundaries, making them effective in situations where the relationships between features and outcomes are not linear.
6. Handling Missing Data: Decision trees can handle datasets with missing values. If a particular feature value is missing during the decision-making process, the algorithm can still make decisions based on the available information.
7. Scalability: Decision trees can handle datasets with a large number of features and instances. While the computational cost can increase with the size of the dataset, decision trees are generally scalable and can handle a variety of data sizes.
8. No Requirement for Feature Scaling: Unlike some machine learning algorithms, decision trees do not require feature scaling. They make decisions based on the relative ordering of feature values, so the scale of individual features does not impact the algorithm's performance.
9. Ensemble Methods: Decision trees can be used as building blocks in ensemble methods such as Random Forests and Gradient Boosted Trees, which combine multiple decision trees to improve overall accuracy and generalization (a short sketch follows this list).
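A minimal sketch of the ensemble point, assuming scikit-learn and a synthetic dataset: both Random Forests and Gradient Boosted Trees are built from many decision trees.

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# Each ensemble aggregates many decision trees into one stronger model.
for model in (RandomForestClassifier(n_estimators=100, random_state=0),
              GradientBoostingClassifier(random_state=0)):
    scores = cross_val_score(model, X, y, cv=5)
    print(type(model).__name__, "mean CV accuracy:", round(scores.mean(), 3))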

EXAMPLES OF DECISION TREES IN NUMPY
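A minimal sketch of growing a classification tree in plain NumPy; all names here (gini, best_split, majority, build_tree) are illustrative, and the exhaustive threshold search favors clarity over efficiency. Each call to build_tree either returns a leaf (the majority class) or a dict recording the chosen feature, threshold, and the two subtrees.

import numpy as np

def gini(y):
    # 1 minus the sum of squared class proportions.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # Exhaustively search every feature and threshold for the lowest weighted Gini.
    best_feature, best_threshold, best_score = None, None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all():           # no split: every sample falls on one side
                continue
            score = (left.sum() * gini(y[left]) + (~left).sum() * gini(y[~left])) / len(y)
            if score < best_score:
                best_feature, best_threshold, best_score = j, t, score
    return best_feature, best_threshold

def majority(y):
    # Most frequent class label in this node.
    values, counts = np.unique(y, return_counts=True)
    return values[np.argmax(counts)]

def build_tree(X, y, depth=0, max_depth=3):
    # Stop at a pure node, the depth limit, or when no valid split exists.
    if depth == max_depth or len(np.unique(y)) == 1:
        return majority(y)
    j, t = best_split(X, y)
    if j is None:
        return majority(y)
    left = X[:, j] <= t
    return {"feature": j, "threshold": t,
            "left": build_tree(X[left], y[left], depth + 1, max_depth),
            "right": build_tree(X[~left], y[~left], depth + 1, max_depth)}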

CONTINUE …
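Continuing the sketch above: prediction walks from the root to a leaf, following the stored split at each internal node. The toy dataset (class 1 when both features exceed 0.5) is an assumption made just for this demo.

def predict_one(node, x):
    # Descend until a leaf (a plain class label) is reached.
    while isinstance(node, dict):
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node

# Tiny demo using the build_tree defined in the previous sketch.
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = ((X[:, 0] > 0.5) & (X[:, 1] > 0.5)).astype(int)  # class 1 in the upper-right quadrant

tree = build_tree(X, y)
predictions = np.array([predict_one(tree, x) for x in X])
print("training accuracy:", (predictions == y).mean())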

AREAS OF FOCUS
Decision trees in machine learning are versatile models used for both classification and regression tasks. The primary areas of focus for decision trees in machine learning are:

1. Splitting Criteria:
   1. Gini Impurity (for classification): Measures the likelihood of an incorrect classification.
   2. Entropy (for classification): Measures the level of disorder or uncertainty in a set of labels.
   3. Mean Squared Error (for regression): Measures the average squared difference between the predicted and actual values.
2. Node Splitting:
   1. Feature Selection: Identifying which feature to split on at each node based on the selected splitting criterion.
   2. Splitting Threshold: For numerical features, determining the threshold value at which to split.
3. Tree Pruning:
   1. Pre-pruning: Stopping the tree-building process early based on predefined criteria (e.g., maximum depth, minimum samples per leaf).
   2. Post-pruning (or pruning): Trimming branches of the tree after it has been built to reduce overfitting.
4. Handling Categorical Features:
   1. Binary Splitting: Some decision tree algorithms can only perform binary splits, so techniques are needed to handle categorical features (a short sketch follows this list).
5. Handling Missing Data:
   1. Imputation: Strategies for dealing with missing values in the dataset, such as surrogate splits or imputation techniques.
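As a sketch of the categorical-features point, one common technique is one-hot encoding before fitting the tree, so every split stays binary. The toy color data is an assumption for illustration, and the sparse_output parameter requires scikit-learn 1.2 or newer.

import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

colors = np.array([["red"], ["blue"], ["green"], ["blue"], ["red"], ["green"]])
labels = np.array([1, 0, 0, 0, 1, 0])

# One binary column per category, so each tree split remains a binary decision.
encoder = OneHotEncoder(sparse_output=False)
X = encoder.fit_transform(colors)

clf = DecisionTreeClassifier(random_state=0).fit(X, labels)
print(clf.predict(encoder.transform(np.array([["red"]]))))  # -> [1]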

SUMMARY
A decision tree in machine learning is a predictive model that maps
features to outcomes through a tree-like structure of decisions. The tree is
constructed by recursively partitioning the data based on feature values,
optimizing criteria such as Gini impurity or entropy for classification
tasks and mean squared error for regression. Each internal node represents
a decision, and each leaf node represents a predicted outcome. Decision
trees are versatile, interpretable, and can handle both categorical and
numerical data. They are used for classification and regression, and are often employed in ensemble methods like random forests or gradient boosting for improved accuracy and generalization. Pruning techniques help
control tree size and prevent overfitting, while feature importance analysis
reveals the significance of different features in the model's predictions.
THANK YOU
Mosharaaf Hossain
ID: 01223200
Course code: 061952203
MSc in CSE
