Title: Decision tree
Aim: Implement and evaluate ID3/C4.5 algorithm on given dataset.
Outcomes: At the end of the experiment the student should be able to:
1. Understand ID3/C4.5 algorithm.
2. Implement and evaluate ID3/C4.5 algorithm on given dataset.
Contents:
Decision Tree is one of the most powerful and popular machine learning algorithms. The decision-tree algorithm falls under the category of supervised learning algorithms, and it works for both continuous and categorical output variables.
Decision Tree:
A Decision tree is a tree-like structure that represents a set of decisions and their
possible consequences. Each node in the tree represents a decision, and each branch represents
an outcome of that decision. The leaves of the tree represent the final decisions or predictions.
Decision trees are created by recursively partitioning the data into smaller and smaller
subsets. At each partition, the data is split based on a specific feature, and the split is made in
a way that maximizes the information gain.
In the figure above, the decision tree is a flowchart-like tree structure that is used to make decisions. It consists of a root node (WINDY) and internal nodes (OUTLOOK, TEMPERATURE), which represent tests on attributes, and leaf nodes, which represent the final decisions. The branches of the tree represent the possible outcomes of the tests.
Components of a Decision Tree:
Root Node: It is the topmost node in the tree, which represents the complete
dataset. It is the starting point of the decision-making process.
Internal Node: A node that symbolizes a choice regarding an input feature.
Branching off of internal nodes connects them to leaf nodes or other internal
nodes.
Leaf/Terminal Node: A node without any child nodes that indicates a class
label or a numerical value.
Parent Node: The node that divides into one or more child nodes.
Child Node: The nodes that emerge when a parent node is split.
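The components listed above can be sketched as a minimal Python class (a hypothetical `Node` structure for illustration, not the full lab implementation; the WINDY/OUTLOOK values follow the figure described earlier):

```python
class Node:
    """A single node in a decision tree."""
    def __init__(self, feature=None, label=None):
        self.feature = feature   # attribute tested at the root or an internal node
        self.label = label       # class label held by a leaf/terminal node
        self.children = {}       # branch value -> child Node

    def is_leaf(self):
        return self.label is not None

# A fragment of the weather tree from the figure:
root = Node(feature="WINDY")                  # root node (parent of the nodes below)
sunny = Node(feature="OUTLOOK")               # internal node, child of root
root.children["False"] = sunny
root.children["True"] = Node(label="No")      # leaf node
sunny.children["Sunny"] = Node(label="Yes")   # leaf node
```

Here each branch value (e.g. "True"/"False" for WINDY) keys into the `children` dictionary, mirroring how a branch represents one outcome of the test at that node.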
Gini Impurity: Gini impurity, often used in algorithms like CART (Classification and Regression Trees), measures the probability of misclassifying a randomly chosen element if it were randomly labelled according to the class distribution in the dataset. Gini impurity is computationally efficient and works well for binary splits. Mathematically, Gini impurity for a dataset D is calculated as:

Gini(D) = 1 − Σ_i (p_i)^2

where p_i represents the proportion of data points belonging to class i in dataset D. Lower Gini impurity values indicate a purer dataset.

Information Gain: Information gain quantifies the reduction in entropy achieved by splitting the data based on a particular feature. Features with higher information gain are preferred for splitting. Writing the entropy of dataset D as H(D) = − Σ_i p_i log2(p_i), the information gain for a split on feature A is calculated as follows:

Gain(D, A) = H(D) − Σ_v (|D_v| / |D|) · H(D_v)

where V is the number of values (unique outcomes) the feature can take, D_v represents the subset of data points for which the feature has the v-th value, and |D| denotes the total number of data points in dataset D.
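These split criteria can be computed directly from class counts. The sketch below is a minimal Python version (the function names and toy data are illustrative, not part of the given dataset):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(D) = -sum(p_i * log2(p_i)) over the class proportions in labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini(D) = 1 - sum(p_i^2) over the class proportions in labels."""
    n = len(labels)
    return 1 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Gain(D, A) = H(D) - sum(|D_v|/|D| * H(D_v)) over the values v of feature A."""
    gain = entropy(labels)
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain
```

For a 50/50 class split, entropy is 1.0 bit and Gini impurity is 0.5; a feature that separates the classes perfectly recovers the full entropy as information gain.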
The ID3 algorithm recursively divides the dataset at every decision tree node according to the chosen attribute. This process continues until either there are no more attributes to divide on, or all the examples in a node belong to the same class.
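The recursion described above can be sketched as follows (a compact illustration with its own entropy/gain helpers, not the full evaluated implementation; trees are returned as nested dicts rather than node objects):

```python
from collections import Counter
from math import log2

def entropy(labels):
    # H(D) = -sum(p_i * log2(p_i))
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    # Gain(D, A) = H(D) - sum(|D_v|/|D| * H(D_v))
    gain = entropy(labels)
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain

def id3(rows, labels, features):
    """Return a nested dict {feature: {value: subtree}} or a class label."""
    if len(set(labels)) == 1:        # all examples in this node share one class -> leaf
        return labels[0]
    if not features:                 # no attributes left to divide on -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(features, key=lambda f: information_gain(rows, labels, f))
    tree = {best: {}}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx],
                                [labels[i] for i in idx],
                                [f for f in features if f != best])
    return tree
```

On a toy weather sample, `id3` picks the attribute with the highest information gain at each level and recurses on each branch's subset until one of the two stopping conditions holds.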
Advantages of Decision Tree Algorithms:
Easy to understand and interpret: Decision trees can be easily understood and
interpreted by humans, even those without a machine learning background. This
makes them a good choice for applications where it is important to be able to
explain the model’s predictions.
Versatile: Decision tree algorithms can be used for a wide range of machine
learning tasks, including classification, regression, and anomaly detection.
Robust to noise: Decision tree algorithms are relatively robust to noise in the
data. This is because they make predictions based on the overall trend of the data,
rather than individual data points.
Limitations and Considerations:
Overfitting: Decision trees can be prone to overfitting, capturing noise in the
data. Techniques like pruning, setting minimum samples per leaf, or limiting the
tree depth can mitigate this issue.
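The pre-pruning controls mentioned above can be folded into the recursive build as extra stopping conditions. This is a simplified sketch (the `build` helper, its parameters, and the first-feature split choice are illustrative assumptions; a real ID3 would pick the split by information gain):

```python
from collections import Counter

def build(rows, labels, features, max_depth, min_samples_leaf=1, depth=0):
    majority = Counter(labels).most_common(1)[0][0]
    # Pre-pruning: stop when the depth limit or the minimum-samples floor is
    # reached, in addition to the usual ID3 stopping conditions.
    if (depth >= max_depth or len(labels) <= min_samples_leaf
            or not features or len(set(labels)) == 1):
        return majority                       # prune to a majority-class leaf
    feature = features[0]                     # placeholder split choice
    tree = {feature: {}}
    for value in {row[feature] for row in rows}:
        idx = [i for i, r in enumerate(rows) if r[feature] == value]
        tree[feature][value] = build([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     features[1:], max_depth,
                                     min_samples_leaf, depth + 1)
    return tree
```

With `max_depth=0` the whole dataset collapses to a single majority-class leaf, while a larger budget lets the tree grow normally; post-pruning (collapsing subtrees after building) works on the same principle.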
Bias: Information gain is biased towards features with many distinct values, so certain tree structures can favour such features. Algorithms like C4.5, which use the gain ratio instead of raw information gain, address this bias.
Conclusion:
The ID3/C4.5 decision tree algorithm was studied and implemented, and its behaviour on the given dataset was evaluated, covering tree construction, splitting criteria such as information gain and Gini impurity, and the strengths and limitations of decision tree models.