This document discusses decision trees and how to build them. It explains that decision trees are built using recursive partitioning to classify data by determining the attribute that best splits the data based on its predictive power. It discusses how predictiveness is based on decreasing the impurity of nodes, and that impurity is calculated using entropy, which measures the homogeneity of samples in a node, with 0 being completely homogeneous and 1 being equally divided. Selecting the attribute that most reduces entropy results in the purest nodes.
Instructor: Abinta Mehmood

Outline
• Decision Tree
• How to build a decision tree?
• Selecting the attributes
• Entropy

Decision trees are built using recursive partitioning to classify the data.
What is important in making a decision tree is determining which attribute is the most predictive one to split the data on.

If the patient has high cholesterol, we cannot say with high confidence that drug B is suitable. Likewise, if the patient's cholesterol is normal, we still don't have sufficient evidence to determine whether drug A or drug B is suitable.

If the patient is female, we can say with high certainty that drug B might be suitable for her. But if the patient is male, we don't have sufficient evidence to determine whether drug A or drug B is suitable. Even so, sex is still a better choice than the cholesterol attribute, because the resulting nodes are more pure.

Predictiveness is based on the decrease in impurity of nodes. We are looking for the feature that best decreases the impurity of the patients in the leaves, after splitting them up based on that feature.

Impurity and Entropy
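The comparison above can be sketched in code. This is a minimal illustration with a hypothetical patient dataset (the records and drug labels are invented for this sketch, not taken from the lecture): we split the patients on each candidate attribute and count the drug labels in each branch to see which split produces purer nodes.

```python
from collections import Counter

# Each patient record: (sex, cholesterol, drug) -- illustrative values only.
patients = [
    ("F", "HIGH",   "B"), ("F", "NORMAL", "B"), ("F", "HIGH",   "B"),
    ("M", "HIGH",   "A"), ("M", "NORMAL", "B"), ("M", "HIGH",   "A"),
]

def split_counts(data, attr_index):
    """Group records by one attribute and count drug labels in each branch."""
    branches = {}
    for record in data:
        branches.setdefault(record[attr_index], Counter())[record[2]] += 1
    return branches

print("split by sex:        ", split_counts(patients, 0))
print("split by cholesterol:", split_counts(patients, 1))
```

With this toy data, splitting by sex gives a completely pure female branch (all drug B), while splitting by cholesterol leaves the high-cholesterol branch evenly mixed between the two drugs, which is why the sex attribute would be preferred.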
• A node in the tree is considered pure if 100 percent of the cases in that node fall into a specific category of the target field.
• In fact, the method uses recursive partitioning to split the training records into segments by minimizing the impurity at each step.
• The impurity of a node is calculated from the entropy of the data in the node.
• So, what is entropy? Entropy measures the homogeneity of the samples in a node. If the samples are completely homogeneous, the entropy is zero, and if the samples are equally divided between the classes, the entropy is one.
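The entropy measure described above, and the resulting information gain of a split, can be sketched as follows. This is a minimal Python sketch; the drug labels and the two candidate splits are hypothetical examples chosen to mirror the sex-versus-cholesterol discussion, not data from the lecture.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (log base 2) of a list of class labels:
    0 for a completely homogeneous node, 1 for a 50/50 two-class split."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy reduction from splitting `labels` into the given subgroups."""
    total = len(labels)
    weighted = sum(len(g) / total * entropy(g) for g in groups)
    return entropy(labels) - weighted

print(entropy(["A", "A", "A", "A"]))  # homogeneous node -> 0.0
print(entropy(["A", "A", "B", "B"]))  # equally divided   -> 1.0

# Hypothetical drug labels for six patients, and two candidate splits.
drugs   = ["B", "B", "B", "A", "B", "A"]
by_sex  = [["B", "B", "B"], ["A", "B", "A"]]   # female branch is pure
by_chol = [["B", "B", "A", "A"], ["B", "B"]]   # high-cholesterol branch is mixed

print("gain (sex):        ", information_gain(drugs, by_sex))
print("gain (cholesterol):", information_gain(drugs, by_chol))
```

Selecting the attribute with the largest information gain, i.e. the largest drop in entropy, yields the purest child nodes; in this toy example the sex split wins because one of its branches is already pure.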