
ID3
The ID3 (Iterative Dichotomiser 3) algorithm is a classic decision tree algorithm used in machine learning for classification tasks. Its
primary goal is to recursively partition the data based on the features to create a decision tree that can be used for classification.

ID3 Algorithm Steps:

1. Select the Best Attribute
Choose the attribute that best splits the data into subsets, ideally minimizing uncertainty or entropy. Entropy measures the impurity or disorder of a set of data.
2. Create Decision Nodes
Create a decision node based on the selected attribute.
3. Split the Data
Split the dataset into subsets based on the values of the chosen attribute.
4. Recurse
Recursively apply the above steps to each subset until one of the stopping conditions is met.
5. Stopping Conditions
If all instances in a subset belong to the same class, create a leaf node with that class label. If there are no more attributes to split on,
create a leaf node with the most common class label. Other stopping conditions may be applied based on user-defined parameters.
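
To make the steps concrete, here is a minimal Python sketch of ID3 for categorical attributes. It is an illustration only, not a library implementation; every function name (entropy, best_attribute, majority_label, id3) is ours.

from collections import Counter
import math

def entropy(labels):
    # H(S) = -sum(p_i * log2(p_i)) over the class proportions in the set
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    # Lowest weighted child entropy is equivalent to highest information gain.
    def weighted_entropy(attr):
        subsets = {}
        for row, label in zip(rows, labels):
            subsets.setdefault(row[attr], []).append(label)
        return sum(len(s) / len(labels) * entropy(s) for s in subsets.values())
    return min(attributes, key=weighted_entropy)

def majority_label(labels):
    return Counter(labels).most_common(1)[0][0]

def id3(rows, labels, attributes):
    # Stopping conditions: pure subset, or no attributes left to split on.
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return majority_label(labels)
    attr = best_attribute(rows, labels, attributes)      # step 1: best attribute
    tree = {attr: {}}                                    # step 2: decision node
    remaining = [a for a in attributes if a != attr]
    for value in {row[attr] for row in rows}:            # step 3: split the data
        idx = [i for i, r in enumerate(rows) if r[attr] == value]
        tree[attr][value] = id3([rows[i] for i in idx],  # step 4: recurse
                                [labels[i] for i in idx],
                                remaining)
    return tree

# Tiny toy example (hypothetical data): the result is a nested dict of decision nodes and leaf labels.
rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "rain", "windy": "yes"},
        {"outlook": "sunny", "windy": "yes"}, {"outlook": "rain", "windy": "no"}]
print(id3(rows, ["yes", "no", "yes", "yes"], ["outlook", "windy"]))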
Entropy
In information theory and machine learning, entropy is a measure of uncertainty or disorder
in a set of data. In the context of decision trees, entropy is often used to evaluate the
impurity of a dataset.

H(S) = - Σ (P_i * log2(P_i))

Entropy is maximum when all classes are equally likely, indicating maximum disorder, and
it is minimum when the set is pure (contains instances of only one class).
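
As a quick sanity check of the formula, a small Python sketch (the function name entropy is ours):

import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum(p_i * log2(p_i)) over the class proportions in the set
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

print(entropy(["yes", "yes", "no", "no"]))    # 1.0 (two classes equally likely: maximum disorder)
print(entropy(["yes", "yes", "yes", "yes"]))  # -0.0, i.e. zero (pure set: minimum entropy)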
Gini Index
The Gini index, like entropy, is a measure of impurity or disorder in a set of data. It is often used in the context of decision trees to evaluate
the quality of a split.

Gini(S) = 1 - Σ (P_i^2)

Similar to entropy, the Gini index is maximum when the classes are equally likely (maximum impurity) and minimum when the set is pure.
The Gini index is computationally less intensive than entropy, making it a popular choice in decision tree algorithms.
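
A matching sketch for the Gini index (the function name gini is ours):

from collections import Counter

def gini(labels):
    # Gini(S) = 1 - sum(p_i^2) over the class proportions in the set
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

print(gini(["yes", "yes", "no", "no"]))    # 0.5 (maximum impurity for two classes)
print(gini(["yes", "yes", "yes", "yes"]))  # 0.0 (pure set)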

Information Gain

Information gain is a concept used in decision tree algorithms, particularly in the context of feature selection for splitting nodes.

In the context of decision trees, the information gain for a given feature is a measure of how well that feature separates the data into classes. It is
used to decide which feature to split on at each node of the tree.

Calculate Information Gain:

First, split the dataset on the candidate feature and compute the impurity of each resulting subset, weighting each subset by the fraction of instances it contains. Information gain is then calculated as the difference between the impurity of the parent dataset and the weighted impurity of the child nodes:

Information Gain = Impurity(Parent) - Weighted Impurity(Children)
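
A hedged sketch of this calculation in Python, using entropy as the impurity measure (the function names are ours):

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    # Gain = Impurity(parent) - sum over children of (|child| / |parent|) * Impurity(child)
    total = len(parent_labels)
    weighted_children = sum(len(g) / total * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted_children

parent = ["yes", "yes", "yes", "no", "no"]
children = [["yes", "yes", "yes"], ["no", "no"]]  # a perfect split into pure subsets
print(information_gain(parent, children))         # about 0.971, the full parent entropy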


Tree Pruning
Tree pruning is a technique used in decision tree algorithms to prevent overfitting and improve the generalization
ability of the model.

Pruning involves removing parts of the tree that do not provide significant predictive power and simplifying the model.

Let's say you are building a decision tree for a classification task. As a pre-pruning measure, you decide to limit the maximum depth of the tree to 5 levels.
This means that the tree will stop growing once it reaches a depth of 5, and no further splits will be performed beyond that depth. This helps prevent the
tree from becoming overly complex and overfitting the training data.
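
With scikit-learn, for example (our choice of library, assumed to be installed), this depth limit is a single constructor argument; a minimal sketch:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Pre-pruning: no split is performed below depth 5.
clf = DecisionTreeClassifier(max_depth=5, random_state=0)
clf.fit(X, y)
print(clf.get_depth())  # never exceeds 5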
Divisive Hierarchical Clustering and Dendrograms
Dendrogram:
A dendrogram is a tree-like diagram that visualizes the hierarchy of clusters produced by hierarchical clustering. It is particularly useful in understanding the
relationships between different clusters and deciding the appropriate number of clusters to use for a particular application.

Key Concepts:

Vertical Lines (Branches):
Each vertical line in a dendrogram represents a cluster.

Horizontal Lines (Height):
The height at which a branch is split corresponds to the dissimilarity at which the split occurred. Longer horizontal lines indicate larger dissimilarities.

Nodes:
Points where branches split are called nodes.
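
To see these elements on an actual plot, here is a small sketch using SciPy and Matplotlib (assumed to be available). Note that SciPy's linkage builds the hierarchy agglomeratively, but the resulting dendrogram is read the same way as a divisive one:

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Two well-separated blobs of 2-D points.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10, 2)), rng.normal(5, 1, (10, 2))])

Z = linkage(X, method="ward")  # hierarchy of merges and their dissimilarities
dendrogram(Z)                  # vertical lines = clusters, heights = dissimilarity of each split/merge
plt.xlabel("data point index")
plt.ylabel("dissimilarity")
plt.show()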
Steps in Divisive Clustering
Start with a Single Cluster:
Initially, all data points are in a single cluster.
Dissimilarity Measure:
Use a dissimilarity measure (e.g., Euclidean distance) to quantify the dissimilarity between data points or clusters.
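
As a small illustration of the dissimilarity step, a pairwise Euclidean distance matrix computed with SciPy (our choice of library; any distance routine would do):

import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
D = squareform(pdist(points, metric="euclidean"))
print(D)  # D[i, j] is the Euclidean dissimilarity between points i and j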
Difference: Medoids vs. Centroids

The centroid is the average position of all points in a cluster, while the medoid is the data point
with the minimum dissimilarity to all other points in the cluster.
Centroid calculation involves taking the mean of coordinates, while medoid involves selecting an
actual data point.
Centroid is sensitive to outliers, while medoid is more robust as it is less influenced by extreme
values.
Centroid is used in K-means, while medoid is used in K-medoids and related algorithms.
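
A short sketch that makes the contrast concrete (the variable names are ours):

import numpy as np

# One small cluster whose last point is an outlier.
cluster = np.array([[1.0, 1.0], [1.5, 1.2], [0.8, 0.9], [10.0, 10.0]])

centroid = cluster.mean(axis=0)  # mean of coordinates; pulled toward the outlier

# Medoid: the actual data point with minimum total distance to all other points.
dists = np.linalg.norm(cluster[:, None, :] - cluster[None, :, :], axis=-1)
medoid = cluster[dists.sum(axis=1).argmin()]

print("centroid:", centroid)  # about [3.33, 3.28]; dragged toward [10, 10]
print("medoid:  ", medoid)    # [1.0, 1.0]; an actual point, robust to the outlier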
Activation Functions

Hyperbolic tangent: tanh(z) = (e^{z} - e^{-z}) / (e^{z} + e^{-z})

Rectified Linear Unit: ReLU(z) = max(0, z)

Sigmoid: σ(z) = 1 / (1 + e^{-z})
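
These three activation functions can be written directly with NumPy; a minimal sketch (the function names are ours):

import numpy as np

def tanh(z):
    return (np.exp(z) - np.exp(-z)) / (np.exp(z) + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

z = np.array([-2.0, 0.0, 2.0])
print(tanh(z))     # approx. [-0.964, 0.0, 0.964]
print(relu(z))     # [0.0, 0.0, 2.0]
print(sigmoid(z))  # approx. [0.119, 0.5, 0.881]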

You might also like