
Gini impurity: 1 - sum (over all classes) [ fraction of datapoints in that class ]^2 …

For pure nodes = 0
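A minimal sketch of the Gini impurity formula above, to check it on small label lists. The function name `gini_impurity` and the choice to treat an empty node as pure are assumptions for illustration, not part of the notes.

```python
from collections import Counter

def gini_impurity(labels):
    """1 - sum over classes of (fraction of datapoints in that class)^2."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node counts as pure
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["a", "a", "a", "a"]))  # pure node -> 0.0
print(gini_impurity(["a", "a", "b", "b"]))  # even two-class mix -> 0.5
```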

Gini index: Gini impurity (parent node) - partition 1 ratio * Gini impurity (partition 1) - partition 2
ratio * Gini impurity (partition 2), where partition ratio = fraction of the parent's datapoints in that partition
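The weighted difference above can be sketched as follows. `gini_decrease` is a hypothetical name, and each partition ratio is assumed to be the share of the parent's datapoints that lands in that partition.

```python
from collections import Counter

def gini_impurity(labels):
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node counts as pure
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_decrease(parent, part1, part2):
    """Parent impurity minus the ratio-weighted impurity of the two partitions."""
    n = len(parent)
    r1, r2 = len(part1) / n, len(part2) / n  # partition ratios
    return gini_impurity(parent) - r1 * gini_impurity(part1) - r2 * gini_impurity(part2)

parent = [0, 0, 0, 1, 1, 1]
print(gini_decrease(parent, [0, 0, 0], [1, 1, 1]))  # perfect split -> 0.5
print(gini_decrease(parent, [0, 0, 1], [0, 1, 1]))  # weaker split, smaller decrease
```

A larger value means the split removes more impurity, so candidate splits can be ranked by it.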

Important : squared…

Entropy: - sum (over all classes) [ fraction * log(fraction) ] .... For pure nodes = 0
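A small sketch of the entropy formula. The notes do not fix the log base; base 2 (entropy in bits) is an assumption here, as is the function name `entropy`.

```python
import math
from collections import Counter

def entropy(labels):
    """- sum over classes of fraction * log2(fraction)."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node counts as pure
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["a", "a", "b", "b"]))       # even two-class mix -> 1.0
print(entropy(["a", "a", "a", "a"]) == 0)  # True: a pure node has zero entropy
```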

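Info gain: entropy(parent node) - p1 * entropy(partition 1) - p2 * entropy(partition 2), where p1, p2 = partition ratios (the first term is 1 only for an even two-class parent with log base 2)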
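A sketch of information gain for a two-way split. It uses the parent node's entropy as the leading term, which equals 1 for an evenly mixed two-class parent with log base 2. The name `info_gain` and the base-2 log are assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node counts as pure
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(parent, part1, part2):
    """Parent entropy minus the ratio-weighted entropy of the two partitions."""
    n = len(parent)
    p1, p2 = len(part1) / n, len(part2) / n  # partition ratios
    return entropy(parent) - p1 * entropy(part1) - p2 * entropy(part2)

parent = [0, 0, 1, 1]
print(info_gain(parent, [0, 0], [1, 1]))  # perfect split -> 1.0
print(info_gain(parent, [0, 1], [0, 1]))  # uninformative split -> 0.0
```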


We can find min impurity split threshold as:
● Exhaustive.. if few features and small dynamic range of datapoint values in those
features… try out different values for the threshold and save the best threshold in each
feature, then compare the best ones and decide
● Use gradient descent
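The exhaustive option in the list above can be sketched like this. Using midpoints between consecutive distinct values as candidate thresholds, scoring with weighted Gini impurity, and the names `best_threshold` and `best_split` are all assumptions for illustration; the notes only say to try different values per feature and compare the best ones.

```python
from collections import Counter

def gini_impurity(labels):
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node counts as pure
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Best (threshold, weighted impurity) for one feature."""
    distinct = sorted(set(values))
    best_t, best_imp = None, float("inf")
    # candidate thresholds: midpoints between consecutive distinct values
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        imp = (len(left) * gini_impurity(left)
               + len(right) * gini_impurity(right)) / len(labels)
        if imp < best_imp:
            best_t, best_imp = t, imp
    return best_t, best_imp

def best_split(rows, labels):
    """Compare the per-feature winners; returns (feature index, threshold, impurity)."""
    best = (None, None, float("inf"))
    for j in range(len(rows[0])):
        t, imp = best_threshold([row[j] for row in rows], labels)
        if t is not None and imp < best[2]:
            best = (j, t, imp)
    return best

rows = [[1.0, 5.0], [2.0, 1.0], [3.0, 4.0], [4.0, 2.0]]
labels = [0, 0, 1, 1]
print(best_split(rows, labels))  # (0, 2.5, 0.0): feature 0 separates the classes
```

The cost grows with the number of features times the number of distinct values per feature, which is why the notes restrict this option to few features and a small range of values.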

1) How to decide feature for classifying


2) How to decide threshold

Feature:
● IMPURITY:
○ ENTROPY

Threshold:
Select the threshold that gives min impurity in the next step:
1. entropy
2. Gini index
3. Misclassification error

Gini impurity is the dotted line… peaks faster but is still differentiable.. helps in gradient descent

The triangle is misclassification error… 1 - max (class fraction in the node) => 1 - P (a datapoint is correctly
classified when the node predicts its majority class)
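A short sketch of misclassification error for one node, assuming the node predicts its majority class. The function name is hypothetical.

```python
from collections import Counter

def misclassification_error(labels):
    """1 - (largest class fraction in the node)."""
    n = len(labels)
    if n == 0:
        return 0.0  # assumption: an empty node counts as pure
    return 1.0 - max(Counter(labels).values()) / n

print(misclassification_error(["a", "a", "a", "b"]))  # 0.25
print(misclassification_error(["a", "a", "b", "b"]))  # 0.5, the two-class maximum
```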

⇒ cost function: the decrease in entropy after doing a split using a threshold and a feature… we have to maximize this
over the threshold.. so gradient descent (on its negative)
