GTU #3160714
Unit-4
Classification & Prediction
• Classification Methods
• Decision Tree
• Bayesian Classification
• Rule Based Classification
• Neural Network
Classification Methods
Section - 1
[Figure: decision tree rooted at the test "age?", with branches youth, middle_aged, and senior leading to yes/no leaves]
The attribute A with the highest information gain is chosen as the splitting attribute at node N.
Similarly,
Gain(income) = 0.029 bits
Gain(student) = 0.151 bits
Gain(credit_rating) = 0.048 bits
The age attribute has the highest information gain among all attributes.
Therefore, node N is labelled with age, and a branch is grown for each of the attribute's values.
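The selection step above can be sketched in a few lines of Python. This is only an illustrative sketch: the per-value (yes, no) class counts below are an assumption, taken from the well-known 14-tuple buys_computer textbook example that the gains quoted here appear to follow, and the helper names `entropy` and `info_gain` are hypothetical.

```python
from math import log2

def entropy(class_counts):
    """Info(D): expected information, in bits, needed to classify a tuple."""
    total = sum(class_counts)
    return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)

def info_gain(partitions):
    """Gain(A) = Info(D) - Info_A(D).

    `partitions` holds one tuple of class counts per value of attribute A.
    """
    n_classes = len(partitions[0])
    overall = [sum(p[i] for p in partitions) for i in range(n_classes)]
    total = sum(overall)
    # Info_A(D): entropy of each partition, weighted by its relative size.
    info_a = sum(sum(p) / total * entropy(p) for p in partitions)
    return entropy(overall) - info_a

# ASSUMPTION: (yes, no) counts per attribute value; not listed in these slides.
counts = {
    "age": [(2, 3), (4, 0), (3, 2)],       # youth, middle_aged, senior
    "income": [(2, 2), (4, 2), (3, 1)],    # high, medium, low
    "student": [(6, 1), (3, 4)],           # yes, no
    "credit_rating": [(6, 2), (3, 3)],     # fair, excellent
}

gains = {attr: info_gain(parts) for attr, parts in counts.items()}
splitting_attribute = max(gains, key=gains.get)

for attr, g in gains.items():
    print(f"Gain({attr}) = {g:.3f} bits")
print("Splitting attribute:", splitting_attribute)
```

Under these assumed counts the computed gains agree with the values quoted above to within rounding, and age comes out as the splitting attribute.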
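The split information is defined as
\[ SplitInfo_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|} \]
This value represents the potential information generated by splitting the training data set, D, into v partitions, corresponding to the v outcomes of a test on attribute A.
Gain Ratio is defined as
\[ GainRatio(A) = \frac{Gain(A)}{SplitInfo_A(D)} \]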
The attribute with the maximum gain ratio is selected as the splitting attribute.
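As a rough sketch, the gain ratio divides an attribute's information gain by its split information, which is the entropy of the partition sizes. The example below uses Gain(income) = 0.029 from the text; the partition sizes 4, 6, and 4 for income are an assumption (they are not given in these slides), and the function names are hypothetical.

```python
from math import log2

def split_info(partition_sizes):
    """SplitInfo_A(D): entropy of the partition sizes |D_1|, ..., |D_v|."""
    total = sum(partition_sizes)
    return -sum((s / total) * log2(s / total) for s in partition_sizes if s > 0)

def gain_ratio(gain, partition_sizes):
    """GainRatio(A) = Gain(A) / SplitInfo_A(D)."""
    si = split_info(partition_sizes)
    if si == 0:
        # A single non-empty partition carries no split information.
        raise ValueError("split information is zero; gain ratio is undefined")
    return gain / si

income_gain = 0.029        # Gain(income) in bits, as quoted in the text
income_sizes = [4, 6, 4]   # ASSUMPTION: tuples with income high, medium, low

income_split_info = split_info(income_sizes)
income_gain_ratio = gain_ratio(income_gain, income_sizes)
print(f"SplitInfo = {income_split_info:.3f}, GainRatio = {income_gain_ratio:.3f}")
```

The explicit check for zero split information is a design choice of this sketch: the ratio becomes unstable as the split information approaches zero, so the degenerate case is rejected rather than silently returning infinity.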
For a discrete-valued attribute, the subset that gives the minimum Gini index for that
attribute is selected as its splitting subset.
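\[ Gini_{income \in \{low, medium\}}(D) = \frac{10}{14}\left(1 - \left(\frac{7}{10}\right)^2 - \left(\frac{3}{10}\right)^2\right) + \frac{4}{14}\left(1 - \left(\frac{2}{4}\right)^2 - \left(\frac{2}{4}\right)^2\right) \]
\[ = 0.443 \]
\[ = Gini_{income \in \{high\}}(D) \]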
The best binary split for attribute income is on {low, medium} (or {high}) because it minimizes the Gini index.
Of all the attributes above, age has the minimum Gini index, which results in a binary split on age.
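The Gini computation for the income split can be checked with a short Python sketch. The class counts come from the worked calculation above (10 tuples with 7 and 3 per class in {low, medium}, 4 tuples with 2 and 2 in {high}); the function names `gini` and `gini_split` are hypothetical.

```python
def gini(class_counts):
    """Gini(D) = 1 - sum of squared class proportions."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def gini_split(partitions):
    """Weighted Gini index of a split: sum of |D_j|/|D| * Gini(D_j)."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * gini(p) for p in partitions)

d1 = (7, 3)  # class counts for income in {low, medium}
d2 = (2, 2)  # class counts for income in {high}

income_gini = gini_split([d1, d2])
print(round(income_gini, 3))  # prints 0.443
```

To find the best binary split for a discrete attribute, the same function would be evaluated for every candidate subset of its values and the subset with the smallest result kept.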
Thank You
Any Questions?