
Decision Tree

U.A. NULI
Introduction:

A decision tree is a supervised learning algorithm for classification that represents the learned knowledge in a tree data structure.

Tree-based learning algorithms are considered to be among the best and most widely used supervised learning methods.

Tree-based methods empower predictive models with high accuracy, stability and ease of interpretation. Unlike linear models, they map non-linear relationships quite well. They are adaptable to solving almost any kind of problem at hand.
Decision Trees in the Real World

Because decision trees are so easy to interpret, they are among the most widely used
data-mining methods in business analysis, medical decision-making, and policymaking.

Often, a decision tree is created automatically, and an expert uses it to understand the
key factors and then refines it to better match her beliefs.

This process allows machines to assist experts and to clearly show the reasoning process
so that individuals can judge the quality of the prediction.

Decision trees have been used in this manner for such wide-ranging applications as
customer profiling, financial risk analysis, assisted diagnosis, and traffic prediction.
What is a decision tree?

➢ A decision tree is a type of supervised learning algorithm mostly used for classification problems.
➢ It provides rules for classifying data using attributes.
➢ The tree consists of decision nodes and leaf nodes.
➢ A decision node has two or more branches, each representing a value of the attribute tested.
➢ A leaf node represents a homogeneous result (all examples in one class), which does not require additional classification testing.

Decision Tree Characteristics:

► Every non-leaf node (decision node) represents an attribute of the dataset.
► Every branch represents a possible value of that attribute.
► Every leaf (terminal) node represents a value of the target attribute.
► The starting node is called the root node.
► To make a decision, the flow starts at the root node, navigates along the edges until it reaches a leaf node, and then makes the decision based on the leaf node's value.
Decision Tree Classification task

Training Set:

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

The training set is fed to a tree induction algorithm, which learns a model (induction: "Learn Model").

Test Set:

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

The learned model is then applied to the test set to predict the unknown class labels (deduction: "Apply Model").
Decision Tree Examples:

Day  Outlook   Temp.  Humidity  Wind    Play Tennis
D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Strong  Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Weak    Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No

Outlook
|- Sunny    -> Humidity
|               |- High   -> No
|               |- Normal -> Yes
|- Overcast -> Yes
|- Rain     -> Wind
                |- Strong -> No
                |- Weak   -> Yes
How to build a decision tree?

1. ID3 (Iterative Dichotomiser 3) → uses the entropy function and information gain as metrics.

2. CART (Classification and Regression Trees) → uses the Gini index (for classification) as the metric.
ID3 (Iterative Dichotomiser 3) Algorithm

• A mathematical algorithm for building the decision tree.
• Invented by J. Ross Quinlan in 1979.
• Uses Information Theory, introduced by Shannon in 1948.
• Builds the tree from the top down, with no backtracking.
• Information gain is used to select the most useful attribute for classification.
ID3 (Iterative Dichotomiser 3) Algorithm

Day  Outlook   Temp.  Humidity  Wind    Play Tennis
D1   Sunny     Hot    High      Weak    No
D2   Sunny     Hot    High      Strong  No
D3   Overcast  Hot    High      Weak    Yes
D4   Rain      Mild   High      Weak    Yes
D5   Rain      Cool   Normal    Weak    Yes
D6   Rain      Cool   Normal    Strong  No
D7   Overcast  Cool   Normal    Strong  Yes
D8   Sunny     Mild   High      Weak    No
D9   Sunny     Cool   Normal    Weak    Yes
D10  Rain      Mild   Normal    Weak    Yes
D11  Sunny     Mild   Normal    Strong  Yes
D12  Overcast  Mild   High      Strong  Yes
D13  Overcast  Hot    Normal    Weak    Yes
D14  Rain      Mild   High      Strong  No

Attributes: Outlook, Temp, Humidity, Wind.
Which attribute will be used at the root node?
Which attribute to select first?

We have four X values (Outlook, Temp, Humidity and Wind), all categorical, and one y value (Play: Yes or No), also categorical.

So we need to learn the mapping between X and y (what machine learning always does).

This is a binary classification problem.

We will build the tree using the ID3 algorithm.

To create a tree, we need a root node first, and we know that nodes correspond to features/attributes (Outlook, Temp, Humidity and Wind), so which one do we pick first?

Answer: determine the attribute that best classifies the training data and use this attribute at the root of the tree. Repeat this process for each branch.
So how do we choose the best attribute?

Answer: in ID3, use the attribute with the highest information gain.

In order to define information gain precisely, we begin by defining a measure commonly used in information theory, called entropy, which characterizes the (im)purity of an arbitrary collection of examples.
Entropy

Entropy is a measure of the impurity, disorder or uncertainty in a set of examples.

For a set S with classes i = 1..c, Entropy(S) = - Σ_i p_i log2(p_i), where p_i is the proportion of examples in S that belong to class i.

Example: 16/30 of the examples are green circles; 14/30 are pink crosses.

log2(16/30) ≈ -0.9;  log2(14/30) ≈ -1.1

Entropy = -(16/30)(-0.9) - (14/30)(-1.1) ≈ 0.99

The higher the entropy, the higher the information content.

Outlook   Play Tennis
Sunny     No
Sunny     No
Overcast  Yes
Rain      Yes
Rain      Yes
Rain      No
Overcast  Yes
Sunny     No
Sunny     Yes
Rain      Yes
Sunny     Yes
Overcast  Yes
Overcast  Yes
Rain      No

P(Play Tennis = Yes) = 9/14
P(Play Tennis = No)  = 5/14

Entropy(Play Tennis) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
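To make this concrete, here is a minimal Python sketch of the entropy computation (not part of the original slides; the function name and data layout are illustrative):

from collections import Counter
from math import log2

def entropy(labels):
    # Entropy(S) = -sum over classes k of p_k * log2(p_k)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

# Play Tennis target column: 9 Yes and 5 No
print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))   # 0.94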
Information Gain

We want to determine which attribute in the dataset is most useful for discriminating between the classes to be learned.

Information gain tells us how important a given attribute is relative to the other attributes.

We will use it to decide the ordering of attributes in the nodes of a decision tree.

Information gain (or entropy reduction) is the reduction in 'uncertainty' obtained by splitting on an attribute.
Information Gain

◼ The information gain is based on the decrease in entropy after a dataset is split on an attribute.

How do we decide the most appropriate attribute for a decision node?
◼ First, the entropy of the total dataset is calculated.
◼ The dataset is then split on each attribute in turn.
◼ The entropy of each branch is calculated and added proportionally to get the total entropy of the split.
◼ The resulting entropy is subtracted from the entropy before the split.
◼ The result is the information gain, or decrease in entropy.
◼ The attribute that yields the largest information gain is chosen for the decision node.
Information Gain

◼ A branch set with an entropy of 0 is a leaf node.
◼ Otherwise, the branch needs further splitting to classify its dataset.
◼ The ID3 algorithm is run recursively on the non-leaf branches until all data is classified.
Information Gain (IG)

Information Gain (IG) = entropy(parent) - [weighted average entropy(children)]

IG(S, A): the expected reduction in entropy due to splitting S on attribute A

IG(S, A) = Entropy(S) - Σ_{v ∈ values(A)} (|S_v| / |S|) · Entropy(S_v)
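A small Python sketch of this formula (illustrative only; it reuses the entropy function from the earlier sketch and assumes each example is a dict such as {"Outlook": "Sunny", ..., "Play": "No"}):

def information_gain(rows, attribute, target="Play"):
    # IG(S, A) = Entropy(S) - sum over values v of (|S_v|/|S|) * Entropy(S_v)
    gain = entropy([r[target] for r in rows])
    total = len(rows)
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# On the full Play Tennis data, information_gain(data, "Outlook") comes out to about 0.247.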
Calculation of information gain

(All logarithms are taken with respect to base 2.)
Calculating Entropy and Information Gain

◼ Many programming languages and calculators do not have a log2 function.
◼ Use a conversion factor: log10(2) = 0.301.
◼ Then divide to get log2(n): log2(n) = log10(n) / log10(2).
◼ Example: log2(3/5) = log10(3/5) / 0.301.
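As a quick illustration in Python (modern Python does provide math.log2, so this helper exists only to demonstrate the conversion):

from math import log10

def log2_via_log10(n):
    return log10(n) / log10(2)   # log10(2) is approximately 0.301

print(round(log2_via_log10(3 / 5), 3))   # -0.737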
Finding the most important attribute for the Play Tennis dataset

Day  Outlook   Temp  Humidity  Wind    Play
D1   Sunny     Hot   High      Weak    No
D2   Sunny     Hot   High      Strong  No
D3   Overcast  Hot   High      Weak    Yes
D4   Rain      Mild  High      Weak    Yes
D5   Rain      Cool  Normal    Weak    Yes
D6   Rain      Cool  Normal    Strong  No
D7   Overcast  Cool  Normal    Strong  Yes
D8   Sunny     Mild  High      Weak    No
D9   Sunny     Cool  Normal    Weak    Yes
D10  Rain      Mild  Normal    Weak    Yes
D11  Sunny     Mild  Normal    Strong  Yes
D12  Overcast  Mild  High      Strong  Yes
D13  Overcast  Hot   Normal    Weak    Yes
D14  Rain      Mild  High      Strong  No

Total elements = 14; Yes class = 9, No class = 5.

Entropy of the entire dataset:
E(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

Now we have to split the dataset on Outlook, Humidity, Wind and Temperature and find the information gain in each case.
Splitting the dataset on Outlook

Day  Outlook  Temp  Humidity  Wind    Play
D1   Sunny    Hot   High      Weak    No
D2   Sunny    Hot   High      Strong  No
D8   Sunny    Mild  High      Weak    No
D9   Sunny    Cool  Normal    Weak    Yes
D11  Sunny    Mild  Normal    Strong  Yes

Total elements with Outlook = Sunny: 5 (Yes = 2, No = 3).

Entropy of Outlook = Sunny:
E(Sun) = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.971
Splitting the dataset on Outlook

Day  Outlook   Temp  Humidity  Wind    Play
D3   Overcast  Hot   High      Weak    Yes
D7   Overcast  Cool  Normal    Strong  Yes
D12  Overcast  Mild  High      Strong  Yes
D13  Overcast  Hot   Normal    Weak    Yes

Total elements with Outlook = Overcast: 4 (Yes = 4, No = 0).

Entropy of Outlook = Overcast:
E(Over) = -(4/4) log2(4/4) - (0/4) log2(0/4) = 0
Splitting the dataset on Outlook

Day  Outlook  Temp  Humidity  Wind    Play
D4   Rain     Mild  High      Weak    Yes
D5   Rain     Cool  Normal    Weak    Yes
D6   Rain     Cool  Normal    Strong  No
D10  Rain     Mild  Normal    Weak    Yes
D14  Rain     Mild  High      Strong  No

Total elements with Outlook = Rain: 5 (Yes = 3, No = 2).

Entropy of Outlook = Rain:
E(Rain) = -(3/5) log2(3/5) - (2/5) log2(2/5) = 0.971
Information gain on splitting the dataset on Outlook

IG(S, A) = Entropy(S) - Σ_{v ∈ values(A)} (|S_v| / |S|) · Entropy(S_v)

E(S) = 0.940,  E(Sun) = 0.971,  E(Over) = 0,  E(Rain) = 0.971

IG(S, Outlook) = E(S) - ((5/14)·E(Sun) + (4/14)·E(Over) + (5/14)·E(Rain))
               = 0.940 - ((5/14)·0.971 + (4/14)·0 + (5/14)·0.971)
               = 0.247

Similarly, we can calculate the information gain by splitting the dataset on Humidity, Temperature and Wind.
Splitting on Humidity (S = [9+, 5-], E = 0.940):
  High:   [3+, 4-], E = 0.985
  Normal: [6+, 1-], E = 0.592
  IG(S, Humidity) = 0.940 - (7/14)·0.985 - (7/14)·0.592 = 0.151

Splitting on Wind (S = [9+, 5-], E = 0.940):
  Weak:   [6+, 2-], E = 0.811
  Strong: [3+, 3-], E = 1.0
  IG(S, Wind) = 0.940 - (8/14)·0.811 - (6/14)·1.0 = 0.048

Humidity provides greater information gain than Wind with respect to the target classification.

The information gain values for the 4 attributes are:

• IG(S, Outlook)     = 0.247
• IG(S, Humidity)    = 0.151
• IG(S, Wind)        = 0.048
• IG(S, Temperature) = 0.029

where S denotes the collection of training examples.

The highest gain is obtained by splitting the dataset on Outlook, so Outlook becomes the root node of the tree, with three branches: Sunny, Overcast and Rain.
Decision Tree – root node

Outlook
|- Sunny    -> (to be split further)
|- Overcast -> Yes
|- Rain     -> (to be split further)

The Sunny and Rain branches need further splitting because their entropy is greater than 0.
The Overcast branch will not be split because its entropy is 0.
The tree grows in this way until every branch ends in a leaf node with entropy 0.

Day  Outlook  Temp  Humidity  Wind    Play
D1   Sunny    Hot   High      Weak    No
D2   Sunny    Hot   High      Strong  No
D8   Sunny    Mild  High      Weak    No
D9   Sunny    Cool  Normal    Weak    Yes
D11  Sunny    Mild  Normal    Strong  Yes

Total elements with Outlook = Sunny: 5 (Yes = 2, No = 3).

Entropy of Outlook = Sunny:
E(Sun) = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.971
How to split the Sunny branch?

Which attribute should be used for splitting the Sunny branch: Temperature, Humidity, or Wind?

Take the Sunny node as the parent node, and the subset of the dataset where Outlook = Sunny as the parent dataset.

Calculate the information gain of Temperature, Humidity and Wind on this subset.

Select the attribute with the highest gain as the child node of Outlook on the Sunny branch.

Attribute Outlook, value Sunny: total elements 5, entropy E(Sunny) = 0.971

Attribute Temperature:

Value Hot: total elements = 2, Play Yes = 0, No = 2
E(Hot) = -(0/2) log2(0/2) - (2/2) log2(2/2) = 0

Value Mild: total elements = 2, Play Yes = 1, No = 1
E(Mild) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1

Value Cool: total elements = 1, Play Yes = 1, No = 0
E(Cool) = -(1/1) log2(1/1) - (0/1) log2(0/1) = 0

IG(Sunny, Temp) = E(Sunny) - ((2/5)·E(Hot) + (2/5)·E(Mild) + (1/5)·E(Cool))
                = 0.971 - ((2/5)·0 + (2/5)·1 + (1/5)·0) = 0.571

Attribute Outlook, value Sunny: total elements 5, entropy E(Sunny) = 0.971

Attribute Humidity:

Value High: total elements = 3, Play Yes = 0, No = 3
E(High) = -(0/3) log2(0/3) - (3/3) log2(3/3) = 0

Value Normal: total elements = 2, Play Yes = 2, No = 0
E(Normal) = -(2/2) log2(2/2) - (0/2) log2(0/2) = 0

IG(Sunny, Humidity) = E(Sunny) - ((3/5)·E(High) + (2/5)·E(Normal))
                    = 0.971 - ((3/5)·0 + (2/5)·0) = 0.971

Attribute Outlook, value Sunny: total elements 5, entropy E(Sunny) = 0.971

Attribute Wind:

Value Weak: total elements = 3, Play Yes = 1, No = 2
E(Weak) = -(1/3) log2(1/3) - (2/3) log2(2/3) = 0.918

Value Strong: total elements = 2, Play Yes = 1, No = 1
E(Strong) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1

IG(Sunny, Wind) = E(Sunny) - ((3/5)·E(Weak) + (2/5)·E(Strong))
                = 0.971 - ((3/5)·0.918 + (2/5)·1) = 0.020

The information gain values for the 3 attributes are:

IG(Sunny, Temperature) = 0.571
IG(Sunny, Humidity)    = 0.971
IG(Sunny, Wind)        = 0.020

The highest gain is achieved with the Humidity attribute, so the Sunny node should be split on Humidity.
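Re-using the information_gain sketch from earlier on the Outlook = Sunny subset reproduces these numbers (the dict-based row format is an assumption of that sketch, not something from the slides):

sunny = [
    {"Temp": "Hot",  "Humidity": "High",   "Wind": "Weak",   "Play": "No"},
    {"Temp": "Hot",  "Humidity": "High",   "Wind": "Strong", "Play": "No"},
    {"Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "Play": "No"},
    {"Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},
    {"Temp": "Mild", "Humidity": "Normal", "Wind": "Strong", "Play": "Yes"},
]
for attribute in ["Temp", "Humidity", "Wind"]:
    print(attribute, round(information_gain(sunny, attribute), 3))
# Temp 0.571, Humidity 0.971, Wind 0.02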

Outlook
|- Sunny    -> Humidity
|               |- High   -> No
|               |- Normal -> Yes
|- Overcast -> Yes
|- Rain     -> (to be split further)
How to split the Rain branch?

Which attribute should be used for splitting the Rain node: Temperature, Humidity, or Wind?

Take the Rain node as the parent node, and the subset of the dataset where Outlook = Rain as the parent dataset.

Calculate the information gain of Temperature, Humidity and Wind on this subset.

Select the attribute with the highest gain as the child node of Outlook on the Rain branch.
Splitting the dataset on Outlook = Rain

Day  Outlook  Temp  Humidity  Wind    Play
D4   Rain     Mild  High      Weak    Yes
D5   Rain     Cool  Normal    Weak    Yes
D6   Rain     Cool  Normal    Strong  No
D10  Rain     Mild  Normal    Weak    Yes
D14  Rain     Mild  High      Strong  No

Total elements with Outlook = Rain: 5 (Yes = 3, No = 2).

Entropy of Outlook = Rain:
E(Rain) = -(3/5) log2(3/5) - (2/5) log2(2/5) = 0.971

Attribute Outlook, value Rain: total elements 5, entropy E(Rain) = 0.971

Attribute Temperature:

Value Mild: total elements = 3, Play Yes = 2, No = 1
E(Mild) = -(2/3) log2(2/3) - (1/3) log2(1/3) = 0.918

Value Cool: total elements = 2, Play Yes = 1, No = 1
E(Cool) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1

IG(Rain, Temp) = E(Rain) - ((3/5)·E(Mild) + (2/5)·E(Cool))
               = 0.971 - ((3/5)·0.918 + (2/5)·1) = 0.020

Attribute Outlook, value Rain: total elements 5, entropy E(Rain) = 0.971

Attribute Humidity:

Value High: total elements = 2, Play Yes = 1, No = 1
E(High) = -(1/2) log2(1/2) - (1/2) log2(1/2) = 1

Value Normal: total elements = 3, Play Yes = 2, No = 1
E(Normal) = -(2/3) log2(2/3) - (1/3) log2(1/3) = 0.918

IG(Rain, Humidity) = E(Rain) - ((2/5)·E(High) + (3/5)·E(Normal))
                   = 0.971 - ((2/5)·1 + (3/5)·0.918) = 0.020

Attribute Outlook, value Rain: total elements 5, entropy E(Rain) = 0.971

Attribute Wind:

Value Weak: total elements = 3, Play Yes = 3, No = 0
E(Weak) = -(3/3) log2(3/3) - (0/3) log2(0/3) = 0

Value Strong: total elements = 2, Play Yes = 0, No = 2
E(Strong) = -(0/2) log2(0/2) - (2/2) log2(2/2) = 0

IG(Rain, Wind) = E(Rain) - ((3/5)·E(Weak) + (2/5)·E(Strong))
               = 0.971 - ((3/5)·0 + (2/5)·0) = 0.971

The information gain values for the 3 attributes are:

IG(Rain, Temperature) = 0.020
IG(Rain, Humidity)    = 0.020
IG(Rain, Wind)        = 0.971

The highest gain is achieved with the Wind attribute, so the Rain branch should be split on Wind.
Complete Decision Tree

Outlook
|- Sunny    -> Humidity
|               |- High   -> No
|               |- Normal -> Yes
|- Overcast -> Yes
|- Rain     -> Wind
                |- Strong -> No
                |- Weak   -> Yes
ID3 Algorithm:

ID3 (Examples, Target_Attribute, Attributes)
  Create a root node for the tree.
  If all examples are positive, return the single-node tree Root with label = +.
  If all examples are negative, return the single-node tree Root with label = -.
  If the set of predicting attributes is empty, return the single-node tree Root
    with label = the most common value of the target attribute in the examples.
  Otherwise begin:
    A ← the attribute that best classifies the examples.
    Decision tree attribute for Root = A.
    For each possible value vi of A:
      Add a new tree branch below Root, corresponding to the test A = vi.
      Let Examples(vi) be the subset of examples that have the value vi for A.
      If Examples(vi) is empty:
        Below this new branch add a leaf node with label = the most common target value in the examples.
      Else:
        Below this new branch add the subtree ID3(Examples(vi), Target_Attribute, Attributes - {A}).
  End
  Return Root
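A compact Python rendering of this pseudocode, as a sketch under the same assumptions as the earlier snippets (rows are dicts, and entropy / information_gain are the functions defined above; empty branches are not handled separately because only values present in the data are enumerated):

from collections import Counter

def id3(rows, attributes, target="Play"):
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:          # all examples in one class -> leaf
        return labels[0]
    if not attributes:                 # no attributes left -> most common label
        return Counter(labels).most_common(1)[0][0]
    # pick the attribute with the highest information gain
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    for value in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best], target)
    return tree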
Advantages of ID3

• Understandable prediction rules are created from the training data.
• Builds the tree quickly.
• Builds a short tree.
• Only needs to test enough attributes until all data is classified.
• Finding leaf nodes enables test data to be pruned, reducing the number of tests.
• The whole dataset is searched to create the tree.
Disadvantages of ID3

• Data may be over-fitted or over-classified if a small sample is tested.
• Only one attribute at a time is tested for making a decision.
• Classifying continuous data may be computationally expensive, as many trees must be generated to see where to break the continuum.
Classification And Regression Tree (CART) Algorithm:

The CART algorithm uses the Gini index to select the appropriate attribute for a node.

The Gini index measures the impurity of a data partition.
Gini index

Gini(D) = 1 - Σ_k (p_k)²

where p_k is the probability (relative frequency) of class k in D.
Example:

D = {Y, Y, Y, Y, N, N, N}: total elements = 7; Y = 4, N = 3
Gini(D) = 1 - (4/7)² - (3/7)² = 1 - 0.3265 - 0.1837 = 0.4898

D = {Y, Y, Y, Y, Y, Y, Y}: total elements = 7; Y = 7, N = 0
Gini(D) = 1 - (7/7)² - (0/7)² = 1 - 1 - 0 = 0

The Gini index of pure/homogeneous data is 0, whereas for impure data it is greater than 0.
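The same calculation as a small Python sketch (illustrative only; the function and variable names are my own, not from the slides):

from collections import Counter

def gini(labels):
    # Gini(D) = 1 - sum over classes k of p_k^2
    total = len(labels)
    return 1 - sum((c / total) ** 2 for c in Counter(labels).values())

print(round(gini(list("YYYYNNN")), 4))   # 0.4898
print(gini(list("YYYYYYY")))             # 0.0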
Gini index of a dataset with multiple attributes

Driving Risk Table

Age  Car Type  Risk
23   Family    High
17   Sports    High
43   Sports    High
68   Family    Low
32   Truck     Low
20   Family    High

Total elements = 6; Risk(High) = 4, Risk(Low) = 2.

Gini(DrvTable) = 1 - (4/6)² - (2/6)² = 1 - 0.4444 - 0.1111 = 0.4445
How to find the Gini index of the Car Type attribute?

Car Type has multiple values: Family, Sports and Truck.

Its Gini index can be calculated using a multiway split or a binary split.
Gini index of Car Type using a multiway split

Risk   Sports  Family  Truck
High   2       2       0
Low    0       1       1

Gini(Sports) = 1 - (2/2)² - (0/2)² = 1 - 1 - 0 = 0
Gini(Family) = 1 - (2/3)² - (1/3)² = 1 - 0.4444 - 0.1111 = 0.4445
Gini(Truck)  = 1 - (0/1)² - (1/1)² = 1 - 0 - 1 = 0

How do we combine the Gini indexes of the three categories into one value for Car Type?
Combining the Gini index of an attribute having multiple values:

If a dataset D with Dn total elements is split by an attribute into 3 partitions D1, D2, D3, where
D1 has D1n elements and Gini index Gini(D1),
D2 has D2n elements and Gini index Gini(D2),
D3 has D3n elements and Gini index Gini(D3),
and Dn = D1n + D2n + D3n, then

Gini(D) = (D1n/Dn)·Gini(D1) + (D2n/Dn)·Gini(D2) + (D3n/Dn)·Gini(D3)
Gini index of the Car Type attribute

Gini(CarType) = (2/6)·Gini(Sports) + (3/6)·Gini(Family) + (1/6)·Gini(Truck)
              = (2/6)·0 + (3/6)·0.4445 + (1/6)·0
              = 0.2222
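A hedged Python sketch of this weighted (multiway) Gini computation, reusing the gini function from the previous sketch; the tuple layout of the Driving Risk rows is an assumption made for the example:

def gini_of_attribute(rows, attr_index, target_index):
    # weighted sum of the Gini index of each partition created by the attribute
    total = len(rows)
    weighted = 0.0
    for value in set(r[attr_index] for r in rows):
        subset = [r[target_index] for r in rows if r[attr_index] == value]
        weighted += (len(subset) / total) * gini(subset)
    return weighted

driving = [
    (23, "Family", "High"), (17, "Sports", "High"), (43, "Sports", "High"),
    (68, "Family", "Low"),  (32, "Truck",  "Low"),  (20, "Family", "High"),
]
print(round(gini_of_attribute(driving, 1, 2), 4))   # 0.2222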
Can we split Car Type with a binary split?

Car Type has three values, Sports, Family and Truck, which split into three branches.
We can split into two branches by grouping values, for example:

• {Sports, Family} and {Truck}
• {Sports, Truck} and {Family}
• {Truck, Family} and {Sports}

For each grouping we calculate the Gini value and pick the grouping with the lowest Gini index.
Car Type 1:
Risk   Sports, Family  Truck
High   4               0
Low    1               1

Gini(Sp, Fa)   = 1 - (4/5)² - (1/5)² = 1 - 0.64 - 0.04 = 0.32
Gini(Truck)    = 1 - (0/1)² - (1/1)² = 0
Gini(CarType1) = (5/6)·0.32 + (1/6)·0 = 0.2667

Car Type 2:
Risk   Sports  Family, Truck
High   2       2
Low    0       2

Gini(Fa, Tr)   = 1 - (2/4)² - (2/4)² = 1 - 0.25 - 0.25 = 0.5
Gini(Sports)   = 1 - (2/2)² - (0/2)² = 0
Gini(CarType2) = (2/6)·0 + (4/6)·0.5 = 0.3333

Car Type 3:
Risk   Sports, Truck  Family
High   2              2
Low    1              1

Gini(Sp, Tr)   = 1 - (2/3)² - (1/3)² = 1 - 0.4444 - 0.1111 = 0.4445
Gini(Family)   = 1 - (2/3)² - (1/3)² = 0.4445
Gini(CarType3) = (3/6)·0.4445 + (3/6)·0.4445 = 0.4445
Which split is better?

Split      Gini
CarType1   0.2667
CarType2   0.3333
CarType3   0.4445

The best split is CarType1 ({Sports, Family} vs {Truck}) because it has the lowest Gini index.
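One way to automate this grouping search (a sketch only; it reuses gini and the driving list from the previous snippets and simply enumerates every non-empty proper subset of values as the left branch):

from itertools import combinations

def best_binary_split(rows, attr_index, target_index):
    values = sorted(set(r[attr_index] for r in rows))
    total = len(rows)
    best = None
    for k in range(1, len(values)):
        for left in combinations(values, k):
            left = set(left)
            left_labels  = [r[target_index] for r in rows if r[attr_index] in left]
            right_labels = [r[target_index] for r in rows if r[attr_index] not in left]
            score = (len(left_labels) / total) * gini(left_labels) \
                  + (len(right_labels) / total) * gini(right_labels)
            if best is None or score < best[0]:
                best = (score, left)
    return best

score, group = best_binary_split(driving, 1, 2)
print(round(score, 4), group)   # 0.2667 {'Truck'}  i.e. the {Sports, Family} vs {Truck} split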
day outlook temp humidity wind play
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No

Gini Index for Outlook

Outlook   Yes  No  Number of instances
Sunny     2    3   5
Overcast  4    0   4
Rain      3    2   5

Gini(Outlook=Sunny)    = 1 - (2/5)² - (3/5)² = 1 - 0.16 - 0.36 = 0.48
Gini(Outlook=Overcast) = 1 - (4/4)² - (0/4)² = 0
Gini(Outlook=Rain)     = 1 - (3/5)² - (2/5)² = 1 - 0.36 - 0.16 = 0.48

Then we calculate the weighted sum of the Gini indexes for the Outlook feature:
Gini(Outlook) = (5/14)·0.48 + (4/14)·0 + (5/14)·0.48 = 0.171 + 0 + 0.171 = 0.342
Gini Index for Temperature

Temperature is also a nominal feature and can take 3 different values: Cool, Hot and Mild. Let's summarize the decisions for the Temperature feature.

Temperature  Yes  No  Number of instances
Hot          2    2   4
Cool         3    1   4
Mild         4    2   6

Gini(Temp=Hot)  = 1 - (2/4)² - (2/4)² = 0.5
Gini(Temp=Cool) = 1 - (3/4)² - (1/4)² = 1 - 0.5625 - 0.0625 = 0.375
Gini(Temp=Mild) = 1 - (4/6)² - (2/6)² = 1 - 0.444 - 0.111 = 0.445

The weighted sum of the Gini indexes for the Temperature feature:
Gini(Temp) = (4/14)·0.5 + (4/14)·0.375 + (6/14)·0.445 = 0.142 + 0.107 + 0.190 = 0.439
Gini Index for Humidity

Humidity  Yes  No  Number of instances
High      3    4   7
Normal    6    1   7

Gini(Humidity=High)   = 1 - (3/7)² - (4/7)² = 1 - 0.184 - 0.327 = 0.489
Gini(Humidity=Normal) = 1 - (6/7)² - (1/7)² = 1 - 0.735 - 0.020 = 0.244

The weighted sum for the Humidity feature:
Gini(Humidity) = (7/14)·0.489 + (7/14)·0.244 = 0.367
Gini Index for Wind

Wind    Yes  No  Number of instances
Weak    6    2   8
Strong  3    3   6

Gini(Wind=Weak)   = 1 - (6/8)² - (2/8)² = 1 - 0.5625 - 0.0625 = 0.375
Gini(Wind=Strong) = 1 - (3/6)² - (3/6)² = 1 - 0.25 - 0.25 = 0.5

Gini(Wind) = (8/14)·0.375 + (6/14)·0.5 = 0.428
Which attribute is best?

We have calculated the Gini index for each feature. The winner is the Outlook feature because its cost is the lowest.

Feature      Gini index
Outlook      0.342
Temperature  0.439
Humidity     0.367
Wind         0.428

The Outlook attribute will be at the root of the tree.
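Applying the gini_of_attribute sketch from earlier to the Play Tennis rows (tuples of Outlook, Temp, Humidity, Wind, Play; my own encoding of the table) reproduces this ranking:

tennis = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
for i, name in enumerate(["Outlook", "Temperature", "Humidity", "Wind"]):
    print(name, round(gini_of_attribute(tennis, i, 4), 3))
# Outlook 0.343, Temperature 0.44, Humidity 0.367, Wind 0.429  -> Outlook is lowest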
Root node of the tree

Attribute for Sunny branch

Day Outlook Temp. Humidity Wind Decision

1 Sunny Hot High Weak No

2 Sunny Hot High Strong No

8 Sunny Mild High Weak No

9 Sunny Cool Normal Weak Yes

11 Sunny Mild Normal Strong Yes

Gini of Temperature for Sunny outlook

Temperature  Yes  No  Number of instances
Hot          0    2   2
Cool         1    0   1
Mild         1    1   2

Gini(Outlook=Sunny and Temp=Hot)  = 1 - (0/2)² - (2/2)² = 0
Gini(Outlook=Sunny and Temp=Cool) = 1 - (1/1)² - (0/1)² = 0
Gini(Outlook=Sunny and Temp=Mild) = 1 - (1/2)² - (1/2)² = 1 - 0.25 - 0.25 = 0.5

Gini(Outlook=Sunny and Temp) = (2/5)·0 + (1/5)·0 + (2/5)·0.5 = 0.2
Gini of Humidity for Sunny outlook

Humidity  Yes  No  Number of instances
High      0    3   3
Normal    2    0   2

Gini(Outlook=Sunny and Humidity=High)   = 1 - (0/3)² - (3/3)² = 0
Gini(Outlook=Sunny and Humidity=Normal) = 1 - (2/2)² - (0/2)² = 0

Gini(Outlook=Sunny and Humidity) = (3/5)·0 + (2/5)·0 = 0
Gini of Wind for Sunny outlook

Wind    Yes  No  Number of instances
Weak    1    2   3
Strong  1    1   2

Gini(Outlook=Sunny and Wind=Weak)   = 1 - (1/3)² - (2/3)² = 0.444
Gini(Outlook=Sunny and Wind=Strong) = 1 - (1/2)² - (1/2)² = 0.5

Gini(Outlook=Sunny and Wind) = (3/5)·0.444 + (2/5)·0.5 = 0.466
Decision for Sunny outlook

We have calculated the Gini index scores for each feature when Outlook is Sunny. The winner is Humidity because it has the lowest value.

Feature      Gini index
Temperature  0.2
Humidity     0
Wind         0.466
Rain outlook

Day Outlook Temp. Humidity Wind Decision


4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
10 Rain Mild Normal Weak Yes
14 Rain Mild High Strong No

Gini of Temperature for Rain outlook

Temperature  Yes  No  Number of instances
Cool         1    1   2
Mild         2    1   3

Gini(Outlook=Rain and Temp=Cool) = 1 - (1/2)² - (1/2)² = 0.5
Gini(Outlook=Rain and Temp=Mild) = 1 - (2/3)² - (1/3)² = 0.444

Gini(Outlook=Rain and Temp) = (2/5)·0.5 + (3/5)·0.444 = 0.466
Gini of Humidity for Rain outlook

Humidity  Yes  No  Number of instances
High      1    1   2
Normal    2    1   3

Gini(Outlook=Rain and Humidity=High)   = 1 - (1/2)² - (1/2)² = 0.5
Gini(Outlook=Rain and Humidity=Normal) = 1 - (2/3)² - (1/3)² = 0.444

Gini(Outlook=Rain and Humidity) = (2/5)·0.5 + (3/5)·0.444 = 0.466
Gini of Wind for Rain outlook

Wind    Yes  No  Number of instances
Weak    3    0   3
Strong  0    2   2

Gini(Outlook=Rain and Wind=Weak)   = 1 - (3/3)² - (0/3)² = 0
Gini(Outlook=Rain and Wind=Strong) = 1 - (0/2)² - (2/2)² = 0

Gini(Outlook=Rain and Wind) = (3/5)·0 + (2/5)·0 = 0
Decision for Rain outlook

The winner for the Rain outlook is the Wind feature because it has the minimum Gini index score among the features.

Feature      Gini index
Temperature  0.466
Humidity     0.466
Wind         0



Procedure used in CART and ID3 algorithm is same the only difference is the metric
used to
Calculate impurity at every stage.

Incase of ID3 entropy is used and in CART gini index is used.

Although it is possible to create a multiway tree using CART, it is most preferred for
Binary Tree

Tree created using ID3 and CART may not be same.

U.A. Nuli, Textile and Engineering Institute, Ichalkaranji


89
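As a practical note (not from the slides): scikit-learn's DecisionTreeClassifier builds CART-style binary trees, and switching criterion between "gini" and "entropy" mirrors the distinction described above. A minimal sketch, assuming scikit-learn is installed:

from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

X = [["Sunny", "Hot", "High", "Weak"], ["Overcast", "Hot", "High", "Weak"],
     ["Rain", "Mild", "High", "Weak"], ["Sunny", "Cool", "Normal", "Weak"]]
y = ["No", "Yes", "Yes", "Yes"]

X_enc = OrdinalEncoder().fit_transform(X)           # the tree needs numeric inputs
clf = DecisionTreeClassifier(criterion="entropy")   # or criterion="gini"
clf.fit(X_enc, y)
print(clf.predict(X_enc[:1]))                       # ['No']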
Advantages of Decision Tree

1. Easy to understand: Decision tree output is very easy to understand, even for people from a non-analytical background. It does not require any statistical knowledge to read and interpret it. Its graphical representation is very intuitive, and users can easily relate it to their hypotheses.
2. Useful in data exploration: A decision tree is one of the fastest ways to identify the most significant variables and the relations between two or more variables. With the help of decision trees, we can create new variables/features that have better power to predict the target variable. You can refer to the article "Trick to enhance power of regression model" for one such trick. It can also be used in the data exploration stage; for example, when working on a problem with information available in hundreds of variables, a decision tree will help to identify the most significant ones.
3. Less data cleaning required: It requires less data cleaning compared to some other modeling techniques. It is fairly robust to outliers and missing values.
4. Data type is not a constraint: It can handle both numerical and categorical variables.
5. Non-parametric method: A decision tree is considered a non-parametric method. This means that decision trees make no assumptions about the space distribution or the classifier structure.
Disadvantages of Decision Tree

1. Overfitting: Overfitting is one of the most practical difficulties for decision tree models. This problem is addressed by setting constraints on the model parameters and by pruning.
2. Not a good fit for continuous variables: While working with continuous numerical variables, a decision tree loses information when it discretizes them into categories.