Decision Tree
U.A. Nuli
Introduction:
Tree-based learning algorithms are considered to be among the best and most widely used
supervised learning methods.
Tree-based methods empower predictive models with high accuracy, stability and ease
of interpretation. Unlike linear models, they map non-linear relationships quite well,
and they are adaptable to a wide range of problems.
Because decision trees are so easy to interpret, they are among the most widely used
data-mining methods in business analysis, medical decision-making, and policymaking.
Often, a decision tree is created automatically, and an expert uses it to understand the
key factors and then refines it to better match her beliefs.
This process allows machines to assist experts and to clearly show the reasoning process
so that individuals can judge the quality of the prediction.
Decision trees have been used in this manner for such wide-ranging applications as
customer profiling, financial risk analysis, assisted diagnosis, and traffic prediction.
What is a decision tree?
➢ A decision tree is a type of supervised learning algorithm mostly used for classification
problems.
➢ It provides rules for classifying data using attributes.
➢ The tree consists of decision nodes and leaf nodes.
➢ A decision node has two or more branches, each representing a value of the
attribute tested.
➢ A leaf node represents a homogeneous result (all records in one class), which does
not require additional classification testing.
► Every leaf (or terminal) node represents a value of the target attribute.
► To make a decision, the flow starts at the root node, navigates through the
arcs/edges until it reaches a leaf node, and then makes the decision
based on the leaf node's value.
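To make that root-to-leaf flow concrete, here is a minimal Python sketch; the nested-dict layout and the classify helper are illustrative choices, and the tree itself is the weather example built later in these notes:

# Minimal sketch: a decision tree as nested dicts (attribute -> {value: subtree}).
# Leaves are plain class labels; internal nodes test one attribute.
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Weak": "Yes", "Strong": "No"}},
    }
}

def classify(node, record):
    """Walk from the root, following the branch that matches the record's
    attribute value, until a leaf (a plain label) is reached."""
    while isinstance(node, dict):
        attribute = next(iter(node))               # attribute tested at this node
        node = node[attribute][record[attribute]]  # follow the matching branch
    return node

print(classify(tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # -> Yes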
Decision Tree Classification Task
[Figure: a classification model is induced from a Training Set of labeled records
(columns Tid, Attrib1, Attrib2, Attrib3, Class) and then applied to a Test Set whose
Class labels are unknown, e.g. Tid 11 (No, Small, 55K, ?) and Tid 15 (No, Large, 67K, ?).]
Decision Tree Example:
To create a tree, we need a root node first, and we know that nodes correspond to the
features/attributes (Outlook, Temperature, Humidity and Wind).
Which attribute should be placed at the root?
Answer: determine the attribute that best classifies the training data and use this attribute at
the root of the tree. Repeat this process for each branch.
How do we decide which attribute classifies best?
Answer: in ID3, use the attribute with the highest information gain.
We want to determine which attribute in a given set of attributes is most
useful for discriminating between the classes to be learned.
Information gain tells us how important a given attribute is relative to the other attributes.
We will use it to decide the ordering of attributes in the nodes of a decision tree.
The information gain (or entropy reduction) is the reduction in 'uncertainty' when
choosing an attribute.
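For reference, the standard ID3 definitions used in the calculations below are:

Entropy(S) = - Σk p(k) log2 p(k) over the classes k in S
(for two classes: - p(Yes) log2 p(Yes) - p(No) log2 p(No))

Gain(S, A) = Entropy(S) - Σv (|Sv| / |S|) Entropy(Sv),
where Sv is the subset of S for which attribute A has value v.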
All logs are with respect to base 2.
For the Outlook attribute:
E(Sunny) = - (2/5) log2(2/5) - (3/5) log2(3/5) = 0.971
E(Overcast) = - (4/4) log2(4/4) - (0/4) log2(0/4) = 0
Records with outlook = Rain:

day  outlook  temp  humidity  wind    play
D4   Rain     Mild  High      Weak    Yes
D5   Rain     Cool  Normal    Weak    Yes
D6   Rain     Cool  Normal    Strong  No
D10  Rain     Mild  Normal    Weak    Yes
D14  Rain     Mild  High      Strong  No

Total elements with outlook = Rain: 5 (Yes class = 3, No class = 2)

E(Rain) = - (3/5) log2(3/5) - (2/5) log2(2/5) = 0.971
[Figure: partial tree with Outlook at the root. The Overcast branch ends in a Yes leaf
(entropy 0); the Sunny and Rain branches each have entropy 0.971.]
The Sunny and Rain branches need further splitting because their entropy is greater than 0.
The Overcast branch will not be split because its entropy is 0.
The tree grows in this way until every branch reaches a leaf node with entropy 0.
Take the Sunny node as the parent node and the records with outlook = Sunny as the
parent data set (E(Sunny) = 0.971). Select the attribute with the highest gain as the
attribute for the child node of Outlook on the Sunny branch.
IG(Sunny, Temperature) = 0.571
IG(Sunny, Humidity) = 0.971
IG(Sunny, Wind) = 0.020
The highest gain is achieved by the Humidity attribute, hence the Sunny node
should be split on the basis of Humidity.
[Figure: tree after splitting the Sunny branch. Outlook at the root; the Sunny branch
splits on Humidity (High → No, Normal → Yes); the Overcast branch ends in a Yes leaf;
the Rain branch is not yet split.]
Next, take the Rain node as the parent node and the records with outlook = Rain
(tabulated earlier, E(Rain) = 0.971) as the parent data set.
IG(Rain, Temperature) = 0.020
IG(Rain, Humidity) = 0.020
IG(Rain, Wind) = 0.971
The highest gain is achieved by the Wind attribute, hence the Rain branch
should be split on the basis of the Wind attribute.
[Figure: final tree. Outlook at the root; Sunny → Humidity (High → No, Normal → Yes);
Overcast → Yes; Rain → Wind (Weak → Yes, Strong → No).]
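The entropy and gain numbers above can be checked with a short Python sketch; the dataset literal below is the standard 14-record play-tennis table assumed by these slides:

import math

# (outlook, temp, humidity, wind, play) for days D1..D14
data = [
    ("Sunny","Hot","High","Weak","No"),        ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"),    ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"),     ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"),("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"),    ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"),  ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"),  ("Rain","Mild","High","Strong","No"),
]
ATTRS = {"outlook": 0, "temp": 1, "humidity": 2, "wind": 3}  # column indices

def entropy(rows):
    """- sum_k p_k log2 p_k over the class labels (last column)."""
    n = len(rows)
    counts = {}
    for r in rows:
        counts[r[-1]] = counts.get(r[-1], 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(rows, attr):
    """Entropy reduction from splitting rows on attr."""
    i = ATTRS[attr]
    parts = {}
    for r in rows:
        parts.setdefault(r[i], []).append(r)
    remainder = sum(len(p) / len(rows) * entropy(p) for p in parts.values())
    return entropy(rows) - remainder

sunny = [r for r in data if r[0] == "Sunny"]
print(round(entropy(sunny), 3))                # 0.971
print(round(info_gain(sunny, "humidity"), 3))  # 0.971 -> split Sunny on Humidity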
The CART algorithm makes use of the Gini index to select the appropriate attribute for a node.
The Gini index (used in CART) measures the impurity of a data partition D:

Gini(D) = 1 - Σk pk², where pk is the probability of class k in D.

For example:
D = {Y,Y,Y,Y,N,N,N}: Gini(D) = 1 - (4/7)² - (3/7)² = 0.4898 (impure partition)
D = {Y,Y,Y,Y,Y,Y,Y}: Gini(D) = 1 - (7/7)² = 0 (pure partition)
We have three values, Sports, Family and Truck, for the CarType attribute, which splits
into three branches. We can also split into two branches by grouping certain values.
For each possible grouping we have to calculate the Gini value and pick the grouping
with the lowest Gini index, as worked out below.
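Taking CarType1 as the remaining binary grouping, {Sports, Family} vs {Truck} (its composition is an inference from the summary table below, where CarType1 = 0.2667):

Car Type1
Risk   Sports,Family   Truck
High   4               0
Low    1               1

Gini(Sp,Fa) = 1 - (4/5)² - (1/5)² = 0.32
Gini(Truck) = 1 - (0/1)² - (1/1)² = 0
Gini(CarType1) = (5/6)*0.32 + (1/6)*0 = 0.2667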
Car Type2
Risk   Sports   Family,Truck
High   2        2
Low    0        2

Gini(Sports) = 1 - (2/2)² - (0/2)² = 0
Gini(Fa,Tr) = 1 - (2/4)² - (2/4)² = 1 - 0.25 - 0.25 = 0.5
Gini(CarType2) = (2/6)*0 + (4/6)*0.5 = 0.3333
Car Type3
Risk   Sports,Truck   Family
High   2              2
Low    1              1

Gini(Sp,Tr) = 1 - (2/3)² - (1/3)² = 1 - 0.4444 - 0.1111 = 0.4445
Gini(Family) = 1 - (2/3)² - (1/3)² = 0.4445
Gini(CarType3) = (3/6)*0.4445 + (3/6)*0.4445 = 0.4445
Split      Gini
CarType1   0.2667
CarType2   0.3333
CarType3   0.4445

Hence, the best split is CarType1 because it has the lowest Gini index.
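A small Python sketch of this grouping search; the counts below are the CarType/Risk figures from the tables above:

# Risk counts per CarType value: (high, low), taken from the tables above.
counts = {"Sports": (2, 0), "Family": (2, 1), "Truck": (0, 1)}

def gini(high, low):
    """Gini impurity of a partition with the given class counts."""
    n = high + low
    return 1 - (high / n) ** 2 - (low / n) ** 2 if n else 0.0

def split_gini(group_a, group_b):
    """Weighted Gini of a binary split into two groups of CarType values."""
    parts = []
    for group in (group_a, group_b):
        h = sum(counts[v][0] for v in group)
        l = sum(counts[v][1] for v in group)
        parts.append((h + l, gini(h, l)))
    total = sum(n for n, _ in parts)
    return sum(n / total * g for n, g in parts)

print(round(split_gini({"Sports", "Family"}, {"Truck"}), 4))   # CarType1: 0.2667
print(round(split_gini({"Sports"}, {"Family", "Truck"}), 4))   # CarType2: 0.3333
print(round(split_gini({"Sports", "Truck"}, {"Family"}), 4))   # CarType3: 0.4444 (slides round to 0.4445)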
Yes/No counts for each feature value over the full play-tennis dataset:

Outlook       Yes   No   Instances
Sunny         2     3    5
Overcast      4     0    4
Rain          3     2    5

Temperature   Yes   No   Instances
Hot           2     2    4
Cool          3     1    4
Mild          4     2    6

Humidity      Yes   No   Instances
High          3     4    7
Normal        6     1    7

Wind          Yes   No   Instances
Weak          6     2    8
Strong        3     3    6

We've calculated the weighted Gini index value for each feature:

Feature       Gini
Outlook       0.342
Temperature   0.439
Humidity      0.367
Wind          0.428

The winner is the Outlook feature because its Gini index is the lowest, so the Outlook
attribute will be at the root of the tree.
Yes/No counts for each feature value when outlook = Sunny:

Temperature   Yes   No   Instances
Hot           0     2    2
Cool          1     0    1
Mild          1     1    2

Humidity      Yes   No   Instances
High          0     3    3
Normal        2     0    2

Wind          Yes   No   Instances
Weak          1     2    3
Strong        1     1    2

We've calculated the Gini index scores for each feature when outlook is Sunny:

Feature       Gini
Temperature   0.2
Humidity      0
Wind          0.466

The winner is Humidity because it has the lowest value.
Yes/No counts for Temperature when outlook = Rain:

Temperature   Yes   No   Instances
Cool          1     1    2
Mild          2     1    3

The winner for the Rain branch is the Wind feature because it has the minimum Gini index
score among the features (its weighted Gini is 0: the Weak records are all Yes and the
Strong records are all No).
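These per-feature Gini values can be reproduced directly from the count tables; a minimal Python sketch (the dictionaries transcribe the tables above, and the prints agree with the slide values up to rounding):

# Yes/No counts per feature value over the full dataset (tables above).
features = {
    "Outlook":     {"Sunny": (2, 3), "Overcast": (4, 0), "Rain": (3, 2)},
    "Temperature": {"Hot": (2, 2), "Cool": (3, 1), "Mild": (4, 2)},
    "Humidity":    {"High": (3, 4), "Normal": (6, 1)},
    "Wind":        {"Weak": (6, 2), "Strong": (3, 3)},
}

def weighted_gini(value_counts):
    """Weighted Gini of a split: sum over values of (size/total) * Gini(value subset)."""
    total = sum(yes + no for yes, no in value_counts.values())
    score = 0.0
    for yes, no in value_counts.values():
        size = yes + no
        score += (size / total) * (1 - (yes / size) ** 2 - (no / size) ** 2)
    return score

for name, counts in features.items():
    print(name, round(weighted_gini(counts), 3))
# Outlook 0.343, Temperature 0.44, Humidity 0.367, Wind 0.429
# (matches the slide values 0.342 / 0.439 / 0.367 / 0.428 up to rounding)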
The procedure used in the CART and ID3 algorithms is the same; the only difference is the
metric used to calculate impurity at every stage.
Although it is possible to create a multiway tree using CART, it is most often used to
build binary trees.
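In a library such as scikit-learn this difference is literally one parameter; a minimal sketch (the toy arrays are an illustrative encoding, not data from these slides):

from sklearn.tree import DecisionTreeClassifier

# Same fitting procedure, different impurity metric:
cart_style = DecisionTreeClassifier(criterion="gini")     # Gini impurity, as in CART
id3_style = DecisionTreeClassifier(criterion="entropy")   # entropy / information gain, as in ID3

X = [[0, 0], [1, 1], [0, 1], [1, 0]]  # toy encoded features (illustrative)
y = [0, 1, 1, 0]
cart_style.fit(X, y)
id3_style.fit(X, y)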
Advantages of Decision Tree
1. Easy to understand: Decision tree output is very easy to understand, even for people from a non-
analytical background. It does not require any statistical knowledge to read and interpret, and
its graphical representation is very intuitive, so users can easily relate it to their hypotheses.
2. Useful in data exploration: A decision tree is one of the fastest ways to identify the most significant
variables and the relations between two or more variables. With the help of decision trees, we can
create new variables/features that have better power to predict the target variable (you can refer
to the article "Trick to enhance power of regression model" for one such trick). It can also be used
in the data exploration stage: for example, when we are working on a problem with information
available in hundreds of variables, a decision tree will help to identify the most significant ones.
3. Less data cleaning required: It requires less data cleaning compared to some other modeling
techniques, and it is not influenced by outliers and missing values to a fair degree.
4. Data type is not a constraint: It can handle both numerical and categorical variables.
5. Non-parametric method: The decision tree is considered a non-parametric method. This
means that decision trees make no assumptions about the space distribution or the classifier
structure.
Disadvantages of Decision Tree
1. Overfitting: Overfitting is one of the most practical difficulties for
decision tree models. This problem can be addressed by setting constraints on
model parameters and by pruning, as sketched below.
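For instance, in scikit-learn these constraints are constructor parameters; a minimal sketch (the parameter values are illustrative, not tuned):

from sklearn.tree import DecisionTreeClassifier

# Constrain tree growth to reduce overfitting:
constrained = DecisionTreeClassifier(
    max_depth=3,           # cap the depth of the tree
    min_samples_leaf=5,    # require at least 5 records per leaf
    ccp_alpha=0.01,        # cost-complexity pruning strength
)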
2. Not fit for continuous variables: While working with continuous
numerical variables, the decision tree loses information when it categorizes
variables into different bins.