CSC354 Machine Learning
Dr Muhammad Sharjeel
Lecture 03: Decision Trees
The general aim of a Decision Tree (DT) is to build a model that can predict the class (or value) of a target variable by learning decision rules inferred from prior data (the training set)
In a DT, each internal node represents a feature (attribute), each link (branch) a decision (rule), and each leaf an outcome
DTs belong to the family of supervised learning algorithms
They can be used to solve both classification and regression problems
DTs are transparent algorithms: their decisions can be read and understood
Algorithm pseudocode
1. Place the best attribute of the dataset (complete training set) at the root of
the tree
2. Split the training set into subsets in such a way that each subset contains
data with the same value for an attribute
3. Repeat step 1 and step 2 on each subset until you find leaf nodes in all the
branches of the tree
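A minimal Python sketch of this recursive procedure (an illustration, not ID3 itself: the dataset is assumed to be a list of dicts, and `choose` is a placeholder for whichever attribute-selection metric is introduced below):

```python
from collections import Counter

def build_tree(rows, attributes, target, choose):
    """Recursively build a DT following steps 1-3 above.
    rows: list of dicts; choose(rows, attributes, target) returns the
    attribute to split on (information gain, gain ratio, gini, ...)."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                 # pure subset -> leaf node
        return labels[0]
    if not attributes:                        # nothing left to split on -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = choose(rows, attributes, target)   # step 1: pick the best attribute
    branches = {}
    for value in {r[best] for r in rows}:     # step 2: one subset per attribute value
        subset = [r for r in rows if r[best] == value]
        rest = [a for a in attributes if a != best]
        branches[value] = build_tree(subset, rest, target, choose)  # step 3: recurse
    return {best: branches}
```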
To create a DT:
Shortlist a root node from among all the nodes (nodes are the features/attributes of the dataset)
Determine the attribute that best classifies the training data and use it as the root
Repeat the process for each branch
Three implementations used to create DTs
ID3
C4.5
CART
ID3 (Iterative Dichotomiser) uses information gain as its metric
Dichotomisation means dividing something into two completely opposite things
ID3 iteratively divides attributes into two groups (the most dominant attribute vs the others) to construct the tree
The dominant attribute is selected based on information gain
ID3 performs a top-down, greedy search through the space of possible decision trees
Top-down means it starts building the tree from the top (the root)
Greedy means at each iteration it selects the best feature at the present moment to create a node
Which attribute (node) best classifies the training data?
The most dominant attribute is the one with the highest information gain
Information gain calculates the reduction in entropy
Entropy (uncertainty) of a dataset is the measure of disorder in the target attribute
Information gain indicates how well a given attribute/feature separates (or classifies) the target classes
The attribute with the highest information gain is selected as the best one
Entropy is a measure of the impurity or randomness in the values of the dataset
Low (or no) disorder implies a low level of impurity
For a binary target, entropy takes values between 0 and 1; a value of 1 signifies maximum disorder (impurity)
Formulae to calculate Entropy and Information Gain
Entropy(S) = – ∑ p(i) . log2(p(i))
Gain(S, A) = Entropy(S) – ∑v [ (|Sv|/|S|) . Entropy(Sv) ], summing over the values v of attribute A
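A minimal Python sketch of both formulas (assuming the dataset is a list of dicts as in the earlier sketch; function names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = - sum p(i) . log2 p(i) over the target classes."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over values v of A of
    (|S_v|/|S|) . Entropy(S_v)."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder
```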
Compute the entropy [Entropy(S)] for the entire dataset
For each attribute/feature:
Calculate entropy [Entropy(A)] for each value of the attribute
Calculate average information entropy (IE) for the attribute
Calculate information gain (IG) for the attribute
Pick the highest gain attribute
Repeat until the complete tree is formed
Example dataset, 14 instances, 4 input attributes
No. Outlook Temperature Humidity Wind PlayGolf
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
Compute the entropy [Entropy(S)] for the entire dataset
Entropy(S) = – p(Yes).log2(p(Yes)) – p(No).log2(p(No))
Entropy(S) = – (9/14).log2(9/14) – (5/14).log2(5/14) = 0.940
For each attribute/feature (say, Outlook):
Calculate the entropy [Entropy(A)] for each value of the attribute, i.e., for Outlook: 'Sunny', 'Rain', 'Overcast'
Outlook = Sunny: PlayGolf = No, No, No, Yes, Yes
Outlook = Rain: PlayGolf = Yes, Yes, No, Yes, No
Outlook = Overcast: PlayGolf = Yes, Yes, Yes, Yes
Calculations for Outlook
Value     Positive  Negative  Entropy
Sunny     2         3         0.971
Rain      3         2         0.971
Overcast  4         0         0
Entropy(Sunny) = –(2/5).log2(2/5) – (3/5).log2(3/5)
              = –(0.4).(–1.322) – (0.6).(–0.737)
              = 0.5288 + 0.4422 = 0.971
For each attribute/feature:
Calculate average information entropy (IE) for the attribute (i.e., Outlook)
IE(Outlook) = ((2+3)/(9+5))*0.971 + ((3+2)/(9+5))*0.971 + ((4+0)/(9+5))*0
IE(Outlook) = 0.693
Calculate information gain (IG) for the attribute (i.e., Outlook)
IG(Outlook) = 0.940 – 0.693 = 0.247
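These numbers can be verified with a short standalone snippet (the helper `H` is illustrative):

```python
import math

def H(pos, neg):
    """Binary entropy of a (pos, neg) count pair; 0.log2(0) is taken as 0."""
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

S = H(9, 5)                                                  # 0.940
# Outlook: Sunny (2+, 3-), Rain (3+, 2-), Overcast (4+, 0-)
IE = (5/14) * H(2, 3) + (5/14) * H(3, 2) + (4/14) * H(4, 0)  # 0.693
print(round(S, 3), round(IE, 3), round(S - IE, 3))           # 0.94 0.693 0.247
```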
Pick the highest gain attribute, in this case, Outlook
Attribute     Gain
Outlook       0.247
Temperature   0.029
Humidity      0.152
Wind          0.048
Outlook (Overcast) only contains examples of ‘Yes’
Outlook (Sunny, Rain) contains both ‘Yes’ and ‘No’ examples
Partial tree: Outlook at the root; Sunny → ?, Overcast → Yes, Rain → ?
Repeat until the complete tree is formed
Outlook (Overcast) only contains 'Yes' examples, so that branch becomes a leaf; the Sunny and Rain branches must be split further
Outlook = Sunny subset:
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes
Outlook = Rain subset:
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No
Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes
Entropy(S) = 0.971
Entropy(A)[Temperature](Cool) = 0
Entropy(A)[Temperature](Hot) = 0
Entropy(A)[Temperature](Mild) = 1
IE(Temperature) = 0.400
IG(Temperature) = 0.571
Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes
Entropy(S) = 0.971
Entropy(A)[Humidity](High) = 0
Entropy(A)[Humidity](Normal) = 0
IE(Humidity) = 0
IG(Humidity) = 0.971
Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes
Entropy(S) = 0.971
Entropy(A)[Wind](Strong) = 1
Entropy(A)[Wind](Weak) = 0.918
IE(Wind) = 0.951
IG(Wind) = 0.020
Pick the highest gain attribute, in this case, Humidity
Partial tree: Outlook at the root; Sunny → Humidity (Normal → Yes, High → No), Overcast → Yes, Rain → ?
Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No
Entropy(S) = 0.971
Entropy(A)[Temperature](Cool) = 1
Entropy(A)[Temperature](Mild) = 0.918
IE(Temperature) = 0.951
IG(Temperature) = 0.020
Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No
Entropy(S) = 0.971
Entropy(A)[Humidity](High) = 1
Entropy(A)[Humidity](Normal) = 0.918
IE(Humidity) = 0.951
IG(Humidity) = 0.020
Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No
Entropy(S) = 0.971
Entropy(A)[Wind](Weak) = 0
Entropy(A)[Wind](Strong) = 0
IE(Wind) = 0
IG(Wind) = 0.971
Pick the highest gain attribute, in this case, Wind
Final tree: Outlook at the root; Sunny → Humidity (Normal → Yes, High → No), Overcast → Yes, Rain → Wind (Weak → Yes, Strong → No)
Use the final DT (ID3) to classify an unseen example
Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong
Output = No
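Read as nested conditionals, the final tree is just the following sketch (the function name is illustrative; Temperature does not appear because the tree never splits on it):

```python
def play_golf(outlook, humidity, wind):
    """Classify an example with the final ID3 tree."""
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    return "Yes" if wind == "Weak" else "No"   # outlook == "Rain"

print(play_golf("Sunny", "High", "Strong"))    # -> No
```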
Shortcomings of ID3
Information gain favours the attribute whose selection yields the largest reduction in entropy
This biases it toward attributes with a large number of distinct values, which can lead to overfitting
The tree grows deeper and deeper (building many branches) to reduce the training error, but this increases the test error
Overfitting: the model fits the training data well but fails to generalize
Underfitting: the model is too simple to find the patterns in the data
Improving ID3
Pruning is a mechanism that reduces the size and complexity of a DT by removing unnecessary nodes
Pre-pruning stops the tree construction a bit early:
Do not split a node if its goodness measure is below a threshold value
Post-pruning: once a DT is complete, cross-validation is performed to test whether expanding a node makes an improvement
If it shows an improvement, continue expanding the node
If it shows a reduction in accuracy, the node is converted to a leaf node
To overcome the problems with information gain, the information gain ratio is used (C4.5)
C4.5 is the improved version of ID3
It creates more generalized models
Works with continuous data
Can handle missing data
Avoids overfitting
Also known as J48 (an implementation of C4.5 release 8)
Uses the information gain ratio as the metric to split the dataset
Information gain (used in ID3) tends to prefer attributes with more categories
Such attributes tend to have lower entropy
This results in overfitting
The gain ratio mitigates this issue by penalising attributes with more categories
It does so using split information (or intrinsic information)
Information gain ratio
GainRatio(A) = Gain(A) / SplitInfo(A)
Split information
SplitInfo(A) = – ∑j (|Dj|/|D|) . log2(|Dj|/|D|)
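A minimal sketch of split information as a standalone helper (taking partition sizes directly; checked against the Outlook numbers worked out below):

```python
import math

def split_info(sizes):
    """SplitInfo(A) = - sum_j (|Dj|/|D|) . log2(|Dj|/|D|);
    `sizes` are the partition sizes |Dj| induced by attribute A."""
    n = sum(sizes)
    return -sum((s / n) * math.log2(s / n) for s in sizes)

# Outlook partitions the 14 examples into Sunny=5, Overcast=4, Rain=5
si = split_info([5, 4, 5])     # 1.577
print(round(0.247 / si, 3))    # 0.157 with rounded inputs; the slides report 0.156
```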
Example dataset: the same 14-instance PlayGolf dataset used for ID3 above
Split information for the Outlook attribute
Sunny = 5, Overcast = 4, Rain = 5
SplitInfo(Outlook) = - (5/14).log2(5/14) - (4/14).log2(4/14) - (5/14).log2(5/14) = 1.577
GainRatio(Outlook) = 0.247/1.577 = 0.156
The entropy of the whole dataset, the Outlook attribute entropy, and the information gain of Outlook were already calculated (ID3):
Entropy(S) = 0.940
IE(Outlook) = 0.693
IG(Outlook) = 0.940 – 0.693 = 0.247
Gain ratio for Temperature attribute
Hot = 4, Mild = 6, Cool = 4
SplitInfo(Temperature) = - (4/14).log2(4/14) - (6/14).log2(6/14) - (4/14).log2(4/14) = 1.556
GainRatio(Temperature) = 0.029/1.556 = 0.018
Gain ratio for Humidity attribute
High = 7, Normal = 7
SplitInfo(Humidity) = - (7/14).log2(7/14) - (7/14).log2(7/14) = 1
GainRatio(Humidity) = 0.152/1 = 0.152
Gain ratio for Wind attribute
Weak = 8, Strong = 6
SplitInfo(Wind) = - (8/14).log2(8/14) - (6/14).log2(6/14) = 0.985
GainRatio(Wind) = 0.048/0.985 = 0.048
Gain ratio of Outlook is the highest, so it will be the root node
Partial tree: Outlook at the root; Overcast → Yes; the Sunny and Rain branches must be split further (same subsets as in the ID3 example)
Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes
GainRatio(Temperature) = 0.571/1.521 = 0.375
GainRatio(Humidity) = 0.971/0.971 = 1
GainRatio(Wind) = 0.020/0.971 = 0.021
Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No
GainRatio(Temperature) = 0.020/0.971 = 0.020
GainRatio(Humidity) = 0.020/0.971 = 0.020
GainRatio(Wind) = 0.971/0.971 = 1
Final DT using C4.5
Final tree (same as the ID3 tree): Outlook at the root; Sunny → Humidity (Normal → Yes, High → No), Overcast → Yes, Rain → Wind (Weak → Yes, Strong → No)
Use the final DT (C4.5) to classify an unseen example
Outlook = Rain, Temperature = Cool, Humidity = High, Wind = Weak
Output = Yes
Some drawbacks of C4.5
The split ratio is higher for multi-valued attributes (more outcomes)
It tends to prefer unbalanced splits in which one partition is much smaller than the others
Classification And Regression Tree (CART) uses the gini index as its metric
If a dataset D contains examples from n classes, the gini index is defined as
Gini(D) = 1 – ∑ (p_i)^2, for i = 1 to n (number of classes)
CART creates a binary tree
If an attribute has more than two outcomes, the gini index of a binary split of D into D1 and D2 is
Gini_A(D) = (|D1|/|D|).Gini(D1) + (|D2|/|D|).Gini(D2)
Reduction in impurity
Gini(A) = Gini(D) – Gini_A(D)
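A minimal Python sketch of these formulas (class counts are passed directly; the example values anticipate the Outlook calculation below):

```python
def gini(counts):
    """Gini(D) = 1 - sum_i (p_i)^2, with `counts` the class frequencies."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def gini_split(d1, d2):
    """Weighted gini of a binary split into partitions with class counts d1, d2."""
    n = sum(d1) + sum(d2)
    return (sum(d1) / n) * gini(d1) + (sum(d2) / n) * gini(d2)

print(round(gini([9, 5]), 3))                 # 0.459 for the whole dataset
# Outlook split {Sunny, Rain} vs {Overcast}: (5+, 5-) vs (4+, 0-)
print(round(gini_split([5, 5], [4, 0]), 3))   # 0.357, the best binary split
```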
Example dataset: the same 14-instance PlayGolf dataset as before
Total 14 examples, 9 positive, 5 negative
Gini(D) = 1 – ((9/14)^2 + (5/14)^2) = 0.459
Compute the gini index of each attribute
Start with Outlook (Sunny, Overcast, Rain)
The attribute has three values, so it has 6 possible subsets
{(Sunny, Overcast), (Overcast, Rain), (Sunny, Rain), (Sunny), (Overcast), (Rain)}
The empty set and the full set are not used
Gini({Sunny,Overcast} | {Rain}) = (9/14) x [1 – ((6/9)^2 + (3/9)^2)] + (5/14) x [1 – ((3/5)^2 + (2/5)^2)] = 0.457
Gini({Overcast,Rain} | {Sunny}) = (9/14) x [1 – ((7/9)^2 + (2/9)^2)] + (5/14) x [1 – ((2/5)^2 + (3/5)^2)] = 0.393
Gini({Sunny,Rain} | {Overcast}) = (10/14) x [1 – ((5/10)^2 + (5/10)^2)] + (4/14) x [1 – ((4/4)^2 + (0/4)^2)] = 0.357
Gini(Outlook) = 0.459 – 0.357 = 0.102
Temperature (Hot, Mild, Cool)
The attribute has three values, so it has 6 possible subsets
{(Hot, Mild), (Hot, Cool), (Mild, Cool), (Hot), (Mild), (Cool)}
Gini({Hot,Mild} | {Cool}) = (10/14) x [1 – ((6/10)^2 + (4/10)^2)] + (4/14) x [1 – ((3/4)^2 + (1/4)^2)] = 0.450
Gini({Hot,Cool} | {Mild}) = (8/14) x [1 – ((5/8)^2 + (3/8)^2)] + (6/14) x [1 – ((4/6)^2 + (2/6)^2)] = 0.458
Gini({Mild,Cool} | {Hot}) = (10/14) x [1 – ((7/10)^2 + (3/10)^2)] + (4/14) x [1 – ((2/4)^2 + (2/4)^2)] = 0.442
Gini(Temperature) = 0.459 – 0.442 = 0.016
Humidity (High, Normal)
The attribute has only two values
Gini({Normal} | {High}) = (7/14) x [1 – ((6/7)^2 + (1/7)^2)] + (7/14) x [1 – ((3/7)^2 + (4/7)^2)] = 0.367
Gini(Humidity) = 0.459 – 0.367 = 0.091
Wind (Weak, Strong)
The attribute has only two values
Gini({Weak} | {Strong}) = (8/14) x [1 – ((6/8)^2 + (2/8)^2)] + (6/14) x [1 – ((3/6)^2 + (3/6)^2)] = 0.428
Gini(Wind) = 0.459 – 0.428 = 0.030
The attribute with the highest reduction in impurity is Outlook, hence it is chosen as the root node
Within Outlook, the binary split [(Sunny, Rain), (Overcast)] has the lowest gini index
Partial DT using CART
Partial tree: Outlook at the root; branch (Sunny, Rain) → ? (subset below), branch Overcast → Yes
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Rain Mild High Strong No
Calculate the gini index for the following subset Outlook (Sunny, Rain)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Rain Mild High Strong No
Information Gain: biased toward attributes with many distinct values (high branching)
Gain Ratio: prefers unbalanced splits in which one partition is much smaller than the others
Gini Index: favours balanced partitions (its value is bounded by 0.5 for a binary target)
C4.5 with continuous (numeric) data
Example dataset, 14 instances, 4 input attributes, 2 attributes with continuous data
No. Outlook Temperature Humidity Wind PlayGolf
1 Sunny 85 85 Weak No
2 Sunny 80 90 Strong No
3 Overcast 83 78 Weak Yes
4 Rain 70 96 Weak Yes
5 Rain 68 80 Weak Yes
6 Rain 65 70 Strong No
7 Overcast 64 65 Strong Yes
8 Sunny 72 95 Weak No
9 Sunny 69 70 Weak Yes
10 Rain 75 80 Weak Yes
11 Sunny 75 70 Strong Yes
12 Overcast 72 90 Strong Yes
13 Overcast 81 75 Weak Yes
14 Rain 71 80 Strong No
Outlook and Wind are nominal attributes
Gain ratio for Wind = 0.048
Gain ratio for Outlook = 0.156
Humidity and Temperature are continuous attributes
Convert continuous values to nominal ones
Perform binary split based on a threshold value
Threshold should be a value which offers maximum gain for an attribute
Separate the dataset into two parts:
Instances less than or equal to the threshold (<=)
Instances greater than the threshold (>)
How?
Sort the attribute values in ascending order
Calculate the gain ratio for every value
The value which maximizes the gain is chosen as the threshold (separator)
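A minimal sketch of this threshold search (using the sorted Humidity column shown below; function names are illustrative):

```python
import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(v) / n) * math.log2(labels.count(v) / n)
                for v in set(labels))

def best_threshold(values, labels):
    """Try each distinct value (except the largest, which cannot split the
    data) as a <= / > cut point and return the one with maximum gain."""
    base, n, best = entropy(labels), len(labels), None
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if best is None or gain > best[1]:
            best = (t, gain)
    return best

humidity = [65, 70, 70, 70, 75, 78, 80, 80, 80, 85, 90, 90, 95, 96]
play = ["Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes",
        "No", "No", "No", "Yes", "No", "Yes"]
print(best_threshold(humidity, play))   # -> (80, ...): gain is maximum at 80
```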
Sort the Humidity values smallest to largest
Humidity PlayGolf
65 Yes
70 No
70 Yes
70 Yes
75 Yes
78 Yes
80 Yes
80 Yes
80 No
85 No
90 No
90 Yes
95 No
96 Yes
Humidity (65)
Entropy(Humidity<=65) = -(0/1).log2(0/1) – (1/1).log2(1/1) = 0
Entropy(Humidity>65) = -(5/13).log2(5/13) – (8/13).log2(8/13) = 0.961
Gain(Humidity<=,> 65) = 0.940 – (1/14).0 – (13/14).(0.961) = 0.048
SplitInfo(Humidity<=,> 65) = -(1/14).log2(1/14) -(13/14).log2(13/14) = 0.371
GainRatio(Humidity<=,> 65) = 0.126
Humidity (70)
Entropy(Humidity<=70) = – (1/4).log2(1/4) – (3/4).log2(3/4) = 0.811
Entropy(Humidity>70) = – (4/10).log2(4/10) – (6/10).log2(6/10) = 0.970
Gain(Humidity<=,> 70) = 0.940 – (4/14).(0.811) – (10/14).(0.970) = 0.014
SplitInfo(Humidity<=,> 70) = -(4/14).log2(4/14) -(10/14).log2(10/14) = 0.863
GainRatio(Humidity<=,> 70) = 0.016
GainRatio(Humidity<=,> 75) = 0.047
GainRatio(Humidity <=,> 78) = 0.090
GainRatio(Humidity <=,> 80) = 0.107
GainRatio(Humidity <=,> 85) = 0.027
GainRatio(Humidity <=,> 90) = 0.016
GainRatio(Humidity <=,> 95) = 0.128
No gain ratio is calculated for Humidity (96) because no value can be greater than it, so the split would leave one side empty
The gain is maximum when the threshold is Humidity (80)
Apply the same process to Temperature as its values are continuous too
Gain is maximum when the threshold is Temperature (83)
GainRatio(Temperature<=,> 83) = 0.305
The gain ratio for all the attributes is summarized in the following table
Attribute            GainRatio
Wind                 0.049
Outlook              0.155
Humidity <=,> 80     0.107
Temperature <=,> 83  0.305
Temperature will be the root node as it has the highest gain ratio value
Can you build the complete DT?
Famous DT implementations, with the year each was introduced:
CHAID = 1980
CART = 1984
ID3 = 1986
C4.5 = 1993
CHAID (CHi-square Automatic Interaction Detection)
Uses chi-square tests to find the most dominant feature
Checks whether there is a relationship between two variables and chooses the independent variable that has the strongest interaction with the dependent variable
Chi-square value of a cell = √((y – y')^2 / y'), where y is the actual and y' the expected value
How to construct a DT using CHAID?
Find the most dominant feature in the dataset
No. Outlook Temperature Humidity Wind Hour-Played
1 Sunny Hot High Weak 25
2 Sunny Hot High Strong 30
3 Overcast Hot High Weak 46
4 Rain Mild High Weak 45
5 Rain Cool Normal Weak 52
6 Rain Cool Normal Strong 23
7 Overcast Cool Normal Strong 43
8 Sunny Mild High Weak 35
9 Sunny Cool Normal Weak 38
10 Rain Mild Normal Weak 46
11 Sunny Mild Normal Strong 48
12 Overcast Mild High Strong 52
13 Overcast Hot Normal Weak 44
14 Rain Mild High Strong 30
Outlook
3 possible values (Sunny, Rain, and Overcast)
2 decisions (Yes and No)
Example cell: Chi-square(Yes, Sunny) = √((2 – 2.5)^2 / 2.5) = 0.316
Value     Yes  No  Total  Expected  Chi-square (Yes)  Chi-square (No)
Sunny     2    3   5      2.5       0.316             0.316
Rain      4    0   4      2         1.414             1.414
Overcast  3    2   5      2.5       0.316             0.316
Chi-square(Outlook) = 0.316+0.316+1.414+1.414+0.316+0.316 = 4.092
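The same computation as a standalone check (counts taken from the table above):

```python
import math

def chi(actual, expected):
    """Per-cell CHAID value: sqrt((y - y')^2 / y')."""
    return math.sqrt((actual - expected) ** 2 / expected)

# (Yes, No) counts for each Outlook value; expected count = row total / 2
counts = {"Sunny": (2, 3), "Rain": (4, 0), "Overcast": (3, 2)}
total = 0.0
for yes, no in counts.values():
    expected = (yes + no) / 2
    total += chi(yes, expected) + chi(no, expected)
print(round(total, 3))   # 4.093 (~4.092), the chi-square value for Outlook
```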
Outlook = 0.316+0.316+1.414+1.414+0.316+0.316 = 4.092
Temperature = 0 + 0 + 0.577 + 0.577 + 0.707 + 0.707 = 2.569
Humidity = 0.267 + 0.267 + 1.336 + 1.336 = 3.207
Wind = 0.802 + 0.802 + 0 + 0 = 1.604
Outlook has the highest chi-square value (most significant feature) and will be
the root node
Can you build the complete DT?
How to construct a DT when the output attribute is a numeric value?
Regression problems are solved by using the metric ‘standard deviation’
(Same 14-instance Hour-Played dataset as shown earlier)
Hour-Played = {25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30}
Average = 39.78
Standard deviation = 9.32
Outlook:
SD(Overcast) = 3.49
SD(Rain) = 10.87
SD(Sunny) = 7.78
Weighted SD(Outlook) = (4/14)x3.49 + (5/14)x10.87 + (5/14)x7.78 = 7.66
SD reduction (Outlook) = 9.32 – 7.66 = 1.66
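A standalone check of these numbers (population standard deviation, hours grouped by Outlook):

```python
import math

def sd(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

hours = {"Overcast": [46, 43, 52, 44],
         "Rain":     [45, 52, 23, 46, 30],
         "Sunny":    [25, 30, 35, 38, 48]}
all_hours = [h for group in hours.values() for h in group]
weighted = sum(len(g) / len(all_hours) * sd(g) for g in hours.values())
print(round(sd(all_hours), 2), round(weighted, 2),
      round(sd(all_hours) - weighted, 2))   # 9.32 7.66 1.66
```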
SD reduction for each attribute:
SD reduction (Outlook) = 9.32 – 7.66 = 1.66
SD reduction (Temperature) = 9.32 – 8.84 = 0.47
SD reduction (Humidity) = 9.32 – 9.04 = 0.27
SD reduction (Wind) = 9.32 – 9.03 = 0.29
Outlook will be the root node as it has the highest SD reduction value
Can you build the complete DT?
Thanks