CSC354 Machine Learning
Dr Muhammad Sharjeel
Lecture 03: Decision Trees
The general aim of a Decision Tree (DT) is to build a model that can predict the class (or value) of a target variable by learning decision rules inferred from prior data (the training set)
In a DT, each internal node represents a feature (attribute), each link (branch) a decision (rule), and each leaf an outcome
DTs belong to the family of supervised learning algorithms
They can be used to solve both classification and regression problems
DTs are transparent algorithms: their decisions can be read and understood
Algorithm pseudocode
1. Place the best attribute of the dataset (complete training set) at the root of
the tree
2. Split the training set into subsets in such a way that each subset contains
data with the same value for an attribute
3. Repeat step 1 and step 2 on each subset until you find leaf nodes in all the
branches of the tree
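A minimal Python sketch of this recursive procedure (an illustration, not ID3 itself: the dataset is assumed to be a list of dicts, and `choose` is a placeholder for whichever attribute-selection metric is introduced below):

```python
from collections import Counter

def build_tree(rows, attributes, target, choose):
    """Recursively build a DT following steps 1-3 above.
    rows: list of dicts; choose(rows, attributes, target) returns the
    attribute to split on (information gain, gain ratio, gini, ...)."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                 # pure subset -> leaf node
        return labels[0]
    if not attributes:                        # nothing left to split on -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    best = choose(rows, attributes, target)   # step 1: pick the best attribute
    branches = {}
    for value in {r[best] for r in rows}:     # step 2: one subset per attribute value
        subset = [r for r in rows if r[best] == value]
        rest = [a for a in attributes if a != best]
        branches[value] = build_tree(subset, rest, target, choose)  # step 3: recurse
    return {best: branches}
```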
To create a DT:
Shortlist a root node from among all the nodes (nodes are the features/attributes of the dataset)
Determine the attribute that best classifies the training data and use it as the root
Repeat the process for each branch
Three implementations used to create DTs
ID3
C4.5
CART
ID3 (Iterative Dichotomiser) uses information gain as its metric
Dichotomisation means dividing something into two completely opposite things
ID3 iteratively divides attributes into two groups (the most dominant attribute vs the others) to construct the tree
The dominant attribute is selected based on information gain
ID3 performs a top-down, greedy search through the space of possible decision trees
Top-down means it starts building the tree from the top (the root)
Greedy means at each iteration it selects the best feature at the present moment to create a node
Which attribute (node) best classifies the training data?
The most dominant attribute is the one with the highest information gain
Information gain calculates the reduction in entropy
Entropy (uncertainty) of a dataset is the measure of disorder in the target attribute
Information gain indicates how well a given attribute/feature separates (or classifies) the target classes
The attribute with the highest information gain is selected as the best one
Entropy is a measure of the impurity or randomness in the values of the dataset
Low (or no) disorder implies a low level of impurity
For a binary target, entropy takes values between 0 and 1; a value of 1 signifies maximum disorder (impurity)
Formulae to calculate Entropy and Information Gain
Entropy(S) = – ∑ p(i) . log2(p(i))
Gain(S, A) = Entropy(S) – ∑v [ (|Sv|/|S|) . Entropy(Sv) ], summing over the values v of attribute A
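A minimal Python sketch of both formulas (assuming the dataset is a list of dicts as in the earlier sketch; function names are illustrative):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = - sum p(i) . log2 p(i) over the target classes."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over values v of A of
    (|S_v|/|S|) . Entropy(S_v)."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += (len(subset) / n) * entropy(subset)
    return entropy([r[target] for r in rows]) - remainder
```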
Compute the entropy [Entropy(S)] for the entire dataset
For each attribute/feature:
Calculate entropy [Entropy(A)] for each value of the attribute
Calculate average information entropy (IE) for the attribute
Calculate information gain (IG) for the attribute
Pick the highest gain attribute
Repeat until the complete tree is formed
Example dataset, 14 instances, 4 input attributes
No. Outlook Temperature Humidity Wind PlayGolf
1 Sunny Hot High Weak No
2 Sunny Hot High Strong No
3 Overcast Hot High Weak Yes
4 Rain Mild High Weak Yes
5 Rain Cool Normal Weak Yes
6 Rain Cool Normal Strong No
7 Overcast Cool Normal Strong Yes
8 Sunny Mild High Weak No
9 Sunny Cool Normal Weak Yes
10 Rain Mild Normal Weak Yes
11 Sunny Mild Normal Strong Yes
12 Overcast Mild High Strong Yes
13 Overcast Hot Normal Weak Yes
14 Rain Mild High Strong No
Compute the entropy [Entropy(S)] for the entire dataset
Entropy(S) = – p(Yes).log2(p(Yes)) – p(No).log2(p(No))
Entropy(S) = – (9/14).log2(9/14) – (5/14).log2(5/14) = 0.940
For each attribute/feature (say, Outlook):
Calculate the entropy [Entropy(A)] for each value of the attribute, i.e., for Outlook: 'Sunny', 'Rain', 'Overcast'
Outlook = Sunny: PlayGolf = No, No, No, Yes, Yes
Outlook = Rain: PlayGolf = Yes, Yes, No, Yes, No
Outlook = Overcast: PlayGolf = Yes, Yes, Yes, Yes
Calculations for Outlook
Value     Positive  Negative  Entropy
Sunny     2         3         0.971
Rain      3         2         0.971
Overcast  4         0         0
Entropy(Sunny) = –(2/5).log2(2/5) – (3/5).log2(3/5)
              = –(0.4).(–1.322) – (0.6).(–0.737)
              = 0.5288 + 0.4422 = 0.971
For each attribute/feature:
Calculate average information entropy (IE) for the attribute (i.e., Outlook)
IE(Outlook) = ((2+3)/(9+5))*0.971 + ((3+2)/(9+5))*0.971 + ((4+0)/(9+5))*0
IE(Outlook) = 0.693
Calculate information gain (IG) for the attribute (i.e., Outlook)
IG(Outlook) = 0.940 – 0.693 = 0.247
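These numbers can be verified with a short standalone snippet (the helper `H` is illustrative):

```python
import math

def H(pos, neg):
    """Binary entropy of a (pos, neg) count pair; 0.log2(0) is taken as 0."""
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c)

S = H(9, 5)                                                  # 0.940
# Outlook: Sunny (2+, 3-), Rain (3+, 2-), Overcast (4+, 0-)
IE = (5/14) * H(2, 3) + (5/14) * H(3, 2) + (4/14) * H(4, 0)  # 0.693
print(round(S, 3), round(IE, 3), round(S - IE, 3))           # 0.94 0.693 0.247
```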
Pick the highest gain attribute, in this case, Outlook
Attribute     Gain
Outlook       0.247
Temperature   0.029
Humidity      0.152
Wind          0.048
Outlook (Overcast) only contains examples of ‘Yes’
Outlook (Sunny, Rain) contains both ‘Yes’ and ‘No’ examples
Partial tree: Outlook at the root; Sunny → ?, Overcast → Yes, Rain → ?
Repeat until the complete tree is formed
Outlook (Overcast) only contains 'Yes' examples, so that branch becomes a leaf; the Sunny and Rain branches must be split further
Outlook = Sunny subset:
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes
Outlook = Rain subset:
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No
Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes
Entropy(S) = 0.971
Entropy(A)[Temperature](Cool) = 0
Entropy(A)[Temperature](Hot) = 0
Entropy(A)[Temperature](Mild) = 1
IE(Temperature) = 0.400
IG(Temperature) = 0.571
Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes
Entropy(S) = 0.971
Entropy(A)[Humidity](High) = 0
Entropy(A)[Humidity](Normal) = 0
IE(Humidity) = 0
IG(Humidity) = 0.971
Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes
Entropy(S) = 0.971
Entropy(A)[Wind](Strong) = 1
Entropy(A)[Wind](Weak) = 0.918
IE(Wind) = 0.951
IG(Wind) = 0.020
Pick the highest gain attribute, in this case, Humidity
Partial tree: Outlook at the root; Sunny → Humidity (Normal → Yes, High → No), Overcast → Yes, Rain → ?
Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No
Entropy(S) = 0.971
Entropy(A)[Temperature](Cool) = 1
Entropy(A)[Temperature](Mild) = 0.918
IE(Temperature) = 0.951
IG(Temperature) = 0.020
Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No
Entropy(S) = 0.971
Entropy(A)[Humidity](High) = 1
Entropy(A)[Humidity](Normal) = 0.918
IE(Humidity) = 0.951
IG(Humidity) = 0.020
Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No
Entropy(S) = 0.971
Entropy(A)[Wind](Weak) = 0
Entropy(A)[Wind](Strong) = 0
IE(Wind) = 0
IG(Wind) = 0.971
Pick the highest gain attribute, in this case, Wind
Final tree: Outlook at the root; Sunny → Humidity (Normal → Yes, High → No), Overcast → Yes, Rain → Wind (Weak → Yes, Strong → No)
Use the final DT (ID3) to classify an unseen example
Outlook = Sunny, Temperature = Cool, Humidity = High, Wind = Strong
Output = No
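Read as nested conditionals, the final tree is just the following sketch (the function name is illustrative; Temperature does not appear because the tree never splits on it):

```python
def play_golf(outlook, humidity, wind):
    """Classify an example with the final ID3 tree."""
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    return "Yes" if wind == "Weak" else "No"   # outlook == "Rain"

print(play_golf("Sunny", "High", "Strong"))    # -> No
```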
Shortcomings of ID3
Information gain favours the attribute whose selection yields the largest reduction in entropy
This biases it toward attributes with a large number of distinct values, which can lead to overfitting
The tree grows deeper and deeper (building many branches) to reduce the training error, but this increases the test error
Overfitting: the model fits the training data well but fails to generalize
Underfitting: the model is too simple to find the patterns in the data
Improving ID3
Pruning is a mechanism that reduces the size and complexity of a DT by removing unnecessary nodes
Pre-pruning stops the tree construction a bit early:
Do not split a node if its goodness measure is below a threshold value
Post-pruning: once a DT is complete, cross-validation is performed to test whether expanding a node makes an improvement
If it shows an improvement, continue expanding the node
If it shows a reduction in accuracy, the node is converted to a leaf node
To overcome the problems with information gain, the information gain ratio is used (C4.5)
C4.5 is the improved version of ID3
It creates more generalized models
Works with continuous data
Can handle missing data
Avoids overfitting
Also known as J48 (an implementation of C4.5 release 8)
Uses the information gain ratio as the metric to split the dataset
Information gain (used in ID3) tends to prefer attributes with more categories
Such attributes tend to have lower entropy
This results in overfitting
The gain ratio mitigates this issue by penalising attributes with more categories
It does so using split information (or intrinsic information)
Information gain ratio
GainRatio(A) = Gain(A) / SplitInfo(A)
Split information
SplitInfo(A) = – ∑j (|Dj|/|D|) . log2(|Dj|/|D|)
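A minimal sketch of split information as a standalone helper (taking partition sizes directly; checked against the Outlook numbers worked out below):

```python
import math

def split_info(sizes):
    """SplitInfo(A) = - sum_j (|Dj|/|D|) . log2(|Dj|/|D|);
    `sizes` are the partition sizes |Dj| induced by attribute A."""
    n = sum(sizes)
    return -sum((s / n) * math.log2(s / n) for s in sizes)

# Outlook partitions the 14 examples into Sunny=5, Overcast=4, Rain=5
si = split_info([5, 4, 5])     # 1.577
print(round(0.247 / si, 3))    # 0.157 with rounded inputs; the slides report 0.156
```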
Example dataset: the same 14-instance PlayGolf dataset used for ID3 above
Split information for the Outlook attribute
Sunny = 5, Overcast = 4, Rain = 5
SplitInfo(Outlook) = - (5/14).log2(5/14) - (4/14).log2(4/14) - (5/14).log2(5/14) = 1.577
GainRatio(Outlook) = 0.247/1.577 = 0.156
The entropy of the whole dataset, the Outlook attribute entropy, and the information gain of Outlook were already calculated (ID3):
Entropy(S) = 0.940
IE(Outlook) = 0.693
IG(Outlook) = 0.940 – 0.693 = 0.247
Gain ratio for Temperature attribute
Hot = 4, Mild = 6, Cool = 4
SplitInfo(Temperature) = - (4/14).log2(4/14) - (6/14).log2(6/14) - (4/14).log2(4/14) = 1.556
GainRatio(Temperature) = 0.029/1.556 = 0.018
Gain ratio for Humidity attribute
High = 7, Normal = 7
SplitInfo(Humidity) = - (7/14).log2(7/14) - (7/14).log2(7/14) = 1
GainRatio(Humidity) = 0.152/1 = 0.152
Gain ratio for Wind attribute
Weak = 8, Strong = 6
SplitInfo(Wind) = - (8/14).log2(8/14) - (6/14).log2(6/14) = 0.985
GainRatio(Wind) = 0.048/0.985 = 0.048
Gain ratio of Outlook is the highest, so it will be the root node
Partial tree: Outlook at the root; Overcast → Yes; the Sunny and Rain branches must be split further (same subsets as in the ID3 example)
Outlook (Sunny)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Sunny Mild Normal Strong Yes
GainRatio(Temperature) = 0.571/1.521 = 0.375
GainRatio(Humidity) = 0.971/0.971 = 1
GainRatio(Wind) = 0.020/0.971 = 0.021
Outlook (Rain)
Outlook Temperature Humidity Wind PlayGolf
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Rain Mild Normal Weak Yes
Rain Mild High Strong No
GainRatio(Temperature) = 0.020/0.971 = 0.020
GainRatio(Humidity) = 0.020/0.971 = 0.020
GainRatio(Wind) = 0.971/0.971 = 1
Final DT using C4.5
Final tree (same as the ID3 tree): Outlook at the root; Sunny → Humidity (Normal → Yes, High → No), Overcast → Yes, Rain → Wind (Weak → Yes, Strong → No)
Use the final DT (C4.5) to classify an unseen example
Outlook = Rain, Temperature = Cool, Humidity = High, Wind = Weak
Output = Yes
Some drawbacks of C4.5
The split ratio is higher for multi-valued attributes (more outcomes)
It tends to prefer unbalanced splits in which one partition is much smaller than the others
Classification And Regression Tree (CART) uses the gini index as its metric
If a dataset D contains examples from n classes, the gini index is defined as
Gini(D) = 1 – ∑ (p_i)^2, for i = 1 to n (number of classes)
CART creates a binary tree
If an attribute has more than two outcomes, the gini index of a binary split of D into D1 and D2 is
Gini_A(D) = (|D1|/|D|).Gini(D1) + (|D2|/|D|).Gini(D2)
Reduction in impurity
Gini(A) = Gini(D) – Gini_A(D)
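A minimal Python sketch of these formulas (class counts are passed directly; the example values anticipate the Outlook calculation below):

```python
def gini(counts):
    """Gini(D) = 1 - sum_i (p_i)^2, with `counts` the class frequencies."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def gini_split(d1, d2):
    """Weighted gini of a binary split into partitions with class counts d1, d2."""
    n = sum(d1) + sum(d2)
    return (sum(d1) / n) * gini(d1) + (sum(d2) / n) * gini(d2)

print(round(gini([9, 5]), 3))                 # 0.459 for the whole dataset
# Outlook split {Sunny, Rain} vs {Overcast}: (5+, 5-) vs (4+, 0-)
print(round(gini_split([5, 5], [4, 0]), 3))   # 0.357, the best binary split
```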
Example dataset: the same 14-instance PlayGolf dataset as before
Total 14 examples, 9 positive, 5 negative
Gini(D) = 1 – ((9/14)^2 + (5/14)^2) = 0.459
Compute the gini index of each attribute
Start with Outlook (Sunny, Overcast, Rain)
The attribute has three values, so it has 6 possible subsets
{(Sunny, Overcast), (Overcast, Rain), (Sunny, Rain), (Sunny), (Overcast), (Rain)}
The empty set and the full set are not used
Gini({Sunny,Overcast} | {Rain}) = (9/14) x [1 – ((6/9)^2 + (3/9)^2)] + (5/14) x [1 – ((3/5)^2 + (2/5)^2)] = 0.457
Gini({Overcast,Rain} | {Sunny}) = (9/14) x [1 – ((7/9)^2 + (2/9)^2)] + (5/14) x [1 – ((2/5)^2 + (3/5)^2)] = 0.393
Gini({Sunny,Rain} | {Overcast}) = (10/14) x [1 – ((5/10)^2 + (5/10)^2)] + (4/14) x [1 – ((4/4)^2 + (0/4)^2)] = 0.357
Gini(Outlook) = 0.459 – 0.357 = 0.102
Temperature (Hot, Mild, Cool)
The attribute has three values, so it has 6 possible subsets
{(Hot, Mild), (Hot, Cool), (Mild, Cool), (Hot), (Mild), (Cool)}
Gini({Hot,Mild} | {Cool}) = (10/14) x [1 – ((6/10)^2 + (4/10)^2)] + (4/14) x [1 – ((3/4)^2 + (1/4)^2)] = 0.450
Gini({Hot,Cool} | {Mild}) = (8/14) x [1 – ((5/8)^2 + (3/8)^2)] + (6/14) x [1 – ((4/6)^2 + (2/6)^2)] = 0.458
Gini({Mild,Cool} | {Hot}) = (10/14) x [1 – ((7/10)^2 + (3/10)^2)] + (4/14) x [1 – ((2/4)^2 + (2/4)^2)] = 0.442
Gini(Temperature) = 0.459 – 0.442 = 0.016
Humidity (High, Normal)
The attribute has only two values
Gini({Normal} | {High}) = (7/14) x [1 – ((6/7)^2 + (1/7)^2)] + (7/14) x [1 – ((3/7)^2 + (4/7)^2)] = 0.367
Gini(Humidity) = 0.459 – 0.367 = 0.091
Wind (Weak, Strong)
The attribute has only two values
Gini({Weak} | {Strong}) = (8/14) x [1 – ((6/8)^2 + (2/8)^2)] + (6/14) x [1 – ((3/6)^2 + (3/6)^2)] = 0.428
Gini(Wind) = 0.459 – 0.428 = 0.030
The attribute with the highest reduction in impurity is Outlook, hence it is chosen as the root node
Within Outlook, the binary split [(Sunny, Rain), (Overcast)] has the lowest gini index
Partial DT using CART
Partial tree: Outlook at the root; branch (Sunny, Rain) → ? (subset below), branch Overcast → Yes
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Rain Mild High Strong No
Calculate the gini index for the following subset Outlook (Sunny, Rain)
Outlook Temperature Humidity Wind PlayGolf
Sunny Hot High Weak No
Sunny Hot High Strong No
Rain Mild High Weak Yes
Rain Cool Normal Weak Yes
Rain Cool Normal Strong No
Sunny Mild High Weak No
Sunny Cool Normal Weak Yes
Rain Mild Normal Weak Yes
Sunny Mild Normal Strong Yes
Rain Mild High Strong No
Information Gain: biased toward attributes with many distinct values (high branching)
Gain Ratio: prefers unbalanced splits in which one partition is much smaller than the others
Gini Index: favours balanced partitions (its value is bounded by 0.5 for a binary target)
C4.5 with continuous (numeric) data
Example dataset, 14 instances, 4 input attributes, 2 attributes with continuous data
No. Outlook Temperature Humidity Wind PlayGolf
1 Sunny 85 85 Weak No
2 Sunny 80 90 Strong No
3 Overcast 83 78 Weak Yes
4 Rain 70 96 Weak Yes
5 Rain 68 80 Weak Yes
6 Rain 65 70 Strong No
7 Overcast 64 65 Strong Yes
8 Sunny 72 95 Weak No
9 Sunny 69 70 Weak Yes
10 Rain 75 80 Weak Yes
11 Sunny 75 70 Strong Yes
12 Overcast 72 90 Strong Yes
13 Overcast 81 75 Weak Yes
14 Rain 71 80 Strong No
Outlook and Wind are nominal attributes
Gain ratio for Wind = 0.048
Gain ratio for Outlook = 0.156
Humidity and Temperature are continuous attributes
Convert continuous values to nominal ones
Perform binary split based on a threshold value
Threshold should be a value which offers maximum gain for an attribute
Separate the dataset into two parts:
Instances less than or equal to the threshold (<=)
Instances greater than the threshold (>)
How?
Sort the attribute values in ascending order
Calculate the gain ratio for every value
The value which maximizes the gain is chosen as the threshold (separator)
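A minimal sketch of this threshold search (using the sorted Humidity column shown below; function names are illustrative):

```python
import math

def entropy(labels):
    n = len(labels)
    return -sum((labels.count(v) / n) * math.log2(labels.count(v) / n)
                for v in set(labels))

def best_threshold(values, labels):
    """Try each distinct value (except the largest, which cannot split the
    data) as a <= / > cut point and return the one with maximum gain."""
    base, n, best = entropy(labels), len(labels), None
    for t in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= t]
        right = [l for v, l in zip(values, labels) if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if best is None or gain > best[1]:
            best = (t, gain)
    return best

humidity = [65, 70, 70, 70, 75, 78, 80, 80, 80, 85, 90, 90, 95, 96]
play = ["Yes", "No", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes",
        "No", "No", "No", "Yes", "No", "Yes"]
print(best_threshold(humidity, play))   # -> (80, ...): gain is maximum at 80
```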
Sort the Humidity values smallest to largest
Humidity PlayGolf
65 Yes
70 No
70 Yes
70 Yes
75 Yes
78 Yes
80 Yes
80 Yes
80 No
85 No
90 No
90 Yes
95 No
96 Yes
Humidity (65)
Entropy(Humidity<=65) = -(0/1).log2(0/1) – (1/1).log2(1/1) = 0
Entropy(Humidity>65) = -(5/13).log2(5/13) – (8/13).log2(8/13) = 0.961
Gain(Humidity<=,> 65) = 0.940 – (1/14).0 – (13/14).(0.961) = 0.048
SplitInfo(Humidity<=,> 65) = -(1/14).log2(1/14) -(13/14).log2(13/14) = 0.371
GainRatio(Humidity<=,> 65) = 0.126
Humidity (70)
Entropy(Humidity<=70) = – (1/4).log2(1/4) – (3/4).log2(3/4) = 0.811
Entropy(Humidity>70) = – (4/10).log2(4/10) – (6/10).log2(6/10) = 0.970
Gain(Humidity<=,> 70) = 0.940 – (4/14).(0.811) – (10/14).(0.970) = 0.014
SplitInfo(Humidity<=,> 70) = -(4/14).log2(4/14) -(10/14).log2(10/14) = 0.863
GainRatio(Humidity<=,> 70) = 0.016
GainRatio(Humidity<=,> 75) = 0.047
GainRatio(Humidity <=,> 78) = 0.090
GainRatio(Humidity <=,> 80) = 0.107
GainRatio(Humidity <=,> 85) = 0.027
GainRatio(Humidity <=,> 90) = 0.016
GainRatio(Humidity <=,> 95) = 0.128
No gain ratio is calculated for Humidity (96) because no value can be greater than it, so the split would leave one side empty
The gain is maximum when the threshold is Humidity (80)
Apply the same process to Temperature as its values are continuous too
Gain is maximum when the threshold is Temperature (83)
GainRatio(Temperature<=,> 83) = 0.305
The gain ratio for all the attributes is summarized in the following table
Attribute            GainRatio
Wind                 0.049
Outlook              0.155
Humidity <=,> 80     0.107
Temperature <=,> 83  0.305
Temperature will be the root node as it has the highest gain ratio value
Can you build the complete DT?
Famous DT implementations, with the year each was introduced:
CHAID = 1980
CART = 1984
ID3 = 1986
C4.5 = 1993
CHAID (CHi-square Automatic Interaction Detection)
Uses chi-square tests to find the most dominant feature
Checks whether there is a relationship between two variables and chooses the independent variable that has the strongest interaction with the dependent variable
Chi-square value of a cell = √((y – y')^2 / y'), where y is the actual and y' the expected value
How to construct a DT using CHAID?
Find the most dominant feature in the dataset
No. Outlook Temperature Humidity Wind Hour-Played
1 Sunny Hot High Weak 25
2 Sunny Hot High Strong 30
3 Overcast Hot High Weak 46
4 Rain Mild High Weak 45
5 Rain Cool Normal Weak 52
6 Rain Cool Normal Strong 23
7 Overcast Cool Normal Strong 43
8 Sunny Mild High Weak 35
9 Sunny Cool Normal Weak 38
10 Rain Mild Normal Weak 46
11 Sunny Mild Normal Strong 48
12 Overcast Mild High Strong 52
13 Overcast Hot Normal Weak 44
14 Rain Mild High Strong 30
Outlook
3 possible values (Sunny, Rain, and Overcast)
2 decisions (Yes and No)
Example cell: Chi-square(Yes, Sunny) = √((2 – 2.5)^2 / 2.5) = 0.316
Value     Yes  No  Total  Expected  Chi-square (Yes)  Chi-square (No)
Sunny     2    3   5      2.5       0.316             0.316
Rain      4    0   4      2         1.414             1.414
Overcast  3    2   5      2.5       0.316             0.316
Chi-square(Outlook) = 0.316+0.316+1.414+1.414+0.316+0.316 = 4.092
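The same computation as a standalone check (counts taken from the table above):

```python
import math

def chi(actual, expected):
    """Per-cell CHAID value: sqrt((y - y')^2 / y')."""
    return math.sqrt((actual - expected) ** 2 / expected)

# (Yes, No) counts for each Outlook value; expected count = row total / 2
counts = {"Sunny": (2, 3), "Rain": (4, 0), "Overcast": (3, 2)}
total = 0.0
for yes, no in counts.values():
    expected = (yes + no) / 2
    total += chi(yes, expected) + chi(no, expected)
print(round(total, 3))   # 4.093 (~4.092), the chi-square value for Outlook
```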
Outlook = 0.316+0.316+1.414+1.414+0.316+0.316 = 4.092
Temperature = 0 + 0 + 0.577 + 0.577 + 0.707 + 0.707 = 2.569
Humidity = 0.267 + 0.267 + 1.336 + 1.336 = 3.207
Wind = 0.802 + 0.802 + 0 + 0 = 1.604
Outlook has the highest chi-square value (most significant feature) and will be
the root node
Can you build the complete DT?
How to construct a DT when the output attribute is a numeric value?
Regression problems are solved by using the metric ‘standard deviation’
(Same 14-instance Hour-Played dataset as shown earlier)
Hour-Played = {25, 30, 46, 45, 52, 23, 43, 35, 38, 46, 48, 52, 44, 30}
Average = 39.78
Standard deviation = 9.32
Outlook:
SD(Overcast) = 3.49
SD(Rain) = 10.87
SD(Sunny) = 7.78
Weighted SD(Outlook) = (4/14)x3.49 + (5/14)x10.87 + (5/14)x7.78 = 7.66
SD reduction (Outlook) = 9.32 – 7.66 = 1.66
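A standalone check of these numbers (population standard deviation, hours grouped by Outlook):

```python
import math

def sd(values):
    """Population standard deviation."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

hours = {"Overcast": [46, 43, 52, 44],
         "Rain":     [45, 52, 23, 46, 30],
         "Sunny":    [25, 30, 35, 38, 48]}
all_hours = [h for group in hours.values() for h in group]
weighted = sum(len(g) / len(all_hours) * sd(g) for g in hours.values())
print(round(sd(all_hours), 2), round(weighted, 2),
      round(sd(all_hours) - weighted, 2))   # 9.32 7.66 1.66
```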
SD reduction for each attribute:
SD reduction (Outlook) = 9.32 – 7.66 = 1.66
SD reduction (Temperature) = 9.32 – 8.84 = 0.47
SD reduction (Humidity) = 9.32 – 9.04 = 0.27
SD reduction (Wind) = 9.32 – 9.03 = 0.29
Outlook will be the root node as it has the highest SD reduction value
Can you build the complete DT?
Thanks