University of Science
Faculty of Information Technology
LAB 2:
Decision Tree
Course: Introduction to AI
Class: 22CLC02
Teacher: Nguyễn Ngọc Thảo
Hồ Thị Thanh Tuyến
HCMC, 2024
Table of Contents
1 Check list
2 Source Code
2.1 Library Usages
2.2 Decision Tree
2.2.1 Preparing the data sets
2.2.2 Building the decision tree classifiers
2.2.3 Performance Metrics
2.3 Evaluating the decision tree classifiers
2.3.1 Classification Report and Confusion Matrix
2.3.2 Comments
3 Statistics Report
3.1 The depth and accuracy of a decision tree
3.1.1 Decision Tree Visualization
3.1.2 Accuracy to Max Depth
4 References
VNUHCM-US-FIT 1 22127357
1 Check list
1. Preparing the data sets.
2 Source Code
2.1 Library Usages:
− sklearn: A powerful machine learning library in Python. It provides a wide range of algorithms for classification, regression, clustering, etc., and offers tools for data preprocessing, model evaluation, and parameter tuning.
− pandas: A library for data manipulation and analysis in Python. It is used here to load, manipulate, filter, and analyze data from CSV files.
− matplotlib: Used for creating static, interactive, and animated visualizations in Python. It offers a wide range of plotting functions for line plots, bar plots, histograms, heatmaps, etc.
− pydotplus: A Python interface to Graphviz's Dot language. It allows users to create, manipulate, and visualize graphs, and is often used with other libraries to render complex graphs such as decision trees and neural networks.
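For reference, a typical set of import statements covering the libraries above might look like the following (illustrative; the exact imports depend on the submitted source code):

```python
# Typical imports for this lab (illustrative; the exact set depends on the code).
import pandas as pd
import matplotlib
matplotlib.use('Agg')              # render plots off-screen (safe on headless machines)
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

try:
    import pydotplus               # optional: only needed for Graphviz rendering
except ImportError:
    pydotplus = None
```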
2.2 Decision Tree:
2.2.1 Preparing the data sets
The Nursery dataset's attributes and their possible values:

parents    usual, pretentious, great_pret
has_nurs   proper, less_proper, improper, critical, very_crit
form       complete, completed, incomplete, foster
children   1, 2, 3, more
housing    convenient, less_conv, critical
finance    convenient, inconv
social     non-prob, slightly_prob, problematic
health     recommended, priority, not_recom
1. Shuffled: The dataset is shuffled once at the start and then shuffled again during the splitting process (with the default random_state setting).
2. Split: The dataset is split into a training set and a testing set with these train/test ratios: 40/60, 60/40, 80/20, and 90/10. The purpose is to compare the performance of the models under different training/test ratios.
*Note: stratify = y ensures that the distributions of the labels in the train and test sets are the same.
3. Encoded: Because the dataset consists of categories with an inherent order or rank (e.g., proper, less_proper, etc.), it must be encoded before it can be processed to build a Decision Tree. The chosen encoding style is Label Encoding (encode categorical variables as non-negative integers: 0, 1, 2, ...). After encoding, the dataset looks like this:
parents:  usual (2), pretentious (1), great_pret (0)
has_nurs: proper (3), less_proper (2), improper (1), critical (0), very_crit (4)
form:     complete (0), completed (1), incomplete (2), foster (3)
children: 1 (0), 2 (1), 3 (2), more (3)
housing:  convenient (0), less_conv (1), critical (2)
finance:  convenient (0), inconv (1)
social:   non-prob (0), slightly_prob (1), problematic (2)
health:   recommended (2), priority (1), not_recom (0)
*Note: For the encoding step, other encodings could be used with similar results: Ordinal Encoding (for categorical data that involves ranking or ordering) and One-Hot Encoding (creates dummy variables, one per unique value of the categorical feature).
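The shuffle/split/encode steps above can be sketched as follows. This is a minimal illustration on invented toy rows; the real code loads the full Nursery CSV, and the two feature columns shown here are only a stand-in:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Toy stand-in for pd.read_csv('nursery.csv'); only two feature columns shown.
df = pd.DataFrame({
    'parents': ['usual', 'pretentious', 'great_pret', 'usual', 'pretentious'] * 4,
    'health':  ['recommended', 'priority', 'not_recom', 'priority', 'recommended'] * 4,
    'target':  ['priority', 'not_recom'] * 10,
})

# Label-encode every categorical column to integers 0, 1, 2, ...
encoders = {col: LabelEncoder().fit(df[col]) for col in df.columns}
for col, enc in encoders.items():
    df[col] = enc.transform(df[col])

X, y = df.drop(columns='target'), df['target']

# One split per train/test ratio; stratify=y keeps the label distribution
# identical in the train and test sets, and shuffle=True reshuffles each split.
for test_size in (0.6, 0.4, 0.2, 0.1):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, shuffle=True)
```

Note that LabelEncoder assigns codes in alphabetical order of the category names, which is exactly why `usual` maps to 2 and `great_pret` to 0 in the listing above.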
2.2.2 Building the decision tree classifiers

from sklearn.tree import DecisionTreeClassifier

MAX_DEPTH = [None, 2, 3, 4, 5, 6, 7]

class DecisionTreeClassifierInfoGain(DecisionTreeClassifier):
    def __init__(self, max_depth=None):
        super().__init__(criterion='entropy', max_depth=max_depth)
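A quick sanity check of this subclass on toy integer-encoded data (the data here is hypothetical, not the Nursery set):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class DecisionTreeClassifierInfoGain(DecisionTreeClassifier):
    """Decision tree that splits on information gain (entropy criterion)."""
    def __init__(self, max_depth=None):
        super().__init__(criterion='entropy', max_depth=max_depth)

# Toy integer-encoded features; the label simply follows the first feature.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 1, 1])

clf = DecisionTreeClassifierInfoGain(max_depth=None).fit(X, y)
print(clf.score(X, y))  # -> 1.0 (the data is trivially separable)
```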
2.2.3 Performance Metrics

• True Positives (TP): True Positives occur when the model correctly predicts the positive class (e.g., presence of a disease) when it is indeed present in the actual data.
• True Negatives (TN): True Negatives occur when the model correctly predicts the negative class (e.g., absence of a disease) when it is indeed not present in the actual data.
• False Positives (FP): These occur when the model incorrectly predicts the positive class when it is not present in the actual data.
• False Negatives (FN): False Negatives happen when the model incorrectly predicts the negative class when the actual class is, in fact, positive.
Before commenting on or evaluating the results of the Classification Report and Confusion Matrix, it is important to know the meanings of the evaluation metrics listed.
+ Precision: Measures the accuracy of positive predictions. It is the ratio of correctly predicted positive observations to the total predicted positives. A model which produces no false positives has a precision of 1.

Precision = TP / (TP + FP)
+ Recall (Sensitivity): It measures the ability of the classifier to find all positive instances. It is the ratio of correctly predicted positive observations to all observations in the actual class. A model which produces no false negatives has a recall of 1.

Recall = TP / (TP + FN)
*Note: To fully evaluate the effectiveness of a model, both precision and recall must be examined. Unfortunately, precision and recall are often in tension (as seen in their nature and mathematical definitions): improving precision typically reduces recall, and vice versa.
+ F1-score: The harmonic mean of Precision and Recall. It is a good way to show that a classifier has a good value for both recall and precision (the closer to 1, the better the model).

F1 = (2 × Precision × Recall) / (Precision + Recall)
+ Support: It is the number of actual occurrences of the class in the specified dataset.
+ Accuracy: The ratio of correctly predicted observations to the total observations. Perfect accuracy is equal to 1.

Accuracy = (TP + TN) / (TP + TN + FP + FN)
+ Macro Avg: Short for macro average. The metric (precision, recall, or F1-score) for each class is calculated independently, and then the average of these metrics is taken without considering class imbalance. Each class contributes equally to the final average regardless of its number of instances, so when classes are imbalanced this metric deserves particular attention. A higher macro average indicates better overall performance across all classes.
+ Weighted Avg: In the weighted average, the metric for each class is calculated independently, and then the average is taken with each class weighted by its support (the number of true instances in that class). This accounts for class imbalance: classes with more instances have a greater influence on the final average than classes with fewer instances. A higher weighted average indicates better overall performance, with more weight given to classes with larger support.
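All of the metrics above can be computed with sklearn. A small hypothetical example (the labels are invented, not taken from the report's actual splits) shows how the macro and weighted averages diverge:

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             precision_recall_fscore_support)

# Invented labels, purely to illustrate how the report's numbers are obtained.
y_true = ['not_recom', 'priority', 'priority', 'spec_prior', 'not_recom', 'priority']
y_pred = ['not_recom', 'priority', 'spec_prior', 'spec_prior', 'not_recom', 'priority']

print(classification_report(y_true, y_pred, zero_division=0))
print('accuracy:', accuracy_score(y_true, y_pred))   # 5 of 6 correct

# Macro averages each class equally; weighted averages by class support.
macro = precision_recall_fscore_support(y_true, y_pred, average='macro',
                                        zero_division=0)
weighted = precision_recall_fscore_support(y_true, y_pred, average='weighted',
                                           zero_division=0)
print('macro precision:   ', round(macro[0], 4))     # (1 + 1 + 0.5) / 3
print('weighted precision:', round(weighted[0], 4))  # supports 2, 3, 1 as weights
```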
1. Classification Report:
weighted avg 1.00 0.99 0.99 2592
*Note: Because there are 0 instances of class recommend in this split, it does not appear in this report.
+ Each row of the matrix represents the instances of an actual class, while each column represents the instances of a predicted class. (Ex: the first row represents instances of the first class not_recom, the second priority, and so on.)
+ The diagonal elements (from top-left to bottom-right) represent the correctly classified instances of each class. (Ex: in the 40-60 ratio, class not_recom (top-left) has 2592 instances correctly classified.)
+ The off-diagonal elements represent misclassifications; if there are no misclassifications, all off-diagonal elements are 0. (Ex: in the 40-60 ratio, the element at row 3, column 1 (rows and columns numbered from 0 to 4) has a value of 58, indicating that 58 instances of class spec_prior were misclassified as class priority.)
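In sklearn's confusion_matrix, rows correspond to actual classes and columns to predicted ones. A small invented example (not the report's data) shows this reading:

```python
from sklearn.metrics import confusion_matrix

# Invented labels: one spec_prior instance is predicted as priority.
y_true = ['not_recom', 'priority', 'priority', 'spec_prior']
y_pred = ['not_recom', 'priority', 'spec_prior', 'priority']
labels = ['not_recom', 'priority', 'spec_prior']

cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)
# cm[i][j] counts instances of actual class labels[i] predicted as labels[j],
# so cm[2][1] == 1 reflects the spec_prior -> priority misclassification.
```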
− Training 60% - Test 40%
− Training 80% - Test 20%
− Training 90% - Test 10%
2.3.2 Comments:
+ The model has a precision of 0.98: in other words, when it predicts the target (ranking applications for nursery schools), it is correct 98% of the time (an almost perfect prediction rate).
+ The model performs exceptionally well for classes not_recom, priority, and spec_prior, with precision, recall, and F1-score all above 97%. For class very_recom, precision is lower at 90% while recall is 94%, resulting in a lower F1-score of 92%.
This is because:
• These three classes have a balanced distribution of instances (around 2,500 each, as seen in the support column; very_recom has roughly 12.5 times fewer), making it easier for the model to learn and generalize patterns effectively.
• Sufficient training data for these classes was available, enabling the model to learn robust representations of these classes during training.
• The classifier's max_depth hyperparameter is set to None; without a depth limit, the tree can grow as needed, leading to improved performance for these classes.
• These classes also have distinct and easily separable features (clearly named and ranked), allowing the model to make accurate predictions.
+ In contrast to not_recom's perfect scores across the board, the recommend class has a score of 0.00 on all of the evaluation metrics (precision, recall, F1-score), indicating that the model fails to correctly classify any instance of this class. This is due to class imbalance and insufficient data: recommend makes up only about 0.015% of the whole dataset (12,960 instances) and close to 0% (as illustrated in Section 2.2.1) in the split sub-datasets (only 1 instance, as seen from its support).
+ Because of this class imbalance, the macro average, which computes the metric for each class and then averages them equally, sits around 78% for precision, recall, and F1-score. Meanwhile, the weighted average is about 98%, much higher, because recommend's near-zero scores barely affect a support-weighted average.
Conclusion: Overall, the model has a high accuracy of 98% and high F1-scores, but its performance varies significantly across classes. Further investigation and possibly model improvement are necessary, especially regarding the recommend class.
A similar trend is seen for all of the other splits, but with smaller supports/instance counts as the test size decreases.
Final Comment:
Final Conclusion: In summary, these observations highlight the importance of dataset bal-
ance, the impact of training set size on model performance, and the stability of overall per-
formance metrics across different training-test splits. Additionally, the consistent high perfor-
mance for classes with sufficient support indicates that the model effectively learns patterns
for these classes.
3 Statistics Report
3.1 The depth and accuracy of a decision tree:
3.1.1 Decision Tree Visualization
NOTE: Due to some Trees having too large of a dimension, please refer to the attached ’.png’ in
the ’[REPORT] Data Assets’ folder.
1. max_depth = None
2. max_depth = 2
3. max_depth = 3
4. max_depth = 4
5. max_depth = 5
6. max_depth = 6
7. max_depth = 7
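The per-depth renderings above were presumably produced with pydotplus/Graphviz. An equivalent dependency-free sketch uses sklearn's plot_tree; the bundled Iris data serves as a stand-in here, since the encoded Nursery frame is not reproduced in this snippet:

```python
import matplotlib
matplotlib.use('Agg')  # draw off-screen; no display needed
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
for depth in [None, 2, 3]:          # the report sweeps None and 2..7
    clf = DecisionTreeClassifier(criterion='entropy', max_depth=depth)
    clf.fit(iris.data, iris.target)
    fig, ax = plt.subplots(figsize=(12, 6))
    plot_tree(clf, feature_names=iris.feature_names,
              class_names=list(iris.target_names), filled=True, ax=ax)
    fig.savefig(f'tree_depth_{depth}.png')   # one image per depth setting
    plt.close(fig)
```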
Comments:
− The accuracy scores rank in line with the max_depth values: no limit (None) yields the highest accuracy (extremely close to 100%).
− While accuracy continues to improve with increasing max_depth, the rate of improvement slows down. This is evident in the smaller accuracy gains observed as max_depth increases beyond 4. In contrast, there is a significant jump in accuracy from depth 2 (76.23%) to depth 3 (80.05%), indicating that allowing the tree to grow beyond depth 2 substantially improves performance.
=⇒ Therefore, this improvement is not linear.
− This is because, while deeper trees tend to yield higher accuracy on the training data, they risk Overfitting to the training data, leading to poor generalization on unseen data. The reverse, for shallow trees, is Underfitting.
Conclusion: The Decision Tree’s depth significantly impacts its accuracy. While deeper trees gen-
erally lead to better performance, there’s a trade-off between accuracy and model complexity. It’s
essential to find the optimal depth that maximizes accuracy without Overfitting or Underfitting
the training data.
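The depth-versus-accuracy sweep described in this section can be sketched as follows. Iris is used as a stand-in dataset, so the printed accuracies will differ from the Nursery figures quoted above:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Fit one entropy-based tree per max_depth and compare test accuracy.
for depth in [None, 2, 3, 4, 5, 6, 7]:
    clf = DecisionTreeClassifier(criterion='entropy', max_depth=depth,
                                 random_state=0)
    clf.fit(X_train, y_train)
    print(f'max_depth={depth}: accuracy={clf.score(X_test, y_test):.4f}')
```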
4 References
[1] scikit-learn library documentation.
[4] "105 Evaluating A Classification Model 6 Classification Report", Creating Machine Learning Models.