Group 15
Artificial Intelligence
Lab - 8
Team Members:
Description
● Number of Attributes: 6
● Attribute Values:
○ buying: v-high, high, med, low
○ maint: v-high, high, med, low
○ doors: 2, 3, 4, 5-more
○ persons: 2, 4, more
○ lug_boot: small, med, big
○ safety: low, med, high
● Class Distribution:
○ Unacc
○ Acc
○ Good
○ V-good
● Classifier used: classification and regression tree (ID3)
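The attribute description above can be written down as a small schema, which makes it easy to check rows and to see how large the attribute space is. This is only a sketch: the names `ATTRIBUTES`, `CLASSES`, and `invalid_attributes` are hypothetical, and the value strings follow the spelling used in this report, which may differ from the raw data file.

```python
from math import prod

# Attribute domains as spelled in this report (an assumption: the raw data
# file may spell some values differently).
ATTRIBUTES = {
    "buying": ["v-high", "high", "med", "low"],
    "maint": ["v-high", "high", "med", "low"],
    "doors": ["2", "3", "4", "5-more"],
    "persons": ["2", "4", "more"],
    "lug_boot": ["small", "med", "big"],
    "safety": ["low", "med", "high"],
}
CLASSES = ["unacc", "acc", "good", "v-good"]

# Number of distinct attribute combinations: 4 * 4 * 4 * 3 * 3 * 3.
attribute_space_size = prod(len(values) for values in ATTRIBUTES.values())


def invalid_attributes(row):
    """Return the names of attributes whose value is outside its domain."""
    return [name for name, values in ATTRIBUTES.items()
            if row.get(name) not in values]


example = {"buying": "low", "maint": "med", "doors": "4",
           "persons": "more", "lug_boot": "big", "safety": "high"}
print(attribute_space_size, invalid_attributes(example))
```

With six categorical attributes of this size, a fully grown tree can in principle give every attribute combination its own leaf.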
Question 1:
Randomly select 60 percent of the labeled data (from each class) for constructing the tree (training).
Test on the remaining 40 percent of the data. Measure the accuracy of the classification tree using
the confusion matrix and F-score. Use the entropy measure for the selection of attributes.
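The evaluation part of this question can be sketched in plain Python as follows. The function names are hypothetical and the predictions are made-up values used only to exercise the code; they are not results from this lab.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[k][k] for k in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_f1(matrix):
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each class."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        fp = sum(matrix[r][k] for r in range(n)) - tp  # column total minus TP
        fn = sum(matrix[k]) - tp                       # row total minus TP
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


labels = ["unacc", "acc", "good", "v-good"]
# Hypothetical true and predicted classes, for illustration only.
y_true = ["unacc", "unacc", "acc", "acc", "good", "v-good"]
y_pred = ["unacc", "acc", "acc", "acc", "good", "good"]

cm = confusion_matrix(y_true, y_pred, labels)
f1 = per_class_f1(cm)
macro_f1 = sum(f1) / len(f1)
print(cm, accuracy(cm), f1, macro_f1)
```

Averaging the per-class scores (macro F1) weights every class equally, which matters when some classes have far fewer samples than others.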
Question 3:
Repeat steps 1 and 2 with the Gini index as a measure for the selection of attributes.
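The two attribute-selection measures used in this lab, entropy and the Gini index, can be sketched as interchangeable impurity functions. The helper names and the four toy rows below are hypothetical and serve only to show how a split is scored; this is not the implementation used for the reported runs.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy in bits: -sum(p * log2(p)) over class proportions."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index: 1 - sum(p^2) over class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_reduction(rows, labels, attribute, measure):
    """Parent impurity minus the size-weighted impurity of the child groups."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attribute], []).append(label)
    n = len(labels)
    children = sum(len(g) / n * measure(g) for g in groups.values())
    return measure(labels) - children


def best_attribute(rows, labels, attributes, measure):
    """Pick the attribute whose split reduces impurity the most."""
    return max(attributes,
               key=lambda a: impurity_reduction(rows, labels, a, measure))


# Hypothetical toy rows: the class depends only on "safety".
rows = [
    {"safety": "low", "persons": "2"},
    {"safety": "low", "persons": "4"},
    {"safety": "high", "persons": "2"},
    {"safety": "high", "persons": "4"},
]
labels = ["unacc", "unacc", "acc", "acc"]

for measure in (entropy, gini):
    print(measure.__name__,
          best_attribute(rows, labels, ["safety", "persons"], measure))
```

With the entropy measure the reduction is the information gain. On this toy data both measures select the same attribute, but on real data they can rank attributes differently.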
Question 4:
Repeat steps 1, 2 and 3 using 70 percent of the data (random selection) for training.
Repeat steps 1, 2 and 3 using 80 percent of the data (random selection) for training.
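The random selection at 60, 70 and 80 percent can be sketched with a single per-class (stratified) split function. The function name, the fixed seed, and the stand-in data are assumptions made for illustration, not the code used for the reported runs.

```python
import random
from collections import Counter, defaultdict


def stratified_split(rows, labels, train_fraction, seed=0):
    """Randomly send about train_fraction of each class to the training set."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train_idx.extend(indices[:cut])
        test_idx.extend(indices[cut:])

    train = [(rows[i], labels[i]) for i in train_idx]
    test = [(rows[i], labels[i]) for i in test_idx]
    return train, test


# Hypothetical stand-in data: 20 rows of one class and 10 of another.
rows = list(range(30))
labels = ["unacc"] * 20 + ["acc"] * 10

for fraction in (0.6, 0.7, 0.8):
    train, test = stratified_split(rows, labels, fraction)
    counts = Counter(label for _, label in train)
    print(fraction, len(train), len(test), dict(counts))
```

Splitting within each class keeps the class proportions of the training and test sets close to those of the full data, which matters when some classes are rare.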
Question 5:
Describe the problem of overfitting in your words with an example created from the data-set.
Overfitting occurs when a model fits the training data too closely. It happens when a model
learns the detail and noise in the training data to the extent that its performance on new
data suffers. Noise and random fluctuations in the training data are picked up and learned
as if they were real concepts. These concepts do not apply to new data, so they reduce the
model's ability to generalize. In short, the model is trained so specifically on the training
dataset that its predictions are poor for data it has never seen before. As a rule of thumb,
a model starts to overfit at the point where the test error begins to increase while the
training error is still decreasing.
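The rule of thumb about diverging training and test error can be sketched as a simple check over error curves. The function name is hypothetical, and the error values are made up for illustration (for example, one pair of errors per tree depth); they are not measurements from this data set.

```python
def first_overfitting_step(train_errors, test_errors):
    """Return the first index where test error rises while training error
    still falls, or None if that never happens."""
    for i in range(1, min(len(train_errors), len(test_errors))):
        train_falling = train_errors[i] < train_errors[i - 1]
        test_rising = test_errors[i] > test_errors[i - 1]
        if train_falling and test_rising:
            return i
    return None


# Hypothetical error rates for trees of increasing depth (made-up numbers).
depths = [1, 2, 3, 4, 5, 6]
train_errors = [0.30, 0.20, 0.12, 0.06, 0.02, 0.00]
test_errors = [0.31, 0.22, 0.15, 0.13, 0.16, 0.19]

step = first_overfitting_step(train_errors, test_errors)
if step is not None:
    print("overfitting starts at depth", depths[step])
```

In a decision tree this typically corresponds to growing the tree past the depth at which new splits only memorize individual training rows.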