Group 15 

Artificial Intelligence 
Lab - 8 
 

Team Members: 

Khushboo Garg 201751065 

Preeti Yadav 201751039 

Anjali Priya 201751007 

Keshav Purohit 201751021 

Varnika Dasgupta 201751060 

 
 
 

Description 

● Number of Attributes: 6 
● Attribute Values:  
○ buying: v-high, high, med, low 
○ maint: v-high, high, med, low 
○ doors: 2, 3, 4, 5-more 
○ persons: 2, 4, more 
○ lug_boot: small, med, big 
○ safety: low, med, high 
● Class Distribution: 
○ Unacc 
○ Acc 
○ Good 
○ V-good 
● The classifier used: decision tree (ID3, which selects attributes by entropy/information gain; the Gini index is used as an alternative criterion in later questions) 

 

 
 

Code and Output: 


The target variable is stored in the class column of the data frame. Its values are strings, but the algorithm requires each variable to be encoded as integers. We can convert the categorical string values to integer codes with the factorize method of the pandas library. 
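The encoding step above can be sketched as follows. This is a minimal example; the two sample rows stand in for the actual car data file, whose loading code is not shown in this report.

```python
import pandas as pd

# Column names follow the attribute list in the Description section.
cols = ["buying", "maint", "doors", "persons", "lug_boot", "safety", "class"]

# Two illustrative rows standing in for the full Car Evaluation data set.
df = pd.DataFrame(
    [["v-high", "v-high", "2", "2", "small", "low", "unacc"],
     ["low", "med", "4", "4", "big", "high", "v-good"]],
    columns=cols,
)

# pd.factorize maps each distinct string in a column to an integer code
# (0, 1, 2, ... in order of first appearance) and returns (codes, uniques).
for col in cols:
    df[col], _ = pd.factorize(df[col])

print(df)
```

After this loop every column, including the class target, holds integer codes suitable for the tree-building algorithm.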

 

 
 

Question 1: 
Randomly select 60 percent of the labeled data (from each class) for constructing the tree (training). Test on the remaining 40 percent. Find the accuracy of the classification tree with the help of the confusion matrix and the F-score. Use the entropy measure for attribute selection.
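A sketch of this step, assuming scikit-learn is used for the tree and the metrics (the report's own code output is not reproduced here). The synthetic X and y below stand in for the factorized car data; stratify=y gives the per-class 60/40 split the question asks for.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, f1_score

# Synthetic stand-in for the factorized car data: 400 rows, 6 attributes.
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(400, 6))
y = (X.sum(axis=1) > 9).astype(int)

# Stratified 60/40 split: each class contributes 60% to training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=0)

# criterion="entropy" selects attributes by information gain, as in ID3.
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

cm = confusion_matrix(y_test, pred)
f1 = f1_score(y_test, pred, average="weighted")
print(cm)
print("weighted F-score:", f1)
```

For the real four-class car data, average="weighted" combines the per-class F-scores in proportion to class frequency, which matters because the class distribution is heavily skewed toward unacc.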

 

 
 

Question 2 and Question 3: 


Repeat the above exercise 20 times. Calculate the average accuracy of classification.

Repeat steps 1 and 2 with the Gini index as a measure for the selection of attributes. 
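Questions 2 and 3 can be sketched together: wrap the split-train-test cycle in a loop of 20 runs and average, then rerun with criterion="gini". As before, the synthetic X and y are a stand-in for the factorized car data, and the use of scikit-learn is an assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the factorized car data.
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(400, 6))
y = (X.sum(axis=1) > 9).astype(int)

def average_accuracy(criterion, runs=20, train_size=0.6):
    """Average test accuracy over `runs` random stratified splits."""
    total = 0.0
    for seed in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_size, stratify=y, random_state=seed)
        clf = DecisionTreeClassifier(criterion=criterion, random_state=seed)
        total += clf.fit(X_tr, y_tr).score(X_te, y_te)
    return total / runs

acc_entropy = average_accuracy("entropy")  # Question 2
acc_gini = average_accuracy("gini")        # Question 3
print("entropy:", round(acc_entropy, 3), " gini:", round(acc_gini, 3))
```

Varying random_state per run is what makes each of the 20 splits different; averaging smooths out the variance a single random split would show.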

Question 4: 
Repeat steps 1, 2 and 3 considering 70 percent of the data (random selection) for training.

 

 
 

 
Repeat steps 1, 2 and 3 considering 80 percent of the data (random selection) for training. 
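Both parts of Question 4 reduce to rerunning the averaged experiment with a different train_size. A sketch, again on synthetic stand-in data and assuming scikit-learn:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the factorized car data.
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(400, 6))
y = (X.sum(axis=1) > 9).astype(int)

def average_accuracy(criterion, train_size, runs=20):
    total = 0.0
    for seed in range(runs):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_size, stratify=y, random_state=seed)
        clf = DecisionTreeClassifier(criterion=criterion, random_state=seed)
        total += clf.fit(X_tr, y_tr).score(X_te, y_te)
    return total / runs

# Full grid: 60/70/80 percent training, entropy vs. Gini.
results = {
    (size, crit): average_accuracy(crit, size)
    for size in (0.6, 0.7, 0.8)
    for crit in ("entropy", "gini")
}
for (size, crit), acc in sorted(results.items()):
    print(f"train={size:.0%}  {crit:7s}  avg accuracy={acc:.3f}")
```

On the real car data one would typically expect accuracy to rise somewhat as the training fraction grows, since the tree sees more of each class.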

Question 5: 
Describe the problem of overfitting in your words with an example created from the data-set.

Overfitting refers to a model that fits the training data too closely. It happens when a model learns the detail and noise in the training data to the extent that this hurts its performance on new data: random fluctuations in the training data are picked up and learned as concepts by the model, but these concepts do not hold for new data and damage the model's ability to generalize. In short, a model is overfit when it is trained so specifically on the training set that its predictions are poor for data it has never seen before. Generally speaking, a model starts to overfit as soon as the test error begins to increase while the training error is still decreasing. On this data set, for example, a fully grown decision tree can memorize every training record, including any mislabeled or atypical cars (say, a low-buying, low-maint, 2-door, 2-person, small-boot, low-safety car recorded as acc instead of unacc), and will then misclassify similar cars at test time, whereas a depth-limited tree ignores such one-off records. 
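The effect described above can be demonstrated with a small sketch: label noise is injected into synthetic stand-in data, and an unrestricted tree is compared with a depth-limited one (scikit-learn assumed, as elsewhere).

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the factorized car data.
rng = np.random.default_rng(0)
X = rng.integers(0, 4, size=(400, 6))
y = (X.sum(axis=1) > 9).astype(int)

# Flip 15% of the labels to simulate noisy / mislabeled records.
noise = rng.random(400) < 0.15
y = np.where(noise, 1 - y, y)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=0.6, stratify=y, random_state=0)

# Unrestricted tree: grows until it memorizes the training set, noise included.
deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
# Depth-limited tree: forced to learn only broad patterns.
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

print("deep    train/test:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("shallow train/test:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))
```

The deep tree shows near-perfect training accuracy but a clearly lower test accuracy, the gap being the signature of overfitting; the shallow tree's two scores sit much closer together.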

 
