
Task 1

List all the categorical (or nominal) attributes and the real valued attributes separately.

Ans) Steps for identifying categorical attributes

1. Double-click the credit-g.arff file.
2. Tick all the categorical attributes.
3. Click Invert. The real-valued attributes are now selected instead.
4. Click Remove, so that only the categorical attributes remain.
5. Click Visualize All.

Steps for identifying real-valued attributes

1. Double-click the credit-g.arff file.
2. Tick all the real-valued attributes.
3. Click Invert. The categorical attributes are now selected instead.
4. Click Remove, so that only the real-valued attributes remain.
5. Click Visualize All.

The following are the categorical (or nominal) attributes:

Checking_Status
Credit_history
Purpose
Savings_status
Employment
Personal_status
Other_parties
Property_Magnitude
Other_payment_plans
Housing
Job
Own_telephone
Foreign_worker

The following are the numerical attributes:

Duration
Credit_amount
Installment_Commitment
Residence_since
Age
Existing_credits
Num_dependents
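
The same split can be read straight from the ARFF header, where a nominal attribute lists its values in braces and a numerical one is declared with a type keyword. The sketch below is only an illustration: the header text is an abbreviated sample written for this example (the real credit-g.arff declares 21 attributes), and the helper name split_attributes is hypothetical.

```python
import re

# Abbreviated, illustrative header; the real credit-g.arff declares 21 attributes.
SAMPLE_HEADER = """
@relation german_credit
@attribute checking_status {'<0','0<=X<200','>=200','no checking'}
@attribute duration numeric
@attribute credit_amount numeric
@attribute foreign_worker {yes,no}
@attribute class {good,bad}
"""

def split_attributes(header_text):
    """Return (nominal, numeric) attribute names from ARFF @attribute lines."""
    nominal, numeric = [], []
    for line in header_text.splitlines():
        match = re.match(r"@attribute\s+(\S+)\s+(.*)", line.strip(), re.IGNORECASE)
        if not match:
            continue  # skip @relation, blank lines and anything else
        name, type_spec = match.group(1), match.group(2).strip()
        if type_spec.startswith("{"):
            nominal.append(name)   # value list in braces means nominal
        elif type_spec.lower() in ("numeric", "real", "integer"):
            numeric.append(name)
    return nominal, numeric

nominal, numeric = split_attributes(SAMPLE_HEADER)
print("Nominal:", nominal)
print("Numeric:", numeric)
```

Note that the class attribute is itself nominal, which is why it appears in the nominal list here. Attribute names containing spaces or quotes are not handled by this simple pattern.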
Task 2
What attributes do you think might be crucial in making the credit assessment? Come up
with some simple rules in plain English using your selected attributes.

Ans) The following attributes may be crucial in making the credit assessment:
Credit_amount
Age
Job
Savings_status
Existing_credits
Installment_commitment
Property_magnitude
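
One possible way to turn some of these attributes into simple rules is sketched below. It is an illustration only: the function name is hypothetical, the 9857 cut-off is borrowed from the J48 tree reported under Task 4, and the remaining thresholds are assumptions chosen for the example rather than rules derived in this report. In plain English: a very large loan is bad; otherwise substantial savings make the applicant good; otherwise several existing credits combined with a high installment commitment are bad; everything else is good.

```python
def simple_credit_rule(applicant):
    """Hypothetical hand-written rules; thresholds are illustrative only."""
    # Rule 1: a very large credit amount is treated as bad
    # (cut-off borrowed from the J48 tree in Task 4).
    if applicant["credit_amount"] > 9857:
        return "bad"
    # Rule 2: substantial savings are treated as good (assumed rule).
    if applicant["savings_status"] in (">=1000", "500<=X<1000"):
        return "good"
    # Rule 3: several existing credits plus a high installment
    # commitment are treated as bad (assumed rule).
    if applicant["existing_credits"] > 1 and applicant["installment_commitment"] >= 4:
        return "bad"
    # Default: good, which is the majority class in the dataset.
    return "good"

example = {
    "credit_amount": 12000,
    "savings_status": "<100",
    "existing_credits": 1,
    "installment_commitment": 2,
}
print(simple_credit_rule(example))
```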
Task 3
One type of model that you can create is a decision tree. Train a decision tree using the
complete data set as the training data. Report the model obtained after training.

Ans) Steps to model decision tree.

1. Double click on credit-g.arff file.


2. Use all 21 attributes (20 predictors plus the class) to build the decision tree.
3. Click on the Classify tab.
4. Click on the Choose button.
5. Expand the trees folder and select J48.
6. Select Use training set under Test options.
7. Click on the Start button.
8. Right-click the entry in the Result list and choose Visualize tree to see the decision tree.
Task 4
Suppose you use your above model trained on the complete dataset, and classify credit
good/bad for each of the examples in the dataset. What % of examples can you classify
correctly? (This is also called testing on the training set.) Why do you think you cannot get
100% training accuracy?

Ans) Steps followed are:

1. Double click on credit-g.arff file.


2. Click on classify tab.
3. Click on choose button.
4. Expand tree folder and select J48
5. Click on use training set in test options.
6. Click on start button.
7. On right side we find confusion matrix
8. Note the correctly classified instances.
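Output:
When the model trained on the complete dataset is used to classify each example of that same
dataset as good or bad, 855 of the 1000 examples (85.5%) are classified correctly. Training
accuracy does not reach 100% because J48 builds a pruned tree (confidence factor 0.25, at least
2 instances per leaf), so many leaves still hold instances of both classes. For example, the leaf
"checking_status = no checking: good (394.0/46.0)" misclassifies 46 of its 394 training instances.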

=== Run information ===

Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2


Relation: german_credit
Instances: 1000
Attributes: 21
checking_status
duration
credit_history
purpose
credit_amount
savings_status
employment
installment_commitment
personal_status
other_parties
residence_since
property_magnitude
age
other_payment_plans
housing
existing_credits
job
num_dependents
own_telephone
foreign_worker
class
Test mode: evaluate on training data

=== Classifier model (full training set) ===

J48 pruned tree


------------------

checking_status = <0
| foreign_worker = yes
| | duration <= 11
| | | existing_credits <= 1
| | | | property_magnitude = real estate: good (8.0/1.0)
| | | | property_magnitude = life insurance
| | | | | own_telephone = none: bad (2.0)
| | | | | own_telephone = yes: good (4.0)
| | | | property_magnitude = car: good (2.0/1.0)
| | | | property_magnitude = no known property: bad (3.0)
| | | existing_credits > 1: good (14.0)
| | duration > 11
| | | job = unemp/unskilled non res: bad (5.0/1.0)
| | | job = unskilled resident
| | | | purpose = new car
| | | | | own_telephone = none: bad (10.0/2.0)
| | | | | own_telephone = yes: good (2.0)
| | | | purpose = used car: bad (1.0)
| | | | purpose = furniture/equipment
| | | | | employment = unemployed: good (0.0)
| | | | | employment = <1: bad (3.0)
| | | | | employment = 1<=X<4: good (4.0)
| | | | | employment = 4<=X<7: good (1.0)
| | | | | employment = >=7: good (2.0)
| | | | purpose = radio/tv
| | | | | existing_credits <= 1: bad (10.0/3.0)
| | | | | existing_credits > 1: good (2.0)
| | | | purpose = domestic appliance: bad (1.0)
| | | | purpose = repairs: bad (1.0)
| | | | purpose = education: bad (1.0)
| | | | purpose = vacation: bad (0.0)
| | | | purpose = retraining: good (1.0)
| | | | purpose = business: good (3.0)
| | | | purpose = other: good (1.0)
| | | job = skilled
| | | | other_parties = none
| | | | | duration <= 30
| | | | | | savings_status = <100
| | | | | | | credit_history = no credits/all paid: bad (8.0/1.0)
| | | | | | | credit_history = all paid: bad (6.0)
| | | | | | | credit_history = existing paid
| | | | | | | | own_telephone = none
| | | | | | | | | existing_credits <= 1
| | | | | | | | | | property_magnitude = real estate
| | | | | | | | | | | age <= 26: bad (5.0)
| | | | | | | | | | | age > 26: good (2.0)
| | | | | | | | | | property_magnitude = life insurance: bad (7.0/2.0)
| | | | | | | | | | property_magnitude = car
| | | | | | | | | | | credit_amount <= 1386: bad (3.0)
| | | | | | | | | | | credit_amount > 1386: good (11.0/1.0)
| | | | | | | | | | property_magnitude = no known property: good (2.0)
| | | | | | | | | existing_credits > 1: bad (3.0)
| | | | | | | | own_telephone = yes: bad (5.0)
| | | | | | | credit_history = delayed previously: bad (4.0)
| | | | | | | credit_history = critical/other existing credit: good (14.0/4.0)
| | | | | | savings_status = 100<=X<500
| | | | | | | credit_history = no credits/all paid: good (0.0)
| | | | | | | credit_history = all paid: good (1.0)
| | | | | | | credit_history = existing paid: bad (3.0)
| | | | | | | credit_history = delayed previously: good (0.0)
| | | | | | | credit_history = critical/other existing credit: good (2.0)
| | | | | | savings_status = 500<=X<1000: good (4.0/1.0)
| | | | | | savings_status = >=1000: good (4.0)
| | | | | | savings_status = no known savings
| | | | | | | existing_credits <= 1
| | | | | | | | own_telephone = none: bad (9.0/1.0)
| | | | | | | | own_telephone = yes: good (4.0/1.0)
| | | | | | | existing_credits > 1: good (2.0)
| | | | | duration > 30: bad (30.0/3.0)
| | | | other_parties = co applicant: bad (7.0/1.0)
| | | | other_parties = guarantor: good (12.0/3.0)
| | | job = high qualif/self emp/mgmt: good (30.0/8.0)
| foreign_worker = no: good (15.0/2.0)
checking_status = 0<=X<200
| credit_amount <= 9857
| | savings_status = <100
| | | other_parties = none
| | | | duration <= 42
| | | | | personal_status = male div/sep: bad (8.0/2.0)
| | | | | personal_status = female div/dep/mar
| | | | | | purpose = new car: bad (5.0/1.0)
| | | | | | purpose = used car: bad (1.0)
| | | | | | purpose = furniture/equipment
| | | | | | | duration <= 10: bad (3.0)
| | | | | | | duration > 10
| | | | | | | | duration <= 21: good (6.0/1.0)
| | | | | | | | duration > 21: bad (2.0)
| | | | | | purpose = radio/tv: good (8.0/2.0)
| | | | | | purpose = domestic appliance: good (0.0)
| | | | | | purpose = repairs: good (1.0)
| | | | | | purpose = education: good (4.0/2.0)
| | | | | | purpose = vacation: good (0.0)
| | | | | | purpose = retraining: good (0.0)
| | | | | | purpose = business
| | | | | | | residence_since <= 2: good (3.0)
| | | | | | | residence_since > 2: bad (2.0)
| | | | | | purpose = other: good (0.0)
| | | | | personal_status = male single: good (52.0/15.0)
| | | | | personal_status = male mar/wid
| | | | | | duration <= 10: good (6.0)
| | | | | | duration > 10: bad (10.0/3.0)
| | | | | personal_status = female single: good (0.0)
| | | | duration > 42: bad (7.0)
| | | other_parties = co applicant: good (2.0)
| | | other_parties = guarantor
| | | | purpose = new car: bad (2.0)
| | | | purpose = used car: good (0.0)
| | | | purpose = furniture/equipment: good (0.0)
| | | | purpose = radio/tv: good (18.0/1.0)
| | | | purpose = domestic appliance: good (0.0)
| | | | purpose = repairs: good (0.0)
| | | | purpose = education: good (0.0)
| | | | purpose = vacation: good (0.0)
| | | | purpose = retraining: good (0.0)
| | | | purpose = business: good (0.0)
| | | | purpose = other: good (0.0)
| | savings_status = 100<=X<500
| | | purpose = new car: bad (15.0/5.0)
| | | purpose = used car: good (3.0)
| | | purpose = furniture/equipment: bad (4.0/1.0)
| | | purpose = radio/tv: bad (8.0/2.0)
| | | purpose = domestic appliance: good (0.0)
| | | purpose = repairs: good (2.0)
| | | purpose = education: good (0.0)
| | | purpose = vacation: good (0.0)
| | | purpose = retraining: good (0.0)
| | | purpose = business
| | | | housing = rent
| | | | | existing_credits <= 1: good (2.0)
| | | | | existing_credits > 1: bad (2.0)
| | | | housing = own: good (6.0)
| | | | housing = for free: bad (1.0)
| | | purpose = other: good (1.0)
| | savings_status = 500<=X<1000: good (11.0/3.0)
| | savings_status = >=1000: good (13.0/3.0)
| | savings_status = no known savings: good (41.0/5.0)
| credit_amount > 9857: bad (20.0/3.0)
checking_status = >=200: good (63.0/14.0)
checking_status = no checking: good (394.0/46.0)

Number of Leaves : 103

Size of the tree : 140

Time taken to build model: 0.11 seconds

=== Evaluation on training set ===

Time taken to test model on training data: 0.02 seconds

=== Summary ===

Correctly Classified Instances 855 85.5 %


Incorrectly Classified Instances 145 14.5 %
Kappa statistic 0.6251
Mean absolute error 0.2312
Root mean squared error 0.34
Relative absolute error 55.0377 %
Root relative squared error 74.2015 %
Total Number of Instances 1000

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.956    0.380    0.854      0.956   0.902      0.640  0.857     0.905     good
                 0.620    0.044    0.857      0.620   0.720      0.640  0.857     0.783     bad
Weighted Avg.    0.855    0.279    0.855      0.855   0.847      0.640  0.857     0.869

=== Confusion Matrix ===

a b <-- classified as
669 31 | a = good
114 186 | b = bad
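
The summary figures can be reproduced from the confusion matrix above. The short sketch below is not part of Weka; the function names are chosen for this example. It recomputes the overall accuracy and the per-class TP rate and precision from the four counts.

```python
# Confusion matrix from the run above: outer key is the actual class,
# inner key is the predicted class.
confusion = {
    "good": {"good": 669, "bad": 31},
    "bad": {"good": 114, "bad": 186},
}

def accuracy(matrix):
    """Fraction of instances on the diagonal."""
    correct = sum(matrix[c][c] for c in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total

def tp_rate(matrix, cls):
    """Correctly classified instances of cls divided by all actual cls."""
    return matrix[cls][cls] / sum(matrix[cls].values())

def precision(matrix, cls):
    """Correctly classified instances of cls divided by all predicted cls."""
    predicted = sum(row[cls] for row in matrix.values())
    return matrix[cls][cls] / predicted

print(f"accuracy = {accuracy(confusion):.3f}")
for cls in confusion:
    print(f"{cls}: TP rate = {tp_rate(confusion, cls):.3f}, "
          f"precision = {precision(confusion, cls):.3f}")
```

The low TP rate for the bad class (0.620) shows that most of the training errors are bad applicants classified as good.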
Task 5

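Is testing on the training set as you did above a good idea? Why or why not?
Ans) No, it is not a good idea. The model has already seen every instance it is tested on, so the
85.5% training accuracy is an optimistic estimate that says little about performance on new
applicants. The 10-fold cross-validation run in Task 6 gives 70.5% for the same tree settings.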
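
A toy illustration of the problem, using synthetic data invented for this sketch (not the credit data): a classifier that simply memorises its training examples scores perfectly on them, yet does much worse on examples it has not seen. The labelling rule and all names below are assumptions made for the example.

```python
from collections import Counter

def label_for(i):
    # Synthetic labelling rule, invented for this illustration only.
    return "good" if (i * 7) % 10 < 7 else "bad"

data = [(i, label_for(i)) for i in range(100)]
train = [row for row in data if row[0] % 2 == 0]  # even ids for training
test = [row for row in data if row[0] % 2 == 1]   # odd ids held out

# "Train" by memorising every training example; unseen inputs fall back
# to the majority class of the training set.
memory = dict(train)
majority = Counter(label for _, label in train).most_common(1)[0][0]

def predict(x):
    return memory.get(x, majority)

def accuracy(rows):
    return sum(predict(x) == y for x, y in rows) / len(rows)

train_acc = accuracy(train)
test_acc = accuracy(test)
print(f"training accuracy = {train_acc:.2f}, held-out accuracy = {test_acc:.2f}")
```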
Task 6
One approach for solving the problem encountered in the previous question is using
cross-validation. Describe briefly what cross-validation is. Train a decision tree again using
cross-validation and report your results. Does accuracy increase/decrease? Why?

Ans) steps followed are:


1. Double click on credit-g.arff file.
2. Click on classify tab.
3. Click on choose button.
4. Expand tree folder and select J48
5. Select Cross-validation under Test options.
6. Set Folds to 10.
7. Click on Start.
8. Change Folds to 5.
9. Click on Start again.
10. Change Folds to 2.
11. Click on Start.
12. Right-click the entry in the Result list and choose Visualize tree.

Output:

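Cross-validation definition: The data is split into k folds of equal size, where k is the number
entered in the Folds text field. The classifier is trained on k-1 folds and tested on the
remaining fold. This is repeated k times so that every instance is used for testing exactly once,
and the results are combined. The tree printed in each run is still built from the full dataset,
so it is the same tree as in Task 4; only the evaluation changes.
In the Classify tab, select the Cross-validation option, set Folds to 10 and press Start. Then
repeat with Folds set to 5 and with Folds set to 2. The runs are reported below in that order.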
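
The fold construction can be sketched as follows. This is a simplified illustration, not Weka's implementation: it deals the instances of each class round-robin into k folds without shuffling, and the function name is hypothetical. The class counts (700 good, 300 bad) are taken from the confusion matrices in this report.

```python
def stratified_folds(labels, k):
    """Split instance indices into k folds, keeping class proportions equal."""
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    # Deal the indices of each class round-robin across the folds.
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds

labels = ["good"] * 700 + ["bad"] * 300  # class counts of the credit data
folds = stratified_folds(labels, 10)

for test_fold in folds:
    held_out = set(test_fold)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    # A model would be trained on train_idx and scored on test_fold here.

print([len(fold) for fold in folds])
```

Each fold holds 100 instances with the same 70/30 class mix as the whole dataset, which is what "stratified" refers to in the Weka output.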
=== Run information ===

Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2


Relation: german_credit
Instances: 1000
Attributes: 21
checking_status
duration
credit_history
purpose
credit_amount
savings_status
employment
installment_commitment
personal_status
other_parties
residence_since
property_magnitude
age
other_payment_plans
housing
existing_credits
job
num_dependents
own_telephone
foreign_worker
class
Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree


------------------

checking_status = <0
| foreign_worker = yes
| | duration <= 11
| | | existing_credits <= 1
| | | | property_magnitude = real estate: good (8.0/1.0)
| | | | property_magnitude = life insurance
| | | | | own_telephone = none: bad (2.0)
| | | | | own_telephone = yes: good (4.0)
| | | | property_magnitude = car: good (2.0/1.0)
| | | | property_magnitude = no known property: bad (3.0)
| | | existing_credits > 1: good (14.0)
| | duration > 11
| | | job = unemp/unskilled non res: bad (5.0/1.0)
| | | job = unskilled resident
| | | | purpose = new car
| | | | | own_telephone = none: bad (10.0/2.0)
| | | | | own_telephone = yes: good (2.0)
| | | | purpose = used car: bad (1.0)
| | | | purpose = furniture/equipment
| | | | | employment = unemployed: good (0.0)
| | | | | employment = <1: bad (3.0)
| | | | | employment = 1<=X<4: good (4.0)
| | | | | employment = 4<=X<7: good (1.0)
| | | | | employment = >=7: good (2.0)
| | | | purpose = radio/tv
| | | | | existing_credits <= 1: bad (10.0/3.0)
| | | | | existing_credits > 1: good (2.0)
| | | | purpose = domestic appliance: bad (1.0)
| | | | purpose = repairs: bad (1.0)
| | | | purpose = education: bad (1.0)
| | | | purpose = vacation: bad (0.0)
| | | | purpose = retraining: good (1.0)
| | | | purpose = business: good (3.0)
| | | | purpose = other: good (1.0)
| | | job = skilled
| | | | other_parties = none
| | | | | duration <= 30
| | | | | | savings_status = <100
| | | | | | | credit_history = no credits/all paid: bad (8.0/1.0)
| | | | | | | credit_history = all paid: bad (6.0)
| | | | | | | credit_history = existing paid
| | | | | | | | own_telephone = none
| | | | | | | | | existing_credits <= 1
| | | | | | | | | | property_magnitude = real estate
| | | | | | | | | | | age <= 26: bad (5.0)
| | | | | | | | | | | age > 26: good (2.0)
| | | | | | | | | | property_magnitude = life insurance: bad (7.0/2.0)
| | | | | | | | | | property_magnitude = car
| | | | | | | | | | | credit_amount <= 1386: bad (3.0)
| | | | | | | | | | | credit_amount > 1386: good (11.0/1.0)
| | | | | | | | | | property_magnitude = no known property: good (2.0)
| | | | | | | | | existing_credits > 1: bad (3.0)
| | | | | | | | own_telephone = yes: bad (5.0)
| | | | | | | credit_history = delayed previously: bad (4.0)
| | | | | | | credit_history = critical/other existing credit: good (14.0/4.0)
| | | | | | savings_status = 100<=X<500
| | | | | | | credit_history = no credits/all paid: good (0.0)
| | | | | | | credit_history = all paid: good (1.0)
| | | | | | | credit_history = existing paid: bad (3.0)
| | | | | | | credit_history = delayed previously: good (0.0)
| | | | | | | credit_history = critical/other existing credit: good (2.0)
| | | | | | savings_status = 500<=X<1000: good (4.0/1.0)
| | | | | | savings_status = >=1000: good (4.0)
| | | | | | savings_status = no known savings
| | | | | | | existing_credits <= 1
| | | | | | | | own_telephone = none: bad (9.0/1.0)
| | | | | | | | own_telephone = yes: good (4.0/1.0)
| | | | | | | existing_credits > 1: good (2.0)
| | | | | duration > 30: bad (30.0/3.0)
| | | | other_parties = co applicant: bad (7.0/1.0)
| | | | other_parties = guarantor: good (12.0/3.0)
| | | job = high qualif/self emp/mgmt: good (30.0/8.0)
| foreign_worker = no: good (15.0/2.0)
checking_status = 0<=X<200
| credit_amount <= 9857
| | savings_status = <100
| | | other_parties = none
| | | | duration <= 42
| | | | | personal_status = male div/sep: bad (8.0/2.0)
| | | | | personal_status = female div/dep/mar
| | | | | | purpose = new car: bad (5.0/1.0)
| | | | | | purpose = used car: bad (1.0)
| | | | | | purpose = furniture/equipment
| | | | | | | duration <= 10: bad (3.0)
| | | | | | | duration > 10
| | | | | | | | duration <= 21: good (6.0/1.0)
| | | | | | | | duration > 21: bad (2.0)
| | | | | | purpose = radio/tv: good (8.0/2.0)
| | | | | | purpose = domestic appliance: good (0.0)
| | | | | | purpose = repairs: good (1.0)
| | | | | | purpose = education: good (4.0/2.0)
| | | | | | purpose = vacation: good (0.0)
| | | | | | purpose = retraining: good (0.0)
| | | | | | purpose = business
| | | | | | | residence_since <= 2: good (3.0)
| | | | | | | residence_since > 2: bad (2.0)
| | | | | | purpose = other: good (0.0)
| | | | | personal_status = male single: good (52.0/15.0)
| | | | | personal_status = male mar/wid
| | | | | | duration <= 10: good (6.0)
| | | | | | duration > 10: bad (10.0/3.0)
| | | | | personal_status = female single: good (0.0)
| | | | duration > 42: bad (7.0)
| | | other_parties = co applicant: good (2.0)
| | | other_parties = guarantor
| | | | purpose = new car: bad (2.0)
| | | | purpose = used car: good (0.0)
| | | | purpose = furniture/equipment: good (0.0)
| | | | purpose = radio/tv: good (18.0/1.0)
| | | | purpose = domestic appliance: good (0.0)
| | | | purpose = repairs: good (0.0)
| | | | purpose = education: good (0.0)
| | | | purpose = vacation: good (0.0)
| | | | purpose = retraining: good (0.0)
| | | | purpose = business: good (0.0)
| | | | purpose = other: good (0.0)
| | savings_status = 100<=X<500
| | | purpose = new car: bad (15.0/5.0)
| | | purpose = used car: good (3.0)
| | | purpose = furniture/equipment: bad (4.0/1.0)
| | | purpose = radio/tv: bad (8.0/2.0)
| | | purpose = domestic appliance: good (0.0)
| | | purpose = repairs: good (2.0)
| | | purpose = education: good (0.0)
| | | purpose = vacation: good (0.0)
| | | purpose = retraining: good (0.0)
| | | purpose = business
| | | | housing = rent
| | | | | existing_credits <= 1: good (2.0)
| | | | | existing_credits > 1: bad (2.0)
| | | | housing = own: good (6.0)
| | | | housing = for free: bad (1.0)
| | | purpose = other: good (1.0)
| | savings_status = 500<=X<1000: good (11.0/3.0)
| | savings_status = >=1000: good (13.0/3.0)
| | savings_status = no known savings: good (41.0/5.0)
| credit_amount > 9857: bad (20.0/3.0)
checking_status = >=200: good (63.0/14.0)
checking_status = no checking: good (394.0/46.0)

Number of Leaves : 103

Size of the tree : 140

Time taken to build model: 0.03 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances 705 70.5 %


Incorrectly Classified Instances 295 29.5 %
Kappa statistic 0.2467
Mean absolute error 0.3467
Root mean squared error 0.4796
Relative absolute error 82.5233 %
Root relative squared error 104.6565 %
Total Number of Instances 1000

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.840    0.610    0.763      0.840   0.799      0.251  0.639     0.746     good
                 0.390    0.160    0.511      0.390   0.442      0.251  0.639     0.449     bad
Weighted Avg.    0.705    0.475    0.687      0.705   0.692      0.251  0.639     0.657

=== Confusion Matrix ===

a b <-- classified as
588 112 | a = good
183 117 | b = bad
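
The Kappa statistic in the summary can be checked against this confusion matrix. The sketch below uses the standard definition of Cohen's kappa (observed agreement corrected for the agreement expected by chance); the function name is chosen for this example.

```python
def kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows actual, columns predicted)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix)
        for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1 - expected)

cv10 = [[588, 112], [183, 117]]  # 10-fold cross-validation matrix from above
print(f"kappa = {kappa(cv10):.4f}")
```

A kappa of about 0.25 means the cross-validated tree is only modestly better than a classifier that guesses according to the class proportions.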

=== Run information ===

Test mode: 5-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree


------------------

Time taken to build model: 0.03 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances 733 73.3 %


Incorrectly Classified Instances 267 26.7 %
Kappa statistic 0.3264
Mean absolute error 0.3293
Root mean squared error 0.4579
Relative absolute error 78.3705 %
Root relative squared error 99.914 %
Total Number of Instances 1000
=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.851    0.543    0.785      0.851   0.817      0.330  0.685     0.789     good
                 0.457    0.149    0.568      0.457   0.506      0.330  0.685     0.483     bad
Weighted Avg.    0.733    0.425    0.720      0.733   0.724      0.330  0.685     0.697

=== Confusion Matrix ===

a b <-- classified as
596 104 | a = good
163 137 | b = bad

Test mode: 2-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree


------------------

Time taken to build model: 0 seconds


=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances 721 72.1 %


Incorrectly Classified Instances 279 27.9 %
Kappa statistic 0.2443
Mean absolute error 0.3407
Root mean squared error 0.4669
Relative absolute error 81.0491 %
Root relative squared error 101.8806 %
Total Number of Instances 1000

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.891    0.677    0.755      0.891   0.817      0.260  0.662     0.776     good
                 0.323    0.109    0.561      0.323   0.410      0.260  0.662     0.464     bad
Weighted Avg.    0.721    0.506    0.696      0.721   0.695      0.260  0.662     0.682

=== Confusion Matrix ===

a b <-- classified as
624 76 | a = good
203 97 | b = bad

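Note: Accuracy is 70.5% with 10 folds, 73.3% with 5 folds and 72.1% with 2 folds. All three are
well below the 85.5% obtained when testing on the training set, because in cross-validation each
instance is classified by a tree that was not trained on it. The three fold settings differ from
one another by only a few percentage points.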
Task 7
Check to see if the data shows a bias against “foreign workers” or “personal-status”.

One way to do this is to remove these attributes from the data set and see if the decision
tree created in those cases is significantly different from the full dataset case which you
have already done. Did removing these attributes have any significant effect? Discuss.

Ans) steps followed are:


1. Double click on credit-g.arff file.
2. Click on classify tab.
3. Click on choose button.
4. Expand tree folder and select J48
5. Select Cross-validation under Test options.
6. Set Folds to 10.
7. Click on Start.
8. Visualize the resulting tree.
9. Now click on the Preprocess tab.
10. Select attributes 9 (personal_status) and 20 (foreign_worker).
11. Click on the Remove button.
12. Go to the Classify tab.
13. Choose the J48 tree.
14. Select Cross-validation with 10 folds.
15. Click on the Start button.
16. Right-click the entry in the Result list and choose Visualize tree.

Output:

We use the Preprocess tab in the Weka GUI Explorer to remove the attributes “foreign_worker”
and “personal_status”, first one at a time and then together. In the Classify tab, select
Cross-validation with 10 folds and press the Start button. The accuracy of each run can then be
compared with the result for the full dataset.
=== Run information ===

Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2


Relation: german_credit-weka.filters.unsupervised.attribute.Remove-R9
Instances: 1000
Attributes: 20
Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree


Number of Leaves : 105

Size of the tree : 148

Time taken to build model: 0.02 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances 703 70.3 %


Incorrectly Classified Instances 297 29.7 %
Kappa statistic 0.2369
Mean absolute error 0.346
Root mean squared error 0.4828
Relative absolute error 82.3413 %
Root relative squared error 105.3514 %
Total Number of Instances 1000

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.843    0.623    0.759      0.843   0.799      0.242  0.631     0.744     good
                 0.377    0.157    0.507      0.377   0.432      0.242  0.631     0.445     bad
Weighted Avg.    0.703    0.483    0.684      0.703   0.689      0.242  0.631     0.654

=== Confusion Matrix ===

a b <-- classified as
590 110 | a = good
187 113 | b = bad

=== Run information ===

Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2


Relation: german_credit-weka.filters.unsupervised.attribute.Remove-R20
Instances: 1000
Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree


Number of Leaves : 104

Size of the tree : 144

Time taken to build model: 0.02 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances 720 72 %


Incorrectly Classified Instances 280 28 %
Kappa statistic 0.2972
Mean absolute error 0.3304
Root mean squared error 0.4729
Relative absolute error 78.6289 %
Root relative squared error 103.1843 %
Total Number of Instances 1000

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.839    0.557    0.779      0.839   0.807      0.300  0.677     0.780     good
                 0.443    0.161    0.541      0.443   0.487      0.300  0.677     0.477     bad
Weighted Avg.    0.720    0.438    0.707      0.720   0.711      0.300  0.677     0.689

=== Confusion Matrix ===

a b <-- classified as
587 113 | a = good
167 133 | b = bad
=== Run information ===

Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2


Relation: german_credit-weka.filters.unsupervised.attribute.Remove-R9,20
Instances: 1000
Attributes: 19
checking_status
duration
credit_history
purpose
credit_amount
savings_status
employment
installment_commitment
other_parties
residence_since
property_magnitude
age
other_payment_plans
housing
existing_credits
job
num_dependents
own_telephone
class
Test mode: 10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree


------------------

checking_status = <0
| duration <= 11
| | existing_credits <= 1
| | | property_magnitude = real estate: good (9.0/1.0)
| | | property_magnitude = life insurance
| | | | own_telephone = none: bad (2.0)
| | | | own_telephone = yes: good (4.0)
| | | property_magnitude = car: good (2.0/1.0)
| | | property_magnitude = no known property: bad (3.0)
| | existing_credits > 1: good (19.0)
| duration > 11
| | job = unemp/unskilled non res: bad (5.0/1.0)
| | job = unskilled resident
| | | property_magnitude = real estate
| | | | existing_credits <= 1
| | | | | num_dependents <= 1
| | | | | | installment_commitment <= 2: good (3.0)
| | | | | | installment_commitment > 2: bad (10.0/4.0)
| | | | | num_dependents > 1: bad (2.0)
| | | | existing_credits > 1: good (3.0)
| | | property_magnitude = life insurance
| | | | duration <= 18: good (9.0)
| | | | duration > 18: bad (3.0/1.0)
| | | property_magnitude = car: bad (12.0/5.0)
| | | property_magnitude = no known property: bad (5.0)
| | job = skilled
| | | other_parties = none
| | | | duration <= 30
| | | | | savings_status = <100
| | | | | | credit_history = no credits/all paid: bad (8.0/1.0)
| | | | | | credit_history = all paid: bad (6.0)
| | | | | | credit_history = existing paid
| | | | | | | own_telephone = none
| | | | | | | | employment = unemployed: good (3.0/1.0)
| | | | | | | | employment = <1
| | | | | | | | | property_magnitude = real estate: good (2.0)
| | | | | | | | | property_magnitude = life insurance: bad (4.0)
| | | | | | | | | property_magnitude = car: good (3.0)
| | | | | | | | | property_magnitude = no known property: good (1.0)
| | | | | | | | employment = 1<=X<4
| | | | | | | | | age <= 26: bad (7.0/1.0)
| | | | | | | | | age > 26: good (7.0/1.0)
| | | | | | | | employment = 4<=X<7: bad (5.0)
| | | | | | | | employment = >=7: good (2.0)
| | | | | | | own_telephone = yes: bad (5.0)
| | | | | | credit_history = delayed previously: bad (4.0)
| | | | | | credit_history = critical/other existing credit: good (14.0/4.0)
| | | | | savings_status = 100<=X<500
| | | | | | credit_history = no credits/all paid: good (0.0)
| | | | | | credit_history = all paid: good (1.0)
| | | | | | credit_history = existing paid: bad (3.0)
| | | | | | credit_history = delayed previously: good (0.0)
| | | | | | credit_history = critical/other existing credit: good (2.0)
| | | | | savings_status = 500<=X<1000: good (4.0/1.0)
| | | | | savings_status = >=1000: good (4.0)
| | | | | savings_status = no known savings
| | | | | | own_telephone = none
| | | | | | | installment_commitment <= 3: good (3.0/1.0)
| | | | | | | installment_commitment > 3: bad (7.0)
| | | | | | own_telephone = yes: good (6.0/1.0)
| | | | duration > 30: bad (30.0/3.0)
| | | other_parties = co applicant: bad (7.0/1.0)
| | | other_parties = guarantor: good (14.0/4.0)
| | job = high qualif/self emp/mgmt: good (31.0/9.0)
checking_status = 0<=X<200
| credit_amount <= 9857
| | savings_status = <100
| | | duration <= 42
| | | | purpose = new car
| | | | | employment = unemployed
| | | | | | installment_commitment <= 3: good (2.0)
| | | | | | installment_commitment > 3: bad (3.0)
| | | | | employment = <1: bad (7.0/2.0)
| | | | | employment = 1<=X<4: good (5.0/2.0)
| | | | | employment = 4<=X<7: good (5.0/1.0)
| | | | | employment = >=7: bad (5.0)
| | | | purpose = used car
| | | | | residence_since <= 3: good (6.0)
| | | | | residence_since > 3: bad (3.0/1.0)
| | | | purpose = furniture/equipment
| | | | | other_payment_plans = bank: good (2.0/1.0)
| | | | | other_payment_plans = stores: good (2.0)
| | | | | other_payment_plans = none
| | | | | | housing = rent: good (5.0/1.0)
| | | | | | housing = own: bad (14.0/5.0)
| | | | | | housing = for free: bad (0.0)
| | | | purpose = radio/tv: good (45.0/8.0)
| | | | purpose = domestic appliance: good (1.0)
| | | | purpose = repairs
| | | | | installment_commitment <= 3: good (3.0)
| | | | | installment_commitment > 3: bad (3.0/1.0)
| | | | purpose = education
| | | | | age <= 33: good (2.0)
| | | | | age > 33: bad (3.0/1.0)
| | | | purpose = vacation: good (0.0)
| | | | purpose = retraining: good (1.0)
| | | | purpose = business
| | | | | residence_since <= 3: good (10.0/2.0)
| | | | | residence_since > 3: bad (5.0)
| | | | purpose = other: good (1.0)
| | | duration > 42: bad (7.0)
| | savings_status = 100<=X<500
| | | purpose = new car
| | | | property_magnitude = real estate: bad (0.0)
| | | | property_magnitude = life insurance: bad (6.0)
| | | | property_magnitude = car
| | | | | residence_since <= 2: good (3.0)
| | | | | residence_since > 2: bad (4.0/1.0)
| | | | property_magnitude = no known property: good (2.0/1.0)
| | | purpose = used car: good (3.0)
| | | purpose = furniture/equipment: bad (4.0/1.0)
| | | purpose = radio/tv: bad (8.0/2.0)
| | | purpose = domestic appliance: good (0.0)
| | | purpose = repairs: good (2.0)
| | | purpose = education: good (0.0)
| | | purpose = vacation: good (0.0)
| | | purpose = retraining: good (0.0)
| | | purpose = business
| | | | housing = rent
| | | | | existing_credits <= 1: good (2.0)
| | | | | existing_credits > 1: bad (2.0)
| | | | housing = own: good (6.0)
| | | | housing = for free: bad (1.0)
| | | purpose = other: good (1.0)
| | savings_status = 500<=X<1000: good (11.0/3.0)
| | savings_status = >=1000: good (13.0/3.0)
| | savings_status = no known savings: good (41.0/5.0)
| credit_amount > 9857: bad (20.0/3.0)
checking_status = >=200
| property_magnitude = real estate
| | installment_commitment <= 3: good (15.0/3.0)
| | installment_commitment > 3: bad (6.0/1.0)
| property_magnitude = life insurance: good (12.0)
| property_magnitude = car: good (21.0/3.0)
| property_magnitude = no known property
| | num_dependents <= 1: good (7.0/1.0)
| | num_dependents > 1: bad (2.0)
checking_status = no checking: good (394.0/46.0)

Number of Leaves : 97

Size of the tree : 139

Time taken to build model: 0 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances 716 71.6 %


Incorrectly Classified Instances 284 28.4 %
Kappa statistic 0.2843
Mean absolute error 0.3328
Root mean squared error 0.477
Relative absolute error 79.2118 %
Root relative squared error 104.0916 %
Total Number of Instances 1000

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.839    0.570    0.774      0.839   0.805      0.287  0.660     0.766     good
                 0.430    0.161    0.533      0.430   0.476      0.287  0.660     0.471     bad
Weighted Avg.    0.716    0.447    0.702      0.716   0.706      0.287  0.660     0.678

=== Confusion Matrix ===

a b <-- classified as
587 113 | a = good
171 129 | b = bad
Note: Under 10-fold cross-validation the full dataset gives 70.5% accuracy. Removing
“personal_status” (attribute 9) gives 70.3%, removing “foreign_worker” (attribute 20) gives
72.0%, and removing both gives 71.6%. The tree size also changes only slightly (140 for the full
dataset; 148, 144 and 139 after the removals). Accuracy does not fall when “foreign_worker” is
removed, and every change is at most 1.5 percentage points, so removing these attributes has no
significant effect on the classifier.
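
The four 10-fold cross-validation runs in this task can be put side by side with a few lines of Python. The counts are copied from the Weka summaries in this report (the full-dataset figure comes from Task 6); the variable names are chosen for this sketch.

```python
# Correctly classified instances (out of 1000) under 10-fold cross-validation,
# copied from the Weka summaries in this report.
runs = {
    "all attributes": 705,
    "without personal_status": 703,
    "without foreign_worker": 720,
    "without both": 716,
}
TOTAL = 1000
baseline = runs["all attributes"]

# Change in accuracy relative to the full dataset, in percentage points.
changes = {
    name: round((correct - baseline) / TOTAL * 100, 1)
    for name, correct in runs.items()
}

for name, delta in changes.items():
    print(f"{name:26s} {runs[name] / TOTAL:.1%}  ({delta:+.1f} points)")
```

The largest change is +1.5 percentage points, which is smaller than the spread seen between the 2-, 5- and 10-fold runs on the unchanged dataset in Task 6.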
Task 8
Another question might be, do you really need to input so many attributes to get good results?
Maybe only a few would do. For example, you could try just having attributes 2,3,5,7,10,17 and
21. Try out some combinations. (You had removed two attributes in problem 7. Remember to
reload the arff data file to get all the attributes initially before you start selecting the ones you
want.)

Ans) The steps followed are:

1. Double click on the credit-g.arff file.
2. Tick the check boxes of attributes 2, 3, 5, 7, 10, 17 and 21.
3. Click on Invert.
4. Click on Remove.
5. Click on the Classify tab.
6. Click on Choose, expand trees and select J48 as the algorithm.
7. Select Cross-validation and set Folds to 2.
8. Click on Start.
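
Ticking the attributes to keep and then clicking Invert and Remove amounts to removing the complement of the selected set. The sketch below, in plain Python with the illustrative helper name `to_weka_range`, computes that complement for the 21 attributes and writes it as a comma-separated index range. It is only a check on the selection, not a call into Weka.

```python
def to_weka_range(indices):
    """Collapse 1-based attribute indices into a range string such as '1,4,6,8-9'."""
    parts = []
    start = prev = None
    for index in sorted(indices):
        if start is None:
            start = prev = index
        elif index == prev + 1:
            prev = index                      # extend the current run
        else:
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = index              # begin a new run
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ",".join(parts)


TOTAL_ATTRIBUTES = 21
keep = {2, 3, 5, 7, 10, 17, 21}               # attributes ticked in step 2
removed = set(range(1, TOTAL_ATTRIBUTES + 1)) - keep
remove_range = to_weka_range(removed)
print(remove_range)
```

The resulting string can be compared with the `Remove-R...` suffix that Weka appends to the relation name in the run information for this task.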
OUTPUT:
The run below uses only the seven selected attributes with 2-fold cross-validation. In addition, each of these attributes is removed from the full dataset in turn to see how the accuracy changes. For each attribute in the list, remove it in the Preprocess tab of the Weka GUI Explorer, select the Use training set option in the Classify tab, press the Start button, and compare the accuracy with that of the full dataset. Before moving on to the next attribute, press Undo in the Preprocess tab to restore the attribute just removed.
1. 2nd attribute (duration)
2. 3rd attribute (credit_history)
3. 5th attribute (credit_amount)
4. 7th attribute (employment)
5. 10th attribute (other_parties)
6. 17th attribute (job)
7. 21st attribute (class)

=== Run information ===

Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2


Relation: german_credit-weka.filters.unsupervised.attribute.Remove-R1,4,6,8-9,11-16,18-20
Instances: 1000
Attributes: 7
duration
credit_history
credit_amount
employment
other_parties
job
class
Test mode: 2-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree


------------------

credit_history = no credits/all paid: bad (40.0/15.0)
credit_history = all paid
| employment = unemployed
| | duration <= 36: bad (3.0)
| | duration > 36: good (2.0)
| employment = <1
| | duration <= 26: bad (7.0/1.0)
| | duration > 26: good (2.0)
| employment = 1<=X<4: good (15.0/6.0)
| employment = 4<=X<7: bad (10.0/4.0)
| employment = >=7
| | job = unemp/unskilled non res: bad (0.0)
| | job = unskilled resident: good (3.0)
| | job = skilled: bad (3.0)
| | job = high qualif/self emp/mgmt: bad (4.0)
credit_history = existing paid
| credit_amount <= 8648
| | duration <= 40: good (476.0/130.0)
| | duration > 40: bad (27.0/8.0)
| credit_amount > 8648: bad (27.0/7.0)
credit_history = delayed previously
| employment = unemployed
| | credit_amount <= 2186: bad (4.0/1.0)
| | credit_amount > 2186: good (2.0)
| employment = <1
| | duration <= 18: good (2.0)
| | duration > 18: bad (10.0/2.0)
| employment = 1<=X<4: good (33.0/6.0)
| employment = 4<=X<7
| | credit_amount <= 4530
| | | credit_amount <= 1680: good (3.0)
| | | credit_amount > 1680: bad (3.0)
| | credit_amount > 4530: good (11.0)
| employment = >=7
| | job = unemp/unskilled non res: good (0.0)
| | job = unskilled resident: good (2.0/1.0)
| | job = skilled: good (14.0/4.0)
| | job = high qualif/self emp/mgmt: bad (4.0/1.0)
credit_history = critical/other existing credit: good (293.0/50.0)

Number of Leaves : 27

Size of the tree : 40

Time taken to build model: 0.02 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances 697 69.7 %


Incorrectly Classified Instances 303 30.3 %
Kappa statistic 0.1967
Mean absolute error 0.3868
Root mean squared error 0.4606
Relative absolute error 92.0257 %
Root relative squared error 100.5167 %
Total Number of Instances 1000

=== Detailed Accuracy By Class ===


                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.860    0.683    0.746      0.860   0.799      0.205  0.615     0.764     good
                 0.317    0.140    0.492      0.317   0.385      0.205  0.615     0.391     bad
Weighted Avg.    0.697    0.520    0.670      0.697   0.675      0.205  0.615     0.652

=== Confusion Matrix ===

a b <-- classified as
602 98 | a = good
205 95 | b = bad
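
The Kappa statistic in the summary measures how far the observed agreement exceeds the agreement expected by chance from the row and column totals. A small Python sketch of that calculation follows. The function name `cohen_kappa` is an illustrative choice, and the two matrices are the ones reported for the seven-attribute run and for the run before Task 8.

```python
def cohen_kappa(matrix):
    """Cohen's kappa for a square confusion matrix (rows = actual, columns = predicted)."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = sum(matrix[i][i] for i in range(size)) / total
    # Chance agreement: product of row total and column total for each class.
    expected = sum(
        sum(matrix[i]) * sum(row[i] for row in matrix) for i in range(size)
    ) / (total * total)
    return (observed - expected) / (1 - expected)


kappa_selected = cohen_kappa([[602, 98], [205, 95]])    # 7 selected attributes
kappa_full = cohen_kappa([[587, 113], [171, 129]])      # run reported before Task 8

print(f"{kappa_selected:.4f} {kappa_full:.4f}")
```

Both values round to the Kappa figures printed by Weka (0.1967 and 0.2843), so the drop in Kappa reflects the larger number of bad customers classified as good when only seven attributes are used.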

OUTPUT:
We use the Preprocess tab in the Weka GUI Explorer to remove the 2nd attribute (duration). In the
Classify tab, select the Use training set option and press the Start button. With this attribute
removed, the accuracy changes compared with the full dataset.

=== Evaluation on training set ===


=== Summary ===

Correctly Classified Instances 841 84.1 %


Incorrectly Classified Instances 159 15.9 %
=== Confusion Matrix ===

a b <-- classified as
647 53 | a = good
106 194 | b = bad

Press Undo in the Preprocess tab to restore the previously removed attribute. Then use the
Preprocess tab in the Weka GUI Explorer to remove the 3rd attribute (credit_history). In the
Classify tab, select the Use training set option and press the Start button, and compare the
accuracy with that of the full dataset.

=== Evaluation on training set ===


=== Summary ===

Correctly Classified Instances 839 83.9 %


Incorrectly Classified Instances 161 16.1 %

=== Confusion Matrix ===

a b <-- classified as
645 55 | a = good
106 194 | b = bad

Press Undo in the Preprocess tab to restore the previously removed attribute. Then use the
Preprocess tab in the Weka GUI Explorer to remove the 5th attribute (credit_amount). In the
Classify tab, select the Use training set option and press the Start button, and compare the
accuracy with that of the full dataset.

=== Evaluation on training set ===


=== Summary ===

Correctly Classified Instances 864 86.4 %


Incorrectly Classified Instances 136 13.6 %
=== Confusion Matrix ===

a b <-- classified as
675 25 | a = good
111 189 | b = bad

Press Undo in the Preprocess tab to restore the previously removed attribute. Then use the
Preprocess tab in the Weka GUI Explorer to remove the 7th attribute (employment). In the
Classify tab, select the Use training set option and press the Start button, and compare the
accuracy with that of the full dataset.

=== Evaluation on training set ===


=== Summary ===

Correctly Classified Instances 858 85.8 %


Incorrectly Classified Instances 142 14.2 %

=== Confusion Matrix ===

a b <-- classified as
670 30 | a = good
112 188 | b = bad

Press Undo in the Preprocess tab to restore the previously removed attribute. Then use the
Preprocess tab in the Weka GUI Explorer to remove the 10th attribute (other_parties). In the
Classify tab, select the Use training set option and press the Start button, and compare the
accuracy with that of the full dataset.

Time taken to build model: 0.05 seconds

=== Evaluation on training set ===


=== Summary ===

Correctly Classified Instances 845 84.5 %

Incorrectly Classified Instances 155 15.5 %

=== Confusion Matrix ===

a b <-- classified as
663 37 | a = good
118 182 | b = bad

Press Undo in the Preprocess tab to restore the previously removed attribute. Then use the
Preprocess tab in the Weka GUI Explorer to remove the 17th attribute (job). In the Classify tab,
select the Use training set option and press the Start button, and compare the accuracy with that
of the full dataset.
=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances 859 85.9 %


Incorrectly Classified Instances 141 14.1 %

=== Confusion Matrix ===

a b <-- classified as
675 25 | a = good
116 184 | b = bad

Press Undo in the Preprocess tab to restore the previously removed attribute. Then use the
Preprocess tab in the Weka GUI Explorer to remove the 21st attribute (class). In the Classify
tab, select the Use training set option and press the Start button, and compare the accuracy with
that of the full dataset.

=== Evaluation on training set ===


=== Summary ===
Correctly Classified Instances 963 96.3 %
Incorrectly Classified Instances 37 3.7 %
=== Confusion Matrix ===

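a b <-- classified as
963 0 | a = yes
37 0 | b = no

Note: Compared with the 85.5 % training-set accuracy that J48 reaches on the full dataset, removing
the 3rd attribute (credit_history) lowers the accuracy the most (83.9 %), so this attribute is
important for classification. Removing the 2nd (duration) or the 10th (other_parties) attribute gives
nearly the same accuracy (84.1 % and 84.5 %), and removing the 7th (employment) or the 17th (job)
attribute also gives nearly the same accuracy (85.8 % and 85.9 %), so within each pair either
attribute could be removed with a similar effect. Removing the 5th attribute (credit_amount) raises
the accuracy to 86.4 %, so this attribute may not be needed for the classification. The 96.3 %
obtained after removing the 21st attribute is not comparable with the other figures: attribute 21
is the class itself, and the confusion matrix (yes/no instead of good/bad) shows that the tree was
then predicting a different attribute. The class attribute must therefore be kept.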
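
The comparison in the note can be tabulated from the confusion matrices reported for each removal. The sketch below is a hedged illustration in Python. It assumes that the baseline is the J48 training-set matrix for the full dataset (669/31/114/186, reported elsewhere in this document), and the variable names are illustrative.

```python
def correct(matrix):
    """Number of correctly classified instances (diagonal of a 2x2 matrix)."""
    return matrix[0][0] + matrix[1][1]


# Assumed baseline: J48 on all attributes, evaluated on the training set.
baseline = [[669, 31], [114, 186]]

# Confusion matrices obtained after removing one attribute at a time.
after_removal = {
    "duration":       [[647, 53], [106, 194]],
    "credit_history": [[645, 55], [106, 194]],
    "credit_amount":  [[675, 25], [111, 189]],
    "employment":     [[670, 30], [112, 188]],
    "other_parties":  [[663, 37], [118, 182]],
    "job":            [[675, 25], [116, 184]],
}

total = sum(map(sum, baseline))
# Change in accuracy, in percentage points, relative to the baseline.
change = {
    name: 100.0 * (correct(matrix) - correct(baseline)) / total
    for name, matrix in after_removal.items()
}
ranked = sorted(change, key=change.get)       # largest drop first

for name in ranked:
    print(f"{name:15s} {change[name]:+.1f}")
```

Attributes at the top of the printed list are the ones whose removal hurts the training-set accuracy most. Because these are training-set figures, small differences of a few tenths of a percentage point should be read with caution.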
Task 9
Sometimes the cost of rejecting an applicant who actually has good credit might be higher than
the cost of accepting an applicant who has bad credit. Instead of counting the misclassifications
equally in both cases, give a higher cost to the first case (say, cost 5) and a lower cost to the
second case, by using a cost matrix in Weka. Train your decision tree and report the decision tree
and cross-validation results. Are they significantly different from the results obtained in problem 6?

Ans) The steps followed are:

1. Double click on the credit-g.arff file.
2. Click on the Classify tab.
3. Click on the Choose button.
4. Expand the trees folder and select J48.
5. Click on Start.
6. Note down the accuracy values.
7. Now click on the credit-g.arff file again.
8. Click on attributes 2, 3, 5, 7, 10, 17 and 21.
9. Click on Invert.
10. Click on the Classify tab.
11. Choose the J48 algorithm.
12. Select Cross-validation and set Folds to 2.
13. Click on Start and note down the accuracy values.
14. Set the cross-validation folds to 10 and note down the accuracy values.
15. Set the cross-validation folds to 20 and note down the accuracy values.
OUTPUT:
In the Weka GUI Explorer, select the Classify tab and choose the Use training set option. Press the
Choose button and select J48 as the decision tree technique. Press the More options button to open
the Classifier evaluation options window, select Cost-sensitive evaluation, and press the Set
button to open the Cost Matrix Editor. Change Classes to 2 and press the Resize button to get a
2x2 cost matrix. Change the value at location (0,1) to 5; the modified cost matrix is as follows.

0.0 5.0
1.0 0.0
Then close the Cost Matrix Editor, press the OK button, and press the Start button.
=== Evaluation on training set ===
=== Summary ===

Correctly Classified Instances 855 85.5 %


Incorrectly Classified Instances 145 14.5 %

=== Confusion Matrix ===

a b <-- classified as
669 31 | a = good
114 186 | b = bad

Note: Of the 700 good customers, 669 are classified as good and 31 are misclassified as bad. Of the
300 bad customers, 186 are classified as bad and 114 are misclassified as good.
Task 10
Do you think it is a good idea to prefer simple decision trees instead of having long, complex
decision trees? How does the complexity of a decision tree relate to the bias of the model?
Ans)
The steps followed are:
1) Click on the credit-g.arff file.
2) Select all attributes.
3) Click on the Classify tab.
4) Click on Choose and select the J48 algorithm.
5) Select cross-validation with 2 folds.
6) Click on Start.
7) Write down the time taken to build the model.

It is a good idea to prefer simple decision trees over complex ones. A simpler tree has a higher
bias but a lower variance, so it is less likely to overfit the training data.
=== Run information ===

Scheme: weka.classifiers.trees.J48 -C 0.25 -M 2


Relation: german_credit
Instances: 1000
Attributes: 21

Test mode: 2-fold cross-validation


Evaluation cost matrix:
 0 5
 1 0

=== Classifier model (full training set) ===

J48 pruned tree

Number of Leaves : 103

Size of the tree : 140

Time taken to build model: 0 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances 721 72.1 %


Incorrectly Classified Instances 279 27.9 %
Kappa statistic 0.2443
Total Cost 583
Average Cost 0.583
Mean absolute error 0.3407
Root mean squared error 0.4669
Relative absolute error 81.0491 %
Root relative squared error 101.8806 %
Total Number of Instances 1000

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.891    0.677    0.755      0.891   0.817      0.260  0.662     0.776     good
                 0.323    0.109    0.561      0.323   0.410      0.260  0.662     0.464     bad
Weighted Avg.    0.721    0.506    0.696      0.721   0.695      0.260  0.662     0.682

=== Confusion Matrix ===

a b <-- classified as
624 76 | a = good
203 97 | b = bad
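
The Total Cost and Average Cost lines in the summary follow from multiplying each cell of the confusion matrix by the matching cell of the evaluation cost matrix. The Python sketch below reproduces that arithmetic. The function name `total_cost` is illustrative, and rows are taken to be actual classes and columns predicted classes.

```python
def total_cost(confusion, cost):
    """Sum of confusion[i][j] * cost[i][j] over all cells."""
    return sum(
        confusion[i][j] * cost[i][j]
        for i in range(len(confusion))
        for j in range(len(confusion[i]))
    )


confusion = [[624, 76],      # actual good: 624 predicted good, 76 predicted bad
             [203, 97]]      # actual bad: 203 predicted good, 97 predicted bad

weighted = [[0, 5],          # rejecting a good applicant costs 5
            [1, 0]]          # accepting a bad applicant costs 1
uniform = [[0, 1],
           [1, 0]]

instances = sum(map(sum, confusion))
cost_weighted = total_cost(confusion, weighted)
cost_uniform = total_cost(confusion, uniform)

print(cost_weighted, cost_weighted / instances)
print(cost_uniform, cost_uniform / instances)
```

With the weighted matrix the 76 rejected good applicants contribute 380 of the total cost, more than the 203 accepted bad applicants. With uniform costs the total simply equals the number of misclassified instances.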
Task 11
You can make your decision trees simpler by pruning the nodes. One approach is to use
reduced error pruning. Explain this idea briefly. Try reduced error pruning for training your
decision trees using cross-validation and report the decision trees you obtain. Also report
your accuracy using the pruned model. Does your accuracy increase?

Ans)

The steps followed are:

1) Click on the credit-g.arff file.
2) Select all attributes.
3) Click on the Classify tab.
4) Click on Choose and select the REP algorithm.
5) Select cross-validation with 2 folds.
6) Click on Start.
7) Note down the results.

Reduced error pruning holds back part of the training data as a pruning set and replaces a subtree
with a leaf whenever this does not increase the error on that set. We can make our decision tree
simpler by pruning the nodes in this way. In the Weka GUI Explorer, select the Classify tab and
choose the Use training set option. Press the Choose button and select J48 as the decision tree
technique. Click on the text "J48 -C 0.25 -M 2" beside the Choose button to open the Generic
Object Editor, set the reducedErrorPruning property to True, and press OK. Then press the Start
button.

=== Evaluation on training set ===


=== Summary ===
Correctly Classified Instances 786 78.6 %
Incorrectly Classified Instances 214 21.4 %
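=== Confusion Matrix ===
a b <-- classified as
662 38 | a = good
176 124 | b = bad
With the pruned model, the accuracy on the training set decreases to 78.6 %, compared with 85.5 %
for the default J48 tree. Pruning the nodes therefore makes the decision tree simpler, but it does
not increase the accuracy here.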
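
The idea of reduced error pruning can be illustrated on a toy tree. The following Python sketch is not Weka's J48 implementation. The `Node` class, the attribute values and the five-row pruning set are all hypothetical and chosen only to show one subtree being replaced by a leaf.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    label: str                        # majority class of the training rows at this node
    attribute: Optional[str] = None   # None marks a leaf
    children: dict = field(default_factory=dict)


def predict(node, row):
    while node.attribute is not None:
        child = node.children.get(row[node.attribute])
        if child is None:
            break                     # unseen value: use this node's majority class
        node = child
    return node.label


def errors(node, rows):
    return sum(predict(node, row) != row["class"] for row in rows)


def count_leaves(node):
    if node.attribute is None:
        return 1
    return sum(count_leaves(child) for child in node.children.values())


def prune(node, rows):
    """Bottom-up reduced error pruning with held-out rows (modifies the tree in place)."""
    if node.attribute is None:
        return node
    for value, child in list(node.children.items()):
        subset = [row for row in rows if row[node.attribute] == value]
        node.children[value] = prune(child, subset)
    leaf = Node(label=node.label)
    # Replace the subtree when a single leaf is at least as accurate on the pruning set.
    return leaf if errors(leaf, rows) <= errors(node, rows) else node


# Hypothetical tree: the split on "job" under checking_status = <0 fits noise.
tree = Node("good", "checking_status", {
    "no checking": Node("good"),
    "<0": Node("bad", "job", {"skilled": Node("good"), "unskilled": Node("bad")}),
})

# Hypothetical held-out pruning set.
pruning_set = [
    {"checking_status": "<0", "job": "skilled", "class": "bad"},
    {"checking_status": "<0", "job": "skilled", "class": "bad"},
    {"checking_status": "<0", "job": "unskilled", "class": "bad"},
    {"checking_status": "no checking", "job": "skilled", "class": "good"},
    {"checking_status": "no checking", "job": "unskilled", "class": "good"},
]

leaves_before = count_leaves(tree)
errors_before = errors(tree, pruning_set)
pruned = prune(tree, pruning_set)
leaves_after = count_leaves(pruned)
errors_after = errors(pruned, pruning_set)
print(leaves_before, errors_before, leaves_after, errors_after)
```

In this toy case the pruned tree is smaller and makes fewer errors on the held-out rows. On the credit data the training-set accuracy falls after pruning, which is expected because part of the data is no longer used to grow the tree.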
Task 12
How can you convert a decision tree into "if-then-else rules"? Make up your own small
decision tree consisting of 2-3 levels and convert it into a set of rules. There also exist different
classifiers that output the model in the form of rules. One such classifier in Weka is rules.PART;
train this model and report the set of rules obtained. Sometimes just one attribute can be good
enough in making the decision, yes, just one! Can you predict what attribute that might be in this
data set? The OneR classifier uses a single attribute to make decisions (it chooses the attribute
based on minimum error). Report the rule obtained by training a OneR classifier. Rank the
performance of J48, PART and OneR.

Ans)

Steps for analyzing the decision tree:

1. Click on the credit-g.arff file.
2. Select all attributes.
3. Click on the Classify tab.
4. Click on Choose and select the J48 algorithm.
5. Select cross-validation with 2 folds.
6. Click on Start.
7. Note down the accuracy value.
8. Click on Choose again, expand rules and select PART.
9. Select cross-validation with 2 folds.
10. Click on Start.
11. Note down the accuracy value.
12. Click on Choose again, expand rules and select OneR.
13. Select cross-validation with 2 folds.
14. Click on Start.
15. Note down the accuracy value.

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances 721 72.1 %


Incorrectly Classified Instances 279 27.9 %
Kappa statistic 0.2443
Total Cost 279
Average Cost 0.279
Mean absolute error 0.3407
Root mean squared error 0.4669
Relative absolute error 81.0491 %
Root relative squared error 101.8806 %
Total Number of Instances 1000

=== Detailed Accuracy By Class ===


                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.891    0.677    0.755      0.891   0.817      0.260  0.662     0.776     good
                 0.323    0.109    0.561      0.323   0.410      0.260  0.662     0.464     bad
Weighted Avg.    0.721    0.506    0.696      0.721   0.695      0.260  0.662     0.682

=== Confusion Matrix ===

a b <-- classified as
624 76 | a = good
203 97 | b = bad

=== Run information ===

Scheme: weka.classifiers.rules.PART -M 2 -C 0.25 -Q 1


Relation: german_credit
Instances: 1000
Attributes: 21
checking_status
duration
credit_history
purpose
credit_amount
savings_status
employment
installment_commitment
personal_status
other_parties
residence_since
property_magnitude
age
other_payment_plans
housing
existing_credits
job
num_dependents
own_telephone
foreign_worker
class
Test mode: 2-fold cross-validation
Evaluation cost matrix:
 0 1
 1 0

=== Classifier model (full training set) ===


PART decision list
------------------

checking_status = no checking AND
other_payment_plans = none AND
credit_history = critical/other existing credit: good (134.0/3.0)

checking_status = no checking AND
existing_credits <= 1 AND
other_payment_plans = none AND
purpose = radio/tv: good (49.0/2.0)

checking_status = no checking AND
foreign_worker = yes AND
employment = 4<=X<7: good (35.0/2.0)

foreign_worker = no AND
personal_status = male single: good (21.0)

checking_status = no checking AND
purpose = used car AND
other_payment_plans = none: good (23.0)

duration <= 15 AND
other_parties = guarantor: good (22.0/1.0)

duration <= 11 AND
credit_history = critical/other existing credit: good (29.0/3.0)

checking_status = >=200 AND
num_dependents <= 1 AND
property_magnitude = car: good (20.0/3.0)

checking_status = no checking AND
property_magnitude = real estate AND
other_payment_plans = none AND
age > 23: good (25.0)

savings_status = >=1000 AND
property_magnitude = real estate: good (10.0)

savings_status = 500<=X<1000 AND
employment = >=7: good (13.0/1.0)

credit_history = no credits/all paid AND
housing = rent: bad (9.0)

savings_status = no known savings AND
checking_status = 0<=X<200 AND
existing_credits > 1: good (9.0)

checking_status = >=200 AND
num_dependents <= 1 AND
property_magnitude = life insurance: good (9.0)

installment_commitment <= 2 AND
other_parties = co applicant AND
existing_credits > 1: bad (5.0)

installment_commitment <= 2 AND
credit_history = delayed previously AND
existing_credits > 1 AND
residence_since > 1: good (14.0/3.0)

installment_commitment <= 2 AND
credit_history = delayed previously AND
existing_credits <= 1: good (9.0)

duration > 30 AND
savings_status = 100<=X<500: bad (13.0/3.0)

credit_history = all paid AND
other_parties = none AND
other_payment_plans = bank: bad (16.0/5.0)

duration > 30 AND
savings_status = no known savings AND
num_dependents > 1: good (5.0)

duration > 30 AND
credit_history = delayed previously: bad (9.0)

duration > 42 AND
savings_status = <100 AND
residence_since > 1: bad (28.0/3.0)

purpose = used car AND
credit_amount <= 8133 AND
existing_credits > 1: good (11.0)

purpose = used car AND
credit_amount > 8133: bad (8.0/1.0)

purpose = used car AND
employment = 1<=X<4: good (7.0)

purpose = used car: good (16.0/3.0)

purpose = furniture/equipment AND
other_payment_plans = stores: good (8.0)

credit_history = all paid AND
other_parties = none AND
other_payment_plans = none: bad (10.0)

purpose = business AND
residence_since <= 1: good (9.0)

other_payment_plans = stores AND
purpose = radio/tv AND
personal_status = male single: bad (6.0/1.0)

purpose = radio/tv AND
employment = >=7 AND
num_dependents <= 1: good (20.0/1.0)

installment_commitment <= 3 AND
purpose = furniture/equipment AND
other_parties = none AND
own_telephone = yes: good (19.0/3.0)

checking_status = no checking AND
savings_status = no known savings AND
personal_status = male single: good (11.0/1.0)

checking_status = 0<=X<200 AND
employment = 4<=X<7 AND
personal_status = male single AND
residence_since > 2: good (9.0)

purpose = other: good (5.0/1.0)

installment_commitment <= 2 AND
foreign_worker = yes AND
credit_history = existing paid AND
residence_since > 1 AND
other_parties = none AND
other_payment_plans = none AND
housing = rent AND
installment_commitment <= 1: good (9.0)

housing = rent AND
other_payment_plans = none AND
purpose = new car: bad (13.0/2.0)

other_payment_plans = stores AND
property_magnitude = life insurance: bad (4.0/1.0)

other_payment_plans = bank AND
other_parties = none AND
housing = rent: bad (7.0/1.0)

installment_commitment > 3 AND
existing_credits <= 1 AND
savings_status = <100 AND
credit_history = existing paid AND
purpose = new car: bad (17.0/5.0)

checking_status = >=200 AND
job = unskilled resident: bad (5.0)

duration <= 15 AND
property_magnitude = real estate: good (38.0/8.0)

foreign_worker = yes AND
property_magnitude = real estate AND
other_payment_plans = none AND
other_parties = none AND
duration <= 33 AND
own_telephone = yes: bad (7.0)

foreign_worker = yes AND
checking_status = <0 AND
purpose = education: bad (9.0/1.0)

foreign_worker = yes AND
purpose = education AND
checking_status = 0<=X<200: good (5.0)

foreign_worker = yes AND
checking_status = <0 AND
savings_status = 100<=X<500 AND
num_dependents <= 1: bad (6.0/1.0)

foreign_worker = yes AND
savings_status = >=1000 AND
checking_status = <0: good (4.0)

foreign_worker = yes AND
savings_status = 100<=X<500 AND
personal_status = male single: good (10.0/2.0)

foreign_worker = yes AND
existing_credits > 2: good (11.0/2.0)

foreign_worker = yes AND
other_parties = guarantor AND
other_payment_plans = none AND
existing_credits <= 1: good (6.0)

foreign_worker = yes AND
num_dependents > 1 AND
personal_status = male single AND
savings_status = <100 AND
job = skilled AND
duration > 16: bad (7.0)

foreign_worker = yes AND
other_parties = guarantor AND
purpose = radio/tv: bad (3.0)

foreign_worker = yes AND
credit_history = critical/other existing credit AND
job = unskilled resident: bad (6.0)

foreign_worker = yes AND
credit_history = no credits/all paid AND
housing = own: good (9.0/4.0)

foreign_worker = yes AND
credit_history = delayed previously AND
savings_status = <100 AND
existing_credits <= 1: bad (5.0)

foreign_worker = yes AND
credit_history = delayed previously AND
num_dependents <= 1: good (5.0)

foreign_worker = yes AND
credit_history = delayed previously AND
job = skilled: good (3.0/1.0)

foreign_worker = yes AND
credit_history = critical/other existing credit AND
other_parties = none AND
housing = own AND
savings_status = <100 AND
existing_credits > 1 AND
installment_commitment > 2 AND
credit_amount > 2181: bad (6.0)

foreign_worker = yes AND
credit_history = critical/other existing credit AND
other_payment_plans = bank: bad (5.0/1.0)

foreign_worker = yes AND
credit_history = critical/other existing credit AND
job = skilled AND
employment = 1<=X<4 AND
residence_since <= 3: good (6.0/1.0)

foreign_worker = yes AND
credit_history = critical/other existing credit: good (17.0/5.0)

foreign_worker = yes AND
credit_history = existing paid AND
checking_status = <0 AND
other_payment_plans = none AND
job = skilled AND
purpose = new car: bad (7.0/1.0)

foreign_worker = yes AND
credit_history = existing paid AND
checking_status = no checking AND
duration <= 30 AND
residence_since > 1 AND
own_telephone = yes: good (4.0)

foreign_worker = yes AND
credit_history = existing paid AND
savings_status = no known savings: bad (18.0/6.0)

foreign_worker = yes AND
credit_history = existing paid AND
checking_status = <0 AND
other_payment_plans = bank AND
housing = own: bad (3.0/1.0)

foreign_worker = yes AND
credit_history = existing paid AND
checking_status = <0 AND
other_payment_plans = none AND
purpose = radio/tv AND
job = skilled: bad (7.0/1.0)

foreign_worker = yes AND
credit_history = existing paid AND
existing_credits <= 1 AND
purpose = radio/tv AND
age > 22: good (11.0/1.0)

foreign_worker = yes AND
credit_history = existing paid AND
existing_credits <= 1 AND
installment_commitment > 3: bad (27.0/8.0)

foreign_worker = yes AND
credit_history = existing paid AND
other_payment_plans = bank: good (5.0/1.0)

foreign_worker = yes AND
credit_history = existing paid AND
own_telephone = yes AND
installment_commitment > 2: bad (4.0)

foreign_worker = yes AND
credit_history = existing paid AND
existing_credits <= 1 AND
employment = 1<=X<4 AND
personal_status = female div/dep/mar AND
credit_amount > 1474: good (5.0/1.0)

foreign_worker = yes AND
credit_history = existing paid AND
purpose = repairs: good (4.0/1.0)

foreign_worker = yes AND
credit_history = existing paid AND
purpose = furniture/equipment AND
property_magnitude = real estate: good (3.0)

foreign_worker = yes AND
credit_history = existing paid AND
housing = own AND
property_magnitude = life insurance: bad (8.0/3.0)

num_dependents <= 1 AND
foreign_worker = yes AND
credit_history = existing paid AND
checking_status = no checking: good (4.0)

credit_history = existing paid AND
housing = own AND
residence_since > 1: bad (8.0/2.0)

existing_credits <= 1 AND
num_dependents <= 1: good (8.0/2.0)

: bad (5.0)

Number of Rules : 78

Time taken to build model: 0.11 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances 706 70.6 %


Incorrectly Classified Instances 294 29.4 %
Kappa statistic 0.2822
Total Cost 294
Average Cost 0.294
Mean absolute error 0.3239
Root mean squared error 0.4893
Relative absolute error 77.0567 %
Root relative squared error 106.7747 %
Total Number of Instances 1000

=== Detailed Accuracy By Class ===

                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
                 0.809    0.533    0.780      0.809   0.794      0.283  0.684     0.799     good
                 0.467    0.191    0.511      0.467   0.488      0.283  0.684     0.454     bad
Weighted Avg.    0.706    0.431    0.699      0.706   0.702      0.283  0.684     0.695
=== Confusion Matrix ===

a b <-- classified as
566 134 | a = good
160 140 | b = bad

=== Run information ===

Scheme: weka.classifiers.rules.OneR -B 6
Relation: german_credit
Instances: 1000
Attributes: 21
checking_status
duration
credit_history
purpose
credit_amount
savings_status
employment
installment_commitment
personal_status
other_parties
residence_since
property_magnitude
age
other_payment_plans
housing
existing_credits
job
num_dependents
own_telephone
foreign_worker
class
Test mode: 2-fold cross-validation
Evaluation cost matrix:
 0 1
 1 0

=== Classifier model (full training set) ===

credit_amount:
< 718.0 -> good
< 759.5 -> bad
< 883.0 -> good
< 922.0 -> bad
< 938.0 -> good
< 979.5 -> bad
< 1206.5 -> good
< 1223.5 -> bad
< 1267.5 -> good
< 1286.0 -> bad
< 1821.5 -> good
< 1865.5 -> bad
< 3913.5 -> good
< 3969.0 -> bad
< 4049.5 -> good
< 4329.5 -> bad
< 4726.0 -> good
< 5024.0 -> bad
< 6322.5 -> good
< 6564.0 -> bad
< 6750.0 -> good
< 6917.5 -> bad
< 7760.5 -> good
< 8109.5 -> bad
< 9340.5 -> good
< 10331.5 -> bad
< 11191.0 -> good
>= 11191.0 -> bad
(743/1000 instances correct)

Time taken to build model: 0.01 seconds

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances 631 63.1 %


Incorrectly Classified Instances 369 36.9 %
Kappa statistic -0.0284
Total Cost 369
Average Cost 0.369
Mean absolute error 0.369
Root mean squared error 0.6075
Relative absolute error 87.7905 %
Root relative squared error 132.5571 %
Total Number of Instances 1000

=== Detailed Accuracy By Class ===


                 TP Rate  FP Rate  Precision  Recall  F-Measure  MCC     ROC Area  PRC Area  Class
                 0.846    0.870    0.694      0.846   0.762      -0.031  0.488     0.695     good
                 0.130    0.154    0.265      0.130   0.174      -0.031  0.488     0.295     bad
Weighted Avg.    0.631    0.655    0.565      0.631   0.586      -0.031  0.488     0.575

=== Confusion Matrix ===

a b <-- classified as
592 108 | a = good
261 39 | b = bad
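
The way OneR picks its single attribute can be sketched for nominal attributes as follows. This Python example is a simplified illustration, not Weka's OneR: it does not discretize numeric attributes such as credit_amount, and the six rows of data are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, attributes, target):
    """Return (attribute, rule, errors) for the attribute with the fewest errors."""
    best = None
    for attribute in attributes:
        # Class counts for every value of this attribute.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[attribute]][row[target]] += 1
        # One rule per value: predict the majority class for that value.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
        if best is None or errors < best[2]:
            best = (attribute, rule, errors)
    return best


# Hypothetical toy data using two attribute names from the credit dataset.
rows = [
    {"checking_status": "no checking", "housing": "own", "class": "good"},
    {"checking_status": "no checking", "housing": "rent", "class": "good"},
    {"checking_status": "<0", "housing": "own", "class": "bad"},
    {"checking_status": "<0", "housing": "rent", "class": "bad"},
    {"checking_status": "<0", "housing": "own", "class": "good"},
    {"checking_status": "no checking", "housing": "own", "class": "good"},
]

attribute, rule, errors = one_r(rows, ["checking_status", "housing"], "class")
print(attribute, rule, errors)
```

On the toy rows, checking_status misclassifies one row and housing two, so checking_status is chosen. The many narrow credit_amount intervals in the Weka rule above fit the training data (743/1000 correct) much better than they generalize under cross-validation (63.1 %).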

Converting a decision tree into a set of rules is done as follows. Each path from the root to a
leaf becomes one rule: the tests along the path are joined with AND, and the leaf gives the
conclusion.

Rule 1: If age = youth AND student = yes THEN buys_computer = yes
Rule 2: If age = youth AND student = no THEN buys_computer = no
Rule 3: If age = middle_aged THEN buys_computer = yes
Rule 4: If age = senior AND credit_rating = excellent THEN buys_computer = yes
Rule 5: If age = senior AND credit_rating = fair THEN buys_computer = no
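
The conversion can also be written as a short recursive walk over the tree. In the Python sketch below the nested-dictionary representation of the tree and the function name `tree_to_rules` are illustrative assumptions; the tree itself is the buys_computer example used for the rules above.

```python
def tree_to_rules(tree, target, conditions=()):
    """Turn a nested-dict decision tree into a list of IF-THEN rule strings."""
    if not isinstance(tree, dict):
        # Leaf: the accumulated tests form the rule body, the leaf is the conclusion.
        body = " AND ".join(f"{name} = {value}" for name, value in conditions)
        return [f"If {body} THEN {target} = {tree}"]
    (attribute, branches), = tree.items()     # each inner node tests one attribute
    rules = []
    for value, subtree in branches.items():
        rules.extend(tree_to_rules(subtree, target, conditions + ((attribute, value),)))
    return rules


tree = {
    "age": {
        "youth": {"student": {"yes": "yes", "no": "no"}},
        "middle_aged": "yes",
        "senior": {"credit_rating": {"excellent": "yes", "fair": "no"}},
    }
}

rules = tree_to_rules(tree, "buys_computer")
for number, rule in enumerate(rules, start=1):
    print(f"Rule {number}: {rule}")
```

A tree with five leaves yields exactly five rules, one per root-to-leaf path.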

In the Weka GUI Explorer, select the Classify tab and choose the Use training set option. There
also exist different classifiers that output the model in the form of rules; such classifiers in
Weka are "PART" and "OneR". Go to Choose, expand rules, select PART and press the Start button.

=== Evaluation on training set ===

=== Summary ===
Correctly Classified Instances 897 89.7 %
Incorrectly Classified Instances 103 10.3 %

=== Confusion Matrix ===

a b <-- classified as
653 47 | a = good
56 244 | b = bad

Then go to Choose, expand rules, select OneR and press the Start button.
=== Evaluation on training set ===
=== Summary ===
Correctly Classified Instances 742 74.2 %
Incorrectly Classified Instances 258 25.8 %
=== Confusion Matrix ===
a b <-- classified as
642 58 | a = good
200 100 | b = bad

Then go to Choose, expand trees, select J48 and press the Start button.
=== Evaluation on training set ===
=== Summary ===
Correctly Classified Instances 855 85.5 %
Incorrectly Classified Instances 145 14.5 %
=== Confusion Matrix ===
a b <-- classified as
669 31 | a = good
114 186 | b = bad
Note: From these observations, the classifiers rank by training-set accuracy as follows:
1. PART (89.7 %)
2. J48 (85.5 %)
3. OneR (74.2 %)
Under 2-fold cross-validation the first two change places: J48 (72.1 %), PART (70.6 %), OneR (63.1 %).
Task 2: Hospital Management System

A data warehouse consists of dimension tables and a fact table. REMEMBER the following:
Dimension
The dimension object (dimension) has:
_name
_attributes (levels), with primary key
_hierarchies
One time dimension is a must.
About levels and hierarchies
Dimension objects (dimensions) consist of a set of levels and a set of hierarchies defined over
those levels. The levels represent levels of aggregation. Hierarchies describe parent-child
relationships among a set of levels.
For example, a typical calendar dimension could contain five levels. Two hierarchies can be
defined on these levels:
H1: YearL > QuarterL > MonthL > DayL
H2: YearL > WeekL > DayL
The hierarchies are described from parent to child, so that Year is the parent of Quarter, Quarter
is the parent of Month, and so forth.
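
The relation between levels and hierarchies can be made concrete with a small data structure. The Python sketch below models the calendar dimension from the example. The dictionary layout and the helper names `parent_of` and `base_level` are illustrative and do not correspond to any Warehouse Builder API.

```python
calendar = {
    "name": "calendar",
    "levels": ["YearL", "QuarterL", "MonthL", "WeekL", "DayL"],
    # Each hierarchy lists its levels from parent to child.
    "hierarchies": {
        "H1": ["YearL", "QuarterL", "MonthL", "DayL"],
        "H2": ["YearL", "WeekL", "DayL"],
    },
}


def parent_of(dimension, hierarchy, level):
    """Return the parent level of `level` in the given hierarchy, or None at the top."""
    path = dimension["hierarchies"][hierarchy]
    position = path.index(level)
    return path[position - 1] if position > 0 else None


def base_level(dimension, hierarchy):
    """Return the lowest (most detailed) level of the hierarchy."""
    return dimension["hierarchies"][hierarchy][-1]


print(parent_of(calendar, "H1", "MonthL"))
print(parent_of(calendar, "H2", "DayL"))
print(base_level(calendar, "H1"))
```

The same level (DayL) has a different parent in each hierarchy, which is why a hierarchy has to be named when data are rolled up.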
About unique key constraints
When you create a definition for a hierarchy, Warehouse Builder creates an identifier key for each
level of the hierarchy and a unique key constraint on the lowest level (base level).
Design a hospital management system data warehouse (TARGET) consisting of the dimensions
patient, medicine, supplier and time, where the measures are 'NO UNITS' and 'UNIT PRICE'.
Assume the relational database (SOURCE) table schemas as follows:
TIME(day, month, year)
PATIENT(patient_name, age, address, etc.)
MEDICINE(Medicine_brand_name, Drug_name, supplier, no_units, units_price, etc.)
SUPPLIER(Supplier_name, medicine_brand_name, address, etc.)
If each dimension has 6 levels, decide the levels and hierarchies, assuming the level names
suitably.
Design the hospital management system data warehouse using all schemas. Give an example
4-D cube with assumed names.
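
One possible way to picture the 4-D cube is as a mapping from a (patient, medicine, supplier, time) coordinate to the two measures. The Python sketch below uses entirely hypothetical member names and values, and the `roll_up` helper is an illustration of aggregating the NO UNITS measure along one dimension, not part of any warehouse tool.

```python
from collections import defaultdict

DIMENSIONS = ("patient", "medicine", "supplier", "time")

# Hypothetical cells: coordinate -> (no_units, unit_price).
cube = {
    ("patient_1", "brand_A", "supplier_X", "2015-01-10"): (10, 2.5),
    ("patient_2", "brand_A", "supplier_X", "2015-01-11"): (4, 2.5),
    ("patient_1", "brand_B", "supplier_Y", "2015-02-03"): (6, 1.0),
}


def roll_up(cube, dimension):
    """Total no_units per member of one dimension, summing over the other three."""
    axis = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for coordinate, (no_units, _unit_price) in cube.items():
        totals[coordinate[axis]] += no_units
    return dict(totals)


units_by_medicine = roll_up(cube, "medicine")
units_by_patient = roll_up(cube, "patient")
print(units_by_medicine)
print(units_by_patient)
```

Rolling up along a level of a hierarchy (for example from day to month in the time dimension) works the same way once each coordinate is mapped to its parent level.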
