
CS504 Spring 2020 Homework 3

Due: 04/30/2020 by 11:59PM

Description:
Download WEKA from: http://www.cs.waikato.ac.nz/ml/weka/
Note that Weka assumes by default that the class attribute is the last column.

Datasets and corresponding descriptions are provided with this homework.

Dataset    Description
a.arff     a.png
b.arff     b.png
c.arff     c.png

For this assignment, you will use WEKA to evaluate 4 different classifiers
(DecisionStump, J48, IBk (k-nearest neighbors), and NaiveBayes) on three synthetic
datasets. This will be done in the following steps:
1. First, you will explore the datasets.
2. Next, you will perform a series of experiments using Weka Explorer. For each
experiment, you will be asked to answer a series of questions.
3. Compile your answers into a single PDF file.

Data Exploration (6 points)


• Visually explore the data sets and, for each data set, describe the following
(a scripted alternative using Weka's Java API is sketched after this list):
o Types of attributes
o Class distribution
o Any special structure you observe, if any
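
If you prefer scripting to the GUI, the same checks can be done with Weka's Java API. The sketch below is only one possible way to do this (it assumes the Weka jar is on the classpath and that a.arff is in the working directory); the GUI route described in the experiments is equally acceptable.

    // Inspect attribute types and the class distribution of an ARFF file
    // (shown for a.arff; repeat for b.arff and c.arff).
    import weka.core.Attribute;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class Explore {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("a.arff").getDataSet();
            // Weka assumes the class attribute is the last column.
            data.setClassIndex(data.numAttributes() - 1);

            // Attribute types
            for (int i = 0; i < data.numAttributes(); i++) {
                Attribute a = data.attribute(i);
                System.out.println(a.name() + ": "
                        + (a.isNominal() ? "nominal" : a.isNumeric() ? "numeric" : "other"));
            }

            // Class distribution (instance counts per class value)
            int[] counts = data.attributeStats(data.classIndex()).nominalCounts;
            for (int i = 0; i < counts.length; i++) {
                System.out.println("class " + data.classAttribute().value(i)
                        + ": " + counts[i]);
            }
        }
    }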

Experiments (8 points each)


• Experiment 1: use 10-fold cross-validation to test and compare DecisionStump and
J48 on data set c. Here are the steps to do this:
o Click the Explorer button.
o Click the "Open file" button and load the c.arff file.
o With the Preprocess tab open, make sure the "Class (Nom)" field is
selected on the right side of the screen. On the left, select different
attributes and observe how the data are distributed over the two classes (1 and
2). In particular, note the distribution of the "class" attribute. You can also
click "Visualize All" to view all attribute distributions at once.
o Click on the Classify tab.
§ Click the Choose button, then select
• Trees->DecisionStump
§ Make sure (Nom) Class is selected as the attribute to predict.
§ Under Test options, select Cross-validation with Folds set to 10.
§ Click Start and review the output on the right. Note the correctly
classified instances (accuracy) and the root mean squared error (RMSE).
o REPEAT the above steps for:
§ Trees->J48
§ Trees->J48 unpruned (next to the Choose button where you
selected J48, click on the line showing the classifier's parameters; this
opens a window with its options. Set the unpruned option to True.)
o In a table, list the classification accuracy (correctly classified instance
percentage) and the RMSE for each classifier (one row for
DecisionStump, two rows for J48: pruned and unpruned).
o For DecisionStump, briefly explain the technique and list the attribute that
was used to make the decision. Compare the results of J48 (pruned and
unpruned) and explain why the pruned tree performs better. (An optional
Java sketch for reproducing these runs follows this list.)
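
As a cross-check on the Explorer output, the same 10-fold cross-validation can be run through Weka's Java API. This is an optional sketch, not the required procedure; the file name c.arff comes from the assignment, and the random seed 1 is Weka's default for cross-validation.

    // 10-fold cross-validation of DecisionStump, J48 (pruned) and J48 (unpruned) on c.arff
    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.trees.DecisionStump;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class Experiment1 {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("c.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            J48 unpruned = new J48();
            unpruned.setUnpruned(true);   // same as setting unpruned = True in the GUI

            Classifier[] models = { new DecisionStump(), new J48(), unpruned };
            String[] names = { "DecisionStump", "J48 (pruned)", "J48 (unpruned)" };

            for (int i = 0; i < models.length; i++) {
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(models[i], data, 10, new Random(1));
                System.out.printf("%-15s accuracy = %.2f%%  RMSE = %.4f%n",
                        names[i], eval.pctCorrect(), eval.rootMeanSquaredError());
            }
        }
    }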

• Experiment 2: Run J48 (pruned), NaiveBayes, and IBk (k=1 and k=21) on data
sets a and b using default parameters (see the sketch after this list).
o For each classifier, use F-measure to compare its performance obtained on
data set a to its performance obtained on data set b.
o For data set a, compare the performance of the 4 classifiers using F-
measure.
o Give explanations for your observations above.
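
One possible way to collect the F-measures for this experiment in a single run is sketched below. It assumes the Explorer's default evaluation setup (10-fold cross-validation, seed 1); everything else follows directly from the experiment description.

    // Weighted F-measure of J48, NaiveBayes and IBk (k=1, k=21) on a.arff and b.arff
    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.lazy.IBk;
    import weka.classifiers.trees.J48;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class Experiment2 {
        public static void main(String[] args) throws Exception {
            for (String file : new String[] { "a.arff", "b.arff" }) {
                Instances data = new DataSource(file).getDataSet();
                data.setClassIndex(data.numAttributes() - 1);

                Classifier[] models = { new J48(), new NaiveBayes(), new IBk(1), new IBk(21) };
                String[] names = { "J48 (pruned)", "NaiveBayes", "IBk k=1", "IBk k=21" };

                for (int i = 0; i < models.length; i++) {
                    Evaluation eval = new Evaluation(data);
                    eval.crossValidateModel(models[i], data, 10, new Random(1));
                    // weightedFMeasure() averages the per-class F-measures weighted by class size;
                    // eval.fMeasure(classIndex) gives the value for a single class.
                    System.out.printf("%s  %-12s F-measure = %.4f%n",
                            file, names[i], eval.weightedFMeasure());
                }
            }
        }
    }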

• Experiment 3: Run NaiveBayes and IBk (k=1 and k=10) on data set c using default
parameters (see the sketch after this list).
o Compare the performance of the 3 classifiers using F-measure.
§ Comment on the effect of k.
o Give explanations for your observations above.
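
The loop is the same as in the Experiment 2 sketch; the only new piece is setting k on IBk. A compact version (again assuming 10-fold cross-validation with seed 1, the Explorer default, and the two class values of data set c) might look like this:

    // NaiveBayes vs IBk (k=1 and k=10) on c.arff, compared by F-measure
    import java.util.Random;
    import weka.classifiers.Classifier;
    import weka.classifiers.Evaluation;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.classifiers.lazy.IBk;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class Experiment3 {
        public static void main(String[] args) throws Exception {
            Instances data = new DataSource("c.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1);

            IBk knn10 = new IBk();
            knn10.setKNN(10);   // same as the KNN field in IBk's options window

            Classifier[] models = { new NaiveBayes(), new IBk(1), knn10 };
            String[] names = { "NaiveBayes", "IBk k=1", "IBk k=10" };

            for (int i = 0; i < models.length; i++) {
                Evaluation eval = new Evaluation(data);
                eval.crossValidateModel(models[i], data, 10, new Random(1));
                // Per-class F-measures (indices 0 and 1) plus the weighted average.
                System.out.printf("%-10s F(class 1) = %.4f  F(class 2) = %.4f  weighted F = %.4f%n",
                        names[i], eval.fMeasure(0), eval.fMeasure(1), eval.weightedFMeasure());
            }
        }
    }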

Deliverables:
A single PDF document including all the answers for questions in data exploration and
experiments.
