Assignment 2:

Using the WEKA Workbench

Group Members:
Yogesh Katore (CI15M06)
Kiran Gavhane (CI15M07)

Problem:
Select the weather.arff file and apply different learning schemes
(Naive Bayes, ZeroR, OneR and J4.8) to analyze the file and find the
technique that gives the minimum error and the greatest accuracy. For
test options, first choose "Use training set", and then choose
"Percentage Split" using the default 66% percentage split. Report each
model's percent error rate.
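The split results in this report are all out of 5 test instances. A minimal sketch of where that number comes from, assuming the training part is the rounded 66% of the 14 instances (the function name `split_sizes` is illustrative, not part of WEKA):

```python
# Sketch: size of the training and test parts for a percentage split.
# Assumption: the training size is the rounded product n * percent / 100.
def split_sizes(n_instances, train_percent):
    n_train = round(n_instances * train_percent / 100)
    return n_train, n_instances - n_train

n_train, n_test = split_sizes(14, 66)
print(n_train, n_test)  # 9 5
```

With only 5 test instances, each misclassification moves the error rate by 20 percentage points.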

Answer:

A. Become familiar with the use of the WEKA workbench to invoke several
different machine learning schemes. The following snapshots show the
analysis using the training set and using a 66% percentage split.

Using Only Training Set:

1. Learning Scheme: Naive Bayes

2. Learning Scheme: J4.8

3. Learning Scheme: ZeroR

4. Learning Scheme: OneR

Using Percentage Split (66%):

1. Learning Scheme: Naive Bayes

2. Learning Scheme: J4.8

3. Learning Scheme: ZeroR

4. Learning Scheme: OneR

Use the following learning schemes, with the default settings, to analyze
the weather data (in weather.arff). For test options, first choose "Use
training set", then choose "Percentage Split" using the default 66%
percentage split. Report model percent error rate.


ZeroR
OneR
Naive Bayes
J4.8
Answer:
ZeroR
Model: always predict yes
Evaluate using training set, error rate: 5/14 = 36%
Evaluate using split, error rate: 2/5 = 40%
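The ZeroR result can be reproduced by hand. The sketch below assumes only the class counts implied by the figures above (predicting yes gives 5 errors out of 14, so 9 yes and 5 no); `zero_r` is an illustrative helper, not WEKA's implementation.

```python
from collections import Counter

# Class labels implied by the report: 9 "yes" and 5 "no" in the training set.
train_labels = ["yes"] * 9 + ["no"] * 5

def zero_r(labels):
    """Return the majority class; ZeroR ignores every attribute."""
    return Counter(labels).most_common(1)[0][0]

prediction = zero_r(train_labels)
errors = sum(label != prediction for label in train_labels)
error_rate = errors / len(train_labels)
print(prediction, f"{errors}/{len(train_labels)} = {error_rate:.1%}")  # yes 5/14 = 35.7%
```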

OneR
Model:
sunny -> no
overcast -> yes
rainy -> yes

Evaluate using training set, error rate: 4/14 = 29%
Evaluate using split, error rate: 3/5 = 60%
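The OneR training error can be checked against the leaf counts of the J48 tree reported in this document (sunny: 2 yes, 3 no; overcast: 4 yes; rainy: 3 yes, 2 no). This is a sketch for the single attribute outlook only; a full OneR would also compare the other attributes and keep the best one. The helper name `one_r_rule` is illustrative.

```python
# Class counts per outlook value, read off the J48 leaf counts in this report.
counts = {
    "sunny":    {"yes": 2, "no": 3},
    "overcast": {"yes": 4, "no": 0},
    "rainy":    {"yes": 3, "no": 2},
}

def one_r_rule(value_counts):
    """For each attribute value, predict its most frequent class."""
    return {value: max(classes, key=classes.get)
            for value, classes in value_counts.items()}

rule = one_r_rule(counts)

# Every instance whose class differs from the rule's prediction is an error.
errors = sum(n for value, classes in counts.items()
             for label, n in classes.items() if label != rule[value])
total = sum(sum(classes.values()) for classes in counts.values())

print(rule)  # {'sunny': 'no', 'overcast': 'yes', 'rainy': 'yes'}
print(f"{errors}/{total} = {errors / total:.0%}")  # 4/14 = 29%
```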

NaiveBayes
Evaluate using training set, error rate: 1/14 = 7%
Evaluate using split, error rate: 2/5 = 40%

J48 pruned tree
Model:
outlook = sunny
| humidity <= 75: yes (2.0)
| humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
| windy = TRUE: no (2.0)
| windy = FALSE: yes (3.0)

Evaluate using training set, error rate: 0/14 = 0%
Evaluate using split, error rate: 3/5 = 60%
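The pruned tree can be read as a plain function. The sketch below transcribes the tree printed above; the function name `j48_predict` and the sample instance are hypothetical, chosen only to illustrate one path through the tree.

```python
def j48_predict(outlook, humidity, windy):
    """Classify one instance with the pruned J48 tree from this report."""
    if outlook == "sunny":
        return "yes" if humidity <= 75 else "no"
    if outlook == "overcast":
        return "yes"
    if outlook == "rainy":
        return "no" if windy else "yes"
    raise ValueError(f"unknown outlook value: {outlook!r}")

# Hypothetical instance: a sunny, humid, calm day.
print(j48_predict("sunny", 90, False))  # no
```

Note that temperature never appears in the tree, so it has no influence on the prediction.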
Which of these classifiers are you more likely to trust when
determining whether to play? Why?

Answer: The one with the lowest error on the separate test set, which
is NaiveBayes (2/5 = 40%). ZeroR ties at 40% on the split, but it
ignores every attribute and always predicts yes.
What can you say about accuracy when using the training set data
and when holding out a separate percentage for testing?
Answer:
When evaluating on the training data only, a classifier that can build
a more complex model, like the J4.8 decision tree, can fit that data
very closely (0% error here). Accuracy on the training set is therefore
not a good predictor of accuracy on a separate test set, where J4.8
reaches 60% error.
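To make the comparison concrete, the error counts reported above can be tabulated side by side. This is a small sketch using only the figures from this report; the variable names are illustrative.

```python
# (errors, instances) for each scheme, as reported in this assignment.
results = {
    "ZeroR":      {"train": (5, 14), "split": (2, 5)},
    "OneR":       {"train": (4, 14), "split": (3, 5)},
    "NaiveBayes": {"train": (1, 14), "split": (2, 5)},
    "J48":        {"train": (0, 14), "split": (3, 5)},
}

gaps = {}
for name, r in results.items():
    train = r["train"][0] / r["train"][1]
    split = r["split"][0] / r["split"][1]
    gaps[name] = split - train  # how much worse the held-out estimate is
    print(f"{name:<10} train {train:6.1%}  split {split:6.1%}  gap {gaps[name]:+.1%}")

# The scheme whose training error is most optimistic.
print(max(gaps, key=gaps.get))  # J48
```

The gap is largest for J48, the scheme with the most flexible model, which matches the answer above.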
