1. Setting a random seed affects which random sample from the dataset will be chosen for the training (and
therefore also for the testing) set. How do we change the random seed in Weka?
We can do that by clicking the More options button in the Classify panel and then changing the
value in the “Random seed for XVal / % Split” field.
2. At minute 4:17 we see that he got 10 different results for the accuracy of J48. Why are there different
results?
Because each result was generated using a different value of the random seed.
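To illustrate why the seed changes the result, here is a minimal Python sketch (not Weka's own code) of a seeded percentage split. The function name `holdout_split`, the 150-instance dataset size and the 90% training fraction are assumptions chosen for illustration only.

```python
import random

def holdout_split(n_instances, train_fraction, seed):
    """Shuffle instance indices with the given seed, then cut into train/test."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)  # the seed fully determines the order
    cut = round(n_instances * train_fraction)
    return indices[:cut], indices[cut:]

# Hypothetical dataset of 150 instances, 90% used for training.
train_a, test_a = holdout_split(150, 0.9, seed=1)
train_b, test_b = holdout_split(150, 0.9, seed=2)
train_c, test_c = holdout_split(150, 0.9, seed=1)

print(test_a == test_c)  # same seed, same test set
print(test_a == test_b)  # different seed, different test set
```

Because a different seed puts different instances in the test set, the classifier is trained and evaluated on different data each time, so the measured accuracy varies from run to run.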
3. In the weather.nominal dataset they play golf 9 times altogether and don’t play 5 times. If that is ALL you
knew (you don’t know anything about the other variables), then what would you guess would happen
tomorrow with regard to whether they would play golf or not?
If all I knew was how many times they did and did not play golf, I would guess
that they will play tomorrow, since that outcome is more probable (9/14 > 5/14).
4. Regarding question number 3, what classifier behaves like the guessing you made in question number 3?
It is the baseline classifier (ZeroR in Weka), which always predicts the most frequent class.
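The guess in questions 3 and 4 amounts to always predicting the most frequent class. A minimal sketch of that idea follows, using only the 9 "yes" and 5 "no" counts given in question 3; the function name `majority_class` is hypothetical and this is not Weka's implementation.

```python
from collections import Counter

def majority_class(labels):
    """Return the most frequent label and the accuracy of always predicting it."""
    counts = Counter(labels)
    label, count = counts.most_common(1)[0]
    return label, count / len(labels)

# Class counts from question 3: play = yes 9 times, play = no 5 times.
labels = ["yes"] * 9 + ["no"] * 5
prediction, accuracy = majority_class(labels)
print(prediction, round(accuracy, 3))  # yes 0.643
```

Any classifier that scores below this figure on the same data is doing worse than ignoring the attributes altogether.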
5. One of the datasets in the data folder does not work very well in terms of building a model to predict the
class. In fact many algorithms perform worse than the baseline. Which dataset is that?
It is the supermarket.arff dataset.
6. He repeated the experiment 10 times. The standard deviation measures the variation of the accuracy
measure over the 10 repetitions. That is also referred to as the variance of the estimate. What was the value
of the variance of the estimate?
The standard deviation of the accuracy over the 10 repetitions, which the question calls the variance of
the estimate, was 0.018.
7. In the video around minute 2:40 you can see two “pies”. Explain what they are.
Each pie represents the whole dataset, divided into 10 parts: 9 parts are used for training and 1 for testing.
The tenth set aside for testing differs between the two pies, showing that each repetition tests on a
different 10% of the dataset.
8. At minute 4:15 you can see the word “DEPLOY”. Can you understand what he means on this screen by that
word? If so, explain
After cross-validation, Weka runs the algorithm an 11th time using 100% of the dataset to produce a final
classifier that can be deployed in practice. In my opinion, "deploy" here refers to putting that final
classifier to work on real-world data.
9. What is meant by a fold?
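A fold is one of the k roughly equal groups into which the dataset is split; k is the number of folds. Each
fold serves once as the test set while the other k-1 folds are used for training, which is why the procedure
is called k-fold cross-validation.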
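As a rough illustration of what the folds look like, the sketch below splits a hypothetical 100-instance dataset into 10 folds. The function name `k_fold_indices` and the dataset size are assumptions, and the sketch does no shuffling or stratification, so it should not be read as Weka's procedure.

```python
def k_fold_indices(n_instances, k):
    """Yield (train, test) index lists; each fold is the test set exactly once."""
    indices = list(range(n_instances))
    # Deal the indices into k folds like cards: fold i gets i, i+k, i+2k, ...
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# Hypothetical dataset of 100 instances, 10 folds.
splits = list(k_fold_indices(100, 10))
for train, test in splits:
    print(len(train), len(test))  # 90 10 on every line
```

Every instance appears in exactly one test set, so after k rounds each instance has been used for testing once and for training k-1 times.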
10. At minute 2:20 he says “Each branch assigns the most frequent class that comes down that branch.” What
does that mean?
It means that each branch corresponds to one possible value of the attribute, and the algorithm labels each
branch with the class that occurs most often among the training instances that have that value.
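The idea of one branch per attribute value, each labelled with its most frequent class, can be sketched as follows. The `outlook` and `play` columns are my transcription of the 14 instances of weather.nominal and should be checked against the actual file; the function name `one_attribute_rule` is hypothetical and this is not Weka's implementation.

```python
from collections import Counter, defaultdict

def one_attribute_rule(values, labels):
    """Map each attribute value to its most frequent class and count the errors."""
    per_value = defaultdict(Counter)
    for value, label in zip(values, labels):
        per_value[value][label] += 1
    # One branch per attribute value, labelled with that value's majority class.
    rule = {v: c.most_common(1)[0][0] for v, c in per_value.items()}
    errors = sum(label != rule[value] for value, label in zip(values, labels))
    return rule, errors

# Assumed transcription of the outlook and play columns of weather.nominal.
outlook = ["sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
           "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"]
play = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]

rule, errors = one_attribute_rule(outlook, play)
print(rule)    # {'sunny': 'no', 'overcast': 'yes', 'rainy': 'yes'}
print(errors)  # 4
```

With this data the sunny branch predicts "no" while the overcast and rainy branches predict "yes", and 4 of the 14 training instances are misclassified.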
