
Assignment 7

Introduction to Machine Learning


Prof. B. Ravindran
1. Which of the following statement(s) regarding the evaluation of Machine Learning models
is/are true?

(a) A model with a lower training loss will perform better on a test dataset.
(b) The train and test datasets should represent the underlying distribution of the data.
(c) To determine the variation in the performance of a learning algorithm, we generally use
one training set and one test set.
(d) A learning algorithm can learn different parameter values if given different samples from
the same distribution.

Sol. (b), (d)


The training loss does not necessarily indicate the performance of a model on test data.
For a good estimate of the performance, the train and test data should represent the underlying data distribution.
To determine the variation in the performance of a learning algorithm, we need multiple train and test samples, not a single train-test split.
The learned parameter values may differ for different samples drawn from the same distribution, as explained in the lectures. A small illustration is sketched below.
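As a minimal sketch of statement (d) (not part of the original solution; it assumes scikit-learn and uses synthetic data, with two samples drawn from one generated dataset), fitting the same model on two different samples from the same distribution gives different parameter values:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    # one underlying distribution; two different samples drawn from it
    X, y = make_classification(n_samples=2000, random_state=0)
    rng = np.random.default_rng(0)
    idx1 = rng.choice(len(y), size=200, replace=False)
    idx2 = rng.choice(len(y), size=200, replace=False)

    w1 = LogisticRegression(max_iter=1000).fit(X[idx1], y[idx1]).coef_
    w2 = LogisticRegression(max_iter=1000).fit(X[idx2], y[idx2]).coef_
    print(w1 - w2)   # non-zero differences: the learned parameters depend on the sample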
2. Suppose we have a classification dataset comprising two classes, A and B, with 100 and 50
samples respectively. Suppose we use stratified sampling to split the data into train and test
sets. Which of the following train-test splits would be appropriate?

(a) Train- {A : 80 samples, B : 30 samples}, Test- {A : 20 samples, B : 20 samples}


(b) Train- {A : 20 samples, B : 20 samples}, Test- {A : 80 samples, B : 30 samples}
(c) Train- {A : 80 samples, B : 40 samples}, Test- {A : 20 samples, B : 10 samples}
(d) Train- {A : 20 samples, B : 10 samples}, Test- {A : 80 samples, B : 40 samples}

Sol. (c)
In stratified sampling, the train and test sets have the same class proportions as the original
dataset. Also, the train set is generally chosen to be larger than the test set.
Options (c) and (d) preserve the class proportion in the original dataset. Of these two, (c) has
a larger training set while (d) has a larger test set. Hence, (c) is the right option.
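The following short sketch (not part of the original solution; it assumes scikit-learn is available and uses a dummy feature column) shows how a stratified 80/20 split of this 150-sample dataset reproduces the counts in option (c):

    import numpy as np
    from sklearn.model_selection import train_test_split

    # 100 samples of class A and 50 of class B, with a dummy feature column
    y = np.array(["A"] * 100 + ["B"] * 50)
    X = np.arange(len(y)).reshape(-1, 1)

    # stratify=y keeps the 2:1 class ratio in both splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=0
    )

    print((y_tr == "A").sum(), (y_tr == "B").sum())  # should print 80 40, as in option (c)
    print((y_te == "A").sum(), (y_te == "B").sum())  # should print 20 10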
3. Suppose we are performing cross-validation on a multiclass classification dataset with N data
points. Which of the following statement(s) is/are correct?

(a) In k-fold cross validation, each fold should have a class-wise proportion similar to the
given dataset.
(b) In k-fold cross-validation, we train one model and evaluate it on the k different test sets.
(c) In LOOCV, we train N different models, using (N-1) data points for training each model.
(d) In LOOCV, we can use the same test data to evaluate all the trained models.

Sol. (a), (c)
If the class-wise proportions are different across the folds, the model evaluation will be unreliable.
In k-fold cross-validation, we divide the dataset into k parts. In each iteration, we use one
part as the test set and the remaining (k-1) parts as the training set. This process is repeated
k times to get k models.
LOOCV (Leave-One-Out Cross-Validation) is a special case of k-fold cross-validation with k = N.
In each iteration, we use one data point as the test set and the remaining (N-1) data points for
training. This process is repeated N times to get N models.
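A minimal sketch (toy labels, assuming scikit-learn) contrasting the two schemes: StratifiedKFold keeps the class-wise proportion in every fold, while LeaveOneOut builds N different splits, each training on N-1 points:

    import numpy as np
    from sklearn.model_selection import StratifiedKFold, LeaveOneOut

    y = np.array([0] * 8 + [1] * 4)            # toy labels with a 2:1 class ratio
    X = np.arange(len(y)).reshape(-1, 1)

    skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(X, y):
        print("fold test labels:", y[test_idx])    # each fold keeps the 2:1 ratio

    loo = LeaveOneOut()
    print("LOOCV models:", loo.get_n_splits(X))    # N = 12 models, each trained on N-1 points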
4. Suppose we have a binary classification problem wherein we need to achieve a high recall. On
training four classifiers and evaluating them, we obtain the following confusion matrices. Each
matrix has the format indicated below:

                      Predicted Positive    Predicted Negative
    Actual Positive          ----                  ----
    Actual Negative          ----                  ----

Which of these classifiers should we prefer?


 
(a)   4    6
      3   87

(b)   8    2
     11   79

(c)   5    5
      0   90

(d)   2    8
      4   86

Sol. (b)
The recall is computed as TP / (TP + FN).
For option (b), recall = 8 / (8 + 2) = 0.8, which is the maximum among all the options. Similarly,
we can compute it for the other options and verify that option (b) has the highest recall.
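As a quick numerical check (illustration only, reading each matrix above as [[TP, FN], [FP, TN]]):

    matrices = {
        "a": [[4, 6], [3, 87]],
        "b": [[8, 2], [11, 79]],
        "c": [[5, 5], [0, 90]],
        "d": [[2, 8], [4, 86]],
    }
    for name, ((tp, fn), (fp, tn)) in matrices.items():
        print(name, "recall =", tp / (tp + fn))
    # prints 0.4, 0.8, 0.5 and 0.2, so option (b) has the highest recall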

5. Suppose we have a binary classification problem wherein we need to achieve a low False Positive
Rate (FPR). On training four classifiers and evaluating them, we obtain the following confusion
matrices. Each matrix has the format indicated below:

                      Predicted Positive    Predicted Negative
    Actual Positive          ----                  ----
    Actual Negative          ----                  ----

Which of these classifiers should we prefer?


 
(a)   4    6
      6   84

(b)   8    2
     13   77

(c)   5    5
      2   88

(d)  10    0
      4   86

Sol. (c)
The FPR is computed as FP / (FP + TN).
For option (c), FPR = 2 / (2 + 88) ≈ 0.022, which is the minimum among all the options. Similarly,
we compute it for the other options and verify that option (c) has the lowest FPR.
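As a quick numerical check (illustration only, same [[TP, FN], [FP, TN]] layout as above):

    matrices = {
        "a": [[4, 6], [6, 84]],
        "b": [[8, 2], [13, 77]],
        "c": [[5, 5], [2, 88]],
        "d": [[10, 0], [4, 86]],
    }
    for name, ((tp, fn), (fp, tn)) in matrices.items():
        print(name, "FPR =", round(fp / (fp + tn), 3))
    # prints roughly 0.067, 0.144, 0.022 and 0.044, so option (c) has the lowest FPR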

6. We have a logistic regression model that computes the probability p(x) that a given in-
put x belongs to the positive class. For a threshold θ ∈ (0, 1), the class labels f (x) ∈
{negative, positive} are predicted as given below.
    f(x) = negative,  if p(x) < θ
    f(x) = positive,  if p(x) ≥ θ

For θ = 0.5, we have TPR = 0.8 and FPR = 0.3. Then which of the following statement(s)
is/are correct?

(a) For θ = 0.4, the FPR could be lower than 0.25.


(b) For θ = 0.4, the FPR could be higher than 0.45.
(c) For θ = 0.6, the TPR must be higher than 0.85.
(d) For θ = 0.6, the TPR could be higher than 0.85.
(e) For θ = 0.4, the TPR must be lower than 0.75.
(f) For θ = 0.4, the TPR could be lower than 0.75.

Sol. (b)
FPR = FP/(FP + TN). The denominator (FP + TN) is the number of actual negative samples
and does not depend on θ. If θ is decreased from 0.5 to 0.4, some actual negatives that were
correctly classified as negatives at θ = 0.5 may now be classified as positives, so FP can only
stay the same or increase. The FPR at θ = 0.4 is therefore at least 0.3: it cannot be lower than
0.25, but it could rise above 0.45. Thus, option (a) is incorrect and option (b) is correct.
TPR = TP/(TP + FN). The denominator (TP + FN) is the number of actual positive samples
and also does not depend on θ. Decreasing θ to 0.4 can only turn false negatives into true
positives, so the TPR at θ = 0.4 is at least 0.8 and cannot be lower than 0.75; options (e) and
(f) are incorrect. Conversely, increasing θ to 0.6 can only turn true positives into false negatives,
so the TPR at θ = 0.6 is at most 0.8 and cannot be higher than 0.85; options (c) and (d) are
incorrect.
Hence, the only correct option is (b). All the other options are incorrect.
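The monotonic effect of the threshold can also be seen numerically with a toy example (the scores and labels below are made up and are not the assignment's data):

    import numpy as np

    p = np.array([0.95, 0.7, 0.55, 0.45, 0.35, 0.2, 0.65, 0.5, 0.3, 0.1])  # predicted P(positive)
    y = np.array([1,    1,   1,    1,    1,    1,   0,    0,   0,   0])    # true labels

    def rates(theta):
        pred = (p >= theta).astype(int)
        tp = np.sum((pred == 1) & (y == 1)); fn = np.sum((pred == 0) & (y == 1))
        fp = np.sum((pred == 1) & (y == 0)); tn = np.sum((pred == 0) & (y == 0))
        return tp / (tp + fn), fp / (fp + tn)

    for theta in (0.6, 0.5, 0.4):
        print(theta, rates(theta))
    # lowering theta can only keep or increase both TPR and FPR;
    # raising theta can only keep or decrease them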

7. Consider the following statements.
Statement P: Boosting takes multiple weak classifiers and combines them into a strong
classifier.
Statement Q: Boosting assigns equal weights to the predictions of all the weak classifiers,
resulting in a high overall performance.

(a) P is True. Q is True. Q is the correct explanation for P.


(b) P is True. Q is True. Q is not the correct explanation for P.
(c) P is True. Q is False.
(d) Both P and Q are False.

Sol. (c)

Statement P is true since it summarizes the basic principle of boosting.


As explained in the lecture, boosting assigns each weak classifier a weight that reflects how
well it performs and combines their weighted predictions to make the final prediction. It does
not assign equal weights to all the classifiers. Hence, statement Q is false.
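A small sketch (assumed setup with synthetic data and scikit-learn's AdaBoost, whose default weak learner is a decision stump) showing that the weak classifiers end up with unequal weights:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier

    X, y = make_classification(n_samples=200, random_state=0)
    clf = AdaBoostClassifier(n_estimators=5, random_state=0).fit(X, y)
    print(clf.estimator_weights_)   # one weight per weak classifier, typically unequal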

8. Which of the following statement(s) about ensemble methods is/are correct?

(a) The individual classifiers in bagging cannot be trained in parallel.


(b) The individual classifiers in boosting cannot be trained in parallel.
(c) A committee machine can consist of different kinds of classifiers like SVM, decision trees
and logistic regression.
(d) Bagging further increases the variance of an unstable classifier.

Sol. (b), (c)

In bagging, each classifier is trained independently on its own bootstrap sample, so the classifiers
can be trained in parallel. In boosting, each classifier is trained on data weighted according to
the errors of the previous classifiers, so training is inherently sequential. A committee machine
can combine different kinds of classifiers, such as SVMs, decision trees and logistic regression.
Bagging reduces, rather than increases, the variance of an unstable classifier. Please refer to
the relevant lectures for details.
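A minimal sketch (assumed setup, synthetic data) of a committee machine as in option (c), combining an SVM, a decision tree and logistic regression by majority vote:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=200, random_state=0)
    committee = VotingClassifier(
        estimators=[
            ("svm", SVC()),
            ("tree", DecisionTreeClassifier(random_state=0)),
            ("logreg", LogisticRegression(max_iter=1000)),
        ],
        voting="hard",   # simple majority vote over the three predictions
    )
    committee.fit(X, y)
    print(committee.score(X, y))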
