
Exercise 1

HomeWork
November 2nd, 2022

1 DA-AI - HomeWork 1
Teaching assistants:
Gabriele Tiboni (gabriele.tiboni@polito.it)
Francesco Cappio Borlino (francesco.cappio@polito.it)

1.1 Quick recap


1.1.1 Classification task
We are given a training set $S = \{(x_i, y_i)\}_{i=1}^{m}$. We have to build a classifier $h$ able to predict the label of new data points: $\hat{y}_i = h(x_i)$.

1.1.2 k-NN
• it’s a family of classification algorithms (one for each value of K);
• $h_{k\text{-NN}}(x)$ outputs the label $y$ appearing in the majority of the $k$ points $x_t \in S$ which are closest to $x$.

1.1.3 SVM
• optimal binary classification algorithm for linearly separable problems;
– obtained through the maximization of the margin;

Hard margin problem:
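In its standard primal form, the hard-margin problem can be written as:

$$\min_{w,\,b} \;\; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1, \quad i = 1, \dots, m$$

with labels $y_i \in \{-1, +1\}$.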


• when the problem is not linearly separable:

– we add slack variables and a penalty parameter C
Soft margin problem:
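With slack variables $\xi_i$ and penalty parameter $C$, the soft-margin problem becomes:

$$\min_{w,\,b,\,\xi} \;\; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, m$$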

A soft margin formulation may not be enough:

• when the problem is not linearly separable:


– we map the problem to a higher-dimensional feature space through a function Φ

Instead of computing Φ we apply the kernel trick: $k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle$. An example of a possible kernel function is the RBF with parameter γ:

$$k(x_i, x_j) = e^{-\gamma \, \|x_i - x_j\|^2}$$
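As a quick sanity check of the formula above, here is a minimal sketch (with arbitrary toy points and an arbitrary γ) comparing the explicit RBF expression with scikit-learn's rbf_kernel:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# two toy points and a gamma value (arbitrary, for illustration only)
x_i = np.array([[1.0, 2.0]])
x_j = np.array([[0.5, -1.0]])
gamma = 0.1

# explicit formula: k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
manual = np.exp(-gamma * np.sum((x_i - x_j) ** 2))

# same value as computed by scikit-learn
auto = rbf_kernel(x_i, x_j, gamma=gamma)[0, 0]

print(manual, auto)  # the two numbers should coincide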

1.2 Exercise 1 - kNN


Steps:
1. load the Wine dataset (included in sklearn library);
2. select only 2 attributes (the first 2 or any pair you prefer);
• by looking at the data you can try to understand which is the best choice
3. split into train, validation and test sets (suggested proportions 5:2:3)
4. for different values of K ([1,3,5,6])
• apply kNN
• plot training data and decision boundaries
• evaluate on validation set
5. inspect the results:
• plot the trend of the validation accuracy for the different Ks
• look at how the boundaries change for the different values of K
6. use the best value of K to predict on the test set and compute the accuracy.
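A minimal sketch of these steps follows; the attribute choice, the random seed and the plotting details are placeholder assumptions, not the required solution:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# 1-2. load the Wine dataset and keep only the first 2 attributes
X, y = load_wine(return_X_y=True)
X = X[:, :2]

# 3. split into train/validation/test with 5:2:3 proportions
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, train_size=0.5, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.6, random_state=42)

# 4-5. train kNN for each K and evaluate on the validation set
ks = [1, 3, 5, 6]
val_acc = []
for k in ks:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    val_acc.append(accuracy_score(y_val, knn.predict(X_val)))
    # (plot training data and decision boundaries here, e.g. by predicting on a mesh grid)

plt.plot(ks, val_acc, marker='o')
plt.xlabel('K')
plt.ylabel('validation accuracy')
plt.show()

# 6. refit with the best K and evaluate on the test set
best_k = ks[int(np.argmax(val_acc))]
best_knn = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print('test accuracy:', accuracy_score(y_test, best_knn.predict(X_test)))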
Example of output:

1.3 Exercise 2 - Linear SVM
Steps:
1. use the same data as before (same attributes and splits)
2. for different values of C ([0.001, 0.01, 0.1, 1, 10, 100, 1000]) repeat the evaluations performed
before with K
3. inspect the results:
• plot the trend of the validation accuracy for the different Cs
• try to understand how C influences the result: look at what happens with low and high values
4. this is a multi-class problem (3 classes), look at the decision_function_shape parameter
and try to understand how it is used and how it influences the result
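A minimal sketch for this exercise, assuming X_train, y_train, X_val and y_val are the splits defined in Exercise 1:

from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

Cs = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
val_acc = []
for C in Cs:
    # decision_function_shape ('ovr', the default, or 'ovo') controls how the
    # multi-class decision values are exposed; see step 4
    clf = SVC(kernel='linear', C=C).fit(X_train, y_train)
    val_acc.append(accuracy_score(y_val, clf.predict(X_val)))
    # (plot training data and decision boundaries here, as done for kNN)

for C, acc in zip(Cs, val_acc):
    print(f'C={C}: validation accuracy={acc:.3f}')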
Example of output:

1.4 Exercise 3 - SVM with RBF Kernel


Steps:
1. use the same data as before (same attributes and splits)
2. for different values of C ([0.001, 0.01, 0.1, 1, 10, 100, 1000]) repeat the evaluations performed
before
• for this step keep γ fixed and simply focus on understanding how the decision boundaries change w.r.t. the linear SVM
3. perform a grid search over both γ and C at the same time:

• for each of them select an appropriate range
• for each combination compute the validation accuracy and plot the decision boundary;
• choose the best parameters and evaluate the best model on the test set;
4. try to understand how γ influences the output;
5. compare the performance of this model with the results you obtained before.
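A minimal sketch of the grid search over C and γ; the candidate ranges are placeholders, and the split variables come from Exercise 1:

import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# candidate ranges: placeholder values, pick your own
Cs = [0.001, 0.01, 0.1, 1, 10, 100, 1000]
gammas = [0.001, 0.01, 0.1, 1, 10]

best = (None, None, -np.inf)
for C in Cs:
    for gamma in gammas:
        clf = SVC(kernel='rbf', C=C, gamma=gamma).fit(X_train, y_train)
        acc = accuracy_score(y_val, clf.predict(X_val))
        # (optionally plot the decision boundary for this (C, gamma) pair)
        if acc > best[2]:
            best = (C, gamma, acc)

best_C, best_gamma, _ = best
final = SVC(kernel='rbf', C=best_C, gamma=best_gamma).fit(X_train, y_train)
print('test accuracy:', accuracy_score(y_test, final.predict(X_test)))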
Example of output:

1.5 Exercise 4 - K-Fold cross validation


Steps:
1. use the same data as before (same attributes and splits)
2. merge the train and validation set;
3. repeat the grid search over C and γ performing a 5-fold cross validation
4. evaluate on the test set: do you obtain an improvement?
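A minimal sketch, assuming the split variables and the placeholder parameter grids from the previous exercises:

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score

# 2. merge the train and validation sets
X_trval = np.concatenate([X_train, X_val])
y_trval = np.concatenate([y_train, y_val])

# 3. grid search over C and gamma with 5-fold cross validation
param_grid = {'C': [0.001, 0.01, 0.1, 1, 10, 100, 1000],
              'gamma': [0.001, 0.01, 0.1, 1, 10]}
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
grid.fit(X_trval, y_trval)

# 4. evaluate the best model on the test set
print('best params:', grid.best_params_)
print('test accuracy:', accuracy_score(y_test, grid.predict(X_test)))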

1.6 Need help?


• the documentation is always useful: https://scikit-learn.org/stable/
– wine dataset: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_wine.html
– model selection: https://scikit-learn.org/stable/model_selection.html
– knn: https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html
– svm: https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
• live assistance: Wed, Nov 2nd, 8:30-10:00, classroom 05AM
• Slack channel: #hw-1
• please do not write me emails with questions regarding the homework: if we keep the discussion on Slack, other students may also find help
