Exercise Introduction
1. You must choose a multiclass machine learning dataset and process it with the tools
that you have learned and those that you will learn in this part of the course. To
choose the dataset, you must visit the following website:
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
You must also check the following web site to learn how to load the dataset in sklearn:
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_svmlight_file.html
2. You must split the dataset into a training set and a disjoint validation set, if the
original dataset does not already provide the split. Then you must train a Support Vector
Machine (SVM) on the training set. After that, you must measure the mean
classification accuracy on the validation set.
3. You must retrain the SVM with a different kernel and measure the performance
again.
4. Finally, you must repeat steps 2 and 3 with multilabel coding rather than
multiclass coding.
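The four steps above can be sketched roughly as follows. This is only an illustrative outline, not the expected solution. Several things in it are assumptions: the data is a synthetic three-class stand-in written to a temporary svmlight file (replace `path` with the file you downloaded from the LIBSVM datasets page), the 75/25 split and the `rbf`/`linear` kernel pair are arbitrary choices, and "multilabel coding" is interpreted here as a one-column-per-class label indicator matrix fed to `OneVsRestClassifier`. The helper `validation_accuracy` is a hypothetical name introduced for this sketch.

```python
import os
import tempfile

from sklearn.datasets import (dump_svmlight_file, load_svmlight_file,
                              make_classification)
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import SVC

# Assumption: synthetic 3-class data stands in for a real LIBSVM dataset.
X_demo, y_demo = make_classification(n_samples=300, n_features=10,
                                     n_informative=6, n_classes=3,
                                     random_state=0)
path = os.path.join(tempfile.mkdtemp(), "demo.svm")
dump_svmlight_file(X_demo, y_demo, path)

# Step 1: load the svmlight/libsvm file (X is a sparse matrix, y a 1-D array).
X, y = load_svmlight_file(path)

# Step 2: disjoint training and validation sets (skip this if the dataset
# already ships with separate files). The 25% validation size is an assumption.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)


def validation_accuracy(model, targets_train, targets_val):
    """Fit on the training split and return accuracy on the validation split."""
    model.fit(X_train, targets_train)
    return accuracy_score(targets_val, model.predict(X_val))


# Steps 2 and 3: multiclass coding, one SVM per kernel.
kernels = ("rbf", "linear")
multiclass_acc = {k: validation_accuracy(SVC(kernel=k), y_train, y_val)
                  for k in kernels}

# Step 4: multilabel coding. Each class becomes one binary column; this
# assumes at least three classes, so the indicator matrix has one column each.
binarizer = LabelBinarizer().fit(y_train)
Y_train = binarizer.transform(y_train)
Y_val = binarizer.transform(y_val)

# With indicator targets, accuracy_score counts a sample as correct only if
# every column matches (subset accuracy).
multilabel_acc = {
    k: validation_accuracy(OneVsRestClassifier(SVC(kernel=k)), Y_train, Y_val)
    for k in kernels
}

print("multiclass:", multiclass_acc)
print("multilabel:", multilabel_acc)
```

With multilabel coding, a sample may be assigned no class or several classes at once, so the accuracy in step 4 is not directly comparable to the multiclass figure and is usually somewhat lower.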
Note: After you finish your work, you must download the Jupyter Notebook in PDF
format (the menu option is File -> Print -> Save as PDF). Then you must put the .pdf
and .ipynb files into a .zip archive file, and submit the .zip file to the virtual campus
activity.
Optional task: In order to generate a good quality PDF, you may put the following code
as the last cell of your notebook:
Enter the full file name of your notebook as shown above, within quotes. The pdf will
be saved on your Google Drive. For better quality plots use: