
PROBLEM STATEMENT FOR THE INTRODUCTION SECTION OF THE MACHINE LEARNING PART OF THE COURSE

1. You must choose a multiclass machine learning dataset and process it with the tools
that you have learned and those that you will learn in this part of the course. To
choose the dataset, visit the following web site:
https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/
To learn how to load the dataset in sklearn, check the following web site:
https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_svmlight_file.html
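
As a rough illustration of the loading step, the sketch below parses a few lines in LIBSVM format with load_svmlight_file. The three-sample data is made up for the example, and an in-memory buffer stands in for the file; in the actual assignment you would pass the path of the file downloaded from the LIBSVM site.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Made-up sample in LIBSVM format: "<label> <index>:<value> ...".
# Feature indices start at 1 here, as in most files on the LIBSVM site.
sample = b"""1 1:0.5 3:1.2
2 2:0.7 4:0.1
3 1:0.3 2:0.9
"""

# Assumption: an in-memory buffer replaces the downloaded file for this sketch.
X, y = load_svmlight_file(BytesIO(sample))

print(X.shape)  # X is a SciPy sparse matrix of shape (n_samples, n_features)
print(y)        # class labels, returned as floats

# Dense copy, only advisable when the dataset is small enough to fit in memory.
X_dense = X.toarray()
```

SVC accepts the sparse matrix directly, so the dense conversion is optional.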

2. You must split the dataset into a training set and a disjoint validation set, if the
original dataset does not already provide the split. Then you must train a Support Vector
Machine (SVM) on the training set. After that, you must measure the mean
classification accuracy on the validation set.
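
One possible shape for step 2 is sketched below. It uses the Iris data bundled with sklearn as a stand-in for the dataset you choose, and the 25% validation fraction, the random seed, and the RBF kernel are assumptions made for the example, not requirements of the assignment.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in multiclass data (3 classes); replace with your LIBSVM dataset.
X, y = load_iris(return_X_y=True)

# Disjoint split; stratify keeps the class proportions similar in both parts.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Train the SVM on the training set only.
clf = SVC(kernel="rbf").fit(X_train, y_train)

# Mean classification accuracy: fraction of validation samples predicted correctly.
acc = accuracy_score(y_val, clf.predict(X_val))
print(f"validation accuracy: {acc:.3f}")
```

If the chosen dataset already ships with separate training and test files, load both files and skip the train_test_split call.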

3. You must retrain the SVM with a different kernel and measure the performance
again.
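
A minimal sketch of step 3 follows, again with Iris as a stand-in dataset and an assumed 25% validation split. It loops over the four built-in SVC kernels so the accuracies can be compared side by side; the assignment only requires one alternative kernel.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in data and split; replace with your own dataset.
X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

scores = {}
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X_train, y_train)
    # For classifiers, score() returns the mean accuracy on the given data.
    scores[kernel] = clf.score(X_val, y_val)
    print(f"{kernel:>8}: {scores[kernel]:.3f}")
```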

4. Finally, you must repeat steps 2 and 3 with multilabel coding rather than
multiclass coding.
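
The sketch below shows one way to read "multilabel coding": each class label is turned into a 0/1 indicator column and one binary SVM is trained per column. This interpretation, the use of LabelBinarizer and OneVsRestClassifier, and the Iris stand-in data are all assumptions; check them against what was presented in class.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import LabelBinarizer
from sklearn.svm import SVC

# Stand-in data and split; replace with your own dataset.
X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Multilabel coding: one 0/1 indicator column per class.
binarizer = LabelBinarizer().fit(y_train)
Y_train = binarizer.transform(y_train)
Y_val = binarizer.transform(y_val)

# One binary SVM per column; a 2-D indicator target makes this a multilabel fit.
ovr = OneVsRestClassifier(SVC(kernel="rbf")).fit(X_train, Y_train)
Y_pred = ovr.predict(X_val)  # indicator matrix with the same shape as Y_val

# On indicator matrices, accuracy_score is the subset accuracy:
# a sample counts as correct only if every column matches.
subset_acc = accuracy_score(Y_val, Y_pred)
print(f"subset accuracy: {subset_acc:.3f}")
```

With this coding a sample can receive zero or several labels, so the accuracy is usually not identical to the multiclass result. Changing the kernel argument repeats step 3 under the same coding.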

Note: After you finish your work, you must download the Jupyter Notebook in PDF
format (the menu option is File -> Print -> Save as PDF). Then you must put the .pdf
and .ipynb files into a .zip archive file, and submit the .zip file to the virtual campus
activity.

Optional task: In order to generate a good quality PDF, you may do the following.

First, put the following code as the first cell of your notebook:
!apt-get install texlive texlive-xetex texlive-latex-extra pandoc
!pip install pypandoc
from google.colab import drive
drive.mount('/content/drive')

Then put the following code as the last cell of your notebook:
!jupyter nbconvert --to PDF "/content/drive/MyDrive/Colab Notebooks/NB.ipynb"

where NB.ipynb must be replaced by the file name of your notebook. Finally, run the
notebook from the beginning. The PDF will be saved to your Google Drive.

For better quality plots, put the following code in the first cell of the notebook:

from matplotlib_inline.backend_inline import set_matplotlib_formats
set_matplotlib_formats('pdf', 'svg')
