
ELEC801 Pattern Recognition, Fall 2019

Programming Assignment 2

Instructor: Gil-Jin Jang Email: gjang@knu.ac.kr


School of Electronics Engineering, Kyungpook National University

The purpose of programming assignment 2 is to practice cross-validation experiments on the Iris dataset
with various uni-modal Gaussian classifiers and SVMs.

1 Cross Validation Data Design

The Iris dataset is composed of 150 samples of 4-dimensional vectors, each with 1 integer label. There are 3
distinct labels, with exactly 50 samples per label. We first set up 5-fold cross validation as
follows:

1. For class 1, split the data into 5 folds: sample numbers 1-10, 11-20, 21-30, 31-40, and 41-50, which
are named f11, f12, f13, f14, and f15, respectively.

2. For class 2, its folds are f21, f22, f23, f24, and f25.

3. For class 3, its folds are f31, f32, f33, f34, and f35.

4. Create a training data set by R1 = {f11, . . ., f14, f21, . . ., f24, f31, . . ., f34 }, and
a test set by T1 = {f15, f25, f35 }.

5. Use R1 to train each of the 6 Gaussian classifiers described in Section 2, and calculate the accuracy on T1.

6. Repeat the above with R2-R5 and T2-T5, to obtain 5 accuracies.

7. Find the average accuracies, and determine the best Gaussian classifier for Iris dataset.
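The fold construction above can be sketched as follows. This is a minimal illustration, assuming the samples are ordered by class (50 consecutive samples per class), as in scikit-learn's copy of the Iris dataset; the function `cv_split` is a hypothetical helper name, not part of any library.

```python
import numpy as np
from sklearn.datasets import load_iris

# Iris: 150 samples, ordered by class, 50 per class
X, y = load_iris(return_X_y=True)

def cv_split(k):
    """Return (train_idx, test_idx) for fold k in 0..4.

    Fold k of each class (10 consecutive samples) forms the test set,
    so k=4 corresponds to T1 = {f15, f25, f35} in the notation above.
    """
    test_idx = []
    for c in range(3):                  # classes 0, 1, 2
        start = c * 50 + k * 10         # fold k of class c
        test_idx.extend(range(start, start + 10))
    test_idx = np.array(test_idx)
    train_idx = np.setdiff1d(np.arange(150), test_idx)
    return train_idx, test_idx
```

Each split yields 120 training and 30 test samples, with all three classes equally represented in both sets.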

2 Gaussian Classifier Design

There are 6 different types of uni-modal Gaussian classifiers.

1. Σc = σ²I

2. Σc = Σ = diag(σ1², . . ., σm²)

3. Σc = Σ

4. Σc = σc²I

5. Σc1 ≠ Σc2 – general case

6. Σc1 ≠ Σc2, Σc = diag(σc,1², . . ., σc,m²) – diagonal covariance case

For data given as a matrix of shape (dimension) × (number of samples) or (number of samples) ×
(dimension), depending on your implementation, write 6 methods (functions) to estimate the mean and
covariance matrix for each of the 6 techniques above.

1. Perform 5-fold cross-validation experiments for all 6 methods


2. Evaluate the average performance
3. Determine which is the best out of 6 methods

You may use python with numpy, scikit-learn, etc.
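A minimal sketch of the 6 covariance estimators is given below, assuming data stored as (number of samples) × (dimension). The function names (`cov_spherical`, `cov_diagonal`, `cov_full`, `class_stats`) are illustrative choices, not required names; shared-covariance types (1-3) pool all classes after centering each class at its own mean, and per-class types (4-6) apply the same estimators to one class at a time.

```python
import numpy as np

def cov_spherical(X):
    """Type 1/4: sigma^2 * I, with sigma^2 the average per-dimension variance."""
    m = X.shape[1]
    return np.mean(np.var(X, axis=0)) * np.eye(m)

def cov_diagonal(X):
    """Type 2/6: diag(sigma_1^2, ..., sigma_m^2)."""
    return np.diag(np.var(X, axis=0))

def cov_full(X):
    """Type 3/5: general (full) covariance matrix."""
    return np.cov(X, rowvar=False)

def class_stats(X, y, cov_fn, shared=False):
    """Estimate per-class means and covariances.

    shared=True pools all samples (centered per class) into one
    covariance shared by every class; shared=False estimates a
    separate covariance per class.
    """
    classes = np.unique(y)
    means = {c: X[y == c].mean(axis=0) for c in classes}
    if shared:
        centered = np.vstack([X[y == c] - means[c] for c in classes])
        cov = cov_fn(centered)
        return means, {c: cov for c in classes}
    return means, {c: cov_fn(X[y == c]) for c in classes}
```

Classification then uses the Gaussian log-likelihood of each class (plus the log prior, here uniform) and picks the argmax.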

3 Support Vector Machines

The hyperparameters of an SVM are C (non-separability) and the kernel-specific parameters. Use 5-fold cross
validation to

1. Determine the best C and degree (order, rank, etc.) of the polynomial kernel functions.
2. Determine the best C and the standard deviation of the Gaussian kernel function (also called the γ
parameter of the RBF kernel).

This hyperparameter selection process is called grid search because the combinations of discrete param-
eter choices constitute a grid in a multidimensional vector space, and we examine every grid point to find
the optimal hyperparameter set.

The critical design issue is how to discretize the continuous parameter space: for example, a
usual choice is C ∈ {1, 10, 100, . . .}.

There are many implementations based on scikit-learn, so read them carefully to design a good grid
search framework. Several examples (but not limited to) are as follows:

• https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html
• https://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_digits.html
• https://scikit-learn.org/stable/auto_examples/svm/plot_rbf_parameters.html
• https://scikit-learn.org/stable/modules/grid_search.html
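A minimal grid-search sketch using scikit-learn's `GridSearchCV` is shown below. The particular grids of C, degree, and γ values are illustrative assumptions, and scikit-learn's built-in stratified 5-fold CV is used here as a stand-in for the manual folds of Section 1.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One grid per kernel: polynomial (C, degree) and RBF (C, gamma)
poly_grid = {"C": [1, 10, 100], "kernel": ["poly"], "degree": [2, 3, 4]}
rbf_grid = {"C": [1, 10, 100], "kernel": ["rbf"], "gamma": [0.01, 0.1, 1.0]}

for grid in (poly_grid, rbf_grid):
    # cv=5 uses stratified 5-fold splits for classifiers
    search = GridSearchCV(SVC(), grid, cv=5)
    search.fit(X, y)
    print(grid["kernel"][0], search.best_params_, round(search.best_score_, 3))
```

`best_params_` and `best_score_` report the winning grid point and its mean cross-validated accuracy; a finer grid can then be centered on the winner if needed.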

4 Grading Scheme
10% Basic score for submission
30% Executable
30% Code readability
• Be sure that your name and student ID are on the top of the code
30% Report
Due 11/5 Tuesday 11:59 LMS time
Late submission 11/6 Wednesday 09:59 LMS time, 10% penalty per hour.
