
Introduction to Machine Learning / Labwork 6

Kernel SVMs and model selection

Maxime Ossonce maxime.ossonce@esme.fr

The purpose of this labwork is to study the support vector machine (SVM) algorithm with the radial basis
function (RBF) kernel.

Remark: for better readability of the code, imports are added where they are needed, even though this is not considered good practice (imports generally go at the beginning of the file).

The kernel trick

The decision function of the SVM algorithm is of the form:

    f(x) = w^\top x + b.

The vector w defining the hyperplane is written as a linear combination of the so-called support vectors, which are a subset of the training dataset D:

    w = \sum_{i=1}^{n} \alpha_i y_i x_i,

the support vectors being the set {x_i : α_i > 0}. Hence, the scoring function f(x) writes:

    f(x) = \sum_{i=1}^{n} \alpha_i y_i x_i^\top x + b.

This solution is a linear separation of the sample space X. One can imagine a feature space H, endowed with a scalar product ⟨·, ·⟩_H, and a map φ : X → H computing a representation of the samples x ∈ X on which the classification is performed.

The advantages of applying the SVM to φ(x) ∈ H rather than to x ∈ X are that:

• classification can be performed on objects that do not live in a vector space (on which the scalar product x_i^\top x would not be defined);
• the linear separation performed by the SVM in H can induce a more complex separation in X.

One important feature of the SVM is that the (dual) optimization, as well as the classification of a new sample x, only involves scalar products. Hence an SVM performed in the feature space would only involve the scalar products ⟨φ(x_i), φ(x)⟩_H. This means that if k(x, z) := ⟨φ(x), φ(z)⟩_H can be computed for any x, z ∈ X without the explicit values of φ(x) and φ(z), then neither the feature space H nor the feature map φ(·) needs to be known to perform an SVM in H. This powerful property is called the kernel trick (the function k(·, ·) is called the kernel).

Furthermore, under some hypotheses on k(·, ·) (symmetry and positive-definiteness), it can be shown that such a Hilbert space H exists. The most used kernel is the so-called RBF, or Gaussian, kernel:

    k(x, z) = \exp(-\gamma \|x - z\|^2).

Sometimes the precision γ is replaced by 1/(2σ²). The Gaussian kernel is positive-definite: for every dataset X = {x_i : i ∈ {1, …, n}} of distinct samples and every (α_1, …, α_n) ∈ ℝ^n \ {0} we have:

    \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j k(x_i, x_j) > 0.

Hence, there exist a Hilbert space H, the reproducing kernel Hilbert space (RKHS), and a feature map φ : X → H such that:

    k(x, z) = \langle \phi(x), \phi(z) \rangle_H    for all x, z ∈ X.

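As a quick numerical illustration of these properties (not part of the labwork), one can build the Gram matrix of the Gaussian kernel on a few random points and check that it is symmetric with non-negative eigenvalues. The sketch below uses scikit-learn's rbf_kernel and an arbitrary γ = 0.5:

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 2))   # 20 random 2D points

K = rbf_kernel(X_demo, gamma=0.5)   # Gram matrix: K[i, j] = exp(-gamma * ||x_i - x_j||^2)

print(np.allclose(K, K.T))                     # the Gram matrix is symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # eigenvalues are non-negative (up to numerical precision)
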
1 Kernel SVM on synthetic data

The dataset used here is a 2D (p = 2) dataset with two classes.

kernel_svm.py
from sklearn.datasets import make_moons
n = 900
X, y = make_moons(n_samples=n, noise=0.25, random_state=42)

Q 1-1. After loading the dataset (see above), split the data to create Xa, ya, Xt, yt, Xv, yv the train, test
and validation sets (of equal sizes).
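One possible way to obtain three sets of equal size for Q 1-1 (a sketch using two successive calls to train_test_split; the random_state values are arbitrary):

from sklearn.model_selection import train_test_split

# keep one third for training, then split the remaining two thirds
# into a test set and a validation set of equal size
Xa, X_rest, ya, y_rest = train_test_split(X, y, test_size=2 / 3, random_state=0)
Xt, Xv, yt, yv = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)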
Q 1-2. Visualize the training set to check that you need a non-linear classifier.

kernel_svm.py
from matplotlib import pyplot as plt

plt.figure(figsize=(12, 8))
plt.scatter(Xa[:, 0], Xa[:, 1], c=ya)

The chosen kernel is the RBF (which is also the default kernel in the scikit-learn library):

    k(x, z) = \exp(-\gamma \|x - z\|^2).

Q 1-3. What is the equivalent kernel when γ → 0?

The precision γ is a hyperparameter that has to be chosen by the user (or selected, see section 2). Another hyperparameter to choose is C (a regularization term controlling the slack variables, see lesson 6). First, we set arbitrary values: C = 10, γ = 0.1.

Q 1-4. With the function plot_decision_boundary (from svm_utility), plot the decision boundary of an SVM fit on the training samples. Comment on the obtained decision regions.
Q 1-5. Vary γ (e.g. γ ∈ {10⁻², 10⁻¹, 1, 10, 100}) to see its influence on the decision regions (a possible loop is sketched after the code block below). What can be said for small values of γ? For large values?

kernel_svm.py
from sklearn.svm import SVC

clf = SVC(kernel='rbf')  # SVM with Gaussian (RBF) kernel

clf.gamma = 0.1
clf.C = 10

clf.fit(Xa, ya)

from svm_utility import plot_decision_boundary

plot_decision_boundary(Xa, ya, clf)
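A possible loop for Q 1-5 (a sketch; it assumes that plot_decision_boundary draws on the current figure, so a title is added right after each call):

for gamma in [1e-2, 1e-1, 1, 10, 100]:
    clf = SVC(kernel='rbf', C=10, gamma=gamma)
    clf.fit(Xa, ya)
    plot_decision_boundary(Xa, ya, clf)     # decision regions for this value of gamma
    plt.title('gamma = {}'.format(gamma))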

A validation procedure has to be applied to choose proper values of C and γ: the performance of the model (fit on the training set) is evaluated on the validation set. The chosen pair (C, γ) is the one yielding the best accuracy on the validation set.

Q 1-6. Select C in the logarithmic range {10⁻³, …, 10²} and γ in {10⁻², …, 10²}. What is the optimal pair (C*, γ*) to select?

kernel_svm.py
import numpy as np
from sklearn.metrics import accuracy_score

C_ = np.logspace(-3, 2, 6)
gamma_ = np.logspace(-2, 2, 5)

val_err_ = np.zeros((len(C_), len(gamma_)))

for i_C, C in enumerate(C_):
    for i_g, gamma in enumerate(gamma_):
        clf.C = C
        clf.gamma = gamma
        print('Fitting with C={}, gamma={}'.format(C, gamma))
        clf.fit(Xa, ya)

        err = 1 - accuracy_score(yv, clf.predict(Xv))

        print('Validation error: {:.1%}'.format(err))
        val_err_[i_C, i_g] = err

plt.figure(figsize=(12, 8))

extentC = [min(np.log10(C_)) - 0.5, max(np.log10(C_)) + 0.5]
extentG = [min(np.log10(gamma_)) - 0.5, max(np.log10(gamma_)) + 0.5]
# rows of val_err_ correspond to C, columns to gamma;
# origin='lower' so that increasing C goes upward, matching the extent
plt.imshow(val_err_, extent=[*extentG, *extentC], origin='lower')
plt.colorbar()
plt.xlabel("log(gamma)")
plt.ylabel("log(C)")
plt.title("Validation error rate")

ind_C, ind_gamma = np.unravel_index(np.argmin(val_err_), val_err_.shape)

C_star = C_[ind_C]
gamma_star = gamma_[ind_gamma]
print('C*={}, gamma*={}'.format(C_star, gamma_star))

Q 1-7. Now you can merge the training set and the validation set to train the optimal SVM and evaluate its performance on the test set. Comment on the decision regions.

kernel_svm.py
Xa = np.concatenate([Xa, Xv])
ya = np.concatenate([ya, yv])
clf.C = C_star
clf.gamma = gamma_star
# ... to complete
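One possible completion (a sketch, continuing the code above): refit on the merged set, measure the accuracy on the held-out test set and plot the resulting decision boundary.

clf.fit(Xa, ya)  # refit on the merged training + validation set

test_err = 1 - accuracy_score(yt, clf.predict(Xt))
print('Test error: {:.1%}'.format(test_err))

plot_decision_boundary(Xa, ya, clf)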

2 Model selection

The validation procedure described in question 1-6 can be carried out with a grid search, using GridSearchCV from the scikit-learn library.

The dataset used here is the UCI Breast Cancer Wisconsin dataset.

model_selection.py
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()

print('Target:', *data.target_names)
print('Features:', ', '.join(data.feature_names))
X, y = data.data, data.target
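A train / test split is required before the grid search below; one possible split (a sketch, reusing the Xa / Xt naming of section 1):

from sklearn.model_selection import train_test_split

# keep one third of the samples as a held-out test set
Xa, Xt, ya, yt = train_test_split(X, y, test_size=1 / 3, random_state=0)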

The selection procedure is a K-fold cross-validation: the training set is split into K subsets, each one being used in turn for validation while the K − 1 remaining subsets are used for model fitting.
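As an illustration (not required by the labwork), the folds produced by such a split can be inspected directly with scikit-learn's KFold:

from sklearn.model_selection import KFold

kf = KFold(n_splits=3, shuffle=True, random_state=0)
for k, (train_idx, val_idx) in enumerate(kf.split(Xa)):
    # each fold is used once for validation, the K - 1 others for fitting
    print('fold {}: {} samples for fitting, {} for validation'.format(
        k, len(train_idx), len(val_idx)))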

Q 2-1. Perform the grid search using a K -fold validation (K = 3).

model_selection.py
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

C_ = np.logspace(-0.5, 2, 25)
gamma_ = np.logspace(-3, 1, 25)

# the grid
parameters = [{"gamma": gamma_, "C": C_}]

# Define the classifier
clf = SVC(kernel='rbf')

# Perform a K-fold validation using the accuracy as the performance measure
K = 3
clf = GridSearchCV(clf, param_grid=parameters, cv=K, scoring='accuracy', verbose=2, n_jobs=2)
clf.fit(Xa, ya)  # of course, you first have to do a train / test split!
print('Best parameters:', clf.best_params_)
print('Best score: {:.1%}'.format(clf.best_score_))

Q 2-2. What is the accuracy attained on the train set? on the test set?
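A possible way to answer Q 2-2 (a sketch; after fit, GridSearchCV refits the best model on the whole training set, so clf can be used directly for prediction):

from sklearn.metrics import accuracy_score

print('Train accuracy: {:.1%}'.format(accuracy_score(ya, clf.predict(Xa))))
print('Test accuracy: {:.1%}'.format(accuracy_score(yt, clf.predict(Xt))))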
