The purpose of this labwork is to study the support vector machine (SVM) algorithm with the radial basis
function (RBF) kernel.
Remark: for better readability of the code, imports are added where they are needed, even though this is not considered good practice (imports are generally done at the beginning of the file).
The SVM classifier assigns a label to a sample x ∈ X according to the sign of the scoring function

$$f(x) = w^\top x + b.$$

The hyperplane normal w is written as a linear combination of the so-called support vectors, which form a subset of the training dataset D:

$$w = \sum_{i=1}^{n} \alpha_i y_i x_i,$$

the support vectors being the set $\{x_i : \alpha_i > 0\}$. Hence, the scoring function f(x) writes:

$$f(x) = \sum_{i=1}^{n} \alpha_i y_i \, x_i^\top x + b.$$
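This decomposition can be checked directly on a fitted model. As a sketch outside the lab statement (the demo dataset and variable names are mine): scikit-learn's SVC stores the products α_i y_i in its dual_coef_ attribute and the support vectors in support_vectors_, so for a linear kernel the decision function can be reassembled by hand.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Sketch: rebuild f(x) = sum_i alpha_i y_i x_i^T x + b from a fitted SVC.
# dual_coef_ holds alpha_i * y_i; support_vectors_ holds the x_i.
X_demo, y_demo = make_blobs(n_samples=40, centers=2, random_state=0)
svm = SVC(kernel='linear').fit(X_demo, y_demo)
scores = svm.dual_coef_ @ svm.support_vectors_ @ X_demo.T + svm.intercept_
print(np.allclose(scores.ravel(), svm.decision_function(X_demo)))  # True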
The solution above is a linear separation of the sample space X. One can imagine a feature space H, endowed with a scalar product ⟨·, ·⟩_H, and a function φ : X → H computing a representation of the samples x ∈ X on which the classification is performed.
The advantages of applying the SVM to φ(x) ∈ H rather than to x ∈ X are that:
• classification can be performed on objects that do not live in a vector space (on which the scalar product $x_i^\top x$ would not be defined);
• the linear separation performed by the SVM in H can induce a more complex separation in X.
One important feature of the SVM is that the (dual) optimization, as well as the classification of a new sample x, only involves scalar products. Hence an SVM performed in the feature space would only involve the scalar products ⟨φ(x_i), φ(x)⟩_H. This means that if k(x, z) := ⟨φ(x), φ(z)⟩_H is computable for any x, z ∈ X without the explicit values of φ(x) and φ(z), then neither the feature space H nor the feature map φ(·) needs to be known to perform an SVM in H. This powerful property is called the kernel trick (the function k(·, ·) is called the kernel).
Furthermore, under some hypotheses on k(·, ·) (symmetry and positive-definiteness), it can be shown (Moore-Aronszajn theorem) that such a Hilbert space H exists. The most widely used kernel is the so-called RBF, or Gaussian, kernel:

$$k(x, z) = \exp\left(-\gamma \lVert x - z \rVert^2\right).$$
Sometimes the precision γ is replaced by 1/(2σ²). The Gaussian kernel is positive-definite: for every set of distinct points X = {x_i : i ∈ {1, …, n}} and every nonzero (α_1, …, α_n) ∈ ℝⁿ we have:

$$\sum_{i,j=1}^{n} \alpha_i \alpha_j \, k(x_i, x_j) > 0.$$
Hence, there exist a Hilbert space H, called the reproducing kernel Hilbert space (RKHS), and a feature map φ : X → H such that:

$$k(x, z) = \langle \phi(x), \phi(z) \rangle_H \qquad \forall x, z \in X.$$
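As a quick numerical illustration (a sketch of my own, not part of the lab statement), the formula above can be checked against scikit-learn's rbf_kernel, and the positive-definiteness can be observed on the Gram matrix of a few random points.

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Check k(x, z) = exp(-gamma * ||x - z||^2) on one pair of points.
gamma = 1.0
x = np.array([[1.0, 2.0]])
z = np.array([[0.5, -1.0]])
manual = np.exp(-gamma * np.sum((x - z) ** 2))
print(np.isclose(manual, rbf_kernel(x, z, gamma=gamma)[0, 0]))  # True

# The Gram matrix of distinct points has strictly positive eigenvalues.
rng = np.random.default_rng(0)
pts = rng.normal(size=(10, 2))
K = rbf_kernel(pts, gamma=gamma)
print(np.linalg.eigvalsh(K).min() > 0)  # True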
kernel_svm.py
from sklearn.datasets import make_moons
n = 900
X, y = make_moons(n_samples=n, noise=0.25, random_state=42)
Q 1-1. After loading the dataset (see above), split the data to create Xa, ya, Xt, yt, Xv, yv: the train, test, and validation sets (of equal sizes).
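A possible way to build these splits (a sketch; the random_state values are arbitrary choices) is to call train_test_split twice:

from sklearn.model_selection import train_test_split

# First set aside one third of the samples for training, then split the
# remainder in half, giving three sets of 300 samples each.
Xa, X_rest, ya, y_rest = train_test_split(X, y, train_size=1/3, random_state=0)
Xt, Xv, yt, yv = train_test_split(X_rest, y_rest, test_size=1/2, random_state=0)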
Q 1-2. Visualize the training set to check that you need a non linear classifier.
kernel_svm.py
from matplotlib import pyplot as plt
plt.figure(figsize=(12, 8))
plt.scatter(Xa[:, 0], Xa[:, 1], c=ya)  # color the points by class label
plt.show()
The chosen kernel is the RBF (which is also the default kernel in the scikit-learn library):
$$k(x, z) = \exp\left(-\gamma \lVert x - z \rVert^2\right).$$
The precision γ is a hyperparameter that has to be chosen by the user (or selected; see section 2). The other hyperparameter to choose is C (a regularization term controlling the slack variables; see lesson 6). First, we set arbitrary values: C = 10, γ = 0.1.
Q 1-4. With the function plot_decision_boundary (from svm_utility), plot the decision boundary of an SVM fit on the training samples. Comment on the obtained decision regions.
Q 1-5. Vary γ (e.g. γ ∈ {10⁻², 10⁻¹, 1, 10, 100}) to see its influence on the decision regions. What can be said for small values of γ? For large values?
kernel_svm.py
from sklearn.svm import SVC
clf = SVC(C=10, gamma=0.1)  # RBF kernel is the default
clf.fit(Xa, ya)
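A possible loop for Q 1-5 (a sketch: plot_decision_boundary comes from the provided svm_utility module, and its exact signature is an assumption here):

from svm_utility import plot_decision_boundary  # provided with the lab

# Refit and plot the decision regions for each gamma, with C fixed at 10.
for gamma in [1e-2, 1e-1, 1, 10, 100]:
    clf = SVC(C=10, gamma=gamma).fit(Xa, ya)
    plot_decision_boundary(clf, Xa, ya)  # signature assumed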
A validation procedure has to be applied to choose proper values of C and γ: the performance of the model (fit on the training set) is evaluated on the validation set. The chosen pair (C, γ) is the one yielding the best accuracy on the validation set.
Q 1-6. Select C in the logarithmic range {10⁻³, …, 10²} and γ in {10⁻², …, 10²}. What is the optimal pair (C*, γ*) to select?
kernel_svm.py
from sklearn.metrics import accuracy_score
import numpy as np
C_ = np.logspace(-3, 2, 6)
gamma_ = np.logspace(-2, 2, 5)
plt.figure(figsize=(12, 8))
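The grid search itself is left as the exercise; one possible sketch (the names C_star and gamma_star are my own) fits one SVM per pair and keeps the best validation accuracy:

# Sketch: fit on (Xa, ya), score each (C, gamma) pair on (Xv, yv).
acc = np.zeros((len(C_), len(gamma_)))
for i, C in enumerate(C_):
    for j, gamma in enumerate(gamma_):
        clf = SVC(C=C, gamma=gamma).fit(Xa, ya)
        acc[i, j] = accuracy_score(yv, clf.predict(Xv))
plt.imshow(acc)  # visualize validation accuracy over the grid
i_star, j_star = np.unravel_index(acc.argmax(), acc.shape)
C_star, gamma_star = C_[i_star], gamma_[j_star]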
Q 1-7. Now you can merge the training set and the validation set to train the optimal SVM and evaluate its performance on the test set. Comment on the decision region.
kernel_svm.py
Xa = np.concatenate([Xa, Xv])  # merge training and validation sets
ya = np.concatenate([ya, yv])
clf.C = C_star                 # reuse the selected hyperparameters
clf.gamma = gamma_star
# ... to complete
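One possible completion (a sketch, not the official solution): refit on the merged data and evaluate on the held-out test set.

clf.fit(Xa, ya)
print('Test accuracy:', accuracy_score(yt, clf.predict(Xt)))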
2 Model selection
The validation procedure described in question 1-6 can be done with a grid search, using GridSearchCV from the scikit-learn library.
model_selection.py
from sklearn.datasets import load_breast_cancer
data = load_breast_cancer()
print('Target:', *data.target_names)
print('Features:', ', '.join(data.feature_names))
X, y = data.data, data.target
The selection procedure is a K-fold cross-validation (the training set is split into K subsets, each used in turn for validation, the K − 1 remaining ones being used for model fitting).
model_selection.py
import numpy as np
C_ = np.logspace(-0.5, 2, 25)
gamma_ = np.logspace(-3, 1, 25)
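A possible use of GridSearchCV over these grids (a sketch: the train/test split, cv=5, and the variable names are assumptions of mine, not given in the statement):

from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
search = GridSearchCV(SVC(), {'C': C_, 'gamma': gamma_}, cv=5)  # 5-fold CV
search.fit(X_train, y_train)
print('Best parameters:', search.best_params_)
print('Train accuracy:', search.score(X_train, y_train))
print('Test accuracy:', search.score(X_test, y_test))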
Q 2-2. What is the accuracy attained on the train set? on the test set?