9 SVM 2
Properties
• The optimization problem has an interesting property: only observations
that lie on the margin, or that violate the margin, affect the hyperplane
and hence the resulting classifier.
• An observation that lies strictly on the correct side of the
margin does not affect the support vector classifier!
• Changing the position of such an observation would not change the
classifier at all, provided that its position remains on the correct
side of the margin.
• Observations that lie directly on the margin, or on the wrong
side of the margin for their class, are known as support vectors.
• These observations do affect the support vector classifier (a sketch of
this property follows).
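A minimal sketch of this property, using scikit-learn (the library, the toy data, and the settings are my own choices, not from the slides): fit a linear SVC, then move an observation that is not a support vector even farther from the margin and confirm the fitted hyperplane is unchanged.

```python
# Sketch: only the support vectors determine the classifier.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated classes in 2-D.
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1.0).fit(X, y)
print("support vector indices:", clf.support_)

# Pick a point that is NOT a support vector and move it even farther
# from the margin (on its correct side); the hyperplane should not move.
far_idx = next(i for i in range(len(X)) if i not in clf.support_)
X2 = X.copy()
X2[far_idx] += np.array([-5, -5]) if y[far_idx] == 0 else np.array([5, 5])

clf2 = SVC(kernel="linear", C=1.0).fit(X2, y)
print("same coefficients:", np.allclose(clf.coef_, clf2.coef_))  # True
print("same intercept:  ", np.allclose(clf.intercept_, clf2.intercept_))
```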
Specificity
• The fact that the support vector classifier’s decision rule is based only
on a potentially small subset of the training observations (the support
vectors) means that it is quite robust to the behavior of observations
that lie far from the hyperplane.
• In contrast, recall that the LDA classification rule depends on the mean
of all the observations within each class, as well as on the within-class
covariance matrix computed using all the observations (a sketch contrasting
the two follows).
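A minimal sketch of the contrast (scikit-learn; the data and the moved point are my own illustration): moving a single far-away, correctly classified observation shifts LDA's class mean and therefore its fitted discriminant, whereas the previous sketch showed the support vector classifier is unaffected by such a move.

```python
# Sketch: LDA uses every observation, so one far-away point matters.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.array([0] * 20 + [1] * 20)

X_moved = X.copy()
X_moved[0] = [-30, -30]  # far from the boundary, still correctly classified

lda1 = LinearDiscriminantAnalysis().fit(X, y)
lda2 = LinearDiscriminantAnalysis().fit(X_moved, y)
print("class-0 mean before:", X[y == 0].mean(axis=0))
print("class-0 mean after: ", X_moved[y == 0].mean(axis=0))
print("LDA coefficients unchanged?",
      np.allclose(lda1.coef_, lda2.coef_))  # False
```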
• There are many possible ways to enlarge the feature space, and unless
we are careful, we could end up with a huge number of features (see the
sketch below).
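A minimal sketch of the blow-up (scikit-learn; the feature count and degrees are my own choices): counting how many features a polynomial expansion of 10 original features produces at each degree.

```python
# Sketch: polynomial expansion grows the feature space very quickly.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.zeros((1, 10))  # 10 original features
for degree in (2, 3, 4, 5):
    n_out = PolynomialFeatures(degree=degree).fit_transform(X).shape[1]
    print(f"degree {degree}: {n_out} features")
# degree 2: 66 features ... degree 5: 3003 features
```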
Kernels
• SVM algorithms use a family of mathematical functions known as kernels.
• A kernel takes data as input and transforms it into the form required by
the algorithm, quantifying the similarity of two observations.
• Some common kernels: linear, polynomial, radial
basis function (RBF), and sigmoid. Each nonlinear kernel results in a
different nonlinear classifier in the original input space (compared in
the sketch below).
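A minimal sketch comparing these kernels (scikit-learn; the dataset and settings are my own choices): the same nonlinearly separable data fit with each kernel.

```python
# Sketch: the common SVM kernels on the same nonlinear data.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel).fit(X_tr, y_tr)
    print(f"{kernel:8s} test accuracy: {clf.score(X_te, y_te):.3f}")
```

On data like this, the RBF kernel typically separates the classes best, while the linear kernel cannot capture the curved boundary.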
One-Versus-One Classification
• A one-versus-one or all-pairs approach constructs 𝑘(𝑘 − 1)/2 SVM
classifiers, each of which compares a pair of classes; a test observation
is assigned to the class that wins the most pairwise comparisons (see the
sketch below).
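A minimal sketch of one-versus-one classification (scikit-learn; the dataset is my own choice). scikit-learn's SVC trains the pairwise classifiers internally; OneVsOneClassifier makes the construction explicit.

```python
# Sketch: one-versus-one builds k(k-1)/2 pairwise classifiers.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # k = 3 classes
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)
print("number of pairwise classifiers:", len(ovo.estimators_))  # 3 = 3*2/2
```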
One-Versus-All Classification
• The one-versus-all (one-versus-rest) approach fits 𝑘 SVMs, each time
comparing one of the 𝑘 classes to the remaining 𝑘 − 1 classes; a test
observation is assigned to the class whose classifier reports the largest
decision value (a sketch follows).
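A minimal sketch of one-versus-all classification (scikit-learn; the dataset is my own choice): 𝑘 binary SVMs, one per class against the rest.

```python
# Sketch: one-versus-all builds k classifiers, one per class vs. the rest.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # k = 3 classes
ova = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)
print("number of classifiers:", len(ova.estimators_))  # 3, one per class
```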
• As γ increases and the fit becomes more non-linear, the ROC curves improve.
• Using γ = 10⁻¹ appears to give an almost perfect ROC curve. However, these
curves represent training error rates, which can be misleading in terms of
performance on new test data.
• This is once again evidence that while a more flexible method will often
produce a lower training error rate, that does not necessarily
lead to improved performance on test data.
• The SVMs with γ = 10⁻² and γ = 10⁻³ perform comparably to the
support vector classifier, and all three outperform the SVM with
γ = 10⁻¹ (a sketch of this train/test gap follows).
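A minimal sketch of this train/test gap (scikit-learn; synthetic data and γ values of my own choosing, not the slides' dataset): as γ grows, training accuracy climbs toward 1 while test accuracy stops improving or degrades.

```python
# Sketch: large RBF gamma drives training error down, not test error.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=10, n_informative=4,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for gamma in (1e-3, 1e-2, 1e-1, 1.0, 10.0):
    clf = SVC(kernel="rbf", gamma=gamma).fit(X_tr, y_tr)
    print(f"gamma={gamma:6g}  train acc={clf.score(X_tr, y_tr):.3f}"
          f"  test acc={clf.score(X_te, y_te):.3f}")
```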