
Relationship to Logistic Regression

[Figure: patient status after five years (Lost = 1.0, Survived = 0.0) plotted against the number of positive nodes, with the fitted logistic curve.]

$$y_\beta(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x + \varepsilon)}}$$
Support Vector Machines (SVM)

[Figure: the same patient-status plot, with a candidate decision boundary shifted across the data in successive frames.]

One placement of the boundary produces three misclassifications; shifting it reduces the errors to two, and shifting further yields no misclassifications. But several different positions all give no misclassifications, so which position is best? The SVM's answer: maximize the region between the classes.
Similarity Between Logistic Regression and SVM

[Figure: the same patient-status plot (Lost = 1.0, Survived = 0.0 vs. number of positive nodes); both methods produce a boundary separating the two classes.]
Classification with SVMs

[Figure: scatter of Age vs. Number of Malignant Nodes with the two classes marked.]

With two features (number of malignant nodes and age) and two labels (survived, lost), the goal is to find the line that best separates the classes. Many candidate lines do so; the SVM also requires the largest boundary possible around the chosen line.
Logistic Regression vs SVM Cost Functions

[Figure: the patient-status plot, followed by side-by-side plots over the range -3 to 3 of the logistic regression cost function and the SVM cost function for the "lost" class.]
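The slides plot these curves without formulas. For reference (standard definitions, not shown on the slides), with z the signed output of the linear model, the per-observation costs for the "lost" class are

$$\mathrm{cost}_{\text{logistic}}(z) = \log\left(1 + e^{-z}\right), \qquad \mathrm{cost}_{\text{SVM}}(z) = \max(0,\ 1 - z)$$

The logistic cost is smooth and never exactly zero; the SVM (hinge) cost is exactly zero for every point beyond the margin (z ≥ 1).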
The SVM Cost Function

[Figure: the SVM cost function for the "lost" class, plotted over the range -3 to 3; successive frames build up the curve.]
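A minimal sketch (not from the slides) evaluating both costs at a few values of z:

import numpy as np

def hinge_loss(z):
    # SVM cost for the positive class: max(0, 1 - z)
    return np.maximum(0.0, 1.0 - z)

def logistic_loss(z):
    # logistic regression cost for the positive class: log(1 + e^-z)
    return np.log1p(np.exp(-z))

z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(hinge_loss(z))     # [3. 2. 1. 0. 0.] -> exactly zero once z >= 1
print(logistic_loss(z))  # smooth and strictly positive everywhere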
Outlier Sensitivity in SVMs

[Figure: the Age vs. Number of Malignant Nodes scatter with a single outlier added; successive frames show the maximum-margin boundary reacting to it.]

Even with the outlier present, the original line is probably still the correct boundary.
Regularization in SVMs

$$J(\beta_i) = \mathrm{SVMCost}(\beta_i) + \frac{1}{C}\sum_i \beta_i^2$$

[Figure: the outlier scatter plot, comparing two candidate boundaries across successive frames.]

The boundary that best fits the training data, bending to accommodate the outlier, carries a large regularization term. The alternative boundary has a slightly higher SVMCost but a much smaller regularization term, so with an appropriate C it achieves the lower total cost J, and the outlier is effectively ignored.
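A minimal sketch of this trade-off on hypothetical toy data (the blobs, the outlier, and the C values are illustrative assumptions, not from the slides); recall that in scikit-learn a small C means strong regularization:

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Two well-separated blobs, plus one class-1 outlier inside class 0's region.
X0 = rng.normal(loc=[2, 2], scale=0.5, size=(20, 2))
X1 = rng.normal(loc=[6, 6], scale=0.5, size=(20, 2))
outlier = np.array([[2.5, 2.5]])
X = np.vstack([X0, X1, outlier])
y = np.array([0] * 20 + [1] * 20 + [1])

for C in (0.01, 100.0):
    clf = LinearSVC(C=C).fit(X, y)
    print(C, clf.coef_, clf.intercept_)
# Small C (large 1/C) keeps the coefficients small and the boundary between
# the blobs; large C tends to pull the boundary toward the single outlier.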
Interpretation of SVM Coefficients

$$J(\beta_i) = \mathrm{SVMCost}(\beta_i) + \frac{1}{C}\sum_i \beta_i^2$$

[Figure: the scatter plot with the fitted hyperplane; the coefficients β₁, β₂, β₃ form the vector orthogonal to the hyperplane.]
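In scikit-learn the fitted coefficients are exposed directly; a minimal sketch on made-up data (names and values are illustrative):

import numpy as np
from sklearn.svm import LinearSVC

# Toy data: two features, two classes.
X = np.array([[1, 2], [2, 1], [6, 5], [7, 8]], dtype=float)
y = np.array([0, 0, 1, 1])

clf = LinearSVC(penalty='l2', C=10.0).fit(X, y)

w = clf.coef_[0]       # the beta vector, orthogonal to the hyperplane
b = clf.intercept_[0]  # the hyperplane offset
scores = X @ w + b     # the sign of the score gives the predicted class
print(w, b, np.sign(scores))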
Linear SVM: The Syntax

Import the class containing the classification method.

from sklearn.svm import LinearSVC

Create an instance of the class. penalty and C are the regularization parameters.

LinSVC = LinearSVC(penalty='l2', C=10.0)

Fit the instance on the data and then predict the expected value.

LinSVC = LinSVC.fit(X_train, y_train)
y_predict = LinSVC.predict(X_test)

Tune regularization parameters with cross-validation.
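Putting the steps together, a minimal end-to-end sketch; the breast cancer dataset and the scaling step are illustrative choices, not specified by the slides:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Scaling matters for SVMs: features on very different scales distort the margin.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

LinSVC = LinearSVC(penalty='l2', C=10.0).fit(X_train, y_train)
y_predict = LinSVC.predict(X_test)
print(LinSVC.score(X_test, y_test))  # mean accuracy on the held-out data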
Classification with SVMs

[Figure: an Age vs. Number of Malignant Nodes scatter in which no straight line cleanly separates the classes.]
Non-Linear Decision Boundaries with SVM

Non-linear data can be made linearly separable by mapping it into a higher-dimensional space.
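A one-dimensional illustration of the idea (a hypothetical sketch, not from the slides): points with |x| > 1 form one class, so no single threshold on x separates the classes, but adding x² as a second dimension makes one line suffice.

import numpy as np

x = np.linspace(-2, 2, 9)
y = (np.abs(x) > 1).astype(int)  # class 1 sits on both tails: not separable in x

# Lift to 2D with the squared feature; the line x^2 = 1 now separates the classes.
X_lifted = np.column_stack([x, x ** 2])
print(np.all((X_lifted[:, 1] > 1) == (y == 1)))  # True: linearly separable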
The Kernel Trick
Transform data so it is linearly separable.

SVM Gaussian Kernel

Palme d'Or Winners at Cannes.

[Figure: films plotted by Budget vs. IMDB User Rating, with Palme d'Or winners marked.]

Approach 1: Create higher-order features to transform the data: Budget², Rating², Budget × Rating, ...

Approach 2: Transform the space to a different coordinate system. Define Feature 1 as similarity to "Pulp Fiction," Feature 2 as similarity to "Black Swan," and Feature 3 as similarity to "Transformers."
SVM Gaussian Kernel

Palme d'Or Winners at Cannes.

Create a Gaussian function at each feature:

$$a_1(x^{obs}) = \exp\left(\frac{-\sum_i \left(x_i^{obs} - x_i^{\text{Pulp Fiction}}\right)^2}{2\sigma^2}\right)$$

$$a_2(x^{obs}) = \exp\left(\frac{-\sum_i \left(x_i^{obs} - x_i^{\text{Black Swan}}\right)^2}{2\sigma^2}\right)$$

$$a_3(x^{obs}) = \exp\left(\frac{-\sum_i \left(x_i^{obs} - x_i^{\text{Transformers}}\right)^2}{2\sigma^2}\right)$$
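A minimal sketch of this construction; the (budget, rating) coordinates of the reference films and the sigma value are made-up numbers for illustration:

import numpy as np

def rbf_similarity(x_obs, landmark, sigma=1.0):
    # Gaussian similarity: exp(-sum((x - landmark)^2) / (2 * sigma^2))
    return np.exp(-np.sum((x_obs - landmark) ** 2) / (2 * sigma ** 2))

# Hypothetical (budget, IMDB rating) coordinates for the reference films.
pulp_fiction = np.array([8.0, 8.9])
black_swan = np.array([13.0, 8.0])
transformers = np.array([150.0, 7.0])

film = np.array([10.0, 8.5])  # a new film to transform
a1, a2, a3 = (rbf_similarity(film, lm, sigma=5.0)
              for lm in (pulp_fiction, black_swan, transformers))
print(a1, a2, a3)  # the film's coordinates in the new similarity space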
SVM Gaussian Kernel

Transformation: [x1, x2] → [0.7a1, 0.9a2, -0.6a3]

[Figure: each film's original coordinates (x1 = Budget, x2 = IMDB User Rating) are replaced by its similarities to the three reference films. One example film maps to a1 = 0.90, a2 = 0.92, a3 = 0.30; another maps to a1 = 0.50, a2 = 0.60, a3 = 0.70. The final frame re-plots the data on the axes a1 (Pulp Fiction), a2 (Black Swan), a3 (Transformers).]
Classification in the New Space

Transformation: [x1, x2] → [0.7a1, 0.9a2, -0.6a3]

[Figure: the films plotted in the (a1, a2, a3) similarity space, where a plane now separates the classes.]
SVM Gaussian Kernel

Palme d'Or Winners at Cannes.

[Figure: the resulting non-linear decision boundary drawn back in the original Budget vs. IMDB User Rating space.]

This similarity transformation is the Radial Basis Function (RBF) kernel.
SVMs with Kernels: The Syntax

Import the class containing the classification method.

from sklearn.svm import SVC

Create an instance of the class. kernel sets the kernel and gamma is its associated coefficient; C is the penalty associated with the error term.

rbfSVC = SVC(kernel='rbf', gamma=1.0, C=10.0)

Fit the instance on the data and then predict the expected value.

rbfSVC = rbfSVC.fit(X_train, y_train)
y_predict = rbfSVC.predict(X_test)

Tune the kernel and associated parameters with cross-validation.
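For the cross-validation step, a minimal sketch with GridSearchCV; the dataset and the grid values are illustrative assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
param_grid = {'svc__gamma': [0.01, 0.1, 1.0],
              'svc__C': [0.1, 1.0, 10.0]}

# 5-fold cross-validation over the gamma/C grid.
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)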
Feature Overload

Problem: SVMs with RBF kernels are very slow to train with lots of features or data.

Solution (see the sketch after the syntax below):
▪ Construct an approximate kernel map using Nystroem or the RBF sampler.
▪ Fit a linear classifier trained with SGD.
Faster Kernel Transformations: The Syntax

Import the class containing the transformation method.

from sklearn.kernel_approximation import Nystroem

Create an instance of the class. Multiple non-linear kernels can be used; kernel and gamma are identical to SVC, and n_components is the number of samples used to build the approximation.

nystroemSVC = Nystroem(kernel='rbf', gamma=1.0, n_components=100)

Fit the instance on the data and transform.

X_train = nystroemSVC.fit_transform(X_train)
X_test = nystroemSVC.transform(X_test)

Tune kernel parameters and components with cross-validation.
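Combining the transformation with a linear classifier trained by SGD, as the Feature Overload solution describes; the dataset and parameter values are illustrative assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Approximate the RBF kernel map, then fit a linear hinge-loss model with SGD.
nys = Nystroem(kernel='rbf', gamma=0.1, n_components=100)
X_train = nys.fit_transform(X_train)
X_test = nys.transform(X_test)

sgd = SGDClassifier(loss='hinge').fit(X_train, y_train)
print(sgd.score(X_test, y_test))

Swapping Nystroem for RBFSampler (below) works the same way, with RBF as the only available kernel.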
Faster Kernel Transformations: The Syntax

Import the class containing the transformation method.

from sklearn.kernel_approximation import RBFSampler

Create an instance of the class. RBF is the only kernel that can be used; the parameter names are identical to the previous class.

rbfSample = RBFSampler(gamma=1.0, n_components=100)

Fit the instance on the data and transform.

X_train = rbfSample.fit_transform(X_train)
X_test = rbfSample.transform(X_test)

Tune kernel parameters and components with cross-validation.
When to Use Logistic Regression vs SVC

Features                Data                  Model Choice
Many (~10K features)    Small (~1K rows)      Simple model: Logistic Regression or LinearSVC
Few (<100 features)     Medium (~10K rows)    SVC with RBF kernel
Few (<100 features)     Many (>100K rows)     Add features; Logistic Regression, LinearSVC, or kernel approximation
