Week7 - SVM and Kernels
Recall the logistic regression model:

$y_\beta(x) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 x + \varepsilon)}}$
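This logistic function can be evaluated directly; a minimal sketch in Python (the coefficient values below are illustrative, not from the slides):

```python
import numpy as np

def logistic(x, beta0=-2.0, beta1=0.5):
    """Logistic model: 1 / (1 + e^-(beta0 + beta1*x)). Coefficients are made up."""
    return 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))

# The output rises smoothly from 0 toward 1 as x grows.
print(logistic(0.0))   # ~0.119
print(logistic(4.0))   # exactly 0.5, since beta0 + beta1*4 = 0
```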
Support Vector Machines (SVM)

[Figure sequence: candidate separating lines on a two-class scatter plot, giving three misclassifications, then two, then none.]
Similarity Between Logistic Regression and SVM
Classification with SVMs

Two features (nodes, age) and two labels (survived, lost).

[Figure: scatter plot of Age (0-60) vs. Number of Malignant Nodes (0-20).]
Classification with SVMs

Find the line that best separates the classes.

[Figure sequence: several candidate separating lines on the Age vs. Number of Malignant Nodes scatter plot.]

Also, include the largest margin possible around that line.
Logistic Regression vs SVM Cost Functions

[Figure: patient status after five years (lost = 1.0, survived = 0.0) plotted against number of positive nodes, with the logistic regression cost function for the lost class shown over the range -3 to 3, then both cost curves shown side by side.]
The SVM Cost Function

[Figure sequence: the SVM cost function for the lost class over the range -3 to 3, shown alongside the patient-status data. Unlike the logistic cost, the SVM (hinge) cost is exactly zero beyond the margin and increases linearly inside it.]
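The contrast between the two cost functions can be checked numerically; a minimal sketch, where z is the signed margin for a "lost"-class example:

```python
import numpy as np

z = np.linspace(-3, 3, 7)           # signed margin values
log_cost = np.log(1 + np.exp(-z))   # logistic regression cost
hinge_cost = np.maximum(0, 1 - z)   # SVM (hinge) cost

for zi, lc, hc in zip(z, log_cost, hinge_cost):
    print(f"z={zi:+.0f}  logistic={lc:.3f}  hinge={hc:.3f}")
# Beyond z = 1 the hinge cost is exactly 0; the logistic cost never reaches 0.
```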
Outlier Sensitivity in SVMs

[Figure sequence: adding a single outlier to the Age vs. Number of Malignant Nodes data drags the maximum-margin line toward it.]

This is probably still the correct boundary.
Regularization in SVMs

$J(\beta) = \mathrm{SVMCost}(\beta) + \frac{1}{C} \sum_i \beta_i^2$

[Figure sequence: with a large C the penalty term barely matters, so the best-fit boundary contorts to accommodate the outlier. With a smaller C, SVMCost is slightly higher (the outlier is misclassified) but the penalty term is much smaller, and the boundary ignores the outlier.]
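The effect of C can be verified on synthetic data; a minimal sketch (scikit-learn's LinearSVC uses the same convention as the slide: larger C means weaker regularization, so the coefficient vector is allowed to grow):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Two well-separated synthetic clusters (made up for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([0, 0], 0.5, size=(20, 2)),
               rng.normal([4, 4], 0.5, size=(20, 2))])
y = np.array([0] * 20 + [1] * 20)

norms = {}
for C in (100.0, 0.01):
    clf = LinearSVC(C=C, max_iter=10000).fit(X, y)
    norms[C] = np.linalg.norm(clf.coef_)
    print(C, norms[C])
# Strong regularization (small C) shrinks the coefficients,
# which corresponds to a wider margin.
```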
Interpretation of SVM Coefficients

$J(\beta) = \mathrm{SVMCost}(\beta) + \frac{1}{C} \sum_i \beta_i^2$

[Figure: fitted boundaries on the Age vs. Number of Malignant Nodes plot, with the coefficients $\beta_1, \beta_2, \beta_3$ drawn as vectors orthogonal to their hyperplanes.]
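That the coefficient vector is the normal to the separating hyperplane can be read off a fitted model; a sketch on synthetic data whose true boundary is the line x1 + x2 = 1 (the data and model settings here are illustrative):

```python
import numpy as np
from sklearn.svm import LinearSVC

# Points in the unit square, split by the line x1 + x2 = 1.
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 2))
y = (X.sum(axis=1) > 1).astype(int)

clf = LinearSVC(C=10.0, max_iter=50000).fit(X, y)
w = clf.coef_[0]
# The hyperplane is w.x + b = 0, so w is its normal vector:
print(w / np.linalg.norm(w))  # approximately [0.71, 0.71], normal to x1 + x2 = 1
```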
Linear SVM: The Syntax

Import the class containing the classification method.

from sklearn.svm import LinearSVC

Create an instance of the class.

LinSVC = LinearSVC()

Fit the instance on the data and then predict the expected value.

LinSVC = LinSVC.fit(X_train, y_train)
y_predict = LinSVC.predict(X_test)
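The three steps can be run end to end on a real dataset; a minimal sketch using scikit-learn's bundled breast cancer data (chosen here just for illustration):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale the features so the linear solver converges cleanly.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

LinSVC = LinearSVC()                   # create an instance of the class
LinSVC = LinSVC.fit(X_train, y_train)  # fit returns the fitted instance
y_predict = LinSVC.predict(X_test)
print((y_predict == y_test).mean())    # held-out accuracy
```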
Classification with SVMs

[Figure: data on the Age vs. Number of Malignant Nodes plot that no straight line separates well.]

Non-Linear Decision Boundaries with SVM

Non-linear data can be made linearly separable by mapping it into a higher-dimensional space.
The Kernel Trick
Transform data so it is linearly separable.
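A one-dimensional sketch of the idea: no single threshold on x separates an inner class from an outer class, but adding x² as a second feature does (the data points are made up):

```python
import numpy as np

x = np.array([-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0])
y = (np.abs(x) > 1).astype(int)   # outer class vs inner class

# No threshold on x alone separates y = 0 from y = 1...
# ...but mapping x -> (x, x^2) makes the classes separable by the line x^2 = 1.
phi = np.column_stack([x, x ** 2])
print(all((phi[:, 1] > 1) == (y == 1)))  # True
```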
SVM Gaussian Kernel

Palme d’Or Winners at Cannes.

[Figure: films plotted by Budget and IMDB Rating; winners are not linearly separable from non-winners.]
Approach 1: Create higher-order features to transform the data: Budget², Rating², Budget × Rating, …

Approach 2: Transform the space to a different coordinate system.
Define Feature 1: similarity to “Pulp Fiction.”

Define Feature 2: similarity to “Black Swan.”

Define Feature 3: similarity to “Transformers.”
Create a Gaussian function centered at each landmark film:

$a_1(x^{obs}) = \exp\!\left(\frac{-\sum_i (x_i^{obs} - x_i^{Pulp\ Fiction})^2}{2\sigma^2}\right)$

$a_2(x^{obs}) = \exp\!\left(\frac{-\sum_i (x_i^{obs} - x_i^{Black\ Swan})^2}{2\sigma^2}\right)$

$a_3(x^{obs}) = \exp\!\left(\frac{-\sum_i (x_i^{obs} - x_i^{Transformers})^2}{2\sigma^2}\right)$
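These similarity features can be computed directly from the Gaussian formula; a minimal sketch with hypothetical (budget, rating) coordinates for the three landmark films:

```python
import numpy as np

def gaussian_similarity(x, landmark, sigma=1.0):
    """a(x) = exp(-sum_i (x_i - landmark_i)^2 / (2 sigma^2))"""
    x, landmark = np.asarray(x, float), np.asarray(landmark, float)
    return np.exp(-np.sum((x - landmark) ** 2) / (2 * sigma ** 2))

# Made-up (budget, rating) coordinates for the landmark films.
pulp_fiction, black_swan, transformers = (1.0, 9.0), (1.5, 8.5), (20.0, 6.0)

movie = (1.2, 8.8)  # a hypothetical low-budget, highly rated film
a = [gaussian_similarity(movie, lm, sigma=2.0)
     for lm in (pulp_fiction, black_swan, transformers)]
print(a)  # high similarity to the first two landmarks, near zero to the third
```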
SVM Gaussian Kernel

Transformation: [x1, x2] → [0.7a1, 0.9a2, −0.6a3]

[Figure: a film close to “Pulp Fiction” and “Black Swan” maps to a1 = 0.90, a2 = 0.92, a3 = 0.30; a film closer to “Transformers” maps to a1 = 0.50, a2 = 0.60, a3 = 0.70.]

Classification in the New Space

In the new coordinates a1 (Pulp Fiction), a2 (Black Swan), and a3 (Transformers), which replace x1 (Budget) and x2 (IMDB Rating), the two classes become linearly separable.
SVM Gaussian Kernel

Palme d’Or Winners at Cannes.

This similarity-based transformation is the Radial Basis Function (RBF) kernel.

[Figure: the resulting non-linear decision boundary drawn back in the original Budget vs. Rating space.]
SVMs with Kernels: The Syntax

Import the class containing the classification method.

from sklearn.svm import SVC

Create an instance of the class, specifying the kernel.

rbfSVC = SVC(kernel='rbf')

Fit the instance on the data and then predict the expected value.

rbfSVC = rbfSVC.fit(X_train, y_train)
y_predict = rbfSVC.predict(X_test)
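A full run on data no line can separate (concentric circles generated by scikit-learn, chosen for illustration):

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric rings: linearly inseparable in the original space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.08, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rbfSVC = SVC(kernel='rbf')             # Gaussian/RBF kernel
rbfSVC = rbfSVC.fit(X_train, y_train)
y_predict = rbfSVC.predict(X_test)
print((y_predict == y_test).mean())    # near-perfect on this non-linear data
```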
Feature Overload

Problem: SVMs with RBF kernels are very slow to train when there are many features or many data points.
Faster Kernel Transformations: The Syntax

Import the class containing the kernel approximation method.

from sklearn.kernel_approximation import Nystroem
Alternatively, import the RBF kernel sampler.

from sklearn.kernel_approximation import RBFSampler
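Either approximator maps the data into an explicit feature space where a fast linear model can take over; a minimal sketch combining Nystroem with LinearSVC on the circles data (dataset and parameter values are illustrative):

```python
from sklearn.datasets import make_circles
from sklearn.kernel_approximation import Nystroem
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_circles(n_samples=400, factor=0.3, noise=0.08, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Approximate the RBF kernel map with 100 sampled components...
nystroem = Nystroem(kernel='rbf', n_components=100, random_state=0)
X_train_t = nystroem.fit_transform(X_train)
X_test_t = nystroem.transform(X_test)

# ...then train a fast linear SVM in the transformed space.
clf = LinearSVC(max_iter=10000).fit(X_train_t, y_train)
print(clf.score(X_test_t, y_test))
```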
When to Use Logistic Regression vs SVC

Features            | Data                | Model Choice
--------------------|---------------------|----------------------------------------------
Few (<100 features) | Many (>100K points) | Add features; Logistic Regression, LinearSVC, or kernel approximation