
Relationship to Logistic Regression

[Figure: patient status after five years (Lost = 1.0, Survived = 0.0) plotted against the number of positive nodes, with the fitted logistic curve.]

$$y_\beta(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x + \varepsilon)}}$$
Support Vector Machines (SVM)

[Figure: the same patient-status plot, with a candidate decision boundary shifted across the data in successive frames.]

One placement of the boundary produces three misclassifications; shifting it reduces the errors to two, and shifting further yields no misclassifications. But several different positions all give no misclassifications, so which position is best? The SVM's answer: maximize the region between the classes.
Similarity Between Logistic Regression and SVM

[Figure: the same patient-status plot (Lost = 1.0, Survived = 0.0 vs. number of positive nodes); both methods produce a boundary separating the two classes.]
Classification with SVMs

[Figure: scatter of Age vs. Number of Malignant Nodes with the two classes marked.]

With two features (number of malignant nodes and age) and two labels (survived, lost), the goal is to find the line that best separates the classes. Many candidate lines do so; the SVM also requires the largest boundary possible around the chosen line.
Logistic Regression vs SVM Cost Functions

[Figure: the patient-status plot, followed by side-by-side plots over the range -3 to 3 of the logistic regression cost function and the SVM cost function for the "lost" class.]
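The slides plot these curves without formulas. For reference (standard definitions, not shown on the slides), with z the signed output of the linear model, the per-observation costs for the "lost" class are

$$\mathrm{cost}_{\text{logistic}}(z) = \log\left(1 + e^{-z}\right), \qquad \mathrm{cost}_{\text{SVM}}(z) = \max(0,\ 1 - z)$$

The logistic cost is smooth and never exactly zero; the SVM (hinge) cost is exactly zero for every point beyond the margin (z ≥ 1).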
The SVM Cost Function

[Figure: the SVM cost function for the "lost" class, plotted over the range -3 to 3; successive frames build up the curve.]
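A minimal sketch (not from the slides) evaluating both costs at a few values of z:

import numpy as np

def hinge_loss(z):
    # SVM cost for the positive class: max(0, 1 - z)
    return np.maximum(0.0, 1.0 - z)

def logistic_loss(z):
    # logistic regression cost for the positive class: log(1 + e^-z)
    return np.log1p(np.exp(-z))

z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(hinge_loss(z))     # [3. 2. 1. 0. 0.] -> exactly zero once z >= 1
print(logistic_loss(z))  # smooth and strictly positive everywhere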
Outlier Sensitivity in SVMs

[Figure: the Age vs. Number of Malignant Nodes scatter with a single outlier added; successive frames show the maximum-margin boundary reacting to it.]

Even with the outlier present, the original line is probably still the correct boundary.
Regularization in SVMs

$$J(\beta_i) = \mathrm{SVMCost}(\beta_i) + \frac{1}{C}\sum_i \beta_i^2$$

[Figure: the outlier scatter plot, comparing two candidate boundaries across successive frames.]

The boundary that best fits the training data, bending to accommodate the outlier, carries a large regularization term. The alternative boundary has a slightly higher SVMCost but a much smaller regularization term, so with an appropriate C it achieves the lower total cost J, and the outlier is effectively ignored.
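A minimal sketch of this trade-off on hypothetical toy data (the blobs, the outlier, and the C values are illustrative assumptions, not from the slides); recall that in scikit-learn a small C means strong regularization:

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# Two well-separated blobs, plus one class-1 outlier inside class 0's region.
X0 = rng.normal(loc=[2, 2], scale=0.5, size=(20, 2))
X1 = rng.normal(loc=[6, 6], scale=0.5, size=(20, 2))
outlier = np.array([[2.5, 2.5]])
X = np.vstack([X0, X1, outlier])
y = np.array([0] * 20 + [1] * 20 + [1])

for C in (0.01, 100.0):
    clf = LinearSVC(C=C).fit(X, y)
    print(C, clf.coef_, clf.intercept_)
# Small C (large 1/C) keeps the coefficients small and the boundary between
# the blobs; large C tends to pull the boundary toward the single outlier.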
Interpretation of SVM Coefficients

$$J(\beta_i) = \mathrm{SVMCost}(\beta_i) + \frac{1}{C}\sum_i \beta_i^2$$

[Figure: the scatter plot with the fitted hyperplane; the coefficients β₁, β₂, β₃ form the vector orthogonal to the hyperplane.]
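In scikit-learn the fitted coefficients are exposed directly; a minimal sketch on made-up data (names and values are illustrative):

import numpy as np
from sklearn.svm import LinearSVC

# Toy data: two features, two classes.
X = np.array([[1, 2], [2, 1], [6, 5], [7, 8]], dtype=float)
y = np.array([0, 0, 1, 1])

clf = LinearSVC(penalty='l2', C=10.0).fit(X, y)

w = clf.coef_[0]       # the beta vector, orthogonal to the hyperplane
b = clf.intercept_[0]  # the hyperplane offset
scores = X @ w + b     # the sign of the score gives the predicted class
print(w, b, np.sign(scores))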
Linear SVM: The Syntax

Import the class containing the classification method.

from sklearn.svm import LinearSVC

Create an instance of the class. penalty and C are the regularization parameters.

LinSVC = LinearSVC(penalty='l2', C=10.0)

Fit the instance on the data and then predict the expected value.

LinSVC = LinSVC.fit(X_train, y_train)
y_predict = LinSVC.predict(X_test)

Tune regularization parameters with cross-validation.
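Putting the steps together, a minimal end-to-end sketch; the breast cancer dataset and the scaling step are illustrative choices, not specified by the slides:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# Scaling matters for SVMs: features on very different scales distort the margin.
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

LinSVC = LinearSVC(penalty='l2', C=10.0).fit(X_train, y_train)
y_predict = LinSVC.predict(X_test)
print(LinSVC.score(X_test, y_test))  # mean accuracy on the held-out data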
Classification with SVMs

[Figure: an Age vs. Number of Malignant Nodes scatter in which no straight line cleanly separates the classes.]
Non-Linear Decision Boundaries with SVM

Non-linear data can be made linearly separable by mapping it into a higher-dimensional space.
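A one-dimensional illustration of the idea (a hypothetical sketch, not from the slides): points with |x| > 1 form one class, so no single threshold on x separates the classes, but adding x² as a second dimension makes one line suffice.

import numpy as np

x = np.linspace(-2, 2, 9)
y = (np.abs(x) > 1).astype(int)  # class 1 sits on both tails: not separable in x

# Lift to 2D with the squared feature; the line x^2 = 1 now separates the classes.
X_lifted = np.column_stack([x, x ** 2])
print(np.all((X_lifted[:, 1] > 1) == (y == 1)))  # True: linearly separable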
The Kernel Trick
Transform data so it is linearly separable.

SVM Gaussian Kernel

Palme d'Or Winners at Cannes.

[Figure: films plotted by Budget vs. IMDB User Rating, with Palme d'Or winners marked.]

Approach 1: Create higher-order features to transform the data: Budget², Rating², Budget × Rating, ...

Approach 2: Transform the space to a different coordinate system. Define Feature 1 as similarity to "Pulp Fiction," Feature 2 as similarity to "Black Swan," and Feature 3 as similarity to "Transformers."
SVM Gaussian Kernel

Palme d'Or Winners at Cannes.

Create a Gaussian function at each feature:

$$a_1(x^{obs}) = \exp\left(\frac{-\sum_i \left(x_i^{obs} - x_i^{\text{Pulp Fiction}}\right)^2}{2\sigma^2}\right)$$

$$a_2(x^{obs}) = \exp\left(\frac{-\sum_i \left(x_i^{obs} - x_i^{\text{Black Swan}}\right)^2}{2\sigma^2}\right)$$

$$a_3(x^{obs}) = \exp\left(\frac{-\sum_i \left(x_i^{obs} - x_i^{\text{Transformers}}\right)^2}{2\sigma^2}\right)$$
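A minimal sketch of this construction; the (budget, rating) coordinates of the reference films and the sigma value are made-up numbers for illustration:

import numpy as np

def rbf_similarity(x_obs, landmark, sigma=1.0):
    # Gaussian similarity: exp(-sum((x - landmark)^2) / (2 * sigma^2))
    return np.exp(-np.sum((x_obs - landmark) ** 2) / (2 * sigma ** 2))

# Hypothetical (budget, IMDB rating) coordinates for the reference films.
pulp_fiction = np.array([8.0, 8.9])
black_swan = np.array([13.0, 8.0])
transformers = np.array([150.0, 7.0])

film = np.array([10.0, 8.5])  # a new film to transform
a1, a2, a3 = (rbf_similarity(film, lm, sigma=5.0)
              for lm in (pulp_fiction, black_swan, transformers))
print(a1, a2, a3)  # the film's coordinates in the new similarity space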
SVM Gaussian Kernel

Transformation: [x1, x2] → [0.7a1, 0.9a2, -0.6a3]

[Figure: each film's original coordinates (x1 = Budget, x2 = IMDB User Rating) are replaced by its similarities to the three reference films. One example film maps to a1 = 0.90, a2 = 0.92, a3 = 0.30; another maps to a1 = 0.50, a2 = 0.60, a3 = 0.70. The final frame re-plots the data on the axes a1 (Pulp Fiction), a2 (Black Swan), a3 (Transformers).]
Classification in the New Space

Transformation: [x1, x2] → [0.7a1, 0.9a2, -0.6a3]

[Figure: the films plotted in the (a1, a2, a3) similarity space, where a plane now separates the classes.]
SVM Gaussian Kernel

Palme d'Or Winners at Cannes.

[Figure: the resulting non-linear decision boundary drawn back in the original Budget vs. IMDB User Rating space.]

This similarity transformation is the Radial Basis Function (RBF) kernel.
SVMs with Kernels: The Syntax

Import the class containing the classification method.

from sklearn.svm import SVC

Create an instance of the class. kernel sets the kernel and gamma is its associated coefficient; C is the penalty associated with the error term.

rbfSVC = SVC(kernel='rbf', gamma=1.0, C=10.0)

Fit the instance on the data and then predict the expected value.

rbfSVC = rbfSVC.fit(X_train, y_train)
y_predict = rbfSVC.predict(X_test)

Tune the kernel and associated parameters with cross-validation.
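For the cross-validation step, a minimal sketch with GridSearchCV; the dataset and the grid values are illustrative assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

pipe = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
param_grid = {'svc__gamma': [0.01, 0.1, 1.0],
              'svc__C': [0.1, 1.0, 10.0]}

# 5-fold cross-validation over the gamma/C grid.
search = GridSearchCV(pipe, param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)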
Feature Overload

Problem: SVMs with RBF kernels are very slow to train with lots of features or data.

Solution (see the sketch after the syntax below):
▪ Construct an approximate kernel map using Nystroem or the RBF sampler.
▪ Fit a linear classifier trained with SGD.
Faster Kernel Transformations: The Syntax

Import the class containing the transformation method.

from sklearn.kernel_approximation import Nystroem

Create an instance of the class. Multiple non-linear kernels can be used; kernel and gamma are identical to SVC, and n_components is the number of samples used to build the approximation.

nystroemSVC = Nystroem(kernel='rbf', gamma=1.0, n_components=100)

Fit the instance on the data and transform.

X_train = nystroemSVC.fit_transform(X_train)
X_test = nystroemSVC.transform(X_test)

Tune kernel parameters and components with cross-validation.
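Combining the transformation with a linear classifier trained by SGD, as the Feature Overload solution describes; the dataset and parameter values are illustrative assumptions:

from sklearn.datasets import load_breast_cancer
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Approximate the RBF kernel map, then fit a linear hinge-loss model with SGD.
nys = Nystroem(kernel='rbf', gamma=0.1, n_components=100)
X_train = nys.fit_transform(X_train)
X_test = nys.transform(X_test)

sgd = SGDClassifier(loss='hinge').fit(X_train, y_train)
print(sgd.score(X_test, y_test))

Swapping Nystroem for RBFSampler (below) works the same way, with RBF as the only available kernel.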
Faster Kernel Transformations: The Syntax

Import the class containing the transformation method.

from sklearn.kernel_approximation import RBFSampler

Create an instance of the class. RBF is the only kernel that can be used; the parameter names are identical to the previous class.

rbfSample = RBFSampler(gamma=1.0, n_components=100)

Fit the instance on the data and transform.

X_train = rbfSample.fit_transform(X_train)
X_test = rbfSample.transform(X_test)

Tune kernel parameters and components with cross-validation.
When to Use Logistic Regression vs SVC

Features                Data                  Model Choice
Many (~10K features)    Small (~1K rows)      Simple model: Logistic Regression or LinearSVC
Few (<100 features)     Medium (~10K rows)    SVC with RBF kernel
Few (<100 features)     Many (>100K rows)     Add features; Logistic Regression, LinearSVC, or kernel approximation
