Outline
• Linear Discriminant Function
• Large Margin Linear Classifier
[Figure: families of discriminant functions: Nearest Neighbor, Linear Functions, Nonlinear Functions]
Linear Discriminant Function
f(x) = w^T x + b
• For a K-NN classifier it was necessary to 'carry' the training data.
• For a linear classifier, the training data is used to learn w and b and is then discarded.
• Only w and b are needed for classifying new data.
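To make this concrete, here is a minimal sketch in Python with NumPy (the weight values are made-up placeholders, not learned ones) of how prediction needs only w and b:

import numpy as np

# Hypothetical weights: in practice w and b come from training,
# after which the training data can be discarded.
w = np.array([0.8, -0.4])
b = 0.1

def predict(x):
    """Classify x as +1 or -1 from the sign of f(x) = w^T x + b."""
    return 1 if w @ x + b >= 0 else -1

print(predict(np.array([1.0, 0.5])))   # lands on the +1 side
print(predict(np.array([-1.0, 2.0])))  # lands on the -1 side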
Linear Discriminant Function
• How would you classify these points using a linear discriminant function in order to minimize the error rate?
[Figure, repeated over the next few slides: points of class +1 and class −1 in the (x1, x2) plane, each slide drawing a different candidate separating line]
Linear Discriminant Function
f(x) = w^T x + b
• (Unit-length) normal vector of the hyperplane: n = w / ||w||
[Figure: the hyperplane w^T x + b = 0 in the (x1, x2) plane, separating the region w^T x + b > 0 from the region w^T x + b < 0]
Large Margin Linear Classifier
• Given a set of data points {(x_i, y_i)}, i = 1, ..., n, with y_i ∈ {+1, −1}, scale w and b so that the closest points on each side satisfy w^T x+ + b = 1 and w^T x− + b = −1.
• Distance between the origin and the line w^T x = k is k / ||w||.
• The margin width is therefore m = (x+ − x−) · w/||w|| = 2 / ||w||.
[Figure: the margin of width m between the lines w^T x + b = ±1; the points x+ and x− lying on those lines are the support vectors]
Support Vector Machine
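As a quick numeric check of the two formulas above (a standalone sketch with made-up numbers, not part of the original slides):

import numpy as np

w = np.array([3.0, 4.0])            # example weight vector, ||w|| = 5
k = 10.0

# Closest point to the origin on the line w^T x = k, and its distance.
x_star = k * w / (w @ w)
print(np.linalg.norm(x_star))       # 2.0, which equals k / ||w||
print(k / np.linalg.norm(w))        # 2.0

# Width of the margin between w^T x + b = +1 and w^T x + b = -1.
print(2 / np.linalg.norm(w))        # 0.4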
Large Margin Linear Classifier
• Formulation:
maximize 2 / ||w||
such that
for y_i = +1, w^T x_i + b ≥ 1
for y_i = −1, w^T x_i + b ≤ −1
Large Margin Linear Classifier
• Formulation (equivalent, since maximizing 2/||w|| is the same as minimizing (1/2)||w||^2):
minimize (1/2)||w||^2
such that
for y_i = +1, w^T x_i + b ≥ 1
for y_i = −1, w^T x_i + b ≤ −1
Large Margin Linear Classifier
• Formulation (the two constraints fold into one):
minimize (1/2)||w||^2
such that
y_i (w^T x_i + b) ≥ 1
Solving the Optimization Problem
Quadratic programming with linear constraints:
minimize (1/2)||w||^2
s.t. y_i (w^T x_i + b) ≥ 1
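As an illustration of handing this primal directly to a generic solver (not from the slides): the sketch below uses scipy.optimize.minimize with the SLSQP method on a tiny, made-up, linearly separable data set.

import numpy as np
from scipy.optimize import minimize

# Made-up separable toy data.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Decision variables z = [w1, w2, b]; objective is (1/2)||w||^2.
objective = lambda z: 0.5 * (z[0] ** 2 + z[1] ** 2)

# One linear constraint per point: y_i (w^T x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq", "fun": lambda z, i=i: y[i] * (X[i] @ z[:2] + z[2]) - 1.0}
    for i in range(len(y))
]

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b, "margin =", 2 / np.linalg.norm(w))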
Solving the Optimization Problem
Lagrangian function (one multiplier α_i per constraint):
minimize L_p(w, b, α_i) = (1/2)||w||^2 − Σ_{i=1}^{n} α_i (y_i (w^T x_i + b) − 1)
s.t. α_i ≥ 0
Solving the Optimization Problem
Setting the partial derivatives of L_p to zero gives:
∂L_p/∂w = 0  ⇒  w = Σ_{i=1}^{n} α_i y_i x_i
∂L_p/∂b = 0  ⇒  Σ_{i=1}^{n} α_i y_i = 0
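Both conditions can be checked against a fitted linear SVM; a sketch assuming scikit-learn (data made up). In sklearn's SVC, dual_coef_ stores exactly the products α_i y_i for the support vectors:

import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # very large C approximates a hard margin

w_from_alphas = clf.dual_coef_ @ clf.support_vectors_
print(np.allclose(w_from_alphas, clf.coef_))    # True: w = sum_i alpha_i y_i x_i
print(np.isclose(clf.dual_coef_.sum(), 0.0))    # True: sum_i alpha_i y_i = 0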
Solving the Optimization Problem
The problem above is the primal formulation. Substituting the KKT conditions (the stationarity results for w and b) into L_p eliminates w and b and yields the Lagrangian dual problem: a dual formulation in terms of the α_i alone.
Solving the Optimization Problem
• From the KKT conditions we know that α_i (y_i (w^T x_i + b) − 1) = 0 for all i, so α_i ≠ 0 only for points on the margin, i.e. the support vectors (SV). The discriminant function is therefore:
g(x) = w^T x + b = Σ_{i∈SV} α_i y_i x_i^T x + b
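Continuing the scikit-learn sketch (same assumptions), g(x) can be evaluated from the support vectors alone and compared with the library's own decision values:

import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# g(x) = sum over support vectors of alpha_i y_i x_i^T x, plus b.
x_new = np.array([1.0, 0.5])
g = clf.dual_coef_ @ clf.support_vectors_ @ x_new + clf.intercept_
print(np.allclose(g, clf.decision_function([x_new])))   # True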
Large Margin Linear Classifier
• Slack variables ξ_i can be added to allow misclassification of difficult or noisy data points.
[Figure: two points violating the margin, with slack distances ξ_1 and ξ_2]
Large Margin Linear Classifier
• Formulation (soft margin):
minimize (1/2)||w||^2 + C Σ_{i=1}^{n} ξ_i
such that
y_i (w^T x_i + b) ≥ 1 − ξ_i
ξ_i ≥ 0
• The corresponding dual problem:
maximize Σ_{i=1}^{n} α_i − (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} α_i α_j y_i y_j x_i^T x_j
such that
0 ≤ α_i ≤ C
Σ_{i=1}^{n} α_i y_i = 0
• A small value for C will increase the number of training errors, while a large C will lead to behavior similar to that of a hard-margin SVM.
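This box-constrained dual is itself a standard QP. A minimal sketch (assuming the cvxopt package; data made up) that maps it onto cvxopt's form min (1/2) a^T P a + q^T a subject to G a ≤ h, A a = b:

import numpy as np
from cvxopt import matrix, solvers

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
n, C = len(y), 10.0

# Maximizing the dual equals minimizing (1/2) a^T P a - 1^T a,
# where P_ij = y_i y_j x_i^T x_j (a tiny ridge keeps the solver stable).
P = matrix(np.outer(y, y) * (X @ X.T) + 1e-8 * np.eye(n))
q = matrix(-np.ones(n))
G = matrix(np.vstack([-np.eye(n), np.eye(n)]))     # 0 <= alpha_i <= C
h = matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
A = matrix(y.reshape(1, -1))                       # sum_i alpha_i y_i = 0
b = matrix(0.0)

solvers.options["show_progress"] = False
alpha = np.array(solvers.qp(P, q, G, h, A, b)["x"]).ravel()
w = (alpha * y) @ X                                # w = sum_i alpha_i y_i x_i
print("alpha =", alpha.round(4), "w =", w)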
• The parameter C controls the trade-off between errors of the SVM on the training data and margin maximization (C = ∞ leads to the hard-margin SVM).
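To see this trade-off empirically, a short sketch (scikit-learn assumed, noisy made-up data) comparing a small and a large C:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping blobs, so some training error is unavoidable.
X = np.vstack([rng.normal(-1, 1.2, (50, 2)), rng.normal(1, 1.2, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_)
    errors = int((clf.predict(X) != y).sum())
    print(f"C={C}: margin width {margin:.2f}, training errors {errors}")

Smaller C tolerates margin violations in exchange for a wider margin; larger C pushes the training error down and approaches hard-margin behavior.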