srihari@buffalo.edu
Machine Learning Srihari
1. Importance of SVMs
• SVM is a discriminative method that
brings together:
1. computational learning theory
2. previously known methods in linear
discriminant functions
3. optimization theory
• Widely used for solving problems in
classification, regression and novelty
detection
Distance from a point to the plane

• Lemma: The distance from x to the plane g(x) = w^t x + w0 = 0 is
    r = g(x) / ||w||
  Proof: Let x = xp + r (w / ||w||), where xp is the projection of x onto the
  plane (so g(xp) = 0) and r is the distance from x to the plane. Then
    g(x) = w^t (xp + r w/||w||) + w0
         = w^t xp + w0 + r (w^t w)/||w||
         = g(xp) + r ||w|| = r ||w||
  so r = g(x) / ||w||. QED
• Corollary: The distance of the origin to the plane is
    r = g(0)/||w|| = w0/||w||
  since g(0) = w^t 0 + w0 = w0.
  Thus w0 = 0 implies that the plane passes through the origin.
  (Figure: the plane g(x) = 0 with x = xp + r w/||w||; the plane separates the
  regions g(x) > 0 and g(x) < 0.)
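The lemma can be checked numerically. The following is a minimal sketch with an arbitrary illustrative plane and point (w, w0, and x below are assumptions, not values from the text):

```python
import numpy as np

# Sketch of the lemma: distance from x to the plane g(x) = w.x + w0 = 0
# is r = g(x) / ||w||.  The plane and point are illustrative choices.
w = np.array([3.0, 4.0])   # normal vector, ||w|| = 5
w0 = -10.0
x = np.array([6.0, 8.0])

g = w @ x + w0                       # g(x) = 18 + 32 - 10 = 40
r = g / np.linalg.norm(w)            # signed distance = 40 / 5 = 8

# Check against the proof: xp = x - r * w/||w|| must lie on the plane.
x_p = x - r * w / np.linalg.norm(w)
print(r)                             # 8.0
print(w @ x_p + w0)                  # ~0, so xp is on the plane
```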
Choosing a margin

• Augmented space: g(y) = a^t y, obtained by choosing a0 = w0 and y0 = 1,
  i.e., the plane passes through the origin of the augmented space
• A margin b is enforced by requiring every pattern to satisfy
    z_k g(y_k) / ||a|| ≥ b
(Figure: the margin hyperplanes g(y) = +1 and g(y) = -1, with support vectors
y1, y2.)
• The optimal hyperplane is orthogonal to the shortest line connecting the
  convex hulls of the two classes, which has length 2/||a||
Optimization problem

• Fixing the scale so that b ||a|| = 1, maximizing the margin b is equivalent
  to minimizing ||a||^2 subject to the constraints
    z_k a^t y_k ≥ 1, k = 1, ..., n
• Introducing Lagrange undetermined multipliers, we seek to minimize L(a, α)
  with respect to the weight vector a and maximize it with respect to
  α_k ≥ 0, which leads to the conditions
    ∑_{k=1}^n α_k z_k = 0,   α_k ≥ 0, k = 1, ..., n
• The solution is given by a = ∑_k α_k z_k y_k
• Implementation:
  • Solved using quadratic programming
  • Alternatively, since it only needs inner products of the training data, it
    can be implemented using kernel functions, which is a crucial property for
    generalizing to the non-linear case
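As a hedged sketch of this optimization (not the text's own implementation): the dual can be maximized by projected gradient ascent on a tiny toy dataset. Here the bias is folded into the augmented vectors y_k = (1, x_k), so only α_k ≥ 0 needs enforcing; the data, step size, and iteration count are illustrative assumptions.

```python
import numpy as np

# Toy linearly separable 2-D data, augmented with y0 = 1.
y = np.array([[1.0,  1.0,  1.0],      # augmented patterns y_k = (1, x_k)
              [1.0,  2.0,  2.0],
              [1.0, -1.0, -1.0],
              [1.0, -2.0, -2.0]])
z = np.array([1.0, 1.0, -1.0, -1.0])  # class labels z_k

G = (z[:, None] * z[None, :]) * (y @ y.T)   # G_kj = z_k z_j y_k . y_j
alpha = np.zeros(len(z))
for _ in range(20000):
    grad = 1.0 - G @ alpha                        # gradient of the dual objective
    alpha = np.maximum(0.0, alpha + 0.01 * grad)  # project back onto alpha_k >= 0

a = (alpha * z) @ y                     # the solution a = sum_k alpha_k z_k y_k
margins = z * (y @ a)                   # z_k a^t y_k, should all be >= 1
print(a.round(3))                       # ~ [0, 0.5, 0.5]
print(margins.round(3))                 # support vectors hit exactly 1
```

Only the Gram matrix y @ y.T enters the loop, which is what makes the kernel substitution in the following slides possible.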
• Example kernel: K(x, y) = ϕ(x)^t ϕ(y), where
    ϕ(x) = (x1^2, x2^2, √2 x1, √2 x2, √2 x1x2, 1)^t
  so that K(x, y) = (x^t y + 1)^2 can be evaluated without forming ϕ
  explicitly
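The kernel identity for this mapping can be verified numerically; this is a small sketch with arbitrary random sample points:

```python
import numpy as np

# Check phi(x).phi(y) = (x.y + 1)^2 for the quadratic mapping above;
# the sample points are arbitrary.
def phi(x):
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2)*x1, np.sqrt(2)*x2,
                     np.sqrt(2)*x1*x2, 1.0])

rng = np.random.default_rng(0)
x, y = rng.normal(size=2), rng.normal(size=2)
lhs = phi(x) @ phi(y)        # inner product in the 6-D feature space
rhs = (x @ y + 1.0) ** 2     # kernel evaluated in the 2-D input space
print(abs(lhs - rhs) < 1e-9) # True: only the inner product x.y is needed
```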
Non-Linear Case

• Map the input with non-linear functions Φ(·) to a sufficiently high
  dimension
  • so that data from the two categories can always be separated by a
    hyperplane
• Assume each pattern xk has been transformed to
    yk = Φ(xk), for k = 1, ..., n
• First choose the non-linear Φ functions
  • to map the input vector to a higher dimensional feature space
• Dimensionality of the feature space can be arbitrarily high, limited only
  by computational resources
(Figure: hyperbolas corresponding to x1 x2 = +1.)
• The dual problem: maximize
    L(α) = ∑_{k=1}^n α_k − (1/2) ∑_{k=1}^n ∑_{j=1}^n α_k α_j z_k z_j y_j^t y_k
  subject to α_k ≥ 0
VC Dimension of Hyperplanes in R^2

• The VC dimension measures the complexity of a class of decision functions
• Let f(n, d) be the fraction of the dichotomies of n points in d dimensions
  that are linearly separable
• At n = 2(d+1), called the capacity of the hyperplane, exactly one half of
  the dichotomies are still linearly separable
• For d = 2 the capacity is reached at n = 2(d+1) = 6, i.e., f(6, 2) = 0.5;
  some dichotomies of six points are non-separable
• VC Dimension = d + 1 = 3
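The fraction f(n, d) can be computed from Cover's counting function: of the 2^n dichotomies of n points in general position in d dimensions, C(n, d) = 2 ∑_{k=0}^{d} binom(n-1, k) are linearly separable. A short sketch:

```python
from math import comb

# Cover's counting function for linearly separable dichotomies of n points
# in general position in d dimensions; f(n, d) is the separable fraction.
def f(n, d):
    c = 2 * sum(comb(n - 1, k) for k in range(min(d, n - 1) + 1))
    return c / 2**n

print(f(6, 2))   # 0.5: capacity n = 2(d+1) = 6 in d = 2 dimensions
print(f(3, 2))   # 1.0: any 3 points can be shattered (VC dimension d+1 = 3)
```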