
Support Vector Machine

Compiled by Koushika B, COE17B044


Guided by
Dr Umarani Jayaraman

Department of Computer Science and Engineering


Indian Institute of Information Technology Design and Manufacturing
Kancheepuram

May 5, 2022

SVM

w^T X_i + w_0 > 0

w·X_i + w_0 > 0   if X_i ∈ ω_1
w·X_i + w_0 < 0   if X_i ∈ ω_2

g(X) = w·X_i + w_0 ≥ b   for good generalization

The distance r of a point X from the hyperplane H, where g(X) is the
discriminant, is r = g(X)/||w||, which is nothing but (w·X_i + w_0)/||w||.
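As a quick illustration (a minimal sketch; the hyperplane parameters and the point are hypothetical values, not from the slides), this distance can be computed directly:

```python
import numpy as np

# Hypothetical hyperplane parameters, chosen only for illustration
w = np.array([2.0, 1.0])    # weight vector w
w0 = -3.0                   # bias term w_0

x = np.array([4.0, 5.0])    # a sample point X

g = np.dot(w, x) + w0            # discriminant g(X) = w.X + w_0
r = g / np.linalg.norm(w)        # signed distance of X from the hyperplane H
print(g, r)                      # positive sign => X lies on the omega_1 side
```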

So, we must ensure that

(w·X_i + w_0)/||w|| ≥ b   ... (1)

(w·X_i + w_0) ≥ b·||w||

Scaling w and w_0 so that b·||w|| = 1 gives:

w·X_i + w_0 ≥ +1   if X_i ∈ ω_1
w·X_i + w_0 ≤ −1   if X_i ∈ ω_2   ... (2)

From (2), we can establish a uniform criterion as follows. Let y_i be the
class label, whose value is ±1:
+1 for class ω_1
−1 for class ω_2

y_i(w·X_i + w_0) ≥ 1, and the equality holds at the support vectors:

y_i(w·X_i + w_0) = 1   if X_i is a support vector
y_i(w·X_i + w_0) > 1   if X_i is not a support vector
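This criterion is easy to check numerically (a minimal sketch; the toy data, w, and w_0 below are hand-picked for illustration only):

```python
import numpy as np

# Toy separable data; values are hand-picked for illustration only
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])      # +1 for omega_1, -1 for omega_2

w = np.array([0.25, 0.25])          # assumed hyperplane parameters
w0 = 0.0

margins = y * (X @ w + w0)          # y_i (w.X_i + w_0) for every sample
print(margins)                      # [1.  1.5 1.  1. ] -- equals 1 at support vectors
```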

From (1), in order to maximize the margin b, ||w|| has to be minimized and,
at the same time, w_0 has to be maximized.
For the minimization of ||w||, consider the other constraint:

y_i(w·X_i + w_0) = 1

Because this is a constrained optimization problem, it can be converted into
an unconstrained problem by using Lagrange multipliers.

Minimization of ||w|| is the same as minimization of φ(w), a function of w:

φ(w) = w^T·w
φ(w) = ½ (w·w)   (dot product)

The factor ½ is introduced for mathematical convenience.

This is subject to the constraints

y_i(w·X_i + w_0) = 1   if X_i are support vectors
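As a one-line sketch of this objective (the toy weight vector is the assumed one from the earlier example):

```python
import numpy as np

def phi(w):
    """Objective phi(w) = 1/2 (w . w); the 1/2 cancels when differentiating."""
    return 0.5 * np.dot(w, w)

print(phi(np.array([0.25, 0.25])))   # 0.0625 for the toy weight vector above
```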

It can be written as an unconstrained optimization problem as follows:

L(||w||, w_0) = ½||w||² − Σ_{i=1}^{n} λ_i [y_i(w·X_i + w_0) − 1]

minimize ||w||.
½||w||² is the objective function.
[y_i(w·X_i + w_0) − 1] is the constraint.

We can define the Lagrangian of the form:

L(w, w_0) = ½(w·w) − Σ_i α_i [y_i(w·X_i + w_0) − 1]

minimize ||w||; maximize w_0.
α_i is the Lagrange multiplier.

Take the derivative of L(w, w_0) with respect to w and w_0.

L(w, w_0) = ½(w·w) − Σ_i α_i [y_i(w·X_i + w_0) − 1]
L(w, w_0) = ½(w·w) − Σ_i α_i y_i (w·X_i) − Σ_i α_i y_i w_0 + Σ_i α_i

∂L/∂w_0 = −Σ_i α_i y_i = 0

Σ_{i=1}^{n} α_i y_i = 0   ⇐ one of the constraints

n = number of samples during training

L(w, w_0) = ½(w·w) − Σ_i α_i y_i (w·X_i) − Σ_i α_i y_i w_0 + Σ_i α_i

∂L/∂w = w − Σ_i α_i y_i X_i = 0

w = Σ_{i=1}^{n} α_i y_i X_i

Substitute these values back into the Lagrangian:

L(w, w_0) = ½(w·w) − Σ_i α_i y_i (w·X_i) − Σ_i α_i y_i w_0 + Σ_i α_i

where w·X_i = Σ_j α_j y_j (X_j·X_i) and Σ_i α_i y_i w_0 = 0.
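Both the constraint and the expansion of w are one line each in code (a sketch continuing the toy data; the α values are assumed, not computed by the slides):

```python
import numpy as np

# Continuing the toy data; the alpha values are assumed for illustration
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
alpha = np.array([0.0625, 0.0, 0.0625, 0.0])   # hypothetical multipliers

print(np.dot(alpha, y))     # constraint: sum_i alpha_i y_i = 0
w = (alpha * y) @ X         # w = sum_i alpha_i y_i X_i
print(w)                    # [0.25 0.25], the weight vector used earlier
```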

= ½ Σ_i Σ_j α_i α_j y_i y_j (X_i·X_j) − Σ_i Σ_j α_i α_j y_i y_j (X_i·X_j) + Σ_i α_i

L(α) = Σ_{i=1}^{n} α_i − ½ Σ_i Σ_j α_i α_j y_i y_j (X_i·X_j)

We have to maximize this over different values of α_i.
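A direct sketch of this dual objective (the data and multipliers are the hypothetical ones from before):

```python
import numpy as np

def dual_objective(alpha, X, y):
    """L(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i alpha_j y_i y_j (X_i.X_j)."""
    G = (y[:, None] * X) @ (y[:, None] * X).T    # G_ij = y_i y_j (X_i . X_j)
    return alpha.sum() - 0.5 * alpha @ G @ alpha

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
alpha = np.array([0.0625, 0.0, 0.0625, 0.0])     # hypothetical multipliers
print(dual_objective(alpha, X, y))               # value to maximize over alpha
```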

The Lagrange multipliers α_i are always non-negative:

α_i ≥ 0
Σ_{i=1}^{n} α_i y_i = 0

When we maximize this Lagrangian, it is quite likely that some of the
multipliers are zero while a few are very high.
If α_i = 0, the corresponding training vector X_i is not a support vector.
If α_i ≠ 0, the corresponding training vector X_i has a high influence on the
position of the hyperplane (it is a support vector).
Only the vectors with α_i ≠ 0 go into the decision making (see the sketch below).
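Picking out those vectors is a simple filter (a sketch; the multipliers are the assumed toy values):

```python
import numpy as np

alpha = np.array([0.0625, 0.0, 0.0625, 0.0])   # hypothetical multipliers
tol = 1e-8                                     # numerical tolerance for "zero"

sv_idx = np.where(alpha > tol)[0]   # indices where alpha_i != 0
print(sv_idx)                       # only these X_i enter the decision rule
```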

For an unknown feature vector Z:

g(Z) = w·Z + w_0
g(Z) = sign(Σ_{i=1}^{n} α_i y_i (X_i·Z) + w_0)

If the sign is +ve, Z ∈ ω_1.
If the sign is −ve, Z ∈ ω_2.
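This decision rule translates directly (a sketch reusing the toy data; Z is an arbitrary test point):

```python
import numpy as np

def classify(Z, X, y, alpha, w0):
    """Decision rule: sign(sum_i alpha_i y_i (X_i . Z) + w_0)."""
    return np.sign(np.dot(alpha * y, X @ Z) + w0)

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1.0, +1.0, -1.0, -1.0])
alpha = np.array([0.0625, 0.0, 0.0625, 0.0])

Z = np.array([1.0, 2.0])                 # unknown feature vector Z
print(classify(Z, X, y, alpha, w0=0.0))  # +1.0 => omega_1, -1.0 => omega_2
```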

The steps of SVM design to estimate w and w_0:

w = Σ_{i=1}^{n} α_i y_i X_i

and

w_0 = −½ [min_{∀y_j=+1} Σ_i α_i y_i (X_i·X_j) + max_{∀y_j=−1} Σ_i α_i y_i (X_i·X_j)]

w_0 = −½ [min_{∀y_i=+1} w·X_i + max_{∀y_i=−1} w·X_i]

Substitute these w and w_0 into the classification rule to classify the
unknown feature vector Z:

g(Z) = w·Z + w_0
g(Z) = sign(Σ_{i=1}^{n} α_i y_i (X_i·Z) + w_0)
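As an end-to-end sanity check (a sketch using scikit-learn, not part of the original slides; the large C value approximates the hard-margin case assumed here), one can train a linear SVM, read off the multipliers, and rebuild w and w_0:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([+1, +1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)    # very large C ~ hard margin

# dual_coef_ stores alpha_i * y_i for the support vectors only
w = clf.dual_coef_ @ clf.support_vectors_      # w = sum_i alpha_i y_i X_i
w0 = clf.intercept_

Z = np.array([[1.0, 2.0]])                     # unknown feature vector Z
print(np.sign(Z @ w.T + w0), clf.predict(Z))   # both report the class of Z
```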

But in implementation, for a support vector X_i from class ω_1 (y_i = +1):

w^T X_i + w_0 = 1
w·X_i + w_0 = 1

w_0 = 1 − (w·X_i)
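In code, this is a single subtraction (a sketch using the assumed toy values from earlier):

```python
import numpy as np

w = np.array([0.25, 0.25])     # weight vector from the earlier sketch
x_sv = np.array([2.0, 2.0])    # a support vector from class omega_1

w0 = 1.0 - np.dot(w, x_sv)     # w_0 = 1 - (w . X_i) for a positive support vector
print(w0)                      # 0.0, consistent with the toy example
```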

Additional slides

Consider the optimization problem: maximize f(x, y) subject to

g(x, y) = 0   or   g(x, y) = c

L(x, y, λ) = f(x, y) − λ·g(x, y)

f(x, y) is the objective function.
g(x, y) is the constraint.

Assume both f and g have continuous first partial derivatives.
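To make the technique concrete (a worked toy example, not from the slides: maximize f(x, y) = xy subject to x + y = 1):

```python
import sympy as sp

# Worked toy example: maximize f(x, y) = x*y subject to g(x, y) = x + y - 1 = 0
x, y, lam = sp.symbols('x y lambda')
f = x * y
g = x + y - 1

L = f - lam * g                        # Lagrangian L = f - lambda * g
stationary = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(stationary)                      # x = y = 1/2, so the maximum is f = 1/4
```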

Additional slides

Here:

f(x, y) = f(||w||, w_0) ⇒ objective function
g(x, y): y_i(w·X_i + w_0) = 1 ⇒ constraint

We can define the Lagrangian of the form:

L(||w||, w_0) = ½||w||² − Σ_{i=1}^{n} λ_i [y_i(w·X_i + w_0) − 1]

minimize ||w||; maximize w_0.
½||w||² is the objective function.
[y_i(w·X_i + w_0) − 1] is the constraint.

THANK YOU
