
Example 2 (Illustration of training a nonlinear SVM)

Consider the XOR problem:


Input vector              Desired output
x_1 = (−1, −1)            y_1 = −1
x_2 = (−1, +1)            y_2 = +1
x_3 = (+1, −1)            y_3 = +1
x_4 = (+1, +1)            y_4 = −1

Figure 13: The XOR problem; squares show the negative examples and circles show the positive examples

As shown in the figure above, there is no linear function that can separate the four training
points. We therefore use the kernel function

k(\vec{x}_i, \vec{x}_j) = (\vec{x}_i \cdot \vec{x}_j + 1)^2        (a)

where \vec{x}_i = (x_{i1}, x_{i2}) and \vec{x}_j = (x_{j1}, x_{j2}).
Expanding the square, the kernel function can be expressed as

k(\vec{x}_i, \vec{x}_j) = 1 + x_{i1}^2 x_{j1}^2 + 2 x_{i1} x_{j1} x_{i2} x_{j2} + x_{i2}^2 x_{j2}^2 + 2 x_{i1} x_{j1} + 2 x_{i2} x_{j2}        (b)

The image of the input vector \vec{x}_i induced in the feature space is therefore deduced to be

\Phi(\vec{x}_i) = \left(1,\; x_{i1}^2,\; \sqrt{2}\,x_{i1} x_{i2},\; x_{i2}^2,\; \sqrt{2}\,x_{i1},\; \sqrt{2}\,x_{i2}\right)^T        (c)

so that k(\vec{x}_i, \vec{x}_j) = \Phi(\vec{x}_i) \cdot \Phi(\vec{x}_j).
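As a quick numerical check (not part of the original notes), the following Python sketch verifies that the feature map (c) reproduces the kernel (a) on the four XOR points; all names are my own:

```python
import numpy as np

# Illustrative check that the feature map (c) reproduces the kernel (a)
# on the four XOR points.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)

def kernel(xi, xj):
    # Equation (a): k(xi, xj) = (xi . xj + 1)^2
    return (xi @ xj + 1) ** 2

def phi(x):
    # Equation (c): image of x in the 6-dimensional feature space
    x1, x2 = x
    return np.array([1, x1**2, np.sqrt(2)*x1*x2, x2**2,
                     np.sqrt(2)*x1, np.sqrt(2)*x2])

for xi in X:
    for xj in X:
        assert np.isclose(kernel(xi, xj), phi(xi) @ phi(xj))
print("k(xi, xj) = phi(xi) . phi(xj) holds for all pairs")
```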

The dual form of the maximization problem is

W(\alpha) = \sum_{i=1}^{l} \alpha_i - \frac{1}{2} \sum_{i=1}^{l} \sum_{j=1}^{l} \alpha_i \alpha_j y_i y_j \, k(\vec{x}_i, \vec{x}_j)        (d)

Here, all four points are support vectors. Substituting the kernel function (a) into equation (d), with l = 4, we have

W(\alpha) = \sum_{i=1}^{4} \alpha_i - \frac{1}{2} \sum_{i=1}^{4} \sum_{j=1}^{4} \alpha_i \alpha_j y_i y_j (1 + \vec{x}_i \cdot \vec{x}_j)^2        (e)

= \alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 - \frac{1}{2}\left[ \alpha_1 \alpha_1 y_1 y_1 (1 + \vec{x}_1 \cdot \vec{x}_1)^2 + \alpha_1 \alpha_2 y_1 y_2 (1 + \vec{x}_1 \cdot \vec{x}_2)^2 + \alpha_1 \alpha_3 y_1 y_3 (1 + \vec{x}_1 \cdot \vec{x}_3)^2 + \cdots + \alpha_4 \alpha_4 y_4 y_4 (1 + \vec{x}_4 \cdot \vec{x}_4)^2 \right]        (f)
\vec{x}_1 \cdot \vec{x}_1 = \begin{pmatrix} -1 & -1 \end{pmatrix} \begin{pmatrix} -1 \\ -1 \end{pmatrix} = 2

\vec{x}_1 \cdot \vec{x}_2 = \begin{pmatrix} -1 & -1 \end{pmatrix} \begin{pmatrix} -1 \\ +1 \end{pmatrix} = 0

Similarly,

\vec{x}_1 \cdot \vec{x}_3 = 0    \vec{x}_1 \cdot \vec{x}_4 = -2    \vec{x}_2 \cdot \vec{x}_2 = 2    \vec{x}_2 \cdot \vec{x}_3 = -2
\vec{x}_2 \cdot \vec{x}_4 = 0    \vec{x}_3 \cdot \vec{x}_3 = 2    \vec{x}_3 \cdot \vec{x}_4 = 0    \vec{x}_4 \cdot \vec{x}_4 = 2
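These dot products, and the resulting kernel (Gram) matrix whose entries (1 + \vec{x}_i \cdot \vec{x}_j)^2 appear in equation (e), can be recomputed in a few lines; an illustrative sketch:

```python
import numpy as np

# Recompute the pairwise dot products above and the kernel matrix of (e).
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)

dots = X @ X.T            # matrix of xi . xj values
K = (1 + dots) ** 2       # polynomial kernel matrix

print(dots)  # diagonal entries are 2; off-diagonals are 0 or -2
print(K)     # diagonal entries are 9; off-diagonals are all 1
```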
Substituting these values into equation (f), we get

W(\alpha) = \alpha_1 + \alpha_2 + \alpha_3 + \alpha_4 - \frac{1}{2}\left[ 9\alpha_1^2 - 2\alpha_1\alpha_2 - 2\alpha_1\alpha_3 + 2\alpha_1\alpha_4 + 9\alpha_2^2 + 2\alpha_2\alpha_3 - 2\alpha_2\alpha_4 + 9\alpha_3^2 - 2\alpha_3\alpha_4 + 9\alpha_4^2 \right]        (g)
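Since this expansion is easy to get wrong by hand, here is a short symbolic check of equation (g); sympy is my choice, not the source's:

```python
import sympy as sp

# Symbolically expand equation (f) to confirm the quadratic form in (g).
a = sp.symbols('a1:5')                      # alpha_1 .. alpha_4
X = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
y = [-1, 1, 1, -1]

dot = lambda u, v: u[0]*v[0] + u[1]*v[1]
quad = sum(a[i]*a[j]*y[i]*y[j]*(1 + dot(X[i], X[j]))**2
           for i in range(4) for j in range(4))
print(sp.expand(quad))
# 9*a1**2 - 2*a1*a2 - 2*a1*a3 + 2*a1*a4 + 9*a2**2 + 2*a2*a3
#   - 2*a2*a4 + 9*a3**2 - 2*a3*a4 + 9*a4**2
```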
Optimizing W(\alpha) with respect to the Lagrange multipliers (setting each partial derivative of W to zero) yields the following set of simultaneous equations:
w.r.t. \alpha_1:   1 - \frac{1}{2}\left(18\alpha_1 - 2\alpha_2 - 2\alpha_3 + 2\alpha_4\right) = 0,  which simplifies to

9\alpha_1 - \alpha_2 - \alpha_3 + \alpha_4 = 1        (h)

w.r.t. \alpha_2:   -\alpha_1 + 9\alpha_2 + \alpha_3 - \alpha_4 = 1        (i)

w.r.t. \alpha_3:   -\alpha_1 + \alpha_2 + 9\alpha_3 - \alpha_4 = 1        (j)

w.r.t. \alpha_4:   \alpha_1 - \alpha_2 - \alpha_3 + 9\alpha_4 = 1        (k)
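Equations (h)–(k) form the linear system Q\alpha = 1 with Q_{ij} = y_i y_j (1 + \vec{x}_i \cdot \vec{x}_j)^2. As a sanity check, a minimal numerical solution (illustrative only; variable names are my own):

```python
import numpy as np

# Solve the simultaneous equations (h)-(k) numerically.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1], dtype=float)

K = (1 + X @ X.T) ** 2
Q = np.outer(y, y) * K          # rows give the left-hand sides of (h)-(k)
alpha = np.linalg.solve(Q, np.ones(4))
print(alpha)                    # [0.125 0.125 0.125 0.125], i.e. all 1/8
```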


Subtracting equation (k) from equation (h), we get
\alpha_1 = \alpha_4        (l)
Subtracting equation (j) from equation (i), we get
\alpha_2 = \alpha_3        (m)

Substituting \alpha_3 = \alpha_2 and \alpha_4 = \alpha_1 into equation (h),

10\alpha_1 - 2\alpha_2 = 1        (n)
Adding equations (h) and (i) and dividing by 2, we have
4\alpha_1 + 4\alpha_2 = 1        (o)
Multiplying equation (n) by 2 and adding it to (o) gives 24\alpha_1 = 3, so
\alpha_1 = \frac{1}{8} = \alpha_4        (p)
Putting the value of \alpha_1 into equation (o) and solving for \alpha_2,
\alpha_2 = \frac{1}{8} = \alpha_3        (q)
Substituting these values of \alpha_i into equation (g), the optimum value is

W(\alpha) = \frac{1}{4}

Since W(\alpha) = \frac{1}{2}\lVert\vec{w}\rVert^2 at the optimum, this gives \lVert\vec{w}\rVert^2 = \frac{1}{2}, i.e. \lVert\vec{w}\rVert = \frac{1}{\sqrt{2}}.
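A small numerical confirmation of these optimum values, evaluating the dual objective at \alpha_i = 1/8 (a sketch under the definitions above, not from the source):

```python
import numpy as np

# Evaluate the dual objective (e) at alpha_i = 1/8 and recover ||w||^2.
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1], dtype=float)
alpha = np.full(4, 1/8)

Q = np.outer(y, y) * (1 + X @ X.T) ** 2
W = alpha.sum() - 0.5 * alpha @ Q @ alpha
w_norm_sq = alpha @ Q @ alpha    # ||w||^2 = sum_ij a_i a_j y_i y_j K_ij

print(W, w_norm_sq)              # 0.25 and 0.5, matching W = 1/4, ||w||^2 = 1/2
```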
Now, to find the optimal hyperplane we need

\vec{w} = \sum_{i=1}^{l} \alpha_i y_i \Phi(\vec{x}_i)

= \frac{1}{8}\left[\, y_1 \Phi(\vec{x}_1) + y_2 \Phi(\vec{x}_2) + y_3 \Phi(\vec{x}_3) + y_4 \Phi(\vec{x}_4) \,\right]

= \frac{1}{8}\left[ -\begin{pmatrix} 1 \\ 1 \\ \sqrt{2} \\ 1 \\ -\sqrt{2} \\ -\sqrt{2} \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ -\sqrt{2} \\ 1 \\ -\sqrt{2} \\ \sqrt{2} \end{pmatrix} + \begin{pmatrix} 1 \\ 1 \\ -\sqrt{2} \\ 1 \\ \sqrt{2} \\ -\sqrt{2} \end{pmatrix} - \begin{pmatrix} 1 \\ 1 \\ \sqrt{2} \\ 1 \\ \sqrt{2} \\ \sqrt{2} \end{pmatrix} \right] = \frac{1}{8}\begin{pmatrix} 0 \\ 0 \\ -4\sqrt{2} \\ 0 \\ 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ -1/\sqrt{2} \\ 0 \\ 0 \\ 0 \end{pmatrix}

The bias b is 0, because the first element of \vec{w} is 0. The optimal hyperplane becomes

\vec{w} \cdot \Phi(\vec{x}) + b = 0

Substituting the values of \vec{w} and b,

\begin{pmatrix} 0 & 0 & -\frac{1}{\sqrt{2}} & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ x_1^2 \\ \sqrt{2}\,x_1 x_2 \\ x_2^2 \\ \sqrt{2}\,x_1 \\ \sqrt{2}\,x_2 \end{pmatrix} = 0
which reduces to the optimal hyperplane

-x_1 x_2 = 0
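To close the loop, a symbolic sketch that rebuilds \vec{w} from the support vectors and simplifies the decision surface back to -x_1 x_2 = 0 (sympy is assumed here; the original derives this by hand):

```python
import sympy as sp

# Rebuild w = sum_i alpha_i y_i phi(xi) and simplify the decision surface.
def phi(x1, x2):
    return sp.Matrix([1, x1**2, sp.sqrt(2)*x1*x2, x2**2,
                      sp.sqrt(2)*x1, sp.sqrt(2)*x2])

pts = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
ys = [-1, 1, 1, -1]

w = sum((sp.Rational(1, 8) * yi * phi(*p) for p, yi in zip(pts, ys)),
        sp.zeros(6, 1))
print(w.T)   # (0, 0, -sqrt(2)/2, 0, 0, 0), i.e. -1/sqrt(2) in the third slot

x1, x2 = sp.symbols('x1 x2')
print(sp.simplify(w.dot(phi(x1, x2))))   # -x1*x2: the hyperplane -x1*x2 = 0
```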
