Sequential Minimal Optimization Method to Solve the Support
Vector Machine Problem
Enzhi Li
July 16, 2018
I Introduction
In a previous article, I gave a detailed derivation of the support vector machine (SVM) algorithm.
In that article, I also tried to solve the SVM problem using the interior point method. The
disadvantage of the interior point method is that it requires the storage and manipulation
of a dense square matrix whose dimension equals the number of data points. In real-world
applications, the number of data points can be in the millions, and the mere storage of this matrix
would cause an out-of-memory problem, let alone its manipulation. Here, in this article, I am
going to solve the SVM problem using a fast and efficient algorithm developed by John Platt in his
1998 paper. This method, called sequential minimal optimization (SMO), reduces
the original huge optimization problem to a series of small and simple subproblems, so simple that they
can be solved exactly without resorting to any unwieldy numerical libraries. Here,
I am going to describe the SMO algorithm in detail, implement this algorithm, and present
experimental results on synthetic two-dimensional data.
The organization of this article is as follows. In section II, I am going to give a brief summary
of the SVM algorithm. Section III focuses on the SMO algorithm, and section IV presents the
program implementation of the algorithm, together with the experimental results.
II Support vector machine (SVM)
SVM is used to classify data points of the form $(x_i, y_i)$, $i = 1, 2, \ldots, N$. Here, $x_i \in \mathbb{R}^p$ is the
feature vector and $y_i \in \{-1, 1\}$ is the data label. For a linearly separable data set, we can always find
a separating hyperplane $\beta^T x + \beta_0 = 0$. With this separating hyperplane,
the points that lie on one side of the plane satisfy $\beta^T x + \beta_0 > 0$, whereas the points that lie on
the other side satisfy $\beta^T x + \beta_0 < 0$. With a proper rescaling of $\beta$ and $\beta_0$, we can make the positively
labeled points satisfy $\beta^T x + \beta_0 \ge 1$, and the negatively labeled points satisfy
$\beta^T x + \beta_0 \le -1$. For any such data set $(x_i, y_i)$, there is more than one set of parameters $\beta, \beta_0$ that
satisfies these conditions. However, there is one set of parameters for which the distance between the
two planes $\beta^T x + \beta_0 = \pm 1$ is largest. The hyperplane corresponding to this set of parameters is called
the optimal separating hyperplane. Training an SVM on the data is just the determination of
this set of optimal parameters.
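To make these conditions concrete, here is a minimal NumPy sketch of the decision rule and the rescaled margin condition; the function names, the toy points, and the hand-picked hyperplane are illustrative assumptions and are not taken from the programs discussed later.

```python
import numpy as np

def predict(X, beta, beta0):
    """Classify points by the side of the hyperplane beta^T x + beta0 = 0."""
    return np.sign(X @ beta + beta0)

def satisfies_margin(X, y, beta, beta0):
    """Check the rescaled conditions y_i (beta^T x_i + beta0) >= 1 for all i."""
    return bool(np.all(y * (X @ beta + beta0) >= 1.0))

# Toy usage with a hand-picked hyperplane and two obviously separated points.
X = np.array([[2.0, 2.0], [-2.0, -2.0]])
y = np.array([1.0, -1.0])
beta, beta0 = np.array([0.5, 0.5]), 0.0
print(predict(X, beta, beta0))              # [ 1. -1.]
print(satisfies_margin(X, y, beta, beta0))  # True
```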
The determination of the optimal parameters can be converted to this constrained optimization
problem:
$$
\min_{\beta,\,\beta_0}\ \frac{1}{2}\|\beta\|^2 \tag{1}
$$
$$
\text{s.t.}\quad -y_i(\beta^T x_i + \beta_0) \le -1,\quad i = 1, 2, \ldots, N
$$
Maximizing the distance between the two planes $\beta^T x + \beta_0 = \pm 1$, which is $2/\|\beta\|$, is equivalent to minimizing $\frac{1}{2}\|\beta\|^2$.
This problem can be further transformed to its Lagrangian dual form, which is,
$$
\max_{\alpha}\ \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i y_i \alpha_j y_j\, x_i^T x_j \tag{2}
$$
$$
\text{s.t.}\quad \alpha_i \ge 0,\ \forall i, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0
$$
This Lagrangian dual form is easier to solve than the original problem: it is a convex quadratic
optimization problem with linear constraints. A fast and efficient algorithm for solving it is sequential
minimal optimization. Next, I am going to present the derivation of this algorithm.
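As a concrete illustration, the dual objective and its constraints can be written down directly in NumPy. The sketch below is illustrative only; the function names are my own, and the box constraint $0 \le \alpha_i \le C$ anticipates the soft-margin variant used later in this article.

```python
import numpy as np

def dual_objective(alpha, X, y):
    """g(alpha) = sum_i alpha_i - 1/2 sum_ij alpha_i y_i alpha_j y_j x_i^T x_j."""
    K = X @ X.T                 # Gram matrix of inner products x_i^T x_j
    ay = alpha * y
    return alpha.sum() - 0.5 * ay @ K @ ay

def is_dual_feasible(alpha, y, C, tol=1e-8):
    """Check 0 <= alpha_i <= C and sum_i alpha_i y_i = 0 up to a tolerance."""
    return (bool(np.all(alpha >= -tol)) and bool(np.all(alpha <= C + tol))
            and abs(alpha @ y) <= tol)
```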
III Sequential minimal optimization (SMO)
As can be seen from the previous section, the SVM problem can finally be converted to a quadratic optimization
problem with linear constraints. We know that a quadratic problem can be solved exactly by
hand when the number of variables is not too large. SMO takes advantage of this and decomposes
the original quadratic problem into a series of small and manageable mini quadratic problems. The
original problem has $N$ variables, $\alpha_i$, $i = 1, 2, \ldots, N$. SMO considers two of these variables at a time. The
reason that SMO considers only two of the $\alpha$'s is that the original problem has the linear constraint
$\sum_{i=1}^{N} \alpha_i y_i = 0$. As a result, we must vary at least two of the $\alpha$'s simultaneously for the
new $\alpha$'s to still satisfy this constraint. In SMO, we simply choose the smallest possible number of
variation variables, that is, two.
As shown in the previous section, our target function for the optimization problem is
$$
g(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i y_i \alpha_j y_j\, x_i^T x_j \tag{3}
$$
SMO decomposes this quadratic function into two parts: one part depends on the two variation variables,
which we denote as $\alpha_1, \alpha_2$, and the other part is independent of them. It is convenient to define both
parts with an overall sign flip relative to $g$, so that maximizing $g$ over $\alpha_1, \alpha_2$ corresponds to minimizing
the part that depends on the variation variables, which is
$$
\begin{aligned}
f(\alpha_1, \alpha_2) := g^{(v)} &= \frac{1}{2}\alpha_1^2 y_1^2\, x_1^T x_1 + \frac{1}{2}\alpha_2^2 y_2^2\, x_2^T x_2 + \alpha_1 \alpha_2 y_1 y_2\, x_1^T x_2 \\
&\quad + \alpha_1 y_1 x_1^T \sum_{i=3}^{N} \alpha_i y_i x_i + \alpha_2 y_2 x_2^T \sum_{i=3}^{N} \alpha_i y_i x_i - \alpha_1 - \alpha_2 \tag{4}
\end{aligned}
$$
The part that is independent of $\alpha_1, \alpha_2$, with the same sign flip, is
$$
g^{(c)} = \frac{1}{2}\sum_{i,j=3}^{N} \alpha_i \alpha_j y_i y_j\, x_i^T x_j - \sum_{i=3}^{N} \alpha_i = \text{const} \tag{5}
$$
so that $-g(\alpha) = f(\alpha_1, \alpha_2) + g^{(c)}$.
Thus, the original quadratic optimization problem can be rewritten as
$$
\min_{\alpha_1, \alpha_2}\ f(\alpha_1, \alpha_2) + \text{const} \tag{6}
$$
$$
\text{s.t.}\quad 0 \le \alpha_i \le C,\ i = 1, 2; \qquad \alpha_1 y_1 + \alpha_2 y_2 = \gamma = -\sum_{i=3}^{N} \alpha_i y_i
$$
Here, I have used the soft-margin SVM, and thus all the $\alpha_i$, $i = 1, 2, \ldots, N$ are restricted to the
range $[0, C]$. From the equation $\alpha_1 y_1 + \alpha_2 y_2 = \gamma$, we can express $\alpha_1$ in terms of $\alpha_2$, and vice versa. That is,
$$
\alpha_1 = y_1\gamma - y_1 y_2 \alpha_2, \qquad \alpha_2 = y_2\gamma - y_1 y_2 \alpha_1 \tag{7}
$$
Define $s = y_1 y_2$. Note that $y_i \in \{-1, 1\}$, and thus $y_i^2 = 1$ and $s^2 = 1$. I can therefore represent $\alpha_1$ in terms of $\alpha_2$
as $\alpha_1 = y_1\gamma - s\alpha_2$. Plugging this into $f(\alpha_1, \alpha_2)$, I get
$$
\begin{aligned}
\tilde{f}(\alpha_2) := f(\gamma y_1 - s\alpha_2, \alpha_2) &= \frac{1}{2}\left(\gamma^2 + \alpha_2^2 - 2\alpha_2\gamma y_2\right) x_1^T x_1 + \frac{1}{2}\alpha_2^2\, x_2^T x_2 \\
&\quad + \left(\gamma y_2 \alpha_2 - \alpha_2^2\right) x_1^T x_2 + \left(\gamma x_1^T + \alpha_2 y_2 (x_2^T - x_1^T)\right) \sum_{i=3}^{N} \alpha_i y_i x_i \\
&\quad - \gamma y_1 + (s - 1)\alpha_2 \tag{8}
\end{aligned}
$$
This is a quadratic function of $\alpha_2$, and it can be rewritten in the standard form
$$
\begin{aligned}
\tilde{f}(\alpha_2) &= \frac{1}{2}\left(x_1^T x_1 + x_2^T x_2 - 2 x_1^T x_2\right)\alpha_2^2 \\
&\quad + \left(-\gamma y_2\, x_1^T x_1 + \gamma y_2\, x_1^T x_2 + y_2 (x_2^T - x_1^T) \sum_{i=3}^{N} \alpha_i y_i x_i + s - 1\right)\alpha_2 \\
&\quad + \frac{1}{2}\gamma^2\, x_1^T x_1 + \gamma x_1^T \sum_{i=3}^{N} \alpha_i y_i x_i - \gamma y_1 \tag{9}
\end{aligned}
$$
It is easy to see that the coefficient of the quadratic term is non-negative, since
$$
x_1^T x_1 + x_2^T x_2 - 2 x_1^T x_2 = (x_1 - x_2)^T (x_1 - x_2) \ge 0 \tag{10}
$$
Therefore, except for the degenerate case $x_1 = x_2$, the target function $\tilde{f}(\alpha_2)$ has a strictly positive quadratic
coefficient and hence a finite minimum. Disregarding the constraint $\alpha_i \in [0, C]$ for the moment, I can minimize the target function at
$$
\begin{aligned}
\alpha_2^{*} &= \frac{\gamma y_2\, x_1^T x_1 - \gamma y_2\, x_1^T x_2 - y_2 (x_2^T - x_1^T) \sum_{i=3}^{N} \alpha_i y_i x_i - s + 1}{\|x_1 - x_2\|^2} \\
&= \frac{\alpha_2 \|x_1 - x_2\|^2 + y_2 (x_1 - x_2)^T \beta - s + 1}{\|x_1 - x_2\|^2} \\
&= \alpha_2 + \frac{y_2\, \beta^T (x_1 - x_2) - s + 1}{\|x_1 - x_2\|^2} \\
&= \alpha_2 + \frac{y_2 (\beta^T x_1 - y_1) - y_2 (\beta^T x_2 - y_2)}{\|x_1 - x_2\|^2} \\
&= \alpha_2 + y_2\, \frac{E_1 - E_2}{\|x_1 - x_2\|^2}, \tag{11}
\end{aligned}
$$
where $\beta = \sum_{i=1}^{N} \alpha_i y_i x_i$ is the weight vector and $E_1 = \beta^T x_1 + \beta_0 - y_1$, $E_2 = \beta^T x_2 + \beta_0 - y_2$ are the prediction errors. The right-hand side
of the equation depends only on the old $\alpha_2$, and the left-hand side is the new $\alpha_2$.
Thus, Equation (11) can be used to update the variation parameter $\alpha_2$. Note that I also need to
take the constraint conditions into consideration. The constraints on $\alpha_1, \alpha_2$ are
$$
0 \le \alpha_1, \alpha_2 \le C, \qquad \alpha_1 y_1 + \alpha_2 y_2 = \gamma. \tag{12}
$$
In order to determine the feasible range of $\alpha_1, \alpha_2$, I need to consider two exclusive cases.

Case I: $y_1 y_2 = -1$. Now the constraint is $\alpha_1 - \alpha_2 = k = \gamma y_1$. With some elementary algebra,
the feasible ranges are determined as
$$
\max(k, 0) \le \alpha_1 \le C + \min(k, 0), \qquad \max(-k, 0) \le \alpha_2 \le C + \min(-k, 0), \qquad -C \le k \le C \tag{13}
$$
Case II: $y_1 y_2 = 1$. Now the constraint is $\alpha_1 + \alpha_2 = k = \gamma y_1$. With some algebra, the feasible
range is found to be
$$
\max(0, k - C) \le \alpha_1 \le \min(C, k), \qquad \max(0, k - C) \le \alpha_2 \le \min(C, k), \qquad 0 \le k \le 2C \tag{14}
$$
When the updated $\alpha_2^{*}$ lies within the feasible range, I set the updated $\alpha_2^{\text{new}} = \alpha_2^{*}$; if $\alpha_2^{*}$ is
smaller than the permitted lower bound of $\alpha_2$, then $\alpha_2^{\text{new}}$ is set to that lower bound; and if $\alpha_2^{*}$ is larger than the
permitted upper bound of $\alpha_2$, then $\alpha_2^{\text{new}}$ is set to that upper bound. Once we know the updated $\alpha_2^{\text{new}}$, we can
determine the updated $\alpha_1^{\text{new}}$ as $\alpha_1^{\text{new}} = \gamma y_1 - s\alpha_2^{\text{new}}$. If $\alpha_2^{\text{new}}$ is calculated according to the
rules listed above, then it is guaranteed that $\alpha_1^{\text{new}}$ also lies within the feasible range.
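Putting Equation (11) and the clipping rule together, a single pair update might look like the following sketch. This is not the author's implementation: the function name, the argument list, and the convention that $\beta$ and $\beta_0$ are kept up to date by the caller are assumptions; the bounds $L, H$ are those of Equations (13)-(14), e.g. from the helper sketched above.

```python
import numpy as np

def update_pair(x1, x2, y1, y2, a1_old, a2_old, beta, beta0, L, H):
    """Return the updated (alpha_1, alpha_2) for the chosen pair of multipliers."""
    E1 = beta @ x1 + beta0 - y1        # prediction error on point 1
    E2 = beta @ x2 + beta0 - y2        # prediction error on point 2
    eta = np.dot(x1 - x2, x1 - x2)     # ||x_1 - x_2||^2, the denominator in Eq. (11)
    if eta <= 0.0:                     # degenerate case x_1 = x_2: skip this pair
        return a1_old, a2_old
    a2_star = a2_old + y2 * (E1 - E2) / eta    # unconstrained minimizer, Eq. (11)
    a2_new = min(max(a2_star, L), H)           # clip to the feasible range
    gamma = a1_old * y1 + a2_old * y2          # preserved by the update
    a1_new = gamma * y1 - y1 * y2 * a2_new     # alpha_1 = y_1*gamma - s*alpha_2
    return a1_new, a2_new
```

In a complete solver one would also refresh $\beta$ and $\beta_0$ after each accepted update and keep sweeping over pairs until the $\alpha$'s stop changing; the programs linked below presumably handle this bookkeeping.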
IV Program implementation and experimental results
I have written two programs to implement this algorithm, one in Python and the other in Scala. They
both produce consistent results. The Python version of the program can be found here, and the
Scala version can be found here. I have tested the programs on synthetic two-dimensional
data, and the results are shown in Fig. IV.1.
Figure IV.1: Experimental results of my SVM program. The points are linearly separated.
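For completeness, here is one way synthetic two-dimensional, linearly separable data of this kind could be generated with NumPy; the cluster centers, spread, and sample size are arbitrary choices and not necessarily those used for the figure.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X_pos = rng.normal(loc=[3.0, 1.0], scale=0.5, size=(n, 2))   # +1 labeled cluster
X_neg = rng.normal(loc=[0.0, -1.0], scale=0.5, size=(n, 2))  # -1 labeled cluster
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n), -np.ones(n)])
# X and y can then be fed to either the Python or the Scala implementation.
```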