Lagrange behind SVM


being multiplied by that same constant, this is indeed a scaling constraint,

and can be satisfied by rescaling w, b. Plugging this into our problem above,

and noting that maximizing γ̂/||w|| = 1/||w|| is the same thing as minimizing ||w||², we now have the following optimization problem:

min_{γ,w,b}  (1/2)||w||²

s.t.  y(i)(wᵀx(i) + b) ≥ 1,   i = 1, …, m

We’ve now transformed the problem into a form that can be efficiently

solved. The above is an optimization problem with a convex quadratic ob-

jective and only linear constraints. Its solution gives us the optimal mar-

gin classifier. This optimization problem can be solved using commercial

quadratic programming (QP) code.¹
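The primal QP above is small enough to hand to a general-purpose solver. Below is a minimal sketch, assuming a toy separable dataset of my own and SciPy's SLSQP method (a general nonlinear solver standing in for dedicated QP code, not anything prescribed by these notes):

```python
# Sketch: solving the optimal-margin primal QP on an assumed toy dataset.
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (illustrative assumption, not from the notes).
X = np.array([[2.0, 0.0], [3.0, 0.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def objective(v):
    w = v[:2]              # v packs (w1, w2, b)
    return 0.5 * w @ w     # (1/2)||w||^2

# SLSQP expects inequality constraints written as fun(v) >= 0, so each
# margin constraint y_i (w^T x_i + b) >= 1 becomes y_i (w^T x_i + b) - 1 >= 0.
constraints = [
    {"type": "ineq", "fun": lambda v, i=i: y[i] * (v[:2] @ X[i] + v[2]) - 1.0}
    for i in range(len(y))
]

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=constraints)
w, b = res.x[:2], res.x[2]
print(w, b)  # expect roughly w = (1, 0), b = -1 for this data
```

For this dataset the maximum-margin separator can be checked by hand: the constraints for the points at x1 = 2 and x1 = 0 hold with equality, giving w = (1, 0), b = −1.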

While we could call the problem solved here, what we will instead do is

make a digression to talk about Lagrange duality. This will lead us to our

optimization problem’s dual form, which will play a key role in allowing us to

use kernels to get optimal margin classifiers to work efficiently in very high

dimensional spaces. The dual form will also allow us to derive an efficient

algorithm for solving the above optimization problem that will typically do

much better than generic QP software.

5 Lagrange duality

Let’s temporarily put aside SVMs and maximum margin classifiers, and talk

about solving constrained optimization problems.

Consider a problem of the following form:

min_w f(w)

s.t. hi(w) = 0,  i = 1, …, l.

Some of you may recall how the method of Lagrange multipliers can be used

to solve it. (Don’t worry if you haven’t seen it before.) In this method, we

define the Lagrangian to be

L(w, β) = f(w) + Σ_{i=1}^{l} βi hi(w)

¹ You may be familiar with linear programming, which solves optimization problems that have linear objectives and linear constraints. QP software is also widely available, and it additionally handles convex quadratic objectives with linear constraints.


Here, the βi ’s are called the Lagrange multipliers. We would then find

and set L’s partial derivatives to zero:

∂L/∂wi = 0;   ∂L/∂βi = 0,

and solve for w and β.
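As a concrete instance of this recipe (a toy problem of my own, not from the notes), consider minimizing f(w) = w1² + w2² subject to w1 + w2 = 1. Setting the Lagrangian's partial derivatives to zero yields a small linear system:

```python
# Lagrangian: L(w, beta) = w1^2 + w2^2 + beta*(w1 + w2 - 1).
# Stationarity and the constraint give:
#   dL/dw1   = 2*w1 + beta = 0
#   dL/dw2   = 2*w2 + beta = 0
#   dL/dbeta = w1 + w2 - 1 = 0
import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [1.0, 1.0, 0.0]])
rhs = np.array([0.0, 0.0, 1.0])

w1, w2, beta = np.linalg.solve(A, rhs)
print(w1, w2, beta)  # 0.5 0.5 -1.0
```

The solution w = (1/2, 1/2) is the closest point on the line w1 + w2 = 1 to the origin, as expected.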

In this section, we will generalize this to constrained optimization prob-

lems in which we may have inequality as well as equality constraints. Due to

time constraints, we won’t really be able to do the theory of Lagrange duality

justice in this class,² but we will give the main ideas and results, which we

will then apply to our optimal margin classifier’s optimization problem.

Consider the following, which we’ll call the primal optimization problem:

min_w f(w)

s.t. gi(w) ≤ 0,  i = 1, …, k
     hi(w) = 0,  i = 1, …, l.

To solve it, we start by defining the generalized Lagrangian

L(w, α, β) = f(w) + Σ_{i=1}^{k} αi gi(w) + Σ_{i=1}^{l} βi hi(w).

Here, the αi ’s and βi ’s are the Lagrange multipliers. Consider the quantity

θP(w) = max_{α,β : αi≥0} L(w, α, β).

Here, the “P” subscript stands for “primal.” Let some w be given. If w violates any of the primal constraints (i.e., if either gi(w) > 0 or hi(w) ≠ 0 for some i), then you should be able to verify that

θP(w) = max_{α,β : αi≥0} [ f(w) + Σ_{i=1}^{k} αi gi(w) + Σ_{i=1}^{l} βi hi(w) ]   (1)
      = ∞.   (2)

Conversely, if the constraints are indeed satisfied for a particular value of w,

then θP (w) = f (w). Hence,

θP(w) = f(w)   if w satisfies the primal constraints,
θP(w) = ∞      otherwise.
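A direct way to see this behavior is to code up θP for a one-dimensional toy problem (my own example, not from the notes): minimize f(w) = w² subject to g(w) = 1 − w ≤ 0, i.e. w ≥ 1. When the constraint holds, the best choice is α = 0 and θP(w) = f(w); when it is violated, α can grow without bound, so θP(w) = ∞:

```python
# theta_P for a 1-D toy problem: f(w) = w^2, g(w) = 1 - w <= 0.
import numpy as np

def f(w):
    return w ** 2

def g(w):
    return 1.0 - w

def theta_P(w):
    # max over alpha >= 0 of f(w) + alpha*g(w):
    # finite (alpha = 0) when feasible, unbounded otherwise.
    return f(w) if g(w) <= 0 else np.inf

print(theta_P(2.0))  # 4.0 (feasible: equals f)
print(theta_P(0.5))  # inf (infeasible)
```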

² Readers interested in learning more about this topic are encouraged to read, e.g., R. T. Rockafellar (1970), Convex Analysis, Princeton University Press.


Thus, θP takes the same value as the objective in our problem for all values of w that satisfy the primal constraints, and is positive infinity if the

constraints are violated. Hence, if we consider the minimization problem

min_w θP(w) = min_w max_{α,β : αi≥0} L(w, α, β),

we see that it is the same problem as (i.e., has the same solutions as) our

original, primal problem. For later use, we also define the optimal value of

the objective to be p∗ = minw θP (w); we call this the value of the primal

problem.

Now, let’s look at a slightly different problem. We define

θD(α, β) = min_w L(w, α, β).

Here, the “D” subscript stands for “dual.” Note also that whereas in the

definition of θP we were optimizing (maximizing) with respect to α, β, here we are minimizing with respect to w.

We can now pose the dual optimization problem:

max_{α,β : αi≥0} θD(α, β) = max_{α,β : αi≥0} min_w L(w, α, β).

This is exactly the same as our primal problem shown above, except that the

order of the “max” and the “min” is now exchanged. We also define the

optimal value of the dual problem’s objective to be d∗ = max_{α,β : αi≥0} θD(α, β).

How are the primal and the dual problems related? It can easily be shown

that

d∗ = max_{α,β : αi≥0} min_w L(w, α, β) ≤ min_w max_{α,β : αi≥0} L(w, α, β) = p∗.
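As a quick numerical sanity check of this inequality, here is a toy Lagrangian of my own (f(w) = w² with one constraint g(w) = 1 − w ≤ 0, so L(w, α) = w² + α(1 − w)), evaluated on finite grids of w and α ≥ 0:

```python
# Weak duality check: max over alpha of min over w  <=  min over w of max over alpha.
import numpy as np

w_grid = np.linspace(-3.0, 3.0, 601)     # includes the optimum w = 1
a_grid = np.linspace(0.0, 10.0, 501)     # alpha >= 0, includes alpha = 2

# Lmat[i, j] = L(w_grid[i], a_grid[j]) = w^2 + alpha*(1 - w)
Lmat = w_grid[:, None] ** 2 + a_grid[None, :] * (1.0 - w_grid[:, None])

d_star = Lmat.min(axis=0).max()  # dual:   max over alpha of min over w
p_star = Lmat.max(axis=1).min()  # primal: min over w of max over alpha

print(d_star, p_star)
assert d_star <= p_star + 1e-12  # weak duality holds on any finite grid
```

For this particular problem the two values coincide (both equal 1, attained at w = 1, α = 2), since the problem is convex and strictly feasible; the inequality itself holds for any function and any grids.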

(You should convince yourself of this; this follows from the “max min” of a

function always being less than or equal to the “min max.”) However, under

certain conditions, we will have

d∗ = p∗,

so that we can solve the dual problem in lieu of the primal problem. Let’s

see what these conditions are.

Suppose f and the gi’s are convex,³ and the hi’s are affine.⁴ Suppose

further that the constraints gi are (strictly) feasible; this means that there

exists some w so that gi (w) < 0 for all i.

³ When f has a Hessian, it is convex if and only if the Hessian is positive semidefinite. For instance, f(w) = wᵀw is convex; similarly, all linear (and affine) functions are also convex. (A function f can also be convex without being differentiable, but we won’t need those more general definitions of convexity here.)

⁴ I.e., there exist ai, bi so that hi(w) = aiᵀw + bi. “Affine” means the same thing as linear, except that we also allow the extra intercept term bi.


Under the above assumptions, there must exist w∗, α∗, β∗ so that w∗ is the solution to the primal problem, α∗, β∗ are the solution to the dual problem, and moreover p∗ = d∗ = L(w∗, α∗, β∗). Moreover, w∗, α∗ and β∗ satisfy the Karush-Kuhn-Tucker (KKT) conditions, which are as follows:

∂/∂wi L(w∗, α∗, β∗) = 0,  i = 1, …, n   (3)
∂/∂βi L(w∗, α∗, β∗) = 0,  i = 1, …, l   (4)
αi∗ gi(w∗) = 0,  i = 1, …, k   (5)
gi(w∗) ≤ 0,  i = 1, …, k   (6)
αi∗ ≥ 0,  i = 1, …, k   (7)

Moreover, if some w∗, α∗, β∗ satisfy the KKT conditions, then they are also a solution to the primal and dual problems.

We draw attention to Equation (5), which is called the KKT dual com-

plementarity condition. Specifically, it implies that if αi∗ > 0, then gi (w ∗ ) =

0. (I.e., the “gi (w) ≤ 0” constraint is active, meaning it holds with equality

rather than with inequality.) Later on, this will be key for showing that the

SVM has only a small number of “support vectors”; the KKT dual comple-

mentarity condition will also give us our convergence test when we talk about

the SMO algorithm.
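To make the complementarity condition concrete, here is a small sketch on an assumed toy dataset with a known maximum-margin separator (both the data and the values of w, b are my own illustrative choices, not from the notes). Only the points whose constraint is active, i.e. whose functional margin equals one, can have αi > 0:

```python
# Checking which margin constraints are active at an assumed optimum.
import numpy as np

X = np.array([[2.0, 0.0], [3.0, 0.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = np.array([1.0, 0.0]), -1.0   # assumed max-margin solution for this data

margins = y * (X @ w + b)        # functional margins y_i (w^T x_i + b)
active = margins <= 1.0 + 1e-9   # constraints holding with equality

print(margins)  # [1. 2. 1. 2.]
print(active)   # [ True False  True False]

# Complementarity alpha_i * g_i(w) = 0 forces alpha_i = 0 whenever the
# margin is strictly greater than 1, so only the two active points
# (the support vectors) can have alpha_i > 0.
```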

Previously, we posed the following (primal) optimization problem for finding

the optimal margin classifier:

min_{γ,w,b}  (1/2)||w||²

s.t.  y(i)(wᵀx(i) + b) ≥ 1,   i = 1, …, m

We have one such constraint for each training example. Note that from the

KKT dual complementarity condition, we will have αi > 0 only for the training examples that have functional margin exactly equal to one (i.e., the ones corresponding to constraints that hold with equality, gi(w) = 0).
