
“mathematics is more about geometrical visualization than Symbolic Acrobatics”

Laopti Assignment 53

Lagrangian Duality (LD): the trickiest part of optimization theory.

LD is the core of the modern theory of convex optimization. It arose from the Lagrangian
function. The Lagrangian function is built from the objective function and the
constraints (treated as functions by dropping the equality or inequality part of each
constraint); it lets us write the first-order optimality conditions in a
methodical way. The resulting relationships between the gradients of the objective and
constraint functions, together with the complementarity conditions, are the famous KKT
conditions.

Complementarity conditions basically let us avoid writing 'if' in the
mathematical statement. The optimum of the objective function can lie either on
the boundary of the constraint set or in its interior. If it is on the boundary, we have one set
of first-order conditions, and if it is in the interior we have another set.
The complementarity conditions combine these into a single statement.

These conditions allow us to check whether an optimization algorithm has
converged to the optimal point.
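As an illustration, such a KKT check can be sketched numerically. This is a minimal sketch on a hypothetical one-dimensional problem (f(x) = (x − 2)², g(x) = 1 − x, whose constrained optimum sits on the boundary at x* = 1 with multiplier λ* = 2); the function names are mine, not from the notes.

```python
# Hypothetical toy problem:  min f(x)  subject to  g(x) >= 0
def f(x): return (x - 2)**2
def g(x): return 1 - x
def df(x): return 2*(x - 2)   # gradient of f
def dg(x): return -1.0        # gradient of g

def kkt_satisfied(x, lam, tol=1e-8):
    stationarity    = abs(df(x) - lam*dg(x)) < tol  # grad L = grad f - lam*grad g = 0
    primal_feas     = g(x) >= -tol                  # g(x) >= 0
    dual_feas       = lam >= -tol                   # lam >= 0
    complementarity = abs(lam*g(x)) < tol           # lam * g(x) = 0
    return stationarity and primal_feas and dual_feas and complementarity

print(kkt_satisfied(1.0, 2.0))  # True: boundary optimum, lam* = 2 > 0, g(x*) = 0
print(kkt_satisfied(2.0, 0.0))  # False: the unconstrained minimizer violates g(x) >= 0
```

Note how the single product condition λ·g(x) = 0 handles both the boundary case (g = 0, λ > 0) and the interior case (g > 0, λ = 0) without any 'if'.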

Soon mathematicians discovered that there is more to the Lagrangian function
(corresponding to a convex objective function defined over a convex constraint
set). The finding is: corresponding to an optimization problem in terms of the
original primal variables (sometimes called design variables), there is an
equivalent dual optimization problem whose variables are the Lagrange
multipliers. It follows from a geometry-based argument on the Lagrangian function.

To understand this, consider a simple problem.

    min_x f(x)   subject to   g(x) ≥ 0

The corresponding Lagrangian function is

    L(x, λ) = f(x) − λ g(x),   λ ≥ 0
Through a geometrical argument, we can derive the following:

    min_x f(x)   subject to   g(x) ≥ 0

is equivalent to

    max_λ min_x L(x, λ) = min_x max_λ L(x, λ)   ------------- (1)

The derivation is based on a very tricky 'thought experiment'. It is given in the note
handed out to you and is also in the PPT slides. It will be discussed in class. One
hour each is required to prove that the LHS and the RHS both lock on to the
same minimum value of the objective function over the constraint set.
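Before the geometric proof, relation (1) can be checked numerically on a toy problem. This is a minimal sketch, assuming the hypothetical problem f(x) = (x − 2)², g(x) = 1 − x (constrained minimum f(1) = 1), with λ restricted to a bounded grid since the true inner maximum can be +∞:

```python
import numpy as np

# Hypothetical 1-D example: f(x) = (x-2)^2, g(x) = 1 - x, L = f - lam*g
xs   = np.linspace(-1.0, 4.0, 2001)
lams = np.linspace(0.0, 10.0, 2001)   # bounded stand-in for lam >= 0
X, LAM = np.meshgrid(xs, lams, indexing="ij")
L = (X - 2)**2 - LAM*(1 - X)

minimax = L.max(axis=1).min()   # min over x of (max over lam)
maximin = L.min(axis=0).max()   # max over lam of (min over x)
print(minimax, maximin)         # both approach f(x*) = 1 on this grid
```

On a finer grid both sides converge to the same constrained minimum value, which is exactly the content of (1) for this convex example.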

Note very specially that the whole argument is centered on the objective
function value.

Here, through several pictures, we will prove one part of the relation, namely the RHS of (1):

    min_x f(x) subject to g(x) ≥ 0   ⇒   min_x max_λ L(x, λ)

Follow the arguments/visualization below.

Step 1: Visualize the problem. Assume x ∈ R²; f(x) is to be visualized in the third
dimension.

Step 2: Mentally compute max_{λ≥0} f(x) − λ g(x).

We are allowed to vary λ. Find what λ should be at each x so as to maximize the
Lagrangian function at that point.

In the region where g(x) ≥ 0, max_{λ≥0} f(x) − λ g(x) is f(x) itself; it is obtained by
setting λ = 0. Note again that we are computing the Lagrangian function and
plotting it at every point where g(x) ≥ 0.

In the region where g(x) < 0, max_{λ≥0} f(x) − λ g(x) is ∞; it is obtained by letting
λ → ∞. Note that we are computing and plotting it at every point where g(x) < 0.

The resulting picture looks like the one given below.


Step 3: Compute min_x max_{λ≥0} f(x) − λ g(x).

The output of step 3 is given in the following figure. We end up finding the
minimum value of f(x) in the feasible region.
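Steps 2 and 3 can be sketched on a grid. A minimal sketch, assuming the hypothetical toy problem f(x) = (x − 2)², g(x) = 1 − x (feasible region x ≤ 1, constrained minimum f(1) = 1):

```python
import numpy as np

# Hypothetical toy problem: f(x) = (x-2)^2, g(x) = 1 - x
xs = np.linspace(-1.0, 4.0, 2001)
f = (xs - 2)**2
g = 1 - xs

# Step 2: max over lam >= 0 of f - lam*g is f where g >= 0 (lam = 0)
#         and +inf where g < 0 (lam -> inf).
inner_max = np.where(g >= 0, f, np.inf)

# Step 3: min over x of the inner max = constrained minimum of f.
print(inner_max.min())  # ~1.0 = f(1), attained on the boundary g(x) = 0
```

The infinite wall erected over the infeasible region is what forces the outer minimization to land inside the constraint set.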

This proves that

    min_x f(x) subject to g(x) ≥ 0   ⇒   min_x max_λ L(x, λ)

Similarly, we can prove that

    min_x f(x) subject to g(x) ≥ 0   ⇒   max_λ min_x L(x, λ)

but this requires a slightly tougher geometric argument.

We will do it later.
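Even before the geometric proof, the max–min side can be explored numerically. A minimal sketch, assuming the hypothetical problem f(x) = (x − 2)², g(x) = 1 − x: the dual function q(λ) = min_x L(x, λ) is evaluated on a grid and then maximized over λ ≥ 0.

```python
import numpy as np

# Hypothetical toy problem: f(x) = (x-2)^2, g(x) = 1 - x
xs = np.linspace(-5.0, 5.0, 10001)

def q(lam):
    # dual function: q(lam) = min over x of L(x, lam) = (x-2)^2 - lam*(1 - x)
    return np.min((xs - 2)**2 - lam*(1 - xs))

lams = np.linspace(0.0, 5.0, 5001)
qs = np.array([q(lam) for lam in lams])
best_lam = lams[qs.argmax()]
print(best_lam, qs.max())  # lam* ~ 2, max q ~ 1 = constrained minimum f(1)
```

The maximum of the dual function matches the constrained minimum of f, as the max–min side of (1) promises for this convex problem.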

Lagrangian duality applied to a linear programming problem.

Consider the following LP:

    max_x c^T x   subject to   Ax ≤ b,  x ≥ 0

Let us find the Lagrangian function.

Writing the Lagrangian is tricky.

One way is to convert the problem into a standard format (changing max to min, changing
the type of inequality, etc.).

I personally remember it in the following way.

If it is a maximization problem, add the positive (≥ 0) constraint part, multiplied by its
Lagrange multiplier, to the objective function and go above the objective function
value.

If it is a minimization problem, subtract the positive (≥ 0) constraint part, multiplied by
its Lagrange multiplier, from the objective function and go below the objective
function value.

It may take some time and practice to fully appreciate the above statement.

For the above LP problem, the Lagrangian is:

    max_x c^T x,  Ax ≤ b,  x ≥ 0   ⇒   L(x, y ≥ 0, λ ≥ 0) = c^T x + y^T(b − Ax) + λ^T x


The positive constraint part of the first set of inequality constraints is b − Ax, because
b − Ax ≥ 0. Similarly, the positive constraint part of the second set is x,
because x ≥ 0.

As per the duality theorem,

    max_x c^T x,  Ax ≤ b,  x ≥ 0   ⇒   min_{y,λ} max_x L(x, y ≥ 0, λ ≥ 0) = min_{y,λ} max_x [ c^T x + y^T(b − Ax) + λ^T x ]



With respect to the primal variable x, we write the optimality condition and substitute it back
into the Lagrangian to obtain a minimization problem in terms of the dual variables.

    ∂L/∂x = 0   ⇒   c − A^T y + λ = 0   (a vector equation)

To eliminate the x variables from the Lagrangian, we substitute c = A^T y − λ:

    c^T x + y^T(b − Ax) + λ^T x = (A^T y − λ)^T x + y^T(b − Ax) + λ^T x = y^T b

The x terms cancel, and the Lagrangian becomes L(y ≥ 0, λ ≥ 0) = b^T y.


The dual variables y, λ are related by c = A^T y − λ. Since λ ≥ 0, this is equivalent to
the constraint A^T y ≥ c.

So, as per duality, we obtain a new optimization problem:

    min_y b^T y   subject to   A^T y ≥ c,  y ≥ 0
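This primal–dual pair can be verified numerically. A minimal sketch, assuming SciPy is available and using hypothetical data (the particular c, A, b are mine, not from the notes); scipy.optimize.linprog minimizes, so the primal objective is negated:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data for  max c^T x  s.t.  Ax <= b, x >= 0
c = np.array([3.0, 5.0])
A = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [3.0, 2.0]])
b = np.array([4.0, 12.0, 18.0])

# Primal: linprog minimizes, so pass -c; default bounds give x >= 0.
primal = linprog(-c, A_ub=A, b_ub=b)

# Dual:  min b^T y  s.t.  A^T y >= c, y >= 0, written as -A^T y <= -c.
dual = linprog(b, A_ub=-A.T, b_ub=-c)

print(-primal.fun, dual.fun)  # strong LP duality: the two optimal values coincide
```

For a feasible, bounded LP, strong duality guarantees the two optimal values are equal, which is what the printout confirms for this data.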

Assignment Question

Find the dual optimization problem for SVM problem given by

    min_{w,γ} (1/2) w^T w   subject to   d_i (w^T x_i − γ) ≥ 1,  ∀ i = 1 : m

Refer to our book on SVM or any other book on kernel methods.

You are about to enter the world of kernel methods, which revolutionized
machine learning theory in the 1990s.

Note that in deep learning algorithms for classification, the last block is still an
SVM classifier.

Kernel PCA, Kernel CCA, and Kernel ICA are powerful concepts useful in AI and data
science.
