Subgradient and Bundle Methods Report

Harsh Pareek

May 5, 2010
Abstract
Minimizing a convex function over a convex region is an important problem in Nonlinear Programming. Several methods have been proposed and studied for the optimization of differentiable functions. We introduce a number of methods for the optimization of non-smooth functions, all based on the notion of subgradients. Of these, we consider subgradient and bundle methods in detail.
1 Introduction
A function is said to be smooth if it is differentiable and its derivatives are continuous. Many methods exist for the optimization of smooth convex functions (notably steepest descent methods, Newton and quasi-Newton methods, and interior point methods); [2] and [3] detail these methods and their implementation.

One can ask the question: why are convex problems given such importance? To rephrase a quote by R. T. Rockafellar,
The great divide in optimization is not between linear and nonlinear problems,but between convex and nonconvex problems.
1.1 Motivation
Many naturally occurring problems are nonsmooth. Some common examples are:

- Hinge loss: $f(x) = \max(0, 1 - x)$

- Piecewise linear functions
For a problem
\[
\min_x \; f(x) \quad \text{s.t.} \quad g(x) \le 0,
\]
an equivalent problem is
\[
\min_{x,t} \; t \quad \text{s.t.} \quad f(x) \le t, \;\; g(x) \le 0.
\]

This objective is always smooth and convex, but the problem is inherently still nonsmooth. (In some sense, the nonsmoothness has been shifted to the constraints.)

A function approximating a non-smooth function may be analytically smooth but "numerically nonsmooth". For example, it may show similar oscillatory behaviour under iterative algorithms such as gradient descent.
1.2 Approaches to Solutions
Nonsmooth problems have arisen in many fields and have been solved by various methods. [5] covers a number of such methods for certain classes of optimization problems. Some general approaches are:
- Approximate by a smooth function or a sequence of smooth functions.

- Reformulate the problem, adding more variables/constraints, such that the problem becomes smooth.

- Cutting Plane Methods: lower bound the function by a piecewise linear function and use it to iteratively find the minimum.

- Moreau-Yosida Regularization.

- Bundle Methods: combine the above two methods.

- UV-decomposition: decompose the function to facilitate optimization. Refer to [13].
As we define gradients for differentiable convex functions, we define subgradients when the objective is not smooth.

From the theory of convex analysis we have: a differentiable function $f(x)$ is convex iff
\[
f(y) \ge f(x) + \nabla f(x)^T (y - x) \quad \forall x, y \in \mathbb{R}^n \tag{1}
\]
A subgradient of $f$ at $x$ is defined as any $g \in \mathbb{R}^n$ such that
\[
f(y) \ge f(x) + g^T (y - x) \quad \forall y \in \mathbb{R}^n \tag{2}
\]
The set of all subgradients of $f$ at $x$ is called the subdifferential of $f$ at $x$ and is denoted $\partial f(x)$.

A number of facts regarding nonsmooth convex functions should be noted:

- A convex function is always subdifferentiable, i.e. a subgradient of a convex function exists at every point.

- Directional derivatives also exist at every point.

- If a convex function $f$ is differentiable at $x$, its subdifferential is a singleton set containing only the gradient at that point, i.e. $\partial f(x) = \{\nabla f(x)\}$.
Let $f'(x; d)$ denote the directional derivative of $f$ in the direction $d$, and let $t \in \mathbb{R}$, $t > 0$. We note that, from the definition,
\[
\frac{f(x + t d) - f(x)}{t} \ge g^T d \quad \forall g \in \partial f(x) \tag{3}
\]
So, subgradients are "lower bounds" for directional derivatives.
In fact,
\[
f'(x; d) = \sup_{g \in \partial f(x)} \langle g, d \rangle
\]
Further, $d$ is a descent direction iff $g^T d < 0 \;\; \forall g \in \partial f(x)$.
1.5 Properties

Just as we have an arithmetic for gradients, we have an arithmetic for subgradients. Rigorous proofs of these results are presented in [9]. Properties:
- $\partial (f_1 + f_2)(x) = \partial f_1(x) + \partial f_2(x)$

- $\partial (\alpha f)(x) = \alpha \, \partial f(x)$ for $\alpha \ge 0$

- If $g(x) = f(Ax + b)$, then $\partial g(x) = A^T \partial f(Ax + b)$

We have maxima and minima conditions in differential calculus (namely, if a point is an extremal point, then its derivative is 0). The extension of this lemma to nonsmooth convex functions is:
\[
x \text{ is a local minimum} \iff 0 \in \partial f(x)
\]
We note that this is not very useful in practice. This is because the subgradient may not vary continuously, i.e. its value at a point may not be representative of its values nearby. This makes finding minima for general nonsmooth functions impossible. As we shall see, we can use convexity to find the location of the minima (as the minimum value is unique).

As an example, consider the case $f(x) = |x|$: the oracle returns the subgradient $0$ only at $x = 0$. So, the magnitude of the subgradient is no indication of how close we are to the final solution. During an iterative method, we may never hit zero, though we come arbitrarily close.
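The behaviour described above can be seen in a minimal subgradient method on $f(x) = |x|$ (the starting point and the standard diminishing step sizes $1/k$ are assumptions chosen for illustration): the iterates approach the minimizer, yet the oracle keeps returning subgradients of magnitude 1, so $|g|$ gives no stopping criterion.

```python
# A minimal subgradient method on f(x) = |x| with diminishing steps 1/k.
# The best objective value found approaches 0, even though the subgradient
# magnitude stays 1 at (almost) every iterate.

def subgrad_abs(x):
    return 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)

def subgradient_method(x0, iters=1000):
    x, best = x0, abs(x0)
    for k in range(1, iters + 1):
        g = subgrad_abs(x)
        x = x - (1.0 / k) * g    # diminishing step size
        best = min(best, abs(x))
    return x, best

x, best = subgradient_method(5.0)
print(best)   # close to 0, while |g| was 1 throughout
```

Tracking the best value seen so far is the usual remedy, since the objective values of a subgradient method need not decrease monotonically.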