
# Subgradient and Bundle Methods

for optimization of convex nonsmooth functions

April 1, 2009


Motivation

**Many naturally occurring problems are nonsmooth**

Examples: the hinge loss, the feasible region of a convex minimization problem, and piecewise linear functions.

A function that approximates a nonsmooth function may be analytically smooth but "numerically nonsmooth".


Methods for nonsmooth optimizations

- Approximate by a series of smooth functions
- Reformulate the problem, adding constraints so that the objective is smooth
- Subgradient methods
- Cutting plane methods
- Moreau-Yosida regularization
- Bundle methods
- UV-decomposition


Definition

An extension of gradients

For a convex differentiable function $f$, for all $x, y$:

$$f(y) \ge f(x) + \nabla f(x)^T (y - x) \qquad (1)$$

So, a subgradient of $f$ at $x$ is defined as any $g \in \mathbb{R}^n$ such that

$$f(y) \ge f(x) + g^T (y - x) \quad \forall y \qquad (2)$$

The set of all subgradients of $f$ at $x$ is denoted $\partial f(x)$.
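Inequality (2) can be checked numerically. For f(x) = |x| at x = 0, any g ∈ [−1, 1] is a subgradient, while values outside that interval violate (2) at some y (a sketch in plain Python; the helper name is illustrative):

```python
def f(x):
    """f(x) = |x|, a convex nonsmooth function."""
    return abs(x)

def is_subgradient(g, x, f, ys):
    """Check the subgradient inequality f(y) >= f(x) + g*(y - x) at sample points ys."""
    return all(f(y) >= f(x) + g * (y - x) for y in ys)

ys = [i / 10 for i in range(-50, 51)]  # test points in [-5, 5]
# At x = 0, every g in [-1, 1] satisfies the inequality ...
assert is_subgradient(0.5, 0.0, f, ys)
assert is_subgradient(-1.0, 0.0, f, ys)
# ... while g outside [-1, 1] does not (e.g. it fails at y = 1).
assert not is_subgradient(1.5, 0.0, f, ys)
```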


Some Facts

From Convex Analysis

- A convex function is always subdifferentiable, i.e. the subdifferential $\partial f(x)$ of a convex function is nonempty at every point. Directional derivatives also exist at every point.
- If a convex function $f$ is differentiable at $x$, its only subgradient is the gradient, i.e. $\partial f(x) = \{\nabla f(x)\}$.
- Subgradients are lower bounds for directional derivatives: $f'(x; d) = \sup_{g \in \partial f(x)} \langle g, d \rangle$.
- Further, $d$ is a descent direction iff $g^T d < 0$ for all $g \in \partial f(x)$.


Properties

Without Proof

- $\partial(f_1 + f_2)(x) = \partial f_1(x) + \partial f_2(x)$
- $\partial(\alpha f)(x) = \alpha\,\partial f(x)$ for $\alpha \ge 0$
- $g(x) = f(Ax + b) \;\Rightarrow\; \partial g(x) = A^T \partial f(Ax + b)$
- Local minimum $\Rightarrow 0 \in \partial f(x)$
- However, for $f(x) = |x|$ the oracle returns the subgradient 0 only at $x = 0$, so this is not a good way to find minima.
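The sum rule and the optimality condition can be combined on a small example: for f(x) = |x| + (x − 1)²/2, the subdifferential at 0 is [−1, 1] + {−1} = [−2, 0], which contains 0, so x = 0 is a global minimizer. A numeric sketch (the function is chosen purely for illustration):

```python
def f(x):
    """f(x) = |x| + (x - 1)**2 / 2: sum of a nonsmooth and a smooth convex term."""
    return abs(x) + (x - 1) ** 2 / 2

# By the sum rule, the subdifferential at 0 is [-1, 1] + {-1} = [-2, 0].
sub_lo, sub_hi = -1 + (0 - 1), 1 + (0 - 1)
assert sub_lo <= 0 <= sub_hi  # 0 lies in the subdifferential ...

# ... so x = 0 should minimize f; verify against a grid of candidates.
ys = [i / 100 for i in range(-300, 301)]
assert all(f(0) <= f(y) for y in ys)
```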


Subgradient Method

Algorithm

The subgradient method is NOT a descent method!

$$x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)}, \qquad \alpha_k \ge 0,\; g^{(k)} \in \partial f(x^{(k)})$$

$$f_{best}^{(k)} = \min\{f_{best}^{(k-1)}, f(x^{(k)})\}$$

No line search is performed; the step lengths $\alpha_k$ are usually fixed ahead of time.
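The update can be sketched in a few lines of plain Python; the test problem f(x) = |x − 3| and the 1/√k schedule are illustrative choices:

```python
import math

def subgradient_method(f, subgrad, x0, n_iters):
    """Basic subgradient method: x <- x - alpha_k * g, tracking the best value seen.

    Not a descent method: f can increase between iterates, so we keep f_best.
    """
    x, f_best = x0, f(x0)
    for k in range(1, n_iters + 1):
        g = subgrad(x)
        alpha = 1.0 / math.sqrt(k)  # nonsummable diminishing step size
        x = x - alpha * g
        f_best = min(f_best, f(x))
    return x, f_best

f = lambda x: abs(x - 3)
subgrad = lambda x: 1.0 if x > 3 else (-1.0 if x < 3 else 0.0)
x, f_best = subgradient_method(f, subgrad, x0=0.0, n_iters=2000)
assert f_best < 1e-1  # f* = 0; convergence is slow but steady
```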


Step Lengths

**Commonly used step lengths**

- Constant step size: $\alpha_k = \alpha$
- Constant step length: $\alpha_k = \gamma / \|g^{(k)}\|_2$ (so that $\|x^{(k+1)} - x^{(k)}\|_2 = \gamma$)
- Square summable but not summable step size: $\alpha_k \ge 0$, $\sum_{k=1}^\infty \alpha_k^2 < \infty$, $\sum_{k=1}^\infty \alpha_k = \infty$
- Nonsummable diminishing: $\alpha_k \ge 0$, $\lim_{k\to\infty} \alpha_k = 0$, $\sum_{k=1}^\infty \alpha_k = \infty$
- Nonsummable diminishing step lengths: $\alpha_k = \gamma_k / \|g^{(k)}\|_2$ with $\gamma_k \ge 0$, $\lim_{k\to\infty} \gamma_k = 0$, $\sum_{k=1}^\infty \gamma_k = \infty$
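The summability conditions can be probed numerically on partial sums. With αk = 1/k, the squared partial sums stay below π²/6 while the plain partial sums grow like log N, which is exactly the "square summable but not summable" case (a sketch):

```python
import math

N = 100000
alphas = [1.0 / k for k in range(1, N + 1)]  # alpha_k = 1/k

sum_sq = sum(a * a for a in alphas)
sum_lin = sum(alphas)

# Sum of squares stays below its limit pi^2/6 ...
assert sum_sq < math.pi ** 2 / 6
# ... while the plain partial sums grow like log N (harmonic series diverges).
assert sum_lin > math.log(N)
```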


Convergence Result

Assume that there exists $G$ such that the norms of the subgradients are bounded, i.e. $\|g^{(k)}\|_2 \le G$ (for example, when $f$ is Lipschitz continuous).

Result:

$$f_{best}^{(k)} - f^* \le \frac{\operatorname{dist}(x^{(1)}, X^*)^2 + G^2 \sum_{i=1}^k \alpha_i^2}{2 \sum_{i=1}^k \alpha_i}$$

The proof works by showing that $\|x - x^*\|$ decreases.
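The bound can be checked on a concrete run. For f(x) = |x| with x⁽¹⁾ = 5 we have G = 1 and dist(x⁽¹⁾, X*) = 5 exactly, so the right-hand side is computable at every iteration (a sketch; the problem instance is illustrative):

```python
import math

f = lambda x: abs(x)
subgrad = lambda x: 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)

x, f_best = 5.0, f(5.0)
R, G, f_star = 5.0, 1.0, 0.0   # dist(x1, X*), subgradient bound, optimal value
sum_a = sum_a2 = 0.0
for k in range(1, 1001):
    alpha = 1.0 / math.sqrt(k)
    sum_a, sum_a2 = sum_a + alpha, sum_a2 + alpha * alpha
    x -= alpha * subgrad(x)
    f_best = min(f_best, f(x))
    bound = (R * R + G * G * sum_a2) / (2 * sum_a)
    assert f_best - f_star <= bound + 1e-12  # the convergence bound holds at every k
```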


**Convergence for commonly used step lengths**

- Constant step size: $f_{best}^{(k)}$ converges to within $G^2 h / 2$ of optimal
- Constant step length: $f_{best}^{(k)}$ converges to within $G h$ of optimal
- Square summable but not summable step size: $f_{best}^{(k)} \to f^*$
- Nonsummable diminishing: $f_{best}^{(k)} \to f^*$
- Nonsummable diminishing step lengths: $f_{best}^{(k)} \to f^*$

$$f_{best}^{(k)} - f^* \le \frac{R^2 + G^2 \sum_{i=1}^k \alpha_i^2}{2 \sum_{i=1}^k \alpha_i}$$

So the optimal step sizes are $\alpha_i = \frac{R/G}{\sqrt{k}}$, and the method converges to accuracy $\epsilon$ in $(RG/\epsilon)^2$ steps.


Variations

- If the optimal value $f^*$ is known (e.g. known to be 0, though the minimizer is not), use the Polyak step size: $\alpha_k = \dfrac{f(x^{(k)}) - f^*}{\|g^{(k)}\|_2^2}$
- Projected subgradient, for minimizing $f(x)$ s.t. $x \in C$: $x^{(k+1)} = P(x^{(k)} - \alpha_k g^{(k)})$
- Alternating projections: find a point in the intersection of two convex sets
- Heavy ball method: $x^{(k+1)} = x^{(k)} - \alpha_k g^{(k)} + \beta_k (x^{(k)} - x^{(k-1)})$
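The projected subgradient update can be sketched for a box constraint, where the projection P is a clamp. The instance (minimize |x − 3| over [0, 2]) is illustrative:

```python
import math

def project(x, lo, hi):
    """Euclidean projection onto the interval [lo, hi]."""
    return max(lo, min(hi, x))

f = lambda x: abs(x - 3)                     # minimize |x - 3| ...
subgrad = lambda x: 1.0 if x > 3 else -1.0   # (any subgradient at x = 3 would do)
lo, hi = 0.0, 2.0                            # ... subject to x in [0, 2]

x = 0.0
for k in range(1, 501):
    alpha = 1.0 / math.sqrt(k)
    x = project(x - alpha * subgrad(x), lo, hi)

# The constrained optimum is the boundary point x = 2.
assert abs(x - 2.0) < 1e-6
```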


Pros

- Can be applied immediately to a wide variety of problems, especially when the required accuracy is not very high
- Low memory usage
- Often possible to design distributed methods if the objective is decomposable

Cons

Slower than second-order methods


Cutting Plane Method

Again, consider the problem: minimize $f(x)$ subject to $x \in C$.

- Construct an approximate model: $\hat f(x) = \max_{i \in I} \left( f(x_i) + g_i^T (x - x_i) \right)$
- Minimize the model over $x$, then evaluate $f$ and a subgradient $g$ at the minimizer
- Update the model and repeat until the desired accuracy is reached
- Numerically unstable
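A one-dimensional sketch of this loop for f(x) = |x| over C = [−2, 2]. The model is minimized by scanning a grid purely for illustration; real implementations solve a linear program:

```python
f = lambda x: abs(x)
subgrad = lambda x: 1.0 if x >= 0 else -1.0

cuts = []  # (x_i, f(x_i), g_i): each defines the affine minorant f(x_i) + g_i*(x - x_i)

def model(x):
    """Piecewise linear lower model built from all cuts collected so far."""
    return max(fi + gi * (x - xi) for xi, fi, gi in cuts)

grid = [i / 1000 for i in range(-2000, 2001)]  # C = [-2, 2]
x = 2.0
for _ in range(20):
    cuts.append((x, f(x), subgrad(x)))  # add a cut at the current point
    x = min(grid, key=model)            # minimize the model over C
    gap = f(x) - model(x)               # the model is a lower bound, so gap >= 0
    if gap < 1e-6:
        break

assert f(x) < 1e-3  # the model minimizer approaches the true minimizer x = 0
```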


Moreau-Yosida Regularization

Idea: solve a series of smooth convex problems to minimize $f(x)$:

$$F(x) = \min_{y \in \mathbb{R}^n} f(y) + \frac{\lambda}{2}\|y - x\|^2$$

$$p(x) = \operatorname*{argmin}_{y \in \mathbb{R}^n} f(y) + \frac{\lambda}{2}\|y - x\|^2$$

$F(x)$ is differentiable! $\nabla F(x) = \lambda(x - p(x))$. The minimization is done using the dual.

Cutting plane method + Moreau-Yosida regularization = bundle methods.
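For f(y) = |y|, the proximal point p(x) has a closed form (soft-thresholding at 1/λ), which makes the envelope F and its gradient easy to check numerically. A sketch in plain Python (λ = 2 is an arbitrary illustrative choice):

```python
lam = 2.0

def f(x):
    return abs(x)

def prox(x):
    """p(x) = argmin_y |y| + (lam/2)(y - x)^2: soft-thresholding at 1/lam."""
    t = 1.0 / lam
    return (x - t) if x > t else ((x + t) if x < -t else 0.0)

def F(x):
    """Moreau-Yosida envelope F(x) = min_y f(y) + (lam/2)(y - x)^2."""
    p = prox(x)
    return f(p) + lam / 2 * (p - x) ** 2

# F is differentiable with gradient lam*(x - p(x)); check by central differences,
# including points where f itself is nonsmooth (x = 0).
for x in [-2.0, -0.3, 0.0, 0.3, 2.0]:
    h = 1e-6
    fd = (F(x + h) - F(x - h)) / (2 * h)
    grad = lam * (x - prox(x))
    assert abs(fd - grad) < 1e-5
```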


Elementary Bundle Method

As before, $f$ is assumed to be Lipschitz continuous. At a generic iteration we maintain a "bundle" of tuples $\langle y_i, f(y_i), s_i, \alpha_i \rangle$.


Follow the cutting plane method, but use M-Y regularization for building the model:

$$y^{k+1} = \operatorname*{argmin}_{y \in \mathbb{R}^n} \hat f_k(y) + \frac{\mu_k}{2}\|y - \hat x^k\|^2$$

$$\delta_k = f(\hat x^k) - \left[ \hat f_k(y^{k+1}) + \frac{\mu_k}{2}\|y^{k+1} - \hat x^k\|^2 \right] \ge 0$$

- If $\delta_k < \bar\delta$, stop.
- If $f(\hat x^k) - f(y^{k+1}) \ge m\,\delta_k$: serious step, $\hat x^{k+1} = y^{k+1}$
- else: null step, $\hat x^{k+1} = \hat x^k$
- Update the model: $\hat f_{k+1}(y) = \max\{\hat f_k(y),\; f(y^{k+1}) + \langle s^{k+1}, y - y^{k+1} \rangle\}$
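The loop above can be sketched in one dimension for f(x) = |x − 1|. The regularized subproblem is solved here by ternary search on the (convex) objective purely for illustration; real implementations solve it as a QP. All constants (μ, m, the tolerances) are illustrative:

```python
def ternary_min(phi, lo, hi, iters=200):
    """Minimize a one-dimensional convex function phi on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if phi(m1) <= phi(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

f = lambda x: abs(x - 1)
subgrad = lambda x: 1.0 if x >= 1 else -1.0

mu, m, tol = 1.0, 0.5, 1e-6   # prox weight, descent parameter, stopping tolerance
x_hat = 5.0                   # stability center
cuts = [(x_hat, f(x_hat), subgrad(x_hat))]

def model(y):
    return max(fi + gi * (y - yi) for yi, fi, gi in cuts)

for _ in range(100):
    # Solve the regularized subproblem: min_y model(y) + mu/2 (y - x_hat)^2
    y = ternary_min(lambda z: model(z) + mu / 2 * (z - x_hat) ** 2, -10.0, 10.0)
    delta = f(x_hat) - (model(y) + mu / 2 * (y - x_hat) ** 2)
    if delta < tol:
        break
    if f(x_hat) - f(y) >= m * delta:
        x_hat = y  # serious step: enough actual decrease
    # else: null step, keep x_hat and only enrich the model
    cuts.append((y, f(y), subgrad(y)))

assert f(x_hat) < 1e-2  # the stability center approaches the minimizer x = 1
```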


Convergence

The algorithm either:

- makes a finite number of serious steps and then only null steps; then, if $k_0$ is the last serious step and $\mu_k$ is nondecreasing, $\delta_k \to 0$; or
- makes an infinite number of serious steps; then $\sum_{k \in K_s} \delta_k \le \dfrac{f(\hat x^0) - f^*}{m}$, so $\delta_k \to 0$.


Variations

- Replace $\|y - x\|^2$ by $(y - x)^T M_k (y - x)$: still differentiable
- Conjugate gradient methods are obtained as a slight modification of the algorithm (refer to [5])
- Variable metric methods [10]; $M_k = u_k I$ for diagonal variable metric methods
- Bundle-Newton methods


Summary

Nonsmooth convex optimization has been explored since the 1960s; the original subgradient methods were introduced by Naum Shor. Bundle methods have been developed more recently. Subgradient methods are simple but slow, unless distributed, which is the predominant current application. Bundle methods solve a bounded QP at each iteration, which is slow, but they need fewer iterations; they are preferred for applications where the oracle cost is high.


For Further Reading I

- Naum Z. Shor. *Minimization Methods for Non-differentiable Functions*. Springer-Verlag, 1985.
- Boyd and Vandenberghe. *Convex Optimization*. Cambridge University Press.
- A. Ruszczynski. *Nonlinear Optimization*. Princeton University Press.
- Wikipedia: en.wikipedia.org/wiki/Subgradient_method


For Further Reading II

- Marko Makela. Survey of Bundle Methods, 2009. http://www.informaworld.com/smpp/content~db=all~content=a713741700
- Alexandre Belloni. An Introduction to Bundle Methods. http://web.mit.edu/belloni/www/LecturesIntroBundle.pdf
- John E. Mitchell. Cutting Plane and Subgradient Methods, 2005. http://www.optimization-online.org/DB_HTML/2009/05/2298.html


For Further Reading III

- Stephen Boyd. Lecture Notes on Subgradient Methods. http://www.stanford.edu/class/ee392o/subgrad_method.pdf
- Alexander J. Smola, S.V.N. Vishwanathan, Quoc V. Le. Bundle Methods for Machine Learning, 2007. http://books.nips.cc/papers/files/nips20/NIPS2007_0470.pdf
- C. Lemarechal. Variable Metric Bundle Methods, 1997. http://www.springerlink.com/index/3515WK428153171N.pdf
- Quoc Le, Alexander Smola. Direct Optimization of Ranking Measures, 2007. http://arxiv.org/abs/0704.3359


For Further Reading IV

- S.V.N. Vishwanathan, A. Smola. Quasi-Newton Methods for Efficient Large-Scale Machine Learning. http://portal.acm.org/ft_gateway.cfm?id=1390309&type=pdf and www.stat.purdue.edu/~vishy/talks/LBFGS.pdf
