
3B1B Optimization

Michaelmas 2014

Newton's method

Line search

Quasi-Newton methods

Least-Squares and Gauss-Newton methods

Downhill simplex (amoeba) algorithm

A. Zisserman

$$f(x, y) = e^x \left( 4x^2 + 2y^2 + 4xy + 2x + 1 \right)$$

[Figure: surface and contour plot of f(x, y)]

Rosenbrock's function

$$f(x, y) = 100\,(y - x^2)^2 + (1 - x)^2$$

[Figure: contour plot of the Rosenbrock function]

Minimum is at [1, 1]

Steepest descent

The method moves along the direction of the negative gradient, $\mathbf{x}_{n+1} = \mathbf{x}_n - \lambda_n \nabla f(\mathbf{x}_n)$, where $\lambda_n$ is chosen by a 1D line minimization. The 1D line minimization must be performed using one of the earlier methods (usually cubic polynomial interpolation); a sketch implementation follows below.
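By way of illustration, here is a minimal MATLAB sketch of steepest descent on the Rosenbrock function. The simple backtracking loop stands in for a proper cubic-interpolation line minimization, and the helper names (rosen, rosen_grad), start point and tolerance are assumptions made for this sketch.

% Steepest descent with a crude backtracking line search (illustrative sketch).
rosen      = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
rosen_grad = @(x) [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1));
                    200*(x(2) - x(1)^2)];

x = [-1.9; 2];                            % start point
for n = 1:5000
    g = rosen_grad(x);
    if norm(g) < 1e-3, break; end         % convergence test on the gradient
    d = -g;                               % steepest descent direction
    lambda = 1;                           % crude backtracking line search
    while rosen(x + lambda*d) > rosen(x) && lambda > 1e-12
        lambda = lambda/2;
    end
    x = x + lambda*d;
end
fprintf('x = [%.4f, %.4f] after %d iterations\n', x(1), x(2), n);

Running this typically takes thousands of iterations, which is exactly the slow valley-crawling behaviour shown in the figure below.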

Steepest Descent

[Figure: steepest descent iterates on the Rosenbrock function, with zoomed detail]

The algorithm crawls down the valley.

Criteria for comparing the methods:

1. Number of iterations required
3. Memory footprint
4. Region of convergence

Newton's method in 1D

Fit a quadratic approximation to f(x) using both first and second derivatives at x.

Expanding in a Taylor series:

$$f(x + \delta x) = f(x) + \delta x\, f'(x) + \frac{\delta x^2}{2}\, f''(x) + \text{h.o.t.}$$

Minimizing this quadratic over $\delta x$ gives

$$\delta x = -\frac{f'(x)}{f''(x)}$$

Update $x$:

$$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}$$
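A minimal MATLAB sketch of this 1D iteration; the example function f(x) = x^4 - 3x^2 + x and its hand-coded derivatives are assumptions chosen for illustration.

% 1D Newton's method: x_{n+1} = x_n - f'(x_n)/f''(x_n).
fp  = @(x) 4*x^3 - 6*x + 1;    % f'(x) for f(x) = x^4 - 3x^2 + x
fpp = @(x) 12*x^2 - 6;         % f''(x)

x = 2;                         % start point
for n = 1:20
    dx = -fp(x)/fpp(x);        % minimizer of the local quadratic model
    x  = x + dx;
    if abs(dx) < 1e-10, break; end
end
fprintf('stationary point at x = %.6f\n', x)

Note that the iteration finds a stationary point of f, which is a minimum only when f''(x) > 0 there.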

A function may be approximated locally by its Taylor series expansion

about a point $\mathbf{x}_0$:

$$f(\mathbf{x}_0 + \delta\mathbf{x}) \approx f(\mathbf{x}_0) + \left( \frac{\partial f}{\partial x},\; \frac{\partial f}{\partial y} \right) \begin{pmatrix} \delta x \\ \delta y \end{pmatrix} + \frac{1}{2} \left( \delta x,\; \delta y \right) \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix} \begin{pmatrix} \delta x \\ \delta y \end{pmatrix} + \text{h.o.t.}$$

In vector notation this is

$$f(\mathbf{x}_0 + \delta\mathbf{x}) = a + \mathbf{g}^\top \delta\mathbf{x} + \frac{1}{2}\, \delta\mathbf{x}^\top \mathbf{H}\, \delta\mathbf{x}$$

Newton's method in N dimensions

Expand $f(\mathbf{x})$ by its Taylor series about the point $\mathbf{x}_n$:

$$f(\mathbf{x}_n + \delta\mathbf{x}) \approx f(\mathbf{x}_n) + \mathbf{g}_n^\top \delta\mathbf{x} + \frac{1}{2}\, \delta\mathbf{x}^\top \mathbf{H}_n\, \delta\mathbf{x}$$

where the gradient is the vector

$$\mathbf{g}_n = \nabla f(\mathbf{x}_n) = \left[ \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_N} \right]^\top$$

and the Hessian is the symmetric matrix of second derivatives

$$\mathbf{H}_n = \mathbf{H}(\mathbf{x}_n) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_N} \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_1 \partial x_N} & \cdots & \frac{\partial^2 f}{\partial x_N^2} \end{pmatrix}$$

For a minimum we require the gradient of this quadratic model to vanish:

$$\nabla f(\mathbf{x}) = \mathbf{g}_n + \mathbf{H}_n\, \delta\mathbf{x} = \mathbf{0}$$

with solution $\delta\mathbf{x} = -\mathbf{H}_n^{-1} \mathbf{g}_n$. This gives the iterative update

$$\mathbf{x}_{n+1} = \mathbf{x}_n + \delta\mathbf{x} = \mathbf{x}_n - \mathbf{H}_n^{-1} \mathbf{g}_n$$

If f (x) is quadratic, then the solution is found in one step.

The method has quadratic convergence (as in the 1D case).

The step $\delta\mathbf{x} = -\mathbf{H}_n^{-1} \mathbf{g}_n$ is guaranteed to be a downhill direction (provided that $\mathbf{H}$ is positive definite).

For numerical reasons the inverse is not actually computed; instead $\delta\mathbf{x}$ is computed as the solution of $\mathbf{H}\, \delta\mathbf{x} = -\mathbf{g}_n$.

Rather than jump straight to $\mathbf{x}_n - \mathbf{H}_n^{-1} \mathbf{g}_n$, it is better to perform a line search, which ensures global convergence:

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \lambda_n \mathbf{H}_n^{-1} \mathbf{g}_n$$

If H = I then this reduces to steepest descent.
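Putting the pieces together, here is a minimal MATLAB sketch of the damped Newton iteration on the Rosenbrock function. As recommended above, the step solves H δx = -g with backslash rather than forming the inverse; rosen_hess and the other helper names, tolerances and start point are assumptions of this sketch.

% Newton's method with a backtracking line search (illustrative sketch).
rosen      = @(x) 100*(x(2) - x(1)^2)^2 + (1 - x(1))^2;
rosen_grad = @(x) [-400*x(1)*(x(2) - x(1)^2) - 2*(1 - x(1));
                    200*(x(2) - x(1)^2)];
rosen_hess = @(x) [1200*x(1)^2 - 400*x(2) + 2, -400*x(1);
                   -400*x(1),                   200];

x = [-1.9; 2];
for n = 1:100
    g = rosen_grad(x);
    if norm(g) < 1e-3, break; end
    dx = -(rosen_hess(x) \ g);            % solve H*dx = -g, no explicit inverse
    lambda = 1;                           % line search along the Newton step
    while rosen(x + lambda*dx) > rosen(x) && lambda > 1e-12
        lambda = lambda/2;
    end
    x = x + lambda*dx;
end
fprintf('x = [%.4f, %.4f] after %d iterations\n', x(1), x(2), n);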

Newton method with line search

[Figure: Newton's method with line search on the Rosenbrock function, showing the quadratic approximations at each iterate; gradient < 1e-3 after 15 iterations, compared with the slow convergence seen for steepest descent]

However, the method requires computing the Hessian matrix at each iteration, and this is not always feasible.

Quasi-Newton methods

If the problem size is large and the Hessian matrix is dense, then it may be infeasible or inconvenient to compute it directly.

Quasi-Newton methods instead build up an estimate of $\mathbf{H}(\mathbf{x})$, updated at each iteration using new gradient information.

Well-known schemes are due to Broyden, Fletcher, Goldfarb and Shanno (BFGS), and also Davidon, Fletcher and Powell (DFP).

For example, in 1D the derivatives can be estimated from finite differences of the function values $f_{-1}, f_0, f_1$ at the points $x_{-1} = x_0 - h$, $x_0$ and $x_{+1} = x_0 + h$.

First derivatives:

$$f'(x_0 - h/2) = \frac{f_0 - f_{-1}}{h}, \qquad f'(x_0 + h/2) = \frac{f_1 - f_0}{h}$$

Second derivative:

$$f''(x_0) = \frac{f'(x_0 + h/2) - f'(x_0 - h/2)}{h} = \frac{f_1 - 2 f_0 + f_{-1}}{h^2}$$
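A quick MATLAB check of the central-difference formula; the test function sin(x), step size and evaluation point are assumptions for this sketch.

% Central-difference estimate of the second derivative.
f   = @(x) sin(x);                             % example: exact f''(x) = -sin(x)
x0  = 0.7;  h = 1e-3;
fd2 = (f(x0 + h) - 2*f(x0) + f(x0 - h)) / h^2;
fprintf('estimate %.6f, exact %.6f\n', fd2, -sin(x0))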

Quasi-Newton: BFGS

Set $\mathbf{H}_0 = \mathbf{I}$. Update according to

$$\mathbf{H}_{n+1} = \mathbf{H}_n + \frac{\mathbf{q}_n \mathbf{q}_n^\top}{\mathbf{q}_n^\top \mathbf{s}_n} - \frac{(\mathbf{H}_n \mathbf{s}_n)(\mathbf{H}_n \mathbf{s}_n)^\top}{\mathbf{s}_n^\top \mathbf{H}_n \mathbf{s}_n}$$

where

$$\mathbf{s}_n = \mathbf{x}_{n+1} - \mathbf{x}_n, \qquad \mathbf{q}_n = \mathbf{g}_{n+1} - \mathbf{g}_n$$

The matrix itself is not stored, but rather represented compactly by

a few stored vectors.

The estimate $\mathbf{H}_{n+1}$ is used to form a local quadratic approximation, as before.
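For concreteness, a direct MATLAB transcription of the update (e.g. saved as bfgs_update.m). It stores the full matrix for clarity; a practical implementation would use the compact vector representation described above.

function H = bfgs_update(H, s, q)
% One BFGS update of the Hessian estimate H, given the step s = x_{n+1} - x_n
% and the gradient change q = g_{n+1} - g_n.
Hs = H * s;
H  = H + (q * q') / (q' * s) - (Hs * Hs') / (s' * Hs);
end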

Example

Rosenbrock function

[Figure: BFGS quasi-Newton iterates on the Rosenbrock function]

In Matlab, the optimization function fminunc uses a BFGS quasi-Newton method for medium-scale optimization problems.

Matlab fminunc

>> f='100*(x(2)-x(1)^2)^2+(1-x(1))^2';

>> GRAD='[100*(4*x(1)^3-4*x(1)*x(2))+2*x(1)-2; 100*(2*x(2)-2*x(1)^2) ]';

Choose options for BFGS quasi-Newton

>> OPTIONS=optimset('LargeScale','off', 'HessUpdate','bfgs' );

>> OPTIONS = optimset(OPTIONS,'gradobj','on');

Start point

>> x = [-1.9; 2];

>> [x,fval] = fminunc({f,GRAD},x,OPTIONS);

This produces

x = 0.9998, 0.9996

fval = 3.4306e-008

Non-linear least squares

It is very common in applications for a cost function f(x) to be the sum of a large number of squared residuals

$$f(\mathbf{x}) = \sum_{i=1}^{M} r_i^2$$

If each residual depends non-linearly on the parameters $\mathbf{x}$, the minimization of $f(\mathbf{x})$ is a non-linear least-squares problem.

Example: the goal is to fit a smooth curve $y(s, \mathbf{x})$ to measured data points $\{s_i, t_i\}$ (target value $t_i$ at sample position $s_i$) by minimizing the cost

$$f(\mathbf{x}) = \sum_{i=1}^{M} r_i^2 = \sum_{i=1}^{M} \left( y(s_i, \mathbf{x}) - t_i \right)^2$$

The curve might be a polynomial

$$y(s, \mathbf{x}) = x_0 + x_1 s + x_2 s^2 + \ldots$$

In this case the function is linear in the parameters $\mathbf{x}$ and there is a closed-form solution (see the sketch below). In general there will not be a closed-form solution for non-linear functions $y(s, \mathbf{x})$.
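For the linear case, a minimal MATLAB sketch of the closed-form fit; the synthetic data, true coefficients and noise level are assumptions for illustration.

% Closed-form linear least squares: fit y(s, x) = x0 + x1*s + x2*s^2.
s = linspace(0, 1, 50)';                   % sample positions s_i
t = 2 - s + 3*s.^2 + 0.05*randn(50, 1);    % noisy targets t_i
A = [ones(size(s)), s, s.^2];              % design matrix, one column per parameter
x = A \ t;                                 % minimizes ||A*x - t||^2 in closed form
disp(x')                                   % approximately [2, -1, 3]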

Cost function example: model-based image fitting.

Transformation parameters: 3D rotation matrix R and translation T.

Image generation: rotate and translate the 3D model by R and T, then project to generate an image I_{R,T}(x, y).

The cost is

$$f(\mathbf{x}) = \sum_{i=1}^{M} r_i^2 = \|\mathbf{r}\|^2$$

Define the $M \times N$ Jacobian of the residual vector $\mathbf{r}$ with respect to the parameters $\mathbf{x}$:

$$\mathbf{J}(\mathbf{x}) = \begin{pmatrix} \frac{\partial r_1}{\partial x_1} & \cdots & \frac{\partial r_1}{\partial x_N} \\ \vdots & \ddots & \vdots \\ \frac{\partial r_M}{\partial x_1} & \cdots & \frac{\partial r_M}{\partial x_N} \end{pmatrix}, \qquad \text{assume } M > N$$

Consider the first derivatives:

$$\frac{\partial}{\partial x_k} \sum_i r_i^2 = 2 \sum_i r_i \frac{\partial r_i}{\partial x_k}$$

Hence

$$\nabla f(\mathbf{x}) = 2\, \mathbf{J}^\top \mathbf{r}$$

Similarly, for the second derivatives:

$$\frac{\partial^2}{\partial x_l \partial x_k} \sum_i r_i^2 = 2 \sum_i \frac{\partial r_i}{\partial x_l} \frac{\partial r_i}{\partial x_k} + 2 \sum_i r_i \frac{\partial^2 r_i}{\partial x_k \partial x_l}$$

Hence

$$\mathbf{H}(\mathbf{x}) = 2\, \mathbf{J}^\top \mathbf{J} + 2 \sum_{i=1}^{M} r_i \mathbf{R}_i$$

where $\mathbf{R}_i$ is the matrix of second derivatives of the residual $r_i$.

In most problems, the residuals will typically be small.

Also, at the minimum, the residuals will typically be distributed with

mean = 0.

For these reasons, the second-order term is often ignored, giving the Gauss-Newton approximation to the Hessian:

$$\mathbf{H}(\mathbf{x}) \approx 2\, \mathbf{J}^\top \mathbf{J}$$

Hence, explicit computation of the full Hessian can again be avoided.

Example Gauss-Newton

The minimization of the Rosenbrock function

$$f(x, y) = 100\,(y - x^2)^2 + (1 - x)^2$$

can be written as a least-squares problem with residual vector

$$\mathbf{r} = \begin{pmatrix} r_1 \\ r_2 \end{pmatrix} = \begin{pmatrix} 10\,(y - x^2) \\ 1 - x \end{pmatrix}$$

with Jacobian

$$\mathbf{J}(\mathbf{x}) = \begin{pmatrix} \frac{\partial r_1}{\partial x} & \frac{\partial r_1}{\partial y} \\ \frac{\partial r_2}{\partial x} & \frac{\partial r_2}{\partial y} \end{pmatrix} = \begin{pmatrix} -20x & 10 \\ -1 & 0 \end{pmatrix}$$

The exact Hessian is

$$\mathbf{H}(\mathbf{x}) = \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix} = \begin{pmatrix} 1200 x^2 - 400 y + 2 & -400x \\ -400x & 200 \end{pmatrix}$$

while the Gauss-Newton approximation is

$$2\, \mathbf{J}^\top \mathbf{J} = 2 \begin{pmatrix} -20x & -1 \\ 10 & 0 \end{pmatrix} \begin{pmatrix} -20x & 10 \\ -1 & 0 \end{pmatrix} = \begin{pmatrix} 800 x^2 + 2 & -400x \\ -400x & 200 \end{pmatrix}$$

For a cost function $f(\mathbf{x})$ that is a sum of squared residuals

$$f(\mathbf{x}) = \sum_{i=1}^{M} r_i^2$$

the Hessian is approximated by

$$\mathbf{H}(\mathbf{x}) \approx 2\, \mathbf{J}^\top \mathbf{J}$$

and the gradient is given by

$$\nabla f(\mathbf{x}) = 2\, \mathbf{J}^\top \mathbf{r}$$

So the Newton update step

$$\mathbf{x}_{n+1} = \mathbf{x}_n + \delta\mathbf{x} = \mathbf{x}_n - \mathbf{H}_n^{-1} \mathbf{g}_n,$$

computed as $\mathbf{H}\, \delta\mathbf{x} = -\mathbf{g}_n$, becomes

$$\mathbf{J}^\top \mathbf{J}\, \delta\mathbf{x} = -\mathbf{J}^\top \mathbf{r}$$

These are called the normal equations.

The Gauss-Newton iteration, with a line search, is then

$$\mathbf{x}_{n+1} = \mathbf{x}_n - \lambda_n \mathbf{H}_n^{-1} \mathbf{g}_n \qquad \text{with} \qquad \mathbf{H}_n = 2\, \mathbf{J}_n^\top \mathbf{J}_n$$
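A minimal MATLAB sketch of this iteration, using the Rosenbrock residual vector and Jacobian derived in the example above; the backtracking line search and tolerances are assumptions of the sketch.

% Gauss-Newton with a backtracking line search on the Rosenbrock residuals.
res  = @(x) [10*(x(2) - x(1)^2); 1 - x(1)];     % residual vector r
jac  = @(x) [-20*x(1), 10; -1, 0];              % Jacobian of r
cost = @(x) sum(res(x).^2);

x = [-1.9; 2];
for n = 1:100
    r = res(x);  J = jac(x);
    if norm(2*J'*r) < 1e-3, break; end          % gradient = 2*J'*r
    dx = -((J'*J) \ (J'*r));                    % normal equations: J'J dx = -J'r
    lambda = 1;
    while cost(x + lambda*dx) > cost(x) && lambda > 1e-12
        lambda = lambda/2;
    end
    x = x + lambda*dx;
end
fprintf('x = [%.4f, %.4f] after %d iterations\n', x(1), x(2), n);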

Gauss-Newton method with line search

[Figure: Gauss-Newton with line search on the Rosenbrock function; gradient < 1e-3 after 14 iterations]

The method takes only 14 iterations.

Comparison

[Figure: Newton and Gauss-Newton with line search on the Rosenbrock function. Newton requires the exact Hessian (i.e. N^2 second derivatives); Gauss-Newton approximates the Hessian by the Jacobian product 2 J^T J.]

Downhill simplex (amoeba) algorithm

Due to Nelder and Mead (1965)

A direct method: only uses function evaluations (no derivatives).

A simplex is the polytope in N dimensions with N + 1 vertices, e.g. 2D: triangle; 3D: tetrahedron.

Basic idea: move the simplex by reflections, expansions or contractions.

[Figure: simplex moves: start, reflect, expand, contract]

Reorder the points so that $f(\mathbf{x}_{N+1}) > \cdots > f(\mathbf{x}_2) > f(\mathbf{x}_1)$ (i.e. $\mathbf{x}_{N+1}$ is the worst point).

Generate a trial point $\mathbf{x}_r$ by reflection:

$$\mathbf{x}_r = \bar{\mathbf{x}} + \alpha\,(\bar{\mathbf{x}} - \mathbf{x}_{N+1})$$

where $\bar{\mathbf{x}} = \sum_i \mathbf{x}_i / (N + 1)$ is the centroid and $\alpha > 0$. Compute $f(\mathbf{x}_r)$; there are then 3 possibilities:

1. $f(\mathbf{x}_1) < f(\mathbf{x}_r) < f(\mathbf{x}_N)$ (i.e. $\mathbf{x}_r$ is neither the new best nor the new worst point): replace $\mathbf{x}_{N+1}$ by $\mathbf{x}_r$.

2. $f(\mathbf{x}_r) < f(\mathbf{x}_1)$ (i.e. $\mathbf{x}_r$ is the new best point): assume the direction of reflection is good and generate a new point by expansion,

$$\mathbf{x}_e = \mathbf{x}_r + \gamma\,(\mathbf{x}_r - \bar{\mathbf{x}})$$

where $\gamma > 0$. If $f(\mathbf{x}_e) < f(\mathbf{x}_r)$ then replace $\mathbf{x}_{N+1}$ by $\mathbf{x}_e$; otherwise the expansion has failed, so replace $\mathbf{x}_{N+1}$ by $\mathbf{x}_r$.

3. $f(\mathbf{x}_r) > f(\mathbf{x}_N)$: assume the polytope is too large and generate a new point by contraction,

$$\mathbf{x}_c = \bar{\mathbf{x}} + \rho\,(\mathbf{x}_{N+1} - \bar{\mathbf{x}})$$

where $\rho$ ($0 < \rho < 1$) is the contraction coefficient. If $f(\mathbf{x}_c) < f(\mathbf{x}_{N+1})$ then the contraction has succeeded, so replace $\mathbf{x}_{N+1}$ by $\mathbf{x}_c$; otherwise contract again.

Standard values are $\alpha = 1$, $\gamma = 1$, $\rho = 0.5$. These rules are sketched in code below.
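A compact MATLAB transcription of these rules as a single simplex step (e.g. saved as simplex_step.m); looping it to convergence, and the repeated-contraction handling, are left out of this sketch.

function X = simplex_step(fun, X, alpha, gamma, rho)
% One downhill-simplex step. X is N x (N+1); columns are the simplex vertices.
fv = arrayfun(@(j) fun(X(:, j)), 1:size(X, 2));
[fv, order] = sort(fv);                  % sort vertices: best first,
X    = X(:, order);                      % worst last
xbar = mean(X, 2);                       % centroid of all N+1 vertices
xr   = xbar + alpha*(xbar - X(:, end));  % reflect the worst point
if fun(xr) < fv(1)                       % new best point: try an expansion
    xe = xr + gamma*(xr - xbar);
    if fun(xe) < fun(xr), X(:, end) = xe; else, X(:, end) = xr; end
elseif fun(xr) < fv(end-1)               % neither best nor worst: accept
    X(:, end) = xr;
else                                     % simplex too large: contract
    xc = xbar + rho*(X(:, end) - xbar);
    if fun(xc) < fv(end), X(:, end) = xc; end  % otherwise: contract again
end
end

For example, calling X = simplex_step(@(x) 100*(x(2)-x(1)^2)^2 + (1-x(1))^2, X, 1, 1, 0.5) repeatedly shrinks the simplex toward the Rosenbrock minimum.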

Example

Downhill Simplex

[Figure: downhill simplex iterates on the Rosenbrock function, with zoomed detail]

Matlab fminsearch with 200 iterations

[Figure: sequence of simplex moves (contractions and reflections) shown on the function surface]

Summary

no derivatives required

deals well with noise in the cost function

is able to crawl out of some local minima (though, of course, can still get stuck)

Matlab fminsearch

Nelder-Mead simplex direct search

>> banana = @(x)100*(x(2)-x(1)^2)^2+(1-x(1))^2;

Pass the function handle to fminsearch:

>> [x,fval] = fminsearch(banana,[-1.9, 2])

This produces

x = 1.0000 1.0000

fval = 4.0686e-010

What is next?

Move from general and quadratic optimization problems to linear programming and constrained optimization.
