Newton's Method
Constructing fixed-point iterations can require some ingenuity:
we need to rewrite f(x) = 0 in a form x = g(x), with appropriate
properties on g
To obtain a more generally applicable iterative method, let us
consider the following fixed-point iteration

    x_{k+1} = x_k - λ(x_k) f(x_k),   k = 0, 1, 2, ...

corresponding to g(x) = x - λ(x) f(x), for some function λ

A fixed point α of g yields a solution to f(α) = 0 (except possibly
when λ(α) = 0), which is what we're trying to achieve!

Newton's Method
Recall that the asymptotic convergence rate is dictated by |g′(α)|,
so we'd like to have |g′(α)| = 0 to get superlinear convergence
Suppose (as stated above) that f(α) = 0; then

    g′(α) = 1 - λ′(α) f(α) - λ(α) f′(α) = 1 - λ(α) f′(α)
Hence to satisfy g′(α) = 0 we choose λ(x) = 1/f′(x) to get
Newton's method:

    x_{k+1} = x_k - f(x_k)/f′(x_k),   k = 0, 1, 2, ...

Newton's Method
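As a concrete sketch, Newton's iteration can be implemented in a few lines (the test function f(x) = x^2 - 2 is an assumed example, not from the slides):

```python
import math

def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    """Newton's method: x_{k+1} = x_k - f(x_k)/f'(x_k)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        x -= fx / fprime(x)
    return x

# Example: the root of f(x) = x^2 - 2 is sqrt(2)
root = newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
```

Starting from x_0 = 1, only a handful of iterations are needed, reflecting the quadratic convergence discussed next.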
Based on fixed-point iteration theory, Newton's method is
convergent since |g′(α)| = 0 < 1
However, we need a different argument to understand the
superlinear convergence rate properly
To do this, we use a Taylor expansion for f(α) about x_k

    0 = f(α) = f(x_k) + (α - x_k) f′(x_k) + ((α - x_k)^2 / 2) f″(θ_k)

for some θ_k ∈ (α, x_k)

Newton's Method
Dividing through by f′(x_k) gives

    f(x_k)/f′(x_k) + (α - x_k) = -((α - x_k)^2 / 2) f″(θ_k)/f′(x_k)

or, since x_{k+1} = x_k - f(x_k)/f′(x_k),

    x_{k+1} - α = (f″(θ_k) / (2 f′(x_k))) (x_k - α)^2
Hence, roughly speaking, the error at iteration k + 1 is the square
of the error at iteration k
This is referred to as quadratic convergence, which is very rapid!
Key point: Once again we need to be sufficiently close to α to get
quadratic convergence (the result relied on a Taylor expansion near α)

Secant Method
An alternative to Newton's method is to approximate f′(x_k) using
the finite difference

    f′(x_k) ≈ (f(x_k) - f(x_{k-1})) / (x_k - x_{k-1})
Substituting this into the iteration leads to the secant method

    x_{k+1} = x_k - f(x_k) (x_k - x_{k-1}) / (f(x_k) - f(x_{k-1})),   k = 1, 2, 3, ...
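A sketch of the secant iteration (again with the assumed example f(x) = x^2 - 2, which is not from the slides):

```python
def secant(f, x0, x1, tol=1e-12, max_iter=50):
    """Secant method: Newton with f'(x_k) replaced by a finite difference."""
    for _ in range(max_iter):
        f0, f1 = f(x0), f(x1)
        if abs(f1) < tol:
            break
        x0, x1 = x1, x1 - f1 * (x1 - x0) / (f1 - f0)
    return x1

# Example: the root of f(x) = x^2 - 2, starting from x0 = 1, x1 = 2
root = secant(lambda x: x**2 - 2, 1.0, 2.0)
```

Note that two starting points are needed, since the first finite difference requires two function values.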
The main advantages of secant are:
> it does not require us to determine f′(x) analytically
> it requires only one extra function evaluation, f(x_k), per
iteration (Newton's method also requires f′(x_k))

Secant Method
As one may expect, secant converges faster than a fixed-point
iteration, but slower than Newton's method
In fact, it can be shown that for the secant method, we have

    lim_{k→∞} |x_{k+1} - α| / |x_k - α|^q = μ

where μ is a positive constant and q ≈ 1.6

Multivariate Case

Systems of Nonlinear Equations
We now consider fixed-point iterations and Newton's method for
systems of nonlinear equations
We suppose that F : R^n → R^n, n > 1, and we seek a root α ∈ R^n
such that F(α) = 0
In component form, this is equivalent to

    F_1(α) = 0
    F_2(α) = 0
    ⋮
    F_n(α) = 0

Fixed-Point Iteration
For a fixed-point iteration, we again seek to rewrite F(x) = 0 as
x = G(x) to obtain:

    x_{k+1} = G(x_k)
The convergence proof is the same as in the scalar case, if we
replace | · | with || · ||

i.e. if ||G(x) - G(y)|| ≤ L ||x - y||, then ||x_k - α|| ≤ L^k ||x_0 - α||

Hence, as before, if G is a contraction it will converge to a fixed
point α

Fixed-Point Iteration
Recall that we define the Jacobian matrix, J_G ∈ R^{n×n}, to be
(J_G)_{ij} = ∂G_i/∂x_j,   i, j = 1, ..., n
If ||J_G(α)||_∞ < 1, then there is some neighborhood of α for which
the fixed-point iteration converges to α

The proof of this is a natural extension of the corresponding scalar
result

Fixed-Point Iteration
Once again, we can employ a fixed point iteration to solve
F(x) =0
e.g. consider

    x1^2 + x2^2 - 1 = 0
    5 x1^2 + 21 x2^2 - 9 = 0

This can be rearranged to x1 = √(1 - x2^2), x2 = √((9 - 5 x1^2)/21)

Fixed-Point Iteration
Hence, we define

    G_1(x1, x2) = √(1 - x2^2),   G_2(x1, x2) = √((9 - 5 x1^2)/21)
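A quick sketch of this fixed-point iteration in Python (the starting guess (0.5, 0.5) is an arbitrary assumption):

```python
import math

def G(x1, x2):
    """Fixed-point map from the rearranged system."""
    return math.sqrt(1 - x2**2), math.sqrt((9 - 5 * x1**2) / 21)

x1, x2 = 0.5, 0.5
for _ in range(100):
    x1, x2 = G(x1, x2)

# Converges to (sqrt(3)/2, 1/2), which satisfies both original equations
```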
This yields a convergent iterative method

Newton's Method
As in the one-dimensional case, Newton's method is generally more
useful than a standard fixed-point iteration
The natural generalization of Newton's method is

    x_{k+1} = x_k - J_F(x_k)^{-1} F(x_k),   k = 0, 1, 2, ...
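A minimal sketch of this iteration with numpy, applied to the 2 × 2 system from the fixed-point example (the starting guess [1, 1] is an assumption):

```python
import numpy as np

def newton_system(F, J, x0, tol=1e-12, max_iter=50):
    """Multivariate Newton: solve J_F(x_k) dx_k = -F(x_k), x_{k+1} = x_k + dx_k."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        Fx = F(x)
        if np.linalg.norm(Fx, np.inf) < tol:
            break
        x += np.linalg.solve(J(x), -Fx)
    return x

# System: x1^2 + x2^2 - 1 = 0,  5 x1^2 + 21 x2^2 - 9 = 0
F = lambda x: np.array([x[0]**2 + x[1]**2 - 1, 5*x[0]**2 + 21*x[1]**2 - 9])
J = lambda x: np.array([[2*x[0], 2*x[1]], [10*x[0], 42*x[1]]])
root = newton_system(F, J, [1.0, 1.0])  # converges to (sqrt(3)/2, 1/2)
```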
Note that to put Newton's method in the standard form for a
linear system, we write

    J_F(x_k) Δx_k = -F(x_k),   k = 0, 1, 2, ...

where Δx_k = x_{k+1} - x_k

Newton's Method
Once again, if x_0 is sufficiently close to α, then Newton's method
converges quadratically; we sketch the proof below
This result again relies on Taylor's Theorem
Hence we first consider how to generalize the familiar
one-dimensional Taylor's Theorem to R^n

First, we consider the case F : R^n → R

Multivariate Taylor Theorem
Let g(s) ≡ F(x + sδ); then the one-dimensional Taylor's Theorem yields

    g(1) = g(0) + Σ_{ℓ=1}^{k} g^{(ℓ)}(0)/ℓ! + g^{(k+1)}(η)/(k+1)!,   η ∈ (0, 1)

Also, we have

    g(0) = F(x),   g(1) = F(x + δ)

    g′(s) = (∂F(x + sδ)/∂x_1) δ_1 + ... + (∂F(x + sδ)/∂x_n) δ_n

    g″(s) = Σ_{i,j=1}^{n} (∂^2 F(x + sδ)/∂x_i ∂x_j) δ_i δ_j

and similarly for higher derivatives

Multivariate Taylor Theorem
Hence, we have

    F(x + δ) = F(x) + Σ_{ℓ=1}^{k} U_ℓ(x)/ℓ! + E_k,

where

    U_ℓ(x) ≡ (δ_1 ∂/∂x_1 + ... + δ_n ∂/∂x_n)^ℓ F(x),   ℓ = 1, 2, ..., k

and

    E_k = U_{k+1}(x + ηδ)/(k+1)!,   η ∈ (0, 1)

Multivariate Taylor Theorem
Let M be an upper bound on the abs. values of all derivatives of
order k + 1; then

    |E_k| ≤ (1/(k+1)!) n^{k+1} M ||δ||_∞^{k+1}

where the factor n^{k+1} follows from the fact that there are n^{k+1} terms
in the inner product (i.e. there are n^{k+1} derivatives of order k + 1)

Multivariate Taylor Theorem
We shall only need an expansion up to first-order terms for analysis
of Newton's method

From our expression above, we can write the first-order Taylor
expansion succinctly as:

    F(x + δ) = F(x) + ∇F(x)^T δ + E_1

Multivariate Taylor Theorem
For F : R^n → R^n, a Taylor expansion follows by developing a Taylor
expansion for each F_i; hence

    F_i(x + δ) = F_i(x) + ∇F_i(x)^T δ + E_{1,i}

so that for F : R^n → R^n we have

    F(x + δ) = F(x) + J_F(x) δ + E_F

where ||E_F||_∞ ≤ max_{1≤i≤n} |E_{1,i}| ≤ (n^2/2) ( max_{1≤i,j,k≤n} |∂^2 F_i(x)/∂x_j ∂x_k| ) ||δ||_∞^2

Newton's Method
We now return to Newton's method
We have

    0 = F(α) = F(x_k) + J_F(x_k)[α - x_k] + E_F

so that

    x_k - α = J_F(x_k)^{-1} F(x_k) + J_F(x_k)^{-1} E_F
Also, the Newton iteration itself can be rewritten as

    J_F(x_k)[x_{k+1} - α] = J_F(x_k)[x_k - α] - F(x_k)

Hence, we obtain

    x_{k+1} - α = J_F(x_k)^{-1} E_F

so that ||x_{k+1} - α||_∞ ≤ const. ||x_k - α||_∞^2,   i.e. quadratic
convergence!

Newton's Method
Example: Newton's method for the two-point Gauss quadrature
rule
Recall the system of equations

    F_1(x1, x2, w1, w2) = w1 + w2 - 2 = 0
    F_2(x1, x2, w1, w2) = w1 x1 + w2 x2 = 0
    F_3(x1, x2, w1, w2) = w1 x1^2 + w2 x2^2 - 2/3 = 0
    F_4(x1, x2, w1, w2) = w1 x1^3 + w2 x2^3 = 0

Newton's Method
We can solve this in Python using our own implementation of
Newton's method

To do this, we require the Jacobian of this system:

    J_F(x1, x2, w1, w2) =
        [ 0           0           1      1    ]
        [ w1          w2          x1     x2   ]
        [ 2 w1 x1     2 w2 x2     x1^2   x2^2 ]
        [ 3 w1 x1^2   3 w2 x2^2   x1^3   x2^3 ]

Newton's Method
Alternatively, we can use Python's built-in fsolve function from
scipy.optimize

Note that fsolve computes a finite difference approximation to
the Jacobian by default

(Or we can pass in an analytical Jacobian if we want)

Matlab has an equivalent fsolve function.

Newton's Method
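A sketch of the fsolve approach on the Gauss quadrature system (the variable ordering (x1, x2, w1, w2) is assumed):

```python
import numpy as np
from scipy.optimize import fsolve

def F(v):
    x1, x2, w1, w2 = v
    return [w1 + w2 - 2,
            w1*x1 + w2*x2,
            w1*x1**2 + w2*x2**2 - 2/3,
            w1*x1**3 + w2*x2**3]

sol = fsolve(F, [-1.0, 1.0, 1.0, 1.0])  # nodes -1/sqrt(3), 1/sqrt(3); weights 1, 1
```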
Python example: With either approach, and with starting guess
x_0 = [-1, 1, 1, 1], we get

x =
-0.577350269189626
0.577350269189626
1.000000000000000
1.000000000000000

Conditions for Optimality

Existence of Global Minimum
In order to guarantee existence and uniqueness of a global min. we
need to make assumptions about the objective function
e.g. if f is continuous on a closed¹ and bounded set S ⊂ R^n, then
it has a global minimum in S
In one dimension, this says f achieves a minimum on the interval
[a, b] ⊂ R

In general f does not achieve a minimum on the open interval (a, b),
e.g. consider f(x) = x
(Though inf_{x ∈ (a,b)} f(x), the largest lower bound of f on (a, b), is
well-defined)
¹A set is closed if it contains its own boundary

Existence of Global Minimum
Another helpful concept for existence of global min. is coercivity
A continuous function f on an unbounded set S ⊆ R^n is coercive if

    lim_{||x||→∞} f(x) = +∞

That is, f(x) must be large whenever ||x|| is large

Existence of Global Minimum
If f is coercive on a closed, unbounded² set S, then f has a global
minimum in S
Proof: From the definition of coercivity, for any M ∈ R, there exists
r > 0 such that f(x) ≥ M for all x ∈ S with ||x|| > r

Suppose that 0 ∈ S, and set M = f(0)

Let Y ≡ {x ∈ S : ||x|| > r}, so that f(x) ≥ f(0) for all x ∈ Y

And we already know that f has a minimum on the closed, bounded set
{x ∈ S : ||x|| ≤ r}; this minimum is at most f(0), so it is a global
minimum of f on S □

Examples:
> f(x) = x is not coercive on R (f → -∞ for x → -∞)
> f(x, y) = x^2 + y^2 is coercive on R^2 (global min. at (0, 0))
> f(x) = e^x is not coercive on R (f → 0 for x → -∞)

Convexity
An important concept for uniqueness is convexity
A set S CR” is convex if it contains the line segment between any
two of its points
That is, S is convex if for any x, y ∈ S, we have

    {θx + (1 - θ)y : θ ∈ [0, 1]} ⊆ S
Similarly, we define convexity of a function f : S ⊆ R^n → R

f is convex if its graph along any line segment in S is on or below
the chord connecting the function values

i.e. f is convex if for any x, y ∈ S and any θ ∈ (0, 1), we have

    f(θx + (1 - θ)y) ≤ θ f(x) + (1 - θ) f(y)

Also, if

    f(θx + (1 - θ)y) < θ f(x) + (1 - θ) f(y)

then f is strictly convex

Convexity
[Figures: a strictly convex function, a non-convex function, and a
convex (but not strictly convex) function]

Convexity
If f is a convex function on a convex set S, then any local
minimum of f must be a global minimum³

Proof: Suppose x is a local minimum, i.e. f(x) ≤ f(y) for
y ∈ B(x, ε) (where B(x, ε) ≡ {y ∈ S : ||y - x|| ≤ ε})

Suppose that x is not a global minimum, i.e. that there exists
w ∈ S such that f(w) < f(x)

(Then we will show that this gives a contradiction)

³A global minimum is defined as a point z such that f(z) ≤ f(x) for all
x ∈ S. Note that a global minimum may not be unique, e.g. if f(x) = -cos x
then 0 and 2π are both global minima.

Convexity
Proof (continued ...):

For θ ∈ (0, 1] we have f(θw + (1 - θ)x) ≤ θ f(w) + (1 - θ) f(x)

Let σ ∈ (0, 1] be sufficiently small so that

    z ≡ σw + (1 - σ)x ∈ B(x, ε)

Then

    f(z) ≤ σ f(w) + (1 - σ) f(x) < σ f(x) + (1 - σ) f(x) = f(x)

i.e. f(z) < f(x), which contradicts that f(x) is a local minimum!

Hence we cannot have w ∈ S such that f(w) < f(x) □

Convexity
Note that convexity does not guarantee uniqueness of global
minimum
e.g. a convex function can clearly have a "horizontal" section (see
earlier plot)
If f is a strictly convex function on a convex set S, then a local
minimum of f is the unique global minimum
Optimization of convex functions over convex sets is called convex
optimization, which is an important subfield of optimization

Optimality Conditions
We have discussed existence and uniqueness of minima, but
haven't considered how to find a minimum
The familiar optimization idea from calculus in one dimension is:
set the derivative to zero, and check the sign of the second derivative
This can be generalized to R^n

Optimality Conditions
If f : R^n → R is differentiable, then the gradient vector
∇f : R^n → R^n is

    ∇f(x) ≡ [∂f(x)/∂x_1, ..., ∂f(x)/∂x_n]^T

The importance of the gradient is that ∇f points "uphill," i.e.
towards points with larger values than f(x)

And similarly -∇f points "downhill"

Optimality Conditions
This follows from Taylor's theorem for f : R^n → R

Recall that

    f(x + δ) = f(x) + ∇f(x)^T δ + H.O.T.

Let δ ≡ -ε∇f(x) for ε > 0 and suppose that ∇f(x) ≠ 0; then:

    f(x - ε∇f(x)) ≈ f(x) - ε ∇f(x)^T ∇f(x) < f(x)

Also, we see from Cauchy-Schwarz that -∇f(x) is the steepest
descent direction

Optimality Conditions
Similarly, we see that a necessary condition for a local minimum at
x* ∈ S is that ∇f(x*) = 0

In this case there is no "downhill direction" at x*

The condition ∇f(x*) = 0 is called a first-order necessary
condition for optimality, since it only involves first derivatives

Optimality Conditions
A point x* ∈ S that satisfies the first-order optimality condition is
called a critical point of f
But of course a critical point can be a local min., local max., or
saddle point

(Recall that a saddle point is where some directions are "downhill"
and others are "uphill", e.g. (x, y) = (0, 0) for f(x, y) = x^2 - y^2)

Optimality Conditions
As in the one-dimensional case, we can look to second derivatives
to classify critical points
If f : R^n → R is twice differentiable, then the Hessian is the
matrix-valued function H_f : R^n → R^{n×n}

    H_f(x) = [ ∂^2 f(x)/∂x_1∂x_1   ...   ∂^2 f(x)/∂x_1∂x_n ]
             [        ⋮                         ⋮          ]
             [ ∂^2 f(x)/∂x_n∂x_1   ...   ∂^2 f(x)/∂x_n∂x_n ]

The Hessian is the Jacobian matrix of the gradient ∇f : R^n → R^n

If the second partial derivatives of f are continuous, then
∂^2 f/∂x_i∂x_j = ∂^2 f/∂x_j∂x_i, and H_f is symmetric

Optimality Conditions
Suppose we have found a critical point x*, so that ∇f(x*) = 0

From Taylor's Theorem, for δ ∈ R^n, we have

    f(x* + δ) = f(x*) + ∇f(x*)^T δ + (1/2) δ^T H_f(x* + ηδ) δ
              = f(x*) + (1/2) δ^T H_f(x* + ηδ) δ

for some η ∈ (0, 1)

Optimality Conditions
Recall positive definiteness: A is positive definite if x^T A x > 0
for all x ≠ 0

Suppose H_f(x*) is positive definite

Then (by continuity) H_f(x* + ηδ) is also positive definite for ||δ||
sufficiently small, so that: δ^T H_f(x* + ηδ) δ > 0

Hence, we have f(x* + δ) > f(x*) for ||δ|| sufficiently small, i.e.
x* is a local minimum

Hence, in general, positive definiteness of H_f at a critical point x*
is a second-order sufficient condition for a local minimum

Optimality Conditions
A matrix can also be negative definite: x^T A x < 0 for all x ≠ 0

Or indefinite: there exist x, y such that x^T A x < 0 < y^T A y

Then we can classify critical points as follows:
> H_f(x*) positive definite ⟹ x* is a local minimum
> H_f(x*) negative definite ⟹ x* is a local maximum
> H_f(x*) indefinite ⟹ x* is a saddle point

Optimality Conditions
Also, positive definiteness of the Hessian is closely related to
convexity of f

If H_f(x) is positive definite, then f is convex on some convex
neighborhood of x

If H_f(x) is positive definite for all x ∈ S, where S is a convex set,
then f is convex on S

Question: How do we test for positive definiteness?

Optimality Conditions
Answer: A is positive (resp. negative) definite if and only if all
eigenvalues of A are positive (resp. negative)⁴

Also, a matrix with both positive and negative eigenvalues is indefinite

Hence we can compute all the eigenvalues of A and check their
signs

⁴This is related to the Rayleigh quotient, see Unit V

Example
Consider

    f(x) = 2 x1^3 + 3 x1^2 + 12 x1 x2 + 3 x2^2 - 6 x2 + 6

Then

    ∇f(x) = [ 6 x1^2 + 6 x1 + 12 x2 ]
            [ 12 x1 + 6 x2 - 6      ]

We set ∇f(x) = 0 to find the critical points⁵ [1, -1]^T and [2, -3]^T

⁵In general solving ∇f(x) = 0 requires an iterative method

Example
The Hessian is

    H_f(x) = [ 12 x1 + 6   12 ]
             [ 12           6 ]

and hence

    H_f(1, -1) = [ 18  12 ]
                 [ 12   6 ],  which has eigenvalues 25.4, -1.4

    H_f(2, -3) = [ 30  12 ]
                 [ 12   6 ],  which has eigenvalues 35.0, 1.0
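These eigenvalue checks are easy to script; a sketch using the Hessians above:

```python
import numpy as np

def classify(H):
    """Classify a critical point from the eigenvalues of its symmetric Hessian."""
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > 0):
        return "local minimum"
    if np.all(eig < 0):
        return "local maximum"
    if eig.min() < 0 < eig.max():
        return "saddle point"
    return "inconclusive"  # some zero eigenvalues: the second-order test fails

print(classify([[30, 12], [12, 6]]))  # H_f(2, -3): local minimum
print(classify([[18, 12], [12, 6]]))  # H_f(1, -1): saddle point
```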
Hence [2, -3]^T is a local min. whereas [1, -1]^T is a saddle point

Optimality Conditions: Equality Constrained Case
So far we have ignored constraints

Let us now consider equality constrained optimization

    min_{x ∈ R^n} f(x) subject to g(x) = 0,

where f : R^n → R and g : R^n → R^m, with m ≥ 1 equality constraints

Then g : R^n → R^m and we have a set of constraint gradient
vectors, ∇g_i, i = 1, ..., m

Then we have S ≡ {x ∈ R^n : g_i(x) = 0, i = 1, ..., m}

Any "tangent direction" at x ∈ S must be orthogonal to all
gradient vectors {∇g_i(x), i = 1, ..., m} to remain in S

Optimality Conditions: Equality Constrained Case
Let T(x) ≡ {v ∈ R^n : ∇g_i(x)^T v = 0, i = 1, ..., m} denote the
orthogonal complement of {∇g_i(x), i = 1, ..., m}

Then, for δ ∈ T(x) and ε ∈ R_{>0}, εδ is a step in a "tangent
direction" of S at x

Since we have

    f(x* + εδ) = f(x*) + ε ∇f(x*)^T δ + H.O.T.

it follows that for a stationary point we need ∇f(x*)^T δ = 0 for all
δ ∈ T(x*)

Optimality Conditions: Equality Constrained Case
Hence, we require that at a stationary point x* ∈ S we have

    ∇f(x*) ∈ span{∇g_i(x*), i = 1, ..., m}

This can be written succinctly as a linear system

    ∇f(x*) = (J_g(x*))^T λ*

for some λ* ∈ R^m, where (J_g(x*))^T ∈ R^{n×m}

This follows because the columns of (J_g(x*))^T are the vectors
{∇g_i(x*), i = 1, ..., m}

Optimality Conditions: Equality Constrained Case
We can write equality constrained optimization problems more
succinctly by introducing the Lagrangian function, L : R^{n+m} → R,

    L(x, λ) ≡ f(x) + λ^T g(x) = f(x) + λ_1 g_1(x) + ... + λ_m g_m(x)

Then we have

    ∂L/∂x_i = ∂f/∂x_i + Σ_{k=1}^{m} λ_k ∂g_k/∂x_i,   i = 1, ..., n

    ∂L/∂λ_i = g_i(x),   i = 1, ..., m

Optimality Conditions: Equality Constrained Case
Hence

    ∇L(x, λ) = [ ∇_x L(x, λ) ]  =  [ ∇f(x) + (J_g(x))^T λ ]
               [ ∇_λ L(x, λ) ]     [ g(x)                 ],

so that the first-order necessary condition for optimality for the
constrained problem can be written as a nonlinear system:⁷

    ∇L(x, λ) = [ ∇f(x) + (J_g(x))^T λ ]  =  0
               [ g(x)                 ]

⁷n + m variables, n + m equations

Optimality Conditions: Equality Constrained Case
As another example of equality constrained optimization, recall our
underdetermined linear least squares problem

    min_b f(b) subject to g(b) = 0,

where f(b) = b^T b, g(b) = Ab - y, and A ∈ R^{m×n} with m < n

Quasi-Newton Methods
Newton's method is effective for optimization, but it can be
unreliable, expensive, and complicated

> Unreliable: only converges when sufficiently close to a
minimum
> Expensive: the Hessian H_f is dense in general, hence very
expensive if n is large
> Complicated: can be impractical or laborious to derive the
Hessian

Hence there has been much interest in so-called quasi-Newton
methods, which do not require the Hessian

Quasi-Newton Methods
General form of quasi-Newton methods:

    x_{k+1} = x_k - α_k B_k^{-1} ∇f(x_k)

where α_k is a line search parameter and B_k is some approximation
to the Hessian

Quasi-Newton methods generally lose the quadratic convergence of
Newton's method, but often achieve superlinear convergence

We now consider some specific quasi-Newton methods

BFGS
The Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is one of
the most popular quasi-Newton methods:

1: choose initial guess x_0
2: choose B_0, initial Hessian guess, e.g. B_0 = I
3: for k = 0, 1, 2, ... do
4:   solve B_k s_k = -∇f(x_k)
5:   x_{k+1} = x_k + s_k
6:   y_k = ∇f(x_{k+1}) - ∇f(x_k)
7:   B_{k+1} = B_k + ΔB_k
8: end for

where

    ΔB_k = (y_k y_k^T)/(y_k^T s_k) - (B_k s_k s_k^T B_k)/(s_k^T B_k s_k)

BFGS
Actual implementation of BFGS: store and update the inverse Hessian
to avoid solving a linear system:

1: choose initial guess x_0
2: choose H_0, initial inverse Hessian guess, e.g. H_0 = I
3: for k = 0, 1, 2, ... do
4:   calculate s_k = -H_k ∇f(x_k)
5:   x_{k+1} = x_k + s_k
6:   y_k = ∇f(x_{k+1}) - ∇f(x_k)
7:   compute H_{k+1}
8: end for

where

    H_{k+1} = (I - ρ_k s_k y_k^T) H_k (I - ρ_k y_k s_k^T) + ρ_k s_k s_k^T,   ρ_k = 1/(y_k^T s_k)

BFGS
BFGS is implemented as the fmin_bfgs function in
scipy.optimize
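For example, via scipy's general minimize interface (a sketch; the Himmelblau function is written out here as an assumption, since the slides reference it only by name, and the starting guess is arbitrary):

```python
import numpy as np
from scipy.optimize import minimize

def himmelblau(x):
    """Standard Himmelblau test function; all four minima have f = 0."""
    return (x[0]**2 + x[1] - 11)**2 + (x[0] + x[1]**2 - 7)**2

res = minimize(himmelblau, x0=[5.0, 5.0], method="BFGS")
print(res.x, res.fun)  # one of Himmelblau's four minima, f close to 0
```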
Also, BFGS (+ trust region) is implemented in Matlab's fminunc
function, e.g.

x0 = [5;5];
options = optimset('GradObj','on');
[x,fval,exitflag,output] = ...
    fminunc(@himmelblau_function,x0,options);

Conjugate Gradient Method
The conjugate gradient (CG) method is another alternative to
Newton's method that does not require the Hessian:

1: choose initial guess x_0
2: g_0 = ∇f(x_0)
3: s_0 = -g_0
4: for k = 0, 1, 2, ... do
5:   choose η_k to minimize f(x_k + η_k s_k)
6:   x_{k+1} = x_k + η_k s_k
7:   g_{k+1} = ∇f(x_{k+1})
8:   β_{k+1} = (g_{k+1}^T g_{k+1})/(g_k^T g_k)
9:   s_{k+1} = -g_{k+1} + β_{k+1} s_k
10: end for

Constrained Optimization

Equality Constrained Optimization
We now consider equality constrained minimization:

    min_{x ∈ R^n} f(x) subject to g(x) = 0,

where f : R^n → R and g : R^n → R^m

With the Lagrangian L(x, λ) = f(x) + λ^T g(x), we recall that the
first-order necessary condition for optimality is

    ∇L(x, λ) = [ ∇f(x) + (J_g(x))^T λ ]  =  0
               [ g(x)                 ]

Once again, this is a nonlinear system of equations that can be
solved via Newton's method

Sequential Quadratic Programming
To derive the Jacobian of this system, we write

    ∇L(x, λ) = [ ∇f(x) + Σ_{k=1}^{m} λ_k ∇g_k(x) ]  ∈ R^{n+m}
               [ g(x)                            ]

Then we need to differentiate with respect to x ∈ R^n and λ ∈ R^m

For i = 1, ..., n, we have

    (∇L(x, λ))_i = ∂f(x)/∂x_i + Σ_{k=1}^{m} λ_k ∂g_k(x)/∂x_i

Differentiating with respect to x_j, for i, j = 1, ..., n, gives

    ∂(∇L(x, λ))_i/∂x_j = ∂^2 f(x)/∂x_i∂x_j + Σ_{k=1}^{m} λ_k ∂^2 g_k(x)/∂x_i∂x_j

Sequential Quadratic Programming
Hence the top-left n × n block of the Jacobian of ∇L(x, λ) is

    B(x, λ) ≡ H_f(x) + Σ_{k=1}^{m} λ_k H_{g_k}(x) ∈ R^{n×n}

Differentiating (∇L(x, λ))_i with respect to λ_j, for i = 1, ..., n,
j = 1, ..., m, gives

    ∂(∇L(x, λ))_i/∂λ_j = ∂g_j(x)/∂x_i

Hence the top-right n × m block of the Jacobian of ∇L(x, λ) is

    (J_g(x))^T ∈ R^{n×m}

Sequential Quadratic Programming
For i = n + 1, ..., n + m, we have

    (∇L(x, λ))_i = g_{i-n}(x)

Differentiating (∇L(x, λ))_i with respect to x_j, for i = n + 1, ..., n + m,
j = 1, ..., n, gives

    ∂(∇L(x, λ))_i/∂x_j = ∂g_{i-n}(x)/∂x_j

Hence the bottom-left m × n block of the Jacobian of ∇L(x, λ) is

    J_g(x) ∈ R^{m×n}

and the final m × m bottom-right block is just zero
(differentiation of g_i(x) with respect to λ_j)

Sequential Quadratic Programming
Hence, we have derived the following Jacobian matrix for
∇L(x, λ):

    [ B(x, λ)   (J_g(x))^T ]  ∈ R^{(n+m)×(n+m)}
    [ J_g(x)    0          ]

Note the 2 × 2 block structure of this matrix (matrices with this
structure are often called KKT matrices⁸)

⁸Karush, Kuhn, Tucker: did seminal work on nonlinear optimization

Sequential Quadratic Programming
Therefore, Newton's method for ∇L(x, λ) = 0 is:

    [ B(x_k, λ_k)   (J_g(x_k))^T ] [ s_k ]  =  - [ ∇f(x_k) + (J_g(x_k))^T λ_k ]
    [ J_g(x_k)      0            ] [ δ_k ]       [ g(x_k)                     ]

for k = 0, 1, 2, ...

Here (s_k, δ_k) ∈ R^{n+m} is the kth Newton step

Sequential Quadratic Programming
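When f is quadratic and g is linear, B and J_g are constant and a single KKT solve is exact. A small sketch (the problem min x1^2 + x2^2 subject to x1 + x2 = 1 is an assumed example; its solution is (1/2, 1/2)):

```python
import numpy as np

# min f(x) = x1^2 + x2^2  subject to  g(x) = x1 + x2 - 1 = 0
B = 2 * np.eye(2)             # Hessian of the Lagrangian (here just H_f)
Jg = np.array([[1.0, 1.0]])   # Jacobian of g
KKT = np.block([[B, Jg.T], [Jg, np.zeros((1, 1))]])

x, lam = np.zeros(2), np.zeros(1)
# Right-hand side: -[grad f + Jg^T lam; g(x)] at the current iterate
rhs = -np.concatenate([2 * x + Jg.T @ lam, [x.sum() - 1.0]])
step = np.linalg.solve(KKT, rhs)
x = x + step[:2]
lam = lam + step[2:]
print(x, lam)  # x = [0.5, 0.5], lam = [-1.0]
```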
Now, consider the constrained minimization problem, where
(x_k, λ_k) is our Newton iterate at step k:

    min_s { (1/2) s^T B(x_k, λ_k) s + s^T (∇f(x_k) + (J_g(x_k))^T λ_k) }

    subject to J_g(x_k) s + g(x_k) = 0

The objective function is quadratic in s (here x_k, λ_k are constants)

This minimization problem has Lagrangian

    L_k(s, δ) ≡ (1/2) s^T B(x_k, λ_k) s + s^T (∇f(x_k) + (J_g(x_k))^T λ_k)
                - δ^T (J_g(x_k) s + g(x_k))

Sequential Quadratic Programming
Then solving ∇L_k(s, δ) = 0 (i.e. the first-order necessary conditions)
gives a linear system, which is the same as the kth Newton step

Hence at each step of Newton's method, we exactly solve a
minimization problem (quadratic objective fn., linear constraints)

An optimization problem of this type is called a quadratic program

This motivates the name for applying Newton's method to
∇L(x, λ) = 0: Sequential Quadratic Programming (SQP)

Sequential Quadratic Programming
SQP is an important method, and there are many issues to be
considered to obtain an efficient and reliable implementation:

> Efficient solution of the linear systems at each Newton
iteration (the matrix block structure can be exploited)
> Quasi-Newton approximations to the Hessian (as in the
unconstrained case)
> Trust region, line search etc. to improve robustness
> Treatment of constraints (equality and inequality) during the
iterative process
> Selection of a good starting guess for λ

Penalty Methods
Another computational strategy for constrained optimization is to
employ penalty methods
This converts a constrained problem into an unconstrained problem
Key idea: introduce a new objective function which is a weighted
sum of the objective function and the constraint

Penalty Methods
Given the minimization problem

    min_x f(x) subject to g(x) = 0,   (*)

we can consider the related unconstrained problem

    min_x φ_ρ(x) ≡ f(x) + (ρ/2) g(x)^T g(x)   (**)

Let x* and x*_ρ denote the solutions of (*) and (**), respectively

Under appropriate conditions, it can be shown that

    lim_{ρ→∞} x*_ρ = x*

Penalty Methods
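A sketch of the penalty approach on a small assumed example (min x1^2 + x2^2 subject to x1 + x2 - 1 = 0, whose exact solution is (1/2, 1/2)), increasing ρ and warm-starting each solve:

```python
import numpy as np
from scipy.optimize import minimize

f = lambda x: x[0]**2 + x[1]**2
g = lambda x: x[0] + x[1] - 1.0      # single equality constraint

def phi(x, rho):
    """Penalty objective: f(x) + (rho/2) g(x)^T g(x)."""
    return f(x) + 0.5 * rho * g(x)**2

x = np.zeros(2)
for rho in [1.0, 10.0, 100.0, 1000.0]:
    x = minimize(lambda v: phi(v, rho), x).x  # warm start from previous rho
print(x)  # approaches the constrained solution (0.5, 0.5) as rho grows
```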
In practice, we can solve the unconstrained problem for a large
value of ρ to get a good approximation of x*

Another strategy is to solve for a sequence of penalty parameters
ρ_k, where x*_{ρ_k} serves as a starting guess for x*_{ρ_{k+1}}

Note that the major drawback of penalty methods is that a large
factor ρ will increase the condition number of the Hessian H_{φ_ρ}

On the other hand, penalty methods can be convenient, primarily
due to their simplicity

Linear Programming

Linear Programming
As we mentioned earlier, the optimization problem

    min_x f(x) subject to g(x) = 0 and h(x) ≤ 0,   (*)

with f, g, h affine, is called a linear program

The feasible region is a convex polyhedron⁹

Since the objective function maps out a hyperplane, its global
minimum must occur at a vertex of the feasible region

⁹Polyhedron: a solid with flat sides, straight edges

Linear Programming
This can be seen most easily with a picture (in R^2)

[Figure: a convex polygonal feasible region in R^2 with linear
objective contours; the minimum occurs at a vertex]

Linear Programming
The standard approach for solving linear programs is conceptually
simple: examine a sequence of the vertices to find the minimum

This is called the simplex method

Despite its conceptual simplicity, it is non-trivial to develop an
efficient implementation of this algorithm

We will not discuss the implementation details of the simplex
method...

Linear Programming
In the worst case, the computational work required for the simplex
method grows exponentially with the size of the problem
But this worst-case behavior is extremely rare; in practice simplex
is very efficient (computational work typically grows linearly)
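In practice one typically calls a library LP solver rather than implementing simplex; a sketch with scipy.optimize.linprog (the small LP below is an assumed example):

```python
from scipy.optimize import linprog

# maximize x1 + 2*x2  subject to  x1 + x2 <= 4,  x1 + 3*x2 <= 6,  x >= 0
# (linprog minimizes, so we negate the objective)
res = linprog(c=[-1, -2], A_ub=[[1, 1], [1, 3]], b_ub=[4, 6])
print(res.x, -res.fun)  # optimum at the vertex (3, 1), objective value 5
```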
Newer methods, called interior point methods, have been
developed that