# Minimax Optimization without second order information

Mark Wrobel

LYNGBY 2003 EKSAMENSPROJEKT NR. 6


Printed by IMM, DTU


Preface
This M.Sc. thesis is the final requirement for obtaining the degree Master of Science in Engineering. The work has been carried out in the period from the 1st of September 2002 to the 28th of February 2003 at the Numerical Analysis section at Informatics and Mathematical Modelling, Technical University of Denmark. The work has been supervised by Associate Professor Hans Bruun Nielsen and co-supervised by Professor, dr.techn. Kaj Madsen. I wish to thank Hans Bruun Nielsen for many very useful and inspiring discussions, and for valuable feedback during this project. I also wish to thank Kaj Madsen for introducing me to the theory of minimax, and especially for the important comments regarding exact penalty functions. I also wish to thank M.Sc. Engineering, Ph.D. Jacob Søndergaard for interesting discussions about optimization in general and minimax in particular. Finally, I wish to thank my office colleagues, Toke Koldborg Jensen and Harald C. Arnbak, for making this time even more enjoyable and for their daily support, despite their own heavy workload. Last but not least, I wish to thank M.Sc., Ph.D. student Michael Jacobsen for helping me proofread.

Kgs. Lyngby, February 28th, 2003

Mark Wrobel c952453


Abstract

This thesis deals with the practical and theoretical issues regarding minimax optimization. The theory of minimax optimization is thoroughly introduced through examples and illustrations. Exact penalty functions are given an intense analysis, and theory for estimating the penalty factor is deduced. The algorithms for minimax are trust region based, and different strategies regarding trust region updates are given. Methods for large and sparse problems are investigated and analyzed. The algorithms are tested extensively and comparisons are made to Matlab's optimization toolbox.

Keywords: Unconstrained and constrained minimax optimization, exact penalty functions, trust region methods, large scale optimization.


Resumé

This thesis covers the practical and theoretical topics regarding minimax optimization. The theory of minimax optimization is introduced through illustrations and examples. A thorough analysis of exact penalty functions is given, and the theory for estimating the penalty factor σ is deduced. Minimax optimization algorithms are often based on trust regions, and their different update strategies are examined. Methods for large and sparse problems are investigated and analyzed. The algorithms undergo thorough testing and are compared with Matlab's optimization toolbox.

Keywords: Constrained and unconstrained minimax optimization, exact penalty function, trust region methods, optimization of large problems.


Contents

1 Introduction
   1.1 Outline

2 The Minimax Problem
   2.1 Introduction to Minimax
      2.1.1 Stationary Points
      2.1.2 Strongly Unique Local Minima
      2.1.3 Strongly Active Functions

3 Methods for Unconstrained Minimax
   3.1 Sequential Linear Programming (SLP)
      3.1.1 Implementation of the SLP Algorithm
      3.1.2 Convergence Rates for SLP
      3.1.3 Numerical Experiments
   3.2 SLP With a Corrective Step
      3.2.1 Finding Linearly Independent Gradients
      3.2.2 Calculation of the Corrective Step
      3.2.3 The First Order Corrective Step
      3.2.4 Implementation of the CSLP Algorithm
   3.3 Comparative Tests Between SLP and CSLP
      3.3.1 Numerical Results
      3.3.2 Finishing Remarks

4 Constrained Minimax
   4.1 Stationary Points
   4.2 Strongly Unique Local Minima
   4.3 The Exact Penalty Function
   4.4 Estimating the Penalty Factor
   4.5 Setting up the Linear Subproblem
   4.6 An Algorithm for Constrained Minimax
   4.7 Test of SLP and CSLP

5 Trust Region Strategies
   5.1 The Continuous Update Strategy
   5.2 The Influence of the Steplength
   5.3 The Scaling Problem

6 Linprog
   6.1 How to use linprog
   6.2 Issues regarding linprog
   6.3 Large scale version of linprog
   6.4 Why Hot Start Takes at Least n Iterations

7 Conclusion
   7.1 Future Work

A The Fourier Series Expansion Example
   A.1 The ℓ1 norm fit
   A.2 The ℓ∞ norm fit

B The Sparse Laplace Problem

C Test functions

D Source code
   D.1 Source code for SLP in Matlab
   D.2 Source code for CSLP in Matlab
   D.3 Source code for SETPARAMETERS in Matlab
   D.4 Source code for CMINIMAX in Matlab

Chapter 1

Introduction

The work presented in this thesis has its main focus on the theory behind minimax optimization. Without going into details, outlines of algorithms for solving unconstrained and constrained minimax problems are given, algorithms that also are well suited for problems that are large and sparse.

Before we begin the theoretical introduction to minimax, let us look at the linear problem of finding the Fourier series expansion that fits some design specification. This is a problem that occurs frequently in the realm of applied electrical engineering, and in this short introductory example we look at the different solutions that arise when using the ℓ1, ℓ2 and ℓ∞ norms,

   F1(x) = ‖f(x)‖1 = |f1(x)| + ... + |fm(x)|,
   F2(x) = ‖f(x)‖2² = f1(x)² + ... + fm(x)²,
   F∞(x) = ‖f(x)‖∞ = max( |f1(x)|, ..., |fm(x)| ),

where f : IRⁿ → IRᵐ is a vector function. For a description of the problem and its three different solutions the reader is referred to Appendix A. The solutions to the Fourier problem, subject to the above three norms, are shown in figure 1.1.

The ℓ2 norm (least-squares) is a widely used and popular norm. It gives rise to smooth optimization problems, and things tend to be simpler when using this norm. Unfortunately the solution based on the ℓ2 estimation is not robust towards outliers, i.e. points that have large errors. This is because ℓ2 also tries to minimize the largest residuals, and hence the ℓ2 norm is sensitive to outliers. This behaviour is seen in figure 1.1, where the ℓ2 solution shows ripples near the discontinuous parts of the design specification.

The three norms respond differently to outliers. The ℓ1 norm is said to be a robust norm, because the solution based on the ℓ1 estimation is not affected by outliers. This is seen in figure 1.1, where the ℓ1 solution fits the horizontal parts of the design specification (fat black line) quite well. The corresponding residual function shows that most residuals are near zero, except for some large residuals. As shown in [MN02, p. 41] and [Hub81], a development has been made to create a norm that combines the smoothness of ℓ2 with the robustness of ℓ1. This norm is called the Huber norm and is described in [MN02].

The ℓ∞ norm is called the Chebyshev norm. It minimizes the maximum distance between the data (design specification) and the approximating function, hence the name minimax approximation. The norm is not robust, however, and the lack of robustness is worse than that of ℓ2. This is clearly shown in figure 1.1, where the ℓ∞ solution shows large oscillations. On the other hand, the maximum residual is minimized, and only in the ℓ∞ case does the residual function show none of the spikes seen in the ℓ1 and ℓ2 solutions.

Figure 1.1: Left: The different solutions obtained by using the ℓ1, ℓ2 and ℓ∞ estimators. Right: The corresponding residual functions.
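The different robustness of the three norms can be seen already in the simplest possible fitting problem: fitting a single constant to data. The best constant under ℓ2 is the mean, under ℓ1 the median, and under ℓ∞ the midrange. The following Python sketch (illustrative only; the thesis's own experiments use the Fourier problem and Matlab) shows how one outlier affects each estimate:

```python
from statistics import mean, median

def fit_constant(y):
    # Best constant c for the residuals y_i - c under each norm:
    # the l1 fit is the median, the l2 fit is the mean, and the
    # l-infinity (Chebyshev) fit is the midrange (max + min) / 2.
    return {
        "l1": float(median(y)),
        "l2": float(mean(y)),
        "linf": (max(y) + min(y)) / 2.0,
    }

data = [1.0, 1.1, 0.9, 1.0, 1.05]
dirty = data + [10.0]          # add a single outlier

print(fit_constant(data))
print(fit_constant(dirty))
# The l1 estimate barely moves (robust), the l2 estimate is pulled
# toward the outlier, and the l-infinity estimate is dominated by it.
```

This mirrors the behaviour in figure 1.1: ℓ1 ignores the large residual, ℓ2 trades it off against all the others, and ℓ∞ is governed entirely by it.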

1.1 Outline

We start by providing the theoretical foundation for minimax optimization, by presenting generalized gradients, directional derivatives and fundamental propositions. In chapter 3, the theoretical framework is applied to construct two algorithms that can solve unconstrained non-linear minimax problems. We present the theoretical basis for constrained minimax in chapter 4, which is somewhat similar to the unconstrained minimax theory, but has a more complicated notation. In this chapter we also take a closer look at the exact penalty function, which is used to solve constrained problems. Trust region strategies are discussed in chapter 5, where we also look at scaling issues. Matlab's solver linprog is described in chapter 6, together with various problems regarding linprog encountered in the process of this work. Large scale optimization is also a topic for this chapter. Further test results are discussed.


Chapter 2

The Minimax Problem

Minimax problems arise in several contexts and occur frequently in real life problems, spanning from circuit design to satellite antenna design. The minimax method finds its application in problems where estimates of model parameters are determined with regard to minimizing the maximum difference between model output and design specification. Such problems are generally referred to as minimax problems, where the solution is found by minimizing the maximum error.

We start by introducing the minimax problem

   min_x F(x),   F(x) = max_i f_i(x),   i = 1, ..., m,   (2.1)

where x ∈ IRⁿ and f(x) : IRⁿ → IRᵐ. The function F : IRⁿ → IR is piecewise smooth, and each f_i(x) : IRⁿ → IR is assumed to be smooth and differentiable. A simple example of a minimax problem is shown in figure 2.1, where the solid line indicates F(x). The solution to the problem illustrated lies at the kink between f_1(x) and f_2(x).

Figure 2.1: F(x) (the solid line) is defined as max_i f_i(x) for i = 1, 2. The dashed lines denote f_i(x).

By using the Chebyshev norm ‖·‖∞, a class of problems called Chebyshev approximation problems occurs,

   F∞(x) = ‖f(x)‖∞,   min_x F∞(x).   (2.2)

We will refer to the Chebyshev approximation problem as Chebyshev. We can easily rewrite (2.2) to a minimax problem,

   min_x F∞(x) = min_x max_i |f_i(x)| = min_x max_i max( f_i(x), −f_i(x) ),   (2.3)

so (2.2) can be formulated as a minimax problem, but not vice versa. Therefore the following discussion will be based on the more general minimax formulation in (2.1).
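The objective in (2.1), and the rewriting of a Chebyshev problem into minimax form, can be sketched in a few lines. The following Python sketch is illustrative only (the thesis's own code is in Matlab), and the inner functions f1 and f2 are made-up examples, not taken from the thesis:

```python
def F(fs, x):
    # Minimax objective F(x) = max_i f_i(x) over a list of inner functions.
    return max(f(x) for f in fs)

def F_inf(fs, x):
    # Chebyshev objective max_i |f_i(x)| rewritten as a plain minimax
    # problem, as in (2.3): replace every f_i by the pair (f_i, -f_i).
    pairs = [g for f in fs for g in (f, lambda t, f=f: -f(t))]
    return F(pairs, x)

# Two made-up smooth inner functions; F has a kink where they cross.
f1 = lambda x: (x - 1.0) ** 2
f2 = lambda x: 0.5 * x + 1.0

print(F([f1, f2], 0.0))    # both inner functions equal 1.0 here -> 1.0
print(F([f1, f2], 3.0))    # f1 dominates: 4.0
print(F_inf([f2], -4.0))   # |0.5*(-4) + 1| = 1.0
```

Even though each f_i is smooth, the pointwise maximum F is only piecewise smooth, which is exactly why the machinery of the next section is needed.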

2.1 Introduction to Minimax

In the following we give a brief theoretical introduction to the minimax problem in the unconstrained case. To provide some "tools" to navigate with, we introduce terms like the generalized gradient and the directional gradient. At the end we formulate conditions for stationary points.

F is not in general a differentiable function; rather, F consists of piecewise differentiable sections. Unfortunately the presence of kinks in F makes it impossible for us to define an optimum by using F′(x) = 0, as we do in the 2-norm case, where the objective function is smooth. Because F is not always differentiable, we use a different measure called the generalized gradient, first introduced by [Cla75].

In order to describe the generalized gradient, we need to look at all the functions that are active at x. We say that those functions belong to the active set

   A ≡ { j : f_j(x) = F(x) }.   (2.4)

E.g. in a minimax problem there are two active functions at the kink in figure 2.1, and everywhere else only one active function. The generalized gradient ∂F(x) is the convex hull spanned by the gradients of the active inner functions,

   ∂F(x) = conv{ f_j′(x) : j ∈ A },   (2.5)

which can also be written by using multipliers,

   ∂F(x) = { Σ_{j∈A} λ_j f_j′(x) : λ_j ≥ 0, Σ_{j∈A} λ_j = 1 }.   (2.6)

The formulation in (2.5) has no multipliers, but is still equivalent with (2.6). Definition (2.6) also has a simple geometric interpretation, as seen in figure 2.2 for the case x ∈ IR². If 0 is not in the convex hull defined by (2.6), then 0 is also not in the convex hull of (2.5), and vice versa.

Figure 2.2: The contours indicate the minimax landscape of F(x) with the inner functions f_1, f_2 and f_3. The gradients of the functions are shown as arrows. The dashed lines show the border of the convex hull as defined in (2.6).

We get (2.6) by using the first order Kuhn-Tucker conditions for optimality. To show this, we first have to set up the minimax problem as a nonlinear programming problem,

   min_{x,τ} g(x,τ) ≡ τ   s.t.   c_j(x,τ) ≡ τ − f_j(x) ≥ 0,   (2.7)

where the constraints f_j(x) ≤ τ say that τ should be equal to the largest function f_j(x) at the solution, so that τ = F(x); one could imagine τ as being indicated by the thick line in figure 2.1.

Here j = 1, ..., m, and (2.7) is in fact the minimax formulation, g(x,τ) = τ = F(x), when we remember that F(x) is a piecewise smooth function. We then formulate the Lagrangian function

   L(x, τ, λ) = g(x, τ) − Σ_{j=1}^m λ_j c_j(x, τ).   (2.8)

By using the first order Kuhn-Tucker condition, we get

   L′(x, τ, λ) = 0.   (2.9)

The inactive constraints are those for which f_j(x) < τ, and from the theory of Lagrangian multipliers it is known that λ_j = 0 for j ∉ A. For the active constraints we have that f_j(x) = τ, i.e. j ∈ A. Further, the Kuhn-Tucker conditions say that λ_j ≥ 0. We can then rewrite (2.9) to the following system,

   Σ_{j∈A} λ_j f_j′(x) = 0,   (2.10)

   Σ_{j∈A} λ_j = 1,   λ_j ≥ 0,   (2.11)

which yields the multiplier description used in (2.6). We have now explained the shift from the formulation in (2.5) to (2.6).

Another kind of gradient that also comes in handy is the directional gradient. In contrast to the generalized gradient, the directional gradient is a scalar. In general it is defined as

   g′_d(x) = lim_{t→0+} ( g(x + t d) − g(x) ) / t,   (2.12)

which for a minimax problem leads to

   F′_d(x) = max_{j∈A} f_j′(x)^T d.   (2.13)

This is a correct way to define the directional gradient, when we remember that F(x) is a piecewise smooth function.

To illustrate this further, we need to introduce the theorem of strict separation, which has a connection to convex hulls.

Theorem 2.1 Let Γ and Λ be two nonempty convex sets in IRⁿ, with Γ compact and Λ closed. If Γ and Λ are disjoint, Γ ∩ Λ = ∅, then there exists a plane { x ∈ IRⁿ : d^T x = α }, d ≠ 0, which strictly separates them, i.e.

   d^T x < α for all x ∈ Γ   and   d^T x > α for all x ∈ Λ.

Proof: [Man65, p. 52].

It follows from the proof of proposition 2.24 in [Mad86, p. 50] that if M(x) ⊆ IRⁿ is a set that is convex, compact and closed, then a vector d ∈ IRⁿ is separated from M(x) if

   d^T v < 0   for all v ∈ M(x).

Without loss of generality, M(x) can be replaced with ∂F(x); therefore v and f_j′(x) are interchangeable. Remember that as a consequence of (2.6), f_j′(x) ∈ ∂F(x). Further, such a d can be interpreted as a descent direction, which leads to F′_d(x) < 0. As described in detail in chapter 3, d is found by solving an LP problem, where the solver tries to find a downhill direction in order to reduce the cost function. In the following we describe when such a downhill direction does not exist.

An illustration of both the theorem and the generalized gradient can be seen in figure 2.3. The figure shows four situations. In the first three, d can be interpreted as a descent direction, because it fulfils the above theorem, that is v^T d < 0; the vector d shown in the figure corresponds to a downhill direction, F′_d(x) < 0. The dashed lines in the figure correspond to the extreme extent of the convex hull M(x). The last plot shows a case where d can not be separated from M, or ∂F(x) for that matter: when M grows so that it touches the dashed lines on the coordinate axes, 0 is in M, and we have a stationary point because F′_d(x) ≥ 0 as a consequence of d^T v ≥ 0, so there is no downhill direction.

Figure 2.3: On the first three plots d is not in M and d is a descent direction. On the last plot there is no downhill direction.

In the first plot (top left) of figure 2.3, there must be an active inner function with f_j′(x) = 0 if M(x) is to have an extreme extent. This means that one of the active inner functions must have a local minimum (or maximum) at x. This is illustrated in figure 2.4, where the gray box indicates an area where F(x) is constant. Still, however, it must hold that F′_d(x) ≥ 0, because F(x) = max_j f_j(x) is a convex function, and it still holds that 0 ∈ ∂F(x).

Figure 2.4: The case where one of the active functions has a zero gradient. The gray box indicates an area where F(x) is constant.

In the next two plots (top right and bottom left), we see by using theorem 2.1 and the definition of the directional gradient that v^T d < 0 for some direction d, and hence F′_d(x) < 0, so there is a downhill direction. In the last case in figure 2.3 (bottom right) we have that v^T d ≥ 0 for every direction, which leads to F′_d(x) ≥ 0, so the point is stationary. In this case 0 is in the interior of M(x). As shown in the following, this situation corresponds to x being a strongly unique minimum; unique in the sense that the minimum can only be a point, and not a line as illustrated in figure 2.6, or a plane.

We illustrate the directional gradient further with the example shown in figure 2.5. Because the problem is one dimensional, it is simple to illustrate the relationship between ∂F(x) and F′_d(x). The directional gradient is illustrated in the right plot for d = 1 (dashed lines) and d = −1 (solid lines). We see that there are two stationary points, at x ≈ 0.2 and at x = 3. At x ≈ 0.2 the generalized gradient must be a point, because there is only one active function; that point is a local maximum, as seen on the left plot. The gradient is ambiguous at a kink, but the generalized gradient is well defined.

Figure 2.5: Left: Visualization of the gradients in three points. Right: The directional derivative for d = 1 (dashed lines) and d = −1 (solid lines).

At x = 3 the generalized gradient is an interval, and not a point. The interval is illustrated by the black circles for d = 1 (∂F(x)) and by the white circles for d = −1 (−∂F(x)). It is seen that both intervals have the property that 0 ∈ ∂F(x), so this is also a stationary point. In fact it is also a strongly unique local minimum, because 0 is in the interior of the interval, and it holds for all directions that F′_d(x) > 0, illustrated by the top white and black circles. Further, it is a local minimum, as seen on the left plot.

2.1.1 Stationary Points

Now we have the necessary tools to define a stationary point in a minimax context. An obvious definition of a stationary point, borrowed from 2-norm problems, would be that F′(x) = 0; however, this is not correct for minimax problems, because we can have kinks in F, like the one shown in figure 2.1. In other words, F is only a piecewise differentiable function, so we need another criterion to define a stationary point.

Definition 2.1 x is a stationary point if

   0 ∈ ∂F(x).

This means that if the null vector is inside or at the border of the convex hull of the generalized gradient ∂F(x), then we have a stationary point. We see from figure 2.2 that the null vector is inside the convex hull. We can say that at a stationary point it will always hold that the directional derivative is zero or positive.

Proposition 2.1 If 0 ∈ ∂F(x), then it follows that F′_d(x) ≥ 0 for all d.

Proof: See [Mad86, p. 29].

In other words, the proposition says that there are no downhill directions from a stationary point. The definition gives rise to an interesting example, where the set of stationary points is in fact a line, as shown in figure 2.6. A proper description has now been given of stationary points.

Figure 2.6: The contours show a minimax landscape, where the dashed lines show a kink between two functions, and the solid line indicates a line of stationary points. The arrows show the convex hull at a point on the line.
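Definition 2.1 together with proposition 2.1 gives a practical test: 0 ∈ ∂F(x) exactly when no direction is downhill, i.e. F′_d(x) ≥ 0 for every d. The following Python sketch (not from the thesis, which works in Matlab) probes unit directions in IR² on a grid instead of solving an LP exactly; the gradient values are made-up examples:

```python
from math import cos, sin, pi

def dir_deriv(grads, d):
    # Directional gradient F'_d(x) = max over active gradients g of g.d,
    # cf. eq. (2.13), for gradients in the plane.
    return max(g[0] * d[0] + g[1] * d[1] for g in grads)

def is_stationary(grads, n_dirs=360, tol=1e-12):
    # x is stationary iff 0 lies in the convex hull of the active
    # gradients, i.e. no probed direction is downhill (F'_d(x) >= 0).
    for k in range(n_dirs):
        t = 2.0 * pi * k / n_dirs
        if dir_deriv(grads, (cos(t), sin(t))) < -tol:
            return False
    return True

# Three gradients surrounding the origin: stationary (0 is even interior).
print(is_stationary([(1.0, 0.0), (-1.0, 1.0), (-1.0, -1.0)]))  # True
# Both gradients in one half-plane: d = (-1, 0) is a downhill direction.
print(is_stationary([(1.0, 0.0), (1.0, 1.0)]))                 # False
```

The grid search is only a sketch of the idea; the algorithms in chapter 3 obtain the descent direction by solving an LP instead.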

For algorithms that do not use second derivatives, a stationary point that is not strongly unique will lead to slow final convergence, even though this also to some extent depends on the problem. If the algorithm uses second order information, we can expect fast (quadratic) convergence in the final stages of the iterations. As we have seen above and from figure 2.6, a minimum of F(x) can be a line, when described only by first order derivatives. Still, there remains the issue of the connection between a local minimum of F and stationary points.

Proposition 2.2 Every local minimum of F is a stationary point.

Proof: See [Mad86, p. 28].

This is a strong proposition, and the proof uses the fact that F is a convex function.

2.1.2 Strongly Unique Local Minima

As described in the previous, a stationary point is not necessarily unique. Another class of stationary points has the property that they are unique; they can not be lines, when only using first order derivatives. These points are called strongly unique local minima.

Proposition 2.3 For x ∈ IRⁿ we have a strongly unique local minimum if

   0 ∈ int ∂F(x).

Proof: [Mad86, p. 31].

The mathematical meaning of int in the above proposition is that the null vector should be interior to the convex hull ∂F(x). A situation where this criterion is fulfilled is shown in figure 2.2, while figure 2.6 shows a situation where the null vector is situated at the border of the convex hull; the latter is not a strongly unique local minimum, because 0 is not interior to the convex hull. Another thing that characterizes a strongly unique local minimum is that the directional gradient is strictly uphill, F′_d(x) > 0 for all directions d.

But when is a stationary point unique in a first order sense? A strongly unique local minimum can be characterized by only using first order derivatives. If C ⊆ IRⁿ is a convex set, then

   z ∈ int C   ⇔   z^T d < sup{ g^T d : g ∈ C }   for all d ∈ IRⁿ, d ≠ 0.   (2.14)

The equation must be true because, with z confined to the interior of C, g can be a longer vector than z, so the maximum value of the inner product g^T d can be larger than z^T d; this is illustrated in figure 2.7.

Figure 2.7: Because z is confined to the interior of M, g can be a longer vector than z, and C = int M(x).

We see that the definition of the directional derivative corresponds to

   F′_d(x) = max{ g^T d : g ∈ ∂F(x) }.   (2.15)

If we now say that z = 0 ∈ ∂F(x), then (2.14) implies that

   0 ∈ int ∂F(x)   ⇔   max{ g^T d : g ∈ ∂F(x) } > 0   for all d, ‖d‖ = 1,   (2.16)

that is,

   F′_d(x) > 0   for all d, ‖d‖ = 1.   (2.17)

If we look at the strongly unique local minimum illustrated in figure 2.2 and calculate F′_d(x), ‖d‖ = 1, for all directions d from 0 to 2π, we see from figure 2.8 that F′_d(x) is a continuous function of d and that

   F′_d(x) ≥ K,   where K = inf{ F′_d(x) : ‖d‖ = 1 } > 0.

Figure 2.8: F′_d(x) corresponding to the landscape in figure 2.2, for all directions d from 0 to 2π.

An interesting consequence of proposition 2.3 is that if ∂F(x) is more than a point and 0 ∈ ∂F(x) but not interior to it, then first order information will suffice from some directions to give fast final convergence, while from other directions we will need second order information to obtain fast final convergence. If the minimum is approached from a direction for which there is a kink in F, then the kink gives a distinct indication of the minimum by using only first order information.

We illustrate this by using the parabola test function¹, where x* = [0, 0]^T is a minimizer; see figure 2.9. If the descent direction (d1) towards the stationary line is perpendicular to the convex hull (indicated on the figure as the dashed line), then we need second order information, and hence a slow final convergence is obtained. For direction d2 there is a kink in F, so first order information will suffice to give us fast convergence for direction d2. When using an SLP like algorithm with a starting point x0 = [0, t]^T, t ∈ IR, t ≠ 0, we will see that F′_d(x) is strictly positive. If we do not start from x0 = [0, t]^T, then we will eventually get a descent direction that is parallel with the direction d1, and get slow final convergence.

¹ Introduced in Chapter 3.
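The margin K = inf_d F′_d(x) from the discussion above separates the two convergence regimes numerically: K > 0 at a strongly unique local minimum, K = 0 when some direction is first order flat. A small Python sketch (illustrative, with made-up gradients; the thesis itself does this graphically, as in figure 2.8):

```python
from math import cos, sin, pi

def strong_uniqueness_margin(grads, n_dirs=720):
    # K = inf over unit directions d of F'_d(x) = max_g g.d, cf. (2.15)-(2.17),
    # approximated on a grid of directions. K > 0 indicates that 0 is
    # interior to the hull of the gradients (strongly unique local minimum);
    # K = 0 means some direction gives no first order signal.
    worst = float("inf")
    for k in range(n_dirs):
        t = 2.0 * pi * k / n_dirs
        d = (cos(t), sin(t))
        worst = min(worst, max(g[0] * d[0] + g[1] * d[1] for g in grads))
    return worst

# Hypothetical gradients with 0 interior to their convex hull: K > 0,
# so every direction sees a kink and strictly uphill behaviour.
print(strong_uniqueness_margin([(1.0, 0.0), (-1.0, 1.0), (-1.0, -1.0)]) > 0.0)
# A smooth minimum (single zero gradient): K = 0, no first order signal.
print(strong_uniqueness_margin([(0.0, 0.0)]) == 0.0)
```

In the first case first order information alone pins down the minimizer; in the second, fast final convergence requires second order information, as discussed above.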

Figure 2.9: Left: A stationary point x*, where the convex hull ∂F(x*) is indicated by the dashed line. Right: A neighbourhood of x* viewed from directions d1 and d2. For direction d1 there is no kink in F, while all other directions will have a kink in F.

This can be formulated more formally by

Proposition 2.4 For x ∈ IRⁿ with 0 ∈ ri ∂F(x) we have

   F′_d(x) > 0   if d is not perpendicular to ∂F(x),
   F′_d(x) = 0   otherwise.

Proof: [Mad86, p. 33].

In order to understand the proposition we need to define what is meant by relative interior. A vector z ∈ IRⁿ is said to be relatively interior to the convex set ∂F(x) if z is interior to the affine hull of ∂F(x). The affine hull of ∂F is described by

   aff ∂F(x) = { Σ_{j=1}^m λ_j f_j′(x) : Σ_{j=1}^m λ_j = 1 }.   (2.18)

Note that the only difference between (2.6) and (2.18) is that the constraint λ_j ≥ 0 has been omitted in (2.18). If e.g. ∂F is a point, then aff ∂F is also that point. If ∂F consists of two points, then aff ∂F is the line through the two points. Finally, if ∂F consists of three points, then aff ∂F is the plane spanned by the three points.

Definition 2.2 z ∈ IRⁿ is relatively interior to the convex set S ⊆ IRⁿ,

   z ∈ ri S   if   z ∈ int aff S.   (2.19)

If e.g. S ⊆ IRⁿ is a convex hull consisting of two points, then aff S is a line, and z ∈ IRⁿ is said to be relatively interior to S if z is a point on that line.

The proposition says that every direction that is not perpendicular to ∂F will have a kink in F, and conversely.

2.1.3 Strongly Active Functions

To define what is meant by a degenerate stationary point, we first need to introduce the definition of a strongly active function.

Definition 2.3 Let F be a piecewise smooth function. At a stationary point x, the function f_j(x) is said to be strongly active if

   0 ∉ conv{ f_k′(x) : k ∈ A, k ≠ j }.   (2.20)

If f_k(x) is a strongly active function at a stationary point and we remove it, then 0 ∉ ∂F(x), so by removing f_k(x) the point x would no longer be a stationary point. If we remove a function f_j(x) that is not strongly active, then still 0 ∈ ∂F(x); in this case x is still a stationary point and therefore still a minimizer. This is illustrated in figure 2.10 for x ∈ IR².

Figure 2.10: Left: The convex hull spanned by the gradients of four active functions. Middle: We have removed a strongly active function, so that 0 ∉ ∂F(x). Right: An active function that is not strongly active has been removed; still 0 ∈ ∂F(x).

A strongly unique local minimum can also be expressed in another way, by looking at the Haar condition. It is said that the Haar condition is satisfied at x ∈ IRⁿ if any subset of

   { f_j′(x) : j ∈ A }

has maximal rank.

Proposition 2.5 Suppose that F(x) is piecewise smooth near x ∈ IRⁿ, that x is a stationary point, and that the Haar condition holds at x. Then it follows that at least n + 1 surfaces meet at x, and x is a strongly unique local minimum.

Proof: [Mad86, p. 35].

By looking at figure 2.2 we see that for x ∈ IR² we can only have a strongly unique local minimum, 0 ∈ int ∂F(x), if at least three functions are active at x. If this was not the case, then the null vector could not be an interior point of the convex hull.

If not every active function at a stationary point is strongly active, then that stationary point is called degenerate; the situation in figure 2.10 (left) is a degenerate stationary point.

The last topic we will cover here is the case where the local minimizer x is located on a smooth function, i.e. only one function is active. In this case the convex hull collapses to a point, and the condition 0 ∈ ∂F(x) is reduced to 0 = F′(x). In other words, if F(x) is differentiable at the local minimum, there is no way for 0 to be interior to ∂F(x). This means that such a stationary point can not be a strongly unique local minimizer, and hence we can not get fast final convergence towards such a point without using second order derivatives. At this point we can say that the kinks in F(x) help us, in the sense that it is because of those kinks that we can find a minimizer by only using first order information and still get quadratic final convergence.


Chapter 3

Methods for Unconstrained Minimax

In this chapter we will look at two methods that only use first order information to find a minimizer of an unconstrained minimax problem. The first method (SLP) is a simple trust region based method that uses sequential linear programming to find the steps toward a minimizer. The second method (CSLP) is based upon SLP, but further uses a corrective step based on first order information. This corrective step is expected to give a faster convergence towards the minimizer. At the end of this chapter the two methods are compared on a set of test problems.

3.1 Sequential Linear Programming (SLP)

In its basic version, SLP solves the nonlinear programming problem (NP) in (2.7) by solving a sequence of linear programming problems (LP). The nonlinear constraints of the NP problem are approximated by a first order Taylor expansion,

    f(x + h) ≈ ℓ(h) = f(x) + J(x)h,   (3.1)

where f(x) ∈ IR^m and J(x) ∈ IR^(m×n). Because the LP problem only uses first order information, it is likely that the LP landscape will have no unique solution, like the situation shown in figure 3.3 (middle). The introduction of a trust region eliminates this problem. By combining the framework of the NP problem with the linearization in (3.1) and a trust region, we define the following LP subproblem:

    min_(h,α) α
    s.t.  f + Jh ≤ αe,   (3.2)
          ||h||_∞ ≤ η,

where f(x) and J(x) are substituted by f and J, and e is a column vector of all ones. The last constraint in (3.2) might at first glance seem puzzling, because it has no direct connection to the NP problem. It is, however, easily explained: without it the subproblem could be unbounded, that is α → −∞ for ||h||_∞ → ∞.

By having ||h||_∞ ≤ η we define a trust region that our solution h should be within. This is reasonable when remembering that the Taylor approximation is only good in some small neighbourhood of x; that is, we only trust the linearization up to a length η from x. We use the Chebyshev norm ∞ to define the trust region, instead of the more intuitive Euclidean distance norm 2, because we can implement it as simple bounds on h. Figure 3.1 shows the trust region in IR^2 for the 1, 2 and ∞ norms. Only the latter norm can be implemented as bounds on the free variables in an LP problem.

Seen from an LP perspective, the LP problem says that α should be equal to the largest of the linearized constraints, α(h) = max_i ℓ_i(h). We can say that α is just the linearized equivalent to τ in the nonlinear programming problem (2.7). An illustration of α(h) is given in figure 3.2, where α(h) is the thick line and ℓ_1(h) and ℓ_2(h) are the thin lines. If ||h||_∞ < η, the solution to α(h) will lie at the kink of the thick line. If the trust region is large enough, then the solution to α(h) is at the kink in the thick line. But it is also seen that the kink is not the same as the real solution (the kink of the dashed lines, for the non-linear functions). This explains why we need to solve a sequence of LP subproblems in order to find the real solution.

Figure 3.2: The objective is to minimize α(h) (the thick line).

In section 2 we introduced two different minimax formulations, (2.1) and (2.2). Here we will investigate the difference between them, seen from an LP perspective. For Chebyshev minimax problems, F_∞(x) = max_i |f_i(x)|. We can then write the Chebyshev LP subproblem as the following.
Figure 3.1: Three different trust regions, based on three different norms: the 1, 2 and ∞ norms.

The Chebyshev norm is an easy norm to use in connection with LP problems, because the trust region becomes simple bounds on h. The Chebyshev LP subproblem is

    min_(h,α) α
    s.t.  f + Jh ≤ αe,   (3.3)
          −(f + Jh) ≤ αe,
          ||h||_∞ ≤ η.

Just as in the minimax case, a similar strategy of sequentially solving LP subproblems can be used.

For the parabola test function it holds that both the solution and the nonlinear landscape are the same, whether we view it as a minimax or a Chebyshev problem. The two formulations do, however, have different LP landscapes, as seen in figure 3.3: the LP landscape corresponding to the minimax problem has no unique minimizer, whereas the Chebyshev LP landscape does in fact have a unique minimizer.

Figure 3.3: The two LP landscapes from the linearized parabola(1) test function evaluated at x = [1.5, 9.8]^T. Left: The nonlinear landscape. Middle: The LP landscape of the minimax problem; we see that there is no unique solution. Right: For the Chebyshev problem we have a unique solution.

We will in this text use Matlab's LP solver linprog ver. 1.22. A further description of linprog and its various settings is given in Chapter 6. In order to use linprog, the LP subproblem (3.2) has to be reformulated to the form

    min g^T x̂   s.t.  A x̂ ≤ b,   (3.4)

which is a standard format used by most LP solvers. By comparing (3.2) and (3.4) we get

    x̂ = [h; α],  g = [0; 1],  A = [J(x), −e],  b = −f(x),   (3.5)

where x̂ ∈ IR^(n+1) and e ∈ IR^m is a column vector of all ones, and the trust region in (3.2) is implemented as simple bounds on x̂.

In this theoretical presentation of SLP we have not looked at the constrained case of minimax, where F(x) is minimized subject to some nonlinear constraints. It is possible to give a formulation like (3.2) for the constrained case, where first order Taylor expansions of the constraints are taken into account. This is described in more detail in chapter 4.

(1) A description of the test functions is given in Appendix C.
3.1.1 Implementation of the SLP Algorithm

We will in the following present an SLP algorithm that solves the minimax problem by approximating the nonlinear problem in (2.7) by sequential LP subproblems (3.2), but still keep the description as general as possible. A key part of this algorithm is an LP solver.
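As an illustration of the reformulation (3.4)-(3.5), the sketch below builds g, A and b from f and J and solves one LP subproblem. scipy's linprog is used here as a stand-in for Matlab's linprog; the helper name and the example data in the usage note are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def slp_subproblem(f, J, eta):
    """Solve  min alpha  s.t.  f + J h <= alpha e,  ||h||_inf <= eta,
    via the standard form  min g^T xhat  s.t.  A xhat <= b,
    with xhat = [h; alpha].  The trust region enters as simple bounds."""
    m, n = J.shape
    g = np.zeros(n + 1)
    g[-1] = 1.0                                  # objective picks out alpha
    A = np.hstack([J, -np.ones((m, 1))])         # J h - alpha e <= -f
    b = -f
    bounds = [(-eta, eta)] * n + [(None, None)]  # bounds on h; alpha is free
    res = linprog(g, A_ub=A, b_ub=b, bounds=bounds, method="highs")
    return res.x[:n], res.x[-1]                  # basic step h and alpha
```

For instance, with f = [1, −1] and gradients (1, 0) and (−1, 0), the minimizing step moves the first coordinate to the kink where both linearized functions equal zero.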

We solve (3.4) with linprog by using the following setup:

    Calculate g and A by (3.5),
    LB := [−ηe; −∞],  UB := −LB,   (3.6)
    [x̂, ...] := linprog(g, A, b, LB, UB).

The implementation of a minimax solver can be simplified somewhat by only looking at minimax problems. If we were to solve the Chebyshev problem (2.2), we should change A and b in (3.5) to

    A = [J, −e; −J, −e],  b = [−f; f],   (3.7)

that is, by substituting f and J by [f; −f] and [J; −J].

The SLP algorithm uses the gain factor ρ to determine if a step should be accepted or rejected. An acceptable step has the property that it is downhill and that the linearization is a good approximation to the non-linear functions. We can formulate this by looking at the linearized minimax function

    L(x, h) = max_i ℓ_i(h),  ||h||_∞ ≤ η.   (3.8)

By looking at figure 3.2 and the LP problem in (3.2), it should easily be recognized that L(x, h) is equivalent to the LP problem in (3.2), and we see that it must be the case that L(x, 0) = F(x) and L(x, h) = α. The gain predicted by the linear model can then be written as

    ΔL(x, h) = L(x, 0) − L(x, h),   (3.9)

and the gain in the nonlinear objective function can be written in a similar way as

    ΔF(x, h) = F(x) − F(x + h),   (3.10)

where F is defined either by (2.1) or (2.2). The step is accepted if

    ΔF(x, h) ≥ ε ΔL(x, h),   (3.11)

which leads to the formulation of the gain factor

    ρ = ΔF(x, h) / ΔL(x, h),   (3.12)

so that the step is accepted if ρ ≥ ε. In [JM94] it is proposed that ε is chosen so that 0 < ε ≤ 0.25. In practice we could use ε = 0, and many do that, but for convergence proofs we need ε > 0. If the gain factor ρ is close to 1 or higher, then it indicates that the linear subproblem is a good approximation to the nonlinear problem, and if ρ ≤ ε then the opposite argument can be made.
One should realize that Chebyshev problems can still be solved in such a solver.

As a stopping criterion we could use a criterion that stops when a certain precision is reached,

    F(x) − F(x*) ≤ δ max(1, |F(x*)|),   (3.13)

where δ is a specified accuracy and x* is the solution. This can, however, not be used in practice, where we do not know the solution. Instead we could exploit that the LP solver in every iteration calculates a basic step h, so that we can use

    ||h||_∞ ≤ δ.   (3.14)

This implies that the algorithm should stop when the basic step is small, which happens either when the trust region η ≤ δ, or when a strongly unique local minimizer has been found. In the latter case the LP landscape has a unique solution, as shown in figure 3.3 (right). If this stopping criterion were absent, the algorithm would continue until the trust region radius was so small that no further progress could be made.

A stopping criterion that is activated when the algorithm stops to make progress, e.g. ||x_k − x_(k−1)|| ≤ ε_1, can not be used in the SLP algorithm, because x does not have to change in every iteration: the step x_(k+1) = x_k + h is discarded if ρ < ε, and we may have to reduce the trust region several times to find an acceptable point. The SLP algorithm does not use a linesearch; this sequence of reductions replaces the linesearch.

Another important stopping criterion that should be implemented is

    ΔL ≤ 0.   (3.15)

When the LP landscape has gradients close to zero, it can sometimes happen, due to rounding errors in the solver, that ΔL drops below zero. This is clearly an indication that a stationary point has been reached, and the algorithm should therefore stop.

We now present an update strategy proposed in [JM94], where the expansion of the trust region is regulated by the term ρ_old > ε:

    if (ρ > 0.75) and (ρ_old > ε):  η := 2.5 η
    elseif ρ < 0.25:                η := η / 2.   (3.16)

The update strategy says that the trust region should be increased only when ρ indicates a good correspondence between the linear subproblem and the nonlinear problem.
A poor correspondence is indicated by ρ < 0.25, and in that case the trust region is reduced. Further, the criterion ρ_old > ε prevents the trust region from oscillating, which could have a damaging effect on convergence. The gain factor is used in the update strategy for the trust region radius η because it gives an indication of the quality of the linear approximation to the nonlinear problem. We have tried other update strategies than the above.
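The trust region update (3.16) can be written out as a small helper. This is an illustrative sketch; the function name is an assumption, while the thresholds 0.25 and 0.75 and the factors 2.5 and 1/2 follow the text.

```python
def update_trust_region(eta, rho, rho_old, eps):
    """Trust region update in the style of (3.16): expand only when the
    gain factor is good and its predecessor was also acceptable,
    shrink when the linear model corresponds poorly."""
    if rho > 0.75 and rho_old > eps:
        return 2.5 * eta        # good correspondence twice in a row: expand
    if rho < 0.25:
        return eta / 2.0        # poor correspondence: shrink
    return eta                  # otherwise keep the radius unchanged
```

Note how the `rho_old > eps` guard implements the regularizing term that prevents the radius from oscillating between expansion and reduction.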

The above leads to the basic SLP algorithm, presented in the following pseudo code.

Algorithm 3.1 SLP
begin
  k := 0;  x := x0;  found := false;  ρ_old := 2ε;
  f := f(x);  J := J(x);
  while (not found)
    Calculate g and A; calculate x̂ by solving (3.4), using the setup in (3.6);
    h := x̂(1:n);  α := x̂(n+1);
    xt := x + h;  ft := f(xt);
    ρ := ΔF(x, h) / ΔL(x, h);
    if ρ ≥ ε
      x := xt;  f := ft;  J := J(xt);
    Use the trust region update in (3.16);  ρ_old := ρ;
    k := k + 1;
    if stop or k ≥ kmax          {stopping criteria (3.14) and (3.15)}
      found := true;
  end
end

The user supplies the algorithm with an initial trust region radius η and with ε, which should satisfy 0 < ε ≤ 0.25. The stopping criteria take kmax, which sets the maximum number of iterations, and δ, which sets the minimum step size. We recommend to use the stopping criteria in (3.14) and (3.15); the algorithm stops when a stopping criterion indicates that a solution has been reached.

If the gain factor ρ is small or even negative, then we should discard the step and reduce the trust region. On the other hand, if ρ is large then it indicates that L(x, h) is a good approximation to F(xt); the step should be accepted and the trust region increased.

One of the more crucial parameters in the SLP algorithm is ε, so let us take a closer look at the effect of changing this variable. The condition ΔF ≥ εΔL should be satisfied in order to accept the new step. So when ε is large, ΔF has to be somewhat large too; the algorithm will be more conservative and more likely to reject steps and reduce the trust region. In the opposite case, a small value of ε will render the algorithm more biased towards accepting new steps and increasing the trust region: it becomes more optimistic. The effect of ε is investigated further in section 3.1.3.
When ε is small, ΔF can be small and SLP will still take a new step.
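To make the pseudo code concrete, the following is a compact sketch of algorithm 3.1 in Python (an assumption, since the thesis implementation is in Matlab), with scipy's linprog standing in for Matlab's. The stop tests correspond to (3.14) and (3.15), and the trust region update to (3.16); refinements discussed in the text are omitted.

```python
import numpy as np
from scipy.optimize import linprog

def slp_minimax(fun, jac, x0, eta=1.0, eps=0.01, kmax=100, delta=1e-8):
    """Basic SLP sketch: solve the LP subproblem, accept when rho >= eps."""
    x = np.asarray(x0, dtype=float)
    f, J = fun(x), jac(x)
    rho_old = 2.0 * eps
    for _ in range(kmax):
        m, n = J.shape
        g = np.zeros(n + 1)
        g[-1] = 1.0                                   # minimize alpha
        A = np.hstack([J, -np.ones((m, 1))])          # f + J h <= alpha e
        res = linprog(g, A_ub=A, b_ub=-f,
                      bounds=[(-eta, eta)] * n + [(None, None)],
                      method="highs")
        h, alpha = res.x[:n], res.x[-1]
        dL = f.max() - alpha                          # gain predicted by LP model
        if dL <= 0 or np.linalg.norm(h, np.inf) <= delta:
            break                                      # stationary point reached
        ft = fun(x + h)
        rho = (f.max() - ft.max()) / dL               # gain factor
        if rho >= eps:                                 # accept the basic step
            x = x + h
            f, J = ft, jac(x)
        if rho > 0.75 and rho_old > eps:              # trust region update
            eta *= 2.5
        elif rho < 0.25:
            eta /= 2.0
        rho_old = rho
    return x
```

On the one dimensional minimax problem max{(x−1)^2, (x+1)^2} the sketch reaches the kink minimizer x* = 0 in a few iterations.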

Situations occur where the gain factor ρ < 0, which happens in three cases: when ΔF < 0, which indicates that h is an uphill step; when ΔL < 0, which only happens when h is approaching the machine accuracy and is due to rounding errors; and finally when both ΔL < 0 and ΔF < 0, which is caused by both an uphill step and rounding errors. Due to the stopping criterion in (3.15), the two last scenarios would stop the algorithm.

3.1.2 Convergence Rates for SLP

An algorithm like SLP will converge towards stationary points, and as shown in chapter 2 such points are also local minimizers of F(x). When only using first order derivatives, it is only possible to obtain fast (quadratic) final convergence if x* is a strongly unique local minimum. In that case the solution is said to be regular, and first order derivatives will suffice to give fast final convergence. This is stated more formally in the following theorem.

Theorem 3.1 Let {x_k} be generated by method 1 (SLP), where h_k is a solution to (3.2). If {x_k} converges to a strongly unique local minimum of (2.7), then the rate of convergence is quadratic.

Proof: [Mad86, pp. 76-78].

The SLP method is similar to the method of Madsen (method 1) [Mad86, p. 99], because they both are trust region based and use an LP solver to find a basic step. The only difference is that the trust region update in method 1 does not have the regularizing term ρ_old > ε.

We can also get quadratic convergence for some directions towards a stationary point; this happens when there is a kink in F in that direction. If there is no kink in the direction towards x*, or x* is not a strongly unique local minimum, then we will get slow (linear) convergence. Hence in that case we need second order information to get the faster quadratic final convergence.

3.1.3 Numerical Experiments

In this section we have tested the SLP algorithm with the Rosenbrock function, with w = 10, and the Parabola test function, for two extreme values of ε. The values chosen were ε = 0.01 and ε = 0.25. The SLP algorithm has also been tested with other test problems, and the results are given in chapter 5.

The Rosenbrock problem has a regular solution at x* = [1, 1]^T, and we can expect quadratic final convergence in both cases. In both cases the starting point is x0 = [−1.2, 1]^T, where the following options are used:

    opts = [η, ε, kmax, δ] = [1, 0.01, 100, 10^−5].   (3.17)

We start by looking at the case ε = 0.01.
SLP Tested on Rosenbrock's Function

Rosenbrock is a Chebyshev approximation problem, and we tested it with two values of ε to investigate the effect of this parameter.

The SLP algorithm converged to the solution x* in 20 iterations and stopped on ||h||_∞ ≤ δ. There were four active functions in the solution. The gain factor ρ oscillates in the first 11 iterations, and this affects the trust region η, as seen in the top plot of figure 3.4 (left). The acceptance area of a new step lies above the dashed line denoting the value of ε. For F(x) we see that the first 10 iterations show a stair step like decrease, after which F(x) decreases more smoothly. This shows that SLP is able to get into the vicinity of the stationary point in the global part of the iterations. When close enough to the minimizer, the linear subproblem is in good correspondence with the nonlinear problem; this is indicated by the increase in ρ over the last 9 iterations. The trust region increases in the last 3 iterations, because ρ indicates that the linear subproblem is a good approximation to the nonlinear problem, and the expansion is regulated by the term ρ_old > ε. At the last iterations, when near the minimizer, we get quadratic convergence because Rosenbrock has a regular solution. We end with ρ = 0 due to ΔF = 0.

Figure 3.4: Performance of the SLP algorithm when using the Rosenbrock function. Left: ε = 0.01 (20 iterations). Right: ε = 0.25 (18 iterations). The top plots show F(x) and the trust region radius η versus iterations; the bottom plots show the gain factor ρ.

The SLP algorithm was also tested with ε = 0.25, with the same starting point x0 = [−1.2, 1]^T. The following options were used:

    opts = [η, ε, kmax, δ] = [1, 0.25, 100, 10^−5].   (3.18)

SLP found the minimizer x* in 18 iterations and stopped on ||h||_∞ ≤ δ, and there were four active functions in the solution.
From the plots one can clearly see that whenever a step is rejected, F(x) remains constant and the trust region is decreased.

For F(x) we again see a stair step like decrease in the first 6 iterations, after which we start the local part of the iterations. For ε = 0.25 the oscillations of the gain factor ρ are somewhat dampened in comparison with the case ε = 0.01, and the trust region η decreases to a steady level sooner. The trust region is still oscillating, but in a somewhat more dampened way. Again we see an increase of the trust region at the end of the iterations because of the good gain factor, indicated by ρ increasing steadily during the last 12 iterations, and again we get quadratic convergence because the solution is regular.
The gain factor for ε = 0.01 showed that four steps were rejected; for ε = 0.25 we see that only two steps were rejected. This shows that a large ε really makes SLP more conservative: it does not increase its trust region so often and is more reluctant to accept new steps. The numerical experiment shows that a small value of ε makes the algorithm more optimistic, in that it more easily increases the trust region, so optimistic behaviour has its price in an increased amount of iterations. But it is important to note that this conservative strategy reduced the amount of iterations needed for this particular experiment.

SLP Tested With the Parabola Test Function

We have tested the SLP algorithm on the Parabola test function, using the same scheme as for Rosenbrock's function, that is, testing for ε = 0.01 and ε = 0.25. The problem has a minimizer in x* = [0, 0]^T, and the solution is not regular, due to only two inner functions being active at the solution. The first test is done with the following options (notice the lower δ value):

    opts = [η, ε, kmax, δ] = [1, 0.01, 100, 10^−10].   (3.19)

The algorithm used 65 iterations and converged to the solution x* with a precision of ||x_k − x*|| = 8.8692e−09. The algorithm stopped on ||h||_∞ ≤ δ, and 2 functions were active in the solution. The global part of the iterations gets us close to the minimizer, but the following iterations show no signs of fast convergence; in fact the convergence is linear, in accordance with the theory for stationary points that are not strongly unique local minimizers. Because we do not have a strongly unique local minimum in the solution, the trust region will dictate the precision with which we can find the solution. Hence we need second order information to induce faster (quadratic) convergence.

We notice that the trust region η decreases in a step wise manner, because of the oscillations in ρ, until the last 10 iterations, where there is a strict decrease of the trust region in every iteration. The reason is that after F(x) has hit the machine accuracy, the gain factor ρ settles at a constant level below 0 due to ΔF being negative, so every step proposed by SLP is an uphill step, and many steps are rejected because ρ < ε. The rugged decrease of η is due to the rather large oscillations in the gain factor. The result is shown on figure 3.5.

The algorithm stopped on ||h||_∞ ≤ δ. A supplementary test has been made with the following options:

    opts = [η, ε, kmax, δ] = [1, 0.25, 100, 10^−10].   (3.20)

The algorithm again used 65 iterations and converged to the solution x* with a precision of ||x_k − x*|| = 8.8692e−09, and again two functions were active in the solution. This test shows almost the same results as that of ε = 0.01, even though more steps are rejected because of the high ε; apparently this has no influence on the convergence rate. From the top plot of figure 3.5 (right) we see that F(x) shows a linear convergence towards the minimizer, for both ε = 0.01 and ε = 0.25, due to the solution being not regular.

Figure 3.5: Performance of the SLP algorithm when using the Parabola test function. Left: ε = 0.01. Right: ε = 0.25. Both runs used 65 iterations; the top plots show F(x) and the trust region radius η, the bottom plots the gain factor ρ.

We have seen that the choice of ε = 0.25 has a slightly positive effect on the Rosenbrock problem, and almost no effect on the Parabola problem. Later we will use an addition to the SLP algorithm that uses a corrective step on steps that would otherwise have failed. With such an addition, we are interested in exploiting the flexibility that a low ε gives on the trust region update. If we use SLP with x0 = [3, 0]^T and ε = 0.01, then the problem is solved in 64 iterations.
The rate of convergence is for some problems determined by the direction to the minimizer.5 (right) we see that F x shows a linear convergence towards the minimizer.20) ¦ ¨ Q8T .

According to Madsen [Mad86, p. 32], for some problems another direction towards the minimizer will render first order derivatives sufficient to get fast convergence; this happens when there is a kink in F in that direction. We tried this on the Parabola test function with x0 = [0, 30]^T and ε = 0.01, and found the minimizer in only 6 iterations.

3.2 The First Order Corrective Step

From the numerical experiments presented in section 3.1.3 we saw that steps were wasted when ρ < ε. In this section we introduce the idea of a corrective step, that tries to modify a step that otherwise would have failed. The expectation is that the corrective step can reduce the number of iterations in SLP like algorithms, measured in function evaluations. We will present a first order corrective step (CS) that first appeared in connection with SLP in [JM92] and [JM94].

In the SLP algorithm the basic step h_k was found by solving the linear subproblem (LP) in (3.2). If ρ < ε for x_k, the basic step would be discarded, the trust region reduced, and a new basic step h_(k+1) would be tried. Now a corrective step is used to try to save the step h_k, by finding a new point x̃t that hopefully is better than x_k. The idea is that if the basic step h is rejected, then we calculate a corrective step v. A sketch of the idea is shown in figure 3.6: the step h + v will move the step towards f_1(x̃t) = f_2(x̃t) (the thick line). For the sake of the later discussion we use the shorter notation xt = x + h and x̃t = x + h + v. The important point is that the corrective step costs the same, measured in function evaluations, as the traditional SLP strategy. It could be even cheaper, if no new evaluation of the Jacobian was made at xt.

Figure 3.6: The basic step h and the corrective step v.

The corrective step uses the inner functions that are active in the LP problem, where the active set in the LP problem is defined as

    A_LP = { j | λ_j > 0 },  j = 1, ..., m,   (3.21)

where λ are the Lagrange multipliers of the LP subproblem. Here it is important to emphasize that we should use the multipliers whenever possible. There is, however, one objection to the strategy of using the multipliers to define the active set: an inner function that is not strongly active is not guaranteed to have a positive multiplier.
We could also use the following multiplier free formulation,

    A_LP = { j | ℓ_j(h) ≥ α − γ },  j = 1, ..., m,   (3.22)

where ℓ_j(h) is defined in (3.1) and γ is a small value. The difference in definition is due to numerical errors. An objection against (3.22) is that it would introduce another preset parameter or heuristic to the algorithm.
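The multiplier free definition (3.22) is straightforward to implement. The sketch below is illustrative; the function name and the default choice of γ are assumptions.

```python
import numpy as np

def active_set_lp(f, J, h, alpha, gamma=1e-8):
    """Multiplier free active set in the style of (3.22): indices j with
    f_j + J_j h >= alpha - gamma, gamma guarding against rounding errors."""
    ell = f + J @ h                          # linearized inner functions at x + h
    return np.nonzero(ell >= alpha - gamma)[0]
```

With f = [1, −1], gradients (1, 0) and (−1, 0), the step h = (−1, 0) and α = 0, both linearizations meet at the kink and both functions are reported active.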

The reasoning behind the corrective step is illustrated in figure 3.7 (left), where the basic step h is found at the kink of the linearized inner functions. This kink is, however, situated further away than the kink in the nonlinear problem. In figure 3.7 (right) the corrective step v is found at the kink of the linearizations. At xt we use the same gradient information as in x, and we use those functions that were active at the kink (the active functions in the LP problem) to calculate the corrective step, and hopefully x̃t will be an acceptable point.

Figure 3.7: Side view of a 2d minimax problem. Left: The gradients at x indicate a solution at x + h. Right: The gradients at x are used to calculate the corrective step, and a solution is indicated at x + h + v.

When we have found the active set, the corrective step can be calculated. Furthermore, the active set A_LP must only consist of functions whose gradients are linearly independent. If this is not the case, then we just delete some equations to make the rest linearly independent.

3.2.1 Finding Linearly Independent Gradients

We have some vectors that are stored as columns in the matrix A ∈ IR^(n×m). We now want to determine whether these vectors, i.e. the columns of A, are mutually linearly independent. The most intuitive way to do this is to project the columns of A onto an orthogonal basis spanned by the columns of Q ∈ IR^(n×n).
An inner function that is not strongly active is not guaranteed to have a positive multiplier; in fact it could be zero. This happens when the active functions have contours parallel to the coordinate system. The strategy of using Lagrangian multipliers to define the active set can, for instance, sometimes fail when using linprog; a simple case is described in Section 6. In other words, if we want to be sure to get all the active inner functions, we must use (3.22).

The columns of A are projected onto the basis Q, and the projections are then stored in a matrix called R ∈ IR^(n×m), so that A = QR. Because Q is orthogonal, Q^T Q = I. The method roughly sketched above is called QR factorization. There exist different methods to find Q, like Gram-Schmidt orthogonalization, Givens transformations and the Householder transformation, just to mention some. For more insight and details, the reader is referred to [GvL96, Chapter 5] and [Nie96, Chapter 4].

The problem is that the QR factorization is not rank revealing. To illustrate what we want from the QR factorization, we give an example. We have the following system A^T h = b, with

    A = [1 2 1; 2 4 1; 0 0 1],  b = [1 2 3]^T,   (3.23)

where A is the transposed Jacobian, which means that the gradients are stored as columns in A. We want to find linearly independent gradients in the Jacobian by doing a QR factorization of A. Here the second column of A is twice the first, so obviously rank(A) = 2. A QR factorization yields an R factor with

    |diag(R)| = (√5, 0, 1).   (3.24)

As proposed in the previous, we look at the main diagonal for elements not equal to zero. Counting leading non-zeros, it would seem like the system only had rank 1, because already the second element on the diagonal is zero. This is, however, wrong, when obviously rank(A) = 2.

The solution is to pivot the system. Matlab has this feature built into its QR factorization, and it delivers a permutation matrix E ∈ IR^(m×m). The permutation is done so that the squared elements on the diagonal of R are decreasing. For numerical reasons, we should look for small values on the main diagonal as an indication of linear dependence.
We now repeat the above example, and do a QR factorization with permutation. We get the basis Q, the projections R and, finally, the permutation matrix E. For this example |diag(R)| ≈ (4.47, 1.10, 0), with

    E = [0 0 1; 1 0 0; 0 1 0].

We see that there are two non-zeros along the diagonal of R, and that this corresponds to the rank of A. We then apply the permutation, so that

    Ã = AE  and  b̃ = E^T b.   (3.25)

In the example we used A and b; these can without loss of generality be replaced by J^T and f, so that A = J(x)^T. If we let i denote the row numbers of the active functions at x, i ∈ A_LP, we have

    b_i = f_i(x),  A_(:,i) = J_(i,:)(x)^T,   (3.26)

so that the first order Taylor expansion can be formulated as

    ℓ(h) = b + A^T h.   (3.27)

Next, we search for non-zero values along the diagonal of R; for numerical reasons we should search for values higher than a certain threshold. The indices of the elements of diag(R) that are non-zero are then stored in j.
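A pivoted QR factorization can be used directly to extract a set of linearly independent columns. The sketch below uses scipy's qr with pivoting=True as a stand-in for Matlab's [Q,R,E] = qr(A); the helper name and the tolerance are assumptions, and the example matrix is a rank 2 matrix whose second column is twice the first.

```python
import numpy as np
from scipy.linalg import qr

def independent_columns(A, tol=1e-10):
    """Indices of a maximal set of linearly independent columns of A,
    found with a pivoted (rank revealing) QR factorization; as noted in
    the text, a plain QR factorization is not rank revealing."""
    Q, R, perm = qr(A, pivoting=True)        # A[:, perm] = Q @ R
    d = np.abs(np.diag(R))
    rank = int(np.sum(d > tol * max(1.0, d[0])))
    return perm[:rank]                       # |diag(R)| is decreasing

# A rank 2 matrix: the second column is twice the first.
A = np.array([[1.0, 2.0, 1.0],
              [2.0, 4.0, 1.0],
              [0.0, 0.0, 1.0]])
cols = independent_columns(A)
```

Because pivoting orders the diagonal of R by decreasing magnitude, counting the leading entries above the threshold gives the numerical rank, and `perm[:rank]` names the columns to keep.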

We then find

    Ĵ = (Ã_(:,j))^T  and  f̂ = b̃_j,   (3.28)

and we see that Ĵ and f̂ correspond to reduced versions of J and f, in the sense that all linearly dependent columns of J^T, and the corresponding rows of f, have been removed. It is those reduced versions we will use in the following calculation of the corrective step.

3.2.2 Calculation of the Corrective Step

We can now calculate the corrective step based on the reduced system Ĵ and f̂. We formulate a problem whose solution is the corrective step v, where f̂ ∈ IR^t are those components of f that are active according to the definition in (3.21) or (3.22) and linearly independent:

    min_(v,β) (1/2) v^T v
    s.t.  f̂ + Ĵ v = βe,   (3.29)

where f̂ and Ĵ are short for f̂(x + h) and Ĵ(x + h), e ∈ IR^t is a vector of ones, v ∈ IR^n and β is a scalar. The formulation says that the corrective step v should be as short as possible, and that the active linearized functions should be equal at the new iterate. This last demand is natural, because such a condition is expected to hold at the solution x*. The corrective step v is minimized in the 2 norm, to keep the new point as close to the basic step h as possible.
Then j and we see that J and f correspond to reduced versions of J and f. We now formulate a problem. where f IRt are those components of f that are active according to deﬁnition in (3.2.
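The computation can be sketched in plain Python (the thesis's implementation is in Matlab; the data below are made up, and the small Gaussian-elimination solver is only for illustration). The sketch builds the linear system (3.35) and reads off v and β:

```python
# Sketch of the corrective-step system (3.35) in plain Python (the thesis
# works in Matlab).  J_hat (t x n) and f_hat (length t) are the reduced
# active quantities; the example data at the bottom are made up.

def gauss_solve(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    A = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(A[r][k]))  # pivot row
        A[k], A[p] = A[p], A[k]
        for r in range(k + 1, n):
            m = A[r][k] / A[k][k]
            for c in range(k, n + 1):
                A[r][c] -= m * A[k][c]
    y = [0.0] * n
    for k in range(n - 1, -1, -1):
        s = sum(A[k][c] * y[c] for c in range(k + 1, n))
        y[k] = (A[k][n] - s) / A[k][k]
    return y

def corrective_step(J_hat, f_hat):
    """Solve (3.35): min ||v|| s.t. f_hat + J_hat v = beta*e."""
    t, n = len(J_hat), len(J_hat[0])
    # A_hat in IR^{(n+1) x t} has columns [grad f_i; -1].
    A_hat = [[J_hat[i][k] for i in range(t)] for k in range(n)]
    A_hat.append([-1.0] * t)
    N = n + 1 + t
    M = [[0.0] * N for _ in range(N)]
    for k in range(n):                      # I_hat = diag(I_n, 0)
        M[k][k] = 1.0
    for k in range(n + 1):
        for i in range(t):
            M[k][n + 1 + i] = A_hat[k][i]   # upper-right block A_hat
            M[n + 1 + i][k] = A_hat[k][i]   # lower-left block A_hat^T
    rhs = [0.0] * (n + 1) + [-fi for fi in f_hat]
    y = gauss_solve(M, rhs)
    return y[:n], y[n]                      # v and beta

# Two active linearized functions in IR^2 (made-up data):
v, beta = corrective_step([[1.0, 0.0], [0.0, 1.0]], [1.0, 3.0])
# The constraint forces v = (beta-1, beta-3); the shortest v has beta = 2.
```

For the data above, the constraint forces v = (β-1, β-3), and minimizing ||v|| gives β = 2 and v = (1, -1), which is what the system returns.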

To find a solution that satisfies both Kuhn-Tucker conditions, we use the following system of equations:

    [ Î    Â ] [ v̂ ]   [  0  ]
    [ Â^T  0 ] [ λ ] = [ -f̂ ],    (3.35)

where Â = [Ĵ, -e]^T in IR^((n+1) x t). By solving (3.35) we get the corrective step v. For a good introduction to and description of the Kuhn-Tucker conditions, the reader is referred to [MNT01, Chapter 2].

In (3.29) we used the Jacobian J evaluated at xt. It was, however, suggested in [JM94] that we instead could use J evaluated at x; this will give a more crude corrective step, but saves the gradient evaluation at xt. This goes in line with [JM94], stating that it would be advantageous to use (3.29) when the problem is highly non-linear, and otherwise calculate v based on J(x). We will investigate that in more detail in the following.

Figure 3.8: Various steps h of different length. The squares denote the corrective step v based on J(x+h); the circles denote v based on J(x). (Legend: x*, f1 = f2, tangent, h, FOC J(x), FOC J(x+h).)

It is interesting to notice that figure 3.8 shows that the step v based on J(x) is perpendicular to h. It turns out that the corrective step is only perpendicular to the basic step in certain cases, so the situation shown in figure 3.8 is a special case. The special case arises when the active set in the nonlinear problem is the same as in the linear problem, that is A = A_LP. As figure 3.8 clearly indicates, v based on J(x) is not as good as v based on J(xt): the latter gives a result much closer to the intersection between the two inner functions of the Parabola test function. The explanation below assumes that the corrective step is based on J(x).

The length of the basic step h affects the corrective step v, as seen in figure 3.8. If the length of the basic step is large, then the length of the corrective step will also be large, and vice versa. This is because the length of v is proportional to the difference in the active inner functions f of F; it was suggested in [JM94] that the difference in the function values of the active inner functions is proportional to the distance between xt and x̃t. So the further apart the values of the active inner functions are, the longer the corrective step becomes.
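For t = 2 active functions this proportionality can be made explicit: the equality constraint in (3.29) reduces to a single equation, whose shortest solution has a closed form. A sketch with made-up numbers:

```python
# For t = 2, problem (3.29) has a closed form: the constraint
# f1 + g1.v = f2 + g2.v (= beta) fixes only (g1 - g2).v = f2 - f1, and
# the shortest such v is
#     v = (f2 - f1) (g1 - g2) / ||g1 - g2||^2 ,
# so ||v|| is proportional to the gap between the active function values.
# (Made-up data for illustration.)

def corrective_step_t2(f1, f2, g1, g2):
    a = [x - y for x, y in zip(g1, g2)]
    scale = (f2 - f1) / sum(ai * ai for ai in a)
    return [scale * ai for ai in a]

v = corrective_step_t2(1.0, 3.0, [1.0, 0.0], [0.0, 1.0])   # -> [1.0, -1.0]
# Doubling the gap doubles the length of the corrective step:
w = corrective_step_t2(1.0, 5.0, [1.0, 0.0], [0.0, 1.0])   # -> [2.0, -2.0]
```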

The definition of the corrective step in (3.29) says that the corrective step should go to a place in the linearized landscape at xt, based on either J(x) or J(xt), where the active functions are equal. The cost function, however, demands that the length of v should be as short as possible. Therefore the corrective step always goes from A to B, where A illustrates the kink in the LP landscape at x and B illustrates the kink in the linearized landscape at xt. In general it is always the case that v will be orthogonal to the dashed line B, as illustrated in figure 3.9 (left), because v is the shortest distance between them.

Figure 3.9: The dashed lines illustrate the kinks in: A, the LP landscape at x, and B, the linearized landscape at xt. Left: Only when the basic step is a tangent to the nonlinear kink are the dashed lines parallel; in this case h will be perpendicular to v. Middle: Shows that h is not always perpendicular to v. Right: If J(xt) is used to calculate v, then the two kinks are not always parallel.

When J(x) is used to calculate v, and the basic step h is a tangent to the nonlinear kink where the active functions meet, the dashed lines will be parallel, and we have that h is perpendicular to v, because v is the shortest distance from xt to B and hence orthogonal to both A and B. Figure 3.9 (middle), however, shows a situation where x is not situated at A; then the dashed lines are not parallel to each other, and hence h and v are not orthogonal to each other. Figure 3.9 (right) illustrates the situation where the Jacobian evaluated at xt is used. It is seen that the dashed line B moves because of the change in the active inner functions at xt. This affects v; still, v is perpendicular to B, because the corrective step will always go to that place in the linearized landscape where the active functions at xt become equal.

As stated in (3.29), we should only use the functions that are active in F(x) to calculate the corrective step. We notice that if the active set A_LP is of size t = 1, then the corrective step v will be equal to the null vector:

    length(A_LP) = 1   implies   v = 0.    (3.36)

This is because the calculation of the corrective step has an equality constraint saying that all active functions should be equal. In the case where there is only one active function, every v in IR^n would satisfy the equality constraint, and since we also demand that the corrective step be as short as possible, the null vector is the corrective step with the shortest length; hence v = 0.

The pseudo code for the corrective step is presented in the following.

Function 3.1: Pseudo code for the corrective step.

    v := corr_step(f, J, i)
    begin
      reduce i, e.g. by QR factorization      {2}
      if length(i) <= 1
        v := 0
      else
        find [v, λ]^T by solving (3.35)       {1}
      return v
    end

Remarks: (1) J is either evaluated at x or at xt, and i is the index to the active functions of the LP problem in (3.22). (2) The active set is reduced so that the gradients of the active inner functions, evaluated at x or xt, are linearly independent.

3.3 SLP With a Corrective Step

In section 3.1 we saw that minimax problems can be solved by using SLP, which, however, gives slow convergence for problems that are not regular in the solution. To induce higher convergence rates, the classic remedy is to use second order derivatives; such methods use Sequential Quadratic Programming (SQP). For some problems, second order derivatives are not always accessible. The naive solution is to approximate the Hessian by finite differences (which is very expensive), or to do an approximation that for each iteration gets closer to the true Hessian; one of the preferred methods that does this is the BFGS update. Unfortunately, for problems that are large and sparse, this will lead to a dense Hessian (quasi-Newton SQP). For large and sparse problems we want to avoid a dense Hessian, because it has n x n elements and hence, for big problems, uses a lot of memory, and the iterations become more expensive. Methods have also been developed that preserve the sparsity pattern of the original problem in the Hessian [GT82].

In the following we discuss the CSLP algorithm, which uses only first order derivatives, with an addition that tries a corrective step each time a step otherwise would have failed. CSLP is Hessian free, and the theory gives us reason to believe that a good performance can be obtained. The CSLP algorithm was first proposed in 1992 in a technical note [JM92] and finally published by Jonasson and Madsen in BIT in 1994 [JM94].

3.3.1 Implementation of the CSLP Algorithm

The CSLP algorithm is built upon the framework of SLP presented in algorithm 3.1.
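A back-of-envelope sketch of the memory argument, with illustrative numbers that are not taken from the thesis:

```python
# Why a dense quasi-Newton Hessian hurts for large n: the approximation
# stores n*n doubles, regardless of the sparsity of the original problem.
# (Illustrative numbers only.)
def dense_hessian_bytes(n, bytes_per_double=8):
    return n * n * bytes_per_double

assert dense_hessian_bytes(1000) == 8_000_000        # 8 MB for n = 1000
assert dense_hessian_bytes(20000) == 3_200_000_000   # 3.2 GB for n = 20000
```

A sparse Jacobian with a few non-zeros per row would need only megabytes for the same n, which is why sparsity-preserving updates matter.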

The CSLP algorithm uses (like SLP) the gain factor ρ from (3.12) to determine whether the iterate xt = x + d is accepted or discarded. If the latter is the case, we want to try to correct the step by using function 3.1; so if ρ < ε, a corrective step is tried. This scheme is expected to reduce the number of iterations, in comparison with SLP, for certain types of problems, especially for highly non-linear problems like, e.g., the Rosenbrock function.

When the corrective step v has been found, a check is performed to see if ||v|| <= 0.9 ||h||. This check is needed to ensure that the algorithm does not return to where it came from, i.e. that the basic step added to the corrective step gives h + v = 0, which would lead to xt = x. If the check ||v|| <= 0.9 ||h|| was not performed, a situation like the one shown in figure 3.10 could arise, where the algorithm would oscillate between x and x + h until the maximum number of iterations is reached.

Figure 3.10: Side view of a 2d minimax problem. The kink in the solid tangents indicates h, and the kink in the dashed tangents indicates v. The step x + h is rejected.

If the check succeeds, then d = h + v. If the corrective step was taken, we have to recalculate ΔF(x, d), to get a new measure of the actual reduction in the nonlinear function F(x). This gives rise to a new gain factor ρ, which we treat in the same way as in the SLP algorithm. To confine the step d to the trust region, we check that ||d|| <= η; if not, we reduce the step so that d := (η/||d||) d. The reduction of d is necessary in order to prove convergence. The CSLP algorithm is described in pseudo code in algorithm 3.2, with comments.

3.3.2 Numerical Results

In this section we test the CSLP algorithm's performance on the two test cases used for SLP: the Rosenbrock function and the Parabola test function. This is followed by a test of SLP and CSLP on a selected set of test functions. Finally, Matlab's own minimax solver fminimax is tested on the same set of test functions; those results are shown in table 3.2. fminimax is a quasi-Newton SQP algorithm that uses soft line search.
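The step handling just described can be sketched as follows (a hypothetical Python helper, not the thesis's Matlab code):

```python
# Hypothetical helper for the step handling described above: keep the
# corrective step only if ||v|| <= 0.9 ||h||, then confine d = h + v to
# the trust region radius eta in the infinity-norm.
def combine_step(h, v, eta):
    norm2 = lambda z: sum(zi * zi for zi in z) ** 0.5
    if norm2(v) > 0.9 * norm2(h):
        return None                         # reject v: it could cancel h
    d = [hi + vi for hi, vi in zip(h, v)]
    dinf = max(abs(di) for di in d)
    if dinf > eta:                          # scale so that ||d||_inf = eta
        d = [eta / dinf * di for di in d]
    return d

d = combine_step([1.0, 0.0], [0.0, 0.5], eta=0.8)    # -> [0.8, 0.4]
assert combine_step([1.0, 0.0], [-0.95, 0.0], 0.8) is None
```

Returning None signals that the caller should fall back to the plain SLP rejection path, since a near-cancelling v would send the iterate back towards x.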

Algorithm 3.2: CSLP. The variables ε, kmax, δ and η are supplied by the user.

    begin
      k := 0;  x := x0;  found := false;  ρold := 2ε
      f := f(x);  J := J(x)
      repeat
        calculate x̂ by solving (3.2) or (3.3), using the setup in (3.6)
        h := x̂[1:n];  α := x̂[n+1]
        xt := x + h;  ft := f(xt)
        ρ := ΔF(x, h) / ΔL(x, h)
        if ρ < ε                               {the step failed; try a corrective step}
          ν := corr_step(ft, J, i)             {J based on either J(x) or J(x+h)}
          if ||ν|| <= 0.9 ||h||
            d := h + ν
            if ||d||∞ > η,  d := (η/||d||∞) d  {confine d to the trust region}
            xt := x + d;  ft := f(xt)
            ρ := ΔF(x, d) / ΔL(x, d)
        if ρ > ε                               {accept the iterate}
          x := xt;  f := ft;  J := J(xt)
        use the trust region update in (3.13);  ρold := ρ
        k := k + 1;  if stop or k > kmax,  found := true
      until found
    end

The algorithm should stop when a certain stopping criterion, e.g. ||d|| <= δ, indicates that a solution has been reached. The corrective step is calculated by using function 3.1; we can use either J based on J(x) or on J(x+h), where the latter is suggested in [JM94] if the problem is highly non-linear.

CSLP Tested on Rosenbrock's Function

Again we test for two extreme cases of ε, to see what effect ε has on the convergence rate. We used the following options in the test:

    η = 1,  ε = 0.01,  kmax = 100,  and a small stopping tolerance δ,    (3.37)

and the Jacobian for the corrective step was based upon x + h. The trust region radius η = 1 is suggested in [JM94], but in reality the ideal η depends on the problem being solved. We recommend to use ε = 10^-2 or smaller. The result is shown in figure 3.11 in the left column of plots. The solution was found in 8 iterations with x = [1, 1]^T, and the algorithm stopped on ||d|| <= δ with a precision of ||x - x*||∞ = 0.
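The exact trust region update (3.13) is not reproduced above; the sketch below is the classical gain-factor rule consistent with the 0.25/0.75 acceptance lines visible in the plots, and its shrink/grow constants should be read as assumptions rather than the thesis's values:

```python
# Classical gain-factor trust region rule, matching the 0.25/0.75 lines
# in the plots.  The shrink/grow constants are assumptions, not taken
# from the thesis's update (3.13).
def update_radius(eta, rho, shrink=0.5, grow=2.5):
    if rho < 0.25:      # poor agreement between F and the linear model
        return eta * shrink
    if rho > 0.75:      # very good agreement: allow longer steps
        return eta * grow
    return eta          # otherwise keep the radius unchanged
```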

We see that F(x) shows quadratic convergence at the end of the iterations, when we are close to x*. This is because the problem is regular: at least n + 1 functions are active in the solution, and the gradients satisfy the Haar condition. The gain factor ρ is never below ε, which indicates that the corrective step manages to save all the steps that would otherwise have failed. It is interesting to notice that the level of the trust region radius η is higher than that of SLP; compared with the SLP test, the corrective step has reduced the number of iterations significantly (see table 3.3) for Rosenbrock-like problems.

Figure 3.11: Performance of the CSLP algorithm when using the Rosenbrock function with w = 10, for ε = 0.01 (left column) and ε = 0.25 (right column). (Top row) Trust region radius η and F(x). (Bottom row) The acceptance area of a new step; the gain factor lies above the dotted line denoting the value of ε.

The same problem was tested with ε = 0.25, and the result, shown in the right column of plots in figure 3.11, is somewhat similar to that of ε = 0.01. The solution x* = [1, 1]^T was found in 17 iterations, with a precision of ||x - x*||∞ = 0, and the algorithm stopped on a small step ||d|| <= δ. In this case the corrective step has not managed to reduce the number of iterations, because the very conservative ε = 0.25 is used. Anyway, there is no need to be conservative: a small ε gives us a quick convergence towards a stationary point in the global part of the iterations, and this is a good sign; we can say that CSLP is a real enhancement of the SLP algorithm.

CSLP Tested on the Parabola Test Function

The same test that was done with SLP is done here with CSLP. The Parabola test function is of special interest because it is not regular at the solution: only two functions are active at x* in IR^2, and most commonly second order information is needed to get faster than linear convergence in such a case.

For this problem CSLP differs from SLP in an important way, and it is therefore interesting to see what effect the corrective step has on such a problem. The test was done with the following options:

    η = 1,  ε = 0.01,  kmax = 100,  and a small stopping tolerance δ.    (3.38)

The solution was found in 60 iterations, with a precision of ||x - x*||∞ = 5.87e-09, and the algorithm terminated on a small step. The result is shown in figure 3.12, left column of plots. We see a very interesting and fast decrease in F(x) in the first 10-15 iterations, caused by the corrective step; the corrective step very quickly gets x into the neighbourhood of the stationary point. When x is close to the stationary point, the corrective step no longer helps, and that is why we get linear convergence in the remaining iterations. In order to improve the convergence rate, we need second order information.

The same problem was tested with ε = 0.25, and the solution was found in 64 iterations with a precision of ||x - x*||∞ = 8.68e-09; again the algorithm stopped on a small step. The course of iterations is almost similar to that of ε = 0.01: the same behavior is present, even though the initial reduction of F(x) is somewhat slower. The result is shown in figure 3.12, right column.

Figure 3.12: Performance of the CSLP algorithm when using the Parabola test function, for ε = 0.01 (left column) and ε = 0.25 (right column). (Bottom row) The acceptance area of a new step; the gain factor lies above the stippled line denoting the value of ε.

The behavior of the variables in CSLP is shown in figure 3.12. Compared to the SLP test, we see that the gain factor is more stable in the start of the iterations (0-15) and at the end, when the trust region is reduced and no new steps are taken.

From figure 3.13 we see that the corrective step gets us close to the vicinity of a stationary point. For functions like Parabola, this yields that CSLP is a very promising method in the global part of the iterations, before we reach the vicinity of a stationary point. Another important fact is that CSLP does not perform worse than SLP.

Figure 3.13: Results from SLP and CSLP tested on the Parabola problem. The plot shows the Euclidean distance from x_k to the minimizer x* as a function of iterations. We see that CSLP reaches the vicinity of the minimizer faster, and in fewer iterations, than SLP.

3.3.3 Comparative Tests Between SLP and CSLP

In this section we present numerical results for the SLP and CSLP algorithms presented in the previous sections. Table 3.1 shows the test functions that have been used; see Appendix C for definitions. We have used an initial trust region radius η = 1 and, unless otherwise noted, ε = 0.01.

    Name          param     m    n    t
    Parabola                2    2    2
    Rosenbrock1   w = 10    2    2    4
    Rosenbrock2   w = 100   2    2    4
    BrownDen                20   4    3
    Bard1         yA        15   3    4
    Bard2         yB        15   3    4
    Ztran2f                 11   2    2
    Enzyme                  11   4    22
    El Attar                51   6    7
    Hettich                 5    4    4

    Table 3.1: Test functions used in the analysis.

The test results for SLP, and for CSLP with a corrective step based on J(x) and on J(x+h), are presented in table 3.3. When comparing the number of iterations, we see that the number of iterations is either equal or lower for CSLP than for SLP. When using CSLP, however, the number of function evaluations grows compared to SLP, because several corrective steps were attempted. For the Hettich function this has led to a reduction in iterations, but unfortunately also to an increase in function evaluations, which is also observed in [JM94]. In general, we see that CSLP performs just as well as SLP or better.

Matlab's minimax solver fminimax is an SQP algorithm that uses an approximated Hessian (quasi-Newton) with soft line search. fminimax uses almost as few iterations as CSLP, but due to its line search it uses a lot more function evaluations. The effect of using second order derivatives is seen in table 3.2.

    Algorithm: fminimax (minimax SQP, quasi-Newton, soft line search)

    Name          10^-2     10^-5     10^-8
    Parabola      7(15)     9(19)     8(17)
    Rosenbrock1   8(18)     8(18)     8(18)
    Rosenbrock2   19(44)    19(44)    19(44)
    BrownDen      7(15)     9(19)     9(19)
    Bard1         5(11)     6(13)     6(13)
    Bard2         7(15)     8(17)     8(17)
    Ztran2f       10(21)    14(30)    14(30)
    Enzyme        8(17)     -         -
    El Attar      16(33)    18(37)    18(37)
    Hettich       4(9)      3(7)      3(7)

    Table 3.2: Matlab's minimax solver fminimax tested upon the test functions. The numbers in the parentheses are the numbers of function evaluations.

The results are generally very acceptable, and fminimax shows good performance for test problems like Parabola, BrownDen, Ztran2f, Rosenbrock2 and Enzyme. For the Enzyme problem the number of iterations and function evaluations is very low, but the precision of the solution is not that good: fminimax terminates because the directional derivative is smaller than 2*TolFun, where TolFun = 1e-6. The poor result motivated a further investigation, and by using optimset, further information was extracted. The remedy is obviously to lower the value of TolFun, in order to get more iterations and a better solution. Rather surprisingly, lowering TolFun to 1e-7 triggers a loop that continues until the maximum number of function evaluations has been reached, and raising the maximum number of function evaluations from 500 to e.g. 1000 does not help; the precision of the solution stays the same.

For Rosenbrock's function we see a huge difference in performance between SLP and CSLP, and CSLP has shown itself quite successful on test problems like Rosenbrock1, Rosenbrock2, Enzyme and Hettich. Especially the Enzyme problem reveals the strength of CSLP: when the corrective step is based on J(x+h), the number of iterations is reduced by almost 75% compared to SLP, and the same goes for the number of function evaluations, which are reduced by almost 45%. The corrective step based upon J(x+h) does even better than the one based on J(x), which is in accordance with expectations and illustrated in figure 3.8. The test functions Bard1 and Bard2 share the common property that CSLP does not use corrective steps; hence for these functions SLP and CSLP show the same performance. El Attar uses only one corrective step, which only alters the results for CSLP by one iteration.

    Test of SLP
    Name          10^-2       10^-5       10^-8
    Parabola      11(12)      21(22)      31(32)
    Rosenbrock1   16(17)      16(17)      16(17)
    Rosenbrock2   40(41)      41(42)      41(42)
    BrownDen      19(20)      32(33)      42(43)
    Bard1         3(4)        4(5)        5(6)
    Bard2         3(4)        4(5)        5(6)
    Ztran2f       6(7)        21(22)      30(31)
    Enzyme        164(165)    168(169)    169(170)
    El Attar      7(8)        9(10)       10(11)
    Hettich       8(9)        19(20)      30(31)

    Test of CSLP - J(x)                             corrective steps
    Name          10^-2     10^-5     10^-8         10^-2    10^-5    10^-8
    Parabola      8(13)     15(27)    21(39)        4[0]     11[0]    17[0]
    Rosenbrock1   10(17)    11(18)    11(18)        7[1]     7[1]     7[1]
    Rosenbrock2   17(29)    18(30)    18(30)        11[0]    11[0]    11[0]
    BrownDen      15(20)    29(41)    42(57)        4[0]     11[0]    14[0]
    Bard1         3(4)      4(5)      5(6)          0[0]     0[0]     0[0]
    Bard2         3(4)      4(5)      5(6)          0[0]     0[0]     0[0]
    Ztran2f       6(7)      21(30)    30(43)        0[0]     8[0]     12[0]
    Enzyme        58(85)    64(94)    65(95)        32[6]    35[6]    35[6]
    El Attar      7(8)      9(10)     10(11)        1[1]     1[1]     1[1]
    Hettich       7(11)     18(33)    28(53)        5[2]     13[2]    26[2]

    Test of CSLP - J(x+h)                           corrective steps
    Name          10^-2     10^-5     10^-8         10^-2    10^-5    10^-8
    Parabola      8(15)     12(23)    21(41)        6[0]     10[0]    19[0]
    Rosenbrock1   7(13)     8(14)     8(14)         5[0]     5[0]     5[0]
    Rosenbrock2   9(14)     11(16)    11(16)        4[0]     4[0]     4[0]
    BrownDen      15(20)    29(41)    36(52)        4[0]     11[0]    15[0]
    Bard1         3(4)      4(5)      5(6)          0[0]     0[0]     0[0]
    Bard2         3(4)      4(5)      5(6)          0[0]     0[0]     0[0]
    Ztran2f       6(7)      21(30)    30(43)        0[0]     8[0]     12[0]
    Enzyme        26(48)    42(75)    43(76)        22[1]    33[1]    33[1]
    El Attar      6(8)      8(10)     9(11)         1[0]     1[0]     1[0]
    Hettich       5(7)      12(21)    21(39)        3[2]     10[2]    19[2]

    Table 3.3: Unless otherwise noted, η0 = 1 and ε = 0.01. Columns 2-4: number of iterations at accuracies 10^-2, 10^-5 and 10^-8, with the number of function evaluations in parentheses ( ). Columns 5-7: attempted corrective steps, with the number of failed corrective steps in brackets [ ].

In general, the tables show that if we have the possibility of using second order information, it generally pays to use it, especially when the problem is not regular at the solution.

3.3.4 Finishing Remarks

For regular solutions, where second order information is not necessary to ensure quadratic convergence, the performance of CSLP is very promising. For problems that do not have regular solutions, like the Parabola test function, we see a performance of CSLP that is as good as SLP. The important difference between CSLP and SLP in the non-regular case is that CSLP seems to be able to reach the vicinity of the stationary point faster than SLP. This makes CSLP a good algorithm in the global part of the iterations.

As stated in the start of this section, it is not trivial to use second order information for problems where it is not available. In such cases the approximated Hessian (quasi-Newton) tends to be dense, and this can pose a problem for large and sparse systems.

The work of Madsen [Mad86] proposes a minimax algorithm that is a two stage method, called the combined method. For the global part of the iterations a method like SLP (method 1) is used to get into the vicinity of the minimizer. When in the vicinity, a switch is made to a method that uses second order information (method 2), to ensure fast final convergence. It could be interesting to try the combined method of Madsen, with method 1 replaced by CSLP.


Chapter 4

Constrained Minimax

We now turn our attention to the solution of minimax problems with constraints. First we introduce the theory of constrained minimax, through examples and illustrations, followed by a presentation of how to solve constrained minimax problems by using a penalty function.

We start by describing the constrained optimization problem

    min over x :  F(x)   s.t.   C(x) <= 0,    (4.1)

where

    F(x) = max { f_j(x) },  j = 1, ..., m,    (4.2)
    C(x) = max { c_i(x) },  i = 1, ..., p.    (4.3)

The constraint C(x) <= 0 defines a feasible domain D that is a subset of IR^n. Further, we assume that the inner functions c_i(x) are differentiable in the domain D, and that the function of constraints C(x) is a convex function that is piecewise differentiable. When noting that C(x) is a minimax function, we can define the generalized gradient of the constraints ∂C(x) in a similar way as we did with ∂F(x) in (2.9):

    ∂C(x) = { sum over i in A_c of λ_i c_i'(x)  |  sum λ_i = 1, λ_i >= 0 },   A_c = { i | c_i(x) = C(x) }.    (4.4)

4.1 Stationary Points

We could define a stationary point by using the first order Kuhn-Tucker condition for optimality, which yields

    0 in ∂F(x) + sum over i of λ_i c_i'(x),   λ_i >= 0,   λ_i c_i(x) = 0,   i = 1, ..., p.    (4.5)

This is natural when realizing that for C(x) < 0 the constraints have no effect: if constraint i is not violated, then c_i(x) < 0, which leads to λ_i = 0, and the condition reduces to the unconstrained case shown in (2.6). The interesting case is when C(x) = 0, i.e. we are at the border of the feasible region D.
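The quantities in (4.1)-(4.3) can be sketched in a few lines of Python; the inner functions below are made-up illustrations, not the thesis's test problems:

```python
# Plain-Python sketch of the constrained minimax quantities (4.1)-(4.3).
# The inner functions below are made-up illustrations.

def F(fs, x):
    """Objective: maximum of the inner functions f_j at x."""
    return max(f(x) for f in fs)

def C(cs, x):
    """Constraint function: maximum of the inner constraints c_i at x."""
    return max(c(x) for c in cs)

def active(funcs, x, tol=1e-12):
    """Indices of the functions attaining the maximum at x (active set)."""
    vals = [f(x) for f in funcs]
    m = max(vals)
    return [i for i, v in enumerate(vals) if m - v <= tol]

fs = [lambda x: x[0] + x[1], lambda x: x[0] - x[1]]
cs = [lambda x: x[0]**2 + x[1]**2 - 1.0]    # feasible domain: unit disc

x = (0.5, 0.0)
# F(fs, x) = 0.5, C(cs, x) = -0.75 < 0 (interior point), and both
# inner functions are active because x[1] = 0.
```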

Second, we must define what is meant by a stationary point in a constrained minimax framework. The stationary point could, however, also be defined slightly differently, by using the Fritz-John stationary point condition [Man65, p. 94], which says that

    0 in λ_0 ∂F(x) + sum over i = 1, ..., q of λ_i c_i'(x),   sum over i = 0, ..., q of λ_i = 1,    (4.6)

where the multipliers λ_i >= 0 for i = 0, ..., q. We see that the two conditions are equivalent if 0 < λ_0 <= 1, since dividing by λ_0 recovers Kuhn-Tucker multipliers λ_i/λ_0; so there is in fact no huge difference between the two stationary point conditions. But this only holds when F(x) in the nonlinear programming problem (4.1) is a smooth function, and for some cases the stationary point condition based on Kuhn-Tucker will not suffice. There are, however, other cases to consider, e.g. λ_0 = 0 with λ not 0, and for that Merrild [Mer72] defined the map

    M(x) = ∂F(x)                        if C(x) < 0,
           conv{ ∂F(x) union ∂C(x) }    if C(x) = 0,
           ∂C(x)                        if C(x) > 0.    (4.7)

As proposed in [Mad86], we will use the Fritz-John stationary point condition in the following.

Definition 4.1
x in D is a stationary point of the constrained minimax problem in (4.1) if

    0 in M(x).    (4.8)

At the border, where C(x) = 0, the condition says that the null vector should belong to the convex hull spanned by ∂F(x) and ∂C(x):

    0 = λ f + (1 - λ) c,   f in ∂F(x),  c in ∂C(x),  0 <= λ <= 1.    (4.9)

This is illustrated through an example in the following.
Figure 4.1: The convex hull for three different values of λ: λ = 0 (dotted line), λ = 0.5 (dashed line) and λ = 1 (solid line). Stationary point condition due to (left) Kuhn-Tucker, (right) Fritz-John.

We now illustrate why the Fritz-John condition is preferred over Kuhn-Tucker, in the following example. We have the problem

    min over x :  F(x)   subject to   c1(x) <= 0,  c2(x) <= 0,   x in IR^2,    (4.10)

where the constraints c1 and c2 describe two unit discs that touch only at the origin. From the constraints it is seen that the feasible domain D is the point x = (0, 0), so the solution must be that point regardless of F. An illustration of the problem is given in figure 4.2 (left).

Figure 4.2: The dashed line shows the convex hull due to the Kuhn-Tucker stationary point condition; the solid lines indicate Fritz-John, for multiple values of λ.

If we used the Kuhn-Tucker stationary point condition, then the convex hull (dashed line) would not have 0 as part of its set, and hence the point would not be stationary due to the definition, which is obviously incorrect. When the Fritz-John stationary point condition is used, the convex hull can be drawn as indicated on the figure by the solid lines; here we see that 0 can be a part of the convex hull if λ = 0. So the Fritz-John condition can be used to define the stationary point successfully.

The Fritz-John condition is, however, not without its drawbacks, because if λ = 0 then it is not certain that Fd'(x) >= 0 for every direction d at x. If we linearize the constraints in (4.10), then we have the situation shown in figure 4.2 (right). Here we see that the line between c1'(x) and c2'(x) can be a stationary point regardless of F, even when F is increasing along the line. The last example shows that it is not possible to generalize proposition 2.1 to the constrained case.

As defined in [Mad86], we introduce the feasible direction.

Definition 4.2
For d in IR^n with d not 0, d is a feasible direction from x in D if there exists an ε > 0 so that x + t d in D for 0 <= t <= ε.
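A concrete instance of this phenomenon, assumed here for illustration (the exact constraint formulas of (4.10) are not reproduced from the thesis), takes two unit discs tangent at the origin. The Fritz-John condition then holds at x* = (0, 0) with λ_0 = 0, regardless of the objective:

```python
# Assumed illustration: two unit discs touching only at the origin, so
# that the feasible domain D is the single point (0, 0).
def c1(x): return (x[0] - 1.0)**2 + x[1]**2 - 1.0
def c2(x): return (x[0] + 1.0)**2 + x[1]**2 - 1.0

def grad_c1(x): return (2.0 * (x[0] - 1.0), 2.0 * x[1])
def grad_c2(x): return (2.0 * (x[0] + 1.0), 2.0 * x[1])

x_star = (0.0, 0.0)
g1, g2 = grad_c1(x_star), grad_c2(x_star)   # (-2, 0) and (2, 0)

# Fritz-John with lambda_0 = 0: 0 = mu1*g1 + mu2*g2 with mu1 = mu2 > 0,
# so x_star is stationary regardless of the objective F.
mu1 = mu2 = 0.5
fj = (mu1 * g1[0] + mu2 * g2[0], mu1 * g1[1] + mu2 * g2[1])

# Kuhn-Tucker would instead need -grad F = mu1*g1 + mu2*g2 with mu >= 0;
# since g1 and g2 span only the x1-axis, an objective whose gradient has
# a vertical component (e.g. grad F = (0, 1)) admits no such multipliers.
```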

One way we can avoid the situation described in the last example is to assume that C(x) is negative for some x, i.e. that there exists y in IR^n with C(y) < 0. In that case the feasible domain D would not be a point. We see that when C(x) = 0 and λ > 0 (because there exists y in IR^n with C(y) < 0), the Fritz-John stationary point condition 0 = λ f + (1 - λ) c can be written as

    0 = f + µ c,    (4.11)

where µ = (1 - λ)/λ >= 0, f in ∂F(x) and c in ∂C(x). We see that in this case the Fritz-John and Kuhn-Tucker stationary conditions are equivalent.

Proposition 4.1
If C(x) = 0 and d is a feasible direction, then Cd'(x) <= 0.

Proof: [Mad86] pp. 53.

This means that the directional derivative of the constraints, Cd'(x), must be negative or zero when we are at the border of D and go in a feasible direction. This must hold since C(x) is a convex function. Due to the definition of the directional derivative in chapter 2, we then have that

    Cd'(x) <= 0   implies   c^T d <= 0,    (4.12)

which is obvious, as illustrated in figure 4.3.

Figure 4.3: The point x indicates a solution to the constrained problem, and the feasible direction is indicated by d.

Next we use the feasible direction d. This leads to the following proposition, which holds for stationary points.

Proposition 4.2
Let x in D. If Fd'(x) >= 0 for all feasible directions d from x, then

    0 in M(x).    (4.13)

On the other hand, if

    0 in M(x)   and   there exists y in IR^n with C(y) < 0,    (4.14)

then Fd'(x) >= 0 for all feasible directions d.

Proof: [Mad86] pp. 54.

We see that the first part of the proposition is very similar to the unconstrained case. The second part of the proposition deals with the case where the iterations of the unconstrained problem are stopped by constraints at x: for the unconstrained problem we would have a negative directional derivative at x, so in the constrained case, where C(x) = 0, F has an influence on the stationary point condition because λ > 0. From proposition 4.1 we see that if d is a feasible direction, then it must hold that Cd'(x) <= 0, and therefore, by (4.11) and (4.12),

    0 = f^T d + µ c^T d   with   µ c^T d <= 0,

so f^T d >= 0 and hence Fd'(x) >= 0.

CHAPTER 4. CONSTRAINED MINIMAX   47

4.2 Strongly Unique Local Minima

A strongly unique local minimum in the constrained case is, as in the unconstrained case, a minimum that is not a plane or a line etc., but a point, and it can be defined uniquely by using first order information. As in the unconstrained case, algorithms will have quadratic final convergence when going to a strongly unique local minimum.

For the constrained case we can not directly make a statement saying that F'_d(x) > 0 for all directions d, because C(x) does not influence the landscape of F(x) in any way — we shall later see that such an influence is possible through the use of a penalty function. If C(x) < 0 we just have the unconstrained problem, and then we know that F'_d(x) > 0 for all directions d. We can, however, not generalize this to the directional gradients when C(x) = 0. It is, however, possible to say something about F'_d(x) if d is a feasible direction, i.e. C'_d(x) ≤ 0.

The following proposition defines the condition that has to be satisfied in order to have a strongly unique local minimum.

Proposition 4.3   For x ∈ D we have a strongly unique local minimum if 0 ∈ int M(x).

Proof: [Mad86, p. 57].

We will briefly give an explanation inspired by a proof in [Mad86, p. 56]. We assume that 0 ∈ int M(x). Then we can introduce a kind of radius measure ε > 0 so that for every vector z ∈ IRⁿ

  ‖z‖ ≤ ε   ⟹   z ∈ M(x).   (4.15)

This is illustrated in figure 4.4 with the 2-norm.

Figure 4.4: z is defined inside the circle of radius ε, which is a subset of M(x).

Let d ∈ IRⁿ be a feasible direction. Then we introduce a vector that lies on the edge of the circle (in the 2-norm case), v = εd/‖d‖, so that ‖v‖ = ε and v ∈ M(x). We can do this since the Fritz-John condition can create any vector in M(x), and v belongs to a subset of M(x); hence we can write the vector v as

  v = λf + (1 − λ)c,   f ∈ ∂F(x), c ∈ ∂C(x).   (4.16)

48   CHAPTER 4. CONSTRAINED MINIMAX

By taking the inner product of v and the feasible direction d we see that vᵀd = ε‖d‖. Because we go in a feasible direction, C'_d(x) ≤ 0 and hence cᵀd ≤ 0 due to proposition 4.1, so

  vᵀd = λ fᵀd + (1 − λ) cᵀd = ε‖d‖   ⟹   λ fᵀd ≥ ε‖d‖,   (4.17)

and from this we get the inequality

  F'_d(x) ≥ λ fᵀd ≥ ε‖d‖ > 0.   (4.18)

So even in the constrained case, at a strongly unique local minimum F'_d(x) is strictly positive for all feasible directions.

4.3 The Exact Penalty Function

In constrained minimax optimization, as in other types of optimization, the constraints c_i(x) do not have an influence on the minimax landscape F(x). Because of this, the theory of constrained minimax is somewhat different from the unconstrained theory. We can, however, use the unconstrained theory in constrained minimax if we use an exact penalty function. An exact penalty function is, as in other types of optimization, a function that has the same minimizer x* as the constrained problem. Therefore, by using an exact penalty function, we can use all the tools and theory from unconstrained minimax on a minimax problem with constraints.

We now introduce an exact penalty function for minimax that has the following form for σ > 0:

  min_x P(x, σ),   P(x, σ) = max { f_j(x) ;  f_j(x) + σ c_i(x) },   j = 1, …, m,  i = 1, …, q.   (4.19)

We see that if σ ≥ 0 and x is feasible, then c_i(x) ≤ 0 for all i, and in this situation we have

  f_j(x) + σ c_i(x) ≤ f_j(x)   ⟹   P(x, σ) = max_j { f_j(x) } = F(x).   (4.20)

Notice that in this case σ is irrelevant. Conversely, when x is infeasible, some c_i(x) > 0 and the penalty terms dominate.

In order to simplify the following notation we introduce the inner functions of P(x, σ):

  p_t(x, σ) = f_j(x),               t = j,  j = 1, …, m,   (4.21)
  p_t(x, σ) = f_j(x) + σ c_i(x),    t = m + 1, …, z,   (4.22)

where z is defined in the following discussion in (4.34).

CHAPTER 4. CONSTRAINED MINIMAX   49

To resume the discussion from before, the general setup of p_t(x, σ) that will be used here is

  p(x, σ) = [ f_1(x), …, f_m(x),  f_1(x) + σc_1(x), …, f_m(x) + σc_1(x),  …,  f_1(x) + σc_q(x), …, f_m(x) + σc_q(x) ]ᵀ.   (4.23)

Further we have that P(x, σ) = max_t { p_t(x, σ) }, the gradients are denoted by p'_t(x, σ), and finally the generalized gradient is denoted ∂P(x, σ). So the feasible area D is defined by C(x) ≤ 0.

We illustrate this by an example. We have the following functions

  f_1(x) = (3/2) x²,   f_2(x) = (3/2)(x − 1)²,   c_1(x) = x − 3/10,   (4.24)

and we solve the problem

  min_x F(x)   s.t.   C(x) = c_1(x) ≤ 0   (4.25)

by using the penalty function P(x, σ) in (4.19). We see that the unconstrained minimax function F(x) = max { f_1(x), f_2(x) } has an unconstrained minimum at x_u = 0.5, and the optimal feasible solution to the constrained minimax problem is x* = 0.3. However, σ > 0 is not enough to ensure that P(x, σ) solves the constrained problem in (4.25); in fact σ has to be large enough. The problem is shown in figure 4.5.

Figure 4.5: The penalty function P(x, σ) plotted for σ ∈ {0, 0.25, 1}.
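The effect of σ on the unconstrained minimizer of P can be checked numerically. The sketch below assumes the example functions above, f1(x) = (3/2)x², f2(x) = (3/2)(x − 1)² and c1(x) = x − 3/10, and uses a crude grid search in place of a proper minimax solver:

```python
# Grid-search minimizer of the exact penalty function P(x, sigma)
# for the one-dimensional example (assumed functions, see text).

def P(x, sigma):
    f = [1.5 * x**2, 1.5 * (x - 1)**2]      # inner functions f_j
    c = [x - 0.3]                            # constraint c_1(x) <= 0
    inner = f + [fj + sigma * ci for fj in f for ci in c]
    return max(inner)

def argmin_P(sigma, lo=0.0, hi=1.0, n=1000):
    xs = [lo + (hi - lo) * i / n for i in range(n + 1)]
    return min(xs, key=lambda x: P(x, sigma))

print(argmin_P(1.0))   # small sigma: minimizer stays at the infeasible x = 0.5
print(argmin_P(3.0))   # sigma > 2.1: minimizer moves to the solution x* = 0.3
```

For σ = 1 the minimizer of P is the infeasible point 0.5, while for σ = 3 it jumps to the constrained solution 0.3 — the behaviour figure 4.5 illustrates.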

50   CHAPTER 4. CONSTRAINED MINIMAX

The figure clearly illustrates that σ = 1 is not large enough. Indeed, if we let σ grow, we will eventually get an unconstrained minimizer of P(x, σ) at x* = 0.3, which is the solution to (4.25).

The example raises a basic question: How can we detect whether σ is large enough? When the constrained solution is known, the size of σ can be calculated. As a theoretical insight, we give an example based on the previous example. At x* = 0.3 we see that there must be two active functions,

  p_2(x, σ) = (3/2)(x − 1)²,   p_4(x, σ) = (3/2)(x − 1)² + σ(x − 3/10).   (4.26)

We see that only p_4(x, σ) is influenced by σ, and by using that F'_d(x) ≥ 0 at a stationary point we must have

  0 ≤ p'_4(x) = σ − 3(1 − x).   (4.27)

This yields σ ≥ 2.1 for x = x*. So we should choose σ ≥ 2.1, because then a possible strongly unique local minimum at x* would give the desired quadratic convergence. If σ < 2.1 then F'_d(x*) < 0 for some direction, and x* will not be a strongly unique local minimum of P; the unconstrained minimizer of P(x, σ) then remains at x = 0.5. In practice, however, we can not evaluate the size of σ as sketched above, for the simple reason that we do not know the solution x*. If the constraints are violated we therefore just increase σ > 0 by following some specific scheme. The simplest way to check whether σ should be increased is to check if any of the constraints have been violated, that is

  if C(x) > 0
     increase σ   (4.28)
  end

e.g. multiply σ by 10 when C(x) > 0. Such a scheme is also implemented in the fortran program MINCIN for linearly constrained minimax optimization, which is part of the package Robust Subroutines for Non-linear Optimization [MNS02].

4.4 Setting up the Linear Subproblem

As described in detail in chapter 3, we need to sequentially solve a linear subproblem in order to solve the non-linear minimax problem. When using an exact penalty function (4.19) we have to set up the linear subproblem in a special way, which is described in the following. When i = 1, …, q and j = 1, …, m, the exact penalty function is

  min_x P(x, σ) = max { f_j(x) ;  f_j(x) + σ c_i(x) } = max_t { p_t(x, σ) }.   (4.29)
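The threshold σ ≥ 2.1 can be verified directly from the subgradient condition at x* = 0.3: the interval spanned by the derivatives of the two active inner functions must contain zero. A small check, again assuming the example functions above:

```python
# Stationarity check at the constrained solution x* = 0.3 of the (assumed)
# example: the active inner functions are p2 and p4, with derivatives
# p2' = 3(x - 1) and p4' = 3(x - 1) + sigma.

def is_stationary(sigma, x=0.3):
    dp2 = 3.0 * (x - 1.0)           # derivative of (3/2)(x-1)^2
    dp4 = dp2 + sigma               # derivative of p4 = p2 + sigma*(x - 3/10)
    return dp2 <= 0.0 <= dp4        # 0 must lie in the subgradient interval

print(is_stationary(2.0))   # False: sigma below the threshold 3*(1 - 0.3) = 2.1
print(is_stationary(2.5))   # True
```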

CHAPTER 4. CONSTRAINED MINIMAX   51

The exact penalty function (4.29) gives rise to at least m·q extra inner functions, not counting the first m inner functions f_j(x). We see that in order to write the exact penalty function into a linear program, we first need the m functions f_j(x) and then all the combinations of i and j in f_j(x) + σc_i(x). We then form the linear programming problem

  min_{h,α} G(h, α) = α
  s.t.   p_t(x, σ) + p'_t(x, σ)ᵀ h ≤ α,   t = 1, …, z.   (4.30)

The situation covered in the above example is for inequality constraints of the form c_i(x) ≤ 0. We can, however, also handle equality constraints quite easily by just adding additional constraints to the LP problem. We can write an equality constraint c_i(x) = 0 as

  c_i(x) ≤ 0   and   −c_i(x) ≤ 0,   (4.31)

so each equality constraint gives rise to two inequality constraints in the LP problem. We order the constraints so that the first r constraints correspond to equality constraints and the rest, q − r, correspond to inequality constraints. Then

  p_t(x, σ) = f_j(x)                 for t = 1, …, m,   (4.32)
  p_t(x, σ) = f_j(x) + σ c_i(x)      for t = m + 1, …, z,   (4.33)

where each of the r equality constraints appears with both signs. The total number of inner functions in the LP problem for m inner functions and q constraints, of which r are equality constraints, is

  z = m(1 + q + r).   (4.34)

We give an example of this from the problem in (4.25) shown in figure 4.5. We have that the number of inner functions is m = 2 and the number of constraints is q = 1, with r = 0. This gives z = 4, and the linear system contains the inner functions

  p_1 = f_1(x),   p_2 = f_2(x),   p_3 = f_1(x) + σc_1(x),   p_4 = f_2(x) + σc_1(x).

4.5 An Algorithm for Constrained Minimax

We can now outline an algorithm that solves a constrained minimax problem by using an exact penalty function. Due to the nature of the exact penalty function P(x, σ), we can solve it as an unconstrained minimax problem; hence we can use the tools discussed in chapter 3 to solve P(x, σ). The easiest way to do this is to form the inner functions inside the test function itself. First we decide an initial penalty factor; this could e.g. be σ = 1. Then we form the linear programming problem as described above. Of course we can not guarantee that σ = 1 is big enough so that the minimizer of P(x, σ) corresponds to the constrained minimizer of F(x); hence we need an update scheme like the one proposed in (4.28) to update σ. A stopping criterion for the algorithm will be discussed later. First we give a more formal outline of the algorithm.
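The bookkeeping above can be sketched in a few lines. The helper below is illustrative (names and shapes are my own); it assembles the values of the inner functions p_t from given values of the f_j and c_i, doubling the first r (equality) constraints:

```python
# Assemble the inner-function values p_t of P(x, sigma) from m values f_j(x)
# and q constraint values c_i(x), of which the first r are equalities
# (each written as two inequalities, +c_i <= 0 and -c_i <= 0).

def inner_functions(f_vals, c_vals, sigma, r):
    cs = list(c_vals[:r]) + [-ci for ci in c_vals[:r]] + list(c_vals[r:])
    p = list(f_vals)                                  # p_1..p_m: the f_j themselves
    p += [fj + sigma * ci for ci in cs for fj in f_vals]
    return p

# Example (4.25): m = 2, one inequality constraint (q = 1, r = 0) -> z = 4.
p = inner_functions([0.135, 0.735], [0.0], sigma=5.0, r=0)
print(len(p))   # 4 = m * (1 + q + r)
```

With m = 4 and one equality constraint (q = 1, r = 1), the same routine gives z = 12 inner functions, matching the p_1, …, p_12 of the second example below.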

52   CHAPTER 4. CONSTRAINED MINIMAX

Algorithm 4.5.1  CMINMAX

  begin
    k := 0;  x := x0;  σ := σ0;  found := false
    repeat
      x := slp(fun, fpar, x, opts)        (1)
      if C(x) > 0
        increase σ                        (2)
      else
        found := true                     (3)
      k := k + 1
    until found or k > kmax
  end

(1) Here we use an unconstrained minimax solver. We can use the SLP or CSLP algorithms from chapter 3, or Matlab's own minimax solver fminimax. The function fun should be an exact penalty function, where σ is passed to fun by using fpar. The user selects the initial value of σ. (2) If the iterate is infeasible, σ is increased. (3) If C(x) < 0 then we have found an unconstrained minimizer inside the feasible region D and we should stop. Else if C(x) = 0 then a constrained minimizer has been found, and again we should stop. If σ is large enough and x0 is feasible, then the algorithm will behave as an interior point method; otherwise it will behave as an exterior point method.

We now give two examples that use CMINMAX to find the constrained minimax solution. Both examples use the Rosenbrock function subject to inequality and equality constraints, and with the classic starting point x0 = [−1.2, 1]ᵀ. First we look at the system, where x ∈ IR²,

  min_x F(x) = Rosenbrock,   s.t.   c1(x) = xᵀx − 0.2 ≤ 0.   (4.35)

We have m = 4 inner functions and q = 1 constraint. We find the minimizer of this problem by using the CMINMAX algorithm with σ0 = 0.05. At σ = 0.05 the algorithm finds a solution outside the feasible area, indicated by the dashed circle on the figure. The algorithm then increases σ to 0.5, and we get closer to the feasible area, but still we are infeasible. Finally σ = 5 is large enough, and we find the constrained minimizer at x* = [0.4289, 0.1268]ᵀ. Figure 4.6 illustrates the behavior of the algorithm for each σ.
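A toy version of the CMINMAX outer loop can be run on the one-dimensional example from section 4.3 (assumed functions f1, f2, c1 from the text); a grid search stands in for the SLP/CSLP unconstrained minimax solver:

```python
# Toy CMINMAX outer loop on the one-dimensional example (assumed functions).

def P(x, sigma):
    f = [1.5 * x**2, 1.5 * (x - 1)**2]
    return max(f + [fj + sigma * (x - 0.3) for fj in f])

def solve_unconstrained(sigma, n=1000):
    xs = [i / n for i in range(n + 1)]
    return min(xs, key=lambda x: P(x, sigma))

def cminmax(sigma=0.25, kmax=20, tol=1e-9):
    for _ in range(kmax):
        x = solve_unconstrained(sigma)
        if x - 0.3 > tol:        # C(x) > 0: infeasible, increase sigma
            sigma *= 10.0
        else:                    # feasible: constrained minimizer found
            return x, sigma
    return x, sigma

x, sigma = cminmax()
print(x, sigma)   # reaches x* = 0.3 once sigma has grown past 2.1
```

With σ0 = 0.25 the first solve lands at the infeasible x = 0.5, σ is multiplied by 10, and the second solve returns the constrained minimizer x* = 0.3 — the same exterior-point behaviour seen in figure 4.6.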

CHAPTER 4. CONSTRAINED MINIMAX   53

Figure 4.6: Iterations of the CMINMAX algorithm, with an infeasible starting point. The dashed circle contains the feasible area.

The following table is a printout of the values of the inner functions of P(x, σ) at the iterate found for each value of σ.

[Table: values of the inner functions f_1, …, f_4 and p_1, …, p_8 at the iterates for σ = 0.05, 0.5 and 5.]

From the above function evaluations we notice that max f_j = max p_t for σ = 5.

Next we present another example of constrained minimax optimization, where we also use the Rosenbrock function, but with slightly different constraints. We have the following problem

  min_x F(x) = Rosenbrock,   s.t.   c1(x) = xᵀx − 0.2 = 0.   (4.36)

For this problem with an equality constraint, the constrained solution is found at x* = [0.4289, 0.1268]ᵀ for σ = 5; this is the same minimizer as in the previous example. The behavior of CMINMAX with σ0 = 0.05 can be seen in figure 4.7. As shown on the figure, for σ = 5 we have two minima; this will become important in the later discussion regarding stopping criteria for CMINMAX.

Figure 4.7: Iterations of the CMINMAX algorithm, with an infeasible starting point. The dashed circle indicates the feasible area.

If we had chosen the initial penalty

54   CHAPTER 4. CONSTRAINED MINIMAX

factor σ0 = 5, then we would have found the other minimum at x2* = [−0.3599, 0.2655]ᵀ.

Again we bring a table of the inner function values of P(x, σ) for each value of σ.

[Table: values of the inner functions f_1, …, f_4 and p_1, …, p_12 at the iterates for σ = 0.05, 0.5 and 5.]

Again, for σ = 5 we have that max f_j = max p_t. From the tables of the inner functions in the two examples we saw that max f_j = max p_t was always satisfied when the constrained minimizer was found, i.e. when σ was large enough. This knowledge can be used as a stopping criterion for CMINMAX.

This indicates that at least one of the inner functions of the unconstrained minimax problem F(x) has to be active at the constrained solution, corresponding to a stationary point of P(x, σ) when σ is large enough. This is also indicated by proposition 4.2: when the feasible domain D ⊂ IRⁿ is more than a point, i.e. there exists y ∈ IRⁿ for which C(y) < 0, then we can use the Fritz-John stationary point condition, which says

  0 = λf + (1 − λ)c,   f ∈ ∂F, c ∈ ∂C,   where λ > 0.   (4.37)

The inner functions of F(x) correspond to the first m inner functions of P(x, σ),

  f_j(x) = p_t(x, σ),   t = j,  j = 1, …, m,   (4.39)

so we see that the first four rows in the tables of the two previous examples correspond to the inner functions of F(x). We check whether one of the inner functions of F(x) is active. The solution to the exact penalty function in the k'th iteration of CMINMAX is denoted by x_k. We can then define the stopping criterion

  P(x_k, σ) − F(x_k) ≤ ε,   i.e.   max_t { p_t(x_k, σ) } − max_j { f_j(x_k) } ≤ ε,   (4.38)

where the term 0 < ε ≪ 1 handles the numerical issues regarding the stopping criterion. If the above criterion is not satisfied, then it indicates that x_k ∉ D.
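The criterion (4.38) is cheap to evaluate. A sketch on the (assumed) one-dimensional example from section 4.3:

```python
# Stopping test (4.38): stop when max_t p_t(x, sigma) - max_j f_j(x) <= eps,
# i.e. when an inner function of F itself is active in P.

def stop_criterion(x, sigma, eps=1e-8):
    f = [1.5 * x**2, 1.5 * (x - 1)**2]
    p = f + [fj + sigma * (x - 0.3) for fj in f]
    return max(p) - max(f) <= eps

print(stop_criterion(0.5, 2.5))   # False: a sigma-dependent p_t dominates
print(stop_criterion(0.3, 2.5))   # True: max p_t equals max f_j at x*
```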

CHAPTER 4. CONSTRAINED MINIMAX   55

The only way this can happen is if x_k ∉ D. In that case some p_t(x, σ) with t > m dominates, so the functions p_t(x, σ) = f_j(x) for t = 1, …, m can not be active, and there will be no corresponding positive multiplier. The stopping criterion (4.38) will also work if there is an unconstrained minimum inside the feasible domain, x_k ∈ D. The major advantage of this stopping criterion is that we do not have to supply CMINMAX with the value of C(x).

The disadvantage of the stopping criterion (4.38) is that it introduces a new preset parameter ε to the CMINMAX algorithm. However, we can avoid this preset parameter by seeing that (4.38) should be interpreted as a check of whether some of the first m inner functions of P(x, σ) are active, that is

  max_j { f_j(x) } = max_t { p_t(x, σ) }.   (4.40)

From the theory in chapter 2 we know that if p_t(x, σ) is active then the corresponding multiplier will be positive. This gives rise to the following preset-parameter-free but equivalent stopping criterion:

  max { λ_t :  t = 1, …, m } > 0   ⟹   stop.   (4.41)

The demand that C(x_k) ≤ 0 is not needed when we use (4.40) or (4.41), because if x ∉ D then max f_j(x) < max p_t(x, σ) for σ > 0, as shown in (4.20) and (4.21).

As an addition to this presentation, we give a short argument for why the following stopping criterion does not work:

  ‖x_{k+1} − x_k‖ ≤ ε   ⟹   stop.

As we saw at the start of this section on figure 4.5, the solution x_k is not guaranteed to move when σ is increased, if the penalty factor is not high enough. Therefore x_{k+1} = x_k could happen even if we have not found the constrained minimizer.

4.6 Estimating the Penalty Factor

In the previous description of the CMINMAX algorithm, an increase of the penalty factor σ was proposed if the current iterate x_k was not feasible. It was also suggested that in this case σ should be increased by a factor of ten. The motivation behind this choice of multiplication factor is that we want to find a σ that is large enough. One could argue, however, that the multiplication factor could also have been chosen as two, three or maybe even twelve — in other words, we have no idea what the right size of the multiplication factor should be. In this section we will take a closer look at the multiplication factor, and use the theory of constrained minimax to give us a value of σ_k that guarantees a shift of stationary point for x_k ∉ D.
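The multiplier-based test (4.41) reduces to inspecting the first m entries of the multiplier vector returned by the LP subproblem. A minimal sketch (the multiplier vectors below are hypothetical values, chosen only for illustration):

```python
# Parameter-free stopping test (4.41): stop when one of the multipliers
# belonging to the first m inner functions (the f_j themselves) is positive.

def stop_on_multipliers(lam, m):
    return max(lam[:m]) > 0.0

# Hypothetical multipliers for m = 2 inner functions plus two penalty terms:
print(stop_on_multipliers([0.0, 0.6, 0.0, 0.4], m=2))   # True: f_2 is active
print(stop_on_multipliers([0.0, 0.0, 0.3, 0.7], m=2))   # False: only penalty terms active
```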

56   CHAPTER 4. CONSTRAINED MINIMAX

The idea is to iteratively increase σ so that the unconstrained minimizer x of P(x, σ) eventually corresponds to the minimizer x* of the minimax problem with constraints, i.e.

  min_x F(x)   s.t.   C(x) ≤ 0.

One could also look at this from a different perspective: as a question of finding the value σ = σ_k that will force a shift from a stationary point x_k ∉ D to a new stationary point x_{k+1}. We denote a shift of stationary point by x_k ⇒ x_{k+1}, and now a more precise formulation of the problem can be given: Find the smallest penalty factor σ_k that triggers a shift of stationary points x_k ⇒ x_{k+1}, where x_k ∉ D.

First we look at the condition for a stationary point when a penalty function is being used. If we have a stationary point x so that 0 ∈ ∂P(x, σ) and the constraints are violated, C(x) > 0, then

  0 = Σ_{t∈A} λ_t p'_t(x, σ),   A = { t :  p_t(x, σ) = P(x, σ) }.

In (4.22) we saw that t corresponds to some combination of f_j(x) and c_i(x). By using this correspondence we get

  0 = Σ_{t∈A} λ_t ( f'_j(x) + σ c'_i(x) ),   f'_j(x) ∈ ∂F(x),  c'_i(x) ∈ ∂C(x).   (4.42)

That t ∈ A means that p_t(x, σ) = f_j(x) + σc_i(x) is active, with j ∈ P_f and i ∈ P_c. It is seen that f_j(x) and c_i(x) are part of this expression even though they need not be active in F and C themselves, and hence we call them pseudo active. We can then write (4.42) as

  0 = Σ_{j∈P_f} λ_j f'_j(x) + σ Σ_{i∈P_c} λ_i c'_i(x),

which leads to

  σ = ‖ Σ_{j∈P_f} λ_j f'_j(x) ‖₂ / ‖ Σ_{i∈P_c} λ_i c'_i(x) ‖₂.   (4.43)

We now have an expression that calculates σ when we know the multipliers and the gradients of the inner functions and constraints at x.

We give a brief example based on the two examples in the previous section for σ = 0.5, where we find σ by using the above formula. In the two examples the solution was the same in the second iteration of CMINMAX, where x_2 = [0.5952, 0.3137]ᵀ. The active inner functions at x_2 were p_t(x_2, 0.5) with t = 6, 7 for the example with inequality constraints,

  p_6(x_2, 0.5) = f_2(x_2) + 0.5 c_1(x_2),   p_7(x_2, 0.5) = f_3(x_2) + 0.5 c_1(x_2).

Here are the numerical results:

  λ_{6:7} = [0.9686, 0.0314]ᵀ,   Σ λ_t = 1.0000,   σ = 11.9034.

CHAPTER 4. CONSTRAINED MINIMAX   57

In the following calculations we use v = Σ_{j∈P_f} λ_j f'_j(x) and w = Σ_{i∈P_c} λ_i c'_i(x), so that the stationarity relation reads 0 = v + σw and (4.43) can be written as

  σ = − vᵀw / ‖w‖₂².   (4.44)

For the example we get

  w = c'_1(x_2) λ_{6:7,sum} = [1.1903, 0.6275]ᵀ,

since c'_1(x_2) = 2x_2 = [1.1903, 0.6275]ᵀ and the multipliers sum to one, while v = f'_{2:3}(x_2)ᵀ λ_{6:7}. We have now shown that we can calculate σ at x if we know the multipliers λ.

We can also calculate the multipliers for a given value of σ, but not by using (4.43). The value of σ changes the "landscape" of P(x, σ), which again also changes the multipliers λ. The relation between λ and σ should become apparent when looking upon λ as gradients [Fle00, 14.16]. For a problem with linear constraints we can write

  min_{x,α} G(x, α) = α   s.t.   c_i(x) ≤ ε_i + α.   (4.45)

Notice the perturbation ε_i of the constraints. The above system can be analyzed via the Lagrange function

  L(x, α, λ) = α − Σ_i λ_i ( c_i(x) − ε_i − α ),

and if we differentiate this with respect to ε_i we get

  dL/dε_i = λ_i.   (4.46)

Figure 4.8: Constraint c_2 has been perturbed by ε.
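The computation of σ from v and w can be sketched directly. The gradients and multipliers below are synthetic illustration values, not taken from the Rosenbrock example:

```python
# Solve 0 = v + sigma * w for sigma, where v = sum_j lambda_j * grad f_j and
# w = sum_i lambda_i * grad c_i over the pseudo-active functions.

def penalty_factor(grads_f, lam_f, grads_c, lam_c):
    v = [sum(l * g[k] for l, g in zip(lam_f, grads_f)) for k in range(2)]
    w = [sum(l * g[k] for l, g in zip(lam_c, grads_c)) for k in range(2)]
    # least-squares solution of v + sigma*w = 0
    return -(v[0] * w[0] + v[1] * w[1]) / (w[0] ** 2 + w[1] ** 2)

# Synthetic data: v works out to (-3, -6) and w to (1, 2), so sigma = 3
# exactly restores stationarity.
sigma = penalty_factor([(-2.0, -4.0), (-4.0, -8.0)], [0.5, 0.5],
                       [(1.0, 2.0)], [1.0])
print(sigma)   # 3.0
```

At an exact stationary point v and w are anti-parallel, so the least-squares value coincides with the ratio of norms in (4.43).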

58   CHAPTER 4. CONSTRAINED MINIMAX

The above is only valid if the active set in the perturbed and unperturbed system is the same. This shows that the multipliers can in fact be looked upon as gradients. From [HL95, section 4.7] we see that the multipliers are the same as the shadow prices known from operations research. The figure also illustrates why λ_i can not be negative at a stationary point.

As we have seen in the previous, if the penalty factor is not high enough, then the CMINMAX algorithm will find an infeasible solution x_k. The above shows that an increase of σ will lead to a change of λ_i, and (4.43) indicated that a change of λ_i indicates a change of σ. This means that λ and σ are related to each other; because of this relation it would be beneficial to ignore the multipliers altogether. It would be interesting to find the value σ_k = σ̂_k where the penalty factor is exactly high enough to trigger x_k ⇒ x_{k+1}. The solution to the problem of finding σ̂_k is described in the following. To simplify the problem somewhat, we make two assumptions; in particular, at x_k only one constraint c_i is pseudo active.

First, the stationary point condition is equivalent to

  0 ∈ ∂P(x, σ) = conv { p'_t(x, σ) :  t ∈ A }   ⟺   0 = Σ_{t∈A} λ_t p'_t(x, σ),  Σ λ_t = 1,  λ_t ≥ 0.   (4.47)

If we are at an infeasible stationary point x_k ∉ D — which indicates P(x_k, σ) = p_t(x_k, σ) for some t > m — and increment the penalty factor by ∆σ, then

  P(x_k, σ + ∆σ) = f_j(x_k) + (σ + ∆σ) c_i(x_k) = p_t(x_k, σ) + ∆σ c_i(x_k),   (4.48)

which gives rise to the stationary point condition

  0 ∈ ∂P(x_k, σ + ∆σ) = conv { p'_t(x_k, σ) + ∆σ c'_i(x_k) :  t ∈ A },   (4.49)

where t ∈ A means p_t(x_k, σ) = P(x_k, σ). In this case we can illustrate the convex hull like the one shown in figure 4.9 (left).

We are interested in finding the ∆σ that translates the convex hull ∂P(x_k, σ) so that 0 is at the edge of the hull,

  0 ∈ edge ∂P(x_k, σ̂_k),   (4.50)

because then x_k will no longer be a stationary point, and the CMINMAX algorithm will find a new stationary point at x_{k+1}. Figure 4.9 (right) shows γ, defined as the length from 0 to the intersection between the convex hull ∂P(x_k, σ) and the direction opposite to c'_i(x_k). This is a somewhat ambiguous definition, because we always get two points a_i of intersection with the hull; we choose the point a_i = v that has vᵀ c'_i(x_k) < 0.

CHAPTER 4. CONSTRAINED MINIMAX   59

Figure 4.9: p_t(x, σ) and c_i(x) have been substituted by p_t and c_i on the figure. Left: the solid triangle indicates the convex hull of ∂P(x, σ); the convex hull of ∂P(x, 0) is indicated by the dashed triangle. Right: the length from 0 to the border of the hull in the opposite direction of c'_1 is indicated by γ, the length of the solid line.

We can now find the translation of the convex hull so that 0 is at the edge of the hull. Using the intersection point v, with v ∈ edge ∂P(x_k, σ) and vᵀ c'_i(x_k) < 0, we have

  γ = − vᵀ c'_i(x_k) / ‖c'_i(x_k)‖₂.   (4.51)

The translation ∆σ c'_i(x_k) must have length γ, that is ‖∆σ c'_i(x_k)‖ = γ, and hence

  ∆σ = γ / ‖c'_i(x_k)‖₂.   (4.52)

We now have that

  0 ∈ edge ∂P(x_k, σ_k + ∆σ),   (4.53)

where σ_k indicates the value of σ at the k'th iteration of CMINMAX. The value σ_k + ∆σ does not, however, trigger a shift to a new stationary point x_k ⇒ x_{k+1}, because x_k is still stationary. In order to find the trigger value σ̂_k we add a small value 0 < ε ≪ 1, so that

  σ̂_k = σ_k + ∆σ + ε.   (4.54)

Then it will hold that 0 ∉ ∂P(x_k, σ̂_k); hence x_k is no longer stationary, and CMINMAX will e.g. find a new stationary point at x_{k+1}.

We illustrate the theory by a simple constrained minimax example with linear inner functions and linear constraints. Consider the following problem

  min_x F(x) = max { f_j(x) },   j = 1, …, 4,
  s.t.   C(x) = max { c_1(x), c_2(x), c_3(x) } ≤ 0,   (4.55)

where the inner functions f_j and the constraints c_i are linear. See figure 4.10 for an illustration of the problem.

Figure 4.10: Illustration of the problem in (4.55), for σ = 0. The figure indicates the feasible area D as the region inside the dashed lines. The generalized gradient ∂P(x) is indicated by the solid lines; constraints are indicated by dashed lines.

The problem has two critical values of σ that will trigger a change of stationary point. When σ = 1 + ε, where ε = 0.0001, there is a shift of stationary point as shown in figure 4.12. When σ = 1.5 + ε there is a new shift of stationary point, from x_2 to x_3, as shown in figure 4.14. When x_3 is reached the algorithm should stop, because some λ_t > 0 for t = 1, …, m indicates that x_3 ∈ D, i.e. p_t(x_3, 1.5001) = f_j(x_3) for some active t. For the values σ ∈ {σ_1, σ_2} = {1, 1.5} the problem contains an infinite number of stationary points, as illustrated in figure 4.11; for all other values of σ it is only possible to get three different strongly unique local minima. In the interval σ ∈ [1, 1.5] the generalized gradient ∂P(x_2, σ) based on the multipliers performs a rotation-like movement, as shown in figure 4.13. Note that ∆σ is calculated on the generalized gradient defined only by the gradients p'_t(x_2, σ), and thus the multipliers are not taken into account.
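The trigger-value computation (4.51)-(4.54) can be sketched for the 2-D case: γ is found by intersecting the ray from 0 opposite to the constraint gradient with the edges of the (polygonal) convex hull, and ∆σ = γ/‖c'_i‖. The hull and gradient below are illustrative numbers, not taken from problem (4.55):

```python
import math

# Trigger value sigma_hat = sigma + dsigma + eps, where dsigma translates the
# convex hull of the active gradients so that 0 reaches its edge (2-D sketch).

def trigger_sigma(hull, grad_c, sigma, eps=1e-4):
    n = math.hypot(*grad_c)
    d = (-grad_c[0] / n, -grad_c[1] / n)          # unit direction opposite grad_c
    gamma = None
    m = len(hull)
    for k in range(m):                             # intersect ray t*d with each edge
        (ax, ay), (bx, by) = hull[k], hull[(k + 1) % m]
        ex, ey = bx - ax, by - ay
        det = d[0] * (-ey) - d[1] * (-ex)
        if abs(det) < 1e-12:
            continue
        t = (ax * (-ey) + ay * ex) / det           # ray parameter (distance from 0)
        s = (d[0] * ay - d[1] * ax) / det          # edge parameter
        if t > 0 and 0 <= s <= 1:
            gamma = t if gamma is None else min(gamma, t)
    return sigma + gamma / n + eps                 # (4.52) and (4.54)

# Triangle hull containing 0; grad_c = (0, 1): the ray points straight down
# and exits through the edge y = -1, so gamma = 1 and dsigma = 1.
print(trigger_sigma([(-1.0, -1.0), (1.0, -1.0), (0.0, 2.0)], (0.0, 1.0), 1.0))   # 2.0001
```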

CHAPTER 4. CONSTRAINED MINIMAX   61

Figure 4.11: For σ = 1 and σ = 1.5 there is an infinity of solutions, as illustrated by the solid lines. The line perpendicular to the solid line indicates the generalized gradient.

Figure 4.12: For σ = 1 ± 0.0001 there is a shift of stationary point x_1 ⇒ x_2, illustrated for σ = 0.9999 and σ = 1.0001.

Figure 4.13: For σ ∈ [1, 1.5], here σ = 1.1 and σ = 1.4, ∂P(x_2, σ) based on the multipliers performs a rotation-like movement.

The intermediate and the final solutions found for this example were

  x_1 = [2, 0]ᵀ,   x_2 = [0, 0]ᵀ,   x_3 = [0.2, 0.4]ᵀ.

62   CHAPTER 4. CONSTRAINED MINIMAX

Figure 4.14: For σ = 1.5 ± 0.0001 a shift occurs from the stationary point x_2 to x_3, illustrated for σ = 1.4999 and σ = 1.5001.

For example, for σ = 0.1234 and x = x_1 the convex hull is defined by the gradients p'_9, p'_10 and p'_11. The pseudo-active constraint gradient is c'_2 = [0.8766, 0.4383]ᵀ, and the intersection with the convex hull is found at v = [1.0, 0.5]ᵀ, which yields ∆σ = 0.8766. Then σ_1 = σ + ∆σ + ε = 1.0000 + ε triggers the shift of stationary point x_1 ⇒ x_2.

Until now, we have not taken the multipliers into account. It turns out that if the edge of the convex hull is known, then we can find the multipliers quite easily, and vice versa. By [HL95, Chapter 6], every linear programming problem is associated with another programming problem, called the dual problem. Let us say that the LP problem in (3.4) is the primal problem,

  P:   min gᵀx̂   s.t.   A x̂ ≥ b.

Then we can find the corresponding dual problem,

  D:   max bᵀy   s.t.   Aᵀy = g,  y ≥ 0,

see [HL95, Table 6.14]. Surprisingly, it turns out that there is a connection between the distance γ to the edge of the generalized gradient ∂P(x, σ) and the dual problem: the dual solution to the primal problem can be visualized as a distance from origo to the edge of a convex hull, where there is a point of intersection v with c', so that vᵀc' < 0. Both the geometric construction and the dual problem can be used to find the penalty factor. In other words, the estimation of the penalty factor σ can be obtained by solving a dual problem, which is interesting in itself. In practice this can be exploited by CMINMAX, because the multipliers influence the penalty factor.

The above shows. which is interesting in itself. From the previous we know that the intersection point on the convex hull with c 2 x is at the line segment So we can write the following system of equations where we. λ4 0 25 and σ shift of stationary point in the primal problem. We now have a theory for ﬁnding the exact value of σ that triggers a shift of stationary point.57)  ¦ }T \$ s ¦ ¦ &  s ¦ ¥ ¤R & ¥ ¤R ¦ ¥ ¤R f1 x 1 f2 x 1 c2 x 0 r λ1 λ2 σ 0 1 ( \$ ¦ \$ ` \$   ¦ \$ '¥ ¤ R  #¥ ¤ R ¦  " u u λ 1 f1 x λ 2 f2 x ∑ λi 1 λi 0i & s 12 ¦ &  s ¦ ¥ ¤R ¥ ¤R ¥ ¤R & ¦ ¥ ¤R f1 x 1 f2 x 1 f3 x 1 r λ1 λ2 λ3 0 1 q \$ t¦ o We give an example of this.57) propose to use a σ where σk is multiplied with ξ.g. will trigger a shift of stationary point. λ2 0 75 and σ 1. 1 5. we had And the solution is λ1 0 25. For non-linear problems. where ξ 1 is a multiplication factor. In practice.. We get the following result λ1 0 25. This is why we in (4. it is expected that using σ k as the penalty factor.   \$&&& \$ ¦ & ¥ ¤ r r ¦  .  T T \$ T ¦ σk ξσk  ¦ T & ¦ where the solution is λ2 0 75. λ2 0 25 and λ3 0 5. j 1 3. Again σ \$ s ¦  s ¥ ¤R & ¥ ¤R ¦ T ¥ ¤R f2 x 1 f4 x 1 c2 x 0 r λ1 λ2 σ   & ¦ r q \$ ¦ o At x 0 0 T the penalty factor can be found by solving this system 0 1 σ ε will trigger a (4. We see that σ σ ε would trigger a shift of stationary point in the primal problem. where we use. compared with the previous system. that is based on the previous examples. A new update heuristic can now be made. have replaced a column in A T and a row in y. however. still connected to the problem being solved through σ k . could be close to xk 1 and hence the CMINMAX algorithm will use many iterations to converge to the solution.C HAPTER 4. C ONSTRAINED M INIMAX 63 2 0 T . that the estimation of the penalty factor σ can be obtained by solving a dual problem. this can be exploited by CMINMAX. At x three active functions f j x . 
The above heuristic guarantees to trigger a shift of stationary point, but the iterate x_k found could be close to x_{k+1}, and hence the CMINMAX algorithm will use many iterations to converge to the solution. The heuristic is, however, still connected to the problem being solved through σ_k.


Chapter 5

Trust Region Strategies

In this section we will take a closer look at the various update strategies for trust regions. The discussion will have its main focus on the ∞-norm. Further, we will also discuss the very important choice of parameters for some of the trust region strategies, and finally we look at good initial values for the trust region radius.

Practical tests performed on the SLP and CSLP algorithms suggest that the choice of both the initial trust region radius and its update have a significant influence on the performance. In fact this holds not only for SLP and CSLP, but is a common trademark for all trust-region methods [CGT00, p. 781].

All basic trust region update strategies implement a scheme that, on the basis of some measure of performance, decides whether to increase, decrease or keep the trust region radius. The measure of performance is usually the gain factor ρ that was defined in (3.12); it measures how well our model of the problem corresponds to the real problem. On basis of this we can write a general update scheme for trust regions:

  ρ ≥ ξ₂ indicates a very successful iteration, V = {ρ | ρ ≥ ξ₂},    (5.1)
  ρ ≥ ξ₁ indicates a successful iteration, S = {ρ | ρ ≥ ξ₁},        (5.2)

and

  η_new ∈ [η, ∞)        if ρ ≥ ξ₂,
  η_new ∈ [γ₂η, η]      if ξ₁ ≤ ρ < ξ₂,                            (5.3)
  η_new ∈ [γ₁η, γ₂η]    if ρ < ξ₁,

where 0 < γ₁ ≤ γ₂ ≤ 1. We see that it always holds that V ⊆ S. An iteration that is neither in V nor in S is said to be unsuccessful, and will trigger a reduction of the trust region radius. If the iteration is in S we have a successful iteration, and the trust region radius is left unchanged if γ₂ = 1, or reduced otherwise. Finally, if the iteration is very successful, the trust region radius is increased. In the following we will look closer at three update strategies within this framework.
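The general scheme can be written as a small function. This is only an illustration of the framework, not code from the thesis; the threshold and scaling values below (0.25, 0.75, 0.5, 2.5) are the classical choices discussed next.

```python
def update_radius(eta, rho, xi1=0.25, xi2=0.75, shrink=0.5, grow=2.5):
    """Discontinuous trust region update: grow the radius on very
    successful iterations, shrink it on unsuccessful ones, else keep it."""
    if rho >= xi2:      # very successful
        return grow * eta
    if rho < xi1:       # unsuccessful
        return shrink * eta
    return eta          # successful: keep the radius

print(update_radius(1.0, 0.9))   # -> 2.5
print(update_radius(1.0, 0.5))   # -> 1.0
print(update_radius(1.0, 0.1))   # -> 0.5
```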

The above description gives rise to the following classical update strategy

  if ρ > 0.75
    η := 2.5η
  elseif ρ < 0.25                (5.4)
    η := 0.5η

so ξ₁ = 0.25 and ξ₂ = 0.75, and if the iteration is very successful the trust region radius is increased by a factor 2.5. These factors were used in the SLP and CSLP algorithms and were proposed in [JM94] on the basis of practical experience. Factors like these are most commonly determined from practical experience, because the determination of their values has no obvious theoretical answer. In [CGT00, chapter 17] they give this comment on the determination of factor values: "As far as we are aware, the only approach to this question is experimental".

An illustration of the update strategy is shown in figure 5.1. The figure clearly shows jumps in η_new/η when ρ passes over 0.25 and 0.75, hence the expression discontinuous trust region update.

Figure 5.1: The classical trust region update in (5.4) gives rise to jumps in η_new/η across 0.25 and 0.75.

To clarify that the optimal choice of ξ₁ and ξ₂ is very problem dependent, we give an example for the Rosenbrock and ztran2f test functions. The number of iterations is calculated as a function of ξ₁ and ξ₂, where x₀ = (−1.2, 1)ᵀ was used for Rosenbrock and x₀ = (120, 80)ᵀ was used for ztran2f. The result is shown in figure 5.2; low values are indicated by the darker areas. It is seen from the figure that the optimal choice for the Rosenbrock function is ξ₁ ∈ [0, 0.1] and ξ₂ ∈ [0.8, 1]. For the ztran2f problem we get a different result, namely ξ₁ ∈ [0.3, 0.5] and ξ₂ ∈ [0.85, 1]. This illustrates how problem dependent the optimal choice of parameters is.

Figure 5.2: The number of iterations as function of ξ₁ and ξ₂. (left) Rosenbrock. (right) ztran2f.

Recall that [JM94] claimed that the parameters ξ₁ = 0.25 and ξ₂ = 0.75 were optimal, based on the cumulated experience of the authors. An experiment was therefore done where γ₁ = γ₂ = 0.5 and the multiplication factor 2.5 for a very successful step were held fixed, and the mean value of the normalized iterations was calculated on a set consisting of twelve test problems. The aim of the experiment was to see whether or not ξ₁ = 0.25 and ξ₂ = 0.75 is optimal when looking at the whole test set. The result is shown in figure 5.3.

Figure 5.3: The optimal choice of ξ₁ and ξ₂ with γ₁ = γ₂ = 0.5 and the multiplication factor 2.5 for a very successful step fixed. Calculated on basis of a set consisting of twelve test problems.

The figure shows that ξ₁ = 0.25 and ξ₂ = 0.75 is not a bad choice; however, it also seems to indicate that this choice is not the most optimal either. Further, the figure shows a somewhat flat landscape, which can be interpreted as the test problems having different optimal values of ξ₁ and ξ₂. It is important to realize that the plot was produced from only twelve test problems, and the result is also dependent on the starting point and the choice of γ₁ and γ₂. Therefore we can not conclude that the above choice of parameters is bad, and hence we resort to experience.

5.1 The Continuous Update Strategy

We now move on to looking at different update strategies for the trust region radius. An update strategy that gives a continuous update of the damping parameter µ in Marquardt's algorithm has been proposed in [Nie99a], for 2-norm problems in connection with the famous Marquardt algorithm. The conclusion there is that the continuous update has a positive influence on the convergence rate, and this positive result is the motivation for also trying this update strategy in a minimax framework.

Without going into details, a Marquardt step h_m can be defined by

  (f″(x) + µI) h_m = −f′(x).    (5.5)

We see that µ is both a damping parameter and a trust region parameter. If the Marquardt step fails, µ is increased, with the effect that f″(x) + µI becomes increasingly diagonally dominant. This has the effect that the steps are turned towards the steepest descent direction.

When µ is increased by the Marquardt update, the length of h_m is reduced; it is shown in [MNT99, 3.20] that h_m minimizes the linear model L(h) among all steps with ‖h‖ ≤ ‖h_m‖, and [FJNT99, p. 57] shows that for large µ we have h_m ≈ −(1/µ)f′(x). Therefore µ is also a trust region parameter. Let us look at the update formulation for Marquardt's method as suggested in [Nie99a] and described in [FJNT99]

  if ρ > 0
    x := x + h;  µ := µ·max{1/γ, 1 − (β − 1)(2ρ − 1)^p};  ν := 2;
  else                                                          (5.6)
    µ := µ·ν;  ν := 2ν;

where p has to be an odd integer, and β and γ are positive constants. The Marquardt update, however, has to be changed so that it resembles the discontinuous update strategy that we use in the SLP and CSLP algorithms

  if ρ > 0
    η := η·min{max{1/γ, 1 + (β − 1)(2ρ − 1)^p}, β};  ν := 2;
  else                                                          (5.7)
    η := η/ν;  ν := 2ν;

We see that this change does not affect the essence of the continuous update. That is, we still use a degree p polynomial to fit the three levels of η_new/η shown in figure 5.1. An illustration of the continuous update strategy is shown in figure 5.4.

Figure 5.4: The continuous trust region update (solid line) eliminates the jumps in η_new/η across 0.25 and 0.75. Values used: γ = 2, β = 2.5 and p = 5. The thin line shows p = 3, and the dashed line shows p = 7.

The interpretation of the variables in (5.7) is straightforward. 1/γ controls the lowest level of η_new/η, and β controls the highest level. The intermediate values are defined by a degree p polynomial, where p has to be an odd number; p controls how well the polynomial fits the interval [0.25, 0.75], where η_new/η = 1 in the discontinuous update strategy. A visual comparison between the discontinuous and the continuous update strategy is seen in figure 5.5. The left and right plots show the results for the Enzyme and Rosenbrock
test functions. The continuous update strategy shows superior results iteration-wise on the Enzyme problem, but a somewhat poorer result than the discontinuous update on the Rosenbrock problem. The results are given in table 5.1.
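The continuous rule can be sketched as a function. The parameter values are those assumed from the text (γ = 2, β = 2.5, odd p); the ν bookkeeping doubles the shrink factor after consecutive failed steps.

```python
def continuous_update(eta, nu, rho, gamma=2.0, beta=2.5, p=5):
    """Continuous trust region update in the style of (5.7): a degree-p
    polynomial in rho replaces the three fixed levels of eta_new/eta."""
    if rho > 0:
        factor = 1.0 + (beta - 1.0) * (2.0 * rho - 1.0) ** p
        return eta * min(max(1.0 / gamma, factor), beta), 2.0
    return eta / nu, 2.0 * nu   # failed step: shrink, and shrink harder next time

# The discontinuous levels are recovered at the extremes:
print(continuous_update(1.0, 2.0, 1.0)[0])   # -> 2.5 (very successful)
print(continuous_update(1.0, 2.0, 0.5)[0])   # -> 1.0 (radius unchanged)
```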
Figure 5.5: The continuous vs. the discontinuous update, tested on two problems. Left: The Enzyme problem. Right: The Rosenbrock problem.

From the figure we see that the continuous update gives a much smoother update of the trust region radius η for both problems. The jumps that are characteristic for the discontinuous update are almost eliminated by the continuous strategy. However, the continuous update does not always lead to fewer iterations:

  Iterations     (5.4)   (5.7)   (3.13)  (5.10)
  Parabola         31      33      31      23
  Rosenbrock1      16      20      16      24
  Rosenbrock2      41      51      41     186
  BrownDen         42      47      42      38
  Ztran2f          30      30      30      20
  Enzyme          169     141     169     242
  El Attar         10       8      10      10
  Hettich          30      28      30      19

Table 5.1: Number of iterations when using different trust region update strategies. The strategies are indicated by their equation numbers.

The continuous and the discontinuous update have been tried on the whole test set, and no significant difference in iterations was found between them. This seems to suggest that they are equally good; still, one could argue that the discontinuous update gives a more intuitively interpretable update.

5.2 The Influence of the Steplength
In the general formulation (5.1)–(5.3) of the trust region update we saw that the step length s_k had no influence on the radius of the trust region. This is not reasonable when one realizes that the gain factor ρ_k is calculated from a sample x_k and x_k + s_k, and is the only parameter that determines an increase or decrease of the radius.


One could make the argument that an update strategy that includes s_k will be more correct on a theoretical level. If the step does not go to the edge of the trust region, then ρ_k is not a measure of trust for the whole region, but just for a subregion that is defined by η_sub = ‖s_k‖. Figure 5.6 tries to illustrate why it is reasonable to include s_k in an update strategy; the subregion is illustrated by the dashed border.

Figure 5.6: When the step sk does not go to the border of the trust region, then ρk is only a measure of trust for a subregion, illustrated by the dashed lines.

All manipulation of the trust region should then be performed by using the subregion. Here is a more formal definition [CGT00, p. 782]

  η_new = max{α₁·‖s_k‖∞, η}   if ρ ≥ ξ₂,
  η_new = η                   if ξ₁ ≤ ρ < ξ₂,        (5.8)
  η_new = α₂·‖s_k‖∞           if ρ < ξ₁,

where 0 < α₂ < 1 < α₁. We saw in the previous numerical experiment regarding SLP and CSLP that when the algorithms approached a strongly unique local minimum, the trust region radius grew. An illustration of this effect is shown in figure 3.4. It is exactly effects of this kind that (5.8) aims to eliminate, because there is no theoretical motivation to increase the trust region when s_k → 0.

The factors α₁ and α₂ should, as above, be determined from practical experience. In [CGT00, (17.1.2)] they propose to use the values

  α₁ = 2.5  and  α₂ = 0.25.        (5.9)

They do not say on which type of functions this experience was gained, so it could be any kind of problems. A test was done to see if the above factors are suitable for our minimax test problems. We have examined the average number of normalized iterations as a function of α₁ and α₂ in an area of ±0.2 around the point suggested in (5.9). The result is shown in figure 5.7. It shows that the values α₁ = 2.5 and α₂ = 0.25 are not a bad choice; the figure seems to suggest that a small decrease in iterations could be obtained by choosing other values, but this reduction does not seem to be significant.

Figure 5.7: The average of the normalized iterations as a function of α₁ and α₂. Twelve test problems were used in the evaluation.

Using the above factors yields the following update strategy

  η_new = max{2.5·‖s_k‖∞, η}   if ρ ≥ 0.75,
  η_new = η                    if 0.25 ≤ ρ < 0.75,        (5.10)
  η_new = 0.25·‖s_k‖∞          if ρ < 0.25.

The strategy has also been used on the test problems, and the results are given in table 5.1. We see that the performance, when looking at the number of iterations, is better than that of the other update strategies. The only exceptions are the Rosenbrock and the Enzyme test problems, where the performance is significantly poorer than for the other update strategies.

Finally we give an example of the step update strategy in (5.10). We can expect that such a strategy will be more dynamic, and therefore be able to reduce its trust region more quickly than the discontinuous update, because every update involves the step length s_k. Figure 5.8 shows the step update strategy vs. the discontinuous strategy on the ztran2f problem. The figure shows that the step update is able to reduce the trust region radius faster than the discontinuous update. This gives nice results for the ztran2f problem, but for a problem like Rosenbrock this property of the step update gives rise to a lot more iterations. See table 5.1.

Figure 5.8: The discontinuous update strategy vs. the step update strategy in (5.8) tested on the ztran2f problem, a problem of designing an electrical circuit.

5.3 The Scaling Problem

The scaling problem arises when the variables of a problem are not of the same order of magnitude, e.g. a problem of designing an electrical circuit, where one variable is measured
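The step-length aware rule (5.10) can be sketched as follows; the ∞-norm of the step is passed in as `step_norm`.

```python
def step_update(eta, rho, step_norm):
    """Trust region update in the style of (5.10): changes of the radius
    are tied to the actual step length rather than the old radius."""
    if rho >= 0.75:
        return max(2.5 * step_norm, eta)   # very successful: grow, never shrink
    if rho < 0.25:
        return 0.25 * step_norm            # unsuccessful: shrink to the subregion
    return eta

print(step_update(1.0, 0.9, 0.1))   # -> 1.0 (short step: radius kept)
print(step_update(1.0, 0.9, 1.0))   # -> 2.5
print(step_update(1.0, 0.1, 0.4))   # -> 0.1
```

Note how a very successful but short step leaves the radius untouched — exactly the behaviour that prevents the radius from growing when s_k → 0.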

in Farad and the other in Ohm. The value in Farad would be of magnitude 10⁻¹² and the value in Ohm of magnitude 10⁶. Now we wish to calculate the norm of a step h_k = (r_k, c_k)ᵀ, where r_k is measured in Ohm and c_k in Farad

  ‖h_k‖₂² = r_k² + c_k²,   with  r_k² ≈ 10¹²  and  c_k² ≈ 10⁻²⁴.

This is a huge difference, and if the computer is limited to 15 digit precision then ‖h_k‖₂ = r_k. In this way the information from c_k is lost! This is not alone an indication of bad scaling; it also introduces severe numerical problems, such as loss of information [CGT00, p. 162].

In the implementation of the SLP and CSLP algorithms there is an underlying assumption that the problems are well scaled, so that a user could supply the algorithm with a scaling matrix S as a pre-process step. This is not an unreasonable assumption, because it is expected that qualified personnel are able to do such a rescaling of their problems fairly easily. However, we will in the following give a short outline of how such a rescaling could be made internally in an algorithm.

We have a model M_k(x_k + h) of the problem we want to solve

  min_h  M_k(x_k + h).    (5.11)

Because of bad scaling we re-scale the problem by transforming it into a scaled space, denoted by superscript s, so a scaled variable w is needed

  h = S_k w,   M_k^s(x_k + w) := M_k(x_k + S_k w).    (5.12)

The scaling matrix S_k ∈ IR^{n×n} is a diagonal matrix when only a scaling parallel to the axes is needed; for the example above

  S = [ 10⁶  0 ; 0  10⁻¹² ],    (5.13)

where the subscript k has been omitted, because the scaling matrix is only given in the pre-process stage and never changed again. A scaling matrix could, however, also be something different than a diagonal matrix, and it could depend on the current iterate if the problem is non-linear and has local bad scaling. In the non-diagonal case, S_k could be determined from an eigenvalue decomposition of the Hessian H; the scaling would then also be a rotation of the scaled space. All numerical calculations inside the algorithm should be done in the scaled space

  min_w  M_k^s(x_k + w),    (5.14)

where, e.g., a linear model of the problem becomes

  M_k^s(x_k + w) = f(x_k) + (S_k ∇_x f(x_k))ᵀ w,    (5.15)

so the scaled gradient is g_k^s = S_k ∇_x f(x_k).

The trust region is affected by the transformation to the scaled space and back to the original space. In the scaled space the trust region subproblem is defined by

  ‖w‖∞ ≤ η,    (5.16)

and in the original space by

  ‖S⁻¹h‖∞ ≤ η.    (5.17)

We see that the trust region in the original space is no longer a square but a rectangle, while it is still a square in the scaled space. This is shown in figure 5.9. If S_k is a symmetric positive definite matrix with values off the diagonal, then we will get a scaling and rotation of the scaled space.

Figure 5.9: Left: The unscaled space and the unscaled trust region. Middle: The unscaled space and the scaled trust region. Right: The scaled space and the scaled trust region.
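The loss of information can be reproduced in double precision. The numbers below mirror the Farad/Ohm example; the diagonal scaling matrix brings both components to order one.

```python
import math

r, c = 1.0e6, 1.0e-12          # Ohm-sized and Farad-sized step components

# Unscaled: c**2 = 1e-24 vanishes next to r**2 = 1e12 in 64-bit arithmetic.
print(r * r + c * c == r * r)   # -> True: the Farad information is gone
print(math.hypot(r, c) == r)    # -> True, even with the careful hypot

# Scaled space: w = S^{-1} h with S = diag(1e6, 1e-12).
w = [r / 1.0e6, c / 1.0e-12]    # both components are now O(1)
print(math.hypot(*w))           # -> sqrt(2): no information lost
```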


Chapter 6

Linprog

Matlab's optimization toolbox offers a wide range of tools to solve a variety of problems. The tools span from general to large scale optimization of nonlinear functions, and it also contains tools for quadratic and linear programming. Just to list a few of the possibilities this toolbox offers:

- Linear and quadratic programming.
- Nonlinear optimization, e.g. 1-norm, 2-norm and minimax.
- A flexible API that ensures that all kinds of parameters can be passed directly to the test problems.

A good overview and documentation of the toolbox is given in [Mat00], which covers the basic theory, supplemented with small tutorials. The version of the optimization toolbox investigated here is Version 2.1 (R12.1) 18-May-2001. Currently a newer version 2.2 of the toolbox is available at MathWorks' homepage (http://www.mathworks.com/products/optimization/).

At first, the source code in the toolbox can seem overwhelming, but it is in fact both well structured and well commented. Still, the algorithms may at first sight seem more logically complex than they ought to be. A straightforward explanation is that many algorithms have branching (if, switch etc.) that depends on the calling algorithm. This is good seen from a maintenance perspective, because in this way code redundancy, i.e. the amount of repeated code, is reduced to a minimum. The toolbox offers many exciting possibilities, and has many basic parts that one can use to build new algorithms.

This chapter will focus on the program linprog, the linear solver in Matlab's optimization toolbox.

6.1 How to use linprog

Linprog is a program that solves a linear programming problem. A linear programming problem seeks to minimize or maximize a linear cost function that is bounded by linear

constraints, and hence linprog can solve the following type of problem

  min_x  fᵀx   subject to   Ax ≤ b,  Aeq·x = beq,  lb ≤ x ≤ ub,

where x ∈ IRⁿ. The problem can be solved by using the following syntax

  x = linprog(f,A,b,Aeq,beq,lb,ub,x0,options)
  [x,fval,exitflag,output,lambda] = linprog(...)

The nonlinear problems we want to solve have the form

  min_x  F(x)   subject to   c_i(x) = 0,  i = 1,...,me,
                             c_i(x) ≥ 0,  i = me+1,...,m,
                             a ≤ x ≤ b,

where a and b are lower and upper bounds. The above problem is non-linear, so in order to get a linear programming problem all the constraints should be linearized by a first order Taylor approximation, so that we can formulate the above problem as an LP problem, where f = ∇F(x) is a vector and the gradients of the constraints Aeq and A are matrices.

6.2 Why Hot Start Takes at Least n Iterations

Simplex algorithms have a feature that the Interior Point method does not have: they can take a starting point supplied by the user, a so-called warm start. The use of warm start
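The same call pattern can be sketched with SciPy's `linprog` (a different implementation, assumed available here; the arguments `A_ub`/`b_ub`/`A_eq`/`b_eq` play the roles of A, b, Aeq, beq above).

```python
from scipy.optimize import linprog

# min x1 + x2  s.t.  x1 + x2 >= 1,  x >= 0
# (the >= constraint is rewritten as -x1 - x2 <= -1 for the A_ub convention)
res = linprog(c=[1.0, 1.0],
              A_ub=[[-1.0, -1.0]], b_ub=[-1.0],
              bounds=[(0, None), (0, None)])

print(res.status, res.fun)   # status 0 means success; the optimum is 1
```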
has the great advantage that if the starting point is close to the solution then the Simplex algorithm will not have to traverse so many vertices in order to ﬁnd the solution. and hence linprog can solve the following type of problem.output. implemented as qpsub in the optimization toolbox.ub.e. A warm start is also a good idea for nonlinear problems. The above problem is non-linear so in order to get a linear programming problem.A.Aeq.

e. simplex_iter = 1. In linprog at line 171 the call to qpsub is made. . one can reduce the number of iterations that linprog uses to ﬁnd the solution. and the solution as warm start. The Fourier series problem. . hence the rest of this investigation will have its main focus on qpsub. and the ordinate show the number of iterations. This will become important later on. it would still use n iterations to solve the problem. The abscissae show the dimension of the problem. because simplex_iter equal to one is the only way to get out of the main while loop by using the return call at line 627. L INPROG subject to Ax b Aeq x beq lb x ub 77 When solving medium scaled problems. i. end end e ¦ e e T Figure 6. 318 . The title of this section includes the words hot start. however this is not what happens. In this way.1. 592 . that not only is the supplied x close to the solution. As part of the initialization of qbsub the variable simplex_iter is set to zero. 50 . Even if linprog was used with hot start. One could then expect linprog to terminate after only one iteration. giving it a good start point that may be close to the solution x .C HAPTER 6. linprog uses an active set method for medium scale problems.1: The Fourier series problem solved with linprog using a guess as cold start. and by this we mean.1. 120 Cold start Warm start 100 80 60 40 20 0 0 10 20 30 40 As mentioned. . In the following we investigate what happens in qpsub when we supply linprog with a hot start. but it is not linprog that solves the problem. it is in fact the solution. Instead linprog calls qpsub to do the actual calculations. revealed an interesting aspect of the hot start problem. linprog offers the possibility of using a supplied x as a way to warm start the algorithm. this is illustrated in Figure 6. illustrated in the introduction of this report. . It is seen that linprog uses at least n iterations when supplied with a hot start. 744 while iterations < maxiter if ACTCNT == numberOfVariables .

then the simplex_iter ﬂag is set to one. where GSD is the gradient with respect to the search direction.g. In this way.1. ACTCNT == n-1. If the following condition11 is satisﬁed. this distance is zero. If we assume that a feasible solution do exist. then at least n elements in cstr is zero. The vector cstr is used for calculating the distance dist. Example 4. where n in qpsub is stored in numberOfVariables. The vector indf contains all the indexes in GSD that are larger than some small value. a hot start has the affect that X does not move. Because of Bland’s rule. then maxiter would have been set to a ﬁnite value. due to the deﬁnition of the constraints.4]. a singular working set is avoided. not crucial to ﬁnd all distances to the constraints that are equal to STEPMIN. if ACTCNT == numberOfVariables . If the caller of qpsub had been different. Further.78 C HAPTER 6. see [Nie99b. The algorithm now select the nearest constraint [STEPMIN. which is natural because X is the solution. X = X+STEPMIN*SD. that is calculated at each iteration in the main loop at line 341 dist = abs(cstr(indf). It is. This is a normal procedure also known as Bland’s rule for anti-cycling. simplex_iter = 1. So at the n 1 iteration. In each iteration. If this was not done. b is scaled and the active set ACTCNT is initialized H ¦ H . the system Ax to the empty set. at line 587–595. the variable ACTCNT that hold the number of active constraints is increased with one. The cstr vector is recalculated at line 549 in the main loop each time a new X is found. which have the result that the new X is equal to old X. it would be possible to have constraints parallel to the search direction. The constraint violations cstr Ax b is found as part of ﬁnding an initial feasible solution at line 232–266. As we shall see later. however. A step in the steepest decent direction is made at line 365. and the rest is discarded. Because of the hot start STEPMIM is zero. because it is set to inf.ind2] = min(dist). 
ind2 = find(dist == STEPMIN). ind=indf(min(ind2)). One may notice that the way ind2 is found is not ideal when thinking of numerical errors. it must be the case. See the comments in the code at line 330–334. This ensures that we leave a degenerate vertex after a ﬁnite number of iterations. The dist vector is then used to ﬁnd the minimum step STEPMIN. then cstr must contain elements that are less or equal to zero. This is seen at line 346 where the constraint with the smallest index is selected. So ind2 holds the indexes to the nearest constraints. L INPROG We can not use maxiter to break the while loop. In this way the Simplex algorithm avoids being stuck in an inﬁnite loop./GSD(indf)). e.. end e In the initilization phase. and because of the hot start. only one constraint becomes active in each iterations. that when qpsub is supplied with a hot start. a large scale algorithm.

Figure 6. see line 640–641. where f1 x 20x1 10 T and f2 x 1 0 T. ¦ The ﬁgure shows the LP minimax landscape for Rosenbrock’s function evaluated in x 1 2 1 T . of using Lagranian multipliers to deﬁne the active set needed for the corrective step. Hence the results indicate that the active set iterations are cheaper than the simplex iterations. where the trust region radius is varied with η 0 2 0 3 0 5 . This can give rise to an inﬁnite number of solutions.01 0.2 shows the results from a runtime experiment done on linprog. This happens when the active functions have contours parallel to the coordinate system.015 0. by using the linear Fourier problem described in appendix A. Figure 6. The algorithm will then terminate after doing n 1 active set iterations and at least 1 simplex iteration. where the solution is found by traversing the edges of the problem. that reduces the cost function and hence all the all the Lagrangian multipliers are non negative. 6. Because of the hot start. can fail when using linprog ver. the algorithm enters the simplex iterations at line 595–629.3 Issues regarding linprog. 0.2: Timing results from linprog when solving the linear Fourier problem.3 shows the three situations.025 0. that when using hot start.C HAPTER 6. The abscissae shows the number of variables in the problem.03 Cold Start Warm Start 0. H H q \$ Ht5¥ ¤ R o ¦ q Ht§¥ ¤ R o ¦ ( & \$ & \$ & " q \$ & o H . which is illustrated in the following. 20 40 60 80 100 120 0 0 The results clearly show. 1. When using hot start this will happen at the n iteration. The execution then return to linprog where the output information from qpsub is given the right format. there is no edges to traverse. L INPROG 79 1 When simplex_iter == 1. The strategy. ¦ H then the simplex_iter ﬂag is set to one. qpsub does n 1 active set iterations and at least one simplex iteration. When using a hot start. and in each case the solution is analysed. and the ordinate shows the average time per iteration.23. 
the average time per iteration is lower than that for cold start. Hence the results indicate that the active set iterations are cheaper than the simplex iterations.

6.3 Issues regarding linprog

The strategy of using Lagrangian multipliers to define the active set needed for the corrective step can fail when using linprog ver. 1.23. This is illustrated in the following, where we show three examples from linprog, with the trust region radius varied over η ∈ {0.2, 0.3, 0.5}, and in each case the solution is analysed. The examples use the LP minimax landscape of Rosenbrock's function evaluated in x = (−1.2, 1)ᵀ, where ∇f₁(x) = (−20x₁, 10)ᵀ and ∇f₂(x) = (−1, 0)ᵀ. When the active functions have contours parallel to the coordinate system, this can give rise to an infinite number of solutions. Figure 6.3 shows the three situations. For a situation like the one illustrated in Figure 6.3 (middle), where both ℓ₁(x) and ℓ₂(x) are active, one could expect

that both the multipliers for ℓ₁(x) and ℓ₂(x) are larger than zero. In this way the multipliers would indicate that both functions are active at x. The reason for this prior expectation is found in the definition of an active function

  A = { j : ℓ_j(h) = α },

where α is the minimax level of the LP subproblem.

Figure 6.3: Plot of the minimax LP landscape of the Rosenbrock function in x = (−1.2, 1)ᵀ. Left: η = 0.2. Middle: η = 0.3. Right: η = 0.5. The dashed lines indicate an infinity of solutions.

For η = 0.2 the solution h = (0.2, 0.0)ᵀ is not unique; the dashed line in Figure 6.3 (left) indicates an infinity of solutions. Both ℓ₁(x) and the upper bound on x₁ are active according to the λ multipliers from linprog.

For η = 0.3 the solution h = (0.30, 0.09)ᵀ is not unique either. Both ℓ₁(x) and the upper bound of x₁ are active according to the linprog multipliers. It is then strange that the multipliers do not indicate that both functions are active — or is it? Turning to the theory of generalized gradients, we see that this is not so strange at all; in fact the behavior is in good alignment with the theory. The example shows that linprog ver. 1.23 does not always deliver the multipliers one would expect.

For η = 0.5 the solution h = (0.464, 0.500)ᵀ is unique; both functions and the lower bound of x₂ are active, and all multipliers are greater than zero. The solution found in this last case is not just a stationary point but a strongly unique local minimum,
because the null vector is interior to the convex hull of the generalized gradient. Before analysing the three cases further, we first look at how qpsub deals with bounds on the variables. From line 74–82 in qpsub we see that the bounds are added to the problem as linear constraints. This means that we extend the problem we want to solve, and in turn our solution will be confined to the bounded area. The added linear constraints from the bounds therefore have to be taken into consideration when evaluating a stationary point.

When this is done, it is seen that the stationary point condition is fulfilled in all three cases: the null vector belongs to the convex hull of the generalized gradient, as illustrated in Figure 6.4. For the first case, η = 0.2, the null vector is part of the convex hull, but not interior to it. So the solution found is of course a stationary point, but not a strongly unique local minimum, and this also explains why we have an infinity of solutions along the dashed line in Figure 6.3 (left). The point found in the second case, η = 0.3, is also a stationary point but not a strongly unique local minimum. Here the gradient corresponding to ℓ₂(x),

shown as the arrow that is not horizontal, is only weakly active, i.e. its multiplier is zero. This means that if we removed it, the point found would still be stationary. Basically the situation is then the same as for the previous case: the null vector is part of the convex hull but not interior to it, and we again have an infinity of stationary points along the dashed line shown in Figure 6.3 (middle). This is why the multiplier for ℓ₂(x) is zero in this case. In the last case, η = 0.5, the null vector is interior to the convex hull, all multipliers are greater than zero, and the solution is a strongly unique local minimum.

Figure 6.4: Left: η = 0.2. Middle: η = 0.3. Right: η = 0.5. The gradient from ℓ₂(x), shown as the arrow that is not horizontal, is only weakly active in the middle case.

This behavior was first noticed during the work with the corrective step. In the middle case it would seem like ℓ₂(x) was not active, so the corrective step would not be calculated correctly; even worse, the corrective step could become the null vector. Using the multipliers in this way therefore
has a damaging affect if the multipliers are used to indicate the active functions. however. The point found in the second case η 0 3 is also a stationary point. This is a strongly unique local minimum.’on’) . By using opts = optimset(’LargeScale’. In this case it would seem like 2 x was not active. because the null vector is not interior to the hull.4 right. its multiplier is zero. is only weakly active.3 left. and then use Simplex iterations to ﬁnd the solution.
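The epsilon-active criterion described above can be sketched in a few lines. This is a Python stand-in (not the thesis' Matlab code); the helper name and the sample values are ours:

```python
import numpy as np

def eps_active(f, eps=0.01):
    """epsilon-active set for minimax: all inner functions whose value
    lies within eps of the maximum alpha = max_j f_j are regarded active."""
    alpha = np.max(f)
    return np.flatnonzero(f >= alpha - eps)

# The second function is only weakly active: its multiplier could be zero,
# but the epsilon criterion still reports it as active.
f = np.array([1.0, 0.9999, 0.42])
print(eps_active(f))   # -> [0 1]
```

With the default eps = 0.01 (the value also used by the implementation in Appendix D), a weakly active function is detected even when its multiplier vanishes.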

and passing opts to an optimization algorithm, that algorithm will be set to use an interior point method. The interior point method is implemented in lipsol (large scale interior point solver) in the optimization toolbox. The active set method (AS), as implemented in the optimization toolbox by qpsub, is basically a Simplex algorithm, and hence we will in the following regard it as a Simplex-like algorithm.

When looking at the theory behind these two methods, the Simplex method is in the worst case an exponential time algorithm, whereas the Interior Point (IP) method is a polynomial time algorithm [HL95, chapter 4]. We are, however, often interested in the average case performance. The Simplex method finds a solution that lies on a vertex of the convex set that defines the feasible area; it decreases the cost function by traversing the edges of the hull, going from one vertex to the next. The IP method uses a logarithmic barrier function to confine the iterations to the interior of the feasible domain and follows a central path through that domain, and can therefore in some cases outperform the Simplex method, which is confined to the edge of the feasible area. For a thorough introduction see [Nie99b, chapter 3] and [HL95, chapter 4].

In fact, for the IP method large problems do not require many more iterations than small problems, as shown in Table 6.1.

Table 6.1: The Fourier problem solved for an increasing number of variables n and constraints m; number of iterations for the AS and IP methods.

  n     m     Iter. (AS)   Iter. (IP)
  34    130    67           15
  ...   ...    ...          ...
  94    370   271           13
 104    410   359           11
 114    450   465           16
 124    490   515           16

The table clearly shows that the number of iterations for the IP method is almost constant, regardless of the size of the problem. This is not surprising when looking at the theory behind the two methods. However, when comparing the Simplex and the IP method one must also look at the amount of work per iteration, and here the runtime results indicate that an IP iteration is more computationally costly than an AS iteration: the IP method is more complex than the Simplex method, and it requires more computer time per iteration.
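The same qualitative experiment can be repeated with SciPy's linprog, which exposes both a dual simplex backend ("highs-ds") and an interior point backend ("highs-ipm"); these method names are SciPy's, not Matlab's, and the random Chebyshev fit below is our own stand-in for the Fourier problem:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 60, 8
F = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Chebyshev fit as an LP: minimize alpha subject to |y - F x| <= alpha,
# written in linprog's A_ub @ [x; alpha] <= b_ub convention.
c = np.r_[np.zeros(n), 1.0]
A_ub = np.block([[ F, -np.ones((m, 1))],
                 [-F, -np.ones((m, 1))]])
b_ub = np.r_[y, -y]
bounds = [(None, None)] * n + [(0, None)]

ds  = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs-ds")
ipm = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs-ipm")
```

Both backends reach the same optimal value; ds.nit and ipm.nit give the respective iteration counts, so the iteration-growth behavior discussed above can be observed directly.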

The inner workings of the Simplex method require that we move from one vertex to another until the solution is found. Without going into any details, the main work of a Simplex iteration involves solving two systems of equations, B^T g = c_B and B h = a_s, where B is the current basis, B in IR^(m x m). Moving to a new vertex is done by moving one column in and out of the basis B in each iteration, so that the basis B_{k+1} differs from B_k by only one column. This can be exploited quite efficiently by updating the factorization of B_k; in qpsub this is done by the two Matlab functions qrinsert and qrdelete. For the purpose of our discussion it is not important to know the exact details; instead we refer to [Nie99b, Chapter 3].

The Interior Point method has more work to do in each iteration. The IP method uses a logarithmic barrier function to create a landscape of the feasible region. The minimizer of this landscape is a non-linear KKT system, which is approximated by a linearization, and the minimizer of the linearized KKT system is found by using a Newton step. The Newton step can be reduced to an expression with coefficient matrix A D^2 A^T in IR^(m x m) [Nie99b, (3.22)], and we solve for h_y, e.g. by using a factorization. Unlike the Simplex method we can not update the factorization, due to the nature of the problem. Hence the computational work in each iteration is greater than that of the Simplex method. Again, it is not important for the discussion to go into any specific details; instead we refer to [Nie99b, Chapter 4].

In the following we present some results of tests performed on linprog using medium and large scale settings. When linprog is set to large scale mode it uses lipsol, an algorithm that is stored in Matlab's private directory. Similarly, when we use medium scale mode, linprog will use qpsub, an algorithm that is also stored in Matlab's private directory. Therefore the following tests of the AS and IP methods are in reality tests of qpsub and lipsol.

We have performed the tests by using two very different test problems. The first problem is the Fourier problem described in Appendix A, which is very dense. The second test problem is the problem of finding the solution to the Laplace equation numerically by using a rectangular grid; it is described in detail in Appendix B, and it is sparse. All problems were solved as Chebyshev (l_inf) problems

  min c^T x^  subject to  A^ x^ >= b^,

where A^ in IR^(2m x (n+1)), b^ in IR^(2m) and x^ in IR^(n+1), cf. Appendix A. We have set up six test cases, described in Table 6.2. Since the tests were made on a non-dedicated system, the timings are not precise; they should be looked upon just as an indication of the amount of work the methods do. Still, they provide a rough indication.
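The Chebyshev LP above is exactly the kind of subproblem that SLP hands to an LP solver in each iteration. As a small illustration, here is one linearized minimax subproblem solved in Python with scipy.optimize.linprog in place of Matlab's linprog; the helper name and the Rosenbrock setup are ours:

```python
import numpy as np
from scipy.optimize import linprog

def slp_subproblem(f, J, rho):
    """One SLP trust-region subproblem: minimize alpha subject to
    f_j + J_j d <= alpha for all j and |d_i| <= rho (box trust region)."""
    m, n = J.shape
    c = np.concatenate([np.zeros(n), [1.0]])      # objective: alpha
    A_ub = np.hstack([J, -np.ones((m, 1))])       # J d - alpha e <= -f
    b_ub = -f
    bounds = [(-rho, rho)] * n + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    return res.x[:n], res.x[-1]

# Minimax Rosenbrock: f1 = 10(x2 - x1^2), f2 = 1 - x1, with +/- rows
# appended so that max_j f_j models max_j |f_j|.
x = np.array([-1.2, 1.0])
f = np.array([10.0 * (x[1] - x[0]**2), 1.0 - x[0]])
J = np.array([[-20.0 * x[0], 10.0], [-1.0, 0.0]])
f_pm = np.concatenate([f, -f])
J_pm = np.vstack([J, -J])
d, alpha = slp_subproblem(f_pm, J_pm, rho=0.5)
```

The step d stays inside the trust region, and alpha is the predicted linearized minimax value, which can never exceed the current value max_j f_j (obtained at d = 0).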

Table 6.2: The test cases.

  Case   Setting        Problem    n          m
  1      Medium scale   Fourier    3:5:301    n + 4
  2      Large scale    Fourier    3:5:301    n + 4
  3      Medium scale   Fourier    3:5:151    2n + 1
  4      Medium scale   Laplace    3:5:301    n
  5      Large scale    Laplace    3:5:301    n
  6      Large scale    Fourier    3:5:151    2n + 1

First we investigate how well the AS and IP methods handle sparse versus dense problems, i.e. the dense Fourier problem and the sparse Laplace problem. All timings are an average of three observations.

Figure 6.5 shows runtime results for test cases 1 and 4, i.e. using the AS method to solve the dense Fourier problem and the sparse Laplace problem. For problem sizes where 0 < m < 100, we see that the AS method uses an equal amount of time to solve the dense and the sparse problem. For m > 100 the sparse Laplace problem is solved much faster than the dense Fourier problem, which indicates that the sparsity is exploited by qpsub. The results clearly show that qpsub can take advantage of the sparsity in the Laplace problem.

Figure 6.5: Test cases 1 and 4. Runtime results for the Active Set method when solving a dense (Fourier) and a sparse (Laplace) problem. All timings are an average of three observations.

Figure 6.6 shows the results for test cases 2 and 5, i.e. the dense Fourier and the sparse Laplace problem solved by Matlab's implementation of the IP method, lipsol. For all m, lipsol uses more time to solve the dense than the sparse case, so we see that lipsol exploits the sparsity of the Laplace problem quite well. For m >= 500 something interesting happens: the time to solve both the dense and the sparse problem drops, and especially the reduction for the sparse problem is rather dramatic.

The only way to investigate this is to look inside the code for lipsol. At line 643 a check is performed, and if m >= 500 a parameter nzratio is set to a value smaller than 1; otherwise it is set to 1. If nzratio < 1, then lipsol checks at line 647 whether A contains any dense columns, and if it does, the flag Dense_cols_exist is set to 1. Because we are solving the above as l_inf problems, see (3.5), the last column of A^ will always be dense. If Dense_cols_exist = 1 then conjugated gradient iterations are performed at lines 887 and 911. Because of those conjugated gradient iterations, lipsol is able to reduce the computation time, which for the sparse Laplace problem is quite dramatic.

Figure 6.6: Test cases 2 and 5. The Interior Point method tested on a dense (Fourier) and a sparse (Laplace) problem. All timings are an average of three observations.

In the following we compare the AS and the IP method on the same problem. Figure 6.7 shows the timing results for the dense Fourier problem solved by both the AS and the IP method. From the figure we see that the two methods use nearly the same time to solve the problem. It is also interesting to note that the reduction in computation time for m >= 500 is not that big for the dense Fourier problem when compared to the sparse problem.

Figure 6.7: Test cases 1 and 2. The dense Fourier problem solved by the AS method and by the IP method. All timings are an average of three observations.

In [HL95, Chapter 4] there is a discussion about the pros and cons when comparing the interior point method and the Simplex method. They say that for large problems with thousands of functional constraints the IP method is likely to be faster than the Simplex method. As mentioned in the previous discussion, the workload for both methods grows with the number of constraints, but the Simplex method can use a smart factorization update, while this can not be exploited in the IP method.

Figure 6.8 shows the timing results for the sparse Laplace problem, solved by the AS and IP methods. The figure clearly shows that the IP method is the fastest method for the sparse problem. For m >= 500 the IP method uses conjugated gradient iterations, and this gives a dramatic decrease in computation time for this kind of sparse problem.

Figure 6.8: Test cases 4 and 5. The sparse Laplace problem solved by the AS method and the IP method. All timings are an average of three observations.

We have also tested the AS and IP methods with the Fourier problem, but with even more constraints than before. The many constraints (large m compared to n) and the dense problem will have the effect that the IP iterations take even longer compared to the AS iterations, so it is interesting to see if the IP method (test case 6) will do as well as the AS method (test case 3). Figure 6.9 shows the results from test cases 3 and 6. For m < 300 IP is slightly faster than AS, but in turn AS is faster when 300 < m < 500. When m >= 500 the IP method begins to use conjugated gradient iterations, and this gives a slight decrease in computation time, just enough to make it up to speed with the AS method. This seems to suggest that even though the IP iterations become more and more expensive as m grows, the IP method is still able to compete with the Simplex method for large problems; their computational time results are comparable.

Figure 6.9: Test cases 3 and 6. The Fourier problem with many constraints, solved by the AS method and the IP method. All timings are an average of three observations.

We have seen how the conjugated gradient (CG) iterations could speed up the IP method for the sparse Laplace system. From the comments in lipsol it is seen that preconditioned conjugated gradients are used during the CG iterations. When looking at the structure of A for the Laplace problem we notice that A has a block Toeplitz structure, and perhaps this kind of structure is well suited for preconditioning. This could maybe explain the dramatic reduction in computation time we get when solving the Laplace problem. However, a thorough investigation of this falls outside the scope of this thesis; instead the reader is referred to [GvL96, Sections 4.7 and 10.3] for more information about Toeplitz matrices and preconditioned conjugated gradients.

6.4.1 Test of SLP and CSLP

In this section the SLP and CSLP algorithms are tested on the set of test functions given in Table 3.1 on page 38. The tests are performed with linprog set to large scale mode, and the results are presented in Table 6.3 on page 88. The table shows results that are comparable with the corresponding results in Chapter 3, where linprog is used with the medium scale setting. In other words, the performance of SLP and CSLP is not changed significantly when using either medium scale or large scale mode.
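The behavior of CG on the Laplace system can be reproduced directly. The sketch below builds the block Toeplitz matrix with SciPy's sparse Kronecker product (mirroring the Matlab construction in Appendix B) and solves A u = b with unpreconditioned CG; lipsol itself uses a preconditioned variant:

```python
import numpy as np
from scipy.sparse import identity, kron, diags
from scipy.sparse.linalg import cg

m = n = 20
# B: tridiagonal with 4 on the diagonal, -1 off; C = -I; L: ones off-diagonal.
B = diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(m, m))
C = -identity(m)
L = diags([1.0, 1.0], [-1, 1], shape=(n, n))
A = (kron(identity(n), B) + kron(L, C)).tocsc()   # block Toeplitz, SPD
b = np.ones(m * n)

u_cg, info = cg(A, b, maxiter=1000)               # info == 0 on convergence
```

The matrix is symmetric positive definite, so CG converges in far fewer iterations than the problem dimension, which is the effect lipsol exploits for m >= 500.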

Test of SLP

  Name          10^-2      10^-5      10^-8
  Parabola      11(12)     21(22)     31(32)
  Rosenbrock1   18(19)     18(19)     18(19)
  Rosenbrock2   19(20)     19(20)     19(20)
  BrownDen      19(20)     32(33)     42(43)
  Bard1          3(4)       4(5)       5(6)
  Bard2          3(4)       4(5)       5(6)
  Ztran2f        6(7)      21(22)     30(31)
  Enzyme       179(180)   184(185)   185(186)
  El Attar       7(8)       9(10)     10(11)
  Hettich        8(9)      19(20)     26(27)

Test of CSLP, J(x+h)

  Name          10^-2      10^-5      10^-8     corrective steps
  Parabola       7(13)     13(25)     20(39)    5[0]  11[0]  18[0]
  Rosenbrock1   11(14)     13(16)     13(16)    6[4]   6[4]   6[4]
  Rosenbrock2    9(13)     11(15)     11(15)    4[1]   4[1]   4[1]
  BrownDen      15(20)     29(41)     36(52)    4[0]  11[0]  15[0]
  Bard1          3(4)       4(5)       5(6)     0[0]   0[0]   0[0]
  Bard2          3(4)       4(5)       5(6)     0[0]   0[0]   0[0]
  Ztran2f        6(7)      21(30)     30(43)    0[0]   8[0]  12[0]
  Enzyme        26(48)     34(62)     35(63)   22[1]  28[1]  28[1]
  El Attar       6(8)       8(10)      9(11)    1[0]   1[0]   1[0]
  Hettich        5(7)      13(23)     21(39)    3[2]  11[2]  19[2]

Test of CSLP, J(x)

  Name          10^-2      10^-5      10^-8     corrective steps
  Parabola       9(13)     15(25)     24(43)    4[1]  10[1]  19[1]
  Rosenbrock1   12(14)     12(14)     12(14)    5[4]   5[4]   5[4]
  Rosenbrock2   17(28)     18(29)     18(29)   11[1]  11[1]  11[1]
  BrownDen      15(20)     29(41)     42(57)    4[0]  11[0]  14[0]
  Bard1          3(4)       4(5)       5(6)     0[0]   0[0]   0[0]
  Bard2          3(4)       4(5)       5(6)     0[0]   0[0]   0[0]
  Ztran2f        6(7)      21(30)     30(43)    0[0]   8[0]  12[0]
  Enzyme        58(85)     61(89)     61(89)   32[6]  33[6]  33[6]
  El Attar       7(8)       9(10)     10(11)    1[1]   1[1]   1[1]
  Hettich        7(11)     18(33)     29(55)    5[2]  16[2]  27[2]

Table 6.3: Results for SLP and CSLP with linprog set to large scale mode. Columns 2-4: number of iterations for the stopping tolerances 10^-2, 10^-5 and 10^-8; ( ): the number of function evaluations. Columns 5-7: attempted corrective steps; [ ]: failed corrective steps. Unless otherwise noted, eta_0 = 1 and epsilon = 0.01.

Chapter 7

Conclusion

In this chapter we give a summary of the performed work, and comments on the insight gained through this project. Large parts of the work presented in this thesis are of theoretical character.

The project started out with an implementation of the CSLP algorithm, described in Chapter 3 and originally presented in [JM94]. In the process of implementing CSLP, a more basic algorithm, SLP, was also created. SLP does not use a corrective step like CSLP and is therefore easier to analyze; it corresponds to method 1 in [Mad86]. The corrective step was described and analyzed in detail in Chapter 3, and the insights gained from this analysis, together with studies of the performance in Chapter 3, were used in the process of implementing CSLP.

Studies of the CSLP algorithm showed that it uses fewer iterations than SLP to get into the vicinity of a local minimum. Further it was shown that CSLP is cheap, in the sense that it uses almost the same number of function evaluations as SLP, and CSLP showed in particular a good performance on problems that are highly non-linear.

In order to find the linearly independent gradients needed for the corrective step, we used a QR factorization. For our purpose the QR factorization should be rank revealing, and the work showed that we do not need Q, which is good: when the problem is large and sparse, Q will be huge and dense, and in Matlab the sparse version of qr does not do column permutations. Hence we use the economy version of qr for full matrices. Moreover, we did not use the multipliers to find the active set; instead all inner functions f_j with values close to alpha are regarded as active.

The work of analyzing the CSLP algorithm was a motivation for learning more about the theory behind minimax optimization. The theory of unconstrained and constrained minimax was presented in Chapters 2 and 4. The study of the theory was in itself very interesting, and new insights and contributions were given, especially in the area regarding the penalty factor sigma for exact penalty functions. Exact penalty functions were given an intense analysis, and theory for estimating the penalty factor was deduced: it was revealed that the estimation of sigma is a problem of finding the distance to the edge of a convex hull of the generalized gradient. Further it was shown that this in fact corresponds to solving a dual linear problem, which can then be handled by simple linear algebra. Section 2.3 showed that if a given problem is not regular in the solution, then only linear convergence can be obtained in the final stages of convergence; from a theoretical standpoint this comes as no surprise.

Both SLP and CSLP use only first order information, and both are well suited for large scale optimization, especially when the problem is large and sparse.

Finally, tests presented in Chapter 6 showed that linprog is well suited for large and sparse problems.

The work with trust regions showed that there are many ways to update the trust region radius used in most trust region algorithms, and different strategies regarding updates were given. All of them are heuristics that have proven their worth in practice, and none of them depend on the problem being solved. It was also shown that the continuous trust region update presented in [Nie99a], tweaked in the right way, has just as good a performance as the discontinuous update strategy. The drawback of this approach is that it requires the preset parameter gamma.

The algorithms presented in this thesis are implemented in a small toolbox, and the source code can be found in Appendix D.

7.1 Future Work

In this work we have covered many areas of minimax optimization, but there is still room for further studies. In the following we give some suggestions for future investigations.

An area of future research would be to find a good heuristic that determines the trust region parameter gamma on the basis of the problem being solved, e.g. by using neural networks. More generally, it would be useful to find an update heuristic that was somehow connected to the underlying problem.

Last but not least, the theoretical work done with the penalty factor would be well suited for further research. One such research area could be to find extensions that could be used to identify an initial basic feasible solution in the Simplex algorithm.
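The continuous update referred to above can be sketched concretely. The rule below is the damping update from Nielsen's technical report [Nie99a] (stated there for Marquardt's damping parameter mu; the thesis adapts the idea to the trust region radius, and the constants 1/3 and 2 are the usual ones, not necessarily the tweaked values used in the tests):

```python
def update_damping(mu, nu, gain):
    """Continuous damping update [Nie99a]: gain = dF/dL is the gain
    factor of the last step. A smooth alternative to halve/double."""
    if gain > 0:                                   # step accepted
        mu = mu * max(1.0 / 3.0, 1.0 - (2.0 * gain - 1.0) ** 3)
        nu = 2.0
    else:                                          # step rejected
        mu = mu * nu
        nu = 2.0 * nu
    return mu, nu
```

A perfect gain (gain = 1) shrinks mu by the maximal factor 1/3, gain = 0.5 leaves mu unchanged, and consecutive rejections grow mu geometrically; the factor varies continuously with the gain instead of jumping between two fixed values.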

Appendix A

The Fourier Series Expansion Example

A periodic function f(t) with a period of T = 2 pi / omega can be expanded by the Fourier series

  f(t) = a_0/2 + sum_{p=1}^inf a_p cos(p omega t) + sum_{p=1}^inf b_p sin(p omega t),   (A.1)

where the coefficients can be found by

  a_p = (2/T) int_T f(t) cos(p omega t) dt,    b_p = (2/T) int_T f(t) sin(p omega t) dt.

In e.g. electrical engineering the periodic function f(t) can also be looked upon as a design criterion, i.e. we have a specification of a design, and we want to fit it with a Fourier expansion. If we arrange the coefficients in a vector x in IR^n, set up a matrix A in IR^(m x n) whose columns hold the cosine and sine terms evaluated at the sample points, where m is the number of samples, and set up a vector b in IR^m that consists of function evaluations of f(t) at the sample points, then we can find the solution to the system by using the normal equations,

  x = (A^T A)^(-1) A^T b,   (A.2)

which find the l_2 norm solution. Finding the Fourier coefficients by using (A.2) thus corresponds to the l_2 norm solution, i.e. the least squares solution. A more numerically stable way to solve the normal equations is to use a QR factorization A = QR,

  x = R_{1:n}^(-1) Q_{:,1:n}^T b.

In Matlab this can be done efficiently for over-determined systems by using x = A\b.

In the following we define a residual r(x) = y - Fx. We can also find the l_1 and l_inf solutions if we solve a linear programming problem with an LP solver, e.g. Matlab's linprog.
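The least squares fit of the Fourier coefficients can be sketched as follows (a Python stand-in for the Matlab x = A\b; the test signal is our own choice):

```python
import numpy as np

# Build the design matrix for (A.1) truncated at order P and recover the
# coefficients of a known expansion by least squares, cf. (A.2).
T = 2 * np.pi
w = 2 * np.pi / T
t = np.linspace(0.0, T, 200, endpoint=False)
f = 1.0 + 2.0 * np.cos(w * t) - 0.5 * np.sin(3 * w * t)

P = 3
cols = [0.5 * np.ones_like(t)]                 # column for a_0/2
for p in range(1, P + 1):
    cols += [np.cos(p * w * t), np.sin(p * w * t)]
A = np.column_stack(cols)

x, *_ = np.linalg.lstsq(A, f, rcond=None)      # x = [a0, a1, b1, ..., a3, b3]
```

Because the signal is itself a low-order Fourier series sampled on a uniform grid, the coefficients are recovered essentially exactly.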

A.1 The l_1 norm fit

To find the l_1 solution we set up the following primal system

  (P)  min c^T x^  subject to  A^ x^ >= b^,

where

  x^ = [x; z],   c^ = [0; e],   A^ = [F I; -F I],   b^ = [y; -y],

with e in IR^m a column vector of ones, z in IR^m, F in IR^(m x n), I in IR^(m x m) the unit matrix, and y a vector with m samples of f(t). The constraints say that

  Fx + Iz >= y  and  -Fx + Iz >= -y,  i.e.  -z_i <= r_i(x) <= z_i,  i = 1, ..., m,

so each z_i bounds the absolute value of a residual. Hence we see that minimizing the cost function

  min c^T x^ = min sum_{i=1}^m z_i

is exactly the same as minimizing the sum of the absolute residuals sum_i |r_i(x)|. This is also the definition of the l_1 norm.

The primal system is large, and we can get the exact same solution by solving a smaller dual problem. The dual problem can be expressed as

  (D)  max y^T u  subject to  F^T u = 0  and  -e <= u <= e.

Because of the symmetry between the primal and dual problem, the Lagrangian multipliers of the dual problem correspond to x, as shown in [MNP94].

A.2 The l_inf norm fit

The solution to the problem of minimizing the largest absolute residual can also be found by solving a linear problem. We solve the system

  (P)  min c^T x^  subject to  A^ x^ >= b^,

with the same dimensions as described in the former section, where now

  x^ = [x; alpha],   c^ = [0; 1],   A^ = [F e; -F e],   b^ = [y; -y].

The constraints say that Fx + alpha e >= y and -Fx + alpha e >= -y, i.e. -alpha <= r_i(x) <= alpha.
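The primal/dual equivalence for the l_1 fit can be verified numerically. The sketch below solves both problems with scipy.optimize.linprog (our stand-in for Matlab's linprog) on a small random instance and checks that the optimal values agree by strong duality:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n = 12, 3
F = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# primal: min sum(z) s.t. -z <= y - F x <= z, variables [x; z],
# written in linprog's A_ub @ x^ <= b_ub convention.
c = np.r_[np.zeros(n), np.ones(m)]
A_ub = np.block([[-F, -np.eye(m)],
                 [ F, -np.eye(m)]])
b_ub = np.r_[-y, y]
primal = linprog(c, A_ub=A_ub, b_ub=b_ub,
                 bounds=[(None, None)] * (n + m), method="highs")

# dual: max y^T u  s.t.  F^T u = 0, -1 <= u <= 1
dual = linprog(-y, A_eq=F.T, b_eq=np.zeros(n),
               bounds=[(-1, 1)] * m, method="highs")
```

The dual has only m variables instead of n + m, which is exactly why it is the cheaper problem to solve.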

So when minimizing the cost function c^T x^ = alpha, we are in fact minimizing the maximum absolute residual,

  min_{x, alpha} alpha  subject to  |r_i(x)| <= alpha,  i.e.  min_x max_i |r_i(x)|.

If we instead define a residual function f(x) = y - Fx and minimize it with respect to the l_inf norm, the linearized conditions -alpha e <= f(x) + J(x) dx <= alpha e lead to the same kind of system, and the above system for the l_inf fit is identical with (3.7). It is just two different ways to get to the same result.


Appendix B

The Sparse Laplace Problem

The Laplace equation can be used to solve steady heat conduction, aerodynamics and electrostatic problems, just to mention a few. Formally we seek a solution to the Laplace equation

  d^2 u / dx^2 + d^2 u / dy^2 = 0,   (B.1)

which turns out to be somewhat difficult to solve analytically. However, there is an easy numerical solution by using a mesh. Information about the boundary condition of the solution region is given. The idea is now to use a grid of the solution area that contains m * n points; all the mesh points have to be solved simultaneously, and this is done by solving a linear system of equations A u = b.

By using a finite difference approximation to (B.1) we get

  (u_{i-1,j} - 2u_{i,j} + u_{i+1,j}) / dx^2 + (u_{i,j-1} - 2u_{i,j} + u_{i,j+1}) / dy^2 = 0.   (B.2)

If the grid is equidistant in the x and y direction, then we can reformulate the above to the somewhat simpler expression

  4u_{i,j} - u_{i-1,j} - u_{i+1,j} - u_{i,j-1} - u_{i,j+1} = 0.   (B.3)

We see that the above gives rise to the famous five point computational molecule. A simple grid of size m = 2 (columns) and n = 2 (rows) is illustrated in Figure B.1, and by using (B.3) we get the following equations on matrix form

  [ 4 -1 -1  0 ] [u1]   [0]
  [-1  4  0 -1 ] [u2] = [0]   (B.4)
  [-1  0  4 -1 ] [u3]   [1]
  [ 0 -1 -1  4 ] [u4]   [1]

When implementing this problem in Matlab, one must remember to use sparse matrices, because the sparsity can be exploited to conserve memory and to solve the Laplace system much faster than by using dense matrices. One can test this by using u = A\b for both dense and sparse matrices.
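The 2 x 2 system (B.4) can be checked numerically from the Kronecker construction A = kron(I,B) + kron(L,C) that this appendix uses for larger grids; here is a NumPy sketch:

```python
import numpy as np

# Blocks for a grid with m = 2 columns and n = 2 rows.
B = np.array([[4.0, -1.0],
              [-1.0, 4.0]])      # tridiagonal block, 4 on the diagonal
C = -np.eye(2)                   # coupling between grid rows
L = np.array([[0.0, 1.0],
              [1.0, 0.0]])       # ones on the sub/super diagonal
A = np.kron(np.eye(2), B) + np.kron(L, C)

b = np.array([0.0, 0.0, 1.0, 1.0])   # boundary contributions as in (B.4)
u = np.linalg.solve(A, b)
```

The assembled A is exactly the 4 x 4 matrix in (B.4), and the computed temperatures u are strictly positive, as expected for boundary values between 0 and 1.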
Figure B.1: The grid used for solving the Laplace problem. The grid dimension is m = 2, n = 2; the interior unknowns u1, ..., u4 are surrounded by boundary points with values u = 0 and u = 1.

For large grids we use the Kronecker product to generate A, so that

  A = I (x) B + L (x) C,

where B, C in IR^(m x m),

  B = [ 4 -1          ]        C = -I_m,
      [-1  4 -1       ]
      [    .  .  .    ]
      [       -1  4   ]

i.e. B is tridiagonal with 4 on the diagonal and -1 on the sub- and superdiagonals, and C is minus the m x m unit matrix. Further, I in IR^(n x n) is the unit matrix and L in IR^(n x n) has ones on the sub- and superdiagonal,

  L = [ 0  1          ]
      [ 1  0  1       ]
      [    .  .  .    ]
      [       1  0    ]

The resulting system will have A in IR^(mn x mn) and u, b in IR^(mn). In Matlab, we write A = kron(I,B) + kron(L,C);.

For the purpose of making a sparse test problem for linprog, we change the least squares problem Au = b into a Chebyshev (l_inf) problem, by giving linprog the following problem

  min c^T u^  subject to  F^ u^ >= b^,   F^ = [A e; -A e],   b^ = [b; -b],

where e in IR^(mn) is a vector of ones.

A visualization of the solution for a grid of size m = 20 and n = 20, with the same boundary conditions as in Figure B.1, is shown in Figure B.2.

Figure B.2: The solution to the Laplace problem. The grid dimension is m = 20, n = 20 and the same boundary conditions are used as those shown in Figure B.1.

25 5 5 1 T. f2 x x2 à Ú }ß Þ Ý Àß à w x2 x2 1 Ú iÛ Ú Þ Ú }ß Þ Ú a) Type (C.1).2).39 yj 0.10 4. T.16 0. m 2.22 0. Dimension: n 4.53 0.40 0. Dimension: n 4. Parabola b) c) x0 3 3 0.30 0.32 0.26 0. d) F x Enzyme [KO68] Ú Ú a) Type: (C.58 yj 0.21 0. 2.43 0.54 2.100 A PPENDIX C.96 1.10 1.10 Ú }ß Þ à êÚ å .66 i 11 12 13 14 15 yj 0. m 2. Ú Ú where w 10 or w 100. t 3. m 11. i 1 20.37 0.2).34 i 6 7 8 9 10 yj 0.2).37 0. Ú }ß Þ Ý à Ú }ß Þ f1 x x2 1 x2 Ú Û Ú Ú a) Type: (C.73 0. 115 70643952. t 3. f2 x 1 x1 ß à á á ß Þ à â Ú ß Û Þ ç à à å êÚ Ýâââ UUUÝ Ú é Ú á Þ }ß Þ Ú fj x x1 t j x2 et j 2 Ú Û Ú Ú a) Type: (C. Dimension: n ç ç Ú ß Û Þ â à êÚ å c) x0 12 1 T.18 0. b) where ti c) x0 d) F x i 5.39 0. b) f1 x d) F x 0.25 0. t 2. T EST FUNCTIONS Brown-Den Brown and Dennis [BD71].29 yj 0.14 0. x4 sint j cos t j 2 x3 )ÙÙ Ù ÜÙÙ Ù ÜÙÙ Ù i 1 2 3 4 5 yj 0.35 0.43 5.83 1. Dimension: n 2. Rosenbrock [Ros60]. m 20.34 2.

0833 0.0456 0.0246 - yj 0. à ß ç â Þ é â ß à á â Ú á ß á á î}ß Þ ÞÞ í Ú tj Ú Ú a) Type: (C.2).0000 0.0000 1.0323 0. 0 05 1 15 T. 5. x4 x5 e x6 t j á á y2 j x3 y j x4 j 7 8 9 10 11 - vj 0.1947 0.0844 0. ç à Ú}ß Þ å æÚ c) x0 2 2 7 0 2 1 T.0625 - yj 5t j 2 â sin 5t j Ýâââ 'UUÝ Ú Ý à Ú }ß Þ fj x vj ß á Þ x1 y2 j x2 y j j 1 11 .1670 6.1600 0.0235 0. yj 4.2).0342 0.0000 2.1957 0.2500 0.1000 0.0627 El-Attar [EAVD79] b) 2 Hettich [Wat79] b) fj x c) x0 d) F x â Û Þ Ú ß â à æÚ å Ýâââ UUUÝ Ú where j 1 5 and t j 0 25 j 1 0 75 4. m 51.1250 0. Dimension: n â d) F x 0 034904. T EST FUNCTIONS 101 b) d) Data: j 1 2 3 4 5 6 vj 0.0714 0. 4. Dimension: n ç â â â â æÚ å c) x0 05 05 05 05 T. 0 002459. x1 t j x2 t j x3 2 x4 Ý Àß Þ ë á ë á á ë à Ý â âUUâUÝ Ú Ú }ß Þ yj x e 3e e ì ì 2t j ß Þ ë ë e tj e 3t j 3t j 2 sin 7t j 2 à ë áß á Þ ë Ú }ß Þ fj x x1 e x2 t j cos x3t j Ú Ú a) Type: (C.1735 0.5000 0.A PPENDIX C. m é ß à Þ Ú where j 1 51 and t j 2 j 1 10.

T EST FUNCTIONS .102 A PPENDIX C.

Appendix D

Source code

D.1 Source code for SLP in Matlab

function [X, info, perf] = slp4(fun, fpar, x0, opts)
% SLP4  Sequential Linear Programming solver with continuous
%       trust region update [2].
%
% Usage: [X, info, perf] = slp4(fun, fpar, x0, opts)
%        Call SLP4 with no arguments to see default values.
%
% INPUT:
%  fun  : a string with a function name. All functions should
%         follow this API: [f, J] = fun(x, fpar).
%  fpar : parameters to the function (may be empty).
%  x0   : start point.
%  opts : options. To use default settings, omit opts from the
%         argument list. The options are set by using SETPARAMETERS:
%         rho             : Set trust region radius rho.
%         epsilon         : Threshold for accepting a new step.
%         maxIter         : Maximum number of iterations.
%         minStep         : If step < minStep then stop.
%         trustRegion     : Choose trust region update.
%                           1 : Classical update with damping [1].
%                           2 : Continuous update of trust region. See [2].
%                           3 : Trust region update. See [3].
%                           4 : Enhanced classical update with damping.
%                           5 : Classical update strategy.
%                           6 : Update strategy that uses the step length.
%         precisionDelta  : Stop when a precision delta is reached.
%         precisionFun    : The function value at the solution.
%
% OUTPUT:
%  X    : a vector containing all points visited.
%  info : info(1) : Number of iterations.
%         info(2) : Active stopping criterion.
%                   1 : Maximum number of iterations reached.
%                   2 : Small step.
%                   3 : dL = 0.
%                   4 : Desired precision reached.
%         info(3) : Function value F(x).
%         info(4) : Number of function evaluations.
%  perf : contains the following fields:
%         F           : Function evaluation.
%         R           : Trust region radius.
%         X           : Steps.
%         fval        : Function evaluation at the solution.
%         multipliers : Multipliers at the solution.
%
% In part based upon:
% [1] : K. Madsen, Minimax Solution of Non-linear Equations without
%       calculating derivatives.
% [2] : H. B. Nielsen, Damping Parameter in Marquardt's Method.
%       Technical Report IMM-REP-1999-05, DTU.
% [3] : K. Jonasson and K. Madsen, Corrected Sequential Linear
%       Programming for Sparse Minimax Optimization, BIT 34 (1994),
%       pp. 372-387.
%
% Release version 1.0, 28-10-02.
% Mark Wrobel, proj76@imm.dtu.dk, Kgs. Lyngby.
% do n o t c a l c u l a t e g a i n = dF / dL due t o n u m e r i c u n s t a b i l i t y . = zeros (2 m. f p a r . 1 e 10). 0 . ô ôô ó if nargin fprintf fprintf fprintf fprintf fprintf fprintf fprintf fprintf fprintf return end Ì Ì ÌÌÌÌÌÌÌÌÌÌÌ Ì Ì ó õ ó ó õ ó ó ó Ì õò Ì Ì Ì ÌÌÌÌÌÌÌÌÌÌÌÌÌÌ ±±±±±±±±Ì Ìò ÌÌò ò ò ò 1. e p s ˆ ( 1 / 3 ) norm ( x0 ) ) . R = []. 1 ) . UB = LB . precisionDelta . b = f. 100 ] n ’ ). 0 ) . trustRegion . [ ] . x = xt . 1 max ( x0 ) ] n ’ ) . flag = 1. f l a g = 0 . 1 ) . a l g o r i t h m ) ) . % fcount = [f . f c o u n t = 0 . dk A PPENDIX D. size ( J ). ” off ” ] n ’) . r h o . 5 rho . % x = x0 ( : ) . d t u . d t u . largesca le . 0 1 ) . f e v l . 0 . n ) 1 ] . d = s ( 1 : n ) . e r r o r ( s p r i n t f (’% s : l i n p r o g f a i l e d ’ . 7 5 dL ) & f l a g . 0 ] n ’). ” off ” ] n ’) . Take t h e r o l e a s g a i n o l d . f p a r ) . i n f ] . i t e r a t i o n s = 0 . [ ] . end switch t rus tre gion C l a s s i c a l u p d a t e w i t h damping [ 1 ] case 1 % i f ( dF 0 . max iter = setChoise ( opts . LB . [ s . imm . o u t p t . X = [ X x ] . 1 ] n ’). 1 e 10 ] n ’ ) . S OURCE CODE if ˜ exis t (’ opts ’) o p t s = s e t p a r a m e t e r s ( ’ empty ’ ) . o p t i o n s ) . [ f t . 0 ) . m i n s t e p = s e t C h o i s e ( o p t s . e r r . [ ] . 1 max ( x0 ) ) . xt . % storing information . x0 . J t ] = f e v a l ( fun . end r h o = s e t C h o i s e ( o p t s . i n d e x ] = c h e c k j a c o b i ( f u n . s t o p = 0 . n . n ] = activeset fcount + 1. ó ôô ôô ô ôÌ ó (’ (’ (’ (’ (’ (’ (’ (’ (’ rho epsilon maxIter minStep trustRegion precisionDelta precisionFun derivativeCheck largescale : : : : : : : : : [ [ [ [ [ [ [ [ [ Default Default Default Default Default Default Default Default Default = = = = = = = = = 0 . 0. A . 3 i n ” C h e c k i n g G r a d i e n t s ” H . rho = 2 . .104 % Mark Wrobel . dF = FX dL = FX alpha . epsilon = setChoise ( opts .4 ) A = [ J o n e s (m . [ ] . max ( f t ) . xt = x + d . 
9 abs ( e r r ( 1 ) ) . minStep . derivativeCheck . l i n p r o g ( [ r e p m a t ( 0 . 1 ) . UB . proj76@imm . ÌÌ õ . N i e l s e n IMM 2 1 . FX = max ( f ) . larges cale ). epsilon . 1 ) . 0 ] n ’). % See p . i f dF e p s i l o n dL & ( dL 0). b . 1 ) ] . nu = 2 . p rec is io n d elt a = setChoise ( opts . end i f isempty ( opts ) o p t s = s e t p a r a m e t e r s ( ’ empty ’ ) . . end end % Initialization . F = []. options = optimset (’ Display ’ .01 ] n ’). end a l p h a = s ( end ) . dk / ˜ hbn [ maxJ . % MAIN LOOP while ˜ stop LB = [ r e p m a t ( r h o . maxIter . 1 0 0) . if derivativeCheck . if exitfl 0. s e e [ 1 ] ) flag = 1 . ’ LargeScale ’ . f e v a l ( fun . J = Jt . e x i t f l . J] = [m . % defining constraints for linprog (2. % e n s u r e s a column v e c t o r X i s a matrix of x ’ s X = x. 0 . 1 . B . 9 . f = ft . 2 0 0 % Download a t www . R = [ R r h o ] . derivativeCheck = setChoise ( opts . precisionFun . F = [ F FX ] . i f abs ( e r r ( 3 ) ) e r r o r ( ’ D e r i v a t i v e check f a i l e d ’ ) . ( g a i n o l d = 2 e p s i l o n . [ ] . x . 1 ) . p re cis ion f = setChoise ( opts . f p a r ) . ’ off ’ . e l s e . lambda ] = . t rus tre gion = setChoise ( opts . 1 . fcount = fcount + 1 . largesc ale = setChoise ( opts .

A PPENDIX D. S OURCE

CODE

105

iterations = iterations + 1; if iterations = max iter , stop = 1 ; e l s e i f norm ( d ) min step , stop = 2 ; 0, stop = 3 ; e l s e i f dL end if precision delta , % use the p r e c i s i o n stop c r i t e r i a . F x = max ( f e v a l ( f u n , x , f p a r ) ) ; p r e c i s i o n f ) / max ( 1 , p r e c i s i o n f ) if (( F x stop = 4; end end end % end w h i l e .

= precision delta ),

% s t o r i n g performence parameters p e r f . F = F ; p e r f . R = R ; p e r f .X = X ; p e r f . f v a l = f ; p e r f . m u l t i p l i e r s = lambda . i n e q l i n ; i n f o ( 1 ) = i t e r a t i o n s ; i n f o ( 2 ) = s t o p ; i n f o ( 3 ) = max ( f ) ; info (4 ) = fcount ;

function [ value ] = setChoise ( parameter , default , booleancheck , boolstring ) i f nargin 2 & ˜ isempty ( d e f a u l t ) , e r r o r ( ’ Can n o t s e t a d e f a u l t v a l u e f o r a b o o l e a n e x p r e s s i o n ’ ) ; end i f nargin 3 i f isempty ( parameter ) , value = default ; else value = parameter ; end else i f isempty ( parameter ) , value = 0; else i f s t r c m p ( p a r a m e t e r , ’ on ’ ) , value = 1 ; else value = 0 ;

ÌÌÌÌÌÌÌÌÌÌ ±±Ì

ÌÌÌÌÌÌÌÌÌÌ ±±Ì

%

INNER FUNCTIONS

Ì

e l s e i f dF 0 . 2 5 dL rho = rho / 2 ; end case 2 % C o n t i n o u s u p d a t e o f t r u s t r e g i o n . See [ 2 ] . gamma = 2 ; b e t a = 2 . 5 ; 0 ) & ( dL 0) i f ( dF i f norm ( d , i n f ) 0.9 rho , min ( max ( 1 / gamma , 1 + ( b e t a 1 ) ( 2 dF / dL rho = rho nu = 2 ; end else r h o = r h o / nu ; nu = nu 2 ; end T r u s t r e g i o n u p d a t e . See [ 3 ] . case 3 % rho1 = 0 . 0 1 ; rho2 = 0 . 1 ; sigma1 = 0 . 5 ; sigma2 = 2 ; i f dF = r h o 2 dL , r h o = s i g m a 1 max ( norm ( d , i n f ) , 1 / s i g m a 2 r h o ) ; else r h o = min ( s i g m a 2 max ( norm ( d , i n f ) , 1 / s i g m a 2 r h o ) , 2 ) ; end case 4 % E n h a n c e d c l a s s i c a l u p d a t e w i t h damping . 0 . 7 5 dL ) & f l a g , i f ( dF rho = 2 . 5 min ( r h o , norm ( d , i n f ) ) ; 0 . 2 5 dL e l s e i f dF min ( r h o , norm ( d , i n f ) ) ; rho = 0 . 5 end Clas sical update strategy . case 5 % i f ( dF 0 . 7 5 dL ) , rho = 2 . 5 rho ; 0 . 2 5 dL e l s e i f dF rho = rho / 2 ; end case 6 % Udate t h a t use s t e p l e n g t h . i f ( dF 0 . 7 5 dL ) , r h o = max ( 2 . 5 norm ( d , i n f ) , r h o ) ; e l s e i f dF 0 . 2 5 dL r h o = 0 . 2 5 norm ( d , i n f ) ; end otherwise e r r o r ( s p r i n t f ( ’ No t r u s t r e g i o n method % d ’ , t r u s t r e g i o n ) ) ; end s w i t c h end %

ó ó Ì ó

ò

ó

ó

ó õõ ó ó ó ó óó

ó

óó

ó ò ó Ì õò ò ó ó Ì õò ó ò ó ó Ì õò

òõ ò

ò

Ì

õ Ìò Ìò

1 )ˆ5), beta );

õ

ò

ò

Ìò

Ìò

106
end end end if nargin == 4, if value , v a l u e = ’ on ’ ; else value = ’ off ’; end end

A PPENDIX D. S OURCE

CODE

A PPENDIX D. S OURCE

CODE

107

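The listing above only documents the `[f, J] = fun(x, fpar)` calling convention; the thesis gives no standalone driver. The following sketch is illustrative only: the residual function `linfit` and its data are invented here, while `slp4` and `setparameters` are the appendix routines.

```matlab
% linfit.m -- hypothetical test function: fit the line x(1) + x(2)*t to
% data (t, y) in the minimax sense.  f holds the +/- residuals, so that
% max(f) = max_i |r_i|, and J holds the corresponding gradients.
function [f, J] = linfit(x, fpar)
t = fpar(:, 1);  y = fpar(:, 2);
r = x(1) + x(2)*t - y;          % residuals
f = [r; -r];
e = ones(size(t));
J = [e, t; -e, -t];

% Driver (script, invented data):
%   fpar = [(0:5)', [0.5 1.9 3.1 4.2 4.9 6.1]'];
%   opts = setparameters('maxIter', 50, 'trustRegion', 2);
%   [x, info, perf] = slp4('linfit', fpar, [0; 0], opts);
%   info(2)    % which stopping criterion became active
```

Doubling the residuals with opposite signs is what turns the Chebyshev fit into the minimax form `max_i f_i(x)` that the solver expects.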
D.2 Source code for CSLP in Matlab

function [x, info, perf] = cslp4(fun, fpar, x0, opts)
% Usage: [x, info, perf] = cslp4(fun, fpar, x0, opts)
%
% Minimax solver that uses a corrective step. To use default
% settings, omit opts from the argument list.
% This algorithm is based on [1].
%
% INPUT:
% fun   : a string with a function name.
%         All functions should follow this API
%           [f, J] = fun(x, fpar);
% fpar  : parameters to the function, if any.
% x0    : start point.
% opts  : options.
%
% The options are set by using SETPARAMETERS.
% Call CSLP with no arguments to see default values.
%   rho                  : Set trust region radius rho.
%   epsilon              : Threshold for accepting a new step.
%   maxIter              : Maximum number of iterations.
%   minStep              : If ||step||_2 < minStep then stop.
%   trustRegion          : Choose trust region update.
%                          1: Classical update with damping [1].
%                          2: Continuous update of trust region. See [2].
%                          3: Trust region update. See [3].
%                          4: Enhanced classical update with damping.
%                          5: Classical update strategy.
%                          6: Update strategy that uses the step length.
%   useTentativeJacobian : If "on" use the Jacobian at x+h for the
%                          corrective step.
%   precisionDelta       : Stop when a precision delta is reached.
%   precisionFun         : The function value at the solution.
%
% OUTPUT:
% x       : Resulting x.
% info(1) : Number of iterations.
% info(2) : Active stopping criterion.
%           1: Maximum number of iterations reached.
%           2: Small step.
%           3: dL = 0.
%           4: Desired precision reached.
% info(3) : Function value F(x).
% info(4) : Number of function evaluations.
% info(5) : Number of active steps.
% info(6) : Number of failed active steps.
% perf    : Contains the following fields:
%           F           : Function evaluations.
%           R           : Trust region radius.
%           X           : Steps.
%           fval        : Function evaluation at the solution.
%           multipliers : Multipliers at the solution.
%
% References
% [1]: K. Jonasson and K. Madsen, Corrected Sequential Linear Programming
%      for Sparse Minimax Optimization. BIT 34 (1994), 372-387.
% [2]: H.B. Nielsen, Damping parameter in Marquardt's method.
%      Technical Report IMM-REP-1999-05, DTU, Kgs. Lyngby.
% [3]: K. Madsen, Minimax Solution of Non-linear Equations without
%      calculating derivatives.
%
% Mark Wrobel, proj76@imm.dtu.dk

if nargin < 1
  fprintf('rho                  : [ Default = 0.1*max(x0) ]\n');
  fprintf('epsilon              : [ Default = 0.01 ]\n');
  fprintf('maxIter              : [ Default = 100 ]\n');
  fprintf('minStep              : [ Default = 1e-10 ]\n');
  fprintf('trustRegion          : [ Default = 1 ]\n');
  fprintf('precisionDelta       : [ Default = 0 ]\n');
  fprintf('precisionFun         : [ Default = 0 ]\n');
  fprintf('derivativeCheck      : [ Default = "off" ]\n');
  fprintf('useTentativeJacobian : [ Default = "off" ]\n');
  fprintf('largescale           : [ Default = "off" ]\n');
  return
end

if ~exist('opts')
  opts = setparameters('empty');
end
if isempty(opts)
  opts = setparameters('empty');
end

rho             = setChoise(opts.rho, 0.1*max(x0));
epsilon         = setChoise(opts.epsilon, 0.01);
max_iter        = setChoise(opts.maxIter, 100);
min_step        = setChoise(opts.minStep, 1e-10);
trustregion     = setChoise(opts.trustRegion, 1);
precision_delta = setChoise(opts.precisionDelta, 0);
precision_f     = setChoise(opts.precisionFun, 0);
derivativeCheck = setChoise(opts.derivativeCheck, [], 1);
corrstepJX      = setChoise(opts.useTentativeJacobian, [], 1);
largescale      = setChoise(opts.largescale, [], 1, 1);

if derivativeCheck
  % See p. 3 in "Checking Gradients", H.B. Nielsen, IMM.
  % Download at www.imm.dtu.dk/~hbn
  [maxJ, err, index] = checkjacobi(fun, x0, fpar, eps^(1/3)*norm(x0));
  if abs(err(3)) > 0.9*abs(err(1))
    error('Derivative check failed');
  end
end

% Initialization.
x = x0(:);                          % ensures a column vector.
[f, J] = feval(fun, x, fpar);
fcount = 1;
[m, n] = size(J);
FX = max(f);
X = x;                              % X is a matrix of x's.
F = []; R = [];
iterations = 0; stop = 0; nu = 2; flag = 1;
active_steps = 0; wasted_active = 0;
options = optimset('Display', 'off', 'LargeScale', largescale);

% MAIN LOOP
while ~stop
  F = [F FX]; R = [R rho];
  % defining constraints for linprog (2.4)
  A  = [J, -ones(m,1)];
  b  = -f;
  LB = [repmat(-rho, n, 1); -inf];
  UB = -LB;
  % solve the LP problem.
  [s, fevl, exitfl, outpt, lambda] = ...
      linprog([repmat(0, n, 1); 1], A, b, [], [], LB, UB, [], options);
  if exitfl < 0
    error('linprog failed');
  end
  alpha = s(end);
  h = s(1:n);
  d = h;
  xt = x + d;
  [ft, Jt] = feval(fun, xt, fpar);
  fcount = fcount + 1;
  dF = FX - max(ft);
  dL = FX - alpha;

  if (dL >= eps) & (dF <= epsilon*dL)
    % Take a corrective step.
    active_steps = active_steps + 1;
    if corrstepJX
      % Use J(x+h).
      v = corrstep(ft, h, Jt, f, J);
    else
      % Use J(x).
      v = corrstep(ft, h, J, f, J);
    end
    % angle = acos((h'*v)/(norm(h)*norm(v)));
    % if pi/32 > abs(angle - pi), ...
    if (norm(v) <= 0.9*norm(h)) & (norm(v) > 0)
      d = h + v;
      d_length = norm(d, inf);
      % truncating to fit trust region
      if d_length > rho
        d = rho*d/d_length;
      end
      xt = x + d;
      [ft, Jt] = feval(fun, xt, fpar);
      fcount = fcount + 1;
      dF = FX - max(ft);
    else
      wasted_active = wasted_active + 1;
    end
  end

  if dF > epsilon*dL
    x = xt; f = ft; J = Jt;
    FX = max(f);
    X = [X x];
    flag = 1;   % use flag instead of gain_old (gain_old = 2*epsilon, see [1]).
  else
    flag = 0;
  end

  switch trustregion
    case 1   % Classical update with damping [1].
      if (dF > 0.75*dL) & flag
        rho = 2.5*rho;
      elseif dF < 0.25*dL
        rho = rho/2;
      end
    case 2   % Continuous update of trust region. See [2].
      gamma = 2; beta = 2.5;
      if (dF > 0) & (dL > 0)
        if norm(d, inf) > 0.9*rho
          rho = rho*min(max(1/gamma, 1 + (beta-1)*(2*dF/dL - 1)^5), beta);
          nu = 2;
        end
      else
        rho = rho/nu; nu = nu*2;
      end
    case 3   % Trust region update. See [3].
      rho1 = 0.01; rho2 = 0.1; sigma1 = 0.5; sigma2 = 2;
      if dF <= rho2*dL
        rho = sigma1*max(norm(d, inf), 1/sigma2*rho);
      else
        rho = min(sigma2*max(norm(d, inf), 1/sigma2*rho), 2);
      end
    case 4   % Enhanced classical update with damping.
      if (dF > 0.75*dL) & flag
        rho = 2.5*min(rho, norm(d, inf));
      elseif dF < 0.25*dL
        rho = 0.5*min(rho, norm(d, inf));
      end
    case 5   % Classical update strategy.
      if (dF > 0.75*dL)
        rho = 2.5*rho;
      elseif dF < 0.25*dL
        rho = rho/2;
      end
    case 6   % Update that uses the step length.
      if (dF > 0.75*dL)
        rho = max(2.5*norm(d, inf), rho);
      elseif dF < 0.25*dL
        rho = 0.25*norm(d, inf);
      end
    otherwise
      error(sprintf('No trust region method %d', trustregion));
  end % switch

  iterations = iterations + 1;
  if iterations == max_iter
    stop = 1;
  elseif norm(d) < min_step
    stop = 2;
  elseif dL <= 0
    stop = 3;
  end
  if precision_delta
    % use the precision stop criteria.
    F_x = max(feval(fun, x, fpar));
    if ((F_x - precision_f)/max(1, precision_f) <= precision_delta)
      stop = 4;
    end
  end
end % end while

% Formating output.
perf.F = F; perf.R = R; perf.X = X;
perf.fval = f;
perf.multipliers = lambda.ineqlin;
info(1) = iterations;   info(2) = stop;
info(3) = max(f);       info(4) = fcount;
info(5) = active_steps; info(6) = wasted_active;

% ========== auxiliary functions =================================

function i = activeset(f)
% Usage: i = activeset(f)
% Finds the active set.
% Input:
%   f : function evaluations (for cslp based on linearizations of f).
% Output:
%   i : The active set.
i = find(abs(f - max(f)) <= 1e-3);

function v = corrstep(f, h, J, flin, Jlin)
% Usage: v = corrstep(f, h, J, flin, Jlin)
% Calculates the corrective step. Now with pivoting of the Jacobian.
%
% Input:
%   f    : Function evaluations of active functions.
%   h    : A step h, proposed by a LP solver.
%   J    : Jacobian of active functions.
%   flin : Used to determine active functions in linprog.
%   Jlin : Used to determine active functions in linprog.
%
% Output:
%   v : The corrective step.

i = activeset(flin + Jlin*h);
A = J(i,:)';
b = f(i);
[n, m] = size(A);
% Need some rank revealing QR (RRQR) of A to avoid linear
% dependent gradients.
% [Q, R, E] = qr(A);
[Q, R, ee] = qr(A, 0);               % pivoted version of A.
E = sparse(m, m);
E(ee + (0:m-1)*m) = 1;
A = A*E;                             % pivoting
b = E'*b;
j = find(abs(diag(R)) > 0.01*n*eps); % find small values on diag(R).
A = A(:, j);
b = b(j);
t = length(j);                       % number of linear independent gradients.
if t <= 1
  % Null solution.
  v = zeros(size(h));
else
  Ih = zeros(n+1);
  Ih(1:n, 1:n) = eye(n);
  Ah = [A; ones(1, t)];
  z  = [Ih, Ah; Ah', zeros(t)] \ [zeros(n+1, 1); -b];
  vh = z(1:n+1);
  v  = vh(1:n);
end

function [value] = setChoise(parameter, default, booleancheck, boolstring)
if nargin > 2 & ~isempty(default)
  error('Can not set a default value for a boolean expression');
end
if nargin < 3
  if isempty(parameter)
    value = default;
  else
    value = parameter;
  end
else
  if isempty(parameter)
    value = 0;
  elseif strcmp(parameter, 'on')
    value = 1;
  else
    value = 0;
  end
end
if nargin == 4
  if value
    value = 'on';
  else
    value = 'off';
  end
end
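CSLP is called exactly like SLP; the corrective step only changes how an unsatisfactory tentative step is repaired, and the Jacobian used for the correction is selected with an option. A minimal usage sketch, assuming a user-supplied `myfun` that follows the `[f, J] = fun(x, fpar)` API (the name `myfun` and the variables `fpar` and `x0` are placeholders, not part of the thesis sources):

```matlab
% Request the tentative Jacobian J(x+h) for the corrective step.
opts = setparameters('useTentativeJacobian', 'on', 'maxIter', 100);
[x, info, perf] = cslp4('myfun', fpar, x0, opts);
% info(5) counts corrective (active) steps and info(6) the failed
% ones; perf.R traces the trust region radius over the iterations.
```

Comparing `info(4)` between `slp4` and `cslp4` on the same problem shows the price of the corrective step: one extra function evaluation per active iteration, usually repaid by fewer iterations overall.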
D.3 Source code for SETPARAMETERS in Matlab

function [opts] = setparameters(varargin)
% Usage: [opts] = setparameters(varargin)
%
% Takes variable input on the form
%   options = setparameters('param1', value1, 'param2', value2, ...)
%
% Parameters
%   rho                  : Trust region radius.
%   epsilon              : If gainfactor > epsilon then a step is accepted.
%   maxIter              : Maximum number of iterations.
%   minStep              : If a step becomes lower than minStep, then
%                          the algorithm stops.
%   trustRegion          : Set the type of trust region update.
%                          1: Jonasson.
%                          2: H.B. Nielsen.
%                          3: K. Madsen.
%                          4: Modified Jonasson.
%                          5: Classical update.
%                          6: Update that incorporates the step length.
%   precisionDelta       : The precision of the solution. Requires that
%                          also precisionFun is set.
%   precisionFun         : The value of the function in the solution.
%   derivativeCheck      : Check the gradients.
%   useTentativeJacobian : Use the Jacobian evaluated at X tentative.
%
% 'empty' returns the empty struct.

if nargin == 0
  fprintf('rho                  : [ positive scalar ]\n');
  fprintf('epsilon              : [ positive scalar ]\n');
  fprintf('maxIter              : [ positive integer ]\n');
  fprintf('minStep              : [ positive scalar ]\n');
  fprintf('trustRegion          : [ positive integer ]\n');
  fprintf('precisionDelta       : [ positive scalar ]\n');
  fprintf('precisionFun         : [ positive scalar ]\n');
  fprintf('derivativeCheck      : [ off | on ]\n');
  fprintf('useTentativeJacobian : [ off | on ]\n');
  fprintf('largescale           : [ off | on ]\n');
  return
end

% Create struct.
opts = struct('rho', [], 'epsilon', [], 'maxIter', [], 'minStep', [], ...
              'trustRegion', [], 'precisionDelta', [], 'precisionFun', [], ...
              'derivativeCheck', [], 'useTentativeJacobian', [], ...
              'largescale', []);

numargs = nargin;
% Check input, assign values.
if mod(numargs, 2)
  if ~strcmp(lower(char(varargin(1))), 'empty')
    error('Arguments must come in pairs');
  end
end
for i = 1:2:numargs
  parameter = lower(char(varargin(i)));
  if ~ischar(parameter)
    error('Parameter is not a string');
  end
  try
    if ~strcmp(parameter, 'empty')
      value = cell2mat(varargin(i+1));
    end
    switch parameter
      case 'rho'
        value = checkScalar(parameter, value);
        opts.rho = value;
      case 'epsilon'
        value = checkScalar(parameter, value);
        opts.epsilon = value;
      case 'maxiter'
        value = checkInteger(parameter, value);
        opts.maxIter = value;
      case 'minstep'
        value = checkScalar(parameter, value);
        opts.minStep = value;
      case 'trustregion'
        value = checkScalar(parameter, value);
        opts.trustRegion = value;
      case 'precisiondelta'
        value = checkScalar(parameter, value);
        opts.precisionDelta = value;
      case 'precisionfun'
        value = checkScalar(parameter, value);
        opts.precisionFun = value;
      case 'derivativecheck'
        value = checkYesNo(parameter, value);
        opts.derivativeCheck = lower(value);
      case 'usetentativejacobian'
        value = checkYesNo(parameter, value);
        opts.useTentativeJacobian = lower(value);
      case 'largescale'
        value = checkYesNo(parameter, value);
        opts.largescale = lower(value);
      case 'empty'
        % return the empty struct.
      otherwise
        error(sprintf('Unrecognized parameter: %s', parameter));
    end % end switch
  catch
    error(sprintf('An error occurred: %s', lasterr));
  end
end

% precisionDelta and precisionFun must be set together.
flag = 0;
if ~isempty(opts.precisionDelta) & isempty(opts.precisionFun)
  flag = 1;
elseif isempty(opts.precisionDelta) & ~isempty(opts.precisionFun)
  flag = 1;
end
if flag
  error('precisionFun and precisionDelta must both be set');
end

% ===================== INNER FUNCTIONS =====================

function [value] = checkScalar(parameter, value)
if ~isreal(value) | ischar(value)
  error(sprintf('%s is not a scalar', parameter));
end

function [value] = checkInteger(parameter, value)
if ~isreal(value) | ischar(value)
  error(sprintf('%s is not a scalar', parameter));
elseif (value - floor(value)) ~= 0
  error(sprintf('%s is not an integer', parameter));
end

function [value] = checkYesNo(parameter, value)
if ~ischar(value)
  error(sprintf('%s is not a string', parameter));
elseif ~strcmp('off', lower(value)) & ~strcmp('on', lower(value))
  error(sprintf('%s: "%s" is not a valid option', parameter, lower(value)));
end
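A short sketch of how `setparameters` behaves for a caller (the values are invented): parameter names are matched case-insensitively, values are type-checked, and unset fields stay empty so the solvers substitute their own defaults.

```matlab
% Build an options struct; unrelated fields remain [].
opts = setparameters('rho', 0.5, 'maxIter', 200, 'derivativeCheck', 'on');

opts = setparameters('empty');     % the all-empty options struct

% These calls are rejected by the type checks:
% setparameters('maxIter', 12.5)   % maxIter is not an integer
% setparameters('foo', 1)          % unrecognized parameter
```

Keeping validation in one place means the solvers themselves only need the lightweight `setChoise` helper to merge user values with defaults.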
D.4 Source code for CMINIMAX in Matlab

function [x, info] = cminimax(fun, x, fpar, nonlcon, A, b, Aeq, beq, opts)
% Usage: x = cminimax(fun, x, fpar, nonlcon, A, b, Aeq, beq, opts)
%
% Constrained minimax solver. Uses an exact penalty function;
% the unconstrained subproblems are solved by CSLP.
%
% INPUT:
% fun      : a string with a function name, following the API
%              [f, J] = fun(x, fpar);
% x        : start point.
% fpar     : parameters to the function, if any.
% nonlcon  : Function that returns the nonlinear constraints.
% A, b     : Linear inequality constraints A*x <= b.
% Aeq, beq : Linear equality constraints Aeq*x = beq.
% opts     : options, see SETPARAMETERS.

if nargin < 1
  % display default values.
  return
end

if ~exist('fpar'),    fpar = [];    end
if ~exist('nonlcon'), nonlcon = []; end
if ~exist('A'),       A = [];       end
if ~exist('b'),       b = [];       end
if ~exist('Aeq'),     Aeq = [];     end
if ~exist('beq'),     beq = [];     end

if xor(isempty(A), isempty(b))
  error('Both A and b have to be given');
end
if xor(isempty(Aeq), isempty(beq))
  error('Both Aeq and beq have to be given');
end

x = x(:);
% get unconstrained size of problem.
f = feval(fun, x, fpar);
n = length(f);

sigma = 0;
intern = internOpts(fun, fpar, sigma, A, b, Aeq, beq, nonlcon);
unconstr_multipliers = zeros(1, n);
while max(unconstr_multipliers) < eps
  sigma = sigma + 0.5;               % increase the penalty factor.
  intern.sigma = sigma;
  [x, info, perf] = cslp4(@penaltyFunction, intern, x, opts);
  unconstr_multipliers = perf.multipliers(1:n);
end

% ===================== INNER FUNCTIONS =====================

function [f, J] = penaltyFunction(x, intern)
% Builds the exact penalty function: every inner function is combined
% with every constraint, scaled by the penalty factor sigma.
fun     = intern.fun;
fpar    = intern.fpar;
sigma   = intern.sigma;
A       = intern.A;    b   = intern.b;
Aeq     = intern.Aeq;  beq = intern.beq;
nonlcon = intern.nonlcon;

[fun_f, fun_J] = feval(fun, x, fpar);
[m, n] = size(fun_J);
f_container = [];
J_container = [];

% Linear inequality constraints.
if ~isempty(A) & ~isempty(b)
  c = A*x - b;  p = length(b);
  c = repmat(c, 1, m);  c = c';
  A = repmat(A, 1, m)';
  A = reshape(A, n, m*p)';
  f_container = [f_container; repmat(fun_f, p, 1) + sigma*c(:)];
  J_container = [J_container; repmat(fun_J, p, 1) + sigma*A];
end

% Linear equality constraints.
if ~isempty(Aeq) & ~isempty(beq)
  % Double the constraints: Aeq*x = beq becomes the pair
  % Aeq*x <= beq and -Aeq*x <= -beq.
  Aeq = [Aeq; -Aeq];
  beq = [beq; -beq];
  c = Aeq*x - beq;  p = length(beq);
  c = repmat(c, 1, m);  c = c';
  Aeq = repmat(Aeq, 1, m)';
  Aeq = reshape(Aeq, n, m*p)';
  f_container = [f_container; repmat(fun_f, p, 1) + sigma*c(:)];
  J_container = [J_container; repmat(fun_J, p, 1) + sigma*Aeq];
end

% Nonlinear constraints.
if ~isempty(nonlcon)
  [fun_c, fun_A] = feval(nonlcon, x);
  c = fun_c;  p = length(c);
  c = repmat(c, 1, m);  c = c';
  fun_A = repmat(fun_A, 1, m)';
  fun_A = reshape(fun_A, n, m*p)';
  f_container = [f_container; repmat(fun_f, p, 1) + sigma*c(:)];
  J_container = [J_container; repmat(fun_J, p, 1) + sigma*fun_A];
end

% Upper and lower bound: not implemented yet.

% Add the containers with the function.
f = [fun_f; f_container];
J = [fun_J; J_container];

function [intern] = internOpts(fun, fpar, sigma, A, b, Aeq, beq, nonlcon)
% Create struct.
if ~(isa(nonlcon, 'function_handle') | isa(nonlcon, 'char')) & ...
   ~isempty(nonlcon)
  error('nonlcon is not a function handle nor a string');
end
intern = struct('fun', [], 'fpar', [], 'sigma', [], 'A', [], 'b', [], ...
                'Aeq', [], 'beq', [], 'nonlcon', []);
intern.fun     = fun;
intern.fpar    = fpar;
intern.sigma   = sigma;
intern.A       = A;
intern.b       = b;
intern.Aeq     = Aeq;
intern.beq     = beq;
intern.nonlcon = nonlcon;

function [sigma] = getSigma(PF, C, c)
% [sigma] = getSigma(PF, C, c)
% Calculate the sigma that triggers a shift to a new stationary point.
%
% INPUT:
% PF : The generalized gradient of F.
% C  : The gradient of C. If more than one constraint is active
%      then just chose one.
%
% OUTPUT:
% sigma : The penalty factor.
%
% Not implemented yet.
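As a usage sketch of the exact-penalty driver (the function name `myfun`, its data `fpar`, the start point `x0` and the constraint are all invented, and the argument order shown is an assumption following the header comment above):

```matlab
% Minimax problem with one invented linear inequality constraint
% x(1) + x(2) <= 2; no equality or nonlinear constraints.
A = [1 1];  b = 2;
opts = setparameters('empty');
[x, info] = cminimax('myfun', x0, fpar, [], A, b, [], [], opts);
```

The driver keeps raising the penalty factor sigma until the multipliers of the unconstrained inner functions become nonzero, at which point the penalty solution is also a solution of the constrained problem.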
Bibliography

[Bar70] Y. Bard. Comparison of gradient methods for solution of nonlinear parameter estimation problems. SIAM Journal of Numerical Analysis, 7:157–186, 1970.

[BD71] K. Brown and J. Dennis. A new algorithm for nonlinear least squares curve fitting. In Mathematical Software, pages 391–396, 1971.

[CGT00] Andrew R. Conn, Nicholas I.M. Gould, and Philippe L. Toint. Trust-Region Methods. MPS-SIAM Series on Optimization. SIAM, 2000.

[Cla75] F. Clarke. Generalized gradients and applications. Trans. Am. Math. Society, 205:247–262, 1975.

[EAVD79] El-Attar, M. Vidyasagar, and S. Dutta. An algorithm for l1-approximation. SIAM J. Numer. Anal., 16(1):70–86, 1979.

[FJNT99] Poul Erik Frandsen, Kristian Jonasson, Hans Bruun Nielsen, and Ole Tingleff. Unconstrained Optimization. IMM, Technical University of Denmark, 1999. http://www.imm.dtu.dk/courses/02611/uncon

[Fle00] R. Fletcher. Practical Methods of Optimization. Wiley, second edition, 2000.

[GT82] A. Griewank and Ph. Toint. Partitioned variable metric updates for large structured optimization problems. Numer. Math., 39:119–137, 1982.

[GvL96] G. Golub and C. van Loan. Matrix Computations. The Johns Hopkins University Press, third edition, 1996.

[HL95] Frederick S. Hillier and Gerald J. Lieberman. Introduction to Operations Research. McGraw-Hill, sixth edition, 1995.

[Hub81] Peter J. Huber. Robust Statistics. Wiley, 1981.

[JM92] K. Jonasson and K. Madsen. Corrected sequential linear programming for sparse minimax optimization. Technical Report NI-92-06, Institute for Numerical Analysis, Technical University of Denmark, September 1992.

[JM94] K. Jónasson and K. Madsen. Corrected sequential linear programming for sparse minimax optimization. BIT, 34:372–387, 1994.

[KO68] J. Kowalik and M. Osborne. Methods for Unconstrained Optimization Problems. American Elsevier Publishing Co., 1968.

[Mad86] Kaj Madsen. Minimization of Non-linear Approximation Functions. Polyteknisk Forlag, 1986.

[Man65] O. Mangasarian. Nonlinear Programming. McGraw-Hill Book Company, 1965. Newer title available from SIAM Society for Industrial & Applied Mathematics, ISBN 0898713412.

[Mat00] MathWorks. Optimization Toolbox For Use with Matlab. The MathWorks, 2000. http://www.mathworks.com/access/helpdesk/help/pdf_doc/optim/optim_tb.pdf

[Mer72] Merrild. Applications and extensions of an algorithm that computes fixed points of certain upper semicontinuous point to set mappings. PhD thesis, University of Michigan, Ann Arbor, USA, 1972.

[MN02] K. Madsen and H.B. Nielsen. Supplementary notes for 02611 optimization and data fitting. IMM, Department of Informatics and Mathematical Modelling, Technical University of Denmark, 2nd edition, 2002. http://www.imm.dtu.dk/02611/SN.pdf

[MNP94] K. Madsen, H.B. Nielsen, and M. Pinar. New characterizations of solutions to overdetermined systems of linear equations. Operations Research Letters, 16(3):159–166, 1994.

[MNS02] Kaj Madsen, Hans Bruun Nielsen, and Jacob Søndergaard. Robust subroutines for nonlinear optimization. Technical Report IMM-REP-2002-02, IMM, Technical University of Denmark, 2002. http://www.imm.dtu.dk/~km/F-pak.html

[MNT99] Kaj Madsen, Hans Bruun Nielsen, and Ole Tingleff. Methods for Non-linear Least Squares Problems. IMM, Technical University of Denmark, 1999. http://www.imm.dtu.dk/courses/02611/NLS.pdf

[MNT01] Kaj Madsen, Hans Bruun Nielsen, and Ole Tingleff. Optimization with constraints. IMM, Technical University of Denmark, 2001. http://www.imm.dtu.dk/courses/02611/H40.pdf

[Nie96] H.B. Nielsen. Nummerisk Lineær Algebra. IMM, Technical University of Denmark, 1996.

[Nie99a] H.B. Nielsen. Algorithms for linear optimization. IMM, Technical University of Denmark, 1999. http://www.imm.dtu.dk/~hbn/publ/ALO.ps

[Nie99b] H.B. Nielsen. Damping parameter in Marquardt's method. Technical Report IMM-REP-1999-05, IMM, Technical University of Denmark, 1999. http://www.imm.dtu.dk/~hbn/publ/TR9905.ps

[Ros60] H.H. Rosenbrock. An automatic method for finding the greatest or least value of a function. Comput. J., 3:175–184, 1960.

[Wat79] G.A. Watson. The minimax solution of an overdetermined system of non-linear equations. J. Inst. Math. Appl., 23:167–180, 1979.