Nonlinear Programming

SIAM's Classics in Applied Mathematics series consists of books that were previously allowed to go out of print. These books are republished by SIAM as a professional service because they continue to be important resources for mathematical scientists.

Editor-in-Chief
Robert E. O'Malley, Jr., University of Washington

Editorial Board
Richard A. Brualdi, University of Wisconsin-Madison
Herbert B. Keller, California Institute of Technology
Andrzej Z. Manitius, George Mason University
Ingram Olkin, Stanford University
Stanley Richardson, University of Edinburgh
Ferdinand Verhulst, Mathematisch Instituut, University of Utrecht

Classics in Applied Mathematics
C. C. Lin and L. A. Segel, Mathematics Applied to Deterministic Problems in the Natural Sciences
Johan G. F. Belinfante and Bernard Kolman, A Survey of Lie Groups and Lie Algebras with Applications and Computational Methods
James M. Ortega, Numerical Analysis: A Second Course
Anthony V. Fiacco and Garth P. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques
F. H. Clarke, Optimization and Nonsmooth Analysis
George F. Carrier and Carl E. Pearson, Ordinary Differential Equations
Leo Breiman, Probability
R. Bellman and G. M. Wing, An Introduction to Invariant Imbedding
Abraham Berman and Robert J. Plemmons, Nonnegative Matrices in the Mathematical Sciences
Olvi L. Mangasarian, Nonlinear Programming
*Carl Friedrich Gauss, Theory of the Combination of Observations Least Subject to Errors: Part One, Part Two, Supplement. Translated by G. W. Stewart
Richard Bellman, Introduction to Matrix Analysis
U. M. Ascher, R. M. M. Mattheij, and R. D. Russell, Numerical Solution of Boundary Value Problems for Ordinary Differential Equations
K. E. Brenan, S. L. Campbell, and L. R. Petzold, Numerical Solution of Initial-Value Problems in Differential-Algebraic Equations
Charles L. Lawson and Richard J. Hanson, Solving Least Squares Problems
J. E. Dennis, Jr. and Robert B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations
Richard E. Barlow and Frank Proschan, Mathematical Theory of Reliability
*First time in print.

Classics in Applied Mathematics (continued)
Cornelius Lanczos, Linear Differential Operators
Richard Bellman, Introduction to Matrix Analysis, Second Edition
Beresford N. Parlett, The Symmetric Eigenvalue Problem
Richard Haberman, Mathematical Models: Mechanical Vibrations, Population Dynamics, and Traffic Flow
Peter W. M. John, Statistical Design and Analysis of Experiments
Tamer Basar and Geert Jan Olsder, Dynamic Noncooperative Game Theory, Second Edition
Emanuel Parzen, Stochastic Processes
Petar Kokotović, Hassan K. Khalil, and John O'Reilly, Singular Perturbation Methods in Control: Analysis and Design
Jean Dickinson Gibbons, Ingram Olkin, and Milton Sobel, Selecting and Ordering Populations: A New Statistical Methodology
James A. Murdock, Perturbations: Theory and Methods
Ivar Ekeland and Roger Temam, Convex Analysis and Variational Problems
Ivar Stakgold, Boundary Value Problems of Mathematical Physics, Volumes I and II
J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables
David Kinderlehrer and Guido Stampacchia, An Introduction to Variational Inequalities and Their Applications
F. Natterer, The Mathematics of Computerized Tomography
Avinash C. Kak and Malcolm Slaney, Principles of Computerized Tomographic Imaging
R. Wong, Asymptotic Approximations of Integrals
O. Axelsson and V. A. Barker, Finite Element Solution of Boundary Value Problems: Theory and Computation
David R. Brillinger, Time Series: Data Analysis and Theory
Joel N. Franklin, Methods of Mathematical Economics: Linear and Nonlinear Programming, Fixed-Point Theorems
Philip Hartman, Ordinary Differential Equations, Second Edition
Michael D. Intriligator, Mathematical Optimization and Economic Theory
Philippe G. Ciarlet, The Finite Element Method for Elliptic Problems

Nonlinear Programming
Olvi L. Mangasarian
University of Wisconsin
Madison, Wisconsin

Society for Industrial and Applied Mathematics
Philadelphia

Library of Congress Cataloging-in-Publication Data
Mangasarian, Olvi L., 1934-
Nonlinear programming / Olvi L. Mangasarian.
p. cm. -- (Classics in applied mathematics ; 10)
Originally published: New York : McGraw-Hill, 1969, in series: McGraw-Hill series in systems science.
Includes bibliographical references and indexes.
ISBN 0-89871-341-2
1. Nonlinear programming. I. Title. II. Series.
T57.8.M34 1994
519.7'6--dc20
94-36844

10 9 8 7 6 5 4 3

All rights reserved. Printed in the United States of America. No part of this book may be reproduced, stored, or transmitted in any manner without the written permission of the Publisher. For information, write the Society for Industrial and Applied Mathematics, 3600 University City Science Center, Philadelphia, PA 19104-2688.

Copyright © 1994 by the Society for Industrial and Applied Mathematics.

This SIAM edition is a corrected republication of the work first published in 1969 by the McGraw-Hill Book Company, New York, New York.

siam is a registered trademark.

To Josephine Mangasarian, my mother, and to Claire

Preface to the Classics Edition

Twenty-five years have passed since the original edition of this book appeared; however, the topics covered are still timely and currently taught at the University of Wisconsin as well as many other major institutions. At Wisconsin these topics are taught in a course jointly listed by the Computer Sciences, Industrial Engineering, and Statistics departments. Students from these and other disciplines regularly take this course. Each year I get a number of requests from the United States and abroad for copies of the book and for permission to reproduce reserve copies for libraries. I was therefore pleased when SIAM approached me with a proposal to reprint the book in its Classics series. I believe that this book is an appropriate choice for this series inasmuch as it is a concise, rigorous, yet accessible account of the fundamentals of constrained optimization theory that is useful to both the beginning student as well as the active researcher. I am appreciative that SIAM has chosen to publish the book and to make the corrections that I supplied. I am especially grateful to Vickie Kearn and Ed Block for their friendly and professional handling of the publication process. My hope is that the mathematical programming community will benefit from this endeavor.

Olvi L. Mangasarian

Preface

This book is based on a course in nonlinear programming given in the Electrical Engineering and Computer Sciences Department and the Industrial Engineering and Operations Research Department of the University of California at Berkeley and in the Computer Sciences Department of the University of Wisconsin at Madison.
The intent of the book is to cover the fundamental theory underlying nonlinear programming for the applied mathematician. The entire book could be used as a text for a one-semester course, or the first eight chapters for a one-quarter course. The course level would probably be advanced undergraduate or first-year graduate. The only prerequisite would be a good course in advanced calculus or real analysis. (Linear programming is not a prerequisite.) All the results needed in the book are given in the Appendixes.

I am indebted to J. Ben Rosen who first introduced me to the fascinating subject of nonlinear programming, to Lotfi A. Zadeh who originally suggested the writing of such a book, to Jean-Paul Jacob, Philippe Rossi, and James W. Daniel who read the manuscript carefully and made numerous improvements, and to all my students whose questions and observations resulted in many changes.

Olvi L. Mangasarian

To the Reader

The following system of numbering and cross-referencing is used in this book. At the top of each page in the outer margin appear chapter and section numbers in boldface type; for example, 3.2 at the top of a page means that the discussion on that page is part of Chapter 3, Section 2. In addition, each item on the page (Definition, Theorem, Example, Comment, Remark, etc.) is given a number that appears in the left-hand margin; such items are numbered consecutively within each section. Item numbers and all cross-references in the text are in italic type. Cross-references are of the form "by Definition 5.4.3"; this means "by the definition which is item 3 of Section 4 in Chapter 5." Since the four appendixes are labeled A, B, C, and D, the reference "C.1.3" is to "item 3 in Section 1, Appendix C." When we refer in a section to an item within the same section, only the item number is given; thus "substituting in 7" means "substituting in Equation 7 of this section."

Contents

Preface to the Classics Edition
Preface
To the Reader

Chapter 1. The Nonlinear Programming Problem, Preliminary Concepts, and Notation
1. The nonlinear programming problem
2. Sets and symbols
3. Vectors
4. Matrices
5. Mappings and functions
6. Notation

Chapter 2. Linear Inequalities and Theorems of the Alternative
1. Introduction
2. The optimality criteria of linear programming: An application of Farkas' theorem
3. Existence theorems for linear systems
4. Theorems of the alternative

Chapter 3. Convex Sets in R^n
1. Convex sets and their properties
2. Separation theorems for convex sets

Chapter 4. Convex and Concave Functions
1. Definitions and basic properties
2. Some fundamental theorems for convex functions

Chapter 5. Saddlepoint Optimality Criteria of Nonlinear Programming Without Differentiability
1. The minimization and saddlepoint problems
2. Some basic results for minimization and local minimization problems
3. Sufficient optimality criteria
4. Necessary optimality criteria

Chapter 6. Differentiable Convex and Concave Functions
1. Differentiable convex and concave functions
2. Differentiable strictly convex and concave functions
3. Twice-differentiable convex and concave functions
4. Twice-differentiable strictly convex and concave functions

Chapter 7. Optimality Criteria in Nonlinear Programming with Differentiability
1. The minimization problems and the Fritz John and Kuhn-Tucker stationary-point problems
2. Sufficient optimality criteria
3. Necessary optimality criteria
Chapter 8. Duality in Nonlinear Programming
1. Duality in nonlinear programming
2. Duality in quadratic programming
3. Duality in linear programming

Chapter 9. Generalizations of Convex Functions: Quasiconvex, Strictly Quasiconvex, and Pseudoconvex Functions
1. Quasiconvex and quasiconcave functions
2. Strictly quasiconvex and strictly quasiconcave functions
3. Pseudoconvex and pseudoconcave functions
4. Summary of properties and relations between quasiconvex, strictly quasiconvex, pseudoconvex, convex, and strictly convex functions
5. Warning
6. Problems

Chapter 10. Optimality and Duality for Generalized Convex and Concave Functions
1. Sufficient optimality criteria
2. Necessary optimality criteria
3. Duality

Chapter 11. Optimality and Duality in the Presence of Nonlinear Equality Constraints
1. Sufficient optimality criteria
2. "Minimum principle" necessary optimality criteria: X° not open
3. Fritz John and Kuhn-Tucker stationary-point necessary optimality criteria: X° open
4. Duality with nonlinear equality constraints

Appendix A. Vectors and Matrices
1. Vectors
2. Matrices

Appendix B. Résumé of Some Topological Properties of R^n
1. Open and closed sets
2. Sequences and bounds
3. Compact sets in R^n

Appendix C. Continuous and Semicontinuous Functions, Minima and Infima
1. Continuous and semicontinuous functions
2. Infimum (supremum) and minimum (maximum) of a set of real numbers
3. Infimum (supremum) and minimum (maximum) of a numerical function
4. Existence of a minimum and a maximum of a numerical function

Appendix D. Differentiable Functions, Mean-value and Implicit Function Theorems
1. Differentiable and twice-differentiable functions
2. Mean-value theorem and Taylor's theorem
3. Implicit function theorem

Bibliography
Name Index
Subject Index

Chapter One
The Nonlinear Programming Problem, Preliminary Concepts, and Notation

1. The nonlinear programming problem†

The nonlinear programming problem that will concern us has three fundamental ingredients: a finite number of real variables, a finite number of constraints which the variables must satisfy, and a function of the variables which must be minimized (or maximized). Mathematically speaking we can state the problem as follows: Find specific values $(\bar{x}_1, \ldots, \bar{x}_n)$, if they exist, of the variables $(x_1, \ldots, x_n)$ that will satisfy the inequality constraints

1   $g_i(x_1, \ldots, x_n) \le 0 \qquad i = 1, \ldots, m$

the equality constraints

2   $h_j(x_1, \ldots, x_n) = 0 \qquad j = 1, \ldots, k$

and minimize (or maximize) the objective function

3   $\theta(x_1, \ldots, x_n)$

over all values of $x_1, \ldots, x_n$ satisfying 1 and 2. Here, $g_i$, $h_j$, and $\theta$ are numerical functions‡ of the variables $x_1, \ldots, x_n$ which are defined for all finite values of the variables.

† In order to introduce the problem in the first section of the book, some undefined terms (function, real variable, constraints, etc.) must be interpreted intuitively for the time being. The problem will be stated rigorously at the end of this chapter (see 1.6.9 to 1.6.12).
‡ The concept of a numerical function will be defined precisely in Sec. 1.5. For the present, by a numerical function of $x_1, \ldots, x_n$ we mean a correspondence which assigns a single real number for each n-tuple of real values that the variables $x_1, \ldots, x_n$ assume.
The fundamental difference between this problem and that of the classical constrained minimization problem of the ordinary calculus [Courant 47, Fleming 65]† is the presence of the inequalities 1. As such, inequalities will play a crucial role in nonlinear programming and will be studied in some detail.

As an example of the above problem consider the case shown in Fig. 1.1.1. Here we have n = 2 (two variables $x_1, x_2$), m = 3 (three inequality constraints), and k = 1 (one equality constraint). Each curve in Fig. 1.1.1 is obtained by setting some numerical function equal to a real number, such as $\theta(x_1,x_2) = 5$ or $g_2(x_1,x_2) = 0$. The little arrows on the curves $g_i(x_1,x_2) = 0$ indicate the side in the direction of which $g_i$ increases, and hence all $(x_1,x_2)$ must lie on the opposite side of these curves if they are to satisfy 1. All such $(x_1,x_2)$ lie in the shaded area of Fig. 1.1.1. To satisfy 2, $(x_1,x_2)$ must lie on the curve $h_1(x_1,x_2) = 0$. The solution to the problem is $(\bar{x}_1,\bar{x}_2)$. This is the point on the curve $h_1(x_1,x_2) = 0$ at which $\theta$ assumes its lowest value over the set of all $(x_1,x_2)$ satisfying $g_i(x_1,x_2) \le 0$, $i = 1, 2, 3$.

† This refers to the works by Courant, written in 1947, and by Fleming, written in 1965, as listed in the Bibliography at the back of the book. This system of references will be used throughout the book with one exception: [Gordan 73] refers to Gordan's paper written in 1873.

[Fig. 1.1.1 A typical nonlinear programming problem in two variables $(x_1,x_2)$.]

In more complicated situations where n, m, and k may be large, it will not be easy to solve the above problem. We shall then be concerned with obtaining necessary and/or sufficient conditions that a point $(\bar{x}_1, \ldots, \bar{x}_n)$ must satisfy in order for it to solve the nonlinear programming problem 1 to 3. These optimality conditions form the crux of nonlinear programming.

In dealing with problems of the above type we shall confine ourselves to minimization problems only. Maximization problems can be easily converted to minimization problems by employing the identity

maximum $\theta(x_1, \ldots, x_n) = -$minimum $[-\theta(x_1, \ldots, x_n)]$

4 Problem
Solve graphically, as indicated in Fig. 1.1.1, the following nonlinear programming problem: minimize $(-x_1 - x_2)$ subject to
$(x_1)^2 - x_2 \le 0$
$(x_1)^2 + (x_2)^2 = 1$
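A numerical check of Problem 4 is easy with off-the-shelf software. The sketch below is illustrative only (the solver choice, starting point, and tolerances are ours, not the book's); it uses SciPy's general-purpose SLSQP method, whose inequality convention is fun(x) ≥ 0, hence the sign flip on the constraint $g$.

```python
# Hypothetical numerical check of Problem 1.1.4 (not part of the original text):
# minimize -(x1 + x2) subject to (x1)^2 - x2 <= 0 and (x1)^2 + (x2)^2 = 1.
import numpy as np
from scipy.optimize import minimize

objective = lambda x: -(x[0] + x[1])
constraints = [
    {"type": "ineq", "fun": lambda x: x[1] - x[0] ** 2},           # g(x) <= 0 as -g(x) >= 0
    {"type": "eq",   "fun": lambda x: x[0] ** 2 + x[1] ** 2 - 1},  # h(x) = 0
]
result = minimize(objective, x0=np.array([0.5, 0.5]), method="SLSQP",
                  constraints=constraints)
print(result.x)  # approximately (sqrt(2)/2, sqrt(2)/2)
```

The reported minimizer lies on the circle and satisfies $x_2 \ge (x_1)^2$, agreeing with the graphical solution.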
2. Sets and symbols

We shall use some symbols and elementary concepts from set theory [Anderson-Hall 63, Hamilton-Landin 61, Berge 63]. In particular a set $\Gamma$ is a collection of objects of any kind which are by definition elements or points of $\Gamma$. For example if we let R (the reals or the real line) denote the set of all real numbers, then 7 is an element or point of R. We use the symbol $\in$ to denote the fact that an element belongs to a set. For example we write $7 \in R$. For simplicity we also write sometimes $5, 7 \in R$ instead of $5 \in R$ and $7 \in R$.

If $\Gamma$ and $\Delta$ are two sets, we say that $\Gamma$ is contained in $\Delta$, $\Gamma$ is in $\Delta$, $\Gamma$ is a subset of $\Delta$, or $\Delta$ contains $\Gamma$, if each element of $\Gamma$ is also an element of $\Delta$, and we write $\Gamma \subset \Delta$ or $\Delta \supset \Gamma$. If $\Gamma \subset \Delta$ and $\Delta \subset \Gamma$ we write $\Gamma = \Delta$. A slash across a symbol denotes its negation. Thus $x \notin \Gamma$ and $\Gamma \not\subset \Delta$ denote respectively that x is not an element of $\Gamma$ and that $\Gamma$ is not a subset of $\Delta$. The empty set is the set which contains no elements and is denoted by $\emptyset$. We denote a set sometimes by $\{x, y, z\}$, if the set is formed by the elements x, y, z. Sometimes a set is characterized by a property that its elements must have, in which case we write

$\{x \mid x \text{ satisfying property } P\}$

For example the set of all nonnegative real numbers can be written as

$\{x \mid x \in R, \, x \ge 0\}$

The set of elements belonging to either of two sets $\Gamma$ or $\Delta$ is called the union of the sets $\Gamma$ and $\Delta$ and is denoted by $\Gamma \cup \Delta$. We have then

$\Gamma \cup \Delta = \{x \mid x \in \Gamma \text{ or } x \in \Delta\}$

The set of elements belonging to at least one of the sets of the (finite or infinite) family of sets $(\Gamma_i)_{i \in I}$ is called the union of the family and is denoted by $\bigcup_{i \in I} \Gamma_i$. Then

$\bigcup_{i \in I} \Gamma_i = \{x \mid x \in \Gamma_i \text{ for some } i \in I\}$

The set of elements belonging to both sets $\Gamma$ and $\Delta$ is called the intersection of the sets $\Gamma$ and $\Delta$ and is denoted by $\Gamma \cap \Delta$. We then have

$\Gamma \cap \Delta = \{x \mid x \in \Gamma \text{ and } x \in \Delta\}$

The set of elements belonging to all the sets of the (finite or infinite) family of sets $(\Gamma_i)_{i \in I}$ is called the intersection of the family and is denoted by $\bigcap_{i \in I} \Gamma_i$. Then

$\bigcap_{i \in I} \Gamma_i = \{x \mid x \in \Gamma_i \text{ for each } i \in I\}$

Two sets $\Gamma$ and $\Delta$ are disjoint if they do not intersect, that is, if $\Gamma \cap \Delta = \emptyset$. The difference of the sets $\Delta$ and $\Gamma$ is the set of those elements of $\Delta$ not contained in $\Gamma$ and is denoted by $\Delta \sim \Gamma$. We have then

$\Delta \sim \Gamma = \{x \mid x \in \Delta, \, x \notin \Gamma\}$

In the above it is not assumed in general that $\Gamma \subset \Delta$. If however $\Gamma \subset \Delta$, then $\Delta \sim \Gamma$ is called the complement of $\Gamma$ relative to $\Delta$.

The product of two sets $\Gamma$ and $\Delta$, denoted by $\Gamma \times \Delta$, is defined as the set of ordered pairs $(x,y)$ of which $x \in \Gamma$ and $y \in \Delta$. We have then

$\Gamma \times \Delta = \{(x,y) \mid x \in \Gamma, \, y \in \Delta\}$

The product of n sets $\Gamma_1, \ldots, \Gamma_n$, denoted by $\Gamma_1 \times \Gamma_2 \times \cdots \times \Gamma_n$, is defined as the set of ordered n-tuples $(x_1, \ldots, x_n)$ of which $x_1 \in \Gamma_1, \ldots, x_n \in \Gamma_n$. We have then

$\Gamma_1 \times \Gamma_2 \times \cdots \times \Gamma_n = \{(x_1, \ldots, x_n) \mid x_1 \in \Gamma_1, \ldots, x_n \in \Gamma_n\}$

If $\Gamma_1 = \Gamma_2 = \cdots = \Gamma_n = \Gamma$, then we write $\Gamma^n = \Gamma \times \Gamma \times \cdots \times \Gamma$. If we let

$\Gamma = \{x \mid x \in R, \, 1 \le x \le 3\} \qquad \Delta = \{x \mid x \in R, \, 1 \le x \le 2\}$

then

$\Gamma \times \Delta = \{(x,y) \mid x \in R, \, y \in R, \, 1 \le x \le 3, \, 1 \le y \le 2\}$

Figure 1.2.1 depicts the set $\Gamma \times \Delta$. The set $R^2 = R \times R$, which can be represented by points on a plane, is called the Euclidean plane.

[Fig. 1.2.1 The product $\Gamma \times \Delta$ of the sets $\Gamma$ and $\Delta$.]

The following symbols will also be used:
$(\forall x)$ reads "for each x"
$(\exists x)$ reads "there exists an x such that"
$\Rightarrow$ reads "implies"
$\Leftarrow$ reads "is implied by"
$\Leftrightarrow$ reads "is equivalent to"

(A slash [/] across any one of the last three symbols denotes its negation.) For example the statement "for each x there exists a y such that $\theta(x,y) = 1$" can be written as

$(\forall x)(\exists y)\colon \theta(x,y) = 1$

The negation of the above statement can be automatically written as

$(\exists x)(\forall y)\colon \theta(x,y) \ne 1$

Frequently we shall refer to a certain relationship such as an equation or an inequality by a number or Roman numeral such as I or II. The notation I $\Rightarrow$ II means relationship I implies relationship II. An overbar on I or II ($\bar{I}$ or $\overline{II}$) denotes the negation of the relationship referred to by that numeral. Obviously then the statement that I $\Rightarrow$ II is logically equivalent to $\overline{II} \Rightarrow \bar{I}$. Thus

$(\text{I} \Rightarrow \text{II}) \Leftrightarrow (\overline{\text{II}} \Rightarrow \bar{\text{I}})$
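For finite sets, the operations of this section map directly onto Python's built-in set type; the following minimal illustration (the sample sets are our own) mirrors the union, intersection, difference, and product definitions above.

```python
# Illustrative only: the set operations of Sec. 1.2 on small finite sets.
from itertools import product

gamma = {1, 2, 3}
delta = {2, 3, 4}

print(gamma | delta)               # union:        {1, 2, 3, 4}
print(gamma & delta)               # intersection: {2, 3}
print(delta - gamma)               # difference delta ~ gamma: {4}
print(set(product(gamma, delta)))  # product: the set of ordered pairs (x, y)
```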
3. Vectors

1 n-vector
An n-vector or n-dimensional vector x, for any positive integer n, is an n-tuple $(x_1, \ldots, x_n)$ of real numbers. The real number $x_i$ is referred to as the ith component or element of the vector x.

2 $R^n$
The n-dimensional (real) Euclidean space $R^n$, for any positive integer n, is the set of all n-vectors. The notation $x \in R^n$ means that x is an element of $R^n$, and hence x is an n-vector. Frequently we shall also refer to x as a point in $R^n$. $R^1$, or simply R, is then the Euclidean line (the set of all real numbers), $R^2$ is the Euclidean plane (the set of all ordered pairs of real numbers), and $R^n = R \times R \times \cdots \times R$ (n times).

3 Vector addition and multiplication by a real number
Let $x, y \in R^n$ and $\alpha \in R$. The sum $x + y$ is defined by

$x + y = (x_1 + y_1, \ldots, x_n + y_n)$

and the multiplication by a real number $\alpha x$ is defined by

$\alpha x = (\alpha x_1, \ldots, \alpha x_n)$

4 Linear dependence and independence
The vectors $x^1, \ldots, x^m \in R^n$ are said to be linearly independent if

$\lambda^1 x^1 + \cdots + \lambda^m x^m = 0, \quad \lambda^1, \ldots, \lambda^m \in R \;\Rightarrow\; \lambda^1 = \cdots = \lambda^m = 0$

otherwise they are linearly dependent. (Here and elsewhere 0 denotes the real number zero or a vector each element of which is zero.)

5 Linear combination
The vector $x \in R^n$ is a linear combination of $x^1, \ldots, x^m \in R^n$ if

$x = \lambda^1 x^1 + \cdots + \lambda^m x^m \quad \text{for some } \lambda^1, \ldots, \lambda^m \in R$

and it is a nonnegative linear combination of $x^1, \ldots, x^m$ if in addition to the above equality $\lambda^1, \ldots, \lambda^m \ge 0$. The numbers $\lambda^1, \ldots, \lambda^m$ are called weights.

The above concepts involving vector addition and multiplication by a scalar define the vector space structure of $R^n$. They are not enough however to define the concept of distance. For that purpose we introduce the scalar product of two vectors.

6 Scalar product
The scalar product $xy$ of two vectors $x, y \in R^n$ is defined by

$xy = x_1 y_1 + \cdots + x_n y_n$

7 Norm of a vector
The norm $\|x\|$ of a vector $x \in R^n$ is defined by

$\|x\| = +(xx)^{1/2} = +[(x_1)^2 + \cdots + (x_n)^2]^{1/2}$

8 Cauchy-Schwarz inequality
Let $x, y \in R^n$. Then

$|xy| \le \|x\| \cdot \|y\|$

where $|xy|$ is the absolute value of the real number $xy$.

PROOF Let $x, y \in R^n$ be fixed. For any $\alpha \in R$

$(\alpha x + y)(\alpha x + y) = xx(\alpha)^2 + 2xy\alpha + yy \ge 0$

Hence the roots of the quadratic equation in $\alpha$

$xx(\alpha)^2 + 2xy\alpha + yy = 0$

cannot be distinct real numbers, and so

$(xy)^2 \le (xx)(yy)$

which implies the Cauchy-Schwarz inequality. ∎

9 Distance between two points
Let $x, y \in R^n$. The nonnegative number $\delta(x,y) = \|x - y\|$ is called the distance between the two points x and y in $R^n$.

10 Problem
Establish the fact that $R^n$ is a metric space by showing that $\delta(x,y)$ satisfies the following conditions:

$\delta(x,y) \ge 0$
$\delta(x,y) = 0 \Leftrightarrow x = y$
$\delta(x,y) = \delta(y,x)$
$\delta(x,z) \le \delta(x,y) + \delta(y,z)$ (triangle inequality)

(Hint: Use the Cauchy-Schwarz inequality to establish the triangle inequality.)

11 Angle between two vectors
Let x and y be two nonzero vectors in $R^n$. The angle $\psi$ between x and y is defined by the formula

$\cos \psi = \dfrac{xy}{\|x\| \cdot \|y\|} \qquad 0 \le \psi \le \pi$

This definition of angle agrees for n = 2, 3 with the one in analytic geometry. The nonzero vectors x and y are orthogonal if $xy = 0$ ($\psi = \pi/2$); form an acute angle with each other if $xy \ge 0$ ($0 \le \psi \le \pi/2$), a strict acute angle if $xy > 0$ ($0 \le \psi < \pi/2$), an obtuse angle if $xy \le 0$ ($\pi/2 \le \psi \le \pi$), and a strict obtuse angle if $xy < 0$ ($\pi/2 < \psi \le \pi$).
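The Cauchy-Schwarz inequality 8, the distance 9, and the angle formula 11 are easy to check numerically. The sketch below, with two arbitrary vectors of our choosing, assumes NumPy:

```python
# Illustrative check of the Cauchy-Schwarz inequality, the distance, and
# the angle formula of Sec. 1.3 for two arbitrary vectors in R^3.
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([4.0, 0.0, -1.0])

assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y)  # |xy| <= ||x|| ||y||

delta = np.linalg.norm(x - y)                                     # delta(x, y)
psi = np.arccos(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))  # 0 <= psi <= pi
print(delta, np.degrees(psi))
```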
4. Matrices

Although our main concern is nonlinear problems, linear systems of the following type will be encountered very frequently:

1   $A_{11}x_1 + \cdots + A_{1n}x_n = b_1, \;\; \ldots, \;\; A_{m1}x_1 + \cdots + A_{mn}x_n = b_m$

where $A_{ij}$ and $b_i$, $i = 1, \ldots, m$, $j = 1, \ldots, n$, are given real numbers. We can abbreviate the above system by using the concepts of the previous section. If we let $A_i$ denote an n-vector whose n components are $A_{ij}$, $j = 1, \ldots, n$, and if we let $x \in R^n$, then the above system is equivalent to

2   $A_i x = b_i \qquad i = 1, \ldots, m$

In 2 we interpret $A_i x$ as the scalar product 1.3.6 of $A_i$ and x. If we further let $Ax$ denote an m-vector whose m components are $A_i x$, $i = 1, \ldots, m$, and b an m-vector whose m components are $b_i$, then the equivalent systems 1 and 2 can be further simplified to

3   $Ax = b$

In order to be consistent with ordinary matrix theory notation, we define the m × n matrix A as follows

4   $A = \begin{pmatrix} A_{11} & \cdots & A_{1n} \\ \vdots & & \vdots \\ A_{m1} & \cdots & A_{mn} \end{pmatrix}$

5   The ith row of the matrix A will be denoted by $A_i$ and will be an n-vector. Hence

$A_i = (A_{i1}, A_{i2}, \ldots, A_{in}) \qquad i = 1, \ldots, m$

6   The jth column of the matrix A will be denoted by $A_{\cdot j}$ and will be an m-vector. Hence

$A_{\cdot j} = \begin{pmatrix} A_{1j} \\ A_{2j} \\ \vdots \\ A_{mj} \end{pmatrix} \qquad j = 1, \ldots, n$

7   The transpose of the matrix A is denoted by $A'$ and is defined by

$A' = \begin{pmatrix} A_{11} & \cdots & A_{m1} \\ \vdots & & \vdots \\ A_{1n} & \cdots & A_{mn} \end{pmatrix}$

Obviously the ith row of A is equal to the ith column of $A'$, and the jth column of A is equal to the jth row of $A'$. Hence

8   $A_i = (A'_{\cdot i})' = A'_{\cdot i}$
9   $A_{\cdot j} = (A'_j)' = A'_j$

The last equalities of 8 and 9 are to be taken as the definitions of $A'_{\cdot i}$ and $A'_j$ respectively. Since $A_{ij}$ is the real number in the ith row of the jth column of A, then if we define $A'_{ji}$ as the real number in the jth row of the ith column of $A'$, we have

10   $A_{ij} = A'_{ji}$

The equivalent systems 1, 2, and 3 can be written in still another form as follows

11   $\sum_{j=1}^{n} A_{\cdot j} x_j = b$

Here $A_{\cdot j}$ and b are vectors in $R^m$ and the $x_j$ are real numbers. The representation 2 can be interpreted as a problem in $R^n$ whereas 11 can be interpreted as a problem in $R^m$. In 2 we are required to find an $x \in R^n$ that makes the appropriate scalar products $b_i$ (or angles, see 1.3.11) with the n-vectors $A_i$, $i = 1, \ldots, m$. In 11 we are given the n + 1 vectors in $R^m$, $A_{\cdot j}$, $j = 1, \ldots, n$, and b, and we are required to find n weights $x_1, \ldots, x_n$ such that b is a linear combination of the vectors $A_{\cdot j}$. These two dual representations of the same linear system will be used in interpreting some of the important theorems of the alternative of the next chapter.

The m × n matrix A of 4 can generate another linear system $yA$, defined as follows

12   $yA = (yA_{\cdot 1}, \ldots, yA_{\cdot n})$

where $y \in R^m$. Hence $yA$ is an n-dimensional vector whose jth component is given by

13   $(yA)_j = yA_{\cdot j} \qquad j = 1, \ldots, n$

In general we shall follow the convention of using upper case Latin letters to denote matrices. If A is an m × n matrix, and if we let

$I \subset M = \{1, 2, \ldots, m\} \qquad J \subset N = \{1, 2, \ldots, n\}$

then we define the following submatrices of A (which are matrices with rows and columns extracted respectively from the rows and columns of A)

14   $A_I = (A_i \mid i \in I)$
15   $A_{\cdot J} = (A_{\cdot j} \mid j \in J)$
16   $A_{Ij}$ = jth column of $A_I$
17   $A_{iJ}$ = ith row of $A_{\cdot J}$
18   $A_{IJ} = (A_{iJ})_{i \in I} = (A_{Ij})_{j \in J}$

It follows then that $A_M = A_{\cdot N} = A$, $A_{MJ} = A_{\cdot J}$, and $A_{IN} = A_I$.

[Fig. 1.4.1 An m × n matrix and its submatrices.]

19 Nonvacuous matrix
A matrix A is said to be nonvacuous if it contains at least one element $A_{ij}$. An m × n matrix A with $m \ge 1$ and $n \ge 1$ is nonvacuous even if all its elements $A_{ij} = 0$.
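In NumPy terms (an illustration of ours, not the book's), the row, column, transpose, and submatrix notation of this section corresponds to array indexing with index sets:

```python
# Illustrative NumPy translation of the submatrix notation of Sec. 1.4.
import numpy as np

A = np.arange(12).reshape(3, 4)      # a 3 x 4 matrix
I = [0, 2]                           # a subset of the row indices M
J = [1, 3]                           # a subset of the column indices N

A_I  = A[I, :]                       # A_I:  rows indexed by I
A_J  = A[:, J]                       # A_.J: columns indexed by J
A_IJ = A[np.ix_(I, J)]               # A_IJ: rows I and columns J
At   = A.T                           # the transpose A'
assert np.array_equal(At[J][:, I], A_IJ.T)  # A_ij = A'_ji entrywise
```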
5. Mappings and functions

1 Mapping
Let X and Y be two sets. A mapping $\Gamma$ from X into Y is a correspondence which associates with every element x of X a subset $\Gamma(x)$ of Y. For each $x \in X$, the set $\Gamma(x)$ is called the image of x. The domain $X^*$ of $\Gamma$ is the subset of points of X for which the image $\Gamma(x)$ is nonempty, that is,

$X^* = \{x \mid x \in X, \, \Gamma(x) \ne \emptyset\}$

The range $\Gamma(X^*)$ of $\Gamma$ is the union of the images of all the points of $X^*$, that is,

$\Gamma(X^*) = \bigcup_{x \in X^*} \Gamma(x)$

2 EXAMPLE Let $X = Y = R$, $\Gamma(x) = \{y \mid \cos y = x\}$. Then $X^* = \{x \mid x \in R, \, -1 \le x \le 1\}$, $\Gamma(X^*) = Y = R$.

3 Function
A function f is a single-valued mapping from a set X into a set Y. That is, for each $x \in X$, the image set $f(x)$ consists of a single element of Y. The domain of f is X, and we say that f is defined on X. The range of f is $f(X) = \bigcup_{x \in X} f(x)$. (For convenience we will write the image of a function not as a set but as the unique element of that set.)

4 Numerical function
A numerical function $\theta$ is a function from a set X into R. In other words a numerical function is a correspondence which associates a real number with each element x of X.

5 EXAMPLES If X = R, then $\theta$ is the familiar real single-valued function of a real variable, such as $\theta(x) = \sin x$. If X is the set of positive integers, then $\theta$ assigns a real number to each positive integer, for example $\theta(x) = 1/x!$. If $X = R^n$, then $\theta$ is the real single-valued function of n variables.

6 Vector function
An m-dimensional vector function f is a function from a set X into $R^m$. In other words a vector function is a correspondence which associates a vector from $R^m$ with each element x of X. The m components of the vector $f(x)$ are denoted by $f_1(x), \ldots, f_m(x)$. Each $f_i$ is a numerical function on X. A vector function f has a certain property (for example continuity) whenever each of its components $f_i$ has that property.

7 EXAMPLE If $X = R^n$, then f associates a point of $R^m$ with each point of $R^n$. The m components $f_i$, $i = 1, \ldots, m$, of f are numerical functions on $R^n$.

8 Linear vector functions on $R^n$
An m-dimensional vector function f defined on $R^n$ is said to be linear if

$f(x) = Ax + b$

where A is some fixed m × n matrix and b is some fixed vector in $R^m$. It follows that if f is a linear function on $R^n$ then

$f(x^1 + x^2) = f(x^1) + f(x^2) - f(0) \qquad \text{for } x^1, x^2 \in R^n$
$f(\lambda x) = \lambda f(x) + (1 - \lambda) f(0) \qquad \text{for } \lambda \in R, \, x \in R^n$

(Conversely, the last two relations could be used to define a linear vector function on $R^n$, from which it could be shown that $f(x) = Ax + b$ [Berge 63, p. 159].) If m = 1 in the above, then we have a numerical linear function $\theta$ on $R^n$ and

$\theta(x) = cx + \gamma$

where c is a fixed vector in $R^n$ and $\gamma$ is a fixed real number. Inequalities or equalities involving linear vector functions (or linear numerical functions) will naturally be called linear inequalities or equalities.
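The two identities in 8 characterize linear (affine) vector functions and can be verified numerically for any fixed A and b. A minimal sketch, with arbitrary data of our choosing, assuming NumPy:

```python
# Illustrative check of the two identities satisfied by a linear vector
# function f(x) = Ax + b (Sec. 1.5.8).
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])   # a fixed 3 x 2 matrix
b = np.array([1.0, -1.0, 0.5])                        # a fixed vector in R^3
f = lambda x: A @ x + b

x1, x2, lam = np.array([1.0, 2.0]), np.array([-3.0, 0.5]), 0.7
zero = np.zeros(2)

assert np.allclose(f(x1 + x2), f(x1) + f(x2) - f(zero))
assert np.allclose(f(lam * x1), lam * f(x1) + (1 - lam) * f(zero))
```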
6. Notation

1 Vectors and real numbers
In general we shall follow the convention that small Latin letters will denote vectors such as a, b, c, x, y, z, or vector functions such as f, g, h. Exceptions will be the letters i, j, k, m, n, and sometimes others, which will denote integers. Small Greek letters will denote a real number (a point in R) such as $\alpha, \beta, \gamma, \xi, \eta, \zeta$, or a numerical function such as $\theta, \phi, \psi$.

2 Subscripts
A small Latin letter with an integer subscript or a small Latin letter subscript will denote a component of a vector, in general, and on occasion will denote a vector. For example, if $x \in R^n$, then $x_3$ and $x_i$ denote respectively the third and ith components of x. On the other hand we will have occasion to let $x_1 \in R^n$, $x_2 \in R^n$, etc., in which case this intent will be made explicit. Small Greek letters with integer or Latin subscripts will occasionally be used to denote real numbers such as $\lambda_1, \lambda_k$. If $x \in R^n$, $K \subset N = \{1, \ldots, n\}$, and K contains k elements each of which is distinct, then $(x_i)_{i \in K}$ is a vector in $R^k$ with the components $\{x_i \mid i \in K\}$ and is denoted by $x_K$. Thus a small Latin letter with a capital Latin letter subscript denotes a vector in a space of smaller or equal dimension to that of the space of the unsubscripted vector.

3 Superscripts
A small Latin or Greek letter with a superscript or an elevated symbol will denote a fixed vector or real number, for example $x^1, x^2, x^i, \bar{x}, \hat{x}, \tilde{x}, \bar{\xi}, \hat{\xi}$, etc. Exponentiation on the other hand will be distinguished by parentheses enclosing the quantity raised to a power, for example $(x)^2$.

4 Zero
The number 0 will denote either the real number zero or a vector in $R^n$ all components of which are zero.

5 Matrices
Matrices will be denoted by capital Latin letters as described in detail in a previous section, Sec. 1.4.

6 Sets
Sets will always be denoted by capital Greek or Latin letters such as $\Gamma, \Delta, \Omega, R, I, X, Y$. Capital letters with subscripts, such as $\Gamma_1, \Gamma_2, \Gamma_i$, and capital letters with elevated symbols, such as $\Gamma^*, X^\circ$, will also denote sets. (See also Sec. 1.2.)

7 Ordering relations
The following convention for equalities and inequalities will be used. If $x, y \in R^n$, then

$x = y \Leftrightarrow x_i = y_i, \quad i = 1, \ldots, n$
$x \geqq y \Leftrightarrow x_i \ge y_i, \quad i = 1, \ldots, n$
$x \ge y \Leftrightarrow x \geqq y \text{ and } x \ne y$
$x > y \Leftrightarrow x_i > y_i, \quad i = 1, \ldots, n$

If $x \geqq 0$, x is said to be nonnegative; if $x \ge 0$ then x is said to be semipositive; and if $x > 0$ then x is said to be positive. The relations $=, \geqq, \ge, >$ defined above are called ordering relations (in $R^n$).

8 The nonlinear programming problem
By using the notation introduced above, the nonlinear programming problem 1.1.1 to 1.1.3 can be rewritten in a slightly more general form as follows. Let $X^\circ \subset R^n$, let g, h, and $\theta$ be respectively an m-dimensional vector function, a k-dimensional vector function, and a numerical function, all defined on $X^\circ$. Then the problem becomes this: Find an $\bar{x}$, if such exists, such that

9   $\theta(\bar{x}) = \min_{x \in X} \theta(x), \qquad \bar{x} \in X = \{x \mid x \in X^\circ, \, g(x) \leqq 0, \, h(x) = 0\}$

The set X is called the feasible region, $\bar{x}$ the minimum solution, and $\theta(\bar{x})$ the minimum. All points x in the feasible region X are referred to as feasible points or simply as feasible. Another way of writing the same problem, which is quite common in the literature, is the following:

10   $\min_{x \in X^\circ} \theta(x)$
11   subject to $g(x) \leqq 0$
12   $h(x) = 0$

We favor the more precise and brief designation 9 of the problem instead of 10 to 12. Notice that if we let $X^\circ = R^n$ in the above problem, then we obtain the nonlinear programming problem 1.1.1 to 1.1.3. If $X^\circ = R^n$ and $\theta$, g, and h are all linear functions on $R^n$, then problem 9 becomes a linear programming problem: Find an $\bar{x}$, if such exists, such that

13   $-b\bar{x} = \min_{x \in X} (-bx), \qquad \bar{x} \in X = \{x \mid x \in R^n, \, Ax \leqq c, \, Bx = d\}$

where b, c, and d are given fixed vectors in $R^n$, $R^m$, and $R^k$ respectively, and A and B are given fixed m × n and k × n matrices respectively. There exists a vast literature on the subject of linear programming [Dantzig 63, Gass 64, Hadley 62, Simmonard 66]. It should be remarked that problem 13 is equivalent to finding an $\bar{x}$ such that

14   $b\bar{x} = \max_{x \in X} bx, \qquad \bar{x} \in X = \{x \mid x \in R^n, \, Ax \leqq c, \, Bx = d\}$

When B and d are absent from this formulation, 14 becomes the standard dual form of the linear programming problem [Simmonard 66, p. 95].
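Problem 13 is precisely the form accepted by standard LP solvers. A minimal sketch with SciPy's linprog (the data are arbitrary choices of ours; linprog's default nonnegativity bounds are released so that x ranges over all of $R^n$ as in 13):

```python
# Illustrative instance of the linear program 13: minimize -bx subject to
# Ax <= c and Bx = d, with x free. The data below are arbitrary.
import numpy as np
from scipy.optimize import linprog

b = np.array([1.0, 2.0])
A, c = np.array([[1.0, 1.0], [1.0, -1.0]]), np.array([4.0, 2.0])
B, d = np.array([[1.0, 0.0]]), np.array([1.0])

res = linprog(-b, A_ub=A, b_ub=c, A_eq=B, b_eq=d,
              bounds=[(None, None)] * 2)    # x unrestricted in sign
print(res.x, -res.fun)  # the minimizer x-bar and the maximum of bx (cf. 14)
```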
The most famous theorem of this type is perhaps Farkas’ theorem [Farkas 02, Tucker 56, Gale 60]. Farkas’ theorem For each fixed p X n matriz A and each fixed vector b in R*, either I Az s0 bz > 0 has a solution x € R" or Linear Inequalities and Theorems of the Alternative a1 Ar As {x| Ax £ O} Fig. 2.1.1 Geometric interpretation of Farkas’ theorem: II’ has solution, I’ has no solution. II Ay=b y = 0 has a solution y € R? but never both. We shall postpone a proof of Farkas’ theorem until after we have given a geometric interpretation of it and applied it to get the necessary optimality conditions of linear programming, To give a geometric interpretation of Farkas’ theorem, we rewrite I and II as follows Vo Ax S0,j=1,...,p,b2>0 Pp Pp W YAw= ¥ Av = hy 205=1,...,7 gal j=l where A’, denotes the jth column of A’ and A; the jth row of A (see Sec. 7.4). System II’ requires that the vector b be a nonnegative linear combination of the vectors A; to Ap. System I’ requires that we find a vector z © R* that makes an obtuse angle (27/2) with the vectors Ai to A, and a strictly acute angle ( 0} Fig. 2.1.2 Geometric interpre- {+l Ax £0} tation of Farkas’ theorem: I’ has so- lution, Il’ has no solution. ar DRA Pw & 22a Nonlinear Programming shows a simple case with n = 2, p = 8, in which II’ has a solution, and hence by Farkas’ theorem I’ cannot have a solution. Figure 2.1.2 shows a case with n = 2, p = 2, in which II’ has no solution, and hence by Farkas’ theorem I’ must have a solution. 2. The optimality criteria of linear programming: An application of Farkas’ theorem As a typical example of the use and power of the theorems of the alterna- tive we shall show how Farkas’ theorem can be employed to derive neces- sary optimality conditions for the following linear programming problem: Find an @, if it exists, such that —bz = min (—br) EX = {x|xr€ R", Ar Sc} 2EX where 6 and ¢ are given fixed vectors in R" and R™ respectively, and A is a given fixed m X n matrix. Optimality criteria of linear programming (Necessity) Let 2 be a solution of the linear programming problem 1. Then there exists a i © R” such that (£,u) satisfy Aise (dual feasibility) + a) . oo. ae > (primal feasibility) t bf = ca (complementarity) t (Sufficiency) If some = € R" and some u € R™ satisfy 3 to 6, then = solves 1, PROOF (Necessity) Let Zsolvei. Define the index sets P, Q, and M as follows: PUQ=M=([l,... ,m} P = {i| Ai = cs} Q = {t| Ait < 3 and assume that P and Q contain p and q elements, respectively. Then (see 1.4.16) + These are standard terms of linear programming {Dantzig 63], which differ from the terminology of nonlinear programming (see Chap. 8). The complementarity condition usually refers to an equivalent form of 6: i(AZ — ¢) = 0. 18 10 11 12 13 Linear Inequalities and Theorems of the Alternative aa Apé = cp Agt < cg If P = @, it follows that Az—c < —8e for some real number 6 > 0, where ¢ is an m-vector of ones. Then for each x € R*, we can find a real number « > 0 such that A(é+az)—c< —be+aAr 50 and hence + ax © X. Since Z is the minimum solution of 1, we have that for each z € R" there exists an a > 0 such that —bz S$ —b(z + az) Hence br 50 for each x € R" which implies that b = 0.t By taking 7 = 0 € R*, the relations 3 to 6 are satisfied because b = 0. If P ~ @, then we assert that the system Apz £0 bx > 0 has no solution zs € R*. For if 9 did have a solution z, say, then ar would also be a solution of 9 for each a > 0. Now consider the point &+ az, where z is a solution of 9anda >0. 
2. The optimality criteria of linear programming: An application of Farkas' theorem

As a typical example of the use and power of the theorems of the alternative we shall show how Farkas' theorem can be employed to derive necessary optimality conditions for the following linear programming problem:

1   Find an $\bar{x}$, if it exists, such that
$-b\bar{x} = \min_{x \in X} (-bx), \qquad \bar{x} \in X = \{x \mid x \in R^n, \, Ax \leqq c\}$
where b and c are given fixed vectors in $R^n$ and $R^m$ respectively, and A is a given fixed m × n matrix.

2 Optimality criteria of linear programming
(Necessity) Let $\bar{x}$ be a solution of the linear programming problem 1. Then there exists a $\bar{u} \in R^m$ such that $(\bar{x},\bar{u})$ satisfy

3   $A\bar{x} \leqq c$ (primal feasibility)†
4   $A'\bar{u} = b$ (dual feasibility)†
5   $\bar{u} \geqq 0$
6   $b\bar{x} = c\bar{u}$ (complementarity)†

(Sufficiency) If some $\bar{x} \in R^n$ and some $\bar{u} \in R^m$ satisfy 3 to 6, then $\bar{x}$ solves 1.

† These are standard terms of linear programming [Dantzig 63], which differ from the terminology of nonlinear programming (see Chap. 8). The complementarity condition usually refers to an equivalent form of 6: $\bar{u}(A\bar{x} - c) = 0$.

PROOF (Necessity) Let $\bar{x}$ solve 1. Define the index sets P and Q as follows:

$P \cup Q = M = \{1, \ldots, m\}$
$P = \{i \mid A_i \bar{x} = c_i\} \qquad Q = \{i \mid A_i \bar{x} < c_i\}$

and assume that P and Q contain p and q elements respectively. Then (see Sec. 1.4)

7   $A_P \bar{x} = c_P$
8   $A_Q \bar{x} < c_Q$

If $P = \emptyset$, it follows that

$A\bar{x} - c < -\delta e$

for some real number $\delta > 0$, where e is an m-vector of ones. Then for each $x \in R^n$ we can find a real number $\alpha > 0$ such that

$A(\bar{x} + \alpha x) - c < -\delta e + \alpha Ax \leqq 0$

and hence $\bar{x} + \alpha x \in X$. Since $\bar{x}$ is the minimum solution of 1, we have that for each $x \in R^n$ there exists an $\alpha > 0$ such that

$-b\bar{x} \leqq -b(\bar{x} + \alpha x)$

Hence $bx \le 0$ for each $x \in R^n$, which implies that b = 0.‡ By taking $\bar{u} = 0 \in R^m$, the relations 3 to 6 are satisfied because b = 0.

‡ To see this, take for a fixed $i \in N = \{1, \ldots, n\}$ the vector x with $x_i = b_i$, $x_j = 0$ for $j \ne i$; then $bx \le 0$ implies that $(b_i)^2 \le 0$. Hence $b_i = 0$. Repeating this process for each $i \in N$, we get b = 0.

If $P \ne \emptyset$, then we assert that the system

9   $A_P x \leqq 0, \quad bx > 0$

has no solution $x \in R^n$. For if 9 did have a solution $\hat{x}$, say, then $\alpha\hat{x}$ would also be a solution of 9 for each $\alpha > 0$. Now consider the point $\bar{x} + \alpha\hat{x}$, where $\hat{x}$ is a solution of 9 and $\alpha > 0$. Then

10   $-b(\bar{x} + \alpha\hat{x}) < -b\bar{x}$ for $\alpha > 0$ (by 9)
11   $A_P(\bar{x} + \alpha\hat{x}) - c_P \leqq 0$ for $\alpha > 0$ (by 9, 7)
12   $A_Q(\bar{x} + \alpha\hat{x}) - c_Q \leqq -\delta e + \alpha A_Q\hat{x} \leqq 0$ for some $\alpha > 0$ (by 8)

where in 12 the first inequality follows by defining e as a q-vector of ones and

$-\delta = \max_{i \in Q} (A_i\bar{x} - c_i) < 0$

and the second inequality of 12 holds for some $\alpha > 0$ because $-\delta < 0$. But relations 10 to 12 imply that $\bar{x} + \alpha\hat{x} \in X$ and $-b(\bar{x} + \alpha\hat{x}) < -b\bar{x}$, which contradicts the assumption that $\bar{x}$ is a solution of 1. Hence 9 has no solution $x \in R^n$, and by Farkas' theorem 2.1.1 the system

13   $A_P' y = b, \quad y \geqq 0$

must have a solution $y \in R^p$. If we let $0 \in R^q$, we have then

14   $A_P' y + A_Q' 0 = b, \quad y \geqq 0$

and

15   $c_P y + c_Q 0 = yc_P = yA_P\bar{x} = b\bar{x}$ (by 7, 13)

By defining $\bar{u} \in R^m$ such that $\bar{u} = (u_P, u_Q)$, where $u_P = y \in R^p$, $u_Q = 0 \in R^q$, condition 14 becomes conditions 4 and 5, and condition 15 becomes condition 6. Condition 3 holds because $\bar{x} \in X$. This completes the necessity proof.

(Sufficiency) Let $\bar{u} \in R^m$ and $\bar{x} \in R^n$ satisfy 3 to 6, and let $x \in X$. By 3 we have that $\bar{x} \in X$. Now

$-bx - (-b\bar{x}) = -b(x - \bar{x}) = -\bar{u}Ax + c\bar{u}$ (by 4, 6) $= -\bar{u}(Ax - c) \geqq 0$ (by 5, $x \in X$)

Hence $\bar{x}$ solves 1. ∎

We remark that Farkas' theorem was used only in establishing the necessity of the conditions 3 to 6, whereas the sufficiency of the conditions 3 to 6 required only elementary arguments. This is the typical situation in establishing optimality criteria in mathematical programming in general. Necessity requires some fairly sophisticated mathematical tool, such as a theorem of the alternative or a separation theorem for convex sets (see Chap. 3), whereas sufficiency can be established merely by using elementary manipulation of inequalities.

The remaining sections of this chapter will be devoted to obtaining a rather comprehensive set of theorems of the alternative of the Farkas type. We shall follow Tucker [Tucker 56] in establishing these results. We begin first in the next section with some existence theorems for linear inequalities, from which all the theorems of the alternative follow.

3 Problem
Consider the following general form of the linear programming problem: Find $\bar{x} \in R^n$, $\bar{y} \in R^\ell$, if they exist, such that

$-b\bar{x} - a\bar{y} = \min_{(x,y) \in X} (-bx - ay)$
$(\bar{x},\bar{y}) \in X = \{(x,y) \mid x \in R^n, \, y \in R^\ell, \; Ax + Dy \leqq c, \; Bx + Ey = d, \; y \geqq 0\}$

where b, a, c, and d are given vectors in $R^n$, $R^\ell$, $R^m$, and $R^k$ respectively, and A, D, B, and E are given m × n, m × ℓ, k × n, and k × ℓ matrices respectively. Show that a necessary and sufficient condition for $(\bar{x},\bar{y})$ to solve the above problem is that $(\bar{x},\bar{y})$ and some $\bar{u} \in R^m$ and $\bar{v} \in R^k$ satisfy the following conditions:

$(\bar{x},\bar{y}) \in X$
$A'\bar{u} + B'\bar{v} = b$
$D'\bar{u} + E'\bar{v} \geqq a$
$\bar{u} \geqq 0$
$c\bar{u} + d\bar{v} = b\bar{x} + a\bar{y}$

(Hint: Replace the equality constraint $Bx + Ey = d$ by $Bx + Ey \leqq d$ and $-Bx - Ey \leqq -d$ and use Theorem 2.)
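Conditions 3 to 6 can be checked numerically for a small instance of problem 1 by solving the primal LP and, explicitly, the dual LP that conditions 4 to 6 describe (maximize $-cu$ subject to $A'u = b$, $u \geqq 0$). The instance below is an arbitrary choice of ours, assuming SciPy:

```python
# Illustrative check of conditions 3-6 for a small instance of problem 2.2.1:
# minimize -bx subject to Ax <= c.
import numpy as np
from scipy.optimize import linprog

b = np.array([1.0, 1.0])
A = np.array([[1.0, 2.0], [3.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
c = np.array([4.0, 6.0, 0.0, 0.0])

primal = linprog(-b, A_ub=A, b_ub=c, bounds=[(None, None)] * 2)   # x-bar
# Dual of "min -bx s.t. Ax <= c": max -cu s.t. A'u = b, u >= 0.
dual = linprog(c, A_eq=A.T, b_eq=b, bounds=[(0, None)] * len(c))  # u-bar

x_bar, u_bar = primal.x, dual.x
print(np.all(A @ x_bar <= c + 1e-8))                     # 3: primal feasibility
print(np.allclose(A.T @ u_bar, b), np.all(u_bar >= 0))   # 4, 5: dual feasibility
print(np.isclose(b @ x_bar, c @ u_bar))                  # 6: complementarity
```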
By applying the lemma to A, we get z,y satisfying Azz20 AYy=0 y20 Ayw+yi>O0 If Apyiz 2 0, we take g = (y,0). Then 4z20 Ay =0 g20 Aw+y>0 which extends the lemma to A. However, if A,,:¢ < 0, we apply the lemma a second time to the matrix B: By Art Apes ;- a : B, Ay + dodo where w=—4@_ 20 jH1,...,p (bya) 10 i 12 13 Linear Inequalities and Theorems of the Alternative a8 So By = Aye + djApuit = 0 or Br =0 This second use of the lemma yields v,u satisfying Bu 20 Bu=0 u2z0 Byt+tu>0 Let @ = (4, ¥ Aju). It follows from § and 7 that int uZ0 . P > 2 Ali = Aut Any Yo dws = Bu — YAS + Ags Yo Muy = 0 im im i (by 4, 7) Let wao- prt pit then Aw (Bi — MApaiw cI 8 "1 — >> % = cs 8 K — oe us i I A,w (Bp — ApApsi)w 0 Apyw Bw Bu ~ —** Br Bu -[-| Ant “(= 19 | 2° 0 where the last inequality follows from 7, the equality before from 6, the equality before from 10, the equality before from 11, and the equality before from 4, Finally from 4, 11, 10, 6, and 7 we have Aw + us = (Bi— Apu tm = Bw t uy = By — 424” Bye tu = Bo +m > 0 ptt Relations 9, 8, 12, and 13 extend the lemma to 4. Pe 15 16 a8 Nonlinear Programming From Tucker’s lemma two important existence theorems follow. These theorems assert the existence of solutions of two linear systems that have a certain positivity property. First existence theorem [Tucker 56] For any given p X n matrix A, the systems I Az 20 and II Ay =0,y20 possess solutions x and y satisfying Arty>0 PROOF In Tucker’s lemma the row A, played a special role. By renumbering the rows of A, any other row, say A;, could have played the same role. Hence, by Tucker’s lemma /, there exist 2’ € R*, y EG R,t=1,..., p, such that Ar 20 Ay=0,y20 t=1,...,3 Aw + yi > 0 Define 2 2 z= )at y= Dy i= a1 Hence by 16, we have that Ax = $ av z0 im] AY Say! 0 y= y= j=l Pp y= Yyiz0 ial andfort=1,...,p, 2 Ag+ y= Aai+yi+ J) (Aat + yi) > 0 —|_ tr} 1 | >0 20 (by 15) (by 15) 17 Linear Inequalities and Theorems of the Alternative a3 A, As A a NU ° A 0 ° ° 0 Ay Aa As (0) (4) (e) Fig. 2.3.1 Geometric interpretation of Theorem 14, n = 2, p = 3. or Ar+ty>0 J A geometric interpretation of the above theorem can be given in the space of the rows A; G@ R",i=1,..., p. The theorem states that one of the following alternatives must occur for any given p X n matrix A: (a) There exists a vector z which makes a strict acute angle (<2/2) with all the rows Aj, Fig. 2.3./a, or (b) There exists a vector x which make an acute angle (7/2) with all the rows A;, and the origin 0 can be expressed as a nonnegative linear combination of the rows A; with positive weights assigned to the rows A; that are orthogonal to z, Fig. 2.3.1b, or (c) The origin can be expressed as a positive linear combination of the rows Aj, Fig. 2.3.1c. By letting the matrix A of the previous theorem have a special structure a second existence theorem can be easily established. Second existence theorem [Tucker 56] Let A and B be gwen p' X n and p? X n matrices, with A non- vacuous. Then the systems I Ar20 Br =0 and Ho Ayt By =0,y,20 possess solutions zt € R*, y1 € R*, yx © R* satisfying Ar+yi>0 26 18 as Nonlinear Programming PROOF We remark first that the requirement that A be nonvacuous is merely to ensure that the statement of the theorem is nonvacuous, that is, without a matrix A the theorem says nothing. We apply Theorem /4 to the systems A Bi)z20 —B and yw yn [4’,B’,—B’}| a] =0 (| 20 22 ae and obtain z, 41, 2:, 22, satisfying Az+yi>0 Be+2>0 —Br+a>0 Define now yz = 2: — ze. 
We have then that 2, yi, y2 satisfy Az 20 Bz =0 Ann + By, = 0 n20 Azt+ty>0 § Corollary Let A, B, C, and D be given p' X n, p* X n, p? X n, and pt Xn matrices, with A, B, or C nonvacuous. Then the systems I Az2=0 Be20 Czr20 Dzr=0 and WT Ay + By t+ Cys + Dy =0,91 20,220,420 possess solutions © R", yi © R*, yx ER”, ys C KR”, ys € KR, satisfying Az+iyi>0 Be+y.>0 and Cz+y:>0 This corollary to Theorem 17 will be used to derive the most general types of the theorems of the alternative of the next section. Linear Inequalities and Thoorems of the Alternative aa 4. Theorems of the alternative In this section we shall be concerned with establishing a series of theorems relating to the certain occurrence of one of two mutually exclusive events. The two events, denoted by I and II, will be the existence of solutions of two related systems of linear inequalities and/or equalities. The proto- type of the theorem of the alternative can be stated as follows: Either I or II, but never both. If we let I denote the nonoccurrence of J and similarly for II, then we can state the following. Typical theorem of the alternative lell or equivalently Tell TYPICAL PROOF I=TI (or equivalently I = II) and I= II (or equivalently I = TI) The proof that I => II is usually quite elementary, but the proof that T =I] utilizes the existence theorems of the previous section. In the theorems to follow, certain obvious consistency conditions will not be stated explicitly for the sake of brevity. For example, it will be understood that certain matrices must have the same number of rows, that the dimensionality of certain vectors must be the same as the num- ber of columns in certain matrices, etc. We begin now by establishing a fairly general theorem of the alternative due to Slater [Slater 51). Slater's theorem of the alternative [Slater 51] Let A, B, C, and D be given matrices, with A and B being nonvacuous. Then either I Az>0 Bu>0 Cz20 Dz = 0 has a solution x or t Occasionally we shall also refer to the systema of inequalities and equalities themselves as systems I and II. 2 aa Nonlinear Programming Ayn + Byr t+ Cys t+ Diya = with Il yi 20,y220,y320 or has a solution ys, Y2, Ys, Ys IV 20, y2>0,y320 but never both. PROOF (I= II) By assumption, I holds. We will now show that if II also holds, then a contradiction ensues. If both I and II hold, then we would have 2, y1, Y2, ys, ys such that atA'y, + xB'y2 + xC’ys + xD'y, > 0 because zD’y, = 0, xC’ys 2 0, and either rB’y, = 0 and xA’y; > 0, or 2B’y, > 0 and zA’y,; 2 0. But now we have a contradiction to the first equality of II. Hence I and II cannot hold simultaneously. Thus, I=T. d=ID Az >0 I= ( (Az 2 0, Bz 20, Cz 2 0, Dz = 0) ( or Be =0 Az 20, Br 20,Cz 20, Dr =0 y1 20 = A’yy + Blyz + Cys + D'yy = 0 =( or yi 20,42 20,4320 y2>O0 (by Corollary 18) =II. jj We remark that in the above proof, the requirement that both A and B be nonvacuous was used essentially in establishing the fact that IT. Corollary 18, which was used to prove that I = II, can handle systems in which merely A or B are nonvacuous. By slightly modifying the above proof, the cases B vacuous and A vacuous lead respectively to Motzkin’s theorem of the alternative (or transposition theorem, as Motzkin called it) [Motzkin 36] and Tucker’s theorem of the alternative [Tucker 56]. Motzkin's theorem of the alternative (Motzkin 36] Let A, C, and D be given matrices, with A being nonvacuous. 
Then Linear Inequalities and Theorems of the Alternative a4 either I Az>0 Cz 20 Dz =0 has a solution x or Ul ( Aly + Cys + D'ys = has a solution yi, Yas Ys mn 20,y:20 but never both, PROOF (I= TI) IfbothI and II hold, then we would have 2, y1, y3, ¥« such that tA'y, + 2C’ys + zD'ys > 0 because xD'y, = 0, 2C’y; 2 0, and zA’y, > 0. But now we have a contradiction to the first equality of 11. Hence, I and II cannot hold simultaneously. Thus, I => IT. d=) I= (Ar = 0, Cr 0, De = 0) = (Ax $ 0)) Ax 20, Cx 20, Dx = 0 =( (Ay, + Cyst Dy =0 ) => (yw 20) (by Corollary 2.3.18) 120,422 0 =Il § Tucker's theorem of the alternative [Tucker 56] Let B, C, and D be given matrices, with B being nonvacuous. Then either I Br>0 Cz 20 Dz = 0 has a solution x or 7 7 ty, = Il Cres web Diy = > has a solution yo, Ys, Ya but never both. The proof is similar to the proof of Motzkin’s theorem 2. (The reader is urged to go through the proof himself here.) Slater [Slater 51] considered his theorem as the one providing the most general system I possible, because it involved all the ordering rela- tions >, 2, 2, =. Similarly we can derive another theorem which involves the most general system II possible, in which y, > 0, yz 2 0, ya 2 0, and ys is unrestricted. 24 Nonlinear Programming Theorem of the alternative Let A, B, C, and D be given matrices, with A and B being nonvacuous. Then either Az > 0, Br 20, Cx 2 0, Dt = 0 I or has a solution x Az 20, Br > 0, Cz 20, Dr = 0 or Aly, + Blyr + Cys + D'ys = 0 . II (4 > 0,42 >0,y320 has a solution y1, Ya, Ys, Ys but never both. PROOF (I= TT) If both I and II huld, then we would have z, yi, y2, ys ya Satisfying 2A'y, + 2B'y2 + 2C’ys + zD'ys > 0 because 2D’y, = 0, zC’ys = 0, and either 2B’y, = 0 and rA’y, > 0, or 2B’y2 > Oand zA’y, 2 0. But now we have a contradiction to the first equality of II. Hence, I and II cannot hold simultaneously. Thus, I=T]. di= 1 A'y, + BYy2 + Cys + Dy. = 0 yb 0 i= =( or yi 20,y220,y320 yz =0 A’y + Byrn t+ Cys + D'yw = 0 Azr>0 > w120,4220,y:20 = or Az 2 0, Br 2 0, Cr 2 0, Dt = 0 Be >0 (by Corollary 18) =I, J We remark that if either A or B is vacuous, then we revert to Tucker's theorem $ or Motzkin’s theorem 2. We remark further that in all of the above theorems of the alterna- tive the systems I are all homogeneous. Hence, by defining z = —z, the system I of, say, Slater's theorem / can be replaced by lV Az < 0, Bz < 0, Cz S$ 0, Dz = 0 has a solution z Similar remarks apply to theorems 2 through 4. 80 Linear Inequalities and Theorems of the Alternative ae The above theorems of the alternative subsume in essence all other theorems of this type. We derive below some of these theorems directly from the above ones. Gordan’s theorem [Gordan 73] For each given matrix A, either I Az > 0 has a solution x or II A’'y = 0, y > 0 has a solution y but never both. PROOF Follows directly from Motzkin’s theorem 2, by deleting the matrices C and D. §j Geometrically we may interpret Gordan’s theorem as follows. Either there exists a vector z which makes a strict acute angle (<7/2) with all the row vectors of A, Fig. 2.4./¢, or the origin can be expressed as a nontrivial, nonnegative linear combination of the rows of A, Fig. 2.4.10. Farkas’ theorem [Farkas 02] For each given p X n matriz A and each given vector b in R” either I Az s0 bz > 0 has a solution x € R" or Il A’y = b, y 2 0 has a solution y € RP but never both. PROOF By Motzkin’s theorem 2, either I holds or Il’ by — A’ys = 0, 7 > 0, ys 2 O must have a solution 1 € RF and ys © RP A A : Fig. 
2.4.1 Geometric interpretation of a3 Gordan’s theorem. (o) (4) a a Nonlinear Programming but not both. Since » © R, 7 > 0 meansy > 0. Dividing through by n and letting y = y:/n, we have that II’ is equivalent to JJ. J Stiemke’s theorem (Stiemke 15] For each given matrix B, either I Bx > 0 has a solution x or Il By = 0, y > 0 has a solution y but never both. PROOF Follows directly from Tucker’s theorem 3, by deleting C and D. J Nonhomogeneous Farkas theorem [Duffin 56] For a given p X n matrix A, given vectors b € R*, c € R*, anda given scalar B, either I bz > 8, Ax = chasa solution z € R" or Ay=b vy SB, y2z0 Il or has a solution y © R? Ay =0,cy<0,y20 but never both. PROOF The system I is equivalent to br — BE >0 V &>0 )hasasolutionz GR, ECR —Ar+c&§20 By Motzkin’s theorem 2, then, either I’ holds or b Olfm 7 [26 [R]+[]=-° has a solution 9; € R, [n]zouz0 bE R, ys © Re Ayres = but not both. 10 Linear Inequalities and Theorems of the Alternative a4 Now we have either 7; > 0 or 7; = 0 (and hence {; > 0). By defining y = y:/m in the first case and y = y; in the second, we have that II’ is equivalent to b- Ay =0,-Bt+oy=—-hi 50,¥20 I” or has a solution —Aly = 0, cy = 1 <0,y20 yer II” is equivalent to II. §j Gale's theorem for linear equalities [Gale 60] For a given p X n matriz A and given vector c € E?, either I Ax = chas a solution x € R* or II A'y = 0, cy = Lhas a solution y € R? but never both. PROOF The system I is equivalent to Vv —> 0, —t + Az = Ohas a solution & € R, x € R" By Motzkin’s theorem 2, either I’ holds or w— cys =0 I A’'ys = 0 Shas a solution y: € R, ys © R? yw 20 but not both. Since y.€ R, y1 > 0. By defining y = ys/y:, Il’ is equivalent toll. §f Gale's theorem for linear inequalities (<) [Gale 60] For a given p X n matriz A and given vector c € R?, either I Az S$ chas a solution x € R* or Il A’'y = 0,cy = —1, y 2 0 has a solution y € R? but never both. PROOF The system I is equivalent to YV &> 0, cf — Ax 2 0 has a solution § C R,z © R* a fonlinear Programming Table 2.4.1 Theorems of the Alternativet 4 Az >0, Br >0,Cx 20, Dx = 0 (A and B nonvacuous) (Slater) 2 Az >0, Cz 20, Dt =0 (A nonvacuous) (Motzkin) A'y + Byrn + Cys + D'yw = 0 yw 20,y220,y20 or n20%>0,y 20 Ay, + Cys + D'ys = 0 wu 2020 3 Br>0,Cz 20, Dx =0 (B nonvacuous) (Tucker) Byz + Cys + D'ys = 0 yz > Oy 20 4 Az >0, Br 2 0,Cx 20, Dt = 0 or Az 20, Br > 0,Cz 20, Dt = 0 (A and B nonvacuous) A’yy + BYyr + C'ys + D'yy = 0 wu >0% 20420 5 Ax > 0 (Gordan) A’y=0,y 20 6 bz >0, Az S$ 0 (Farkas) A'y=b,y20 7 Bz > 0 (Stiemke) By =0,y>0 br > 6, Az Se (Nonhomogeneous Farkas) Aly =b, cy S8,y20 or Aly =0,cy<0,y20 9 Ax =c (Gale) Ay =0,cy=1 10 Az Sc (Gale) Aly =0,cy = -ly20 Az Se A'y = 0, cy or A’'y =0,cyS0,y>0 -ly20 exclusive ‘or.’ jppearing in the above table and in Problems 2.4.12 to 2.¢.17 is an By Motzkin’s theorem 2, either I’ holds or ywtcys =0 W —A'y; = 0 yi 20,4220 has a solution y, © R, ys © RP but not both. By defining y = y3/y1, II follows from II’. §f 11 12 13 14 Linear Inequalities and Theorems of the Alternative a4 Theorem for linear inequalities (<) For a given p X n matriz A and a given vector c © R?, either I Az < has a solution x © R* or A’'y=0,cy= —-1,y20 Il or has a solution y © RP A'y=0,cyS0,y>0 bul never both. PROOF The system I is equivalent to Yl — > 0, ct — Az > 0 has asolution § C R, x € R” By Slater’s theorem /, either I’ holds or yi tcy2=0 Ay, = 0 IV y1 >0,y2 20 )has a solution y: € R, y2 € R? or y: 20,y2>0 but not both. 
If for the case when y: > 0, y2 2 0, we set y = y2/ys, and for the case when y1 2 0, y2 > 0, we set y = yz, then II is equivalent to . | In the table above, Table 2.4.1, we give a convenient summary of all the above theorems of the alternative. Problems By using any of the above theorems / to 11, establish the validity of the following theorems of the alternative (12 to 17): Either I holds, or II holds, but never both, where I and II are given below. I Ax S$ c,xz = 0 has a solution z II A’y 2 0, cy < 0, y 2 0 has a solution y [Gale 60] I Az <0, x 2 O has a solution x II A’y 2 0, y > O has a solution y {Gale 60] I Az < 0, z 2 0 has a solution x Il A’y 2 0, y 2 0 has a solution y [Gale 60] 15 16 17 18 a4 Nonlinear Programming I Az < 0,2 > 0 has a solution z Il A’'y 2 0, y > 0 has a solution y Az $0,x200r . has a solution z Ar $0,2>0 II A’y > 0, y > 0 has a solution y I Az S 0,2 > 0 has a solution x II A’y > 0, y > 0 has a solution y Mnemonic hint In all the theorems of the alternative / to 17 above, which involve homogeneous inequalities and/or homogeneous equalities, the following correspondence between the ordering relations, >, >, 2, =, occurs: Orderings appearing in I Orderings appearing in II >* 2* 2 2 2 = Unrestricted, 2* >* 2 e = = Unrestricted, >* > 2* 2 > > > or = Unrestricted, 2 > 2 Unrestricted, 2 2 >* 2 > 2* 27 \e] /7 z = = Unrestricted 19 Linear Inequalities and Theorems of the Alternative ae The asterisks indicate ordering relations which must be present in order for the correspondence to hold. The arrows indicate the direction in which the correspondence is valid; for example, — indicates that starting with the relations at the unpointed end of the arrow, the corresponding relations are those at the pointed end of the arrow. Problem Establish Motzkin’s theorem 2 by starting with Farkas’ theorem 6. (Hint: Let I @ (Az > 0, Cz 2 0, Dr = 0 has a solution x € R") then show that Te (Az 2 et, ¢ > 0, Cx 2 0, Dr = 0 has no solution z € R*, ¢ € R) and use Farkas’ theorem. €¢ is a vector of ones in the above.) Chapter Three Convex Sets in R® The purpose of this chapter is to introduce the fundamental concept of convex sets, to describe some properties of these sets, and to derive the basic separation theo- rems for convex sets. These sepa- ration theorems are the foundations on which many optimality condi- tions of nonlinear programming rest. 1. Convex sets and their properties In order to define the concept of a convex set, we begin by defining line and line segments through two points in R*. Line Let zc? CR, The line through z! and x? is defined as the set {z[o = (1— da! +r, € BR} or equivalently {z|z = pit’ + px’, pi, pr E RB, Pit pe = 1} If we rewrite the first defini- tion in the equivalent form {e[ x = 2+ dA@?— 2), NER] and consider the case when x € R?, it becomes obvious that the vector equation x = 2!+ A(z? — 2) is the parametric equation of elemen- tary analytic geometry of the line through z! and z?, Fig. 3.1.1. Convex Sets in 2” a1 Line segments Let z!,z? EG R*. We define the following line segments joining z! and z?: (i) Closed line segment [z',27] = {z|z = (1—A)z'+dz4z,0SAS 1} (ii) Open line segment (z!,2?) = {z|x = (1 — A)z'+AzXx,0 n + 1, then x can be written as a convex combination of m — 1 points in T. (This would establish the theorem then, for we could repeatedly apply the result until x is a convex combina- tion of n + 1 points of I.) If any p; in the above expression is zero, then x is a convex combination of m — 1 or fewer points of . So let each p; > 0. 
Since m > n+ 1, there exist m1, . . . , fm-1 © R, not all zero, such that rial — a) fo + tma(z™ — 2") = 0 (by A.1.3) Define rn = —(ri + © + * + 7m-1). Then m m > 7 = 0 x rai =0 <1 it Define Qi = Di — a7; fort? =1,...,m where a is some positive number chosen such that g; 2 0 for all i, and at least one gi, say gz, is equal to0. In particular we choose a such that 1 { “t Te — = max j—} = — @ a UR Pe 16 17 aa Nonlinear Programming Fig. 3.1.4 A set I and its convex hull (}. Then a@2Zz07=1,...,mqa=0 m m m m m Ya= Yar Ya-aYrnea Yamal i=l vel il i=l ial ink and nm m m m a=} pai= SY gata SY nai= YS gat m1 roost m1 ich Hence z is a convex combination of m — 1 pointsinT. J Convex hull Let FC R*. The convex hull of I, denoted by [F'], is the inter- section of all convex sets in R* containing I’. (By Theorem 9, the convex hull of any set f C R* is convex. Figure 3.1.4 shows a hatched noncon- vex set in R? and its shaded convex hull.) Obviously if F is convex, then r = [I]. Theorem The convex hull {T] of a set T C R® is equal to the set of all convex combinations of points of Y. PROOF Let A denote the latter set, that is, k & A= {z|2= Y paipER, @ ET, 20, Y p= 1,k21} isl i=l If x12? € A, then k k a= ) pain ER a Cl, p20, Y w= 1 i=) fal m m v= Yd abi wR, KET, g 20, Yarl ist i=. 18 19 20 a1 Convex Sets in R* a4 Hence forO SA S81 k * Att + (1 — dds? = Y dpai+ Y (1 — r)gidé i=l im) and k m dpe 20,1 ~ 2) 20, Yr + YA-Ag=l ier vel Thus Az! + (1 — \)z? € A, and Ais convex. Itis also clear that TC A. Since A is convex, then [T}] C A. We also have by Theorem /3 that the convex set [I'] containing must also contain all convex combinations of points of f. Hence A C [Ir], andA = {Ir}. ff Sum of two sets Let P,A C R*. Their sum I + A is defined by Pas {zlz=r+yr2El yea} Product of a set with a real number Let CR", and let} © R. The product XV is defined by AV = {z|z=aAz,x ET} Note that if \ = —landI’, ACR", then A+ AT=A—T. Note that this is not the complement of I" relative to A as defined in 1.2 and written as A~T. Theorem The sum T + A of two convex sets T and A in R* is a convex set. PROOF Let 2,22 ET +4 A, then z! = zt 4+ y! and 2? = 2? + y?, where ojwerandyyCA. ForOSAS1 (1 — Adz + de? = (L—Ajat + Az? + (1 —dy FAY ET +A |}«+———_>| |«——_> | (A point in (A point in T, by convex- A, by convex- ity of T) ity of A) Hence F + Ais convex. § Theorem The product pV of a convex set T in R* and the real number p is a convex set. 22 a2 Nonlinear Programming PROOF Jet 2,2? € ul, then 2! = ua! 2? = wx’, where 2!,27 © T. For OSAS1 (L — Ada! + 2? = wl(1 — Nxt + 7] ur Ef \¢——_>| (A point in T, by convexity of Tr) Corollary If T and A are two convex sets in R”, then T — A is a convex set. 2. Separation theorems for convex sets It is intuitively plausible that if we had two disjoint convex sets in R", then we could construct a plane such that one set would lie on one side of the plane and the other set on the other side. Despite its simplic- ity, this is a rather deep result and is not easy to prove. One version of this result, the Hahn-Banach theorem, can be established by only using the vector space properties 1.3.3 of R" and not the topological properties induced by the norm |/2x|! [Berge 63, Valentine 64]. We shall, however, use these topological properties of R” (all summarized in Appendix B) in deriving the separation theorems for convex sets. In particular our method of proof will make use of Gordan’s theorem of the alternative 2.4.8 and the finite intersection theorem of compact sets B.3.2 (iii). 
(Knowledge of the contents of Appendix B is assumed from here on.) Separating plane The plane {z| a € R", cx = a}, c # 0, is said to separate (strictly separate) two nonempty sets I and A in R* if zelrscasa (cz < a) rEAs>caza (cz > a) If such a plane exists, the sets T and A are said to be separable (strictly separable). Figure 3.2.1 gives a simple illustration in R? of two sets in R" which are separable, but which are neither disjoint nor convex. It should be remarked that in general separability does not imply that the sects are disjoint (Fig. 3.2.1), nor is it true in general that two disjoint sets are separable (Fig. 3.2.2). However, if the sets are nonempty, convex, and 46 Convex Sets in R* Fig. 3.2.1 Separable but not disjoint sets. disjoint, then they are separable, and in fact this is a separation theorem we intend to prove. Lemma Let 2 be a nonempty convex set in R", not containing the origin 0. Then there exists a plane {x |x © R*, cx = 0}, c ¥ 0, separating 2 and 0, thatis, zENg=eZz0 PROOF With every z € © we associate the nonempty closed set A. = lyly ER, yy = 1, zy 2 0} Let z', . . . , 2™ be any finite set of pointsin®. It follows from the con- vexity of ©, Theorem 3.1.13, and from the fact that 0 ¢ Q, that 22m = 0, ype 1,p20,¢=1,..., m has no solution p € R™ in i= Fig. 3.2.2 Disjoint but not separable sets. ar as Nonlinear Programming or equivalently », x'p; = 0, p > 0 has no solution p € R™ Hence by Gordan’s theorem 2.4.6 zy >0,i=1,...,mhasasolution y € R* Obviously y ¥ 0, and we can take y such that yy = 1. Then n m yEO lye By yw = 1 wy 20} = 0 As inl and hence m OAs #8 ae The sets (A.)zE9 are closed sets relative to the compact set {y|y € R’, yy = 1} [see B.1.8 and B.3.2(i)], hence by the finite intersection theorem B.8.2(iii) we have that a Az #9. Let ¢ be any point in this intersec- a tion. Then cc = 1 and cx 20 for all x EQ. Hence {x|2€ R*, cx = 0} is the required separating plane. §j It should be remarked that in the above lemma we did not impose any conditions on Q other than convexity. The following example shows that the above lemma cannot be strengthened to z € Q => cz > 0 without some extra assumptions. The set Q= {x|x ER x > 0) U(r] 2 € RY, 21 > 0, a2 < 0} is convex and does not contain the origin, but there exists no plane {x |x € R*, ce = 0} such that zr CQ cx > O (Fig. 3.2.3). If on the other hand we do assume that 2 is closed (or even if we Yj * Convex Sets in R* 32 assume less, namely that the origin is not a point of closure @), then we can establish a stronger result, that is, there exists a plane which strictly separates the origin from 2 (see Corollary 4 and Lemma 4 below). How- ever, before doing this, we need to establish the following fundamental separation theorem. Separation theorem Let T and A be two nonempty disjoint convex sets in R". Then there exists a plane {x |x € R*, cx = a,c ¥ O, which separates them, that 1s, reErscasa rEAScezea PROOF The set AT = {c]e=y-2yEA,zET} is convex by Corollary 3.1.22, and it does not contain the origin 0 beeause TOA =9. By Lemma 2 above there exists a plane {x | x € R", cr = 0}, c * 0, such that zeEa-—-Ts>cr 20 or yEA,zE€T=scy—z) 20 Hence 8 = infimum cy = supremum cz = y wea 2er Define a =P ty 2 Then 2€T>cz yEAsyza | We derive now from the above fundamental separation theorem a corollary, and from the corollary a lemma, Lemma 5. Lemma 6 will be used in establishing a strict separation theorem, Theorem 6, below. Corollary Let Q be a nonempty convex set in R. 
If the origin 0 is not a point of closure of Q (or equivalently if the origin is not in the closure © of Q), then HA a 4 aa Nonlinear Programming there exists a plane {x| xz © R*, cx = a}, c #0, a > O, strictly separating Q and 0, and conversely. In other words og ma( en ber® ) zrENsac>a PROOF (<=) Assume that there exist c ¥ 0, a > 0 such that cx > a forallz EQ. If0 EQ, then (see B.1.3 and B.1.6) there exists anz € 9 such that ||z|| < «/2|[c||, and hence a a 3 = Vell apy > Hell Wiel 2 lexi > @ (by 1.8.8) which is a contradiction. Hence 0 ¢ &. (=) Since 0 is not a point of closure of 9, there exists an open ball B,(0) = {z|z € R*, |\z|| < e} around 0 such that B.(0) 12 = § (see B.1.8). Since the ball B.(0) is convex (see 3.1.7), it follows by Theorem 8 that there exists a plane {z | z © R*, cr = y}, c ¥ 0, such that ceEBO>csy rEN>sc2zy Since B,(0) is an open ball, it must contain the nonzero vector dc for some positive §. Hence y 2 écc > 0. Leta = lgéce > 0. Then zrENQ=ecrzy>a>O | Lemma Let 2 be a nonempty closed convex set in R*. If 2 does not contain the origin, then there exists a plane {x|x € R*, cx = a}, c ¥0,a>0, strictly separating Q and 0, and conversely. In other words Jc #0,a > 0: 0g2e rEQ=>c>ea PROOF This lemma follows from Corollary 4 above by observing that the requirement that 9 be closed and not contain the origin 0 implies that 0 is not a point of closure of Q, that is, 0 ¢ & (see B.1.3, B.1.5 and B.1.6). J Strict separation theorem Let T and A be two nonempty convex sets in R", with T compact and A closed. If T and A are disjoint, then there exists a plane {x|x € R’, Conver Sots in R" 3.2 cz = a}, c 0 which strictly separates them, and conversely In other words Jc = 0 and a: TONA=98\ cE€T>cc>a PROOF (=) IfsE@INa, thenez < @ < cz, a contradiction. (=) The set A-T= {t]z=y-—z2yEa,z2ET} is convex by Corollary 3.1.22 and closed by Corollary B.3.3. Hence by Lemma 6 above there exists a plane {z|z © R*, cz = uJ, ¢ #0, > 0, such that zEaA-Tsce>u>0 or yEA zE€Tsecy—2>p>0 Hence 8 = inf cy 2 supce + yu > super = y vEA 2er 7er Define a afty 2 Then 2E€T>ceaa Ff The above separation theorems will be used to derive some funda- mental theorems for convex functions in the next chapter, which in turn will be used in obtaining the fundamental Kuhn-Tucker saddlepoint optimality criteria of convex nonlinear programming in Chap. 5 and also the minimum principle necessary optimality condition of Chap. 11. We remark here that a theorem of the alternative, the Gordan theorem 2.4.5, was fundamental in deriving the above separation theo- rems. We can reverse the process and use the above separation theorems to derive theorems of the alternative. Thus to derive Gordan’s theorem 2.4.5, namely that either A’y = 0,y > Ohasasolutiony € R™ or Az > 0 61 42 Nonlinear Programming Ar A As As ‘A (a) ‘ (6) Fig. 3.2.4 Geometric reinterpretation of Gordan’s theorem by using Lemma 6, (a) A'y =0, y 20 has solution; Az >O has no. solution; (b) Az > 0 has solution; A’y = 0, y > 0 has no solution. has a solution z € R*, we observe that if e € R™ is a vector of ones, then A’y = 0, y 2 0 has no solution #0 EQ = {z|z = A’y, y 2 0, ey = 1} ede #0,a>0:2ENScz>0 (by 5) ey2z0,ey=1=cAy>a>OD = Ac>0 The last implication follows by taking y=e' CR, i=1,...,m, where e# has zeros for all elements except 1 for the ith element. Using the framework of Lemma 4 we can give a geometric reinter- pretation of the Gordan’s theorem as follows: Either the origin 0 € R” (a) (4) Fig. 
3.2.5 Geometric interpretation of Lemma 5 Convex Sets in A" 3a is in the convex hull of the row vectors Ai, ..., An of the matrix A (A’y = 0, y > O has a solution, Fig. 3.2.4a), or it is not (in which case by Lemma 6 Az > 0 has a solution z = c, Fig. $.2.4b). More generally, if @ is any nonempty closed convex set in R*, then either it contains the origin, Fig. 3.2.6e, or it does not (in which case by Lemma 4 there exists a vector c € R* which makes a strict acute angle with each z EQ, Fig. 8.2.5), Problem Establish Farkas’ theorem 2.4.6 by using Theorem 6 above. (Hint: Observe that A’y = b, y 2 0 has no solution if and only if the sets {b} and {z|z = A’y, y 2 0} are disjoint. Then use Theorem 6.) Chapter Four Convex and Concave Functions In this chapter we introduce con- vex, concave, strictly convex, and strictly concave functions defined on subsets of R*. Convex and con- cave functions are extremely impor- tant in nonlinear programming because they are among the few functions for which sufficient opti- mality criteria can be given (Chaps. 5 and 7), and they are the only func- tions for which necessary optimal- ity conditions can be given without linearization (Kuhn-Tucker saddle- point condition in Chap. 5), We give in this chapter some of the basic properties of convex and con- cave functions and obtain some fundamental theorems involving these functions. These theorems, derived by using the separation theorems for convex sets of Chap. 3, are akin to the theorems of the alternative derived in Chap. 2 for linear systems. In this sense con- vex and concave functions inherit some of the important properties of linear functions. These funda- mental theorems will be used to derive the important saddlepoint necessary optimality condition of Chap. 5 and the minimum principle necessary optimality condition of Chap. 11. Finally it should be mentioned that no differentiability or explicit continuity requirements are made on the functions intro- duced in this chapter. A subse- quent chapter, Chap. 6, will be devoted to differentiable convex and concave functions. Convex and Concave Functions at 1, Definitions and basic properties Convex function A numerical function 6 defined on a set F C R* is said to be conver at £ ET (with respect to Lr) if cer OSAS1 )>(1— dlo(@) + rO(z) = Of(1 — JZ +H Az] (l—-vz+r.2 ET 4 is said to be convex on T if it is convex at each z € TL. Note that this definition of a convex function is slightly more gen- eral than the customary definition in the literature [Fenchel 53, Valentine 64, Berge-Ghouila Houri 65] in that (i) we define convexity at a point first and then convexity on a set, and (ii) we do not require I to be a convex set. This generalization will allow us to handle a somewhat wider class of problems later. It follows immediately from the above definition that a numerical function @ defined on a convex set I is convex on I if and only if oe » => (1 — d)0(z!) + AO(x?) = O[(1 — A)z* + Az*] OsS\ASs1 Figure 4.1.1 depicts two convex functions on convex subsets of R" = R. ® . Ln t-)) 8 1 1 I t 1 t (I-A) x! + dx?) ' t x av e /}+—__—___ — po” (0) (6) Fig. 4.1.1 Convex functions on subsets of Rt = R. (a) A convex function @ on R; (b) A convex function @ on r = [—1, ©). 4 41 Nonlinear Programming Concave function A numerical function 6 defined on a set T C R® is said to be con- cave at £ © T (with respect to Pr) if zer 0S. 81 )> (1 —d)o(@) + Oz) S Of(1 — d)E + 2] (l-—A)#+ <4 ET 0 is said to be concave on I if it is concave at each x ET. 
Obviously @ is concave at € T (concave on IL) if and only if —6 is convex at # (convex on I’), Results obtained for convex functions can be changed into results for concave functions by the appropriate multi- plication by —1, and vice versa. It follows immediately from the above definition that a numerical function @ defined on a convex set T' is concave on I if and only if ae@er => (1 — A)a(z!) + AO(@?) S ol(1 — Aa! + Az’) OsS\ds1 Figure 4.1.2 depicts two concave functions on convex subsets of Rn = R. Problem Show that a linear function, 6(z) = cz — a, z € R", is both convex and concave on R*, and conversely. Strictly convex function A numerical function 6 defined on a set ! C "is said to be strictly (I-A) t+ dx?) \ (I-A) O (4!) +8 (x2) }----------- 1 ! 1 1 1 ! ' 1 ! 1 1 t ' ' 1 1 1 2 Rp? (a) (eo) R? x Fig. 4.1.2 Concave functions on subsets of k* = R. (a) A concave function @ on R; (6) A concave function # on Fr = (0,1). Convex and Concave Functions 4a convex at ¢ © T (with respect to I) if zéer LAE 0 Of(1 — AME + Az] 6 is said to be strictly convex on I if it is strictly convex at each x € I. Strictly concave function A numerical function 6 defined on a set I C Rt is said to be strictly concave at £ € T (with respect to I’) if zeEr ue )= (1 — d)0@) + rAO(z) < OL ~— ANE +H Ag] 0Opitt't+ +++ + pat™) S pio) +--- + P(x”) Pitt t tpn =i Fig. 4.1.4 The convex sets Aa and 9 of 10 and 11 associated with a function @. 13 14 Convex and Concave Functions 41 (Hint: Use Theorems 8 and 3.1.13. The above inequality is Jensen’s inequality [Jensen 06].) Theorem If (@)ier is a family (finite or infinite) of numerical functions which are convex and bounded from above on a convex set T C R*, then the numerteal function A(x) = sup 6;(x) i€l is a convex function on T. PROOF Since each 6; is a convex function on I, their epigraphs G@, = (mole ET, FER, a(z) SF are convex sets in R*+! by Theorem 8, and hence their intersection D Ge, = {|e ET, FER, 6(z) St, ViE T} {@ol2 ET, fER, Oe) so} is also a convex set in R**! by Theorem 3.1.9. But this convex inter- section is the epigraph of 6. Hence @ is a convex function on I by Theorem 8. ff Corollary If (Gi)ier ts a family (finite or infinite) of numerical functions which are concave and bounded from below on a convex set T C R”, then the numer- ical function 6(x) = inf 6,(x) ier ts a concave function on T. We end this section by remarking that a function @ which is convex on a convex set I C R* is not necessarily a continuous function. [or example on the halfline Pf = {x |x € #,xz 2 —1}, the numerical function 2 fort = —1b (x) = (x)? forz > -1l is a convex function on I’, but is obviously not continuous at = —1, Fig. 4.1.16. However, if Tis an open convex set, then a convex function 6 on T is indeed continuous. This fact is established in the following theorem, 1 15 4a Nonlinear Programming Theorem Let T be an open convex set in R*. If 6 is a convex numerical func- tion on T’ then 6 is continuous on T. PROOF [Fleming 65]t Let x° EI, and let a be the distance (see 1.3.9) from x° to the closest point in R" notinT (a = +o iff = R*), LetC be an n-cube with center +° and side length 26, that is C={e|tER, -8S24,—-2795861=1,...,n} By letting (n)*5 < a, we have that C CT, Let V denote the set of 2" vertices of C. Let B = max 0(x) 26V By Theorem 10 the set Ag = (2 | a ET, 0(x) S 8} is convex. Since C is the convex hull of V (this can be easily shown by induction on 7) and V C As, it follows that C C Ag, by Theorem 3.1.13 (Fig. 4.1.5). 
Let z be any point such that 0 < ||z — x°|| < 6, and define x® + u, x° — u on the line through x° and « as in Fig. 4.1.5. Write x now asa convex combination of x and x° + u, and z° as a convex combination of zandz°—u. Ifd = {lx — x°l|/8, then 2° + hu = A(x9 + u) + (1 — dA)z? esx — du =2+X(x°— u) — Az? 1 » peat tips x (2° — u) + Fleming attributes this proof to F. J. Almgren. Fig. 4.1.5 Convex and Concave Functions 42 Since 6 is convex on I’ a(z) S O(x? + u) + (1 — A(z) SAB + CA — ANO(2) 8(z) +8 O28) S pay O02) + Ay oat — w) 5 OE? 1+) 1+a These inequalities give —NB — 6(2°)] S 6(z) — O(x) S ALB — O(2")} or —_ 0° Jo¢e) — o(2*)| = 8A) jx — soy Thus for any given ¢ > 0 it follows that [6(x) — 6(a°)| < ¢ for all x satisfying [8 ~ &x°)] le — x°l] < 5, and hence 6(«) is continuous at 2°. Since the interior of each set I C R” is open, it follows that if @ is a convex function on a convex set T C R", it is continuous on its interior. 2. Some fundamental theorems for convex functions We saw in Chap. 2 that Farkas’ theorem of the alternative played a crucial role in deriving the necessary optimality conditions of linear pro- gramming. In this section we shall derive what may be considered extensions of theorems of the alternative of Chap. 2 to convex and con- eave functions. These theorems in turn will play a similar crucial role in deriving the necessary optimality conditions of nonlinear programming in Chaps. 5 and 11. (In the remainder of this chapter various properties of continuous and semicontinuous functions will be used. For conven- ience, these results are summarized in Appendix C.) We begin by establishing a fundamental theorem for convex func- tions, the essence of which is given in [Fan-Glicksburg-Hoffman 57]. Theorem Let T be a nonempty convex set in R", let f be an m-dimensional con- vex vector function on I’, and let h be a k-dimensional linear vector function on Rk”. If I(x) <0 has no solution x ET h(z) = 0 4a Nonlinear Programming then there exist p © R" and q © R* such that p = 0, (p,q) #0 pf(x) + qh(x) 2 0 for alla ET REMARK p 2 0 and (p,q) # 0 does not imply p > 0 and g # 0, but it does imply p > 0 or g # 0 or both. However if we delete the linear equalities h(x) = 0, then p > 0. PROOF Define the sets A@) = {y2lyeRy ze Ry > fia),2=h@)} cer and A= VU A(z) z€Pr By hypothesis A does not contain the originO € R™+*, Also, A is convex, for if (y,z') and (y?,z*) are in A, then forO SA S 1 (1 — Ady? + Ay? > (CL = AS) + (2) 2 SL ~ Az! + da] and (1 = Nyt + dz? = (1 — dA") + AACA?) = ALG — Az! + Ar?] Because A is a nonempty convex set not containing the origin, it follows by Lemma 3.2.2 that there exist p € R", g © R*, (p,q) ¥ O such that (uz) CAS put q2o0 Since each u; can be made as large as desired, p = 0. Let « > 0, u = f(a) + ee, v = A(x), x ET, where ¢ is a vector of ones in R™. Jence (u,v) € A(z) C A, and pu + qu = pf(x) + epe + gh(z) 2 0 forz ET or pi(x) + qh(x) = —epe forx ET Now, if inf pf(x) + gh(x) = -5 <0 2er we get, by picking ¢ such that epe < 6, that inf pf(z) + qh(z) = —6 < —epe ser Convex and Concave Functions 4a which is a contradiction to the fact that pf(x) + gh(x) 2 —epe for all x€I, Hence inf pf(z) + qh(z) 20 9 er If we observe that for an m-dimensional vector function f defined on IC R* we have that f(z) <0 f(z) $0 f(z) $0 has a solution ) =*( has asolution ) =( has a solution zEer cer zEr and f(z) <0 f(@) <0 f(z) £0 has no solution } <= { has no solution ) <= { has no solution zeEr zeEr rer then the following corollary is a direct consequence of Theorem /. 
Corollary Let I’ be a nonempty convex set in R, let fi, fe, fa be m'-, m?-, and m*-dimensional convex vector functions on I, and h a k-dimensional linear vector function on R”. If Ce <0, fala) <0, falx) $0 has no solution x ECT h{z) =0 then there exist p, © R™, p2 © R™, ps © R™, and g © R* such that Pay D2, Ps 2 O, (P1P2,Pa,9) ¥ O Pifi(z) + pefe(z) + pifa(x) + gh(z) 20 forallx ET We give now a generalization of Gordan’s theorem of the alterna- tive 2.4.5 to convex functions over an arbitrary convex set in R*. Generalized Gordan theorem [Fan-Glicksburg-Hoffman 57] Let f be an m-dimensional conver vector function on the convex set l'CR*. Then either I S(z) < 0 has a solutionx ET 42 Nonlinear Programming or Il pf(x) = 0 for allz ET for some p > 0, pC R™ bué never both. PROOF (I= TI) Let €TI bea solution of f(z) <0, Then for any p = Oin R™, pf(z) < 0, and hence II cannot hold. (I= II) This follows directly from Theorem / above by deleting h(x) = 0 from the theorem. §f To see that 3 is indeed a generalization of Gordan’s theorem 2.4.5 we let f(z) = Ax, where A is an m X n matrix. Then pAr 20 for allz € R" forsomep >0,pER" (by 3) @ Ap = 0,920 for some p € R™ where the last equivalence follows by taking x = te,7=1,...,, where e' © R® has zeros for all its elements except 1 for the ith element. In the same spirit, Theorem 1 above can be considered a partial generalization of Motzkin’s theorem 2.4.2 to convex functions. The generalization is partial (unlike 3 which is a complete generalization of Gordan’s theorem), because the statement of Theorem / does not exclude the possibility of both systems having a solution, that is, there may exist an ET and p20, (p,q) ¥ 0, such that f(z) <0, A(z) = 0, and pi(z) + qh(z) 2 0 for all x ET. Similarly, Corollary 2 is a partial generalization of Slater’s theorem 2.4.1. However, it is possible to sharpen Theorem / and make it a theorem of the alternative if we let T = R*, h(z) = Be —d and require that the rows of B be linearly independent. We obtain then the following result. Az > 0 has no solution z € R" = ( Theorem Let f be a given m-dimenstonal convex function on R", let B be a given k X n matrix with linearly independent rows, and let d be a given k-dimen- sional vector. Then either I Six) < 0, Bz = d has a solution x € R* or II pf(z) + q(Bx ~— d) 2 0 for all E R* for some p > 0, p E R, qEeR but never both. Convex and Concave Functions a2 PROOF (I= TI) Let @ € R* be a solution of f(x) < 0 and Br = d. Then for any p > 0 and g in R” and R* respectively, pf) + 9(Bt — 4) <0 Hence II cannot hold. (I= II) If I has no solution then by Theorem / there exists p 2 O, (p,q) ¥ 0 such that pf(z) + (Br —- d) 20 for alla € R” If p > 0, the theorem is proved. We assume the contrary, that p = 0, and exhibit a contradiction. If p = 0, then q(Bx — d) 20 for all z € R* for some gq ¥ 0 We will show now that B’g =0. For, if B’g = 0, then by picking x = —@B for the case when gd = 0, and z = 2(gd)qgB/qBB’q for the case when gd < 0, we obtain that ¢(Bz — d) < 0. Hence B’g = 0 for some q * 0, which contradicts the assumption that the rows of B are linearly independent. J We close this chapter by obtaining another fundamental theorem for a (possibly infinite) family of convex and linear functions. Theorem [Bohnenblust-Karlin-Shapley 50] Let T be a nonempty compact convex set in R* and let (fi)ica be a family (finite or infinite) of numerical functions which are convex and lower semicontinuous on T', and let (hijicx be a family (finite or infinite) of linear numerical functions on R”. 
If f(z) $0, EM has no solution x CT hz) =0,7 EK then for some finite subfamily (fi, . - . fi.) of (fien and some finite subfamily (hi, ... hy) of (hiicx there exist pC R™ and q © Re such that p = 0, (p,q) #0 m zk x pif, (a) + x ghi(z) 2 0 forallzx ET gat j=l If K is empty, that zs af all equalities h;(x) = 0 are deleted, then the last inequality above (20) becomes a strict inequality (>0). 6t 4a Nonlinear Programming PROOF {Berge-Ghouila Houri 65} The system fila) S$ WiE M, We>0 h(z) = 0, Wie K has no solution z in T. [For if it did have a solution @, then f,(Z) S ¢ for all e > 0 and all 7 € AM, and h,(#) = 0 for all? © K. This in turn implies that f,(@) S$ 0 for alli © M and Ai(#) = 0 for all i € K (for otherwise if f,(Z) > 0 for some i € M, then picking « = 14f,(z) > 0 would lead to a contradiction). This however contradicts the hypothesis of the theorem.] The sets Taj) = {ele ET, f(z) S ¢ hz) = 0} are closed sets (because of the lower semicontinuity of fi, the linearity of hy, and the compactness of I', see Appendix C) contained in the compact set I, and their intersection is empty. Hence by the finite intersection theorem B.3.2(ili) there exist a finite number of such sets so that their intersection is empty. Thus we obtain indices (ii,%2, ... jim) © M, (tijt2, . . . ,te) E K, and real numbers e:, 2, . . . , ém > 0, such that the system f(a) -6 S$0j=1,...,m h(a) =0j=1,...,k has no solutions € TF. Hence by Corollary 2 there exist p € R",q © R* such that p20 (p,q) #0 and m k m Y, pdi,(2) + > ghi(z) = Y pe = for alla ET j=) j=l j=l from which the conclusion of the theorem follows if we observe that m m > pi; = 0, and if K is empty, then p > 0 and x pe > 0. El a=. jal Chapter Five Saddlepoint Optimality Criteria of Nonlinear Programming Without Differentiability The purpose of this chapter is to derive optimality criteria of the saddlepoint type for nonlinear pro- gramming problems. This type of optimality criterion is perhaps best illustrated by a simple example. Consider the problem of minimiz- ing the function @ on the set X= {z|z€R, -r +2 0), where 6(x) = (r)?. Obviously the solution is = = 2, and the minimum is (=) = 4. The saddlepoint opti- mality criterion for this problem is this: A necessary and sufficient con- dition that @ be a solution of the minimization problem is that there exists a real number @ (here @ = 4) such that for all x € R and all u€R,u2z0 6(@) + u(—# + 2) S 0(&) + a(—-% + 2) SOx) + a(—2 + 2) It is easy to verify that the above inequalities are satisfied for Z = 2, a = 4, Hence the function y de- fined on R? by (zu) = Oz) + u(—z + 2) has a saddlepoint at = 2,% = 4, because it has a minimum at (Z,d) with respect to 2 for all real x, and a maximum with respect to u for all real nonnegative u. For the above simple prob- lem, the saddlepoint criterion hap- pens to be both a necessary and a sufficient optimality criterion for z to be a solution of the minimization problem. This is not always the case. We shall show in this chap- ter that the above saddlepoint con- dition is a sufficient optimality condition without any convexity ba Nonlinear Programming requirements. However to establish the necessity of the above saddle- point condition, we need not only convexity but also some sort of a regu- larity condition, a constraint qualification. This confirms earlier state- ments made to the effect that necessary optimality conditions are more complex and harder to establish. 
We shall develop the optimality criteria of this chapter without any differentiability assumptions on the functions involved. Subsequent chapters, Chaps. 7 and 11, will establish optimality criteria that involve differentiable functions. 1. The minimization and saddlepoint problems The optimality criteria of this chapter relate the solutions of a minimiza- tion problem, a local minimization problem, and two saddlepoint prob- lems to each other. We define these problems below now. Let X° be a subset of R*, let @ and g be respectively a numerical function and an m-dimensional vector function defined on X°. The minimization problem (MP) Find an 4, if it exists, such that az) = min Oz) £EX = {x| x € X, g(x) $0) (MP) The set X is called the feasible region or the constraint set, the minimum solution or solution, and 6(#) the minimum, All points x in the feasible region X are called feasible points. If X is a convex set, and if @ is convex on X, the minimization problem MP is often called a convex programming problem or convex program. (We observe that the above minimization problem is a special case of the general minimization problem 1.6.9, where the additional k-dimen- sional vector equality constraint h(x) = 0 was also present. The reason for this is that in the absence of differentiability there are no significant optimality criteria for problems with nonlinear equality constraints. Some results for linear equality constraints will be obtained however. See 5.3.2, 5.4.2, and 6.4.8.) The local minimization problem (LMP) Find an Z in X, if it exists, such that for some open ball B;(z) around £ with radius 6 > 0 zt © Bs) VX = 6(x) 2 0(2) (LMP) 10 Saddlepoint Optimality Criteria without Difterentiability a1 The Fritz John saddlepoint problem (FJSP) Find @ © X°, %) GR, # © R*, (Fo,#) > 0, if they exist, such that O(Z,Fo7) S $2,707) S (2,707) for allr = 0, r € R*, and all z € X°, (FISP) (z,ro,7) = ro6(z) + rg(z) The Kuhn-Tucker saddlepoint problem (KTSP) Find ¢ € X°,a € R", a = O, if they exist, such that ¥(G,u) S ¥(G,2) S ¥(z,%) for all u 2 0, u ER”, and all z € X°, (KTSP) ¥(z,u) = (2) + ug(z) Remark If (€,7o,7) is a solution of FJSP and 7 > 0, then (#,7/70) is a solu- tion of KTSP. Conversely, if (,a) is a solution of KTSP, then (4,1,a) is a solution of FJSP. Remark The numerical functions ¢(z,ro,r) and ¥(z,u) defined above are often called Lagrangian functions or simply Lagrangians, and the m-di- mensional vectors # and u Lagrange multipliers or dual variables. These multipliers play a role in linear and nonlinear programming which is very similar to the role played by the Lagrange multipliers of the classical calculus where a function of several variables is to be minimized subject to equality constraints (see for example [Fleming 65]). Here, because we have inequality constraints, the Lagrange multipliers turn out to be nonnegative. When we shall consider equality constraints in 5.3.2, 6.4.2, and 5.4.8, the multiplier associated with these equalities will not be required to be nonnegative. Remark The right inequality of both saddlepoint problems, FJSP 3 and KTSP 4 $(Z,7o7) S ¢(z,7a7) for alla © X® and ¥(G,a) S (zi) ~—for allz © X° m1 as Nonlinear Progremming can be interpreted as a minimum principle, akin to Pontryagin’s maxi- mum principlet [Pontryagin et al. 62]. Pontryagin’s principle in its original form is a necessary optimality condition for the optimal control of systems described by ordinary differential equations. 
As such, it is a necessary optimality condition for a programming problem, not in R*, but in some other space. More recently [Halkin 66, Canon et al. 66, Mangasarian-Fromovitz 67] a minimum principle has also been estab- lished for optimal control problems described by ordinary difference equations. This is a programming problem in R*, which unfortunately is not convex in general, and hence the results of this chapter do not apply. However the optimality conditions of Chaps. 7 and 11, which are based mainly on linearization and not on convexity, do apply to optimal control problems described by nonlinear difference equations. 2. Some basic results for minimization and local minimization problems We establish now some basic results concerning the set of solutions of the minimization problem and relate the solutions of the minimization and local minimization problems to each other. Theorem Let X be a convex set, and let 6 be a convex function on X. The set of solutions of MP 5.1.1 is convea. REMARK A sufficient but not necessary condition for the convexity of X is that X° be a convex set and that g be convex on X°. This follows from 4.1.10 and 3.1.9. PROOF Let z' and z? be solutions of MP. That is, 6(c!) = 6(z*) = min 6(x) 2EX It follows by the convexity of X and @, that forO S \ S 1, (1 — Aja? + Az? E X,and 4[(l — A)z! + dz] S (1 — A)O(z") + AO(z?) = O(z') = min A(z) 2EX Hence (1 — d)z! + Az? is also a solution of MP, and the set of solutions is convex. + Pontryagin gets 3 maximum principle instead of 9 minimum principle because his Lagrangian is the negative of the Lagrangian of nonlinear programming. 7m Saddlepoint Optimality Criteria without Differentiability 5.2 Uniqueness theorem Let X be convex and z be a solution of MP 6.1.1. If 8 ts strictly convex at £, then # is the unique solution of MP. PROOF Let # = @ be another solution of MP, that is, 4 X, and 6(£) = o(@). Since X is convex, then (1 — A)zé + A¢ GC X whenever 0 <2 <1, and by the strict convexity of @ at Z al(1 — A)Z + AZ] < (1 — d)O(Z) + (4) = 0(2) This contradicts the assumption that @(Z) is a minimum, and hence # cannot be another solution. Theorem Let X be convex, and let 6 be a nonconstant concave function on X. Then no interior point of X is a solution of MP 6.1.1, or equivalently any solution & of MP, if it exists, must be a boundary point of X. PROOF If MP 6.1.1 has no solution the theorem is trivially true. Let #be asolution of MP. Since @ is not constant on X, there exists a point x EX such that 6(x) > 0(@). If z is an interior point of X, there exists a point y € X such that for some 4,0 SA <1 z2=(1-—A)r+hy See Fig. 5.2.1. Hence (2) = O[(1 — d)a + Ay] 2 (1 ~ AO) + Oy) > (1 — A)O(E) + dO(z) = 0(@) and @(z) does not attain its minimum at an interior point z. fj Figure 5.2.2 shows a simple example of Theorem 3 in R. Theorem If €%s a solution of MP 6.1.1, then it is also a solution of LMP 5.1.2. The converse is true if X is convex and 6 ts convex at £. Fig. 5.2.1 3 8 Nonlinear Programming Fig. 5.2.2 A simple example of Theorem Sin R. PROOF If < solves MP, then Z solves LMP for any 6 > 0. To prove the converse now, assume that Z solves LMP for some 6 > 0, and let X be convex and 6 be convex at. Let y be any point in X distinct from 2. Since X is convex, (1 — \)é+ Ay E X forO 0, then & is a solution of MP 6.1.1. PROOF The second statement of the theorem follows trivially from the first statement by Remark 5.1.5. Let (4%) be a solution of KTSP 6.1.4.. 
Then for all u 2 0 in R™ 7” Saddlepoint Optimality Criteria without Differentisbility 5s and all z in X° (2) + ug(Z) S 0(@) + ag(Z) S A(z) + g(x) From the first inequality we have that (u ~ a)g(Z) $0 for allu 20 For any j, 1 Sj Sm, let w= fort=1,2,...,7-Ugti,...,m u=aj+1 It follows then that g;(#) $0. Repeating this for all j, we get that g(Z) = 0, and hence Z is a feasible point, that is, # € X. Now since &@ 2 0 and g(2) S 0, we have that ag(Z) $0. But again from the first inequality of the saddlepoint problem we have, by setting u = 0, that ag(z) 2 0. Hence ug(#) = 0. Let z be any point in X, then from the second inequality of the saddlepoint problem we get 6(Z) S 6(z) + tg(z) —_—[since aig(Z) = 0] S 6(x) [since @ = 0, g(x) < 0) Hence @ is a solution of MP. It should be remarked here that because no convexity assumptions were made in the above theorem, equality constraints ean be handled by replacing them by two inequality constraints. That is, replace h(z) = 0 by A(z) < 0 and —A(z) $0. Problem Consider the minimization problem (2) = min 2) EX = (2 |z | X%, gla) SO, h(x) = 0} ze where A is a k-dimensional vector function on X° and all else is defined as in MP 5.1.1. Let $(2,ro7,8) = ro0(x) + rg(x) + sh(z) and ¥(z,u,v) = O(2) + ug(z) + vh(z) Show that if there exist 2 € X°, a © R", i 2 0, 5 © R* such that V(E,uy) S ¥(Ea,0) S ¥(2,7,0) ie allu 2 0,u ER”, allv € R*, and allz € «) CT Be Nonlinear Programming or if there exist £ © X°, 7) ER, 7) > 0,7 CR, F 2 0, § EC R* such that 6(2,70,7,8) S O(G,Fo 7,3) S o(x,70,7,8) for allr 2 0,r € R”, all s © R*, and allc © X° then is a solution of the minimization problem. (Notice that v and s are not restricted in sign.) The question may be raised as to what sort of point is the point @ if (Z,70,7) is a solution of FJSP 5.1.8 and we do not require that 7) > 0. An answer to this question is given by the following result. Corollary If (&,70,7) is a solution of FJSP 5.1.8, then either £ solves MP 5.1.1 or X has no interior relative to g(x) 0, thatis, {x |x © X% g(x) < 0} = 9. PROOF By the same argument as in the proof of Theorem 1 above we show that g(Z) S$ 0 and 7g(z) = 0. Now, if 7) > 0, then % solves MP by Theorem J. If 7) = 0, then 7 > 0 and we have from the second inequality of FJSP 6.1.3 that = 79(@) Sfg(z) foralla @ X® Now, if the set {x |x © X°, g(x) < 0} is nonempty, then for any element # in it 79(@) <0, which contradicts the fact established above that Fg(x) 2 O for allx © X°. Hence {x| 4 © X%, g(x) <0} = 9. 4, Necessary optimality criteria The situation with respect to necessary criteria is considerably more complicated than the situation with respect to sufficient optimality criteria. The two situations are compared in the following table: Necessary criteria Sufficient criteria (a) Convexity needed No convexity needed (b) Consequence of separation theorem of Separation theorem of convex sets not convex sets needed needed {c) Regularity condition (constraint quali- No constraint qualification needed fication) needed in the more important necessary criterion (7 below) 1% Saddlepoint Optimality Criteria without Differentiability 6.4 We begin by establishing a necessary optimality criterion which does not require any regularity conditions. This necessary optimality criterion is similar in spirit to the necessary optimality criterion of Fritz John (John 48] (see also Chap. 7), which was derived for the case where the functions 6 and g were differentiable but not convex. We use no differentiability here, but instead we use convexity. 
The present criterion is a saddle- point criterion, whereas Fritz John’s is a gradient criterion. The main point of similarity is the presence of the multiplier 7s in both criteria. Fritz John saddlepoint necessary optimality theorem [Uzawa 58, Karlin 59] Let X° be a convex set in R, and let 6 and g be convex on X°, If é is a solution of MP 5.1.1, then = and some 7. C R,F ER", (Fo,7) > O solve FISP 5.1.8 and ig(é) = 0. PROOF Because é solves MP 6(z) — 6%) <0 @) @) has no solution c € X® gz) 50 By Corollary 4.2.2 there exist 7. € R, 7 € R*, (70,7) > 0 such that F[O(x) — O(2)] + Fo(x) 2 0 for allz © X° By letting z = @ in the above, we get that 79(@) 2 0. But since 7 = 0 and g(Z) S 0, we also have 7g(z) $ 0. Hence 79(@) = 0 and 7o0(2) + Fg(#) S 7o0(x) + Fo(x) for alla © X® which is the second inequality of FISP 5.1.3. We also have, because g(#) S 0, that rg(z) <0 for ally 2 0,r € R™ and hence, since 7g(#) = 0 F(Z) + rg(Z) S Fo) + 7g(Z) forallr 20,7 © R" which is the first inequality of FJSP 6.1.3. Jj Problem Consider the minimization problem (Z) = min 6(2) EEX = (x|4 EX, g(z) S 0, A(z) = 0} Wm 5a Nonlinear Programming where h is a k-dimensional linear vector function on R*, 6 and g are convex on X°, and all else is defined as in MP 5.1.1. Show that if 7 is asolution of the above problem, then ¢ and some?) € R,# € R*,5 € R*, (FF) 2 0, (F0,7,3) ¥ 0 satisfy Fg(Z) = 0, and $(E,70,7,8) S O(€,70,7,8) S o(2,70,7,3) for allr 2 0,r © R*, alis € R*, and all z € X° $(x,70,7,8) = rob(x) + rg(x) + sh(x) (Hint: Again use Corollary 4.2.2.) It should be remarked here that in the above necessary optimality criteria there is no guarantee that 7) > 0. In cases where 7) = 0 it is intuitively obvious that the necessary optimality criterion FJSP 6.1.3 does not say much about the minimization problem MP 5.1.1, because the function @ has disappeared from 6.1.3 and any other function could have played its role. In order to exclude such cases, we have to introduce some regularity conditions. These regularity conditions are referred to in the literature as constraint qualifications. We shall have occasion to use a number of these constraint qualifications throughout this book. Some of these constraint qualifications (like the three introduced below) make use only of the convexity properties of the functions defining the feasible region X. Other constraint qualifications, to be introduced later, in Chap. 7 for example, make use mostly of the differentiability proper- ties of the functions defining the feasible region X. Slater's constraint qualification [Slater 50] Let X° be a convex set in R". The m-dimensional convex vector function g on X° which defines the convex feasible region X = {x|x EX, g(x) $0} is said to satisfy Slater’s constraint qualification (on X°) if there exists an # € Xsuchthatg(#) < 0. Karlin’s constraint qualification [Karlin 59] Let X° be a convex set in R*. The m-dimensional convex vector function g on X° which defines the convex feasible region X = {x|2 EX, g(x) $0} is said to satisfy Karlin’s constraint qualification (on X°) if there exists no p € R”, p 2 O such that pg(z) 20 = foralla € X° 3 Saddlepoint Optimality Criteria without Differentiability 6.4 The strict constraint qualification Let X° be a convex set in R". The m-dimensional convex vector function g on X° which defines the convex feasible region X = {z| x € X% g(a) $0} is said to satisfy the strict constraint qualification (on X°) if X contains at least two distinct points z' and x? such that g is strictly convex at z'. 
Lemma Slater’s constraint qualification 3 and Karlin's constraint qualifica- tion 4 are equivalent. The strict constraint qualification & implies Slater’s and Karlin’s constraint qualifications 3 and 4. PROOF (84) By Gordan’s generalized theorem 4.2.3, 8 and 4 are equivalent. (6 = 8) Since X° is convex, for any \,0 0, solve FISP 5.1.3 and fg(@) = 0. Fy > 0, then by Remark 5.1.5 we are done. If 7) = 0, then F > 0, and from the second inequality of FJSP 5.1.3 0 S rg(z) for alla € X® [since # = 0 and fg(z) = 0} which contradicts Karlin’s constraint qualification 4. Hence 7) >0. §j We summarize in Fig. 5.4./ the relationships between the solutions of the various problems of this chapter. We end this section by deriving a Kuhn-Tucker saddlepoint neces- sary optimality criterion in the presence of linear equality constraints. In order to do this, we have to let the set X° of MP 5.1.1 be the entire space R*. Kuhn-Tucker saddlepoint necessary optimality theorem in the presence of linear equality constraints [Uzawa 58] Let 0, g be respectively a numerical function and an m-dimensional vector function which are both conver on R", Let h be a k-dimensional linear vector function on R", that is, h(x) = Br — d, where Bisak Xn matrix, and disak-vector. Let € be a solution of the minimization problem 9) = min 6(2) EEX = (z|c£ CR, g(x) $0, Br = d} and let g and h satisfy any of the constraint qualifications: (i) (Generalized Slater 3) g(x) < 0, Br = d has a solution x € R* Fig. 5.4.1 Relationships between the solutions of the local minimization problem (LMP) 6.1.2, the minimization prob- lem (MP) 4.1.1, the Fritz John saddlepoint problem (FJSP) 5.1.3, and the Kuhn-Tucker saddlepoint problem (KTSP) 6.1.4. Baddlepoint Optimality Criteria without Differentiability Ba (ii) (Generalized Karlin 4) There exists no p >0, pCR", g € R* such that pg(z) + q(Br~—~d) 20 — forallz € Re (iii) (Generalized strict 5) X contains at least two distinct points x! and x? such that g is strictly convex at x! Then & and some & € R", & 2 0,5 € R* satisfy uig(Z) = 0, and (Zur) S O(€,2,8) S $(x,0.0) for all u 2 0, u € R*, all v € R*, and ali x € R" o(z,u,v) = 0(z) + ug(z) + v(Bz — d) PROOF We shall first establish the fact that (iii) = (i) = Gi) and then prove the theorem under (ii). [Gii) => ()] Since g(z') S$ 0, g(x?) $ 0, Br' =d, Bx? =d, we have forO <\ < 1 that B[(1 — A)z! + Ax*] = d and g{(l — da? + dz?) < (1 — Agia!) + Ag(z?) $0 Hence {i) holds. [(i) = (ii)) If g(#) < 0 and Bé = d, then for any p = 0, p€ R”, and any g€ R*, pg(z) + g(BE — ad) <0 Hence (ii) holds. We establish now the theorem under (ii). There will be no loss of generality if we assume that the rows B,, ... , By of B are linearly independent, for suppose that some row, B, say, is linearly dependent kal on By... , Br-1, that is By = > siB;, where s;, . . . , S,~1 are fixed i=l real numbers. Then kal kol Ba-d = Y sBa-de= Y sidi— dy isa is for any z satisfying Ba =d;,i=1,...,k—1. But since ZE X, ko and By, =d, i=1,..., k, it follows that > sd;i—d, = 0 and i= Byx —d, =0 for any x satisfying Ba =d, i=1,..., k-1. a1 ba Nonlinear Programming Hence the equality constraint B,z = d, is redundant and can be dropped from the minimization problem without changing the solution @ Then, once we have established the theorem for the linearly independent rows of B, we can reintroduce the linearly dependent row B, (without changing the minimization problem) and set 6, = Oin the saddlepoint problem. 
By 2 above, there exist % CR, FER", SCR, (FF) 20, (¥,7,8) * 0, which satisfy 7g(Z) = 0 and solve the saddlepoint problem of 2. If % > 0, then & = 7/7, 0 = 5/7 solve the saddlepoint problem of the present theorem, and we are done. Suppose 7 = 0. Then since 7g(z) = 0 and Bz — d = 0, we have by the second inequality of the saddlepoint problem of 2 that O S rg(x) + 3(Bx — d) for allz © R" which contradicts (ii) above, if # > 0. Now suppose that 7 = 0, then 3 ~ 0 and 3(Bx — d) 2 Oforallzin R*. Hence (see last part of proof of 4.2.4) B’5 = 0, which contradicts the assumption that the rows of B are linearly independent. Thus f)>0. J Chapter Six Differentiable Convex and Concave Functions in this chapter we give some of the properties of differentiable and twice-differentiable convex and concave functions. Appendix D summarizes the results of differen- tiable and twice-differentiable func- tions which are needed in this chapter. 1. Differentiable convex and concave functions Let @ be a numerical function defined on an open set Tin R". We recall from Appendix D that if 6 is differentiable at € I, then zeR, 6(@ + 2) = @(2) + Voz)z = + a(Z,z) [x1] E+z2EPr lim a(2,2) = 0 290 where V6(z) is the n-dimensional gradient vector of @ at # whose n components are the partial deriva- tives of @ with respect tozi,..., x, evaluated at #, and a is a numeri- cal function of z. Theorem Let @ be a numerical function defined on an open set T C R* and let @ be differentiable ati GT. If 6 ts conver at £ ET, then (x) — o(@) 2 VO(z)(x — 2) for eachz ET If @ is concave at ¢ ET, then O(a) — O(&) S VO(z)(x — #) for each c ET PROOF Tet @ be convex at &. Since I is open, there exists an open ball Ba(Z) around #, which is con- tained in T. Let EF, and let x. Then for some y, such 61 Nonlinear Programming that 0 < 4 < 1 andy < 4/||\z — &||, we have that = 8+ na — &) = (1 — we + wx © BA) CT Since 6 is convex at 2, it follows from 4.1.1, the convexity of Bs(#) (see 3.1.7), and the fact that @ € B,(#), that for0 0, the last three relations give 6(x) — 6() 2 VO(z)(x — 2) The proof for the concave case follows in a similar way to the above by using 4.1.2 instead of 4.7.1. J Theorem Let 6 be a numerical differentiable function on an open convex set ICR. 6%s convex on T tf and only if 6(z?) — 6(x') = V(x!) (x? — x) for each x',x7 ET 6 ts concave on T af and only if 6(z?) — O(z1) S$ VO(x')(2? — 2!) for each v2? ET 84

You might also like