OPTIMALITY CONDITIONS IN CONVEX OPTIMIZATION
A Finite-Dimensional View
Anulekha Dhara
Joydeep Dutta
1.1 Introduction
Optimization is the heart of applied mathematics. Various problems encoun-
tered in the areas of engineering, sciences, management science, and economics
are based on the fundamental idea of mathematical formulation. Optimiza-
tion is an essential tool for the formulation of many such problems expressed
in the form of minimization of a function under certain constraints like in-
equalities, equalities, and/or abstract constraints. It is thus rightly considered
a science of selecting the best of the many possible decisions in a complex
real-life environment.
Even though optimization problems have existed since very early times,
optimization theory settled into a solid and autonomous field only in
recent decades. The origin of analytic optimization lies in the classical calculus
of variations and is interrelated with the development of calculus. The very
concept of derivative introduced by Fermat in the mid-seventeenth century via
the tangent slope to the graph of a function was motivated by solving an op-
timization problem, leading to the Fermat stationary principle. Around 1684,
Leibniz developed a method to distinguish between minima and maxima via
second-order derivatives. The calculus of variations was introduced by Euler
while solving the Brachistochrone problem, which was posed by Bernoulli in
1696. The problem is stated as “Given two points x and y in the vertical plane.
A particle is allowed to move under its own gravity from x to y. What should
be the curve along which the particle should move so as to reach y from x in
the shortest time?” In 1759, Lagrange gave a completely different approach
to solving problems in the calculus of variations, known today as the Lagrange
multiplier rule. The Lagrange multipliers are viewed as the auxiliary variables
that are primarily used to derive the optimality conditions for constrained
optimization problems. These optimality conditions are the building blocks of
optimization theory.
During the Second World War, Dantzig developed the simplex method to
solve linear programming problems. The first attempt to develop the La-
grange multiplier rules for nonlinear optimization problems was made by Fritz
John [71] in 1948. In 1951, Kuhn and Tucker [73] gave the Lagrange multiplier
rule for convex and other nonlinear optimization problems involving differen-
tiable functions. It was later found that Karush in 1939 had independently
established the optimality conditions similar to those of Kuhn and Tucker.
These optimality conditions are today famous as the Karush–Kuhn–Tucker
(KKT) optimality conditions. All the initial theories were developed with the
differentiability assumptions of the functions involved.
Meanwhile, efforts were made to shed the differentiability hypothesis,
thereby leading to the development of nonsmooth convex analysis as a subject
in itself. This added a new chapter to optimization theory. The key contrib-
utors in the development of convexity theory are Fenchel [45], Moreau [88],
and Rockafellar [97]. An important milestone in this direction was the publi-
cation of Convex Analysis by Rockafellar [97], where the theory of nonsmooth
convex analysis was presented in detail for the first time. This text remains
essential reading for optimization researchers. In the early 1970s, his student
Clarke coined the term nonsmooth optimization to categorize the theory
involving nondifferentiable optimization problems. He extended the calculus
rules and applied them to optimization problems involving locally Lipschitz
functions. This was just the beginning. The subsequent decade witnessed a
large development in the field of nonsmooth nonconvex optimization. For de-
tails on nonsmooth analysis, one may refer to Borwein and Lewis [17]; Bor-
wein and Zhu [18]; Clarke [27]; Clarke, Ledyaev, Stern and Wolenski [28];
Mordukhovich [86]; and Rockafellar and Wets [101].
However, such developments have not overshadowed the importance of
convex optimization, which remains a pivotal area of research. It has paved
the way not only for theoretical advances but also for the design of algorithms.
In this book we focus mainly on convex analysis and its application to the
development of convex optimization theory.
C = {x ∈ Rn : gi(x) ≤ 0, i = 1, 2, . . . , m and hj(x) = 0, j = 1, 2, . . . , l},

φ(x) = ⟨a, x⟩ + b,

where a ∈ Rn and b ∈ R.
It is important to note at the very outset that in optimization theory it is
worthwhile to consider extended-valued functions, that is, functions that take
values in R̄ = R∪{−∞, +∞}. The need to do so arises when we seek to convert
a constrained optimization problem into an unconstrained one. Consider for
example the problem (CP), which can be restated as the unconstrained problem

min f0(x) subject to x ∈ Rn,

where

f0(x) = f(x), if x ∈ C, and f0(x) = +∞, otherwise.
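The following small sketch in Python (the objective f and the interval C below are hypothetical choices, not from the book) illustrates how assigning the value +∞ outside C turns the constrained problem into an unconstrained one over the whole space.

```python
import math

def f(x):
    return (x - 3.0) ** 2          # a hypothetical smooth objective

def f0(x, lo=0.0, hi=2.0):
    # f0(x) = f(x) if x lies in C = [lo, hi], and +infinity otherwise
    return f(x) if lo <= x <= hi else math.inf

# Minimizing f0 over all of R (a fine grid standing in for R) recovers
# the constrained minimizer of f over C.
grid = [i / 1000.0 for i in range(-2000, 5001)]
x_best = min(grid, key=f0)
print(x_best, f0(x_best))          # approximately 2.0 and 1.0
```

Since f is decreasing on [0, 2], the constrained minimizer sits at the right endpoint, and the unconstrained minimization of f0 finds exactly that point.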
All the modern books on convex analysis beginning with the classic Convex
Analysis by Rockafellar [97] follow this framework. However, when we include
infinities, we need to know how to deal with them. Most rules with infinity
are intuitively clear except possibly 0 × (+∞) and ∞ − ∞. Because we will
be dealing mainly with minimization problems, we will follow the convention
0 × (+∞) = (+∞) × 0 = 0 and ∞ − ∞ = +∞. This convention was adopted in
Rockafellar and Wets [101] and we shall follow it. However, we would like to
ascertain that we really need not get worried about ∞ − ∞ as the functions
considered in this book are real-valued or proper functions. An extended-
valued function φ : Rn → R̄ is said to be a proper function if φ(x) > −∞ for
every x ∈ Rn and dom φ is nonempty where dom φ = {x ∈ Rn : φ(x) < +∞}
is the domain of φ.
It is worthwhile to note that the definition of a convex function given
above can be extended to the case when φ is an extended-valued function. An
extended-valued function φ : Rn → R̄ is a convex function if for any x, y ∈ Rn
and λ ∈ [0, 1],

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y),

with the convention that ∞ − ∞ = +∞. A better way to handle the convexity
of an extended-valued convex function is to use its associated geometry. In
this direction we describe the epigraph of a function φ : Rn → R̄, which is
given as

epi φ = {(x, α) ∈ Rn × R : φ(x) ≤ α}.
For sets F1, F2, F ⊂ Rn, λ ∈ R, and δ > 0, with B denoting the closed unit ball,

F1 + F2 = {x1 + x2 ∈ Rn : x1 ∈ F1, x2 ∈ F2},
λF = {λx ∈ Rn : x ∈ F},
Bδ(x̄) = x̄ + δB.
(Proposition 1.1, Cauchy–Schwarz Inequality) For x, y ∈ Rn, |⟨x, y⟩| ≤ ‖x‖ ‖y‖.
The above inequality holds as equality if and only if x = αy for some scalar
α ∈ R.
A sequence {xk} ⊂ Rn converges to x̄ ∈ Rn if every component sequence of
{xk} converges to the corresponding component of x̄. The vector x̄ is called
the limit of {xk}. Symbolically it is expressed as

xk → x̄ or lim_{k→∞} xk = x̄.
For a sequence {xk} ⊂ R, define zr = inf_{k≥r} xk and yr = sup_{k≥r} xk.
It is obvious that the sequences {zr} and {yr} are nondecreasing and non-
increasing, respectively. If {xk} is bounded below or bounded above, the se-
quences {zr} or {yr}, respectively, have a limit. The limit of {zr} is called
the limit infimum or lower limit of {xk} and denoted by lim inf_{k→∞} xk, while
that of {yr} is called the limit supremum or upper limit of {xk} and denoted
by lim sup_{k→∞} xk. Equivalently,

lim inf_{k→∞} xk = sup_{r} inf_{k≥r} xk and lim sup_{k→∞} xk = inf_{r} sup_{k≥r} xk.

For a sequence {xk}, lim inf_{k→∞} xk = −∞ if the sequence is unbounded below,
while lim sup_{k→∞} xk = +∞ if the sequence is unbounded above. Therefore,
{xk} converges to x̄ if and only if

lim inf_{k→∞} xk = lim sup_{k→∞} xk = x̄.

A function φ : Rn → R̄ is said to be lower semicontinuous (lsc) at x̄ ∈ Rn if

φ(x̄) ≤ lim inf_{x→x̄} φ(x),

where the term on the right-hand side of the inequality denotes the limit
infimum or the lower limit of the function φ defined as

lim inf_{x→x̄} φ(x) = lim_{δ↓0} inf_{x∈Bδ(x̄)} φ(x).

Similarly, φ is upper semicontinuous (usc) at x̄ if

φ(x̄) ≥ lim sup_{x→x̄} φ(x),

where the term on the right-hand side of the inequality denotes the limit
supremum or the upper limit of the function φ defined as

lim sup_{x→x̄} φ(x) = lim_{δ↓0} sup_{x∈Bδ(x̄)} φ(x).

The function φ is continuous at x̄ if it is both lsc and usc at x̄.
Alternatively, φ is continuous at x̄ if for any ε > 0 there exists δ(ε, x̄) > 0
such that

|φ(x) − φ(x̄)| ≤ ε whenever ‖x − x̄‖ < δ(ε, x̄).
The function φ is continuous over a set F ⊂ Rn if φ is continuous at every
x̄ ∈ F .
The next result from Rockafellar and Wets [101] gives a characterization
of limit infimum of an arbitrary extended-valued function.
Proof. Suppose that lim inf_{x→x̄} φ(x) = ᾱ. We claim that for xk → x̄ with
φ(xk) → α, α ≥ ᾱ. As xk → x̄, for any δ > 0, there exists kδ ∈ N such that
xk ∈ Bδ(x̄) for every k ≥ kδ. Therefore,

inf_{x∈Bδ(x̄)} φ(x) ≤ φ(xk), ∀ k ≥ kδ.

Because δ is arbitrarily chosen, taking the limit δ ↓ 0 along with the definition
of the limit infimum of φ leads to

ᾱ ≤ α,

which reduces the preceding condition to φ(x̄) ≤ ᾱ, thereby proving that epi φ
is a closed set in Rn × R.
Next we show that (ii) implies (iii). For a fixed α ∈ R, suppose
that {xk } ⊂ lev≤α φ such that xk → x̄. Therefore, φ(xk ) ≤ α, that is,
(xk , α) ∈ epi φ. By (ii), epi φ is closed, which implies (x̄, α) ∈ epi φ, that
is, φ(x̄) ≤ α. Thus, x̄ ∈ lev≤α φ, thereby yielding condition (iii).
Finally, to obtain the equivalence, we will establish that (iii) implies (i).
To show that φ is lsc, we need to show that for every x̄ ∈ Rn ,
On the contrary, assume that for some x̄ ∈ Rn and some sequence xk → x̄,

lim inf_{k→∞} φ(xk) < α < φ(x̄) for some α ∈ R. (1.1)

Thus, there exists a subsequence, without relabeling, say {xk}, such that
φ(xk ) ≤ α for every k ∈ N, which implies xk ∈ lev≤α φ. By (iii), the lower
level set lev≤α φ is closed and hence x̄ ∈ lev≤α φ, that is, φ(x̄) ≤ α, which
contradicts (1.1). Therefore, φ is lsc over Rn .
The preceding proof by contradiction of the last implication, that is, (iii)
implies (i) of Theorem 1.9, is from Bertsekas [12]. We present an alternative
proof of the same implication from Rockafellar and Wets [101].
It is obvious that for any x̄ ∈ Rn ,
Proof. Suppose that (x̄, ᾱ) ∈ cl epi φ, which implies that there exists
{(xk, αk)} ⊂ epi φ such that (xk, αk) → (x̄, ᾱ). Thus, taking the limit as
k → +∞, the condition

φ(xk) ≤ αk, ∀ k ∈ N,

yields

lim inf_{x→x̄} φ(x) ≤ ᾱ,

as desired.
Conversely, assume that lim inf_{x→x̄} φ(x) ≤ ᾱ but (x̄, ᾱ) ∉ cl epi φ.
We claim that lim inf_{x→x̄} φ(x) = ᾱ. On the contrary, suppose that
lim inf_{x→x̄} φ(x) < ᾱ. As (x̄, ᾱ) ∉ cl epi φ, there exists δ̄ > 0 such that for
every δ ∈ (0, δ̄),

Bδ((x̄, ᾱ)) ∩ epi φ = ∅,

which implies for every (x, α) ∈ Bδ((x̄, ᾱ)), φ(x) > α. In particular, for
(x, ᾱ) ∈ Bδ((x̄, ᾱ)), φ(x) > ᾱ, that is,

inf_{x∈Bδ(x̄)} φ(x) ≥ ᾱ.

Therefore, taking the limit as δ ↓ 0 along with the definition of limit infimum
of a function yields

lim inf_{x→x̄} φ(x) ≥ ᾱ,

which is a contradiction. Therefore, lim inf_{x→x̄} φ(x) = ᾱ. By Lemma 1.8, there
exists a sequence xk → x̄ such that φ(xk) → ᾱ. Because (xk, φ(xk)) ∈ epi φ,
(x̄, ᾱ) ∈ cl epi φ, thereby reaching a contradiction and hence the result.
Now the question is whether it is possible to construct a function that is
the closure of the epigraph of another function. This leads to the concept of
closure of a function.
Definition 1.11 For any function φ : Rn → R̄, an lsc function that is con-
structed in such a way that its epigraph is the closure of the epigraph of φ is
called the lower semicontinuous hull or the closure of the function φ and is
denoted by cl φ. Therefore,
epi(cl φ) = cl epi φ.
FIGURE 1.1: epi φ and epi cl φ.
If φ is lsc, then it is closed as well. Also cl φ is lsc and the greatest of all
lsc functions ψ such that ψ(x) ≤ φ(x) for every x ∈ Rn . From Theorem 1.9,
one has that closedness is the same as lower semicontinuity over Rn . In this
discussion, the function φ was defined over Rn. But what if φ is defined only
over some subset of Rn? Then one cannot talk about the lower semicontinuity
of the function over Rn. In such a case, how is the closedness of a function related
to lower semicontinuity? This issue was addressed by Bertsekas [12]. Consider
a set F ⊂ Rn and a function φ : F → R̄. Observe that here we define φ over
the set F and not Rn . The function φ can be extended over Rn by defining a
function φ̄ : Rn → R̄ as
φ̄(x) = φ(x), if x ∈ F, and φ̄(x) = +∞, otherwise.
Note that both the extended-valued functions φ and φ̄ have the same epigraph.
Thus from the above discussion, one has φ is closed if and only if φ̄ is lsc over
Rn . Also observe that the lower semicontinuity of φ over dom φ is not sufficient
for φ to be closed. In addition, one has to assume the closedness of dom φ.
To emphasize this fact, let us consider a simple example. Consider φ : R → R̄
defined as
φ(x) = 0, if x ∈ (−1, 1), and φ(x) = +∞, otherwise.
Here, dom φ = (−1, 1) over which the function is lsc but epi φ is not closed
and hence, φ is not closed. The closure of φ is given by
cl φ(x) = 0, if x ∈ [−1, 1], and cl φ(x) = +∞, otherwise.
Observe that in Figure 1.1, epi φ is not closed while epi cl φ is closed. There-
fore, we have the following result from Bertsekas [12].
Observe that for a coercive function, every nonempty lower level set is
bounded. Below we prove the Weierstrass Theorem: for a proper lsc function
φ : Rn → R̄, the set of minimizers of φ over Rn is nonempty and compact
provided any one of the following conditions holds:

(i) dom φ is bounded;

(ii) there exists α ∈ R such that the lower level set lev≤α φ is nonempty and
bounded;

(iii) φ is coercive.
Proof. Suppose that condition (i) holds, that is, dom φ is bounded. Because
φ is proper, φ(x) > −∞ for every x ∈ Rn and dom φ is nonempty. Denote
φinf = inf x∈Rn φ(x), which implies φinf = inf x∈dom φ φ(x). Therefore, there
exists a sequence {xk } ⊂ dom φ such that φ(xk ) → φinf . Because dom φ is
bounded, {xk } is a bounded sequence, which by Bolzano–Weierstrass Theo-
rem, Proposition 1.3, has a convergent subsequence. Without loss of generality,
assume that xk → x̄. By the lower semicontinuity of φ, φ(x̄) ≤ lim inf_{k→∞} φ(xk) = φinf,
so that x̄ is a point of minimizer of φ over Rn.
A function φ : Rn → R is said to be differentiable at x̄ if there exists a vector
∇φ(x̄) ∈ Rn such that

φ(x) = φ(x̄) + ⟨∇φ(x̄), x − x̄⟩ + o(‖x − x̄‖), where lim_{x→x̄} o(‖x − x̄‖)/‖x − x̄‖ = 0.

A function φ is differentiable if it is differentiable at every x ∈ Rn. The
derivative, ∇φ(x̄), of φ at x̄ is also called the gradient of φ at x̄, which can
be expressed as

∇φ(x̄) = (∂φ/∂x1 (x̄), ∂φ/∂x2 (x̄), . . . , ∂φ/∂xn (x̄)),

where ∂φ/∂xi, i = 1, 2, . . . , n, denotes the i-th partial derivative of φ. If φ is
continuously differentiable, that is, the map x ↦ ∇φ(x) is continuous over
Rn , then φ is called a smooth function. If φ is not smooth, it is called a
nonsmooth function.
Similar to the first-order differentiability, we have the second-order differ-
entiability notion as follows.
which is equivalent to

φ(x) = φ(x̄) + ⟨∇φ(x̄), x − x̄⟩ + (1/2)⟨∇²φ(x̄)(x − x̄), x − x̄⟩ + o(‖x − x̄‖²).

The matrix ∇²φ(x̄) is also referred to as the Hessian, with the ij-th entry of
the matrix being the second-order partial derivative ∂²φ/∂xi∂xj (x̄). If φ is twice
continuously differentiable, then the matrix ∇²φ(x̄) is a symmetric matrix.
In the above definitions we considered the function φ to be a scalar-valued
function. Next we define the notion of differentiability for a vector-valued
function Φ.
with the ij-th entry of the matrix being the partial derivative ∂φi/∂xj (x̄). In
the above expression of JΦ(x̄), the vectors ∇φ1(x̄), ∇φ2(x̄), . . . , ∇φm(x̄) are
written as row vectors.
Observe that the derivative is a local concept and it is defined at a point
x if x ∈ int dom φ. Below we state the Mean Value Theorem, which plays a
pivotal role in the study of optimality conditions.
With all these basic concepts we now move on to the study of convexity.
The importance of convexity in optimization stems from the fact that when-
ever we minimize a convex function over a convex set, every local minimum
is a global minimum. Many other issues in optimization depend on convexity.
However, convex functions suffer from the drawback that they need not be
differentiable at every point of their domain of definition and the nondiffer-
entiability may be precisely at the point where the minimum is achieved. For
instance, consider the minimization of the absolute value function, |x|, over R.
At the point of minimum, x̄ = 0, the function is nondifferentiable. How this ma-
jor difficulty was overcome by the development of a completely different type
of analysis is possibly one of the most thrilling developments in optimization
theory. This analysis depends on set-valued maps, which we briefly present
below.
Definition 1.19 A set-valued map Φ from Rn to Rm associates every x ∈ Rn
to a set in Rm ; that is, for every x ∈ Rn , Φ(x) ⊂ Rm . Symbolically it is
expressed as Φ : Rn ⇉ Rm . A set-valued map is associated with its graph
defined as
gph Φ = {(x, y) ∈ Rn × Rm : y ∈ Φ(x)}.
Φ is said to be a proper map if there exists x ∈ Rn such that Φ(x) ≠ ∅. Φ is said
to be closed-valued or convex-valued or bounded-valued if for every x ∈ Rn, the
sets Φ(x) are closed or convex or bounded, respectively. Φ is locally bounded
at x̄ ∈ Rn if there exist δ > 0 and a bounded set V ⊂ Rm such that

Φ(x) ⊂ V, ∀ x ∈ Bδ(x̄).
The set-valued map Φ is said to be closed if it has a closed graph; that is, for
any sequence {xk } ⊂ Rn with xk → x̄ and yk ∈ Φ(xk ) with yk → ȳ, ȳ ∈ Φ(x̄).
A set-valued map Φ : Rn ⇉ Rm is said to be upper semicontinuous (usc) at
x̄ ∈ Rn if for any ε > 0, there exists δ > 0 such that

Φ(x) ⊂ Φ(x̄) + εB, ∀ x ∈ Bδ(x̄),
where the balls are in the respective spaces. If Φ is locally bounded and has a
closed graph, then it is usc. If Φ is single-valued, that is, Φ(x) is a singleton
for every x, the upper semicontinuity of Φ coincides with continuity.
For more on set-valued maps, the readers may refer to Berge [10]. A
detailed analysis of convex functions appears in Chapter 2.
FIGURE 1.2: For a convex function f : R → R, the tangent line at any point
lies below the graph.
The result is obtained by simply multiplying (1.3) with λ and (1.4) with
(1 − λ) and then adding them up. This description geometrically means that
the tangent plane should always lie below the graph of the function. For a
convex function f : R → R, it looks something like Figure 1.2. This important
characterization of a convex function leads to the following result.
that is,
f (x) ≥ f (x̄), ∀ x ∈ C,
Remark 1.21 Expressing the optimality condition in the form of (1.5) leads
to what is called a variational inequality. Let F : Rn → Rn be a given function
and C be a closed convex set in Rn . Then the variational inequality V I(F, C)
is the problem of finding x̄ ∈ C such that
⟨F(x̄), x − x̄⟩ ≥ 0, ∀ x ∈ C.
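As a small illustration (a sketch, not the book's material): when C is simple enough to project onto, x̄ solves VI(F, C) exactly when x̄ is a fixed point of x ↦ proj_C(x − γF(x)) for any γ > 0, which suggests the following projection iteration. The data A, b and the box C below are hypothetical.

```python
import numpy as np

lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])   # C = [0,1]^2
A = np.array([[2.0, 0.5], [0.5, 1.0]])                # positive definite, so F is strongly monotone
b = np.array([-1.0, 2.0])

def F(x):
    return A @ x + b

def proj_C(x):
    return np.clip(x, lo, hi)       # projection onto a box is coordinatewise clipping

x = np.zeros(2)
gamma = 0.2
for _ in range(500):
    x = proj_C(x - gamma * F(x))    # fixed-point (projection) iteration

# At a solution, <F(x), y - x> >= 0 for all y in C; since this expression
# is affine in y, it suffices to check the vertices of the box.
verts = [np.array(v, dtype=float) for v in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(x, min(float(F(x) @ (v - x)) for v in verts))   # second value >= 0 up to tolerance
```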
FIGURE 1.3: For x ∈ C and λ ∈ (0, λ0), the point λx + (1 − λ)x̄ lies in
C ∩ Bδ(x̄).
⟨F(y) − F(x), y − x⟩ ≥ 0.

However, when f is a convex function, one has the following pleasant property.
For proof, see Rockafellar [97]. However, the reader should try to prove it on
his/her own. We have shown that when (CP ) has a smooth f , one can write
down a necessary and sufficient condition for a point x̄ ∈ C to be a global
minimizer of (CP ). In fact, as already mentioned, the importance of studying
convexity in optimization stems from the following fact. For the problem (CP ),
every local minimizer is a global minimizer irrespective of the fact whether f
is smooth or not. This can be proved in a simple way as follows. If x̄ is a local
minimizer of (CP ), then there exists δ > 0 such that
f(x) ≥ f(x̄), ∀ x ∈ C ∩ Bδ(x̄).
Now consider any x ∈ C. Then it is easy to observe from Figure 1.3 that there
exists λ0 ∈ (0, 1) such that for every λ ∈ (0, λ0 ),
λx + (1 − λ)x̄ ∈ C ∩ Bδ(x̄).
Hence

f(x̄) ≤ f(λx + (1 − λ)x̄) ≤ λf(x) + (1 − λ)f(x̄),

which yields f(x) ≥ f(x̄) for every x ∈ C, that is, x̄ is a global minimizer of (CP).

A useful merit function for (CP) is the gap function θ(x) = sup_{y∈C} ⟨∇f(x), x − y⟩,
which satisfies

θ(x) ≥ 0, ∀ x ∈ C,

and θ(x) = 0 for x ∈ C if and only if x solves the optimality condition (1.5).
Now suppose that f is strongly convex with modulus ρ > 0, that is, f − ρ‖·‖²
is convex. Then for every x, y ∈ Rn,

⟨∇f(y) − ∇f(x), y − x⟩ ≥ 2ρ‖y − x‖². (1.6)
The property of ∇f given by (1.6) is called strong monotonicity with 2ρ as the
modulus of monotonicity. It is in fact interesting to observe that if f : Rn → R
is a differentiable function for which there exists ρ > 0 such that for every
x, y ∈ Rn,

⟨∇f(y) − ∇f(x), y − x⟩ ≥ ρ‖y − x‖²,

then we request the reader to show that f is strongly convex with modulus
ρ > 0. In fact, if f is strongly convex with ρ > 0 one can also show that ∇f
is strongly monotone with ρ > 0. Thus we conclude that f is strongly convex
with modulus of strong convexity ρ > 0 if and only if ∇f is strongly monotone
with modulus of monotonicity ρ > 0.
It is important to note that one cannot guarantee θ to be finite unless C
has some additional conditions, for example, C is compact. Assume that C
is compact and let x̄ be a solution of the problem (CP ), where f is strongly
convex. (Think why a solution should exist.) Now as f is strongly convex, it
is simple enough to see that x̄ is the unique solution of (CP ). Thus from the
definition of θ, for any x ∈ C and y = x̄,

θ(x) ≥ ⟨∇f(x), x − x̄⟩,

thereby yielding, by the strong monotonicity (1.6) and the optimality of x̄,

θ(x) ≥ ⟨∇f(x̄), x − x̄⟩ + 2ρ‖x − x̄‖² ≥ 2ρ‖x − x̄‖²,

which leads to

‖x − x̄‖ ≤ √(θ(x)/(2ρ)).
This provides an error bound for (CP), where f is strongly convex and C is
compact. If in this derivation ∇f were only strongly monotone with modulus
ρ > 0, then the error bound would take the form

‖x − x̄‖ ≤ √(θ(x)/ρ),

and hence the bound obtained from the strong monotonicity of ∇f with
modulus 2ρ is the sharper of the two.
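A quick numerical sanity check of the bound (a sketch with hypothetical data, not from the text): take f(x) = ρ‖x‖², which is strongly convex with modulus ρ, over a compact box C containing the minimizer x̄ = 0, and compare ‖x − x̄‖ with √(θ(x)/(2ρ)).

```python
import numpy as np

rho = 1.0
lo, hi = -1.0, 2.0                       # C = [-1, 2]^2, compact and convex

def grad_f(x):
    return 2.0 * rho * x                 # gradient of f(x) = rho * ||x||^2

def theta(x):
    # theta(x) = sup_{y in C} <grad f(x), x - y>; over a box the supremum
    # is attained coordinatewise at y_i = lo if g_i >= 0, else y_i = hi.
    g = grad_f(x)
    y = np.where(g >= 0, lo, hi)
    return float(g @ (x - y))

x_bar = np.zeros(2)                      # the unique minimizer of f over C
rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.uniform(lo, hi, size=2)
    bound = np.sqrt(theta(x) / (2.0 * rho))
    print(np.linalg.norm(x - x_bar) <= bound + 1e-12)   # True in every trial
```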
Now the question is whether one can design a merit function for (CP) that can
be used to develop an error bound even when C is noncompact. Such a merit
function was first developed by Fukushima [48] for general variational inequalities.
In our context, the function given by

θ̂α(x) = sup_{y∈C} { ⟨∇f(x), x − y⟩ − (α/2)‖y − x‖² }, α > 0,

serves the purpose. Indeed,

θ̂α(x) ≥ 0, ∀ x ∈ C,

and θ̂α(x) = 0 for x ∈ C if and only if x is a solution of (CP). Observe that

θ̂α(x) = − inf_{y∈C} { ⟨∇f(x), y − x⟩ + (α/2)‖y − x‖² }, α > 0.
For a fixed x, observe that the function

φ_x^α(y) = ⟨∇f(x), y − x⟩ + (α/2)‖y − x‖²

is strongly convex and coercive (Definition 1.13). Hence φ_x^α attains its infimum
over C, and the point of minimum is unique as φ_x^α is strongly convex. Hence for
each x, the function φ_x^α has a finite minimum value. Thus θ̂α(x) is always
finite, thereby leading to the following error bound.
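Because the inner problem defining θ̂α is a strongly concave quadratic in y, its maximizer has the closed form y* = proj_C(x − ∇f(x)/α) whenever C is easy to project onto. The sketch below evaluates θ̂α this way; the objective and the box C are hypothetical illustration data.

```python
import numpy as np

lo, hi = np.array([-1.0, -1.0]), np.array([1.0, 1.0])   # C = [-1,1]^2
alpha = 1.0

def grad_f(x):
    # gradient of the hypothetical objective f(x) = (x1 - 2)^2 + (x2 + 3)^2
    return np.array([2.0 * (x[0] - 2.0), 2.0 * (x[1] + 3.0)])

def proj_C(x):
    return np.clip(x, lo, hi)

def theta_hat(x):
    g = grad_f(x)
    y = proj_C(x - g / alpha)           # the unique maximizer of the inner problem
    return float(g @ (x - y) - 0.5 * alpha * np.dot(y - x, y - x))

print(theta_hat(np.array([0.5, 0.0])))    # positive away from the solution
print(theta_hat(np.array([1.0, -1.0])))   # 0 at the solution (1, -1) of (CP)
```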
as desired.
The reader is urged to show that under the hypothesis of the above theorem,
one can prove a tighter error bound of the form

‖x − x̄‖ ≤ √(2θ̂α(x)/(4ρ − α)).
2.1 Introduction
With the basic concepts discussed in the previous chapter, we devote this
chapter to the study of concepts related to convex analysis. Convex analysis
is the branch of mathematics that studies convex objects, namely convex
sets, convex functions, and convex optimization theory. These concepts will be
used in the subsequent chapters to discuss the details of convex optimization
theory and in the development of the book.
(λ1 + λ2 )F ⊂ λ1 F + λ2 F. (2.1)
(λ1 + λ2 )F ⊃ λ1 F + λ2 F,
which along with the inclusion (2.1) yields the desired equality. Observe that
(ii) and (iii) lead to the convexity of (λ1 + λ2 )F = λ1 F + λ2 F .
From Proposition 2.3, it is obvious that intersection of finitely many closed
half spaces is again a convex set. Such sets that can be expressed in this
form are called polyhedral sets. These sets play an important role in linear
programming problems. We will deal with polyhedral sets later in this chapter.
However, unlike the intersection and sum of convex sets, the union as well
as the complement of convex sets need not be convex. For instance, consider
the sets
Observe from Figure 2.2 that both F1 and F2 along with their intersection are
convex sets but neither their complements nor the union of these two sets is
convex.
To overcome such situations where nonconvex sets come into the picture
in convex analysis, one has to convexify the nonconvex sets. This leads to the
notion of convex combination and convex hull.
x = λ1x1 + λ2x2 + . . . + λmxm

with λi ≥ 0, i = 1, 2, . . . , m, and Σ_{i=1}^m λi = 1.
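As a small numerical illustration of the definition of convex combination (a sketch; the disk below is an illustrative stand-in, not from the book), random convex combinations of points of a convex set never leave the set:

```python
import numpy as np

rng = np.random.default_rng(1)

def random_point_in_disk():
    # rejection sampling from the closed unit disk, a convex set
    while True:
        p = rng.uniform(-1.0, 1.0, size=2)
        if np.linalg.norm(p) <= 1.0:
            return p

for _ in range(1000):
    pts = np.array([random_point_in_disk() for _ in range(5)])
    lam = rng.random(5)
    lam /= lam.sum()                  # lambda_i >= 0 and summing to 1
    x = lam @ pts                     # the convex combination
    assert np.linalg.norm(x) <= 1.0 + 1e-12
print("all convex combinations stayed inside the disk")
```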
FIGURE 2.2: F1 , F2 , and F1 ∩ F2 are convex while F1c , F2c , and F1 ∪ F2 are
nonconvex.
The next result expresses the concept of convex set in terms of the convex
combination of its elements.
Theorem 2.5 A set F ⊂ Rn is convex if and only if it contains all the convex
combinations of its elements.
x = λ1 x1 + λ2 x2 + . . . + λl xl ,
x = λj xj + (1 − λj )x̃.
Definition 2.6 The convex hull of a set F ⊂ Rn is the smallest convex set
containing F and is denoted by co F . It is basically nothing but the intersection
of all the convex sets containing F .
co F ⊃ F. (2.4)
To establish the result, we will show that F is also convex. Suppose that
x, x̃ ∈ F, which implies there exist xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, with
Σ_{i=1}^m λi = 1 and x̃i ∈ F, λ̃i ≥ 0, i = 1, 2, . . . , l, with Σ_{i=1}^l λ̃i = 1 such that

x = λ1x1 + λ2x2 + . . . + λmxm,
x̃ = λ̃1x̃1 + λ̃2x̃2 + . . . + λ̃lx̃l.
(1 − λ)x + λx̃ ∈ F.
x = λ1 x1 + λ2 x2 + . . . + λm xm .
Assume that m > (n + 1). We will prove that x can be expressed as a convex
combination of (m − 1) elements. The result can be established by applying
and

x = Σ_{i=1}^m λixi = Σ_{i=1}^m λ̃ixi + γ Σ_{i=1}^m αixi = Σ_{i=1, i≠j}^m λ̃ixi,
z = Σ_{i=1}^{n+1} λixi.
Therefore, the boundedness of F along with the fact that λi ∈ [0, 1],
i = 1, 2, . . . , n + 1, yields that

‖z‖ = ‖Σ_{i=1}^{n+1} λixi‖ ≤ Σ_{i=1}^{n+1} λi‖xi‖ ≤ (n + 1)M.
x = λ1x1 + λ2x2 + . . . + λmxm

with λi ∈ R, i = 1, 2, . . . , m, and Σ_{i=1}^m λi = 1.
Definition 2.11 The affine hull of a set F ⊂ Rn is the smallest affine set
containing F and is denoted by af f F . It consists of all affine combinations
of the elements of F .
The notion of interior suffers from a drawback that even for a nonempty
convex set, it may turn out to be empty. For example, consider a line in R2 .
From the above definition, it is obvious that the interior is empty. But the
set of interior points relative to the affine hull of the set is nonempty. This
motivates us to introduce the notion of relative interior.
but ri F1 ⊂ ri F2 need not hold. For instance, consider F1 = {(0, 0)} and
F2 = {(0, y) ∈ R2 : y ≥ 0}. Here F1 ⊂ F2 with ri F1 = {(0, 0)} and
ri F2 = {(0, y) ∈ R2 : y > 0}. Here the relative interiors are nonempty and
disjoint.
Next we present some properties of closure and relative interior of convex
sets. The proofs are from Bertsekas [11, 12] and Rockafellar [97].
(i) ri F is nonempty.
(ii) (Line Segment Principle) Let x ∈ ri F and y ∈ cl F . Then for λ ∈ [0, 1),
(1 − λ)x + λy ∈ ri F.
x + (γ − 1)(x − y) ∈ F.
(iv) ri F and cl F are convex sets with the same affine hulls as that of F .
Proof. (i) Without loss of generality assume that 0 ∈ F. Then the affine hull
of F, aff F, is a subspace containing F. Denote the dimension of aff F by
m. If m = 0, then F as well as aff F consist of a single point and hence ri F
is the point itself, thus proving the result.

Suppose that m > 0. Then one can always find linearly independent elements
x1, x2, . . . , xm from F such that aff F = span{x1, x2, . . . , xm}, that is,
x1, x2, . . . , xm form a basis of the subspace aff F. If this were not possible, then
there would exist linearly independent elements y1, y2, . . . , yl with l < m from F
such that F ⊂ span{y1, y2, . . . , yl}, thereby contradicting the fact that the dimension
of aff F is m. Observe that co {0, x1, x2, . . . , xm} ⊂ F has a nonempty
interior with respect to aff F, which implies co {0, x1, x2, . . . , xm} ⊂ ri F,
thereby yielding that ri F is nonempty.
(ii) Suppose that y ∈ cl F, which implies there exists {yk} ⊂ F such that
yk → y. As x ∈ ri F, there exists ε > 0 such that Bε(x) ∩ aff F ⊂ F. For
λ ∈ [0, 1), define yλ = (1 − λ)x + λy and yk,λ = (1 − λ)x + λyk. Therefore, from
Figure 2.3, it is obvious that each point of B(1−λ)ε(yk,λ) ∩ aff F is a convex
combination of yk and some point from Bε(x) ∩ aff F. By the convexity of F,

B(1−λ)ε(yk,λ) ∩ aff F ⊂ F, ∀ k ∈ N.
FIGURE 2.3: The Line Segment Principle.
which implies
Thus, S can be considered a copy of Rm. From this viewpoint, one can simply
consider the case when F ⊂ Rn is an n-dimensional set, which implies that
ri F = int F. We will now establish the result for int F instead of ri F.
Because y ∈ cl F ,
y ∈ F + εB, ∀ ε > 0.
y = x + (γ − 1)(x − x̃) ∈ F.
Therefore, for λ = 1/γ ∈ (0, 1),

x = (1 − λ)x̃ + λy,
which by the fact that y ∈ F ⊂ cl F along with the line segment principle (ii)
implies that x ∈ ri F , thereby establishing the result.
(iv) Because ri F ⊂ cl F, by (ii) we have that ri F is convex. From (i), we know
that there exist x1, x2, . . . , xm ∈ F such that aff {x1, x2, . . . , xm} = aff F
and co {0, x1, x2, . . . , xm} ⊂ ri F. Therefore, ri F has the same affine hull
as F.
By Proposition 2.3, F + εB is convex for every ε > 0. Also, as the intersection of
convex sets is convex, cl F, which is the intersection of the collection of the sets
F + εB over ε > 0, is convex. Because F ⊂ aff F, cl F ⊂ cl aff F = aff F,
and as F ⊂ cl F, aff F ⊂ aff cl F, which together imply that the affine
hull of cl F coincides with aff F.
In the result below we discuss the closure and relative interior operations.
The proofs are from Bertsekas [12] and Rockafellar [97].
(i) cl(ri F ) = cl F .
(ii) ri(cl F ) = ri F .
(iii) ri F1 ∩ ri F2 ⊂ ri (F1 ∩ F2 ) and cl (F1 ∩ F2 ) ⊂ cl F1 ∩ cl F2 .
In addition, if ri F1 ∩ ri F2 ≠ ∅,
cl F1 + cl F2 = cl (F1 + F2 )
cl(ri F ) ⊂ cl F.
(1 − λ)x + λy ∈ ri F.
y = x + (γ − 1)(x − x̃) ∈ cl F.
1
Therefore, for λ = ∈ (0, 1),
γ
x = (1 − λ)x̃ + λy,
which by the Line Segment Principle, Proposition 2.14 (ii), implies that
x ∈ ri F , thereby leading to the requisite result.
x + (γi − 1)(x − y) ∈ Fi , i = 1, 2.
x + (γ − 1)(x − y) ∈ F1 ∩ F2 ,
ri F1 ∩ ri F2 ⊂ ri (F1 ∩ F2 ).
(1 − λ)x + λy ∈ ri F1 ∩ ri F2 .
cl F1 ∩ cl F2 = cl (F1 ∩ F2 ).
cl (ri F1 ∩ ri F2 ) = cl (F1 ∩ F2 ).
ri F1 ∩ ri F2 ⊃ ri (F1 ∩ F2 ),
(1 − γ)ỹ + γ x̃ ∈ F,
(1 − γ)ȳ + γ x̄ ∈ LF.
Note that for the equality part in Proposition 2.15 (iii), the nonemptiness
of ri F1 ∩ ri F2 is required; otherwise the equality need not hold. We present
an example from Bertsekas [12] to illustrate this fact. Consider the sets

F1 = {x ∈ R : x ≥ 0} and F2 = {x ∈ R : x ≤ 0}.

Here ri F1 ∩ ri F2 = ∅, whereas ri (F1 ∩ F2) = ri {0} = {0}, so the equality
in (iii) fails. Closedness can fail for sums in a similar way: there exist closed
unbounded convex sets F1, F2 ⊂ R² whose sum

F1 + F2 = {(x1, x2) ∈ R² : x1 > 0}

is not closed.
ri F1 ⊂ F2 ⊂ cl F1 .
(ii) Consider a convex set F ⊂ Rn . Then any open set that meets cl F also
meets ri F .
(iii) Consider a convex set F ⊂ Rn and an affine set H ⊂ Rn containing a
point from ri F . Then
ri (F ∩ H) = ri F ∩ H and cl(F ∩ H) = cl F ∩ H.
ri F1 = ri F2 ⊂ F2 ⊂ cl F2 = cl F1 ,
x ∈ O ∩ cl(ri F ).
xk ∈ x + εB ⊂ O,
which along with the fact that xk ∈ ri F implies that O also meets ri F ,
hence proving the result.
(iii) Observe that for an affine set H, ri H = H = cl H. Therefore, by the
given hypothesis,
ri F ∩ H = ri F ∩ ri H 6= ∅.
Note that for a nonconvex set F, cone F may or may not be convex. For
example, consider F = {(1, 1), (2, 2)}. Here,

cone F = {λ(1, 1) ∈ R² : λ ≥ 0},

which is convex. Now consider F = {(−1, 1), (1, 1)}. Observe that the cone
generated by F comprises two rays, that is,

cone F = {λ(−1, 1) ∈ R² : λ ≥ 0} ∪ {λ(1, 1) ∈ R² : λ ≥ 0}.
But we are interested in the convex scenarios, thereby moving on to the notion
of the convex cone.
The convex cone generated by the set F is the smallest convex cone containing
F. Also, for a collection of convex sets Fi ⊂ Rn, i = 1, 2, . . . , m, the convex
cone generated by Fi, i = 1, 2, . . . , m, can be easily shown to be expressed as

cone co (∪_{i=1}^m Fi) = ∪_{λ∈R^m_+} Σ_{i=1}^m λiFi.
Some of the important convex cones that play a pivotal role in convex
optimization are the polar cone, tangent cone, and the normal cone. We shall
discuss them later in the chapter. Before going back to the discussion of un-
bounded sets, we characterize the class of convex cones in the result below.
(1 − λ)x ∈ K and λy ∈ K,
0+ F = {d ∈ Rn : F + d ⊂ F }. (2.10)
Proof. (i) Suppose that d ∈ 0+ F , which implies that for every x ∈ F and
λ ≥ 0, x + λd ∈ F . Consider α > 0. Denote λ̄ = λ/α ≥ 0. Then,
x + λ̄(αd) = x + λd ∈ F,
x + λdi ∈ F, i = 1, 2.
x + λd ∈ F, ∀ x ∈ F, ∀ λ ≥ 0,
x + λd ∈ F, ∀ λ ≥ 0.
xk = x + kd, k ∈ N,
x̃ + d = x + (k + 1)d ∈ F
Therefore, for λ̃ = ‖d‖/‖xk − x̃‖ ≥ 0,

x̃ + dk = (1 − λ̃)x̃ + λ̃xk,

which implies that x̃ + dk lies on the line starting at x̃ and passing through
xk. Now consider

dk/‖dk‖ = (xk − x̃)/‖xk − x̃‖
= (xk − x)/‖xk − x̃‖ + (x − x̃)/‖xk − x̃‖
= (‖xk − x‖/‖xk − x̃‖) (xk − x)/‖xk − x‖ + (x − x̃)/‖xk − x̃‖
= (‖xk − x‖/‖xk − x̃‖) d/‖d‖ + (x − x̃)/‖xk − x̃‖.

Because

‖xk − x‖/‖xk − x̃‖ = k‖d‖/‖x − x̃ + kd‖ → 1 and (x − x̃)/‖xk − x̃‖ = (x − x̃)/‖x − x̃ + kd‖ → 0,
x + λd ∈ F, ∀ λ ≥ 0.
x + λd ∈ F, ∀ λ ≥ 0.
Observe that if the set F is not closed, then the recession cone of F need not
be closed. Also the equivalence in (ii) of the above proposition need not hold.
To verify this claim, we present an example from Rockafellar [97]. Consider
the set
which is not closed. Here the recession cone 0+ F = F and hence is not closed.
Also (1, 0) ∉ 0+F but (1, 1) + λ(1, 0) ∈ F for every λ ≥ 0, thereby contradicting
the equivalence in (ii).
H = {x ∈ Rn : ⟨a, x⟩ = b},
Definition 2.24 The hyperplane H divides the space into two half spaces,
either closed or open. The closed half spaces associated with H are
⟨a, x1⟩ ≤ b ≤ ⟨a, x2⟩, ∀ x1 ∈ F1, ∀ x2 ∈ F2.

sup_{x1∈F1} ⟨a, x1⟩ ≤ inf_{x2∈F2} ⟨a, x2⟩ and inf_{x1∈F1} ⟨a, x1⟩ < sup_{x2∈F2} ⟨a, x2⟩.
The next obvious question is when will the separating hyperplane or the
supporting hyperplane exist. In this respect we prove some existence results
below. The proof is from Bertsekas [12].
x̄ ∉ ri F.
where b = inf_{x∈cl F} ⟨a, x⟩, thereby yielding the desired result. If x̄ ∈ cl F, then
the hyperplane so obtained supports F at x̄.
(ii) Define the set
F = F1 − F2 = {x ∈ Rn : x = x1 − x2 , xi ∈ Fi , i = 1, 2}.
⟨a, x⟩ ≥ 0, ∀ x ∈ F,

which implies

⟨a, x1⟩ ≥ ⟨a, x2⟩, ∀ x1 ∈ F1, ∀ x2 ∈ F2,

which implies

⟨a, x1⟩ ≤ ⟨a, x̄⟩ − ‖a‖²/2 < ⟨a, x̄⟩, ∀ x1 ∈ F1,

⟨a, x̄⟩ < ⟨a, x̄⟩ + ‖a‖²/2 ≤ ⟨a, x2⟩, ∀ x2 ∈ F2.
Denoting b = ha, x̄i, the above inequality leads to
as desired.
(iv) Suppose that there exists a hyperplane that separates F and x̄ properly;
that is, there exists a ∈ Rn with a ≠ 0 such that
⟨a, x̄⟩ ≤ inf_{x∈F} ⟨a, x⟩ and ⟨a, x̄⟩ < sup_{x∈F} ⟨a, x⟩.
We claim that x̄ ∉ ri F. Suppose on the contrary that x̄ ∈ ri F. By
the conditions of proper separation, ⟨a, ·⟩ attains its minimum over F at x̄.
The assumption x̄ ∈ ri F then implies that ⟨a, x⟩ = ⟨a, x̄⟩ for every x ∈ F,
thereby violating the strict inequality. Hence the supposition was wrong and
x̄ ∉ ri F.
Conversely, suppose that x̄ ∉ ri F. Consider the following two cases.

⟨a, x̄⟩ ≤ inf_{x∈F} ⟨a, x⟩ and ⟨a, x̄⟩ < sup_{x∈F} ⟨a, x⟩,
⟨a, y⟩ = 0, ∀ y ∈ C⊥.
Thus, the equivalence between the proper separation of F and x̄ and the fact
that x̄ ∉ ri F is proved.
Consider the nonempty convex sets F1, F2 ⊂ Rn. Define F = F2 − F1,
which by Proposition 2.15 (v) and (vi) implies that ri F = ri F2 − ri F1.
By Proposition 1.7,
sup_{x1∈F1} ⟨a, x1⟩ ≤ inf_{x2∈F2} ⟨a, x2⟩ and inf_{x1∈F1} ⟨a, x1⟩ < sup_{x2∈F2} ⟨a, x2⟩,
which implies that a closed half space associated with the supporting hyper-
plane contains F and not x̄. Thus the intersection of the closed half spaces
containing F has no points that are not in F .
For any F̃ ⊂ Rn , taking F = cl co F̃ and applying the result for closed
convex set F yields that cl co F̃ is the intersection of all the closed half spaces
containing F̃ .
Another application of the separation theorem is the famous Helly’s The-
orem. We state the result from Rockafellar [97] without proof.
Proof. Suppose that F is not convex, which implies that there exist x, y ∈ F
and z ∈ [x, y] with z ∉ F. Because int F is nonempty, there exists some
a ∈ int F such that x, y, and a are affinely independent. Also, as F is closed,
[a, z) meets the boundary of F, say at b ∈ F. By the given hypothesis, there
is a supporting hyperplane H to F through b with a ∉ H. Therefore, H meets
aff {x, y, a} in a line, and hence x, y, and a must lie on the same side of the
line, which is a contradiction. Hence, F is a convex set.
⟨x∗, x⟩ ≤ 0, ∀ x ∈ F2.

⟨x∗, x⟩ ≤ 0, ∀ x ∈ F1,

⟨x∗, xk⟩ ≤ 0,

⟨x∗, x⟩ ≤ 0.

⟨x∗, x⟩ ≤ 0, ∀ x ∈ F.

⟨x∗, z⟩ ≤ 0, ∀ z ∈ co F.

⟨x∗, λx⟩ ≤ 0, ∀ x ∈ F,

⟨x∗, z⟩ ≤ 0, ∀ z ∈ cone F.

⟨x∗, x⟩ ≤ 0, ∀ x∗ ∈ F◦,

b ≤ ⟨a, x⟩, ∀ x ∈ F.
F ◦ = (cl cone co F )◦ .
Therefore,
which by the fact that cl cone co F is a closed convex cone yields that
F ◦◦ = cl cone co F,
⟨d, x⟩ ≤ 0, ∀ x ∈ K1 × K2.

⟨d1, x1⟩ + ⟨d2, x2⟩ ≤ 0, ∀ x1 ∈ K1, ∀ x2 ∈ K2.

⟨d1, x1⟩ ≤ 0, ∀ x1 ∈ K1,

which implies that d1 ∈ K1◦. Similarly it can be shown that d2 ∈ K2◦. Thus,
d ∈ K1◦ × K2◦, thereby leading to (K1 × K2)◦ ⊂ K1◦ × K2◦.

Conversely, suppose that di ∈ Ki◦, i = 1, 2, which implies

⟨di, xi⟩ ≤ 0, ∀ xi ∈ Ki, i = 1, 2.

Therefore,

⟨d1, x1⟩ + ⟨d2, x2⟩ ≤ 0, ∀ (x1, x2) ∈ K1 × K2,

which yields (d1, d2) ∈ (K1 × K2)◦, that is, K1◦ × K2◦ ⊂ (K1 × K2)◦, thereby
proving the result.

(v) Suppose that x∗ ∈ (K1 + K2)◦, which implies that for xi ∈ Ki, i = 1, 2,

⟨x∗, x1 + x2⟩ ≤ 0, ∀ x1 ∈ K1, ∀ x2 ∈ K2.

⟨x∗, xi⟩ ≤ 0, ∀ xi ∈ Ki, i = 1, 2.

⟨x∗, xi⟩ ≤ 0, ∀ xi ∈ Ki, i = 1, 2.

⟨x∗, x⟩ ≤ 0, ∀ x ∈ K1 + K2,

(K1◦ + K2◦)◦ = K1 ∩ K2.
Definition 2.32 Consider a set F ⊂ Rn. The positive polar cone to the set
F is defined as

F⁺ = {x∗ ∈ Rn : ⟨x∗, x⟩ ≥ 0, ∀ x ∈ F}.
The notion of polarity will play a major role in the study of tangent and
normal cones that are polar to each other. These cones are important in the
development of convex optimization.
FIGURE 2.4: The tangent cone TF(x̄) and the translate x̄ + TF(x̄).
Proof. (i) Suppose that {dk } ⊂ TF (x̄) such that dk → d. Because dk ∈ TF (x̄),
there exist {xrk } ⊂ F with xrk → x̄ and {trk } ⊂ R+ with trk → 0 such that
(xrk − x̄)/trk → dk, ∀ k ∈ N.

Thus, for every k ∈ N there exists r̄ ∈ N such that

‖(xrk − x̄)/trk − dk‖ < 1/k, ∀ r ≥ r̄.
Taking the limit as k → +∞, one can generate a sequence {xk } ⊂ F with
xk → x̄ and {tk } ⊂ R+ with tk → 0 such that
(xk − x̄)/tk → d.
Denoting tk = 1/k > 0, tk → 0 such that (xk − x̄)/tk → x − x̄, which implies that
x − x̄ ∈ TF (x̄). Because x ∈ F is arbitrary, F − x̄ ⊂ TF (x̄). As TF (x̄) is a
cone, cone (F − x̄) ⊂ TF (x̄). By (i), TF (x̄) is closed, which implies
The above inclusion along with the reverse inclusion (2.11) yields the desired
equality.
Because F is convex, the set F − x̄ is also convex. Invoking Proposi-
tion 2.14 (iv) implies that TF (x̄) is a convex set.
We now move on to another conical approximation of a convex set, namely
the normal cone, which plays a major role in establishing the optimality
conditions.
FIGURE 2.5: The normal cone NF(x̄) and the translate x̄ + NF(x̄).
⟨d, x − x̄⟩ ≤ 0, ∀ x ∈ F.
For a convex set, the relation between the tangent cone and the normal
cone is given by the proposition below.
⟨d, x − x̄⟩ ≤ 0, ∀ x ∈ F.

⟨d, x − x̄⟩ ≤ 0, ∀ x ∈ F,
Example 2.38 (i) For a convex set F ⊂ Rn , it can be easily observed that
TF (x) = Rn for every x ∈ int F and by polarity, NF (x) = {0} for every
x ∈ int F .
(ii) For a closed convex cone K ⊂ Rn, by Theorem 2.35 (ii) it is obvious that
TK(0) = K, while by Proposition 2.37, NK(0) = K◦. Also, for 0 ≠ x ∈ K,
from the definition of normal cone,

NK(x) = {d ∈ K◦ : ⟨d, x⟩ = 0}.
F = {x ∈ Rn : ⟨ai, x⟩ ≤ bi, i = 1, 2, . . . , m},

and define the active index set I(x) = {i ∈ {1, 2, . . . , m} : ⟨ai, x⟩ = bi}. The
set F is called a polyhedral set, which we will discuss in the next section. Then

TF(x) = {d ∈ Rn : ⟨ai, d⟩ ≤ 0, i ∈ I(x)} and NF(x) = { Σ_{i∈I(x)} λiai : λi ≥ 0 }.
Proof. (i) We first establish the result for the normal cone and then use it
to derive the result for the tangent cone. Suppose that di ∈ NFi (x̄), i = 1, 2,
which implies that
⟨di, xi − x̄⟩ ≤ 0, ∀ xi ∈ Fi, i = 1, 2.

⟨d1 + d2, x − x̄⟩ ≤ 0, ∀ x ∈ F1 ∩ F2,
which implies that d1 + d2 ∈ NF1 ∩F2 (x̄). Because di ∈ NFi (x̄), i = 1, 2, were
arbitrarily chosen, NF1 (x̄) + NF2 (x̄) ⊂ NF1 ∩F2 (x̄).
By Propositions 2.31 (i), (v), and 2.37,
TF1∩F2(x̄) ⊂ (NF1(x̄) + NF2(x̄))◦ = TF1(x̄) ∩ TF2(x̄),
that is,
⟨d1, x1 − x̄1⟩ ≤ 0, ∀ x1 ∈ F1,
P = {x ∈ Rn : ⟨ai, x⟩ ≤ 0, i = 1, 2, . . . , m},
where ai ∈ Rn for i = 1, 2, . . . , m.
Next we state some operations on the polyhedral sets and cones. For proofs,
the readers are advised to refer to Rockafellar [97].
With the notion of polyhedral sets, another concept that comes into the
picture is that of a finitely generated set.
where {x1 , x2 , . . . , xm } are the generators of the set. For a finitely generated
cone, it is the same set with j = 0 and then {x1 , x2 , . . . , xm } are the generators
of the cone.
F = cone co {x1, x2, . . . , xm},

F◦ = {x ∈ Rn : ⟨xi, x⟩ ≤ 0, i = 1, 2, . . . , m}.
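By this formula, checking membership in the polar of a finitely generated cone amounts to finitely many inner-product tests. A minimal sketch (the generators are illustrative choices, not from the book):

```python
import numpy as np

gens = np.array([[1.0, 0.0],          # generator x1
                 [1.0, 1.0]])         # generator x2

def in_polar(x, tol=1e-12):
    # x belongs to the polar cone iff <xi, x> <= 0 for every generator xi
    return bool(np.all(gens @ x <= tol))

print(in_polar(np.array([-1.0, 0.5])))   # True:  <x1,x> = -1,   <x2,x> = -0.5
print(in_polar(np.array([0.5, 0.0])))    # False: <x1,x> = 0.5 > 0
```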
Observe that the notion of epigraph holds for domain points only. The function
is proper if φ(x) > −∞ for every x ∈ Rn and dom φ is nonempty. A function
is said to be improper if there exists x̂ ∈ Rn such that φ(x̂) = −∞.
FIGURE 2.6: The graph and the epigraph of a convex function φ.
Figure 2.6 represents the graph and epigraph of a convex function. Observe
that the epigraph is a convex set. An alternative characterization of a convex
function is in terms of the strict epigraph. So next we state the notion of
strict epigraph and present the equivalent characterization.
Proof. The necessary part, that is, the convexity of φ implies that epi s φ is
convex, can be worked along the lines of proof of Proposition 2.48.
Conversely, suppose that epis φ is convex. Consider x1, x2 ∈ dom φ and
αi ∈ R, i = 1, 2, such that φ(xi) < αi, i = 1, 2. Therefore, (xi, αi) ∈ epis φ,
i = 1, 2. By the convexity of epis φ, for every λ ∈ [0, 1],

φ((1 − λ)x1 + λx2) < (1 − λ)α1 + λα2.

As the above inequality holds for every αi > φ(xi), i = 1, 2, taking the limit
as αi → φ(xi), i = 1, 2, the above condition becomes

φ((1 − λ)x1 + λx2) ≤ (1 − λ)φ(x1) + λφ(x2).

Because x1 and x2 were arbitrarily chosen, the above inequality leads to the
convexity of φ and hence the result.
The definitions presented above are for extended-valued functions. These
definitions can also be given for a function to be convex over a convex set
F ⊂ Rn as φ is convex when dom φ is restricted to F . However, in this book,
we will be considering real-valued functions unless otherwise specified.
Next we state Jensen's inequality for proper convex functions. The proof
can be worked out using induction and the readers are advised to do so.
It can be easily shown that δF is lsc and convex if and only if F is closed and
convex, respectively. Also, for sets F1, F2 ⊂ Rn,

δ_{F1∩F2} = δ_{F1} + δ_{F2}.
We will look into this aspect more when we study the derivations of optimality
condition for the convex programming problem (CP ) presented in Chapter 1
in the subsequent chapters.
For a set F ⊂ Rn, the distance function dF : Rn → R to F from a point
x̄ is defined as

dF(x̄) = inf_{x∈F} ‖x − x̄‖.
Proof. Suppose that the inequality (2.12) holds for x̃ ∈ F and x̄ ∈ Rn. For
any x ∈ F, consider

‖x − x̄‖² = ‖x − x̃‖² + ‖x̃ − x̄‖² − 2⟨x̄ − x̃, x − x̃⟩
≥ ‖x̃ − x̄‖² − 2⟨x̄ − x̃, x − x̃⟩, ∀ x ∈ F.

Because (2.12) is assumed to hold, the above condition leads to

‖x − x̄‖² ≥ ‖x̃ − x̄‖², ∀ x ∈ F,

thereby implying that x̃ ∈ projF(x̄).
Conversely, suppose that x̃ ∈ projF(x̄). Consider any x ∈ F and for
α ∈ [0, 1], define

xα = (1 − α)x̃ + αx ∈ F.

Therefore,

‖x̄ − xα‖² = ‖(1 − α)(x̄ − x̃) + α(x̄ − x)‖²
= (1 − α)²‖x̄ − x̃‖² + α²‖x̄ − x‖² + 2α(1 − α)⟨x̄ − x̃, x̄ − x⟩.

Observe that, as a function of α, ‖x̄ − xα‖² has a point of minimum over [0, 1]
at α = 0. Thus,

∇α{‖x̄ − xα‖²}|_{α=0} ≥ 0,

which implies

2(−‖x̄ − x̃‖² + ⟨x̄ − x, x̄ − x̃⟩) ≥ 0.

The above inequality leads to

−⟨x̄ − x̃, x̄ − x̃⟩ + ⟨x̄ − x, x̄ − x̃⟩ = ⟨x̄ − x̃, x̃ − x⟩ ≥ 0, ∀ x ∈ F,

thereby yielding (2.12) and hence completing the proof.
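The characterization (2.12) is easy to test numerically. In this sketch (the ball and the point x̄ are illustrative choices), we project onto the closed unit ball and confirm ⟨x̄ − x̃, x − x̃⟩ ≤ 0 on sampled points of F:

```python
import numpy as np

def proj_ball(x):
    # projection onto the closed unit ball: rescale if outside
    n = np.linalg.norm(x)
    return x if n <= 1.0 else x / n

x_bar = np.array([3.0, -1.0])
x_tilde = proj_ball(x_bar)

rng = np.random.default_rng(2)
worst = -np.inf
for _ in range(2000):
    x = proj_ball(rng.normal(size=2))            # an arbitrary point of F
    worst = max(worst, float((x_bar - x_tilde) @ (x - x_tilde)))
print(x_tilde, worst)                            # worst <= 0, as (2.12) predicts
```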
Other classes of functions that are convex in nature are the sublinear
functions and the support functions. These classes will be discussed in the
next subsection. But before that, we present some operations on convex
functions that again yield functions belonging to the class of convex functions.
Proposition 2.53 (i) Consider proper convex functions φi : Rn → R̄ and
αi ≥ 0, i = 1, 2, . . . , m. Then φ = Σ_{i=1}^m αiφi is also a convex function.
(ii) Consider a proper convex function φ : Rn → R̄ and a nondecreasing
proper convex function ψ : R → R̄. Then the composition function defined as
(ψ ◦ φ)(x) = ψ(φ(x)) is a convex function provided ψ(+∞) = +∞.
(iii) Consider a family of proper convex functions φi : Rn → R̄, i ∈ I, where
I is an arbitrary index set. Then φ = supi∈I φi is a convex function.
(iv) Consider a convex set F ⊂ Rn+1 . Then φ(x) = inf{α ∈ R : (x, α) ∈ F }
is convex.
Proof. (i) By Definition 2.46 of convexity, for any x, y ∈ Rn and any λ ∈ [0, 1],
αi ≤ φ(xi ) + ε, i = 1, 2.
Because the above condition holds for every ε > 0, taking the limit as ε → 0,
the above condition reduces to
Now suppose that (xi , αi ) ∈ epi φi , i = 1, 2, which along with the definition
of inf-convolution implies that
Taking closure on both sides of the above relation along with the condition
(2.15) yields the condition (2.14), as desired.
Using this proposition, we now move on to show that the inf-convolution
of proper convex functions is also convex.

As φ1 and φ2 are convex functions, by Proposition 2.50, epis φ1 and epis φ2 are
convex sets. This along with the above condition implies that epis (φ1 □ φ2)
is convex, which again by the characterization of convex functions, Proposi-
tion 2.50, leads to the convexity of φ1 □ φ2.
epi cl co φ = cl co epi φ.
For more details on the convex hull and the closed convex hull of a function,
readers are advised to refer to Hiriart-Urruty and Lemaréchal [62, 63].
Now before moving on with the properties of convex functions, we briefly
discuss an important class of convex functions, namely sublinear and support
functions, which as we will see later in the chapter are important in the study
of convex analysis.
From the positive homogeneity property, p(0) = λp(0) for every λ > 0, which
is satisfied for p(0) = 0 as well as p(0) = +∞. Most sublinear functions satisfy
p(0) = 0. As p is proper, dom p is nonempty. So if p(x) < +∞, then by the
positive homogeneity property, p(tx) < +∞, which implies that dom p is a
cone. Observe that as p is positively homogeneous, for x, y ∈ Rn and any
λ ∈ (0, 1), subadditivity yields

p((1 − λ)x + λy) ≤ p((1 − λ)x) + p(λy) = (1 − λ)p(x) + λp(y),

so that a sublinear function is convex. Positive homogeneity may be replaced
by the one-sided condition

p(λx) ≤ λp(x), ∀ x ∈ Rn, ∀ λ > 0.

Note that if p is positively homogeneous, then the above condition holds triv-
ially. Conversely, if the above inequality holds, then for any λ > 0,

p(x) = p(λ⁻¹λx) ≤ (1/λ) p(λx), ∀ x ∈ Rn,

which along with the preceding inequality yields that p is positively homoge-
neous.
Now suppose that p is sublinear and (x, α) ∈ epi p. Then p(λx) = λp(x) ≤ λα,
which implies that λ(x, α) = (λx, λα) ∈ epi p for every λ > 0. Also,
(0, 0) ∈ epi p. Thus, epi p is a cone.
Conversely, suppose that epi p is a convex cone. By Theorem 2.20, for any
(xi , αi ) ∈ epi p, i = 1, 2,
(x1 + x2 , α1 + α2 ) ∈ epi p.
From Proposition 1.7 (ii) and (iii), it is obvious that a support function
is sublinear. As it is the supremum of linear functions that are continuous,
support functions are lsc. For a closed convex cone K ⊂ Rn,

σK(x̄) = 0, if ⟨x̄, x⟩ ≤ 0 for every x ∈ K, and σK(x̄) = +∞, otherwise,

which is nothing but the indicator function of the polar cone K◦. Equivalently,

σK = δ_{K◦} and δK = σ_{K◦}.
Because F ⊂ co F, by (i), σF ≤ σ_{co F}. Conversely, any point of co F is a
convex combination of points of F, over which a linear function cannot exceed
its supremum on F, so σ_{co F} ≤ σF, thus yielding the equality as desired.
These relations also imply that σF = σ_{cl co F}.
(iii) Invoking Theorem 2.27, the desired result holds.
(iv) By (i) and (ii), cl F1 ⊂ cl F2 implies that
σF1(x∗) ≤ σF2(x∗), ∀ x∗ ∈ Rn.
Conversely, suppose that the above inequality holds, which implies for
every x ∈ cl F1 ,
⟨x∗, x⟩ ≤ σF2(x∗), ∀ x∗ ∈ Rn.
Therefore, by (iii), x ∈ cl F2 . Because x ∈ cl F1 was arbitrary, cl F1 ⊂ cl F2 ,
thereby completing the proof.
(v) Consider x ∈ K. As F2 ⊂ F2 + K◦, (i) along with Proposition 1.7 and
the definition of the polar cone leads to

σF2(x) ≤ σ_{F2+K◦}(x) = σF2(x) + σ_{K◦}(x) ≤ σF2(x),

that is, σF2(x) = σ_{F2+K◦}(x) for x ∈ K. Now if x ∉ K, there exists z ∈ K◦
such that ⟨z, x⟩ > 0. Consider y ∈ F2. Therefore, in the limit λ → +∞,
⟨y + λz, x⟩ → +∞, which implies σ_{F2+K◦}(x) = +∞, thus establishing the
first equivalence. The second equivalence can be obtained by (ii) and (iv).
(vi) Suppose that F is bounded, which implies that there exists M > 0 such
that

‖x′‖ ≤ M, ∀ x′ ∈ F.

Therefore, by the Cauchy–Schwarz Inequality, Proposition 1.1,

⟨x, x′⟩ ≤ ‖x‖ ‖x′‖ ≤ ‖x‖M, ∀ x′ ∈ F,

which implies that σF(x) ≤ ‖x‖M for every x ∈ Rn. Thus, σF is finite every-
where.

Conversely, suppose that σF is finite everywhere. In the next section, we
will present a result establishing the local Lipschitz property, and hence con-
tinuity, of a convex function, Theorem 2.72. This leads to local boundedness.
Therefore there exists M such that

⟨x, x′⟩ ≤ σF(x) ≤ M, ∀ (x, x′) ∈ B × F.
If x′ ≠ 0, taking x = x′/‖x′‖, the above inequality leads to ‖x′‖ ≤ M for every
x′ ∈ F, thereby establishing the boundedness of F and hence proving the
result.
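For a bounded set the finiteness of σF can be seen concretely: if F is a polytope co{v1, . . . , vm}, then since σF = σ_{co F}, the support function is just the largest of the finitely many values ⟨vi, x⟩. A short sketch with illustrative vertices:

```python
import numpy as np

verts = np.array([[0.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])        # vertices of a triangle F

def sigma(x):
    # support function of co{v1, ..., vm}: maximum over the vertices
    return float(np.max(verts @ x))

M = max(np.linalg.norm(v) for v in verts)
for x in (np.array([1.0, 1.0]), np.array([-1.0, 2.0]), np.array([0.0, -1.0])):
    assert sigma(x) <= M * np.linalg.norm(x) + 1e-12   # the bound sigma_F(x) <= M ||x||
    print(x, sigma(x))                                  # finite in every direction
```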
As mentioned earlier, the support function is lsc and sublinear. Similarly,
a closed sublinear function can be viewed as a support function. We end
this subsection by presenting this important result to assert the preceding
statement. The proof is again due to Hiriart-Urruty and Lemaréchal [63].
Theorem 2.62 For a proper lsc sublinear function σ : Rn → R̄, there exists a
linear function minorizing σ. In fact, σ is the supremum of the linear functions
minorizing it; that is, σ is the support function of the closed convex set given
by
Fσ = {x ∈ Rn : ⟨x, d⟩ ≤ σ(d), ∀ d ∈ Rn}.
Proof. Because sublinear functions are convex, σ is a proper lsc convex func-
tion. As we will discuss in one of the later sections, every proper lsc convex
function can be represented as a pointwise supremum of the affine functions
majorized by it (Theorem 2.100), and hence there exists (x, α) ∈ Rn × R such that

⟨x, d⟩ − α ≤ σ(d), ∀ d ∈ Rn.

⟨x, d⟩ ≤ σ(d), ∀ d ∈ Rn,
[Figure: epi φ1 and epi φ2.]

Improper convex functions can, however, have finite values at boundary points.
For example, consider φ1 : R → R̄ given by

φ1(x) = −∞, if |x| < 1; 0, if |x| = 1; +∞, if |x| > 1.
This contradicts the convexity of the epigraph. This aspect can be easily
visualized by modifying the previous example as follows. Define an improper
function φ2 : R → R̄ as

φ2(x) = −∞, if x = 1; 0, if −1 ≤ x < 1; +∞, otherwise.
Equivalently, (x̄, ᾱ) ∈ ri epi φ if and only if ᾱ > lim sup_{x→x̄} φ(x).

Proof. To obtain the result for ri epi φ, it is sufficient to derive it for int epi φ,
that is,

(x̄, ᾱ) ∈ int epi φ if and only if x̄ ∈ int dom φ and φ(x̄) < ᾱ.

By Definition 2.12, for (x̄, ᾱ) ∈ int epi φ there exists ε > 0 such that

Bε((x̄, ᾱ)) ⊂ epi φ,

which implies that x̄ ∈ int dom φ along with φ(x̄) < ᾱ. As (x̄, ᾱ) ∈ int epi φ
is arbitrary,
By the convexity of F, for any x ∈ F there exist λi ≥ 0, i = 1, 2, . . . , m,
satisfying Σ_{i=1}^m λi = 1 such that

x = Σ_{i=1}^m λixi.

Because φ is convex,

φ(x) ≤ Σ_{i=1}^m λiφ(xi) ≤ Σ_{i=1}^m λiγ = γ.
In particular, for any α > γ, (x̄, α) ∈ int epi φ. Thus, (x̄, ᾱ) can be con-
sidered as lying in the interior of a line segment passing through points
(x̄, α) ∈ int epi φ, which by the line segment principle, Proposition 2.14,
yields (x̄, ᾱ) ∈ int epi φ. Because (x̄, ᾱ) is arbitrary,
Conversely, suppose that for (x̄, ᾱ) the strict inequality condition holds
which implies
which yields φ(x̄) < ᾱ with x̄ ∈ ri dom φ, thereby proving the equivalent
result. Note that this equivalence can be established for int epi φ as well.
Note that the above result can also be obtained for the relative interior of
the epigraph as it is nothing but the interior relative to the affine hull of the
epigraph. As a consequence of the above characterization of ri F , we have the
following result from Rockafellar [97].
Because for some x ∈ Rn , φ(x) < α, in particular for µ = φ(x), we have that
H meets epi φ. Invoking Corollary 2.16 (ii), H also meets ri epi φ, which by
Proposition 2.64 implies that there exists x ∈ ri dom φ such that φ(x) < α,
thereby yielding the desired result.
Recall that in the previous chapter, the closure of a function φ : Rn → R̄
was defined by the relation epi (cl φ) = cl epi φ.
lim inf_{λ→1} (cl φ)((1 − λ)x̂ + λx) = cl φ(x) ≤ lim inf_{λ→1} φ((1 − λ)x̂ + λx).
Consider any (x̂, α̂) ∈ ri epi φ. Applying the Line Segment Principle, Propo-
sition 2.14,
By Proposition 2.64,
In the relation
respectively.
ri H ∩ ri epi φ = H ∩ ri epi φ ≠ ∅.
Now consider
Therefore, by Corollary 2.16 (ii), {x ∈ Rn : φ(x) < α} has the same closure
and relative interior as {x ∈ Rn : φ(x) ≤ α}.
cl (φ1 + φ2 + . . . + φm ) = cl φ1 + cl φ2 + . . . + cl φm .
and thus cl φ = φ.
Suppose that φi, i = 1, 2, . . . , m, are not all lsc. If

∩_{i=1}^m ri dom φi ≠ ∅,
Therefore,
x̂ ∈ ri dom φi , i = 1, 2, . . . , m.
By the convexity of φ,
which leads to
Observe that

φ(x) − φ(y) ≤ 2M ‖x − y‖/(δ + ‖x − y‖) ≤ (2M/δ) ‖x − y‖.

Interchanging the roles of x and y yields

|φ(x) − φ(y)| ≤ (2M/δ) ‖x − y‖,

thereby establishing the result.
In the above result, we showed that if a proper convex function is locally
bounded at a point, then it is locally Lipschitz at that point. As a matter of
fact, it is more than that which is presented in the result below, the proof of
which is along the lines of Hiriart-Urruty and Lemaréchal [63].
By the convexity of φ,

φ(x) ≤ Σ_{i=1}^{n+1} λiφ(xi) ≤ max_{i=1,2,...,n+1} φ(xi) = M < +∞.
Because x ∈ Bε(x̄) was arbitrary, φ is bounded above by M on Bε(x̄).
Therefore, by Theorem 2.72, for ε′ < ε,

|φ(x) − φ(y)| ≤ (2M/(ε − ε′)) ‖x − y‖, ∀ x, y ∈ Bε′(x̄),
thus proving that φ is locally Lipschitz at x̄ ∈ ri dom φ. Because x̄ ∈ ri dom φ
is arbitrary, φ is locally Lipschitz on ri dom φ.
Applying (2.19) to the remaining two cases leads to the fact that

ψ(y) = (φ(y) − φ(x))/(y − x) is nondecreasing.
Suppose that φ is convex, which implies (2.19) holds. As φ is differentiable,
for x1, x2 ∈ I with x1 < x2,

∇φ(x1) ≤ (φ(x2) − φ(x1))/(x2 − x1) = (φ(x1) − φ(x2))/(x1 − x2) ≤ ∇φ(x2),

thereby establishing the result.
If φ : R → R̄ is a proper convex function, dom φ may be considered an
interval I. Then from the nondecreasing property of ψ in the above proposi-
tion, the right-sided derivative of φ, φ′+, exists at x̄ provided both −∞ and
+∞ values are allowed, and is defined as

φ′+(x̄) = lim_{x↓x̄} (φ(x) − φ(x̄))/(x − x̄).

If (φ(x) − φ(x̄))/(x − x̄) has a finite lower bound,

lim_{x↓x̄} (φ(x) − φ(x̄))/(x − x̄) = inf_{x>x̄, x∈I} (φ(x) − φ(x̄))/(x − x̄),

because (φ(x) − φ(x̄))/(x − x̄) is nondecreasing on I. In case
(φ(x) − φ(x̄))/(x − x̄) does not have a finite lower bound,

lim_{x↓x̄} (φ(x) − φ(x̄))/(x − x̄) = inf_{x>x̄, x∈I} (φ(x) − φ(x̄))/(x − x̄) = −∞,

and for the case when I = {x̄},

inf_{x>x̄, x∈I} (φ(x) − φ(x̄))/(x − x̄) = +∞

as {x ∈ R : x > x̄, x ∈ I} = ∅. Thus,

φ′+(x̄) = inf_{x>x̄, x∈I} (φ(x) − φ(x̄))/(x − x̄).
As x̄ ∈ dom φ, ψ(0) = φ(x̄) < +∞, which along with the convexity of φ
implies that ψ is a proper convex function. Now consider ϕ : R → R̄ defined
as

ϕ(λ) = (ψ(λ) − ψ(0))/λ = (φ(x̄ + λd) − φ(x̄))/λ.

By Proposition 2.75, ϕ is nondecreasing for λ > 0. Then by the discussion
preceding the theorem, ψ′+(0) exists and

ψ′+(0) = lim_{λ↓0} ϕ(λ) = inf_{λ>0} ϕ(λ),

as desired.
Suppose that d ∈ Rn and α > 0. Then

φ′(x̄, αd) = lim_{λ↓0} (φ(x̄ + λαd) − φ(x̄))/λ
= lim_{λ↓0} α (φ(x̄ + λαd) − φ(x̄))/(λα)
= α lim_{λ′↓0} (φ(x̄ + λ′d) − φ(x̄))/λ′ = αφ′(x̄, d),

which implies that φ′(x̄, ·) is positively homogeneous.
Suppose that d1, d2 ∈ Rn and α ∈ [0, 1]. By the convexity of φ,

φ(x̄ + λ((1 − α)d1 + αd2)) ≤ (1 − α)φ(x̄ + λd1) + αφ(x̄ + λd2).

Subtracting φ(x̄) = (1 − α)φ(x̄) + αφ(x̄) from both sides, dividing by λ > 0,
and taking the limit as λ ↓ 0, the above inequality reduces to

φ′(x̄, (1 − α)d1 + αd2) ≤ (1 − α)φ′(x̄, d1) + αφ′(x̄, d2).

In particular, for α = 1/2 and applying the positive homogeneity property, the
above condition yields

φ′(x̄, d1 + d2) ≤ φ′(x̄, d1) + φ′(x̄, d2).
But in the absence of differentiability, can one have such a relation for the
directional derivative? The answer is yes. The notion that replaces the gradient
in the above condition is the subgradient.
which is nothing but the normal cone to the set F at x̄. Therefore, for a convex
set F , ∂δF = NF .
Consider the norm function φ(x) = kxk, x ∈ Rn . Observe that φ is a
convex function. At x̄ = 0, φ is not differentiable and ∂φ(x̄) = B.
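This claim is easy to test numerically. The following sketch (our own illustration in Python, using only numpy) checks the subgradient inequality ‖x‖ ≥ ⟨ξ, x⟩ for points ξ of the unit ball B, and shows it failing for a ξ outside B:

import numpy as np

rng = np.random.default_rng(0)

# Any xi in the unit ball B is a subgradient of the norm at 0:
# ||x|| >= ||0|| + <xi, x - 0>, since <xi, x> <= ||xi|| ||x|| <= ||x||.
for _ in range(1000):
    xi = rng.normal(size=3)
    xi /= max(1.0, np.linalg.norm(xi))  # force xi into the unit ball B
    x = rng.normal(size=3)
    assert np.linalg.norm(x) >= xi @ x - 1e-12

# A vector outside B is not a subgradient: the inequality fails at x = xi.
xi = np.array([2.0, 0.0, 0.0])
print(np.linalg.norm(xi) >= xi @ xi)  # False, as 2 < 4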
Like the relation between the directional derivative and gradient, we are
interested in deriving a relationship between the directional derivative and the
subdifferential, which we establish in the next result.
However, if x̄ ∈ ri dom φ,
φ′ (x̄, d) = σ∂φ(x̄) (d), ∀ d ∈ Rn
and if x̄ ∈ int dom φ, φ′ (x̄, d) is finite for every d ∈ Rn .
Proof. Because φ′ (x̄, .) is sublinear, combining Theorems 2.62 and 2.78 leads
to
cl φ′ (x̄, d) = σ∂φ(x̄) (d).
If x̄ ∈ ri dom φ, the domain of φ′ (x̄, .) is an affine set that is actually a
subspace parallel to the affine hull of dom φ. By sublinearity, φ′(x̄, 0) = 0,
so φ′(x̄, ·) is not identically −∞ on the affine set. Therefore, by Proposition 2.63, cl φ′(x̄, ·)
and hence φ′ (x̄, .) is a proper function. By Proposition 2.66, cl φ′ (x̄, .) agrees
with φ′ (x̄, .) on the affine set and hence is closed, thereby leading to the desired
condition. For x̄ ∈ int dom φ, the domain of φ′ (x̄, .) is Rn and hence it is finite
everywhere.
As mentioned earlier, for a differentiable convex function, for every d ∈ Rn,
φ′(x̄, d) = ⟨∇φ(x̄), d⟩. So the question is: for a differentiable convex function,
how are the gradient and the subdifferential related? We discuss this aspect
in the result below.
⟨∇φ(x̄) − ξ, d⟩ ≥ 0, ∀ d ∈ Rn.
⟨∇φ(x̄) − ξ, d⟩ = 0, ∀ d ∈ Rn,
Observe that in Theorem 2.79 we defined the relation between the direc-
tional derivative and the support function of the subdifferential for point x̄ in
the relative interior of the domain. The reason for this is the fact that at the
boundary of the domain, the subdifferential may be an empty set. For a clear
view into this aspect, we consider the following example from Bertsekas [12].
Let φ : R → R̄ be a proper convex function given by
φ(x) = −√x for 0 ≤ x ≤ 1, and φ(x) = +∞ otherwise.
The subdifferential of φ is
∂φ(x) = {−1/(2√x)} for 0 < x < 1, ∂φ(x) = [−1/2, +∞) for x = 1, and ∂φ(x) = ∅ for x ≤ 0 or x > 1.
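To see why ∂φ(0) = ∅ (a one-line check we add for clarity), note that
φ′₊(0) = lim_{x↓0} (−√x − 0)/(x − 0) = lim_{x↓0} (−1/√x) = −∞,
so no finite ξ can satisfy the subgradient inequality −√x ≥ ξx for all x ∈ [0, 1].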
φ(x̄ + εd) ≤ M, ∀ d ∈ B.
In particular, for any d ∈ B, the above inequality along with the boundedness
of φ in the neighborhood of x̄ leads to
Theorem 2.84 (Closed Graph Theorem) Consider a proper lsc convex func-
tion φ : Rn → R̄. If the sequences {xk}, {ξk} ⊂ Rn satisfy ξk ∈ ∂φ(xk) with
xk → x̄ and ξk → ξ̄, then ξ̄ ∈ ∂φ(x̄). This means gph ∂φ is a closed subset of
Rn × Rn.
Proof. Because ξk ∈ ∂φ(xk) with (xk, ξk) → (x̄, ξ̄), from Definition 2.77
of the subdifferential,
Taking the limit infimum as k → +∞, which along with the lower semiconti-
nuity of φ reduces the above condition to
φ(x) − φ(x̄) ≥ ⟨ξ̄, x − x̄⟩, ∀ x ∈ Rn,
thereby implying that ξ̄ ∈ ∂φ(x̄) and thus establishing that gph ∂φ is closed,
as desired.
From the above theorem one may note that the normal cone to a convex
set F ⊂ Rn is also graph closed as it is nothing but the subdifferential of the
convex indicator function δF , that is, NF = ∂δF .
In general we know that the arbitrary union of closed sets need not be
closed. But in the proposition below from Bertsekas [12] and Rockafellar [97]
we have that the union of the subdifferential over a compact set is compact.
Proposition 2.85 Consider a convex function φ : Rn → R and a compact
set F ⊂ Rn. Then the set ∂φ(F) = ∪_{x∈F} ∂φ(x) is nonempty and compact.
⟨ξ1 − ξ2, x1 − x2⟩ ≥ 0, ∀ ξi ∈ ∂φ(xi), i = 1, 2,
and maximal monotone map in the sense that its graph is not properly con-
tained in the graph of any other monotone map.
ψ′(λ̄, d) ≥ 0, ∀ d ∈ R.
f (x) − f (x̄) ≥ 0, ∀ x ∈ Rn .
By the convexity of C, (1 − λ)x̄ + λȳ ∈ (x̄, ȳ) ⊂ C. Now the strict inequality
(2.21) contradicts the fact that x̄ and ȳ are points of global minimum of f
over C. Thus, x̄ = ȳ, thereby implying that the minimization of a strictly
convex function f over a convex set C has at most one point of global minimum.
As discussed earlier in this chapter, the above problem can be converted
into the unconstrained convex programming problem of the form (CPu ) with
the objective function f replaced by f + δC . From the above theorem, x̄ is the
point of minimizer of (CP ) if and only if
0 ∈ ∂(f + δC )(x̄).
Therefore,
Here, ψ1(0) = ψ2(0) = 0. Observe that ξ ∈ ∂(φ1 + φ2)(x̄), which, by the
functions constructed above, is equivalent to
(ψ1 + ψ2 )(x) ≥ 0, ∀ x ∈ Rn ,
0 ∈ ∂(φ1 + φ2 )(0),
which implies
F1 = {(x, α) ∈ Rn × R : φ1 (x) ≤ α}
ri F1 ∩ F2 = ∅
with (0, 0) ∈ F1 ∩F2 . Therefore, by the separation theorem, Theorem 2.26 (ii),
there exists (x∗, α∗) ∈ Rn × R with (x∗, α∗) ≠ (0, 0) such that
⟨x∗, x⟩ + α∗α ≥ 0, ∀ (x, α) ∈ F1,
⟨x∗, x⟩ + α∗α ≤ 0, ∀ (x, α) ∈ F2.
This implies that dom φ1 and dom φ2 can be separated, which contradicts
the hypothesis that ri dom φ1 ∩ ri dom φ2 ≠ ∅. Hence, α∗ > 0 and can be
normalized to one, and thus
⟨x∗, x⟩ + α ≥ 0, ∀ (x, α) ∈ F1,
⟨x∗, x⟩ + α ≤ 0, ∀ (x, α) ∈ F2.
In particular, for (x, φ1 (x)) ∈ F1 and (x, −φ2 (x)) ∈ F2 , we have −x∗ ∈ ∂φ1 (0)
and x∗ ∈ ∂φ2 (0), thereby leading to
Therefore, ∂(φ1 + φ2)(0) ≠ ∂φ1(0) + ∂φ2(0). Observe that dom φ1 ∩ dom φ2 =
F1 ∩ F2 = {(0, 0)} while ri dom φ1 ∩ ri dom φ2 = ri F1 ∩ ri F2 = ∅.
Now as an application of the Subdifferential Sum Rule, we prove the equal-
ity in Proposition 2.39 (i) under the assumption of ri F1 ∩ ri F2 6= ∅.
Proof of Proposition 2.39 (i). For convex sets F1 , F2 ⊂ Rn , define φ1 = δF1
and φ2 = δF2. Observe that dom φi = Fi for i = 1, 2. If ri F1 ∩ ri F2 ≠ ∅,
then ri dom φ1 ∩ ri dom φ2 6= ∅. Now applying the Sum Rule, Theorem 2.91,
which along with the facts that δF1 + δF2 = δF1 ∩F2 and ∂δF = NF implies
that
Therefore,
‖ξ‖ ≤ ∑_{i=1}^m |µi| ‖ξi‖.
where (µ1^j, µ2^j, . . . , µm^j) ∈ ∂φ(Φ(x̄)) and ξi^j ∈ ∂φi(x̄), i = 1, 2, . . . , m, for
j = 1, 2. Now for any λ ∈ (0, 1), define
ξλ = (1 − λ)ξ1 + λξ2 .
Note that µi^λ = 0 only when µi^1 = µi^2 = 0 as λ ∈ (0, 1). Therefore,
ξ = ∑_{i∈Ī} µi ( ((1 − λ)µi^1/µi) ξi^1 + (λµi^2/µi) ξi^2 ),
with
(µ1, µ2, . . . , µm) ∈ ∂φ(Φ(x̄)) and ((1 − λ)µi^1/µi) ξi^1 + (λµi^2/µi) ξi^2 ∈ ∂φi(x̄), i = 1, 2, . . . , m,
thereby showing that F is convex.
Step 2: Denote
Φ′ (x̄, d) = (φ′1 (x̄, d), φ′2 (x̄, d), . . . , φ′m (x̄, d)).
As µ = (µ1, µ2, . . . , µm) ∈ ∂φ(Φ(x̄)),
∑_{i=1}^m µi φ′i(x̄, d) = ⟨µ, Φ′(x̄, d)⟩ ≤ φ′(Φ(x̄), Φ′(x̄, d)).
Denoting ξ̄ = ∑_{i=1}^m µ̄i ξ̄i ∈ F,
Step 3: It is obvious that the support function of ∂(φ ◦ Φ)(x̄) is (φ ◦ Φ)′ (x̄, d).
We claim that
To establish the result, we will prove the reverse inequality, that is,
We claim that there exists a neighborhood N (x̄) such that I(x) ⊂ I(x̄) for
every x ∈ N(x̄). On the contrary, assume that there exists {xk} ⊂ Rn with
xk → x̄ such that I(xk) ⊄ I(x̄). Therefore, we may choose ik ∈ I(xk) but
ik ∉ I(x̄). As {ik} ⊂ {1, 2, . . . , m} for every k ∈ N, by the Bolzano–Weierstrass
Theorem, Proposition 1.3, it has a convergent subsequence. Without loss of
generality, suppose that ik → ī. Because I(xk) is closed, ī ∈ I(xk), which
implies ϕī(xk) = ϕ(xk). By Theorem 2.69, the functions are continuous on
Rn. Thus ϕī(x̄) = ϕ(x̄), that is, ī ∈ I(x̄). But ik ∉ I(x̄) for every k ∈ N
implies ī ∉ I(x̄), which is a contradiction, thereby establishing the claim.
Now consider {λk} ⊂ R+ such that λk → 0. Observe that
ϕ′(y, d) = max_{i∈I(y)} ⟨ei, d⟩,
that is,
Thus,
∂φ(x̄) = {∑_{i=1}^m µi ξi : (µ1, µ2, . . . , µm) ∈ ∂φ(Φ(x̄)), ξi ∈ ∂φi(x̄), i = 1, 2, . . . , m}
= {∑_{i=1}^m µi ξi : µi ≥ 0, i ∈ I(x̄), µi = 0, i ∉ I(x̄), ∑_{i=1}^m µi = 1, ξi ∈ ∂φi(x̄), i = 1, 2, . . . , m},
which implies
∂φ(x̄) = co ∪_{i∈I(x̄)} ∂φi(x̄),
Proof. Observe that (ii) and (iii) ensure that Ŷ (x̄) is nonempty and compact.
Suppose that ξ ∈ ∂x φ(x̄, ȳ) for some ȳ ∈ Ŷ (x̄). By Definition 2.77 of the
subdifferential,
thus implying that ξ ∈ ∂Φ(x̄). Because ȳ ∈ Ŷ (x̄) and ξ ∈ ∂x φ(x̄, ȳ) were
arbitrary,
φ(x, ȳ) ≥ lim sup_{k→∞} φ(x, yk) ≥ φ(x̄, ȳ) + lim sup_{k→∞} ⟨ξk, x − x̄⟩ = φ(x̄, ȳ) + ⟨ξ̄, x − x̄⟩, ∀ x ∈ Rn,
thereby yielding that ξ̄ ∈ ∂x φ(x̄, ȳ). Hence, ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y) is closed.
Now let us assume on the contrary that
∂Φ(x̄) ⊄ co ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y).
As co ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y) is a closed convex set, by the Strict Separation The-
orem, Theorem 2.26 (iii), there exists d ∈ Rn with d ≠ 0 such that
⟨ξ̄, d⟩ > ⟨ξ, d⟩, ∀ ξ ∈ ∂x φ(x̄, y), ∀ y ∈ Ŷ(x̄). (2.25)
Taking the limit supremum as r → +∞, this along with the upper semicon-
tinuity of φ(x, ·) for every x ∈ Rn implies that
Proof. Suppose that ξ ∈ ∂φ1 (x1 ) ∩ ∂φ2 (x2 ). By Definition 2.77 of the subd-
ifferential, for i = 1, 2,
Define y1 +y2 = ȳ. The above inequality along with the given hypothesis leads
to
Therefore, ξ ∈ ∂φ1(x1). Similarly, it can be shown that ξ ∈ ∂φ2(x2) and
hence ξ ∈ ∂φ1(x1) ∩ ∂φ2(x2). Because ξ ∈ ∂(φ1 □ φ2)(x̄) was arbitrary,
∂(φ1 □ φ2)(x̄) ⊂ ∂φ1(x1) ∩ ∂φ2(x2), thereby establishing the result.
λ(φ(x̄) − γ) > 0.
As φ(x̄) > γ, the above strict inequality leads to λ > 0. Again, taking
(x, φ(x)) ∈ epi φ in the condition (2.30) yields
φ(x) ≥ h(x), ∀ x ∈ Rn ,
φ(x) ≥ (h + µh̄)(x), ∀ x ∈ Rn ,
which implies the affine function (h + µh̄) is majorized by φ. As h(x̄) > 0, for
µ sufficiently large, (h + µh̄)(x̄) > γ, thereby establishing the result.
Denote the set of all affine functions by H. Consider the support set of φ
denoted by supp(φ, H), which is the collection of all affine functions majorized
by φ, that is,
which is actually the support function to the set F . Therefore, δF∗ = σF for
any set F .
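For instance (an illustration of our own), take F = B, the closed unit ball. Then
δB∗(ξ) = σB(ξ) = sup_{‖x‖≤1} ⟨ξ, x⟩ = ‖ξ‖,
that is, the conjugate of the indicator of the unit ball is the norm function, mirroring the earlier observation that ∂‖.‖(0) = B.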
Observe that the definitions of conjugate and biconjugate functions are
given for any arbitrary function. Below we present some properties of conju-
gate functions.
Because ξ1 and ξ2 are arbitrary, from the above inequality φ∗ is convex. Also,
as φ∗ is a pointwise supremum of the affine functions ⟨x, ·⟩ − φ(x), it is lsc.
As φ is a proper convex function, dom φ is a nonempty convex set in Rn ,
which by Proposition 2.14 (i) implies that ri dom φ is nonempty. Also, by
Proposition 2.82, for any x̄ ∈ ri dom φ, ∂φ(x̄) is nonempty. Suppose that
ξ ∈ ∂φ(x̄), which by Definition 2.77 of the subdifferential implies that
As φ(x̄) is finite, φ∗(ξ) is also finite, that is, ξ ∈ dom φ∗. Also, by the proper-
ness of φ and the definition of φ∗, it is obvious that φ∗(ξ) > −∞ for every
ξ ∈ Rn, thereby showing that φ∗ is a proper convex function.
Observe that φ∗ is lsc convex irrespective of the nature of φ but for φ∗ to
be proper, we need φ to be a proper convex function. Simply assuming φ to be
proper need not imply that φ∗ is proper. For instance, consider φ(x) = −x2 ,
which is a nonconvex proper function. Then φ∗ ≡ +∞ and hence not proper.
Next we state some conjugate rules that can be proved directly using the
definition of conjugate functions.
The readers are urged to verify these properties simply using Defini-
tion 2.101 of conjugate and biconjugate functions.
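As a concrete instance of these rules (a worked example we add), consider φ(x) = (1/2)‖x‖² on Rn. Then
φ∗(ξ) = sup_{x∈Rn} {⟨ξ, x⟩ − (1/2)‖x‖²} = (1/2)‖ξ‖²,
the supremum being attained at x = ξ. Hence φ∗ = φ and consequently φ∗∗ = φ, consistent with the biconjugation results discussed next.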
As discussed in Theorem 2.100, a convex function is the pointwise supremum
of affine functions, and the biconjugate of the function plays an important
role in this respect. Below we present a result that relates the biconjugate
with the support set. The proof is along the lines of Hiriart-Urruty and
Lemaréchal [63].
Proof. Let an affine function h be majorized by φ, that is, h(x) ≤ φ(x) for every
x ∈ Rn. Because an affine function can be expressed as h(x) = ⟨ξ, x⟩ − α for
some ξ ∈ Rn and α ∈ R,
⟨ξ, x⟩ − α ≤ φ(x), ∀ x ∈ Rn.
thereby yielding the desired result. From Definition 2.57 of the closed convex
function, φ∗∗ = cl co φ, as desired.
Combining Theorems 2.100 and 2.104 we have the following result for a
proper lsc convex function.
Observe that the above theorem holds when the function is lsc. But if φ is
only proper convex and not lsc, how is one to relate the function φ to its
biconjugate φ∗∗? The next result from Attouch, Buttazzo, and Michaille [3]
looks into this aspect.
Proposition 2.106 Consider a proper convex function φ : Rn → R̄. Assume
that φ admits a continuous affine minorant. Then φ∗∗ = cl φ. Consequently,
φ is lsc at x̄ ∈ Rn if and only if φ(x̄) = φ∗∗ (x̄).
Proof. By the Fenchel–Young inequality, Proposition 2.103 (iv),
φ(x) ≥ ⟨ξ, x⟩ − φ∗(ξ), ∀ x ∈ Rn,
which implies that h(x) = ⟨ξ, x⟩ − φ∗(ξ) belongs to supp(φ, H). By Defini-
tion 2.101 of the biconjugate function,
φ∗∗(x) = sup_{ξ∈Rn} {⟨ξ, x⟩ − φ∗(ξ)},
which leads to φ∗∗ being the upper envelope of the continuous affine minorants
of φ. Applying Proposition 2.102 to φ∗ , φ∗∗ is a proper lsc convex function
and thus,
φ∗∗ ≤ cl φ ≤ φ.
This inequality along with Proposition 2.103 (i) leads to
(φ∗∗ )∗∗ ≤ (cl φ)∗∗ ≤ φ∗∗ .
As φ∗∗ and cl φ are both proper lsc convex functions, by Theorem 2.105,
(φ∗∗ )∗∗ = φ∗∗ and (cl φ)∗∗ = cl φ,
thereby reducing the preceding inequality to
φ∗∗ ≤ cl φ ≤ φ∗∗.
Hence, φ∗∗ = cl φ, thereby establishing the first part of the result.
From Chapter 1, we know that closure of a function φ is defined as
cl φ(x̄) = lim inf φ(x),
x→x̄
is attained.
(iii) (Infimum Rule) Consider a family of proper functions φi : Rn → R̄,
i ∈ I, where I is an arbitrary index set, having a common affine minorant
and satisfying sup_{i∈I} φ∗i(ξ) < +∞ for some ξ ∈ Rn. Then
Proof. (i) From Definition 2.101 of the conjugate function and Definition 2.54
of the inf-convolution along with Proposition 1.7,
Taking the conjugate on both sides and again applying Proposition 2.106
yields the requisite condition,
cl φ1 + cl φ2 + . . . + cl φm = cl (φ1 + φ2 + . . . + φm ).
φ∗1(ξk1) + . . . + φ∗m(ξkm) ≤ α + 1/k, ∀ k ∈ N. (2.33)
By assumption, suppose that x̂ ∈ ∩_{i=1}^m ri dom φi. As φi, i = 1, 2, . . . , m, are
convex, by Theorem 2.69, the functions are continuous at x̂. Therefore, for
some ε > 0 and Mi ∈ R, i = 1, 2, . . . , m,
ξ = ξ1 + ξ2 + . . . + ξm .
as desired.
(iv) Replacing φi by φ∗i for i = 1, 2, . . . , m in (iii),
sup_{i∈I} φ∗∗i = (inf_{i∈I} φ∗i)∗.
that is,
which along with the Fenchel–Young inequality, Proposition 2.103 (iv), re-
duces to the desired condition
that is,
that is,
φ∗(ξ) − φ∗(ξ̄) ≥ ⟨ξ − ξ̄, x̄⟩, ∀ ξ ∈ Rn.
By the definition of subdifferential, x̄ ∈ ∂φ∗(ξ̄). The converse can be worked
out along the lines of the previous part, thus establishing the desired relation.
As an application of the above theorem, consider a closed convex cone
K ⊂ Rn. We claim that ξ̄ ∈ ∂δK(x̄) if and only if x̄ ∈ ∂δK°(ξ̄). Suppose that
ξ̄ ∈ ∂δK(x̄) = NK(x̄), which is equivalent to
⟨ξ̄, x − x̄⟩ ≤ 0, ∀ x ∈ K.
In particular, taking x = 0 and x = 2x̄, respectively, implies that ⟨ξ̄, x̄⟩ = 0.
Therefore, the above inequality reduces to
⟨ξ̄, x⟩ ≤ 0, ∀ x ∈ K,
which by Definition 2.30 implies that ξ̄ ∈ K°. Thus, ξ̄ ∈ NK(x̄) is equivalent
to
x̄ ∈ K, ξ̄ ∈ K°, ⟨ξ̄, x̄⟩ = 0.
For a closed convex cone K, by Proposition 2.31, K°° = K. As x̄ ∈ K = K°°,
⟨ξ, x̄⟩ ≤ 0, ∀ ξ ∈ K°.
Because ⟨ξ̄, x̄⟩ = 0, the above condition is equivalent to
⟨ξ − ξ̄, x̄⟩ ≤ 0, ∀ ξ ∈ K°,
which implies that x̄ ∈ NK°(ξ̄), thereby proving our claim.
2.6 ε-Subdifferential
In Subsection 2.3.3 on differentiability properties of a convex function, from
Proposition 2.82 and the examples preceding it, we noticed that ∂φ(x) may
turn out to be empty, even though x ∈ dom φ. To overcome this aspect of
subdifferentials, the concept of the ε-subdifferential came into existence; it not
only overcomes the drawback of subdifferentials but is also important from
the optimization point of view. The idea can be found in the work of Brønsted
and Rockafellar [19] but the theory of ε-subdifferential calculus was given by
Hiriart-Urruty [58].
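As a simple illustration of the definition (a worked example we add; recall that ξ ∈ ∂ε φ(x̄) means φ(x) ≥ φ(x̄) + ⟨ξ, x − x̄⟩ − ε for every x ∈ Rn), take φ(x) = x² on R and x̄ = 0. Then ξ ∈ ∂ε φ(0) if and only if x² ≥ ξx − ε for every x ∈ R, that is, ε ≥ sup_x {ξx − x²} = ξ²/4. Hence
∂ε φ(0) = [−2√ε, 2√ε],
a nonempty compact interval that shrinks to ∂φ(0) = {0} as ε ↓ 0.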
Proof. Observe that for x̄ ∈ dom φ and ε > 0, φ(x̄) − ε < φ(x̄), which
implies (x̄, φ(x̄) − ε) ∉ epi φ. Because φ is lsc and convex, by Theorem 1.9 and
Proposition 2.48, epi φ is a closed convex set in Rn × R. Therefore, applying the
Strict Separation Theorem, Theorem 2.26 (iii), there exists (ξ, γ) ∈ Rn × R
with (ξ, γ) ≠ (0, 0) such that
As (x, φ(x)) ∈ epi φ for every x ∈ dom φ, the above condition leads to
Conversely, consider ξ ∈ ∂ε φ(x̄) for every ε > 0, which implies that for
every x ∈ Rn ,
φ(x) − φ(x̄) ≥ hξ, x − x̄i − ε, ∀ ε > 0.
As the preceding inequality holds for every ε > 0, taking the limit as ε → 0
leads to
φ(x) − φ(x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn ,
thereby yielding ξ ∈ ∂φ(x̄). Because ξ was arbitrary, the reverse inclusion is
satisfied, that is,
∂φ(x̄) ⊃ ∩_{ε>0} ∂ε φ(x̄),
as desired.
Conversely, suppose that the inequality holds, which by the definition of
conjugate function implies that
φ(x̄) − φ(xλ ) ≤ ε.
|φ(xλ ) − φ(x̄)| ≤ ε.
that is,
By applying Ekeland's Variational Principle, Theorem 2.113, to φ − ⟨ξ, ·⟩ with
λ = √ε, there exists xε ∈ Rn such that ‖xε − x̄‖ ≤ √ε, and
φ(xε) − ⟨ξ, xε⟩ ≤ φ(x) − ⟨ξ, x⟩ + √ε ‖x − xε‖, ∀ x ∈ Rn. (2.38)
Theorem 2.115 (Sum Rule) Consider two proper convex functions φi : Rn → R̄,
i = 1, 2, such that ri dom φ1 ∩ ri dom φ2 ≠ ∅. Then for ε > 0,
∂ε(φ1 + φ2)(x̄) = ∪_{ε1≥0, ε2≥0, ε1+ε2=ε} ( ∂ε1 φ1(x̄) + ∂ε2 φ2(x̄) ).
By the Sum Rule of the conjugate function, Theorem 2.107 (ii), as the as-
sumption ri dom φ1 ∩ ri dom φ2 6= ∅ holds,
(φ∗1(ξ1) + φ1(x̄) − ⟨ξ1, x̄⟩) + (φ∗2(ξ2) + φ2(x̄) − ⟨ξ2, x̄⟩) ≤ ε.
For the ε-subdifferential Max-Function Rule, we will need the Scalar Prod-
uct Rule that we present below.
φ(x) − φ(x̄) ≥ ⟨ξ/λ, x − x̄⟩ − ε/λ, ∀ x ∈ Rn,
which implies ξ/λ ∈ ∂ε̃ φ(x̄), where ε̃ = ε/λ, that is, ξ ∈ λ∂ε̃ φ(x̄). Because
ξ ∈ ∂ε(λφ)(x̄) was arbitrary,
Conversely, suppose that ξ ∈ λ∂ε̃ φ(x̄) for λ > 0, which implies there exists
ξ̃ ∈ ∂ε̃ φ(x̄) such that ξ = λξ̃. By the definition of ε-subdifferential,
φ(x) − φ(x̄) ≥ ⟨ξ̃, x − x̄⟩ − ε̃, ∀ x ∈ Rn,
which implies
By the relation between the ε-subdifferential and the conjugate function, The-
orem 2.112, as ξ ∈ ∂ε φ(x),
which along with (2.43) and Theorem 2.112 implies that ξ ∈ ∂ε φ(x), thereby
completing the proof.
Remark 2.119 In the above result, applying the Scalar Product Rule, The-
orem 2.117, to the condition (2.41) implies that ξ̃i = λiξi ∈ ∂εi(λiφi)(x) pro-
vided λi > 0. Therefore, ξ ∈ ∂ε φ(x) is such that there exist ξ̃i ∈ ∂εi(λiφi)(x),
i = 1, 2, . . . , p, satisfying
ξ = ∑_{i=1}^p ξ̃i and ∑_{i=1}^p εi + φ(x) − ∑_{i=1}^p λiφi(x) ≤ ε.
Therefore,
To establish the result, we shall prove the reverse containment in the above
condition. Suppose that ξ̄ ∈ ∂(φ1 + φ2)(x̄). By Theorem 2.108,
(φ1 + φ2)(x̄) + (φ1 + φ2)∗(ξ̄) = ⟨ξ̄, x̄⟩,
which along with the Fenchel–Young inequality, Proposition 2.103 (iv), implies
that
(φ1 + φ2)(x̄) + (φ1 + φ2)∗(ξ̄) ≤ ⟨ξ̄, x̄⟩.
Applying the Sum Rule of conjugate functions, Theorem 2.107 (ii), to proper
lsc convex functions φ1 and φ2 leads to
Define
ξ̄ ∈ cl {ξ ∈ Rn : φ(ξ) ≤ α + ε/2}.
φ(ξ) − α = inf_{ξ=ξ1+ξ2} {φ∗1(ξ1) + φ∗2(ξ2) − ⟨ξ1, x̄⟩ − ⟨ξ2, x̄⟩ + φ1(x̄) + φ2(x̄)}
= inf_{ξ=ξ1+ξ2} {(φ∗1(ξ1) − ⟨ξ1, x̄⟩ + φ1(x̄)) + (φ∗2(ξ2) − ⟨ξ2, x̄⟩ + φ2(x̄))}.
(φ∗1(ξ1) − ⟨ξ1, x̄⟩ + φ1(x̄)) + (φ∗2(ξ2) − ⟨ξ2, x̄⟩ + φ2(x̄)) < ε.
which along with Definition 2.101 of the conjugate and the preceding condi-
tions imply that
which implies ξ̄ ∈ cl (∂ε φ1(x̄) + ∂ε φ2(x̄)) for every ε > 0. As ξ̄ ∈ ∂(φ1 + φ2)(x̄)
was arbitrary,
∂(φ1 + φ2)(x̄) ⊂ ∩_{ε>0} cl (∂ε φ1(x̄) + ∂ε φ2(x̄)),
Proof. Denote
F = ∪_{ε≥0} {(ξ, ⟨ξ, x̄⟩ − φ(x̄) + ε) : ξ ∈ ∂ε φ(x̄)}.
⟨ξ, x⟩ − φ(x) ≤ α, ∀ x ∈ Rn.
Theorem 2.123 (i) (Sum Rule) Consider proper lsc convex functions
φi : Rn → R̄, i = 1, 2, . . . , m. Then
(iv) Consider a proper lsc convex function φ : Rn → R. Then for every λ > 0,
epi (λφ)∗ = λ epi φ∗ .
which implies
Hence, ∪_{λ∈Rm+} epi (λΦ)∗ is a cone.
Now consider (ξi, αi) ∈ ∪_{λ∈Rm+} epi (λΦ)∗, i = 1, 2, which implies there
exist λi ∈ Rm+ such that (ξi, αi) ∈ epi (λiΦ)∗ for i = 1, 2. Therefore, by the
definition of conjugate function, for every x ∈ Rn,
Because (ξi, αi), i = 1, 2, were arbitrary, ∪_{λ∈Rm+} epi (λΦ)∗ is a convex
set.
(iv) Suppose that (ξ, α) ∈ epi (λφ)∗, which implies that (λφ)∗(ξ) ≤ α. As
λ > 0, Proposition 2.103 (iii) leads to
φ∗(ξ/λ) ≤ α/λ,
which implies (ξ/λ, α/λ) ∈ epi φ∗, that is, (ξ, α) ∈ λ epi φ∗. Because
(ξ, α) ∈ epi (λφ)∗ was arbitrary, epi (λφ)∗ ⊂ λ epi φ∗. The reverse inclusion
can be obtained by following the steps backwards, thereby establishing the
result.
From the above theorem, for two proper lsc convex functions φi : Rn → R̄,
i = 1, 2,
In general, epi φ∗1 + epi φ∗2 need not be closed. But under certain additional
conditions, it can be shown that epi φ∗1 + epi φ∗2 is closed. We present below
the result from Burachik and Jeyakumar [20] and Dinh, Goberna, López, and
Son [32] to establish the same.
cl (epi φ∗1 + epi φ∗2) = epi (φ1 + φ2)∗ = epi (φ∗1 □ φ∗2) = epi φ∗1 + epi φ∗2,
thereby yielding the result that epi φ∗1 + epi φ∗2 is closed.
Suppose that φ1 is continuous at x̂ ∈ dom φ1 ∩ dom φ2 , which yields
This implies that cone (dom φ1 − dom φ2 ) is a closed subspace and thus leads
to the desired result.
Note that the result gives only a sufficient condition for the closedness of
epi φ∗1 + epi φ∗2 . The converse need not be true. For a better understanding,
we consider the following example from Burachik and Jeyakumar [20]. Let
φ1 = δ[0,∞) and φ2 = δ(−∞,0] . Therefore,
which leads to epi φ∗1 + epi φ∗2 = R × R+ , a closed convex cone. Observe that
cone(dom φ1 − dom φ2 ) = [0, ∞), which is not a subspace, and also neither
φ1 nor φ2 are continuous at dom φ1 ∩ dom φ2 = {0}. Thus, the condition,
epi φ∗1 + epi φ∗2 is closed, is a relaxed condition in comparison to the other
assumptions.
Using this closedness assumption, Burachik and Jeyakumar [21] obtained
an equivalence between the exact inf-convolution and ε-subdifferential Sum
Rule, which we present next.
Proof. (i) ⇒ (ii): The proof follows along the lines of Theorem 2.115.
(ii) ⇒ (iii): Suppose that (ξ, γ) ∈ cl (epi φ∗1 + epi φ∗2 ). By Theorem 2.123 (i),
(ξ, γ) ∈ epi (φ1 + φ2 )∗ . Let x̄ ∈ dom φ1 ∩ dom φ2 . By Theorem 2.122, there
exists ε ≥ 0 such that
ξ = ξ1 + ξ2 and ε = ε1 + ε2.
epi (φ1 + φ2 )∗ = cl (epi φ∗1 + epi φ∗2 ) = epi φ∗1 + epi φ∗2 ,
which implies (ξ, (φ1 + φ2 )∗ (ξ)) ∈ epi φ∗1 + epi φ∗2 . Thus for i = 1, 2, there
exist (ξi , γi ) ∈ epi φ∗i such that
ξ = ξ1 + ξ2 and (φ1 + φ2)∗(ξ) = γ1 + γ2,
Therefore,
(φ∗1 □ φ∗2)(ξ) ≤ φ∗1(ξ − ξ̄) + φ∗2(ξ̄) ≤ (φ1 + φ2)∗(ξ),
which along with the preceding condition leads to the exact infimal convolu-
tion, thereby establishing (i).
Though it is obvious that under the closedness of epi φ∗1 + epi φ∗2 , one
can obtain the subdifferential Sum Rule by choosing ε = 0 in (ii) of the above
theorem, we present a detailed version of the result from Burachik and Jeyaku-
mar [20]. Below is an alternative approach to the Sum Rule, Theorem 2.91.
To prove the result, we shall prove the converse inclusion. Suppose that
ξ ∈ ∂(φ1 + φ2 )(x̄). By Theorem 2.108,
(ξ, ⟨ξ, x̄⟩ − (φ1 + φ2)(x̄)) ∈ epi (φ1 + φ2)∗ = epi φ∗1 + epi φ∗2,
which implies that there exist (ξi , γi ) ∈ epi φ∗i , i = 1, 2, such that
Also, (ξi, γi) ∈ epi φ∗i, i = 1, 2, which along with the above conditions leads
to
φ∗1(ξ1) + φ∗2(ξ2) ≤ ⟨ξ, x̄⟩ − (φ1 + φ2)(x̄) = ⟨ξ1, x̄⟩ + ⟨ξ2, x̄⟩ − φ1(x̄) − φ2(x̄).
φ∗1(ξ1) + φ∗2(ξ2) ≥ ⟨ξ1, x̄⟩ + ⟨ξ2, x̄⟩ − φ1(x̄) − φ2(x̄),
φ∗1(ξ1) + φ∗2(ξ2) = ⟨ξ1, x̄⟩ + ⟨ξ2, x̄⟩ − φ1(x̄) − φ2(x̄).
φ∗1(ξ1) + φ1(x̄) − ⟨ξ1, x̄⟩ = ⟨ξ2, x̄⟩ − φ2(x̄) − φ∗2(ξ2) ≤ 0,
which by Theorem 2.108 yields ξ1 ∈ ∂φ1 (x̄). Along similar lines it can be
obtained that ξ2 ∈ ∂φ2 (x̄). Thus,
Proof. We know that for any convex set F ⊂ Rn , δF∗ = σF . Thus the condition
is equivalent to
∂(δF1 + δF2 )(x̄) = ∂δF1 (x̄) + ∂δF2 (x̄), x̄ ∈ dom δF1 ∩ dom δF2 .
Because δF1 + δF2 = δF1 ∩F2 , the above equality condition along with the fact
that for any convex set F ⊂ Rn , ∂δF = NF , yields the desired result, that is,
NF1 ∩F2 (x̄) = NF1 (x̄) + NF2 (x̄), ∀ x̄ ∈ F1 ∩ F2 .
3.1 Introduction
Recall the convex optimization problem presented in Chapter 1
min f(x) subject to x ∈ C, (CP)
where f : Rn → R is a convex function and C ⊂ Rn is a convex set. It is nat-
ural to think that f ′ (x, h) and ∂f (x) will play a major role in the process of
establishing the optimality conditions as these objects have been successful in
overcoming the difficulty posed by the absence of a derivative. In this chapter
we will not concern ourselves with extended-valued functions, although such a
framework can easily be adapted to the current one. But use of extended-
valued convex functions might appear while doing some of the proofs, as one
will need to use the calculus rules for subdifferentials like the Sum Rule or the
Chain Rule. To begin our discussion more formally, we right away state the
following basic result.
Therefore,
which implies
f ′ (x̄, x − x̄) ≥ 0, ∀ x ∈ C.
By Theorem 2.35, TC (x̄) = cl cone(C − x̄) and therefore, the above inequality
becomes
f ′ (x̄, d) ≥ 0, ∀ d ∈ TC (x̄).
min (f + δC )(x).
x∈Rn
0 ∈ ∂(f + δC )(x̄).
0 ∈ ∂f (x̄) + NC (x̄).
Conversely, suppose that condition (ii) is satisfied, which means that there
exists ξ ∈ ∂f (x̄) such that −ξ ∈ NC (x̄), that is,
hξ, x − x̄i ≥ 0, ∀ x ∈ C.
Then the convexity of f along with the above inequality yields
f (x) ≥ f (x̄), ∀ x ∈ C,
thereby leading to the desired result.
By condition (ii) of the above theorem, there exists ξ ∈ ∂f (x̄) such that
⟨−ξ, x⟩ ≤ ⟨−ξ, x̄⟩, ∀ x ∈ C.
As x̄ ∈ C, the above condition yields that the support function to the set C
at −ξ is given by
σC(−ξ) = −⟨ξ, x̄⟩.
Thus, condition (ii) is equivalent to the above condition.
Again, by condition (ii) of Theorem 3.1, there exists ξ ∈ ∂f (x̄) such that
−ξ ∈ NC (x̄), which can be equivalently expressed as
⟨(x̄ − αξ) − x̄, x − x̄⟩ ≤ 0, ∀ x ∈ C, ∀ α ≥ 0.
Therefore, by Proposition 2.52, condition (ii) is equivalent to
x̄ = projC (x̄ − αξ), ∀ α ≥ 0.
We state the above discussion as the following result.
Theorem 3.2 Consider the convex optimization problem (CP). Then x̄ is a
point of minimizer of (CP) if and only if there exists ξ ∈ ∂f(x̄) such that
either σC(−ξ) = −⟨ξ, x̄⟩ or x̄ = projC(x̄ − αξ) for every α ≥ 0.
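The fixed-point characterization of Theorem 3.2 is easy to verify numerically. The sketch below (our own illustration in Python with numpy; the function names are ours) minimizes f(x) = ‖x − a‖² over the box C = [0, 1]², for which projC is coordinatewise clipping, and checks that the minimizer satisfies x̄ = projC(x̄ − α∇f(x̄)) for several values of α ≥ 0:

import numpy as np

a = np.array([2.0, -0.5])

def grad_f(x):      # f(x) = ||x - a||^2, so grad f(x) = 2 (x - a)
    return 2.0 * (x - a)

def proj_C(x):      # projection onto the box C = [0, 1]^2
    return np.clip(x, 0.0, 1.0)

x_bar = proj_C(a)   # the minimizer of ||x - a||^2 over C is proj_C(a)

# Theorem 3.2: x_bar = proj_C(x_bar - alpha * grad_f(x_bar)) for all alpha >= 0
for alpha in [0.0, 0.1, 1.0, 10.0]:
    assert np.allclose(proj_C(x_bar - alpha * grad_f(x_bar)), x_bar)
print("x_bar =", x_bar)  # [1. 0.]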
Thus, we define
Proposition 3.3 Consider the set C as in (3.1). Assume that the active index
set at x̄, that is,
is nonempty. Let the Slater constraint qualification hold, that is, there exists
x̂ ∈ Rn such that gi (x̂) < 0, for i = 1, 2, . . . , m. Then
NC(x̄) = { ∑_{i∈I(x̄)} λi ξi ∈ Rn : ξi ∈ ∂gi(x̄), λi ≥ 0, i ∈ I(x̄) }.
Proof. Consider a sequence {xk } ⊂ K such that xk → x̃. To prove the result,
we need to show that x̃ ∈ K. As xk ∈ K, there exist λk ≥ 0 and ak ∈ A
such that xk = λk ak for every k. Because A is compact, {ak} is a bounded
sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, {ak} has a
convergent subsequence. As 0 ∉ A and A is compact, there exists α > 0 such
that ‖ak‖ ≥ α for every k, and hence
|λk| = |λk| ‖ak‖ / ‖ak‖ ≤ (1/α) ‖λk ak‖.
xk = λk ak → λ̃ã as k → +∞.
Lemma 3.5 Assume that I(x̄) is nonempty and the Slater constraint quali-
fication holds. Then the set Ŝ(x̄) given by (3.2) is a closed convex cone.
(λ1i/(λ1i + λ2i)) ξ1i + (λ2i/(λ1i + λ2i)) ξ2i ∈ ∂gi(x̄).
Hence, v1 + v2 ∈ Ŝ(x̄).
Finally, we have to show that Ŝ(x̄) is closed. Consider the function
We claim that Ŝ(x̄) = cone(∂g(x̄)), that is,
Ŝ(x̄) = {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(x̄)}. (3.4)
But before showing that Ŝ(x̄) is given as above and applying Proposition 3.4
to conclude that Ŝ(x̄) is closed, we first need to show that 0 ∉ ∂g(x̄).
As the Slater constraint qualification holds, there exists x̂ such that
gi (x̂) < 0 for every i = 1, 2, . . . , m. Hence g(x̂) < 0. By the convexity of
g,
0 ∈ {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(x̄)}.
Consider v ∈ Ŝ(x̄) with v ≠ 0. We will show that
v ∈ {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(x̄)}.
As v ∈ Ŝ(x̄), there exist λi ≥ 0 and ξi ∈ ∂gi(x̄), i ∈ I(x̄), such that
v = ∑_{i∈I(x̄)} λi ξi. Because v ≠ 0 and 0 ∉ ∂gi(x̄) for all i ∈ I(x̄), it is clear that
the λi, i ∈ I(x̄), cannot all be simultaneously zero and hence ∑_{i∈I(x̄)} λi > 0.
Let α = ∑_{i∈I(x̄)} λi and thus ∑_{i∈I(x̄)} λi/α = 1. Therefore,
(1/α) v = ∑_{i∈I(x̄)} (λi/α) ξi ∈ co ( ∪_{i∈I(x̄)} ∂gi(x̄) ),
Ŝ(x̄) ⊆ {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(x̄)},
where λ′i = λµi ≥ 0 for i ∈ I(x̄). Hence, v ∈ Ŝ(x̄). Because v was arbitrary,
(3.4) holds, which along with the fact that 0 ∉ ∂g(x̄) and Proposition 3.4
yields that Ŝ(x̄) is closed.
Remark 3.6 It may be noted here that Ŝ(x̄) is proved to be closed under
the Slater constraint qualification, which is equivalent to
0 ∉ co ∪_{i∈I(x̄)} ∂gi(x̄).
This observation was made by Wolkowicz [112]. In the absence of such condi-
tions, Ŝ(x̄) need not be closed.
Proof of Proposition 3.3. First we will prove that Ŝ(x̄) ⊆ NC(x̄). Consider
any v ∈ Ŝ(x̄). Thus, there exist λi ≥ 0 and ξi ∈ ∂gi(x̄) for i ∈ I(x̄) such that
v = ∑_{i∈I(x̄)} λi ξi. Hence, for any x ∈ C,
⟨v, x − x̄⟩ = ∑_{i∈I(x̄)} λi ⟨ξi, x − x̄⟩.
Define
Our first step is to show that K is nonempty. By the Slater constraint qual-
ification, there exists x̂ such that gi (x̂) < 0 for every i = 1, 2, . . . , m, and
corresponding to that x̂, set u = x̂ − x̄. By the convexity of each gi and
Theorem 2.79,
which implies
0 ∈ ∂f (x̄) + NC (x̄).
By combining it with Proposition 3.3, we can conclude that under the Slater
constraint qualification, x̄ is a point of minimizer of the convex programming
problem (CP ) with C given by (3.1) if and only if there exists λ̄ ∈ Rm + such
that
0 ∈ ∂f(x̄) + ∑_{i∈I(x̄)} λ̄i ∂gi(x̄).
The above inequalities along with (3.6) yield that for every x ∈ Rn,
f(x) − f(x̄) + ∑_{i=1}^m λi (gi(x) − gi(x̄)) ≥ ⟨ξ0, x − x̄⟩ + ∑_{i=1}^m λi ⟨ξi, x − x̄⟩ = 0. (3.7)
f (x) ≥ f (x̄), ∀ x ∈ C.
Theorem 3.7 Consider the convex programming problem (CP ) with C given
by (3.1). Assume that the Slater constraint qualification holds. Then x̄ is a
point of minimizer of (CP) if and only if there exist λ̄i ≥ 0, i = 1, 2, . . . , m,
such that
0 ∈ ∂f(x̄) + ∑_{i=1}^m λ̄i ∂gi(x̄) and λ̄i gi(x̄) = 0, i = 1, 2, . . . , m.
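These conditions can also be inspected numerically. The sketch below (our own illustration, assuming the cvxpy package is available) solves min ‖x‖² subject to g(x) = 1 − x₁ − x₂ ≤ 0, for which the Slater constraint qualification clearly holds, and recovers the multiplier λ̄ = 1 at the minimizer x̄ = (1/2, 1/2):

import cvxpy as cp
import numpy as np

x = cp.Variable(2)
g = 1 - x[0] - x[1]                       # g(x) <= 0; Slater holds, e.g., at (1, 1)
problem = cp.Problem(cp.Minimize(cp.sum_squares(x)), [g <= 0])
problem.solve()

x_bar = x.value                           # approximately (0.5, 0.5)
lam = problem.constraints[0].dual_value   # KKT multiplier, approximately 1.0

# KKT: 0 = grad f(x_bar) + lam * grad g(x_bar), with grad f = 2x, grad g = (-1, -1)
print(2 * x_bar + lam * np.array([-1.0, -1.0]))  # approximately (0, 0)
print(lam * g.value)                             # complementary slackness, ~0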
Remark 3.8 In Proposition 3.3, we have seen how to compute the normal
cone when the convex inequality constraints need not be smooth. Now if gi ,
i = 1, 2, . . . , m, are differentiable and the Slater constraint qualification holds,
then from Proposition 3.3
NC(x̄) = {v ∈ Rn : v = ∑_{i∈I(x̄)} λi ∇gi(x̄), λi ≥ 0, ∀ i ∈ I(x̄)}. (3.8)
Hence, any v ∈ NC (x̄) belongs to the set on the right-hand side. One can
simply check that any element on the right-hand side is also an element in the
normal cone. From (3.8), it is simple to see that NC (x̄) is a finitely generated
cone with {∇gi (x̄) : i ∈ I(x̄)} being the set of generators. Thus, NC (x̄) is
polyhedral when the gi , i = 1, 2, . . . , m, are differentiable and the Slater
constraint qualification holds.
Is the normal cone also polyhedral if the Slater constraint qualification
holds but the constraint functions gi, i = 1, 2, . . . , m, are not differentiable?
What is seen from Proposition 3.3 is that in the case of nondifferentiable
constraints, NC(x̄) can be represented as
NC(x̄) = { ∑_{i∈I(x̄)} λi ξi ∈ Rn : λi ≥ 0, ξi ∈ ∂gi(x̄), i ∈ I(x̄) }
= ∪_{ξi∈∂gi(x̄)} { ∑_{i∈I(x̄)} λi ξi ∈ Rn : λi ≥ 0, i ∈ I(x̄) },
FIGURE: The convex set C in R3 and the normal cone NC(x̄) at x̄ = (0, 0, 0).
Each of these is a convex function. It is simple to see that the Slater condition
holds. Just take the point x̂ = (0, 0, −1). It is also simple to see that the first
constraint is not differentiable at x̄ = (0, 0, 0). However, from the geometry,
It is easy to observe that this cone, which is also known as the second-order
cone, is not polyhedral as it has an infinite number of generators and hence is
not finitely generated.
C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}
It can be easily verified that S(x̄) is a closed convex cone. Also observe that
is always satisfied. So one may simply consider the reverse inclusion as the
Abadie constraint qualification. We will now compute (S(x̄))°. But before
we do that, let us convince ourselves through an example that the Abadie
C = {x ∈ R : |x| ≤ 0, x ≤ 0}.
Here, g1 (x) = |x|, g2 (x) = x and of course C = {0}. Let us set x̄ = 0. This
shows that TC (x̄) = {0}. Further, because both constraints are active at x̄,
Hence TC (x̄) = S(x̄), showing that the Abadie constraint qualification holds
while it is clear that the Slater constraint qualification does not hold.
Now we present the following result.
Proposition 3.9 (S(x̄))° = cl Ŝ(x̄).
is a convex cone from Lemma 3.5. Recall from the proof of Lemma 3.5 that
Ŝ(x̄) was shown to be closed under the Slater constraint qualification. In the
absence of the Slater constraint qualification, Ŝ(x̄) need not be closed. First
we show that cl Ŝ(x̄) ⊆ (S(x̄))°. Consider any v ∈ Ŝ(x̄), which implies there
exist λi ≥ 0 and ξi ∈ ∂gi(x̄) for i ∈ I(x̄) such that v = ∑_{i∈I(x̄)} λi ξi. Consider
any element w ∈ S(x̄), that is, g′i(x̄, w) ≤ 0 for i ∈ I(x̄). Hence for every
i ∈ I(x̄), by Theorem 2.79, ⟨ξi, w⟩ ≤ 0 for every ξi ∈ ∂gi(x̄), which implies
⟨ ∑_{i∈I(x̄)} λi ξi, w ⟩ ≤ 0,
that is, ⟨v, w⟩ ≤ 0. Because w ∈ S(x̄) was arbitrarily chosen, Ŝ(x̄) ⊆ (S(x̄))°,
which by the closedness of (S(x̄))° leads to cl Ŝ(x̄) ⊆ (S(x̄))°.
To complete the proof, we will establish the reverse inclusion, that is,
(S(x̄))° ⊆ cl Ŝ(x̄). On the contrary, assume that (S(x̄))° ⊄ cl Ŝ(x̄), which
implies there exists w ∈ (S(x̄))° with w ∉ cl Ŝ(x̄). As cl Ŝ(x̄) is a closed convex
cone, by the strict separation theorem, Theorem 2.26 (iii), there exists v ∈ Rn
with v ≠ 0 such that
will be violated. Thus, v ∈ (cl Ŝ(x̄))°. Further, observe that for i ∈ I(x̄),
ξi ∈ Ŝ(x̄), where ξi ∈ ∂gi(x̄). Therefore, ⟨v, ξi⟩ ≤ 0 for every ξi ∈ ∂gi(x̄),
i ∈ I(x̄), which implies that g′i(x̄, v) ≤ 0 for every i ∈ I(x̄). This shows that
v ∈ S(x̄) and therefore, ⟨v, w⟩ ≤ 0 because w ∈ (S(x̄))°. This leads to a
contradiction, thereby establishing the result.
The result below presents the KKT optimality conditions under the Abadie
constraint qualification.
Proof. If the Abadie constraint qualification holds at x̄, then using Proposi-
tion 3.9, the relation (3.9) holds.
Conversely, suppose that (3.9) holds at x̄. By the convexity of gi , i ∈ I(x̄),
for every ξi ∈ ∂gi (x̄),
For every v ∈ Ŝ(x̄), there exist λi ≥ 0 and ξi ∈ ∂gi(x̄) for i ∈ I(x̄) such that
v = ∑_{i∈I(x̄)} λi ξi. Therefore, by the above inequality,
⟨v, x − x̄⟩ = ∑_{i∈I(x̄)} λi ⟨ξi, x − x̄⟩ ≤ 0, ∀ x ∈ C,
which implies v ∈ NC(x̄). Thus Ŝ(x̄) ⊆ NC(x̄), which along with the fact that
NC(x̄) is closed implies that cl Ŝ(x̄) ⊆ NC(x̄). Therefore, (3.9) yields
0 ∈ ∂f (x̄) + NC (x̄)
and hence, by Theorem 3.1 (ii), x̄ is a point of minimizer of the convex pro-
gramming problem (CP ).
Thus, Ŝ(x̄) is a finitely generated cone and hence is closed. Therefore, it is
clear that when either Ŝ(x̄) is closed or the gi, i ∈ I(x̄), are smooth functions,
then under the Abadie constraint qualification, the standard KKT conditions
are satisfied.
Theorem 3.11 Let us consider the problem (CP 1). Assume the Slater-type
constraint qualification, that is, there exists x̂ ∈ ri X such that gi (x̂) < 0 for
i = 1, 2, . . . , m. Then the KKT optimality conditions are necessary as well as
sufficient at a point of minimizer x̄ of (CP 1) and are given as
0 ∈ ∂f(x̄) + ∑_{i=1}^m λi ∂gi(x̄) + NX(x̄) and λi gi(x̄) = 0, i = 1, 2, . . . , m.
min (f + δC + δX )(x).
x∈Rn
0 ∈ ∂(f + δC + δX )(x̄).
The fact that ri dom f = Rn along with the Slater-type constraint qualifica-
tion and Propositions 2.15 and 2.67 imply that x̂ ∈ ri dom f ∩ ri C ∩ ri X. In-
voking the Sum Rule, Theorem 2.91, along with the facts that ∂δC (x̄) = NC (x̄)
and ∂δX (x̄) = NX (x̄), the above relation leads to
that is,
⟨ξ0, x − x̄⟩ + ∑_{i=1}^m λi ⟨ξi, x − x̄⟩ ≥ 0, ∀ x ∈ X.
f (x) ≥ f (x̄), ∀ x ∈ X,
0 ∈ ∂f (x̄) + NC (x̄).
min −⟨v, x⟩ subject to Ax = b.
As the constraints are affine, the KKT optimality conditions for this problem
automatically hold, that is, there exists λ ∈ Rm such that
−v + AT λ = 0,
−AT λ ∈ ∂f (x̄).
Using the convexity of f , the above relation implies that x̄ is a point of mini-
mizer of (CP 2). This discussion can be stated as the following theorem.
Theorem 3.12 Consider the problem (CP 2). Then x̄ is a point of minimizer
of (CP 2) if and only if there exists λ ∈ Rm such that
−AT λ ∈ ∂f (x̄).
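When f is in addition differentiable, ∂f(x̄) = {∇f(x̄)} and Theorem 3.12 reduces to the linear system ∇f(x̄) + AT λ = 0, Ax̄ = b. A minimal Python sketch (our own, for the quadratic f(x) = (1/2)‖x‖² with illustrative data):

import numpy as np

# minimize (1/2)||x||^2 subject to Ax = b; here grad f(x) = x, so the
# conditions of Theorem 3.12 read: x + A^T lam = 0 and Ax = b.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])
m, n = A.shape

K = np.block([[np.eye(n), A.T],
              [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([np.zeros(n), b]))
x_bar, lam = sol[:n], sol[n:]

print(np.allclose(A @ x_bar, b))        # feasibility: A x_bar = b
print(np.allclose(-A.T @ lam, x_bar))   # -A^T lam equals grad f(x_bar) = x_bar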
where F (x) = max{f (x) − f (x̄), g1 (x), g2 (x), . . . , gm (x)}. Then by the uncon-
strained optimality condition, Theorem 2.89,
0 ∈ ∂F (x̄).
where λ̄i = λi/λ0, i ∈ I(x̄). Taking λ̄i = 0, i ∉ I(x̄), the above condition
becomes
0 ∈ ∂f(x̄) + ∑_{i=1}^m λ̄i ∂gi(x̄),
λ̄i gi (x̄) = 0, i = 1, 2, . . . , m,
thus yielding the desired conditions. The sufficiency part can be worked out
using the convexity of the functions, as done in the previous KKT optimality
theorems.
C = {x ∈ Rn : G(x) ∈ −S}.
min (f + δC + δX )(x).
and thus,
So our main concern now is to explicitly compute NC (x̄). How does one do
that? We have already observed that it is not so straightforward to compute
the normal cone when the inequality constraints are not smooth. Let us now
mention that in this case also we do not consider G to be differentiable. Thus
we shall now introduce the notion of a subdifferential of a cone convex function.
As G is an S-convex function, we call an m × n matrix A a subgradient
of G at x ∈ Rn
It was shown, for example, in Luc, Tan, and Tinh [78] that if G is an
S-convex function, then G is continuous on Rn . Further, G is also a locally
Lipschitz function on Rn ; that is, for any x0 ∈ Rn , there exists a neighborhood
N (x0 ) of x0 such that there exists Lx0 > 0 satisfying
Observe that Lx0 depends on the chosen x0 and is also called the Lipschitz
constant at x0 . Also, note that a locally Lipschitz vector function need not
be differentiable everywhere. For a locally Lipschitz function G, the Clarke
Jacobian of G at x is given as follows,
∂C G(x) = co {A ∈ Rm×n : A = lim_{k→∞} JG(xk), where xk → x, xk ∈ D},
where D denotes the set of points at which G is differentiable. One can observe
that if m = 1, the Clarke Jacobian reduces to the Clarke subdifferential.
The Clarke subdifferential is nonempty, convex, and compact. If x̄ is a local
minimum of φ over Rn , then 0 ∈ ∂ ◦ φ(x̄). It is important to note that this
condition is necessary but not sufficient. Now we state a calculus rule that
will be useful in our computation of the normal cone. The Sum Rule is from
Clarke [27].
The Chain Rule that we state is from Demyanov and Rubinov [30] (see also
Dutta [36]).
Observe that v ∈ NC (x̄) (in the current context of (CCP 1)) if and only if
x̄ is a point of minimizer of the problem
min −⟨v, x⟩ subject to G(x) ∈ −S. (NP)
For simplicity, assume that C = {x ∈ Rn : G(x) ∈ −S} is an n-dimensional
convex set. The approach to derive the necessary and sufficient condition for
optimality is due to Rockafellar [100] (see also Chapter 6 of Rockafellar and
Wets [101]). As the above problem is a convex programming problem, x̄ is a
global point of minimizer of (NP). Further, without loss of generality, we can
assume it to be unique. Observe that if we define
FIGURE 3.2: C ∩ Y.
Note that ū = G(x̄). Also, Sk is nonempty as (x̄, ū) ∈ Sk for each k. Because
Sk is nonempty for each k, solving (N̂P k) is the same as minimizing f̂k over Sk.
Denote the minimum of f over the compact set Y by µ. For any (x, u) ∈ Sk,
which implies
f(x) + (1/(2εk)) ‖G(x) − u‖² ≤ f(x̄),
µ + (1/(2εk)) ‖G(x) − u‖² ≤ f(x̄),
which leads to
‖G(x) − u‖ ≤ √(2εk (f(x̄) − µ)).
and as (xk , uk ) ∈ Sk ,
Hence, ũ = G(x̃) and thus (x̃, ũ) is also a minimizer of (N̂P). But as (x̄, ū) is
the unique point of minimizer of (N̂P), we have x̃ = x̄ and ũ = ū.
Because (xk, uk) is a point of minimizer of (N̂P k), it is a simple exercise to
see that xk minimizes f̂k(x, uk) over Y and uk minimizes f̂k(xk, u) over −S.
Hence,
One can easily compute ∇u f̂k(xk, uk) to see that yk = −∇u f̂k(xk, uk) and
hence yk ∈ N−S(uk). Moreover, applying the Sum Rule and the Chain Rule
for a locally Lipschitz function to (3.13), for each k,
0 ∈ −v + ∂C G(x̄)T ȳ + NY (x̄).
0 ∈ (1/‖yk‖)(−v) + ∂G(xk)T (yk/‖yk‖) + (1/‖yk‖) NY(xk),
which implies
0 ∈ (1/‖yk‖)(−v) + ∂G(xk)T wk + NY(xk), (3.16)
where wk = yk/‖yk‖. Hence, {wk} is a bounded sequence and thus by the
kyk k
Bolzano–Weierstrass Theorem, Proposition 1.3, has a convergent subsequence.
Without loss of generality, assume that wk → w̄ with kw̄k = 1. Hence from
(3.16),
0 ∈ ∂G(x̄)T w̄. (3.17)
As yk ∈ N−S(uk), we have wk ∈ N−S(uk). Again using the fact that the
normal cone map has a closed graph, w̄ ∈ N−S(ū). Hence,
⟨w̄, −G(x̄)⟩ ≤ 0,
that is,
⟨w̄, G(x̄)⟩ ≥ 0. (3.18)
Consider p ∈ −S. As S is a convex cone, by Theorem 2.20, G(x̄) + p ∈ −S.
Hence,
⟨w̄, p⟩ ≤ 0, ∀ p ∈ −S,
which implies
⟨w̄, G(y)⟩ ≥ 0, ∀ y ∈ Rn.
v = z T ȳ, where ȳ ∈ S + ,
The reader is now urged to write the necessary and sufficient optimality con-
ditions for the problem (CCP 1), as the structure of the normal cone to C at
x̄ is now known.
4.1 Introduction
In the previous chapter, the KKT optimality conditions were studied using the
normal cone as one of the main vehicles of expressing the optimality condi-
tions. One of the central issues in the previous chapter was the computation
of the normal cone at the point of the feasible set C where the set C was ex-
plicitly described by the inequality constraints. In this chapter our approach
to the KKT optimality condition will take us deeper into convex optimization
theory and also we can avoid the explicit computation of the normal cone.
This approach uses the saddle point condition of the Lagrangian function as-
sociated with (CP ). We motivate the issue using two-person-zero-sum games.
Consider a two-person-zero-sum game where we denote the players as
Player 1 and Player 2 having strategy sets X ⊂ Rn and Λ ⊂ Rm , respec-
tively, which we assume to be compact for simplicity. In each move of the
game, the players reveal their choices simultaneously. For every choice x ∈ X
by Player 1 and λ ∈ Λ by Player 2, an amount L(x, λ) is paid by Player 1 to
Player 2. Now Player 1 behaves in the following way. For any given choice of
strategy x ∈ X, he would like to know what the maximum amount he would
have to give to Player 2. In effect, he computes the function
φ(x) = max_{λ∈Λ} L(x, λ).
Similarly, Player 2 would naturally want to know what the guaranteed amount
he will receive once he makes a move λ ∈ Λ. This means he computes the
function
ψ(λ) = min_{x∈X} L(x, λ).
Of course he would like to maximize the amount of money he gets and therefore
solves the problem
Thus, in every game there are two associated optimization problems. The
minimization problem for Player 1 and the maximization problem for Player
2. In the optimization literature, the problem associated with Player 1 is
called the primal problem while that associated with Player 2 is called the dual
problem. Duality is a deep issue in modern optimization theory. In this chapter,
we will have quite a detailed discussion on duality in convex optimization. The
game is said to have a value if
The above relation is called the saddle point condition. It is easy to observe
that (x̄, λ̄) ∈ X × Λ is a saddle point if and only if
min_{x∈X} max_{λ∈Λ} L(x, λ) ≤ max_{λ∈Λ} L(x̄, λ) = min_{x∈X} L(x, λ̄) ≤ max_{λ∈Λ} min_{x∈X} L(x, λ),
which along with the minimax inequality yields the minimax equality.
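For a finite matrix game these quantities can be computed by brute force. The Python sketch below (our own illustration) takes mixed strategies on a grid for the payoff L(x, λ) = xᵀMλ of a 2 × 2 zero-sum game and confirms numerically that min max and max min coincide, so the game has a value:

import numpy as np

M = np.array([[1.0, -1.0],
              [-1.0, 1.0]])               # matching-pennies payoff matrix
ts = np.linspace(0.0, 1.0, 201)
X = np.column_stack([ts, 1 - ts])          # Player 1's mixed strategies
Lam = X.copy()                             # Player 2's mixed strategies

payoff = X @ M @ Lam.T                     # payoff[i, j] = L(x_i, lam_j)
minimax = payoff.max(axis=1).min()         # min_x max_lam L(x, lam)
maximin = payoff.min(axis=0).max()         # max_lam min_x L(x, lam)
print(minimax, maximin)                    # both ~0, attained at the (1/2, 1/2) strategies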
Before moving on to study the optimality of the convex programming
problem (CP ) via the saddle point approach, we state the Saddle Point The-
orem (Proposition 2.6.9, Bertsekas [12]) for which we will need the following
notations.
For each λ ∈ Λ, define the proper function φλ : Rn → R̄ as
φλ(x) = L(x, λ) for x ∈ X, and φλ(x) = +∞ otherwise,
{x ∈ X : L(x, λ̄) ≤ α}
{λ ∈ Λ : L(x̄, λ) ≥ α}
This proposition will play a pivotal role in the study of enhanced optimality
conditions in Chapter 5.
Our question is, can we construct a function like L(x, λ) for the convex
problem (CP) for which f(x) can be represented in the same way as φ(x)
has been represented through L(x, λ)? Note that if we remove the compactness
of Λ, then φ(x) could take the value +∞ for some x. It is quite surprising that for the objective
function f (x) of (CP ), such a function can be obtained by considering the
classical Lagrangian function from calculus.
For the problem (CP) with inequality constraints, we construct the La-
grangian function L : Rn × Rm+ → R as
L(x, λ) = f(x) + ∑_{i=1}^m λi gi(x)
with λ = (λ1, λ2, . . . , λm) ∈ Rm+. Observe that it is a simple matter to show
that
sup_{λ∈Rm+} L(x, λ) = f(x) if x is feasible, and +∞ otherwise.
Indeed, if x is feasible then λi gi(x) ≤ 0 for every i and the supremum is
attained at λ = 0, whereas if gi(x) > 0 for some i, then letting λi → +∞
drives L(x, λ) to +∞.
Here, the Lagrangian function L(x, λ) is playing the role of L(x, λ). So the
next pertinent question is, if we can solve (CP ) then does there exist a saddle
point for it? Does the existence of a saddle point for L(x, λ) guarantee that
a solution to the original problem (CP ) is obtained? The following theorem
answers the above questions. Recall the convex programming problem
min f (x) subject to x∈C (CP )
with C given by
Theorem 4.2 Consider the convex programming problem (CP ) with C given
by (4.1). Assume that the Slater constraint qualification holds, that is, there
exists x̂ ∈ Rn such that gi (x̂) < 0, i = 1, 2, . . . , m. Then x̄ is a point of
minimizer of (CP ) if and only if there exists λ̄ = (λ̄1 , λ̄2 , . . . , λ̄m ) ∈ Rm
+
satisfying the complementary slackness condition, that is, λ̄i gi (x̄) = 0 for
i = 1, 2, . . . , m and the saddle point condition
We leave it to the reader to prove that the set Λ is convex and open. It is clear
that (0, 0) ∉ Λ. Hence, by the Proper Separation Theorem, Theorem 2.26 (iv),
there exists (λ0, λ) ∈ R × Rm with (λ0, λ) ≠ (0, 0) such that
λ0 y0 + ∑_{i=1}^m λi yi ≥ 0, ∀ (y0, y) ∈ Λ. (4.2)
which implies
f(x̄) + ∑_{i=1}^m λi gi(x̄) ≤ f(x̄) = f(x̄) + ∑_{i=1}^m λ̄i gi(x̄),
that is,
Therefore,
f(x̄) + ∑_{i=1}^m λi gi(x̄) > f(x̄),
which implies L(x̄, λ) > L(x̄, λ̄), thereby contradicting the saddle point con-
dition. Hence, x̄ is feasible to (CP ).
Because L(x̄, λ̄) ≤ L(x, λ̄) and the complementary slackness condition is
satisfied,
f(x̄) ≤ f(x) + ∑_{i=1}^m λ̄i gi(x) ≤ f(x), ∀ x ∈ C.
The consequence of the saddle point criteria is simple. If (x̄, λ̄) is a saddle
point associated with the Lagrangian function of (CP ) where x̄ is a point of
minimizer of f over C, then
0 ∈ ∂x L(x̄, λ̄),
C = {(x1 , x2 ) ∈ R2 : x1 + x2 ≤ 0, − x1 ≤ 0}.
This set is described by affine inequalities. However, C = {(0, 0)} and hence
the Slater constraint qualification fails. The question is whether in such a sit-
uation the saddle point condition exists or not. What we show below is that
the presence of affine inequalities does not affect the saddle point condition.
In fact, we should only bother about the Slater constraint qualification for
the convex non-affine inequalities. The presence of affine inequalities by itself
is a constraint qualification. To the best of our knowledge, the first study
in this respect was due to Jaffray and Pomerol [65]. We present their result
establishing the saddle point criteria under a modified version of Slater con-
straint qualification using the separation theorem. For that we now consider
the feasible set C of the convex programming problem (CP ) defined by convex
non-affine and affine inequalities as
L(x, λ, µ) = f(x) + ∑_{i=1}^m λi gi(x) + ∑_{j=1}^l µj hj(x).
i=1 j=1
Then (x̄, λ̄, µ̄) is the saddle point of (CP ) with C given by (4.4) if
We shall now present the proof of Jaffray and Pomerol in a more detailed
and simplified manner.
Theorem 4.3 Consider (CP ) with C defined by (4.4). Assume that the mod-
ified Slater constraint qualification holds, that is, there exists x̂ ∈ Rn such that
gi(x̂) < 0, i = 1, 2, . . . , m, and hj(x̂) ≤ 0, j = 1, 2, . . . , l. Then x̄ is a point of
minimizer of (CP) if and only if there exist (λ̄, µ̄) ∈ Rm+ × Rl+ such that
hj (x) = 0, ∀ j ∈ J.
Otherwise, if for some x ∈ C and for some j ∈ J, hj (x) < 0, the maximality
of J is contradicted.
Define the Lagrange covers of (CP ) as
where Jc = {j ∈ {1, 2, . . . , l} : j ∉ J}.
We claim that the set Λ is convex. Consider (y0¹, y¹, z¹) and (y0², y², z²) in Λ
with x1 and x2 the respective associated elements from Rn. For any λ ∈ [0, 1],
x = λx1 + (1 − λ)x2 ∈ Rn. By the convexity of f and gi, i = 1, 2, . . . , m,
Consider (y0, y, z) ∈ Λ. For any α0 > 0 and α ∈ int Rm+, (y0 + α0, y + α, z) ∈ Λ.
Therefore, by (4.5), for i′ = 0, 1, . . . , m,
λi′ ≥ −(1/αi′) ( λ0 y0 + ∑_{i=1}^m λi yi + ∑_{j=1}^l µj zj + ∑_{i=0}^{i′−1} λi αi + ∑_{i=i′+1}^m λi αi ),
where γ > 0 is chosen such that µ̂j > 0 for j ∈ J. Also, observe that
∑_{j∈J} µ̂j hj(x) = ∑_{j∈J} µj hj(x) + γ ∑_{j∈J} αj hj(x) = ∑_{j∈J} µj hj(x).
Thus, the conditions (4.5) and (4.6) hold for (λ̂0 , λ̂, µ̂) as well.
We claim that λ̂0 , λ̂i , i = 1, 2, . . . , m, and µ̂j , j ∈ J c , are not all simul-
taneously zero. On the contrary, assume that λ̂0 = 0, λ̂i = 0, i = 1, 2, . . . , m,
which along with condition (4.5) and the modified Slater constraint qualifica-
tion leads to
0 > ∑_{i=1}^m λ̂i gi(x̂) + ∑_{j=1}^l µ̂j hj(x̂) ≥ 0,
where λ̄i = λ̂i/λ̂0, i = 1, 2, . . . , m, and µ̄j = µ̂j/λ̂0, j = 1, 2, . . . , l. Corresponding
to every x ∈ Rn,
By the feasibility of x̄ for (CP) and the fact that (λ̄, µ̄) ∈ Rm+ × Rl+, condition
(4.8) implies that
∑_{i=1}^m λi gi(x̄) + ∑_{j=1}^l µj hj(x̄) ≤ 0 = ∑_{i=1}^m λ̄i gi(x̄) + ∑_{j=1}^l µ̄j hj(x̄),
that is,
thereby leading to the desired result. The converse can be obtained in a
manner similar to Theorem 4.2.
In the convex programming problem (CP ) considered by Jaffray and
Pomerol [65], the problem involved only convex non-affine and affine inequal-
ities. Next we present a similar result from Florenzano and Van [47] to derive
the saddle point criteria under a modified version of Slater constraint quali-
fication but for a more general scenario involving additional affine equalities
and abstract constraints in (4.4). Consider the feasible set C of the convex
programming problem (CP ) as
C = {x ∈ X : gi (x) ≤ 0, i = 1, 2, . . . , m, hj (x) ≤ 0, j = 1, 2, . . . , s,
hj (x) = 0, j = s + 1, s + 2, . . . , l}, (4.9)
L(x, λ, µ) = f(x) + ∑_{i=1}^m λi gi(x) + ∑_{j=1}^l µj hj(x),
where µ = (µ̂, µ̃) ∈ Rs+ × Rl−s. Then (x̄, λ̄, µ̄) is called the saddle point of the
above problem if
where µ = (µ̂, µ̃) and µ̄ = (µ̂̄, µ̃̄) are in Rs+ × Rl−s.
Theorem 4.4 Consider the convex programming problem (CP ) with C de-
fined by (4.9). Let x̄ be a point of minimizer of (CP ). Assume that there exists
x̂ ∈ ri X such that
hj (x̂) ≤ 0, j = 1, 2, . . . , s,
hj (x̂) = 0, j = s + 1, s + 2, . . . , l.
λ0 f(x̄) ≤ λ0 f(x) + ∑_{i=1}^m λi gi(x) + ∑_{j=1}^l µj hj(x), ∀ x ∈ X,
It can be easily shown as in the proof of Theorem 4.3 that Λ is a convex set.
Λ ∩ (R−^{1+m+s} × {0_{R^{l−s}}}) = ∅.
ri Λ ∩ ri (R−^{1+m+s} × {0_{R^{l−s}}}) = ∅.
Invoking the Proper Separation Theorem, Theorem 2.26 (iv), there exists
(λ0 , λ, µ) ∈ R1+m+l with (λ0 , λ, µ) 6= (0, 0, 0) such that
λ0 y0 + ∑_{i=1}^m λi yi + ∑_{j=1}^l µj zj ≥ λ0 w0 + ∑_{i=1}^m λi wi + ∑_{j=1}^s µj vj (4.10)
µj′ ≥ (1/vj′) { λ0 y0 + ∑_{i=1}^m λi yi + ∑_{j=1}^l µj zj − λ0 w0 − ∑_{i=1}^m λi wi − ∑_{j=1}^{j′−1} µj vj − ∑_{j=j′+1}^s µj vj }.
By the given hypothesis, x̂ ∈ ri X, which along with the above inequality
implies that
∑_{j=1}^l µj hj(x̂) = 0,
that is, the affine function ∑_{j=1}^l µj hj(·) achieves its minimum over X at a
relative interior point. Because a nonconstant affine function can achieve its
minimum only at a boundary point, ∑_{j=1}^l µj hj(·) has the constant value
zero over X, that is,
∑_{j=1}^l µj hj(x) = 0, ∀ x ∈ X. (4.13)
such that
λ0 f(x̄) ≤ λ0 f(x) + ∑_{i=1}^m λi gi(x) + ∑_{j=1}^l µj hj(x), ∀ x ∈ X (4.14)
and
λi gi (x̄) = 0, i = 1, 2, . . . , m, and µj hj (x̄) = 0, j = 1, 2, . . . , s. (4.15)
We claim that λ0 ≠ 0. On the contrary, suppose that λ0 = 0. Because
(λ0, λ) ≠ (0, 0), λ ≠ 0. Therefore, the optimality condition (4.14) becomes
∑_{i=1}^m λi gi(x) + ∑_{j=1}^l µj hj(x) ≥ 0, ∀ x ∈ X.
In particular, for x = x̂, the above condition along with the modified Slater
constraint qualification leads to
0 > ∑_{i=1}^m λi gi(x̂) + ∑_{j=1}^l µj hj(x̂) ≥ 0,
where λ̄i = λi/λ0, i = 1, 2, . . . , m, and µ̄j = µj/λ0, j = 1, 2, . . . , l. This inequality
along with the condition (4.15) leads to
∑_{i=1}^m λi gi(x̄) + ∑_{j=1}^l µj hj(x̄) ≤ 0 = ∑_{i=1}^m λ̄i gi(x̄) + ∑_{j=1}^l µ̄j hj(x̄),
which leads to
thereby proving the desired saddle point result. The converse can be worked
out as in Theorem 4.2.
Observe that the saddle point condition in the above theorem
can be rewritten as
f(x̄) + ∑_{i=1}^m λ̄i gi(x̄) + ∑_{j=1}^s µ̂j hj(x̄) + ∑_{j=s+1}^l µ̃j hj(x̄) + δX(x̄)
≤ f(x) + ∑_{i=1}^m λ̄i gi(x) + ∑_{j=1}^s µ̂j hj(x) + ∑_{j=s+1}^l µ̃j hj(x) + δX(x)
nonempty. Applying the Sum Rule, Theorem 2.91 along with the fact that
∂δX (x̄) = NX (x̄) yields the KKT optimality condition
0 ∈ ∂f(x̄) + ∑_{i=1}^m λi ∂gi(x̄) + ∑_{j=1}^s µ̂j ∂hj(x̄) + ∂(∑_{j=s+1}^l µ̃j hj)(x̄) + NX(x̄).
along with
λi gi (x̄) = 0, i = 1, 2, . . . , m, and µ̂j hj (x̄) = 0, j = 1, 2, . . . , s.
where Ω = Rm+ × Rs+ × Rl−s. Taking a clue from the two-person-zero-sum
games, the dual problem to (CP), which we denote by (DP), can be stated as
that is,
sup_{(λ,µ̂,µ̃)∈Ω} inf_{x∈X} L(x, λ, µ̂, µ̃) = inf_{x∈C} sup_{(λ,µ̂,µ̃)∈Ω} L(x, λ, µ̂, µ̃).
The statement (4.16) is known as strong duality. We now present a result that
shows when strong duality holds.
Theorem 4.7 Consider the problem (CP ) where the set C is defined by (4.9).
Assume that (CP ) has a lower bound, that is, it has an infimum value, vL ,
that is finite. Also, assume that the modified Slater constraint qualification is
satisfied. Then the dual problem (DP ) has a supremum and the supremum is
attained with
vL = d L .
vL = inf_{x∈C} f(x).
Working along the lines of the proof of Theorem 4.4, we conclude from (4.12)
that there exists nonzero (λ̄0, λ̄, µ̂̄, µ̃̄) ∈ R+ × Rm+ × Rs+ × Rl−s such that
λ̄0 (f(x) − vL) + ∑_{i=1}^m λ̄i gi(x) + ∑_{j=1}^s µ̂̄j hj(x) + ∑_{j=s+1}^l µ̃̄j hj(x) ≥ 0, ∀ x ∈ X.
Therefore,
L(x, λ̄, µ̂̄, µ̃̄) ≥ vL, ∀ x ∈ X,
that is, w(λ̄, µ̂̄, µ̃̄) ≥ vL. Hence,
sup_{(λ,µ̂,µ̃)∈Ω} w(λ, µ̂, µ̃) ≥ w(λ̄, µ̂̄, µ̃̄) ≥ vL.
where
w(λ) = inf_{x∈R2} { e^{x2} + λ(√(x1² + x2²) − x1) }, λ ≥ 0.
Observe that the only feasible point of the primal problem is (x1, x2) = (0, 0)
and hence inf e^{x2} = e^0 = 1. Thus, the minimum value or the infimum value
of the primal problem is vL = 1. Now let us evaluate the function w(λ) for
each λ ≥ 0. Observe that for every fixed x2, the term (√(x1² + x2²) − x1) → 0
as x1 → +∞. Thus, for each x2, the value e^{x2} dominates the expression
e^{x2} + λ(√(x1² + x2²) − x1).
By letting x2 → −∞,
w(λ) = 0, ∀ λ ≥ 0.
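This duality gap (vL = 1 while the dual optimal value is 0) can be seen numerically. The Python sketch below (our own illustration) evaluates the Lagrangian along the infeasible curve x1 = k⁴, x2 = −k, where x1 grows much faster than |x2|, and watches it tend to 0 for a fixed λ:

import numpy as np

lam = 1.0   # any fixed lambda >= 0 exhibits the same limit

for k in [1.0, 2.0, 5.0, 10.0, 20.0]:
    x1, x2 = k ** 4, -k
    penalty = np.hypot(x1, x2) - x1        # sqrt(x1^2 + x2^2) - x1 -> 0
    L = np.exp(x2) + lam * penalty
    print(k, L)                            # decreases toward 0, so w(lam) = 0 < vL = 1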
We are now going to present some deeper properties of the dual variables
(or Lagrange multipliers) for the problem (CP ) with convex non-affine in-
equality, that is, the feasible set C is given by (4.1),
C = {x ∈ X : gi (x) ≤ 0, i = 1, 2, . . . , m}.
It is quite natural to think that when we change x̄, the set of multipliers will
also change. We now show that for a convex programming problem, the set
M(x̄) does not depend on the solution x̄. Consider the set
M = {λ ∈ Rm+ : inf_{x∈C} f(x) = inf_{x∈Rn} L(x, λ)}. (4.17)
In the following result we show that M(x̄) = M for any solution x̄ of (CP ).
The proof of this fact is from Attouch, Buttazzo, and Michaille [3].
Theorem 4.9 Consider the convex programming problem (CP ) with C de-
fined by (4.1). Let x̄ be the point of minimizer of (CP ). Then M(x̄) = M.
0 ∈ ∂x L(x̄, λ)
Thus,
f(x̄) = inf_{x∈Rn} (f + ∑_{i=1}^m λi gi)(x) = inf_{x∈Rn} L(x, λ).
Therefore,
f(x̄) ≤ f(x̄) + ∑_{i=1}^m λi gi(x̄),
thereby yielding
∑_{i=1}^m λi gi(x̄) ≥ 0.
The above inequality along with the feasibility of x̄ for (CP ) and nonnegativity
of λi , i = 1, 2, . . . , m, leads to
∑_{i=1}^m λi gi(x̄) = 0.
λi gi (x̄) = 0, i = 1, 2, . . . , m.
Thus,
f(x̄) + ∑_{i=1}^m λi gi(x̄) = inf_{x∈Rn} (f + ∑_{i=1}^m λi gi)(x),
Remark 4.10 In the above theorem, x̄ was chosen to be any arbitrary solu-
tion of (CP ). Thus, it is clear that M(x̄) is independent of the choice of x̄
and hence M(x̄) = M for every solution x̄ of (CP ).
Note that the above result can be easily extended to the problem with
feasible set C defined by (4.9), that is, convex non-affine and affine inequalities
along with affine equalities. If we take a careful look at the set M, we realize
that for λ ∈ Rm+ it is not essential that (CP) have a solution; one merely
needs (CP) to be bounded below. Thus Attouch, Buttazzo, and Michaille [3]
call the set M the set of generalized Lagrange multipliers. Of course
if (CP ) has a solution, then M is the set of Lagrange multipliers. We now
show how deeply the notion of Lagrange multipliers is associated with the
perturbation of the constraints of the problem. From a numerical point of
view, it is important to deal with constraint perturbations. Note that due to
rounding off and other errors, often the iterates do not satisfy the constraints
exactly but some perturbed version of it, that is, possibly in the form
gi (x) ≤ yi , i = 1, 2, . . . , m.
is called the value function or the marginal function associated with (CP ). It
is obvious that if v(0) ∈ R, then v(0) is the optimal value of (CP ). We now
establish that v : Rm → R̄ is a convex function. In order to show that, we
need the following interesting and important lemma.
is a convex function in v.
φ(vi ) < αi , i = 1, 2.
Therefore, there exist ū1 , ū2 ∈ Rn such that by the definition of infimum,
Φ(ūi , vi ) < αi , i = 1, 2.
which implies
Thus
Theorem 4.12 (i) Let v(0) ∈ R. Then M = −∂v(0). Further, if the Slater
constraint qualification holds, then v is continuous at the origin and hence M
is a convex compact set in Rm+.
m
X
−v ∗ (−λ) = infn {f (x) + λi gi (x)}.
x∈R
i=1
Thus the problem (DP 1) coincides with the Lagrangian dual problem of (CP ).
Proof. (i) Suppose that λ ∈ M. As λ ∈ R^m_+, for any x ∈ C(y) = {x ∈ R^n : g_i(x) ≤ y_i, i = 1, 2, ..., m},
∑_{i=1}^m λ_i g_i(x) ≤ ∑_{i=1}^m λ_i y_i.
Therefore,
inf_{x∈C(y)} {f + ∑_{i=1}^m λ_i g_i}(x) ≤ inf_{x∈C(y)} f(x) + ∑_{i=1}^m λ_i y_i. (4.19)
Because λ ∈ M,
v(0) = inf_{x∈R^n} {f + ∑_{i=1}^m λ_i g_i}(x) ≤ inf_{x∈C(y)} {f + ∑_{i=1}^m λ_i g_i}(x),
which along with (4.19) yields
v(0) ≤ v(y) + ⟨λ, y⟩, ∀ y ∈ R^m,
that is, −λ ∈ ∂v(0).
Conversely, suppose that −λ ∈ ∂v(0), that is,
v(y) ≥ v(0) − ⟨λ, y⟩, ∀ y ∈ R^m.
For y ∈ R^m_+, C(0) ⊂ C(y), and hence
v(0) ≥ v(y), ∀ y ∈ R^m_+,
which implies ⟨λ, y⟩ ≥ v(0) − v(y) ≥ 0 for every y ∈ R^m_+. Because y ∈ R^m_+ was arbitrary, it is clear that λ ∈ R^m_+. We now establish that λ ∈ M by proving that
inf_{x∈C} f(x) = inf_{x∈R^n} {f + ∑_{i=1}^m λ_i g_i}(x),
that is,
v(0) = inf_{x∈R^n} {f + ∑_{i=1}^m λ_i g_i}(x).
As λ ∈ R^m_+ and g_i(x) ≤ 0 for every x ∈ C,
inf_{x∈R^n} {f + ∑_{i=1}^m λ_i g_i}(x) ≤ inf_{x∈C} {f + ∑_{i=1}^m λ_i g_i}(x) ≤ inf_{x∈C} f(x) = v(0). (4.20)
For the reverse inequality, consider any x̃ ∈ R^n and set ỹ = (g₁(x̃), ..., g_m(x̃)). By the definition (4.18) of value function, v(ỹ) ≤ f(x̃), which along with the subgradient inequality v(ỹ) ≥ v(0) − ⟨λ, ỹ⟩ leads to
f(x̃) + ∑_{i=1}^m λ_i g_i(x̃) ≥ v(0).
As x̃ ∈ R^n was arbitrary, this together with (4.20) establishes that λ ∈ M.
Now suppose that the Slater constraint qualification holds, that is, there exists x̂ ∈ R^n such that g_i(x̂) < 0, i = 1, 2, ..., m. Then there exists δ > 0 such that for every y ∈ B_δ(0) = δB, g_i(x̂) < y_i, i = 1, 2, ..., m, which implies that
v(y) ≤ f(x̂), ∀ y ∈ B_δ(0). (4.22)
As dom f = R^n, f(x̂) < +∞, thereby establishing that v is bounded above on B_δ(0).
We claim that v(y) > −∞ for every y ∈ R^m. On the contrary, assume that there exists ŷ ∈ R^m such that v(ŷ) = −∞. Thus,
{ŷ} × R ⊂ epi v.
Consider z = −αŷ such that α > 0 and ‖z‖ < δ. This is possible by choosing α = δ/(2‖ŷ‖). Setting λ = 1/(1 + α), we have λ ∈ (0, 1) and α = (1 − λ)/λ. This implies that z = −((1 − λ)/λ) ŷ, that is,
λz + (1 − λ)ŷ = 0.
By choice, z ∈ B_δ(0), which by (4.22) implies that v(z) ≤ f(x̂) and thus (z, f(x̂)) ∈ epi v. As {ŷ} × R ⊂ epi v, the convexity of epi v yields that for every t ∈ R,
λ(z, f(x̂)) + (1 − λ)(ŷ, t) = (0, λf(x̂) + (1 − λ)t) ∈ epi v,
that is,
v(0) ≤ λf(x̂) + (1 − λ)t, ∀ t ∈ R.
Taking the limit as t → −∞, v(0) ≤ −∞, that is, v(0) = −∞, which is a contradiction because v(0) ∈ R. By Theorem 2.72, the function v : R^m → R ∪ {+∞} is majorized on a neighborhood of the origin and hence v is continuous at y = 0. Then by Proposition 2.82, ∂v(0) is a convex compact set, which implies that so is M.
(ii) We already know that λ ∈ M if and only if −λ ∈ ∂v(0). Therefore, from Theorem 2.108,
v(0) + v*(−λ) = 0.
Observe that
v*(µ) = sup_{y∈R^m} {⟨µ, y⟩ − v(y)} = sup_{x∈R^n} sup_{y_i≥g_i(x)} {⟨µ, y⟩ − f(x)}.
If for some i ∈ {1, 2, ..., m}, µ_i > 0, then v*(µ) = +∞. So assume that µ ∈ −R^m_+. Then
v*(µ) = sup_{x∈R^n} {−f(x) + sup_{y_i≥g_i(x)} ∑_{i=1}^m µ_i y_i}.
As ∑_{i=1}^m µ_i y_i = ⟨µ, y⟩ is a linear function with µ_i ≤ 0, i = 1, 2, ..., m, the inner supremum is attained at y_i = g_i(x), that is,
sup_{y_i≥g_i(x)} ∑_{i=1}^m µ_i y_i = ∑_{i=1}^m µ_i g_i(x).
Therefore,
v*(µ) = sup_{x∈R^n} {∑_{i=1}^m µ_i g_i(x) − f(x)},
which, substituting µ = −λ, implies
−v*(−λ) = inf_{x∈R^n} {f(x) + ∑_{i=1}^m λ_i g_i(x)}.
Thus, −v*(−λ) = w(λ), thereby showing that the dual problems (DP) and (DP1) are the same.
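Theorem 4.12 can be made concrete on the toy instance used above (an illustration we add, anticipating the example min x subject to x² ≤ 0 studied in Chapter 5). There
v(y) = inf{x : x² ≤ y} = −√y for y ≥ 0, v(0) = 0 ∈ R,
and
∂v(0) = ∅, since (v(y) − v(0))/y = −1/√y → −∞ as y ↓ 0.
Hence M = −∂v(0) = ∅: the problem has a solution x̄ = 0 but no Lagrange multiplier, consistent with the failure of the Slater constraint qualification for this problem.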
where A^T denotes the adjoint of the linear map A, that is, the transpose of the matrix representing A. In fact, A can be viewed as an m×n matrix. Assume
that the condition
0 ∈ core(dom g − A dom f )
holds. Then vF = dF and the supremum in the dual problem is attained if the
optimal value is finite. (Instead of the term core, one can also use interior or
relative interior.)
Proof. We first prove that v_F ≥ d_F, that is, the weak duality holds. By the definition of conjugate function, Definition 2.101,
f*(A^T λ) ≥ ⟨A^T λ, x⟩ − f(x), ∀ x ∈ R^n,
which implies
−f*(A^T λ) ≤ f(x) − ⟨λ, Ax⟩, ∀ x ∈ R^n.
Similarly, we have
−g*(−λ) ≤ g(Ax) + ⟨λ, Ax⟩, ∀ x ∈ R^n.
The above inequalities immediately show that for any λ ∈ R^m and any x ∈ R^n,
−f*(A^T λ) − g*(−λ) ≤ f(x) + g(Ax),
and hence d_F ≤ v_F.
Consider y ∈ dom h, that is, h(y) < +∞. Hence there exists x ∈ R^n such that x ∈ dom f and Ax + y ∈ dom g, which leads to
y ∈ dom g − A dom f.
Let z ∈ dom g − A dom f, which implies that there exist u ∈ dom g and x̂ ∈ dom f such that z = u − Ax̂. Hence z + Ax̂ = u ∈ dom g, that is,
h(z) ≤ f(x̂) + g(Ax̂ + z) < +∞.
Thus z ∈ dom h, thereby showing that dom h = dom g − A dom f. This proves the assertion toward the domain of h.
Note that if v_F = −∞, there is nothing to prove. Without loss of generality, we assume that v_F is finite. By assumption, 0 ∈ core(dom h) (or 0 ∈ int(dom h)). By Proposition 2.82, ∂h(0) ≠ ∅, which implies that there exists −ξ ∈ ∂h(0). Thus, by Definition 2.77 of the subdifferential along with the definition of h,
f(x) + g(Ax + y) ≥ h(y) ≥ h(0) + ⟨−ξ, y⟩, ∀ x ∈ R^n, ∀ y ∈ R^m.
Hence, substituting u = Ax + y,
{f(x) − ⟨A^T ξ, x⟩} + {g(u) + ⟨ξ, u⟩} ≥ h(0) = v_F, ∀ x ∈ R^n, ∀ u ∈ R^m.
Taking the infimum first over u and then over x yields that
−f*(A^T ξ) − g*(−ξ) ≥ v_F,
so the supremum in the dual problem is attained at ξ and d_F = v_F.
From the historical point of view, we provide the statement of the classical
Fenchel duality theorem as it appears in Rockafellar [97].
We request the readers to figure out how one would define the notion of a proper concave function. Of course, if we consider g to be a convex function, (4.23) can be written as
inf_{x∈R^n} {f(x) + g(x)} = sup_{ξ∈R^n} {−f*(ξ) − g*(−ξ)}.
Note that this can be easily proved using Theorem 4.13 by taking A to be the identity mapping I : R^n → R^n. Moreover, ri(dom f) ∩ ri(dom g) ≠ ∅ shows that 0 ∈ ri(dom g − dom f). Hence the result follows by invoking Theorem 4.13.
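The conclusion of Theorem 4.13 is easy to verify numerically on a one-dimensional instance (our sketch, with A = I, f(x) = x² and g(y) = |y|, for which f*(ξ) = ξ²/4 and g* is the indicator of [−1, 1]). Both the primal infimum and the dual supremum equal 0:

    import numpy as np

    # primal: inf_x f(x) + g(x) with f(x) = x^2, g(y) = |y|, A = identity
    xs = np.linspace(-2.0, 2.0, 400001)
    primal = np.min(xs**2 + np.abs(xs))

    # dual: sup_lam -f*(lam) - g*(-lam); g* forces |lam| <= 1
    lams = np.linspace(-1.0, 1.0, 200001)
    dual = np.max(-lams**2 / 4.0)

    print(f"primal value ~ {primal:.6f}, dual value ~ {dual:.6f}")  # both ~ 0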
We now look into the perturbation-based approach, which is due to Rockafellar. Rockafellar's monograph [99] entitled Conjugate Duality and Optimization makes a detailed study of this method in an infinite-dimensional setting. We, however, discuss the whole issue from a finite-dimensional viewpoint. In this approach, one considers the original problem as embedded in a family of problems. In fact, we begin by considering the convexly parameterized family of convex problems
min F (x, y) subject to x ∈ Rn , (CP (y))
where the vector y is called the parameter and the function F : Rn × Rm → R̄
is assumed to be proper convex jointly in x and y. In fact, in such a situation,
the optimal value function
v(y) = inf_{x∈R^n} F(x, y)
is convex in y. For instance, the problem (CP) with
C = {x ∈ R^n : g_i(x) ≤ 0, i = 1, 2, ..., m}
corresponds to the perturbation function
F(x, y) = f(x) if g_i(x) ≤ y_i, i = 1, 2, ..., m, and F(x, y) = +∞ otherwise.
It is clear that
F(x, 0) = f₀(x) = f(x) if g_i(x) ≤ 0, i = 1, 2, ..., m, and f₀(x) = +∞ otherwise.
Next we look at how to construct the dual problem for (CP(y)). Define the Lagrangian function L : R^n × R^m → R̄ as
L(x, λ) = inf_{y∈R^m} {F(x, y) + ⟨λ, y⟩},
that is, for the choice of F above,
L(x, λ) = inf_{y_i≥g_i(x)} {f(x) + ∑_{i=1}^m λ_i y_i}.
Observe that for λ ∈ R^m_+ the infimum is attained at y_i = g_i(x), while for λ ∉ R^m_+ it equals −∞. Thus,
L(x, λ) = f(x) + ∑_{i=1}^m λ_i g_i(x) if λ ∈ R^m_+, and L(x, λ) = −∞ otherwise,
which is the standard Lagrangian for (CP).
Theorem 4.15 Consider the problems (CP) and (DPF) as given above. Then the following are equivalent:
(i) (x̄, λ̄) is a saddle point of the Lagrangian L;
(ii) x̄ is a solution for (CP), λ̄ is a solution for (DPF), and there is no duality gap.
where
C = {x ∈ R^n : g_i(x) ≤ 0, i = 1, 2, ..., m, h_j(x) = 0, j = 1, 2, ..., l, x ∈ X},
and
L(x, λ, µ) = f(x) + ∑_{i=1}^m λ_i g_i(x) + ∑_{j=1}^l µ_j h_j(x).
Denote the optimal values of the Lagrangian and the Fenchel dual problems as d_L and d_F, respectively. Note that f₁*(ξ) = +∞ for some ξ ∈ R^n is a possibility. Similarly, for the concave conjugate, (f₂)_*(ξ) = −∞ for some ξ ∈ R^n is also a possibility. But these values play no role in the Fenchel dual problem and thus the problem may be considered as
sup_{ξ∈C₁*∩C₂*} {(f₂)_*(ξ) − f₁*(ξ)},
where
C₁* = {ξ ∈ R^n : f₁*(ξ) < +∞} and C₂* = {ξ ∈ R^n : (f₂)_*(ξ) > −∞}.
By Theorem 4.7 and Theorem 4.14, we have the strong duality results for the
Lagrangian and the Fenchel problems, respectively, that is,
vL = d L and vF = d F .
Now we move on to show that the two strong dualities are equivalent. But
before doing so we present a result from Magnanti [81] on relative interior.
Lemma 4.16 Consider the set
Λ = {(y0 , y, z) ∈ R1+m+l : there exists x ∈ X such that f (x) ≤ y0 ,
gi (x) ≤ yi , i = 1, 2, . . . , m,
hj (x) = zj , j = 1, 2, . . . , l}.
If x̂ ∈ ri X such that
f (x̂) < ŷ0 , gi (x̂) < ŷi , i = 1, 2, . . . , m, and hj (x̂) = ẑj , j = 1, 2, . . . , l,
then (ŷ0 , ŷ, ẑ) ∈ ri Λ.
Proof. By the convexity of the functions f , gi , i = 1, 2, . . . , m, and hj ,
j = 1, 2, . . . , l, and the set X, it is easy to observe that the set Λ is con-
vex. It is left to the reader to verify this fact. To prove the result, we will
invoke the Prolongation Principle, Proposition 2.14 (iii).
Consider (y0 , y, z) ∈ Λ, that is, there exists x ∈ X such that
f (x) ≤ y0 , gi (x) ≤ yi , i = 1, 2, . . . , m, and hj (x) = zj , j = 1, 2, . . . , l.
Because X ⊂ R^n is a nonempty convex set and x̂ ∈ ri X, by the Prolongation Principle, there exists γ > 1 such that
γx̂ + (1 − γ)x ∈ X,
which by the convexity of X yields that
αx̂ + (1 − α)x ∈ X, ∀ α ∈ (1, γ]. (4.24)
As dom f = dom g_i = X, i = 1, 2, ..., m, with x̂ ∈ ri X, for some α ∈ (1, γ],
f(αx̂ + (1 − α)x) < αŷ₀ + (1 − α)y₀, (4.25)
g_i(αx̂ + (1 − α)x) < αŷ_i + (1 − α)y_i, i = 1, 2, ..., m. (4.26)
By the affineness of h_j, j = 1, 2, ..., l,
h_j(αx̂ + (1 − α)x) = αẑ_j + (1 − α)z_j, j = 1, 2, ..., l. (4.27)
Combining the conditions (4.24) through (4.27) yields that for α > 1,
α(ŷ0 , ŷ, ẑ) + (1 − α)(y0 , y, z) ∈ Λ.
Because (y0 , y, z) ∈ Λ was arbitrary, by the Prolongation Principle,
(ŷ0 , ŷ, ẑ) ∈ ri Λ as desired.
Now we present the equivalence between the strong duality results.
where
C = {x = (x₁, x₂, x₃) ∈ R^{3n} : h_j^r(x) = (x_j − x₃)_r = 0, j = 1, 2, r = 1, 2, ..., n, x ∈ X}.
Observe that here h_j : R^{3n} → R^n. Note that the reformulated Fenchel problem is nothing but the Lagrangian convex programming problem. The corresponding Lagrangian dual problem is as follows:
d_L = sup_{(µ₁,µ₂)∈R^{2n}} inf_{x∈X} {f₁(x₁) − f₂(x₂) + ∑_{r=1}^n µ₁^r (x₁ − x₃)_r + ∑_{r=1}^n µ₂^r (x₂ − x₃)_r},
that is,
d_L = sup_{(µ₁,µ₂)∈R^{2n}} inf_{x∈X} {f₁(x₁) − f₂(x₂) + ⟨µ₁, x₁⟩ + ⟨µ₂, x₂⟩ − ⟨µ₁ + µ₂, x₃⟩},
thereby implying that dL = dF . This along with the relation (4.29) yields that
vF = dF and hence the Fenchel strong duality holds.
Conversely, suppose that the Fenchel strong duality holds under the as-
sumption that ri C1 ∩ ri C2 6= ∅. Define
and
vL = inf{y0 : (y0 , y, z) ∈ C1 ∩ C2 },
vL = d F . (4.31)
= sup_{(λ,µ)∈R^m_+×R^l} inf_{(y₀,y,z)∈C₁} {y₀ + ∑_{i=1}^m λ_i y_i + ∑_{j=1}^l µ_j z_j}
= sup_{(λ,µ)∈R^m_+×R^l} inf_{x∈X} {f(x) + ∑_{i=1}^m λ_i g_i(x) + ∑_{j=1}^l µ_j h_j(x)},
5.1 Introduction
Until now we have studied how to derive the necessary KKT optimality con-
ditions for convex programming problems (CP ) or its slight variations such
as (CP 1), (CCP ) or (CCP 1) via normal cone or saddle point approach. Ob-
serve that in the KKT optimality conditions, the multiplier associated with
the subdifferential of the objective function is nonzero and thus normalized
to one. As discussed in Chapters 3 and 4, some additional conditions, known as constraint qualifications, are to be satisfied by the constraints to ensure that the multiplier is nonzero and hence that the KKT optimality conditions hold. But in the absence of a constraint qualification, one may not be able to derive the KKT optimality conditions. For example, consider the problem
min x subject to x² ≤ 0.
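Writing out the stationarity condition for this problem makes the failure explicit, and also shows that the Fritz John condition of Theorem 5.1 below does hold. At the unique feasible point x̄ = 0,
∇f(x̄) = 1 and ∇g(x̄) = 2x̄ = 0,
so the KKT condition would require 1 + λ·0 = 0 for some λ ≥ 0, which is impossible. The Fritz John condition
λ₀·1 + λ₁·0 = 0
holds with (λ₀, λ₁) = (0, 1), that is, with a vanishing multiplier on the objective.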
C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X}.
Theorem 5.1 Consider the convex programming problem (CP1) and let x̄ be a point of minimizer of (CP1). Then there exist λ_i ≥ 0 for i = 0, 1, ..., m, not all simultaneously zero, such that
0 ∈ λ₀ ∂f(x̄) + ∑_{i=1}^m λ_i ∂g_i(x̄) + N_X(x̄) and λ_i g_i(x̄) = 0, ∀ i = 1, 2, ..., m.
Proof. Observe that x̄ is a point of minimizer of F over X, where F(x) = max{f(x) − f(x̄), g₁(x), g₂(x), ..., g_m(x)} is a convex function. Therefore, by the optimality condition (ii) of Theorem 3.1,
0 ∈ ∂F(x̄) + N_X(x̄).
Applying the max-function rule for subdifferentials to ∂F(x̄), there exist λ₀ ≥ 0 and λ_i ≥ 0, i ∈ I(x̄), not all zero, such that
0 ∈ λ₀ ∂f(x̄) + ∑_{i∈I(x̄)} λ_i ∂g_i(x̄) + N_X(x̄),
where I(x̄) = {i ∈ {1, 2, ..., m} : g_i(x̄) = 0} is the active index set at x̄. For i ∉ I(x̄), defining λ_i = 0 yields
0 ∈ λ₀ ∂f(x̄) + ∑_{i=1}^m λ_i ∂g_i(x̄) + N_X(x̄)
along with λ_i g_i(x̄) = 0, i = 1, 2, ..., m, as desired.
Theorem 5.2 Consider the convex programming problem (CP 1) and let x̄ be
the point of minimizer of (CP 1). Then there exist λi ≥ 0 for i = 0, 1, . . . , m,
not all simultaneously zero, such that
(i) 0 ∈ λ₀ ∂f(x̄) + ∑_{i=1}^m λ_i ∂g_i(x̄) + N_X(x̄).
F_k(x_k) ≤ F_k(x̄), ∀ k ∈ N,
which implies
f(x_k) + ((k + 1)/2) ∑_{i=1}^m (g_i^+(x_k))² + (1/2)‖x_k − x̄‖² ≤ f(x̄), ∀ k ∈ N. (5.1)
Thus it must be the case that
lim_{k→+∞} g_i^+(x_k) = 0, i = 1, 2, ..., m. (5.2)
Otherwise, as k → +∞, the left-hand side of (5.1) also tends to infinity, which is a contradiction.
As {xk } is a bounded sequence, by the Bolzano–Weierstrass Theorem,
Proposition 1.3, it has a convergent subsequence. Without loss of generality,
assume that {xk } converge to x̃ ∈ X ∩ cl Bε (x̄). By the condition (5.2),
gi (x̃) ≤ 0, i = 1, 2, . . . , m,
and hence x̃ is feasible for (CP 1). Taking the limit as k → +∞ in the condition
(5.1) yields
1
f (x̃) + kx̃ − x̄k2 ≤ f (x̄).
2
As x̄ is the point of minimizer of (CP 1) and x̃ is feasible to (CP 1),
f (x̄) ≤ f (x̃). Thus the above inequality reduces to
kx̃ − x̄k2 ≤ 0,
which implies kx̃ − x̄k = 0, that is, x̃ = x̄. Hence, the sequence xk → x̄ and
thus there exists a k̄ ∈ N such that xk ∈ ri X ∩ Bε (x̄) for every k ≥ k̄.
For k ≥ k̄, xk is a point of minimizer of the penalized problem (Pk ), which
by Theorem 3.1 implies that
Again, because xk ∈ Bε (x̄), by Example 2.38, NBε (x̄) (xk ) = {0} and thus
which implies that for every k ≥ k̄, there exist ξ0k ∈ ∂f (xk ) and ξik ∈ ∂gi (xk ),
i = 1, 2, . . . , m, such that
−{ξ₀^k + ∑_{i=1}^m α_i^k ξ_i^k + (x_k − x̄)} ∈ N_X(x_k), (5.3)
Observe that
(λ₀^k)² + ∑_{i=1}^m (λ_i^k)² = 1, ∀ k ≥ k̄. (5.5)
Passing to the limit along a suitable subsequence then implies
0 ∈ λ₀ ∂f(x̄) + ∑_{i=1}^m λ_i ∂g_i(x̄) + N_X(x̄),
Also, by condition (5.1), f (xk ) < f (x̄) for sufficiently large k and hence con-
dition (ii) is satisfied, thereby yielding the requisite result.
Observe that the condition (ii) in the above theorem is a condition that
replaces the complementary slackness condition in the Fritz John optimal-
ity condition. According to the condition (ii), if λi > 0, the corresponding
constraint gi is violated at the points arbitrarily close to x̄. Thus the con-
dition (ii) is called the complementary violation condition by Bertsekas and
Ozdaglar [13].
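For instance (our own check, on the problem min x subject to x² ≤ 0 from the beginning of this chapter), condition (ii) is visible directly: the Fritz John multipliers are (λ₀, λ₁) = (0, 1), and the sequence x_k = −1/k → x̄ = 0 satisfies
g(x_k) = 1/k² > 0 and f(x_k) = −1/k < 0 = f(x̄),
so the constraint carrying the positive multiplier is violated at points arbitrarily close to x̄, exactly as the complementary violation condition requires.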
Now let us consider, in particular, X = R^n and g_i, i = 1, 2, ..., m, to be affine in (CP1). Then from the above theorem there exist λ_i ≥ 0, i = 0, 1, ..., m, not all simultaneously zero, such that conditions (i) and (ii) hold. Due to the affineness of g_i, i = 1, 2, ..., m, we have
g_i(x) = g_i(x̄) + ⟨∇g_i(x̄), x − x̄⟩, ∀ x ∈ R^n. (5.7)
Suppose that λ₀ = 0. As the scalars cannot all be simultaneously zero, the index set Ī = {i ∈ {1, 2, ..., m} : λ_i > 0} is nonempty. By condition (ii), there exists a sequence {x_k} ⊂ R^n such that g_i(x_k) > 0 for i ∈ Ī. Also, condition (i) with λ₀ = 0 reduces to 0 = ∑_{i=1}^m λ_i ∇g_i(x̄), which along with (5.7) for x = x_k leads to
∑_{i=1}^m λ_i g_i(x̄) = ∑_{i=1}^m λ_i g_i(x_k) = ∑_{i∈Ī} λ_i g_i(x_k) > 0,
contradicting the feasibility of x̄, by which ∑_{i=1}^m λ_i g_i(x̄) ≤ 0. Hence λ₀ ≠ 0 in the affine case, and the KKT optimality conditions hold.
Theorem 5.4 Consider the problem (CP1) and let x̄ be a feasible point of (CP1). Then x̄ is pseudonormal under either one of the following two criteria:
(a) Linearity criterion: X = R^n and the functions g_i, i = 1, 2, ..., m, are affine;
(b) Slater constraint qualification: there exists x̂ ∈ X such that g_i(x̂) < 0, i = 1, 2, ..., m.
Proof. (a) Suppose on the contrary that x̄ is not pseudonormal, which implies that there exist λ_i, i = 1, 2, ..., m, and {x_k} ⊂ R^n satisfying conditions (i), (ii), and (iii) in the Definition 5.3. By the affineness of g_i, i = 1, 2, ..., m, for every x ∈ R^n,
g_i(x) = g_i(x̄) + ⟨∇g_i(x̄), x − x̄⟩,
which implies
∑_{i=1}^m λ_i g_i(x) = ∑_{i=1}^m λ_i g_i(x̄) + ∑_{i=1}^m λ_i ⟨∇g_i(x̄), x − x̄⟩, ∀ x ∈ R^n. (5.8)
By condition (i) in the definition of pseudonormality, there exist ξ̄_i ∈ ∂g_i(x̄), i = 1, 2, ..., m, such that
∑_{i=1}^m λ_i ⟨ξ̄_i, x − x̄⟩ ≥ 0, ∀ x ∈ X. (5.9)
The above inequality along with condition (ii) reduces the condition (5.9) to
∑_{i=1}^m λ_i g_i(x) ≥ 0, ∀ x ∈ X. (5.10)
(b) Under the Slater constraint qualification there exists x̂ ∈ X with g_i(x̂) < 0, i = 1, 2, ..., m, and hence ∑_{i=1}^m λ_i g_i(x̂) < 0 if λ_i > 0 for some i ∈ I(x̄). Thus, the condition (5.10) holds only if λ_i = 0 for i = 1, 2, ..., m. But then this contradicts condition (iii). Therefore, x̄ is pseudonormal.
In Chapter 3 we derived the KKT optimality conditions under the Slater
constraint qualification as well as the Abadie constraint qualification. For
the convex programming problem (CP ) considered in previous chapters, the
feasible set C was given by (3.1), that is,
C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}.
where I(x̄) is the active index set at x̄. But unlike the Slater constraint quali-
fication, the Abadie constraint qualification need not imply pseudonormality.
For better understanding, let us recall the example
C = {x ∈ R : |x| ≤ 0, x ≤ 0}.
From the discussion in Chapter 3, we know that the Abadie constraint qual-
ification is satisfied at x̄ = 0 but the Slater constraint qualification does not
hold as the feasible set C = {0}. Observe that both constraints are active at
x̄. Taking the scalars λ1 = λ2 = 1 and the sequence {xk } as {1/k}, conditions
(i), (ii), and (iii) in Definition 5.3 are satisfied. Thus, x̄ = 0 is not pseudo-
normal. The Abadie constraint qualification is also known as quasiregularity
at x̄. This condition was defined for X = Rn . The notion of quasiregularity
is implied by the concept of quasinormality. This concept was introduced by
Hestenes [55] for the case X = Rn . The notion of quasinormality is further
implied by pseudonormality.
Now if X 6= Rn , the quasiregularity at x̄ is defined as
The above condition was studied by Gould and Tolle [53] and Guignard [54].
It was shown by Ozdaglar [91] and Ozdaglar and Bertsekas [92] that under
the regularity (Chapter 2 end notes) of the set X, pseudonormality implies
quasiregularity. They also showed that unlike the case X = Rn where quasi-
regularity leads to KKT optimality conditions, the concept is not enough to
derive KKT conditions when X 6= Rn unless some additional conditions are
assumed. For more on quasiregularity and quasinormality, readers are advised
to refer to the works of Bertsekas.
Next we establish the KKT optimality conditions under the pseudonor-
mality assumptions at the point of minimizer.
Theorem 5.5 Consider the convex programming problem (CP 1). Assume
that x̄ satisfies pseudonormality. Then x̄ is a point of minimizer of (CP 1)
if and only if there exist λi ≥ 0, i = 1, . . . , m, such that
0 ∈ ∂f(x̄) + ∑_{i=1}^m λ_i ∂g_i(x̄) + N_X(x̄) and λ_i g_i(x̄) = 0, i = 1, 2, ..., m.
Proof. For the positive integers k and r, consider the saddle point function L_{k,r} : X × R^m_+ → R defined as
L_{k,r}(x, α) = f(x) + (1/k³)‖x − x̄‖² + ∑_{i=1}^m α_i g_i(x) − (1/(2r))‖α‖².
Let
X_k = X ∩ B̄_k(x̄).
Observe that for every β ∈ R, the set
{α ∈ R^m_+ : L_{k,r}(x̄, α) ≥ β}
is nonempty and compact. Thus by condition (iii) of the Saddle Point Theorem, Proposition 4.1, L_{k,r} has a saddle point over X_k × R^m_+, say (x_{k,r}, α_{k,r}). By the saddle point definition,
L_{k,r}(x_{k,r}, α_{k,r}) = f(x_{k,r}) + (1/k³)‖x_{k,r} − x̄‖² + ∑_{i=1}^m α_{k,r}^i g_i(x_{k,r}) − (1/(2r))‖α_{k,r}‖²
≤ inf_{x∈X_k} {f(x) + (1/k³)‖x − x̄‖² + ∑_{i=1}^m α_{k,r}^i g_i(x)}
≤ inf_{x∈X_k, g_i(x)≤0 ∀i} {f(x) + (1/k³)‖x − x̄‖² + ∑_{i=1}^m α_{k,r}^i g_i(x)}
≤ inf_{x∈X_k, g_i(x)≤0 ∀i} {f(x) + (1/k³)‖x − x̄‖²}. (5.12)
The supremum of L_{k,r}(x_{k,r}, .) over R^m_+ is attained at α_{k,r} = r g^+(x_{k,r}), so that
L_{k,r}(x_{k,r}, α_{k,r}) = f(x_{k,r}) + (1/k³)‖x_{k,r} − x̄‖² + (r/2)‖g^+(x_{k,r})‖², (5.14)
which implies that
L_{k,r}(x_{k,r}, α_{k,r}) ≥ f(x_{k,r}). (5.15)
Further note that using the definition of L_{k,r}(x_{k,r}, α_{k,r}) and using (5.12) and (5.15), for every k,
lim_{r→+∞} ∑_{i=1}^m α_{k,r}^i g_i(x_{k,r}) = 0.
Denote
γ^{k,r} = √(1 + ∑_{i=1}^m (α_{k,r}^i)²), λ₀^{k,r} = 1/γ^{k,r} and λ_i^{k,r} = α_{k,r}^i/γ^{k,r}, i = 1, 2, ..., m. (5.17)
Dividing (5.16) by γ^{k,r} > 0 leads to
lim_{r→∞} {λ₀^{k,r} f(x_{k,r}) − λ₀^{k,r} f(x̄) + ∑_{i=1}^m λ_i^{k,r} g_i(x_{k,r})} = 0,
and
‖x_{k,r_k} − x̄‖ ≤ 1/k, |f(x_{k,r_k}) − f(x̄)| ≤ 1/k, |g_i^+(x_{k,r_k})| ≤ 1/k, i = 1, 2, ..., m. (5.19)
Dividing the saddle point condition
L_{k,r_k}(x_{k,r_k}, α_{k,r_k}) ≤ L_{k,r_k}(x, α_{k,r_k}), ∀ x ∈ X_k,
by γ^{k,r_k} yields
λ₀^{k,r_k} f(x_{k,r_k}) + (λ₀^{k,r_k}/k³)‖x_{k,r_k} − x̄‖² + ∑_{i=1}^m λ_i^{k,r_k} g_i(x_{k,r_k})
≤ λ₀^{k,r_k} f(x) + (1/(k³γ^{k,r_k}))‖x − x̄‖² + ∑_{i=1}^m λ_i^{k,r_k} g_i(x), ∀ x ∈ X_k.
Therefore, {λ_i^{k,r_k}}, i = 0, 1, ..., m, are bounded sequences in R_+ and thus by the Bolzano–Weierstrass Theorem, Proposition 1.3, have a convergent subsequence. Without loss of generality, assume that λ_i^{k,r_k} → λ_i ≥ 0, which implies
λ₀ f(x̄) ≤ inf_{x∈X} {λ₀ f(x) + ∑_{i=1}^m λ_i g_i(x)}
≤ inf_{x∈X, g_i(x)≤0 ∀i} {λ₀ f(x) + ∑_{i=1}^m λ_i g_i(x)}
≤ inf_{x∈X, g_i(x)≤0 ∀i} λ₀ f(x)
= λ₀ f(x̄).
gi (xk,rk ) > 0, ∀ i ∈ I¯
for sufficiently large k. For each k, choosing rk such that xk,rk 6= x̄ and the
condition (5.19) is satisfied, implies that
f (xk,rk ) ≤ f (x̄),
(ii) ∑_{i=1}^m λ_i g_i(x′) > 0.
sup_{x₁∈F₁} ⟨a, x₁⟩ ≤ inf_{x₂∈F₂} ⟨a, x₂⟩ and inf_{x₁∈F₁} ⟨a, x₁⟩ < sup_{x₂∈F₂} ⟨a, x₂⟩.
Now consider a set G = {g(x) = (g1 (x), g2 (x), . . . , gm (x)) : x ∈ X}. Then
from Definition 5.7 it is easy to observe that pseudonormality implies that
there exists no hyperplane H that separates the set G and the origin {0}
properly.
Similar to Theorem 5.4, the pseudonormality of the constraint set can be
derived under the Linearity criterion or the Slater constraint qualification.
Theorem 5.8 Consider the problem (CP1). Then the constraint set is pseudonormal under either one of the following two criteria:
(a) Linearity criterion: X = R^n and the functions g_i, i = 1, 2, ..., m, are affine;
(b) Slater constraint qualification: there exists x̂ ∈ X such that g_i(x̂) < 0, i = 1, 2, ..., m.
Proof. (a) Suppose on the contrary that the constraint set is not pseudonormal, which implies that there exist λ_i ≥ 0, i = 1, 2, ..., m, and a vector x′ ∈ R^n satisfying conditions (i) and (ii) in the Definition 5.7. Suppose that x̄ ∈ R^n is feasible to (CP1), that is, g_i(x̄) ≤ 0, i = 1, 2, ..., m, which along with condition (i) yields
∑_{i=1}^m λ_i g_i(x̄) = 0. (5.20)
By the affineness of gi , i = 1, 2, . . . , m,
[Figure: two panels showing the set G, the origin 0, a hyperplane H, and the multiplier direction λ; the first panel corresponds to the case X = R^n.]
By Definition 2.36 of the normal cone, ∑_{i=1}^m λ_i ∇g_i(x̄) ∈ N_{R^n}(x̄). As x̄ ∈ R^n, by Example 2.38, the normal cone N_{R^n}(x̄) = {0}, which implies
∑_{i=1}^m λ_i ∇g_i(x̄) = 0.
This equality along with the condition (5.20) and the affineness of g_i, i = 1, 2, ..., m, implies that
∑_{i=1}^m λ_i g_i(x) = 0, ∀ x ∈ R^n,
Proof. Suppose that in the enhanced Fritz John optimality condition, Theorem 5.6, λ₀ = 0. This implies
0 = min_{x∈X} ∑_{i=1}^m λ_i g_i(x),
that is, satisfying condition (ii) in Definition 5.7, thereby contradicting the
fact that the constraint sets are pseudonormal. Thus, λ0 6= 0 and hence can
be taken in particular as one, thereby establishing the optimality condition.
Using the optimality condition along with the feasibility of x̄ leads to
0 ≤ ∑_{i=1}^m λ_i g_i(x̄) ≤ 0,
that is,
∑_{i=1}^m λ_i g_i(x̄) = 0.
As a sum of nonpositive terms is zero only if every term is zero, the above equality leads to
λ_i g_i(x̄) = 0, i = 1, 2, ..., m,
In particular, for any x feasible to (CP 1), that is, x ∈ X satisfying gi (x) ≤ 0,
i = 1, 2, . . . , m, the above inequality reduces to
f (x̄) ≤ f (x),
condition? The answer is yes and we present a result from Bertsekas [12] and
Bertsekas, Ozdaglar, and Tseng [14] to establish the enhanced Fritz John op-
timality conditions similar to those derived in Section 5.3. But in the absence
of a point of minimizer of (CP 1), the multipliers are now dependent on the
infimum, as one will observe in the theorem below.
Theorem 5.10 Consider the convex programming problem (CP 1) where the
functions f and gi , i = 1, 2, . . . , m, are convex on the convex set X ⊂ Rn
and let finf < +∞ be the infimum of (CP 1). Then there exist λi ≥ 0 for
i = 0, 1, . . . , m, not all simultaneously zero, such that
λ₀ f_inf = inf_{x∈X} {λ₀ f(x) + ∑_{i=1}^m λ_i g_i(x)}.
which implies that µ(d10 , d1 ) + (1 − µ)(d20 , d2 ) ∈ M for every µ ∈ [0, 1]. Hence
M is a convex subset of R × Rm .
Next we prove that (finf , 0) ∈ / int M. On the contrary, suppose that
(f_inf, 0) ∈ int M, which by Definition 2.12 implies that there exists ε > 0 such that (f_inf − ε, 0) ∈ M. Thus, there exists x ∈ X such that
f(x) ≤ f_inf − ε and g_i(x) ≤ 0, i = 1, 2, ..., m.
From the above condition it is obvious that x is a feasible point of (CP1), thereby contradicting the fact that f_inf is the infimum of the problem (CP1).
which implies
λ₀ f_inf ≤ inf_{x∈X} {λ₀ f(x) + ∑_{i=1}^m λ_i g_i(x)}
≤ inf_{x∈X, g_i(x)≤0 ∀i} {λ₀ f(x) + ∑_{i=1}^m λ_i g_i(x)}
≤ inf_{x∈X, g_i(x)≤0 ∀i} λ₀ f(x)
= λ₀ f_inf,
Theorem 5.11 Consider the convex programming problem (CP 1) where the
functions f and gi , i = 1, 2, . . . , m, are convex on the convex set X ⊂ Rn
and let finf < +∞ be the infimum of (CP 1). Assume that the Slater-type
Theorem 5.12 Consider the convex programming problem (CP 1) where the
functions f and gi , i = 1, 2, . . . , m, are lsc and convex on the closed convex
set X ⊂ Rn and let finf < +∞ be the infimum of (CP 1). Then there exist
λi ≥ 0 for i = 0, 1, . . . , m, not all simultaneously zero, such that
(i) λ₀ f_inf = inf_{x∈X} {λ₀ f(x) + ∑_{i=1}^m λ_i g_i(x)}.
Proof. If for every x ∈ X, f (x) ≥ finf , then the result holds for λ0 = 1 and
λi = 0, i = 1, 2, . . . , m.
Now suppose that there exists an x̄ ∈ X such that f (x̄) < finf , thereby
implying that finf is finite. Consider the minimization problem
min f (x) subject to gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ Xk . (CP 1k )
In (CP 1k ), Xk is a closed convex subset of Rn defined as
Xk = X ∩ B̄βk (0), ∀ k ∈ N
and β > 0 is chosen to be sufficiently large such that for every k, the constraint
set
{x ∈ Xk : gi (x) ≤ 0, i = 1, 2, . . . , m}
L_{k,r}(x, α) = f(x) + (δ_k²/(4k²))‖x − x̄_k‖² + ∑_{i=1}^m α_i g_i(x) − (1/(2r))‖α‖².
Observe that the above function is similar to the saddle point function considered in Theorem 5.6 except that the term (1/k³)‖x − x̄‖² is now replaced by (δ_k²/(4k²))‖x − x̄_k‖². In Theorem 5.6, x̄ is a point of minimizer of (CP1), whereas here the infimum is not attained and thus the term involves x̄_k, the point of minimizer of the problem (CP1_k), and δ_k.
Now working along the lines of proof of Theorem 5.6, Lk,r has a saddle
point over Xk × Rm + , say (xk,r , αk,r ), which by the saddle point definition
implies
Lk,r (xk,r , α) ≤ Lk,r (xk,r , αk,r ) ≤ Lk,r (x, αk,r ), ∀ x ∈ Xk , ∀ α ∈ Rm
+ . (5.22)
Note that in the proof, the problem (CP 1k ) is considered instead of (CP 1)
as in Theorem 5.6 and hence the condition obtained involves the point of
minimizer of (CP 1k ), that is, x̄k . Now as δk = f (x̄k )−finf , the above equality
leads to
lim_{r→∞} f(x_{k,r}) = f_inf + δ_k. (5.26)
Now before continuing with the proof to obtain the multipliers for the Fritz
John optimality condition, we present a lemma from Bertsekas, Ozdaglar, and
Tseng [14].
f(x_{k,r}) ≤ f_inf − δ_k/2. (5.27)
Furthermore, there exists a scalar r_k ≥ 1/√δ_k such that
f(x_{k,r_k}) = f_inf − δ_k/2. (5.28)
2
Proof. Define δ = f_inf − f(x̄), where x̄ ∈ X is such that f(x̄) < f_inf. For sufficiently large k, x̄ ∈ X_k. As x̄_k is the point of minimizer of the problem (CP1_k), f(x̄_k) ≥ f_inf with f(x̄_k) → f_inf; thus for sufficiently large k, 0 ≤ δ_k < δ. Consider y_k = λ_k x̄ + (1 − λ_k)x̄_k. Because 0 ≤ δ_k < δ, we have 0 ≤ 2δ_k/(δ_k + δ) < 1. Substituting λ_k = 2δ_k/(δ_k + δ) in the above condition yields
From the saddle point condition (5.22) along with (5.24) and (5.25),
f_inf − δ_k/2 < f(x_{k,r}) < f_inf + 5δ_k/2. (5.31)
For r ≤ 1/√δ_k, by (5.27),
f(x_{k,r}) ≤ f_inf − δ_k/2.
Now for r = 1/√δ_k, we have two possibilities:
(i) f(x_{k,r}) = f_inf − δ_k/2,
(ii) f(x_{k,r}) < f_inf − δ_k/2.
If (i) holds, then we are done with r = r_k. If (ii) holds, then it contradicts
f_inf − δ_k/2 < f(x_{k,r}),
and thus r must satisfy r > 1/√δ_k. As f(x_{k,r}) is continuous in r, by the Intermediate Value Theorem, there exists r_k ≥ 1/√δ_k such that
f(x_{k,r_k}) = f_inf − δ_k/2,
that is, (5.28) holds.
Now we continue proving the theorem. From the conditions (5.23) and (5.25),
f(x_{k,r}) ≤ L_{k,r}(x_{k,r}, α_{k,r}) ≤ inf_{x∈X_k} {f(x) + (δ_k²/(4k²))‖x − x̄_k‖² + ∑_{i=1}^m α_{k,r}^i g_i(x)}
= f(x_{k,r}) + (δ_k²/(4k²))‖x_{k,r} − x̄_k‖² + ∑_{i=1}^m α_{k,r}^i g_i(x_{k,r})
≤ f(x̄_k),
and hence
lim_{k→∞} {f(x_{k,r_k}) − f_inf + (δ_k²/(4k²))‖x_{k,r_k} − x̄_k‖² + ∑_{i=1}^m α_{k,r_k}^i g_i(x_{k,r_k})} = 0. (5.32)
Define
γ_k = √(1 + ∑_{i=1}^m (α_{k,r_k}^i)²), λ₀^k = 1/γ_k, λ_i^k = α_{k,r_k}^i/γ_k, i = 1, 2, ..., m. (5.33)
Then
lim_{k→∞} {λ₀^k f(x_{k,r_k}) − λ₀^k f_inf + (δ_k² λ₀^k/(4k²))‖x_{k,r_k} − x̄_k‖² + ∑_{i=1}^m λ_i^k g_i(x_{k,r_k})} = 0. (5.34)
Dividing the above inequality throughout by γ_k along with the fact that ‖x − x̄_k‖ ≤ 2βk for every x ∈ X_k implies that
λ₀^k f(x_{k,r_k}) + (δ_k² λ₀^k/(4k²))‖x_{k,r_k} − x̄_k‖² + ∑_{i=1}^m λ_i^k g_i(x_{k,r_k})
≤ λ₀^k f(x) + (δ_k² λ₀^k/(4k²))‖x − x̄_k‖² + ∑_{i=1}^m λ_i^k g_i(x), ∀ x ∈ X_k.
Therefore,
λ₀ f_inf ≤ inf_{x∈X} {λ₀ f(x) + ∑_{i=1}^m λ_i g_i(x)}
≤ inf_{x∈X, g_i(x)≤0 ∀i} {λ₀ f(x) + ∑_{i=1}^m λ_i g_i(x)}
≤ inf_{x∈X, g_i(x)≤0 ∀i} λ₀ f(x)
= λ₀ f_inf,
where
λ_i^k = r_k g_i^+(x_{k,r_k})/γ_k, i = 1, 2, ..., m.
As k → +∞,
λ_i = lim_{k→∞} r_k g_i^+(x_{k,r_k})/γ_k, i = 1, 2, ..., m.
In the beginning of the proof, we assumed that there exists x̄ ∈ X satisfying f(x̄) < f_inf, which along with the condition (i) implies that the index set Ī = {i ∈ {1, 2, ..., m} : λ_i > 0} is nonempty. Otherwise, if Ī were empty, condition (i) would yield λ₀ f_inf = inf_{x∈X} λ₀ f(x) ≤ λ₀ f(x̄) < λ₀ f_inf, a contradiction. For i ∈ Ī, λ_i > 0 and hence, for sufficiently large k,
g_i(x_{k,r_k}) > 0, ∀ i ∈ Ī.
In particular, for r = r_k ≥ 1/√δ_k, conditions (5.23) and (5.24) yield
f(x_{k,r_k}) + (r_k/2)‖g^+(x_{k,r_k})‖² ≤ f(x_{k,r_k}) + (δ_k²/(4k²))‖x_{k,r_k} − x̄_k‖² + (r_k/2)‖g^+(x_{k,r_k})‖² ≤ f(x̄_k),
which along with (5.28) and the relation δ_k = f(x̄_k) − f_inf ≥ 0 implies that
Lemma 5.14 Consider the convex programming problem (CP 1) where the
functions f and gi , i = 1, 2, . . . , m, are lsc and convex on the convex set
X ⊂ Rn and (DP 1) is the associated dual problem. Let finf < +∞ be the
infimum of (CP1) and for every δ > 0, define
f_δ = inf_{x∈X, g_i(x)≤δ ∀i} f(x).
Then the supremum of (DP1), w_sup, satisfies f_δ ≤ w_sup for every δ > 0 and
w_sup = lim_{δ↓0} f_δ.
Proof. For the problem (CP1), as the infimum f_inf exists and f_inf < +∞, the feasible set of (CP1) is nonempty, that is, there exists x̄ ∈ X satisfying g_i(x̄) ≤ 0, i = 1, 2, ..., m. Thus for δ > 0, the problem defining f_δ is feasible. For any λ ∈ R^m_+ and any x_δ ∈ X with g_i(x_δ) ≤ δ, i = 1, 2, ..., m, and f(x_δ) ≤ f_δ + δ,
w(λ) = inf_{x∈X} {f(x) + ∑_{i=1}^m λ_i g_i(x)}
≤ f(x_δ) + ∑_{i=1}^m λ_i g_i(x_δ)
≤ f_δ + δ + δ ∑_{i=1}^m λ_i.
Taking the limit as δ ↓ 0,
w(λ) ≤ lim_{δ↓0} f_δ, ∀ λ ∈ R^m_+.
If f_δ = −∞, then choosing x_δ ∈ X with g_i(x_δ) ≤ δ, i = 1, 2, ..., m, and f(x_δ) ≤ −1/δ yields
w(λ) ≤ −1/δ + δ ∑_{i=1}^m λ_i,
Theorem 5.15 Consider the convex programming problem (CP 1) where the
functions f and gi , i = 1, 2, . . . , m, are lsc and convex on the closed convex
set X ⊂ Rn and (DP 1) is the associated dual problem. Let finf < +∞ be the
infimum of (CP 1) and wsup > −∞ be the supremum of (DP 1). Then there
exist λi ≥ 0 for i = 0, 1, . . . , m, not all simultaneously zero, such that
(i) λ₀ w_sup = inf_{x∈X} {λ₀ f(x) + ∑_{i=1}^m λ_i g_i(x)}.
Proof. By the weak duality, wsup ≤ finf , which along with the hypothesis
implies that finf and wsup are finite. For k = 1, 2, . . ., consider the problem:
min f(x) subject to g_i(x) ≤ 1/k⁴, i = 1, 2, ..., m, x ∈ X. (CP1_k)
By Lemma 5.14, the infimum f_inf^k of (CP1_k) satisfies the condition f_inf^k ≤ w_sup for every k. For each k, consider x̂_k ∈ X such that
f(x̂_k) ≤ w_sup + 1/k² and g_i(x̂_k) ≤ 1/k⁴, i = 1, 2, ..., m. (5.36)
Now consider another problem:
min f(x) subject to g_i(x) ≤ 1/k⁴, i = 1, 2, ..., m, x ∈ X̂_k, (CP̂1_k)
where X̂k = X ∩ {x ∈ Rn : kxk ≤ k(maxj=1,...,k kx̂j k + 1)} is a compact
set. By the lower semicontinuity and convexity of f and gi , i = 1, 2, . . . , m,
over X, the functions are lsc convex and coercive on X̂k . Therefore, by the
Weierstrass Theorem, Theorem 1.14, (CP ˆ 1k ) has a point of minimizer, say
ˆ 1k ) which leads to
x̄k . From (5.36), x̂k is feasible for (CP
f(x̄_k) ≤ f(x̂_k) ≤ w_sup + 1/k². (5.37)
w(µ_k) → w_sup and ‖µ_k‖²/(2k) → 0, (5.41)
As α_k ∈ R^m_+, from the above condition it is obvious that γ_k ≥ 1 for every k and thus dividing (5.44) by γ_k yields
lim_{k→∞} {λ₀^k (f(x_k) − w_sup) + ∑_{i=1}^m λ_i^k g_i(x_k)} = 0, (5.46)
which leads to
λ₀ w_sup ≤ inf_{x∈X} {λ₀ f(x) + ∑_{i=1}^m λ_i g_i(x)}. (5.47)
w_sup ≤ inf_{x∈X} {f(x) + ∑_{i=1}^m (λ_i/λ₀) g_i(x)} = w(λ/λ₀) ≤ w_sup,
As f_inf exists and is finite, the feasible set of (CP1) is nonempty, which implies that there exists x ∈ X satisfying g_i(x) ≤ 0, i = 1, 2, ..., m. Therefore, the above condition becomes
0 = inf_{x∈X} ∑_{i=1}^m λ_i g_i(x).
λ_i^k = k g_i^+(x_k)/γ_k, i = 1, 2, ..., m.
As k → +∞,
λ_i = lim_{k→∞} k g_i^+(x_k)/γ_k, i = 1, 2, ..., m.
For any i ∈ Ī, λ_i > 0, which implies for sufficiently large k, g_i^+(x_k) > 0, that is,
g_i(x_k) > 0, ∀ i ∈ Ī.
By the condition (5.39), ∑_{i=1}^m α_i^k g_i(x_k) = (1/k)‖α_k‖². Therefore, the above inequality becomes
k(f(x_k) − w_sup) + ∑_{i=1}^m (α_i^k)² ≤ (m + 1)/k.
Dividing the above inequality throughout by γ_k², which along with (5.45) implies that
k(f(x_k) − w_sup)/γ_k² + ∑_{i=1}^m (λ_i^k)² ≤ (m + 1)/(kγ_k²),
which as k → +∞ yields
lim sup_{k→∞} k(f(x_k) − w_sup)/γ_k² ≤ −∑_{i=1}^m λ_i². (5.48)
lim_{k→∞} ‖α_k‖²/(2k) = 0. (5.49)
The condition (5.39) along with (5.41) and (5.43) leads to
lim_{k→∞} {(f(x_k) − w_sup) + ‖α_k‖²/(2k)} = 0,
which together with (5.48) implies that f(x_k) → w_sup. Also, (5.49) along with (5.39) and (5.48) yields
lim_{k→∞} k ∑_{i=1}^m (g_i^+(x_k))² = 0,
Thus for nonempty I, ¯ the sequence {xk } ⊂ X satisfies condition (ii), thereby
establishing the desired result.
6.1 Introduction
In the last few chapters we saw how fundamental the role of constraint qual-
ification is like the Slater constraint qualification in convex optimization. In
Chapter 3 we saw that a relaxation of the Slater constraint qualification to
the Abadie constraint qualification leads to an asymptotic version of the KKT
conditions for the nonsmooth convex programming problems. Thus it is inter-
esting to ask whether it is possible to develop necessary and sufficient opti-
mality conditions for (CP ) without any constraint qualifications. Recently a
lot of work has been done in this respect in the form of sequential optimality
conditions. But to the best of our knowledge the first step in this direction
was taken by Ben-Tal, Ben-Israel, and Zlobec [7]. They obtained the necessary
and sufficient optimality conditions in the smooth scenario in the absence of
constraint qualifications. This work was extended to the nonsmooth scenario
by Wolkowicz [112]. All these studies involved direction sets, which we will
discuss below. So before moving on with the discussion of the results derived
by Ben-Tal, Ben-Israel, and Zlobec [7], and Wolkowicz [112], we present the
notion of direction sets. Before that we introduce the definition of a blunt
cone.
A set K ⊂ Rn is said to be a cone (Definition 2.18) if
λx ∈ K whenever λ ≥ 0 and x ∈ K,
whereas K is a blunt cone if K is a cone without the origin, that is,
0 ∉ K and λx ∈ K if x ∈ K and λ > 0.
For example, R2+ \ {(0, 0)} is a blunt cone while the set K ⊂ R2 given as
K = {(x, y) ∈ R2 : x = y} is not a blunt cone.
Definition 6.1 Let φ : Rn → R be a given function and let x̄ ∈ Rn be any
given point. Then the set
Dφrelation (x̄) = {d ∈ Rn : there exists ᾱ > 0 such that
φ(x̄ + αd) relation φ(x̄), ∀ α ∈ (0, ᾱ]},
In particular, the set Dφ= is called the cone of directions of constancy that was
considered by Ben-Tal, Ben-Israel, and Zlobec [7]. The other direction sets
were introduced in the work of Wolkowicz [112]. We present certain examples
of computing explicitly the set Dφ= (x̄) from Ben-Tal, Ben-Israel, and Zlobec [7].
For a strictly convex function φ : Rn → R, Dφ= (x̄) = {0} for any x̄ ∈ Rn .
Another interesting example from Ben-Tal, Ben-Israel, and Zlobec [7] is the cone of the directions of constancy for the so-called faithfully convex function given as
φ(x) = h(Ax + b) + ⟨a, x⟩ + γ,
where h is a strictly convex function, A a matrix, b and a vectors, and γ a scalar. In this case, for any x̄ ∈ R^n,
D_φ^=(x̄) = Null(S),
where S is the matrix obtained by appending the row aᵀ to A and Null(S) is the null space of the matrix S. It is obvious that the null space is contained in D_φ^=. For the sake of completeness, we provide an explanation for the reverse containment. We consider the following cases.
2. ⟨a, d⟩ = 0: Suppose that d ∈ D_φ^=(x̄), which implies there exists ᾱ > 0 such that
φ(x̄ + αd) = φ(x̄), ∀ α ∈ (0, ᾱ].
Suppose Ad ≠ 0. Then Ax̄ + αAd + b ≠ Ax̄ + b for every α ∈ (0, ᾱ]. Now two cases arise. If h(Ax̄ + α̂Ad + b) = h(Ax̄ + b) for some α̂ ∈ (0, ᾱ], then by the strict convexity of h, for every λ ∈ (0, 1),
h(Ax̄ + λα̂Ad + b) < h(Ax̄ + b),
which implies that φ is not constant on [x̄, x̄ + α̂d], and hence d ∉ D_φ^=(x̄). The second case is that h(Ax̄ + αAd + b) ≠ h(Ax̄ + b) for every α ∈ (0, ᾱ]. Then again it implies that d ∉ D_φ^=(x̄), which violates our assumption. Therefore, for d to be a direction of constancy, Ad = 0.
This condition holds for every x1 , x2 ∈ [x̄, x̄ + αd] and thus φ is strictly
convex on [x̄, x̄ + αd]. Hence as mentioned earlier, Dφ= (x̄) = {0} for the
strictly convex function φ. But this contradicts the fact that d 6= 0.
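As a concrete check of this description (our own example), take the faithfully convex function φ(x₁, x₂) = (x₁ − x₂)² on R², that is, h(t) = t², A = (1, −1), a = 0 and γ = 0. Then
φ(x̄ + αd) = ((x̄₁ − x̄₂) + α(d₁ − d₂))²
is constant in α if and only if d₁ = d₂, so for every x̄ ∈ R²,
D_φ^=(x̄) = {d ∈ R² : d₁ = d₂} = Null(A),
in agreement with D_φ^=(x̄) = Null(S) above.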
Below we present some results on the direction sets that will be required
in deriving the optimality conditions from Ben-Tal, Ben-Israel, and Zlobec [7],
Ben-Tal and Ben-Israel [6], and Wolkowicz [112].
Proof. (i) Consider d ∈ Dφ= (x̄), which implies there exists ᾱ > 0 such that
For any λ ∈ [0, 1], consider d = λd1 + (1 − λ)d2 . The convexity of φ along with
(6.1) on d1 and d2 yields
that is,
φ(x̄ + αd) ≤ φ(x̄), ∀ α ∈ (0, ᾱ]. (6.2)
Again, by the convexity of φ for the differentiable case, for every α ∈ (0, ᾱ],
as d1 , d2 ∈ Dφ= (x̄). This inequality along with the condition (6.2) implies that
d ∈ Dφ= (x̄), thereby leading to the convexity of Dφ= .
(iii) We will prove the result for Dφ< (x̄). Consider d ∈ Dφ< (x̄), which implies
there exists ᾱ > 0 such that
thereby implying the convexity of Dφ< . From Definition 6.1, it is obvious that
Dφ< is open by using the continuity of φ.
Consider d ∈ Dφ< (x̄), which implies that there exists ᾱ > 0 such that
which implies d ∈ Dφ< (x̄), thereby establishing the equality in the relation
(6.5). Working along the above lines of proof, readers are advised to prove the
result for Dφ≤ (x̄).
Note that unlike (iii) where the relation holds as equality, one is able to
prove only inclusion in (i) and not equality. For example, consider the strictly convex function φ : R → R defined as φ(x) = x². For x̄ = 0, D_φ^=(x̄) = {0} and ∇φ(x̄) = 0. Observe that
Hence, the equality need not hold in (i) even for a differentiable function. Also,
for a differentiable function φ : Rn → R, if there are n linearly independent
vectors di ∈ Dφ= (x̄), i = 1, 2, . . . , n, then ∇φ(x̄) = 0. Observe that one needs
the differentiability assumption only in (ii). A careful look at the proof of (ii)
shows that to prove the reverse inequality in (6.2), we make use of (i) under
differentiability. So if φ is nondifferentiable, to prove the result one needs to
assume that for some ξ ∈ ∂φ(x̄), φ′ (x̄, d) = hξ, di = 0 for every d ∈ Dφ= (x̄).
For a better understanding, we illustrate with an example from Ben-Tal, Ben-
Israel, and Zlobec [7]. Consider a convex nondifferentiable function φ : R2 → R
defined as
φ(x1 , x2 ) = max{x1 , x2 }.
At x̄ = (0, 0), the cone of directions of constancy is
D_φ^=(x̄) = {(d, 0) : d ≤ 0} ∪ {(0, d) : d ≤ 0},
which is not convex. Note that ⟨(ξ̄₁, ξ̄₂), (d, 0)⟩ = 0 for ξ̄ = (0, 1) whereas ⟨(ξ̃₁, ξ̃₂), (0, d)⟩ = 0 for ξ̃ = (1, 0), that is, φ′(x̄, d) = ⟨ξ, d⟩ = 0 holds for ξ̄, ξ̃ ∈ ∂φ(x̄) with ξ̄ ≠ ξ̃.
With all these discussions on the direction sets, we move on to study the
work done by Ben-Tal, Ben-Israel, and Zlobec [7].
Thus, to establish our claim, we will prove the reverse inclusion in the condition (6.6). Consider any i ∈ Ω*. Define a differentiable convex function G_i : R → R as G_i(λ) = g_i(x̄ + λd). Therefore,
∇G_i(λ) = lim_{δ↓0} (G_i(λ + δ) − G_i(λ))/δ = lim_{δ↓0} (g_i(x̄ + (λ + δ)d) − g_i(x̄ + λd))/δ,
that is, as g_i(x̄ + λd) = g_i(x̄) = 0,
∇G_i(λ) = lim_{δ↓0} g_i(x̄ + (λ + δ)d)/δ = ⟨∇g_i(x̄), d⟩ = 0,
thereby proving the claim. As the claim holds for every i ∈ Ω∗ , x̄ is not a
point of minimizer of (CP ) implies that the system (CPΩ ) is consistent.
Conversely, suppose that the system (CPΩ ) is consistent for some subset
Ω ⊂ I(x̄), that is,
Similarly, from the condition (6.9), there exist ᾱi > 0, i ∈ Ω such that
From (6.10), d ∈ Di= (x̄), i ∈ Ω∗ , which by Definition 6.1 implies that there
exist ᾱi > 0, i ∈ Ω∗ such that
Define ᾱ = min{ᾱf , ᾱ1 , . . . , ᾱm }. Therefore, the conditions (6.12), (6.13), and
(6.14) hold for ᾱ as well, which implies x̄ + ᾱd ∈ C, that is, feasible for (CP ).
By the strict inequality (6.11),
Observe that x̄ = (−1, 0) is the point of minimizer of the above problem. The
KKT optimality condition at x̄ is given by
(−1, 1)ᵀ + λ₁ (1, 1)ᵀ + λ₂ (0, 0)ᵀ = (0, 0)ᵀ,
y1 + y2 + . . . + ym + ym+1 = 0.
Theorem 6.5 Consider the convex programming problem (CP ) with C given
by (3.1). Let f and gi , i = 1, 2, . . . , m, be differentiable convex functions. Then
x̄ is a point of minimizer of (CP ) if and only if for every subset Ω ⊂ I(x̄) the
system
0 ∈ λ₀ ∇f(x̄) + ∑_{i∈Ω} λ_i ∇g_i(x̄) + (D_{Ω*}^=(x̄))°,
λ₀ ≥ 0, λ_i ≥ 0, i ∈ Ω, not all simultaneously zero, (CP_Ω′)
is consistent, where
D_{Ω*}^=(x̄) = ∩_{i∈Ω*} D_i^=(x̄) if Ω* ≠ ∅, and D_{Ω*}^=(x̄) = R^n if Ω* = ∅.
Here D_i^<(x̄) = D_{g_i}^<(x̄), i ∈ Ω. By Proposition 6.2 (iii), D_f^<(x̄) and D_i^<(x̄), i ∈ Ω, are open blunt convex cones, while D_{Ω*}^=(x̄), being the intersection of convex cones, is itself a convex cone. Applying Propositions 6.2 (iv) and 2.80,
Proof. We prove the result by establishing the negation. From the definition
of regularization condition, it is equivalent to verifying that x̄ is not a point of
minimizer of (CP ) if and only if the system (CPI(x̄) ) is consistent. If (CPI(x̄) )
is consistent, then by Theorem 6.3, x̄ is not a point of minimizer of (CP ).
Conversely, suppose that x̄ is not a point of minimizer for (CP ). Again, by
Theorem 6.3, there exists a subset Ω̄ ⊂ I(x̄) and d̄ ∈ R^n such that the system
⟨∇f(x̄), d̄⟩ < 0,
⟨∇g_i(x̄), d̄⟩ < 0, i ∈ Ω̄, (CP_Ω̄)
d̄ ∈ D_i^=(x̄), i ∈ Ω̄* = I(x̄)\Ω̄.
By Proposition 6.2 (i),
⟨∇g_i(x̄), d̄⟩ = 0, i ∈ Ω̄*. (6.15)
Define d˜ = d¯+ α(x̂ − x̄) for α > 0 sufficiently small. Then using the condition
(6.15), the system
where
S(x̄) = {d ∈ Rn : gi′ (x̄, d) ≤ 0, ∀ i ∈ I(x̄)}.
Recall that we introduced the set S(x̄) in Section 3.3 and proved in Proposi-
tion 3.9 that
(S(x̄))° = cl Ŝ(x̄),
where
Ŝ(x̄) = {∑_{i∈I(x̄)} λ_i ξ_i : λ_i ≥ 0, ξ_i ∈ ∂g_i(x̄), i ∈ I(x̄)}.
The set I b (x̄) is the set of constraints that create problems in KKT conditions.
A characterization of the above set in terms of the directional derivative was
stated by Wolkowicz [112] without proof. We present the result with proof for
a better understanding.
Theorem 6.9 Consider the convex programming problem (CP ) with C given
by (3.1). Let i∗ ∈ I = . Then i∗ ∈ I b (x̄) if and only if the system
g′_{i*}(x̄, d) = 0,
g′_i(x̄, d) ≤ 0, ∀ i ∈ I(x̄)\{i*}, (CP_b)
d ∉ D_{i*}^=(x̄) ∪ cl ∩_{i∈I^=} D_i^=(x̄)
is consistent.
Proof. Suppose that i* ∈ I^b(x̄), which implies there exists d* ∈ R^n such that
d* ∈ D_{i*}^>(x̄), d* ∈ S(x̄), d* ∉ cl ∩_{i∈I^=} D_i^=(x̄).
As d* ∈ D_{i*}^>(x̄), d* ∉ D_{i*}^=(x̄), which along with the last condition implies
d* ∉ D_{i*}^=(x̄) ∪ cl ∩_{i∈I^=} D_i^=(x̄). (6.16)
Also, as d* ∈ D_{i*}^>(x̄), by Definition 6.1 there exists α* > 0 such that
g_{i*}(x̄ + αd*) > g_{i*}(x̄), ∀ α ∈ (0, α*].
Therefore, as g_{i*}(x̄) = 0,
g′_{i*}(x̄, d*) ≥ 0. (6.17)
Because d∗ ∈ S(x̄),
gi′ (x̄, d∗ ) ≤ 0, ∀ i ∈ I(x̄). (6.18)
In particular, taking i∗ ∈ I = ⊆ I(x̄) in the above inequality along with (6.17)
yields
gi′∗ (x̄, d∗ ) = 0. (6.19)
Combining the conditions (6.16), (6.18), and (6.19) together imply that d∗
solves the system (CPb ), thereby leading to its consistency.
d∗ ∈ S(x̄). (6.21)
Also, from the inequality (6.20), there exists α∗ > 0 such that
As d∗ 6∈ Di=∗ (x̄), the above inequality holds as a strict inequality and hence
then
which implies D_i^≤(x̄) = D_i^=(x̄) for every i ∈ I^=. Therefore, by this condition,
∩_{i∈I(x̄)} D_i^≤(x̄) = ∩_{i∈I^=} D_i^=(x̄) ∩ ∩_{i∈I^<(x̄)} D_i^≤(x̄),
as desired.
(ii) If I(x̄) = ∅, the result holds trivially by (i). Suppose that I = and I < (x̄)
are nonempty. Then corresponding to any i ∈ I < , there exists some x̂ ∈ C
such that gi (x̂) < 0. By the convexity of gi , for every λ ∈ (0, 1],
which implies that dˆ = x̂ − x̄ ∈ Di< (x̄). Also, suppose that there is some
j ∈ I < (x̄), j 6= i, then corresponding to j there exists some x̃ ∈ C such that
gj (x̃) < 0. Then as before, d˜ = x̃ − x̄ ∈ Dj< (x̄). Now if i and j are such that
which implies for λ ∈ (0, 1), (λx̂ + (1 − λ)x̃) − x̄ = λdˆ + (1 − λ)d˜ such that
with corresponding ᾱ > 0 such that x̄+αd ∈ C for every α ∈ (0, ᾱ]. Therefore,
for every i ∈ I = ,
that is,
d ∈ Di= (x̄), ∀ i ∈ I = ,
which along with the condition (6.23) proves the desired result.
(iii) Consider a feasible point x ∈ C of (CP ) that implies
gi (x) ≤ 0, ∀ i ∈ I(x̄).
As ∩_{i∈I(x̄)} D_i^≤(x̄) is a cone,
cone (C − x̄) ⊂ ∩_{i∈I(x̄)} D_i^≤(x̄). (6.24)
Suppose that d ∈ ∩_{i∈I(x̄)} D_i^≤(x̄), which implies there exists ᾱ > 0 such that
For i ∉ I(x̄), g_i(x̄) < 0 and thus, there exists some α′ > 0 such that for any d ∈ R^n,
By Theorem 2.35,
T_C(x̄) = cl cone (C − x̄) = cl ∩_{i∈I(x̄)} D_i^≤(x̄),
thereby implying that d ∈ D_i^≤(x̄), i ∈ Ω. Thus, the condition (6.27) becomes
co ∩_{i∈Ω} D_i^=(x̄) ⊂ ∩_{i∈Ω} D_i^=(x̄) ⊂ co ∩_{i∈Ω} D_i^=(x̄).
The above relation implies that ∩_{i∈Ω} D_i^=(x̄) is convex, thereby leading to
co ∩_{i∈Ω} D_i^=(x̄) ∩ S(x̄) = ∩_{i∈Ω} D_i^=(x̄) ∩ S(x̄),
as desired. Note that, in particular, for Ω = I^=, ∩_{i∈I^=} D_i^=(x̄) is convex.
(3) cl (∩_{i∈Ω} D_i^=(x̄) ∩ S(x̄)) ⊂ cl ∩_{i∈I^=} D_i^=(x̄) ∩ S(x̄).
Suppose that Ω ⊊ I^=. We claim that
∩_{i∈Ω} D_i^=(x̄) ∩ S(x̄) ⊂ cl ∩_{i∈I^=} D_i^=(x̄) ∩ S(x̄),
but
d ∉ D_i^=(x̄) ∪ cl ∩_{i∈I^=} D_i^=(x̄), ∀ i ∈ Ω̃.
Therefore,
d ∈ ∩_{i∈Ω̃} D_i^<(x̄) ∩ ∩_{i∈I^=\Ω̃} D_i^=(x̄). (6.28)
By (ii), as
∩_{i∈I^=} D_i^=(x̄) ∩ ∩_{i∈I^<(x̄)} D_i^<(x̄) ≠ ∅,
which implies
d_λ ∈ ∩_{i∈Ω̃} D_i^<(x̄), ∀ λ ∈ (0, 1]. (6.32)
Observe that
Therefore, combining (6.32), (6.33), and (6.34) along with the above relation yields
d_λ ∈ ∩_{i∈I(x̄)} D_i^≤(x̄) ∩ ∩_{i∈Ω̃} D_i^<(x̄).
Thus,
d_λ ∈ ∩_{i∈Ω̃} D_i^=(x̄),
which is a contradiction to
d_λ ∈ ∩_{i∈Ω̃} D_i^<(x̄).
Therefore,
∩_{i∈Ω} D_i^=(x̄) ∩ S(x̄) ⊂ cl ∩_{i∈I^=} D_i^=(x̄) ∩ S(x̄).
Denote d_λ = λd̄ + (1 − λ)d. Therefore, by Theorem 2.79 and Proposition 6.2, for every ξ ∈ ∂g_i(x̄), i ∈ I^<(x̄),
which again by Theorem 2.79 implies that for every i ∈ I^<(x̄),
Also, by the convexity of ∩_{i∈I^=} D_i^=(x̄),
d_λ ∈ ∩_{i∈I^=} D_i^=(x̄), ∀ λ ∈ [0, 1). (6.38)
Thus, by the relations (6.37) and (6.38), along with (i), we obtain
d_λ ∈ ∩_{i∈I^<(x̄)} D_i^<(x̄) ∩ ∩_{i∈I^=} D_i^=(x̄) ⊂ ∩_{i∈I(x̄)} D_i^≤(x̄), ∀ λ ∈ [0, 1).
As λ → 1, d_λ → d, which implies d ∈ cl ∩_{i∈I(x̄)} D_i^≤(x̄), thus proving (6.36), which yields that
cl (∩_{i∈I^=} D_i^=(x̄) ∩ S(x̄)) ⊂ cl ∩_{i∈I(x̄)} D_i^≤(x̄).
The above condition along with (6.35) establishes the desired result.
(vi) Define
F = −co ∪_{i∈I^<(x̄)} ∂g_i(x̄).
By Proposition 2.31,
∩_{i∈Ω} D_i^=(x̄) ⊂ {ξ}° = {d ∈ R^n : ⟨ξ, d⟩ ≤ 0},
−F° ⊂ −{ξ}° = {d ∈ R^n : ⟨ξ, d⟩ ≥ 0}.
Therefore,
−F° ∩ ∩_{i∈Ω} D_i^=(x̄) ⊂ {d ∈ R^n : ⟨ξ, d⟩ = 0},
which implies
⟨ξ, d⟩ = ∑_{i∈I^<(x̄)} λ_i ⟨ξ_i, d⟩ < 0,
Lemma 6.11 Consider the convex programming problem (CP) with C given by (3.1). Suppose that x̄ ∈ C and let F ⊂ R^n be any nonempty set. Then the statement
Proof. Suppose that the statement is satisfied for any fixed objective func-
tion. We will prove the condition (6.40). Consider ξ ∈ NC (x̄) and define the
objective function as f (x) = −hξ, xi. Then ξ ∈ −∂f (x̄)∩NC (x̄), which implies
0 ∈ ∂f (x̄) + NC (x̄).
By the optimality conditions for (CP), Theorem 3.1 (ii), x̄ is a point of minimizer of (CP). Therefore, (6.39) along with ∂f(x̄) = {−ξ} leads to
ξ ∈ Ŝ(x̄) + F,
that is,
N_C(x̄) ⊂ Ŝ(x̄) + F. (6.41)
Now suppose that ξ ∈ Ŝ(x̄) + F, which implies there exist ξ_i ∈ ∂g_i(x̄) and λ_i ≥ 0 for i ∈ I(x̄) such that
ξ − ∑_{i∈I(x̄)} λ_i ξ_i ∈ F.
Again define the objective function as f (x) = −hξ, xi, which implies
∂f (x̄) = {−ξ}. By the above condition it is obvious that the condition (6.39)
is satisfied and thus by the statement, x̄ is a point of minimizer of (CP ).
Applying Theorem 3.1, −ξ ∈ N_C(x̄), which implies
Ŝ(x̄) + F ⊂ N_C(x̄).
The above containment along with the relation (6.41) yields the desired con-
dition (6.40).
Conversely, suppose that (6.40) holds. By Theorem 3.1 (ii), x̄ is a point of minimizer of (CP) if and only if
0 ∈ ∂f(x̄) + N_C(x̄),
which by (6.40) is equivalent to
0 ∈ ∂f(x̄) + Ŝ(x̄) + F,
that is, the system (6.39) is consistent, thereby completing the proof.
As mentioned in the beginning of this section, (S(x̄))° = cl Ŝ(x̄) by Proposition 3.9. Therefore, if Ŝ(x̄) is closed, condition (6.40) becomes
N_C(x̄) = (S(x̄))° + F.
A result similar to the above was studied by Gould and Tolle [53] under the assumption that the functions are differentiable but not necessarily convex.
Applying the above lemma along with some additional conditions, Wolkow-
icz [112] established KKT type optimality conditions. We present the result
below.
I^b(x̄) ⊂ Ω ⊂ I^=
are closed. Then x̄ is a point of minimizer of (CP) if and only if the system
0 ∈ ∂f(x̄) + ∑_{i∈I(x̄)} λ_i ∂g_i(x̄) + (∩_{i∈Ω} D_i^=(x̄))°, λ_i ≥ 0, i ∈ I(x̄), (6.42)
is consistent.
The closedness assumption leads to the condition (6.43), thereby yielding the
requisite result.
In the above theorem, the closedness conditions are imposed on the sets
co ∩_{i∈Ω} D_i^=(x̄) and Ŝ(x̄) + (∩_{i∈Ω} D_i^=(x̄))°,
is always satisfied. Below we present the result for this particular case.
0 ∈ ∂f (x̄) + NC (x̄).
Observe that int D_i^≤(x̄) = D_i^<(x̄) for every i ∈ I^<(x̄). Thus, invoking Propositions 2.31 and 6.10 implies
N_C(x̄) = T_C(x̄)° = ∑_{i∈I^<(x̄)} (D_i^≤(x̄))° + (∩_{i∈I^=} D_i^=(x̄))°.
Again by Proposition 6.10 (ii), D_i^<(x̄) ≠ ∅, which along with Proposition 6.2 (iv) yields
N_C(x̄) = {∑_{i∈I^<(x̄)} λ_i ∂g_i(x̄) : λ_i ≥ 0, i ∈ I^<(x̄)} + (∩_{i∈I^=} D_i^=(x̄))°.
As N_C(x̄) is a closed convex cone, the above relation along with (6.46) leads to
Ŝ(x̄) + (∩_{i∈I^=} D_i^=(x̄))° ⊂ N_C(x̄),
For the differentiable case, Gould and Tolle [52, 53] showed that the Abadie
constraint qualification, that is,
TC (x̄) = S(x̄)
Proof. Suppose that I b (x̄) = ∅. Therefore by Proposition 6.10 (iii) and (v),
it is obvious that TC (x̄) = S(x̄).
Conversely, let I^b(x̄) ≠ ∅, which implies there exists i* ∈ I^b(x̄) such that i* ∈ I^= and there exists
v* ∈ (D_{i*}^>(x̄) ∩ S(x̄)) \ cl ∩_{i∈I^=} D_i^=(x̄).
Consider, for instance, the set
C = {x ∈ R : x² ≤ 0, x ≤ 0}.
Thus, TC (x̄) 6= S(x̄), thereby showing that the Abadie constraint qualification
is not satisfied. Also by the definitions of the cones of directions, we have
Observe that I b (x̄) = {1}, that is, the set of badly behaved constraints is
nonempty.
Next let us consider the set
C = {x ∈ R : |x| ≤ 0, x ≤ 0}.
Suppose that the Slater constraint qualification is satisfied, that is, there exists
x̂ ∈ Rn such that gi (x̂) < 0, i = 1, 2, . . . , m. We claim that λ0 6= 0. On the
contrary, assume that λ0 = 0. Then the above condition implies that there
exist λi ≥ 0, i ∈ I(x̄), not all simultaneously zero, such that
0 ∈ ∑_{i∈I(x̄)} λ_i ∂g_i(x̄),
which implies that there exist ξ_i ∈ ∂g_i(x̄), i ∈ I(x̄), such that
0 = ∑_{i∈I(x̄)} λ_i ξ_i. (6.47)
is a closed set. Invoking the Strict Separation Theorem, Theorem 2.26 (iii), there exists d̄ ∈ R^n with d̄ ≠ 0 such that
⟨z, d̄⟩ < 0, ∀ z ∈ cone co ∪_{i∈I(x̄)} ∂g_i(x̄),
which implies
lim_{λ↓0} (g_i(x̄ + λd̄) − g_i(x̄))/λ = lim_{λ↓0} g_i(x̄ + λd̄)/λ < 0.
Therefore, for every λ > 0,
g_i(x̄ + λd̄) < 0, ∀ i ∈ I(x̄). (6.49)
Proof. Suppose that condition (i) holds. We claim that condition (ii) is also
satisfied. On the contrary, assume that (ii) does not hold, which along with
the fact that for a real-valued sublinear function p : Rn → R, p(0) = 0 implies
that for any λi ≥ 0, i = 1, 2, . . . , m, x̄ = 0 is not a point of minimizer of the
unconstrained problem
min p̃₀(x) + ∑_{i=1}^m λ_i p̃_i(x) subject to x ∈ R^n.
Note that [l_{i1}, u_{i1}] × [l_{i2}, u_{i2}] × ... × [l_{in}, u_{in}] forms a convex polytope in R^n with 2ⁿ vertices denoted by
v_i^r = (v_{i1}^r, v_{i2}^r, ..., v_{in}^r), i = 1, 2, ..., m, r = 1, 2, ..., 2ⁿ,
where v_{ij}^r ∈ {l_{ij}, u_{ij}}. Also, any element in the polytope can be expressed as a convex combination of the vertices. Therefore,
Consider
sup_{ξ∈P} ⟨α, ξ⟩ = sup {⟨α, ξ⟩ : ξ ∈ {∑_{i=1}^m λ_i ∂p̃_i(0) : λ_i ≥ 0, i = 1, 2, ..., m}}
≥ sup {⟨α, ξ⟩ : ξ ∈ ∑_{i=1}^m λ_i ∂p̃_i(0)}, ∀ λ ∈ R^m_+
= ∑_{i=1}^m λ_i p̃_i(α), ∀ λ ∈ R^m_+,
which implies
∑_{i=1}^m λ_i p̃_i(α) < −p̃₀(α), ∀ λ ∈ R^m_+.
Therefore, for α ∈ R^n,
But Theorem 6.19 cannot be applied directly to the above system as the
theorem is for the system involving sublinear functions, whereas here neither
pi (x) − bi , i = 1, 2, . . . , m, nor p0 (x) − p0 (x̄) is positively homogeneous and
hence not sublinear functions. So define p̃i : Rn × R → R, i = 0, 1, . . . , m, as
p̃0 (x, t) = p0 (x) − tp0 (x̄) and p̃i (x, t) = pi (x) − tbi , i = 1, 2, . . . , m.
This system is in the desired form needed for the application of Farkas’
Lemma, Theorem 6.19. To establish the result, we will first establish the equiv-
alence between the systems (6.53) and (6.54).
Suppose that the system (6.53) holds. We claim that (6.54) is also satisfied.
On the contrary, assume that the system (6.54) does not hold, which implies
there exists (x̃, t̃) ∈ Rn × R such that
For t̃ > 0, by positive homogeneity of the sublinear function and the con-
struction of p̃i , i = 0, 1, . . . , m,
and
which contradicts (6.53). Thus from all three cases, it is obvious that our
assumption is wrong and hence the system (6.54) holds.
Conversely, taking t = 1 in system (6.54) yields (6.53). Hence, both systems
(6.53) and (6.54) are equivalent. Applying Farkas’ Lemma, Theorem 6.19, for
the sublinear systems to (6.54), there exist λi ≥ 0, i = 1, 2, . . . , m, such that
p̃₀(x, t) + ∑_{i=1}^m λ_i p̃_i(x, t) ≥ 0, ∀ (x, t) ∈ R^n × R,
where the subdifferential is with respect to x and the gradient with respect to t. Therefore, a componentwise comparison leads to
0 ∈ ∂(p₀ + ∑_{i=1}^m λ_i p_i)(0) and p₀(x̄) + ∑_{i=1}^m λ_i b_i = 0.
7.1 Introduction
where lim sup_{x_i→x̄} {∂φ₁(x₁) + ∂φ₂(x₂)} denotes the set of all limits lim_{k→∞} (ξ₁^k + ξ₂^k) for which there exist x_i^k → x̄, i = 1, 2, such that ξ_i^k ∈ ∂φ_i(x_i^k), i = 1, 2, and
φ_i(x_i^k) − ⟨ξ_i^k, x_i^k − x̄⟩ → φ_i(x̄), i = 1, 2. (7.1)
Proof. Suppose that ξ ∈ ∂(φ₁ + φ₂)(x̄). By Theorem 2.120,
ξ ∈ ∩_{k∈N} cl {∂_{1/k}φ₁(x̄) + ∂_{1/k}φ₂(x̄)},
which implies for i = 1, 2, there exist x_i^k → x̄ and ξ_i^k ∈ ∂φ_i(x_i^k) satisfying
φ_i(x_i^k) − ⟨ξ_i^k, x_i^k − x̄⟩ → φ_i(x̄) and
thereby yielding
for every x ∈ Rn . Taking the limit as k → +∞ and using the condition (7.1),
the above inequality reduces to
0 ∈ core(dom φ1 − dom φ2 ),
ξ = lim (ξ1k + ξ2k ) and γik = φi (xki ) − hξik , xki − x̄i → φi (x̄). (7.5)
k→∞
Therefore,
⟨ξ₁^k, y⟩ ≤ M_y/α, ∀ k ∈ N,
that is, {⟨ξ₁^k, y⟩} is bounded above by M_y/α, which is independent of k. Similarly, the sequence {⟨ξ₁^k, −y⟩} is bounded above. In particular, taking y = e_i, i = 1, 2, ..., n, where e_i is the vector in R^n with i-th component 1 and all others zero,
Thus, {ξ1k } is a bounded sequence. As ξ1k + ξ2k → ξ, {ξ2k } is also a bounded se-
quence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, the sequences
{ξik }, i = 1, 2, have a convergent subsequence. Without loss of generality,
assume that ξik → ξi , i = 1, 2, such that ξ1 + ξ2 = ξ. By Theorem 2.84,
ξi ∈ ∂φi (x̄), i = 1, 2, thereby yielding
ξ1k + ξ2k → 0, f (xk1 ) − hξ1k , xk1 − x̄i → f (x̄) and hξ2k , xk2 − x̄i → 0.
0 ∈ ∂(f + δC )(x̄).
Applying Theorem 7.1, there exist sequences {x_i^k} ⊂ R^n with x_i^k → x̄, i = 1, 2, ξ₁^k ∈ ∂f(x₁^k) and ξ₂^k ∈ ∂δ_C(x₂^k) = N_C(x₂^k) satisfying
C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},
Also,
Φ((1 − λ)x1 + λx2 ) ∈ Φ(Rn ) ⊂ Φ(Rn ) + Rm
+.
Proof. Define φ1 (x, y) = φ(y) and φ2 (x, y) = δepi Φ (x, y). We claim that
φ1 (x, y) + φ2 (x, y) − φ1 (x̄, ȳ) − φ2 (x̄, ȳ) ≥ hξ, x − x̄i + h0, y − ȳi,
which implies that ξ ∈ ∂(φ ◦ Φ)(x̄), thereby establishing our claim (7.7).
Now by Theorem 7.1,
where
with
y_k′ − Φ(x_k) ∈ R^m_+.
Thus, for any y′ ∈ R^m_+,
y′ + y_k′ − Φ(x_k) ∈ R^m_+,
that is, (x_k, y′ + y_k′) ∈ epi Φ. In particular, taking x = x_k and setting y = y_k′ + y′ for any y′ ∈ R^m_+ in (7.11) yields
⟨η_k, y′⟩ ≥ 0, ∀ y′ ∈ R^m_+,
For a convex function φ, the convex subdifferential and the Clarke subdiffer-
ential coincide, that is, ∂φ(x̄) = ∂ ◦ φ(x̄). Now we present the lemma that plays
an important role in obtaining the Sequential Chain Rule.
ωk ∈ ∂ ◦ (λk Φ)(xk ), ∀ k ∈ N.
Then ωk → 0.
Therefore,
‖ω_k‖ = ‖∑_{i=1}^m λ_i^k ω_i^k‖ ≤ ∑_{i=1}^m |λ_i^k| ‖ω_i^k‖.
satisfying
xk → x̄, yk → Φ(x̄), ξk → ξ,
φ(yk ) − hηk , yk − ȳi → φ(ȳ) and hηk , Φ(xk ) − ȳi → 0.
which implies that there exist ρ_k ∈ ∂°(ζ_kΦ)(x_k) and ̺_k ∈ ∂°((η_k − ζ_k)Φ)(x_k) such that ξ_k = ρ_k + ̺_k. As η_k − ζ_k → 0 and x_k → x̄, by Lemma 7.5, ̺_k → 0. Setting ̺_k = −β_k,
ρ_k = ξ_k + β_k ∈ ∂°(ζ_kΦ)(x_k).
In particular, for y = Φ(x) for every x ∈ Rn , the above inequality yields that
for every x ∈ Rn ,
C = {x ∈ Rn : G(x) ∈ −Rm
+ },
Theorem 7.7 Consider the convex programming problem problem (CP ) with
C given by (3.1). Then x̄ ∈ C is a point of minimizer of (CP ) if and only if
there exist xk → x̄, yk → G(x̄), λk ∈ Rm + , ξ ∈ ∂f (x̄), and ξk ∈ ∂(λk G)(xk )
such that
ξ + ξk → 0, hλk , yk i = 0,
hλk , yk − G(x̄)i → 0, hλk , G(xk ) − G(x̄)i → 0.
Proof. Observe that the problem (CP ) can be rewritten as the unconstrained
problem
min (f + (δ_{−R^m_+} ∘ G))(x) subject to x ∈ R^n.
By Theorem 2.89, x̄ is a point of minimizer of (CP) if and only if
0 ∈ ∂(f + (δ_{−R^m_+} ∘ G))(x̄).
As dom f = R^n, invoking the Sum Rule, Theorem 2.91,
0 ∈ ∂f(x̄) + ∂(δ_{−R^m_+} ∘ G)(x̄),
such that
λ_k ∈ ∂δ_{−R^m_+}(y_k) and ξ_k ∈ ∂(λ_kG)(x_k),
satisfying
δ_{−R^m_+}(y_k) − ⟨λ_k, y_k − G(x̄)⟩ → (δ_{−R^m_+} ∘ G)(x̄) and ⟨λ_k, G(x_k) − G(x̄)⟩ → 0.
As λ_k ∈ ∂δ_{−R^m_+}(y_k) = N_{−R^m_+}(y_k), the sequence {y_k} ⊂ −R^m_+ with
⟨λ_k, y − y_k⟩ ≤ 0, ∀ y ∈ −R^m_+.
Since δ_{−R^m_+}(y_k) = 0 = (δ_{−R^m_+} ∘ G)(x̄), the condition
δ_{−R^m_+}(y_k) − ⟨λ_k, y_k − G(x̄)⟩ → (δ_{−R^m_+} ∘ G)(x̄)
reduces to
⟨λ_k, y_k − G(x̄)⟩ → 0,
thereby leading to the requisite result.
Theorem 7.8 Consider the convex programming problem (CP ) with C given
by (3.1). Then x̄ is a point of minimizer of (CP ) if and only if
(0, −f(x̄)) ∈ epi f* + cl ∪_{λ∈R^m_+} epi (∑_{i=1}^m λ_i g_i)*. (7.12)
Proof. Recall that the feasible set C of the convex programming problem
(CP ) is given by (3.1), that is,
C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},
C = {x ∈ Rn : G(x) ∈ −Rm
+ },
f(x) ≥ f(x̄), ∀ x ∈ C,
that is, setting φ(.) = f(.) − f(x̄),
φ(x) + δ_C(x) ≥ 0, ∀ x ∈ R^n,
which implies
(0, 0) ∈ epi (φ + δ_C)*,
which by the epigraph of the conjugate of the sum, Theorem 2.123, implies that
If x ∉ C, there exists some i ∈ {1, 2, ..., m} such that g_i(x) > 0. Hence, it is simple to see that
sup_{λ∈R^m_+} (λG)(x) = +∞. (7.15)
Applying Theorem 2.123, relation (7.13) along with Proposition 2.103 yields
(0, 0) ∈ epi f* + (0, f(x̄)) + cl co ∪_{λ∈R^m_+} epi (λG)*.
By Theorem 2.123, ∪_{λ∈R^m_+} epi (λG)* is a convex cone and thus, the above relation reduces to
(0, 0) ∈ epi f* + (0, f(x̄)) + cl ∪_{λ∈R^m_+} epi (λG)*
= epi f* + (0, f(x̄)) + cl ∪_{λ∈R^m_+} epi (∑_{i=1}^m λ_i g_i)*,
0 = ξ + lim_{k→∞} ξ_k, (7.16)
−f(x̄) = f*(ξ) + α + lim_{k→∞} {(∑_{i=1}^m λ_i^k g_i)*(ξ_k) + α_k}. (7.17)
f (x̄) ≤ f (x), ∀ x ∈ C.
Theorem 7.9 Consider the convex programming problem (CP ) with C given
by (3.1). Then x̄ is a point of minimizer for (CP ) if and only if there exist
ξ ∈ ∂f(x̄), ε_i^k ≥ 0, λ_i^k ≥ 0, ξ_i^k ∈ ∂_{ε_i^k}g_i(x̄), i = 1, 2, ..., m, such that
ξ + ∑_{i=1}^m λ_i^k ξ_i^k → 0, ∑_{i=1}^m λ_i^k g_i(x̄) → 0 and ∑_{i=1}^m λ_i^k ε_i^k ↓ 0 as k → +∞.
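Theorem 7.9 can be verified by hand on the constraint-qualification-free problem min x subject to x² ≤ 0 at x̄ = 0 (our own worked check). Here ∂f(x̄) = {1} and, for g(x) = x², ∂_ε g(x̄) = [−2√ε, 2√ε]. Choose
ε^k = 1/k², ξ^k = −2/k ∈ ∂_{ε^k}g(x̄), λ^k = k/2.
Then ξ + λ^k ξ^k = 1 − 1 = 0 → 0, λ^k g(x̄) = 0 → 0, and λ^k ε^k = 1/(2k) ↓ 0, so the sequential optimality conditions hold at x̄ = 0 even though, as seen in Chapter 5, no KKT multiplier exists for this problem.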
0 = ξ + lim_{k→∞} ξ_k,
−ε = ⟨ξ, x̄⟩ + lim_{k→∞} {⟨ξ_k, x̄⟩ + ε_k − (∑_{i=1}^m λ_i^k g_i)(x̄)},
i = 1, 2, ..., m, such that
ξ_k ∈ ∑_{i=1}^m ∂_{ε_i^k}(λ_i^k g_i)(x̄) and ε_k = ∑_{i=1}^m ε_i^k.
where ε̄_i^k = ε_i^k/λ_i^k ≥ 0. Therefore,
ξ_k ∈ ∑_{i∈Ī_k} λ_i^k ∂_{ε̄_i^k}g_i(x̄) + ∑_{i∉Ī_k} ∂_{ε_i^k}(λ_i^k g_i)(x̄). (7.19)
Observe that
∂_{ε_i^k}(λ_i^k g_i)(x̄) = {0} = λ_i^k ∂_{ε_i^k}g_i(x̄), ∀ i ∉ Ī_k.
The above relation along with the condition (7.19) yields that
ξ_k ∈ ∑_{i∈Ī_k} λ_i^k ∂_{ε̄_i^k}g_i(x̄) + ∑_{i∉Ī_k} λ_i^k ∂_{ε_i^k}g_i(x̄). (7.20)
Also,
ε_k = ∑_{i∈Ī_k} λ_i^k ε̄_i^k + ∑_{i∉Ī_k} λ_i^k ε_i^k,
which along with (7.20) leads to the desired sequential optimality conditions.
Conversely, suppose that the sequential optimality conditions hold. From Definitions 2.77 and 2.109 of the subdifferential and the ε-subdifferential, respectively, we obtain the corresponding subgradient inequalities at $\bar{x}$. These inequalities along with the sequential optimality conditions imply that
$$f(x) - f(\bar{x}) + \sum_{i=1}^m\lambda_i^k g_i(x) \ge 0, \quad \forall\, x \in \mathbb{R}^n.$$
In particular, for $x \in C$ we have $g_i(x) \le 0$, and thus
$$f(x) \ge f(\bar{x}), \quad \forall\, x \in C,$$
By the closure properties of the arbitrary union of sets, the condition (7.21) leads to
$$(0, -f(\bar{x})) \in \operatorname{epi} f^* + \operatorname{cl}\bigcup_{\lambda\in\mathbb{R}^m_+}\operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^*. \tag{7.22}$$
where $\dfrac{\bar\alpha}{\lambda_i} \ge 0$. As $(\xi_i, \alpha_i) \in \operatorname{epi} g_i^*$,
$$g_i^*(\xi_i) \le \alpha_i \le \alpha_i + \frac{\bar\alpha}{\lambda_i}, \quad \forall\, i \in \bar{I}_\lambda,$$
which implies that $(\xi_i, \alpha_i + \bar\alpha/\lambda_i) \in \operatorname{epi} g_i^*$. Hence $(\xi, \alpha + \bar\alpha) \in \sum_{i\in\bar{I}_\lambda}\lambda_i\operatorname{epi} g_i^*$ for every $\bar\alpha \ge 0$. Therefore, (7.23) reduces to
$$(0, -f(\bar{x})) \in \operatorname{epi} f^* + \operatorname{cl}\bigcup_{\lambda\in\mathbb{R}^m_+}\sum_{i=1}^m\lambda_i\operatorname{epi} g_i^*.$$
The condition (7.24) implies that there exist $\xi \in \operatorname{dom} f^*$, $\alpha \ge 0$, $\xi_i^k \in \operatorname{dom} g_i^*$, $\alpha_i^k, \lambda_i^k \ge 0$, $i = 1, 2, \ldots, m$, such that
$$(0, -f(\bar{x})) = (\xi, f^*(\xi) + \alpha) + \lim_{k\to\infty}\sum_{i=1}^m\lambda_i^k\big(\xi_i^k, g_i^*(\xi_i^k) + \alpha_i^k\big).$$
This equation along with the nonnegativity of $\varepsilon$, $\varepsilon_i^k$ and $\lambda_i^k$, $i = 1, 2, \ldots, m$, implies that $\varepsilon = 0$, $\sum_{i=1}^m\lambda_i^k g_i(\bar{x}) \to 0$ and $\sum_{i=1}^m\lambda_i^k\varepsilon_i^k \downarrow 0$ as $k \to +\infty$, thereby establishing the sequential optimality conditions. The converse can be verified as in Theorem 7.9.
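The sequential conditions can also be checked on instances where the KKT conditions fail. Below is a minimal numerical sketch in Python on an example of our own choosing (it is not taken from the book): minimize $f(x) = x$ over $C = \{x : x^2 \le 0\} = \{0\}$, where the Slater condition fails and no KKT multiplier exists, yet the sequences required by Theorem 7.9 are easy to exhibit; the formula $\partial_\varepsilon g(0) = [-2\sqrt{\varepsilon}, 2\sqrt{\varepsilon}]$ for $g(x) = x^2$ is a standard computation.

```python
import numpy as np

# min f(x) = x  subject to  g(x) = x**2 <= 0, so C = {0} and x_bar = 0.
# No KKT multiplier exists at 0, but the sequential conditions of
# Theorem 7.9 hold along the choices below.
xi = 1.0                          # xi in the subdifferential of f at 0
for k in [1, 10, 100, 1000]:
    lam_k = float(k)              # multiplier lambda_k
    eps_k = 1.0 / (4 * k**2)      # eps_k >= 0
    xi_k = -1.0 / k               # xi_k in d_{eps_k} g(0) = [-2*sqrt(eps_k), 2*sqrt(eps_k)]
    assert abs(xi_k) <= 2 * np.sqrt(eps_k) + 1e-12
    print(k,
          xi + lam_k * xi_k,      # xi + lambda_k * xi_k -> 0
          lam_k * 0.0 ** 2,       # lambda_k * g(x_bar) = 0
          lam_k * eps_k)          # lambda_k * eps_k -> 0
```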
As already discussed in the previous chapters, if one assumes certain constraint qualifications, then the standard KKT conditions can be established. Observing the necessary and sufficient condition given in Theorem 7.8 carefully, the term $\operatorname{cl}\bigcup_{\lambda\in\mathbb{R}^m_+}\operatorname{epi}\big(\sum_{i=1}^m\lambda_i g_i\big)^*$ is what prevents us from further manipulation. On the other hand, one might feel that the route to the KKT optimality conditions lies in further manipulation of the condition (7.12). Further, observe that we arrived at the condition (7.12) without any constraint qualification. However, in order to derive the KKT optimality conditions, one needs some additional qualification conditions on the constraints. Thus from (7.12) it is natural to assume that the set
$$\bigcup_{\lambda\in\mathbb{R}^m_+}\operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^* \quad\text{is closed}.$$
This is usually known as the closed cone constraint qualification or the Farkas–Minkowski (FM) constraint qualification. One may also take the more relaxed constraint qualification based on condition (7.27), that is,
$$\operatorname{cone}\Big(\operatorname{co}\bigcup_{i=1}^m\operatorname{epi} g_i^*\Big) \quad\text{is closed}.$$
$$0 = \xi + \xi', \tag{7.28}$$
$$-f(\bar{x}) = \langle\xi, \bar{x}\rangle + \varepsilon - f(\bar{x}) + \langle\xi', \bar{x}\rangle + \varepsilon' - \Big(\sum_{i=1}^m\lambda_i g_i\Big)(\bar{x}). \tag{7.29}$$
$$\lambda_i g_i(\bar{x}) = 0, \quad i = 1, 2, \ldots, m. \tag{7.31}$$
The conditions (7.30) and (7.31) together yield the KKT optimality condi-
tions.
Now if the relaxed constraint qualification is satisfied, (7.27) reduces to
$$(0, -f(\bar{x})) \in \operatorname{epi} f^* + \operatorname{cone\,co}\bigcup_{i=1}^m\operatorname{epi} g_i^*.$$
$$\bigcup_{\lambda\in\mathbb{R}^m_+}\operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^* = \bigcup_{\lambda\in\mathbb{R}^m_+}\operatorname{epi}(\lambda G)^*.$$
Suppose that
$$(\xi_k, \alpha_k) \to (\xi, \alpha) \in \operatorname{cl}\bigcup_{\lambda\in\mathbb{R}^m_+}\operatorname{epi}(\lambda G)^*,$$
which implies
that is, $(bG)(x) \ge 0$ for every $x \in \mathbb{R}^n$. But by the Slater constraint qualification, $G(\hat{x}) \in -\operatorname{int}\mathbb{R}^m_+$ and $b \ne 0$. Therefore, $(bG)(\hat{x}) < 0$, which is a contradiction.
(iii) $\gamma_k \to 0$: This implies that $\lambda_k \to 0$ and thus $(\lambda_k G) \to 0$. Therefore,
Observe that
$$0^*(\xi') = \begin{cases} 0, & \xi' = 0,\\ +\infty, & \text{otherwise},\end{cases}$$
Observe that $C = \{0\}$ and hence the Slater constraint qualification is not satisfied. Also $T_C(0) = \{0\}$ while
$$S(0) = \{d \in \mathbb{R} : g'(0, d) \le 0\} = \{d \in \mathbb{R} : d \le 0\},$$
which implies that the Abadie constraint qualification is also not satisfied. For $\xi \in \mathbb{R}$,
$$g^*(\xi) = \begin{cases} 0, & 0 \le \xi \le 1,\\ \xi^2/4, & \xi \le 0.\end{cases}$$
Therefore, the FM constraint qualification reduces to the set $\operatorname{cone\,epi}g^*$ being closed, which is the same as the relaxed FM constraint qualification. Here,
Again, $C = \{0\}$ and the Slater constraint qualification does not hold. But unlike the above example, $T_C(0) = \{0\} = S(0)$, which implies that the Abadie constraint qualification is satisfied. For $\xi \in \mathbb{R}$,
$$g^*(\xi) = 0, \quad -2 \le \xi \le 1.$$
Thus the Abadie constraint qualification is also not satisfied. Now, for any $(\xi_1, \xi_2) \in \mathbb{R}^2$,
$$g^*(\xi_1, \xi_2) = \begin{cases} 0, & \xi_1 = \xi_2 = 0,\\ +\infty, & \text{otherwise}.\end{cases}$$
Therefore,
$$\operatorname{epi} f^* + \operatorname{epi}\delta_C^*.$$
$$C = \{x \in \mathbb{R}^n : g_i(x) \le 0,\ i = 1, 2, \ldots, m,\ x \in X\} \tag{7.35}$$
$$C = \{x \in \mathbb{R}^n : g_i(x) \le 0,\ i = 1, 2, \ldots, m\}.$$
$$\operatorname{epi}\delta_C^* = \operatorname{cl}\bigcup_{\lambda\in\mathbb{R}^m_+}\operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^*.$$
Therefore, by Theorem 2.123 and Propositions 2.102 and 2.15 (vi), the above condition becomes
$$\subset \operatorname{cl}\bigcup_{\lambda\in\mathbb{R}^m_+}\Big\{\operatorname{epi}\varphi^* + \operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^* + \operatorname{epi}\delta_X^*\Big\}.$$
$$(0, -f(\bar{x})) \in \operatorname{cl}\bigcup_{\lambda\in\mathbb{R}^m_+}\Big\{\operatorname{epi} f^* + \operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^* + \operatorname{epi}\delta_X^*\Big\},$$
Applying Theorem 2.122, there exist $\xi_f \in \partial_{\varepsilon_f}f(\bar{x})$, $\xi_g \in \partial_{\varepsilon_g}\big(\sum_{i=1}^m\lambda_i g_i\big)(\bar{x})$, and $\xi_x \in \partial_{\varepsilon_x}\delta_X(\bar{x}) = N_{X,\varepsilon_x}(\bar{x})$ with $\varepsilon_f, \varepsilon_g, \varepsilon_x \ge 0$ such that
$$0 = \xi_f + \xi_g + \xi_x, \tag{7.36}$$
$$-f(\bar{x}) = \big(\langle\xi_f, \bar{x}\rangle - f(\bar{x}) + \varepsilon_f\big) + \Big(\langle\xi_g, \bar{x}\rangle - \Big(\sum_{i=1}^m\lambda_i g_i\Big)(\bar{x}) + \varepsilon_g\Big) + \langle\xi_x, \bar{x}\rangle + \varepsilon_x. \tag{7.37}$$
$$0 \in \partial_{\varepsilon_f}f(\bar{x}) + \partial_{\varepsilon_g}\Big(\sum_{i=1}^m\lambda_i g_i\Big)(\bar{x}) + N_{X,\varepsilon_x}(\bar{x}). \tag{7.38}$$
Condition (7.37) along with (7.36) and the nonnegativity conditions yields
$$\varepsilon_f + \varepsilon_g + \varepsilon_x - \sum_{i=1}^m\lambda_i g_i(\bar{x}) = 0.$$
Hence
$$\varepsilon_f = 0, \quad \varepsilon_g = 0, \quad \varepsilon_x = 0, \quad\text{and}\quad \lambda_i g_i(\bar{x}) = 0,\ i = 1, 2, \ldots, m.$$
Thus
$$0 \in \partial f(\bar{x}) + \partial\Big(\sum_{i=1}^m\lambda_i g_i\Big)(\bar{x}) + N_X(\bar{x}).$$
As $\operatorname{dom}g_i = \mathbb{R}^n$, $i = 1, 2, \ldots, m$, by Theorem 2.91, the above condition becomes
$$0 \in \partial f(\bar{x}) + \sum_{i=1}^m\lambda_i\partial g_i(\bar{x}) + N_X(\bar{x}),$$
which along with the complementary slackness condition yields the desired optimality conditions.
Conversely, suppose that the optimality conditions hold. Therefore, there exist $\xi \in \partial f(\bar{x})$ and $\xi_i \in \partial g_i(\bar{x})$ such that
$$-\xi - \sum_{i=1}^m\lambda_i\xi_i \in N_X(\bar{x}),$$
that is,
$$\Big\langle\xi + \sum_{i=1}^m\lambda_i\xi_i,\ x - \bar{x}\Big\rangle \ge 0, \quad \forall\, x \in X.$$
$$f(x) \ge f(\bar{x}), \quad \forall\, x \in C.$$
$$C = \operatorname{argmin}\{\varphi(x) : x \in \Theta\}, \qquad \alpha = \inf_{x\in\Theta}\varphi(x).$$
Proof. Observe that the problem (RP) is of the type considered in Theorem 7.13. We can invoke Theorem 7.13 if the CC qualification condition holds, that is, if the set
$$\operatorname{epi} f^* + \bigcup_{\mu\ge 0}\operatorname{epi}\big(\mu(\varphi(\cdot) - \alpha)\big)^* + \operatorname{epi}\delta_\Theta^*$$
is closed. For $\mu = 0$,
$$\big(\mu(\varphi(\cdot) - \alpha)\big)^*(\xi) = 0^*(\xi) = \begin{cases} 0, & \xi = 0,\\ +\infty, & \text{otherwise},\end{cases}$$
which implies
Observe that cone{(0, 1)}∪{(0, 0)} = cone{(0, 1)} and thus the above becomes
By the hypothesis of the theorem, (7.42) is a closed set and thus the reformu-
lated problem (RP ) satisfies the FM constraint qualification. Now invoking
Theorem 7.13, there exists λ ≥ 0 such that
thereby establishing the desired result. The converse can be proved as in Chap-
ter 3.
For a better understanding of the above result, consider the bilevel pro-
gramming problem where f (x) = x2 + 1, Θ = [−1, 1], and φ(x) = max{0, x}.
Observe that C = [−1, 0] and α = 0. Thus the reformulated problem is
For $\xi \in \mathbb{R}$,
$$\varphi^*(\xi) = \begin{cases} +\infty, & \xi < 0 \text{ or } \xi > 1,\\ 0, & \xi \in [0, 1],\end{cases}$$
which implies
which is a closed set. Because cone{(0, 1)} ⊂ cone epi φ∗ , the reformulated
problem satisfies the qualification condition in Theorem 7.14. It is easy to see
that x̄ = 0 is a solution of the bilevel problem with NΘ (0) = {0}, ∂f (0) = {0},
and ∂φ(0) = [0, 1]. Thus the KKT optimality conditions of Theorem 7.14 are
satisfied with λ = 0. Note that the Slater condition fails to hold for the
reformulated problem.
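The example can also be verified numerically. The following Python sketch (a rough grid-based check with tolerances of our own choosing) recovers $C = [-1, 0]$, $\alpha = 0$, and the minimizer $\bar{x} = 0$, at which the KKT conditions of Theorem 7.14 hold with $\lambda = 0$.

```python
import numpy as np

# Bilevel example: f(x) = x**2 + 1, Theta = [-1, 1], phi(x) = max(0, x).
f = lambda x: x**2 + 1
phi = lambda x: max(0.0, x)

theta = np.linspace(-1.0, 1.0, 20001)
alpha = min(phi(t) for t in theta)                  # lower-level value: 0
C = [t for t in theta if phi(t) <= alpha + 1e-12]   # solution set: [-1, 0]
x_bar = min(C, key=f)                               # upper-level minimizer
print(alpha, C[0], C[-1], x_bar)                    # 0.0 -1.0 0.0 0.0

# KKT of Theorem 7.14 at x_bar = 0 with lambda = 0:
# 0 in {0} + 0 * [0, 1] + {0} = df(0) + lambda * dphi(0) + N_Theta(0).
```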
We end this chapter by presenting the optimality conditions for the bilevel
programming problem
inf f (x) subject to x ∈ C, (BP 1)
where C is the solution set of the lower-level problem
Theorem 7.15 Consider the bilevel programming problem (BP1). Assume that the set
$$\Big\{\bigcup_{\lambda\in\mathbb{R}^m_+}\operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^*\Big\} \cup \Big\{\bigcup_{\lambda_0>0}\bigcup_{\lambda\in\mathbb{R}^m_+}\lambda_0\operatorname{epi}\varphi^* + \operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^*\Big\} + \operatorname{epi}\delta_X^*$$
is closed. Define
$$(\tilde\lambda g)(x) = \lambda_0\varphi(x) + \sum_{i=1}^m\lambda_i g_i(x).$$
Therefore,
$$\operatorname{epi}(\tilde\lambda g)^* = \operatorname{cl}\Big\{\operatorname{epi}(\lambda_0\varphi)^* + \operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^*\Big\}.$$
where $(\xi, \alpha) \in \operatorname{epi}\big(\sum_{i=1}^m\lambda_i g_i\big)^*$. Because $\mu \ge 0$ was arbitrary,
$$\operatorname{cone}\{(0, 1)\} + \operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^* \subset \operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^*.$$
Also, for any $(\xi, \alpha) \in \operatorname{epi}\big(\sum_{i=1}^m\lambda_i g_i\big)^*$,
$$(\xi, \alpha) = (0, 0) + (\xi, \alpha) \in \operatorname{cone}\{(0, 1)\} + \operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^*.$$
As $(\xi, \alpha) \in \operatorname{epi}\big(\sum_{i=1}^m\lambda_i g_i\big)^*$ was arbitrary,
$$\operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^* \subset \operatorname{cone}\{(0, 1)\} + \operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^*,$$
Thus, for $\lambda_0 = 0$,
$$\operatorname{epi}(\tilde\lambda g)^* = \operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^*.$$
Therefore,
$$\bigcup_{\tilde\lambda\in\mathbb{R}^{1+m}_+}\operatorname{epi}(\tilde\lambda g)^* + \operatorname{epi}\delta_X^* = \Big\{\bigcup_{\lambda\in\mathbb{R}^m_+}\operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^*\Big\} \cup \Big\{\bigcup_{\lambda_0>0}\bigcup_{\lambda\in\mathbb{R}^m_+}\operatorname{epi}\varphi^* + \operatorname{epi}\Big(\sum_{i=1}^m\lambda_i g_i\Big)^*\Big\} + \operatorname{epi}\delta_X^*.$$
is closed. Hence, the FM constraint qualification holds for the problem (RP1). Because $\operatorname{dom}f = \mathbb{R}^n$, by Theorem 2.69 $f$ is continuous on $\mathbb{R}^n$, and thus the CC qualification condition holds for (RP1). As the bilevel problem (BP1) is equivalent to (RP1), by Theorem 7.11, $\bar{x} \in C$ is a point of minimizer of (BP1) if and only if there exists $\tilde\lambda = (\lambda_0, \lambda) \in \mathbb{R}_+ \times \mathbb{R}^m_+$ such that
$$0 \in \partial f(\bar{x}) + \partial(\tilde\lambda g)(\bar{x}) + N_X(\bar{x}) \quad\text{and}\quad (\tilde\lambda g)(\bar{x}) = 0. \tag{7.44}$$
As $\varphi$ is proper convex, $\lambda_0\varphi$ is also proper convex. Therefore, $\operatorname{dom}(\lambda_0\varphi)$ is a nonempty convex set in $\mathbb{R}^n$. By Proposition 2.14 (i), $\operatorname{ri\,dom}(\lambda_0\varphi)$ is nonempty. Because $\operatorname{dom}g = \mathbb{R}^n$, $\operatorname{dom}(\lambda g) = \mathbb{R}^n$. Now invoking the Sum Rule, Theorem 2.91,
$$\partial(\tilde\lambda g)(\bar{x}) = \lambda_0\partial\varphi(\bar{x}) + \partial\Big(\sum_{i=1}^m\lambda_i g_i\Big)(\bar{x}).$$
As $(\lambda_0, \lambda) \in \mathbb{R}_+ \times \mathbb{R}^m_+$, this along with the feasibility of $\bar{x}$ yields that
8.1 Introduction
Until now, we discussed the convex programming problem (CP ) with the
convex feasible set C given by (3.1), that is,
$$C = \{x \in \mathbb{R}^n : g_i(x) \le 0,\ i = 1, 2, \ldots, m\},$$
where gi : Rn → R, i = 1, 2, . . . , m, are convex functions and its variations like
(CP 1) and (CCP ). But is the convexity of the functions forming the convex
feasible set C important? For example, assume C as a subset of R2 given by
C = {(x1 , x2 ) ∈ R2 : 1 − x1 x2 ≤ 0, x1 ≥ 0}.
This set is convex even though g(x1 , x2 ) = 1−x1 x2 is a nonconvex function. As
stated in Chapter 1, convex optimization basically means minimizing a convex
function over a convex set with no emphasis on as to how the feasible set is
obtained. Very recently (2010), Lasserre [74] published a very interesting paper
discussing this aspect of convex feasibility for smooth convex optimization.
Observe that the KKT conditions for smooth convex optimization problems
look absolutely the same as the KKT conditions for the usual smooth opti-
mization problem. As discussed in earlier chapters, under certain constraint
qualifications like the Slater constraint qualification, the above KKT condi-
tions are necessary as well as sufficient.
Lasserre observed that the convex feasible set C of (CP ) need not always
be defined by convex inequality constraints as in the above example. The
question that Lasserre answers is “in such a scenario what conditions would
make the KKT optimality conditions necessary as well as sufficient? ” So now
the convex set C given by (3.1) is considered, with the only difference that
gi , i = 1, 2, . . . , m, need not be convex even though they are assumed to
be smooth. Lasserre showed that if the Slater constraint qualification and
an additional nondegeneracy condition hold, then the KKT condition is both
necessary and sufficient. Though Lasserre defined the notion of nondegeneracy
for every point of the set C, we define it for a particular point and extend it
to the feasible set C.
where I(x̄) = {i ∈ {1, 2, . . . , m} : gi (x̄) = 0} denotes the active index set at x̄.
The set C is said to satisfy the nondegeneracy condition if it holds for every
x̄ ∈ C.
Proof. Suppose that C is a convex set and consider x̄ ∈ C. Therefore, for any
y ∈ C, for every λ ∈ [0, 1], x̄ + λ(y − x̄) ∈ C, that is,
C = {(x1 , x2 ) ∈ R2 : 1 − x1 x2 ≤ 0, x1 + x2 − 2 ≤ 0, x1 ≥ 0}.
which implies that the nondegeneracy condition is satisfied for C = {x̄}. But
observe that there exists no (d1 , d2 ) ∈ R2 satisfying
As is evident from the result below, the nondegeneracy condition is required only for the necessity part at the given point, and not for the whole set as in the statement of the theorem in Lasserre [74]. Also, in the converse part, we require the necessity part of Theorem 8.2, which is independent of the nondegeneracy condition.
Theorem 8.3 Consider the problem (CP ) where f is smooth and C is given
by (3.1), where gi , i = 1, 2, . . . , m, are smooth but need not be convex. Assume
that the Slater constraint qualification is satisfied and the nondegeneracy con-
dition holds at x̄ ∈ C. Then x̄ is a point of minimizer of (CP ) if and only if
there exist λi ≥ 0, i = 1, 2, . . . , m, such that
$$\nabla f(\bar{x}) + \sum_{i=1}^m\lambda_i\nabla g_i(\bar{x}) = 0 \quad\text{and}\quad \lambda_i g_i(\bar{x}) = 0,\ i = 1, 2, \ldots, m.$$
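A hedged numerical sketch of the theorem: on the set $C = \{(x_1, x_2) : 1 - x_1x_2 \le 0,\ x_1 \ge 0\}$ discussed in this chapter, with an objective $f(x) = x_1 + x_2$ of our own choosing, the point $\bar{x} = (1, 1)$ satisfies the nondegeneracy condition and the KKT system with $\lambda = 1$, even though $g_1$ is nonconvex.

```python
import numpy as np

# C = {(x1, x2): 1 - x1*x2 <= 0, x1 >= 0}: a convex set with a nonconvex g1.
g1 = lambda x: 1.0 - x[0] * x[1]
grad_f = lambda x: np.array([1.0, 1.0])        # f(x) = x1 + x2 (our choice)
grad_g1 = lambda x: np.array([-x[1], -x[0]])

x_bar = np.array([1.0, 1.0])                   # minimizer of x1 + x2 over C
assert abs(g1(x_bar)) < 1e-12                  # g1 is active at x_bar
assert np.linalg.norm(grad_g1(x_bar)) > 0.0    # nondegeneracy at x_bar

lam = 1.0                                      # KKT multiplier
print(grad_f(x_bar) + lam * grad_g1(x_bar))    # [0. 0.]: KKT holds
```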
Suppose that $\lambda_0 = 0$, which implies that $\lambda_i > 0$ for some $i \in \{1, 2, \ldots, m\}$. Therefore, the set
$$\bar{I} = \{i \in \{1, 2, \ldots, m\} : \lambda_i > 0\}$$
is nonempty.
As $\bar{I} \subset I(\bar{x})$, this along with the convexity of C and Theorem 8.2 yields that for every $i \in \bar{I}$,
For any $d \in \mathbb{R}^n$, consider the vector $\hat{x} + \lambda d$ such that for $\lambda > 0$ sufficiently small, $\hat{x} + \lambda d \in B_\delta(\hat{x})$. Hence, by the condition (8.3), for each $i \in \bar{I}$,
which implies
$$\langle\nabla g_i(\bar{x}), d\rangle = 0, \quad \forall\, d \in \mathbb{R}^n.$$
Hence, $\nabla g_i(\bar{x}) = 0$ for every $i \in \bar{I} \subset I(\bar{x})$, thereby contradicting the nondegeneracy condition at $\bar{x}$. Thus, $\lambda_0 \ne 0$. Dividing the Fritz John optimality condition by $\lambda_0$, the KKT optimality condition is established at $\bar{x}$ as
$$\nabla f(\bar{x}) + \sum_{i=1}^m\bar\lambda_i\nabla g_i(\bar{x}) = 0 \quad\text{and}\quad \bar\lambda_i g_i(\bar{x}) = 0,\ i = 1, 2, \ldots, m,$$
where $\bar\lambda_i = \dfrac{\lambda_i}{\lambda_0}$, $i = 1, 2, \ldots, m$.
Conversely, suppose that x̄ satisfies the KKT optimality conditions. As-
sume that x̄ is not a point of minimizer of (CP ). Therefore, there exists x ∈ C
such that f (x) < f (x̄), which along with the convexity of f implies that
thereby contradicting the condition (8.5) and thus leading to the requisite
result, that is, x̄ is a point of minimizer of f over C.
$$0 \notin \partial^\circ g_i(\bar{x}), \quad \forall\, i \in I(\bar{x}).$$
$$C = \{x \in \mathbb{R} : g_0(x) \le 0\},$$
from which it is obvious that the nondegeneracy condition is ensured for the
convex nonsmooth scenario.
Next we present the equivalent characterization of the convex set C under
the nonsmooth scenario.
Proof. Consider the convex set C. Working along the lines of Theorem 8.2,
for arbitrary but fixed x̄ ∈ C, for λ ∈ (0, 1),
Theorem 8.6 Consider the problem (CP) with C given by (3.1), where $g_i$, $i = 1, 2, \ldots, m$, are locally Lipschitz regular functions. Assume that the Slater constraint qualification holds and the nondegeneracy condition is satisfied at $\bar{x} \in C$. Then $\bar{x}$ is a point of minimizer of (CP) if and only if there exist $\lambda_i \ge 0$, $i = 1, 2, \ldots, m$, such that
$$0 \in \partial f(\bar{x}) + \sum_{i=1}^m\lambda_i\partial^\circ g_i(\bar{x}) \quad\text{and}\quad \lambda_i g_i(\bar{x}) = 0,\ i = 1, 2, \ldots, m.$$
From the definition of the Clarke subdifferential, the above condition leads to
$$\sum_{i\in\bar{I}}\lambda_i g_i^\circ(\bar{x}, d) \ge \sum_{i\in\bar{I}}\lambda_i\langle\xi_i, d\rangle = 0, \quad \forall\, d \in \mathbb{R}^n. \tag{8.7}$$
By the complementary slackness condition, $g_i(\bar{x}) = 0$ for every $i \in \bar{I}$, that is, $\bar{I} \subset I(\bar{x})$. Therefore, by Theorem 8.5, as C is a convex set, we have
$$g_i^\circ(\bar{x}, x - \bar{x}) \le 0, \quad \forall\, x \in B_\delta(\hat{x}),\ \forall\, i \in \bar{I},$$
which along with the condition (8.8) implies that for every $i \in \bar{I}$,
$$g_i^\circ(\bar{x}, v) \ge 0, \quad \forall\, v \in \mathbb{R}^n.$$
From the definition of the Clarke subdifferential, $0 \in \partial^\circ g_i(\bar{x})$ for every $i \in \bar{I}$, thereby contradicting the nondegeneracy condition. Therefore $\lambda_0 \ne 0$ and dividing the optimality condition throughout by $\lambda_0$ reduces it to
$$0 \in \partial f(\bar{x}) + \sum_{i=1}^m\bar\lambda_i\partial^\circ g_i(\bar{x}) \quad\text{and}\quad \bar\lambda_i g_i(\bar{x}) = 0,\ i = 1, 2, \ldots, m,$$
where $\bar\lambda_i = \dfrac{\lambda_i}{\lambda_0}$, $i = 1, 2, \ldots, m$, leading to the requisite result.
Conversely, suppose that the conditions hold at $\bar{x}$. On the contrary, assume that $\bar{x}$ is not a point of minimizer of $f$ over C. Thus, there exists $x \in C$ such that $f(x) < f(\bar{x})$, which along with the convexity of $f$,
Using the optimality conditions at $\bar{x}$, there exist $\xi_0 \in \partial f(\bar{x})$ and $\xi_i \in \partial^\circ g_i(\bar{x})$, $i = 1, 2, \ldots, m$, such that
$$0 = \xi_0 + \sum_{i=1}^m\lambda_i\xi_i,$$
which by the definition of the Clarke subdifferential along with Theorem 8.5 yields
$$0 > -\sum_{i=1}^m\lambda_i g_i^\circ(\bar{x}, x - \bar{x}) \ge 0,$$
where
$$f(x) = -x, \qquad g_1(x) = x^3 \qquad\text{and}\qquad g_2(x) = \begin{cases} -x - 1, & x \le 0,\\ -1, & x > 0.\end{cases}$$
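Assuming the example is meant to exhibit the failure of the nondegeneracy condition, the following Python sketch computes the relevant quantities: $C = [-1, 0]$, the minimizer of $f$ over $C$ is $\bar{x} = 0$, only $g_1$ is active there, and $g_1'(0) = 0$, so $0 \in \partial^\circ g_1(0)$ and no multiplier $\lambda \ge 0$ can satisfy $0 \in \partial f(0) + \lambda\partial^\circ g_1(0) = \{-1\} + \lambda\{0\}$.

```python
import numpy as np

f = lambda x: -x
g1 = lambda x: x**3
g2 = lambda x: -x - 1.0 if x <= 0 else -1.0

grid = np.linspace(-2.0, 2.0, 40001)
C = [x for x in grid if g1(x) <= 0 and g2(x) <= 0]   # C = [-1, 0]
x_bar = min(C, key=f)                                # minimizer of f on C
print(min(C), max(C), x_bar)                         # -1.0 0.0 0.0
print(3 * x_bar**2)      # g1'(0) = 0: nondegeneracy fails at x_bar
```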
9.1 Introduction
In the preceding chapters we studied the necessary and sufficient optimality
conditions for x̄ ∈ Rn to be a point of minimizer for the convex optimization
problem wherein a convex objective function f is minimized over a convex
feasible set C ⊂ Rn . From Theorem 2.90, if the objective function f is strictly
convex, then the point of minimizer x̄ is unique. The notion of unique mini-
mizer was extended to the concept of sharp minimum or, equivalently, strongly
unique local minimum. The ideas of sharp minimizer and strongly unique min-
imizer were introduced by Polyak [94, 95] and Cromme [29]. These notions
played an important role in approximation theory, in the study of perturbations of optimization problems, and in the analysis of the convergence of algorithms [1, 26, 56]. Below we define the notion of sharp minimum.
Observe that any v ∈ NF (x̄) along with the fact that x̄ ∈ F satisfies the
inequality
$$F^\circ = \{v \in \mathbb{R}^n : \langle v, x\rangle \le 0,\ \forall\, x \in F\}.$$
Therefore,
$$\sigma_F(v) = \begin{cases} 0, & v \in F^\circ,\\ +\infty, & \text{otherwise}.\end{cases} \tag{9.1}$$
From (ii), which along with the above relation (9.1) yields that
which implies
as desired.
(iv) By Theorem 2.35, $T_F(x)$ is a closed convex cone and hence $x + T_F(x)$ is also a closed convex set. Invoking (iii) along with Proposition 2.37 leads to
Therefore,
Thus, ∂dF (x) = {0} for x ∈ int F . For x ∈ bdry F , again taking y = 0,
Proof. It is easy to observe that (i) implies (ii). Conversely, suppose that (ii)
holds. Consider x̄ ∈ S with ξ ∈ α cl B ∩ NS (x̄). As (ii) is satisfied, there exists
ȳ ∈ S such that ξ ∈ ∂f0 (ȳ). By Definition 2.77 of subdifferential,
In particular, for any x ∈ S, f0 (x) = f0 (ȳ), thereby reducing the above in-
equality to
hξ, x − ȳi ≤ 0, ∀ x ∈ S,
thereby implying that ξ ∈ ∂f0 (x̄). Because x̄ ∈ S was arbitrary, (i) holds.
The above result was from Burke and Ferris [25]. The next result from
Burke and Deng [22] provides a characterization for weak sharp minimizer in
terms of f0 .
Theorem 9.5 Consider the convex optimization problem (CP ) and its equiv-
alent unconstrained problem (CPu ). Let α > 0. Then S is the set of weak sharp
minimizers with modulus α if and only if
Proof. Suppose that S is the set of weak sharp minimizers with modulus
α > 0. Consider x̄ ∈ S. Therefore, by Definition 9.2,
$$\frac{d_S(\bar{x} + \lambda v)}{\lambda} \ge \inf_{y\in T_S(\bar{x})}\|v - y\| = d_{T_S(\bar{x})}(v). \tag{9.9}$$
$$f_0(x) - f_0(\bar{x}) \ge f_0'(\bar{x}, x - \bar{x}) \ge \alpha\, d_{T_S(\bar{x})}(x - \bar{x}) = \alpha\, d_{\bar{x}+T_S(\bar{x})}(x).$$
Because x ∈ C and x̄ ∈ S were arbitrary, the above condition holds for every
x ∈ C and every x̄ ∈ S, and hence S is the set of weak sharp minimizers.
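Definition 9.2 and Theorem 9.5 can be illustrated on a one-line example of our own choosing, $f_0(x) = |x|$ on $\mathbb{R}$, for which $S = \{0\}$ and the weak sharp minimality inequality holds with modulus $\alpha = 1$.

```python
import numpy as np

# f0(x) = |x| has solution set S = {0}; f0(x) - f0(0) = 1 * d_S(x), so S is
# a set of weak sharp minimizers with modulus alpha = 1 (Definition 9.2),
# and f0'(0, v) = |v| >= alpha * d_{T_S(0)}(v), matching Theorem 9.5.
alpha = 1.0
f0 = lambda x: np.abs(x)
d_S = lambda x: np.abs(x)            # distance to S = {0}

for x in np.linspace(-2.0, 2.0, 9):
    assert f0(x) - f0(0.0) >= alpha * d_S(x) - 1e-12
print("weak sharp minimum with modulus", alpha)
```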
We end this chapter by giving equivalent characterizations for the set of
weak sharp minimizers, S, for (CP ) from Burke and Deng [22].
Theorem 9.6 Consider the convex optimization problem (CP ) and its equiv-
alent unconstrained problem (CPu ). Let α > 0. Then the following statements
are equivalent:
(i) S is the set of weak sharp minimizers for (CP ) with modulus α > 0.
(ii) For every x̄ ∈ S and v ∈ TC (x̄),
holds.
(v) For every $\bar{x} \in S$ and $v \in T_C(\bar{x}) \cap N_S(\bar{x})$,
$$f'(\bar{x}, v) \ge \alpha\|v\|.$$
Proof. [(i) =⇒ (ii)] Because S is the set of weak sharp minimizers, by Theo-
rem 9.5,
$$f_0'(x, v) \ge \alpha\, d_{T_S(x)}(v), \quad \forall\, x \in S,\ \forall\, v \in \mathbb{R}^n. \tag{9.10}$$
The above condition holds in particular for v ∈ TC (x). As f0 (x) = f (x) + δC (x),
which along with the fact that f0′ (x, v) = f ′ (x, v) for every x ∈ S and
v ∈ TC (x), and condition (9.10) yields
By Theorem 2.35, TC (x) is a closed convex cone. Invoking Proposition 2.61 (v)
along with Proposition 2.37 yields
By the fact that NC (x) = ∂δC (x) and from the Sum Rule, Theorem 2.91,
∂f (x) + NC (x) ⊂ ∂(f + δC )(x) = ∂f0 (x), which is always true along with
Proposition 2.61 (i), the above inequality yields
By Proposition 2.82, ∂f0 (x) is a closed convex set which along with Proposi-
tion 2.61 (iv) and (ii) implies that
As α > 0, for every x ∈ S and every v ∈ TC (x) ∩ NS (x), the above inequality
is equivalent to
Because TC (x) ∩ NS (x) is a closed convex cone, by Proposition 2.61 (v), the
above condition yields that for every x ∈ S,
f ′ (x, v) ≥ α kvk,
$$\langle x - \bar{x}, \bar{y} - \bar{x}\rangle \le 0, \quad \forall\, \bar{y} \in S,$$
$$x - \bar{x} \in T_C(\bar{x}) \cap N_S(\bar{x}).$$
[Figure 9.1: diagram of the implications among the conditions (i)–(vii) of Theorem 9.6.]
Because x ∈ C and x̄ ∈ projS (x) were arbitrary, the inequality holds for every
x ∈ C and x̄ ∈ projS (x), thereby yielding the relation (vii).
[(vii) $\Longrightarrow$ (i)] As $\operatorname{dom}f = \mathbb{R}^n$, Theorem 2.79 along with Definition 2.77 of the subdifferential and the relation (vii) leads to
with $\bar{x} \in \operatorname{proj}_S(x)$. Because $f(\bar{y}) = f(\bar{x})$ for any $\bar{y} \in S$ with $\bar{y} \ne \bar{x}$, (9.15) holds for every $x \in C$ and every $\bar{x} \in S$, thereby leading to (i).
Figure 9.1 presents the pictorial representation of Theorem 9.6. We have
devoted this chapter only to the theoretical aspect of weak sharp minimizers,
though as mentioned in the beginning this notion plays an important role
from the algorithmic point of view. For readers interested in its computational
aspects, one may refer to Burke and Deng [23, 24] and Ferris [46].
10.1 Introduction
We have discussed the various aspects of studying optimality conditions for
the convex programming problem (CP ). Throughout, we concentrated on es-
tablishing the standard or the sequential optimality conditions at the exact
point of minima. But it may not always be possible to find the point of minimizer. There may be cases where the infimum exists but is not attainable. For instance, consider
$$\min\ e^x \quad\text{subject to}\quad x \in \mathbb{R}.$$
As we know, the infimum of the above problem is zero, but it is not attainable over the whole real line. Thus in such scenarios we try to approximate the solution. In this example, for a given $\varepsilon > 0$, one can always find $\bar{x} \in \mathbb{R}$ such that $e^{\bar{x}} < \varepsilon$.
This leads to the notion of approximate solutions, which play a crucial role in
algorithmic study of optimization problems. Recall the convex optimization
problem
min f (x) subject to x ∈ C, (CP )
where f : Rn → R is a convex function and C is a convex subset of Rn .
$$f(\bar{x}) \le f(x) + \varepsilon, \quad \forall\, x \in C.$$
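For the exponential example above, an $\varepsilon$-solution can be written down in closed form; the short Python check below uses the witness $\bar{x} = \log(\varepsilon/2)$, a choice of our own.

```python
import math

# min e**x over R has infimum 0, not attained; x_bar = log(eps/2) satisfies
# f(x_bar) = eps/2 <= inf f + eps, so it is an eps-solution for any eps > 0.
for eps in [1.0, 0.1, 1e-3]:
    x_bar = math.log(eps / 2)
    assert math.exp(x_bar) <= 0.0 + eps
    print(eps, x_bar, math.exp(x_bar))
```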
This is not the only way to study approximate solutions. In the literature,
one finds the notions of various approximate solutions introduced over the
years, such as quasi ε-solution, regular ε-solution, almost ε-solution [76], to
name a few. We will define these solution concepts before moving on to study
the approximate optimality conditions. The classes of quasi ε-solution and
regular ε-solution are motivated by Ekeland’s variational principle stated in
Chapter 2.
where $\sigma_{\bar{C}}(\xi)$ denotes the support function to the set $\bar{C}$ at $\xi$. Observe that $\operatorname{dom}g = \mathbb{R}^n$ and hence by Theorem 2.69 $g$ is continuous over the whole of $\mathbb{R}^n$. Now invoking Theorem 13.5 from Rockafellar [97] (see also Remark 10.8), the support function $\sigma_{\bar{C}}$ is the closure of the positively homogeneous function $\varphi$ generated by $g^*$, which is defined as
Therefore,
From the above condition, there exists $\lambda \ge 0$ such that $\xi \in \partial_{\varepsilon+(\lambda g)(\bar{x})}(\lambda g)(\bar{x})$. As $\partial_{\varepsilon_1}\varphi(x) \subset \partial_{\varepsilon_2}\varphi(x)$ whenever $\varepsilon_1 \le \varepsilon_2$, there exists an $\bar\varepsilon$ satisfying $0 \le \bar\varepsilon \le \varepsilon + (\lambda g)(\bar{x})$ such that $\xi \in \partial_{\bar\varepsilon}(\lambda g)(\bar{x})$. Therefore,
$$N_{\bar{C},\varepsilon}(\bar{x}) = \bigcup_{0\le\bar\varepsilon\le\varepsilon+(\lambda g)(\bar{x})}\{\xi \in \mathbb{R}^n : \text{there exists }\lambda \ge 0\text{ such that }(\lambda g)^*(\xi) + (\lambda g)(\bar{x}) \le \langle\xi, \bar{x}\rangle + \bar\varepsilon\} = \bigcup_{0\le\bar\varepsilon\le\varepsilon+(\lambda g)(\bar{x})}\bigcup_{\lambda\ge 0}\partial_{\bar\varepsilon}(\lambda g)(\bar{x}),$$
as desired.
Conversely, define εi = ε̄i − λ̄i gi (x̄), i = 1, 2, . . . , m. Applying Proposi-
tion 10.7, ξi ∈ ∂ε̄i (λ̄i gi )(x̄) is equivalent to ξi ∈ NCi ,εi (x̄) for i = 1, 2, . . . , m.
Also, from the condition (10.2),
$$\bar\varepsilon_0 + \sum_{i=1}^m\varepsilon_i + \sum_{i=1}^m\bar\lambda_i g_i(\bar{x}) - \varepsilon \le \sum_{i=1}^m\bar\lambda_i g_i(\bar{x}) \le 0,$$
Observe that F (x̄) = ε̄. To show that for every ε ≥ ε̄, x̄ ∈ Rn is an ε-solution
for (CPmax ), it is sufficient to establish that
F (x̄) ≤ F (x) + ε, ∀ x ∈ Rn .
Therefore,
gi (x̄) ≤ 2ε, i = 1, 2, . . . , m,
Theorem 10.11 Consider the convex programming problem (CP ) with C de-
fined by (3.1). Assume that the Slater constraint qualification is satisfied and
let ε ≥ 0. Then x̄ is an ε-solution of (CP ) if and only if there exist ε̄0 ≥ 0,
ε̄i ≥ 0, and λ̄i ≥ 0, i = 1, . . . , m, such that
$$0 \in \partial_{\bar\varepsilon_0}f(\bar{x}) + \sum_{i=1}^m\partial_{\bar\varepsilon_i}(\bar\lambda_i g_i)(\bar{x}) \quad\text{and}\quad \sum_{i=0}^m\bar\varepsilon_i - \varepsilon = \sum_{i=1}^m\bar\lambda_i g_i(\bar{x}) \le 0.$$
0 ∈ ∂ε F (x̄).
$$0 = \sum_{i\in\bar{I}}\xi_i \quad\text{and}\quad \sum_{i=0}^m\varepsilon_i + F(\bar{x}) - \sum_{i\in\bar{I}}\lambda_i g_i(\bar{x}) = \varepsilon, \tag{10.5}$$
where $\bar\xi_0 \in \partial_{\bar\varepsilon_0}f(\bar{x})$, $\bar\xi_i \in \partial_{\bar\varepsilon_i}(\bar\lambda_i g_i)(\bar{x})$, $i \in \bar{I}$, $\bar\varepsilon_i = \dfrac{\varepsilon_i}{\lambda_0}$, $i = 0, 1, \ldots, m$, and $\bar\lambda_i = \dfrac{\lambda_i}{\lambda_0}$, $i \in \bar{I}$. Corresponding to $i \notin \bar{I}$, $\bar\lambda_i = 0$ with $\bar\xi_i = 0 \in \partial_{\bar\varepsilon_i}(\bar\lambda_i g_i)(\bar{x})$, thereby leading to the approximate optimality condition
$$0 \in \partial_{\bar\varepsilon_0}f(\bar{x}) + \sum_{i=1}^m\partial_{\bar\varepsilon_i}(\bar\lambda_i g_i)(\bar{x})$$
The reader is urged to verify that $\Lambda$ is an open convex set. Observe that $(0, 0) \notin \Lambda$. Therefore, by the Separation Theorem, Theorem 2.26 (ii), there
Working along the lines of the proof of Theorem 4.2, it can be proved that $(\lambda_0, \lambda) \in \mathbb{R}_+ \times \mathbb{R}^m_+$.
We claim that $\lambda_0 \ne 0$. On the contrary, suppose that $\lambda_0 = 0$. By the Slater constraint qualification, there exists $\hat{x} \in \mathbb{R}^n$ such that $g_i(\hat{x}) < 0$, $i = 1, 2, \ldots, m$, which implies
$$\sum_{i=1}^m\lambda_i g_i(\hat{x}) < 0,$$
where $\bar\lambda_i = \dfrac{\lambda_i}{\lambda_0}$ for $i = 1, 2, \ldots, m$. In particular, taking $x = \bar{x}$, the above inequality reduces to
$$\varepsilon + \sum_{i=1}^m\bar\lambda_i g_i(\bar{x}) \ge 0. \tag{10.10}$$
As $g_i(\bar{x}) \le 0$, $i = 1, 2, \ldots, m$, this along with (10.9) leads to
$$f(\bar{x}) + \sum_{i=1}^m\bar\lambda_i g_i(\bar{x}) \le f(x) + \sum_{i=1}^m\bar\lambda_i g_i(x) + \varepsilon, \quad \forall\, x \in \mathbb{R}^n,$$
which implies
$$L(\bar{x}, \bar\lambda) \le L(x, \bar\lambda) + \varepsilon, \quad \forall\, x \in \mathbb{R}^n. \tag{10.11}$$
For any $\lambda_i \ge 0$, $i = 1, 2, \ldots, m$, the feasibility of $\bar{x}$ along with the nonnegativity of $\varepsilon$ and (10.10) leads to
$$f(\bar{x}) + \sum_{i=1}^m\lambda_i g_i(\bar{x}) - \varepsilon \le f(\bar{x}) - \varepsilon \le f(\bar{x}) + \sum_{i=1}^m\bar\lambda_i g_i(\bar{x}),$$
that is,
$$L(\bar{x}, \lambda) - \varepsilon \le L(\bar{x}, \bar\lambda), \quad \forall\, \lambda \in \mathbb{R}^m_+.$$
The above inequality along with (10.11) implies that $(\bar{x}, \bar\lambda)$ is an $\varepsilon$-saddle point of (CP), which satisfies (10.10), thereby yielding the desired result.
Using this ε-saddle point result, we establish the approximate optimal-
ity conditions. But unlike Theorems 10.9 and 10.11, the result below is only
necessary with a relaxed ε-complementary slackness condition.
Observe that the conditions obtained in Theorem 10.14 are only neces-
sary and not sufficient. The approach used in Theorems 10.9 and 10.11 for
the sufficiency part cannot be invoked here. But if instead of the relaxed
ε-complementary slackness condition, one has the standard complementary
slackness, which is equivalent to
$$\sum_{i=1}^m\bar\lambda_i g_i(\bar{x}) = 0,$$
then working along the lines of Theorem 10.9 the sufficiency can also be established. The result below shows that the optimality conditions derived in the above theorem lead to a $2\varepsilon$-solution of (CP) instead of an $\varepsilon$-solution.
Summing the above inequalities along with the condition (10.13) leads to
$$f(x) + \sum_{i=1}^m\bar\lambda_i g_i(x) \ge f(\bar{x}) + \sum_{i=1}^m\bar\lambda_i g_i(\bar{x}) - \Big(\varepsilon_0 + \sum_{i=1}^m\varepsilon_i\Big).$$
With respect to the ε-solution, we will call x̄ an ε-minimum solution of L(., λ̄)
and similarly, call λ̄ an ε-maximum solution of L(x̄, .).
We end this section by presenting a result relating the ε-solutions of the
saddle point to the almost ε-solution of (CP ) that was derived by Dutta [37].
Proof. Because $\bar\lambda \in \mathbb{R}^m_+$ is an $\varepsilon_2$-maximum solution of $L(\bar{x}, \lambda)$ over $\mathbb{R}^m_+$, we first show that
$$g_i(\bar{x}) - \varepsilon_2 \le 0, \quad i = 1, 2, \ldots, m.$$
On the contrary, suppose that $(g_1(\bar{x}) - \varepsilon_2,\ g_2(\bar{x}) - \varepsilon_2,\ \ldots,\ g_m(\bar{x}) - \varepsilon_2) \notin -\mathbb{R}^m_+$.
We claim that $\gamma \in \mathbb{R}^m_+$. On the contrary, assume that $\gamma \notin \mathbb{R}^m_+$, which implies that $\gamma_i < 0$ for some $i \in \{1, 2, \ldots, m\}$. As the inequality (10.15) holds for every $y \in \mathbb{R}^m_-$, taking the corresponding $y_i \to -\infty$ leads to a contradiction. Hence, $\gamma \in \mathbb{R}^m_+$.
Because $\gamma \ne 0$, it can be so chosen satisfying $\sum_{i=1}^m\gamma_i = 1$. Therefore, the strict inequality condition in (10.15) reduces to
$$\sum_{i=1}^m\gamma_i g_i(\bar{x}) > \varepsilon_2. \tag{10.16}$$
As $\bar\lambda \in \mathbb{R}^m_+$ and $\gamma \in \mathbb{R}^m_+$, $\bar\lambda + \gamma \in \mathbb{R}^m_+$. Therefore, taking $\lambda = \bar\lambda + \gamma$ in (10.14) leads to
$$\sum_{i=1}^m\gamma_i g_i(\bar{x}) \le \varepsilon_2,$$
which contradicts (10.16). Hence $g_i(\bar{x}) \le \varepsilon_2$ for $i = 1, 2, \ldots, m$. Further, as $\bar{x}$ is an $\varepsilon_1$-minimum solution of $L(\cdot, \bar\lambda)$,
$$f(\bar{x}) + \sum_{i=1}^m\bar\lambda_i g_i(\bar{x}) \le f(x) + \sum_{i=1}^m\bar\lambda_i g_i(x) + \varepsilon_1, \quad \forall\, x \in \mathbb{R}^n,$$
which implies
$$f(\bar{x}) \le f(x) + \varepsilon_1 + \varepsilon_2, \quad \forall\, x \in C.$$
which implies
$$\sum_{i=1}^m\rho_i\max\{0, g_i(x_\rho)\} \le \alpha + \varepsilon. \tag{10.19}$$
Now consider $\rho = (\rho_1, \rho_2, \ldots, \rho_m)$ such that $\rho_i \ge \rho_\varepsilon = \dfrac{\alpha + \varepsilon}{\varepsilon}$ for every $i = 1, 2, \ldots, m$. Therefore, for the $\varepsilon$-solution $x_\rho$ of $(CP)_\rho$, the condition (10.19) leads to
$$g_i(x_\rho) \le \max\{0, g_i(x_\rho)\} \le \sum_{i=1}^m\max\{0, g_i(x_\rho)\} \le \varepsilon, \quad \forall\, i = 1, 2, \ldots, m,$$
f (xρk ) ≤ f (x) + ε, ∀ x ∈ C.
Because f (xρk ) is bounded above for every k, therefore by the given hypothesis
gi (xρk ) ≤ (α + ε)/ρki .
gi (xρ ) ≤ 0, ∀ i = 1, 2, . . . , m.
Proof. Because $\bar{x}$ is a unique point on the line segment joining $x_\rho$ and $\hat{x}$ lying on the boundary, the active index set $I(\bar{x}) = \{i \in \{1, 2, \ldots, m\} : g_i(\bar{x}) = 0\}$ is nonempty. Define a convex auxiliary function as
$$\mathcal{F}(x) = f(x) + \rho_0\sum_{i\in I(\bar{x})}g_i(x).$$
As $\bar{x}$ lies on the line segment joining $x_\rho$ and $\hat{x}$, there exists $\lambda \in (0, 1)$ such that $\bar{x} = \lambda x_\rho + (1 - \lambda)\hat{x}$. Then by the convexity of $g_i$, $i = 1, 2, \ldots, m$,
For $i \in I(\bar{x})$, $g_i(\bar{x}) = 0$, which along with the Slater constraint qualification reduces the above inequality to
$$\sum_{i\in I(\bar{x})}g_i(x_\rho) = \sum_{i\in I(\bar{x})}\max\{0, g_i(x_\rho)\} \le \sum_{i=1}^m\max\{0, g_i(x_\rho)\},$$
To prove the result, it is sufficient to show that $\mathcal{F}(\bar{x}) < \mathcal{F}(x_\rho)$. But first we will show that $\mathcal{F}(\hat{x}) < \mathcal{F}(\bar{x})$. Consider
$$\mathcal{F}(\hat{x}) = f(\hat{x}) + \rho_0\sum_{i\in I(\bar{x})}g_i(\hat{x}).$$
Because $g_i(\hat{x}) < 0$, $i = 1, 2, \ldots, m$, $\sum_{i\in I(\bar{x})}g_i(\hat{x}) \le \max_{i=1,\ldots,m}g_i(\hat{x})$, which by the given hypothesis implies
$$f(\bar{x}) \le f(x) + \varepsilon, \quad \forall\, x \in C,$$
For a better understanding of the above result, let us consider the following example. Consider
$$\inf\ e^x \quad\text{subject to}\quad x \le 0.$$
Also, for any $x \in C$, $f(x) = f_\rho(x)$, which along with the above condition implies
This leads to the fact that every ε-solution of (CP ) is also an ε-solution of the
penalized unconstrained problem. Next we derive the approximate optimality
conditions for (CP ) using the penalized unconstrained problem.
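Before doing so, the claim can be checked on the exponential example above; in the Python sketch below (with $\rho$ and $\varepsilon$ chosen arbitrarily), the penalty term vanishes on $C$, so an $\varepsilon$-solution of (CP) is indeed an $\varepsilon$-solution of the penalized problem.

```python
import math

# inf e**x subject to x <= 0; penalized: f_rho(x) = e**x + rho * max(0, x).
# Both problems have infimum 0.
rho, eps = 10.0, 0.1
f_rho = lambda x: math.exp(x) + rho * max(0.0, x)

x_bar = math.log(eps / 2)                 # an eps-solution of (CP), x_bar <= 0
assert f_rho(x_bar) == math.exp(x_bar)    # the penalty vanishes on C
assert f_rho(x_bar) <= 0.0 + eps          # hence an eps-solution of (CP)_rho
print(x_bar, f_rho(x_bar))
```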
Theorem 10.20 Consider the convex programming problem (CP ) with C de-
fined by (3.1). Assume that the Slater constraint qualification is satisfied. Let
ε ≥ 0. Then x̄ is an ε-solution of (CP ) if and only if there exist ε̄0 ≥ 0, ε̄i ≥ 0
and λ̄i ≥ 0, i = 1, . . . , m, such that
$$0 \in \partial_{\bar\varepsilon_0}f(\bar{x}) + \sum_{i=1}^m\partial_{\bar\varepsilon_i}(\bar\lambda_i g_i)(\bar{x}) \quad\text{and}\quad \sum_{i=0}^m\bar\varepsilon_i - \varepsilon = \sum_{i=1}^m\bar\lambda_i g_i(\bar{x}) \le 0.$$
0 ∈ ∂ε fρ (x̄).
principle and the other two need not be satisfied. In this section we deal
with the quasi ε-solution and derive the approximate optimality conditions
for this class of approximate solutions for (CP ). But before doing so, let
us illustrate by an example that a quasi ε-solution may or may not be an
ε-solution. Consider the problem
$$\inf\ \frac{1}{x} \quad\text{subject to}\quad x > 0.$$
Note that the infimum of the problem is zero, which is not attained. For $\varepsilon = \dfrac{1}{4}$, it is easy to note that $x_\varepsilon = 4$ is an $\varepsilon$-solution. Now $\bar{x} > 0$ is a quasi $\varepsilon$-solution if
$$\frac{1}{\bar{x}} \le \frac{1}{x} + \frac{1}{2}|x - \bar{x}|, \quad \forall\, x > 0.$$
Observe that $\bar{x} = 4.5$ is a quasi $\varepsilon$-solution that is also an $\varepsilon$-solution satisfying all the conditions of the Ekeland variational principle, while $\bar{x} = 3.5$ is a quasi $\varepsilon$-solution that is not an $\varepsilon$-solution. Also, it does not satisfy the condition $\dfrac{1}{\bar{x}} \le \dfrac{1}{x_\varepsilon}$. These are not the only quasi $\varepsilon$-solutions. Even points that satisfy only the unique minimizer condition of the variational principle, like $\bar{x} = 3$, are also quasi $\varepsilon$-solutions of the above problem.
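These claims are easy to check numerically; the grid-based Python sketch below (grid and tolerance are our own choices) tests the quasi $\varepsilon$-solution inequality for $\bar{x} = 4.5$, $3.5$, and $3$, and the $\varepsilon$-solution inequality $1/\bar{x} \le 0 + \varepsilon$.

```python
import numpy as np

# inf 1/x over x > 0, eps = 1/4, so sqrt(eps) = 1/2 in the quasi condition.
eps = 0.25
xs = np.linspace(1e-3, 100.0, 200001)

def quasi(x_bar):
    # quasi eps-solution: 1/x_bar <= 1/x + sqrt(eps)*|x - x_bar| for all x > 0
    return bool(np.all(1.0 / x_bar <= 1.0 / xs + 0.5 * np.abs(xs - x_bar) + 1e-9))

for x_bar in [4.5, 3.5, 3.0]:
    print(x_bar, "quasi:", quasi(x_bar), "eps-solution:", 1.0 / x_bar <= eps)
# 4.5 is both; 3.5 and 3.0 are quasi eps-solutions but not eps-solutions.
```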
Now we move on to discuss the approximate optimality conditions for the
quasi ε-solutions.
As $\operatorname{dom}f = \operatorname{dom}\|\cdot - \bar{x}\| = \mathbb{R}^n$, invoking the Sum Rule, Theorem 2.91, along with the fact that $\partial\|\cdot - \bar{x}\|(\bar{x}) = \mathbb{B}$, the above inclusion becomes
$$0 \in \partial f(\bar{x}) + \sqrt{\varepsilon}\,\mathbb{B} + \sum_{i=1}^m\lambda_i\partial g_i(\bar{x}),$$
Combining the inequalities (10.27), (10.28), and (10.29) along with (10.26) implies
$$f(x) - f(\bar{x}) + \sum_{i=1}^m\lambda_i g_i(x) - \sum_{i=1}^m\lambda_i g_i(\bar{x}) + \sqrt{\varepsilon}\,\|b\|\,\|x - \bar{x}\| \ge 0, \quad \forall\, x \in \mathbb{R}^n.$$
As $\bar{x}$ is an $\varepsilon$-saddle point,
$$L(\bar{x}, \lambda) \le L(x, \lambda) + \varepsilon, \quad \forall\, x \in \mathbb{R}^n,$$
which implies $\bar{x}$ is an $\varepsilon$-solution of $L(\cdot, \lambda)$ over $\mathbb{R}^n$. Applying Ekeland's variational principle, Theorem 2.113, for $\sqrt{\varepsilon}$, there exists $\tilde{x} \in \mathbb{R}^n$ satisfying $\|\tilde{x} - \bar{x}\| \le \sqrt{\varepsilon}$ such that $\tilde{x}$ is a minimizer of the problem
$$\min\ L(x, \lambda) + \sqrt{\varepsilon}\,\|x - \tilde{x}\| \quad\text{subject to}\quad x \in \mathbb{R}^n,$$
which implies there exist $\tilde\xi_0 \in \partial f(\tilde{x})$, $\tilde\xi_i \in \partial g_i(\tilde{x})$, $i = 1, 2, \ldots, m$, and $b \in \mathbb{B}$ such that
$$0 = \tilde\xi_0 + \sum_{i=1}^m\lambda_i\tilde\xi_i + \sqrt{\varepsilon}\,b,$$
thereby leading to
$$\Big\|\tilde\xi_0 + \sum_{i=1}^m\lambda_i\tilde\xi_i\Big\| \le \sqrt{\varepsilon},$$
which along with the condition (10.30) implies that $\bar{x}$ is a modified $\varepsilon$-KKT point, as desired.
In the case of the exact penalization approach, from Theorem 10.16 we have that every convergent sequence of ε-solutions of a sequence of penalized problems converges to an ε-solution of (CP). Is it possible to establish such a result by studying a sequence of modified ε-KKT points? The answer is yes, as shown in the following theorem.
which yields
$$\sum_{i=1}^m\gamma_i g_i(x) \ge \sum_{i=1}^m\gamma_i g_i(\bar{x}) \ge 0, \quad \forall\, x \in \mathbb{R}^n,$$
Denote the duality gap by $\theta = \inf_{x\in C}f(x) - \sup_{\lambda\in\mathbb{R}^m_+}w(\lambda)$. Next we present the theorem relating the ε-solution of $(CP)_\rho$ with the almost ε-solution of (CP) under the assumption of the ε-maximum solution of (DP). Recall the penalized problem
$$\min\ f_\rho(x) \quad\text{subject to}\quad x \in \mathbb{R}^n, \qquad (CP)_\rho$$
where $f_\rho(x) = f(x) + \sum_{i=1}^m\rho_i\max\{0, g_i(x)\}$ and $\rho = (\rho_1, \rho_2, \ldots, \rho_m)$ with $\rho_i > 0$, $i = 1, 2, \ldots, m$.
Theorem 10.26 Consider the convex programming problem (CP) with C given by (3.1) and its associated dual problem (DP). Then for ρ satisfying
$$\rho \ge 3 + \max_{i=1,\ldots,m}\bar\lambda_i + \frac{\theta}{\varepsilon},$$
where $\bar\lambda = (\bar\lambda_1, \bar\lambda_2, \ldots, \bar\lambda_m)$ is an ε-maximum solution of (DP), every $\bar{x}$ that is an ε-solution of $(CP)_\rho$ is also an almost ε-solution of (CP).
Proof. Consider an ε-solution $\hat{x} \in C$ of (CP), that is,
$$f(\hat{x}) \le \inf_{x\in C}f(x) + \varepsilon.$$
that is,
$$\rho\sum_{i=1}^m\max\{0, g_i(\bar{x})\} \le \sum_{i=1}^m\bar\lambda_i g_i(\bar{x}) + 3\varepsilon + \theta.$$
Define the index set $I^> = \{i \in \{1, 2, \ldots, m\} : g_i(\bar{x}) > 0\}$. Thus
$$\rho\sum_{i\in I^>}g_i(\bar{x}) = \rho\sum_{i=1}^m\max\{0, g_i(\bar{x})\} \le \sum_{i=1}^m\bar\lambda_i g_i(\bar{x}) + 3\varepsilon + \theta \le \sum_{i\in I^>}\bar\lambda_i g_i(\bar{x}) + 3\varepsilon + \theta,$$
which implies
$$\Big(\rho - \max_{i=1,\ldots,m}\bar\lambda_i\Big)\sum_{i\in I^>}g_i(\bar{x}) \le \sum_{i\in I^>}(\rho - \bar\lambda_i)g_i(\bar{x}) \le 3\varepsilon + \theta.$$
As
$$\rho \ge \max_{i=1,\ldots,m}\bar\lambda_i,$$
11.1 Introduction
In all the preceding chapters we considered the convex programming problem
(CP ) with the feasible set C of the form (3.1), that is,
C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},
$$C_I = \{x \in \mathbb{R}^n : g_i(x) \le 0,\ i \in I\}.$$
Observe that in the Slater constraint qualification for (CP ), only condition
(iii) is considered. Here the additional conditions (i) and (ii) ensure that the
supremum is attained over I, which holds trivially in the finite index set
scenario. We now present the KKT optimality condition for (SIP ).
Theorem 11.2 Consider the convex semi-infinite programming problem (SIP). Assume that the Slater constraint qualification for (SIP) holds. Then $\bar{x} \in \mathbb{R}^n$ is a point of minimizer of (SIP) if and only if there exists $\lambda \in \mathbb{R}^{[I(\bar{x})]}_+$ such that
$$0 \in \partial f(\bar{x}) + \sum_{i\in\operatorname{supp}\lambda}\lambda_i\partial g(\bar{x}, i),$$
where $I(\bar{x}) = \{i \in I : g(\bar{x}, i) = 0\}$ denotes the active index set and the subdifferential $\partial g(\bar{x}, i)$ is with respect to $x$.
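A minimal illustration of the theorem on an (SIP) instance of our own choosing: $f(x) = -x$ and $g(x, i) = x - i$ for $i \in I = [1, 2]$, so that $C_I = (-\infty, 1]$, $\bar{x} = 1$, $I(\bar{x}) = \{1\}$, and the multiplier is supported on the single active index.

```python
import numpy as np

I = np.linspace(1.0, 2.0, 1001)     # compact index set [1, 2]
g = lambda x, i: x - i              # g(., i) convex, jointly continuous

x_bar = 1.0                          # minimizer of f(x) = -x over C_I
active = I[np.abs(g(x_bar, I)) < 1e-12]
print(active)                        # [1.]: the active index set I(x_bar)

lam = 1.0                            # multiplier on the active index i = 1
print(-1.0 + lam * 1.0)              # df(x_bar) + lam * dg(x_bar, 1) = 0
```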
where $\lambda \in \mathbb{R}^{[I(\bar{x})]}_+$. By Definition 2.77 of the subdifferential, for every $x \in \mathbb{R}^n$,
The above inequality along with the fact that $g(\bar{x}, i) = 0$, $i \in I(\bar{x})$, leads to
$$f(x) + \sum_{i\in\operatorname{supp}\lambda}\lambda_i g(x, i) \ge f(\bar{x}), \quad \forall\, x \in \mathbb{R}^n.$$
$$f(x) \ge f(\bar{x}), \quad \forall\, x \in C_I,$$
result by Klee [72]. But this approach was a bit difficult to follow. So Bor-
wein [16] provided a self-contained proof of the reduction approach involving
quasiconvex functions. Here we present the same under the assumptions that
g(., i) is convex for every i ∈ I and g(., .) is jointly continuous as a function
of (x, i) ∈ Rn × I. In the proof, one only needs g(x, i) to be jointly usc as a
function of (x, i) ∈ Rn × I along with the convexity assumption.
$$C^r(i) = \Big\{x \in C \cap r\,\operatorname{cl}\mathbb{B} : g(y, i) < 0,\ \forall\, y \in x + \frac{1}{r}\mathbb{B}\Big\}.$$
Observe that $C^r(i) \subset r\,\operatorname{cl}\mathbb{B}$ and hence is bounded.
We claim that $C^r(i)$ is convex. Consider $x_1, x_2 \in C^r(i)$, which implies that $x_j \in C \cap r\,\operatorname{cl}\mathbb{B}$, $j = 1, 2$. Because $C$ and $\operatorname{cl}\mathbb{B}$ are convex sets, $C \cap r\,\operatorname{cl}\mathbb{B}$ is also convex. Thus,
For any $y_j \in x_j + \frac{1}{r}\mathbb{B}$, $j = 1, 2$,
$$y = (1 - \lambda)y_1 + \lambda y_2 \in (1 - \lambda)\Big(x_1 + \frac{1}{r}\mathbb{B}\Big) + \lambda\Big(x_2 + \frac{1}{r}\mathbb{B}\Big) \subset (1 - \lambda)x_1 + \lambda x_2 + \frac{1}{r}\mathbb{B}.$$
As $x_1, x_2 \in C^r(i)$, for $j = 1, 2$,
$$g(y_j, i) < 0, \quad \forall\, y_j \in x_j + \frac{1}{r}\mathbb{B}.$$
By the convexity of $g(\cdot, i)$, for any $\lambda \in [0, 1]$,
$$g(y, i) \le (1 - \lambda)g(y_1, i) + \lambda g(y_2, i) < 0.$$
Because the above conditions hold for arbitrary $y_j \in x_j + \frac{1}{r}\mathbb{B}$, $j = 1, 2$,
$$g(y, i) < 0, \quad \forall\, y \in (1 - \lambda)x_1 + \lambda x_2 + \frac{1}{r}\mathbb{B}.$$
Therefore, from the definition of $C^r(i)$, it is obvious that
$$(1 - \lambda)x_1 + \lambda x_2 \in C^r(i), \quad \forall\, \lambda \in [0, 1].$$
Because $x_1, x_2 \in C^r(i)$ are arbitrary, $C^r(i)$ is a convex set.
Next we prove that $C^r(i)$ is closed. Suppose that $\bar{x} \in \operatorname{cl}C^r(i)$, which implies there exists a sequence $\{x_k\} \subset C^r(i)$ with $x_k \to \bar{x}$. Because $x_k \in C^r(i)$, $x_k \in C \cap r\,\operatorname{cl}\mathbb{B}$ such that
$$g(y, i) < 0, \quad \forall\, y \in x_k + \frac{1}{r}\mathbb{B}. \tag{11.3}$$
Because $C$ and $\operatorname{cl}\mathbb{B}$ are closed sets, $C \cap r\,\operatorname{cl}\mathbb{B}$ is also closed and thus, $\bar{x} \in C \cap r\,\operatorname{cl}\mathbb{B}$. Now if $\bar{x} \notin C^r(i)$, there exists some $\bar{y} \in \bar{x} + \frac{1}{r}\mathbb{B}$ such that $g(\bar{y}, i) \ge 0$. As $x_k \to \bar{x}$, for sufficiently large $k$, $\bar{y} \in x_k + \frac{1}{r}\mathbb{B}$ with $g(\bar{y}, i) \ge 0$, which is a contradiction to condition (11.3). Thus $C^r(i)$ is a closed set.
Finally, we claim that for some $\bar{r} \in \mathbb{N}$ and every set of $n + 1$ points $\{i_0, i_1, \ldots, i_n\} \subset I$,
$$\bigcap_{j=0}^n C^{\bar{r}}(i_j) \ne \emptyset.$$
$$\|\bar{x}\| \le \bar{r}, \quad \varepsilon > \frac{1}{\bar{r}} \quad\text{and}\quad i_j^{\bar{r}} \in N(\bar{i}_j),\ j = 0, 1, \ldots, n. \tag{11.6}$$
Combining (11.5) and (11.6), $\bar{x} \in C \cap \bar{r}\,\operatorname{cl}\mathbb{B}$ such that
$$g(y, i_j^{\bar{r}}) < 0, \quad \forall\, y \in \bar{x} + \frac{1}{\bar{r}}\mathbb{B}.$$
Therefore, $\bar{x} \in C^{\bar{r}}(i_j^{\bar{r}})$ for every $j = 0, 1, \ldots, n$, which contradicts our assumption (11.4). Thus, for some $\bar{r} \in \mathbb{N}$ and every set of $n + 1$ points $\{i_0, i_1, \ldots, i_n\} \subset I$,
$$\bigcap_{j=0}^n C^{\bar{r}}(i_j) \ne \emptyset.$$
From the above condition, there exists $\tilde{x} \in C^r(i)$ for every $i \in I$, which implies $\tilde{x} \in C$ such that
$$g(y, i) < 0, \quad \forall\, y \in \tilde{x} + \frac{1}{r}\mathbb{B},\ i \in I.$$
Taking $U = \mathbb{R}^n$ and defining $\varepsilon = \frac{1}{r}$ for $r \in \mathbb{N}$, the above condition yields (i).
To complete the proof, we finally have to show that (ii)(b) also implies (i). This can be done by expressing (ii)(b) in the form of (ii)(a). Consider a point $i' \notin I$ and define $I' = \{i'\} \cup I$, which is again a compact set. Also define the function $g'$ on $\mathbb{R}^n \times I'$ as
$$g'(x, i') = \begin{cases} -\delta, & x \in U,\\ +\infty, & x \notin U,\end{cases} \qquad g'(x, i) = g(x, i),\ i \in I,$$
where $\delta > 0$. Observe that $g'(\cdot, i)$, $i \in I'$, satisfies the convexity assumption and is jointly usc on $\mathbb{R}^n \times I'$. Therefore, (ii)(b) is equivalent to the existence of $x \in C$ for every $n$ points $\{i_1, i_2, \ldots, i_n\} \subset I$,
As (ii)(a) is also satisfied, for every $n + 1$ points $\{i_0, i_1, \ldots, i_n\} \subset I$ there exists $x \in C$ such that
Combining the conditions (11.7) and (11.8), (ii)(b) implies that for every $n + 1$ points $\{i_0, i_1, \ldots, i_n\} \subset I'$ there exists $x \in C$ such that
which is of the form (ii)(a). As we have already seen that (ii)(a) implies (i) with $U = \mathbb{R}^n$, there exist $x \in C$ and $\varepsilon > 0$ such that
that is,
Observe that the Slater constraint qualification for (SIP) also implies the relaxed Slater constraint qualification for (SIP). Now we present the KKT optimality condition for (SIP) by reducing it to the equivalent $(\widetilde{SIP})$.
f ((1 − λ)x1 + λx2 ) ≤ (1 − λ)f (x1 ) + λf (x2 ) < f (x̄), ∀ λ ∈ [0, 1],
which implies that (i) of Proposition 11.3 does not hold. Therefore either
(ii)(a) or (ii)(b) is not satisfied. As the relaxed Slater constraint qualification
for (SIP ), which is the same as (ii)(a), holds, (ii)(b) cannot be satisfied. Thus,
there exist n points {i1 , i2 , . . . , in } ⊂ I such that
g(x, ij ) ≤ 0, j = 1, 2, . . . , n. (11.10)
$$f(\tilde{x}) \ge f(\bar{x}).$$
Because $\tilde{x}$ is an arbitrary feasible point of $(\widetilde{SIP})$, $\bar{x}$ is a point of minimizer of $(\widetilde{SIP})$. Observe that by (11.11), the Slater constraint qualification is satisfied by the reduced problem. Therefore, invoking Theorem 3.7, there exist $\lambda_{i_j} \ge 0$, $j = 1, 2, \ldots, n$, such that
$$0 \in \partial f(\bar{x}) + \sum_{j=1}^n\lambda_{i_j}\partial g(\bar{x}, i_j) \quad\text{and}\quad \lambda_{i_j}g(\bar{x}, i_j) = 0,\ j = 1, 2, \ldots, n. \tag{11.13}$$
We claim that $\lambda \in \mathbb{R}^{[I(\bar{x})]}_+$. From the complementary slackness condition in the optimality condition (11.13), if $i_j \notin I(\bar{x})$, $\lambda_{i_j} = 0$, whereas for $i_j \in I(\bar{x})$, $\lambda_{i_j} \ge 0$. For $i \notin \{i_1, i_2, \ldots, i_n\}$ but $i \in I(\bar{x})$, define $\lambda_i = 0$. Therefore, $\lambda \in \mathbb{R}^{[I(\bar{x})]}_+$ such that
$$0 \in \partial f(\bar{x}) + \sum_{i\in\operatorname{supp}\lambda}\lambda_i\partial g(\bar{x}, i),$$
thereby yielding the KKT optimality condition for (SIP). The converse can be worked out along the lines of Theorem 11.2.
and
$$\hat{S}(\bar{x}) = \operatorname{cone\,co}\bar{S}(\bar{x}) = \Big\{\sum_{i\in\operatorname{supp}\lambda}\lambda_i\partial g(\bar{x}, i) \subset \mathbb{R}^n : \lambda \in \mathbb{R}^{[I(\bar{x})]}_+\Big\}.$$
In particular, for $\{x_k\} \subset C_I$, that is, $g(x_k, i) \le 0$, along with the fact that $g(\bar{x}, i) = 0$, $i \in I(\bar{x})$, the above inequality reduces to
$$\langle\xi_i, x_k - \bar{x}\rangle \le 0, \quad \forall\, k \in \mathbb{N},$$
where $z \in \operatorname{cl\,cone}(C_I - \bar{x})$. By Theorem 2.35, $z \in T_{C_I}(\bar{x})$. Because $i \in I(\bar{x})$ and $\xi_i \in \bar{S}(\bar{x})$ were arbitrary,
$$\langle\xi, z\rangle \le 0, \quad \forall\, \xi \in \bar{S}(\bar{x}),$$
By Proposition 3.9,
$$(S(\bar{x}))^\circ = \operatorname{cl}\hat{S}(\bar{x}),$$
which by Proposition 2.31 (ii) and (iii) along with the fact that $S(\bar{x})$ is a closed convex cone implies that
$$(\hat{S}(\bar{x}))^\circ = (\bar{S}(\bar{x}))^\circ = S(\bar{x}),$$
where
$$\bar{S}(\bar{x}) = \{\partial g_i(\bar{x}) : i = 1, 2, \ldots, m\} = \bigcup_{i=1}^m\partial g_i(\bar{x}).$$
$$(\bar{S}(\bar{x}))^\circ \subset T_C(\bar{x}).$$
$$-\xi \in N_{C_I}(\bar{x}).$$
$$0 \in \partial f(\bar{x}),$$
as desired. The converse can be worked out as in Theorem 11.2, thereby establishing the requisite result.
Goberna and López [50] consider the feasible direction cone to $F \subset \mathbb{R}^n$ at $\bar{x} \in F$, $D_F(\bar{x})$, defined as
and hence, the tangent cone to $F$ at $\bar{x}$ is related to the feasible direction set as
$$T_F(\bar{x}) = \operatorname{cl}D_F(\bar{x}).$$
For the convex semi-infinite programming problem (SIP), the feasible set is $C_I$. In particular, taking $F = C_I$ in the above condition yields
From (11.14) we have that $T_{C_I}(\bar{x}) \subset (\bar{S}(\bar{x}))^\circ$; thus the above condition yields
Combining the above condition with (11.20), which along with Proposition 2.31 and the fact that $\hat{S}(\bar{x}) = \operatorname{cone\,co}\bar{S}(\bar{x})$ implies that
$$\operatorname{cl}\hat{S}(\bar{x}) = (\bar{S}(\bar{x}))^{\circ\circ} = (D_{C_I}(\bar{x}))^\circ.$$
By the closedness condition of $\hat{S}(\bar{x})$ at the Lagrangian regular point $\bar{x}$, the preceding condition reduces to
$$\hat{S}(\bar{x}) = (D_{C_I}(\bar{x}))^\circ,$$
which implies
$$\hat{S}(\bar{x}) = \operatorname{cl\,cone\,co}\bar{S}(\bar{x}) = (\bar{S}(\bar{x}))^{\circ\circ} = \{0\}.$$
Thus, $\hat{S}(\bar{x}) = N_{C_I}(\bar{x})$ for every $\bar{x} \in \operatorname{ri}C_I$. Therefore one needs to impose the Lagrangian regular point condition on boundary points only. This fact was mentioned in Goberna and López [50] and was proved in Fajardo and López [44].
Recall that in Chapter 3, under the Slater constraint qualification, Proposition 3.3 leads to $N_C(\bar{x}) = \hat{S}(\bar{x})$, which by Propositions 2.31 and 2.37 is equivalent to
$$T_C(\bar{x}) = (\hat{S}(\bar{x}))^\circ = (\operatorname{cl}\hat{S}(\bar{x}))^\circ = S(\bar{x}).$$
Also, under the Slater constraint qualification, by Lemma 3.5, $\hat{S}(\bar{x})$ is closed. Hence the Slater constraint qualification leads to the Abadie constraint qualification along with the closedness criteria. A similar result also holds for (SIP). But before that we present Gordan's Theorem of the Alternative, which plays an important role in establishing the result.
Proof. If $x_i = 0$ for some $i \in I$, then the result holds trivially, as system (I) is not satisfied while system (II) holds. So, without loss of generality, assume that $x_i \ne 0$ for every $i \in I$.
Suppose (I) does not hold. Let $0 \notin \operatorname{co}\{x_i : i \in I\}$. As by hypothesis $\operatorname{co}\{x_i : i \in I\}$ is closed, by the Strict Separation Theorem, Theorem 2.26 (iii), there exists $a \in \mathbb{R}^n$ with $a \ne 0$ such that
$$\langle a, x_i\rangle < 0, \quad \forall\, i \in I,$$
which is a contradiction. Thus, system (I) does not hold, thereby completing the proof.
The hypothesis that $\operatorname{co}\{x_i : i \in I\}$ is a closed set is required, as shown in the example from López and Vercher [75]. Consider $x_i = (\cos i, \sin i)$ and $I = \Big[-\dfrac{\pi}{2}, \dfrac{\pi}{2}\Big]$. Observe that $(0, 0) \notin \operatorname{co}\{x_i : i \in I\}$ as
$$0 \in \operatorname{co}\{\cos i : i \in I\} \quad\text{and}\quad 0 \in \operatorname{co}\{\sin i : i \in I\}$$
and $(0, 1) \notin \operatorname{co}\{x_i : i \in I\}$.
Now we present the result from López and Vercher [75] showing that
the Slater constraint qualification for (SIP ) implies that every feasible point
x ∈ CI is a Lagrangian regular point.
g(x̄, i) ≤ 0, i ∈ I.
(ii) I(x̄) is nonempty: We claim that I(x̄) is compact. By condition (i) of the
Slater constraint qualification for (SIP ), I is compact. Because I(x̄) ⊂ I, I(x̄)
is bounded. Now consider {ik } ⊂ I(x̄) such that ik → i. By the compactness
of I and the fact that I(x̄) ⊂ I, i ∈ I. As ik ∈ I(x̄),
g(x̄, ik ) = 0, ∀ k ∈ N,
which by condition (ii) of the Slater constraint qualification for (SIP ), that
is, the continuity of g(x, i) with respect to (x, i) in Rn × I, implies that as the
limit k → +∞, g(x̄, i) = 0 and thus i ∈ I(x̄). Therefore, I(x̄) is closed, which
along with the boundedness implies that $I(\bar{x})$ is compact.
Next we will show that $\bar{S}(\bar{x}) = \bigcup_{i\in I(\bar{x})}\partial g(\bar{x}, i)$ is compact. Suppose that
{ξk } ⊂ S̄(x̄) with ξk → ξ. As ξk ∈ S̄(x̄), there exists ik ∈ I(x̄) such that
ξk ∈ ∂g(x̄, ik ), that is, by Definition 2.77 of the subdifferential,
that is, ξ ∈ ∂g(x̄, i) with i ∈ I(x̄). Therefore ξ ∈ S̄(x̄), thereby implying that
S̄(x̄) is closed. As dom g(., i) = Rn , i ∈ I(x̄), by Proposition 2.83, ∂g(x̄, i) is
compact for i ∈ I(x̄), which implies for every ξi ∈ ∂g(x̄, i) there exists Mi > 0
such that
kξi k ≤ Mi , ∀ i ∈ I(x̄).
Because I(x̄) is compact, the supremum of Mi over I(x̄) is attained, that is,
which leads to
Thus,
Note that the notion of Lagrangian regular point is also known as the convex locally Farkas–Minkowski problem. In this section, we will discuss the global qualification condition, namely the Farkas–Minkowski qualification studied in Goberna, López, and Pastor [51] and López and Vercher [75]. Before introducing this qualification condition, let us briefly discuss the concept of a Farkas–Minkowski system for a linear semi-infinite system from Goberna and López [50].
Consider a linear semi-infinite system
$$\Theta = \{\langle x_i, x\rangle \ge c_i,\ i \in I\}. \qquad (LSIS)$$
The relation $\langle\tilde{x}, x\rangle \ge \tilde{c}$ is a consequence relation of the system $\Theta$ if every solution of $\Theta$ satisfies the relation. A consistent (LSIS) $\Theta$ is said to be a Farkas–Minkowski system, in short an FM system, if every consequence relation is a consequence of some finite subsystem. Before we state the Farkas–Minkowski qualification for the convex semi-infinite programming problem (SIP), we present some results on the consequence relation and the FM system from Goberna and López [50].
Consider $i' \notin I$ and define $(x_{i'}, c_{i'}) = (0, -1)$ and $I' = \{i'\} \cup I$. Thus
$$K = \operatorname{cone\,co}\{(x_i, c_i),\ i \in I'\}.$$
Suppose that $(\tilde{x}, \tilde{c}) \in \operatorname{cl}K$, which implies there exist $\{\lambda^k\} \subset \mathbb{R}^{[I']}_+$, $\{s_k\} \subset \mathbb{N}$, $\{x_{i_j^k}\} \subset \mathbb{R}^n$ and $\{c_{i_j^k}\} \subset \mathbb{R}$ satisfying $i_j^k \in I'$ for $j = 1, 2, \ldots, s_k$, such that
$$(\tilde{x}, \tilde{c}) = \lim_{k\to\infty}\sum_{j=1}^{s_k}\lambda_j^k(x_{i_j^k}, c_{i_j^k}). \tag{11.22}$$
If $\bar{x}$ is a solution of (LSIS) $\Theta$,
Taking the limit as $k \to +\infty$ in the above condition along with (11.23) yields
On the contrary, suppose that there exists $(\bar{x}, \bar{c}) \in \operatorname{cl}K$ such that
Because $\operatorname{cl}K$ is a cone, $\lambda(\bar{x}, \bar{c}) \in \operatorname{cl}K$ for $\lambda > 0$. Therefore, the above inequality along with the condition (11.24) implies that
$$-\lambda\gamma_{n+1} \ge 0, \quad \forall\, \lambda > 0,$$
which implies that $\gamma_{n+1} \le 0$. We now consider the following two cases.
(i) $\gamma_{n+1} = 0$: The condition (11.26) reduces to
$$\langle\bar{x}, x_i\rangle \ge c_i, \quad \forall\, i \in I. \tag{11.28}$$
Therefore, from the inequalities (11.27) and (11.28), for any $\lambda > 0$,
$$\langle\bar{x} + \lambda\gamma, x_i\rangle \ge c_i, \quad \forall\, i \in I,$$
that is,
which by the Carathéodory Theorem, Theorem 2.8, implies that there exist $\lambda_j \ge 0$ and $i_j \in I'$, $j = 1, \ldots, s$, with $1 \le s \le n + 2$ such that
$$(\tilde{x}, \tilde{c}) = \sum_{j=1}^s\lambda_j(x_{i_j}, c_{i_j}).$$
$$\langle x_{i_j}, x\rangle \ge c_{i_j}, \quad j = 1, 2, \ldots, s.$$
$$\langle x_i, x\rangle \ge c_i, \quad i \in I$$
$$\langle x_i, x\rangle \ge c_i, \quad i = 1, 2, \ldots, s,$$
is an FM system.
Proof. Define the solution set of (LSIS) $\Theta$ by $\tilde{C}$, that is,
$$\tilde{C} = \{x \in \mathbb{R}^n : g(y, i) + \langle\xi, x - y\rangle \le 0,\ \forall\,(y, i) \in \mathbb{R}^n \times I,\ \forall\,\xi \in \partial g(y, i)\}.$$
$$g(\tilde{x}, i) \le 0, \quad i \in I,$$
which implies that there exists $\xi \in \partial f(\bar{x})$ such that $-\xi \in N_{C_I}(\bar{x})$. By Definition 2.36 of the normal cone,
$$\langle\xi, x - \bar{x}\rangle \ge 0, \quad \forall\, x \in C_I,$$
As $\xi_j \in \partial g(\bar{x}, i_j)$, $j = 1, 2, \ldots, s$,
that is, $\sum_{j=1}^s\lambda_j g(\bar{x}, i_j) = 0$. Thus,
$$\lambda_j g(\bar{x}, i_j) = 0, \quad \forall\, j = 1, 2, \ldots, s,$$
where $\lambda \in \mathbb{R}^{[I(\bar{x})]}_+$. By the optimality condition for the unconstrained problem, Theorem 2.89,
$$0 \in \partial\Big(f + \sum_{i\in\operatorname{supp}\lambda}\lambda_i g(\cdot, i)\Big)(\bar{x}).$$
As $\operatorname{dom}f = \operatorname{dom}g(\cdot, i) = \mathbb{R}^n$ for $i \in I(\bar{x})$, by the Sum Rule, Theorem 2.91,
$$0 \in \partial f(\bar{x}) + \sum_{i\in\operatorname{supp}\lambda}\lambda_i\partial g(\bar{x}, i),$$
thereby establishing the KKT optimality conditions for (SIP). The converse can be worked out along the lines of Theorem 11.2.
Another notion that implies that (LSIS) $\Theta$ is an FM system is that of a canonically closed system. Below we define this concept and present a result relating a canonically closed system to an FM system from Goberna, López, and Pastor [51].
Proof. (i) Suppose that $\operatorname{cone\,co}\{(x_i, c_i),\ i \in I;\ (0, -1)\}$ is closed. Then by Proposition 11.9, $\langle\tilde{x}, x\rangle \ge \tilde{c}$ is a consequence relation of (LSIS) $\Theta$ if and only if
Suppose that $\tilde{K}$ is closed. We claim that $K$ is closed, which by (i) will then imply that $\Theta$ is an FM system. Consider a bounded sequence $\{(\tilde{x}_k, \tilde{c}_k)\} \subset K$ such that $(\tilde{x}_k, \tilde{c}_k) \to (\tilde{x}, \tilde{c})$. Note that $(\tilde{x}_k, \tilde{c}_k)$ for $k \in \mathbb{N}$ can be expressed as
$$0 = \langle 0, x\rangle \ge 1$$
is closed. On the contrary, assume that it is not closed, which implies there exists a convergent sequence $\{(\tilde{x}_k, \tilde{c}_k)\} \subset \tilde{K}$ such that $(\tilde{x}_k, \tilde{c}_k) \to (\tilde{x}, \tilde{c})$ with $(\tilde{x}, \tilde{c}) \in \operatorname{cl}\tilde{K}$ but $(\tilde{x}, \tilde{c}) \notin \tilde{K}$. Because $(\tilde{x}_k, \tilde{c}_k) \in \tilde{K}$, there exist $\lambda_{i_j}^k \ge 0$, $i_j^k \in I$
As (LSIS) $\Theta$ is canonically closed, there exists $\hat{x} \in \mathbb{R}^n$ such that $\langle x_i, \hat{x}\rangle > c_i$, $i \in I$, that is,
$$\langle(x_i, c_i), (\hat{x}, -1)\rangle = \langle x_i, \hat{x}\rangle - c_i > 0, \quad \forall\, i \in I. \tag{11.38}$$
Combining the relations (11.37) and (11.38) along with the fact that $\lambda_{i_j} \ge 0$, $j = 1, 2, \ldots, n + 2$, not all simultaneously zero, implies that
$$0 = \sum_{j=1}^{n+2}\lambda_{i_j}\langle(x_{i_j}, c_{i_j}), (\hat{x}, -1)\rangle = \sum_{j=1}^{n+2}\lambda_{i_j}(\langle x_{i_j}, \hat{x}\rangle - c_{i_j}) > 0,$$
$$\Theta = \{\langle x_i, x\rangle \le c_i,\ i \in I\}$$
such that
(i) every point of $F$ is a solution of the system $\Theta$,
(ii) there exists $\hat{x} \in F$ such that $\langle x_i, \hat{x}\rangle < c_i$, $i \in I$,
(iii) given any $y \in F^b$, there exists some $i \in I$ such that $\langle x_i, y\rangle = c_i$.
Then $F$ is the solution set of $\Theta$, that is,
$$F = \{x \in \mathbb{R}^n : \langle x_i, x\rangle \le c_i,\ i \in I\}.$$
Suppose that
$$z \in \{x \in \mathbb{R}^n : \langle x_i, x\rangle \le c_i,\ i \in I\}$$
and $z \notin F$. By (ii), there exists $\hat{x} \in F$ such that $\langle x_i, \hat{x}\rangle < c_i$ for every $i \in I$. As $F$ is a closed convex set, the line segment joining $\hat{x}$ and $z$ meets the boundary $F^b$ at only one point, say $y \in (\hat{x}, z)$. Therefore, there exists $\lambda \in (0, 1)$ such that $y = (1 - \lambda)\hat{x} + \lambda z \in F^b$. By condition (iii), there exists $\bar{i} \in I$ such that
respectively. Thus
is compact.
As $C_I$ is bounded and $C_I^b \subset C_I$, $C_I^b$ is bounded. Also, by conditions (i) and (ii) of the Slater constraint qualification for (SIP), the supremum is attained over $I$. Therefore, as $\operatorname{dom}g(\cdot, i) = \mathbb{R}^n$, $i \in I$, we have $\operatorname{dom}g = \mathbb{R}^n$, and hence by Theorem 2.69 $g$ is continuous over $\mathbb{R}^n$. Thus, for a sequence $\{y_k\} \subset C_I^b$ with $y_k \to \bar{y}$, $g(y_k) \to g(\bar{y})$. Also, as $g(y_k) = 0$ for every $k \in \mathbb{N}$, $g(\bar{y}) = 0$, which implies $\bar{y} \in C_I^b$. Hence, $C_I^b$ is closed, thereby yielding the compactness of $C_I^b$. By Proposition 2.85,
$$\partial g(C_I^b) = \{\xi \in \mathbb{R}^n : \xi \in \partial g(y),\ y \in C_I^b\} = \bigcup_{y\in C_I^b}\partial g(y)$$
is compact.
Now consider a convergent sequence $\{(\xi_k, \langle\xi_k, y_k\rangle)\} \subset \tilde{K}$ where $\{y_k\} \subset C_I^b$ and $\xi_k \in \partial g(y_k) \subset \partial g(C_I^b)$. Suppose that $(\xi_k, \langle\xi_k, y_k\rangle) \to (\tilde\xi, \tilde\gamma)$, that is, $\xi_k \to \tilde\xi$ and $\langle\xi_k, y_k\rangle \to \tilde\gamma$. As $\xi_k \to \tilde\xi$, the compactness of $\partial g(C_I^b)$ implies that $\tilde\xi \in \partial g(C_I^b)$. Because $\{y_k\} \subset C_I^b$, $\{y_k\}$ is a bounded sequence. By the Bolzano–Weierstrass Theorem, it has a convergent subsequence.
is equivalent to $\tilde\Theta$, that is, both $\Theta$ and $\tilde\Theta$ have the same solution set. To establish this claim, we will prove that $C_I$ is the solution set of $\Theta$.
For any $(y, i) \in \mathbb{R}^n \times I$ and $\xi_i \in \partial g(y, i)$, by Definition 2.77 of the subdifferential,
$$\langle\xi_i, x - y\rangle \le g(x, i) - g(y, i), \quad \forall\, x \in \mathbb{R}^n. \tag{11.43}$$
In particular, taking $x \in C_I$, that is,
$$g(x, i) \le 0, \quad \forall\, i \in I,$$
$$g(\hat{x}, i) < 0, \quad \forall\, i \in I.$$
$$F = \{x \in \mathbb{R}^n : \varphi(x) \le 0\}.$$
We claim that $\operatorname{cl\,cone\,co\,epi}\varphi^* \subset \operatorname{epi}\delta_F^*$. By Definition 2.101 of the conjugate function, $\delta_F^*$ is the same as the support function to the set $F$, that is, $\delta_F^* = \sigma_F$. By Proposition 2.102, $\delta_F^*$ is lsc, hence by Theorem 1.9, $\operatorname{epi}\delta_F^*$ is closed. Also, as $\sigma_F$ is a sublinear function, by Theorem 2.59 $\operatorname{epi}\sigma_F$ is a convex cone. So it is sufficient to establish that $\operatorname{epi}\varphi^* \subset \operatorname{epi}\delta_F^*$. Consider any $(\xi, \alpha) \in \operatorname{epi}\varphi^*$, which by condition (11.45) implies that
Therefore, $(\xi, \alpha) \in \operatorname{epi}\delta_F^*$. As $(\xi, \alpha) \in \operatorname{epi}\varphi^*$ was arbitrary, $\operatorname{epi}\varphi^* \subset \operatorname{epi}\delta_F^*$. Because $\operatorname{epi}\delta_F^*$ is a closed convex cone,
To complete the proof, we will prove the converse inclusion, that is, $\operatorname{epi}\delta_F^* \subset \operatorname{cl\,cone\,co\,epi}\varphi^*$. Suppose that $(\xi, \alpha) \notin \operatorname{cl\,cone\,co\,epi}\varphi^*$. As $\delta_F^* = \sigma_F$ is a sublinear function with $\sigma_F(0) = 0$, we have $(0, -1) \notin \operatorname{epi}\delta_F^*$, which by the relation (11.46) implies that $(0, -1) \notin \operatorname{cl\,cone\,co\,epi}\varphi^*$. Define the convex set
We claim that
$$\tilde{F} \cap \operatorname{cl\,cone\,co\,epi}\varphi^* = \emptyset.$$
Then by the Strict Separation Theorem, Theorem 2.26 (iii), there exists $(a, \gamma) \in \mathbb{R}^n \times \mathbb{R}$ with $(a, \gamma) \ne (0, 0)$ such that
For any λ > 0, λ(ξ, α) ∈ cl cone co epi φ∗ , which by the conditions (11.48)
and (11.49) implies that
Consider any ξ ∈ dom φ∗ and ε > 0. Thus, (ξ, φ∗ (ξ) + ε) ∈ cl cone co epi φ∗ .
Therefore, from the condition (11.50),
which implies
$$\frac{1}{\varepsilon}\big(\langle a, \xi\rangle + \gamma\varphi^*(\xi)\big) + \gamma \ge 0.$$
Taking the limit as $\varepsilon \to +\infty$ in the above inequality along with (11.50) yields that $0 > \gamma \ge 0$, which is a contradiction. Thus, $(0, 1) \in \operatorname{cl\,cone\,co\,epi}\varphi^*$ and hence
$$\{0\} \times \mathbb{R}_+ = \operatorname{cone}(0, 1) \subset \operatorname{cl\,cone\,co\,epi}\varphi^*. \tag{11.51}$$
From the relations (11.47) and (11.51),
which implies
$$(\xi, \alpha) = \frac{1}{1 - \tilde\lambda}\big\{(1 - \tilde\lambda)(\xi, \alpha)\big\} \in \operatorname{cl\,cone\,co\,epi}\varphi^*,$$
$$\tilde{F} \cap \operatorname{cl\,cone\,co\,epi}\varphi^* = \emptyset.$$
for every $(z, \beta) \in \operatorname{cl\,cone\,co\,epi}\varphi^*$ and $(\tilde{z}, \tilde\beta) \in \tilde{F}$. As $(0, 0) \in \operatorname{cl\,cone\,co\,epi}\varphi^*$,
for every $(z, \beta) \in \operatorname{cl\,cone\,co\,epi}\varphi^*$ and $(\tilde{z}, \tilde\beta) \in \tilde{F}$. For any $\xi \in \operatorname{dom}\varphi^*$, $(\xi, \varphi^*(\xi)) \in \operatorname{cl\,cone\,co\,epi}\varphi^*$, which by the above inequality implies that
which implies that $\dfrac{-a}{\gamma} \in F$. Again using the condition (11.53),
$$\delta_F^*\Big(\frac{-a}{\gamma}\Big) = \sigma_F\Big(\frac{-a}{\gamma}\Big) \ge \Big\langle\xi, \frac{-a}{\gamma}\Big\rangle > \alpha,$$
$$C_I = \{x \in \mathbb{R}^n : g(x) \le 0\}.$$
$$f(\bar{x}) \le f(x), \quad \forall\, x \in C_I.$$
The above condition along with the fact that $\bar{x} \in C_I$, that is, $g(\bar{x}, i) \le 0$, $i \in I$, and the nonnegativity of $\varepsilon$, $\varepsilon_i$ and $\lambda_i$, $i \in I$, implies that
$$\varepsilon = 0, \quad \lambda_i\varepsilon_i = 0 \quad\text{and}\quad \lambda_i g(\bar{x}, i) = 0,\ i \in I.$$
12.1 Introduction
This is the final chapter of the book. What we want to discuss here is essentially outside the purview of convex optimization. Yet, as we will see, convexity will play a fundamental role in the issues discussed. We will discuss here two major areas in nonconvex optimization, namely the maximization of a convex function and the minimization of a d.c. function. The acronym d.c. stands for difference convex function, that is, a function expressed as the difference of two convex functions. Thus, more precisely, we will look into the following problems:
$$\max\ f(x) \quad\text{subject to}\quad x \in C \tag{P1}$$
and
$$\min\ f(x) - g(x) \quad\text{subject to}\quad x \in C, \tag{P2}$$
where $f, g : \mathbb{R}^n \to \mathbb{R}$ are convex functions and $C \subset \mathbb{R}^n$ is a convex set. A large class of nonconvex optimization problems actually falls into this setting. Note that (P1) can be posed as
and thus as
where $\varphi$ is the zero function. Thus the problem (P1) can also be viewed as a special case of (P2), though we will consider them separately for a better understanding.
For more general results, see [97]. One of the earliest papers dealing exclu-
sively with the optimality conditions of maximizing a convex function over a
convex set is due to Strekalovskii [104]. Though Strekalovskii [104] frames his
problem in a general setting, his results are essentially useful for the convex
case and the main results in his paper are given only for the convex case.
Observe that if f : Rn → R is a convex function and x̄ ∈ C is the point
where f attains a global maximum, then for every x ∈ C,
which implies
Thus the necessary condition is ∂f (x̄) ⊂ NC (x̄). The reader should try to find
a necessary condition when x̄ is a local maximum. Can we find a sufficient
condition for a global maximum? Strekalovskii [104] attempts to answer this
question by developing a set of necessary and sufficient conditions.
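The necessary condition $\partial f(\bar{x}) \subset N_C(\bar{x})$ is easy to verify on a one-dimensional example of our own choosing, which also hints at why it cannot be sufficient: for $f(x) = x^2$ over $C = [-1, 2]$, both endpoints satisfy the inclusion, while only $\bar{x} = 2$ is the global maximum.

```python
import numpy as np

f = lambda x: x**2
C = np.linspace(-1.0, 2.0, 30001)
x_bar = C[np.argmax(f(C))]
print(x_bar)                 # 2.0: the global maximizer of f over C

# df(2) = {4} and N_C(2) = [0, +inf): the inclusion holds at x_bar = 2.
print(2 * x_bar >= 0)
# At the local maximizer x = -1, df(-1) = {-2} and N_C(-1) = (-inf, 0],
# so the inclusion holds there too, although -1 is not a global maximum.
```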
C̄ = {x ∈ Rn : f (x) ≤ f (x̄)}
Proof. We will only prove (a) and leave (b) to the readers. If $\bar{x}$ is a global maximum, then our discussion preceding the theorem shows that (a) holds, that is, the condition in (a) is necessary. Now we will look into the reverse, that is, whether (a) is sufficient for a global maximum or not. Observe that under the given hypothesis, for every $x^* \in \partial f(\bar{x})$,
$$\langle x^*, x - \bar{x}\rangle \le 0, \quad \forall\, x \in C.$$
Further, as C̄ has a nonempty interior, there exists x̂ such that f (x̂) < f (x̄).
Hence
Thus, NC̄ (x̄) = cone ∂f (x̄). This shows that NC̄ (x̄) ⊂ NC (x̄), which implies
that C ⊂ C̄. Hence x̄ is the point where the maximum is achieved as x̄ is
already given to be an element of C.
It is important to note that without the additional conditions,
∂f (x̄) ⊂ NC (x̄) does not render a global maximum. Here we put forward an
example from Dutta [38]. Consider f : R → R defined as
which is a closed convex set. It is clear that $\hat{x} \notin S(\bar{x})$. Thus, the following projection problem:
$$\min\ \frac{1}{2}\|x - \hat{x}\|^2 \quad\text{subject to}\quad f(x) \le f(\bar{x}),\ x \in C$$
has a unique solution. Let $\tilde{x} \in C$ be that unique solution. Now using the Fritz John optimality conditions for a convex optimization problem, Theorem 5.1, there exist $\lambda_0 \ge 0$ and $\lambda_1 \ge 0$ with $(\lambda_0, \lambda_1) \ne (0, 0)$ such that
(i) $0 \in \lambda_0(\tilde{x} - \hat{x}) + \lambda_1\partial f(\tilde{x}) + N_C(\tilde{x})$,
$$0 \in \partial f(\tilde{x}) + N_C(\tilde{x}).$$
This is obtained by dividing both sides by $\lambda_1$ and noting that $N_C(\tilde{x})$ is a cone. As $f$ is convex, invoking Theorem 3.1, $\tilde{x} \in C$ is a point of minimizer of $f$ over $C$, that is,
The condition $f(\tilde{x}) = f(\bar{x})$ along with the given hypothesis yields that
that is,
$$\hat{x} - \tilde{x} \in N_C(\tilde{x}).$$
Because $\hat{x} \in C$,
implying that $\tilde{x} = \hat{x}$. This is indeed a contradiction. Hence $\lambda_1 > 0$. Thus there exist $\xi \in \partial f(\tilde{x})$ and $\eta \in N_C(\tilde{x})$ such that
The conditions (12.1), (12.2), and (12.3) lead to a contradiction, thereby establishing the result.
$$0 \in \partial^\circ(f - g)(\bar{x}).$$
For details, see Clarke [27] or Chapter 3. Hence, by the Sum Rule of the Clarke subdifferential,
$$0 \in \partial^\circ f(\bar{x}) + \partial^\circ(-g)(\bar{x}).$$
Noting that $\partial^\circ f(\bar{x}) = \partial f(\bar{x})$ and $\partial^\circ(-g)(\bar{x}) = -\partial^\circ g(\bar{x}) = -\partial g(\bar{x})$, the above condition becomes
$$0 \in \partial f(\bar{x}) - \partial g(\bar{x}),$$
that is,
$$\partial f(\bar{x}) \cap \partial g(\bar{x}) \ne \emptyset.$$
We would again like to stress that for details on the Clarke subdifferential, see Clarke [27].
Let us note that the above condition is only necessary and not sufficient. Consider $h(x) = f(x) - g(x)$, where $f(x) = x^2$ and $g(x) = |x|$. At $\bar{x} = 0$, $\partial f(0) = \{0\}$ and $\partial g(0) = [-1, 1]$. Thus, $0 \in \partial f(0) \cap \partial g(0)$, so the necessary condition holds at $\bar{x} = 0$, even though $\bar{x} = 0$ is not a point of minimizer of $h$.
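A quick numerical look at this example, in Python, confirms the claim: the condition holds at $\bar{x} = 0$, yet $0$ is not a global minimizer of $h$.

```python
import numpy as np

# h(x) = f(x) - g(x) with f(x) = x**2, g(x) = |x|.
# 0 in df(0) and 0 in dg(0) = [-1, 1], but h attains smaller values nearby.
h = lambda x: x**2 - np.abs(x)
xs = np.linspace(-1.0, 1.0, 20001)
print(h(0.0))                              # 0.0 at the candidate point
print(xs[np.argmin(h(xs))], h(xs).min())   # -0.5 (or 0.5), -0.25
```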
$$0 \in \partial^\circ(f - g)(\bar{x}) + N_C(\bar{x}),$$
$$\xi_g = \xi_f + \eta,$$
that is,
$$\xi_g \in \partial f(\bar{x}) + N_C(\bar{x}).$$
Therefore,
$$\partial g(\bar{x}) \subset \partial f(\bar{x}).$$
Note that this is again a necessary condition and not sufficient. We urge the
reader to find an example demonstrating this fact.
We now present interesting and important necessary and sufficient optimal-
ity conditions for the global optimization of problem (P 2). Here the optimality
conditions will be expressed in terms of the ε-subdifferential. We present this
result as given in Bomze [15].
thereby implying that $\xi \in \partial_\varepsilon f(\bar{x})$. Because $\varepsilon > 0$ was arbitrary, this establishes that
Let us now look at the converse. On the contrary, assume that $\bar{x}$ is not a global minimizer of (P2), which implies that there exists $\hat{x} \in \mathbb{R}^n$ such that
Set $\delta = \frac{1}{2}\big(f(\bar{x}) - f(\hat{x}) - g(\bar{x}) + g(\hat{x})\big)$. It is simple to see that $\delta > 0$.
Now consider $\hat\xi \in \partial g(\hat{x})$, which implies that
$$g(\bar{x}) - g(\hat{x}) - \langle\hat\xi, \bar{x} - \hat{x}\rangle \ge 0.$$
Because $\delta > 0$,
$$g(\bar{x}) - g(\hat{x}) - \langle\hat\xi, \bar{x} - \hat{x}\rangle + \delta > 0.$$
Set $\varepsilon = g(\bar{x}) - g(\hat{x}) - \langle\hat\xi, \bar{x} - \hat{x}\rangle + \delta$. Then for any $x \in \mathbb{R}^n$,
$$\langle\hat\xi, x - \bar{x}\rangle - \varepsilon = \langle\hat\xi, x - \hat{x} + \hat{x} - \bar{x}\rangle - \varepsilon = \langle\hat\xi, x - \hat{x}\rangle - \delta + g(\hat{x}) - g(\bar{x}).$$
Thus,
$$\langle\hat\xi, x - \bar{x}\rangle - \varepsilon \le g(x) - g(\bar{x}), \quad \forall\, x \in \mathbb{R}^n,$$
Now
Hence,
$$2\delta \le \delta + \langle\hat\xi, \hat{x} - \bar{x}\rangle - \langle\hat\xi, \hat{x} - \bar{x}\rangle = \delta < 2\delta,$$
Using the above result, one can deduce optimality conditions for the case when $C \subset \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$. Observe that when $C \subset \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$, the problem (P2) can be equivalently written as
This is done of course by applying Theorem 12.8. Invoking the Sum Rule of the ε-subdifferential, Theorem 2.115,
$$\partial_\varepsilon(f + \delta_C)(\bar{x}) = \bigcup_{\substack{\varepsilon_1\ge 0,\ \varepsilon_2\ge 0\\ \varepsilon_1+\varepsilon_2=\varepsilon}}\big(\partial_{\varepsilon_1}f(\bar{x}) + \partial_{\varepsilon_2}\delta_C(\bar{x})\big).$$
Hence,
$$\partial_\varepsilon g(\bar{x}) \subset \bigcup_{\substack{\varepsilon_1\ge 0,\ \varepsilon_2\ge 0\\ \varepsilon_1+\varepsilon_2=\varepsilon}}\big(\partial_{\varepsilon_1}f(\bar{x}) + N_{\varepsilon_2,C}(\bar{x})\big), \quad \forall\, \varepsilon > 0,$$
that is,
[37] J. Dutta. Necessary optimality conditions and saddle points for approx-
imate optimization in Banach spaces. Top, 13:127–144, 2005.
[72] V. L. Klee. The critical set of a convex body. Am. J. Math., 75:178–188,
1953.
[98] R. T. Rockafellar. Some convex programs whose duals are linearly con-
strained. In Nonlinear Programming (Proc. Sympos., Univ. of Wiscon-
sin, Madison, Wis., 1970), pages 293–322. Academic Press, New York,
1970.
[110] J. van Tiel. Convex Analysis. An Introductory Text. John Wiley & Sons,
Inc., New York, 1984.