You are on page 1of 445

OPTIMALITY CONDITIONS 

IN  
CONVEX OPTIMIZATION
A Finite-Dimensional View

© 2012 by Taylor & Francis Group, LLC

K13102_FM.indd 1 9/2/11 11:33 AM


OPTIMALITY CONDITIONS 
IN  
CONVEX OPTIMIZATION
A Finite-Dimensional View

Anulekha Dhara
Joydeep Dutta

© 2012 by Taylor & Francis Group, LLC

K13102_FM.indd 3 9/2/11 11:33 AM


CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2012 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works


Version Date: 20110831

International Standard Book Number-13: 978-1-4398-6823-2 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information stor-
age or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copy-
right.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that pro-
vides licenses and registration for a variety of users. For organizations that have been granted a pho-
tocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com

© 2012 by Taylor & Francis Group, LLC


In memory of
Professor M. C. Puri
and
Professor Alex M. Rubinov

© 2012 by Taylor & Francis Group, LLC


Contents

List of Figures xi

Symbol Description xiii

Foreword xv

Preface xvii

1 What Is Convex Optimization? 1


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Smooth Convex Optimization . . . . . . . . . . . . . . . . . 15

2 Tools for Convex Optimization 23


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.1 Convex Cones . . . . . . . . . . . . . . . . . . . . . . . 39
2.2.2 Hyperplane and Separation Theorems . . . . . . . . . 44
2.2.3 Polar Cones . . . . . . . . . . . . . . . . . . . . . . . . 50
2.2.4 Tangent and Normal Cones . . . . . . . . . . . . . . . 54
2.2.5 Polyhedral Sets . . . . . . . . . . . . . . . . . . . . . . 60
2.3 Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . 62
2.3.1 Sublinear and Support Functions . . . . . . . . . . . . 71
2.3.2 Continuity Property . . . . . . . . . . . . . . . . . . . 75
2.3.3 Differentiability Property . . . . . . . . . . . . . . . . 85
2.4 Subdifferential Calculus . . . . . . . . . . . . . . . . . . . . . 98
2.5 Conjugate Functions . . . . . . . . . . . . . . . . . . . . . . . 111
2.6 ε-Subdifferential . . . . . . . . . . . . . . . . . . . . . . . . . 122
2.7 Epigraphical Properties of Conjugate Functions . . . . . . . 136

3 Basic Optimality Conditions Using the Normal Cone 143


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
3.2 Slater Constraint Qualification . . . . . . . . . . . . . . . . . 145
3.3 Abadie Constraint Qualification . . . . . . . . . . . . . . . . 154
3.4 Convex Problems with Abstract Constraints . . . . . . . . . 157
3.5 Max-Function Approach . . . . . . . . . . . . . . . . . . . . 159
3.6 Cone-Constrained Convex Programming . . . . . . . . . . . 161

vii

© 2012 by Taylor & Francis Group, LLC


viii Contents

4 Saddle Points, Optimality, and Duality 169


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
4.2 Basic Saddle Point Theorem . . . . . . . . . . . . . . . . . . 171
4.3 Affine Inequalities and Equalities and Saddle Point Condition 175
4.4 Lagrangian Duality . . . . . . . . . . . . . . . . . . . . . . . 185
4.5 Fenchel Duality . . . . . . . . . . . . . . . . . . . . . . . . . 196
4.6 Equivalence between Lagrangian and Fenchel Duality . . . . 200

5 Enhanced Fritz John Optimality Conditions 207


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
5.2 Enhanced Fritz John Conditions Using the Subdifferential . 208
5.3 Enhanced Fritz John Conditions under Restrictions . . . . . 216
5.4 Enhanced Fritz John Conditions in the Absence of Optimal
Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
5.5 Enhanced Dual Fritz John Optimality Conditions . . . . . . 235

6 Optimality without Constraint Qualification 243


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
6.2 Geometric Optimality Condition: Smooth Case . . . . . . . . 249
6.3 Geometric Optimality Condition: Nonsmooth Case . . . . . . 255
6.4 Separable Sublinear Case . . . . . . . . . . . . . . . . . . . . 274

7 Sequential Optimality Conditions 281


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
7.2 Sequential Optimality: Thibault’s Approach . . . . . . . . . 282
7.3 Fenchel Conjugates and Constraint Qualification . . . . . . . 293
7.4 Applications to Bilevel Programming Problems . . . . . . . . 308

8 Representation of the Feasible Set and KKT Conditions 315


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
8.2 Smooth Case . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
8.3 Nonsmooth Case . . . . . . . . . . . . . . . . . . . . . . . . . 320

9 Weak Sharp Minima in Convex Optimization 327


9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
9.2 Weak Sharp Minima and Optimality . . . . . . . . . . . . . . 328

10 Approximate Optimality Conditions 337


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
10.2 ε-Subdifferential Approach . . . . . . . . . . . . . . . . . . . 338
10.3 Max-Function Approach . . . . . . . . . . . . . . . . . . . . 342
10.4 ε-Saddle Point Approach . . . . . . . . . . . . . . . . . . . . 345
10.5 Exact Penalization Approach . . . . . . . . . . . . . . . . . . 350
10.6 Ekeland’s Variational Principle Approach . . . . . . . . . . . 355
10.7 Modified ε-KKT Conditions . . . . . . . . . . . . . . . . . . 358
10.8 Duality-Based Approach to ε-Optimality . . . . . . . . . . . 361

© 2012 by Taylor & Francis Group, LLC


Contents ix

11 Convex Semi-Infinite Optimization 365


11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
11.2 Sup-Function Approach . . . . . . . . . . . . . . . . . . . . . 366
11.3 Reduction Approach . . . . . . . . . . . . . . . . . . . . . . . 368
11.4 Lagrangian Regular Point . . . . . . . . . . . . . . . . . . . . 374
11.5 Farkas–Minkowski Linearization . . . . . . . . . . . . . . . . 382
11.6 Noncompact Scenario: An Alternate Approach . . . . . . . . 395

12 Convexity in Nonconvex Optimization 403


12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
12.2 Maximization of a Convex Function . . . . . . . . . . . . . . 403
12.3 Minimization of d.c. Functions . . . . . . . . . . . . . . . . . 408

Bibliography 413

Index 423

© 2012 by Taylor & Francis Group, LLC


List of Figures

1.1 Lower semicontinuous hull. . . . . . . . . . . . . . . . . . . . 11


1.2 Graph of a real-valued differentiable convex function. . . . . . 16
1.3 Local minimizer is global minimizer. . . . . . . . . . . . . . . 18

2.1 Convex and nonconvex sets. . . . . . . . . . . . . . . . . . . . 24


2.2 F1 , F2 , and F1 ∩ F2 are convex while F1c , F2c , and F1 ∪ F2 are
nonconvex. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3 Line segment principle. . . . . . . . . . . . . . . . . . . . . . . 33
2.4 Tangent cone. . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.5 Normal cone. . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.6 Graph and epigraph of convex function. . . . . . . . . . . . . 63
2.7 Epigraphs of improper functions φ1 and φ2 . . . . . . . . . . . 76
2.8 Graph of ∂(|.|). . . . . . . . . . . . . . . . . . . . . . . . . . . 126
2.9 Graph of ∂1 (|.|). . . . . . . . . . . . . . . . . . . . . . . . . . 126

3.1 NC (x̄) is not polyhedral. . . . . . . . . . . . . . . . . . . . . . 153


3.2 C ∩Y. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

5.1 Pseudonormality. . . . . . . . . . . . . . . . . . . . . . . . . . 222


5.2 Not pseudonormal. . . . . . . . . . . . . . . . . . . . . . . . . 223

9.1 Pictorial representation of Theorem 9.6. . . . . . . . . . . . . 336

xi

© 2012 by Taylor & Francis Group, LLC


Symbol Description

∅ empty set k.k norm


∞ infinity φ(F ) image space of F under φ
N set of natural numbers gph Φ graph of set-valued map Φ
R real line dom φ effective domain of φ : X
R̄ R ∪ {−∞, +∞} → R̄
Rn n-dimensional Euclidean epi φ epigraph of φ
space lev≤α φ α-lower level set of φ
R+ nonnegative orthant of R δF indicator function to F
Rn+ nonnegative orthant of Rn dF distance function to F
[x, y] closed line segment joining projF (x̄) projection of x̄ to F
x and y σ( . ; F ) support function to F
(x, y) open line segment joining φ∗ conjugate function of φ
x and y Q φ+ (x) max{0, φ(x)}
RI product space I R ∇φ(x̄) derivative or gradient of φ
R[I] {λ ∈ RI : λi 6= 0 for finitely at x̄
many i ∈ I} φ◦ (x̄; d) Clarke directional derivative
[I]
R+ positive cone in R[I] of φ at x̄ in the direction d
supp λ {i ∈ I : λ ∈ R[I] , λi 6= 0} ∂φ
partial derivative of φ with
B open unit ball ∂x
Bδ (x̄) open ball with radius δ > 0 respect to x
and center at x̄ ∂2φ
second-order partial deriva-
cl F closure of F ∂xi ∂xj
co F convex hull of F tive of φ with respect to xi
cl co F closed convex hull of F and xj
af f F affine hull of F ∂φ(x̄) convex subdifferential of φ
int F interior of F at x̄
ri F relative interior of F ∂ǫ φ(x̄) ǫ-subdifferential of φ at x̄
cone F cone generated by F ∂ ◦ φ(x̄) Clarke subdifferential or
F+ positive polar cone of F generalized gradient of φ at
F◦ polar cone of F x̄
x → x̄ x converges to x̄ ∇2 φ(x̄) Hessian of φ at x̄
lim limit Jφ(x̄) Jacobian of φ at x̄
lim inf limit infimum TF (x̄) tangent cone to F at x̄
lim sup limit supremum NF (x̄) normal cone to F at x̄
h., .i inner product Nǫ,F (x̄) ǫ-normal set to F at x̄

xiii

© 2012 by Taylor & Francis Group, LLC


Foreword

The roots of the mathematical topic of optimization go back to ancient Greece,


when Euclid considered the minimal distance of a point to a line; convex
sets were investigated by Minkowski about a hundred years ago, and fifty
years ago, J.-J. Moreau [87] defined the notion of the subdifferential of a
convex function. In 1970, R.T. Rockafellar wrote his monograph [97] on convex
analysis. Since then, the field of convex optimization and convex analysis has
developed rapidly, a huge number of papers on that topic have been published
in scientific journals and a large number of monographs and textbooks have
been produced. Now, we have a new book at hand and one can ask why read
this book.
A recent topic of research in mathematical optimization is the need to
compute global optima of nonconvex problems. To do that, the problem can
be convexified using the optimal function value of the resulting convex opti-
mization problem as a bound for the problem investigated. Combining this
with an enumeration idea the problem can be solved. The same approach of
convexification plus enumeration can serve as a way to solve mixed-integer
nonlinear optimization problems which is a second challenging problem of re-
cent and future research. Moreover, many practical situations lead directly to
convex programming problems. Hence the need to develop a deep insight into
convex optimization.
The theory of convex differentiable optimization is well established. Every
student will be introduced in basic courses on mathematical optimization to
the Fritz John and Karush–Kuhn–Tucker necessary optimality conditions. For
guaranteeing the Karush–Kuhn–Tucker conditions a constraint qualification
such as the Slater condition is needed. But, in many applications, this condi-
tion is violated. There are a larger number of ways out in such a situation:
Abadie constraint qualification can be supposed, sequential optimality condi-
tions can be used or we can try to filter out full information of (enhanced)
Fritz John necessary optimality conditions. These nonstandard but essential
parts of the theory of convex optimization need to be described in detail and
in close relation to each other.
Nonsmooth analysis (see, for example, Mordukhovich [86]) is a quickly
developing area in mathematical optimization. The initial point of nonsmooth
analysis is convex analysis, but recent developments in nonsmooth analysis are
a good influence on convex analysis.
The aim of this book is to develop deep insight into the theory of convex

xv

© 2012 by Taylor & Francis Group, LLC


xvi Foreword

optimization, combining very recent ideas of nonsmooth analysis with stan-


dard and nonstandard theoretical results. Lagrange and Fenchel duality use
different tools and can be applied successfully in distinct directions. But in
the end, both are shown to coincide.
If, at an optimal solution, no constraint qualification is satisfied, algorithms
solving the Karush–Kuhn–Tucker conditions cannot be used to compute this
point. And, how to characterize such a point? Roughly speaking one idea is
the existence of a sequence outside of the feasible set with smaller objective
function values converging to that point. These are the enhanced Fritz John
necessary optimality conditions. A second idea is to characterize optimality
via subgradients of the regular Lagrangian function at perturbed points con-
verging to zero. This is the sequential optimality condition. Both optimality
conditions work without constraint qualifications. ε-optimal solutions can be
characterized using ε-subgradients.
One special convex optimization problem is also investigated. This is the
problem to find a best point within the set of optimal solutions of a convex
optimization problem. If the objective function is convex, this is a convex
optimization problem called a simple bilevel programming problem. It is easy
to see that standard regularity conditions are violated at every feasible point.
For this problem, a very general constraint qualification is derived.
A last question is if convexity can successfully be used to investigate non-
convex problems as the maximization of convex functions or the minimization
of a function being the difference of convex functions.
Algorithmic approaches for solving convex optimization problems are not
described in this book. This results in much more space for theoretical proper-
ties. The result is a book illuminating not only the body but also the bounds
and corners of the theory of convex optimization. Many of the results pre-
sented are usually not contained in books on this topic. But, if more and
more (applied) difficult optimization problems need to be solved, we are more
likely be faced with instances where usual approaches fail. Then it is necessary
to search away from standard tools for applicable approaches. I am sure that
this book will be very helpful.
I deeply recommend this book for advanced reading.

Stephan Dempe
Freiberg, Germany

© 2012 by Taylor & Francis Group, LLC


Preface

This is a book on convex optimization. More precisely it is a book on the re-


cent advances in the theory of optimality conditions for convex optimization.
The question is why should one need an additional book on the subject? How-
ever, possibly the books on convex analysis are much more in number than the
ones on convex optimization. In the books dealing with convex analysis, like
the classic Convex Analysis by Rockafellar [97] or the the more recent Convex
Analysis and Nonlinear Optimization by Borwein and Lewis [17], one would
find convex optimization theory appears as an application to various results
of convex analysis. However, from 1970 until now there has been a growing
body of research in the area of optimality conditions for a convex optimiza-
tion. Many of these results address the question as to what happens when
the Slater condition fails for a convex optimization problem or are there very
general constraint qualification conditions which hold even if the the most pop-
ular ones fail? The books on convex analysis usually do not present results of
this type and thus these results remain largely scattered in the vast literature
on convex optimization. On the other hand, the books dealing with convex
optimization largely focus on algorithms or algorithms and theory associated
with a certain special class of problems like second-order conic programming
or semidedfinite programming. Some recent books like Introductory Lectures
in Convex Optimization by Nesterov [90] or Lectures on Modern Convex Opti-
mization by Ben-Tal and Nemirovskii [8] deal with algorithms and the special
problems, respectively.
This book has a completely different focus. It deals with optimality con-
ditions in convex optimization. It attempts to bring in most of the important
and recent results in this area that are scattered in the literature. However,
we do not ignore the required convex analysis either. We provide a detailed
chapter on the main convex analytic tools and also provide some new results
that have appeared recently in the literature. These results are usually not
found in standard books on convex analysis but they are essential in devel-
oping many of the important results in this book. This book actually began
as a survey paper but then we realized that it has too much material to be
considered as a survey; and then we thought of converting the survey paper
into the form of a monograph.
We would look to thank the many people who encouraged us to write
the book. Professor Stephan Dempe agreed very kindly to write the foreword.
Professor Boris Mordukhovich, Professor Suresh Chandra, Professor Juan En-

xvii

© 2012 by Taylor & Francis Group, LLC


xviii Preface

rique Martinez-Legaz also encouraged us to go ahead and write the book. We


are indeed grateful to them. We would also like to thank Aastha Sharma of
Taylor & Francis, India, for superb handling of the whole book project and
Shashi Kumar from the helpdesk of Taylor & Francis for helping with the
formatting. We would also like to extend our deepest gratitude to our families
for their support. Joydeep Dutta would like to thank his daughter Naina and
his wife Lalty for their understanding and patience during the time this book
was written. Anulekha Dhara would like to express her deepest and sincer-
est regards and gratitude to her parents Dr. Madhu Sudan Dhara and Dolly
Dhara for their understanding and support. She would also like to thank the
National Board for Higher Mathematics, Mumbai, India, for providing finan-
cial support during her tenure at the Indian Institute of Technology Kanpur,
India.
The book is intended for research mathematicians in convex optimization
and also for graduate students in the area of optimization theory. This could be
of interest also to the practitioner who might be interested in the development
of the theory. We have tried our best to make the book free of errors. But to
err is human, so we take the responsibility for any errors the readers might
find in the book. We would also like to request that readers communicate
with us by email at the address: jdutta@iitk.ac.in. We sincerely hope that the
young researchers in the field of optimization will find this book helpful.

Anulekha Dhara
Avignon, France

Joydeep Dutta
Kanpur, India

© 2012 by Taylor & Francis Group, LLC


Chapter 1
What Is Convex Optimization?

1.1 Introduction
Optimization is the heart of applied mathematics. Various problems encoun-
tered in the areas of engineering, sciences, management science, and economics
are based on the fundamental idea of mathematical formulation. Optimiza-
tion is an essential tool for the formulation of many such problems expressed
in the form of minimization of a function under certain constraints like in-
equalities, equalities, and/or abstract constraints. It is thus rightly considered
a science of selecting the best of the many possible decisions in a complex
real-life environment.
Even though optimization problems have existed since very early times,
the optimization theory has settled as a solid and autonomous field only in
recent decades. The origin of analytic optimization lies in the classical calculus
of variations and is interrelated with the development of calculus. The very
concept of derivative introduced by Fermat in the mid-seventeenth century via
the tangent slope to the graph of a function was motivated by solving an op-
timization problem, leading to the Fermat stationary principle. Around 1684,
Leibniz developed a method to distinguish between minima and maxima via
second-order derivatives. The calculus of variations was introduced by Euler
while solving the Brachistochrone problem, which was posed by Bernoulli in
1696. The problem is stated as “Given two points x and y in the vertical plane.
A particle is allowed to move under its own gravity from x to y. What should
be the curve along which the particle should move so as to reach y from x in
the shortest time?” In 1759, Lagrange gave a completely different approach
to solve the problems in calculus of variations, today known as the Lagrange
multiplier rule. The Lagrange multipliers are viewed as the auxiliary variables
that are primarily used to derive the optimality conditions for constrained
optimization problems. These optimality conditions are the building blocks of
optimization theory.
During the second world war, Dantzig developed the simplex method to
solve linear programming problems. The first attempt to develop the La-
grange multiplier rules for nonlinear optimization problem was made by Fritz
John [71] in 1948. In 1951, Kuhn and Tucker [73] gave the Lagrange multiplier
rule for convex and other nonlinear optimization problems involving differen-

© 2012 by Taylor & Francis Group, LLC


2 What Is Convex Optimization?

tiable functions. It was later found that Karush in 1939 had independently
established the optimality conditions similar to those of Kuhn and Tucker.
These optimality conditions are today famous as the Karush–Kuhn–Tucker
(KKT) optimality conditions. All the initial theories were developed with the
differentiability assumptions of the functions involved.
Meanwhile, efforts were made to shed the differentiability hypothesis,
thereby leading to the development of nonsmooth convex analysis as a subject
in itself. This added a new chapter to optimization theory. The key contrib-
utors in the development of convexity theory are Fenchel [45], Moreau [88],
and Rockafellar [97]. An important milestone in this direction was the publi-
cation of Convex Analysis by Rockafellar [97], where the theory of nonsmooth
convex analysis was presented in detail for the first time. No wonder this text
is by far a must for all optimization researchers. In the early 1970s, his stu-
dent Clarke coined the term nonsmooth optimization to categorize the theory
involving nondifferentiable optimization problems. He extended the calculus
rules and applied them to optimization problems involving locally Lipschitz
functions. This was just the beginning. The subsequent decade witnessed a
large development in the field of nonsmooth nonconvex optimization. For de-
tails on nonsmooth analysis, one may refer to Borwein and Lewis [17]; Bor-
wein and Zhu [18]; Clarke [27]; Clarke, Ledyaev, Stern and Wolenshi [28];
Mordukhovich [86]; and Rockafellar and Wets [101].
However, such developments have not overshadowed the importance of
convex optimization, which still is and will remain a pivotal area of research. It
has paved a path not only for theoretical improvements, but also algorithmic
designing aspects. In this book we focus mainly on convex analysis and its
application to the development of convex optimization theory.

1.2 Basic Concepts


By convex optimization we simply mean the problem of minimizing a convex
function over a convex set. More precisely, we are concerned with the following
problem
min f (x) subject to x ∈ C, (CP )
where f : Rn → R is a convex function and C ⊂ Rn is a convex set. Of course
in most cases the the set C is described by a system of convex inequalities
and affine equalities. Thus we can write

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m and
hj (x) = 0, j = 1, 2, . . . , l},

where gi : Rn → R, i = 1, 2, . . . , m are convex functions and hj : Rn → R,

© 2012 by Taylor & Francis Group, LLC


1.2 Basic Concepts 3

j = 1, 2, . . . , l are affine functions. When C is expressed explicitly as above,


(CP ) is called the convex programming problem.
A set C ⊂ Rn is a convex set if for any x, y ∈ Rn , the line segment joining
them, that is

[x, y] = {z ∈ Rn : z = (1 − λ)x + λy, 0 ≤ λ ≤ 1},

is also in C. A function φ : Rn → R is a convex function if for any x, y ∈ Rn


and λ ∈ [0, 1],

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y),

while it is an affine function if it is a translate of a linear function; that is, φ


is affine if

φ(x) = ha, xi + b,

where a ∈ Rn and b ∈ R.
It is important to note at the very outset that in optimization theory it is
worthwhile to consider extended-valued functions, that is, functions that take
values in R̄ = R∪{−∞, +∞}. The need to do so arises when we seek to convert
a constrained optimization problem into an unconstrained one. Consider for
example the problem (CP ), which can be restated as

min f0 (x) subject to x ∈ Rn ,

where

f (x), x ∈ C,
f0 (x) =
+∞, otherwise.
All the modern books on convex analysis beginning with the classic Convex
Analysis by Rockafellar [97] follow this framework. However, when we include
infinities, we need to know how to deal with them. Most rules with infinity
are intuitively clear except possibly 0 × (+∞) and ∞ − ∞. Because we will
be dealing mainly with minimization problems, we will follow the convention
0 × (+∞) = (+∞) × 0 = 0 and ∞ − ∞ = ∞. This convention was adopted in
Rockafellar and Wets [101] and we shall follow it. However, we would like to
ascertain that we really need not get worried about ∞ − ∞ as the functions
considered in this book are real-valued or proper functions. An extended-
valued function φ : Rn → R̄ is said to be a proper function if φ(x) > −∞ for
every x ∈ Rn and dom φ is nonempty where dom φ = {x ∈ Rn : φ(x) < +∞}
is the domain of φ.
It is worthwhile to note that the definition of a convex function given
above can be extended to the case when φ is an extended-valued function. An
extended-valued function φ : Rn → R̄ is a convex function if for any x, y ∈ Rn
and λ ∈ [0, 1],

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y),

© 2012 by Taylor & Francis Group, LLC


4 What Is Convex Optimization?

with the convention that ∞ − ∞ = +∞. A better way to handle the convexity
of an extended-valued convex function is to use its associated geometry. In
this direction we describe the epigraph of a function φ : Rn → R̄, which is
given as

epi φ = {(x, α) ∈ Rn × R : φ(x) ≤ α}.

A function is said to be convex if the epigraph is convex. We leave it as a simple


exercise for the reader to show that if the epigraph of a function φ : Rn → R̄
is convex in Rn × R, then φ is a convex function over Rn . For more details see
Chapter 2.
In case of extended-valued functions, one can work with the semicontinuity
of the functions rather than the continuity. Before we define those notions, we
present certain notations that will be used throughout.
For any two sets F1 , F2 ⊂ Rn , define

F1 + F2 = {x1 + x2 ∈ Rn : x1 ∈ F1 , x2 ∈ F2 }.

For any set F ⊂ Rn and any scalar λ ∈ R,

λF = {λx ∈ Rn : x ∈ F }.

The closure of a set F is denoted by cl F while the interior is given by int F .


The open unit ball, or simply unit ball, is denoted by B. By Bδ (x̄) we mean
an open ball of radius δ > 0 with center at x̄. Explicitly,

Bδ (x̄) = x̄ + δB.

(y1 , y2 , . . . , yn ) in Rn , the inner


For vectors x = (x1 , x2 , . . . , xn ) and y = P
n
py is denoted by hx, yi = i=1 xi yi while the norm of x is
product of x and
given by kxk = hx, xi. We state a standard result on the norm.

Proposition 1.1 (Cauchy–Schwarz Inequality) For any two vectors x, y ∈ Rn ,

|hx, yi| ≤ kxkkyk.

The above inequality holds as equality if and only if x = αy for some scalar
α ∈ R.

To discuss the concept of continuities of a function, we shall consider the


notions of limit infimum and limit supremum of a function. But first we discuss
the convergence of sequences in Rn .

Definition 1.2 A sequence {xk ∈ R : k = 1, 2, . . .} or simply {xk } ⊂ R is


said to converge to x̄ ∈ R if for every ε > 0, there exists kε such that

|xk − x̄| < ε, ∀ k ≥ kε .

A sequence {xk } ⊂ Rn converges to x̄ ∈ Rn if the i-th component of xk

© 2012 by Taylor & Francis Group, LLC


1.2 Basic Concepts 5

converges to the i-th component of x̄. The vector x̄ is called the limit of {xk }.
Symbolically it is expressed as

xk → x̄ or lim xk = x̄.
k→∞

The sequence {xk } ⊂ Rn is bounded if each of its components is bounded.


Equivalently, {xk } is bounded if and only if there exists M ∈ R such that
kxk k ≤ M for every k ∈ N. A subsequence of {xk } ⊂ Rn is a sequence {xkj },
j = 1, 2, . . ., where each xkj is a member of the original sequence and the order
of the elements as in the original sequence is maintained. A vector x̄ ∈ Rn is
a limit point of {xk } ⊂ Rn if there exists a subsequence of {xk } converging
to x̄. If the limit point is unique, it is the limit of {xk }. Next we state the
classical result on the bounded sequences.

Proposition 1.3 (Bolzano–Weierstrass Theorem) A bounded sequence in Rn


has a convergent subsequence.

For a sequence {xk } ⊂ R, define

zr = inf{xk : k ≥ r} and yr = sup{xk : k ≥ r}.

It is obvious that the sequences {zr } and {yr } are nondecreasing and non-
increasing, respectively. If {xk } is bounded below or bounded above, the se-
quences {zr } or {yr }, respectively, have a limit. The limit of {zr } is called
the limit infimum or lower limit of {xk } and denoted by lim inf k→∞ xk , while
that of {yr } is called the limit supremum or upper limit of {xk } and denoted
by lim supk→∞ xk . Equivalently,

lim inf xk = lim { inf xr } and lim sup xk = lim {sup xr }.


k→∞ k→∞ r≥k k→∞ k→∞ r≥k

For a sequence {xk }, lim inf k→∞ xk = −∞ if the sequence is unbounded below
while lim supk→∞ xk = +∞ if the sequence is unbounded above. Therefore,
{xk } converges to x̄ if and only if

−∞ < lim inf xk = x̄ = lim sup xk < +∞.


k→∞ k→∞

Now we move on to define the semicontinuities of a function that involve


the limit infimum and limit supremum of the function.

Definition 1.4 A function φ : Rn → R̄ is said to be lower semicontinuous


(lsc) at x̄ ∈ Rn if for every sequence {xk } ⊂ Rn converging to x̄,

φ(x̄) ≤ lim inf φ(xk ).


k→∞

Equivalently,

φ(x̄) ≤ lim inf φ(x),


x→x̄

© 2012 by Taylor & Francis Group, LLC


6 What Is Convex Optimization?

where the term on the right-hand side of the inequality denotes the limit
infimum or the lower limit of the function φ defined as
lim inf φ(x) = lim inf φ(x).
x→x̄ δ↓0 x∈Bδ (x̄)

The function φ is lsc over a set F ⊂ Rn if φ is lsc at every x̄ ∈ F .


For a function φ : Rn → R̄,
inf φ(x) ≤ φ(x̄).
x∈Bδ (x̄)

Taking the limit as δ ↓ 0 in the above inequality leads to


lim inf φ(x) ≤ φ(x̄).
x→x̄

Thus, the inequality in the above definition of lsc can be replaced by an


equality, that is, φ : Rn → R̄ is lsc at x̄ if
φ(x̄) = lim inf φ(x).
x→x̄

Similar to the concept of lower semicontinuity and limit infimum, we next


define the upper semicontinuity and the limit supremum of a function.
Definition 1.5 A function φ : Rn → R̄ is said to be upper semicontinuous
(usc) at x̄ ∈ Rn if for every sequence {xk } ⊂ Rn converging to x̄,
φ(x̄) ≥ lim sup φ(xk ).
k→∞

Equivalently,
φ(x̄) ≥ lim sup φ(x),
x→x̄

where the term on the right-hand side of the inequality denotes the limit
supremum or the upper limit of the function φ defined as
lim sup φ(x) = lim sup φ(x).
x→x̄ δ↓0 x∈Bδ (x̄)

The function φ is usc over a set F ⊂ Rn if φ is usc at every x̄ ∈ F .


Definition 1.6 A function φ : Rn → R̄ is said to be continuous at x̄ if it is
lsc as well as usc at x̄, that is,
lim φ(x) = φ(x̄).
x→x̄

Alternatively, φ is continuous at x̄ if for any ε > 0 there exists δ(ε, x̄) > 0
such that
|φ(x) − φ(x̄)| ≤ ε whenever kx − x̄k < δ(ε, x̄).
The function φ is continuous over a set F ⊂ Rn if φ is continuous at every
x̄ ∈ F .

© 2012 by Taylor & Francis Group, LLC


1.2 Basic Concepts 7

Because we will be considering minimization problems, the continuity of


a function will be replaced by lower semicontinuity. Before moving on, we
state a result on the infimum and supremum operations from Rockafellar and
Wets [101].

Proposition 1.7 (i) Consider an extended-valued function φ : Rn → R̄ and


sets Fi ⊂ Rn , i = 1, 2 such that F1 ⊂ F2 . Then

inf φ(x1 ) ≥ inf φ(x2 ) and sup φ(x1 ) ≤ sup φ(x2 ).


x1 ∈F1 x2 ∈F2 x1 ∈F1 x2 ∈F2

(ii) Consider the functions φ1 , φ2 : Rn → R̄ and a set F ⊂ Rn . Then

inf φ1 (x) + inf φ2 (x) ≤ inf (φ1 + φ2 )(x)


x∈F x∈F x∈F
≤ sup (φ1 + φ2 )(x) ≤ sup φ1 (x) + sup φ2 (x).
x∈F x∈F x∈F

Also, for functions φi : Rni → R̄ and sets Fi ⊂ Rni , i = 1, 2,

inf φ1 (x1 ) + inf φ2 (x2 ) = inf (φ1 (x1 ) + φ2 (x2 )),


x1 ∈F1 x2 ∈F2 (x1 ,x2 )∈F1 ×F2

sup φ1 (x1 ) + sup φ2 (x2 ) = sup (φ1 (x1 ) + φ2 (x2 )).


x1 ∈F1 x2 ∈F2 (x1 ,x2 )∈F1 ×F2

(iii) Consider an extended-valued function φ : Rn → R̄ and a set F ⊂ Rn .


Then for λ ≥ 0,

inf (λφ)(x) = λ inf φ(x) and sup (λφ)(x) = λ sup φ(x),


x∈F x∈F x∈F x∈F

provided 0 × (+∞) = 0 = 0 × (−∞).

The next result from Rockafellar and Wets [101] gives a characterization
of limit infimum of an arbitrary extended-valued function.

Lemma 1.8 For an extended-valued function φ : Rn → R̄,

lim inf φ(x) = min{α ∈ R̄ : there exists xk → x̄ satisfying φ(xk ) → α}.


x→x̄

Proof. Suppose that lim inf x→x̄ φ(x) = ᾱ. We claim that for xk → x̄ with
φ(xk ) → α, α ≥ ᾱ. As xk → x̄, for any δ > 0, there exists kδ ∈ N such that
xk ∈ Bδ (x̄) for every k ≥ kδ . Therefore,

φ(xk ) ≥ inf φ(x).


x∈Bδ (x̄)

Taking the limit as k → +∞ in the above inequality,

α≥ inf φ(x), ∀ δ > 0.


x∈Bδ (x̄)

© 2012 by Taylor & Francis Group, LLC


8 What Is Convex Optimization?

Because δ is arbitrarily chosen, so taking the limit δ ↓ 0 along with the defin-
ition of the limit infimum of φ leads to

α ≥ lim inf φ(x),


x→x̄

that is, α ≥ ᾱ.


To prove the result, we shall show that there exists a sequence xk → x̄
such that φ(xk ) → ᾱ. For a nonnegative sequence {δk }, define

ᾱk = inf φ(x).


x∈Bδk (x̄)

As δk → 0, by Definition 1.4 of limit infimum, ᾱk → ᾱ. Now for every k ∈ N,


by the definition of infimum it is possible to find xk ∈ Bδk (x̄) for which φ(xk )
is very close to ᾱk , that is, in an interval [ᾱk , αk ] where ᾱk < αk and αk → ᾱ.
Therefore, as k → +∞, xk → x̄, and φ(xk ) → ᾱ, thereby establishing the
result. 
After the characterization of limit infimum of a function, the result below
gives an equivalent characterization of lower semicontinuity of the function in
terms of the epigraph and lower level set.

Theorem 1.9 Consider a function φ : Rn → R̄. Then the following condi-


tions are equivalent:

(i) φ is lsc over Rn .


(ii) The epigraph of φ, epi φ, is a closed set in Rn × R.
(iii) The lower-level set lev≤α φ = {x ∈ Rn : φ(x) ≤ α} is closed for every
α ∈ R.

Proof. If φ ≡ ∞, the result holds trivially. So assume that dom φ is nonempty


and thus epi φ and lev≤α φ are nonempty.
We will first show that (i) implies (ii). Consider a sequence {(xk , αk )} ⊂
epi φ such that (xk , αk ) → (x̄, ᾱ). Therefore, φ(xk ) ≤ αk , which implies that

lim inf φ(x) ≤ lim inf φ(xk ) ≤ ᾱ.


x→x̄ k→∞

By the lower semicontinuity of φ,

φ(x̄) = lim inf φ(x),


x→x̄

which reduces the preceding condition to φ(x̄) ≤ ᾱ, thereby proving that epi φ
is a closed set in Rn × R.
Next we show that (ii) implies (iii). For a fixed α ∈ R, suppose
that {xk } ⊂ lev≤α φ such that xk → x̄. Therefore, φ(xk ) ≤ α, that is,
(xk , α) ∈ epi φ. By (ii), epi φ is closed, which implies (x̄, α) ∈ epi φ, that
is, φ(x̄) ≤ α. Thus, x̄ ∈ lev≤α φ, thereby yielding condition (iii).

© 2012 by Taylor & Francis Group, LLC


1.2 Basic Concepts 9

Finally, to obtain the equivalence, we will establish that (iii) implies (i).
To show that φ is lsc, we need to show that for every x̄ ∈ Rn ,

φ(x̄) ≤ lim inf φ(xk ) whenever xk → x̄.


k→∞

On the contrary, assume that for some x̄ ∈ Rn and some sequence xk → x̄,

φ(x̄) > lim inf φ(xk ),


k→∞

which implies there exists α ∈ R such that

φ(x̄) > α > lim inf φ(xk ). (1.1)


k→∞

Thus, there exists a subsequence, without relabeling, say {xk } such that
φ(xk ) ≤ α for every k ∈ N, which implies xk ∈ lev≤α φ. By (iii), the lower
level set lev≤α φ is closed and hence x̄ ∈ lev≤α φ, that is, φ(x̄) ≤ α, which
contradicts (1.1). Therefore, φ is lsc over Rn . 
The proof of the last implication, that is, (iii) implies (i) of Theorem 1.9
by contradiction was from Bertsekas [12]. We present an alternative proof for
the same from Rockafellar and Wets [101].
It is obvious that for any x̄ ∈ Rn ,

ᾱ = lim inf φ(x) ≤ φ(x̄).


x→x̄

Therefore, to establish the lower semicontinuity of φ at x̄, we need to prove


that φ(x̄) ≤ ᾱ. By Lemma 1.8, there exists a sequence {xk } ⊂ Rn with
xk → x̄ such that φ(xk ) → ᾱ. Thus, for every α > ᾱ, φ(xk ) ≤ α, which
implies xk ∈ lev≤α φ. Now if condition (iii) of the above theorem holds, that
is, lev≤α φ is closed in Rn ,

x̄ ∈ lev≤α φ, ∀ α > ᾱ.

Thus, φ(x̄) ≤ α, which leads to φ(x̄) ≤ ᾱ. Because x̄ ∈ Rn was arbitrarily


chosen, φ is lsc over Rn .
Theorem 1.9 gives equivalent characterization of lower semicontinuity of a
function. But if the function is not lsc, its epigraph is not closed. The result
below gives an equivalent characterization of the closure of the epigraph of
any arbitrary function.

Proposition 1.10 For any arbitrary extended-valued function φ : Rn → R̄,


(x̄, ᾱ) ∈ cl epi φ if and only if

lim inf φ(x) ≤ ᾱ.


x→x̄

© 2012 by Taylor & Francis Group, LLC


10 What Is Convex Optimization?

Proof. Suppose that (x̄, ᾱ) ∈ cl epi φ, which implies that there exists
{(xk , αk )} ⊂ epi φ such that (xk , αk ) → (x̄, ᾱ). Thus, taking the limit as
k → +∞, the condition

lim inf φ(x) ≤ lim inf φ(xk )


x→x̄ xk →x̄

yields

lim inf φ(x) ≤ ᾱ,


x→x̄

as desired.
Conversely, assume that lim inf x→x̄ φ(x) ≤ ᾱ but (x̄, ᾱ) 6∈ cl epi φ.
We claim that, lim inf x→x̄ φ(x) = ᾱ. On the contrary, suppose that
lim inf x→x̄ φ(x) < ᾱ. As (x̄, ᾱ) ∈
/ cl epi φ, there exists δ̄ > 0 such that for
every δ ∈ (0, δ̄),

Bδ ((x̄, ᾱ)) ∩ cl epi φ = ∅,

which implies for every (x, α) ∈ Bδ ((x̄, ᾱ)), φ(x) > α. In particular for
(x, ᾱ) ∈ Bδ ((x̄, ᾱ)), φ(x) > ᾱ, that is,

φ(x) > ᾱ, ∀ x ∈ Bδ (x̄).

Therefore, taking the limit as δ → 0 along with the definition of limit infimum
of a function yields

lim inf φ(x) ≥ ᾱ,


x→x̄

which is a contradiction. Therefore, lim inf x→x̄ φ(x) = ᾱ. By Lemma 1.8, there
exists a sequence xk → x̄ such that φ(xk ) → ᾱ. Because (xk , φ(xk )) ∈ epi φ,
(x̄, ᾱ) ∈ cl epi φ, thereby reaching a contradiction and hence the result. 
Now the question is whether it is possible to construct a function that is
the closure of the epigraph of another function. This leads to the concept of
closure of a function.

Definition 1.11 For any function φ : Rn → R̄, an lsc function that is con-
structed in such a way that its epigraph is the closure of the epigraph of φ is
called the lower semicontinuous hull or the closure of the function φ and is
denoted by cl φ. Therefore,

epi(cl φ) = cl epi φ.

Equivalently, the closure of φ is defined as

cl φ(x̄) = lim inf φ(x), ∀ x̄ ∈ Rn .


x→x̄

By Proposition 1.10, it is obvious that (x̄, ᾱ) ∈ cl epi φ if and only if


(x̄, ᾱ) ∈ epi cl φ. The function φ is said to be closed if cl φ = φ.

© 2012 by Taylor & Francis Group, LLC


1.2 Basic Concepts 11

−1 1 −1 1

epi φ epi cl φ

FIGURE 1.1: Lower semicontinuous hull.

If φ is lsc, then it is closed as well. Also cl φ is lsc and the greatest of all
lsc functions ψ such that ψ(x) ≤ φ(x) for every x ∈ Rn . From Theorem 1.9,
one has that closedness is the same as lower semicontinuity over Rn . In this
discussion, the function φ was defined over Rn . But what if φ is defined over
some subset of Rn . Then one cannot talk about the lower semicontinuity of
the function over Rn . In such a case, how is the closedness of a function related
to lower semicontinuity? This issue was addressed by Bertsekas [12]. Consider
a set F ⊂ Rn and a function φ : F → R̄. Observe that here we define φ over
the set F and not Rn . The function φ can be extended over Rn by defining a
function φ̄ : Rn → R̄ as

φ(x), x ∈ F,
φ̄(x) =
+∞, otherwise.
Note that both the extended-valued functions φ and φ̄ have the same epigraph.
Thus from the above discussion, one has φ is closed if and only if φ̄ is lsc over
Rn . Also observe that the lower semicontinuity of φ over dom φ is not sufficient
for φ to be closed. In addition, one has to assume the closedness of dom φ.
To emphasize this fact, let us consider a simple example. Consider φ : R → R̄
defined as

0, x ∈ (−1, 1),
φ(x) =
+∞, otherwise.
Here, dom φ = (−1, 1) over which the function is lsc but epi φ is not closed
and hence, φ is not closed. The closure of φ is given by

0, x ∈ [−1, 1],
cl φ(x) =
+∞, otherwise.
Observe that in Figure 1.1, epi φ is not closed while epi cl φ is closed. There-
fore, we have the following result from Bertsekas [12].

© 2012 by Taylor & Francis Group, LLC


12 What Is Convex Optimization?

Proposition 1.12 Consider F ⊂ Rn and a function φ : F → R̄. If dom φ is


closed and φ is lsc over dom φ, then φ is closed.

Because we are interested in studying the minimization problem, it is im-


portant to know whether a minimizer exists or not. In this respect, we have
the classical Weierstrass theorem, according to which “A continuous function
attains its minimum over a compact set.” For a more general version of this
theorem from Bertsekas [12], we require the notion of coercivity.

Definition 1.13 A function φ : Rn → R̄ is said to be coercive over a set


F ⊂ Rn if for every sequence {xk } ⊂ F

lim φ(xk ) = +∞ whenever kxk k → +∞.


k→∞

For F = Rn , φ is simply called coercive.

Observe that for a coercive function, every nonempty lower level set is
bounded. Below we prove the Weierstrass Theorem.

Theorem 1.14 (Weierstrass Theorem) Consider a proper lsc function


φ : Rn → R̄ and assume that one of the following holds:

(i) dom φ is bounded.

(ii) there exists α ∈ R such that the lower level set lev≤α φ is nonempty and
bounded.

(iii) φ is coercive.

Then the set of minimizers of φ over Rn is nonempty and compact.

Proof. Suppose that condition (i) holds, that is, dom φ is bounded. Because
φ is proper, φ(x) > −∞ for every x ∈ Rn and dom φ is nonempty. Denote
φinf = inf x∈Rn φ(x), which implies φinf = inf x∈dom φ φ(x). Therefore, there
exists a sequence {xk } ⊂ dom φ such that φ(xk ) → φinf . Because dom φ is
bounded, {xk } is a bounded sequence, which by Bolzano–Weierstrass Theo-
rem, Proposition 1.3, has a convergent subsequence. Without loss of generality,
assume that xk → x̄. By the lower semicontinuity of φ,

φ(x̄) ≤ lim inf φ(xk ) = lim φ(xk ) = φinf ,


k→∞ k→∞

which implies that x̄ is a point of minimizer of φ over Rn . Denote the set


of minimizers by S. Therefore, x̄ ∈ S and hence S is nonempty. Because
S ⊂ dom φ which is bounded, S is a bounded set. Also, S is the intersection
of the lower level sets lev≤α φ, where α > m. For an lsc function φ, lev≤α φ is
closed by Theorem 1.9 and thus S is closed. Hence S is compact.
Assume that condition (ii) holds; that is, for some α ∈ R, the lower level

© 2012 by Taylor & Francis Group, LLC


1.2 Basic Concepts 13

set lev≤α φ is nonempty and bounded. Consider a proper function φ̄ : Rn → R̄


defined as

φ(x), φ(x) ≤ α,
φ̄(x) =
+∞, otherwise.

Therefore, dom φ̄ = lev≤α φ which is nonempty and bounded by condition (ii).


Since φ is lsc which by Theorem 1.9 implies that dom φ̄ is closed. Also by the
lower semicontinuity of φ along with Proposition 1.12, φ̄ is closed and hence
lsc. Moreover, the set of minimizers of φ̄ is the same as that of φ. The result
follows by applying condition (i) to φ̄.
Suppose that condition (iii) is satisfied, that is, φ is coercive. Because φ
is proper, dom φ is nonempty and thus has a nonempty lower level set. By
the coercivity of φ, it is obvious that the nonempty lower level sets of φ are
bounded, thereby satisfying condition (ii), and therefore leading to the desired
result. 
As we all know, the next concept that comes to mind after limit and
continuity is the derivative of a function. Below we define this very notion.

Definition 1.15 For a scalar-valued function φ : Rn → R, the derivative of


φ at x̄ is denoted by ∇φ(x̄) ∈ Rn and is defined as

φ(x̄ + h) − φ(x̄) − h∇φ(x̄), hi


lim = 0.
khk→0 khk

Equivalently, the derivative can also be expressed as

φ(x) = φ(x̄) + h∇φ(x̄), x − x̄i + o(kx − x̄k),

o(kx − x̄)k
where limx→x̄ = 0. A function φ is differentiable if it is differen-
kx − x̄k
tiable at every x ∈ Rn . The derivative, ∇φ(x̄), of φ at x̄ is also called the
gradient of φ at x̄, which can be expressed as
 
∂φ ∂φ ∂φ
∇φ(x̄) = (x̄), (x̄), . . . , (x̄) ,
∂x1 ∂x2 ∂xn

∂φ
where , i = 1, 2, . . . , n denotes the i-th partial derivative of φ. If φ is
∂xi
continuously differentiable, that is, the map x 7→ ∇φ(x) is continuous over
Rn , then φ is called a smooth function. If φ is not smooth, it is called a
nonsmooth function.
Similar to the first-order differentiability, we have the second-order differ-
entiability notion as follows.

© 2012 by Taylor & Francis Group, LLC


14 What Is Convex Optimization?

Definition 1.16 For a scalar-valued function φ : Rn → R, the second-order


derivative of φ at x̄ is denoted by ∇2 φ(x̄) ∈ Rn×n and is defined as
1
φ(x̄ + h) − φ(x̄) − h∇φ(x̄), hi − h∇2 φ(x̄)h, hi
lim 2 = 0,
khk→0 khk2

which is equivalent to

φ(x) = φ(x̄) + h∇φ(x̄), x − x̄i + h∇2 φ(x̄)(x − x̄), x − x̄i + o(kx − x̄k2 ).

The matrix ∇2 φ(x̄) is also referred to as the Hessian with the ij-th entry of
∂2φ
the matrix being the second-order partial derivative (x̄). If φ is twice
∂xi ∂xj
2
continuously differentiable, then the matrix ∇ φ(x̄) is a symmetric matrix.
In the above definitions we considered the function φ to be a scalar-valued
function. Next we define the notion of differentiability for a vector-valued
function Φ.

Definition 1.17 For a vector-valued function Φ : Rn → Rm , the derivative


of Φ at x̄ is denoted by JΦ(x̄) ∈ Rm×n and is defined as

kΦ(x̄ + h) − Φ(x̄) − hJΦ(x̄), hik


lim = 0.
khk→0 khk

The matrix JΦ(x̄) is also called the Jacobian of Φ at x̄. If Φ = (φ1 , φ2 , . . . , φm ),


Φ is differentiable if each φi : Rn → R, i = 1, 2, . . . , m is differentiable. The
Jacobian of Φ at x̄ can be expressed as
 
∇φ1 (x̄)
 ∇φ2 (x̄) 
 
JΦ(x̄) =  .. 
 . 
∇φm (x̄)

∂φi
with the ij-th entry of the matrix being the partial derivative (x̄). In
∂xj
the above expression of JΦ(x̄), the vectors ∇φ1 (x̄), ∇φ2 (x̄), . . . , ∇φm (x̄) are
written as row vectors.
Observe that the derivative is a local concept and it is defined at a point
x if x ∈ int dom φ. Below we state the Mean Value Theorem, which plays a
pivotal role in the study of optimality conditions.

Theorem 1.18 (Mean Value Theorem) Consider a continuously differen-


tiable function φ : Rn → R. Then for every x, y ∈ Rn , there exists z ∈ [x, y]
such that

φ(y) − φ(x) = h∇φ(z), y − xi.

© 2012 by Taylor & Francis Group, LLC


1.3 Smooth Convex Optimization 15

With all these basic concepts we now move on to the study of convexity.
The importance of convexity in optimization stems from the fact that when-
ever we minimize a convex function over a convex set, every local minimum
is a global minimum. Many other issues in optimization depend on convexity.
However, convex functions suffer from the drawback that they need not be
differentiable at every point of their domain of definition and the nondiffer-
entiability may be precisely at the point where the minimum is achieved. For
instance, consider the minimization of the absolute value function, |x|, over R.
At the point of minima, x̄ = 0, the function is nondifferentiable. How this ma-
jor difficulty was overcome by the development of a completely different type
of analysis is possibly one of the most thrilling developments in optimization
theory. This analysis depends on set-valued maps, which we briefly present
below.
Definition 1.19 A set-valued map Φ from Rn to Rm associates every x ∈ Rn
to a set in Rm ; that is, for every x ∈ Rn , Φ(x) ⊂ Rm . Symbolically it is
expressed as Φ : Rn ⇉ Rm . A set-valued map is associated with its graph
defined as
gph Φ = {(x, y) ∈ Rn × Rm : y ∈ Φ(x)}.
Φ is said to be a proper map if there exists x ∈ Rn such that Φ(x) 6= ∅. Φ is said
to be closed-valued or convex-valued or bounded-valued if for every x ∈ Rn , the
sets Φ(x) are closed or convex or bounded, respectively. Φ is locally bounded
at x̄ ∈ Rn if there exists δ > 0 and a bounded set F ⊂ Rn such that
Φ(x) ⊂ V, ∀ x ∈ Bδ (x̄).
The set-valued map Φ is said to be closed if it has a closed graph; that is, for
any sequence {xk } ⊂ Rn with xk → x̄ and yk ∈ Φ(xk ) with yk → ȳ, ȳ ∈ Φ(x̄).
A set-valued map Φ : Rn → Rm is said to be upper semicontinuous (usc) at
x̄ ∈ Rn if for any ε > 0, there exists δ > 0 such that
Φ(x) ⊂ Φ(x̄) + εB, ∀ x ∈ Bδ (x̄),
where the balls are in the respective spaces. If Φ is locally bounded and has a
closed graph, then it is usc. If Φ is single-valued, that is, Φ(x) is singleton for
every x, the upper semicontinuity of Φ coincides with continuity.
For more on set-valued maps, the readers may refer to Berge [10]. A de-
tailed analysis of convex function appears in Chapter 2.

1.3 Smooth Convex Optimization


Recall the convex optimization problem (CP ) stated in Section 1.1, that is,

© 2012 by Taylor & Francis Group, LLC


16 What Is Convex Optimization?

(y, f (y))

(x, f (x)) (y, f (x) + ∇f (x)(y − x))


graph of f

x y x

FIGURE 1.2: Graph of a real-valued differentiable convex function.

min f (x) subject to x ∈ C, (CP )


where f : Rn → R is a convex function and C is a closed convex set in Rn . Let
us additionally assume that f is differentiable. It is mentioned in Chapter 2
that if f is differentiable, then for any x ∈ Rn ,
f (y) − f (x) ≥ h∇f (x), y − xi, ∀ y ∈ Rn .
Conversely, if the above relation holds for a function, then the function is
convex. This fact appears as Theorem 2.81 in the next chapter. It is mentioned
there as a consequence of more general facts. However, we provide a direct
proof here.
Observe that if f is convex, then for any x, y ∈ Rn and any λ ∈ [0, 1],
(1 − λ)f (x) + λf (y) ≥ f (x + λ(y − x)).
Hence, for any λ ∈ (0, 1),
f (x + λ(y − x)) − f (x)
f (y) − f (x) ≥ .
λ
Taking the limit as λ ↓ 0, the above inequality yields
f (y) − f (x) ≥ h∇f (x), y − xi. (1.2)
For the converse, suppose that (1.2) holds for any x, y ∈ Rn . Setting
z = x + λ(y − x) with λ ∈ (0, 1), then
f (y) − f (z) ≥ h∇f (z), y − zi (1.3)
f (x) − f (z) ≥ h∇f (z), x − zi (1.4)

© 2012 by Taylor & Francis Group, LLC


1.3 Smooth Convex Optimization 17

The result is obtained by simply multiplying (1.3) with λ and (1.4) with
(1 − λ) and then adding them up. This description geometrically means that
the tangent plane should always lie below the graph of the function. For a
convex function f : R → R, it looks something like Figure 1.2. This important
characterization of a convex function leads to the following result.

Theorem 1.20 Consider the convex optimization problem (CP ) where f is


a differentiable convex function and C is a closed convex set in Rn . Then x̄
is a point of minimizer of (CP ) if and only if

h∇f (x̄), x − x̄i ≥ 0, ∀ x ∈ C. (1.5)

Proof. It is simple to see that as C is a convex set, for x ∈ C,

x̄ + λ(x − x̄) ∈ C, ∀ λ ∈ [0, 1].

Therefore, if x̄ is a point of minimum,

f (x̄ + λ(x − x̄)) ≥ f (x̄),

that is,

f (x̄ + λ(x − x̄)) − f (x̄) ≥ 0.

Dividing both sides by λ > 0 and taking the limit as λ ↓ 0 leads to

h∇f (x̄), x − x̄i ≥ 0.

Because x ∈ C was arbitrarily chosen,

h∇f (x̄), x − x̄i ≥ 0, ∀ x ∈ C.

Also as f is convex, by the condition (1.2), for any x ∈ C,

f (x) − f (x̄) ≥ h∇f (x̄), x − x̄i.

Now if (1.5) is satisfied, then the above inequality reduces to

f (x) ≥ f (x̄), ∀ x ∈ C,

thereby proving the requisite result. 

Remark 1.21 Expressing the optimality condition in the form of (1.5) leads
to what is called a variational inequality. Let F : Rn → Rn be a given function
and C be a closed convex set in Rn . Then the variational inequality V I(F, C)
is the problem of finding x̄ ∈ C such that

hF (x̄), x − x̄i ≥ 0, ∀ x ∈ C.

© 2012 by Taylor & Francis Group, LLC


18 What Is Convex Optimization?

C λ = λ0

x x̄

FIGURE 1.3: Local minimizer is global minimizer.

When f is a differentiable convex function, for F = ∇f , V I(∇f, C) is nothing


but the condition (1.5). In order to solve V I(F, C) efficiently, one needs an
additional property on F which is monotonicity. A function F : Rn → Rn is
called monotone if for any x, y ∈ Rn ,

hF (y) − F (x), y − xi ≥ 0.

However, when f is a convex function, one has the following pleasant property.

Theorem 1.22 A differentiable function f is convex if and only if ∇f is


monotone.

For proof, see Rockafellar [97]. However, the reader should try to prove it on
his/her own. We have shown that when (CP ) has a smooth f , one can write
down a necessary and sufficient condition for a point x̄ ∈ C to be a global
minimizer of (CP ). In fact, as already mentioned, the importance of studying
convexity in optimization stems from the following fact. For the problem (CP ),
every local minimizer is a global minimizer irrespective of the fact whether f
is smooth or not. This can be proved in a simple way as follows. If x̄ is a local
minimizer of (CP ), then there exists δ > 0 such that

f (x) ≥ f (x̄), ∀ x ∈ C ∩ Bδ .

Now consider any x ∈ C. Then it is easy to observe from Figure 1.3 that there
exists λ0 ∈ (0, 1) such that for every λ ∈ (0, λ0 ),

λx + (1 − λ)x̄ ∈ C ∩ Bδ .

Hence

f (λx + (1 − λ)x̄) ≥ f (x̄).

© 2012 by Taylor & Francis Group, LLC


1.3 Smooth Convex Optimization 19

The convexity of f shows that

λ(f (x) − f (x̄)) ≥ 0.

Because λ > 0, f (x) ≥ f (x̄). As x ∈ C was arbitrary, our claim is established.


The result can also be obtained using the approach of contradiction as done
in Theorem 2.90.
Now consider the following function

θ(x) = sup h∇f (x), x − yi.


y∈C

The interesting feature of the function is that

θ(x) ≥ 0, ∀ x ∈ C

and if θ(x) = 0 for x ∈ C, then x solves the problem (CP ). Furthermore, if x


solves the problem (CP ), we have θ(x) = 0. The function θ is usually called the
gap function or the merit function associated with (CP ). For the variational
inequality problem, such a function was first introduced by Auslender [5]. The
next question is how useful is the function θ to the problem (CP ). What we
will now show is that for certain classes of the problem (CP ), the function
θ can provide an error bound for the problem (CP ). By an error bound we
mean an upper estimate of the distance of a point in C to the solution set
of (CP ). The class of convex optimization problems where such a thing can
be achieved is the class of strongly convex optimization problems. A function
f : Rn → R is strongly convex with modulus of strong convexity ρ > 0 if for
any x, y ∈ Rn and λ ∈ [0, 1],

(1 − λ)f (x) + λf (y) ≥ f (x + λ(y − x)) + ρλ(1 − λ)ky − xk2 .

If f is differentiable, then f is strongly convex if and only if for any x, y ∈ Rn ,

f (y) − f (x) ≥ h∇f (x), y − xi + ρky − xk2 .


Observe that f (x) = (1/2)hx, Axi, where x ∈ Rn and A is a positive definite n × n matrix, is strongly convex with ρ = λmin (A)/2, where λmin (A) is the minimum eigenvalue of A, while f (x) = x with x ∈ R is not strongly convex.
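The claim about the quadratic can be checked numerically. The following is a minimal sketch (not from the text; the matrix is an illustrative assumption) verifying the first-order strong convexity inequality with modulus ρ = λmin (A)/2:

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.standard_normal((3, 3))
    A = M.T @ M + np.eye(3)                  # symmetric positive definite matrix
    rho = np.linalg.eigvalsh(A).min() / 2    # modulus of strong convexity of f below

    f = lambda x: 0.5 * x @ A @ x
    grad = lambda x: A @ x

    # check f(y) - f(x) >= <grad f(x), y - x> + rho ||y - x||^2 on random pairs
    for _ in range(1000):
        x, y = rng.standard_normal(3), rng.standard_normal(3)
        lhs = f(y) - f(x)
        rhs = grad(x) @ (y - x) + rho * np.linalg.norm(y - x) ** 2
        assert lhs >= rhs - 1e-10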
If f is a twice continuously differentiable strongly convex function, then
∇2 f (x) is always positive definite for each x. Now if f is strongly convex with
modulus of convexity ρ > 0, then for any x, y ∈ Rn ,

f (y) − f (x) ≥ h∇f (x), y − xi + ρky − xk2 ,


f (x) − f (y) ≥ h∇f (y), x − yi + ρky − xk2 .

Adding the above inequalities leads to

0 ≥ h∇f (x), y − xi + h∇f (y), x − yi + 2ρky − xk2 ,


that is,
h∇f (y) − ∇f (x), y − xi ≥ 2ρky − xk2 . (1.6)
The property of ∇f given by (1.6) is called strong monotonicity with 2ρ as the
modulus of monotonicity. It is in fact interesting to observe the following. Suppose that f : Rn → R is a differentiable function for which there exists ρ > 0 such that for every x, y ∈ Rn ,

h∇f (y) − ∇f (x), y − xi ≥ 2ρky − xk2 .

Because ρ > 0, this in particular implies that

h∇f (y) − ∇f (x), y − xi ≥ ρky − xk2 , ∀ x, y ∈ Rn .

Now we request the reader to show that f is strongly convex with modulus
ρ > 0. In fact, if f is strongly convex with ρ > 0 one can also show that ∇f
is strongly monotone with ρ > 0. Thus we conclude that f is strongly convex
with modulus of strong convexity ρ > 0 if and only if ∇f is strongly monotone
with modulus of monotonicity ρ > 0.
It is important to note that one cannot guarantee θ to be finite unless C
has some additional conditions, for example, C is compact. Assume that C
is compact and let x̄ be a solution of the problem (CP ), where f is strongly
convex. (Think why a solution should exist.) Now as f is strongly convex, it
is simple enough to see that x̄ is the unique solution of (CP ). Thus from the
definition of θ, for any x ∈ C and y = x̄,

θ(x) ≥ h∇f (x), x − x̄i.

By strong convexity of f with ρ > 0 as the modulus of strong convexity, ∇f


is strongly monotone with modulus 2ρ. Thus,

h∇f (x), x − x̄i ≥ h∇f (x̄), x − x̄i + 2ρkx − x̄k2 ,

thereby yielding

θ(x) ≥ h∇f (x̄), x − x̄i + 2ρkx − x̄k2 . (1.7)

But by the optimality condition in Theorem 1.20,

h∇f (x̄), x − x̄i ≥ 0, ∀ x ∈ C.

Therefore, the inequality (1.7) reduces to

θ(x) ≥ 2ρkx − x̄k2 ,

which leads to
kx − x̄k ≤ √(θ(x)/(2ρ)).


This provides an error bound for (CP ), where f is strongly convex and C is
compact. In this derivation if ∇f was strongly monotone with modulus ρ > 0,
then the error bound will have the expression
kx − x̄k ≤ √(θ(x)/ρ).

Observe that as ρ > 0,

√(θ(x)/(2ρ)) ≤ √(θ(x)/ρ),

and hence the error bound obtained by using the strong monotonicity of ∇f with modulus 2ρ is the sharper of the two.
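The gap function and the resulting error bound are easy to compute when C is a box, since the supremum of the linear function h∇f (x), x − yi over a box is attained coordinatewise at a vertex. The following minimal numerical sketch (not from the text; the data are illustrative assumptions) does this for f (x) = (1/2)kx − ck2 , whose strong convexity modulus in the above sense is ρ = 1/2, and checks the bound kx − x̄k ≤ √(θ(x)/(2ρ)):

    import numpy as np

    lo, hi = np.zeros(3), np.ones(3)          # C = [0,1]^3, a compact convex box
    c = np.array([0.2, 0.7, 0.4])             # lies inside C, so xbar = c solves (CP)
    grad = lambda x: x - c                    # gradient of f(x) = 0.5 ||x - c||^2
    rho = 0.5                                 # strong convexity modulus of this f
    xbar = c

    def theta(x):
        g = grad(x)
        # sup_{y in C} <g, x - y> = <g, x> - min_{y in C} <g, y>, solved per coordinate
        y_min = np.where(g >= 0, lo, hi)
        return g @ (x - y_min)

    rng = np.random.default_rng(2)
    for _ in range(1000):
        x = rng.uniform(lo, hi)               # arbitrary feasible point
        assert np.linalg.norm(x - xbar) <= np.sqrt(theta(x) / (2 * rho)) + 1e-10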
Now the question is can we design a merit function for (CP ) that can be
used to develop an error bound even when C is noncompact. Such a merit func-
tion was first developed by Fukushima [48] for general variational inequalities.
In our context, consider the function given by

θ̂α (x) = sup_{y∈C} { h∇f (x), x − yi − (α/2)ky − xk2 }, α > 0.

It will be an interesting exercise for the reader to show that

θ̂α (x) ≥ 0, ∀ x ∈ C

and θ̂α (x) = 0 for x ∈ C if and only if x is a solution of (CP ). Observe that
θ̂α (x) = − inf_{y∈C} { h∇f (x), y − xi + (α/2)ky − xk2 }, α > 0.
For a fixed x, observe that the function

φ_x^α (y) = h∇f (x), y − xi + (α/2)ky − xk2

is a strongly convex function and is coercive (Definition 1.13). Hence φ_x^α attains its infimum on C. The point of minimum is unique as φ_x^α is strongly convex. Hence for each x, the function φ_x^α has a finite minimum value. Thus θ̂α (x) is always finite, thereby leading to the following error bound.

Theorem 1.23 Consider the convex optimization problem (CP ) where f is


a differentiable strongly convex function with modulus ρ > 0 and C is a closed
convex set in Rn . Let x̄ ∈ C be the unique solution of (CP ). Furthermore, if
ρ > α/2, then for any x ∈ C,

kx − x̄k ≤ √(2θ̂α (x)/(2ρ − α)).


Proof. For any x ∈ C and y = x̄ in particular,


θ̂α (x) ≥ h∇f (x), x − x̄i − (α/2)kx − x̄k2 .
By the fact that ∇f is strongly monotone with modulus ρ > 0,
θ̂α (x) ≥ h∇f (x̄), x − x̄i + ρkx − x̄k2 − (α/2)kx − x̄k2 . (1.8)
Because x̄ is the unique point of minimizer of (CP ), by Theorem 1.20,

h∇f (x̄), x − x̄i ≥ 0,

thereby reducing the inequality (1.8) to


θ̂α (x) ≥ (ρ − α/2)kx − x̄k2 .
Therefore,
kx − x̄k ≤ √(2θ̂α (x)/(2ρ − α)),

as desired. 
The reader is urged to show that under the hypothesis of the above theorem, one can prove a tighter error bound of the form

kx − x̄k ≤ √(2θ̂α (x)/(4ρ − α)).
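To see θ̂α at work on a noncompact feasible set, consider the following minimal numerical sketch (not from the text; the data and the choice C = R^3_+ are illustrative assumptions). The infimum defining θ̂α is attained at the projection of x − ∇f (x)/α onto C, which for the nonnegative orthant is a componentwise maximum with zero; the bound of Theorem 1.23 is then checked with ρ = 1/2 and α = 1/2:

    import numpy as np

    c = np.array([-1.0, 2.0, 0.5])             # data of f(x) = 0.5 ||x - c||^2
    grad = lambda x: x - c
    rho, alpha = 0.5, 0.5                      # rho > alpha/2, as Theorem 1.23 requires
    xbar = np.maximum(c, 0.0)                  # unique minimizer of f over C = R^3_+

    def theta_hat(x):
        g = grad(x)
        y = np.maximum(x - g / alpha, 0.0)     # minimizer of <g, y-x> + (alpha/2)||y-x||^2 over C
        return -(g @ (y - x) + 0.5 * alpha * np.linalg.norm(y - x) ** 2)

    rng = np.random.default_rng(3)
    for _ in range(1000):
        x = rng.uniform(0.0, 5.0, size=3)      # feasible points in R^3_+
        bound = np.sqrt(2 * theta_hat(x) / (2 * rho - alpha))
        assert np.linalg.norm(x - xbar) <= bound + 1e-10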

The study of optimality conditions with C explicitly given by functional constraints will begin in Chapter 3.

Chapter 2
Tools for Convex Optimization

2.1 Introduction
With the basic concepts discussed in the previous chapter, we devote this chapter to the study of concepts related to convex analysis. Convex analysis is the branch of mathematics that studies convex objects, namely, convex sets, convex functions, and convex optimization theory. These concepts will be used in the subsequent chapters to discuss the details of convex optimization theory and in the development of the book.

2.2 Convex Sets


Recall that for any x, y ∈ Rn , the set
[x, y] = {z ∈ Rn : z = (1 − λ)x + λy, 0 ≤ λ ≤ 1}
denotes the line segment joining the points x and y. The open line segment
joining x and y is given by
(x, y) = {z ∈ Rn : z = (1 − λ)x + λy, 0 < λ < 1}.
Definition 2.1 A set F ⊂ Rn is a convex set if
λx + (1 − λ)y ∈ F, ∀ x, y ∈ F, ∀ λ ∈ [0, 1].
Equivalently, for any x, y ∈ F , the line segment [x, y] is contained in F . Figure 2.1 presents convex and nonconvex sets.
Consider the hyperplane defined as
H(a, b) = {x ∈ Rn : ha, xi = b},
where a ∈ Rn and b ∈ R. Observe that it is a convex set. Similarly, the closed
half spaces given by
H≤ (a, b) = {x ∈ Rn : ha, xi ≤ b} and H≥ (a, b) = {x ∈ Rn : ha, xi ≥ b},


FIGURE 2.1: Convex and nonconvex sets.

and the open half spaces given by


H< (a, b) = {x ∈ Rn : ha, xi < b} and H> (a, b) = {x ∈ Rn : ha, xi > b}
are also convex. Another class of sets that are also convex is the affine sets.
Definition 2.2 A set M ⊂ Rn is said to be an affine set if
(1 − λ)x + λy ∈ M, ∀ x, y ∈ M, ∀ λ ∈ R,
where the set {z ∈ Rn : z = (1 − λ)x + λy, λ ∈ R} denotes the line passing
through x and y. Equivalently, M ⊂ Rn is affine if for any x, y ∈ M , the line
passing through them is contained in M .
Note that a hyperplane is an example of an affine set. The empty set ∅ and
the whole space Rn are the trivial examples of affine sets. Even though affine
sets are convex, the converse need not be true, as is obvious from the example
of half spaces.
Next we state some basic properties of convex sets.
Proposition 2.3 (i) The intersection of an arbitrary collection of convex sets
is convex.
(ii) For two convex sets F1 , F2 ⊂ Rn , F1 + F2 is convex.
(iii) For a convex set F ⊂ Rn and scalar λ ∈ R, λF is convex.
(iv) For a convex set F ⊂ Rn and scalars λ1 ≥ 0 and λ2 ≥ 0,
(λ1 + λ2 )F = λ1 F + λ2 F
which is convex.
Proof. The properties (i)-(iii) can be established by simply using Defini-
tion 2.1. The readers are urged to prove (i)-(iii) on their own. Here we will
prove only (iv). Consider z ∈ (λ1 + λ2 )F . Thus, there exists x ∈ F such that
z = (λ1 + λ2 )x = λ1 x + λ2 x ∈ λ1 F + λ2 F.


Because z ∈ (λ1 + λ2 )F was arbitrary,

(λ1 + λ2 )F ⊂ λ1 F + λ2 F. (2.1)

Conversely, let z ∈ λ1 F + λ2 F , which implies that there exist x1 , x2 ∈ F such


that

z = λ1 x1 + λ2 x2 = (λ1 + λ2 ) [ (λ1 /(λ1 + λ2 )) x1 + (λ2 /(λ1 + λ2 )) x2 ]. (2.2)

Here we may assume λ1 + λ2 > 0, since for λ1 = λ2 = 0 both sides equal {0}. Because λi /(λ1 + λ2 ) ∈ [0, 1], i = 1, 2, the convexity of F implies that

x = (λ1 /(λ1 + λ2 )) x1 + (λ2 /(λ1 + λ2 )) x2 ∈ F. (2.3)
Combining the conditions (2.2) and (2.3) lead to

z = (λ1 + λ2 )x ∈ (λ1 + λ2 )F.

As z ∈ λ1 F + λ2 F was arbitrarily chosen,

(λ1 + λ2 )F ⊃ λ1 F + λ2 F,

which along with the inclusion (2.1) yields the desired equality. Observe that
(ii) and (iii) lead to the convexity of (λ1 + λ2 )F = λ1 F + λ2 F . 
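It is worth noting, through a small example not taken from the text, that convexity is essential in (iv): for the nonconvex set F = {0, 1} ⊂ R and λ1 = λ2 = 1, one has (λ1 + λ2 )F = {0, 2}, whereas λ1 F + λ2 F = {0, 1, 2}, so only the inclusion (2.1) survives.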
From Proposition 2.3, it is obvious that intersection of finitely many closed
half spaces is again a convex set. Such sets that can be expressed in this
form are called polyhedral sets. These sets play an important role in linear
programming problems. We will deal with polyhedral sets later in this chapter.
However, unlike the intersection and sum of convex sets, the union as well
as the complement of convex sets need not be convex. For instance, consider
the sets

F1 = {(x, y) ∈ R2 : x2 + y 2 ≤ 1} and F2 = {(x, y) ∈ R2 : y ≥ x2 }.

Observe from Figure 2.2 that both F1 and F2 along with their intersection are
convex sets but neither their complements nor the union of these two sets is
convex.
To overcome such situations where nonconvex sets come into the picture
in convex analysis, one has to convexify the nonconvex sets. This leads to the
notion of convex combination and convex hull.

Definition 2.4 A point x ∈ Rn is said to be a convex combination of the


points x1 , x2 , . . . , xm ∈ Rn if

x = λ1 x1 + λ2 x2 + . . . + λm xm
with λi ≥ 0, i = 1, 2, . . . , m and Σ_{i=1}^m λi = 1.


FIGURE 2.2: F1 , F2 , and F1 ∩ F2 are convex while F1c , F2c , and F1 ∪ F2 are nonconvex.

The next result expresses the concept of convex set in terms of the convex
combination of its elements.

Theorem 2.5 A set F ⊂ Rn is convex if and only if it contains all the convex
combinations of its elements.

Proof. From Definition 2.1 of convex set, F ⊂ Rn is convex if and only if

(1 − λ)x1 + λx2 ∈ F, ∀ x1 , x2 ∈ F, λ ∈ [0, 1],

that is, the convex combination for m = 2 belongs to F .


To establish the result, we will use the induction approach. Suppose that
the convex combination of m = l − 1 elements of F belong to F . Consider
m = l. The convex combination of l elements is

x = λ1 x1 + λ2 x2 + . . . + λl xl ,

where xi ∈ F and λi ≥ 0, i = 1, 2, . . . , l with Σ_{i=1}^l λi = 1. Because Σ_{i=1}^l λi = 1, there exists at least one λj > 0 for some j ∈ {1, 2, . . . , l}. If λj = 1, then x = xj ∈ F and we are done, so assume λj ∈ (0, 1). Denote

x̃ = λ̃1 x1 + . . . + λ̃j−1 xj−1 + λ̃j+1 xj+1 + . . . + λ̃l xl ,

where λ̃i = λi /(1 − λj ) ≥ 0, i = 1, . . . , j − 1, j + 1, . . . , l. Observe that Σ_{i=1, i≠j}^l λ̃i = 1 and thus x̃ ∈ F because it is a convex combination of l − 1
elements of F . The element x can now be expressed in terms of xj and x̃ as

x = λj xj + (1 − λj )x̃.

Therefore, x is a convex combination of two elements from F , which implies


that x ∈ F . Thus, the convex set F can be equivalently expressed as a convex
combination of its elements, as desired. 
Similar to the concept of convex combination of points, next we introduce
the notion of the convex hull of a set.

Definition 2.6 The convex hull of a set F ⊂ Rn is the smallest convex set
containing F and is denoted by co F . It is basically nothing but the intersection
of all the convex sets containing F .

Further, the convex hull of F can be expressed in terms of the convex


combination of the elements of F as presented in the theorem below.

Theorem 2.7 For any set F ⊂ Rn , the convex hull of F , co F , consists of


all the convex combinations of the elements of F , that is,
co F = {x ∈ Rn : x = Σ_{i=1}^m λi xi , xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, Σ_{i=1}^m λi = 1, m ≥ 0}.

Proof. Denote the set of all convex combinations of the elements of F by F̂, that is,

F̂ = {x ∈ Rn : x = Σ_{i=1}^m λi xi , xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, Σ_{i=1}^m λi = 1, m ≥ 0}.

From Definition 2.6, co F is the smallest convex set containing F . Therefore,


F ⊂ co F . By Theorem 2.5, the convex combination of the elements of F also
belong to the convex set co F , that is,

co F ⊃ F̂. (2.4)


To establish the result, we will show that F̂ is also convex. Suppose that x, x̃ ∈ F̂, which implies there exist xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, with Σ_{i=1}^m λi = 1 and x̃i ∈ F, λ̃i ≥ 0, i = 1, 2, . . . , l, with Σ_{i=1}^l λ̃i = 1 such that

x = λ1 x1 + λ2 x2 + . . . + λm xm ,
x̃ = λ̃1 x̃1 + λ̃2 x̃2 + . . . + λ̃l x̃l .

For any λ ∈ [0, 1],

(1 − λ)x + λx̃ = (1 − λ)λ1 x1 + . . . + (1 − λ)λm xm + λλ̃1 x̃1 + . . . + λλ̃l x̃l .

Observe that (1 − λ)λi ≥ 0, i = 1, 2, . . . , m and λλ̃i ≥ 0, i = 1, 2, . . . , l, satisfying

(1 − λ) Σ_{i=1}^m λi + λ Σ_{i=1}^l λ̃i = 1.

Thus, for any λ ∈ [0, 1],

(1 − λ)x + λx̃ ∈ F̂.

As x, x̃ ∈ F̂ were arbitrary, the above relation holds for any x, x̃ ∈ F̂, thereby implying the convexity of F̂. Also, F ⊂ F̂. Therefore, by Definition 2.6 of convex hull, co F ⊂ F̂, which along with the inclusion (2.4) leads to the
desired result. 
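As a computational aside (a minimal sketch, not from the text; it assumes NumPy and SciPy are available and all names are illustrative), the above characterization reduces the question of whether a point x lies in the convex hull of finitely many points to the feasibility of the system Σ_i λi xi = x, Σ_i λi = 1, λi ≥ 0, which can be tested by a linear program with a zero objective:

    import numpy as np
    from scipy.optimize import linprog

    def in_convex_hull(points, x):
        pts = np.asarray(points, dtype=float)       # shape (m, n), the finite set F
        m, n = pts.shape
        A_eq = np.vstack([pts.T, np.ones((1, m))])  # rows: the n coordinates and the weight sum
        b_eq = np.append(np.asarray(x, dtype=float), 1.0)
        res = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * m, method="highs")
        return res.status == 0                      # status 0: a feasible weight vector exists

    square = [(0, 0), (1, 0), (0, 1), (1, 1)]
    print(in_convex_hull(square, (0.25, 0.75)))     # True: the point lies in the unit square
    print(in_convex_hull(square, (1.5, 0.5)))       # False: the point lies outside the hull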
It follows from the above discussion that a set F ⊂ Rn is convex if co F = F
and thus equivalent to the fact that a convex set F ⊂ Rn contains all the
convex combinations of the elements of F . From the above theorem we observe
that co F is expressed as a convex combination of m elements of F , where
m ≥ 0 is arbitrary. But the obvious question is how large this m has to be
chosen in the result. This is answered in the famous Carathéodory Theorem,
which we present next. Though one finds various approaches to prove the
result [12, 97, 101], we present a simple proof from Mangasarian [82].

Theorem 2.8 (Carathéodory Theorem) Consider a nonempty set F ⊂ Rn .


Then any point of the convex hull of F is representable as a convex combina-
tion of at most n + 1 points of F .

Proof. From Theorem 2.7, any element in co F can be expressed as a convex


combination of m elements of F . We have to show that m ≤ (n + 1). Suppose
that x ∈ co F , which implies that there exist xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, with Σ_{i=1}^m λi = 1 such that

x = λ1 x1 + λ2 x2 + . . . + λm xm .

Assume that m > (n + 1). We will prove that x can be expressed as a convex
combination of (m − 1) elements. The result can be established by applying


the reduction process until m = n + 1. In case for some i ∈ {1, 2, . . . , m},


λi = 0, then x is a convex combination of (m − 1) elements of F .
Suppose that λi > 0, i = 1, 2, . . . , m. It is known that for l > n, any l
elements in Rn are linearly dependent. As m − 1 > n, there exist αi ∈ R,
i = 1, 2, . . . , m − 1, not all zeroes, such that

α1 (x1 − xm ) + α2 (x2 − xm ) + . . . + αm−1 (xm−1 − xm ) = 0.

Define αm = −(α1 + α2 + . . . + αm−1 ). Observe that


Σ_{i=1}^m αi = 0 and Σ_{i=1}^m αi xi = 0. (2.5)

Define λ̃i = λi − γαi , i = 1, 2, . . . , m, where γ > 0 is chosen such that λ̃i ≥ 0,


i ∈ {1, 2, . . . , m} and for some j ∈ {1, 2, . . . , m}, λ̃j = 0. This is possible by
taking

1/γ = max_{i=1,...,m} (αi /λi ) = αj /λj .

By choice, λ̃j = 0 and λ̃i ≥ 0, i = 1, . . . , j − 1, j + 1, . . . , m, which by the


condition (2.5) yields
Σ_{i=1, i≠j}^m λ̃i = Σ_{i=1}^m λ̃i = Σ_{i=1}^m λi − γ Σ_{i=1}^m αi = 1

and
x = Σ_{i=1}^m λi xi = Σ_{i=1}^m λ̃i xi + γ Σ_{i=1}^m αi xi = Σ_{i=1, i≠j}^m λ̃i xi ,

which implies that x is now expressed as a convex combination of (m − 1)


elements of F . This reduction can be carried out until m = (n + 1) and thus
any element in the convex hull of F is representable as a convex combination
of at most (n + 1) elements of F , as desired. 
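The reduction used in the above proof is completely constructive and can be carried out numerically. The following minimal sketch (not from the text; all names are illustrative) obtains the αi of condition (2.5) from a null-space vector and applies the weight update λ̃i = λi − γαi until at most n + 1 points remain:

    import numpy as np

    def caratheodory(points, weights, tol=1e-12):
        pts = np.asarray(points, dtype=float)    # shape (m, n)
        lam = np.asarray(weights, dtype=float)   # nonnegative weights summing to 1
        while True:
            keep = lam > tol                     # drop points whose weight has become zero
            pts, lam = pts[keep], lam[keep]
            m, n = pts.shape
            if m <= n + 1:
                return pts, lam
            # alpha_1,...,alpha_{m-1} not all zero with sum_i alpha_i (x_i - x_m) = 0,
            # and alpha_m = -(alpha_1 + ... + alpha_{m-1}), as in condition (2.5)
            A = (pts[:-1] - pts[-1]).T           # n x (m-1) with m-1 > n, hence rank deficient
            alpha = np.linalg.svd(A)[2][-1]      # a null-space vector of A
            alpha = np.append(alpha, -alpha.sum())
            gamma = 1.0 / np.max(alpha / lam)    # 1/gamma = max_i alpha_i / lam_i > 0
            lam = lam - gamma * alpha            # stays >= 0, sums to 1, one entry hits zero

    pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
    lam = np.full(5, 0.2)                        # x = (0.5, 0.5) as a combination of 5 points
    x = lam @ pts
    new_pts, new_lam = caratheodory(pts, lam)
    assert np.allclose(new_lam @ new_pts, x) and len(new_lam) <= 3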
Using the above theorem, we have the following important result for a
compact set from Bertsekas [12] and Rockafellar [97].

Theorem 2.9 For a compact set F ⊂ Rn , its convex hull co F is also a


compact set.

Proof. We claim that co F is closed. Consider a sequence {zk } ⊂ co F . By


the Carathéodory Theorem, Theorem 2.8, there exist sequences {λ_i^k} ⊂ R+ , i = 1, 2, . . . , n + 1, satisfying Σ_{i=1}^{n+1} λ_i^k = 1 and {x_i^k} ⊂ F , i = 1, 2, . . . , n + 1, such that

zk = Σ_{i=1}^{n+1} λ_i^k x_i^k , ∀ k ∈ N. (2.6)

As for any k ∈ N, λ_i^k ≥ 0, i = 1, 2, . . . , n + 1 with Σ_{i=1}^{n+1} λ_i^k = 1, {λ_i^k} is a
bounded sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3,
{λ_i^k}, i = 1, 2, . . . , n + 1, has a convergent subsequence. Without loss of generality, assume that λ_i^k → λi , i = 1, 2, . . . , n + 1, such that λi ≥ 0, i = 1, 2, . . . , n + 1 with Σ_{i=1}^{n+1} λi = 1. By the compactness of F , the sequence {x_i^k}, i = 1, 2, . . . , n + 1, is bounded. Again by the Bolzano–Weierstrass Theorem, {x_i^k} has a convergent subsequence. Without loss of generality, let x_i^k → xi , i = 1, 2, . . . , n + 1. By the compactness of F , F is a closed set and thus xi ∈ F, i = 1, 2, . . . , n + 1. Taking the limit as k → +∞, (2.6) yields that
zk → z = Σ_{i=1}^{n+1} λi xi ∈ co F.

Because {zk } ⊂ co F was an arbitrary sequence, co F is a closed set.


To prove that co F is compact, we will establish that co F is bounded.
As F is compact, it is a bounded set, which implies that there exists M > 0
such that kxk ≤ M for every x ∈ F . Now consider z ∈ co F , which by the
Carathéodory Theorem implies that there exist λi ≥ 0, i = 1, 2, . . . , n + 1, satisfying Σ_{i=1}^{n+1} λi = 1 and xi ∈ F , i = 1, 2, . . . , n + 1, such that

z = Σ_{i=1}^{n+1} λi xi .

Therefore, by the boundedness of F along with the fact that λi ∈ [0, 1],
i = 1, 2, . . . , n + 1 yields that
kzk = kΣ_{i=1}^{n+1} λi xi k ≤ Σ_{i=1}^{n+1} λi kxi k ≤ (n + 1)M.

Because z ∈ co F was arbitrary, every element in co F is bounded above by


(n + 1)M and thus co F is bounded. Hence co F is a compact set. 
However, the above result does not hold true if one replaces the compact-
ness of F by simply the closedness of the set. To verify this fact, we present
an example from Bertsekas [12]. Consider the closed set F defined as

F = {(0, 0)} ∪ {(x1 , x2 ) ∈ R2 : x1 x2 ≥ 1, x1 ≥ 0, x2 ≥ 0},

while the convex hull of F is

co F = {(0, 0)} ∪ {(x1 , x2 ) ∈ R2 : x1 > 0, x2 > 0},

which is not a closed set. For instance, the points (1, 1/k) = (1/k)(k, 1) + (1 − 1/k)(0, 0) belong to co F , but their limit (1, 0) does not.


Now, similar to the concepts of convex combination and convex hull, we
present the notions of affine combination and affine hulls.


Definition 2.10 A point x ∈ Rn is said to be an affine combination of the points x1 , x2 , . . . , xm ∈ Rn if

x = λ1 x1 + λ2 x2 + . . . + λm xm

with λi ∈ R, i = 1, 2, . . . , m and Σ_{i=1}^m λi = 1.

Definition 2.11 The affine hull of a set F ⊂ Rn is the smallest affine set
containing F and is denoted by af f F . It consists of all affine combinations
of the elements of F .

Now we move on to the properties of closure, interior, and relative interior


of convex sets.

Definition 2.12 The closure of a set F ⊂ Rn , cl F , is expressed as

cl F = ∩_{ε>0} (F + εB) = ∩_{ε>0} {x + εB : x ∈ F },

while the interior of the set F , int F , is defined as

int F = {x ∈ Rn : there exists ε > 0 such that (x + εB) ⊂ F }.

It is well known that an arbitrary intersection of closed sets is closed, but the same need not hold for the union. However, for the case of union, the following relation holds. For any arbitrary family of sets {Fλ }, λ ∈ Λ, where the index set Λ may possibly be infinite,

∪_{λ∈Λ} cl Fλ ⊂ cl ∪_{λ∈Λ} Fλ .

The notion of interior suffers from a drawback that even for a nonempty
convex set, it may turn out to be empty. For example, consider a line in R2 .
From the above definition, it is obvious that the interior is empty. But the
set of interior points relative to the affine hull of the set is nonempty. This
motivates us to introduce the notion of relative interior.

Definition 2.13 The relative interior of a convex set F ⊂ Rn , ri F , is the


interior of F relative to the affine hull of F , that is,

ri F = {x ∈ Rn : there exists ε > 0 such that (x + εB) ∩ af f F ⊂ F }.

For an n-dimensional convex set F ⊂ Rn , af f F = Rn and thus


ri F = int F . Though the notion of relative interior helps in overcoming the
emptiness of the interior of a convex set, it also suffers from a drawback. For
nonempty convex sets F1 , F2 ⊂ Rn ,

F1 ⊂ F2 =⇒ cl F1 ⊂ cl F2 and int F1 ⊂ int F2 ,


but ri F1 ⊂ ri F2 need not hold. For instance, consider F1 = {(0, 0)} and
F2 = {(0, y) ∈ R2 : y ≥ 0}. Here F1 ⊂ F2 with ri F1 = {(0, 0)} and
ri F2 = {(0, y) ∈ R2 : y > 0}. Here the relative interiors are nonempty and
disjoint.
Next we present some properties of closure and relative interior of convex
sets. The proofs are from Bertsekas [11, 12] and Rockafellar [97].

Proposition 2.14 Consider a nonempty convex set F ⊂ Rn . Then the fol-


lowing hold:

(i) ri F is nonempty.
(ii) (Line Segment Principle) Let x ∈ ri F and y ∈ cl F . Then for λ ∈ [0, 1),

(1 − λ)x + λy ∈ ri F.

(iii) (Prolongation Principle) x ∈ ri F if and only if every line segment in


F having x as one end point can be prolonged beyond x without leaving
F , that is, for every y ∈ F there exists γ > 1 such that

x + (γ − 1)(x − y) ∈ F.

(iv) ri F and cl F are convex sets with the same affine hulls as that of F .

Proof. (i) Without loss of generality assume that 0 ∈ F . Then the affine hull
of F , af f F , is a subspace containing F . Denote the dimension of af f F by
m. If m = 0, then F as well as af f F consist of a single point and hence ri F
is the point itself, thus proving the result.
Suppose that m > 0. Then one can always find linearly independent ele-
ments x1 , x2 , . . . , xm from F such that af f F = span{x1 , x2 , . . . , xm }, that is,
x1 , x2 , . . . , xm form a basis of the subspace af f F . If this was not possible, then
there exist linearly independent elements y1 , y2 , . . . , yl with l < m from F such
that F ⊂ span{y1 , y2 , . . . , yl }, thereby contradicting the fact that the dimen-
sion of af f F is m. Observe that co {0, x1 , x2 , . . . , xm } ⊂ F has a nonempty
interior with respect to af f F , which implies co {0, x1 , x2 , . . . , xm } ⊂ ri F ,
thereby yielding that ri F is nonempty.
(ii) Suppose that y ∈ cl F , which implies there exists {yk } ⊂ F such that
yk → y. As x ∈ ri F , there exists ε > 0 such that Bε (x) ∩ af f F ⊂ F . For
λ ∈ [0, 1), define yλ = (1 − λ)x + λy and yk,λ = (1 − λ)x + λyk . Therefore, from
Figure 2.3, it is obvious that each point of B(1−λ)ε (yk,λ ) ∩ af f F is a convex
combination of yk and some point from Bε (x) ∩ af f F . By the convexity of F ,

B(1−λ)ε (yk,λ ) ∩ af f F ⊂ F, ∀ k ∈ N.

Because yk → y, yk,λ → yλ . Thus, for sufficiently large k,

B(1−λ)ε/2 (yλ ) ⊂ B(1−λ)ε (yk,λ ),


FIGURE 2.3: Line segment principle.

which implies

B(1−λ)ε/2 (yλ ) ∩ af f F ⊂ B(1−λ)ε (yk,λ ) ∩ af f F ⊂ F.

Hence, yλ = (1 − λ)x + λy ∈ ri F for λ ∈ [0, 1).


Aliter. The above approach was direct and somewhat cumbersome. In the
proof to follow, we use the fact that relative interiors are preserved under one-
to-one affine transformation of Rn to itself and hence these transformations
preserve the affine hulls. This property simplifies the proofs. If F is an m-
dimensional set in Rn , there exists a one-to-one affine transformation of Rn
to itself that carries af f F to the subspace

S = {(x1 , . . . , xm , xm+1 , . . . , xn ) ∈ Rn : xm+1 = 0, . . . , xn = 0}.

Thus, S can be considered a copy of Rm . From this view, one can simply
consider the case when F ⊂ Rn is an n-dimensional set, which implies that
ri F = int F . We will now establish the result for int F instead of ri F .
Because y ∈ cl F ,

y ∈ F + εB, ∀ ε > 0.

Therefore, for every ε > 0,

(1 − λ)x + λy + εB ⊂ (1 − λ)x + λ(F + εB) + εB
= (1 − λ) ( x + (ε(1 + λ)/(1 − λ)) B ) + λF, ∀ λ ∈ [0, 1).


Because x ∈ int F , choosing ε > 0 sufficiently small such that

x + (ε(1 + λ)/(1 − λ)) B ⊂ F,
which along with the convexity of F reduces the preceding relation to

(1 − λ)x + λy + εB ⊂ (1 − λ)F + λF ⊂ F, ∀ λ ∈ [0, 1).

Thus, (1 − λ)x + λy ∈ int F for λ ∈ [0, 1) as desired.


(iii) For x ∈ ri F , by the definition of relative interior, the condition holds.
Conversely, suppose that x ∈ Rn satisfies the condition. We claim that
x ∈ ri F . By (i) there exists x̃ ∈ ri F . If x = x̃, we are done. So assume
that x 6= x̃. As x̃ ∈ ri F ⊂ F , by the condition, there exists γ > 1 such that

y = x + (γ − 1)(x − x̃) ∈ F.
1
Therefore, for λ = ∈ (0, 1),
γ
x = (1 − λ)x̃ + λy,

which by the fact that y ∈ F ⊂ cl F along with the line segment principle (ii)
implies that x ∈ ri F , thereby establishing the result.
(iv) Because ri F ⊂ cl F , by (ii) we have that ri F is convex. From (i), we know
that there exist x1 , x2 , . . . , xm ∈ F such that af f {x1 , x2 , . . . , xm } = af f F
and co {0, x1 , x2 , . . . , xm } ⊂ ri F . Therefore, ri F has an affine hull the same
as that of F .
By Proposition 2.3, F +εB is convex for every ε > 0. Also, as intersection of
convex sets is convex, cl F , which is the intersection of the collection of the sets
F + εB over ε > 0 is convex. Because F ⊂ af f F , cl F ⊂ cl af f F = af f F ,
and as F ⊂ cl F , af f F ⊂ af f cl F , which together implies that the affine
hull of cl F coincides with af f F . 
In the result below we discuss the closure and relative interior operations.
The proofs are from Bertsekas [12] and Rockafellar [97].

Proposition 2.15 Consider nonempty convex sets F, F1 , F2 ⊂ Rn . Then the


following hold:

(i) cl(ri F ) = cl F .
(ii) ri(cl F ) = ri F .
(iii) ri F1 ∩ ri F2 ⊂ ri (F1 ∩ F2 ) and cl (F1 ∩ F2 ) ⊂ cl F1 ∩ cl F2 .
In addition if ri F1 ∩ ri F2 6= ∅,

ri F1 ∩ ri F2 = ri (F1 ∩ F2 ) and cl (F1 ∩ F2 ) = cl F1 ∩ cl F2 .


(iv) Consider a linear transformation L : Rn → Rm . Then

L(cl F ) ⊂ cl (LF ) and L(ri F ) = ri (LF ).

(v) ri (αF ) = α ri F for every α ∈ R.

(vi) ri (F1 + F2 ) = ri F1 + ri F2 and cl F1 + cl F2 ⊂ cl (F1 + F2 ). If either


F1 or F2 is bounded,

cl F1 + cl F2 = cl (F1 + F2 )

Proof. (i) Because ri F ⊂ F , it is obvious that

cl(ri F ) ⊂ cl F.

Conversely, suppose that y ∈ cl F . We claim that y ∈ cl(ri F ). Consider


any x ∈ ri F . By the line segment principle, Proposition 2.14 (ii), for every
λ ∈ [0, 1),

(1 − λ)x + λy ∈ ri F.

Observe that the sequence {(1 − λk )x + λk y} ⊂ ri F is such that as the limit


λk → 1, (1 − λk )x + λk y → y, which implies that y ∈ cl(ri F ), as claimed.
Hence the result.
(ii) We know that F ⊂ cl F and by Proposition 2.14 (iv), af f F = af f (cl F ).
Consider x ∈ ri F , which by the definition of relative interior along with the
preceding facts imply that there exists ε > 0 such that

(x + εB) ∩ af f (cl F ) = (x + εB) ∩ af f F ⊂ F ⊂ cl F,

thereby yielding that x ∈ ri (cl F ). Hence, ri F ⊂ ri (cl F ).


Conversely, suppose that x ∈ ri (cl F ). We claim that x ∈ ri F . By the
nonemptiness of ri F , Proposition 2.14 (i), there exists x̃ ∈ ri F ⊂ cl F .
If in particular x = x̃, we are done. So assume that x ≠ x̃. We can choose γ > 1, sufficiently close to 1, such that by applying the Prolongation Principle, Proposition 2.14 (iii),

y = x + (γ − 1)(x − x̃) ∈ cl F.

Therefore, for λ = 1/γ ∈ (0, 1),

x = (1 − λ)x̃ + λy,

which by the Line Segment Principle, Proposition 2.14 (ii), implies that
x ∈ ri F , thereby leading to the requisite result.


(iii) Suppose that x ∈ ri F1 ∩ ri F2 and y ∈ F1 ∩ F2 . By the Prolongation


Principle, Proposition 2.14 (iii), there exist γi > 1, i = 1, 2 such that

x + (γi − 1)(x − y) ∈ Fi , i = 1, 2.

Choosing γ = min{γ1 , γ2 } > 1, the above condition reduces to

x + (γ − 1)(x − y) ∈ F1 ∩ F2 ,

which again by the Prolongation Principle leads to x ∈ ri (F1 ∩ F2 ). Thus,

ri F1 ∩ ri F2 ⊂ ri (F1 ∩ F2 ).

Because F1 ∩ F2 ⊂ cl F1 ∩ cl F2 , it is obvious that cl (F1 ∩ F2 ) ⊂ cl F1 ∩ cl F2


as intersection of arbitrary closed sets is closed.
Assume that ri F1 ∩ ri F2 is nonempty. Suppose that x ∈ ri F1 ∩ ri F2
and y ∈ cl F1 ∩ cl F2 . By the Line Segment Principle, Proposition 2.14 (ii),
for every λ ∈ [0, 1),

(1 − λ)x + λy ∈ ri F1 ∩ ri F2 .

Observe that the sequence {(1 − λk )x + λk y} ⊂ ri F1 ∩ ri F2 is such that as λk → 1, (1 − λk )x + λk y → y and hence y ∈ cl (ri F1 ∩ ri F2 ). Therefore,

cl F1 ∩ cl F2 ⊂ cl (ri F1 ∩ ri F2 ) ⊂ cl (F1 ∩ F2 ), (2.7)

thereby yielding the desired equality, that is,

cl F1 ∩ cl F2 = cl (F1 ∩ F2 ).

Also from the inclusion (2.7),

cl (ri F1 ∩ ri F2 ) = cl (F1 ∩ F2 ).

By (ii), the above condition leads to

ri (ri F1 ∩ ri F2 ) = ri (cl (ri F1 ∩ ri F2 ))


= ri (cl (F1 ∩ F2 )) = ri (F1 ∩ F2 ),

which implies that

ri F1 ∩ ri F2 ⊃ ri (F1 ∩ F2 ),

thus establishing the requisite result.


(iv) Suppose that x ∈ cl F , which implies there exists a sequence {xk } ⊂ F
such that xk → x. Because L is a linear transformation, it is continu-
ous. Therefore, L(xk ) → L(x), which implies L(x) ∈ cl(LF ) and hence
L(cl F ) ⊂ cl(LF ).


As ri F ⊂ F , on applying the linear transformation L, L(ri F ) ⊂ LF and


thus cl L(ri F ) ⊂ cl (LF ). Also, as F ⊂ cl F , proceeding as before which
along with (i) and the closure inclusion yields

LF ⊂ L(cl F ) = L(cl (ri F )) ⊂ cl L(ri F ).

Therefore, cl (LF ) ⊂ cl L(ri F ), which by earlier condition leads to


cl (LF ) = cl L(ri F ). By (ii),

ri (LF ) = ri (cl (LF )) = ri (cl L(ri F )) = ri (L(ri F )),

thereby yielding ri (LF ) ⊂ L(ri F ).


Conversely, suppose that x̄ ∈ L(ri F ), which implies that there exists
x̃ ∈ ri F such that x̄ = L(x̃). Consider any ȳ ∈ LF and corresponding ỹ ∈ F
such that ȳ = L(ỹ). By the Prolongation Principle, Proposition 2.14 (iii),
there exists γ > 1 such that

(1 − γ)ỹ + γ x̃ ∈ F,

which under the linear transformation leads to

(1 − γ)ȳ + γ x̄ ∈ LF.

Because ȳ ∈ LF was arbitrary, again applying the Prolongation Principle


yields x̄ ∈ ri (LF ), that is, L(ri F ) ⊂ ri (LF ), thereby establishing the
desired equality.
(v) For arbitrary but fixed α ∈ R, define a linear transformation Lα : Rn → Rn
given by Lα (x) = αx. Therefore, for a set F , Lα F = αF . For every α ∈ R,
applying (iv) to Lα F leads to

α ri F = Lα (ri F ) = ri (Lα F ) = ri (αF )

and hence the result.


(vi) Define a linear transformation L : Rn × Rn → Rn given by
L(x1 , x2 ) = x1 + x2 which implies L(F1 , F2 ) = F1 + F2 . Now applying (iv)
to L yields

ri (F1 + F2 ) = ri F1 + ri F2 and cl F1 + cl F2 ⊂ cl (F1 + F2 ).

To establish the equality in the closure part, assume that F1 is bounded.


Suppose that x ∈ cl(F1 + F2 ), which implies that there exist {xki } ⊂ Fi ,
i = 1, 2, such that xk1 + xk2 → x. Because F1 is bounded, {xk1 } is a bounded se-
quence, which leads to the boundedness of {xk2 }. By the Bolzano–Weierstrass
Theorem, Proposition 1.3, the sequence {(xk1 , xk2 )} has a subsequence converg-
ing to (x1 , x2 ) such that x1 +x2 = x. As xi ∈ cl Fi for i = 1, 2, x ∈ cl F1 +cl F2 ,
hence establishing the result cl F1 + cl F2 = cl (F1 + F2 ). 


Note that for the equality part in Proposition 2.15 (iii), the nonemptiness
of ri F1 ∩ ri F2 is required, otherwise the equality need not hold. We present
an example from Bertsekas [12] to illustrate this fact. Consider the sets

F1 = {x ∈ R : x ≥ 0} and F2 = {x ∈ R : x ≤ 0}.

Therefore, ri (F1 ∩ F2 ) = {0} 6= ∅ = ri F1 ∩ ri F2 . For the closure part,


consider

F1 = {x ∈ R : x > 0} and F2 = {x ∈ R : x < 0}.

Thus, cl (F1 ∩ F2 ) = ∅ 6= {0} = cl F1 ∩ cl F2 .


Also the boundedness assumption in (vi) for the closure equality is neces-
sary. For instance, consider the sets

F1 = {(x1 , x2 ) ∈ R2 : x1 x2 ≥ 1, x1 > 0, x2 > 0},


F2 = {(x1 , x2 ) ∈ R2 : x1 = 0}.

Here, both F1 and F2 are closed unbounded sets, whereas the sum

F1 + F2 = {(x1 , x2 ) ∈ R2 : x1 > 0}

is not closed. Thus cl F1 + cl F2 = F1 + F2 ⊊ cl (F1 + F2 ).


As a consequence of Proposition 2.15, we have the following result from
Rockafellar [97].

Corollary 2.16 (i) Consider two convex sets F1 and F2 in Rn . Then


cl F1 = cl F2 if and only if ri F1 = ri F2 . Equivalently,

ri F1 ⊂ F2 ⊂ cl F1 .

(ii) Consider a convex set F ⊂ Rn . Then any open set that meets cl F also
meets ri F .
(iii) Consider a convex set F ⊂ Rn and an affine set H ⊂ Rn containing a
point from ri F . Then

ri (F ∩ H) = ri F ∩ H and cl(F ∩ H) = cl F ∩ H.

Proof. (i) Suppose that cl F1 = cl F2 . Invoking Proposition 2.15 (ii),

ri F1 = ri(cl F1 ) = ri(cl F2 ) = ri F2 . (2.8)

Now assume that ri F1 = ri F2 , which by Proposition 2.15 (i) implies that

cl F1 = cl(ri F1 ) = cl(ri F2 ) = cl F2 . (2.9)

Combining the relations (2.8) and (2.9) leads to

ri F1 = ri F2 ⊂ F2 ⊂ cl F2 = cl F1 ,


thereby establishing the desired result.


(ii) Denote the open set by O. Suppose that O meets cl F , that is, there exists
x ∈ Rn such that x ∈ O ∩ cl F . By Proposition 2.15 (i), cl F = cl(ri F ), which
implies

x ∈ O ∩ cl(ri F ).

Because x ∈ cl(ri F ), there exists {xk } ⊂ ri F such that xk → x. Therefore,


for k sufficiently large, one can choose ε̄ > 0 such that xk ∈ x + ε̄B. Also, as
O is an open set and x ∈ O, there exists ε̃ > 0 such that x + ε̃B ⊂ O. Define
ε = min{ε̄, ε̃}. Thus for sufficiently large k,

xk ∈ x + εB ⊂ O,

which along with the fact that xk ∈ ri F implies that O also meets ri F ,
hence proving the result.
(iii) Observe that for an affine set H, ri H = H = cl H. Therefore, by the
given hypothesis,

ri F ∩ H = ri F ∩ ri H 6= ∅.

Thus, by Proposition 2.15 (iii),

ri(F ∩ H) = ri F ∩ H and cl(F ∩ H) = cl F ∩ H,

thereby completing the proof. 


Before moving on to the various classes of convex sets, we would like to
mention the concept of core of a set like the notions of closure and interior of
a set from Borwein and Lewis [17].

Definition 2.17 The core of a set F ⊂ Rn , denoted by core F , is defined as

core F = {x ∈ F : for every d ∈ Rn there exists λ > 0


such that x + λd ∈ F }.

It is obvious that int F ⊂ core F . For a convex set F ⊂ Rn , int F = core F .

2.2.1 Convex Cones


Since we are interested in the study of convex optimization theory, a class
of sets that plays an active role in this direction is the epigraphical set as
discussed briefly in Chapter 1. From the definition it is obvious that epigraph-
ical sets are unbounded. Thus it seems worthwhile to understand the class of
unbounded convex sets for which one needs the idea of recession cones. But
before that, we require the concept of cones.


Definition 2.18 A set K ⊂ Rn is said to be a cone if for every x ∈ K,


λx ∈ K for every λ ≥ 0. Therefore, for any set F ⊂ Rn , the cone generated
by F is denoted by cone F and is defined as
cone F = ∪_{λ≥0} λF = {z ∈ Rn : z = λx, x ∈ F, λ ≥ 0}.

Note that for a nonconvex set F , cone F may or may not be convex. For
example, consider F = {(1, 1), (2, 2)}. Here,

cone F = {z ∈ R2 : z = λ(1, 1), λ ≥ 0},

which is convex. Now consider F = {(−1, 1), (1, 1)}. Observe that the cone
generated by F comprises of two rays, that is,

cone F = {z ∈ R2 : z = λ(−1, 1) or z = λ(1, 1), λ ≥ 0}.

But we are interested in the convex scenarios, thereby moving on to the notion
of the convex cone.

Definition 2.19 The set K ⊂ Rn is said to be convex cone if it is convex


as well as a cone. Therefore, for any set F ⊂ Rn , the convex cone generated
by F is denoted by cone co F and is expressed as a set containing all conic
combinations of the elements of the set F , that is,
cone co F = {x ∈ Rn : x = Σ_{i=1}^m λi xi , xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, m ≥ 0}.

The convex cone generated by the set F is the smallest convex cone containing
F . Also, for a collection of convex sets Fi ⊂ Rn , i = 1, 2, . . . , m, the convex
cone generated by Fi , i = 1, 2, . . . , m can be easily shown to be expressed as
cone co (∪_{i=1}^m Fi ) = ∪_{λ∈R_+^m} Σ_{i=1}^m λi Fi .

Some of the important convex cones that play a pivotal role in convex
optimization are the polar cone, tangent cone, and the normal cone. We shall
discuss them later in the chapter. Before going back to the discussion of un-
bounded sets, we characterize the class of convex cones in the result below.

Theorem 2.20 A cone K ⊂ Rn is convex if and only if K + K ⊂ K.

Proof. Suppose that the cone K is convex. Consider x, y ∈ K. Because K is


convex, for λ = 1/2,
(1 − λ)x + λy = (1/2)(x + y) ∈ K.


Also, as K is a cone, x + y ∈ 2K ⊂ K which implies that K + K ⊂ K.


Conversely, suppose that K + K ⊂ K. Consider x, y ∈ K. Because K is a
cone, for λ ∈ [0, 1],

(1 − λ)x ∈ K and λy ∈ K,

which along with the assumption K + K ⊂ K leads to

(1 − λ)x + λy ∈ K, ∀ λ ∈ [0, 1].

As x, y ∈ K were arbitrary, the cone K is convex, thus proving the result. 


Now coming back to unbounded convex sets, a set can be thought to be un-
bounded if for any point in the set there exists a direction moving along which
one still remains within the set. Such directions are known as the directions
of recession and are independent of the point chosen.

Definition 2.21 For a convex set F ⊂ Rn , d ∈ Rn is said to be the direction


of recession of F if x + λd ∈ F for every x ∈ F and for every λ ≥ 0. The
collection of all the directions of recession of a set F ⊂ Rn form a cone known
as the recession cone of F and is denoted by 0+ F . Equivalently,

0+ F = {d ∈ Rn : F + d ⊂ F }. (2.10)

It is easy to observe that for any d ∈ 0+ F , d belongs to the set on the


right-hand side of the relation (2.10) by choosing in particular λ = 1. Now
suppose that d ∈ Rn belongs to the set on the right-hand side of the relation
(2.10). Therefore, for any x ∈ F , x + d ∈ F . Invoking the condition iteratively,
x + kd ∈ F for k ∈ N. Because F is convex, for any λ̄ ∈ [0, 1],

(1 − λ̄)x + λ̄(x + kd) = x + λ̄kd ∈ F, ∀ k ∈ N.

Denoting λ = λ̄k ≥ 0, the above condition reduces to x + λd ∈ F for every


λ ≥ 0. As this relation holds for any x ∈ F , d ∈ 0+ F , thereby establishing the
relation (2.10).
Below we present some properties of the recession cone with proofs from
Bertsekas [12].

Proposition 2.22 Consider a closed convex set F ⊂ Rn . Then the following


holds:

(i) 0+ F is a closed convex cone.

(ii) d ∈ 0+ F if and only if there exists x ∈ F such that x + λd ∈ F for every


λ ≥ 0.

(iii) The set F is bounded if and only if 0+ F = {0}.


Proof. (i) Suppose that d ∈ 0+ F , which implies that for every x ∈ F and
λ ≥ 0, x + λd ∈ F . Consider α > 0. Denote λ̄ = λ/α ≥ 0. Then,

x + λ̄(αd) = x + λd ∈ F,

which implies αd ∈ 0+ F for α > 0. For α = 0, it is trivial. Thus, 0+ F is a


cone.
Suppose that d1 , d2 ∈ 0+ F , which implies for every x ∈ F and λ ≥ 0,

x + λdi ∈ F, i = 1, 2.

By the convexity of F , for any λ̃ ∈ [0, 1],

(1 − λ̃)(x + λd1 ) + λ̃(x + λd2 ) = x + λ((1 − λ̃)d1 + λ̃d2 ) ∈ F,

which yields that (1 − λ̃)d1 + λ̃d2 ∈ 0+ F , thus implying the convexity of 0+ F .


Finally, to establish the closedness of 0+ F , suppose that d ∈ cl 0+ F , which
implies that there exists {dk } ⊂ 0+ F . Therefore, for every x ∈ F and every
λ ≥ 0, x + λdk ∈ F . Because F is closed,

x + λd ∈ F, ∀ x ∈ F, ∀ λ ≥ 0,

which implies that d ∈ 0+ F , thereby implying that 0+ F is closed.


(ii) If d ∈ 0+ F , then from the definition of recession cone itself, the condition
is satisfied. Conversely, suppose that d ∈ Rn is such that there exists x ∈ F
satisfying

x + λd ∈ F, ∀ λ ≥ 0.

Without loss of generality, assume that d 6= 0. Consider arbitrary x̃ ∈ F .


Because 0+ F is a cone, it suffices to prove that x̃ + d ∈ F . Define

xk = x + kd, k ∈ N,

which by the condition implies that {xk } ⊂ F . If x̃ = xk for some k ∈ N, then


again by the condition

x̃ + d = x + (k + 1)d ∈ F

and thus we are done. So assume that x̃ ≠ xk for every k. Define

dk = ((xk − x̃)/kxk − x̃k) kdk, ∀ k ∈ N.

Therefore, for λ̃ = kdk/kxk − x̃k ≥ 0,

x̃ + dk = (1 − λ̃)x̃ + λ̃xk ,


which implies that x̃ + dk lies on the line starting at x̃ and passing through
xk . Now consider
dk /kdk = (xk − x̃)/kxk − x̃k
= (xk − x)/kxk − x̃k + (x − x̃)/kxk − x̃k
= (kxk − xk/kxk − x̃k) (xk − x)/kxk − xk + (x − x̃)/kxk − x̃k
= (kxk − xk/kxk − x̃k) (d/kdk) + (x − x̃)/kxk − x̃k .

By the construction of {xk }, we know that it is an unbounded sequence.


Therefore,

kxk − xk/kxk − x̃k = k kdk/kx − x̃ + kdk → 1 and (x − x̃)/kxk − x̃k = (x − x̃)/kx − x̃ + kdk → 0,

which along with the preceding condition leads to dk → d. The vector


x̃ + dk ∈ (x̃, xk ) for every k ∈ N such that kxk − x̃k ≥ kdk, which by the
convexity of F implies that x̃ + dk ∈ F . Therefore, x̃ + dk → x̃ + d, which
by the closedness of F leads to x̃ + d ∈ F . As x̃ ∈ F was arbitrarily chosen,
F + d ⊂ F , thereby implying that d ∈ 0+ F .
(iii) Suppose that F is bounded. Consider 0 6= d ∈ 0+ F , which implies that
for every x ∈ F ,

x + λd ∈ F, ∀ λ ≥ 0.

Therefore, as the limit λ → ∞, kx + λdk → ∞, thereby contradicting the


boundedness of F . Hence, 0+ F = {0}.
Conversely, suppose that F is unbounded. Consider x ∈ F and an un-
bounded sequence {xk } ⊂ F . Define
dk = (xk − x)/kxk − xk .

Observe that {dk } is a bounded sequence and thus by the Bolzano–Weierstrass


Theorem, Proposition 1.3, has a convergent subsequence. Without loss of gen-
erality, assume that dk → d and as kdk k = 1, kdk = 1. For any fixed λ ≥ 0,
x + λdk ∈ (x, xk ) for every k ∈ N such that kxk − xk ≥ λ. By the convexity
of F , x + λdk ∈ F . Because x + λdk → x + λd, which by the closedness of F
implies that

x + λd ∈ F, ∀ λ ≥ 0.

Applying (ii) yields that 0 6= d ∈ 0+ F , thereby establishing the result. 


Observe that if the set F is not closed, then the recession cone of F need not
be closed. Also the equivalence in (ii) of the above proposition need not hold.
To verify this claim, we present an example from Rockafellar [97]. Consider
the set

F = {(x, y) ∈ R2 : x > 0, y > 0} ∪ {(0, 0)},

which is not closed. Here the recession cone 0+ F = F and hence is not closed.
Also (1, 0) ∉ 0+ F but (1, 1) + λ(1, 0) ∈ F for every λ ≥ 0, thereby contradicting the equivalence in (ii).
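For polyhedral sets the directions of recession are easy to describe: for a nonempty set F = {x ∈ Rn : Ax ≤ b} one has 0+ F = {d ∈ Rn : Ad ≤ 0} (a standard fact, not proved in the text). The following minimal numerical sketch (not from the text; the data are illustrative assumptions) checks this behavior on a simple polyhedron:

    import numpy as np

    # F = {x in R^2 : x1 + x2 >= 1, x1 >= 0, x2 >= 0}, written as A x <= b
    A = np.array([[-1.0, -1.0], [-1.0, 0.0], [0.0, -1.0]])
    b = np.array([-1.0, 0.0, 0.0])
    in_F = lambda x: np.all(A @ x <= b + 1e-10)

    d_good = np.array([1.0, 2.0])    # A d <= 0, so d is a direction of recession of F
    d_bad = np.array([1.0, -1.0])    # eventually violates x2 >= 0, so it is not

    x0 = np.array([2.0, 3.0])        # a point of F
    assert in_F(x0)
    assert all(in_F(x0 + lam * d_good) for lam in np.linspace(0.0, 100.0, 50))
    assert not all(in_F(x0 + lam * d_bad) for lam in np.linspace(0.0, 100.0, 50))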

2.2.2 Hyperplane and Separation Theorems


An unbounded convex set that plays a pivotal role in the development of
convex optimization is the hyperplane. A hyperplane divides the space into
two half spaces. This property helps in the study of separation theorems, thus
moving us a step ahead in the study of convex analysis.

Definition 2.23 A hyperplane H ⊂ Rn is defined as

H = {x ∈ Rn : ha, xi = b},

where a ∈ Rn with a 6= 0 and b ∈ R. If x̄ ∈ H, then the hyperplane can be


equivalently expressed as

H = {x ∈ Rn : ha, xi = ha, x̄i} = x̄ + {x ∈ Rn : ha, xi = 0}.

Therefore, H is an affine set parallel to {x ∈ Rn : ha, xi = 0}.

Definition 2.24 The hyperplane H divides the space into two half spaces,
either closed or open. The closed half spaces associated with H are

H≤ = {x ∈ Rn : ha, xi ≤ b} and H≥ = {x ∈ Rn : ha, xi ≥ b},

while the open half spaces associated with H are

H< = {x ∈ Rn : ha, xi < b} and H> = {x ∈ Rn : ha, xi > b}.

As already mentioned, the notion of separation is based on the fact that


the hyperplane in Rn divides it into two parts. Before discussing the sepa-
ration theorems, we first present types of separation that we will be using
in our subsequent study of developing the optimality conditions for convex
optimization problems.

Definition 2.25 Consider two convex sets F1 and F2 in Rn . A hyperplane


H ⊂ Rn is said to separate F1 and F2 if

ha, x1 i ≤ b ≤ ha, x2 i, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 .


The separation is said to be strict if

ha, x1 i ≤ b < ha, x2 i, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 .

The separation is proper if

sup_{x1 ∈F1} ha, x1 i ≤ inf_{x2 ∈F2} ha, x2 i and inf_{x1 ∈F1} ha, x1 i < sup_{x2 ∈F2} ha, x2 i.

In particular, if F1 = {x̄} and F2 = F such that x̄ ∈ cl F , a hyperplane that


separates {x̄} and F is called a supporting hyperplane to F at x̄, that is,

ha, x̄i ≤ ha, xi, ∀ x ∈ F.

The next obvious question is when will the separating hyperplane or the
supporting hyperplane exist. In this respect we prove some existence results
below. The proof is from Bertsekas [12].

Theorem 2.26 (i) (Supporting Hyperplane Theorem) Consider a nonempty


convex set F ⊂ Rn and x̄ ∉ ri F . Then there exist a ∈ Rn with a ≠ 0 and b ∈ R such that

ha, x̄i ≤ b ≤ ha, xi, ∀ x ∈ F.

(ii) (Separation Theorem) Consider two nonempty convex sets F1 and F2 in


Rn such that either F1 ∩ F2 = ∅ or F1 ∩ ri F2 = ∅. Then there exists a
hyperplane in Rn separating them.
(iii) (Strict Separation Theorem) Consider two nonempty convex sets F1 and F2 in Rn such that F1 ∩ F2 = ∅. Furthermore, if F1 − F2 is closed or F1 is closed while F2 is compact, then there exists a hyperplane in Rn strictly separating them. In particular, consider a nonempty closed convex set F ⊂ Rn and x̄ ∉ F . Then there exist a ∈ Rn with a ≠ 0 and b ∈ R such that

ha, x̄i < b ≤ ha, xi, ∀ x ∈ F.

(iv) (Proper Separation Theorem) Consider a nonempty convex set F ⊂ Rn


and x̄ ∈ Rn . There exists a hyperplane separating F and x̄ properly if and
only if

x̄ ∉ ri F.

Further, consider two nonempty convex sets F1 and F2 in Rn . Then


ri F1 ∩ ri F2 = ∅ if and only if there exists a hyperplane in Rn separating
the sets properly.

Proof. (i) Consider the closure of F , cl F , which by Proposition 2.14 (iv) is


also convex. Because x̄ ∉ ri F , there exists a sequence {xk } such that xk ∉ cl F


and xk → x̄. Denote the projection of xk on cl F by x̄k . By Proposition 2.52


(see Section 2.3), for every k ∈ N,

hxk − x̄k , x − x̄k i ≤ 0, ∀ x ∈ cl F,

which implies for every k ∈ N,

hx̄k − xk , xi ≥ hx̄k − xk , x̄k i
= hx̄k − xk , x̄k − xk i + hx̄k − xk , xk i
≥ hx̄k − xk , xk i, ∀ x ∈ cl F.

Dividing the above inequality throughout by kx̄k − xk k and denoting


ak = (x̄k − xk )/kx̄k − xk k,

hak , xk i ≤ hak , xi, ∀ x ∈ cl F, ∀ k ∈ N.

As kak k = 1 for every k, {ak } is a bounded sequence. By the Bolzano–


Weierstrass Theorem, Proposition 1.3, {ak } has a convergent subsequence.
Without loss of generality, assume that ak → a, where a 6= 0 with kak = 1.
Taking the limit as k → +∞ in the above inequality yields

ha, x̄i ≤ ha, xi, ∀ x ∈ cl F.

Because F ⊂ cl F , the above inequality holds in particular for x ∈ F , that is,

ha, x̄i ≤ b ≤ ha, xi, ∀ x ∈ F,

where b = inf x∈cl F ha, xi, thereby yielding the desired result. If x̄ ∈ cl F , then
the hyperplane so obtained supports F at x̄.
(ii) Define the set

F = F1 − F2 = {x ∈ Rn : x = x1 − x2 , xi ∈ Fi , i = 1, 2}.

Suppose that either F1 ∩ F2 = ∅ or F1 ∩ ri F2 = ∅. Under both scenarios,


0 ∈
/ ri F . By the Supporting Hyperplane Theorem, that is (i), there exist
a ∈ Rn with a 6= 0 such that

ha, xi ≥ 0, ∀ x ∈ F,

which implies

ha, x1 i ≥ ha, x2 i, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 ,

hence proving the requisite result.


(iii) We shall prove the result under the assumption that F2 −F1 is closed as by
Proposition 2.15, the closedness of F1 along with the compactness of F2 imply


that F2 − F1 is closed. As F1 ∩ F2 = ∅, 0 ∉ F2 − F1 . Suppose that a ∈ F2 − F1


is the projection of origin on F2 − F1 . Therefore, there exist x̄i ∈ Fi , i = 1, 2,
such that a = x̄2 − x̄1 . Define x̄ = (x̄1 + x̄2 )/2. Then the projection of x̄ on cl F1
is x̄1 while that on cl F2 is x̄2 . By Proposition 2.52,

hx̄ − x̄i , xi − x̄i i ≤ 0, ∀ xi ∈ Fi , i = 1, 2,

which implies

ha, x1 i ≤ ha, x̄i − kak2 /2 < ha, x̄i, ∀ x1 ∈ F1 ,
ha, x̄i < ha, x̄i + kak2 /2 ≤ ha, x2 i, ∀ x2 ∈ F2 .
Denoting b = ha, x̄i, the above inequality leads to

ha, x1 i < b < ha, x2 i, ∀ xi ∈ Fi , i = 1, 2,

thus obtaining the strict separation result.


Now consider a closed convex set F ⊂ Rn with x̄ ∉ F . Taking F1 = {x̄} and F2 = F in the strict separation result, there exist a ∈ Rn with a ≠ 0 and b ∈ R such that

ha, x̄i < b < ha, xi, ∀ x ∈ F.

Defining b̄ = inf x∈F ha, xi, the above inequality yields

ha, x̄i < b̄ ≤ ha, xi, ∀ x ∈ F,

as desired.
(iv) Suppose that there exists a hyperplane that separates F and x̄ properly;
that is, there exists a ∈ Rn with a ≠ 0 such that

ha, x̄i ≤ inf_{x∈F} ha, xi and ha, x̄i < sup_{x∈F} ha, xi.

We claim that x̄ ∉ ri F . Suppose on the contrary that x̄ ∈ ri F . Therefore, by the conditions of proper separation, ha, ·i attains its minimum over F at x̄. The assumption that x̄ ∈ ri F then implies that ha, xi = ha, x̄i for every x ∈ F , thereby violating the strict inequality. Hence the supposition was wrong and x̄ ∉ ri F .
Conversely, suppose that x̄ ∉ ri F . Consider the following two cases.

(a) x̄ ∉ af f F : Because af f F is a closed convex subset of Rn , by the Strict Separation Theorem, that is (iii), there exists a ∈ Rn with a ≠ 0 such that

ha, x̄i < ha, xi, ∀ x ∈ af f F.


As F ⊂ af f F , the above inequality holds for every x ∈ F and hence

ha, x̄i ≤ inf_{x∈F} ha, xi and ha, x̄i < sup_{x∈F} ha, xi,

thereby establishing the proper separation between F and x̄.

(b) x̄ ∈ af f F : Consider a subspace C parallel to af f F and define the


orthogonal complement of C as

C ⊥ = {x∗ ∈ Rn : hx∗ , xi = 0, ∀ x ∈ C}.

Define F̃ = F +C ⊥ and thus, by Proposition 2.15 (vi), ri F̃ = ri F +C ⊥ .


We claim that x̄ ∉ ri F̃ . On the contrary, assume that x̄ ∈ ri F̃ , which implies that there exists x ∈ ri F such that x̄ − x ∈ C ⊥ . As x̄, x ∈ af f F , x̄ − x ∈ C. Therefore, kx̄ − xk2 = 0, thereby yielding x̄ = x ∈ ri F , which is a contradiction. Therefore, x̄ ∉ ri F̃ . By the Supporting Hyperplane Theorem, that is (i), there exists a ∈ Rn with a ≠ 0 such that

ha, x̄i ≤ ha, x̃i, ∀ x̃ ∈ F̃ ,

which implies that

ha, x̄i ≤ ha, x + yi, ∀ x ∈ F, ∀ y ∈ C ⊥ .

Suppose that ha, ȳi 6= 0 for some ȳ ∈ C ⊥ . Without loss of generality,


let ha, ȳi > 0. Consider x̃ = x + αȳ. Therefore, as the limit α → −∞,
ha, x̃i → −∞, thereby contradicting the above inequality. Thus,

ha, yi = 0, ∀ y ∈ C ⊥ .

Now by Proposition 2.14, ri F̃ is nonempty and thus ha, xi is not con-


stant over F̃ . Thus, by the above condition on C ⊥ ,

ha, x̄i < sup_{x̃∈F̃} ha, x̃i = sup_{x∈F} ha, xi + sup_{y∈C ⊥} ha, yi = sup_{x∈F} ha, xi,

thereby establishing the proper separation between F and x̄.

Thus, the equivalence between the proper separation of F and x̄ and the fact
that x̄ ∈
/ ri F is proved.
Consider the nonempty convex sets F1 , F2 ⊂ Rn . Define F = F2 − F1 ,
which by Proposition 2.15 (v) and (vi) implies that ri F = ri F1 − ri F2 .


Therefore, ri F1 ∩ ri F2 = ∅ if and only if 0 ∉ ri F . By the proper separation result, 0 ∉ ri F is equivalent to the existence of a ∈ Rn with a ≠ 0 such that

0 ≤ inf_{x∈F} ha, xi and 0 < sup_{x∈F} ha, xi.

By Proposition 1.7,

sup_{x1 ∈F1} ha, x1 i ≤ inf_{x2 ∈F2} ha, x2 i and inf_{x1 ∈F1} ha, x1 i < sup_{x2 ∈F2} ha, x2 i,

thereby completing the proof. 
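The construction behind the Strict Separation Theorem is easy to reproduce numerically whenever the projection onto F is available in closed form. The following minimal sketch (not from the text; the box and the point x̄ are illustrative assumptions) separates a point x̄ ∉ F from a box F using a = p − x̄, where p is the projection of x̄ onto F , and b = ha, pi − kak2 /2:

    import numpy as np

    lo, hi = np.zeros(3), np.ones(3)               # F = [0,1]^3, a closed convex set
    xbar = np.array([2.0, -0.5, 0.5])              # a point outside F

    p = np.clip(xbar, lo, hi)                      # projection of xbar onto the box
    a = p - xbar                                   # normal of the separating hyperplane
    b = a @ p - 0.5 * np.linalg.norm(a) ** 2       # threshold strictly between the two sides

    assert a @ xbar < b                            # <a, xbar> < b
    rng = np.random.default_rng(4)
    for _ in range(1000):
        x = rng.uniform(lo, hi)                    # sampled points of F
        assert a @ x > b                           # b < <a, x> for every sampled x in F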


A consequence of the Separation Theorem is the following characterization
of a closed convex set.

Theorem 2.27 A closed convex set F ⊂ Rn is the intersection of closed half


spaces containing it. Consequently for any F̃ ⊂ Rn , cl co F̃ is the intersection
of all the closed half spaces containing F̃ .

Proof. Without loss of generality, we assume that F ≠ Rn , otherwise the result is trivial. For any x̄ ∉ F , define F1 = {x̄} and F2 = F . Therefore, by Theorem 2.26 (iii), there exist (a, b) ∈ Rn × R with a ≠ 0 such that

ha, x̄i < b ≤ ha, xi, ∀ x ∈ F,

which implies that a closed half space associated with the supporting hyper-
plane contains F and not x̄. Thus the intersection of the closed half spaces
containing F has no points that are not in F .
For any F̃ ⊂ Rn , taking F = cl co F̃ and applying the result for closed
convex set F yields that cl co F̃ is the intersection of all the closed half spaces
containing F̃ . 
Another application of the separation theorem is the famous Helly’s The-
orem. We state the result from Rockafellar [97] without proof.

Proposition 2.28 (Helly’s Theorem) Consider a collection of nonempty


closed convex sets Fi , i ∈ I in Rn , where I is an arbitrary index set. Assume
that the sets Fi have no common direction of recession. If every subcollection
consisting of n + 1 or fewer sets has nonempty intersection, then ∩_{i∈I} Fi is nonempty.

The supporting hyperplanes through the boundary points of a set charac-


terizes the convexity of the set that we present in the result below. The proof
is from Schneider [103].

Proposition 2.29 Consider a closed set F ⊂ Rn such that int F is nonempty


and through each boundary point of F there is a supporting hyperplane. Then
F is convex.


Proof. Suppose that F is not convex, which implies that there exist x, y ∈ F
such that z ∈ [x, y] but z ∈/ F . Because int F is nonempty, there exists some
a ∈ int F such that x, y and a are affinely independent. Also as F is closed,
[a, z) meets the boundary of F , say at b ∈ F . By the given hypothesis, there
is a supporting hyperplane H to F through b with a ∉ H. Therefore, H meets
af f {x, y, a} in a line and hence x, y and a must lie on the same side of the
line, which is a contradiction. Hence, F is a convex set.

2.2.3 Polar Cones


From the previous discussions, we know that closed half spaces are closed
convex sets and by Proposition 2.3 that arbitrary intersection of half spaces
give rise to another closed convex set. One such class is of the polar cones.
Definition 2.30 Consider a set F ⊂ Rn . The cone defined as
F ◦ = {x∗ ∈ Rn : hx∗ , xi ≤ 0, ∀ x ∈ F }
is called the polar cone of F . Observe that the elements of the polar cone
make an obtuse angle with every element of the set. The cone F ◦◦ = (F ◦ )◦ is
called the bipolar cone of the set F .
Thus, the polar of the set F is a closed convex cone irrespective of whether
F is closed convex or not. We present some properties of polar and bipolar
cones.
Proposition 2.31 (i) Consider two sets F1 , F2 ⊂ Rn such that F1 ⊂ F2 .
Then F2◦ ⊂ F1◦ .
(ii) Consider a nonempty set F ⊂ Rn . Then
F ◦ = (cl F )◦ = (co F )◦ = (cone co F )◦ .
(iii) Consider a nonempty set F ⊂ Rn . Then
F ◦◦ = cl cone co F.
If F is a convex cone, F ◦◦ = cl F and in addition if F is closed, F ◦◦ = F .
(iv) Consider two cones Ki ⊂ Rni , i = 1, 2. Then
(K1 × K2 )◦ = K1◦ × K2◦ .

(v) Consider two cones K1 , K2 ⊂ Rn . Then


(K1 + K2 )◦ = K1◦ ∩ K2◦ .
(vi) Consider two closed convex cones K1 , K2 ⊂ Rn . Then
(K1 ∩ K2 )◦ = cl(K1◦ + K2◦ ).
The closure is superfluous under the condition K1 ∩ int K2 6= ∅.


Proof. (i) Suppose that x∗ ∈ F2◦ , which implies that

hx∗ , xi ≤ 0, ∀ x ∈ F2 .

Because F1 ⊂ F2 , the above inequality leads to

hx∗ , xi ≤ 0, ∀ x ∈ F1 ,

thereby showing that F2◦ ⊂ F1◦ .


(ii) As F ⊂ cl F , by (i) (cl F )◦ ⊂ F ◦ . Conversely, suppose that x∗ ∈ F ◦ .
Consider x ∈ cl F , which implies that there exists {xk } ⊂ F such that xk → x.
Because x∗ ∈ F ◦ , by Definition 2.30,

hx∗ , xk i ≤ 0,

which implies that

hx∗ , xi ≤ 0.

Because x ∈ cl F was arbitrary, the above inequality holds for every x ∈ cl F


and hence x∗ ∈ (cl F )◦ , thus yielding F ◦ = (cl F )◦ .
As F ⊂ co F , by (i) (co F )◦ ⊂ F ◦ . Conversely, suppose that x∗ ∈ F ◦ ,
which implies

hx∗ , xi ≤ 0, ∀ x ∈ F.

For any λ ∈ [0, 1],

hx∗ , (1 − λ)x + λyi ≤ 0, ∀ x, y ∈ F,

which implies that

hx∗ , zi ≤ 0, ∀ z ∈ co F.

Therefore, x∗ ∈ (co F )◦ , as desired.


Also, because F ⊂ cone F , again by (i) (cone F )◦ ⊂ F ◦ . Conversely,
suppose that x∗ ∈ F ◦ . For any λ ≥ 0,

hx∗ , λxi ≤ 0, ∀ x ∈ F,

which implies that

hx∗ , zi ≤ 0, ∀ z ∈ cone F.

Therefore, x∗ ∈ (cone F )◦ , thereby yielding the desired result.


(iii) We shall first establish the case when F is a closed convex cone. Suppose
that x ∈ F . By Definition 2.30 of F ◦ ,

hx∗ , xi ≤ 0, ∀ x∗ ∈ F ◦ ,


which implies that x ∈ F ◦◦ . Therefore, F ⊂ F ◦◦ .


Conversely, suppose that x̄ ∈ F ◦◦ . We claim that x̄ ∈ F . On the contrary,
assume that x̄ ∉ F . Because F is closed, by Theorem 2.26 (iii), there exist
a ∈ Rn with a 6= 0 and b ∈ R such that

ha, x̄i < b ≤ ha, xi, ∀ x ∈ F.

As F is a cone, 0 ∈ F , which along with the above inequality implies that


b ≤ 0 and ha, x̄i < 0. We claim that a ∈ −F ◦ . If not, then there exists x̃ ∈ F such that ha, x̃i < 0. Choose λ̃ > 0 large enough that ha, λ̃x̃i < b. Again, as F is a cone, λ̃x̃ ∈ F , thereby contradicting the fact that

b ≤ ha, xi, ∀ x ∈ F.

Therefore, a ∈ −F ◦ , that is, −a ∈ F ◦ . But then, since x̄ ∈ F ◦◦ , we must have h−a, x̄i ≤ 0, that is, ha, x̄i ≥ 0, which contradicts ha, x̄i < 0. Thus we arrive at a contradiction and hence F ◦◦ ⊂ F , thereby leading to the requisite result.
Now from (ii), it is obvious that

F ◦ = (cl cone co F )◦ .

Therefore,

F ◦◦ = (cl cone co F )◦◦ ,

which by the fact that cl cone co F is a closed convex cone yields that

F ◦◦ = cl cone co F,

as desired. If F is a convex cone, from the above condition it is obvious that


F ◦◦ = cl F .
(iv) Suppose that d = (d1 , d2 ) ∈ (K1 × K2 )◦ , which implies that

hd, xi ≤ 0, ∀ x ∈ K1 × K2 .

Therefore, for x = (x1 , x2 ) ∈ K1 × K2 ,

hd1 , x1 i + hd2 , x2 i ≤ 0, ∀ x1 ∈ K1 , ∀ x2 ∈ K2 .

Because K1 and K2 are cones, 0 ∈ Ki , i = 1, 2. In particular, for x2 = 0, the


above inequality reduces to

hd1 , x1 i ≤ 0, ∀ x1 ∈ K1 ,

which implies that d1 ∈ K1◦ . Similarly it can be shown that d2 ∈ K2◦ . Thus,
d ∈ K1◦ × K2◦ , thereby leading to (K1 × K2 )◦ ⊂ K1◦ × K2◦ .
Conversely, suppose that di ∈ Ki◦ , i = 1, 2, which implies

hdi , xi i ≤ 0, ∀ xi ∈ Ki , i = 1, 2.


Therefore,

h(d1 , d2 ), (x1 , x2 )i ≤ 0, ∀ (x1 , x2 ) ∈ K1 × K2 ,

which yields (d1 , d2 ) ∈ (K1 × K2 )◦ , that is, K1◦ × K2◦ ⊂ (K1 × K2 )◦ , thereby
proving the result.
(v) Suppose that x∗ ∈ (K1 + K2 )◦ , which implies that for xi ∈ Ki , i = 1, 2,

hx∗ , x1 + x2 i ≤ 0, ∀ x1 ∈ K1 , ∀ x2 ∈ K2 .

Because K1 and K2 are cones, 0 ∈ Ki , i = 1, 2, which reduces the above


inequality to

hx∗ , xi i ≤ 0, ∀ xi ∈ Ki , i = 1, 2.

Therefore, x∗ ∈ K1◦ ∩ K2◦ , thereby leading to (K1 + K2 )◦ ⊂ K1◦ ∩ K2◦ .


Conversely, suppose that x∗ ∈ K1◦ ∩ K2◦ , which implies that

hx∗ , xi i ≤ 0, ∀ xi ∈ Ki , i = 1, 2.

Thus, for x = x1 + x2 ∈ K1 + K2 , the above inequality leads to

hx∗ , xi ≤ 0, ∀ x ∈ K1 + K2 ,

which implies that x∗ ∈ (K1 + K2 )◦ , thus yielding the desired result.


(vi) Replacing Ki by Ki◦ , i = 1, 2, in (v) along with (iii) leads to

(K1◦ + K2◦ )◦ = K1 ∩ K2 .

Again by (iii), the above condition becomes

cl (K1◦ + K2◦ ) = (K1 ∩ K2 )◦ ,

thereby yielding the requisite result. 


Similar to the polar cone, we have the notion of a positive polar cone.

Definition 2.32 Consider a set F ⊂ Rn . The positive polar cone to the set
F is defined as

F + = {x∗ ∈ Rn : hx∗ , xi ≥ 0, ∀ x ∈ F }.

Observe that F + = (−F )◦ = −F ◦ .

The notion of polarity will play a major role in the study of tangent and
normal cones that are polar to each other. These cones are important in the
development of convex optimization.


2.2.4 Tangent and Normal Cones


In the analysis of a constrained optimization problem, we try to look at the local behavior of the function at neighboring feasible points. To move from one feasible point to another, we need a direction, which leads to the notion of feasible directions.
Definition 2.33 Let F ⊂ Rn and x̄ ∈ F . A vector d ∈ Rn is said to be a
feasible direction of F at x̄ if there exists ᾱ > 0 such that
x̄ + αd ∈ F, ∀ α ∈ [0, ᾱ].
It is easy to observe that the set of all feasible directions forms a cone. For a convex set F , the feasible directions at x̄ are precisely the vectors of the form α(x − x̄) where α ≥ 0 and x ∈ F . However, in case F is nonconvex, the set of feasible directions may reduce to the singleton {0}. For example, consider the nonconvex set F = {(−1, 1), (1, 1)}. At either point of F , the only feasible direction is 0. This
motivates us to introduce the concept of tangent cones that would provide
local information of the set F at a point even when feasible direction is just
zero. The notion of tangent cones may be considered a generalization of the
tangent concept in a smooth scenario to that in a nonsmooth case.
Definition 2.34 Consider a set F ⊂ Rn and x̄ ∈ F . A vector d ∈ Rn is said
to be a tangent to F at x̄ if there exist {dk } ⊂ Rn with dk → d and {tk } ⊂ R+
with tk → 0 such that
x̄ + tk dk ∈ F, ∀ k ∈ N.
Observe that if d is a tangent, then so is λd for λ ≥ 0. Thus, the collection
of all tangents forms a cone, known as the tangent cone, denoted by TF (x̄) and
given by
TF (x̄) = {d ∈ Rn : there exist dk → d, tk ↓ 0 such that
x̄ + tk dk ∈ F, ∀ k ∈ N}.
In the above definition, denote xk = x̄ + tk dk ∈ F . Taking the limit as
k → +∞, tk → 0, and dk → d, which implies that tk dk → 0, thereby leading
to xk → x̄. Also from construction,
(xk − x̄)/tk = dk → d.

Thus, the tangent cone can be equivalently expressed as

TF (x̄) = {d ∈ Rn : there exist {xk } ⊂ F, xk → x̄, tk ↓ 0 such that (xk − x̄)/tk → d}.
Figure 2.4 is a representation of the tangent cone to a convex set F . Next
we present some properties of the tangent cone. The proofs are from Hiriart-
Urruty and Lemaréchal [63].


FIGURE 2.4: Tangent cone.

Theorem 2.35 Consider a set F ⊂ Rn and x̄ ∈ F . Then the following hold:

(i) TF (x̄) is closed.

(ii) If F is convex, TF (x̄) is the closure of the cone generated by F − {x̄},


that is,

TF (x̄) = cl cone(F − x̄)

and hence convex.

Proof. (i) Suppose that {dk } ⊂ TF (x̄) such that dk → d. Because dk ∈ TF (x̄),
there exist {xrk } ⊂ F with xrk → x̄ and {trk } ⊂ R+ with trk → 0 such that

(xrk − x̄)/trk → dk , ∀ k ∈ N.

For a fixed k, one can always find r̄ such that

k (xrk − x̄)/trk − dk k < 1/k, ∀ r ≥ r̄.


Taking the limit as k → +∞, one can generate a sequence {xk } ⊂ F with
xk → x̄ and {tk } ⊂ R+ with tk → 0 such that
(xk − x̄)/tk → d.

Thus, d ∈ TF (x̄), thereby establishing that TF (x̄) is closed.


(ii) Suppose that d ∈ TF (x̄), which implies that there exist {xk } ⊂ F with
xk → x̄ and {tk } ⊂ R+ with tk → 0 such that
(xk − x̄)/tk → d.

Observe that xk − x̄ ∈ F − x̄. As tk > 0, 1/tk > 0, which implies that


(xk − x̄)/tk ∈ cone (F − x̄),

thereby implying that d ∈ cl cone (F − x̄). Hence

TF (x̄) ⊂ cl cone (F − x̄). (2.11)

Conversely, consider an arbitrary but fixed element x ∈ F . Define a se-


quence
xk = x̄ + (1/k)(x − x̄), k ∈ N.
By the convexity of F , it is obvious that {xk } ⊂ F . Taking the limit as
k → +∞, xk → x̄, then by construction

k(xk − x̄) = x − x̄.

Denoting tk = 1/k > 0, tk → 0 such that (xk − x̄)/tk → x − x̄, which implies that
x − x̄ ∈ TF (x̄). Because x ∈ F is arbitrary, F − x̄ ⊂ TF (x̄). As TF (x̄) is a
cone, cone (F − x̄) ⊂ TF (x̄). By (i), TF (x̄) is closed, which implies

cl cone (F − x̄) ⊂ TF (x̄).

The above inclusion along with the reverse inclusion (2.11) yields the desired
equality.
Because F is convex, the set F − x̄ is also convex. Invoking Proposi-
tion 2.14 (iv) implies that TF (x̄) is a convex set. 
We now move on to another conical approximation of a convex set that
is the normal cone that plays a major role in establishing the optimality
conditions.


FIGURE 2.5: Normal cone.

Definition 2.36 Consider a convex set F ⊂ Rn and x̄ ∈ F . A vector d ∈ Rn


is normal to F at x̄ if

hd, x − x̄i ≤ 0, ∀ x ∈ F.

Observe that if d is a normal, then so is λd for λ ≥ 0. The collection of all


normals forms a cone, called the normal cone, which is denoted by NF (x̄).

For a convex set, the relation between the tangent cone and the normal
cone is given by the proposition below.

Proposition 2.37 Consider a convex set F ⊂ Rn . Then TF (x̄) and NF (x̄)


are polar to each other, that is,

NF (x̄) = (TF (x̄))◦ and TF (x̄) = (NF (x̄))◦ .

Proof. Suppose that d ∈ NF (x̄), which implies that

hd, x − x̄i ≤ 0, ∀ x ∈ F.


Observe that for x ∈ F , x − x̄ ∈ F − x̄, which implies that d ∈ (F − x̄)◦ . By


Proposition 2.31 (ii) along with the convexity of F and hence of F − x̄, and
Theorem 2.35 (ii),

d ∈ (cl cone (F − x̄))◦ = (TF (x̄))◦ ,

thereby implying that NF (x̄) ⊂ (TF (x̄))◦ .


Conversely, suppose that d ∈ (TF (x̄))◦ . As F − x̄ ⊂ TF (x̄), by Proposi-
tion 2.31 (i), (TF (x̄))◦ ⊂ (F − x̄)◦ , which implies that

hd, x − x̄i ≤ 0, ∀ x ∈ F,

that is, d ∈ NF (x̄). Therefore, NF (x̄) = (TF (x̄))◦ as desired.


For a convex set F , TF (x̄) is a closed convex cone. Therefore, by Proposi-
tion 2.31 (iii),

(NF (x̄))◦ = (TF (x̄))◦◦ = TF (x̄),

thereby yielding the requisite result. 

Figure 2.5 is a representation of the normal cone to a convex set F . Observe


that the normal cone is polar to the tangent cone in Figure 2.4. Now we present
some simple examples for tangent cones and normal cones.

Example 2.38 (i) For a convex set F ⊂ Rn , it can be easily observed that
TF (x) = Rn for every x ∈ int F and by polarity, NF (x) = {0} for every
x ∈ int F .
(ii) For a closed convex cone K ⊂ Rn , by Theorem 2.35 (ii) it is obvious that
TK (0) = K while by Proposition 2.37, NK (0) = K ◦ . Also, for 0 6= x ∈ K,
from the definition of normal cone,

NK (x) = {d ∈ Rn : d ∈ K ◦ , hd, xi = 0}.

(iii) Consider the closed convex set F ⊂ Rn given by

F = {x ∈ Rn : hai , xi ≤ bi , i = 1, 2, . . . , m}

and define the active index set I(x) = {i ∈ {1, 2, . . . , m} : hai , xi = bi }. The
set F is called a polyhedral set, which we will discuss in the next section. Then

TF (x) = {d ∈ Rn : hai , di ≤ 0, ∀ i ∈ I(x)},


NF (x) = cone co {ai : i ∈ I(x)}.
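As a computational illustration of part (iii), the following Python sketch (ours, with made-up data) forms the active index set I(x) at a vertex of a small polyhedral set and tests directions against the tangent cone formula.

```python
import numpy as np

# Sketch (ours) of Example 2.38 (iii): for F = {x : <a_i, x> <= b_i} and a
# feasible point x, T_F(x) = {d : <a_i, d> <= 0 for all active i}.
A = np.array([[1.0, 0.0],          # x1 <= 1
              [0.0, 1.0],          # x2 <= 1
              [-1.0, -1.0]])       # -x1 - x2 <= 0
b = np.array([1.0, 1.0, 0.0])
x = np.array([1.0, 1.0])           # vertex: the first two constraints are active

active = np.isclose(A @ x, b)      # active index set I(x)

def in_tangent_cone(d, tol=1e-12):
    return bool(np.all(A[active] @ d <= tol))

print(in_tangent_cone(np.array([-1.0, -0.5])))  # True: points back into F
print(in_tangent_cone(np.array([ 1.0,  0.0])))  # False: violates the active a_1
```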

Before moving on to discuss the polyhedral sets, we present some results


on the tangent and normal cones.


Proposition 2.39 (i) Consider two closed convex sets Fi ⊂ Rn , i = 1, 2. Let


x̄ ∈ F1 ∩ F2 . Then

T (x̄; F1 ∩ F2 ) ⊂ T (x̄; F1 ) ∩ T (x̄; F2 ),


N (x̄; F1 ∩ F2 ) ⊃ N (x̄; F1 ) + N (x̄; F2 ).

If in addition, ri F1 ∩ ri F2 6= ∅, the above inclusions hold as equality.


(ii) Consider two closed convex sets Fi ⊂ Rni , i = 1, 2. Let x̄i ∈ Fi , i = 1, 2.
Then

T ((x̄1 , x̄2 ); F1 × F2 ) = T (x̄1 ; F1 ) × T (x̄2 ; F2 ),


N ((x̄1 , x̄2 ); F1 × F2 ) = N (x̄1 ; F1 ) × N (x̄2 ; F2 ).

(iii) Consider two closed convex sets Fi ⊂ Rn , i = 1, 2. Let x̄i ∈ Fi , i = 1, 2.


Then

T (x̄1 + x̄2 ; F1 + F2 ) = cl (T (x̄1 ; F1 ) + T (x̄2 ; F2 )),


N (x̄1 + x̄2 ; F1 + F2 ) = N (x̄1 ; F1 ) ∩ N (x̄2 ; F2 ).

Proof. (i) We first establish the result for the normal cone and then use it
to derive the result for the tangent cone. Suppose that di ∈ NFi (x̄), i = 1, 2,
which implies that

hdi , xi − x̄i ≤ 0, ∀ xi ∈ Fi , i = 1, 2.

For any x ∈ F1 ∩ F2 , the above inequality is still valid for i = 1, 2. Therefore,

hd1 + d2 , x − x̄i ≤ 0, ∀ x ∈ F1 ∩ F2 ,

which implies that d1 + d2 ∈ NF1 ∩F2 (x̄). Because di ∈ NFi (x̄), i = 1, 2, were
arbitrarily chosen, NF1 (x̄) + NF2 (x̄) ⊂ NF1 ∩F2 (x̄).
By Propositions 2.31 (i), (v), and 2.37,

TF1 ∩F2 (x̄) ⊂ (NF1 (x̄) + NF2 (x̄))◦ = TF1 (x̄) ∩ TF2 (x̄),

as desired. We shall prove the equality part as an application of the subdiffer-


ential sum rule, Theorem 2.91.
(ii) Suppose that d = (d1 , d2 ) ∈ NF1 ×F2 (x̄1 , x̄2 ), which implies

h(d1 , d2 ), (x1 , x2 ) − (x̄1 , x̄2 )i ≤ 0, ∀ (x1 , x2 ) ∈ F1 × F2 ,

that is,

hd1 , x1 − x̄1 i + hd2 , x2 − x̄2 i ≤ 0, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 .

The above inequality holds in particular for x2 = x̄2 , thereby reducing it to

hd1 , x1 − x̄1 i ≤ 0, ∀ x1 ∈ F1 ,


which by Definition 2.36 implies that d1 ∈ NF1 (x̄1 ). Similarly, it can be


shown that d2 ∈ NF2 (x̄2 ). Because (d1 , d2 ) ∈ NF1 ×F2 (x̄1 , x̄2 ) was arbitrary,
NF1 ×F2 (x̄1 , x̄2 ) ⊂ NF1 (x̄1 ) × NF2 (x̄2 ).
Conversely, consider d1 ∈ NF1 (x̄1 ) and d2 ∈ NF2 (x̄2 ), which implies that
(d1 , d2 ) ∈ NF1 (x̄1 ) × NF2 (x̄2 ). Therefore,
hdi , xi − x̄i i ≤ 0, ∀ xi ∈ Fi , i = 1, 2,
which leads to
h(d1 , d2 ), (x1 , x2 ) − (x̄1 , x̄2 )i ≤ 0, ∀ (x1 , x2 ) ∈ F1 × F2 ,
thereby yielding that (d1 , d2 ) ∈ NF1 ×F2 (x̄1 , x̄2 ). As di ∈ NFi (x̄i ), i = 1, 2,
were arbitrary, NF1 ×F2 (x̄1 , x̄2 ) ⊃ NF1 (x̄1 ) × NF2 (x̄2 ), thereby leading to the
desired result. The result on the tangent cone can be obtained by applying
Propositions 2.31 (iv) and 2.37.
(iii) Suppose that d ∈ NF1 +F2 (x̄1 + x̄2 ), which leads to
hd, x1 − x̄1 i + hd, x2 − x̄2 i ≤ 0, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 .
In particular, for x2 = x̄2 , the above inequality reduces to
hd, x1 − x̄1 i ≤ 0, ∀ x1 ∈ F1 ,
that is, d ∈ NF1 (x̄1 ). Similarly, d ∈ NF2 (x̄2 ). Because d ∈ NF1 +F2 (x̄1 + x̄2 ) was
arbitrary, NF1 +F2 (x̄1 + x̄2 ) ⊂ NF1 (x̄1 ) ∩ NF2 (x̄2 ).
Conversely, consider d ∈ NF1 (x̄1 ) ∩ NF2 (x̄2 ). Therefore,
hd, xi − x̄i i ≤ 0, ∀ xi ∈ Fi , i = 1, 2,
which implies that
hd, (x1 + x2 ) − (x̄1 + x̄2 )i ≤ 0, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 .
This leads to d ∈ NF1 +F2 (x̄1 + x̄2 ). As d ∈ NF1 (x̄1 ) ∩ NF2 (x̄2 ) was arbitrary,
NF1 (x̄1 ) ∩ NF2 (x̄2 ) ⊂ NF1 +F2 (x̄1 + x̄2 ), thus establishing the desired result.
The result on tangent cone can be obtained by applying Propositions 2.31 (vi)
and 2.37. 

2.2.5 Polyhedral Sets


As discussed in the beginning, finite intersection of closed half spaces generate
a class of convex sets known as the polyhedral sets. Here, we discuss briefly
this class of sets.
Definition 2.40 A set P ⊂ Rn is said to be a polyhedral set if it is nonempty
and is expressed as
P = {x ∈ Rn : hai , xi ≤ bi , i = 1, 2, . . . , m},
where ai ∈ Rn and bi ∈ R for i = 1, 2, . . . , m. Obviously, P is a convex set.


Polyhedral sets play an important role in the study of linear programming


problems. A polyhedral set can also be considered a finite intersection of
closed half spaces and hyperplane. Any hyperplane ha, xi = b can be further
segregated into two half spaces, ha, xi ≤ b and h−a, xi ≤ −b, and thus can be
expressed as the form in the definition. If in the above definition of polyhedral
sets, bi = 0, i = 1, 2, . . . , m, we get the notion of polyhedral cones.

Definition 2.41 A polyhedral set P is a polyhedral cone if and only if it can


be expressed as the intersection of finite collection of closed half spaces whose
supporting hyperplane pass through the origin. Equivalently, the polyhedral
cone P is given by

P = {x ∈ Rn : hai , xi ≤ 0, i = 1, 2, . . . , m},

where ai ∈ Rn for i = 1, 2, . . . , m.

Next we state some operations on the polyhedral sets and cones. For proofs,
the readers are advised to refer to Rockafellar [97].

Proposition 2.42 (i) Consider a polyhedral set (cone) P ⊂ Rn , a polyhedral set (cone) Q ⊂ Rm , and a linear transformation A : Rn → Rm . Then A(P ) as well as A−1 (Q) are polyhedral sets (cones).
(ii) Consider polyhedral sets (cones) Fi ⊂ Rni , i = 1, 2, . . . , m. Then the
Cartesian product F1 × F2 × . . . × Fm is a polyhedral set (cone).
(iii) Consider polyhedral sets (cones) Fi ⊂ Rn , i = 1, 2, . . . , m. Then the intersection ∩_{i=1}^{m} Fi and the sum Σ_{i=1}^{m} Fi are also polyhedral sets (cones).

With the notion of polyhedral sets, another concept that comes into the
picture is that of a finitely generated set.

Definition 2.43 A set F ⊂ Rn is a finitely generated set if and only if there


exist xi ∈ Rn , i = 1, 2, . . . , m, such that for a fixed integer j, 0 ≤ j ≤ m, F is
given by
F = {x ∈ Rn : x = Σ_{i=1}^{j} λi xi + Σ_{i=j+1}^{m} λi xi , λi ≥ 0, i = 1, 2, . . . , m, Σ_{i=1}^{j} λi = 1},

where {x1 , x2 , . . . , xm } are the generators of the set. For a finitely generated
cone, it is the same set with j = 0 and then {x1 , x2 , . . . , xm } are the generators
of the cone.


Below we mention some characterizations and properties of polyhedral sets


and finitely generated cones. The results are stated without proofs. For more
details on polyhedral sets, one can refer to Bertsekas [12], Rockafellar [97],
and Wets [111].

Proposition 2.44 (i) A set (cone) is polyhedral if and only if it is finitely


generated.
(ii) The polar of a polyhedral convex set is polyhedral.
(iii) Let x1 , x2 , . . . , xm ∈ Rn . Then the finitely generated cone

F = cone co{x1 , x2 , . . . , xm }

is closed and its polar cone is a polyhedral cone given by

F ◦ = {x ∈ Rn : hxi , xi ≤ 0, i = 1, 2, . . . , m}.

With all these background on convex sets, we move on to the study of


convex functions.

2.3 Convex Functions


We devote this section to the study of convex functions and their properties.
We also look into some special class of convex functions, namely the sublinear
functions. We begin by formally defining the convex functions. But before
that, let us recall some notions.

Definition 2.45 Consider a function φ : Rn → R̄. The domain of the func-


tion φ is defined as

dom φ = {x ∈ Rn : φ(x) < +∞}.

The epigraph of the function φ is given by

epi φ = {(x, α) ∈ Rn × R : φ(x) ≤ α}.

Observe that the notion of epigraph involves the domain points only. The function is proper if φ(x) > −∞ for every x ∈ Rn and dom φ is nonempty. A function that is not proper is said to be improper; in particular, φ is improper if there exists x̂ ∈ Rn such that φ(x̂) = −∞ or if φ ≡ +∞.

Definition 2.46 A function φ : Rn → R̄ is said to be convex if for any


x, y ∈ Rn and λ ∈ [0, 1] we have

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y).


FIGURE 2.6: Graph and epigraph of convex function.

If φ is a convex function, then the function ψ : Rn → R̄ defined as ψ = −φ is said to be a concave function.

Definition 2.47 A function φ : Rn → R̄ is said to be strictly convex if for


distinct x, y ∈ Rn and λ ∈ (0, 1) we have

φ((1 − λ)x + λy) < (1 − λ)φ(x) + λφ(y).

The proposition given below is an equivalent characterization of a convex


set in terms of its epigraph mentioned in Chapter 1.

Proposition 2.48 Consider a proper function φ : Rn → R̄. φ is convex if


and only if epi φ is a convex set on Rn × R.

Proof. Suppose φ is convex. Consider (xi , αi ) ∈ epi φ, i = 1, 2, which implies


that φ(xi ) ≤ αi , i = 1, 2. This along with the convexity of φ yields that for
every λ ∈ [0, 1],

φ((1 − λ)x1 + λx2 ) ≤ (1 − λ)φ(x1 ) + λφ(x2 ) ≤ (1 − λ)α1 + λα2 .

Thus, ((1 − λ)x1 + λx2 , (1 − λ)α1 + λα2 ) ∈ epi φ. Because (xi , αi ), i = 1, 2, were arbitrary, this leads to the convexity of epi φ in Rn × R.
Conversely, suppose that epi φ is convex. Consider x1 , x2 ∈ dom φ. It is
obvious that (xi , φ(xi )) ∈ epi φ, i = 1, 2. By the convexity of epi φ, for every
λ ∈ [0, 1],

(1 − λ)(x1 , φ(x1 )) + λ(x2 , φ(x2 )) ∈ epi φ,

which implies that

φ((1 − λ)x1 + λx2 ) ≤ (1 − λ)φ(x1 ) + λφ(x2 ), ∀ λ ∈ [0, 1],

thereby implying the convexity of φ, and thus establishing the result. 


Figure 2.6 represents the graph and epigraph of a convex function. Observe
that the epigraph is a convex set. Another alternate characterization of the
convex function is in terms of the strict epigraph set. So next we state the
notion of strict epigraph and present the equivalent characterization.

Definition 2.49 The strict epigraph of the function φ is given by

epis φ = {(x, α) ∈ Rn × R : φ(x) < α}.

Proposition 2.50 Consider a proper function φ : Rn → R̄. φ is convex if


and only if epis φ is a convex set on Rn × R.

Proof. The necessary part, that is, the convexity of φ implies that epi s φ is
convex, can be worked along the lines of proof of Proposition 2.48.
Conversely, suppose that epis φ is convex. Consider x1 , x2 ∈ dom φ and
αi ∈ R, i = 1, 2 such that φ(xi ) < αi , i = 1, 2. Therefore, (xi , αi ) ∈ epis φ,
i = 1, 2. By the convexity of epis φ, for every λ ∈ [0, 1],

(1 − λ)(x1 , α1 ) + λ(x2 , α2 ) ∈ epis φ,

which implies that

φ((1 − λ)x1 + λx2 ) < (1 − λ)α1 + λα2 , ∀ λ ∈ [0, 1].

As the above inequality holds for every αi > φ(xi ), i = 1, 2, taking the limit
as αi → φ(xi ), i = 1, 2, the above condition becomes

φ((1 − λ)x1 + λx2 ) ≤ (1 − λ)φ(x1 ) + λφ(x2 ), ∀ λ ∈ [0, 1].

Because x1 and x2 were arbitrarily chosen, the above inequality leads to the
convexity of φ and hence the result. 
The definitions presented above are for extended-valued functions. These
definitions can also be given for a function to be convex over a convex set
F ⊂ Rn as φ is convex when dom φ is restricted to F . However, in this book,
we will be considering real-valued functions unless otherwise specified.
Next we state Jensen’s inequality for the proper convex functions. The
proof can be worked out using the induction and the readers are advised to
do so.

Theorem 2.51 (Jensen’s Inequality) Consider a proper function φ : Rn → R̄. Let xi ∈ dom φ and λi ≥ 0 for i = 1, 2, . . . , m with Σ_{i=1}^{m} λi = 1. Then φ is convex if and only if

φ( Σ_{i=1}^{m} λi xi ) ≤ Σ_{i=1}^{m} λi φ(xi )

for every such collection of xi and λi , i = 1, 2, . . . , m.
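A quick numerical check of the inequality (a Python sketch of ours, using the convex function φ(x) = kxk2 and randomly generated data) is given below.

```python
import numpy as np

# Numerical check (ours) of Jensen's inequality for the convex function
# phi(x) = ||x||^2: phi of a convex combination never exceeds the same
# convex combination of the function values.
rng = np.random.default_rng(1)
phi = lambda x: float(x @ x)

xs = rng.normal(size=(5, 3))             # five points x_i in R^3
lam = rng.random(5)
lam /= lam.sum()                         # weights lambda_i >= 0 summing to 1

lhs = phi(lam @ xs)                      # phi(sum_i lambda_i x_i)
rhs = sum(l * phi(x) for l, x in zip(lam, xs))   # sum_i lambda_i phi(x_i)
print(lhs <= rhs + 1e-12)                # True
```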


Consider a set F ⊂ Rn . The indicator function, δF : Rn → R̄, to the set F


is defined as

δF (x) = 0 if x ∈ F , and δF (x) = +∞ otherwise.

It can be easily shown that δF is lsc and convex if and only if F is closed and
convex, respectively. Also, for sets F1 , F2 ⊂ Rn ,

δF1 ∩F2 (x) = δF1 (x) + δF2 (x).

The indicator function plays an important role in the study of optimality


conditions by converting a constrained problem into an unconstrained one.
Consider a constrained programming problem

min f (x) subject to x ∈ C,

where f : Rn → R and C ⊂ Rn . Then the associated unconstrained problem


is

min f0 (x) subject to x ∈ Rn ,

where f0 : Rn → R̄ is a function given by f0 (x) = f (x) + δC (x), that is,



f0 (x) = f (x) if x ∈ C, and f0 (x) = +∞ otherwise.

We will look into this aspect more when we study the derivations of optimality
condition for the convex programming problem (CP ) presented in Chapter 1
in the subsequent chapters.
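The reformulation can be illustrated with a small Python sketch (ours, with made-up data): a grid search on f0 = f + δC recovers the constrained minimizer of f over C.

```python
import numpy as np

# Sketch (ours): minimizing f over C is the same as minimizing f0 = f + delta_C
# over all of R.  Here f(x) = (x - 2)^2 and C = [-1, 1], so the constrained
# minimizer is x = 1, which the grid search below recovers approximately.
f = lambda x: (x - 2.0) ** 2
delta_C = lambda x: 0.0 if -1.0 <= x <= 1.0 else np.inf
f0 = lambda x: f(x) + delta_C(x)

grid = np.linspace(-3.0, 3.0, 6001)
values = np.array([f0(x) for x in grid])
print(grid[np.argmin(values)])   # approximately 1.0
```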
For a set F ⊂ Rn , the distance function, dF : Rn → R, to F from a point
x̄ is defined as

dF (x̄) = inf{kx − x̄k : x ∈ F }.

For a convex set F , the distance function dF is a convex function. If the


infimum is attained, say at x̃ ∈ F , that is,

inf{kx − x̄k : x ∈ F } = kx̃ − x̄k,

then x̃ is said to be a projection of x̄ to F and denoted by projF (x̄). Below


we present an important result on projection.

Proposition 2.52 Consider a closed convex set F ⊂ Rn and x̄ ∈ Rn . Then


x̃ ∈ projF (x̄) if and only if

hx̄ − x̃, x − x̃i ≤ 0, ∀ x ∈ F. (2.12)


Proof. Suppose that the inequality (2.12) holds for x̃ ∈ F and x̄ ∈ Rn . For
any x ∈ F , consider
kx − x̄k2 = kx − x̃k2 + kx̃ − x̄k2 − 2hx̄ − x̃, x − x̃i
≥ kx̃ − x̄k2 − 2hx̄ − x̃, x − x̃i, ∀ x ∈ F.
Because (2.12) is assumed to hold, the above condition leads to
kx − x̄k2 ≥ kx̃ − x̄k2 , ∀ x ∈ F,
thereby implying that x̃ ∈ projF (x̄).
Conversely, suppose that x̃ ∈ projF (x̄). Consider any x ∈ F and for
α ∈ [0, 1], define
xα = (1 − α)x̃ + αx ∈ F.
Therefore,
kx̄ − xα k2 = k(1 − α)(x̄ − x̃) + α(x̄ − x)k2
= (1 − α)2 kx̄ − x̃k2 + α2 kx̄ − xk2 + 2α(1 − α)hx̄ − x̃, x̄ − xi.
Observe that as a function of α, kx̄ − xα k2 has a point of minimizer over [0, 1]
at α = 0. Thus,
∇α {kx̄ − xα k2 } |α=0 ≥ 0,
which implies

2(−kx̄ − x̃k2 + hx̄ − x, x̄ − x̃i) ≥ 0.
The above inequality leads to
−hx̄ − x̃, x̄ − x̃i + hx̄ − x, x̄ − x̃i = hx̄ − x̃, x̃ − xi ≥ 0, ∀ x ∈ F,
thereby yielding (2.12) and hence completing the proof. 
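The characterization (2.12) is also easy to verify numerically. The following Python sketch (ours) uses the well-known closed form of the projection onto the closed unit ball, projF (x̄) = x̄ / max{1, kx̄k}, and checks the inequality on sample points of F.

```python
import numpy as np

# Sketch (ours) verifying the characterization (2.12) for the closed unit ball
# F = {x : ||x|| <= 1}, using the closed-form projection x_bar / max(1, ||x_bar||).
rng = np.random.default_rng(2)
x_bar = np.array([3.0, -4.0])
x_tilde = x_bar / max(1.0, np.linalg.norm(x_bar))    # projection of x_bar onto F

# sample points of F and check <x_bar - x_tilde, x - x_tilde> <= 0 on all of them
pts = rng.normal(size=(1000, 2))
pts /= np.maximum(1.0, np.linalg.norm(pts, axis=1, keepdims=True))
lhs = (pts - x_tilde) @ (x_bar - x_tilde)
print(bool(np.all(lhs <= 1e-10)))        # True
```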
Another class of functions that is also convex in nature are the sublinear
functions and support functions. These classes of functions will be discussed
in the next subsection. But before that we present some operations on the
convex functions that again belong to the class of convex functions itself.
Proposition 2.53 (i) Consider proper convex functions φi : Rn → R̄ and αi ≥ 0, i = 1, 2, . . . , m. Then φ = Σ_{i=1}^{m} αi φi is also a convex function.
(ii) Consider a proper convex function φ : Rn → R̄ and a nondecreasing
proper convex function ψ : R → R̄. Then the composition function defined as
(ψ ◦ φ)(x) = ψ(φ(x)) is a convex function provided ψ(+∞) = +∞.
(iii) Consider a family of proper convex functions φi : Rn → R̄, i ∈ I, where
I is an arbitrary index set. Then φ = supi∈I φi is a convex function.
(iv) Consider a convex set F ⊂ Rn+1 . Then φ(x) = inf{α ∈ R : (x, α) ∈ F }
is convex.


Proof. (i) By Definition 2.46 of convexity, for any x, y ∈ Rn and any λ ∈ [0, 1],

φi ((1 − λ)x + λy) ≤ (1 − λ)φi (x) + λφi (y), i = 1, 2, . . . , m.

As αi ≥ 0, i = 1, 2, . . . , m, multiplying the above inequality by αi and adding


them leads to
Σ_{i=1}^{m} αi φi ((1 − λ)x + λy) ≤ (1 − λ) Σ_{i=1}^{m} αi φi (x) + λ Σ_{i=1}^{m} αi φi (y),

thereby yielding the convexity of Σ_{i=1}^{m} αi φi .
(ii) By the convexity of φ, for every x, y ∈ Rn and for every λ ∈ [0, 1],

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y).

Because ψ is nondecreasing convex function, for every x, y ∈ Rn ,

ψ(φ((1 − λ)x + λy)) ≤ ψ((1 − λ)φ(x) + λφ(y))


≤ (1 − λ)ψ(φ(x)) + λψ(φ(y)), ∀ λ ∈ [0, 1].

Thus, (ψ ◦ φ) is a convex function.


(iii) Observe that epi φ = ∩_{i∈I} epi φi , which on applying Proposition 2.3 (i)
leads to the convexity of epi φ. Now invoking Proposition 2.48 yields the
convexity of φ.
(iv) Consider x1 , x2 ∈ dom φ and any arbitrary ε > 0. By the definition of φ as an infimum, there exist α1 , α2 ∈ R with (xi , αi ) ∈ F , i = 1, 2, such that

αi ≤ φ(xi ) + ε, i = 1, 2.

By the convexity of F , for any λ ∈ [0, 1],

φ((1 − λ)x1 + λx2 ) ≤ (1 − λ)α1 + λα2 ≤ (1 − λ)φ(x1 ) + λφ(x2 ) + ε.

Because the above condition holds for every ε > 0, taking the limit as ε → 0,
the above condition reduces to

φ((1 − λ)x1 + λx2 ) ≤ (1 − λ)φ(x1 ) + λφ(x2 ), ∀ λ ∈ [0, 1],

thereby leading to the convexity of φ. 


The proof of (iv) is from Hiriart-Urruty and Lemaréchal [63]. These prop-
erties play an important role in convex analysis. From the earlier discussions
we have that a constrained problem can be equivalently expressed as an uncon-
strained problem using the indicator function. Under the convexity assump-
tions as in the convex programming problem (CP ) and using (i) of the above
proposition, one has that f0 (x) = (f + δC )(x) is a convex function, thereby
reducing (CP ) to an unconstrained convex programming problem that then


leads to the KKT optimality conditions under some assumptions, as we shall


see in Chapter 3.
The property (ii) of Proposition 2.53 leads to the formulation of conjugate
functions. We will discuss this class of functions later in this chapter as it will
also play a pivotal role in the study of convex optimization theory.
Next we define infimal convolution or simply inf-convolution on convex
functions. The motivation for this operation comes from the sum of epigraph
and the infimum operation as in (iv) of the above proposition. Consider two
proper convex functions φi : Rn → R̄, i = 1, 2. Then by Proposition 2.3 (ii),
the set F = epi φ1 +epi φ2 is a convex set in Rn ×R. Explicitly, F is expressed
as

F = {(x1 + x2 , α1 + α2 ) ∈ Rn × R : (xi , αi ) ∈ epi φi , i = 1, 2}.

Then by (iv) of Proposition 2.53, the function

φ(x) = inf{α1 + α2 : (x1 + x2 , α1 + α2 ) ∈ F, x1 + x2 = x}

is a convex function. This function φ can be reduced to the form known as


the inf-convolution of φ1 and φ2 as defined below.

Definition 2.54 Consider proper convex functions φi : Rn → R̄,


i = 1, 2. Then the infimal convolution or inf-convolution of φ1 and φ2
is denoted by φ1  φ2 : Rn → R̄ and defined as

(φ1  φ2 )(x̄) = inf{φ1 (x1 ) + φ2 (x2 ) : xi ∈ Rn , i = 1, 2, x1 + x2 = x̄}


= inf{φ1 (x) + φ2 (x̄ − x) : x ∈ Rn }.

A simple consequence for the inf-convolution is the distance function. Con-


sider a convex set F ⊂ Rn . Then the distance function φ(x) = dF (x) can be
expressed as

φ(x) = (φ1  φ2 )(x),

where φ1 (x) = kxk and φ2 (x) = δF (x).
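This identity is easy to test numerically. The following Python sketch (ours) approximates the inf-convolution for F = [0, 1] by minimizing over a grid of F and recovers the usual distances.

```python
import numpy as np

# Sketch (ours): the distance function to F = [0, 1] realized as the
# inf-convolution of ||.|| and delta_F, i.e. d_F(x_bar) = inf{|x_bar - u| : u in F},
# approximated here by minimizing over a fine grid of F.
F_grid = np.linspace(0.0, 1.0, 1001)

def d_F(x_bar):
    return float(np.min(np.abs(x_bar - F_grid)))

for x_bar in (-0.5, 0.3, 2.0):
    print(x_bar, d_F(x_bar))   # distances 0.5, 0.0, 1.0 to the interval [0, 1]
```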


As it turns out, the inf-convolution of convex functions is again convex.
To verify this claim, we will need the following result on strict epigraph sum.
The proof appears in Moreau [89] but here we present its proof and that of
the proposition to follow from Attouch, Buttazzo, and Michaille [3].

Proposition 2.55 Consider two proper convex functions φi : Rn → R̄,


i = 1, 2. Then
epis (φ1  φ2 ) = epis φ1 + epis φ2 . (2.13)
Consequently,
cl epi (φ1  φ2 ) = cl (epi φ1 + epi φ2 ). (2.14)


Proof. Consider (x, α) ∈ epis (φ1  φ2 ), which implies

(φ1  φ2 )(x) < α.

The above inequality holds if and only if there exist x1 , x2 ∈ Rn with


x1 + x2 = x such that

φ1 (x1 ) + φ2 (x2 ) < α.

This is equivalent to the existence of x1 , x2 ∈ Rn and α1 , α2 ∈ R with


x1 + x2 = x and α1 + α2 = α such that φi (xi ) < αi , i = 1, 2, thereby es-
tablishing (2.13).
By Definition 2.49 of strict epigraph, it is obvious that for any function φ,

epis φ ⊂ epi φ ⊂ cl epis φ, so that cl epis φ = cl epi φ,

which along with the strict epigraph condition (2.13) implies that

cl epi (φ1  φ2 ) = cl epis (φ1  φ2 ) = cl (epis φ1 + epis φ2 ) ⊂ cl (epi φ1 + epi φ2 ). (2.15)

Now suppose that (xi , αi ) ∈ epi φi , i = 1, 2, which along with the definition
of inf-convolution implies that

(φ1  φ2 )(x1 + x2 ) ≤ φ1 (x1 ) + φ2 (x2 ) ≤ α1 + α2 .

Therefore, (x1 + x2 , α1 + α2 ) ∈ epi (φ1  φ2 ). Because (xi , αi ) ∈ epi φi ,


i = 1, 2, were arbitrary,

epi φ1 + epi φ2 ⊂ epi (φ1  φ2 ).

Taking closure on both sides of the above relation along with the condition
(2.15) yields the condition (2.14), as desired. 
Using this proposition, we now move on to show that the inf-convolution
of proper convex functions is also convex.

Proposition 2.56 Consider two proper convex functions φ1 , φ2 : Rn → R̄.


Then φ1  φ2 is also convex.

Proof. From Proposition 2.55,

epis (φ1  φ2 ) = epis φ1 + epis φ2 .

As φ1 and φ2 are convex functions, by Proposition 2.50, epis φ1 and epis φ2 are
convex sets. This along with the above condition implies that epis (φ1  φ2 ) is convex, which again by the characterization of convex functions, Proposition 2.50, leads to the convexity of φ1  φ2 .


An application of inf-convolution can be seen in the following property of


indicator function. For convex sets F1 and F2 in Rn , the indicator function of
the sum of the sets is

δF1 +F2 = δF1  δF2 .

The importance of inf-convolution will be discussed in the study of conjugate


functions later in this chapter. For more on inf-convolution, the readers may
refer to Strömberg [107].
As discussed, the inf-convolution is motivated by taking F = epi φ1 + epi φ2 ; similarly, the notion of the convex hull of a function is motivated by taking F = co epi φ. Below we define this concept.

Definition 2.57 The convex hull of a nonconvex function φ is denoted as


co φ and is obtained from Proposition 2.53 (iv) with F = co epi φ. Therefore,
by Theorem 2.7, (x, α) ∈ F if and only if there exist (xi , αi ) ∈ epi φ, λi ≥ 0, i = 1, 2, . . . , m, with Σ_{i=1}^{m} λi = 1 such that

(x, α) = λ1 (x1 , α1 ) + λ2 (x2 , α2 ) + . . . + λm (xm , αm )


= (λ1 x1 + λ2 x2 + . . . + λm xm , λ1 α1 + λ2 α2 + . . . + λm αm ).

Because φ(xi ) ≤ αi , i = 1, 2, . . . , m, Proposition 2.53 (iv) leads to

co φ(x) = inf{λ1 φ(x1 ) + λ2 φ(x2 ) + . . . + λm φ(xm ) ∈ R : λ1 x1 + λ2 x2 + . . . + λm xm = x, λi ≥ 0, i = 1, 2, . . . , m, Σ_{i=1}^{m} λi = 1}.

It is the greatest convex function majorized by φ. If φ is convex, co φ = φ. The convex hull of an arbitrary collection of functions {φi : i ∈ I} is denoted by co ∪_{i∈I} φi and is the convex hull of the pointwise infimum of the collection, that is,

co ∪_{i∈I} φi = co ( inf_{i∈I} φi ).

It is a function obtained from Proposition 2.53 (iv) by taking


F = co ( ∪_{i∈I} epi φi ).

It is the greatest convex function majorized by every φi , i ∈ I.


The closed convex hull of a function φ is denoted by cl co φ and defined as

cl co φ(x′ ) = sup{hξ, x′ i − α : hξ, xi − α ≤ φ(x), ∀ x ∈ Rn }.

Similar to closure of a function, cl co φ satisfies the condition

epi cl co φ = cl co epi φ.


For more details on the convex hull and the closed convex hull of a function,
readers are advised to refer to Hiriart-Urruty and Lemaréchal [62, 63].
Now before moving on with the properties of convex functions, we briefly
discuss an important class of convex functions, namely sublinear and support
functions, which as we will see later in the chapter are important in the study
of convex analysis.

2.3.1 Sublinear and Support Functions


Definition 2.58 A proper function p : Rn → R̄ is said to be a sublinear
function if and only if p is subadditive and positively homogeneous, that is,
p(x1 + x2 ) ≤ p(x1 ) + p(x2 ), ∀ x1 , x2 ∈ Rn (subadditive property)
p(λx) = λp(x), ∀ x ∈ Rn , ∀ λ > 0 (positively homogeneous property)

From the positive homogeneity property, p(0) = λp(0) for every λ > 0, which
is satisfied for p(0) = 0 as well as p(0) = +∞. Most sublinear functions satisfy
p(0) = 0. As p is proper, dom p is nonempty. So if p(x) < +∞, then by the
positive homogeneity property, p(tx) < +∞, which implies that dom p is a
cone. Observe that as p is positively homogeneous, for x, y ∈ Rn and any
λ ∈ (0, 1),

p((1 − λ)x) = (1 − λ)p(x) and p(λy) = λp(y).

By the subadditive property of p,

p((1 − λ)x + λy) ≤ p((1 − λ)x) + p(λy)


= (1 − λ)p(x) + λp(y), ∀ λ ∈ (0, 1).

The inequality holds as equality for λ = 0 and λ = 1. Because x, y ∈ Rn were


arbitrary, p is convex. Therefore, a sublinear function is a particular class of
convex functions and hence dom p is convex. Next we present a proposition
that gives the geometric characterization of sublinear functions. For the proof,
we will also need the equivalent form of positive homogeneity from Hiriart-
Urruty and Lemaréchal [63] according to which

p(λx) ≤ λp(x), ∀ x ∈ Rn , ∀ λ > 0.

Note that if p is positively homogeneous, then the above condition holds triv-
ially. Conversely, if the above inequality holds, then for any λ > 0,
p(x) = p(λ−1 λx) ≤ (1/λ) p(λx), ∀ x ∈ Rn ,
which along with the preceding inequality yields that p is positively homoge-
neous.

Theorem 2.59 Consider a proper function p : Rn → R̄. p is a sublinear


function if and only if its epigraph, epi p, is a convex cone in Rn × R.


Proof. Suppose that p is sublinear. From the above discussion, p is a convex


function as well and thus epi p is convex. Consider (x, α) ∈ epi p, which
implies that p(x) ≤ α. By the positively homogeneous property

p(λx) = λp(x) ≤ λα, λ > 0,

which implies that λ(x, α) = (λx, λα) ∈ epi p for every λ > 0. Also,
(0, 0) ∈ epi p. Thus, epi p is a cone.
Conversely, suppose that epi p is a convex cone. By Theorem 2.20, for any
(xi , αi ) ∈ epi p, i = 1, 2,

(x1 + x2 , α1 + α2 ) ∈ epi p.

In particular for αi = p(xi ), i = 1, 2, the above condition leads to

p(x1 + x2 ) ≤ p(x1 ) + p(x2 ).

Because x1 , x2 ∈ Rn are arbitrarily chosen, the above inequality implies that


p is subadditive. Also, as epi p is a cone, any (x, α) ∈ epi p implies that
λ(x, α) ∈ epi p for every λ > 0. In particular, for α = p(x),

p(λx) ≤ λp(x), ∀ λ > 0,

which is an equivalent definition for positive homogeneity, as discussed before.


Hence, p is a sublinear function. 
Sublinear functions are a particular class of convex functions. For a convex
cone K ⊂ Rn , the indicator function δK and the distance function dK are
also sublinear functions. An important class of sublinear functions is that of
support functions. We will discuss the support functions in brief.

Definition 2.60 Consider a set F ⊂ Rn . The support function, σF : Rn → R̄,


to F at x̄ ∈ Rn is defined as

σF (x̄) = sup{hx̄, xi : x ∈ F }.

From Proposition 1.7 (ii) and (iii), it is obvious that a support function
is sublinear. As it is the supremum of linear functions that are continuous,
support functions are lsc. For a closed convex cone K ⊂ Rn ,

σK (x̄) = 0 if hx̄, xi ≤ 0 for all x ∈ K, and σK (x̄) = +∞ otherwise,
which is nothing but the indicator function of the polar cone K ◦ . Equivalently,

σ K = δK ◦ and δK = σ K ◦ .
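For a finite set F , the supremum in Definition 2.60 is a maximum over the points of F (and, by Proposition 2.61 (ii) below, this also gives the support function of co F ). The following Python sketch (ours, with made-up data) computes such a support function and checks its sublinearity numerically.

```python
import numpy as np

# Sketch (ours): support function of a finite set F, computed as a maximum
# over its points; subadditivity and positive homogeneity are then checked.
F = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])

def sigma(x_bar):
    return float(np.max(F @ x_bar))

x1, x2 = np.array([2.0, 1.0]), np.array([-1.0, 3.0])
print(sigma(x1 + x2) <= sigma(x1) + sigma(x2))        # True: subadditivity
print(np.isclose(sigma(5.0 * x1), 5.0 * sigma(x1)))   # True: positive homogeneity
```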

Next we present some properties of the support functions, the proofs of


which are from Burke and Deng [22], Hiriart-Urruty and Lemaréchal [63], and
Rockafellar [97].


Proposition 2.61 (i) Consider two convex sets F1 and F2 in Rn . Then

F1 ⊂ F2 =⇒ σF1 (x) ≤ σF2 (x), ∀ x ∈ Rn .

(ii) For a set F ⊂ Rn , one has

σF = σcl F = σco F = σcl co F .

(iii) Consider a convex set F ⊂ Rn . Then x̄ ∈ cl F if and only if

hx∗ , x̄i ≤ σF (x∗ ), ∀ x∗ ∈ Rn .

(iv) For convex sets F1 , F2 ⊂ Rn , cl F1 ⊂ cl F2 if and only if

σF1 (x∗ ) ≤ σF2 (x∗ ), ∀ x∗ ∈ Rn .

(v) Let F1 , F2 ⊂ Rn be convex sets and K ⊂ Rn be a closed convex cone. Then

σF1 (x) ≤ σF2 (x), ∀ x ∈ K ⇐⇒ σF1 (x) ≤ σF2 +K ◦ (x), ∀ x ∈ Rn


⇐⇒ F1 ⊂ cl(F2 + K ◦ ).

(vi) The support function of a set F ⊂ Rn is finite everywhere if and only if


F is bounded.

Proof. (i) By Proposition 1.7 (i), it is obvious that for F1 ⊂ F2 ,

sup{hx, x1 i : x1 ∈ F1 } ≤ sup{hx, x2 i : x2 ∈ F2 }, ∀ x ∈ Rn ,

thereby leading to the desired result.


(ii) As hx, .i is linear and hence continuous over Rn , then on taking supremum
over F ,

σF (x) = σcl F (x), ∀ x ∈ Rn .

Because F ⊂ co F , by (i),

σF (x) ≤ σco F (x), ∀ x ∈ Rn .

Also, for any x′ ∈ co F , by the Carathéodory Theorem, Theorem 2.8, there exist x′i ∈ F , λi ≥ 0, i = 1, 2, . . . , n + 1, satisfying Σ_{i=1}^{n+1} λi = 1 such that x′ = Σ_{i=1}^{n+1} λi x′i . Therefore,

hx, x′ i = Σ_{i=1}^{n+1} λi hx, x′i i ≤ Σ_{i=1}^{n+1} λi σF (x) = σF (x).

Because x′ ∈ co F was arbitrary, the above inequality holds for every x′ ∈ co F


and hence

σco F (x) ≤ σF (x), ∀ x ∈ Rn ,


thus yielding the equality as desired. These relations also imply that
σF = σcl co F .
(iii) Invoking Theorem 2.27, the desired result holds.
(iv) By (i) and (ii), cl F1 ⊂ cl F2 implies that
σF1 (x∗ ) ≤ σF2 (x∗ ), ∀ x∗ ∈ Rn .
Conversely, suppose that the above inequality holds, which implies for
every x ∈ cl F1 ,
hx∗ , xi ≤ σF2 (x∗ ), ∀ x∗ ∈ Rn .
Therefore, by (iii), x ∈ cl F2 . Because x ∈ cl F1 was arbitrary, cl F1 ⊂ cl F2 ,
thereby completing the proof.
(v) Consider x ∈ K. As F2 ⊂ F2 + K ◦ , (i) along with Proposition 1.7 and the definition of the polar cone leads to

σF2 (x) ≤ σF2 +K ◦ (x) = σF2 (x) + σK ◦ (x) ≤ σF2 (x),

that is, σF2 (x) = σF2 +K ◦ (x) for x ∈ K. Now if x ∉ K, there exists z ∈ K ◦ such that hz, xi > 0. Consider y ∈ F2 . Therefore, taking the limit as λ → +∞, hy + λz, xi → +∞, which implies σF2 +K ◦ (x) = +∞, thus establishing the first equivalence. The second equivalence can be obtained by (ii) and (iv).
(vi) Suppose that F is bounded, which implies that there exists M > 0 such
that
kx′ k ≤ M, ∀ x′ ∈ F.
Therefore, by the Cauchy–Schwarz Inequality, Proposition 1.1,
hx, x′ i ≤ kxkkx′ k ≤ kxkM, ∀ x′ ∈ F,
which implies that σF (x) ≤ kxkM for every x ∈ Rn . Thus, σF is finite every-
where.
Conversely, suppose that σF is finite everywhere. In the next section, we
will present a result establishing the local Lipschitz property and hence con-
tinuity of the convex function, Theorem 2.72. This leads to the local bound-
edness. Therefore there exists M such that
hx, x′ i ≤ σF (x) ≤ M, ∀ (x, x′ ) ∈ B × F.
If x′ ≠ 0, taking x = x′ /kx′ k, the above inequality leads to kx′ k ≤ M for every
x′ ∈ F , thereby establishing the boundedness of F and hence proving the
result. 
As mentioned earlier, the support function is lsc and sublinear. Conversely,
a closed sublinear function can be viewed as a support function. We end
this subsection by presenting this important result to assert the preceding
statement. The proof is again due to Hiriart-Urruty and Lemaréchal [63].


Theorem 2.62 For a proper lsc sublinear function σ : Rn → R̄, there exists a
linear function minorizing σ. In fact, σ is the supremum of the linear function
minorizing it; that is, σ is the support function of the closed convex set given
by

Fσ = {x ∈ Rn : hx, di ≤ σ(d), ∀ d ∈ Rn }.

Proof. Because sublinear functions are convex, σ is a proper lsc convex func-
tion. As we will discuss in one of the later sections, every proper lsc convex
function can be represented as a pointwise supremum of affine functions ma-
jorized by it, Theorem 2.100 and there exists (x, α) ∈ Rn × R such that

hx, di − α ≤ σ(d), ∀ d ∈ Rn .

As σ(0) = 0, the preceding inequality leads to α ≥ 0. By the positive homo-


geneity of σ,
hx, di − α/λ ≤ σ(d), ∀ d ∈ Rn , ∀ λ > 0.
Taking the limit as λ → +∞,

hx, di ≤ σ(d), ∀ d ∈ Rn ,

that is, σ is minorized by linear functions.


As mentioned in the beginning, convex functions are supremum of affine
functions, which for sublinear functions can be restricted to linear functions.
Therefore, by Theorem 2.100,

σ(d) = sup{hx, di : x ∈ Fσ }

and hence σ is the support function of Fσ . 


After discussing these classes of convex functions, we move on to discuss
the nature of convex functions.

2.3.2 Continuity Property


We have already discussed the operations that preserve convexity of the func-
tions. Now we shall study the continuity, Lipschitzian and differentiability
properties of the function. But before doing so, let us recall proper functions.
A function φ : Rn → R̄ is proper if φ(x) > −∞ for every x ∈ Rn and
dom φ is nonempty, that is, epi φ is nonempty and contains no vertical lines.
A function that is not proper is called an improper function. We know that
for a convex function, the epigraph is a convex set. If φ is an improper convex
function such that there exists x̄ ∈ ri dom φ such that φ(x̄) = −∞, then the
convexity of epi φ is broken unless φ(x) = −∞ for every x ∈ ri dom φ. Such


FIGURE 2.7: Epigraphs of improper functions φ1 and φ2 .

functions can however have finite values at the boundary points. For example,
consider φ1 : R → R̄ given by

φ1 (x) = −∞ if |x| < 1, 0 if |x| = 1, and +∞ if |x| > 1.

Here, φ1 is an improper convex function with finite values at boundary points


of the domain, x = −1 and x = 1. Also it is obvious that a convex function φ cannot have a finite value on ri dom φ and the value −∞ at a boundary point. For better understanding, suppose that x ∈ ri dom φ such that φ(x) > −∞ and let y be a boundary
point of dom φ with φ(y) = −∞. By the convexity of φ,

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y), ∀ λ ∈ (0, 1),

which implies that for (1 − λ)x + λy ∈ ri dom φ,

φ((1 − λ)x + λy) = −∞.

This contradicts the convexity of the epigraph. This aspect can be easily
visualized by modifying the previous example as follows. Define an improper
function φ2 : R → R̄ as

φ2 (x) = −∞ if x = 1, 0 if −1 ≤ x < 1, and +∞ if |x| > 1.

Obviously φ2 cannot be convex as epi φ2 is not convex as in Figure 2.7. These


discussions can be stated as the following result from Rockafellar [97].


Proposition 2.63 Consider an improper convex function φ : Rn → R̄. Then


φ(x) = −∞ for every x ∈ ri dom φ. Thus φ is necessarily infinite except
perhaps at the boundary point of dom φ. Moreover, an lsc improper convex
function can have no finite values.

As discussed in Chapter 1, the continuity of a function plays an important


role in the study of its bounds and hence in optimization problems. Before
discussing the continuity property of convex functions we shall present some
results on interior of the epigraph of a convex function and closure of a convex
function.

Proposition 2.64 Consider a convex function φ : Rn → R̄ such that


ri dom φ is nonempty. Then ri epi φ is also nonempty and given by

ri epi φ = {(x, α) ∈ Rn × R : x ∈ ri dom φ, φ(x) < α}.

Equivalently, (x̄, ᾱ) ∈ ri epi φ if and only if ᾱ > lim sup_{x→x̄} φ(x).

Proof. To obtain the result for ri epi φ, it is sufficient to derive it for int epi φ,
that is,

int epi φ = {(x, α) ∈ Rn × R : x ∈ int dom φ, φ(x) < α}.

By Definition 2.12, for (x̄, ᾱ) ∈ int epi φ, there exists ε > 0 such that

(x̄, ᾱ) + εB ⊂ epi φ,

which implies that x̄ ∈ int dom φ along with φ(x̄) < ᾱ. As (x̄, ᾱ) ∈ int epi φ
is arbitrary,

int epi φ ⊂ {(x, α) ∈ Rn × R : x ∈ int dom φ, φ(x) < α}.

Now suppose that x̄ ∈ int dom φ and φ(x̄) < ᾱ. Consider


x1 , x2 , . . . , xm ∈ dom φ such that x̄ ∈ int F where F = co {x1 , x2 , . . . , xm }.
Define

γ = max{φ(x1 ), φ(x2 ), . . . , φ(xm )}.

By the convexity of F , for any x ∈ F there exist λi ≥ 0, i = 1, 2, . . . , m, satisfying Σ_{i=1}^{m} λi = 1 such that

x = Σ_{i=1}^{m} λi xi .

Because φ is convex,

φ(x) ≤ Σ_{i=1}^{m} λi φ(xi ) ≤ Σ_{i=1}^{m} λi γ = γ.


Therefore, the open set

{(x, α) ∈ Rn × R : x ∈ int F, γ < α} ⊂ epi φ.

In particular, for any α > γ, (x̄, α) ∈ int epi φ. Thus, (x̄, ᾱ) can be considered as lying in the interior of a line segment joining some point (x̄, α) ∈ int epi φ with α > γ and the point (x̄, φ(x̄)) ∈ epi φ, which by the line segment principle, Proposition 2.14, yields (x̄, ᾱ) ∈ int epi φ.

int epi φ ⊃ {(x, α) ∈ Rn × R : x ∈ int dom φ, φ(x) < α},

thereby leading to the requisite result.


Now we move on to prove the equivalent part for ri epi φ. Suppose that
(x̄, ᾱ) ∈ ri epi φ. Therefore, by the earlier characterization one can always
find an ε > 0 such that

x̄ ∈ ri dom φ and sup{φ(x) : x ∈ Bε (x̄)} < ᾱ.

Taking the limit as ε → 0 along with Definition 1.5 of limit supremum,

lim sup_{x→x̄} φ(x) < ᾱ.

Conversely, suppose that for (x̄, ᾱ) the strict inequality condition holds
which implies

lim_{ε↓0} sup{φ(x) : x ∈ Bε (x̄)} < ᾱ.

Therefore, there exists ε > 0 such that

sup{φ(x) : x ∈ Bε (x̄)} < ᾱ,

which yields φ(x̄) < ᾱ with x̄ ∈ ri dom φ, thereby proving the equivalent
result. Note that this equivalence can be established for int epi φ as well. 
Note that the above result can also be obtained for the relative interior of
the epigraph as it is nothing but the interior relative to the affine hull of the
epigraph. As a consequence of the above characterization of ri epi φ, we have the
following result from Rockafellar [97].

Corollary 2.65 Consider α ∈ R and φ : Rn → R̄ to be a proper convex


function such that for some x ∈ dom φ, φ(x) < α. Then actually φ(x) < α
for some x ∈ ri dom φ.

Proof. Consider the open half space H given by

H = {(x, µ) ∈ Rn × R : µ < α}.


Because for some x ∈ Rn , φ(x) < α, in particular for µ = φ(x), we have that
H meets epi φ. Invoking Corollary 2.16 (ii), H also meets ri epi φ, which by
Proposition 2.64 implies that there exists x ∈ ri dom φ such that φ(x) < α,
thereby yielding the desired result. 
Recall that in the previous chapter the closure of a function φ : Rn → R̄
was defined as

cl φ(x̄) = lim inf_{x→x̄} φ(x), ∀ x̄ ∈ Rn ,

which is a bit complicated to compute. In case of a convex function, it is much


easier to compute and is presented in the next proposition. The proof is from
Rockafellar [97].

Proposition 2.66 Consider a proper convex function φ : Rn → R̄. Then cl φ


agrees with φ in ri dom φ and for x̂ ∈ ri dom φ,

cl φ(x) = lim_{λ→1} φ((1 − λ)x̂ + λx), ∀ x ∈ Rn .

Proof. From Definition 1.11 of closure of a function, cl φ is lsc and cl φ ≤ φ.


Therefore, by the lower semicontinuity of cl φ,

lim inf_{λ→1} (cl φ)((1 − λ)x̂ + λx) = cl φ(x) ≤ lim inf_{λ→1} φ((1 − λ)x̂ + λx).

To prove the result, we will establish the following inequality

cl φ(x) ≥ lim sup_{λ→1} φ((1 − λ)x̂ + λx).

Consider α ∈ R such that cl φ(x) ≤ α, which implies that

(x, α) ∈ epi cl φ = cl epi φ.

Consider any (x̂, α̂) ∈ ri epi φ. Applying the Line Segment Principle, Propo-
sition 2.14,

(1 − λ)(x̂, α̂) + λ(x, α) ∈ ri epi φ, ∀ λ ∈ [0, 1).

By Proposition 2.64,

φ((1 − λ)x̂ + λx) < (1 − λ)α̂ + λα, ∀ λ ∈ [0, 1).

Taking the limit superior as λ → 1, the above inequality leads to

lim sup_{λ→1} φ((1 − λ)x̂ + λx) ≤ lim sup_{λ→1} [(1 − λ)α̂ + λα] = α.

In particular, taking α = cl φ(x) in the above inequality yields the desired


result.


In the relation

cl φ(x) = lim_{λ→1} φ((1 − λ)x̂ + λx),

in particular, taking x = x̂ ∈ ri dom φ leads to cl φ(x̂) = φ(x̂). Because


x̂ ∈ ri dom φ is arbitrary, cl φ = φ on ri dom φ, that is, cl φ agrees with φ in
ri dom φ. 
Next we present some results from Rockafellar [97] on closure and relative
interior.

Proposition 2.67 Consider a proper convex function φ : Rn → R̄ and let


α ∈ R such that α > inf{φ(x) : x ∈ Rn }. Then the level sets

{x ∈ Rn : φ(x) ≤ α} and {x ∈ Rn : φ(x) < α}

have the same closure and relative interior, namely

{x ∈ Rn : cl φ(x) ≤ α} and {x ∈ Rn : x ∈ ri dom φ, φ(x) < α},

respectively.

Proof. Define a hyperplane H = {(x, α) ∈ Rn × R : x ∈ Rn } in Rn+1 .


Applying Corollary 2.65 and Proposition 2.64, H intersects ri epi φ, which
implies that

ri H ∩ ri epi φ = H ∩ ri epi φ 6= ∅.

Now consider

H ∩ epi φ = {(x, α) ∈ Rn × R : φ(x) ≤ α}.

Invoking Corollary 2.16 (iii),

cl(H ∩ epi φ) = cl H ∩ cl epi φ = H ∩ epi cl φ, (2.16)


ri(H ∩ epi φ) = ri H ∩ ri epi φ = H ∩ ri epi φ. (2.17)

The projection of these sets in Rn are, respectively,

cl {x ∈ Rn : φ(x) ≤ α} = {x ∈ Rn : cl φ(x) ≤ α},


ri {x ∈ Rn : φ(x) ≤ α} = {x ∈ Rn : x ∈ ri dom φ, φ(x) < α}.

The latter relation implies that

ri {x ∈ Rn : φ(x) ≤ α} ⊂ {x ∈ Rn : φ(x) < α} ⊂ {x ∈ Rn : φ(x) ≤ α}.

Therefore, by Corollary 2.16 (ii), {x ∈ Rn : φ(x) < α} has the same closure
and relative interior as {x ∈ Rn : φ(x) ≤ α}. 


Proposition 2.68 Consider proper convex functions φi : Rn → R̄,


i = 1, 2, . . . , m. If every φi , i = 1, 2, . . . , m, is lsc and φ1 +φ2 +. . .+φm 6≡ +∞,
then φ1 + φ2 + . . . + φm is a proper lsc convex function. If φi , i = 1, 2, . . . , m,
are not all lsc but ri dom φ1 ∩ ri dom φ2 ∩ . . . ∩ ri dom φm is nonempty, then

cl (φ1 + φ2 + . . . + φm ) = cl φ1 + cl φ2 + . . . + cl φm .

Proof. Define φ = φ1 + φ2 + . . . + φm and assume


x̂ ∈ ri dom φ = ri ( ∩_{i=1}^{m} dom φi ).

By Proposition 2.66, for every x ∈ Rn ,


cl φ(x) = lim_{λ→1} φ((1 − λ)x̂ + λx) = lim_{λ→1} Σ_{i=1}^{m} φi ((1 − λ)x̂ + λx). (2.18)

If φi , i = 1, 2, . . . , m, are all lsc, then the above condition becomes

cl φ(x) = φ1 (x) + φ2 (x) + . . . + φm (x), ∀ x ∈ Rn

and thus cl φ = φ.
Suppose that φi , i = 1, 2, . . . , m, are not all lsc. If
∩_{i=1}^{m} ri dom φi ≠ ∅,

by Proposition 2.15 (iii),


∩_{i=1}^{m} ri dom φi = ri ( ∩_{i=1}^{m} dom φi ) = ri dom φ.

Therefore,

x̂ ∈ ri dom φi , i = 1, 2, . . . , m.

Again by Proposition 2.66,

cl φi (x) = lim_{λ→1} φi ((1 − λ)x̂ + λx), i = 1, 2, . . . , m.

Therefore, the condition (2.18) becomes

cl φ(x) = cl φ1 (x) + cl φ2 (x) + . . . + cl φm (x), ∀ x ∈ Rn ,

thereby completing the proof. 


Using the above propositions, one can prove the continuity property of the
convex functions.


Theorem 2.69 A proper convex function φ : Rn → R̄ is continuous on


ri dom φ.
Proof. By Proposition 2.66, cl φ agrees with φ in ri dom φ, which implies
that φ is lsc on ri dom φ. Now suppose that x̄ ∈ ri dom φ. For any α such
that (x̄, α) ∈ ri epi φ, by Proposition 2.64,
lim sup_{x→x̄} φ(x) < α.

Taking the limit as α → φ(x̄), the preceding condition becomes


lim sup_{x→x̄} φ(x) ≤ φ(x̄),

thereby implying the upper semicontinuity of φ at x̄. Because x̄ ∈ ri dom φ


is arbitrary, φ is usc on ri dom φ. Thus φ is continuous on ri dom φ, thereby
yielding the desired result. 
Before moving on to discuss the derivative property of a convex function,
we shall discuss its Lipschitzian property. For that we first define Lipschitz
and locally Lipschitz functions.
Definition 2.70 A function φ : Rn → R is said to be Lipschitz if there exists
L > 0 such that
|φ(x) − φ(y)| ≤ L kx − yk, ∀ x, y ∈ Rn .
The positive number L is called the Lipschitz constant of φ, or φ is said to be
Lipschitz with constant L.
Definition 2.71 Consider a function φ : Rn → R and x̄ ∈ Rn . Then φ is said
to be locally Lipschitz if there exist Lx̄ > 0 and a neighborhood N (x̄) of x̄
such that
|φ(x) − φ(y)| ≤ Lx̄ kx − yk, ∀ x, y ∈ N (x̄).
It is well known that a Lipschitz function is continuous but the converse need not hold. From Theorem 2.69, we know that a convex function is continuous in the relative interior of its domain. In the results to follow, we show that local boundedness of a convex function implies that the function is continuous as well as locally Lipschitz. The result is from Attouch, Buttazzo,
and Michaille [3].
Theorem 2.72 Consider a proper convex function φ : Rn → R̄ and x̄ ∈ dom φ such that for some ε > 0,

sup{φ(x) : x ∈ Bε (x̄)} = M < +∞.

Then φ is continuous at x̄. Moreover, φ is Lipschitz continuous on every ball


Bε′ (x̄) with ε′ < ε and
|φ(x) − φ(y)| ≤ (2M/(ε − ε′)) kx − yk, ∀ x, y ∈ Bε′ (x̄).


Proof. Without loss of generality, by translation (that is, by considering the


function φ(x + x̄) − φ(x̄)), the problem reduces to the case when x̄ = 0 and
φ(0) = 0. Therefore, the local boundedness in the neighborhood of x̄ = 0
reduces to
sup{φ(x) : x ∈ Bε (0)} = M < +∞.

Consider an arbitrary δ ∈ (0, 1] and x ∈ Bδε (0). Now expressing

x = (1 − δ)0 + δ((1/δ)x),

where (1/δ)x ∈ Bε (0), the convexity of φ along with the local boundedness condition leads to

φ(x) ≤ (1 − δ)φ(0) + δφ((1/δ)x) ≤ δM.
Rewriting

0 = (1/(1 + δ))x + (δ/(1 + δ))((−1/δ)x),

where (−1/δ)x ∈ Bε (0), the convexity of φ again yields

0 = φ(0) ≤ (1/(1 + δ))φ(x) + (δ/(1 + δ))φ((−1/δ)x) ≤ (1/(1 + δ))φ(x) + δM/(1 + δ),
which along with the previous condition on φ(x) implies that
−δM ≤ φ(x) ≤ δM.
Because x ∈ Bδε (0) is arbitrary,
|φ(x)| ≤ δM, ∀ x ∈ Bδε (0),
thereby establishing the continuity of φ at 0.
In the above discussion, in particular for δ = 1,
|φ(x + x̄) − φ(x̄)| ≤ M, ∀ x ∈ Bε (0).
Consider arbitrary x, y ∈ Bε′ (x̄) with x 6= y. Denoting δ = ε − ε′ > 0,
z = x + (δ/kx − yk)(x − y)   and   λ = kx − yk/(δ + kx − yk).

Observe that

kz − x̄k = k(x − x̄) + (δ/kx − yk)(x − y)k ≤ kx − x̄k + (δ/kx − yk)kx − yk ≤ ε′ + δ = ε,


which implies z ∈ Bε (x̄). Also

kx − ykz = (δ + kx − yk)x − δy,

which implies that

x = (1 − λ)y + λz, where λ ∈ (0, 1) is as defined above.

By the convexity of φ,

φ(x) ≤ (1 − λ)φ(y) + λφ(z) = φ(y) + λ(φ(z) − φ(y)),

which leads to

φ(x) − φ(y) ≤ λ(φ(z) − φ(y)) ≤ λ|φ(z) − φ(y)|.

Observe that

|φ(z) − φ(y)| ≤ |φ(z) − φ(x̄)| + |φ(y) − φ(x̄)| ≤ 2M,

as z ∈ Bε (x̄) and y ∈ Bε′ (x̄) ⊂ Bε (x̄). Therefore,

φ(x) − φ(y) ≤ 2M kx − yk/(δ + kx − yk) ≤ (2M/δ) kx − yk.

Interchanging the roles of x and y yields

|φ(x) − φ(y)| ≤ (2M/δ) kx − yk,
thereby establishing the result. 
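As a numerical sanity check of the bound just established (a Python sketch of ours, with φ(x) = x2 , x̄ = 0, ε = 2 and ε′ = 1, so that M = 4), the observed Lipschitz ratio on Bε′ (x̄) indeed stays below 2M/(ε − ε′ ) = 8.

```python
import numpy as np

# Sanity check (ours) of the Lipschitz estimate of Theorem 2.72 for phi(x) = x^2
# around x_bar = 0 with eps = 2, eps' = 1: M = sup_{|x| <= 2} phi(x) = 4, so the
# predicted Lipschitz constant on B_{eps'}(0) is 2M/(eps - eps') = 8.
phi = lambda x: x ** 2
eps, eps_prime, M = 2.0, 1.0, 4.0

xs = np.linspace(-eps_prime, eps_prime, 201)
worst = max(abs(phi(x) - phi(y)) / abs(x - y)
            for x in xs for y in xs if x != y)
print(worst, "<=", 2 * M / (eps - eps_prime))   # about 2.0 <= 8.0
```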
In the above result, we showed that if a proper convex function is locally
bounded at a point, then it is locally Lipschitz at that point. In fact, more is true, as presented in the result below, the proof of which is along the lines of Hiriart-Urruty and Lemaréchal [63].

Theorem 2.73 Consider a proper convex function φ : Rn → R̄. Then φ is


locally Lipschitz on ri dom φ.

Proof. Similar to the proof of Proposition 2.14 (i), consider n + 1 affinely independent vectors x1 , x2 , . . . , xn+1 ∈ dom φ such that x̄ ∈ ri co {x1 , x2 , . . . , xn+1 } ⊂ dom φ. Now consider ε > 0 such that Bε (x̄) ⊂ co {x1 , x2 , . . . , xn+1 }. For any arbitrary x ∈ Bε (x̄), there exist λi ≥ 0, i = 1, 2, . . . , n + 1, satisfying Σ_{i=1}^{n+1} λi = 1 such that

x = Σ_{i=1}^{n+1} λi xi .


By the convexity of φ,
φ(x) ≤ Σ_{i=1}^{n+1} λi φ(xi ) ≤ max{φ(x1 ), . . . , φ(xn+1 )} = M < +∞.

Because x ∈ Bε (x̄) is arbitrary, the above condition holds for every x ∈ Bε (x̄).
Therefore, by Theorem 2.72, for ε′ < ε,

|φ(x) − φ(y)| ≤ (2M/(ε − ε′)) kx − yk, ∀ x, y ∈ Bε′ (x̄),
thus proving that φ is locally Lipschitz at x̄ ∈ ri dom φ. Because x̄ ∈ ri dom φ
is arbitrary, φ is locally Lipschitz on ri dom φ. 

2.3.3 Differentiability Property


After discussing the continuity and the Lipschitzian property of a convex function, we now move toward studying its differentiability. In general, a convex function need not be differentiable on the whole of Rn. For instance, consider the convex function φ(x) = |x|. It is differentiable everywhere except at x = 0, which is the point of minimizer if we minimize this function over the whole of R. Another example of a nonsmooth convex function that appears naturally is the max-function. Consider φ(x) = max{x, x2}. As we know from Proposition 2.53, the supremum of convex functions is convex, so φ is convex. Here both x and x2 are differentiable over R but φ is not differentiable at x = 0 and x = 1. Again, for the unconstrained minimization of φ over R, the point of minimizer is x̄ = 0. So how is one supposed to study optimality at a point if the function is not differentiable there? This means the notion of differentiability must be replaced by some other concept that can handle nonsmooth convex functions. For a differentiable function we know that both the left-sided and the right-sided derivatives exist and are equal. In the case of a convex function, the right-sided derivative always exists. So, as a step toward replacing differentiability, we first introduce the concept of the one-sided directional derivative, or simply the directional derivative.

Definition 2.74 For a proper convex function φ : Rn → R̄, the directional


derivative of φ at x̄ ∈ dom φ in the direction d ∈ Rn is defined as

φ′(x̄, d) = lim_{λ↓0} (φ(x̄ + λd) − φ(x̄))/λ,

provided +∞ and −∞ are allowed as limits.

Before we move on to present the result on the existence of directional


derivatives of a convex function, we present a result from Rockafellar and
Wets [101].


Proposition 2.75 (Slope Inequality) Consider a function φ : I → R where


I ⊂ R denotes an interval. Then φ is convex on I if and only if for arbitrary
points x < z < y in I,
(φ(z) − φ(x))/(z − x) ≤ (φ(y) − φ(x))/(y − x) ≤ (φ(y) − φ(z))/(y − z). (2.19)

Consequently, for fixed x ∈ I, the function ψ(y) = (φ(y) − φ(x))/(y − x) is nondecreasing in y on I \ {x}. Moreover, if φ is differentiable over an open interval I ⊂ R, then ∇φ is nondecreasing on I.
Proof. We know that the convexity of φ on I is equivalent to
   
φ(z) ≤ ((y − z)/(y − x)) φ(x) + ((z − x)/(y − x)) φ(y), ∀ x < z < y in I.

The above inequality leads to

φ(z) − φ(x) ≤ ((y − z)/(y − x) − 1) φ(x) + ((z − x)/(y − x)) φ(y) = (z − x) (φ(y) − φ(x))/(y − x),
as desired. The other inequalities can be established similarly, thereby leading
to (2.19).
Conversely, suppose that x < z < y, which implies that there exists
λ ∈ (0, 1) such that z = (1 − λ)x + λy. Substituting z = (1 − λ)x + λy in
(2.19) leads to
(φ((1 − λ)x + λy) − φ(x))/(λ(y − x)) ≤ (φ(y) − φ(x))/(y − x),
that is,
φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y).
Because x and y were arbitrarily chosen, the above inequality holds for any
x, y ∈ I and any λ ∈ [0, 1] (the above inequality holds trivially for λ = 0 and
λ = 1). Hence, φ is a convex function.
Suppose that y1 , y2 ∈ I such that yi 6= x, i = 1, 2, and y1 < y2 . Consider
the following cases:
x < y1 < y2 , y1 < x < y2 and y1 < y2 < x.
Suppose that x < y1 < y2. In particular, taking z = y1 and y = y2 in the inequality (2.19) yields

ψ(y1) = (φ(y1) − φ(x))/(y1 − x) ≤ (φ(y2) − φ(x))/(y2 − x) = ψ(y2).


Applying (2.19) to the remaining two cases leads to the fact that ψ(y) = (φ(y) − φ(x))/(y − x) is nondecreasing.
Suppose that φ is convex, which implies (2.19) holds. As φ is differentiable, for x1, x2 ∈ I with x1 < x2,

∇φ(x1) ≤ (φ(x2) − φ(x1))/(x2 − x1) = (φ(x1) − φ(x2))/(x1 − x2) ≤ ∇φ(x2),
thereby establishing the result. 
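As a quick illustration of the slope inequality, take φ(x) = x2 on I = R and points x < z < y. Then (φ(z) − φ(x))/(z − x) = z + x, (φ(y) − φ(x))/(y − x) = y + x and (φ(y) − φ(z))/(y − z) = y + z, and the chain z + x ≤ y + x ≤ y + z is immediate, in agreement with (2.19).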
If φ : R → R̄ is a proper convex function, dom φ may be considered an interval I. Then, from the nondecreasing property of ψ in the above proposition, the right-sided derivative of φ, φ′+, exists at x̄ provided both −∞ and +∞ values are allowed, and is defined as

φ′+(x̄) = lim_{x↓x̄} (φ(x) − φ(x̄))/(x − x̄).

If (φ(x) − φ(x̄))/(x − x̄) has a finite lower bound,

lim_{x↓x̄} (φ(x) − φ(x̄))/(x − x̄) = inf_{x>x̄, x∈I} (φ(x) − φ(x̄))/(x − x̄),

because (φ(x) − φ(x̄))/(x − x̄) is nondecreasing on I. In case (φ(x) − φ(x̄))/(x − x̄) does not have a finite lower bound,

lim_{x↓x̄} (φ(x) − φ(x̄))/(x − x̄) = inf_{x>x̄, x∈I} (φ(x) − φ(x̄))/(x − x̄) = −∞,

and for the case when I = {x̄},

inf_{x>x̄, x∈I} (φ(x) − φ(x̄))/(x − x̄) = +∞

as {x ∈ R : x > x̄, x ∈ I} = ∅. Thus,

φ′+(x̄) = inf_{x>x̄, x∈I} (φ(x) − φ(x̄))/(x − x̄).

Theorem 2.76 Consider a proper convex function φ : Rn → R̄ and


x̄ ∈ dom φ. Then for every d ∈ Rn , the directional derivative φ′ (x̄, d) exists
with φ′ (x̄, 0) = 0 and

φ′(x̄, d) = inf_{λ>0} (φ(x̄ + λd) − φ(x̄))/λ.
Moreover, φ′ (x̄, d) is a sublinear function in d for every d ∈ Rn .


Proof. Define ψ : R → R̄ given by

ψ(λ) = φ(x̄ + λd).

As x̄ ∈ dom φ, ψ(0) = φ(x̄) < +∞, which along with the convexity of φ
implies that ψ is a proper convex function. Now consider ϕ : R → R̄ defined
as
ϕ(λ) = (ψ(λ) − ψ(0))/λ = (φ(x̄ + λd) − φ(x̄))/λ.

By Proposition 2.75, ϕ is nondecreasing when λ > 0. Then by the discussion preceding the theorem, ψ′+(0) exists and

ψ′+(0) = lim_{λ↓0} ϕ(λ) = inf_{λ>0} ϕ(λ),

as desired.
Suppose that d ∈ Rn and α > 0. Then

φ′(x̄, αd) = lim_{λ↓0} (φ(x̄ + λαd) − φ(x̄))/λ = lim_{λ↓0} α (φ(x̄ + λαd) − φ(x̄))/(λα) = α lim_{λ′↓0} (φ(x̄ + λ′d) − φ(x̄))/λ′ = α φ′(x̄, d),

which implies that φ′(x̄, .) is positively homogeneous.
Suppose that d1 , d2 ∈ Rn and α ∈ [0, 1], by the convexity of φ,

φ(x̄ + λ((1 − α)d1 + αd2 )) − φ(x̄) ≤ (1 − α)(φ(x̄ + λd1 ) − φ(x̄))


+α(φ(x̄ + λd2 ) − φ(x̄)).

Dividing both sides by λ > 0 and taking the limit as λ ↓ 0, the above inequality reduces to

φ′(x̄, (1 − α)d1 + αd2) ≤ (1 − α)φ′(x̄, d1) + αφ′(x̄, d2), ∀ α ∈ [0, 1].

In particular, taking α = 1/2 and applying the positive homogeneity property, the above condition yields

φ′(x̄, d1 + d2) ≤ φ′(x̄, d1) + φ′(x̄, d2).

Because d1 , d2 ∈ Rn were arbitrary, the above inequality implies that φ′ (x̄, .)


is subadditive, which along with positive homogeneity implies that φ′ (x̄, .) is
sublinear. 
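For instance, for the convex function φ(x) = |x| considered earlier, at x̄ = 0 we have φ′(0, d) = lim_{λ↓0} |λd|/λ = |d| for every d ∈ R. The function d ↦ |d| is indeed positively homogeneous and subadditive, hence sublinear, but it is not linear, which reflects the nondifferentiability of φ at 0.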
For a differentiable convex function φ : Rn → R, the following relation
holds between the directional derivative and the gradient of the function φ

φ′ (x̄, d) = h∇φ(x̄), di, ∀ d ∈ Rn .


But in the absence of differentiability, can one have such a relation for the directional derivative? The answer is yes. The notion that replaces the gradient in the above condition is the subgradient.

Definition 2.77 Consider a proper convex function φ : Rn → R̄ and x̄ ∈ dom φ. Then ξ ∈ Rn is said to be a subgradient of the function φ at x̄ if

φ(x) − φ(x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn.

The collection of all such vectors constitutes the subdifferential of φ at x̄ and is denoted by ∂φ(x̄). For x̄ ∉ dom φ, ∂φ(x̄) is empty.

For a differentiable function, the gradient at any point gives the slope of the tangent to the graph of the function at that point. In a similar way, from the definition above it can be seen that the graph of the affine function x ↦ φ(x̄) + hξ, x − x̄i is a supporting hyperplane to the epigraph of φ at (x̄, φ(x̄)) with slope ξ. In fact, at a point of nondifferentiability there can be infinitely many such supporting hyperplanes, and the collection of the slopes of these hyperplanes forms the subdifferential.
Recall the indicator function to the convex set F ⊂ Rn . Obviously
δF : Rn → R̄ is a proper convex function. Now from the above definition, the
subdifferential of δF at x̄ ∈ F is given by

∂δF (x̄) = {ξ ∈ Rn : δF (x) − δF (x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn }


= {ξ ∈ Rn : 0 ≥ hξ, x − x̄i, ∀ x ∈ F },

which is nothing but the normal cone to the set F at x̄. Therefore, for a convex
set F , ∂δF = NF .
Consider the norm function φ(x) = kxk, x ∈ Rn . Observe that φ is a
convex function. At x̄ = 0, φ is not differentiable and ∂φ(x̄) = B.
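This can be verified directly from Definition 2.77: ξ ∈ ∂φ(0) if and only if kxk ≥ hξ, xi for every x ∈ Rn. If kξk ≤ 1, the Cauchy–Schwarz inequality yields hξ, xi ≤ kξk kxk ≤ kxk, while conversely taking x = ξ in the subgradient inequality gives kξk ≥ kξk2, that is, kξk ≤ 1. Hence ∂φ(0) is precisely the closed unit ball B.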
Like the relation between the directional derivative and gradient, we are
interested in deriving a relationship between the directional derivative and the
subdifferential, which we establish in the next result.

Theorem 2.78 Consider a proper convex function φ : Rn → R̄ and


x̄ ∈ dom φ. Then

∂φ(x̄) = {ξ ∈ Rn : φ′ (x̄, d) ≥ hξ, di, ∀ d ∈ Rn }.

Proof. Suppose that ξ ∈ ∂φ(x̄), which by Definition 2.77 implies that

φ(x) − φ(x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn .

In particular, for x = x̄ + λd with λ > 0, the above condition reduces to

(φ(x̄ + λd) − φ(x̄))/λ ≥ hξ, di, ∀ d ∈ Rn.


Taking the limit as λ → 0 leads to


φ′ (x̄, d) ≥ hξ, di, ∀ d ∈ Rn ,
as desired.
Conversely, suppose that ξ ∈ Rn satisfies
φ′ (x̄, d) ≥ hξ, di, ∀ d ∈ Rn .
The alternate expression for φ′(x̄, d) from Theorem 2.76 then leads to

(φ(x̄ + λd) − φ(x̄))/λ ≥ hξ, di, ∀ λ > 0, ∀ d ∈ Rn.

In particular, for λ ∈ (0, 1] and d = x − x̄, this along with the convexity of φ leads to

φ(x) − φ(x̄) ≥ (φ(x̄ + λ(x − x̄)) − φ(x̄))/λ ≥ hξ, x − x̄i, ∀ x ∈ Rn,
which implies that ξ ∈ ∂φ(x̄), thereby establishing the result. 
The result below from Rockafellar [97] shows that the closure of the directional derivative is actually the support function of the subdifferential set.
Theorem 2.79 Consider a proper convex function φ : Rn → R̄ and
x̄ ∈ dom φ. Then
cl φ′(x̄, d) = sup_{ξ∈∂φ(x̄)} hξ, di = σ∂φ(x̄)(d), ∀ d ∈ Rn.

However, if x̄ ∈ ri dom φ,
φ′ (x̄, d) = σ∂φ(x̄) (d), ∀ d ∈ Rn
and if x̄ ∈ int dom φ, φ′ (x̄, d) is finite for every d ∈ Rn .
Proof. Because φ′ (x̄, .) is sublinear, combining Theorems 2.62 and 2.78 leads
to
cl φ′ (x̄, d) = σ∂φ(x̄) (d).
If x̄ ∈ ri dom φ, the domain of φ′(x̄, .) is an affine set, namely the subspace parallel to the affine hull of dom φ. By sublinearity, φ′(x̄, 0) = 0, so φ′(x̄, .) is not identically −∞ on this affine set. Therefore, by Proposition 2.63, cl φ′(x̄, .)
and hence φ′ (x̄, .) is a proper function. By Proposition 2.66, cl φ′ (x̄, .) agrees
with φ′ (x̄, .) on the affine set and hence is closed, thereby leading to the desired
condition. For x̄ ∈ int dom φ, the domain of φ′ (x̄, .) is Rn and hence it is finite
everywhere. 
As mentioned earlier for a differentiable convex function, for every d ∈ Rn ,

φ′(x̄, d) = h∇φ(x̄), di. So the question is: for a differentiable convex function,
how are the gradient and the subdifferential related? We discuss this aspect
in the result below.


Proposition 2.80 Consider a convex function φ : Rn → R differentiable at


x̄ with gradient ∇φ(x̄). Then the unique subgradient of φ at x̄ is the gradient,
that is, ∂φ(x̄) = {∇φ(x̄)}.

Proof. For a differentiable convex function φ,

φ′ (x̄, d) = h∇φ(x̄), di, ∀ d ∈ Rn .

By Theorem 2.79, for every ξ ∈ ∂φ(x̄),

h∇φ(x̄) − ξ, di ≥ 0, ∀ d ∈ Rn .

Because the above condition holds for every d ∈ Rn , it reduces to

h∇φ(x̄) − ξ, di = 0, ∀ d ∈ Rn ,

which leads to ∇φ(x̄) = ξ. As ξ ∈ ∂φ(x̄) is arbitrary, the subdifferential is a


singleton with ∂φ(x̄) = {∇φ(x̄)}. 
From the above theorem, we have the following result, which gives the
equivalent characterization of a differentiable convex function.

Theorem 2.81 Consider a differentiable function φ : Rn → R. Then φ is


convex if and only if

φ(y) − φ(x) ≥ h∇φ(x), y − xi, ∀ x, y ∈ Rn .

Observe that in Theorem 2.79 we defined the relation between the direc-
tional derivative and the support function of the subdifferential for point x̄ in
the relative interior of the domain. The reason for this is the fact that at the
boundary of the domain, the subdifferential may be an empty set. For a clear
view into this aspect, we consider the following example from Bertsekas [12].
Let φ : R → R̄ be a proper convex function given by
φ(x) = −√x for 0 ≤ x ≤ 1, and φ(x) = +∞ otherwise.

The subdifferential of φ is

∂φ(x) = {−1/(2√x)} for 0 < x < 1, ∂φ(x) = [−1/2, +∞) for x = 1, and ∂φ(x) = ∅ for x ≤ 0 or x > 1.

Note that the subdifferential is empty at the boundary point x = 0. Also at


the other boundary point x = 1, it is unbounded. But the subdifferential may
also turn out to be unbounded at a point in the relative interior of the domain.


For example, consider the following proper convex function φ : R → R̄ defined as

φ(x) = 0 for x = 0, and φ(x) = +∞ otherwise.

Observe that at x = 0, ∂φ(x) = R, which is unbounded even though 0 is in


the relative interior of the domain. Based on these illustrations, we have the
following result from Rockafellar [97] and Attouch, Buttazzo, and Michaille [3].

Proposition 2.82 Consider a proper convex function φ : Rn → R̄ and


x̄ ∈ dom φ. Then ∂φ(x̄) is closed and convex. For x̄ ∈ ri dom φ, the sub-
differential ∂φ(x̄) is nonempty. Furthermore, if x̄ ∈ int dom φ, ∂φ(x̄) is non-
empty and compact. Moreover, if φ is continuous at x̄ ∈ dom φ, then ∂φ(x̄)
is compact.

Proof. Suppose that {ξk } ⊂ ∂φ(x̄) such that ξk → ξ. By Definition 2.77 of


subdifferential,

φ(x) − φ(x̄) ≥ hξk , x − x̄i, ∀ x ∈ Rn .

Taking the limit as k → +∞, the above inequality leads to

φ(x) − φ(x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn ,

which implies that ξ ∈ ∂φ(x̄), thereby yielding the closedness of ∂φ(x̄).


Consider ξ1 , ξ2 ∈ ∂φ(x̄), which implies that for i = 1, 2,

φ(x) − φ(x̄) ≥ hξi , x − x̄i, ∀ x ∈ Rn .

Therefore, for any λ ∈ [0, 1],

φ(x) − φ(x̄) ≥ h(1 − λ)ξ1 + λξ2 , x − x̄i, ∀ x ∈ Rn ,

which implies (1 − λ)ξ1 + λξ2 ∈ ∂φ(x̄). Because ξ1 , ξ2 were arbitrary, ∂φ(x̄) is


convex.
From the proof of Theorem 2.79, for x̄ ∈ ri dom φ, φ′ (x̄, .) is the support
function of ∂φ(x̄), which is proper. Hence, ∂φ(x̄) is nonempty.
Again by Theorem 2.79, for x̄ ∈ int dom φ, φ′ (x̄, .) is finite everywhere.
Because it is a support of ∂φ(x̄), by Proposition 2.61, ∂φ(x̄) is bounded and
hence compact.
Now suppose that φ is continuous at x̄ ∈ dom φ. We have already seen that
∂φ is always closed and convex. Therefore to establish that ∂φ(x̄) is compact,
we only need to show that it is bounded. By the continuity of φ at x̄, it is
bounded in the neighborhood of x̄. Thus, there exist ε > 0 and M ≥ 0 such
that

φ(x̄ + εd) ≤ M, ∀ d ∈ B.


Consider ξ ∈ ∂φ(x̄), which implies that

hξ, x − x̄i ≤ φ(x) − φ(x̄), ∀ x ∈ Rn .

In particular, for any d ∈ B, the above inequality along with the boundedness
of φ in the neighborhood of x̄ leads to

hξ, εdi ≤ φ(x̄ + εd) − φ(x̄) ≤ M + |φ(x̄)|,

which implies that


1
hξ, di ≤ (M + |φ(x̄)|), ∀ d ∈ B.
ε
Therefore,
1
kξk ≤ (M + |φ(x̄)|).
ε
Because ξ ∈ ∂φ(x̄) was arbitrary, ∂φ(x̄) is bounded and hence compact. 
If we consider a real-valued convex function φ : Rn → R, then int dom φ = Rn and therefore, the above result reduces to the following.

Proposition 2.83 Consider a convex function φ : Rn → R. Then the subdif-


ferential ∂φ(x) is nonempty, convex, and compact for every x ∈ Rn .

Having discussed subdifferentials, we now present some properties of the subdifferential as x varies by treating it as a multifunction, or set-valued mapping, x ↦ ∂φ(x), starting with some of the fundamental continuity results of the subdifferential mapping.

Theorem 2.84 (Closed Graph Theorem) Consider a proper lsc convex function φ : Rn → R̄. If the sequences {xk}, {ξk} ⊂ Rn are such that ξk ∈ ∂φ(xk) with xk → x̄ and ξk → ξ̄, then ξ̄ ∈ ∂φ(x̄). This means gph ∂φ is a closed subset of Rn × Rn.
Proof. Because ξk ∈ ∂φ(xk) with (xk, ξk) → (x̄, ξ̄), from Definition 2.77 of the subdifferential,

φ(x) − φ(xk) ≥ hξk, x − xki, ∀ x ∈ Rn.

Taking the limit infimum as k → +∞, which along with the lower semicontinuity of φ reduces the above condition to

φ(x) − φ(x̄) ≥ hξ̄, x − x̄i, ∀ x ∈ Rn,

thereby implying that ξ̄ ∈ ∂φ(x̄) and thus establishing that gph ∂φ is closed, as desired. 


From the above theorem one may note that the normal cone to a convex
set F ⊂ Rn is also graph closed as it is nothing but the subdifferential of the
convex indicator function δF , that is, NF = ∂δF .
In general we know that the arbitrary union of closed sets need not be
closed. But in the proposition below from Bertsekas [12] and Rockafellar [97]
we have that the union of the subdifferential over a compact set is compact.
Proposition 2.85 Consider a convex function φ : Rn → R and a nonempty compact set F ⊂ Rn. Then the set ∂φ(F) = ∪_{x∈F} ∂φ(x) is nonempty and compact.

Proof. Because F is a nonempty subset of dom φ = Rn , by Proposition 2.82,


∂φ(F ) is nonempty.
We claim that ∂φ(F) is closed. Consider a sequence {ξk} ⊂ ∂φ(F) such that ξk → ξ̄. As ξk ∈ ∂φ(F) for k ∈ N, there exist xk ∈ F such that ξk ∈ ∂φ(xk), k ∈ N. By the compactness of F, {xk} is a bounded sequence
that by the Bolzano–Weierstrass Theorem, Proposition 1.3, has a convergent
subsequence. Without loss of generality, suppose that xk → x̄, which by the
closedness of F implies that x̄ ∈ F . Invoking the Closed Graph Theorem,
Theorem 2.84, ξ̄ ∈ ∂φ(x̄) ⊂ ∂φ(F). Thus, ∂φ(F) is closed.
Now to establish the compactness of ∂φ(F ), we will establish the bounded-
ness of ∂φ(F ). On the contrary, suppose that there exist a bounded sequence
{xk } ⊂ F and an unbounded sequence {ξk } ⊂ Rn such that ξk ∈ ∂φ(xk ).
Define ηk = ξk/kξkk, which is a bounded sequence. Because {xk} and {ηk} are
bounded sequences, by the Bolzano–Weierstrass Theorem, have a convergent
subsequence. As ξk ∈ ∂φ(xk ), by Definition 2.77 of subdifferential,

φ(xk + ηk ) − φ(xk ) ≥ hξk , ηk i = kξk k.

By Theorem 2.69, φ is continuous on Rn , which along with the convergence


of {xk } and {ηk } yields that φ(xk + ηk ) − φ(xk ) is bounded. Therefore, by
the above inequality, {ξk } is a bounded sequence, thereby contradicting our
assumption. Thus, ∂φ(F ) is a bounded set and hence compact. 

Theorem 2.86 Consider a proper convex function φ : Rn → R̄. Then ∂φ is


usc on int dom φ. Moreover, if φ : Rn → R is a differentiable convex function,
then it is continuously differentiable.

Proof. By Proposition 2.82, ∂φ(x̄) is nonempty and compact if and only if


x̄ ∈ int dom φ. By Theorem 2.84, ∂φ is graph closed. Therefore, from the
discussion on set-valued mappings in Chapter 1, ∂φ is usc on int dom φ.
Since for a single-valued map the notion of upper semicontinuity coincides with that of continuity, and since by Proposition 2.80 we have ∂φ = {∇φ} for a differentiable convex function, it follows that φ is continuously differentiable. 
Below we state another important characteristic of the subdifferential with-
out proof. For more details on the treatment of ∂φ as a multifunction, one
may refer to Rockafellar [97].


Theorem 2.87 Consider a closed proper convex function φ : Rn → R̄. Then the subdifferential ∂φ is a maximal monotone map, where by monotonicity we mean that for any x1, x2 ∈ Rn,

hξ1 − ξ2, x1 − x2i ≥ 0, ∀ ξi ∈ ∂φ(xi), i = 1, 2,

and maximal in the sense that its graph is not properly contained in the graph of any other monotone map.

Similar to the standard Mean Value Theorem, Theorem 1.18, we present


the Mean Value Theorem for convex functions in terms of the subdifferential.

Theorem 2.88 Consider a convex function φ : Rn → R. Then for x, y ∈ Rn ,


there exists z ∈ (x, y) such that

φ(y) − φ(x) ∈ h∂φ(z), y − xi,

where h∂φ(z), y − xi = {hξ, y − xi : ξ ∈ ∂φ(z)}.

Proof. Consider the function ψ : [0, 1] → R defined by

ψ(λ) = φ(x + λ(y − x)) − φ(x) + λ(φ(x) − φ(y)).

Because φ is real-valued and by Theorem 2.69 it is continuous on Rn , hence ψ


is a real-valued continuous function on [0,1]. Observe that ψ(0) = 0 = ψ(1).
Also, by the convexity of φ,

ψ(λ) ≤ (1 − λ)φ(x) + λφ(y) − φ(x) + λ(φ(x) − φ(y)) = 0, ∀ λ ∈ [0, 1].

Thus, ψ attains its maximum at λ = 0 and λ = 1 and hence there exists


λ̄ ∈ (0, 1) at which ψ attains its minimum over [0, 1]. Therefore,

ψ ′ (λ̄, d) ≥ 0, ∀ d ∈ R.

Denote z = x + λ̄(y − x) ∈ (x, y). Therefore,

ψ′(λ̄, d) = lim_{λ↓0} (ψ(λ̄ + λd) − ψ(λ̄))/λ
= lim_{λ↓0} (φ(x + (λ̄ + λd)(y − x)) − φ(x + λ̄(y − x)))/λ + d(φ(x) − φ(y))
= φ′(z, d(y − x)) + d(φ(x) − φ(y)), ∀ d ∈ R,

which implies that

φ′ (z, d(y − x)) ≥ d(φ(y) − φ(x)), ∀ d ∈ R.

In particular, taking d = 1 in the above condition leads to

φ(y) − φ(x) ≤ φ′ (z, y − x),


whereas taking d = −1 yields

−φ′ (z, x − y) ≤ φ(y) − φ(x).

Combining the preceding inequalities imply

−φ′ (z, x − y) ≤ φ(y) − φ(x) ≤ φ′ (z, y − x),

which by Theorem 2.79 becomes

inf_{ξ∈∂φ(z)} hξ, y − xi = − sup_{ξ∈∂φ(z)} hξ, x − yi ≤ φ(y) − φ(x) ≤ sup_{ξ∈∂φ(z)} hξ, y − xi.

By Proposition 2.83, ∂φ(z) is compact and convex, which along with the continuity of the map ξ ↦ hξ, y − xi implies that the values hξ, y − xi, ξ ∈ ∂φ(z), fill the whole interval between the above infimum and supremum. Hence there exists ξ̄ ∈ ∂φ(z) such that

φ(y) − φ(x) = hξ̄, y − xi ∈ h∂φ(z), y − xi,

thereby completing the proof. 
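As an illustration, take φ(x) = |x|, x = −1 and y = 2. Then φ(y) − φ(x) = 1 and y − x = 3, so the theorem asks for z ∈ (−1, 2) and ξ ∈ ∂φ(z) with 3ξ = 1. The choice z = 0 with ξ = 1/3 ∈ ∂φ(0) = [−1, 1] works, even though no point of differentiability of φ can serve as z because ∇φ(z) = ±1 at such points.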


We have discussed the various continuity and differentiability behaviors
of convex functions but in most cases these properties were restricted to the
interior or relative interior of the domain of the function. As seen in the
discussion preceding Proposition 2.82, the subdifferential set may be empty
at the boundary of the domain. To overcome this flaw of the subdifferential
of a convex function, we have the notion of ε-subdifferentials, which have the
nonemptiness property throughout the domain of the function. We will discuss
this notion in a later section in the chapter.
As we are interested in the convex optimization problem, we first give the
optimality condition for the unconstrained convex programming problem
min f (x) subject to x ∈ Rn , (CPu )
where f : Rn → R is a convex function.

Theorem 2.89 Consider the unconstrained convex programming problem


(CPu ). Then x̄ ∈ Rn is the point of minimizer of (CPu ) if and only if
0 ∈ ∂f (x̄).

Proof. Suppose that x̄ ∈ Rn is a point of minimizer of (CPu ), which implies


that

f (x) − f (x̄) ≥ 0, ∀ x ∈ Rn .

By Definition 2.77 of subdifferential, 0 ∈ ∂f (x̄). The converse can be proved


by again employing the definition of the subdifferential. 
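For instance, for the unconstrained minimization of f(x) = |x| over R, we have ∂f(0) = [−1, 1] ∋ 0, so x̄ = 0 is a point of minimizer, while for any x̄ ≠ 0, ∂f(x̄) = {x̄/|x̄|} does not contain 0. This shows how the condition 0 ∈ ∂f(x̄) replaces the classical stationarity condition ∇f(x̄) = 0 at points of nondifferentiability.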
Now recall the constrained convex programming problem presented in
Chapter 1:
min f (x) subject to x ∈ C, (CP )


where f : Rn → R is a convex function and C is a convex subset of Rn. Recall the important property of convex optimization discussed in Section 1.3 that makes its study so useful: every local minimizer is also a global minimizer. The next result provides an alternative proof of this fact.

Theorem 2.90 Consider a convex set C ⊂ Rn and let f : Rn → R̄ be a


proper convex function. Then the point of local minimum is a point of global
minimum. If in addition f is strictly convex, there exists at most one global
point of minimum.

Proof. Suppose that x̄ ∈ Rn is a point of local minimum of f over C. We claim


that x̄ is a point of global minimum. On the contrary, assume that x̄ is not a
point of global minimum. Thus there exists x̃ ∈ C such that f (x̃) < f (x̄). By
the convexity of f , for every λ ∈ (0, 1),

f ((1 − λ)x̄ + λx̃) ≤ (1 − λ)f (x̄) + λf (x̃) < f (x̄). (2.20)

Also by the convexity of C, (1 − λ)x̄ + λx̃ ∈ C. Taking λ sufficiently small,


(1 − λ)x̄ + λx̃ is in the neighborhood of x̄, which by the inequality (2.20)
implies that

f ((1 − λ)x̄ + λx̃) < f (x̄),

which contradicts that x̄ is a point of local minimum. Hence, x̄ is a point of


global minimum of f over C.
Suppose that f is a strictly convex function with x̄ and ȳ as the points of
global minimum. Let f (x̄) = f (ȳ) = fmin , say. We claim that x̄ = ȳ. On the
contrary, assume that x̄ 6= ȳ. By Definition 2.47 of strict convexity, for every
λ ∈ (0, 1),

f ((1 − λ)x̄ + λȳ) < (1 − λ)f (x̄) + λf (ȳ) = fmin . (2.21)

By the convexity of C, (1 − λ)x̄ + λȳ ∈ (x̄, ȳ) ⊂ C. Now the strict inequality (2.21) contradicts the fact that x̄ and ȳ are points of global minimum of f over C. Thus, x̄ = ȳ, thereby implying that the problem of minimizing a strictly convex function f over a convex set C has at most one point of global minimum. 
As discussed earlier in this chapter, the above problem can be converted
into the unconstrained convex programming problem of the form (CPu ) with
the objective function f replaced by f + δC . From Theorem 2.89, x̄ is the point of minimizer of (CP ) if and only if

0 ∈ ∂(f + δC )(x̄).

To express the above inclusion explicitly in terms of the subdifferentials of


the objective function f and the indicator function δC , one needs the calculus
rules for the subdifferentials. Thus, following this path we shall now discuss
the subdifferential calculus rules.


2.4 Subdifferential Calculus


As we have already seen, subdifferentials play a pivotal role in convex analysis. They replace the derivative in the case of nondifferentiable convex functions. So it is natural to ask whether the rules of differential calculus carry over to the subdifferential. As we proceed in this direction, one will see that analogues of the standard calculus rules do hold, but under certain assumptions. We begin our journey of subdifferential calculus with the sum rule.

Theorem 2.91 (Moreau–Rockafellar Sum Rule) Consider two proper convex


functions φi : Rn → R̄, i = 1, 2. Suppose that ri dom φ1 ∩ ri dom φ2 6= ∅.
Then

∂(φ1 + φ2 )(x) = ∂φ1 (x) + ∂φ2 (x)

for every x ∈ dom(φ1 + φ2 ).

Proof. We first show that

∂φ1 (x̄) + ∂φ2 (x̄) ⊂ ∂(φ1 + φ2 )(x̄). (2.22)

Suppose that ξi ∈ ∂φi (x̄), i = 1, 2. By the definition of a subdifferential,

φi (x) − φi (x̄) ≥ hξi , x − x̄i, ∀ x ∈ Rn , i = 1, 2.

Therefore,

(φ1 + φ2 )(x) − (φ1 + φ2 )(x̄) ≥ hξ1 + ξ2 , x − x̄i, ∀ x ∈ Rn ,

which implies that (ξ1 + ξ2 ) ∈ ∂(φ1 + φ2 )(x̄), thereby establishing (2.22).


To obtain the result, we will now prove the reverse inclusion, that is,

∂(φ1 + φ2 )(x̄) ⊂ ∂φ1 (x̄) + ∂φ2 (x̄). (2.23)

Suppose that ξ ∈ ∂(φ1 + φ2 )(x̄). Define two convex functions

ψ1 (x) = φ1 (x + x̄) − φ1 (x̄) − hξ, xi and ψ2 (x) = φ2 (x + x̄) − φ2 (x̄).

Here, ψ1 (0) = ψ2 (0) = 0. Observe that ξ ∈ ∂(φ1 + φ2 )(x̄) which by the above
constructed functions is equivalent to

(ψ1 + ψ2 )(x) ≥ 0, ∀ x ∈ Rn ,

that is, 0 ∈ ∂(ψ1 + ψ2 )(0). Thus, without loss of generality, consider x̄ = 0,


ξ = 0, and φ1 (0) = φ2 (0) = 0 such that

0 ∈ ∂(φ1 + φ2 )(0),


which implies

(φ1 + φ2 )(x) ≥ (φ1 + φ2 )(0) = 0, ∀ x ∈ Rn ,

that is, φ1 (x) ≥ −φ2 (x) for every x ∈ Rn . Define

F1 = {(x, α) ∈ Rn × R : φ1 (x) ≤ α}

and F2 = {(x, α) ∈ Rn × R : α ≤ −φ2 (x)}.


Observe that both F1 and F2 are convex sets, where by Proposition 2.64,

ri F1 = ri epi φ1 = {(x, α) ∈ Rn × R : x ∈ ri dom φ1 , φ1 (x) < α}.

As φ1 (x) ≥ −φ2 (x), we have

ri F1 ∩ F2 = ∅

with (0, 0) ∈ F1 ∩F2 . Therefore, by the separation theorem, Theorem 2.26 (ii),
there exists (x∗ , α∗ ) ∈ Rn × R with (x∗ , α∗ ) 6= (0, 0) such that

hx∗ , xi + α∗ α ≥ 0, ∀ (x, α) ∈ F1 ,
hx∗ , xi + α∗ α ≤ 0, ∀ (x, α) ∈ F2 .

By assumption as φ1 (0) = 0, we have (0, α) ∈ F1 for α ≥ 0. Therefore, from


the inequality above, we have α∗ ≥ 0. We claim that α∗ 6= 0. Suppose that
α∗ = 0. Thus the above inequalities imply

hx∗ , x1 i ≥ 0 ≥ hx∗ , x2 i, ∀ x1 ∈ dom φ1 , ∀ x2 ∈ dom φ2 .

This implies that dom φ1 and dom φ2 can be separated, which contradicts
the hypothesis that ri dom φ1 ∩ ri dom φ2 6= ∅. Hence, α∗ > 0 and can be
normalized to one and thus

hx∗ , xi + α ≥ 0, ∀ (x, α) ∈ F1 ,
hx∗ , xi + α ≤ 0, ∀ (x, α) ∈ F2 .

In particular, for (x, φ1 (x)) ∈ F1 and (x, −φ2 (x)) ∈ F2 , we have −x∗ ∈ ∂φ1 (0)
and x∗ ∈ ∂φ2 (0), thereby leading to

0 ∈ ∂φ1 (0) + ∂φ2 (0),

thus establishing (2.23) and hence completing the proof. 
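As a simple illustration of the Sum Rule, take φ1(x) = |x| and φ2(x) = |x − 1| on R. Both functions are real-valued, so ri dom φ1 ∩ ri dom φ2 = R ≠ ∅. At x̄ = 0, ∂φ1(0) = [−1, 1] and ∂φ2(0) = {−1}, whence ∂(φ1 + φ2)(0) = [−1, 1] + {−1} = [−2, 0]. In particular, 0 ∈ ∂(φ1 + φ2)(0), confirming that x̄ = 0 is a point of minimizer of φ1 + φ2 over R.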


The necessity of the condition ri dom φ1 ∩ ri dom φ2 6= ∅ can be seen from
the following example from Phelps [93]. Consider φ1 , φ2 : R2 → R̄ defined as

φ1(x) = δF1(x), where F1 = {(y1, y2) ∈ R2 : y2 ≥ (y1)2} is the epigraph of the function y ↦ y2, and φ2(x) = δF2(x), where F2 = {(y1, y2) ∈ R2 : y2 = 0}.


Here, ∂(φ1 + φ2 )(0) = R2 whereas

∂φ1 (0) = {(0, ξ) ∈ R2 : ξ ≤ 0} and ∂φ2 (0) = {(0, ξ) ∈ R2 : ξ ∈ R}.

Therefore, ∂(φ1 + φ2 )(0) 6= ∂φ1 (0) + ∂φ2 (0). Observe that dom φ1 ∩ dom φ2 =
F1 ∩ F2 = {(0, 0)} while ri dom φ1 ∩ ri dom φ2 = ri F1 ∩ ri F2 = ∅.
Now as an application of the Subdifferential Sum Rule, we prove the equal-
ity in Proposition 2.39 (i) under the assumption of ri F1 ∩ ri F2 6= ∅.
Proof of Proposition 2.39 (i). For convex sets F1 , F2 ⊂ Rn , define φ1 = δF1
and φ2 = δF2 . Observe that dom φi = Fi for i = 1, 2. If ri F1 ∩ ri F2 6= ∅,
then ri dom φ1 ∩ ri dom φ2 6= ∅. Now applying the Sum Rule, Theorem 2.91,

∂(φ1 + φ2 )(x̄) = ∂φ1 (x̄) + ∂φ2 (x̄), ∀ x̄ ∈ dom φ1 ∩ dom φ2 ,

which along with the facts that δF1 + δF2 = δF1 ∩F2 and ∂δF = NF implies
that

NF1 ∩F2 (x̄) = NF1 (x̄) + NF2 (x̄), ∀ x̄ ∈ F1 ∩ F2 ,

hence completing the proof. 


Now if in Theorem 2.91, φi : Rn → R for i = 1, 2 are real-valued convex functions, then the Sum Rule can be derived using the directional derivative.
We briefly discuss that approach from Hiriart-Urruty and Lemaréchal [63].
Using Theorem 2.79, the support of ∂φ1 (x̄) + ∂φ2 (x̄) is φ′1 (x̄, .) + φ′2 (x̄, .).
Readers are advised to verify this fact using the definition of support. Also,
the support of ∂(φ1 + φ2)(x̄) is (φ1 + φ2)′(x̄, .) = φ′1(x̄, .) + φ′2(x̄, .), which is thus the same as that of ∂φ1(x̄) + ∂φ2(x̄). Because the support functions of both compact convex sets coincide,

∂(φ1 + φ2 )(x̄) = ∂φ1 (x̄) + ∂φ2 (x̄).

Observe that no additional assumption was required, as here both ri dom φ1 and ri dom φ2 equal Rn.
Other than the sum of convex functions being convex, from Proposi-
tion 2.53, we have that the composition of a nondecreasing convex function
with a convex function is also convex. So before presenting the Chain Rule, we introduce the notion of a nondecreasing function defined over Rn and a result on the subdifferential of a nondecreasing function. Recall that in Proposition 2.53,
the nondecreasing function ψ was defined over R.

Definition 2.92 A function φ : Rn → R is called nondecreasing if for x, y ∈ Rn with xi ≥ yi, i = 1, 2, . . . , n, one has φ(x) ≥ φ(y).

Theorem 2.93 Consider a nondecreasing convex function φ : Rn → R. Then


for every x ∈ Rn , ∂φ(x) ⊂ Rn+ .


Proof. Because φ is a nondecreasing convex function,


φ(x̄) ≥ φ(x̄ − ei ) ≥ φ(x̄) + hξ, −ei i,
where ei = (0, . . . , 0, 1, 0, . . . , 0) with 1 at the i-th place and ξ ∈ ∂φ(x̄). This
implies that
φ(x̄) ≥ φ(x̄) − ξi ,
that is, ξi ≥ 0. Since i was arbitrary, ξi ≥ 0, i = 1, 2, . . . , n and thus
∂φ(x̄) ⊂ Rn+ . 
We now present the subdifferential calculus rule of the composition of
convex functions. The proof is from Hiriart-Urruty and Lemaréchal [63].
Theorem 2.94 (Chain Rule) Consider a nondecreasing convex function φ : Rm → R and a vector-valued function Φ : Rn → Rm given by Φ(x) = (φ1(x), φ2(x), . . . , φm(x)), where each φi : Rn → R, i = 1, 2, . . . , m, is a convex function. Then

∂(φ ◦ Φ)(x̄) = { Σ_{i=1}^{m} µi ξi : (µ1, µ2, . . . , µm) ∈ ∂φ(Φ(x̄)), ξi ∈ ∂φi(x̄), i = 1, 2, . . . , m }.
Proof. Define

F = { Σ_{i=1}^{m} µi ξi : (µ1, µ2, . . . , µm) ∈ ∂φ(Φ(x̄)), ξi ∈ ∂φi(x̄), i = 1, 2, . . . , m }.

We will prove the result in the following steps:


1. We shall show that F is a convex compact set, as is ∂(φ ◦ Φ)(x̄).
2. We shall calculate the support function of F.
3. We shall calculate the support function of ∂(φ ◦ Φ)(x̄) and establish that it is the same as the support function of F.
The result then follows from the fact that two closed convex sets are equal if and only if their support functions are equal.
Step 1: Consider any ξ ∈ F. Thus there exist (µ1 , µ2 , . . . , µm ) ∈ ∂φ(Φ(x̄))
and ξi ∈ ∂φi (x̄), i = 1, 2, . . . , m, such that
ξ = Σ_{i=1}^{m} µi ξi.

Therefore,

kξk ≤ Σ_{i=1}^{m} |µi| kξik.

© 2012 by Taylor & Francis Group, LLC


102 Tools for Convex Optimization

By Proposition 2.83, ∂φ(Φ(x̄)) as well as ∂φi(x̄), i = 1, 2, . . . , m, are bounded sets and hence ξ is bounded. Because ξ ∈ F was arbitrary, F is a bounded set. Moreover, ∂φ(Φ(x̄)) and ∂φi(x̄), i = 1, 2, . . . , m, are compact sets, and F is the image of their product under the continuous map (µ, ξ1, . . . , ξm) ↦ Σ_{i=1}^{m} µi ξi; thus F is also a closed set, thereby yielding the compactness of F.
Suppose that ξ1, ξ2 ∈ F, which implies that for j = 1, 2,

ξj = Σ_{i=1}^{m} µji ξij,

where (µj1, µj2, . . . , µjm) ∈ ∂φ(Φ(x̄)) and ξij ∈ ∂φi(x̄), i = 1, 2, . . . , m, for j = 1, 2. Now for any λ ∈ (0, 1), define

ξλ = (1 − λ)ξ1 + λξ2.

From Theorem 2.93, µji ≥ 0 for i = 1, 2, . . . , m and j = 1, 2. Define

µλi = (1 − λ)µ1i + λµ2i, i = 1, 2, . . . , m.

Note that µλi = 0 only when µ1i = µ2i = 0 as λ ∈ (0, 1). Therefore,

ξλ = Σ_{i∈Ī} µλi ( ((1 − λ)µ1i/µλi) ξi1 + (λµ2i/µλi) ξi2 ),

where Ī = {i ∈ {1, 2, . . . , m} : µλi > 0}. By Proposition 2.83, ∂φ(Φ(x̄)) and ∂φi(x̄), i = 1, 2, . . . , m, are convex sets and hence

(µλ1, µλ2, . . . , µλm) ∈ ∂φ(Φ(x̄)) and ((1 − λ)µ1i/µλi) ξi1 + (λµ2i/µλi) ξi2 ∈ ∂φi(x̄), i ∈ Ī,

so that ξλ ∈ F, thereby showing that F is convex.
Step 2: Denote

Φ′ (x̄, d) = (φ′1 (x̄, d), φ′2 (x̄, d), . . . , φ′m (x̄, d)).

We will establish that

σF (d) = φ′ (Φ(x̄), Φ′ (x̄, d)).

Consider ξ ∈ F, which implies that

ξ = Σ_{i=1}^{m} µi ξi,

where (µ1 , µ2 , . . . , µm ) ∈ ∂φ(Φ(x̄)) and ξi ∈ ∂φi (x̄), i = 1, 2, . . . , m. By The-


orem 2.79,

hξi , di ≤ φ′i (x̄, d), i = 1, 2, . . . , m.


By Theorem 2.93, µi ≥ 0, i = 1, 2, . . . , m, which along with the above in-


equality implies that
hξ, di = Σ_{i=1}^{m} µi hξi, di ≤ Σ_{i=1}^{m} µi φ′i(x̄, d).

As µ = (µ1, µ2, . . . , µm) ∈ ∂φ(Φ(x̄)),

Σ_{i=1}^{m} µi φ′i(x̄, d) = hµ, Φ′(x̄, d)i ≤ φ′(Φ(x̄), Φ′(x̄, d)).

We claim that there exists ξ̄ ∈ F such that

hξ̄, di = φ′(Φ(x̄), Φ′(x̄, d)).

By Proposition 2.83, ∂φ(Φ(x̄)) is compact and therefore, there exists


µ̄ = (µ̄1 , µ̄2 , . . . , µ̄m ) ∈ ∂φ(Φ(x̄)) such that
Σ_{i=1}^{m} µ̄i φ′i(x̄, d) = hµ̄, Φ′(x̄, d)i = φ′(Φ(x̄), Φ′(x̄, d)). (2.24)

Also, for i = 1, 2, . . . , m, ∂φi(x̄) is compact, which implies there exists ξ̄i ∈ ∂φi(x̄) such that

hξ̄i, di = φ′i(x̄, d), i = 1, 2, . . . , m.

Therefore, the condition (2.24) becomes

Σ_{i=1}^{m} µ̄i hξ̄i, di = φ′(Φ(x̄), Φ′(x̄, d)).

Denoting ξ̄ = Σ_{i=1}^{m} µ̄i ξ̄i ∈ F,

hξ̄, di = φ′(Φ(x̄), Φ′(x̄, d)),

which implies that

σF(d) = φ′(Φ(x̄), Φ′(x̄, d)), ∀ d ∈ Rn.

Step 3: It is obvious that the support function of ∂(φ ◦ Φ)(x̄) is (φ ◦ Φ)′ (x̄, d).
We claim that

(φ ◦ Φ)′ (x̄, d) = φ′ (Φ(x̄), Φ′ (x̄, d)).


For real-valued convex functions φi , i = 1, 2, . . . , m, from Definition 2.74 of


directional derivative, it is obvious that
φi (x̄ + λd) = φi (x̄) + λφ′i (x̄, d) + o(λ), i = 1, 2, . . . , m,
which implies that
Φ(x̄ + λd) = Φ(x̄) + λΦ′ (x̄, d) + o(λ).
By Theorem 2.73, φ is locally Lipschitz on ri dom φ = Rn , which yields
φ(Φ(x̄ + λd)) = φ(Φ(x̄) + λΦ′ (x̄, d)) + o(λ),
which again by the definition of φ′ leads to
φ(Φ(x̄ + λd)) = φ(Φ(x̄)) + λφ′ (Φ(x̄), Φ′ (x̄, d)) + o(λ).
Dividing throughout by λ > 0 and taking the limit as λ → 0 reduces the
above condition to
(φ ◦ Φ)′ (x̄, d) = φ′ (Φ(x̄), Φ′ (x̄, d)).
Because the support functions of both sets are the same, the sets ∂(φ ◦ Φ)(x̄) and F coincide. 
As we will discuss in this book, one of the ways to derive the optimality
conditions for (CP ) is the max-function approach, thereby hinting at the use
of subdifferential calculus for the max-function. Consider the convex functions
φi : Rn → R, i = 1, 2, . . . , m, and define the max-function
φ(x) = max{φ1 (x), φ2 (x), . . . , φm (x)}.
Observe that φ can be expressed as a composition of the functions
Φ(x) = (φ1 (x), φ2 (x), . . . , φm (x)) and ϕ(y) = max{y1 , y2 , . . . , ym }
given by φ(x) = (ϕ◦Φ)(x). It is now natural to apply the Chain Rule presented
above but along with that one needs to calculate ∂ϕ or ϕ′ (x, d). So before
moving on to establish the Max-Function Rule, we will present a result to
derive ϕ′ (x, d). The proof is from Hiriart-Urruty [59].
Theorem 2.95 Consider differentiable convex functions ϕi : Rn → R for
i = 1, 2, . . . , m. For x ∈ Rn , define
ϕ(x) = max{ϕ1 (x), ϕ2 (x), . . . , ϕm (x)}
and denote the active index set by I(x) defined as
I(x) = {i ∈ {1, 2, . . . , m} : ϕ(x) = ϕi (x)}.
Then
ϕ′(x̄, d) = max_{i∈I(x̄)} {h∇ϕi(x̄), di}.


Proof. Without loss of generality, assume that I(x̄) = {1, 2, . . . , m} because


those ϕi where the maximum is not attained, do not affect ϕ′ (x̄, d). By the
definition of the max-function,

ϕ(x̄ + λd) ≥ ϕi (x̄ + λd), ∀ i = 1, 2, . . . , m,

which implies that

ϕ(x̄ + λd) − ϕ(x̄) ≥ ϕi (x̄ + λd) − ϕ(x̄), ∀ i = 1, 2, . . . , m.

As ϕ(x̄) = ϕi (x̄) for i ∈ I(x̄),

ϕ(x̄ + λd) − ϕ(x̄) ≥ ϕi (x̄ + λd) − ϕi (x̄), ∀ i ∈ I(x̄).

By Definition 2.74 of the directional derivative,


ϕ′(x̄, d) ≥ lim_{λ↓0} (ϕi(x̄ + λd) − ϕi(x̄))/λ, ∀ i ∈ I(x̄).

Because ϕi, i ∈ I(x̄), are differentiable functions, this along with the above inequality yields

ϕ′(x̄, d) ≥ max_{i∈I(x̄)} h∇ϕi(x̄), di.

To establish the result, we will prove the reverse inequality, that is,

ϕ′(x̄, d) ≤ max_{i∈I(x̄)} h∇ϕi(x̄), di.

We claim that there exists a neighborhood N(x̄) such that I(x) ⊂ I(x̄) for every x ∈ N(x̄). On the contrary, assume that there exists {xk} ⊂ Rn with xk → x̄ such that I(xk) ⊄ I(x̄). Therefore, we may choose ik ∈ I(xk) with ik ∉ I(x̄). As {ik} ⊂ {1, 2, . . . , m} for every k ∈ N, by the Bolzano–Weierstrass Theorem, Proposition 1.3, it has a convergent subsequence; since the ik are integers, this subsequence is eventually constant, say equal to ī. Without loss of generality, suppose that ik = ī for every k ∈ N, so that ī ∈ I(xk), which implies ϕī(xk) = ϕ(xk). By Theorem 2.69, the functions are continuous on Rn. Thus ϕī(x̄) = ϕ(x̄), that is, ī ∈ I(x̄). But ik ∉ I(x̄) for every k ∈ N implies that ī ∉ I(x̄), which is a contradiction, thereby establishing the claim.
Now consider {λk} ⊂ R+ such that λk ↓ 0. Observe that

ϕik(x̄ + λk d) = ϕ(x̄ + λk d), ∀ ik ∈ I(x̄ + λk d).

By the claim above, for sufficiently large k ∈ N we may choose ik ∈ I(x̄ + λk d) ⊂ I(x̄). Because the index set is finite, the sequence {ik} has a subsequence that is constant, say equal to ī ∈ I(x̄); without loss of generality, we may take ik = ī for every k. Therefore, as ϕ(x̄) = ϕī(x̄),

lim_{k→∞} (ϕ(x̄ + λk d) − ϕ(x̄))/λk ≤ lim_{k→∞} (ϕī(x̄ + λk d) − ϕī(x̄))/λk = h∇ϕī(x̄), di ≤ max_{i∈I(x̄)} h∇ϕi(x̄), di.


By Theorem 2.76, the directional derivative of a convex function always exists


and therefore,

ϕ′(x̄, d) = lim_{λ↓0} (ϕ(x̄ + λd) − ϕ(x̄))/λ ≤ max_{i∈I(x̄)} h∇ϕi(x̄), di,

hence completing the proof. 


We are now in a position to obtain the Subdifferential Max-Function Rule
as an application of the Chain Rule, Theorem 2.94, and the result Theo-
rem 2.95 established above.

Theorem 2.96 (Max-Function Rule) Consider convex functions φi : Rn →


R, i = 1, 2, . . . , m, and let φ(x) = max{φ1 (x), φ2 (x), . . . , φm (x)}. Then
∂φ(x̄) = co ∪_{i∈I(x̄)} ∂φi(x̄),

where I(x̄) denotes the active index set.

Proof. In the discussion preceding Theorem 2.95, we observed that φ = ϕ ◦Φ,


where

Φ(x) = (φ1 (x), φ2 (x), . . . , φm (x)) and ϕ(y) = max{y1 , y2 , . . . , ym }

with y = (y1, y2, . . . , ym) ∈ Rm. By Theorem 2.95,

ϕ′(y, d) = max_{i∈I′(y)} {hei, di},

where ei = (0, . . . , 0, 1, 0, . . . , 0) ∈ Rm with 1 at the i-th place and


I ′ (y) = {i ∈ {1, 2, . . . , m} : yi = ϕ(y)}. It is obvious that ϕ′ (y, .) is a support
function of {ei ∈ Rm : i ∈ I ′ (y)} and by Proposition 2.61, it is also the support
function of co {ei ∈ Rm : i ∈ I ′ (y)}. Therefore, by Theorem 2.79,

∂ϕ(y) = co {ei ∈ Rm : i ∈ I ′ (y)},

that is,

∂ϕ(y) = {(µ1, µ2, . . . , µm) ∈ Rm : µi ≥ 0, i ∈ I′(y), µi = 0, i ∉ I′(y), Σ_{i=1}^{m} µi = 1}.

Thus,

∂ϕ(Φ(x̄)) = {(µ1, µ2, . . . , µm) ∈ Rm : µi ≥ 0, i ∈ I′(Φ(x̄)), µi = 0, i ∉ I′(Φ(x̄)), Σ_{i=1}^{m} µi = 1}.


As I ′ (Φ(x̄)) = I(x̄), the above condition reduces to

∂ϕ(Φ(x̄)) = {(µ1, µ2, . . . , µm) ∈ Rm : µi ≥ 0, i ∈ I(x̄), µi = 0, i ∉ I(x̄), Σ_{i=1}^{m} µi = 1}.

As ϕ is a nondecreasing convex function, applying Theorem 2.94 to φ = ϕ ◦ Φ


yields

∂φ(x̄) = { Σ_{i=1}^{m} µi ξi : (µ1, µ2, . . . , µm) ∈ ∂ϕ(Φ(x̄)), ξi ∈ ∂φi(x̄), i = 1, 2, . . . , m }
= { Σ_{i=1}^{m} µi ξi : µi ≥ 0, i ∈ I(x̄), µi = 0, i ∉ I(x̄), Σ_{i=1}^{m} µi = 1, ξi ∈ ∂φi(x̄), i = 1, 2, . . . , m },

which implies
∂φ(x̄) = co ∪_{i∈I(x̄)} ∂φi(x̄),

thereby leading to the desired result. 
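Returning to the function φ(x) = max{x, x2} discussed earlier, both component functions are active at x̄ = 0 and at x̄ = 1. The Max-Function Rule yields ∂φ(0) = co {1, 0} = [0, 1] and ∂φ(1) = co {1, 2} = [1, 2], while at every other point ∂φ(x) is the singleton containing the gradient of the unique active function. In particular, 0 ∈ ∂φ(0), so x̄ = 0 is the point of minimizer of φ over R, as observed before.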


Observe that in the Max-Function Rule above, the maximum was over a finite index set. Now if the index set is a compact set, not necessarily finite, what will the subdifferential of the sup-function be? This aspect was looked into by Valadier [109], and the resulting formula is thus also referred to as the Valadier Formula. Below we present the Valadier Formula from Ruszczyński [102].

Theorem 2.97 Consider a function

Φ(x) = sup_{y∈Y} φ(x, y),

where φ : Rn × Y → R̄. Let x̄ ∈ dom Φ such that


(i) φ(., y) is convex for every y ∈ Y ,
(ii) φ(x, .) is usc for every x ∈ Rn ,
(iii) Y ⊂ Rm is compact.
Furthermore, if φ(., y) is continuous at x̄ for every y ∈ Y , then
∂Φ(x̄) = co ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y),

where Ŷ (x̄) = {y ∈ Y : φ(x̄, y) = Φ(x̄)} and ∂x φ denotes the subdifferential


with respect to x.


Proof. Observe that (ii) and (iii) ensure that Ŷ (x̄) is nonempty and compact.
Suppose that ξ ∈ ∂x φ(x̄, ȳ) for some ȳ ∈ Ŷ (x̄). By Definition 2.77 of the
subdifferential,

φ(x, ȳ) − φ(x̄, ȳ) ≥ hξ, x − x̄i, ∀ x ∈ Rn .

As ȳ ∈ Ŷ (x̄), φ(x̄, ȳ) = Φ(x̄). Therefore, the above inequality leads to

Φ(x) − Φ(x̄) = sup_{y∈Y} φ(x, y) − φ(x̄, ȳ) ≥ hξ, x − x̄i, ∀ x ∈ Rn,

thus implying that ξ ∈ ∂Φ(x̄). Because ȳ ∈ Ŷ (x̄) and ξ ∈ ∂x φ(x̄, ȳ) were
arbitrary,

∂Φ(x̄) ⊃ ∂x φ(x̄, y), ∀ y ∈ Ŷ (x̄).

Because ∂Φ(x̄) is convex, the preceding inclusion yields


∂Φ(x̄) ⊃ co ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y).

To establish the converse, we will prove the reverse inclusion in the above relation. Because ∂Φ(x̄) is closed, we first show that ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y) is closed. Suppose that ξk ∈ ∂x φ(x̄, yk), where {yk} ⊂ Ŷ(x̄), such that ξk → ξ̄.
Because Ŷ (x̄) is compact and hence closed, {yk } is a bounded sequence. By
the Bolzano–Weierstrass Theorem, Proposition 1.3, it has a convergent subse-
quence. Without loss of generality, suppose yk → ȳ, which by the closedness
of Ŷ (x̄) implies that ȳ ∈ Ŷ (x̄). By the definition of subdifferential along with
the facts that {yk } ⊂ Ŷ (x̄) and ȳ ∈ Ŷ (x̄), that is, φ(x̄, yk ) = Φ(x̄) = φ(x̄, ȳ)
imply that for every x ∈ Rn ,

φ(x, yk ) ≥ φ(x̄, yk ) + hξk , x − x̄i


= φ(x̄, ȳ) + hξk , x − x̄i, ∀ k ∈ N.

Taking the limit supremum as k → +∞, which by the upper semicontinuity of φ(x, .) for every x ∈ Rn leads to

φ(x, ȳ) ≥ lim sup_{k→∞} φ(x, yk) ≥ φ(x̄, ȳ) + lim sup_{k→∞} hξk, x − x̄i = φ(x̄, ȳ) + hξ̄, x − x̄i, ∀ x ∈ Rn,

thereby yielding that ξ̄ ∈ ∂x φ(x̄, ȳ). Hence, ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y) is closed.
Now let us assume on the contrary that
∂Φ(x̄) ⊄ co ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y),


that is, there exists ξ̄ ∈ ∂Φ(x̄) such that

ξ̄ ∉ co ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y).

As co ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y) is a closed convex set, by the Strict Separation Theorem, Theorem 2.26 (iii), there exists d ∈ Rn with d ≠ 0 such that

hξ̄, di > hξ, di, ∀ ξ ∈ ∂x φ(x̄, y), ∀ y ∈ Ŷ(x̄). (2.25)

Consider a sequence {λk} ⊂ R+ such that λk → 0. As Φ is convex, by the definition of the subdifferential,

(Φ(x̄ + λk d) − Φ(x̄))/λk ≥ hξ̄, di. (2.26)

For k ∈ N, define the set

Yk = { y ∈ Y : (φ(x̄ + λk d, y) − Φ(x̄))/λk ≥ hξ̄, di }.

We claim that Yk is compact and nonempty. Consider {yr} ⊂ Yk such that yr → ŷ. Because yr ∈ Yk,

(φ(x̄ + λk d, yr) − Φ(x̄))/λk ≥ hξ̄, di.

Taking the limit supremum as r → +∞, which along with the upper semicontinuity of φ(x, .) for every x ∈ Rn implies that

(φ(x̄ + λk d, ŷ) − Φ(x̄))/λk ≥ hξ̄, di.

Thus, ŷ ∈ Yk and hence Yk is closed for every k ∈ N. As Yk ⊂ Y and Y


is compact, Yk is closed and bounded and thus compact. Also by the upper
semicontinuity of φ(x, .) for every x ∈ Rn , Ŷ (x̄ + λk d) is nonempty. From the
inequality (2.26) and the definition of the set Yk , Ŷ (x̄ + λk d) ⊂ Yk and hence
Yk is nonempty. For every y ∈ Y , consider the expression

φ(x̄ + λd, y) − Φ(x̄) φ(x̄ + λd, y) − φ(x̄, y) φ(x̄, y) − Φ(x̄)


= + . (2.27)
λ λ λ
From the discussion preceding Theorem 2.76 on directional derivatives, the
first term on the right-hand side of the above expression is a nondecreasing
function of λ, that is,

(φ(x̄ + λ1 d, y) − φ(x̄, y))/λ1 ≤ (φ(x̄ + λ2 d, y) − φ(x̄, y))/λ2, ∀ 0 < λ1 ≤ λ2. (2.28)


Also, as Φ(x̄) ≥ φ(x̄, y) for every y ∈ Y,

(φ(x̄, y) − Φ(x̄))/λ1 ≤ (φ(x̄, y) − Φ(x̄))/λ2, ∀ 0 < λ1 ≤ λ2, (2.29)

which implies that the second term is also nondecreasing in λ. Thus, combining the conditions (2.28) and (2.29),

(φ(x̄ + λ1 d, y) − Φ(x̄))/λ1 ≤ (φ(x̄ + λ2 d, y) − Φ(x̄))/λ2, ∀ 0 < λ1 ≤ λ2,
that is, the expression (2.27) is nondecreasing in λ. From the above inequality,
it is obvious that Y1 ⊂ Y2 for every 0 < λ1 ≤ λ2 . As {λk } is a decreasing
sequence,
Y1 ⊃ Y2 ⊃ Y3 ⊃ . . . .
As the sets Yk form a nonincreasing sequence of nonempty compact sets, their intersection is nonempty, that is, there exists ỹ ∈ Yk for all k ∈ N. Therefore,

(φ(x̄ + λk d, ỹ) − Φ(x̄))/λk ≥ hξ̄, di, ∀ k ∈ N,

which implies that the term on the left-hand side is bounded below for every k ∈ N. By the continuity of φ(., y) at x̄ for every y ∈ Y, φ(x̄ + λk d, ỹ) → φ(x̄, ỹ), which along with the above lower bound yields that ỹ ∈ Ŷ(x̄), that is, Φ(x̄) = φ(x̄, ỹ). Taking the limit as k → +∞ in the above inequality along with Definition 2.74 of the directional derivative implies that

φ′((x̄, ỹ), d) ≥ hξ̄, di.
As φ(., ỹ) is continuous at x̄, some neighborhood of x̄ is contained in dom φ(., ỹ). Thus, x̄ ∈ int dom φ(., ỹ), which by Theorem 2.79 implies that φ′((x̄, ỹ), .) is the support function of ∂x φ(x̄, ỹ). Also, by Proposition 2.82, ∂x φ(x̄, ỹ) is compact. Therefore, there exists ξ ∈ ∂x φ(x̄, ỹ) such that the above inequality becomes

hξ, di ≥ hξ̄, di,

thereby contradicting the inequality (2.25) as ỹ ∈ Ŷ(x̄), hence completing the proof. 
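As a simple illustration of the Valadier Formula, consider φ(x, y) = xy with Y = [−1, 1] ⊂ R, so that Φ(x) = sup_{y∈[−1,1]} xy = |x|. All the hypotheses of the theorem hold. At x̄ = 0, Ŷ(0) = [−1, 1] and ∂x φ(0, y) = {y}, so the formula gives ∂Φ(0) = co ∪_{y∈[−1,1]} {y} = [−1, 1], which is exactly the subdifferential of the norm function at the origin, whereas for x̄ > 0, Ŷ(x̄) = {1} and ∂Φ(x̄) = {1}.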
From Proposition 2.56, another operation on the convex functions that
leads to a convex function is the inf-convolution. We end this section of the
subdifferential calculus rules by presenting the subdifferential rule for the inf-
convolution for a particular case from Lucchetti [79].
Theorem 2.98 Consider proper lsc convex functions φi : Rn → R̄, i = 1, 2.
Let x̄, x1 , x2 ∈ Rn be such that
x1 + x2 = x̄ and (φ1 □ φ2)(x̄) = φ1(x1) + φ2(x2).

Then

∂(φ1 □ φ2)(x̄) = ∂φ1(x1) ∩ ∂φ2(x2).


Proof. Suppose that ξ ∈ ∂φ1 (x1 ) ∩ ∂φ2 (x2 ). By Definition 2.77 of the subd-
ifferential, for i = 1, 2,

φi (yi ) − φi (xi ) ≥ hξ, yi − xi i, ∀ yi ∈ Rn .

Define y1 + y2 = ȳ. The above inequality along with the given hypothesis leads to

φ1(y1) + φ2(y2) ≥ (φ1 □ φ2)(x̄) + hξ, ȳ − x̄i, ∀ y1, y2 ∈ Rn.

Taking the infimum over y1 and y2 satisfying y1 + y2 = ȳ in the above inequality, which by Definition 2.54 of the inf-convolution yields

(φ1 □ φ2)(ȳ) ≥ (φ1 □ φ2)(x̄) + hξ, ȳ − x̄i.

As ȳ ∈ Rn was arbitrary, the above inequality holds for every ȳ ∈ Rn. Thus, ξ ∈ ∂(φ1 □ φ2)(x̄). Because ξ ∈ ∂φ1(x1) ∩ ∂φ2(x2) was arbitrary, ∂φ1(x1) ∩ ∂φ2(x2) ⊂ ∂(φ1 □ φ2)(x̄).
Conversely, suppose that ξ ∈ ∂(φ1 □ φ2)(x̄). Therefore,

(φ1 □ φ2)(ȳ) ≥ φ1(x1) + φ2(x2) + hξ, ȳ − x̄i, ∀ ȳ ∈ Rn.

As the above inequality holds for any ȳ ∈ Rn, in particular take ȳ = x + x2 for an arbitrary x ∈ Rn. Substituting in the above inequality along with the definition of the inf-convolution yields

φ1(x) + φ2(x2) ≥ (φ1 □ φ2)(x + x2) ≥ φ1(x1) + φ2(x2) + hξ, (x + x2) − (x1 + x2)i, ∀ x ∈ Rn,

which implies that

φ1(x) ≥ φ1(x1) + hξ, x − x1i, ∀ x ∈ Rn.

Therefore, ξ ∈ ∂φ1(x1). Similarly, it can be shown that ξ ∈ ∂φ2(x2) and hence ξ ∈ ∂φ1(x1) ∩ ∂φ2(x2). Because ξ ∈ ∂(φ1 □ φ2)(x̄) was arbitrary, ∂(φ1 □ φ2)(x̄) ⊂ ∂φ1(x1) ∩ ∂φ2(x2), thereby establishing the result. 

2.5 Conjugate Functions


All this background on convexity, convex sets as well as convex functions, the
subdifferentials and their calculus form a backbone for the study of convex
optimization theory. Optimization problems appear not only in the specialized
fields of engineering, management sciences, and finance, but also in some sim-
ple real-life problems. For instance, if the cost of manufacturing x1 , x2 , . . . , xn
quantities of n goods is given by φ(x) and the price of selling these goods


is ξ1 , ξ2 , . . . , ξn , respectively, then the manufacturer would like to choose the


quantities x1 , x2 , . . . , xn in such a way that it leads to maximum profit, where
the profit function is given by the affine function {hξ, xi − φ(x)}. Theoret-
ically, this problem had been expressed using the conjugate functions of φ
introduced by Fenchel [45], which forms a class of convex functions. As we
will see in a short while, these conjugate functions are related to not only the
subdifferential for a convex function by the Fenchel–Young inequality but also
to the ε-subdifferential via its epigraph. For convex functions, the very idea of
conjugacy seems to derive from the fact that a proper lsc convex function is
a pointwise supremum of affine functions majorized by it. But before moving
on to this result, we present the following lemma from Lucchetti [79].

Lemma 2.99 Consider a proper lsc convex function φ : Rn → R̄. Let


x̄ ∈ dom φ and γ ∈ R such that φ(x̄) > γ. Then there exists (a, b) ∈ Rn × R
such that the affine function h(x) = ha, xi + b satisfies

φ(x) ≥ h(x), ∀ x ∈ Rn and h(x̄) > γ.

Proof. As φ is an lsc convex function, by Theorem 1.9 and Proposition 2.48,


epi φ is a closed convex set in Rn × R. From the given hypothesis, it is obvious
that (x̄, γ) ∈
/ epi φ. By the Strict Separation Theorem, Theorem 2.26 (iii),
there exist (a, λ) ∈ Rn × R with (a, λ) 6= (0, 0) and b ∈ R such that

ha, xi + λα ≥ b > ha, x̄i + λγ, ∀ (x, α) ∈ epi φ. (2.30)

In particular, taking (x̄, φ(x̄)) ∈ epi φ, the above inequality reduces to

λ(φ(x̄) − γ) > 0.

As φ(x̄) > γ, the above strict inequality leads to λ > 0. Again, taking
(x, φ(x)) ∈ epi φ in the condition (2.30) yields

φ(x) ≥ h(x), ∀ x ∈ dom φ and h(x̄) > γ,


where h(x) = h−a/λ, xi + b/λ. Observe that for x ∉ dom φ, the first inequality holds trivially, that is,

φ(x) ≥ h(x), ∀ x ∈ Rn ,

thereby establishing the result. 


Now we present the main result, the proof of which is from Lucchetti [79].

Theorem 2.100 A proper lsc convex function φ : Rn → R̄ can be expressed


as a pointwise supremum of the collection of all affine functions majorized by
it, that is, for every x ∈ Rn ,

φ(x) = sup{h(x) : φ(x) ≥ h(x), h(x) = ha, xi + b, a ∈ Rn , b ∈ R}.


Proof. Define the function Φ : Rn → R̄ as

Φ(x) = sup{h(x) : φ(x) ≥ h(x), h(x) = ha, xi + b, a ∈ Rn , b ∈ R}.

Because Φ is a pointwise supremum of affine functions, it is an lsc convex function. Also, as Φ is the supremum over affine functions h satisfying h(x) ≤ φ(x), we have Φ(x) ≤ φ(x) for every x ∈ Rn, which implies that epi φ is contained in the intersection of the epigraphs epi h of the affine functions h majorized by φ, that is, of those h with h(x) ≤ φ(x) for every x ∈ Rn. Therefore, to complete the proof, it is sufficient to prove that for (x̄, γ) ∉ epi φ, there exists an affine function h majorized by φ such that h(x̄) > γ. By Lemma 2.99, for x̄ ∈ dom φ such an h exists.
Now suppose that x̄ ∉ dom φ. As (x̄, γ) ∉ epi φ, working along the lines
of the proof of Lemma 2.99, there exist (a, λ) ∈ Rn × R with (a, λ) 6= (0, 0)
and b ∈ R such that

ha, xi + λα ≥ b > ha, x̄i + λγ, ∀ (x, α) ∈ epi φ.

If λ 6= 0, the affine function h exists as in the lemma. If λ = 0, the above


inequality reduces to

ha, xi ≥ b > ha, x̄i, ∀ x ∈ dom φ.

From the above condition,

h(x) ≤ 0, ∀ x ∈ dom φ and h(x̄) > 0,

where h(x) = h−a, xi + b. As a consequence of Lemma 2.99, it is obvious that


a proper lsc convex function has at least one affine function majorized by it.
Therefore, φ has an affine function, say h̄, majorized by it, that is φ(x) ≥ h̄(x)
for every x ∈ Rn . Now for any µ > 0,

φ(x) ≥ h(x) + µh̄(x), ∀ x ∈ dom φ.

The above inequality holds trivially for x ∈


/ dom φ. Thus,

φ(x) ≥ (h + µh̄)(x), ∀ x ∈ Rn ,

which implies the affine function (h + µh̄) is majorized by φ. As h(x̄) > 0, for
µ sufficiently large, (h + µh̄)(x̄) > γ, thereby establishing the result. 
Denote the set of all affine functions by H. Consider the support set of φ
denoted by supp(φ, H), which is the collection of all affine functions majorized
by φ, that is,

supp(φ, H) = {h ∈ H : h(x) ≤ φ(x), ∀ x ∈ Rn }.

An affine function h ∈ H is the affine support of φ if

h(x) ≤ φ(x), ∀ x ∈ Rn and h(x̄) = φ(x̄), for some x̄ ∈ Rn .


Consider φ : Rn → R̄ and x̄ ∈ dom φ such that ∂φ(x̄) is nonempty. Then for


any ξ ∈ ∂φ(x̄), by Definition 2.77,

φ(x) ≥ hξ, xi + (φ(x̄) − hξ, x̄i), ∀ x ∈ Rn . (2.31)

Define an affine function h : Rn → R given by

h(x) = hξ, xi + (φ(x̄) − hξ, x̄i). (2.32)

Combining (2.31) and (2.32),

h(x) ≤ φ(x), ∀ x ∈ Rn and h(x̄) = φ(x̄),

thereby implying that h ∈ H is an affine support of φ. Therefore, if ∂φ(x̄) is nonempty, then φ admits an affine support at x̄.
Now consider a set Φ∗ ⊂ Rn × R defined as

Φ∗ = {(ξ̄, ᾱ) ∈ Rn × R : h(x) = hξ̄, xi − ᾱ ≤ φ(x), ∀ x ∈ Rn},

that is, the affine function h satisfies h(x) ≤ φ(x) for every x ∈ Rn. Therefore,

ᾱ ≥ sup_{x∈Rn} {hξ̄, xi − φ(x)},

which implies Φ∗ can be considered the epigraph of the function φ∗, which is the conjugate of φ. We formally introduce the notion of conjugate below.

the conjugate of φ. We formally introduce the notion of conjugate below.

Definition 2.101 Consider a function φ : Rn → R̄. The conjugate of φ,


φ∗ : Rn → R̄, is defined as

φ∗(ξ) = sup_{x∈Rn} {hξ, xi − φ(x)}.

Observe that Φ∗ = epi φ∗ , as discussed above. The biconjugate of φ, φ∗∗ , is


the conjugate of φ∗ , that is,

φ∗∗(x) = sup_{ξ∈Rn} {hξ, xi − φ∗(ξ)}.

Consider a set F ⊂ Rn . The conjugate of the indicator function to the set


F is

δF∗(ξ) = sup_{x∈Rn} {hξ, xi − δF(x)} = sup_{x∈F} hξ, xi,

which is actually the support function to the set F . Therefore, δF∗ = σF for
any set F .
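For another simple example, consider φ(x) = (1/2)kxk2 on Rn. For each ξ ∈ Rn, the supremum in φ∗(ξ) = sup_{x∈Rn} {hξ, xi − (1/2)kxk2} is attained at x = ξ, yielding φ∗(ξ) = (1/2)kξk2, so that φ coincides with its own conjugate.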
Observe that the definitions of conjugate and biconjugate functions are
given for any arbitrary function. Below we present some properties of conju-
gate functions.


Proposition 2.102 For any function φ : Rn → R̄, the conjugate function


φ∗ is always lsc convex. In addition, if φ is proper convex, then φ∗ is also a
proper convex function.

Proof. Consider any ξ1 , ξ2 ∈ Rn . Then for every λ ∈ [0, 1],

φ∗((1 − λ)ξ1 + λξ2) = sup_{x∈Rn} {h(1 − λ)ξ1 + λξ2, xi − φ(x)} = sup_{x∈Rn} {(1 − λ)(hξ1, xi − φ(x)) + λ(hξ2, xi − φ(x))},

which by Proposition 1.7 leads to

φ∗((1 − λ)ξ1 + λξ2) ≤ (1 − λ) sup_{x∈Rn} {hξ1, xi − φ(x)} + λ sup_{x∈Rn} {hξ2, xi − φ(x)} = (1 − λ)φ∗(ξ1) + λφ∗(ξ2), ∀ λ ∈ [0, 1].

Because ξ1 and ξ2 are arbitrary, from the above inequality φ∗ is convex. Also, as φ∗ is a pointwise supremum of the affine functions ξ ↦ hξ, xi − φ(x), it is lsc.
As φ is a proper convex function, dom φ is a nonempty convex set in Rn ,
which by Proposition 2.14 (i) implies that ri dom φ is nonempty. Also, by
Proposition 2.82, for any x̄ ∈ ri dom φ, ∂φ(x̄) is nonempty. Suppose that
ξ ∈ ∂φ(x̄), which by Definition 2.77 of the subdifferential implies that

hξ, x̄i − φ(x̄) ≥ hξ, xi − φ(x), ∀ x ∈ Rn ,

which along with the definition of conjugate φ∗ implies that

hξ, x̄i − φ(x̄) = φ∗ (ξ).

As φ(x̄) is finite, φ∗(ξ) is also finite, that is, ξ ∈ dom φ∗. Also, by the properness of φ and the definition of φ∗, it is obvious that φ∗(ξ) > −∞ for every ξ ∈ Rn, thereby showing that φ∗ is a proper convex function. 
Observe that φ∗ is lsc convex irrespective of the nature of φ but for φ∗ to
be proper, we need φ to be a proper convex function. Simply assuming φ to be
proper need not imply that φ∗ is proper. For instance, consider φ(x) = −x2 ,
which is a nonconvex proper function. Then φ∗ ≡ +∞ and hence not proper.
Next we state some conjugate rules that can be proved directly using the
definition of conjugate functions.

Proposition 2.103 Consider functions φ̄, φ : Rn → R̄.


(i) If φ̄ ≤ φ, then φ̄∗ ≥ φ∗ .
(ii) If φ̄(x) = φ(x) + c, φ̄∗ (ξ) = φ∗ (ξ) − c.
(iii) If φ̄(x) = λφ(x) for λ > 0, φ̄∗ (ξ) = λφ∗ (ξ/λ).


(iv) For every x and ξ in Rn ,

φ∗ (ξ) + φ(x) ≥ hξ, xi.

This is known as the Fenchel–Young Inequality. Equivalently,

φ∗∗ (x) ≤ φ(x), ∀ x ∈ Rn .

The readers are urged to verify these properties simply using Defini-
tion 2.101 of conjugate and biconjugate functions.
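For instance, with φ(x) = (1/2)x2 on R one has φ∗(ξ) = (1/2)ξ2, and the Fenchel–Young inequality reads (1/2)ξ2 + (1/2)x2 ≥ ξx, which is simply (1/2)(x − ξ)2 ≥ 0; equality holds exactly when ξ = x = ∇φ(x), that is, when ξ ∈ ∂φ(x).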
As discussed in Theorem 2.100, a proper lsc convex function is the pointwise supremum of the affine functions majorized by it; the biconjugate of a function plays an important role in this respect.
gate with the support set. The proof is along the lines of Hiriart-Urruty and
Lemaréchal [63].

Theorem 2.104 Consider a proper function φ : Rn → R̄. Then φ∗∗ is the


pointwise supremum of all affine functions majorized by φ, that is,

φ∗∗ (x̄) = sup h(x̄).


h∈supp(φ,H)

More precisely, φ∗∗ = cl co φ.

Proof. An affine function h is majorized by φ, that is, h(x) ≤ φ(x) for every
x ∈ Rn . Because an affine function is expressed as h(x) = hξ, xi − α for some
ξ ∈ Rn and α ∈ R,

hξ, xi − α ≤ φ(x), ∀ x ∈ Rn .

Therefore, by Definition 2.101 of the conjugate function, φ∗ (ξ) ≤ α, which


implies ξ ∈ dom φ∗ . Then for any x ∈ Rn ,

suph∈supp(φ,H) h(x) = supξ∈dom φ∗ , φ∗ (ξ)≤α {hξ, xi − α}

                    = supξ∈dom φ∗ {hξ, xi − φ∗ (ξ)}

                    = supξ∈Rn {hξ, xi − φ∗ (ξ)} = φ∗∗ (x),

thereby yielding the desired result. From Definition 2.57 of the closed convex
function, φ∗∗ = cl co φ, as desired. 
Combining Theorems 2.100 and 2.104 we have the following result for a
proper lsc convex function.

Theorem 2.105 Consider a proper lsc convex function φ : Rn → R̄. Then


φ∗∗ = φ.


Observe that the above theorem holds when the function is lsc. What if
φ is only proper convex but not lsc? How is one then supposed to relate the
function φ to its biconjugate φ∗∗ ? The next result from Attouch, Buttazzo,
and Michaille [3] looks into this aspect.
Proposition 2.106 Consider a proper convex function φ : Rn → R̄. Assume
that φ admits a continuous affine minorant. Then φ∗∗ = cl φ. Consequently,
φ is lsc at x̄ ∈ Rn if and only if φ(x̄) = φ∗∗ (x̄).
Proof. By the Fenchel–Young inequality, Proposition 2.103 (iv),
φ(x) ≥ hξ, xi − φ∗ (ξ), ∀ x ∈ Rn ,
which implies that h(x) = hξ, xi − φ∗ (ξ) belongs to supp(φ, H). By Defini-
tion 2.101 of the biconjugate function,
φ∗∗ (x) = sup {hξ, xi − φ∗ (ξ)},
ξ∈Rn

which leads to φ∗∗ being the upper envelope of the continuous affine minorants
of φ. Applying Proposition 2.102 to φ∗ , φ∗∗ is a proper lsc convex function
and thus,
φ∗∗ ≤ cl φ ≤ φ.
This inequality along with Proposition 2.103 (i) leads to
(φ∗∗ )∗∗ ≤ (cl φ)∗∗ ≤ φ∗∗ .
As φ∗∗ and cl φ are both proper lsc convex functions, by Theorem 2.105,
(φ∗∗ )∗∗ = φ∗∗ and (cl φ)∗∗ = cl φ,
thereby reducing the preceding inequality to
φ∗∗ ≤ (cl φ)∗∗ ≤ φ∗∗ .
Hence, φ∗∗ = cl φ, thereby establishing the first part of the result.
From Chapter 1, we know that closure of a function φ is defined as
cl φ(x̄) = lim inf φ(x),
x→x̄

which is the same as φ(x̄) if φ is lsc at x̄ by Definition 1.4, thereby yielding


φ(x̄) = cl φ(x̄). Consequently, by the first part, the lower semicontinuity of φ
at x̄ is equivalent to φ(x̄) = φ∗∗ (x̄), thereby completing the proof. 
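For instance, consider φ = δ(0,1] on R, which is proper convex but not lsc at x̄ = 0 and admits the continuous affine minorant h ≡ 0. A direct computation gives φ∗ (ξ) = sup0<x≤1 ξx = max{ξ, 0}, and hence φ∗∗ = δ[0,1] = cl φ. In particular, φ∗∗ (0) = 0 < +∞ = φ(0), which reflects the failure of lower semicontinuity of φ at x̄ = 0.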
With all the preceding results, and discussions on the properties of the
conjugates and biconjugates, we now move on to see how the conjugates of
the function operations are defined. More precisely, if we are given some functions
and perform some operation on them, like the sum operation or the supremum
operation, then how are their conjugates related to the conjugates of the
given functions? In the next result from Hiriart-Urruty and Lemaréchal [63]
and Rockafellar [97], we look into this aspect of conjugate functions.


Theorem 2.107 (i) (Inf-Convolution Rule) Consider proper functions
φi : Rn → R̄, i = 1, 2, . . . , m, satisfying dom φ∗1 ∩ dom φ∗2 ∩ . . . ∩ dom φ∗m 6= ∅. Then

(φ1 □ φ2 □ . . . □ φm )∗ = φ∗1 + φ∗2 + . . . + φ∗m .

(ii) (Sum Rule) Consider proper convex functions φi : Rn → R̄,
i = 1, 2, . . . , m, satisfying dom φ1 ∩ dom φ2 ∩ . . . ∩ dom φm 6= ∅. Then

(cl φ1 + cl φ2 + . . . + cl φm )∗ = cl (φ∗1 □ φ∗2 □ . . . □ φ∗m ).

If ri dom φ1 ∩ ri dom φ2 ∩ . . . ∩ ri dom φm 6= ∅, then

(φ1 + φ2 + . . . + φm )∗ = φ∗1 □ φ∗2 □ . . . □ φ∗m

and for every ξ ∈ dom (φ1 + φ2 + . . . + φm )∗ , the infimum of the problem

inf{φ∗1 (ξ1 ) + φ∗2 (ξ2 ) + . . . + φ∗m (ξm ) : ξ1 + ξ2 + . . . + ξm = ξ}

is attained.
(iii) (Infimum Rule) Consider a family of proper functions φi : Rn → R̄,
i ∈ I, where I is an arbitrary index set, having a common affine minorant
and satisfying supi∈I φ∗i (ξ) < +∞ for some ξ ∈ Rn . Then

(inf φi )∗ = sup φ∗i .


i∈I i∈I

(iv) (Supremum Rule) Consider a family of proper lsc convex functions


φi : Rn → R̄, i ∈ I, where I is an arbitrary index set. If supi∈I φi is not
identically +∞, then

(sup φi )∗ = cl co(inf φ∗i ).


i∈I i∈I

Proof. (i) From Definition 2.101 of the conjugate function and Definition 2.54
of the inf-convolution along with Proposition 1.7,

(φ1  . . .  φm )∗ (ξ) = sup {hξ, xi − inf (φ1 (x1 ) + . . . + φm (xm ))}


x∈Rn x1 +...+xm =x

= sup sup {hξ, xi − (φ1 (x1 ) + . . . + φm (xm ))}


x∈Rn x1 +...+xm =x
= sup {hξ, x1 i − φ1 (x1 ) + . . . +
x1 ,...,xm ∈Rn
hξ, xm i − φm (xm )}
= φ∗1 (ξ) + . . . + φ∗m (ξ),

thereby establishing the desired result.


(ii) Replacing φi by φ∗i for i = 1, 2, . . . , m, in (i) along with Proposition 2.106
leads to

cl φ1 + cl φ2 + . . . + cl φm = (φ∗1 □ φ∗2 □ . . . □ φ∗m )∗ .


Taking the conjugate on both sides and again applying Proposition 2.106
yields the requisite condition,

(cl φ1 + cl φ2 + . . . + cl φm )∗ = cl (φ∗1 □ φ∗2 □ . . . □ φ∗m ).


Tm
If i=1 ri dom φi is nonempty, then by Proposition 2.68,

cl φ1 + cl φ2 + . . . + cl φm = cl (φ1 + φ2 + . . . + φm ).

Also, by the definition of conjugate functions,

(cl φ1 + cl φ2 + . . . + cl φm )∗ = (cl (φ1 + φ2 + . . . + φm ))∗


= (φ1 + φ2 + . . . + φm )∗ .

Now to establish the result, it is enough to prove that

φ∗1  φ∗2  . . .  φ∗m

is lsc. By Theorem 1.9, it is equivalent to showing that the lower-level set,

Sα = {ξ ∈ Rn : (φ∗1  . . .  φ∗m )(ξ) ≤ α},

is closed for every α ∈ R. Consider a bounded sequence {ξk } ⊂ Sα such that


ξk → ξ. By Definition 2.54 of the inf-convolution, there exist ξki ∈ Rn with
ξk1 + ξk2 + . . . + ξkm = ξk such that

φ∗1 (ξk1 ) + . . . + φ∗m (ξkm ) ≤ α + 1/k, ∀ k ∈ N. (2.33)
By assumption, suppose that x̂ ∈ ri dom φ1 ∩ ri dom φ2 ∩ . . . ∩ ri dom φm . As φi , i = 1, 2, . . . , m, are
convex, by Theorem 2.69, the functions are continuous at x̂. Therefore, for
some ε > 0 and Mi ∈ R, i = 1, 2, . . . , m,

φi (x) ≤ Mi , ∀ x ∈ Bε (x̂), i = 1, 2, . . . , m. (2.34)

For any d ∈ Bε (0), consider

hξk1 , di = hξk1 , x̂i − hξk1 , x̂ − di


= hξk1 , x̂i + hξk2 , x̂ − di + . . . + hξkm , x̂ − di − hξk , x̂ − di,

which by the Fenchel–Young inequality, Proposition 2.103 (iv), and the


Cauchy–Schwarz inequality, Proposition 1.1, leads to

hξk1 , di ≤ φ∗1 (ξk1 ) + φ1 (x̂) + φ∗2 (ξk2 ) + φ2 (x̂ − d) + . . . +


φ∗m (ξkm ) + φm (x̂ − d) + kξk k kx̂ − dk.

By the conditions (2.33) and (2.34), the above inequality reduces to


hξk1 , di ≤ α + 1/k + M1 + M2 + . . . + Mm + kξk k kx̂ − dk,


which along with the boundedness of {ξk } implies that {ξk1 } ⊂ Rn is a


bounded sequence. Similarly, it can be shown that {ξki }, i = 2, . . . , m, are
bounded sequences. By the Bolzano–Weierstrass Theorem, Proposition 1.3,
{ξki }, i = 1, 2, . . . , m, have a convergent subsequence. Without loss of general-
ity, assume that ξki → ξi , i = 1, 2, . . . , m. Because ξk = ξk1 + ξk2 + . . . + ξkm , taking
the limit as k → +∞,

ξ = ξ1 + ξ2 + . . . + ξm .

By Proposition 2.102, φ∗i , i = 1, 2, . . . , m, are proper lsc convex functions,


therefore taking the limit as k → +∞,

φ∗1 (ξ1 ) + φ∗2 (ξ2 ) + . . . + φ∗m (ξm ) ≤ α.

By the definition of inf-convolution, the above inequality leads to

(φ∗1  φ∗2  . . .  φ∗m )(ξ) ≤ α,

which implies that ξ ∈ Sα . Because α ∈ R was arbitrary, the lower-level set is


closed for every α ∈ R and hence

φ∗1  φ∗2  . . .  φ∗m

is closed. Repeating the same arguments with

(φ∗1  φ∗2  . . .  φ∗m )(ξ) = α and ξk = ξ

yields that the infimum is achieved, thereby completing the proof.


(iii) By Definition 2.101, for every ξ ∈ Rn ,

(inf φi )∗ (ξ) = sup {hξ, xi − inf φi (x)}


i∈I x∈Rn i∈I

= sup sup{hξ, xi − φi (x)}


x∈Rn i∈I
= sup sup {hξ, xi − φi (x)} = sup φ∗i (ξ),
i∈I x∈Rn i∈I

as desired.
(iv) Replacing φi by φ∗i for i ∈ I in (iii),

supi∈I φ∗∗i = (inf i∈I φ∗i )∗ .

As φi , i ∈ I, are lsc, the above condition reduces to

sup φi = (inf φ∗i )∗ .


i∈I i∈I

Taking the conjugate on both sides leads to

(sup φi )∗ = (inf φ∗i )∗∗ ,


i∈I i∈I


which by Theorem 2.104 yields


(sup φi )∗ = cl co (inf φ∗i ). 
i∈I i∈I
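As an illustration of the Sum Rule (ii), take φi = δFi for nonempty closed convex sets F1 , F2 ⊂ Rn with ri F1 ∩ ri F2 6= ∅. Because δF∗ i = σFi , the rule yields σF1 ∩F2 (ξ) = (δF1 + δF2 )∗ (ξ) = (σF1 □ σF2 )(ξ) with the infimum attained, that is, σF1 ∩F2 (ξ) = min{σF1 (ξ1 ) + σF2 (ξ2 ) : ξ1 + ξ2 = ξ}.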

Next, using the Fenchel–Young inequality, we present an equivalent char-


acterization of the subdifferential of a convex function.

Theorem 2.108 Consider a proper convex function φ : Rn → R̄. Then for


any x, ξ ∈ Rn

ξ ∈ ∂φ(x) ⇐⇒ φ(x) + φ∗ (ξ) = hξ, xi.

In addition, if φ is also lsc, then for any x and ξ in Rn

ξ ∈ ∂φ(x) ⇐⇒ φ(x) + φ∗ (ξ) = hξ, xi ⇐⇒ x ∈ ∂φ∗ (ξ).

Proof. Suppose that ξ ∈ ∂φ(x̄), which by Definition 2.77 of the subdifferential


implies that

φ(x) − φ(x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn .

The above inequality leads to

hξ, x̄i − φ(x̄) ≥ sup {hξ, xi − φ(x)} = φ∗ (ξ),


x∈Rn

that is,

φ(x̄) + φ∗ (ξ) ≤ hξ, x̄i

which along with the Fenchel–Young inequality, Proposition 2.103 (iv), re-
duces to the desired condition

φ(x̄) + φ∗ (ξ) = hξ, x̄i.

Conversely, suppose that above condition is satisfied, which by Defini-


tion 2.101 of the conjugate function implies

hξ, x̄i − φ(x̄) ≥ hξ, xi − φ(x), ∀ x ∈ Rn ,

that is,

φ(x) − φ(x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn .

Thus, ξ ∈ ∂φ(x̄), thereby establishing the equivalence.


Now if φ is lsc as well, then by Theorem 2.105, φ = φ∗∗ . Then the equivalent
condition can be expressed as ξ¯ ∈ ∂φ(x̄) if and only if
¯ = hξ,
φ∗∗ (x̄) + φ∗ (ξ) ¯ x̄i.


By Definition 2.101 of the biconjugate function, the above condition is equiv-


alent to
¯ x̄i − φ∗ (ξ)
hξ, ¯ ≥ hξ, x̄i − φ∗ (ξ), ∀ ξ ∈ Rn ,

that is,
¯ ≥ hξ − ξ,
φ∗ (ξ) − φ∗ (ξ) ¯ x̄i, ∀ ξ ∈ Rn .
¯ The converse can be worked
By the definition of subdifferential, x̄ ∈ ∂φ∗ (ξ).
out along the lines of the previous part and thus establishing the desired
relation. 
As an application of the above theorem, consider a closed convex cone
K ⊂ Rn . We claim that ξ¯ ∈ ∂δK (x̄) if and only if x̄ ∈ ∂δK ◦ (ξ).
¯ Suppose that
¯
ξ ∈ ∂δK (x̄) = NK (x̄), which is equivalent to
¯ x − x̄i ≤ 0, ∀ x ∈ K.
hξ,
¯ x̄i = 0.
In particular, taking x = 0 and x = 2x̄, respectively, implies that hξ,
Therefore, the above inequality reduces to
¯ xi ≤ 0, ∀ x ∈ K,
hξ,
which by Definition 2.30 implies that ξ¯ ∈ K ◦ . Thus, ξ¯ ∈ NK (x̄) is equivalent
to
x̄ ∈ K, ξ¯ ∈ K ◦ , ¯ x̄i = 0.
hξ,
For a closed convex cone K, by Proposition 2.31, K ◦◦ = K. As x̄ ∈ K = K ◦◦ ,
hξ, x̄i ≤ 0, ∀ ξ ∈ K ◦ .
¯ x̄i = 0, the above condition is equivalent to
Because hξ,
¯ x̄i ≤ 0, ∀ ξ ∈ K ◦ ,
hξ − ξ,
¯ thereby proving our claim.
which implies that x̄ ∈ NK ◦ (ξ),
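As a simple illustration of Theorem 2.108, consider φ(x) = |x| on R, for which φ∗ = δ[−1,1] . At x̄ = 0 one has ∂φ(0) = [−1, 1], and indeed φ(0) + φ∗ (ξ) = 0 = hξ, 0i exactly when ξ ∈ [−1, 1], while 0 ∈ ∂φ∗ (ξ) = N[−1,1] (ξ) for every such ξ.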

2.6 ε-Subdifferential
In Subsection 2.3.3 on differentiability properties of a convex function, from
Proposition 2.82 and the examples preceding it, we noticed that ∂φ(x) may
turn out to be empty, even though x ∈ dom φ. To overcome this aspect of
subdifferentials, the concept of the ε-subdifferential came into existence; it not
only overcomes the drawback of subdifferentials but is also important from
the optimization point of view. The idea can be found in the work of Brønsted
and Rockafellar [19] but the theory of ε-subdifferential calculus was given by
Hiriart-Urruty [58].


Definition 2.109 Consider a proper convex function φ : Rn → R̄. For ε > 0,


the ε-subdifferential of φ at x̄ ∈ dom φ is given by
∂ε φ(x̄) = {ξ ∈ Rn : φ(x) − φ(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn }.
For a zero function, 0 : Rn → R, defined as 0(x) = 0 for every x ∈ Rn ,
∂ε 0(x̄) = {0} for every ε > 0. Otherwise if there exists ξ ∈ ∂ε 0(x̄) with ξ =
6 0,
by the above definition of ε-subdifferential,
0 ≥ hξ, x − x̄i − ε
Xn
= ξi (xi − x̄i ) − ε, ∀ x ∈ Rn .
i=1

Because ξ 6= 0, there exists some j ∈ {1, 2, . . . , n} such that ξj 6= 0. In
particular, taking xj = x̄j + 2ε/ξj and xi = x̄i , i 6= j, the above inequality yields
ε ≤ 0, which is a contradiction.
As shown in Section 2.4 that for a convex set F ⊂ Rn , the subdifferential
of the indicator function coincides with the normal cone, that is, ∂δF = NF .
Along similar lines, we define the ε-normal set.
Definition 2.110 Consider a convex set F ⊂ Rn . Then for ε > 0, the
ε-subdifferential of the indicator function at x̄ ∈ F is
∂ε δF (x̄) = {ξ ∈ Rn : δF (x) − δF (x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn }
= {ξ ∈ Rn : ε ≥ hξ, x − x̄i, ∀ x ∈ F },
which is also called the ε-normal set and denoted as Nε,F (x̄). Note that Nε,F
is not a cone unlike NF , which is always a cone.
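For instance, taking F = [0, 1] ⊂ R and x̄ = 0, one obtains Nε,F (0) = {ξ ∈ R : ξx ≤ ε, ∀ x ∈ [0, 1]} = (−∞, ε], which for ε > 0 is not a cone, whereas NF (0) = (−∞, 0].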
Recall the proper convex function φ : R → R̄ given by

φ(x) = −√x for 0 ≤ x ≤ 1 and φ(x) = +∞ otherwise,

considered in Subsection 2.3.3. As already mentioned, for x = 0, the subdif-
ferential ∂φ(x) is empty. But for any ε > 0, the ε-subdifferential at x = 0 is
∂ε φ(x) = (−∞, −1/(4ε)] and hence nonempty.

In the proposition below we present some properties of the ε-subdifferential
of the convex functions.
Proposition 2.111 Consider a proper lsc convex function φ : Rn → R̄ and
let ε > 0 be given. Then for every x̄ ∈ dom φ, the ε-subdifferential ∂ε φ(x̄) is
a nonempty closed convex set and
\
∂φ(x̄) = ∂ε φ(x̄).
ε>0

For ε1 ≥ ε2 , ∂ε2 (x̄) ⊂ ∂ε1 (x̄).


Proof. Observe that for x̄ ∈ dom φ and ε > 0, φ(x̄) − ε < φ(x̄), which
implies (x̄, φ(x̄) − ε) 6∈ epi φ. Because φ is a lsc convex, by Theorem 1.9 and
Proposition 2.48, epi φ is closed convex set in Rn × R. Therefore, applying the
Strict Separation Theorem, Theorem 2.26 (iii), there exists (ξ, γ) ∈ Rn × R
with (ξ, γ) 6= (0, 0) such that

hξ, x̄i + γ(φ(x̄) − ε) < hξ, xi + γα, ∀ (x, α) ∈ epi φ.

As (x, φ(x)) ∈ epi φ for every x ∈ dom φ, the above condition leads to

hξ, x̄i + γ(φ(x̄) − ε) < hξ, xi + γφ(x), ∀ x ∈ dom φ. (2.35)

In particular, taking x = x̄ in the preceding inequality yields γ > 0. Now


dividing (2.35) throughout by γ implies that
h−ξ/γ, x − x̄i − ε < φ(x) − φ(x̄), ∀ x ∈ dom φ.

The above condition is also satisfied by x 6∈ dom φ, which implies

h−ξ/γ, x − x̄i − ε < φ(x) − φ(x̄), ∀ x ∈ Rn .

By Definition 2.109 of the ε-subdifferential, −ξ/γ ∈ ∂ε φ(x̄). Thus, ∂ε φ(x̄) is
nonempty for every x̄ ∈ dom φ.
Suppose that {ξk } ⊂ ∂ε φ(x̄) such that ξk → ξ. By the definition of
ε-subdifferential,

φ(x) − φ(x̄) ≥ hξk , x − x̄i − ε, ∀ x ∈ Rn .

Taking the limit as k → +∞, the above inequality leads to

φ(x) − φ(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn ,

which implies that ξ ∈ ∂ε φ(x̄), thereby yielding the closedness of ∂ε φ(x̄).


Consider ξ1 , ξ2 ∈ ∂ε φ(x̄), which implies that for i = 1, 2,

φ(x) − φ(x̄) ≥ hξi , x − x̄i − ε, ∀ x ∈ Rn .

Therefore, for any λ ∈ [0, 1],

φ(x) − φ(x̄) ≥ h(1 − λ)ξ1 + λξ2 , x − x̄i − ε, ∀ x ∈ Rn ,

which implies (1 − λ)ξ1 + λξ2 ∈ ∂ε φ(x̄). Because ξ1 , ξ2 were arbitrary, ∂ε φ(x̄)


is convex.
Now we will prove that
\
∂φ(x̄) = ∂ε φ(x̄).
ε>0


Suppose that ξ ∈ ∂φ(x̄), which by Definition 2.77 of the subdifferential implies


that for every x ∈ Rn ,
φ(x) − φ(x̄) ≥ hξ, x − x̄i
≥ hξ, x − x̄i − ε, ∀ ε > 0.
Thus, by the definition of ε-subdifferential, ξ ∈ ∂ε φ(x̄) for every ε > 0. Because
ξ ∈ ∂φ(x̄) was arbitrary,
\
∂φ(x̄) ⊂ ∂ε φ(x̄).
ε>0

Conversely, consider ξ ∈ ∂ε φ(x̄) for every ε > 0, which implies that for
every x ∈ Rn ,
φ(x) − φ(x̄) ≥ hξ, x − x̄i − ε, ∀ ε > 0.
As the preceding inequality holds for every ε > 0, taking the limit as ε → 0
leads to
φ(x) − φ(x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn ,
thereby yielding ξ ∈ ∂φ(x̄). Because ξ was arbitrary, the reverse inclusion is
satisfied, that is,
\
∂φ(x̄) ⊃ ∂ε φ(x̄),
ε>0

hence establishing the result.


The relation ∂ε2 (x̄) ⊂ ∂ε1 (x̄) for ε1 ≥ ε2 can be easily worked out using
the definition of ε-subdifferential. 
The proof of the nonemptiness of the ε-subdifferential is from Lucchetti [79].
In the example, it is easy to observe that at x = 0, ∩ε>0 ∂ε φ(x) is empty, as is
∂φ(x). Before moving any further, let us consider the absolute value function,
φ(x) = |x|. The subdifferential of φ is given by

∂φ(x) = {1} if x > 0,   [−1, 1] if x = 0,   {−1} if x < 0.

Now for ε > 0, the ε-subdifferential of φ is

∂ε φ(x) = [1 − ε/x, 1] if x > ε/2,   [−1, 1] if −ε/2 ≤ x ≤ ε/2,   [−1, −1 − ε/x] if x < −ε/2.
The graphs of ∂φ and ∂ε φ for ε = 1 are shown in Figures 2.8 and 2.9. The
graph of the subdifferential is a simple step function.
Similar to the characterization of the subdifferential in terms of the
conjugate function, the following result provides a relation between the
ε-subdifferential and the conjugate function.


FIGURE 2.8: Graph of ∂(|.|).

FIGURE 2.9: Graph of ∂1 (|.|).


Theorem 2.112 Consider a proper convex function φ : Rn → R̄. Then for


any ε > 0 and x ∈ dom φ,

ξ ∈ ∂ε φ(x) ⇐⇒ φ(x) + φ∗ (ξ) − hξ, xi ≤ ε.

Proof. Consider any ξ ∈ ∂ε φ(x̄). By Definition 2.109 of ε-subdifferential,

φ(x) − φ(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn ,

which implies that

hξ, xi − φ(x) + φ(x̄) − hξ, x̄i ≤ ε, ∀ x ∈ Rn .

By Definition 2.101 of the conjugate function,

φ∗ (ξ) + φ(x̄) − hξ, x̄i ≤ ε,

as desired.
Conversely, suppose that the inequality holds, which by the definition of
conjugate function implies that

hξ, xi − φ(x) + φ(x̄) − hξ, x̄i ≤ ε, ∀ x ∈ Rn ,

which yields that ξ ∈ ∂ε φ(x̄), thus establishing the equivalence. 
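For instance, for φ(x) = (1/2)x² on R the above characterization yields ξ ∈ ∂ε φ(x̄) if and only if (1/2)x̄² + (1/2)ξ² − ξ x̄ ≤ ε, that is, |ξ − x̄| ≤ √(2ε). Thus ∂ε φ(x̄) = [x̄ − √(2ε), x̄ + √(2ε)], which shrinks to ∂φ(x̄) = {x̄} as ε ↓ 0.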


As mentioned earlier, the notion of ε-subdifferential appears in the well-
known work of Brønsted and Rockafellar [19] in which they estimated how
well ∂ε φ “approximates” ∂φ. We present the modified version of that famous
Brønsted–Rockafellar Theorem from Thibault [108] below. The proof involves
the famous Ekeland’s Variational Principle [41, 42, 43], which we state without
proof before moving on with the result.

Theorem 2.113 (Ekeland’s Variational Principle) Consider a closed proper


function φ : Rn → R̄ and for ε > 0, let x̄ ∈ Rn be such that

φ(x̄) ≤ infn φ(x) + ε.


x∈R

Then for any λ > 0, there exists xλ ∈ Rn such that

kxλ − x̄k ≤ ε/λ,    φ(xλ ) ≤ φ(x̄),

and xλ is the unique minimizer of the unconstrained problem

inf φ(x) + (ε/λ)kx − xλ k subject to x ∈ Rn .
Observe that the second condition in Ekeland’s Variational Principle,
φ(xλ ) ≤ φ(x̄), implies that

φ(xλ ) − φ(x̄) ≤ 0 ≤ ε. (2.36)


From the condition on x̄,

φ(x̄) ≤ infn φ(x) + ε ≤ φ(xλ ) + ε,


x∈R

which implies that

φ(x̄) − φ(xλ ) ≤ ε.

The above condition along with (2.36) leads to

|φ(xλ ) − φ(x̄)| ≤ ε.

Now to establish the modified version of Brønsted–Rockafellar Theorem, we


can apply

|φ(xλ ) − φ(x̄)| ≤ ε instead of φ(xλ ) ≤ φ(x̄)

in the Ekeland’s Variational Principle.

Theorem 2.114 (A modified version of the Brønsted–Rockafellar Theorem)


Consider a proper lsc convex function φ : Rn → R̄ and x̄ ∈ dom φ. Then for
any ε > 0 and for any ξ ∈ ∂ε φ(x̄), there exist xε ∈ Rn and ξε ∈ ∂φ(xε ) such
that
kxε − x̄k ≤ √ε,   kξε − ξk ≤ √ε   and   |φ(xε ) − hξε , xε − x̄i − φ(x̄)| ≤ 2ε.

Proof. By Definition 2.109 of the ε-subdifferential, ξ ∈ ∂ε φ(x̄) implies

φ(x) − φ(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn ,

that is,

φ(x̄) − hξ, x̄i ≥ φ(x) − hξ, xi + ε, ∀ x ∈ Rn .

By applying Ekeland’s Variational Principle, Theorem 2.113, to φ − hξ, .i with
λ = √ε, there exists xε ∈ Rn such that kxε − x̄k ≤ √ε,

|φ(xε ) − hξ, xε i − φ(x̄) + hξ, x̄i| ≤ ε (2.37)

and

φ(xε ) − hξ, xε i ≤ φ(x) − hξ, xi + √ε kx − xε k, ∀ x ∈ Rn . (2.38)

By the definition of subdifferential, the above condition (2.38) implies that



ξ ∈ ∂(φ + √ε k. − xε k)(xε ). (2.39)

As dom k. − xε k = Rn , by Theorem 2.69, k. − xε k is continuous on Rn .


Therefore, by Theorem 2.91 along with the fact that ∂(k. − xε k)(xε ) = B,
(2.39) becomes

ξ ∈ ∂φ(xε ) + √ε B.


Thus, there exists ξε ∈ ∂φ(xε ) such that kξε − ξk ≤ √ε. From condition (2.37)
along with the Cauchy–Schwarz inequality, Proposition 1.1,

|φ(xε ) − hξε , xε − x̄i − φ(x̄)| ≤ ε + |hξε − ξ, xε − x̄i|

                               ≤ ε + kξε − ξk kxε − x̄k ≤ 2ε,

thereby completing the proof. 


As in the study of optimality conditions, we need the subdifferential cal-
culus rules; similarly, ε-subdifferentials also play a pivotal role in this respect.
Below we present the ε-subdifferential Sum Rule, Max-Function and the Scalar
Product Rules that we will need in our study of optimality conditions for the
convex programming problem (CP ). The proofs of the Sum and the Max-
Function Rules are from Hiriart-Urruty and Lemaréchal [62].

Theorem 2.115 (Sum Rule) Consider two proper convex functions φi : Rn → R̄,
i = 1, 2 such that ri dom φ1 ∩ ri dom φ2 6= ∅. Then for ε > 0,
[
∂ε (φ1 + φ2 )(x̄) = (∂ε1 φ1 (x̄) + ∂ε2 φ2 (x̄))
ε1 ≥ 0, ε2 ≥ 0,
ε1 + ε2 = ε

for every x̄ ∈ dom φ1 ∩ dom φ2 .

Proof. Suppose that ε1 ≥ 0 and ε2 ≥ 0 such that ε1 + ε2 = ε. Consider


ξi ∈ ∂εi φi (x̄), i = 1, 2, which by Definition 2.109 of the ε-subdifferential
implies that for every x ∈ Rn ,

φi (x) − φi (x̄) ≥ hξi , x − x̄i − εi , i = 1, 2.

The above condition along with the assumption ε1 + ε2 = ε leads to

(φ1 + φ2 )(x) − (φ1 + φ2 )(x̄) ≥ hξ1 + ξ2 , x − x̄i − (ε1 + ε2 )


= hξ1 + ξ2 , x − x̄i − ε, ∀ x ∈ Rn ,

thereby yielding ξ1 + ξ2 ∈ ∂ε (φ1 + φ2 )(x̄). Because εi ≥ 0 and ξi ∈ ∂εi φi (x̄)


for i = 1, 2, were arbitrary,
[
∂ε (φ1 + φ2 )(x̄) ⊃ (∂ε1 φ1 (x̄) + ∂ε2 φ2 (x̄))
ε1 ≥ 0, ε2 ≥ 0,
ε1 + ε2 = ε

Conversely, suppose that ξ ∈ ∂ε (φ1 + φ2 )(x̄), which by the definition of


ε-subdifferential implies that

(φ1 + φ2 )(x) − (φ1 + φ2 )(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn .

By Definition 2.101 of the conjugate function,

(φ1 + φ2 )∗ (ξ) + (φ1 + φ2 )(x̄) − hξ, x̄i ≤ ε. (2.40)


By the Sum Rule of the conjugate function, Theorem 2.107 (ii), as the as-
sumption ri dom φ1 ∩ ri dom φ2 6= ∅ holds,

(φ1 + φ2 )∗ (ξ) = (φ∗1  φ∗2 )(ξ),

and the infimum is attained, which implies there exist ξi ∈ Rn , i = 1, 2,


satisfying ξ1 + ξ2 = ξ such that

(φ1 + φ2 )∗ (ξ) = φ∗1 (ξ1 ) + φ∗2 (ξ2 ).

Therefore, the inequality (2.40) becomes

(φ∗1 (ξ1 ) + φ1 (x̄) − hξ1 , x̄i) + (φ∗2 (ξ2 ) + φ2 (x̄) − hξ2 , x̄i) ≤ ε.

Denote εi = φ∗i (ξi ) + φi (x̄) − hξi , x̄i, i = 1, 2, which by the Fenchel–Young


inequality, Proposition 2.103 (iv), implies that εi ≥ 0, i = 1, 2. Observe that
ε1 + ε2 ≤ ε. Again, by the definition of conjugate function,

φi (x) − φi (x̄) ≥ hξi , x − x̄i − εi


≥ hξi , x − x̄i − ε̄i ,
where ε̄i = εi + (ε − ε1 − ε2 )/2 ≥ εi , i = 1, 2. Therefore, for i = 1, 2,
ξi ∈ ∂ε̄i φi (x̄),

with ε̄1 + ε̄2 = ε. Thus,

ξ = ξ1 + ξ2 ∈ ∂ε̄1 φ1 (x̄) + ∂ε̄2 φ2 (x̄).

Because ξ ∈ ∂ε (φ1 + φ2 )(x̄) was arbitrary,


[
∂ε (φ1 + φ2 )(x̄) ⊂ (∂ε1 φ1 (x̄) + ∂ε2 φ2 (x̄)),
ε1 ≥ 0, ε2 ≥ 0,
ε1 + ε2 = ε

thereby completing the proof. 
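For instance, take φ1 = φ2 = |.| on R and x̄ = 0. Then ∂ε (φ1 + φ2 )(0) = ∂ε (2|.|)(0) = [−2, 2], while for any ε1 , ε2 ≥ 0 with ε1 + ε2 = ε one has ∂ε1 φ1 (0) + ∂ε2 φ2 (0) = [−1, 1] + [−1, 1] = [−2, 2], so that both sides of the above Sum Rule coincide.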


Before proving the ε-subdifferential Max-Function Rule, we state a result
from Hiriart-Urruty and Lemaréchal [62] without proof and present the Scalar
Product Rule.

Proposition 2.116 Consider proper convex functions φi : Rn → R̄,


i = 1, . . . , m. Let φ(x) = max{φ1 (x), φ2 (x), . . . , φm (x)} and p = min{m, n + 1}.
For every ξ ∈ dom φ∗ = co (dom φ∗1 ∪ . . . ∪ dom φ∗m ), there exist ξi ∈ dom φ∗i and λi ≥ 0,
i = 1, 2, . . . , p, with λ1 + λ2 + . . . + λp = 1 such that

φ∗ (ξ) = λ1 φ∗1 (ξ1 ) + . . . + λp φ∗p (ξp )   and   ξ = λ1 ξ1 + . . . + λp ξp .


More precisely, (ξi , λi ) solve the problem

inf   λ1 φ∗1 (ξ1 ) + λ2 φ∗2 (ξ2 ) + . . . + λp φ∗p (ξp )
subject to   ξ = λ1 ξ1 + . . . + λp ξp ,   λ1 + . . . + λp = 1,   (P )
             ξi ∈ dom φ∗i , λi ≥ 0, i = 1, 2, . . . , p.

For the ε-subdifferential Max-Function Rule, we will need the Scalar Prod-
uct Rule that we present below.

Theorem 2.117 (Scalar Product Rule) For a proper convex function


φ : Rn → R̄ and any ε ≥ 0,

∂ε (λφ)(x̄) = λ∂ε/λ φ(x̄), ∀ λ > 0.

Proof. Suppose that ξ ∈ ∂ε (λφ)(x̄), which by Definition 2.109 implies that

(λφ)(x) − (λφ)(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn .

As λ > 0, dividing throughout by λ leads to

φ(x) − φ(x̄) ≥ hξ/λ, x − x̄i − ε/λ, ∀ x ∈ Rn ,

which implies ξ/λ ∈ ∂ε̃ φ(x̄), where ε̃ = ε/λ, that is, ξ ∈ λ∂ε̃ φ(x̄). Because
ξ ∈ ∂ε (λφ)(x̄) was arbitrary,

∂ε (λφ)(x̄) ⊂ λ∂ε̃ φ(x̄).

Conversely, suppose that ξ ∈ λ∂ε̃ φ(x̄) for λ > 0, which implies there exists
ξ˜ ∈ ∂ε̃ φ(x̄) such that ξ = λξ.
˜ By the definition of ε-subdifferential,

˜ x − x̄i − ε̃, ∀ x ∈ Rn ,
φ(x) − φ(x̄) ≥ hξ,

which implies

(λφ)(x) − (λφ)(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn ,

where ε = λε̃. Therefore, ξ ∈ ∂ε (λφ)(x̄). Because ξ ∈ λ∂ε̃ φ(x̄) was arbitrary,

∂ε (λφ)(x̄) ⊃ λ∂ε̃ φ(x̄),

thereby yielding the desired result. 


Now we proceed with establishing the ε-subdifferential Max-Function Rule
with the above results as the tool.


Theorem 2.118 (Max-Function Rule) Consider proper convex functions


φi : Rn → R̄, i = 1, 2, . . . , m. Let φ(x) = max{φ1 (x), φ2 (x), . . . , φm (x)} and
p = min{m, n + 1}. Then ξ ∈ ∂ε φ(x̄) if and only if there exist ξi ∈ dom φ∗i ,
εi ≥ 0, and λi ≥ 0, i = 1, 2, . . . , p, with λ1 + λ2 + . . . + λp = 1 such that

ξi ∈ ∂εi /λi φi (x̄) for every i satisfying λi > 0, (2.41)


ξ = λ1 ξ1 + . . . + λp ξp   and   (ε1 + . . . + εp ) + φ(x̄) − (λ1 φ1 (x̄) + . . . + λp φp (x̄)) ≤ ε. (2.42)

Proof. By Proposition 2.116,


φ∗ (ξ) = λ1 φ∗1 (ξ1 ) + . . . + λp φ∗p (ξp ),

where p = min{m, n + 1} and (ξi , λi ) ∈ dom φ∗i × R+ , i = 1, 2, . . . , p, solves
the problem (P ), that is, satisfies

ξ = λ1 ξ1 + . . . + λp ξp   and   λ1 + . . . + λp = 1.

By the relation between the ε-subdifferential and the conjugate function, The-
orem 2.112, as ξ ∈ ∂ε φ(x),

φ∗ (ξ) + φ(x) − hξ, xi ≤ ε,

which by the conditions on (ξi , λi ), i = 1, 2, . . . , p, along with the definition of


φ leads to
Xp Xp
λi φ∗i (ξi ) + φ(x) − λi hξi , xi ≤ ε. (2.43)
i=1 i=1

The above condition can be rewritten as


p
X p
X
εi + φ(x) − λi φi (x) ≤ ε,
i=1 i=1

where εi = λi (φ∗i (ξi ) + φi (x) − hξi , xi), i = 1, 2, . . . , p, which by Theorem 2.112


yields that ξi ∈ ∂εi /λi φi (x) provided λi > 0, thereby leading to the conditions
(2.41) and (2.42) as desired.
Conversely, suppose that the conditions (2.41) and (2.42) hold. By Theo-
rem 2.112, (2.41) implies that for λi > 0,

λi (φ∗i (ξi ) + φi (x) − hξi , xi) ≤ εi ,

which along with (2.42) lead to


p
X p
X
λi φ∗i (ξi ) + φ(x) − λi hξi , xi ≤ ε,
i=1 i=1


that is, the inequality (2.43). Invoking Proposition 2.116 yields


p
X
φ∗ (ξ) = λi φ∗i (ξi ),
i=1

which along with (2.43) and Theorem 2.112 implies that ξ ∈ ∂ε φ(x), thereby
completing the proof. 

Remark 2.119 In the above result, applying the Scalar Product Rule, The-
orem 2.117, to the condition (2.41) implies that ξ˜i = λi ξi ∈ ∂εi (λi φi )(x) pro-
vided λi > 0. Therefore, ξ ∈ ∂ε φ(x) is such that there exist ξ˜i ∈ ∂εi (λi φi )(x),
i = 1, 2, . . . , p, satisfying
p
X p
X p
X
ξ= ξ˜i and εi + φ(x) − λi φi (x) ≤ ε.
i=1 i=1 i=1

As p = min{m, n + 1}, we consider two cases. If p = m, for some


j ∈ {1, 2, . . . , p}, define
p
X
ε̃j = εj + (ε − εi ) and ε̃i = εi , i 6= j
i=1

and the conditions become


ξ˜i ∈ ∂εi (λi φi )(x̄) for every i satisfying λi > 0, (2.44)
Xm Xm m
X
ξ= ˜
ξi and εi + φ(x̄) − λi φi (x̄) = ε. (2.45)
i=1 i=1 i=1

If p < m, define λi = 0 and εi > 0 arbitrary for i = p + 1, p + 2, . . . , m,
such that εp+1 + . . . + εm = ε − (ε1 + . . . + εp ). As already discussed, ∂εi (λi φi )(x) = {0},
i = p + 1, p + 2, . . . , m and hence yield the conditions (2.44) and (2.45). Thus,
if ξ ∈ ∂ε φ(x), then there exist ξi ∈ dom φ∗i , εi ≥ 0 and λi ≥ 0, i = 1, 2, . . . , m,
with λ1 + . . . + λm = 1 such that the conditions (2.44) and (2.45) hold.
In particular, for a proper convex function φ : Rn → R̄ and
+
φ (x) = max{0, φ(x)},
∂ε (φ+ )(x̄) ⊂ {∂η (λφ)(x̄) : 0 ≤ λ ≤ 1, η ≥ 0, ε = η + φ+ (x̄) − λφ(x̄)}.
In the results stated above, the ε-subdifferential calculus rules were ex-
pressed in terms of the ε-subdifferential itself. Below we state a result by
Hiriart-Urruty and Phelps [64] relating the Sum Rule of the subdifferentials
and the ε-subdifferentials.
Theorem 2.120 Consider two proper lsc convex functions φi : Rn → R̄,
i = 1, 2. Then for any x̄ ∈ dom φ1 ∩ dom φ2 ,
\
∂(φ1 + φ2 )(x̄) = cl (∂ε φ1 (x̄) + ∂ε φ2 (x̄)).
ε>0


Proof. Suppose that ξi ∈ ∂ε φi (x̄), i = 1, 2, which implies

φi (x) − φi (x̄) ≥ hξi , x − x̄i − ε, ∀ x ∈ Rn , i = 1, 2.

Therefore,

(φ1 + φ2 )(x) − (φ1 + φ2 )(x̄) ≥ hξ1 + ξ2 , x − x̄i − 2ε, ∀ x ∈ Rn ,

that is, ξ1 + ξ2 ∈ ∂2ε (φ1 + φ2 )(x̄). Because ξi ∈ ∂ε φi (x̄), i = 1, 2, are arbitrary,


which along with the closedness of ε-subdifferential by Proposition 2.111 yields

∂2ε (φ1 + φ2 )(x̄) ⊃ cl (∂ε φ1 (x̄) + ∂ε φ2 (x̄)).

Further, applying Proposition 2.111 leads to


\
∂(φ1 + φ2 )(x̄) ⊃ cl (∂ε φ1 (x̄) + ∂ε φ2 (x̄)).
ε>0

To establish the result, we shall prove the reverse containment in the above
condition. Suppose that ξ¯ ∈ ∂(φ1 + φ2 )(x̄). By Theorem 2.108,
¯ = hξ,
(φ1 + φ2 )(x̄) + (φ1 + φ2 )∗ (ξ) ¯ x̄i,

which along with the Fenchel–Young inequality, Proposition 2.103 (iv), implies
that
¯ ≤ hξ,
(φ1 + φ2 )(x̄) + (φ1 + φ2 )∗ (ξ) ¯ x̄i.

Applying the Sum Rule of conjugate functions, Theorem 2.107 (ii), to proper
lsc convex functions φ1 and φ2 leads to

(φ1 + φ2 )∗ = cl (φ∗1  φ∗2 ).

Define

φ(ξ) = (φ∗1  φ∗2 )(ξ) − hξ, x̄i,

which implies cl φ = cl (φ∗1  φ∗2 ) − h., x̄i. By the preceding conditions,


¯ ≤ α. It is easy to observe that
denoting α = −(φ1 + φ2 )(x̄) yields φ(ξ)
\
{ξ ∈ Rn : cl φ(ξ) ≤ α} = cl {ξ ∈ Rn : φ(ξ) ≤ α + ε/2}.
ε>0

Therefore, for every ε > 0,

ξ¯ ∈ cl {ξ ∈ Rn : φ(ξ) ≤ α + ε/2}.

If φ(ξ) ≤ α + ε/2, then

φ(ξ) − α = inf {φ∗1 (ξ1 ) + φ∗2 (ξ2 ) − hξ1 , x̄i − hξ2 , x̄i + φ1 (x̄) + φ2 (x̄)}
ξ=ξ1 +ξ2
= inf {(φ∗1 (ξ1 ) − hξ1 , x̄i + φ1 (x̄)) + (φ∗2 (ξ2 ) − hξ2 , x̄i + φ2 (x̄))}.
ξ=ξ1 +ξ2


Therefore, there exist ξ1 , ξ2 such that ξ = ξ1 + ξ2 and

(φ∗1 (ξ1 ) − hξ1 , x̄i + φ1 (x̄)) + (φ∗2 (ξ2 ) − hξ2 , x̄i + φ2 (x̄))} < ε.

By the Fenchel–Young inequality,

φ∗i (ξi ) − hξi , x̄i + φi (x̄) ≥ 0, i = 1, 2,

which along with Definition 2.101 of the conjugate and the preceding condi-
tions imply that

hξi , x − x̄i − φi (x) + φi (x̄) ≤ ε, ∀ x ∈ Rn , i = 1, 2,

that is, ξi ∈ ∂ε φi (x̄) for i = 1, 2. Thus,

cl {ξ ∈ Rn : φ(ξ) ≤ α + ε/2} ⊂ cl (∂ε φ1 (x̄) + ∂ε φ2 (x̄)),

which implies ξ¯ ∈ cl (∂ε φ1 (x̄) + ∂ε φ2 (x̄)) for every ε > 0. As ξ¯ ∈ ∂(φ1 + φ2 )(x̄)
was arbitrary,
\
∂(φ1 + φ2 )(x̄) ⊂ cl (∂ε φ1 (x̄) + ∂ε φ2 (x̄)),
ε>0

thus establishing the result. 

Now if one goes back to the optimality condition 0 ∈ ∂φ(x̄) in Theo-


rem 2.89, it gives an equivalent characterization to the point of minimizer x̄
of the unconstrained problem (CPu ). So one will like to know then what the
condition 0 ∈ ∂ε φ(x̄) implies. As it turns out, it leads to the concept of ap-
proximate optimality conditions, which we will deal with in one of the later
chapters. For now we simply state the result on approximate optimality for
the unconstrained convex programming problem (CPu ).

Theorem 2.121 Consider a proper convex function φ : Rn → R̄ and let ε > 0


be given. Then 0 ∈ ∂ε φ(x̄) if and only if

φ(x̄) ≤ infn φ(x) + ε.


x∈R

The point x̄ is called an ε-solution of (CPu ).
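Indeed, applying Theorem 2.112 with ξ = 0 shows that 0 ∈ ∂ε φ(x̄) is equivalent to φ(x̄) + φ∗ (0) ≤ ε, and as φ∗ (0) = supx∈Rn {−φ(x)} = − inf x∈Rn φ(x), this is precisely the stated inequality.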

In the above theorem we mentioned only one of the notions of approximate


solutions, namely the ε-solution. But there are other approximate solution
concepts, as we shall see later in the book, some of which are motivated by
the Ekeland’s Variational Principle, Theorem 2.113.


2.7 Epigraphical Properties of Conjugate Functions


With the study of conjugate function and ε-subdifferential, we are now in a
position to present the relation of the epigraph of conjugate functions with the
ε-subdifferentials of a convex function from Jeyakumar, Lee, and Dinh [68].
This relation plays an important part in the study of sequential optimality
conditions as we shall see in the chapter devoted to its study.

Theorem 2.122 Consider a proper lsc convex function φ : Rn → R̄ and let


x̄ ∈ dom φ. Then
[
epi φ∗ = {(ξ, hξ, x̄i − φ(x̄) + ε) : ξ ∈ ∂ε φ(x̄)}.
ε≥0

Proof. Denote
[
F= {(ξ, hξ, x̄i − φ(x̄) + ε) : ξ ∈ ∂ε φ(x̄)}.
ε≥0

Suppose that (ξ, α) ∈ epi φ∗ , which implies φ∗ (ξ) ≤ α. By Definition 2.101 of


the conjugate function,

hξ, xi − φ(x) ≤ α, ∀ x ∈ Rn .

Denoting ε = α − hξ, x̄i + φ(x̄), the above inequality becomes

φ(x) − φ(x̄) ≥ hξ, xi − φ(x̄) − α


= hξ, x − x̄i − ε, ∀ x ∈ Rn ,

which by Definition 2.109 of the ε-subdifferential implies that ξ ∈ ∂ε φ(x̄).


Therefore, (ξ, α) ∈ F. Because (ξ, α) ∈ epi φ∗ was arbitrary, epi φ∗ ⊂ F.
Conversely, suppose that (ξ, α) ∈ F, which implies there exists ε ≥ 0 and
x̄ ∈ dom ∂φ with

ξ ∈ ∂ε φ(x̄) and α = hξ, x̄i − φ(x̄) + ε.

As ξ ∈ ∂ε φ(x̄), by the definition of ε-subdifferential,

φ(x) − φ(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn ,

which by the definition of conjugate function leads to

φ∗ (ξ) ≤ hξ, x̄i − φ(x̄) + ε = α.

Thus, (ξ, α) ∈ epi φ∗ . Because (ξ, α) ∈ F was arbitrary, epi φ∗ ⊃ F, thereby


establishing the result. 
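For instance, for φ(x) = |x| and x̄ = 0 one has φ∗ = δ[−1,1] and ∂ε φ(0) = [−1, 1] for every ε ≥ 0, so that the union in the theorem becomes ∪ε≥0 ([−1, 1] × {ε}) = [−1, 1] × R+ = epi φ∗ , as asserted.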
Next we discuss the epigraphical conditions for the operations of the con-
jugate functions.


Theorem 2.123 (i) (Sum Rule) Consider proper lsc convex functions
φi : Rn → R̄, i = 1, 2, . . . , m. Then

epi(φ1 + φ2 + . . . + φm )∗ = cl (epi φ∗1 + epi φ∗2 + . . . + epi φ∗m ).

(ii) (Supremum Rule) Consider a family of proper lsc convex functions


φi : Rn → R̄, i ∈ I, where I is an arbitrary index set. Then
[
epi (sup φi )∗ = cl co epi φ∗i .
i∈I
i∈I

(iii) Consider proper lsc convex functions φi : Rn → R, i = 1, 2, . . . , m.


Define a vector-valued convex function Φ : Rn → Rm , defined as
Φ(x) = (φ1 (x), φ2 (x), . . . , φm (x)). Then
[
epi (λΦ)∗ is a convex cone.
λ∈Rm
+

(iv) Consider a proper lsc convex function φ : Rn → R. Then for every λ > 0,
epi (λφ)∗ = λ epi φ∗ .

Proof. (i) As φi , i = 1, 2, . . . , m are proper lsc convex functions, the condition


of Theorem 2.107 (ii) reduces to

(φ1 + φ2 + . . . + φm )∗ = cl (φ∗1  φ∗2  . . .  φ∗m ),

which implies

epi (φ1 + φ2 + . . . + φm )∗ = epi cl (φ∗1  φ∗2  . . .  φ∗m ).

By Definition 1.11 of the closure of a function, the above condition becomes

epi (φ1 + φ2 + . . . + φm )∗ = cl epi (φ∗1  φ∗2  . . .  φ∗m ),

which by Proposition 2.55 leads to the desired condition.


(ii) Theorem 2.107 (iv) along with Definition 2.57 of the closed convex hull of
a function implies that
[
epi (sup φi )∗ = epi cl co (inf φ∗i ) = cl co epi φ∗i ,
i∈I i∈I
i∈I

thereby establishing the result.


(iii) Suppose that (ξ, α) ∈ ∪λ∈Rm+ epi (λΦ)∗ , which implies that there exists
λ′ ∈ Rm+ such that (ξ, α) ∈ epi (λ′ Φ)∗ . This along with Definition 2.101 of
the conjugate function leads to

hξ, xi − (λ′ Φ)(x) ≤ (λ′ Φ)∗ (ξ) ≤ α, ∀ x ∈ Rn .


Multiplying throughout by any γ > 0,

hγξ, xi − ((γλ′ )Φ)(x) ≤ γα, ∀ x ∈ Rn ,

where γλ′ ∈ Rm + . Again, by the definition of conjugate function, the above


condition leads to

((γλ′ )Φ)∗ (γξ) ≤ γα,

which implies that


[
γ(ξ, α) ∈ epi ((γλ′ )Φ)∗ ⊂ epi (λΦ)∗ , ∀ γ > 0.
λ∈Rm
+

Hence, ∪λ∈Rm+ epi (λΦ)∗ is a cone.
Now consider (ξi , αi ) ∈ ∪λ∈Rm+ epi (λΦ)∗ , i = 1, 2, which implies there
exist λi ∈ Rm+ such that (ξi , αi ) ∈ epi (λi Φ)∗ for i = 1, 2. Therefore, by the
definition of conjugate function, for every x ∈ Rn ,

hξi , xi − (λi Φ)(x) ≤ αi , i = 1, 2.

For any γ ∈ [0, 1], the above condition leads to

h(1 − γ)ξ1 + γξ2 , xi − (λ′ Φ)(x) ≤ (1 − γ)α1 + γα2 , ∀ x ∈ Rn ,

where λ′ = (1 − γ)λ1 + γλ2 ∈ Rm


+ . Therefore,

(λ′ Φ)∗ ((1 − γ)ξ1 + γξ2 ) ≤ (1 − γ)α1 + γα2 ,

which implies that


[
(1 − γ)(ξ1 , α1 ) + γ(ξ2 , α2 ) ∈ epi (λ′ Φ)∗ ⊂ epi (λΦ)∗ , ∀ γ ∈ [0, 1].
λ∈Rm
+

S
Because (ξi , αi ), i = 1, 2, were arbitrary, thus λ∈Rm epi (λΦ)∗ is a convex
+
set.
(iv) Suppose that (ξ, α) ∈ epi (λφ)∗ , which implies that (λφ)∗ (ξ) ≤ α. As
λ > 0, Proposition 2.103 (iii) leads to
 
φ∗ (ξ/λ) ≤ α/λ,

which implies (ξ/λ, α/λ) ∈ epi φ∗ , that is, (ξ, α) ∈ λ epi φ∗ . Because
(ξ, α) ∈ epi (λφ)∗ was arbitrary, epi (λφ)∗ ⊂ λepi φ∗ . The reverse inclusion
can be obtained by following the steps backwards, thereby establishing the
result. 


From the above theorem, for two proper lsc convex functions φi : Rn → R̄,
i = 1, 2,

epi(φ1 + φ2 )∗ = cl (epi φ∗1 + epi φ∗2 ).

In general, epi φ∗1 + epi φ∗2 need not be closed. But under certain additional
conditions, it can be shown that epi φ∗1 + epi φ∗2 is closed. We present below
the result from Burachik and Jeyakumar [20] and Dinh, Goberna, López, and
Son [32] to establish the same.

Proposition 2.124 Consider two proper lsc convex functions φi : Rn → R̄,


i = 1, 2, such that dom φ1 ∩ dom φ2 6= ∅. If cone(dom φ1 − dom φ2 ) is a
closed subspace or at least one of the functions is continuous at some point in
dom φ1 ∩ dom φ2 , then epi φ∗1 + epi φ∗2 is closed.

Proof. As cone(dom φ1 − dom φ2 ) is a closed subspace, by Theorem 1.1 of


Attouch and Brézis [2] or Theorem 3.6 of Strömberg [107], the exact infimal
convolution holds, that is,

(φ1 + φ2 )∗ = φ∗1  φ∗2 .

The above condition along with Theorem 2.123 (i) leads to

cl (epi φ∗1 + epi φ∗2 ) = epi (φ1 + φ2 )∗ = epi (φ∗1  φ∗2 ) = epi φ∗1 + epi φ∗2 ,

thereby yielding the result that epi φ∗1 + epi φ∗2 is closed.
Suppose that φ1 is continuous at x̂ ∈ dom φ1 ∩ dom φ2 , which yields

0 ∈ core (dom φ1 − dom φ2 ).

This implies that cone (dom φ1 − dom φ2 ) is a closed subspace and thus leads
to the desired result. 
Note that the result gives only sufficient condition for the closedness of
epi φ∗1 + epi φ∗2 . The converse need not be true. For a better understanding,
we consider the following example from Burachik and Jeyakumar [20]. Let
φ1 = δ[0,∞) and φ2 = δ(−∞,0] . Therefore,

epi φ∗1 = epi σ[0,∞) = R− × R+ and epi φ∗2 = epi σ(−∞,0] = R+ × R+ ,

which leads to epi φ∗1 + epi φ∗2 = R × R+ , a closed convex cone. Observe that
cone(dom φ1 − dom φ2 ) = [0, ∞), which is not a subspace, and also neither
φ1 nor φ2 are continuous at dom φ1 ∩ dom φ2 = {0}. Thus, the condition,
epi φ∗1 + epi φ∗2 is closed, is a relaxed condition in comparison to the other
assumptions.
Using this closedness assumption, Burachik and Jeyakumar [21] obtained
an equivalence between the exact inf-convolution and ε-subdifferential Sum
Rule, which we present next.


Theorem 2.125 Consider two proper lsc convex functions φi : Rn → R̄,


i = 1, 2, such that dom φ1 ∩ dom φ2 6= ∅. Then the following are equivalent:

(i) (φ1 + φ2 )∗ = φ∗1  φ∗2 with exact infimal convolution,

(ii) For every ε ≥ 0 and every x̄ ∈ dom φ1 ∩ dom φ2 ,


[
∂ε (φ1 + φ2 )(x̄) = (∂ε1 φ1 (x̄) + ∂ε2 φ2 (x̄)).
ε1 ≥ 0, ε2 ≥ 0,
ε1 + ε2 = ε

(iii) epi φ∗1 + epi φ∗2 is closed.

Proof. (i) ⇒ (ii): The proof follows along the lines of Theorem 2.115.
(ii) ⇒ (iii): Suppose that (ξ, γ) ∈ cl (epi φ∗1 + epi φ∗2 ). By Theorem 2.123 (i),
(ξ, γ) ∈ epi (φ1 + φ2 )∗ . Let x̄ ∈ dom φ1 ∩ dom φ2 . By Theorem 2.122, there
exists ε ≥ 0 such that

ξ ∈ ∂ε (φ1 + φ2 )(x̄) and γ = hξ, x̄i − (φ1 + φ2 )(x̄) + ε.

By (ii), there exist εi ≥ 0 and ξi ∈ ∂εi φi (x̄), i = 1, 2, such that

ξ = ξ1 + ξ2 and ε = ε 1 + ε2 .

Define γi = hξi , x̄i−φi (x̄)+εi , i = 1, 2. Then from Theorem 2.122, for i = 1, 2,


(ξi , γi ) ∈ epi φ∗i , which implies

(ξ, γ) = (ξ1 , γ1 ) + (ξ2 , γ2 ) ∈ epi φ∗1 + epi φ∗2 ,

thereby leading to (iii).


(iii) ⇒ (i): Suppose that there exists ξ ∈ Rn such that ξ ∈ dom (φ1 + φ2 )∗ .
Otherwise (i) holds trivially. By (iii),

epi (φ1 + φ2 )∗ = cl (epi φ∗1 + epi φ∗2 ) = epi φ∗1 + epi φ∗2 ,

which implies (ξ, (φ1 + φ2 )∗ (ξ)) ∈ epi φ∗1 + epi φ∗2 . Thus for i = 1, 2, there
exist (ξi , γi ) ∈ epi φ∗i such that

ξ = ξ1 + ξ2 and (φ1 + φ2 )∗ (ξ) = γ1 + γ2 ,

which implies there exists ξ¯ ∈ Rn such that


φ∗1 (ξ − ξ̄) + φ∗2 (ξ̄) ≤ (φ1 + φ2 )∗ (ξ).

Therefore,
(φ∗1 □ φ∗2 )(ξ) ≤ φ∗1 (ξ − ξ̄) + φ∗2 (ξ̄) ≤ (φ1 + φ2 )∗ (ξ).


By Theorem 2.107 and (iii),

(φ1 + φ2 )∗ (ξ) = cl (φ∗1  φ∗2 )(ξ) ≤ (φ∗1  φ∗2 )(ξ),

which along with the preceding condition leads to the exact infimal convolu-
tion, thereby establishing (i). 
Though it is obvious that under the closedness of epi φ∗1 + epi φ∗2 , one
can obtain the subdifferential Sum Rule by choosing ε = 0 in (ii) of the above
theorem, we present a detailed version of the result from Burachik and Jeyaku-
mar [20]. Below is an alternative approach to the Sum Rule, Theorem 2.91.

Theorem 2.126 Consider two proper lsc convex functions φi : Rn → R̄,


i = 1, 2, such that dom φ1 ∩ dom φ2 6= ∅. If epi φ∗1 + epi φ∗2 is closed, then

∂(φ1 + φ2 )(x̄) = ∂φ1 (x̄) + ∂φ2 (x̄), ∀ x̄ ∈ dom φ1 ∩ dom φ2 .

Proof. Let x̄ ∈ dom φ1 ∩ dom φ2 . It is easy to observe that

∂(φ1 + φ2 )(x̄) ⊃ ∂φ1 (x̄) + ∂φ2 (x̄).

To prove the result, we shall prove the converse inclusion. Suppose that
ξ ∈ ∂(φ1 + φ2 )(x̄). By Theorem 2.108,

(φ1 + φ2 )∗ (ξ) + (φ1 + φ2 )(x̄) = hξ, x̄i.

Therefore, the above condition along with the given hypothesis

(ξ, hξ, x̄i − (φ1 + φ2 )(x̄)) ∈ epi (φ1 + φ2 )∗ = epi φ∗1 + epi φ∗2 ,

which implies that there exist (ξi , γi ) ∈ epi φ∗i , i = 1, 2, such that

ξ = ξ1 + ξ2 and hξ, x̄i − (φ1 + φ2 )(x̄) = γ1 + γ2 .

Also, as (ξi , γi ) ∈ epi φ∗i , i = 1, 2, which along with the above conditions lead
to

φ∗1 (ξ1 ) + φ∗2 (ξ2 ) ≤ hξ, x̄i − (φ1 + φ2 )(x̄) = hξ1 , x̄i + hξ2 , x̄i − φ1 (x̄) − φ2 (x̄).

By the Fenchel–Young inequality, Proposition 2.103 (iv),

φ∗1 (ξ1 ) + φ∗2 (ξ2 ) ≥ hξ1 , x̄i + hξ2 , x̄i − φ1 (x̄) − φ2 (x̄),

which together with the preceding inequality leads to

φ∗1 (ξ1 ) + φ∗2 (ξ2 ) = hξ1 , x̄i + hξ2 , x̄i − φ1 (x̄) − φ2 (x̄).

Again by the Fenchel–Young inequality and the above equation,

φ∗1 (ξ1 ) + φ1 (x̄) − hξ1 , x̄i = hξ2 , x̄i − φ2 (x̄) − φ∗2 (ξ2 ) ≤ 0,


which by Theorem 2.108 yields ξ1 ∈ ∂φ1 (x̄). Along similar lines it can be
obtained that ξ2 ∈ ∂φ2 (x̄). Thus,

ξ = ξ1 + ξ2 ∈ ∂φ1 (x̄) + ∂φ2 (x̄),

which implies that

∂(φ1 + φ2 )(x̄) ⊂ ∂φ1 (x̄) + ∂φ2 (x̄),

thereby leading to the desired result. 


We end this chapter with an application of Theorem 2.126 to provide an
alternative assumption to establish equality in Proposition 2.39 (i).

Corollary 2.127 Consider convex sets F1 , F2 ⊂ Rn such that F1 ∩ F2 6= ∅.


If epi σF1 + epi σF2 is closed, then

NF1 ∩F2 (x̄) = NF1 (x̄) + NF2 (x̄), ∀ x̄ ∈ F1 ∩ F2 .

Proof. We know that for any convex set F ⊂ Rn , δF∗ = σF . Thus the condition

epi σF1 + epi σF2 is closed

is equivalent to

epi δF∗ 1 + epi δF∗ 2 is closed.

As the condition for Theorem 2.126 to hold is satisfied,

∂(δF1 + δF2 )(x̄) = ∂δF1 (x̄) + ∂δF2 (x̄), x̄ ∈ dom δF1 ∩ dom δF2 .

Because δF1 + δF2 = δF1 ∩F2 , the above equality condition along with the fact
that for any convex set F ⊂ Rn , ∂δF = NF , yields the desired result, that is,
NF1 ∩F2 (x̄) = NF1 (x̄) + NF2 (x̄), ∀ x̄ ∈ F1 ∩ F2 . 



Chapter 3
Basic Optimality Conditions Using
the Normal Cone

3.1 Introduction
Recall the convex optimization problem presented in Chapter 1
min f (x) subject to x ∈ C, (CP )
n n
where f : R → R is a convex function and C ⊂ R is a convex set. It is nat-
ural to think that f ′ (x, h) and ∂f (x) will play a major role in the process of
establishing the optimality conditions as these objects have been successful in
overcoming the difficulty posed by the absence of a derivative. In this chapter
we will not bother ourselves with extended-valued function but such a frame-
work can be easily adapted into the current framework. But use of extended-
valued convex functions might appear while doing some of the proofs, as one
will need to use the calculus rules for subdifferentials like the Sum Rule or the
Chain Rule. To begin our discussion more formally, we right away state the
following basic result.

Theorem 3.1 Consider the convex optimization problem (CP ). Then x̄ is a


point of minimizer of (CP ) if and only if either of the following two conditions holds:

(i) f ′ (x̄, d) ≥ 0, ∀ d ∈ TC (x̄) or,


(ii) 0 ∈ ∂f (x̄) + NC (x̄).

Proof. (i) As x̄ ∈ C and C is a convex set,

x̄ + λ(x − x̄) ∈ C, ∀ x ∈ C, ∀ λ ∈ [0, 1].

Also, as x̄ is a point of minimizer of (CP ), then for every x ∈ C

f (x̄ + λ(x − x̄)) ≥ f (x̄), ∀ λ ∈ [0, 1].

Therefore,

lim λ↓0 (f (x̄ + λ(x − x̄)) − f (x̄))/λ ≥ 0,


which implies

f ′ (x̄, x − x̄) ≥ 0, ∀ x ∈ C.

By Theorem 2.76, the directional derivative is sublinear in the direction and


thus

f ′ (x̄, d) ≥ 0, ∀ d ∈ cl cone(C − x̄).

By Theorem 2.35, TC (x̄) = cl cone(C − x̄) and therefore, the above inequality
becomes

f ′ (x̄, d) ≥ 0, ∀ d ∈ TC (x̄).

Conversely, suppose condition (i) holds. As f : Rn → R, applying Propo-


sition 2.83 and Theorem 2.79, for any d ∈ Rn there exists ξ ∈ ∂f (x̄) such
that

hξ, di = f ′ (x̄, d).

For every x ∈ C, x − x̄ ∈ TC (x̄). Therefore, the convexity of f along with the


above condition and condition (i) implies that for every x ∈ C, there exists
ξ ∈ ∂f (x̄) such that

f (x) − f (x̄) ≥ hξ, x − x̄i ≥ 0, ∀ x ∈ C,

thereby proving that x̄ is the point of minimizer of f over C.


(ii) As x̄ is a point of minimizer of f over C, we have that x̄ also solves the
problem

min (f + δC )(x).
x∈Rn

Hence, by the optimality conditions for the unconstrained optimization prob-


lem, Theorem 2.89,

0 ∈ ∂(f + δC )(x̄).

Because x̄ ∈ dom δC , by Proposition 2.14, ri dom δC = ri C is nonempty.


Also ri dom f = Rn and hence ri dom f ∩ ri dom δC 6= ∅. Now using the Sum
Rule, Theorem 2.91,

0 ∈ ∂f (x̄) + ∂δC (x̄),

which by the fact that ∂δC (x) = NC (x) leads to

0 ∈ ∂f (x̄) + NC (x̄).


Conversely, suppose that condition (ii) is satisfied, which means that there
exists ξ ∈ ∂f (x̄) such that −ξ ∈ NC (x̄), that is,
hξ, x − x̄i ≥ 0, ∀ x ∈ C.
Therefore, the convexity of f along with the above inequality yields
f (x) ≥ f (x̄), ∀ x ∈ C,
thereby leading to the desired result. 
By condition (ii) of the above theorem, there exists ξ ∈ ∂f (x̄) such that
h−ξ, xi ≤ h−ξ, x̄i, ∀ x ∈ C.
As x̄ ∈ C, the above condition yields that the support function to the set C
at −ξ is given by
σC (−ξ) = −hξ, x̄i.
Thus, condition (ii) is equivalent to the above condition.
Again, by condition (ii) of Theorem 3.1, there exists ξ ∈ ∂f (x̄) such that
−ξ ∈ NC (x̄), which can be equivalently expressed as
h(x̄ − αξ) − x̄, x − x̄i ≤ 0, ∀ x ∈ C, ∀ α ≥ 0.
Therefore, by Proposition 2.52, condition (ii) is equivalent to
x̄ = projC (x̄ − αξ), ∀ α ≥ 0.
We state the above discussion as the following result.
Theorem 3.2 Consider the convex optimization problem (CP ). Then x̄ is a
point of minimizer of (CP ) if and only if there exists ξ ∈ ∂f (x̄) such that
either σC (−ξ) = −hξ, x̄i or x̄ = projC (x̄ − αξ), ∀ α ≥ 0.
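For instance, let f (x) = x² and C = [1, +∞) ⊂ R, so that x̄ = 1 is the point of minimizer of (CP ). Here ∂f (1) = {2}, TC (1) = [0, +∞) and NC (1) = (−∞, 0], and both conditions of Theorem 3.1 are easily verified: f ′ (1, d) = 2d ≥ 0 for every d ∈ TC (1) and 0 = 2 + (−2) ∈ ∂f (1) + NC (1). Moreover, with ξ = 2, σC (−ξ) = supx≥1 (−2x) = −2 = −hξ, x̄i and projC (1 − 2α) = 1 for every α ≥ 0, in agreement with Theorem 3.2.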

3.2 Slater Constraint Qualification


Now consider the case where C is represented only through convex inequality
constraints. Observe that the equality affine constraints of the form
hj (x) = 0, j = 1, 2, . . . , l,
n
where hj : R → R, j = 1, 2, . . . , l, are affine functions can also be expressed
in the convex inequality form as
hj (x) ≤ 0, j = 1, 2, . . . , l,
−hj (x) ≤ 0, j = 1, 2, . . . , l.


Thus, we define

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}, (3.1)

where gi : Rn → R, i = 1, 2, . . . , m, are convex functions. In practice, this


is most often the case. In order to write an explicit optimality condition we
need to compute NC (x̄) and express it in terms of the constraint functions
gi , i = 1, 2, . . . , m. So how do we do that? In this respect, we present the
following result.

Proposition 3.3 Consider the set C as in (3.1). Assume that the active index
set at x̄, that is,

I(x̄) = {i ∈ {1, 2, . . . , m} : gi (x̄) = 0}

is nonempty. Let the Slater constraint qualification hold, that is, there exists
x̂ ∈ Rn such that gi (x̂) < 0, for i = 1, 2, . . . , m. Then
 
 X 
NC (x̄) = λi ξi ∈ Rn : ξi ∈ ∂gi (x̄), λi ≥ 0, i ∈ I(x̄) .
 
i∈I(x̄)

In order to prove the above proposition, we need to do a bit of work, which


we will do step by step. Denote the set on the right-hand side of the equality
by
 
 X 
b
S(x̄) = λi ξi ∈ Rn : ξi ∈ ∂gi (x̄), λi ≥ 0, i ∈ I(x̄) . (3.2)
 
i∈I(x̄)

One might get curious as to what are these λi , i = 1, 2, . . . , m, in the expres-


b
sion of the elements of S(x̄). These are the Lagrange multipliers, which are vital in
optimization and which we shall discuss in more detail. In order to establish
b
Proposition 3.3, that is, NC (x̄) = S(x̄), we will prove that S(x̄)b is a closed
convex cone for which we need the following lemma whose proof is as given in
van Tiel [110].

Proposition 3.4 Consider a nonempty compact set A ⊂ Rn with 0 6∈ A. Let


K be the cone generated by A, that is,

K = coneA = {λa ∈ Rn : λ ≥ 0, a ∈ A}.

Then K is a closed set.

Proof. Consider a sequence {xk } ⊂ K such that xk → x̃. To prove the result,
we need to show that x̃ ∈ K. As xk ∈ K, there exist λk ≥ 0 and ak ∈ A
such that xk = λk ak for every k. Because A is compact, {ak } is a bounded
sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, {ak } has a


convergent subsequence. Thus, without loss of generality, let ak → ã and as


A is closed, ã ∈ A. Because 0 6∈ A, it is simple to observe that there exists
α > 0 such that kak ≥ α for every a ∈ A. Hence,

|λk | = |λk | kak k / kak k ≤ (1/α) kλk ak k.

As λk ak → x̃, kλk ak k is bounded, thereby implying that {λk } is a bounded


sequence, that by the Bolzano–Weierstrass Theorem, Proposition 1.3, has a
convergent subsequence. Without loss of generality, assume that λk → λ̃. This
shows that

xk = λk ak → λ̃ã as k → +∞.

By the assumption xk → x̃ and as the limit is unique, λ̃ã = x̃. Hence x̃ ∈ K,


thereby establishing the result. 
b
We will now show that the set S(x̄) is a closed convex cone. This fact will
play a major role in the proof of Proposition 3.3.

Lemma 3.5 Assume that I(x̄) is nonempty and the Slater constraint quali-
b
fication holds. Then the set S(x̄) given by (3.2) is a closed convex cone.

Proof. Observe that S(x̄) b b


is a cone. To prove the convexity of S(x̄), let
b P j j j j
v1 , v2 (6= 0) ∈ S(x̄). Then vj = i∈I(x̄) λi ξi where λi ≥ 0 and ξi ∈ ∂gi (x̄),
b
i ∈ I(x̄) for j = 1, 2. As S(x̄) is a cone, to show that it is convex, by Theo-
b
rem 2.20 we just have to show that v1 + v2 ∈ S(x̄). Consider
X 
v1 + v2 = λ1i ξi1 + λ2i ξi2
i∈I(x̄)
X  
λ1i λ2i
= (λ1i + λ2i ) ξ 1
+ ξ 2
.
λ1i + λ2i i λ1i + λ2i i
i∈I(x̄)

Because ∂gi (x̄) is a convex set,

λ1i λ2
ξi1 + 1 i 2 ξi2 ∈ ∂gi (x̄).
λ1i + λi2 λi + λi

b
Hence, v1 + v2 ∈ S(x̄).
b
Finally, we have to show that S(x̄) is closed. Consider the function

g(x) = max{g1 (x), g2 (x), . . . , gm (x)}.

Moreover, as I(x̄) is nonempty, g(x̄) = 0 with J(x̄) = I(x̄), where

J(x̄) = {i ∈ {1, 2, . . . , m} : gi (x̄) = g(x̄)}.


Further, from the Max-Function Rule, Theorem 2.96,


 
[
∂g(x̄) = co  ∂gi (x̄) . (3.3)
i∈I(x̄)

b
We claim that S(x̄) = cone(∂g(x̄)), that is,

b
S(x̄) = {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(x̄)}. (3.4)

b
But before showing that S(x̄) is given as above and applying Proposition 3.4
b
to conclude that S(x̄) is closed, we first need to show that 0 6∈ ∂g(x̄).
As the Slater constraint qualification holds, there exists x̂ such that
gi (x̂) < 0 for every i = 1, 2, . . . , m. Hence g(x̂) < 0. By the convexity of
g,

hξ, x̂ − x̄i ≤ g(x̂) − g(x̄), ∀ ξ ∈ ∂g(x̄).

Because J(x̄) = I(x̄) is nonempty, for every ξ ∈ ∂g(x̄),

hξ, x̂ − x̄i < 0.

As x̂ 6= x̄, it is clear that 0 6∈ ∂g(x̄). Otherwise, if 0 ∈ ∂g(x̄), the above inequal-


ity will be violated. Hence, observe that 0 6∈ ∂gi (x̄) for every i ∈ J(x̄) = I(x̄).
Because S(x̄)b b
is a cone, 0 ∈ S(x̄). For λ = 0,

0 ∈ {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(x̄)}.

b
Consider v ∈ S(x̄) with v 6= 0. We will show that

v ∈ {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(x̄)}.

b
As vP∈ S(x̄), there exist λi ≥ 0 and ξi ∈ ∂gi (x̄), i ∈ I(x̄) such that
v = i∈I(x̄) λi ξi . Because v 6= 0 and 0 6∈ ∂gi (x̄) for all i ∈ I(x̄), it is clear that
P
all the λi , i ∈ I(x̄) cannot be simultaneously zero and hence i∈I(x̄) λi > 0.
P P
Let α = i∈I(x̄) λi and thus i∈I(x̄) λi /α = 1. Therefore,
 
1 X λi [
v= ξi ∈ co  ∂gi (x̄) ,
α α
i∈I(x̄) i∈I(x̄)

which by (3.3) implies that v ∈ α ∂g(x̄). Hence,

b
S(x̄) ⊆ {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(x̄)}.

Conversely, consider v ∈ {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(x̄)} with v 6= 0.


Therefore, v = λξ for some λ ≥ 0, ξ ∈ ∂g(x̄). The condition (3.3) yields that


there exist µi ≥ 0 and ξi ∈ ∂gi (x̄) for i ∈ I(x̄) such that
X
ξ= µi ξi
i∈I(x̄)
P
with i∈I(x̄) µi = 1. Therefore,
X X
v= λµi ξi = λ′i ξi ,
i∈I(x̄) i∈I(x̄)

b
where λ′i = λµi ≥ 0 for i ∈ I(x̄). Hence, v ∈ S(x̄). Because v was arbitrary,
(3.4) holds, which along with the fact that 0 6∈ ∂g(x̄) and Proposition 3.4
b
yields that S(x̄) is closed. 

Remark 3.6 It may be noted here that S(x̄) b is proved to be closed under
the Slater constraint qualification, which is equivalent to
[
0 6∈ co ∂gi (x̄).
i∈I(x̄)

This observation was made by Wolkowicz [112]. In the absence of such condi-
b
tions, S(x̄) need not be closed.

Now we turn to establish Proposition 3.3 according to which, if the Slater


constraint qualification holds, then
b
NC (x̄) = S(x̄).

b
Proof of Proposition 3.3. First we will prove that S(x̄) ⊆ NC (x̄). Consider
b
v ∈ S(x̄). Thus, there exist λi ≥ 0 and ξi ∈ ∂gi (x̄) for i ∈ I(x̄) such that
any P
v = i∈I(x̄) λi ξi . Hence, for any x ∈ C,
X
hv, x − x̄i = λi hξi , x − x̄i.
i∈I(x̄)

By the convexity of gi , i ∈ I(x̄),

hξi , x − x̄i ≤ gi (x) − gi (x̄) ≤ 0, ∀ x ∈ C.

Thus hv, x − x̄i ≤ 0 for every x ∈ C, thereby showing that v ∈ NC (x̄).


b
Conversely, suppose that v ∈ NC (x̄). We have to show that v ∈ S(x̄). On
b b
the contrary, assume that v 6∈ S(x̄). As S(x̄) is a closed convex cone, by the
strict separation theorem, Theorem 2.26 (iii), there exists w ∈ Rn with w 6= 0
such that
b
hw, ξi ≤ 0 < hw, vi, ∀ ξ ∈ S(x̄).

S 
b
As S(x̄) = cone co i∈I(x̄) ∂gi (x̄) , for each i ∈ I(x̄), hw, ξi i ≤ 0 for every
ξi ∈ ∂gi (x̄), which along with Theorem 2.79 yields

gi′ (x̄, w) ≤ 0, ∀ i ∈ I(x̄). (3.5)

Define

K = {u ∈ Rn : gi′ (x̄, u) < 0, ∀ i ∈ I(x̄)}.

Our first step is to show that K is nonempty. By the Slater constraint qual-
ification, there exists x̂ such that gi (x̂) < 0 for every i = 1, 2, . . . , m, and
corresponding to that x̂, set u = x̂ − x̄. By the convexity of each gi and
Theorem 2.79,

gi′ (x̄, x̂ − x̄) ≤ gi (x̂) − gi (x̄), ∀ i ∈ I(x̄),

which implies

gi′ (x̄, x̂ − x̄) < 0, ∀ i ∈ I(x̄).

Hence, x̂ − x̄ ∈ K, thereby showing that K is nonempty. Observe that for any


u ∈ K, there exists λ > 0 sufficiently small such that gi (x̄ + λu) < 0 for all
i = 1, 2, . . . , m, which implies x̄ + λu ∈ C. Therefore,
1
u∈ (C − x̄) ⊆ cone(C − x̄) ⊆ cl cone(C − x̄).
λ
By Theorem 2.35, u ∈ TC (x̄). Because TC (x̄) is closed, cl K ⊆ TC (x̄). Also,
as K is nonempty, it is simple to show that

cl K = {u ∈ Rn : gi′ (x̄, u) ≤ 0, ∀ i ∈ I(x̄)}.

By (3.5), w ∈ cl K and hence, w ∈ TC (x̄). As v ∈ NC (x̄), hv, wi ≤ 0, thereby


contradicting the fact that hv, wi > 0 and thus establishing the result. 
Recall condition (ii) from Theorem 3.1, that is,

0 ∈ ∂f (x̄) + NC (x̄).

By combining it with Proposition 3.3, we can conclude that under the Slater
constraint qualification, x̄ is a point of minimizer of the convex programming
problem (CP ) with C given by (3.1) if and only if there exists λ̄ ∈ Rm + such
that
X
0 ∈ ∂f (x̄) + λ̄i ∂gi (x̄).
i∈I(x̄)

Setting λ̄i = 0 for i 6∈ I(x̄), the above expression can be rewritten as


m
X
0 ∈ ∂f (x̄) + λ̄i ∂gi (x̄) and λ̄i gi (x̄) = 0, i = 1, 2, . . . , m.
i=1


The above two expressions form the celebrated Karush–Kuhn–Tucker (KKT)


optimality conditions for the convex programming problem (CP ) with C given
by (3.1). The vector λ̄ ∈ Rm+ is called a Lagrange multiplier or a Karush–Kuhn–
Tucker (KKT) multiplier. The second condition is known as the complemen-
tary slackness condition.
Now suppose that the KKT optimality conditions are satisfied. Then there
exist ξ0 ∈ ∂f (x̄) and ξi ∈ ∂gi (x̄), i = 1, 2, . . . , m, such that
0 = ξ0 + Σ_{i=1}^m λi ξi.    (3.6)

Therefore, by the convexity of f and gi, i = 1, 2, . . . , m, for every x ∈ Rn,

f(x) − f(x̄) ≥ ⟨ξ0, x − x̄⟩,
gi(x) − gi(x̄) ≥ ⟨ξi, x − x̄⟩, i = 1, 2, . . . , m.

The above inequalities along with (3.6) yield that for every x ∈ Rn,

f(x) − f(x̄) + Σ_{i=1}^m λi (gi(x) − gi(x̄)) ≥ ⟨ξ0, x − x̄⟩ + Σ_{i=1}^m λi ⟨ξi, x − x̄⟩ = 0.    (3.7)

The above inequality holds, in particular, for x ∈ C ⊂ Rn . Invoking the


complementary slackness condition along with the feasibility of x ∈ C, the
condition (3.7) reduces to

f (x) ≥ f (x̄), ∀ x ∈ C.

Thus, x̄ is a point of minimizer of (CP ).


This discussion can be summed up in the form of the following theorem.

Theorem 3.7 Consider the convex programming problem (CP ) with C given
by (3.1). Assume that the Slater constraint qualification holds. Then x̄ is a
point of minimizer of (CP ) if and only if there exist λ̄i ≥ 0, i = 1, 2, . . . , m, such that

0 ∈ ∂f(x̄) + Σ_{i=1}^m λ̄i ∂gi(x̄) and λ̄i gi(x̄) = 0, i = 1, 2, . . . , m.
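As a quick numerical illustration of Theorem 3.7 (the problem data below are chosen only for demonstration and are not part of the text), consider the smooth convex program min x1² + x2² subject to g(x) = 1 − x1 − x2 ≤ 0. The Slater condition holds (take x̂ = (1, 1)), the minimizer is x̄ = (1/2, 1/2) with multiplier λ̄ = 1, and the sketch below checks the KKT conditions with numpy.

    import numpy as np

    # Hypothetical example: min x1^2 + x2^2  s.t.  1 - x1 - x2 <= 0.
    # Known solution x_bar = (0.5, 0.5) with KKT multiplier lambda_bar = 1.
    x_bar = np.array([0.5, 0.5])
    lam_bar = 1.0

    grad_f = 2 * x_bar                   # gradient of the objective at x_bar
    grad_g = np.array([-1.0, -1.0])      # gradient of g(x) = 1 - x1 - x2
    g_val = 1.0 - x_bar.sum()            # constraint value at x_bar

    # KKT conditions of Theorem 3.7 in the smooth case:
    # 0 = grad_f + lambda_bar * grad_g  and  lambda_bar * g(x_bar) = 0.
    print(grad_f + lam_bar * grad_g)     # -> [0. 0.]
    print(lam_bar * g_val)               # -> 0.0
    print(lam_bar >= 0 and g_val <= 0)   # -> True (sign and feasibility)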

It is obvious that the computation of the normal cone in Proposition 3.3


plays a major role in the derivation of the KKT optimality conditions. What
is shown by the computation of the normal cone in Proposition 3.3 is that the
Lagrange multipliers are not just auxiliary multipliers that help us convert a constrained problem into an unconstrained one but are related to the geometry of the feasible set.


Remark 3.8 In Proposition 3.3, we have seen how to compute the normal
cone when the convex inequality constraints need not be smooth. Now if gi ,
i = 1, 2, . . . , m, are differentiable and the Slater constraint qualification holds,
then from Proposition 3.3
NC(x̄) = {v ∈ Rn : v = Σ_{i∈I(x̄)} λi ∇gi(x̄), λi ≥ 0, ∀ i ∈ I(x̄)}.    (3.8)

This can actually be computed easily. Note that v ∈ NC(x̄) if and only if x̄ is a point of minimizer of the problem

min −⟨v, x⟩ subject to gi(x) ≤ 0, i = 1, 2, . . . , m.

As the Slater condition holds, by Theorem 3.7 there exist λi ≥ 0, i = 1, 2, . . . , m, such that

−v + Σ_{i=1}^m λi ∇gi(x̄) = 0.

By the complementary slackness condition, λi = 0, i ∉ I(x̄); thus the above relation becomes

−v + Σ_{i∈I(x̄)} λi ∇gi(x̄) = 0.

Hence, any v ∈ NC (x̄) belongs to the set on the right-hand side. One can
simply check that any element on the right-hand side is also an element in the
normal cone. From (3.8), it is simple to see that NC (x̄) is a finitely generated
cone with {∇gi (x̄) : i ∈ I(x̄)} being the set of generators. Thus, NC (x̄) is
polyhedral when the gi , i = 1, 2, . . . , m, are differentiable and the Slater
constraint qualification holds.
Is the normal cone also polyhedral if the Slater constraint qualification holds but the constraint functions gi, i = 1, 2, . . . , m, are not differentiable? What is seen from Proposition 3.3 is that in the case of nondifferentiable constraints, NC(x̄) can be represented as

NC(x̄) = { Σ_{i∈I(x̄)} λi ξi ∈ Rn : λi ≥ 0, ξi ∈ ∂gi(x̄), i ∈ I(x̄) }
       = ∪_{ξi∈∂gi(x̄)} { Σ_{i∈I(x̄)} λi ξi ∈ Rn : λi ≥ 0, i ∈ I(x̄) },

that is, the union of a family of polyhedral cones.


We will now show by an example that even though NC (x̄) is a union of a
family of polyhedral cones, it itself need not be polyhedral. Consider the set
C ⊆ R3 given as
C = {x ∈ R3 : √(x1² + x2²) ≤ −x3, x3 ≤ 0}.


FIGURE 3.1: NC(x̄) is not polyhedral (the set C and the cone NC(x̄) at x̄ = (0, 0, 0) in R3).

It is clear that C is described by the constraints


√(x1² + x2²) + x3 ≤ 0,
x3 ≤ 0.

Each of these constraint functions is convex. It is simple to see that the Slater condition
holds. Just take the point x̂ = (0, 0, −1). It is also simple to see that the first
constraint is not differentiable at x̄ = (0, 0, 0). However, from the geometry,


Figure 3.1, it is simple to observe that


NC(x̄) = {v ∈ R3 : √(v1² + v2²) ≤ v3, v3 ≥ 0}.

It is easy to observe that this cone, which is also known as the second-order
cone, is not polyhedral as it has an infinite number of generators and hence is
not finitely generated.
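A small numerical check of this normal cone formula is sketched below (this is an illustration, not part of the text): it samples random feasible points x ∈ C and random vectors v with √(v1² + v2²) ≤ v3, and verifies that ⟨v, x − x̄⟩ ≤ 0 at x̄ = (0, 0, 0), as the definition of the normal cone requires.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_C(n):
        # points of C: sqrt(x1^2 + x2^2) <= -x3, x3 <= 0
        r = rng.uniform(0, 1, n); theta = rng.uniform(0, 2*np.pi, n)
        x1, x2 = r*np.cos(theta), r*np.sin(theta)
        x3 = -np.sqrt(x1**2 + x2**2) - rng.uniform(0, 1, n)
        return np.stack([x1, x2, x3], axis=1)

    def sample_cone(n):
        # vectors v with sqrt(v1^2 + v2^2) <= v3 (the claimed normal cone)
        r = rng.uniform(0, 1, n); theta = rng.uniform(0, 2*np.pi, n)
        v1, v2 = r*np.cos(theta), r*np.sin(theta)
        v3 = np.sqrt(v1**2 + v2**2) + rng.uniform(0, 1, n)
        return np.stack([v1, v2, v3], axis=1)

    X, V = sample_C(1000), sample_cone(1000)
    # <v, x - x_bar> with x_bar = 0: every pairwise inner product should be <= 0
    print(np.max(X @ V.T) <= 1e-12)   # -> True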

3.3 Abadie Constraint Qualification


From the previous section it is obvious that to derive the KKT conditions an
important feature is that the Slater constraint qualification is satisfied. But
what happens if the Slater constraint qualification is not satisfied? Is there
any other route to derive the KKT conditions? In this direction, we introduce
what is known as the Abadie constraint qualification. Consider the problem
(CP ) with C given by (3.1), that is

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}

where gi , i = 1, 2, . . . , m, are convex functions. Then the Abadie constraint


qualification is said to hold at x̄ ∈ C if

TC (x̄) = {v ∈ Rn : gi′ (x̄, v) ≤ 0, ∀ i ∈ I(x̄)}.



As C is convex, (TC(x̄))° = NC(x̄) and the expression (ii) in Theorem 3.1 can be written as

0 ∈ ∂f(x̄) + (TC(x̄))°.

If the Abadie constraint qualification holds, we can compute NC(x̄) as

NC(x̄) = (S(x̄))°,

where (S(x̄))° denotes the polar cone of the cone

S(x̄) = {v ∈ Rn : gi′(x̄, v) ≤ 0, ∀ i ∈ I(x̄)}.

It can be easily verified that S(x̄) is a closed convex cone. Also observe that

TC (x̄) ⊂ {v ∈ Rn : gi′ (x̄, v) ≤ 0, ∀ i ∈ I(x̄)}

is always satisfied. So one may simply consider the reverse inclusion as the Abadie constraint qualification. We will now compute (S(x̄))°. But before we do that, let us convince ourselves through an example that the Abadie


constraint qualification can hold even if the Slater constraint qualification


fails. Consider

C = {x ∈ R : |x| ≤ 0, x ≤ 0}.

Here, g1 (x) = |x|, g2 (x) = x and of course C = {0}. Let us set x̄ = 0. This
shows that TC (x̄) = {0}. Further, because both constraints are active at x̄,

S(x̄) = {v ∈ R : g1′(x̄, v) ≤ 0, g2′(x̄, v) ≤ 0}
      = {v ∈ R : g1′(x̄, v) ≤ 0, ⟨∇g2(x̄), v⟩ ≤ 0}
      = {v ∈ R : |v| ≤ 0, v ≤ 0}
      = {0}.

Hence TC (x̄) = S(x̄), showing that the Abadie constraint qualification holds
while it is clear that the Slater constraint qualification does not hold.
Now we present the following result.
Proposition 3.9 (S(x̄))° = cl Ŝ(x̄).

Proof. From the relation (3.2),

Ŝ(x̄) = { Σ_{i∈I(x̄)} λi ξi : λi ≥ 0, ξi ∈ ∂gi(x̄), i ∈ I(x̄) }

is a convex cone from Lemma 3.5. Recall from the proof of Lemma 3.5 that Ŝ(x̄) was shown to be closed under the Slater constraint qualification. In the absence of the Slater constraint qualification, Ŝ(x̄) need not be closed. First we show that cl Ŝ(x̄) ⊆ (S(x̄))°. Consider any v ∈ Ŝ(x̄), which implies there exist λi ≥ 0 and ξi ∈ ∂gi(x̄) for i ∈ I(x̄) such that v = Σ_{i∈I(x̄)} λi ξi. Consider any element w ∈ S(x̄), that is, gi′(x̄, w) ≤ 0 for i ∈ I(x̄). Hence for every i ∈ I(x̄), by Theorem 2.79, ⟨ξi, w⟩ ≤ 0 for every ξi ∈ ∂gi(x̄), which implies

⟨ Σ_{i∈I(x̄)} λi ξi, w ⟩ ≤ 0,

that is, ⟨v, w⟩ ≤ 0. Because w ∈ S(x̄) was arbitrarily chosen, Ŝ(x̄) ⊆ (S(x̄))°, which by closedness of (S(x̄))° leads to cl Ŝ(x̄) ⊆ (S(x̄))°.
To complete the proof, we will establish the reverse inclusion, that is, (S(x̄))° ⊆ cl Ŝ(x̄). On the contrary, assume that (S(x̄))° ⊈ cl Ŝ(x̄), which implies there exists w ∈ (S(x̄))° with w ∉ cl Ŝ(x̄). As cl Ŝ(x̄) is a closed convex cone, by the strict separation theorem, Theorem 2.26 (iii), there exists v ∈ Rn with v ≠ 0 such that

sup_{ξ∈cl Ŝ(x̄)} ⟨v, ξ⟩ < ⟨v, w⟩.

Because 0 ∈ cl Ŝ(x̄), ⟨v, w⟩ > 0. We claim that v ∈ (cl Ŝ(x̄))°, that is, ⟨v, ξ⟩ ≤ 0 for every ξ ∈ cl Ŝ(x̄). If v ∉ (cl Ŝ(x̄))°, then there exists ξ̃ ∈ cl Ŝ(x̄) such that ⟨v, ξ̃⟩ > 0. For every λ > 0, λ⟨v, ξ̃⟩ = ⟨v, λξ̃⟩ > 0. Because cl Ŝ(x̄) is a cone, λξ̃ ∈ cl Ŝ(x̄) for λ > 0, which means that as λ becomes sufficiently large, the inequality

⟨v, λξ̃⟩ < ⟨v, w⟩

will be violated. Thus, v ∈ (cl Ŝ(x̄))°. Further, observe that for i ∈ I(x̄), ξi ∈ Ŝ(x̄), where ξi ∈ ∂gi(x̄). Therefore, ⟨v, ξi⟩ ≤ 0 for every ξi ∈ ∂gi(x̄), i ∈ I(x̄), which implies that gi′(x̄, v) ≤ 0 for every i ∈ I(x̄). This shows that v ∈ S(x̄) and therefore, ⟨v, w⟩ ≤ 0 because w ∈ (S(x̄))°. This leads to a contradiction, thereby establishing the result. □
The result below presents the KKT optimality conditions under the Abadie
constraint qualification.

Theorem 3.10 Consider the convex programming problem (CP ) with C


given by (3.1). Let x̄ be a point of minimizer of (CP ) and assume that the
Abadie constraint qualification holds at x̄. Then
0 ∈ ∂f(x̄) + cl Ŝ(x̄).    (3.9)

Conversely, if (3.9) holds for some x̄ ∈ Rn , then x̄ is a point of minimizer of


(CP ). Moreover, the standard KKT optimality conditions hold at x̄ if either
Ŝ(x̄) is closed or the functions gi, i ∈ I(x̄), are smooth functions.

Proof. If the Abadie constraint qualification holds at x̄, then using Proposi-
tion 3.9, the relation (3.9) holds.
Conversely, suppose that (3.9) holds at x̄. By the convexity of gi , i ∈ I(x̄),
for every ξi ∈ ∂gi (x̄),

⟨ξi, x − x̄⟩ ≤ gi(x) − gi(x̄) ≤ 0, ∀ x ∈ C.

For every v ∈ Ŝ(x̄), there exist λi ≥ 0 and ξi ∈ ∂gi(x̄) for i ∈ I(x̄) such that v = Σ_{i∈I(x̄)} λi ξi. Therefore, by the above inequality,

⟨v, x − x̄⟩ = Σ_{i∈I(x̄)} ⟨λi ξi, x − x̄⟩ ≤ 0, ∀ x ∈ C,

which implies v ∈ NC(x̄). Thus Ŝ(x̄) ⊆ NC(x̄), which along with the fact that NC(x̄) is closed implies that cl Ŝ(x̄) ⊆ NC(x̄). Therefore, (3.9) yields

0 ∈ ∂f (x̄) + NC (x̄)

and hence, by Theorem 3.1 (ii), x̄ is a point of minimizer of the convex pro-
gramming problem (CP ).


If gi, i ∈ I(x̄), are smooth,

Ŝ(x̄) = { Σ_{i∈I(x̄)} λi ∇gi(x̄) ∈ Rn : λi ≥ 0, i ∈ I(x̄) }.

Thus, Ŝ(x̄) is a finitely generated cone and hence is closed. Therefore, it is clear that when either Ŝ(x̄) is closed or gi, i ∈ I(x̄), are smooth functions, then under the Abadie constraint qualification, the standard KKT conditions are satisfied. □

3.4 Convex Problems with Abstract Constraints


After studying the convex programming problem involving only inequality
constraints, in this section we turn our attention to a slightly modified version
of (CP ), which we denote as (CP 1) given as
min f (x)
subject to gi (x) ≤ 0, i = 1, 2, . . . , m, (CP 1)
x ∈ X,
where we have the additional abstract constraint x ∈ X with X as a closed
convex subset of Rn . The question is how to write down the KKT conditions
for the problem (CP 1).

Theorem 3.11 Let us consider the problem (CP 1). Assume the Slater-type
constraint qualification, that is, there exists x̂ ∈ ri X such that gi (x̂) < 0 for
i = 1, 2, . . . , m. Then the KKT optimality conditions are necessary as well as
sufficient at a point of minimizer x̄ of (CP 1) and are given as
0 ∈ ∂f(x̄) + Σ_{i=1}^m λi ∂gi(x̄) + NX(x̄) and λi gi(x̄) = 0, i = 1, 2, . . . , m.

Proof. The problem (CP 1) can be written as

min f (x) subject to x ∈ C ∩ X,

where C is given by (3.1). Thus if x̄ is a point of minimizer of (CP 1), then x̄


solves the unconstrained problem

min_{x∈Rn} (f + δ_{C∩X})(x),

that is, x̄ solves

min_{x∈Rn} (f + δC + δX)(x).


By the optimality condition for unconstrained problem, Theorem 2.89,

0 ∈ ∂(f + δC + δX )(x̄).

The fact that ri dom f = Rn along with the Slater-type constraint qualifica-
tion and Propositions 2.15 and 2.67 imply that x̂ ∈ ri dom f ∩ ri C ∩ ri X. In-
voking the Sum Rule, Theorem 2.91, along with the facts that ∂δC (x̄) = NC (x̄)
and ∂δX (x̄) = NX (x̄), the above relation leads to

0 ∈ ∂f (x̄) + NC (x̄) + NX (x̄).

The Slater-type constraint qualification implies the Slater constraint qualifi-


cation which along with Proposition 3.3 yields
NC(x̄) = { Σ_{i∈I(x̄)} λi ∂gi(x̄) : λi ≥ 0, i ∈ I(x̄) }.

By choosing λi = 0, i ∉ I(x̄), the desired KKT optimality conditions are


obtained.
Conversely, by the optimality condition, there exist ξ0 ∈ ∂f (x̄) and
ξi ∈ ∂gi (x̄), i = 1, 2 . . . , m, such that
−ξ0 − Σ_{i=1}^m λi ξi ∈ NX(x̄),

that is,

⟨ξ0, x − x̄⟩ + Σ_{i=1}^m λi ⟨ξi, x − x̄⟩ ≥ 0, ∀ x ∈ X.

The convexity of f and gi , i = 1, 2, . . . , m, along with the above condition


leads to
f(x) − f(x̄) + Σ_{i=1}^m λi gi(x) − Σ_{i=1}^m λi gi(x̄) ≥ 0, ∀ x ∈ X.

In particular, for any x ∈ C ∩ X, the above inequality along with the complementary slackness condition reduces to

f(x) ≥ f(x̄), ∀ x ∈ C ∩ X,

thereby establishing that x̄ is a point of minimizer of (CP 1). □


Next consider the problem
min f (x)
subject to x ∈ C = {x ∈ Rn : Ax = b}, (CP 2)
where A is an m × n matrix and b ∈ Rm. It is clear that C is a polyhedron.
Further, a point x̄ ∈ C is a point of minimizer of f over C if and only if

0 ∈ ∂f (x̄) + NC (x̄).


If v ∈ NC (x̄), then x̄ solves the following smooth problem

min −⟨v, x⟩
subject to Ax = b.

As the constraints are affine, the KKT optimality conditions for this problem automatically hold, that is, there exists λ ∈ Rm such that

−v + Aᵀλ = 0,

that is, v = Aᵀλ. Therefore,

NC(x̄) = {v ∈ Rn : v = Aᵀλ, λ ∈ Rm}.

Hence, the optimality condition is that there exists λ ∈ Rm such that

−Aᵀλ ∈ ∂f(x̄).

Using the convexity of f , the above relation implies that x̄ is a point of mini-
mizer of (CP 2). This discussion can be stated as the following theorem.

Theorem 3.12 Consider the problem (CP 2). Then x̄ is a point of minimizer
of (CP 2) if and only if there exists λ ∈ Rm such that

−Aᵀλ ∈ ∂f(x̄).
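To see Theorem 3.12 in action numerically (the data below are illustrative and not from the text), take the smooth case f(x) = ½‖x‖², for which ∂f(x̄) = {x̄}. The minimizer of f over {x : Ax = b} is x̄ = Aᵀ(AAᵀ)⁻¹b, and λ = −(AAᵀ)⁻¹b satisfies −Aᵀλ = x̄ ∈ ∂f(x̄), which the sketch below verifies.

    import numpy as np

    # Hypothetical data for (CP2): min 0.5*||x||^2  s.t.  A x = b.
    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 1.0]])
    b = np.array([1.0, 2.0])

    x_bar = A.T @ np.linalg.solve(A @ A.T, b)   # minimum-norm solution
    lam = -np.linalg.solve(A @ A.T, b)          # multiplier from Theorem 3.12

    print(np.allclose(A @ x_bar, b))            # feasibility: True
    print(np.allclose(-A.T @ lam, x_bar))       # -A^T lam lies in ∂f(x_bar) = {x_bar}: True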

3.5 Max-Function Approach


Until now the convex programming problems were tackled without modifying the constraint sets. But every convex programming problem can be expressed as a nonsmooth convex programming problem with fewer constraints. Consider the problem (CP ) where C is given by (3.1), that is, convex inequality constraints. Assume that the objective function f and the constraint functions gi, i = 1, 2, . . . , m, are convex and smooth. Then (CP ) can be equivalently posed as a problem with only one constraint, which is given as

min f(x) subject to g(x) ≤ 0,    (CPeq)

where g : Rn → R is defined as

g(x) = max{g1 (x), g2 (x), . . . , gm (x)}.

Hence g is intrinsically nonsmooth. We would like to invite the reader to


deduce the optimality condition of the problem (CP ) using (CPeq ). It is clear
that one needs to use the Max-Function Rule, Theorem 2.96, for evaluating
the subdifferential of the max-function. Thus, at a very fundamental level,


every convex programming problem (smooth or nonsmooth) is a nonsmooth


convex programming problem.
The Max-Function Rule is also in some sense very fundamental to convex
programming problems as can be seen in the result below. In the following
result, we derive the KKT optimality conditions for the convex programming
problem (CP ) with C given by (3.1) using the max-function approach.

Theorem 3.13 Consider the convex programming problem (CP ) with C


given by (3.1). Assume that the Slater constraint qualification holds. Then x̄ is
a point of minimizer of (CP ) if and only if there exist λ̄i ≥ 0, i = 1, 2, . . . , m, such that

0 ∈ ∂f(x̄) + Σ_{i=1}^m λ̄i ∂gi(x̄) and λ̄i gi(x̄) = 0, i = 1, 2, . . . , m.

Proof. As x̄ is a point of minimizer of (CP ), it also solves the unconstrained


problem

min F (x) subject to x ∈ Rn ,

where F (x) = max{f (x) − f (x̄), g1 (x), g2 (x), . . . , gm (x)}. Then by the uncon-
strained optimality condition, Theorem 2.89,

0 ∈ ∂F (x̄).

Applying the Max-Function Rule, Theorem 2.96,


0 ∈ co {∂f(x̄) ∪ ( ∪_{i∈I(x̄)} ∂gi(x̄) )},

where I(x̄) is the active index set at x̄. Therefore, there exist λi ≥ 0, i ∈ {0} ∪ I(x̄), satisfying Σ_{i∈{0}∪I(x̄)} λi = 1 such that

0 ∈ λ0 ∂f(x̄) + Σ_{i∈I(x̄)} λi ∂gi(x̄).    (3.10)

We claim that λ0 ≠ 0. On the contrary, assume that λ0 = 0. Thus, the above inclusion reduces to

0 ∈ Σ_{i∈I(x̄)} λi ∂gi(x̄),

that is, there exist ξi ∈ ∂gi(x̄), i ∈ I(x̄), such that

0 = Σ_{i∈I(x̄)} λi ξi.    (3.11)

© 2012 by Taylor & Francis Group, LLC


3.6 Cone-Constrained Convex Programming 161

By the convexity of gi , i ∈ I(x̄),

gi(x) = gi(x) − gi(x̄) ≥ ⟨ξi, x − x̄⟩, ∀ x ∈ Rn, i ∈ I(x̄),

which along with (3.11) implies that

Σ_{i∈I(x̄)} λi gi(x) ≥ 0, ∀ x ∈ Rn.

As the Slater constraint qualification holds, there exists x̂ ∈ Rn such that


gi (x̂) < 0, i = 1, 2, . . . , m. Thus,
Σ_{i∈I(x̄)} λi gi(x̂) < 0,

which is a contradiction of the preceding inequality. Therefore, λ0 ≠ 0 and hence dividing (3.10) throughout by λ0 yields

0 ∈ ∂f(x̄) + Σ_{i∈I(x̄)} λ̄i ∂gi(x̄),

where λ̄i = λi/λ0, i ∈ I(x̄). Taking λ̄i = 0, i ∉ I(x̄), the above condition becomes

0 ∈ ∂f(x̄) + Σ_{i=1}^m λ̄i ∂gi(x̄),

that is, the KKT optimality condition. It is easy to observe that

λ̄i gi (x̄) = 0, i = 1, 2, . . . , m,

thus yielding the desired conditions. The sufficiency part can be worked out
using the convexity of the functions, as done in the previous KKT optimality
theorems. □
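The max-function reformulation can also be explored numerically. In the sketch below (an illustration with made-up data, not from the text), g(x) = max{g1(x), g2(x)} with smooth convex g1, g2, and at a point where both pieces are active the Max-Function Rule gives ∂g(x) = co{∇g1(x), ∇g2(x)}; the code checks the subgradient inequality g(y) ≥ g(x) + ⟨ξ, y − x⟩ for random convex combinations ξ of the two gradients.

    import numpy as np

    rng = np.random.default_rng(1)

    g1 = lambda x: x[0]**2 + x[1] - 1.0      # smooth convex piece
    g2 = lambda x: -x[1]                     # smooth convex (affine) piece
    g  = lambda x: max(g1(x), g2(x))         # the max-function of (CPeq)

    x = np.array([1.0, 0.0])                 # here g1(x) = g2(x) = 0: both pieces active
    grad_g1 = np.array([2*x[0], 1.0])
    grad_g2 = np.array([0.0, -1.0])

    ok = True
    for _ in range(2000):
        t = rng.uniform()                    # xi = t*grad_g1 + (1-t)*grad_g2 lies in ∂g(x)
        xi = t*grad_g1 + (1-t)*grad_g2
        y = x + rng.normal(size=2)
        ok &= g(y) >= g(x) + xi @ (y - x) - 1e-10
    print(ok)                                # -> True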

3.6 Cone-Constrained Convex Programming


A convex optimization problem can be posed in a more general format. Con-
sider a nonempty closed convex cone S ⊂ Rm . Then consider the problem
min f (x) subject to G(x) ∈ −S, (CCP )
where f : Rn → R is a convex function and G : Rn → Rm is an S-convex function; that is, for any x, y ∈ Rn and λ ∈ [0, 1],

(1 − λ)G(x) + λG(y) − G((1 − λ)x + λy) ∈ S.


In particular, if S = Rm+, then the above problem reduces to (CP ) with C


given by (3.1). If S = Rs+ × {0}m−s , then (CCP ) reduces to a convex problem
with both inequality and equality constraints. If S is not these two cones, then
(CCP ) is called a cone-constrained problem. We will now derive optimality
conditions for a slightly more general problem that has an added abstract
constraint. Consider the problem
min f (x) subject to G(x) ∈ −S, x ∈ X, (CCP 1)
where X ⊂ Rn is a nonempty closed convex set. There are many ways to
approach this problem. Here we demonstrate one approach. Define

C = {x ∈ Rn : G(x) ∈ −S}.

As S and X are nonempty convex sets, by Proposition 2.14, ri S and ri X are


both nonempty. Assume that the Slater-type constraint qualification holds,
that is, there exists x̂ ∈ ri X such that G(x̂) ∈ −ri S. The most natural
approach is to observe that if x̄ solves (CCP 1), then x̄ is also a point of
minimizer of the unconstrained problem

min (f + δC∩X )(x) subject to x ∈ Rn .

This is equivalent to the problem

min (f + δC + δX )(x).

Since dom f = Rn, the Slater-type constraint qualification implies that x̂ ∈ ri dom f ∩ ri C ∩ ri X. Invoking the Sum Rule, Theorem 2.91,

0 ∈ ∂f(x̄) + ∂δC(x̄) + ∂δX(x̄),

and thus,

0 ∈ ∂f (x̄) + NC (x̄) + NX (x̄).

So our main concern now is to explicitly compute NC (x̄). How does one do
that? We have already observed that it is not so straightforward to compute
the normal cone when the inequality constraints are not smooth. Let us now
mention that in this case also we do not consider G to be differentiable. Thus
we shall now introduce the notion of a subdifferential of a cone convex function.
As G is an S-convex function, we call an m × n matrix A a subgradient of G at x ∈ Rn if

G(y) − G(x) − A(y − x) ∈ S, ∀ y ∈ Rn .

Then the subdifferential of the cone convex function G at x is given as

∂G(x) = {A ∈ Rm×n : G(y) − G(x) − A(y − x) ∈ S, ∀ y ∈ Rn }.

The important question is whether the set ∂G(x) is nonempty.


It was shown, for example, in Luc, Tan, and Tinh [78] that if G is an
S-convex function, then G is continuous on Rn . Further, G is also a locally
Lipschitz function on Rn ; that is, for any x0 ∈ Rn , there exists a neighborhood
N (x0 ) of x0 such that there exists Lx0 > 0 satisfying

‖G(y) − G(x)‖ ≤ Lx0 ‖y − x‖, ∀ x, y ∈ N(x0).

Observe that Lx0 depends on the chosen x0 and is also called the Lipschitz
constant at x0 . Also, note that a locally Lipschitz vector function need not
be differentiable everywhere. For a locally Lipschitz function G, the Clarke
Jacobian of G at x is given as follows,
 
∂C G(x) = co { A ∈ Rm×n : A = lim_{k→∞} JG(xk), where xk → x, xk ∈ D },

where D is the set of points in Rn at which G is differentiable and JG(y) denotes the Jacobian of G at y. In fact, there is a famous theorem of Rademacher that says that Rn\D is a set of Lebesgue measure zero. The set ∂C G(x) ≠ ∅ for all x ∈ Rn and is convex and compact. For more details on the Clarke Jacobian, see for example Clarke [27] or Demyanov and Rubinov [30]. The property that will be important to us is that the Clarke Jacobian, as a set-valued map, is locally bounded and graph closed.
It was shown for example in Luc, Tan, and Tinh [78] that ∂C G(x) ⊆ ∂G(x),
thereby proving that if G is an S-convex function, then ∂G(x) ≠ ∅ for every x ∈ Rn. Before we proceed to develop the optimality conditions for (CCP 1), let us look at a locally Lipschitz function φ : Rn → R.
Recall that a function φ : Rn → R is locally Lipschitz at x0 if there exists a neighborhood N(x0) of x0 and Lx0 > 0 such that

|φ(y) − φ(x)| ≤ Lx0 ‖y − x‖, ∀ x, y ∈ N(x0).

Naturally a locally Lipschitz scalar-valued function need not be differentiable everywhere, and the Rademacher Theorem tells us that the set of points where φ is not differentiable forms a set of measure zero. Therefore, at any x ∈ Rn, the Clarke generalized gradient or Clarke subdifferential is given as

∂°φ(x) = co {ξ ∈ Rn : ξ = lim_{k→∞} ∇φ(xk), where xk → x, xk ∈ D̃},

where D̃ denotes the set of points at which φ is differentiable. One can observe
that if m = 1, the Clarke Jacobian reduces to the Clarke subdifferential.
The Clarke subdifferential is nonempty, convex, and compact. If x̄ is a local
minimum of φ over Rn , then 0 ∈ ∂ ◦ φ(x̄). It is important to note that this
condition is necessary but not sufficient. Now we state a calculus rule that
will be useful in our computation of the normal cone. The Sum Rule is from
Clarke [27].


Consider two locally Lipschitz functions φ1 , φ2 : Rn → R. Then

∂ ◦ (φ1 + φ2 )(x) ⊆ ∂ ◦ φ1 (x) + ∂ ◦ φ2 (x).

If one of the functions is continuously differentiable, then equality


holds.

The Chain Rule that we state is from Demyanov and Rubinov [30] (see also
Dutta [36]).

Consider the function φ ◦ Φ where Φ : Rn → Rm and φ : Rm → R


are locally Lipschitz functions. Assume that φ is continuously dif-
ferentiable. Then

∂°(φ ◦ Φ)(x) = {zᵀ∇φ(Φ(x)) ∈ Rn : z ∈ ∂C Φ(x)}.
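As a small illustration of these notions (chosen here only for illustration), take φ(x) = |x| on R, which is differentiable everywhere except at 0 with ∇φ(x) = ±1; the definition above gives ∂°φ(0) = co{−1, 1} = [−1, 1]. Applying the stated Chain Rule with Φ = φ (so m = 1 and ∂C Φ(0) = [−1, 1]) and the smooth outer function t ↦ t² gives ∂°(t² ∘ Φ)(0) = {2Φ(0)z : z ∈ [−1, 1]} = {0}, in agreement with the classical derivative of x² at 0.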

Observe that v ∈ NC (x̄) (in the current context of (CCP 1)) if and only if
x̄ is a point of minimizer of the problem
min −⟨v, x⟩ subject to G(x) ∈ −S.    (NP)
For simplicity, assume that C = {x ∈ Rn : G(x) ∈ −S} is an n-dimensional convex set. The approach to derive the necessary and sufficient condition for optimality is due to Rockafellar [100] (see also Chapter 6 of Rockafellar and Wets [101]). As the above problem is a convex programming problem, x̄ is a global point of minimizer of (NP). Further, without loss of generality, we can assume it to be unique. Observe that if we define

f(x) = −⟨v, x⟩ and f̃(x) = −⟨v, x⟩ + ε‖x − x̄‖²,

then ∂f(x̄) = ∂f̃(x̄) = {−v} and x̄ is the unique minimizer of f̃ because it is a strictly convex function. Consider an n-dimensional convex compact set
Y ⊂ Rn such that x̄ ∈ int Y and C ∩ Y ≠ ∅ (Figure 3.2). It is simple to see
that x̄ is also the unique minimizer of the problem
min f (x) subject to G(x) ∈ −S, x ∈ Y. (N P 1)
Also observe that the normal cone NC (x̄) = NC∩Y (x̄). (We would urge the
readers to think why).
Our approach depends on the use of penalization, a method popular for
designing algorithms for constrained optimization. Consider the problem
min f̂(x, u) = f(x) subject to G(x) − u = 0, (x, u) ∈ Y × (−S).    (N̂P)
As x̄ is the unique point of minimizer of (N P 1), we deduce that
(x̄, ū) = (x̄, G(x̄)) is the unique point of minimizer of (N̂P). For a sequence of εk ↓ 0, consider the sequence of penalty approximations

min f̂k(x, u) = f(x) + (1/(2εk))‖G(x) − u‖²
subject to (x, u) ∈ Y × (−S).    (N̂P_k)


FIGURE 3.2: C ∩ Y.

Consider the following closed set

Sk = {(x, u) ∈ Y × (−S) : f̂k(x, u) ≤ f̂k(x̄, ū) = f(x̄)}.

Note that ū = G(x̄). Also, Sk is nonempty as (x̄, ū) ∈ Sk for each k. Because Sk is nonempty for each k, solving (N̂P_k) is the same as minimizing f̂k over Sk. Denote the minimum of f over the compact set Y by µ. For any (x, u) ∈ Sk,

f̂k(x, u) ≤ f̂k(x̄, ū) = f(x̄),

which implies
f(x) + (1/(2εk))‖G(x) − u‖² ≤ f(x̄),

where (x, u) ∈ Sk ⊂ Y × (−S). As f (x) ≥ µ for every x ∈ Y ,

µ + (1/(2εk))‖G(x) − u‖² ≤ f(x̄),

which leads to

‖G(x) − u‖ ≤ √(2εk(f(x̄) − µ)).

Thus for any given k,


Sk ⊆ {(x, u) ∈ Y × (−S) : ‖G(x) − u‖ ≤ √(2εk(f(x̄) − µ))}.    (3.12)


Also, for a fixed k,


‖u‖ ≤ ‖G(x)‖ + √(2εk(f(x̄) − µ)).

As G is an S-convex function, it is also locally Lipschitz and hence G(Y )


is a compact set. This shows that the right-hand side of (3.12) is bounded
for a fixed k. From this, along with the compactness of Y , we can deduce
that Sk is compact and thus f̂k achieves a minimum over Sk. Hence, (N̂P_k) has a point of minimizer that naturally need not be unique. Denote a point of minimizer of (N̂P_k) by (xk, uk), thus obtaining a bounded sequence {(xk, uk)} ⊂ Y × (−S), which satisfies

‖G(xk) − uk‖ ≤ √(2εk(f(x̄) − µ)),

and as (xk, uk) ∈ Sk,

f(xk) ≤ f̂k(xk, uk) ≤ f(x̄).

Because {(xk , uk )} is bounded, by the Bolzano–Weierstrass Theorem, Propo-


sition 1.3, it has a convergent subsequence. Without loss of generality, assume
that xk → x̃ and uk → ũ. Therefore, as k → ∞, εk → 0 and thus

‖G(x̃) − ũ‖ = 0 and f(x̃) ≤ f(x̄).

Hence, ũ = G(x̃) and thus (x̃, ũ) is also a minimizer of (N̂P). But as (x̄, ū) is the unique point of minimizer of (N̂P), we have x̃ = x̄ and ũ = ū.
Because (xk, uk) is a point of minimizer of (N̂P_k), it is a simple exercise to see that xk minimizes f̂k(x, uk) over Y and uk minimizes f̂k(xk, u) over −S. Hence,

0 ∈ ∂x°f̂k(xk, uk) + NY(xk),    (3.13)
0 ∈ ∂u°f̂k(xk, uk) + N−S(uk).    (3.14)

Now we analyze these conditions in more detail. Denote


yk = (1/εk)(G(xk) − uk).

From condition (3.14),

−∇u f̂k(xk, uk) ∈ N−S(uk).

One can easily compute ∇u f̂k(xk, uk) to see that yk = −∇u f̂k(xk, uk) and hence yk ∈ N−S(uk). Moreover, applying the Sum Rule and the Chain Rule for locally Lipschitz functions to (3.13), for each k,

0 ∈ −v + ∂C G(xk)ᵀ yk + NY(xk).    (3.15)


Suppose that {yk } is bounded and thus by the Bolzano–Weierstrass The-


orem has a convergent subsequence. Without loss of generality, suppose that
yk → ȳ. Noting that the normal cone is graph closed as a set-valued map and
∂C G is locally bounded, taking the limit as k → ∞ in (3.15) leads to

0 ∈ −v + ∂C G(x̄)ᵀȳ + NY(x̄).

But as x̄ ∈ int Y, by Example 2.38, NY(x̄) = {0}. As ∂C G(x̄) ⊂ ∂G(x̄), v ∈ ∂C G(x̄)ᵀȳ ⊂ ∂G(x̄)ᵀȳ. Thus, v = zᵀȳ for some z ∈ ∂G(x̄).
The important question is: can {yk} be unbounded? We show that if {yk} is unbounded, then the Slater constraint qualification, that is, the existence of x̂ ∈ Rn such that G(x̂) ∈ −ri S, is violated.
On the contrary, assume that {yk} is unbounded and thus ‖yk‖ → ∞ as k → ∞. Hence, noting that ∂C G(xk) ⊂ ∂G(xk), from (3.15) we have

0 ∈ (1/‖yk‖)(−v) + ∂G(xk)ᵀ(yk/‖yk‖) + (1/‖yk‖) NY(xk),

which implies
0 ∈ (1/‖yk‖)(−v) + ∂G(xk)ᵀwk + NY(xk),    (3.16)

where wk = yk/‖yk‖. Hence, {wk} is a bounded sequence and thus by the Bolzano–Weierstrass Theorem, Proposition 1.3, has a convergent subsequence. Without loss of generality, assume that wk → w̄ with ‖w̄‖ = 1. Hence from (3.16),

0 ∈ ∂G(x̄)ᵀw̄.    (3.17)
As yk ∈ N−S (uk ), we have wk ∈ N−S (uk ). Again using the fact that the
normal cone map has a closed graph, w̄ ∈ N−S (ū). Hence,

⟨w̄, z − G(x̄)⟩ ≤ 0, ∀ z ∈ −S.

Because S is a cone, 0 ∈ −S, thus

⟨w̄, −G(x̄)⟩ ≤ 0,

that is,

⟨w̄, G(x̄)⟩ ≥ 0.    (3.18)
Consider p ∈ −S. As S is a convex cone, by Theorem 2.20, G(x̄) + p ∈ −S.
Hence,

⟨w̄, G(x̄) + p − G(x̄)⟩ ≤ 0,

which implies ⟨w̄, p⟩ ≤ 0. Because p was arbitrary,

⟨w̄, p⟩ ≤ 0, ∀ p ∈ −S.


Thus, w̄ ∈ S⁺. Hence, ⟨w̄, G(x̄)⟩ ≤ 0, which together with (3.18) leads to ⟨w̄, G(x̄)⟩ = 0. For any y ∈ Rn,

G(y) − G(x̄) − A(y − x̄) ∈ S, ∀ A ∈ ∂G(x̄),

which implies

⟨w̄, G(y)⟩ − ⟨w̄, G(x̄)⟩ − ⟨w̄, A(y − x̄)⟩ ≥ 0, ∀ A ∈ ∂G(x̄).

From (3.17), if {yk} is unbounded, there exists z̄ ∈ ∂G(x̄) such that z̄ᵀw̄ = 0. Thus, from the above inequality we have

⟨w̄, G(y)⟩ − ⟨w̄, G(x̄)⟩ − ⟨z̄ᵀw̄, y − x̄⟩ ≥ 0, ∀ y ∈ Rn,

which along with ⟨w̄, G(x̄)⟩ = 0 and z̄ᵀw̄ = 0 yields

⟨w̄, G(y)⟩ ≥ 0, ∀ y ∈ Rn.

If the Slater constraint qualification holds, there exists x̂ ∈ Rn such that


G(x̂) ∈ −ri S. As ‖w̄‖ = 1 and w̄ ∈ S⁺, ⟨w̄, G(x̂)⟩ < 0, which contradicts the above inequality. Therefore, {yk} cannot be an unbounded sequence. Thus, we leave it to the reader to see that

v = zᵀȳ, where ȳ ∈ S⁺,

and hence conclude that

NC(x̄) = {v ∈ Rn : there exist ȳ ∈ S⁺ and z ∈ ∂G(x̄) satisfying ⟨ȳ, G(x̄)⟩ = 0 such that v = zᵀȳ}.

The reader is now urged to write the necessary and sufficient optimality con-
ditions for the problem (CCP 1), as the structure of the normal cone to C at
x̄ is now known.
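The penalization argument above can also be mimicked numerically. The following sketch is purely illustrative (it uses made-up data and scipy.optimize.minimize rather than the exact construction of the proof): it takes S = R+, so that (CCP) is an ordinary inequality-constrained problem, minimizes the reduced penalized objective f(x) + (1/(2εk)) max(0, g(x))² for decreasing εk (eliminating u gives exactly this form), and watches yk = max(0, g(xk))/εk approach a KKT multiplier. For min x1 + x2 subject to x1² + x2² − 1 ≤ 0 the limit multiplier is 1/√2 ≈ 0.707.

    import numpy as np
    from scipy.optimize import minimize

    # Illustrative instance with S = R_+:  min x1 + x2  s.t.  g(x) = x1^2 + x2^2 - 1 <= 0.
    f = lambda x: x[0] + x[1]
    g = lambda x: x[0]**2 + x[1]**2 - 1.0

    def penalized(x, eps):
        # eliminating u <= 0 in the penalty problem gives max(0, g(x))^2
        return f(x) + (1.0 / (2.0 * eps)) * max(0.0, g(x))**2

    x = np.array([0.0, 0.0])
    for eps in [1.0, 1e-1, 1e-2, 1e-3, 1e-4]:
        res = minimize(lambda z: penalized(z, eps), x, method="Nelder-Mead",
                       options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 20000})
        x = res.x
        y_k = max(0.0, g(x)) / eps       # multiplier estimate y_k = (G(x_k) - u_k)/eps_k
        print(eps, x, y_k)
    # x approaches (-1/sqrt(2), -1/sqrt(2)) and y_k approaches 1/sqrt(2) ≈ 0.707.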



Chapter 4
Saddle Points, Optimality, and Duality

4.1 Introduction
In the previous chapter, the KKT optimality conditions were studied using the
normal cone as one of the main vehicles of expressing the optimality condi-
tions. One of the central issues in the previous chapter was the computation
of the normal cone at the point of the feasible set C where the set C was ex-
plicitly described by the inequality constraints. In this chapter our approach
to the KKT optimality condition will take us deeper into convex optimization
theory and also we can avoid the explicit computation of the normal cone.
This approach uses the saddle point condition of the Lagrangian function as-
sociated with (CP ). We motivate the issue using two-person-zero-sum games.
Consider a two-person-zero-sum game where we denote the players as
Player 1 and Player 2 having strategy sets X ⊂ Rn and Λ ⊂ Rm , respec-
tively, which we assume to be compact for simplicity. In each move of the
game, the players reveal their choices simultaneously. For every choice x ∈ X
by Player 1 and λ ∈ Λ by Player 2, an amount L(x, λ) is paid by Player 1 to
Player 2. Now Player 1 behaves in the following way. For any given choice of
strategy x ∈ X, he would like to know what the maximum amount he would
have to give to Player 2. In effect, he computes the function
φ(x) = max_{λ∈Λ} L(x, λ).

Further, it is natural that he would choose an x ∈ X that minimizes φ(x),


that is, Player 1 solves the problem
min φ(x) subject to x ∈ X,
which implies that in effect, he solves a minimax problem
min_{x∈X} max_{λ∈Λ} L(x, λ).

Similarly, Player 2 would naturally want to know what the guaranteed amount
he will receive once he makes a move λ ∈ Λ. This means he computes the
function
ψ(λ) = min_{x∈X} L(x, λ).




Of course he would like to maximize the amount of money he gets and therefore
solves the problem

max ψ(λ) subject to λ ∈ Λ,

that is, he solves

max_{λ∈Λ} min_{x∈X} L(x, λ).

Thus, in every game there are two associated optimization problems. The
minimization problem for Player 1 and the maximization problem for Player
2. In the optimization literature, the problem associated with Player 1 is
called the primal problem while that associated with Player 2 is called the dual
problem. Duality is a deep issue in modern optimization theory. In this chapter,
we will have quite a detailed discussion on duality in convex optimization. The
game is said to have a value if

min_{x∈X} max_{λ∈Λ} L(x, λ) = max_{λ∈Λ} min_{x∈X} L(x, λ).

The above relation is the minimax equality.


For any given λ̃ ∈ Λ,

min_{x∈X} L(x, λ̃) ≤ min_{x∈X} max_{λ∈Λ} L(x, λ).

Because λ̃ ∈ Λ is arbitrary, we obtain the minimax inequality, that is,

max_{λ∈Λ} min_{x∈X} L(x, λ) ≤ min_{x∈X} max_{λ∈Λ} L(x, λ),

which always holds true.


Of course the minimax equality would hold true if a saddle point exists,
that is, a pair (x̄, λ̄) ∈ X × Λ exists that satisfies the following inequality,

L(x̄, λ) ≤ L(x̄, λ̄) ≤ L(x, λ̄), ∀ x ∈ X, ∀ λ ∈ Λ.

The above relation is called the saddle point condition. It is easy to observe
that (x̄, λ̄) ∈ X × Λ is a saddle point if and only if

max_{λ∈Λ} L(x̄, λ) = L(x̄, λ̄) = min_{x∈X} L(x, λ̄).

The above condition implies

min_{x∈X} max_{λ∈Λ} L(x, λ) ≤ max_{λ∈Λ} L(x̄, λ) = min_{x∈X} L(x, λ̄) ≤ max_{λ∈Λ} min_{x∈X} L(x, λ),

which along with the minimax inequality yields the minimax equality.
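A tiny finite example (not from the text) makes this concrete. For a payoff matrix M, where Player 1 picks a row to minimize and Player 2 picks a column to maximize the amount M[i, j] paid to Player 2, the minimax and maximin values coincide exactly when M has a pure-strategy saddle point; the sketch below checks this for a 2 × 2 matrix with saddle value 3.

    import numpy as np

    # Hypothetical payoff matrix: entry M[i, j] is what Player 1 (rows) pays Player 2 (columns).
    M = np.array([[4.0, 2.0],
                  [3.0, 1.0]])

    minimax = M.max(axis=1).min()    # Player 1: min over rows of the worst (max) column payoff
    maximin = M.min(axis=0).max()    # Player 2: max over columns of the worst (min) row payoff

    print(minimax, maximin)          # -> 3.0 3.0, so the game has a value
    # The pure strategies (row 1, column 0) (0-indexed) form a saddle point:
    i, j = 1, 0
    print(M[i, :].max() == M[i, j] == M[:, j].min())   # -> True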
Before moving on to study the optimality of the convex programming


problem (CP ) via the saddle point approach, we state the Saddle Point The-
orem (Proposition 2.6.9, Bertsekas [12]) for which we will need the following
notations.
For each λ ∈ Λ, define the proper function φλ : Rn → R̄ as

φλ(x) = L(x, λ) if x ∈ X, and φλ(x) = +∞ otherwise,

and for every x ∈ X, the proper function ψx : Rm → R̄ is given by

ψx(λ) = −L(x, λ) if λ ∈ Λ, and ψx(λ) = +∞ otherwise.

Proposition 4.1 (Saddle Point Theorem) Assume that for every λ ∈ Λ, φλ


and for every x ∈ X, ψx are lsc and convex. The set of saddle points of L is
nonempty and compact under any one of the following conditions:

(i) X and Λ are compact.


(ii) Λ is compact and there exists λ̄ ∈ Λ and α ∈ R such that the set

{x ∈ X : L(x, λ̄) ≤ α}

is nonempty and compact.


(iii) X is compact and there exists x̄ ∈ X and α ∈ R such that the set

{λ ∈ Λ : L(x̄, λ) ≥ α}

is nonempty and compact.


(iv) There exist x̄ ∈ X, λ̄ ∈ Λ, and α ∈ R such that

{x ∈ X : L(x, λ̄) ≤ α} and {λ ∈ Λ : L(x̄, λ) ≥ α}

are nonempty and compact.

This proposition will play a pivotal role in the study of enhanced optimality
conditions in Chapter 5.

4.2 Basic Saddle Point Theorem


The saddle point condition can itself be taken as an optimality condition for
the problem of Player 1, that is,

min φ(x) subject to x ∈ X.


Our question is, can we construct a function like L(x, λ) for the convex
(CP ) for which f (x) can be represented in a way as φ(x) has been represented
through L(x, λ)? Note that if we remove the compactness from Λ, then φ(x)
could take up +∞ value for some x. It is quite surprising that for the objective
function f (x) of (CP ), such a function can be obtained by considering the
classical Lagrangian function from calculus.
For the problem (CP ) with inequality constraints, we construct the Lagrangian function L : Rn × Rm+ → R as

L(x, λ) = f(x) + Σ_{i=1}^m λi gi(x)

with λ = (λ1, λ2, . . . , λm) ∈ Rm+. Observe that it is a simple matter to show that

sup_{λ∈Rm+} L(x, λ) = f(x) if x is feasible, and +∞ otherwise.

Here, the Lagrangian function L(x, λ) is playing the role of L(x, λ). So the
next pertinent question is, if we can solve (CP ) then does there exist a saddle
point for it? Does the existence of a saddle point for L(x, λ) guarantee that
a solution to the original problem (CP ) is obtained? The following theorem
answers the above questions. Recall the convex programming problem
min f (x) subject to x∈C (CP )
with C given by

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}, (4.1)

where gi : Rn → R, i = 1, 2, . . . , m, are now assumed to be convex and


non-affine functions.

Theorem 4.2 Consider the convex programming problem (CP ) with C given
by (4.1). Assume that the Slater constraint qualification holds, that is, there
exists x̂ ∈ Rn such that gi (x̂) < 0, i = 1, 2, . . . , m. Then x̄ is a point of
minimizer of (CP ) if and only if there exists λ̄ = (λ̄1, λ̄2, . . . , λ̄m) ∈ Rm+ satisfying the complementary slackness condition, that is, λ̄i gi(x̄) = 0 for i = 1, 2, . . . , m, and the saddle point condition

L(x̄, λ) ≤ L(x̄, λ̄) ≤ L(x, λ̄), ∀ x ∈ Rn, λ ∈ Rm+.

Proof. As x̄ is a point of minimizer of (CP ), the following system

f(x) − f(x̄) < 0, gi(x) < 0, i = 1, 2, . . . , m,


has no solution. Define a set

Λ = {(y0 , y) ∈ R × Rm : there exists x ∈ Rn such that


f (x) − f (x̄) < y0 , gi (x) < yi , i = 1, 2, . . . , m}.

We leave it to the reader to prove that the set Λ is convex and open. It is clear that (0, 0) ∉ Λ. Hence, by the Proper Separation Theorem, Theorem 2.26 (iv), there exists (λ0, λ) ∈ R × Rm with (λ0, λ) ≠ (0, 0) such that

λ0 y0 + Σ_{i=1}^m λi yi ≥ 0, ∀ (y0, y) ∈ Λ.    (4.2)

Corresponding to x̄ ∈ Rn , for yi > 0, i = 0, 1, . . . , m, (y0 , y) ∈ Λ. Also, for


any γ > 0, (y0 + γ, y) ∈ Λ. Therefore, from condition (4.2),
λ0 ≥ −(1/γ){λ0 y0 + Σ_{i=1}^m λi yi},

which as the limit γ → ∞ leads to λ0 ≥ 0. It is now left to the reader to prove in a similar fashion that λ ∈ Rm+.
For any x ∈ Rn , consider a fixed αi > 0, i = 0, 1, . . . , m. Then for any
γi > 0, i = 0, 1, . . . , m,

(f (x) − f (x̄) + γ0 α0 , g1 (x) + γ1 α1 , . . . , gm (x) + γm αm ) ∈ Λ.

Therefore, from (4.2),


λ0 (f(x) − f(x̄) + γ0 α0) + Σ_{i=1}^m λi (gi(x) + γi αi) ≥ 0.

As γi → 0, the above inequality yields


λ0 (f(x) − f(x̄)) + Σ_{i=1}^m λi gi(x) ≥ 0, ∀ x ∈ Rn.    (4.3)

We claim that λ0 ≠ 0. On the contrary, suppose that λ0 = 0, thereby reducing (4.3) to

Σ_{i=1}^m λi gi(x) ≥ 0, ∀ x ∈ Rn.

This violates the Slater constraint qualification. Thus, λ0 > 0. Therefore,


denoting λ̄i = λi/λ0, the condition (4.3) yields

f(x) − f(x̄) + Σ_{i=1}^m λ̄i gi(x) ≥ 0, ∀ x ∈ Rn.

In particular, x = x̄ in the above inequality leads to Σ_{i=1}^m λ̄i gi(x̄) ≥ 0, while the feasibility of x̄ gives λ̄i gi(x̄) ≤ 0 for each i, so Σ_{i=1}^m λ̄i gi(x̄) = 0. Because a sum of nonpositive numbers is zero only if each term is zero, the complementary slackness condition, that is, λ̄i gi(x̄) = 0, i = 1, 2, . . . , m, holds. Therefore, the preceding inequality leads to

f(x) + Σ_{i=1}^m λ̄i gi(x) ≥ f(x̄) + Σ_{i=1}^m λ̄i gi(x̄), ∀ x ∈ Rn,

which implies

L(x, λ̄) ≥ L(x̄, λ̄), ∀ x ∈ Rn .


Further, for any λ ∈ Rm+, Σ_{i=1}^m λi gi(x̄) ≤ 0. Thus,

f(x̄) + Σ_{i=1}^m λi gi(x̄) ≤ f(x̄) = f(x̄) + Σ_{i=1}^m λ̄i gi(x̄),

that is,

L(x̄, λ) ≤ L(x̄, λ̄), ∀ λ ∈ Rm+,

thereby establishing the saddle point condition.


Conversely, suppose that there exists λ̄ ∈ Rm+ such that the saddle point condition and the complementary slackness condition hold at x̄. We first prove that x̄ is feasible, that is, −g(x̄) = (−g1(x̄), −g2(x̄), . . . , −gm(x̄)) ∈ Rm+. On the contrary, assume that −g(x̄) ∉ Rm+. As Rm+ is a closed convex cone, by the Strict Separation Theorem, Theorem 2.26 (iii), there exists λ ∈ Rm+ with λ ≠ 0 such that

⟨λ, g(x̄)⟩ = Σ_{i=1}^m λi gi(x̄) > 0.

Therefore,
f(x̄) + Σ_{i=1}^m λi gi(x̄) > f(x̄),

which implies L(x̄, λ) > L(x̄, λ̄), thereby contradicting the saddle point con-
dition. Hence, x̄ is feasible to (CP ).
Because L(x̄, λ̄) ≤ L(x, λ̄) and the complementary slackness condition is
satisfied,
f(x̄) ≤ f(x) + Σ_{i=1}^m λ̄i gi(x) ≤ f(x), ∀ x ∈ C.

Thus, x̄ is a point of minimizer of (CP ). □


The consequence of the saddle point criteria is simple. If (x̄, λ̄) is a saddle
point associated with the Lagrangian function of (CP ) where x̄ is a point of
minimizer of f over C, then

L(x̄, λ̄) = min_{x∈Rn} L(x, λ̄)

with λ̄i gi (x̄) = 0 for i = 1, 2, . . . , m. Therefore, by the optimality condition


for the unconstrained problem, Theorem 2.89,

0 ∈ ∂x L(x̄, λ̄),

which under the Slater constraint qualification yields

0 ∈ ∂f(x̄) + Σ_{i=1}^m λ̄i ∂gi(x̄),

thus leading to the KKT optimality conditions for (CP ).
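As a simple illustration of Theorem 4.2 (the data here are chosen only for illustration), consider min x² subject to g(x) = 1 − x ≤ 0 on R. The Slater condition holds (take x̂ = 2), and the Lagrangian is L(x, λ) = x² + λ(1 − x). At x̄ = 1 with λ̄ = 2 we have λ̄g(x̄) = 0, L(x̄, λ) = 1 + λ·0 = 1 = L(x̄, λ̄) for every λ ≥ 0, and L(x, λ̄) = x² + 2 − 2x = (x − 1)² + 1 ≥ 1 = L(x̄, λ̄) for every x ∈ R, so (x̄, λ̄) satisfies the saddle point condition and x̄ = 1 is indeed the point of minimizer.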

4.3 Affine Inequalities and Equalities and Saddle Point


Condition
Observe that in the previous section we had mentioned that the convex functions are non-affine. This eventually has to do with the Slater constraint qualification. Consider the set

C = {(x1, x2) ∈ R2 : x1 + x2 ≤ 0, −x1 ≤ 0, −x2 ≤ 0}.

This set is described by affine inequalities. However, C = {(0, 0)} and hence
the Slater constraint qualification fails. The question is whether in such a sit-
uation the saddle point condition exists or not. What we show below is that
the presence of affine inequalities does not affect the saddle point condition.
In fact, we should only bother about the Slater constraint qualification for
the convex non-affine inequalities. The presence of affine inequalities by itself
is a constraint qualification. To the best of our knowledge, the first study
in this respect was due to Jaffray and Pomerol [65]. We present their result
establishing the saddle point criteria under a modified version of Slater con-
straint qualification using the separation theorem. For that we now consider
the feasible set C of the convex programming problem (CP ) defined by convex
non-affine and affine inequalities as

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m, hj (x) ≤ 0, j = 1, 2, . . . , l}, (4.4)

where gi : Rn → R, i = 1, 2, . . . , m, are convex functions while hj : Rn → R,


j = 1, 2, . . . , l, are affine functions. Observe that C is a convex set. Correspond-


ing to this convex programming problem (CP ), the associated Lagrangian
function L : Rn × Rm+ × Rl+ → R is defined as

L(x, λ, µ) = f(x) + Σ_{i=1}^m λi gi(x) + Σ_{j=1}^l µj hj(x).

Then (x̄, λ̄, µ̄) is the saddle point of (CP ) with C given by (4.4) if

L(x̄, λ, µ) ≤ L(x̄, λ̄, µ̄) ≤ L(x, λ̄, µ̄).

We shall now present the proof of Jaffray and Pomerol in a more detailed
and simplified manner.

Theorem 4.3 Consider (CP ) with C defined by (4.4). Assume that the mod-
ified Slater constraint qualification holds, that is, there exists x̂ ∈ Rn such that
gi (x̂) < 0, i = 1, 2, . . . , m, and hj (x̂) ≤ 0, j = 1, 2, . . . , l. Then x̄ is a point of
minimizer of (CP ) if and only if there exists (λ̄, µ̄) ∈ Rm+ × Rl+ such that

L(x̄, λ, µ) ≤ L(x̄, λ̄, µ̄) ≤ L(x, λ̄, µ̄), ∀ x ∈ Rn, λ ∈ Rm+, µ ∈ Rl+,

along with the complementary slackness conditions, that is,

λ̄i gi (x̄) = 0, i = 1, 2, . . . , m and µ̄j hj (x̄) = 0, j = 1, 2, . . . , l.

Proof. Consider an index set J as the (possibly empty) maximal subset of {1, 2, . . . , l} such that there exist αj > 0 for j ∈ J with Σ_{j∈J} αj hj(x) = 0 for every x ∈ Rn. Observe that for every x ∈ C,

hj(x) = 0, ∀ j ∈ J.

Otherwise, if for some x ∈ C and for some j ∈ J, hj (x) < 0, the maximality
of J is contradicted.
Define the Lagrange covers of (CP ) as

Λ = {(y0 , y, z) ∈ R1+m+l : there exists x ∈ Rn such that


f (x) − f (x̄) ≤ y0 , gi (x) ≤ yi , i = 1, 2, . . . , m,
hj (x) ≤ zj , j ∈ J c , hj (x) = zj , j ∈ J},

where J^c = {j ∈ {1, 2, . . . , l} : j ∉ J}.
We claim that the set Λ is convex. Consider (y01 , y 1 , z 1 ) and (y02 , y 2 , z 2 ) in Λ
with x1 and x2 the respective associated elements from Rn . For any λ ∈ [0, 1],
x = λx1 + (1 − λ)x2 ∈ Rn . By the convexity of f and gi , i = 1, 2, . . . , m,

f (x) − f (x̄) ≤ λ(f (x1 ) − f (x̄)) + (1 − λ)(f (x2 ) − f (x̄))


≤ λy01 + (1 − λ)y02 ,
gi (x) ≤ λgi (x1 ) + (1 − λ)gi (x2 ) ≤ λyi1 + (1 − λ)yi2 , i = 1, 2, . . . , m,


while the affineness of hj , j = 1, 2, . . . , l leads to

hj (x) = λhj (x1 ) + (1 − λ)hj (x2 ) ≤ λzj1 + (1 − λ)zj2 , j ∈ J c ,


hj (x) = λhj (x1 ) + (1 − λ)hj (x2 ) = λzj1 + (1 − λ)zj2 , j ∈ J.

Thus, for every λ ∈ [0, 1], λ(y01 , y 1 , z 1 ) + (1 − λ)(y02 , y 2 , z 2 ) ∈ Λ with x ∈ Rn


as the associated element, thereby implying the convexity of Λ.
Observe that corresponding to the point of minimizer of (CP ), x̄ ∈ Rn ,
(ȳ0 , 0, 0) ∈ Λ if and only if ȳ0 ≥ 0. Also, (y0 , 0, 0) belongs to the affine hull
of Λ for every y0 ∈ R, and hence, (0, 0, 0) belongs to the relative bound-
ary of Λ. Applying the Proper Separation Theorem, Theorem 2.26 (iv), to
the Lagrange cover Λ and the relative boundary point (0, 0, 0), there exists
(λ0, λ, µ) ∈ R1+m+l with (λ0, λ, µ) ≠ (0, 0, 0) such that

λ0 y0 + Σ_{i=1}^m λi yi + Σ_{j=1}^l µj zj ≥ 0, ∀ (y0, y, z) ∈ Λ    (4.5)

and for some (y0′, y′, z′) ∈ Λ,

λ0 y0′ + Σ_{i=1}^m λi yi′ + Σ_{j=1}^l µj zj′ > 0.    (4.6)

Consider (y0, y, z) ∈ Λ. For any α0 > 0 and α ∈ int Rm+, (y0 + α0, y + α, z) ∈ Λ. Therefore, by (4.5), for i′ = 0, 1, . . . , m,

λi′ ≥ −(1/αi′) { λ0 y0 + Σ_{i=1}^m λi yi + Σ_{j=1}^l µj zj + Σ_{i=0}^{i′−1} λi αi + Σ_{i=i′+1}^m λi αi },

which as the limit αi′ → +∞ yields λi′ ≥ 0, i′ = 0, 1, . . . , m. Using the above technique, we can also show that µj ≥ 0, j ∈ J^c. The reader is advised to
check this out. Observe that µj for j ∈ J are unrestricted.
Let us proceed by assuming that J is nonempty. Therefore, there exist αj > 0, j ∈ J, such that Σ_{j∈J} αj hj(x) = 0 for every x ∈ Rn. Redefining λi, i = 0, 1, . . . , m, and µj, j = 1, 2, . . . , l, as

λ̂i = λi, i = 0, 1, . . . , m, µ̂j = µj, j ∈ J^c, and µ̂j = µj + γαj, j ∈ J,

where γ > 0 is chosen such that µ̂j > 0 for j ∈ J. Also, observe that

Σ_{j∈J} µ̂j hj(x) = Σ_{j∈J} µj hj(x) + Σ_{j∈J} γαj hj(x) = Σ_{j∈J} µj hj(x).

Thus, the conditions (4.5) and (4.6) hold for (λ̂0 , λ̂, µ̂) as well.
We claim that λ̂0 , λ̂i , i = 1, 2, . . . , m, and µ̂j , j ∈ J c , are not all simul-
taneously zero. On the contrary, assume that λ̂0 = 0, λ̂i = 0, i = 1, 2, . . . , m,


and µ̂j = 0, j ∈ J^c. Therefore, the construction of Λ along with (4.5) yields

Σ_{j∈J} µ̂j hj(x) ≥ 0, ∀ x ∈ Rn.

As x̄ is feasible for (CP ), the above condition becomes


Σ_{j∈J} µ̂j hj(x̄) = 0.
Therefore, the affine function Σ_{j∈J} µ̂j hj(.) achieves its minimum over Rn at x̄. Moreover, a nonconstant affine function is unbounded below over Rn and hence cannot attain a minimum, so this function must be constant. This shows that

Σ_{j∈J} µ̂j hj(x) = 0, ∀ x ∈ Rn.

By condition (4.6), there exists x′ ∈ Rn associated to (y0′ , y ′ , z ′ ) ∈ Λ such that


Σ_{j∈J} µ̂j hj(x′) > 0.

Hence, a contradiction is reached. Therefore, λ̂0 , λ̂i , i = 1, 2, . . . , m, and


µ̂j , j ∈ J c , are not all simultaneously zero.
Next suppose that λ̂0 = 0 and λ̂i = 0, i = 1, 2, . . . , m, and for some j ∈ J c ,
µ̂j > 0. Again working along the preceding lines, one obtains
Σ_{j∈J^c : µ̂j>0} µ̂j hj(x) + Σ_{j∈J} µ̂j hj(x) = 0, ∀ x ∈ Rn.

Observe that {j ∈ J^c : µ̂j > 0} is nonempty. The above condition holds with positive coefficients for the indices in {j ∈ J^c : µ̂j > 0} ∪ {j ∈ J : µ̂j > 0}, thereby contradicting the maximality of the index set J. Hence λ̂0 and λ̂i, i = 1, 2, . . . , m, are not all simultaneously zero.
Assume that λ̂0 = 0. As the modified Slater constraint qualification holds, there exists x̂ ∈ Rn such that gi(x̂) < 0, i = 1, 2, . . . , m, and hj(x̂) ≤ 0, j = 1, 2, . . . , l. Corresponding to x̂,

(f(x̂) − f(x̄), g1(x̂), . . . , gm(x̂), h1(x̂), . . . , hl(x̂)) ∈ Λ,

which along with condition (4.5) and the modified Slater constraint qualification leads to

0 > Σ_{i=1}^m λ̂i gi(x̂) + Σ_{j=1}^l µ̂j hj(x̂) ≥ 0,

which is a contradiction. Hence λ̂0 ≠ 0.


Now dividing (4.5) throughout by λ̂0 yields


y0 + Σ_{i=1}^m λ̄i yi + Σ_{j=1}^l µ̄j zj ≥ 0, ∀ (y0, y, z) ∈ Λ,    (4.7)

where λ̄i = λ̂i/λ̂0, i = 1, 2, . . . , m, and µ̄j = µ̂j/λ̂0, j = 1, 2, . . . , l. Corresponding
to every x ∈ Rn ,

(f (x) − f (x̄), g1 (x), . . . , gm (x), h1 (x), . . . , hl (x)) ∈ Λ,

thereby reducing the inequality (4.7) to


f(x̄) ≤ f(x) + Σ_{i=1}^m λ̄i gi(x) + Σ_{j=1}^l µ̄j hj(x), ∀ x ∈ Rn.    (4.8)

By the feasibility of x̄ for (CP ) and the fact that (λ̄, µ̄) ∈ Rm+ × Rl+, condition
(4.8) implies that

L(x̄, λ̄, µ̄) ≤ L(x, λ̄, µ̄), ∀ x ∈ Rn .

In particular, taking x = x̄ in (4.8), along with the feasibility of x̄, leads to


Σ_{i=1}^m λ̄i gi(x̄) + Σ_{j=1}^l µ̄j hj(x̄) = 0.

This shows that

λ̄i gi (x̄) = 0, i = 1, 2, . . . , m and µ̄j hj (x̄) = 0, j = 1, 2, . . . , l,

thereby establishing the complementary slackness condition. For any


(λ, µ) ∈ Rm+ × Rl+, again by the feasibility of x̄,

Σ_{i=1}^m λi gi(x̄) + Σ_{j=1}^l µj hj(x̄) ≤ 0 = Σ_{i=1}^m λ̄i gi(x̄) + Σ_{j=1}^l µ̄j hj(x̄),

that is,

L(x̄, λ, µ) ≤ L(x̄, λ̄, µ̄), ∀ λ ∈ Rm+, µ ∈ Rl+,

thereby leading to the desired result. The converse of the above result can be obtained in a manner similar to Theorem 4.2. □
In the convex programming problem (CP ) considered by Jaffray and
Pomerol [65], the problem involved only convex non-affine and affine inequal-
ities. Next we present a similar result from Florenzano and Van [47] to derive


the saddle point criteria under a modified version of Slater constraint quali-
fication but for a more general scenario involving additional affine equalities
and abstract constraints in (4.4). Consider the feasible set C of the convex
programming problem (CP ) as

C = {x ∈ X : gi (x) ≤ 0, i = 1, 2, . . . , m, hj (x) ≤ 0, j = 1, 2, . . . , s,
hj (x) = 0, j = s + 1, s + 2, . . . , l}, (4.9)

where gi : Rn → R, i = 1, 2, . . . , m, are convex non-affine functions;


hj : Rn → R, j = 1, 2, . . . , l, are affine functions; and X ⊂ Rn is a con-
vex set. Corresponding to this problem, the associated Lagrangian function
L : X × Rm+ × Rl → R is defined as

L(x, λ, µ) = f(x) + Σ_{i=1}^m λi gi(x) + Σ_{j=1}^l µj hj(x),

where µ = (µ̂, µ̃) ∈ Rs+ × Rl−s . Then (x̄, λ̄, µ̄) is called the saddle point of the
above problem if

L(x̄, λ, µ) ≤ L(x̄, λ̄, µ̄) ≤ L(x, λ̄, µ̄), ∀ x ∈ X, λ ∈ Rm+, µ ∈ Rl,

where both µ = (µ̂, µ̃) and µ̄ lie in Rs+ × Rl−s.

Theorem 4.4 Consider the convex programming problem (CP ) with C de-
fined by (4.9). Let x̄ be a point of minimizer of (CP ). Assume that there exists
x̂ ∈ ri X such that

hj (x̂) ≤ 0, j = 1, 2, . . . , s,
hj (x̂) = 0, j = s + 1, s + 2, . . . , l.

Then there exist (λ0, λ) ∈ R+ × Rm+ with (λ0, λ) ≠ (0, 0), and µ = (µ̂, µ̃) ∈ Rs+ × Rl−s such that

λ0 f(x̄) ≤ λ0 f(x) + Σ_{i=1}^m λi gi(x) + Σ_{j=1}^l µj hj(x), ∀ x ∈ X,

λi gi(x̄) = 0, i = 1, 2, . . . , m, and µ̂j hj(x̄) = 0, j = 1, 2, . . . , s.

Proof. Consider the set

Λ = {(y0 , y, z) ∈ R1+m+l : there exists x ∈ X such that f (x) − f (x̄) < y0 ,


gi (x) < yi , i = 1, 2, . . . , m,
hj (x) = zj , j = 1, 2, . . . , l}.

It can be easily shown as in the proof of Theorem 4.3 that Λ is a convex set.


Also, Λ is nonempty because corresponding to the point of minimizer x̄ of


(CP ), one can define (y0 , y, z) ∈ Λ as

y0 > 0, yi > 0, i = 1, 2, . . . , m, and zj = hj (x̄), j = 1, 2, . . . , l.

As Λ is a nonempty convex set, by Proposition 2.14, ri Λ is also a nonempty


convex set. Note that

Λ ∩ (R^{1+m+s}_− × {0_{Rl−s}}) = ∅.

Otherwise, there exists an element in Λ such that the associated x ∈ X is


feasible for (CP ) satisfying f (x) < f (x̄), which is a contradiction to the fact
that x̄ is a point of minimizer of (CP ). Therefore, by Proposition 2.15,

ri Λ ∩ ri (R^{1+m+s}_− × {0_{Rl−s}}) = ∅.

Invoking the Proper Separation Theorem, Theorem 2.26 (iv), there exists
(λ0, λ, µ) ∈ R1+m+l with (λ0, λ, µ) ≠ (0, 0, 0) such that

λ0 y0 + Σ_{i=1}^m λi yi + Σ_{j=1}^l µj zj ≥ λ0 w0 + Σ_{i=1}^m λi wi + Σ_{j=1}^s µj vj    (4.10)

for every (y0, y, z) ∈ Λ and (w0, w, v) ∈ R^{1+m+s}_−, and there exists (y0′, y′, z′) ∈ Λ such that

λ0 y0′ + Σ_{i=1}^m λi yi′ + Σ_{j=1}^l µj zj′ > 0.    (4.11)

Let us partition µ = (µ̂, µ̃) ∈ Rs × Rl−s. We claim that λ0 ≥ 0, λ ∈ Rm+ and µ̂ ∈ Rs+. Corresponding to the point of minimizer x̄, choose y0 > 0, yi > 0, i = 1, 2, . . . , m, and zj = hj(x̄), j = 1, 2, . . . , l. From condition (4.10), for i′ = 0, 1, . . . , m,

λi′ ≥ (1/yi′) { −Σ_{i=0}^{i′−1} λi yi − Σ_{i=i′+1}^m λi yi − Σ_{j=1}^l µj zj + λ0 w0 + Σ_{i=1}^m λi wi + Σ_{j=1}^s µj vj }.

Taking the limit as yi′ → ∞ yields λi′ ≥ 0, i′ = 0, 1, . . . , m. Again from


(4.10), for j ′ = 1, 2, . . . , s,

µj′ ≥ (1/vj′) { λ0 y0 + Σ_{i=1}^m λi yi + Σ_{j=1}^l µj zj − λ0 w0 − Σ_{i=1}^m λi wi − Σ_{j=1}^{j′−1} µj vj − Σ_{j=j′+1}^s µj vj }.

Taking the limit as vj′ → −∞ leads to µj′ ≥ 0, j′ = 1, 2, . . . , s.


Now consider any x ∈ X and δ > 0. Define


y0 = f (x) − f (x̄) + δ,
yi = gi (x) + δ, i = 1, 2, . . . , m,
zj = hj (x), j = 1, 2, . . . , l.
Therefore, (y0, y, z) ∈ Λ and for (0, 0, 0) ∈ R^{1+m+s}_− × {0_{Rl−s}}, the condition (4.10) yields that for every x ∈ X and every δ > 0,

λ0 (f(x) − f(x̄)) + Σ_{i=1}^m λi gi(x) + Σ_{j=1}^l µj hj(x) + Σ_{i=0}^m λi δ ≥ 0.

Because δ > 0 was arbitrarily chosen, as δ → 0 the above condition reduces


to
λ0 (f(x) − f(x̄)) + Σ_{i=1}^m λi gi(x) + Σ_{j=1}^l µj hj(x) ≥ 0, ∀ x ∈ X.    (4.12)

In particular, for x = x̄, condition (4.12) yields


Σ_{i=1}^m λi gi(x̄) + Σ_{j=1}^l µj hj(x̄) ≥ 0,

which along with the feasibility of x̄ for (CP ) leads to


λi gi (x̄) = 0, i = 1, 2, . . . , m, and µ̂j hj (x̄) = 0, j = 1, 2, . . . , s,
as in the proof of Theorem 4.3.
We claim that (λ0, λ) ≠ (0, 0). On the contrary, suppose that (λ0, λ) = (0, 0). Therefore, condition (4.12) leads to

Σ_{j=1}^l µj hj(x) ≥ 0, ∀ x ∈ X.

By the given hypothesis on x̂ ∈ ri X, Σ_{j=1}^l µj hj(x̂) ≤ 0, which along with the above inequality implies that

Σ_{j=1}^l µj hj(x̂) = 0,

that is, the affine function Σ_{j=1}^l µj hj(.) achieves its minimum over X at a relative interior point. Because a nonconstant affine function attains its minimum over a convex set only at relative boundary points, Σ_{j=1}^l µj hj(.) has the constant value zero over X, that is,

Σ_{j=1}^l µj hj(x) = 0, ∀ x ∈ X.    (4.13)


Corresponding to (y0′ , y ′ , z ′ ) ∈ Λ satisfying (4.11) there exists x′ ∈ X such


that
Σ_{j=1}^l µj hj(x′) > 0,

which contradicts (4.13). Therefore, λi , i = 0, 1, . . . , m, are not all simultane-


ously zero, which along with (4.12) leads to the desired result. □
Theorem 4.5 Consider the convex programming problem (CP ) with C de-
fined by (4.9). Assume that the modified Slater constraint qualification is satisfied, that is, there exists x̂ ∈ ri X such that

gi(x̂) < 0, i = 1, 2, . . . , m,
hj(x̂) ≤ 0, j = 1, 2, . . . , s,
hj(x̂) = 0, j = s + 1, s + 2, . . . , l.

Then x̄ is a point of minimizer of (CP ) if and only if there exist λ̄ ∈ Rm+ and µ̄ ∈ Rs+ × Rl−s such that

L(x̄, λ, µ) ≤ L(x̄, λ̄, µ̄) ≤ L(x, λ̄, µ̄), ∀ x ∈ X, λ ∈ Rm+, µ ∈ Rl,

where µ = (µ̂, µ̃) ∈ Rs+ × Rl−s, along with

λ̄i gi(x̄) = 0, i = 1, 2, . . . , m, and µ̄j hj(x̄) = 0, j = 1, 2, . . . , s.
Proof. Because the modified Slater constraint qualification is satisfied, the hy-
pothesis of Theorem 4.4 also holds. Thus, if x̄ is a point of minimizer of (CP ),
there exist (λ0, λ) ∈ R+ × Rm+ with (λ0, λ) ≠ (0, 0) and µ = (µ̂, µ̃) ∈ Rs+ × Rl−s such that

λ0 f(x̄) ≤ λ0 f(x) + Σ_{i=1}^m λi gi(x) + Σ_{j=1}^l µj hj(x), ∀ x ∈ X    (4.14)

and
λi gi (x̄) = 0, i = 1, 2, . . . , m, and µj hj (x̄) = 0, j = 1, 2, . . . , s. (4.15)
We claim that λ0 ≠ 0. On the contrary, suppose that λ0 = 0. Because (λ0, λ) ≠ (0, 0), λ ≠ 0. Therefore, the optimality condition (4.14) becomes

Σ_{i=1}^m λi gi(x) + Σ_{j=1}^l µj hj(x) ≥ 0, ∀ x ∈ X.

In particular, for x = x̂, the above condition along with the modified Slater
constraint qualification leads to
0 > Σ_{i=1}^m λi gi(x̂) + Σ_{j=1}^l µj hj(x̂) ≥ 0,


which is a contradiction. Thus, λ0 > 0 and hence dividing (4.14) throughout


by λ0 yields
f(x̄) ≤ f(x) + Σ_{i=1}^m λ̄i gi(x) + Σ_{j=1}^l µ̄j hj(x), ∀ x ∈ X,

where λ̄i = λi/λ0, i = 1, 2, . . . , m, and µ̄j = µj/λ0, j = 1, 2, . . . , l. This inequality
along with the condition (4.15) leads to

L(x̄, λ̄, µ̄) ≤ L(x, λ̄, µ̄), ∀ x ∈ X.

As x̄ is feasible for (CP ), g(x̄) ∈ −Rm+, −ĥ(x̄) ∈ Rs+, and h̃(x̄) = 0 in Rl−s. Therefore, for λ ∈ Rm+ and µ = (µ̂, µ̃) ∈ Rs+ × Rl−s,

Σ_{i=1}^m λi gi(x̄) + Σ_{j=1}^l µj hj(x̄) ≤ 0 = Σ_{i=1}^m λ̄i gi(x̄) + Σ_{j=1}^l µ̄j hj(x̄),

which leads to

L(x̄, λ, µ) ≤ L(x̄, λ̄, µ̄),

thereby proving the desired saddle point result. The converse can be worked
out as in Theorem 4.2. 
Observe that the saddle point condition in the above theorem

L(x̄, λ̄, µ̄) ≤ L(x, λ̄, µ̄), ∀ x ∈ X

can be rewritten as
m
X s
X l
X
f (x̄) + λ̄i gi (x̄) + µ̂j hj (x̄) + µ̃j hj (x̄) + δX (x̄)
i=1 j=1 j=s+1
m
X s
X l
X
≤ f (x) + λi gi (x) + µ̂j hj (x) + µ̃j hj (x) + δX (x)
i=1 j=1 j=s+1

for every x ∈ Rn . The above inequality implies that


l
X s
X l
X
0 ∈ ∂(f + λi gi + µ̂j hj (x) + µ̃j hj (x) + δX )(x̄).
i=1 j=1 j=s+1

By the modified Tl qualification x̂ ∈ ri X and therefore,


Tm Slater constraint
ri dom f ∩ i=1 ri dom gi ∩ j=1 ri dom hj ∩ ri dom δX = ri X is

© 2012 by Taylor & Francis Group, LLC


4.4 Lagrangian Duality 185

nonempty. Applying the Sum Rule, Theorem 2.91 along with the fact that
∂δX (x̄) = NX (x̄) yields the KKT optimality condition
m
X s
X l
X
0 ∈ ∂f (x̄) + λi ∂gi (x̄) + µ̂j ∂hj (x̄) + ∂( µ̃j hj )(x̄) + NX (x̄).
i=1 j=1 j=s+1

By the affineness of hj , j = 1, 2, . . . , l, ∂hj (x̄) = {∇hj (x̄)}, thereby reducing


the above condition to the standard KKT optimality condition
m
X s
X l
X
0 ∈ ∂f (x̄) + λi ∂gi (x̄) + µ̂j ∇hj (x̄) + µ̃j ∇hj (x̄) + NX (x̄).
i=1 j=1 j=s+1

We state this discussion as the following result.


Theorem 4.6 Consider the convex programming problem (CP ) with C de-
fined by (4.9). Assume that the modified Slater constraint qualification is
satisfied. Then x̄ is a point of minimizer of (CP ) if and only if there exist
λi ≥ 0, i = 1, 2, . . . , m; µ̂j ≥ 0, j = 1, 2, . . . , s; and µ̃j ∈ R, j = s + 1, . . . , l,
such that
m
X s
X l
X
0 ∈ ∂f (x̄) + λi ∂gi (x̄) + µ̂j ∇hj (x̄) + µ̃j ∇hj (x̄) + NX (x̄)
i=1 j=1 j=s+1

along with
λi gi (x̄) = 0, i = 1, 2, . . . , m, and µ̂j hj (x̄) = 0, j = 1, 2, . . . , s.

4.4 Lagrangian Duality


In the beginning of this chapter, we tried to motivate the notion of a saddle
point using two-person-zero-sum games. We observed that two optimization
problems were being simultaneously solved. Player 1 was solving a minimiza-
tion problem while Player 2 was solving a maximization problem. The maxi-
mization problem is usually referred to as the dual of the minimization prob-
lem. Similarly, corresponding to the problem (CP ), one can actually construct
a dual problem following an approach quite similar to that of the two-person-
zero-sum games. Consider the problem (CP ) with the feasible set given by
(4.9). Then if vL denotes the optimal value of (CP ), then observe that
vL = inf sup L(x, λ, µ̂, µ̃),
x∈C (λ,µ̂,µ̃)∈Ω

where Ω = Rm s
+ × R+ × R
l−s
. Taking a clue from the two-person-zero-sum
games, the dual problem to (CP ) that we denote by (DP ) can be stated as

© 2012 by Taylor & Francis Group, LLC


186 Saddle Points, Optimality, and Duality

sup w(λ, µ̂, µ̃) subject to (λ, µ̂, µ̃) ∈ Ω, (DP )


where w(λ, µ̂, µ̃) = min L(x, λ, µ̂, µ̃). We denote the optimal value of (DP )
x∈X
by dL . Our main aim here is to check if

dL = sup w(λ, µ̂, µ̃) = vL , (4.16)


(λ,µ̂,µ̃)∈Ω

that is,

sup inf L(x, λ, µ̂, µ̃) = inf sup L(x, λ, µ̂, µ̃).
(λ,µ̂,µ̃)∈Ω x∈X x∈C (λ,µ̂,µ̃)∈Ω

The statement (4.16) is known as strong duality. We now present a result that
shows when strong duality holds.

Theorem 4.7 Consider the problem (CP ) where the set C is defined by (4.9).
Assume that (CP ) has a lower bound, that is, it has an infimum value, vL ,
that is finite. Also, assume that the modified Slater constraint qualification is
satisfied. Then the dual problem (DP ) has a supremum and the supremum is
attained with

vL = d L .

Proof. We always have vL ≥ dL . This is absolutely straightforward and we


urge the reader to establish this. This is called weak duality.
The problem (CP ) has an infimum, vL , that is,

vL = inf f (x).
x∈C

Working along the lines of the proof of Theorem 4.4, we conclude from (4.12)
ˆ, µ̄
that there exists nonzero (λ̄0 , λ̄, µ̄ ˜) ∈ R+ × Rm s
+ × R+ × R
l−s
such that

m
X s
X l
X
λ̄0 (f (x) − vL ) + λ̄i gi (x) + ˆj hj (x) +
µ̄ ˜j hj (x) ≥ 0, ∀ x ∈ X.
µ̄
i=1 j=1 j=s+1

As the modified Slater constraint qualification holds, by Theorem 4.5, it is


simple to observe that λ̄0 6= 0 and without loss of generality, assume λ̄0 = 1.
Hence,
m
X s
X l
X
(f (x) − vL ) + λ̄i gi (x) + ˆj hj (x) +
µ̄ ˜j hj (x) ≥ 0, ∀ x ∈ X.
µ̄
i=1 j=1 j=s+1

Therefore,

ˆ, µ̄
L(x, λ̄, µ̄ ˜) ≥ vL , ∀ x ∈ X,

© 2012 by Taylor & Francis Group, LLC


4.4 Lagrangian Duality 187

ˆ, µ̄
that is, w(λ̄, µ̄ ˜) ≥ vL . Hence,

sup ˆ, µ̄
w(λ, µ̂, µ̃) ≥ w(λ̄, µ̄ ˜ ) ≥ vL .
(λ,µ̂,µ̃)∈Ω

By the weak duality, vL ≥ sup(λ,µ̂,µ̃)∈Ω w(λ, µ̂, µ̃). Thus,

sup w(λ, µ̂, µ̃) = vL = inf f (x),


(λ,µ̂,µ̃)∈Ω x∈C

thereby establishing the strong duality between (CP ) and (DP ). 


It is important to note that the assumption of the Slater constraint qual-
ification is quite crucial as its absence can give a positive duality gap. We
provide below the following famous example due to Duffin [35].

Example 4.8 Consider the primal problem


q
inf ex2 subject to x21 + x22 ≤ x1 .

The Lagrangian dual problem is

max w(λ) subject to λ ∈ Rm


+,

where
q
w(λ) = inf2 ex2 + λ( x21 + x22 − x1 ), λ ≥ 0.
x∈R

Observe that the only feasible point of the primal problem is (x1 , x2 ) = (0, 0)
and hence inf ex2 = e0 = 1. Thus, the minimum value or the infimum value
of the primal problem is vL = 1. Now let us evaluate the p function w(λ) for
each λ ≥ 0. Observe that for every fixed x2 , the term ( x21 + x22 − x1 ) → 0
as x1 → +∞. Thus, for each x2 , the value ex2 dominates the expression
q
ex2 + λ( x21 + x22 − x1 )

as x1 → +∞. Hence, for a fixed x2 ,


q
inf ex2 + λ( x21 + x22 − x1 ) = ex2 .
x1

By letting x2 → −∞,

w(λ) = 0, ∀ λ ≥ 0.

Therefore, the supremum value of the dual problem is dL = 0. Hence, there


is a positive duality gap. Observe that the Slater constraint qualification does
not hold in the primal case.

© 2012 by Taylor & Francis Group, LLC


188 Saddle Points, Optimality, and Duality

We are now going to present some deeper properties of the dual variables
(or Lagrange multipliers) for the problem (CP ) with convex non-affine in-
equality, that is, the feasible set C is given by (4.1),

C = {x ∈ X : gi (x) ≤ 0, i = 1, 2, . . . , m}.

The set of Lagrange multipliers at a given solution x̄ of (CP ) is given as


m
X
M(x̄) = {λ ∈ Rm
+ : 0 ∈ ∂f (x̄) + λi ∂gi (x̄), λi gi (x̄) = 0, i = 1, 2, . . . , m}.
i=1

It is quite natural to think that when we change x̄, the set of multipliers will
also change. We now show that for a convex programming problem, the set
M(x̄) does not depend on the solution x̄. Consider the set

M = {λ ∈ Rm
+ : inf f (x) = infn L(x, λ)}. (4.17)
x∈C x∈R

In the following result we show that M(x̄) = M for any solution x̄ of (CP ).
The proof of this fact is from Attouch, Buttazzo, and Michaille [3].

Theorem 4.9 Consider the convex programming problem (CP ) with C de-
fined by (4.1). Let x̄ be the point of minimizer of (CP ). Then M(x̄) = M.

Proof. Suppose that λ ∈ M(x̄). Then

0 ∈ ∂x L(x̄, λ)

with λi gi (x̄) = 0, i = 1, 2, . . . , m, where ∂x L denotes the subdifferential with


respect to x. Hence, x̄ solves the problem

min L(x, λ) subject to x ∈ Rn .

Therefore, for every x ∈ Rn ,


m
X m
X
f (x̄) + λi gi (x̄) ≤ f (x) + λi gi (x),
i=1 i=1

which along with λi gi (x̄) = 0, i = 1, 2, . . . , m, implies


m
X
f (x̄) ≤ f (x) + λi gi (x), ∀ x ∈ Rn .
i=1

Thus,
m
X
f (x̄) = infn (f + λi gi )(x) = infn L(x, λ).
x∈R x∈R
i=1

© 2012 by Taylor & Francis Group, LLC


4.4 Lagrangian Duality 189

Further, f (x̄) = inf f (x). Hence, λ ∈ M.


x∈C
Conversely, suppose that λ ∈ M, which implies
m
X
f (x̄) = infn (f + λi gi )(x).
x∈R
i=1

Therefore,
m
X
f (x̄) ≤ f (x̄) + λi gi (x̄),
i=1

thereby yielding
m
X
λi gi (x̄) ≥ 0.
i=1

The above inequality along with the feasibility of x̄ for (CP ) and nonnegativity
of λi , i = 1, 2, . . . , m, leads to
m
X
λi gi (x̄) = 0.
i=1

This further yields

λi gi (x̄) = 0, i = 1, 2, . . . , m.

Thus,
m
X m
X
f (x̄) + λi gi (x̄) = infn (f + λi gi )(x),
x∈R
i=1 i=1

which implies that x̄ solves the problem

min L(x, λ) subject to x ∈ Rn .

Therefore, 0 ∈ ∂x L(x̄, λ). As dom f = dom gi = Rn , i = 1, 2, . . . , m, applying


the Sum Rule, Theorem 2.91,
m
X
0 ∈ ∂f (x̄) + λi ∂gi (x̄).
i=1

This combined with the fact that λi gi (x̄) = 0, i = 1, 2, . . . , m, shows that


λ ∈ M(x̄), thereby establishing that M(x̄) = M. 

Remark 4.10 In the above theorem, x̄ was chosen to be any arbitrary solu-
tion of (CP ). Thus, it is clear that M(x̄) is independent of the choice of x̄
and hence M(x̄) = M for every solution x̄ of (CP ).

© 2012 by Taylor & Francis Group, LLC


190 Saddle Points, Optimality, and Duality

Note that the above result can be easily extended to the problem with
feasible set C defined by (4.9), that is, convex non-affine and affine inequalities
along with affine equalities. If we take a careful look at the set M, we realize
that for λ ∈ Rm + it is not essential that (CP ) has a solution; one merely
needs (CP ) to be bounded below. Thus Attouch, Buttazzo, and Michaille [3]
call the set M to be the set of generalized Lagrange multipliers. Of course
if (CP ) has a solution, then M is the set of Lagrange multipliers. We now
show how deeply the notion of Lagrange multipliers is associated with the
perturbation of the constraints of the problem. From a numerical point of
view, it is important to deal with constraint perturbations. Note that due to
rounding off and other errors, often the iterates do not satisfy the constraints
exactly but some perturbed version of it, that is, possibly in the form

gi (x) ≤ yi , i = 1, 2, . . . , m.

Thus, the function

v(y) = inf{f (x) : gi (x) ≤ yi , i = 1, 2, . . . , m}

is called the value function or the marginal function associated with (CP ). It
is obvious that if v(0) ∈ R, then v(0) is the optimal value of (CP ). We now
establish that v : Rm → R̄ is a convex function. In order to show that, we
need the following interesting and important lemma.

Lemma 4.11 Consider Φ : Rn × Rm → R ∪ {+∞}, which is convex in both


variables. Then the function

φ(v) = infn Φ(u, v)


u∈R

is a convex function in v.

Proof. Consider (vi , αi ) ∈ epis φ, i = 1, 2, that is,

φ(vi ) < αi , i = 1, 2.

Therefore, there exist ū1 , ū2 ∈ Rn such that by the definition of infimum,

Φ(ūi , vi ) < αi , i = 1, 2.

By the convexity of Φ, for every λ ∈ [0, 1],

Φ((1 − λ)ū1 + λū2 , (1 − λ)v1 + λv2 ) ≤ (1 − λ)Φ(ū1 , v1 ) + λΦ(ū2 , v2 )


< (1 − λ)α1 + λα2 ,

which implies

φ((1 − λ)v1 + λv2 ) < (1 − λ)α1 + λα2 , ∀ λ ∈ [0, 1].

© 2012 by Taylor & Francis Group, LLC


4.4 Lagrangian Duality 191

Thus

((1 − λ)v1 + λv2 ), (1 − λ)α1 + λα2 ) ∈ epis φ,

which by Proposition 2.50 leads to the convexity of φ. 


Observe that the value function can be expressed as

v(y) = infn {f (x) + δC(y) (x)}, (4.18)


x∈R

where C(y) = {x ∈ Rn : gi (x) ≤ yi , i = 1, 2, . . . , m}. Now to prove the


convexity of the value function, what one needs to show is that f (x)+δC(y) (x)
is convex in both the variables x as well as y, and we leave it to the reader.
Once that is done, we just have to use Lemma 4.11 to conclude that v is a
convex function.
Through the following result given in Attaouch, Buttazzo, and
Michaille [3], we show how the Lagrange multipliers (or the generalized La-
grange multipliers) are related to the value function.

Theorem 4.12 (i) Let v(0) ∈ R, then M = −∂v(0). Further, if the Slater
constraint qualification holds, then v is continuous at the origin and hence M
is convex compact set in Rm +.

(ii) Consider the problem


sup − v ∗ (−λ) subject to λ ∈ Rm
+. (DP 1)
The solutions of (DP 1) coincide with the set M. Further, for every λ ∈ Rm
+,

m
X
−v ∗ (−λ) = infn {f (x) + λi gi (x)}.
x∈R
i=1

Thus the problem (DP 1) coincides with the Lagrangian dual problem of (CP ).

Proof. (i) We begin by proving M = −∂v(0). Consider any λ ∈ M. By the


definition of the value function v (4.18) and M (4.17),
m
X
v(0) = infn {f + δC }(x) = infn {f + λi gi }(x).
x∈R x∈R
i=1

For any given y ∈ Rm , consider the set

C(y) = {x ∈ Rn : gi (x) ≤ yi , i = 1, 2, . . . , m}.

As λ ∈ Rm
+ , for any x ∈ C(y),

m
X m
X
λi gi (x) ≤ λi yi ,
i=1 i=1

© 2012 by Taylor & Francis Group, LLC


192 Saddle Points, Optimality, and Duality

which implies that


m
X m
X
f (x) + λi gi (x) ≤ f (x) + λi yi .
i=1 i=1

Therefore,
m
X m
X
inf {f + λi gi }(x) ≤ inf f (x) + λi yi . (4.19)
x∈C(y) x∈C(y)
i=1 i=1

Because C(y) ⊂ Rn , by Proposition 1.7,


m
X m
X
infn {f + λi gi }(x) ≤ inf {f + λi gi }(x).
x∈R x∈C(y)
i=1 i=1

As λ ∈ M, by (4.17) along with (4.19) leads to


m
X
v(0) ≤ v(y) + λi yi ,
i=1

that is,

v(y) ≥ v(0) + h−λ, y − 0i, ∀ y ∈ Rm .

This yields that −λ ∈ ∂v(0), thereby establishing that M ⊂ −∂v(0).


Conversely, suppose that λ ∈ −∂v(0), that is, −λ ∈ ∂v(0). We will prove
that λ ∈ M. Consider any y ∈ Rm + . Then it is easy to observe that C ⊂ C(y).
Again by Proposition 1.7,

inf f (x) ≥ inf f (x),


x∈C x∈C(y)

that is,

v(0) ≥ v(y), ∀ y ∈ Rm
+.

As −λ ∈ ∂v(0), which along with the above inequality leads to

hλ, yi ≥ v(0) − v(y) ≥ 0.

Because y ∈ Rm m
+ was arbitrary, it is clear that λ ∈ R+ . We now establish that
λ ∈ M by proving that
m
X
inf f (x) = infn {f + λi gi }(x),
x∈C x∈R
i=1

that is,
m
X
v(0) = infn {f + λi gi }(x).
x∈R
i=1

© 2012 by Taylor & Francis Group, LLC


4.4 Lagrangian Duality 193
Pm
Note that if x ∈ C, gi (x) ≤ 0, i = 1, 2, . . . , m. Then i=1 λi gi (x) ≤ 0 as
λi ≥ 0, i = 1, 2, . . . , m. Thus,
m
X
f (x) + λi gi (x) ≤ f (x), ∀ x ∈ C.
i=1

Therefore,
m
X m
X
infn {f + λi gi }(x) ≤ inf {f + λi gi }(x) ≤ inf f (x) = v(0). (4.20)
x∈R x∈C x∈C
i=1 i=1

The fact that −λ ∈ ∂v(0) leads to

v(y) + hλ, yi ≥ v(0), ∀ y ∈ Rm ,

that is, for every y ∈ Rm ,


m
X
v(y) + λi yi ≥ v(0).
i=1

Consider any x̃ ∈ Rn and set ỹ = gi (x̃), i = 1, 2, . . . , m. Therefore, the above


inequality leads to
m
X
v(ỹ) + λi gi (x̃) ≥ v(0).
i=1

By the definition (4.18) of value function v(ỹ) ≤ f (x̃), which along with the
above inequality leads to
m
X
f (x̃) + λi gi (x̃) ≥ v(0).
i=1

Because x̃ was arbitrary,


m
X
infn {f + λi gi }(x) ≥ v(0). (4.21)
x∈R
i=1

Combining (4.21) with (4.20),


m
X
v(0) = infn {f + λi gi }(x).
x∈R
i=1

Therefore, λ ∈ M and thus establishing that M = −∂v(0).


Now assume that v(0) is finite and the Slater constraint qualification holds,
that is, there exists x̂ ∈ Rn such that gi (x̂) < 0, i = 1, 2, . . . , m. Thus there

© 2012 by Taylor & Francis Group, LLC


194 Saddle Points, Optimality, and Duality

exists δ > 0 such that for every y ∈ Bδ (0) = δB, gi (x̂) < yi , i = 1, 2, . . . , m,
which implies that
v(y) ≤ f (x̂), ∀ y ∈ Bδ (0). (4.22)
As dom f = Rn , f (x̂) < +∞, thereby establishing that v is bounded above
on Bδ (0). This fact shows that

Bδ (0) × [f (x̂), +∞) ⊂ epi v.

We claim that v(y) > −∞ for every y ∈ Rm . On the contrary, assume that
there exists ŷ ∈ Rm such that v(ŷ) = −∞. Thus,

{ŷ} × R ⊂ epi v.

Consider z = −αŷ such that α > 0 and kzk < δ. This is possible by choosing
δ 1 1−λ
α= . Setting λ = , we have λ ∈ (0, 1) and α = . This implies
2kŷk 1+α λ
−(1 − λ)
that z = ŷ, that is,
λ
λz + (1 − λ)ŷ = 0.

By choice, z ∈ Bδ (0), which by (4.22) implies that v(z) ≤ f (x̂) and thus,

(z, f (x̂)) ∈ Bδ (0) × [f (x̂), +∞) ⊂ epi v.

Further, for every t ∈ R,

(ŷ, t) ∈ {ŷ} × R ⊂ epi v.

As v is convex, by Proposition 2.48, epi v is a convex set, which implies that

(λz + (1 − λ)ŷ, λf (x̂) + (1 − λ)t) ∈ epi v,

that is,

(0, λf (x̂) + (1 − λ)t) ∈ epi v.

Therefore,

v(0) ≤ λf (x̂) + (1 − λ)t, ∀ t ∈ R.

Taking the limit as t → −∞, v(0) ≤ −∞. But v(0) ≥ −∞ and hence
v(0) = −∞, which is a contradiction because v(0) ∈ R. By Theorem 2.72,
the function v : Rm → R ∪ {+∞} is majorized on a neighborhood of the ori-
gin and hence v is continuous at y = 0. Then by Proposition 2.82, ∂v(0) is
convex compact set, which implies so is M.
(ii) We already know that λ ∈ M if and only −λ ∈ ∂v(0). Therefore, from
Theorem 2.108,

−λ ∈ ∂v(0) ⇐⇒ v(0) + v ∗ (−λ) = 0,

© 2012 by Taylor & Francis Group, LLC


4.4 Lagrangian Duality 195

which implies λ ∈ M if and only if

v(0) + v ∗ (−λ) = 0.

From (i) we know that v is continuous at y = 0. Thus, by Proposition 2.106,


v(0) = v ∗∗ (0). By Definition 2.101 of the biconjugate,

v ∗∗ (0) = sup {−v ∗ (µ)} = sup {−v ∗ (−µ)}.


µ∈Rm µ∈Rm

Thus λ ∈ M if and only if

−v ∗ (−λ) = v ∗∗ (0) = sup {−v ∗ (−µ)},


µ∈Rm

which is equivalent to the fact that λ solves the problem

sup −v ∗ (µ) subject to µ ∈ Rm .

Observe that

v ∗ (µ) = sup {hµ, yi − v(y)}


y∈Rm
= sup {hµ, yi − inf {f (x) : gi (x) ≤ yi , i = 1, 2, . . . , m}}
y∈Rm x

= sup {hµ, yi + sup{−f (x) : gi (x) ≤ yi , i = 1, 2, . . . , m}}


y∈Rm x

= sup {hµ, yi − f (x) : gi (x) ≤ yi , i = 1, 2, . . . , m}.


(y,x)∈Rm ×Rn

If for some i ∈ {1, 2, . . . , m}, µi > 0, then v ∗ (µ) = +∞. So assume that
µ ∈ −Rm + . Then

m
X
v ∗ (µ) = sup {−f (x) + sup µi yi }.
x∈Rn yi ≥gi (x) i=1

Pm
As i=1 µi yi = hµ, yi is a linear function,
m
X m
X
sup µi yi = µi gi (x).
yi ≥gi (x) i=1 i=1

Hence, for µ ∈ −Rm


+,

Xm
v ∗ (µ) = sup { µi gi (x) − f (x)}.
x∈Rn i=1

In particular, for µ = −λ,


m
X
v ∗ (−λ) = sup {−(f (x) + λi gi (x))},
x∈Rn i=1

© 2012 by Taylor & Francis Group, LLC


196 Saddle Points, Optimality, and Duality

which implies
m
X

−v (−λ) = infn {f (x) + λi gi (x)}.
x∈R
i=1

Thus, −v ∗ (−λ) = w(λ), thereby showing that the dual problems (DP ) and
(DP 1) are the same. 

4.5 Fenchel Duality


In the last section it was clear that the notion of conjugation is linked to the
understanding of Lagrangian duality. In this section we explore this relation a
bit more. We will focus on Fenchel duality where the dual problem is expressed
explicitly in terms of the conjugate functions. Also we shall make a brief
presentation of Rockafellar’s perturbation approach to duality. Our approach
to Fenchel duality will be that of Borwein and Lewis [17], which we present
below.

Theorem 4.13 Consider proper convex functions f : Rn → R̄ and


g : Rm → R̄ and a linear map A : Rn → Rm . Let vF , dF ∈ R̄ be the opti-
mal values of the primal and the dual problems given below:

vF = infn {f (x) + g(Ax)} and dF = sup {−f ∗ (AT λ) − g ∗ (−λ)},


x∈R φ∈Rm

where AT denotes the conjugate of the linear map A or the transpose of the
matrix represented by A. In fact, A can be viewed as an m×n matrix. Assume
that the condition

0 ∈ core(dom g − A dom f )

holds. Then vF = dF and the supremum in the dual problem is attained if the
optimal value is finite. (Instead of the term core, one can also use interior or
relative interior.)

Proof. We first prove that vF ≥ dF , that is, the weak duality holds. By the
definition of conjugate function, Definition 2.101,

f ∗ (AT λ) = sup {hAT λ, xi − f (x)} ≥ hλ, Axi − f (x), ∀ x ∈ Rn ,


x∈Rn

which implies

f (x) ≥ hλ, Axi − f ∗ (AT λ), ∀ x ∈ Rn .

© 2012 by Taylor & Francis Group, LLC


4.5 Fenchel Duality 197

Similarly, we have

g(Ax) ≥ −hλ, Axi − g ∗ (−λ), ∀ x ∈ Rn .

The above inequalities immediately show that for any λ ∈ Rm and any x ∈ Rn ,

f (x) + g(Ax) ≥ −f ∗ (AT λ) − g ∗ (−λ).

Thus, the above inequality it yields that vF ≥ dF .


Next, to prove the equality under the given constraint qualification, define
the function h : Rm → R̄ as

h(y) = infn {f (x) + g(Ax + y)}.


x∈R

In the parlance of optimization, h is referred to as the optimal value function or


just a value function. Here the vector y acts as a parameter. See the previous
section for more details. Using Lemma 4.11, it is easy to observe that h is
convex. We urge the reader to reason out for himself / herself. Further, one
must decide what dom h is. We claim that

dom h = dom g − A dom f.

Consider y ∈ dom h, that is, h(y) < +∞. Hence there exists x ∈ Rn such that
x ∈ dom f and Ax + y ∈ dom g, which leads to

y ∈ dom g − A dom f.

This holds for every y ∈ dom h and thus

dom h ⊂ dom g − A dom f.

Let z ∈ dom g − A dom f , which implies that there exists u ∈ dom g and
x̂ ∈ dom f such that z = u − Ax̂. Hence z + Ax̂ ∈ dom g, that is,

f (x̂) + g(z + Ax̂) < +∞.

Thus h(z) < +∞, thereby showing that z ∈ dom h. This proves the assertion
toward the domain of h.
Note that if vF = −∞, there is nothing to prove. Without loss of gen-
erality, we assume that vF is finite. By assumption, 0 ∈ core(dom h) (or
0 ∈ int(dom h)). By Proposition 2.82, ∂h(0) 6= ∅, which implies that there
exists −ξ ∈ ∂h(0). Thus, by Definition 2.77 of the subdifferential along with
the definition of h,

h(0) ≤ h(y) + hξ, yi ≤ f (x) + g(Ax + y) + hξ, yi, ∀ y ∈ Rm .

Hence,

h(0) ≤ {f (x) − hA∗ ξ, xi} + {g(Ax + y) − h−ξ, Ax + yi}.

© 2012 by Taylor & Francis Group, LLC


198 Saddle Points, Optimality, and Duality

Taking the infimum first over y and then over x yields that

h(0) ≤ −f ∗ (A∗ ξ) − g ∗ (−ξ) ≤ dF ≤ vF ≤ h(0),

thereby establishing that vF = dF . Observe that the dual value is obtained at


λ = ξ. 
It is important to mention that the above problem was also studied by
Rockafellar [97]. In Rockafellar [97], the function g is taken to be a concave
function, Definition 2.46, and the objective function of the primal problem
and the dual problem are, respectively, given as

f (x) − g(Ax) and g∗ (λ) − f ∗ (AT λ).

Further, g∗ denotes the conjugate of the concave function g, which is defined


as

g∗ (λ) = infm {hλ, yi − g(y)}.


y∈R

From the historical point of view, we provide the statement of the classical
Fenchel duality theorem as it appears in Rockafellar [97].

Theorem 4.14 Consider a proper convex function f : Rn → R̄ and a proper


concave function g : Rn → R̄. Assume that one of the following conditions
holds:
(1) ri(dom f ) ∩ ri(dom g) 6= ∅,
(2) f and g are lsc and ri(dom f ∗ ) ∩ ri(dom g∗ ) 6= ∅.
Then

inf {f (x) − g(x)} = sup {g∗ (λ) − f ∗ (λ)}. (4.23)


x∈Rn λ∈Rn

We request the readers to figure out how one will define the notion of a
proper concave function. Of course, if we consider g to be a convex function,
(4.23) can be written as

inf {f (x) + g(x)} = sup {−g ∗ (−λ) − f ∗ (λ)}.


x∈Rn λ∈Rn

Note that this can be easily proved using Theorem 4.13 by taking A to be
the identity mapping I : Rn → Rn . Moreover, ri(dom f ) ∩ ri(dom g) 6= ∅
shows that 0 ∈ int(dom g − dom f ). Hence the result follows by invoking
Theorem 4.13.
We now look into the perturbation-based approach. This approach is due
to Rockafellar. Rockafellar’s monogarph [99] entitled Conjugate Duality and
Optimization makes a detailed study of this method in an infinite dimen-
sional setting. We however discuss the whole issue from a finite dimensional

© 2012 by Taylor & Francis Group, LLC


4.5 Fenchel Duality 199

viewpoint. In this approach, one considers the original problem being embed-
ded in a family of problems. In fact, we begin by considering the convexly
parameterized family of convex problems
min F (x, y) subject to x ∈ Rn , (CP (y))
where the vector y is called the parameter and the function F : Rn × Rm → R̄
is assumed to be proper convex jointly in x and y. In fact, in such a situation,
the optimal value function

v(y) = inf F (x, y)


x∈C

is a convex function by Lemma 4.11. Of course, the function F is so chosen


that f0 (x) = F (x, 0), where

f (x), x ∈ C,
f0 (x) =
+∞, otherwise.

In fact, (CP ) can be viewed as

min f0 (x) subject to x ∈ Rn ,

thus embedding the original problem (CP ) in (CP (y)).


Now we pose the standard convex optimization problem as (CP (y)). Con-
sider the problem (CP ) with C given by (3.1), that is,

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},

where gi : Rn → R, i = 1, 2, . . . , m, are convex functions. Corresponding to


(CP ), introduce the family of parameterized problems (CP (y)) as follows

min F (x, y) subject to x ∈ Rn ,

where

f (x), gi (x) ≤ yi , i = 1, 2, . . . , m,
F (x, y) =
+∞, otherwise.
It is clear that

f (x), gi (x) ≤ 0, i = 1, 2, . . . , m,
F (x, 0) = f0 (x) =
+∞, otherwise.

Recall that the Lagrangian function corresponding to (CP ) is given by



f (x) + hλ, g(x)i, λ ∈ Rm
+,
L(x, λ) =
+∞, otherwise.

Next we look at how to construct the dual problem for (CP (y)). Define the
Lagrangian function L : Rn × Rm → R̄ as

L(x, λ) = infm {F (x, y) + hy, λi},


y∈R

© 2012 by Taylor & Francis Group, LLC


200 Saddle Points, Optimality, and Duality

that is,

−L(x, λ) = sup {hy, λi − F (x, y)}.


y∈Rm

Observe that

F ∗ (x∗ , λ∗ ) = sup {hx∗ , xi + hλ∗ , yi − F (x, y)}


x∈Rn ,y∈Rm
= sup {hx , xi + sup (hλ∗ , yi − F (x, y))}

x∈Rn y∈Rm
= sup {hx , xi − L(x, λ∗ )}.

x∈Rn

Thus,

−F ∗ (0, λ∗ ) = infn L(x, λ∗ ).


x∈R

Hence the Fenchel dual problem associated with (CP ) is


sup (−F ∗ (0, λ)) subject to λ ∈ Rm . (DPF )
With the given Lagrangian in a similar fashion as before, one can define a
saddle point (x, λ) of the Lagrangian function L(x, λ).
We now state without proof the following result. For proof, see for example
Lucchetti [79] and Rockafellar [99].

Theorem 4.15 Consider the problem (CP ) and (DPF ) as given above. Then
the following are equivalent:

(i) (x̄, λ̄) be a saddle point of L,

(ii) x̄ is a solution for (CP ) and λ̄ is a solution for (DPF ) and there is no
duality gap.

For more details on the perturbation-based approach, see Lucchetti [79]


and Rockafellar [99].

4.6 Equivalence between Lagrangian and Fenchel Dual-


ity
In the previous sections, we studied two types of duality theories, namely the
Lagrangian duality and the Fenchel duality. The obvious question that comes
to mind is whether the two theories are equivalent or not. It was shown by
Magnanti [81] that for a convex programming problem, both these forms of

© 2012 by Taylor & Francis Group, LLC


4.6 Equivalence between Lagrangian and Fenchel Duality 201

duality coincide. We end this chapter by taking a look at the equivalence


between the two duality theories based on the approach of Magnanti [81].
Consider the following convex programming problems:

Lagrange: inf f (x) subject to x ∈ C,


Fenchel: inf (f1 (x) − f2 (x)) subject to x ∈ C1 ∩ C2 ,

where

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m,
hj (x) = 0, j = 1, 2, . . . , l, x ∈ X},

f : X → R, f1 : C1 → R are convex functions; f2 : C2 → R is a con-


cave function; gi : X → R, i = 1, 2, . . . , m, are convex non-affine functions;
hj : Rn → R, j = 1, 2, . . . , l, are affine functions; and C1 , C2 , X are convex
subsets of Rn . Denote the optimal values of the Lagrangian and the Fenchel
convex programming problems as vL and vF , respectively. Observe that the
Lagrangian problem is a particular case of (CP ) with C given by (4.9). Cor-
responding to the two convex programming problem, we have the following
dual problems:

Lagrange: sup inf L(x, λ, µ) subject to (λ, µ) ∈ Rm l


+ ×R ,
x∈X
Fenchel: sup ((f2 )∗ (ξ) − f1∗ (ξ)) subject to ξ ∈ Rn ,

where the Lagrangian function L : Rn × Rm l


+ × R is defined as

m
X l
X
L(x, λ, µ) = f (x) + λi gi (x) + µj hj (x).
i=1 j=1

As fi are defined over Ci , that is, dom fi = Ci for i = 1, 2, the conjugate


functions reduce to

f1∗ (ξ) = sup {hξ, xi − f1 (x)},


x∈C1
(f2 )∗ (ξ) = inf {hξ, xi − f2 (x)}.
x∈C2

Denote the optimal values of the Lagrangian and the Fenchel dual problems
as dL and dF , respectively. Note that f1∗ (ξ) = +∞ for some ξ ∈ Rn is a pos-
sibility. Similarly, for the concave conjugate, (f2 )∗ (ξ) = −∞ for some ξ ∈ Rn
is also a possibility. But these values play no role in the Fenchel dual problem
and thus the problem may be considered as

Fenchel: sup ((f2 )∗ (ξ) − f1∗ (ξ)) subject to ξ ∈ C1∗ ∩ C2∗ ,

where

C1∗ = {ξ ∈ Rn : f1∗ (ξ) < +∞} and C2∗ = {ξ ∈ Rn : (f2 )∗ (ξ) > −∞}.

© 2012 by Taylor & Francis Group, LLC


202 Saddle Points, Optimality, and Duality

By Theorem 4.7 and Theorem 4.14, we have the strong duality results for the
Lagrangian and the Fenchel problems, respectively, that is,
vL = d L and vF = d F .
Now we move on to show that the two strong dualities are equivalent. But
before doing so we present a result from Magnanti [81] on relative interior.
Lemma 4.16 Consider the set
Λ = {(y0 , y, z) ∈ R1+m+l : there exists x ∈ X such that f (x) ≤ y0 ,
gi (x) ≤ yi , i = 1, 2, . . . , m,
hj (x) = zj , j = 1, 2, . . . , l}.
If x̂ ∈ ri X such that
f (x̂) < ŷ0 , gi (x̂) < ŷi , i = 1, 2, . . . , m, and hj (x̂) = ẑj , j = 1, 2, . . . , l,
then (ŷ0 , ŷ, ẑ) ∈ ri Λ.
Proof. By the convexity of the functions f , gi , i = 1, 2, . . . , m, and hj ,
j = 1, 2, . . . , l, and the set X, it is easy to observe that the set Λ is con-
vex. It is left to the reader to verify this fact. To prove the result, we will
invoke the Prolongation Principle, Proposition 2.14 (iii).
Consider (y0 , y, z) ∈ Λ, that is, there exists x ∈ X such that
f (x) ≤ y0 , gi (x) ≤ yi , i = 1, 2, . . . , m, and hj (x) = zj , j = 1, 2, . . . , l.
n
Because X ⊂ R is a nonempty convex set and x̂ ∈ ri X, by the Prolongation
Principle, there exists γ > 1 such that
γ x̂ + (1 − γ)x ∈ X,
which by the convexity of X yields that
αx̂ + (1 − α)x ∈ X, ∀ α ∈ (1, γ]. (4.24)
As dom f = dom gi = X, i = 1, 2, . . . , m with x̂ ∈ ri X, for some α ∈ (1, γ],
f (αx̂ + (1 − α)x) < αŷ0 + (1 − α)y0 , (4.25)
gi (αx̂ + (1 − α)x) < αŷi + (1 − α)yi , i = 1, 2, . . . , m. (4.26)
By the affineness of hj , j = 1, 2, . . . , l,
hj (αx̂ + (1 − α)x) < αẑj + (1 − α)zj , j = 1, 2, . . . , l. (4.27)
Combining the conditions (4.24) through (4.27) yields that for α > 1,
α(ŷ0 , ŷ, ẑ) + (1 − α)(y0 , y, z) ∈ Λ.
Because (y0 , y, z) ∈ Λ was arbitrary, by the Prolongation Principle,
(ŷ0 , ŷ, ẑ) ∈ ri Λ as desired. 
Now we present the equivalence between the strong duality results.

© 2012 by Taylor & Francis Group, LLC


4.6 Equivalence between Lagrangian and Fenchel Duality 203

Theorem 4.17 Lagrangian strong duality is equivalent to Fenchel strong du-


ality, that is, Theorem 4.7 is equivalent to Theorem 4.14.
Proof. Suppose that the Lagrangian strong duality, Theorem 4.7, holds under
the assumption of modified Slater constraint qualification, that is, there exists
x̂ ∈ ri X such that
gi (x̂) < 0, i = 1, 2, . . . , m, and hj (x̂) = 0, j = 1, 2, . . . , l.
Define X = C1 × C2 × Rn and x = (x1 , x2 , x3 ). The Fenchel convex program-
ming problem can now be expressed as
vF = inf (f1 (x1 ) − f2 (x2 )),
x∈C

where
C = {x ∈ R3n : hrj (x) = (xj − x3 )r = 0, j = 1, 2, r = 1, 2, . . . , n, x ∈ X}.
Observe that here hj : Rn → Rn . Note that the reformulated Fenchel problem
is nothing but the Lagrangian convex programming problem. The correspond-
ing Lagrangian dual problem is as follows:
n
X n
X
dL = sup inf {f1 (x1 ) − f2 (x2 ) + µr1 (x1 − x3 )r + µr2 (x2 − x3 )r },
(µ1 ,µ2 )∈R2n x∈X r=1 r=1

that is,
dL = sup inf {f1 (x1 ) − f2 (x2 ) + hµ1 , x1 i + hµ2 , x2 i
(µ1 ,µ2 )∈R2n x∈X

−hµ1 + µ2 , x3 i}. (4.28)


From the assumption of Theorem 4.14,
ri(dom f1 ) ∩ ri(dom f2 ) = ri C1 ∩ ri C2 6= ∅,
which implies there exists x̂ ∈ Rn such that x̂ ∈ ri C1 ∩ ri C2 . Therefore,
x = (x̂, x̂, x̂) ∈ ri X such that hrj (x) = 0, j = 1, 2, r = 1, 2, . . . , n; thereby
implying that the modified Slater constraint qualification holds. Invoking the
Lagrangian strong duality, Theorem 4.7,
vF = d L , (4.29)
it is easy to note that if µ1 6= −µ2 , the infimum is −∞ as x3 ∈ Rn . So taking
the supremum over µ = −µ1 = µ2 along with Proposition 1.7, the Lagrangian
dual problem (4.28) leads to
dL = sup inf {f1 (x1 ) − f2 (x2 ) − hµ, x1 i + hµ, x2 i}
µ∈Rn (x1 ,x2 )∈C1 ×C2
= sup { inf (hµ, x2 i − f2 (x2 )) + inf (f1 (x1 ) − hµ, x1 i)}
µ∈Rn x2 ∈C2 x1 ∈C1

= sup {(f2 )∗ (µ) − f1∗ (µ)},


µ∈Rn

© 2012 by Taylor & Francis Group, LLC


204 Saddle Points, Optimality, and Duality

thereby implying that dL = dF . This along with the relation (4.29) yields that
vF = dF and hence the Fenchel strong duality holds.
Conversely, suppose that the Fenchel strong duality holds under the as-
sumption that ri C1 ∩ ri C2 6= ∅. Define

C1 = {(y0 , y, z) ∈ R1+m+l : there exists x ∈ X such that f (x) ≤ y0 ,


gi (x) ≤ yi , i = 1, 2, . . . , m,
hj (x) = zj , j = 1, 2, . . . , l}

and

C2 = {(y0 , y, z) ∈ R1+m+l : yi ≤ 0, i = 1, 2, . . . , m, zj = 0, j = 1, 2, . . . , l}.

The Lagrange convex programming problem can now be expressed as

vL = inf{y0 : (y0 , y, z) ∈ C1 ∩ C2 },

which is of the form of the Fenchel problem with f1 (y0 , y, z) = y0 and


f2 (y0 , y, z) = 0. The corresponding Fenchel dual problem is

dF = sup{((f2 )∗ (ξ) − f1∗ (ξ)) : ξ = (λ0 , λ, µ) ∈ R1+m+l }


= sup { inf {λ0 y0 + hλ, yi + hµ, zi}
(λ0 ,λ,µ)∈R1+m+l (y0 ,y,z)∈C2

− sup {λ0 y0 + hλ, yi + hµ, zi − y0 }


(y0 ,y,z)∈C1

= sup { inf {y0 − λ0 y0 − hλ, yi − hµ, zi}


(λ0 ,λ,µ)∈R1+m+l (y0 ,y,z)∈C1

+ inf {λ0 y0 + hλ, yi + hµ, zi}}. (4.30)


(y0 ,y,z)∈C2

By the assumption of Theorem 4.7, the modified Slater constraint qualifica-


tion holds, which implies that there exists x̂ ∈ ri X such that gi (x̂) < 0,
i = 1, 2, . . . , m, and hj (x̂) = 0, j = 1, 2, . . . , l. As dom gi = X, i = 1, 2, . . . , m,
by Theorem 2.69, gi , i = 1, 2, . . . , m, is continuous on ri X. Therefore, there
exists ŷi < 0 such that gi (x̂) < ŷi < 0, i = 1, 2, . . . , m. Also, as dom f = X
with x̂ ∈ ri X, one may choose ŷ0 ∈ R such that f (x̂) < ŷ0 . Thus, for x̂ ∈ ri X,

f (x̂) < ŷ0 , gi (x̂) < ŷi , i = 1, 2, . . . , m, and hj (x̂) = ẑj , j = 1, 2, . . . , l,

where ẑj = 0, j = 1, 2, . . . , l. By Lemma 4.16, (ŷ0 , ŷ, ẑ) ∈ ri C1 ∩ ri C2 , which


implies ri C1 ∩ ri C2 6= ∅. Thus, by Theorem 4.14,

vL = d F . (4.31)

From the definition of C2 , the second infimum in (4.30) reduces to


m
X
inf{λ0 y0 + λi yi : y0 ∈ R, yi ≤ 0, i = 1, 2, . . . , m}.
i=1

© 2012 by Taylor & Francis Group, LLC


4.6 Equivalence between Lagrangian and Fenchel Duality 205

The infimum is −∞ if either λ0 6= 0 or λi > 0 for some i = 1, 2, . . . , m and


takes the value 0 otherwise. Therefore, the Fenchel dual problem becomes

dF = sup inf {y0 − hλ, yi − hµ, zi}


(λ,µ)∈Rm l (y0 ,y,z)∈C1
− ×R

m
X l
X
= sup inf {y0 + λi yi + µj zj }
(λ,µ)∈Rm l (y0 ,y,z)∈C1
+ ×R i=1 j=1
m
X l
X
= sup inf {f (x) + λi gi (x) + µj hj (x)},
(λ,µ)∈Rm l x∈X
+ ×R i=1 j=1

which yields that dF = dL . This along with (4.31) implies that vL = dL ,


thereby establishing the Lagrangian strong duality. 

© 2012 by Taylor & Francis Group, LLC


Chapter 5
Enhanced Fritz John Optimality
Conditions

5.1 Introduction
Until now we have studied how to derive the necessary KKT optimality con-
ditions for convex programming problems (CP ) or its slight variations such
as (CP 1), (CCP ) or (CCP 1) via normal cone or saddle point approach. Ob-
serve that in the KKT optimality conditions, the multiplier associated with
the subdifferential of the objective function is nonzero and thus normalized
to one. As discussed in Chapters 3 and 4, some additional conditions known
as the constraint qualifications are to be satisfied by the constraints to ensure
that the multiplier is nonzero and hence the KKT optimality conditions hold.
But in absence a of constraint qualification, one may not be able to derive
KKT optimality conditions. For example, consider the problem

min x subject to x2 ≤ 0.

In this example, f (x) = x and g(x) = x2 with C = {0} at which none of


the constraint qualifications is satisfied. Observe that the KKT optimality
conditions is also not satisfied at the point of minimizer x̄ = 0, the only
feasible point, as there do not exist λ0 = 1 and λ ≥ 0 such that

λ0 ∇f (x̄) + λ∇g(x̄) = 0 and λg(x̄) = 0.

In this chapter, we will consider the convex programming problem


min f (x) subject to x ∈ C (CP 1)
where (CP 1) which involves not only inequality constraints but also additional
abstract constraints, that is,

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X}.

Here f, gi : Rn → R, i = 1, 2, . . . , m, are convex functions on Rn and X ⊂ Rn


is a closed convex set. Below we present the standard Fritz John optimality
conditions for (CP 1).

207

© 2012 by Taylor & Francis Group, LLC


208 Enhanced Fritz John Optimality Conditions

Theorem 5.1 Consider the convex programming problem (CP 1) and let x̄ be
a point of minimizer of (CP 1). Then there exist λi ≥ 0 for i = 0, 1, . . . , m,
not all simultaneously zero, such that
m
X
0 ∈ λ0 ∂f (x̄) + λi ∂gi (x̄) + NX (x̄) and λi gi (x̄) = 0, ∀ i = 1, 2, . . . , m.
i=1

Proof. As x̄ is a point of minimizer of (CP 1), it is a point of minimizer of


the problem

min F (x) subject to x ∈ X,

where F (x) = max{f (x) − f (x̄), g1 (x), g2 (x), . . . , gm (x)} is a convex function.
Therefore by the optimality condition (ii) of Theorem 3.1,

0 ∈ ∂F (x̄) + NX (x̄).

Applying the Max-Function Rule, XTheorem 2.96, there exist λ0 ≥ 0 and


λi ≥ 0, i ∈ I(x̄), satisfying λ0 + λi = 1 such that
i∈I(x̄)

X
0 ∈ λ0 ∂f (x̄) + λi ∂gi (x̄) + NX (x̄),
i∈I(x̄)

where I(x̄) = {i ∈ {1, 2, . . . , m} : gi (x̄) = 0} is the active index set at x̄. For
i∈
/ I(x̄), defining λi = 0 yields
m
X
0 ∈ λ0 ∂f (x̄) + λi ∂gi (x̄) + NX (x̄)
i=1

along with λi gi (x̄) = 0, i = 1, 2, . . . , m, hence completing the proof. 


Note that in the example considered earlier, the Fritz John optimality
condition holds if one takes λ0 = 0 and λ > 0. Observe that the Fritz John
optimality conditions are only necessary and not sufficient. To study the suf-
ficiency optimality conditions, one needs KKT optimality conditions.

5.2 Enhanced Fritz John Conditions Using the


Subdifferential
Recently, Bertsekas [11, 12] studied the Fritz John optimality conditions,
which are more enhanced than those stated above and hence called them en-
hanced Fritz John optimality conditions. The proof of the enhanced Fritz John

© 2012 by Taylor & Francis Group, LLC


5.2 Enhanced Fritz John Conditions Using the Subdifferential 209

optimality condition involves the combination of the quadratic penalty func-


tion and metric approximation approaches. The penalty function approach
is an important theoretical as well as algorithmic method in the study of
constrained programming problems. Corresponding to the given problem, a
sequence of the unconstrained penalized problem is formulated and in the lim-
iting scenario, the sequence of point of minimizers of the penalized problem
converges to the point of minimizer of the original constrained problem. The
approach of metric approximations was introduced by Mordukhovich [84, 85].
This approach involves approximating the objective function and the con-
straint functions by smooth functions and reducing the constrained into an
unconstrained problem. The work of Bertsekas [11, 12] was based mostly on
the work of Hestenes [55], which was in turn motivated by the penalty function
approach of McShane [83] to establish the Fritz John optimality conditions.
It was the work of Hestenes [55] in which the complementary slackness was
strengthened to obtain a somewhat weaker condition than the complemen-
tary violation condition, which we will discuss in the subsequent derivation of
enhanced Fritz John optimality condition. In their works, McShane [83] and
Hestenes [55] considered X = Rn while Bertsekas extended the study when
X 6= Rn . Below we discuss the above approach to establish the enhanced Fritz
John optimality conditions for the convex programming problem (CP 1).

Theorem 5.2 Consider the convex programming problem (CP 1) and let x̄ be
the point of minimizer of (CP 1). Then there exist λi ≥ 0 for i = 0, 1, . . . , m,
not all simultaneously zero, such that
m
X
(i) 0 ∈ λ0 ∂f (x̄) + λi ∂gi (x̄) + NX (x̄).
i=1

(ii) Consider the index set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0}. If I¯ 6= ∅, then


there exists a sequence {xk } ⊂ X that converges to x̄ and is such that
for all k sufficiently large,

f (xk ) < f (x̄) and ¯


λi gi (xk ) > 0, ∀ i ∈ I.

Proof. For k = 1, 2, . . ., consider the penalized problem


min Fk (x) subject to x ∈ X ∩ cl Bε (x̄), (Pk )
where ε > 0 is such that f (x̄) ≤ f (x) for every x ∈ cl Bε (x̄) feasible to (CP 1).
The function Fk : Rn → R is defined as
m
kX + 1
Fk (x) = f (x) + (g (x))2 + kx − x̄k2 ,
2 i=1 2

where g + (x) = max{0, g(x)}. By the convexity of the functions f and


gi , i = 1, 2, . . . , m, Fk is a real-valued convex on Rn . As dom Fk = Rn ,
by Theorem 2.69, Fk is continuous on Rn . Also, as X is a closed convex set

© 2012 by Taylor & Francis Group, LLC


210 Enhanced Fritz John Optimality Conditions

and cl Bε (x̄) is a compact convex set, X ∩ cl Bε (x̄) is a compact convex subset


of Rn . By the Weierstrass Theorem, Theorem 1.14, there exists a point of
minimizer xk for the problem (Pk ). Therefore,

Fk (xk ) ≤ Fk (x̄), ∀ k ∈ N,

which implies
m
kX + 1
f (xk ) + (g (xk ))2 + kxk − x̄k2 ≤ f (x̄), ∀ k ∈ N. (5.1)
2 i=1 i 2

Because dom f = Rn , again by Theorem 2.69, f is continuous on Rn . Hence,


it is continuous on X ∩ cl Bε (x̄) and thus bounded over X ∩ cl Bε (x̄). By the
boundedness of f (xk ) over X ∩ cl Bε (x̄) and the relation (5.1), we have

lim gi+ (xk ) = 0, i = 1, 2, . . . , m. (5.2)


k→∞

Otherwise as k → +∞, the left-hand side of (5.1) also tends to infinity, which
is a contradiction.
As {xk } is a bounded sequence, by the Bolzano–Weierstrass Theorem,
Proposition 1.3, it has a convergent subsequence. Without loss of generality,
assume that {xk } converge to x̃ ∈ X ∩ cl Bε (x̄). By the condition (5.2),

gi (x̃) ≤ 0, i = 1, 2, . . . , m,

and hence x̃ is feasible for (CP 1). Taking the limit as k → +∞ in the condition
(5.1) yields
1
f (x̃) + kx̃ − x̄k2 ≤ f (x̄).
2
As x̄ is the point of minimizer of (CP 1) and x̃ is feasible to (CP 1),
f (x̄) ≤ f (x̃). Thus the above inequality reduces to

kx̃ − x̄k2 ≤ 0,

which implies kx̃ − x̄k = 0, that is, x̃ = x̄. Hence, the sequence xk → x̄ and
thus there exists a k̄ ∈ N such that xk ∈ ri X ∩ Bε (x̄) for every k ≥ k̄.
For k ≥ k̄, xk is a point of minimizer of the penalized problem (Pk ), which
by Theorem 3.1 implies that

0 ∈ ∂Fk (xk ) + NX∩Bε (x̄) (xk ).

As xk ∈ ri X ∩ Bε (x̄), by Proposition 2.39,

NX∩Bε (x̄) (xk ) = NX (xk ) + NBε (x̄) (xk ).

Again, because xk ∈ Bε (x̄), by Example 2.38, NBε (x̄) (xk ) = {0} and thus

0 ∈ ∂Fk (xk ) + NX (xk ), ∀ k ≥ k̄.

© 2012 by Taylor & Francis Group, LLC


5.2 Enhanced Fritz John Conditions Using the Subdifferential 211

As dom f = dom gi = Rn , i = 1, 2, . . . , m, by Theorem 2.69, f and gi ,


i = 1, 2, . . . , m are continuous on Rn . Applying the Sum Rule and the Chain
Rule for the subdifferentials, Theorems 2.91 and 2.94, respectively, the above
condition becomes
m
X
0 ∈ ∂f (xk ) + k gi+ (xk )∂gi+ (xk ) + (xk − x̄) + NX (xk ), ∀ k ≥ k̄,
i=1

which implies that for every k ≥ k̄, there exist ξ0k ∈ ∂f (xk ) and ξik ∈ ∂gi (xk ),
i = 1, 2, . . . , m, such that
m
X
−{ξ0k + αik ξik + (xk − x̄)} ∈ NX (xk ), (5.3)
i=1

where αik = kβk gi+ (xk ) and βk ∈ [0, 1] for i = 1, 2, . . . , m. Denote


v
u m
u X 1 αk
k
γ = 1+ t (αik )2 , λk0 = k and λki = ki , i = 1, 2, . . . , m. (5.4)
i=1
γ γ

Observe that
m
X
(λk0 )2 + (λki )2 = 1, ∀ k ≥ k̄. (5.5)
i=1

Therefore, the sequences {λk0 } and {λki }, i = 1, 2, . . . , m, are bounded se-


quences in R+ and thus by the Bolzano–Weierstrass Theorem, Proposition 1.3
have a convergent subsequence. Without loss of generality, let λki → λi ,
i = 0, 1, . . . , m. As αik ≥ 0, i = 1, 2, . . . , m and γ k ≥ 1 for every k ≥ k̄,
λki ≥ 0 and thereby implying that λi ≥ 0, i = 0, 1, . . . , m. Also by condi-
tion (5.5), it is obvious that λ0 , λ1 , . . . , λm are not simultaneously zero. Now
dividing (5.3) by γ k leads to
m
X 1
−{λk0 ξ0k + λki ξik + (xk − x̄)} ∈ NX (xk ). (5.6)
i=1
γk

As f and gi , i = 1, 2, . . . , m are continuous at xk ∈ Rn , therefore by


Proposition 2.82, ∂f (xk ) and ∂gi (xk ), i = 1, 2, . . . , m, are compact. Thus
{ξik }, i = 0, 1, . . . , m, are bounded sequences in Rn and hence by the Bolzano–
Weierstrass Theorem have a convergent subsequence. Without loss of general-
ity, let ξik → ξi , i = 0, 1, . . . , m. By the Closed Graph Theorem, Theorem 2.84,
of the subdifferentials, ξ0 ∈ ∂f (x̄) and ξi ∈ ∂gi (x̄) for i = 1, 2, . . . , m.
Taking the limit as k → +∞ in (5.6) along with the fact that the normal
cone NX has a closed graph yields
m
X
−{λ0 ξ0 + λi ξi } ∈ NX (x̄),
i=1

© 2012 by Taylor & Francis Group, LLC


212 Enhanced Fritz John Optimality Conditions

which implies
m
X
0 ∈ λ0 ∂f (x̄) + λi ∂gi (x̄) + NX (x̄),
i=1

thereby establishing condition (i).


Now suppose that the index set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is non-
empty. For i ∈ I,¯ corresponding to λi > 0, there exists a sequence λk → λi .
i
¯ By
Therefore, for all k sufficiently large, λki > 0 and hence λi λki > 0 for i ∈ I.
the condition (5.4), λi gi+ (xk ) > 0 for sufficiently large k, which implies
¯
λi gi (xk ) > 0, ∀ i ∈ I.

Also, by condition (5.1), f (xk ) < f (x̄) for sufficiently large k and hence con-
dition (ii) is satisfied, thereby yielding the requisite result. 
Observe that the condition (ii) in the above theorem is a condition that
replaces the complementary slackness condition in the Fritz John optimal-
ity condition. According to the condition (ii), if λi > 0, the corresponding
constraint gi is violated at the points arbitrarily close to x̄. Thus the con-
dition (ii) is called the complementary violation condition by Bertsekas and
Ozdaglar [13].
Now let us consider, in particular, X = Rn and gi , i = 1, 2, . . . , m,
to be affine in (CP 1). Then from the above theorem there exist λi ≥ 0,
i = 0, 1, . . . , m, not all simultaneously zero, such that conditions (i) and (ii)
hold. Due to affineness of gi , i = 1, 2, . . . , m, we have

gi (x) = gi (x̄) + h∇gi (x̄), x − x̄i, ∀ x ∈ Rn . (5.7)

Suppose that λ0 = 0. Then by condition (i) of Theorem 5.2,


m
X
0= λi ∇gi (x̄),
i=1

which implies that


m
X
0= λi h∇gi (x̄), x − x̄i.
i=1

As all the scalars cannot be all simultaneously zero, the index set
I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. By condition (ii), there exists
a sequence {xk } ⊂ Rn such that gi (xk ) > 0 for i ∈ I. ¯ Therefore, by (5.7),
which along with the above condition for x = xk , leads to
m
X m
X X
λi gi (x̄) = λi gi (xk ) = λi gi (xk ) > 0,
i=1 i=1 i∈I¯

© 2012 by Taylor & Francis Group, LLC


5.2 Enhanced Fritz John Conditions Using the Subdifferential 213
¯ thereby contradicting the feasi-
which implies that gi (x̄) > 0 for some i ∈ I,
bility of x̄. Thus λ0 > 0 and hence can be normalized to one, thereby leading
to the KKT optimality condition.
Observe that in the case as discussed above, the KKT optimality condition
holds without any assumption of constraint qualification. But if the convex
programming problem is not of the above type, to ensure that λ0 6= 0, one has
to impose some form of constraint qualification. In view of the enhanced Fritz
John optimality conditions, Bertsekas [12] introduced the notion of pseudo-
normality, which is defined as follows.

Definition 5.3 A feasible point x̄ of (CP 1) is said to be pseudonormal if


there does not exist any λi , i = 1, 2, . . . , m, and sequence {xk } ⊂ X such that
m
X
(i) 0 ∈ λi ∂gi (x̄) + NX (x̄)
i=1

(ii) λi ≥ 0, i = 1, 2, . . . , m and λi = 0 for i 6∈ I(x̄). Recall that


I(x̄) = {i ∈ {1, 2, . . . , m} : gi (x̄) = 0} denotes the active index set at x̄.
(iii) {xk } converges to x̄ and
m
X
λi gi (xk ) > 0, ∀ k ∈ N.
i=1

Below we present a result to show how the affineness of gi , i = 1, 2, . . . , m,


or the Slater-type constraint qualification ensure the pseudonormality at a
feasible point.

Theorem 5.4 Consider the problem (CP 1) and let x̄ be a feasible point of
(CP 1). Then x̄ is pseudonormal under either one of the following two criteria:

(a) Linearity criterion: X = Rn and the functions gi , i = 1, 2, . . . , m, are


affine.
(b) Slater-type constraint qualification: there exists a feasible point x̂ ∈ X
of (CP 1) such that gi (x̂) < 0, i = 1, 2, . . . , m.

Proof. (a) Suppose on the contrary that x̄ is not pseudonormal, which implies
that there exist λi , i = 1, 2, . . . , m, and {xk } ⊂ Rn satisfying conditions (i),
(ii), and (iii) in the Definition 5.3. By the affineness of gi , i = 1, 2, . . . , m, for
every x ∈ Rn ,

gi (x) = gi (x̄) + h∇gi (x̄), x − x̄i,

which implies
m
X m
X m
X
λi gi (x) = λi gi (x̄) + λi h∇gi (x̄), x − x̄i, ∀ x ∈ Rn . (5.8)
i=1 i=1 i=1

© 2012 by Taylor & Francis Group, LLC


214 Enhanced Fritz John Optimality Conditions

By the conditions (i) and (ii) in the definition of pseudonormality,


m
X
0= λi ∇gi (x̄) and λi gi (x̄) = 0, i = 1, 2, . . . , m,
i=1

thereby reducing the condition (5.8) to


m
X
λi gi (x) = 0, ∀ x ∈ Rn .
i=1

This is a contradiction of condition (iii) of Definition 5.3 at x̄. Hence, x̄ is


pseudonormal.
(b) On the contrary, suppose that x̄ is not pseudonormal. By the convexity of
gi , i = 1, 2, . . . , m, for every x ∈ Rn ,

gi (x) − gi (x̄) ≥ hξi , x − x̄i, ∀ ξi ∈ ∂gi (x̄), i = 1, 2, . . . , m. (5.9)

By condition (i) in the definition of pseudonormality, there exist ξ¯i ∈ ∂gi (x̄),
i = 1, 2, . . . , m, such that
m
X
λi hξ¯i , x − x̄i ≥ 0, ∀ x ∈ X.
i=1

The above inequality along with condition (ii) reduces the condition (5.9) to
m
X
λi gi (x) ≥ 0, ∀ x ∈ X. (5.10)
i=1

As the Slater constraint qualification is satisfied at x̂ ∈ X,


m
X
λi gi (x̂) < 0
i=1

if λi > 0 for some i ∈ I(x̄). Thus, the condition (5.10) holds only if λi = 0
for i = 1, 2, . . . , m. But then this contradicts condition (iii). Therefore, x̄ is
pseudonormal. 
In Chapter 3 we derived the KKT optimality conditions under the Slater
constraint qualification as well as the Abadie constraint qualification. For
the convex programming problem (CP ) considered in previous chapters, the
feasible set C was given by (3.1), that is,

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}.

Recall that the Abadie constraint qualification is said to hold at x̄ ∈ C if

TC (x̄) = {d ∈ Rn : gi′ (x̄, d) ≤ 0, ∀ i ∈ I(x̄)},

© 2012 by Taylor & Francis Group, LLC


5.2 Enhanced Fritz John Conditions Using the Subdifferential 215

where I(x̄) is the active index set at x̄. But unlike the Slater constraint quali-
fication, the Abadie constraint qualification need not imply pseudonormality.
For better understanding, let us recall the example

C = {x ∈ R : |x| ≤ 0, x ≤ 0}.

From the discussion in Chapter 3, we know that the Abadie constraint qual-
ification is satisfied at x̄ = 0 but the Slater constraint qualification does not
hold as the feasible set C = {0}. Observe that both constraints are active at
x̄. Taking the scalars λ1 = λ2 = 1 and the sequence {xk } as {1/k}, conditions
(i), (ii), and (iii) in Definition 5.3 are satisfied. Thus, x̄ = 0 is not pseudo-
normal. The Abadie constraint qualification is also known as quasiregularity
at x̄. This condition was defined for X = Rn . The notion of quasiregularity
is implied by the concept of quasinormality. This concept was introduced by
Hestenes [55] for the case X = Rn . The notion of quasinormality is further
implied by pseudonormality.
Now if X 6= Rn , the quasiregularity at x̄ is defined as

TC (x̄) = {d ∈ Rn : gi′ (x̄, d) ≤ 0, ∀ i ∈ I(x̄)} ∩ TX (x̄).

The above condition was studied by Gould and Tolle [53] and Guignard [54].
It was shown by Ozdaglar [91] and Ozdaglar and Bertsekas [92] that under
the regularity (Chapter 2 end notes) of the set X, pseudonormality implies
quasiregularity. They also showed that unlike the case X = Rn where quasi-
regularity leads to KKT optimality conditions, the concept is not enough to
derive KKT conditions when X 6= Rn unless some additional conditions are
assumed. For more on quasiregularity and quasinormality, readers are advised
to refer to the works of Bertsekas.
Next we establish the KKT optimality conditions under the pseudonor-
mality assumptions at the point of minimizer.

Theorem 5.5 Consider the convex programming problem (CP 1). Assume
that x̄ satisfies pseudonormality. Then x̄ is a point of minimizer of (CP 1)
if and only if there exist λi ≥ 0, i = 1, . . . , m, such that
m
X
0 ∈ ∂f (x̄) + λi ∂gi (x̄) + NX (x̄) and λi gi (x̄) = 0, i = 1, 2, . . . , m.
i=1

Proof. Observe that the complementary slackness condition is equivalent to


condition (ii) in the definition of pseudonormality. Therefore, λi = 0 for every
i∈/ I(x̄). Suppose that the multiplier λ0 associated with the subdifferential
of the objective function in the enhanced Fritz John optimality condition is
zero. Therefore,
m
X
0∈ λi ∂gi (x̄) + NX (x̄),
i=1

© 2012 by Taylor & Francis Group, LLC


216 Enhanced Fritz John Optimality Conditions

that is, condition (i) of Definition 5.3 holds. As all λi ≥ 0, i = 0, 1, . . . , m,


are not simultaneously zero, λi > 0 for some i ∈ I(x̄) and thus
I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. Therefore, by condition (ii) of
the enhanced Fritz John condition, there exists a sequence {xk } ⊂ X con-
verging to x̄ such that
¯
λi gi (xk ) > 0, ∀ i ∈ I,
which implies
m
X
λi gi (xk ) > 0.
i=1

Thus condition (iii) in the definition of pseudonormality is satisfied, thereby


implying that x̄ is not pseudonormal. This contradicts the given hypothesis.
Therefore, λ0 6= 0, thereby satisfying the KKT optimality conditions. The
sufficiency can be worked out using the convexity of the objective function
and the constraint functions along with the convexity of the set X as done in
Chapter 3. 

5.3 Enhanced Fritz John Conditions under Restrictions


Observe that in the problem (CP 1), the functions f and gi , i = 1, 2, . . . , m,
are convex on Rn . But if these functions are convex only over the closed
convex set X, the line of proof of the above theorem breaks down. Bertsekas,
Ozdaglar, and Tseng [14] gave an alternative version of the enhanced Fritz
John optimality conditions, which is independent of the subdifferentials. The
proof given by them, which we present below relies on the saddle point theory.
Theorem 5.6 Consider the convex programming problem (CP 1) where the
functions f and gi , i = 1, 2, . . . , m, are lsc and convex on the closed convex
set X ⊂ Rn and let x̄ be a point of minimizer of (CP 1). Then there exist
λi ≥ 0 for i = 0, 1, . . . , m, not all simultaneously zero, such that
m
X
(i) λ0 f (x̄) = min{λ0 f (x) + λi gi (x)}.
x∈X
i=1

(ii) Consider the index set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0}. If I¯ 6= ∅, then


there exists a sequence {xk } ⊂ X that converges to x̄ and is such that
lim f (xk ) = f (x̄) and lim sup gi (xk ) ≤ 0, i = 1, 2, . . . , m,
k→∞ k→∞

and for all k sufficiently large


f (xk ) < f (x̄) and ¯
gi (xk ) > 0, ∀ i ∈ I.

© 2012 by Taylor & Francis Group, LLC


5.3 Enhanced Fritz John Conditions under Restrictions 217

Proof. For the positive integers k and r, consider the saddle point function
Lk,r : X × Rm
+ → R defined as

Xm
1 2 1
Lk,r (x, α) = f (x) + 3
kx − x̄k + αi gi (x) − kαk2 .
k i=1
2r

For fixed αi ≥ 0, i = 1, 2, . . . , m, by the lower semicontinuity and convexity


of the functions f and gi , i = 1, 2, . . . , m, over X, Lk,r (., α) is an lsc convex
function while for a fixed x ∈ X, Lk,r (x, .) is strongly concave and quadratic
in α. For every k, define the set

Xk = X ∩ B̄k (x̄).

Observe that Xk is a compact set. As f and gi , i = 1, 2 . . . , m, are lsc convex on


X, the functions are lsc, convex on Xk . Also, as Lk,r (x, .) is strongly concave,
it has a unique maximizer over Rm + and thus for some β ∈ R, the level set

{α ∈ Rm
+ : Lk,r (x̄, α) ≥ β}

is nonempty and compact. Thus by condition (iii) of the Saddle Point Theo-
rem, Proposition 4.1, Lk,r has a saddle point over Xk × Rm
+ , say (xk,r , αk,r ).
By the saddle point definition,

Lk,r (xk,r , α) ≤ Lk,r (xk,r , αk,r ) ≤ Lk,r (x, αk,r ), ∀ x ∈ Xk , ∀ α ∈ Rm


+ . (5.11)

As Lk,r (., αk,r ) attains an infimum over Xk at xk,r ,

Xm
1 2 1
Lk,r (xk,r , αk,r ) = f (xk,r ) + 3 kxk,r − x̄k + αk,r i gi (xk,r ) − kαk,r k2
k i=1
2r
Xm
1 2
≤ inf {f (x) + kx − x̄k + αk,r i gi (x)}
x∈Xk k3 i=1
Xm
1 2
≤ inf {f (x) + kx − x̄k + αk,r i gi (x)}
x∈Xk ,gi (x)≤0,∀i k3 i=1
1
≤ inf {f (x) + kx − x̄k2 }.
x∈Xk ,gi (x)≤0,∀i k3

As x̄ ∈ Xk and satisfies gi (x̄) ≤ 0, i = 1, 2, . . . , m, the above inequalities yield

Lk,r (xk,r , αk,r ) ≤ f (x̄). (5.12)

Again from (5.11), Lk,r (xk,r , .) attains a supremum over α ∈ Rm + at αk,r . As a


function of α ∈ Rm
+ , Lk,r (xk,r , .) is strongly concave and quadratic, and thus,
has a unique supremum at

αk,r i = rgi+ (xk,r ), i = 1, 2, . . . , m. (5.13)

© 2012 by Taylor & Francis Group, LLC


218 Enhanced Fritz John Optimality Conditions

We leave it to the readers to figure out how to compute αk,r i . Therefore,

1 r
Lk,r (xk,r , αk,r ) = f (xk,r ) + kxk,r − x̄k2 + kg + (xk,r )k2 , (5.14)
k3 2
which implies that
Lk,r (xk,r , αk,r ) ≥ f (xk,r ). (5.15)

From the conditions (5.12) and (5.15), we have

xk,r ∈ {x ∈ Xk : f (x) ≤ f (x̄)}.

As Xk is compact, the set {x ∈ Xk : f (x) ≤ f (xk )} is bounded and thus {xk,r }


forms a bounded sequence. In fact, we leave it to the readers to show that f
is also coercive on Xk . Thus, by the Bolzano–Weierstrass Theorem, Propo-
sition 1.3, for a fixed k the sequence {xk,r } has a convergent subsequence.
Without loss of generality, let {xk,r } converge to xk ∈ {x ∈ Xk : f (x) ≤ f (x̄)}.
As f is convex and coercive on Xk , by the Weierstrass Theorem, The-
orem 1.14, an infimum over Xk exists. Therefore for each k, the sequence
{f (xk,r )} is bounded below by inf x∈Xk f (x). Also, by condition (5.12),
Lk,r (xk,r , αk,r ) is bounded above by f (x̄). Thus, from (5.14),

    lim sup_{r→∞} gi(xk,r) ≤ 0, i = 1, 2, . . . , m,

which along with the lower semicontinuity of gi, i = 1, 2, . . . , m, implies that gi(xk) ≤ 0 for i = 1, 2, . . . , m, thereby yielding the feasibility of xk for (CP 1). We urge the reader to work out the details. As x̄ is the minimizer of (CP 1), f(xk) ≥ f(x̄), which along with the conditions (5.12), (5.15), and the lower semicontinuity of f leads to

    f(x̄) ≤ f(xk) ≤ lim inf_{r→∞} f(xk,r) ≤ lim sup_{r→∞} f(xk,r) ≤ f(x̄),

which implies that for each k,

    lim_{r→∞} f(xk,r) = f(x̄).

By the conditions (5.12) and (5.14), we have for every k ∈ N,

    lim_{r→∞} xk,r = x̄.

Further note that using the definition of Lk,r(xk,r, αk,r) along with (5.12) and (5.15), for every k,

    lim_{r→+∞} Σ_{i=1}^m αk,r i gi(xk,r) = 0.


Therefore, by the preceding conditions,

    lim_{r→∞} { f(xk,r) − f(x̄) + Σ_{i=1}^m αk,r i gi(xk,r) } = 0.    (5.16)

Denote

    γk,r = √( 1 + Σ_{i=1}^m (αk,r i)² ),   λ0^{k,r} = 1/γk,r   and   λi^{k,r} = αk,r i / γk,r,  i = 1, 2, . . . , m.    (5.17)

Dividing (5.16) by γk,r > 0 leads to

    lim_{r→∞} { λ0^{k,r} f(xk,r) − λ0^{k,r} f(x̄) + Σ_{i=1}^m λi^{k,r} gi(xk,r) } = 0.

For each k, we fix an integer rk such that

    | λ0^{k,rk} f(xk,rk) − λ0^{k,rk} f(x̄) + Σ_{i=1}^m λi^{k,rk} gi(xk,rk) | ≤ 1/k    (5.18)

and

    ‖xk,rk − x̄‖ ≤ 1/k,   |f(xk,rk) − f(x̄)| ≤ 1/k,   |gi⁺(xk,rk)| ≤ 1/k,  i = 1, 2, . . . , m.    (5.19)

Dividing the saddle point condition

    Lk,rk(xk,rk, αk,rk) ≤ Lk,rk(x, αk,rk), ∀ x ∈ Xk

by γk,rk yields

    λ0^{k,rk} f(xk,rk) + (λ0^{k,rk}/k³) ‖xk,rk − x̄‖² + Σ_{i=1}^m λi^{k,rk} gi(xk,rk)
        ≤ λ0^{k,rk} f(x) + (1/(k³ γk,rk)) ‖x − x̄‖² + Σ_{i=1}^m λi^{k,rk} gi(x), ∀ x ∈ Xk.

As αk,rk i ≥ 0, i = 1, 2, . . . , m, from the condition (5.17), γk,rk ≥ 1 and λi^{k,rk} ≥ 0, i = 0, 1, . . . , m, along with

    (λ0^{k,rk})² + Σ_{i=1}^m (λi^{k,rk})² = 1.

Therefore, {λi^{k,rk}}, i = 0, 1, . . . , m, are bounded sequences in R+ and thus by the Bolzano–Weierstrass Theorem, Proposition 1.3, have a convergent subsequence. Without loss of generality, assume that λi^{k,rk} → λi ≥ 0, i = 0, 1, . . . , m, not all simultaneously zero. Taking the limit as k → +∞ in the above inequality along with the condition (5.18) leads to

    λ0 f(x̄) ≤ λ0 f(x) + Σ_{i=1}^m λi gi(x), ∀ x ∈ X,

which implies

    λ0 f(x̄) ≤ inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }
             ≤ inf_{x∈X, gi(x)≤0 ∀i} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }
             ≤ inf_{x∈X, gi(x)≤0 ∀i} λ0 f(x)
             = λ0 f(x̄).

Therefore, λi ≥ 0, i = 0, 1, . . . , m, not all simultaneously zero, satisfy condition (i), that is,

    λ0 f(x̄) = inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }.

Next suppose that the index set Ī = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. Corresponding to λi > 0 for i ∈ Ī, there is a sequence λi^{k,rk} → λi, so λi^{k,rk} > 0 for all sufficiently large k, which along with the condition (5.13) implies

    gi(xk,rk) > 0, ∀ i ∈ Ī

for sufficiently large k. For each k, choosing rk such that xk,rk ≠ x̄ and the condition (5.19) is satisfied implies that

    xk,rk → x̄,   f(xk,rk) → f(x̄),   gi⁺(xk,rk) → 0,  i = 1, 2, . . . , m.

Also, by the conditions (5.12) and (5.14) together with xk,rk ≠ x̄,

    f(xk,rk) < f(x̄),

thereby proving (ii) and hence establishing the requisite result. 


Similar to the pseudonormality notion defined earlier, we now state the corresponding notion for the enhanced Fritz John conditions obtained in Theorem 5.6.

Definition 5.7 The constraint set of (CP 1) is said to be pseudonormal if there do not exist any scalars λi ≥ 0, i = 1, 2, . . . , m, and a vector x′ ∈ X such that

    (i) 0 = inf_{x∈X} Σ_{i=1}^m λi gi(x),

    (ii) Σ_{i=1}^m λi gi(x′) > 0.

For a better understanding of the above definition of pseudonormality, we recall the idea of proper separation from Definition 2.25. A hyperplane H is said to separate two convex sets F1 and F2 properly if

    sup_{x1∈F1} ⟨a, x1⟩ ≤ inf_{x2∈F2} ⟨a, x2⟩   and   inf_{x1∈F1} ⟨a, x1⟩ < sup_{x2∈F2} ⟨a, x2⟩.

Now consider the set G = {g(x) = (g1(x), g2(x), . . . , gm(x)) : x ∈ X}. Then from Definition 5.7 it is easy to observe that pseudonormality implies that there exists no hyperplane H that separates the set G and the origin {0} properly.
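To make this observation explicit (a short verification using only Definition 5.7 and the separation inequalities above, with F1 = {0}, F2 = G, and a = λ = (λ1, . . . , λm)): if scalars λi ≥ 0 and a vector x′ ∈ X violate pseudonormality, then condition (i) gives

    ⟨λ, g(x)⟩ = Σ_{i=1}^m λi gi(x) ≥ 0 = ⟨λ, 0⟩ for every x ∈ X,  that is,  sup_{x1∈F1} ⟨λ, x1⟩ ≤ inf_{x2∈F2} ⟨λ, x2⟩,

while condition (ii) gives

    inf_{x1∈F1} ⟨λ, x1⟩ = 0 < ⟨λ, g(x′)⟩ ≤ sup_{x2∈F2} ⟨λ, x2⟩.

Thus the hyperplane through the origin with the nonnegative normal λ separates G and {0} properly; pseudonormality rules out every such hyperplane.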
Similar to Theorem 5.4, the pseudonormality of the constraint set can be
derived under the Linearity criterion or the Slater constraint qualification.

Theorem 5.8 Consider the problem (CP 1). Then the constraint set is
pseudonormal under either one of the following two criteria:

(a) Linearity criterion: X = Rn and the functions gi , i = 1, 2, . . . , m, are


affine.

(b) Slater-type constraint qualification: there exists a feasible point x̂ ∈ X of (CP 1) such that gi(x̂) < 0, i = 1, 2, . . . , m.

Proof. (a) Suppose on the contrary that the constraint set is not pseudo-
normal, which implies that there exist λi ≥ 0, i = 1, 2, . . . , m, and a vector
x′ ∈ Rn satisfying conditions (i) and (ii) in the Definition 5.7. Suppose that
x̄ ∈ Rn is feasible to (CP 1), that is, gi (x̄) ≤ 0, i = 1, 2, . . . , m, which along
with condition (i) yields

    Σ_{i=1}^m λi gi(x̄) = 0.    (5.20)

By the affineness of gi, i = 1, 2, . . . , m,

    gi(x) = gi(x̄) + ⟨∇gi(x̄), x − x̄⟩, ∀ x ∈ Rn,

which again by condition (i) and (5.20) implies

    Σ_{i=1}^m λi ⟨∇gi(x̄), x − x̄⟩ ≥ 0, ∀ x ∈ Rn.

FIGURE 5.1: Pseudonormality (left panel: Linearity criterion, X = Rn; right panel: Slater criterion; each sketch shows the image set G, the origin 0, and a hyperplane H with normal λ).

By Definition 2.36 of the normal cone, Σ_{i=1}^m λi ∇gi(x̄) ∈ NRn(x̄). As x̄ ∈ Rn, by Example 2.38, the normal cone NRn(x̄) = {0}, which implies

    Σ_{i=1}^m λi ∇gi(x̄) = 0.

This equality along with the condition (5.20) and the affineness of gi, i = 1, 2, . . . , m, implies that

    Σ_{i=1}^m λi gi(x) = 0, ∀ x ∈ Rn,

thereby contradicting condition (ii) in the definition of pseudonormality. Hence


the constraint set is pseudonormal.
(b) Suppose on the contrary that the constraint set is not pseudonormal. As the Slater-type constraint qualification holds, there exists x̂ ∈ X such that gi(x̂) < 0, i = 1, 2, . . . , m. Then condition (i) can be satisfied only if λi = 0, i = 1, 2, . . . , m, which contradicts condition (ii) in Definition 5.7. Therefore, the constraint set is pseudonormal. 
In case the Slater-type constraint qualification is satisfied, the set G intersects the open orthant {x ∈ Rm : xi < 0, i = 1, 2, . . . , m} as shown in Figure 5.1. Then obviously condition (i) in the definition of pseudonormality does not hold for any nonzero λ ≥ 0; that is, there exists no hyperplane H passing through the origin and supporting G at the origin such that G lies in the nonnegative half-space determined by H. Now when one has the linearity criterion, that is, X = Rn and gi, i = 1, 2, . . . , m, are affine, the set G is also affine (see Figure 5.1); any hyperplane H through the origin satisfying condition (i) must then contain the set G completely, and thus condition (ii) is violated. In the linearity criterion, if X is a polyhedron instead of X = Rn along with gi, i = 1, 2, . . . , m, being affine,


FIGURE 5.2: Not pseudonormal (the sketch shows a hyperplane H through the origin 0 with normal λ).

pseudonormality need not hold as shown in Figure 5.2. These observations


were made by Bertsekas, Ozdaglar, and Tseng [14].
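To see that this can indeed happen (a small illustration consistent with Figure 5.2, not taken from the text): let n = m = 1, X = {x ∈ R : x ≥ 0}, which is a polyhedron, and g1(x) = x, which is affine. Then for λ1 = 1,

    inf_{x∈X} λ1 g1(x) = inf_{x≥0} x = 0,

so condition (i) of Definition 5.7 holds, while x′ = 1 ∈ X gives λ1 g1(x′) = 1 > 0, so condition (ii) holds as well. Hence the constraint set {x ∈ X : g1(x) ≤ 0} = {0} is not pseudonormal, even though g1 is affine, because X is only a polyhedron and not the whole space.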
We end this section by establishing the KKT optimality conditions similar
to Theorem 5.5, under the pseudonormality of the constraint set.
Theorem 5.9 Consider the convex programming problem (CP 1) where the
functions f and gi , i = 1, 2, . . . , m are lsc and convex on the closed convex set
X ⊂ Rn . Assume that the constraint set is pseudonormal. Then x̄ is a point
of minimizer of (CP 1) if and only if there exist λi ≥ 0, i = 1, 2, . . . , m, such
that

    f(x̄) = min_{x∈X} { f(x) + Σ_{i=1}^m λi gi(x) }   and   λi gi(x̄) = 0, i = 1, 2, . . . , m.

Proof. Suppose that in the enhanced Fritz John optimality condition, Theorem 5.6, λ0 = 0. This implies

    0 = min_{x∈X} Σ_{i=1}^m λi gi(x),

that is, λi ≥ 0, i = 1, 2, . . . , m, satisfy condition (i) in the definition of pseudonormality of the constraint set. As in the enhanced Fritz John condition λi, i = 0, 1, . . . , m, are not all simultaneously zero, there exists at least one i ∈ {1, 2, . . . , m} such that λi > 0, that is, Ī is nonempty. Again by Theorem 5.6, there exists a sequence {xk} ⊂ X such that

    gi(xk) > 0, ∀ i ∈ Ī,

which implies

    Σ_{i∈Ī} λi gi(xk) = Σ_{i=1}^m λi gi(xk) > 0,


that is, satisfying condition (ii) in Definition 5.7, thereby contradicting the fact that the constraint set is pseudonormal. Thus, λ0 ≠ 0 and hence can be taken in particular as one, thereby establishing the optimality condition.
Using the optimality condition along with the feasibility of x̄ leads to

    0 ≤ Σ_{i=1}^m λi gi(x̄) ≤ 0,

that is,

    Σ_{i=1}^m λi gi(x̄) = 0.

As a sum of nonpositive terms is zero only if every term is zero, the above equality leads to

    λi gi(x̄) = 0, i = 1, 2, . . . , m,

thereby establishing the complementary slackness condition.


Conversely, by the optimality condition,

    f(x̄) ≤ f(x) + Σ_{i=1}^m λi gi(x), ∀ x ∈ X.

In particular, for any x feasible to (CP 1), that is, x ∈ X satisfying gi(x) ≤ 0, i = 1, 2, . . . , m, the above inequality reduces to

    f(x̄) ≤ f(x),

thus proving that x̄ is a point of minimizer of (CP 1). 

5.4 Enhanced Fritz John Conditions in the Absence of Optimal Solution
Up to now in this chapter, one observes two forms of enhanced Fritz John op-
timality conditions, one when the functions are convex over the whole space
Rn while in the second scenario convexity of the functions is over the convex
set X 6= Rn . The results obtained in Section 5.3 are in a form similar to strong
duality. In all the results of enhanced Fritz John and KKT optimality condi-
tions, it is assumed that the point of minimizer exists. But what if the convex
programming problem (CP 1) has an infimum that is not attained? In such
a case is it possible to establish a Fritz John optimality condition that can
then be extended to KKT optimality conditions under the pseudonormality


condition? The answer is yes and we present a result from Bertsekas [12] and
Bertsekas, Ozdaglar, and Tseng [14] to establish the enhanced Fritz John op-
timality conditions similar to those derived in Section 5.3. But in the absence
of a point of minimizer of (CP 1), the multipliers are now dependent on the
infimum, as one will observe in the theorem below.

Theorem 5.10 Consider the convex programming problem (CP 1) where the
functions f and gi , i = 1, 2, . . . , m, are convex on the convex set X ⊂ Rn
and let finf < +∞ be the infimum of (CP 1). Then there exist λi ≥ 0 for
i = 0, 1, . . . , m, not all simultaneously zero, such that

    λ0 finf = inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }.

Proof. If the infimum finf = −∞, then by the condition

    inf_{x∈X} f(x) ≤ inf_{x∈X, gi(x)≤0 ∀i} f(x) = finf,

we have inf_{x∈X} f(x) = −∞. Thus for λ0 = 1 and λi = 0, i = 1, 2, . . . , m, the requisite condition is obtained.
Now suppose that finf is finite. To establish the Fritz John optimality condition we will invoke the supporting hyperplane theorem. For that purpose, define a set in Rm+1 as

    M = {(d0, d) ∈ R × Rm : there exists x ∈ X such that f(x) ≤ d0, gi(x) ≤ di, i = 1, 2, . . . , m}.

We claim that M is a convex set. For j = 1, 2, consider (d0^j, d^j) ∈ M, which implies that there exists xj ∈ X such that

    f(xj) ≤ d0^j   and   gi(xj) ≤ di^j, i = 1, 2, . . . , m.

As X is a convex set, for every µ ∈ [0, 1], y = µx1 + (1 − µ)x2 ∈ X. Also by the convexity of f and gi, i = 1, 2, . . . , m,

    f(y) ≤ µ f(x1) + (1 − µ) f(x2) ≤ µ d0^1 + (1 − µ) d0^2,
    gi(y) ≤ µ gi(x1) + (1 − µ) gi(x2) ≤ µ di^1 + (1 − µ) di^2, i = 1, 2, . . . , m,

which implies that µ(d0^1, d^1) + (1 − µ)(d0^2, d^2) ∈ M for every µ ∈ [0, 1]. Hence M is a convex subset of R × Rm.
Next we prove that (finf, 0) ∉ int M. On the contrary, suppose that (finf, 0) ∈ int M, which by Definition 2.12 implies that there exists ε > 0 such that (finf − ε, 0) ∈ M. Thus, there exists x ∈ X such that

    f(x) ≤ finf − ε   and   gi(x) ≤ 0, i = 1, 2, . . . , m.

From the above condition it is obvious that x is a feasible point of (CP 1),
thereby contradicting the fact that finf is the infimum of the problem (CP 1).


Hence (finf, 0) ∉ int M. By the supporting hyperplane theorem, Theorem 2.26 (i), there exists (λ0, λ) ∈ R × Rm with (λ0, λ) ≠ (0, 0) such that

    λ0 finf ≤ λ0 d0 + Σ_{i=1}^m λi di, ∀ (d0, d) ∈ M.    (5.21)

Let (d0, d) = (d0, d1, . . . , dm) ∈ M. Then for αi > 0,

    (d0, . . . , di−1, di + αi, di+1, . . . , dm) ∈ M, i = 0, 1, . . . , m.

If for some i ∈ {0, 1, . . . , m}, λi < 0, then letting the corresponding αi → +∞ leads to a contradiction of (5.21). Therefore, λi ≥ 0 for i = 0, 1, . . . , m.
It is easy to observe that (f(x), g(x)) = (f(x), g1(x), g2(x), . . . , gm(x)) ∈ M for any x ∈ X. Therefore, the condition (5.21) becomes

    λ0 finf ≤ λ0 f(x) + Σ_{i=1}^m λi gi(x), ∀ x ∈ X,

which implies

    λ0 finf ≤ inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }
            ≤ inf_{x∈X, gi(x)≤0 ∀i} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }
            ≤ inf_{x∈X, gi(x)≤0 ∀i} λ0 f(x)
            = λ0 finf,

thereby leading to the Fritz John optimality condition, as desired. 
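To see the construction of M and of the multipliers in a concrete case where the infimum is not attained (a small illustration, not from the text): take X = R, f(x) = eˣ and g1(x) = x. The feasible set is {x : x ≤ 0} and finf = inf_{x≤0} eˣ = 0, which is not attained. Here

    M = {(d0, d1) ∈ R × R : there exists x with eˣ ≤ d0 and x ≤ d1} = {(d0, d1) : d0 > 0},

so (finf, 0) = (0, 0) lies on the boundary of the convex set M and not in its interior. A supporting hyperplane at (0, 0) has normal (λ0, λ1) = (1, 0), and indeed

    λ0 finf = 0 = inf_{x∈R} { eˣ + 0 · x } = inf_{x∈X} { λ0 f(x) + λ1 g1(x) },

which is exactly the conclusion of Theorem 5.10.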


Note that in Theorem 5.10, there is no complementary slackness condition. Under the Slater-type constraint qualification, that is, there exists x̂ ∈ X such that gi(x̂) < 0, i = 1, 2, . . . , m, it can be ensured that λ0 ≠ 0. Otherwise, if λ0 = 0, then from the Fritz John optimality condition there exist λi ≥ 0, i = 1, 2, . . . , m, not all simultaneously zero, such that

    Σ_{i=1}^m λi gi(x) ≥ 0, ∀ x ∈ X,

which contradicts the Slater-type constraint qualification. This discussion can be stated as follows.

Theorem 5.11 Consider the convex programming problem (CP 1) where the functions f and gi, i = 1, 2, . . . , m, are convex on the convex set X ⊂ Rn and let finf < +∞ be the infimum of (CP 1). Assume that the Slater-type constraint qualification holds. Then there exist λi ≥ 0, i = 1, 2, . . . , m, such that

    finf = inf_{x∈X} { f(x) + Σ_{i=1}^m λi gi(x) }.
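As an indication of why a qualification like the Slater-type condition is needed here (a standard one-dimensional illustration, not from the text): take X = R, f(x) = x and g1(x) = x². The only feasible point is x = 0, so finf = 0, but no x̂ satisfies g1(x̂) < 0. For any λ1 > 0,

    inf_{x∈R} { x + λ1 x² } = −1/(4λ1) < 0,

and the infimum is −∞ when λ1 = 0, so finf = 0 never equals inf_{x∈X} { f(x) + λ1 g1(x) }. The conclusion of Theorem 5.10 is instead met with λ0 = 0 and λ1 = 1, since inf_{x∈R} x² = 0 = λ0 finf.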

In Theorem 5.10, the Fritz John optimality condition is established in the


duality form in the absence of any point of minimizer of (CP 1) but at the
cost of the complementary slackness condition. Note that in Theorems 5.10
and 5.11, one requires the set X to be convex, but need not be closed. The
enhanced Fritz John optimality condition similar to Theorem 5.6 has also
been obtained in this scenario by Bertsekas, Ozdaglar, and Tseng [14] and
Bertsekas [12]. The proof is similar to that of Theorem 5.6 but complicated
as the point of minimizer does not exist.

Theorem 5.12 Consider the convex programming problem (CP 1) where the
functions f and gi , i = 1, 2, . . . , m, are lsc and convex on the closed convex
set X ⊂ Rn and let finf < +∞ be the infimum of (CP 1). Then there exist
λi ≥ 0 for i = 0, 1, . . . , m, not all simultaneously zero, such that
(i) λ0 finf = inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }.

(ii) Consider the index set Ī = {i ∈ {1, 2, . . . , m} : λi > 0}. If Ī ≠ ∅, then there exists a sequence {xk} ⊂ X such that

    lim_{k→∞} f(xk) = finf   and   lim sup_{k→∞} gi(xk) ≤ 0, i = 1, 2, . . . , m,

and for all k sufficiently large

    f(xk) < finf   and   gi(xk) > 0, ∀ i ∈ Ī.

Proof. If for every x ∈ X, f (x) ≥ finf , then the result holds for λ0 = 1 and
λi = 0, i = 1, 2, . . . , m.
Now suppose that there exists an x̄ ∈ X such that f (x̄) < finf , thereby
implying that finf is finite. Consider the minimization problem
min f (x) subject to gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ Xk . (CP 1k )
In (CP 1k ), Xk is a closed convex subset of Rn defined as

Xk = X ∩ B̄βk (0), ∀ k ∈ N

and β > 0 is chosen to be sufficiently large such that for every k, the constraint
set

{x ∈ Xk : gi (x) ≤ 0, i = 1, 2, . . . , m}


is nonempty. As f and gi , i = 1, 2, . . . , m, are lsc convex on X, they are lsc


convex and coercive on Xk . Thus by the Weierstrass Theorem, Theorem 1.14,
the problem (CP 1k ) has a point of minimizer, say x̄k . As k → ∞, Xk → X and
thus f (x̄k ) → finf . Because Xk ⊂ X, finf ≤ f (x̄k ). Define δk = f (x̄k ) − finf .
Observe that δk ≥ 0 for every k. If δk = 0 for some k, then x̄k ∈ Xk ⊂ X
is a point of minimizer of (CP 1) and the result holds by Theorem 5.6 with
finf = f (x̄k ).
Now suppose that δk > 0 for every k. For positive integers k and positive scalars r, consider the function Lk,r : Xk × Rm+ → R given by

    Lk,r(x, α) = f(x) + (δk²/4k²) ‖x − x̄k‖² + Σ_{i=1}^m αi gi(x) − ‖α‖²/(2r).

Observe that the above function is similar to the saddle point function considered in Theorem 5.6 except that the term (1/k³)‖x − x̄‖² is now replaced by (δk²/4k²)‖x − x̄k‖². In Theorem 5.6, x̄ is a point of minimizer of (CP 1), whereas here the infimum is not attained and thus the term involves x̄k, the point of minimizer of the problem (CP 1k), and δk.
Now working along the lines of the proof of Theorem 5.6, Lk,r has a saddle point over Xk × Rm+, say (xk,r, αk,r), which by the saddle point definition implies

    Lk,r(xk,r, α) ≤ Lk,r(xk,r, αk,r) ≤ Lk,r(x, αk,r), ∀ x ∈ Xk, ∀ α ∈ Rm+.    (5.22)

As Lk,r(·, αk,r) attains an infimum over Xk at xk,r,

    Lk,r(xk,r, αk,r) ≤ f(x̄k).    (5.23)

Also, from (5.22), Lk,r(xk,r, ·) attains a supremum over α ∈ Rm+ at

    αk,r i = r gi⁺(xk,r), i = 1, 2, . . . , m.    (5.24)

Therefore,

    Lk,r(xk,r, αk,r) ≥ f(xk,r).    (5.25)

Further, as in the proof of Theorem 5.6,

    lim_{r→∞} f(xk,r) = f(x̄k).

Note that in the proof, the problem (CP 1k) is considered instead of (CP 1) as in Theorem 5.6 and hence the condition obtained involves the point of minimizer of (CP 1k), that is, x̄k. Now as δk = f(x̄k) − finf, the above equality leads to

    lim_{r→∞} f(xk,r) = finf + δk.    (5.26)
Now before continuing with the proof to obtain the multipliers for the Fritz
John optimality condition, we present a lemma from Bertsekas, Ozdaglar, and
Tseng [14].


Lemma 5.13 For sufficiently large k and every r ≤ 1/√δk,

    f(xk,r) ≤ finf − δk/2.    (5.27)

Furthermore, there exists a scalar rk ≥ 1/√δk such that

    f(xk,rk) = finf − δk/2.    (5.28)

Proof. Define δ = finf − f(x̄), where x̄ ∈ X is such that f(x̄) < finf. For sufficiently large k, x̄ ∈ Xk. As x̄k is the point of minimizer of the problem (CP 1k), f(x̄k) ≥ finf with f(x̄k) → finf; thus for sufficiently large k,

    f(x̄k) − finf < finf − f(x̄),

which implies δk < δ. By the convexity of f over X and that of Xk ⊂ X, for λk ∈ [0, 1],

    f(yk) ≤ λk f(x̄) + (1 − λk) f(x̄k) = λk (finf − δ) + (1 − λk)(finf + δk) = finf − λk (δk + δ) + δk,

where yk = λk x̄ + (1 − λk) x̄k. Because 0 ≤ δk < δ, 0 ≤ 2δk/(δk + δ) < 1. Substituting λk = 2δk/(δk + δ) in the above condition yields

    f(yk) ≤ finf − δk.    (5.29)

Again by the convexity assumptions on gi, i = 1, 2, . . . , m, along with the feasibility of x̄k for (CP 1k), for λk = 2δk/(δk + δ),

    gi(yk) ≤ λk gi(x̄) + (1 − λk) gi(x̄k) ≤ (2δk/(δk + δ)) gi(x̄), i = 1, 2, . . . , m.    (5.30)

From the saddle point condition (5.22) along with (5.24) and (5.25),

    f(xk,r) ≤ Lk,r(xk,r, αk,r) = inf_{x∈Xk} { f(x) + (δk²/4k²) ‖x − x̄k‖² + (r/2) ‖g⁺(x)‖² }.

As x, x̄k ∈ Xk ⊂ B̄βk(0),

    ‖x − x̄k‖ ≤ ‖x‖ + ‖x̄k‖ ≤ 2βk,


thereby reducing the preceding inequality to

    f(xk,r) ≤ f(x) + (βδk)² + (r/2) ‖g⁺(x)‖², ∀ x ∈ Xk.

In particular, taking x = yk ∈ Xk in the above condition, which along with (5.29) and (5.30) implies that for sufficiently large k,

    f(xk,r) ≤ finf − δk + (βδk)² + (2rδk²/(δk + δ)²) ‖g⁺(x̄)‖².

As δk → 0 when k → +∞, for sufficiently large k and every r ≤ 1/√δk the last two terms on the right-hand side add up to at most δk/2, and the above inequality reduces to (5.27).
Now by the saddle point condition (5.22), which along with (5.24) implies that

    Lk,r(xk,r, αk,r) = f(xk,r) + (δk²/4k²) ‖xk,r − x̄k‖² + (r/2) ‖g⁺(xk,r)‖²
                     = inf_{x∈Xk} { f(x) + (δk²/4k²) ‖x − x̄k‖² + (r/2) ‖g⁺(x)‖² }.

Consider r̄ > 0. Then for every r ≥ r̄,

    Lk,r̄(xk,r̄, αk,r̄) = inf_{x∈Xk} { f(x) + (δk²/4k²) ‖x − x̄k‖² + (r̄/2) ‖g⁺(x)‖² }
        ≤ f(xk,r) + (δk²/4k²) ‖xk,r − x̄k‖² + (r̄/2) ‖g⁺(xk,r)‖²
        ≤ f(xk,r) + (δk²/4k²) ‖xk,r − x̄k‖² + (r/2) ‖g⁺(xk,r)‖²
        = Lk,r(xk,r, αk,r)
        ≤ f(xk,r̄) + (δk²/4k²) ‖xk,r̄ − x̄k‖² + (r/2) ‖g⁺(xk,r̄)‖².

Thus as r ↓ r̄, Lk,r(xk,r, αk,r) → Lk,r̄(xk,r̄, αk,r̄).
Now for r ≤ r̄,

    f(xk,r̄) + (δk²/4k²) ‖xk,r̄ − x̄k‖² + (r/2) ‖g⁺(xk,r̄)‖²
        ≤ f(xk,r̄) + (δk²/4k²) ‖xk,r̄ − x̄k‖² + (r̄/2) ‖g⁺(xk,r̄)‖²
        = Lk,r̄(xk,r̄, αk,r̄)
        ≤ f(xk,r) + (δk²/4k²) ‖xk,r − x̄k‖² + (r̄/2) ‖g⁺(xk,r)‖²
        = f(xk,r) + (δk²/4k²) ‖xk,r − x̄k‖² + (r/2) ‖g⁺(xk,r)‖² + ((r̄ − r)/2) ‖g⁺(xk,r)‖²
        = Lk,r(xk,r, αk,r) + ((r̄ − r)/2) ‖g⁺(xk,r)‖²
        ≤ f(xk,r̄) + (δk²/4k²) ‖xk,r̄ − x̄k‖² + (r/2) ‖g⁺(xk,r̄)‖² + ((r̄ − r)/2) ‖g⁺(xk,r)‖².


For every k, as gi, i = 1, 2, . . . , m, is lsc and coercive on Xk, {gi(xk,r)} is bounded below by inf_{x∈Xk} gi(x), which exists by the Weierstrass Theorem, Theorem 1.14. Therefore as r ↑ r̄, Lk,r(xk,r, αk,r) → Lk,r̄(xk,r̄, αk,r̄), which along with the previous case of r ↓ r̄ leads to the continuity of Lk,r(xk,r, αk,r) in r, that is, as r → r̄, Lk,r(xk,r, αk,r) → Lk,r̄(xk,r̄, αk,r̄).
By the conditions (5.23) and (5.25), xk,r belongs to the compact set {x ∈ Xk : f(x) ≤ f(x̄k)} for every k and therefore {xk,r} is a bounded sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, as r → r̄, it has a convergent subsequence. Without loss of generality, let xk,r → x̂k, where x̂k ∈ {x ∈ Xk : f(x) ≤ f(x̄k)}. The continuity of Lk,r(xk,r, αk,r) in r along with the lower semicontinuity of f and gi, i = 1, 2, . . . , m, leads to

    Lk,r̄(xk,r̄, αk,r̄) = lim_{r→r̄} Lk,r(xk,r, αk,r)
                       = lim_{r→r̄} { f(xk,r) + (δk²/4k²) ‖xk,r − x̄k‖² + (r/2) ‖g⁺(xk,r)‖² }
                       ≥ f(x̂k) + (δk²/4k²) ‖x̂k − x̄k‖² + (r̄/2) ‖g⁺(x̂k)‖²
                       ≥ inf_{x∈Xk} { f(x) + (δk²/4k²) ‖x − x̄k‖² + (r̄/2) ‖g⁺(x)‖² }
                       = Lk,r̄(xk,r̄, αk,r̄),

which implies x̂k is the point of minimizer of

    f(x) + (δk²/4k²) ‖x − x̄k‖² + (r̄/2) ‖g⁺(x)‖²

over Xk. As a strictly convex function has a unique point of minimizer and f(x) + (δk²/4k²)‖x − x̄k‖² + (r̄/2)‖g⁺(x)‖² is strictly convex, x̂k = xk,r̄.
We claim that f(xk,r) → f(xk,r̄) as r → r̄. As f is lsc, we will prove the upper semicontinuity of f in r. On the contrary, suppose that f(xk,r̄) < lim sup_{r→r̄} f(xk,r). As r → r̄, Lk,r(xk,r, αk,r) → Lk,r̄(xk,r̄, αk,r̄) and xk,r → x̂k = xk,r̄, which implies that

    f(xk,r̄) + (δk²/4k²) ‖xk,r̄ − x̄k‖² + lim inf_{r→r̄} (r̄/2) ‖g⁺(xk,r)‖²
        < lim sup_{r→r̄} Lk,r(xk,r, αk,r)
        = Lk,r̄(xk,r̄, αk,r̄)
        = f(xk,r̄) + (δk²/4k²) ‖xk,r̄ − x̄k‖² + (r̄/2) ‖g⁺(xk,r̄)‖².

But the above inequality is a contradiction of the lower semicontinuity of gi, i = 1, 2, . . . , m. Therefore, f(xk,r) is continuous in r.
Now by (5.26), for sufficiently large k,

    lim_{r→+∞} f(xk,r) = finf + δk.

Therefore, taking ε = 3δk/2, for r sufficiently large,

    |f(xk,r) − (finf + δk)| < 3δk/2,

which implies that

    finf − δk/2 < f(xk,r) < finf + 5δk/2.    (5.31)

For r ≤ 1/√δk, by (5.27),

    f(xk,r) ≤ finf − δk/2.

Now for r = 1/√δk, we have two possibilities:

(i) f(xk,r) = finf − δk/2,

(ii) f(xk,r) < finf − δk/2.

If (i) holds, then we are done with rk = r. If (ii) holds, then, since (5.31) forces finf − δk/2 < f(xk,r) for all sufficiently large r, any such r must satisfy r > 1/√δk. As f(xk,r) is continuous in r, by the Intermediate Value Theorem there exists rk ≥ 1/√δk such that

    f(xk,rk) = finf − δk/2,

that is, (5.28) holds. 

Now we continue proving the theorem. From the conditions (5.23) and (5.25),

    f(xk,r) ≤ Lk,r(xk,r, αk,r) ≤ inf_{x∈Xk} { f(x) + (δk²/4k²) ‖x − x̄k‖² + Σ_{i=1}^m αk,r i gi(x) }
            = f(xk,r) + (δk²/4k²) ‖xk,r − x̄k‖² + Σ_{i=1}^m αk,r i gi(xk,r)
            ≤ f(x̄k).

For r = rk ≥ 1/√δk, the above condition along with (5.28) and the fact that, as k → +∞, f(x̄k) → finf and δk → 0 implies that

    lim_{k→∞} { f(xk,rk) − finf + (δk²/4k²) ‖xk,rk − x̄k‖² + Σ_{i=1}^m αk,rk i gi(xk,rk) } = 0.    (5.32)

Define

    γk = √( 1 + Σ_{i=1}^m (αk,rk i)² ),   λ0^k = 1/γk,   λi^k = αk,rk i/γk,  i = 1, 2, . . . , m.    (5.33)

As αk,rk ∈ Rm+, γk ≥ 1 for every k. Therefore, dividing (5.32) by γk along with the relation (5.33) leads to

    lim_{k→∞} { λ0^k f(xk,rk) − λ0^k finf + (δk² λ0^k/4k²) ‖xk,rk − x̄k‖² + Σ_{i=1}^m λi^k gi(xk,rk) } = 0.    (5.34)

By the saddle point condition (5.22),

    Lk,rk(xk,rk, αk,rk) ≤ Lk,rk(x, αk,rk), ∀ x ∈ Xk.

Dividing the above inequality throughout by γk along with the fact that ‖x − x̄k‖ ≤ 2βk for every x ∈ Xk implies that

    λ0^k f(xk,rk) + (δk² λ0^k/4k²) ‖xk,rk − x̄k‖² + Σ_{i=1}^m λi^k gi(xk,rk)
        ≤ λ0^k f(x) + (δk² λ0^k/4k²) ‖x − x̄k‖² + Σ_{i=1}^m λi^k gi(x).

From (5.33), for every k, λi^k ≥ 0, i = 0, 1, . . . , m, such that

    (λ0^k)² + Σ_{i=1}^m (λi^k)² = 1.


Therefore {λi^k}, i = 0, 1, . . . , m, are bounded sequences and hence, by the Bolzano–Weierstrass Theorem, Proposition 1.3, have convergent subsequences. Without loss of generality, assume that λi^k → λi, i = 0, 1, . . . , m, with λi ≥ 0, i = 0, 1, . . . , m, not all simultaneously zero. Taking the limit as k → +∞ in the above inequality along with (5.34) leads to

    λ0 finf ≤ λ0 f(x) + Σ_{i=1}^m λi gi(x), ∀ x ∈ X.

Therefore,

    λ0 finf ≤ inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }
            ≤ inf_{x∈X, gi(x)≤0 ∀i} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }
            ≤ inf_{x∈X, gi(x)≤0 ∀i} λ0 f(x)
            = λ0 finf,

which leads to condition (i).
Now dividing the condition (5.24) by γk, which along with (5.33) leads to

    λi^k = rk gi⁺(xk,rk)/γk, i = 1, 2, . . . , m.

As k → +∞,

    λi = lim_{k→∞} rk gi⁺(xk,rk)/γk, i = 1, 2, . . . , m.

In the beginning of the proof, we assumed that there exists x̄ ∈ X satisfying f(x̄) < finf, which along with the condition (i) implies that the index set Ī = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. Otherwise, if Ī is empty, then

    inf_{x∈X} λ0 f(x) ≤ λ0 f(x̄) < λ0 finf,

which is a contradiction to condition (i). For i ∈ Ī, λi > 0, which implies that λi^k > 0 for all sufficiently large k. Therefore, gi⁺(xk,rk) > 0, that is,

    gi(xk,rk) > 0, ∀ i ∈ Ī,

for all sufficiently large k.
In particular, for r = rk ≥ 1/√δk, conditions (5.23) and (5.24) yield

    f(xk,rk) + (rk/2) ‖g⁺(xk,rk)‖² ≤ f(xk,rk) + (δk²/4k²) ‖xk,rk − x̄k‖² + (rk/2) ‖g⁺(xk,rk)‖² ≤ f(x̄k),

which along with (5.28) and the relation δk = f(x̄k) − finf ≥ 0 implies that

    rk ‖g⁺(xk,rk)‖² ≤ 3δk.

As k → +∞, δk → 0 and rk ≥ 1/√δk → +∞, so the above inequality leads to g⁺(xk,rk) → 0, that is,

    lim sup_{k→∞} gi(xk,rk) ≤ 0, i = 1, 2, . . . , m.


Also, from the condition (5.28),

    f(xk,rk) < finf   and   lim_{k→∞} f(xk,rk) = finf.

Thus, the condition (ii) is satisfied by the sequence {xk,rk} ⊂ X, thereby yielding the desired result. 

Under the Slater-type constraint qualification, the multiplier λ0 can be ensured to be nonzero and hence can be normalized to one.

5.5 Enhanced Dual Fritz John Optimality Conditions


In this chapter we emphasize the enhanced Fritz John conditions. As observed in Section 5.4, we dealt with the situation where the infimum of the original problem (CP 1) exists but is not attained. Those results were extended by Bertsekas, Ozdaglar, and Tseng [14] to the dual scenario, where the dual problem has a supremum that need not be attained. Now corresponding to the problem (CP 1), the associated dual problem is

    sup w(λ)  subject to  λ ∈ Rm+,    (DP 1)

where w(λ) = inf_{x∈X} L(x, λ) with

    L(x, λ) = f(x) + Σ_{i=1}^m λi gi(x)  if λ ∈ Rm+,   and   L(x, λ) = −∞ otherwise.
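For a feel of the dual objects w and wsup (a simple one-dimensional illustration, not from the text): take X = R, f(x) = x² and g1(x) = 1 − x, so that finf = min{x² : x ≥ 1} = 1. For λ1 ≥ 0,

    w(λ1) = inf_{x∈R} { x² + λ1 (1 − x) } = λ1 − λ1²/4,

attained at x = λ1/2. The problem (DP 1) maximizes this concave function over λ1 ≥ 0, giving wsup = w(2) = 1 = finf; in this example the supremum of (DP 1) is attained and coincides with the infimum of (CP 1).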

Before presenting the enhanced dual Fritz John optimality condition, we


first prove a lemma that will be required in establishing the theorem.

Lemma 5.14 Consider the convex programming problem (CP 1) where the functions f and gi, i = 1, 2, . . . , m, are lsc and convex on the convex set X ⊂ Rn and (DP 1) is the associated dual problem. Let finf < +∞ be the infimum of (CP 1) and, for every δ > 0, set

    fδ = inf_{x∈X, gi(x)≤δ ∀i} f(x).

Then the supremum of (DP 1), wsup, satisfies fδ ≤ wsup for every δ > 0 and

    wsup = lim_{δ↓0} fδ.

Proof. For the problem (CP 1), as the infimum finf exists and finf < +∞, the feasible set of (CP 1) is nonempty, that is, there exists x̄ ∈ X satisfying gi(x̄) ≤ 0, i = 1, 2, . . . , m. Thus for δ > 0, the problem

    inf f(x)  subject to  gi(x) ≤ δ, i = 1, 2, . . . , m,  x ∈ X,    (CP 1δ)

satisfies the Slater-type constraint qualification as x̄ ∈ X with gi(x̄) < δ, i = 1, 2, . . . , m. Therefore, by Theorem 5.11, there exist λi^δ ≥ 0, i = 1, 2, . . . , m, such that

    fδ = inf_{x∈X} { f(x) + Σ_{i=1}^m λi^δ gi(x) − δ Σ_{i=1}^m λi^δ }
       ≤ inf_{x∈X} { f(x) + Σ_{i=1}^m λi^δ gi(x) }
       = w(λ^δ)
       ≤ sup_{λ∈Rm+} w(λ) = wsup.

Therefore, for every δ > 0, fδ ≤ wsup and hence

    lim_{δ↓0} fδ ≤ wsup.    (5.35)

Now as δ → 0, the feasible region of (CP 1δ) shrinks and thus fδ is nondecreasing as δ ↓ 0, and for every δ > 0, fδ ≤ finf. This leads to two cases: either limδ→0 fδ > −∞ or limδ→0 fδ = −∞.
If limδ→0 fδ > −∞, then fδ > −∞ for every δ > 0 sufficiently small. For those δ > 0, choose xδ ∈ X such that gi(xδ) ≤ δ, i = 1, 2, . . . , m, and f(xδ) ≤ fδ + δ. Such an xδ is called an almost δ-solution of (CP 1), a concept that will be dealt with in Chapter 10. Therefore for λ ∈ Rm+,

    w(λ) = inf_{x∈X} { f(x) + Σ_{i=1}^m λi gi(x) }
         ≤ f(xδ) + Σ_{i=1}^m λi gi(xδ)
         ≤ fδ + δ + δ Σ_{i=1}^m λi.

Taking the limit as δ → 0 in the above inequality leads to

    w(λ) ≤ lim_{δ→0} fδ, ∀ λ ∈ Rm+,

which implies wsup ≤ limδ→0 fδ.
If limδ→0 fδ = −∞, then for δ > 0, choose xδ ∈ X such that gi(xδ) ≤ δ, i = 1, 2, . . . , m, and f(xδ) ≤ −1/δ. As in the previous case, for λ ∈ Rm+,

    w(λ) ≤ −1/δ + δ Σ_{i=1}^m λi,


which leads to w(λ) = −∞ for every λ ∈ Rm+ as δ ↓ 0 and hence wsup = −∞ = limδ→0 fδ. From both these cases along with the condition (5.35), the requisite result is established. 
Finally, we present the enhanced dual Fritz John optimality conditions
obtained by Bertsekas, Ozdaglar, and Tseng [14], which are expressed with
respect to the supremum of the dual problem (DP 1).

Theorem 5.15 Consider the convex programming problem (CP 1) where the
functions f and gi , i = 1, 2, . . . , m, are lsc and convex on the closed convex
set X ⊂ Rn and (DP 1) is the associated dual problem. Let finf < +∞ be the
infimum of (CP 1) and wsup > −∞ be the supremum of (DP 1). Then there
exist λi ≥ 0 for i = 0, 1, . . . , m, not all simultaneously zero, such that

(i) λ0 wsup = inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }.

(ii) Consider the index set Ī = {i ∈ {1, 2, . . . , m} : λi > 0}. If Ī ≠ ∅, then there exists a sequence {xk} ⊂ X such that

    lim_{k→∞} f(xk) = wsup   and   lim sup_{k→∞} gi(xk) ≤ 0, i = 1, 2, . . . , m,

and for all k sufficiently large

    f(xk) < wsup   and   gi(xk) > 0, ∀ i ∈ Ī.

Proof. By the weak duality, wsup ≤ finf, which along with the hypothesis implies that finf and wsup are finite. For k = 1, 2, . . ., consider the problem

    min f(x)  subject to  gi(x) ≤ 1/k⁴, i = 1, 2, . . . , m,  x ∈ X.    (CP 1k)

By Lemma 5.14, the infimum finf^k of (CP 1k) satisfies the condition finf^k ≤ wsup for every k. For each k, consider x̂k ∈ X such that

    f(x̂k) ≤ wsup + 1/k²   and   gi(x̂k) ≤ 1/k⁴, i = 1, 2, . . . , m.    (5.36)

Now consider another problem:

    min f(x)  subject to  gi(x) ≤ 1/k⁴, i = 1, 2, . . . , m,  x ∈ X̂k,    (ĈP 1k)

where X̂k = X ∩ {x ∈ Rn : ‖x‖ ≤ k (max_{j=1,...,k} ‖x̂j‖ + 1)} is a compact set. By the lower semicontinuity and convexity of f and gi, i = 1, 2, . . . , m, over X, the functions are lsc convex and coercive on X̂k. Therefore, by the Weierstrass Theorem, Theorem 1.14, (ĈP 1k) has a point of minimizer, say x̄k. From (5.36), x̂k is feasible for (ĈP 1k), which leads to

    f(x̄k) ≤ f(x̂k) ≤ wsup + 1/k².    (5.37)

© 2012 by Taylor & Francis Group, LLC


238 Enhanced Fritz John Optimality Conditions

For every k, define the Lagrangian function as

    Lk(x, α) = f(x) + Σ_{i=1}^m αi gi(x) − ‖α‖²/(2k)

and the set

    Xk = X̂k ∩ {x ∈ Rn : gi(x) ≤ k, i = 1, 2, . . . , m}.    (5.38)

For a fixed α ∈ Rm+, Lk(·, α) is lsc convex and coercive on Xk by the lower semicontinuity, convexity, and coercivity of f and gi, i = 1, 2, . . . , m, on X̂k, whereas for a given x ∈ Xk, Lk(x, ·) is quadratic negative definite in α. Then by the Saddle Point Theorem, Proposition 4.1, Lk has a saddle point over Xk × Rm+, say (xk, αk), that is,

    Lk(xk, α) ≤ Lk(xk, αk) ≤ Lk(x, αk), ∀ x ∈ Xk, ∀ α ∈ Rm+.

Because Lk(xk, ·) is quadratic negative definite, it attains a supremum over Rm+ at

    αi^k = k gi⁺(xk), i = 1, 2, . . . , m.    (5.39)
Also, as Lk(·, αk) attains the infimum over Xk at xk, which along with (5.37), (5.38), and (5.39) implies

    Lk(xk, αk) = f(xk) + Σ_{i=1}^m αi^k gi(xk) − ‖αk‖²/(2k)
               ≤ f(xk) + Σ_{i=1}^m αi^k gi(xk)
               = inf_{x∈Xk} { f(x) + k Σ_{i=1}^m gi⁺(xk) gi(x) }
               ≤ inf_{x∈Xk, gi(x)≤1/k⁴ ∀i} { f(x) + k Σ_{i=1}^m gi⁺(xk) gi(x) }.

As xk ∈ Xk, gi(xk) ≤ k for i = 1, 2, . . . , m. Therefore, the above inequality leads to

    Lk(xk, αk) ≤ inf_{x∈Xk, gi(x)≤1/k⁴ ∀i} { f(x) + m/k² } = f(x̄k) + m/k² ≤ wsup + (m+1)/k².    (5.40)

Due to the finiteness of wsup, there exists a sequence {µk} ⊂ Rm+ satisfying

    w(µk) → wsup   and   ‖µk‖²/(2k) → 0,    (5.41)


which is ensured by choosing µk as a point of maximizer of the problem

    max w(α)  subject to  ‖α‖ ≤ k^{1/3}, α ∈ Rm+.

Thus for every k,

    Lk(xk, αk) = sup_{α∈Rm+} inf_{x∈Xk} Lk(x, α)
               ≥ sup_{α∈Rm+} inf_{x∈X} Lk(x, α)
               = sup_{α∈Rm+} { inf_{x∈X} { f(x) + Σ_{i=1}^m αi gi(x) } − ‖α‖²/(2k) }
               = sup_{α∈Rm+} { w(α) − ‖α‖²/(2k) }
               ≥ w(µk) − ‖µk‖²/(2k).    (5.42)

From the conditions (5.40) and (5.42),

    w(µk) − ‖µk‖²/(2k) ≤ f(xk) + Σ_{i=1}^m αi^k gi(xk) − ‖αk‖²/(2k)
                       ≤ f(xk) + Σ_{i=1}^m αi^k gi(xk)
                       ≤ wsup + (m+1)/k².    (5.43)

Taking the limit as k → +∞ in the above inequality, which along with (5.41) implies that

    lim_{k→∞} { f(xk) − wsup + Σ_{i=1}^m αi^k gi(xk) } = 0.    (5.44)

Define

    γk = √( 1 + Σ_{i=1}^m (αi^k)² ),   λ0^k = 1/γk   and   λi^k = αi^k/γk,  i = 1, 2, . . . , m.    (5.45)

As αk ∈ Rm+, from the above condition it is obvious that γk ≥ 1 for every k and thus dividing (5.44) by γk yields

    lim_{k→∞} { λ0^k (f(xk) − wsup) + Σ_{i=1}^m λi^k gi(xk) } = 0.    (5.46)

As xk minimizes Lk(·, αk) over Xk,

    f(xk) + Σ_{i=1}^m αi^k gi(xk) ≤ f(x) + Σ_{i=1}^m αi^k gi(x), ∀ x ∈ Xk,


which on dividing throughout by γk leads to

    λ0^k f(xk) + Σ_{i=1}^m λi^k gi(xk) ≤ λ0^k f(x) + Σ_{i=1}^m λi^k gi(x), ∀ x ∈ Xk.

From the condition (5.45),

    (λ0^k)² + Σ_{i=1}^m (λi^k)² = 1,

which implies that the sequences {λi^k} ⊂ R+, i = 0, 1, . . . , m, are bounded and thus, by the Bolzano–Weierstrass Theorem, Proposition 1.3, have a convergent subsequence. Without loss of generality, let λi^k → λi with λi ≥ 0, i = 0, 1, . . . , m, not all simultaneously zero. Therefore, letting k → +∞ in the preceding inequality, which along with (5.46) yields

    λ0 wsup ≤ λ0 f(x) + Σ_{i=1}^m λi gi(x), ∀ x ∈ X,

which leads to

    λ0 wsup ≤ inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }.    (5.47)

If λ0 > 0, then from the above inequality (5.47),

    wsup ≤ inf_{x∈X} { f(x) + Σ_{i=1}^m (λi/λ0) gi(x) } = w(λ/λ0) ≤ wsup,

thereby satisfying condition (i).
If λ0 = 0, then the relation (5.47) reduces to

    0 ≤ inf_{x∈X} Σ_{i=1}^m λi gi(x).

As finf exists and is finite, the feasible set of (CP 1) is nonempty, which implies that there exists x ∈ X satisfying gi(x) ≤ 0, i = 1, 2, . . . , m. Therefore, the above condition becomes

    0 = inf_{x∈X} Σ_{i=1}^m λi gi(x).

Therefore in both cases, condition (i) holds, that is,

    λ0 wsup = inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }.


Now suppose that the index set Ī = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. Dividing the condition (5.39) throughout by γk and using (5.45),

    λi^k = k gi⁺(xk)/γk, i = 1, 2, . . . , m.

As k → +∞, λi^k → λi, i = 1, 2, . . . , m, thereby reducing the above equality to

    λi = lim_{k→∞} k gi⁺(xk)/γk, i = 1, 2, . . . , m.

For any i ∈ Ī, λi > 0, which implies for sufficiently large k that gi⁺(xk) > 0, that is,

    gi(xk) > 0, ∀ i ∈ Ī.

From the inequalities (5.43), for every k,

    k (f(xk) − wsup) + k Σ_{i=1}^m αi^k gi(xk) ≤ (m+1)/k.

By the condition (5.39), Σ_{i=1}^m αi^k gi(xk) = ‖αk‖²/k. Therefore, the above inequality becomes

    k (f(xk) − wsup) + Σ_{i=1}^m (αi^k)² ≤ (m+1)/k.

Dividing the above inequality throughout by γk², which along with (5.45) implies that

    k (f(xk) − wsup)/γk² + Σ_{i=1}^m (λi^k)² ≤ (m+1)/(k γk²),

which as k → +∞ yields

    lim sup_{k→∞} k (f(xk) − wsup)/γk² ≤ − Σ_{i=1}^m λi².    (5.48)

As Ī is nonempty, the above inequality leads to

    lim sup_{k→∞} k (f(xk) − wsup)/γk² < 0,

which for sufficiently large k implies that f(xk) < wsup.


Now from (5.41) and (5.43),

    lim_{k→∞} { f(xk) − wsup + Σ_{i=1}^m αi^k gi(xk) } − lim_{k→∞} ‖αk‖²/(2k) = 0,

which by the condition (5.44) implies that

    lim_{k→∞} ‖αk‖²/(2k) = 0.    (5.49)

The condition (5.39) along with (5.41) and (5.43) leads to

    lim_{k→∞} { (f(xk) − wsup) + ‖αk‖²/(2k) } = 0,

which together with (5.49) implies that f(xk) → wsup. Also, (5.49) along with (5.39) yields

    lim_{k→∞} k Σ_{i=1}^m (gi⁺(xk))² = 0,

which shows that

    lim sup_{k→∞} gi(xk) ≤ 0, i = 1, 2, . . . , m.

Thus for nonempty Ī, the sequence {xk} ⊂ X satisfies condition (ii), thereby establishing the desired result. 



Chapter 6

Optimality without Constraint Qualification

6.1 Introduction
In the last few chapters we saw how fundamental a role constraint qualifications, such as the Slater constraint qualification, play in convex optimization. In
Chapter 3 we saw that a relaxation of the Slater constraint qualification to
the Abadie constraint qualification leads to an asymptotic version of the KKT
conditions for the nonsmooth convex programming problems. Thus it is inter-
esting to ask whether it is possible to develop necessary and sufficient opti-
mality conditions for (CP ) without any constraint qualifications. Recently a
lot of work has been done in this respect in the form of sequential optimality
conditions. But to the best of our knowledge the first step in this direction
was taken by Ben-Tal, Ben-Israel, and Zlobec [7]. They obtained the necessary
and sufficient optimality conditions in the smooth scenario in the absence of
constraint qualifications. This work was extended to the nonsmooth scenario
by Wolkowicz [112]. All these studies involved direction sets, which we will
discuss below. So before moving on with the discussion of the results derived
by Ben-Tal, Ben-Israel, and Zlobec [7], and Wolkowicz [112], we present the
notion of direction sets. Before that we introduce the definition of a blunt
cone.
A set K ⊂ Rn is said to be a cone (Definition 2.18) if

    λx ∈ K whenever λ ≥ 0 and x ∈ K,

whereas K is a blunt cone if K is a cone without the origin, that is,

    0 ∉ K   and   λx ∈ K whenever x ∈ K and λ > 0.

For example, R²+ \ {(0, 0)} is a blunt cone, while the set K ⊂ R² given as K = {(x, y) ∈ R² : x = y} is not a blunt cone.
Definition 6.1 Let φ : Rn → R be a given function and let x̄ ∈ Rn be any given point. Then the set

    Dφrelation(x̄) = {d ∈ Rn : there exists ᾱ > 0 such that φ(x̄ + αd) relation φ(x̄), ∀ α ∈ (0, ᾱ]},

where the relation can be =, ≤, <, ≥, or >.
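For orientation (a small illustration, not from the text): take φ(x1, x2) = x1 and x̄ = (0, 0). Since φ(x̄ + αd) − φ(x̄) = α d1 for every α > 0,

    Dφ=(x̄) = {d ∈ R² : d1 = 0},   Dφ<(x̄) = {d ∈ R² : d1 < 0},   Dφ≤(x̄) = {d ∈ R² : d1 ≤ 0},

which is consistent with the characterizations of Dφ≤ and Dφ< through the directional derivative φ′(x̄, d) = d1 given in Proposition 6.2 (iii) below.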

In particular, the set Dφ= is called the cone of directions of constancy that was
considered by Ben-Tal, Ben-Israel, and Zlobec [7]. The other direction sets
were introduced in the work of Wolkowicz [112]. We present certain examples
of computing explicitly the set Dφ= (x̄) from Ben-Tal, Ben-Israel, and Zlobec [7].
For a strictly convex function φ : Rn → R, Dφ= (x̄) = {0} for any x̄ ∈ Rn .
Another interesting example from Ben-Tal, Ben-Israel, and Zlobec [7] is the
cone of the directions of constancy for the so-called faithfully convex function
given as

    φ(x) = h(Ax + b) + ⟨a, x⟩ + β,

where h : Rm → R is strictly convex, A is an m × n matrix, b ∈ Rm, a ∈ Rn, and β ∈ R. The class of faithfully convex functions is quite broad, comprising all the strictly convex functions and quadratic convex functions. See Rockafellar [98] for more details. In the case of faithfully convex functions,

    Dφ=(x̄) = Null( [A; aᵀ] ) = {d ∈ Rn : Ad = 0, ⟨a, d⟩ = 0},

where Null(S) is the null space of the matrix S. It is obvious that the null
space is contained in Dφ= . For the sake of completeness, we provide an expla-
nation for the reverse containment. We consider the following cases.

1. Ad = 0: Then by the definition of direction of constancy, ha, di = 0.

2. ha, di = 0: Suppose that d ∈ Dφ= (x̄), which implies there exists ᾱ > 0
such that

h(Ax̄ + αAd + b) = h(Ax̄ + b), ∀ α ∈ (0, ᾱ].

Suppose Ad 6= 0, then Ax̄ + αAd + b 6= Ax̄ + b for every α ∈ (0, ᾱ]. Now
two cases arise. If h(Ax̄ + α̂Ad + b) = h(Ax̄ + b) for some α̂ ∈ (0, ᾱ],
then by the strict convexity of h, for every λ ∈ (0, 1),

h(Ax̄ + λα̂Ad + b) < (1 − λ)h(Ax̄ + b) + λh(Ax̄ + α̂Ad + b),

which implies

h(Ax̄ + αAd + b) < h(Ax̄ + b), ∀ α ∈ (0, α̂)

and hence, d ∉ Dφ=(x̄).
The second case is that h(Ax̄ + αAd + b) ≠ h(Ax̄ + b) for every α ∈ (0, ᾱ]. Then again it implies that d ∉ Dφ=(x̄), which violates our assumption. Therefore, for d to be a direction of constancy, Ad = 0.


3. Ad 6= 0, ha, di 6= 0: This implies d 6= 0. We will show that φ is strictly


convex on the line segment [x̄, x̄ + ᾱd]. Consider xi = x̄ + αi d, i = 1, 2,
where αi ∈ [0, ᾱ] and α1 6= α2 . Therefore x1 6= x2 . By the strict convex-
ity of h, for every λ ∈ (0, 1),

h(A(λx1 + (1 − λ)x2 ) + b) < λh(Ax1 + b) + (1 − λ)h(Ax2 + b),

and by linearity of ha, .i,

ha, λx1 + (1 − λ)x2 i = λha, x1 i + (1 − λ)ha, x2 i.

Combining the above two conditions,

φ(λx1 + (1 − λ)x2 ) < λφ(x1 ) + (1 − λ)φ(x2 ), ∀ λ ∈ (0, 1).

This condition holds for every x1 , x2 ∈ [x̄, x̄ + αd] and thus φ is strictly
convex on [x̄, x̄ + αd]. Hence as mentioned earlier, Dφ= (x̄) = {0} for the
strictly convex function φ. But this contradicts the fact that d 6= 0.

Combining the above cases, we have

Dφ= (x̄) = {d ∈ Rn : Ad = 0, ha, di = 0}.
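As a concrete instance of this formula (an illustration, not from the text): take n = 2 and φ(x1, x2) = x1², which is faithfully convex with h(t) = t², A = (1 0), b = 0, a = (0, 0), and β = 0. The formula gives

    Dφ=(x̄) = {d ∈ R² : d1 = 0}

at every x̄ ∈ R², which matches the direct computation: φ(x̄ + αd) = (x̄1 + αd1)² equals φ(x̄) = x̄1² for all small α > 0 precisely when d1 = 0. Thus a faithfully convex function that is not strictly convex can have a nontrivial cone of directions of constancy.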

Below we present some results on the direction sets that will be required
in deriving the optimality conditions from Ben-Tal, Ben-Israel, and Zlobec [7],
Ben-Tal and Ben-Israel [6], and Wolkowicz [112].

Proposition 6.2 (i) Consider a function φ : Rn → R and x̄ ∈ Rn. Then

    Dφ=(x̄) ⊂ {d ∈ Rn : φ′(x̄, d) = 0}.

(ii) Consider a differentiable convex function φ : Rn → R and x̄ ∈ Rn. Then Dφ=(x̄) is a convex cone.

(iii) Consider a convex function φ : Rn → R and x̄ ∈ Rn. Then Dφ≤(x̄) is a convex cone while Dφ<(x̄) is a convex blunt open cone. Also

    Dφ≤(x̄) = {d ∈ Rn : φ′(x̄, d) ≤ 0}   and   Dφ<(x̄) = {d ∈ Rn : φ′(x̄, d) < 0}.

(iv) Consider a convex function φ : Rn → R and x̄ ∈ Rn. Assume that Dφ<(x̄) ≠ ∅ (equivalently, 0 ∉ ∂φ(x̄)). Then

    (Dφ≤(x̄))◦ = cone ∂φ(x̄).

Proof. (i) Consider d ∈ Dφ=(x̄), which implies there exists ᾱ > 0 such that

    φ(x̄ + αd) = φ(x̄), ∀ α ∈ (0, ᾱ].

Therefore, from the above condition,

    lim_{α↓0} (φ(x̄ + αd) − φ(x̄))/α = 0,

which implies φ′(x̄, d) = 0, thereby yielding the desired result.


(ii) Consider d ∈ Dφ=(x̄), which implies there exists ᾱ > 0 such that

    φ(x̄ + αd) = φ(x̄), ∀ α ∈ (0, ᾱ].

The above condition can be rewritten as

    φ(x̄ + α′d′) = φ(x̄), ∀ α′ ∈ (0, ᾱ′],

where d′ = λd, α′ = α/λ and ᾱ′ = ᾱ/λ for any λ > 0. Also 0 ∈ Dφ=(x̄). Therefore, λd ∈ Dφ=(x̄) for every λ ≥ 0 and hence Dφ=(x̄) is a cone.
Now consider d1, d2 ∈ Dφ=(x̄). Then for i = 1, 2, there exists ᾱi > 0 such that

    φ(x̄ + αi di) = φ(x̄), ∀ αi ∈ (0, ᾱi].

Taking ᾱ = min{ᾱ1, ᾱ2} > 0, for i = 1, 2 the above condition becomes

    φ(x̄ + αdi) = φ(x̄), ∀ α ∈ (0, ᾱ].    (6.1)

For any λ ∈ [0, 1], consider d = λd1 + (1 − λ)d2. The convexity of φ along with (6.1) on d1 and d2 yields

    φ(x̄ + αd) = φ(λ(x̄ + αd1) + (1 − λ)(x̄ + αd2)) ≤ λφ(x̄ + αd1) + (1 − λ)φ(x̄ + αd2), ∀ α ∈ (0, ᾱ],

that is,

    φ(x̄ + αd) ≤ φ(x̄), ∀ α ∈ (0, ᾱ].    (6.2)

Again, by the convexity of φ in the differentiable case, for every α ∈ (0, ᾱ],

    φ(x̄ + αd) ≥ φ(x̄) + α⟨∇φ(x̄), d⟩ = φ(x̄) + αλ⟨∇φ(x̄), d1⟩ + α(1 − λ)⟨∇φ(x̄), d2⟩.    (6.3)

For a differentiable convex function, φ′(x̄, d) = ⟨∇φ(x̄), d⟩ for any d ∈ Rn. Thus the relation in (i) becomes

    Dφ=(x̄) ⊂ {d ∈ Rn : ⟨∇φ(x̄), d⟩ = 0},

which reduces the inequality (6.3) to

    φ(x̄ + αd) ≥ φ(x̄), ∀ α ∈ (0, ᾱ]

as d1, d2 ∈ Dφ=(x̄). This inequality along with the condition (6.2) implies that d ∈ Dφ=(x̄), thereby leading to the convexity of Dφ=.
(iii) We will prove the result for Dφ<(x̄). Consider d ∈ Dφ<(x̄), which implies there exists ᾱ > 0 such that

    φ(x̄ + αd) < φ(x̄), ∀ α ∈ (0, ᾱ].

As done in (ii), the above inequality can be rewritten as

    φ(x̄ + α′d′) < φ(x̄), ∀ α′ ∈ (0, ᾱ′],

where d′ = λd, α′ = α/λ and ᾱ′ = ᾱ/λ for any λ > 0. Note that 0 ∉ Dφ<(x̄). Therefore, λd ∈ Dφ<(x̄) for every λ > 0 and hence Dφ<(x̄) is a blunt cone.
Now consider d1, d2 ∈ Dφ<(x̄). Working along the lines of the proof in (ii), for i = 1, 2,

    φ(x̄ + αdi) < φ(x̄), ∀ α ∈ (0, ᾱ].    (6.4)

For any λ ∈ [0, 1], let d = λd1 + (1 − λ)d2. The convexity of φ along with the condition (6.4) on d1 and d2 yields

    φ(x̄ + αd) ≤ λφ(x̄ + αd1) + (1 − λ)φ(x̄ + αd2) < φ(x̄), ∀ α ∈ (0, ᾱ],

thereby implying the convexity of Dφ<. From Definition 6.1, it is obvious that Dφ< is open by using the continuity of φ.
Consider d ∈ Dφ<(x̄), which implies that there exists ᾱ > 0 such that

    φ(x̄ + αd) < φ(x̄), ∀ α ∈ (0, ᾱ].

By the convexity of φ, for every α ∈ (0, ᾱ],

    α⟨ξ, d⟩ ≤ φ(x̄ + αd) − φ(x̄) < 0, ∀ ξ ∈ ∂φ(x̄).

As dom φ = Rn, by Theorem 2.79 and Proposition 2.83, the directional derivative is the support function of the subdifferential, which along with the compactness of ∂φ(x̄) is attained at some ξ ∈ ∂φ(x̄). Thus φ′(x̄, d) < 0, which leads to

    Dφ<(x̄) ⊂ {d ∈ Rn : φ′(x̄, d) < 0}.    (6.5)

Now consider d ∈ Rn such that φ′(x̄, d) < 0, that is,

    lim_{α↓0} (φ(x̄ + αd) − φ(x̄))/α < 0.

Therefore, there exists ᾱ > 0 such that

    φ(x̄ + αd) < φ(x̄), ∀ α ∈ (0, ᾱ],

which implies d ∈ Dφ<(x̄), thereby establishing the equality in the relation (6.5). Working along the above lines of proof, readers are advised to prove the result for Dφ≤(x̄).
(iv) As Dφ<(x̄) is nonempty, (iii) implies that

    φ′(x̄, d) < 0, ∀ d ∈ Dφ<(x̄).

Because dom φ = Rn, by Theorem 2.79, the directional derivative acts as a support function of the subdifferential, which along with the above relation is equivalent to 0 ∉ ∂φ(x̄). Therefore, by Proposition 3.4, cone ∂φ(x̄) is closed. The proof can be worked out along the lines of Proposition 3.9 by replacing Ŝ(x̄) by Dφ≤(x̄) and cl Ŝ(x̄) by the closed set cone ∂φ(x̄). 

Note that unlike (iii), where the relation holds as an equality, one is able to prove only inclusion in (i) and not equality. For example, consider the strictly convex function φ : R → R defined as φ(x) = x². For x̄ = 0, Dφ=(x̄) = {0} and ∇φ(x̄) = 0. Observe that

    {d ∈ R : ⟨∇φ(x̄), d⟩ = 0} = R ≠ {0} = Dφ=(x̄).

Hence, the equality need not hold in (i) even for a differentiable function. Also, for a differentiable function φ : Rn → R, if there are n linearly independent vectors di ∈ Dφ=(x̄), i = 1, 2, . . . , n, then ∇φ(x̄) = 0. Observe that one needs the differentiability assumption only in (ii). A careful look at the proof of (ii) shows that to prove the reverse inequality in (6.2), we make use of (i) under differentiability. So if φ is nondifferentiable, to prove the result one needs to assume that for some ξ ∈ ∂φ(x̄), φ′(x̄, d) = ⟨ξ, d⟩ = 0 for every d ∈ Dφ=(x̄). For a better understanding, we illustrate with an example from Ben-Tal, Ben-Israel, and Zlobec [7]. Consider the convex nondifferentiable function φ : R² → R defined as

    φ(x1, x2) = max{x1, x2}.

For x̄ = (0, 0), ∂φ(x̄) = co {(1, 0), (0, 1)} and

    Dφ=(x̄) = {(d, 0) ∈ R² : d ≤ 0} ∪ {(0, d) ∈ R² : d ≤ 0},

which is not convex. Note that ⟨(ξ̄1, ξ̄2), (d, 0)⟩ = 0 for ξ̄ = (0, 1) whereas ⟨(ξ̃1, ξ̃2), (0, d)⟩ = 0 for ξ̃ = (1, 0); that is, the equality φ′(x̄, d) = ⟨ξ, d⟩ = 0 holds with different subgradients ξ̄ ≠ ξ̃ of ∂φ(x̄) on the two pieces of Dφ=(x̄).
work done by Ben-Tal, Ben-Israel, and Zlobec [7].


6.2 Geometric Optimality Condition: Smooth Case


Ben-Tal, Ben-Israel, and Zlobec [7] established the necessary and sufficient
optimality conditions for (CP ) with the feasible set C given by (3.1), that is,
C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},
in the absence of any constraint qualifications in the smooth scenario. The
result relates the point of minimizer of (CP ) with the inconsistency of a
system. We present the result below. Throughout we will assume that the
active index set I(x̄) = {i ∈ {1, 2, . . . , m} : gi (x̄) = 0} is nonempty.
Theorem 6.3 Consider the convex programming problem (CP) with C given by (3.1). Let f and gi, i = 1, 2, . . . , m, be differentiable convex functions. Then x̄ is a point of minimizer of (CP) if and only if for every subset Ω ⊂ I(x̄) the system

    ⟨∇f(x̄), d⟩ < 0,
    ⟨∇gi(x̄), d⟩ < 0, i ∈ Ω,    (CPΩ)
    d ∈ Di=(x̄), i ∈ Ω∗ = I(x̄)\Ω

is inconsistent, where Di=(x̄) = Dgi=(x̄) for i ∈ Ω∗. It is important to note that for Ω = I(x̄), Ω∗ = ∅ and then by convention we will consider d ∈ Rn.

Proof. We will prove the negation of the result, that is, x̄ is not a point of minimizer of (CP) if and only if there exists some subset Ω ⊂ I(x̄) such that the system (CPΩ) is consistent. Suppose that x̄ is not a point of minimizer of (CP), which implies that there exists a feasible point x̃ ∈ C of (CP) such that f(x̃) < f(x̄). Therefore, by the convexity of the differentiable functions f and gi, i ∈ I(x̄), Theorem 2.81,

    ⟨∇f(x̄), x̃ − x̄⟩ ≤ f(x̃) − f(x̄) < 0,
    ⟨∇gi(x̄), x̃ − x̄⟩ ≤ gi(x̃) − gi(x̄) ≤ 0, i ∈ I(x̄),

which implies that d = x̃ − x̄ satisfies the system

    ⟨∇f(x̄), d⟩ < 0,
    ⟨∇gi(x̄), d⟩ ≤ 0, i ∈ I(x̄).

Define the subset Ω of I(x̄) as

    Ω = {i ∈ I(x̄) : ⟨∇gi(x̄), d⟩ < 0}.

Therefore, d satisfies the system

    ⟨∇f(x̄), d⟩ < 0,
    ⟨∇gi(x̄), d⟩ < 0, i ∈ Ω,
    ⟨∇gi(x̄), d⟩ = 0, i ∈ Ω∗.


We claim that for every i ∈ Ω∗,

    Di=(x̄) = {d ∈ Rn : ⟨∇gi(x̄), d⟩ = 0}.

By Proposition 6.2 (i),

    Di=(x̄) ⊂ {d ∈ Rn : gi′(x̄, d) = 0} = {d ∈ Rn : ⟨∇gi(x̄), d⟩ = 0}.    (6.6)

Thus, to establish our claim, we will prove the reverse inclusion in the condition (6.6). Consider any i ∈ Ω∗. Define a differentiable convex function Gi : R → R as Gi(λ) = gi(x̄ + λd). Therefore,

    ∇Gi(λ) = lim_{δ↓0} (Gi(λ + δ) − Gi(λ))/δ = lim_{δ↓0} (gi(x̄ + (λ + δ)d) − gi(x̄ + λd))/δ,

which for λ = 0 along with the fact that i ∈ Ω∗ implies that

    ∇Gi(0) = lim_{δ↓0} gi(x̄ + δd)/δ = ⟨∇gi(x̄), d⟩ = 0.

By Proposition 2.75, ∇Gi is nondecreasing over λ > 0, that is,

    ∇Gi(λ) ≥ ∇Gi(0) = 0, ∀ λ > 0.

Therefore, Gi is a nondecreasing function over λ > 0, which implies that λ = 0 is a point of minimizer of Gi. Hence,

    gi(x̄ + λd) = Gi(λ) ≥ Gi(0) = 0, ∀ λ > 0    (6.7)

as i ∈ Ω∗ ⊂ I(x̄). As x̃ = x̄ + d is feasible to (CP), for i ∈ Ω∗, gi(x̄ + d) ≤ 0. Thus, for λ = 1, the condition (6.7) reduces to gi(x̄ + d) = 0. By the convexity of gi,

    gi(x̄ + λd) = gi((1 − λ)x̄ + λ(x̄ + d)) ≤ (1 − λ)gi(x̄) + λgi(x̄ + d) = 0, ∀ λ ∈ (0, 1),

which by (6.7) yields

    gi(x̄ + λd) = 0, ∀ λ ∈ (0, 1].

Thus, d ∈ Di=(x̄). Because d ∈ {d ∈ Rn : ⟨∇gi(x̄), d⟩ = 0} was arbitrary,

    Di=(x̄) ⊃ {d ∈ Rn : ⟨∇gi(x̄), d⟩ = 0},

thereby proving the claim. As the claim holds for every i ∈ Ω∗, x̄ not being a point of minimizer of (CP) implies that the system (CPΩ) is consistent.


Conversely, suppose that the system (CPΩ) is consistent for some subset Ω ⊂ I(x̄), that is,

    ⟨∇f(x̄), d⟩ < 0,    (6.8)
    ⟨∇gi(x̄), d⟩ < 0, i ∈ Ω,    (6.9)
    d ∈ Di=(x̄), i ∈ Ω∗ = I(x̄)\Ω.    (6.10)

From the inequality (6.8),

    lim_{α↓0} (f(x̄ + αd) − f(x̄))/α < 0,

which implies there exists ᾱf > 0 such that

    f(x̄ + αd) < f(x̄), ∀ α ∈ (0, ᾱf].    (6.11)

Similarly, from the condition (6.9), there exist ᾱi > 0, i ∈ Ω, such that

    gi(x̄ + αd) < gi(x̄) = 0, ∀ α ∈ (0, ᾱi], i ∈ Ω.    (6.12)

From (6.10), d ∈ Di=(x̄), i ∈ Ω∗, which by Definition 6.1 implies that there exist ᾱi > 0, i ∈ Ω∗, such that

    gi(x̄ + αd) = gi(x̄) = 0, ∀ α ∈ (0, ᾱi], i ∈ Ω∗.    (6.13)

For i ∉ I(x̄), gi(x̄) < 0. As gi, i ∉ I(x̄), is continuous on Rn, there exist ᾱi > 0, i ∉ I(x̄), such that

    gi(x̄ + αd) < 0, ∀ α ∈ (0, ᾱi], i ∉ I(x̄).    (6.14)

Define ᾱ = min{ᾱf, ᾱ1, . . . , ᾱm}. Therefore, the conditions (6.12), (6.13), and (6.14) hold for ᾱ as well, which implies x̄ + ᾱd ∈ C, that is, x̄ + ᾱd is feasible for (CP). By the strict inequality (6.11),

    f(x̄ + ᾱd) < f(x̄),

thereby leading to the fact that x̄ is not a point of minimizer of (CP), as desired. 
We illustrate the above result with the following example. Consider the convex programming problem

    min −x1 + x2   subject to   x1 + x2 + 1 ≤ 0,   x2² ≤ 0.

Observe that x̄ = (−1, 0) is the point of minimizer of the above problem. The KKT optimality condition at x̄ is given by

    (−1, 1) + λ1 (1, 1) + λ2 (0, 0) = (0, 0),

which is not satisfied by any λi ≥ 0, i = 1, 2. For x̄, I(x̄) = {1, 2} with

    D1=(x̄) = {(d1, d2) ∈ R² : d1 + d2 = 0},
    D2=(x̄) = {(d1, d2) ∈ R² : d2 = 0}.

Now consider the following systems as in Theorem 6.3:

    −d1 + d2 < 0,   d1 + d2 = 0,   d2 = 0;    (CP∅)

    −d1 + d2 < 0,   d1 + d2 < 0,   d2 = 0;    (CP1)

    −d1 + d2 < 0,   0 < 0,   d1 + d2 = 0;    (CP2)

    −d1 + d2 < 0,   d1 + d2 < 0,   0 < 0.    (CPI(x̄))

Observe that all four systems are inconsistent. Therefore, by the above theorem, x̄ is the point of minimizer of the problem.
Now consider x̃ = (−2, 0), which is feasible for the problem, with I(x̃) = {2} and D2=(x̃) = D2=(x̄). For x̃, the system

    −d1 + d2 < 0,   0 < 0    (CPI(x̃))

is inconsistent, whereas

    −d1 + d2 < 0,   d2 = 0    (CP∅)

is consistent. Thus, by Theorem 6.3, x̃ is not the point of minimizer.
Theorem 6.3 was expressed in terms of the inconsistency of a system for
every subset Ω ⊂ I(x̄). Next we present the result of Ben-Tal, Ben-Israel, and
Zlobec [7] in terms of the Fritz John type optimality conditions. But before
establishing that result, we state the Dubovitskii–Milyutin Theorem, which
acts as a tool in the proof.

Proposition 6.4 Consider open blunt convex cones C1, C2, . . . , Cm and a convex cone Cm+1. Then

    ∩_{i=1}^{m+1} Ci = ∅

if and only if there exist yi ∈ Ci◦, i = 1, 2, . . . , m+1, not all simultaneously zero, such that

    y1 + y2 + . . . + ym + ym+1 = 0.

Theorem 6.5 Consider the convex programming problem (CP) with C given by (3.1). Let f and gi, i = 1, 2, . . . , m, be differentiable convex functions. Then x̄ is a point of minimizer of (CP) if and only if for every subset Ω ⊂ I(x̄) the system

    0 ∈ λ0 ∇f(x̄) + Σ_{i∈Ω} λi ∇gi(x̄) + (DΩ∗=(x̄))◦,
    λ0 ≥ 0, λi ≥ 0, i ∈ Ω, not all simultaneously zero    (CPΩ′)

is consistent, where

    DΩ∗=(x̄) = ∩_{i∈Ω∗} Di=(x̄)  if Ω∗ ≠ ∅,   and   DΩ∗=(x̄) = Rn  if Ω∗ = ∅.

Proof. From Theorem 6.3, x̄ is a point of minimum of (CP ) if and only


if for every subset Ω ⊂ I(x̄), the system (CPΩ ) is inconsistent, which by the
differentiability of f and gi , i ∈ Ω, along with Proposition 6.2 (iii) is equivalent
to
Df< (x̄) ∩ (∩_{i∈Ω} Di< (x̄)) ∩ DΩ∗= (x̄) = ∅,

where Di< (x̄) = Dg<i (x̄), i ∈ Ω. By Proposition 6.2 (c), Df< (x̄) and Di< (x̄), i ∈ Ω, are open blunt convex cones while DΩ∗= (x̄), being the intersection of convex cones, is itself a convex cone. Applying Propositions 6.2 (iv) and 2.80, for φ = f and φ = gi , i ∈ Ω,

(Dφ< (x̄))◦ = {y ∈ Rn : y = µ∇φ(x̄), µ ≥ 0}.

Now, applying the Dubovitskii–Milyutin Theorem, Proposition 6.4, the above emptiness is equivalent to the existence of multipliers λ0 ≥ 0, λi ≥ 0, i ∈ Ω, not all simultaneously zero, such that

0 ∈ λ0 ∇f (x̄) + Σ_{i∈Ω} λi ∇gi (x̄) + (DΩ∗= (x̄))◦ ,

thereby leading to the requisite result. 


Ben-Tal, Ben-Israel, and Zlobec [7] also dealt with the strictly convex case.
For more details, one can go through [7]. Observe that taking Ω = I(x̄) in
Theorem 6.5, the system (CPΩ′ ) reduces to the standard Fritz John optimality
condition. Similarly in Theorem 6.3, the system (CPΩ ) becomes

h∇f (x̄), di < 0,


h∇gi (x̄), di < 0, i ∈ I(x̄).


Similar to the notion of constraint qualification, they define the following


concept of regularization condition under which the result holds for Ω = I(x̄),
and the other subsets of I(x̄) need not be considered.

Definition 6.6 A condition is called a regularization condition at a point x̄


if, when assumed along with the convexity and differentiability conditions of
f and gi , i = 1, 2, . . . , m, the family {(CPΩ ) : Ω ⊂ I(x̄)} can be replaced by a
single system (CPI(x̄) ). Thus, x̄ is a point of minimum of (CP ) if and only if (CPI(x̄) ) is inconsistent or, equivalently, (CPI(x̄)′ ) is consistent. In this case, the Fritz John optimality condition is necessary as well as sufficient to characterize the point of minimum of (CP ).

In the example considered in this section, there is no regularization condition: in the case of x̃, the system (CPI(x̃) ) is inconsistent even though x̃ is not a minimizer, so one has to check the remaining systems, here (CP∅ ), to detect this.
As observed in Chapter 3, under the Slater constraint qualification, the
KKT optimality conditions are necessary as well as sufficient to check whether
a point is optimal or not. It has been shown in Ben-Tal, Ben-Israel, and
Zlobec [7] that the Slater constraint qualification acts as a regularization con-
dition for (CP ). We present the result below.

Proposition 6.7 Consider the convex programming problem (CP ) with C


given by (3.1). Let f and gi , i = 1, 2, . . . , m, be differentiable convex functions.
Then the Slater constraint qualification, that is, there exists x̂ ∈ Rn such that
gi (x̂) < 0, i = 1, 2, . . . , m, is a regularization condition for (CP ).

Proof. We prove the result by establishing the negation. From the definition
of regularization condition, it is equivalent to verifying that x̄ is not a point of
minimizer of (CP ) if and only if the system (CPI(x̄) ) is consistent. If (CPI(x̄) )
is consistent, then by Theorem 6.3, x̄ is not a point of minimizer of (CP ).
Conversely, suppose that x̄ is not a point of minimizer for (CP ). Again, by Theorem 6.3, there exist a subset Ω̄ ⊂ I(x̄) and d̄ ∈ Rn solving the system

h∇f (x̄), d̄i < 0,
h∇gi (x̄), d̄i < 0, i ∈ Ω̄,          (CPΩ̄ )
d̄ ∈ Di= (x̄), i ∈ Ω̄∗ = I(x̄)\Ω̄.
By Proposition 6.2 (i),

h∇gi (x̄), d̄i = 0, i ∈ Ω̄∗ . (6.15)

As x̂ satisfies the Slater constraint qualification, applying Theorem 2.81 to


gi , i ∈ I(x̄),

h∇gi (x̄), x̂ − x̄i ≤ gi (x̂) − gi (x̄) < 0, i ∈ I(x̄).

Define d̃ = d̄ + α(x̂ − x̄) for α > 0 sufficiently small. Then using the condition
(6.15), the system

h∇f (x̄), d̃i < 0,
h∇gi (x̄), d̃i < 0, i ∈ I(x̄),          (CPI(x̄) )

is consistent for d̃, thereby leading to the desired result.
Note that in the example considered in this section, the regularization
condition did not hold. As a matter of fact, the Slater constraint qualification
was not satisfied.

6.3 Geometric Optimality Condition: Nonsmooth Case


The work of Ben-Tal, Ben-Israel, and Zlobec [7] was extended by Wolkowicz [112] to the nonsmooth convex scenario. The latter not only studied the optimality conditions by avoiding constraint qualifications, but also gave a geometrical interpretation to what he termed badly behaved constraints. Before discussing the contributions of Wolkowicz [112] toward the convex programming problem (CP ) with the feasible set C given by (3.1), we introduce some notation. The equality set is given by
I = = {i ∈ {1, 2, . . . , m} : gi (x) = 0, ∀ x ∈ C}.
For x̄ ∈ C, define
I < (x̄) = I(x̄)\I = ,
where I(x̄) is the active index set at x̄. Observe that while I < (x̄) depends on
x̄, I = is independent of any x ∈ C. Using the direction notations presented in
the beginning of this chapter, Wolkowicz [112] defined the set of badly behaved
constraints.
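For instance, in the example of Section 6.2 the constraint g2 (x) = x2² vanishes at every feasible point, so I = = {2}, whereas g1 (x) = x1 + x2 + 1 is active at x̄ = (−1, 0) but not on all of C, so I < (x̄) = {1}.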
Definition 6.8 For x̄ ∈ C, the set of badly behaved constraints is given by
I b (x̄) = {i ∈ I = : (Di> (x̄) ∩ S(x̄)) \ cl ∩_{j∈I =} Dj= (x̄) 6= ∅},

where
S(x̄) = {d ∈ Rn : gi′ (x̄, d) ≤ 0, ∀ i ∈ I(x̄)}.
Recall that we introduced the set S(x̄) in Section 3.3 and proved in Proposi-
tion 3.9 that
(S(x̄))◦ = cl Ŝ(x̄),

where

Ŝ(x̄) = { Σ_{i∈I(x̄)} λi ξi : λi ≥ 0, ξi ∈ ∂gi (x̄), i ∈ I(x̄)}.


The set I b (x̄) is the set of constraints that create problems in KKT conditions.
A characterization of the above set in terms of the directional derivative was
stated by Wolkowicz [112] without proof. We present the result with proof for
a better understanding.

Theorem 6.9 Consider the convex programming problem (CP ) with C given
by (3.1). Let i∗ ∈ I = . Then i∗ ∈ I b (x̄) if and only if the system

gi′∗ (x̄, d) = 0,
gi′ (x̄, d) ≤ 0, ∀ i ∈ I(x̄)\i∗ ,          (CPb )
d 6∈ Di=∗ (x̄) ∪ cl ∩_{j∈I =} Dj= (x̄),

is consistent.

Proof. Suppose that i∗ ∈ I b (x̄), which implies there exists d∗ ∈ Rn such that

d∗ ∈ Di>∗ (x̄),   d∗ ∈ S(x̄),   d∗ 6∈ cl ∩_{j∈I =} Dj= (x̄).

As d∗ ∈ Di>∗ (x̄), d∗ 6∈ Di=∗ (x̄), which along with the last condition implies

d∗ 6∈ Di=∗ (x̄) ∪ cl ∩_{j∈I =} Dj= (x̄). (6.16)

Also, as d∗ ∈ Di>∗ (x̄), by Definition 6.1 there exists α∗ > 0 such that

gi∗ (x̄ + αd∗ ) > gi∗ (x̄), ∀ α ∈ (0, α∗ ].

Therefore,

lim_{α↓0} [gi∗ (x̄ + αd∗ ) − gi∗ (x̄)]/α ≥ 0,

which implies
gi′∗ (x̄, d∗ ) ≥ 0. (6.17)
Because d∗ ∈ S(x̄),
gi′ (x̄, d∗ ) ≤ 0, ∀ i ∈ I(x̄). (6.18)
In particular, taking i∗ ∈ I = ⊆ I(x̄) in the above inequality along with (6.17)
yields
gi′∗ (x̄, d∗ ) = 0. (6.19)
Combining the conditions (6.16), (6.18), and (6.19) together imply that d∗
solves the system (CPb ), thereby leading to its consistency.


Conversely, suppose that (CPb ) is consistent, which implies there exists


d∗ ∈ Rn such that

gi′∗ (x̄, d∗ ) = 0,
gi′ (x̄, d∗ ) ≤ 0, ∀ i ∈ I(x̄)\i∗ ,
d∗ 6∈ Di=∗ (x̄) ∪ cl ∩_{j∈I =} Dj= (x̄).

The first equality condition can be expressed as two inequalities given by

gi′∗ (x̄, d∗ ) ≤ 0 and gi′∗ (x̄, d∗ ) ≥ 0. (6.20)

Since i∗ ∈ I = ⊆ I(x̄), the above condition together with the second condition of the system yields

d∗ ∈ S(x̄). (6.21)

Also, from the inequality (6.20), there exists α∗ > 0 such that

gi∗ (x̄ + αd∗ ) ≥ gi∗ (x̄), ∀ α ∈ (0, α∗ ].

As d∗ 6∈ Di=∗ (x̄), the above inequality holds as a strict inequality and hence

d∗ ∈ Di>∗ (x̄). (6.22)


The conditions (6.21) and (6.22), along with the fact that d∗ 6∈ cl ∩_{j∈I =} Dj= (x̄), imply that i∗ ∈ I b (x̄), thereby establishing the desired result.
Observe that if Di=∗ (x̄) = {d ∈ Rn : gi′∗ (x̄, d) = 0}, then by the above
characterization of the badly behaved constraints, i∗ 6∈ I b (x̄). The class of
functions that are never badly behaved includes the class of all continuous lin-
ear functionals and the classical distance function. For more on badly behaved
constraints, one can refer to Wolkowicz [112].
Before moving any further, we present a few results from Wolkowicz [112]
that act as a tool in the derivation of the characterization for the point of
minimum.

Proposition 6.10 Consider the convex programming problem (CP ) with C


given by (3.1). Suppose that x̄ ∈ C. Then
(i) ∩_{i∈I(x̄)} Di≤ (x̄) = ∩_{i∈I =} Di= (x̄) ∩ ∩_{i∈I < (x̄)} Di≤ (x̄).

(ii) ∩_{i∈I =} Di= (x̄) ∩ ∩_{i∈I < (x̄)} Di< (x̄) 6= ∅.

Furthermore, suppose that the set Ω satisfies I b (x̄) ⊂ Ω ⊂ I = . If

either co ∩_{i∈Ω} Di= (x̄) is closed or Ω = I = ,

then

(iii) TC (x̄) = cl ∩_{i∈I(x̄)} Di≤ (x̄).

(iv) cl co ∩_{i∈Ω} Di= (x̄) ∩ S(x̄) = cl ∩_{i∈Ω} Di= (x̄) ∩ S(x̄) = cl ∩_{i∈I =} Di= (x̄) ∩ S(x̄).

(v) TC (x̄) = cl co ∩_{i∈Ω} Di= (x̄) ∩ S(x̄).

(vi) −co ∪_{i∈I < (x̄)} ∂gi (x̄) ∩ (∩_{i∈Ω} Di= (x̄))◦ = ∅.

Proof. (i) Observe that I(x̄) = I = ∪ I < (x̄), which implies


\
Di≤ (x̄) = {d ∈ Rn : there exists ᾱ > 0 such that
i∈I(x̄)

gi (x̄ + αd) ≤ gi (x̄), ∀ i ∈ I(x̄)}


\ \
= Di≤ (x̄) ∩ Di≤ (x̄).
i∈I = i∈I < (x̄)

For any d ∈ Di≤ (x̄), there exists ᾱ > 0 such that

gi (x̄ + αd) ≤ gi (x̄) = 0, α ∈ (0, ᾱ],

which implies x̄ + αd ∈ C for every α ∈ (0, ᾱ]. As for every i ∈ I = , gi (x) = 0


for every feasible point x ∈ C of (CP ), thereby implying that for every i ∈ I = ,

gi (x̄ + αd) = gi (x̄) = 0, α ∈ (0, ᾱ],

which implies Di≤ (x̄) = Di= (x̄) for every i ∈ I = . Therefore, by this condition,
\ \ \
Di≤ (x̄) = Di= (x̄) ∩ Di≤ (x̄),
i∈I(x̄) i∈I = i∈I < (x̄)

as desired.
(ii) If I(x̄) = ∅, the result holds trivially by (i). Suppose that I = and I < (x̄)
are nonempty. Then corresponding to any i ∈ I < , there exists some x̂ ∈ C
such that gi (x̂) < 0. By the convexity of gi , for every λ ∈ (0, 1],

gi (x̄ + λ(x̂ − x̄)) ≤ λgi (x̂) + (1 − λ)gi (x̄) < 0 = gi (x̄),

which implies that dˆ = x̂ − x̄ ∈ Di< (x̄). Also, suppose that there is some
j ∈ I < (x̄), j 6= i, then corresponding to j there exists some x̃ ∈ C such that
gj (x̃) < 0. Then as before, d˜ = x̃ − x̄ ∈ Dj< (x̄). Now if i and j are such that

gi (x̃) = 0 and gj (x̂) = 0,


then by the convexity of gi and gj , for every λ ∈ (0, 1),

gi (λx̂ + (1 − λ)x̃) ≤ λgi (x̂) + (1 − λ)gi (x̃) < 0,


gj (λx̂ + (1 − λ)x̃) ≤ λgj (x̂) + (1 − λ)gj (x̃) < 0,

which implies for λ ∈ (0, 1), (λx̂ + (1 − λ)x̃) − x̄ = λdˆ + (1 − λ)d˜ such that

λdˆ + (1 − λ)d˜ ∈ Di< (x̄) and λdˆ + (1 − λ)d˜ ∈ Dj< (x̄).

Proceeding as above, there exists d̄ ∈ Rn such that

d̄ ∈ Di< (x̄), ∀ i ∈ I < (x̄) (6.23)

with corresponding ᾱ > 0 such that x̄ + αd̄ ∈ C for every α ∈ (0, ᾱ]. Therefore, for every i ∈ I = ,

gi (x̄ + αd̄) = 0 = gi (x̄), ∀ α ∈ (0, ᾱ],

that is,

d̄ ∈ Di= (x̄), ∀ i ∈ I = ,

which along with the condition (6.23) proves the desired result.
(iii) Consider a feasible point x ∈ C of (CP ) that implies

gi (x) ≤ 0, ∀ i ∈ I(x̄).

By the convexity of gi , i ∈ I(x̄), for every λ ∈ (0, 1],

gi (x̄ + λ(x − x̄)) ≤ λgi (x) + (1 − λ)gi (x̄) ≤ 0 = gi (x̄), ∀ i ∈ I(x̄).


Therefore, x − x̄ ∈ ∩_{i∈I(x̄)} Di≤ (x̄) for every x ∈ C, which implies

(C − x̄) ⊂ ∩_{i∈I(x̄)} Di≤ (x̄).

As ∩_{i∈I(x̄)} Di≤ (x̄) is a cone,

cone (C − x̄) ⊂ ∩_{i∈I(x̄)} Di≤ (x̄). (6.24)

Suppose that d ∈ ∩_{i∈I(x̄)} Di≤ (x̄), which implies there exists ᾱ > 0 such that

gi (x̄ + αd) ≤ gi (x̄) = 0, ∀ α ∈ (0, ᾱ], ∀ i ∈ I(x̄).

For i 6∈ I(x̄), gi (x̄) < 0 and thus, there exists some α′ > 0 such that for any
d ∈ Rn ,

gi (x̄ + αd) < 0, ∀ α ∈ (0, α′ ), ∀ i 6∈ I(x̄).


Therefore, by the preceding inequalities, x′ = x̄ + αd ∈ C for α ∈ (0, α∗ ],


where α∗ = min{ᾱ, α′ }, which implies αd ∈ C − x̄, thereby leading to
d ∈ cone (C − x̄), which along with the condition (6.24) yields
\
Di≤ (x̄) = cone (C − x̄).
i∈I(x̄)

By Theorem 2.35,
\
TC (x̄) = cl cone (C − x̄) = cl Di≤ (x̄),
i∈I(x̄)

hence establishing the result.


(iv) By the given hypothesis Ω ⊂ I = , which implies that the containment
relation
\ \ \
cl Di= (x̄) ∩ S(x̄) ⊂ cl Di= (x̄) ∩ S(x̄) ⊂ cl co Di= (x̄) ∩ S(x̄) (6.25)
i∈I = i∈Ω i∈Ω

holds. To establish the result, we will prove the following:


\ \
(1) cl co Di= (x̄) ∩ S(x̄) ⊂ cl (co Di= (x̄) ∩ S(x̄)).
i∈Ω i∈Ω
T =
If co i∈Ω Di (x̄) is closed, then
\ \ \
cl co Di= (x̄) ∩ S(x̄) = co Di= (x̄) ∩ S(x̄) ⊂ cl (co Di= (x̄) ∩ S(x̄)),
i∈Ω i∈Ω i∈Ω

thereby establishing the above condition.


If Ω = I = , we prove
\ \
cl co Di= (x̄) ∩ S(x̄) ⊂ cl (co Di= (x̄) ∩ S(x̄)). (6.26)
i∈I = i∈I =
T
As S(x̄) is a closed convex set and i∈I = Di= (x̄) ⊂ S(x̄),
\
cl co Di= (x̄) ⊂ S(x̄).
i∈I =
T T
Also S(x) = i∈I = Si (x̄) ∩ i∈I < (x̄) Si (x̄), where

Si (x̄) = {d ∈ Rn : gi′ (x̄, d) ≤ 0}.

Therefore, establishing (6.26) is equivalent to proving


\ \ \ \
cl co Di= (x̄) ∩ Si (x̄) ⊂ cl (co Di= (x̄) ∩ Si (x̄)).
i∈I = i∈I < (x̄) i∈I = i∈I < (x̄)


By condition (ii), there exists d ∈ Rn such that


\ \ \ \
d∈ Di= (x̄) ∩ Di< (x̄) ⊂ co Di= (x̄) ∩ int Si (x̄),
i∈I = i∈I < (x̄) i∈I = i∈I < (x̄)

which yields the above condition.


\ \
(2) co Di= (x̄) ∩ S(x̄) = Di= (x̄) ∩ S(x̄).
i∈Ω i∈Ω

By Proposition 6.2 (i) and (iii),


\ \ ≤
Di= (x̄) ⊂ Di (x̄).
i∈Ω i∈Ω

Because Di≤ (x̄) is convex,


\ \
co Di= (x̄) ⊂ Di≤ (x̄). (6.27)
i∈Ω i∈Ω

As Ω ⊂ I = , for every feasible point x ∈ C, gi (x) = 0, i ∈ Ω. For any


d ∈ Di≤ (x̄), i ∈ Ω, there exists ᾱi > 0 such that

gi (x̄ + αd) ≤ gi (x̄) = 0, ∀ α ∈ (0, ᾱi ],

which implies x̄ + αd ∈ C. Therefore, for any i ∈ Ω,

gi (x̄ + αd) = 0, ∀ α ∈ (0, ᾱi ],

thereby implying that d ∈ Di≤ (x̄), i ∈ Ω. Thus, the condition (6.27) becomes
\ \ \
co Di= (x̄) ⊂ Di= (x̄) ⊂ co Di= (x̄).
i∈Ω i∈Ω i∈Ω
T
The above relation implies that i∈Ω Di= (x̄) is convex, thereby leading to
\ \
co Di= (x̄) ∩ S(x̄) = Di= (x̄) ∩ S(x̄),
i∈Ω i∈Ω
T
as desired. Note that, in particular, for Ω = I = , i∈I = Di= (x̄) is convex.
\ \
(3) cl ( Di= (x̄) ∩ S(x̄)) ⊂ cl Di= (x̄) ∩ S(x̄).
i∈Ω i∈I =
Suppose that Ω ( I = . We claim that
\ \
Di= (x̄) ∩ S(x̄) ⊂ cl Di= (x̄) ∩ S(x̄).
i∈Ω i∈I =

Assume on the contrary that there exists d ∈ Rn such that


\ \
d∈ Di= (x̄) ∩ S(x̄) \ (cl Di= (x̄) ∩ S(x̄)).
i∈Ω i∈I =


By the given hypothesis, there exists Ω̃ ⊂ I = \ Ω such that


\
d ∈ S(x̄), d ∈ Di= (x̄)
i∈I = \Ω̃

but
\
/ Di= (x̄) ∪ cl
d∈ Di= (x̄), ∀ i ∈ Ω̃.
i∈I =

By the hypothesis I b (x̄) ⊂ Ω ⊂ I = , Ω̃ ⊂ I = \ Ω ⊂ I = \ I b (x̄), which implies Ω̃ ∩ I b (x̄) = ∅. Invoking Theorem 6.9, for every i ∈ Ω̃ the corresponding system (CPb ) is inconsistent and thus

gi′ (x̄, d) < 0, ∀ i ∈ Ω̃.

Therefore, \ \
d∈ Di< (x̄) ∩ Di= (x̄). (6.28)
i∈Ω̃ i∈I = \Ω̃

By (ii), as
\ \
Di= (x̄) ∩ Di< (x̄) 6= ∅,
i∈I = i∈I < (x̄)

there exists d¯ ∈ Rn such that


\ \
d¯ ∈ Di= (x̄) ∩ Di< (x̄). (6.29)
i∈I = i∈I < (x̄)

Define dλ = λd + (1 − λ)d̄. By condition (6.28), for i ∈ Ω̃ there exists ᾱi > 0 such that

gi (x̄ + αd) < gi (x̄), ∀ α ∈ (0, ᾱi ]. (6.30)

As Ω̃ ⊂ I = , by condition (6.29), for i ∈ Ω̃ there exists α̂i > 0 such that

gi (x̄ + αd̄) = gi (x̄), ∀ α ∈ (0, α̂i ]. (6.31)

Denote αi = min{ᾱi , α̂i }. By the convexity of gi , i ∈ Ω̃, along with conditions (6.30) and (6.31), for λ ∈ (0, 1],

gi (x̄ + αdλ ) ≤ λgi (x̄ + αd) + (1 − λ)gi (x̄ + αd̄) < gi (x̄), ∀ α ∈ (0, αi ],

which implies \
dλ ∈ Di< (x̄), ∀ λ ∈ (0, 1]. (6.32)
i∈Ω̃

Again from (6.28),


\ \
d∈ Di= (x̄) ⊂ Di≤ (x̄),
i∈I = \Ω̃ i∈I = \Ω̃


and from (6.29),


\ \ \
d¯ ∈ Di= (x̄) ⊂ Di= (x̄) ⊂ Di≤ (x̄).
i∈I = i∈I = \Ω̃ i∈I = \Ω̃

Because Di≤ (x̄), i ∈ I = \ Ω̃ are convex sets,


\
dλ ∈ Di≤ (x̄), ∀ λ ∈ (0, 1). (6.33)
i∈I = \Ω̃

By Theorem 2.69, gi , i ∈ I < (x̄), is continuous on Rn , which along with con-


dition (6.29) implies that there exists β ∈ (0, 1) such that
\
dλ ∈ Di< (x̄), λ ∈ (0, β]. (6.34)
i∈I < (x̄)

Observe that

I(x̄) = I < (x̄) ∪ I = \ Ω̃ ∪ Ω̃,

which along with (i) leads to


\ \ \ \ \
Di≤ (x̄) ∩ Di< (x̄) = Di≤ (x̄) ∩ Di≤ (x̄) ∩ Di< (x̄).
i∈I(x̄) i∈Ω̃ i∈I < (x̄) i∈I = \Ω̃ i∈Ω̃

Therefore, combining (6.32), (6.33), and (6.34) along with the above relation
yields
\ \
dλ ∈ Di≤ (x̄) ∩ Di< (x̄).
i∈I(x̄) i∈Ω̃

As Ω̃ ⊂ I = , which along with (i) implies


\ \ \ \ \
Di≤ (x̄) = Di≤ (x̄) ∩ Di= (x̄) ⊂ Di≤ (x̄) ∩ Di= (x̄).
i∈I(x̄) i∈I < (x̄) i∈I = i∈I < (x̄) i∈Ω̃

Thus,
\
dλ ∈ Di= (x̄),
i∈Ω̃

which is a contradiction to
\
dλ ∈ Di< (x̄).
i∈Ω̃

Therefore,
\ \
Di= (x̄) ∩ S(x̄) ⊂ cl Di= (x̄) ∩ S(x̄).
i∈Ω i∈I =

\
Because cl Di= (x̄) and S(x̄) are closed sets,
i∈I =
\ \
cl ( Di= (x̄) ∩ S(x̄)) ⊂ cl Di= (x̄) ∩ S(x̄),
i∈Ω i∈I =

thereby establishing the desired result when Ω ( I = .


If Ω = I = ,
\ \
Di= (x̄) ∩ S(x̄) ⊂ cl Di= (x̄) ∩ S(x̄),
i∈I = i∈I =

thus yielding the desired condition as before.

From the conditions (1) through (3), it is easy to observe that


\ \
cl co Di= (x̄) ∩ S(x̄) ⊂ cl (co Di= (x̄) ∩ S(x̄))
i∈Ω i∈Ω
\ \
= cl ( Di= (x̄) ∩ S(x̄)) ⊂ cl Di= (x̄) ∩ S(x̄),
i∈Ω i∈I =

which along with (6.25) yields the requisite result.


(v) Using (iii) and (iv), it is enough to show that
\ \
cl Di≤ (x̄) = cl ( Di= (x̄) ∩ S(x̄)).
i∈I(x̄) i∈I =

From (ii) and Proposition 6.2 (iii), it is obvious that


\ \
cl Di≤ (x̄) ⊂ cl ( Di= (x̄) ∩ S(x̄)). (6.35)
i∈I(x̄) i∈I =

To prove the result, we claim that


\ \
Di= (x̄) ∩ S(x̄) ⊂ cl Di≤ (x̄). (6.36)
i∈I = i∈I(x̄)
T
Suppose that d ∈ i∈I = Di= (x̄) ∩ S(x̄). By (ii), there exists d¯ ∈ Rn such that
\ \
d¯ ∈ Di= (x̄) ∩ Di< (x̄).
i∈I = i∈I < (x̄)

Denote dλ = λd + (1 − λ)d̄. Therefore, by Theorem 2.79 and Proposition 6.2, for every ξ ∈ ∪_{i∈I < (x̄)} ∂gi (x̄),

hξ, dλ i = λhξ, di + (1 − λ)hξ, d̄i < 0, ∀ λ ∈ [0, 1),


which again by Theorem 2.79 implies that for every i ∈ I < (x̄),

gi′ (x̄, dλ ) < 0, ∀ λ ∈ [0, 1).

Therefore, by Proposition 6.2 (iii),


\
dλ ∈ Di< (x̄), ∀ λ ∈ [0, 1). (6.37)
i∈I < (x̄)

T
Also, by the convexity of i∈I = Di= (x̄),
\
dλ ∈ Di= (x̄), ∀ λ ∈ [0, 1). (6.38)
i∈I =

Thus, by the relations (6.37) and (6.38), along with (i), we obtain
\ \ \
dλ ∈ Di< (x̄) ∩ Di= (x̄) ⊂ Di≤ (x̄), ∀ λ ∈ [0, 1).
i∈I < (x̄) i∈I = i∈I(x̄)

T
As the limit λ → 1, dλ → d, which implies d ∈ cl i∈I(x̄) Di≤ (x̄), thus proving
(6.36), which yields that
\ \
cl ( Di= (x̄) ∩ S(x̄)) ⊂ cl Di≤ (x̄).
i∈I = i∈I(x̄)

The above condition along with (6.35) establishes the desired result.
(vi) Define
[
F = −co ∂gi (x̄).
i∈I < (x̄)

We will prove the result by contradiction. Assume that


\
F ∩( Di= (x̄))◦ 6= ∅,
i∈Ω

which implies there exists


\
ξ ∈F ∩( Di= (x̄))◦ .
i∈Ω
P
As ξ ∈ F , there exists ξi ∈ ∂gi (x̄) and λi ≥ 0, i ∈ I < (x̄) with i∈I < (x̄) λi = 1
such that
X
ξ=− λi ξi .
i∈I < (x̄)


By Proposition 2.31,
\
Di= (x̄) ⊂ {ξ}◦ = {d ∈ Rn : hξ, di ≤ 0},
i∈Ω
−F ◦ ⊂ −{ξ}◦ = {d ∈ Rn : hξ, di ≥ 0}.

Therefore,
\
−F ◦ ∩ Di= (x̄) ⊂ {d ∈ Rn : hξ, di = 0}.
i∈Ω

By (ii), there exists


\ \
dˆ ∈ Di= (x̄) ∩ Di< (x̄)
i∈I = i∈I < (x̄)
\
⊂ Di= (x̄) ∩ −F ◦ ⊂ {d ∈ Rn : hξ, di = 0},
i∈Ω

that is, hξ, d̂i = 0. As d̂ ∈ ∩_{i∈I < (x̄)} Di< (x̄), there exist ᾱi > 0, i ∈ I < (x̄), such that

gi (x̄ + αi d̂) < 0, ∀ αi ∈ (0, ᾱi ], ∀ i ∈ I < (x̄).

By the convexity of gi , i ∈ I < (x̄), for every αi ∈ (0, ᾱi ],

αi hξi , d̂i ≤ gi (x̄ + αi d̂) − gi (x̄) < 0, ∀ i ∈ I < (x̄),

which implies

hξ, d̂i = − Σ_{i∈I < (x̄)} λi hξi , d̂i > 0,

which contradicts hξ, d̂i = 0, thereby leading to the requisite result.


Wolkowicz [112] derived a characterization in the form of KKT type optimality conditions. Before presenting that result, we state a lemma that will be required in its proof.

Lemma 6.11 Consider the convex programming problem (CP ) with C given
by (3.1). Suppose that x̄ ∈ C and F ⊂ Rn is any nonempty set. Then the statement

x̄ is a point of minimizer of (CP ) if and only if the system

0 ∈ ∂f (x̄) + Σ_{i∈I(x̄)} λi ∂gi (x̄) + F,
λi ≥ 0, i ∈ I(x̄),                         (6.39)

is consistent


holds for any objective function f if and only if F satisfies


NC (x̄) = Ŝ(x̄) + F. (6.40)

Proof. Suppose that the statement is satisfied for any fixed objective func-
tion. We will prove the condition (6.40). Consider ξ ∈ NC (x̄) and define the
objective function as f (x) = −hξ, xi. Then ξ ∈ −∂f (x̄)∩NC (x̄), which implies

0 ∈ ∂f (x̄) + NC (x̄).

By the optimality conditions for (CP ), Theorem 3.1 (ii), x̄ is a point of min-
imizer of (CP ). Therefore, by (6.39) along with ∂f (x̄) = {−ξ} leads to

b
ξ ∈ S(x̄) + F,

that is,
b
NC (x̄) ⊂ S(x̄) + F. (6.41)
b
Now suppose that ξ ∈ S(x̄) + F , which implies there exist ξi ∈ ∂gi (x̄) and
λi ≥ 0 for i ∈ I(x̄) such that
X
ξ− λi ξi ∈ F.
i∈I(x̄)

Again define the objective function as f (x) = −hξ, xi, which implies
∂f (x̄) = {−ξ}. By the above condition it is obvious that the condition (6.39)
is satisfied and thus by the statement, x̄ is a point of minimizer of (CP ).
Applying Theorem 3.1, −ξ ∈ NC (x̄), which implies

b
S(x̄) + F ⊂ NC (x̄).

The above containment along with the relation (6.41) yields the desired con-
dition (6.40).
Conversely, suppose that (6.40) holds. By Theorem 3.1 (ii), x̄ is a point of
minimizer of (CP ) if and only if

0 ∈ ∂f (x̄) + NC (x̄),

which by (6.40) is equivalent to

b
0 ∈ ∂f (x̄) + S(x̄) + F,

that is, the system (6.39) is consistent, thereby completing the proof. 
b
As mentioned in the beginning of this section, (S(x̄))◦ = cl S(x̄) by Propo-
b
sition 3.9. Therefore, if S(x̄) is closed, condition (6.40) becomes

NC (x̄) = (S(x̄))◦ + F.


A similar result as the above theorem was studied by Gould and Tolle [53]
under the assumption of differentiability of the functions but not necessarily
convex.
Applying the above lemma along with some additional conditions, Wolkow-
icz [112] established KKT type optimality conditions. We present the result
below.

Theorem 6.12 Consider the convex programming problem (CP ) with C


given by (3.1) and x̄ ∈ C. Suppose that the set Ω satisfies

I b (x̄) ⊂ Ω ⊂ I =

and both the sets


co ∩_{i∈Ω} Di= (x̄) and Ŝ(x̄) + (∩_{i∈Ω} Di= (x̄))◦

are closed. Then x̄ is a point of minimizer of (CP ) if and only if the system
0 ∈ ∂f (x̄) + Σ_{i∈I(x̄)} λi ∂gi (x̄) + (∩_{i∈Ω} Di= (x̄))◦ ,
λi ≥ 0, i ∈ I(x̄),                         (6.42)
is consistent.

Proof. Observe that the system (6.42) is obtained, in particular, by taking F = (∩_{i∈Ω} Di= (x̄))◦ in Lemma 6.11. Thus, to establish the result, it is sufficient to prove that

NC (x̄) = Ŝ(x̄) + (∩_{i∈Ω} Di= (x̄))◦ . (6.43)

By Proposition 6.10 (v),


\
TC (x̄) = S(x̄) ∩ cl co Di= (x̄),
i∈Ω

which by Propositions 2.31 and 3.9 imply that


\
NC (x̄) = cl ((S(x̄)◦ + (cl co Di= (x̄))◦ )
i∈Ω
\
b
= cl (S(x̄) +( Di= (x̄))◦ ).
i∈Ω

The closedness assumption leads to the condition (6.43), thereby yielding the
requisite result. 
In the above theorem, the closedness conditions on the sets
\ \
co b
Di= (x̄) and S(x̄) +( Di= (x̄))◦
i∈Ω i∈Ω


act as a constraint qualification. If, in particular, we choose Ω = I = , then the


closedness conditions are no longer needed. In fact,
\
b
NC (x̄) = S(x̄) +( Di= (x̄))◦
i∈I =

is always satisfied. Below we present the result for this particular case.

Theorem 6.13 x̄ is a minimum of (CP ) if and only if the system


0 ∈ ∂f (x̄) + Σ_{i∈I(x̄)} λi ∂gi (x̄) + (∩_{i∈I =} Di= (x̄))◦ ,
λi ≥ 0, i ∈ I(x̄),
is consistent.

Proof. By Theorem 3.1 (ii), x̄ ∈ C is a point of minimizer if and only if

0 ∈ ∂f (x̄) + NC (x̄).

In order to establish the result, it is enough to show that


\
b
NC (x̄) = S(x̄) +( Di= (x̄))◦ . (6.44)
i∈I =

Observe that int Di≤ (x̄) = Di< (x̄) for every i ∈ I < (x̄). Thus, invoking Propo-
sitions 2.31 and 6.10 implies
X \
NC (x̄) = TC (x̄)◦ = (Di≤ (x̄))◦ + ( Di= (x̄))◦ .
i∈I < (x̄) i∈I =

Again by Proposition 6.10 (ii), Di< (x̄) 6= ∅, which along with Proposi-
tion 6.2 (iv) yields
X \
NC (x̄) = { λi ∂gi (x̄) : λi ≥ 0, i ∈ I < (x̄)} + ( Di= (x̄))◦ .
i∈I < (x̄) i∈I =

Choosing λi = 0, i ∈ I = , the above condition leads to


\
b
NC (x̄) ⊂ S(x̄) +( Di= (x̄))◦ . (6.45)
i∈I =

Propositions 3.9, 2.31, and 6.10 imply that


\
b
S(x̄) ⊂ (S(x̄))◦ = ( Di≤ (x̄))◦ = NC (x̄). (6.46)
i∈I(x̄)

Again, by Proposition 6.10,


\ \
( Di= (x̄))◦ ⊂ ( Di≤ (x̄))◦ = NC (x̄).
i∈I = i∈I(x̄)


As NC (x̄) is a closed convex cone, the above relation along with (6.46) leads
to
\
b
S(x̄) +( Di= (x̄))◦ ⊂ NC (x̄),
i∈I =

which together with (6.45) yields the desired condition (6.44). 
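To illustrate the role of the polar term, consider the example of Section 6.2 at x̄ = (−1, 0). There I = = {2} and ∩_{i∈I =} Di= (x̄) = {d ∈ R2 : d2 = 0}, whose polar is {0} × R. Although the standard KKT condition fails at x̄, the system of Theorem 6.13 is consistent: (−1, 1) + 1 · (1, 1) + 0 · (0, 0) + (0, −2) = (0, 0).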


In all these discussions, the notion of constraint qualification was not considered. Observe that in Theorem 6.13, instead of the standard KKT optimality conditions, Wolkowicz [112] derived KKT type optimality conditions involving the set ∩_{i∈I =} Di= (x̄). The system reduces to the standard KKT optimality conditions if (∩_{i∈I =} Di= (x̄))◦ = {0}, that is, F = {0} in system (6.39) of Lemma 6.11. Similar to the regularization condition of Ben-Tal, Ben-Israel, and Zlobec [7], Wolkowicz [112] introduced the notion of regular point and weakest constraint qualification.

Definition 6.14 A feasible point x̄ ∈ C of (CP ) is a regular point if for


any objective function f , the system (6.39) holds for F = {0}. A constraint
qualification that is satisfied if and only if x̄ is a regular point is known as the
weakest constraint qualification.

For the differentiable case, Gould and Tolle [52, 53] showed that the Abadie
constraint qualification, that is,

TC (x̄) = S(x̄)

is a weakest constraint qualification. Under the differentiability of the func-


b
tions gi , i ∈ I(x̄), the set S(x̄) is closed, which along with the Abadie con-
straint qualification is equivalent to
b
NC (x̄) = S(x̄),

which is a weakest constraint qualification. For the nonsmooth case, as dis-


cussed in Theorem 3.10, the Abadie constraint qualification along with the
b
assumption that S(x̄) is closed leads to the standard KKT conditions. In fact,
the Abadie constraint qualification is equivalent to the emptiness of the class
of badly behaved constraints I b (x̄). We present the result below.

Proposition 6.15 Let x̄ ∈ C. Then TC (x̄) = S(x̄) if and only if I b (x̄) = ∅.

Proof. Suppose that I b (x̄) = ∅. Therefore by Proposition 6.10 (iii) and (v),
it is obvious that TC (x̄) = S(x̄).
Conversely, let I b (x̄) 6= ∅, which implies there exists i∗ ∈ I b (x̄) such that i∗ ∈ I = and there exists

v ∗ ∈ (Di>∗ (x̄) ∩ S(x̄)) \ cl ∩_{j∈I =} Dj= (x̄).


Again by Proposition 6.10,


\
v∗ ∈
/ cl Di= (x̄) ∩ S(x̄) = TC (x̄),
i∈I =

which implies TC (x̄) 6= S(x̄), thereby proving the result. 


Now we illustrate the above result by examples. Consider

C = {x ∈ R : x² ≤ 0, x ≤ 0}

with g1 (x) = x² and g2 (x) = x. Observe that C = {0}. For x̄ = 0, TC (x̄) = {0}, and I(x̄) = I = = {1, 2}. Here,

S(x̄) = {v ∈ R : g1′ (x̄, v) ≤ 0, g2′ (x̄, v) ≤ 0}


= {v ∈ R : h∇g1 (x̄), vi ≤ 0, h∇g2 (x̄), vi ≤ 0}
= {v ∈ R : v ≤ 0}.

Thus, TC (x̄) 6= S(x̄), thereby showing that the Abadie constraint qualification
is not satisfied. Also by the definitions of the cones of directions, we have

D1> (x̄) = {v ∈ R : v 6= 0},


D2> (x̄) = {v ∈ R : v > 0},
D1= (x̄) = {0} = D2= (x̄).

Observe that I b (x̄) = {1}, that is, the set of badly behaved constraints is
nonempty.
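For instance (an illustration, not from [112]), take the objective f (x) = x. Then x̄ = 0 is the minimizer of f over C, but ∂f (x̄) + λ1 ∂g1 (x̄) + λ2 ∂g2 (x̄) = {1 + λ2 } never contains 0 for λ1 , λ2 ≥ 0, so the KKT conditions fail at x̄ and x̄ is not a regular point.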
Next let us consider the set

C = {x ∈ R : |x| ≤ 0, x ≤ 0}.

Recall from Chapter 3 that the Abadie constraint qualification is satisfied at


x̄ = 0 with S(x̄) = {0}. Here also, the cones of directions are the same as that
of the previous example but now Di> (x̄) ∩ S(x̄) = ∅, thereby showing that the
set of badly behaved constraints I b (x̄) is empty.
Wolkowicz [112] gave an equivalent characterization of the regular point
with Abadie constraint qualification and the set of badly behaved constraints
I b (x̄). We state the result below. The proof can be worked out using Theo-
rem 3.10 and Proposition 6.15.

Theorem 6.16 Consider the convex programming problem (CP ) with C


given by (3.1) and let x̄ ∈ C. Then the following are equivalent:

(i) x̄ is a regular point,


b
(ii) Abadie constraint qualification holds at x̄ and S(x̄) is closed,
b
(iii) I b (x̄) is empty and S(x̄) is closed.


In Chapter 3 we derived the optimality conditions not only under the


Abadie constraint qualification, but also the Slater constraint qualification.
It was observed by Wolkowicz [112] that the Slater constraint qualification is
a weakest constraint qualification with respect to the Fritz John optimality
condition, which we present below.
Theorem 6.17 Consider the convex programming problem (CP ) with C
given by (3.1). Then the Slater constraint qualification is a weakest constraint
qualification.
Proof. By Definition 6.14, the Slater constraint qualification is a weakest
constraint qualification if and only if x̄ is a regular point. Consider the Fritz
John optimality condition for (CP ); that is, if x̄ ∈ C is a point of minimizer
of (CP ), then there exist λi ≥ 0, i ∈ {0} ∪ I(x̄), not all simultaneously zero
such that
X
0 ∈ λ0 ∂f (x̄) + λi ∂gi (x̄).
i∈I(x̄)

Suppose that the Slater constraint qualification is satisfied, that is, there exists
x̂ ∈ Rn such that gi (x̂) < 0, i = 1, 2, . . . , m. We claim that λ0 6= 0. On the
contrary, assume that λ0 = 0. Then the above condition implies that there
exist λi ≥ 0, i ∈ I(x̄), not all simultaneously zero, such that
X
0∈ λi ∂gi (x̄),
i∈I(x̄)

which implies that there exist ξi ∈ ∂gi (x̄), i ∈ I(x̄), such that
X
0= λi ξi . (6.47)
i∈I(x̄)

By the convexity of gi , i ∈ I(x̄),


hξi , x̂ − x̄i ≤ gi (x̂) − gi (x̄) < 0, ∀ ξi ∈ ∂gi (x̄),
which along with the condition (6.47) leads to a contradiction. Thus, λ0 6= 0
and hence can be normalized to one, thereby leading to the KKT optimality
conditions.
Observe that the KKT optimality condition holds at x̄ if the system
X 
0 ∈ ∂f (x̄) + λi ∂gi (x̄), 
i∈I(x̄)

λi ≥ 0, i ∈ I(x̄),
is consistent for any f , which is equivalent to the inconsistency of the system
X 
0∈ λi ∂gi (x̄), 
i∈I(x̄)

λi ≥ 0, i ∈ I(x̄), not all simultaneously zero.


Thus, the inconsistency of the above system is equivalent to


[
0 6∈ cone co ∂gi (x̄). (6.48)
i∈I(x̄)

We claim that the above condition is equivalent to the Slater constraint


qualification. Suppose that the condition (6.48) holds. Because dom gi = Rn , i ∈ I(x̄), by Proposition 2.83, ∂gi (x̄) is a nonempty compact set. As I(x̄) ⊂ {1, 2, . . . , m} is finite, ∪_{i∈I(x̄)} ∂gi (x̄) is also nonempty compact. Also, as

∪_{i∈I(x̄)} ∂gi (x̄) ⊂ cone co ∪_{i∈I(x̄)} ∂gi (x̄),

the condition (6.48) implies that


[
0∈
/ ∂gi (x̄).
i∈I(x̄)

Invoking Proposition 3.4,


[
cone co ∂gi (x̄)
i∈I(x̄)

is a closed set. Invoking the Strict Separation Theorem, Theorem 2.26 (iii),
there exists d¯ ∈ Rn and d¯ 6= 0 such that
[
¯ < 0, ∀ z ∈ cone co
hz, di ∂gi (x̄).
i∈I(x̄)

In particular, for ξi ∈ ∂gi (x̄), i ∈ I(x̄), the above inequality leads to

hξi , d̄i < 0.

As dom gi = Rn , i ∈ I(x̄), by Theorem 2.79, for i ∈ I(x̄),

max_{ξi ∈∂gi (x̄)} hξi , d̄i = gi′ (x̄, d̄) < 0,

which implies

lim_{λ↓0} [gi (x̄ + λd̄) − gi (x̄)]/λ = lim_{λ↓0} gi (x̄ + λd̄)/λ < 0.
Therefore, for every sufficiently small λ > 0,

gi (x̄ + λd̄) < 0, ∀ i ∈ I(x̄). (6.49)

For i 6∈ I(x̄), gi (x̄) < 0. Because dom gi = Rn , i 6∈ I(x̄), by Theorem 2.69, gi , i 6∈ I(x̄), is continuous over Rn . Thus, there exists λ̄ > 0, chosen small enough that (6.49) also holds at λ = λ̄, such that

gi (x̄ + λ̄d̄) < 0, ∀ i 6∈ I(x̄). (6.50)

Combining (6.49) and (6.50), for x̄ + λ̄d̄ ∈ Rn ,

gi (x̄ + λ̄d̄) < 0, ∀ i = 1, 2, . . . , m,

which implies that the Slater constraint qualification holds.


Conversely, suppose that the Slater constraint qualification holds, that
is, there exists x̂ ∈ Rn such that gi (x̂) < 0, i = 1, 2, . . . , m. By Definition 2.77 of subdifferentiability, for any ξi ∈ ∂gi (x̄), i ∈ I(x̄),

hξi , x̂ − x̄i ≤ gi (x̂) − gi (x̄) = gi (x̂) < 0,

which implies that


[
hz, x̂ − x̄i < 0, ∀ z ∈ cone co ∂gi (x̄).
i∈I(x̄)
S
Therefore, z 6= 0 for any z ∈ cone co i∈I(x̄) ∂gi (x̄), thereby establishing
(6.48). Hence, the Slater constraint qualification is a weakest constraint qual-
ification. 
In both these approaches, one makes use of the direction sets to establish
optimality conditions in the absence of any constraint qualification for the
convex programming problem (CP ). More recently, Jeyakumar and Li [69]
studied a class of sublinear programming problems involving separable sub-
linear constraints in the absence of any constraint qualification, which we
discuss in the next section.

6.4 Separable Sublinear Case


As already mentioned, the sublinear programming problem considered by
Jeyakumar and Li [69] involved separable sublinear constraints. So before
moving ahead with the problem, we state the concept of separable sublinear
function.

Definition 6.18 A sublinear function p : Rn → R is called a separable sub-


linear function if
p(x) = Σ_{j=1}^{n} pj (xj )

with each pj : R → R, j = 1, 2, . . . , n being a sublinear function.
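For instance, p(x) = Σ_{j=1}^{n} |xj | is a separable sublinear function with pj (t) = |t| for every j, whereas the Euclidean norm kxk is sublinear but, for n ≥ 2, cannot be written in this separable form.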


The sublinear programming problem studied by Jeyakumar and Li [69] is


min p0 (x) subject to pi (x) ≤ bi , i = 1, 2, . . . , m (SP )
where p0 : Rn → R is a sublinear function, pi : Rn → R, i = 1, 2, . . . , m, are separable sublinear functions, and bi ∈ R, i = 1, 2, . . . , m. Before establishing
the optimality conditions for (SP ), we first present the Farkas’ Lemma derived
by Jeyakumar and Li [69]. Farkas’ Lemma acts as a tool in the study of
optimality conditions for (SP ) in the absence of any constraint qualification.

Theorem 6.19 Consider the sublinear function p̃0 : Rn → R and separable


sublinear functions p̃i : Rn → R, i = 1, 2, . . . , m. Then the following are
equivalent:

(i) x ∈ Rn , p̃i (x) ≤ 0, i = 1, 2, . . . , m ⇒ p̃0 (x) ≥ 0,

(ii) There exist λi ≥ 0, i = 1, 2, . . . , m, such that


p̃0 (x) + Σ_{i=1}^{m} λi p̃i (x) ≥ 0, ∀ x ∈ Rn .
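For instance, with n = m = 1, p̃0 (x) = −x and p̃1 (x) = x, condition (i) holds because x ≤ 0 implies −x ≥ 0, and condition (ii) holds with λ1 = 1 because p̃0 (x) + p̃1 (x) = 0 for every x ∈ R.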

Proof. Suppose that condition (i) holds. We claim that condition (ii) is also
satisfied. On the contrary, assume that (ii) does not hold, which along with
the fact that for a real-valued sublinear function p : Rn → R, p(0) = 0 implies
that for any λi ≥ 0, i = 1, 2, . . . , m, x̄ = 0 is not a point of minimizer of the
unconstrained problem
m
X
min p̃0 (x) + λi p̃i (x) subject to x ∈ Rn .
i=1

As sublinear functions are a special class of convex functions, the sublinear


programming problem (SP ) is also a convex programming problem for which
the KKT optimality conditions are necessary as well as sufficient for the point
of minimizer. Therefore, the KKT optimality condition does not hold at x̄ = 0,
that is,
m
X
0∈
/ ∂(p̃0 + λi p̃i )(0).
i=1

As dom p̃i = Rn , i = 0, 1, . . . , m, by Theorem 2.69, p̃i , i = 0, 1, . . . , m, are


continuous on Rn . Applying the Sum Rule, Theorem 2.91, the above condition
becomes
m
X
0 6∈ ∂ p̃0 (0) + λi ∂ p̃i (0),
i=1


thereby implying ∂ p̃0 (0) ∩ (−P ) = ∅, where


Xm
P ={ λi ∂ p̃i (0) : λi ≥ 0, i = 1, 2, . . . , m}.
i=1

As p̃i , i = 1, 2, . . . , m, are separable sublinear functions,


n
X
p̃i (x) = p̃ij (xj ), i = 1, 2, . . . , m,
j=1

where p̃ij are sublinear functions on R. Thus,


Xm
P ={ λi (∂ p̃i1 (0) × ∂ p̃i2 (0) × . . . × ∂ p̃in (0)) : λi ≥ 0, i = 1, 2, . . . , m}.
i=1

As p̃ij : R → R, by Proposition 2.83, ∂ p̃ij is a nonempty convex and compact


set in R, that is,

∂ p̃ij (0) = [lij , uij ], i = 1, 2, . . . , m, j = 1, 2, . . . , n,

for some lij , uij ∈ R with lij ≤ uij . Therefore,


Xm
P ={ λi ([li1 , ui1 ] × [li2 , ui2 ] × . . . × [lin , uin ]) : λi ≥ 0, i = 1, 2, . . . , m}
i=1
m
[
= cone co ([li1 , ui1 ] × [li2 , ui2 ] × . . . × [lin , uin ])
i=1
[m
= cone co { (ai1 , ai2 , . . . , ain ) : aij ∈ [lij , uij ], j = 1, 2, . . . , n}.
i=1

Note that [li1 , ui1 ] × [li2 , ui2 ] × . . . × [lin , uin ] forms a convex polytope in Rn with 2^n vertices denoted by

(vi^r ) = (vi1^r , vi2^r , . . . , vin^r ), i = 1, 2, . . . , m, r = 1, 2, . . . , 2^n ,

where vij^r ∈ {lij , uij }. Also, any element in the polytope can be expressed as a convex combination of the vertices. Therefore,

(ai1 , ai2 , . . . , ain ) ∈ co{(vi^1 ), (vi^2 ), . . . , (vi^{2^n} )},

which implies that


P = cone co ∪_{i=1}^{m} {(vi^1 ), (vi^2 ), . . . , (vi^{2^n} )}.

Hence, P is a finitely generated convex cone and thus, by Proposition 2.44, is


a polyhedral cone that is always closed.
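For instance, for n = 2 the set ∂ p̃i1 (0) × ∂ p̃i2 (0) = [li1 , ui1 ] × [li2 , ui2 ] is a rectangle whose 2² = 4 vertices are (li1 , li2 ), (li1 , ui2 ), (ui1 , li2 ) and (ui1 , ui2 ); P is then the convex cone generated by these finitely many vertices collected over i = 1, 2, . . . , m.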


As sublinear functions are convex, by Proposition 2.83, ∂ p̃0 (0) is a com-


pact convex set and, from the above discussion, P is a closed convex cone.
Therefore, by the Strict Separation Theorem, Theorem 2.26 (iii), there exists
α ∈ Rn with α 6= 0 such that

sup_{ξ∈∂ p̃0 (0)} hα, ξi < inf_{ξ∈−P} hα, ξi = − sup_{ξ∈P} hα, ξi. (6.51)

Consider

sup_{ξ∈P} hα, ξi = sup{hα, ξi : ξ ∈ { Σ_{i=1}^{m} λi ∂ p̃i (0) : λi ≥ 0, i = 1, 2, . . . , m }}
  ≥ sup{hα, ξi : ξ ∈ Σ_{i=1}^{m} λi ∂ p̃i (0)}, ∀ λ ∈ Rm_+
  = Σ_{i=1}^{m} λi p̃i (α), ∀ λ ∈ Rm_+ .

From the preceding relation and condition (6.51),

p̃0 (α) = sup_{ξ∈∂ p̃0 (0)} hα, ξi < − sup_{ξ∈P} hα, ξi ≤ − Σ_{i=1}^{m} λi p̃i (α), ∀ λ ∈ Rm_+ , (6.52)

which implies

Σ_{i=1}^{m} λi p̃i (α) < −p̃0 (α), ∀ λ ∈ Rm_+ .

This inequality can hold for every λ ∈ Rm_+ only if p̃i (α) ≤ 0, i = 1, 2, . . . , m; otherwise, if for some i ∈ {1, 2, . . . , m}, p̃i (α) > 0, then choosing the corresponding λi → +∞, we arrive at a contradiction. Also, as P is a closed convex cone, from (6.52),

p̃0 (α) < − sup_{ξ∈P} hα, ξi ≤ 0.

Therefore, for α ∈ Rn ,

p̃0 (α) < 0 and p̃i (α) ≤ 0, i = 1, 2, . . . , m,

which contradicts (i). Thus condition (ii) is satisfied.


Conversely, suppose that condition (ii) holds, which implies for some
λi ≥ 0, i = 1, 2, . . . , m,
m
X
− λi p̃i (x) ≤ p̃0 (x), ∀ x ∈ Rn .
i=1


If for some x ∈ Rn , p̃i (x) ≤ 0, i = 1, 2, . . . , m, from the above inequality


p̃0 (x) ≥ 0, thereby establishing condition (i) and hence the desired result. 
We end this chapter by deriving the constraint qualification free optimality
condition for the sublinear programming problem (SP ) from Jeyakumar and
Li [69].

Theorem 6.20 Consider the sublinear programming problem (SP ). Then x̄


is a minimizer of (SP ) if and only if there exist λi ≥ 0, i = 1, 2, . . . , m, such
that
0 ∈ ∂p0 (0) + Σ_{i=1}^{m} λi ∂pi (0) and p0 (x̄) + Σ_{i=1}^{m} λi bi = 0.

Proof. Observe that x̄ is a minimizer of (SP ) if and only if

pi (x) − bi ≤ 0, i = 1, 2, . . . , m =⇒ p0 (x) − p0 (x̄) ≥ 0. (6.53)

But Theorem 6.19 cannot be applied directly to the above system as the
theorem is for the system involving sublinear functions, whereas here neither
pi (x) − bi , i = 1, 2, . . . , m, nor p0 (x) − p0 (x̄) is positively homogeneous and
hence not sublinear functions. So define p̃i : Rn × R → R, i = 0, 1, . . . , m, as

p̃0 (x, t) = p0 (x) − tp0 (x̄) and p̃i (x, t) = pi (x) − tbi , i = 1, 2, . . . , m.

Because pi , i = 1, 2, . . . , m, are separable sublinear functions on Rn , p̃i ,


i = 1, 2, . . . , m, are also separable sublinear functions along with the sublin-
earity of p̃0 on Rn × R. Now consider the system

p̃i (x, t) ≤ 0, i = 1, 2, . . . , m =⇒ p̃0 (x, t) ≥ 0. (6.54)

This system is in the desired form needed for the application of Farkas’
Lemma, Theorem 6.19. To establish the result, we will first establish the equiv-
alence between the systems (6.53) and (6.54).
Suppose that the system (6.53) holds. We claim that (6.54) is also satisfied.
On the contrary, assume that the system (6.54) does not hold, which implies
there exists (x̃, t̃) ∈ Rn × R such that

p̃0 (x̃, t̃) < 0 and p̃i (x̃, t̃) ≤ 0, i = 1, 2, . . . , m.

For t̃ > 0, by positive homogeneity of the sublinear function and the con-
struction of p̃i , i = 0, 1, . . . , m,

p0 (x̃/t̃) − p0 (x̄) = p̃0 (x̃/t̃, 1) < 0,


pi (x̃/t̃) − bi = p̃i (x̃/t̃, 1) ≤ 0, i = 1, 2, . . . , m,

thereby contradicting (6.53).


Now, in particular, taking t̃ = 0,

p0 (x̃) = p̃0 (x̃, 0) < 0 and pi (x̃) = p̃i (x̃, 0) ≤ 0, i = 1, 2, . . . , m.

For t > 0, consider x̄ + tx̃ ∈ Rn . Therefore, by the feasibility of x̄ for (SP )


and the above condition,

p0 (x̄ + tx̃) − p0 (x̄) ≤ tp0 (x̃) < 0,


pi (x̄ + tx̃) − bi ≤ pi (x̄) − bi + tpi (x̃) ≤ 0, i = 1, 2, . . . , m,

which is again a contradiction of the system (6.53).


If t̃ < 0, then by construction of p̃i , i = 0, 1, . . . , m,

p0 (x̃) − t̃p0 (x̄) < 0 and pi (x̃) − t̃bi ≤ 0, i = 1, 2, . . . , m.

Consider x̃ + (−t̃ + 1)x̄ ∈ Rn . By the sublinearity of pi , i = 0, 1, . . . , m,

p0 (x̃ + (−t̃ + 1)x̄) ≤ p0 (x̃) + (−t̃ + 1)p0 (x̄) < t̃p0 (x̄) + (−t̃ + 1)p0 (x̄) = p0 (x̄),

and

pi (x̃ + (−t̃ + 1)x̄) ≤ pi (x̃) + (−t̃ + 1)pi (x̄)


≤ t̃bi + (−t̃ + 1)bi = bi , i = 1, 2, . . . , m,

which contradicts (6.53). Thus from all three cases, it is obvious that our
assumption is wrong and hence the system (6.54) holds.
Conversely, taking t = 1 in system (6.54) yields (6.53). Hence, both systems
(6.53) and (6.54) are equivalent. Applying Farkas’ Lemma, Theorem 6.19, for
the sublinear systems to (6.54), there exist λi ≥ 0, i = 1, 2, . . . , m, such that
m
X
p̃0 (x, t) + λi p̃i (x, t) ≥ 0, ∀ (x, t) ∈ Rn × R,
i=1

which implies (0, 0) ∈ Rn × R is a point of minimizer of the unconstrained


problem
m
X
min p̃0 (x, t) + λi p̃i (x, t) subject to (x, t) ∈ Rn × R.
i=1

By the KKT optimality condition for the unconstrained problem, Theo-


rem 2.89,
(0, 0) ∈ ∂(p̃0 + Σ_{i=1}^{m} λi p̃i )(0, 0) = ∂(p0 + Σ_{i=1}^{m} λi pi )(0) × ∇(−tp0 (x̄) − Σ_{i=1}^{m} λi tbi )(0),


where the subdifferential is with respect to x and the gradient with respect to
t. Therefore, a componentwise comparison leads to
0 ∈ ∂(p0 + Σ_{i=1}^{m} λi pi )(0) and p0 (x̄) + Σ_{i=1}^{m} λi bi = 0.

As dom pi = Rn , i = 0, 1, . . . , m, by Theorem 2.69, pi , i = 0, 1, . . . , m, are


continuous on Rn . Thus, by the Sum Rule, Theorem 2.91, the first relation
yields
m
X
0 ∈ ∂p0 (0) + λi ∂pi (0),
i=1

thereby establishing the desired result. 
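For instance (a simple illustration, not from [69]): take n = 1, p0 (x) = |x|, p1 (x) = x and b1 = −1, so that (SP ) reads min |x| subject to x ≤ −1, with minimizer x̄ = −1. Choosing λ1 = 1 gives p0 (x̄) + λ1 b1 = 1 − 1 = 0 and 0 ∈ ∂p0 (0) + λ1 ∂p1 (0) = [−1, 1] + {1} = [0, 2], so the conditions of Theorem 6.20 are satisfied.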



Chapter 7
Sequential Optimality Conditions

7.1 Introduction

In this chapter we are going to look into a completely different approach to


develop optimality conditions in convex programming. These optimality conditions, called sequential optimality conditions, can hold without any qualification and are thus of great interest from both a theoretical and a practical point of view. To the best of our knowledge, this approach was initiated by
Thibault [108]; Jeyakumar, Rubinov, Glover, and Ishizuka [70]; and Jeyaku-
mar, Lee, and Dinh [68]. Unlike the approach of direction sets in Chapter 6,
in the sequential approach one needs calculus rules for subdifferentials and ε-
subdifferentials, namely the Sum Rule and the Chain Rule. As the name itself
suggests, the sequential optimality conditions are established as a sequence
of subdifferentials at neighborhood points as in the work of Thibault [108] or
sequence of ε-subdifferentials at the exact point as in the study of Jeyakumar
and collaborators [68, 70]. Thibault [108] used the approach of sequential sub-
differential calculus rules while Jeyakumar and collaborators [68, 70] used the
approach of epigraphs of conjugate functions to study the sequential optimal-
ity conditions extensively. In both these approaches, the convex programming
problem involved cone constraints and abstract constraints. But keeping in
sync with the convex programming problem (CP ) studied in this book, we
consider the feasible set C involving convex inequalities. The reader must have
realized the central role of the Slater constraint qualification in the study of
optimality and duality in optimization. However, as we have seen, the Slater
constraint qualification can fail even for very simple problems. The failure of
the Slater constraint qualification was overcome by the development of the
so-called closed cone constraint qualification. It is a geometric qualification
that uses the Fenchel conjugate of the constraint function. We will study this
qualification condition in detail.


7.2 Sequential Optimality: Thibault’s Approach


We first discuss the approach due to Thibault [108]. As already mentioned,
he makes use of sequential subdifferential rules in his work. As one will observe,
the Sum Rule and the Chain Rule are expressed in terms of the sequence
of subdifferentials at neighborhood points. We present below the Sum Rule
from Thibault [108] which involves the application of the Sum Rule given by
Hiriart-Urruty and Phelps [64].
Theorem 7.1 (Sequential Sum Rule) Consider two proper lsc convex func-
tions φi : Rn → R̄, i = 1, 2. Then for any x̄ ∈ dom φ1 ∩ dom φ2 ,
∂(φ1 + φ2 )(x̄) = lim sup {∂φ1 (x1 ) + ∂φ2 (x2 )},
xi →φi −h,i x̄

where lim supxi →φi −h,i x̄ {∂φ1 (x1 ) + ∂φ2 (x2 )} denotes the set of all limits
limk→∞ (ξ1k + ξ2k ) for which there exists xki → x̄, i = 1, 2 such that
ξik ∈ ∂φi (xki ), i = 1, 2, and
φi (xki ) − hξik , xki − x̄i → φi (x̄), i = 1, 2. (7.1)
Proof. Suppose that ξ ∈ ∂(φ1 + φ2 )(x̄). By Theorem 2.120,
ξ ∈ ∩_{k∈N} cl{∂1/k φ1 (x̄) + ∂1/k φ2 (x̄)},

which implies for every k ∈ N,


ξ ∈ cl {∂1/k φ1 (x̄) + ∂1/k φ2 (x̄)}.
From Definition 2.12 of the closure of a set, for every k ∈ N,
ξ ∈ ∂1/k φ1 (x̄) + ∂1/k φ2 (x̄) + (1/k) B.
Therefore, there exists ξi′k ∈ ∂1/k φi (x̄), i = 1, 2, and bk ∈ B such that
ξ = ξ1′k + ξ2′k + (1/k) bk . (7.2)
Applying the modified version of the Brøndsted–Rockafellar Theorem, Theo-
rem 2.114, there exist xki ∈ Rn and ξik ∈ ∂φi (xki ) such that for i = 1, 2,
kxki − x̄k ≤ 1/√k , kξik − ξi′k k ≤ 1/√k , |φi (xki ) − hξik , xki − x̄i − φi (x̄)| ≤ 2/k , (7.3)

which implies ξi′k = ξik + (1/√k) bki for some bki ∈ B, i = 1, 2. Therefore, the condition (7.2) becomes

ξ = ξ1k + ξ2k + ((1/k) bk + (1/√k) bk1 + (1/√k) bk2 ),


that is,

ξ = lim_{k→∞} (ξ1k + ξ2k ),

which along with (7.3) yields the desired inclusion.


Conversely, suppose that

ξ ∈ lim sup {∂φ1 (x1 ) + ∂φ2 (x2 )},


xi →φi −h,i x̄

which implies for i = 1, 2, there exist xki → x̄, ξik ∈ ∂φi (xki ) satisfying
φi (xki ) − hξik , xki − x̄i → φi (x̄) and

ξ = lim (ξ1k + ξ2k ).


k→∞

As ξik ∈ ∂φi (xki ), i = 1, 2,

hξik , x − xki i ≤ φi (x) − φi (xki ), ∀ x ∈ Rn .

Also, for every x ∈ Rn ,

hξik , x − x̄i = hξik , x − xki i + hξik , xki − x̄i


≤ φi (x) − φi (xki ) + hξik , xki − x̄i, i = 1, 2,

thereby yielding

hξ1k + ξ2k , x − x̄i ≤ φ1 (x) + φ2 (x) − φ1 (xk1 ) − φ2 (xk2 )


+hξ1k , xk1 − x̄i + hξ2k , xk2 − x̄i

for every x ∈ Rn . Taking the limit as k → +∞ and using the condition (7.1),
the above inequality reduces to

hξ, x − x̄i ≤ (φ1 + φ2 )(x) − (φ1 + φ2 )(x̄), ∀ x ∈ Rn ,

which implies ξ ∈ ∂(φ1 + φ2 )(x̄), thereby establishing the requisite result. 


Using a very different assumption, the Moreau–Rockafellar Sum Rule, The-
orem 2.91, was obtained by Thibault [108].

Corollary 7.2 Consider two proper lsc convex functions φi : Rn → R̄,


i = 1, 2. If

0 ∈ core(dom φ1 − dom φ2 ),

then for every x̄ ∈ dom φ1 ∩ dom φ2 ,

∂(φ1 + φ2 )(x̄) = ∂φ1 (x̄) + ∂φ2 (x̄).


Proof. By Definition 2.77 of the subdifferentials, it is easy to observe that


the inclusion
∂(φ1 + φ2 )(x̄) ⊃ ∂φ1 (x̄) + ∂φ2 (x̄) (7.4)
always holds true.
To prove the result, we will show the reverse inclusion in relation (7.4).
Consider ξ ∈ ∂(φ1 + φ2 )(x̄). Then by Theorem 7.1, for i = 1, 2, there exist
xki → x̄ and ξik ∈ ∂φi (xki ) such that

ξ = lim (ξ1k + ξ2k ) and γik = φi (xki ) − hξik , xki − x̄i → φi (x̄). (7.5)
k→∞

Denote ξ k = ξ1k + ξ2k . As 0 ∈ core(dom φ1 − dom φ2 ), by Definition 2.17, for


any y ∈ Rn and y 6= 0, there exist α > 0 and xi ∈ dom φi , i = 1, 2, such that
αy = x1 − x2 . By the convexity of φi , i = 1, 2 along with (7.5),

hξ1k , αyi = hξ1k , x1 − xk1 i + hξ1k , xk1 − x̄i + hξ1k , x̄ − x2 i


≤ φ1 (x1 ) − φ(xk1 ) + hξ1k , xk1 − x̄i + hξ1k , x̄ − x2 i
= φ1 (x1 ) − γ1k + hξ k , x̄ − x2 i + hξ2k , x2 − xk2 i + hξ2k , xk2 − x̄i
≤ φ1 (x1 ) − γ1k + hξ k , x̄ − x2 i + φ2 (x2 ) − φ2 (xk2 ) + hξ2k , xk2 − x̄i
= (φ1 (x1 ) − γ1k ) + (φ2 (x2 ) − γ2k ) + hξ k , x̄ − x2 i.

As the limit k → +∞, using the conditions (7.5),

(φ1 (x1 ) − γ1k ) + (φ2 (x2 ) − γ2k ) + hξ k , x̄ − x2 i


→ (φ1 (x1 ) − φ1 (x̄)) + (φ2 (x2 ) − φ2 (x̄)) + hξ, x̄ − x2 i.

Therefore,
My
hξ1k , yi ≤ , ∀ k ∈ N,
α
My
that is, {hξ1k , yi} is bounded above by which is independent of k. Simi-
α
k
larly, the sequence {hξ1 , −yi} is bounded above. In particular, taking y = ei ,
i = 1, 2, . . . , n, where ei is a vector in Rn with i-th component 1 and all other
zeroes,

kξ1k k∞ = max |hξ1k , ei i| ≤ max |Mi |.


i=1,2,...,n i=1,2,...,n

Thus, {ξ1k } is a bounded sequence. As ξ1k + ξ2k → ξ, {ξ2k } is also a bounded se-
quence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, the sequences
{ξik }, i = 1, 2, have a convergent subsequence. Without loss of generality,
assume that ξik → ξi , i = 1, 2, such that ξ1 + ξ2 = ξ. By Theorem 2.84,
ξi ∈ ∂φi (x̄), i = 1, 2, thereby yielding

∂(φ1 + φ2 )(x̄) ⊂ ∂φ1 (x̄) + ∂φ2 (x̄),


which along with (7.4) leads to the desired result. 


Consider the convex optimization problem
min f (x) subject to x ∈ C, (CP )
where f : Rn → R̄ is a proper lsc convex function and C ⊂ Rn is a closed
convex set. We shall now provide the sequential optimality condition for (CP )
as an application to Theorem 7.1.

Theorem 7.3 Consider the convex optimization problem (CP ) where


f : Rn → R̄ is an extended-valued proper lsc convex function. Then x̄ is a
point of minimizer of (CP ) if and only if there exist xki → x̄, i = 1, 2, with
ξ1k ∈ ∂f (xk1 ) and ξ2k ∈ NC (xk2 ) such that

ξ1k + ξ2k → 0, f (xk1 ) − hξ1k , xk1 − x̄i → f (x̄) and hξ2k , xk2 − x̄i → 0.

Proof. Observe that (CP ) is equivalent to the unconstrained problem

min (f + δC )(x) subject to x ∈ Rn .

By the optimality condition for the unconstrained programming problem, The-


orem 2.89, x̄ is a minimum to (CP ) if and only if

0 ∈ ∂(f + δC )(x̄).

Applying Theorem 7.1, there exist sequences {xki } ⊂ Rn with xki → x̄, i = 1, 2, ξ1k ∈ ∂f (xk1 ) and ξ2k ∈ ∂δC (xk2 ) = NC (xk2 ) satisfying

f (xk1 ) − hξ1k , xk1 − x̄i → f (x̄) and hξ2k , xk2 − x̄i → 0

such that

lim (ξ1k + ξ2k ) = 0,


k→∞

thereby yielding a sequential optimality condition. 
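To see that the sequences in Theorem 7.3 are genuinely needed, consider the following illustration (not taken from [108]). Let n = 2, C = {(x1 , x2 ) ∈ R2 : x2 ≤ 0} and f (x1 , x2 ) = −x1 + δC1 (x1 , x2 ), where C1 = {(x1 , x2 ) : x2 ≥ x1² }. The only feasible point at which f is finite is x̄ = (0, 0), so x̄ is the minimizer, yet 0 6∈ ∂f (x̄) + NC (x̄) = {(−1, t) : t ≤ 0} + {(0, s) : s ≥ 0}. The sequential condition does hold: with xk1 = (1/k, 1/k² ), xk2 = (0, 0), ξ1k = (0, −k/2) ∈ ∂f (xk1 ) and ξ2k = (0, k/2) ∈ NC (xk2 ), one has ξ1k + ξ2k = 0, f (xk1 ) − hξ1k , xk1 − x̄i = −1/(2k) → 0 = f (x̄) and hξ2k , xk2 − x̄i = 0.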


It is important to note that the conditions on the problem data of (CP ) are minimal. The importance of the Sequential Sum Rule becomes clear because, under the assumptions on (CP ), it is not obvious whether the qualification conditions needed to apply the exact Sum Rule hold or not.
For the convex programming problem (CP ) with C given by (3.1), that is,

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},

it was discussed in Chapter 3 how the normal cones could be explicitly expressed in terms of the subdifferentials of the constraint functions gi , i = 1, 2, . . . , m, in the presence of the Slater constraint qualification. But if the Slater constraint qualification is not satisfied, how would one explicitly compute the normal cone? For that, we first present the sequential Chain Rule


from Thibault [108] in a finite dimensional setting, a corollary to which plays a


pivotal role in deriving the sequential optimality conditions, when C is explic-
itly given by convex inequalities. Note that in the following we will consider
a vector-valued convex function Φ : Rn → Rm . This means that each compo-
nent function of Φ is a real-valued convex function on Rn . Equivalently, Φ is
convex if for every x1 , x2 ∈ Rn and for every λ ∈ [0, 1],
(1 − λ)Φ(x1 ) + λΦ(x2 ) − Φ((1 − λ)x1 + λx2 ) ∈ Rm
+. (7.6)
The epigraph of Φ, epi Φ, is defined as
epi Φ = {(x, µ) ∈ Rn × Rm : µ ∈ Φ(x) + Rm
+ }.

A function φ : Rm → R̄ is said to be nondecreasing on a set F ⊂ Rm if for


every y1 , y2 ∈ F ,
φ(y1 ) ≤ φ(y2 ) whenever y2 − y1 ∈ Rm
+.

Consider a vector-valued convex function Φ and let φ be nondecreasing convex


function on Φ(Rn ) + Rm n
+ . By the convexity of Φ, for every x1 , x2 ∈ R and for
every λ ∈ [0, 1], the condition (7.6) leads to
(1 − λ)Φ(x1 ) + λΦ(x2 ) ∈ Φ((1 − λ)x1 + λx2 ) + Rm n m
+ ⊂ Φ(R ) + R+ .

Also,
Φ((1 − λ)x1 + λx2 ) ∈ Φ(Rn ) ⊂ Φ(Rn ) + Rm
+.

As φ is a nondecreasing function on Φ(Rn ) + Rm


+ , by the convexity of Φ, (7.6)
implies that
φ(Φ((1 − λ)x1 + λx2 )) ≤ φ((1 − λ)Φ(x1 ) + λΦ(x2 )).
By the convexity of φ, for every x1 , x2 ∈ Rn ,
φ(Φ((1 − λ)x1 + λx2 )) ≤ (1 − λ)φ(Φ(x1 )) + λφ(Φ(x2 )), ∀ λ ∈ [0, 1],
that is, for every λ ∈ [0, 1],
(φ ◦ Φ)((1 − λ)x1 + λx2 ) ≤ (1 − λ)(φ ◦ Φ)(x1 ) + λ(φ ◦ Φ)(x2 ).
Hence, (φ ◦ Φ) is a convex function.
Below we present the Sequential Chain Rule from Thibault [108].
Theorem 7.4 (Sequential Chain Rule) Consider a vector-valued convex func-
tion Φ : Rn → Rm and a proper lsc convex function φ : Rm → R̄ that is non-
decreasing over Φ(Rn ) + Rm+ . Then for ȳ = Φ(x̄) ∈ dom φ, ξ ∈ ∂(φ ◦ Φ)(x̄) if
and only if there exist xk → x̄, yk → ȳ, ξk → ξ, τk → 0, yk′ ∈ Φ(xk ) + Rm +
with yk′ → Φ(x̄) and ηk ∈ Rm+ such that

ηk + τk ∈ ∂φ(yk ), ξk ∈ ∂(ηk Φ)(xk ), hηk , yk′ i → hηk , Φ(xk )i


and
φ(yk ) − hηk , yk − ȳi → φ(ȳ) and hηk , Φ(xk ) − ȳi → 0.


Proof. Define φ1 (x, y) = φ(y) and φ2 (x, y) = δepi Φ (x, y). We claim that

ξ ∈ ∂(φ ◦ Φ)(x̄) if and only if (ξ, 0) ∈ ∂(φ1 + φ2 )(x̄, ȳ). (7.7)

Suppose that ξ ∈ ∂(φ ◦ Φ)(x̄), which by Definition 2.77 of the subdifferential


implies that

φ(Φ(x)) − φ(Φ(x̄)) = (φ ◦ Φ)(x) − (φ ◦ Φ)(x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn .

Consider (x, y) ∈ epi Φ, which implies y − Φ(x) ∈ Rm m


+ , that is, y ∈ Φ(x) + R+ .
n m
Because φ is nondecreasing over Φ(R ) + R+ , φ(y) ≥ φ(Φ(x)) for every
(x, y) ∈ epi Φ. Therefore, the above condition leads to

φ(y) − φ(ȳ) ≥ hξ, x − x̄i, ∀ (x, y) ∈ epi Φ,

where ȳ = Φ(x̄). From the definition of φ1 and φ2 , for every (x, y) ∈ Rn × Rm


the above condition leads to

φ1 (x, y) + φ2 (x, y) − φ1 (x̄, ȳ) − φ2 (x̄, ȳ) ≥ hξ, x − x̄i + h0, y − ȳi,

thereby implying that (ξ, 0) ∈ ∂(φ1 + φ2 )(x̄, ȳ).


Conversely, suppose that (ξ, 0) ∈ ∂(φ1 + φ2 )(x̄, ȳ), which by the definition
of subdifferential implies that for every (x, y) ∈ Rn × Rm ,

(φ1 + φ2 )(x, y) − (φ1 + φ2 )(x̄, ȳ) ≥ hξ, x − x̄i + h0, y − ȳi.

The above inequality holds in particular for every (x, y) ∈ epi Φ. As


(x, Φ(x)) ∈ epi Φ, the above inequality reduces to

φ(Φ(x)) − φ(Φ(x̄)) ≥ hξ, x − x̄i, ∀ x ∈ Rn ,

which implies that ξ ∈ ∂(φ ◦ Φ)(x̄), thereby establishing our claim (7.7).
Now by Theorem 7.1,

(0, βk ) + (ξk , θk ) → (ξ, 0),

where

βk ∈ ∂φ(yk ), (ξk , θk ) ∈ ∂δepi Φ (xk , yk′ ),


yk → ȳ, (xk , yk′ ) → (x̄, ȳ),
φ(yk ) − hβk , yk − ȳi → φ(ȳ),
φ2 (xk , yk′ ) − hθk , yk′ − ȳi − hξk , xk − x̄i → φ2 (x̄, ȳ) = 0.

Set θk = −ηk and define τk = βk − ηk . Observe that ξk → ξ and


τk = βk + θk → 0. The preceding facts can thus be written as

(0, ηk + τk ) + (ξk , −ηk ) → (ξ, 0),


with

ηk + τk ∈ ∂φ(yk ), (ξk , −ηk ) ∈ ∂δepi Φ (xk , yk′ ),


yk → ȳ, (xk , yk′ ) → (x̄, ȳ),
φ(yk ) − hηk + τk , yk − ȳi → φ(ȳ), (7.8)
φ2 (xk , yk′ ) + hηk , yk′ − ȳi − hξk , xk − x̄i → φ2 (x̄, ȳ) = 0. (7.9)

As (xk , yk′ ) ∈ epi Φ, φ2 (xk , yk′ ) = 0, which along with ξk → ξ and xk → x̄


reduces (7.9) to
hηk , yk′ − ȳi → 0. (7.10)
Also, (ξk , −ηk ) ∈ ∂δepi Φ (xk , yk′ ) implies that

hξk , x − xk i − hηk , y − yk′ i ≤ 0, ∀ (x, y) ∈ epi Φ. (7.11)

Observe that (xk , yk′ ) ∈ epi Φ, which implies that

yk′ − Φ(xk ) ∈ Rm+ .

Therefore, for any y ′ ∈ Rm+ ,

y ′ + yk′ − Φ(xk ) ∈ Rm+ ,

that is, (xk , y ′ + yk′ ) ∈ epi Φ. In particular, taking x = xk and setting y = yk′ + y ′
for any y ′ ∈ Rm+ in (7.11) yields

hηk , y ′ i ≥ 0, ∀ y ′ ∈ Rm+ ,

which implies that ηk ∈ Rm+ . Taking x = xk and y = Φ(xk ) in (7.11),
hηk , yk′ − Φ(xk )i ≤ 0, which along with the facts that ηk ∈ Rm+ and
yk′ − Φ(xk ) ∈ Rm+ leads to hηk , yk′ − Φ(xk )i = 0. Therefore, (7.11) is equivalent
to

hξk , x − xk i ≤ hηk , yi − hηk , Φ(xk )i, ∀ (x, y) ∈ epi Φ.

In particular, for y = Φ(x),

hξk , x − xk i ≤ (ηk Φ)(x) − (ηk Φ)(xk ), ∀ x ∈ Rn .

Observe that as ηk ∈ Rm + , (ηk Φ) is a convex function and thus the above


inequality implies that (7.11) is equivalent to ξk ∈ ∂(ηk Φ)(xk ). Also from the
condition hηk , yk′ − Φ(xk )i = 0, we have hηk , yk′ i = hηk , Φ(xk )i. Inserting this
fact in (7.10) leads to

hηk , Φ(xk ) − ȳi → 0.

Because τk → 0, (7.8) is equivalent to

φ(yk ) − hηk , yk − ȳi → φ(ȳ),

thereby establishing the result. 


Next we present the Sequential Chain Rule in a simpler form using the
above theorem and a lemma for which one will require the notion of the Clarke
subdifferential. Recall that at any x ∈ Rn the Clarke generalized gradient or
Clarke subdifferential is given as

∂ ◦ φ(x) = co {ξ ∈ Rn : ξ = lim_{k→∞} ∇φ(xk ) where xk → x, xk ∈ D̃},

where D̃ denotes the set of points at which φ is differentiable. The Sum


Rule from Clarke [27] is as follows. Consider two locally Lipschitz functions
φ1 , φ2 : Rn → R. Then for λ1 , λ2 ∈ R,

∂ ◦ (λ1 φ1 + λ2 φ2 )(x) ⊂ λ1 ∂ ◦ φ1 (x) + λ2 ∂ ◦ φ2 (x).

For a convex function φ, the convex subdifferential and the Clarke subdiffer-
ential coincide, that is, ∂φ(x̄) = ∂ ◦ φ(x̄). Now we present the lemma that plays
an important role in obtaining the Sequential Chain Rule.

Lemma 7.5 Consider a locally Lipschitz vector-valued function Φ : Rn → Rm .


Suppose that there exist {λk } ⊂ Rm and {xk } ⊂ Rn with λk → 0 and xk → x̄
such that

ωk ∈ ∂ ◦ (λk Φ)(xk ), ∀ k ∈ N.

Then ωk → 0.

Proof. By the Clarke Sum Rule,


ωk ∈ ∂ ◦ (λk Φ)(xk ) ⊂ Σ_{i=1}^{m} λki ∂ ◦ φi (xk ), ∀ k ∈ N,

where Φ(x) = (φ1 (x), φ2 (x), . . . , φm (x)) and φi : Rn → R, i = 1, 2, . . . , m,


are locally Lipschitz functions. From the above condition, there exist
ωik ∈ ∂ ◦ φi (xk ), i = 1, 2, . . . , m, such that
ωk = Σ_{i=1}^{m} λki ωik .

Therefore,

kωk k = k Σ_{i=1}^{m} λki ωik k ≤ Σ_{i=1}^{m} |λki | kωik k.

Because the Clarke subdifferential ∂ ◦ φi (xk ), i = 1, 2, . . . , m, are compact,


{ωik }, i = 1, 2, . . . , m, are bounded sequences and hence by the Bolzano–
Weierstrass Theorem, Proposition 1.3, have a convergent subsequence. With-
out loss of generality, assume that ωik → ωi , i = 1, 2, . . . , m. Because the

Clarke subdifferential as a set-valued map has a closed graph, ωi ∈ ∂ ◦ φi (x̄),


i = 1, 2, . . . , m. Also, as λk → 0, |λki | → 0. Hence, by the compactness of the
Clarke subdifferential, for i = 1, 2, . . . , m, kωi k ≤ Mi < +∞. Therefore,
Σ_{i=1}^{m} |λki | kωik k → Σ_{i=1}^{m} 0 · Mi = 0,

thus implying that ωk → 0. 

Theorem 7.6 (A simpler version of the Sequential Chain Rule) Consider


a vector-valued convex function Φ : Rn → Rm and a proper lsc convex
function φ : Rm → R̄ that is nondecreasing over Φ(Rn ) + Rm + . Then for
ȳ = Φ(x̄) ∈ dom φ, ξ ∈ ∂(φ ◦ Φ)(x̄) if and only if there exist

ηk ∈ ∂φ(yk ) and ξk ∈ ∂(ηk Φ)(xk )

satisfying

xk → x̄, yk → Φ(x̄), ξk → ξ,
φ(yk ) − hηk , yk − ȳi → φ(ȳ) and hηk , Φ(xk ) − ȳi → 0.

Proof. Consider ξ ∈ ∂(φ ◦ Φ)(x̄). Suppose that xk , yk , yk′ , ξk , ηk , and τk are


as in Theorem 7.4. Denote ζk = ηk + τk . Observe that for every k ∈ N,

ξk ∈ ∂(ηk Φ)(xk ) and ηk Φ = ζk Φ + (ηk − ζk )Φ,

with ηk − ζk = −τk → 0. Because Φ is convex and every component is


locally Lipschitz, it is simple to show that Φ is also locally Lipschitz. As
xk → x̄, for sufficiently large k, xk ∈ N (x̄) where N (x̄) is a neighborhood of
x̄ on which Φ satisfies the Lipschitz property with Lipschitz constant L̄ > 0.
Hence, (ηk − ζk )Φ is also locally Lipschitz over N (xk ) with Lipschitz constant
L̄kηk − ζk k > 0. This follows from the fact that for any x, x′ ∈ N (xk ),

|(ηk − ζk )Φ(x) − (ηk − ζk )Φ(x′ )| = |h(ηk − ζk ), Φ(x) − Φ(x′ )i|,

which by the Cauchy–Schwarz inequality, Proposition 1.1, implies that

|(ηk − ζk )Φ(x) − (ηk − ζk )Φ(x′ )| ≤ kηk − ζk k kΦ(x) − Φ(x′ )k


≤ L̄kηk − ζk k kx − x′ k.

From Theorem 7.4, ηk ∈ Rm + , which implies (ηk Φ) is convex. However, (ζk Φ)


and ((ηk − ζk )Φ) need not be convex. Thus, ξk ∈ ∂(ηk Φ)(xk ) implies

ξk ∈ ∂ ◦ (ηk Φ)(xk ) = ∂ ◦ {ζk Φ + (ηk − ζk )Φ}(xk ).

By the Clarke Sum Rule,

ξk ∈ ∂ ◦ (ζk Φ)(xk ) + ∂ ◦ ((ηk − ζk )Φ)(xk ),

which implies that there exist ρk ∈ ∂ ◦ (ζk Φ)(xk ) and ̺k ∈ ∂ ◦ ((ηk − ζk )Φ)(xk )
such that ξk = ρk + ̺k . As ηk − ζk → 0 and xk → x̄, by Lemma 7.5, ̺k → 0.
Setting ̺k = −βk ,

ρk = ξk + βk ∈ ∂ ◦ (ζk Φ)(xk ).

As k → +∞, ρk → ξ, ζk = ηk + τk ∈ ∂φ(yk ), hζk , Φ(xk ) − ȳi → 0,


and

φ(yk ) − hζk , yk − ȳi → φ(ȳ),

thereby yielding the desired conditions.


Conversely, suppose that conditions are satisfied. As ηk ∈ ∂φ(yk ),

φ(y) − φ(yk ) ≥ hηk , y − yk i, ∀ y ∈ Rm ,

which implies that

φ(y) ≥ φ(yk ) − hηk , yk − ȳi + hηk , y − ȳi, ∀ y ∈ Rm .

In particular, for y = Φ(x) for every x ∈ Rn , the above inequality yields that
for every x ∈ Rn ,

(φ ◦ Φ)(x) ≥ φ(yk ) − hηk , yk − ȳi + hηk , Φ(x) − ȳi


= φ(yk ) − hηk , yk − ȳi + hηk , Φ(x) − Φ(xk )i + hηk , Φ(xk ) − ȳi
= φ(yk ) − hηk , yk − ȳi + (ηk ◦ Φ)(x) − (ηk ◦ Φ)(xk ) +
hηk , Φ(xk ) − ȳi.

Because ξk ∈ ∂(ηk ◦ Φ)(xk ), for every x ∈ Rn ,

(φ ◦ Φ)(x) ≥ φ(yk ) − hηk , yk − ȳi + hξk , x − xk i + hηk , Φ(xk ) − ȳi,

which as the limit k → +∞ reduces to


(φ ◦ Φ)(x) ≥ φ(ȳ) + hξ, x − x̄i
            = (φ ◦ Φ)(x̄) + hξ, x − x̄i, ∀ x ∈ Rn .

Thus, ξ ∈ ∂(φ ◦ Φ)(x̄), thereby completing the result. 


Now we move on to establish the sequential optimality condition for (CP )
with the real-valued convex objective function f : Rn → R obtained by
Thibault [108] using the above theorem. To apply the result, we equivalently
expressed the feasible set C as

C = {x ∈ Rn : G(x) ∈ −Rm
+ },

where G(x) = (g1 (x), g2 (x), . . . , gm (x)). Observe that G : Rn → Rm is a


vector-valued convex function as gi , i = 1, 2, . . . , m, are convex. Now using
the sequential subdifferential calculus rules, Theorems 7.1 and 7.6, we present
the sequential optimality conditions for the constrained problem (CP ).

Theorem 7.7 Consider the convex programming problem (CP ) with
C given by (3.1). Then x̄ ∈ C is a point of minimizer of (CP ) if and only if
there exist xk → x̄, yk → G(x̄), λk ∈ Rm + , ξ ∈ ∂f (x̄), and ξk ∈ ∂(λk G)(xk )
such that
ξ + ξk → 0, hλk , yk i = 0,
hλk , yk − G(x̄)i → 0, hλk , G(xk ) − G(x̄)i → 0.
Proof. Observe that the problem (CP ) can be rewritten as the unconstrained
problem

min (f + (δ−Rm+ ◦ G))(x) subject to x ∈ Rn .

By Theorem 2.89, x̄ is a point of minimizer of (CP ) if and only if

0 ∈ ∂(f + (δ−Rm+ ◦ G))(x̄).

As dom f = Rn , invoking the Sum Rule, Theorem 2.91,

0 ∈ ∂f (x̄) + ∂(δ−Rm+ ◦ G)(x̄),

which is equivalent to the existence of ξ ∈ ∂f (x̄) and ξ̂ ∈ ∂(δ−Rm+ ◦ G)(x̄) such that

ξ + ξ̂ = 0.

As G is a convex function and the indicator function δ−Rm+ is a proper lsc convex
function nondecreasing over G(Rn ) + Rm+ , Theorem 7.6 can be applied: ξ̂ ∈ ∂(δ−Rm+ ◦ G)(x̄)
if and only if there exist xk → x̄, yk → G(x̄), ξk → ξ̂ such that

λk ∈ ∂δ−Rm+ (yk ) and ξk ∈ ∂(λk G)(xk ),

satisfying

δ−Rm+ (yk ) − hλk , yk − G(x̄)i → (δ−Rm+ ◦ G)(x̄) and hλk , G(xk ) − G(x̄)i → 0.

As λk ∈ ∂δ−Rm+ (yk ) = N−Rm+ (yk ), the sequence {yk } ⊂ −Rm+ with

hλk , y − yk i ≤ 0, ∀ y ∈ −Rm+ .

In particular, taking y = 0 and y = 2yk in the above inequality leads to

hλk , yk i = 0.

Thus,

hλk , yi ≤ 0, ∀ y ∈ −Rm+ ,

which implies {λk } ⊂ Rm+ . Using the fact that {yk } ⊆ −Rm+ , the condition

δ−Rm+ (yk ) − hλk , yk − G(x̄)i → (δ−Rm+ ◦ G)(x̄)

reduces to

hλk , yk − G(x̄)i → 0,

thereby leading to the requisite result.

7.3 Fenchel Conjugates and Constraint Qualification


Observe that in the previous section, the sequential optimality conditions are
obtained in terms of the subdifferentials that are calculated at some neigh-
boring point rather than the exact point of minimum, as is the case in the
standard KKT conditions. But this can be overcome by using the Brøndsted–
Rockafellar Theorem, Theorem 2.114, thereby expressing the result in terms of
the ε-subdifferentails at the exact point. To the best of our knowledge this was
carried out by Jeyakumar, Rubinov, Glover, and Ishizuka [70] and Jeyakumar,
Lee, and Dinh [68]. In their approach, the epigraph of the conjugate function
of the objective function and the constraints play a central role in the charac-
terization of the optimality for the convex programming problem (CP ). The
proof is based on the result in Jeyakumar, Rubinov, Glover, and Ishizuka [70].

Theorem 7.8 Consider the convex programming problem (CP ) with C given
by (3.1). Then x̄ is a point of minimizer of (CP ) if and only if

(0, −f (x̄)) ∈ epi f ∗ + cl ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ . (7.12)

Proof. Recall that the feasible set C of the convex programming problem
(CP ) is given by (3.1), that is,

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},

which can be equivalently expressed as

C = {x ∈ Rn : G(x) ∈ −Rm+ },

where G : Rn → Rm is defined as G(x) = (g1 (x), g2 (x), . . . , gm (x)). Because


gi , i = 1, 2, . . . , m, are convex functions, G is also a convex function. x̄ is a
point of minimizer of (CP ) if and only if

f (x) ≥ f (x̄), ∀ x ∈ C,

that is,

φ(x) + δC (x) ≥ 0, ∀ x ∈ Rn ,

where φ(x) = f (x) − f (x̄). By Definition 2.101 of the conjugate function,

(φ + δC )∗ (0) = sup_{x∈Rn} {h0, xi − (φ + δC )(x)} = sup_{x∈Rn} −(φ + δC )(x) ≤ 0.

This shows that

(0, 0) ∈ epi (φ + δC )∗ ,

which by the epigraph of the conjugate of the sum, Theorem 2.123, implies
that

(0, 0) ∈ cl {epi φ∗ + epi δC∗ }.

As dom φ = Rn , by Theorem 2.69, φ is continuous on Rn . Hence, by Proposition 2.124,
epi φ∗ + epi δC∗ is closed, which reduces the above condition to

(0, 0) ∈ epi φ∗ + epi δC∗ . (7.13)
Consider

(λG)(x) = hλ, G(x)i.

For x ∈ C, G(x) ∈ −Rm+ , which implies (λG)(x) ≤ 0 for every λ ∈ Rm+ . Thus,

sup_{λ∈Rm+} (λG)(x) = 0. (7.14)

If x ∉ C, there exists some i ∈ {1, 2, . . . , m} such that gi (x) > 0. Hence, it is
simple to see that

sup_{λ∈Rm+} (λG)(x) = +∞. (7.15)

Combining (7.14) and (7.15),

δC (x) = sup_{λ∈Rm+} (λG)(x).

Applying Theorem 2.123, relation (7.13) along with Proposition 2.103 yields

(0, 0) ∈ epi f ∗ + (0, f (x̄)) + cl co ⋃_{λ∈Rm+} epi (λG)∗ .

By Theorem 2.123, ⋃_{λ∈Rm+} epi (λG)∗ is a convex cone and thus, the above
relation reduces to

(0, 0) ∈ epi f ∗ + (0, f (x̄)) + cl ⋃_{λ∈Rm+} epi (λG)∗
       = epi f ∗ + (0, f (x̄)) + cl ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ ,

thereby leading to the requisite condition (7.12).

Conversely, suppose that condition (7.12) holds, which implies there exist
ξ ∈ dom f ∗ , α ≥ 0, αk ≥ 0, {λk } ⊂ Rm+ , and {ξk } ⊂ dom (Σ_{i=1}^{m} λki gi )∗ such that

(0, −f (x̄)) = (ξ, f ∗ (ξ) + α) + lim_{k→∞} (ξk , (Σ_{i=1}^{m} λki gi )∗ (ξk ) + αk ).

Componentwise comparison leads to

0 = ξ + lim_{k→∞} ξk , (7.16)
−f (x̄) = f ∗ (ξ) + α + lim_{k→∞} ((Σ_{i=1}^{m} λki gi )∗ (ξk ) + αk ). (7.17)

By Definition 2.101 of the conjugate functions, the condition (7.17) implies
that for every x ∈ Rn ,

f (x̄) − f (x) ≤ −hξ, xi − α − lim_{k→∞} ((Σ_{i=1}^{m} λki gi )∗ (ξk ) + αk )
            ≤ −hξ, xi − α − lim_{k→∞} (hξk , xi − Σ_{i=1}^{m} λki gi (x) + αk ).

In particular, taking x ∈ C, that is, gi (x) ≤ 0, i = 1, 2, . . . , m, in the above


inequality along with the nonnegativity of α, αk , λki , i = 1, 2, . . . , m, and the
condition (7.16) yields

f (x̄) ≤ f (x), ∀ x ∈ C.

Therefore, x̄ is a point of minimizer of (CP ), as desired. 


As one can express the epigraph of conjugate functions in terms of the
ε-subdifferential of the function, Theorem 2.122, Jeyakumar et al. [70, 68]
expressed the above theorem in terms of the ε-subdifferentials, thus obtaining
the sequential optimality conditions presented below. We present the same
using the condition (7.12) obtained in Theorem 7.8.

Theorem 7.9 Consider the convex programming problem (CP ) with C given
by (3.1). Then x̄ is a point of minimizer for (CP ) if and only if there exist
ξ ∈ ∂f (x̄), εki ≥ 0, λki ≥ 0, ξik ∈ ∂εki gi (x̄), i = 1, 2, . . . , m, such that

ξ + Σ_{i=1}^{m} λki ξik → 0, Σ_{i=1}^{m} λki gi (x̄) → 0 and Σ_{i=1}^{m} λki εki ↓ 0 as k → +∞.

Proof. Consider Theorem 7.8, according to which x̄ is a point of minimizer


of (CP ) if and only if the containment (7.12) is satisfied. By Theorem 2.122,

there exist ξ ∈ ∂ε f (x̄), λki ≥ 0 and ξk ∈ ∂εk (Σ_{i=1}^{m} λki gi )(x̄), i = 1, 2, . . . , m,
with ε, εk ≥ 0 such that

(0, −f (x̄)) = (ξ, hξ, x̄i + ε − f (x̄)) + lim_{k→∞} (ξk , hξk , x̄i + εk − (Σ_{i=1}^{m} λki gi )(x̄)).

Componentwise comparison leads to

0 = ξ + lim_{k→∞} ξk ,
−ε = hξ, x̄i + lim_{k→∞} (hξk , x̄i + εk − (Σ_{i=1}^{m} λki gi )(x̄)),

which together imply that

−ε = lim_{k→∞} (εk − (Σ_{i=1}^{m} λki gi )(x̄)).

This equation along with the nonnegativity of ε, εk , and λki , i = 1, 2, . . . , m,
implies

ε = 0, εk ↓ 0 and Σ_{i=1}^{m} λki gi (x̄) → 0. (7.18)

As dom gi = Rn , i = 1, 2, . . . , m, by Theorem 2.115, there exist εki ≥ 0,
i = 1, 2, . . . , m, such that

ξk ∈ Σ_{i=1}^{m} ∂εki (λki gi )(x̄) and εk = Σ_{i=1}^{m} εki .

Define I¯k = {i ∈ {1, 2, . . . , m} : λki > 0}. By Theorem 2.117,

∂εki (λki gi )(x̄) = λki ∂ε̄ki gi (x̄), ∀ i ∈ I¯k ,

where ε̄ki = εki /λki ≥ 0. Therefore,

ξk ∈ Σ_{i∈I¯k} λki ∂ε̄ki gi (x̄) + Σ_{i∉I¯k} ∂εki (λki gi )(x̄). (7.19)

As discussed in Chapter 2, the ε-subdifferential of the zero function is zero, that
is, ∂ε 0(x) = {0}. Thus,

∂εki (λki gi )(x̄) = {0} = λki ∂εki gi (x̄), ∀ i ∉ I¯k .

The above relation along with the condition (7.19) yields that

ξk ∈ Σ_{i∈I¯k} λki ∂ε̄ki gi (x̄) + Σ_{i∉I¯k} λki ∂εki gi (x̄). (7.20)

Also,

εk = Σ_{i∈I¯k} λki ε̄ki + Σ_{i∉I¯k} εki ,

which along with (7.20) leads to the desired sequential optimality conditions.
Conversely, suppose that the sequential optimality conditions hold. From
Definitions 2.77 and 2.109 of subdifferentials and ε-subdifferentials,

f (x) − f (x̄) ≥ hξ, x − x̄i,


gi (x) − gi (x̄) ≥ hξik , x − x̄i − εki , i = 1, 2, . . . , m,

respectively. The above inequalities along with the sequential optimality con-
ditions imply that
f (x) − f (x̄) + Σ_{i=1}^{m} λki gi (x) ≥ 0, ∀ x ∈ Rn ,

where {λki } ⊂ R+ , i = 1, 2, . . . , m. In particular, taking x ∈ C, that is,


gi (x) ≤ 0, i = 1, 2, . . . , m, which along with the condition on {λk } reduces
the above inequality to

f (x) ≥ f (x̄), ∀ x ∈ C,

thereby establishing the optimality of x̄ for (CP ). 
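
The sequential conditions can be made concrete on a one-dimensional problem where the
Slater constraint qualification fails. The following Python sketch is our own illustration
(not part of the original text): for min f (x) = x subject to g(x) = x2 ≤ 0 the only feasible
point is x̄ = 0 and no standard KKT multiplier exists, since ∂f (0) = {1} and ∂g(0) = {0},
yet the sequential conditions of Theorem 7.9 hold along the explicit sequences chosen
below, using ∂ε g(0) = [−2√ε, 2√ε].

import numpy as np

# Illustration (not from the text): min f(x) = x  subject to  g(x) = x**2 <= 0.
# Feasible set = {0}; Slater fails and 1 + lambda*0 = 0 has no solution, but the
# sequential optimality conditions hold along the sequences chosen below.
xi = 1.0                          # the unique subgradient of f at x_bar = 0
for k in [1, 10, 100, 1000]:
    eps_k = 1.0 / k**2            # eps-subdifferential parameter
    xi_k = -2.0 * np.sqrt(eps_k)  # = -2/k, an element of the eps_k-subdifferential of g at 0
    lam_k = k / 2.0               # multiplier sequence
    print(k,
          xi + lam_k * xi_k,      # -> 0   (xi + sum_i lam_k xi_k)
          lam_k * 0.0,            # -> 0   (sum_i lam_k g_i(x_bar), here g(0) = 0)
          lam_k * eps_k)          # -> 0   (sum_i lam_k eps_k)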


Observe that not only is the optimality condition sequential, but one ob-
tains a sequential complementary slackness condition. Note that we are work-
ing in a simple scenario with a convex inequality system. This helps in ex-
pressing the condition (7.12) derived in Theorem 7.8 in a more relaxed form.
By applying Theorem 2.123, the condition becomes

(0, −f (x̄)) ∈ epi f ∗ + cl ⋃_{λ∈Rm+} cl (Σ_{i=1}^{m} epi (λi gi )∗ ). (7.21)

By the closure properties of the arbitrary union of sets, the condition (7.21)
leads to

(0, −f (x̄)) ∈ epi f ∗ + cl ⋃_{λ∈Rm+} Σ_{i=1}^{m} epi (λi gi )∗ . (7.22)

Define I¯λ = {i ∈ {1, 2, . . . , m} : λi > 0}. Again by Theorem 2.123,

epi (λi gi )∗ = λi epi gi∗ , ∀ i ∈ I¯λ .

For i 6∈ I¯λ with λi = 0,



(λi gi )∗ (ξ) = 0∗ (ξ) = 0 for ξ = 0, and +∞ otherwise,

which implies that

epi (λi gi )∗ = {0} × R+ , ∀ i ∉ I¯λ .

Using the preceding conditions, the relation (7.22) becomes

(0, −f (x̄)) ∈ epi f ∗ + cl ⋃_{λ∈Rm+} (Σ_{i∈I¯λ} λi epi gi∗ + Σ_{i∉I¯λ} {0} × R+ )
            = epi f ∗ + cl ⋃_{λ∈Rm+} (Σ_{i∈I¯λ} λi epi gi∗ + {0} × R+ ). (7.23)
Now consider (ξ, α) ∈ Σ_{i∈I¯λ} λi epi gi∗ , which implies that for i ∈ I¯λ there exist
(ξi , αi ) ∈ epi gi∗ such that

(ξ, α) = Σ_{i∈I¯λ} λi (ξi , αi ).

Therefore, for any element (0, ᾱ) ∈ {0} × R+ ,

(ξ, α + ᾱ) = Σ_{i∈I¯λ} λi (ξi , αi + ᾱ/λi ),

where ᾱ/λi ≥ 0. As (ξi , αi ) ∈ epi gi∗ ,

gi∗ (ξi ) ≤ αi ≤ αi + ᾱ/λi , ∀ i ∈ I¯λ ,

which implies that (ξi , αi + ᾱ/λi ) ∈ epi gi∗ . Hence (ξ, α + ᾱ) ∈ Σ_{i∈I¯λ} λi epi gi∗
for every ᾱ ≥ 0. Therefore, (7.23) reduces to

(0, −f (x̄)) ∈ epi f ∗ + cl ⋃_{λ∈Rm+} Σ_{i=1}^{m} λi epi gi∗ .

It is quite simple to see that

cl ⋃_{λ∈Rm+} Σ_{i=1}^{m} λi epi gi∗ = cl cone co ⋃_{i=1}^{m} epi gi∗ .

We leave this as an exercise for the reader. Hence,

(0, −f (x̄)) ∈ epi f ∗ + cl cone co ⋃_{i=1}^{m} epi gi∗ . (7.24)

The condition (7.24) implies that there exist ξ ∈ dom f ∗ , α ≥ 0, ξik ∈ dom gi∗ ,
αik , λki ≥ 0, i = 1, 2, . . . , m, such that
(0, −f (x̄)) = (ξ, f ∗ (ξ) + α) + lim_{k→∞} Σ_{i=1}^{m} λki (ξik , gi∗ (ξik ) + αik ).

Componentwise comparison leads to


0 = ξ + lim_{k→∞} Σ_{i=1}^{m} λki ξik , (7.25)
−f (x̄) = f ∗ (ξ) + α + lim_{k→∞} Σ_{i=1}^{m} λki (gi∗ (ξik ) + αik ). (7.26)

By Definition 2.101 of the conjugate functions, the condition (7.26) implies that

f (x̄) − f (x) ≤ −hξ, xi − α − lim_{k→∞} Σ_{i=1}^{m} λki (gi∗ (ξik ) + αik )
            ≤ −hξ, xi − α − lim_{k→∞} Σ_{i=1}^{m} λki (hξik , xi − gi (x) + αik ).

In particular, taking x ∈ C along with the nonnegativity of α, αik , and λki ,


i = 1, 2, . . . , m, and the condition (7.25) yields
f (x̄) ≤ f (x), ∀ x ∈ C.
Therefore, x̄ is a point of minimizer of (CP ) under the relation (7.24). This
discussion can be stated as the following result.
Theorem 7.10 Consider the convex programming problem (CP ) with C
given by (3.1). Then x̄ is a point of minimizer of (CP ) if and only if
(0, −f (x̄)) ∈ epi f ∗ + cl cone co ⋃_{i=1}^{m} epi gi∗ . (7.27)

Using the above result, we present an alternate proof to the sequential


optimality conditions, Theorem 7.9.
Alternate proof of Theorem 7.9. According to Theorem 7.10, x̄ is
a point of minimizer of (CP ) if and only if the containment (7.27) is satis-
fied. By Theorem 2.122, there exist ξ ∈ ∂ε f (x̄), ξik ∈ ∂εki gi (x̄) and λki ≥ 0,
i = 1, 2, . . . , m, with ε, εki ≥ 0, i = 1, 2, . . . , m, such that
(0, −f (x̄)) = (ξ, hξ, x̄i + ε − f (x̄)) + lim_{k→∞} Σ_{i=1}^{m} λki (ξik , hξik , x̄i + εki − gi (x̄)).

Componentwise comparison leads to


0 = ξ + lim_{k→∞} Σ_{i=1}^{m} λki ξik ,
−ε = hξ, x̄i + lim_{k→∞} Σ_{i=1}^{m} λki (hξik , x̄i + εki − gi (x̄)),

which together imply that


−ε = lim_{k→∞} Σ_{i=1}^{m} λki (εki − gi (x̄)).

This equation along with the nonnegativity of ε, εki and λki , i = 1, 2, . . . , m,
implies ε = 0, Σ_{i=1}^{m} λki gi (x̄) → 0 and Σ_{i=1}^{m} λki εki ↓ 0 as k → +∞, thereby
establishing the sequential optimality conditions. The converse can be verified
as in Theorem 7.9.
As already discussed in the previous chapters, if one assumes certain con-
straint qualifications, then the standard KKT conditions can be established. If
we observe the necessary and sufficient condition given in Theorem 7.8 carefully,
we will observe that the term cl ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ prevents us
from further manipulation. On the other hand, one might feel that the route
to the KKT optimality conditions lies in further manipulation of the condition
(7.12). Further, observe that we arrived at the condition (7.12) without any
constraint qualification. However, in order to derive the KKT optimality con-
ditions, one needs some additional qualification conditions on the constraints.
Thus from (7.12) it is natural to consider that the set

⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ is closed.

This is usually known as the closed cone constraint qualification or the Farkas–
Minkowski (FM) constraint qualification. One may also take the more relaxed
constraint qualification based on condition (7.27), that is,
cone (co ⋃_{i=1}^{m} epi gi∗ ) is closed.

We will call the above constraint qualification the relaxed FM constraint
qualification. Below we derive the standard KKT conditions under either of
the two constraint qualifications.

Theorem 7.11 Consider the convex programming problem (CP ) with C


given by (3.1). Assume that either the FM constraint qualification holds or
the relaxed FM constraint qualification holds. Then x̄ is a point of minimizer
of (CP ) if and only if there exist λi ≥ 0, i = 1, 2, . . . , m, such that
0 ∈ ∂f (x̄) + Σ_{i=1}^{m} λi ∂gi (x̄) and λi gi (x̄) = 0, i = 1, 2, . . . , m.

Proof. From Theorem 7.8, we know that x̄ is a point of minimizer of (CP )


if and only if the relation (7.12) holds. As the FM constraint qualification is

satisfied, (7.12) reduces to


(0, −f (x̄)) ∈ epi f ∗ + ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ .

By the ε-subdifferential characterization of the epigraph of the conjugate func-
tion, Theorem 2.122, there exist ξ ∈ ∂ε f (x̄), λi ≥ 0, i = 1, 2, . . . , m, and
ξ ′ ∈ ∂ε′ (Σ_{i=1}^{m} λi gi )(x̄) with ε, ε′ ≥ 0 such that

(0, −f (x̄)) = (ξ, hξ, x̄i + ε − f (x̄)) + (ξ ′ , hξ ′ , x̄i + ε′ − (Σ_{i=1}^{m} λi gi )(x̄)).

Componentwise comparison leads to

0 = ξ + ξ′, (7.28)
−f (x̄) = hξ, x̄i + ε − f (x̄) + hξ ′ , x̄i + ε′ − (Σ_{i=1}^{m} λi gi )(x̄). (7.29)

By the feasibility of x̄ ∈ C along with the nonnegativity of ε, ε′ , and λi ,


i = 1, 2, . . . , m, the condition (7.29) leads to
ε = 0, ε′ = 0, and Σ_{i=1}^{m} λi gi (x̄) = 0.

Because ε = 0 and ε′ = 0, the condition (7.28) leads to the fact that

0 ∈ ∂f (x̄) + Σ_{i=1}^{m} λi ∂gi (x̄). (7.30)

Further, Σ_{i=1}^{m} λi gi (x̄) = 0 implies that

λi gi (x̄) = 0, i = 1, 2, . . . , m. (7.31)

The conditions (7.30) and (7.31) together yield the KKT optimality condi-
tions.
Now if the relaxed constraint qualification is satisfied, (7.27) reduces to
(0, −f (x̄)) ∈ epi f ∗ + cone co ⋃_{i=1}^{m} epi gi∗ .

By the ε-subdifferential characterization of the epigraph of the conjugate


function, Theorem 2.122, there exist ξ ∈ ∂ε f (x̄), ξi ∈ ∂εi gi (x̄) and λi ≥ 0,
i = 1, 2, . . . , m, with ε, εi ≥ 0, i = 1, 2, . . . , m, such that
(0, −f (x̄)) = (ξ, hξ, x̄i + ε − f (x̄)) + Σ_{i=1}^{m} λi (ξi , hξi , x̄i + εi − gi (x̄)).

Componentwise comparison leads to


0 = ξ + Σ_{i=1}^{m} λi ξi , (7.32)
−f (x̄) = hξ, x̄i + ε − f (x̄) + Σ_{i=1}^{m} λi (hξi , x̄i + εi − gi (x̄)). (7.33)

By the feasibility of x̄ ∈ C along with the nonnegativity of ε, εi , and λi ,


i = 1, 2, . . . , m, the condition (7.33) leads to
ε = 0, Σ_{i=1}^{m} λi εi = 0 and Σ_{i=1}^{m} λi gi (x̄) = 0.

Let us assume that I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. Then
corresponding to any i ∈ I¯, εi = 0 and gi (x̄) = 0, which imply that

ξ ∈ ∂f (x̄), ξi ∈ ∂gi (x̄) and λi gi (x̄) = 0, i ∈ I¯. (7.34)

Therefore, from (7.32) and (7.34),

0 = ξ + Σ_{i∈I¯} λi ξi ∈ ∂f (x̄) + Σ_{i∈I¯} λi ∂gi (x̄).

For i ∉ I¯, choose εi = 0; the above condition leads to

0 ∈ ∂f (x̄) + Σ_{i=1}^{m} λi ∂gi (x̄),

along with the complementary slackness condition λi gi (x̄) = 0, i = 1, 2, . . . , m


and thereby establishing the standard KKT optimality conditions. The reader
should try to see how to arrive at the KKT optimality conditions when I¯ is
empty. The sufficiency can be worked out using Definition 2.77 of subdiffer-
entials, as done in Chapter 3. 
The proof of the KKT optimality conditions under the FM constraint
qualification was given by Jeyakumar, Lee, and Dinh [68] and that using
the relaxed FM condition is based on Jeyakumar [67]. It has been shown by
Jeyakumar, Rubinov, Glover, and Ishizuka [70] that under the Slater con-
straint qualification, the FM constraint qualification holds. We present the
result below proving the same.
Proposition 7.12 Consider the set C given by (3.1). Assume that the Slater
constraint qualification holds, that is, there exists x̂ ∈ Rn such that gi (x̂) < 0,
i = 1, 2, . . . , m. Then the FM constraint qualification is satisfied, that is,
⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ is closed.

Proof. Observe that defining G = (g1 , g2 , . . . , gm ),

⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ = ⋃_{λ∈Rm+} epi (λG)∗ .

Suppose that

(ξk , αk ) → (ξ, α) ∈ cl ⋃_{λ∈Rm+} epi (λG)∗

with (λk G)∗ (ξk ) ≤ αk for some λk ∈ Rm+ . As int Rm+ is nonempty, one can
always find a compact convex set R ⊂ Rm+ such that 0 ∉ R and cone R = Rm+ .
Thus, λk = γk bk , where γk ≥ 0 and bk ∈ R. Assume that γk ≥ 0 for every k
and bk → b ∈ R by the compactness of R. We consider the following cases.
(i) γk → γ > 0: Consider

(λk G)∗ (ξk ) ≤ αk ⇐⇒ (γk bk G)∗ (ξk ) ≤ αk ⇐⇒ (bk G)∗ (ξk /γk ) ≤ αk /γk .

Because bk G → bG, ξk /γk → ξ/γ and αk /γk → α/γ,

(bG)∗ (ξ/γ) ≤ lim inf_{k→∞} (bk G)∗ (ξk /γk ) ≤ α/γ.

Therefore, (ξ/γ, α/γ) ∈ epi (bG)∗ and hence (ξ, α) ∈ ⋃_{λ∈Rm+} epi (λG)∗ .

(ii) γk → +∞: Then ξk /γk → 0 and αk /γk → 0. Therefore,

(bG)∗ (0) ≤ lim inf_{k→∞} (bk G)∗ (ξk /γk ) ≤ 0,

which implies

− inf_{x∈Rn} (bG)(x) = sup_{x∈Rn} (−(bG)(x)) ≤ 0,

that is, (bG)(x) ≥ 0 for every x ∈ Rn . But by the Slater constraint qual-
ification, G(x̂) ∈ −int Rm+ and b 6= 0. Therefore, (bG)(x̂) < 0, which is a
contradiction.
(iii) γk → 0: This implies that λk → 0 and thus (λk G) → 0. Therefore,

0∗ (ξ) ≤ lim inf_{k→∞} (λk G)∗ (ξk ) ≤ α.

Observe that

0∗ (ξ ′ ) = 0 for ξ ′ = 0, and +∞ otherwise,

which leads to ξ = 0 and α ≥ 0. Thus,


(0, α) ∈ epi (0G)∗ ⊂ ⋃_{λ∈Rm+} epi (λG)∗ .

Therefore, the closed cone constraint qualification is satisfied. 


Next we present some examples to show that the FM constraint qualifica-
tion is weaker in comparison to the Slater constraint qualification. Consider
C = {x ∈ R : g(x) ≤ 0}, where g(x) = x2 for x ≤ 0 and g(x) = x for x ≥ 0.

Observe that C = {0} and hence the Slater constraint qualification is not
satisfied. Also TC (0) = {0} while

S(0) = {d ∈ R : g ′ (0, d) ≤ 0}
= {d ∈ R : d ≤ 0},

which implies that the Abadie constraint qualification is also not satisfied. For
ξ ∈ R,

g ∗ (ξ) = 0 for 0 ≤ ξ ≤ 1, g ∗ (ξ) = ξ 2 /4 for ξ ≤ 0, and g ∗ (ξ) = +∞ for ξ > 1.

Observe that as only one constraint is involved,


⋃_{λ≥0} epi (λg)∗ = ⋃_{λ≥0} λ epi g ∗ = cone epi g ∗ .

Therefore, the FM constraint qualification reduces to the set cone epi g ∗ being
closed, which is the same as the relaxed FM constraint qualification. Here,

cone epi g ∗ = {(ξ, α) ∈ R2 : ξ ≤ 0, α > 0} ∪ {(ξ, α) ∈ R2 : ξ ≥ 0, α ≥ 0}

is not closed and hence the FM constraint qualification is not satisfied.
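
The computations in this example are easy to confirm numerically. The following Python
sketch is our own check (not part of the text): it approximates g∗ on a grid and exhibits
points of cone epi g∗ converging to (−1, 0), a point outside the set, so the cone is indeed
not closed.

import numpy as np

# Sketch: g(x) = x**2 for x <= 0 and g(x) = x for x >= 0.
def g(x):
    return np.where(x <= 0, x**2, x)

xs = np.linspace(-50.0, 50.0, 200001)
def g_conj(xi):
    # crude grid approximation of g*(xi) = sup_x {xi*x - g(x)}
    return np.max(xi * xs - g(xs))

print(round(g_conj(0.5), 3))    # ~0.0, matching g*(xi) = 0 for 0 <= xi <= 1
print(round(g_conj(-2.0), 3))   # ~1.0, matching g*(xi) = xi**2/4 for xi <= 0

# (-1, 1/n) = n * (-1/n, 1/n**2) lies in cone epi g*, since 1/n**2 >= g*(-1/n) = 1/(4n**2),
# but the limit (-1, 0) does not: any (xi, a) in epi g* with xi < 0 has a >= xi**2/4 > 0.
for n in [1, 10, 100]:
    print(n, 1.0 / n**2 >= g_conj(-1.0 / n) - 1e-6)   # True for every n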


Now suppose that in the previous example,

g(x) = −2x for x ≤ 0 and g(x) = x for x ≥ 0.

Again, C = {0} and the Slater constraint qualification does not hold. But
unlike the above example, TC (0) = {0} = S(0) which implies that the Abadie
constraint qualification is satisfied. For ξ ∈ R,

g ∗ (ξ) = 0, − 2 ≤ ξ ≤ 1.

Observe that the set

cone epi g ∗ = {(ξ, α) ∈ R2 : ξ ∈ R, α ≥ 0}

is closed. Thus, the FM constraint qualification also holds.


Observe that in the above examples, either both the Abadie constraint
qualification and the FM qualification are satisfied or neither holds. Now let
us consider an example from Jeyakumar, Lee, and Dinh [70] showing that the
FM constraint qualification is weaker than the Abadie constraint qualification
as well. Consider a convex function g : R2 → R defined as
g(x1 , x2 ) = √(x21 + x22 ) − x2 .

Here C = {(x1 , x2 ) ∈ R2 : x1 = 0, x2 ≥ 0}. Observe that the Slater con-


straint qualification does not hold as for any (x1 , x2 ) ∈ C, g(x1 , x2 ) = 0. For
(0, x2 ), x2 > 0, g is differentiable at (0, x2 ) and hence

S(0, x2 ) = R2 while TC (0, x2 ) = {(0, x2 ) : x2 ∈ R}.

Thus the Abadie constraint qualification is also not satisfied. Now, for any
(ξ1 , ξ2 ) ∈ R2 ,

g ∗ (ξ1 , ξ2 ) = 0 for ξ1 = ξ2 = 0, and +∞ otherwise.

Therefore,

cone epi g ∗ = {(0, 0)} × R+ ,

which is closed. Hence, the FM constraint qualification holds, thereby showing


that it is a weaker constraint qualification with respect to the Slater and
Abadie constraint qualifications.
Until now we considered the convex programming problem (CP ) with a
real-valued objective function f . This fact played an important role in the
derivation of Theorem 7.8 as the continuity of f on Rn along with Proposi-
tion 2.124 leads to the closedness of

epi f ∗ + epi δC∗ .

But if f : Rn → R̄ is a proper lsc convex function and C involves inequality


constraints and additionally an abstract constraint, that is,

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X} (7.35)

where X ⊂ Rn is a closed convex set, then one has to impose an additional


condition along with the closed cone constraint qualification to establish the
KKT optimality condition, namely the CC qualification condition, that is,
epi f ∗ + ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ + epi δX∗ is closed.

Next we present the KKT optimality condition in the presence of the CC


qualification condition from Dinh, Nghia, and Vallet [34]. A similar result was
established by Burachik and Jeyakumar [20] under the assumption of CC as
well as FM constraint qualification.

Theorem 7.13 Consider the convex programming problem (CP ) where


f : Rn → R̄ is a proper lsc convex function and the feasible set C is given
by (7.35). Assume that the CC qualification condition is satisfied. Then
x̄ ∈ dom f ∩ C is a point of minimizer of (CP ) if and only if there exist
λi ≥ 0, i = 1, 2, . . . , m, such that
0 ∈ ∂f (x̄) + Σ_{i=1}^{m} λi ∂gi (x̄) + NX (x̄) and λi gi (x̄) = 0, i = 1, 2, . . . , m.

Proof. Suppose that x̄ ∈ dom f ∩ C is a point of minimizer of the problem


(CP ). Then working along the lines of Theorem 7.8, we have

(0, 0) ∈ cl {epi φ∗ + epi δC∗ },

where φ(x) = f (x) − f (x̄). Expressing C = C̄ ∩ X implies that δC = δC̄ + δX ,
where

C̄ = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}.

From the proof of Theorem 7.8,

epi δC̄∗ = cl ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ .

Therefore, by Theorem 2.123 and Propositions 2.102 and 2.15 (vi), the above
condition becomes

(0, 0) ∈ cl {epi φ∗ + cl (epi δC̄∗ + epi δX∗ )}
       ⊂ cl {epi φ∗ + cl (⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ + epi δX∗ )}
       ⊂ cl {epi φ∗ + ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ + epi δX∗ }.

By Propositions 2.103 and 2.15 (vi), the above yields

(0, −f (x̄)) ∈ cl {epi f ∗ + ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ + epi δX∗ },

which under the CC qualification condition reduces to


(0, −f (x̄)) ∈ epi f ∗ + ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ + epi δX∗ .

Applying Theorem 2.122, there exist ξf ∈ ∂εf f (x̄), ξg ∈ ∂εg (Σ_{i=1}^{m} λi gi )(x̄),
and ξx ∈ ∂εx δX (x̄) = NX,εx (x̄) with εf , εg , εx ≥ 0 such that

0 = ξf + ξg + ξx , (7.36)
−f (x̄) = (hξf , x̄i − f (x̄) + εf ) + (hξg , x̄i − (Σ_{i=1}^{m} λi gi )(x̄) + εg ) + hξx , x̄i + εx . (7.37)

Condition (7.36) leads to

0 ∈ ∂εf f (x̄) + ∂εg (Σ_{i=1}^{m} λi gi )(x̄) + NX,εx (x̄). (7.38)

Condition (7.37) along with (7.36) and the nonnegativity conditions yields

εf + εg + εx − Σ_{i=1}^{m} λi gi (x̄) = 0.

From the above condition it is obvious that

εf = 0, εg = 0, εx = 0, and λi gi (x̄) = 0, i = 1, 2, . . . , m.

Therefore, the condition (7.38) reduces to

0 ∈ ∂f (x̄) + ∂(Σ_{i=1}^{m} λi gi )(x̄) + NX (x̄).

As dom gi = Rn , i = 1, 2, . . . , m, by Theorem 2.91, the above condition becomes

0 ∈ ∂f (x̄) + Σ_{i=1}^{m} λi ∂gi (x̄) + NX (x̄),

which along with the complementary slackness condition yields the desired
optimality conditions.
Conversely, suppose that the optimality conditions hold. Therefore, there
exist ξ ∈ ∂f (x̄) and ξi ∈ ∂gi (x̄) such that
−ξ − Σ_{i=1}^{m} λi ξi ∈ NX (x̄),

that is,
hξ + Σ_{i=1}^{m} λi ξi , x − x̄i ≥ 0, ∀ x ∈ X.

The convexity of f and gi , i = 1, 2, . . . , m, along with Definition 2.77 of the


subdifferentials, imply that
f (x) − f (x̄) + Σ_{i=1}^{m} λi gi (x) − Σ_{i=1}^{m} λi gi (x̄) ≥ 0, ∀ x ∈ X.

In particular, taking x ∈ C, that is, gi (x) ≤ 0, i = 1, 2, . . . , m, and x ∈ X, the above
condition along with the complementary slackness condition reduces to

f (x) ≥ f (x̄), ∀ x ∈ C.

Thus, x̄ is a point of minimizer of (CP ). 

7.4 Applications to Bilevel Programming Problems


Consider the following bilevel problem:
min f (x) subject to x ∈ C, (BP )
where C is given as

C = argmin{φ(x) : x ∈ Θ},

f, φ : Rn → R are convex functions, and Θ ⊂ Rn is a convex set. Thus


it is clear that C is a convex set and hence the problem (BP ) is a convex
programming problem. As C is the solution set to a subproblem, which is
again a convex optimization problem, here we call (BP ) a simple convex bilevel
programming problem. In particular, (BP ) contains the standard differentiable
convex optimization problem of the form

min f (x) subject to gi (x) ≤ 0, i = 1, 2, . . . , m, Ax = b,

where f, gi : Rn → R, i = 1, 2, . . . , m, are differentiable convex functions, A is


an l × n matrix, and b ∈ Rl . This problem can be posed as the problem (BP )
by defining φ as
φ(x) = ||Ax − b||2 + Σ_{i=1}^{m} || max{0, gi (x)}||2 ,

and the lower-level problem is to minimize φ over Rn .
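
As a small illustration of this reformulation (our own sketch with assumed toy data, not
from the text), the function φ can be written down explicitly and checked to vanish
precisely on the feasible set, so that the lower-level problem of minimizing φ over Rn
recovers the feasible region of the original problem.

import numpy as np

# Assumed toy data: constraint g1(x) = x1 + x2 - 1 <= 0 and Ax = b with A = [1, -1], b = 0.
A = np.array([[1.0, -1.0]])
b = np.array([0.0])

def g1(x):
    return x[0] + x[1] - 1.0

def phi(x):
    # lower-level objective of (BP): squared equality residual plus squared constraint violation
    return float(np.linalg.norm(A @ x - b)**2 + max(0.0, g1(x))**2)

print(phi(np.array([0.3, 0.3])))   # 0.0: feasible point, so phi vanishes
print(phi(np.array([1.0, 0.5])))   # > 0: Ax != b
print(phi(np.array([0.8, 0.8])))   # > 0: g1 > 0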

The bilevel programming problem (BP ) can be equivalently expressed as


a convex programming problem by assuming C to be nonempty and defining

α = inf_{x∈Θ} φ(x).

Then the reformulated problem is given by


min f (x) subject to φ(x) ≤ α, x ∈ Θ. (RP )
Observe that (RP ) has the same form as the convex programming problem
(CP ) studied in the previous section. From the definition of α, it is easy to
see that there does not exist any x̂ ∈ Θ such that φ(x̂) < α, which implies
that the Slater constraint qualification does not hold for (RP ). We present
the KKT optimality condition as a consequence of Theorem 7.13.

Theorem 7.14 Consider the reformulated problem (RP ). Assume that

{cone {(0, 1)} ∪ cone [(0, α) + epi φ∗ ]} + epi δΘ∗
is closed. Then x̄ ∈ Θ is a point of minimizer of (RP ) if and only if there is


λ ≥ 0 such that

0 ∈ ∂f (x̄) + λ∂φ(x̄) + NΘ (x̄) and λ(φ(x̄) − α) = 0.

Proof. Observe that the problem (RP ) is of the type considered in The-
orem 7.13. We can invoke Theorem 7.13 if the CC qualification condition
holds, that is,
epi f ∗ + ⋃_{µ≥0} epi (µ(φ(·) − α))∗ + epi δΘ∗

is closed. As dom f = Rn , by Theorem 2.69, f is continuous on Rn and


thus the CC qualification condition can be replaced by the FM constraint
qualification, that is,
⋃_{µ≥0} epi (µ(φ(·) − α))∗ + epi δΘ∗ (7.39)

is closed. For µ > 0, by Proposition 2.103,

(µ(φ(.) − α))∗ (ξ) = µα + (µφ)∗ (ξ),

which along with Theorem 2.123 leads to

epi (µ(φ(.) − α))∗ = (0, µα) + epi (µφ)∗


= µ((0, α) + epi φ∗ ), ∀ µ > 0. (7.40)

For µ = 0,

(µ(φ(·) − α))∗ (ξ) = 0∗ (ξ) = 0 for ξ = 0, and +∞ otherwise,

which implies

epi (µ(φ(·) − α))∗ = {0} × R+ = cone {(0, 1)}, µ = 0. (7.41)

Using (7.40) and (7.41), the condition (7.39) becomes


cone {(0, 1)} ∪ {⋃_{µ>0} µ((0, α) + epi φ∗ )} + epi δΘ∗ .

Observe that cone{(0, 1)}∪{(0, 0)} = cone{(0, 1)} and thus the above becomes

{cone {(0, 1)} ∪ cone ((0, α) + epi φ∗ )} + epi δΘ∗ . (7.42)

By the hypothesis of the theorem, (7.42) is a closed set and thus the reformu-
lated problem (RP ) satisfies the FM constraint qualification. Now invoking
Theorem 7.13, there exists λ ≥ 0 such that

0 ∈ ∂f (x̄) + λ∂(φ(.) − α)(x̄) + NΘ (x̄) and λ(φ(x̄) − α) = 0.

As ∂(φ(·) − α)(x̄) = ∂φ(x̄), the optimality condition reduces to

0 ∈ ∂f (x̄) + λ∂φ(x̄) + NΘ (x̄),

thereby establishing the desired result. The converse can be proved as in Chap-
ter 3. 
For a better understanding of the above result, consider the bilevel pro-
gramming problem where f (x) = x2 + 1, Θ = [−1, 1], and φ(x) = max{0, x}.
Observe that C = [−1, 0] and α = 0. Thus the reformulated problem is

min x2 + 1 subject to max {0, x} ≤ 0, x ∈ [−1, 1].

For ξ ∈ R,

φ∗ (ξ) = +∞ for ξ < 0 or ξ > 1, and φ∗ (ξ) = 0 for ξ ∈ [0, 1],

which implies

epi φ∗ = {(ξ, γ) ∈ R2 : ξ ∈ [0, 1], γ ≥ 0} = [0, 1] × R+



while epi δΘ∗ = epi |·|. Therefore,

cone epi φ∗ + epi δΘ∗ = R2+ ∪ {(ξ, γ) ∈ R2 : ξ ≤ 0, γ ≥ −ξ},

which is a closed set. Because cone{(0, 1)} ⊂ cone epi φ∗ , the reformulated
problem satisfies the qualification condition in Theorem 7.14. It is easy to see
that x̄ = 0 is a solution of the bilevel problem with NΘ (0) = {0}, ∂f (0) = {0},
and ∂φ(0) = [0, 1]. Thus the KKT optimality conditions of Theorem 7.14 are

satisfied with λ = 0. Note that the Slater condition fails to hold for the
reformulated problem.
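
The example can also be checked numerically. The following Python sketch is our own
verification (not part of the text) of the reformulated problem and of the multiplier
λ = 0 found above.

import numpy as np

# f(x) = x**2 + 1, Theta = [-1, 1], phi(x) = max(0, x), alpha = 0.
xs = np.linspace(-1.0, 1.0, 20001)
feasible = xs[np.maximum(0.0, xs) <= 0.0]          # the feasible set of (RP): [-1, 0]
x_bar = feasible[np.argmin(feasible**2 + 1.0)]
print(x_bar)                                       # 0.0, the minimizer claimed above

# KKT data at x_bar = 0: grad f(0) = 0, N_Theta(0) = {0}, subdifferential of phi at 0 = [0, 1].
# The inclusion 0 in {0} + lambda*[0, 1] + {0} holds with lambda = 0, as stated in the text.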
We end this chapter by presenting the optimality conditions for the bilevel
programming problem
inf f (x) subject to x ∈ C, (BP 1)
where C is the solution set of the lower-level problem

min φ(x) subject to gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X.

Here, φ : Rn → R̄ is a proper, convex, lsc function and gi : Rn → R,


i = 1, 2, . . . , m, are convex functions, and X ⊂ Rn is a closed convex set.
Define

α = inf{φ(x) : gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X} < +∞.

Without loss of generality, assume that α = 0. This can be achieved by setting


φ(x) = φ(x) − α. Then the bilevel programming problem (BP 1) is equivalent
to the following optimization problem:
min f (x) subject to φ(x) ≤ 0, gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X. (RP 1)
Below we present the result on optimality conditions for the bilevel program-
ming problem (BP 1).

Theorem 7.15 Consider the bilevel programming problem (BP 1). Assume
that
{⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ } ∪ {⋃_{λ0 >0} ⋃_{λ∈Rm+} λ0 epi φ∗ + epi (Σ_{i=1}^{m} λi gi )∗ } + epi δX∗

is closed. Then x̄ ∈ C is a point of minimizer of (BP 1) if and only if there


exist λ0 ≥ 0 and λi ≥ 0, i = 1, 2, . . . , m, such that
0 ∈ ∂f (x̄) + λ0 ∂φ(x̄) + ∂(Σ_{i=1}^{m} λi gi )(x̄) + NX (x̄),
λ0 φ(x̄) = 0 and λi gi (x̄) = 0, i = 1, 2, . . . , m.
Proof. Observe that for any λ̃ = (λ0 , λ) ∈ R+ × Rm+ ,

(λ̃g)(x) = λ0 φ(x) + Σ_{i=1}^{m} λi gi (x).

Therefore,
epi (λ̃g)∗ = cl {epi (λ0 φ)∗ + epi (Σ_{i=1}^{m} λi gi )∗ }.

As dom gi = Rn , i = 1, 2, . . . , m, dom (λi gi ) = Rn , i = 1, 2, . . . , m, which by


Theorem 2.69 are continuous on Rn . By Proposition 2.124,
epi (λ̃g)∗ = epi (λ0 φ)∗ + epi (Σ_{i=1}^{m} λi gi )∗ . (7.43)

Now consider the two cases, namely λ0 = 0 and λ0 > 0. For λ0 = 0,


epi (λ0 φ)∗ = cone {(0, 1)}. Thus, the condition (7.43) reduces to
epi (λ̃g)∗ = cone {(0, 1)} + epi (Σ_{i=1}^{m} λi gi )∗ .

Observe that for µ ≥ 0,


µ(0, 1) + (ξ, α) = (ξ, α + µ) ∈ epi (Σ_{i=1}^{m} λi gi )∗ ,

where (ξ, α) ∈ epi (Σ_{i=1}^{m} λi gi )∗ . Because µ ≥ 0 was arbitrary,
i=1

cone {(0, 1)} + epi (Σ_{i=1}^{m} λi gi )∗ ⊂ epi (Σ_{i=1}^{m} λi gi )∗ .

Also, for any (ξ, α) ∈ epi (Σ_{i=1}^{m} λi gi )∗ ,

(ξ, α) = (0, 0) + (ξ, α) ∈ cone {(0, 1)} + epi (Σ_{i=1}^{m} λi gi )∗ .

As (ξ, α) ∈ epi (Σ_{i=1}^{m} λi gi )∗ was arbitrary,

epi (Σ_{i=1}^{m} λi gi )∗ ⊂ cone {(0, 1)} + epi (Σ_{i=1}^{m} λi gi )∗ ,

thereby implying that


cone {(0, 1)} + epi (Σ_{i=1}^{m} λi gi )∗ = epi (Σ_{i=1}^{m} λi gi )∗ .

Thus, for λ0 = 0,
epi (λ̃g)∗ = epi (Σ_{i=1}^{m} λi gi )∗ .

For the case when λ0 > 0, the condition (7.43) becomes


epi (λ̃g)∗ = λ0 epi φ∗ + epi (Σ_{i=1}^{m} λi gi )∗ .

Therefore,

⋃_{λ̃∈R^{1+m}_+} epi (λ̃g)∗ + epi δX∗ = {⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ } ∪
{⋃_{λ0 >0} ⋃_{λ∈Rm+} λ0 epi φ∗ + epi (Σ_{i=1}^{m} λi gi )∗ } + epi δX∗ .

By the given hypothesis, the set


⋃_{λ̃∈R^{1+m}_+} epi (λ̃g)∗ + epi δX∗

is closed. Hence, the FM constraint qualification holds for the problem (RP 1).
Because dom f = Rn , by Theorem 2.69 f is continuous on Rn , and thus the CC
qualification condition holds for (RP 1). As the bilevel problem (BP 1) is equivalent to
(RP 1), by Theorem 7.13, x̄ ∈ C is a point of minimizer of (BP 1) if and only
if there exists λ̃ = (λ0 , λ) ∈ R+ × Rm+ such that

0 ∈ ∂f (x̄) + ∂(λ̃g)(x̄) + NX (x̄) and (λ̃g)(x̄) = 0. (7.44)
As φ is proper convex, λ0 φ is also proper convex. Therefore, dom (λ0 φ) is
a nonempty convex set in Rn . By Proposition 2.14 (i), ri dom (λ0 φ) is non-
empty. Because dom g = Rn , dom (λg) = Rn . Now invoking the Sum Rule,
Theorem 2.91,
∂(λ̃g)(x̄) = λ0 ∂φ(x̄) + ∂(Σ_{i=1}^{m} λi gi )(x̄).

Thus the optimality condition in (7.44) becomes


0 ∈ ∂f (x̄) + λ0 ∂φ(x̄) + ∂(Σ_{i=1}^{m} λi gi )(x̄) + NX (x̄). (7.45)

By the complementary slackness condition in (7.44),


(λ̃g)(x̄) = λ0 φ(x̄) + Σ_{i=1}^{m} λi gi (x̄) = 0.

As (λ0 , λ) ∈ R+ × Rm+ , this along with the feasibility of x̄ yields that

λ0 φ(x̄) = 0 and λi gi (x̄) = 0, i = 1, 2, . . . , m.


The above condition together with (7.45) leads to the requisite conditions.
The converse can be proved as in Chapter 3. 

Chapter 8
Representation of the Feasible Set and
KKT Conditions

8.1 Introduction
Until now, we discussed the convex programming problem (CP ) with the
convex feasible set C given by (3.1), that is,
C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2 . . . , m},
where gi : Rn → R, i = 1, 2, . . . , m, are convex functions and its variations like
(CP 1) and (CCP ). But is the convexity of the functions forming the convex
feasible set C important? For example, assume C as a subset of R2 given by
C = {(x1 , x2 ) ∈ R2 : 1 − x1 x2 ≤ 0, x1 ≥ 0}.
This set is convex even though g(x1 , x2 ) = 1−x1 x2 is a nonconvex function. As
stated in Chapter 1, convex optimization basically means minimizing a convex
function over a convex set with no emphasis on as to how the feasible set is
obtained. Very recently (2010), Lasserre [74] published a very interesting paper
discussing this aspect of convex feasibility for smooth convex optimization.

8.2 Smooth Case


In this section, we turn our attention to the case of smooth convex optimiza-
tion studied by Lasserre [74]. From Chapter 2 we know that when a convex
function is differentiable, then ∂φ(x) = {∇φ(x)} and its gradient is also con-
tinuous; thus any differentiable convex function is smooth. So one can obtain
the KKT optimality conditions at the point of minimizer from the subdif-
ferential optimality conditions discussed in Chapter 3; that is, if x̄ is the
point of minimizer of (CP ) with (C) given by (3.1), then there exist λi ≥ 0,
i = 1, 2, . . . , m, such that
∇f (x̄) + Σ_{i=1}^{m} λi ∇gi (x̄) = 0 and λi gi (x̄) = 0, i = 1, 2, . . . , m.

Observe that the KKT conditions for smooth convex optimization problems
look absolutely the same as the KKT conditions for the usual smooth opti-
mization problem. As discussed in earlier chapters, under certain constraint
qualifications like the Slater constraint qualification, the above KKT condi-
tions are necessary as well as sufficient.
Lasserre observed that the convex feasible set C of (CP ) need not always
be defined by convex inequality constraints as in the above example. The
question that Lasserre answers is “in such a scenario what conditions would
make the KKT optimality conditions necessary as well as sufficient? ” So now
the convex set C given by (3.1) is considered, with the only difference that
gi , i = 1, 2, . . . , m, need not be convex even though they are assumed to
be smooth. Lasserre showed that if the Slater constraint qualification and
an additional nondegeneracy condition hold, then the KKT condition is both
necessary and sufficient. Though Lasserre defined the notion of nondegeneracy
for every point of the set C, we define it for a particular point and extend it
to the feasible set C.

Definition 8.1 The nondegeneracy condition is said to hold at x̄ ∈ C if

∇gi (x̄) 6= 0, ∀ i ∈ I(x̄),

where I(x̄) = {i ∈ {1, 2, . . . , m} : gi (x̄) = 0} denotes the active index set at x̄.
The set C is said to satisfy the nondegeneracy condition if it holds for every
x̄ ∈ C.

The Slater constraint qualification along with the nondegeneracy condi-


tion gives the following interesting characterization of a convex set given by
Lasserre [74].

Theorem 8.2 Consider the set C given by (3.1) where gi , i = 1, 2, . . . , m,


are smooth. Assume that the Slater constraint qualification is satisfied, that is,
there exists x̂ ∈ Rn such that gi (x̂) < 0, i = 1, 2, . . . , m, and the nondegeneracy
condition holds for C. Then C is convex if and only if

h∇gi (x), y − xi ≤ 0, ∀ x, y ∈ C, ∀ i ∈ I(x). (8.1)

Proof. Suppose that C is a convex set and consider x̄ ∈ C. Therefore, for any
y ∈ C, for every λ ∈ [0, 1], x̄ + λ(y − x̄) ∈ C, that is,

gi (x̄ + λ(y − x̄)) ≤ 0, ∀ i = 1, 2, . . . , m.

Now for i ∈ I(x̄),


gi (x̄ + λ(y − x̄)) − gi (x̄)
lim ≤ 0,
λ↓0 λ
that is, for every i ∈ I(x̄),

h∇gi (x̄), y − x̄i ≤ 0, ∀ y ∈ C.

Because x̄ ∈ C is arbitrary, the above inequality holds for every x̄ ∈ C, thereby


establishing the desired inequality.
Conversely, suppose that the condition (8.1) holds. Observe that C has an
interior because the Slater constraint qualification holds. Further, (8.1) along
with the nondegeneracy condition of the set C implies that each boundary
point of C has a nontrivial supporting hyperplane. The supporting hyper-
plane is nontrivial due to the non-degeneracy condition on C. Using Proposi-
tion 2.29, C is a convex set. 
Observe that the nondegeneracy condition of the set C is required only in
the sufficiency part of the proof.
Till now in the book, we have mainly dealt with the Slater constraint
qualification and some others, namely the Abadie, pseudonormality, and
FM constraint qualifications. Another well-known constraint qualification for
the convex programming problem (CP ) is the Mangasarian–Fromovitz con-
straint qualification. For (CP ) with C given by (3.1) in the smooth scenario,
Mangasarian–Fromovitz constraint qualification is said to hold at x̄ ∈ Rn if
there exists d ∈ Rn such that

h∇gi (x̄), di < 0, ∀ i ∈ I(x̄).

One may observe that if this constraint qualification is satisfied for x̄ ∈ C,


then ∇gi (x̄) 6= 0 for every i ∈ I(x̄), thereby ensuring that the nondegeneracy
condition holds at x̄. But the converse need not hold, that is, the nondegeneracy
condition need not imply the Mangasarian–Fromovitz constraint qualification.
We verify this claim by the following example. Consider the set C ⊂ R2 given
by

C = {(x1 , x2 ) ∈ R2 : 1 − x1 x2 ≤ 0, x1 + x2 − 2 ≤ 0, x1 ≥ 0}.

Here, g1 (x1 , x2 ) = 1 − x1 x2 , g2 (x1 , x2 ) = x1 + x2 − 2 and g3 (x1 , x2 ) = −x1 .


Note that C = {(1, 1)} and thus trivially is a convex set. At x̄ = (1, 1), the
active index set is I(x̄) = {1, 2} and
   
∇g1 (x̄) = (−1, −1) 6= 0 and ∇g2 (x̄) = (1, 1) 6= 0,

which implies that the nondegeneracy condition is satisfied for C = {x̄}. But
observe that there exists no (d1 , d2 ) ∈ R2 satisfying

−d1 − d2 < 0 and d1 + d2 < 0

simultaneously, thereby not satisfying the Mangasarian–Fromovitz constraint


qualification at x̄.
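
This example is simple enough to check directly; the short Python sketch below (our own
illustration, not from the text) evaluates the active gradients at x̄ = (1, 1) and searches
for a Mangasarian–Fromovitz direction, finding none.

import numpy as np

# g1(x) = 1 - x1*x2, g2(x) = x1 + x2 - 2, g3(x) = -x1; at x_bar = (1, 1) the active set is {1, 2}.
x_bar = np.array([1.0, 1.0])
grad_g1 = np.array([-x_bar[1], -x_bar[0]])   # = (-1, -1), nonzero: nondegeneracy holds
grad_g2 = np.array([1.0, 1.0])               # = ( 1,  1), nonzero

# Mangasarian-Fromovitz needs d with <grad_g1, d> < 0 and <grad_g2, d> < 0; since
# grad_g1 = -grad_g2 these inequalities are incompatible, as the random search confirms.
rng = np.random.default_rng(0)
found = any(grad_g1 @ d < 0 and grad_g2 @ d < 0 for d in rng.normal(size=(10000, 2)))
print(found)   # False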
We end this section by presenting the result from Lasserre [74] establishing
the necessary and sufficient optimality condition for a minimizer of (CP ) over
C with gi , i = 1, 2, . . . , m, nonconvex smooth functions. As one will observe

from the result below, the nondegeneracy condition is required only for the
necessary part at the given point and not the set as mentioned in the statement
of the theorem in Lasserre [74]. Also in the converse part, we require the
necessary part of Theorem 8.2, which is independent of the nondegeneracy
condition.

Theorem 8.3 Consider the problem (CP ) where f is smooth and C is given
by (3.1), where gi , i = 1, 2, . . . , m, are smooth but need not be convex. Assume
that the Slater constraint qualification is satisfied and the nondegeneracy con-
dition holds at x̄ ∈ C. Then x̄ is a point of minimizer of (CP ) if and only if
there exist λi ≥ 0, i = 1, 2, . . . , m, such that
∇f (x̄) + Σ_{i=1}^{m} λi ∇gi (x̄) = 0 and λi gi (x̄) = 0, i = 1, 2, . . . , m.

Proof. Let x̄ be a point of minimizer of f over C. By the Fritz John optimality


conditions, Theorem 5.1, there exist λ0 ≥ 0, λi ≥ 0, i = 1, 2, . . . , m, not all
simultaneously zero such that
λ0 ∇f (x̄) + Σ_{i=1}^{m} λi ∇gi (x̄) = 0 and λi gi (x̄) = 0, i = 1, 2, . . . , m.

Suppose that λ0 = 0, which implies for some i ∈ {1, 2, . . . , m}, λi > 0. There-
fore, the set

I¯ = {i ∈ {1, 2, . . . , m} : λi > 0}

is nonempty. By the complementary slackness condition,


¯
gi (x̄) = 0, ∀ i ∈ I,

which implies I¯ ⊂ I(x̄). By the optimality condition,


Σ_{i∈I¯} λi ∇gi (x̄) = 0,

which implies that

Σ_{i∈I¯} λi h∇gi (x̄), x − x̄i = 0, ∀ x ∈ C.

As the Slater constraint qualification is satisfied, there exists x̂ ∈ Rn such


that gi (x̂) < 0 for every i = 1, 2, . . . , m. As dom gi = Rn , i = 1, 2, . . . , m,
by Theorem 2.69, gi , i = 1, 2, . . . , m, are continuous on Rn . Thus there exists
δ > 0 such that for every x ∈ Bδ (x̂) and every i = 1, 2, . . . , m, gi (x) < 0, that
is, Bδ (x̂) ⊂ int C. By the preceding equality,
Σ_{i∈I¯} λi h∇gi (x̄), x − x̄i = 0, ∀ x ∈ Bδ (x̂). (8.2)

Since I¯ ⊂ I(x̄), the convexity of C along with Theorem 8.2 yields that for
every i ∈ I¯,

h∇gi (x̄), x − x̄i ≤ 0, ∀ x ∈ C,

which along with the condition (8.2) implies that for i ∈ I¯,

h∇gi (x̄), x − x̄i = 0, ∀ x ∈ Bδ (x̂). (8.3)

Because x̂ ∈ Bδ (x̂), the condition (8.3) reduces to

h∇gi (x̄), x̂ − x̄i = 0, ∀ i ∈ I¯. (8.4)

For any d ∈ Rn , consider the vector x̂ + λd such that for λ > 0 sufficiently
small, x̂ + λd ∈ Bδ (x̂). Hence, by the condition (8.3), for each i ∈ I¯,

h∇gi (x̄), x̂ + λd − x̄i = 0,

which implies

h∇gi (x̄), x̂ − x̄i + λh∇gi (x̄), di = 0.

By condition (8.4), for every i ∈ I¯,

h∇gi (x̄), di = 0, ∀ d ∈ Rn .

Hence, ∇gi (x̄) = 0 for every i ∈ I¯ ⊂ I(x̄), thereby contradicting the non-
degeneracy condition at x̄. Thus, λ0 6= 0. Dividing the Fritz John optimality
condition by λ0 , the KKT optimality condition is established at x̄ as

∇f (x̄) + Σ_{i=1}^{m} λ̄i ∇gi (x̄) = 0 and λ̄i gi (x̄) = 0, i = 1, 2, . . . , m,

where λ̄i = λi /λ0 , i = 1, 2, . . . , m.
Conversely, suppose that x̄ satisfies the KKT optimality conditions. As-
sume that x̄ is not a point of minimizer of (CP ). Therefore, there exists x ∈ C
such that f (x) < f (x̄), which along with the convexity of f implies that

0 > f (x) − f (x̄) ≥ h∇f (x̄), x − x̄i.

Therefore, by the KKT optimality conditions,


0 > f (x) − f (x̄) ≥ − Σ_{i=1}^{m} λi h∇gi (x̄), x − x̄i. (8.5)

If λi = 0 for every i = 1, 2, . . . , m, we reach a contradiction. Now assume that


I¯ 6= ∅. By Theorem 8.2, for every i ∈ I¯ ⊂ I(x̄),

h∇gi (x̄), x − x̄i ≤ 0, ∀ x ∈ C,

which implies that


− Σ_{i=1}^{m} λi h∇gi (x̄), x − x̄i = − Σ_{i∈I¯} λi h∇gi (x̄), x − x̄i ≥ 0, ∀ x ∈ C,

thereby contradicting the condition (8.5) and thus leading to the requisite
result, that is, x̄ is a point of minimizer of f over C. 

8.3 Nonsmooth Case


Motivated by the above work of Lasserre [74], Dutta and Lalitha [40] extended
the study to a nonsmooth scenario involving the locally Lipschitz function.
But before we move on with the work done in this respect, we need some tools
for nonsmooth Lipschitz functions.
Consider a locally Lipschitz function φ : Rn → R. The Clarke directional
derivative of φ at x̄ in the direction d ∈ Rn is defined as
φ◦ (x̄, d) = lim sup_{x→x̄, λ↓0} [φ(x + λd) − φ(x)]/λ.
The Clarke directional derivative is a sublinear function of the direction d.
In Section 3.6 we defined the Clarke subdifferential using the Rademacher
Theorem. Here, we express the Clarke subdifferential of φ at x̄ using the
Clarke directional derivative defined above as

∂ ◦ φ(x̄) = {ξ ∈ Rn : φ◦ (x̄, d) ≥ hξ, di, ∀ d ∈ Rn }.

The function φ is said to be regular at x̄ if for every d ∈ Rn , the directional


derivative φ′ (x̄, d) exists and

φ◦ (x̄, d) = φ′ (x̄, d), ∀ d ∈ Rn .

Every convex function is regular.


In the nonsmooth scenario, Dutta and Lalitha [40] considered the con-
vex feasible set C of (CP ) to be defined by inequality constraints involving
nonsmooth locally Lipschitz functions that are regular. For example, consider

φ(x) = max{φ1 (x), φ2 (x), . . . , φm (x)},

where φi : Rn → R, i = 1, 2, . . . , m, are smooth functions. Then φ is a locally


Lipschitz regular function.
Now, similar to the nondegeneracy condition in the smooth case given by
Lasserre [74], Dutta and Lalitha [40] defined the notion for nonsmooth locally
Lipschitz scenario as follows.

Definition 8.4 Consider the set C given by (3.1) where each gi ,


i = 1, 2, . . . , m is a locally Lipschitz function. The set C is said to satisfy
the nondegeneracy condition at x̄ ∈ C if

0 6∈ ∂ ◦ gi (x̄), ∀ i ∈ I(x̄).

If the condition holds for every x̄ ∈ C, the nondegeneracy condition is said to


hold for the set C.

Before moving on to discuss the results obtained in this work, we present


some examples from Dutta and Lalitha [40] to have a look at the above non-
degeneracy condition.
Consider the set

C = {x ∈ R : g0 (x) ≤ 0},

where g0 (x) = max{x3 , x} − 1. Hence C = (−∞, 1]. At the boundary point


x̄ = 1, I(x̄) = {0} where ∂ ◦ g0 (x̄) = [1, 3], thereby satisfying the nonde-
generacy condition. Now if we define the function g0 (x) = max{x3 , x}, then
C = (−∞, 0] with boundary point x̄ = 0 at which ∂ ◦ g0 (x̄) = [0, 1]. Thus, the
nondegeneracy condition is not satisfied at x̄. Observe that in both the cases
g0 is a regular function and the Slater constraint qualification is also satisfied.
Yet in the second scenario the nondegeneracy condition is not satisfied.
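
The two intervals quoted for ∂ ◦ g0 can be recovered numerically from the gradient-limit
description of the Clarke subdifferential recalled in Section 7.2. The Python sketch below
is our own illustration (not from the text): it samples derivatives of max{x3 , x} at nearby
points of differentiability and reports their range (subtracting the constant 1 does not
change the derivatives).

import numpy as np

def deriv_max_cube_lin(x):
    # derivative of max(x**3, x) at a point where x**3 != x
    return 3.0 * x**2 if x**3 > x else 1.0

def approx_clarke(x_bar, radius=1e-3, n=2001):
    pts = x_bar + np.linspace(-radius, radius, n)
    pts = pts[np.abs(pts**3 - pts) > 1e-12]          # keep points of differentiability
    grads = np.array([deriv_max_cube_lin(x) for x in pts])
    return grads.min(), grads.max()                  # the interval spanned by nearby gradients

print(approx_clarke(1.0))   # roughly (1.0, 3.0): matches the interval [1, 3] quoted above
print(approx_clarke(0.0))   # roughly (0.0, 1.0): matches [0, 1]; since 0 lies in it,
                            # the nondegeneracy condition fails at x_bar = 0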
But if the functions gi , i = 1, 2, . . . , m, involved are convex and the Slater
constraint qualification holds for (CP ), then the Mangasarian–Fromovitz con-
straint qualification for the nonsmooth case is satisfied at x̄ ∈ C, that is there
exists d ∈ Rn such that

gi′ (x̄, d) < 0, ∀ i ∈ I(x̄).

As the directional derivative is a support function to the subdifferential set,


Theorem 2.79, the above condition is equivalent to

hξi , di < 0, ∀ ξi ∈ ∂gi (x̄), ∀ i ∈ I(x̄),

from which it is obvious that the nondegeneracy condition is ensured for the
convex nonsmooth scenario.
Next we present the equivalent characterization of the convex set C under
the nonsmooth scenario.

Theorem 8.5 Consider the set C be given by (3.1) represented by nonsmooth


locally Lipschitz inequality constraints where gi , i = 1, 2, . . . , m, are regular.
Assume that the Slater constraint qualification holds and satisfies the nonde-
generacy condition. Then C is convex if and only if

gi◦ (x, y − x) ≤ 0, ∀ x, y ∈ C, ∀ i ∈ I(x). (8.6)

Proof. Consider the convex set C. Working along the lines of Theorem 8.2,
for arbitrary but fixed x̄ ∈ C, for λ ∈ (0, 1),

[gi (x̄ + λ(y − x̄)) − gi (x̄)]/λ ≤ 0, ∀ i ∈ I(x̄).
As the functions gi , i = 1, 2, . . . , m, are locally Lipschitz regular functions,

gi◦ (x̄, y − x̄) = gi′ (x̄, y − x̄) ≤ 0, ∀ y ∈ C, ∀ i ∈ I(x̄),

thus leading to the requisite result.


Conversely, suppose that (8.6) holds. As the Slater constraint qualification
holds, the set C has an interior. Now consider any boundary point x ∈ C. By
the condition (8.6) along with the fact that the Clarke directional derivative
is the support function of the Clarke subdifferential, then for every y ∈ C,

hξi , y − xi ≤ gi◦ (x, y − x) ≤ 0, ∀ ξi ∈ ∂ ◦ gi (x), ∀ i ∈ I(x).

As the nondegeneracy condition is satisfied, ξi 6= 0 for every ξi ∈ ∂ ◦ gi (x) and


every i ∈ I(x), which implies that there is a nontrivial supporting hyperplane
to C at x. Hence, by Proposition 2.29, C is a convex set, as desired. 
Now we present the theorem establishing the necessary and sufficient op-
timality conditions for the class of problem (CP ) dealt with in this section.

Theorem 8.6 Consider the problem (CP ) with C is given by (3.1), where
gi , i = 1, 2, . . . , m, are locally Lipschitz regular functions. Assume that the
Slater constraint qualification holds and the nondegeneracy condition is sat-
isfied at x̄ ∈ C. Then x̄ is a point of minimizer of (CP ) if and only if there
exist λi ≥ 0, i = 1, 2, . . . , m, such that
0 ∈ ∂f(x̄) + Σ_{i=1}^m λi ∂◦gi(x̄) and λi gi(x̄) = 0, i = 1, . . . , m.

Proof. Suppose that x̄ is a point of minimizer of f over C. We know by


Theorem 2.72 that a convex function f is locally Lipschitz. Then by the op-
timality conditions for locally Lipschitz functions at x̄, there exist λi ≥ 0,
i = 0, 1, . . . , m, not all simultaneously zero, such that
0 ∈ λ0 ∂◦f(x̄) + Σ_{i=1}^m λi ∂◦gi(x̄) and λi gi(x̄) = 0, i = 1, 2, . . . , m.

Because f is convex, ∂ ◦ f (x̄) = ∂f (x̄). Therefore, the optimality condition can


be rewritten as
0 ∈ λ0 ∂f(x̄) + Σ_{i=1}^m λi ∂◦gi(x̄).


We claim that λ0 ≠ 0. On the contrary, suppose that λ0 = 0. As λi,
i = 0, 1, . . . , m, are not all zero, the set Ī = {i ∈ {1, 2, . . . , m} : λi > 0}
is nonempty. Then the above optimality condition reduces to

0 ∈ Σ_{i∈Ī} λi ∂◦gi(x̄),

which implies there exist ξi ∈ ∂◦gi(x̄), i ∈ Ī, such that

0 = Σ_{i∈Ī} λi ξi.

From the definition of the Clarke subdifferential, the above condition leads to
Σ_{i∈Ī} λi gi◦(x̄, d) ≥ Σ_{i∈Ī} λi ⟨ξi, d⟩ = 0, ∀ d ∈ Rn. (8.7)

As the Slater constraint qualification is satisfied, there exists x̂ ∈ Rn such
that gi(x̂) < 0 for every i = 1, . . . , m. Also, gi, i = 1, 2, . . . , m, are locally
Lipschitz and hence continuous. Thus there exists δ > 0 such that for every
x ∈ Bδ(x̂), gi(x) < 0, i = 1, . . . , m. In condition (8.7), in particular, taking
d = x − x̄ where x ∈ Bδ(x̂) ⊂ C,

Σ_{i∈Ī} λi gi◦(x̄, x − x̄) ≥ 0, ∀ x ∈ Bδ(x̂). (8.8)

By the complementary slackness condition, gi(x̄) = 0 for every i ∈ Ī, that is,
Ī ⊂ I(x̄). Therefore, by Theorem 8.5, as C is a convex set, we have

gi◦(x̄, x − x̄) ≤ 0, ∀ x ∈ Bδ(x̂), ∀ i ∈ Ī,

which along with the condition (8.8) implies that for every i ∈ Ī,

gi◦(x̄, x − x̄) = 0, ∀ x ∈ Bδ(x̂). (8.9)

In particular, for x̂ ∈ Bδ(x̂), the above condition reduces to

gi◦(x̄, x̂ − x̄) = 0, ∀ i ∈ Ī. (8.10)

Consider any v ∈ Rn and choose λ > 0 sufficiently small such that
x̂ + λv ∈ Bδ(x̂). Hence, from the condition (8.9), for every i ∈ Ī,

gi◦(x̄, x̂ + λv − x̄) = 0, ∀ v ∈ Rn.

As the Clarke generalized directional derivative is sublinear in the direction,
for every i ∈ Ī the above condition becomes

gi◦(x̄, x̂ − x̄) + λ gi◦(x̄, v) ≥ 0, ∀ v ∈ Rn,


which by (8.10) leads to

gi◦(x̄, v) ≥ 0, ∀ v ∈ Rn, ∀ i ∈ Ī.

From the definition of the Clarke subdifferential, 0 ∈ ∂◦gi(x̄) for every i ∈ Ī,
thereby contradicting the nondegeneracy condition. Therefore λ0 ≠ 0, and
dividing the optimality condition throughout by λ0 reduces it to

0 ∈ ∂f(x̄) + Σ_{i=1}^m λ̄i ∂◦gi(x̄) and λ̄i gi(x̄) = 0, i = 1, 2, . . . , m,

where λ̄i = λi/λ0, i = 1, 2, . . . , m, leading to the requisite result.
Conversely, suppose that the conditions hold at x̄. On the contrary, assume
that x̄ is not a point of minimizer of f over C. Thus, there exists x ∈ C such
that f (x) < f (x̄), which along with the convexity of f ,

0 > f (x) − f (x̄) ≥ hξ, x − x̄i, ∀ ξ ∈ ∂f (x̄). (8.11)

Using the optimality conditions at x̄, there exist ξ0 ∈ ∂f(x̄) and ξi ∈ ∂◦gi(x̄),
i = 1, 2, . . . , m, such that

0 = ξ0 + Σ_{i=1}^m λi ξi.

The above condition along with (8.11) leads to

0 > −Σ_{i=1}^m λi ⟨ξi, x − x̄⟩,

which by the definition of the Clarke subdifferential along with Theorem 8.5 yields

0 > −Σ_{i=1}^m λi gi◦(x̄, x − x̄) ≥ 0,

thereby leading to a contradiction. Therefore, x̄ is a point of minimizer of


(CP ). 
We end this chapter with an example from Dutta and Lalitha [40] to
illustrate that in the absence of the nondegeneracy condition, even though the
Slater constraint qualification and the regularity of the constraint functions
hold, the KKT optimality condition need not be satisfied.
Consider the problem

min f (x) subject to g1 (x) ≤ 0, g2 (x) ≤ 0


where

f(x) = −x, g1(x) = x³, and g2(x) = −x − 1 for x ≤ 0, g2(x) = −1 for x > 0.

Then the feasible set is C = [−1, 0] and the point of minimizer is x̄ = 0.


Also, C does not satisfy the nondegeneracy condition but the Slater constraint
qualification holds along with the constraint functions being regular. Observe
that ∂f (x̄) = {−1}, ∂ ◦ g1 (x̄) = {0} and ∂ ◦ g2 (x̄) = ∂g2 (x̄) = [−1, 0], and thus
the KKT optimality conditions are not satisfied. Now if in the above example
one takes the objective function to be f (x) = x, then the point of minimizer is
x̄ = −1 at which ∂f (x̄) = {1}, ∂ ◦ g1 (x̄) = {3}, and ∂ ◦ g2 (x̄) = ∂g2 (x̄) = {−1}.
Observe that the KKT optimality conditions hold with λ1 = 0 and λ2 = 1.
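The failure and success of the KKT inclusion in this example can also be checked numerically. The sketch below is illustrative only; the interval representation of the subdifferentials and the multiplier grid are our own devices, not part of the original development.

```python
# Illustrative sketch (not from the text): grid check of the KKT inclusion
# 0 in df + l1*dg1 + l2*dg2, with each subdifferential written as an interval.
import numpy as np

def contains_zero(df, dg1, dg2, lams):
    # df, dg1, dg2 are (lo, hi) intervals; since the multipliers are nonnegative,
    # the endpoints of the Minkowski sum are obtained by adding endpoints.
    for l1 in lams:
        for l2 in lams:
            lo = df[0] + l1 * dg1[0] + l2 * dg2[0]
            hi = df[1] + l1 * dg1[1] + l2 * dg2[1]
            if lo <= 0.0 <= hi:
                return True, (l1, l2)
    return False, None

lams = np.linspace(0.0, 10.0, 201)
# x_bar = 0 with f(x) = -x: the inclusion fails for every multiplier pair on the grid
print(contains_zero((-1, -1), (0, 0), (-1, 0), lams))
# x_bar = -1 with f(x) = x: the inclusion holds, e.g. with l1 = 0, l2 = 1
print(contains_zero((1, 1), (3, 3), (-1, -1), lams))
```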



Chapter 9
Weak Sharp Minima in Convex
Optimization

9.1 Introduction
In the preceding chapters we studied the necessary and sufficient optimality
conditions for x̄ ∈ Rn to be a point of minimizer for the convex optimization
problem wherein a convex objective function f is minimized over a convex
feasible set C ⊂ Rn . From Theorem 2.90, if the objective function f is strictly
convex, then the point of minimizer x̄ is unique. The notion of unique mini-
mizer was extended to the concept of sharp minimum or, equivalently, strongly
unique local minimum. The ideas of sharp minimizer and strongly unique min-
imizer were introduced by Polyak [94, 95] and Cromme [29]. These notions
played an important role in the approximation theory or the study of pertur-
bation in optimization problems and also in the analysis of the convergence
of algorithms [1, 26, 56]. Below we define the notion of sharp minimum.

Definition 9.1 A function φ : Rn → R̄ defined over a set F ⊂ Rn is said to
have a sharp minimum at x̄ ∈ F if there exists α > 0 such that

φ(x) − φ(x̄) ≥ α kx − x̄k, ∀ x ∈ F.

From the above definition it is obvious that a sharp minimizer
is unique. This is one of the major drawbacks of the concept of sharp mini-
mum, as it rules out even the most basic optimization problem, namely the linear
programming problem. To overcome this difficulty, the notion of weak sharp
minimum was introduced by Ferris [46]. We study this notion for the convex
optimization problem

min f (x) subject to x ∈ C, (CP )

where f : Rn → R is a convex function and C ⊂ Rn is a closed convex set.


9.2 Weak Sharp Minima and Optimality


We begin this section by defining the weak sharp minimum for the convex
optimization problem (CP ) from Ferris [46].
Definition 9.2 Let S ⊂ Rn denote the nonempty solution set of (CP ). Then
S is said to be the set of weak sharp minimizers on C if there exists α > 0
such that

f(x) − f(projS(x)) ≥ α ‖x − projS(x)‖, ∀ x ∈ C.

Observe that for any x ∈ C, projS(x) ∈ S, and as S is the solution set, f is
constant on S. Equivalently, S is the set of weak sharp minimizers if there
exists α > 0 such that

f(x) − f(x̄) ≥ α dS(x), ∀ x ∈ C, ∀ x̄ ∈ S.

This equivalent definition was given in Burke and Ferris [25].
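As a simple illustration of Definition 9.2 (a sketch of our own, not taken from the references above), consider f(x) = max{x, 0} on C = [−1, 2]. The solution set S = [−1, 0] is not a singleton, so no sharp minimum exists, yet the weak sharp minimum inequality holds with modulus α = 1, as the grid check below suggests.

```python
# Illustrative sketch (not from the text): f(x) = max(x, 0) on C = [-1, 2] has
# solution set S = [-1, 0]; the weak sharp minimum inequality holds with alpha = 1
# even though the minimizer is not unique.
import numpy as np

f = lambda x: max(x, 0.0)
proj_S = lambda x: min(max(x, -1.0), 0.0)   # projection onto S = [-1, 0]

alpha = 1.0
xs = np.linspace(-1.0, 2.0, 301)
ok = all(f(x) - f(proj_S(x)) >= alpha * abs(x - proj_S(x)) - 1e-12 for x in xs)
print(ok)   # True: S is a set of weak sharp minimizers with modulus 1
```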
Before moving on with the results on equivalent conditions for weak
sharp minimizers, we present some results from Aubin and Ekeland [4], Luc-
chetti [79], Luenberger [80], and Rockafellar [97], which act as a tool in proving
the equivalence.
Proposition 9.3 Consider a nonempty closed convex set F ⊂ Rn.

(i) For every x ∈ F,

NF(x) = {v ∈ Rn : ⟨v, x⟩ = σF(v)}.

(ii) For every y ∈ Rn,

dF(y) = max_{v∈cl B} (⟨v, y⟩ − σF(v)).

(iii) If F is a closed convex cone, then for every y ∈ Rn,

dF(y) = σ_{cl B∩F◦}(y).

(iv) For every y ∈ Rn,

dF(y) = sup_{x∈F} d_{x+TF(x)}(y).

(v) For every x ∈ F, the subdifferential of the distance function dF is

∂dF(x) = cl B ∩ NF(x)

and the directional derivative is

d′F(x, v) = d_{TF(x)}(v) = σ_{cl B∩NF(x)}(v), ∀ v ∈ Rn.


Proof. (i) From Definition 2.36 of normal cone,

NF (x̄) = {v ∈ Rn : hv, x − x̄i ≤ 0, ∀ x ∈ F }.

Observe that any v ∈ NF (x̄) along with the fact that x̄ ∈ F satisfies the
inequality

hv, x̄i ≤ σF (v) ≤ hv, x̄i,

that is, σF (v) ≤ hv, x̄i. Thus

NF (x̄) = {v ∈ Rn : hv, x̄i = σF (v)}.

(ii) By the definition of the distance function,

dF(y) = inf_{x∈F} ‖y − x‖ = inf_{x∈F} sup_{v∈cl B} ⟨v, y − x⟩
      = sup_{v∈cl B} {⟨v, y⟩ + inf_{x∈F}(−⟨v, x⟩)}
      = sup_{v∈cl B} {⟨v, y⟩ − σF(v)}.

(iii) For a closed convex cone F, by Definition 2.30 of polar cone,

F◦ = {v ∈ Rn : ⟨v, x⟩ ≤ 0, ∀ x ∈ F}.

Therefore,

σF(v) = 0 if v ∈ F◦ and σF(v) = +∞ otherwise. (9.1)

Combining (ii) with the above relation (9.1) yields

dF(y) = sup_{v∈cl B, v∈F◦} ⟨v, y⟩,

which implies

dF(y) = sup_{v∈cl B∩F◦} ⟨v, y⟩ = σ_{cl B∩F◦}(y),

as desired.
(iv) By Theorem 2.35, TF(x) is a closed convex cone, and hence x + TF(x) is
a closed convex set. Invoking (iii) along with Proposition 2.37 leads to

d_{x+TF(x)}(y) = d_{TF(x)}(y − x) = σ_{cl B∩NF(x)}(y − x).

Therefore,

sup_{x∈F} d_{x+TF(x)}(y) = sup_{x∈F} sup_{v∈cl B∩NF(x)} ⟨v, y − x⟩.


By (i) and (ii), the above condition reduces to

sup_{x∈F} d_{x+TF(x)}(y) = sup_{v∈cl B} {⟨v, y⟩ − σF(v)} = dF(y),

thereby establishing the result.


(v) As an example of inf-convolution, Definition 2.54,

dF(x) = (‖.‖ □ δF)(x),

which is exact at every x ∈ Rn. Invoking the subdifferential inf-convolution
rule at a point where the inf-convolution is exact, Theorem 2.98,

∂dF(x) = ∂‖.‖(y) ∩ ∂δF(x − y).

For x ∈ int F, taking y = 0,

∂‖.‖(0) = cl B while ∂δF(x) = NF(x) = {0}.

Thus, ∂dF(x) = {0} for x ∈ int F. For x ∈ bdry F, again taking y = 0,

∂‖.‖(0) = cl B while ∂δF(x) = NF(x),

and hence ∂dF(x) = cl B ∩ NF(x). Therefore,

∂dF(x) = cl B ∩ NF(x), ∀ x ∈ F. (9.2)

As dom dF = Rn, by Theorem 2.79 and the condition (9.2),

d′F(x, v) = σ_{∂dF(x)}(v) = σ_{cl B∩NF(x)}(v),

which by (iii) implies that

d′F(x, v) = d_{TF(x)}(v)

and hence the result. □
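Proposition 9.3 (ii) can be verified numerically on a simple set. In the sketch below (illustrative only, not part of the original development), F is the unit box [0, 1]² in R², whose support function is σF(v) = Σi max{0, vi}; for a point y outside F the supremum in (ii) is attained at a unit vector, so sampling directions on the unit circle suffices.

```python
# Illustrative numerical check (not from the text) of Proposition 9.3(ii) for the
# box F = [0,1]^2: d_F(y) = max over the closed unit ball of <v, y> - sigma_F(v).
import numpy as np

def sigma_box(v):                       # support function of the box [0,1]^2
    return np.maximum(v, 0.0).sum()

y = np.array([2.0, -1.0])
d_direct = np.linalg.norm(y - np.clip(y, 0.0, 1.0))   # distance via projection

angles = np.linspace(0.0, 2 * np.pi, 4000)
vs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit directions
d_dual = max(v @ y - sigma_box(v) for v in vs)           # max attained with ||v|| = 1 here

print(d_direct, d_dual)   # both approximately sqrt(2)
```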


As we know, the convex optimization problem can be equivalently ex-
pressed as the unconstrained problem
min f0 (x) subject to x ∈ Rn , (CPu )
where f0 (x) = f (x) + δC (x) is an lsc proper convex function. As (CP ) and
(CPu ) are equivalent, the solution sets of both problems coincide, which implies
that S is also the set of weak sharp minimizers of (CPu ). Before moving on
to prove the main result on the characterization of the weak sharp minimizer,
we present the results in terms of the objective function f0 of (CPu ).

Lemma 9.4 Consider the unconstrained convex optimization problem (CPu )


and the set of weak sharp minimizers S. Let α > 0. Then the following are
equivalent:


(i) α cl B ∩ NS(x) ⊂ ∂f0(x) for every x ∈ S,

(ii) ∪_{x∈S} (α cl B ∩ NS(x)) ⊂ ∪_{x∈S} ∂f0(x).

Proof. It is easy to observe that (i) implies (ii). Conversely, suppose that (ii)
holds. Consider x̄ ∈ S with ξ ∈ α cl B ∩ NS (x̄). As (ii) is satisfied, there exists
ȳ ∈ S such that ξ ∈ ∂f0 (ȳ). By Definition 2.77 of subdifferential,

f0 (x) − f0 (ȳ) ≥ hξ, x − ȳi, ∀ x ∈ Rn . (9.3)

In particular, for any x ∈ S, f0 (x) = f0 (ȳ), thereby reducing the above in-
equality to

hξ, x − ȳi ≤ 0, ∀ x ∈ S,

which implies ξ ∈ NS (ȳ). By assumption, ξ ∈ NS (x̄). Thus, by Proposi-


tion 9.3 (i),
hξ, x̄i = σS (ξ) = hξ, ȳi. (9.4)
As x̄ ∈ S, f0 (x̄) = f0 (ȳ), which along the conditions (9.3) and (9.4) leads to

f0 (x) − f0 (x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn ,

thereby implying that ξ ∈ ∂f0 (x̄). Because x̄ ∈ S was arbitrary, (i) holds. 
The above result was from Burke and Ferris [25]. The next result from
Burke and Deng [22] provides a characterization for weak sharp minimizer in
terms of f0 .

Theorem 9.5 Consider the convex optimization problem (CP ) and its equiv-
alent unconstrained problem (CPu ). Let α > 0. Then S is the set of weak sharp
minimizers with modulus α if and only if

f0′ (x̄, v) ≥ α dTS (x̄) (v), ∀ x̄ ∈ S, ∀ v ∈ Rn . (9.5)

Proof. Suppose that S is the set of weak sharp minimizers with modulus
α > 0. Consider x̄ ∈ S. Therefore, by Definition 9.2,

f (x) − f (x̄) ≥ α dS (x), ∀ x ∈ C.

As x̄ ∈ S ⊂ C, f0 (x̄) = f (x̄). Also for x ∈ C, f0 (x) = f (x). Therefore, the


above inequality leads to

f0 (x) − f0 (x̄) ≥ α dS (x), ∀ x ∈ C. (9.6)

For x ∉ C, f0(x) = +∞. Thus,

f0(x) − f0(x̄) ≥ α dS(x), ∀ x ∉ C (9.7)


trivially. Combining (9.6) and (9.7) yields

f0 (x) − f0 (x̄) ≥ α dS (x), ∀ x ∈ Rn .

In particular, taking x = x̄ + λv ∈ Rn for λ > 0 and v ∈ Rn in the above


condition leads to

f0 (x̄ + λv) − f0 (x̄) ≥ α dS (x̄ + λv), ∀ λ > 0, ∀ v ∈ Rn ,

which implies that for every λ > 0,

[f0(x̄ + λv) − f0(x̄)]/λ ≥ α dS(x̄ + λv)/λ, ∀ v ∈ Rn. (9.8)
Observe that

dS(x̄ + λv) = inf_{x∈S} ‖x̄ + λv − x‖ = λ inf_{x∈S} ‖v − (x − x̄)/λ‖,

which by Definition 2.33 of tangent cone implies that

dS(x̄ + λv)/λ ≥ inf_{y∈TS(x̄)} ‖v − y‖ = d_{TS(x̄)}(v). (9.9)

Therefore, using (9.8) along with (9.9) leads to

[f0(x̄ + λv) − f0(x̄)]/λ ≥ α d_{TS(x̄)}(v), ∀ v ∈ Rn.

Taking the limit as λ → 0 in the above inequality reduces it to

f0′(x̄, v) ≥ α d_{TS(x̄)}(v), ∀ v ∈ Rn.

Because x̄ ∈ S was arbitrary, the above condition yields (9.5).


Conversely, suppose that the relation (9.5) is satisfied. Consider x ∈ C and
x̄ ∈ S. Therefore,

f0 (x) − f0 (x̄) ≥ f0′ (x̄, x − x̄) ≥ α dTS (x̄) (x − x̄) = α dx̄+TS (x̄) (x).

By Proposition 9.3 (iv), the above inequality leads to

f0(x) − f0(x̄) ≥ α sup_{x̄∈S} d_{x̄+TS(x̄)}(x) = α dS(x).

Because x ∈ C and x̄ ∈ S were arbitrary, the above condition holds for every
x ∈ C and every x̄ ∈ S, and hence S is the set of weak sharp minimizers. 
We end this chapter by giving equivalent characterizations for the set of
weak sharp minimizers, S, for (CP ) from Burke and Deng [22].


Theorem 9.6 Consider the convex optimization problem (CP ) and its equiv-
alent unconstrained problem (CPu ). Let α > 0. Then the following statements
are equivalent:
(i) S is the set of weak sharp minimizers for (CP ) with modulus α > 0.
(ii) For every x̄ ∈ S and v ∈ TC (x̄),

f ′ (x̄, v) ≥ α dTS (x̄) (v).

(iii) For every x̄ ∈ S,

α cl B ∩ NS (x̄) ⊂ ∂f0 (x̄).

(iv) The inclusion

∪_{x̄∈S} (α cl B ∩ NS(x̄)) ⊂ ∪_{x̄∈S} ∂f0(x̄)

holds.
(v) For every x̄ ∈ S and v ∈ TC (x̄) ∩ NS (x̄),

f ′ (x̄, v) ≥ α kvk.

(vi) For every x̄ ∈ S,

α B ⊂ ∂f (x̄) + (TC (x̄) ∩ NS (x̄))◦ .

(vii) For every x ∈ C,

f ′ (x̄, x − x̄) ≥ α dS (x),

where x̄ ∈ projS (x).

Proof. [(i) =⇒ (ii)] Because S is the set of weak sharp minimizers, by Theo-
rem 9.5,
f0′ (x, v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn . (9.10)
The above condition holds in particular for v ∈ TC (x). As f0 (x) = f (x) + δC (x),
which along with the fact that f0′ (x, v) = f ′ (x, v) for every x ∈ S and
v ∈ TC (x), and condition (9.10) yields

f ′ (x, v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ TC (x),

thereby establishing (ii).


[(ii) =⇒ (iii)] As dom f = Rn , by Theorem 2.79 and the relation (ii),

σ∂f (x) (v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ TC (x).


By Theorem 2.35, TC (x) is a closed convex cone. Invoking Proposition 2.61 (v)
along with Proposition 2.37 yields

σ∂f (x)+NC (x) (v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn .

By the fact that NC (x) = ∂δC (x) and from the Sum Rule, Theorem 2.91,
∂f (x) + NC (x) ⊂ ∂(f + δC )(x) = ∂f0 (x), which is always true along with
Proposition 2.61 (i), the above inequality yields

σ∂f0 (x) (v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn . (9.11)

By Proposition 9.3 (v), for any x ∈ S and v ∈ Rn,

α d_{TS(x)}(v) = α σ_{cl B∩NS(x)}(v) = α sup_{v*∈cl B∩NS(x)} ⟨v*, v⟩.

As α > 0, the above condition becomes

α d_{TS(x)}(v) = sup_{v*∈cl B∩NS(x)} ⟨α v*, v⟩ = sup_{α v*∈α cl B∩NS(x)} ⟨α v*, v⟩ = σ_{α cl B∩NS(x)}(v). (9.12)

Substituting the above relation in the inequality (9.11) leads to

σ∂f0 (x) (v) ≥ σα cl B∩NS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn . (9.13)

By Proposition 2.82, ∂f0 (x) is a closed convex set which along with Proposi-
tion 2.61 (iv) and (ii) implies that

α cl B ∩ NS (x) ⊂ ∂f0 (x), ∀ x ∈ S,

thereby proving (iii).


[(iii) =⇒ (i)] By Proposition 2.61 (i), relation (9.13) holds which along with
(9.12) implies that

σ∂f0 (x) (v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn .

By Theorem 2.79, the above inequality leads to

f0′ (x, v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn ,

that is, (9.5) is satisfied. Therefore by Theorem 9.5, (i) holds.


[(iii) ⇐⇒ (iv)] This holds by Lemma 9.4.
[(v) =⇒ (vi)] Because dom f = Rn, by Theorem 2.79, the relation (v) becomes

σ_{∂f(x)}(v) ≥ α sup_{v*∈cl B} ⟨v*, v⟩, ∀ x ∈ S, ∀ v ∈ TC(x) ∩ NS(x).


As α > 0, for every x ∈ S and every v ∈ TC(x) ∩ NS(x), the above inequality
is equivalent to

σ_{∂f(x)}(v) ≥ sup_{α v*∈α cl B} ⟨α v*, v⟩ = σ_{α cl B}(v).

Because TC(x) ∩ NS(x) is a closed convex cone, by Proposition 2.61 (v), the
above condition yields that for every x ∈ S,

α cl B ⊂ cl {∂f(x) + (TC(x) ∩ NS(x))◦}.

Invoking Proposition 2.15,

α B = int (α cl B) ⊂ int {∂f(x) + (TC(x) ∩ NS(x))◦} ⊂ ∂f(x) + (TC(x) ∩ NS(x))◦, ∀ x ∈ S,

thereby leading to (vi).


[(vi) =⇒ (v)] Applying Proposition 2.61 (v) to condition (vi) leads to

σ∂f (x) (v) ≥ σα B (v), ∀ x ∈ S, ∀ v ∈ TC (x) ∩ NS (x).

As dom f = Rn , by Theorem 2.79, the above inequality leads to

f ′ (x, v) ≥ α kvk, ∀ x ∈ S, ∀ v ∈ TC (x) ∩ NS (x),

thereby establishing (v).


[(ii) =⇒ (v)] By Proposition 9.3 (v), for every x ∈ S,

d_{TS(x)}(v) = σ_{cl B∩NS(x)}(v).

For every v ∈ NS(x), this gives

d_{TS(x)}(v) = ‖v‖. (9.14)
Therefore, for every x ∈ S and every v ∈ TC (x)∩NS (x), the relation (ii) along
with (9.14) leads to

f ′ (x, v) ≥ α kvk,

thereby deriving (v).


[(v) =⇒ (vii)] Consider x ∈ C and let x̄ ∈ projS (x). By Theorem 2.35,
x − x̄ ∈ TC (x̄). As x̄ ∈ projS (x), by Proposition 2.52,

hx − x̄, ȳ − x̄i ≤ 0, ∀ ȳ ∈ S,

which by Definition 2.36 of normal cone, x − x̄ ∈ NS (x̄). Therefore,

x − x̄ ∈ TC (x̄) ∩ NS (x̄).


FIGURE 9.1: Pictorial representation of Theorem 9.6 (diagram of implications among the conditions (i)–(vii)).

Now by relation (v),

f ′ (x̄, x − x̄) ≥ α kx − x̄k.

As x̄ ∈ projS (x), dS (x) = kx − x̄k. Thus the above inequality becomes

f ′ (x̄, x − x̄) ≥ α dS (x).

Because x ∈ C and x̄ ∈ projS (x) were arbitrary, the inequality holds for every
x ∈ C and x̄ ∈ projS (x), thereby yielding the relation (vii).
[(vii) =⇒ (i)] As dom f = Rn, Theorem 2.79 along with Definition 2.77 of
subdifferential and the relation (vii) leads to

f(x) − f(x̄) ≥ f′(x̄, x − x̄) ≥ α dS(x), ∀ x ∈ C, (9.15)

with x̄ ∈ projS(x). Moreover, for any ȳ ∈ S with ȳ ≠ x̄, f(ȳ) = f(x̄). Thus,
(9.15) holds for every x ∈ C and every x̄ ∈ S, thereby leading to (i). □
Figure 9.1 presents the pictorial representation of Theorem 9.6. We have
devoted this chapter only to the theoretical aspect of weak sharp minimizers,
though as mentioned in the beginning this notion plays an important role
from the algorithmic point of view. For readers interested in its computational
aspects, one may refer to Burke and Deng [23, 24] and Ferris [46].

Chapter 10
Approximate Optimality Conditions

10.1 Introduction
We have discussed the various aspects of studying optimality conditions for
the convex programming problem (CP ). Throughout, we concentrated on es-
tablishing the standard or the sequential optimality conditions at the exact
point of minima. But it may not always be possible to find the point of min-
imizer. There may be cases where the infimum exists but is not attainable.
For instance, consider

min e^x subject to x ∈ R.

As we know, the infimum for the above problem is zero but it is not attained
over the whole real line. Thus in such scenarios we try to approximate the solution.
In this example, for a given ε > 0, one can always find x̄ ∈ R such that e^x̄ < ε.
This leads to the notion of approximate solutions, which play a crucial role in
algorithmic study of optimization problems. Recall the convex optimization
problem
min f (x) subject to x ∈ C, (CP )
where f : Rn → R is a convex function and C is a convex subset of Rn .

Definition 10.1 Let ε ≥ 0 be given. Then x̄ ∈ C is said to be an ε-solution
of (CP ), or an approximate solution up to ε for (CP ), if

f (x̄) ≤ f (x) + ε, ∀ x ∈ C.
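For the introductory example min e^x over R, Definition 10.1 reads e^x̄ ≤ 0 + ε, so every x̄ ≤ ln ε is an ε-solution even though the infimum is not attained; the following lines (a trivial illustrative check, not part of the original development) confirm this.

```python
# Illustrative sketch (not from the text): for min e^x over R the infimum 0 is not
# attained, but any x_bar with exp(x_bar) <= eps, i.e. x_bar <= log(eps), is an
# eps-solution in the sense of Definition 10.1.
import math

def is_eps_solution(x_bar, eps, inf_value=0.0):
    return math.exp(x_bar) <= inf_value + eps

eps = 1e-3
x_bar = math.log(eps) - 1.0          # comfortably below log(eps)
print(is_eps_solution(x_bar, eps))   # True
```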

This is not the only way to study approximate solutions. In the literature,
one finds the notions of various approximate solutions introduced over the
years, such as quasi ε-solution, regular ε-solution, almost ε-solution [76], to
name a few. We will define these solution concepts before moving on to study
the approximate optimality conditions. The classes of quasi ε-solution and
regular ε-solution are motivated by Ekeland’s variational principle stated in
Chapter 2.


Definition 10.2 Let ε ≥ 0 be given. Then x̄ ∈ C is said to be a quasi ε-solution
of (CP ) if

f(x̄) ≤ f(x) + √ε ‖x − x̄‖, ∀ x ∈ C.
A point x̄ ∈ C, which is an ε-solution as well as a quasi ε-solution of (CP ), is
known as the regular ε-solution of (CP ).
The class of almost ε-solution, as the name itself suggests, seems to be
an approximation to the ε-solution. Actually, it is the approximate solution
concept associated with the perturbed problem. Before defining the almost
ε-solution, recall the feasible set C given by (3.1), that is,
C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},
where gi : Rn → R, i = 1, 2, . . . , m, are convex functions.
Definition 10.3 Let ε ≥ 0 be given. The ε-feasible set of (CP ) with the
feasible set C given by (3.1) is defined as
Cε = {x ∈ Rn : gi (x) ≤ ε, i = 1, 2, . . . , m}.
Then x̄ ∈ Rn is said to be an almost ε-solution of (CP ) if
x̄ ∈ Cε and f (x̄) ≤ f (x) + ε, ∀ x ∈ C.
Observe that here the almost ε-solution need not be from the actual feasible
set but should belong to the perturbed feasible set that is ε-feasible set.
Now we move on to discuss the approximate optimality conditions for the
various classes of approximate solutions. In this chapter we concentrate on the
ε-solution, quasi ε-solution, and almost ε-solution. We begin with the study
of ε-solutions.

10.2 ε-Subdifferential Approach


Consider the unconstrained convex programming problem (CPu )

min f(x) subject to x ∈ Rn. (CPu )

If x̄ ∈ Rn is an ε-solution, then by Definition 10.1,

f(x) − f(x̄) ≥ −ε, ∀ x ∈ Rn.
Using the definition of ε-subdifferential, Definition 2.109, 0 ∈ ∂ε f (x̄). The con-
verse can be established by directly applying the definition of ε-subdifferential.
This has been stated as a result characterizing the ε-solution in Theorem 2.121
as follows.


Theorem 10.4 Consider the unconstrained problem (CPu ). Then x̄ ∈ Rn is


an ε-solution of (CPu ) if and only if 0 ∈ ∂ε f (x̄).
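To see Theorem 10.4 at work on a concrete function, take f(x) = x² on R; a direct computation from the definition of the ε-subdifferential shows ∂εf(x̄) = [2x̄ − 2√ε, 2x̄ + 2√ε], so 0 ∈ ∂εf(x̄) precisely when |x̄| ≤ √ε, that is, when f(x̄) ≤ inf f + ε. The sketch below (illustrative only, using a grid test of the defining inequality) reflects this.

```python
# Illustrative sketch (not from the text): for f(x) = x^2 the eps-subdifferential at
# x_bar is the interval [2*x_bar - 2*sqrt(eps), 2*x_bar + 2*sqrt(eps)], so 0 lies in
# it exactly when |x_bar| <= sqrt(eps), i.e. when x_bar is an eps-solution.
import numpy as np

def in_eps_subdiff(xi, x_bar, eps, grid=np.linspace(-10, 10, 2001)):
    f = lambda x: x**2
    # grid test of f(x) >= f(x_bar) + xi*(x - x_bar) - eps for all x
    return np.all(f(grid) >= f(x_bar) + xi * (grid - x_bar) - eps)

eps = 0.25
print(in_eps_subdiff(0.0, 0.4, eps))   # True: |0.4| <= sqrt(eps) = 0.5
print(in_eps_subdiff(0.0, 0.6, eps))   # False: 0.6 is not an eps-solution
```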
As the convex programming problem (CP ) can be reformulated as an
unconstrained problem with the objective function f replaced by (f + δC ),
from the above theorem one has that x̄ is an ε-solution of (CP ) if and only if

0 ∈ ∂ε (f + δC )(x̄).

Observe that dom f = Rn. If, in addition, the Slater constraint qualification
holds, that is, C has a nonempty relative interior, then invoking the Sum
Rule of ε-subdifferential, Theorem 2.115, along with the definition of ε-normal
set, Definition 2.110, leads to

0 ∈ ∂ε1 f(x̄) + NC,ε2(x̄)

for some εi ≥ 0, i = 1, 2, with ε1 + ε2 = ε. This may be stated as the following
theorem.
Theorem 10.5 Consider the convex optimization problem (CP ). Assume
that the Slater constraint qualification holds, that is ri C is nonempty. Let
ε ≥ 0 be given. Then x̄ ∈ C is an ε-solution of (CP ) if and only if there exist
εi ≥ 0, i = 1, 2, satisfying ε1 + ε2 = ε such that
0 ∈ ∂ε1 f (x̄) + NC,ε2 (x̄).
Note that for a nonempty convex set C, by Proposition 2.14 (i), ri C is
nonempty and hence the Slater constraint qualification holds. From the above
theorem it is obvious that to obtain the approximate optimality conditions in
terms of the constraint functions gi , i = 1, 2, . . . , m, NC,ε (x) must be explicitly
expressed in their terms. Below we present the result from Strodiot, Nguyen,
and Heukemes [106], which acts as the tool in establishing the approximate
optimality conditions. But before that, we define the right scalar multiplication
from Rockafellar [97].
Definition 10.6 Let φ : Rn → R̄ be a proper convex function and λ ≥ 0.
The right scalar multiplication, φλ, is defined as

(φλ)(x) = λ φ(λ^{−1}x) for λ > 0 and (φλ)(x) = δ{0}(x) for λ = 0.

The positively homogeneous convex function ψ generated by φ is defined as

ψ(x) = inf{(φλ)(x) : λ ≥ 0}.
Proposition 10.7 Consider ε ≥ 0 and a convex function g : Rn → R. Let
x̄ ∈ C̄ = {x ∈ Rn : g(x) ≤ 0}. Assume that the Slater constraint qualification
holds, that is, there exists x̂ ∈ Rn such that g(x̂) < 0. Then ξ ∈ NC̄,ε(x̄) if and
only if there exist λ ≥ 0 and ε̄ ≥ 0 such that

ε̄ ≤ λg(x̄) + ε and ξ ∈ ∂ε̄(λg)(x̄).


Proof. Using the definition of an ε-normal set, Definition 2.110,

NC̄,ε (x̄) = {ξ ∈ Rn : hξ, x − x̄i ≤ ε, ∀ x ∈ C̄}


= {ξ ∈ Rn : σC̄ (ξ) ≤ hξ, x̄i + ε},

where σC̄(ξ) denotes the support function of the set C̄ at ξ. Observe that
dom g = Rn and hence, by Theorem 2.69, g is continuous over the whole of Rn.
Now invoking Theorem 13.5 from Rockafellar [97] (see also Remark 10.8), the
support function σC̄ is the closure of the positively homogeneous function φ
generated by g∗, which is defined as

φ(ξ) = inf_{λ≥0} (g∗λ)(ξ) = inf_{λ≥0} λ g∗(λ^{−1}ξ) = inf_{λ≥0} (λg)∗(ξ).

Therefore,

NC̄,ε(x̄) = {ξ ∈ Rn : inf_{λ≥0} (λg)∗(ξ) ≤ ⟨ξ, x̄⟩ + ε}
         = {ξ ∈ Rn : there exists λ ≥ 0 such that (λg)∗(ξ) ≤ ⟨ξ, x̄⟩ + ε}
         = {ξ ∈ Rn : there exists λ ≥ 0 such that (λg)∗(ξ) + (λg)(x̄) ≤ ⟨ξ, x̄⟩ + ε + (λg)(x̄)}
         = {ξ ∈ Rn : there exists λ ≥ 0 such that (λg)(x) − (λg)(x̄) ≥ ⟨ξ, x − x̄⟩ − ε − (λg)(x̄), ∀ x ∈ Rn}.

From the above condition, there exists λ ≥ 0 such that ξ ∈ ∂_{ε+(λg)(x̄)}(λg)(x̄).
As ∂ε1 φ(x) ⊂ ∂ε2 φ(x) whenever ε1 ≤ ε2, there exists ε̄ satisfying
0 ≤ ε̄ ≤ ε + (λg)(x̄) such that ξ ∈ ∂ε̄(λg)(x̄). Therefore,

NC̄,ε(x̄) = ∪_{0≤ε̄≤ε+(λg)(x̄)} {ξ ∈ Rn : there exists λ ≥ 0 such that (λg)∗(ξ) + (λg)(x̄) ≤ ⟨ξ, x̄⟩ + ε̄}
         = ∪_{0≤ε̄≤ε+(λg)(x̄)} ∪_{λ≥0} ∂ε̄(λg)(x̄),

thereby leading to the desired result. □

Remark 10.8 We state Theorem 13.5 from Rockafellar [97].

Let φ : Rn → R̄ be a proper lsc convex function. The support function


of the set C = {x ∈ Rn : φ(x) ≤ 0} is then cl ψ, where ψ is the
positively homogeneous convex function generated by φ∗ . Dually, the
closure of the positively homogeneous convex function ψ generated by
φ is the support function of the set {x∗ ∈ Rn : φ∗ (x∗ ) ≤ 0}.

For more details, readers are advised to refer to Rockafellar [97].


Next we present the approximate optimality conditions for the convex


programming problem (CP ).

Theorem 10.9 Consider the convex programming problem (CP ) with C


given by (3.1). Assume that the Slater constraint qualification is satisfied.
Let ε ≥ 0. Then x̄ is an ε-solution of (CP ) if and only if there exist ε̄0 ≥ 0,
ε̄i ≥ 0, and λ̄i ≥ 0, i = 1, . . . , m, such that
0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄) and Σ_{i=0}^m ε̄i − ε ≤ Σ_{i=1}^m λ̄i gi(x̄) ≤ 0.

Proof. Observe that (CP ) is equivalent to the unconstrained problem

min (f + Σ_{i=1}^m δCi)(x) subject to x ∈ Rn,

where Ci = {x ∈ Rn : gi(x) ≤ 0}, i = 1, 2, . . . , m. By the Slater constraint
qualification, there exists x̂ ∈ Rn such that gi(x̂) < 0 for every i = 1, 2, . . . , m,
which implies ri Ci, i = 1, 2, . . . , m, is nonempty. Invoking Theorem 10.5,
there exist εi ≥ 0, i = 0, 1, . . . , m, with ε0 + Σ_{i=1}^m εi = ε such that

0 ∈ ∂ε0 f(x̄) + Σ_{i=1}^m NCi,εi(x̄).

Applying Proposition 10.7 to Ci, i = 1, 2, . . . , m, there exist λ̄i ≥ 0 and ε̄i ≥ 0,
i = 1, 2, . . . , m, such that

0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄) and ε̄i − εi ≤ λ̄i gi(x̄) ≤ 0, i = 1, 2, . . . , m, (10.1)

where ε̄0 = ε0. Now summing (10.1) over i = 1, 2, . . . , m, and using the
condition ε0 + Σ_{i=1}^m εi = ε leads to

Σ_{i=0}^m ε̄i − ε ≤ Σ_{i=1}^m λ̄i gi(x̄) ≤ 0, (10.2)

as desired.
Conversely, define εi = ε̄i − λ̄i gi (x̄), i = 1, 2, . . . , m. Applying Proposi-
tion 10.7, ξi ∈ ∂ε̄i (λ̄i gi )(x̄) is equivalent to ξi ∈ NCi ,εi (x̄) for i = 1, 2, . . . , m.
Also, from the condition (10.2),
ε̄0 + Σ_{i=1}^m εi + Σ_{i=1}^m λ̄i gi(x̄) − ε ≤ Σ_{i=1}^m λ̄i gi(x̄) ≤ 0,

which implies ε̄0 + Σ_{i=1}^m εi ≤ ε. Define ε0 = ε̄0 + εs, where εs = ε − ε̄0 − Σ_{i=1}^m εi.
Observe that εs ≥ 0. Therefore,

0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m NCi,εi(x̄) ⊂ ∂ε0 f(x̄) + Σ_{i=1}^m NCi,εi(x̄),

where ε0 + Σ_{i=1}^m εi = ε. By Theorem 10.5, x̄ is an ε-solution of (CP ). □
Observe that in the above approximate optimality conditions instead of the
complementary slackness conditions, we have an ε-complementary slackness
condition. Also, we derived the approximate optimality conditions in terms of
the ε-subdifferentials of the objective function as well as the constraint func-
tions at the ε-solution of (CP ) by equivalent characterization of ε-normal set
in terms of the ε-subdifferentials of the constraint functions gi , i = 1, 2, . . . , m.

10.3 Max-Function Approach


As discussed in the Section 3.5, another approach that is well known in estab-
lishing the standard KKT optimality conditions is the max-function approach.
Applying a similar approach for an ε-solution, x̄, of (CP ) we introduce an un-
constrained minimization problem
min F (x) subject to x ∈ Rn , (CPmax )
where F (x) = max{f (x)−f (x̄)+ε, g1 (x), . . . , gm (x)}. Using this max-function,
an alternative proof is provided to derive the approximate optimality condi-
tions. But before that we present a result to study the relation between the
ε-solution of (CP ) and those of the unconstrained problem (CPmax ).

Theorem 10.10 Consider the convex programming problem (CP ) with C


given by (3.1). If x̄ is an ε̄-solution of (CP ), then x̄ is an ε-solution of the un-
constrained problem (CPmax ) for every ε ≥ ε̄. Conversely, if x̄ is an ε-solution
of (CPmax ), then it is an almost 2ε-solution of (CP ).

Proof. Because x̄ is an ε̄-solution of (CP ), x̄ ∈ C with

f (x̄) ≤ f (x) + ε̄, ∀ x ∈ C. (10.3)

Observe that F (x̄) = ε̄. To show that for every ε ≥ ε̄, x̄ ∈ Rn is an ε-solution
for (CPmax ), it is sufficient to establish that

F (x̄) ≤ F (x) + ε̄, ∀ x ∈ Rn ,

which is equivalent to proving that F (x) ≥ 0 for every x ∈ Rn .


For x ∈ C, gi (x) ≤ 0, i = 1, 2, . . . , m, while condition (10.3) ensures that


f(x) − f(x̄) + ε̄ ≥ 0. Therefore, F(x) ≥ 0 for every x ∈ C. If x ∉ C, then for
some i ∈ {1, 2, . . . , m}, gi(x) > 0 and thus F(x) > 0 for every x ∉ C. Hence,
x̄ is an ε-solution of (CPmax ).
Conversely, as x̄ is an ε-solution of (CPmax ),

F (x̄) ≤ F (x) + ε, ∀ x ∈ Rn .

Therefore,

0 < ε = max{ε, g1 (x̄), g2 (x̄), . . . , gm (x̄)} ≤ F (x) + ε, ∀ x ∈ Rn .

The above condition yields

F (x) > 0 and gi (x̄) ≤ F (x) + ε, i = 1, 2, . . . , m, ∀ x ∈ Rn .

From the first condition, in particular for x ∈ C,

f (x̄) ≤ f (x) + ε ≤ f (x) + 2ε

while in the second condition, taking x = x̄ leads to

gi (x̄) ≤ 2ε, i = 1, 2, . . . , m,

thereby implying that x̄ is an almost 2ε-solution of (CP ). 

Theorem 10.11 Consider the convex programming problem (CP ) with C de-
fined by (3.1). Assume that the Slater constraint qualification is satisfied and
let ε ≥ 0. Then x̄ is an ε-solution of (CP ) if and only if there exist ε̄0 ≥ 0,
ε̄i ≥ 0, and λ̄i ≥ 0, i = 1, . . . , m, such that
0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄) and Σ_{i=0}^m ε̄i − ε = Σ_{i=1}^m λ̄i gi(x̄) ≤ 0.

Proof. As x̄ is an ε-solution of (CP ), then by Theorem 10.10, x̄ is also an


ε-solution of the unconstrained minimization problem (CPmax ). By the ap-
proximate optimality condition, Theorem 10.4, for the unconstrained problem,

0 ∈ ∂ε F (x̄).

By the ε-subdifferential Max-Function Rule, Remark 2.119, there exist εi ≥ 0,
λi ≥ 0, i = 0, 1, . . . , m, with Σ_{i=0}^m λi = 1, and ξ0 ∈ ∂ε0(λ0 f)(x̄) provided
λ0 > 0 and ξi ∈ ∂εi(λi gi)(x̄) for those i ∈ {1, 2, . . . , m} satisfying λi > 0, such
that

0 = ξ0 + Σ_{i∈Ī} ξi and Σ_{i=0}^m εi + F(x̄) − λ0 ε − Σ_{i∈Ī} λi gi(x̄) = ε, (10.4)

where I¯ = {i ∈ {1, 2, . . . , m} : λi > 0}. Now if λ0 = 0, again invoking


the ε-subdifferential Max-Function Rule, Remark 2.119, there exists some
i ∈ {1, 2, . . . , m} such that λi > 0, which implies Ī is nonempty. Thus, corre-
sponding to i ∈ Ī, there exist ξi ∈ ∂εi(λi gi)(x̄) such that

0 = Σ_{i∈Ī} ξi and Σ_{i=0}^m εi + F(x̄) − Σ_{i∈Ī} λi gi(x̄) = ε. (10.5)

As F(x̄) = ε, the second equality condition reduces to

Σ_{i=0}^m εi = Σ_{i∈Ī} λi gi(x̄). (10.6)

By the definition of ε-subdifferentiability, Definition 2.109,

λi gi(x) ≥ λi gi(x̄) + ⟨ξi, x − x̄⟩ − εi, ∀ x ∈ Rn, i ∈ Ī.

Therefore, the above inequality along with (10.5) and the nonnegativity of
εi, i = 0, 1, . . . , m, leads to

Σ_{i=1}^m λi gi(x) = Σ_{i∈Ī} λi gi(x) ≥ Σ_{i∈Ī} λi gi(x̄) − Σ_{i=0}^m εi,

which by the condition (10.6) yields

Σ_{i=1}^m λi gi(x) ≥ 0, ∀ x ∈ Rn. (10.7)

As the Slater constraint qualification holds, there exists x̂ ∈ Rn such that
gi(x̂) < 0, i = 1, 2, . . . , m. Thus,

Σ_{i=1}^m λi gi(x̂) < 0,

thereby contradicting the inequality (10.7). Therefore, λ0 ≠ 0. Now divid-
ing both relations of (10.4) throughout by λ0 > 0, along with F(x̄) = ε and
Theorem 2.117, leads to

0 ∈ ξ̄0 + Σ_{i∈Ī} ξ̄i and Σ_{i=0}^m ε̄i − ε = Σ_{i=1}^m λ̄i gi(x̄) ≤ 0,

where ξ̄0 ∈ ∂ε̄0 f(x̄), ξ̄i ∈ ∂ε̄i(λ̄i gi)(x̄), i ∈ Ī, ε̄i = εi/λ0, i = 0, 1, . . . , m, and
λ̄i = λi/λ0, i ∈ Ī. Corresponding to i ∉ Ī, take λ̄i = 0 with ξ̄i = 0 ∈ ∂ε̄i(λ̄i gi)(x̄),
thereby leading to the approximate optimality condition

0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄)


along with the ε-complementary slackness condition. The converse can be


worked along the lines of Theorem 10.9 taking εs = 0. 
Note that in the ε-complementary slackness condition of Theorem 10.9, we
had inequality whereas in the above theorem it is in the form of an equation.
Actually, this condition can also be treated as an inequality if for condition
(10.4) we consider F (x̄) = max{ε, 0} ≥ ε instead of F (x̄) = ε.

10.4 ε-Saddle Point Approach


While studying the optimality conditions for the convex programming problem
(CP ), we have already devoted a chapter on saddle point theory. Now to
derive the approximate optimality conditions, we make use of the ε-saddle
point approach. Recall that the Lagrangian function L : Rn × Rm+ → R associated
with the convex programming problem (CP ) with C given by (3.1), that is,
involving convex inequalities, introduced in Chapter 4, is given by

L(x, λ) = f(x) + Σ_{i=1}^m λi gi(x).

Definition 10.12 A point (x̄, λ̄) ∈ Rn × Rm+ is said to be an ε-saddle point
of (CP ) if

L(x̄, λ) − ε ≤ L(x̄, λ̄) ≤ L(x, λ̄) + ε, ∀ x ∈ Rn, ∀ λ ∈ Rm+.

Below we present a saddle point result established by Dutta [37].

Theorem 10.13 Consider the convex programming problem (CP ) with C
given by (3.1). Let ε ≥ 0 be given and let x̄ be an ε-solution of (CP ).
Assume that the Slater constraint qualification holds. Then there exist
λ̄i ≥ 0, i = 1, 2, . . . , m, such that (x̄, λ̄) is an ε-saddle point of (CP ) and
ε + Σ_{i=1}^m λ̄i gi(x̄) ≥ 0.

Proof. As x̄ is an ε-solution of (CP ), the following system

f (x) − f (x̄) + ε < 0,


gi (x) < 0, i = 1, 2, . . . , m,

has no solution x ∈ Rn . Define the set

Λ = {(y, z) ∈ R × Rm : f(x) − f(x̄) + ε < y, gi(x) < zi, i = 1, 2, . . . , m, for some x ∈ Rn}.

The reader is urged to verify that Λ is an open convex set. Observe that
(0, 0) ∈
/ Λ. Therefore, by the Separation Theorem, Theorem 2.26 (ii), there


exists (λ0, λ) ∈ R × Rm with (λ0, λ) ≠ (0, 0) such that

λ0 (f(x) − f(x̄) + ε) + Σ_{i=1}^m λi gi(x) ≥ 0, ∀ x ∈ Rn. (10.8)

Working along the lines of the proof of Theorem 4.2, it can be proved that
(λ0, λ) ∈ R+ × Rm+.
We claim that λ0 ≠ 0. On the contrary, suppose that λ0 = 0. By the
Slater constraint qualification, there exists x̂ ∈ Rn such that gi(x̂) < 0,
i = 1, 2, . . . , m, which implies

Σ_{i=1}^m λi gi(x̂) < 0,

thereby contradicting (10.8). Therefore, λ0 ≠ 0 and thus the condition (10.8)
can be expressed as

f(x) − f(x̄) + ε + Σ_{i=1}^m λ̄i gi(x) ≥ 0, ∀ x ∈ Rn, (10.9)

where λ̄i = λi/λ0 for i = 1, 2, . . . , m. In particular, taking x = x̄, the above
inequality reduces to

ε + Σ_{i=1}^m λ̄i gi(x̄) ≥ 0. (10.10)
As gi(x̄) ≤ 0, i = 1, 2, . . . , m, which along with (10.9) leads to

f(x̄) + Σ_{i=1}^m λ̄i gi(x̄) ≤ f(x) + Σ_{i=1}^m λ̄i gi(x) + ε, ∀ x ∈ Rn,

which implies

L(x̄, λ̄) ≤ L(x, λ̄) + ε, ∀ x ∈ Rn. (10.11)

For any λi ≥ 0, i = 1, 2, . . . , m, the feasibility of x̄ along with the nonnega-
tivity of ε and (10.10) leads to

f(x̄) + Σ_{i=1}^m λi gi(x̄) − ε ≤ f(x̄) − ε ≤ f(x̄) + Σ_{i=1}^m λ̄i gi(x̄),

that is,
L(x̄, λ) − ε ≤ L(x̄, λ̄), ∀ λ ∈ Rm
+.

The above inequality along with (10.11) implies that (x̄, λ̄) is an ε-saddle point
of (CP ), which satisfies (10.10), thereby yielding the desired result. 
Using this ε-saddle point result, we establish the approximate optimal-
ity conditions. But unlike Theorems 10.9 and 10.11, the result below is only
necessary with a relaxed ε-complementary slackness condition.


Theorem 10.14 Consider the convex programming problem (CP ) with C
given by (3.1). Let ε ≥ 0 be given and let x̄ be an ε-solution of (CP ). Assume
that the Slater constraint qualification holds. Then there exist ε̄0 ≥ 0, ε̄i ≥ 0,
and λ̄i ≥ 0, i = 1, 2, . . . , m, with ε̄0 + Σ_{i=1}^m ε̄i = ε such that

0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄) and ε + Σ_{i=1}^m λ̄i gi(x̄) ≥ 0.

Proof. By the previous theorem, there exist λ̄i ≥ 0, i = 1, 2, . . . , m, such that

L(x̄, λ̄) ≤ L(x, λ̄) + ε, ∀ x ∈ Rn,

along with ε + Σ_{i=1}^m λ̄i gi(x̄) ≥ 0. By Definition 10.1 of ε-solution, the above
inequality implies that x̄ is an ε-solution of the unconstrained problem

inf f(x) + Σ_{i=1}^m λ̄i gi(x) subject to x ∈ Rn.

By Theorem 10.4, the approximate optimality condition is

0 ∈ ∂ε(f + Σ_{i=1}^m λ̄i gi)(x̄). (10.12)

As dom f = Rn and dom gi = Rn, i = 1, 2, . . . , m, applying the Sum Rule of
ε-subdifferential, Theorem 2.115, there exist ε̄i ≥ 0, i = 0, 1, . . . , m, satisfying
ε̄0 + Σ_{i=1}^m ε̄i = ε such that (10.12) becomes

0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄),

thereby establishing the result. 

Observe that the conditions obtained in Theorem 10.14 are only neces-
sary and not sufficient. The approach used in Theorems 10.9 and 10.11 for
the sufficiency part cannot be invoked here. But if instead of the relaxed
ε-complementary slackness condition one has the standard complementary
slackness condition, which is equivalent to

Σ_{i=1}^m λ̄i gi(x̄) = 0,

then working along the lines of Theorem 10.9 the sufficiency can also be estab-
lished. The result below shows that the optimality conditions derived in the
above theorem lead to a 2ε-solution of (CP ) instead of an ε-solution.


Theorem 10.15 Consider the convex programming problem (CP ) with C


given by (3.1). Let ε ≥ 0 be given. Assume that the approximate optimal-
ity condition and the relaxed ε-complementary slackness condition of Theo-
rem 10.14 hold for (x̄, λ̄) ∈ Rn × Rm+ and εi ≥ 0, i = 0, 1, . . . , m, satisfying
ε0 + Σ_{i=1}^m εi = ε. Then x̄ is a 2ε-solution of (CP ).

Proof. From the approximate optimality condition of Theorem 10.14, there
exist λ̄i ≥ 0, i = 1, 2, . . . , m, and εi ≥ 0, i = 0, 1, . . . , m, with ε0 + Σ_{i=1}^m εi = ε,
ξ0 ∈ ∂ε0 f(x̄), and ξi ∈ ∂εi(λ̄i gi)(x̄), i = 1, 2, . . . , m, such that

0 = ξ0 + Σ_{i=1}^m ξi. (10.13)

By Definition 2.109 of the ε-subdifferential,

f(x) − f(x̄) ≥ ⟨ξ0, x − x̄⟩ − ε0,
λ̄i gi(x) − λ̄i gi(x̄) ≥ ⟨ξi, x − x̄⟩ − εi, i = 1, 2, . . . , m.

Summing the above inequalities along with the condition (10.13) leads to

f(x) + Σ_{i=1}^m λ̄i gi(x) ≥ f(x̄) + Σ_{i=1}^m λ̄i gi(x̄) − (ε0 + Σ_{i=1}^m εi).

For any x feasible for (CP ), gi(x) ≤ 0, i = 1, 2, . . . , m, which along with the re-
laxed ε-complementary slackness condition and the fact that ε0 + Σ_{i=1}^m εi = ε
implies that

f(x) ≥ f(x̄) − 2ε, ∀ x ∈ C.

Thus, x̄ is a 2ε-solution of (CP ). □


From Definition 10.12, (x̄, λ̄) is an ε-saddle point of (CP ) if

L(x̄, λ) − ε ≤ L(x̄, λ̄) ≤ L(x, λ̄) + ε, ∀ x ∈ Rn , ∀ λ ∈ Rm


+.

With respect to the ε-solution, we will call x̄ an ε-minimum solution of L(., λ̄)
and similarly, call λ̄ an ε-maximum solution of L(x̄, .).
We end this section by presenting a result relating the ε-solutions of the
saddle point to the almost ε-solution of (CP ) that was derived by Dutta [37].

Theorem 10.16 Consider the convex programming problem (CP ) with C


given by (3.1). Let (x̄, λ̄) ∈ Rn ×Rm
+ be such that x̄ is an ε1 -minimum solution
of L(., λ̄) and λ̄ is an ε2 -maximum solution of L(x̄, .). Then x̄ is an almost
(ε1 + ε2 )-solution of (CP ).

Proof. Because λ̄ ∈ Rm+ is an ε2-maximum solution of L(x̄, λ) over Rm+,

L(x̄, λ) − ε2 ≤ L(x̄, λ̄), ∀ λ ∈ Rm+.

As L(x, λ) = f(x) + Σ_{i=1}^m λi gi(x), the above inequality reduces to

Σ_{i=1}^m (λi − λ̄i) gi(x̄) ≤ ε2, ∀ λi ≥ 0, i = 1, 2, . . . , m. (10.14)

We claim that x̄ ∈ Cε2 = {x ∈ Rn : gi(x) ≤ ε2, i = 1, 2, . . . , m}. On the
contrary, suppose that x̄ ∉ Cε2, which implies that the system

gi(x̄) − ε2 ≤ 0, i = 1, 2, . . . , m,

does not hold. Equivalently, the above condition implies that

(g1(x̄) − ε2, g2(x̄) − ε2, . . . , gm(x̄) − ε2) ∉ Rm−.

As Rm− is a closed convex set, by the Strict Separation Theorem, Theo-
rem 2.26 (iii), there exists γ ∈ Rm with γ ≠ 0 such that

Σ_{i=1}^m γi gi(x̄) − Σ_{i=1}^m γi ε2 > 0 ≥ Σ_{i=1}^m γi yi, ∀ y ∈ Rm−. (10.15)

We claim that γ ∈ Rm+. On the contrary, assume that γ ∉ Rm+, which implies
that for some i ∈ {1, 2, . . . , m}, γi < 0. As the inequality (10.15) holds for every
y ∈ Rm−, taking the corresponding yi → −∞ leads to a contradiction. Hence,
γ ∈ Rm+. Because γ ≠ 0, it can be chosen to satisfy Σ_{i=1}^m γi = 1. Therefore, the
strict inequality condition in (10.15) reduces to

Σ_{i=1}^m γi gi(x̄) > ε2. (10.16)

As λ̄ ∈ Rm+ and γ ∈ Rm+, λ̄ + γ ∈ Rm+. Therefore, taking λ = λ̄ + γ in (10.14)
leads to

Σ_{i=1}^m γi gi(x̄) ≤ ε2,

which contradicts (10.16). Thus, x̄ ∈ Cε2 ⊂ Cε1 +ε2 , where

Cε1 +ε2 = {x ∈ Rn : gi (x) ≤ ε1 + ε2 , i = 1, 2, . . . , m}.

As x̄ is an ε1 -minimum solution of L(x, λ̄) over Rn ,

L(x̄, λ̄) ≤ L(x, λ̄) + ε1 , ∀ x ∈ Rn ,

which implies

f(x̄) + Σ_{i=1}^m λ̄i gi(x̄) ≤ f(x) + Σ_{i=1}^m λ̄i gi(x) + ε1, ∀ x ∈ Rn.


For any x feasible to (CP ), gi(x) ≤ 0, i = 1, 2, . . . , m, which implies
λ̄i gi(x) ≤ 0, i = 1, 2, . . . , m. Taking λi = 0, i = 1, 2, . . . , m, in (10.14), we get
Σ_{i=1}^m λ̄i gi(x̄) ≥ −ε2. Thus, the preceding inequality reduces to

f(x̄) ≤ f(x) + ε1 + ε2, ∀ x ∈ C.

Therefore, x̄ is an almost (ε1 + ε2)-solution of (CP ). □

10.5 Exact Penalization Approach


We have discussed different approaches like the ε-subdifferential approach,
max-function approach, and saddle point approach to study the approximate
optimality conditions. Another approach to deal with the relationship between
the different classes of approximate solutions is the penalty function approach
by Loridan [76]. In the work of Loridan that appeared in 1982, he dealt with
the notion of regular and almost regular approximate solutions. But here
we will concentrate more on ε-solutions and almost ε-solutions for which we
move on to study the work done by Loridan and Morgan [77]. This approach
helps in dealing with the stability analysis with respect to the perturbed
problem, thereby relating the ε-solutions of the perturbed problem and almost
ε-solutions of (CP ).
We consider the exact penalty function

fρ(x) = f(x) + Σ_{i=1}^m ρi max{0, gi(x)},

where ρ = (ρ1, ρ2, . . . , ρm) with ρi > 0, i = 1, 2, . . . , m, and we associate with it the
following unconstrained problem:

min fρ(x) subject to x ∈ Rn. (CP )ρ

The convergence of the ε-solutions of the sequence of
problems (CP )ρ under certain assumptions leads to an ε-solution of the prob-
lem (CP ). So before moving on to establish the convergence result, we present
a result relating the ε-solution of (CP )ρ with the almost ε-solution of (CP ).
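Before stating the result, the construction can be visualized on a toy instance (a sketch of our own, not taken from Loridan and Morgan [77]): for min x² subject to 1 − x ≤ 0 the constrained minimizer is x = 1, and once the penalty parameter is large enough the unconstrained minimizers of fρ land at this feasible point, so near-minimizers of fρ behave like (almost) ε-solutions of the constrained problem.

```python
# Illustrative sketch (not from the text): exact penalty for min x^2 s.t. 1 - x <= 0.
# The constrained minimizer is x = 1; for rho large enough the penalized objective
# f_rho(x) = x^2 + rho * max(0, 1 - x) attains its unconstrained minimum there too.
import numpy as np

f = lambda x: x**2
g = lambda x: 1.0 - x                        # constraint g(x) <= 0
f_rho = lambda x, rho: f(x) + rho * np.maximum(0.0, g(x))

xs = np.linspace(-3.0, 3.0, 60001)
for rho in (0.5, 1.0, 2.0, 5.0):
    x_min = xs[np.argmin(f_rho(xs, rho))]
    print(rho, round(float(x_min), 3))       # the minimizer reaches the feasible point 1 once rho >= 2
```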

Theorem 10.17 Assume that f is bounded below on Rn. Then there ex-
ists ρε = (α + ε)/ε, where α = inf_{x∈C} f(x) − inf_{x∈Rn} f(x), such that whenever
ρi ≥ ρε, i = 1, 2, . . . , m, every ε-solution of (CP )ρ is an almost ε-solution of
(CP ).

Proof. Suppose that xρ is an ε-solution for (CP )ρ . Then

fρ (xρ ) ≤ fρ (x) + ε, ∀ x ∈ Rn . (10.17)


Observe that for x ∈ C, as gi (x) ≤ 0, i = 1, 2, . . . , m, fρ (x) = f (x). This


along with the condition (10.17) and the definition of fρ implies

f (xρ ) ≤ fρ (xρ ) ≤ f (x) + ε, ∀ x ∈ C. (10.18)

Again, from the definition of fρ along with (10.18),

inf_{x∈Rn} f(x) + Σ_{i=1}^m ρi max{0, gi(xρ)} ≤ fρ(xρ) ≤ inf_{x∈C} f(x) + ε,

which implies

Σ_{i=1}^m ρi max{0, gi(xρ)} ≤ α + ε. (10.19)

Now consider ρ = (ρ1, ρ2, . . . , ρm) such that ρi ≥ ρε = (α + ε)/ε for every
i = 1, 2, . . . , m. Therefore, for the ε-solution xρ of (CP )ρ, the condition (10.19)
leads to

gi(xρ) ≤ max{0, gi(xρ)} ≤ Σ_{i=1}^m max{0, gi(xρ)} ≤ ε, ∀ i = 1, 2, . . . , m,

which implies xρ ∈ Cε . This along with (10.18) yields that xρ is an almost


ε-solution of (CP ). 
In the above theorem, it was shown that the ε-solutions of the penal-
ized problem (CP )ρ are almost ε-solutions of (CP ). But we are more in-
terested in deriving an ε-solution rather than an almost ε-solution of (CP ).
The next result paves a way in this direction by obtaining an ε-solution of
(CP ) from the ε-solutions of the sequence of problems {(CP )ρk }k , where
ρk = (ρk1 , ρk2 , . . . , ρkm ).

Theorem 10.18 Assume that f is bounded below on Rn and satisfies the
coercivity condition

lim_{‖x‖→+∞} f(x) = +∞.

Let {ρk}k be a sequence such that lim_{k→+∞} ρki = +∞ for every i = 1, 2, . . . , m,
and let xρk be an ε-solution of (CP )ρk. Then every convergent subsequence of {xρk}
has a limit that is an ε-solution of (CP ).

Proof. As {xρk } is the ε-solution of (CP )ρk , by Theorem 10.17, {xρk } is an


almost ε-solution of (CP ) and thus satisfies

f (xρk ) ≤ f (x) + ε, ∀ x ∈ C.

Because f (xρk ) is bounded above for every k, therefore by the given hypothesis


{xρk } is a bounded sequence and thus by the Bolzano–Weierstrass Theorem,


Proposition 1.3, has a convergent subsequence. Without loss of generality,
assume that xρk → xρ . As dom f = Rn , by Theorem 2.69, f is continuous on
Rn . Thus, taking the limit as k → +∞, the above inequality leads to

f (xρ ) ≤ f (x) + ε, ∀ x ∈ C. (10.20)

Using the condition (10.19) in the proof of Theorem 10.17

gi (xρk ) ≤ (α + ε)/ρki .

Again, by Theorem 2.69, gi , i = 1, 2, . . . , m, is continuous on dom gi = Rn ,


i = 1, 2, . . . , m. Therefore, taking the limit as k → +∞, the above inequality
leads to

gi (xρ ) ≤ 0, ∀ i = 1, 2, . . . , m.

Thus, xρ ∈ C along with the condition (10.20) implies that xρ is an ε-solution


of (CP ). 
From the above discussions it is obvious that an ε-solution of (CP )ρ need
not be an ε-solution of (CP ) when xρ ∈ / C. But in case xρ ∈ C, it may
be considered as an ε-solution of (CP ). The result below tries to find an
ε-solution for (CP ) by using an ε-solution of (CP )ρ under the Slater constraint
qualification. Even though the result is from Loridan and Morgan [77] but the
proof is based on the work by Zangwill [114] on penalty functions. Here we
present the detailed proof for a better understanding.

Theorem 10.19 Consider the convex programming problem (CP ) with C


given by (3.1). Assume that f is bounded below on Rn and the Slater con-
straint qualification is satisfied, that is, there exists x̂ ∈ Rn such that gi (x̂) < 0,
i = 1, 2, . . . , m. Define β = inf x∈C f (x) − f (x̂) and γ = maxi=1,...,m gi (x̂) < 0.
Let ρ0 = (β − 1)/γ > 0. For ρ = (ρ1 , ρ2 , . . . , ρm ) with ρi ≥ ρ0 , i = 1, 2, . . . , m,
let xρ ∈/ C be an ε-solution for (CP )ρ . Let x̄ be the unique point on the line
segment joining xρ and x̂ lying on the boundary of C. Then x̄ is an ε-solution
of (CP ).

Proof. Because x̄ is a unique point on the line segment joining xρ and x̂ lying
on the boundary, the active index set I(x̄) = {i ∈ {1, 2, . . . , m} : gi (x̄) = 0} is
nonempty. Define a convex auxiliary function as

F(x) = f(x) + ρ0 Σ_{i∈I(x̄)} gi(x).

Observe that for i ∈ I(x̄), gi (x̄) = 0 while for i ∈


/ I(x̄), gi (x̄) < 0. Therefore,

F(x̄) = f (x̄) = fρ0 (x̄). (10.21)


As x̄ lies on the line segment joining xρ and x̂, there exists λ ∈ (0, 1) such
that x̄ = λxρ + (1 − λ)x̂. Then by the convexity of gi , i = 1, 2, . . . , m,

gi (x̄) ≤ λgi (xρ ) + (1 − λ)gi (x̂), i = 1, 2, . . . , m.

For i ∈ I(x̄), gi (x̄) = 0, which along with the Slater constraint qualification
reduces the above inequality to

0 < −(1 − λ)gi (x̂) ≤ λgi (xρ ), ∀ i ∈ I(x̄).

Therefore, for i ∈ I(x̄), gi(xρ) > 0, which implies

Σ_{i∈I(x̄)} gi(xρ) = Σ_{i∈I(x̄)} max{0, gi(xρ)} ≤ Σ_{i=1}^m max{0, gi(xρ)},

thereby leading to the fact that

F(xρ ) ≤ fρ0 (xρ ). (10.22)

To prove the result, it is sufficient to show that F(x̄) < F(xρ ). But first
we will show that F(x̂) < F(x̄). Consider

F(x̂) = f(x̂) + ρ0 Σ_{i∈I(x̄)} gi(x̂).

Because gi(x̂) < 0, i = 1, 2, . . . , m, we have Σ_{i∈I(x̄)} gi(x̂) ≤ max_{i=1,...,m} gi(x̂), which
by the given hypothesis implies

F(x̂) ≤ f(x̂) + ρ0 γ = inf_{x∈C} f(x) − 1 < f(x̄) = F(x̄). (10.23)

The convexity of F along with (10.23) leads to

F(x̄) < λF(xρ ) + (1 − λ)F(x̄),

which implies F(x̄) < F(xρ ). Therefore, by (10.21) and (10.22),


f (x̄) < fρ0 (xρ ). By the definition of fρ , fρ0 (x) ≤ fρ (x) for every
ρ = (ρ1 , ρ2 , . . . , ρm ) with ρi ≥ ρ0 , i = 1, 2, . . . , m, which along with the fact
that xρ is an ε-solution of (CP )ρ implies

f (x̄) < fρ (xρ ) ≤ fρ (x) + ε, ∀ x ∈ Rn .

For x ∈ C, fρ (x) = f (x), which reduces the above condition to

f (x̄) ≤ f (x) + ε, ∀ x ∈ C,

thereby implying that x̄ is an ε-solution of (CP ). 


For a better understanding of the above result, let us consider the following
example. Consider

inf e^x subject to x ≤ 0.

Obviously the Slater constraint qualification holds. Consider x̂ = −1 and then
ρ0 = e^{−1} + 1. For ε = 2, xρ = 1/2 > 0 is an ε-solution for every ρ ≥ ρ0. Here,
x̄ = 0 ∈ [−1, 1/2] is an ε-solution for the constrained problem.
Observe that one requires the fact that xρ is an ε-solution of (CP )ρ only to
establish that x̄ is an ε-solution of (CP ). So from the proof of Theorem 10.19
it can also be worked out that under the Slater constraint qualification, cor-
responding to xρ 6∈ C, there exists x̄ ∈ C such that

f (x̄) = fρ (x̄) < fρ (xρ )

for every ρ ≥ ρ0 , where ρ0 is the same as in the previous theorem. As a matter


of fact, because the set C is closed convex, one can always find such an x̄ on
the boundary of C. As xρ 6∈ C is arbitrarily chosen, then from the above
inequality it is obvious that

inf f (x) ≤ fρ (x), ∀ x ∈


/ C.
x∈C

Also, for any x ∈ C, f (x) = fρ (x), which along with the above condition
implies

inf fρ (x) = inf f (x) ≤ infn fρ (x).


x∈C x∈C x∈R

The reverse inequality holds trivially. Therefore,

inf f (x) = infn fρ (x). (10.24)


x∈C x∈R

This leads to the fact that every ε-solution of (CP ) is also an ε-solution of the
penalized unconstrained problem. Next we derive the approximate optimality
conditions for (CP ) using the penalized unconstrained problem.

Theorem 10.20 Consider the convex programming problem (CP ) with C de-
fined by (3.1). Assume that the Slater constraint qualification is satisfied. Let
ε ≥ 0. Then x̄ is an ε-solution of (CP ) if and only if there exist ε̄0 ≥ 0, ε̄i ≥ 0
and λ̄i ≥ 0, i = 1, . . . , m, such that
0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄) and Σ_{i=0}^m ε̄i − ε = Σ_{i=1}^m λ̄i gi(x̄) ≤ 0.

Proof. As x̄ ∈ C is an ε-solution of (CP ), from the above discussion it is also


an ε-solution of the penalized unconstrained problem for ρ = (ρ1 , ρ2 , . . . , ρm )
with ρi ≥ ρ0 > 0, where ρ0 is defined in Theorem 10.19. Therefore, by the


approximate optimality condition, Theorem 10.4, for the unconstrained pe-


nalized problem (CP )ρ ,

0 ∈ ∂ε fρ (x̄).

As dom f = dom gi = Rn, applying the ε-subdifferential Sum Rule, Theo-
rem 2.115, there exist εi ≥ 0, i = 0, 1, . . . , m, satisfying Σ_{i=0}^m εi = ε such
that

0 ∈ ∂ε0 f(x̄) + Σ_{i=1}^m ∂εi(max{0, ρi gi(.)})(x̄).

By the ε-subdifferential Max-Function Rule, Remark 2.119, there exist


0 ≤ λi ≤ 1 and ε̄i ≥ 0 satisfying

εi = ε̄i + max{0, ρi gi (x̄)} − λi ρi gi (x̄) = ε̄i − λi ρi gi (x̄) (10.25)

for every i = 1, 2, . . . , m such that


0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄),

where ε̄0 = ε0 ≥ 0 and λ̄i = ρi λi ≥ 0, i = 1, 2, . . . , m. The condition (10.25)
along with Σ_{i=0}^m εi = ε implies that

Σ_{i=0}^m ε̄i − Σ_{i=1}^m λ̄i gi(x̄) = ε,

thereby leading to the requisite conditions. The converse can be proved in a


similar fashion, as done in Theorem 10.9 with εs = 0. 
Note that the conditions obtained in the above theorem are the same as
those in Theorem 10.11.

10.6 Ekeland’s Variational Principle Approach


In all the earlier sections, we concentrated on the ε-solutions. If x̄ is an
ε-solution of (CP ), then by Ekeland's variational principle, Theorem 2.113,
mentioned in Chapter 2, there exists x̂ ∈ C such that

f(x̂) ≤ f(x) + √ε ‖x − x̂‖, ∀ x ∈ C.

Any x̂ satisfying the above condition is a quasi ε-solution of (CP ). Observe


that we are emphasizing only one of the conditions of the Ekeland’s variational


principle and the other two need not be satisfied. In this section we deal
with the quasi ε-solution and derive the approximate optimality conditions
for this class of approximate solutions for (CP ). But before doing so, let
us illustrate by an example that a quasi ε-solution may or may not be an
ε-solution. Consider the problem
1
inf subject to x > 0.
x
1
Note that the infimum of the problem is zero, which is not attained. For ε = ,
4
it is easy to note that xε = 4 is an ε-solution. Now x̄ > 0 is a quasi ε-solution
if
1 1 1
≤ + |x − x̄|, ∀ x > 0.
x̄ x 2
Observe that x̄ = 4.5 is a quasi ε-solution that is also an ε-solution satisfying
all the conditions of the Ekeland’s variational principle, while x̄ = 3.5 is a
quasi ε-solution that is not ε-solution. Also, it does not satisfy the condition
1 1
≤ . These are not the only quasi ε-solutions. Even points that satisfy
x̄ xε
only the unique minimizer condition of the variational principle, like x̄ = 3,
are also the quasi ε-solution to the above problem.
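These claims are easy to confirm numerically; the grid check below (illustrative only, not part of the original development) tests the quasi ε-solution inequality with √ε = 1/2 and the ε-solution inequality for the three points discussed above.

```python
# Illustrative sketch (not from the text): grid check of the quasi eps-solution
# inequality 1/x_bar <= 1/x + sqrt(eps)*|x - x_bar| for min 1/x over x > 0, eps = 1/4.
import numpy as np

eps = 0.25
xs = np.linspace(1e-3, 50.0, 200000)

def is_quasi(x_bar):
    return bool(np.all(1.0/x_bar <= 1.0/xs + np.sqrt(eps)*np.abs(xs - x_bar) + 1e-12))

def is_eps_solution(x_bar):              # infimum of 1/x over x > 0 is 0
    return 1.0/x_bar <= eps

for x_bar in (4.5, 3.5, 3.0):
    print(x_bar, is_quasi(x_bar), is_eps_solution(x_bar))
# 4.5: quasi and eps-solution; 3.5 and 3.0: quasi but not eps-solutions
```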
Now we move on to discuss the approximate optimality conditions for the
quasi ε-solutions.

Theorem 10.21 Consider the convex programming problem (CP ) with C


given by (3.1). Let ε ≥ 0 be given. Assume that the Slater constraint qual-
ification holds. Then x̄ is a quasi ε-solution of (CP ) if and only if there exist
λi ≥ 0, i = 1, 2, . . . , m, such that
0 ∈ ∂f(x̄) + Σ_{i=1}^m λi ∂gi(x̄) + √ε B and λi gi(x̄) = 0, i = 1, 2, . . . , m.

Proof. A quasi ε-solution x̄ of (CP ) can be considered a minimizer of the
convex programming problem

min f(x) + √ε ‖x − x̄‖ subject to x ∈ C,

where C = {x ∈ Rn : gi(x) ≤ 0, i = 1, 2, . . . , m}. By the KKT optimality
condition, Theorem 3.7, x̄ is a minimizer of the above problem if and only if
there exist λi ≥ 0, i = 1, 2, . . . , m, such that

0 ∈ ∂(f + √ε ‖. − x̄‖)(x̄) + Σ_{i=1}^m λi ∂gi(x̄).

As dom f = dom k. − x̄k = Rn , invoking the Sum Rule, Theorem 2.91, along


with the fact that ∂‖. − x̄‖ = B, the above inclusion becomes

0 ∈ ∂f(x̄) + √ε B + Σ_{i=1}^m λi ∂gi(x̄),

along with λi gi (x̄) = 0, i = 1, 2, . . . , m, thereby yielding the requisite condi-


tions.
Conversely, by the approximate optimality condition, there exist
ξ0 ∈ ∂f (x̄), ξi ∈ ∂gi (x̄), i = 1, 2, . . . , m, and b ∈ B such that
0 = ξ0 + Σ_{i=1}^m λi ξi + √ε b. (10.26)

By Definition 2.77 of the subdifferential,

f (x) − f (x̄) ≥ hξ0 , x − x̄i, ∀ x ∈ Rn , (10.27)


gi (x) − gi (x̄) ≥ hξi , x − x̄i, ∀ x ∈ Rn , i = 1, 2, . . . , m, (10.28)

and by the Cauchy–Schwartz inequality, Proposition 1.1,

kbk kx − x̄k ≥ hb, x − x̄i, ∀ x ∈ Rn . (10.29)

Combining the inequalities (10.27), (10.28), and (10.29) along with (10.26)
implies
f (x) − f (x̄) + Σ_{i=1}^m λi gi (x) − Σ_{i=1}^m λi gi (x̄) + √ε kbk kx − x̄k ≥ 0, ∀ x ∈ Rn .

For any x feasible to (CP ), gi (x) ≤ 0, i = 1, 2, . . . , m, which along with the


complementary slackness condition and the fact that λi ≥ 0, i = 1, 2, . . . , m,
reduces the above inequality to

f (x) − f (x̄) + √ε kbk kx − x̄k ≥ 0, ∀ x ∈ C.

As b ∈ B, kbk ≤ 1, thereby leading to



f (x) − f (x̄) + √ε kx − x̄k ≥ 0, ∀ x ∈ C

and thus establishing the requisite result. □


Observe that the above theorem provides a necessary as well as sufficient
characterization to the quasi ε-solution. Here the approximate optimality con-
dition is in terms of B and the subdifferentials, unlike the earlier results of
this chapter where the approximate optimality conditions were expressed in
terms of the ε-subdifferentials. Also, here we obtain the standard complemen-
tary slackness condition instead of the ε-complementary slackness or relaxed
ε-complementary slackness conditions. Results similar to the ε-saddle point
can also be worked out for quasi ε-saddle points. For more details, one can
look into Dutta [37].


10.7 Modified ε-KKT Conditions


In all discussions regarding the KKT optimality conditions in the earlier chap-
ters, it was observed that under some constraint qualification, the optimality
conditions are established at the point of minimizer, that is, the KKT opti-
mality conditions are nothing but point conditions. Due to this very reason,
the KKT conditions have not been widely incorporated in the optimization
algorithm design but only used as stopping criteria. However, if one could find
the direction of the minima using the deviations from the KKT conditions,
it could be useful from an algorithmic point of view. Work has recently been
done in this respect by Dutta, Deb, Arora, and Tulshyan [39]. They introduced
a new notion of modified ε-KKT point and used it to study the convergence
of the sequences of modified ε-KKT points to the minima of the convex pro-
gramming problem (CP ). Below we define this new concept, which is again
motivated by Ekeland's variational principle.
Definition 10.22 A feasible point x̄ of (CP ) is said to be a modified ε-KKT point for a given ε > 0 if there exists x̃ ∈ Rn satisfying kx̃ − x̄k ≤ √ε and there exist ξ̃0 ∈ ∂f (x̃), ξ̃i ∈ ∂gi (x̃) and λi ≥ 0, i = 1, 2, . . . , m, such that

kξ̃0 + Σ_{i=1}^m λi ξ̃i k ≤ √ε and ε + Σ_{i=1}^m λi gi (x̄) ≥ 0.

Observe that in the ε-KKT condition, the subdifferentials are calculated at


some x̃ ∈ B√ε (x̄), whereas the relaxed ε-complementary slackness condition
is satisfied at x̄ itself. Before moving on to the stability part, we try to relate
the already discussed ε-solution with the modified ε-KKT point.
Theorem 10.23 Consider the convex programming problem (CP ) with C
given by (3.1). Assume that the Slater constraint qualification holds and let x̄
be an ε-solution of (CP ). Then x̄ is a modified ε-KKT point.
Proof. Because x̄ is an ε-solution of (CP ), by Theorem 10.13, there exist
λi ≥ 0, i = 1, 2, . . . , m, such that x̄ is also an ε-saddle point along with
ε + Σ_{i=1}^m λi gi (x̄) ≥ 0. (10.30)

As x̄ is an ε-saddle point,

L(x̄, λ) ≤ L(x, λ) + ε, ∀ x ∈ Rn ,

which implies x̄ is an ε-solution of L(., λ) over Rn . Applying Ekeland's variational principle, Theorem 2.113, for √ε, there exists x̃ ∈ Rn satisfying kx̃ − x̄k ≤ √ε such that x̃ is a minimizer of the problem

min L(x, λ) + √ε kx − x̃k subject to x ∈ Rn .


By the unconstrained optimality condition, Theorem 2.89,



0 ∈ ∂(L(., λ) + √ε k. − x̃k)(x̃).

As dom f = dom gi = Rn , i = 1, 2, . . . , m, applying the Sum Rule, Theorem 2.91,

0 ∈ ∂f (x̃) + Σ_{i=1}^m λi ∂gi (x̃) + √ε B,

which implies there exist ξ˜0 ∈ ∂f (x̃), ξ˜i ∈ ∂gi (x̃), i = 1, 2, . . . , m, and b ∈ B
such that
0 = ξ̃0 + Σ_{i=1}^m λi ξ̃i + √ε b,

thereby leading to
kξ̃0 + Σ_{i=1}^m λi ξ̃i k ≤ √ε,

which along with the condition (10.30) implies that x̄ is a modified ε-KKT
point as desired. □
In the case of the exact penalization approach, from Theorem 10.16 we
have that every convergent sequence of ε-solutions of a sequence of penalized
problems converges to an ε-solution of (CP ). Now, is it possible to establish such a result by studying a sequence of modified ε-KKT points? The answer is yes, as shown in the following theorem.

Theorem 10.24 Consider the convex programming problem (CP ) with C


given by (3.1). Assume that the Slater constraint qualification holds and let
{εk } ⊂ R+ such that εk ↓ 0 as k → +∞. For every k, let xk be a modified
εk -KKT point of (CP ) such that xk → x̄ as k → +∞. Then x̄ is a point of
minimizer of (CP ).

Proof. As for every k, xk is a modified εk -KKT point of (CP ), there exists x̃k ∈ Rn satisfying kx̃k − xk k ≤ √εk and there exist ξ0k ∈ ∂f (x̃k ), ξik ∈ ∂gi (x̃k ), and λki ≥ 0, i = 1, 2, . . . , m, such that

kξ0k + Σ_{i=1}^m λki ξik k ≤ √εk and εk + Σ_{i=1}^m λki gi (xk ) ≥ 0. (10.31)

We claim that {λk } ⊂ Rm+ is a bounded sequence. Suppose that {λk } is an unbounded sequence. Define a bounded sequence γ k = λk / kλk k with kγ k k = 1.


Because {γ k } is a bounded sequence, by the Bolzano–Weierstrass Theorem,


Proposition 1.3, it has a convergent subsequence. Without loss of generality,
assume that γ k → γ with kγk = 1. Observe that

kx̃k − x̄k ≤ kx̃k − xk k + kxk − x̄k ≤ √εk + kxk − x̄k.

By the given hypothesis, as k → +∞, εk ↓ 0 and xk → x̄, which implies


x̃k → x̄.
Now dividing both the conditions of (10.31) throughout by kλk k yields
k (1/kλk k) ξ0k + Σ_{i=1}^m γik ξik k ≤ √εk / kλk k and Σ_{i=1}^m γik gi (xk ) ≥ − εk / kλk k.

By Proposition 2.83, f and gi , i = 1, 2, . . . , m, have compact subdifferentials.


Thus, {ξ0k } and {ξik }, i = 1, 2, . . . , m, are bounded sequences and hence by
the Bolzano–Weierstrass Theorem, Proposition 1.3, have a convergent subse-
quence. Without loss of generality, let ξ0k → ξ0 and ξik → ξi , i = 1, 2, . . . , m.
By the Closed Graph Theorem, Theorem 2.84, ξ0 ∈ ∂f (x̄) and ξi ∈ ∂gi (x̄),
i = 1, 2, . . . , m. Therefore, as k → +∞,

(1/kλk k) ξ0k → 0, √εk / kλk k → 0 and εk / kλk k → 0,

which implies kΣ_{i=1}^m γi ξi k ≤ 0, that is, Σ_{i=1}^m γi ξi = 0 and Σ_{i=1}^m γi gi (x̄) ≥ 0.
By Definition 2.77 of the subdifferential,

gi (x) − gi (x̄) ≥ hξi , x − x̄i, ∀ x ∈ Rn , i = 1, 2, . . . , m,

which yields
Σ_{i=1}^m γi gi (x) ≥ Σ_{i=1}^m γi gi (x̄) ≥ 0, ∀ x ∈ Rn ,

thereby contradicting the existence of a point x̂ satisfying gi (x̂) < 0,


i = 1, 2, . . . , m by the Slater constraint qualification. Therefore, the sequence
{λk } is a bounded sequence and hence by the Bolzano–Weierstrass Theorem,
Proposition 1.3, has a convergent subsequence. Without loss of generality, let
λki → λi , i = 1, 2, . . . , m. Taking the limit as k → +∞ in (10.31) yields
kξ0 + Σ_{i=1}^m λi ξi k ≤ 0 and Σ_{i=1}^m λi gi (x̄) ≥ 0. (10.32)

The norm condition in (10.32) implies that


0 = ξ0 + Σ_{i=1}^m λi ξi ,


thereby leading to the optimality condition


0 ∈ ∂f (x̄) + Σ_{i=1}^m λi ∂gi (x̄).

As xk is a modified εk -KKT point for every k, it is a feasible point of (CP ), that is, gi (xk ) ≤ 0, i = 1, 2, . . . , m, which implies gi (x̄) ≤ 0, i = 1, 2, . . . , m, as k → +∞. This along with the condition in (10.32) leads to the complementary slackness condition

Σ_{i=1}^m λi gi (x̄) = 0.

Hence, x̄ satisfies the standard KKT optimality conditions. As (CP ) is a


convex programming problem, by the sufficient optimality conditions, x̄ is a
point of minimizer of (CP ), thereby establishing the desired result. □
Theorems 10.23 and 10.24 can be combined and stated as follows.

Theorem 10.25 Consider the convex programming problem (CP ) with C


given by (3.1). Assume that the Slater constraint qualification holds. Let {xk }
be a sequence of εk -solutions of (CP ) such that xk → x̄ and εk ↓ 0 as
k → +∞. Then x̄ is a point of minimizer of (CP ).
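Theorems 10.23–10.25 can be visualized on a smooth toy problem, where the subdifferentials reduce to gradients. The following Python sketch (ours, not from the text; the problem data, the simple choice x̃ = xk, and the multiplier formula are illustrative assumptions) checks Definition 10.22 along a sequence of feasible points converging to the minimizer as εk ↓ 0.

```python
import numpy as np

# toy convex program:  min (x1-2)^2 + (x2-2)^2   s.t.  g(x) = x1^2 + x2^2 - 2 <= 0,
# whose minimizer is xbar = (1, 1) with Lagrange multiplier 1.
grad_f = lambda x: 2.0 * (x - np.array([2.0, 2.0]))
g      = lambda x: x[0] ** 2 + x[1] ** 2 - 2.0
grad_g = lambda x: 2.0 * x

def is_modified_eps_kkt(xk, eps):
    """Check Definition 10.22 with the simplest admissible choice xtilde = xk
    (so ||xtilde - xk|| = 0 <= sqrt(eps)); lambda is the nonnegative minimizer
    of ||grad f(xk) + lambda * grad g(xk)||."""
    if g(xk) > 0.0:                                   # the point must be feasible
        return False
    gf, gg = grad_f(xk), grad_g(xk)
    lam = max(0.0, -float(gf @ gg) / float(gg @ gg))  # best multiplier for xtilde = xk
    norm_ok = np.linalg.norm(gf + lam * gg) <= np.sqrt(eps)
    comp_ok = eps + lam * g(xk) >= 0.0
    return norm_ok and comp_ok

xbar = np.array([1.0, 1.0])
for k in range(1, 7):
    eps_k = 10.0 ** (-k)
    xk = (1.0 - 0.1 * eps_k) * xbar                   # feasible points converging to xbar
    print(f"eps_k = {eps_k:.0e}  modified eps_k-KKT: {is_modified_eps_kkt(xk, eps_k)}")
```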

10.8 Duality-Based Approach to ε-Optimality


In this chapter, in all the results on approximate optimality conditions, we
have assumed the Slater constraint qualification. But what if neither the Slater
nor any other constraint qualification is satisfied? Work has been done in this
respect by Yokoyama [113] using the exact penalization approach. In this
work, he replaced the assumption of Slater constraint qualification by relating
the penalty parameter with the ε-maximum solution of the dual problem
associated with (CP ). The results were obtained relating the ε-solutions of the
given problem (CP ), its dual problem, and the penalized problem. Here we will
discuss some of his results in comparison to the ones derived in Section 10.5.
For that purpose, we associate the dual problem

sup w(λ) subject to λ ∈ Rm , (DP )

where w(λ) = inf_{x∈Rn} L(x, λ) and L(x, λ) is the Lagrange function given by

L(x, λ) = f (x) + Σ_{i=1}^m λi gi (x) if λi ≥ 0, i = 1, 2, . . . , m, and L(x, λ) = −∞ otherwise.


Denote the duality gap by θ = inf x∈C f (x) − supλ∈Rm w(λ). Next we present
the theorem relating the ε-solution of (CP )ρ with the almost ε-solution of
(CP ) under the assumption of the ε-maximum solution of (DP ). Recall the
penalized problem
min fρ (x) subject to x ∈ Rn , (CP )ρ
where fρ (x) = f (x) + Σ_{i=1}^m ρi max{0, gi (x)} and ρ = (ρ1 , ρ2 , . . . , ρm ) with ρi > 0, i = 1, 2, . . . , m.
Theorem 10.26 Consider the convex programming problem (CP ) with C
given by (3.1) and its associated dual problem (DP ). Then for ρ satisfying
ρ ≥ 3 + max_{i=1,...,m} λ̄i + θ/ε,
where λ̄ = (λ̄1 , λ̄2 , . . . , λ̄m ) is an ε-maximum solution of (DP ), every x̄ that
is an ε-solution of (CP )ρ is also an almost ε-solution of (CP ).
Proof. Consider an ε-solution x̂ ∈ C of (CP ), that is,
f (x̂) ≤ inf_{x∈C} f (x) + ε.

As x̄ is an ε-solution of (CP )ρ , in particular,


fρ (x̄) ≤ fρ (x̂) + ε = f (x̂) + ε,
which implies that
f (x̄) + ρ Σ_{i=1}^m max{0, gi (x̄)} ≤ inf_{x∈C} f (x) + 2ε. (10.33)

By the definition of duality gap θ,


inf_{x∈C} f (x) = sup_{λ∈Rm} w(λ) + θ. (10.34)

For an ε-maximum solution λ̄ of the dual problem (DP ),


sup_{λ∈Rm} w(λ) ≤ w(λ̄) + ε ≤ f (x̄) + Σ_{i=1}^m λ̄i gi (x̄) + ε. (10.35)

Therefore, using the conditions (10.34) and (10.35), (10.33) becomes


f (x̄) + ρ Σ_{i=1}^m max{0, gi (x̄)} ≤ f (x̄) + Σ_{i=1}^m λ̄i gi (x̄) + 3ε + θ,

that is,
ρ Σ_{i=1}^m max{0, gi (x̄)} ≤ Σ_{i=1}^m λ̄i gi (x̄) + 3ε + θ.


Define the index set I> = {i ∈ {1, 2, . . . , m} : gi (x̄) > 0}. Thus

ρ Σ_{i∈I>} gi (x̄) = ρ Σ_{i=1}^m max{0, gi (x̄)} ≤ Σ_{i=1}^m λ̄i gi (x̄) + 3ε + θ ≤ Σ_{i∈I>} λ̄i gi (x̄) + 3ε + θ,

which implies
(ρ − max_{i=1,...,m} λ̄i ) Σ_{i∈I>} gi (x̄) ≤ Σ_{i∈I>} (ρ − λ̄i ) gi (x̄) ≤ 3ε + θ.

From the above condition and the given hypothesis on ρ,


Σ_{i∈I>} gi (x̄) ≤ (3ε + θ)/(ρ − max_{i=1,...,m} λ̄i ) ≤ ε,

thereby implying that x̄ ∈ Cε = {x ∈ Rn : gi (x) ≤ ε, i = 1, 2, . . . , m}.


Also, f (x̄) ≤ fρ (x̄). As x̄ is an ε-solution of (CP )ρ ,
fρ (x̄) ≤ fρ (x) + ε, ∀ x ∈ Rn ,
which along with the fact that f (x) = fρ (x) for every x ∈ C leads to
f (x̄) ≤ f (x) + ε, ∀ x ∈ C,
thus implying that x̄ is an almost ε-solution of (CP ). □
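A one-dimensional numerical illustration of Theorem 10.26 (ours; the data, the grid search, and the candidate ε-solution are arbitrary choices, and the dual maximizer λ̄ = 2 with θ = 0 is computed by hand) is sketched below: for a penalty parameter above the stated bound, an ε-solution of (CP )ρ found on a grid is verified to be an almost ε-solution of (CP ).

```python
import numpy as np

# (CP):  min f(x) = (x - 2)^2  s.t.  g(x) = x - 1 <= 0     (minimum value 1 at x = 1)
# The dual optimum is attained at lambda_bar = 2 with duality gap theta = 0, so the
# theorem asks for a penalty parameter rho >= 3 + lambda_bar + theta/eps = 5.
f = lambda x: (x - 2.0) ** 2
g = lambda x: x - 1.0

eps, theta, lam_bar = 0.1, 0.0, 2.0
rho = 3.0 + lam_bar + theta / eps + 0.5          # any rho above the bound works

f_rho = lambda x: f(x) + rho * np.maximum(0.0, g(x))

xs = np.linspace(-5.0, 5.0, 200001)              # grid over R (illustrative only)
inf_f_rho = f_rho(xs).min()
inf_f_C   = f(xs[g(xs) <= 0.0]).min()

# pick some eps-solution of the penalized problem (CP)_rho
candidates = xs[f_rho(xs) <= inf_f_rho + eps]
xbar = candidates[-1]                            # the largest one violates g the most

almost_feasible = g(xbar) <= eps                 # g_i(xbar) <= eps
almost_optimal  = f(xbar) <= inf_f_C + eps       # f(xbar) <= inf_C f + eps
print(f"xbar = {xbar:.4f}, g(xbar) = {g(xbar):.4f}, f(xbar) = {f(xbar):.4f}")
print("almost eps-solution of (CP):", almost_feasible and almost_optimal)
```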
This result is the same as Theorem 10.17 except for the bound on the penalty parameter. Recall from Theorem 10.17 that ρ ≥ (α + ε)/ε, where α = inf_{x∈C} f (x) − inf_{x∈Rn} f (x). Also in that result the Slater constraint qual-
ification was not assumed. Observe that both the results are similar but the
parameter bounds are different. Under the Slater constraint qualification, it is
known that strong duality holds and thus the duality gap θ = 0 and the dual
problem (DP ) is solvable. Consequently, under the Slater constraint qualifi-
cation, the bound on the parameter now becomes
ρ ≥ 3 + max_{i=1,...,m} λ̄i ,

where λ̄ = (λ̄1 , λ̄2 , . . . , λ̄m ) is a maximizer of (DP ). Here we were discussing


the existence of an almost ε-solution of (CP ), given an ε-solution of (CP )ρ .
From the discussion in Section 10.5, it is seen that under the Slater con-
straint qualification and for ρ ≥ ρ0 with ρ0 given in Theorem 10.19,
inf_{x∈C} f (x) = inf_{x∈Rn} fρ (x),

thereby implying that every x̄ that is an ε-solution of (CP ) is also an ε-solution


of (CP )ρ . So in the absence of any constraint qualification, Yokoyama [113] obtained that x̄ is a (2ε + θ)-solution of (CP )ρ , as presented below.


Theorem 10.27 Consider the convex programming problem (CP ) with C


given by (3.1) and its associated dual problem (DP ). Then for ρ satisfying

ρ ≥ max λ̄i ,
i=1,...,m

where λ̄ = (λ̄1 , λ̄2 , . . . , λ̄m ) is an ε-maximum solution of (DP ), every x̄ that


is an ε-solution of (CP ) is also a (2ε + θ)-solution of (CP )ρ .

Proof. As x̄ is an ε-solution of (CP ), x̄ ∈ C, which implies

fρ (x̄) = f (x̄) ≤ inf_{x∈C} f (x) + ε.

As λ̄ is an ε-maximum solution of (DP ), working along the lines of Theo-


rem 10.26, the above condition becomes
fρ (x̄) ≤ f (x) + Σ_{i=1}^m λ̄i gi (x) + 2ε + θ, ∀ x ∈ Rn .

Using the hypothesis on ρ, the above inequality leads to


fρ (x̄) ≤ f (x) + ρ Σ_{i=1}^m max{0, gi (x)} + 2ε + θ = fρ (x) + 2ε + θ, ∀ x ∈ Rn ,

thereby implying that x̄ is a (2ε + θ)-solution of (CP )ρ . □


It was mentioned by Yokoyama [113] that in the presence of the Slater con-
straint qualification and with λ̄ as some optimal Lagrange multiplier, every
x̄ that is an ε-solution of (CP ) is also an ε-solution of (CP )ρ . In his work,
Yokoyama also derived the necessary approximate optimality conditions as
established in this chapter in the absence of any constraint qualification. The
sufficiency could be established only under the assumption of the Slater con-
straint qualification.



Chapter 11
Convex Semi-Infinite Optimization

11.1 Introduction
In all the preceding chapters we considered the convex programming problem
(CP ) with the feasible set C of the form (3.1), that is,

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},

where gi : Rn → R, i = 1, 2, . . . , m, are convex functions. Observe that the


problem involved only a finite number of constraints. Now in situations where
the number of constraints involved is infinite, the problem extends to the class
of semi-infinite programming problems. Such problems come into existence in
many physical and social sciences models where it is necessary to consider
the constraints on the state or the control of the system during a period of
time. For examples from real-life scenarios where semi-infinite programming
problems are involved, readers may refer to Hettich and Kortanek [57] and
references therein. We consider the following convex semi-infinite programming
problem,
inf f (x) subject to g(x, i) ≤ 0, i ∈ I (SIP )
where f, g(., i) : Rn → R, i ∈ I are convex functions with infinite index set
I ⊂ Rm . The term “semi-infinite programming” is derived from the fact that
the decision variable x is finite while the index set I is infinite. But before
moving on with the derivation of KKT optimality conditions for (SIP ), we
present some notations that will be used in subsequent sections.
Denote the feasible set of (SIP ) by CI , that is,

CI = {x ∈ Rn : gi (x) ≤ 0, i ∈ I}.

Let RI be the product space of λ = (λi ∈ R : i ∈ I) and

R[I] = {λ ∈ RI : λi ≠ 0 for finitely many i ∈ I},

while the positive cone in R[I] , R+[I] , is defined as

R+[I] = {λ ∈ R[I] : λi ≥ 0, ∀ i ∈ I}.


For a given z ∈ RI and λ ∈ R[I] , define the supporting set of λ as


supp λ = {i ∈ I : λi ≠ 0},

hλ, zi = Σ_{i∈I} λi zi = Σ_{i∈supp λ} λi zi .
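Computationally, an element λ ∈ R[I] is conveniently stored through its finite support. The following Python fragment (purely illustrative, not from the text) represents such a λ as a dictionary and evaluates hλ, zi by summing only over supp λ.

```python
# lambda in R^[I]: only finitely many indices carry a nonzero value,
# so a dictionary {i: lambda_i} is a natural representation via supp lambda.
lam = {0.0: 1.5, 0.25: 2.0, 1.0: 0.5}       # supp lam = {0.0, 0.25, 1.0} inside I = [0, 1]

def pairing(lam, z):
    """<lam, z> = sum over supp lam of lam_i * z(i), for z given as a function on I."""
    return sum(lam_i * z(i) for i, lam_i in lam.items())

z = lambda i: i ** 2                         # a sample element z of R^I, viewed as a function
print(sorted(lam))                           # the support of lam
print(pairing(lam, z))                       # 1.5*0 + 2.0*0.0625 + 0.5*1 = 0.625
```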

With these notations, we now move on to study the various approaches to


obtain the KKT optimality conditions for (SIP ).

11.2 Sup-Function Approach


A possible approach to solve (SIP ) is to associate a problem with a finite number of constraints, that is, the reduced form of (SIP )

inf f (x) subject to g(x, i) ≤ 0, i ∈ Ĩ, (SIP~)

where Ĩ ⊂ I is finite and f and g(., i), i ∈ Ĩ, are as in (SIP ) such that the optimal values of (SIP ) and the reduced problem (SIP~) coincide. Then (SIP~) is said to be the equivalent reduced problem of (SIP ). One way to reduce (SIP ) to an equivalent (SIP~) is to replace the infinite inequality constraints by a single constraint,

g̃(x) = sup_{i∈I} g(x, i),

where g̃ : Rn → R̄ is a convex function by Proposition 2.53 (iii). Therefore, the reduced problem is

inf f (x) subject to g̃(x) ≤ 0. (SIP~sup)
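When the index set can be sampled, g̃ is straightforward to evaluate approximately. The sketch below (our illustration; the data g(x, i) = x1 cos i + x2 sin i − 1 on I = [0, 2π] and the uniform sampling are assumptions) forms a sampled sup-function whose exact value here is kxk − 1, so that CI is the closed unit disk.

```python
import numpy as np

# g(x, i) = x1*cos(i) + x2*sin(i) - 1,  i in I = [0, 2*pi]; each g(., i) is affine,
# and g_tilde(x) = sup_i g(x, i) = ||x|| - 1, so C_I is the closed unit disk.
I_samples = np.linspace(0.0, 2.0 * np.pi, 2001)      # discretization of the index set

def g(x, i):
    return x[0] * np.cos(i) + x[1] * np.sin(i) - 1.0

def g_tilde(x):
    return max(g(x, i) for i in I_samples)            # sampled sup-function

for x in [np.array([0.3, 0.4]), np.array([0.6, 0.8]), np.array([1.5, 0.0])]:
    print(x, round(g_tilde(x), 6), round(np.linalg.norm(x) - 1.0, 6))
# the sampled g_tilde(x) matches ||x|| - 1 up to the discretization error
```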
Such a formulation was studied by Pshenichnyi [96], where g(., i), for every i ∈ I, were taken to be convex differentiable functions. Observe that (SIP~sup) is of the form (CP ) studied in Chapter 3. It was seen that under the Slater constraint qualification, the standard KKT optimality conditions for (CP ) can be obtained. Therefore, to apply Theorem 3.7, (SIP~sup) should satisfy
the Slater constraint qualification. But this problem is equivalent to (SIP ),
for which we introduce the following Slater constraint qualification for (SIP ).

Definition 11.1 The Slater constraint qualification for (SIP ) is

(i) I ⊂ Rm is a compact set,

(ii) g(x, i) is a continuous function of (x, i) ∈ Rn × I,

(iii) There exists x̂ ∈ Rn such that g(x̂, i) < 0 for every i ∈ I.


Observe that in the Slater constraint qualification for (CP ), only condition
(iii) is considered. Here the additional conditions (i) and (ii) ensure that the
supremum is attained over I, which holds trivially in the finite index set
scenario. We now present the KKT optimality condition for (SIP ).
Theorem 11.2 Consider the convex semi-infinite programming problem
(SIP ). Assume that the Slater constraint qualification for (SIP ) holds. Then
x̄ ∈ Rn is a point of minimizer of (SIP ) if and only if there exists λ ∈ R+[I(x̄)] such that

0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i),

where I(x̄) = {i ∈ I : g(x̄, i) = 0} denotes the active index set and the
subdifferential ∂g(x̄, i) is with respect to x.

Proof. As already observed, (SIP ) is equivalent to (SIP~sup) and thus x̄ is also a point of minimizer of (SIP~sup). As the Slater constraint qualification for (SIP ) holds, by conditions (i) and (ii) the supremum is attained over I. Therefore, by condition (iii) of the Slater constraint qualification for (SIP ), there exists x̂ ∈ Rn such that

g̃(x̂) = sup_{i∈I} g(x̂, i) < 0,

which implies that (SIP~sup) satisfies the Slater constraint qualification. In-

which implies that (SIP
voking Theorem 3.7, there exists λ′ ≥ 0 such that
0 ∈ ∂f (x̄) + λ′ ∂g̃(x̄) and λ′ g̃(x̄) = 0. (11.1)
Now we consider two cases depending on g̃(x̄).
(i) g̃(x̄) < 0: By the complementary slackness condition λ′ = 0. Also, because
g̃(x̄) < 0, g(x̄, i) < 0 for every i ∈ I, which implies the active index set I(x̄)
is empty. Thus the optimality condition (11.1) reduces to
0 ∈ ∂f (x̄),
and the KKT optimality condition for (SIP ) holds with λ = 0 ∈ R+[I] .
(ii) g̃(x̄) = 0: Here the complementary slackness condition holds trivially and λ′ ≥ 0. Define the supremum set as

Î(x̄) = {i ∈ I : g(x̄, i) = g̃(x̄)} = {i ∈ I : g(x̄, i) = 0},
which implies that Î(x̄) = I(x̄). By the conditions (i) and (ii) of the Slater constraint qualification for (SIP ), Î(x̄) and hence I(x̄) is nonempty. By the Valadier formula, Theorem 2.97, the optimality condition becomes

0 ∈ ∂f (x̄) + Σ_{i∈I(x̄)} λi ∂g(x̄, i),

where λi = λ′ λ̄i ≥ 0, i ∈ I(x̄), with λ̄ ∈ R+[I(x̄)] satisfying Σ_{i∈supp λ̄} λ̄i = 1. As λ′ ≥ 0, λ ∈ R+[I(x̄)] and thus the preceding optimality condition can be expressed as

0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i).

Thus, the KKT optimality condition is obtained for (SIP ).


Conversely, suppose that the optimality condition holds, which implies
that there exist ξ ∈ ∂f (x̄) and ξi ∈ ∂g(x̄, i) such that
0 = ξ + Σ_{i∈supp λ} λi ξi , (11.2)

where λ ∈ R+[I(x̄)] . By Definition 2.77 of the subdifferential, for every x ∈ Rn ,

f (x) ≥ f (x̄) + hξ, x − x̄i,


g(x, i) ≥ g(x̄, i) + hξi , x − x̄i, i ∈ supp λ,

which along with the condition (11.2) implies that


f (x) + Σ_{i∈supp λ} λi g(x, i) ≥ f (x̄) + Σ_{i∈supp λ} λi g(x̄, i), ∀ x ∈ Rn .

The above inequality along with the fact that g(x̄, i) = 0, i ∈ I(x̄) leads to
f (x) + Σ_{i∈supp λ} λi g(x, i) ≥ f (x̄), ∀ x ∈ Rn .

In particular, for x ∈ CI , that is, g(x, i) ≤ 0, i ∈ I, the above condition


reduces to

f (x) ≥ f (x̄), ∀ x ∈ CI ,

thereby implying that x̄ is a minimizer of (SIP ). □
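As a concrete check of the optimality condition just proved (our own illustration, not from the text), take f(x) = x1 and g(x, i) = x1 cos i + x2 sin i − 1 with I = [0, 2π]; then CI is the closed unit disk, the minimizer is x̄ = (−1, 0), the only active index is i = π, and the choice λπ = 1 satisfies the KKT condition. The sketch below verifies this numerically.

```python
import numpy as np

# (SIP) data:  f(x) = x1,  g(x, i) = x1*cos(i) + x2*sin(i) - 1,  I = [0, 2*pi].
# C_I is the closed unit disk; the minimizer of f over C_I is xbar = (-1, 0).
xbar = np.array([-1.0, 0.0])
grad_f = np.array([1.0, 0.0])
grad_g = lambda i: np.array([np.cos(i), np.sin(i)])     # gradient of g(., i) at any x

# active index set I(xbar) = {i : g(xbar, i) = 0}; here g(xbar, i) = -cos(i) - 1,
# which vanishes only at i = pi
I_samples = np.linspace(0.0, 2.0 * np.pi, 20001)
g_at_xbar = -np.cos(I_samples) - 1.0
active = I_samples[np.abs(g_at_xbar) < 1e-6]
print("active indices (approx):", active)               # clustered around pi

# KKT condition of Theorem 11.2 with supp(lambda) = {pi} and lambda_pi = 1:
lam_pi = 1.0
residual = grad_f + lam_pi * grad_g(np.pi)
print("KKT residual:", residual)                        # (0, 0) up to round-off
```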

11.3 Reduction Approach


As already mentioned in the preceding section, the reduction approach is one
possible method to establish the KKT optimality condition for (SIP ). The
sup-function approach was one such reduction technique. Another way to formulate an equivalent (SIP~) is to use the approach by Ben-Tal, Rosinger, and
Ben-Israel [9] to derive a Helly-type Theorem for open convex sets using the


result by Klee [72]. But this approach was a bit difficult to follow. So Bor-
wein [16] provided a self-contained proof of the reduction approach involving
quasiconvex functions. Here we present the same under the assumptions that
g(., i) is convex for every i ∈ I and g(., .) is jointly continuous as a function
of (x, i) ∈ Rn × I. In the proof, one only needs g(x, i) to be jointly usc as a
function of (x, i) ∈ Rn × I along with the convexity assumption.

Proposition 11.3 Consider open and closed convex sets U ⊂ Rn and


C ⊂ Rn , respectively. The following are equivalent when I is compact.

(i) There exists x ∈ C and ε > 0 such that

x + εB ⊂ U, g(y, i) < 0, ∀ y ∈ x + εB, ∀ i ∈ I.

(ii) (a) For every set of n + 1 points {i0 , i1 , . . . , in } ⊂ I, there exists x ∈ C


such that

g(x, i0 ) < 0, g(x, i1 ) < 0, . . . , g(x, in ) < 0.

(b) For every set of n points {i1 , i2 , . . . , in } ⊂ I, there exists x ∈ C such


that

x ∈ U, g(x, i1 ) < 0, g(x, i2 ) < 0, . . . , g(x, in ) < 0.

Proof. It is obvious that (i) implies (ii)(b). Also, in particular, taking


y = x ∈ x + εB in (i) yields (ii)(a). Therefore, to establish the result, we show
that (ii) implies (i).
Suppose that both (ii)(a) and (b) are satisfied. We first prove that (ii)(a)
implies (i) with U = Rn . For any r ∈ N and any i ∈ I, define the set

C r (i) = {x ∈ C ∩ r cl B : g(y, i) < 0, ∀ y ∈ x + (1/r) B}.
Observe that C r (i) ⊂ r cl B and hence is bounded.
We claim that C r (i) is convex. Consider x1 , x2 ∈ C r (i), which implies that
xj ∈ C ∩ r cl B, j = 1, 2. Because C and cl B are convex sets, C ∩ r cl B is
also convex. Thus,

(1 − λ)x1 + λx2 ∈ C ∩ r cl B, ∀ λ ∈ [0, 1].

For any yj ∈ xj + (1/r) B, j = 1, 2,

y = (1 − λ)y1 + λy2 ∈ (1 − λ)(x1 + (1/r) B) + λ(x2 + (1/r) B) ⊂ (1 − λ)x1 + λx2 + (1/r) B.


As x1 , x2 ∈ C r (i), for j = 1, 2,
g(yj , i) < 0, ∀ yj ∈ xj + (1/r) B.
By the convexity of g(., i), for any λ ∈ [0, 1],
g(y, i) ≤ (1 − λ)g(y1 , i) + λg(y2 , i) < 0.
Because the above conditions hold for arbitrary yj ∈ xj + (1/r) B, j = 1, 2,

g(y, i) < 0, ∀ y ∈ (1 − λ)x1 + λx2 + (1/r) B.
Therefore, from the definition of C r (i), it is obvious that
(1 − λ)x1 + λx2 ∈ C r (i), ∀ λ ∈ [0, 1].
Because x1 , x2 ∈ C r (i) are arbitrary, C r (i) is a convex set.
Next we prove that C r (i) is closed. Suppose that x̄ ∈ cl C r (i), which
implies there exists a sequence {xk } ⊂ C r (i) with xk → x̄. Because xk ∈ C r (i),
xk ∈ C ∩ r cl B such that
g(y, i) < 0, ∀ y ∈ xk + (1/r) B. (11.3)

Because C and cl B are closed sets, C ∩ r cl B is also closed and thus x̄ ∈ C ∩ r cl B. Now if x̄ ∉ C r (i), there exists some ȳ ∈ x̄ + (1/r) B such that g(ȳ, i) ≥ 0. As xk → x̄, for sufficiently large k, ȳ ∈ xk + (1/r) B with g(ȳ, i) ≥ 0,
which is a contradiction to condition (11.3). Thus C r (i) is a closed set.
Finally, we claim that for some r̄ ∈ N and every set of n + 1 points
{i0 , i1 , . . . , in } ⊂ I,
∩_{j=0}^n C r̄ (ij ) ≠ ∅.

On the contrary, suppose that for every r ∈ N, there exist n + 1 points


{ir0 , ir1 , . . . , irn } ⊂ I such that
∩_{j=0}^n C r (irj ) = ∅. (11.4)

Define the sequence sr = (ir0 , ir1 , . . . , irn ) ∈ I n+1 . As I is a compact set,


I n+1 is also compact and thus {sr } is a bounded sequence. By the Bolzano–
Weierstrass Theorem, Proposition 1.3, it has a convergent subsequence. With-
out loss of generality, assume that sr → s̄, where s̄ = (ī0 , ī1 , . . . , īn ) ∈ I n+1 .
As (ii)(a) is satisfied, there exists x̄ ∈ C such that
g(x̄, ī0 ) < 0, g(x̄, ī1 ) < 0, . . . , g(x̄, īn ) < 0.


Because g(., .) is jointly continuous on (x, i) ∈ Rn × I, hence jointly usc on


(x, i) ∈ Rn × I. Therefore, by the above condition there exist ε > 0 and a
neighborhood of īj , N (īj ), j = 0, 1, . . . , n, such that

g(y, ij ) < 0, ∀ y ∈ x̄ + εB, ∀ ij ∈ N (īj ), j = 0, 1, . . . , n. (11.5)

As irj → īj , one may choose r̄ ∈ N sufficiently large such that

kx̄k ≤ r̄, ε > 1/r̄ and ir̄j ∈ N (īj ), j = 0, 1, . . . , n. (11.6)

Combining (11.5) and (11.6), x̄ ∈ C ∩ r̄ cl B such that
g(y, ir̄j ) < 0, ∀ y ∈ x̄ + (1/r̄) B.

Therefore, x̄ ∈ C r̄ (ir̄j ) for every j = 0, 1, . . . , n, which contradicts our as-
sumption (11.4). Thus, for some r̄ ∈ N and every set of n + 1 points
{i0 , i1 , . . . , in } ⊂ I,
∩_{j=0}^n C r̄ (ij ) ≠ ∅.

As C r̄ (ij ), j = 0, 1, . . . , n, are nonempty compact convex sets, invoking Helly’s


Theorem, Proposition 2.28,
∩_{i∈I} C r̄ (i) ≠ ∅.

From the above condition, there exists x̃ ∈ C r̄ (i) for every i ∈ I, which implies x̃ ∈ C such that

g(y, i) < 0, ∀ y ∈ x̃ + (1/r̄) B, ∀ i ∈ I.

Taking U = Rn and defining ε = 1/r̄, the above condition yields (i).
To complete the proof, we finally have to show that (ii)(b) also implies (i). This can be done by expressing (ii)(b) in the form of (ii)(a). Consider a point i′ ∉ I and define I ′ = {i′ } ∪ I, which is again a compact set. Also define the function g ′ on Rn × I ′ as

g ′ (x, i′ ) = −δ if x ∈ U, g ′ (x, i′ ) = +∞ if x ∉ U, and g ′ (x, i) = g(x, i), i ∈ I,

where δ > 0. Observe that g ′ (., i), i ∈ I ′ , satisfies the convexity assumption and is jointly usc on Rn × I ′ . Therefore, (ii)(b) is equivalent to the existence, for every n points {i1 , i2 , . . . , in } ⊂ I, of x ∈ C such that

g ′ (x, i′ ) < 0, g ′ (x, i1 ) < 0, . . . , g ′ (x, in ) < 0. (11.7)


As (ii)(a) is also satisfied, for every n+1 points {i0 , i1 , . . . , in } ⊂ I there exists
x ∈ C such that

g ′ (x, i0 ) < 0, g ′ (x, i1 ) < 0, . . . , g ′ (x, in ) < 0. (11.8)

Combining the conditions (11.7) and (11.8), (ii)(b) implies that for every n+1
points {i0 , i1 , . . . , in } ⊂ I ′ there exists x ∈ C such that

g ′ (x, i0 ) < 0, g ′ (x, i1 ) < 0, . . . , g ′ (x, in ) < 0,

which is of the form (ii)(a). As we have already seen that (ii)(a) implies (i) with U = Rn , there exist x ∈ C and ε > 0 such that

g ′ (y, i) < 0, ∀ y ∈ x + εB, ∀ i ∈ I ′ ,

which by the definition of the function g ′ implies that

y ∈ U, g(y, i) < 0, ∀ y ∈ x + εB, ∀ i ∈ I,

that is,

x + εB ⊂ U, g(y, i) < 0, ∀ y ∈ x + εB, ∀ i ∈ I.

Thus, (ii)(b) implies (i) and hence establishes the result. □


Using the above proposition, Borwein [16] obtained the equivalent reduced
form of (SIP ) under the relaxed Slater constraint qualification. The convex
semi-infinite programming (SIP ) is said to satisfy the relaxed Slater constraint
qualification for (SIP ) if given any n+1 points {i0 , i1 , . . . , in } ⊂ I, there exists
x̂ ∈ Rn such that

g(x̂, i0 ) < 0, g(x̂, i1 ) < 0, . . . , g(x̂, in ) < 0.

Observe that the Slater constraint qualification for (SIP ) also implies the relaxed Slater constraint qualification for (SIP ). Now we present the KKT optimality condition for (SIP ) by reducing it to the equivalent (SIP~).

Theorem 11.4 Consider the convex semi-infinite programming problem


(SIP ). Suppose that the relaxed Slater constraint qualification for (SIP ) holds.
Then x̄ is a point of minimizer of (SIP ) if and only if there exist n points
{i1 , i2 , . . . , in } ⊂ I, λij ≥ 0, j = 1, 2, . . . , n, such that
0 ∈ ∂f (x̄) + Σ_{j=1}^n λij ∂g(x̄, ij ).

Proof. Define an open set

U = {x ∈ Rn : f (x) < f (x̄)}.


Consider x1 , x2 ∈ U . By the convexity of f ,

f ((1 − λ)x1 + λx2 ) ≤ (1 − λ)f (x1 ) + λf (x2 ) < f (x̄), ∀ λ ∈ [0, 1],

which implies (1 − λ)x1 + λx2 ∈ U . Because x1 , x2 ∈ U were arbitrary, U is


a convex set. As x̄ is a point of minimizer of (SIP ), there does not exist any
x ∈ Rn and ε > 0 such that

x + εB ⊂ U, g(y, i) < 0, ∀ y ∈ x + εB, ∀ i ∈ I,

which implies that (i) of Proposition 11.3 does not hold. Therefore either
(ii)(a) or (ii)(b) is not satisfied. As the relaxed Slater constraint qualification
for (SIP ), which is the same as (ii)(a), holds, (ii)(b) cannot be satisfied. Thus,
there exist n points {i1 , i2 , . . . , in } ⊂ I such that

f (x) < f (x̄), g(x, ij ) < 0, j = 1, 2, . . . , n, (11.9)

has no solution. We claim that x̄ is a point of minimizer of the reduced problem


inf f (x) subject to g(x, ij ) ≤ 0, j = 1, 2, . . . , n. (SIP~)

Consider a feasible point x̃ of (SIP~), that is,

g(x̃, ij ) ≤ 0, j = 1, 2, . . . , n. (11.10)

Also, by the relaxed Slater constraint qualification for (SIP ), corresponding to the n + 1 points {i0 , i1 , i2 , . . . , in } ⊂ I with {i1 , i2 , . . . , in } as in (SIP~), there exists x̂ such that

g(x̂, ij ) < 0, j = 0, 1, 2, . . . , n. (11.11)

By the convexity of g(., ij ), j = 1, 2, . . . , n, along with the conditions (11.10)


and (11.11),

g((1 − λ)x̃ + λx̂, ij ) ≤ (1 − λ)g(x̃, ij ) + λg(x̂, ij ) < 0, ∀ λ ∈ (0, 1).

Because the system (11.9) has no solution,

f ((1 − λ)x̃ + λx̂) ≥ f (x̄), ∀ λ ∈ (0, 1). (11.12)

As dom f = Rn , by Theorem 2.69, f is continuous on Rn . Thus, taking the


limit as λ → 0, the inequality (11.12) leads to

f (x̃) ≥ f (x̄).

Because x̃ is an arbitrary feasible point of (SIP~), x̄ is a point of minimizer of (SIP~). Observe that by (11.11), the Slater constraint qualification is satisfied


by the reduced problem. Therefore, invoking Theorem 3.7, there exist λij ≥ 0,
j = 1, 2, . . . , n, such that
0 ∈ ∂f (x̄) + Σ_{j=1}^n λij ∂g(x̄, ij ) and λij g(x̄, ij ) = 0, j = 1, 2, . . . , n. (11.13)

We claim that λ ∈ R+[I(x̄)] . From the complementary slackness condition in the optimality condition (11.13), if ij ∉ I(x̄), λij = 0, whereas for ij ∈ I(x̄), λij ≥ 0. For i ∉ {i1 , i2 , . . . , in } but i ∈ I(x̄), define λi = 0. Therefore, λ ∈ R+[I(x̄)] such that

0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i),

thereby yielding the KKT optimality condition for (SIP ). The converse can
be worked out along the lines of Theorem 11.2. □

11.4 Lagrangian Regular Point


In both the preceding sections on reduction techniques to establish the KKT
optimality condition for (SIP ), the index set I was taken to be compact. But
then what about the scenarios where the index set I need not be compact?
To look into such situations, López and Vercher [75] introduced the concept
of Lagrangian regular point, which we present next. Before we define this
concept, we introduce the following notations.
For x̄ ∈ CI having nonempty I(x̄), define
S̄(x̄) = {∂g(x̄, i) ∈ Rn : i ∈ I(x̄)} = ∪_{i∈I(x̄)} ∂g(x̄, i)

and
Ŝ(x̄) = cone co S̄(x̄) = { Σ_{i∈supp λ} λi ∂g(x̄, i) ∈ Rn : λ ∈ R+[I(x̄)] }.

For any i ∈ I(x̄), consider ξi ∈ ∂g(x̄, i) ⊂ S̄(x̄). By Definition 2.77 of the


subdifferential,

g(x, i) − g(x̄, i) ≥ hξi , x − x̄i, ∀ x ∈ Rn .

In particular, for {xk } ⊂ CI , that is, g(xk , i) ≤ 0 along with the fact that
g(x̄, i) = 0, i ∈ I(x̄), the above inequality reduces to

hξi , xk − x̄i ≤ 0, ∀ k ∈ N.


For any {αk } ⊂ R+ ,

hξi , αk (xk − x̄)i ≤ 0, ∀ k ∈ N.

Taking the limit as k → +∞ in the above inequality,

hξi , zi = lim_{k→∞} hξi , αk (xk − x̄)i ≤ 0,

where z ∈ cl cone (CI − x̄). By Theorem 2.35, z ∈ TCI (x̄). Because i ∈ I(x̄)
and ξi ∈ S̄(x̄) were arbitrary,

hξ, zi ≤ 0, ∀ ξ ∈ S̄(x̄),

which implies z ∈ (S̄(x̄))◦ . Because z ∈ TCI (x̄) is arbitrary,

TCI (x̄) ⊂ (S̄(x̄))◦ , (11.14)


which by Propositions 2.31(iii) and 2.37 implies that cl Ŝ(x̄) ⊂ NCI (x̄). As
(SIP ) is equivalent to the unconstrained problem,
inf (f + δCI )(x) subject to x ∈ Rn , (SIPu )
therefore, if x̄ is a point of minimizer of (SIP ), it is also a minimizer of (SIPu ).
By Theorem 3.1, the following optimality condition

0 ∈ ∂f (x̄) + NCI (x̄) (11.15)


holds. So rather than cl Ŝ(x̄) ⊂ NCI (x̄), one would prefer the reverse relation
so that the above condition may be explicitly expressed in terms of the subd-
ifferential of the constraints. Thus, we move on with the notion of Lagrangian
regular point studied in López and Vercher [75].

Definition 11.5 x̄ ∈ CI is said to be a Lagrangian regular point if


(i) I(x̄) is empty: TCI (x̄) = Rn .
(ii) I(x̄) is nonempty: (S̄(x̄))◦ ⊂ TCI (x̄) and Ŝ(x̄) is closed.
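Case (ii) of this definition can be checked by hand on the unit-disk data used earlier (our own illustration, not from the text): for g(x, i) = x1 cos i + x2 sin i − 1, I = [0, 2π] and x̄ = (−1, 0), one has I(x̄) = {π}, S̄(x̄) = {(−1, 0)}, Ŝ(x̄) = cone {(−1, 0)} (closed), and (S̄(x̄))◦ = {d : d1 ≥ 0} = TCI (x̄). The sketch below samples directions and confirms that every sampled d ∈ (S̄(x̄))◦ lies in cl DCI (x̄) after an arbitrarily small tilt into the disk.

```python
import numpy as np

# Unit-disk example: C_I = {x : x1*cos(i) + x2*sin(i) - 1 <= 0, i in [0, 2*pi]},
# xbar = (-1, 0), I(xbar) = {pi}, S_bar(xbar) = {grad g(xbar, pi)} = {(-1, 0)}.
xbar = np.array([-1.0, 0.0])
xi = np.array([-1.0, 0.0])                        # the only subgradient in S_bar(xbar)

def in_polar(d):                                   # d in (S_bar(xbar))^o  iff  <xi, d> <= 0
    return xi @ d <= 1e-12

def feasible_direction(d, t=1e-6):                 # d in D_{C_I}(xbar): xbar + t*d stays in C_I
    return np.linalg.norm(xbar + t * d) <= 1.0 + 1e-12

# check (S_bar(xbar))^o subset of T_{C_I}(xbar) = cl D_{C_I}(xbar) on sampled directions:
# every polar direction becomes a feasible direction after an arbitrarily small tilt toward e1.
ok = True
for theta in np.linspace(0.0, 2.0 * np.pi, 721):
    d = np.array([np.cos(theta), np.sin(theta)])
    if in_polar(d):
        tilted = d + np.array([1e-3, 0.0])         # small perturbation into the interior
        ok = ok and feasible_direction(tilted)
print("sampled polar directions lie in cl D_{C_I}(xbar):", ok)
```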

Recall the equivalent Abadie constraint qualification for (CP ) studied in


Chapter 3, that is, S(x̄) ⊂ TC (x̄), where

S(x̄) = {v ∈ Rn : gi′ (x̄, v) ≤ 0, ∀ i ∈ I(x̄)}.

By Proposition 3.9,

(S(x̄))◦ = cl Ŝ(x̄),

which by Proposition 2.31(ii) and (iii) along with the fact that S(x̄) is a closed convex cone implies that

(Ŝ(x̄))◦ = (S̄(x̄))◦ = S(x̄),


where
S̄(x̄) = {∂gi (x̄) : i = 1, 2, . . . , m} = ∪_{i=1}^m ∂gi (x̄).

Therefore, the Abadie constraint qualification is equivalent to

(S̄(x̄))◦ ⊂ TC (x̄).

Moreover, in the derivation of the standard KKT optimality condition for


(CP ), Theorem 3.10, under the Abadie constraint qualification, we further
assumed that Ŝ(x̄) was closed. A careful look at the Lagrangian regular point
when I(x̄) is nonempty shows that it is an extension of the Abadie constraint
qualification to (SIP ) along with the closedness condition. Next we derive
the KKT optimality condition for (SIP ) under the Lagrangian regularity.
The result is due to López and Vercher [75].

Theorem 11.6 Consider the convex semi-infinite programming problem


(SIP ). Assume that x̄ ∈ CI is a Lagrangian regular point. Then x̄ is a point
of minimizer of (SIP ) if and only if there exists λ ∈ R+[I(x̄)] such that

0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i).

Proof. Suppose that x̄ is a point of minimizer of (SIP ), which by the condi-


tion (11.15) implies

0 ∈ ∂f (x̄) + NCI (x̄).

Therefore, there exists ξ ∈ ∂f (x̄) such that

−ξ ∈ NCI (x̄).

Depending on the emptiness and nonemptiness of I(x̄), we consider the fol-


lowing two cases.
(i) I(x̄) is empty: As x̄ is a Lagrangian regular point, TCI (x̄) = Rn , which by
Proposition 2.37 implies that

NCI (x̄) = (TCI (x̄))◦ = {0}.

Therefore, the optimality condition reduces to

0 ∈ ∂f (x̄).

(ii) I(x̄) is nonempty: As x̄ is a Lagrangian regular point, (S̄(x̄))◦ ⊂ TCI (x̄),


which by Proposition 2.31(i) and (iii) yields that

NCI (x̄) ⊂ (S̄(x̄))◦◦ = cl cone co S̄(x̄) = cl Ŝ(x̄). (11.16)


Also, by relation (11.14), TCI (x̄) ⊂ (S̄(x̄))◦ , which implies that


cl Ŝ(x̄) ⊂ NCI (x̄) (11.17)

is always true. Combining the conditions (11.16) and (11.17),

NCI (x̄) = cl Ŝ(x̄).
Again, by Definition 11.5 of the Lagrangian regular point, Ŝ(x̄) is closed and hence the KKT optimality condition becomes

0 ∈ ∂f (x̄) + Ŝ(x̄),

which implies that there exists λ ∈ R+[I(x̄)] such that

0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i),

as desired. The converse can be worked out as in Theorem 11.2, thereby es-
tablishing the requisite result. □
Goberna and López [50] consider the feasible direction cone to
F ⊂ Rn at x̄ ∈ F , DF (x̄), defined as

DF (x̄) = {d ∈ Rn : there exists λ > 0 such that x̄ + λd ∈ F }.

It is easy to observe that

DF (x̄) ⊂ cone (F − x̄). (11.18)

In case F is a convex set, by Definition 2.46 of convex set, for every x ∈ F ,

x̄ + λ(x − x̄) ∈ F, ∀ λ ∈ (0, 1),

which implies x − x̄ ∈ DF (x̄). Because x ∈ F was arbitrary, F − x̄ ⊂ DF (x̄).


As DF (x̄) is a cone,
cone (F − x̄) ⊂ DF (x̄). (11.19)
Combining the conditions (11.18) and (11.19),

DF (x̄) = cone (F − x̄)

and hence, the tangent cone to F at x̄ is related to the feasible direction set
as

TF (x̄) = cl DF (x̄).

For the convex semi-infinite programming problem (SIP ), the feasible set is
CI . In particular, taking F = CI in the above condition yields

TCI (x̄) = cl DCI (x̄).


From (11.14) we have that TCI (x̄) ⊂ (S̄(x̄))◦ ; thus the above condition yields

DCI (x̄) ⊂ TCI (x̄) ⊂ (S̄(x̄))◦ . (11.20)

In Definition 11.5 of the Lagrangian regular point, for x̄ ∈ CI with nonempty


I(x̄),

(S̄(x̄))◦ ⊂ TCI (x̄) = cl DCI (x̄).

Combining the above condition with (11.20), along with Proposition 2.31 and the fact that Ŝ(x̄) = cone co S̄(x̄), implies that

cl Ŝ(x̄) = (S̄(x̄))◦◦ = (DCI (x̄))◦ .

By the closedness condition of Ŝ(x̄) at the Lagrangian regular point x̄, the preceding condition reduces to

Ŝ(x̄) = (DCI (x̄))◦ .

The above qualification condition is referred to as the convex locally Farkas-


Minkowski problem in Goberna and López [50]. For x̄ ∈ ri CI , TCI (x̄) = Rn ,
which by Proposition 2.37 implies that

NCI (x̄) = (TCI (x̄))◦ = {0}.

As TCI (x̄) ⊂ (S̄(x̄))◦ always holds, by Proposition 2.31,

{0} ⊂ (S̄(x̄))◦◦ ⊂ (TCI (x̄))◦ = {0}

which implies
Ŝ(x̄) = cl cone co S̄(x̄) = (S̄(x̄))◦◦ = {0}.

Thus, Ŝ(x̄) = NCI (x̄) for every x̄ ∈ ri CI . Therefore one needs to impose
the Lagrangian regular point condition to boundary points only. This fact
was mentioned in Goberna and López [50] and was proved in Fajardo and
López [44].
Recall that in Chapter 3, under the Slater constraint qualification, Proposition 3.3 leads to NC (x̄) = Ŝ(x̄), which by Propositions 2.31 and 2.37 is equivalent to

TC (x̄) = (Ŝ(x̄))◦ = (cl Ŝ(x̄))◦ = S(x̄).

Also, under the Slater constraint qualification, by Lemma 3.5, Ŝ(x̄) is closed.
Hence the Slater constraint qualification leads to the Abadie constraint qualifi-
cation along with the closedness criteria. A similar result also holds for (SIP ).
But before that we present Gordan’s Theorem of Alternative which plays an
important role in establishing the result.


Proposition 11.7 (Gordan’s Theorem of Alternative) Consider xi ∈ Rn for


i ∈ I, where I is an arbitrary index set. If co{xi : i ∈ I} is a closed set, then
the equivalence holds between the negation of system (I) and system (II),
where
(I) {x ∈ Rn : hxi , xi < 0, i ∈ I} ≠ ∅,
(II) 0 ∈ co {xi : i ∈ I}.

Proof. If xi = 0 for some i ∈ I, then the result holds trivially as system (I) is not satisfied while system (II) holds. So without loss of generality, assume that xi ≠ 0 for every i ∈ I.
Suppose (I) does not hold. Let 0 ∉ co {xi : i ∈ I}. As by hypothesis co {xi : i ∈ I} is closed, by the Strict Separation Theorem, Theorem 2.26(iii), there exists a ∈ Rn with a ≠ 0 such that

ha, xi < 0, ∀ x ∈ co {xi : i ∈ I}.

In particular, for xi ∈ co {xi : i ∈ I},

ha, xi i < 0, ∀ i ∈ I,

which implies system (I) holds, a contradiction to our supposition. Thus


0 ∈ co {xi : i ∈ I}, that is, system (II) holds.
Suppose that system (II) holds, which implies that there exists λ ∈ R+[I] with Σ_{i∈supp λ} λi = 1 such that

0 = Σ_{i∈supp λ} λi xi .

Let x̄ ∈ {x ∈ Rn : hxi , xi < 0, i ∈ I}. Therefore,


0 = h0, x̄i = Σ_{i∈supp λ} λi hxi , x̄i < 0,

which is a contradiction. Thus, system (I) does not hold, thereby completing the proof. □
The hypothesis that co {xi : i ∈ I} is a closed set is required as shown
in the example from López and Vercher [75]. Consider xi = (cos i, sin i) and I = [−π/2, π/2). Observe that (0, 0) ∉ co {xi : i ∈ I} as
0 ∈ co {cos i : i ∈ I} and 0 ∈ co {sin i : i ∈ I}

cannot hold simultaneously because 0 ∈ co {cos i : i ∈ I} is possible only if


i = −π/2, at which sin i = −1. Thus system (II) is not satisfied. Also, there
does not exist any x = (xc , xs ) such that

xc cos i + xs sin i < 0, ∀ i ∈ I. (11.21)

On the contrary, suppose that such an x exists. In particular, taking i = −π/2 and i = 0 yields that xs > 0 and xc < 0, respectively. But as the limit i → π/2,

xc cos i + xs sin i → xs ,

that is, for some i ∈ I, xc cos i + xs sin i > 0, which is a contradiction to


(11.21). Hence, system (I) is also not satisfied. Note that co {xi : i ∈ I} is not closed because, taking the limit as i → π/2,

xi = (cos i, sin i) → (0, 1)

and (0, 1) ∉ co {xi : i ∈ I}.
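The failure of both systems in this example can be explored numerically. The sketch below (ours; it uses scipy's LP solver and a finite sample of I, so it only illustrates the argument rather than proving it) checks that 0 is not a convex combination of the sampled xi, and that any candidate solution of system (I), which is forced to have xc < 0 and xs > 0, yields positive values as i approaches π/2.

```python
import numpy as np
from scipy.optimize import linprog

# points x_i = (cos i, sin i) for i in I = [-pi/2, pi/2); co{x_i} is not closed.
I_samples = np.linspace(-np.pi / 2, np.pi / 2, 1001)[:-1]    # pi/2 itself is excluded
X = np.column_stack((np.cos(I_samples), np.sin(I_samples)))  # one x_i per row

# system (II): is 0 a convex combination of the (sampled) x_i?
# feasibility LP:  X^T lam = 0,  sum(lam) = 1,  lam >= 0.
A_eq = np.vstack((X.T, np.ones(len(I_samples))))
b_eq = np.array([0.0, 0.0, 1.0])
res = linprog(c=np.zeros(len(I_samples)), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, None)] * len(I_samples))
print("system (II) solvable on samples:", res.success)       # False: 0 not in co{x_i}

# system (I): any x with <x_i, x> < 0 for all i must have x_s > 0 (from i = -pi/2)
# and x_c < 0 (from i = 0); but then <x_i, x> -> x_s > 0 as i -> pi/2.
x = np.array([-1.0, 0.5])                                     # a candidate of that form
for k in [2, 4, 6, 8]:
    i = np.pi / 2 - 10.0 ** (-k)
    print(f"i = pi/2 - 1e-{k}:  <x_i, x> = {np.cos(i) * x[0] + np.sin(i) * x[1]:.6f}")
# the values approach x_s = 0.5 > 0, so system (I) has no solution either
```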
Now we present the result from López and Vercher [75] showing that
the Slater constraint qualification for (SIP ) implies that every feasible point
x ∈ CI is a Lagrangian regular point.

Proposition 11.8 Consider the convex semi-infinite programming problem


(SIP ). If the Slater constraint qualification for (SIP ) holds, then every x̄ ∈ CI
is a Lagrangian regular point.

Proof. Suppose that x̄ ∈ CI , that is,

g(x̄, i) ≤ 0, i ∈ I.

Define g(x) = supi∈I g(x, i).


(i) I(x̄) is empty: By conditions (i) and (ii) of the Slater constraint qualifica-
tion for (SIP ), g(x̄) < 0 which by Proposition 2.67 implies that x̄ ∈ ri CI .
Therefore,

TCI (x̄) = cl cone (CI − x̄) = Rn .

(ii) I(x̄) is nonempty: We claim that I(x̄) is compact. By condition (i) of the
Slater constraint qualification for (SIP ), I is compact. Because I(x̄) ⊂ I, I(x̄)
is bounded. Now consider {ik } ⊂ I(x̄) such that ik → i. By the compactness
of I and the fact that I(x̄) ⊂ I, i ∈ I. As ik ∈ I(x̄),

g(x̄, ik ) = 0, ∀ k ∈ N,

which by condition (ii) of the Slater constraint qualification for (SIP ), that
is, the continuity of g(x, i) with respect to (x, i) in Rn × I, implies that as the
limit k → +∞, g(x̄, i) = 0 and thus i ∈ I(x̄). Therefore, I(x̄) is closed, which
along with the boundedness implies that I(x̄) is compact.
Next we will show that S̄(x̄) = ∪_{i∈I(x̄)} ∂g(x̄, i) is compact. Suppose that
{ξk } ⊂ S̄(x̄) with ξk → ξ. As ξk ∈ S̄(x̄), there exists ik ∈ I(x̄) such that
ξk ∈ ∂g(x̄, ik ), that is, by Definition 2.77 of the subdifferential,

g(x, ik ) − g(x̄, ik ) ≥ hξk , x − x̄i, ∀ x ∈ Rn .


As I(x̄) is compact, without loss of generality, assume that ik → i ∈ I(x̄).


Taking the limit as k → +∞ in the above inequality along with condition (ii)
of the Slater constraint qualification for (SIP ) yields

g(x, i) − g(x̄, i) ≥ hξ, x − x̄i, ∀ x ∈ Rn ,

that is, ξ ∈ ∂g(x̄, i) with i ∈ I(x̄). Therefore ξ ∈ S̄(x̄), thereby implying that
S̄(x̄) is closed. As dom g(., i) = Rn , i ∈ I(x̄), by Proposition 2.83, ∂g(x̄, i) is
compact for i ∈ I(x̄), which implies for every ξi ∈ ∂g(x̄, i) there exists Mi > 0
such that

kξi k ≤ Mi , ∀ i ∈ I(x̄).

Because I(x̄) is compact, the supremum of Mi over I(x̄) is attained, that is,

sup_{i∈I(x̄)} Mi = M < +∞.

Therefore, for every i ∈ I(x̄),

kξk ≤ M, ∀ ξ ∈ ∂g(x̄, i),

which implies that S̄(x̄) is bounded. Thus S̄(x̄) is compact.


As I(x̄) is nonempty, g(x̄) = 0. By condition (iii) of the Slater constraint
qualification for (SIP ), there exists x̂ such that g(x̂, i) < 0, i ∈ I, which along
with condition (i) yields that

g(x̂) < 0 = g(x̄).

Thus, x̄ is not a point of minimizer of g and hence 0 ∉ ∂g(x̄). By the Valadier formula, Theorem 2.97,

0 ∉ co ∪_{i∈I(x̄)} ∂g(x̄, i) = co S̄(x̄).

Because S̄(x̄) is compact, by Theorem 2.9, co S̄(x̄) is also compact. By Propo-


sition 3.4, cone co S̄(x̄) and hence Ŝ(x̄) is closed.
Finally to establish that x̄ is a Lagrangian regular point, we will prove
that (S̄(x̄))◦ ⊂ TCI (x̄). Define the set

S ′ (x̄) = {x ∈ Rn : hξ, xi < 0, ∀ ξ ∈ S̄(x̄)}.

As co S̄(x̄) is closed with 0 ∉ co S̄(x̄), by Gordan's Theorem of Alternative,


Proposition 11.7, S ′ (x̄) is nonempty. Therefore, by Proposition 2.67, ri (S̄(x̄))◦
is nonempty. Consider z ∈ ri (S̄(x̄))◦ , which implies

hξ, zi < 0, ∀ ξ ∈ S̄(x̄),


which leads to

hξ, zi < 0, ∀ ξ ∈ co S̄(x̄).

Again, by the Valadier formula, Theorem 2.97,


∂g(x̄) = co ∪_{i∈I(x̄)} ∂g(x̄, i) = co S̄(x̄).

Thus,

hξ, zi < 0, ∀ ξ ∈ ∂g(x̄).

By the compactness of I, the supremum is attained by g(., i) over I. As


dom g(., i) = Rn , dom g = Rn . Therefore, by Theorem 2.79,

g ′ (x̄, z) = max_{ξ∈∂g(x̄)} hξ, zi.

Also, because dom g = Rn , by Proposition 2.83, ∂g(x̄) is compact and thus


g ′ (x̄, z) < 0, which implies that there exists λ̄ > 0 such that

g(x̄ + λz) < 0, ∀ λ ∈ (0, λ̄),

which implies that for every λ ∈ (0, λ̄),

g(x̄ + λz, i) < 0, ∀ i ∈ I.

Hence, x̄ + λz ∈ CI for every λ ∈ (0, λ̄), which yields


z ∈ (1/λ)(CI − x̄) ⊂ cl cone (CI − x̄) = TCI (x̄).
Because z ∈ ri (S̄(x̄))◦ was arbitrary, ri (S̄(x̄))◦ ⊂ TCI (x̄), which along with
the closedness of the tangent cone, TCI (x̄), leads to

(S̄(x̄))◦ = cl (ri (S̄(x̄))◦ ) ⊂ TCI (x̄).

From both cases, we obtain that x̄ ∈ CI is a Lagrangian regular point. Because


x̄ was arbitrary, every feasible point is a Lagrangian regular point under the
assumption of the Slater constraint qualification for (SIP ), thereby establish-
ing the result. □

11.5 Farkas–Minkowski Linearization


In the previous section on the Lagrangian regular point, observe that it is
defined at a point and hence is a local qualification condition. We observed


that the notion of Lagrangian regular point is also known as the convex locally
Farkas–Minkowski problem. In this section, we will discuss the global
qualification condition, namely the Farkas–Minkowski qualification studied
in Goberna, López, and Pastor [51] and López and Vercher [75]. Before in-
troducing this qualification condition, let us briefly discuss the concept of
Farkas–Minkowski system for a linear semi-infinite system from Goberna and
López [50].
Consider a linear semi-infinite system
Θ = {hxi , xi ≥ ci , i ∈ I} (LSIS)
The relation hx̃, xi ≥ c̃ is a consequence relation of the system Θ if every solu-
tion of Θ satisfies the relation. A consistent (LSIS) Θ is said to be a Farkas–
Minkowski system, in short, an FM system, if every consequence relation is a
consequence of some finite subsystem. Before we state the Farkas–Minkowski
qualification for convex semi-infinite programming problem (SIP ), we present
some results on the consequence relation and the FM system from Goberna
and López [50].

Proposition 11.9 hx̃, xi ≥ c̃ is a consequence relation of the consistent


(LSIS) Θ if and only if

(x̃, c̃) ∈ cl cone co {(xi , ci ), i ∈ I; (0, −1)}.

Proof. Denote by K ⊂ Rn+1 the convex cone

K = cone co {(xi , ci ), i ∈ I; (0, −1)}.

Consider i′ ∉ I and define (xi′ , ci′ ) = (0, −1) and I ′ = {i′ } ∪ I. Thus

K = cone co {(xi , ci ), i ∈ I ′ }.
Suppose that (x̃, c̃) ∈ cl K, which implies there exist {λk } ⊂ R+[I′] , {sk } ⊂ N, {xikj } ⊂ Rn and {cikj } ⊂ R satisfying ikj ∈ I ′ for j = 1, 2, . . . , sk , such that

(x̃, c̃) = lim_{k→∞} Σ_{j=1}^{sk} λkj (xikj , cikj ). (11.22)

As K ⊂ Rn+1 , by the Carathéodory Theorem, Theorem 2.8, 1 ≤ sk ≤ n + 2.


For any k ∈ N with sk < n + 2, define λkj = 0 and any arbitrary (xikj , cikj )
with ikj ∈ I ′ for j = sk + 1, sk + 2, . . . , n + 2. Therefore, condition (11.22)
becomes
(x̃, c̃) = lim_{k→∞} Σ_{j=1}^{n+2} λkj (xikj , cikj ). (11.23)

If x̄ is a solution of (LSIS) Θ,

hxikj , x̄i ≥ cikj , j = 1, 2, . . . , n + 2,


which along with the fact λkj ≥ 0, j = 1, 2, . . . , n + 2, leads to


Σ_{j=1}^{n+2} λkj hxikj , x̄i ≥ Σ_{j=1}^{n+2} λkj cikj .

Taking the limit as k → +∞ in the above condition along with (11.23) yields

hx̃, x̄i ≥ c̃.

Because x̄ was arbitrary, hx̃, xi ≥ c̃ is a consequence relation of (LSIS) Θ.


Conversely, suppose that hx̃, xi ≥ c̃ is a consequence relation of (LSIS) Θ.
We claim that (x̃, c̃) ∈ cl K. On the contrary, suppose that (x̃, c̃) ∉ cl K. By the Strict Separation Theorem, Theorem 2.26 (iii), there exist (γ, γn+1 ) ∈ Rn × R with (γ, γn+1 ) ≠ (0, 0) such that

hγ, xi + γn+1 c > α̃ = hγ, x̃i + γn+1 c̃, ∀ (x, c) ∈ cl K. (11.24)

As 0 ∈ K ⊂ cl K, the above condition implies that α̃ < 0. We claim that

hγ, xi + γn+1 c ≥ 0, ∀ (x, c) ∈ cl K.

On the contrary, suppose that there exists (x̄, c̄) ∈ cl K such that

α̃ < hγ, x̄i + γn+1 c̄ < 0.

Because cl K is a cone, λ(x̄, c̄) ∈ cl K for λ > 0. Therefore, the above inequality
along with the condition (11.24) implies that

α̃ < hγ, λx̄i + γn+1 λc̄ < 0. (11.25)

Taking the limit as λ → +∞,

hγ, λx̄i + γn+1 λc̄ → −∞,

thereby contradicting the relation (11.25). Thus,

hγ, xi + γn+1 c ≥ 0 > α̃, ∀ (x, c) ∈ cl K. (11.26)

As (0, −1) ∈ K ⊂ cl K, for λ > 0, (0, −λ) ∈ cl K. Therefore, (11.26) leads to

−λγn+1 ≥ 0, ∀ λ > 0,

which implies that γn+1 ≤ 0. We now consider the following two cases.
(i) γn+1 = 0: The condition (11.26) reduces to

hγ, xi ≥ 0 > hγ, x̃i, ∀ (x, c) ∈ cl K.

In particular, for (xi , ci ) ∈ cl K, i ∈ I ′ ,

hγ, xi i ≥ 0 > hγ, x̃i, ∀ i ∈ I ′ . (11.27)


As (LSIS) Θ is consistent, there exists x̄ ∈ Rn such that

hx̄, xi i ≥ ci , ∀ i ∈ I. (11.28)

Therefore, from the inequalities (11.27) and (11.28), for any λ > 0,

hx̄ + λγ, xi i ≥ ci , ∀ i ∈ I,

which implies x̄ + λγ is a solution of (LSIS) Θ. By our supposition, hx̃, xi ≥ c̃


is a consequence relation of Θ, which implies that

hx̃, x̄ + λγi ≥ c̃. (11.29)

By the condition (11.27), as the limit λ → +∞,

hx̄ + λγ, x̃i → −∞,

thereby contradicting the inequality (11.29).


(ii) γn+1 < 0: As γn+1 ≠ 0, dividing (11.26) throughout by −γn+1 and setting x̄ = −γ/γn+1 ,

hx̄, xi − c ≥ 0 > hx̄, x̃i − c̃, ∀ (x, c) ∈ cl K.

The above condition holds in particular for (xi , ci ) ∈ K ⊂ cl K, i ∈ I. Thus,

hx̄, xi i − ci ≥ 0 > hx̄, x̃i − c̃, ∀ i ∈ I,

that is,

hx̄, xi i ≥ ci , i ∈ I and hx̄, x̃i < c̃.

Therefore, x̄ is a solution of (LSIS) Θ but does not satisfy the consequence


relation hx̃, xi ≥ c̃, which is again a contradiction.
Hence, our assumption was wrong and (x̃, c̃) ∈ cl K, thereby completing the
proof. □
Next we present the characterization of the FM system Θ from Goberna,
López, and Pastor [51].

Proposition 11.10 (LSIS) Θ is an FM system if and only if every consequence relation hx̃, xi ≥ c̃ of Θ satisfies

(x̃, c̃) ∈ cone co {(xi , ci ), i ∈ I; (0, −1)}.

Proof. Suppose that

(x̃, c̃) ∈ cone co {(xi , ci ), i ∈ I; (0, −1)}.


As in the proof of Proposition 11.9, consider i′ ∉ I. Define (xi′ , ci′ ) = (0, −1)
and I ′ = {i′ } ∪ I. Therefore,

(x̃, c̃) ∈ cone co {(xi , ci ), i ∈ I ′ } ⊂ Rn+1 ,

which by the Carathéodory Theorem, Theorem 2.8, implies that there exist
λj ≥ 0 and ij ∈ I ′ , j = 1, . . . , s, with 1 ≤ s ≤ n + 2 such that
(x̃, c̃) = Σ_{j=1}^s λj (xij , cij ).

Invoking Proposition 11.9 to the finite system, hx̃, xi ≥ c̃ is a consequence


relation of the finite system

hxij , xi ≥ cij , j = 1, 2, . . . , s.

Therefore, (LSIS) Θ is an FM system.


Conversely, suppose that Θ is an FM system, which implies that a conse-
quence relation hx̃, xi ≥ c̃ of the infinite system

hxi , xi ≥ ci , i ∈ I

can be expressed as a consequence of a finite subsystem, that is, hx̃, xi ≥ c̃ is a


consequence relation of a finite subsystem

hxi , xi ≥ ci , i = 1, 2, . . . , s.

Applying Proposition 11.9 to the above finite system,

(x̃, c̃) ∈ cl cone co {(xi , ci ), i = 1, 2, . . . , s; (0, −1)}


= cone co {(xi , ci ), i = 1, 2, . . . , s; (0, −1)}

which implies that

(x̃, c̃) ∈ cone co {(xi , ci ), i ∈ I; (0, −1)},

thereby establishing the desired result. □


Now we introduce the Farkas–Minkowski qualification for (SIP ) from Gob-
erna, López, and Pastor [51].

Definition 11.11 The convex semi-infinite programming problem (SIP ) is


said to satisfy the Farkas–Minkowski (FM) qualification if (LSIS)

Θ = {g(y, i) + hξ, x − yi ≤ 0 : (y, i) ∈ Rn × I, ξ ∈ ∂g(y, i)}

is an FM system.
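To see what this linearization looks like in practice, the following sketch (our own illustration; the constraint data, sampling grids, and tolerances are arbitrary) builds a sampled version of Θ for g(x, i) = (x1 − i)² + x2² − 1 on I = [0, 1] and checks on a few test points that membership in CI agrees with satisfying the sampled subgradient inequalities.

```python
import numpy as np

# (SIP) data with a genuinely nonlinear constraint:
#   g(x, i) = (x1 - i)^2 + x2^2 - 1,  i in I = [0, 1],
# so C_I is the intersection of the unit disks centred at (i, 0), i in [0, 1].
# The linearized system Theta collects all subgradient inequalities
#   g(y, i) + <grad g(y, i), x - y>  <=  0.
def g(x, i):
    return (x[0] - i) ** 2 + x[1] ** 2 - 1.0

def grad_g(x, i):
    return np.array([2.0 * (x[0] - i), 2.0 * x[1]])

I_samples = np.linspace(0.0, 1.0, 51)
Y_samples = [np.array([a, b]) for a in np.linspace(-2, 2, 21)
                               for b in np.linspace(-2, 2, 21)]

def in_C_I(x):
    return all(g(x, i) <= 1e-12 for i in I_samples)

def satisfies_Theta(x):
    # checks the (sampled) linearization; y = x is included among the sample points
    return all(g(y, i) + grad_g(y, i) @ (x - y) <= 1e-9
               for y in Y_samples + [x] for i in I_samples)

for x in [np.array([0.5, 0.0]), np.array([0.5, 0.85]), np.array([1.5, 0.0])]:
    print(x, "in C_I:", in_C_I(x), " satisfies Theta:", satisfies_Theta(x))
# on each test point the two answers coincide, illustrating C_I = solution set of Theta
```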


Using the FM qualification, we present the KKT optimality condition for


(SIP ) from Goberna, López, and Pastor [51].

Theorem 11.12 Consider the convex semi-infinite programming problem


(SIP ). Assume that the FM qualification holds. Then x̄ ∈ Rn is a point of
minimizer of (SIP ) if and only if there exists λ ∈ R+[I(x̄)] such that

0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i).

Proof. Define the solution set of (LSIS) Θ by C̃, that is,

C̃ = {x ∈ Rn : g(y, i) + hξ, x − yi ≤ 0, ∀ (y, i) ∈ Rn × I, ∀ ξ ∈ ∂g(y, i)}.

Note that as dom g(., i) = Rn , i ∈ I, by Proposition 2.83, ∂g(y, i) is nonempty


for every y ∈ Rn and hence C̃ is defined. We claim that CI = C̃.
Suppose that x̃ ∈ CI , which implies that g(x̃, i) ≤ 0, i ∈ I. For any y ∈ Rn
and ξi ∈ ∂g(y, i) for i ∈ I, by Definition 2.77 of the subdifferential,

g(y, i) + hξi , x − yi ≤ g(x, i), ∀ x ∈ Rn .

In particular, for x = x̃, the above inequality becomes

g(y, i) + hξi , x̃ − yi ≤ g(x̃, i) ≤ 0, ∀ i ∈ I,


which leads to x̃ ∈ C̃. Because x̃ ∈ CI was arbitrary, CI ⊂ C̃.
Conversely, suppose that x̃ ∈ C̃, which implies that for every y ∈ Rn and
i ∈ I,

g(y, i) + hξ, x̃ − yi ≤ 0, ∀ ξ ∈ ∂g(y, i).

In particular, taking y = x̃, the above condition reduces to

g(x̃, i) ≤ 0, i ∈ I,

thereby implying that x̃ ∈ CI . Because x̃ ∈ C̃ was arbitrary, C̃ ⊂ CI . Hence CI = C̃ and thus the FM system Θ is a linearization of CI .
As x̄ is a point of minimizer of (SIP ), it is also the point of minimizer of
the equivalent problem

inf f (x) subject to x ∈ CI .

Because dom f = Rn , by Theorem 3.1,

0 ∈ ∂f (x̄) + NCI (x̄),

which implies that there exists ξ ∈ ∂f (x̄) such that −ξ ∈ NCI (x̄). By Defini-
tion 2.36 of the normal cone,

hξ, x − x̄i ≥ 0, ∀ x ∈ CI ,


and thus it is a consequence relation of (LSIS) Θ. As Θ is an FM system, by


Proposition 11.10, there exist λj ≥ 0, yj ∈ Rn , ξj ∈ ∂g(yj , ij ), ij ∈ I, j = 1, 2, . . . , s, and µ ≥ 0 such that

(ξ, hξ, x̄i) = Σ_{j=1}^s λj (−ξj , g(yj , ij ) − hξj , yj i) + (0, −µ).

Without loss of generality, assume that λj > 0, j = 1, 2, . . . , s. Now taking the inner product of the above condition with (−x, 1) leads to

hξ, x̄ − xi = Σ_{j=1}^s λj (g(yj , ij ) + hξj , x − yj i) − µ.

As µ ≥ 0, the above relation leads to


hξ, x̄ − xi ≤ Σ_{j=1}^s λj (g(yj , ij ) + hξj , x − yj i). (11.30)

As ξj ∈ ∂g(yj , ij ), j = 1, 2, . . . , s,

g(yj , ij ) + hξj , x − yj i ≤ g(x, ij ), ∀ x ∈ Rn . (11.31)

Also, because ξ ∈ ∂f (x̄),

f (x̄) − f (x) ≤ hξ, x̄ − xi, ∀ x ∈ Rn . (11.32)

Combining the conditions (11.30), (11.31), and (11.32) yields that

f (x̄) ≤ f (x) + Σ_{j=1}^s λj g(x, ij ), ∀ x ∈ Rn . (11.33)

In particular, taking x = x̄ in the above inequality along with the feasibility of x̄ leads to

0 ≤ Σ_{j=1}^s λj g(x̄, ij ) ≤ 0,

that is, Σ_{j=1}^s λj g(x̄, ij ) = 0. Thus,

λj g(x̄, ij ) = 0, ∀ j = 1, 2, . . . , s.

By our supposition, λj > 0, j = 1, 2, . . . , s which implies that g(x̄, ij ) = 0, that


is, ij ∈ I(x̄), j = 1, 2, . . . , s. Define λi = 0 for i ∈ I(x̄) and i ∉ {i1 , i2 , . . . , is }.


Therefore, from the inequality (11.33), x̄ is a minimizer of the unconstrained problem

inf f (x) + Σ_{i∈supp λ} λi g(x, i) subject to x ∈ Rn ,

where λ ∈ R+[I(x̄)] . By the optimality condition for the unconstrained problem, Theorem 2.89,

0 ∈ ∂(f + Σ_{i∈supp λ} λi g(., i))(x̄).

As dom f = dom g(., i) = Rn for i ∈ I(x̄), by the Sum Rule, Theorem 2.91,
0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i),

thereby establishing the KKT optimality conditions for (SIP ). The converse
can be worked out along the lines of Theorem 11.2. □
Another notion that implies (LSIS) Θ is an FM system is that of the
canonically closed system. Below we define this concept and present a result relating
a canonically closed system and an FM system from Goberna, López, and
Pastor [51].

Definition 11.13 (LSIS) Θ = {hxi , xi ≥ ci , i ∈ I} is canonically closed if


the following conditions hold:

(i) There exists x̂ ∈ Rn such that hxi , x̂i > ci , i ∈ I.

(ii) The set {(xi , ci ), i ∈ I} is compact.
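For instance, if I is a compact metric space, the maps i 7→ xi and i 7→ ci are continuous on I, and some x̂ ∈ Rn satisfies hxi , x̂i > ci for every i ∈ I, then {(xi , ci ), i ∈ I} is the continuous image of a compact set and hence compact, so both requirements above are met and the system is canonically closed. In particular, any finite system hxi , xi ≥ ci , i = 1, . . . , m, admitting such a strictly feasible x̂ is canonically closed.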

The following result provides different conditions under which Θ is an FM


system; part of the proof is due to Hestenes [55].

Proposition 11.14 If the consistent (LSIS) Θ satisfies one of the following


conditions, then it is an FM system:

(i) cone co {(xi , ci ), i ∈ I; (0, −1)} is closed.

(ii) cone co {(xi , ci ), i ∈ I} is closed.

(iii) (LSIS) Θ is canonically closed.

Proof. (i) Suppose that cone co {(xi , ci ), i ∈ I; (0, −1)} is closed. Then by
Proposition 11.9, hx̃, xi ≥ c̃ is the consequence relation of (LSIS) Θ if and
only if

(x̃, c̃) ∈ cone co {(xi , ci ), i ∈ I; (0, −1)},


which by Proposition 11.10 is equivalent to Θ being an FM system.


(ii) Define

K = cone co {(xi , ci ), i ∈ I; (0, −1)} and K̃ = cone co {(xi , ci ), i ∈ I}.

It is easy to observe that


K = K̃ + cone (0, −1). (11.34)

Suppose that K̃ is closed. We claim that K is closed, which by (i) will then
imply that Θ is an FM system. Consider a bounded sequence {(x̃k , c̃k )} ⊂ K
such that (x̃k , c̃k ) → (x̃, c̃). Note that (x̃k , c̃k ) for k ∈ N can be expressed as

(x̃k , c̃k ) = (xk , ck ) + λk (0, −1), ∀ k ∈ N, (11.35)


where {(xk , ck )} ⊂ K̃ and {λk } ⊂ R+ . Assume that {λk } is an unbounded
sequence, which implies λk → +∞. From the condition (11.35)
(1/λk ) (x̃k , c̃k ) = (1/λk ) (xk , ck ) + (0, −1), ∀ k ∈ N.
As (x̃k , c̃k ) → (x̃, c̃), taking the limit as k → +∞ in the above condition
implies that
(1/λk ) (xk , ck ) → (0, 1),
that is, (0, 1) ∈ cl K̃ ⊂ cl K. By Proposition 11.9,

0 = h0, xi ≥ 1

is a consequence relation of (LSIS) Θ, which is impossible. Thus, {λk } is a


bounded sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, it
has a convergent subsequence. Without loss of generality, assume that λk → λ.
By the condition (11.35) and boundedness of the sequences {(x̃k , c̃k )} and
{λk }, {(xk , ck )} is also a bounded sequence. Without loss of generality, by the
Bolzano–Weierstrass Theorem, let (xk , ck ) → (x, c). As K̃ is closed, (x, c) ∈ K̃.
Therefore, taking the limit as k → +∞, (11.35) along with (11.34) yields that

(x̃, c̃) = (x, c) + λ(0, −1) ∈ K,

and hence K is closed, which by (i) implies that (LSIS) Θ is an FM system.


(iii) Suppose that Θ is a canonically closed system. Therefore, the set
{(xi , ci ), i ∈ I} is compact. We claim that
K̃ = cone co {(xi , ci ), i ∈ I} ⊂ Rn+1

is closed. On the contrary, assume that it is not closed, which implies there


exists a convergent sequence {(x̃k , c̃k )} ⊂ K̃ such that (x̃k , c̃k ) → (x̃, c̃) with
(x̃, c̃) ∈ cl K̃ but (x̃, c̃) 6∈ K̃. Because (x̃k , c̃k ) ∈ K̃, there exist λikj ≥ 0, ikj ∈ I
for j = 1, 2, . . . , sk , {sk } ⊂ N such that


(x̃k , c̃k ) = Σ_{j=1}^{sk} λikj (xikj , cikj ).

By the Carathéodory Theorem, Theorem 2.8, 1 ≤ sk ≤ n + 2. For sk < n + 2,


choose any ikj ∈ I with λikj = 0, j = sk + 1, sk + 2, . . . , n + 2. Therefore the
above condition can be rewritten as
(x̃k , c̃k ) = Σ_{j=1}^{n+2} λikj (xikj , cikj ). (11.36)

As {(xi , ci ), i ∈ I} is compact, by the Bolzano–Weierstrass Theorem,


{(xikj , cikj )} has a convergent subsequence. Without loss of generality, assume
that (xikj , cikj ) → (xij , cij ) ∈ {(xi , ci ), i ∈ I}. By assumption, (x̃, c̃) 6∈ K̃,

which implies {λikj } is unbounded. Otherwise, there exists some convergent


subsequence of {λikj }. Without relabeling, assume that λikj → λij . Therefore,
taking the limit as k → +∞ in (11.36) leads to
(x̃, c̃) = Σ_{j=1}^{n+2} λij (xij , cij ) ∈ K̃,

which is a contradiction of our assumption.


Denote λk = Σ_{j=1}^{n+2} λikj . Observe that the sequence {λikj /λk } ⊂ R+ is a
bounded sequence and hence by the Bolzano–Weierstrass Theorem has a convergent
subsequence. Without loss of generality, assume that λikj /λk → λij ≥ 0,
j = 1, 2, . . . , n + 2, with Σ_{j=1}^{n+2} λij = 1. Dividing the condition (11.36) throughout
by λk and taking the limit as k → +∞, which along with the fact that
λk → +∞ yields

Σ_{j=1}^{n+2} λij (xij , cij ) = 0 with Σ_{j=1}^{n+2} λij = 1. (11.37)

As (LSIS) Θ is canonically closed, there exists x̂ ∈ Rn such that hxi , x̂i > ci ,
i ∈ I, that is,
h(xi , ci ), (x̂, −1)i = hxi , x̂i − ci > 0, ∀ i ∈ I. (11.38)
Combining the relations (11.37) and (11.38) along with the fact that λij ≥ 0,
j = 1, 2, . . . , n + 2, not all simultaneously zero, implies that
0 = Σ_{j=1}^{n+2} λij h(xij , cij ), (x̂, −1)i = Σ_{j=1}^{n+2} λij (hxij , x̂i − cij ) > 0,

which is impossible. Thus our assumption was wrong and hence K̃ is closed,
which by (ii) yields that Θ is an FM system. 
As seen in Section 11.4, the Slater constraint qualification for (SIP ) im-
plies that every feasible point is a Lagrangian regular point; we will now
present the relation between the Slater constraint qualification for (SIP ) and
the FM qualification. For this we will need the following result from Goberna,
López, and Pastor [51].

Proposition 11.15 Consider a closed convex set F ⊂ Rn and let Fb denote


the boundary points of F . Also consider (LSIS)

Θ = {hxi , xi ≤ ci , i ∈ I}

such that
(i) every point of F is a solution of the system Θ,
(ii) there exists x̂ ∈ F such that hxi , x̂i < ci , i ∈ I,
(iii) given any y ∈ Fb , there exists some i ∈ I such that hxi , yi = ci .
Then F is the solution set of Θ, that is,

F = {x ∈ Rn : hxi , xi ≤ ci , i ∈ I}.

Proof. Observe that by condition (i), F ⊂ {x ∈ Rn : hxi , xi ≤ ci , i ∈ I}.


Conversely, suppose that there exists

z ∈ {x ∈ Rn : hxi , xi ≤ ci , i ∈ I}

and z 6∈ F . By (ii), there exists x̂ ∈ F such that hxi , x̂i < ci for every i ∈ I. As
F is a closed convex set, the line segment joining x̂ and z meets the boundary
Fb at only one point, say y ∈ (x̂, z). Therefore, there exists λ ∈ (0, 1) such
that y = (1 − λ)x̂ + λz ∈ Fb . By condition (iii), there exists ī ∈ I such that

hxī , yi = cī . (11.39)

By the conditions on x̂ and z,

hxī , x̂i < cī and hxī , zi ≤ cī ,

respectively. Thus

hxī , yi = (1 − λ)hxī , x̂i + λhxī , zi < cī ,

which is a contradiction to (11.39). Hence, F ⊃ {x ∈ Rn : hxi , xi ≤ ci , i ∈ I},


thereby establishing the result. 
Now we are in a position to present the implication that the Slater con-
straint qualification for (SIP ) leads to the FM qualification from López and
Vercher [75].


Proposition 11.16 Consider the convex semi-infinite programming problem


(SIP ) with bounded feasible set CI . If the Slater constraint qualification for
(SIP ) holds, then the FM qualification condition is also satisfied.
Proof. Define g(x) = sup_{i∈I} g(x, i) and CIb = {x ∈ CI : g(x) = 0}. Consider the
(LSIS)

Θ̃ = {hξ, xi ≤ hξ, yi, y ∈ CIb , ξ ∈ ∂g(y)}.

We claim that Θ̃ is a linear representation of CI . For any ξ ∈ ∂g(y), by


Definition 2.77 of the subdifferential,
hξ, x − yi ≤ g(x) − g(y), ∀ x ∈ Rn . (11.40)
As the Slater constraint qualification for (SIP ) holds, by condition (i) and
(ii), the supremum g(x) is attained. Therefore, in particular, for y ∈ CIb and
x ∈ CI , that is, g(y) = 0 and g(x) = supi∈I g(x, i) ≤ 0, respectively, the above
inequality reduces to
hξ, xi ≤ hξ, yi, ∀ ξ ∈ ∂g(y).

Because x ∈ CI was arbitrary, every point of CI is a solution of Θ̃.


By condition (iii) of the Slater constraint qualification for (SIP ), there
exists x̂ ∈ Rn such that
g(x̂, i) < 0, ∀ i ∈ I,
that is, x̂ ∈ CI . By the conditions (i) and (ii) of the Slater constraint qualifi-
cation for (SIP ), g(x̂) < 0. Therefore, in particular, taking y ∈ CIb and x = x̂,
the condition (11.40) becomes
hξ, x̂ − yi ≤ g(x̂) < 0, ∀ ξ ∈ ∂g(y) (11.41)
for every y ∈ CIb . Also, in particular, taking y = ȳ ∈ CIb and x = ȳ in the
inequality (11.40), the relation holds with equality.
From the above discussion, it is obvious that the conditions of Proposi-
tion 11.15 are satisfied and thus, CI coincides with the solution set of (LSIS)
e that is,
Θ,
CI = {x ∈ Rn : hξ, xi ≤ hξ, yi, ∀ y ∈ CIb , ∀ ξ ∈ ∂g(y)}. (11.42)
We now show that Θ̃ is canonically closed and hence is an FM system.
By the condition (11.41),
hξ, x̂i < hξ, yi, ∀ y ∈ CIb , ∀ ξ ∈ ∂g(y),

and thus condition (i) of Definition 11.13 is satisfied. Therefore, for Θ̃
to be a canonically closed system, we need to show that the set

K̃ = {(ξ, hξ, yi), y ∈ CIb , ξ ∈ ∂g(y)}


is compact.
As CI is bounded and CIb ⊂ CI , therefore CIb is bounded. Also by condition
(i) and (ii) of the Slater constraint qualification for (SIP ), the supremum is
attained over I. Therefore, as dom g(., i) = Rn , i ∈ I, dom g = Rn , and hence,
by Theorem 2.69, g is continuous over Rn . Thus, for a sequence {yk } ⊂ CIb with
yk → ȳ, g(yk ) → g(ȳ). Also, as g(yk ) = 0 for every k ∈ N, g(ȳ) = 0, which
implies ȳ ∈ CIb . Hence, CIb is closed, thereby yielding the compactness of CIb .
By Proposition 2.85,
∂g(CIb ) = {ξ ∈ Rn : ξ ∈ ∂g(y), y ∈ CIb } = ∪_{y∈CIb} ∂g(y)

is compact.
Now consider a convergent sequence {(ξk , hξk , yk i)} ⊂ K̃ where {yk } ⊂ CIb
and ξk ∈ ∂g(yk ) ⊂ ∂g(CIb ). Suppose that (ξk , hξk , yk i) → (ξ̃, γ̃), that is, ξk → ξ̃
and hξk , yk i → γ̃. As ξk → ξ̃, which by the compactness of ∂g(CIb ) implies that
ξ̃ ∈ ∂g(CIb ). Because {yk } ⊂ CIb , {yk } is a bounded sequence. By the Bolzano–

Weierstrass Theorem, Proposition 1.3, it has a convergent subsequence. With-


out loss of generality, assume that yk → ỹ, which by compactness of CIb leads
to ỹ ∈ CIb . As hξk , yk i → γ, which by the convergence of {ξk } and {yk } im-
plies that γ̃ = hξ, ˜ ỹi. Because ξk ∈ ∂g(yk ) with ξk → ξ˜ and yk → ỹ, by
the Closed Graph Theorem, Theorem 2.84, ξ˜ ∈ ∂g(ỹ). Thus (ξ, ˜ hξ, e
˜ ỹi) ∈ K,
thereby yielding the closedness of K. e
By the compactness of CIb and ∂g(CIb ), kyk ≤ M1 for every y ∈ CIb and
kξk ≤ M2 for every ξ ∈ ∂g(CIb ), respectively. Therefore, for any (ξ, hξ, yi) ∈ K̃
along with the Cauchy–Schwarz inequality, Proposition 1.1,

kξk2 + |hξ, yi| ≤ kξk2 + kξk kyk ≤ M2 (M1 + M2 )

and hence K̃ is bounded. Therefore, K̃ is compact, thus implying that the
system Θ̃ is canonically closed, which by Proposition 11.14 yields that Θ̃ is
an FM system.
Next we claim that (LSIS)

Θ = {hξ, x − yi ≤ −g(y, i), (y, i) ∈ Rn × I, ξ ∈ ∂g(y, i)}

is equivalent to Θ̃, that is, both Θ and Θ̃ have the same solution set. To
establish this claim, we will prove that CI is the solution set of Θ.
For any (y, i) ∈ Rn × I and ξi ∈ ∂g(y, i), by Definition 2.77 of the subdif-
ferential,
hξi , x − yi ≤ g(x, i) − g(y, i), ∀ x ∈ Rn . (11.43)
In particular, taking x ∈ CI , that is,

g(x, i) ≤ 0, ∀ i ∈ I,


the inequality (11.43) reduces to

hξi , x − yi ≤ −g(y, i), ∀ (y, i) ∈ Rn × I, ∀ ξi ∈ ∂g(y, i),

which implies x is a solution of (LSIS) Θ. Because x ∈ CI was arbitrary,


every point of CI is a solution of Θ.
By condition (iii) of the Slater constraint qualification, there exists x̂ ∈ Rn
such that

g(x̂, i) < 0, ∀ i ∈ I.

In particular, taking x = x̂ in (11.43) yields that for every (y, i) ∈ Rn × I,

hξi , x̂ − yi ≤ g(x̂, i) − g(y, i) < −g(y, i), ∀ ξi ∈ ∂g(y, i).

Also, taking y = ỹ ∈ CIb , where

CIb = {y ∈ Rn : there exists some i ∈ I such that g(y, i) = 0},

along with x = ỹ and ĩ ∈ I(ỹ) in the condition (11.43) leads to

hξĩ , ỹ − ỹi = 0 = −g(ỹ, ĩ), ∀ ξĩ ∈ ∂g(ỹ, ĩ).

As the conditions of Proposition 11.15 are satisfied,

CI = {x ∈ Rn : hξ, x − yi ≤ −g(y, i), ∀ (y, i) ∈ Rn × I, ∀ ξ ∈ ∂g(y, i)}, (11.44)
that is, CI is a solution set of (LSIS) Θ.
From the conditions (11.42) and (11.44), both Θ̃ and Θ are equivalent
(LSIS). Because Θ̃ is an FM system, Θ is also an FM system, which along with
Definition 11.11 yields that (SIP ) satisfies the FM qualification condition,
thereby establishing the requisite result. 

11.6 Noncompact Scenario: An Alternate Approach


In this section we discuss the recent epigraphical approach, or more precisely
the sequential approach studied in Chapter 7 as a tool to establish the KKT
optimality conditions for (SIP ). This approach has been studied for convex
programming problems with infinite constraints by Jeyakumar [66, 67] and
Goberna, Jeyakumar, and López [49]. Here we will present the KKT optimality
conditions for (SIP ) from the work of Dinh, Mordukhovich, and Nghia [33]
under the following relaxed closed cone constraint qualification for (SIP ),
that is,
cone co ∪_{i∈I} epi g ∗ (., i) is closed.
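For instance, if every constraint happens to be affine, say g(x, i) = hai , xi − bi with ai ∈ Rn and bi ∈ R, then a direct computation gives g ∗ (ξ, i) = bi when ξ = ai and +∞ otherwise, so that epi g ∗ (., i) = {ai } × [bi , +∞); the qualification above then concerns the closedness of the convex cone generated by the points (ai , bi ) together with the vertical rays above them.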


But before establishing the optimality conditions for (SIP ) as a consequence


of the optimality conditions expressed in terms of the epigraph of the conjugate
functions, we present a result from Jeyakumar [67].

Proposition 11.17 Consider an lsc proper convex function φ : Rn → R̄ and


define

F = {x ∈ Rn : φ(x) ≤ 0}.

If F is nonempty, then epi δF∗ = cl cone co epi φ∗ .

Proof. Suppose that F is nonempty. From the definition of the indicator


function to the set F , δF ,

φ(x) ≤ 0 = δF (x), for x ∈ F,


φ(x) ≤ +∞ = δF (x), for x 6∈ F.

Therefore, φ(x) ≤ δF (x) for every x ∈ Rn . By Proposition 2.103,

δF∗ (ξ) ≤ φ∗ (ξ), ∀ ξ ∈ Rn . (11.45)

We claim that cl cone co epi φ∗ ⊂ epi δF∗ . By Definition 2.101 of the conjugate
function, δF∗ is the same as the support function to the set F , that is, δF∗ = σF .
By Proposition 2.102, δF∗ is lsc, hence by Theorem 1.9, epi δF∗ is closed. Also,
as σF is a sublinear function, by Theorem 2.59 epi σF is a convex cone. So it
is sufficient to establish that epi φ∗ ⊂ epi δF∗ . Consider any (ξ, α) ∈ epi φ∗ ,
which by condition (11.45) implies that

δF∗ (ξ) ≤ φ∗ (ξ) ≤ α.

Therefore, (ξ, α) ∈ epi δF∗ . As (ξ, α) ∈ epi φ∗ was arbitrary, epi φ∗ ⊂ epi δF∗ .
Because epi δF∗ is a closed convex cone,

cl cone co epi φ∗ ⊂ epi δF∗ . (11.46)

To complete the proof, we will prove the converse inclusion, that is,
epi δF∗ ⊂ cl cone co epi φ∗ . Suppose that (ξ, α) 6∈ cl cone co epi φ∗ . As
δF∗ = σF is a sublinear function with δF∗ (0) = 0, we have (0, −1) 6∈ epi δF∗ ,

which by the relation (11.46) implies that (0, −1) 6∈ cl cone co epi φ∗ . Define
the convex set

F̃ = {(1 − λ)(ξ, α) + λ(0, −1) ∈ Rn × R : λ ∈ [0, 1]}.

We claim that

F̃ ∩ cl cone co epi φ∗ = ∅.

On the contrary, suppose that there exists λ̃ ∈ (0, 1) such that

(1 − λ̃)(ξ, α) + λ̃(0, −1) ∈ cl cone co epi φ∗ . (11.47)


We claim that {0} × R+ ⊂ cl cone co epi φ∗ . To establish this fact, it is


sufficient to show that (0, 1) ∈ cl cone co epi φ∗ . On the contrary, suppose
that

(0, 1) 6∈ cl cone co epi φ∗ .

Then by the Strict Separation Theorem, Theorem 2.26 (iii), there exists
(a, γ) ∈ Rn × R with (a, γ) 6= (0, 0) such that

ha, ξi + γα > γ, ∀ (ξ, α) ∈ cl cone co epi φ∗ . (11.48)

As (0, 0) ∈ cl cone co epi φ∗ , γ < 0. We will show that

ha, ξi + γα ≥ 0 > γ, ∀ (ξ, α) ∈ cl cone co epi φ∗ .

On the contrary, suppose that (ξ, α) ∈ cl cone co epi φ∗ such that

0 > ha, ξi + γα > γ. (11.49)

For any λ > 0, λ(ξ, α) ∈ cl cone co epi φ∗ , which by the conditions (11.48)
and (11.49) implies that

0 > λ(ha, ξi + γα) > γ.

Taking the limit as λ → +∞ in the above inequality,

λ(ha, ξi + γα) → −∞,

which is a contradiction. Therefore,

ha, ξi + γα ≥ 0 > γ, ∀ (ξ, α) ∈ cl cone co epi φ∗ . (11.50)

Consider any ξ ∈ dom φ∗ and ε > 0. Thus, (ξ, φ∗ (ξ) + ε) ∈ cl cone co epi φ∗ .
Therefore, from the condition (11.50),

ha, ξi + γ(φ∗ (ξ) + ε) ≥ 0,

which implies
(1/ε) (ha, ξi + γφ∗ (ξ)) + γ ≥ 0.
Taking the limit as ε → +∞ in the above inequality, which along with (11.50)
yields that 0 > γ ≥ 0, which is a contradiction. Thus, (0, 1) ∈ cl cone co epi φ∗
and hence
{0} × R+ = cone (0, 1) ⊂ cl cone co epi φ∗ . (11.51)
From the relations (11.47) and (11.51),

(1 − λ̃)(ξ, α) = (1 − λ̃)(ξ, α) + λ̃(0, −1) + (0, λ̃) ∈ cl cone co epi φ∗ ,


which implies
(ξ, α) = (1/(1 − λ̃)) {(1 − λ̃)(ξ, α)} ∈ cl cone co epi φ∗ ,

thereby contradicting our assumption. Thus

F̃ ∩ cl cone co epi φ∗ = ∅.

As F̃ is a compact convex set and cl cone co epi φ∗ is a closed convex cone, by


the Strict Separation Theorem, Theorem 2.26 (iii), there exists (a, γ) ∈ Rn ×R
with (a, γ) 6= (0, 0) such that

ha, zi + γβ > ha, z̃i + γ β̃ (11.52)

for every (z, β) ∈ cl cone co epi φ∗ and (z̃, β̃) ∈ F̃. As (0, 0) ∈ cl cone co epi φ∗ ,

0 > ha, z̃i + γ β̃, ∀ (z̃, β̃) ∈ F̃.

Also, as (0, −1), (ξ, α) ∈ F̃, from condition (11.52),

γ>0 and ha, ξi + γα < 0. (11.53)

Repeating the discussion as before, we can show that

ha, zi + γβ ≥ 0 > ha, z̃i + γ β̃

for every (z, β) ∈ cl cone co epi φ∗ and (z̃, β̃) ∈ F̃. For any ξ ∈ dom φ∗ ,
(ξ, φ∗ (ξ)) ∈ cl cone co epi φ∗ , which by the above inequality implies that

ha, ξi + γφ∗ (ξ) ≥ 0, ∀ ξ ∈ dom φ∗ . (11.54)

Because φ is lsc, by Theorem 2.105, φ = φ∗∗ . Therefore, by the conditions


(11.53) and (11.54),
φ(−a/γ) = φ∗∗ (−a/γ) = sup_{ξ∈Rn} {hξ, −a/γi − φ∗ (ξ)} ≤ 0,

which implies that −a/γ ∈ F . Again using the condition (11.53),

δF∗ (ξ) = σF (ξ) ≥ hξ, −a/γi > α,

which implies (ξ, α) 6∈ epi δF∗ , thereby establishing the desired result. 
Now we move on to derive the optimality conditions in epigraphical form.
Similar results have been studied in the form of generalized Farkas’ Lemma
in Dinh, Goberna, and López [31] and Dinh, Goberna, López, and Son [32].


Theorem 11.18 Consider the convex semi-infinite programming problem


(SIP ). Then x̄ is a point of minimizer of (SIP ) if and only if
(0, −f (x̄)) ∈ epi f ∗ + cl cone co ∪_{i∈I} epi g ∗ (., i). (11.55)

Proof. Suppose that x̄ is a point of minimizer of (SIP ) and hence of the


following unconstrained problem

inf f (x) + δCI (x) subject to x ∈ Rn .

Therefore, by Theorem 2.89,

0 ∈ ∂(f + δCI )(x̄),

which by Theorem 2.108 and the fact that x̄ ∈ CI implies that

f (x̄) + (f + δCI )∗ (0) = h0, x̄i = 0.

Therefore, the above condition leads to

(0, −f (x̄)) ∈ epi (f + δCI )∗ .

As dom f = Rn , by Theorem 2.69, f is continuous over Rn . Thus, by Propo-


sition 2.124,
(0, −f (x̄)) ∈ epi f ∗ + epi (δCI )∗ . (11.56)
Define the supremum function g(x) = supi∈I g(x, i), which implies that

CI = {x ∈ Rn : g(x) ≤ 0}.

Because x̄ ∈ CI , CI is nonempty. Invoking Proposition 11.17, the condition


(11.56) yields

(0, −f (x̄)) ∈ epi f ∗ + cl cone co epi g ∗ .

Applying Theorem 2.123 to the above relation leads to


(0, −f (x̄)) ∈ epi f ∗ + cl cone co ∪_{i∈I} epi g ∗ (., i),

thereby leading to the desired condition.


Conversely, suppose that the epigraphical condition (11.55) is satisfied,
which implies that there exist ξ ∈ dom f ∗ , α ≥ 0, λki ≥ 0, ξik ∈ dom g ∗ (., i)
and αik ≥ 0 for i ∈ I such that

(0, −f (x̄)) = (ξ, f ∗ (ξ) + α) + lim_{k→∞} Σ_{i∈I} λki (ξik , g ∗ (ξik , i) + αik ).

As cone co ∪_{i∈I} epi g ∗ (., i) ⊂ Rn+1 , by the Carathéodory Theorem, Theorem 2.8,
any element in the convex cone can be expressed as a sum of n + 2
elements from ∪_{i∈I} epi g ∗ (., i). Therefore the above condition becomes

(0, −f (x̄)) = (ξ, f ∗ (ξ) + α) + lim_{k→∞} Σ_{j=1}^{n+2} λkij (ξikj , g ∗ (ξikj , ij ) + αikj ),

where ij ∈ I, j = 1, 2, . . . , n + 2. Componentwise comparison leads to


0 = ξ + lim_{k→∞} Σ_{j=1}^{n+2} λkij ξikj , (11.57)

−f (x̄) = f ∗ (ξ) + α + lim_{k→∞} Σ_{j=1}^{n+2} λkij (g ∗ (ξikj , ij ) + αikj ). (11.58)

By Definition 2.101 of the conjugate function, condition (11.58) yields

f (x̄) − f (x) ≤ −hξ, xi − α − lim_{k→∞} Σ_{j=1}^{n+2} λkij (g ∗ (ξikj , ij ) + αikj )
≤ −hξ, xi − α − lim_{k→∞} Σ_{j=1}^{n+2} λkij (hξikj , xi − g(x, ij ) + αikj ), ∀ x ∈ Rn .

Using condition (11.57), for every x ∈ CI , the above inequality leads to


f (x̄) − f (x) ≤ −α − lim_{k→∞} Σ_{j=1}^{n+2} λkij αikj ,

which by the nonnegativity of α and αikj , j = 1, 2, . . . , n + 2, yields

f (x̄) ≤ f (x), ∀ x ∈ CI .

Thus, x̄ is a point of minimizer of (SIP ), thereby establishing the result. 


We end this chapter by presenting the KKT optimality condition for (SIP )
from Dinh, Mordukhovich, and Nghia [33]. But before that we define the set
of active constraint multipliers as
A(x̄) = {λ ∈ R+^{[I]} : λi g(x̄, i) = 0, ∀ i ∈ supp λ}.

Theorem 11.19 Consider the convex semi-infinite programming problem


(SIP ). Assume that the closed cone constraint qualification holds. Then x̄
is a point of minimizer of (SIP ) if and only if there exists λ ∈ A(x̄) such that
0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i).


Proof. By Theorem 11.18, x̄ is a point of minimizer of (SIP ) if and only


if condition (11.55) is satisfied. As the closed cone constraint qualification is
satisfied, (11.55) reduces to
(0, −f (x̄)) ∈ epi f ∗ + cone co ∪_{i∈I} epi g ∗ (., i).

By Theorem 2.122, there exist ξ ∈ ∂ε f (x̄), ε ≥ 0, λi ≥ 0, ξi ∈ ∂εi g(x̄, i) and


εi ≥ 0, i ∈ I such that
(0, −f (x̄)) = (ξ, hξ, x̄i − f (x̄) + ε) + Σ_{i∈I} λi (ξi , hξi , x̄i − g(x̄, i) + εi ).

Componentwise comparison leads to


0 = ξ + Σ_{i∈I} λi ξi , (11.59)

−f (x̄) = (hξ, x̄i − f (x̄) + ε) + Σ_{i∈I} λi (hξi , x̄i − g(x̄, i) + εi ). (11.60)

Using the condition (11.59), (11.60) reduces to


0 = ε + Σ_{i∈I} λi (−g(x̄, i) + εi ).

The above condition along with the fact that x̄ ∈ CI , that is, g(x̄, i) ≤ 0, i ∈ I
and the nonnegativity of ε, εi and λi , i ∈ I, implies that

ε = 0, λi εi = 0 and λi g(x̄, i) = 0, i ∈ I.

Thus, for i ∈ supp λ, εi = 0 and λ ∈ A(x̄). Therefore, ξ ∈ ∂f (x̄) and


ξi ∈ ∂g(x̄, i), i ∈ supp λ satisfying
0 = ξ + Σ_{i∈supp λ} λi ξi .

Therefore, for λ ∈ A(x̄),


0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i), (11.61)

thereby yielding the KKT optimality condition for (SIP ).


Conversely, suppose that (11.61) holds, which implies that there exist
ξ ∈ ∂f (x̄) and ξi ∈ ∂g(x̄, i), i ∈ supp λ such that
0 = ξ + Σ_{i∈supp λ} λi ξi . (11.62)


By Definition 2.77 of the subdifferential, for every x ∈ Rn ,

f (x) ≥ f (x̄) + hξ, x − x̄i,


g(x, i) ≥ g(x̄, i) + hξi , x − x̄i, i ∈ supp λ,

which along with the condition (11.62) implies that


f (x) + Σ_{i∈supp λ} λi g(x, i) ≥ f (x̄) + Σ_{i∈supp λ} λi g(x̄, i), ∀ x ∈ Rn .

As λ ∈ A(x̄), λi g(x̄, i) = 0 for i ∈ supp λ, which for every x ∈ CI reduces the


above inequality to
f (x) ≥ f (x) + Σ_{i∈supp λ} λi g(x, i) ≥ f (x̄), ∀ x ∈ CI .

Therefore, x̄ is a point of minimizer of (SIP ), hence completing the proof. 



Chapter 12
Convexity in Nonconvex Optimization

12.1 Introduction
This is the final chapter of the book. What we want to discuss here is essen-
tially outside the purview of convex optimization. Yet as we will see, convexity
will play a fundamental role in the issues discussed. We will discuss here two
major areas in nonconvex optimization, namely maximization of a convex
function and minimization of a d.c. function. The acronym d.c. stands for
difference convex function, that is, functions expressed as the difference of
two convex functions. Thus, more precisely, we would look into the following
problems:
max f (x) subject to x∈C (P 1)
and min f (x) − g(x) subject to x ∈ C, (P 2)
where f, g : Rn → R are convex functions and C ⊂ Rn is a convex set. A large
class of nonconvex optimization problems actually falls into this setting. Note
that (P 1) can be posed as

min − f (x) subject to x∈C

and thus as

min φ(x) − f (x) subject to x ∈ C,

where φ is the zero function. Thus the problem (P 1) can also be viewed as
a special case of (P 2), though we will consider them separately for a better
understanding.
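To get a feel for how broad the class of d.c. functions is, note for instance that any twice continuously differentiable function h : Rn → R whose Hessian is bounded below, say ∇2 h(x) + ρI is positive semidefinite for every x ∈ Rn for some ρ > 0, is a d.c. function, since writing h(x) = (h(x) + (ρ/2)kxk2 ) − (ρ/2)kxk2 expresses it as a difference of two convex functions. In particular, every quadratic function, indefinite ones included, is d.c.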

12.2 Maximization of a Convex Function


The problem of maximizing a convex function over a convex set is a complete
paradigm shift from that of minimization of a convex function over a convex


set. The problem of maximization of a convex function is, in fact, a hard


nonconvex minimization problem. Some of the early results in this direction
appear in the classic text of Rockafellar [97], and we will mention a few of
them here in order to motivate the reader. The first point that the reader
should note is that local maxima of a convex function need not be global
maxima. We leave it to the reader to create some examples that bring out
this fact. The following result is given in Rockafellar [97]. We will not provide
any proof. See Rockafellar [97] for the proof.
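One simple instance of the first point: for f (x) = x2 and C = [−1, 2], the point x̄ = −1 is a local maximizer of f relative to C, since x2 ≤ 1 = f (−1) for all x ∈ C near −1, while the global maximum of f over C is attained at x = 2 with value 4.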

Theorem 12.1 Consider a convex function f : Rn → R and a convex set


C ⊂ Rn . If f attains its supremum relative to C at some point in ri C, then
f is constant on C.

The above theorem says that if f is a nonconstant convex function and if


it attains its supremum on C, then it must be attained at the boundary. Of
course the more interesting question is when does the convex function actually
attain its maximum? In this respect, one has the following interesting result
from [97] where the set C is assumed to be polyhedral.

Theorem 12.2 Consider a convex function f : Rn → R and a convex set


C ⊂ Rn that is polyhedral. Suppose that there are no half lines in C on which
f is unbounded above. Then f attains its supremum over C.
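The polyhedrality assumption cannot simply be dropped here. For instance, take C = {(x, y) ∈ R2 : y ≥ x2 } and f (x, y) = x. Every half line contained in C has direction (0, 1), so f is bounded above (indeed constant) on each of them, yet sup{f (x, y) : (x, y) ∈ C} = +∞ is not attained.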

For more general results, see [97]. One of the earliest papers dealing exclu-
sively with the optimality conditions of maximizing a convex function over a
convex set is due to Strekalovskii [104]. Though Strekalovskii [104] frames his
problem in a general setting, his results are essentially useful for the convex
case and the main results in his paper are given only for the convex case.
Observe that if f : Rn → R is a convex function and x̄ ∈ C is the point
where f attains a global maximum, then for every x ∈ C,

0 ≥ f (x) − f (x̄) ≥ hξ, x − x̄i, ∀ ξ ∈ ∂f (x̄),

which implies

hξ, x − x̄i ≤ 0, ∀ ξ ∈ ∂f (x̄), ∀ x ∈ C.

Thus the necessary condition is ∂f (x̄) ⊂ NC (x̄). The reader should try to find
a necessary condition when x̄ is a local maximum. Can we find a sufficient
condition for a global maximum? Strekalovskii [104] attempts to answer this
question by developing a set of necessary and sufficient conditions.

Theorem 12.3 Consider the problem of maximizing a convex function f over


a closed convex set C. Assume that x̄ ∈ C is a point such that

−∞ < inf_{x∈Rn} f (x) < f (x̄) < +∞


and the set

C̄ = {x ∈ Rn : f (x) ≤ f (x̄)}

is compact having a nonempty interior, that is, int C̄ 6= ∅. Then x̄ ∈ C is a


global maximum of f on C if and only if

(a) for every x∗ ∈ ∂f (x̄), hx∗ , x − x̄i ≤ 0, ∀ x ∈ C or,


(b) for every y ∗ ∈ S(f, x̄), hy ∗ , x − x̄i ≤ 1, ∀ x ∈ C where

S(f, x̄) = {y ∗ ∈ Rn : ∃ y ∈ Rn , y 6= x̄, f (y) = f (x̄) and


∃ α > 0, αy ∗ ∈ ∂f (y) satisfying hy ∗ , y − x̄i = 1}.

Proof. We will only prove (a) and leave (b) to the readers. If x̄ is a global
maximum, then our discussion preceding the theorem shows that (a) holds,
that is, the condition in (a) is necessary. Now we will look into the reverse, that
is, whether (a) is sufficient for a global maximum or not. Observe that under
the given hypothesis, for every x∗ ∈ ∂f (x̄),

hx∗ , x − x̄i ≤ 0, ∀ x ∈ C.

As dom f = Rn , by Theorem 2.69, f is a continuous convex function, thus


the set C̄ is closed and convex. Also, from the above inequality,

cone ∂f (x̄) ⊂ NC (x̄).

Further, as C̄ has a nonempty interior, there exists x̂ such that f (x̂) < f (x̄).
Hence

NC̄ (x̄) = {λξ : λ ≥ 0, ξ ∈ ∂f (x̄)}.

Thus, NC̄ (x̄) = cone ∂f (x̄). This shows that NC̄ (x̄) ⊂ NC (x̄), which implies
that C ⊂ C̄. Hence x̄ is the point where the maximum is achieved as x̄ is
already given to be an element of C. 
It is important to note that without the additional conditions,
∂f (x̄) ⊂ NC (x̄) does not render a global maximum. Here we put forward an
example from Dutta [38]. Consider f : R → R defined as

f (x) = max{x2 , x}.

Now suppose that we want to maximize f over C = [−1, 0]. Consider x̄ = 0.


Thus NC (x̄) = R+ = {x ∈ R : x ≥ 0}. Observe that ∂f (0) = [0, 1]. Therefore,
∂f (0) ⊂ NC (0). However, x̄ = 0 is a global minimizer of f over C and not a
global maximizer.
Strekalovskii refined the above result slightly to provide the following re-
sult. This appeared in [105].


Theorem 12.4 Consider a closed convex set C ⊂ Rn and let x̄ ∈ C. Assume


that

−∞ ≤ inf_{x∈Rn} f (x) < f (x̄),

where f : Rn → R is a convex function. Then x̄ ∈ C is a global maximum for


(P 1) if and only if

∂f (x) ⊂ NC (x), ∀ x ∈ Rn satisfying f (x) = f (x̄).

Readers are requested to have a look at the difference between


Strekalovskii’s result in Theorem 12.3 and this result. Though the above re-
sult is elegant, it suffers from a drawback, that is, one needs to calculate
NC (x) for every x ∈ Rn satisfying f (x) = f (x̄). Now if x 6∈ C, then tradi-
tionally we define NC (x) = ∅. However, for a convex function f : Rn → R,
∂f (x) 6= ∅ for every x ∈ Rn . This drawback was overcome by Hiriart-Urruty
and Ledyaev [61]. We now present their result but with a different approach
to the proof.

Theorem 12.5 Consider a convex function f : Rn → R and a closed convex


set C ⊂ Rn . Let x̄ ∈ C be such that

−∞ ≤ inf_{x∈C} f (x) < f (x̄).

Then x̄ ∈ C is a maximizer for (P 1) if and only if

∂f (x) ⊂ NC (x), ∀ x ∈ C satisfying f (x) = f (x̄).

Proof. If x̄ ∈ C is the global maximizer of the function f over C, then we


have already seen that ∂f (x̄) ⊂ NC (x̄). It is simple to see that if f (x) = f (x̄),
∂f (x) ⊂ NC (x). We leave this very simple proof to the reader.
Conversely, assume on the contrary that x̄ ∈ C is not a global maximizer
of (P 1). Therefore, there exists x̂ ∈ C such that f (x̂) > f (x̄). Consider the
following level set

S(x̄) = {x ∈ C : f (x) ≤ f (x̄)},

which is a closed convex set. It is clear that x̂ 6∈ S(x̄). Thus, the following
projection problem:
min (1/2) kx − x̂k2 subject to f (x) ≤ f (x̄), x ∈ C
has a unique solution. Let x̃ ∈ C be that unique solution. Now using the Fritz
John optimality conditions for a convex optimization problem, Theorem 5.1,
there exist λ0 ≥ 0 and λ1 ≥ 0 with (λ0 , λ1 ) 6= (0, 0) such that
(i) 0 ∈ λ0 (x̃ − x̂) + λ1 ∂f (x̃) + NC (x̃),


(ii) λ1 (f (x̃) − f (x̄)) = 0.


Assume that λ0 = 0, which implies λ1 > 0. Thus the above conditions reduce
to

0 ∈ λ1 ∂f (x̃) + NC (x̃) and f (x̃) = f (x̄).

The condition 0 ∈ λ1 ∂f (x̃) + NC (x̃) leads to the expression

0 ∈ ∂f (x̃) + NC (x̃).

This is obtained by dividing both sides by λ1 and noting that NC (x̃) is a cone.
As f is convex, invoking Theorem 3.1, x̃ ∈ C is a point of minimizer of f over
C, that is,

f (x̃) = inf_{x∈C} f (x).

The condition f (x̃) = f (x̄) along with the given hypothesis yields that

−∞ ≤ inf_{x∈C} f (x) < f (x̃),

thereby contradicting the fact that x̃ is the point of minimizer of f over C.


Hence λ0 > 0. Now assume that λ1 = 0. Therefore, the facts that λ0 > 0 and
NC (x̃) is a cone yield that

0 ∈ (x̃ − x̂) + NC (x̃),

that is,

x̂ − x̃ ∈ NC (x̃).

Because x̂ ∈ C,

0 ≥ hx̂ − x̃, x̂ − x̃i = kx̂ − x̃k2 ,

implying that x̃ = x̂. This is indeed a contradiction. Hence λ1 > 0. Thus there
exist ξ ∈ ∂f (x̃) and η ∈ NC (x̃) such that

0 = λ0 (x̃ − x̂) + λ1 ξ + η. (12.1)

As f (x̃) = f (x̄), by the given hypothesis, ∂f (x̃) ⊂ NC (x̃), which implies

−hλ1 ξ, x̂ − x̃i ≥ 0. (12.2)

Further, it is simple to see that

−hη, x̂ − x̃i + λ0 kx̂ − x̃k2 > 0. (12.3)

The conditions (12.1), (12.2), and (12.3) lead to a contradiction, thereby es-
tablishing the result. 


12.3 Minimization of d.c. Functions


In this section we will concentrate on deriving the optimality condition for
local and global minimization of a very important class of nonconvex problems.
These problems are the ones where the objective function can be expressed
as the difference of two convex functions. Such functions are referred to as
difference convex functions or d.c. functions. Thus we will concentrate on the
problem
min f (x) − g(x) subject to x∈C (P 2)
where f, g : Rn → R are convex functions and C ⊂ Rn is a convex set. Note
that f (x) − g(x) need not be a convex function unless g is a linear or affine
function. So in general it is a nonconvex function. We begin by providing a
necessary optimality condition for a local optimal point.

Theorem 12.6 Consider the problem (P 2) and let x̄ be a local minimizer of


(P 2) where C = Rn . Then ∂f (x̄) ∩ ∂g(x̄) 6= ∅.

Proof. Let x̄ be a local minimum. As f − g is locally Lipschitz,

0 ∈ ∂ ◦ (f − g)(x̄).

For details, see Clarke [27] or Chapter 3. Hence, by the Sum Rule of the Clarke
subdifferential,

0 ∈ ∂ ◦ f (x̄) + ∂ ◦ (−g)(x̄).

Noting that ∂ ◦ f (x̄) = ∂f (x̄) and ∂ ◦ (−g)(x̄) = −∂ ◦ g(x̄) = −∂g(x̄), the above
condition becomes

0 ∈ ∂f (x̄) − ∂g(x̄).

This yields that

∂f (x̄) ∩ ∂g(x̄) 6= ∅.

We would again like to stress that for details on the Clarke subdifferential,
see Clarke [27]. 
Let us note that the above condition is only necessary and not sufficient.
Consider h(x) = f (x) − g(x), where f (x) = x2 and g(x) = |x|. At x̄ = 0,
∂f (0) = {0} and ∂g(0) = [−1, 1]. Thus,

∂f (0) ∩ ∂g(0) = {0}.

But it is clear that x̄ = 0 is not a local minimizer of h.


Now let us see what happens if we consider C ⊂ Rn . In this case, one


would have

0 ∈ ∂ ◦ (f − g)(x̄) + NC (x̄),

(see Clarke [27] for more details). Hence,

0 ∈ ∂f (x̄) − ∂g(x̄) + NC (x̄).

Thus there exist ξf ∈ ∂f (x̄), ξg ∈ ∂g(x̄) and η ∈ NC (x̄) such that

ξg = ξf + η.

Thus, the optimality condition can now be stated as follows:


If x̄ is a local minimum for (P 2), then there exists ξg ∈ ∂g(x̄)
such that

ξg ∈ ∂f (x̄) + NC (x̄).

For C = Rn , if x̄ is a global minimum for (P 2),

f (x) − g(x) ≥ f (x̄) − g(x̄), ∀ x ∈ Rn .

Therefore,

f (x) − f (x̄) ≥ g(x) − g(x̄) ≥ hξg , x − x̄i, ∀ ξg ∈ ∂g(x̄),

thereby implying that

∂g(x̄) ⊂ ∂f (x̄).

Note that this is again a necessary condition and not sufficient. We urge the
reader to find an example demonstrating this fact.
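One such example, in the same spirit as the one above: take f (x) = |x| and g(x) = x2 with C = Rn , n = 1. At x̄ = 0 one has ∂g(0) = {0} ⊂ [−1, 1] = ∂f (0), so the necessary condition ∂g(x̄) ⊂ ∂f (x̄) holds, yet x̄ = 0 is not a global minimizer of f − g, since f (2) − g(2) = −2 < 0 = f (0) − g(0).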
We now present interesting and important necessary and sufficient optimal-
ity conditions for the global optimization of problem (P 2). Here the optimality
conditions will be expressed in terms of the ε-subdifferential. We present this
result as given in Bomze [15].

Theorem 12.7 Consider the problem (P 2) with C = Rn . Then x̄ ∈ Rn is a


global minimizer of (P 2) if and only if

∂ε g(x̄) ⊂ ∂ε f (x̄), ∀ ε > 0.

Proof. As x̄ ∈ Rn is a global minimizer of (f − g) over Rn ,

f (x) − f (x̄) ≥ g(x) − g(x̄), ∀ x ∈ Rn .

If ξ ∈ ∂ε g(x̄) for any ε > 0,

f (x) − f (x̄) ≥ g(x) − g(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn ,


thereby implying that ξ ∈ ∂ε f (x̄). Because ε > 0 was arbitrary, this establishes
that

∂ε g(x̄) ⊂ ∂ε f (x̄), ∀ ε > 0.

Let us now look at the converse. On the contrary, assume that x̄ is not a
global minimizer of (P 2), which implies that there exists x̂ ∈ Rn such that

f (x̂) − g(x̂) < f (x̄) − g(x̄).

This yields that

f (x̄) − f (x̂) − g(x̄) + g(x̂) > 0.

Set δ = (1/2)(f (x̄) − f (x̂) − g(x̄) + g(x̂)). It is simple to see that δ > 0.
Now consider ξ̂ ∈ ∂g(x̂), which implies that

g(x̄) − g(x̂) − hξ̂, x̄ − x̂i ≥ 0.

Because δ > 0,
g(x̄) − g(x̂) − hξ̂, x̄ − x̂i + δ > 0.

Set ε = g(x̄) − g(x̂) − hξ̂, x̄ − x̂i + δ. Then for any x ∈ Rn ,
hξ̂, x − x̄i − ε = hξ̂, x − x̂ + x̂ − x̄i − ε = hξ̂, x − x̂i − δ + g(x̂) − g(x̄).

As ξ̂ ∈ ∂g(x̂), it is clear that ξ̂ ∈ ∂δ g(x̂), which leads to

hξ̂, x − x̂i − δ + g(x̂) ≤ g(x).

Thus,
hξ̂, x − x̄i − ε ≤ g(x) − g(x̄), ∀ x ∈ Rn ,

thereby implying that ξ̂ ∈ ∂ε g(x̄). By the given hypothesis, ξ̂ ∈ ∂ε f (x̄). Therefore,
in particular for x = x̂,

f (x̂) − f (x̄) ≥ hξ̂, x̂ − x̄i − ε.

Now

2δ = f (x̄) − f (x̂) − (g(x̄) − g(x̂)) ≤ ε − hξ̂, x̂ − x̄i − (g(x̄) − g(x̂)).

The way in which ε is defined leads to


ε − (g(x̄) − g(x̂)) = δ + hξ̂, x̂ − x̄i.


Hence,
2δ ≤ δ + hξ̂, x̂ − x̄i − hξ̂, x̂ − x̄i = δ < 2δ,

which is a contradiction. Thus, x̄ is indeed a global solution for (P 2). 
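To see Theorem 12.7 at work on a concrete instance, consider again f (x) = x2 and g(x) = |x| on R with x̄ = 0. A short computation shows that ∂ε f (0) = [−2√ε, 2√ε] while ∂ε g(0) = [−1, 1] for every ε ≥ 0. Thus for any ε < 1/4 the inclusion ∂ε g(0) ⊂ ∂ε f (0) fails, and the theorem correctly detects that x̄ = 0 is not a global minimizer of f − g; indeed f (1/2) − g(1/2) = −1/4 < 0 = f (0) − g(0).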


Note that the above result also holds true if we assume f : Rn → R∪{+∞}
and g : Rn → R. In that case, one just has to assume that x̄ ∈ dom f . The
reader is encouraged to sketch the proof for such a scenario. However, we
present the result here for the sake of convenience.

Theorem 12.8 Consider the problem (P 2) with C = Rn and a lower semi-


continuous convex function f : Rn → R ∪ {+∞} with dom f 6= ∅. Let
x̄ ∈ dom f . Then x̄ is a global minimum for (P 2) if and only if

∂ε g(x̄) ⊂ ∂ε f (x̄), ∀ ε > 0.

Using the above result, one can deduce an optimality condition for the case
when C ⊂ Rn and f : Rn → R. Observe that when C ⊂ Rn and f : Rn → R,
the problem (P 2) can be equivalently written as

min (f + δC )(x) − g(x) subject to x ∈ Rn .

Hence, x̄ is a global minimum for (P 2) if and only if

∂ε g(x̄) ⊂ ∂ε (f + δC )(x̄), ∀ ε > 0.

This is done of course by applying Theorem 12.8. Invoking the Sum Rule of
ε-subdifferential, Theorem 2.115,
∂ε (f + δC )(x̄) = ∪_{ε1 ≥0, ε2 ≥0, ε1 +ε2 =ε} (∂ε1 f (x̄) + ∂ε2 δC (x̄)).

Hence,
∂ε g(x̄) ⊂ ∪_{ε1 ≥0, ε2 ≥0, ε1 +ε2 =ε} (∂ε1 f (x̄) + Nε2 ,C (x̄)), ∀ ε > 0.

We just recall that ∂ε2 δC (x̄) = Nε2 ,C (x̄) for any ε2 ≥ 0.


Theorem 12.8 can also be used to deduce necessary and sufficient
optimality conditions for the problem (P 1).

Corollary 12.9 Consider the problem (P 1). Assume that C ⊂ Rn is a closed


convex set. Then x̄ ∈ C is a global maximum for (P 1) if and only if

∂ε f (x̄) ⊂ Nε,C (x̄), ∀ ε > 0.


Proof. Observe that the problem (P 1) can be written as

min − f (x) subject to x ∈ C.

A further equivalent version can be given by

min (δC − f )(x) subject to x ∈ Rn .

Using Theorem 12.8, the optimality condition is

∂ε f (x̄) ⊂ ∂ε δC (x̄), ∀ ε > 0,

that is,

∂ε f (x̄) ⊂ Nε,C (x̄), ∀ ε > 0,

thereby establishing the result. 


We end our discussion and the book here. However for more details on the
use of the above results, see for example Bomze [15], Hiriart-Urruty [60], and
the references therein.



Bibliography

[1] F. A. Al-Khayyal and J. Kyparisis. Finite convergence of algorithms


for nonlinear programs and variational inequalities. J. Optim. Theory
Appl., 70:319–332, 1991.
[2] H. Attouch and H. Brézis. Duality for the sum of convex functions in
general Banach spaces. In Aspects of Mathematics and its Applications,
pages 125–133. Amsterdam, 1986.
[3] H. Attouch, G. Buttazzo, and G. Michaille. Variational Analysis
in Sobolev and BV Spaces: Applications to PDEs and Optimization.
MPS/SIAM Series on Optimization, SIAM, Philadelphia, PA; MPS,
Philadelphia, PA, 2006.
[4] J.-P. Aubin and I. Ekeland. Applied Nonlinear Analysis. Wiley-
Interscience, New York, 1984.
[5] A. Auslender. Optimisation. Méthodes numériques. Maîtrise de Mathématiques
et Applications Fondamentales. Masson, Paris-New York-Barcelona, 1976.
[6] A. Ben-Tal and A. Ben-Israel. Characterizations of optimality in convex
programming: The nondifferentiable case. Applicable Anal., 9:137–156,
1979.
[7] A. Ben-Tal, A. Ben-Israel, and S. Zlobec. Characterization of optimality
in convex programming without a constraint qualification. J. Optim.
Theory Appl., 20:417–437, 1976.
[8] A. Ben-Tal and A. Nemirovskii. Lectures on Modern Convex Optimiza-
tion: Analysis, Algorithms, and Engineering Applications. MPS/SIAM
Series on Optimization, SIAM, Philadelphia, PA, 2001.
[9] A. Ben-Tal, E. E. Rosinger, and A. Ben-Israel. A Helly-type theorem and
semi-infinite programming. In Constructive Approaches to Mathemati-
cal Models, pages 127–135. Academic Press, New York-London-Toronto,
Ont., 1979.
[10] C. Berge. Topological Spaces. Including a Treatment of Multi-Valued
Functions, Vector Spaces and Convexity. Dover Publications, Inc., Mi-
neola, NY, 1997.


[11] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont,


MA, 1999.
[12] D. P. Bertsekas. Convex Analysis and Optimization. Athena Scientific,
Belmont, MA, 2003.
[13] D. P. Bertsekas and A. E. Ozdaglar. Pseudonormality and Lagrange
multipler theory for constrained optimization. J. Optim. Theory Appl.,
114:287–343, 2002.
[14] D. P. Bertsekas, A. E. Ozdaglar, and P. Tseng. Enhanced Fritz John
for convex programming. SIAM J. Optim., 16:766–797, 2006.
[15] I. M. Bomze. Global optimization: A quadratic programming perspec-
tive. In Nonlinear Optimization, Lecture Notes in Math., volume 1989.
Springer-Verlag, Berlin, 2010.
[16] J. M. Borwein. Direct theorems in semi-infinite convex programming.
Math. Programming, 21:301–318, 1981.
[17] J. M. Borwein and A. S. Lewis. Convex Analysis and Nonlinear Opti-
mization: Theory and Examples. CMS Books in Mathematics, Springer-
Verlag, New York, 2000.
[18] J. M. Borwein and Q. J. Zhu. Techniques of Variational Analysis.
Springer, New York, 2005.
[19] A. Brøndsted and R. T. Rockafellar. On the subdifferentiability of con-
vex functions. Proc. Am. Math. Soc., 16:605–611, 1965.
[20] R. S. Burachik and V. Jeyakumar. A dual condition for the convex
subdifferential sum formula with applications. J. Convex Anal., 12:279–
290, 2005.
[21] R. S. Burachik and V. Jeyakumar. A new geometric condition for
Fenchel’s duality in infinite dimensional spaces. Math. Programming,
104:229–233, 2005.
[22] J. V. Burke and S. Deng. Weak sharp minima revisited. Part I: Basic
theory. Control Cybernetics, 31:439–469, 2002.
[23] J. V. Burke and S. Deng. Weak sharp minima revisited. Part II: Ap-
plication to linear regularity and error bounds. Math. Programming,
104:235–261, 2005.
[24] J. V. Burke and S. Deng. Weak sharp minima revisited Part III: Error
bounds for differentiable convex inclusions. Math. Programming, 116:37–
56, 2009.
[25] J. V. Burke and M. C. Ferris. Weak sharp minima in mathematical
programming. SIAM J. Control Optim., 31:1340–1359, 1993.


[26] E. W. Cheney. Introduction to Approximation Theory. McGraw-Hill,


New York, 1966.

[27] F. H. Clarke. Optimization and Nonsmooth Analysis. Wiley Interscience,


New York, 1983.

[28] F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and P. R. Wolenski. Non-


smooth Analysis and Control Theory, volume 178: Graduate Texts in
Mathematics. Springer-Verlag, New York.

[29] L. Cromme. Strong uniqueness. Numer. Math., 29:179–193, 1978.

[30] V. F. Demyanov and A. M. Rubinov. Constructive Nonsmooth Analysis.


Approximation & Optimization. 7: Peter Lang, Frankfurt am Main,
1995.

[31] N. Dinh, M. A. Goberna, and M. A. López. From linear to convex


systems: Consistency, Farkas’ lemma and applications. J. Convex Anal.,
13:279–290, 2006.

[32] N. Dinh, M. A. Goberna, M. A. López, and T. Q. Son. New Farkas-


type constraint qualifications in convex infinite programming. ESAIM
Control Optim. Calc. Var., 13:580–597, 2007.

[33] N. Dinh, B. S. Mordukhovich, and T. T. A. Nghia. Subdifferentials of


value functions and optimality conditions for DC and bilevel infinite and
semi-infinite programs. Math. Programming, 123:101–138, 2010.

[34] N. Dinh, T. T. A. Nghia, and G. Vallet. A closedness condition and


its application to DC programs with convex constraints. Optimization,
59:541–560, 2010.

[35] R. J. Duffin. Convex analysis treated by linear programming. Math.


Programming, 4:125–143, 1973.

[36] J. Dutta. Generalized derivatives and nonsmooth optimization, a finite


dimensional tour. Top, 13:185–314, 2005.

[37] J. Dutta. Necessary optimality conditions and saddle points for approx-
imate optimization in Banach spaces. Top, 13:127–144, 2005.

[38] J. Dutta. Optimality conditions for maximizing a locally Lipschitz func-


tion. Optimization, 54:377–389, 2005.

[39] J. Dutta, K. Deb, R. Arora, and R. Tulshyan. Approximate KKT points:


Theoretical and numerical study. 2010. Preprint.

[40] J. Dutta and C. S. Lalitha. Optimality conditions in convex optimization


revisited. 2010. Preprint.


[41] I. Ekeland. On the variational principle. J. Math. Anal. Appl., 47:324–


353, 1974.
[42] I. Ekeland. Nonconvex minimization problems. Bull. Am. Math. Soc.,
1:443–474, 1979.
[43] I. Ekeland and R. Temam. Convex Analysis and Variational Problems,
volume 1: Studies in Mathematics and its Applications. North-Holland
Publishing Co., Amsterdam-Oxford and American Elsevier Publishing
Co., Inc., New York, 1976.
[44] M. D. Fajardo and M. A. López. Locally Farkas-Minkowski systems in
convex semi-infinite programming. J. Optim. Theory Appl., 103:313–
335, 1999.
[45] W. Fenchel. On conjugate convex functions. Canadian J. Math., 1:73–
77, 1949.
[46] M. C. Ferris. Weak Sharp Minima and Penalty Functions in Mathemat-
ical Programming. University of Cambridge, Cambridge, 1988. Ph. D.
thesis.
[47] M. Florenzano and C. Le Van. Finite Dimensional Convexity and Op-
timization, volume 13: Studies in Economic Theory. Springer-Verlag,
Berlin, 2001.
[48] M. Fukushima. Equivalent differentiable optimization problems and de-
scent methods for asymmetric variational inequality problems. Math.
Programming, 53:99–110, 1992.
[49] M. A. Goberna, V. Jeyakumar, and M. A. López. Necessary and suffi-
cient constraint qualifications for solvability of systems of infinite convex
inequalities. Nonlinear Anal., 68:1184–1194, 2008.
[50] M. A. Goberna and M. A. López. Linear Semi-Infinite Optimization,
volume 2: Wiley Series in Mathematical Methods in Practice. John
Wiley & Sons, Ltd., Chichester, 1998.
[51] M. A. Goberna, M. A. López, and J. Pastor. Farkas-Minkowski systems
in semi-infinite programming. Appl. Math. Optim., 7:295–308, 1981.
[52] F. J. Gould and J. W. Tolle. A necessary and sufficient qualification for
constrained optimization. SIAM J. Appl. Math., 20:164–172, 1971.
[53] F. J. Gould and J. W. Tolle. Geometry of optimality conditions and
constraint qualifications. Math. Programming, 2:1–18, 1972.
[54] M. Guignard. Generalized Kuhn-Tucker conditions for mathematical
programming problems in a Banach space. SIAM J. Control, 7:232–
241, 1969.


[55] M. R. Hestenes. Optimization Theory: The Finite Dimensional Case.


Wiley, New York, 1975.
[56] R. Hettich. A review of numerical methods for semi-infinite optimiza-
tion. In Semi-Infinite Programming and Applications, Lecture Notes
in Econom. and Math. System, volume 215, pages 158–178. Springer-
Verlag, Berlin, 1983.
[57] R. Hettich and K. O. Kortanek. Semi-infinite programming: theory,
methods and applications. SIAM Review, 35:380–429, 1993.
[58] J.-B. Hiriart-Urruty. ε-Subdifferential Calculus. In Convex Analysis
and Optimization, pages 43–92. Pitman, London, 1982.
[59] J.-B. Hiriart-Urruty. What conditions are satisfied at points minimizing
the maximum of a finite number of differentiable functions? In Non-
smooth Optimization: Methods and Applications (Erice, 1991), pages
166–174. Gordon and Breach, Montreux, 1992.
[60] J. B. Hiriart-Urruty. Global optimality conditions for maximizing a
convex quadratic function under convex quadratic constraints. J. Global
Optim., 21:445–455, 2001.
[61] J. B. Hiriart-Urruty and Y. S. Ledyaev. A note on the characterization
of the global maxima of a (tangentially) convex function over a convex
set. J. Convex Anal., 3:55–61, 1996.
[62] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Min-
imization Algorithms I & II, volume 306: Fundamental Principles of
Mathematical Sciences. Springer-Verlag, Berlin, 1993.
[63] J.-B. Hiriart-Urruty and C. Lemaréchal. Fundamentals of Convex
Analysis. Grundlehren Text Editions, Springer-Verlag, Berlin, 2001.
[64] J. B. Hiriart-Urruty and R. R. Phelps. Subdifferential calculus using
ε-subdifferentials. J. Funct. Anal., 118:154–166, 1993.
[65] J. Y. Jaffray and J. Ch. Pomerol. A direct proof of the Kuhn-Tucker
necessary optimality theorem for convex and affine inequalities. SIAM
Review, 31:671–674, 1989.
[66] V. Jeyakumar. Asymptotic dual conditions characterizing optimality for
infinite convex programs. J. Optim. Theory Appl., 93:153–165, 1997.
[67] V. Jeyakumar. Characterizing set containments involving infinite convex
constraints and reverse-convex constraints. SIAM J. Optim., 13:947–
959, 2003.
[68] V. Jeyakumar, G. M. Lee, and N. Dinh. New sequential Lagrange mul-
tiplier conditions characterizing optimality without constraint qualifica-
tion for convex programs. SIAM J. Optim., 14:534–547, 2003.


[69] V. Jeyakumar and G. Y. Li. Farkas’ lemma for separable sublinear


inequalities without qualifications. Optim. Lett., 3:537–545, 2009.

[70] V. Jeyakumar, A. M. Rubinov, B. M. Glover, and Y. Ishizuka. Inequality


systems and global optimization. J. Math. Anal. Appl., 202:900–919,
1996.

[71] F. John. Extremum problems with inequalities as subsidiary conditions.


In Studies and Essays Presented to R. Courant on His 60th Birthday,
pages 187–204. Interscience Publishers, Inc., New York, 1948.

[72] V. L. Klee. The critical set of a convex body. Am. J. Math., 75:178–188,
1953.

[73] H. W. Kuhn and A. W. Tucker. Nonlinear programming. In Proceed-


ings of the Second Berkeley Symposium on Mathematical Statistics and
Probability, 1950, pages 481–492. University of California Press, Berke-
ley and Los Angeles, 1951.

[74] J. B. Lasserre. On representation of the feasible set in convex optimiza-


tion. Optim. Lett., 4:1–5, 2010.

[75] M. A. López and E. Vercher. Optimality conditions for nondifferen-


tiable convex semi-infinite programming. Math. Programming, 27:307–
319, 1983.

[76] P. Loridan. Necessary conditions for ε-optimality. Math. Programming


Stud., 19:140–152, 1982.

[77] P. Loridan and J. Morgan. Penalty function in ε-programming and


ε-minimax problems. Math. Programming, 26:213–231, 1983.

[78] D. T. Luc, N. X. Tan, and P. N. Tinh. Convex vector functions and


their subdifferential. Acta Math. Vietnam., 23:107–127, 1998.

[79] R. Lucchetti. Convexity and Well-Posed Problems. Springer Science +


Business Media, Inc., New York, 2006.

[80] D. G. Luenberger. Optimization by Vector Space Methods. John Wiley


& Sons, Inc., New York, 1968.

[81] T. L. Magnanti. Fenchel and Lagrange duality are equivalent. Math.


Programming, 7:253–258, 1974.

[82] O. L. Mangasarian. Nonlinear Programming. McGraw-Hill Book Com-


pany, New York, 1969.

[83] E. J. McShane. The Lagrange multiplier rule. Am. Math. Monthly,


80:922–925, 1973.


[84] B. S. Mordukhovich. Approximation and maximum principle for non-


smooth problems of optimal control. Russian Math. Surveys, 196:263–
264, 1977.

[85] B. S. Mordukhovich. Metric approximations and necessary optimality


conditions for general class of nonsmooth extremal problems. Soviet
Math. Doklady, 22:526–530, 1980.

[86] B. S. Mordukhovich. Variational Analysis and Generalized Differentia-


tion I & II. Springer-Verlag, Berlin, 2006.

[87] J. J. Moreau. Fonctions convexes en dualité. Séminaire de Mathématiques
de la Faculté des Sciences de Montpellier, (1), 1962.

[88] J. J. Moreau. Convexity and duality. In Functional Analysis and Opti-


mization, pages 145–169. Academic Press, New York, 1966.

[89] J. J. Moreau. Inf-convolution, sous-additivité, convexité des fonctions


numériques. J. Math. Pures Appl., 49:109–154, 1970.

[90] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic


Course. In Applied Optimization, volume 87. Kluwer Academic Publish-
ers, 2004.

[91] A. E. Ozdaglar. Pseudonormality and a Lagrange Multiplier Theory for


Constrained Optimization. Mass. Institute of Technology, Cambridge,
MA, 2003. Ph. D. thesis.

[92] A. E. Ozdaglar and D. P. Bertsekas. The relation between pseudonor-


mality and quasiregularity in constrained optimization. Optim. Methods
Softw., 19:493–506, 2004.

[93] R. R. Phelps. Convex Functions, Monotone Operators and Differen-


tiability, volume 1364: Lecture Notes in Mathematics. Springer-Verlag,
Berlin.

[94] B. T. Polyak. Sharp Minima. In Institute of Control Sciences Lecture


Notes, Moscow, 1979. Presented at the IIASA Workshop on Generalized
Lagrangians and Their Applications, IIASA, Laxenburg, Austria.

[95] B. T. Polyak. Introduction to Optimization. Optimization Software,


Inc., Publications Division, New York, 1987.

[96] B. N. Pshenichnyi. Necessary Conditions for an Extremum, volume 4:


Pure and Applied Mathematics. Marcel Dekker, Inc., New York, 1971.

[97] R. T. Rockafellar. Convex Analysis, volume 28: Princeton Mathematical


Series. Princeton University Press, Princeton, NJ, 1970.


[98] R. T. Rockafellar. Some convex programs whose duals are linearly con-
strained. In Nonlinear Programming (Proc. Sympos., Univ. of Wiscon-
sin, Madison, Wis., 1970), pages 293–322. Academic Press, New York,
1970.

[99] R. T. Rockafellar. Conjugate Duality and Optimization. Society for


Industrial and Applied Mathematics, Philadelphia, 1974.

[100] R. T. Rockafellar. Lagrange multipliers and optimality. SIAM Review,


35:183–238, 1993.

[101] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis, volume


317: Fundamental Principles of Mathematical Sciences. Springer-Verlag,
Berlin.

[102] A. Ruszczynski. Nonlinear Optimization. Princeton University Press,


Princeton, NJ, 2006.

[103] R. Schneider. Convex Bodies: The Brunn–Minkowski Theory, volume


44: Encyclopedia of Mathematics and its Applications. Cambridge Uni-
versity Press, Cambridge, 1993.

[104] A. S. Strekalovskiĭ. On the problem of the global extremum. Soviet


Math. Doklady, 292:1062–1066, 1987.

[105] A. S. Strekalovskiĭ. Search for the global maximum of a convex func-


tional on an admissible set. Comp. Math. Math. Phys., 33:349–363,
1993.

[106] J.-J. Strodiot, V. H. Nguyen, and N. Heukemes. ε-Optimal solutions


in nondifferentiable convex programming and some related questions.
Math. Programming, 25:307–328, 1983.

[107] T. Strömberg. The operation of infimal convolution. Dissertationes


Math., 352:1–61, 1996.

[108] L. Thibault. Sequential convex subdifferential calculus and sequential


Lagrange multipliers. SIAM J. Control Optim., 35:1434–1444, 1997.

[109] M. Valadier. Sous-différentiels d'une borne supérieure et d'une somme


continue de fonctions convexes. C. R. Acad. Sci. Paris, 268:39–42, 1969.

[110] J. van Tiel. Convex Analysis. An Introductory Text. John Wiley & Sons,
Inc., New York, 1984.

[111] R.-J. Wets. Elementary constructive proofs of the theorems of Farkas,


Minkowski and Weyl. In Economic Decision Making: Games, Econo-
metrics and Optimization, pages 427–432. Elsevier-Science, Amsterdam,
1990.


[112] H. Wolkowicz. Geometry of optimality conditions and constraint quali-


fications: The convex case. Math. Programming, 19:32–60, 1980.

[113] K. Yokoyama. ε-Optimality criteria for convex programming problems


via exact penalty functions. Math. Programming, 56:233–243, 1992.
[114] W. I. Zangwill. Non-linear programming via penalty functions. Manag.
Sci., 13:344–358, 1967.



Index

S-convex function, 161

ε-complementary slackness, 342
ε-feasible set, 338
ε-maximum solution, 348
ε-minimum solution, 348
ε-normal set, 123, 339
ε-saddle point, 345
ε-solution, 135, 337
ε-subdifferential, 96, 122, 123, 338, 409

Abadie constraint qualification, 154, 214, 270, 375
abstract constraint, 157
active constraint multipliers, 400
active index set, 58, 104, 146, 213, 249, 255, 316, 367
affine combination, 31
affine function, 3
affine hull, 31
affine set, 24
affine support, 113
almost ε-solution, 236, 338, 348, 350
approximate optimality conditions, 337
approximate solutions, 337
approximate up to ε, 337

badly behaved constraints, 255
biconjugate function, 114
bilevel programming, 308
bipolar cone, 50
blunt cone, 243
Bolzano–Weierstrass Theorem, 5
bounded sequence, 5
bounded-valued map, 15
Brønsted–Rockafellar Theorem, 128

canonically closed system, 389
Carathéodory Theorem, 28
Cauchy–Schwarz inequality, 4
CC qualification condition, 305
Chain Rule, 101, 164
Clarke directional derivative, 320
Clarke generalized gradient, 163, 289
Clarke Jacobian, 163
Clarke subdifferential, 163, 289, 320
closed cone constraint qualification, 281, 300, 395
closed convex hull of function, 70
closed function, 10
closed graph theorem, 93
closed half space, 23, 44
closed map, 15
closed-valued map, 15
closure of function, 10, 77, 79
closure of set, 4, 31
coercive, 12, 351
complementary slackness, 151, 172
complementary violation condition, 212
concave function, 63
cone, 40, 243
cone constrained problem, 161, 162
cone convex function, 162
cone generated by set, 40
conic combination, 40
conjugate function, 68, 111, 112, 114, 198
consequence relation, 383
constraint qualification, 213
continuous, 6, 75
convergent sequence, 4
convex analysis, 2, 23
convex combination, 25
convex cone, 39, 40
convex cone generated by set, 40
convex function, 2, 3, 62, 286
convex hull, 27
convex hull of function, 70
convex locally Farkas–Minkowski problem, 378
convex optimization, 2, 315
convex programming, 3
convex set, 2, 3, 23
convex-valued map, 15
core, 39

d.c. function, 403, 408
derivative, 13, 14, 85
direction of constancy, 244
direction of recession, 41
direction sets, 243
directional derivative, 85, 320
distance function, 65
domain, 3, 62
dual problem, 170, 185, 235, 361
duality, 170
Dubovitskii–Milyutin Theorem, 252

Ekeland’s variational principle, 127, 135, 355
enhanced dual Fritz John condition, 235
enhanced Fritz John optimality condition, 207, 208
epigraph, 4, 62, 75, 136, 286
equality set, 255
error bound, 19
exact penalty function, 350
extended-valued function, 3

faithfully convex function, 244
Farkas’ Lemma, 275, 398
Farkas–Minkowski (FM) constraint qualification, 300
Farkas–Minkowski (FM) system, 383
Farkas–Minkowski qualification, 382, 383, 386
feasible direction, 54
feasible direction cone, 377
Fenchel duality, 196
Fenchel–Young inequality, 116
finitely generated cone, 61
finitely generated set, 61
Fritz John optimality condition, 207

gap function, 19
generalized Lagrange multiplier, 190
generators of cone, 61
generators of set, 61
geometric optimality condition, 249, 255
Gordan’s theorem of alternative, 379
gradient, 13
graph, 15

Helly’s Theorem, 49
Hessian, 14
hyperplane, 23, 44

improper function, 62, 75
indicator function, 65, 123
Inf-Convolution Rule, 118
infimal/inf-convolution, 68
Infimum Rule, 118
inner product, 4
interior, 4, 31

Jacobian, 14
Jensen’s inequality, 64

Karush–Kuhn–Tucker (KKT) optimality condition, 2, 151
Karush–Kuhn–Tucker multiplier, 151

Lagrange multiplier, 1, 146, 151, 188
Lagrangian duality, 185
Lagrangian function, 172, 238, 345
Lagrangian regular point, 374, 375
limit infimum of function, 6
limit infimum of sequence, 5
limit point, 5
limit supremum of function, 6
limit supremum of sequence, 5
line segment principle, 32
linear programming, 1, 25, 61, 327
linear semi-infinite system, 383
linearity criteria, 213, 221
Lipschitz constant, 82, 163
Lipschitz function, 2, 82, 320
locally bounded map, 15
locally Lipschitz function, 82, 163
lower limit of function, 6
lower limit of sequence, 5
lower semicontinuous (lsc), 5
lower semicontinuous hull, 10
lower-level set, 8

Mangasarian–Fromovitz constraint qualification, 317
marginal function, 190
max-function, 104, 159, 342
Max-Function Rule, 106, 132
maximal monotone, 95
Mean Value Theorem, 14
merit function, 19
metric approximation, 209
minimax equality, 170
minimax inequality, 170
modified ε-KKT conditions, 358
modified ε-KKT point, 358
modified Slater constraint qualification, 176, 183
monotone, 18
multifunction, 93

nonconvex optimization, 403
nondecreasing function, 100, 286
nondegeneracy condition, 316, 321
nonsmooth function, 13, 243, 320
nonsmooth optimization, 2
norm, 4
normal cone, 40, 54, 57, 89

open ball, 4
open half space, 24, 44
orthogonal complement, 48

parameter, 199
parameterized family, 199
penalty function, 209
polar cone, 40, 50
polyhedral cone, 61
polyhedral set, 25, 58, 60
positive cone, 365
positive polar cone, 53
positively homogeneous, 71
primal problem, 170
product space, 365
projection, 65
prolongation principle, 32
proper function, 3, 62, 75
proper map, 15
proper separation, 45, 221
pseudonormality, 213, 220

quasi ε-solution, 338, 355
quasinormality, 215
quasiregularity, 215

Rademacher Theorem, 163
recession cone, 41
regular ε-solution, 338
regular function, 320
regular point, 270
regularization condition, 254
relative interior, 31
relaxed ε-complementary slackness, 346, 358
relaxed Slater constraint qualification, 372
right scalar multiplication, 339

saddle point, 169, 170
saddle point condition, 170, 216
Saddle Point Theorem, 171
Scalar Product Rule, 131
second-order derivative, 14
semi-infinite programming, 365
separable sublinear function, 274
separating hyperplane, 44
separation theorem, 44, 45
Sequential Chain Rule, 286, 290
sequential optimality conditions, 243, 281, 291, 395
Sequential Sum Rule, 282
set-valued map, 15, 93
sharp minimum, 327
Slater constraint qualification, 145, 146, 167, 172, 214, 254, 272, 302, 316, 339, 366
Slater-type constraint qualification, 157, 162, 213, 221, 236
slope inequality, 86
smooth function, 13, 243, 315
strict convex function, 63
strict epigraph, 64
strict separation, 45
strong duality, 186
strongly convex function, 19
strongly convex optimization, 19
strongly monotone, 20
strongly unique local minimum, 327
subadditive, 71
subdifferential, 89, 162
subdifferential calculus, 98
subgradient, 89, 162
sublinear function, 66, 71, 274, 320
subsequence, 5
Sum Rule, 98, 118, 129, 137, 163
sup-function approach, 366
support function, 66, 71, 72
support set, 113, 366
supporting hyperplane, 45
Supremum Rule, 118, 137

tangent cone, 40, 54
two-person-zero-sum game, 169

upper limit of function, 6
upper limit of sequence, 5
upper semicontinuous (usc), 6, 94
upper semicontinuous (usc) map, 15

Valadier Formula, 107
value function, 190, 197
value of game, 170
variational inequality, 17

weak duality, 186
weak sharp minimum, 327, 328
weakest constraint qualification, 270
Weierstrass Theorem, 12