You are on page 1of 445

OPTIMALITY CONDITIONS 

IN  
CONVEX OPTIMIZATION
A Finite-Dimensional View

© 2012 by Taylor & Francis Group, LLC

K13102_FM.indd 1 9/2/11 11:33 AM


OPTIMALITY CONDITIONS 
IN  
CONVEX OPTIMIZATION
A Finite-Dimensional View

Anulekha Dhara
Joydeep Dutta

© 2012 by Taylor & Francis Group, LLC

K13102_FM.indd 3 9/2/11 11:33 AM


CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2012 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works


Version Date: 20110831

International Standard Book Number-13: 978-1-4398-6823-2 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information stor-
age or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copy-
right.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that pro-
vides licenses and registration for a variety of users. For organizations that have been granted a pho-
tocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com

© 2012 by Taylor & Francis Group, LLC


In memory of
Professor M. C. Puri
and
Professor Alex M. Rubinov

© 2012 by Taylor & Francis Group, LLC


Contents

List of Figures xi

Symbol Description xiii

Foreword xv

Preface xvii

1 What Is Convex Optimization? 1


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Smooth Convex Optimization . . . . . . . . . . . . . . . . . 15

2 Tools for Convex Optimization 23


2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2 Convex Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.2.1 Convex Cones . . . . . . . . . . . . . . . . . . . . . . . 39
2.2.2 Hyperplane and Separation Theorems . . . . . . . . . 44
2.2.3 Polar Cones . . . . . . . . . . . . . . . . . . . . . . . . 50
2.2.4 Tangent and Normal Cones . . . . . . . . . . . . . . . 54
2.2.5 Polyhedral Sets . . . . . . . . . . . . . . . . . . . . . . 60
2.3 Convex Functions . . . . . . . . . . . . . . . . . . . . . . . . 62
2.3.1 Sublinear and Support Functions . . . . . . . . . . . . 71
2.3.2 Continuity Property . . . . . . . . . . . . . . . . . . . 75
2.3.3 Differentiability Property . . . . . . . . . . . . . . . . 85
2.4 Subdifferential Calculus . . . . . . . . . . . . . . . . . . . . . 98
2.5 Conjugate Functions . . . . . . . . . . . . . . . . . . . . . . . 111
2.6 ε-Subdifferential . . . . . . . . . . . . . . . . . . . . . . . . . 122
2.7 Epigraphical Properties of Conjugate Functions . . . . . . . 136

3 Basic Optimality Conditions Using the Normal Cone 143


3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
3.2 Slater Constraint Qualification . . . . . . . . . . . . . . . . . 145
3.3 Abadie Constraint Qualification . . . . . . . . . . . . . . . . 154
3.4 Convex Problems with Abstract Constraints . . . . . . . . . 157
3.5 Max-Function Approach . . . . . . . . . . . . . . . . . . . . 159
3.6 Cone-Constrained Convex Programming . . . . . . . . . . . 161

vii

© 2012 by Taylor & Francis Group, LLC


viii Contents

4 Saddle Points, Optimality, and Duality 169


4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
4.2 Basic Saddle Point Theorem . . . . . . . . . . . . . . . . . . 171
4.3 Affine Inequalities and Equalities and Saddle Point Condition 175
4.4 Lagrangian Duality . . . . . . . . . . . . . . . . . . . . . . . 185
4.5 Fenchel Duality . . . . . . . . . . . . . . . . . . . . . . . . . 196
4.6 Equivalence between Lagrangian and Fenchel Duality . . . . 200

5 Enhanced Fritz John Optimality Conditions 207


5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
5.2 Enhanced Fritz John Conditions Using the Subdifferential . 208
5.3 Enhanced Fritz John Conditions under Restrictions . . . . . 216
5.4 Enhanced Fritz John Conditions in the Absence of Optimal
Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
5.5 Enhanced Dual Fritz John Optimality Conditions . . . . . . 235

6 Optimality without Constraint Qualification 243


6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
6.2 Geometric Optimality Condition: Smooth Case . . . . . . . . 249
6.3 Geometric Optimality Condition: Nonsmooth Case . . . . . . 255
6.4 Separable Sublinear Case . . . . . . . . . . . . . . . . . . . . 274

7 Sequential Optimality Conditions 281


7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 281
7.2 Sequential Optimality: Thibault’s Approach . . . . . . . . . 282
7.3 Fenchel Conjugates and Constraint Qualification . . . . . . . 293
7.4 Applications to Bilevel Programming Problems . . . . . . . . 308

8 Representation of the Feasible Set and KKT Conditions 315


8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
8.2 Smooth Case . . . . . . . . . . . . . . . . . . . . . . . . . . . 315
8.3 Nonsmooth Case . . . . . . . . . . . . . . . . . . . . . . . . . 320

9 Weak Sharp Minima in Convex Optimization 327


9.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 327
9.2 Weak Sharp Minima and Optimality . . . . . . . . . . . . . . 328

10 Approximate Optimality Conditions 337


10.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
10.2 ε-Subdifferential Approach . . . . . . . . . . . . . . . . . . . 338
10.3 Max-Function Approach . . . . . . . . . . . . . . . . . . . . 342
10.4 ε-Saddle Point Approach . . . . . . . . . . . . . . . . . . . . 345
10.5 Exact Penalization Approach . . . . . . . . . . . . . . . . . . 350
10.6 Ekeland’s Variational Principle Approach . . . . . . . . . . . 355
10.7 Modified ε-KKT Conditions . . . . . . . . . . . . . . . . . . 358
10.8 Duality-Based Approach to ε-Optimality . . . . . . . . . . . 361

© 2012 by Taylor & Francis Group, LLC


Contents ix

11 Convex Semi-Infinite Optimization 365


11.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 365
11.2 Sup-Function Approach . . . . . . . . . . . . . . . . . . . . . 366
11.3 Reduction Approach . . . . . . . . . . . . . . . . . . . . . . . 368
11.4 Lagrangian Regular Point . . . . . . . . . . . . . . . . . . . . 374
11.5 Farkas–Minkowski Linearization . . . . . . . . . . . . . . . . 382
11.6 Noncompact Scenario: An Alternate Approach . . . . . . . . 395

12 Convexity in Nonconvex Optimization 403


12.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
12.2 Maximization of a Convex Function . . . . . . . . . . . . . . 403
12.3 Minimization of d.c. Functions . . . . . . . . . . . . . . . . . 408

Bibliography 413

Index 423

© 2012 by Taylor & Francis Group, LLC


List of Figures

1.1 Lower semicontinuous hull. . . . . . . . . . . . . . . . . . . . 11


1.2 Graph of a real-valued differentiable convex function. . . . . . 16
1.3 Local minimizer is global minimizer. . . . . . . . . . . . . . . 18

2.1 Convex and nonconvex sets. . . . . . . . . . . . . . . . . . . . 24


2.2 F1 , F2 , and F1 ∩ F2 are convex while F1c , F2c , and F1 ∪ F2 are
nonconvex. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.3 Line segment principle. . . . . . . . . . . . . . . . . . . . . . . 33
2.4 Tangent cone. . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.5 Normal cone. . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
2.6 Graph and epigraph of convex function. . . . . . . . . . . . . 63
2.7 Epigraphs of improper functions φ1 and φ2 . . . . . . . . . . . 76
2.8 Graph of ∂(|.|). . . . . . . . . . . . . . . . . . . . . . . . . . . 126
2.9 Graph of ∂1 (|.|). . . . . . . . . . . . . . . . . . . . . . . . . . 126

3.1 NC (x̄) is not polyhedral. . . . . . . . . . . . . . . . . . . . . . 153


3.2 C ∩Y. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

5.1 Pseudonormality. . . . . . . . . . . . . . . . . . . . . . . . . . 222


5.2 Not pseudonormal. . . . . . . . . . . . . . . . . . . . . . . . . 223

9.1 Pictorial representation of Theorem 9.6. . . . . . . . . . . . . 336

xi

© 2012 by Taylor & Francis Group, LLC


Symbol Description

∅ empty set k.k norm


∞ infinity φ(F ) image space of F under φ
N set of natural numbers gph Φ graph of set-valued map Φ
R real line dom φ effective domain of φ : X
R̄ R ∪ {−∞, +∞} → R̄
Rn n-dimensional Euclidean epi φ epigraph of φ
space lev≤α φ α-lower level set of φ
R+ nonnegative orthant of R δF indicator function to F
Rn+ nonnegative orthant of Rn dF distance function to F
[x, y] closed line segment joining projF (x̄) projection of x̄ to F
x and y σ( . ; F ) support function to F
(x, y) open line segment joining φ∗ conjugate function of φ
x and y Q φ+ (x) max{0, φ(x)}
RI product space I R ∇φ(x̄) derivative or gradient of φ
R[I] {λ ∈ RI : λi 6= 0 for finitely at x̄
many i ∈ I} φ◦ (x̄; d) Clarke directional derivative
[I]
R+ positive cone in R[I] of φ at x̄ in the direction d
supp λ {i ∈ I : λ ∈ R[I] , λi 6= 0} ∂φ
partial derivative of φ with
B open unit ball ∂x
Bδ (x̄) open ball with radius δ > 0 respect to x
and center at x̄ ∂2φ
second-order partial deriva-
cl F closure of F ∂xi ∂xj
co F convex hull of F tive of φ with respect to xi
cl co F closed convex hull of F and xj
af f F affine hull of F ∂φ(x̄) convex subdifferential of φ
int F interior of F at x̄
ri F relative interior of F ∂ǫ φ(x̄) ǫ-subdifferential of φ at x̄
cone F cone generated by F ∂ ◦ φ(x̄) Clarke subdifferential or
F+ positive polar cone of F generalized gradient of φ at
F◦ polar cone of F x̄
x → x̄ x converges to x̄ ∇2 φ(x̄) Hessian of φ at x̄
lim limit Jφ(x̄) Jacobian of φ at x̄
lim inf limit infimum TF (x̄) tangent cone to F at x̄
lim sup limit supremum NF (x̄) normal cone to F at x̄
h., .i inner product Nǫ,F (x̄) ǫ-normal set to F at x̄

xiii

© 2012 by Taylor & Francis Group, LLC


Foreword

The roots of the mathematical topic of optimization go back to ancient Greece,


when Euclid considered the minimal distance of a point to a line; convex
sets were investigated by Minkowski about a hundred years ago, and fifty
years ago, J.-J. Moreau [87] defined the notion of the subdifferential of a
convex function. In 1970, R.T. Rockafellar wrote his monograph [97] on convex
analysis. Since then, the field of convex optimization and convex analysis has
developed rapidly, a huge number of papers on that topic have been published
in scientific journals and a large number of monographs and textbooks have
been produced. Now, we have a new book at hand and one can ask why read
this book.
A recent topic of research in mathematical optimization is the need to
compute global optima of nonconvex problems. To do that, the problem can
be convexified using the optimal function value of the resulting convex opti-
mization problem as a bound for the problem investigated. Combining this
with an enumeration idea the problem can be solved. The same approach of
convexification plus enumeration can serve as a way to solve mixed-integer
nonlinear optimization problems which is a second challenging problem of re-
cent and future research. Moreover, many practical situations lead directly to
convex programming problems. Hence the need to develop a deep insight into
convex optimization.
The theory of convex differentiable optimization is well established. Every
student will be introduced in basic courses on mathematical optimization to
the Fritz John and Karush–Kuhn–Tucker necessary optimality conditions. For
guaranteeing the Karush–Kuhn–Tucker conditions a constraint qualification
such as the Slater condition is needed. But, in many applications, this condi-
tion is violated. There are a larger number of ways out in such a situation:
Abadie constraint qualification can be supposed, sequential optimality condi-
tions can be used or we can try to filter out full information of (enhanced)
Fritz John necessary optimality conditions. These nonstandard but essential
parts of the theory of convex optimization need to be described in detail and
in close relation to each other.
Nonsmooth analysis (see, for example, Mordukhovich [86]) is a quickly
developing area in mathematical optimization. The initial point of nonsmooth
analysis is convex analysis, but recent developments in nonsmooth analysis are
a good influence on convex analysis.
The aim of this book is to develop deep insight into the theory of convex

xv

© 2012 by Taylor & Francis Group, LLC


xvi Foreword

optimization, combining very recent ideas of nonsmooth analysis with stan-


dard and nonstandard theoretical results. Lagrange and Fenchel duality use
different tools and can be applied successfully in distinct directions. But in
the end, both are shown to coincide.
If, at an optimal solution, no constraint qualification is satisfied, algorithms
solving the Karush–Kuhn–Tucker conditions cannot be used to compute this
point. And, how to characterize such a point? Roughly speaking one idea is
the existence of a sequence outside of the feasible set with smaller objective
function values converging to that point. These are the enhanced Fritz John
necessary optimality conditions. A second idea is to characterize optimality
via subgradients of the regular Lagrangian function at perturbed points con-
verging to zero. This is the sequential optimality condition. Both optimality
conditions work without constraint qualifications. ε-optimal solutions can be
characterized using ε-subgradients.
One special convex optimization problem is also investigated. This is the
problem to find a best point within the set of optimal solutions of a convex
optimization problem. If the objective function is convex, this is a convex
optimization problem called a simple bilevel programming problem. It is easy
to see that standard regularity conditions are violated at every feasible point.
For this problem, a very general constraint qualification is derived.
A last question is if convexity can successfully be used to investigate non-
convex problems as the maximization of convex functions or the minimization
of a function being the difference of convex functions.
Algorithmic approaches for solving convex optimization problems are not
described in this book. This results in much more space for theoretical proper-
ties. The result is a book illuminating not only the body but also the bounds
and corners of the theory of convex optimization. Many of the results pre-
sented are usually not contained in books on this topic. But, if more and
more (applied) difficult optimization problems need to be solved, we are more
likely be faced with instances where usual approaches fail. Then it is necessary
to search away from standard tools for applicable approaches. I am sure that
this book will be very helpful.
I deeply recommend this book for advanced reading.

Stephan Dempe
Freiberg, Germany

© 2012 by Taylor & Francis Group, LLC


Preface

This is a book on convex optimization. More precisely it is a book on the re-


cent advances in the theory of optimality conditions for convex optimization.
The question is why should one need an additional book on the subject? How-
ever, possibly the books on convex analysis are much more in number than the
ones on convex optimization. In the books dealing with convex analysis, like
the classic Convex Analysis by Rockafellar [97] or the the more recent Convex
Analysis and Nonlinear Optimization by Borwein and Lewis [17], one would
find convex optimization theory appears as an application to various results
of convex analysis. However, from 1970 until now there has been a growing
body of research in the area of optimality conditions for a convex optimiza-
tion. Many of these results address the question as to what happens when
the Slater condition fails for a convex optimization problem or are there very
general constraint qualification conditions which hold even if the the most pop-
ular ones fail? The books on convex analysis usually do not present results of
this type and thus these results remain largely scattered in the vast literature
on convex optimization. On the other hand, the books dealing with convex
optimization largely focus on algorithms or algorithms and theory associated
with a certain special class of problems like second-order conic programming
or semidedfinite programming. Some recent books like Introductory Lectures
in Convex Optimization by Nesterov [90] or Lectures on Modern Convex Opti-
mization by Ben-Tal and Nemirovskii [8] deal with algorithms and the special
problems, respectively.
This book has a completely different focus. It deals with optimality con-
ditions in convex optimization. It attempts to bring in most of the important
and recent results in this area that are scattered in the literature. However,
we do not ignore the required convex analysis either. We provide a detailed
chapter on the main convex analytic tools and also provide some new results
that have appeared recently in the literature. These results are usually not
found in standard books on convex analysis but they are essential in devel-
oping many of the important results in this book. This book actually began
as a survey paper but then we realized that it has too much material to be
considered as a survey; and then we thought of converting the survey paper
into the form of a monograph.
We would look to thank the many people who encouraged us to write
the book. Professor Stephan Dempe agreed very kindly to write the foreword.
Professor Boris Mordukhovich, Professor Suresh Chandra, Professor Juan En-

xvii

© 2012 by Taylor & Francis Group, LLC


xviii Preface

rique Martinez-Legaz also encouraged us to go ahead and write the book. We


are indeed grateful to them. We would also like to thank Aastha Sharma of
Taylor & Francis, India, for superb handling of the whole book project and
Shashi Kumar from the helpdesk of Taylor & Francis for helping with the
formatting. We would also like to extend our deepest gratitude to our families
for their support. Joydeep Dutta would like to thank his daughter Naina and
his wife Lalty for their understanding and patience during the time this book
was written. Anulekha Dhara would like to express her deepest and sincer-
est regards and gratitude to her parents Dr. Madhu Sudan Dhara and Dolly
Dhara for their understanding and support. She would also like to thank the
National Board for Higher Mathematics, Mumbai, India, for providing finan-
cial support during her tenure at the Indian Institute of Technology Kanpur,
India.
The book is intended for research mathematicians in convex optimization
and also for graduate students in the area of optimization theory. This could be
of interest also to the practitioner who might be interested in the development
of the theory. We have tried our best to make the book free of errors. But to
err is human, so we take the responsibility for any errors the readers might
find in the book. We would also like to request that readers communicate
with us by email at the address: jdutta@iitk.ac.in. We sincerely hope that the
young researchers in the field of optimization will find this book helpful.

Anulekha Dhara
Avignon, France

Joydeep Dutta
Kanpur, India

© 2012 by Taylor & Francis Group, LLC


Chapter 1
What Is Convex Optimization?

1.1 Introduction
Optimization is the heart of applied mathematics. Various problems encoun-
tered in the areas of engineering, sciences, management science, and economics
are based on the fundamental idea of mathematical formulation. Optimiza-
tion is an essential tool for the formulation of many such problems expressed
in the form of minimization of a function under certain constraints like in-
equalities, equalities, and/or abstract constraints. It is thus rightly considered
a science of selecting the best of the many possible decisions in a complex
real-life environment.
Even though optimization problems have existed since very early times,
the optimization theory has settled as a solid and autonomous field only in
recent decades. The origin of analytic optimization lies in the classical calculus
of variations and is interrelated with the development of calculus. The very
concept of derivative introduced by Fermat in the mid-seventeenth century via
the tangent slope to the graph of a function was motivated by solving an op-
timization problem, leading to the Fermat stationary principle. Around 1684,
Leibniz developed a method to distinguish between minima and maxima via
second-order derivatives. The calculus of variations was introduced by Euler
while solving the Brachistochrone problem, which was posed by Bernoulli in
1696. The problem is stated as “Given two points x and y in the vertical plane.
A particle is allowed to move under its own gravity from x to y. What should
be the curve along which the particle should move so as to reach y from x in
the shortest time?” In 1759, Lagrange gave a completely different approach
to solve the problems in calculus of variations, today known as the Lagrange
multiplier rule. The Lagrange multipliers are viewed as the auxiliary variables
that are primarily used to derive the optimality conditions for constrained
optimization problems. These optimality conditions are the building blocks of
optimization theory.
During the second world war, Dantzig developed the simplex method to
solve linear programming problems. The first attempt to develop the La-
grange multiplier rules for nonlinear optimization problem was made by Fritz
John [71] in 1948. In 1951, Kuhn and Tucker [73] gave the Lagrange multiplier
rule for convex and other nonlinear optimization problems involving differen-

© 2012 by Taylor & Francis Group, LLC


2 What Is Convex Optimization?

tiable functions. It was later found that Karush in 1939 had independently
established the optimality conditions similar to those of Kuhn and Tucker.
These optimality conditions are today famous as the Karush–Kuhn–Tucker
(KKT) optimality conditions. All the initial theories were developed with the
differentiability assumptions of the functions involved.
Meanwhile, efforts were made to shed the differentiability hypothesis,
thereby leading to the development of nonsmooth convex analysis as a subject
in itself. This added a new chapter to optimization theory. The key contrib-
utors in the development of convexity theory are Fenchel [45], Moreau [88],
and Rockafellar [97]. An important milestone in this direction was the publi-
cation of Convex Analysis by Rockafellar [97], where the theory of nonsmooth
convex analysis was presented in detail for the first time. No wonder this text
is by far a must for all optimization researchers. In the early 1970s, his stu-
dent Clarke coined the term nonsmooth optimization to categorize the theory
involving nondifferentiable optimization problems. He extended the calculus
rules and applied them to optimization problems involving locally Lipschitz
functions. This was just the beginning. The subsequent decade witnessed a
large development in the field of nonsmooth nonconvex optimization. For de-
tails on nonsmooth analysis, one may refer to Borwein and Lewis [17]; Bor-
wein and Zhu [18]; Clarke [27]; Clarke, Ledyaev, Stern and Wolenshi [28];
Mordukhovich [86]; and Rockafellar and Wets [101].
However, such developments have not overshadowed the importance of
convex optimization, which still is and will remain a pivotal area of research. It
has paved a path not only for theoretical improvements, but also algorithmic
designing aspects. In this book we focus mainly on convex analysis and its
application to the development of convex optimization theory.

1.2 Basic Concepts


By convex optimization we simply mean the problem of minimizing a convex
function over a convex set. More precisely, we are concerned with the following
problem
min f (x) subject to x ∈ C, (CP )
where f : Rn → R is a convex function and C ⊂ Rn is a convex set. Of course
in most cases the the set C is described by a system of convex inequalities
and affine equalities. Thus we can write

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m and
hj (x) = 0, j = 1, 2, . . . , l},

where gi : Rn → R, i = 1, 2, . . . , m are convex functions and hj : Rn → R,

© 2012 by Taylor & Francis Group, LLC


1.2 Basic Concepts 3

j = 1, 2, . . . , l are affine functions. When C is expressed explicitly as above,


(CP ) is called the convex programming problem.
A set C ⊂ Rn is a convex set if for any x, y ∈ Rn , the line segment joining
them, that is

[x, y] = {z ∈ Rn : z = (1 − λ)x + λy, 0 ≤ λ ≤ 1},

is also in C. A function φ : Rn → R is a convex function if for any x, y ∈ Rn


and λ ∈ [0, 1],

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y),

while it is an affine function if it is a translate of a linear function; that is, φ


is affine if

φ(x) = ha, xi + b,

where a ∈ Rn and b ∈ R.
It is important to note at the very outset that in optimization theory it is
worthwhile to consider extended-valued functions, that is, functions that take
values in R̄ = R∪{−∞, +∞}. The need to do so arises when we seek to convert
a constrained optimization problem into an unconstrained one. Consider for
example the problem (CP ), which can be restated as

min f0 (x) subject to x ∈ Rn ,

where

f (x), x ∈ C,
f0 (x) =
+∞, otherwise.
All the modern books on convex analysis beginning with the classic Convex
Analysis by Rockafellar [97] follow this framework. However, when we include
infinities, we need to know how to deal with them. Most rules with infinity
are intuitively clear except possibly 0 × (+∞) and ∞ − ∞. Because we will
be dealing mainly with minimization problems, we will follow the convention
0 × (+∞) = (+∞) × 0 = 0 and ∞ − ∞ = ∞. This convention was adopted in
Rockafellar and Wets [101] and we shall follow it. However, we would like to
ascertain that we really need not get worried about ∞ − ∞ as the functions
considered in this book are real-valued or proper functions. An extended-
valued function φ : Rn → R̄ is said to be a proper function if φ(x) > −∞ for
every x ∈ Rn and dom φ is nonempty where dom φ = {x ∈ Rn : φ(x) < +∞}
is the domain of φ.
It is worthwhile to note that the definition of a convex function given
above can be extended to the case when φ is an extended-valued function. An
extended-valued function φ : Rn → R̄ is a convex function if for any x, y ∈ Rn
and λ ∈ [0, 1],

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y),

© 2012 by Taylor & Francis Group, LLC


4 What Is Convex Optimization?

with the convention that ∞ − ∞ = +∞. A better way to handle the convexity
of an extended-valued convex function is to use its associated geometry. In
this direction we describe the epigraph of a function φ : Rn → R̄, which is
given as

epi φ = {(x, α) ∈ Rn × R : φ(x) ≤ α}.

A function is said to be convex if the epigraph is convex. We leave it as a simple


exercise for the reader to show that if the epigraph of a function φ : Rn → R̄
is convex in Rn × R, then φ is a convex function over Rn . For more details see
Chapter 2.
In case of extended-valued functions, one can work with the semicontinuity
of the functions rather than the continuity. Before we define those notions, we
present certain notations that will be used throughout.
For any two sets F1 , F2 ⊂ Rn , define

F1 + F2 = {x1 + x2 ∈ Rn : x1 ∈ F1 , x2 ∈ F2 }.

For any set F ⊂ Rn and any scalar λ ∈ R,

λF = {λx ∈ Rn : x ∈ F }.

The closure of a set F is denoted by cl F while the interior is given by int F .


The open unit ball, or simply unit ball, is denoted by B. By Bδ (x̄) we mean
an open ball of radius δ > 0 with center at x̄. Explicitly,

Bδ (x̄) = x̄ + δB.

(y1 , y2 , . . . , yn ) in Rn , the inner


For vectors x = (x1 , x2 , . . . , xn ) and y = P
n
py is denoted by hx, yi = i=1 xi yi while the norm of x is
product of x and
given by kxk = hx, xi. We state a standard result on the norm.

Proposition 1.1 (Cauchy–Schwarz Inequality) For any two vectors x, y ∈ Rn ,

|hx, yi| ≤ kxkkyk.

The above inequality holds as equality if and only if x = αy for some scalar
α ∈ R.

To discuss the concept of continuities of a function, we shall consider the


notions of limit infimum and limit supremum of a function. But first we discuss
the convergence of sequences in Rn .

Definition 1.2 A sequence {xk ∈ R : k = 1, 2, . . .} or simply {xk } ⊂ R is


said to converge to x̄ ∈ R if for every ε > 0, there exists kε such that

|xk − x̄| < ε, ∀ k ≥ kε .

A sequence {xk } ⊂ Rn converges to x̄ ∈ Rn if the i-th component of xk

© 2012 by Taylor & Francis Group, LLC


1.2 Basic Concepts 5

converges to the i-th component of x̄. The vector x̄ is called the limit of {xk }.
Symbolically it is expressed as

xk → x̄ or lim xk = x̄.
k→∞

The sequence {xk } ⊂ Rn is bounded if each of its components is bounded.


Equivalently, {xk } is bounded if and only if there exists M ∈ R such that
kxk k ≤ M for every k ∈ N. A subsequence of {xk } ⊂ Rn is a sequence {xkj },
j = 1, 2, . . ., where each xkj is a member of the original sequence and the order
of the elements as in the original sequence is maintained. A vector x̄ ∈ Rn is
a limit point of {xk } ⊂ Rn if there exists a subsequence of {xk } converging
to x̄. If the limit point is unique, it is the limit of {xk }. Next we state the
classical result on the bounded sequences.

Proposition 1.3 (Bolzano–Weierstrass Theorem) A bounded sequence in Rn


has a convergent subsequence.

For a sequence {xk } ⊂ R, define

zr = inf{xk : k ≥ r} and yr = sup{xk : k ≥ r}.

It is obvious that the sequences {zr } and {yr } are nondecreasing and non-
increasing, respectively. If {xk } is bounded below or bounded above, the se-
quences {zr } or {yr }, respectively, have a limit. The limit of {zr } is called
the limit infimum or lower limit of {xk } and denoted by lim inf k→∞ xk , while
that of {yr } is called the limit supremum or upper limit of {xk } and denoted
by lim supk→∞ xk . Equivalently,

lim inf xk = lim { inf xr } and lim sup xk = lim {sup xr }.


k→∞ k→∞ r≥k k→∞ k→∞ r≥k

For a sequence {xk }, lim inf k→∞ xk = −∞ if the sequence is unbounded below
while lim supk→∞ xk = +∞ if the sequence is unbounded above. Therefore,
{xk } converges to x̄ if and only if

−∞ < lim inf xk = x̄ = lim sup xk < +∞.


k→∞ k→∞

Now we move on to define the semicontinuities of a function that involve


the limit infimum and limit supremum of the function.

Definition 1.4 A function φ : Rn → R̄ is said to be lower semicontinuous


(lsc) at x̄ ∈ Rn if for every sequence {xk } ⊂ Rn converging to x̄,

φ(x̄) ≤ lim inf φ(xk ).


k→∞

Equivalently,

φ(x̄) ≤ lim inf φ(x),


x→x̄

© 2012 by Taylor & Francis Group, LLC


6 What Is Convex Optimization?

where the term on the right-hand side of the inequality denotes the limit
infimum or the lower limit of the function φ defined as
lim inf φ(x) = lim inf φ(x).
x→x̄ δ↓0 x∈Bδ (x̄)

The function φ is lsc over a set F ⊂ Rn if φ is lsc at every x̄ ∈ F .


For a function φ : Rn → R̄,
inf φ(x) ≤ φ(x̄).
x∈Bδ (x̄)

Taking the limit as δ ↓ 0 in the above inequality leads to


lim inf φ(x) ≤ φ(x̄).
x→x̄

Thus, the inequality in the above definition of lsc can be replaced by an


equality, that is, φ : Rn → R̄ is lsc at x̄ if
φ(x̄) = lim inf φ(x).
x→x̄

Similar to the concept of lower semicontinuity and limit infimum, we next


define the upper semicontinuity and the limit supremum of a function.
Definition 1.5 A function φ : Rn → R̄ is said to be upper semicontinuous
(usc) at x̄ ∈ Rn if for every sequence {xk } ⊂ Rn converging to x̄,
φ(x̄) ≥ lim sup φ(xk ).
k→∞

Equivalently,
φ(x̄) ≥ lim sup φ(x),
x→x̄

where the term on the right-hand side of the inequality denotes the limit
supremum or the upper limit of the function φ defined as
lim sup φ(x) = lim sup φ(x).
x→x̄ δ↓0 x∈Bδ (x̄)

The function φ is usc over a set F ⊂ Rn if φ is usc at every x̄ ∈ F .


Definition 1.6 A function φ : Rn → R̄ is said to be continuous at x̄ if it is
lsc as well as usc at x̄, that is,
lim φ(x) = φ(x̄).
x→x̄

Alternatively, φ is continuous at x̄ if for any ε > 0 there exists δ(ε, x̄) > 0
such that
|φ(x) − φ(x̄)| ≤ ε whenever kx − x̄k < δ(ε, x̄).
The function φ is continuous over a set F ⊂ Rn if φ is continuous at every
x̄ ∈ F .

© 2012 by Taylor & Francis Group, LLC


1.2 Basic Concepts 7

Because we will be considering minimization problems, the continuity of


a function will be replaced by lower semicontinuity. Before moving on, we
state a result on the infimum and supremum operations from Rockafellar and
Wets [101].

Proposition 1.7 (i) Consider an extended-valued function φ : Rn → R̄ and


sets Fi ⊂ Rn , i = 1, 2 such that F1 ⊂ F2 . Then

inf φ(x1 ) ≥ inf φ(x2 ) and sup φ(x1 ) ≤ sup φ(x2 ).


x1 ∈F1 x2 ∈F2 x1 ∈F1 x2 ∈F2

(ii) Consider the functions φ1 , φ2 : Rn → R̄ and a set F ⊂ Rn . Then

inf φ1 (x) + inf φ2 (x) ≤ inf (φ1 + φ2 )(x)


x∈F x∈F x∈F
≤ sup (φ1 + φ2 )(x) ≤ sup φ1 (x) + sup φ2 (x).
x∈F x∈F x∈F

Also, for functions φi : Rni → R̄ and sets Fi ⊂ Rni , i = 1, 2,

inf φ1 (x1 ) + inf φ2 (x2 ) = inf (φ1 (x1 ) + φ2 (x2 )),


x1 ∈F1 x2 ∈F2 (x1 ,x2 )∈F1 ×F2

sup φ1 (x1 ) + sup φ2 (x2 ) = sup (φ1 (x1 ) + φ2 (x2 )).


x1 ∈F1 x2 ∈F2 (x1 ,x2 )∈F1 ×F2

(iii) Consider an extended-valued function φ : Rn → R̄ and a set F ⊂ Rn .


Then for λ ≥ 0,

inf (λφ)(x) = λ inf φ(x) and sup (λφ)(x) = λ sup φ(x),


x∈F x∈F x∈F x∈F

provided 0 × (+∞) = 0 = 0 × (−∞).

The next result from Rockafellar and Wets [101] gives a characterization
of limit infimum of an arbitrary extended-valued function.

Lemma 1.8 For an extended-valued function φ : Rn → R̄,

lim inf φ(x) = min{α ∈ R̄ : there exists xk → x̄ satisfying φ(xk ) → α}.


x→x̄

Proof. Suppose that lim inf x→x̄ φ(x) = ᾱ. We claim that for xk → x̄ with
φ(xk ) → α, α ≥ ᾱ. As xk → x̄, for any δ > 0, there exists kδ ∈ N such that
xk ∈ Bδ (x̄) for every k ≥ kδ . Therefore,

φ(xk ) ≥ inf φ(x).


x∈Bδ (x̄)

Taking the limit as k → +∞ in the above inequality,

α≥ inf φ(x), ∀ δ > 0.


x∈Bδ (x̄)

© 2012 by Taylor & Francis Group, LLC


8 What Is Convex Optimization?

Because δ is arbitrarily chosen, so taking the limit δ ↓ 0 along with the defin-
ition of the limit infimum of φ leads to

α ≥ lim inf φ(x),


x→x̄

that is, α ≥ ᾱ.


To prove the result, we shall show that there exists a sequence xk → x̄
such that φ(xk ) → ᾱ. For a nonnegative sequence {δk }, define

ᾱk = inf φ(x).


x∈Bδk (x̄)

As δk → 0, by Definition 1.4 of limit infimum, ᾱk → ᾱ. Now for every k ∈ N,


by the definition of infimum it is possible to find xk ∈ Bδk (x̄) for which φ(xk )
is very close to ᾱk , that is, in an interval [ᾱk , αk ] where ᾱk < αk and αk → ᾱ.
Therefore, as k → +∞, xk → x̄, and φ(xk ) → ᾱ, thereby establishing the
result. 
After the characterization of limit infimum of a function, the result below
gives an equivalent characterization of lower semicontinuity of the function in
terms of the epigraph and lower level set.

Theorem 1.9 Consider a function φ : Rn → R̄. Then the following condi-


tions are equivalent:

(i) φ is lsc over Rn .


(ii) The epigraph of φ, epi φ, is a closed set in Rn × R.
(iii) The lower-level set lev≤α φ = {x ∈ Rn : φ(x) ≤ α} is closed for every
α ∈ R.

Proof. If φ ≡ ∞, the result holds trivially. So assume that dom φ is nonempty


and thus epi φ and lev≤α φ are nonempty.
We will first show that (i) implies (ii). Consider a sequence {(xk , αk )} ⊂
epi φ such that (xk , αk ) → (x̄, ᾱ). Therefore, φ(xk ) ≤ αk , which implies that

lim inf φ(x) ≤ lim inf φ(xk ) ≤ ᾱ.


x→x̄ k→∞

By the lower semicontinuity of φ,

φ(x̄) = lim inf φ(x),


x→x̄

which reduces the preceding condition to φ(x̄) ≤ ᾱ, thereby proving that epi φ
is a closed set in Rn × R.
Next we show that (ii) implies (iii). For a fixed α ∈ R, suppose
that {xk } ⊂ lev≤α φ such that xk → x̄. Therefore, φ(xk ) ≤ α, that is,
(xk , α) ∈ epi φ. By (ii), epi φ is closed, which implies (x̄, α) ∈ epi φ, that
is, φ(x̄) ≤ α. Thus, x̄ ∈ lev≤α φ, thereby yielding condition (iii).

© 2012 by Taylor & Francis Group, LLC


1.2 Basic Concepts 9

Finally, to obtain the equivalence, we will establish that (iii) implies (i).
To show that φ is lsc, we need to show that for every x̄ ∈ Rn ,

φ(x̄) ≤ lim inf φ(xk ) whenever xk → x̄.


k→∞

On the contrary, assume that for some x̄ ∈ Rn and some sequence xk → x̄,

φ(x̄) > lim inf φ(xk ),


k→∞

which implies there exists α ∈ R such that

φ(x̄) > α > lim inf φ(xk ). (1.1)


k→∞

Thus, there exists a subsequence, without relabeling, say {xk } such that
φ(xk ) ≤ α for every k ∈ N, which implies xk ∈ lev≤α φ. By (iii), the lower
level set lev≤α φ is closed and hence x̄ ∈ lev≤α φ, that is, φ(x̄) ≤ α, which
contradicts (1.1). Therefore, φ is lsc over Rn . 
The proof of the last implication, that is, (iii) implies (i) of Theorem 1.9
by contradiction was from Bertsekas [12]. We present an alternative proof for
the same from Rockafellar and Wets [101].
It is obvious that for any x̄ ∈ Rn ,

ᾱ = lim inf φ(x) ≤ φ(x̄).


x→x̄

Therefore, to establish the lower semicontinuity of φ at x̄, we need to prove


that φ(x̄) ≤ ᾱ. By Lemma 1.8, there exists a sequence {xk } ⊂ Rn with
xk → x̄ such that φ(xk ) → ᾱ. Thus, for every α > ᾱ, φ(xk ) ≤ α, which
implies xk ∈ lev≤α φ. Now if condition (iii) of the above theorem holds, that
is, lev≤α φ is closed in Rn ,

x̄ ∈ lev≤α φ, ∀ α > ᾱ.

Thus, φ(x̄) ≤ α, which leads to φ(x̄) ≤ ᾱ. Because x̄ ∈ Rn was arbitrarily


chosen, φ is lsc over Rn .
Theorem 1.9 gives equivalent characterization of lower semicontinuity of a
function. But if the function is not lsc, its epigraph is not closed. The result
below gives an equivalent characterization of the closure of the epigraph of
any arbitrary function.

Proposition 1.10 For any arbitrary extended-valued function φ : Rn → R̄,


(x̄, ᾱ) ∈ cl epi φ if and only if

lim inf φ(x) ≤ ᾱ.


x→x̄

© 2012 by Taylor & Francis Group, LLC


10 What Is Convex Optimization?

Proof. Suppose that (x̄, ᾱ) ∈ cl epi φ, which implies that there exists
{(xk , αk )} ⊂ epi φ such that (xk , αk ) → (x̄, ᾱ). Thus, taking the limit as
k → +∞, the condition

lim inf φ(x) ≤ lim inf φ(xk )


x→x̄ xk →x̄

yields

lim inf φ(x) ≤ ᾱ,


x→x̄

as desired.
Conversely, assume that lim inf x→x̄ φ(x) ≤ ᾱ but (x̄, ᾱ) 6∈ cl epi φ.
We claim that, lim inf x→x̄ φ(x) = ᾱ. On the contrary, suppose that
lim inf x→x̄ φ(x) < ᾱ. As (x̄, ᾱ) ∈
/ cl epi φ, there exists δ̄ > 0 such that for
every δ ∈ (0, δ̄),

Bδ ((x̄, ᾱ)) ∩ cl epi φ = ∅,

which implies for every (x, α) ∈ Bδ ((x̄, ᾱ)), φ(x) > α. In particular for
(x, ᾱ) ∈ Bδ ((x̄, ᾱ)), φ(x) > ᾱ, that is,

φ(x) > ᾱ, ∀ x ∈ Bδ (x̄).

Therefore, taking the limit as δ → 0 along with the definition of limit infimum
of a function yields

lim inf φ(x) ≥ ᾱ,


x→x̄

which is a contradiction. Therefore, lim inf x→x̄ φ(x) = ᾱ. By Lemma 1.8, there
exists a sequence xk → x̄ such that φ(xk ) → ᾱ. Because (xk , φ(xk )) ∈ epi φ,
(x̄, ᾱ) ∈ cl epi φ, thereby reaching a contradiction and hence the result. 
Now the question is whether it is possible to construct a function that is
the closure of the epigraph of another function. This leads to the concept of
closure of a function.

Definition 1.11 For any function φ : Rn → R̄, an lsc function that is con-
structed in such a way that its epigraph is the closure of the epigraph of φ is
called the lower semicontinuous hull or the closure of the function φ and is
denoted by cl φ. Therefore,

epi(cl φ) = cl epi φ.

Equivalently, the closure of φ is defined as

cl φ(x̄) = lim inf φ(x), ∀ x̄ ∈ Rn .


x→x̄

By Proposition 1.10, it is obvious that (x̄, ᾱ) ∈ cl epi φ if and only if


(x̄, ᾱ) ∈ epi cl φ. The function φ is said to be closed if cl φ = φ.

© 2012 by Taylor & Francis Group, LLC


1.2 Basic Concepts 11

−1 1 −1 1

epi φ epi cl φ

FIGURE 1.1: Lower semicontinuous hull.

If φ is lsc, then it is closed as well. Also cl φ is lsc and the greatest of all
lsc functions ψ such that ψ(x) ≤ φ(x) for every x ∈ Rn . From Theorem 1.9,
one has that closedness is the same as lower semicontinuity over Rn . In this
discussion, the function φ was defined over Rn . But what if φ is defined over
some subset of Rn . Then one cannot talk about the lower semicontinuity of
the function over Rn . In such a case, how is the closedness of a function related
to lower semicontinuity? This issue was addressed by Bertsekas [12]. Consider
a set F ⊂ Rn and a function φ : F → R̄. Observe that here we define φ over
the set F and not Rn . The function φ can be extended over Rn by defining a
function φ̄ : Rn → R̄ as

φ(x), x ∈ F,
φ̄(x) =
+∞, otherwise.
Note that both the extended-valued functions φ and φ̄ have the same epigraph.
Thus from the above discussion, one has φ is closed if and only if φ̄ is lsc over
Rn . Also observe that the lower semicontinuity of φ over dom φ is not sufficient
for φ to be closed. In addition, one has to assume the closedness of dom φ.
To emphasize this fact, let us consider a simple example. Consider φ : R → R̄
defined as

0, x ∈ (−1, 1),
φ(x) =
+∞, otherwise.
Here, dom φ = (−1, 1) over which the function is lsc but epi φ is not closed
and hence, φ is not closed. The closure of φ is given by

0, x ∈ [−1, 1],
cl φ(x) =
+∞, otherwise.
Observe that in Figure 1.1, epi φ is not closed while epi cl φ is closed. There-
fore, we have the following result from Bertsekas [12].

© 2012 by Taylor & Francis Group, LLC


12 What Is Convex Optimization?

Proposition 1.12 Consider F ⊂ Rn and a function φ : F → R̄. If dom φ is


closed and φ is lsc over dom φ, then φ is closed.

Because we are interested in studying the minimization problem, it is im-


portant to know whether a minimizer exists or not. In this respect, we have
the classical Weierstrass theorem, according to which “A continuous function
attains its minimum over a compact set.” For a more general version of this
theorem from Bertsekas [12], we require the notion of coercivity.

Definition 1.13 A function φ : Rn → R̄ is said to be coercive over a set


F ⊂ Rn if for every sequence {xk } ⊂ F

lim φ(xk ) = +∞ whenever kxk k → +∞.


k→∞

For F = Rn , φ is simply called coercive.

Observe that for a coercive function, every nonempty lower level set is
bounded. Below we prove the Weierstrass Theorem.

Theorem 1.14 (Weierstrass Theorem) Consider a proper lsc function


φ : Rn → R̄ and assume that one of the following holds:

(i) dom φ is bounded.

(ii) there exists α ∈ R such that the lower level set lev≤α φ is nonempty and
bounded.

(iii) φ is coercive.

Then the set of minimizers of φ over Rn is nonempty and compact.

Proof. Suppose that condition (i) holds, that is, dom φ is bounded. Because
φ is proper, φ(x) > −∞ for every x ∈ Rn and dom φ is nonempty. Denote
φinf = inf x∈Rn φ(x), which implies φinf = inf x∈dom φ φ(x). Therefore, there
exists a sequence {xk } ⊂ dom φ such that φ(xk ) → φinf . Because dom φ is
bounded, {xk } is a bounded sequence, which by Bolzano–Weierstrass Theo-
rem, Proposition 1.3, has a convergent subsequence. Without loss of generality,
assume that xk → x̄. By the lower semicontinuity of φ,

φ(x̄) ≤ lim inf φ(xk ) = lim φ(xk ) = φinf ,


k→∞ k→∞

which implies that x̄ is a point of minimizer of φ over Rn . Denote the set


of minimizers by S. Therefore, x̄ ∈ S and hence S is nonempty. Because
S ⊂ dom φ which is bounded, S is a bounded set. Also, S is the intersection
of the lower level sets lev≤α φ, where α > m. For an lsc function φ, lev≤α φ is
closed by Theorem 1.9 and thus S is closed. Hence S is compact.
Assume that condition (ii) holds; that is, for some α ∈ R, the lower level

© 2012 by Taylor & Francis Group, LLC


1.2 Basic Concepts 13

set lev≤α φ is nonempty and bounded. Consider a proper function φ̄ : Rn → R̄


defined as

φ(x), φ(x) ≤ α,
φ̄(x) =
+∞, otherwise.

Therefore, dom φ̄ = lev≤α φ which is nonempty and bounded by condition (ii).


Since φ is lsc which by Theorem 1.9 implies that dom φ̄ is closed. Also by the
lower semicontinuity of φ along with Proposition 1.12, φ̄ is closed and hence
lsc. Moreover, the set of minimizers of φ̄ is the same as that of φ. The result
follows by applying condition (i) to φ̄.
Suppose that condition (iii) is satisfied, that is, φ is coercive. Because φ
is proper, dom φ is nonempty and thus has a nonempty lower level set. By
the coercivity of φ, it is obvious that the nonempty lower level sets of φ are
bounded, thereby satisfying condition (ii), and therefore leading to the desired
result. 
As we all know, the next concept that comes to mind after limit and
continuity is the derivative of a function. Below we define this very notion.

Definition 1.15 For a scalar-valued function φ : Rn → R, the derivative of


φ at x̄ is denoted by ∇φ(x̄) ∈ Rn and is defined as

φ(x̄ + h) − φ(x̄) − h∇φ(x̄), hi


lim = 0.
khk→0 khk

Equivalently, the derivative can also be expressed as

φ(x) = φ(x̄) + h∇φ(x̄), x − x̄i + o(kx − x̄k),

o(kx − x̄)k
where limx→x̄ = 0. A function φ is differentiable if it is differen-
kx − x̄k
tiable at every x ∈ Rn . The derivative, ∇φ(x̄), of φ at x̄ is also called the
gradient of φ at x̄, which can be expressed as
 
∂φ ∂φ ∂φ
∇φ(x̄) = (x̄), (x̄), . . . , (x̄) ,
∂x1 ∂x2 ∂xn

∂φ
where , i = 1, 2, . . . , n denotes the i-th partial derivative of φ. If φ is
∂xi
continuously differentiable, that is, the map x 7→ ∇φ(x) is continuous over
Rn , then φ is called a smooth function. If φ is not smooth, it is called a
nonsmooth function.
Similar to the first-order differentiability, we have the second-order differ-
entiability notion as follows.

© 2012 by Taylor & Francis Group, LLC


14 What Is Convex Optimization?

Definition 1.16 For a scalar-valued function φ : Rn → R, the second-order


derivative of φ at x̄ is denoted by ∇2 φ(x̄) ∈ Rn×n and is defined as
1
φ(x̄ + h) − φ(x̄) − h∇φ(x̄), hi − h∇2 φ(x̄)h, hi
lim 2 = 0,
khk→0 khk2

which is equivalent to

φ(x) = φ(x̄) + h∇φ(x̄), x − x̄i + h∇2 φ(x̄)(x − x̄), x − x̄i + o(kx − x̄k2 ).

The matrix ∇2 φ(x̄) is also referred to as the Hessian with the ij-th entry of
∂2φ
the matrix being the second-order partial derivative (x̄). If φ is twice
∂xi ∂xj
2
continuously differentiable, then the matrix ∇ φ(x̄) is a symmetric matrix.
In the above definitions we considered the function φ to be a scalar-valued
function. Next we define the notion of differentiability for a vector-valued
function Φ.

Definition 1.17 For a vector-valued function Φ : Rn → Rm , the derivative


of Φ at x̄ is denoted by JΦ(x̄) ∈ Rm×n and is defined as

kΦ(x̄ + h) − Φ(x̄) − hJΦ(x̄), hik


lim = 0.
khk→0 khk

The matrix JΦ(x̄) is also called the Jacobian of Φ at x̄. If Φ = (φ1 , φ2 , . . . , φm ),


Φ is differentiable if each φi : Rn → R, i = 1, 2, . . . , m is differentiable. The
Jacobian of Φ at x̄ can be expressed as
 
∇φ1 (x̄)
 ∇φ2 (x̄) 
 
JΦ(x̄) =  .. 
 . 
∇φm (x̄)

∂φi
with the ij-th entry of the matrix being the partial derivative (x̄). In
∂xj
the above expression of JΦ(x̄), the vectors ∇φ1 (x̄), ∇φ2 (x̄), . . . , ∇φm (x̄) are
written as row vectors.
Observe that the derivative is a local concept and it is defined at a point
x if x ∈ int dom φ. Below we state the Mean Value Theorem, which plays a
pivotal role in the study of optimality conditions.

Theorem 1.18 (Mean Value Theorem) Consider a continuously differen-


tiable function φ : Rn → R. Then for every x, y ∈ Rn , there exists z ∈ [x, y]
such that

φ(y) − φ(x) = h∇φ(z), y − xi.

© 2012 by Taylor & Francis Group, LLC


1.3 Smooth Convex Optimization 15

With all these basic concepts we now move on to the study of convexity.
The importance of convexity in optimization stems from the fact that when-
ever we minimize a convex function over a convex set, every local minimum
is a global minimum. Many other issues in optimization depend on convexity.
However, convex functions suffer from the drawback that they need not be
differentiable at every point of their domain of definition and the nondiffer-
entiability may be precisely at the point where the minimum is achieved. For
instance, consider the minimization of the absolute value function, |x|, over R.
At the point of minima, x̄ = 0, the function is nondifferentiable. How this ma-
jor difficulty was overcome by the development of a completely different type
of analysis is possibly one of the most thrilling developments in optimization
theory. This analysis depends on set-valued maps, which we briefly present
below.
Definition 1.19 A set-valued map Φ from Rn to Rm associates every x ∈ Rn
to a set in Rm ; that is, for every x ∈ Rn , Φ(x) ⊂ Rm . Symbolically it is
expressed as Φ : Rn ⇉ Rm . A set-valued map is associated with its graph
defined as
gph Φ = {(x, y) ∈ Rn × Rm : y ∈ Φ(x)}.
Φ is said to be a proper map if there exists x ∈ Rn such that Φ(x) 6= ∅. Φ is said
to be closed-valued or convex-valued or bounded-valued if for every x ∈ Rn , the
sets Φ(x) are closed or convex or bounded, respectively. Φ is locally bounded
at x̄ ∈ Rn if there exists δ > 0 and a bounded set F ⊂ Rn such that
Φ(x) ⊂ V, ∀ x ∈ Bδ (x̄).
The set-valued map Φ is said to be closed if it has a closed graph; that is, for
any sequence {xk } ⊂ Rn with xk → x̄ and yk ∈ Φ(xk ) with yk → ȳ, ȳ ∈ Φ(x̄).
A set-valued map Φ : Rn → Rm is said to be upper semicontinuous (usc) at
x̄ ∈ Rn if for any ε > 0, there exists δ > 0 such that
Φ(x) ⊂ Φ(x̄) + εB, ∀ x ∈ Bδ (x̄),
where the balls are in the respective spaces. If Φ is locally bounded and has a
closed graph, then it is usc. If Φ is single-valued, that is, Φ(x) is singleton for
every x, the upper semicontinuity of Φ coincides with continuity.
For more on set-valued maps, the readers may refer to Berge [10]. A de-
tailed analysis of convex function appears in Chapter 2.

1.3 Smooth Convex Optimization


Recall the convex optimization problem (CP ) stated in Section 1.1, that is,

© 2012 by Taylor & Francis Group, LLC


16 What Is Convex Optimization?

(y, f (y))

(x, f (x)) (y, f (x) + ∇f (x)(y − x))


graph of f

x y x

FIGURE 1.2: Graph of a real-valued differentiable convex function.

min f (x) subject to x ∈ C, (CP )


where f : Rn → R is a convex function and C is a closed convex set in Rn . Let
us additionally assume that f is differentiable. It is mentioned in Chapter 2
that if f is differentiable, then for any x ∈ Rn ,
f (y) − f (x) ≥ h∇f (x), y − xi, ∀ y ∈ Rn .
Conversely, if the above relation holds for a function, then the function is
convex. This fact appears as Theorem 2.81 in the next chapter. It is mentioned
there as a consequence of more general facts. However, we provide a direct
proof here.
Observe that if f is convex, then for any x, y ∈ Rn and any λ ∈ [0, 1],
(1 − λ)f (x) + λf (y) ≥ f (x + λ(y − x)).
Hence, for any λ ∈ (0, 1),
f (x + λ(y − x)) − f (x)
f (y) − f (x) ≥ .
λ
Taking the limit as λ ↓ 0, the above inequality yields
f (y) − f (x) ≥ h∇f (x), y − xi. (1.2)
For the converse, suppose that (1.2) holds for any x, y ∈ Rn . Setting
z = x + λ(y − x) with λ ∈ (0, 1), then
f (y) − f (z) ≥ h∇f (z), y − zi (1.3)
f (x) − f (z) ≥ h∇f (z), x − zi (1.4)

© 2012 by Taylor & Francis Group, LLC


1.3 Smooth Convex Optimization 17

The result is obtained by simply multiplying (1.3) with λ and (1.4) with
(1 − λ) and then adding them up. This description geometrically means that
the tangent plane should always lie below the graph of the function. For a
convex function f : R → R, it looks something like Figure 1.2. This important
characterization of a convex function leads to the following result.

Theorem 1.20 Consider the convex optimization problem (CP ) where f is


a differentiable convex function and C is a closed convex set in Rn . Then x̄
is a point of minimizer of (CP ) if and only if

h∇f (x̄), x − x̄i ≥ 0, ∀ x ∈ C. (1.5)

Proof. It is simple to see that as C is a convex set, for x ∈ C,

x̄ + λ(x − x̄) ∈ C, ∀ λ ∈ [0, 1].

Therefore, if x̄ is a point of minimum,

f (x̄ + λ(x − x̄)) ≥ f (x̄),

that is,

f (x̄ + λ(x − x̄)) − f (x̄) ≥ 0.

Dividing both sides by λ > 0 and taking the limit as λ ↓ 0 leads to

h∇f (x̄), x − x̄i ≥ 0.

Because x ∈ C was arbitrarily chosen,

h∇f (x̄), x − x̄i ≥ 0, ∀ x ∈ C.

Also as f is convex, by the condition (1.2), for any x ∈ C,

f (x) − f (x̄) ≥ h∇f (x̄), x − x̄i.

Now if (1.5) is satisfied, then the above inequality reduces to

f (x) ≥ f (x̄), ∀ x ∈ C,

thereby proving the requisite result. 

Remark 1.21 Expressing the optimality condition in the form of (1.5) leads
to what is called a variational inequality. Let F : Rn → Rn be a given function
and C be a closed convex set in Rn . Then the variational inequality V I(F, C)
is the problem of finding x̄ ∈ C such that

hF (x̄), x − x̄i ≥ 0, ∀ x ∈ C.

© 2012 by Taylor & Francis Group, LLC


18 What Is Convex Optimization?

C λ = λ0

x x̄

FIGURE 1.3: Local minimizer is global minimizer.

When f is a differentiable convex function, for F = ∇f , V I(∇f, C) is nothing


but the condition (1.5). In order to solve V I(F, C) efficiently, one needs an
additional property on F which is monotonicity. A function F : Rn → Rn is
called monotone if for any x, y ∈ Rn ,

hF (y) − F (x), y − xi ≥ 0.

However, when f is a convex function, one has the following pleasant property.

Theorem 1.22 A differentiable function f is convex if and only if ∇f is


monotone.

For proof, see Rockafellar [97]. However, the reader should try to prove it on
his/her own. We have shown that when (CP ) has a smooth f , one can write
down a necessary and sufficient condition for a point x̄ ∈ C to be a global
minimizer of (CP ). In fact, as already mentioned, the importance of studying
convexity in optimization stems from the following fact. For the problem (CP ),
every local minimizer is a global minimizer irrespective of the fact whether f
is smooth or not. This can be proved in a simple way as follows. If x̄ is a local
minimizer of (CP ), then there exists δ > 0 such that

f (x) ≥ f (x̄), ∀ x ∈ C ∩ Bδ .

Now consider any x ∈ C. Then it is easy to observe from Figure 1.3 that there
exists λ0 ∈ (0, 1) such that for every λ ∈ (0, λ0 ),

λx + (1 − λ)x̄ ∈ C ∩ Bδ .

Hence

f (λx + (1 − λ)x̄) ≥ f (x̄).

© 2012 by Taylor & Francis Group, LLC


1.3 Smooth Convex Optimization 19

The convexity of f shows that

λ(f (x) − f (x̄)) ≥ 0.

Because λ > 0, f (x) ≥ f (x̄). As x ∈ C was arbitrary, our claim is established.


The result can also be obtained using the approach of contradiction as done
in Theorem 2.90.
Now consider the following function

θ(x) = sup h∇f (x), x − yi.


y∈C

The interesting feature of the function is that

θ(x) ≥ 0, ∀ x ∈ C

and if θ(x) = 0 for x ∈ C, then x solves the problem (CP ). Furthermore, if x


solves the problem (CP ), we have θ(x) = 0. The function θ is usually called the
gap function or the merit function associated with (CP ). For the variational
inequality problem, such a function was first introduced by Auslender [5]. The
next question is how useful is the function θ to the problem (CP ). What we
will now show is that for certain classes of the problem (CP ), the function
θ can provide an error bound for the problem (CP ). By an error bound we
mean an upper estimate of the distance of a point in C to the solution set
of (CP ). The class of convex optimization problems where such a thing can
be achieved is the class of strongly convex optimization problems. A function
f : Rn → R is strongly convex with modulus of strong convexity ρ > 0 if for
any x, y ∈ Rn and λ ∈ [0, 1],

(1 − λ)f (x) + λf (y) ≥ f (x + λ(y − x)) + ρλ(1 − λ)ky − xk2 .

If f is differentiable, then f is strongly convex if and only if for any x, y ∈ Rn ,

f (y) − f (x) ≥ h∇f (x), y − xi + ρky − xk2 .


Observe that f (x) = (1/2)hx, Axi, where x ∈ Rn and A is a positive definite n × n matrix, is strongly convex with ρ = λmin (A)/2, where λmin (A) is the minimum eigenvalue of A, while f (x) = x with x ∈ R is not strongly convex.
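The claim about the quadratic can be checked numerically. The following is a minimal sketch (not from the text; the matrix is an illustrative assumption) verifying the first-order strong convexity inequality with modulus ρ = λmin (A)/2:

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.standard_normal((3, 3))
    A = M.T @ M + np.eye(3)                  # symmetric positive definite matrix
    rho = np.linalg.eigvalsh(A).min() / 2    # modulus of strong convexity of f below

    f = lambda x: 0.5 * x @ A @ x
    grad = lambda x: A @ x

    # check f(y) - f(x) >= <grad f(x), y - x> + rho ||y - x||^2 on random pairs
    for _ in range(1000):
        x, y = rng.standard_normal(3), rng.standard_normal(3)
        lhs = f(y) - f(x)
        rhs = grad(x) @ (y - x) + rho * np.linalg.norm(y - x) ** 2
        assert lhs >= rhs - 1e-10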
If f is a twice continuously differentiable strongly convex function, then
∇2 f (x) is always positive definite for each x. Now if f is strongly convex with
modulus of convexity ρ > 0, then for any x, y ∈ Rn ,

f (y) − f (x) ≥ h∇f (x), y − xi + ρky − xk2 ,


f (x) − f (y) ≥ h∇f (y), x − yi + ρky − xk2 .

Adding the above inequalities leads to

0 ≥ h∇f (x), y − xi + h∇f (y), x − yi + 2ρky − xk2 ,


that is,
h∇f (y) − ∇f (x), y − xi ≥ 2ρky − xk2 . (1.6)
The property of ∇f given by (1.6) is called strong monotonicity with 2ρ as the
modulus of monotonicity. It is in fact interesting to observe the following. Suppose that f : Rn → R is a differentiable function for which there exists ρ > 0 such that for every x, y ∈ Rn ,

h∇f (y) − ∇f (x), y − xi ≥ 2ρky − xk2 .

Because ρ > 0, this in particular implies that

h∇f (y) − ∇f (x), y − xi ≥ ρky − xk2 , ∀ x, y ∈ Rn .

Now we request the reader to show that f is strongly convex with modulus
ρ > 0. In fact, if f is strongly convex with ρ > 0 one can also show that ∇f
is strongly monotone with ρ > 0. Thus we conclude that f is strongly convex
with modulus of strong convexity ρ > 0 if and only if ∇f is strongly monotone
with modulus of monotonicity ρ > 0.
It is important to note that one cannot guarantee θ to be finite unless C
has some additional conditions, for example, C is compact. Assume that C
is compact and let x̄ be a solution of the problem (CP ), where f is strongly
convex. (Think why a solution should exist.) Now as f is strongly convex, it
is simple enough to see that x̄ is the unique solution of (CP ). Thus from the
definition of θ, for any x ∈ C and y = x̄,

θ(x) ≥ h∇f (x), x − x̄i.

By strong convexity of f with ρ > 0 as the modulus of strong convexity, ∇f


is strongly monotone with modulus 2ρ. Thus,

h∇f (x), x − x̄i ≥ h∇f (x̄), x − x̄i + 2ρkx − x̄k2 ,

thereby yielding

θ(x) ≥ h∇f (x̄), x − x̄i + 2ρkx − x̄k2 . (1.7)

But by the optimality condition in Theorem 1.20,

h∇f (x̄), x − x̄i ≥ 0, ∀ x ∈ C.

Therefore, the inequality (1.7) reduces to

θ(x) ≥ 2ρkx − x̄k2 ,

which leads to
kx − x̄k ≤ √(θ(x)/(2ρ)).


This provides an error bound for (CP ), where f is strongly convex and C is
compact. In this derivation if ∇f was strongly monotone with modulus ρ > 0,
then the error bound will have the expression
kx − x̄k ≤ √(θ(x)/ρ).

Observe that as ρ > 0,

√(θ(x)/(2ρ)) ≤ √(θ(x)/ρ),

and hence the error bound obtained by using the strong monotonicity of ∇f with modulus 2ρ is the sharper of the two.
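The gap function and the resulting error bound are easy to compute when C is a box, since the supremum of the linear function h∇f (x), x − yi over a box is attained coordinatewise at a vertex. The following minimal numerical sketch (not from the text; the data are illustrative assumptions) does this for f (x) = (1/2)kx − ck2 , whose strong convexity modulus in the above sense is ρ = 1/2, and checks the bound kx − x̄k ≤ √(θ(x)/(2ρ)):

    import numpy as np

    lo, hi = np.zeros(3), np.ones(3)          # C = [0,1]^3, a compact convex box
    c = np.array([0.2, 0.7, 0.4])             # lies inside C, so xbar = c solves (CP)
    grad = lambda x: x - c                    # gradient of f(x) = 0.5 ||x - c||^2
    rho = 0.5                                 # strong convexity modulus of this f
    xbar = c

    def theta(x):
        g = grad(x)
        # sup_{y in C} <g, x - y> = <g, x> - min_{y in C} <g, y>, solved per coordinate
        y_min = np.where(g >= 0, lo, hi)
        return g @ (x - y_min)

    rng = np.random.default_rng(2)
    for _ in range(1000):
        x = rng.uniform(lo, hi)               # arbitrary feasible point
        assert np.linalg.norm(x - xbar) <= np.sqrt(theta(x) / (2 * rho)) + 1e-10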
Now the question is can we design a merit function for (CP ) that can be
used to develop an error bound even when C is noncompact. Such a merit func-
tion was first developed by Fukushima [48] for general variational inequalities.
In our context, consider the function given by

θ̂α (x) = sup_{y∈C} { h∇f (x), x − yi − (α/2)ky − xk2 }, α > 0.

It will be an interesting exercise for the reader to show that

θ̂α (x) ≥ 0, ∀ x ∈ C

and θ̂α (x) = 0 for x ∈ C if and only if x is a solution of (CP ). Observe that
θ̂α (x) = − inf_{y∈C} { h∇f (x), y − xi + (α/2)ky − xk2 }, α > 0.
For a fixed x, observe that the function

φ_x^α (y) = h∇f (x), y − xi + (α/2)ky − xk2

is a strongly convex function and is coercive (Definition 1.13). Hence φ_x^α attains its infimum on C. The point of minimum is unique as φ_x^α is strongly convex. Hence for each x, the function φ_x^α has a finite minimum value. Thus θ̂α (x) is always finite, thereby leading to the following error bound.

Theorem 1.23 Consider the convex optimization problem (CP ) where f is


a differentiable strongly convex function with modulus ρ > 0 and C is a closed
convex set in Rn . Let x̄ ∈ C be the unique solution of (CP ). Furthermore, if
ρ > α/2, then for any x ∈ C,

kx − x̄k ≤ √(2θ̂α (x)/(2ρ − α)).


Proof. For any x ∈ C and y = x̄ in particular,


θ̂α (x) ≥ h∇f (x), x − x̄i − (α/2)kx − x̄k2 .
By the fact that ∇f is strongly monotone with modulus ρ > 0,
θ̂α (x) ≥ h∇f (x̄), x − x̄i + ρkx − x̄k2 − (α/2)kx − x̄k2 . (1.8)
Because x̄ is the unique point of minimizer of (CP ), by Theorem 1.20,

h∇f (x̄), x − x̄i ≥ 0,

thereby reducing the inequality (1.8) to


θ̂α (x) ≥ (ρ − α/2)kx − x̄k2 .
Therefore,
kx − x̄k ≤ √(2θ̂α (x)/(2ρ − α)),

as desired. 
The reader is urged to show that under the hypothesis of the above theorem, one can prove a tighter error bound of the form

kx − x̄k ≤ √(2θ̂α (x)/(4ρ − α)).
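To see θ̂α at work on a noncompact feasible set, consider the following minimal numerical sketch (not from the text; the data and the choice C = R^3_+ are illustrative assumptions). The infimum defining θ̂α is attained at the projection of x − ∇f (x)/α onto C, which for the nonnegative orthant is a componentwise maximum with zero; the bound of Theorem 1.23 is then checked with ρ = 1/2 and α = 1/2:

    import numpy as np

    c = np.array([-1.0, 2.0, 0.5])             # data of f(x) = 0.5 ||x - c||^2
    grad = lambda x: x - c
    rho, alpha = 0.5, 0.5                      # rho > alpha/2, as Theorem 1.23 requires
    xbar = np.maximum(c, 0.0)                  # unique minimizer of f over C = R^3_+

    def theta_hat(x):
        g = grad(x)
        y = np.maximum(x - g / alpha, 0.0)     # minimizer of <g, y-x> + (alpha/2)||y-x||^2 over C
        return -(g @ (y - x) + 0.5 * alpha * np.linalg.norm(y - x) ** 2)

    rng = np.random.default_rng(3)
    for _ in range(1000):
        x = rng.uniform(0.0, 5.0, size=3)      # feasible points in R^3_+
        bound = np.sqrt(2 * theta_hat(x) / (2 * rho - alpha))
        assert np.linalg.norm(x - xbar) <= bound + 1e-10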

The study of optimality conditions with C explicitly given by functional constraints will begin in Chapter 3.

Chapter 2
Tools for Convex Optimization

2.1 Introduction
With the basic concepts discussed in the previous chapter, we devote this chapter to the study of concepts related to convex analysis. Convex analysis is the branch of mathematics that studies convex objects, namely, convex sets, convex functions, and convex optimization theory. These concepts will be used in the subsequent chapters to discuss the details of convex optimization theory and in the development of the book.

2.2 Convex Sets


Recall that for any x, y ∈ Rn , the set
[x, y] = {z ∈ Rn : z = (1 − λ)x + λy, 0 ≤ λ ≤ 1}
denotes the line segment joining the points x and y. The open line segment
joining x and y is given by
(x, y) = {z ∈ Rn : z = (1 − λ)x + λy, 0 < λ < 1}.
Definition 2.1 A set F ⊂ Rn is a convex set if
λx + (1 − λ)y ∈ F, ∀ x, y ∈ F, ∀ λ ∈ [0, 1].
Equivalently, for any x, y ∈ F , the line segment [x, y] is contained in F . Figure 2.1 presents convex and nonconvex sets.
Consider the hyperplane defined as
H(a, b) = {x ∈ Rn : ha, xi = b},
where a ∈ Rn and b ∈ R. Observe that it is a convex set. Similarly, the closed
half spaces given by
H≤ (a, b) = {x ∈ Rn : ha, xi ≤ b} and H≥ (a, b) = {x ∈ Rn : ha, xi ≥ b},


FIGURE 2.1: Convex and nonconvex sets.

and the open half spaces given by


H< (a, b) = {x ∈ Rn : ha, xi < b} and H> (a, b) = {x ∈ Rn : ha, xi > b}
are also convex. Another class of sets that are also convex is the affine sets.
Definition 2.2 A set M ⊂ Rn is said to be an affine set if
(1 − λ)x + λy ∈ M, ∀ x, y ∈ M, ∀ λ ∈ R,
where the set {z ∈ Rn : z = (1 − λ)x + λy, λ ∈ R} denotes the line passing
through x and y. Equivalently, M ⊂ Rn is affine if for any x, y ∈ M , the line
passing through them is contained in M .
Note that a hyperplane is an example of an affine set. The empty set ∅ and
the whole space Rn are the trivial examples of affine sets. Even though affine
sets are convex, the converse need not be true, as is obvious from the example
of half spaces.
Next we state some basic properties of convex sets.
Proposition 2.3 (i) The intersection of an arbitrary collection of convex sets
is convex.
(ii) For two convex sets F1 , F2 ⊂ Rn , F1 + F2 is convex.
(iii) For a convex set F ⊂ Rn and scalar λ ∈ R, λF is convex.
(iv) For a convex set F ⊂ Rn and scalars λ1 ≥ 0 and λ2 ≥ 0,
(λ1 + λ2 )F = λ1 F + λ2 F
which is convex.
Proof. The properties (i)-(iii) can be established by simply using Defini-
tion 2.1. The readers are urged to prove (i)-(iii) on their own. Here we will
prove only (iv). Consider z ∈ (λ1 + λ2 )F . Thus, there exists x ∈ F such that
z = (λ1 + λ2 )x = λ1 x + λ2 x ∈ λ1 F + λ2 F.


Because z ∈ (λ1 + λ2 )F was arbitrary,

(λ1 + λ2 )F ⊂ λ1 F + λ2 F. (2.1)

Conversely, let z ∈ λ1 F + λ2 F , which implies that there exist x1 , x2 ∈ F such


that

z = λ1 x1 + λ2 x2 = (λ1 + λ2 ) [ (λ1 /(λ1 + λ2 )) x1 + (λ2 /(λ1 + λ2 )) x2 ]. (2.2)

Here we may assume λ1 + λ2 > 0, since for λ1 = λ2 = 0 both sides equal {0}. Because λi /(λ1 + λ2 ) ∈ [0, 1], i = 1, 2, the convexity of F implies that

x = (λ1 /(λ1 + λ2 )) x1 + (λ2 /(λ1 + λ2 )) x2 ∈ F. (2.3)
Combining the conditions (2.2) and (2.3) lead to

z = (λ1 + λ2 )x ∈ (λ1 + λ2 )F.

As z ∈ λ1 F + λ2 F was arbitrarily chosen,

(λ1 + λ2 )F ⊃ λ1 F + λ2 F,

which along with the inclusion (2.1) yields the desired equality. Observe that
(ii) and (iii) lead to the convexity of (λ1 + λ2 )F = λ1 F + λ2 F . 
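It is worth noting, through a small example not taken from the text, that convexity is essential in (iv): for the nonconvex set F = {0, 1} ⊂ R and λ1 = λ2 = 1, one has (λ1 + λ2 )F = {0, 2}, whereas λ1 F + λ2 F = {0, 1, 2}, so only the inclusion (2.1) survives.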
From Proposition 2.3, it is obvious that intersection of finitely many closed
half spaces is again a convex set. Such sets that can be expressed in this
form are called polyhedral sets. These sets play an important role in linear
programming problems. We will deal with polyhedral sets later in this chapter.
However, unlike the intersection and sum of convex sets, the union as well
as the complement of convex sets need not be convex. For instance, consider
the sets

F1 = {(x, y) ∈ R2 : x2 + y 2 ≤ 1} and F2 = {(x, y) ∈ R2 : y ≥ x2 }.

Observe from Figure 2.2 that both F1 and F2 along with their intersection are
convex sets but neither their complements nor the union of these two sets is
convex.
To overcome such situations where nonconvex sets come into the picture
in convex analysis, one has to convexify the nonconvex sets. This leads to the
notion of convex combination and convex hull.

Definition 2.4 A point x ∈ Rn is said to be a convex combination of the


points x1 , x2 , . . . , xm ∈ Rn if

x = λ1 x1 + λ2 x2 + . . . + λm xm
with λi ≥ 0, i = 1, 2, . . . , m and Σ_{i=1}^m λi = 1.


FIGURE 2.2: F1 , F2 , and F1 ∩ F2 are convex while F1c , F2c , and F1 ∪ F2 are nonconvex.

The next result expresses the concept of convex set in terms of the convex
combination of its elements.

Theorem 2.5 A set F ⊂ Rn is convex if and only if it contains all the convex
combinations of its elements.

Proof. From Definition 2.1 of convex set, F ⊂ Rn is convex if and only if

(1 − λ)x1 + λx2 ∈ F, ∀ x1 , x2 ∈ F, λ ∈ [0, 1],

that is, the convex combination for m = 2 belongs to F .


To establish the result, we will use the induction approach. Suppose that
the convex combination of m = l − 1 elements of F belong to F . Consider
m = l. The convex combination of l elements is

x = λ1 x1 + λ2 x2 + . . . + λl xl ,

where xi ∈ F and λi ≥ 0, i = 1, 2, . . . , l with Σ_{i=1}^l λi = 1. Because Σ_{i=1}^l λi = 1, there exists at least one λj > 0 for some j ∈ {1, 2, . . . , l}. If λj = 1, then x = xj ∈ F and we are done, so assume λj ∈ (0, 1). Denote

x̃ = λ̃1 x1 + . . . + λ̃j−1 xj−1 + λ̃j+1 xj+1 + . . . + λ̃l xl ,

where λ̃i = λi /(1 − λj ) ≥ 0, i = 1, . . . , j − 1, j + 1, . . . , l. Observe that Σ_{i=1, i≠j}^l λ̃i = 1 and thus x̃ ∈ F because it is a convex combination of l − 1
elements of F . The element x can now be expressed in terms of xj and x̃ as

x = λj xj + (1 − λj )x̃.

Therefore, x is a convex combination of two elements from F , which implies


that x ∈ F . Thus, the convex set F can be equivalently expressed as a convex
combination of its elements, as desired. 
Similar to the concept of convex combination of points, next we introduce
the notion of the convex hull of a set.

Definition 2.6 The convex hull of a set F ⊂ Rn is the smallest convex set
containing F and is denoted by co F . It is basically nothing but the intersection
of all the convex sets containing F .

Further, the convex hull of F can be expressed in terms of the convex


combination of the elements of F as presented in the theorem below.

Theorem 2.7 For any set F ⊂ Rn , the convex hull of F , co F , consists of


all the convex combinations of the elements of F , that is,
co F = {x ∈ Rn : x = Σ_{i=1}^m λi xi , xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, Σ_{i=1}^m λi = 1, m ≥ 0}.

Proof. Denote the set of all convex combinations of the elements of F by F̂, that is,

F̂ = {x ∈ Rn : x = Σ_{i=1}^m λi xi , xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, Σ_{i=1}^m λi = 1, m ≥ 0}.

From Definition 2.6, co F is the smallest convex set containing F . Therefore,


F ⊂ co F . By Theorem 2.5, the convex combination of the elements of F also
belong to the convex set co F , that is,

co F ⊃ F̂. (2.4)


To establish the result, we will show that F̂ is also convex. Suppose that x, x̃ ∈ F̂, which implies there exist xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, with Σ_{i=1}^m λi = 1 and x̃i ∈ F, λ̃i ≥ 0, i = 1, 2, . . . , l, with Σ_{i=1}^l λ̃i = 1 such that

x = λ1 x1 + λ2 x2 + . . . + λm xm ,
x̃ = λ̃1 x̃1 + λ̃2 x̃2 + . . . + λ̃l x̃l .

For any λ ∈ [0, 1],

(1 − λ)x + λx̃ = (1 − λ)λ1 x1 + . . . + (1 − λ)λm xm + λλ̃1 x̃1 + . . . + λλ̃l x̃l .

Observe that (1 − λ)λi ≥ 0, i = 1, 2, . . . , m and λλ̃i ≥ 0, i = 1, 2, . . . , l, satisfying

(1 − λ) Σ_{i=1}^m λi + λ Σ_{i=1}^l λ̃i = 1.

Thus, for any λ ∈ [0, 1],

(1 − λ)x + λx̃ ∈ F̂.

As x, x̃ ∈ F̂ were arbitrary, the above relation holds for any x, x̃ ∈ F̂, thereby implying the convexity of F̂. Also, F ⊂ F̂. Therefore, by Definition 2.6 of convex hull, co F ⊂ F̂, which along with the inclusion (2.4) leads to the
desired result. 
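As a computational aside (a minimal sketch, not from the text; it assumes NumPy and SciPy are available and all names are illustrative), the above characterization reduces the question of whether a point x lies in the convex hull of finitely many points to the feasibility of the system Σ_i λi xi = x, Σ_i λi = 1, λi ≥ 0, which can be tested by a linear program with a zero objective:

    import numpy as np
    from scipy.optimize import linprog

    def in_convex_hull(points, x):
        pts = np.asarray(points, dtype=float)       # shape (m, n), the finite set F
        m, n = pts.shape
        A_eq = np.vstack([pts.T, np.ones((1, m))])  # rows: the n coordinates and the weight sum
        b_eq = np.append(np.asarray(x, dtype=float), 1.0)
        res = linprog(c=np.zeros(m), A_eq=A_eq, b_eq=b_eq,
                      bounds=[(0, None)] * m, method="highs")
        return res.status == 0                      # status 0: a feasible weight vector exists

    square = [(0, 0), (1, 0), (0, 1), (1, 1)]
    print(in_convex_hull(square, (0.25, 0.75)))     # True: the point lies in the unit square
    print(in_convex_hull(square, (1.5, 0.5)))       # False: the point lies outside the hull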
It follows from the above discussion that a set F ⊂ Rn is convex if co F = F
and thus equivalent to the fact that a convex set F ⊂ Rn contains all the
convex combinations of the elements of F . From the above theorem we observe
that co F is expressed as a convex combination of m elements of F , where
m ≥ 0 is arbitrary. But the obvious question is how large this m has to be
chosen in the result. This is answered in the famous Carathéodory Theorem,
which we present next. Though one finds various approaches to prove the
result [12, 97, 101], we present a simple proof from Mangasarian [82].

Theorem 2.8 (Carathéodory Theorem) Consider a nonempty set F ⊂ Rn .


Then any point of the convex hull of F is representable as a convex combina-
tion of at most n + 1 points of F .

Proof. From Theorem 2.7, any element in co F can be expressed as a convex


combination of m elements of F . We have to show that m ≤ (n + 1). Suppose
that x ∈ co F , which implies that there exist xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, with Σ_{i=1}^m λi = 1 such that

x = λ1 x1 + λ2 x2 + . . . + λm xm .

Assume that m > (n + 1). We will prove that x can be expressed as a convex
combination of (m − 1) elements. The result can be established by applying


the reduction process until m = n + 1. In case for some i ∈ {1, 2, . . . , m},


λi = 0, then x is a convex combination of (m − 1) elements of F .
Suppose that λi > 0, i = 1, 2, . . . , m. It is known that for l > n, any l
elements in Rn are linearly dependent. As m − 1 > n, there exist αi ∈ R,
i = 1, 2, . . . , m − 1, not all zeroes, such that

α1 (x1 − xm ) + α2 (x2 − xm ) + . . . + αm−1 (xm−1 − xm ) = 0.

Define αm = −(α1 + α2 + . . . + αm−1 ). Observe that


Σ_{i=1}^m αi = 0 and Σ_{i=1}^m αi xi = 0. (2.5)

Define λ̃i = λi − γαi , i = 1, 2, . . . , m, where γ > 0 is chosen such that λ̃i ≥ 0,


i ∈ {1, 2, . . . , m} and for some j ∈ {1, 2, . . . , m}, λ̃j = 0. This is possible by
taking

1/γ = max_{i=1,...,m} (αi /λi ) = αj /λj .

By choice, λ̃j = 0 and λ̃i ≥ 0, i = 1, . . . , j − 1, j + 1, . . . , m, which by the


condition (2.5) yields
Σ_{i=1, i≠j}^m λ̃i = Σ_{i=1}^m λ̃i = Σ_{i=1}^m λi − γ Σ_{i=1}^m αi = 1

and
x = Σ_{i=1}^m λi xi = Σ_{i=1}^m λ̃i xi + γ Σ_{i=1}^m αi xi = Σ_{i=1, i≠j}^m λ̃i xi ,

which implies that x is now expressed as a convex combination of (m − 1)


elements of F . This reduction can be carried out until m = (n + 1) and thus
any element in the convex hull of F is representable as a convex combination
of at most (n + 1) elements of F , as desired. 
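The reduction used in the above proof is completely constructive and can be carried out numerically. The following minimal sketch (not from the text; all names are illustrative) obtains the αi of condition (2.5) from a null-space vector and applies the weight update λ̃i = λi − γαi until at most n + 1 points remain:

    import numpy as np

    def caratheodory(points, weights, tol=1e-12):
        pts = np.asarray(points, dtype=float)    # shape (m, n)
        lam = np.asarray(weights, dtype=float)   # nonnegative weights summing to 1
        while True:
            keep = lam > tol                     # drop points whose weight has become zero
            pts, lam = pts[keep], lam[keep]
            m, n = pts.shape
            if m <= n + 1:
                return pts, lam
            # alpha_1,...,alpha_{m-1} not all zero with sum_i alpha_i (x_i - x_m) = 0,
            # and alpha_m = -(alpha_1 + ... + alpha_{m-1}), as in condition (2.5)
            A = (pts[:-1] - pts[-1]).T           # n x (m-1) with m-1 > n, hence rank deficient
            alpha = np.linalg.svd(A)[2][-1]      # a null-space vector of A
            alpha = np.append(alpha, -alpha.sum())
            gamma = 1.0 / np.max(alpha / lam)    # 1/gamma = max_i alpha_i / lam_i > 0
            lam = lam - gamma * alpha            # stays >= 0, sums to 1, one entry hits zero

    pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
    lam = np.full(5, 0.2)                        # x = (0.5, 0.5) as a combination of 5 points
    x = lam @ pts
    new_pts, new_lam = caratheodory(pts, lam)
    assert np.allclose(new_lam @ new_pts, x) and len(new_lam) <= 3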
Using the above theorem, we have the following important result for a
compact set from Bertsekas [12] and Rockafellar [97].

Theorem 2.9 For a compact set F ⊂ Rn , its convex hull co F is also a


compact set.

Proof. We claim that co F is closed. Consider a sequence {zk } ⊂ co F . By


the Carathéodory Theorem, Theorem 2.8, there exist sequences {λ_i^k} ⊂ R+ , i = 1, 2, . . . , n + 1, satisfying Σ_{i=1}^{n+1} λ_i^k = 1 and {x_i^k} ⊂ F , i = 1, 2, . . . , n + 1, such that

zk = Σ_{i=1}^{n+1} λ_i^k x_i^k , ∀ k ∈ N. (2.6)

As for any k ∈ N, λ_i^k ≥ 0, i = 1, 2, . . . , n + 1 with Σ_{i=1}^{n+1} λ_i^k = 1, {λ_i^k} is a
bounded sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3,
{λ_i^k}, i = 1, 2, . . . , n + 1, has a convergent subsequence. Without loss of generality, assume that λ_i^k → λi , i = 1, 2, . . . , n + 1, such that λi ≥ 0, i = 1, 2, . . . , n + 1 with Σ_{i=1}^{n+1} λi = 1. By the compactness of F , the sequence {x_i^k}, i = 1, 2, . . . , n + 1, is bounded. Again by the Bolzano–Weierstrass Theorem, {x_i^k} has a convergent subsequence. Without loss of generality, let x_i^k → xi , i = 1, 2, . . . , n + 1. By the compactness of F , F is a closed set and thus xi ∈ F, i = 1, 2, . . . , n + 1. Taking the limit as k → +∞, (2.6) yields that
zk → z = Σ_{i=1}^{n+1} λi xi ∈ co F.

Because {zk } ⊂ co F was an arbitrary sequence, co F is a closed set.


To prove that co F is compact, we will establish that co F is bounded.
As F is compact, it is a bounded set, which implies that there exists M > 0
such that kxk ≤ M for every x ∈ F . Now consider z ∈ co F , which by the
Carathéodory Theorem implies that there exist λi ≥ 0, i = 1, 2, . . . , n + 1, satisfying Σ_{i=1}^{n+1} λi = 1 and xi ∈ F , i = 1, 2, . . . , n + 1, such that

z = Σ_{i=1}^{n+1} λi xi .

Therefore, by the boundedness of F along with the fact that λi ∈ [0, 1],
i = 1, 2, . . . , n + 1 yields that
kzk = kΣ_{i=1}^{n+1} λi xi k ≤ Σ_{i=1}^{n+1} λi kxi k ≤ (n + 1)M.

Because z ∈ co F was arbitrary, every element in co F is bounded above by


(n + 1)M and thus co F is bounded. Hence co F is a compact set. 
However, the above result does not hold true if one replaces the compact-
ness of F by simply the closedness of the set. To verify this fact, we present
an example from Bertsekas [12]. Consider the closed set F defined as

F = {(0, 0)} ∪ {(x1 , x2 ) ∈ R2 : x1 x2 ≥ 1, x1 ≥ 0, x2 ≥ 0},

while the convex hull of F is

co F = {(0, 0)} ∪ {(x1 , x2 ) ∈ R2 : x1 > 0, x2 > 0},

which is not a closed set. For instance, the points (1, 1/k) = (1/k)(k, 1) + (1 − 1/k)(0, 0) belong to co F , but their limit (1, 0) does not.


Now, similar to the concepts of convex combination and convex hull, we
present the notions of affine combination and affine hulls.


Definition 2.10 A point x ∈ Rn is said to be an affine combination of the points x1 , x2 , . . . , xm ∈ Rn if

x = λ1 x1 + λ2 x2 + . . . + λm xm

with λi ∈ R, i = 1, 2, . . . , m and Σ_{i=1}^m λi = 1.

Definition 2.11 The affine hull of a set F ⊂ Rn is the smallest affine set
containing F and is denoted by af f F . It consists of all affine combinations
of the elements of F .

Now we move on to the properties of closure, interior, and relative interior


of convex sets.

Definition 2.12 The closure of a set F ⊂ Rn , cl F , is expressed as

cl F = ∩_{ε>0} (F + εB) = ∩_{ε>0} {x + εB : x ∈ F },

while the interior of the set F , int F , is defined as

int F = {x ∈ Rn : there exists ε > 0 such that (x + εB) ⊂ F }.

It is well known that an arbitrary intersection of closed sets is closed, but the same need not hold for the union. However, for the case of union, the following relation holds. For any arbitrary family of sets {Fλ }, λ ∈ Λ, where the index set Λ may possibly be infinite,

∪_{λ∈Λ} cl Fλ ⊂ cl ∪_{λ∈Λ} Fλ .

The notion of interior suffers from a drawback that even for a nonempty
convex set, it may turn out to be empty. For example, consider a line in R2 .
From the above definition, it is obvious that the interior is empty. But the
set of interior points relative to the affine hull of the set is nonempty. This
motivates us to introduce the notion of relative interior.

Definition 2.13 The relative interior of a convex set F ⊂ Rn , ri F , is the


interior of F relative to the affine hull of F , that is,

ri F = {x ∈ Rn : there exists ε > 0 such that (x + εB) ∩ af f F ⊂ F }.

For an n-dimensional convex set F ⊂ Rn , af f F = Rn and thus


ri F = int F . Though the notion of relative interior helps in overcoming the
emptiness of the interior of a convex set, it also suffers from a drawback. For
nonempty convex sets F1 , F2 ⊂ Rn ,

F1 ⊂ F2 =⇒ cl F1 ⊂ cl F2 and int F1 ⊂ int F2 ,


but ri F1 ⊂ ri F2 need not hold. For instance, consider F1 = {(0, 0)} and
F2 = {(0, y) ∈ R2 : y ≥ 0}. Here F1 ⊂ F2 with ri F1 = {(0, 0)} and
ri F2 = {(0, y) ∈ R2 : y > 0}. Here the relative interiors are nonempty and
disjoint.
Next we present some properties of closure and relative interior of convex
sets. The proofs are from Bertsekas [11, 12] and Rockafellar [97].

Proposition 2.14 Consider a nonempty convex set F ⊂ Rn . Then the fol-


lowing hold:

(i) ri F is nonempty.
(ii) (Line Segment Principle) Let x ∈ ri F and y ∈ cl F . Then for λ ∈ [0, 1),

(1 − λ)x + λy ∈ ri F.

(iii) (Prolongation Principle) x ∈ ri F if and only if every line segment in


F having x as one end point can be prolonged beyond x without leaving
F , that is, for every y ∈ F there exists γ > 1 such that

x + (γ − 1)(x − y) ∈ F.

(iv) ri F and cl F are convex sets with the same affine hulls as that of F .

Proof. (i) Without loss of generality assume that 0 ∈ F . Then the affine hull
of F , af f F , is a subspace containing F . Denote the dimension of af f F by
m. If m = 0, then F as well as af f F consist of a single point and hence ri F
is the point itself, thus proving the result.
Suppose that m > 0. Then one can always find linearly independent ele-
ments x1 , x2 , . . . , xm from F such that af f F = span{x1 , x2 , . . . , xm }, that is,
x1 , x2 , . . . , xm form a basis of the subspace af f F . If this was not possible, then
there exist linearly independent elements y1 , y2 , . . . , yl with l < m from F such
that F ⊂ span{y1 , y2 , . . . , yl }, thereby contradicting the fact that the dimen-
sion of af f F is m. Observe that co {0, x1 , x2 , . . . , xm } ⊂ F has a nonempty
interior with respect to af f F , which implies co {0, x1 , x2 , . . . , xm } ⊂ ri F ,
thereby yielding that ri F is nonempty.
(ii) Suppose that y ∈ cl F , which implies there exists {yk } ⊂ F such that
yk → y. As x ∈ ri F , there exists ε > 0 such that Bε (x) ∩ af f F ⊂ F . For
λ ∈ [0, 1), define yλ = (1 − λ)x + λy and yk,λ = (1 − λ)x + λyk . Therefore, from
Figure 2.3, it is obvious that each point of B(1−λ)ε (yk,λ ) ∩ af f F is a convex
combination of yk and some point from Bε (x) ∩ af f F . By the convexity of F ,

B(1−λ)ε (yk,λ ) ∩ af f F ⊂ F, ∀ k ∈ N.

Because yk → y, yk,λ → yλ . Thus, for sufficiently large k,

B(1−λ)ε/2 (yλ ) ⊂ B(1−λ)ε (yk,λ ),


FIGURE 2.3: Line segment principle.

which implies

B(1−λ)ε/2 (yλ ) ∩ af f F ⊂ B(1−λ)ε (yk,λ ) ∩ af f F ⊂ F.

Hence, yλ = (1 − λ)x + λy ∈ ri F for λ ∈ [0, 1).


Aliter. The above approach was direct and somewhat cumbersome. In the
proof to follow, we use the fact that relative interiors are preserved under one-
to-one affine transformation of Rn to itself and hence these transformations
preserve the affine hulls. This property simplifies the proofs. If F is an m-
dimensional set in Rn , there exists a one-to-one affine transformation of Rn
to itself that carries af f F to the subspace

S = {(x1 , . . . , xm , xm+1 , . . . , xn ) ∈ Rn : xm+1 = 0, . . . , xn = 0}.

Thus, S can be considered a copy of Rm . From this view, one can simply
consider the case when F ⊂ Rn is an n-dimensional set, which implies that
ri F = int F . We will now establish the result for int F instead of ri F .
Because y ∈ cl F ,

y ∈ F + εB, ∀ ε > 0.

Therefore, for every ε > 0,

(1 − λ)x + λy + εB ⊂ (1 − λ)x + λ(F + εB) + εB
= (1 − λ) ( x + (ε(1 + λ)/(1 − λ)) B ) + λF, ∀ λ ∈ [0, 1).


Because x ∈ int F , choosing ε > 0 sufficiently small such that

x + (ε(1 + λ)/(1 − λ)) B ⊂ F,
which along with the convexity of F reduces the preceding relation to

(1 − λ)x + λy + εB ⊂ (1 − λ)F + λF ⊂ F, ∀ λ ∈ [0, 1).

Thus, (1 − λ)x + λy ∈ int F for λ ∈ [0, 1) as desired.


(iii) For x ∈ ri F , by the definition of relative interior, the condition holds.
Conversely, suppose that x ∈ Rn satisfies the condition. We claim that
x ∈ ri F . By (i) there exists x̃ ∈ ri F . If x = x̃, we are done. So assume
that x 6= x̃. As x̃ ∈ ri F ⊂ F , by the condition, there exists γ > 1 such that

y = x + (γ − 1)(x − x̃) ∈ F.
1
Therefore, for λ = ∈ (0, 1),
γ
x = (1 − λ)x̃ + λy,

which by the fact that y ∈ F ⊂ cl F along with the line segment principle (ii)
implies that x ∈ ri F , thereby establishing the result.
(iv) Because ri F ⊂ cl F , by (ii) we have that ri F is convex. From (i), we know
that there exist x1 , x2 , . . . , xm ∈ F such that af f {x1 , x2 , . . . , xm } = af f F
and co {0, x1 , x2 , . . . , xm } ⊂ ri F . Therefore, ri F has an affine hull the same
as that of F .
By Proposition 2.3, F +εB is convex for every ε > 0. Also, as intersection of
convex sets is convex, cl F , which is the intersection of the collection of the sets
F + εB over ε > 0 is convex. Because F ⊂ af f F , cl F ⊂ cl af f F = af f F ,
and as F ⊂ cl F , af f F ⊂ af f cl F , which together implies that the affine
hull of cl F coincides with af f F . 
In the result below we discuss the closure and relative interior operations.
The proofs are from Bertsekas [12] and Rockafellar [97].

Proposition 2.15 Consider nonempty convex sets F, F1 , F2 ⊂ Rn . Then the


following hold:

(i) cl(ri F ) = cl F .
(ii) ri(cl F ) = ri F .
(iii) ri F1 ∩ ri F2 ⊂ ri (F1 ∩ F2 ) and cl (F1 ∩ F2 ) ⊂ cl F1 ∩ cl F2 .
In addition if ri F1 ∩ ri F2 6= ∅,

ri F1 ∩ ri F2 = ri (F1 ∩ F2 ) and cl (F1 ∩ F2 ) = cl F1 ∩ cl F2 .


(iv) Consider a linear transformation L : Rn → Rm . Then

L(cl F ) ⊂ cl (LF ) and L(ri F ) = ri (LF ).

(v) ri (αF ) = α ri F for every α ∈ R.

(vi) ri (F1 + F2 ) = ri F1 + ri F2 and cl F1 + cl F2 ⊂ cl (F1 + F2 ). If either


F1 or F2 is bounded,

cl F1 + cl F2 = cl (F1 + F2 )

Proof. (i) Because ri F ⊂ F , it is obvious that

cl(ri F ) ⊂ cl F.

Conversely, suppose that y ∈ cl F . We claim that y ∈ cl(ri F ). Consider


any x ∈ ri F . By the line segment principle, Proposition 2.14 (ii), for every
λ ∈ [0, 1),

(1 − λ)x + λy ∈ ri F.

Observe that the sequence {(1 − λk )x + λk y} ⊂ ri F is such that as the limit


λk → 1, (1 − λk )x + λk y → y, which implies that y ∈ cl(ri F ), as claimed.
Hence the result.
(ii) We know that F ⊂ cl F and by Proposition 2.14 (iv), af f F = af f (cl F ).
Consider x ∈ ri F , which by the definition of relative interior along with the
preceding facts imply that there exists ε > 0 such that

(x + εB) ∩ af f (cl F ) = (x + εB) ∩ af f F ⊂ F ⊂ cl F,

thereby yielding that x ∈ ri (cl F ). Hence, ri F ⊂ ri (cl F ).


Conversely, suppose that x ∈ ri (cl F ). We claim that x ∈ ri F . By the
nonemptiness of ri F , Proposition 2.14 (i), there exists x̃ ∈ ri F ⊂ cl F .
If in particular x = x̃, we are done. So assume that x ≠ x̃. We can choose γ > 1, sufficiently close to 1, such that by applying the Prolongation Principle, Proposition 2.14 (iii),

y = x + (γ − 1)(x − x̃) ∈ cl F.

Therefore, for λ = 1/γ ∈ (0, 1),

x = (1 − λ)x̃ + λy,

which by the Line Segment Principle, Proposition 2.14 (ii), implies that
x ∈ ri F , thereby leading to the requisite result.


(iii) Suppose that x ∈ ri F1 ∩ ri F2 and y ∈ F1 ∩ F2 . By the Prolongation


Principle, Proposition 2.14 (iii), there exist γi > 1, i = 1, 2 such that

x + (γi − 1)(x − y) ∈ Fi , i = 1, 2.

Choosing γ = min{γ1 , γ2 } > 1, the above condition reduces to

x + (γ − 1)(x − y) ∈ F1 ∩ F2 ,

which again by the Prolongation Principle leads to x ∈ ri (F1 ∩ F2 ). Thus,

ri F1 ∩ ri F2 ⊂ ri (F1 ∩ F2 ).

Because F1 ∩ F2 ⊂ cl F1 ∩ cl F2 , it is obvious that cl (F1 ∩ F2 ) ⊂ cl F1 ∩ cl F2


as intersection of arbitrary closed sets is closed.
Assume that ri F1 ∩ ri F2 is nonempty. Suppose that x ∈ ri F1 ∩ ri F2
and y ∈ cl F1 ∩ cl F2 . By the Line Segment Principle, Proposition 2.14 (ii),
for every λ ∈ [0, 1),

(1 − λ)x + λy ∈ ri F1 ∩ ri F2 .

Observe that the sequence {(1 − λk )x + λk y} ⊂ ri F1 ∩ ri F2 is such that as λk → 1, (1 − λk )x + λk y → y and hence y ∈ cl (ri F1 ∩ ri F2 ). Therefore,

cl F1 ∩ cl F2 ⊂ cl (ri F1 ∩ ri F2 ) ⊂ cl (F1 ∩ F2 ), (2.7)

thereby yielding the desired equality, that is,

cl F1 ∩ cl F2 = cl (F1 ∩ F2 ).

Also from the inclusion (2.7),

cl (ri F1 ∩ ri F2 ) = cl (F1 ∩ F2 ).

By (ii), the above condition leads to

ri (ri F1 ∩ ri F2 ) = ri (cl (ri F1 ∩ ri F2 ))


= ri (cl (F1 ∩ F2 )) = ri (F1 ∩ F2 ),

which implies that

ri F1 ∩ ri F2 ⊃ ri (F1 ∩ F2 ),

thus establishing the requisite result.


(iv) Suppose that x ∈ cl F , which implies there exists a sequence {xk } ⊂ F
such that xk → x. Because L is a linear transformation, it is continu-
ous. Therefore, L(xk ) → L(x), which implies L(x) ∈ cl(LF ) and hence
L(cl F ) ⊂ cl(LF ).


As ri F ⊂ F , on applying the linear transformation L, L(ri F ) ⊂ LF and


thus cl L(ri F ) ⊂ cl (LF ). Also, as F ⊂ cl F , proceeding as before which
along with (i) and the closure inclusion yields

LF ⊂ L(cl F ) = L(cl (ri F )) ⊂ cl L(ri F ).

Therefore, cl (LF ) ⊂ cl L(ri F ), which by earlier condition leads to


cl (LF ) = cl L(ri F ). By (ii),

ri (LF ) = ri (cl (LF )) = ri (cl L(ri F )) = ri (L(ri F )),

thereby yielding ri (LF ) ⊂ L(ri F ).


Conversely, suppose that x̄ ∈ L(ri F ), which implies that there exists
x̃ ∈ ri F such that x̄ = L(x̃). Consider any ȳ ∈ LF and corresponding ỹ ∈ F
such that ȳ = L(ỹ). By the Prolongation Principle, Proposition 2.14 (iii),
there exists γ > 1 such that

(1 − γ)ỹ + γ x̃ ∈ F,

which under the linear transformation leads to

(1 − γ)ȳ + γ x̄ ∈ LF.

Because ȳ ∈ LF was arbitrary, again applying the Prolongation Principle


yields x̄ ∈ ri (LF ), that is, L(ri F ) ⊂ ri (LF ), thereby establishing the
desired equality.
(v) For arbitrary but fixed α ∈ R, define a linear transformation Lα : Rn → Rn
given by Lα (x) = αx. Therefore, for a set F , Lα F = αF . For every α ∈ R,
applying (iv) to Lα F leads to

α ri F = Lα (ri F ) = ri (Lα F ) = ri (αF )

and hence the result.


(vi) Define a linear transformation L : Rn × Rn → Rn given by
L(x1 , x2 ) = x1 + x2 which implies L(F1 , F2 ) = F1 + F2 . Now applying (iv)
to L yields

ri (F1 + F2 ) = ri F1 + ri F2 and cl F1 + cl F2 ⊂ cl (F1 + F2 ).

To establish the equality in the closure part, assume that F1 is bounded.


Suppose that x ∈ cl(F1 + F2 ), which implies that there exist {xki } ⊂ Fi ,
i = 1, 2, such that xk1 + xk2 → x. Because F1 is bounded, {xk1 } is a bounded se-
quence, which leads to the boundedness of {xk2 }. By the Bolzano–Weierstrass
Theorem, Proposition 1.3, the sequence {(xk1 , xk2 )} has a subsequence converg-
ing to (x1 , x2 ) such that x1 +x2 = x. As xi ∈ cl Fi for i = 1, 2, x ∈ cl F1 +cl F2 ,
hence establishing the result cl F1 + cl F2 = cl (F1 + F2 ). 


Note that for the equality part in Proposition 2.15 (iii), the nonemptiness
of ri F1 ∩ ri F2 is required, otherwise the equality need not hold. We present
an example from Bertsekas [12] to illustrate this fact. Consider the sets

F1 = {x ∈ R : x ≥ 0} and F2 = {x ∈ R : x ≤ 0}.

Therefore, ri (F1 ∩ F2 ) = {0} 6= ∅ = ri F1 ∩ ri F2 . For the closure part,


consider

F1 = {x ∈ R : x > 0} and F2 = {x ∈ R : x < 0}.

Thus, cl (F1 ∩ F2 ) = ∅ 6= {0} = cl F1 ∩ cl F2 .


Also the boundedness assumption in (vi) for the closure equality is neces-
sary. For instance, consider the sets

F1 = {(x1 , x2 ) ∈ R2 : x1 x2 ≥ 1, x1 > 0, x2 > 0},


F2 = {(x1 , x2 ) ∈ R2 : x1 = 0}.

Here, both F1 and F2 are closed unbounded sets, whereas the sum

F1 + F2 = {(x1 , x2 ) ∈ R2 : x1 > 0}

is not closed. Thus cl F1 + cl F2 = F1 + F2 ⊊ cl (F1 + F2 ).


As a consequence of Proposition 2.15, we have the following result from
Rockafellar [97].

Corollary 2.16 (i) Consider two convex sets F1 and F2 in Rn . Then


cl F1 = cl F2 if and only if ri F1 = ri F2 . Equivalently,

ri F1 ⊂ F2 ⊂ cl F1 .

(ii) Consider a convex set F ⊂ Rn . Then any open set that meets cl F also
meets ri F .
(iii) Consider a convex set F ⊂ Rn and an affine set H ⊂ Rn containing a
point from ri F . Then

ri (F ∩ H) = ri F ∩ H and cl(F ∩ H) = cl F ∩ H.

Proof. (i) Suppose that cl F1 = cl F2 . Invoking Proposition 2.15 (ii),

ri F1 = ri(cl F1 ) = ri(cl F2 ) = ri F2 . (2.8)

Now assume that ri F1 = ri F2 , which by Proposition 2.15 (i) implies that

cl F1 = cl(ri F1 ) = cl(ri F2 ) = cl F2 . (2.9)

Combining the relations (2.8) and (2.9) leads to

ri F1 = ri F2 ⊂ F2 ⊂ cl F2 = cl F1 ,


thereby establishing the desired result.


(ii) Denote the open set by O. Suppose that O meets cl F , that is, there exists
x ∈ Rn such that x ∈ O ∩ cl F . By Proposition 2.15 (i), cl F = cl(ri F ), which
implies

x ∈ O ∩ cl(ri F ).

Because x ∈ cl(ri F ), there exists {xk } ⊂ ri F such that xk → x. Therefore,


for k sufficiently large, one can choose ε̄ > 0 such that xk ∈ x + ε̄B. Also, as
O is an open set and x ∈ O, there exists ε̃ > 0 such that x + ε̃B ⊂ O. Define
ε = min{ε̄, ε̃}. Thus for sufficiently large k,

xk ∈ x + εB ⊂ O,

which along with the fact that xk ∈ ri F implies that O also meets ri F ,
hence proving the result.
(iii) Observe that for an affine set H, ri H = H = cl H. Therefore, by the
given hypothesis,

ri F ∩ H = ri F ∩ ri H 6= ∅.

Thus, by Proposition 2.15 (iii),

ri(F ∩ H) = ri F ∩ H and cl(F ∩ H) = cl F ∩ H,

thereby completing the proof. 


Before moving on to the various classes of convex sets, we would like to
mention the concept of core of a set like the notions of closure and interior of
a set from Borwein and Lewis [17].

Definition 2.17 The core of a set F ⊂ Rn , denoted by core F , is defined as

core F = {x ∈ F : for every d ∈ Rn there exists λ > 0


such that x + λd ∈ F }.

It is obvious that int F ⊂ core F . For a convex set F ⊂ Rn , int F = core F .

2.2.1 Convex Cones


Since we are interested in the study of convex optimization theory, a class
of sets that plays an active role in this direction is the epigraphical set as
discussed briefly in Chapter 1. From the definition it is obvious that epigraph-
ical sets are unbounded. Thus it seems worthwhile to understand the class of
unbounded convex sets for which one needs the idea of recession cones. But
before that, we require the concept of cones.


Definition 2.18 A set K ⊂ Rn is said to be a cone if for every x ∈ K,


λx ∈ K for every λ ≥ 0. Therefore, for any set F ⊂ Rn , the cone generated
by F is denoted by cone F and is defined as
cone F = ∪_{λ≥0} λF = {z ∈ Rn : z = λx, x ∈ F, λ ≥ 0}.

Note that for a nonconvex set F , cone F may or may not be convex. For
example, consider F = {(1, 1), (2, 2)}. Here,

cone F = {z ∈ R2 : z = λ(1, 1), λ ≥ 0},

which is convex. Now consider F = {(−1, 1), (1, 1)}. Observe that the cone
generated by F comprises of two rays, that is,

cone F = {z ∈ R2 : z = λ(−1, 1) or z = λ(1, 1), λ ≥ 0}.

But we are interested in the convex scenarios, thereby moving on to the notion
of the convex cone.

Definition 2.19 The set K ⊂ Rn is said to be convex cone if it is convex


as well as a cone. Therefore, for any set F ⊂ Rn , the convex cone generated
by F is denoted by cone co F and is expressed as a set containing all conic
combinations of the elements of the set F , that is,
cone co F = {x ∈ Rn : x = Σ_{i=1}^m λi xi , xi ∈ F, λi ≥ 0, i = 1, 2, . . . , m, m ≥ 0}.

The convex cone generated by the set F is the smallest convex cone containing
F . Also, for a collection of convex sets Fi ⊂ Rn , i = 1, 2, . . . , m, the convex
cone generated by Fi , i = 1, 2, . . . , m can be easily shown to be expressed as
cone co (∪_{i=1}^m Fi ) = ∪_{λ∈R_+^m} Σ_{i=1}^m λi Fi .

Some of the important convex cones that play a pivotal role in convex
optimization are the polar cone, tangent cone, and the normal cone. We shall
discuss them later in the chapter. Before going back to the discussion of un-
bounded sets, we characterize the class of convex cones in the result below.

Theorem 2.20 A cone K ⊂ Rn is convex if and only if K + K ⊂ K.

Proof. Suppose that the cone K is convex. Consider x, y ∈ K. Because K is


convex, for λ = 1/2,
(1 − λ)x + λy = (1/2)(x + y) ∈ K.


Also, as K is a cone, x + y ∈ 2K ⊂ K which implies that K + K ⊂ K.


Conversely, suppose that K + K ⊂ K. Consider x, y ∈ K. Because K is a
cone, for λ ∈ [0, 1],

(1 − λ)x ∈ K and λy ∈ K,

which along with the assumption K + K ⊂ K leads to

(1 − λ)x + λy ∈ K, ∀ λ ∈ [0, 1].

As x, y ∈ K were arbitrary, the cone K is convex, thus proving the result. 


Now coming back to unbounded convex sets, a set can be thought to be un-
bounded if for any point in the set there exists a direction moving along which
one still remains within the set. Such directions are known as the directions
of recession and are independent of the point chosen.

Definition 2.21 For a convex set F ⊂ Rn , d ∈ Rn is said to be the direction


of recession of F if x + λd ∈ F for every x ∈ F and for every λ ≥ 0. The
collection of all the directions of recession of a set F ⊂ Rn form a cone known
as the recession cone of F and is denoted by 0+ F . Equivalently,

0+ F = {d ∈ Rn : F + d ⊂ F }. (2.10)

It is easy to observe that for any d ∈ 0+ F , d belongs to the set on the


right-hand side of the relation (2.10) by choosing in particular λ = 1. Now
suppose that d ∈ Rn belongs to the set on the right-hand side of the relation
(2.10). Therefore, for any x ∈ F , x + d ∈ F . Invoking the condition iteratively,
x + kd ∈ F for k ∈ N. Because F is convex, for any λ̄ ∈ [0, 1],

(1 − λ̄)x + λ̄(x + kd) = x + λ̄kd ∈ F, ∀ k ∈ N.

Denoting λ = λ̄k ≥ 0, the above condition reduces to x + λd ∈ F for every


λ ≥ 0. As this relation holds for any x ∈ F , d ∈ 0+ F , thereby establishing the
relation (2.10).
Below we present some properties of the recession cone with proofs from
Bertsekas [12].

Proposition 2.22 Consider a closed convex set F ⊂ Rn . Then the following


holds:

(i) 0+ F is a closed convex cone.

(ii) d ∈ 0+ F if and only if there exists x ∈ F such that x + λd ∈ F for every


λ ≥ 0.

(iii) The set F is bounded if and only if 0+ F = {0}.


Proof. (i) Suppose that d ∈ 0+ F , which implies that for every x ∈ F and
λ ≥ 0, x + λd ∈ F . Consider α > 0. Denote λ̄ = λ/α ≥ 0. Then,

x + λ̄(αd) = x + λd ∈ F,

which implies αd ∈ 0+ F for α > 0. For α = 0, it is trivial. Thus, 0+ F is a


cone.
Suppose that d1 , d2 ∈ 0+ F , which implies for every x ∈ F and λ ≥ 0,

x + λdi ∈ F, i = 1, 2.

By the convexity of F , for any λ̃ ∈ [0, 1],

(1 − λ̃)(x + λd1 ) + λ̃(x + λd2 ) = x + λ((1 − λ̃)d1 + λ̃d2 ) ∈ F,

which yields that (1 − λ̃)d1 + λ̃d2 ∈ 0+ F , thus implying the convexity of 0+ F .


Finally, to establish the closedness of 0+ F , suppose that d ∈ cl 0+ F , which
implies that there exists {dk } ⊂ 0+ F . Therefore, for every x ∈ F and every
λ ≥ 0, x + λdk ∈ F . Because F is closed,

x + λd ∈ F, ∀ x ∈ F, ∀ λ ≥ 0,

which implies that d ∈ 0+ F , thereby implying that 0+ F is closed.


(ii) If d ∈ 0+ F , then from the definition of recession cone itself, the condition
is satisfied. Conversely, suppose that d ∈ Rn is such that there exists x ∈ F
satisfying

x + λd ∈ F, ∀ λ ≥ 0.

Without loss of generality, assume that d 6= 0. Consider arbitrary x̃ ∈ F .


Because 0+ F is a cone, it suffices to prove that x̃ + d ∈ F . Define

xk = x + kd, k ∈ N,

which by the condition implies that {xk } ⊂ F . If x̃ = xk for some k ∈ N, then


again by the condition

x̃ + d = x + (k + 1)d ∈ F

and thus we are done. So assume that x̃ ≠ xk for every k. Define

dk = ((xk − x̃)/kxk − x̃k) kdk, ∀ k ∈ N.

Therefore, for λ̃ = kdk/kxk − x̃k ≥ 0,

x̃ + dk = (1 − λ̃)x̃ + λ̃xk ,


which implies that x̃ + dk lies on the line starting at x̃ and passing through
xk . Now consider
dk /kdk = (xk − x̃)/kxk − x̃k
= (xk − x)/kxk − x̃k + (x − x̃)/kxk − x̃k
= (kxk − xk/kxk − x̃k) (xk − x)/kxk − xk + (x − x̃)/kxk − x̃k
= (kxk − xk/kxk − x̃k) (d/kdk) + (x − x̃)/kxk − x̃k .

By the construction of {xk }, we know that it is an unbounded sequence.


Therefore,

kxk − xk/kxk − x̃k = k kdk/kx − x̃ + kdk → 1 and (x − x̃)/kxk − x̃k = (x − x̃)/kx − x̃ + kdk → 0,

which along with the preceding condition leads to dk → d. The vector


x̃ + dk ∈ (x̃, xk ) for every k ∈ N such that kxk − x̃k ≥ kdk, which by the
convexity of F implies that x̃ + dk ∈ F . Therefore, x̃ + dk → x̃ + d, which
by the closedness of F leads to x̃ + d ∈ F . As x̃ ∈ F was arbitrarily chosen,
F + d ⊂ F , thereby implying that d ∈ 0+ F .
(iii) Suppose that F is bounded. Consider 0 6= d ∈ 0+ F , which implies that
for every x ∈ F ,

x + λd ∈ F, ∀ λ ≥ 0.

Therefore, as the limit λ → ∞, kx + λdk → ∞, thereby contradicting the


boundedness of F . Hence, 0+ F = {0}.
Conversely, suppose that F is unbounded. Consider x ∈ F and an un-
bounded sequence {xk } ⊂ F . Define
dk = (xk − x)/kxk − xk .

Observe that {dk } is a bounded sequence and thus by the Bolzano–Weierstrass


Theorem, Proposition 1.3, has a convergent subsequence. Without loss of gen-
erality, assume that dk → d and as kdk k = 1, kdk = 1. For any fixed λ ≥ 0,
x + λdk ∈ (x, xk ) for every k ∈ N such that kxk − xk ≥ λ. By the convexity
of F , x + λdk ∈ F . Because x + λdk → x + λd, which by the closedness of F
implies that

x + λd ∈ F, ∀ λ ≥ 0.

Applying (ii) yields that 0 6= d ∈ 0+ F , thereby establishing the result. 


Observe that if the set F is not closed, then the recession cone of F need not
be closed. Also the equivalence in (ii) of the above proposition need not hold.
To verify this claim, we present an example from Rockafellar [97]. Consider
the set

F = {(x, y) ∈ R2 : x > 0, y > 0} ∪ {(0, 0)},

which is not closed. Here the recession cone 0+ F = F and hence is not closed.
Also (1, 0) ∉ 0+ F but (1, 1) + λ(1, 0) ∈ F for every λ ≥ 0, thereby contradicting the equivalence in (ii).
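For polyhedral sets the directions of recession are easy to describe: for a nonempty set F = {x ∈ Rn : Ax ≤ b} one has 0+ F = {d ∈ Rn : Ad ≤ 0} (a standard fact, not proved in the text). The following minimal numerical sketch (not from the text; the data are illustrative assumptions) checks this behavior on a simple polyhedron:

    import numpy as np

    # F = {x in R^2 : x1 + x2 >= 1, x1 >= 0, x2 >= 0}, written as A x <= b
    A = np.array([[-1.0, -1.0], [-1.0, 0.0], [0.0, -1.0]])
    b = np.array([-1.0, 0.0, 0.0])
    in_F = lambda x: np.all(A @ x <= b + 1e-10)

    d_good = np.array([1.0, 2.0])    # A d <= 0, so d is a direction of recession of F
    d_bad = np.array([1.0, -1.0])    # eventually violates x2 >= 0, so it is not

    x0 = np.array([2.0, 3.0])        # a point of F
    assert in_F(x0)
    assert all(in_F(x0 + lam * d_good) for lam in np.linspace(0.0, 100.0, 50))
    assert not all(in_F(x0 + lam * d_bad) for lam in np.linspace(0.0, 100.0, 50))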

2.2.2 Hyperplane and Separation Theorems


An unbounded convex set that plays a pivotal role in the development of
convex optimization is the hyperplane. A hyperplane divides the space into
two half spaces. This property helps in the study of separation theorems, thus
moving us a step ahead in the study of convex analysis.

Definition 2.23 A hyperplane H ⊂ Rn is defined as

H = {x ∈ Rn : ha, xi = b},

where a ∈ Rn with a 6= 0 and b ∈ R. If x̄ ∈ H, then the hyperplane can be


equivalently expressed as

H = {x ∈ Rn : ha, xi = ha, x̄i} = x̄ + {x ∈ Rn : ha, xi = 0}.

Therefore, H is an affine set parallel to {x ∈ Rn : ha, xi = 0}.

Definition 2.24 The hyperplane H divides the space into two half spaces,
either closed or open. The closed half spaces associated with H are

H≤ = {x ∈ Rn : ha, xi ≤ b} and H≥ = {x ∈ Rn : ha, xi ≥ b},

while the open half spaces associated with H are

H< = {x ∈ Rn : ha, xi < b} and H> = {x ∈ Rn : ha, xi > b}.

As already mentioned, the notion of separation is based on the fact that


the hyperplane in Rn divides it into two parts. Before discussing the sepa-
ration theorems, we first present types of separation that we will be using
in our subsequent study of developing the optimality conditions for convex
optimization problems.

Definition 2.25 Consider two convex sets F1 and F2 in Rn . A hyperplane


H ⊂ Rn is said to separate F1 and F2 if

ha, x1 i ≤ b ≤ ha, x2 i, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 .


The separation is said to be strict if

ha, x1 i ≤ b < ha, x2 i, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 .

The separation is proper if

sup_{x1 ∈F1} ha, x1 i ≤ inf_{x2 ∈F2} ha, x2 i and inf_{x1 ∈F1} ha, x1 i < sup_{x2 ∈F2} ha, x2 i.

In particular, if F1 = {x̄} and F2 = F such that x̄ ∈ cl F , a hyperplane that


separates {x̄} and F is called a supporting hyperplane to F at x̄, that is,

ha, x̄i ≤ ha, xi, ∀ x ∈ F.

The next obvious question is when will the separating hyperplane or the
supporting hyperplane exist. In this respect we prove some existence results
below. The proof is from Bertsekas [12].

Theorem 2.26 (i) (Supporting Hyperplane Theorem) Consider a nonempty


convex set F ⊂ Rn and x̄ ∉ ri F . Then there exist a ∈ Rn with a ≠ 0 and b ∈ R such that

ha, x̄i ≤ b ≤ ha, xi, ∀ x ∈ F.

(ii) (Separation Theorem) Consider two nonempty convex sets F1 and F2 in


Rn such that either F1 ∩ F2 = ∅ or F1 ∩ ri F2 = ∅. Then there exists a
hyperplane in Rn separating them.
(iii) (Strict Separation Theorem) Consider two nonempty convex sets F1 and F2 in Rn such that F1 ∩ F2 = ∅. Furthermore, if F1 − F2 is closed or F1 is closed while F2 is compact, then there exists a hyperplane in Rn strictly separating them. In particular, consider a nonempty closed convex set F ⊂ Rn and x̄ ∉ F . Then there exist a ∈ Rn with a ≠ 0 and b ∈ R such that

ha, x̄i < b ≤ ha, xi, ∀ x ∈ F.

(iv) (Proper Separation Theorem) Consider a nonempty convex set F ⊂ Rn


and x̄ ∈ Rn . There exists a hyperplane separating F and x̄ properly if and
only if

x̄ ∉ ri F.

Further, consider two nonempty convex sets F1 and F2 in Rn . Then


ri F1 ∩ ri F2 = ∅ if and only if there exists a hyperplane in Rn separating
the sets properly.

Proof. (i) Consider the closure of F , cl F , which by Proposition 2.14 (iv) is


also convex. Because x̄ ∉ ri F , there exists a sequence {xk } such that xk ∉ cl F


and xk → x̄. Denote the projection of xk on cl F by x̄k . By Proposition 2.52


(see Section 2.3), for every k ∈ N,

hxk − x̄k , x − x̄k i ≤ 0, ∀ x ∈ cl F,

which implies for every k ∈ N,

hx̄k − xk , xi ≥ hx̄k − xk , x̄k i
= hx̄k − xk , x̄k − xk i + hx̄k − xk , xk i
≥ hx̄k − xk , xk i, ∀ x ∈ cl F.

Dividing the above inequality throughout by kx̄k − xk k and denoting


ak = (x̄k − xk )/kx̄k − xk k,

hak , xk i ≤ hak , xi, ∀ x ∈ cl F, ∀ k ∈ N.

As kak k = 1 for every k, {ak } is a bounded sequence. By the Bolzano–


Weierstrass Theorem, Proposition 1.3, {ak } has a convergent subsequence.
Without loss of generality, assume that ak → a, where a 6= 0 with kak = 1.
Taking the limit as k → +∞ in the above inequality yields

ha, x̄i ≤ ha, xi, ∀ x ∈ cl F.

Because F ⊂ cl F , the above inequality holds in particular for x ∈ F , that is,

ha, x̄i ≤ b ≤ ha, xi, ∀ x ∈ F,

where b = inf x∈cl F ha, xi, thereby yielding the desired result. If x̄ ∈ cl F , then
the hyperplane so obtained supports F at x̄.
(ii) Define the set

F = F1 − F2 = {x ∈ Rn : x = x1 − x2 , xi ∈ Fi , i = 1, 2}.

Suppose that either F1 ∩ F2 = ∅ or F1 ∩ ri F2 = ∅. Under both scenarios,


0 ∈
/ ri F . By the Supporting Hyperplane Theorem, that is (i), there exist
a ∈ Rn with a 6= 0 such that

ha, xi ≥ 0, ∀ x ∈ F,

which implies

ha, x1 i ≥ ha, x2 i, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 ,

hence proving the requisite result.


(iii) We shall prove the result under the assumption that F2 −F1 is closed as by
Proposition 2.15, the closedness of F1 along with the compactness of F2 imply


that F2 − F1 is closed. As F1 ∩ F2 = ∅, 0 ∉ F2 − F1 . Suppose that a ∈ F2 − F1


is the projection of origin on F2 − F1 . Therefore, there exist x̄i ∈ Fi , i = 1, 2,
such that a = x̄2 − x̄1 . Define x̄ = (x̄1 + x̄2 )/2. Then the projection of x̄ on cl F1
is x̄1 while that on cl F2 is x̄2 . By Proposition 2.52,

hx̄ − x̄i , xi − x̄i i ≤ 0, ∀ xi ∈ Fi , i = 1, 2,

which implies

ha, x1 i ≤ ha, x̄i − kak2 /2 < ha, x̄i, ∀ x1 ∈ F1 ,
ha, x̄i < ha, x̄i + kak2 /2 ≤ ha, x2 i, ∀ x2 ∈ F2 .
Denoting b = ha, x̄i, the above inequality leads to

ha, x1 i < b < ha, x2 i, ∀ xi ∈ Fi , i = 1, 2,

thus obtaining the strict separation result.


Now consider a closed convex set F ⊂ Rn with x̄ ∉ F . Taking F1 = {x̄} and F2 = F in the strict separation result, there exist a ∈ Rn with a ≠ 0 and b ∈ R such that

ha, x̄i < b < ha, xi, ∀ x ∈ F.

Defining b̄ = inf x∈F ha, xi, the above inequality yields

ha, x̄i < b̄ ≤ ha, xi, ∀ x ∈ F,

as desired.
(iv) Suppose that there exists a hyperplane that separates F and x̄ properly;
that is, there exists a ∈ Rn with a ≠ 0 such that

ha, x̄i ≤ inf_{x∈F} ha, xi and ha, x̄i < sup_{x∈F} ha, xi.

We claim that x̄ ∉ ri F . Suppose on the contrary that x̄ ∈ ri F . Therefore, by the conditions of proper separation, ha, ·i attains its minimum over F at x̄. The assumption that x̄ ∈ ri F then implies that ha, xi = ha, x̄i for every x ∈ F , thereby violating the strict inequality. Hence the supposition was wrong and x̄ ∉ ri F .
Conversely, suppose that x̄ ∉ ri F . Consider the following two cases.

(a) x̄ ∉ af f F : Because af f F is a closed convex subset of Rn , by the Strict Separation Theorem, that is (iii), there exists a ∈ Rn with a ≠ 0 such that

ha, x̄i < ha, xi, ∀ x ∈ af f F.


As F ⊂ af f F , the above inequality holds for every x ∈ F and hence

ha, x̄i ≤ inf_{x∈F} ha, xi and ha, x̄i < sup_{x∈F} ha, xi,

thereby establishing the proper separation between F and x̄.

(b) x̄ ∈ af f F : Consider a subspace C parallel to af f F and define the


orthogonal complement of C as

C ⊥ = {x∗ ∈ Rn : hx∗ , xi = 0, ∀ x ∈ C}.

Define F̃ = F +C ⊥ and thus, by Proposition 2.15 (vi), ri F̃ = ri F +C ⊥ .


We claim that x̄ ∉ ri F̃ . On the contrary, assume that x̄ ∈ ri F̃ , which implies that there exists x ∈ ri F such that x̄ − x ∈ C ⊥ . As x̄, x ∈ af f F , x̄ − x ∈ C. Therefore, kx̄ − xk2 = 0, thereby yielding x̄ = x ∈ ri F , which is a contradiction. Therefore, x̄ ∉ ri F̃ . By the Supporting Hyperplane Theorem, that is (i), there exists a ∈ Rn with a ≠ 0 such that

ha, x̄i ≤ ha, x̃i, ∀ x̃ ∈ F̃ ,

which implies that

ha, x̄i ≤ ha, x + yi, ∀ x ∈ F, ∀ y ∈ C ⊥ .

Suppose that ha, ȳi 6= 0 for some ȳ ∈ C ⊥ . Without loss of generality,


let ha, ȳi > 0. Consider x̃ = x + αȳ. Therefore, as the limit α → −∞,
ha, x̃i → −∞, thereby contradicting the above inequality. Thus,

ha, yi = 0, ∀ y ∈ C ⊥ .

Now by Proposition 2.14, ri F̃ is nonempty and thus ha, xi is not con-


stant over F̃ . Thus, by the above condition on C ⊥ ,

ha, x̄i < sup_{x̃∈F̃} ha, x̃i = sup_{x∈F} ha, xi + sup_{y∈C ⊥} ha, yi = sup_{x∈F} ha, xi,

thereby establishing the proper separation between F and x̄.

Thus, the equivalence between the proper separation of F and x̄ and the fact
that x̄ ∈
/ ri F is proved.
Consider the nonempty convex sets F1 , F2 ⊂ Rn . Define F = F2 − F1 ,
which by Proposition 2.15 (v) and (vi) implies that ri F = ri F1 − ri F2 .


Therefore, ri F1 ∩ ri F2 = ∅ if and only if 0 ∉ ri F . By the proper separation result, 0 ∉ ri F is equivalent to the existence of a ∈ Rn with a ≠ 0 such that

0 ≤ inf_{x∈F} ha, xi and 0 < sup_{x∈F} ha, xi.

By Proposition 1.7,

sup_{x1 ∈F1} ha, x1 i ≤ inf_{x2 ∈F2} ha, x2 i and inf_{x1 ∈F1} ha, x1 i < sup_{x2 ∈F2} ha, x2 i,

thereby completing the proof. 
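The construction behind the Strict Separation Theorem is easy to reproduce numerically whenever the projection onto F is available in closed form. The following minimal sketch (not from the text; the box and the point x̄ are illustrative assumptions) separates a point x̄ ∉ F from a box F using a = p − x̄, where p is the projection of x̄ onto F , and b = ha, pi − kak2 /2:

    import numpy as np

    lo, hi = np.zeros(3), np.ones(3)               # F = [0,1]^3, a closed convex set
    xbar = np.array([2.0, -0.5, 0.5])              # a point outside F

    p = np.clip(xbar, lo, hi)                      # projection of xbar onto the box
    a = p - xbar                                   # normal of the separating hyperplane
    b = a @ p - 0.5 * np.linalg.norm(a) ** 2       # threshold strictly between the two sides

    assert a @ xbar < b                            # <a, xbar> < b
    rng = np.random.default_rng(4)
    for _ in range(1000):
        x = rng.uniform(lo, hi)                    # sampled points of F
        assert a @ x > b                           # b < <a, x> for every sampled x in F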


A consequence of the Separation Theorem is the following characterization
of a closed convex set.

Theorem 2.27 A closed convex set F ⊂ Rn is the intersection of closed half


spaces containing it. Consequently for any F̃ ⊂ Rn , cl co F̃ is the intersection
of all the closed half spaces containing F̃ .

Proof. Without loss of generality, we assume that F ≠ Rn , otherwise the result is trivial. For any x̄ ∉ F , define F1 = {x̄} and F2 = F . Therefore, by Theorem 2.26 (iii), there exist (a, b) ∈ Rn × R with a ≠ 0 such that

ha, x̄i < b ≤ ha, xi, ∀ x ∈ F,

which implies that a closed half space associated with the supporting hyper-
plane contains F and not x̄. Thus the intersection of the closed half spaces
containing F has no points that are not in F .
For any F̃ ⊂ Rn , taking F = cl co F̃ and applying the result for closed
convex set F yields that cl co F̃ is the intersection of all the closed half spaces
containing F̃ . 
Another application of the separation theorem is the famous Helly’s The-
orem. We state the result from Rockafellar [97] without proof.

Proposition 2.28 (Helly’s Theorem) Consider a collection of nonempty


closed convex sets Fi , i ∈ I in Rn , where I is an arbitrary index set. Assume
that the sets Fi have no common direction of recession. If every subcollection
consisting of n + 1 or fewer sets has nonempty intersection, then ∩_{i∈I} Fi is nonempty.

The supporting hyperplanes through the boundary points of a set charac-


terizes the convexity of the set that we present in the result below. The proof
is from Schneider [103].

Proposition 2.29 Consider a closed set F ⊂ Rn such that int F is nonempty


and through each boundary point of F there is a supporting hyperplane. Then
F is convex.


Proof. Suppose that F is not convex, which implies that there exist x, y ∈ F
such that z ∈ [x, y] but z ∈/ F . Because int F is nonempty, there exists some
a ∈ int F such that x, y and a are affinely independent. Also as F is closed,
[a, z) meets the boundary of F , say at b ∈ F . By the given hypothesis, there
is a supporting hyperplane H to F through b with a ∉ H. Therefore, H meets
af f {x, y, a} in a line and hence x, y and a must lie on the same side of the
line, which is a contradiction. Hence, F is a convex set.

2.2.3 Polar Cones


From the previous discussions, we know that closed half spaces are closed
convex sets and by Proposition 2.3 that arbitrary intersection of half spaces
give rise to another closed convex set. One such class is of the polar cones.
Definition 2.30 Consider a set F ⊂ Rn . The cone defined as
F ◦ = {x∗ ∈ Rn : hx∗ , xi ≤ 0, ∀ x ∈ F }
is called the polar cone of F . Observe that the elements of the polar cone
make an obtuse angle with every element of the set. The cone F ◦◦ = (F ◦ )◦ is
called the bipolar cone of the set F .
Thus, the polar of the set F is a closed convex cone irrespective of whether
F is closed convex or not. We present some properties of polar and bipolar
cones.
Proposition 2.31 (i) Consider two sets F1 , F2 ⊂ Rn such that F1 ⊂ F2 .
Then F2◦ ⊂ F1◦ .
(ii) Consider a nonempty set F ⊂ Rn . Then
F ◦ = (cl F )◦ = (co F )◦ = (cone co F )◦ .
(iii) Consider a nonempty set F ⊂ Rn . Then
F ◦◦ = cl cone co F.
If F is a convex cone, F ◦◦ = cl F and in addition if F is closed, F ◦◦ = F .
(iv) Consider two cones Ki ⊂ Rni , i = 1, 2. Then
(K1 × K2 )◦ = K1◦ × K2◦ .

(v) Consider two cones K1 , K2 ⊂ Rn . Then


(K1 + K2 )◦ = K1◦ ∩ K2◦ .
(vi) Consider two closed convex cones K1 , K2 ⊂ Rn . Then
(K1 ∩ K2 )◦ = cl(K1◦ + K2◦ ).
The closure is superfluous under the condition K1 ∩ int K2 6= ∅.


Proof. (i) Suppose that x∗ ∈ F2◦ , which implies that

hx∗ , xi ≤ 0, ∀ x ∈ F2 .

Because F1 ⊂ F2 , the above inequality leads to

hx∗ , xi ≤ 0, ∀ x ∈ F1 ,

thereby showing that F2◦ ⊂ F1◦ .


(ii) As F ⊂ cl F , by (i) (cl F )◦ ⊂ F ◦ . Conversely, suppose that x∗ ∈ F ◦ .
Consider x ∈ cl F , which implies that there exists {xk } ⊂ F such that xk → x.
Because x∗ ∈ F ◦ , by Definition 2.30,

hx∗ , xk i ≤ 0,

which implies that

hx∗ , xi ≤ 0.

Because x ∈ cl F was arbitrary, the above inequality holds for every x ∈ cl F


and hence x∗ ∈ (cl F )◦ , thus yielding F ◦ = (cl F )◦ .
As F ⊂ co F , by (i) (co F )◦ ⊂ F ◦ . Conversely, suppose that x∗ ∈ F ◦ ,
which implies

hx∗ , xi ≤ 0, ∀ x ∈ F.

For any λ ∈ [0, 1],

hx∗ , (1 − λ)x + λyi ≤ 0, ∀ x, y ∈ F,

which implies that

hx∗ , zi ≤ 0, ∀ z ∈ co F.

Therefore, x∗ ∈ (co F )◦ , as desired.


Also, because F ⊂ cone F , again by (i) (cone F )◦ ⊂ F ◦ . Conversely,
suppose that x∗ ∈ F ◦ . For any λ ≥ 0,

hx∗ , λxi ≤ 0, ∀ x ∈ F,

which implies that

hx∗ , zi ≤ 0, ∀ z ∈ cone F.

Therefore, x∗ ∈ (cone F )◦ , thereby yielding the desired result.


(iii) We shall first establish the case when F is a closed convex cone. Suppose
that x ∈ F . By Definition 2.30 of F ◦ ,

hx∗ , xi ≤ 0, ∀ x∗ ∈ F ◦ ,


which implies that x ∈ F ◦◦ . Therefore, F ⊂ F ◦◦ .


Conversely, suppose that x̄ ∈ F ◦◦ . We claim that x̄ ∈ F . On the contrary,
assume that x̄ ∉ F . Because F is closed, by Theorem 2.26 (iii), there exist
a ∈ Rn with a 6= 0 and b ∈ R such that

ha, x̄i < b ≤ ha, xi, ∀ x ∈ F.

As F is a cone, 0 ∈ F , which along with the above inequality implies that


b ≤ 0 and ha, x̄i < 0. We claim that a ∈ −F ◦ . If not, then there exists x̃ ∈ F such that ha, x̃i < 0. Choose λ̃ > 0 large enough that ha, λ̃x̃i < b. Again, as F is a cone, λ̃x̃ ∈ F , thereby contradicting the fact that

b ≤ ha, xi, ∀ x ∈ F.

Therefore, a ∈ −F ◦ , that is, −a ∈ F ◦ . But then, since x̄ ∈ F ◦◦ , we must have h−a, x̄i ≤ 0, that is, ha, x̄i ≥ 0, which contradicts ha, x̄i < 0. Thus we arrive at a contradiction and hence F ◦◦ ⊂ F , thereby leading to the requisite result.
Now from (ii), it is obvious that

F ◦ = (cl cone co F )◦ .

Therefore,

F ◦◦ = (cl cone co F )◦◦ ,

which by the fact that cl cone co F is a closed convex cone yields that

F ◦◦ = cl cone co F,

as desired. If F is a convex cone, from the above condition it is obvious that


F ◦◦ = cl F .
(iv) Suppose that d = (d1 , d2 ) ∈ (K1 × K2 )◦ , which implies that

hd, xi ≤ 0, ∀ x ∈ K1 × K2 .

Therefore, for x = (x1 , x2 ) ∈ K1 × K2 ,

hd1 , x1 i + hd2 , x2 i ≤ 0, ∀ x1 ∈ K1 , ∀ x2 ∈ K2 .

Because K1 and K2 are cones, 0 ∈ Ki , i = 1, 2. In particular, for x2 = 0, the


above inequality reduces to

hd1 , x1 i ≤ 0, ∀ x1 ∈ K1 ,

which implies that d1 ∈ K1◦ . Similarly it can be shown that d2 ∈ K2◦ . Thus,
d ∈ K1◦ × K2◦ , thereby leading to (K1 × K2 )◦ ⊂ K1◦ × K2◦ .
Conversely, suppose that di ∈ Ki◦ , i = 1, 2, which implies

hdi , xi i ≤ 0, ∀ xi ∈ Ki , i = 1, 2.


Therefore,

h(d1 , d2 ), (x1 , x2 )i ≤ 0, ∀ (x1 , x2 ) ∈ K1 × K2 ,

which yields (d1 , d2 ) ∈ (K1 × K2 )◦ , that is, K1◦ × K2◦ ⊂ (K1 × K2 )◦ , thereby
proving the result.
(v) Suppose that x∗ ∈ (K1 + K2 )◦ , which implies that for xi ∈ Ki , i = 1, 2,

hx∗ , x1 + x2 i ≤ 0, ∀ x1 ∈ K1 , ∀ x2 ∈ K2 .

Because K1 and K2 are cones, 0 ∈ Ki , i = 1, 2, which reduces the above


inequality to

hx∗ , xi i ≤ 0, ∀ xi ∈ Ki , i = 1, 2.

Therefore, x∗ ∈ K1◦ ∩ K2◦ , thereby leading to (K1 + K2 )◦ ⊂ K1◦ ∩ K2◦ .


Conversely, suppose that x∗ ∈ K1◦ ∩ K2◦ , which implies that

hx∗ , xi i ≤ 0, ∀ xi ∈ Ki , i = 1, 2.

Thus, for x = x1 + x2 ∈ K1 + K2 , the above inequality leads to

hx∗ , xi ≤ 0, ∀ x ∈ K1 + K2 ,

which implies that x∗ ∈ (K1 + K2 )◦ , thus yielding the desired result.


(vi) Replacing Ki by Ki◦ , i = 1, 2, in (v) along with (iii) leads to

(K1◦ + K2◦ )◦ = K1 ∩ K2 .

Again by (iii), the above condition becomes

cl (K1◦ + K2◦ ) = (K1 ∩ K2 )◦ ,

thereby yielding the requisite result. 


Similar to the polar cone, we have the notion of a positive polar cone.

Definition 2.32 Consider a set F ⊂ Rn . The positive polar cone to the set
F is defined as

F + = {x∗ ∈ Rn : hx∗ , xi ≥ 0, ∀ x ∈ F }.

Observe that F + = (−F )◦ = −F ◦ .

The notion of polarity will play a major role in the study of tangent and
normal cones that are polar to each other. These cones are important in the
development of convex optimization.


2.2.4 Tangent and Normal Cones


In the analysis of a constrained optimization problem, we try to look at the local behavior of the function at neighboring feasible points. To move from one feasible point to another, we need a direction, which leads to the notion of feasible directions.
Definition 2.33 Let F ⊂ Rn and x̄ ∈ F . A vector d ∈ Rn is said to be a
feasible direction of F at x̄ if there exists ᾱ > 0 such that
x̄ + αd ∈ F, ∀ α ∈ [0, ᾱ].
It is easy to observe that the set of all feasible directions forms a cone. For a convex set F , the feasible directions at x̄ are precisely the vectors of the form α(x − x̄) where α ≥ 0 and x ∈ F . However, in case F is nonconvex, the set of feasible directions may reduce to the singleton {0}. For example, consider the nonconvex set F = {(−1, 1), (1, 1)}. At either point of F , the only feasible direction is 0. This
motivates us to introduce the concept of tangent cones that would provide
local information of the set F at a point even when feasible direction is just
zero. The notion of tangent cones may be considered a generalization of the
tangent concept in a smooth scenario to that in a nonsmooth case.
Definition 2.34 Consider a set F ⊂ Rn and x̄ ∈ F . A vector d ∈ Rn is said
to be a tangent to F at x̄ if there exist {dk } ⊂ Rn with dk → d and {tk } ⊂ R+
with tk → 0 such that
x̄ + tk dk ∈ F, ∀ k ∈ N.
Observe that if d is a tangent, then so is λd for λ ≥ 0. Thus, the collection
of all tangents forms a cone, known as the tangent cone, denoted by TF (x̄) and
given by
TF (x̄) = {d ∈ Rn : there exist dk → d, tk ↓ 0 such that
x̄ + tk dk ∈ F, ∀ k ∈ N}.
In the above definition, denote xk = x̄ + tk dk ∈ F . Taking the limit as
k → +∞, tk → 0, and dk → d, which implies that tk dk → 0, thereby leading
to xk → x̄. Also from construction,
(xk − x̄)/tk = dk → d.

Thus, the tangent cone can be equivalently expressed as

TF (x̄) = {d ∈ Rn : there exist {xk } ⊂ F, xk → x̄, tk ↓ 0 such that (xk − x̄)/tk → d}.
Figure 2.4 is a representation of the tangent cone to a convex set F . Next
we present some properties of the tangent cone. The proofs are from Hiriart-
Urruty and Lemaréchal [63].


FIGURE 2.4: Tangent cone.

Theorem 2.35 Consider a set F ⊂ Rn and x̄ ∈ F . Then the following hold:

(i) TF (x̄) is closed.

(ii) If F is convex, TF (x̄) is the closure of the cone generated by F − {x̄},


that is,

TF (x̄) = cl cone(F − x̄)

and hence convex.

Proof. (i) Suppose that {dk } ⊂ TF (x̄) such that dk → d. Because dk ∈ TF (x̄),
there exist {xrk } ⊂ F with xrk → x̄ and {trk } ⊂ R+ with trk → 0 such that

(xrk − x̄)/trk → dk , ∀ k ∈ N.

For a fixed k, one can always find r̄ such that

k (xrk − x̄)/trk − dk k < 1/k, ∀ r ≥ r̄.


Taking the limit as k → +∞, one can generate a sequence {xk } ⊂ F with
xk → x̄ and {tk } ⊂ R+ with tk → 0 such that
(xk − x̄)/tk → d.

Thus, d ∈ TF (x̄), thereby establishing that TF (x̄) is closed.


(ii) Suppose that d ∈ TF (x̄), which implies that there exist {xk } ⊂ F with
xk → x̄ and {tk } ⊂ R+ with tk → 0 such that
(xk − x̄)/tk → d.

Observe that xk − x̄ ∈ F − x̄. As tk > 0, 1/tk > 0, which implies that


(xk − x̄)/tk ∈ cone (F − x̄),

thereby implying that d ∈ cl cone (F − x̄). Hence

TF (x̄) ⊂ cl cone (F − x̄). (2.11)

Conversely, consider an arbitrary but fixed element x ∈ F . Define a se-


quence
xk = x̄ + (1/k)(x − x̄), k ∈ N.
By the convexity of F , it is obvious that {xk } ⊂ F . Taking the limit as
k → +∞, xk → x̄, then by construction

k(xk − x̄) = x − x̄.

Denoting tk = 1/k > 0, tk → 0 such that (xk − x̄)/tk → x − x̄, which implies that
x − x̄ ∈ TF (x̄). Because x ∈ F is arbitrary, F − x̄ ⊂ TF (x̄). As TF (x̄) is a
cone, cone (F − x̄) ⊂ TF (x̄). By (i), TF (x̄) is closed, which implies

cl cone (F − x̄) ⊂ TF (x̄).

The above inclusion along with the reverse inclusion (2.11) yields the desired
equality.
Because F is convex, the set F − x̄ is also convex. Invoking Proposi-
tion 2.14 (iv) implies that TF (x̄) is a convex set. 
We now move on to another conical approximation of a convex set that
is the normal cone that plays a major role in establishing the optimality
conditions.


FIGURE 2.5: Normal cone.

Definition 2.36 Consider a convex set F ⊂ Rn and x̄ ∈ F . A vector d ∈ Rn


is normal to F at x̄ if

hd, x − x̄i ≤ 0, ∀ x ∈ F.

Observe that if d is a normal, then so is λd for λ ≥ 0. The collection of all


normals forms a cone, called the normal cone, which is denoted by NF (x̄).

For a convex set, the relation between the tangent cone and the normal
cone is given by the proposition below.

Proposition 2.37 Consider a convex set F ⊂ Rn . Then TF (x̄) and NF (x̄)


are polar to each other, that is,

NF (x̄) = (TF (x̄))◦ and TF (x̄) = (NF (x̄))◦ .

Proof. Suppose that d ∈ NF (x̄), which implies that

hd, x − x̄i ≤ 0, ∀ x ∈ F.


Observe that for x ∈ F , x − x̄ ∈ F − x̄, which implies that d ∈ (F − x̄)◦ . By


Proposition 2.31 (ii) along with the convexity of F and hence of F − x̄, and
Theorem 2.35 (ii),

d ∈ (cl cone (F − x̄))◦ = (TF (x̄))◦ ,

thereby implying that NF (x̄) ⊂ (TF (x̄))◦ .


Conversely, suppose that d ∈ (TF (x̄))◦ . As F − x̄ ⊂ TF (x̄), by Proposi-
tion 2.31 (i), (TF (x̄))◦ ⊂ (F − x̄)◦ , which implies that

hd, x − x̄i ≤ 0, ∀ x ∈ F,

that is, d ∈ NF (x̄). Therefore, NF (x̄) = (TF (x̄))◦ as desired.


For a convex set F , TF (x̄) is a closed convex cone. Therefore, by Proposi-
tion 2.31 (iii),

(NF (x̄))◦ = (TF (x̄))◦◦ = TF (x̄),

thereby yielding the requisite result. 

Figure 2.5 is a representation of the normal cone to a convex set F . Observe


that the normal cone is polar to the tangent cone in Figure 2.4. Now we present
some simple examples for tangent cones and normal cones.

Example 2.38 (i) For a convex set F ⊂ Rn , it can be easily observed that
TF (x) = Rn for every x ∈ int F and by polarity, NF (x) = {0} for every
x ∈ int F .
(ii) For a closed convex cone K ⊂ Rn , by Theorem 2.35 (ii) it is obvious that
TK (0) = K while by Proposition 2.37, NK (0) = K ◦ . Also, for 0 6= x ∈ K,
from the definition of normal cone,

NK (x) = {d ∈ Rn : d ∈ K ◦ , hd, xi = 0}.

(iii) Consider the closed convex set F ⊂ Rn given by

F = {x ∈ Rn : hai , xi ≤ bi , i = 1, 2, . . . , m}

and define the active index set I(x) = {i ∈ {1, 2, . . . , m} : hai , xi = bi }. The
set F is called a polyhedral set, which we will discuss in the next section. Then

TF (x) = {d ∈ Rn : hai , di ≤ 0, ∀ i ∈ I(x)},


NF (x) = cone co {ai : i ∈ I(x)}.
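As a computational illustration of part (iii), the following Python sketch (ours, with made-up data) forms the active index set I(x) at a vertex of a small polyhedral set and tests directions against the tangent cone formula.

```python
import numpy as np

# Sketch (ours) of Example 2.38 (iii): for F = {x : <a_i, x> <= b_i} and a
# feasible point x, T_F(x) = {d : <a_i, d> <= 0 for all active i}.
A = np.array([[1.0, 0.0],          # x1 <= 1
              [0.0, 1.0],          # x2 <= 1
              [-1.0, -1.0]])       # -x1 - x2 <= 0
b = np.array([1.0, 1.0, 0.0])
x = np.array([1.0, 1.0])           # vertex: the first two constraints are active

active = np.isclose(A @ x, b)      # active index set I(x)

def in_tangent_cone(d, tol=1e-12):
    return bool(np.all(A[active] @ d <= tol))

print(in_tangent_cone(np.array([-1.0, -0.5])))  # True: points back into F
print(in_tangent_cone(np.array([ 1.0,  0.0])))  # False: violates the active a_1
```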

Before moving on to discuss the polyhedral sets, we present some results


on the tangent and normal cones.


Proposition 2.39 (i) Consider two closed convex sets Fi ⊂ Rn , i = 1, 2. Let


x̄ ∈ F1 ∩ F2 . Then

T (x̄; F1 ∩ F2 ) ⊂ T (x̄; F1 ) ∩ T (x̄; F2 ),


N (x̄; F1 ∩ F2 ) ⊃ N (x̄; F1 ) + N (x̄; F2 ).

If in addition, ri F1 ∩ ri F2 6= ∅, the above inclusions hold as equality.


(ii) Consider two closed convex sets Fi ⊂ Rni , i = 1, 2. Let x̄i ∈ Fi , i = 1, 2.
Then

T ((x̄1 , x̄2 ); F1 × F2 ) = T (x̄1 ; F1 ) × T (x̄2 ; F2 ),


N ((x̄1 , x̄2 ); F1 × F2 ) = N (x̄1 ; F1 ) × N (x̄2 ; F2 ).

(iii) Consider two closed convex sets Fi ⊂ Rn , i = 1, 2. Let x̄i ∈ Fi , i = 1, 2.


Then

T (x̄1 + x̄2 ; F1 + F2 ) = cl (T (x̄1 ; F1 ) + T (x̄2 ; F2 )),


N (x̄1 + x̄2 ; F1 + F2 ) = N (x̄1 ; F1 ) ∩ N (x̄2 ; F2 ).

Proof. (i) We first establish the result for the normal cone and then use it
to derive the result for the tangent cone. Suppose that di ∈ NFi (x̄), i = 1, 2,
which implies that

hdi , xi − x̄i ≤ 0, ∀ xi ∈ Fi , i = 1, 2.

For any x ∈ F1 ∩ F2 , the above inequality is still valid for i = 1, 2. Therefore,

hd1 + d2 , x − x̄i ≤ 0, ∀ x ∈ F1 ∩ F2 ,

which implies that d1 + d2 ∈ NF1 ∩F2 (x̄). Because di ∈ NFi (x̄), i = 1, 2, were
arbitrarily chosen, NF1 (x̄) + NF2 (x̄) ⊂ NF1 ∩F2 (x̄).
By Propositions 2.31 (i), (v), and 2.37,

TF1 ∩F2 (x̄) ⊂ (NF1 (x̄) + NF2 (x̄))◦ = TF1 (x̄) ∩ TF2 (x̄),

as desired. We shall prove the equality part as an application of the subdiffer-


ential sum rule, Theorem 2.91.
(ii) Suppose that d = (d1 , d2 ) ∈ NF1 ×F2 (x̄1 , x̄2 ), which implies

h(d1 , d2 ), (x1 , x2 ) − (x̄1 , x̄2 )i ≤ 0, ∀ (x1 , x2 ) ∈ F1 × F2 ,

that is,

hd1 , x1 − x̄1 i + hd2 , x2 − x̄2 i ≤ 0, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 .

The above inequality holds in particular for x2 = x̄2 , thereby reducing it to

hd1 , x1 − x̄1 i ≤ 0, ∀ x1 ∈ F1 ,


which by Definition 2.36 implies that d1 ∈ NF1 (x̄1 ). Similarly, it can be


shown that d2 ∈ NF2 (x̄2 ). Because (d1 , d2 ) ∈ NF1 ×F2 (x̄1 , x̄2 ) was arbitrary,
NF1 ×F2 (x̄1 , x̄2 ) ⊂ NF1 (x̄1 ) × NF2 (x̄2 ).
Conversely, consider d1 ∈ NF1 (x̄1 ) and d2 ∈ NF2 (x̄2 ), which implies that
(d1 , d2 ) ∈ NF1 (x̄1 ) × NF2 (x̄2 ). Therefore,
hdi , xi − x̄i i ≤ 0, ∀ xi ∈ Fi , i = 1, 2,
which leads to
h(d1 , d2 ), (x1 , x2 ) − (x̄1 , x̄2 )i ≤ 0, ∀ (x1 , x2 ) ∈ F1 × F2 ,
thereby yielding that (d1 , d2 ) ∈ NF1 ×F2 (x̄1 , x̄2 ). As di ∈ NFi (x̄i ), i = 1, 2,
were arbitrary, NF1 ×F2 (x̄1 , x̄2 ) ⊃ NF1 (x̄1 ) × NF2 (x̄2 ), thereby leading to the
desired result. The result on the tangent cone can be obtained by applying
Propositions 2.31 (iv) and 2.37.
(iii) Suppose that d ∈ NF1 +F2 (x̄1 + x̄2 ), which leads to
hd, x1 − x̄1 i + hd, x2 − x̄2 i ≤ 0, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 .
In particular, for x2 = x̄2 , the above inequality reduces to
hd, x1 − x̄1 i ≤ 0, ∀ x1 ∈ F1 ,
that is, d ∈ NF1 (x̄1 ). Similarly, d ∈ NF2 (x̄2 ). Because d ∈ NF1 +F2 (x̄1 + x̄2 ) was
arbitrary, NF1 +F2 (x̄1 + x̄2 ) ⊂ NF1 (x̄1 ) ∩ NF2 (x̄2 ).
Conversely, consider d ∈ NF1 (x̄1 ) ∩ NF2 (x̄2 ). Therefore,
hd, xi − x̄i i ≤ 0, ∀ xi ∈ Fi , i = 1, 2,
which implies that
hd, (x1 + x2 ) − (x̄1 + x̄2 )i ≤ 0, ∀ x1 ∈ F1 , ∀ x2 ∈ F2 .
This leads to d ∈ NF1 +F2 (x̄1 + x̄2 ). As d ∈ NF1 (x̄1 ) ∩ NF2 (x̄2 ) was arbitrary,
NF1 (x̄1 ) ∩ NF2 (x̄2 ) ⊂ NF1 +F2 (x̄1 + x̄2 ), thus establishing the desired result.
The result on tangent cone can be obtained by applying Propositions 2.31 (vi)
and 2.37. 

2.2.5 Polyhedral Sets


As discussed in the beginning, finite intersection of closed half spaces generate
a class of convex sets known as the polyhedral sets. Here, we discuss briefly
this class of sets.
Definition 2.40 A set P ⊂ Rn is said to be a polyhedral set if it is nonempty
and is expressed as
P = {x ∈ Rn : hai , xi ≤ bi , i = 1, 2, . . . , m},
where ai ∈ Rn and bi ∈ R for i = 1, 2, . . . , m. Obviously, P is a convex set.


Polyhedral sets play an important role in the study of linear programming


problems. A polyhedral set can also be considered a finite intersection of
closed half spaces and hyperplane. Any hyperplane ha, xi = b can be further
segregated into two half spaces, ha, xi ≤ b and h−a, xi ≤ −b, and thus can be
expressed as the form in the definition. If in the above definition of polyhedral
sets, bi = 0, i = 1, 2, . . . , m, we get the notion of polyhedral cones.

Definition 2.41 A polyhedral set P is a polyhedral cone if and only if it can


be expressed as the intersection of finite collection of closed half spaces whose
supporting hyperplane pass through the origin. Equivalently, the polyhedral
cone P is given by

P = {x ∈ Rn : hai , xi ≤ 0, i = 1, 2, . . . , m},

where ai ∈ Rn for i = 1, 2, . . . , m.

Next we state some operations on the polyhedral sets and cones. For proofs,
the readers are advised to refer to Rockafellar [97].

Proposition 2.42 (i) Consider a polyhedral set (cone) P ⊂ Rn , a polyhedral set (cone) Q ⊂ Rm , and a linear transformation A : Rn → Rm . Then A(P ) as well as A−1 (Q) are polyhedral sets (cones).
(ii) Consider polyhedral sets (cones) Fi ⊂ Rni , i = 1, 2, . . . , m. Then the
Cartesian product F1 × F2 × . . . × Fm is a polyhedral set (cone).
(iii) Consider polyhedral sets (cones) Fi ⊂ Rn , i = 1, 2, . . . , m. Then the intersection ∩_{i=1}^{m} Fi and the sum Σ_{i=1}^{m} Fi are also polyhedral sets (cones).

With the notion of polyhedral sets, another concept that comes into the
picture is that of a finitely generated set.

Definition 2.43 A set F ⊂ Rn is a finitely generated set if and only if there


exist xi ∈ Rn , i = 1, 2, . . . , m, such that for a fixed integer j, 0 ≤ j ≤ m, F is
given by
F = {x ∈ Rn : x = Σ_{i=1}^{j} λi xi + Σ_{i=j+1}^{m} λi xi , λi ≥ 0, i = 1, 2, . . . , m, Σ_{i=1}^{j} λi = 1},

where {x1 , x2 , . . . , xm } are the generators of the set. For a finitely generated
cone, it is the same set with j = 0 and then {x1 , x2 , . . . , xm } are the generators
of the cone.


Below we mention some characterizations and properties of polyhedral sets


and finitely generated cones. The results are stated without proofs. For more
details on polyhedral sets, one can refer to Bertsekas [12], Rockafellar [97],
and Wets [111].

Proposition 2.44 (i) A set (cone) is polyhedral if and only if it is finitely


generated.
(ii) The polar of a polyhedral convex set is polyhedral.
(iii) Let x1 , x2 , . . . , xm ∈ Rn . Then the finitely generated cone

F = cone co{x1 , x2 , . . . , xm }

is closed and its polar cone is a polyhedral cone given by

F ◦ = {x ∈ Rn : hxi , xi ≤ 0, i = 1, 2, . . . , m}.

With all these background on convex sets, we move on to the study of


convex functions.

2.3 Convex Functions


We devote this section to the study of convex functions and their properties.
We also look into some special class of convex functions, namely the sublinear
functions. We begin by formally defining the convex functions. But before
that, let us recall some notions.

Definition 2.45 Consider a function φ : Rn → R̄. The domain of the func-


tion φ is defined as

dom φ = {x ∈ Rn : φ(x) < +∞}.

The epigraph of the function φ is given by

epi φ = {(x, α) ∈ Rn × R : φ(x) ≤ α}.

Observe that the notion of epigraph involves the domain points only. The function is proper if φ(x) > −∞ for every x ∈ Rn and dom φ is nonempty. A function that is not proper is said to be improper; in particular, φ is improper if there exists x̂ ∈ Rn such that φ(x̂) = −∞ or if φ ≡ +∞.

Definition 2.46 A function φ : Rn → R̄ is said to be convex if for any


x, y ∈ Rn and λ ∈ [0, 1] we have

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y).


FIGURE 2.6: Graph and epigraph of convex function.

If φ is a convex function, then the function ψ : Rn → R̄ defined as ψ = −φ is said to be a concave function.

Definition 2.47 A function φ : Rn → R̄ is said to be strictly convex if for


distinct x, y ∈ Rn and λ ∈ (0, 1) we have

φ((1 − λ)x + λy) < (1 − λ)φ(x) + λφ(y).

The proposition given below is an equivalent characterization of a convex


set in terms of its epigraph mentioned in Chapter 1.

Proposition 2.48 Consider a proper function φ : Rn → R̄. φ is convex if


and only if epi φ is a convex set on Rn × R.

Proof. Suppose φ is convex. Consider (xi , αi ) ∈ epi φ, i = 1, 2, which implies


that φ(xi ) ≤ αi , i = 1, 2. This along with the convexity of φ yields that for
every λ ∈ [0, 1],

φ((1 − λ)x1 + λx2 ) ≤ (1 − λ)φ(x1 ) + λφ(x2 ) ≤ (1 − λ)α1 + λα2 .

Thus, ((1 − λ)x1 + λx2 , (1 − λ)α1 + λα2 ) ∈ epi φ. Because (xi , αi ), i = 1, 2, were arbitrary, this leads to the convexity of epi φ in Rn × R.
Conversely, suppose that epi φ is convex. Consider x1 , x2 ∈ dom φ. It is
obvious that (xi , φ(xi )) ∈ epi φ, i = 1, 2. By the convexity of epi φ, for every
λ ∈ [0, 1],

(1 − λ)(x1 , φ(x1 )) + λ(x2 , φ(x2 )) ∈ epi φ,

which implies that

φ((1 − λ)x1 + λx2 ) ≤ (1 − λ)φ(x1 ) + λφ(x2 ), ∀ λ ∈ [0, 1],

thereby implying the convexity of φ, and thus establishing the result. 


Figure 2.6 represents the graph and epigraph of a convex function. Observe
that the epigraph is a convex set. Another alternate characterization of the
convex function is in terms of the strict epigraph set. So next we state the
notion of strict epigraph and present the equivalent characterization.

Definition 2.49 The strict epigraph of the function φ is given by

epis φ = {(x, α) ∈ Rn × R : φ(x) < α}.

Proposition 2.50 Consider a proper function φ : Rn → R̄. φ is convex if


and only if epis φ is a convex set on Rn × R.

Proof. The necessary part, that is, the convexity of φ implies that epi s φ is
convex, can be worked along the lines of proof of Proposition 2.48.
Conversely, suppose that epis φ is convex. Consider x1 , x2 ∈ dom φ and
αi ∈ R, i = 1, 2 such that φ(xi ) < αi , i = 1, 2. Therefore, (xi , αi ) ∈ epis φ,
i = 1, 2. By the convexity of epis φ, for every λ ∈ [0, 1],

(1 − λ)(x1 , α1 ) + λ(x2 , α2 ) ∈ epis φ,

which implies that

φ((1 − λ)x1 + λx2 ) < (1 − λ)α1 + λα2 , ∀ λ ∈ [0, 1].

As the above inequality holds for every αi > φ(xi ), i = 1, 2, taking the limit
as αi → φ(xi ), i = 1, 2, the above condition becomes

φ((1 − λ)x1 + λx2 ) ≤ (1 − λ)φ(x1 ) + λφ(x2 ), ∀ λ ∈ [0, 1].

Because x1 and x2 were arbitrarily chosen, the above inequality leads to the
convexity of φ and hence the result. 
The definitions presented above are for extended-valued functions. These
definitions can also be given for a function to be convex over a convex set
F ⊂ Rn as φ is convex when dom φ is restricted to F . However, in this book,
we will be considering real-valued functions unless otherwise specified.
Next we state Jensen’s inequality for the proper convex functions. The
proof can be worked out using the induction and the readers are advised to
do so.

Theorem 2.51 (Jensen’s Inequality) Consider a proper function φ : Rn → R̄. Let xi ∈ dom φ and λi ≥ 0 for i = 1, 2, . . . , m with Σ_{i=1}^{m} λi = 1. Then φ is convex if and only if

φ( Σ_{i=1}^{m} λi xi ) ≤ Σ_{i=1}^{m} λi φ(xi )

for every such collection of xi and λi , i = 1, 2, . . . , m.
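A quick numerical check of the inequality (a Python sketch of ours, using the convex function φ(x) = kxk2 and randomly generated data) is given below.

```python
import numpy as np

# Numerical check (ours) of Jensen's inequality for the convex function
# phi(x) = ||x||^2: phi of a convex combination never exceeds the same
# convex combination of the function values.
rng = np.random.default_rng(1)
phi = lambda x: float(x @ x)

xs = rng.normal(size=(5, 3))             # five points x_i in R^3
lam = rng.random(5)
lam /= lam.sum()                         # weights lambda_i >= 0 summing to 1

lhs = phi(lam @ xs)                      # phi(sum_i lambda_i x_i)
rhs = sum(l * phi(x) for l, x in zip(lam, xs))   # sum_i lambda_i phi(x_i)
print(lhs <= rhs + 1e-12)                # True
```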


Consider a set F ⊂ Rn . The indicator function, δF : Rn → R̄, to the set F


is defined as

δF (x) = 0 if x ∈ F , and δF (x) = +∞ otherwise.

It can be easily shown that δF is lsc and convex if and only if F is closed and
convex, respectively. Also, for sets F1 , F2 ⊂ Rn ,

δF1 ∩F2 (x) = δF1 (x) + δF2 (x).

The indicator function plays an important role in the study of optimality


conditions by converting a constrained problem into an unconstrained one.
Consider a constrained programming problem

min f (x) subject to x ∈ C,

where f : Rn → R and C ⊂ Rn . Then the associated unconstrained problem


is

min f0 (x) subject to x ∈ Rn ,

where f0 : Rn → R̄ is a function given by f0 (x) = f (x) + δC (x), that is,



f0 (x) = f (x) if x ∈ C, and f0 (x) = +∞ otherwise.

We will look into this aspect more when we study the derivations of optimality
condition for the convex programming problem (CP ) presented in Chapter 1
in the subsequent chapters.
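The reformulation can be illustrated with a small Python sketch (ours, with made-up data): a grid search on f0 = f + δC recovers the constrained minimizer of f over C.

```python
import numpy as np

# Sketch (ours): minimizing f over C is the same as minimizing f0 = f + delta_C
# over all of R.  Here f(x) = (x - 2)^2 and C = [-1, 1], so the constrained
# minimizer is x = 1, which the grid search below recovers approximately.
f = lambda x: (x - 2.0) ** 2
delta_C = lambda x: 0.0 if -1.0 <= x <= 1.0 else np.inf
f0 = lambda x: f(x) + delta_C(x)

grid = np.linspace(-3.0, 3.0, 6001)
values = np.array([f0(x) for x in grid])
print(grid[np.argmin(values)])   # approximately 1.0
```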
For a set F ⊂ Rn , the distance function, dF : Rn → R, to F from a point
x̄ is defined as

dF (x̄) = inf{kx − x̄k : x ∈ F }.

For a convex set F , the distance function dF is a convex function. If the


infimum is attained, say at x̃ ∈ F , that is,

inf{kx − x̄k : x ∈ F } = kx̃ − x̄k,

then x̃ is said to be a projection of x̄ to F and denoted by projF (x̄). Below


we present an important result on projection.

Proposition 2.52 Consider a closed convex set F ⊂ Rn and x̄ ∈ Rn . Then


x̃ ∈ projF (x̄) if and only if

hx̄ − x̃, x − x̃i ≤ 0, ∀ x ∈ F. (2.12)


Proof. Suppose that the inequality (2.12) holds for x̃ ∈ F and x̄ ∈ Rn . For
any x ∈ F , consider
kx − x̄k2 = kx − x̃k2 + kx̃ − x̄k2 − 2hx̄ − x̃, x − x̃i
≥ kx̃ − x̄k2 − 2hx̄ − x̃, x − x̃i, ∀ x ∈ F.
Because (2.12) is assumed to hold, the above condition leads to
kx − x̄k2 ≥ kx̃ − x̄k2 , ∀ x ∈ F,
thereby implying that x̃ ∈ projF (x̄).
Conversely, suppose that x̃ ∈ projF (x̄). Consider any x ∈ F and for
α ∈ [0, 1], define
xα = (1 − α)x̃ + αx ∈ F.
Therefore,
kx̄ − xα k2 = k(1 − α)(x̄ − x̃) + α(x̄ − x)k2
= (1 − α)2 kx̄ − x̃k2 + α2 kx̄ − xk2 + 2α(1 − α)hx̄ − x̃, x̄ − xi.
Observe that as a function of α, kx̄ − xα k2 has a point of minimizer over [0, 1]
at α = 0. Thus,
∇α {kx̄ − xα k2 } |α=0 ≥ 0,
which implies

2(−kx̄ − x̃k2 + hx̄ − x, x̄ − x̃i) ≥ 0.
The above inequality leads to
−hx̄ − x̃, x̄ − x̃i + hx̄ − x, x̄ − x̃i = hx̄ − x̃, x̃ − xi ≥ 0, ∀ x ∈ F,
thereby yielding (2.12) and hence completing the proof. 
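The characterization (2.12) is also easy to verify numerically. The following Python sketch (ours) uses the well-known closed form of the projection onto the closed unit ball, projF (x̄) = x̄ / max{1, kx̄k}, and checks the inequality on sample points of F.

```python
import numpy as np

# Sketch (ours) verifying the characterization (2.12) for the closed unit ball
# F = {x : ||x|| <= 1}, using the closed-form projection x_bar / max(1, ||x_bar||).
rng = np.random.default_rng(2)
x_bar = np.array([3.0, -4.0])
x_tilde = x_bar / max(1.0, np.linalg.norm(x_bar))    # projection of x_bar onto F

# sample points of F and check <x_bar - x_tilde, x - x_tilde> <= 0 on all of them
pts = rng.normal(size=(1000, 2))
pts /= np.maximum(1.0, np.linalg.norm(pts, axis=1, keepdims=True))
lhs = (pts - x_tilde) @ (x_bar - x_tilde)
print(bool(np.all(lhs <= 1e-10)))        # True
```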
Another class of functions that is also convex in nature are the sublinear
functions and support functions. These classes of functions will be discussed
in the next subsection. But before that we present some operations on the
convex functions that again belong to the class of convex functions itself.
Proposition 2.53 (i) Consider proper convex functions φi : Rn → R̄ and αi ≥ 0, i = 1, 2, . . . , m. Then φ = Σ_{i=1}^{m} αi φi is also a convex function.
(ii) Consider a proper convex function φ : Rn → R̄ and a nondecreasing
proper convex function ψ : R → R̄. Then the composition function defined as
(ψ ◦ φ)(x) = ψ(φ(x)) is a convex function provided ψ(+∞) = +∞.
(iii) Consider a family of proper convex functions φi : Rn → R̄, i ∈ I, where
I is an arbitrary index set. Then φ = supi∈I φi is a convex function.
(iv) Consider a convex set F ⊂ Rn+1 . Then φ(x) = inf{α ∈ R : (x, α) ∈ F }
is convex.


Proof. (i) By Definition 2.46 of convexity, for any x, y ∈ Rn and any λ ∈ [0, 1],

φi ((1 − λ)x + λy) ≤ (1 − λ)φi (x) + λφi (y), i = 1, 2, . . . , m.

As αi ≥ 0, i = 1, 2, . . . , m, multiplying the above inequality by αi and adding


them leads to
Σ_{i=1}^{m} αi φi ((1 − λ)x + λy) ≤ (1 − λ) Σ_{i=1}^{m} αi φi (x) + λ Σ_{i=1}^{m} αi φi (y),

thereby yielding the convexity of Σ_{i=1}^{m} αi φi .
(ii) By the convexity of φ, for every x, y ∈ Rn and for every λ ∈ [0, 1],

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y).

Because ψ is nondecreasing convex function, for every x, y ∈ Rn ,

ψ(φ((1 − λ)x + λy)) ≤ ψ((1 − λ)φ(x) + λφ(y))


≤ (1 − λ)ψ(φ(x)) + λψ(φ(y)), ∀ λ ∈ [0, 1].

Thus, (ψ ◦ φ) is a convex function.


(iii) Observe that epi φ = ∩_{i∈I} epi φi , which on applying Proposition 2.3 (i)
leads to the convexity of epi φ. Now invoking Proposition 2.48 yields the
convexity of φ.
(iv) Consider x1 , x2 ∈ dom φ and any arbitrary ε > 0. By the definition of φ as an infimum, there exist α1 , α2 ∈ R with (xi , αi ) ∈ F , i = 1, 2, such that

αi ≤ φ(xi ) + ε, i = 1, 2.

By the convexity of F , for any λ ∈ [0, 1],

φ((1 − λ)x1 + λx2 ) ≤ (1 − λ)α1 + λα2 ≤ (1 − λ)φ(x1 ) + λφ(x2 ) + ε.

Because the above condition holds for every ε > 0, taking the limit as ε → 0,
the above condition reduces to

φ((1 − λ)x1 + λx2 ) ≤ (1 − λ)φ(x1 ) + λφ(x2 ), ∀ λ ∈ [0, 1],

thereby leading to the convexity of φ. 


The proof of (iv) is from Hiriart-Urruty and Lemaréchal [63]. These prop-
erties play an important role in convex analysis. From the earlier discussions
we have that a constrained problem can be equivalently expressed as an uncon-
strained problem using the indicator function. Under the convexity assump-
tions as in the convex programming problem (CP ) and using (i) of the above
proposition, one has that f0 (x) = (f + δC )(x) is a convex function, thereby
reducing (CP ) to an unconstrained convex programming problem that then


leads to the KKT optimality conditions under some assumptions, as we shall


see in Chapter 3.
The property (ii) of Proposition 2.53 leads to the formulation of conjugate
functions. We will discuss this class of functions later in this chapter as it will
also play a pivotal role in the study of convex optimization theory.
Next we define infimal convolution or simply inf-convolution on convex
functions. The motivation for this operation comes from the sum of epigraph
and the infimum operation as in (iv) of the above proposition. Consider two
proper convex functions φi : Rn → R̄, i = 1, 2. Then by Proposition 2.3 (ii),
the set F = epi φ1 +epi φ2 is a convex set in Rn ×R. Explicitly, F is expressed
as

F = {(x1 + x2 , α1 + α2 ) ∈ Rn × R : (xi , αi ) ∈ epi φi , i = 1, 2}.

Then by (iv) of Proposition 2.53, the function

φ(x) = inf{α1 + α2 : (x1 + x2 , α1 + α2 ) ∈ F, x1 + x2 = x}

is a convex function. This function φ can be reduced to the form known as


the inf-convolution of φ1 and φ2 as defined below.

Definition 2.54 Consider proper convex functions φi : Rn → R̄,


i = 1, 2. Then the infimal convolution or inf-convolution of φ1 and φ2
is denoted by φ1  φ2 : Rn → R̄ and defined as

(φ1  φ2 )(x̄) = inf{φ1 (x1 ) + φ2 (x2 ) : xi ∈ Rn , i = 1, 2, x1 + x2 = x̄}


= inf{φ1 (x) + φ2 (x̄ − x) : x ∈ Rn }.

A simple consequence for the inf-convolution is the distance function. Con-


sider a convex set F ⊂ Rn . Then the distance function φ(x) = dF (x) can be
expressed as

φ(x) = (φ1  φ2 )(x),

where φ1 (x) = kxk and φ2 (x) = δF (x).
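This identity is easy to test numerically. The following Python sketch (ours) approximates the inf-convolution for F = [0, 1] by minimizing over a grid of F and recovers the usual distances.

```python
import numpy as np

# Sketch (ours): the distance function to F = [0, 1] realized as the
# inf-convolution of ||.|| and delta_F, i.e. d_F(x_bar) = inf{|x_bar - u| : u in F},
# approximated here by minimizing over a fine grid of F.
F_grid = np.linspace(0.0, 1.0, 1001)

def d_F(x_bar):
    return float(np.min(np.abs(x_bar - F_grid)))

for x_bar in (-0.5, 0.3, 2.0):
    print(x_bar, d_F(x_bar))   # distances 0.5, 0.0, 1.0 to the interval [0, 1]
```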


As it turns out, the inf-convolution of convex functions is again convex.
To verify this claim, we will need the following result on strict epigraph sum.
The proof appears in Moreau [89] but here we present its proof and that of
the proposition to follow from Attouch, Buttazzo, and Michaille [3].

Proposition 2.55 Consider two proper convex functions φi : Rn → R̄,


i = 1, 2. Then
epis (φ1  φ2 ) = epis φ1 + epis φ2 . (2.13)
Consequently,
cl epi (φ1  φ2 ) = cl (epi φ1 + epi φ2 ). (2.14)


Proof. Consider (x, α) ∈ epis (φ1  φ2 ), which implies

(φ1  φ2 )(x) < α.

The above inequality holds if and only if there exist x1 , x2 ∈ Rn with


x1 + x2 = x such that

φ1 (x1 ) + φ2 (x2 ) < α.

This is equivalent to the existence of x1 , x2 ∈ Rn and α1 , α2 ∈ R with


x1 + x2 = x and α1 + α2 = α such that φi (xi ) < αi , i = 1, 2, thereby es-
tablishing (2.13).
By Definition 2.49 of strict epigraph, it is obvious that for any function φ,

epis φ ⊂ epi φ ⊂ cl epis φ, so that cl epis φ = cl epi φ,

which along with the strict epigraph condition (2.13) implies that

cl epi (φ1  φ2 ) = cl epis (φ1  φ2 ) = cl (epis φ1 + epis φ2 ) ⊂ cl (epi φ1 + epi φ2 ). (2.15)

Now suppose that (xi , αi ) ∈ epi φi , i = 1, 2, which along with the definition
of inf-convolution implies that

(φ1  φ2 )(x1 + x2 ) ≤ φ1 (x1 ) + φ2 (x2 ) ≤ α1 + α2 .

Therefore, (x1 + x2 , α1 + α2 ) ∈ epi (φ1  φ2 ). Because (xi , αi ) ∈ epi φi ,


i = 1, 2, were arbitrary,

epi φ1 + epi φ2 ⊂ epi (φ1  φ2 ).

Taking closure on both sides of the above relation along with the condition
(2.15) yields the condition (2.14), as desired. 
Using this proposition, we now move on to show that the inf-convolution
of proper convex functions is also convex.

Proposition 2.56 Consider two proper convex functions φ1 , φ2 : Rn → R̄.


Then φ1  φ2 is also convex.

Proof. From Proposition 2.55,

epis (φ1  φ2 ) = epis φ1 + epis φ2 .

As φ1 and φ2 are convex functions, by Proposition 2.50, epis φ1 and epis φ2 are
convex sets. This along with the above condition implies that epis (φ1  φ2 ) is convex, which again by the characterization of convex functions, Proposition 2.50, leads to the convexity of φ1  φ2 .


An application of inf-convolution can be seen in the following property of


indicator function. For convex sets F1 and F2 in Rn , the indicator function of
the sum of the sets is

δF1 +F2 = δF1  δF2 .

The importance of inf-convolution will be discussed in the study of conjugate


functions later in this chapter. For more on inf-convolution, the readers may
refer to Strömberg [107].
As discussed, the inf-convolution is motivated by taking F = epi φ1 + epi φ2 ; similarly, the notion of the convex hull of a function is motivated by taking F = co epi φ. Below we define this concept.

Definition 2.57 The convex hull of a nonconvex function φ is denoted as


co φ and is obtained from Proposition 2.53 (iv) with F = co epi φ. Therefore,
by Theorem 2.7, (x, α) ∈ F if and only if there exist (xi , αi ) ∈ epi φ, λi ≥ 0, i = 1, 2, . . . , m, with Σ_{i=1}^{m} λi = 1 such that

(x, α) = λ1 (x1 , α1 ) + λ2 (x2 , α2 ) + . . . + λm (xm , αm )


= (λ1 x1 + λ2 x2 + . . . + λm xm , λ1 α1 + λ2 α2 + . . . + λm αm ).

Because φ(xi ) ≤ αi , i = 1, 2, . . . , m, Proposition 2.53 (iv) leads to

co φ(x) = inf{λ1 φ(x1 ) + λ2 φ(x2 ) + . . . + λm φ(xm ) ∈ R : λ1 x1 + λ2 x2 + . . . + λm xm = x, λi ≥ 0, i = 1, 2, . . . , m, Σ_{i=1}^{m} λi = 1}.

It is the greatest convex function majorized by φ. If φ is convex, co φ = φ. The convex hull of an arbitrary collection of functions {φi : i ∈ I} is denoted by co ∪_{i∈I} φi and is the convex hull of the pointwise infimum of the collection, that is,

co ∪_{i∈I} φi = co ( inf_{i∈I} φi ).

It is a function obtained from Proposition 2.53 (iv) by taking


F = co ( ∪_{i∈I} epi φi ).

It is the greatest convex function majorized by every φi , i ∈ I.


The closed convex hull of a function φ is denoted by cl co φ and defined as

cl co φ(x′ ) = sup{hξ, x′ i − α : hξ, xi − α ≤ φ(x), ∀ x ∈ Rn }.

Similar to closure of a function, cl co φ satisfies the condition

epi cl co φ = cl co epi φ.


For more details on the convex hull and the closed convex hull of a function,
readers are advised to refer to Hiriart-Urruty and Lemaréchal [62, 63].
Now before moving on with the properties of convex functions, we briefly
discuss an important class of convex functions, namely sublinear and support
functions, which as we will see later in the chapter are important in the study
of convex analysis.

2.3.1 Sublinear and Support Functions


Definition 2.58 A proper function p : Rn → R̄ is said to be a sublinear
function if and only if p is subadditive and positively homogeneous, that is,
p(x1 + x2 ) ≤ p(x1 ) + p(x2 ), ∀ x1 , x2 ∈ Rn (subadditive property)
p(λx) = λp(x), ∀ x ∈ Rn , ∀ λ > 0 (positively homogeneous property)

From the positive homogeneity property, p(0) = λp(0) for every λ > 0, which
is satisfied for p(0) = 0 as well as p(0) = +∞. Most sublinear functions satisfy
p(0) = 0. As p is proper, dom p is nonempty. So if p(x) < +∞, then by the
positive homogeneity property, p(tx) < +∞, which implies that dom p is a
cone. Observe that as p is positively homogeneous, for x, y ∈ Rn and any
λ ∈ (0, 1),

p((1 − λ)x) = (1 − λ)p(x) and p(λy) = λp(y).

By the subadditive property of p,

p((1 − λ)x + λy) ≤ p((1 − λ)x) + p(λy)


= (1 − λ)p(x) + λp(y), ∀ λ ∈ (0, 1).

The inequality holds as equality for λ = 0 and λ = 1. Because x, y ∈ Rn were


arbitrary, p is convex. Therefore, a sublinear function is a particular class of
convex functions and hence dom p is convex. Next we present a proposition
that gives the geometric characterization of sublinear functions. For the proof,
we will also need the equivalent form of positive homogeneity from Hiriart-
Urruty and Lemaréchal [63] according to which

p(λx) ≤ λp(x), ∀ x ∈ Rn , ∀ λ > 0.

Note that if p is positively homogeneous, then the above condition holds triv-
ially. Conversely, if the above inequality holds, then for any λ > 0,
p(x) = p(λ−1 λx) ≤ (1/λ) p(λx), ∀ x ∈ Rn ,
which along with the preceding inequality yields that p is positively homoge-
neous.

Theorem 2.59 Consider a proper function p : Rn → R̄. p is a sublinear


function if and only if its epigraph, epi p, is a convex cone in Rn × R.


Proof. Suppose that p is sublinear. From the above discussion, p is a convex


function as well and thus epi p is convex. Consider (x, α) ∈ epi p, which
implies that p(x) ≤ α. By the positively homogeneous property

p(λx) = λp(x) ≤ λα, λ > 0,

which implies that λ(x, α) = (λx, λα) ∈ epi p for every λ > 0. Also,
(0, 0) ∈ epi p. Thus, epi p is a cone.
Conversely, suppose that epi p is a convex cone. By Theorem 2.20, for any
(xi , αi ) ∈ epi p, i = 1, 2,

(x1 + x2 , α1 + α2 ) ∈ epi p.

In particular for αi = p(xi ), i = 1, 2, the above condition leads to

p(x1 + x2 ) ≤ p(x1 ) + p(x2 ).

Because x1 , x2 ∈ Rn are arbitrarily chosen, the above inequality implies that


p is subadditive. Also, as epi p is a cone, any (x, α) ∈ epi p implies that
λ(x, α) ∈ epi p for every λ > 0. In particular, for α = p(x),

p(λx) ≤ λp(x), ∀ λ > 0,

which is an equivalent definition for positive homogeneity, as discussed before.


Hence, p is a sublinear function. 
Sublinear functions are a particular class of convex functions. For a convex
cone K ⊂ Rn , the indicator function δK and the distance function dK are
also sublinear functions. An important class of sublinear functions is that of
support functions. We will discuss the support functions in brief.

Definition 2.60 Consider a set F ⊂ Rn . The support function, σF : Rn → R̄,


to F at x̄ ∈ Rn is defined as

σF (x̄) = sup{hx̄, xi : x ∈ F }.

From Proposition 1.7 (ii) and (iii), it is obvious that a support function
is sublinear. As it is the supremum of linear functions that are continuous,
support functions are lsc. For a closed convex cone K ⊂ Rn ,

σK (x̄) = 0 if hx̄, xi ≤ 0 for all x ∈ K, and σK (x̄) = +∞ otherwise,
which is nothing but the indicator function of the polar cone K ◦ . Equivalently,

σ K = δK ◦ and δK = σ K ◦ .
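For a finite set F , the supremum in Definition 2.60 is a maximum over the points of F (and, by Proposition 2.61 (ii) below, this also gives the support function of co F ). The following Python sketch (ours, with made-up data) computes such a support function and checks its sublinearity numerically.

```python
import numpy as np

# Sketch (ours): support function of a finite set F, computed as a maximum
# over its points; subadditivity and positive homogeneity are then checked.
F = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])

def sigma(x_bar):
    return float(np.max(F @ x_bar))

x1, x2 = np.array([2.0, 1.0]), np.array([-1.0, 3.0])
print(sigma(x1 + x2) <= sigma(x1) + sigma(x2))        # True: subadditivity
print(np.isclose(sigma(5.0 * x1), 5.0 * sigma(x1)))   # True: positive homogeneity
```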

Next we present some properties of the support functions, the proofs of


which are from Burke and Deng [22], Hiriart-Urruty and Lemaréchal [63], and
Rockafellar [97].


Proposition 2.61 (i) Consider two convex sets F1 and F2 in Rn . Then

F1 ⊂ F2 =⇒ σF1 (x) ≤ σF2 (x), ∀ x ∈ Rn .

(ii) For a set F ⊂ Rn , one has

σF = σcl F = σco F = σcl co F .

(iii) Consider a convex set F ⊂ Rn . Then x̄ ∈ cl F if and only if

hx∗ , x̄i ≤ σF (x∗ ), ∀ x∗ ∈ Rn .

(iv) For convex sets F1 , F2 ⊂ Rn , cl F1 ⊂ cl F2 if and only if

σF1 (x∗ ) ≤ σF2 (x∗ ), ∀ x∗ ∈ Rn .

(v) Let F1 , F2 ⊂ Rn be convex sets and K ⊂ Rn be a closed convex cone. Then

σF1 (x) ≤ σF2 (x), ∀ x ∈ K ⇐⇒ σF1 (x) ≤ σF2 +K ◦ (x), ∀ x ∈ Rn


⇐⇒ F1 ⊂ cl(F2 + K ◦ ).

(vi) The support function of a set F ⊂ Rn is finite everywhere if and only if


F is bounded.

Proof. (i) By Proposition 1.7 (i), it is obvious that for F1 ⊂ F2 ,

sup{hx, x1 i : x1 ∈ F1 } ≤ sup{hx, x2 i : x2 ∈ F2 }, ∀ x ∈ Rn ,

thereby leading to the desired result.


(ii) As hx, .i is linear and hence continuous over Rn , then on taking supremum
over F ,

σF (x) = σcl F (x), ∀ x ∈ Rn .

Because F ⊂ co F , by (i),

σF (x) ≤ σco F (x), ∀ x ∈ Rn .

Also, for any x′ ∈ co F , by the Carathéodory Theorem, Theorem 2.8, there exist x′i ∈ F , λi ≥ 0, i = 1, 2, . . . , n + 1, satisfying Σ_{i=1}^{n+1} λi = 1 such that x′ = Σ_{i=1}^{n+1} λi x′i . Therefore,

hx, x′ i = Σ_{i=1}^{n+1} λi hx, x′i i ≤ Σ_{i=1}^{n+1} λi σF (x) = σF (x).

Because x′ ∈ co F was arbitrary, the above inequality holds for every x′ ∈ co F


and hence

σco F (x) ≤ σF (x), ∀ x ∈ Rn ,


thus yielding the equality as desired. These relations also imply that
σF = σcl co F .
(iii) Invoking Theorem 2.27, the desired result holds.
(iv) By (i) and (ii), cl F1 ⊂ cl F2 implies that
σF1 (x∗ ) ≤ σF2 (x∗ ), ∀ x∗ ∈ Rn .
Conversely, suppose that the above inequality holds, which implies for
every x ∈ cl F1 ,
hx∗ , xi ≤ σF2 (x∗ ), ∀ x∗ ∈ Rn .
Therefore, by (iii), x ∈ cl F2 . Because x ∈ cl F1 was arbitrary, cl F1 ⊂ cl F2 ,
thereby completing the proof.
(v) Consider x ∈ K. As F2 ⊂ F2 + K ◦ , (i) along with Proposition 1.7 and the definition of the polar cone leads to

σF2 (x) ≤ σF2 +K ◦ (x) = σF2 (x) + σK ◦ (x) ≤ σF2 (x),

that is, σF2 (x) = σF2 +K ◦ (x) for x ∈ K. Now if x ∉ K, there exists z ∈ K ◦ such that hz, xi > 0. Consider y ∈ F2 . Therefore, taking the limit as λ → +∞, hy + λz, xi → +∞, which implies σF2 +K ◦ (x) = +∞, thus establishing the first equivalence. The second equivalence can be obtained by (ii) and (iv).
(vi) Suppose that F is bounded, which implies that there exists M > 0 such
that
kx′ k ≤ M, ∀ x′ ∈ F.
Therefore, by the Cauchy–Schwarz Inequality, Proposition 1.1,
hx, x′ i ≤ kxkkx′ k ≤ kxkM, ∀ x′ ∈ F,
which implies that σF (x) ≤ kxkM for every x ∈ Rn . Thus, σF is finite every-
where.
Conversely, suppose that σF is finite everywhere. In the next section, we
will present a result establishing the local Lipschitz property and hence con-
tinuity of the convex function, Theorem 2.72. This leads to the local bound-
edness. Therefore there exists M such that
hx, x′ i ≤ σF (x) ≤ M, ∀ (x, x′ ) ∈ B × F.
If x′ ≠ 0, taking x = x′ /kx′ k, the above inequality leads to kx′ k ≤ M for every
x′ ∈ F , thereby establishing the boundedness of F and hence proving the
result. 
As mentioned earlier, the support function is lsc and sublinear. Conversely,
a closed sublinear function can be viewed as a support function. We end
this subsection by presenting this important result to assert the preceding
statement. The proof is again due to Hiriart-Urruty and Lemaréchal [63].


Theorem 2.62 For a proper lsc sublinear function σ : Rn → R̄, there exists a
linear function minorizing σ. In fact, σ is the supremum of the linear function
minorizing it; that is, σ is the support function of the closed convex set given
by

Fσ = {x ∈ Rn : hx, di ≤ σ(d), ∀ d ∈ Rn }.

Proof. Because sublinear functions are convex, σ is a proper lsc convex func-
tion. As we will discuss in one of the later sections, every proper lsc convex
function can be represented as a pointwise supremum of affine functions ma-
jorized by it, Theorem 2.100 and there exists (x, α) ∈ Rn × R such that

hx, di − α ≤ σ(d), ∀ d ∈ Rn .

As σ(0) = 0, the preceding inequality leads to α ≥ 0. By the positive homo-


geneity of σ,
hx, di − α/λ ≤ σ(d), ∀ d ∈ Rn , ∀ λ > 0.
Taking the limit as λ → +∞,

hx, di ≤ σ(d), ∀ d ∈ Rn ,

that is, σ is minorized by linear functions.


As mentioned in the beginning, convex functions are supremum of affine
functions, which for sublinear functions can be restricted to linear functions.
Therefore, by Theorem 2.100,

σ(d) = sup{hx, di : x ∈ Fσ }

and hence σ is the support function of Fσ . 


After discussing these classes of convex functions, we move on to discuss
the nature of convex functions.

2.3.2 Continuity Property


We have already discussed the operations that preserve convexity of the func-
tions. Now we shall study the continuity, Lipschitzian and differentiability
properties of the function. But before doing so, let us recall proper functions.
A function φ : Rn → R̄ is proper if φ(x) > −∞ for every x ∈ Rn and
dom φ is nonempty, that is, epi φ is nonempty and contains no vertical lines.
A function that is not proper is called an improper function. We know that
for a convex function, the epigraph is a convex set. If φ is an improper convex
function such that there exists x̄ ∈ ri dom φ such that φ(x̄) = −∞, then the
convexity of epi φ is broken unless φ(x) = −∞ for every x ∈ ri dom φ. Such


FIGURE 2.7: Epigraphs of improper functions φ1 and φ2 .

functions can however have finite values at the boundary points. For example,
consider φ1 : R → R̄ given by

φ1 (x) = −∞ if |x| < 1, 0 if |x| = 1, and +∞ if |x| > 1.

Here, φ1 is an improper convex function with finite values at boundary points


of the domain, x = −1 and x = 1. Also it is obvious that a convex function φ cannot have a finite value on ri dom φ and the value −∞ at a boundary point. For better understanding, suppose that x ∈ ri dom φ such that φ(x) > −∞ and let y be a boundary
point of dom φ with φ(y) = −∞. By the convexity of φ,

φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y), ∀ λ ∈ (0, 1),

which implies that for (1 − λ)x + λy ∈ ri dom φ,

φ((1 − λ)x + λy) = −∞.

This contradicts the convexity of the epigraph. This aspect can be easily
visualized by modifying the previous example as follows. Define an improper
function φ2 : R → R̄ as

φ2 (x) = −∞ if x = 1, 0 if −1 ≤ x < 1, and +∞ if |x| > 1.

Obviously φ2 cannot be convex as epi φ2 is not convex as in Figure 2.7. These


discussions can be stated as the following result from Rockafellar [97].


Proposition 2.63 Consider an improper convex function φ : Rn → R̄. Then


φ(x) = −∞ for every x ∈ ri dom φ. Thus φ is necessarily infinite except
perhaps at the boundary point of dom φ. Moreover, an lsc improper convex
function can have no finite values.

As discussed in Chapter 1, the continuity of a function plays an important


role in the study of its bounds and hence in optimization problems. Before
discussing the continuity property of convex functions we shall present some
results on interior of the epigraph of a convex function and closure of a convex
function.

Proposition 2.64 Consider a convex function φ : Rn → R̄ such that


ri dom φ is nonempty. Then ri epi φ is also nonempty and given by

ri epi φ = {(x, α) ∈ Rn × R : x ∈ ri dom φ, φ(x) < α}.

Equivalently, (x̄, ᾱ) ∈ ri epi φ if and only if ᾱ > lim sup_{x→x̄} φ(x).

Proof. To obtain the result for ri epi φ, it is sufficient to derive it for int epi φ,
that is,

int epi φ = {(x, α) ∈ Rn × R : x ∈ int dom φ, φ(x) < α}.

By Definition 2.12, for (x̄, ᾱ) ∈ int epi φ, there exists ε > 0 such that

(x̄, ᾱ) + εB ⊂ epi φ,

which implies that x̄ ∈ int dom φ along with φ(x̄) < ᾱ. As (x̄, ᾱ) ∈ int epi φ
is arbitrary,

int epi φ ⊂ {(x, α) ∈ Rn × R : x ∈ int dom φ, φ(x) < α}.

Now suppose that x̄ ∈ int dom φ and φ(x̄) < ᾱ. Consider


x1 , x2 , . . . , xm ∈ dom φ such that x̄ ∈ int F where F = co {x1 , x2 , . . . , xm }.
Define

γ = max{φ(x1 ), φ(x2 ), . . . , φ(xm )}.

By the convexity of F , for any x ∈ F there exist λi ≥ 0, i = 1, 2, . . . , m, satisfying Σ_{i=1}^{m} λi = 1 such that

x = Σ_{i=1}^{m} λi xi .

Because φ is convex,

φ(x) ≤ Σ_{i=1}^{m} λi φ(xi ) ≤ Σ_{i=1}^{m} λi γ = γ.


Therefore, the open set

{(x, α) ∈ Rn × R : x ∈ int F, γ < α} ⊂ epi φ.

In particular, for any α > γ, (x̄, α) ∈ int epi φ. Thus, (x̄, ᾱ) can be considered as lying in the interior of a line segment joining some point (x̄, α) ∈ int epi φ with α > γ and the point (x̄, φ(x̄)) ∈ epi φ, which by the line segment principle, Proposition 2.14, yields (x̄, ᾱ) ∈ int epi φ.

int epi φ ⊃ {(x, α) ∈ Rn × R : x ∈ int dom φ, φ(x) < α},

thereby leading to the requisite result.


Now we move on to prove the equivalent part for ri epi φ. Suppose that
(x̄, ᾱ) ∈ ri epi φ. Therefore, by the earlier characterization one can always
find an ε > 0 such that

x̄ ∈ ri dom φ and sup{φ(x) : x ∈ Bε (x̄)} < ᾱ.

Taking the limit as ε → 0 along with Definition 1.5 of limit supremum,

lim sup_{x→x̄} φ(x) < ᾱ.

Conversely, suppose that for (x̄, ᾱ) the strict inequality condition holds
which implies

lim_{ε↓0} sup{φ(x) : x ∈ Bε (x̄)} < ᾱ.

Therefore, there exists ε > 0 such that

sup{φ(x) : x ∈ Bε (x̄)} < ᾱ,

which yields φ(x̄) < ᾱ with x̄ ∈ ri dom φ, thereby proving the equivalent
result. Note that this equivalence can be established for int epi φ as well. 
Note that the above result can also be obtained for the relative interior of
the epigraph as it is nothing but the interior relative to the affine hull of the
epigraph. As a consequence of the above characterization of ri epi φ, we have the
following result from Rockafellar [97].

Corollary 2.65 Consider α ∈ R and φ : Rn → R̄ to be a proper convex


function such that for some x ∈ dom φ, φ(x) < α. Then actually φ(x) < α
for some x ∈ ri dom φ.

Proof. Consider the open half space H given by

H = {(x, µ) ∈ Rn × R : µ < α}.


Because for some x ∈ Rn , φ(x) < α, in particular for µ = φ(x), we have that
H meets epi φ. Invoking Corollary 2.16 (ii), H also meets ri epi φ, which by
Proposition 2.64 implies that there exists x ∈ ri dom φ such that φ(x) < α,
thereby yielding the desired result. 
Recall that in the previous chapter the closure of a function φ : Rn → R̄
was defined as

cl φ(x̄) = lim inf_{x→x̄} φ(x), ∀ x̄ ∈ Rn ,

which is a bit complicated to compute. In case of a convex function, it is much


easier to compute and is presented in the next proposition. The proof is from
Rockafellar [97].

Proposition 2.66 Consider a proper convex function φ : Rn → R̄. Then cl φ


agrees with φ in ri dom φ and for x̂ ∈ ri dom φ,

cl φ(x) = lim_{λ→1} φ((1 − λ)x̂ + λx), ∀ x ∈ Rn .

Proof. From Definition 1.11 of closure of a function, cl φ is lsc and cl φ ≤ φ.


Therefore, by the lower semicontinuity of cl φ,

lim inf_{λ→1} (cl φ)((1 − λ)x̂ + λx) = cl φ(x) ≤ lim inf_{λ→1} φ((1 − λ)x̂ + λx).

To prove the result, we will establish the following inequality

cl φ(x) ≥ lim sup_{λ→1} φ((1 − λ)x̂ + λx).

Consider α ∈ R such that cl φ(x) ≤ α, which implies that

(x, α) ∈ epi cl φ = cl epi φ.

Consider any (x̂, α̂) ∈ ri epi φ. Applying the Line Segment Principle, Propo-
sition 2.14,

(1 − λ)(x̂, α̂) + λ(x, α) ∈ ri epi φ, ∀ λ ∈ [0, 1).

By Proposition 2.64,

φ((1 − λ)x̂ + λx) < (1 − λ)α̂ + λα, ∀ λ ∈ [0, 1).

Taking the limit superior as λ → 1, the above inequality leads to

lim sup_{λ→1} φ((1 − λ)x̂ + λx) ≤ lim sup_{λ→1} [(1 − λ)α̂ + λα] = α.

In particular, taking α = cl φ(x) in the above inequality yields the desired


result.


In the relation

cl φ(x) = lim_{λ→1} φ((1 − λ)x̂ + λx),

in particular, taking x = x̂ ∈ ri dom φ leads to cl φ(x̂) = φ(x̂). Because


x̂ ∈ ri dom φ is arbitrary, cl φ = φ on ri dom φ, that is, cl φ agrees with φ in
ri dom φ. 
Next we present some results from Rockafellar [97] on closure and relative
interior.

Proposition 2.67 Consider a proper convex function φ : Rn → R̄ and let


α ∈ R such that α > inf{φ(x) : x ∈ Rn }. Then the level sets

{x ∈ Rn : φ(x) ≤ α} and {x ∈ Rn : φ(x) < α}

have the same closure and relative interior, namely

{x ∈ Rn : cl φ(x) ≤ α} and {x ∈ Rn : x ∈ ri dom φ, φ(x) < α},

respectively.

Proof. Define a hyperplane H = {(x, α) ∈ Rn × R : x ∈ Rn } in Rn+1 .


Applying Corollary 2.65 and Proposition 2.64, H intersects ri epi φ, which
implies that

ri H ∩ ri epi φ = H ∩ ri epi φ 6= ∅.

Now consider

H ∩ epi φ = {(x, α) ∈ Rn × R : φ(x) ≤ α}.

Invoking Corollary 2.16 (iii),

cl(H ∩ epi φ) = cl H ∩ cl epi φ = H ∩ epi cl φ, (2.16)


ri(H ∩ epi φ) = ri H ∩ ri epi φ = H ∩ ri epi φ. (2.17)

The projection of these sets in Rn are, respectively,

cl {x ∈ Rn : φ(x) ≤ α} = {x ∈ Rn : cl φ(x) ≤ α},


ri {x ∈ Rn : φ(x) ≤ α} = {x ∈ Rn : x ∈ ri dom φ, φ(x) < α}.

The latter relation implies that

ri {x ∈ Rn : φ(x) ≤ α} ⊂ {x ∈ Rn : φ(x) < α} ⊂ {x ∈ Rn : φ(x) ≤ α}.

Therefore, by Corollary 2.16 (ii), {x ∈ Rn : φ(x) < α} has the same closure
and relative interior as {x ∈ Rn : φ(x) ≤ α}. 


Proposition 2.68 Consider proper convex functions φi : Rn → R̄,


i = 1, 2, . . . , m. If every φi , i = 1, 2, . . . , m, is lsc and φ1 +φ2 +. . .+φm 6≡ +∞,
then φ1 + φ2 + . . . + φm is a proper lsc convex function. If φi , i = 1, 2, . . . , m,
are not all lsc but ri dom φ1 ∩ ri dom φ2 ∩ . . . ∩ ri dom φm is nonempty, then

cl (φ1 + φ2 + . . . + φm ) = cl φ1 + cl φ2 + . . . + cl φm .

Proof. Define φ = φ1 + φ2 + . . . + φm and assume


x̂ ∈ ri dom φ = ri ( ∩_{i=1}^{m} dom φi ).

By Proposition 2.66, for every x ∈ Rn ,


cl φ(x) = lim_{λ→1} φ((1 − λ)x̂ + λx) = lim_{λ→1} Σ_{i=1}^{m} φi ((1 − λ)x̂ + λx). (2.18)

If φi , i = 1, 2, . . . , m, are all lsc, then the above condition becomes

cl φ(x) = φ1 (x) + φ2 (x) + . . . + φm (x), ∀ x ∈ Rn

and thus cl φ = φ.
Suppose that φi , i = 1, 2, . . . , m, are not all lsc. If
∩_{i=1}^{m} ri dom φi ≠ ∅,

by Proposition 2.15 (iii),


∩_{i=1}^{m} ri dom φi = ri ( ∩_{i=1}^{m} dom φi ) = ri dom φ.

Therefore,

x̂ ∈ ri dom φi , i = 1, 2, . . . , m.

Again by Proposition 2.66,

cl φi (x) = lim_{λ→1} φi ((1 − λ)x̂ + λx), i = 1, 2, . . . , m.

Therefore, the condition (2.18) becomes

cl φ(x) = cl φ1 (x) + cl φ2 (x) + . . . + cl φm (x), ∀ x ∈ Rn ,

thereby completing the proof. 


Using the above propositions, one can prove the continuity property of the
convex functions.


Theorem 2.69 A proper convex function φ : Rn → R̄ is continuous on


ri dom φ.
Proof. By Proposition 2.66, cl φ agrees with φ in ri dom φ, which implies
that φ is lsc on ri dom φ. Now suppose that x̄ ∈ ri dom φ. For any α such
that (x̄, α) ∈ ri epi φ, by Proposition 2.64,
lim sup_{x→x̄} φ(x) < α.

Taking the limit as α → φ(x̄), the preceding condition becomes


lim sup_{x→x̄} φ(x) ≤ φ(x̄),

thereby implying the upper semicontinuity of φ at x̄. Because x̄ ∈ ri dom φ


is arbitrary, φ is usc on ri dom φ. Thus φ is continuous on ri dom φ, thereby
yielding the desired result. 
Before moving on to discuss the derivative property of a convex function,
we shall discuss its Lipschitzian property. For that we first define Lipschitz
and locally Lipschitz functions.
Definition 2.70 A function φ : Rn → R is said to be Lipschitz if there exists
L > 0 such that
|φ(x) − φ(y)| ≤ L kx − yk, ∀ x, y ∈ Rn .
The positive number L is called the Lipschitz constant of φ, or φ is said to be
Lipschitz with constant L.
Definition 2.71 Consider a function φ : Rn → R and x̄ ∈ Rn . Then φ is said
to be locally Lipschitz if there exist Lx̄ > 0 and a neighborhood N (x̄) of x̄
such that
|φ(x) − φ(y)| ≤ Lx̄ kx − yk, ∀ x, y ∈ N (x̄).
It is well known that a Lipschitz function is continuous but the converse need not hold. From Theorem 2.69, we know that a convex function is continuous in the relative interior of its domain. In the results to follow, we show that local boundedness of a convex function implies that the function is continuous as well as locally Lipschitz. The result is from Attouch, Buttazzo,
and Michaille [3].
Theorem 2.72 Consider a proper convex function φ : Rn → R̄ and x̄ ∈ dom φ such that for some ε > 0,

sup{φ(x) : x ∈ Bε (x̄)} = M < +∞.

Then φ is continuous at x̄. Moreover, φ is Lipschitz continuous on every ball


Bε′ (x̄) with ε′ < ε and
|φ(x) − φ(y)| ≤ (2M/(ε − ε′)) kx − yk, ∀ x, y ∈ Bε′ (x̄).


Proof. Without loss of generality, by translation (that is, by considering the


function φ(x + x̄) − φ(x̄)), the problem reduces to the case when x̄ = 0 and
φ(0) = 0. Therefore, the local boundedness in the neighborhood of x̄ = 0
reduces to
sup{φ(x) : x ∈ Bε (0)} = M < +∞.

Consider an arbitrary δ ∈ (0, 1] and x ∈ Bδε (0). Now expressing

x = (1 − δ)0 + δ((1/δ)x),

where (1/δ)x ∈ Bε (0), the convexity of φ along with the local boundedness condition leads to

φ(x) ≤ (1 − δ)φ(0) + δφ((1/δ)x) ≤ δM.
Rewriting

0 = (1/(1 + δ))x + (δ/(1 + δ))((−1/δ)x),

where (−1/δ)x ∈ Bε (0), the convexity of φ again yields

0 = φ(0) ≤ (1/(1 + δ))φ(x) + (δ/(1 + δ))φ((−1/δ)x) ≤ (1/(1 + δ))φ(x) + δM/(1 + δ),
which along with the previous condition on φ(x) implies that
−δM ≤ φ(x) ≤ δM.
Because x ∈ Bδε (0) is arbitrary,
|φ(x)| ≤ δM, ∀ x ∈ Bδε (0),
thereby establishing the continuity of φ at 0.
In the above discussion, in particular for δ = 1,
|φ(x + x̄) − φ(x̄)| ≤ M, ∀ x ∈ Bε (0).
Consider arbitrary x, y ∈ Bε′ (x̄) with x 6= y. Denoting δ = ε − ε′ > 0,
z = x + (δ/kx − yk)(x − y)   and   λ = kx − yk/(δ + kx − yk).

Observe that

kz − x̄k = k(x − x̄) + (δ/kx − yk)(x − y)k ≤ kx − x̄k + (δ/kx − yk)kx − yk ≤ ε′ + δ = ε,


which implies z ∈ Bε (x̄). Also

kx − ykz = (δ + kx − yk)x − δy,

which implies that

x = (1 − λ)y + λz, where λ ∈ (0, 1) is as defined above.

By the convexity of φ,

φ(x) ≤ (1 − λ)φ(y) + λφ(z) = φ(y) + λ(φ(z) − φ(y)),

which leads to

φ(x) − φ(y) ≤ λ(φ(z) − φ(y)) ≤ λ|φ(z) − φ(y)|.

Observe that

|φ(z) − φ(y)| ≤ |φ(z) − φ(x̄)| + |φ(y) − φ(x̄)| ≤ 2M,

as z ∈ Bε (x̄) and y ∈ Bε′ (x̄) ⊂ Bε (x̄). Therefore,

φ(x) − φ(y) ≤ 2M kx − yk/(δ + kx − yk) ≤ (2M/δ) kx − yk.

Interchanging the roles of x and y yields

|φ(x) − φ(y)| ≤ (2M/δ) kx − yk,
thereby establishing the result. 
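As a numerical sanity check of the bound just established (a Python sketch of ours, with φ(x) = x2 , x̄ = 0, ε = 2 and ε′ = 1, so that M = 4), the observed Lipschitz ratio on Bε′ (x̄) indeed stays below 2M/(ε − ε′ ) = 8.

```python
import numpy as np

# Sanity check (ours) of the Lipschitz estimate of Theorem 2.72 for phi(x) = x^2
# around x_bar = 0 with eps = 2, eps' = 1: M = sup_{|x| <= 2} phi(x) = 4, so the
# predicted Lipschitz constant on B_{eps'}(0) is 2M/(eps - eps') = 8.
phi = lambda x: x ** 2
eps, eps_prime, M = 2.0, 1.0, 4.0

xs = np.linspace(-eps_prime, eps_prime, 201)
worst = max(abs(phi(x) - phi(y)) / abs(x - y)
            for x in xs for y in xs if x != y)
print(worst, "<=", 2 * M / (eps - eps_prime))   # about 2.0 <= 8.0
```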
In the above result, we showed that if a proper convex function is locally
bounded at a point, then it is locally Lipschitz at that point. In fact, more is true, as presented in the result below, the proof of which is along the lines of Hiriart-Urruty and Lemaréchal [63].

Theorem 2.73 Consider a proper convex function φ : Rn → R̄. Then φ is


locally Lipschitz on ri dom φ.

Proof. Similar to the proof of Proposition 2.14 (i), consider n + 1 affinely independent vectors x1 , x2 , . . . , xn+1 ∈ dom φ such that x̄ ∈ ri co {x1 , x2 , . . . , xn+1 } ⊂ dom φ. Now consider ε > 0 such that Bε (x̄) ⊂ co {x1 , x2 , . . . , xn+1 }. For any arbitrary x ∈ Bε (x̄), there exist λi ≥ 0, i = 1, 2, . . . , n + 1, satisfying Σ_{i=1}^{n+1} λi = 1 such that

x = Σ_{i=1}^{n+1} λi xi .


By the convexity of φ,
φ(x) ≤ Σ_{i=1}^{n+1} λi φ(xi ) ≤ max{φ(x1 ), . . . , φ(xn+1 )} = M < +∞.

Because x ∈ Bε (x̄) is arbitrary, the above condition holds for every x ∈ Bε (x̄).
Therefore, by Theorem 2.72, for ε′ < ε,

|φ(x) − φ(y)| ≤ (2M/(ε − ε′)) kx − yk, ∀ x, y ∈ Bε′ (x̄),
thus proving that φ is locally Lipschitz at x̄ ∈ ri dom φ. Because x̄ ∈ ri dom φ
is arbitrary, φ is locally Lipschitz on ri dom φ. 

2.3.3 Differentiability Property


After discussing the continuity and the Lipschitzian property of a convex function, we now move toward studying its differentiability. In general, a convex function need not be differentiable on the whole of Rn. For instance, consider the convex function φ(x) = |x|. It is differentiable everywhere except at x = 0, which is the point of minimizer if we minimize this function over the whole of R. Another example of a nonsmooth convex function that appears naturally is the max-function. Consider φ(x) = max{x, x2}. As we know from Proposition 2.53, the supremum of convex functions is convex, so φ is convex. Here both x and x2 are differentiable over R but φ is not differentiable at x = 0 and x = 1. Again, for the unconstrained minimization of φ over R, the point of minimizer is x̄ = 0. So how is one supposed to study optimality at a point if the function is not differentiable there? This means the notion of differentiability must be replaced by some other concept that can handle nonsmooth convex functions. For a differentiable function we know that both the left-sided and the right-sided derivatives exist and are equal. In the case of a convex function, the right-sided derivative always exists. So, as a step toward replacing differentiability, we first introduce the concept of the one-sided directional derivative, or simply the directional derivative.

Definition 2.74 For a proper convex function φ : Rn → R̄, the directional


derivative of φ at x̄ ∈ dom φ in the direction d ∈ Rn is defined as

φ′(x̄, d) = lim_{λ↓0} (φ(x̄ + λd) − φ(x̄))/λ,

provided +∞ and −∞ are allowed as limits.

Before we move on to present the result on the existence of directional


derivatives of a convex function, we present a result from Rockafellar and
Wets [101].


Proposition 2.75 (Slope Inequality) Consider a function φ : I → R where


I ⊂ R denotes an interval. Then φ is convex on I if and only if for arbitrary
points x < z < y in I,
(φ(z) − φ(x))/(z − x) ≤ (φ(y) − φ(x))/(y − x) ≤ (φ(y) − φ(z))/(y − z). (2.19)

Consequently, for fixed x ∈ I, the function ψ(y) = (φ(y) − φ(x))/(y − x) is nondecreasing in y on I \ {x}. Moreover, if φ is differentiable over an open interval I ⊂ R, then ∇φ is nondecreasing on I.
Proof. We know that the convexity of φ on I is equivalent to
   
φ(z) ≤ ((y − z)/(y − x)) φ(x) + ((z − x)/(y − x)) φ(y), ∀ x < z < y in I.

The above inequality leads to

φ(z) − φ(x) ≤ ((y − z)/(y − x) − 1) φ(x) + ((z − x)/(y − x)) φ(y) = (z − x) (φ(y) − φ(x))/(y − x),
as desired. The other inequalities can be established similarly, thereby leading
to (2.19).
Conversely, suppose that x < z < y, which implies that there exists
λ ∈ (0, 1) such that z = (1 − λ)x + λy. Substituting z = (1 − λ)x + λy in
(2.19) leads to
(φ((1 − λ)x + λy) − φ(x))/(λ(y − x)) ≤ (φ(y) − φ(x))/(y − x),
that is,
φ((1 − λ)x + λy) ≤ (1 − λ)φ(x) + λφ(y).
Because x and y were arbitrarily chosen, the above inequality holds for any
x, y ∈ I and any λ ∈ [0, 1] (the above inequality holds trivially for λ = 0 and
λ = 1). Hence, φ is a convex function.
Suppose that y1 , y2 ∈ I such that yi 6= x, i = 1, 2, and y1 < y2 . Consider
the following cases:
x < y1 < y2 , y1 < x < y2 and y1 < y2 < x.
Suppose that x < y1 < y2. In particular, taking z = y1 and y = y2 in the inequality (2.19) yields

ψ(y1) = (φ(y1) − φ(x))/(y1 − x) ≤ (φ(y2) − φ(x))/(y2 − x) = ψ(y2).


Applying (2.19) to the remaining two cases leads to the fact that ψ(y) = (φ(y) − φ(x))/(y − x) is nondecreasing.
Suppose that φ is convex, which implies (2.19) holds. As φ is differentiable, for x1, x2 ∈ I with x1 < x2,

∇φ(x1) ≤ (φ(x2) − φ(x1))/(x2 − x1) = (φ(x1) − φ(x2))/(x1 − x2) ≤ ∇φ(x2),
thereby establishing the result. 
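As a quick illustration of the slope inequality, take φ(x) = x2 on I = R and points x < z < y. Then (φ(z) − φ(x))/(z − x) = z + x, (φ(y) − φ(x))/(y − x) = y + x and (φ(y) − φ(z))/(y − z) = y + z, and the chain z + x ≤ y + x ≤ y + z is immediate, in agreement with (2.19).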
If φ : R → R̄ is a proper convex function, dom φ may be considered an interval I. Then, from the nondecreasing property of ψ in the above proposition, the right-sided derivative of φ, φ′+, exists at x̄ provided both −∞ and +∞ values are allowed, and is defined as

φ′+(x̄) = lim_{x↓x̄} (φ(x) − φ(x̄))/(x − x̄).

If (φ(x) − φ(x̄))/(x − x̄) has a finite lower bound,

lim_{x↓x̄} (φ(x) − φ(x̄))/(x − x̄) = inf_{x>x̄, x∈I} (φ(x) − φ(x̄))/(x − x̄),

because (φ(x) − φ(x̄))/(x − x̄) is nondecreasing on I. In case (φ(x) − φ(x̄))/(x − x̄) does not have a finite lower bound,

lim_{x↓x̄} (φ(x) − φ(x̄))/(x − x̄) = inf_{x>x̄, x∈I} (φ(x) − φ(x̄))/(x − x̄) = −∞,

and for the case when I = {x̄},

inf_{x>x̄, x∈I} (φ(x) − φ(x̄))/(x − x̄) = +∞

as {x ∈ R : x > x̄, x ∈ I} = ∅. Thus,

φ′+(x̄) = inf_{x>x̄, x∈I} (φ(x) − φ(x̄))/(x − x̄).

Theorem 2.76 Consider a proper convex function φ : Rn → R̄ and


x̄ ∈ dom φ. Then for every d ∈ Rn , the directional derivative φ′ (x̄, d) exists
with φ′ (x̄, 0) = 0 and

φ′(x̄, d) = inf_{λ>0} (φ(x̄ + λd) − φ(x̄))/λ.
Moreover, φ′ (x̄, d) is a sublinear function in d for every d ∈ Rn .


Proof. Define ψ : R → R̄ given by

ψ(λ) = φ(x̄ + λd).

As x̄ ∈ dom φ, ψ(0) = φ(x̄) < +∞, which along with the convexity of φ
implies that ψ is a proper convex function. Now consider ϕ : R → R̄ defined
as
ϕ(λ) = (ψ(λ) − ψ(0))/λ = (φ(x̄ + λd) − φ(x̄))/λ.

By Proposition 2.75, ϕ is nondecreasing when λ > 0. Then by the discussion preceding the theorem, ψ′+(0) exists and

ψ′+(0) = lim_{λ↓0} ϕ(λ) = inf_{λ>0} ϕ(λ),

as desired.
Suppose that d ∈ Rn and α > 0. Then

φ′(x̄, αd) = lim_{λ↓0} (φ(x̄ + λαd) − φ(x̄))/λ = lim_{λ↓0} α (φ(x̄ + λαd) − φ(x̄))/(λα) = α lim_{λ′↓0} (φ(x̄ + λ′d) − φ(x̄))/λ′ = α φ′(x̄, d),

which implies that φ′(x̄, .) is positively homogeneous.
Suppose that d1 , d2 ∈ Rn and α ∈ [0, 1], by the convexity of φ,

φ(x̄ + λ((1 − α)d1 + αd2 )) − φ(x̄) ≤ (1 − α)(φ(x̄ + λd1 ) − φ(x̄))


+α(φ(x̄ + λd2 ) − φ(x̄)).

Dividing both sides by λ > 0 and taking the limit as λ ↓ 0, the above inequality reduces to

φ′(x̄, (1 − α)d1 + αd2) ≤ (1 − α)φ′(x̄, d1) + αφ′(x̄, d2), ∀ α ∈ [0, 1].

In particular, taking α = 1/2 and applying the positive homogeneity property, the above condition yields

φ′(x̄, d1 + d2) ≤ φ′(x̄, d1) + φ′(x̄, d2).

Because d1 , d2 ∈ Rn were arbitrary, the above inequality implies that φ′ (x̄, .)


is subadditive, which along with positive homogeneity implies that φ′ (x̄, .) is
sublinear. 
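For instance, for the convex function φ(x) = |x| considered earlier, at x̄ = 0 we have φ′(0, d) = lim_{λ↓0} |λd|/λ = |d| for every d ∈ R. The function d ↦ |d| is indeed positively homogeneous and subadditive, hence sublinear, but it is not linear, which reflects the nondifferentiability of φ at 0.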
For a differentiable convex function φ : Rn → R, the following relation
holds between the directional derivative and the gradient of the function φ

φ′ (x̄, d) = h∇φ(x̄), di, ∀ d ∈ Rn .


But in the absence of differentiability, can one have such a relation for the directional derivative? The answer is yes. The notion that replaces the gradient in the above condition is the subgradient.

Definition 2.77 Consider a proper convex function φ : Rn → R̄ and x̄ ∈ dom φ. Then ξ ∈ Rn is said to be a subgradient of the function φ at x̄ if

φ(x) − φ(x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn.

The collection of all such vectors constitutes the subdifferential of φ at x̄ and is denoted by ∂φ(x̄). For x̄ ∉ dom φ, ∂φ(x̄) is empty.

For a differentiable function, the gradient at any point gives the slope of the tangent to the graph of the function at that point. In a similar way, from the definition above it can be seen that the graph of the affine function x ↦ φ(x̄) + hξ, x − x̄i is a supporting hyperplane to the epigraph of φ at (x̄, φ(x̄)) with slope ξ. In fact, at a point of nondifferentiability there can be infinitely many such supporting hyperplanes, and the collection of the slopes of these hyperplanes forms the subdifferential.
Recall the indicator function to the convex set F ⊂ Rn . Obviously
δF : Rn → R̄ is a proper convex function. Now from the above definition, the
subdifferential of δF at x̄ ∈ F is given by

∂δF (x̄) = {ξ ∈ Rn : δF (x) − δF (x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn }


= {ξ ∈ Rn : 0 ≥ hξ, x − x̄i, ∀ x ∈ F },

which is nothing but the normal cone to the set F at x̄. Therefore, for a convex
set F , ∂δF = NF .
Consider the norm function φ(x) = kxk, x ∈ Rn . Observe that φ is a
convex function. At x̄ = 0, φ is not differentiable and ∂φ(x̄) = B.
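This can be verified directly from Definition 2.77: ξ ∈ ∂φ(0) if and only if kxk ≥ hξ, xi for every x ∈ Rn. If kξk ≤ 1, the Cauchy–Schwarz inequality yields hξ, xi ≤ kξk kxk ≤ kxk, while conversely taking x = ξ in the subgradient inequality gives kξk ≥ kξk2, that is, kξk ≤ 1. Hence ∂φ(0) is precisely the closed unit ball B.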
Like the relation between the directional derivative and gradient, we are
interested in deriving a relationship between the directional derivative and the
subdifferential, which we establish in the next result.

Theorem 2.78 Consider a proper convex function φ : Rn → R̄ and


x̄ ∈ dom φ. Then

∂φ(x̄) = {ξ ∈ Rn : φ′ (x̄, d) ≥ hξ, di, ∀ d ∈ Rn }.

Proof. Suppose that ξ ∈ ∂φ(x̄), which by Definition 2.77 implies that

φ(x) − φ(x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn .

In particular, for x = x̄ + λd with λ > 0, the above condition reduces to

(φ(x̄ + λd) − φ(x̄))/λ ≥ hξ, di, ∀ d ∈ Rn.


Taking the limit as λ → 0 leads to


φ′ (x̄, d) ≥ hξ, di, ∀ d ∈ Rn ,
as desired.
Conversely, suppose that ξ ∈ Rn satisfies
φ′ (x̄, d) ≥ hξ, di, ∀ d ∈ Rn .
The alternate expression for φ′(x̄, d) from Theorem 2.76 then leads to

(φ(x̄ + λd) − φ(x̄))/λ ≥ hξ, di, ∀ λ > 0, ∀ d ∈ Rn.

In particular, for λ ∈ (0, 1] and d = x − x̄, this along with the convexity of φ leads to

φ(x) − φ(x̄) ≥ (φ(x̄ + λ(x − x̄)) − φ(x̄))/λ ≥ hξ, x − x̄i, ∀ x ∈ Rn,
which implies that ξ ∈ ∂φ(x̄), thereby establishing the result. 
The result below from Rockafellar [97] shows that the closure of the directional derivative is actually the support function of the subdifferential set.
Theorem 2.79 Consider a proper convex function φ : Rn → R̄ and
x̄ ∈ dom φ. Then
cl φ′(x̄, d) = sup_{ξ∈∂φ(x̄)} hξ, di = σ∂φ(x̄)(d), ∀ d ∈ Rn.

However, if x̄ ∈ ri dom φ,
φ′ (x̄, d) = σ∂φ(x̄) (d), ∀ d ∈ Rn
and if x̄ ∈ int dom φ, φ′ (x̄, d) is finite for every d ∈ Rn .
Proof. Because φ′ (x̄, .) is sublinear, combining Theorems 2.62 and 2.78 leads
to
cl φ′ (x̄, d) = σ∂φ(x̄) (d).
If x̄ ∈ ri dom φ, the domain of φ′(x̄, .) is an affine set, namely the subspace parallel to the affine hull of dom φ. By sublinearity, φ′(x̄, 0) = 0, so φ′(x̄, .) is not identically −∞ on this affine set. Therefore, by Proposition 2.63, cl φ′(x̄, .)
and hence φ′ (x̄, .) is a proper function. By Proposition 2.66, cl φ′ (x̄, .) agrees
with φ′ (x̄, .) on the affine set and hence is closed, thereby leading to the desired
condition. For x̄ ∈ int dom φ, the domain of φ′ (x̄, .) is Rn and hence it is finite
everywhere. 
As mentioned earlier for a differentiable convex function, for every d ∈ Rn ,

φ′(x̄, d) = h∇φ(x̄), di. So the question is: for a differentiable convex function,
how are the gradient and the subdifferential related? We discuss this aspect
in the result below.


Proposition 2.80 Consider a convex function φ : Rn → R differentiable at


x̄ with gradient ∇φ(x̄). Then the unique subgradient of φ at x̄ is the gradient,
that is, ∂φ(x̄) = {∇φ(x̄)}.

Proof. For a differentiable convex function φ,

φ′ (x̄, d) = h∇φ(x̄), di, ∀ d ∈ Rn .

By Theorem 2.79, for every ξ ∈ ∂φ(x̄),

h∇φ(x̄) − ξ, di ≥ 0, ∀ d ∈ Rn .

Because the above condition holds for every d ∈ Rn , it reduces to

h∇φ(x̄) − ξ, di = 0, ∀ d ∈ Rn ,

which leads to ∇φ(x̄) = ξ. As ξ ∈ ∂φ(x̄) is arbitrary, the subdifferential is a


singleton with ∂φ(x̄) = {∇φ(x̄)}. 
From the above theorem, we have the following result, which gives the
equivalent characterization of a differentiable convex function.

Theorem 2.81 Consider a differentiable function φ : Rn → R. Then φ is


convex if and only if

φ(y) − φ(x) ≥ h∇φ(x), y − xi, ∀ x, y ∈ Rn .

Observe that in Theorem 2.79 we defined the relation between the direc-
tional derivative and the support function of the subdifferential for point x̄ in
the relative interior of the domain. The reason for this is the fact that at the
boundary of the domain, the subdifferential may be an empty set. For a clear
view into this aspect, we consider the following example from Bertsekas [12].
Let φ : R → R̄ be a proper convex function given by
φ(x) = −√x for 0 ≤ x ≤ 1, and φ(x) = +∞ otherwise.

The subdifferential of φ is

∂φ(x) = {−1/(2√x)} for 0 < x < 1, ∂φ(x) = [−1/2, +∞) for x = 1, and ∂φ(x) = ∅ for x ≤ 0 or x > 1.

Note that the subdifferential is empty at the boundary point x = 0. Also at


the other boundary point x = 1, it is unbounded. But the subdifferential may
also turn out to be unbounded at a point in the relative interior of the domain.


For example, consider the following proper convex function φ : R → R̄ defined as

φ(x) = 0 for x = 0, and φ(x) = +∞ otherwise.

Observe that at x = 0, ∂φ(x) = R, which is unbounded even though 0 is in


the relative interior of the domain. Based on these illustrations, we have the
following result from Rockafellar [97] and Attouch, Buttazzo, and Michaille [3].

Proposition 2.82 Consider a proper convex function φ : Rn → R̄ and


x̄ ∈ dom φ. Then ∂φ(x̄) is closed and convex. For x̄ ∈ ri dom φ, the sub-
differential ∂φ(x̄) is nonempty. Furthermore, if x̄ ∈ int dom φ, ∂φ(x̄) is non-
empty and compact. Moreover, if φ is continuous at x̄ ∈ dom φ, then ∂φ(x̄)
is compact.

Proof. Suppose that {ξk } ⊂ ∂φ(x̄) such that ξk → ξ. By Definition 2.77 of


subdifferential,

φ(x) − φ(x̄) ≥ hξk , x − x̄i, ∀ x ∈ Rn .

Taking the limit as k → +∞, the above inequality leads to

φ(x) − φ(x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn ,

which implies that ξ ∈ ∂φ(x̄), thereby yielding the closedness of ∂φ(x̄).


Consider ξ1 , ξ2 ∈ ∂φ(x̄), which implies that for i = 1, 2,

φ(x) − φ(x̄) ≥ hξi , x − x̄i, ∀ x ∈ Rn .

Therefore, for any λ ∈ [0, 1],

φ(x) − φ(x̄) ≥ h(1 − λ)ξ1 + λξ2 , x − x̄i, ∀ x ∈ Rn ,

which implies (1 − λ)ξ1 + λξ2 ∈ ∂φ(x̄). Because ξ1 , ξ2 were arbitrary, ∂φ(x̄) is


convex.
From the proof of Theorem 2.79, for x̄ ∈ ri dom φ, φ′ (x̄, .) is the support
function of ∂φ(x̄), which is proper. Hence, ∂φ(x̄) is nonempty.
Again by Theorem 2.79, for x̄ ∈ int dom φ, φ′ (x̄, .) is finite everywhere.
Because it is a support of ∂φ(x̄), by Proposition 2.61, ∂φ(x̄) is bounded and
hence compact.
Now suppose that φ is continuous at x̄ ∈ dom φ. We have already seen that
∂φ is always closed and convex. Therefore to establish that ∂φ(x̄) is compact,
we only need to show that it is bounded. By the continuity of φ at x̄, it is
bounded in the neighborhood of x̄. Thus, there exist ε > 0 and M ≥ 0 such
that

φ(x̄ + εd) ≤ M, ∀ d ∈ B.


Consider ξ ∈ ∂φ(x̄), which implies that

hξ, x − x̄i ≤ φ(x) − φ(x̄), ∀ x ∈ Rn .

In particular, for any d ∈ B, the above inequality along with the boundedness
of φ in the neighborhood of x̄ leads to

hξ, εdi ≤ φ(x̄ + εd) − φ(x̄) ≤ M + |φ(x̄)|,

which implies that


1
hξ, di ≤ (M + |φ(x̄)|), ∀ d ∈ B.
ε
Therefore,
1
kξk ≤ (M + |φ(x̄)|).
ε
Because ξ ∈ ∂φ(x̄) was arbitrary, ∂φ(x̄) is bounded and hence compact. 
If we consider a real-valued convex function φ : Rn → R, then int dom φ = Rn and therefore, the above result reduces to the following.

Proposition 2.83 Consider a convex function φ : Rn → R. Then the subdif-


ferential ∂φ(x) is nonempty, convex, and compact for every x ∈ Rn .

Having discussed subdifferentials, we now present some properties of the subdifferential as x varies by treating it as a multifunction, or set-valued mapping, x ↦ ∂φ(x), starting with some of the fundamental continuity results of the subdifferential mapping.

Theorem 2.84 (Closed Graph Theorem) Consider a proper lsc convex function φ : Rn → R̄. If the sequences {xk}, {ξk} ⊂ Rn are such that ξk ∈ ∂φ(xk) with xk → x̄ and ξk → ξ̄, then ξ̄ ∈ ∂φ(x̄). This means gph ∂φ is a closed subset of Rn × Rn.
Proof. Because ξk ∈ ∂φ(xk) with (xk, ξk) → (x̄, ξ̄), from Definition 2.77 of the subdifferential,

φ(x) − φ(xk) ≥ hξk, x − xki, ∀ x ∈ Rn.

Taking the limit infimum as k → +∞, which along with the lower semicontinuity of φ reduces the above condition to

φ(x) − φ(x̄) ≥ hξ̄, x − x̄i, ∀ x ∈ Rn,

thereby implying that ξ̄ ∈ ∂φ(x̄) and thus establishing that gph ∂φ is closed, as desired. 


From the above theorem one may note that the normal cone to a convex
set F ⊂ Rn is also graph closed as it is nothing but the subdifferential of the
convex indicator function δF , that is, NF = ∂δF .
In general we know that the arbitrary union of closed sets need not be
closed. But in the proposition below from Bertsekas [12] and Rockafellar [97]
we have that the union of the subdifferential over a compact set is compact.
Proposition 2.85 Consider a convex function φ : Rn → R and a nonempty compact set F ⊂ Rn. Then the set ∂φ(F) = ∪_{x∈F} ∂φ(x) is nonempty and compact.

Proof. Because F is a nonempty subset of dom φ = Rn , by Proposition 2.82,


∂φ(F ) is nonempty.
We claim that ∂φ(F) is closed. Consider a sequence {ξk} ⊂ ∂φ(F) such that ξk → ξ̄. As ξk ∈ ∂φ(F) for k ∈ N, there exist xk ∈ F such that ξk ∈ ∂φ(xk), k ∈ N. By the compactness of F, {xk} is a bounded sequence
that by the Bolzano–Weierstrass Theorem, Proposition 1.3, has a convergent
subsequence. Without loss of generality, suppose that xk → x̄, which by the
closedness of F implies that x̄ ∈ F . Invoking the Closed Graph Theorem,
Theorem 2.84, ξ̄ ∈ ∂φ(x̄) ⊂ ∂φ(F). Thus, ∂φ(F) is closed.
Now to establish the compactness of ∂φ(F ), we will establish the bounded-
ness of ∂φ(F ). On the contrary, suppose that there exist a bounded sequence
{xk } ⊂ F and an unbounded sequence {ξk } ⊂ Rn such that ξk ∈ ∂φ(xk ).
Define ηk = ξk/kξkk, which is a bounded sequence. Because {xk} and {ηk} are
bounded sequences, by the Bolzano–Weierstrass Theorem, have a convergent
subsequence. As ξk ∈ ∂φ(xk ), by Definition 2.77 of subdifferential,

φ(xk + ηk ) − φ(xk ) ≥ hξk , ηk i = kξk k.

By Theorem 2.69, φ is continuous on Rn , which along with the convergence


of {xk } and {ηk } yields that φ(xk + ηk ) − φ(xk ) is bounded. Therefore, by
the above inequality, {ξk } is a bounded sequence, thereby contradicting our
assumption. Thus, ∂φ(F ) is a bounded set and hence compact. 

Theorem 2.86 Consider a proper convex function φ : Rn → R̄. Then ∂φ is


usc on int dom φ. Moreover, if φ : Rn → R is a differentiable convex function,
then it is continuously differentiable.

Proof. By Proposition 2.82, ∂φ(x̄) is nonempty and compact if and only if


x̄ ∈ int dom φ. By Theorem 2.84, ∂φ is graph closed. Therefore, from the
discussion on set-valued mappings in Chapter 1, ∂φ is usc on int dom φ.
Since for a single-valued map the notion of upper semicontinuity coincides with that of continuity, and since by Proposition 2.80 we have ∂φ = {∇φ} for a differentiable convex function, it follows that φ is continuously differentiable. 
Below we state another important characteristic of the subdifferential with-
out proof. For more details on the treatment of ∂φ as a multifunction, one
may refer to Rockafellar [97].


Theorem 2.87 Consider a closed proper convex function φ : Rn → R̄. Then the subdifferential ∂φ is a maximal monotone map, where by monotonicity we mean that for any x1, x2 ∈ Rn,

hξ1 − ξ2, x1 − x2i ≥ 0, ∀ ξi ∈ ∂φ(xi), i = 1, 2,

and maximal in the sense that its graph is not properly contained in the graph of any other monotone map.

Similar to the standard Mean Value Theorem, Theorem 1.18, we present


the Mean Value Theorem for convex functions in terms of the subdifferential.

Theorem 2.88 Consider a convex function φ : Rn → R. Then for x, y ∈ Rn ,


there exists z ∈ (x, y) such that

φ(y) − φ(x) ∈ h∂φ(z), y − xi,

where h∂φ(z), y − xi = {hξ, y − xi : ξ ∈ ∂φ(z)}.

Proof. Consider the function ψ : [0, 1] → R defined by

ψ(λ) = φ(x + λ(y − x)) − φ(x) + λ(φ(x) − φ(y)).

Because φ is real-valued and by Theorem 2.69 it is continuous on Rn , hence ψ


is a real-valued continuous function on [0,1]. Observe that ψ(0) = 0 = ψ(1).
Also, by the convexity of φ,

ψ(λ) ≤ (1 − λ)φ(x) + λφ(y) − φ(x) + λ(φ(x) − φ(y)) = 0, ∀ λ ∈ [0, 1].

Thus, ψ attains its maximum at λ = 0 and λ = 1 and hence there exists


λ̄ ∈ (0, 1) at which ψ attains its minimum over [0, 1]. Therefore,

ψ ′ (λ̄, d) ≥ 0, ∀ d ∈ R.

Denote z = x + λ̄(y − x) ∈ (x, y). Therefore,

ψ′(λ̄, d) = lim_{λ↓0} (ψ(λ̄ + λd) − ψ(λ̄))/λ
= lim_{λ↓0} (φ(x + (λ̄ + λd)(y − x)) − φ(x + λ̄(y − x)))/λ + d(φ(x) − φ(y))
= φ′(z, d(y − x)) + d(φ(x) − φ(y)), ∀ d ∈ R,

which implies that

φ′ (z, d(y − x)) ≥ d(φ(y) − φ(x)), ∀ d ∈ R.

In particular, taking d = 1 in the above condition leads to

φ(y) − φ(x) ≤ φ′ (z, y − x),


whereas taking d = −1 yields

−φ′ (z, x − y) ≤ φ(y) − φ(x).

Combining the preceding inequalities imply

−φ′ (z, x − y) ≤ φ(y) − φ(x) ≤ φ′ (z, y − x),

which by Theorem 2.79 becomes

inf_{ξ∈∂φ(z)} hξ, y − xi = − sup_{ξ∈∂φ(z)} hξ, x − yi ≤ φ(y) − φ(x) ≤ sup_{ξ∈∂φ(z)} hξ, y − xi.

By Proposition 2.83, ∂φ(z) is compact and convex, which along with the continuity of the map ξ ↦ hξ, y − xi implies that the values hξ, y − xi, ξ ∈ ∂φ(z), fill the whole interval between the above infimum and supremum. Hence there exists ξ̄ ∈ ∂φ(z) such that

φ(y) − φ(x) = hξ̄, y − xi ∈ h∂φ(z), y − xi,

thereby completing the proof. 
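As an illustration, take φ(x) = |x|, x = −1 and y = 2. Then φ(y) − φ(x) = 1 and y − x = 3, so the theorem asks for z ∈ (−1, 2) and ξ ∈ ∂φ(z) with 3ξ = 1. The choice z = 0 with ξ = 1/3 ∈ ∂φ(0) = [−1, 1] works, even though no point of differentiability of φ can serve as z because ∇φ(z) = ±1 at such points.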


We have discussed the various continuity and differentiability behaviors
of convex functions but in most cases these properties were restricted to the
interior or relative interior of the domain of the function. As seen in the
discussion preceding Proposition 2.82, the subdifferential set may be empty
at the boundary of the domain. To overcome this flaw of the subdifferential
of a convex function, we have the notion of ε-subdifferentials, which have the
nonemptiness property throughout the domain of the function. We will discuss
this notion in a later section in the chapter.
As we are interested in the convex optimization problem, we first give the
optimality condition for the unconstrained convex programming problem
min f (x) subject to x ∈ Rn , (CPu )
where f : Rn → R is a convex function.

Theorem 2.89 Consider the unconstrained convex programming problem


(CPu ). Then x̄ ∈ Rn is the point of minimizer of (CPu ) if and only if
0 ∈ ∂f (x̄).

Proof. Suppose that x̄ ∈ Rn is a point of minimizer of (CPu ), which implies


that

f (x) − f (x̄) ≥ 0, ∀ x ∈ Rn .

By Definition 2.77 of subdifferential, 0 ∈ ∂f (x̄). The converse can be proved


by again employing the definition of the subdifferential. 
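For instance, for the unconstrained minimization of f(x) = |x| over R, we have ∂f(0) = [−1, 1] ∋ 0, so x̄ = 0 is a point of minimizer, while for any x̄ ≠ 0, ∂f(x̄) = {x̄/|x̄|} does not contain 0. This shows how the condition 0 ∈ ∂f(x̄) replaces the classical stationarity condition ∇f(x̄) = 0 at points of nondifferentiability.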
Now recall the constrained convex programming problem presented in
Chapter 1:
min f (x) subject to x ∈ C, (CP )


where f : Rn → R is a convex function and C is a convex subset of Rn. Recall the important property of convex optimization discussed in Section 1.3 that makes its study so useful: every local minimizer is also a global minimizer. The next result provides an alternative proof of this fact.

Theorem 2.90 Consider a convex set C ⊂ Rn and let f : Rn → R̄ be a


proper convex function. Then the point of local minimum is a point of global
minimum. If in addition f is strictly convex, there exists at most one global
point of minimum.

Proof. Suppose that x̄ ∈ Rn is a point of local minimum of f over C. We claim


that x̄ is a point of global minimum. On the contrary, assume that x̄ is not a
point of global minimum. Thus there exists x̃ ∈ C such that f (x̃) < f (x̄). By
the convexity of f , for every λ ∈ (0, 1),

f ((1 − λ)x̄ + λx̃) ≤ (1 − λ)f (x̄) + λf (x̃) < f (x̄). (2.20)

Also by the convexity of C, (1 − λ)x̄ + λx̃ ∈ C. Taking λ sufficiently small,


(1 − λ)x̄ + λx̃ is in the neighborhood of x̄, which by the inequality (2.20)
implies that

f ((1 − λ)x̄ + λx̃) < f (x̄),

which contradicts that x̄ is a point of local minimum. Hence, x̄ is a point of


global minimum of f over C.
Suppose that f is a strictly convex function with x̄ and ȳ as the points of
global minimum. Let f (x̄) = f (ȳ) = fmin , say. We claim that x̄ = ȳ. On the
contrary, assume that x̄ 6= ȳ. By Definition 2.47 of strict convexity, for every
λ ∈ (0, 1),

f ((1 − λ)x̄ + λȳ) < (1 − λ)f (x̄) + λf (ȳ) = fmin . (2.21)

By the convexity of C, (1 − λ)x̄ + λȳ ∈ (x̄, ȳ) ⊂ C. Now the strict inequality (2.21) contradicts the fact that x̄ and ȳ are points of global minimum of f over C. Thus, x̄ = ȳ, thereby implying that the problem of minimizing a strictly convex function f over a convex set C has at most one point of global minimum. 
As discussed earlier in this chapter, the above problem can be converted
into the unconstrained convex programming problem of the form (CPu ) with
the objective function f replaced by f + δC . From Theorem 2.89, x̄ is the point of minimizer of (CP ) if and only if

0 ∈ ∂(f + δC )(x̄).

To express the above inclusion explicitly in terms of the subdifferentials of


the objective function f and the indicator function δC , one needs the calculus
rules for the subdifferentials. Thus, following this path we shall now discuss
the subdifferential calculus rules.


2.4 Subdifferential Calculus


As we have already seen, subdifferentials play a pivotal role in convex analysis. They replace the derivative in the case of nondifferentiable convex functions. So it is natural to ask whether the rules of differential calculus carry over to the subdifferential. As we proceed in this direction, one will see that analogues of the standard calculus rules do hold, but under certain assumptions. We begin our journey of subdifferential calculus with the sum rule.

Theorem 2.91 (Moreau–Rockafellar Sum Rule) Consider two proper convex


functions φi : Rn → R̄, i = 1, 2. Suppose that ri dom φ1 ∩ ri dom φ2 6= ∅.
Then

∂(φ1 + φ2 )(x) = ∂φ1 (x) + ∂φ2 (x)

for every x ∈ dom(φ1 + φ2 ).

Proof. We first show that

∂φ1 (x̄) + ∂φ2 (x̄) ⊂ ∂(φ1 + φ2 )(x̄). (2.22)

Suppose that ξi ∈ ∂φi (x̄), i = 1, 2. By the definition of a subdifferential,

φi (x) − φi (x̄) ≥ hξi , x − x̄i, ∀ x ∈ Rn , i = 1, 2.

Therefore,

(φ1 + φ2 )(x) − (φ1 + φ2 )(x̄) ≥ hξ1 + ξ2 , x − x̄i, ∀ x ∈ Rn ,

which implies that (ξ1 + ξ2 ) ∈ ∂(φ1 + φ2 )(x̄), thereby establishing (2.22).


To obtain the result, we will now prove the reverse inclusion, that is,

∂(φ1 + φ2 )(x̄) ⊂ ∂φ1 (x̄) + ∂φ2 (x̄). (2.23)

Suppose that ξ ∈ ∂(φ1 + φ2 )(x̄). Define two convex functions

ψ1 (x) = φ1 (x + x̄) − φ1 (x̄) − hξ, xi and ψ2 (x) = φ2 (x + x̄) − φ2 (x̄).

Here, ψ1 (0) = ψ2 (0) = 0. Observe that ξ ∈ ∂(φ1 + φ2 )(x̄) which by the above
constructed functions is equivalent to

(ψ1 + ψ2 )(x) ≥ 0, ∀ x ∈ Rn ,

that is, 0 ∈ ∂(ψ1 + ψ2 )(0). Thus, without loss of generality, consider x̄ = 0,


ξ = 0, and φ1 (0) = φ2 (0) = 0 such that

0 ∈ ∂(φ1 + φ2 )(0),


which implies

(φ1 + φ2 )(x) ≥ (φ1 + φ2 )(0) = 0, ∀ x ∈ Rn ,

that is, φ1 (x) ≥ −φ2 (x) for every x ∈ Rn . Define

F1 = {(x, α) ∈ Rn × R : φ1 (x) ≤ α}

and F2 = {(x, α) ∈ Rn × R : α ≤ −φ2 (x)}.


Observe that both F1 and F2 are convex sets, where by Proposition 2.64,

ri F1 = ri epi φ1 = {(x, α) ∈ Rn × R : x ∈ ri dom φ1 , φ1 (x) < α}.

As φ1 (x) ≥ −φ2 (x), we have

ri F1 ∩ F2 = ∅

with (0, 0) ∈ F1 ∩F2 . Therefore, by the separation theorem, Theorem 2.26 (ii),
there exists (x∗ , α∗ ) ∈ Rn × R with (x∗ , α∗ ) 6= (0, 0) such that

hx∗ , xi + α∗ α ≥ 0, ∀ (x, α) ∈ F1 ,
hx∗ , xi + α∗ α ≤ 0, ∀ (x, α) ∈ F2 .

By assumption as φ1 (0) = 0, we have (0, α) ∈ F1 for α ≥ 0. Therefore, from


the inequality above, we have α∗ ≥ 0. We claim that α∗ 6= 0. Suppose that
α∗ = 0. Thus the above inequalities imply

hx∗ , x1 i ≥ 0 ≥ hx∗ , x2 i, ∀ x1 ∈ dom φ1 , ∀ x2 ∈ dom φ2 .

This implies that dom φ1 and dom φ2 can be separated, which contradicts
the hypothesis that ri dom φ1 ∩ ri dom φ2 6= ∅. Hence, α∗ > 0 and can be
normalized to one and thus

hx∗ , xi + α ≥ 0, ∀ (x, α) ∈ F1 ,
hx∗ , xi + α ≤ 0, ∀ (x, α) ∈ F2 .

In particular, for (x, φ1 (x)) ∈ F1 and (x, −φ2 (x)) ∈ F2 , we have −x∗ ∈ ∂φ1 (0)
and x∗ ∈ ∂φ2 (0), thereby leading to

0 ∈ ∂φ1 (0) + ∂φ2 (0),

thus establishing (2.23) and hence completing the proof. 
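As a simple illustration of the Sum Rule, take φ1(x) = |x| and φ2(x) = |x − 1| on R. Both functions are real-valued, so ri dom φ1 ∩ ri dom φ2 = R ≠ ∅. At x̄ = 0, ∂φ1(0) = [−1, 1] and ∂φ2(0) = {−1}, whence ∂(φ1 + φ2)(0) = [−1, 1] + {−1} = [−2, 0]. In particular, 0 ∈ ∂(φ1 + φ2)(0), confirming that x̄ = 0 is a point of minimizer of φ1 + φ2 over R.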


The necessity of the condition ri dom φ1 ∩ ri dom φ2 6= ∅ can be seen from
the following example from Phelps [93]. Consider φ1 , φ2 : R2 → R̄ defined as

φ1(x) = δF1(x), where F1 = {(y1, y2) ∈ R2 : y2 ≥ (y1)2} is the epigraph of the function y ↦ y2, and φ2(x) = δF2(x), where F2 = {(y1, y2) ∈ R2 : y2 = 0}.


Here, ∂(φ1 + φ2 )(0) = R2 whereas

∂φ1 (0) = {(0, ξ) ∈ R2 : ξ ≤ 0} and ∂φ2 (0) = {(0, ξ) ∈ R2 : ξ ∈ R}.

Therefore, ∂(φ1 + φ2 )(0) 6= ∂φ1 (0) + ∂φ2 (0). Observe that dom φ1 ∩ dom φ2 =
F1 ∩ F2 = {(0, 0)} while ri dom φ1 ∩ ri dom φ2 = ri F1 ∩ ri F2 = ∅.
Now as an application of the Subdifferential Sum Rule, we prove the equal-
ity in Proposition 2.39 (i) under the assumption of ri F1 ∩ ri F2 6= ∅.
Proof of Proposition 2.39 (i). For convex sets F1 , F2 ⊂ Rn , define φ1 = δF1
and φ2 = δF2 . Observe that dom φi = Fi for i = 1, 2. If ri F1 ∩ ri F2 6= ∅,
then ri dom φ1 ∩ ri dom φ2 6= ∅. Now applying the Sum Rule, Theorem 2.91,

∂(φ1 + φ2 )(x̄) = ∂φ1 (x̄) + ∂φ2 (x̄), ∀ x̄ ∈ dom φ1 ∩ dom φ2 ,

which along with the facts that δF1 + δF2 = δF1 ∩F2 and ∂δF = NF implies
that

NF1 ∩F2 (x̄) = NF1 (x̄) + NF2 (x̄), ∀ x̄ ∈ F1 ∩ F2 ,

hence completing the proof. 


Now if in Theorem 2.91, φi : Rn → R for i = 1, 2 are real-valued convex functions, then the Sum Rule can be derived using the directional derivative.
We briefly discuss that approach from Hiriart-Urruty and Lemaréchal [63].
Using Theorem 2.79, the support of ∂φ1 (x̄) + ∂φ2 (x̄) is φ′1 (x̄, .) + φ′2 (x̄, .).
Readers are advised to verify this fact using the definition of support. Also,
the support of ∂(φ1 + φ2)(x̄) is (φ1 + φ2)′(x̄, .) = φ′1(x̄, .) + φ′2(x̄, .), which is thus the same as that of ∂φ1(x̄) + ∂φ2(x̄). Because the support functions of both compact convex sets coincide,

∂(φ1 + φ2 )(x̄) = ∂φ1 (x̄) + ∂φ2 (x̄).

Observe that no additional assumption was required, as here both ri dom φ1 and ri dom φ2 equal Rn.
Other than the sum of convex functions being convex, from Proposi-
tion 2.53, we have that the composition of a nondecreasing convex function
with a convex function is also convex. So before presenting the Chain Rule, we introduce the notion of a nondecreasing function defined over Rn and a result on the subdifferential of a nondecreasing function. Recall that in Proposition 2.53,
the nondecreasing function ψ was defined over R.

Definition 2.92 A function φ : Rn → R is called nondecreasing if for x, y ∈ Rn with xi ≥ yi, i = 1, 2, . . . , n, one has φ(x) ≥ φ(y).

Theorem 2.93 Consider a nondecreasing convex function φ : Rn → R. Then


for every x ∈ Rn , ∂φ(x) ⊂ Rn+ .


Proof. Because φ is a nondecreasing convex function,


φ(x̄) ≥ φ(x̄ − ei ) ≥ φ(x̄) + hξ, −ei i,
where ei = (0, . . . , 0, 1, 0, . . . , 0) with 1 at the i-th place and ξ ∈ ∂φ(x̄). This
implies that
φ(x̄) ≥ φ(x̄) − ξi ,
that is, ξi ≥ 0. Since i was arbitrary, ξi ≥ 0, i = 1, 2, . . . , n and thus
∂φ(x̄) ⊂ Rn+ . 
We now present the subdifferential calculus rule of the composition of
convex functions. The proof is from Hiriart-Urruty and Lemaréchal [63].
Theorem 2.94 (Chain Rule) Consider a nondecreasing convex function φ : Rm → R and a vector-valued function Φ : Rn → Rm given by Φ(x) = (φ1(x), φ2(x), . . . , φm(x)), where each φi : Rn → R, i = 1, 2, . . . , m, is a convex function. Then

∂(φ ◦ Φ)(x̄) = { Σ_{i=1}^{m} µi ξi : (µ1, µ2, . . . , µm) ∈ ∂φ(Φ(x̄)), ξi ∈ ∂φi(x̄), i = 1, 2, . . . , m }.
Proof. Define

F = { Σ_{i=1}^{m} µi ξi : (µ1, µ2, . . . , µm) ∈ ∂φ(Φ(x̄)), ξi ∈ ∂φi(x̄), i = 1, 2, . . . , m }.

We will prove the result in the following steps:


1. We shall show that F is a convex compact set, as is ∂(φ ◦ Φ)(x̄).
2. We shall calculate the support function of F.
3. We shall calculate the support function of ∂(φ ◦ Φ)(x̄) and establish that it is the same as the support function of F.
The result then follows from the fact that two closed convex sets are equal if and only if their support functions are equal.
Step 1: Consider any ξ ∈ F. Thus there exist (µ1 , µ2 , . . . , µm ) ∈ ∂φ(Φ(x̄))
and ξi ∈ ∂φi (x̄), i = 1, 2, . . . , m, such that
ξ = Σ_{i=1}^{m} µi ξi.

Therefore,

kξk ≤ Σ_{i=1}^{m} |µi| kξik.

© 2012 by Taylor & Francis Group, LLC


102 Tools for Convex Optimization

By Proposition 2.83, ∂φ(Φ(x̄)) as well as ∂φi(x̄), i = 1, 2, . . . , m, are bounded sets and hence ξ is bounded. Because ξ ∈ F was arbitrary, F is a bounded set. Moreover, ∂φ(Φ(x̄)) and ∂φi(x̄), i = 1, 2, . . . , m, are compact sets, and F is the image of their product under the continuous map (µ, ξ1, . . . , ξm) ↦ Σ_{i=1}^{m} µi ξi; thus F is also a closed set, thereby yielding the compactness of F.
Suppose that ξ1, ξ2 ∈ F, which implies that for j = 1, 2,

ξj = Σ_{i=1}^{m} µji ξij,

where (µj1, µj2, . . . , µjm) ∈ ∂φ(Φ(x̄)) and ξij ∈ ∂φi(x̄), i = 1, 2, . . . , m, for j = 1, 2. Now for any λ ∈ (0, 1), define

ξλ = (1 − λ)ξ1 + λξ2.

From Theorem 2.93, µji ≥ 0 for i = 1, 2, . . . , m and j = 1, 2. Define

µλi = (1 − λ)µ1i + λµ2i, i = 1, 2, . . . , m.

Note that µλi = 0 only when µ1i = µ2i = 0 as λ ∈ (0, 1). Therefore,

ξλ = Σ_{i∈Ī} µλi ( ((1 − λ)µ1i/µλi) ξi1 + (λµ2i/µλi) ξi2 ),

where Ī = {i ∈ {1, 2, . . . , m} : µλi > 0}. By Proposition 2.83, ∂φ(Φ(x̄)) and ∂φi(x̄), i = 1, 2, . . . , m, are convex sets and hence

(µλ1, µλ2, . . . , µλm) ∈ ∂φ(Φ(x̄)) and ((1 − λ)µ1i/µλi) ξi1 + (λµ2i/µλi) ξi2 ∈ ∂φi(x̄), i ∈ Ī,

so that ξλ ∈ F, thereby showing that F is convex.
Step 2: Denote

Φ′ (x̄, d) = (φ′1 (x̄, d), φ′2 (x̄, d), . . . , φ′m (x̄, d)).

We will establish that

σF (d) = φ′ (Φ(x̄), Φ′ (x̄, d)).

Consider ξ ∈ F, which implies that

ξ = Σ_{i=1}^{m} µi ξi,

where (µ1 , µ2 , . . . , µm ) ∈ ∂φ(Φ(x̄)) and ξi ∈ ∂φi (x̄), i = 1, 2, . . . , m. By The-


orem 2.79,

hξi , di ≤ φ′i (x̄, d), i = 1, 2, . . . , m.


By Theorem 2.93, µi ≥ 0, i = 1, 2, . . . , m, which along with the above in-


equality implies that
hξ, di = Σ_{i=1}^{m} µi hξi, di ≤ Σ_{i=1}^{m} µi φ′i(x̄, d).

As µ = (µ1, µ2, . . . , µm) ∈ ∂φ(Φ(x̄)),

Σ_{i=1}^{m} µi φ′i(x̄, d) = hµ, Φ′(x̄, d)i ≤ φ′(Φ(x̄), Φ′(x̄, d)).

We claim that there exists ξ̄ ∈ F such that

hξ̄, di = φ′(Φ(x̄), Φ′(x̄, d)).

By Proposition 2.83, ∂φ(Φ(x̄)) is compact and therefore, there exists


µ̄ = (µ̄1 , µ̄2 , . . . , µ̄m ) ∈ ∂φ(Φ(x̄)) such that
Σ_{i=1}^{m} µ̄i φ′i(x̄, d) = hµ̄, Φ′(x̄, d)i = φ′(Φ(x̄), Φ′(x̄, d)). (2.24)

Also, for i = 1, 2, . . . , m, ∂φi(x̄) is compact, which implies there exists ξ̄i ∈ ∂φi(x̄) such that

hξ̄i, di = φ′i(x̄, d), i = 1, 2, . . . , m.

Therefore, the condition (2.24) becomes

Σ_{i=1}^{m} µ̄i hξ̄i, di = φ′(Φ(x̄), Φ′(x̄, d)).

Denoting ξ̄ = Σ_{i=1}^{m} µ̄i ξ̄i ∈ F,

hξ̄, di = φ′(Φ(x̄), Φ′(x̄, d)),

which implies that

σF(d) = φ′(Φ(x̄), Φ′(x̄, d)), ∀ d ∈ Rn.

Step 3: It is obvious that the support function of ∂(φ ◦ Φ)(x̄) is (φ ◦ Φ)′ (x̄, d).
We claim that

(φ ◦ Φ)′ (x̄, d) = φ′ (Φ(x̄), Φ′ (x̄, d)).


For real-valued convex functions φi , i = 1, 2, . . . , m, from Definition 2.74 of


directional derivative, it is obvious that
φi (x̄ + λd) = φi (x̄) + λφ′i (x̄, d) + o(λ), i = 1, 2, . . . , m,
which implies that
Φ(x̄ + λd) = Φ(x̄) + λΦ′ (x̄, d) + o(λ).
By Theorem 2.73, φ is locally Lipschitz on ri dom φ = Rn , which yields
φ(Φ(x̄ + λd)) = φ(Φ(x̄) + λΦ′ (x̄, d)) + o(λ),
which again by the definition of φ′ leads to
φ(Φ(x̄ + λd)) = φ(Φ(x̄)) + λφ′ (Φ(x̄), Φ′ (x̄, d)) + o(λ).
Dividing throughout by λ > 0 and taking the limit as λ → 0 reduces the
above condition to
(φ ◦ Φ)′ (x̄, d) = φ′ (Φ(x̄), Φ′ (x̄, d)).
Because the support functions of both sets are the same, the sets ∂(φ ◦ Φ)(x̄) and F coincide. 
As we will discuss in this book, one of the ways to derive the optimality
conditions for (CP ) is the max-function approach, thereby hinting at the use
of subdifferential calculus for the max-function. Consider the convex functions
φi : Rn → R, i = 1, 2, . . . , m, and define the max-function
φ(x) = max{φ1 (x), φ2 (x), . . . , φm (x)}.
Observe that φ can be expressed as a composition of the functions
Φ(x) = (φ1 (x), φ2 (x), . . . , φm (x)) and ϕ(y) = max{y1 , y2 , . . . , ym }
given by φ(x) = (ϕ◦Φ)(x). It is now natural to apply the Chain Rule presented
above but along with that one needs to calculate ∂ϕ or ϕ′ (x, d). So before
moving on to establish the Max-Function Rule, we will present a result to
derive ϕ′ (x, d). The proof is from Hiriart-Urruty [59].
Theorem 2.95 Consider differentiable convex functions ϕi : Rn → R for
i = 1, 2, . . . , m. For x ∈ Rn , define
ϕ(x) = max{ϕ1 (x), ϕ2 (x), . . . , ϕm (x)}
and denote the active index set by I(x) defined as
I(x) = {i ∈ {1, 2, . . . , m} : ϕ(x) = ϕi (x)}.
Then
ϕ′(x̄, d) = max_{i∈I(x̄)} {h∇ϕi(x̄), di}.


Proof. Without loss of generality, assume that I(x̄) = {1, 2, . . . , m} because


those ϕi where the maximum is not attained, do not affect ϕ′ (x̄, d). By the
definition of the max-function,

ϕ(x̄ + λd) ≥ ϕi (x̄ + λd), ∀ i = 1, 2, . . . , m,

which implies that

ϕ(x̄ + λd) − ϕ(x̄) ≥ ϕi (x̄ + λd) − ϕ(x̄), ∀ i = 1, 2, . . . , m.

As ϕ(x̄) = ϕi (x̄) for i ∈ I(x̄),

ϕ(x̄ + λd) − ϕ(x̄) ≥ ϕi (x̄ + λd) − ϕi (x̄), ∀ i ∈ I(x̄).

By Definition 2.74 of the directional derivative,


ϕ′(x̄, d) ≥ lim_{λ↓0} (ϕi(x̄ + λd) − ϕi(x̄))/λ, ∀ i ∈ I(x̄).

Because ϕi, i ∈ I(x̄), are differentiable functions, this along with the above inequality yields

ϕ′(x̄, d) ≥ max_{i∈I(x̄)} h∇ϕi(x̄), di.

To establish the result, we will prove the reverse inequality, that is,

ϕ′(x̄, d) ≤ max_{i∈I(x̄)} h∇ϕi(x̄), di.

We claim that there exists a neighborhood N(x̄) such that I(x) ⊂ I(x̄) for every x ∈ N(x̄). On the contrary, assume that there exists {xk} ⊂ Rn with xk → x̄ such that I(xk) ⊄ I(x̄). Therefore, we may choose ik ∈ I(xk) with ik ∉ I(x̄). As {ik} ⊂ {1, 2, . . . , m} for every k ∈ N, by the Bolzano–Weierstrass Theorem, Proposition 1.3, it has a convergent subsequence; since the ik are integers, this subsequence is eventually constant, say equal to ī. Without loss of generality, suppose that ik = ī for every k ∈ N, so that ī ∈ I(xk), which implies ϕī(xk) = ϕ(xk). By Theorem 2.69, the functions are continuous on Rn. Thus ϕī(x̄) = ϕ(x̄), that is, ī ∈ I(x̄). But ik ∉ I(x̄) for every k ∈ N implies that ī ∉ I(x̄), which is a contradiction, thereby establishing the claim.
Now consider {λk} ⊂ R+ such that λk ↓ 0. Observe that

ϕik(x̄ + λk d) = ϕ(x̄ + λk d), ∀ ik ∈ I(x̄ + λk d).

By the claim above, for sufficiently large k ∈ N we may choose ik ∈ I(x̄ + λk d) ⊂ I(x̄). Because the index set is finite, the sequence {ik} has a subsequence that is constant, say equal to ī ∈ I(x̄); without loss of generality, we may take ik = ī for every k. Therefore, as ϕ(x̄) = ϕī(x̄),

lim_{k→∞} (ϕ(x̄ + λk d) − ϕ(x̄))/λk ≤ lim_{k→∞} (ϕī(x̄ + λk d) − ϕī(x̄))/λk = h∇ϕī(x̄), di ≤ max_{i∈I(x̄)} h∇ϕi(x̄), di.


By Theorem 2.76, the directional derivative of a convex function always exists


and therefore,

ϕ′(x̄, d) = lim_{λ↓0} (ϕ(x̄ + λd) − ϕ(x̄))/λ ≤ max_{i∈I(x̄)} h∇ϕi(x̄), di,

hence completing the proof. 


We are now in a position to obtain the Subdifferential Max-Function Rule
as an application of the Chain Rule, Theorem 2.94, and the result Theo-
rem 2.95 established above.

Theorem 2.96 (Max-Function Rule) Consider convex functions φi : Rn →


R, i = 1, 2, . . . , m, and let φ(x) = max{φ1 (x), φ2 (x), . . . , φm (x)}. Then
∂φ(x̄) = co ∪_{i∈I(x̄)} ∂φi(x̄),

where I(x̄) denotes the active index set.

Proof. In the discussion preceding Theorem 2.95, we observed that φ = ϕ ◦Φ,


where

Φ(x) = (φ1 (x), φ2 (x), . . . , φm (x)) and ϕ(y) = max{y1 , y2 , . . . , ym }

with y = (y1, y2, . . . , ym) ∈ Rm. By Theorem 2.95,

ϕ′(y, d) = max_{i∈I′(y)} {hei, di},

where ei = (0, . . . , 0, 1, 0, . . . , 0) ∈ Rm with 1 at the i-th place and


I ′ (y) = {i ∈ {1, 2, . . . , m} : yi = ϕ(y)}. It is obvious that ϕ′ (y, .) is a support
function of {ei ∈ Rm : i ∈ I ′ (y)} and by Proposition 2.61, it is also the support
function of co {ei ∈ Rm : i ∈ I ′ (y)}. Therefore, by Theorem 2.79,

∂ϕ(y) = co {ei ∈ Rm : i ∈ I ′ (y)},

that is,

∂ϕ(y) = {(µ1, µ2, . . . , µm) ∈ Rm : µi ≥ 0, i ∈ I′(y), µi = 0, i ∉ I′(y), Σ_{i=1}^{m} µi = 1}.

Thus,

∂ϕ(Φ(x̄)) = {(µ1, µ2, . . . , µm) ∈ Rm : µi ≥ 0, i ∈ I′(Φ(x̄)), µi = 0, i ∉ I′(Φ(x̄)), Σ_{i=1}^{m} µi = 1}.


As I ′ (Φ(x̄)) = I(x̄), the above condition reduces to

∂ϕ(Φ(x̄)) = {(µ1, µ2, . . . , µm) ∈ Rm : µi ≥ 0, i ∈ I(x̄), µi = 0, i ∉ I(x̄), Σ_{i=1}^{m} µi = 1}.

As ϕ is a nondecreasing convex function, applying Theorem 2.94 to φ = ϕ ◦ Φ


yields

∂φ(x̄) = { Σ_{i=1}^{m} µi ξi : (µ1, µ2, . . . , µm) ∈ ∂ϕ(Φ(x̄)), ξi ∈ ∂φi(x̄), i = 1, 2, . . . , m }
= { Σ_{i=1}^{m} µi ξi : µi ≥ 0, i ∈ I(x̄), µi = 0, i ∉ I(x̄), Σ_{i=1}^{m} µi = 1, ξi ∈ ∂φi(x̄), i = 1, 2, . . . , m },

which implies
∂φ(x̄) = co ∪_{i∈I(x̄)} ∂φi(x̄),

thereby leading to the desired result. 
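Returning to the function φ(x) = max{x, x2} discussed earlier, both component functions are active at x̄ = 0 and at x̄ = 1. The Max-Function Rule yields ∂φ(0) = co {1, 0} = [0, 1] and ∂φ(1) = co {1, 2} = [1, 2], while at every other point ∂φ(x) is the singleton containing the gradient of the unique active function. In particular, 0 ∈ ∂φ(0), so x̄ = 0 is the point of minimizer of φ over R, as observed before.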


Observe that in the Max-Function Rule above, the maximum was over a finite index set. Now if the index set is a compact set, not necessarily finite, what will the subdifferential of the sup-function be? This aspect was looked into by Valadier [109], and the resulting formula is thus also referred to as the Valadier Formula. Below we present the Valadier Formula from Ruszczyński [102].

Theorem 2.97 Consider a function

Φ(x) = sup_{y∈Y} φ(x, y),

where φ : Rn × Y → R̄. Let x̄ ∈ dom Φ such that


(i) φ(., y) is convex for every y ∈ Y ,
(ii) φ(x, .) is usc for every x ∈ Rn ,
(iii) Y ⊂ Rm is compact.
Furthermore, if φ(., y) is continuous at x̄ for every y ∈ Y , then
∂Φ(x̄) = co ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y),

where Ŷ (x̄) = {y ∈ Y : φ(x̄, y) = Φ(x̄)} and ∂x φ denotes the subdifferential


with respect to x.


Proof. Observe that (ii) and (iii) ensure that Ŷ (x̄) is nonempty and compact.
Suppose that ξ ∈ ∂x φ(x̄, ȳ) for some ȳ ∈ Ŷ (x̄). By Definition 2.77 of the
subdifferential,

φ(x, ȳ) − φ(x̄, ȳ) ≥ hξ, x − x̄i, ∀ x ∈ Rn .

As ȳ ∈ Ŷ (x̄), φ(x̄, ȳ) = Φ(x̄). Therefore, the above inequality leads to

Φ(x) − Φ(x̄) = sup_{y∈Y} φ(x, y) − φ(x̄, ȳ) ≥ hξ, x − x̄i, ∀ x ∈ Rn,

thus implying that ξ ∈ ∂Φ(x̄). Because ȳ ∈ Ŷ (x̄) and ξ ∈ ∂x φ(x̄, ȳ) were
arbitrary,

∂Φ(x̄) ⊃ ∂x φ(x̄, y), ∀ y ∈ Ŷ (x̄).

Because ∂Φ(x̄) is convex, the preceding inclusion yields


∂Φ(x̄) ⊃ co ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y).

To establish the converse, we will prove the reverse inclusion in the above relation. Because ∂Φ(x̄) is closed, we first show that ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y) is closed. Suppose that ξk ∈ ∂x φ(x̄, yk), where {yk} ⊂ Ŷ(x̄), such that ξk → ξ̄.
Because Ŷ (x̄) is compact and hence closed, {yk } is a bounded sequence. By
the Bolzano–Weierstrass Theorem, Proposition 1.3, it has a convergent subse-
quence. Without loss of generality, suppose yk → ȳ, which by the closedness
of Ŷ (x̄) implies that ȳ ∈ Ŷ (x̄). By the definition of subdifferential along with
the facts that {yk } ⊂ Ŷ (x̄) and ȳ ∈ Ŷ (x̄), that is, φ(x̄, yk ) = Φ(x̄) = φ(x̄, ȳ)
imply that for every x ∈ Rn ,

φ(x, yk ) ≥ φ(x̄, yk ) + hξk , x − x̄i


= φ(x̄, ȳ) + hξk , x − x̄i, ∀ k ∈ N.

Taking the limit supremum as k → +∞, which by the upper semicontinuity of φ(x, .) for every x ∈ Rn leads to

φ(x, ȳ) ≥ lim sup_{k→∞} φ(x, yk) ≥ φ(x̄, ȳ) + lim sup_{k→∞} hξk, x − x̄i = φ(x̄, ȳ) + hξ̄, x − x̄i, ∀ x ∈ Rn,

thereby yielding that ξ̄ ∈ ∂x φ(x̄, ȳ). Hence, ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y) is closed.
Now let us assume on the contrary that
∂Φ(x̄) ⊄ co ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y),


that is, there exists ξ̄ ∈ ∂Φ(x̄) such that

ξ̄ ∉ co ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y).

As co ∪_{y∈Ŷ(x̄)} ∂x φ(x̄, y) is a closed convex set, by the Strict Separation Theorem, Theorem 2.26 (iii), there exists d ∈ Rn with d ≠ 0 such that

hξ̄, di > hξ, di, ∀ ξ ∈ ∂x φ(x̄, y), ∀ y ∈ Ŷ(x̄). (2.25)

Consider a sequence {λk} ⊂ R+ such that λk → 0. As Φ is convex, by the definition of the subdifferential,

(Φ(x̄ + λk d) − Φ(x̄))/λk ≥ hξ̄, di. (2.26)

For k ∈ N, define the set

Yk = { y ∈ Y : (φ(x̄ + λk d, y) − Φ(x̄))/λk ≥ hξ̄, di }.

We claim that Yk is compact and nonempty. Consider {yr} ⊂ Yk such that yr → ŷ. Because yr ∈ Yk,

(φ(x̄ + λk d, yr) − Φ(x̄))/λk ≥ hξ̄, di.

Taking the limit supremum as r → +∞, which along with the upper semicontinuity of φ(x, .) for every x ∈ Rn implies that

(φ(x̄ + λk d, ŷ) − Φ(x̄))/λk ≥ hξ̄, di.

Thus, ŷ ∈ Yk and hence Yk is closed for every k ∈ N. As Yk ⊂ Y and Y


is compact, Yk is closed and bounded and thus compact. Also by the upper
semicontinuity of φ(x, .) for every x ∈ Rn , Ŷ (x̄ + λk d) is nonempty. From the
inequality (2.26) and the definition of the set Yk , Ŷ (x̄ + λk d) ⊂ Yk and hence
Yk is nonempty. For every y ∈ Y , consider the expression

φ(x̄ + λd, y) − Φ(x̄) φ(x̄ + λd, y) − φ(x̄, y) φ(x̄, y) − Φ(x̄)


= + . (2.27)
λ λ λ
From the discussion preceding Theorem 2.76 on directional derivatives, the
first term on the right-hand side of the above expression is a nondecreasing
function of λ, that is,

(φ(x̄ + λ1 d, y) − φ(x̄, y))/λ1 ≤ (φ(x̄ + λ2 d, y) − φ(x̄, y))/λ2, ∀ 0 < λ1 ≤ λ2. (2.28)


Also, as Φ(x̄) ≥ φ(x̄, y) for every y ∈ Y,

(φ(x̄, y) − Φ(x̄))/λ1 ≤ (φ(x̄, y) − Φ(x̄))/λ2, ∀ 0 < λ1 ≤ λ2, (2.29)

which implies that the second term is also nondecreasing in λ. Thus, combining the conditions (2.28) and (2.29),

(φ(x̄ + λ1 d, y) − Φ(x̄))/λ1 ≤ (φ(x̄ + λ2 d, y) − Φ(x̄))/λ2, ∀ 0 < λ1 ≤ λ2,
that is, the expression (2.27) is nondecreasing in λ. From the above inequality,
it is obvious that Y1 ⊂ Y2 for every 0 < λ1 ≤ λ2 . As {λk } is a decreasing
sequence,
Y1 ⊃ Y2 ⊃ Y3 ⊃ . . . .
As the sets Yk form a nonincreasing sequence of nonempty compact sets, their intersection is nonempty, that is, there exists ỹ ∈ Yk for all k ∈ N. Therefore,

(φ(x̄ + λk d, ỹ) − Φ(x̄))/λk ≥ hξ̄, di, ∀ k ∈ N,

which implies that the term on the left-hand side is bounded below for every k ∈ N. By the continuity of φ(., y) at x̄ for every y ∈ Y, φ(x̄ + λk d, ỹ) → φ(x̄, ỹ), which along with the above lower bound yields that ỹ ∈ Ŷ(x̄), that is, Φ(x̄) = φ(x̄, ỹ). Taking the limit as k → +∞ in the above inequality along with Definition 2.74 of the directional derivative implies that

φ′((x̄, ỹ), d) ≥ hξ̄, di.
As φ(., ỹ) is continuous at x̄, some neighborhood of x̄ is contained in dom φ(., ỹ). Thus, x̄ ∈ int dom φ(., ỹ), which by Theorem 2.79 implies that φ′((x̄, ỹ), .) is the support function of ∂x φ(x̄, ỹ). Also, by Proposition 2.82, ∂x φ(x̄, ỹ) is compact. Therefore, there exists ξ ∈ ∂x φ(x̄, ỹ) such that the above inequality becomes

hξ, di ≥ hξ̄, di,

thereby contradicting the inequality (2.25) as ỹ ∈ Ŷ(x̄), hence completing the proof. 
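As a simple illustration of the Valadier Formula, consider φ(x, y) = xy with Y = [−1, 1] ⊂ R, so that Φ(x) = sup_{y∈[−1,1]} xy = |x|. All the hypotheses of the theorem hold. At x̄ = 0, Ŷ(0) = [−1, 1] and ∂x φ(0, y) = {y}, so the formula gives ∂Φ(0) = co ∪_{y∈[−1,1]} {y} = [−1, 1], which is exactly the subdifferential of the norm function at the origin, whereas for x̄ > 0, Ŷ(x̄) = {1} and ∂Φ(x̄) = {1}.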
From Proposition 2.56, another operation on the convex functions that
leads to a convex function is the inf-convolution. We end this section of the
subdifferential calculus rules by presenting the subdifferential rule for the inf-
convolution for a particular case from Lucchetti [79].
Theorem 2.98 Consider proper lsc convex functions φi : Rn → R̄, i = 1, 2.
Let x̄, x1 , x2 ∈ Rn be such that
x1 + x2 = x̄ and (φ1 □ φ2)(x̄) = φ1(x1) + φ2(x2).

Then

∂(φ1 □ φ2)(x̄) = ∂φ1(x1) ∩ ∂φ2(x2).


Proof. Suppose that ξ ∈ ∂φ1 (x1 ) ∩ ∂φ2 (x2 ). By Definition 2.77 of the subd-
ifferential, for i = 1, 2,

φi (yi ) − φi (xi ) ≥ hξ, yi − xi i, ∀ yi ∈ Rn .

Define y1 + y2 = ȳ. The above inequality along with the given hypothesis leads to

φ1(y1) + φ2(y2) ≥ (φ1 □ φ2)(x̄) + hξ, ȳ − x̄i, ∀ y1, y2 ∈ Rn.

Taking the infimum over y1 and y2 satisfying y1 + y2 = ȳ in the above inequality, which by Definition 2.54 of the inf-convolution yields

(φ1 □ φ2)(ȳ) ≥ (φ1 □ φ2)(x̄) + hξ, ȳ − x̄i.

As ȳ ∈ Rn was arbitrary, the above inequality holds for every ȳ ∈ Rn. Thus, ξ ∈ ∂(φ1 □ φ2)(x̄). Because ξ ∈ ∂φ1(x1) ∩ ∂φ2(x2) was arbitrary, ∂φ1(x1) ∩ ∂φ2(x2) ⊂ ∂(φ1 □ φ2)(x̄).
Conversely, suppose that ξ ∈ ∂(φ1 □ φ2)(x̄). Therefore,

(φ1 □ φ2)(ȳ) ≥ φ1(x1) + φ2(x2) + hξ, ȳ − x̄i, ∀ ȳ ∈ Rn.

As the above inequality holds for any ȳ ∈ Rn, in particular take ȳ = x + x2 for an arbitrary x ∈ Rn. Substituting in the above inequality along with the definition of the inf-convolution yields

φ1(x) + φ2(x2) ≥ (φ1 □ φ2)(x + x2) ≥ φ1(x1) + φ2(x2) + hξ, (x + x2) − (x1 + x2)i, ∀ x ∈ Rn,

which implies that

φ1(x) ≥ φ1(x1) + hξ, x − x1i, ∀ x ∈ Rn.

Therefore, ξ ∈ ∂φ1(x1). Similarly, it can be shown that ξ ∈ ∂φ2(x2) and hence ξ ∈ ∂φ1(x1) ∩ ∂φ2(x2). Because ξ ∈ ∂(φ1 □ φ2)(x̄) was arbitrary, ∂(φ1 □ φ2)(x̄) ⊂ ∂φ1(x1) ∩ ∂φ2(x2), thereby establishing the result. 

2.5 Conjugate Functions


All this background on convexity, convex sets as well as convex functions, the
subdifferentials and their calculus form a backbone for the study of convex
optimization theory. Optimization problems appear not only in the specialized
fields of engineering, management sciences, and finance, but also in some sim-
ple real-life problems. For instance, if the cost of manufacturing x1 , x2 , . . . , xn
quantities of n goods is given by φ(x) and the price of selling these goods


is ξ1 , ξ2 , . . . , ξn , respectively, then the manufacturer would like to choose the


quantities x1 , x2 , . . . , xn in such a way that it leads to maximum profit, where
the profit function is given by the affine function {hξ, xi − φ(x)}. Theoret-
ically, this problem had been expressed using the conjugate functions of φ
introduced by Fenchel [45], which forms a class of convex functions. As we
will see in a short while, these conjugate functions are related to not only the
subdifferential for a convex function by the Fenchel–Young inequality but also
to the ε-subdifferential via its epigraph. For convex functions, the very idea of
conjugacy seems to derive from the fact that a proper lsc convex function is
a pointwise supremum of affine functions majorized by it. But before moving
on to this result, we present the following lemma from Lucchetti [79].

Lemma 2.99 Consider a proper lsc convex function φ : Rn → R̄. Let


x̄ ∈ dom φ and γ ∈ R such that φ(x̄) > γ. Then there exists (a, b) ∈ Rn × R
such that the affine function h(x) = ha, xi + b satisfies

φ(x) ≥ h(x), ∀ x ∈ Rn and h(x̄) > γ.

Proof. As φ is an lsc convex function, by Theorem 1.9 and Proposition 2.48,


epi φ is a closed convex set in Rn × R. From the given hypothesis, it is obvious
that (x̄, γ) ∈
/ epi φ. By the Strict Separation Theorem, Theorem 2.26 (iii),
there exist (a, λ) ∈ Rn × R with (a, λ) 6= (0, 0) and b ∈ R such that

ha, xi + λα ≥ b > ha, x̄i + λγ, ∀ (x, α) ∈ epi φ. (2.30)

In particular, taking (x̄, φ(x̄)) ∈ epi φ, the above inequality reduces to

λ(φ(x̄) − γ) > 0.

As φ(x̄) > γ, the above strict inequality leads to λ > 0. Again, taking
(x, φ(x)) ∈ epi φ in the condition (2.30) yields

φ(x) ≥ h(x), ∀ x ∈ dom φ and h(x̄) > γ,


where h(x) = h−a/λ, xi + b/λ. Observe that for x ∉ dom φ, the first inequality holds trivially, that is,

φ(x) ≥ h(x), ∀ x ∈ Rn ,

thereby establishing the result. 


Now we present the main result, the proof of which is from Lucchetti [79].

Theorem 2.100 A proper lsc convex function φ : Rn → R̄ can be expressed


as a pointwise supremum of the collection of all affine functions majorized by
it, that is, for every x ∈ Rn ,

φ(x) = sup{h(x) : φ(x) ≥ h(x), h(x) = ha, xi + b, a ∈ Rn , b ∈ R}.


Proof. Define the function Φ : Rn → R̄ as

Φ(x) = sup{h(x) : φ(x) ≥ h(x), h(x) = ha, xi + b, a ∈ Rn , b ∈ R}.

Because Φ is a pointwise supremum of affine functions, it is an lsc convex function. Also, as Φ is the supremum over affine functions h satisfying h(x) ≤ φ(x), we have Φ(x) ≤ φ(x) for every x ∈ Rn, which implies that epi φ is contained in the intersection of the epigraphs epi h of the affine functions h majorized by φ, that is, of those h with h(x) ≤ φ(x) for every x ∈ Rn. Therefore, to complete the proof, it is sufficient to prove that for (x̄, γ) ∉ epi φ, there exists an affine function h majorized by φ such that h(x̄) > γ. By Lemma 2.99, for x̄ ∈ dom φ such an h exists.
Now suppose that x̄ ∉ dom φ. As (x̄, γ) ∉ epi φ, working along the lines
of the proof of Lemma 2.99, there exist (a, λ) ∈ Rn × R with (a, λ) 6= (0, 0)
and b ∈ R such that

ha, xi + λα ≥ b > ha, x̄i + λγ, ∀ (x, α) ∈ epi φ.

If λ 6= 0, the affine function h exists as in the lemma. If λ = 0, the above


inequality reduces to

ha, xi ≥ b > ha, x̄i, ∀ x ∈ dom φ.

From the above condition,

h(x) ≤ 0, ∀ x ∈ dom φ and h(x̄) > 0,

where h(x) = h−a, xi + b. As a consequence of Lemma 2.99, it is obvious that


a proper lsc convex function has at least one affine function majorized by it.
Therefore, φ has an affine function, say h̄, majorized by it, that is φ(x) ≥ h̄(x)
for every x ∈ Rn . Now for any µ > 0,

φ(x) ≥ h(x) + µh̄(x), ∀ x ∈ dom φ.

The above inequality holds trivially for x ∈


/ dom φ. Thus,

φ(x) ≥ (h + µh̄)(x), ∀ x ∈ Rn ,

which implies the affine function (h + µh̄) is majorized by φ. As h(x̄) > 0, for
µ sufficiently large, (h + µh̄)(x̄) > γ, thereby establishing the result. 
Denote the set of all affine functions by H. Consider the support set of φ
denoted by supp(φ, H), which is the collection of all affine functions majorized
by φ, that is,

supp(φ, H) = {h ∈ H : h(x) ≤ φ(x), ∀ x ∈ Rn }.

An affine function h ∈ H is the affine support of φ if

h(x) ≤ φ(x), ∀ x ∈ Rn and h(x̄) = φ(x̄), for some x̄ ∈ Rn .


Consider φ : Rn → R̄ and x̄ ∈ dom φ such that ∂φ(x̄) is nonempty. Then for


any ξ ∈ ∂φ(x̄), by Definition 2.77,

φ(x) ≥ hξ, xi + (φ(x̄) − hξ, x̄i), ∀ x ∈ Rn . (2.31)

Define an affine function h : Rn → R given by

h(x) = hξ, xi + (φ(x̄) − hξ, x̄i). (2.32)

Combining (2.31) and (2.32),

h(x) ≤ φ(x), ∀ x ∈ Rn and h(x̄) = φ(x̄),

thereby implying that h ∈ H is an affine support of φ. Therefore, if ∂φ(x̄) is nonempty, then φ admits an affine support at x̄.
Now consider a set Φ∗ ⊂ Rn × R defined as

Φ∗ = {(ξ̄, ᾱ) ∈ Rn × R : h(x) = hξ̄, xi − ᾱ ≤ φ(x), ∀ x ∈ Rn},

that is, the affine function h satisfies h(x) ≤ φ(x) for every x ∈ Rn. Therefore,

ᾱ ≥ sup_{x∈Rn} {hξ̄, xi − φ(x)},

which implies Φ∗ can be considered the epigraph of the function φ∗, which is the conjugate of φ. We formally introduce the notion of conjugate below.

the conjugate of φ. We formally introduce the notion of conjugate below.

Definition 2.101 Consider a function φ : Rn → R̄. The conjugate of φ,


φ∗ : Rn → R̄, is defined as

φ∗(ξ) = sup_{x∈Rn} {hξ, xi − φ(x)}.

Observe that Φ∗ = epi φ∗ , as discussed above. The biconjugate of φ, φ∗∗ , is


the conjugate of φ∗ , that is,

φ∗∗(x) = sup_{ξ∈Rn} {hξ, xi − φ∗(ξ)}.

Consider a set F ⊂ Rn . The conjugate of the indicator function to the set


F is

δF∗(ξ) = sup_{x∈Rn} {hξ, xi − δF(x)} = sup_{x∈F} hξ, xi,

which is actually the support function to the set F . Therefore, δF∗ = σF for
any set F .
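For another simple example, consider φ(x) = (1/2)kxk2 on Rn. For each ξ ∈ Rn, the supremum in φ∗(ξ) = sup_{x∈Rn} {hξ, xi − (1/2)kxk2} is attained at x = ξ, yielding φ∗(ξ) = (1/2)kξk2, so that φ coincides with its own conjugate.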
Observe that the definitions of conjugate and biconjugate functions are
given for any arbitrary function. Below we present some properties of conju-
gate functions.


Proposition 2.102 For any function φ : Rn → R̄, the conjugate function


φ∗ is always lsc convex. In addition, if φ is proper convex, then φ∗ is also a
proper convex function.

Proof. Consider any ξ1 , ξ2 ∈ Rn . Then for every λ ∈ [0, 1],

φ∗((1 − λ)ξ1 + λξ2) = sup_{x∈Rn} {h(1 − λ)ξ1 + λξ2, xi − φ(x)} = sup_{x∈Rn} {(1 − λ)(hξ1, xi − φ(x)) + λ(hξ2, xi − φ(x))},

which by Proposition 1.7 leads to

φ∗((1 − λ)ξ1 + λξ2) ≤ (1 − λ) sup_{x∈Rn} {hξ1, xi − φ(x)} + λ sup_{x∈Rn} {hξ2, xi − φ(x)} = (1 − λ)φ∗(ξ1) + λφ∗(ξ2), ∀ λ ∈ [0, 1].

Because ξ1 and ξ2 are arbitrary, from the above inequality φ∗ is convex. Also, as φ∗ is a pointwise supremum of the affine functions ξ ↦ hξ, xi − φ(x), it is lsc.
As φ is a proper convex function, dom φ is a nonempty convex set in Rn ,
which by Proposition 2.14 (i) implies that ri dom φ is nonempty. Also, by
Proposition 2.82, for any x̄ ∈ ri dom φ, ∂φ(x̄) is nonempty. Suppose that
ξ ∈ ∂φ(x̄), which by Definition 2.77 of the subdifferential implies that

hξ, x̄i − φ(x̄) ≥ hξ, xi − φ(x), ∀ x ∈ Rn ,

which along with the definition of conjugate φ∗ implies that

hξ, x̄i − φ(x̄) = φ∗ (ξ).

As φ(x̄) is finite, φ∗(ξ) is also finite, that is, ξ ∈ dom φ∗. Also, by the properness of φ and the definition of φ∗, it is obvious that φ∗(ξ) > −∞ for every ξ ∈ Rn, thereby showing that φ∗ is a proper convex function. 
Observe that φ∗ is lsc convex irrespective of the nature of φ but for φ∗ to
be proper, we need φ to be a proper convex function. Simply assuming φ to be
proper need not imply that φ∗ is proper. For instance, consider φ(x) = −x2 ,
which is a nonconvex proper function. Then φ∗ ≡ +∞ and hence not proper.
Next we state some conjugate rules that can be proved directly using the
definition of conjugate functions.

Proposition 2.103 Consider functions φ̄, φ : Rn → R̄.


(i) If φ̄ ≤ φ, then φ̄∗ ≥ φ∗ .
(ii) If φ̄(x) = φ(x) + c, φ̄∗ (ξ) = φ∗ (ξ) − c.
(iii) If φ̄(x) = λφ(x) for λ > 0, φ̄∗ (ξ) = λφ∗ (ξ/λ).


(iv) For every x and ξ in Rn ,

φ∗ (ξ) + φ(x) ≥ hξ, xi.

This is known as the Fenchel–Young Inequality. Equivalently,

φ∗∗ (x) ≤ φ(x), ∀ x ∈ Rn .

The readers are urged to verify these properties simply using Defini-
tion 2.101 of conjugate and biconjugate functions.
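For instance, with φ(x) = (1/2)x2 on R one has φ∗(ξ) = (1/2)ξ2, and the Fenchel–Young inequality reads (1/2)ξ2 + (1/2)x2 ≥ ξx, which is simply (1/2)(x − ξ)2 ≥ 0; equality holds exactly when ξ = x = ∇φ(x), that is, when ξ ∈ ∂φ(x).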
As discussed in Theorem 2.100, a proper lsc convex function is the pointwise supremum of the affine functions majorized by it; the biconjugate of a function plays an important role in this respect.
gate with the support set. The proof is along the lines of Hiriart-Urruty and
Lemaréchal [63].

Theorem 2.104 Consider a proper function φ : Rn → R̄. Then φ∗∗ is the


pointwise supremum of all affine functions majorized by φ, that is,

φ∗∗ (x̄) = sup h(x̄).


h∈supp(φ,H)

More precisely, φ∗∗ = cl co φ.

Proof. An affine function h is majorized by φ, that is, h(x) ≤ φ(x) for every
x ∈ Rn . Because an affine function is expressed as h(x) = hξ, xi − α for some
ξ ∈ Rn and α ∈ R,

hξ, xi − α ≤ φ(x), ∀ x ∈ Rn .

Therefore, by Definition 2.101 of the conjugate function, φ∗ (ξ) ≤ α, which


implies ξ ∈ dom φ∗ . Then for any x ∈ Rn ,

suph∈supp(φ,H) h(x) = supξ∈dom φ∗ , φ∗ (ξ)≤α {hξ, xi − α}

                    = supξ∈dom φ∗ {hξ, xi − φ∗ (ξ)}

                    = supξ∈Rn {hξ, xi − φ∗ (ξ)} = φ∗∗ (x),

thereby yielding the desired result. From Definition 2.57 of the closed convex
function, φ∗∗ = cl co φ, as desired. 
Combining Theorems 2.100 and 2.104 we have the following result for a
proper lsc convex function.

Theorem 2.105 Consider a proper lsc convex function φ : Rn → R̄. Then


φ∗∗ = φ.


Observe that the above theorem holds when the function is lsc. What if
φ is only proper convex but not lsc? How is one then supposed to relate the
function φ to its biconjugate φ∗∗ ? The next result from Attouch, Buttazzo,
and Michaille [3] looks into this aspect.
Proposition 2.106 Consider a proper convex function φ : Rn → R̄. Assume
that φ admits a continuous affine minorant. Then φ∗∗ = cl φ. Consequently,
φ is lsc at x̄ ∈ Rn if and only if φ(x̄) = φ∗∗ (x̄).
Proof. By the Fenchel–Young inequality, Proposition 2.103 (iv),
φ(x) ≥ hξ, xi − φ∗ (ξ), ∀ x ∈ Rn ,
which implies that h(x) = hξ, xi − φ∗ (ξ) belongs to supp(φ, H). By Defini-
tion 2.101 of the biconjugate function,
φ∗∗ (x) = sup {hξ, xi − φ∗ (ξ)},
ξ∈Rn

which leads to φ∗∗ being the upper envelope of the continuous affine minorants
of φ. Applying Proposition 2.102 to φ∗ , φ∗∗ is a proper lsc convex function
and thus,
φ∗∗ ≤ cl φ ≤ φ.
This inequality along with Proposition 2.103 (i) leads to
(φ∗∗ )∗∗ ≤ (cl φ)∗∗ ≤ φ∗∗ .
As φ∗∗ and cl φ are both proper lsc convex functions, by Theorem 2.105,
(φ∗∗ )∗∗ = φ∗∗ and (cl φ)∗∗ = cl φ,
thereby reducing the preceding inequality to
φ∗∗ ≤ (cl φ)∗∗ ≤ φ∗∗ .
Hence, φ∗∗ = cl φ, thereby establishing the first part of the result.
From Chapter 1, we know that closure of a function φ is defined as
cl φ(x̄) = lim inf φ(x),
x→x̄

which is the same as φ(x̄) if φ is lsc at x̄ by Definition 1.4, thereby yielding


φ(x̄) = cl φ(x̄). Consequently, by the first part, the lower semicontinuity of φ
at x̄ is equivalent to φ(x̄) = φ∗∗ (x̄), thereby completing the proof. 
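For instance, consider φ = δ(0,1] on R, which is proper convex but not lsc at x̄ = 0 and admits the continuous affine minorant h ≡ 0. A direct computation gives φ∗ (ξ) = sup0<x≤1 ξx = max{ξ, 0}, and hence φ∗∗ = δ[0,1] = cl φ. In particular, φ∗∗ (0) = 0 < +∞ = φ(0), which reflects the failure of lower semicontinuity of φ at x̄ = 0.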
With all the preceding results, and discussions on the properties of the
conjugates and biconjugates, we now move on to see how the conjugates of
the function operations are defined. More precisely, if we are given some functions
and perform some operation on them, like the sum operation or the supremum
operation, then how are their conjugates related to the conjugates of the
given functions? In the next result from Hiriart-Urruty and Lemaréchal [63]
and Rockafellar [97], we look into this aspect of conjugate functions.


Theorem 2.107 (i) (Inf-Convolution Rule) Consider proper functions
φi : Rn → R̄, i = 1, 2, . . . , m, satisfying dom φ∗1 ∩ dom φ∗2 ∩ . . . ∩ dom φ∗m 6= ∅. Then

(φ1 □ φ2 □ . . . □ φm )∗ = φ∗1 + φ∗2 + . . . + φ∗m .

(ii) (Sum Rule) Consider proper convex functions φi : Rn → R̄,
i = 1, 2, . . . , m, satisfying dom φ1 ∩ dom φ2 ∩ . . . ∩ dom φm 6= ∅. Then

(cl φ1 + cl φ2 + . . . + cl φm )∗ = cl (φ∗1 □ φ∗2 □ . . . □ φ∗m ).

If ri dom φ1 ∩ ri dom φ2 ∩ . . . ∩ ri dom φm 6= ∅, then

(φ1 + φ2 + . . . + φm )∗ = φ∗1 □ φ∗2 □ . . . □ φ∗m

and for every ξ ∈ dom (φ1 + φ2 + . . . + φm )∗ , the infimum of the problem

inf{φ∗1 (ξ1 ) + φ∗2 (ξ2 ) + . . . + φ∗m (ξm ) : ξ1 + ξ2 + . . . + ξm = ξ}

is attained.
(iii) (Infimum Rule) Consider a family of proper functions φi : Rn → R̄,
i ∈ I, where I is an arbitrary index set, having a common affine minorant
and satisfying supi∈I φ∗i (ξ) < +∞ for some ξ ∈ Rn . Then

(inf φi )∗ = sup φ∗i .


i∈I i∈I

(iv) (Supremum Rule) Consider a family of proper lsc convex functions


φi : Rn → R̄, i ∈ I, where I is an arbitrary index set. If supi∈I φi is not
identically +∞, then

(sup φi )∗ = cl co(inf φ∗i ).


i∈I i∈I

Proof. (i) From Definition 2.101 of the conjugate function and Definition 2.54
of the inf-convolution along with Proposition 1.7,

(φ1  . . .  φm )∗ (ξ) = sup {hξ, xi − inf (φ1 (x1 ) + . . . + φm (xm ))}


x∈Rn x1 +...+xm =x

= sup sup {hξ, xi − (φ1 (x1 ) + . . . + φm (xm ))}


x∈Rn x1 +...+xm =x
= sup {hξ, x1 i − φ1 (x1 ) + . . . +
x1 ,...,xm ∈Rn
hξ, xm i − φm (xm )}
= φ∗1 (ξ) + . . . + φ∗m (ξ),

thereby establishing the desired result.


(ii) Replacing φi by φ∗i for i = 1, 2, . . . , m, in (i) along with Proposition 2.106
leads to

cl φ1 + cl φ2 + . . . + cl φm = (φ∗1 □ φ∗2 □ . . . □ φ∗m )∗ .


Taking the conjugate on both sides and again applying Proposition 2.106
yields the requisite condition,

(cl φ1 + cl φ2 + . . . + cl φm )∗ = cl (φ∗1 □ φ∗2 □ . . . □ φ∗m ).


Tm
If i=1 ri dom φi is nonempty, then by Proposition 2.68,

cl φ1 + cl φ2 + . . . + cl φm = cl (φ1 + φ2 + . . . + φm ).

Also, by the definition of conjugate functions,

(cl φ1 + cl φ2 + . . . + cl φm )∗ = (cl (φ1 + φ2 + . . . + φm ))∗


= (φ1 + φ2 + . . . + φm )∗ .

Now to establish the result, it is enough to prove that

φ∗1  φ∗2  . . .  φ∗m

is lsc. By Theorem 1.9, it is equivalent to showing that the lower-level set,

Sα = {ξ ∈ Rn : (φ∗1  . . .  φ∗m )(ξ) ≤ α},

is closed for every α ∈ R. Consider a bounded sequence {ξk } ⊂ Sα such that


ξk → ξ. By Definition 2.54 of the inf-convolution, there exist ξki ∈ Rn with
ξk1 + ξk2 + . . . + ξkm = ξk such that

φ∗1 (ξk1 ) + . . . + φ∗m (ξkm ) ≤ α + 1/k, ∀ k ∈ N. (2.33)
By assumption, suppose that x̂ ∈ ri dom φ1 ∩ ri dom φ2 ∩ . . . ∩ ri dom φm . As φi , i = 1, 2, . . . , m, are
convex, by Theorem 2.69, the functions are continuous at x̂. Therefore, for
some ε > 0 and Mi ∈ R, i = 1, 2, . . . , m,

φi (x) ≤ Mi , ∀ x ∈ Bε (x̂), i = 1, 2, . . . , m. (2.34)

For any d ∈ Bε (0), consider

hξk1 , di = hξk1 , x̂i − hξk1 , x̂ − di


= hξk1 , x̂i + hξk2 , x̂ − di + . . . + hξkm , x̂ − di − hξk , x̂ − di,

which by the Fenchel–Young inequality, Proposition 2.103 (iv), and the


Cauchy–Schwarz inequality, Proposition 1.1, leads to

hξk1 , di ≤ φ∗1 (ξk1 ) + φ1 (x̂) + φ∗2 (ξk2 ) + φ2 (x̂ − d) + . . . +


φ∗m (ξkm ) + φm (x̂ − d) + kξk k kx̂ − dk.

By the conditions (2.33) and (2.34), the above inequality reduces to


hξk1 , di ≤ α + 1/k + M1 + M2 + . . . + Mm + kξk k kx̂ − dk,


which along with the boundedness of {ξk } implies that {ξk1 } ⊂ Rn is a


bounded sequence. Similarly, it can be shown that {ξki }, i = 2, . . . , m, are
bounded sequences. By the Bolzano–Weierstrass Theorem, Proposition 1.3,
{ξki }, i = 1, 2, . . . , m, have a convergent subsequence. Without loss of general-
ity, assume that ξki → ξi , i = 1, 2, . . . , m. Because ξk = ξk1 + ξk2 + . . . + ξkm , taking
the limit as k → +∞,

ξ = ξ1 + ξ2 + . . . + ξm .

By Proposition 2.102, φ∗i , i = 1, 2, . . . , m, are proper lsc convex functions,


therefore taking the limit as k → +∞,

φ∗1 (ξ1 ) + φ∗2 (ξ2 ) + . . . + φ∗m (ξm ) ≤ α.

By the definition of inf-convolution, the above inequality leads to

(φ∗1  φ∗2  . . .  φ∗m )(ξ) ≤ α,

which implies that ξ ∈ Sα . Because α ∈ R was arbitrary, the lower-level set is


closed for every α ∈ R and hence

φ∗1  φ∗2  . . .  φ∗m

is closed. Repeating the same arguments with

(φ∗1  φ∗2  . . .  φ∗m )(ξ) = α and ξk = ξ

yields that the infimum is achieved, thereby completing the proof.


(iii) By Definition 2.101, for every ξ ∈ Rn ,

(inf φi )∗ (ξ) = sup {hξ, xi − inf φi (x)}


i∈I x∈Rn i∈I

= sup sup{hξ, xi − φi (x)}


x∈Rn i∈I
= sup sup {hξ, xi − φi (x)} = sup φ∗i (ξ),
i∈I x∈Rn i∈I

as desired.
(iv) Replacing φi by φ∗i for i ∈ I in (iii),

supi∈I φ∗∗i = (inf i∈I φ∗i )∗ .

As φi , i ∈ I, are lsc, the above condition reduces to

sup φi = (inf φ∗i )∗ .


i∈I i∈I

Taking the conjugate on both sides leads to

(sup φi )∗ = (inf φ∗i )∗∗ ,


i∈I i∈I


which by Theorem 2.104 yields


(sup φi )∗ = cl co (inf φ∗i ). 
i∈I i∈I
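As an illustration of the Sum Rule (ii), take φi = δFi for nonempty closed convex sets F1 , F2 ⊂ Rn with ri F1 ∩ ri F2 6= ∅. Because δF∗ i = σFi , the rule yields σF1 ∩F2 (ξ) = (δF1 + δF2 )∗ (ξ) = (σF1 □ σF2 )(ξ) with the infimum attained, that is, σF1 ∩F2 (ξ) = min{σF1 (ξ1 ) + σF2 (ξ2 ) : ξ1 + ξ2 = ξ}.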

Next, using the Fenchel–Young inequality, we present an equivalent char-


acterization of the subdifferential of a convex function.

Theorem 2.108 Consider a proper convex function φ : Rn → R̄. Then for


any x, ξ ∈ Rn

ξ ∈ ∂φ(x) ⇐⇒ φ(x) + φ∗ (ξ) = hξ, xi.

In addition, if φ is also lsc, then for any x and ξ in Rn

ξ ∈ ∂φ(x) ⇐⇒ φ(x) + φ∗ (ξ) = hξ, xi ⇐⇒ x ∈ ∂φ∗ (ξ).

Proof. Suppose that ξ ∈ ∂φ(x̄), which by Definition 2.77 of the subdifferential


implies that

φ(x) − φ(x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn .

The above inequality leads to

hξ, x̄i − φ(x̄) ≥ sup {hξ, xi − φ(x)} = φ∗ (ξ),


x∈Rn

that is,

φ(x̄) + φ∗ (ξ) ≤ hξ, x̄i

which along with the Fenchel–Young inequality, Proposition 2.103 (iv), re-
duces to the desired condition

φ(x̄) + φ∗ (ξ) = hξ, x̄i.

Conversely, suppose that above condition is satisfied, which by Defini-


tion 2.101 of the conjugate function implies

hξ, x̄i − φ(x̄) ≥ hξ, xi − φ(x), ∀ x ∈ Rn ,

that is,

φ(x) − φ(x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn .

Thus, ξ ∈ ∂φ(x̄), thereby establishing the equivalence.


Now if φ is lsc as well, then by Theorem 2.105, φ = φ∗∗ . Then the equivalent
condition can be expressed as ξ¯ ∈ ∂φ(x̄) if and only if
¯ = hξ,
φ∗∗ (x̄) + φ∗ (ξ) ¯ x̄i.


By Definition 2.101 of the biconjugate function, the above condition is equiv-


alent to
¯ x̄i − φ∗ (ξ)
hξ, ¯ ≥ hξ, x̄i − φ∗ (ξ), ∀ ξ ∈ Rn ,

that is,
¯ ≥ hξ − ξ,
φ∗ (ξ) − φ∗ (ξ) ¯ x̄i, ∀ ξ ∈ Rn .
¯ The converse can be worked
By the definition of subdifferential, x̄ ∈ ∂φ∗ (ξ).
out along the lines of the previous part and thus establishing the desired
relation. 
As an application of the above theorem, consider a closed convex cone
K ⊂ Rn . We claim that ξ¯ ∈ ∂δK (x̄) if and only if x̄ ∈ ∂δK ◦ (ξ).
¯ Suppose that
¯
ξ ∈ ∂δK (x̄) = NK (x̄), which is equivalent to
¯ x − x̄i ≤ 0, ∀ x ∈ K.
hξ,
¯ x̄i = 0.
In particular, taking x = 0 and x = 2x̄, respectively, implies that hξ,
Therefore, the above inequality reduces to
¯ xi ≤ 0, ∀ x ∈ K,
hξ,
which by Definition 2.30 implies that ξ¯ ∈ K ◦ . Thus, ξ¯ ∈ NK (x̄) is equivalent
to
x̄ ∈ K, ξ¯ ∈ K ◦ , ¯ x̄i = 0.
hξ,
For a closed convex cone K, by Proposition 2.31, K ◦◦ = K. As x̄ ∈ K = K ◦◦ ,
hξ, x̄i ≤ 0, ∀ ξ ∈ K ◦ .
¯ x̄i = 0, the above condition is equivalent to
Because hξ,
¯ x̄i ≤ 0, ∀ ξ ∈ K ◦ ,
hξ − ξ,
¯ thereby proving our claim.
which implies that x̄ ∈ NK ◦ (ξ),
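As a simple illustration of Theorem 2.108, consider φ(x) = |x| on R, for which φ∗ = δ[−1,1] . At x̄ = 0 one has ∂φ(0) = [−1, 1], and indeed φ(0) + φ∗ (ξ) = 0 = hξ, 0i exactly when ξ ∈ [−1, 1], while 0 ∈ ∂φ∗ (ξ) = N[−1,1] (ξ) for every such ξ.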

2.6 ε-Subdifferential
In Subsection 2.3.3 on differentiability properties of a convex function, from
Proposition 2.82 and the examples preceding it, we noticed that ∂φ(x) may
turn out to be empty, even though x ∈ dom φ. To overcome this aspect of
subdifferentials, the concept of the ε-subdifferential came into existence; it not
only overcomes the drawback of subdifferentials but is also important from
the optimization point of view. The idea can be found in the work of Brønsted
and Rockafellar [19] but the theory of ε-subdifferential calculus was given by
Hiriart-Urruty [58].


Definition 2.109 Consider a proper convex function φ : Rn → R̄. For ε > 0,


the ε-subdifferential of φ at x̄ ∈ dom φ is given by
∂ε φ(x̄) = {ξ ∈ Rn : φ(x) − φ(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn }.
For a zero function, 0 : Rn → R, defined as 0(x) = 0 for every x ∈ Rn ,
∂ε 0(x̄) = {0} for every ε > 0. Otherwise if there exists ξ ∈ ∂ε 0(x̄) with ξ =
6 0,
by the above definition of ε-subdifferential,
0 ≥ hξ, x − x̄i − ε
Xn
= ξi (xi − x̄i ) − ε, ∀ x ∈ Rn .
i=1

Because ξ 6= 0, there exists some j ∈ {1, 2, . . . , n} such that ξj 6= 0. In
particular, taking xj = x̄j + 2ε/ξj and xi = x̄i , i 6= j, the above inequality yields
ε ≤ 0, which is a contradiction.
As shown in Section 2.4 that for a convex set F ⊂ Rn , the subdifferential
of the indicator function coincides with the normal cone, that is, ∂δF = NF .
Along similar lines, we define the ε-normal set.
Definition 2.110 Consider a convex set F ⊂ Rn . Then for ε > 0, the
ε-subdifferential of the indicator function at x̄ ∈ F is
∂ε δF (x̄) = {ξ ∈ Rn : δF (x) − δF (x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn }
= {ξ ∈ Rn : ε ≥ hξ, x − x̄i, ∀ x ∈ F },
which is also called the ε-normal set and denoted as Nε,F (x̄). Note that Nε,F
is not a cone unlike NF , which is always a cone.
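For instance, taking F = [0, 1] ⊂ R and x̄ = 0, one obtains Nε,F (0) = {ξ ∈ R : ξx ≤ ε, ∀ x ∈ [0, 1]} = (−∞, ε], which for ε > 0 is not a cone, whereas NF (0) = (−∞, 0].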
Recall the proper convex function φ : R → R̄ given by

φ(x) = −√x for 0 ≤ x ≤ 1 and φ(x) = +∞ otherwise,

considered in Subsection 2.3.3. As already mentioned, for x = 0, the subdif-
ferential ∂φ(x) is empty. But for any ε > 0, the ε-subdifferential at x = 0 is
∂ε φ(x) = (−∞, −1/(4ε)] and hence nonempty.

In the proposition below we present some properties of the ε-subdifferential
of the convex functions.
Proposition 2.111 Consider a proper lsc convex function φ : Rn → R̄ and
let ε > 0 be given. Then for every x̄ ∈ dom φ, the ε-subdifferential ∂ε φ(x̄) is
a nonempty closed convex set and
\
∂φ(x̄) = ∂ε φ(x̄).
ε>0

For ε1 ≥ ε2 , ∂ε2 (x̄) ⊂ ∂ε1 (x̄).


Proof. Observe that for x̄ ∈ dom φ and ε > 0, φ(x̄) − ε < φ(x̄), which
implies (x̄, φ(x̄) − ε) 6∈ epi φ. Because φ is a lsc convex, by Theorem 1.9 and
Proposition 2.48, epi φ is closed convex set in Rn × R. Therefore, applying the
Strict Separation Theorem, Theorem 2.26 (iii), there exists (ξ, γ) ∈ Rn × R
with (ξ, γ) 6= (0, 0) such that

hξ, x̄i + γ(φ(x̄) − ε) < hξ, xi + γα, ∀ (x, α) ∈ epi φ.

As (x, φ(x)) ∈ epi φ for every x ∈ dom φ, the above condition leads to

hξ, x̄i + γ(φ(x̄) − ε) < hξ, xi + γφ(x), ∀ x ∈ dom φ. (2.35)

In particular, taking x = x̄ in the preceding inequality yields γ > 0. Now


dividing (2.35) throughout by γ implies that
h−ξ/γ, x − x̄i − ε < φ(x) − φ(x̄), ∀ x ∈ dom φ.

The above condition is also satisfied by x 6∈ dom φ, which implies

h−ξ/γ, x − x̄i − ε < φ(x) − φ(x̄), ∀ x ∈ Rn .

By Definition 2.109 of the ε-subdifferential, −ξ/γ ∈ ∂ε φ(x̄). Thus, ∂ε φ(x̄) is
nonempty for every x̄ ∈ dom φ.
Suppose that {ξk } ⊂ ∂ε φ(x̄) such that ξk → ξ. By the definition of
ε-subdifferential,

φ(x) − φ(x̄) ≥ hξk , x − x̄i − ε, ∀ x ∈ Rn .

Taking the limit as k → +∞, the above inequality leads to

φ(x) − φ(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn ,

which implies that ξ ∈ ∂ε φ(x̄), thereby yielding the closedness of ∂ε φ(x̄).


Consider ξ1 , ξ2 ∈ ∂ε φ(x̄), which implies that for i = 1, 2,

φ(x) − φ(x̄) ≥ hξi , x − x̄i − ε, ∀ x ∈ Rn .

Therefore, for any λ ∈ [0, 1],

φ(x) − φ(x̄) ≥ h(1 − λ)ξ1 + λξ2 , x − x̄i − ε, ∀ x ∈ Rn ,

which implies (1 − λ)ξ1 + λξ2 ∈ ∂ε φ(x̄). Because ξ1 , ξ2 were arbitrary, ∂ε φ(x̄)


is convex.
Now we will prove that
\
∂φ(x̄) = ∂ε φ(x̄).
ε>0


Suppose that ξ ∈ ∂φ(x̄), which by Definition 2.77 of the subdifferential implies


that for every x ∈ Rn ,
φ(x) − φ(x̄) ≥ hξ, x − x̄i
≥ hξ, x − x̄i − ε, ∀ ε > 0.
Thus, by the definition of ε-subdifferential, ξ ∈ ∂ε φ(x̄) for every ε > 0. Because
ξ ∈ ∂φ(x̄) was arbitrary,
\
∂φ(x̄) ⊂ ∂ε φ(x̄).
ε>0

Conversely, consider ξ ∈ ∂ε φ(x̄) for every ε > 0, which implies that for
every x ∈ Rn ,
φ(x) − φ(x̄) ≥ hξ, x − x̄i − ε, ∀ ε > 0.
As the preceding inequality holds for every ε > 0, taking the limit as ε → 0
leads to
φ(x) − φ(x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn ,
thereby yielding ξ ∈ ∂φ(x̄). Because ξ was arbitrary, the reverse inclusion is
satisfied, that is,
\
∂φ(x̄) ⊃ ∂ε φ(x̄),
ε>0

hence establishing the result.


The relation ∂ε2 (x̄) ⊂ ∂ε1 (x̄) for ε1 ≥ ε2 can be easily worked out using
the definition of ε-subdifferential. 
The proof of the nonemptiness of the ε-subdifferential is from Lucchetti [79].
In the example, it is easy to observe that at x = 0, ∩ε>0 ∂ε φ(x) is empty, as is
∂φ(x). Before moving any further, let us consider the absolute value function,
φ(x) = |x|. The subdifferential of φ is given by

∂φ(x) = {1} if x > 0,   [−1, 1] if x = 0,   {−1} if x < 0.

Now for ε > 0, the ε-subdifferential of φ is

∂ε φ(x) = [1 − ε/x, 1] if x > ε/2,   [−1, 1] if −ε/2 ≤ x ≤ ε/2,   [−1, −1 − ε/x] if x < −ε/2.
The graphs of ∂φ and ∂ε φ for ε = 1 are shown in Figures 2.8 and 2.9. The
graph of the subdifferential is a simple step function.
Similar to the characterization of the subdifferential in terms of the
conjugate function, the following result provides a relation between the
ε-subdifferential and the conjugate function.


FIGURE 2.8: Graph of ∂(|.|).

FIGURE 2.9: Graph of ∂1 (|.|).


Theorem 2.112 Consider a proper convex function φ : Rn → R̄. Then for


any ε > 0 and x ∈ dom φ,

ξ ∈ ∂ε φ(x) ⇐⇒ φ(x) + φ∗ (ξ) − hξ, xi ≤ ε.

Proof. Consider any ξ ∈ ∂ε φ(x̄). By Definition 2.109 of ε-subdifferential,

φ(x) − φ(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn ,

which implies that

hξ, xi − φ(x) + φ(x̄) − hξ, x̄i ≤ ε, ∀ x ∈ Rn .

By Definition 2.101 of the conjugate function,

φ∗ (ξ) + φ(x̄) − hξ, x̄i ≤ ε,

as desired.
Conversely, suppose that the inequality holds, which by the definition of
conjugate function implies that

hξ, xi − φ(x) + φ(x̄) − hξ, x̄i ≤ ε, ∀ x ∈ Rn ,

which yields that ξ ∈ ∂ε φ(x̄), thus establishing the equivalence. 
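For instance, for φ(x) = (1/2)x² on R the above characterization yields ξ ∈ ∂ε φ(x̄) if and only if (1/2)x̄² + (1/2)ξ² − ξ x̄ ≤ ε, that is, |ξ − x̄| ≤ √(2ε). Thus ∂ε φ(x̄) = [x̄ − √(2ε), x̄ + √(2ε)], which shrinks to ∂φ(x̄) = {x̄} as ε ↓ 0.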


As mentioned earlier, the notion of ε-subdifferential appears in the well-
known work of Brønsted and Rockafellar [19] in which they estimated how
well ∂ε φ “approximates” ∂φ. We present the modified version of that famous
Brønsted–Rockafellar Theorem from Thibault [108] below. The proof involves
the famous Ekeland’s Variational Principle [41, 42, 43], which we state without
proof before moving on with the result.

Theorem 2.113 (Ekeland’s Variational Principle) Consider a closed proper


function φ : Rn → R̄ and for ε > 0, let x̄ ∈ Rn be such that

φ(x̄) ≤ infn φ(x) + ε.


x∈R

Then for any λ > 0, there exists xλ ∈ Rn such that

kxλ − x̄k ≤ ε/λ,    φ(xλ ) ≤ φ(x̄),

and xλ is the unique minimizer of the unconstrained problem

inf φ(x) + (ε/λ)kx − xλ k subject to x ∈ Rn .
Observe that the second condition in Ekeland’s Variational Principle,
φ(xλ ) ≤ φ(x̄), implies that

φ(xλ ) − φ(x̄) ≤ 0 ≤ ε. (2.36)


From the condition on x̄,

φ(x̄) ≤ infn φ(x) + ε ≤ φ(xλ ) + ε,


x∈R

which implies that

φ(x̄) − φ(xλ ) ≤ ε.

The above condition along with (2.36) leads to

|φ(xλ ) − φ(x̄)| ≤ ε.

Now to establish the modified version of Brønsted–Rockafellar Theorem, we


can apply

|φ(xλ ) − φ(x̄)| ≤ ε instead of φ(xλ ) ≤ φ(x̄)

in the Ekeland’s Variational Principle.

Theorem 2.114 (A modified version of the Brønsted–Rockafellar Theorem)


Consider a proper lsc convex function φ : Rn → R̄ and x̄ ∈ dom φ. Then for
any ε > 0 and for any ξ ∈ ∂ε φ(x̄), there exist xε ∈ Rn and ξε ∈ ∂φ(xε ) such
that
kxε − x̄k ≤ √ε,   kξε − ξk ≤ √ε   and   |φ(xε ) − hξε , xε − x̄i − φ(x̄)| ≤ 2ε.

Proof. By Definition 2.109 of the ε-subdifferential, ξ ∈ ∂ε φ(x̄) implies

φ(x) − φ(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn ,

that is,

φ(x̄) − hξ, x̄i ≥ φ(x) − hξ, xi + ε, ∀ x ∈ Rn .

By applying Ekeland’s Variational Principle, Theorem 2.113, to φ − hξ, .i with
λ = √ε, there exists xε ∈ Rn such that kxε − x̄k ≤ √ε,

|φ(xε ) − hξ, xε i − φ(x̄) + hξ, x̄i| ≤ ε (2.37)

and

φ(xε ) − hξ, xε i ≤ φ(x) − hξ, xi + √ε kx − xε k, ∀ x ∈ Rn . (2.38)

By the definition of subdifferential, the above condition (2.38) implies that



ξ ∈ ∂(φ + √ε k. − xε k)(xε ). (2.39)

As dom k. − xε k = Rn , by Theorem 2.69, k. − xε k is continuous on Rn .


Therefore, by Theorem 2.91 along with the fact that ∂(k. − xε k)(xε ) = B,
(2.39) becomes

ξ ∈ ∂φ(xε ) + √ε B.


Thus, there exists ξε ∈ ∂φ(xε ) such that kξε − ξk ≤ √ε. From condition (2.37)
along with the Cauchy–Schwarz inequality, Proposition 1.1,

|φ(xε ) − hξε , xε − x̄i − φ(x̄)| ≤ ε + |hξε − ξ, xε − x̄i|

                               ≤ ε + kξε − ξk kxε − x̄k ≤ 2ε,

thereby completing the proof. 


As in the study of optimality conditions, we need the subdifferential cal-
culus rules; similarly, ε-subdifferentials also play a pivotal role in this respect.
Below we present the ε-subdifferential Sum Rule, Max-Function and the Scalar
Product Rules that we will need in our study of optimality conditions for the
convex programming problem (CP ). The proofs of the Sum and the Max-
Function Rules are from Hiriart-Urruty and Lemaréchal [62].

Theorem 2.115 (Sum Rule) Consider two proper convex functions φi : Rn → R̄,
i = 1, 2 such that ri dom φ1 ∩ ri dom φ2 6= ∅. Then for ε > 0,
[
∂ε (φ1 + φ2 )(x̄) = (∂ε1 φ1 (x̄) + ∂ε2 φ2 (x̄))
ε1 ≥ 0, ε2 ≥ 0,
ε1 + ε2 = ε

for every x̄ ∈ dom φ1 ∩ dom φ2 .

Proof. Suppose that ε1 ≥ 0 and ε2 ≥ 0 such that ε1 + ε2 = ε. Consider


ξi ∈ ∂εi φi (x̄), i = 1, 2, which by Definition 2.109 of the ε-subdifferential
implies that for every x ∈ Rn ,

φi (x) − φi (x̄) ≥ hξi , x − x̄i − εi , i = 1, 2.

The above condition along with the assumption ε1 + ε2 = ε leads to

(φ1 + φ2 )(x) − (φ1 + φ2 )(x̄) ≥ hξ1 + ξ2 , x − x̄i − (ε1 + ε2 )


= hξ1 + ξ2 , x − x̄i − ε, ∀ x ∈ Rn ,

thereby yielding ξ1 + ξ2 ∈ ∂ε (φ1 + φ2 )(x̄). Because εi ≥ 0 and ξi ∈ ∂εi φi (x̄)


for i = 1, 2, were arbitrary,
[
∂ε (φ1 + φ2 )(x̄) ⊃ (∂ε1 φ1 (x̄) + ∂ε2 φ2 (x̄))
ε1 ≥ 0, ε2 ≥ 0,
ε1 + ε2 = ε

Conversely, suppose that ξ ∈ ∂ε (φ1 + φ2 )(x̄), which by the definition of


ε-subdifferential implies that

(φ1 + φ2 )(x) − (φ1 + φ2 )(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn .

By Definition 2.101 of the conjugate function,

(φ1 + φ2 )∗ (ξ) + (φ1 + φ2 )(x̄) − hξ, x̄i ≤ ε. (2.40)


By the Sum Rule of the conjugate function, Theorem 2.107 (ii), as the as-
sumption ri dom φ1 ∩ ri dom φ2 6= ∅ holds,

(φ1 + φ2 )∗ (ξ) = (φ∗1  φ∗2 )(ξ),

and the infimum is attained, which implies there exist ξi ∈ Rn , i = 1, 2,


satisfying ξ1 + ξ2 = ξ such that

(φ1 + φ2 )∗ (ξ) = φ∗1 (ξ1 ) + φ∗2 (ξ2 ).

Therefore, the inequality (2.40) becomes

(φ∗1 (ξ1 ) + φ1 (x̄) − hξ1 , x̄i) + (φ∗2 (ξ2 ) + φ2 (x̄) − hξ2 , x̄i) ≤ ε.

Denote εi = φ∗i (ξi ) + φi (x̄) − hξi , x̄i, i = 1, 2, which by the Fenchel–Young


inequality, Proposition 2.103 (iv), implies that εi ≥ 0, i = 1, 2. Observe that
ε1 + ε2 ≤ ε. Again, by the definition of conjugate function,

φi (x) − φi (x̄) ≥ hξi , x − x̄i − εi


≥ hξi , x − x̄i − ε̄i ,
where ε̄i = εi + (ε − ε1 − ε2 )/2 ≥ εi , i = 1, 2. Therefore, for i = 1, 2,
ξi ∈ ∂ε̄i φi (x̄),

with ε̄1 + ε̄2 = ε. Thus,

ξ = ξ1 + ξ2 ∈ ∂ε̄1 φ1 (x̄) + ∂ε̄2 φ2 (x̄).

Because ξ ∈ ∂ε (φ1 + φ2 )(x̄) was arbitrary,


[
∂ε (φ1 + φ2 )(x̄) ⊂ (∂ε1 φ1 (x̄) + ∂ε2 φ2 (x̄)),
ε1 ≥ 0, ε2 ≥ 0,
ε1 + ε2 = ε

thereby completing the proof. 
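For instance, take φ1 = φ2 = |.| on R and x̄ = 0. Then ∂ε (φ1 + φ2 )(0) = ∂ε (2|.|)(0) = [−2, 2], while for any ε1 , ε2 ≥ 0 with ε1 + ε2 = ε one has ∂ε1 φ1 (0) + ∂ε2 φ2 (0) = [−1, 1] + [−1, 1] = [−2, 2], so that both sides of the above Sum Rule coincide.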


Before proving the ε-subdifferential Max-Function Rule, we state a result
from Hiriart-Urruty and Lemaréchal [62] without proof and present the Scalar
Product Rule.

Proposition 2.116 Consider proper convex functions φi : Rn → R̄,


i = 1, . . . , m. Let φ(x) = max{φ1 (x), φ2 (x), . . . , φm (x)} and p = min{m, n + 1}.
For every ξ ∈ dom φ∗ = co (dom φ∗1 ∪ . . . ∪ dom φ∗m ), there exist ξi ∈ dom φ∗i and λi ≥ 0,
i = 1, 2, . . . , p, with λ1 + λ2 + . . . + λp = 1 such that

φ∗ (ξ) = λ1 φ∗1 (ξ1 ) + . . . + λp φ∗p (ξp )   and   ξ = λ1 ξ1 + . . . + λp ξp .


More precisely, (ξi , λi ) solve the problem

inf   λ1 φ∗1 (ξ1 ) + λ2 φ∗2 (ξ2 ) + . . . + λp φ∗p (ξp )
subject to   ξ = λ1 ξ1 + . . . + λp ξp ,   λ1 + . . . + λp = 1,   (P )
             ξi ∈ dom φ∗i , λi ≥ 0, i = 1, 2, . . . , p.

For the ε-subdifferential Max-Function Rule, we will need the Scalar Prod-
uct Rule that we present below.

Theorem 2.117 (Scalar Product Rule) For a proper convex function


φ : Rn → R̄ and any ε ≥ 0,

∂ε (λφ)(x̄) = λ∂ε/λ φ(x̄), ∀ λ > 0.

Proof. Suppose that ξ ∈ ∂ε (λφ)(x̄), which by Definition 2.109 implies that

(λφ)(x) − (λφ)(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn .

As λ > 0, dividing throughout by λ leads to

φ(x) − φ(x̄) ≥ hξ/λ, x − x̄i − ε/λ, ∀ x ∈ Rn ,

which implies ξ/λ ∈ ∂ε̃ φ(x̄), where ε̃ = ε/λ, that is, ξ ∈ λ∂ε̃ φ(x̄). Because
ξ ∈ ∂ε (λφ)(x̄) was arbitrary,

∂ε (λφ)(x̄) ⊂ λ∂ε̃ φ(x̄).

Conversely, suppose that ξ ∈ λ∂ε̃ φ(x̄) for λ > 0, which implies there exists
ξ˜ ∈ ∂ε̃ φ(x̄) such that ξ = λξ.
˜ By the definition of ε-subdifferential,

˜ x − x̄i − ε̃, ∀ x ∈ Rn ,
φ(x) − φ(x̄) ≥ hξ,

which implies

(λφ)(x) − (λφ)(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn ,

where ε = λε̃. Therefore, ξ ∈ ∂ε (λφ)(x̄). Because ξ ∈ λ∂ε̃ φ(x̄) was arbitrary,

∂ε (λφ)(x̄) ⊃ λ∂ε̃ φ(x̄),

thereby yielding the desired result. 


Now we proceed with establishing the ε-subdifferential Max-Function Rule
with the above results as the tool.


Theorem 2.118 (Max-Function Rule) Consider proper convex functions


φi : Rn → R̄, i = 1, 2, . . . , m. Let φ(x) = max{φ1 (x), φ2 (x), . . . , φm (x)} and
p = min{m, n + 1}. Then ξ ∈ ∂ε φ(x̄) if and only if there exist ξi ∈ dom φ∗i ,
εi ≥ 0, and λi ≥ 0, i = 1, 2, . . . , p, with λ1 + λ2 + . . . + λp = 1 such that

ξi ∈ ∂εi /λi φi (x̄) for every i satisfying λi > 0, (2.41)


ξ = λ1 ξ1 + . . . + λp ξp   and   (ε1 + . . . + εp ) + φ(x̄) − (λ1 φ1 (x̄) + . . . + λp φp (x̄)) ≤ ε. (2.42)

Proof. By Proposition 2.116,


φ∗ (ξ) = λ1 φ∗1 (ξ1 ) + . . . + λp φ∗p (ξp ),

where p = min{m, n + 1} and (ξi , λi ) ∈ dom φ∗i × R+ , i = 1, 2, . . . , p, solves
the problem (P ), that is, satisfies

ξ = λ1 ξ1 + . . . + λp ξp   and   λ1 + . . . + λp = 1.

By the relation between the ε-subdifferential and the conjugate function, The-
orem 2.112, as ξ ∈ ∂ε φ(x),

φ∗ (ξ) + φ(x) − hξ, xi ≤ ε,

which by the conditions on (ξi , λi ), i = 1, 2, . . . , p, along with the definition of


φ leads to
Xp Xp
λi φ∗i (ξi ) + φ(x) − λi hξi , xi ≤ ε. (2.43)
i=1 i=1

The above condition can be rewritten as


p
X p
X
εi + φ(x) − λi φi (x) ≤ ε,
i=1 i=1

where εi = λi (φ∗i (ξi ) + φi (x) − hξi , xi), i = 1, 2, . . . , p, which by Theorem 2.112


yields that ξi ∈ ∂εi /λi φi (x) provided λi > 0, thereby leading to the conditions
(2.41) and (2.42) as desired.
Conversely, suppose that the conditions (2.41) and (2.42) hold. By Theo-
rem 2.112, (2.41) implies that for λi > 0,

λi (φ∗i (ξi ) + φi (x) − hξi , xi) ≤ εi ,

which along with (2.42) lead to


p
X p
X
λi φ∗i (ξi ) + φ(x) − λi hξi , xi ≤ ε,
i=1 i=1


that is, the inequality (2.43). Invoking Proposition 2.116 yields


p
X
φ∗ (ξ) = λi φ∗i (ξi ),
i=1

which along with (2.43) and Theorem 2.112 implies that ξ ∈ ∂ε φ(x), thereby
completing the proof. 

Remark 2.119 In the above result, applying the Scalar Product Rule, The-
orem 2.117, to the condition (2.41) implies that ξ˜i = λi ξi ∈ ∂εi (λi φi )(x) pro-
vided λi > 0. Therefore, ξ ∈ ∂ε φ(x) is such that there exist ξ˜i ∈ ∂εi (λi φi )(x),
i = 1, 2, . . . , p, satisfying
p
X p
X p
X
ξ= ξ˜i and εi + φ(x) − λi φi (x) ≤ ε.
i=1 i=1 i=1

As p = min{m, n + 1}, we consider two cases. If p = m, for some


j ∈ {1, 2, . . . , p}, define
p
X
ε̃j = εj + (ε − εi ) and ε̃i = εi , i 6= j
i=1

and the conditions become


ξ˜i ∈ ∂εi (λi φi )(x̄) for every i satisfying λi > 0, (2.44)
Xm Xm m
X
ξ= ˜
ξi and εi + φ(x̄) − λi φi (x̄) = ε. (2.45)
i=1 i=1 i=1

If p < m, define λi = 0 and εi > 0 arbitrary for i = p + 1, p + 2, . . . , m,
such that εp+1 + . . . + εm = ε − (ε1 + . . . + εp ). As already discussed, ∂εi (λi φi )(x) = {0},
i = p + 1, p + 2, . . . , m and hence yield the conditions (2.44) and (2.45). Thus,
if ξ ∈ ∂ε φ(x), then there exist ξi ∈ dom φ∗i , εi ≥ 0 and λi ≥ 0, i = 1, 2, . . . , m,
with λ1 + . . . + λm = 1 such that the conditions (2.44) and (2.45) hold.
In particular, for a proper convex function φ : Rn → R̄ and
+
φ (x) = max{0, φ(x)},
∂ε (φ+ )(x̄) ⊂ {∂η (λφ)(x̄) : 0 ≤ λ ≤ 1, η ≥ 0, ε = η + φ+ (x̄) − λφ(x̄)}.
In the results stated above, the ε-subdifferential calculus rules were ex-
pressed in terms of the ε-subdifferential itself. Below we state a result by
Hiriart-Urruty and Phelps [64] relating the Sum Rule of the subdifferentials
and the ε-subdifferentials.
Theorem 2.120 Consider two proper lsc convex functions φi : Rn → R̄,
i = 1, 2. Then for any x̄ ∈ dom φ1 ∩ dom φ2 ,
\
∂(φ1 + φ2 )(x̄) = cl (∂ε φ1 (x̄) + ∂ε φ2 (x̄)).
ε>0


Proof. Suppose that ξi ∈ ∂ε φi (x̄), i = 1, 2, which implies

φi (x) − φi (x̄) ≥ hξi , x − x̄i − ε, ∀ x ∈ Rn , i = 1, 2.

Therefore,

(φ1 + φ2 )(x) − (φ1 + φ2 )(x̄) ≥ hξ1 + ξ2 , x − x̄i − 2ε, ∀ x ∈ Rn ,

that is, ξ1 + ξ2 ∈ ∂2ε (φ1 + φ2 )(x̄). Because ξi ∈ ∂ε φi (x̄), i = 1, 2, are arbitrary,


which along with the closedness of ε-subdifferential by Proposition 2.111 yields

∂2ε (φ1 + φ2 )(x̄) ⊃ cl (∂ε φ1 (x̄) + ∂ε φ2 (x̄)).

Further, applying Proposition 2.111 leads to


\
∂(φ1 + φ2 )(x̄) ⊃ cl (∂ε φ1 (x̄) + ∂ε φ2 (x̄)).
ε>0

To establish the result, we shall prove the reverse containment in the above
condition. Suppose that ξ¯ ∈ ∂(φ1 + φ2 )(x̄). By Theorem 2.108,
¯ = hξ,
(φ1 + φ2 )(x̄) + (φ1 + φ2 )∗ (ξ) ¯ x̄i,

which along with the Fenchel–Young inequality, Proposition 2.103 (iv), implies
that
¯ ≤ hξ,
(φ1 + φ2 )(x̄) + (φ1 + φ2 )∗ (ξ) ¯ x̄i.

Applying the Sum Rule of conjugate functions, Theorem 2.107 (ii), to proper
lsc convex functions φ1 and φ2 leads to

(φ1 + φ2 )∗ = cl (φ∗1  φ∗2 ).

Define

φ(ξ) = (φ∗1  φ∗2 )(ξ) − hξ, x̄i,

which implies cl φ = cl (φ∗1  φ∗2 ) − h., x̄i. By the preceding conditions,


¯ ≤ α. It is easy to observe that
denoting α = −(φ1 + φ2 )(x̄) yields φ(ξ)
\
{ξ ∈ Rn : cl φ(ξ) ≤ α} = cl {ξ ∈ Rn : φ(ξ) ≤ α + ε/2}.
ε>0

Therefore, for every ε > 0,

ξ¯ ∈ cl {ξ ∈ Rn : φ(ξ) ≤ α + ε/2}.

If φ(ξ) ≤ α + ε/2, then

φ(ξ) − α = inf {φ∗1 (ξ1 ) + φ∗2 (ξ2 ) − hξ1 , x̄i − hξ2 , x̄i + φ1 (x̄) + φ2 (x̄)}
ξ=ξ1 +ξ2
= inf {(φ∗1 (ξ1 ) − hξ1 , x̄i + φ1 (x̄)) + (φ∗2 (ξ2 ) − hξ2 , x̄i + φ2 (x̄))}.
ξ=ξ1 +ξ2


Therefore, there exist ξ1 , ξ2 such that ξ = ξ1 + ξ2 and

(φ∗1 (ξ1 ) − hξ1 , x̄i + φ1 (x̄)) + (φ∗2 (ξ2 ) − hξ2 , x̄i + φ2 (x̄))} < ε.

By the Fenchel–Young inequality,

φ∗i (ξi ) − hξi , x̄i + φi (x̄) ≥ 0, i = 1, 2,

which along with Definition 2.101 of the conjugate and the preceding condi-
tions imply that

hξi , x − x̄i − φi (x) + φi (x̄) ≤ ε, ∀ x ∈ Rn , i = 1, 2,

that is, ξi ∈ ∂ε φi (x̄) for i = 1, 2. Thus,

cl {ξ ∈ Rn : φ(ξ) ≤ α + ε/2} ⊂ cl (∂ε φ1 (x̄) + ∂ε φ2 (x̄)),

which implies ξ¯ ∈ cl (∂ε φ1 (x̄) + ∂ε φ2 (x̄)) for every ε > 0. As ξ¯ ∈ ∂(φ1 + φ2 )(x̄)
was arbitrary,
\
∂(φ1 + φ2 )(x̄) ⊂ cl (∂ε φ1 (x̄) + ∂ε φ2 (x̄)),
ε>0

thus establishing the result. 

Now if one goes back to the optimality condition 0 ∈ ∂φ(x̄) in Theo-


rem 2.89, it gives an equivalent characterization to the point of minimizer x̄
of the unconstrained problem (CPu ). So one will like to know then what the
condition 0 ∈ ∂ε φ(x̄) implies. As it turns out, it leads to the concept of ap-
proximate optimality conditions, which we will deal with in one of the later
chapters. For now we simply state the result on approximate optimality for
the unconstrained convex programming problem (CPu ).

Theorem 2.121 Consider a proper convex function φ : Rn → R̄ and let ε > 0


be given. Then 0 ∈ ∂ε φ(x̄) if and only if

φ(x̄) ≤ infn φ(x) + ε.


x∈R

The point x̄ is called an ε-solution of (CPu ).
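Indeed, applying Theorem 2.112 with ξ = 0 shows that 0 ∈ ∂ε φ(x̄) is equivalent to φ(x̄) + φ∗ (0) ≤ ε, and as φ∗ (0) = supx∈Rn {−φ(x)} = − inf x∈Rn φ(x), this is precisely the stated inequality.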

In the above theorem we mentioned only one of the notions of approximate


solutions, namely the ε-solution. But there are other approximate solution
concepts, as we shall see later in the book, some of which are motivated by
the Ekeland’s Variational Principle, Theorem 2.113.


2.7 Epigraphical Properties of Conjugate Functions


With the study of conjugate function and ε-subdifferential, we are now in a
position to present the relation of the epigraph of conjugate functions with the
ε-subdifferentials of a convex function from Jeyakumar, Lee, and Dinh [68].
This relation plays an important part in the study of sequential optimality
conditions as we shall see in the chapter devoted to its study.

Theorem 2.122 Consider a proper lsc convex function φ : Rn → R̄ and let


x̄ ∈ dom φ. Then
[
epi φ∗ = {(ξ, hξ, x̄i − φ(x̄) + ε) : ξ ∈ ∂ε φ(x̄)}.
ε≥0

Proof. Denote
[
F= {(ξ, hξ, x̄i − φ(x̄) + ε) : ξ ∈ ∂ε φ(x̄)}.
ε≥0

Suppose that (ξ, α) ∈ epi φ∗ , which implies φ∗ (ξ) ≤ α. By Definition 2.101 of


the conjugate function,

hξ, xi − φ(x) ≤ α, ∀ x ∈ Rn .

Denoting ε = α − hξ, x̄i + φ(x̄), the above inequality becomes

φ(x) − φ(x̄) ≥ hξ, xi − φ(x̄) − α


= hξ, x − x̄i − ε, ∀ x ∈ Rn ,

which by Definition 2.109 of the ε-subdifferential implies that ξ ∈ ∂ε φ(x̄).


Therefore, (ξ, α) ∈ F. Because (ξ, α) ∈ epi φ∗ was arbitrary, epi φ∗ ⊂ F.
Conversely, suppose that (ξ, α) ∈ F, which implies there exists ε ≥ 0 and
x̄ ∈ dom ∂φ with

ξ ∈ ∂ε φ(x̄) and α = hξ, x̄i − φ(x̄) + ε.

As ξ ∈ ∂ε φ(x̄), by the definition of ε-subdifferential,

φ(x) − φ(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn ,

which by the definition of conjugate function leads to

φ∗ (ξ) ≤ hξ, x̄i − φ(x̄) + ε = α.

Thus, (ξ, α) ∈ epi φ∗ . Because (ξ, α) ∈ F was arbitrary, epi φ∗ ⊃ F, thereby


establishing the result. 
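For instance, for φ(x) = |x| and x̄ = 0 one has φ∗ = δ[−1,1] and ∂ε φ(0) = [−1, 1] for every ε ≥ 0, so that the union in the theorem becomes ∪ε≥0 ([−1, 1] × {ε}) = [−1, 1] × R+ = epi φ∗ , as asserted.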
Next we discuss the epigraphical conditions for the operations of the con-
jugate functions.


Theorem 2.123 (i) (Sum Rule) Consider proper lsc convex functions
φi : Rn → R̄, i = 1, 2, . . . , m. Then

epi(φ1 + φ2 + . . . + φm )∗ = cl (epi φ∗1 + epi φ∗2 + . . . + epi φ∗m ).

(ii) (Supremum Rule) Consider a family of proper lsc convex functions


φi : Rn → R̄, i ∈ I, where I is an arbitrary index set. Then
[
epi (sup φi )∗ = cl co epi φ∗i .
i∈I
i∈I

(iii) Consider proper lsc convex functions φi : Rn → R, i = 1, 2, . . . , m.


Define a vector-valued convex function Φ : Rn → Rm , defined as
Φ(x) = (φ1 (x), φ2 (x), . . . , φm (x)). Then
[
epi (λΦ)∗ is a convex cone.
λ∈Rm
+

(iv) Consider a proper lsc convex function φ : Rn → R. Then for every λ > 0,
epi (λφ)∗ = λ epi φ∗ .

Proof. (i) As φi , i = 1, 2, . . . , m are proper lsc convex functions, the condition


of Theorem 2.107 (ii) reduces to

(φ1 + φ2 + . . . + φm )∗ = cl (φ∗1  φ∗2  . . .  φ∗m ),

which implies

epi (φ1 + φ2 + . . . + φm )∗ = epi cl (φ∗1  φ∗2  . . .  φ∗m ).

By Definition 1.11 of the closure of a function, the above condition becomes

epi (φ1 + φ2 + . . . + φm )∗ = cl epi (φ∗1  φ∗2  . . .  φ∗m ),

which by Proposition 2.55 leads to the desired condition.


(ii) Theorem 2.107 (iv) along with Definition 2.57 of the closed convex hull of
a function implies that
[
epi (sup φi )∗ = epi cl co (inf φ∗i ) = cl co epi φ∗i ,
i∈I i∈I
i∈I

thereby establishing the result.


(iii) Suppose that (ξ, α) ∈ ∪λ∈Rm+ epi (λΦ)∗ , which implies that there exists
λ′ ∈ Rm+ such that (ξ, α) ∈ epi (λ′ Φ)∗ . This along with Definition 2.101 of
the conjugate function leads to

hξ, xi − (λ′ Φ)(x) ≤ (λ′ Φ)∗ (ξ) ≤ α, ∀ x ∈ Rn .


Multiplying throughout by any γ > 0,

hγξ, xi − ((γλ′ )Φ)(x) ≤ γα, ∀ x ∈ Rn ,

where γλ′ ∈ Rm + . Again, by the definition of conjugate function, the above


condition leads to

((γλ′ )Φ)∗ (γξ) ≤ γα,

which implies that


[
γ(ξ, α) ∈ epi ((γλ′ )Φ)∗ ⊂ epi (λΦ)∗ , ∀ γ > 0.
λ∈Rm
+

Hence, ∪λ∈Rm+ epi (λΦ)∗ is a cone.
Now consider (ξi , αi ) ∈ ∪λ∈Rm+ epi (λΦ)∗ , i = 1, 2, which implies there
exist λi ∈ Rm+ such that (ξi , αi ) ∈ epi (λi Φ)∗ for i = 1, 2. Therefore, by the
definition of conjugate function, for every x ∈ Rn ,

hξi , xi − (λi Φ)(x) ≤ αi , i = 1, 2.

For any γ ∈ [0, 1], the above condition leads to

h(1 − γ)ξ1 + γξ2 , xi − (λ′ Φ)(x) ≤ (1 − γ)α1 + γα2 , ∀ x ∈ Rn ,

where λ′ = (1 − γ)λ1 + γλ2 ∈ Rm


+ . Therefore,

(λ′ Φ)∗ ((1 − γ)ξ1 + γξ2 ) ≤ (1 − γ)α1 + γα2 ,

which implies that


[
(1 − γ)(ξ1 , α1 ) + γ(ξ2 , α2 ) ∈ epi (λ′ Φ)∗ ⊂ epi (λΦ)∗ , ∀ γ ∈ [0, 1].
λ∈Rm
+

S
Because (ξi , αi ), i = 1, 2, were arbitrary, thus λ∈Rm epi (λΦ)∗ is a convex
+
set.
(iv) Suppose that (ξ, α) ∈ epi (λφ)∗ , which implies that (λφ)∗ (ξ) ≤ α. As
λ > 0, Proposition 2.103 (iii) leads to
 
φ∗ (ξ/λ) ≤ α/λ,

which implies (ξ/λ, α/λ) ∈ epi φ∗ , that is, (ξ, α) ∈ λ epi φ∗ . Because
(ξ, α) ∈ epi (λφ)∗ was arbitrary, epi (λφ)∗ ⊂ λepi φ∗ . The reverse inclusion
can be obtained by following the steps backwards, thereby establishing the
result. 


From the above theorem, for two proper lsc convex functions φi : Rn → R̄,
i = 1, 2,

epi(φ1 + φ2 )∗ = cl (epi φ∗1 + epi φ∗2 ).

In general, epi φ∗1 + epi φ∗2 need not be closed. But under certain additional
conditions, it can be shown that epi φ∗1 + epi φ∗2 is closed. We present below
the result from Burachik and Jeyakumar [20] and Dinh, Goberna, López, and
Son [32] to establish the same.

Proposition 2.124 Consider two proper lsc convex functions φi : Rn → R̄,


i = 1, 2, such that dom φ1 ∩ dom φ2 6= ∅. If cone(dom φ1 − dom φ2 ) is a
closed subspace or at least one of the functions is continuous at some point in
dom φ1 ∩ dom φ2 , then epi φ∗1 + epi φ∗2 is closed.

Proof. As cone(dom φ1 − dom φ2 ) is a closed subspace, by Theorem 1.1 of


Attouch and Brézis [2] or Theorem 3.6 of Strömberg [107], the exact infimal
convolution holds, that is,

(φ1 + φ2 )∗ = φ∗1  φ∗2 .

The above condition along with Theorem 2.123 (i) leads to

cl (epi φ∗1 + epi φ∗2 ) = epi (φ1 + φ2 )∗ = epi (φ∗1  φ∗2 ) = epi φ∗1 + epi φ∗2 ,

thereby yielding the result that epi φ∗1 + epi φ∗2 is closed.
Suppose that φ1 is continuous at x̂ ∈ dom φ1 ∩ dom φ2 , which yields

0 ∈ core (dom φ1 − dom φ2 ).

This implies that cone (dom φ1 − dom φ2 ) is a closed subspace and thus leads
to the desired result. 
Note that the result gives only sufficient condition for the closedness of
epi φ∗1 + epi φ∗2 . The converse need not be true. For a better understanding,
we consider the following example from Burachik and Jeyakumar [20]. Let
φ1 = δ[0,∞) and φ2 = δ(−∞,0] . Therefore,

epi φ∗1 = epi σ[0,∞) = R− × R+ and epi φ∗2 = epi σ(−∞,0] = R+ × R+ ,

which leads to epi φ∗1 + epi φ∗2 = R × R+ , a closed convex cone. Observe that
cone(dom φ1 − dom φ2 ) = [0, ∞), which is not a subspace, and also neither
φ1 nor φ2 are continuous at dom φ1 ∩ dom φ2 = {0}. Thus, the condition,
epi φ∗1 + epi φ∗2 is closed, is a relaxed condition in comparison to the other
assumptions.
Using this closedness assumption, Burachik and Jeyakumar [21] obtained
an equivalence between the exact inf-convolution and ε-subdifferential Sum
Rule, which we present next.


Theorem 2.125 Consider two proper lsc convex functions φi : Rn → R̄,


i = 1, 2, such that dom φ1 ∩ dom φ2 6= ∅. Then the following are equivalent:

(i) (φ1 + φ2 )∗ = φ∗1  φ∗2 with exact infimal convolution,

(ii) For every ε ≥ 0 and every x̄ ∈ dom φ1 ∩ dom φ2 ,


[
∂ε (φ1 + φ2 )(x̄) = (∂ε1 φ1 (x̄) + ∂ε2 φ2 (x̄)).
ε1 ≥ 0, ε2 ≥ 0,
ε1 + ε2 = ε

(iii) epi φ∗1 + epi φ∗2 is closed.

Proof. (i) ⇒ (ii): The proof follows along the lines of Theorem 2.115.
(ii) ⇒ (iii): Suppose that (ξ, γ) ∈ cl (epi φ∗1 + epi φ∗2 ). By Theorem 2.123 (i),
(ξ, γ) ∈ epi (φ1 + φ2 )∗ . Let x̄ ∈ dom φ1 ∩ dom φ2 . By Theorem 2.122, there
exists ε ≥ 0 such that

ξ ∈ ∂ε (φ1 + φ2 )(x̄) and γ = hξ, x̄i − (φ1 + φ2 )(x̄) + ε.

By (ii), there exist εi ≥ 0 and ξi ∈ ∂εi φi (x̄), i = 1, 2, such that

ξ = ξ1 + ξ2 and ε = ε 1 + ε2 .

Define γi = hξi , x̄i−φi (x̄)+εi , i = 1, 2. Then from Theorem 2.122, for i = 1, 2,


(ξi , γi ) ∈ epi φ∗i , which implies

(ξ, γ) = (ξ1 , γ1 ) + (ξ2 , γ2 ) ∈ epi φ∗1 + epi φ∗2 ,

thereby leading to (iii).


(iii) ⇒ (i): Suppose that there exists ξ ∈ Rn such that ξ ∈ dom (φ1 + φ2 )∗ .
Otherwise (i) holds trivially. By (iii),

epi (φ1 + φ2 )∗ = cl (epi φ∗1 + epi φ∗2 ) = epi φ∗1 + epi φ∗2 ,

which implies (ξ, (φ1 + φ2 )∗ (ξ)) ∈ epi φ∗1 + epi φ∗2 . Thus for i = 1, 2, there
exist (ξi , γi ) ∈ epi φ∗i such that

ξ = ξ1 + ξ2 and (φ1 + φ2 )∗ (ξ) = γ1 + γ2 ,

which implies there exists ξ¯ ∈ Rn such that


φ∗1 (ξ − ξ̄) + φ∗2 (ξ̄) ≤ (φ1 + φ2 )∗ (ξ).

Therefore,
(φ∗1 □ φ∗2 )(ξ) ≤ φ∗1 (ξ − ξ̄) + φ∗2 (ξ̄) ≤ (φ1 + φ2 )∗ (ξ).


By Theorem 2.107 and (iii),

(φ1 + φ2 )∗ (ξ) = cl (φ∗1  φ∗2 )(ξ) ≤ (φ∗1  φ∗2 )(ξ),

which along with the preceding condition leads to the exact infimal convolu-
tion, thereby establishing (i). 
Though it is obvious that under the closedness of epi φ∗1 + epi φ∗2 , one
can obtain the subdifferential Sum Rule by choosing ε = 0 in (ii) of the above
theorem, we present a detailed version of the result from Burachik and Jeyaku-
mar [20]. Below is an alternative approach to the Sum Rule, Theorem 2.91.

Theorem 2.126 Consider two proper lsc convex functions φi : Rn → R̄,


i = 1, 2, such that dom φ1 ∩ dom φ2 6= ∅. If epi φ∗1 + epi φ∗2 is closed, then

∂(φ1 + φ2 )(x̄) = ∂φ1 (x̄) + ∂φ2 (x̄), ∀ x̄ ∈ dom φ1 ∩ dom φ2 .

Proof. Let x̄ ∈ dom φ1 ∩ dom φ2 . It is easy to observe that

∂(φ1 + φ2 )(x̄) ⊃ ∂φ1 (x̄) + ∂φ2 (x̄).

To prove the result, we shall prove the converse inclusion. Suppose that
ξ ∈ ∂(φ1 + φ2 )(x̄). By Theorem 2.108,

(φ1 + φ2 )∗ (ξ) + (φ1 + φ2 )(x̄) = hξ, x̄i.

Therefore, the above condition along with the given hypothesis

(ξ, hξ, x̄i − (φ1 + φ2 )(x̄)) ∈ epi (φ1 + φ2 )∗ = epi φ∗1 + epi φ∗2 ,

which implies that there exist (ξi , γi ) ∈ epi φ∗i , i = 1, 2, such that

ξ = ξ1 + ξ2 and hξ, x̄i − (φ1 + φ2 )(x̄) = γ1 + γ2 .

Also, as (ξi , γi ) ∈ epi φ∗i , i = 1, 2, which along with the above conditions lead
to

φ∗1 (ξ1 ) + φ∗2 (ξ2 ) ≤ hξ, x̄i − (φ1 + φ2 )(x̄) = hξ1 , x̄i + hξ2 , x̄i − φ1 (x̄) − φ2 (x̄).

By the Fenchel–Young inequality, Proposition 2.103 (iv),

φ∗1 (ξ1 ) + φ∗2 (ξ2 ) ≥ hξ1 , x̄i + hξ2 , x̄i − φ1 (x̄) − φ2 (x̄),

which together with the preceding inequality leads to

φ∗1 (ξ1 ) + φ∗2 (ξ2 ) = hξ1 , x̄i + hξ2 , x̄i − φ1 (x̄) − φ2 (x̄).

Again by the Fenchel–Young inequality and the above equation,

φ∗1 (ξ1 ) + φ1 (x̄) − hξ1 , x̄i = hξ2 , x̄i − φ2 (x̄) − φ∗2 (ξ2 ) ≤ 0,


which by Theorem 2.108 yields ξ1 ∈ ∂φ1 (x̄). Along similar lines it can be
obtained that ξ2 ∈ ∂φ2 (x̄). Thus,

ξ = ξ1 + ξ2 ∈ ∂φ1 (x̄) + ∂φ2 (x̄),

which implies that

∂(φ1 + φ2 )(x̄) ⊂ ∂φ1 (x̄) + ∂φ2 (x̄),

thereby leading to the desired result. 


We end this chapter with an application of Theorem 2.126 to provide an
alternative assumption to establish equality in Proposition 2.39 (i).

Corollary 2.127 Consider convex sets F1 , F2 ⊂ Rn such that F1 ∩ F2 6= ∅.


If epi σF1 + epi σF2 is closed, then

NF1 ∩F2 (x̄) = NF1 (x̄) + NF2 (x̄), ∀ x̄ ∈ F1 ∩ F2 .

Proof. We know that for any convex set F ⊂ Rn , δF∗ = σF . Thus the condition

epi σF1 + epi σF2 is closed

is equivalent to

epi δF∗ 1 + epi δF∗ 2 is closed.

As the condition for Theorem 2.126 to hold is satisfied,

∂(δF1 + δF2 )(x̄) = ∂δF1 (x̄) + ∂δF2 (x̄), x̄ ∈ dom δF1 ∩ dom δF2 .

Because δF1 + δF2 = δF1 ∩F2 , the above equality condition along with the fact
that for any convex set F ⊂ Rn , ∂δF = NF , yields the desired result, that is,
NF1 ∩F2 (x̄) = NF1 (x̄) + NF2 (x̄), ∀ x̄ ∈ F1 ∩ F2 . 



Chapter 3
Basic Optimality Conditions Using
the Normal Cone

3.1 Introduction
Recall the convex optimization problem presented in Chapter 1
min f (x) subject to x ∈ C, (CP )
n n
where f : R → R is a convex function and C ⊂ R is a convex set. It is nat-
ural to think that f ′ (x, h) and ∂f (x) will play a major role in the process of
establishing the optimality conditions as these objects have been successful in
overcoming the difficulty posed by the absence of a derivative. In this chapter
we will not bother ourselves with extended-valued function but such a frame-
work can be easily adapted into the current framework. But use of extended-
valued convex functions might appear while doing some of the proofs, as one
will need to use the calculus rules for subdifferentials like the Sum Rule or the
Chain Rule. To begin our discussion more formally, we right away state the
following basic result.

Theorem 3.1 Consider the convex optimization problem (CP ). Then x̄ is a


point of minimizer of (CP ) if and only if either of the following two conditions holds:

(i) f ′ (x̄, d) ≥ 0, ∀ d ∈ TC (x̄) or,


(ii) 0 ∈ ∂f (x̄) + NC (x̄).

Proof. (i) As x̄ ∈ C and C is a convex set,

x̄ + λ(x − x̄) ∈ C, ∀ x ∈ C, ∀ λ ∈ [0, 1].

Also, as x̄ is a point of minimizer of (CP ), then for every x ∈ C

f (x̄ + λ(x − x̄)) ≥ f (x̄), ∀ λ ∈ [0, 1].

Therefore,

lim λ↓0 (f (x̄ + λ(x − x̄)) − f (x̄))/λ ≥ 0,


which implies

f ′ (x̄, x − x̄) ≥ 0, ∀ x ∈ C.

By Theorem 2.76, the directional derivative is sublinear in the direction and


thus

f ′ (x̄, d) ≥ 0, ∀ d ∈ cl cone(C − x̄).

By Theorem 2.35, TC (x̄) = cl cone(C − x̄) and therefore, the above inequality
becomes

f ′ (x̄, d) ≥ 0, ∀ d ∈ TC (x̄).

Conversely, suppose condition (i) holds. As f : Rn → R, applying Propo-


sition 2.83 and Theorem 2.79, for any d ∈ Rn there exists ξ ∈ ∂f (x̄) such
that

hξ, di = f ′ (x̄, d).

For every x ∈ C, x − x̄ ∈ TC (x̄). Therefore, the convexity of f along with the


above condition and condition (i) implies that for every x ∈ C, there exists
ξ ∈ ∂f (x̄) such that

f (x) − f (x̄) ≥ hξ, x − x̄i ≥ 0, ∀ x ∈ C,

thereby proving that x̄ is the point of minimizer of f over C.


(ii) As x̄ is a point of minimizer of f over C, we have that x̄ also solves the
problem

min (f + δC )(x).
x∈Rn

Hence, by the optimality conditions for the unconstrained optimization prob-


lem, Theorem 2.89,

0 ∈ ∂(f + δC )(x̄).

Because x̄ ∈ dom δC , by Proposition 2.14, ri dom δC = ri C is nonempty.


Also ri dom f = Rn and hence ri dom f ∩ ri dom δC 6= ∅. Now using the Sum
Rule, Theorem 2.91,

0 ∈ ∂f (x̄) + ∂δC (x̄),

which by the fact that ∂δC (x) = NC (x) leads to

0 ∈ ∂f (x̄) + NC (x̄).


Conversely, suppose that condition (ii) is satisfied, which means that there
exists ξ ∈ ∂f (x̄) such that −ξ ∈ NC (x̄), that is,
hξ, x − x̄i ≥ 0, ∀ x ∈ C.
Therefore, the convexity of f along with the above inequality yields
f (x) ≥ f (x̄), ∀ x ∈ C,
thereby leading to the desired result. 
By condition (ii) of the above theorem, there exists ξ ∈ ∂f (x̄) such that
h−ξ, xi ≤ h−ξ, x̄i, ∀ x ∈ C.
As x̄ ∈ C, the above condition yields that the support function to the set C
at −ξ is given by
σC (−ξ) = −hξ, x̄i.
Thus, condition (ii) is equivalent to the above condition.
Again, by condition (ii) of Theorem 3.1, there exists ξ ∈ ∂f (x̄) such that
−ξ ∈ NC (x̄), which can be equivalently expressed as
h(x̄ − αξ) − x̄, x − x̄i ≤ 0, ∀ x ∈ C, ∀ α ≥ 0.
Therefore, by Proposition 2.52, condition (ii) is equivalent to
x̄ = projC (x̄ − αξ), ∀ α ≥ 0.
We state the above discussion as the following result.
Theorem 3.2 Consider the convex optimization problem (CP ). Then x̄ is a
point of minimizer of (CP ) if and only if there exists ξ ∈ ∂f (x̄) such that
either σC (−ξ) = −hξ, x̄i or x̄ = projC (x̄ − αξ), ∀ α ≥ 0.
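For instance, let f (x) = x² and C = [1, +∞) ⊂ R, so that x̄ = 1 is the point of minimizer of (CP ). Here ∂f (1) = {2}, TC (1) = [0, +∞) and NC (1) = (−∞, 0], and both conditions of Theorem 3.1 are easily verified: f ′ (1, d) = 2d ≥ 0 for every d ∈ TC (1) and 0 = 2 + (−2) ∈ ∂f (1) + NC (1). Moreover, with ξ = 2, σC (−ξ) = supx≥1 (−2x) = −2 = −hξ, x̄i and projC (1 − 2α) = 1 for every α ≥ 0, in agreement with Theorem 3.2.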

3.2 Slater Constraint Qualification


Now consider the case where C is represented only through convex inequality
constraints. Observe that the equality affine constraints of the form
hj (x) = 0, j = 1, 2, . . . , l,
n
where hj : R → R, j = 1, 2, . . . , l, are affine functions can also be expressed
in the convex inequality form as
hj (x) ≤ 0, j = 1, 2, . . . , l,
−hj (x) ≤ 0, j = 1, 2, . . . , l.


Thus, we define

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}, (3.1)

where gi : Rn → R, i = 1, 2, . . . , m, are convex functions. In practice, this


is most often the case. In order to write an explicit optimality condition we
need to compute NC (x̄) and express it in terms of the constraint functions
gi , i = 1, 2, . . . , m. So how do we do that? In this respect, we present the
following result.

Proposition 3.3 Consider the set C as in (3.1). Assume that the active index
set at x̄, that is,

I(x̄) = {i ∈ {1, 2, . . . , m} : gi (x̄) = 0}

is nonempty. Let the Slater constraint qualification hold, that is, there exists
x̂ ∈ Rn such that gi (x̂) < 0, for i = 1, 2, . . . , m. Then
 
 X 
NC (x̄) = λi ξi ∈ Rn : ξi ∈ ∂gi (x̄), λi ≥ 0, i ∈ I(x̄) .
 
i∈I(x̄)

In order to prove the above proposition, we need to do a bit of work, which


we will do step by step. Denote the set on the right-hand side of the equality
by
 
 X 
b
S(x̄) = λi ξi ∈ Rn : ξi ∈ ∂gi (x̄), λi ≥ 0, i ∈ I(x̄) . (3.2)
 
i∈I(x̄)

One might get curious as to what are these λi , i = 1, 2, . . . , m, in the expres-


b
sion of the elements of S(x̄). These are the Lagrange multipliers, which are vital in
optimization and which we shall discuss in more detail. In order to establish
b
Proposition 3.3, that is, NC (x̄) = S(x̄), we will prove that S(x̄)b is a closed
convex cone for which we need the following lemma whose proof is as given in
van Tiel [110].

Proposition 3.4 Consider a nonempty compact set A ⊂ Rn with 0 6∈ A. Let


K be the cone generated by A, that is,

K = coneA = {λa ∈ Rn : λ ≥ 0, a ∈ A}.

Then K is a closed set.

Proof. Consider a sequence {xk } ⊂ K such that xk → x̃. To prove the result,
we need to show that x̃ ∈ K. As xk ∈ K, there exist λk ≥ 0 and ak ∈ A
such that xk = λk ak for every k. Because A is compact, {ak } is a bounded
sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, {ak } has a


convergent subsequence. Thus, without loss of generality, let ak → ã and as


A is closed, ã ∈ A. Because 0 6∈ A, it is simple to observe that there exists
α > 0 such that kak ≥ α for every a ∈ A. Hence,

|λk | = |λk | kak k / kak k ≤ (1/α) kλk ak k.

As λk ak → x̃, kλk ak k is bounded, thereby implying that {λk } is a bounded


sequence, that by the Bolzano–Weierstrass Theorem, Proposition 1.3, has a
convergent subsequence. Without loss of generality, assume that λk → λ̃. This
shows that

xk = λk ak → λ̃ã as k → +∞.

By the assumption xk → x̃ and as the limit is unique, λ̃ã = x̃. Hence x̃ ∈ K,


thereby establishing the result. 
b
We will now show that the set S(x̄) is a closed convex cone. This fact will
play a major role in the proof of Proposition 3.3.

Lemma 3.5 Assume that I(x̄) is nonempty and the Slater constraint quali-
b
fication holds. Then the set S(x̄) given by (3.2) is a closed convex cone.

Proof. Observe that S(x̄) b b


is a cone. To prove the convexity of S(x̄), let
b P j j j j
v1 , v2 (6= 0) ∈ S(x̄). Then vj = i∈I(x̄) λi ξi where λi ≥ 0 and ξi ∈ ∂gi (x̄),
b
i ∈ I(x̄) for j = 1, 2. As S(x̄) is a cone, to show that it is convex, by Theo-
b
rem 2.20 we just have to show that v1 + v2 ∈ S(x̄). Consider
X 
v1 + v2 = λ1i ξi1 + λ2i ξi2
i∈I(x̄)
X  
λ1i λ2i
= (λ1i + λ2i ) ξ 1
+ ξ 2
.
λ1i + λ2i i λ1i + λ2i i
i∈I(x̄)

Because ∂gi (x̄) is a convex set,

λ1i λ2
ξi1 + 1 i 2 ξi2 ∈ ∂gi (x̄).
λ1i + λi2 λi + λi

b
Hence, v1 + v2 ∈ S(x̄).
b
Finally, we have to show that S(x̄) is closed. Consider the function

g(x) = max{g1 (x), g2 (x), . . . , gm (x)}.

Moreover, as I(x̄) is nonempty, g(x̄) = 0 with J(x̄) = I(x̄), where

J(x̄) = {i ∈ {1, 2, . . . , m} : gi (x̄) = g(x̄)}.


Further, from the Max-Function Rule, Theorem 2.96,


 
[
∂g(x̄) = co  ∂gi (x̄) . (3.3)
i∈I(x̄)

b
We claim that S(x̄) = cone(∂g(x̄)), that is,

b
S(x̄) = {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(x̄)}. (3.4)

b
But before showing that S(x̄) is given as above and applying Proposition 3.4
b
to conclude that S(x̄) is closed, we first need to show that 0 6∈ ∂g(x̄).
As the Slater constraint qualification holds, there exists x̂ such that
gi (x̂) < 0 for every i = 1, 2, . . . , m. Hence g(x̂) < 0. By the convexity of
g,

hξ, x̂ − x̄i ≤ g(x̂) − g(x̄), ∀ ξ ∈ ∂g(x̄).

Because J(x̄) = I(x̄) is nonempty, for every ξ ∈ ∂g(x̄),

hξ, x̂ − x̄i < 0.

As x̂ 6= x̄, it is clear that 0 6∈ ∂g(x̄). Otherwise, if 0 ∈ ∂g(x̄), the above inequal-


ity will be violated. Hence, observe that 0 6∈ ∂gi (x̄) for every i ∈ J(x̄) = I(x̄).
Because S(x̄)b b
is a cone, 0 ∈ S(x̄). For λ = 0,

0 ∈ {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(x̄)}.

b
Consider v ∈ S(x̄) with v 6= 0. We will show that

v ∈ {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(x̄)}.

b
As vP∈ S(x̄), there exist λi ≥ 0 and ξi ∈ ∂gi (x̄), i ∈ I(x̄) such that
v = i∈I(x̄) λi ξi . Because v 6= 0 and 0 6∈ ∂gi (x̄) for all i ∈ I(x̄), it is clear that
P
all the λi , i ∈ I(x̄) cannot be simultaneously zero and hence i∈I(x̄) λi > 0.
P P
Let α = i∈I(x̄) λi and thus i∈I(x̄) λi /α = 1. Therefore,
 
1 X λi [
v= ξi ∈ co  ∂gi (x̄) ,
α α
i∈I(x̄) i∈I(x̄)

which by (3.3) implies that v ∈ α ∂g(x̄). Hence,

b
S(x̄) ⊆ {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(x̄)}.

Conversely, consider v ∈ {λξ ∈ Rn : λ ≥ 0, ξ ∈ ∂g(x̄)} with v 6= 0.


Therefore, v = λξ for some λ ≥ 0, ξ ∈ ∂g(x̄). The condition (3.3) yields that


there exist µi ≥ 0 and ξi ∈ ∂gi (x̄) for i ∈ I(x̄) such that
X
ξ= µi ξi
i∈I(x̄)
P
with i∈I(x̄) µi = 1. Therefore,
X X
v= λµi ξi = λ′i ξi ,
i∈I(x̄) i∈I(x̄)

b
where λ′i = λµi ≥ 0 for i ∈ I(x̄). Hence, v ∈ S(x̄). Because v was arbitrary,
(3.4) holds, which along with the fact that 0 6∈ ∂g(x̄) and Proposition 3.4
b
yields that S(x̄) is closed. 

Remark 3.6 It may be noted here that S(x̄) b is proved to be closed under
the Slater constraint qualification, which is equivalent to
[
0 6∈ co ∂gi (x̄).
i∈I(x̄)

This observation was made by Wolkowicz [112]. In the absence of such condi-
b
tions, S(x̄) need not be closed.

Now we turn to establish Proposition 3.3 according to which, if the Slater


constraint qualification holds, then
b
NC (x̄) = S(x̄).

b
Proof of Proposition 3.3. First we will prove that S(x̄) ⊆ NC (x̄). Consider
b
v ∈ S(x̄). Thus, there exist λi ≥ 0 and ξi ∈ ∂gi (x̄) for i ∈ I(x̄) such that
any P
v = i∈I(x̄) λi ξi . Hence, for any x ∈ C,
X
hv, x − x̄i = λi hξi , x − x̄i.
i∈I(x̄)

By the convexity of gi , i ∈ I(x̄),

hξi , x − x̄i ≤ gi (x) − gi (x̄) ≤ 0, ∀ x ∈ C.

Thus hv, x − x̄i ≤ 0 for every x ∈ C, thereby showing that v ∈ NC (x̄).


b
Conversely, suppose that v ∈ NC (x̄). We have to show that v ∈ S(x̄). On
b b
the contrary, assume that v 6∈ S(x̄). As S(x̄) is a closed convex cone, by the
strict separation theorem, Theorem 2.26 (iii), there exists w ∈ Rn with w 6= 0
such that
b
hw, ξi ≤ 0 < hw, vi, ∀ ξ ∈ S(x̄).

S 
b
As S(x̄) = cone co i∈I(x̄) ∂gi (x̄) , for each i ∈ I(x̄), hw, ξi i ≤ 0 for every
ξi ∈ ∂gi (x̄), which along with Theorem 2.79 yields

gi′ (x̄, w) ≤ 0, ∀ i ∈ I(x̄). (3.5)

Define

K = {u ∈ Rn : gi′ (x̄, u) < 0, ∀ i ∈ I(x̄)}.

Our first step is to show that K is nonempty. By the Slater constraint qual-
ification, there exists x̂ such that gi (x̂) < 0 for every i = 1, 2, . . . , m, and
corresponding to that x̂, set u = x̂ − x̄. By the convexity of each gi and
Theorem 2.79,

gi′ (x̄, x̂ − x̄) ≤ gi (x̂) − gi (x̄), ∀ i ∈ I(x̄),

which implies

gi′ (x̄, x̂ − x̄) < 0, ∀ i ∈ I(x̄).

Hence, x̂ − x̄ ∈ K, thereby showing that K is nonempty. Observe that for any


u ∈ K, there exists λ > 0 sufficiently small such that gi (x̄ + λu) < 0 for all
i = 1, 2, . . . , m, which implies x̄ + λu ∈ C. Therefore,
1
u∈ (C − x̄) ⊆ cone(C − x̄) ⊆ cl cone(C − x̄).
λ
By Theorem 2.35, u ∈ TC (x̄). Because TC (x̄) is closed, cl K ⊆ TC (x̄). Also,
as K is nonempty, it is simple to show that

cl K = {u ∈ Rn : gi′ (x̄, u) ≤ 0, ∀ i ∈ I(x̄)}.

By (3.5), w ∈ cl K and hence, w ∈ TC (x̄). As v ∈ NC (x̄), hv, wi ≤ 0, thereby


contradicting the fact that hv, wi > 0 and thus establishing the result. 
Recall condition (ii) from Theorem 3.1, that is,

0 ∈ ∂f (x̄) + NC (x̄).

By combining it with Proposition 3.3, we can conclude that under the Slater
constraint qualification, x̄ is a point of minimizer of the convex programming
problem (CP ) with C given by (3.1) if and only if there exists λ̄ ∈ Rm + such
that
X
0 ∈ ∂f (x̄) + λ̄i ∂gi (x̄).
i∈I(x̄)

Setting λ̄i = 0 for i 6∈ I(x̄), the above expression can be rewritten as


m
X
0 ∈ ∂f (x̄) + λ̄i ∂gi (x̄) and λ̄i gi (x̄) = 0, i = 1, 2, . . . , m.
i=1


The above two expressions form the celebrated Karush–Kuhn–Tucker (KKT)


optimality conditions for the convex programming problem (CP ) with C given
by (3.1). The vector λ̄ ∈ Rm+ is called a Lagrange multiplier or a Karush–Kuhn–
Tucker (KKT) multiplier. The second condition is known as the complemen-
tary slackness condition.
Now suppose that the KKT optimality conditions are satisfied. Then there
exist ξ0 ∈ ∂f (x̄) and ξi ∈ ∂gi (x̄), i = 1, 2, . . . , m, such that
0 = ξ0 + Σ_{i=1}^m λi ξi.    (3.6)

Therefore, by the convexity of f and gi, i = 1, 2, . . . , m, for every x ∈ Rn,

f(x) − f(x̄) ≥ ⟨ξ0, x − x̄⟩,
gi(x) − gi(x̄) ≥ ⟨ξi, x − x̄⟩, i = 1, 2, . . . , m.

The above inequalities along with (3.6) yield that for every x ∈ Rn,

f(x) − f(x̄) + Σ_{i=1}^m λi (gi(x) − gi(x̄)) ≥ ⟨ξ0, x − x̄⟩ + Σ_{i=1}^m λi ⟨ξi, x − x̄⟩ = 0.    (3.7)

The above inequality holds, in particular, for x ∈ C ⊂ Rn . Invoking the


complementary slackness condition along with the feasibility of x ∈ C, the
condition (3.7) reduces to

f (x) ≥ f (x̄), ∀ x ∈ C.

Thus, x̄ is a point of minimizer of (CP ).


This discussion can be summed up in the form of the following theorem.

Theorem 3.7 Consider the convex programming problem (CP ) with C given
by (3.1). Assume that the Slater constraint qualification holds. Then x̄ is a
point of minimizer of (CP ) if and only if there exist λ̄i ≥ 0, i = 1, 2, . . . , m, such that

0 ∈ ∂f(x̄) + Σ_{i=1}^m λ̄i ∂gi(x̄) and λ̄i gi(x̄) = 0, i = 1, 2, . . . , m.
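As a quick numerical illustration of Theorem 3.7 (the problem data below are chosen only for demonstration and are not part of the text), consider the smooth convex program min x1² + x2² subject to g(x) = 1 − x1 − x2 ≤ 0. The Slater condition holds (take x̂ = (1, 1)), the minimizer is x̄ = (1/2, 1/2) with multiplier λ̄ = 1, and the sketch below checks the KKT conditions with numpy.

    import numpy as np

    # Hypothetical example: min x1^2 + x2^2  s.t.  1 - x1 - x2 <= 0.
    # Known solution x_bar = (0.5, 0.5) with KKT multiplier lambda_bar = 1.
    x_bar = np.array([0.5, 0.5])
    lam_bar = 1.0

    grad_f = 2 * x_bar                   # gradient of the objective at x_bar
    grad_g = np.array([-1.0, -1.0])      # gradient of g(x) = 1 - x1 - x2
    g_val = 1.0 - x_bar.sum()            # constraint value at x_bar

    # KKT conditions of Theorem 3.7 in the smooth case:
    # 0 = grad_f + lambda_bar * grad_g  and  lambda_bar * g(x_bar) = 0.
    print(grad_f + lam_bar * grad_g)     # -> [0. 0.]
    print(lam_bar * g_val)               # -> 0.0
    print(lam_bar >= 0 and g_val <= 0)   # -> True (sign and feasibility)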

It is obvious that the computation of the normal cone in Proposition 3.3


plays a major role in the derivation of the KKT optimality conditions. What
is shown by the computation of the normal cone in Proposition 3.3 is that the
Lagrange multipliers are not just auxiliary multipliers that help us convert a constrained problem into an unconstrained one but are related to the geometry of the feasible set.


Remark 3.8 In Proposition 3.3, we have seen how to compute the normal
cone when the convex inequality constraints need not be smooth. Now if gi ,
i = 1, 2, . . . , m, are differentiable and the Slater constraint qualification holds,
then from Proposition 3.3
NC(x̄) = {v ∈ Rn : v = Σ_{i∈I(x̄)} λi ∇gi(x̄), λi ≥ 0, ∀ i ∈ I(x̄)}.    (3.8)

This can actually be computed easily. Note that v ∈ NC(x̄) if and only if x̄ is a point of minimizer of the problem

min −⟨v, x⟩ subject to gi(x) ≤ 0, i = 1, 2, . . . , m.

As the Slater condition holds, by Theorem 3.7 there exist λi ≥ 0, i = 1, 2, . . . , m, such that

−v + Σ_{i=1}^m λi ∇gi(x̄) = 0.

By the complementary slackness condition, λi = 0, i ∉ I(x̄); thus the above relation becomes

−v + Σ_{i∈I(x̄)} λi ∇gi(x̄) = 0.

Hence, any v ∈ NC (x̄) belongs to the set on the right-hand side. One can
simply check that any element on the right-hand side is also an element in the
normal cone. From (3.8), it is simple to see that NC (x̄) is a finitely generated
cone with {∇gi (x̄) : i ∈ I(x̄)} being the set of generators. Thus, NC (x̄) is
polyhedral when the gi , i = 1, 2, . . . , m, are differentiable and the Slater
constraint qualification holds.
Is the normal cone also polyhedral if the Slater constraint qualification holds but the constraint functions gi, i = 1, 2, . . . , m, are not differentiable? What is seen from Proposition 3.3 is that in the case of nondifferentiable constraints, NC(x̄) can be represented as

NC(x̄) = { Σ_{i∈I(x̄)} λi ξi ∈ Rn : λi ≥ 0, ξi ∈ ∂gi(x̄), i ∈ I(x̄) }
       = ∪_{ξi∈∂gi(x̄)} { Σ_{i∈I(x̄)} λi ξi ∈ Rn : λi ≥ 0, i ∈ I(x̄) },

that is, the union of a family of polyhedral cones.


We will now show by an example that even though NC (x̄) is a union of a
family of polyhedral cones, it itself need not be polyhedral. Consider the set
C ⊆ R3 given as
C = {x ∈ R3 : √(x1² + x2²) ≤ −x3, x3 ≤ 0}.


FIGURE 3.1: NC(x̄) is not polyhedral (the set C and the cone NC(x̄) at x̄ = (0, 0, 0) in R3).

It is clear that C is described by the constraints


√(x1² + x2²) + x3 ≤ 0,
x3 ≤ 0.

Each of these constraint functions is convex. It is simple to see that the Slater condition
holds. Just take the point x̂ = (0, 0, −1). It is also simple to see that the first
constraint is not differentiable at x̄ = (0, 0, 0). However, from the geometry,


Figure 3.1, it is simple to observe that


NC(x̄) = {v ∈ R3 : √(v1² + v2²) ≤ v3, v3 ≥ 0}.

It is easy to observe that this cone, which is also known as the second-order
cone, is not polyhedral as it has an infinite number of generators and hence is
not finitely generated.
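A small numerical check of this normal cone formula is sketched below (this is an illustration, not part of the text): it samples random feasible points x ∈ C and random vectors v with √(v1² + v2²) ≤ v3, and verifies that ⟨v, x − x̄⟩ ≤ 0 at x̄ = (0, 0, 0), as the definition of the normal cone requires.

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_C(n):
        # points of C: sqrt(x1^2 + x2^2) <= -x3, x3 <= 0
        r = rng.uniform(0, 1, n); theta = rng.uniform(0, 2*np.pi, n)
        x1, x2 = r*np.cos(theta), r*np.sin(theta)
        x3 = -np.sqrt(x1**2 + x2**2) - rng.uniform(0, 1, n)
        return np.stack([x1, x2, x3], axis=1)

    def sample_cone(n):
        # vectors v with sqrt(v1^2 + v2^2) <= v3 (the claimed normal cone)
        r = rng.uniform(0, 1, n); theta = rng.uniform(0, 2*np.pi, n)
        v1, v2 = r*np.cos(theta), r*np.sin(theta)
        v3 = np.sqrt(v1**2 + v2**2) + rng.uniform(0, 1, n)
        return np.stack([v1, v2, v3], axis=1)

    X, V = sample_C(1000), sample_cone(1000)
    # <v, x - x_bar> with x_bar = 0: every pairwise inner product should be <= 0
    print(np.max(X @ V.T) <= 1e-12)   # -> True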

3.3 Abadie Constraint Qualification


From the previous section it is obvious that to derive the KKT conditions an
important feature is that the Slater constraint qualification is satisfied. But
what happens if the Slater constraint qualification is not satisfied? Is there
any other route to derive the KKT conditions? In this direction, we introduce
what is known as the Abadie constraint qualification. Consider the problem
(CP ) with C given by (3.1), that is

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}

where gi , i = 1, 2, . . . , m, are convex functions. Then the Abadie constraint


qualification is said to hold at x̄ ∈ C if

TC (x̄) = {v ∈ Rn : gi′ (x̄, v) ≤ 0, ∀ i ∈ I(x̄)}.



As C is convex, (TC(x̄))° = NC(x̄) and the expression (ii) in Theorem 3.1 can be written as

0 ∈ ∂f(x̄) + (TC(x̄))°.

If the Abadie constraint qualification holds, we can compute NC(x̄) as

NC(x̄) = (S(x̄))°,

where (S(x̄))° denotes the polar cone of the cone

S(x̄) = {v ∈ Rn : gi′(x̄, v) ≤ 0, ∀ i ∈ I(x̄)}.

It can be easily verified that S(x̄) is a closed convex cone. Also observe that

TC (x̄) ⊂ {v ∈ Rn : gi′ (x̄, v) ≤ 0, ∀ i ∈ I(x̄)}

is always satisfied. So one may simply consider the reverse inclusion as the Abadie constraint qualification. We will now compute (S(x̄))°. But before we do that, let us convince ourselves through an example that the Abadie


constraint qualification can hold even if the Slater constraint qualification


fails. Consider

C = {x ∈ R : |x| ≤ 0, x ≤ 0}.

Here, g1 (x) = |x|, g2 (x) = x and of course C = {0}. Let us set x̄ = 0. This
shows that TC (x̄) = {0}. Further, because both constraints are active at x̄,

S(x̄) = {v ∈ R : g1′(x̄, v) ≤ 0, g2′(x̄, v) ≤ 0}
      = {v ∈ R : g1′(x̄, v) ≤ 0, ⟨∇g2(x̄), v⟩ ≤ 0}
      = {v ∈ R : |v| ≤ 0, v ≤ 0}
      = {0}.

Hence TC (x̄) = S(x̄), showing that the Abadie constraint qualification holds
while it is clear that the Slater constraint qualification does not hold.
Now we present the following result.
Proposition 3.9 (S(x̄))° = cl Ŝ(x̄).

Proof. From the relation (3.2),

Ŝ(x̄) = { Σ_{i∈I(x̄)} λi ξi : λi ≥ 0, ξi ∈ ∂gi(x̄), i ∈ I(x̄) }

is a convex cone from Lemma 3.5. Recall from the proof of Lemma 3.5 that Ŝ(x̄) was shown to be closed under the Slater constraint qualification. In the absence of the Slater constraint qualification, Ŝ(x̄) need not be closed. First we show that cl Ŝ(x̄) ⊆ (S(x̄))°. Consider any v ∈ Ŝ(x̄), which implies there exist λi ≥ 0 and ξi ∈ ∂gi(x̄) for i ∈ I(x̄) such that v = Σ_{i∈I(x̄)} λi ξi. Consider any element w ∈ S(x̄), that is, gi′(x̄, w) ≤ 0 for i ∈ I(x̄). Hence for every i ∈ I(x̄), by Theorem 2.79, ⟨ξi, w⟩ ≤ 0 for every ξi ∈ ∂gi(x̄), which implies

⟨ Σ_{i∈I(x̄)} λi ξi, w ⟩ ≤ 0,

that is, ⟨v, w⟩ ≤ 0. Because w ∈ S(x̄) was arbitrarily chosen, Ŝ(x̄) ⊆ (S(x̄))°, which by closedness of (S(x̄))° leads to cl Ŝ(x̄) ⊆ (S(x̄))°.
To complete the proof, we will establish the reverse inclusion, that is, (S(x̄))° ⊆ cl Ŝ(x̄). On the contrary, assume that (S(x̄))° ⊈ cl Ŝ(x̄), which implies there exists w ∈ (S(x̄))° with w ∉ cl Ŝ(x̄). As cl Ŝ(x̄) is a closed convex cone, by the strict separation theorem, Theorem 2.26 (iii), there exists v ∈ Rn with v ≠ 0 such that

sup_{ξ∈cl Ŝ(x̄)} ⟨v, ξ⟩ < ⟨v, w⟩.

Because 0 ∈ cl Ŝ(x̄), ⟨v, w⟩ > 0. We claim that v ∈ (cl Ŝ(x̄))°, that is, ⟨v, ξ⟩ ≤ 0 for every ξ ∈ cl Ŝ(x̄). If v ∉ (cl Ŝ(x̄))°, then there exists ξ̃ ∈ cl Ŝ(x̄) such that ⟨v, ξ̃⟩ > 0. For every λ > 0, λ⟨v, ξ̃⟩ = ⟨v, λξ̃⟩ > 0. Because cl Ŝ(x̄) is a cone, λξ̃ ∈ cl Ŝ(x̄) for λ > 0, which means that as λ becomes sufficiently large, the inequality

⟨v, λξ̃⟩ < ⟨v, w⟩

will be violated. Thus, v ∈ (cl Ŝ(x̄))°. Further, observe that for i ∈ I(x̄), ξi ∈ Ŝ(x̄), where ξi ∈ ∂gi(x̄). Therefore, ⟨v, ξi⟩ ≤ 0 for every ξi ∈ ∂gi(x̄), i ∈ I(x̄), which implies that gi′(x̄, v) ≤ 0 for every i ∈ I(x̄). This shows that v ∈ S(x̄) and therefore, ⟨v, w⟩ ≤ 0 because w ∈ (S(x̄))°. This leads to a contradiction, thereby establishing the result. □
The result below presents the KKT optimality conditions under the Abadie
constraint qualification.

Theorem 3.10 Consider the convex programming problem (CP ) with C


given by (3.1). Let x̄ be a point of minimizer of (CP ) and assume that the
Abadie constraint qualification holds at x̄. Then
0 ∈ ∂f(x̄) + cl Ŝ(x̄).    (3.9)

Conversely, if (3.9) holds for some x̄ ∈ Rn , then x̄ is a point of minimizer of


(CP ). Moreover, the standard KKT optimality conditions hold at x̄ if either
Ŝ(x̄) is closed or the functions gi, i ∈ I(x̄), are smooth functions.

Proof. If the Abadie constraint qualification holds at x̄, then using Proposi-
tion 3.9, the relation (3.9) holds.
Conversely, suppose that (3.9) holds at x̄. By the convexity of gi , i ∈ I(x̄),
for every ξi ∈ ∂gi (x̄),

⟨ξi, x − x̄⟩ ≤ gi(x) − gi(x̄) ≤ 0, ∀ x ∈ C.

For every v ∈ Ŝ(x̄), there exist λi ≥ 0 and ξi ∈ ∂gi(x̄) for i ∈ I(x̄) such that v = Σ_{i∈I(x̄)} λi ξi. Therefore, by the above inequality,

⟨v, x − x̄⟩ = Σ_{i∈I(x̄)} ⟨λi ξi, x − x̄⟩ ≤ 0, ∀ x ∈ C,

which implies v ∈ NC(x̄). Thus Ŝ(x̄) ⊆ NC(x̄), which along with the fact that NC(x̄) is closed implies that cl Ŝ(x̄) ⊆ NC(x̄). Therefore, (3.9) yields

0 ∈ ∂f (x̄) + NC (x̄)

and hence, by Theorem 3.1 (ii), x̄ is a point of minimizer of the convex pro-
gramming problem (CP ).


If gi, i ∈ I(x̄), are smooth,

Ŝ(x̄) = { Σ_{i∈I(x̄)} λi ∇gi(x̄) ∈ Rn : λi ≥ 0, i ∈ I(x̄) }.

Thus, Ŝ(x̄) is a finitely generated cone and hence is closed. Therefore, it is clear that when either Ŝ(x̄) is closed or gi, i ∈ I(x̄), are smooth functions, then under the Abadie constraint qualification, the standard KKT conditions are satisfied. □

3.4 Convex Problems with Abstract Constraints


After studying the convex programming problem involving only inequality
constraints, in this section we turn our attention to a slightly modified version
of (CP ), which we denote as (CP 1) given as
min f (x)
subject to gi (x) ≤ 0, i = 1, 2, . . . , m, (CP 1)
x ∈ X,
where we have the additional abstract constraint x ∈ X with X as a closed
convex subset of Rn . The question is how to write down the KKT conditions
for the problem (CP 1).

Theorem 3.11 Let us consider the problem (CP 1). Assume the Slater-type
constraint qualification, that is, there exists x̂ ∈ ri X such that gi (x̂) < 0 for
i = 1, 2, . . . , m. Then the KKT optimality conditions are necessary as well as
sufficient at a point of minimizer x̄ of (CP 1) and are given as
0 ∈ ∂f(x̄) + Σ_{i=1}^m λi ∂gi(x̄) + NX(x̄) and λi gi(x̄) = 0, i = 1, 2, . . . , m.

Proof. The problem (CP 1) can be written as

min f (x) subject to x ∈ C ∩ X,

where C is given by (3.1). Thus if x̄ is a point of minimizer of (CP 1), then x̄


solves the unconstrained problem

min_{x∈Rn} (f + δ_{C∩X})(x),

that is, x̄ solves

min_{x∈Rn} (f + δC + δX)(x).


By the optimality condition for unconstrained problem, Theorem 2.89,

0 ∈ ∂(f + δC + δX )(x̄).

The fact that ri dom f = Rn along with the Slater-type constraint qualifica-
tion and Propositions 2.15 and 2.67 imply that x̂ ∈ ri dom f ∩ ri C ∩ ri X. In-
voking the Sum Rule, Theorem 2.91, along with the facts that ∂δC (x̄) = NC (x̄)
and ∂δX (x̄) = NX (x̄), the above relation leads to

0 ∈ ∂f (x̄) + NC (x̄) + NX (x̄).

The Slater-type constraint qualification implies the Slater constraint qualifi-


cation which along with Proposition 3.3 yields
NC(x̄) = { Σ_{i∈I(x̄)} λi ∂gi(x̄) : λi ≥ 0, i ∈ I(x̄) }.

By choosing λi = 0, i ∉ I(x̄), the desired KKT optimality conditions are


obtained.
Conversely, by the optimality condition, there exist ξ0 ∈ ∂f (x̄) and
ξi ∈ ∂gi (x̄), i = 1, 2 . . . , m, such that
−ξ0 − Σ_{i=1}^m λi ξi ∈ NX(x̄),

that is,

⟨ξ0, x − x̄⟩ + Σ_{i=1}^m λi ⟨ξi, x − x̄⟩ ≥ 0, ∀ x ∈ X.

The convexity of f and gi , i = 1, 2, . . . , m, along with the above condition


leads to
f(x) − f(x̄) + Σ_{i=1}^m λi gi(x) − Σ_{i=1}^m λi gi(x̄) ≥ 0, ∀ x ∈ X.

In particular, for any x ∈ C ∩ X, the above inequality along with the complementary slackness condition reduces to

f(x) ≥ f(x̄), ∀ x ∈ C ∩ X,

thereby establishing that x̄ is a point of minimizer of (CP 1). □


Next consider the problem
min f (x)
subject to x ∈ C = {x ∈ Rn : Ax = b}, (CP 2)
where A is an m × n matrix and b ∈ Rm. It is clear that C is a polyhedron.
Further, a point x̄ ∈ C is a point of minimizer of f over C if and only if

0 ∈ ∂f (x̄) + NC (x̄).


If v ∈ NC (x̄), then x̄ solves the following smooth problem

min −⟨v, x⟩
subject to Ax = b.

As the constraints are affine, the KKT optimality conditions for this problem automatically hold, that is, there exists λ ∈ Rm such that

−v + Aᵀλ = 0,

that is, v = Aᵀλ. Therefore,

NC(x̄) = {v ∈ Rn : v = Aᵀλ, λ ∈ Rm}.

Hence, the optimality condition is that there exists λ ∈ Rm such that

−Aᵀλ ∈ ∂f(x̄).

Using the convexity of f , the above relation implies that x̄ is a point of mini-
mizer of (CP 2). This discussion can be stated as the following theorem.

Theorem 3.12 Consider the problem (CP 2). Then x̄ is a point of minimizer
of (CP 2) if and only if there exists λ ∈ Rm such that

−Aᵀλ ∈ ∂f(x̄).
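To see Theorem 3.12 in action numerically (the data below are illustrative and not from the text), take the smooth case f(x) = ½‖x‖², for which ∂f(x̄) = {x̄}. The minimizer of f over {x : Ax = b} is x̄ = Aᵀ(AAᵀ)⁻¹b, and λ = −(AAᵀ)⁻¹b satisfies −Aᵀλ = x̄ ∈ ∂f(x̄), which the sketch below verifies.

    import numpy as np

    # Hypothetical data for (CP2): min 0.5*||x||^2  s.t.  A x = b.
    A = np.array([[1.0, 2.0, 0.0],
                  [0.0, 1.0, 1.0]])
    b = np.array([1.0, 2.0])

    x_bar = A.T @ np.linalg.solve(A @ A.T, b)   # minimum-norm solution
    lam = -np.linalg.solve(A @ A.T, b)          # multiplier from Theorem 3.12

    print(np.allclose(A @ x_bar, b))            # feasibility: True
    print(np.allclose(-A.T @ lam, x_bar))       # -A^T lam lies in ∂f(x_bar) = {x_bar}: True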

3.5 Max-Function Approach


Until now the convex programming problems were tackled without modifying the constraint sets. But every convex programming problem can be expressed as a nonsmooth convex programming problem with fewer constraints. Consider the problem (CP ) where C is given by (3.1), that is, convex inequality constraints. Assume that the objective function f and the constraint functions gi, i = 1, 2, . . . , m, are convex and smooth. Then (CP ) can be equivalently posed as a problem with only one constraint, which is given as

min f(x) subject to g(x) ≤ 0,    (CPeq)

where g : Rn → R is defined as

g(x) = max{g1 (x), g2 (x), . . . , gm (x)}.

Hence g is intrinsically nonsmooth. We would like to invite the reader to


deduce the optimality condition of the problem (CP ) using (CPeq ). It is clear
that one needs to use the Max-Function Rule, Theorem 2.96, for evaluating
the subdifferential of the max-function. Thus, at a very fundamental level,


every convex programming problem (smooth or nonsmooth) is a nonsmooth


convex programming problem.
The Max-Function Rule is also in some sense very fundamental to convex
programming problems as can be seen in the result below. In the following
result, we derive the KKT optimality conditions for the convex programming
problem (CP ) with C given by (3.1) using the max-function approach.

Theorem 3.13 Consider the convex programming problem (CP ) with C


given by (3.1). Assume that the Slater constraint qualification holds. Then x̄ is
a point of minimizer of (CP ) if and only if there exist λ̄i ≥ 0, i = 1, 2, . . . , m, such that

0 ∈ ∂f(x̄) + Σ_{i=1}^m λ̄i ∂gi(x̄) and λ̄i gi(x̄) = 0, i = 1, 2, . . . , m.

Proof. As x̄ is a point of minimizer of (CP ), it also solves the unconstrained


problem

min F (x) subject to x ∈ Rn ,

where F (x) = max{f (x) − f (x̄), g1 (x), g2 (x), . . . , gm (x)}. Then by the uncon-
strained optimality condition, Theorem 2.89,

0 ∈ ∂F (x̄).

Applying the Max-Function Rule, Theorem 2.96,


0 ∈ co {∂f(x̄) ∪ ( ∪_{i∈I(x̄)} ∂gi(x̄) )},

where I(x̄) is the active index set at x̄. Therefore, there exist λi ≥ 0, i ∈ {0} ∪ I(x̄), satisfying Σ_{i∈{0}∪I(x̄)} λi = 1 such that

0 ∈ λ0 ∂f(x̄) + Σ_{i∈I(x̄)} λi ∂gi(x̄).    (3.10)

We claim that λ0 ≠ 0. On the contrary, assume that λ0 = 0. Thus, the above inclusion reduces to

0 ∈ Σ_{i∈I(x̄)} λi ∂gi(x̄),

that is, there exist ξi ∈ ∂gi(x̄), i ∈ I(x̄), such that

0 = Σ_{i∈I(x̄)} λi ξi.    (3.11)

© 2012 by Taylor & Francis Group, LLC


3.6 Cone-Constrained Convex Programming 161

By the convexity of gi , i ∈ I(x̄),

gi(x) = gi(x) − gi(x̄) ≥ ⟨ξi, x − x̄⟩, ∀ x ∈ Rn, i ∈ I(x̄),

which along with (3.11) implies that

Σ_{i∈I(x̄)} λi gi(x) ≥ 0, ∀ x ∈ Rn.

As the Slater constraint qualification holds, there exists x̂ ∈ Rn such that


gi (x̂) < 0, i = 1, 2, . . . , m. Thus,
Σ_{i∈I(x̄)} λi gi(x̂) < 0,

which is a contradiction of the preceding inequality. Therefore, λ0 ≠ 0 and hence dividing (3.10) throughout by λ0 yields

0 ∈ ∂f(x̄) + Σ_{i∈I(x̄)} λ̄i ∂gi(x̄),

where λ̄i = λi/λ0, i ∈ I(x̄). Taking λ̄i = 0, i ∉ I(x̄), the above condition becomes

0 ∈ ∂f(x̄) + Σ_{i=1}^m λ̄i ∂gi(x̄),

that is, the KKT optimality condition. It is easy to observe that

λ̄i gi (x̄) = 0, i = 1, 2, . . . , m,

thus yielding the desired conditions. The sufficiency part can be worked out
using the convexity of the functions, as done in the previous KKT optimality
theorems. □
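The max-function reformulation can also be explored numerically. In the sketch below (an illustration with made-up data, not from the text), g(x) = max{g1(x), g2(x)} with smooth convex g1, g2, and at a point where both pieces are active the Max-Function Rule gives ∂g(x) = co{∇g1(x), ∇g2(x)}; the code checks the subgradient inequality g(y) ≥ g(x) + ⟨ξ, y − x⟩ for random convex combinations ξ of the two gradients.

    import numpy as np

    rng = np.random.default_rng(1)

    g1 = lambda x: x[0]**2 + x[1] - 1.0      # smooth convex piece
    g2 = lambda x: -x[1]                     # smooth convex (affine) piece
    g  = lambda x: max(g1(x), g2(x))         # the max-function of (CPeq)

    x = np.array([1.0, 0.0])                 # here g1(x) = g2(x) = 0: both pieces active
    grad_g1 = np.array([2*x[0], 1.0])
    grad_g2 = np.array([0.0, -1.0])

    ok = True
    for _ in range(2000):
        t = rng.uniform()                    # xi = t*grad_g1 + (1-t)*grad_g2 lies in ∂g(x)
        xi = t*grad_g1 + (1-t)*grad_g2
        y = x + rng.normal(size=2)
        ok &= g(y) >= g(x) + xi @ (y - x) - 1e-10
    print(ok)                                # -> True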

3.6 Cone-Constrained Convex Programming


A convex optimization problem can be posed in a more general format. Con-
sider a nonempty closed convex cone S ⊂ Rm . Then consider the problem
min f (x) subject to G(x) ∈ −S, (CCP )
where f : Rn → R is a convex function and G : Rn → Rm is an S-convex function; that is, for any x, y ∈ Rn and λ ∈ [0, 1],

(1 − λ)G(x) + λG(y) − G((1 − λ)x + λy) ∈ S.


In particular, if S = Rm+, then the above problem reduces to (CP ) with C


given by (3.1). If S = Rs+ × {0}m−s , then (CCP ) reduces to a convex problem
with both inequality and equality constraints. If S is not these two cones, then
(CCP ) is called a cone-constrained problem. We will now derive optimality
conditions for a slightly more general problem that has an added abstract
constraint. Consider the problem
min f (x) subject to G(x) ∈ −S, x ∈ X, (CCP 1)
where X ⊂ Rn is a nonempty closed convex set. There are many ways to
approach this problem. Here we demonstrate one approach. Define

C = {x ∈ Rn : G(x) ∈ −S}.

As S and X are nonempty convex sets, by Proposition 2.14, ri S and ri X are


both nonempty. Assume that the Slater-type constraint qualification holds,
that is, there exists x̂ ∈ ri X such that G(x̂) ∈ −ri S. The most natural
approach is to observe that if x̄ solves (CCP 1), then x̄ is also a point of
minimizer of the unconstrained problem

min (f + δC∩X )(x) subject to x ∈ Rn .

This is equivalent to the problem

min (f + δC + δX )(x).

Since dom f = Rn, the Slater-type constraint qualification implies that x̂ ∈ ri dom f ∩ ri C ∩ ri X. Invoking the Sum Rule, Theorem 2.91,

0 ∈ ∂f(x̄) + ∂δC(x̄) + ∂δX(x̄),

and thus,

0 ∈ ∂f (x̄) + NC (x̄) + NX (x̄).

So our main concern now is to explicitly compute NC (x̄). How does one do
that? We have already observed that it is not so straightforward to compute
the normal cone when the inequality constraints are not smooth. Let us now
mention that in this case also we do not consider G to be differentiable. Thus
we shall now introduce the notion of a subdifferential of a cone convex function.
As G is an S-convex function, we call an m × n matrix A a subgradient of G at x ∈ Rn if

G(y) − G(x) − A(y − x) ∈ S, ∀ y ∈ Rn .

Then the subdifferential of the cone convex function G at x is given as

∂G(x) = {A ∈ Rm×n : G(y) − G(x) − A(y − x) ∈ S, ∀ y ∈ Rn }.

The important question is whether the set ∂G(x) is nonempty.


It was shown, for example, in Luc, Tan, and Tinh [78] that if G is an
S-convex function, then G is continuous on Rn . Further, G is also a locally
Lipschitz function on Rn ; that is, for any x0 ∈ Rn , there exists a neighborhood
N (x0 ) of x0 such that there exists Lx0 > 0 satisfying

‖G(y) − G(x)‖ ≤ Lx0 ‖y − x‖, ∀ x, y ∈ N(x0).

Observe that Lx0 depends on the chosen x0 and is also called the Lipschitz
constant at x0 . Also, note that a locally Lipschitz vector function need not
be differentiable everywhere. For a locally Lipschitz function G, the Clarke
Jacobian of G at x is given as follows,
 
∂C G(x) = co { A ∈ Rm×n : A = lim_{k→∞} JG(xk), where xk → x, xk ∈ D },

where D is the set of points in Rn at which G is differentiable and JG(y) denotes the Jacobian of G at y. In fact, there is a famous theorem of Rademacher that says that Rn\D is a set of Lebesgue measure zero. The set ∂C G(x) ≠ ∅ for all x ∈ Rn and is convex and compact. For more details on the Clarke Jacobian, see for example Clarke [27] or Demyanov and Rubinov [30]. The property that will be important to us is that the Clarke Jacobian, as a set-valued map, is locally bounded and graph closed.
It was shown for example in Luc, Tan, and Tinh [78] that ∂C G(x) ⊆ ∂G(x),
thereby proving that if G is an S-convex function, then ∂G(x) ≠ ∅ for every x ∈ Rn. Before we proceed to develop the optimality conditions for (CCP 1), let us look at a locally Lipschitz function φ : Rn → R.
Recall that a function φ : Rn → R is locally Lipschitz at x0 if there exists a neighborhood N(x0) of x0 and Lx0 > 0 such that

|φ(y) − φ(x)| ≤ Lx0 ‖y − x‖, ∀ x, y ∈ N(x0).

Naturally a locally Lipschitz scalar-valued function need not be differentiable everywhere, and the Rademacher Theorem tells us that the set of points where φ is not differentiable forms a set of measure zero. Therefore, at any x ∈ Rn, the Clarke generalized gradient or Clarke subdifferential is given as

∂°φ(x) = co {ξ ∈ Rn : ξ = lim_{k→∞} ∇φ(xk), where xk → x, xk ∈ D̃},

where D̃ denotes the set of points at which φ is differentiable. One can observe
that if m = 1, the Clarke Jacobian reduces to the Clarke subdifferential.
The Clarke subdifferential is nonempty, convex, and compact. If x̄ is a local
minimum of φ over Rn , then 0 ∈ ∂ ◦ φ(x̄). It is important to note that this
condition is necessary but not sufficient. Now we state a calculus rule that
will be useful in our computation of the normal cone. The Sum Rule is from
Clarke [27].


Consider two locally Lipschitz functions φ1 , φ2 : Rn → R. Then

∂ ◦ (φ1 + φ2 )(x) ⊆ ∂ ◦ φ1 (x) + ∂ ◦ φ2 (x).

If one of the functions is continuously differentiable, then equality


holds.

The Chain Rule that we state is from Demyanov and Rubinov [30] (see also
Dutta [36]).

Consider the function φ ◦ Φ where Φ : Rn → Rm and φ : Rm → R


are locally Lipschitz functions. Assume that φ is continuously dif-
ferentiable. Then

∂°(φ ◦ Φ)(x) = {zᵀ∇φ(Φ(x)) ∈ Rn : z ∈ ∂C Φ(x)}.
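As a small illustration of these notions (chosen here only for illustration), take φ(x) = |x| on R, which is differentiable everywhere except at 0 with ∇φ(x) = ±1; the definition above gives ∂°φ(0) = co{−1, 1} = [−1, 1]. Applying the stated Chain Rule with Φ = φ (so m = 1 and ∂C Φ(0) = [−1, 1]) and the smooth outer function t ↦ t² gives ∂°(t² ∘ Φ)(0) = {2Φ(0)z : z ∈ [−1, 1]} = {0}, in agreement with the classical derivative of x² at 0.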

Observe that v ∈ NC (x̄) (in the current context of (CCP 1)) if and only if
x̄ is a point of minimizer of the problem
min −⟨v, x⟩ subject to G(x) ∈ −S.    (NP)
For simplicity, assume that C = {x ∈ Rn : G(x) ∈ −S} is an n-dimensional convex set. The approach to derive the necessary and sufficient condition for optimality is due to Rockafellar [100] (see also Chapter 6 of Rockafellar and Wets [101]). As the above problem is a convex programming problem, x̄ is a global point of minimizer of (NP). Further, without loss of generality, we can assume it to be unique. Observe that if we define

f(x) = −⟨v, x⟩ and f̃(x) = −⟨v, x⟩ + ε‖x − x̄‖²,

then ∂f(x̄) = ∂f̃(x̄) = {−v} and x̄ is the unique minimizer of f̃ because it is a strictly convex function. Consider an n-dimensional convex compact set
Y ⊂ Rn such that x̄ ∈ int Y and C ∩ Y ≠ ∅ (Figure 3.2). It is simple to see
that x̄ is also the unique minimizer of the problem
min f (x) subject to G(x) ∈ −S, x ∈ Y. (N P 1)
Also observe that the normal cone NC (x̄) = NC∩Y (x̄). (We would urge the
readers to think why).
Our approach depends on the use of penalization, a method popular for
designing algorithms for constrained optimization. Consider the problem
min f̂(x, u) = f(x) subject to G(x) − u = 0, (x, u) ∈ Y × (−S).    (N̂P)
As x̄ is the unique point of minimizer of (N P 1), we deduce that
(x̄, ū) = (x̄, G(x̄)) is the unique point of minimizer of (N̂P). For a sequence of εk ↓ 0, consider the sequence of penalty approximations

min f̂k(x, u) = f(x) + (1/(2εk))‖G(x) − u‖²
subject to (x, u) ∈ Y × (−S).    (N̂P_k)


FIGURE 3.2: C ∩ Y.

Consider the following closed set

Sk = {(x, u) ∈ Y × (−S) : f̂k(x, u) ≤ f̂k(x̄, ū) = f(x̄)}.

Note that ū = G(x̄). Also, Sk is nonempty as (x̄, ū) ∈ Sk for each k. Because Sk is nonempty for each k, solving (N̂P_k) is the same as minimizing f̂k over Sk. Denote the minimum of f over the compact set Y by µ. For any (x, u) ∈ Sk,

f̂k(x, u) ≤ f̂k(x̄, ū) = f(x̄),

which implies
f(x) + (1/(2εk))‖G(x) − u‖² ≤ f(x̄),

where (x, u) ∈ Sk ⊂ Y × (−S). As f (x) ≥ µ for every x ∈ Y ,

µ + (1/(2εk))‖G(x) − u‖² ≤ f(x̄),

which leads to

‖G(x) − u‖ ≤ √(2εk(f(x̄) − µ)).

Thus for any given k,


Sk ⊆ {(x, u) ∈ Y × (−S) : ‖G(x) − u‖ ≤ √(2εk(f(x̄) − µ))}.    (3.12)


Also, for a fixed k,


‖u‖ ≤ ‖G(x)‖ + √(2εk(f(x̄) − µ)).

As G is an S-convex function, it is also locally Lipschitz and hence G(Y )


is a compact set. This shows that the right-hand side of (3.12) is bounded
for a fixed k. From this, along with the compactness of Y , we can deduce
that Sk is compact and thus f̂k achieves a minimum over Sk. Hence, (N̂P_k) has a point of minimizer that naturally need not be unique. Denote a point of minimizer of (N̂P_k) by (xk, uk), thus obtaining a bounded sequence {(xk, uk)} ⊂ Y × (−S), which satisfies

‖G(xk) − uk‖ ≤ √(2εk(f(x̄) − µ)),

and as (xk, uk) ∈ Sk,

f(xk) ≤ f̂k(xk, uk) ≤ f(x̄).

Because {(xk , uk )} is bounded, by the Bolzano–Weierstrass Theorem, Propo-


sition 1.3, it has a convergent subsequence. Without loss of generality, assume
that xk → x̃ and uk → ũ. Therefore, as k → ∞, εk → 0 and thus

‖G(x̃) − ũ‖ = 0 and f(x̃) ≤ f(x̄).

Hence, ũ = G(x̃) and thus (x̃, ũ) is also a minimizer of (N̂P). But as (x̄, ū) is the unique point of minimizer of (N̂P), we have x̃ = x̄ and ũ = ū.
Because (xk, uk) is a point of minimizer of (N̂P_k), it is a simple exercise to see that xk minimizes f̂k(x, uk) over Y and uk minimizes f̂k(xk, u) over −S. Hence,

0 ∈ ∂x°f̂k(xk, uk) + NY(xk),    (3.13)
0 ∈ ∂u°f̂k(xk, uk) + N−S(uk).    (3.14)

Now we analyze these conditions in more detail. Denote


yk = (1/εk)(G(xk) − uk).

From condition (3.14),

−∇u f̂k(xk, uk) ∈ N−S(uk).

One can easily compute ∇u f̂k(xk, uk) to see that yk = −∇u f̂k(xk, uk) and hence yk ∈ N−S(uk). Moreover, applying the Sum Rule and the Chain Rule for locally Lipschitz functions to (3.13), for each k,

0 ∈ −v + ∂C G(xk)ᵀ yk + NY(xk).    (3.15)


Suppose that {yk } is bounded and thus by the Bolzano–Weierstrass The-


orem has a convergent subsequence. Without loss of generality, suppose that
yk → ȳ. Noting that the normal cone is graph closed as a set-valued map and
∂C G is locally bounded, taking the limit as k → ∞ in (3.15) leads to

0 ∈ −v + ∂C G(x̄)ᵀȳ + NY(x̄).

But as x̄ ∈ int Y, by Example 2.38, NY(x̄) = {0}. As ∂C G(x̄) ⊂ ∂G(x̄), v ∈ ∂C G(x̄)ᵀȳ ⊂ ∂G(x̄)ᵀȳ. Thus, v = zᵀȳ for some z ∈ ∂G(x̄).
The important question is: can {yk} be unbounded? We show that if {yk} is unbounded, then the Slater constraint qualification, that is, the existence of x̂ ∈ Rn such that G(x̂) ∈ −ri S, is violated.
On the contrary, assume that {yk} is unbounded and thus ‖yk‖ → ∞ as k → ∞. Hence, noting that ∂C G(xk) ⊂ ∂G(xk), from (3.15) we have

0 ∈ (1/‖yk‖)(−v) + ∂G(xk)ᵀ(yk/‖yk‖) + (1/‖yk‖) NY(xk),

which implies
0 ∈ (1/‖yk‖)(−v) + ∂G(xk)ᵀwk + NY(xk),    (3.16)

where wk = yk/‖yk‖. Hence, {wk} is a bounded sequence and thus by the Bolzano–Weierstrass Theorem, Proposition 1.3, has a convergent subsequence. Without loss of generality, assume that wk → w̄ with ‖w̄‖ = 1. Hence from (3.16),

0 ∈ ∂G(x̄)ᵀw̄.    (3.17)
As yk ∈ N−S (uk ), we have wk ∈ N−S (uk ). Again using the fact that the
normal cone map has a closed graph, w̄ ∈ N−S (ū). Hence,

⟨w̄, z − G(x̄)⟩ ≤ 0, ∀ z ∈ −S.

Because S is a cone, 0 ∈ −S, thus

⟨w̄, −G(x̄)⟩ ≤ 0,

that is,

⟨w̄, G(x̄)⟩ ≥ 0.    (3.18)
Consider p ∈ −S. As S is a convex cone, by Theorem 2.20, G(x̄) + p ∈ −S.
Hence,

⟨w̄, G(x̄) + p − G(x̄)⟩ ≤ 0,

which implies ⟨w̄, p⟩ ≤ 0. Because p was arbitrary,

⟨w̄, p⟩ ≤ 0, ∀ p ∈ −S.


Thus, w̄ ∈ S⁺. Hence, ⟨w̄, G(x̄)⟩ ≤ 0, which together with (3.18) leads to ⟨w̄, G(x̄)⟩ = 0. For any y ∈ Rn,

G(y) − G(x̄) − A(y − x̄) ∈ S, ∀ A ∈ ∂G(x̄),

which implies

⟨w̄, G(y)⟩ − ⟨w̄, G(x̄)⟩ − ⟨w̄, A(y − x̄)⟩ ≥ 0, ∀ A ∈ ∂G(x̄).

From (3.17), if {yk} is unbounded, there exists z̄ ∈ ∂G(x̄) such that z̄ᵀw̄ = 0. Thus, from the above inequality we have

⟨w̄, G(y)⟩ − ⟨w̄, G(x̄)⟩ − ⟨z̄ᵀw̄, y − x̄⟩ ≥ 0, ∀ y ∈ Rn,

which along with ⟨w̄, G(x̄)⟩ = 0 and z̄ᵀw̄ = 0 yields

⟨w̄, G(y)⟩ ≥ 0, ∀ y ∈ Rn.

If the Slater constraint qualification holds, there exists x̂ ∈ Rn such that


G(x̂) ∈ −ri S. As ‖w̄‖ = 1 and w̄ ∈ S⁺, ⟨w̄, G(x̂)⟩ < 0, which contradicts the above inequality. Therefore, {yk} cannot be an unbounded sequence. Thus, we leave it to the reader to see that

v = zᵀȳ, where ȳ ∈ S⁺,

and hence conclude that

NC(x̄) = {v ∈ Rn : there exist ȳ ∈ S⁺ and z ∈ ∂G(x̄) satisfying ⟨ȳ, G(x̄)⟩ = 0 such that v = zᵀȳ}.

The reader is now urged to write the necessary and sufficient optimality con-
ditions for the problem (CCP 1), as the structure of the normal cone to C at
x̄ is now known.
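The penalization argument above can also be mimicked numerically. The following sketch is purely illustrative (it uses made-up data and scipy.optimize.minimize rather than the exact construction of the proof): it takes S = R+, so that (CCP) is an ordinary inequality-constrained problem, minimizes the reduced penalized objective f(x) + (1/(2εk)) max(0, g(x))² for decreasing εk (eliminating u gives exactly this form), and watches yk = max(0, g(xk))/εk approach a KKT multiplier. For min x1 + x2 subject to x1² + x2² − 1 ≤ 0 the limit multiplier is 1/√2 ≈ 0.707.

    import numpy as np
    from scipy.optimize import minimize

    # Illustrative instance with S = R_+:  min x1 + x2  s.t.  g(x) = x1^2 + x2^2 - 1 <= 0.
    f = lambda x: x[0] + x[1]
    g = lambda x: x[0]**2 + x[1]**2 - 1.0

    def penalized(x, eps):
        # eliminating u <= 0 in the penalty problem gives max(0, g(x))^2
        return f(x) + (1.0 / (2.0 * eps)) * max(0.0, g(x))**2

    x = np.array([0.0, 0.0])
    for eps in [1.0, 1e-1, 1e-2, 1e-3, 1e-4]:
        res = minimize(lambda z: penalized(z, eps), x, method="Nelder-Mead",
                       options={"xatol": 1e-10, "fatol": 1e-12, "maxiter": 20000})
        x = res.x
        y_k = max(0.0, g(x)) / eps       # multiplier estimate y_k = (G(x_k) - u_k)/eps_k
        print(eps, x, y_k)
    # x approaches (-1/sqrt(2), -1/sqrt(2)) and y_k approaches 1/sqrt(2) ≈ 0.707.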



Chapter 4
Saddle Points, Optimality, and Duality

4.1 Introduction
In the previous chapter, the KKT optimality conditions were studied using the
normal cone as one of the main vehicles of expressing the optimality condi-
tions. One of the central issues in the previous chapter was the computation
of the normal cone at the point of the feasible set C where the set C was ex-
plicitly described by the inequality constraints. In this chapter our approach
to the KKT optimality condition will take us deeper into convex optimization
theory and also we can avoid the explicit computation of the normal cone.
This approach uses the saddle point condition of the Lagrangian function as-
sociated with (CP ). We motivate the issue using two-person-zero-sum games.
Consider a two-person-zero-sum game where we denote the players as
Player 1 and Player 2 having strategy sets X ⊂ Rn and Λ ⊂ Rm , respec-
tively, which we assume to be compact for simplicity. In each move of the
game, the players reveal their choices simultaneously. For every choice x ∈ X
by Player 1 and λ ∈ Λ by Player 2, an amount L(x, λ) is paid by Player 1 to
Player 2. Now Player 1 behaves in the following way. For any given choice of
strategy x ∈ X, he would like to know what the maximum amount he would
have to give to Player 2. In effect, he computes the function
φ(x) = max_{λ∈Λ} L(x, λ).

Further, it is natural that he would choose an x ∈ X that minimizes φ(x),


that is, Player 1 solves the problem
min φ(x) subject to x ∈ X,
which implies that in effect, he solves a minimax problem
min_{x∈X} max_{λ∈Λ} L(x, λ).

Similarly, Player 2 would naturally want to know what the guaranteed amount
he will receive once he makes a move λ ∈ Λ. This means he computes the
function
ψ(λ) = min_{x∈X} L(x, λ).




Of course he would like to maximize the amount of money he gets and therefore
solves the problem

max ψ(λ) subject to λ ∈ Λ,

that is, he solves

max_{λ∈Λ} min_{x∈X} L(x, λ).

Thus, in every game there are two associated optimization problems. The
minimization problem for Player 1 and the maximization problem for Player
2. In the optimization literature, the problem associated with Player 1 is
called the primal problem while that associated with Player 2 is called the dual
problem. Duality is a deep issue in modern optimization theory. In this chapter,
we will have quite a detailed discussion on duality in convex optimization. The
game is said to have a value if

min_{x∈X} max_{λ∈Λ} L(x, λ) = max_{λ∈Λ} min_{x∈X} L(x, λ).

The above relation is the minimax equality.


For any given λ̃ ∈ Λ,

min_{x∈X} L(x, λ̃) ≤ min_{x∈X} max_{λ∈Λ} L(x, λ).

Because λ̃ ∈ Λ is arbitrary, we obtain the minimax inequality, that is,

max_{λ∈Λ} min_{x∈X} L(x, λ) ≤ min_{x∈X} max_{λ∈Λ} L(x, λ),

which always holds true.


Of course the minimax equality would hold true if a saddle point exists,
that is, a pair (x̄, λ̄) ∈ X × Λ exists that satisfies the following inequality,

L(x̄, λ) ≤ L(x̄, λ̄) ≤ L(x, λ̄), ∀ x ∈ X, ∀ λ ∈ Λ.

The above relation is called the saddle point condition. It is easy to observe
that (x̄, λ̄) ∈ X × Λ is a saddle point if and only if

max_{λ∈Λ} L(x̄, λ) = L(x̄, λ̄) = min_{x∈X} L(x, λ̄).

The above condition implies

min_{x∈X} max_{λ∈Λ} L(x, λ) ≤ max_{λ∈Λ} L(x̄, λ) = min_{x∈X} L(x, λ̄) ≤ max_{λ∈Λ} min_{x∈X} L(x, λ),

which along with the minimax inequality yields the minimax equality.
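A tiny finite example (not from the text) makes this concrete. For a payoff matrix M, where Player 1 picks a row to minimize and Player 2 picks a column to maximize the amount M[i, j] paid to Player 2, the minimax and maximin values coincide exactly when M has a pure-strategy saddle point; the sketch below checks this for a 2 × 2 matrix with saddle value 3.

    import numpy as np

    # Hypothetical payoff matrix: entry M[i, j] is what Player 1 (rows) pays Player 2 (columns).
    M = np.array([[4.0, 2.0],
                  [3.0, 1.0]])

    minimax = M.max(axis=1).min()    # Player 1: min over rows of the worst (max) column payoff
    maximin = M.min(axis=0).max()    # Player 2: max over columns of the worst (min) row payoff

    print(minimax, maximin)          # -> 3.0 3.0, so the game has a value
    # The pure strategies (row 1, column 0) (0-indexed) form a saddle point:
    i, j = 1, 0
    print(M[i, :].max() == M[i, j] == M[:, j].min())   # -> True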
Before moving on to study the optimality of the convex programming


problem (CP ) via the saddle point approach, we state the Saddle Point The-
orem (Proposition 2.6.9, Bertsekas [12]) for which we will need the following
notations.
For each λ ∈ Λ, define the proper function φλ : Rn → R̄ as

φλ(x) = L(x, λ) if x ∈ X, and φλ(x) = +∞ otherwise,

and for every x ∈ X, the proper function ψx : Rm → R̄ is given by

ψx(λ) = −L(x, λ) if λ ∈ Λ, and ψx(λ) = +∞ otherwise.

Proposition 4.1 (Saddle Point Theorem) Assume that for every λ ∈ Λ, φλ


and for every x ∈ X, ψx are lsc and convex. The set of saddle points of L is
nonempty and compact under any one of the following conditions:

(i) X and Λ are compact.


(ii) Λ is compact and there exists λ̄ ∈ Λ and α ∈ R such that the set

{x ∈ X : L(x, λ̄) ≤ α}

is nonempty and compact.


(iii) X is compact and there exists x̄ ∈ X and α ∈ R such that the set

{λ ∈ Λ : L(x̄, λ) ≥ α}

is nonempty and compact.


(iv) There exist x̄ ∈ X, λ̄ ∈ Λ, and α ∈ R such that

{x ∈ X : L(x, λ̄) ≤ α} and {λ ∈ Λ : L(x̄, λ) ≥ α}

are nonempty and compact.

This proposition will play a pivotal role in the study of enhanced optimality
conditions in Chapter 5.

4.2 Basic Saddle Point Theorem


The saddle point condition can itself be taken as an optimality condition for
the problem of Player 1, that is,

min φ(x) subject to x ∈ X.


Our question is, can we construct a function like L(x, λ) for the convex
(CP ) for which f (x) can be represented in a way as φ(x) has been represented
through L(x, λ)? Note that if we remove the compactness from Λ, then φ(x)
could take up +∞ value for some x. It is quite surprising that for the objective
function f (x) of (CP ), such a function can be obtained by considering the
classical Lagrangian function from calculus.
For the problem (CP ) with inequality constraints, we construct the Lagrangian function L : Rn × Rm+ → R as

L(x, λ) = f(x) + Σ_{i=1}^m λi gi(x)

with λ = (λ1, λ2, . . . , λm) ∈ Rm+. Observe that it is a simple matter to show that

sup_{λ∈Rm+} L(x, λ) = f(x) if x is feasible, and +∞ otherwise.

Here, the Lagrangian function L(x, λ) is playing the role of L(x, λ). So the
next pertinent question is, if we can solve (CP ) then does there exist a saddle
point for it? Does the existence of a saddle point for L(x, λ) guarantee that
a solution to the original problem (CP ) is obtained? The following theorem
answers the above questions. Recall the convex programming problem
min f (x) subject to x∈C (CP )
with C given by

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}, (4.1)

where gi : Rn → R, i = 1, 2, . . . , m, are now assumed to be convex and


non-affine functions.

Theorem 4.2 Consider the convex programming problem (CP ) with C given
by (4.1). Assume that the Slater constraint qualification holds, that is, there
exists x̂ ∈ Rn such that gi (x̂) < 0, i = 1, 2, . . . , m. Then x̄ is a point of
minimizer of (CP ) if and only if there exists λ̄ = (λ̄1, λ̄2, . . . , λ̄m) ∈ Rm+ satisfying the complementary slackness condition, that is, λ̄i gi(x̄) = 0 for i = 1, 2, . . . , m, and the saddle point condition

L(x̄, λ) ≤ L(x̄, λ̄) ≤ L(x, λ̄), ∀ x ∈ Rn, λ ∈ Rm+.

Proof. As x̄ is a point of minimizer of (CP ), the following system

f(x) − f(x̄) < 0, gi(x) < 0, i = 1, 2, . . . , m,


has no solution. Define a set

Λ = {(y0 , y) ∈ R × Rm : there exists x ∈ Rn such that


f (x) − f (x̄) < y0 , gi (x) < yi , i = 1, 2, . . . , m}.

We leave it to the reader to prove that the set Λ is convex and open. It is clear that (0, 0) ∉ Λ. Hence, by the Proper Separation Theorem, Theorem 2.26 (iv), there exists (λ0, λ) ∈ R × Rm with (λ0, λ) ≠ (0, 0) such that

λ0 y0 + Σ_{i=1}^m λi yi ≥ 0, ∀ (y0, y) ∈ Λ.    (4.2)

Corresponding to x̄ ∈ Rn , for yi > 0, i = 0, 1, . . . , m, (y0 , y) ∈ Λ. Also, for


any γ > 0, (y0 + γ, y) ∈ Λ. Therefore, from condition (4.2),
λ0 ≥ −(1/γ){λ0 y0 + Σ_{i=1}^m λi yi},

which as the limit γ → ∞ leads to λ0 ≥ 0. It is now left to the reader to prove in a similar fashion that λ ∈ Rm+.
For any x ∈ Rn , consider a fixed αi > 0, i = 0, 1, . . . , m. Then for any
γi > 0, i = 0, 1, . . . , m,

(f (x) − f (x̄) + γ0 α0 , g1 (x) + γ1 α1 , . . . , gm (x) + γm αm ) ∈ Λ.

Therefore, from (4.2),


λ0 (f(x) − f(x̄) + γ0 α0) + Σ_{i=1}^m λi (gi(x) + γi αi) ≥ 0.

As γi → 0, the above inequality yields


λ0 (f(x) − f(x̄)) + Σ_{i=1}^m λi gi(x) ≥ 0, ∀ x ∈ Rn.    (4.3)

We claim that λ0 ≠ 0. On the contrary, suppose that λ0 = 0, thereby reducing (4.3) to

Σ_{i=1}^m λi gi(x) ≥ 0, ∀ x ∈ Rn.

This violates the Slater constraint qualification. Thus, λ0 > 0. Therefore,


denoting λ̄i = λi/λ0, the condition (4.3) yields

f(x) − f(x̄) + Σ_{i=1}^m λ̄i gi(x) ≥ 0, ∀ x ∈ Rn.

In particular, x = x̄ in the above inequality leads to Σ_{i=1}^m λ̄i gi(x̄) ≥ 0, while the feasibility of x̄ gives λ̄i gi(x̄) ≤ 0 for each i, so Σ_{i=1}^m λ̄i gi(x̄) = 0. Because a sum of nonpositive numbers is zero only if each term is zero, the complementary slackness condition, that is, λ̄i gi(x̄) = 0, i = 1, 2, . . . , m, holds. Therefore, the preceding inequality leads to

f(x) + Σ_{i=1}^m λ̄i gi(x) ≥ f(x̄) + Σ_{i=1}^m λ̄i gi(x̄), ∀ x ∈ Rn,

which implies

L(x, λ̄) ≥ L(x̄, λ̄), ∀ x ∈ Rn .


Further, for any λ ∈ Rm+, Σ_{i=1}^m λi gi(x̄) ≤ 0. Thus,

f(x̄) + Σ_{i=1}^m λi gi(x̄) ≤ f(x̄) = f(x̄) + Σ_{i=1}^m λ̄i gi(x̄),

that is,

L(x̄, λ) ≤ L(x̄, λ̄), ∀ λ ∈ Rm+,

thereby establishing the saddle point condition.


Conversely, suppose that there exists λ̄ ∈ Rm+ such that the saddle point condition and the complementary slackness condition hold at x̄. We first prove that x̄ is feasible, that is, −g(x̄) = (−g1(x̄), −g2(x̄), . . . , −gm(x̄)) ∈ Rm+. On the contrary, assume that −g(x̄) ∉ Rm+. As Rm+ is a closed convex cone, by the Strict Separation Theorem, Theorem 2.26 (iii), there exists λ ∈ Rm+ with λ ≠ 0 such that

⟨λ, g(x̄)⟩ = Σ_{i=1}^m λi gi(x̄) > 0.

Therefore,
f(x̄) + Σ_{i=1}^m λi gi(x̄) > f(x̄),

which implies L(x̄, λ) > L(x̄, λ̄), thereby contradicting the saddle point con-
dition. Hence, x̄ is feasible to (CP ).
Because L(x̄, λ̄) ≤ L(x, λ̄) and the complementary slackness condition is
satisfied,
f(x̄) ≤ f(x) + Σ_{i=1}^m λ̄i gi(x) ≤ f(x), ∀ x ∈ C.

Thus, x̄ is a point of minimizer of (CP ). □


The consequence of the saddle point criteria is simple. If (x̄, λ̄) is a saddle
point associated with the Lagrangian function of (CP ) where x̄ is a point of
minimizer of f over C, then

L(x̄, λ̄) = min_{x∈Rn} L(x, λ̄)

with λ̄i gi (x̄) = 0 for i = 1, 2, . . . , m. Therefore, by the optimality condition


for the unconstrained problem, Theorem 2.89,

0 ∈ ∂x L(x̄, λ̄),

which under the Slater constraint qualification yields

0 ∈ ∂f(x̄) + Σ_{i=1}^m λ̄i ∂gi(x̄),

thus leading to the KKT optimality conditions for (CP ).
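As a simple illustration of Theorem 4.2 (the data here are chosen only for illustration), consider min x² subject to g(x) = 1 − x ≤ 0 on R. The Slater condition holds (take x̂ = 2), and the Lagrangian is L(x, λ) = x² + λ(1 − x). At x̄ = 1 with λ̄ = 2 we have λ̄g(x̄) = 0, L(x̄, λ) = 1 + λ·0 = 1 = L(x̄, λ̄) for every λ ≥ 0, and L(x, λ̄) = x² + 2 − 2x = (x − 1)² + 1 ≥ 1 = L(x̄, λ̄) for every x ∈ R, so (x̄, λ̄) satisfies the saddle point condition and x̄ = 1 is indeed the point of minimizer.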

4.3 Affine Inequalities and Equalities and Saddle Point


Condition
Observe that in the previous section we had mentioned that the convex functions are non-affine. This eventually has to do with the Slater constraint qualification. Consider the set

C = {(x1, x2) ∈ R2 : x1 + x2 ≤ 0, −x1 ≤ 0, −x2 ≤ 0}.

This set is described by affine inequalities. However, C = {(0, 0)} and hence
the Slater constraint qualification fails. The question is whether in such a sit-
uation the saddle point condition exists or not. What we show below is that
the presence of affine inequalities does not affect the saddle point condition.
In fact, we should only bother about the Slater constraint qualification for
the convex non-affine inequalities. The presence of affine inequalities by itself
is a constraint qualification. To the best of our knowledge, the first study
in this respect was due to Jaffray and Pomerol [65]. We present their result
establishing the saddle point criteria under a modified version of Slater con-
straint qualification using the separation theorem. For that we now consider
the feasible set C of the convex programming problem (CP ) defined by convex
non-affine and affine inequalities as

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m, hj (x) ≤ 0, j = 1, 2, . . . , l}, (4.4)

where gi : Rn → R, i = 1, 2, . . . , m, are convex functions while hj : Rn → R,


j = 1, 2, . . . , l, are affine functions. Observe that C is a convex set. Correspond-


ing to this convex programming problem (CP ), the associated Lagrangian
function L : Rn × Rm+ × Rl+ → R is defined as

L(x, λ, µ) = f(x) + Σ_{i=1}^m λi gi(x) + Σ_{j=1}^l µj hj(x).

Then (x̄, λ̄, µ̄) is the saddle point of (CP ) with C given by (4.4) if

L(x̄, λ, µ) ≤ L(x̄, λ̄, µ̄) ≤ L(x, λ̄, µ̄).

We shall now present the proof of Jaffray and Pomerol in a more detailed
and simplified manner.

Theorem 4.3 Consider (CP ) with C defined by (4.4). Assume that the mod-
ified Slater constraint qualification holds, that is, there exists x̂ ∈ Rn such that
gi (x̂) < 0, i = 1, 2, . . . , m, and hj (x̂) ≤ 0, j = 1, 2, . . . , l. Then x̄ is a point of
minimizer of (CP ) if and only if there exists (λ̄, µ̄) ∈ Rm+ × Rl+ such that

L(x̄, λ, µ) ≤ L(x̄, λ̄, µ̄) ≤ L(x, λ̄, µ̄), ∀ x ∈ Rn, λ ∈ Rm+, µ ∈ Rl+,

along with the complementary slackness conditions, that is,

λ̄i gi (x̄) = 0, i = 1, 2, . . . , m and µ̄j hj (x̄) = 0, j = 1, 2, . . . , l.

Proof. Consider an index set J as the (possibly empty) maximal subset of {1, 2, . . . , l} such that there exist αj > 0 for j ∈ J with Σ_{j∈J} αj hj(x) = 0 for every x ∈ Rn. Observe that for every x ∈ C,

hj(x) = 0, ∀ j ∈ J.

Otherwise, if for some x ∈ C and for some j ∈ J, hj (x) < 0, the maximality
of J is contradicted.
Define the Lagrange covers of (CP ) as

Λ = {(y0 , y, z) ∈ R1+m+l : there exists x ∈ Rn such that


f (x) − f (x̄) ≤ y0 , gi (x) ≤ yi , i = 1, 2, . . . , m,
hj (x) ≤ zj , j ∈ J c , hj (x) = zj , j ∈ J},

where J^c = {j ∈ {1, 2, . . . , l} : j ∉ J}.
We claim that the set Λ is convex. Consider (y01 , y 1 , z 1 ) and (y02 , y 2 , z 2 ) in Λ
with x1 and x2 the respective associated elements from Rn . For any λ ∈ [0, 1],
x = λx1 + (1 − λ)x2 ∈ Rn . By the convexity of f and gi , i = 1, 2, . . . , m,

f (x) − f (x̄) ≤ λ(f (x1 ) − f (x̄)) + (1 − λ)(f (x2 ) − f (x̄))


≤ λy01 + (1 − λ)y02 ,
gi (x) ≤ λgi (x1 ) + (1 − λ)gi (x2 ) ≤ λyi1 + (1 − λ)yi2 , i = 1, 2, . . . , m,


while the affineness of hj , j = 1, 2, . . . , l leads to

hj (x) = λhj (x1 ) + (1 − λ)hj (x2 ) ≤ λzj1 + (1 − λ)zj2 , j ∈ J c ,


hj (x) = λhj (x1 ) + (1 − λ)hj (x2 ) = λzj1 + (1 − λ)zj2 , j ∈ J.

Thus, for every λ ∈ [0, 1], λ(y01 , y 1 , z 1 ) + (1 − λ)(y02 , y 2 , z 2 ) ∈ Λ with x ∈ Rn


as the associated element, thereby implying the convexity of Λ.
Observe that corresponding to the point of minimizer of (CP ), x̄ ∈ Rn ,
(ȳ0 , 0, 0) ∈ Λ if and only if ȳ0 ≥ 0. Also, (y0 , 0, 0) belongs to the affine hull
of Λ for every y0 ∈ R, and hence, (0, 0, 0) belongs to the relative bound-
ary of Λ. Applying the Proper Separation Theorem, Theorem 2.26 (iv), to
the Lagrange cover Λ and the relative boundary point (0, 0, 0), there exists
(λ0, λ, µ) ∈ R1+m+l with (λ0, λ, µ) ≠ (0, 0, 0) such that

λ0 y0 + Σ_{i=1}^m λi yi + Σ_{j=1}^l µj zj ≥ 0, ∀ (y0, y, z) ∈ Λ    (4.5)

and for some (y0′, y′, z′) ∈ Λ,

λ0 y0′ + Σ_{i=1}^m λi yi′ + Σ_{j=1}^l µj zj′ > 0.    (4.6)

Consider (y0, y, z) ∈ Λ. For any α0 > 0 and α ∈ int Rm+, (y0 + α0, y + α, z) ∈ Λ. Therefore, by (4.5), for i′ = 0, 1, . . . , m,

λi′ ≥ −(1/αi′) { λ0 y0 + Σ_{i=1}^m λi yi + Σ_{j=1}^l µj zj + Σ_{i=0}^{i′−1} λi αi + Σ_{i=i′+1}^m λi αi },

which as the limit αi′ → +∞ yields λi′ ≥ 0, i′ = 0, 1, . . . , m. Using the above technique, we can also show that µj ≥ 0, j ∈ J^c. The reader is advised to
check this out. Observe that µj for j ∈ J are unrestricted.
Let us proceed by assuming that J is nonempty. Therefore, there exist αj > 0, j ∈ J, such that Σ_{j∈J} αj hj(x) = 0 for every x ∈ Rn. Redefining λi, i = 0, 1, . . . , m, and µj, j = 1, 2, . . . , l, as

λ̂i = λi, i = 0, 1, . . . , m, µ̂j = µj, j ∈ J^c, and µ̂j = µj + γαj, j ∈ J,

where γ > 0 is chosen such that µ̂j > 0 for j ∈ J. Also, observe that

Σ_{j∈J} µ̂j hj(x) = Σ_{j∈J} µj hj(x) + Σ_{j∈J} γαj hj(x) = Σ_{j∈J} µj hj(x).

Thus, the conditions (4.5) and (4.6) hold for (λ̂0 , λ̂, µ̂) as well.
We claim that λ̂0 , λ̂i , i = 1, 2, . . . , m, and µ̂j , j ∈ J c , are not all simul-
taneously zero. On the contrary, assume that λ̂0 = 0, λ̂i = 0, i = 1, 2, . . . , m,


and µ̂j = 0, j ∈ J^c. Therefore, the construction of Λ along with (4.5) yields

Σ_{j∈J} µ̂j hj(x) ≥ 0, ∀ x ∈ Rn.

As x̄ is feasible for (CP ), the above condition becomes


Σ_{j∈J} µ̂j hj(x̄) = 0.
Therefore, the affine function Σ_{j∈J} µ̂j hj(.) achieves its minimum over Rn at x̄. Moreover, a nonconstant affine function is unbounded below over Rn and hence cannot attain a minimum, so this function must be constant. This shows that

Σ_{j∈J} µ̂j hj(x) = 0, ∀ x ∈ Rn.

By condition (4.6), there exists x′ ∈ Rn associated to (y0′ , y ′ , z ′ ) ∈ Λ such that


Σ_{j∈J} µ̂j hj(x′) > 0.

Hence, a contradiction is reached. Therefore, λ̂0 , λ̂i , i = 1, 2, . . . , m, and


µ̂j , j ∈ J c , are not all simultaneously zero.
Next suppose that λ̂0 = 0 and λ̂i = 0, i = 1, 2, . . . , m, and for some j ∈ J c ,
µ̂j > 0. Again working along the preceding lines, one obtains
Σ_{j∈J^c : µ̂j>0} µ̂j hj(x) + Σ_{j∈J} µ̂j hj(x) = 0, ∀ x ∈ Rn.

Observe that {j ∈ J^c : µ̂j > 0} is nonempty. The above condition holds with positive coefficients for the indices in {j ∈ J^c : µ̂j > 0} ∪ {j ∈ J : µ̂j > 0}, thereby contradicting the maximality of the index set J. Hence λ̂0 and λ̂i, i = 1, 2, . . . , m, are not all simultaneously zero.
Assume that λ̂0 = 0. As the modified Slater constraint qualification holds, there exists x̂ ∈ Rn such that gi(x̂) < 0, i = 1, 2, . . . , m, and hj(x̂) ≤ 0, j = 1, 2, . . . , l. Corresponding to x̂,

(f(x̂) − f(x̄), g1(x̂), . . . , gm(x̂), h1(x̂), . . . , hl(x̂)) ∈ Λ,

which along with condition (4.5) and the modified Slater constraint qualification leads to

0 > Σ_{i=1}^m λ̂i gi(x̂) + Σ_{j=1}^l µ̂j hj(x̂) ≥ 0,

which is a contradiction. Hence λ̂0 ≠ 0.


Now dividing (4.5) throughout by λ̂0 yields


y0 + Σ_{i=1}^m λ̄i yi + Σ_{j=1}^l µ̄j zj ≥ 0, ∀ (y0, y, z) ∈ Λ,    (4.7)

where λ̄i = λ̂i/λ̂0, i = 1, 2, . . . , m, and µ̄j = µ̂j/λ̂0, j = 1, 2, . . . , l. Corresponding
to every x ∈ Rn ,

(f (x) − f (x̄), g1 (x), . . . , gm (x), h1 (x), . . . , hl (x)) ∈ Λ,

thereby reducing the inequality (4.7) to


f(x̄) ≤ f(x) + Σ_{i=1}^m λ̄i gi(x) + Σ_{j=1}^l µ̄j hj(x), ∀ x ∈ Rn.    (4.8)

By the feasibility of x̄ for (CP ) and the fact that (λ̄, µ̄) ∈ Rm+ × Rl+, condition
(4.8) implies that

L(x̄, λ̄, µ̄) ≤ L(x, λ̄, µ̄), ∀ x ∈ Rn .

In particular, taking x = x̄ in (4.8), along with the feasibility of x̄, leads to


Σ_{i=1}^m λ̄i gi(x̄) + Σ_{j=1}^l µ̄j hj(x̄) = 0.

This shows that

λ̄i gi (x̄) = 0, i = 1, 2, . . . , m and µ̄j hj (x̄) = 0, j = 1, 2, . . . , l,

thereby establishing the complementary slackness condition. For any


(λ, µ) ∈ Rm+ × Rl+, again by the feasibility of x̄,

Σ_{i=1}^m λi gi(x̄) + Σ_{j=1}^l µj hj(x̄) ≤ 0 = Σ_{i=1}^m λ̄i gi(x̄) + Σ_{j=1}^l µ̄j hj(x̄),

that is,

L(x̄, λ, µ) ≤ L(x̄, λ̄, µ̄), ∀ λ ∈ Rm+, µ ∈ Rl+,

thereby leading to the desired result. The converse of the above result can be obtained in a manner similar to Theorem 4.2. □
In the convex programming problem (CP ) considered by Jaffray and
Pomerol [65], the problem involved only convex non-affine and affine inequal-
ities. Next we present a similar result from Florenzano and Van [47] to derive


the saddle point criteria under a modified version of Slater constraint quali-
fication but for a more general scenario involving additional affine equalities
and abstract constraints in (4.4). Consider the feasible set C of the convex
programming problem (CP ) as

C = {x ∈ X : gi (x) ≤ 0, i = 1, 2, . . . , m, hj (x) ≤ 0, j = 1, 2, . . . , s,
hj (x) = 0, j = s + 1, s + 2, . . . , l}, (4.9)

where gi : Rn → R, i = 1, 2, . . . , m, are convex non-affine functions;


hj : Rn → R, j = 1, 2, . . . , l, are affine functions; and X ⊂ Rn is a con-
vex set. Corresponding to this problem, the associated Lagrangian function
L : X × Rm+ × Rl → R is defined as

L(x, λ, µ) = f(x) + Σ_{i=1}^m λi gi(x) + Σ_{j=1}^l µj hj(x),

where µ = (µ̂, µ̃) ∈ Rs+ × Rl−s . Then (x̄, λ̄, µ̄) is called the saddle point of the
above problem if

L(x̄, λ, µ) ≤ L(x̄, λ̄, µ̄) ≤ L(x, λ̄, µ̄), ∀ x ∈ X, λ ∈ Rm+, µ ∈ Rl,

where both µ = (µ̂, µ̃) and µ̄ lie in Rs+ × Rl−s.

Theorem 4.4 Consider the convex programming problem (CP ) with C de-
fined by (4.9). Let x̄ be a point of minimizer of (CP ). Assume that there exists
x̂ ∈ ri X such that

hj (x̂) ≤ 0, j = 1, 2, . . . , s,
hj (x̂) = 0, j = s + 1, s + 2, . . . , l.

Then there exist (λ0, λ) ∈ R+ × Rm+ with (λ0, λ) ≠ (0, 0), and µ = (µ̂, µ̃) ∈ Rs+ × Rl−s such that

λ0 f(x̄) ≤ λ0 f(x) + Σ_{i=1}^m λi gi(x) + Σ_{j=1}^l µj hj(x), ∀ x ∈ X,

λi gi(x̄) = 0, i = 1, 2, . . . , m, and µ̂j hj(x̄) = 0, j = 1, 2, . . . , s.

Proof. Consider the set

Λ = {(y0 , y, z) ∈ R1+m+l : there exists x ∈ X such that f (x) − f (x̄) < y0 ,


gi (x) < yi , i = 1, 2, . . . , m,
hj (x) = zj , j = 1, 2, . . . , l}.

It can be easily shown as in the proof of Theorem 4.3 that Λ is a convex set.


Also, Λ is nonempty because corresponding to the point of minimizer x̄ of


(CP ), one can define (y0 , y, z) ∈ Λ as

y0 > 0, yi > 0, i = 1, 2, . . . , m, and zj = hj (x̄), j = 1, 2, . . . , l.

As Λ is a nonempty convex set, by Proposition 2.14, ri Λ is also a nonempty


convex set. Note that

Λ ∩ (R^{1+m+s}_− × {0_{Rl−s}}) = ∅.

Otherwise, there exists an element in Λ such that the associated x ∈ X is


feasible for (CP ) satisfying f (x) < f (x̄), which is a contradiction to the fact
that x̄ is a point of minimizer of (CP ). Therefore, by Proposition 2.15,

ri Λ ∩ ri (R^{1+m+s}_− × {0_{Rl−s}}) = ∅.

Invoking the Proper Separation Theorem, Theorem 2.26 (iv), there exists
(λ0, λ, µ) ∈ R1+m+l with (λ0, λ, µ) ≠ (0, 0, 0) such that

λ0 y0 + Σ_{i=1}^m λi yi + Σ_{j=1}^l µj zj ≥ λ0 w0 + Σ_{i=1}^m λi wi + Σ_{j=1}^s µj vj    (4.10)

for every (y0, y, z) ∈ Λ and (w0, w, v) ∈ R^{1+m+s}_−, and there exists (y0′, y′, z′) ∈ Λ such that

λ0 y0′ + Σ_{i=1}^m λi yi′ + Σ_{j=1}^l µj zj′ > 0.    (4.11)

Let us partition µ = (µ̂, µ̃) ∈ Rs × Rl−s. We claim that λ0 ≥ 0, λ ∈ Rm+ and µ̂ ∈ Rs+. Corresponding to the point of minimizer x̄, choose y0 > 0, yi > 0, i = 1, 2, . . . , m, and zj = hj(x̄), j = 1, 2, . . . , l. From condition (4.10), for i′ = 0, 1, . . . , m,

λi′ ≥ (1/yi′) { −Σ_{i=0}^{i′−1} λi yi − Σ_{i=i′+1}^m λi yi − Σ_{j=1}^l µj zj + λ0 w0 + Σ_{i=1}^m λi wi + Σ_{j=1}^s µj vj }.

Taking the limit as yi′ → ∞ yields λi′ ≥ 0, i′ = 0, 1, . . . , m. Again from


(4.10), for j ′ = 1, 2, . . . , s,

µj′ ≥ (1/vj′) { λ0 y0 + Σ_{i=1}^m λi yi + Σ_{j=1}^l µj zj − λ0 w0 − Σ_{i=1}^m λi wi − Σ_{j=1}^{j′−1} µj vj − Σ_{j=j′+1}^s µj vj }.

Taking the limit as vj′ → −∞ leads to µj′ ≥ 0, j′ = 1, 2, . . . , s.


Now consider any x ∈ X and δ > 0. Define


y0 = f (x) − f (x̄) + δ,
yi = gi (x) + δ, i = 1, 2, . . . , m,
zj = hj (x), j = 1, 2, . . . , l.
Therefore, (y0, y, z) ∈ Λ and for (0, 0, 0) ∈ R^{1+m+s}_− × {0_{Rl−s}}, the condition (4.10) yields that for every x ∈ X and every δ > 0,

λ0 (f(x) − f(x̄)) + Σ_{i=1}^m λi gi(x) + Σ_{j=1}^l µj hj(x) + Σ_{i=0}^m λi δ ≥ 0.

Because δ > 0 was arbitrarily chosen, as δ → 0 the above condition reduces


to
λ0 (f(x) − f(x̄)) + Σ_{i=1}^m λi gi(x) + Σ_{j=1}^l µj hj(x) ≥ 0, ∀ x ∈ X.    (4.12)

In particular, for x = x̄, condition (4.12) yields


Σ_{i=1}^m λi gi(x̄) + Σ_{j=1}^l µj hj(x̄) ≥ 0,

which along with the feasibility of x̄ for (CP ) leads to


λi gi (x̄) = 0, i = 1, 2, . . . , m, and µ̂j hj (x̄) = 0, j = 1, 2, . . . , s,
as in the proof of Theorem 4.3.
We claim that (λ0, λ) ≠ (0, 0). On the contrary, suppose that (λ0, λ) = (0, 0). Therefore, condition (4.12) leads to

Σ_{j=1}^l µj hj(x) ≥ 0, ∀ x ∈ X.

By the given hypothesis on x̂ ∈ ri X, Σ_{j=1}^l µj hj(x̂) ≤ 0, which along with the above inequality implies that

Σ_{j=1}^l µj hj(x̂) = 0,

that is, the affine function Σ_{j=1}^l µj hj(.) achieves its minimum over X at a relative interior point. Because a nonconstant affine function attains its minimum over a convex set only at relative boundary points, Σ_{j=1}^l µj hj(.) has the constant value zero over X, that is,

Σ_{j=1}^l µj hj(x) = 0, ∀ x ∈ X.    (4.13)


Corresponding to (y0′ , y ′ , z ′ ) ∈ Λ satisfying (4.11) there exists x′ ∈ X such


that
Σ_{j=1}^l µj hj(x′) > 0,

which contradicts (4.13). Therefore, λi , i = 0, 1, . . . , m, are not all simultane-


ously zero, which along with (4.12) leads to the desired result. □
Theorem 4.5 Consider the convex programming problem (CP ) with C de-
fined by (4.9). Assume that the modified Slater constraint qualification is satisfied, that is, there exists x̂ ∈ ri X such that

gi(x̂) < 0, i = 1, 2, . . . , m,
hj(x̂) ≤ 0, j = 1, 2, . . . , s,
hj(x̂) = 0, j = s + 1, s + 2, . . . , l.

Then x̄ is a point of minimizer of (CP ) if and only if there exist λ̄ ∈ Rm+ and µ̄ ∈ Rs+ × Rl−s such that

L(x̄, λ, µ) ≤ L(x̄, λ̄, µ̄) ≤ L(x, λ̄, µ̄), ∀ x ∈ X, λ ∈ Rm+, µ ∈ Rl,

where µ = (µ̂, µ̃) ∈ Rs+ × Rl−s, along with

λ̄i gi(x̄) = 0, i = 1, 2, . . . , m, and µ̄j hj(x̄) = 0, j = 1, 2, . . . , s.
Proof. Because the modified Slater constraint qualification is satisfied, the hy-
pothesis of Theorem 4.4 also holds. Thus, if x̄ is a point of minimizer of (CP ),
there exist (λ0, λ) ∈ R+ × Rm+ with (λ0, λ) ≠ (0, 0) and µ = (µ̂, µ̃) ∈ Rs+ × Rl−s such that

λ0 f(x̄) ≤ λ0 f(x) + Σ_{i=1}^m λi gi(x) + Σ_{j=1}^l µj hj(x), ∀ x ∈ X    (4.14)

and
λi gi (x̄) = 0, i = 1, 2, . . . , m, and µj hj (x̄) = 0, j = 1, 2, . . . , s. (4.15)
We claim that λ0 ≠ 0. On the contrary, suppose that λ0 = 0. Because (λ0, λ) ≠ (0, 0), λ ≠ 0. Therefore, the optimality condition (4.14) becomes

Σ_{i=1}^m λi gi(x) + Σ_{j=1}^l µj hj(x) ≥ 0, ∀ x ∈ X.

In particular, for x = x̂, the above condition along with the modified Slater
constraint qualification leads to
0 > Σ_{i=1}^m λi gi(x̂) + Σ_{j=1}^l µj hj(x̂) ≥ 0,


which is a contradiction. Thus, λ0 > 0 and hence dividing (4.14) throughout


by λ0 yields
f(x̄) ≤ f(x) + Σ_{i=1}^m λ̄i gi(x) + Σ_{j=1}^l µ̄j hj(x), ∀ x ∈ X,

where λ̄i = λi/λ0, i = 1, 2, . . . , m, and µ̄j = µj/λ0, j = 1, 2, . . . , l. This inequality
along with the condition (4.15) leads to

L(x̄, λ̄, µ̄) ≤ L(x, λ̄, µ̄), ∀ x ∈ X.

As x̄ is feasible for (CP ), g(x̄) ∈ −Rm+, −ĥ(x̄) ∈ Rs+, and h̃(x̄) = 0 in Rl−s. Therefore, for λ ∈ Rm+ and µ = (µ̂, µ̃) ∈ Rs+ × Rl−s,

Σ_{i=1}^m λi gi(x̄) + Σ_{j=1}^l µj hj(x̄) ≤ 0 = Σ_{i=1}^m λ̄i gi(x̄) + Σ_{j=1}^l µ̄j hj(x̄),

which leads to

L(x̄, λ, µ) ≤ L(x̄, λ̄, µ̄),

thereby proving the desired saddle point result. The converse can be worked
out as in Theorem 4.2. 
Observe that the saddle point condition in the above theorem

L(x̄, λ̄, µ̄) ≤ L(x, λ̄, µ̄), ∀ x ∈ X

can be rewritten as
m
X s
X l
X
f (x̄) + λ̄i gi (x̄) + µ̂j hj (x̄) + µ̃j hj (x̄) + δX (x̄)
i=1 j=1 j=s+1
m
X s
X l
X
≤ f (x) + λi gi (x) + µ̂j hj (x) + µ̃j hj (x) + δX (x)
i=1 j=1 j=s+1

for every x ∈ Rn . The above inequality implies that


l
X s
X l
X
0 ∈ ∂(f + λi gi + µ̂j hj (x) + µ̃j hj (x) + δX )(x̄).
i=1 j=1 j=s+1

By the modified Tl qualification x̂ ∈ ri X and therefore,


Tm Slater constraint
ri dom f ∩ i=1 ri dom gi ∩ j=1 ri dom hj ∩ ri dom δX = ri X is

© 2012 by Taylor & Francis Group, LLC


4.4 Lagrangian Duality 185

nonempty. Applying the Sum Rule, Theorem 2.91 along with the fact that
∂δX (x̄) = NX (x̄) yields the KKT optimality condition
m
X s
X l
X
0 ∈ ∂f (x̄) + λi ∂gi (x̄) + µ̂j ∂hj (x̄) + ∂( µ̃j hj )(x̄) + NX (x̄).
i=1 j=1 j=s+1

By the affineness of hj , j = 1, 2, . . . , l, ∂hj (x̄) = {∇hj (x̄)}, thereby reducing


the above condition to the standard KKT optimality condition
m
X s
X l
X
0 ∈ ∂f (x̄) + λi ∂gi (x̄) + µ̂j ∇hj (x̄) + µ̃j ∇hj (x̄) + NX (x̄).
i=1 j=1 j=s+1

We state this discussion as the following result.


Theorem 4.6 Consider the convex programming problem (CP ) with C de-
fined by (4.9). Assume that the modified Slater constraint qualification is
satisfied. Then x̄ is a point of minimizer of (CP ) if and only if there exist
λi ≥ 0, i = 1, 2, . . . , m; µ̂j ≥ 0, j = 1, 2, . . . , s; and µ̃j ∈ R, j = s + 1, . . . , l,
such that
m
X s
X l
X
0 ∈ ∂f (x̄) + λi ∂gi (x̄) + µ̂j ∇hj (x̄) + µ̃j ∇hj (x̄) + NX (x̄)
i=1 j=1 j=s+1

along with
λi gi (x̄) = 0, i = 1, 2, . . . , m, and µ̂j hj (x̄) = 0, j = 1, 2, . . . , s.

4.4 Lagrangian Duality


In the beginning of this chapter, we tried to motivate the notion of a saddle
point using two-person-zero-sum games. We observed that two optimization
problems were being simultaneously solved. Player 1 was solving a minimiza-
tion problem while Player 2 was solving a maximization problem. The maxi-
mization problem is usually referred to as the dual of the minimization prob-
lem. Similarly, corresponding to the problem (CP ), one can actually construct
a dual problem following an approach quite similar to that of the two-person-
zero-sum games. Consider the problem (CP ) with the feasible set given by
(4.9). Then if vL denotes the optimal value of (CP ), then observe that
vL = inf sup L(x, λ, µ̂, µ̃),
x∈C (λ,µ̂,µ̃)∈Ω

where Ω = Rm s
+ × R+ × R
l−s
. Taking a clue from the two-person-zero-sum
games, the dual problem to (CP ) that we denote by (DP ) can be stated as

© 2012 by Taylor & Francis Group, LLC


186 Saddle Points, Optimality, and Duality

sup w(λ, µ̂, µ̃) subject to (λ, µ̂, µ̃) ∈ Ω, (DP )


where w(λ, µ̂, µ̃) = min L(x, λ, µ̂, µ̃). We denote the optimal value of (DP )
x∈X
by dL . Our main aim here is to check if

dL = sup w(λ, µ̂, µ̃) = vL , (4.16)


(λ,µ̂,µ̃)∈Ω

that is,

sup inf L(x, λ, µ̂, µ̃) = inf sup L(x, λ, µ̂, µ̃).
(λ,µ̂,µ̃)∈Ω x∈X x∈C (λ,µ̂,µ̃)∈Ω

The statement (4.16) is known as strong duality. We now present a result that
shows when strong duality holds.

Theorem 4.7 Consider the problem (CP ) where the set C is defined by (4.9).
Assume that (CP ) has a lower bound, that is, it has an infimum value, vL ,
that is finite. Also, assume that the modified Slater constraint qualification is
satisfied. Then the dual problem (DP ) has a supremum and the supremum is
attained with

vL = d L .

Proof. We always have vL ≥ dL . This is absolutely straightforward and we


urge the reader to establish this. This is called weak duality.
The problem (CP ) has an infimum, vL , that is,

vL = inf f (x).
x∈C

Working along the lines of the proof of Theorem 4.4, we conclude from (4.12)
ˆ, µ̄
that there exists nonzero (λ̄0 , λ̄, µ̄ ˜) ∈ R+ × Rm s
+ × R+ × R
l−s
such that

m
X s
X l
X
λ̄0 (f (x) − vL ) + λ̄i gi (x) + ˆj hj (x) +
µ̄ ˜j hj (x) ≥ 0, ∀ x ∈ X.
µ̄
i=1 j=1 j=s+1

As the modified Slater constraint qualification holds, by Theorem 4.5, it is


simple to observe that λ̄0 6= 0 and without loss of generality, assume λ̄0 = 1.
Hence,
m
X s
X l
X
(f (x) − vL ) + λ̄i gi (x) + ˆj hj (x) +
µ̄ ˜j hj (x) ≥ 0, ∀ x ∈ X.
µ̄
i=1 j=1 j=s+1

Therefore,

ˆ, µ̄
L(x, λ̄, µ̄ ˜) ≥ vL , ∀ x ∈ X,

© 2012 by Taylor & Francis Group, LLC


4.4 Lagrangian Duality 187

ˆ, µ̄
that is, w(λ̄, µ̄ ˜) ≥ vL . Hence,

sup ˆ, µ̄
w(λ, µ̂, µ̃) ≥ w(λ̄, µ̄ ˜ ) ≥ vL .
(λ,µ̂,µ̃)∈Ω

By the weak duality, vL ≥ sup(λ,µ̂,µ̃)∈Ω w(λ, µ̂, µ̃). Thus,

sup w(λ, µ̂, µ̃) = vL = inf f (x),


(λ,µ̂,µ̃)∈Ω x∈C

thereby establishing the strong duality between (CP ) and (DP ). 


It is important to note that the assumption of the Slater constraint qual-
ification is quite crucial as its absence can give a positive duality gap. We
provide below the following famous example due to Duffin [35].

Example 4.8 Consider the primal problem


q
inf ex2 subject to x21 + x22 ≤ x1 .

The Lagrangian dual problem is

max w(λ) subject to λ ∈ Rm


+,

where
q
w(λ) = inf2 ex2 + λ( x21 + x22 − x1 ), λ ≥ 0.
x∈R

Observe that the only feasible point of the primal problem is (x1 , x2 ) = (0, 0)
and hence inf ex2 = e0 = 1. Thus, the minimum value or the infimum value
of the primal problem is vL = 1. Now let us evaluate the p function w(λ) for
each λ ≥ 0. Observe that for every fixed x2 , the term ( x21 + x22 − x1 ) → 0
as x1 → +∞. Thus, for each x2 , the value ex2 dominates the expression
q
ex2 + λ( x21 + x22 − x1 )

as x1 → +∞. Hence, for a fixed x2 ,


q
inf ex2 + λ( x21 + x22 − x1 ) = ex2 .
x1

By letting x2 → −∞,

w(λ) = 0, ∀ λ ≥ 0.

Therefore, the supremum value of the dual problem is dL = 0. Hence, there


is a positive duality gap. Observe that the Slater constraint qualification does
not hold in the primal case.

© 2012 by Taylor & Francis Group, LLC


188 Saddle Points, Optimality, and Duality

We are now going to present some deeper properties of the dual variables
(or Lagrange multipliers) for the problem (CP ) with convex non-affine in-
equality, that is, the feasible set C is given by (4.1),

C = {x ∈ X : gi (x) ≤ 0, i = 1, 2, . . . , m}.

The set of Lagrange multipliers at a given solution x̄ of (CP ) is given as


m
X
M(x̄) = {λ ∈ Rm
+ : 0 ∈ ∂f (x̄) + λi ∂gi (x̄), λi gi (x̄) = 0, i = 1, 2, . . . , m}.
i=1

It is quite natural to think that when we change x̄, the set of multipliers will
also change. We now show that for a convex programming problem, the set
M(x̄) does not depend on the solution x̄. Consider the set

M = {λ ∈ Rm
+ : inf f (x) = infn L(x, λ)}. (4.17)
x∈C x∈R

In the following result we show that M(x̄) = M for any solution x̄ of (CP ).
The proof of this fact is from Attouch, Buttazzo, and Michaille [3].

Theorem 4.9 Consider the convex programming problem (CP ) with C de-
fined by (4.1). Let x̄ be the point of minimizer of (CP ). Then M(x̄) = M.

Proof. Suppose that λ ∈ M(x̄). Then

0 ∈ ∂x L(x̄, λ)

with λi gi (x̄) = 0, i = 1, 2, . . . , m, where ∂x L denotes the subdifferential with


respect to x. Hence, x̄ solves the problem

min L(x, λ) subject to x ∈ Rn .

Therefore, for every x ∈ Rn ,


m
X m
X
f (x̄) + λi gi (x̄) ≤ f (x) + λi gi (x),
i=1 i=1

which along with λi gi (x̄) = 0, i = 1, 2, . . . , m, implies


m
X
f (x̄) ≤ f (x) + λi gi (x), ∀ x ∈ Rn .
i=1

Thus,
m
X
f (x̄) = infn (f + λi gi )(x) = infn L(x, λ).
x∈R x∈R
i=1

© 2012 by Taylor & Francis Group, LLC


4.4 Lagrangian Duality 189

Further, f (x̄) = inf f (x). Hence, λ ∈ M.


x∈C
Conversely, suppose that λ ∈ M, which implies
m
X
f (x̄) = infn (f + λi gi )(x).
x∈R
i=1

Therefore,
m
X
f (x̄) ≤ f (x̄) + λi gi (x̄),
i=1

thereby yielding
m
X
λi gi (x̄) ≥ 0.
i=1

The above inequality along with the feasibility of x̄ for (CP ) and nonnegativity
of λi , i = 1, 2, . . . , m, leads to
m
X
λi gi (x̄) = 0.
i=1

This further yields

λi gi (x̄) = 0, i = 1, 2, . . . , m.

Thus,
m
X m
X
f (x̄) + λi gi (x̄) = infn (f + λi gi )(x),
x∈R
i=1 i=1

which implies that x̄ solves the problem

min L(x, λ) subject to x ∈ Rn .

Therefore, 0 ∈ ∂x L(x̄, λ). As dom f = dom gi = Rn , i = 1, 2, . . . , m, applying


the Sum Rule, Theorem 2.91,
m
X
0 ∈ ∂f (x̄) + λi ∂gi (x̄).
i=1

This combined with the fact that λi gi (x̄) = 0, i = 1, 2, . . . , m, shows that


λ ∈ M(x̄), thereby establishing that M(x̄) = M. 

Remark 4.10 In the above theorem, x̄ was chosen to be any arbitrary solu-
tion of (CP ). Thus, it is clear that M(x̄) is independent of the choice of x̄
and hence M(x̄) = M for every solution x̄ of (CP ).

© 2012 by Taylor & Francis Group, LLC


190 Saddle Points, Optimality, and Duality

Note that the above result can be easily extended to the problem with
feasible set C defined by (4.9), that is, convex non-affine and affine inequalities
along with affine equalities. If we take a careful look at the set M, we realize
that for λ ∈ Rm + it is not essential that (CP ) has a solution; one merely
needs (CP ) to be bounded below. Thus Attouch, Buttazzo, and Michaille [3]
call the set M to be the set of generalized Lagrange multipliers. Of course
if (CP ) has a solution, then M is the set of Lagrange multipliers. We now
show how deeply the notion of Lagrange multipliers is associated with the
perturbation of the constraints of the problem. From a numerical point of
view, it is important to deal with constraint perturbations. Note that due to
rounding off and other errors, often the iterates do not satisfy the constraints
exactly but some perturbed version of it, that is, possibly in the form

gi (x) ≤ yi , i = 1, 2, . . . , m.

Thus, the function

v(y) = inf{f (x) : gi (x) ≤ yi , i = 1, 2, . . . , m}

is called the value function or the marginal function associated with (CP ). It
is obvious that if v(0) ∈ R, then v(0) is the optimal value of (CP ). We now
establish that v : Rm → R̄ is a convex function. In order to show that, we
need the following interesting and important lemma.

Lemma 4.11 Consider Φ : Rn × Rm → R ∪ {+∞}, which is convex in both


variables. Then the function

φ(v) = infn Φ(u, v)


u∈R

is a convex function in v.

Proof. Consider (vi , αi ) ∈ epis φ, i = 1, 2, that is,

φ(vi ) < αi , i = 1, 2.

Therefore, there exist ū1 , ū2 ∈ Rn such that by the definition of infimum,

Φ(ūi , vi ) < αi , i = 1, 2.

By the convexity of Φ, for every λ ∈ [0, 1],

Φ((1 − λ)ū1 + λū2 , (1 − λ)v1 + λv2 ) ≤ (1 − λ)Φ(ū1 , v1 ) + λΦ(ū2 , v2 )


< (1 − λ)α1 + λα2 ,

which implies

φ((1 − λ)v1 + λv2 ) < (1 − λ)α1 + λα2 , ∀ λ ∈ [0, 1].

© 2012 by Taylor & Francis Group, LLC


4.4 Lagrangian Duality 191

Thus

((1 − λ)v1 + λv2 ), (1 − λ)α1 + λα2 ) ∈ epis φ,

which by Proposition 2.50 leads to the convexity of φ. 


Observe that the value function can be expressed as

v(y) = infn {f (x) + δC(y) (x)}, (4.18)


x∈R

where C(y) = {x ∈ Rn : gi (x) ≤ yi , i = 1, 2, . . . , m}. Now to prove the


convexity of the value function, what one needs to show is that f (x)+δC(y) (x)
is convex in both the variables x as well as y, and we leave it to the reader.
Once that is done, we just have to use Lemma 4.11 to conclude that v is a
convex function.
Through the following result given in Attaouch, Buttazzo, and
Michaille [3], we show how the Lagrange multipliers (or the generalized La-
grange multipliers) are related to the value function.

Theorem 4.12 (i) Let v(0) ∈ R, then M = −∂v(0). Further, if the Slater
constraint qualification holds, then v is continuous at the origin and hence M
is convex compact set in Rm +.

(ii) Consider the problem


sup − v ∗ (−λ) subject to λ ∈ Rm
+. (DP 1)
The solutions of (DP 1) coincide with the set M. Further, for every λ ∈ Rm
+,

m
X
−v ∗ (−λ) = infn {f (x) + λi gi (x)}.
x∈R
i=1

Thus the problem (DP 1) coincides with the Lagrangian dual problem of (CP ).

Proof. (i) We begin by proving M = −∂v(0). Consider any λ ∈ M. By the


definition of the value function v (4.18) and M (4.17),
m
X
v(0) = infn {f + δC }(x) = infn {f + λi gi }(x).
x∈R x∈R
i=1

For any given y ∈ Rm , consider the set

C(y) = {x ∈ Rn : gi (x) ≤ yi , i = 1, 2, . . . , m}.

As λ ∈ Rm
+ , for any x ∈ C(y),

m
X m
X
λi gi (x) ≤ λi yi ,
i=1 i=1

© 2012 by Taylor & Francis Group, LLC


192 Saddle Points, Optimality, and Duality

which implies that


m
X m
X
f (x) + λi gi (x) ≤ f (x) + λi yi .
i=1 i=1

Therefore,
m
X m
X
inf {f + λi gi }(x) ≤ inf f (x) + λi yi . (4.19)
x∈C(y) x∈C(y)
i=1 i=1

Because C(y) ⊂ Rn , by Proposition 1.7,


m
X m
X
infn {f + λi gi }(x) ≤ inf {f + λi gi }(x).
x∈R x∈C(y)
i=1 i=1

As λ ∈ M, by (4.17) along with (4.19) leads to


m
X
v(0) ≤ v(y) + λi yi ,
i=1

that is,

v(y) ≥ v(0) + h−λ, y − 0i, ∀ y ∈ Rm .

This yields that −λ ∈ ∂v(0), thereby establishing that M ⊂ −∂v(0).


Conversely, suppose that λ ∈ −∂v(0), that is, −λ ∈ ∂v(0). We will prove
that λ ∈ M. Consider any y ∈ Rm + . Then it is easy to observe that C ⊂ C(y).
Again by Proposition 1.7,

inf f (x) ≥ inf f (x),


x∈C x∈C(y)

that is,

v(0) ≥ v(y), ∀ y ∈ Rm
+.

As −λ ∈ ∂v(0), which along with the above inequality leads to

hλ, yi ≥ v(0) − v(y) ≥ 0.

Because y ∈ Rm m
+ was arbitrary, it is clear that λ ∈ R+ . We now establish that
λ ∈ M by proving that
m
X
inf f (x) = infn {f + λi gi }(x),
x∈C x∈R
i=1

that is,
m
X
v(0) = infn {f + λi gi }(x).
x∈R
i=1

© 2012 by Taylor & Francis Group, LLC


4.4 Lagrangian Duality 193
Pm
Note that if x ∈ C, gi (x) ≤ 0, i = 1, 2, . . . , m. Then i=1 λi gi (x) ≤ 0 as
λi ≥ 0, i = 1, 2, . . . , m. Thus,
m
X
f (x) + λi gi (x) ≤ f (x), ∀ x ∈ C.
i=1

Therefore,
m
X m
X
infn {f + λi gi }(x) ≤ inf {f + λi gi }(x) ≤ inf f (x) = v(0). (4.20)
x∈R x∈C x∈C
i=1 i=1

The fact that −λ ∈ ∂v(0) leads to

v(y) + hλ, yi ≥ v(0), ∀ y ∈ Rm ,

that is, for every y ∈ Rm ,


m
X
v(y) + λi yi ≥ v(0).
i=1

Consider any x̃ ∈ Rn and set ỹ = gi (x̃), i = 1, 2, . . . , m. Therefore, the above


inequality leads to
m
X
v(ỹ) + λi gi (x̃) ≥ v(0).
i=1

By the definition (4.18) of value function v(ỹ) ≤ f (x̃), which along with the
above inequality leads to
m
X
f (x̃) + λi gi (x̃) ≥ v(0).
i=1

Because x̃ was arbitrary,


m
X
infn {f + λi gi }(x) ≥ v(0). (4.21)
x∈R
i=1

Combining (4.21) with (4.20),


m
X
v(0) = infn {f + λi gi }(x).
x∈R
i=1

Therefore, λ ∈ M and thus establishing that M = −∂v(0).


Now assume that v(0) is finite and the Slater constraint qualification holds,
that is, there exists x̂ ∈ Rn such that gi (x̂) < 0, i = 1, 2, . . . , m. Thus there

© 2012 by Taylor & Francis Group, LLC


194 Saddle Points, Optimality, and Duality

exists δ > 0 such that for every y ∈ Bδ (0) = δB, gi (x̂) < yi , i = 1, 2, . . . , m,
which implies that
v(y) ≤ f (x̂), ∀ y ∈ Bδ (0). (4.22)
As dom f = Rn , f (x̂) < +∞, thereby establishing that v is bounded above
on Bδ (0). This fact shows that

Bδ (0) × [f (x̂), +∞) ⊂ epi v.

We claim that v(y) > −∞ for every y ∈ Rm . On the contrary, assume that
there exists ŷ ∈ Rm such that v(ŷ) = −∞. Thus,

{ŷ} × R ⊂ epi v.

Consider z = −αŷ such that α > 0 and kzk < δ. This is possible by choosing
δ 1 1−λ
α= . Setting λ = , we have λ ∈ (0, 1) and α = . This implies
2kŷk 1+α λ
−(1 − λ)
that z = ŷ, that is,
λ
λz + (1 − λ)ŷ = 0.

By choice, z ∈ Bδ (0), which by (4.22) implies that v(z) ≤ f (x̂) and thus,

(z, f (x̂)) ∈ Bδ (0) × [f (x̂), +∞) ⊂ epi v.

Further, for every t ∈ R,

(ŷ, t) ∈ {ŷ} × R ⊂ epi v.

As v is convex, by Proposition 2.48, epi v is a convex set, which implies that

(λz + (1 − λ)ŷ, λf (x̂) + (1 − λ)t) ∈ epi v,

that is,

(0, λf (x̂) + (1 − λ)t) ∈ epi v.

Therefore,

v(0) ≤ λf (x̂) + (1 − λ)t, ∀ t ∈ R.

Taking the limit as t → −∞, v(0) ≤ −∞. But v(0) ≥ −∞ and hence
v(0) = −∞, which is a contradiction because v(0) ∈ R. By Theorem 2.72,
the function v : Rm → R ∪ {+∞} is majorized on a neighborhood of the ori-
gin and hence v is continuous at y = 0. Then by Proposition 2.82, ∂v(0) is
convex compact set, which implies so is M.
(ii) We already know that λ ∈ M if and only −λ ∈ ∂v(0). Therefore, from
Theorem 2.108,

−λ ∈ ∂v(0) ⇐⇒ v(0) + v ∗ (−λ) = 0,

© 2012 by Taylor & Francis Group, LLC


4.4 Lagrangian Duality 195

which implies λ ∈ M if and only if

v(0) + v ∗ (−λ) = 0.

From (i) we know that v is continuous at y = 0. Thus, by Proposition 2.106,


v(0) = v ∗∗ (0). By Definition 2.101 of the biconjugate,

v ∗∗ (0) = sup {−v ∗ (µ)} = sup {−v ∗ (−µ)}.


µ∈Rm µ∈Rm

Thus λ ∈ M if and only if

−v ∗ (−λ) = v ∗∗ (0) = sup {−v ∗ (−µ)},


µ∈Rm

which is equivalent to the fact that λ solves the problem

sup −v ∗ (µ) subject to µ ∈ Rm .

Observe that

v ∗ (µ) = sup {hµ, yi − v(y)}


y∈Rm
= sup {hµ, yi − inf {f (x) : gi (x) ≤ yi , i = 1, 2, . . . , m}}
y∈Rm x

= sup {hµ, yi + sup{−f (x) : gi (x) ≤ yi , i = 1, 2, . . . , m}}


y∈Rm x

= sup {hµ, yi − f (x) : gi (x) ≤ yi , i = 1, 2, . . . , m}.


(y,x)∈Rm ×Rn

If for some i ∈ {1, 2, . . . , m}, µi > 0, then v ∗ (µ) = +∞. So assume that
µ ∈ −Rm + . Then

m
X
v ∗ (µ) = sup {−f (x) + sup µi yi }.
x∈Rn yi ≥gi (x) i=1

Pm
As i=1 µi yi = hµ, yi is a linear function,
m
X m
X
sup µi yi = µi gi (x).
yi ≥gi (x) i=1 i=1

Hence, for µ ∈ −Rm


+,

Xm
v ∗ (µ) = sup { µi gi (x) − f (x)}.
x∈Rn i=1

In particular, for µ = −λ,


m
X
v ∗ (−λ) = sup {−(f (x) + λi gi (x))},
x∈Rn i=1

© 2012 by Taylor & Francis Group, LLC


196 Saddle Points, Optimality, and Duality

which implies
m
X

−v (−λ) = infn {f (x) + λi gi (x)}.
x∈R
i=1

Thus, −v ∗ (−λ) = w(λ), thereby showing that the dual problems (DP ) and
(DP 1) are the same. 

4.5 Fenchel Duality


In the last section it was clear that the notion of conjugation is linked to the
understanding of Lagrangian duality. In this section we explore this relation a
bit more. We will focus on Fenchel duality where the dual problem is expressed
explicitly in terms of the conjugate functions. Also we shall make a brief
presentation of Rockafellar’s perturbation approach to duality. Our approach
to Fenchel duality will be that of Borwein and Lewis [17], which we present
below.

Theorem 4.13 Consider proper convex functions f : Rn → R̄ and


g : Rm → R̄ and a linear map A : Rn → Rm . Let vF , dF ∈ R̄ be the opti-
mal values of the primal and the dual problems given below:

vF = infn {f (x) + g(Ax)} and dF = sup {−f ∗ (AT λ) − g ∗ (−λ)},


x∈R φ∈Rm

where AT denotes the conjugate of the linear map A or the transpose of the
matrix represented by A. In fact, A can be viewed as an m×n matrix. Assume
that the condition

0 ∈ core(dom g − A dom f )

holds. Then vF = dF and the supremum in the dual problem is attained if the
optimal value is finite. (Instead of the term core, one can also use interior or
relative interior.)

Proof. We first prove that vF ≥ dF , that is, the weak duality holds. By the
definition of conjugate function, Definition 2.101,

f ∗ (AT λ) = sup {hAT λ, xi − f (x)} ≥ hλ, Axi − f (x), ∀ x ∈ Rn ,


x∈Rn

which implies

f (x) ≥ hλ, Axi − f ∗ (AT λ), ∀ x ∈ Rn .

© 2012 by Taylor & Francis Group, LLC


4.5 Fenchel Duality 197

Similarly, we have

g(Ax) ≥ −hλ, Axi − g ∗ (−λ), ∀ x ∈ Rn .

The above inequalities immediately show that for any λ ∈ Rm and any x ∈ Rn ,

f (x) + g(Ax) ≥ −f ∗ (AT λ) − g ∗ (−λ).

Thus, the above inequality it yields that vF ≥ dF .


Next, to prove the equality under the given constraint qualification, define
the function h : Rm → R̄ as

h(y) = infn {f (x) + g(Ax + y)}.


x∈R

In the parlance of optimization, h is referred to as the optimal value function or


just a value function. Here the vector y acts as a parameter. See the previous
section for more details. Using Lemma 4.11, it is easy to observe that h is
convex. We urge the reader to reason out for himself / herself. Further, one
must decide what dom h is. We claim that

dom h = dom g − A dom f.

Consider y ∈ dom h, that is, h(y) < +∞. Hence there exists x ∈ Rn such that
x ∈ dom f and Ax + y ∈ dom g, which leads to

y ∈ dom g − A dom f.

This holds for every y ∈ dom h and thus

dom h ⊂ dom g − A dom f.

Let z ∈ dom g − A dom f , which implies that there exists u ∈ dom g and
x̂ ∈ dom f such that z = u − Ax̂. Hence z + Ax̂ ∈ dom g, that is,

f (x̂) + g(z + Ax̂) < +∞.

Thus h(z) < +∞, thereby showing that z ∈ dom h. This proves the assertion
toward the domain of h.
Note that if vF = −∞, there is nothing to prove. Without loss of gen-
erality, we assume that vF is finite. By assumption, 0 ∈ core(dom h) (or
0 ∈ int(dom h)). By Proposition 2.82, ∂h(0) 6= ∅, which implies that there
exists −ξ ∈ ∂h(0). Thus, by Definition 2.77 of the subdifferential along with
the definition of h,

h(0) ≤ h(y) + hξ, yi ≤ f (x) + g(Ax + y) + hξ, yi, ∀ y ∈ Rm .

Hence,

h(0) ≤ {f (x) − hA∗ ξ, xi} + {g(Ax + y) − h−ξ, Ax + yi}.

© 2012 by Taylor & Francis Group, LLC


198 Saddle Points, Optimality, and Duality

Taking the infimum first over y and then over x yields that

h(0) ≤ −f ∗ (A∗ ξ) − g ∗ (−ξ) ≤ dF ≤ vF ≤ h(0),

thereby establishing that vF = dF . Observe that the dual value is obtained at


λ = ξ. 
It is important to mention that the above problem was also studied by
Rockafellar [97]. In Rockafellar [97], the function g is taken to be a concave
function, Definition 2.46, and the objective function of the primal problem
and the dual problem are, respectively, given as

f (x) − g(Ax) and g∗ (λ) − f ∗ (AT λ).

Further, g∗ denotes the conjugate of the concave function g, which is defined


as

g∗ (λ) = infm {hλ, yi − g(y)}.


y∈R

From the historical point of view, we provide the statement of the classical
Fenchel duality theorem as it appears in Rockafellar [97].

Theorem 4.14 Consider a proper convex function f : Rn → R̄ and a proper


concave function g : Rn → R̄. Assume that one of the following conditions
holds:
(1) ri(dom f ) ∩ ri(dom g) 6= ∅,
(2) f and g are lsc and ri(dom f ∗ ) ∩ ri(dom g∗ ) 6= ∅.
Then

inf {f (x) − g(x)} = sup {g∗ (λ) − f ∗ (λ)}. (4.23)


x∈Rn λ∈Rn

We request the readers to figure out how one will define the notion of a
proper concave function. Of course, if we consider g to be a convex function,
(4.23) can be written as

inf {f (x) + g(x)} = sup {−g ∗ (−λ) − f ∗ (λ)}.


x∈Rn λ∈Rn

Note that this can be easily proved using Theorem 4.13 by taking A to be
the identity mapping I : Rn → Rn . Moreover, ri(dom f ) ∩ ri(dom g) 6= ∅
shows that 0 ∈ int(dom g − dom f ). Hence the result follows by invoking
Theorem 4.13.
We now look into the perturbation-based approach. This approach is due
to Rockafellar. Rockafellar’s monogarph [99] entitled Conjugate Duality and
Optimization makes a detailed study of this method in an infinite dimen-
sional setting. We however discuss the whole issue from a finite dimensional

© 2012 by Taylor & Francis Group, LLC


4.5 Fenchel Duality 199

viewpoint. In this approach, one considers the original problem being embed-
ded in a family of problems. In fact, we begin by considering the convexly
parameterized family of convex problems
min F (x, y) subject to x ∈ Rn , (CP (y))
where the vector y is called the parameter and the function F : Rn × Rm → R̄
is assumed to be proper convex jointly in x and y. In fact, in such a situation,
the optimal value function

v(y) = inf F (x, y)


x∈C

is a convex function by Lemma 4.11. Of course, the function F is so chosen


that f0 (x) = F (x, 0), where

f (x), x ∈ C,
f0 (x) =
+∞, otherwise.

In fact, (CP ) can be viewed as

min f0 (x) subject to x ∈ Rn ,

thus embedding the original problem (CP ) in (CP (y)).


Now we pose the standard convex optimization problem as (CP (y)). Con-
sider the problem (CP ) with C given by (3.1), that is,

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},

where gi : Rn → R, i = 1, 2, . . . , m, are convex functions. Corresponding to


(CP ), introduce the family of parameterized problems (CP (y)) as follows

min F (x, y) subject to x ∈ Rn ,

where

f (x), gi (x) ≤ yi , i = 1, 2, . . . , m,
F (x, y) =
+∞, otherwise.
It is clear that

f (x), gi (x) ≤ 0, i = 1, 2, . . . , m,
F (x, 0) = f0 (x) =
+∞, otherwise.

Recall that the Lagrangian function corresponding to (CP ) is given by



f (x) + hλ, g(x)i, λ ∈ Rm
+,
L(x, λ) =
+∞, otherwise.

Next we look at how to construct the dual problem for (CP (y)). Define the
Lagrangian function L : Rn × Rm → R̄ as

L(x, λ) = infm {F (x, y) + hy, λi},


y∈R

© 2012 by Taylor & Francis Group, LLC


200 Saddle Points, Optimality, and Duality

that is,

−L(x, λ) = sup {hy, λi − F (x, y)}.


y∈Rm

Observe that

F ∗ (x∗ , λ∗ ) = sup {hx∗ , xi + hλ∗ , yi − F (x, y)}


x∈Rn ,y∈Rm
= sup {hx , xi + sup (hλ∗ , yi − F (x, y))}

x∈Rn y∈Rm
= sup {hx , xi − L(x, λ∗ )}.

x∈Rn

Thus,

−F ∗ (0, λ∗ ) = infn L(x, λ∗ ).


x∈R

Hence the Fenchel dual problem associated with (CP ) is


sup (−F ∗ (0, λ)) subject to λ ∈ Rm . (DPF )
With the given Lagrangian in a similar fashion as before, one can define a
saddle point (x, λ) of the Lagrangian function L(x, λ).
We now state without proof the following result. For proof, see for example
Lucchetti [79] and Rockafellar [99].

Theorem 4.15 Consider the problem (CP ) and (DPF ) as given above. Then
the following are equivalent:

(i) (x̄, λ̄) be a saddle point of L,

(ii) x̄ is a solution for (CP ) and λ̄ is a solution for (DPF ) and there is no
duality gap.

For more details on the perturbation-based approach, see Lucchetti [79]


and Rockafellar [99].

4.6 Equivalence between Lagrangian and Fenchel Dual-


ity
In the previous sections, we studied two types of duality theories, namely the
Lagrangian duality and the Fenchel duality. The obvious question that comes
to mind is whether the two theories are equivalent or not. It was shown by
Magnanti [81] that for a convex programming problem, both these forms of

© 2012 by Taylor & Francis Group, LLC


4.6 Equivalence between Lagrangian and Fenchel Duality 201

duality coincide. We end this chapter by taking a look at the equivalence


between the two duality theories based on the approach of Magnanti [81].
Consider the following convex programming problems:

Lagrange: inf f (x) subject to x ∈ C,


Fenchel: inf (f1 (x) − f2 (x)) subject to x ∈ C1 ∩ C2 ,

where

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m,
hj (x) = 0, j = 1, 2, . . . , l, x ∈ X},

f : X → R, f1 : C1 → R are convex functions; f2 : C2 → R is a con-


cave function; gi : X → R, i = 1, 2, . . . , m, are convex non-affine functions;
hj : Rn → R, j = 1, 2, . . . , l, are affine functions; and C1 , C2 , X are convex
subsets of Rn . Denote the optimal values of the Lagrangian and the Fenchel
convex programming problems as vL and vF , respectively. Observe that the
Lagrangian problem is a particular case of (CP ) with C given by (4.9). Cor-
responding to the two convex programming problem, we have the following
dual problems:

Lagrange: sup inf L(x, λ, µ) subject to (λ, µ) ∈ Rm l


+ ×R ,
x∈X
Fenchel: sup ((f2 )∗ (ξ) − f1∗ (ξ)) subject to ξ ∈ Rn ,

where the Lagrangian function L : Rn × Rm l


+ × R is defined as

m
X l
X
L(x, λ, µ) = f (x) + λi gi (x) + µj hj (x).
i=1 j=1

As fi are defined over Ci , that is, dom fi = Ci for i = 1, 2, the conjugate


functions reduce to

f1∗ (ξ) = sup {hξ, xi − f1 (x)},


x∈C1
(f2 )∗ (ξ) = inf {hξ, xi − f2 (x)}.
x∈C2

Denote the optimal values of the Lagrangian and the Fenchel dual problems
as dL and dF , respectively. Note that f1∗ (ξ) = +∞ for some ξ ∈ Rn is a pos-
sibility. Similarly, for the concave conjugate, (f2 )∗ (ξ) = −∞ for some ξ ∈ Rn
is also a possibility. But these values play no role in the Fenchel dual problem
and thus the problem may be considered as

Fenchel: sup ((f2 )∗ (ξ) − f1∗ (ξ)) subject to ξ ∈ C1∗ ∩ C2∗ ,

where

C1∗ = {ξ ∈ Rn : f1∗ (ξ) < +∞} and C2∗ = {ξ ∈ Rn : (f2 )∗ (ξ) > −∞}.

© 2012 by Taylor & Francis Group, LLC


202 Saddle Points, Optimality, and Duality

By Theorem 4.7 and Theorem 4.14, we have the strong duality results for the
Lagrangian and the Fenchel problems, respectively, that is,
vL = d L and vF = d F .
Now we move on to show that the two strong dualities are equivalent. But
before doing so we present a result from Magnanti [81] on relative interior.
Lemma 4.16 Consider the set
Λ = {(y0 , y, z) ∈ R1+m+l : there exists x ∈ X such that f (x) ≤ y0 ,
gi (x) ≤ yi , i = 1, 2, . . . , m,
hj (x) = zj , j = 1, 2, . . . , l}.
If x̂ ∈ ri X such that
f (x̂) < ŷ0 , gi (x̂) < ŷi , i = 1, 2, . . . , m, and hj (x̂) = ẑj , j = 1, 2, . . . , l,
then (ŷ0 , ŷ, ẑ) ∈ ri Λ.
Proof. By the convexity of the functions f , gi , i = 1, 2, . . . , m, and hj ,
j = 1, 2, . . . , l, and the set X, it is easy to observe that the set Λ is con-
vex. It is left to the reader to verify this fact. To prove the result, we will
invoke the Prolongation Principle, Proposition 2.14 (iii).
Consider (y0 , y, z) ∈ Λ, that is, there exists x ∈ X such that
f (x) ≤ y0 , gi (x) ≤ yi , i = 1, 2, . . . , m, and hj (x) = zj , j = 1, 2, . . . , l.
n
Because X ⊂ R is a nonempty convex set and x̂ ∈ ri X, by the Prolongation
Principle, there exists γ > 1 such that
γ x̂ + (1 − γ)x ∈ X,
which by the convexity of X yields that
αx̂ + (1 − α)x ∈ X, ∀ α ∈ (1, γ]. (4.24)
As dom f = dom gi = X, i = 1, 2, . . . , m with x̂ ∈ ri X, for some α ∈ (1, γ],
f (αx̂ + (1 − α)x) < αŷ0 + (1 − α)y0 , (4.25)
gi (αx̂ + (1 − α)x) < αŷi + (1 − α)yi , i = 1, 2, . . . , m. (4.26)
By the affineness of hj , j = 1, 2, . . . , l,
hj (αx̂ + (1 − α)x) < αẑj + (1 − α)zj , j = 1, 2, . . . , l. (4.27)
Combining the conditions (4.24) through (4.27) yields that for α > 1,
α(ŷ0 , ŷ, ẑ) + (1 − α)(y0 , y, z) ∈ Λ.
Because (y0 , y, z) ∈ Λ was arbitrary, by the Prolongation Principle,
(ŷ0 , ŷ, ẑ) ∈ ri Λ as desired. 
Now we present the equivalence between the strong duality results.

© 2012 by Taylor & Francis Group, LLC


4.6 Equivalence between Lagrangian and Fenchel Duality 203

Theorem 4.17 Lagrangian strong duality is equivalent to Fenchel strong du-


ality, that is, Theorem 4.7 is equivalent to Theorem 4.14.
Proof. Suppose that the Lagrangian strong duality, Theorem 4.7, holds under
the assumption of modified Slater constraint qualification, that is, there exists
x̂ ∈ ri X such that
gi (x̂) < 0, i = 1, 2, . . . , m, and hj (x̂) = 0, j = 1, 2, . . . , l.
Define X = C1 × C2 × Rn and x = (x1 , x2 , x3 ). The Fenchel convex program-
ming problem can now be expressed as
vF = inf (f1 (x1 ) − f2 (x2 )),
x∈C

where
C = {x ∈ R3n : hrj (x) = (xj − x3 )r = 0, j = 1, 2, r = 1, 2, . . . , n, x ∈ X}.
Observe that here hj : Rn → Rn . Note that the reformulated Fenchel problem
is nothing but the Lagrangian convex programming problem. The correspond-
ing Lagrangian dual problem is as follows:
n
X n
X
dL = sup inf {f1 (x1 ) − f2 (x2 ) + µr1 (x1 − x3 )r + µr2 (x2 − x3 )r },
(µ1 ,µ2 )∈R2n x∈X r=1 r=1

that is,
dL = sup inf {f1 (x1 ) − f2 (x2 ) + hµ1 , x1 i + hµ2 , x2 i
(µ1 ,µ2 )∈R2n x∈X

−hµ1 + µ2 , x3 i}. (4.28)


From the assumption of Theorem 4.14,
ri(dom f1 ) ∩ ri(dom f2 ) = ri C1 ∩ ri C2 6= ∅,
which implies there exists x̂ ∈ Rn such that x̂ ∈ ri C1 ∩ ri C2 . Therefore,
x = (x̂, x̂, x̂) ∈ ri X such that hrj (x) = 0, j = 1, 2, r = 1, 2, . . . , n; thereby
implying that the modified Slater constraint qualification holds. Invoking the
Lagrangian strong duality, Theorem 4.7,
vF = d L , (4.29)
it is easy to note that if µ1 6= −µ2 , the infimum is −∞ as x3 ∈ Rn . So taking
the supremum over µ = −µ1 = µ2 along with Proposition 1.7, the Lagrangian
dual problem (4.28) leads to
dL = sup inf {f1 (x1 ) − f2 (x2 ) − hµ, x1 i + hµ, x2 i}
µ∈Rn (x1 ,x2 )∈C1 ×C2
= sup { inf (hµ, x2 i − f2 (x2 )) + inf (f1 (x1 ) − hµ, x1 i)}
µ∈Rn x2 ∈C2 x1 ∈C1

= sup {(f2 )∗ (µ) − f1∗ (µ)},


µ∈Rn

© 2012 by Taylor & Francis Group, LLC


204 Saddle Points, Optimality, and Duality

thereby implying that dL = dF . This along with the relation (4.29) yields that
vF = dF and hence the Fenchel strong duality holds.
Conversely, suppose that the Fenchel strong duality holds under the as-
sumption that ri C1 ∩ ri C2 6= ∅. Define

C1 = {(y0 , y, z) ∈ R1+m+l : there exists x ∈ X such that f (x) ≤ y0 ,


gi (x) ≤ yi , i = 1, 2, . . . , m,
hj (x) = zj , j = 1, 2, . . . , l}

and

C2 = {(y0 , y, z) ∈ R1+m+l : yi ≤ 0, i = 1, 2, . . . , m, zj = 0, j = 1, 2, . . . , l}.

The Lagrange convex programming problem can now be expressed as

vL = inf{y0 : (y0 , y, z) ∈ C1 ∩ C2 },

which is of the form of the Fenchel problem with f1 (y0 , y, z) = y0 and


f2 (y0 , y, z) = 0. The corresponding Fenchel dual problem is

dF = sup{((f2 )∗ (ξ) − f1∗ (ξ)) : ξ = (λ0 , λ, µ) ∈ R1+m+l }


= sup { inf {λ0 y0 + hλ, yi + hµ, zi}
(λ0 ,λ,µ)∈R1+m+l (y0 ,y,z)∈C2

− sup {λ0 y0 + hλ, yi + hµ, zi − y0 }


(y0 ,y,z)∈C1

= sup { inf {y0 − λ0 y0 − hλ, yi − hµ, zi}


(λ0 ,λ,µ)∈R1+m+l (y0 ,y,z)∈C1

+ inf {λ0 y0 + hλ, yi + hµ, zi}}. (4.30)


(y0 ,y,z)∈C2

By the assumption of Theorem 4.7, the modified Slater constraint qualifica-


tion holds, which implies that there exists x̂ ∈ ri X such that gi (x̂) < 0,
i = 1, 2, . . . , m, and hj (x̂) = 0, j = 1, 2, . . . , l. As dom gi = X, i = 1, 2, . . . , m,
by Theorem 2.69, gi , i = 1, 2, . . . , m, is continuous on ri X. Therefore, there
exists ŷi < 0 such that gi (x̂) < ŷi < 0, i = 1, 2, . . . , m. Also, as dom f = X
with x̂ ∈ ri X, one may choose ŷ0 ∈ R such that f (x̂) < ŷ0 . Thus, for x̂ ∈ ri X,

f (x̂) < ŷ0 , gi (x̂) < ŷi , i = 1, 2, . . . , m, and hj (x̂) = ẑj , j = 1, 2, . . . , l,

where ẑj = 0, j = 1, 2, . . . , l. By Lemma 4.16, (ŷ0 , ŷ, ẑ) ∈ ri C1 ∩ ri C2 , which


implies ri C1 ∩ ri C2 6= ∅. Thus, by Theorem 4.14,

vL = d F . (4.31)

From the definition of C2 , the second infimum in (4.30) reduces to


m
X
inf{λ0 y0 + λi yi : y0 ∈ R, yi ≤ 0, i = 1, 2, . . . , m}.
i=1

© 2012 by Taylor & Francis Group, LLC


4.6 Equivalence between Lagrangian and Fenchel Duality 205

The infimum is −∞ if either λ0 6= 0 or λi > 0 for some i = 1, 2, . . . , m and


takes the value 0 otherwise. Therefore, the Fenchel dual problem becomes

dF = sup inf {y0 − hλ, yi − hµ, zi}


(λ,µ)∈Rm l (y0 ,y,z)∈C1
− ×R

m
X l
X
= sup inf {y0 + λi yi + µj zj }
(λ,µ)∈Rm l (y0 ,y,z)∈C1
+ ×R i=1 j=1
m
X l
X
= sup inf {f (x) + λi gi (x) + µj hj (x)},
(λ,µ)∈Rm l x∈X
+ ×R i=1 j=1

which yields that dF = dL . This along with (4.31) implies that vL = dL ,


thereby establishing the Lagrangian strong duality. 

© 2012 by Taylor & Francis Group, LLC


Chapter 5
Enhanced Fritz John Optimality
Conditions

5.1 Introduction
Until now we have studied how to derive the necessary KKT optimality con-
ditions for convex programming problems (CP ) or its slight variations such
as (CP 1), (CCP ) or (CCP 1) via normal cone or saddle point approach. Ob-
serve that in the KKT optimality conditions, the multiplier associated with
the subdifferential of the objective function is nonzero and thus normalized
to one. As discussed in Chapters 3 and 4, some additional conditions known
as the constraint qualifications are to be satisfied by the constraints to ensure
that the multiplier is nonzero and hence the KKT optimality conditions hold.
But in absence a of constraint qualification, one may not be able to derive
KKT optimality conditions. For example, consider the problem

min x subject to x2 ≤ 0.

In this example, f (x) = x and g(x) = x2 with C = {0} at which none of


the constraint qualifications is satisfied. Observe that the KKT optimality
conditions is also not satisfied at the point of minimizer x̄ = 0, the only
feasible point, as there do not exist λ0 = 1 and λ ≥ 0 such that

λ0 ∇f (x̄) + λ∇g(x̄) = 0 and λg(x̄) = 0.

In this chapter, we will consider the convex programming problem


min f (x) subject to x ∈ C (CP 1)
where (CP 1) which involves not only inequality constraints but also additional
abstract constraints, that is,

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X}.

Here f, gi : Rn → R, i = 1, 2, . . . , m, are convex functions on Rn and X ⊂ Rn


is a closed convex set. Below we present the standard Fritz John optimality
conditions for (CP 1).

207

© 2012 by Taylor & Francis Group, LLC


208 Enhanced Fritz John Optimality Conditions

Theorem 5.1 Consider the convex programming problem (CP 1) and let x̄ be
a point of minimizer of (CP 1). Then there exist λi ≥ 0 for i = 0, 1, . . . , m,
not all simultaneously zero, such that
m
X
0 ∈ λ0 ∂f (x̄) + λi ∂gi (x̄) + NX (x̄) and λi gi (x̄) = 0, ∀ i = 1, 2, . . . , m.
i=1

Proof. As x̄ is a point of minimizer of (CP 1), it is a point of minimizer of


the problem

min F (x) subject to x ∈ X,

where F (x) = max{f (x) − f (x̄), g1 (x), g2 (x), . . . , gm (x)} is a convex function.
Therefore by the optimality condition (ii) of Theorem 3.1,

0 ∈ ∂F (x̄) + NX (x̄).

Applying the Max-Function Rule, XTheorem 2.96, there exist λ0 ≥ 0 and


λi ≥ 0, i ∈ I(x̄), satisfying λ0 + λi = 1 such that
i∈I(x̄)

X
0 ∈ λ0 ∂f (x̄) + λi ∂gi (x̄) + NX (x̄),
i∈I(x̄)

where I(x̄) = {i ∈ {1, 2, . . . , m} : gi (x̄) = 0} is the active index set at x̄. For
i∈
/ I(x̄), defining λi = 0 yields
m
X
0 ∈ λ0 ∂f (x̄) + λi ∂gi (x̄) + NX (x̄)
i=1

along with λi gi (x̄) = 0, i = 1, 2, . . . , m, hence completing the proof. 


Note that in the example considered earlier, the Fritz John optimality
condition holds if one takes λ0 = 0 and λ > 0. Observe that the Fritz John
optimality conditions are only necessary and not sufficient. To study the suf-
ficiency optimality conditions, one needs KKT optimality conditions.

5.2 Enhanced Fritz John Conditions Using the


Subdifferential
Recently, Bertsekas [11, 12] studied the Fritz John optimality conditions,
which are more enhanced than those stated above and hence called them en-
hanced Fritz John optimality conditions. The proof of the enhanced Fritz John

© 2012 by Taylor & Francis Group, LLC


5.2 Enhanced Fritz John Conditions Using the Subdifferential 209

optimality condition involves the combination of the quadratic penalty func-


tion and metric approximation approaches. The penalty function approach
is an important theoretical as well as algorithmic method in the study of
constrained programming problems. Corresponding to the given problem, a
sequence of the unconstrained penalized problem is formulated and in the lim-
iting scenario, the sequence of point of minimizers of the penalized problem
converges to the point of minimizer of the original constrained problem. The
approach of metric approximations was introduced by Mordukhovich [84, 85].
This approach involves approximating the objective function and the con-
straint functions by smooth functions and reducing the constrained into an
unconstrained problem. The work of Bertsekas [11, 12] was based mostly on
the work of Hestenes [55], which was in turn motivated by the penalty function
approach of McShane [83] to establish the Fritz John optimality conditions.
It was the work of Hestenes [55] in which the complementary slackness was
strengthened to obtain a somewhat weaker condition than the complemen-
tary violation condition, which we will discuss in the subsequent derivation of
enhanced Fritz John optimality condition. In their works, McShane [83] and
Hestenes [55] considered X = Rn while Bertsekas extended the study when
X 6= Rn . Below we discuss the above approach to establish the enhanced Fritz
John optimality conditions for the convex programming problem (CP 1).

Theorem 5.2 Consider the convex programming problem (CP 1) and let x̄ be
the point of minimizer of (CP 1). Then there exist λi ≥ 0 for i = 0, 1, . . . , m,
not all simultaneously zero, such that
m
X
(i) 0 ∈ λ0 ∂f (x̄) + λi ∂gi (x̄) + NX (x̄).
i=1

(ii) Consider the index set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0}. If I¯ 6= ∅, then


there exists a sequence {xk } ⊂ X that converges to x̄ and is such that
for all k sufficiently large,

f (xk ) < f (x̄) and ¯


λi gi (xk ) > 0, ∀ i ∈ I.

Proof. For k = 1, 2, . . ., consider the penalized problem


min Fk (x) subject to x ∈ X ∩ cl Bε (x̄), (Pk )
where ε > 0 is such that f (x̄) ≤ f (x) for every x ∈ cl Bε (x̄) feasible to (CP 1).
The function Fk : Rn → R is defined as
m
kX + 1
Fk (x) = f (x) + (g (x))2 + kx − x̄k2 ,
2 i=1 2

where g + (x) = max{0, g(x)}. By the convexity of the functions f and


gi , i = 1, 2, . . . , m, Fk is a real-valued convex on Rn . As dom Fk = Rn ,
by Theorem 2.69, Fk is continuous on Rn . Also, as X is a closed convex set

© 2012 by Taylor & Francis Group, LLC


210 Enhanced Fritz John Optimality Conditions

and cl Bε (x̄) is a compact convex set, X ∩ cl Bε (x̄) is a compact convex subset


of Rn . By the Weierstrass Theorem, Theorem 1.14, there exists a point of
minimizer xk for the problem (Pk ). Therefore,

Fk (xk ) ≤ Fk (x̄), ∀ k ∈ N,

which implies
m
kX + 1
f (xk ) + (g (xk ))2 + kxk − x̄k2 ≤ f (x̄), ∀ k ∈ N. (5.1)
2 i=1 i 2

Because dom f = Rn , again by Theorem 2.69, f is continuous on Rn . Hence,


it is continuous on X ∩ cl Bε (x̄) and thus bounded over X ∩ cl Bε (x̄). By the
boundedness of f (xk ) over X ∩ cl Bε (x̄) and the relation (5.1), we have

lim gi+ (xk ) = 0, i = 1, 2, . . . , m. (5.2)


k→∞

Otherwise as k → +∞, the left-hand side of (5.1) also tends to infinity, which
is a contradiction.
As {xk } is a bounded sequence, by the Bolzano–Weierstrass Theorem,
Proposition 1.3, it has a convergent subsequence. Without loss of generality,
assume that {xk } converge to x̃ ∈ X ∩ cl Bε (x̄). By the condition (5.2),

gi (x̃) ≤ 0, i = 1, 2, . . . , m,

and hence x̃ is feasible for (CP 1). Taking the limit as k → +∞ in the condition
(5.1) yields
1
f (x̃) + kx̃ − x̄k2 ≤ f (x̄).
2
As x̄ is the point of minimizer of (CP 1) and x̃ is feasible to (CP 1),
f (x̄) ≤ f (x̃). Thus the above inequality reduces to

kx̃ − x̄k2 ≤ 0,

which implies kx̃ − x̄k = 0, that is, x̃ = x̄. Hence, the sequence xk → x̄ and
thus there exists a k̄ ∈ N such that xk ∈ ri X ∩ Bε (x̄) for every k ≥ k̄.
For k ≥ k̄, xk is a point of minimizer of the penalized problem (Pk ), which
by Theorem 3.1 implies that

0 ∈ ∂Fk (xk ) + NX∩Bε (x̄) (xk ).

As xk ∈ ri X ∩ Bε (x̄), by Proposition 2.39,

NX∩Bε (x̄) (xk ) = NX (xk ) + NBε (x̄) (xk ).

Again, because xk ∈ Bε (x̄), by Example 2.38, NBε (x̄) (xk ) = {0} and thus

0 ∈ ∂Fk (xk ) + NX (xk ), ∀ k ≥ k̄.

© 2012 by Taylor & Francis Group, LLC


5.2 Enhanced Fritz John Conditions Using the Subdifferential 211

As dom f = dom gi = Rn , i = 1, 2, . . . , m, by Theorem 2.69, f and gi ,


i = 1, 2, . . . , m are continuous on Rn . Applying the Sum Rule and the Chain
Rule for the subdifferentials, Theorems 2.91 and 2.94, respectively, the above
condition becomes
m
X
0 ∈ ∂f (xk ) + k gi+ (xk )∂gi+ (xk ) + (xk − x̄) + NX (xk ), ∀ k ≥ k̄,
i=1

which implies that for every k ≥ k̄, there exist ξ0k ∈ ∂f (xk ) and ξik ∈ ∂gi (xk ),
i = 1, 2, . . . , m, such that
m
X
−{ξ0k + αik ξik + (xk − x̄)} ∈ NX (xk ), (5.3)
i=1

where αik = kβk gi+ (xk ) and βk ∈ [0, 1] for i = 1, 2, . . . , m. Denote


v
u m
u X 1 αk
k
γ = 1+ t (αik )2 , λk0 = k and λki = ki , i = 1, 2, . . . , m. (5.4)
i=1
γ γ

Observe that
m
X
(λk0 )2 + (λki )2 = 1, ∀ k ≥ k̄. (5.5)
i=1

Therefore, the sequences {λk0 } and {λki }, i = 1, 2, . . . , m, are bounded se-


quences in R+ and thus by the Bolzano–Weierstrass Theorem, Proposition 1.3
have a convergent subsequence. Without loss of generality, let λki → λi ,
i = 0, 1, . . . , m. As αik ≥ 0, i = 1, 2, . . . , m and γ k ≥ 1 for every k ≥ k̄,
λki ≥ 0 and thereby implying that λi ≥ 0, i = 0, 1, . . . , m. Also by condi-
tion (5.5), it is obvious that λ0 , λ1 , . . . , λm are not simultaneously zero. Now
dividing (5.3) by γ k leads to
m
X 1
−{λk0 ξ0k + λki ξik + (xk − x̄)} ∈ NX (xk ). (5.6)
i=1
γk

As f and gi , i = 1, 2, . . . , m are continuous at xk ∈ Rn , therefore by


Proposition 2.82, ∂f (xk ) and ∂gi (xk ), i = 1, 2, . . . , m, are compact. Thus
{ξik }, i = 0, 1, . . . , m, are bounded sequences in Rn and hence by the Bolzano–
Weierstrass Theorem have a convergent subsequence. Without loss of general-
ity, let ξik → ξi , i = 0, 1, . . . , m. By the Closed Graph Theorem, Theorem 2.84,
of the subdifferentials, ξ0 ∈ ∂f (x̄) and ξi ∈ ∂gi (x̄) for i = 1, 2, . . . , m.
Taking the limit as k → +∞ in (5.6) along with the fact that the normal
cone NX has a closed graph yields
m
X
−{λ0 ξ0 + λi ξi } ∈ NX (x̄),
i=1

© 2012 by Taylor & Francis Group, LLC


212 Enhanced Fritz John Optimality Conditions

which implies
m
X
0 ∈ λ0 ∂f (x̄) + λi ∂gi (x̄) + NX (x̄),
i=1

thereby establishing condition (i).


Now suppose that the index set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is non-
empty. For i ∈ I,¯ corresponding to λi > 0, there exists a sequence λk → λi .
i
¯ By
Therefore, for all k sufficiently large, λki > 0 and hence λi λki > 0 for i ∈ I.
the condition (5.4), λi gi+ (xk ) > 0 for sufficiently large k, which implies
¯
λi gi (xk ) > 0, ∀ i ∈ I.

Also, by condition (5.1), f (xk ) < f (x̄) for sufficiently large k and hence con-
dition (ii) is satisfied, thereby yielding the requisite result. 
Observe that the condition (ii) in the above theorem is a condition that
replaces the complementary slackness condition in the Fritz John optimal-
ity condition. According to the condition (ii), if λi > 0, the corresponding
constraint gi is violated at the points arbitrarily close to x̄. Thus the con-
dition (ii) is called the complementary violation condition by Bertsekas and
Ozdaglar [13].
Now let us consider, in particular, X = Rn and gi , i = 1, 2, . . . , m,
to be affine in (CP 1). Then from the above theorem there exist λi ≥ 0,
i = 0, 1, . . . , m, not all simultaneously zero, such that conditions (i) and (ii)
hold. Due to affineness of gi , i = 1, 2, . . . , m, we have

gi (x) = gi (x̄) + h∇gi (x̄), x − x̄i, ∀ x ∈ Rn . (5.7)

Suppose that λ0 = 0. Then by condition (i) of Theorem 5.2,


m
X
0= λi ∇gi (x̄),
i=1

which implies that


m
X
0= λi h∇gi (x̄), x − x̄i.
i=1

As all the scalars cannot be all simultaneously zero, the index set
I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. By condition (ii), there exists
a sequence {xk } ⊂ Rn such that gi (xk ) > 0 for i ∈ I. ¯ Therefore, by (5.7),
which along with the above condition for x = xk , leads to
m
X m
X X
λi gi (x̄) = λi gi (xk ) = λi gi (xk ) > 0,
i=1 i=1 i∈I¯

© 2012 by Taylor & Francis Group, LLC


5.2 Enhanced Fritz John Conditions Using the Subdifferential 213
¯ thereby contradicting the feasi-
which implies that gi (x̄) > 0 for some i ∈ I,
bility of x̄. Thus λ0 > 0 and hence can be normalized to one, thereby leading
to the KKT optimality condition.
Observe that in the case as discussed above, the KKT optimality condition
holds without any assumption of constraint qualification. But if the convex
programming problem is not of the above type, to ensure that λ0 6= 0, one has
to impose some form of constraint qualification. In view of the enhanced Fritz
John optimality conditions, Bertsekas [12] introduced the notion of pseudo-
normality, which is defined as follows.

Definition 5.3 A feasible point x̄ of (CP 1) is said to be pseudonormal if


there does not exist any λi , i = 1, 2, . . . , m, and sequence {xk } ⊂ X such that
m
X
(i) 0 ∈ λi ∂gi (x̄) + NX (x̄)
i=1

(ii) λi ≥ 0, i = 1, 2, . . . , m and λi = 0 for i 6∈ I(x̄). Recall that


I(x̄) = {i ∈ {1, 2, . . . , m} : gi (x̄) = 0} denotes the active index set at x̄.
(iii) {xk } converges to x̄ and
m
X
λi gi (xk ) > 0, ∀ k ∈ N.
i=1

Below we present a result to show how the affineness of gi , i = 1, 2, . . . , m,


or the Slater-type constraint qualification ensure the pseudonormality at a
feasible point.

Theorem 5.4 Consider the problem (CP 1) and let x̄ be a feasible point of
(CP 1). Then x̄ is pseudonormal under either one of the following two criteria:

(a) Linearity criterion: X = Rn and the functions gi , i = 1, 2, . . . , m, are


affine.
(b) Slater-type constraint qualification: there exists a feasible point x̂ ∈ X
of (CP 1) such that gi (x̂) < 0, i = 1, 2, . . . , m.

Proof. (a) Suppose on the contrary that x̄ is not pseudonormal, which implies
that there exist λi , i = 1, 2, . . . , m, and {xk } ⊂ Rn satisfying conditions (i),
(ii), and (iii) in the Definition 5.3. By the affineness of gi , i = 1, 2, . . . , m, for
every x ∈ Rn ,

gi (x) = gi (x̄) + h∇gi (x̄), x − x̄i,

which implies
m
X m
X m
X
λi gi (x) = λi gi (x̄) + λi h∇gi (x̄), x − x̄i, ∀ x ∈ Rn . (5.8)
i=1 i=1 i=1

© 2012 by Taylor & Francis Group, LLC


214 Enhanced Fritz John Optimality Conditions

By the conditions (i) and (ii) in the definition of pseudonormality,


m
X
0= λi ∇gi (x̄) and λi gi (x̄) = 0, i = 1, 2, . . . , m,
i=1

thereby reducing the condition (5.8) to


m
X
λi gi (x) = 0, ∀ x ∈ Rn .
i=1

This is a contradiction of condition (iii) of Definition 5.3 at x̄. Hence, x̄ is


pseudonormal.
(b) On the contrary, suppose that x̄ is not pseudonormal. By the convexity of
gi , i = 1, 2, . . . , m, for every x ∈ Rn ,

gi (x) − gi (x̄) ≥ hξi , x − x̄i, ∀ ξi ∈ ∂gi (x̄), i = 1, 2, . . . , m. (5.9)

By condition (i) in the definition of pseudonormality, there exist ξ¯i ∈ ∂gi (x̄),
i = 1, 2, . . . , m, such that
m
X
λi hξ¯i , x − x̄i ≥ 0, ∀ x ∈ X.
i=1

The above inequality along with condition (ii) reduces the condition (5.9) to
m
X
λi gi (x) ≥ 0, ∀ x ∈ X. (5.10)
i=1

As the Slater constraint qualification is satisfied at x̂ ∈ X,


m
X
λi gi (x̂) < 0
i=1

if λi > 0 for some i ∈ I(x̄). Thus, the condition (5.10) holds only if λi = 0
for i = 1, 2, . . . , m. But then this contradicts condition (iii). Therefore, x̄ is
pseudonormal. 
In Chapter 3 we derived the KKT optimality conditions under the Slater
constraint qualification as well as the Abadie constraint qualification. For
the convex programming problem (CP ) considered in previous chapters, the
feasible set C was given by (3.1), that is,

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}.

Recall that the Abadie constraint qualification is said to hold at x̄ ∈ C if

TC (x̄) = {d ∈ Rn : gi′ (x̄, d) ≤ 0, ∀ i ∈ I(x̄)},

© 2012 by Taylor & Francis Group, LLC


5.2 Enhanced Fritz John Conditions Using the Subdifferential 215

where I(x̄) is the active index set at x̄. But unlike the Slater constraint quali-
fication, the Abadie constraint qualification need not imply pseudonormality.
For better understanding, let us recall the example

C = {x ∈ R : |x| ≤ 0, x ≤ 0}.

From the discussion in Chapter 3, we know that the Abadie constraint qual-
ification is satisfied at x̄ = 0 but the Slater constraint qualification does not
hold as the feasible set C = {0}. Observe that both constraints are active at
x̄. Taking the scalars λ1 = λ2 = 1 and the sequence {xk } as {1/k}, conditions
(i), (ii), and (iii) in Definition 5.3 are satisfied. Thus, x̄ = 0 is not pseudo-
normal. The Abadie constraint qualification is also known as quasiregularity
at x̄. This condition was defined for X = Rn . The notion of quasiregularity
is implied by the concept of quasinormality. This concept was introduced by
Hestenes [55] for the case X = Rn . The notion of quasinormality is further
implied by pseudonormality.
Now if X 6= Rn , the quasiregularity at x̄ is defined as

TC (x̄) = {d ∈ Rn : gi′ (x̄, d) ≤ 0, ∀ i ∈ I(x̄)} ∩ TX (x̄).

The above condition was studied by Gould and Tolle [53] and Guignard [54].
It was shown by Ozdaglar [91] and Ozdaglar and Bertsekas [92] that under
the regularity (Chapter 2 end notes) of the set X, pseudonormality implies
quasiregularity. They also showed that unlike the case X = Rn where quasi-
regularity leads to KKT optimality conditions, the concept is not enough to
derive KKT conditions when X 6= Rn unless some additional conditions are
assumed. For more on quasiregularity and quasinormality, readers are advised
to refer to the works of Bertsekas.
Next we establish the KKT optimality conditions under the pseudonor-
mality assumptions at the point of minimizer.

Theorem 5.5 Consider the convex programming problem (CP 1). Assume
that x̄ satisfies pseudonormality. Then x̄ is a point of minimizer of (CP 1)
if and only if there exist λi ≥ 0, i = 1, . . . , m, such that
m
X
0 ∈ ∂f (x̄) + λi ∂gi (x̄) + NX (x̄) and λi gi (x̄) = 0, i = 1, 2, . . . , m.
i=1

Proof. Observe that the complementary slackness condition is equivalent to


condition (ii) in the definition of pseudonormality. Therefore, λi = 0 for every
i∈/ I(x̄). Suppose that the multiplier λ0 associated with the subdifferential
of the objective function in the enhanced Fritz John optimality condition is
zero. Therefore,
m
X
0∈ λi ∂gi (x̄) + NX (x̄),
i=1

© 2012 by Taylor & Francis Group, LLC


216 Enhanced Fritz John Optimality Conditions

that is, condition (i) of Definition 5.3 holds. As all λi ≥ 0, i = 0, 1, . . . , m,


are not simultaneously zero, λi > 0 for some i ∈ I(x̄) and thus
I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. Therefore, by condition (ii) of
the enhanced Fritz John condition, there exists a sequence {xk } ⊂ X con-
verging to x̄ such that
¯
λi gi (xk ) > 0, ∀ i ∈ I,
which implies
m
X
λi gi (xk ) > 0.
i=1

Thus condition (iii) in the definition of pseudonormality is satisfied, thereby


implying that x̄ is not pseudonormal. This contradicts the given hypothesis.
Therefore, λ0 6= 0, thereby satisfying the KKT optimality conditions. The
sufficiency can be worked out using the convexity of the objective function
and the constraint functions along with the convexity of the set X as done in
Chapter 3. 

5.3 Enhanced Fritz John Conditions under Restrictions


Observe that in the problem (CP 1), the functions f and gi , i = 1, 2, . . . , m,
are convex on Rn . But if these functions are convex only over the closed
convex set X, the line of proof of the above theorem breaks down. Bertsekas,
Ozdaglar, and Tseng [14] gave an alternative version of the enhanced Fritz
John optimality conditions, which is independent of the subdifferentials. The
proof given by them, which we present below relies on the saddle point theory.
Theorem 5.6 Consider the convex programming problem (CP 1) where the
functions f and gi , i = 1, 2, . . . , m, are lsc and convex on the closed convex
set X ⊂ Rn and let x̄ be a point of minimizer of (CP 1). Then there exist
λi ≥ 0 for i = 0, 1, . . . , m, not all simultaneously zero, such that
m
X
(i) λ0 f (x̄) = min{λ0 f (x) + λi gi (x)}.
x∈X
i=1

(ii) Consider the index set I¯ = {i ∈ {1, 2, . . . , m} : λi > 0}. If I¯ 6= ∅, then


there exists a sequence {xk } ⊂ X that converges to x̄ and is such that
lim f (xk ) = f (x̄) and lim sup gi (xk ) ≤ 0, i = 1, 2, . . . , m,
k→∞ k→∞

and for all k sufficiently large


f (xk ) < f (x̄) and ¯
gi (xk ) > 0, ∀ i ∈ I.

© 2012 by Taylor & Francis Group, LLC


5.3 Enhanced Fritz John Conditions under Restrictions 217

Proof. For the positive integers k and r, consider the saddle point function
Lk,r : X × Rm
+ → R defined as

Xm
1 2 1
Lk,r (x, α) = f (x) + 3
kx − x̄k + αi gi (x) − kαk2 .
k i=1
2r

For fixed αi ≥ 0, i = 1, 2, . . . , m, by the lower semicontinuity and convexity


of the functions f and gi , i = 1, 2, . . . , m, over X, Lk,r (., α) is an lsc convex
function while for a fixed x ∈ X, Lk,r (x, .) is strongly concave and quadratic
in α. For every k, define the set

Xk = X ∩ B̄k (x̄).

Observe that Xk is a compact set. As f and gi , i = 1, 2 . . . , m, are lsc convex on


X, the functions are lsc, convex on Xk . Also, as Lk,r (x, .) is strongly concave,
it has a unique maximizer over Rm + and thus for some β ∈ R, the level set

{α ∈ Rm
+ : Lk,r (x̄, α) ≥ β}

is nonempty and compact. Thus by condition (iii) of the Saddle Point Theo-
rem, Proposition 4.1, Lk,r has a saddle point over Xk × Rm
+ , say (xk,r , αk,r ).
By the saddle point definition,

Lk,r (xk,r , α) ≤ Lk,r (xk,r , αk,r ) ≤ Lk,r (x, αk,r ), ∀ x ∈ Xk , ∀ α ∈ Rm


+ . (5.11)

As Lk,r (., αk,r ) attains an infimum over Xk at xk,r ,

Xm
1 2 1
Lk,r (xk,r , αk,r ) = f (xk,r ) + 3 kxk,r − x̄k + αk,r i gi (xk,r ) − kαk,r k2
k i=1
2r
Xm
1 2
≤ inf {f (x) + kx − x̄k + αk,r i gi (x)}
x∈Xk k3 i=1
Xm
1 2
≤ inf {f (x) + kx − x̄k + αk,r i gi (x)}
x∈Xk ,gi (x)≤0,∀i k3 i=1
1
≤ inf {f (x) + kx − x̄k2 }.
x∈Xk ,gi (x)≤0,∀i k3

As x̄ ∈ Xk and satisfies gi (x̄) ≤ 0, i = 1, 2, . . . , m, the above inequalities yield

Lk,r (xk,r , αk,r ) ≤ f (x̄). (5.12)

Again from (5.11), Lk,r (xk,r , .) attains a supremum over α ∈ Rm + at αk,r . As a


function of α ∈ Rm
+ , Lk,r (xk,r , .) is strongly concave and quadratic, and thus,
has a unique supremum at

αk,r i = rgi+ (xk,r ), i = 1, 2, . . . , m. (5.13)

© 2012 by Taylor & Francis Group, LLC


218 Enhanced Fritz John Optimality Conditions

We leave it to the readers to figure out how to compute αk,r i . Therefore,

1 r
Lk,r (xk,r , αk,r ) = f (xk,r ) + kxk,r − x̄k2 + kg + (xk,r )k2 , (5.14)
k3 2
which implies that
Lk,r (xk,r , αk,r ) ≥ f (xk,r ). (5.15)

From the conditions (5.12) and (5.15), we have

xk,r ∈ {x ∈ Xk : f (x) ≤ f (x̄)}.

As Xk is compact, the set {x ∈ Xk : f (x) ≤ f (xk )} is bounded and thus {xk,r }


forms a bounded sequence. In fact, we leave it to the readers to show that f
is also coercive on Xk . Thus, by the Bolzano–Weierstrass Theorem, Propo-
sition 1.3, for a fixed k the sequence {xk,r } has a convergent subsequence.
Without loss of generality, let {xk,r } converge to xk ∈ {x ∈ Xk : f (x) ≤ f (x̄)}.
As f is convex and coercive on Xk , by the Weierstrass Theorem, The-
orem 1.14, an infimum over Xk exists. Therefore for each k, the sequence
{f (xk,r )} is bounded below by inf x∈Xk f (x). Also, by condition (5.12),
Lk,r (xk,r , αk,r ) is bounded above by f (x̄). Thus, from (5.14),

    lim sup_{r→∞} gi(xk,r) ≤ 0, i = 1, 2, . . . , m,

which along with the lower semicontinuity of gi, i = 1, 2, . . . , m, implies that gi(xk) ≤ 0 for i = 1, 2, . . . , m, thereby yielding the feasibility of xk for (CP 1). We urge the reader to work out the details. As x̄ is the minimizer of (CP 1), f(xk) ≥ f(x̄), which along with the conditions (5.12), (5.15), and the lower semicontinuity of f leads to

    f(x̄) ≤ f(xk) ≤ lim inf_{r→∞} f(xk,r) ≤ lim sup_{r→∞} f(xk,r) ≤ f(x̄),

which implies that for each k,

    lim_{r→∞} f(xk,r) = f(x̄).

By the conditions (5.12) and (5.14), we have for every k ∈ N,

    lim_{r→∞} xk,r = x̄.

Further note that using the definition of Lk,r(xk,r, αk,r) along with (5.12) and (5.15), for every k,

    lim_{r→+∞} Σ_{i=1}^m αk,r i gi(xk,r) = 0.


Therefore, by the preceding conditions,

    lim_{r→∞} { f(xk,r) − f(x̄) + Σ_{i=1}^m αk,r i gi(xk,r) } = 0.    (5.16)

Denote

    γk,r = √( 1 + Σ_{i=1}^m (αk,r i)² ),   λ0^{k,r} = 1/γk,r   and   λi^{k,r} = αk,r i / γk,r,  i = 1, 2, . . . , m.    (5.17)

Dividing (5.16) by γk,r > 0 leads to

    lim_{r→∞} { λ0^{k,r} f(xk,r) − λ0^{k,r} f(x̄) + Σ_{i=1}^m λi^{k,r} gi(xk,r) } = 0.

For each k, we fix an integer rk such that

    | λ0^{k,rk} f(xk,rk) − λ0^{k,rk} f(x̄) + Σ_{i=1}^m λi^{k,rk} gi(xk,rk) | ≤ 1/k    (5.18)

and

    ‖xk,rk − x̄‖ ≤ 1/k,   |f(xk,rk) − f(x̄)| ≤ 1/k,   |gi⁺(xk,rk)| ≤ 1/k,  i = 1, 2, . . . , m.    (5.19)

Dividing the saddle point condition

    Lk,rk(xk,rk, αk,rk) ≤ Lk,rk(x, αk,rk), ∀ x ∈ Xk

by γk,rk yields

    λ0^{k,rk} f(xk,rk) + (λ0^{k,rk}/k³) ‖xk,rk − x̄‖² + Σ_{i=1}^m λi^{k,rk} gi(xk,rk)
        ≤ λ0^{k,rk} f(x) + (1/(k³ γk,rk)) ‖x − x̄‖² + Σ_{i=1}^m λi^{k,rk} gi(x), ∀ x ∈ Xk.

As αk,rk i ≥ 0, i = 1, 2, . . . , m, from the condition (5.17), γk,rk ≥ 1 and λi^{k,rk} ≥ 0, i = 0, 1, . . . , m, along with

    (λ0^{k,rk})² + Σ_{i=1}^m (λi^{k,rk})² = 1.

Therefore, {λi^{k,rk}}, i = 0, 1, . . . , m, are bounded sequences in R+ and thus by the Bolzano–Weierstrass Theorem, Proposition 1.3, have a convergent subsequence. Without loss of generality, assume that λi^{k,rk} → λi ≥ 0, i = 0, 1, . . . , m, not all simultaneously zero. Taking the limit as k → +∞ in the above inequality along with the condition (5.18) leads to

    λ0 f(x̄) ≤ λ0 f(x) + Σ_{i=1}^m λi gi(x), ∀ x ∈ X,

which implies

    λ0 f(x̄) ≤ inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }
             ≤ inf_{x∈X, gi(x)≤0 ∀i} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }
             ≤ inf_{x∈X, gi(x)≤0 ∀i} λ0 f(x)
             = λ0 f(x̄).

Therefore, λi ≥ 0, i = 0, 1, . . . , m, not all simultaneously zero, satisfy condition (i), that is,

    λ0 f(x̄) = inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }.

Next suppose that the index set Ī = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. Corresponding to λi > 0 for i ∈ Ī, there is a sequence λi^{k,rk} → λi, so λi^{k,rk} > 0 for all sufficiently large k, which along with the condition (5.13) implies

    gi(xk,rk) > 0, ∀ i ∈ Ī

for sufficiently large k. For each k, choosing rk such that xk,rk ≠ x̄ and the condition (5.19) is satisfied implies that

    xk,rk → x̄,   f(xk,rk) → f(x̄),   gi⁺(xk,rk) → 0,  i = 1, 2, . . . , m.

Also, by the conditions (5.12) and (5.14) together with xk,rk ≠ x̄,

    f(xk,rk) < f(x̄),

thereby proving (ii) and hence establishing the requisite result. 


Similar to the pseudonormality notion defined earlier, we now state the corresponding notion for the enhanced Fritz John conditions obtained in Theorem 5.6.

Definition 5.7 The constraint set of (CP 1) is said to be pseudonormal if there do not exist any scalars λi ≥ 0, i = 1, 2, . . . , m, and a vector x′ ∈ X such that

    (i) 0 = inf_{x∈X} Σ_{i=1}^m λi gi(x),

    (ii) Σ_{i=1}^m λi gi(x′) > 0.

For a better understanding of the above definition of pseudonormality, we recall the idea of proper separation from Definition 2.25. A hyperplane H is said to separate two convex sets F1 and F2 properly if

    sup_{x1∈F1} ⟨a, x1⟩ ≤ inf_{x2∈F2} ⟨a, x2⟩   and   inf_{x1∈F1} ⟨a, x1⟩ < sup_{x2∈F2} ⟨a, x2⟩.

Now consider the set G = {g(x) = (g1(x), g2(x), . . . , gm(x)) : x ∈ X}. Then from Definition 5.7 it is easy to observe that pseudonormality implies that there exists no hyperplane H that separates the set G and the origin {0} properly.
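To make this observation explicit (a short verification using only Definition 5.7 and the separation inequalities above, with F1 = {0}, F2 = G, and a = λ = (λ1, . . . , λm)): if scalars λi ≥ 0 and a vector x′ ∈ X violate pseudonormality, then condition (i) gives

    ⟨λ, g(x)⟩ = Σ_{i=1}^m λi gi(x) ≥ 0 = ⟨λ, 0⟩ for every x ∈ X,  that is,  sup_{x1∈F1} ⟨λ, x1⟩ ≤ inf_{x2∈F2} ⟨λ, x2⟩,

while condition (ii) gives

    inf_{x1∈F1} ⟨λ, x1⟩ = 0 < ⟨λ, g(x′)⟩ ≤ sup_{x2∈F2} ⟨λ, x2⟩.

Thus the hyperplane through the origin with the nonnegative normal λ separates G and {0} properly; pseudonormality rules out every such hyperplane.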
Similar to Theorem 5.4, the pseudonormality of the constraint set can be
derived under the Linearity criterion or the Slater constraint qualification.

Theorem 5.8 Consider the problem (CP 1). Then the constraint set is
pseudonormal under either one of the following two criteria:

(a) Linearity criterion: X = Rn and the functions gi , i = 1, 2, . . . , m, are


affine.

(b) Slater-type constraint qualification: there exists a feasible point x̂ ∈ X of (CP 1) such that gi(x̂) < 0, i = 1, 2, . . . , m.

Proof. (a) Suppose on the contrary that the constraint set is not pseudo-
normal, which implies that there exist λi ≥ 0, i = 1, 2, . . . , m, and a vector
x′ ∈ Rn satisfying conditions (i) and (ii) in the Definition 5.7. Suppose that
x̄ ∈ Rn is feasible to (CP 1), that is, gi (x̄) ≤ 0, i = 1, 2, . . . , m, which along
with condition (i) yields

    Σ_{i=1}^m λi gi(x̄) = 0.    (5.20)

By the affineness of gi, i = 1, 2, . . . , m,

    gi(x) = gi(x̄) + ⟨∇gi(x̄), x − x̄⟩, ∀ x ∈ Rn,

which again by condition (i) and (5.20) implies

    Σ_{i=1}^m λi ⟨∇gi(x̄), x − x̄⟩ ≥ 0, ∀ x ∈ Rn.

FIGURE 5.1: Pseudonormality (left panel: Linearity criterion, X = Rn; right panel: Slater criterion; each sketch shows the image set G, the origin 0, and a hyperplane H with normal λ).

By Definition 2.36 of the normal cone, Σ_{i=1}^m λi ∇gi(x̄) ∈ NRn(x̄). As x̄ ∈ Rn, by Example 2.38, the normal cone NRn(x̄) = {0}, which implies

    Σ_{i=1}^m λi ∇gi(x̄) = 0.

This equality along with the condition (5.20) and the affineness of gi, i = 1, 2, . . . , m, implies that

    Σ_{i=1}^m λi gi(x) = 0, ∀ x ∈ Rn,

thereby contradicting condition (ii) in the definition of pseudonormality. Hence


the constraint set is pseudonormal.
(b) Suppose on the contrary that the constraint set is not pseudonormal. As the Slater-type constraint qualification holds, there exists x̂ ∈ X such that gi(x̂) < 0, i = 1, 2, . . . , m. Then condition (i) can be satisfied only if λi = 0, i = 1, 2, . . . , m, which contradicts condition (ii) in Definition 5.7. Therefore, the constraint set is pseudonormal. 
In case the Slater-type constraint qualification is satisfied, the set G intersects the open orthant {x ∈ Rm : xi < 0, i = 1, 2, . . . , m} as shown in Figure 5.1. Then obviously condition (i) in the definition of pseudonormality does not hold for any nonzero λ ≥ 0; that is, there exists no hyperplane H passing through the origin and supporting G at the origin such that G lies in the nonnegative half-space determined by H. Now when one has the linearity criterion, that is, X = Rn and gi, i = 1, 2, . . . , m, are affine, the set G is also affine (see Figure 5.1); any hyperplane H through the origin satisfying condition (i) must then contain the set G completely, and thus condition (ii) is violated. In the linearity criterion, if X is a polyhedron instead of X = Rn along with gi, i = 1, 2, . . . , m, being affine,


FIGURE 5.2: Not pseudonormal (the sketch shows a hyperplane H through the origin 0 with normal λ).

pseudonormality need not hold as shown in Figure 5.2. These observations


were made by Bertsekas, Ozdaglar, and Tseng [14].
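To see that this can indeed happen (a small illustration consistent with Figure 5.2, not taken from the text): let n = m = 1, X = {x ∈ R : x ≥ 0}, which is a polyhedron, and g1(x) = x, which is affine. Then for λ1 = 1,

    inf_{x∈X} λ1 g1(x) = inf_{x≥0} x = 0,

so condition (i) of Definition 5.7 holds, while x′ = 1 ∈ X gives λ1 g1(x′) = 1 > 0, so condition (ii) holds as well. Hence the constraint set {x ∈ X : g1(x) ≤ 0} = {0} is not pseudonormal, even though g1 is affine, because X is only a polyhedron and not the whole space.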
We end this section by establishing the KKT optimality conditions similar
to Theorem 5.5, under the pseudonormality of the constraint set.
Theorem 5.9 Consider the convex programming problem (CP 1) where the
functions f and gi , i = 1, 2, . . . , m are lsc and convex on the closed convex set
X ⊂ Rn . Assume that the constraint set is pseudonormal. Then x̄ is a point
of minimizer of (CP 1) if and only if there exist λi ≥ 0, i = 1, 2, . . . , m, such
that

    f(x̄) = min_{x∈X} { f(x) + Σ_{i=1}^m λi gi(x) }   and   λi gi(x̄) = 0, i = 1, 2, . . . , m.

Proof. Suppose that in the enhanced Fritz John optimality condition, Theorem 5.6, λ0 = 0. This implies

    0 = min_{x∈X} Σ_{i=1}^m λi gi(x),

that is, λi ≥ 0, i = 1, 2, . . . , m, satisfy condition (i) in the definition of pseudonormality of the constraint set. As in the enhanced Fritz John condition λi, i = 0, 1, . . . , m, are not all simultaneously zero, there exists at least one i ∈ {1, 2, . . . , m} such that λi > 0, that is, Ī is nonempty. Again by Theorem 5.6, there exists a sequence {xk} ⊂ X such that

    gi(xk) > 0, ∀ i ∈ Ī,

which implies

    Σ_{i∈Ī} λi gi(xk) = Σ_{i=1}^m λi gi(xk) > 0,


that is, satisfying condition (ii) in Definition 5.7, thereby contradicting the fact that the constraint set is pseudonormal. Thus, λ0 ≠ 0 and hence can be taken in particular as one, thereby establishing the optimality condition.
Using the optimality condition along with the feasibility of x̄ leads to

    0 ≤ Σ_{i=1}^m λi gi(x̄) ≤ 0,

that is,

    Σ_{i=1}^m λi gi(x̄) = 0.

As a sum of nonpositive terms is zero only if every term is zero, the above equality leads to

    λi gi(x̄) = 0, i = 1, 2, . . . , m,

thereby establishing the complementary slackness condition.


Conversely, by the optimality condition,

    f(x̄) ≤ f(x) + Σ_{i=1}^m λi gi(x), ∀ x ∈ X.

In particular, for any x feasible to (CP 1), that is, x ∈ X satisfying gi(x) ≤ 0, i = 1, 2, . . . , m, the above inequality reduces to

    f(x̄) ≤ f(x),

thus proving that x̄ is a point of minimizer of (CP 1). 

5.4 Enhanced Fritz John Conditions in the Absence of Optimal Solution
Up to now in this chapter, one observes two forms of enhanced Fritz John op-
timality conditions, one when the functions are convex over the whole space
Rn while in the second scenario convexity of the functions is over the convex
set X 6= Rn . The results obtained in Section 5.3 are in a form similar to strong
duality. In all the results of enhanced Fritz John and KKT optimality condi-
tions, it is assumed that the point of minimizer exists. But what if the convex
programming problem (CP 1) has an infimum that is not attained? In such
a case is it possible to establish a Fritz John optimality condition that can
then be extended to KKT optimality conditions under the pseudonormality


condition? The answer is yes and we present a result from Bertsekas [12] and
Bertsekas, Ozdaglar, and Tseng [14] to establish the enhanced Fritz John op-
timality conditions similar to those derived in Section 5.3. But in the absence
of a point of minimizer of (CP 1), the multipliers are now dependent on the
infimum, as one will observe in the theorem below.

Theorem 5.10 Consider the convex programming problem (CP 1) where the
functions f and gi , i = 1, 2, . . . , m, are convex on the convex set X ⊂ Rn
and let finf < +∞ be the infimum of (CP 1). Then there exist λi ≥ 0 for
i = 0, 1, . . . , m, not all simultaneously zero, such that

    λ0 finf = inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }.

Proof. If the infimum finf = −∞, then by the condition

    inf_{x∈X} f(x) ≤ inf_{x∈X, gi(x)≤0 ∀i} f(x) = finf,

we have inf_{x∈X} f(x) = −∞. Thus for λ0 = 1 and λi = 0, i = 1, 2, . . . , m, the requisite condition is obtained.
Now suppose that finf is finite. To establish the Fritz John optimality condition we will invoke the supporting hyperplane theorem. For that purpose, define a set in Rm+1 as

    M = {(d0, d) ∈ R × Rm : there exists x ∈ X such that f(x) ≤ d0, gi(x) ≤ di, i = 1, 2, . . . , m}.

We claim that M is a convex set. For j = 1, 2, consider (d0^j, d^j) ∈ M, which implies that there exists xj ∈ X such that

    f(xj) ≤ d0^j   and   gi(xj) ≤ di^j, i = 1, 2, . . . , m.

As X is a convex set, for every µ ∈ [0, 1], y = µx1 + (1 − µ)x2 ∈ X. Also by the convexity of f and gi, i = 1, 2, . . . , m,

    f(y) ≤ µ f(x1) + (1 − µ) f(x2) ≤ µ d0^1 + (1 − µ) d0^2,
    gi(y) ≤ µ gi(x1) + (1 − µ) gi(x2) ≤ µ di^1 + (1 − µ) di^2, i = 1, 2, . . . , m,

which implies that µ(d0^1, d^1) + (1 − µ)(d0^2, d^2) ∈ M for every µ ∈ [0, 1]. Hence M is a convex subset of R × Rm.
Next we prove that (finf, 0) ∉ int M. On the contrary, suppose that (finf, 0) ∈ int M, which by Definition 2.12 implies that there exists ε > 0 such that (finf − ε, 0) ∈ M. Thus, there exists x ∈ X such that

    f(x) ≤ finf − ε   and   gi(x) ≤ 0, i = 1, 2, . . . , m.

From the above condition it is obvious that x is a feasible point of (CP 1),
thereby contradicting the fact that finf is the infimum of the problem (CP 1).


Hence (finf, 0) ∉ int M. By the supporting hyperplane theorem, Theorem 2.26 (i), there exists (λ0, λ) ∈ R × Rm with (λ0, λ) ≠ (0, 0) such that

    λ0 finf ≤ λ0 d0 + Σ_{i=1}^m λi di, ∀ (d0, d) ∈ M.    (5.21)

Let (d0, d) = (d0, d1, . . . , dm) ∈ M. Then for αi > 0,

    (d0, . . . , di−1, di + αi, di+1, . . . , dm) ∈ M, i = 0, 1, . . . , m.

If for some i ∈ {0, 1, . . . , m}, λi < 0, then letting the corresponding αi → +∞ leads to a contradiction of (5.21). Therefore, λi ≥ 0 for i = 0, 1, . . . , m.
It is easy to observe that (f(x), g(x)) = (f(x), g1(x), g2(x), . . . , gm(x)) ∈ M for any x ∈ X. Therefore, the condition (5.21) becomes

    λ0 finf ≤ λ0 f(x) + Σ_{i=1}^m λi gi(x), ∀ x ∈ X,

which implies

    λ0 finf ≤ inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }
            ≤ inf_{x∈X, gi(x)≤0 ∀i} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }
            ≤ inf_{x∈X, gi(x)≤0 ∀i} λ0 f(x)
            = λ0 finf,

thereby leading to the Fritz John optimality condition, as desired. 
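To see the construction of M and of the multipliers in a concrete case where the infimum is not attained (a small illustration, not from the text): take X = R, f(x) = eˣ and g1(x) = x. The feasible set is {x : x ≤ 0} and finf = inf_{x≤0} eˣ = 0, which is not attained. Here

    M = {(d0, d1) ∈ R × R : there exists x with eˣ ≤ d0 and x ≤ d1} = {(d0, d1) : d0 > 0},

so (finf, 0) = (0, 0) lies on the boundary of the convex set M and not in its interior. A supporting hyperplane at (0, 0) has normal (λ0, λ1) = (1, 0), and indeed

    λ0 finf = 0 = inf_{x∈R} { eˣ + 0 · x } = inf_{x∈X} { λ0 f(x) + λ1 g1(x) },

which is exactly the conclusion of Theorem 5.10.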


Note that in Theorem 5.10, there is no complementary slackness condition. Under the Slater-type constraint qualification, that is, there exists x̂ ∈ X such that gi(x̂) < 0, i = 1, 2, . . . , m, it can be ensured that λ0 ≠ 0. Otherwise, if λ0 = 0, then from the Fritz John optimality condition there exist λi ≥ 0, i = 1, 2, . . . , m, not all simultaneously zero, such that

    Σ_{i=1}^m λi gi(x) ≥ 0, ∀ x ∈ X,

which contradicts the Slater-type constraint qualification. This discussion can be stated as follows.

Theorem 5.11 Consider the convex programming problem (CP 1) where the functions f and gi, i = 1, 2, . . . , m, are convex on the convex set X ⊂ Rn and let finf < +∞ be the infimum of (CP 1). Assume that the Slater-type constraint qualification holds. Then there exist λi ≥ 0, i = 1, 2, . . . , m, such that

    finf = inf_{x∈X} { f(x) + Σ_{i=1}^m λi gi(x) }.
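As an indication of why a qualification like the Slater-type condition is needed here (a standard one-dimensional illustration, not from the text): take X = R, f(x) = x and g1(x) = x². The only feasible point is x = 0, so finf = 0, but no x̂ satisfies g1(x̂) < 0. For any λ1 > 0,

    inf_{x∈R} { x + λ1 x² } = −1/(4λ1) < 0,

and the infimum is −∞ when λ1 = 0, so finf = 0 never equals inf_{x∈X} { f(x) + λ1 g1(x) }. The conclusion of Theorem 5.10 is instead met with λ0 = 0 and λ1 = 1, since inf_{x∈R} x² = 0 = λ0 finf.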

In Theorem 5.10, the Fritz John optimality condition is established in the


duality form in the absence of any point of minimizer of (CP 1) but at the
cost of the complementary slackness condition. Note that in Theorems 5.10
and 5.11, one requires the set X to be convex, but need not be closed. The
enhanced Fritz John optimality condition similar to Theorem 5.6 has also
been obtained in this scenario by Bertsekas, Ozdaglar, and Tseng [14] and
Bertsekas [12]. The proof is similar to that of Theorem 5.6 but complicated
as the point of minimizer does not exist.

Theorem 5.12 Consider the convex programming problem (CP 1) where the
functions f and gi , i = 1, 2, . . . , m, are lsc and convex on the closed convex
set X ⊂ Rn and let finf < +∞ be the infimum of (CP 1). Then there exist
λi ≥ 0 for i = 0, 1, . . . , m, not all simultaneously zero, such that
(i) λ0 finf = inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }.

(ii) Consider the index set Ī = {i ∈ {1, 2, . . . , m} : λi > 0}. If Ī ≠ ∅, then there exists a sequence {xk} ⊂ X such that

    lim_{k→∞} f(xk) = finf   and   lim sup_{k→∞} gi(xk) ≤ 0, i = 1, 2, . . . , m,

and for all k sufficiently large

    f(xk) < finf   and   gi(xk) > 0, ∀ i ∈ Ī.

Proof. If for every x ∈ X, f (x) ≥ finf , then the result holds for λ0 = 1 and
λi = 0, i = 1, 2, . . . , m.
Now suppose that there exists an x̄ ∈ X such that f (x̄) < finf , thereby
implying that finf is finite. Consider the minimization problem
min f (x) subject to gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ Xk . (CP 1k )
In (CP 1k ), Xk is a closed convex subset of Rn defined as

Xk = X ∩ B̄βk (0), ∀ k ∈ N

and β > 0 is chosen to be sufficiently large such that for every k, the constraint
set

{x ∈ Xk : gi (x) ≤ 0, i = 1, 2, . . . , m}


is nonempty. As f and gi , i = 1, 2, . . . , m, are lsc convex on X, they are lsc


convex and coercive on Xk . Thus by the Weierstrass Theorem, Theorem 1.14,
the problem (CP 1k ) has a point of minimizer, say x̄k . As k → ∞, Xk → X and
thus f (x̄k ) → finf . Because Xk ⊂ X, finf ≤ f (x̄k ). Define δk = f (x̄k ) − finf .
Observe that δk ≥ 0 for every k. If δk = 0 for some k, then x̄k ∈ Xk ⊂ X
is a point of minimizer of (CP 1) and the result holds by Theorem 5.6 with
finf = f (x̄k ).
Now suppose that δk > 0 for every k. For positive integers k and positive scalars r, consider the function Lk,r : Xk × Rm+ → R given by

    Lk,r(x, α) = f(x) + (δk²/4k²) ‖x − x̄k‖² + Σ_{i=1}^m αi gi(x) − ‖α‖²/(2r).

Observe that the above function is similar to the saddle point function considered in Theorem 5.6 except that the term (1/k³)‖x − x̄‖² is now replaced by (δk²/4k²)‖x − x̄k‖². In Theorem 5.6, x̄ is a point of minimizer of (CP 1), whereas here the infimum is not attained and thus the term involves x̄k, the point of minimizer of the problem (CP 1k), and δk.
Now working along the lines of the proof of Theorem 5.6, Lk,r has a saddle point over Xk × Rm+, say (xk,r, αk,r), which by the saddle point definition implies

    Lk,r(xk,r, α) ≤ Lk,r(xk,r, αk,r) ≤ Lk,r(x, αk,r), ∀ x ∈ Xk, ∀ α ∈ Rm+.    (5.22)

As Lk,r(·, αk,r) attains an infimum over Xk at xk,r,

    Lk,r(xk,r, αk,r) ≤ f(x̄k).    (5.23)

Also, from (5.22), Lk,r(xk,r, ·) attains a supremum over α ∈ Rm+ at

    αk,r i = r gi⁺(xk,r), i = 1, 2, . . . , m.    (5.24)

Therefore,

    Lk,r(xk,r, αk,r) ≥ f(xk,r).    (5.25)

Further, as in the proof of Theorem 5.6,

    lim_{r→∞} f(xk,r) = f(x̄k).

Note that in the proof, the problem (CP 1k) is considered instead of (CP 1) as in Theorem 5.6 and hence the condition obtained involves the point of minimizer of (CP 1k), that is, x̄k. Now as δk = f(x̄k) − finf, the above equality leads to

    lim_{r→∞} f(xk,r) = finf + δk.    (5.26)
Now before continuing with the proof to obtain the multipliers for the Fritz
John optimality condition, we present a lemma from Bertsekas, Ozdaglar, and
Tseng [14].


Lemma 5.13 For sufficiently large k and every r ≤ 1/√δk,

    f(xk,r) ≤ finf − δk/2.    (5.27)

Furthermore, there exists a scalar rk ≥ 1/√δk such that

    f(xk,rk) = finf − δk/2.    (5.28)

Proof. Define δ = finf − f(x̄), where x̄ ∈ X is such that f(x̄) < finf. For sufficiently large k, x̄ ∈ Xk. As x̄k is the point of minimizer of the problem (CP 1k), f(x̄k) ≥ finf with f(x̄k) → finf; thus for sufficiently large k,

    f(x̄k) − finf < finf − f(x̄),

which implies δk < δ. By the convexity of f over X and that of Xk ⊂ X, for λk ∈ [0, 1],

    f(yk) ≤ λk f(x̄) + (1 − λk) f(x̄k) = λk (finf − δ) + (1 − λk)(finf + δk) = finf − λk (δk + δ) + δk,

where yk = λk x̄ + (1 − λk) x̄k. Because 0 ≤ δk < δ, 0 ≤ 2δk/(δk + δ) < 1. Substituting λk = 2δk/(δk + δ) in the above condition yields

    f(yk) ≤ finf − δk.    (5.29)

Again by the convexity assumptions on gi, i = 1, 2, . . . , m, along with the feasibility of x̄k for (CP 1k), for λk = 2δk/(δk + δ),

    gi(yk) ≤ λk gi(x̄) + (1 − λk) gi(x̄k) ≤ (2δk/(δk + δ)) gi(x̄), i = 1, 2, . . . , m.    (5.30)

From the saddle point condition (5.22) along with (5.24) and (5.25),

    f(xk,r) ≤ Lk,r(xk,r, αk,r) = inf_{x∈Xk} { f(x) + (δk²/4k²) ‖x − x̄k‖² + (r/2) ‖g⁺(x)‖² }.

As x, x̄k ∈ Xk ⊂ B̄βk(0),

    ‖x − x̄k‖ ≤ ‖x‖ + ‖x̄k‖ ≤ 2βk,


thereby reducing the preceding inequality to

    f(xk,r) ≤ f(x) + (βδk)² + (r/2) ‖g⁺(x)‖², ∀ x ∈ Xk.

In particular, taking x = yk ∈ Xk in the above condition, which along with (5.29) and (5.30) implies that for sufficiently large k,

    f(xk,r) ≤ finf − δk + (βδk)² + (2rδk²/(δk + δ)²) ‖g⁺(x̄)‖².

As δk → 0 when k → +∞, for sufficiently large k and every r ≤ 1/√δk the last two terms on the right-hand side add up to at most δk/2, and the above inequality reduces to (5.27).
Now by the saddle point condition (5.22), which along with (5.24) implies that

    Lk,r(xk,r, αk,r) = f(xk,r) + (δk²/4k²) ‖xk,r − x̄k‖² + (r/2) ‖g⁺(xk,r)‖²
                     = inf_{x∈Xk} { f(x) + (δk²/4k²) ‖x − x̄k‖² + (r/2) ‖g⁺(x)‖² }.

Consider r̄ > 0. Then for every r ≥ r̄,

    Lk,r̄(xk,r̄, αk,r̄) = inf_{x∈Xk} { f(x) + (δk²/4k²) ‖x − x̄k‖² + (r̄/2) ‖g⁺(x)‖² }
        ≤ f(xk,r) + (δk²/4k²) ‖xk,r − x̄k‖² + (r̄/2) ‖g⁺(xk,r)‖²
        ≤ f(xk,r) + (δk²/4k²) ‖xk,r − x̄k‖² + (r/2) ‖g⁺(xk,r)‖²
        = Lk,r(xk,r, αk,r)
        ≤ f(xk,r̄) + (δk²/4k²) ‖xk,r̄ − x̄k‖² + (r/2) ‖g⁺(xk,r̄)‖².

Thus as r ↓ r̄, Lk,r(xk,r, αk,r) → Lk,r̄(xk,r̄, αk,r̄).
Now for r ≤ r̄,

    f(xk,r̄) + (δk²/4k²) ‖xk,r̄ − x̄k‖² + (r/2) ‖g⁺(xk,r̄)‖²
        ≤ f(xk,r̄) + (δk²/4k²) ‖xk,r̄ − x̄k‖² + (r̄/2) ‖g⁺(xk,r̄)‖²
        = Lk,r̄(xk,r̄, αk,r̄)
        ≤ f(xk,r) + (δk²/4k²) ‖xk,r − x̄k‖² + (r̄/2) ‖g⁺(xk,r)‖²
        = f(xk,r) + (δk²/4k²) ‖xk,r − x̄k‖² + (r/2) ‖g⁺(xk,r)‖² + ((r̄ − r)/2) ‖g⁺(xk,r)‖²
        = Lk,r(xk,r, αk,r) + ((r̄ − r)/2) ‖g⁺(xk,r)‖²
        ≤ f(xk,r̄) + (δk²/4k²) ‖xk,r̄ − x̄k‖² + (r/2) ‖g⁺(xk,r̄)‖² + ((r̄ − r)/2) ‖g⁺(xk,r)‖².


For every k, as gi, i = 1, 2, . . . , m, is lsc and coercive on Xk, {gi(xk,r)} is bounded below by inf_{x∈Xk} gi(x), which exists by the Weierstrass Theorem, Theorem 1.14. Therefore as r ↑ r̄, Lk,r(xk,r, αk,r) → Lk,r̄(xk,r̄, αk,r̄), which along with the previous case of r ↓ r̄ leads to the continuity of Lk,r(xk,r, αk,r) in r, that is, as r → r̄, Lk,r(xk,r, αk,r) → Lk,r̄(xk,r̄, αk,r̄).
By the conditions (5.23) and (5.25), xk,r belongs to the compact set {x ∈ Xk : f(x) ≤ f(x̄k)} for every k and therefore {xk,r} is a bounded sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, as r → r̄, it has a convergent subsequence. Without loss of generality, let xk,r → x̂k, where x̂k ∈ {x ∈ Xk : f(x) ≤ f(x̄k)}. The continuity of Lk,r(xk,r, αk,r) in r along with the lower semicontinuity of f and gi, i = 1, 2, . . . , m, leads to

    Lk,r̄(xk,r̄, αk,r̄) = lim_{r→r̄} Lk,r(xk,r, αk,r)
                       = lim_{r→r̄} { f(xk,r) + (δk²/4k²) ‖xk,r − x̄k‖² + (r/2) ‖g⁺(xk,r)‖² }
                       ≥ f(x̂k) + (δk²/4k²) ‖x̂k − x̄k‖² + (r̄/2) ‖g⁺(x̂k)‖²
                       ≥ inf_{x∈Xk} { f(x) + (δk²/4k²) ‖x − x̄k‖² + (r̄/2) ‖g⁺(x)‖² }
                       = Lk,r̄(xk,r̄, αk,r̄),

which implies x̂k is the point of minimizer of

    f(x) + (δk²/4k²) ‖x − x̄k‖² + (r̄/2) ‖g⁺(x)‖²

over Xk. As a strictly convex function has a unique point of minimizer and f(x) + (δk²/4k²)‖x − x̄k‖² + (r̄/2)‖g⁺(x)‖² is strictly convex, x̂k = xk,r̄.
We claim that f(xk,r) → f(xk,r̄) as r → r̄. As f is lsc, we will prove the upper semicontinuity of f in r. On the contrary, suppose that f(xk,r̄) < lim sup_{r→r̄} f(xk,r). As r → r̄, Lk,r(xk,r, αk,r) → Lk,r̄(xk,r̄, αk,r̄) and xk,r → x̂k = xk,r̄, which implies that

    f(xk,r̄) + (δk²/4k²) ‖xk,r̄ − x̄k‖² + lim inf_{r→r̄} (r̄/2) ‖g⁺(xk,r)‖²
        < lim sup_{r→r̄} Lk,r(xk,r, αk,r)
        = Lk,r̄(xk,r̄, αk,r̄)
        = f(xk,r̄) + (δk²/4k²) ‖xk,r̄ − x̄k‖² + (r̄/2) ‖g⁺(xk,r̄)‖².

But the above inequality is a contradiction of the lower semicontinuity of gi, i = 1, 2, . . . , m. Therefore, f(xk,r) is continuous in r.
Now by (5.26), for sufficiently large k,

    lim_{r→+∞} f(xk,r) = finf + δk.

Therefore, taking ε = 3δk/2, for r sufficiently large,

    |f(xk,r) − (finf + δk)| < 3δk/2,

which implies that

    finf − δk/2 < f(xk,r) < finf + 5δk/2.    (5.31)

For r ≤ 1/√δk, by (5.27),

    f(xk,r) ≤ finf − δk/2.

Now for r = 1/√δk, we have two possibilities:

(i) f(xk,r) = finf − δk/2,

(ii) f(xk,r) < finf − δk/2.

If (i) holds, then we are done with rk = r. If (ii) holds, then, since (5.31) forces finf − δk/2 < f(xk,r) for all sufficiently large r, any such r must satisfy r > 1/√δk. As f(xk,r) is continuous in r, by the Intermediate Value Theorem there exists rk ≥ 1/√δk such that

    f(xk,rk) = finf − δk/2,

that is, (5.28) holds. 

Now we continue proving the theorem. From the conditions (5.23) and (5.25),

    f(xk,r) ≤ Lk,r(xk,r, αk,r) ≤ inf_{x∈Xk} { f(x) + (δk²/4k²) ‖x − x̄k‖² + Σ_{i=1}^m αk,r i gi(x) }
            = f(xk,r) + (δk²/4k²) ‖xk,r − x̄k‖² + Σ_{i=1}^m αk,r i gi(xk,r)
            ≤ f(x̄k).

For r = rk ≥ 1/√δk, the above condition along with (5.28) and the fact that, as k → +∞, f(x̄k) → finf and δk → 0 implies that

    lim_{k→∞} { f(xk,rk) − finf + (δk²/4k²) ‖xk,rk − x̄k‖² + Σ_{i=1}^m αk,rk i gi(xk,rk) } = 0.    (5.32)

Define

    γk = √( 1 + Σ_{i=1}^m (αk,rk i)² ),   λ0^k = 1/γk,   λi^k = αk,rk i/γk,  i = 1, 2, . . . , m.    (5.33)

As αk,rk ∈ Rm+, γk ≥ 1 for every k. Therefore, dividing (5.32) by γk along with the relation (5.33) leads to

    lim_{k→∞} { λ0^k f(xk,rk) − λ0^k finf + (δk² λ0^k/4k²) ‖xk,rk − x̄k‖² + Σ_{i=1}^m λi^k gi(xk,rk) } = 0.    (5.34)

By the saddle point condition (5.22),

    Lk,rk(xk,rk, αk,rk) ≤ Lk,rk(x, αk,rk), ∀ x ∈ Xk.

Dividing the above inequality throughout by γk along with the fact that ‖x − x̄k‖ ≤ 2βk for every x ∈ Xk implies that

    λ0^k f(xk,rk) + (δk² λ0^k/4k²) ‖xk,rk − x̄k‖² + Σ_{i=1}^m λi^k gi(xk,rk)
        ≤ λ0^k f(x) + (δk² λ0^k/4k²) ‖x − x̄k‖² + Σ_{i=1}^m λi^k gi(x).

From (5.33), for every k, λi^k ≥ 0, i = 0, 1, . . . , m, such that

    (λ0^k)² + Σ_{i=1}^m (λi^k)² = 1.


Therefore {λi^k}, i = 0, 1, . . . , m, are bounded sequences and hence, by the Bolzano–Weierstrass Theorem, Proposition 1.3, have convergent subsequences. Without loss of generality, assume that λi^k → λi, i = 0, 1, . . . , m, with λi ≥ 0, i = 0, 1, . . . , m, not all simultaneously zero. Taking the limit as k → +∞ in the above inequality along with (5.34) leads to

    λ0 finf ≤ λ0 f(x) + Σ_{i=1}^m λi gi(x), ∀ x ∈ X.

Therefore,

    λ0 finf ≤ inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }
            ≤ inf_{x∈X, gi(x)≤0 ∀i} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }
            ≤ inf_{x∈X, gi(x)≤0 ∀i} λ0 f(x)
            = λ0 finf,

which leads to condition (i).
Now dividing the condition (5.24) by γk, which along with (5.33) leads to

    λi^k = rk gi⁺(xk,rk)/γk, i = 1, 2, . . . , m.

As k → +∞,

    λi = lim_{k→∞} rk gi⁺(xk,rk)/γk, i = 1, 2, . . . , m.

In the beginning of the proof, we assumed that there exists x̄ ∈ X satisfying f(x̄) < finf, which along with the condition (i) implies that the index set Ī = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. Otherwise, if Ī is empty, then

    inf_{x∈X} λ0 f(x) ≤ λ0 f(x̄) < λ0 finf,

which is a contradiction to condition (i). For i ∈ Ī, λi > 0, which implies that λi^k > 0 for all sufficiently large k. Therefore, gi⁺(xk,rk) > 0, that is,

    gi(xk,rk) > 0, ∀ i ∈ Ī,

for all sufficiently large k.
In particular, for r = rk ≥ 1/√δk, conditions (5.23) and (5.24) yield

    f(xk,rk) + (rk/2) ‖g⁺(xk,rk)‖² ≤ f(xk,rk) + (δk²/4k²) ‖xk,rk − x̄k‖² + (rk/2) ‖g⁺(xk,rk)‖² ≤ f(x̄k),

which along with (5.28) and the relation δk = f(x̄k) − finf ≥ 0 implies that

    rk ‖g⁺(xk,rk)‖² ≤ 3δk.

As k → +∞, δk → 0 and rk ≥ 1/√δk → +∞, so the above inequality leads to g⁺(xk,rk) → 0, that is,

    lim sup_{k→∞} gi(xk,rk) ≤ 0, i = 1, 2, . . . , m.


Also, from the condition (5.28),

    f(xk,rk) < finf   and   lim_{k→∞} f(xk,rk) = finf.

Thus, the condition (ii) is satisfied by the sequence {xk,rk} ⊂ X, thereby yielding the desired result. 

Under the Slater-type constraint qualification, the multiplier λ0 can be ensured to be nonzero and hence can be normalized to one.

5.5 Enhanced Dual Fritz John Optimality Conditions


In this chapter we emphasize the enhanced Fritz John conditions. As observed in Section 5.4, we dealt with the situation where the infimum of the original problem (CP 1) exists but is not attained. Those results were extended by Bertsekas, Ozdaglar, and Tseng [14] to the dual scenario, where the dual problem has a supremum that need not be attained. Now corresponding to the problem (CP 1), the associated dual problem is

    sup w(λ)  subject to  λ ∈ Rm+,    (DP 1)

where w(λ) = inf_{x∈X} L(x, λ) with

    L(x, λ) = f(x) + Σ_{i=1}^m λi gi(x)  if λ ∈ Rm+,   and   L(x, λ) = −∞ otherwise.
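For a feel of the dual objects w and wsup (a simple one-dimensional illustration, not from the text): take X = R, f(x) = x² and g1(x) = 1 − x, so that finf = min{x² : x ≥ 1} = 1. For λ1 ≥ 0,

    w(λ1) = inf_{x∈R} { x² + λ1 (1 − x) } = λ1 − λ1²/4,

attained at x = λ1/2. The problem (DP 1) maximizes this concave function over λ1 ≥ 0, giving wsup = w(2) = 1 = finf; in this example the supremum of (DP 1) is attained and coincides with the infimum of (CP 1).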

Before presenting the enhanced dual Fritz John optimality condition, we


first prove a lemma that will be required in establishing the theorem.

Lemma 5.14 Consider the convex programming problem (CP 1) where the functions f and gi, i = 1, 2, . . . , m, are lsc and convex on the convex set X ⊂ Rn and (DP 1) is the associated dual problem. Let finf < +∞ be the infimum of (CP 1) and, for every δ > 0, set

    fδ = inf_{x∈X, gi(x)≤δ ∀i} f(x).

Then the supremum of (DP 1), wsup, satisfies fδ ≤ wsup for every δ > 0 and

    wsup = lim_{δ↓0} fδ.

Proof. For the problem (CP 1), as the infimum finf exists and finf < +∞, the feasible set of (CP 1) is nonempty, that is, there exists x̄ ∈ X satisfying gi(x̄) ≤ 0, i = 1, 2, . . . , m. Thus for δ > 0, the problem

    inf f(x)  subject to  gi(x) ≤ δ, i = 1, 2, . . . , m,  x ∈ X,    (CP 1δ)

satisfies the Slater-type constraint qualification as x̄ ∈ X with gi(x̄) < δ, i = 1, 2, . . . , m. Therefore, by Theorem 5.11, there exist λi^δ ≥ 0, i = 1, 2, . . . , m, such that

    fδ = inf_{x∈X} { f(x) + Σ_{i=1}^m λi^δ gi(x) − δ Σ_{i=1}^m λi^δ }
       ≤ inf_{x∈X} { f(x) + Σ_{i=1}^m λi^δ gi(x) }
       = w(λ^δ)
       ≤ sup_{λ∈Rm+} w(λ) = wsup.

Therefore, for every δ > 0, fδ ≤ wsup and hence

    lim_{δ↓0} fδ ≤ wsup.    (5.35)

Now as δ → 0, the feasible region of (CP 1δ) shrinks and thus fδ is nondecreasing as δ ↓ 0, and for every δ > 0, fδ ≤ finf. This leads to two cases: either limδ→0 fδ > −∞ or limδ→0 fδ = −∞.
If limδ→0 fδ > −∞, then fδ > −∞ for every δ > 0 sufficiently small. For those δ > 0, choose xδ ∈ X such that gi(xδ) ≤ δ, i = 1, 2, . . . , m, and f(xδ) ≤ fδ + δ. Such an xδ is called an almost δ-solution of (CP 1), a concept that will be dealt with in Chapter 10. Therefore for λ ∈ Rm+,

    w(λ) = inf_{x∈X} { f(x) + Σ_{i=1}^m λi gi(x) }
         ≤ f(xδ) + Σ_{i=1}^m λi gi(xδ)
         ≤ fδ + δ + δ Σ_{i=1}^m λi.

Taking the limit as δ → 0 in the above inequality leads to

    w(λ) ≤ lim_{δ→0} fδ, ∀ λ ∈ Rm+,

which implies wsup ≤ limδ→0 fδ.
If limδ→0 fδ = −∞, then for δ > 0, choose xδ ∈ X such that gi(xδ) ≤ δ, i = 1, 2, . . . , m, and f(xδ) ≤ −1/δ. As in the previous case, for λ ∈ Rm+,

    w(λ) ≤ −1/δ + δ Σ_{i=1}^m λi,


which leads to w(λ) = −∞ for every λ ∈ Rm+ as δ ↓ 0 and hence wsup = −∞ = limδ→0 fδ. From both these cases along with the condition (5.35), the requisite result is established. 
Finally, we present the enhanced dual Fritz John optimality conditions
obtained by Bertsekas, Ozdaglar, and Tseng [14], which are expressed with
respect to the supremum of the dual problem (DP 1).

Theorem 5.15 Consider the convex programming problem (CP 1) where the
functions f and gi , i = 1, 2, . . . , m, are lsc and convex on the closed convex
set X ⊂ Rn and (DP 1) is the associated dual problem. Let finf < +∞ be the
infimum of (CP 1) and wsup > −∞ be the supremum of (DP 1). Then there
exist λi ≥ 0 for i = 0, 1, . . . , m, not all simultaneously zero, such that

(i) λ0 wsup = inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }.

(ii) Consider the index set Ī = {i ∈ {1, 2, . . . , m} : λi > 0}. If Ī ≠ ∅, then there exists a sequence {xk} ⊂ X such that

    lim_{k→∞} f(xk) = wsup   and   lim sup_{k→∞} gi(xk) ≤ 0, i = 1, 2, . . . , m,

and for all k sufficiently large

    f(xk) < wsup   and   gi(xk) > 0, ∀ i ∈ Ī.

Proof. By the weak duality, wsup ≤ finf, which along with the hypothesis implies that finf and wsup are finite. For k = 1, 2, . . ., consider the problem

    min f(x)  subject to  gi(x) ≤ 1/k⁴, i = 1, 2, . . . , m,  x ∈ X.    (CP 1k)

By Lemma 5.14, the infimum finf^k of (CP 1k) satisfies the condition finf^k ≤ wsup for every k. For each k, consider x̂k ∈ X such that

    f(x̂k) ≤ wsup + 1/k²   and   gi(x̂k) ≤ 1/k⁴, i = 1, 2, . . . , m.    (5.36)

Now consider another problem:

    min f(x)  subject to  gi(x) ≤ 1/k⁴, i = 1, 2, . . . , m,  x ∈ X̂k,    (ĈP 1k)

where X̂k = X ∩ {x ∈ Rn : ‖x‖ ≤ k (max_{j=1,...,k} ‖x̂j‖ + 1)} is a compact set. By the lower semicontinuity and convexity of f and gi, i = 1, 2, . . . , m, over X, the functions are lsc convex and coercive on X̂k. Therefore, by the Weierstrass Theorem, Theorem 1.14, (ĈP 1k) has a point of minimizer, say x̄k. From (5.36), x̂k is feasible for (ĈP 1k), which leads to

    f(x̄k) ≤ f(x̂k) ≤ wsup + 1/k².    (5.37)

© 2012 by Taylor & Francis Group, LLC


238 Enhanced Fritz John Optimality Conditions

For every k, define the Lagrangian function as

    Lk(x, α) = f(x) + Σ_{i=1}^m αi gi(x) − ‖α‖²/(2k)

and the set

    Xk = X̂k ∩ {x ∈ Rn : gi(x) ≤ k, i = 1, 2, . . . , m}.    (5.38)

For a fixed α ∈ Rm+, Lk(·, α) is lsc convex and coercive on Xk by the lower semicontinuity, convexity, and coercivity of f and gi, i = 1, 2, . . . , m, on X̂k, whereas for a given x ∈ Xk, Lk(x, ·) is quadratic negative definite in α. Then by the Saddle Point Theorem, Proposition 4.1, Lk has a saddle point over Xk × Rm+, say (xk, αk), that is,

    Lk(xk, α) ≤ Lk(xk, αk) ≤ Lk(x, αk), ∀ x ∈ Xk, ∀ α ∈ Rm+.

Because Lk(xk, ·) is quadratic negative definite, it attains a supremum over Rm+ at

    αi^k = k gi⁺(xk), i = 1, 2, . . . , m.    (5.39)
Also, as Lk(·, αk) attains the infimum over Xk at xk, which along with (5.37), (5.38), and (5.39) implies

    Lk(xk, αk) = f(xk) + Σ_{i=1}^m αi^k gi(xk) − ‖αk‖²/(2k)
               ≤ f(xk) + Σ_{i=1}^m αi^k gi(xk)
               = inf_{x∈Xk} { f(x) + k Σ_{i=1}^m gi⁺(xk) gi(x) }
               ≤ inf_{x∈Xk, gi(x)≤1/k⁴ ∀i} { f(x) + k Σ_{i=1}^m gi⁺(xk) gi(x) }.

As xk ∈ Xk, gi(xk) ≤ k for i = 1, 2, . . . , m. Therefore, the above inequality leads to

    Lk(xk, αk) ≤ inf_{x∈Xk, gi(x)≤1/k⁴ ∀i} { f(x) + m/k² } = f(x̄k) + m/k² ≤ wsup + (m+1)/k².    (5.40)

Due to the finiteness of wsup, there exists a sequence {µk} ⊂ Rm+ satisfying

    w(µk) → wsup   and   ‖µk‖²/(2k) → 0,    (5.41)


which is ensured by choosing µk as a point of maximizer of the problem

    max w(α)  subject to  ‖α‖ ≤ k^{1/3}, α ∈ Rm+.

Thus for every k,

    Lk(xk, αk) = sup_{α∈Rm+} inf_{x∈Xk} Lk(x, α)
               ≥ sup_{α∈Rm+} inf_{x∈X} Lk(x, α)
               = sup_{α∈Rm+} { inf_{x∈X} { f(x) + Σ_{i=1}^m αi gi(x) } − ‖α‖²/(2k) }
               = sup_{α∈Rm+} { w(α) − ‖α‖²/(2k) }
               ≥ w(µk) − ‖µk‖²/(2k).    (5.42)

From the conditions (5.40) and (5.42),

    w(µk) − ‖µk‖²/(2k) ≤ f(xk) + Σ_{i=1}^m αi^k gi(xk) − ‖αk‖²/(2k)
                       ≤ f(xk) + Σ_{i=1}^m αi^k gi(xk)
                       ≤ wsup + (m+1)/k².    (5.43)

Taking the limit as k → +∞ in the above inequality, which along with (5.41) implies that

    lim_{k→∞} { f(xk) − wsup + Σ_{i=1}^m αi^k gi(xk) } = 0.    (5.44)

Define

    γk = √( 1 + Σ_{i=1}^m (αi^k)² ),   λ0^k = 1/γk   and   λi^k = αi^k/γk,  i = 1, 2, . . . , m.    (5.45)

As αk ∈ Rm+, from the above condition it is obvious that γk ≥ 1 for every k and thus dividing (5.44) by γk yields

    lim_{k→∞} { λ0^k (f(xk) − wsup) + Σ_{i=1}^m λi^k gi(xk) } = 0.    (5.46)

As xk minimizes Lk(·, αk) over Xk,

    f(xk) + Σ_{i=1}^m αi^k gi(xk) ≤ f(x) + Σ_{i=1}^m αi^k gi(x), ∀ x ∈ Xk,


which on dividing throughout by γk leads to

    λ0^k f(xk) + Σ_{i=1}^m λi^k gi(xk) ≤ λ0^k f(x) + Σ_{i=1}^m λi^k gi(x), ∀ x ∈ Xk.

From the condition (5.45),

    (λ0^k)² + Σ_{i=1}^m (λi^k)² = 1,

which implies that the sequences {λi^k} ⊂ R+, i = 0, 1, . . . , m, are bounded and thus, by the Bolzano–Weierstrass Theorem, Proposition 1.3, have a convergent subsequence. Without loss of generality, let λi^k → λi with λi ≥ 0, i = 0, 1, . . . , m, not all simultaneously zero. Therefore, letting k → +∞ in the preceding inequality, which along with (5.46) yields

    λ0 wsup ≤ λ0 f(x) + Σ_{i=1}^m λi gi(x), ∀ x ∈ X,

which leads to

    λ0 wsup ≤ inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }.    (5.47)

If λ0 > 0, then from the above inequality (5.47),

    wsup ≤ inf_{x∈X} { f(x) + Σ_{i=1}^m (λi/λ0) gi(x) } = w(λ/λ0) ≤ wsup,

thereby satisfying condition (i).
If λ0 = 0, then the relation (5.47) reduces to

    0 ≤ inf_{x∈X} Σ_{i=1}^m λi gi(x).

As finf exists and is finite, the feasible set of (CP 1) is nonempty, which implies that there exists x ∈ X satisfying gi(x) ≤ 0, i = 1, 2, . . . , m. Therefore, the above condition becomes

    0 = inf_{x∈X} Σ_{i=1}^m λi gi(x).

Therefore in both cases, condition (i) holds, that is,

    λ0 wsup = inf_{x∈X} { λ0 f(x) + Σ_{i=1}^m λi gi(x) }.


Now suppose that the index set Ī = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. Dividing the condition (5.39) throughout by γk and using (5.45),

    λi^k = k gi⁺(xk)/γk, i = 1, 2, . . . , m.

As k → +∞, λi^k → λi, i = 1, 2, . . . , m, thereby reducing the above equality to

    λi = lim_{k→∞} k gi⁺(xk)/γk, i = 1, 2, . . . , m.

For any i ∈ Ī, λi > 0, which implies for sufficiently large k that gi⁺(xk) > 0, that is,

    gi(xk) > 0, ∀ i ∈ Ī.

From the inequalities (5.43), for every k,

    k (f(xk) − wsup) + k Σ_{i=1}^m αi^k gi(xk) ≤ (m+1)/k.

By the condition (5.39), Σ_{i=1}^m αi^k gi(xk) = ‖αk‖²/k. Therefore, the above inequality becomes

    k (f(xk) − wsup) + Σ_{i=1}^m (αi^k)² ≤ (m+1)/k.

Dividing the above inequality throughout by γk², which along with (5.45) implies that

    k (f(xk) − wsup)/γk² + Σ_{i=1}^m (λi^k)² ≤ (m+1)/(k γk²),

which as k → +∞ yields

    lim sup_{k→∞} k (f(xk) − wsup)/γk² ≤ − Σ_{i=1}^m λi².    (5.48)

As Ī is nonempty, the above inequality leads to

    lim sup_{k→∞} k (f(xk) − wsup)/γk² < 0,

which for sufficiently large k implies that f(xk) < wsup.


Now from (5.41) and (5.43),

    lim_{k→∞} { f(xk) − wsup + Σ_{i=1}^m αi^k gi(xk) } − lim_{k→∞} ‖αk‖²/(2k) = 0,

which by the condition (5.44) implies that

    lim_{k→∞} ‖αk‖²/(2k) = 0.    (5.49)

The condition (5.39) along with (5.41) and (5.43) leads to

    lim_{k→∞} { (f(xk) − wsup) + ‖αk‖²/(2k) } = 0,

which together with (5.49) implies that f(xk) → wsup. Also, (5.49) along with (5.39) yields

    lim_{k→∞} k Σ_{i=1}^m (gi⁺(xk))² = 0,

which shows that

    lim sup_{k→∞} gi(xk) ≤ 0, i = 1, 2, . . . , m.

Thus for nonempty Ī, the sequence {xk} ⊂ X satisfies condition (ii), thereby establishing the desired result. 



Chapter 6

Optimality without Constraint Qualification

6.1 Introduction
In the last few chapters we saw how fundamental a role constraint qualifications, such as the Slater constraint qualification, play in convex optimization. In
Chapter 3 we saw that a relaxation of the Slater constraint qualification to
the Abadie constraint qualification leads to an asymptotic version of the KKT
conditions for the nonsmooth convex programming problems. Thus it is inter-
esting to ask whether it is possible to develop necessary and sufficient opti-
mality conditions for (CP ) without any constraint qualifications. Recently a
lot of work has been done in this respect in the form of sequential optimality
conditions. But to the best of our knowledge the first step in this direction
was taken by Ben-Tal, Ben-Israel, and Zlobec [7]. They obtained the necessary
and sufficient optimality conditions in the smooth scenario in the absence of
constraint qualifications. This work was extended to the nonsmooth scenario
by Wolkowicz [112]. All these studies involved direction sets, which we will
discuss below. So before moving on with the discussion of the results derived
by Ben-Tal, Ben-Israel, and Zlobec [7], and Wolkowicz [112], we present the
notion of direction sets. Before that we introduce the definition of a blunt
cone.
A set K ⊂ Rn is said to be a cone (Definition 2.18) if

    λx ∈ K whenever λ ≥ 0 and x ∈ K,

whereas K is a blunt cone if K is a cone without the origin, that is,

    0 ∉ K   and   λx ∈ K whenever x ∈ K and λ > 0.

For example, R²+ \ {(0, 0)} is a blunt cone, while the set K ⊂ R² given as K = {(x, y) ∈ R² : x = y} is not a blunt cone.
Definition 6.1 Let φ : Rn → R be a given function and let x̄ ∈ Rn be any given point. Then the set

    Dφrelation(x̄) = {d ∈ Rn : there exists ᾱ > 0 such that φ(x̄ + αd) relation φ(x̄), ∀ α ∈ (0, ᾱ]},

where the relation can be =, ≤, <, ≥, or >.
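For orientation (a small illustration, not from the text): take φ(x1, x2) = x1 and x̄ = (0, 0). Since φ(x̄ + αd) − φ(x̄) = α d1 for every α > 0,

    Dφ=(x̄) = {d ∈ R² : d1 = 0},   Dφ<(x̄) = {d ∈ R² : d1 < 0},   Dφ≤(x̄) = {d ∈ R² : d1 ≤ 0},

which is consistent with the characterizations of Dφ≤ and Dφ< through the directional derivative φ′(x̄, d) = d1 given in Proposition 6.2 (iii) below.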

In particular, the set Dφ= is called the cone of directions of constancy that was
considered by Ben-Tal, Ben-Israel, and Zlobec [7]. The other direction sets
were introduced in the work of Wolkowicz [112]. We present certain examples
of computing explicitly the set Dφ= (x̄) from Ben-Tal, Ben-Israel, and Zlobec [7].
For a strictly convex function φ : Rn → R, Dφ= (x̄) = {0} for any x̄ ∈ Rn .
Another interesting example from Ben-Tal, Ben-Israel, and Zlobec [7] is the
cone of the directions of constancy for the so-called faithfully convex function
given as

    φ(x) = h(Ax + b) + ⟨a, x⟩ + β,

where h : Rm → R is strictly convex, A is an m × n matrix, b ∈ Rm, a ∈ Rn, and β ∈ R. The class of faithfully convex functions is quite broad, comprising all the strictly convex functions and quadratic convex functions. See Rockafellar [98] for more details. In the case of faithfully convex functions,

    Dφ=(x̄) = Null( [A; aᵀ] ) = {d ∈ Rn : Ad = 0, ⟨a, d⟩ = 0},

where Null(S) is the null space of the matrix S. It is obvious that the null
space is contained in Dφ= . For the sake of completeness, we provide an expla-
nation for the reverse containment. We consider the following cases.

1. Ad = 0: Then by the definition of direction of constancy, ha, di = 0.

2. ha, di = 0: Suppose that d ∈ Dφ= (x̄), which implies there exists ᾱ > 0
such that

h(Ax̄ + αAd + b) = h(Ax̄ + b), ∀ α ∈ (0, ᾱ].

Suppose Ad 6= 0, then Ax̄ + αAd + b 6= Ax̄ + b for every α ∈ (0, ᾱ]. Now
two cases arise. If h(Ax̄ + α̂Ad + b) = h(Ax̄ + b) for some α̂ ∈ (0, ᾱ],
then by the strict convexity of h, for every λ ∈ (0, 1),

h(Ax̄ + λα̂Ad + b) < (1 − λ)h(Ax̄ + b) + λh(Ax̄ + α̂Ad + b),

which implies

h(Ax̄ + αAd + b) < h(Ax̄ + b), ∀ α ∈ (0, α̂)

and hence, d ∉ Dφ=(x̄).
The second case is that h(Ax̄ + αAd + b) ≠ h(Ax̄ + b) for every α ∈ (0, ᾱ]. Then again it implies that d ∉ Dφ=(x̄), which violates our assumption. Therefore, for d to be a direction of constancy, Ad = 0.


3. Ad 6= 0, ha, di 6= 0: This implies d 6= 0. We will show that φ is strictly


convex on the line segment [x̄, x̄ + ᾱd]. Consider xi = x̄ + αi d, i = 1, 2,
where αi ∈ [0, ᾱ] and α1 6= α2 . Therefore x1 6= x2 . By the strict convex-
ity of h, for every λ ∈ (0, 1),

h(A(λx1 + (1 − λ)x2 ) + b) < λh(Ax1 + b) + (1 − λ)h(Ax2 + b),

and by linearity of ha, .i,

ha, λx1 + (1 − λ)x2 i = λha, x1 i + (1 − λ)ha, x2 i.

Combining the above two conditions,

φ(λx1 + (1 − λ)x2 ) < λφ(x1 ) + (1 − λ)φ(x2 ), ∀ λ ∈ (0, 1).

This condition holds for every x1 , x2 ∈ [x̄, x̄ + αd] and thus φ is strictly
convex on [x̄, x̄ + αd]. Hence as mentioned earlier, Dφ= (x̄) = {0} for the
strictly convex function φ. But this contradicts the fact that d 6= 0.

Combining the above cases, we have

Dφ= (x̄) = {d ∈ Rn : Ad = 0, ha, di = 0}.
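As a concrete instance of this formula (an illustration, not from the text): take n = 2 and φ(x1, x2) = x1², which is faithfully convex with h(t) = t², A = (1 0), b = 0, a = (0, 0), and β = 0. The formula gives

    Dφ=(x̄) = {d ∈ R² : d1 = 0}

at every x̄ ∈ R², which matches the direct computation: φ(x̄ + αd) = (x̄1 + αd1)² equals φ(x̄) = x̄1² for all small α > 0 precisely when d1 = 0. Thus a faithfully convex function that is not strictly convex can have a nontrivial cone of directions of constancy.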

Below we present some results on the direction sets that will be required
in deriving the optimality conditions from Ben-Tal, Ben-Israel, and Zlobec [7],
Ben-Tal and Ben-Israel [6], and Wolkowicz [112].

Proposition 6.2 (i) Consider a function φ : Rn → R and x̄ ∈ Rn. Then

    Dφ=(x̄) ⊂ {d ∈ Rn : φ′(x̄, d) = 0}.

(ii) Consider a differentiable convex function φ : Rn → R and x̄ ∈ Rn. Then Dφ=(x̄) is a convex cone.

(iii) Consider a convex function φ : Rn → R and x̄ ∈ Rn. Then Dφ≤(x̄) is a convex cone while Dφ<(x̄) is a convex blunt open cone. Also

    Dφ≤(x̄) = {d ∈ Rn : φ′(x̄, d) ≤ 0}   and   Dφ<(x̄) = {d ∈ Rn : φ′(x̄, d) < 0}.

(iv) Consider a convex function φ : Rn → R and x̄ ∈ Rn. Assume that Dφ<(x̄) ≠ ∅ (equivalently, 0 ∉ ∂φ(x̄)). Then

    (Dφ≤(x̄))◦ = cone ∂φ(x̄).

Proof. (i) Consider d ∈ Dφ=(x̄), which implies there exists ᾱ > 0 such that

    φ(x̄ + αd) = φ(x̄), ∀ α ∈ (0, ᾱ].

Therefore, from the above condition,

    lim_{α↓0} (φ(x̄ + αd) − φ(x̄))/α = 0,

which implies φ′(x̄, d) = 0, thereby yielding the desired result.


(ii) Consider d ∈ Dφ=(x̄), which implies there exists ᾱ > 0 such that

    φ(x̄ + αd) = φ(x̄), ∀ α ∈ (0, ᾱ].

The above condition can be rewritten as

    φ(x̄ + α′d′) = φ(x̄), ∀ α′ ∈ (0, ᾱ′],

where d′ = λd, α′ = α/λ and ᾱ′ = ᾱ/λ for any λ > 0. Also 0 ∈ Dφ=(x̄). Therefore, λd ∈ Dφ=(x̄) for every λ ≥ 0 and hence Dφ=(x̄) is a cone.
Now consider d1, d2 ∈ Dφ=(x̄). Then for i = 1, 2, there exists ᾱi > 0 such that

    φ(x̄ + αi di) = φ(x̄), ∀ αi ∈ (0, ᾱi].

Taking ᾱ = min{ᾱ1, ᾱ2} > 0, for i = 1, 2 the above condition becomes

    φ(x̄ + αdi) = φ(x̄), ∀ α ∈ (0, ᾱ].    (6.1)

For any λ ∈ [0, 1], consider d = λd1 + (1 − λ)d2. The convexity of φ along with (6.1) on d1 and d2 yields

    φ(x̄ + αd) = φ(λ(x̄ + αd1) + (1 − λ)(x̄ + αd2)) ≤ λφ(x̄ + αd1) + (1 − λ)φ(x̄ + αd2), ∀ α ∈ (0, ᾱ],

that is,

    φ(x̄ + αd) ≤ φ(x̄), ∀ α ∈ (0, ᾱ].    (6.2)

Again, by the convexity of φ in the differentiable case, for every α ∈ (0, ᾱ],

    φ(x̄ + αd) ≥ φ(x̄) + α⟨∇φ(x̄), d⟩ = φ(x̄) + αλ⟨∇φ(x̄), d1⟩ + α(1 − λ)⟨∇φ(x̄), d2⟩.    (6.3)

For a differentiable convex function, φ′(x̄, d) = ⟨∇φ(x̄), d⟩ for any d ∈ Rn. Thus the relation in (i) becomes

    Dφ=(x̄) ⊂ {d ∈ Rn : ⟨∇φ(x̄), d⟩ = 0},

which reduces the inequality (6.3) to

    φ(x̄ + αd) ≥ φ(x̄), ∀ α ∈ (0, ᾱ]

as d1, d2 ∈ Dφ=(x̄). This inequality along with the condition (6.2) implies that d ∈ Dφ=(x̄), thereby leading to the convexity of Dφ=.
(iii) We will prove the result for Dφ<(x̄). Consider d ∈ Dφ<(x̄), which implies there exists ᾱ > 0 such that

    φ(x̄ + αd) < φ(x̄), ∀ α ∈ (0, ᾱ].

As done in (ii), the above inequality can be rewritten as

    φ(x̄ + α′d′) < φ(x̄), ∀ α′ ∈ (0, ᾱ′],

where d′ = λd, α′ = α/λ and ᾱ′ = ᾱ/λ for any λ > 0. Note that 0 ∉ Dφ<(x̄). Therefore, λd ∈ Dφ<(x̄) for every λ > 0 and hence Dφ<(x̄) is a blunt cone.
Now consider d1, d2 ∈ Dφ<(x̄). Working along the lines of the proof in (ii), for i = 1, 2,

    φ(x̄ + αdi) < φ(x̄), ∀ α ∈ (0, ᾱ].    (6.4)

For any λ ∈ [0, 1], let d = λd1 + (1 − λ)d2. The convexity of φ along with the condition (6.4) on d1 and d2 yields

    φ(x̄ + αd) ≤ λφ(x̄ + αd1) + (1 − λ)φ(x̄ + αd2) < φ(x̄), ∀ α ∈ (0, ᾱ],

thereby implying the convexity of Dφ<. From Definition 6.1, it is obvious that Dφ< is open by using the continuity of φ.
Consider d ∈ Dφ<(x̄), which implies that there exists ᾱ > 0 such that

    φ(x̄ + αd) < φ(x̄), ∀ α ∈ (0, ᾱ].

By the convexity of φ, for every α ∈ (0, ᾱ],

    α⟨ξ, d⟩ ≤ φ(x̄ + αd) − φ(x̄) < 0, ∀ ξ ∈ ∂φ(x̄).

As dom φ = Rn, by Theorem 2.79 and Proposition 2.83, the directional derivative is the support function of the subdifferential, which along with the compactness of ∂φ(x̄) is attained at some ξ ∈ ∂φ(x̄). Thus φ′(x̄, d) < 0, which leads to

    Dφ<(x̄) ⊂ {d ∈ Rn : φ′(x̄, d) < 0}.    (6.5)

Now consider d ∈ Rn such that φ′(x̄, d) < 0, that is,

    lim_{α↓0} (φ(x̄ + αd) − φ(x̄))/α < 0.

Therefore, there exists ᾱ > 0 such that

    φ(x̄ + αd) < φ(x̄), ∀ α ∈ (0, ᾱ],

which implies d ∈ Dφ<(x̄), thereby establishing the equality in the relation (6.5). Working along the above lines of proof, readers are advised to prove the result for Dφ≤(x̄).
(iv) As Dφ<(x̄) is nonempty, (iii) implies that

    φ′(x̄, d) < 0, ∀ d ∈ Dφ<(x̄).

Because dom φ = Rn, by Theorem 2.79, the directional derivative acts as a support function of the subdifferential, which along with the above relation is equivalent to 0 ∉ ∂φ(x̄). Therefore, by Proposition 3.4, cone ∂φ(x̄) is closed. The proof can be worked out along the lines of Proposition 3.9 by replacing Ŝ(x̄) by Dφ≤(x̄) and cl Ŝ(x̄) by the closed set cone ∂φ(x̄). 

Note that unlike (iii), where the relation holds as an equality, one is able to prove only inclusion in (i) and not equality. For example, consider the strictly convex function φ : R → R defined as φ(x) = x². For x̄ = 0, Dφ=(x̄) = {0} and ∇φ(x̄) = 0. Observe that

    {d ∈ R : ⟨∇φ(x̄), d⟩ = 0} = R ≠ {0} = Dφ=(x̄).

Hence, the equality need not hold in (i) even for a differentiable function. Also, for a differentiable function φ : Rn → R, if there are n linearly independent vectors di ∈ Dφ=(x̄), i = 1, 2, . . . , n, then ∇φ(x̄) = 0. Observe that one needs the differentiability assumption only in (ii). A careful look at the proof of (ii) shows that to prove the reverse inequality in (6.2), we make use of (i) under differentiability. So if φ is nondifferentiable, to prove the result one needs to assume that for some ξ ∈ ∂φ(x̄), φ′(x̄, d) = ⟨ξ, d⟩ = 0 for every d ∈ Dφ=(x̄). For a better understanding, we illustrate with an example from Ben-Tal, Ben-Israel, and Zlobec [7]. Consider the convex nondifferentiable function φ : R² → R defined as

    φ(x1, x2) = max{x1, x2}.

For x̄ = (0, 0), ∂φ(x̄) = co {(1, 0), (0, 1)} and

    Dφ=(x̄) = {(d, 0) ∈ R² : d ≤ 0} ∪ {(0, d) ∈ R² : d ≤ 0},

which is not convex. Note that ⟨(ξ̄1, ξ̄2), (d, 0)⟩ = 0 for ξ̄ = (0, 1) whereas ⟨(ξ̃1, ξ̃2), (0, d)⟩ = 0 for ξ̃ = (1, 0); that is, the equality φ′(x̄, d) = ⟨ξ, d⟩ = 0 holds with different subgradients ξ̄ ≠ ξ̃ of ∂φ(x̄) on the two pieces of Dφ=(x̄).
work done by Ben-Tal, Ben-Israel, and Zlobec [7].


6.2 Geometric Optimality Condition: Smooth Case


Ben-Tal, Ben-Israel, and Zlobec [7] established the necessary and sufficient
optimality conditions for (CP ) with the feasible set C given by (3.1), that is,
C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},
in the absence of any constraint qualifications in the smooth scenario. The
result relates the point of minimizer of (CP ) with the inconsistency of a
system. We present the result below. Throughout we will assume that the
active index set I(x̄) = {i ∈ {1, 2, . . . , m} : gi (x̄) = 0} is nonempty.
Theorem 6.3 Consider the convex programming problem (CP) with C given by (3.1). Let f and gi, i = 1, 2, . . . , m, be differentiable convex functions. Then x̄ is a point of minimizer of (CP) if and only if for every subset Ω ⊂ I(x̄) the system

    ⟨∇f(x̄), d⟩ < 0,
    ⟨∇gi(x̄), d⟩ < 0, i ∈ Ω,    (CPΩ)
    d ∈ Di=(x̄), i ∈ Ω∗ = I(x̄)\Ω

is inconsistent, where Di=(x̄) = Dgi=(x̄) for i ∈ Ω∗. It is important to note that for Ω = I(x̄), Ω∗ = ∅ and then by convention we will consider d ∈ Rn.

Proof. We will prove the negation of the result, that is, x̄ is not a point of minimizer of (CP) if and only if there exists some subset Ω ⊂ I(x̄) such that the system (CPΩ) is consistent. Suppose that x̄ is not a point of minimizer of (CP), which implies that there exists a feasible point x̃ ∈ C of (CP) such that f(x̃) < f(x̄). Therefore, by the convexity of the differentiable functions f and gi, i ∈ I(x̄), Theorem 2.81,

    ⟨∇f(x̄), x̃ − x̄⟩ ≤ f(x̃) − f(x̄) < 0,
    ⟨∇gi(x̄), x̃ − x̄⟩ ≤ gi(x̃) − gi(x̄) ≤ 0, i ∈ I(x̄),

which implies that d = x̃ − x̄ satisfies the system

    ⟨∇f(x̄), d⟩ < 0,
    ⟨∇gi(x̄), d⟩ ≤ 0, i ∈ I(x̄).

Define the subset Ω of I(x̄) as

    Ω = {i ∈ I(x̄) : ⟨∇gi(x̄), d⟩ < 0}.

Therefore, d satisfies the system

    ⟨∇f(x̄), d⟩ < 0,
    ⟨∇gi(x̄), d⟩ < 0, i ∈ Ω,
    ⟨∇gi(x̄), d⟩ = 0, i ∈ Ω∗.


We claim that for every i ∈ Ω∗,

    Di=(x̄) = {d ∈ Rn : ⟨∇gi(x̄), d⟩ = 0}.

By Proposition 6.2 (i),

    Di=(x̄) ⊂ {d ∈ Rn : gi′(x̄, d) = 0} = {d ∈ Rn : ⟨∇gi(x̄), d⟩ = 0}.    (6.6)

Thus, to establish our claim, we will prove the reverse inclusion in the condition (6.6). Consider any i ∈ Ω∗. Define a differentiable convex function Gi : R → R as Gi(λ) = gi(x̄ + λd). Therefore,

    ∇Gi(λ) = lim_{δ↓0} (Gi(λ + δ) − Gi(λ))/δ = lim_{δ↓0} (gi(x̄ + (λ + δ)d) − gi(x̄ + λd))/δ,

which for λ = 0 along with the fact that i ∈ Ω∗ implies that

    ∇Gi(0) = lim_{δ↓0} gi(x̄ + δd)/δ = ⟨∇gi(x̄), d⟩ = 0.

By Proposition 2.75, ∇Gi is nondecreasing over λ > 0, that is,

    ∇Gi(λ) ≥ ∇Gi(0) = 0, ∀ λ > 0.

Therefore, Gi is a nondecreasing function over λ > 0, which implies that λ = 0 is a point of minimizer of Gi. Hence,

    gi(x̄ + λd) = Gi(λ) ≥ Gi(0) = 0, ∀ λ > 0    (6.7)

as i ∈ Ω∗ ⊂ I(x̄). As x̃ = x̄ + d is feasible to (CP), for i ∈ Ω∗, gi(x̄ + d) ≤ 0. Thus, for λ = 1, the condition (6.7) reduces to gi(x̄ + d) = 0. By the convexity of gi,

    gi(x̄ + λd) = gi((1 − λ)x̄ + λ(x̄ + d)) ≤ (1 − λ)gi(x̄) + λgi(x̄ + d) = 0, ∀ λ ∈ (0, 1),

which by (6.7) yields

    gi(x̄ + λd) = 0, ∀ λ ∈ (0, 1].

Thus, d ∈ Di=(x̄). Because d ∈ {d ∈ Rn : ⟨∇gi(x̄), d⟩ = 0} was arbitrary,

    Di=(x̄) ⊃ {d ∈ Rn : ⟨∇gi(x̄), d⟩ = 0},

thereby proving the claim. As the claim holds for every i ∈ Ω∗, x̄ not being a point of minimizer of (CP) implies that the system (CPΩ) is consistent.


Conversely, suppose that the system (CPΩ) is consistent for some subset Ω ⊂ I(x̄), that is,

    ⟨∇f(x̄), d⟩ < 0,    (6.8)
    ⟨∇gi(x̄), d⟩ < 0, i ∈ Ω,    (6.9)
    d ∈ Di=(x̄), i ∈ Ω∗ = I(x̄)\Ω.    (6.10)

From the inequality (6.8),

    lim_{α↓0} (f(x̄ + αd) − f(x̄))/α < 0,

which implies there exists ᾱf > 0 such that

    f(x̄ + αd) < f(x̄), ∀ α ∈ (0, ᾱf].    (6.11)

Similarly, from the condition (6.9), there exist ᾱi > 0, i ∈ Ω, such that

    gi(x̄ + αd) < gi(x̄) = 0, ∀ α ∈ (0, ᾱi], i ∈ Ω.    (6.12)

From (6.10), d ∈ Di=(x̄), i ∈ Ω∗, which by Definition 6.1 implies that there exist ᾱi > 0, i ∈ Ω∗, such that

    gi(x̄ + αd) = gi(x̄) = 0, ∀ α ∈ (0, ᾱi], i ∈ Ω∗.    (6.13)

For i ∉ I(x̄), gi(x̄) < 0. As gi, i ∉ I(x̄), is continuous on Rn, there exist ᾱi > 0, i ∉ I(x̄), such that

    gi(x̄ + αd) < 0, ∀ α ∈ (0, ᾱi], i ∉ I(x̄).    (6.14)

Define ᾱ = min{ᾱf, ᾱ1, . . . , ᾱm}. Therefore, the conditions (6.12), (6.13), and (6.14) hold for ᾱ as well, which implies x̄ + ᾱd ∈ C, that is, x̄ + ᾱd is feasible for (CP). By the strict inequality (6.11),

    f(x̄ + ᾱd) < f(x̄),

thereby leading to the fact that x̄ is not a point of minimizer of (CP), as desired. 
We illustrate the above result with the following example. Consider the convex programming problem

    min −x1 + x2   subject to   x1 + x2 + 1 ≤ 0,   x2² ≤ 0.

Observe that x̄ = (−1, 0) is the point of minimizer of the above problem. The KKT optimality condition at x̄ is given by

    (−1, 1) + λ1 (1, 1) + λ2 (0, 0) = (0, 0),

which is not satisfied by any λi ≥ 0, i = 1, 2. For x̄, I(x̄) = {1, 2} with

    D1=(x̄) = {(d1, d2) ∈ R² : d1 + d2 = 0},
    D2=(x̄) = {(d1, d2) ∈ R² : d2 = 0}.

Now consider the following systems as in Theorem 6.3:

    −d1 + d2 < 0,   d1 + d2 = 0,   d2 = 0;    (CP∅)

    −d1 + d2 < 0,   d1 + d2 < 0,   d2 = 0;    (CP1)

    −d1 + d2 < 0,   0 < 0,   d1 + d2 = 0;    (CP2)

    −d1 + d2 < 0,   d1 + d2 < 0,   0 < 0.    (CPI(x̄))

Observe that all four systems are inconsistent. Therefore, by the above theorem, x̄ is the point of minimizer of the problem.
Now consider x̃ = (−2, 0), which is feasible for the problem, with I(x̃) = {2} and D2=(x̃) = D2=(x̄). For x̃, the system

    −d1 + d2 < 0,   0 < 0    (CPI(x̃))

is inconsistent, whereas

    −d1 + d2 < 0,   d2 = 0    (CP∅)

is consistent. Thus, by Theorem 6.3, x̃ is not the point of minimizer.
Theorem 6.3 was expressed in terms of the inconsistency of a system for
every subset Ω ⊂ I(x̄). Next we present the result of Ben-Tal, Ben-Israel, and
Zlobec [7] in terms of the Fritz John type optimality conditions. But before
establishing that result, we state the Dubovitskii–Milyutin Theorem, which
acts as a tool in the proof.

Proposition 6.4 Consider open blunt convex cones C1, C2, . . . , Cm and a convex cone Cm+1. Then

    ∩_{i=1}^{m+1} Ci = ∅

if and only if there exist yi ∈ Ci◦, i = 1, 2, . . . , m+1, not all simultaneously zero, such that

    y1 + y2 + . . . + ym + ym+1 = 0.

Theorem 6.5 Consider the convex programming problem (CP) with C given by (3.1). Let f and gi, i = 1, 2, . . . , m, be differentiable convex functions. Then x̄ is a point of minimizer of (CP) if and only if for every subset Ω ⊂ I(x̄) the system

    0 ∈ λ0 ∇f(x̄) + Σ_{i∈Ω} λi ∇gi(x̄) + (DΩ∗=(x̄))◦,
    λ0 ≥ 0, λi ≥ 0, i ∈ Ω, not all simultaneously zero    (CPΩ′)

is consistent, where

    DΩ∗=(x̄) = ∩_{i∈Ω∗} Di=(x̄)  if Ω∗ ≠ ∅,   and   DΩ∗=(x̄) = Rn  if Ω∗ = ∅.

Proof. From Theorem 6.3, x̄ is a point of minimum of (CP ) if and only


if for every subset Ω ⊂ I(x̄), the system (CPΩ ) is inconsistent, which by the
differentiability of f and gi , i ∈ Ω, along with Proposition 6.2 (iii) is equivalent
to
Df< (x̄) ∩ (∩_{i∈Ω} Di< (x̄)) ∩ DΩ∗= (x̄) = ∅,

where Di< (x̄) = Dg<i (x̄), i ∈ Ω. By Proposition 6.2 (c), Df< (x̄) and Di< (x̄), i ∈ Ω, are open blunt convex cones while DΩ∗= (x̄), being the intersection of convex cones, is itself a convex cone. Applying Propositions 6.2 (iv) and 2.80, for φ = f and φ = gi , i ∈ Ω,

(Dφ< (x̄))◦ = {y ∈ Rn : y = µ∇φ(x̄), µ ≥ 0}.

Now, applying the Dubovitskii–Milyutin Theorem, Proposition 6.4, the above emptiness is equivalent to the existence of multipliers λ0 ≥ 0, λi ≥ 0, i ∈ Ω, not all simultaneously zero, such that

0 ∈ λ0 ∇f (x̄) + Σ_{i∈Ω} λi ∇gi (x̄) + (DΩ∗= (x̄))◦ ,

thereby leading to the requisite result. 


Ben-Tal, Ben-Israel, and Zlobec [7] also dealt with the strictly convex case.
For more details, one can go through [7]. Observe that taking Ω = I(x̄) in
Theorem 6.5, the system (CPΩ′ ) reduces to the standard Fritz John optimality
condition. Similarly in Theorem 6.3, the system (CPΩ ) becomes

h∇f (x̄), di < 0,


h∇gi (x̄), di < 0, i ∈ I(x̄).


Similar to the notion of constraint qualification, they define the following


concept of regularization condition under which the result holds for Ω = I(x̄),
and the other subsets of I(x̄) need not be considered.

Definition 6.6 A condition is called a regularization condition at a point x̄


if, when assumed along with the convexity and differentiability conditions of
f and gi , i = 1, 2, . . . , m, the family {(CPΩ ) : Ω ⊂ I(x̄)} can be replaced by a
single system (CPI(x̄) ). Thus, x̄ is a point of minimum of (CP ) if and only if (CPI(x̄) ) is inconsistent or, equivalently, (CPI(x̄)′ ) is consistent. In this case, the Fritz John optimality condition is necessary as well as sufficient to characterize the point of minimum of (CP ).

In the example considered in this section, there is no regularization condition: in the case of x̃, the system (CPI(x̃) ) is inconsistent even though x̃ is not a minimizer, so one has to check the remaining systems, here (CP∅ ), to detect this.
As observed in Chapter 3, under the Slater constraint qualification, the
KKT optimality conditions are necessary as well as sufficient to check whether
a point is optimal or not. It has been shown in Ben-Tal, Ben-Israel, and
Zlobec [7] that the Slater constraint qualification acts as a regularization con-
dition for (CP ). We present the result below.

Proposition 6.7 Consider the convex programming problem (CP ) with C


given by (3.1). Let f and gi , i = 1, 2, . . . , m, be differentiable convex functions.
Then the Slater constraint qualification, that is, there exists x̂ ∈ Rn such that
gi (x̂) < 0, i = 1, 2, . . . , m, is a regularization condition for (CP ).

Proof. We prove the result by establishing the negation. From the definition
of regularization condition, it is equivalent to verifying that x̄ is not a point of
minimizer of (CP ) if and only if the system (CPI(x̄) ) is consistent. If (CPI(x̄) )
is consistent, then by Theorem 6.3, x̄ is not a point of minimizer of (CP ).
Conversely, suppose that x̄ is not a point of minimizer for (CP ). Again, by Theorem 6.3, there exist a subset Ω̄ ⊂ I(x̄) and d̄ ∈ Rn solving the system

h∇f (x̄), d̄i < 0,
h∇gi (x̄), d̄i < 0, i ∈ Ω̄,          (CPΩ̄ )
d̄ ∈ Di= (x̄), i ∈ Ω̄∗ = I(x̄)\Ω̄.
By Proposition 6.2 (i),

h∇gi (x̄), d̄i = 0, i ∈ Ω̄∗ . (6.15)

As x̂ satisfies the Slater constraint qualification, applying Theorem 2.81 to


gi , i ∈ I(x̄),

h∇gi (x̄), x̂ − x̄i ≤ gi (x̂) − gi (x̄) < 0, i ∈ I(x̄).

Define d̃ = d̄ + α(x̂ − x̄) for α > 0 sufficiently small. Then using the condition
(6.15), the system

h∇f (x̄), d̃i < 0,
h∇gi (x̄), d̃i < 0, i ∈ I(x̄),          (CPI(x̄) )

is consistent for d̃, thereby leading to the desired result.
Note that in the example considered in this section, the regularization
condition did not hold. As a matter of fact, the Slater constraint qualification
was not satisfied.

6.3 Geometric Optimality Condition: Nonsmooth Case


The work of Ben-Tal, Ben-Israel, and Zlobec [7] was extended by Wolkowicz [112] to the nonsmooth convex scenario. The latter not only studied the optimality conditions by avoiding constraint qualifications, but also gave a geometrical interpretation to what he termed badly behaved constraints. Before discussing the contributions of Wolkowicz [112] toward the convex programming problem (CP ) with the feasible set C given by (3.1), we introduce some notation. The equality set is given by
I = = {i ∈ {1, 2, . . . , m} : gi (x) = 0, ∀ x ∈ C}.
For x̄ ∈ C, define
I < (x̄) = I(x̄)\I = ,
where I(x̄) is the active index set at x̄. Observe that while I < (x̄) depends on
x̄, I = is independent of any x ∈ C. Using the direction notations presented in
the beginning of this chapter, Wolkowicz [112] defined the set of badly behaved
constraints.
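For instance, in the example of Section 6.2 the constraint g2 (x) = x2² vanishes at every feasible point, so I = = {2}, whereas g1 (x) = x1 + x2 + 1 is active at x̄ = (−1, 0) but not on all of C, so I < (x̄) = {1}.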
Definition 6.8 For x̄ ∈ C, the set of badly behaved constraints is given by
I b (x̄) = {i ∈ I = : (Di> (x̄) ∩ S(x̄)) \ cl ∩_{j∈I =} Dj= (x̄) 6= ∅},

where
S(x̄) = {d ∈ Rn : gi′ (x̄, d) ≤ 0, ∀ i ∈ I(x̄)}.
Recall that we introduced the set S(x̄) in Section 3.3 and proved in Proposi-
tion 3.9 that
(S(x̄))◦ = cl Ŝ(x̄),

where

Ŝ(x̄) = { Σ_{i∈I(x̄)} λi ξi : λi ≥ 0, ξi ∈ ∂gi (x̄), i ∈ I(x̄)}.


The set I b (x̄) is the set of constraints that create problems in KKT conditions.
A characterization of the above set in terms of the directional derivative was
stated by Wolkowicz [112] without proof. We present the result with proof for
a better understanding.

Theorem 6.9 Consider the convex programming problem (CP ) with C given
by (3.1). Let i∗ ∈ I = . Then i∗ ∈ I b (x̄) if and only if the system

gi′∗ (x̄, d) = 0,
gi′ (x̄, d) ≤ 0, ∀ i ∈ I(x̄)\i∗ ,          (CPb )
d 6∈ Di=∗ (x̄) ∪ cl ∩_{j∈I =} Dj= (x̄),

is consistent.

Proof. Suppose that i∗ ∈ I b (x̄), which implies there exists d∗ ∈ Rn such that

d∗ ∈ Di>∗ (x̄),   d∗ ∈ S(x̄),   d∗ 6∈ cl ∩_{j∈I =} Dj= (x̄).

As d∗ ∈ Di>∗ (x̄), d∗ 6∈ Di=∗ (x̄), which along with the last condition implies

d∗ 6∈ Di=∗ (x̄) ∪ cl ∩_{j∈I =} Dj= (x̄). (6.16)

Also, as d∗ ∈ Di>∗ (x̄), by Definition 6.1 there exists α∗ > 0 such that

gi∗ (x̄ + αd∗ ) > gi∗ (x̄), ∀ α ∈ (0, α∗ ].

Therefore,

lim_{α↓0} [gi∗ (x̄ + αd∗ ) − gi∗ (x̄)]/α ≥ 0,

which implies
gi′∗ (x̄, d∗ ) ≥ 0. (6.17)
Because d∗ ∈ S(x̄),
gi′ (x̄, d∗ ) ≤ 0, ∀ i ∈ I(x̄). (6.18)
In particular, taking i∗ ∈ I = ⊆ I(x̄) in the above inequality along with (6.17)
yields
gi′∗ (x̄, d∗ ) = 0. (6.19)
Combining the conditions (6.16), (6.18), and (6.19) together imply that d∗
solves the system (CPb ), thereby leading to its consistency.


Conversely, suppose that (CPb ) is consistent, which implies there exists


d∗ ∈ Rn such that

gi′∗ (x̄, d∗ ) = 0,
gi′ (x̄, d∗ ) ≤ 0, ∀ i ∈ I(x̄)\i∗ ,
d∗ 6∈ Di=∗ (x̄) ∪ cl ∩_{j∈I =} Dj= (x̄).

The first equality condition can be expressed as two inequalities given by

gi′∗ (x̄, d∗ ) ≤ 0 and gi′∗ (x̄, d∗ ) ≥ 0. (6.20)

Since i∗ ∈ I = ⊆ I(x̄), the above condition together with the second condition of the system yields

d∗ ∈ S(x̄). (6.21)

Also, from the inequality (6.20), there exists α∗ > 0 such that

gi∗ (x̄ + αd∗ ) ≥ gi∗ (x̄), ∀ α ∈ (0, α∗ ].

As d∗ 6∈ Di=∗ (x̄), the above inequality holds as a strict inequality and hence

d∗ ∈ Di>∗ (x̄). (6.22)


The conditions (6.21) and (6.22), along with the fact that d∗ 6∈ cl ∩_{j∈I =} Dj= (x̄), imply that i∗ ∈ I b (x̄), thereby establishing the desired result.
Observe that if Di=∗ (x̄) = {d ∈ Rn : gi′∗ (x̄, d) = 0}, then by the above
characterization of the badly behaved constraints, i∗ 6∈ I b (x̄). The class of
functions that are never badly behaved includes the class of all continuous lin-
ear functionals and the classical distance function. For more on badly behaved
constraints, one can refer to Wolkowicz [112].
Before moving any further, we present a few results from Wolkowicz [112]
that act as a tool in the derivation of the characterization for the point of
minimum.

Proposition 6.10 Consider the convex programming problem (CP ) with C


given by (3.1). Suppose that x̄ ∈ C. Then
(i) ∩_{i∈I(x̄)} Di≤ (x̄) = ∩_{i∈I =} Di= (x̄) ∩ ∩_{i∈I < (x̄)} Di≤ (x̄).

(ii) ∩_{i∈I =} Di= (x̄) ∩ ∩_{i∈I < (x̄)} Di< (x̄) 6= ∅.

Furthermore, suppose that the set Ω satisfies I b (x̄) ⊂ Ω ⊂ I = . If

either co ∩_{i∈Ω} Di= (x̄) is closed or Ω = I = ,

then

(iii) TC (x̄) = cl ∩_{i∈I(x̄)} Di≤ (x̄).

(iv) cl co ∩_{i∈Ω} Di= (x̄) ∩ S(x̄) = cl ∩_{i∈Ω} Di= (x̄) ∩ S(x̄) = cl ∩_{i∈I =} Di= (x̄) ∩ S(x̄).

(v) TC (x̄) = cl co ∩_{i∈Ω} Di= (x̄) ∩ S(x̄).

(vi) −co ∪_{i∈I < (x̄)} ∂gi (x̄) ∩ (∩_{i∈Ω} Di= (x̄))◦ = ∅.

Proof. (i) Observe that I(x̄) = I = ∪ I < (x̄), which implies


\
Di≤ (x̄) = {d ∈ Rn : there exists ᾱ > 0 such that
i∈I(x̄)

gi (x̄ + αd) ≤ gi (x̄), ∀ i ∈ I(x̄)}


\ \
= Di≤ (x̄) ∩ Di≤ (x̄).
i∈I = i∈I < (x̄)

For any d ∈ Di≤ (x̄), there exists ᾱ > 0 such that

gi (x̄ + αd) ≤ gi (x̄) = 0, α ∈ (0, ᾱ],

which implies x̄ + αd ∈ C for every α ∈ (0, ᾱ]. As for every i ∈ I = , gi (x) = 0


for every feasible point x ∈ C of (CP ), thereby implying that for every i ∈ I = ,

gi (x̄ + αd) = gi (x̄) = 0, α ∈ (0, ᾱ],

which implies Di≤ (x̄) = Di= (x̄) for every i ∈ I = . Therefore, by this condition,
\ \ \
Di≤ (x̄) = Di= (x̄) ∩ Di≤ (x̄),
i∈I(x̄) i∈I = i∈I < (x̄)

as desired.
(ii) If I(x̄) = ∅, the result holds trivially by (i). Suppose that I = and I < (x̄)
are nonempty. Then corresponding to any i ∈ I < , there exists some x̂ ∈ C
such that gi (x̂) < 0. By the convexity of gi , for every λ ∈ (0, 1],

gi (x̄ + λ(x̂ − x̄)) ≤ λgi (x̂) + (1 − λ)gi (x̄) < 0 = gi (x̄),

which implies that dˆ = x̂ − x̄ ∈ Di< (x̄). Also, suppose that there is some
j ∈ I < (x̄), j 6= i, then corresponding to j there exists some x̃ ∈ C such that
gj (x̃) < 0. Then as before, d˜ = x̃ − x̄ ∈ Dj< (x̄). Now if i and j are such that

gi (x̃) = 0 and gj (x̂) = 0,


then by the convexity of gi and gj , for every λ ∈ (0, 1),

gi (λx̂ + (1 − λ)x̃) ≤ λgi (x̂) + (1 − λ)gi (x̃) < 0,


gj (λx̂ + (1 − λ)x̃) ≤ λgj (x̂) + (1 − λ)gj (x̃) < 0,

which implies for λ ∈ (0, 1), (λx̂ + (1 − λ)x̃) − x̄ = λdˆ + (1 − λ)d˜ such that

λdˆ + (1 − λ)d˜ ∈ Di< (x̄) and λdˆ + (1 − λ)d˜ ∈ Dj< (x̄).

Proceeding as above, there exists d̄ ∈ Rn such that

d̄ ∈ Di< (x̄), ∀ i ∈ I < (x̄) (6.23)

with corresponding ᾱ > 0 such that x̄ + αd̄ ∈ C for every α ∈ (0, ᾱ]. Therefore, for every i ∈ I = ,

gi (x̄ + αd̄) = 0 = gi (x̄), ∀ α ∈ (0, ᾱ],

that is,

d̄ ∈ Di= (x̄), ∀ i ∈ I = ,

which along with the condition (6.23) proves the desired result.
(iii) Consider a feasible point x ∈ C of (CP ) that implies

gi (x) ≤ 0, ∀ i ∈ I(x̄).

By the convexity of gi , i ∈ I(x̄), for every λ ∈ (0, 1],

gi (x̄ + λ(x − x̄)) ≤ λgi (x) + (1 − λ)gi (x̄) ≤ 0 = gi (x̄), ∀ i ∈ I(x̄).


Therefore, x − x̄ ∈ ∩_{i∈I(x̄)} Di≤ (x̄) for every x ∈ C, which implies

(C − x̄) ⊂ ∩_{i∈I(x̄)} Di≤ (x̄).

As ∩_{i∈I(x̄)} Di≤ (x̄) is a cone,

cone (C − x̄) ⊂ ∩_{i∈I(x̄)} Di≤ (x̄). (6.24)

Suppose that d ∈ ∩_{i∈I(x̄)} Di≤ (x̄), which implies there exists ᾱ > 0 such that

gi (x̄ + αd) ≤ gi (x̄) = 0, ∀ α ∈ (0, ᾱ], ∀ i ∈ I(x̄).

For i 6∈ I(x̄), gi (x̄) < 0 and thus, there exists some α′ > 0 such that for any
d ∈ Rn ,

gi (x̄ + αd) < 0, ∀ α ∈ (0, α′ ), ∀ i 6∈ I(x̄).


Therefore, by the preceding inequalities, x′ = x̄ + αd ∈ C for α ∈ (0, α∗ ],


where α∗ = min{ᾱ, α′ }, which implies αd ∈ C − x̄, thereby leading to
d ∈ cone (C − x̄), which along with the condition (6.24) yields
\
Di≤ (x̄) = cone (C − x̄).
i∈I(x̄)

By Theorem 2.35,
\
TC (x̄) = cl cone (C − x̄) = cl Di≤ (x̄),
i∈I(x̄)

hence establishing the result.


(iv) By the given hypothesis Ω ⊂ I = , which implies that the containment
relation
\ \ \
cl Di= (x̄) ∩ S(x̄) ⊂ cl Di= (x̄) ∩ S(x̄) ⊂ cl co Di= (x̄) ∩ S(x̄) (6.25)
i∈I = i∈Ω i∈Ω

holds. To establish the result, we will prove the following:


\ \
(1) cl co Di= (x̄) ∩ S(x̄) ⊂ cl (co Di= (x̄) ∩ S(x̄)).
i∈Ω i∈Ω
T =
If co i∈Ω Di (x̄) is closed, then
\ \ \
cl co Di= (x̄) ∩ S(x̄) = co Di= (x̄) ∩ S(x̄) ⊂ cl (co Di= (x̄) ∩ S(x̄)),
i∈Ω i∈Ω i∈Ω

thereby establishing the above condition.


If Ω = I = , we prove
\ \
cl co Di= (x̄) ∩ S(x̄) ⊂ cl (co Di= (x̄) ∩ S(x̄)). (6.26)
i∈I = i∈I =
T
As S(x̄) is a closed convex set and i∈I = Di= (x̄) ⊂ S(x̄),
\
cl co Di= (x̄) ⊂ S(x̄).
i∈I =
T T
Also S(x) = i∈I = Si (x̄) ∩ i∈I < (x̄) Si (x̄), where

Si (x̄) = {d ∈ Rn : gi′ (x̄, d) ≤ 0}.

Therefore, establishing (6.26) is equivalent to proving


\ \ \ \
cl co Di= (x̄) ∩ Si (x̄) ⊂ cl (co Di= (x̄) ∩ Si (x̄)).
i∈I = i∈I < (x̄) i∈I = i∈I < (x̄)


By condition (ii), there exists d ∈ Rn such that


\ \ \ \
d∈ Di= (x̄) ∩ Di< (x̄) ⊂ co Di= (x̄) ∩ int Si (x̄),
i∈I = i∈I < (x̄) i∈I = i∈I < (x̄)

which yields the above condition.


\ \
(2) co Di= (x̄) ∩ S(x̄) = Di= (x̄) ∩ S(x̄).
i∈Ω i∈Ω

By Proposition 6.2 (i) and (iii),


\ \ ≤
Di= (x̄) ⊂ Di (x̄).
i∈Ω i∈Ω

Because Di≤ (x̄) is convex,


\ \
co Di= (x̄) ⊂ Di≤ (x̄). (6.27)
i∈Ω i∈Ω

As Ω ⊂ I = , for every feasible point x ∈ C, gi (x) = 0, i ∈ Ω. For any


d ∈ Di≤ (x̄), i ∈ Ω, there exists ᾱi > 0 such that

gi (x̄ + αd) ≤ gi (x̄) = 0, ∀ α ∈ (0, ᾱi ],

which implies x̄ + αd ∈ C. Therefore, for any i ∈ Ω,

gi (x̄ + αd) = 0, ∀ α ∈ (0, ᾱi ],

thereby implying that d ∈ Di≤ (x̄), i ∈ Ω. Thus, the condition (6.27) becomes
\ \ \
co Di= (x̄) ⊂ Di= (x̄) ⊂ co Di= (x̄).
i∈Ω i∈Ω i∈Ω
T
The above relation implies that i∈Ω Di= (x̄) is convex, thereby leading to
\ \
co Di= (x̄) ∩ S(x̄) = Di= (x̄) ∩ S(x̄),
i∈Ω i∈Ω
T
as desired. Note that, in particular, for Ω = I = , i∈I = Di= (x̄) is convex.
\ \
(3) cl ( Di= (x̄) ∩ S(x̄)) ⊂ cl Di= (x̄) ∩ S(x̄).
i∈Ω i∈I =
Suppose that Ω ( I = . We claim that
\ \
Di= (x̄) ∩ S(x̄) ⊂ cl Di= (x̄) ∩ S(x̄).
i∈Ω i∈I =

Assume on the contrary that there exists d ∈ Rn such that


\ \
d∈ Di= (x̄) ∩ S(x̄) \ (cl Di= (x̄) ∩ S(x̄)).
i∈Ω i∈I =


By the given hypothesis, there exists Ω̃ ⊂ I = \ Ω such that


\
d ∈ S(x̄), d ∈ Di= (x̄)
i∈I = \Ω̃

but
\
/ Di= (x̄) ∪ cl
d∈ Di= (x̄), ∀ i ∈ Ω̃.
i∈I =

By the hypothesis I b (x̄) ⊂ Ω ⊂ I = , Ω̃ ⊂ I = \ Ω ⊂ I = \ I b (x̄), which implies Ω̃ ∩ I b (x̄) = ∅. Invoking Theorem 6.9, for every i ∈ Ω̃ the corresponding system (CPb ) is inconsistent and thus

gi′ (x̄, d) < 0, ∀ i ∈ Ω̃.

Therefore, \ \
d∈ Di< (x̄) ∩ Di= (x̄). (6.28)
i∈Ω̃ i∈I = \Ω̃

By (ii), as
\ \
Di= (x̄) ∩ Di< (x̄) 6= ∅,
i∈I = i∈I < (x̄)

there exists d¯ ∈ Rn such that


\ \
d¯ ∈ Di= (x̄) ∩ Di< (x̄). (6.29)
i∈I = i∈I < (x̄)

Define dλ = λd + (1 − λ)d̄. By condition (6.28), for i ∈ Ω̃ there exists ᾱi > 0 such that

gi (x̄ + αd) < gi (x̄), ∀ α ∈ (0, ᾱi ]. (6.30)

As Ω̃ ⊂ I = , by condition (6.29), for i ∈ Ω̃ there exists α̂i > 0 such that

gi (x̄ + αd̄) = gi (x̄), ∀ α ∈ (0, α̂i ]. (6.31)

Denote αi = min{ᾱi , α̂i }. By the convexity of gi , i ∈ Ω̃, along with conditions (6.30) and (6.31), for λ ∈ (0, 1],

gi (x̄ + αdλ ) ≤ λgi (x̄ + αd) + (1 − λ)gi (x̄ + αd̄) < gi (x̄), ∀ α ∈ (0, αi ],

which implies \
dλ ∈ Di< (x̄), ∀ λ ∈ (0, 1]. (6.32)
i∈Ω̃

Again from (6.28),


\ \
d∈ Di= (x̄) ⊂ Di≤ (x̄),
i∈I = \Ω̃ i∈I = \Ω̃


and from (6.29),


\ \ \
d¯ ∈ Di= (x̄) ⊂ Di= (x̄) ⊂ Di≤ (x̄).
i∈I = i∈I = \Ω̃ i∈I = \Ω̃

Because Di≤ (x̄), i ∈ I = \ Ω̃ are convex sets,


\
dλ ∈ Di≤ (x̄), ∀ λ ∈ (0, 1). (6.33)
i∈I = \Ω̃

By Theorem 2.69, gi , i ∈ I < (x̄), is continuous on Rn , which along with con-


dition (6.29) implies that there exists β ∈ (0, 1) such that
\
dλ ∈ Di< (x̄), λ ∈ (0, β]. (6.34)
i∈I < (x̄)

Observe that

I(x̄) = I < (x̄) ∪ I = \ Ω̃ ∪ Ω̃,

which along with (i) leads to


\ \ \ \ \
Di≤ (x̄) ∩ Di< (x̄) = Di≤ (x̄) ∩ Di≤ (x̄) ∩ Di< (x̄).
i∈I(x̄) i∈Ω̃ i∈I < (x̄) i∈I = \Ω̃ i∈Ω̃

Therefore, combining (6.32), (6.33), and (6.34) along with the above relation
yields
\ \
dλ ∈ Di≤ (x̄) ∩ Di< (x̄).
i∈I(x̄) i∈Ω̃

As Ω̃ ⊂ I = , which along with (i) implies


\ \ \ \ \
Di≤ (x̄) = Di≤ (x̄) ∩ Di= (x̄) ⊂ Di≤ (x̄) ∩ Di= (x̄).
i∈I(x̄) i∈I < (x̄) i∈I = i∈I < (x̄) i∈Ω̃

Thus,
\
dλ ∈ Di= (x̄),
i∈Ω̃

which is a contradiction to
\
dλ ∈ Di< (x̄).
i∈Ω̃

Therefore,
\ \
Di= (x̄) ∩ S(x̄) ⊂ cl Di= (x̄) ∩ S(x̄).
i∈Ω i∈I =

\
Because cl Di= (x̄) and S(x̄) are closed sets,
i∈I =
\ \
cl ( Di= (x̄) ∩ S(x̄)) ⊂ cl Di= (x̄) ∩ S(x̄),
i∈Ω i∈I =

thereby establishing the desired result when Ω ( I = .


If Ω = I = ,
\ \
Di= (x̄) ∩ S(x̄) ⊂ cl Di= (x̄) ∩ S(x̄),
i∈I = i∈I =

thus yielding the desired condition as before.

From the conditions (1) through (3), it is easy to observe that


\ \
cl co Di= (x̄) ∩ S(x̄) ⊂ cl (co Di= (x̄) ∩ S(x̄))
i∈Ω i∈Ω
\ \
= cl ( Di= (x̄) ∩ S(x̄)) ⊂ cl Di= (x̄) ∩ S(x̄),
i∈Ω i∈I =

which along with (6.25) yields the requisite result.


(v) Using (iii) and (iv), it is enough to show that
\ \
cl Di≤ (x̄) = cl ( Di= (x̄) ∩ S(x̄)).
i∈I(x̄) i∈I =

From (ii) and Proposition 6.2 (iii), it is obvious that


\ \
cl Di≤ (x̄) ⊂ cl ( Di= (x̄) ∩ S(x̄)). (6.35)
i∈I(x̄) i∈I =

To prove the result, we claim that


\ \
Di= (x̄) ∩ S(x̄) ⊂ cl Di≤ (x̄). (6.36)
i∈I = i∈I(x̄)
T
Suppose that d ∈ i∈I = Di= (x̄) ∩ S(x̄). By (ii), there exists d¯ ∈ Rn such that
\ \
d¯ ∈ Di= (x̄) ∩ Di< (x̄).
i∈I = i∈I < (x̄)

Denote dλ = λd + (1 − λ)d̄. Therefore, by Theorem 2.79 and Proposition 6.2, for every ξ ∈ ∪_{i∈I < (x̄)} ∂gi (x̄),

hξ, dλ i = λhξ, di + (1 − λ)hξ, d̄i < 0, ∀ λ ∈ [0, 1),


which again by Theorem 2.79 implies that for every i ∈ I < (x̄),

gi′ (x̄, dλ ) < 0, ∀ λ ∈ [0, 1).

Therefore, by Proposition 6.2 (iii),


\
dλ ∈ Di< (x̄), ∀ λ ∈ [0, 1). (6.37)
i∈I < (x̄)

T
Also, by the convexity of i∈I = Di= (x̄),
\
dλ ∈ Di= (x̄), ∀ λ ∈ [0, 1). (6.38)
i∈I =

Thus, by the relations (6.37) and (6.38), along with (i), we obtain
\ \ \
dλ ∈ Di< (x̄) ∩ Di= (x̄) ⊂ Di≤ (x̄), ∀ λ ∈ [0, 1).
i∈I < (x̄) i∈I = i∈I(x̄)

T
As the limit λ → 1, dλ → d, which implies d ∈ cl i∈I(x̄) Di≤ (x̄), thus proving
(6.36), which yields that
\ \
cl ( Di= (x̄) ∩ S(x̄)) ⊂ cl Di≤ (x̄).
i∈I = i∈I(x̄)

The above condition along with (6.35) establishes the desired result.
(vi) Define
[
F = −co ∂gi (x̄).
i∈I < (x̄)

We will prove the result by contradiction. Assume that


\
F ∩( Di= (x̄))◦ 6= ∅,
i∈Ω

which implies there exists


\
ξ ∈F ∩( Di= (x̄))◦ .
i∈Ω
P
As ξ ∈ F , there exists ξi ∈ ∂gi (x̄) and λi ≥ 0, i ∈ I < (x̄) with i∈I < (x̄) λi = 1
such that
X
ξ=− λi ξi .
i∈I < (x̄)


By Proposition 2.31,
\
Di= (x̄) ⊂ {ξ}◦ = {d ∈ Rn : hξ, di ≤ 0},
i∈Ω
−F ◦ ⊂ −{ξ}◦ = {d ∈ Rn : hξ, di ≥ 0}.

Therefore,
\
−F ◦ ∩ Di= (x̄) ⊂ {d ∈ Rn : hξ, di = 0}.
i∈Ω

By (ii), there exists


\ \
dˆ ∈ Di= (x̄) ∩ Di< (x̄)
i∈I = i∈I < (x̄)
\
⊂ Di= (x̄) ∩ −F ◦ ⊂ {d ∈ Rn : hξ, di = 0},
i∈Ω

that is, hξ, d̂i = 0. As d̂ ∈ ∩_{i∈I < (x̄)} Di< (x̄), there exist ᾱi > 0, i ∈ I < (x̄), such that

gi (x̄ + αi d̂) < 0, ∀ αi ∈ (0, ᾱi ], ∀ i ∈ I < (x̄).

By the convexity of gi , i ∈ I < (x̄), for every αi ∈ (0, ᾱi ],

αi hξi , d̂i ≤ gi (x̄ + αi d̂) − gi (x̄) < 0, ∀ i ∈ I < (x̄),

which implies

hξ, d̂i = − Σ_{i∈I < (x̄)} λi hξi , d̂i > 0,

which contradicts hξ, d̂i = 0, thereby leading to the requisite result.


Wolkowicz [112] derived a characterization in the form of KKT type optimality conditions. Before presenting that result, we state a lemma that will be required in its proof.

Lemma 6.11 Consider the convex programming problem (CP ) with C given
by (3.1). Suppose that x̄ ∈ C and F ⊂ Rn is any nonempty set. Then the statement

x̄ is a point of minimizer of (CP ) if and only if the system

0 ∈ ∂f (x̄) + Σ_{i∈I(x̄)} λi ∂gi (x̄) + F,
λi ≥ 0, i ∈ I(x̄),                         (6.39)

is consistent


holds for any objective function f if and only if F satisfies


NC (x̄) = Ŝ(x̄) + F. (6.40)

Proof. Suppose that the statement is satisfied for any fixed objective func-
tion. We will prove the condition (6.40). Consider ξ ∈ NC (x̄) and define the
objective function as f (x) = −hξ, xi. Then ξ ∈ −∂f (x̄)∩NC (x̄), which implies

0 ∈ ∂f (x̄) + NC (x̄).

By the optimality conditions for (CP ), Theorem 3.1 (ii), x̄ is a point of min-
imizer of (CP ). Therefore, by (6.39) along with ∂f (x̄) = {−ξ} leads to

b
ξ ∈ S(x̄) + F,

that is,
b
NC (x̄) ⊂ S(x̄) + F. (6.41)
b
Now suppose that ξ ∈ S(x̄) + F , which implies there exist ξi ∈ ∂gi (x̄) and
λi ≥ 0 for i ∈ I(x̄) such that
X
ξ− λi ξi ∈ F.
i∈I(x̄)

Again define the objective function as f (x) = −hξ, xi, which implies
∂f (x̄) = {−ξ}. By the above condition it is obvious that the condition (6.39)
is satisfied and thus by the statement, x̄ is a point of minimizer of (CP ).
Applying Theorem 3.1, −ξ ∈ NC (x̄), which implies

b
S(x̄) + F ⊂ NC (x̄).

The above containment along with the relation (6.41) yields the desired con-
dition (6.40).
Conversely, suppose that (6.40) holds. By Theorem 3.1 (ii), x̄ is a point of
minimizer of (CP ) if and only if

0 ∈ ∂f (x̄) + NC (x̄),

which by (6.40) is equivalent to

b
0 ∈ ∂f (x̄) + S(x̄) + F,

that is, the system (6.39) is consistent, thereby completing the proof. 
b
As mentioned in the beginning of this section, (S(x̄))◦ = cl S(x̄) by Propo-
b
sition 3.9. Therefore, if S(x̄) is closed, condition (6.40) becomes

NC (x̄) = (S(x̄))◦ + F.


A similar result as the above theorem was studied by Gould and Tolle [53]
under the assumption of differentiability of the functions but not necessarily
convex.
Applying the above lemma along with some additional conditions, Wolkow-
icz [112] established KKT type optimality conditions. We present the result
below.

Theorem 6.12 Consider the convex programming problem (CP ) with C


given by (3.1) and x̄ ∈ C. Suppose that the set Ω satisfies

I b (x̄) ⊂ Ω ⊂ I =

and both the sets


co ∩_{i∈Ω} Di= (x̄) and Ŝ(x̄) + (∩_{i∈Ω} Di= (x̄))◦

are closed. Then x̄ is a point of minimizer of (CP ) if and only if the system
0 ∈ ∂f (x̄) + Σ_{i∈I(x̄)} λi ∂gi (x̄) + (∩_{i∈Ω} Di= (x̄))◦ ,
λi ≥ 0, i ∈ I(x̄),                         (6.42)
is consistent.

Proof. Observe that the system (6.42) is obtained, in particular, by taking F = (∩_{i∈Ω} Di= (x̄))◦ in Lemma 6.11. Thus, to establish the result, it is sufficient to prove that

NC (x̄) = Ŝ(x̄) + (∩_{i∈Ω} Di= (x̄))◦ . (6.43)

By Proposition 6.10 (v),


\
TC (x̄) = S(x̄) ∩ cl co Di= (x̄),
i∈Ω

which by Propositions 2.31 and 3.9 imply that


\
NC (x̄) = cl ((S(x̄)◦ + (cl co Di= (x̄))◦ )
i∈Ω
\
b
= cl (S(x̄) +( Di= (x̄))◦ ).
i∈Ω

The closedness assumption leads to the condition (6.43), thereby yielding the
requisite result. 
In the above theorem, the closedness conditions on the sets
\ \
co b
Di= (x̄) and S(x̄) +( Di= (x̄))◦
i∈Ω i∈Ω


act as a constraint qualification. If, in particular, we choose Ω = I = , then the


closedness conditions are no longer needed. In fact,
\
b
NC (x̄) = S(x̄) +( Di= (x̄))◦
i∈I =

is always satisfied. Below we present the result for this particular case.

Theorem 6.13 x̄ is a minimum of (CP ) if and only if the system


0 ∈ ∂f (x̄) + Σ_{i∈I(x̄)} λi ∂gi (x̄) + (∩_{i∈I =} Di= (x̄))◦ ,
λi ≥ 0, i ∈ I(x̄),
is consistent.

Proof. By Theorem 3.1 (ii), x̄ ∈ C is a point of minimizer if and only if

0 ∈ ∂f (x̄) + NC (x̄).

In order to establish the result, it is enough to show that


\
b
NC (x̄) = S(x̄) +( Di= (x̄))◦ . (6.44)
i∈I =

Observe that int Di≤ (x̄) = Di< (x̄) for every i ∈ I < (x̄). Thus, invoking Propo-
sitions 2.31 and 6.10 implies
X \
NC (x̄) = TC (x̄)◦ = (Di≤ (x̄))◦ + ( Di= (x̄))◦ .
i∈I < (x̄) i∈I =

Again by Proposition 6.10 (ii), Di< (x̄) 6= ∅, which along with Proposi-
tion 6.2 (iv) yields
X \
NC (x̄) = { λi ∂gi (x̄) : λi ≥ 0, i ∈ I < (x̄)} + ( Di= (x̄))◦ .
i∈I < (x̄) i∈I =

Choosing λi = 0, i ∈ I = , the above condition leads to


\
b
NC (x̄) ⊂ S(x̄) +( Di= (x̄))◦ . (6.45)
i∈I =

Propositions 3.9, 2.31, and 6.10 imply that


\
b
S(x̄) ⊂ (S(x̄))◦ = ( Di≤ (x̄))◦ = NC (x̄). (6.46)
i∈I(x̄)

Again, by Proposition 6.10,


\ \
( Di= (x̄))◦ ⊂ ( Di≤ (x̄))◦ = NC (x̄).
i∈I = i∈I(x̄)


As NC (x̄) is a closed convex cone, the above relation along with (6.46) leads
to
\
b
S(x̄) +( Di= (x̄))◦ ⊂ NC (x̄),
i∈I =

which together with (6.45) yields the desired condition (6.44). 
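To illustrate the role of the polar term, consider the example of Section 6.2 at x̄ = (−1, 0). There I = = {2} and ∩_{i∈I =} Di= (x̄) = {d ∈ R2 : d2 = 0}, whose polar is {0} × R. Although the standard KKT condition fails at x̄, the system of Theorem 6.13 is consistent: (−1, 1) + 1 · (1, 1) + 0 · (0, 0) + (0, −2) = (0, 0).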


In all these discussions, the notion of constraint qualification was not considered. Observe that in Theorem 6.13, instead of the standard KKT optimality conditions, Wolkowicz [112] derived KKT type optimality conditions involving the set ∩_{i∈I =} Di= (x̄). The system reduces to the standard KKT optimality conditions if (∩_{i∈I =} Di= (x̄))◦ = {0}, that is, F = {0} in system (6.39) of Lemma 6.11. Similar to the regularization condition of Ben-Tal, Ben-Israel, and Zlobec [7], Wolkowicz [112] introduced the notion of regular point and weakest constraint qualification.

Definition 6.14 A feasible point x̄ ∈ C of (CP ) is a regular point if for


any objective function f , the system (6.39) holds for F = {0}. A constraint
qualification that is satisfied if and only if x̄ is a regular point is known as the
weakest constraint qualification.

For the differentiable case, Gould and Tolle [52, 53] showed that the Abadie
constraint qualification, that is,

TC (x̄) = S(x̄)

is a weakest constraint qualification. Under the differentiability of the func-


b
tions gi , i ∈ I(x̄), the set S(x̄) is closed, which along with the Abadie con-
straint qualification is equivalent to
b
NC (x̄) = S(x̄),

which is a weakest constraint qualification. For the nonsmooth case, as dis-


cussed in Theorem 3.10, the Abadie constraint qualification along with the
b
assumption that S(x̄) is closed leads to the standard KKT conditions. In fact,
the Abadie constraint qualification is equivalent to the emptiness of the class
of badly behaved constraints I b (x̄). We present the result below.

Proposition 6.15 Let x̄ ∈ C. Then TC (x̄) = S(x̄) if and only if I b (x̄) = ∅.

Proof. Suppose that I b (x̄) = ∅. Therefore by Proposition 6.10 (iii) and (v),
it is obvious that TC (x̄) = S(x̄).
Conversely, let I b (x̄) 6= ∅, which implies there exists i∗ ∈ I b (x̄) such that i∗ ∈ I = and there exists

v ∗ ∈ (Di>∗ (x̄) ∩ S(x̄)) \ cl ∩_{j∈I =} Dj= (x̄).


Again by Proposition 6.10,


\
v∗ ∈
/ cl Di= (x̄) ∩ S(x̄) = TC (x̄),
i∈I =

which implies TC (x̄) 6= S(x̄), thereby proving the result. 


Now we illustrate the above result by examples. Consider

C = {x ∈ R : x² ≤ 0, x ≤ 0}

with g1 (x) = x² and g2 (x) = x. Observe that C = {0}. For x̄ = 0, TC (x̄) = {0}, and I(x̄) = I = = {1, 2}. Here,

S(x̄) = {v ∈ R : g1′ (x̄, v) ≤ 0, g2′ (x̄, v) ≤ 0}


= {v ∈ R : h∇g1 (x̄), vi ≤ 0, h∇g2 (x̄), vi ≤ 0}
= {v ∈ R : v ≤ 0}.

Thus, TC (x̄) 6= S(x̄), thereby showing that the Abadie constraint qualification
is not satisfied. Also by the definitions of the cones of directions, we have

D1> (x̄) = {v ∈ R : v 6= 0},


D2> (x̄) = {v ∈ R : v > 0},
D1= (x̄) = {0} = D2= (x̄).

Observe that I b (x̄) = {1}, that is, the set of badly behaved constraints is
nonempty.
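For instance (an illustration, not from [112]), take the objective f (x) = x. Then x̄ = 0 is the minimizer of f over C, but ∂f (x̄) + λ1 ∂g1 (x̄) + λ2 ∂g2 (x̄) = {1 + λ2 } never contains 0 for λ1 , λ2 ≥ 0, so the KKT conditions fail at x̄ and x̄ is not a regular point.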
Next let us consider the set

C = {x ∈ R : |x| ≤ 0, x ≤ 0}.

Recall from Chapter 3 that the Abadie constraint qualification is satisfied at


x̄ = 0 with S(x̄) = {0}. Here also, the cones of directions are the same as that
of the previous example but now Di> (x̄) ∩ S(x̄) = ∅, thereby showing that the
set of badly behaved constraints I b (x̄) is empty.
Wolkowicz [112] gave an equivalent characterization of the regular point
with Abadie constraint qualification and the set of badly behaved constraints
I b (x̄). We state the result below. The proof can be worked out using Theo-
rem 3.10 and Proposition 6.15.

Theorem 6.16 Consider the convex programming problem (CP ) with C


given by (3.1) and let x̄ ∈ C. Then the following are equivalent:

(i) x̄ is a regular point,


b
(ii) Abadie constraint qualification holds at x̄ and S(x̄) is closed,
b
(iii) I b (x̄) is empty and S(x̄) is closed.


In Chapter 3 we derived the optimality conditions not only under the


Abadie constraint qualification, but also the Slater constraint qualification.
It was observed by Wolkowicz [112] that the Slater constraint qualification is
a weakest constraint qualification with respect to the Fritz John optimality
condition, which we present below.
Theorem 6.17 Consider the convex programming problem (CP ) with C
given by (3.1). Then the Slater constraint qualification is a weakest constraint
qualification.
Proof. By Definition 6.14, the Slater constraint qualification is a weakest
constraint qualification if and only if x̄ is a regular point. Consider the Fritz
John optimality condition for (CP ); that is, if x̄ ∈ C is a point of minimizer
of (CP ), then there exist λi ≥ 0, i ∈ {0} ∪ I(x̄), not all simultaneously zero
such that
X
0 ∈ λ0 ∂f (x̄) + λi ∂gi (x̄).
i∈I(x̄)

Suppose that the Slater constraint qualification is satisfied, that is, there exists
x̂ ∈ Rn such that gi (x̂) < 0, i = 1, 2, . . . , m. We claim that λ0 6= 0. On the
contrary, assume that λ0 = 0. Then the above condition implies that there
exist λi ≥ 0, i ∈ I(x̄), not all simultaneously zero, such that
X
0∈ λi ∂gi (x̄),
i∈I(x̄)

which implies that there exist ξi ∈ ∂gi (x̄), i ∈ I(x̄), such that
X
0= λi ξi . (6.47)
i∈I(x̄)

By the convexity of gi , i ∈ I(x̄),


hξi , x̂ − x̄i ≤ gi (x̂) − gi (x̄) < 0, ∀ ξi ∈ ∂gi (x̄),
which along with the condition (6.47) leads to a contradiction. Thus, λ0 6= 0
and hence can be normalized to one, thereby leading to the KKT optimality
conditions.
Observe that the KKT optimality condition holds at x̄ if the system
X 
0 ∈ ∂f (x̄) + λi ∂gi (x̄), 
i∈I(x̄)

λi ≥ 0, i ∈ I(x̄),
is consistent for any f , which is equivalent to the inconsistency of the system
X 
0∈ λi ∂gi (x̄), 
i∈I(x̄)

λi ≥ 0, i ∈ I(x̄), not all simultaneously zero.


Thus, the inconsistency of the above system is equivalent to


[
0 6∈ cone co ∂gi (x̄). (6.48)
i∈I(x̄)

We claim that the above condition is equivalent to the Slater constraint


qualification. Suppose that the condition (6.48) holds. Because dom gi = Rn , i ∈ I(x̄), by Proposition 2.83, ∂gi (x̄) is a nonempty compact set. As I(x̄) ⊂ {1, 2, . . . , m} is finite, ∪_{i∈I(x̄)} ∂gi (x̄) is also nonempty compact. Also, as

∪_{i∈I(x̄)} ∂gi (x̄) ⊂ cone co ∪_{i∈I(x̄)} ∂gi (x̄),

the condition (6.48) implies that


[
0∈
/ ∂gi (x̄).
i∈I(x̄)

Invoking Proposition 3.4,


[
cone co ∂gi (x̄)
i∈I(x̄)

is a closed set. Invoking the Strict Separation Theorem, Theorem 2.26 (iii),
there exists d¯ ∈ Rn and d¯ 6= 0 such that
[
¯ < 0, ∀ z ∈ cone co
hz, di ∂gi (x̄).
i∈I(x̄)

In particular, for ξi ∈ ∂gi (x̄), i ∈ I(x̄), the above inequality leads to

hξi , d̄i < 0.

As dom gi = Rn , i ∈ I(x̄), by Theorem 2.79, for i ∈ I(x̄),

max_{ξi ∈∂gi (x̄)} hξi , d̄i = gi′ (x̄, d̄) < 0,

which implies

lim_{λ↓0} [gi (x̄ + λd̄) − gi (x̄)]/λ = lim_{λ↓0} gi (x̄ + λd̄)/λ < 0.
Therefore, for every sufficiently small λ > 0,

gi (x̄ + λd̄) < 0, ∀ i ∈ I(x̄). (6.49)

For i 6∈ I(x̄), gi (x̄) < 0. Because dom gi = Rn , i 6∈ I(x̄), by Theorem 2.69, gi , i 6∈ I(x̄), is continuous over Rn . Thus, there exists λ̄ > 0, chosen small enough that (6.49) also holds at λ = λ̄, such that

gi (x̄ + λ̄d̄) < 0, ∀ i 6∈ I(x̄). (6.50)

Combining (6.49) and (6.50), for x̄ + λ̄d̄ ∈ Rn ,

gi (x̄ + λ̄d̄) < 0, ∀ i = 1, 2, . . . , m,

which implies that the Slater constraint qualification holds.


Conversely, suppose that the Slater constraint qualification holds, that
is, there exists x̂ ∈ Rn such that gi (x̂) < 0, i = 1, 2, . . . , m. By Definition 2.77 of subdifferentiability, for any ξi ∈ ∂gi (x̄), i ∈ I(x̄),

hξi , x̂ − x̄i ≤ gi (x̂) − gi (x̄) = gi (x̂) < 0,

which implies that


[
hz, x̂ − x̄i < 0, ∀ z ∈ cone co ∂gi (x̄).
i∈I(x̄)
S
Therefore, z 6= 0 for any z ∈ cone co i∈I(x̄) ∂gi (x̄), thereby establishing
(6.48). Hence, the Slater constraint qualification is a weakest constraint qual-
ification. 
In both these approaches, one makes use of the direction sets to establish
optimality conditions in the absence of any constraint qualification for the
convex programming problem (CP ). More recently, Jeyakumar and Li [69]
studied a class of sublinear programming problems involving separable sub-
linear constraints in the absence of any constraint qualification, which we
discuss in the next section.

6.4 Separable Sublinear Case


As already mentioned, the sublinear programming problem considered by
Jeyakumar and Li [69] involved separable sublinear constraints. So before
moving ahead with the problem, we state the concept of separable sublinear
function.

Definition 6.18 A sublinear function p : Rn → R is called a separable sub-


linear function if
p(x) = Σ_{j=1}^{n} pj (xj )

with each pj : R → R, j = 1, 2, . . . , n being a sublinear function.
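For instance, p(x) = Σ_{j=1}^{n} |xj | is a separable sublinear function with pj (t) = |t| for every j, whereas the Euclidean norm kxk is sublinear but, for n ≥ 2, cannot be written in this separable form.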


The sublinear programming problem studied by Jeyakumar and Li [69] is


min p0 (x) subject to pi (x) ≤ bi , i = 1, 2, . . . , m (SP )
where p0 : Rn → R is a sublinear function, pi : Rn → R, i = 1, 2, . . . , m, are separable sublinear functions, and bi ∈ R, i = 1, 2, . . . , m. Before establishing
the optimality conditions for (SP ), we first present the Farkas’ Lemma derived
by Jeyakumar and Li [69]. Farkas’ Lemma acts as a tool in the study of
optimality conditions for (SP ) in the absence of any constraint qualification.

Theorem 6.19 Consider the sublinear function p̃0 : Rn → R and separable


sublinear functions p̃i : Rn → R, i = 1, 2, . . . , m. Then the following are
equivalent:

(i) x ∈ Rn , p̃i (x) ≤ 0, i = 1, 2, . . . , m ⇒ p̃0 (x) ≥ 0,

(ii) There exist λi ≥ 0, i = 1, 2, . . . , m, such that


p̃0 (x) + Σ_{i=1}^{m} λi p̃i (x) ≥ 0, ∀ x ∈ Rn .
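For instance, with n = m = 1, p̃0 (x) = −x and p̃1 (x) = x, condition (i) holds because x ≤ 0 implies −x ≥ 0, and condition (ii) holds with λ1 = 1 because p̃0 (x) + p̃1 (x) = 0 for every x ∈ R.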

Proof. Suppose that condition (i) holds. We claim that condition (ii) is also
satisfied. On the contrary, assume that (ii) does not hold, which along with
the fact that for a real-valued sublinear function p : Rn → R, p(0) = 0 implies
that for any λi ≥ 0, i = 1, 2, . . . , m, x̄ = 0 is not a point of minimizer of the
unconstrained problem
m
X
min p̃0 (x) + λi p̃i (x) subject to x ∈ Rn .
i=1

As sublinear functions are a special class of convex functions, the sublinear


programming problem (SP ) is also a convex programming problem for which
the KKT optimality conditions are necessary as well as sufficient for the point
of minimizer. Therefore, the KKT optimality condition does not hold at x̄ = 0,
that is,
m
X
0∈
/ ∂(p̃0 + λi p̃i )(0).
i=1

As dom p̃i = Rn , i = 0, 1, . . . , m, by Theorem 2.69, p̃i , i = 0, 1, . . . , m, are


continuous on Rn . Applying the Sum Rule, Theorem 2.91, the above condition
becomes
m
X
0 6∈ ∂ p̃0 (0) + λi ∂ p̃i (0),
i=1


thereby implying ∂ p̃0 (0) ∩ (−P ) = ∅, where


Xm
P ={ λi ∂ p̃i (0) : λi ≥ 0, i = 1, 2, . . . , m}.
i=1

As p̃i , i = 1, 2, . . . , m, are separable sublinear functions,


n
X
p̃i (x) = p̃ij (xj ), i = 1, 2, . . . , m,
j=1

where p̃ij are sublinear functions on R. Thus,


Xm
P ={ λi (∂ p̃i1 (0) × ∂ p̃i2 (0) × . . . × ∂ p̃in (0)) : λi ≥ 0, i = 1, 2, . . . , m}.
i=1

As p̃ij : R → R, by Proposition 2.83, ∂ p̃ij is a nonempty convex and compact


set in R, that is,

∂ p̃ij (0) = [lij , uij ], i = 1, 2, . . . , m, j = 1, 2, . . . , n,

for some lij , uij ∈ R with lij ≤ uij . Therefore,


Xm
P ={ λi ([li1 , ui1 ] × [li2 , ui2 ] × . . . × [lin , uin ]) : λi ≥ 0, i = 1, 2, . . . , m}
i=1
m
[
= cone co ([li1 , ui1 ] × [li2 , ui2 ] × . . . × [lin , uin ])
i=1
[m
= cone co { (ai1 , ai2 , . . . , ain ) : aij ∈ [lij , uij ], j = 1, 2, . . . , n}.
i=1

Note that [li1 , ui1 ] × [li2 , ui2 ] × . . . × [lin , uin ] forms a convex polytope in Rn with 2^n vertices denoted by

(vi^r ) = (vi1^r , vi2^r , . . . , vin^r ), i = 1, 2, . . . , m, r = 1, 2, . . . , 2^n ,

where vij^r ∈ {lij , uij }. Also, any element in the polytope can be expressed as a convex combination of the vertices. Therefore,

(ai1 , ai2 , . . . , ain ) ∈ co{(vi^1 ), (vi^2 ), . . . , (vi^{2^n} )},

which implies that


P = cone co ∪_{i=1}^{m} {(vi^1 ), (vi^2 ), . . . , (vi^{2^n} )}.

Hence, P is a finitely generated convex cone and thus, by Proposition 2.44, is


a polyhedral cone that is always closed.
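For instance, for n = 2 the set ∂ p̃i1 (0) × ∂ p̃i2 (0) = [li1 , ui1 ] × [li2 , ui2 ] is a rectangle whose 2² = 4 vertices are (li1 , li2 ), (li1 , ui2 ), (ui1 , li2 ) and (ui1 , ui2 ); P is then the convex cone generated by these finitely many vertices collected over i = 1, 2, . . . , m.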


As sublinear functions are convex, by Proposition 2.83, ∂ p̃0 (0) is a com-


pact convex set and, from the above discussion, P is a closed convex cone.
Therefore, by the Strict Separation Theorem, Theorem 2.26 (iii), there exists
α ∈ Rn with α 6= 0 such that

sup_{ξ∈∂ p̃0 (0)} hα, ξi < inf_{ξ∈−P} hα, ξi = − sup_{ξ∈P} hα, ξi. (6.51)

Consider

sup_{ξ∈P} hα, ξi = sup{hα, ξi : ξ ∈ { Σ_{i=1}^{m} λi ∂ p̃i (0) : λi ≥ 0, i = 1, 2, . . . , m }}
  ≥ sup{hα, ξi : ξ ∈ Σ_{i=1}^{m} λi ∂ p̃i (0)}, ∀ λ ∈ Rm_+
  = Σ_{i=1}^{m} λi p̃i (α), ∀ λ ∈ Rm_+ .

From the preceding relation and condition (6.51),

p̃0 (α) = sup_{ξ∈∂ p̃0 (0)} hα, ξi < − sup_{ξ∈P} hα, ξi ≤ − Σ_{i=1}^{m} λi p̃i (α), ∀ λ ∈ Rm_+ , (6.52)

which implies

Σ_{i=1}^{m} λi p̃i (α) < −p̃0 (α), ∀ λ ∈ Rm_+ .

This inequality can hold for every λ ∈ Rm_+ only if p̃i (α) ≤ 0, i = 1, 2, . . . , m; otherwise, if for some i ∈ {1, 2, . . . , m}, p̃i (α) > 0, then choosing the corresponding λi → +∞, we arrive at a contradiction. Also, as P is a closed convex cone, from (6.52),

p̃0 (α) < − sup_{ξ∈P} hα, ξi ≤ 0.

Therefore, for α ∈ Rn ,

p̃0 (α) < 0 and p̃i (α) ≤ 0, i = 1, 2, . . . , m,

which contradicts (i). Thus condition (ii) is satisfied.


Conversely, suppose that condition (ii) holds, which implies for some
λi ≥ 0, i = 1, 2, . . . , m,
m
X
− λi p̃i (x) ≤ p̃0 (x), ∀ x ∈ Rn .
i=1


If for some x ∈ Rn , p̃i (x) ≤ 0, i = 1, 2, . . . , m, from the above inequality


p̃0 (x) ≥ 0, thereby establishing condition (i) and hence the desired result. 
We end this chapter by deriving the constraint qualification free optimality
condition for the sublinear programming problem (SP ) from Jeyakumar and
Li [69].

Theorem 6.20 Consider the sublinear programming problem (SP ). Then x̄


is a minimizer of (SP ) if and only if there exist λi ≥ 0, i = 1, 2, . . . , m, such
that
0 ∈ ∂p0 (0) + Σ_{i=1}^{m} λi ∂pi (0) and p0 (x̄) + Σ_{i=1}^{m} λi bi = 0.

Proof. Observe that x̄ is a minimizer of (SP ) if and only if

pi (x) − bi ≤ 0, i = 1, 2, . . . , m =⇒ p0 (x) − p0 (x̄) ≥ 0. (6.53)

But Theorem 6.19 cannot be applied directly to the above system as the
theorem is for the system involving sublinear functions, whereas here neither
pi (x) − bi , i = 1, 2, . . . , m, nor p0 (x) − p0 (x̄) is positively homogeneous and
hence not sublinear functions. So define p̃i : Rn × R → R, i = 0, 1, . . . , m, as

p̃0 (x, t) = p0 (x) − tp0 (x̄) and p̃i (x, t) = pi (x) − tbi , i = 1, 2, . . . , m.

Because pi , i = 1, 2, . . . , m, are separable sublinear functions on Rn , p̃i ,


i = 1, 2, . . . , m, are also separable sublinear functions along with the sublin-
earity of p̃0 on Rn × R. Now consider the system

p̃i (x, t) ≤ 0, i = 1, 2, . . . , m =⇒ p̃0 (x, t) ≥ 0. (6.54)

This system is in the desired form needed for the application of Farkas’
Lemma, Theorem 6.19. To establish the result, we will first establish the equiv-
alence between the systems (6.53) and (6.54).
Suppose that the system (6.53) holds. We claim that (6.54) is also satisfied.
On the contrary, assume that the system (6.54) does not hold, which implies
there exists (x̃, t̃) ∈ Rn × R such that

p̃0 (x̃, t̃) < 0 and p̃i (x̃, t̃) ≤ 0, i = 1, 2, . . . , m.

For t̃ > 0, by positive homogeneity of the sublinear function and the con-
struction of p̃i , i = 0, 1, . . . , m,

p0 (x̃/t̃) − p0 (x̄) = p̃0 (x̃/t̃, 1) < 0,


pi (x̃/t̃) − bi = p̃i (x̃/t̃, 1) ≤ 0, i = 1, 2, . . . , m,

thereby contradicting (6.53).


Now, in particular, taking t̃ = 0,

p0 (x̃) = p̃0 (x̃, 0) < 0 and pi (x̃) = p̃i (x̃, 0) ≤ 0, i = 1, 2, . . . , m.

For t > 0, consider x̄ + tx̃ ∈ Rn . Therefore, by the feasibility of x̄ for (SP )


and the above condition,

p0 (x̄ + tx̃) − p0 (x̄) ≤ tp0 (x̃) < 0,


pi (x̄ + tx̃) − bi ≤ pi (x̄) − bi + tpi (x̃) ≤ 0, i = 1, 2, . . . , m,

which is again a contradiction of the system (6.53).


If t̃ < 0, then by construction of p̃i , i = 0, 1, . . . , m,

p0 (x̃) − t̃p0 (x̄) < 0 and pi (x̃) − t̃bi ≤ 0, i = 1, 2, . . . , m.

Consider x̃ + (−t̃ + 1)x̄ ∈ Rn . By the sublinearity of pi , i = 0, 1, . . . , m,

p0 (x̃ + (−t̃ + 1)x̄) ≤ p0 (x̃) + (−t̃ + 1)p0 (x̄) < t̃p0 (x̄) + (−t̃ + 1)p0 (x̄) = p0 (x̄),

and

pi (x̃ + (−t̃ + 1)x̄) ≤ pi (x̃) + (−t̃ + 1)pi (x̄)


≤ t̃bi + (−t̃ + 1)bi = bi , i = 1, 2, . . . , m,

which contradicts (6.53). Thus from all three cases, it is obvious that our
assumption is wrong and hence the system (6.54) holds.
Conversely, taking t = 1 in system (6.54) yields (6.53). Hence, both systems
(6.53) and (6.54) are equivalent. Applying Farkas’ Lemma, Theorem 6.19, for
the sublinear systems to (6.54), there exist λi ≥ 0, i = 1, 2, . . . , m, such that
m
X
p̃0 (x, t) + λi p̃i (x, t) ≥ 0, ∀ (x, t) ∈ Rn × R,
i=1

which implies (0, 0) ∈ Rn × R is a point of minimizer of the unconstrained


problem
m
X
min p̃0 (x, t) + λi p̃i (x, t) subject to (x, t) ∈ Rn × R.
i=1

By the KKT optimality condition for the unconstrained problem, Theo-


rem 2.89,
(0, 0) ∈ ∂(p̃0 + Σ_{i=1}^{m} λi p̃i )(0, 0) = ∂(p0 + Σ_{i=1}^{m} λi pi )(0) × ∇(−tp0 (x̄) − Σ_{i=1}^{m} λi tbi )(0),


where the subdifferential is with respect to x and the gradient with respect to
t. Therefore, a componentwise comparison leads to
0 ∈ ∂(p0 + Σ_{i=1}^{m} λi pi )(0) and p0 (x̄) + Σ_{i=1}^{m} λi bi = 0.

As dom pi = Rn , i = 0, 1, . . . , m, by Theorem 2.69, pi , i = 0, 1, . . . , m, are


continuous on Rn . Thus, by the Sum Rule, Theorem 2.91, the first relation
yields
m
X
0 ∈ ∂p0 (0) + λi ∂pi (0),
i=1

thereby establishing the desired result. 
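For instance (a simple illustration, not from [69]): take n = 1, p0 (x) = |x|, p1 (x) = x and b1 = −1, so that (SP ) reads min |x| subject to x ≤ −1, with minimizer x̄ = −1. Choosing λ1 = 1 gives p0 (x̄) + λ1 b1 = 1 − 1 = 0 and 0 ∈ ∂p0 (0) + λ1 ∂p1 (0) = [−1, 1] + {1} = [0, 2], so the conditions of Theorem 6.20 are satisfied.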



Chapter 7
Sequential Optimality Conditions

7.1 Introduction

In this chapter we are going to look into a completely different approach to


develop optimality conditions in convex programming. These optimality conditions, called sequential optimality conditions, can hold without any qualification and are thus of great interest from both a theoretical and a practical point of view. To the best of our knowledge, this approach was initiated by
Thibault [108]; Jeyakumar, Rubinov, Glover, and Ishizuka [70]; and Jeyaku-
mar, Lee, and Dinh [68]. Unlike the approach of direction sets in Chapter 6,
in the sequential approach one needs calculus rules for subdifferentials and ε-
subdifferentials, namely the Sum Rule and the Chain Rule. As the name itself
suggests, the sequential optimality conditions are established as a sequence
of subdifferentials at neighborhood points as in the work of Thibault [108] or
sequence of ε-subdifferentials at the exact point as in the study of Jeyakumar
and collaborators [68, 70]. Thibault [108] used the approach of sequential sub-
differential calculus rules while Jeyakumar and collaborators [68, 70] used the
approach of epigraphs of conjugate functions to study the sequential optimal-
ity conditions extensively. In both these approaches, the convex programming
problem involved cone constraints and abstract constraints. But keeping in
sync with the convex programming problem (CP ) studied in this book, we
consider the feasible set C involving convex inequalities. The reader must have
realized the central role of the Slater constraint qualification in the study of
optimality and duality in optimization. However, as we have seen, the Slater
constraint qualification can fail even for very simple problems. The failure of
the Slater constraint qualification was overcome by the development of the
so-called closed cone constraint qualification. It is a geometric qualification
that uses the Fenchel conjugate of the constraint function. We will study this
qualification condition in detail.


7.2 Sequential Optimality: Thibault’s Approach


We first discuss the approach due to Thibault [108]. As already mentioned,
he makes use of sequential subdifferential rules in his work. As one will observe,
the Sum Rule and the Chain Rule are expressed in terms of the sequence
of subdifferentials at neighborhood points. We present below the Sum Rule
from Thibault [108] which involves the application of the Sum Rule given by
Hiriart-Urruty and Phelps [64].
Theorem 7.1 (Sequential Sum Rule) Consider two proper lsc convex func-
tions φi : Rn → R̄, i = 1, 2. Then for any x̄ ∈ dom φ1 ∩ dom φ2 ,
∂(φ1 + φ2 )(x̄) = lim sup {∂φ1 (x1 ) + ∂φ2 (x2 )},
xi →φi −h,i x̄

where lim supxi →φi −h,i x̄ {∂φ1 (x1 ) + ∂φ2 (x2 )} denotes the set of all limits
limk→∞ (ξ1k + ξ2k ) for which there exists xki → x̄, i = 1, 2 such that
ξik ∈ ∂φi (xki ), i = 1, 2, and
φi (xki ) − hξik , xki − x̄i → φi (x̄), i = 1, 2. (7.1)
Proof. Suppose that ξ ∈ ∂(φ1 + φ2 )(x̄). By Theorem 2.120,
ξ ∈ ∩_{k∈N} cl{∂1/k φ1 (x̄) + ∂1/k φ2 (x̄)},

which implies for every k ∈ N,


ξ ∈ cl {∂1/k φ1 (x̄) + ∂1/k φ2 (x̄)}.
From Definition 2.12 of the closure of a set, for every k ∈ N,
ξ ∈ ∂1/k φ1 (x̄) + ∂1/k φ2 (x̄) + (1/k) B.
Therefore, there exists ξi′k ∈ ∂1/k φi (x̄), i = 1, 2, and bk ∈ B such that
ξ = ξ1′k + ξ2′k + (1/k) bk . (7.2)
Applying the modified version of the Brøndsted–Rockafellar Theorem, Theo-
rem 2.114, there exist xki ∈ Rn and ξik ∈ ∂φi (xki ) such that for i = 1, 2,
kxki − x̄k ≤ 1/√k , kξik − ξi′k k ≤ 1/√k , |φi (xki ) − hξik , xki − x̄i − φi (x̄)| ≤ 2/k , (7.3)

which implies ξi′k = ξik + (1/√k) bki for some bki ∈ B, i = 1, 2. Therefore, the condition (7.2) becomes

ξ = ξ1k + ξ2k + ((1/k) bk + (1/√k) bk1 + (1/√k) bk2 ),


that is,

ξ = lim_{k→∞} (ξ1k + ξ2k ),

which along with (7.3) yields the desired inclusion.


Conversely, suppose that

ξ ∈ lim sup {∂φ1 (x1 ) + ∂φ2 (x2 )},


xi →φi −h,i x̄

which implies for i = 1, 2, there exist xki → x̄, ξik ∈ ∂φi (xki ) satisfying
φi (xki ) − hξik , xki − x̄i → φi (x̄) and

ξ = lim (ξ1k + ξ2k ).


k→∞

As ξik ∈ ∂φi (xki ), i = 1, 2,

hξik , x − xki i ≤ φi (x) − φi (xki ), ∀ x ∈ Rn .

Also, for every x ∈ Rn ,

hξik , x − x̄i = hξik , x − xki i + hξik , xki − x̄i


≤ φi (x) − φi (xki ) + hξik , xki − x̄i, i = 1, 2,

thereby yielding

hξ1k + ξ2k , x − x̄i ≤ φ1 (x) + φ2 (x) − φ1 (xk1 ) − φ2 (xk2 )


+hξ1k , xk1 − x̄i + hξ2k , xk2 − x̄i

for every x ∈ Rn . Taking the limit as k → +∞ and using the condition (7.1),
the above inequality reduces to

hξ, x − x̄i ≤ (φ1 + φ2 )(x) − (φ1 + φ2 )(x̄), ∀ x ∈ Rn ,

which implies ξ ∈ ∂(φ1 + φ2 )(x̄), thereby establishing the requisite result. 


Using a very different assumption, the Moreau–Rockafellar Sum Rule, The-
orem 2.91, was obtained by Thibault [108].

Corollary 7.2 Consider two proper lsc convex functions φi : Rn → R̄,


i = 1, 2. If

0 ∈ core(dom φ1 − dom φ2 ),

then for every x̄ ∈ dom φ1 ∩ dom φ2 ,

∂(φ1 + φ2 )(x̄) = ∂φ1 (x̄) + ∂φ2 (x̄).


Proof. By Definition 2.77 of the subdifferentials, it is easy to observe that


the inclusion
∂(φ1 + φ2 )(x̄) ⊃ ∂φ1 (x̄) + ∂φ2 (x̄) (7.4)
always holds true.
To prove the result, we will show the reverse inclusion in relation (7.4).
Consider ξ ∈ ∂(φ1 + φ2 )(x̄). Then by Theorem 7.1, for i = 1, 2, there exist
xki → x̄ and ξik ∈ ∂φi (xki ) such that

ξ = lim (ξ1k + ξ2k ) and γik = φi (xki ) − hξik , xki − x̄i → φi (x̄). (7.5)
k→∞

Denote ξ k = ξ1k + ξ2k . As 0 ∈ core(dom φ1 − dom φ2 ), by Definition 2.17, for


any y ∈ Rn and y 6= 0, there exist α > 0 and xi ∈ dom φi , i = 1, 2, such that
αy = x1 − x2 . By the convexity of φi , i = 1, 2 along with (7.5),

hξ1k , αyi = hξ1k , x1 − xk1 i + hξ1k , xk1 − x̄i + hξ1k , x̄ − x2 i


≤ φ1 (x1 ) − φ(xk1 ) + hξ1k , xk1 − x̄i + hξ1k , x̄ − x2 i
= φ1 (x1 ) − γ1k + hξ k , x̄ − x2 i + hξ2k , x2 − xk2 i + hξ2k , xk2 − x̄i
≤ φ1 (x1 ) − γ1k + hξ k , x̄ − x2 i + φ2 (x2 ) − φ2 (xk2 ) + hξ2k , xk2 − x̄i
= (φ1 (x1 ) − γ1k ) + (φ2 (x2 ) − γ2k ) + hξ k , x̄ − x2 i.

As the limit k → +∞, using the conditions (7.5),

(φ1 (x1 ) − γ1k ) + (φ2 (x2 ) − γ2k ) + hξ k , x̄ − x2 i


→ (φ1 (x1 ) − φ1 (x̄)) + (φ2 (x2 ) − φ2 (x̄)) + hξ, x̄ − x2 i.

Therefore,
My
hξ1k , yi ≤ , ∀ k ∈ N,
α
My
that is, {hξ1k , yi} is bounded above by which is independent of k. Simi-
α
k
larly, the sequence {hξ1 , −yi} is bounded above. In particular, taking y = ei ,
i = 1, 2, . . . , n, where ei is a vector in Rn with i-th component 1 and all other
zeroes,

kξ1k k∞ = max |hξ1k , ei i| ≤ max |Mi |.


i=1,2,...,n i=1,2,...,n

Thus, {ξ1k } is a bounded sequence. As ξ1k + ξ2k → ξ, {ξ2k } is also a bounded se-
quence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, the sequences
{ξik }, i = 1, 2, have a convergent subsequence. Without loss of generality,
assume that ξik → ξi , i = 1, 2, such that ξ1 + ξ2 = ξ. By Theorem 2.84,
ξi ∈ ∂φi (x̄), i = 1, 2, thereby yielding

∂(φ1 + φ2 )(x̄) ⊂ ∂φ1 (x̄) + ∂φ2 (x̄),


which along with (7.4) leads to the desired result. 


Consider the convex optimization problem
min f (x) subject to x ∈ C, (CP )
where f : Rn → R̄ is a proper lsc convex function and C ⊂ Rn is a closed
convex set. We shall now provide the sequential optimality condition for (CP )
as an application to Theorem 7.1.

Theorem 7.3 Consider the convex optimization problem (CP ) where


f : Rn → R̄ is an extended-valued proper lsc convex function. Then x̄ is a
point of minimizer of (CP ) if and only if there exist xki → x̄, i = 1, 2, with
ξ1k ∈ ∂f (xk1 ) and ξ2k ∈ NC (xk2 ) such that

ξ1k + ξ2k → 0, f (xk1 ) − hξ1k , xk1 − x̄i → f (x̄) and hξ2k , xk2 − x̄i → 0.

Proof. Observe that (CP ) is equivalent to the unconstrained problem

min (f + δC )(x) subject to x ∈ Rn .

By the optimality condition for the unconstrained programming problem, The-


orem 2.89, x̄ is a minimum to (CP ) if and only if

0 ∈ ∂(f + δC )(x̄).

Applying Theorem 7.1, there exist sequences {xki } ⊂ Rn with xki → x̄, i = 1, 2, ξ1k ∈ ∂f (xk1 ) and ξ2k ∈ ∂δC (xk2 ) = NC (xk2 ) satisfying

f (xk1 ) − hξ1k , xk1 − x̄i → f (x̄) and hξ2k , xk2 − x̄i → 0

such that

lim (ξ1k + ξ2k ) = 0,


k→∞

thereby yielding a sequential optimality condition. 
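To see that the sequences in Theorem 7.3 are genuinely needed, consider the following illustration (not taken from [108]). Let n = 2, C = {(x1 , x2 ) ∈ R2 : x2 ≤ 0} and f (x1 , x2 ) = −x1 + δC1 (x1 , x2 ), where C1 = {(x1 , x2 ) : x2 ≥ x1² }. The only feasible point at which f is finite is x̄ = (0, 0), so x̄ is the minimizer, yet 0 6∈ ∂f (x̄) + NC (x̄) = {(−1, t) : t ≤ 0} + {(0, s) : s ≥ 0}. The sequential condition does hold: with xk1 = (1/k, 1/k² ), xk2 = (0, 0), ξ1k = (0, −k/2) ∈ ∂f (xk1 ) and ξ2k = (0, k/2) ∈ NC (xk2 ), one has ξ1k + ξ2k = 0, f (xk1 ) − hξ1k , xk1 − x̄i = −1/(2k) → 0 = f (x̄) and hξ2k , xk2 − x̄i = 0.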


It is important to note that the conditions on the problem data of (CP ) are minimal. The importance of the Sequential Sum Rule becomes clear because, under the assumptions on (CP ), it is not obvious whether the qualification conditions needed to apply the exact Sum Rule hold or not.
For the convex programming problem (CP ) with C given by (3.1), that is,

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},

it was discussed in Chapter 3 how the normal cones could be explicitly expressed in terms of the subdifferentials of the constraint functions gi , i = 1, 2, . . . , m, in the presence of the Slater constraint qualification. But if the Slater constraint qualification is not satisfied, how would one explicitly compute the normal cone? For that, we first present the sequential Chain Rule


from Thibault [108] in a finite dimensional setting, a corollary to which plays a


pivotal role in deriving the sequential optimality conditions, when C is explic-
itly given by convex inequalities. Note that in the following we will consider
a vector-valued convex function Φ : Rn → Rm . This means that each compo-
nent function of Φ is a real-valued convex function on Rn . Equivalently, Φ is
convex if for every x1 , x2 ∈ Rn and for every λ ∈ [0, 1],
(1 − λ)Φ(x1 ) + λΦ(x2 ) − Φ((1 − λ)x1 + λx2 ) ∈ Rm
+. (7.6)
The epigraph of Φ, epi Φ, is defined as
epi Φ = {(x, µ) ∈ Rn × Rm : µ ∈ Φ(x) + Rm
+ }.

A function φ : Rm → R̄ is said to be nondecreasing on a set F ⊂ Rm if for


every y1 , y2 ∈ F ,
φ(y1 ) ≤ φ(y2 ) whenever y2 − y1 ∈ Rm
+.

Consider a vector-valued convex function Φ and let φ be nondecreasing convex


function on Φ(Rn ) + Rm n
+ . By the convexity of Φ, for every x1 , x2 ∈ R and for
every λ ∈ [0, 1], the condition (7.6) leads to
(1 − λ)Φ(x1 ) + λΦ(x2 ) ∈ Φ((1 − λ)x1 + λx2 ) + Rm n m
+ ⊂ Φ(R ) + R+ .

Also,
Φ((1 − λ)x1 + λx2 ) ∈ Φ(Rn ) ⊂ Φ(Rn ) + Rm
+.

As φ is a nondecreasing function on Φ(Rn ) + Rm


+ , by the convexity of Φ, (7.6)
implies that
φ(Φ((1 − λ)x1 + λx2 )) ≤ φ((1 − λ)Φ(x1 ) + λΦ(x2 )).
By the convexity of φ, for every x1 , x2 ∈ Rn ,
φ(Φ((1 − λ)x1 + λx2 )) ≤ (1 − λ)φ(Φ(x1 )) + λφ(Φ(x2 )), ∀ λ ∈ [0, 1],
that is, for every λ ∈ [0, 1],
(φ ◦ Φ)((1 − λ)x1 + λx2 ) ≤ (1 − λ)(φ ◦ Φ)(x1 ) + λ(φ ◦ Φ)(x2 ).
Hence, (φ ◦ Φ) is a convex function.
Below we present the Sequential Chain Rule from Thibault [108].
Theorem 7.4 (Sequential Chain Rule) Consider a vector-valued convex func-
tion Φ : Rn → Rm and a proper lsc convex function φ : Rm → R̄ that is non-
decreasing over Φ(Rn ) + Rm+ . Then for ȳ = Φ(x̄) ∈ dom φ, ξ ∈ ∂(φ ◦ Φ)(x̄) if
and only if there exist xk → x̄, yk → ȳ, ξk → ξ, τk → 0, yk′ ∈ Φ(xk ) + Rm +
with yk′ → Φ(x̄) and ηk ∈ Rm+ such that

ηk + τk ∈ ∂φ(yk ), ξk ∈ ∂(ηk Φ)(xk ), hηk , yk′ i → hηk , Φ(xk )i


and
φ(yk ) − hηk , yk − ȳi → φ(ȳ) and hηk , Φ(xk ) − ȳi → 0.


Proof. Define φ1 (x, y) = φ(y) and φ2 (x, y) = δepi Φ (x, y). We claim that

ξ ∈ ∂(φ ◦ Φ)(x̄) if and only if (ξ, 0) ∈ ∂(φ1 + φ2 )(x̄, ȳ). (7.7)

Suppose that ξ ∈ ∂(φ ◦ Φ)(x̄), which by Definition 2.77 of the subdifferential


implies that

φ(Φ(x)) − φ(Φ(x̄)) = (φ ◦ Φ)(x) − (φ ◦ Φ)(x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn .

Consider (x, y) ∈ epi Φ, which implies y − Φ(x) ∈ Rm m


+ , that is, y ∈ Φ(x) + R+ .
n m
Because φ is nondecreasing over Φ(R ) + R+ , φ(y) ≥ φ(Φ(x)) for every
(x, y) ∈ epi Φ. Therefore, the above condition leads to

φ(y) − φ(ȳ) ≥ hξ, x − x̄i, ∀ (x, y) ∈ epi Φ,

where ȳ = Φ(x̄). From the definition of φ1 and φ2 , for every (x, y) ∈ Rn × Rm


the above condition leads to

φ1 (x, y) + φ2 (x, y) − φ1 (x̄, ȳ) − φ2 (x̄, ȳ) ≥ hξ, x − x̄i + h0, y − ȳi,

thereby implying that (ξ, 0) ∈ ∂(φ1 + φ2 )(x̄, ȳ).


Conversely, suppose that (ξ, 0) ∈ ∂(φ1 + φ2 )(x̄, ȳ), which by the definition
of subdifferential implies that for every (x, y) ∈ Rn × Rm ,

(φ1 + φ2 )(x, y) − (φ1 + φ2 )(x̄, ȳ) ≥ hξ, x − x̄i + h0, y − ȳi.

The above inequality holds in particular for every (x, y) ∈ epi Φ. As


(x, Φ(x)) ∈ epi Φ, the above inequality reduces to

φ(Φ(x)) − φ(Φ(x̄)) ≥ hξ, x − x̄i, ∀ x ∈ Rn ,

which implies that ξ ∈ ∂(φ ◦ Φ)(x̄), thereby establishing our claim (7.7).
Now by Theorem 7.1,

(0, βk ) + (ξk , θk ) → (ξ, 0),

where

βk ∈ ∂φ(yk ), (ξk , θk ) ∈ ∂δepi Φ (xk , yk′ ),


yk → ȳ, (xk , yk′ ) → (x̄, ȳ),
φ(yk ) − hβk , yk − ȳi → φ(ȳ),
φ2 (xk , yk′ ) − hθk , yk′ − ȳi − hξk , xk − x̄i → φ2 (x̄, ȳ) = 0.

Set θk = −ηk and define τk = βk − ηk . Observe that ξk → ξ and


τk = βk + θk → 0. The preceding facts can thus be written as

(0, ηk + τk ) + (ξk , −ηk ) → (ξ, 0),


with

ηk + τk ∈ ∂φ(yk ), (ξk , −ηk ) ∈ ∂δepi Φ (xk , yk′ ),


yk → ȳ, (xk , yk′ ) → (x̄, ȳ),
φ(yk ) − hηk + τk , yk − ȳi → φ(ȳ), (7.8)
φ2 (xk , yk′ ) + hηk , yk′ − ȳi − hξk , xk − x̄i → φ2 (x̄, ȳ) = 0. (7.9)

As (xk , yk′ ) ∈ epi Φ, φ2 (xk , yk′ ) = 0, which along with ξk → ξ and xk → x̄


reduces (7.9) to
hηk , yk′ − ȳi → 0. (7.10)
Also, (ξk , −ηk ) ∈ ∂δepi Φ (xk , yk′ ) implies that

hξk , x − xk i − hηk , y − yk′ i ≤ 0, ∀ (x, y) ∈ epi Φ. (7.11)

Observe that (xk , yk′ ) ∈ epi Φ, which implies that

yk′ − Φ(xk ) ∈ Rm+ .

Therefore, for any y ′ ∈ Rm+ ,

y ′ + yk′ − Φ(xk ) ∈ Rm+ ,

that is, (xk , y ′ + yk′ ) ∈ epi Φ. In particular, taking x = xk and setting y = yk′ + y ′
for any y ′ ∈ Rm+ in (7.11) yields

hηk , y ′ i ≥ 0, ∀ y ′ ∈ Rm+ ,

which implies that ηk ∈ Rm+ . Taking x = xk and y = Φ(xk ) in (7.11),
hηk , yk′ − Φ(xk )i ≤ 0, which along with the facts that ηk ∈ Rm+ and
yk′ − Φ(xk ) ∈ Rm+ leads to hηk , yk′ − Φ(xk )i = 0. Therefore, (7.11) is equivalent
to

hξk , x − xk i ≤ hηk , yi − hηk , Φ(xk )i, ∀ (x, y) ∈ epi Φ.

In particular, for y = Φ(x),

hξk , x − xk i ≤ (ηk Φ)(x) − (ηk Φ)(xk ), ∀ x ∈ Rn .

Observe that as ηk ∈ Rm + , (ηk Φ) is a convex function and thus the above


inequality implies that (7.11) is equivalent to ξk ∈ ∂(ηk Φ)(xk ). Also from the
condition hηk , yk′ − Φ(xk )i = 0, we have hηk , yk′ i = hηk , Φ(xk )i. Inserting this
fact in (7.10) leads to

hηk , Φ(xk ) − ȳi → 0.

Because τk → 0, (7.8) is equivalent to

φ(yk ) − hηk , yk − ȳi → φ(ȳ),

thereby establishing the result. 


Next we present the Sequential Chain Rule in a simpler form using the
above theorem and a lemma for which one will require the notion of the Clarke
subdifferential. Recall that at any x ∈ Rn the Clarke generalized gradient or
Clarke subdifferential is given as

∂ ◦ φ(x) = co {ξ ∈ Rn : ξ = lim_{k→∞} ∇φ(xk ) where xk → x, xk ∈ D̃},

where D̃ denotes the set of points at which φ is differentiable. The Sum


Rule from Clarke [27] is as follows. Consider two locally Lipschitz functions
φ1 , φ2 : Rn → R. Then for λ1 , λ2 ∈ R,

∂ ◦ (λ1 φ1 + λ2 φ2 )(x) ⊂ λ1 ∂ ◦ φ1 (x) + λ2 ∂ ◦ φ2 (x).

For a convex function φ, the convex subdifferential and the Clarke subdiffer-
ential coincide, that is, ∂φ(x̄) = ∂ ◦ φ(x̄). Now we present the lemma that plays
an important role in obtaining the Sequential Chain Rule.

Lemma 7.5 Consider a locally Lipschitz vector-valued function Φ : Rn → Rm .


Suppose that there exist {λk } ⊂ Rm and {xk } ⊂ Rn with λk → 0 and xk → x̄
such that

ωk ∈ ∂ ◦ (λk Φ)(xk ), ∀ k ∈ N.

Then ωk → 0.

Proof. By the Clarke Sum Rule,


ωk ∈ ∂ ◦ (λk Φ)(xk ) ⊂ Σ_{i=1}^{m} λki ∂ ◦ φi (xk ), ∀ k ∈ N,

where Φ(x) = (φ1 (x), φ2 (x), . . . , φm (x)) and φi : Rn → R, i = 1, 2, . . . , m,


are locally Lipschitz functions. From the above condition, there exist
ωik ∈ ∂ ◦ φi (xk ), i = 1, 2, . . . , m, such that
ωk = Σ_{i=1}^{m} λki ωik .

Therefore,

kωk k = k Σ_{i=1}^{m} λki ωik k ≤ Σ_{i=1}^{m} |λki | kωik k.

Because the Clarke subdifferential ∂ ◦ φi (xk ), i = 1, 2, . . . , m, are compact,


{ωik }, i = 1, 2, . . . , m, are bounded sequences and hence by the Bolzano–
Weierstrass Theorem, Proposition 1.3, have a convergent subsequence. With-
out loss of generality, assume that ωik → ωi , i = 1, 2, . . . , m. Because the

Clarke subdifferential as a set-valued map has a closed graph, ωi ∈ ∂ ◦ φi (x̄),


i = 1, 2, . . . , m. Also, as λk → 0, |λki | → 0. Hence, by the compactness of the
Clarke subdifferential, for i = 1, 2, . . . , m, kωi k ≤ Mi < +∞. Therefore,
Σ_{i=1}^{m} |λki | kωik k → Σ_{i=1}^{m} 0 · Mi = 0,

thus implying that ωk → 0. 

Theorem 7.6 (A simpler version of the Sequential Chain Rule) Consider


a vector-valued convex function Φ : Rn → Rm and a proper lsc convex
function φ : Rm → R̄ that is nondecreasing over Φ(Rn ) + Rm + . Then for
ȳ = Φ(x̄) ∈ dom φ, ξ ∈ ∂(φ ◦ Φ)(x̄) if and only if there exist

ηk ∈ ∂φ(yk ) and ξk ∈ ∂(ηk Φ)(xk )

satisfying

xk → x̄, yk → Φ(x̄), ξk → ξ,
φ(yk ) − hηk , yk − ȳi → φ(ȳ) and hηk , Φ(xk ) − ȳi → 0.

Proof. Consider ξ ∈ ∂(φ ◦ Φ)(x̄). Suppose that xk , yk , yk′ , ξk , ηk , and τk are


as in Theorem 7.4. Denote ζk = ηk + τk . Observe that for every k ∈ N,

ξk ∈ ∂(ηk Φ)(xk ) and ηk Φ = ζk Φ + (ηk − ζk )Φ,

with ηk − ζk = −τk → 0. Because Φ is convex and every component is


locally Lipschitz, it is simple to show that Φ is also locally Lipschitz. As
xk → x̄, for sufficiently large k, xk ∈ N (x̄) where N (x̄) is a neighborhood of
x̄ on which Φ satisfies the Lipschitz property with Lipschitz constant L̄ > 0.
Hence, (ηk − ζk )Φ is also locally Lipschitz over N (xk ) with Lipschitz constant
L̄kηk − ζk k > 0. This follows from the fact that for any x, x′ ∈ N (xk ),

|(ηk − ζk )Φ(x) − (ηk − ζk )Φ(x′ )| = |h(ηk − ζk ), Φ(x) − Φ(x′ )i|,

which by the Cauchy–Schwarz inequality, Proposition 1.1, implies that

|(ηk − ζk )Φ(x) − (ηk − ζk )Φ(x′ )| ≤ kηk − ζk k kΦ(x) − Φ(x′ )k


≤ L̄kηk − ζk k kx − x′ k.

From Theorem 7.4, ηk ∈ Rm + , which implies (ηk Φ) is convex. However, (ζk Φ)


and ((ηk − ζk )Φ) need not be convex. Thus, ξk ∈ ∂(ηk Φ)(xk ) implies

ξk ∈ ∂ ◦ (ηk Φ)(xk ) = ∂ ◦ {ζk Φ + (ηk − ζk )Φ}(xk ).

By the Clarke Sum Rule,

ξk ∈ ∂ ◦ (ζk Φ)(xk ) + ∂ ◦ ((ηk − ζk )Φ)(xk ),

which implies that there exist ρk ∈ ∂ ◦ (ζk Φ)(xk ) and ̺k ∈ ∂ ◦ ((ηk − ζk )Φ)(xk )
such that ξk = ρk + ̺k . As ηk − ζk → 0 and xk → x̄, by Lemma 7.5, ̺k → 0.
Setting ̺k = −βk ,

ρk = ξk + βk ∈ ∂ ◦ (ζk Φ)(xk ).

As k → +∞, ρk → ξ, ζk = ηk + τk ∈ ∂φ(yk ), hζk , Φ(xk ) − ȳi → 0,


and

φ(yk ) − hζk , yk − ȳi → φ(ȳ),

thereby yielding the desired conditions.


Conversely, suppose that conditions are satisfied. As ηk ∈ ∂φ(yk ),

φ(y) − φ(yk ) ≥ hηk , y − yk i, ∀ y ∈ Rm ,

which implies that

φ(y) ≥ φ(yk ) − hηk , yk − ȳi + hηk , y − ȳi, ∀ y ∈ Rm .

In particular, for y = Φ(x) for every x ∈ Rn , the above inequality yields that
for every x ∈ Rn ,

(φ ◦ Φ)(x) ≥ φ(yk ) − hηk , yk − ȳi + hηk , Φ(x) − ȳi


= φ(yk ) − hηk , yk − ȳi + hηk , Φ(x) − Φ(xk )i + hηk , Φ(xk ) − ȳi
= φ(yk ) − hηk , yk − ȳi + (ηk ◦ Φ)(x) − (ηk ◦ Φ)(xk ) +
hηk , Φ(xk ) − ȳi.

Because ξk ∈ ∂(ηk ◦ Φ)(xk ), for every x ∈ Rn ,

(φ ◦ Φ)(x) ≥ φ(yk ) − hηk , yk − ȳi + hξk , x − xk i + hηk , Φ(xk ) − ȳi,

which as the limit k → +∞ reduces to


(φ ◦ Φ)(x) ≥ φ(ȳ) + hξ, x − x̄i
            = (φ ◦ Φ)(x̄) + hξ, x − x̄i, ∀ x ∈ Rn .

Thus, ξ ∈ ∂(φ ◦ Φ)(x̄), thereby completing the result. 


Now we move on to establish the sequential optimality condition for (CP )
with the real-valued convex objective function f : Rn → R obtained by
Thibault [108] using the above theorem. To apply the result, we equivalently
expressed the feasible set C as

C = {x ∈ Rn : G(x) ∈ −Rm
+ },

where G(x) = (g1 (x), g2 (x), . . . , gm (x)). Observe that G : Rn → Rm is a


vector-valued convex function as gi , i = 1, 2, . . . , m, are convex. Now using
the sequential subdifferential calculus rules, Theorems 7.1 and 7.6, we present
the sequential optimality conditions for the constrained problem (CP ).

Theorem 7.7 Consider the convex programming problem (CP ) with
C given by (3.1). Then x̄ ∈ C is a point of minimizer of (CP ) if and only if
there exist xk → x̄, yk → G(x̄), λk ∈ Rm + , ξ ∈ ∂f (x̄), and ξk ∈ ∂(λk G)(xk )
such that
ξ + ξk → 0, hλk , yk i = 0,
hλk , yk − G(x̄)i → 0, hλk , G(xk ) − G(x̄)i → 0.
Proof. Observe that the problem (CP ) can be rewritten as the unconstrained
problem

min (f + (δ−Rm+ ◦ G))(x) subject to x ∈ Rn .

By Theorem 2.89, x̄ is a point of minimizer of (CP ) if and only if

0 ∈ ∂(f + (δ−Rm+ ◦ G))(x̄).

As dom f = Rn , invoking the Sum Rule, Theorem 2.91,

0 ∈ ∂f (x̄) + ∂(δ−Rm+ ◦ G)(x̄),

which is equivalent to the existence of ξ ∈ ∂f (x̄) and ξ̂ ∈ ∂(δ−Rm+ ◦ G)(x̄) such that

ξ + ξ̂ = 0.

As G is a convex function and the indicator function δ−Rm+ is a proper lsc convex
function nondecreasing over G(Rn ) + Rm+ , Theorem 7.6 can be applied: ξ̂ ∈ ∂(δ−Rm+ ◦ G)(x̄)
if and only if there exist xk → x̄, yk → G(x̄), ξk → ξ̂ such that

λk ∈ ∂δ−Rm+ (yk ) and ξk ∈ ∂(λk G)(xk ),

satisfying

δ−Rm+ (yk ) − hλk , yk − G(x̄)i → (δ−Rm+ ◦ G)(x̄) and hλk , G(xk ) − G(x̄)i → 0.

As λk ∈ ∂δ−Rm+ (yk ) = N−Rm+ (yk ), the sequence {yk } ⊂ −Rm+ with

hλk , y − yk i ≤ 0, ∀ y ∈ −Rm+ .

In particular, taking y = 0 and y = 2yk in the above inequality leads to

hλk , yk i = 0.

Thus,

hλk , yi ≤ 0, ∀ y ∈ −Rm+ ,

which implies {λk } ⊂ Rm+ . Using the fact that {yk } ⊆ −Rm+ , the condition

δ−Rm+ (yk ) − hλk , yk − G(x̄)i → (δ−Rm+ ◦ G)(x̄)

reduces to

hλk , yk − G(x̄)i → 0,

thereby leading to the requisite result.

7.3 Fenchel Conjugates and Constraint Qualification


Observe that in the previous section, the sequential optimality conditions are
obtained in terms of the subdifferentials that are calculated at some neigh-
boring point rather than the exact point of minimum, as is the case in the
standard KKT conditions. But this can be overcome by using the Brøndsted–
Rockafellar Theorem, Theorem 2.114, thereby expressing the result in terms of
the ε-subdifferentails at the exact point. To the best of our knowledge this was
carried out by Jeyakumar, Rubinov, Glover, and Ishizuka [70] and Jeyakumar,
Lee, and Dinh [68]. In their approach, the epigraph of the conjugate function
of the objective function and the constraints play a central role in the charac-
terization of the optimality for the convex programming problem (CP ). The
proof is based on the result in Jeyakumar, Rubinov, Glover, and Ishizuka [70].

Theorem 7.8 Consider the convex programming problem (CP ) with C given
by (3.1). Then x̄ is a point of minimizer of (CP ) if and only if

(0, −f (x̄)) ∈ epi f ∗ + cl ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ . (7.12)

Proof. Recall that the feasible set C of the convex programming problem
(CP ) is given by (3.1), that is,

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},

which can be equivalently expressed as

C = {x ∈ Rn : G(x) ∈ −Rm+ },

where G : Rn → Rm is defined as G(x) = (g1 (x), g2 (x), . . . , gm (x)). Because


gi , i = 1, 2, . . . , m, are convex functions, G is also a convex function. x̄ is a
point of minimizer of (CP ) if and only if

f (x) ≥ f (x̄), ∀ x ∈ C,

that is,

φ(x) + δC (x) ≥ 0, ∀ x ∈ Rn ,

where φ(x) = f (x) − f (x̄). By Definition 2.101 of the conjugate function,

(φ + δC )∗ (0) = sup_{x∈Rn} {h0, xi − (φ + δC )(x)} = sup_{x∈Rn} −(φ + δC )(x) ≤ 0.

This shows that

(0, 0) ∈ epi (φ + δC )∗ ,

which by the epigraph of the conjugate of the sum, Theorem 2.123, implies
that

(0, 0) ∈ cl {epi φ∗ + epi δC∗ }.

As dom φ = Rn , by Theorem 2.69, φ is continuous on Rn . Hence, by Proposition 2.124,
epi φ∗ + epi δC∗ is closed, which reduces the above condition to

(0, 0) ∈ epi φ∗ + epi δC∗ . (7.13)
Consider

(λG)(x) = hλ, G(x)i.

For x ∈ C, G(x) ∈ −Rm+ , which implies (λG)(x) ≤ 0 for every λ ∈ Rm+ . Thus,

sup_{λ∈Rm+} (λG)(x) = 0. (7.14)

If x ∉ C, there exists some i ∈ {1, 2, . . . , m} such that gi (x) > 0. Hence, it is
simple to see that

sup_{λ∈Rm+} (λG)(x) = +∞. (7.15)

Combining (7.14) and (7.15),

δC (x) = sup_{λ∈Rm+} (λG)(x).

Applying Theorem 2.123, relation (7.13) along with Proposition 2.103 yields

(0, 0) ∈ epi f ∗ + (0, f (x̄)) + cl co ⋃_{λ∈Rm+} epi (λG)∗ .

By Theorem 2.123, ⋃_{λ∈Rm+} epi (λG)∗ is a convex cone and thus, the above
relation reduces to

(0, 0) ∈ epi f ∗ + (0, f (x̄)) + cl ⋃_{λ∈Rm+} epi (λG)∗
       = epi f ∗ + (0, f (x̄)) + cl ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ ,

thereby leading to the requisite condition (7.12).

Conversely, suppose that condition (7.12) holds, which implies there exist
ξ ∈ dom f ∗ , α ≥ 0, αk ≥ 0, {λk } ⊂ Rm+ , and {ξk } ⊂ dom (Σ_{i=1}^{m} λki gi )∗ such that

(0, −f (x̄)) = (ξ, f ∗ (ξ) + α) + lim_{k→∞} (ξk , (Σ_{i=1}^{m} λki gi )∗ (ξk ) + αk ).

Componentwise comparison leads to

0 = ξ + lim_{k→∞} ξk , (7.16)
−f (x̄) = f ∗ (ξ) + α + lim_{k→∞} ((Σ_{i=1}^{m} λki gi )∗ (ξk ) + αk ). (7.17)

By Definition 2.101 of the conjugate functions, the condition (7.17) implies
that for every x ∈ Rn ,

f (x̄) − f (x) ≤ −hξ, xi − α − lim_{k→∞} ((Σ_{i=1}^{m} λki gi )∗ (ξk ) + αk )
            ≤ −hξ, xi − α − lim_{k→∞} (hξk , xi − Σ_{i=1}^{m} λki gi (x) + αk ).

In particular, taking x ∈ C, that is, gi (x) ≤ 0, i = 1, 2, . . . , m, in the above


inequality along with the nonnegativity of α, αk , λki , i = 1, 2, . . . , m, and the
condition (7.16) yields

f (x̄) ≤ f (x), ∀ x ∈ C.

Therefore, x̄ is a point of minimizer of (CP ), as desired. 


As one can express the epigraph of conjugate functions in terms of the
ε-subdifferential of the function, Theorem 2.122, Jeyakumar et al. [70, 68]
expressed the above theorem in terms of the ε-subdifferentials, thus obtaining
the sequential optimality conditions presented below. We present the same
using the condition (7.12) obtained in Theorem 7.8.

Theorem 7.9 Consider the convex programming problem (CP ) with C given
by (3.1). Then x̄ is a point of minimizer for (CP ) if and only if there exist
ξ ∈ ∂f (x̄), εki ≥ 0, λki ≥ 0, ξik ∈ ∂εki gi (x̄), i = 1, 2, . . . , m, such that

ξ + Σ_{i=1}^{m} λki ξik → 0, Σ_{i=1}^{m} λki gi (x̄) → 0 and Σ_{i=1}^{m} λki εki ↓ 0 as k → +∞.

Proof. Consider Theorem 7.8, according to which x̄ is a point of minimizer


of (CP ) if and only if the containment (7.12) is satisfied. By Theorem 2.122,

there exist ξ ∈ ∂ε f (x̄), λki ≥ 0 and ξk ∈ ∂εk (Σ_{i=1}^{m} λki gi )(x̄), i = 1, 2, . . . , m,
with ε, εk ≥ 0 such that

(0, −f (x̄)) = (ξ, hξ, x̄i + ε − f (x̄)) + lim_{k→∞} (ξk , hξk , x̄i + εk − (Σ_{i=1}^{m} λki gi )(x̄)).

Componentwise comparison leads to

0 = ξ + lim_{k→∞} ξk ,
−ε = hξ, x̄i + lim_{k→∞} (hξk , x̄i + εk − (Σ_{i=1}^{m} λki gi )(x̄)),

which together imply that

−ε = lim_{k→∞} (εk − (Σ_{i=1}^{m} λki gi )(x̄)).

This equation along with the nonnegativity of ε, εk , and λki , i = 1, 2, . . . , m,
implies

ε = 0, εk ↓ 0 and Σ_{i=1}^{m} λki gi (x̄) → 0. (7.18)

As dom gi = Rn , i = 1, 2, . . . , m, by Theorem 2.115, there exist εki ≥ 0,
i = 1, 2, . . . , m, such that

ξk ∈ Σ_{i=1}^{m} ∂εki (λki gi )(x̄) and εk = Σ_{i=1}^{m} εki .

Define I¯k = {i ∈ {1, 2, . . . , m} : λki > 0}. By Theorem 2.117,

∂εki (λki gi )(x̄) = λki ∂ε̄ki gi (x̄), ∀ i ∈ I¯k ,

where ε̄ki = εki /λki ≥ 0. Therefore,

ξk ∈ Σ_{i∈I¯k} λki ∂ε̄ki gi (x̄) + Σ_{i∉I¯k} ∂εki (λki gi )(x̄). (7.19)

As discussed in Chapter 2, the ε-subdifferential of the zero function is zero, that
is, ∂ε 0(x) = {0}. Thus,

∂εki (λki gi )(x̄) = {0} = λki ∂εki gi (x̄), ∀ i ∉ I¯k .

The above relation along with the condition (7.19) yields that

ξk ∈ Σ_{i∈I¯k} λki ∂ε̄ki gi (x̄) + Σ_{i∉I¯k} λki ∂εki gi (x̄). (7.20)

Also,

εk = Σ_{i∈I¯k} λki ε̄ki + Σ_{i∉I¯k} εki ,

which along with (7.20) leads to the desired sequential optimality conditions.
Conversely, suppose that the sequential optimality conditions hold. From
Definitions 2.77 and 2.109 of subdifferentials and ε-subdifferentials,

f (x) − f (x̄) ≥ hξ, x − x̄i,


gi (x) − gi (x̄) ≥ hξik , x − x̄i − εki , i = 1, 2, . . . , m,

respectively. The above inequalities along with the sequential optimality con-
ditions imply that
f (x) − f (x̄) + Σ_{i=1}^{m} λki gi (x) ≥ 0, ∀ x ∈ Rn ,

where {λki } ⊂ R+ , i = 1, 2, . . . , m. In particular, taking x ∈ C, that is,


gi (x) ≤ 0, i = 1, 2, . . . , m, which along with the condition on {λk } reduces
the above inequality to

f (x) ≥ f (x̄), ∀ x ∈ C,

thereby establishing the optimality of x̄ for (CP ). 
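
The sequential conditions can be made concrete on a one-dimensional problem where the
Slater constraint qualification fails. The following Python sketch is our own illustration
(not part of the original text): for min f (x) = x subject to g(x) = x2 ≤ 0 the only feasible
point is x̄ = 0 and no standard KKT multiplier exists, since ∂f (0) = {1} and ∂g(0) = {0},
yet the sequential conditions of Theorem 7.9 hold along the explicit sequences chosen
below, using ∂ε g(0) = [−2√ε, 2√ε].

import numpy as np

# Illustration (not from the text): min f(x) = x  subject to  g(x) = x**2 <= 0.
# Feasible set = {0}; Slater fails and 1 + lambda*0 = 0 has no solution, but the
# sequential optimality conditions hold along the sequences chosen below.
xi = 1.0                          # the unique subgradient of f at x_bar = 0
for k in [1, 10, 100, 1000]:
    eps_k = 1.0 / k**2            # eps-subdifferential parameter
    xi_k = -2.0 * np.sqrt(eps_k)  # = -2/k, an element of the eps_k-subdifferential of g at 0
    lam_k = k / 2.0               # multiplier sequence
    print(k,
          xi + lam_k * xi_k,      # -> 0   (xi + sum_i lam_k xi_k)
          lam_k * 0.0,            # -> 0   (sum_i lam_k g_i(x_bar), here g(0) = 0)
          lam_k * eps_k)          # -> 0   (sum_i lam_k eps_k)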


Observe that not only is the optimality condition sequential, but one ob-
tains a sequential complementary slackness condition. Note that we are work-
ing in a simple scenario with a convex inequality system. This helps in ex-
pressing the condition (7.12) derived in Theorem 7.8 in a more relaxed form.
By applying Theorem 2.123, the condition becomes

(0, −f (x̄)) ∈ epi f ∗ + cl ⋃_{λ∈Rm+} cl (Σ_{i=1}^{m} epi (λi gi )∗ ). (7.21)

By the closure properties of the arbitrary union of sets, the condition (7.21)
leads to

(0, −f (x̄)) ∈ epi f ∗ + cl ⋃_{λ∈Rm+} Σ_{i=1}^{m} epi (λi gi )∗ . (7.22)

Define I¯λ = {i ∈ {1, 2, . . . , m} : λi > 0}. Again by Theorem 2.123,

epi (λi gi )∗ = λi epi gi∗ , ∀ i ∈ I¯λ .

For i 6∈ I¯λ with λi = 0,



(λi gi )∗ (ξ) = 0∗ (ξ) = 0 for ξ = 0, and +∞ otherwise,

which implies that

epi (λi gi )∗ = {0} × R+ , ∀ i ∉ I¯λ .

Using the preceding conditions, the relation (7.22) becomes

(0, −f (x̄)) ∈ epi f ∗ + cl ⋃_{λ∈Rm+} (Σ_{i∈I¯λ} λi epi gi∗ + Σ_{i∉I¯λ} {0} × R+ )
            = epi f ∗ + cl ⋃_{λ∈Rm+} (Σ_{i∈I¯λ} λi epi gi∗ + {0} × R+ ). (7.23)
Now consider (ξ, α) ∈ Σ_{i∈I¯λ} λi epi gi∗ , which implies that for i ∈ I¯λ there exist
(ξi , αi ) ∈ epi gi∗ such that

(ξ, α) = Σ_{i∈I¯λ} λi (ξi , αi ).

Therefore, for any element (0, ᾱ) ∈ {0} × R+ ,

(ξ, α + ᾱ) = Σ_{i∈I¯λ} λi (ξi , αi + ᾱ/λi ),

where ᾱ/λi ≥ 0. As (ξi , αi ) ∈ epi gi∗ ,

gi∗ (ξi ) ≤ αi ≤ αi + ᾱ/λi , ∀ i ∈ I¯λ ,

which implies that (ξi , αi + ᾱ/λi ) ∈ epi gi∗ . Hence (ξ, α + ᾱ) ∈ Σ_{i∈I¯λ} λi epi gi∗
for every ᾱ ≥ 0. Therefore, (7.23) reduces to

(0, −f (x̄)) ∈ epi f ∗ + cl ⋃_{λ∈Rm+} Σ_{i=1}^{m} λi epi gi∗ .

It is quite simple to see that

cl ⋃_{λ∈Rm+} Σ_{i=1}^{m} λi epi gi∗ = cl cone co ⋃_{i=1}^{m} epi gi∗ .

We leave this as an exercise for the reader. Hence,

(0, −f (x̄)) ∈ epi f ∗ + cl cone co ⋃_{i=1}^{m} epi gi∗ . (7.24)

The condition (7.24) implies that there exist ξ ∈ dom f ∗ , α ≥ 0, ξik ∈ dom gi∗ ,
αik , λki ≥ 0, i = 1, 2, . . . , m, such that
(0, −f (x̄)) = (ξ, f ∗ (ξ) + α) + lim_{k→∞} Σ_{i=1}^{m} λki (ξik , gi∗ (ξik ) + αik ).

Componentwise comparison leads to


0 = ξ + lim_{k→∞} Σ_{i=1}^{m} λki ξik , (7.25)
−f (x̄) = f ∗ (ξ) + α + lim_{k→∞} Σ_{i=1}^{m} λki (gi∗ (ξik ) + αik ). (7.26)

By Definition 2.101 of the conjugate functions, the condition (7.26) implies that

f (x̄) − f (x) ≤ −hξ, xi − α − lim_{k→∞} Σ_{i=1}^{m} λki (gi∗ (ξik ) + αik )
            ≤ −hξ, xi − α − lim_{k→∞} Σ_{i=1}^{m} λki (hξik , xi − gi (x) + αik ).

In particular, taking x ∈ C along with the nonnegativity of α, αik , and λki ,


i = 1, 2, . . . , m, and the condition (7.25) yields
f (x̄) ≤ f (x), ∀ x ∈ C.
Therefore, x̄ is a point of minimizer of (CP ) under the relation (7.24). This
discussion can be stated as the following result.
Theorem 7.10 Consider the convex programming problem (CP ) with C
given by (3.1). Then x̄ is a point of minimizer of (CP ) if and only if
(0, −f (x̄)) ∈ epi f ∗ + cl cone co ⋃_{i=1}^{m} epi gi∗ . (7.27)

Using the above result, we present an alternate proof to the sequential


optimality conditions, Theorem 7.9.
Alternate proof of Theorem 7.9. According to Theorem 7.10, x̄ is
a point of minimizer of (CP ) if and only if the containment (7.27) is satis-
fied. By Theorem 2.122, there exist ξ ∈ ∂ε f (x̄), ξik ∈ ∂εki gi (x̄) and λki ≥ 0,
i = 1, 2, . . . , m, with ε, εki ≥ 0, i = 1, 2, . . . , m, such that
(0, −f (x̄)) = (ξ, hξ, x̄i + ε − f (x̄)) + lim_{k→∞} Σ_{i=1}^{m} λki (ξik , hξik , x̄i + εki − gi (x̄)).

Componentwise comparison leads to


0 = ξ + lim_{k→∞} Σ_{i=1}^{m} λki ξik ,
−ε = hξ, x̄i + lim_{k→∞} Σ_{i=1}^{m} λki (hξik , x̄i + εki − gi (x̄)),

which together imply that


−ε = lim_{k→∞} Σ_{i=1}^{m} λki (εki − gi (x̄)).

This equation along with the nonnegativity of ε, εki and λki , i = 1, 2, . . . , m,
implies ε = 0, Σ_{i=1}^{m} λki gi (x̄) → 0 and Σ_{i=1}^{m} λki εki ↓ 0 as k → +∞, thereby
establishing the sequential optimality conditions. The converse can be verified
as in Theorem 7.9.
As already discussed in the previous chapters, if one assumes certain con-
straint qualifications, then the standard KKT conditions can be established. If
we observe the necessary and sufficient condition given in Theorem 7.8 carefully,
we will observe that the term cl ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ prevents us
from further manipulation. On the other hand, one might feel that the route
to the KKT optimality conditions lies in further manipulation of the condition
(7.12). Further, observe that we arrived at the condition (7.12) without any
constraint qualification. However, in order to derive the KKT optimality con-
ditions, one needs some additional qualification conditions on the constraints.
Thus from (7.12) it is natural to consider that the set

⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ is closed.

This is usually known as the closed cone constraint qualification or the Farkas–
Minkowski (FM) constraint qualification. One may also take the more relaxed
constraint qualification based on condition (7.27), that is,
cone (co ⋃_{i=1}^{m} epi gi∗ ) is closed.

We will call the above constraint qualification the relaxed FM constraint
qualification. Below we derive the standard KKT conditions under either of
the two constraint qualifications.

Theorem 7.11 Consider the convex programming problem (CP ) with C


given by (3.1). Assume that either the FM constraint qualification holds or
the relaxed FM constraint qualification holds. Then x̄ is a point of minimizer
of (CP ) if and only if there exist λi ≥ 0, i = 1, 2, . . . , m, such that
0 ∈ ∂f (x̄) + Σ_{i=1}^{m} λi ∂gi (x̄) and λi gi (x̄) = 0, i = 1, 2, . . . , m.

Proof. From Theorem 7.8, we know that x̄ is a point of minimizer of (CP )


if and only if the relation (7.12) holds. As the FM constraint qualification is

satisfied, (7.12) reduces to


(0, −f (x̄)) ∈ epi f ∗ + ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ .

By the ε-subdifferential characterization of the epigraph of the conjugate func-
tion, Theorem 2.122, there exist ξ ∈ ∂ε f (x̄), λi ≥ 0, i = 1, 2, . . . , m, and
ξ ′ ∈ ∂ε′ (Σ_{i=1}^{m} λi gi )(x̄) with ε, ε′ ≥ 0 such that

(0, −f (x̄)) = (ξ, hξ, x̄i + ε − f (x̄)) + (ξ ′ , hξ ′ , x̄i + ε′ − (Σ_{i=1}^{m} λi gi )(x̄)).

Componentwise comparison leads to

0 = ξ + ξ′, (7.28)
−f (x̄) = hξ, x̄i + ε − f (x̄) + hξ ′ , x̄i + ε′ − (Σ_{i=1}^{m} λi gi )(x̄). (7.29)

By the feasibility of x̄ ∈ C along with the nonnegativity of ε, ε′ , and λi ,


i = 1, 2, . . . , m, the condition (7.29) leads to
ε = 0, ε′ = 0, and Σ_{i=1}^{m} λi gi (x̄) = 0.

Because ε = 0 and ε′ = 0, the condition (7.28) leads to the fact that

0 ∈ ∂f (x̄) + Σ_{i=1}^{m} λi ∂gi (x̄). (7.30)

Further, Σ_{i=1}^{m} λi gi (x̄) = 0 implies that

λi gi (x̄) = 0, i = 1, 2, . . . , m. (7.31)

The conditions (7.30) and (7.31) together yield the KKT optimality condi-
tions.
Now if the relaxed constraint qualification is satisfied, (7.27) reduces to
(0, −f (x̄)) ∈ epi f ∗ + cone co ⋃_{i=1}^{m} epi gi∗ .

By the ε-subdifferential characterization of the epigraph of the conjugate


function, Theorem 2.122, there exist ξ ∈ ∂ε f (x̄), ξi ∈ ∂εi gi (x̄) and λi ≥ 0,
i = 1, 2, . . . , m, with ε, εi ≥ 0, i = 1, 2, . . . , m, such that
(0, −f (x̄)) = (ξ, hξ, x̄i + ε − f (x̄)) + Σ_{i=1}^{m} λi (ξi , hξi , x̄i + εi − gi (x̄)).

Componentwise comparison leads to


0 = ξ + Σ_{i=1}^{m} λi ξi , (7.32)
−f (x̄) = hξ, x̄i + ε − f (x̄) + Σ_{i=1}^{m} λi (hξi , x̄i + εi − gi (x̄)). (7.33)

By the feasibility of x̄ ∈ C along with the nonnegativity of ε, εi , and λi ,


i = 1, 2, . . . , m, the condition (7.33) leads to
ε = 0, Σ_{i=1}^{m} λi εi = 0 and Σ_{i=1}^{m} λi gi (x̄) = 0.

Let us assume that I¯ = {i ∈ {1, 2, . . . , m} : λi > 0} is nonempty. Then
corresponding to any i ∈ I¯, εi = 0 and gi (x̄) = 0, which imply that

ξ ∈ ∂f (x̄), ξi ∈ ∂gi (x̄) and λi gi (x̄) = 0, i ∈ I¯. (7.34)

Therefore, from (7.32) and (7.34),

0 = ξ + Σ_{i∈I¯} λi ξi ∈ ∂f (x̄) + Σ_{i∈I¯} λi ∂gi (x̄).

For i ∉ I¯, choose εi = 0; the above condition leads to

0 ∈ ∂f (x̄) + Σ_{i=1}^{m} λi ∂gi (x̄),

along with the complementary slackness condition λi gi (x̄) = 0, i = 1, 2, . . . , m


and thereby establishing the standard KKT optimality conditions. The reader
should try to see how to arrive at the KKT optimality conditions when I¯ is
empty. The sufficiency can be worked out using Definition 2.77 of subdiffer-
entials, as done in Chapter 3. 
The proof of the KKT optimality conditions under the FM constraint
qualification was given by Jeyakumar, Lee, and Dinh [68] and that using
the relaxed FM condition is based on Jeyakumar [67]. It has been shown by
Jeyakumar, Rubinov, Glover, and Ishizuka [70] that under the Slater con-
straint qualification, the FM constraint qualification holds. We present the
result below proving the same.
Proposition 7.12 Consider the set C given by (3.1). Assume that the Slater
constraint qualification holds, that is, there exists x̂ ∈ Rn such that gi (x̂) < 0,
i = 1, 2, . . . , m. Then the FM constraint qualification is satisfied, that is,
⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ is closed.

Proof. Observe that defining G = (g1 , g2 , . . . , gm ),

⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ = ⋃_{λ∈Rm+} epi (λG)∗ .

Suppose that

(ξk , αk ) → (ξ, α) ∈ cl ⋃_{λ∈Rm+} epi (λG)∗

with (λk G)∗ (ξk ) ≤ αk for some λk ∈ Rm+ . As int Rm+ is nonempty, one can
always find a compact convex set R ⊂ Rm+ such that 0 ∉ R and cone R = Rm+ .
Thus, λk = γk bk , where γk ≥ 0 and bk ∈ R. Assume that γk ≥ 0 for every k
and bk → b ∈ R by the compactness of R. We consider the following cases.
(i) γk → γ > 0: Consider

(λk G)∗ (ξk ) ≤ αk ⇐⇒ (γk bk G)∗ (ξk ) ≤ αk ⇐⇒ (bk G)∗ (ξk /γk ) ≤ αk /γk .

Because bk G → bG, ξk /γk → ξ/γ and αk /γk → α/γ,

(bG)∗ (ξ/γ) ≤ lim inf_{k→∞} (bk G)∗ (ξk /γk ) ≤ α/γ.

Therefore, (ξ/γ, α/γ) ∈ epi (bG)∗ and hence (ξ, α) ∈ ⋃_{λ∈Rm+} epi (λG)∗ .

(ii) γk → +∞: Then ξk /γk → 0 and αk /γk → 0. Therefore,

(bG)∗ (0) ≤ lim inf_{k→∞} (bk G)∗ (ξk /γk ) ≤ 0,

which implies

− inf_{x∈Rn} (bG)(x) = sup_{x∈Rn} (−(bG)(x)) ≤ 0,

that is, (bG)(x) ≥ 0 for every x ∈ Rn . But by the Slater constraint qual-
ification, G(x̂) ∈ −int Rm+ and b 6= 0. Therefore, (bG)(x̂) < 0, which is a
contradiction.
(iii) γk → 0: This implies that λk → 0 and thus (λk G) → 0. Therefore,

0∗ (ξ) ≤ lim inf_{k→∞} (λk G)∗ (ξk ) ≤ α.

Observe that

0∗ (ξ ′ ) = 0 for ξ ′ = 0, and +∞ otherwise,

which leads to ξ = 0 and α ≥ 0. Thus,


(0, α) ∈ epi (0G)∗ ⊂ ⋃_{λ∈Rm+} epi (λG)∗ .

Therefore, the closed cone constraint qualification is satisfied. 


Next we present some examples to show that the FM constraint qualifica-
tion is weaker in comparison to the Slater constraint qualification. Consider
C = {x ∈ R : g(x) ≤ 0}, where g(x) = x2 for x ≤ 0 and g(x) = x for x ≥ 0.

Observe that C = {0} and hence the Slater constraint qualification is not
satisfied. Also TC (0) = {0} while

S(0) = {d ∈ R : g ′ (0, d) ≤ 0}
= {d ∈ R : d ≤ 0},

which implies that the Abadie constraint qualification is also not satisfied. For
ξ ∈ R,

g ∗ (ξ) = 0 for 0 ≤ ξ ≤ 1, g ∗ (ξ) = ξ 2 /4 for ξ ≤ 0, and g ∗ (ξ) = +∞ for ξ > 1.

Observe that as only one constraint is involved,


⋃_{λ≥0} epi (λg)∗ = ⋃_{λ≥0} λ epi g ∗ = cone epi g ∗ .

Therefore, the FM constraint qualification reduces to the set cone epi g ∗ being
closed, which is the same as the relaxed FM constraint qualification. Here,

cone epi g ∗ = {(ξ, α) ∈ R2 : ξ ≤ 0, α > 0} ∪ {(ξ, α) ∈ R2 : ξ ≥ 0, α ≥ 0}

is not closed and hence the FM constraint qualification is not satisfied.
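
The computations in this example are easy to confirm numerically. The following Python
sketch is our own check (not part of the text): it approximates g∗ on a grid and exhibits
points of cone epi g∗ converging to (−1, 0), a point outside the set, so the cone is indeed
not closed.

import numpy as np

# Sketch: g(x) = x**2 for x <= 0 and g(x) = x for x >= 0.
def g(x):
    return np.where(x <= 0, x**2, x)

xs = np.linspace(-50.0, 50.0, 200001)
def g_conj(xi):
    # crude grid approximation of g*(xi) = sup_x {xi*x - g(x)}
    return np.max(xi * xs - g(xs))

print(round(g_conj(0.5), 3))    # ~0.0, matching g*(xi) = 0 for 0 <= xi <= 1
print(round(g_conj(-2.0), 3))   # ~1.0, matching g*(xi) = xi**2/4 for xi <= 0

# (-1, 1/n) = n * (-1/n, 1/n**2) lies in cone epi g*, since 1/n**2 >= g*(-1/n) = 1/(4n**2),
# but the limit (-1, 0) does not: any (xi, a) in epi g* with xi < 0 has a >= xi**2/4 > 0.
for n in [1, 10, 100]:
    print(n, 1.0 / n**2 >= g_conj(-1.0 / n) - 1e-6)   # True for every n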


Now suppose that in the previous example,

g(x) = −2x for x ≤ 0 and g(x) = x for x ≥ 0.

Again, C = {0} and the Slater constraint qualification does not hold. But
unlike the above example, TC (0) = {0} = S(0) which implies that the Abadie
constraint qualification is satisfied. For ξ ∈ R,

g ∗ (ξ) = 0, − 2 ≤ ξ ≤ 1.

Observe that the set

cone epi g ∗ = {(ξ, α) ∈ R2 : ξ ∈ R, α ≥ 0}

is closed. Thus, the FM constraint qualification also holds.


Observe that in the above examples, either both the Abadie constraint
qualification and the FM qualification are satisfied or neither holds. Now let
us consider an example from Jeyakumar, Lee, and Dinh [70] showing that the
FM constraint qualification is weaker than the Abadie constraint qualification
as well. Consider a convex function g : R2 → R defined as
g(x1 , x2 ) = √(x21 + x22 ) − x2 .

Here C = {(x1 , x2 ) ∈ R2 : x1 = 0, x2 ≥ 0}. Observe that the Slater con-


straint qualification does not hold as for any (x1 , x2 ) ∈ C, g(x1 , x2 ) = 0. For
(0, x2 ), x2 > 0, g is differentiable at (0, x2 ) and hence

S(0, x2 ) = R2 while TC (0, x2 ) = {(0, x2 ) : x2 ∈ R}.

Thus the Abadie constraint qualification is also not satisfied. Now, for any
(ξ1 , ξ2 ) ∈ R2 ,

g ∗ (ξ1 , ξ2 ) = 0 for ξ1 = ξ2 = 0, and +∞ otherwise.

Therefore,

cone epi g ∗ = {(0, 0)} × R+ ,

which is closed. Hence, the FM constraint qualification holds, thereby showing


that it is a weaker constraint qualification with respect to the Slater and
Abadie constraint qualifications.
Until now we considered the convex programming problem (CP ) with a
real-valued objective function f . This fact played an important role in the
derivation of Theorem 7.8 as the continuity of f on Rn along with Proposi-
tion 2.124 leads to the closedness of

epi f ∗ + epi δC∗ .

But if f : Rn → R̄ is a proper lsc convex function and C involves inequality


constraints and additionally an abstract constraint, that is,

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X} (7.35)

where X ⊂ Rn is a closed convex set, then one has to impose an additional


condition along with the closed cone constraint qualification to establish the
KKT optimality condition, namely the CC qualification condition, that is,
epi f ∗ + ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ + epi δX∗ is closed.

Next we present the KKT optimality condition in the presence of the CC


qualification condition from Dinh, Nghia, and Vallet [34]. A similar result was
established by Burachik and Jeyakumar [20] under the assumption of CC as
well as FM constraint qualification.

Theorem 7.13 Consider the convex programming problem (CP ) where


f : Rn → R̄ is a proper lsc convex function and the feasible set C is given
by (7.35). Assume that the CC qualification condition is satisfied. Then
x̄ ∈ dom f ∩ C is a point of minimizer of (CP ) if and only if there exist
λi ≥ 0, i = 1, 2, . . . , m, such that
0 ∈ ∂f (x̄) + Σ_{i=1}^{m} λi ∂gi (x̄) + NX (x̄) and λi gi (x̄) = 0, i = 1, 2, . . . , m.

Proof. Suppose that x̄ ∈ dom f ∩ C is a point of minimizer of the problem


(CP ). Then working along the lines of Theorem 7.8, we have

(0, 0) ∈ cl {epi φ∗ + epi δC∗ },

where φ(x) = f (x) − f (x̄). Expressing C = C̄ ∩ X implies that δC = δC̄ + δX ,
where

C̄ = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m}.

From the proof of Theorem 7.8,

epi δC̄∗ = cl ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ .

Therefore, by Theorem 2.123 and Propositions 2.102 and 2.15 (vi), the above
condition becomes

(0, 0) ∈ cl {epi φ∗ + cl (epi δC̄∗ + epi δX∗ )}
       ⊂ cl {epi φ∗ + cl (⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ + epi δX∗ )}
       ⊂ cl {epi φ∗ + ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ + epi δX∗ }.

By Propositions 2.103 and 2.15 (vi), the above yields

(0, −f (x̄)) ∈ cl {epi f ∗ + ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ + epi δX∗ },

which under the CC qualification condition reduces to


(0, −f (x̄)) ∈ epi f ∗ + ⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ + epi δX∗ .

Applying Theorem 2.122, there exist ξf ∈ ∂εf f (x̄), ξg ∈ ∂εg (Σ_{i=1}^{m} λi gi )(x̄),
and ξx ∈ ∂εx δX (x̄) = NX,εx (x̄) with εf , εg , εx ≥ 0 such that

0 = ξf + ξg + ξx , (7.36)
−f (x̄) = (hξf , x̄i − f (x̄) + εf ) + (hξg , x̄i − (Σ_{i=1}^{m} λi gi )(x̄) + εg ) + hξx , x̄i + εx . (7.37)

Condition (7.36) leads to

0 ∈ ∂εf f (x̄) + ∂εg (Σ_{i=1}^{m} λi gi )(x̄) + NX,εx (x̄). (7.38)

Condition (7.37) along with (7.36) and the nonnegativity conditions yields

εf + εg + εx − Σ_{i=1}^{m} λi gi (x̄) = 0.

From the above condition it is obvious that

εf = 0, εg = 0, εx = 0, and λi gi (x̄) = 0, i = 1, 2, . . . , m.

Therefore, the condition (7.38) reduces to

0 ∈ ∂f (x̄) + ∂(Σ_{i=1}^{m} λi gi )(x̄) + NX (x̄).

As dom gi = Rn , i = 1, 2, . . . , m, by Theorem 2.91, the above condition becomes

0 ∈ ∂f (x̄) + Σ_{i=1}^{m} λi ∂gi (x̄) + NX (x̄),

which along with the complementary slackness condition yields the desired
optimality conditions.
Conversely, suppose that the optimality conditions hold. Therefore, there
exist ξ ∈ ∂f (x̄) and ξi ∈ ∂gi (x̄) such that
−ξ − Σ_{i=1}^{m} λi ξi ∈ NX (x̄),

that is,
hξ + Σ_{i=1}^{m} λi ξi , x − x̄i ≥ 0, ∀ x ∈ X.

The convexity of f and gi , i = 1, 2, . . . , m, along with Definition 2.77 of the


subdifferentials, imply that
f (x) − f (x̄) + Σ_{i=1}^{m} λi gi (x) − Σ_{i=1}^{m} λi gi (x̄) ≥ 0, ∀ x ∈ X.

In particular, taking x ∈ C, that is, gi (x) ≤ 0, i = 1, 2, . . . , m, and x ∈ X, the above
condition along with the complementary slackness condition reduces to

f (x) ≥ f (x̄), ∀ x ∈ C.

Thus, x̄ is a point of minimizer of (CP ). 

7.4 Applications to Bilevel Programming Problems


Consider the following bilevel problem:
min f (x) subject to x ∈ C, (BP )
where C is given as

C = argmin{φ(x) : x ∈ Θ},

f, φ : Rn → R are convex functions, and Θ ⊂ Rn is a convex set. Thus


it is clear that C is a convex set and hence the problem (BP ) is a convex
programming problem. As C is the solution set to a subproblem, which is
again a convex optimization problem, here we call (BP ) a simple convex bilevel
programming problem. In particular, (BP ) contains the standard differentiable
convex optimization problem of the form

min f (x) subject to gi (x) ≤ 0, i = 1, 2, . . . , m, Ax = b,

where f, gi : Rn → R, i = 1, 2, . . . , m, are differentiable convex functions, A is


an l × n matrix, and b ∈ Rl . This problem can be posed as the problem (BP )
by defining φ as
φ(x) = ||Ax − b||2 + Σ_{i=1}^{m} || max{0, gi (x)}||2 ,

and the lower-level problem is to minimize φ over Rn .
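
As a small illustration of this reformulation (our own sketch with assumed toy data, not
from the text), the function φ can be written down explicitly and checked to vanish
precisely on the feasible set, so that the lower-level problem of minimizing φ over Rn
recovers the feasible region of the original problem.

import numpy as np

# Assumed toy data: constraint g1(x) = x1 + x2 - 1 <= 0 and Ax = b with A = [1, -1], b = 0.
A = np.array([[1.0, -1.0]])
b = np.array([0.0])

def g1(x):
    return x[0] + x[1] - 1.0

def phi(x):
    # lower-level objective of (BP): squared equality residual plus squared constraint violation
    return float(np.linalg.norm(A @ x - b)**2 + max(0.0, g1(x))**2)

print(phi(np.array([0.3, 0.3])))   # 0.0: feasible point, so phi vanishes
print(phi(np.array([1.0, 0.5])))   # > 0: Ax != b
print(phi(np.array([0.8, 0.8])))   # > 0: g1 > 0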

The bilevel programming problem (BP ) can be equivalently expressed as


a convex programming problem by assuming C to be nonempty and defining

α = inf_{x∈Θ} φ(x).

Then the reformulated problem is given by


min f (x) subject to φ(x) ≤ α, x ∈ Θ. (RP )
Observe that (RP ) has the same form as the convex programming problem
(CP ) studied in the previous section. From the definition of α, it is easy to
see that there does not exist any x̂ ∈ Θ such that φ(x̂) < α, which implies
that the Slater constraint qualification does not hold for (RP ). We present
the KKT optimality condition as a consequence of Theorem 7.13.

Theorem 7.14 Consider the reformulated problem (RP ). Assume that

{cone {(0, 1)} ∪ cone [(0, α) + epi φ∗ ]} + epi δΘ∗
is closed. Then x̄ ∈ Θ is a point of minimizer of (RP ) if and only if there is


λ ≥ 0 such that

0 ∈ ∂f (x̄) + λ∂φ(x̄) + NΘ (x̄) and λ(φ(x̄) − α) = 0.

Proof. Observe that the problem (RP ) is of the type considered in The-
orem 7.13. We can invoke Theorem 7.13 if the CC qualification condition
holds, that is,
epi f ∗ + ⋃_{µ≥0} epi (µ(φ(·) − α))∗ + epi δΘ∗

is closed. As dom f = Rn , by Theorem 2.69, f is continuous on Rn and


thus the CC qualification condition can be replaced by the FM constraint
qualification, that is,
⋃_{µ≥0} epi (µ(φ(·) − α))∗ + epi δΘ∗ (7.39)

is closed. For µ > 0, by Proposition 2.103,

(µ(φ(.) − α))∗ (ξ) = µα + (µφ)∗ (ξ),

which along with Theorem 2.123 leads to

epi (µ(φ(.) − α))∗ = (0, µα) + epi (µφ)∗


= µ((0, α) + epi φ∗ ), ∀ µ > 0. (7.40)

For µ = 0,

(µ(φ(·) − α))∗ (ξ) = 0∗ (ξ) = 0 for ξ = 0, and +∞ otherwise,

which implies

epi (µ(φ(·) − α))∗ = {0} × R+ = cone {(0, 1)}, µ = 0. (7.41)

Using (7.40) and (7.41), the condition (7.39) becomes


cone {(0, 1)} ∪ {⋃_{µ>0} µ((0, α) + epi φ∗ )} + epi δΘ∗ .

Observe that cone{(0, 1)}∪{(0, 0)} = cone{(0, 1)} and thus the above becomes

{cone {(0, 1)} ∪ cone ((0, α) + epi φ∗ )} + epi δΘ∗ . (7.42)

By the hypothesis of the theorem, (7.42) is a closed set and thus the reformu-
lated problem (RP ) satisfies the FM constraint qualification. Now invoking
Theorem 7.13, there exists λ ≥ 0 such that

0 ∈ ∂f (x̄) + λ∂(φ(.) − α)(x̄) + NΘ (x̄) and λ(φ(x̄) − α) = 0.

As ∂(φ(·) − α)(x̄) = ∂φ(x̄), the optimality condition reduces to

0 ∈ ∂f (x̄) + λ∂φ(x̄) + NΘ (x̄),

thereby establishing the desired result. The converse can be proved as in Chap-
ter 3. 
For a better understanding of the above result, consider the bilevel pro-
gramming problem where f (x) = x2 + 1, Θ = [−1, 1], and φ(x) = max{0, x}.
Observe that C = [−1, 0] and α = 0. Thus the reformulated problem is

min x2 + 1 subject to max {0, x} ≤ 0, x ∈ [−1, 1].

For ξ ∈ R,

φ∗ (ξ) = +∞ for ξ < 0 or ξ > 1, and φ∗ (ξ) = 0 for ξ ∈ [0, 1],

which implies

epi φ∗ = {(ξ, γ) ∈ R2 : ξ ∈ [0, 1], γ ≥ 0} = [0, 1] × R+



while epi δΘ∗ = epi |·|. Therefore,

cone epi φ∗ + epi δΘ∗ = R2+ ∪ {(ξ, γ) ∈ R2 : ξ ≤ 0, γ ≥ −ξ},

which is a closed set. Because cone{(0, 1)} ⊂ cone epi φ∗ , the reformulated
problem satisfies the qualification condition in Theorem 7.14. It is easy to see
that x̄ = 0 is a solution of the bilevel problem with NΘ (0) = {0}, ∂f (0) = {0},
and ∂φ(0) = [0, 1]. Thus the KKT optimality conditions of Theorem 7.14 are

satisfied with λ = 0. Note that the Slater condition fails to hold for the
reformulated problem.
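
The example can also be checked numerically. The following Python sketch is our own
verification (not part of the text) of the reformulated problem and of the multiplier
λ = 0 found above.

import numpy as np

# f(x) = x**2 + 1, Theta = [-1, 1], phi(x) = max(0, x), alpha = 0.
xs = np.linspace(-1.0, 1.0, 20001)
feasible = xs[np.maximum(0.0, xs) <= 0.0]          # the feasible set of (RP): [-1, 0]
x_bar = feasible[np.argmin(feasible**2 + 1.0)]
print(x_bar)                                       # 0.0, the minimizer claimed above

# KKT data at x_bar = 0: grad f(0) = 0, N_Theta(0) = {0}, subdifferential of phi at 0 = [0, 1].
# The inclusion 0 in {0} + lambda*[0, 1] + {0} holds with lambda = 0, as stated in the text.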
We end this chapter by presenting the optimality conditions for the bilevel
programming problem
inf f (x) subject to x ∈ C, (BP 1)
where C is the solution set of the lower-level problem

min φ(x) subject to gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X.

Here, φ : Rn → R̄ is a proper, convex, lsc function and gi : Rn → R,


i = 1, 2, . . . , m, are convex functions, and X ⊂ Rn is a closed convex set.
Define

α = inf{φ(x) : gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X} < +∞.

Without loss of generality, assume that α = 0. This can be achieved by setting


φ(x) = φ(x) − α. Then the bilevel programming problem (BP 1) is equivalent
to the following optimization problem:
min f (x) subject to φ(x) ≤ 0, gi (x) ≤ 0, i = 1, 2, . . . , m, x ∈ X. (RP 1)
Below we present the result on optimality conditions for the bilevel program-
ming problem (BP 1).

Theorem 7.15 Consider the bilevel programming problem (BP 1). Assume
that
{⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ } ∪ {⋃_{λ0 >0} ⋃_{λ∈Rm+} λ0 epi φ∗ + epi (Σ_{i=1}^{m} λi gi )∗ } + epi δX∗

is closed. Then x̄ ∈ C is a point of minimizer of (BP 1) if and only if there


exist λ0 ≥ 0 and λi ≥ 0, i = 1, 2, . . . , m, such that
0 ∈ ∂f (x̄) + λ0 ∂φ(x̄) + ∂(Σ_{i=1}^{m} λi gi )(x̄) + NX (x̄),
λ0 φ(x̄) = 0 and λi gi (x̄) = 0, i = 1, 2, . . . , m.
Proof. Observe that for any λ̃ = (λ0 , λ) ∈ R+ × Rm+ ,

(λ̃g)(x) = λ0 φ(x) + Σ_{i=1}^{m} λi gi (x).

Therefore,
epi (λ̃g)∗ = cl {epi (λ0 φ)∗ + epi (Σ_{i=1}^{m} λi gi )∗ }.

As dom gi = Rn , i = 1, 2, . . . , m, dom (λi gi ) = Rn , i = 1, 2, . . . , m, which by


Theorem 2.69 are continuous on Rn . By Proposition 2.124,
epi (λ̃g)∗ = epi (λ0 φ)∗ + epi (Σ_{i=1}^{m} λi gi )∗ . (7.43)

Now consider the two cases, namely λ0 = 0 and λ0 > 0. For λ0 = 0,


epi (λ0 φ)∗ = cone {(0, 1)}. Thus, the condition (7.43) reduces to
epi (λ̃g)∗ = cone {(0, 1)} + epi (Σ_{i=1}^{m} λi gi )∗ .

Observe that for µ ≥ 0,


µ(0, 1) + (ξ, α) = (ξ, α + µ) ∈ epi (Σ_{i=1}^{m} λi gi )∗ ,

where (ξ, α) ∈ epi (Σ_{i=1}^{m} λi gi )∗ . Because µ ≥ 0 was arbitrary,
i=1

cone {(0, 1)} + epi (Σ_{i=1}^{m} λi gi )∗ ⊂ epi (Σ_{i=1}^{m} λi gi )∗ .

Also, for any (ξ, α) ∈ epi (Σ_{i=1}^{m} λi gi )∗ ,

(ξ, α) = (0, 0) + (ξ, α) ∈ cone {(0, 1)} + epi (Σ_{i=1}^{m} λi gi )∗ .

As (ξ, α) ∈ epi (Σ_{i=1}^{m} λi gi )∗ was arbitrary,

epi (Σ_{i=1}^{m} λi gi )∗ ⊂ cone {(0, 1)} + epi (Σ_{i=1}^{m} λi gi )∗ ,

thereby implying that


cone {(0, 1)} + epi (Σ_{i=1}^{m} λi gi )∗ = epi (Σ_{i=1}^{m} λi gi )∗ .

Thus, for λ0 = 0,
epi (λ̃g)∗ = epi (Σ_{i=1}^{m} λi gi )∗ .

For the case when λ0 > 0, the condition (7.43) becomes


epi (λ̃g)∗ = λ0 epi φ∗ + epi (Σ_{i=1}^{m} λi gi )∗ .

Therefore,

⋃_{λ̃∈R^{1+m}_+} epi (λ̃g)∗ + epi δX∗ = {⋃_{λ∈Rm+} epi (Σ_{i=1}^{m} λi gi )∗ } ∪
{⋃_{λ0 >0} ⋃_{λ∈Rm+} λ0 epi φ∗ + epi (Σ_{i=1}^{m} λi gi )∗ } + epi δX∗ .

By the given hypothesis, the set


⋃_{λ̃∈R^{1+m}_+} epi (λ̃g)∗ + epi δX∗

is closed. Hence, the FM constraint qualification holds for the problem (RP 1).
Because dom f = Rn , by Theorem 2.69 f is continuous on Rn , and thus the CC
qualification condition holds for (RP 1). As the bilevel problem (BP 1) is equivalent to
(RP 1), by Theorem 7.13, x̄ ∈ C is a point of minimizer of (BP 1) if and only
if there exists λ̃ = (λ0 , λ) ∈ R+ × Rm+ such that

0 ∈ ∂f (x̄) + ∂(λ̃g)(x̄) + NX (x̄) and (λ̃g)(x̄) = 0. (7.44)
As φ is proper convex, λ0 φ is also proper convex. Therefore, dom (λ0 φ) is
a nonempty convex set in Rn . By Proposition 2.14 (i), ri dom (λ0 φ) is non-
empty. Because dom g = Rn , dom (λg) = Rn . Now invoking the Sum Rule,
Theorem 2.91,
∂(λ̃g)(x̄) = λ0 ∂φ(x̄) + ∂(Σ_{i=1}^{m} λi gi )(x̄).

Thus the optimality condition in (7.44) becomes


0 ∈ ∂f (x̄) + λ0 ∂φ(x̄) + ∂(Σ_{i=1}^{m} λi gi )(x̄) + NX (x̄). (7.45)

By the complementary slackness condition in (7.44),


(λ̃g)(x̄) = λ0 φ(x̄) + Σ_{i=1}^{m} λi gi (x̄) = 0.

As (λ0 , λ) ∈ R+ × Rm+ , this along with the feasibility of x̄ yields that

λ0 φ(x̄) = 0 and λi gi (x̄) = 0, i = 1, 2, . . . , m.


The above condition together with (7.45) leads to the requisite conditions.
The converse can be proved as in Chapter 3. 

Chapter 8
Representation of the Feasible Set and
KKT Conditions

8.1 Introduction
Until now, we discussed the convex programming problem (CP ) with the
convex feasible set C given by (3.1), that is,
C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2 . . . , m},
where gi : Rn → R, i = 1, 2, . . . , m, are convex functions and its variations like
(CP 1) and (CCP ). But is the convexity of the functions forming the convex
feasible set C important? For example, assume C as a subset of R2 given by
C = {(x1 , x2 ) ∈ R2 : 1 − x1 x2 ≤ 0, x1 ≥ 0}.
This set is convex even though g(x1 , x2 ) = 1−x1 x2 is a nonconvex function. As
stated in Chapter 1, convex optimization basically means minimizing a convex
function over a convex set with no emphasis on as to how the feasible set is
obtained. Very recently (2010), Lasserre [74] published a very interesting paper
discussing this aspect of convex feasibility for smooth convex optimization.

8.2 Smooth Case


In this section, we turn our attention to the case of smooth convex optimiza-
tion studied by Lasserre [74]. From Chapter 2 we know that when a convex
function is differentiable, then ∂φ(x) = {∇φ(x)} and its gradient is also con-
tinuous; thus any differentiable convex function is smooth. So one can obtain
the KKT optimality conditions at the point of minimizer from the subdif-
ferential optimality conditions discussed in Chapter 3; that is, if x̄ is the
point of minimizer of (CP ) with (C) given by (3.1), then there exist λi ≥ 0,
i = 1, 2, . . . , m, such that
∇f (x̄) + Σ_{i=1}^{m} λi ∇gi (x̄) = 0 and λi gi (x̄) = 0, i = 1, 2, . . . , m.

Observe that the KKT conditions for smooth convex optimization problems
look absolutely the same as the KKT conditions for the usual smooth opti-
mization problem. As discussed in earlier chapters, under certain constraint
qualifications like the Slater constraint qualification, the above KKT condi-
tions are necessary as well as sufficient.
Lasserre observed that the convex feasible set C of (CP ) need not always
be defined by convex inequality constraints as in the above example. The
question that Lasserre answers is “in such a scenario what conditions would
make the KKT optimality conditions necessary as well as sufficient? ” So now
the convex set C given by (3.1) is considered, with the only difference that
gi , i = 1, 2, . . . , m, need not be convex even though they are assumed to
be smooth. Lasserre showed that if the Slater constraint qualification and
an additional nondegeneracy condition hold, then the KKT condition is both
necessary and sufficient. Though Lasserre defined the notion of nondegeneracy
for every point of the set C, we define it for a particular point and extend it
to the feasible set C.

Definition 8.1 The nondegeneracy condition is said to hold at x̄ ∈ C if

∇gi (x̄) 6= 0, ∀ i ∈ I(x̄),

where I(x̄) = {i ∈ {1, 2, . . . , m} : gi (x̄) = 0} denotes the active index set at x̄.
The set C is said to satisfy the nondegeneracy condition if it holds for every
x̄ ∈ C.

The Slater constraint qualification along with the nondegeneracy condi-


tion gives the following interesting characterization of a convex set given by
Lasserre [74].

Theorem 8.2 Consider the set C given by (3.1) where gi , i = 1, 2, . . . , m,


are smooth. Assume that the Slater constraint qualification is satisfied, that is,
there exists x̂ ∈ Rn such that gi (x̂) < 0, i = 1, 2, . . . , m, and the nondegeneracy
condition holds for C. Then C is convex if and only if

h∇gi (x), y − xi ≤ 0, ∀ x, y ∈ C, ∀ i ∈ I(x). (8.1)

Proof. Suppose that C is a convex set and consider x̄ ∈ C. Therefore, for any
y ∈ C, for every λ ∈ [0, 1], x̄ + λ(y − x̄) ∈ C, that is,

gi (x̄ + λ(y − x̄)) ≤ 0, ∀ i = 1, 2, . . . , m.

Now for i ∈ I(x̄),


gi (x̄ + λ(y − x̄)) − gi (x̄)
lim ≤ 0,
λ↓0 λ
that is, for every i ∈ I(x̄),

h∇gi (x̄), y − x̄i ≤ 0, ∀ y ∈ C.

Because x̄ ∈ C is arbitrary, the above inequality holds for every x̄ ∈ C, thereby


establishing the desired inequality.
Conversely, suppose that the condition (8.1) holds. Observe that C has an
interior because the Slater constraint qualification holds. Further, (8.1) along
with the nondegeneracy condition of the set C implies that each boundary
point of C has a nontrivial supporting hyperplane. The supporting hyper-
plane is nontrivial due to the non-degeneracy condition on C. Using Proposi-
tion 2.29, C is a convex set. 
Observe that the nondegeneracy condition of the set C is required only in
the sufficiency part of the proof.
Till now in the book, we have mainly dealt with the Slater constraint
qualification and some others, namely the Abadie, pseudonormality, and
FM constraint qualifications. Another well-known constraint qualification for
the convex programming problem (CP ) is the Mangasarian–Fromovitz con-
straint qualification. For (CP ) with C given by (3.1) in the smooth scenario,
Mangasarian–Fromovitz constraint qualification is said to hold at x̄ ∈ Rn if
there exists d ∈ Rn such that

h∇gi (x̄), di < 0, ∀ i ∈ I(x̄).

One may observe that if this constraint qualification is satisfied for x̄ ∈ C,


then ∇gi (x̄) 6= 0 for every i ∈ I(x̄), thereby ensuring that the nondegeneracy
condition holds at x̄. But the converse need not hold, that is, the nondegeneracy
condition need not imply the Mangasarian–Fromovitz constraint qualification.
We verify this claim by the following example. Consider the set C ⊂ R2 given
by

C = {(x1 , x2 ) ∈ R2 : 1 − x1 x2 ≤ 0, x1 + x2 − 2 ≤ 0, x1 ≥ 0}.

Here, g1 (x1 , x2 ) = 1 − x1 x2 , g2 (x1 , x2 ) = x1 + x2 − 2 and g3 (x1 , x2 ) = −x1 .


Note that C = {(1, 1)} and thus trivially is a convex set. At x̄ = (1, 1), the
active index set is I(x̄) = {1, 2} and
   
∇g1 (x̄) = (−1, −1) 6= 0 and ∇g2 (x̄) = (1, 1) 6= 0,

which implies that the nondegeneracy condition is satisfied for C = {x̄}. But
observe that there exists no (d1 , d2 ) ∈ R2 satisfying

−d1 − d2 < 0 and d1 + d2 < 0

simultaneously, thereby not satisfying the Mangasarian–Fromovitz constraint


qualification at x̄.
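
This example is simple enough to check directly; the short Python sketch below (our own
illustration, not from the text) evaluates the active gradients at x̄ = (1, 1) and searches
for a Mangasarian–Fromovitz direction, finding none.

import numpy as np

# g1(x) = 1 - x1*x2, g2(x) = x1 + x2 - 2, g3(x) = -x1; at x_bar = (1, 1) the active set is {1, 2}.
x_bar = np.array([1.0, 1.0])
grad_g1 = np.array([-x_bar[1], -x_bar[0]])   # = (-1, -1), nonzero: nondegeneracy holds
grad_g2 = np.array([1.0, 1.0])               # = ( 1,  1), nonzero

# Mangasarian-Fromovitz needs d with <grad_g1, d> < 0 and <grad_g2, d> < 0; since
# grad_g1 = -grad_g2 these inequalities are incompatible, as the random search confirms.
rng = np.random.default_rng(0)
found = any(grad_g1 @ d < 0 and grad_g2 @ d < 0 for d in rng.normal(size=(10000, 2)))
print(found)   # False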
We end this section by presenting the result from Lasserre [74] establishing
the necessary and sufficient optimality condition for a minimizer of (CP ) over
C with gi , i = 1, 2, . . . , m, nonconvex smooth functions. As one will observe

from the result below, the nondegeneracy condition is required only for the
necessary part at the given point and not the set as mentioned in the statement
of the theorem in Lasserre [74]. Also in the converse part, we require the
necessary part of Theorem 8.2, which is independent of the nondegeneracy
condition.

Theorem 8.3 Consider the problem (CP ) where f is smooth and C is given
by (3.1), where gi , i = 1, 2, . . . , m, are smooth but need not be convex. Assume
that the Slater constraint qualification is satisfied and the nondegeneracy con-
dition holds at x̄ ∈ C. Then x̄ is a point of minimizer of (CP ) if and only if
there exist λi ≥ 0, i = 1, 2, . . . , m, such that
∇f (x̄) + Σ_{i=1}^{m} λi ∇gi (x̄) = 0 and λi gi (x̄) = 0, i = 1, 2, . . . , m.

Proof. Let x̄ be a point of minimizer of f over C. By the Fritz John optimality


conditions, Theorem 5.1, there exist λ0 ≥ 0, λi ≥ 0, i = 1, 2, . . . , m, not all
simultaneously zero such that
λ0 ∇f (x̄) + Σ_{i=1}^{m} λi ∇gi (x̄) = 0 and λi gi (x̄) = 0, i = 1, 2, . . . , m.

Suppose that λ0 = 0, which implies for some i ∈ {1, 2, . . . , m}, λi > 0. There-
fore, the set

I¯ = {i ∈ {1, 2, . . . , m} : λi > 0}

is nonempty. By the complementary slackness condition,


¯
gi (x̄) = 0, ∀ i ∈ I,

which implies I¯ ⊂ I(x̄). By the optimality condition,


Σ_{i∈I¯} λi ∇gi (x̄) = 0,

which implies that

Σ_{i∈I¯} λi h∇gi (x̄), x − x̄i = 0, ∀ x ∈ C.

As the Slater constraint qualification is satisfied, there exists x̂ ∈ Rn such


that gi (x̂) < 0 for every i = 1, 2, . . . , m. As dom gi = Rn , i = 1, 2, . . . , m,
by Theorem 2.69, gi , i = 1, 2, . . . , m, are continuous on Rn . Thus there exists
δ > 0 such that for every x ∈ Bδ (x̂) and every i = 1, 2, . . . , m, gi (x) < 0, that
is, Bδ (x̂) ⊂ int C. By the preceding equality,
Σ_{i∈I¯} λi h∇gi (x̄), x − x̄i = 0, ∀ x ∈ Bδ (x̂). (8.2)

Since I¯ ⊂ I(x̄), the convexity of C along with Theorem 8.2 yields that for
every i ∈ I¯,

h∇gi (x̄), x − x̄i ≤ 0, ∀ x ∈ C,

which along with the condition (8.2) implies that for i ∈ I¯,

h∇gi (x̄), x − x̄i = 0, ∀ x ∈ Bδ (x̂). (8.3)

Because x̂ ∈ Bδ (x̂), the condition (8.3) reduces to

h∇gi (x̄), x̂ − x̄i = 0, ∀ i ∈ I¯. (8.4)

For any d ∈ Rn , consider the vector x̂ + λd such that for λ > 0 sufficiently
small, x̂ + λd ∈ Bδ (x̂). Hence, by the condition (8.3), for each i ∈ I¯,

h∇gi (x̄), x̂ + λd − x̄i = 0,

which implies

h∇gi (x̄), x̂ − x̄i + λh∇gi (x̄), di = 0.

By condition (8.4), for every i ∈ I¯,

h∇gi (x̄), di = 0, ∀ d ∈ Rn .

Hence, ∇gi (x̄) = 0 for every i ∈ I¯ ⊂ I(x̄), thereby contradicting the non-
degeneracy condition at x̄. Thus, λ0 6= 0. Dividing the Fritz John optimality
condition by λ0 , the KKT optimality condition is established at x̄ as

∇f (x̄) + Σ_{i=1}^{m} λ̄i ∇gi (x̄) = 0 and λ̄i gi (x̄) = 0, i = 1, 2, . . . , m,

where λ̄i = λi /λ0 , i = 1, 2, . . . , m.
Conversely, suppose that x̄ satisfies the KKT optimality conditions. As-
sume that x̄ is not a point of minimizer of (CP ). Therefore, there exists x ∈ C
such that f (x) < f (x̄), which along with the convexity of f implies that

0 > f (x) − f (x̄) ≥ h∇f (x̄), x − x̄i.

Therefore, by the KKT optimality conditions,


0 > f (x) − f (x̄) ≥ − Σ_{i=1}^{m} λi h∇gi (x̄), x − x̄i. (8.5)

If λi = 0 for every i = 1, 2, . . . , m, we reach a contradiction. Now assume that


I¯ 6= ∅. By Theorem 8.2, for every i ∈ I¯ ⊂ I(x̄),

h∇gi (x̄), x − x̄i ≤ 0, ∀ x ∈ C,

which implies that


− Σ_{i=1}^{m} λi h∇gi (x̄), x − x̄i = − Σ_{i∈I¯} λi h∇gi (x̄), x − x̄i ≥ 0, ∀ x ∈ C,

thereby contradicting the condition (8.5) and thus leading to the requisite
result, that is, x̄ is a point of minimizer of f over C. 

8.3 Nonsmooth Case


Motivated by the above work of Lasserre [74], Dutta and Lalitha [40] extended
the study to a nonsmooth scenario involving the locally Lipschitz function.
But before we move on with the work done in this respect, we need some tools
for nonsmooth Lipschitz functions.
Consider a locally Lipschitz function φ : Rn → R. The Clarke directional
derivative of φ at x̄ in the direction d ∈ Rn is defined as
φ◦ (x̄, d) = lim sup_{x→x̄, λ↓0} [φ(x + λd) − φ(x)]/λ.
The Clarke directional derivative is a sublinear function of the direction d.
In Section 3.6 we defined the Clarke subdifferential using the Rademacher
Theorem. Here, we express the Clarke subdifferential of φ at x̄ using the
Clarke directional derivative defined above as

∂ ◦ φ(x̄) = {ξ ∈ Rn : φ◦ (x̄, d) ≥ hξ, di, ∀ d ∈ Rn }.

The function φ is said to be regular at x̄ if for every d ∈ Rn , the directional


derivative φ′ (x̄, d) exists and

φ◦ (x̄, d) = φ′ (x̄, d), ∀ d ∈ Rn .

Every convex function is regular.


In the nonsmooth scenario, Dutta and Lalitha [40] considered the con-
vex feasible set C of (CP ) to be defined by inequality constraints involving
nonsmooth locally Lipschitz functions that are regular. For example, consider

φ(x) = max{φ1 (x), φ2 (x), . . . , φm (x)},

where φi : Rn → R, i = 1, 2, . . . , m, are smooth functions. Then φ is a locally


Lipschitz regular function.
Now, similar to the nondegeneracy condition in the smooth case given by
Lasserre [74], Dutta and Lalitha [40] defined the notion for nonsmooth locally
Lipschitz scenario as follows.

Definition 8.4 Consider the set C given by (3.1) where each gi ,


i = 1, 2, . . . , m is a locally Lipschitz function. The set C is said to satisfy
the nondegeneracy condition at x̄ ∈ C if

0 6∈ ∂ ◦ gi (x̄), ∀ i ∈ I(x̄).

If the condition holds for every x̄ ∈ C, the nondegeneracy condition is said to


hold for the set C.

Before moving on to discuss the results obtained in this work, we present


some examples from Dutta and Lalitha [40] to have a look at the above non-
degeneracy condition.
Consider the set

C = {x ∈ R : g0 (x) ≤ 0},

where g0 (x) = max{x3 , x} − 1. Hence C = (−∞, 1]. At the boundary point


x̄ = 1, I(x̄) = {0} where ∂ ◦ g0 (x̄) = [1, 3], thereby satisfying the nonde-
generacy condition. Now if we define the function g0 (x) = max{x3 , x}, then
C = (−∞, 0] with boundary point x̄ = 0 at which ∂ ◦ g0 (x̄) = [0, 1]. Thus, the
nondegeneracy condition is not satisfied at x̄. Observe that in both the cases
g0 is a regular function and the Slater constraint qualification is also satisfied.
Yet in the second scenario the nondegeneracy condition is not satisfied.
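
The two intervals quoted for ∂ ◦ g0 can be recovered numerically from the gradient-limit
description of the Clarke subdifferential recalled in Section 7.2. The Python sketch below
is our own illustration (not from the text): it samples derivatives of max{x3 , x} at nearby
points of differentiability and reports their range (subtracting the constant 1 does not
change the derivatives).

import numpy as np

def deriv_max_cube_lin(x):
    # derivative of max(x**3, x) at a point where x**3 != x
    return 3.0 * x**2 if x**3 > x else 1.0

def approx_clarke(x_bar, radius=1e-3, n=2001):
    pts = x_bar + np.linspace(-radius, radius, n)
    pts = pts[np.abs(pts**3 - pts) > 1e-12]          # keep points of differentiability
    grads = np.array([deriv_max_cube_lin(x) for x in pts])
    return grads.min(), grads.max()                  # the interval spanned by nearby gradients

print(approx_clarke(1.0))   # roughly (1.0, 3.0): matches the interval [1, 3] quoted above
print(approx_clarke(0.0))   # roughly (0.0, 1.0): matches [0, 1]; since 0 lies in it,
                            # the nondegeneracy condition fails at x_bar = 0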
But if the functions gi , i = 1, 2, . . . , m, involved are convex and the Slater
constraint qualification holds for (CP ), then the Mangasarian–Fromovitz con-
straint qualification for the nonsmooth case is satisfied at x̄ ∈ C, that is there
exists d ∈ Rn such that

gi′ (x̄, d) < 0, ∀ i ∈ I(x̄).

As the directional derivative is a support function to the subdifferential set,


Theorem 2.79, the above condition is equivalent to

hξi , di < 0, ∀ ξi ∈ ∂gi (x̄), ∀ i ∈ I(x̄),

from which it is obvious that the nondegeneracy condition is ensured for the
convex nonsmooth scenario.
Next we present the equivalent characterization of the convex set C under
the nonsmooth scenario.

Theorem 8.5 Consider the set C be given by (3.1) represented by nonsmooth


locally Lipschitz inequality constraints where gi , i = 1, 2, . . . , m, are regular.
Assume that the Slater constraint qualification holds and satisfies the nonde-
generacy condition. Then C is convex if and only if

gi◦ (x, y − x) ≤ 0, ∀ x, y ∈ C, ∀ i ∈ I(x). (8.6)

Proof. Consider the convex set C. Working along the lines of Theorem 8.2,
for arbitrary but fixed x̄ ∈ C, for λ ∈ (0, 1),

[gi (x̄ + λ(y − x̄)) − gi (x̄)]/λ ≤ 0, ∀ i ∈ I(x̄).
As the functions gi , i = 1, 2, . . . , m, are locally Lipschitz regular functions,

gi◦ (x̄, y − x̄) = gi′ (x̄, y − x̄) ≤ 0, ∀ y ∈ C, ∀ i ∈ I(x̄),

thus leading to the requisite result.


Conversely, suppose that (8.6) holds. As the Slater constraint qualification
holds, the set C has an interior. Now consider any boundary point x ∈ C. By
the condition (8.6) along with the fact that the Clarke directional derivative
is the support function of the Clarke subdifferential, then for every y ∈ C,

hξi , y − xi ≤ gi◦ (x, y − x) ≤ 0, ∀ ξi ∈ ∂ ◦ gi (x), ∀ i ∈ I(x).

As the nondegeneracy condition is satisfied, ξi 6= 0 for every ξi ∈ ∂ ◦ gi (x) and


every i ∈ I(x), which implies that there is a nontrivial supporting hyperplane
to C at x. Hence, by Proposition 2.29, C is a convex set, as desired. 
Now we present the theorem establishing the necessary and sufficient op-
timality conditions for the class of problem (CP ) dealt with in this section.

Theorem 8.6 Consider the problem (CP ) with C is given by (3.1), where
gi , i = 1, 2, . . . , m, are locally Lipschitz regular functions. Assume that the
Slater constraint qualification holds and the nondegeneracy condition is sat-
isfied at x̄ ∈ C. Then x̄ is a point of minimizer of (CP ) if and only if there
exist λi ≥ 0, i = 1, 2, . . . , m, such that
0 ∈ ∂f(x̄) + Σ_{i=1}^m λi ∂◦gi(x̄) and λi gi(x̄) = 0, i = 1, . . . , m.

Proof. Suppose that x̄ is a point of minimizer of f over C. We know by


Theorem 2.72 that a convex function f is locally Lipschitz. Then by the op-
timality conditions for locally Lipschitz functions at x̄, there exist λi ≥ 0,
i = 0, 1, . . . , m, not all simultaneously zero, such that
0 ∈ λ0 ∂◦f(x̄) + Σ_{i=1}^m λi ∂◦gi(x̄) and λi gi(x̄) = 0, i = 1, 2, . . . , m.

Because f is convex, ∂ ◦ f (x̄) = ∂f (x̄). Therefore, the optimality condition can


be rewritten as
0 ∈ λ0 ∂f(x̄) + Σ_{i=1}^m λi ∂◦gi(x̄).


We claim that λ0 ≠ 0. On the contrary, suppose that λ0 = 0. As λi,
i = 0, 1, . . . , m, are not all zero, the set Ī = {i ∈ {1, 2, . . . , m} : λi > 0}
is nonempty. Then the above optimality condition reduces to

0 ∈ Σ_{i∈Ī} λi ∂◦gi(x̄),

which implies there exist ξi ∈ ∂◦gi(x̄), i ∈ Ī, such that

0 = Σ_{i∈Ī} λi ξi.

From the definition of the Clarke subdifferential, the above condition leads to
Σ_{i∈Ī} λi gi◦(x̄, d) ≥ Σ_{i∈Ī} λi ⟨ξi, d⟩ = 0, ∀ d ∈ Rn. (8.7)

As the Slater constraint qualification is satisfied, there exists x̂ ∈ Rn such
that gi(x̂) < 0 for every i = 1, . . . , m. Also, gi, i = 1, 2, . . . , m, are locally
Lipschitz and hence continuous. Thus there exists δ > 0 such that for every
x ∈ Bδ(x̂), gi(x) < 0, i = 1, . . . , m. In condition (8.7), in particular, taking
d = x − x̄ where x ∈ Bδ(x̂) ⊂ C,

Σ_{i∈Ī} λi gi◦(x̄, x − x̄) ≥ 0, ∀ x ∈ Bδ(x̂). (8.8)

By the complementary slackness condition, gi(x̄) = 0 for every i ∈ Ī, that is,
Ī ⊂ I(x̄). Therefore, by Theorem 8.5, as C is a convex set, we have

gi◦(x̄, x − x̄) ≤ 0, ∀ x ∈ Bδ(x̂), ∀ i ∈ Ī,

which along with the condition (8.8) implies that for every i ∈ Ī,

gi◦(x̄, x − x̄) = 0, ∀ x ∈ Bδ(x̂). (8.9)

In particular, for x̂ ∈ Bδ(x̂), the above condition reduces to

gi◦(x̄, x̂ − x̄) = 0, ∀ i ∈ Ī. (8.10)

Consider any v ∈ Rn and choose λ > 0 sufficiently small such that
x̂ + λv ∈ Bδ(x̂). Hence, from the condition (8.9), for every i ∈ Ī,

gi◦(x̄, x̂ + λv − x̄) = 0, ∀ v ∈ Rn.

As the Clarke generalized directional derivative is sublinear in the direction,
for every i ∈ Ī the above condition becomes

gi◦(x̄, x̂ − x̄) + λ gi◦(x̄, v) ≥ 0, ∀ v ∈ Rn,


which by (8.10) leads to

gi◦(x̄, v) ≥ 0, ∀ v ∈ Rn, ∀ i ∈ Ī.

From the definition of the Clarke subdifferential, 0 ∈ ∂◦gi(x̄) for every i ∈ Ī,
thereby contradicting the nondegeneracy condition. Therefore λ0 ≠ 0, and
dividing the optimality condition throughout by λ0 reduces it to

0 ∈ ∂f(x̄) + Σ_{i=1}^m λ̄i ∂◦gi(x̄) and λ̄i gi(x̄) = 0, i = 1, 2, . . . , m,

where λ̄i = λi/λ0, i = 1, 2, . . . , m, leading to the requisite result.
Conversely, suppose that the conditions hold at x̄. On the contrary, assume
that x̄ is not a point of minimizer of f over C. Thus, there exists x ∈ C such
that f (x) < f (x̄), which along with the convexity of f ,

0 > f (x) − f (x̄) ≥ hξ, x − x̄i, ∀ ξ ∈ ∂f (x̄). (8.11)

Using the optimality conditions at x̄, there exist ξ0 ∈ ∂f(x̄) and ξi ∈ ∂◦gi(x̄),
i = 1, 2, . . . , m, such that

0 = ξ0 + Σ_{i=1}^m λi ξi.

The above condition along with (8.11) leads to

0 > −Σ_{i=1}^m λi ⟨ξi, x − x̄⟩,

which by the definition of the Clarke subdifferential along with Theorem 8.5 yields

0 > −Σ_{i=1}^m λi gi◦(x̄, x − x̄) ≥ 0,

thereby leading to a contradiction. Therefore, x̄ is a point of minimizer of


(CP ). 
We end this chapter with an example from Dutta and Lalitha [40] to
illustrate that in the absence of the nondegeneracy condition, even though the
Slater constraint qualification and the regularity of the constraint functions
hold, the KKT optimality condition need not be satisfied.
Consider the problem

min f (x) subject to g1 (x) ≤ 0, g2 (x) ≤ 0


where

f(x) = −x, g1(x) = x³, and g2(x) = −x − 1 for x ≤ 0, g2(x) = −1 for x > 0.

Then the feasible set is C = [−1, 0] and the point of minimizer is x̄ = 0.


Also, C does not satisfy the nondegeneracy condition but the Slater constraint
qualification holds along with the constraint functions being regular. Observe
that ∂f (x̄) = {−1}, ∂ ◦ g1 (x̄) = {0} and ∂ ◦ g2 (x̄) = ∂g2 (x̄) = [−1, 0], and thus
the KKT optimality conditions are not satisfied. Now if in the above example
one takes the objective function to be f (x) = x, then the point of minimizer is
x̄ = −1 at which ∂f (x̄) = {1}, ∂ ◦ g1 (x̄) = {3}, and ∂ ◦ g2 (x̄) = ∂g2 (x̄) = {−1}.
Observe that the KKT optimality conditions hold with λ1 = 0 and λ2 = 1.
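The failure and success of the KKT inclusion in this example can also be checked numerically. The sketch below is illustrative only; the interval representation of the subdifferentials and the multiplier grid are our own devices, not part of the original development.

```python
# Illustrative sketch (not from the text): grid check of the KKT inclusion
# 0 in df + l1*dg1 + l2*dg2, with each subdifferential written as an interval.
import numpy as np

def contains_zero(df, dg1, dg2, lams):
    # df, dg1, dg2 are (lo, hi) intervals; since the multipliers are nonnegative,
    # the endpoints of the Minkowski sum are obtained by adding endpoints.
    for l1 in lams:
        for l2 in lams:
            lo = df[0] + l1 * dg1[0] + l2 * dg2[0]
            hi = df[1] + l1 * dg1[1] + l2 * dg2[1]
            if lo <= 0.0 <= hi:
                return True, (l1, l2)
    return False, None

lams = np.linspace(0.0, 10.0, 201)
# x_bar = 0 with f(x) = -x: the inclusion fails for every multiplier pair on the grid
print(contains_zero((-1, -1), (0, 0), (-1, 0), lams))
# x_bar = -1 with f(x) = x: the inclusion holds, e.g. with l1 = 0, l2 = 1
print(contains_zero((1, 1), (3, 3), (-1, -1), lams))
```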



Chapter 9
Weak Sharp Minima in Convex
Optimization

9.1 Introduction
In the preceding chapters we studied the necessary and sufficient optimality
conditions for x̄ ∈ Rn to be a point of minimizer for the convex optimization
problem wherein a convex objective function f is minimized over a convex
feasible set C ⊂ Rn . From Theorem 2.90, if the objective function f is strictly
convex, then the point of minimizer x̄ is unique. The notion of unique mini-
mizer was extended to the concept of sharp minimum or, equivalently, strongly
unique local minimum. The ideas of sharp minimizer and strongly unique min-
imizer were introduced by Polyak [94, 95] and Cromme [29]. These notions
played an important role in the approximation theory or the study of pertur-
bation in optimization problems and also in the analysis of the convergence
of algorithms [1, 26, 56]. Below we define the notion of sharp minimum.

Definition 9.1 A function φ : Rn → R̄ defined over a set F ⊂ Rn is said to
have a sharp minimum at x̄ ∈ F if there exists α > 0 such that

φ(x) − φ(x̄) ≥ α kx − x̄k, ∀ x ∈ F.

From the above definition it is obvious that a sharp minimizer
is unique. This is one of the major drawbacks of the concept of sharp mini-
mum, as it rules out even the most basic optimization problem, namely the linear
programming problem. To overcome this difficulty, the notion of weak sharp
minimum was introduced by Ferris [46]. We study this notion for the convex
optimization problem

min f (x) subject to x ∈ C, (CP )

where f : Rn → R is a convex function and C ⊂ Rn is a closed convex set.


9.2 Weak Sharp Minima and Optimality


We begin this section by defining the weak sharp minimum for the convex
optimization problem (CP ) from Ferris [46].
Definition 9.2 Let S ⊂ Rn denote the nonempty solution set of (CP ). Then
S is said to be the set of weak sharp minimizers on C if there exists α > 0
such that

f(x) − f(projS(x)) ≥ α ‖x − projS(x)‖, ∀ x ∈ C.

Observe that for any x ∈ C, projS(x) ∈ S, and as S is the solution set, f is
constant on S. Equivalently, S is the set of weak sharp minimizers if there
exists α > 0 such that

f(x) − f(x̄) ≥ α dS(x), ∀ x ∈ C, ∀ x̄ ∈ S.

This equivalent definition was given in Burke and Ferris [25].
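As a simple illustration of Definition 9.2 (a sketch of our own, not taken from the references above), consider f(x) = max{x, 0} on C = [−1, 2]. The solution set S = [−1, 0] is not a singleton, so no sharp minimum exists, yet the weak sharp minimum inequality holds with modulus α = 1, as the grid check below suggests.

```python
# Illustrative sketch (not from the text): f(x) = max(x, 0) on C = [-1, 2] has
# solution set S = [-1, 0]; the weak sharp minimum inequality holds with alpha = 1
# even though the minimizer is not unique.
import numpy as np

f = lambda x: max(x, 0.0)
proj_S = lambda x: min(max(x, -1.0), 0.0)   # projection onto S = [-1, 0]

alpha = 1.0
xs = np.linspace(-1.0, 2.0, 301)
ok = all(f(x) - f(proj_S(x)) >= alpha * abs(x - proj_S(x)) - 1e-12 for x in xs)
print(ok)   # True: S is a set of weak sharp minimizers with modulus 1
```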
Before moving on with the results on equivalent conditions for weak
sharp minimizers, we present some results from Aubin and Ekeland [4], Luc-
chetti [79], Luenberger [80], and Rockafellar [97], which act as a tool in proving
the equivalence.
Proposition 9.3 Consider a nonempty closed convex set F ⊂ Rn.

(i) For every x ∈ F,

NF(x) = {v ∈ Rn : ⟨v, x⟩ = σF(v)}.

(ii) For every y ∈ Rn,

dF(y) = max_{v∈cl B} (⟨v, y⟩ − σF(v)).

(iii) If F is a closed convex cone, then for every y ∈ Rn,

dF(y) = σ_{cl B∩F◦}(y).

(iv) For every y ∈ Rn,

dF(y) = sup_{x∈F} d_{x+TF(x)}(y).

(v) For every x ∈ F, the subdifferential of the distance function dF is

∂dF(x) = cl B ∩ NF(x)

and the directional derivative is

d′F(x, v) = d_{TF(x)}(v) = σ_{cl B∩NF(x)}(v), ∀ v ∈ Rn.


Proof. (i) From Definition 2.36 of normal cone,

NF (x̄) = {v ∈ Rn : hv, x − x̄i ≤ 0, ∀ x ∈ F }.

Observe that any v ∈ NF (x̄) along with the fact that x̄ ∈ F satisfies the
inequality

hv, x̄i ≤ σF (v) ≤ hv, x̄i,

that is, σF (v) ≤ hv, x̄i. Thus

NF (x̄) = {v ∈ Rn : hv, x̄i = σF (v)}.

(ii) By the definition of the distance function,

dF(y) = inf_{x∈F} ‖y − x‖ = inf_{x∈F} sup_{v∈cl B} ⟨v, y − x⟩
      = sup_{v∈cl B} {⟨v, y⟩ + inf_{x∈F}(−⟨v, x⟩)}
      = sup_{v∈cl B} {⟨v, y⟩ − σF(v)}.

(iii) For a closed convex cone F, by Definition 2.30 of polar cone,

F◦ = {v ∈ Rn : ⟨v, x⟩ ≤ 0, ∀ x ∈ F}.

Therefore,

σF(v) = 0 if v ∈ F◦ and σF(v) = +∞ otherwise. (9.1)

Combining (ii) with the above relation (9.1) yields

dF(y) = sup_{v∈cl B, v∈F◦} ⟨v, y⟩,

which implies

dF(y) = sup_{v∈cl B∩F◦} ⟨v, y⟩ = σ_{cl B∩F◦}(y),

as desired.
(iv) By Theorem 2.35, TF(x) is a closed convex cone, and hence x + TF(x) is
a closed convex set. Invoking (iii) along with Proposition 2.37 leads to

d_{x+TF(x)}(y) = d_{TF(x)}(y − x) = σ_{cl B∩NF(x)}(y − x).

Therefore,

sup_{x∈F} d_{x+TF(x)}(y) = sup_{x∈F} sup_{v∈cl B∩NF(x)} ⟨v, y − x⟩.


By (i) and (ii), the above condition reduces to

sup_{x∈F} d_{x+TF(x)}(y) = sup_{v∈cl B} {⟨v, y⟩ − σF(v)} = dF(y),

thereby establishing the result.


(v) As an example of inf-convolution, Definition 2.54,

dF(x) = (‖.‖ □ δF)(x),

which is exact at every x ∈ Rn. Invoking the subdifferential inf-convolution
rule at a point where the inf-convolution is exact, Theorem 2.98,

∂dF(x) = ∂‖.‖(y) ∩ ∂δF(x − y).

For x ∈ int F, taking y = 0,

∂‖.‖(0) = cl B while ∂δF(x) = NF(x) = {0}.

Thus, ∂dF(x) = {0} for x ∈ int F. For x ∈ bdry F, again taking y = 0,

∂‖.‖(0) = cl B while ∂δF(x) = NF(x),

and hence ∂dF(x) = cl B ∩ NF(x). Therefore,

∂dF(x) = cl B ∩ NF(x), ∀ x ∈ F. (9.2)

As dom dF = Rn, by Theorem 2.79 and the condition (9.2),

d′F(x, v) = σ_{∂dF(x)}(v) = σ_{cl B∩NF(x)}(v),

which by (iii) implies that

d′F(x, v) = d_{TF(x)}(v)

and hence the result. □
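Proposition 9.3 (ii) can be verified numerically on a simple set. In the sketch below (illustrative only, not part of the original development), F is the unit box [0, 1]² in R², whose support function is σF(v) = Σi max{0, vi}; for a point y outside F the supremum in (ii) is attained at a unit vector, so sampling directions on the unit circle suffices.

```python
# Illustrative numerical check (not from the text) of Proposition 9.3(ii) for the
# box F = [0,1]^2: d_F(y) = max over the closed unit ball of <v, y> - sigma_F(v).
import numpy as np

def sigma_box(v):                       # support function of the box [0,1]^2
    return np.maximum(v, 0.0).sum()

y = np.array([2.0, -1.0])
d_direct = np.linalg.norm(y - np.clip(y, 0.0, 1.0))   # distance via projection

angles = np.linspace(0.0, 2 * np.pi, 4000)
vs = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit directions
d_dual = max(v @ y - sigma_box(v) for v in vs)           # max attained with ||v|| = 1 here

print(d_direct, d_dual)   # both approximately sqrt(2)
```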


As we know, the convex optimization problem can be equivalently ex-
pressed as the unconstrained problem
min f0 (x) subject to x ∈ Rn , (CPu )
where f0 (x) = f (x) + δC (x) is an lsc proper convex function. As (CP ) and
(CPu ) are equivalent, the solution sets of both problems coincide, which implies
that S is also the set of weak sharp minimizers of (CPu ). Before moving on
to prove the main result on the characterization of the weak sharp minimizer,
we present the results in terms of the objective function f0 of (CPu ).

Lemma 9.4 Consider the unconstrained convex optimization problem (CPu )


and the set of weak sharp minimizers S. Let α > 0. Then the following are
equivalent:


(i) α cl B ∩ NS(x) ⊂ ∂f0(x) for every x ∈ S,

(ii) ∪_{x∈S} (α cl B ∩ NS(x)) ⊂ ∪_{x∈S} ∂f0(x).

Proof. It is easy to observe that (i) implies (ii). Conversely, suppose that (ii)
holds. Consider x̄ ∈ S with ξ ∈ α cl B ∩ NS (x̄). As (ii) is satisfied, there exists
ȳ ∈ S such that ξ ∈ ∂f0 (ȳ). By Definition 2.77 of subdifferential,

f0 (x) − f0 (ȳ) ≥ hξ, x − ȳi, ∀ x ∈ Rn . (9.3)

In particular, for any x ∈ S, f0 (x) = f0 (ȳ), thereby reducing the above in-
equality to

hξ, x − ȳi ≤ 0, ∀ x ∈ S,

which implies ξ ∈ NS (ȳ). By assumption, ξ ∈ NS (x̄). Thus, by Proposi-


tion 9.3 (i),
hξ, x̄i = σS (ξ) = hξ, ȳi. (9.4)
As x̄ ∈ S, f0 (x̄) = f0 (ȳ), which along the conditions (9.3) and (9.4) leads to

f0 (x) − f0 (x̄) ≥ hξ, x − x̄i, ∀ x ∈ Rn ,

thereby implying that ξ ∈ ∂f0 (x̄). Because x̄ ∈ S was arbitrary, (i) holds. 
The above result was from Burke and Ferris [25]. The next result from
Burke and Deng [22] provides a characterization for weak sharp minimizer in
terms of f0 .

Theorem 9.5 Consider the convex optimization problem (CP ) and its equiv-
alent unconstrained problem (CPu ). Let α > 0. Then S is the set of weak sharp
minimizers with modulus α if and only if

f0′ (x̄, v) ≥ α dTS (x̄) (v), ∀ x̄ ∈ S, ∀ v ∈ Rn . (9.5)

Proof. Suppose that S is the set of weak sharp minimizers with modulus
α > 0. Consider x̄ ∈ S. Therefore, by Definition 9.2,

f (x) − f (x̄) ≥ α dS (x), ∀ x ∈ C.

As x̄ ∈ S ⊂ C, f0 (x̄) = f (x̄). Also for x ∈ C, f0 (x) = f (x). Therefore, the


above inequality leads to

f0 (x) − f0 (x̄) ≥ α dS (x), ∀ x ∈ C. (9.6)

For x ∉ C, f0(x) = +∞. Thus,

f0(x) − f0(x̄) ≥ α dS(x), ∀ x ∉ C (9.7)


trivially. Combining (9.6) and (9.7) yields

f0 (x) − f0 (x̄) ≥ α dS (x), ∀ x ∈ Rn .

In particular, taking x = x̄ + λv ∈ Rn for λ > 0 and v ∈ Rn in the above


condition leads to

f0 (x̄ + λv) − f0 (x̄) ≥ α dS (x̄ + λv), ∀ λ > 0, ∀ v ∈ Rn ,

which implies that for every λ > 0,

[f0(x̄ + λv) − f0(x̄)]/λ ≥ α dS(x̄ + λv)/λ, ∀ v ∈ Rn. (9.8)
Observe that

dS(x̄ + λv) = inf_{x∈S} ‖x̄ + λv − x‖ = λ inf_{x∈S} ‖v − (x − x̄)/λ‖,

which by Definition 2.33 of tangent cone implies that

dS(x̄ + λv)/λ ≥ inf_{y∈TS(x̄)} ‖v − y‖ = d_{TS(x̄)}(v). (9.9)

Therefore, using (9.8) along with (9.9) leads to

[f0(x̄ + λv) − f0(x̄)]/λ ≥ α d_{TS(x̄)}(v), ∀ v ∈ Rn.

Taking the limit as λ → 0 in the above inequality reduces it to

f0′(x̄, v) ≥ α d_{TS(x̄)}(v), ∀ v ∈ Rn.

Because x̄ ∈ S was arbitrary, the above condition yields (9.5).


Conversely, suppose that the relation (9.5) is satisfied. Consider x ∈ C and
x̄ ∈ S. Therefore,

f0 (x) − f0 (x̄) ≥ f0′ (x̄, x − x̄) ≥ α dTS (x̄) (x − x̄) = α dx̄+TS (x̄) (x).

By Proposition 9.3 (iv), the above inequality leads to

f0(x) − f0(x̄) ≥ α sup_{x̄∈S} d_{x̄+TS(x̄)}(x) = α dS(x).

Because x ∈ C and x̄ ∈ S were arbitrary, the above condition holds for every
x ∈ C and every x̄ ∈ S, and hence S is the set of weak sharp minimizers. 
We end this chapter by giving equivalent characterizations for the set of
weak sharp minimizers, S, for (CP ) from Burke and Deng [22].


Theorem 9.6 Consider the convex optimization problem (CP ) and its equiv-
alent unconstrained problem (CPu ). Let α > 0. Then the following statements
are equivalent:
(i) S is the set of weak sharp minimizers for (CP ) with modulus α > 0.
(ii) For every x̄ ∈ S and v ∈ TC (x̄),

f ′ (x̄, v) ≥ α dTS (x̄) (v).

(iii) For every x̄ ∈ S,

α cl B ∩ NS (x̄) ⊂ ∂f0 (x̄).

(iv) The inclusion

∪_{x̄∈S} (α cl B ∩ NS(x̄)) ⊂ ∪_{x̄∈S} ∂f0(x̄)

holds.
(v) For every x̄ ∈ S and v ∈ TC (x̄) ∩ NS (x̄),

f ′ (x̄, v) ≥ α kvk.

(vi) For every x̄ ∈ S,

α B ⊂ ∂f (x̄) + (TC (x̄) ∩ NS (x̄))◦ .

(vii) For every x ∈ C,

f ′ (x̄, x − x̄) ≥ α dS (x),

where x̄ ∈ projS (x).

Proof. [(i) =⇒ (ii)] Because S is the set of weak sharp minimizers, by Theo-
rem 9.5,
f0′ (x, v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn . (9.10)
The above condition holds in particular for v ∈ TC (x). As f0 (x) = f (x) + δC (x),
which along with the fact that f0′ (x, v) = f ′ (x, v) for every x ∈ S and
v ∈ TC (x), and condition (9.10) yields

f ′ (x, v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ TC (x),

thereby establishing (ii).


[(ii) =⇒ (iii)] As dom f = Rn , by Theorem 2.79 and the relation (ii),

σ∂f (x) (v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ TC (x).


By Theorem 2.35, TC (x) is a closed convex cone. Invoking Proposition 2.61 (v)
along with Proposition 2.37 yields

σ∂f (x)+NC (x) (v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn .

By the fact that NC (x) = ∂δC (x) and from the Sum Rule, Theorem 2.91,
∂f (x) + NC (x) ⊂ ∂(f + δC )(x) = ∂f0 (x), which is always true along with
Proposition 2.61 (i), the above inequality yields

σ∂f0 (x) (v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn . (9.11)

By Proposition 9.3 (v), for any x ∈ S and v ∈ Rn,

α d_{TS(x)}(v) = α σ_{cl B∩NS(x)}(v) = α sup_{v*∈cl B∩NS(x)} ⟨v*, v⟩.

As α > 0, the above condition becomes

α d_{TS(x)}(v) = sup_{v*∈cl B∩NS(x)} ⟨α v*, v⟩ = sup_{α v*∈α cl B∩NS(x)} ⟨α v*, v⟩ = σ_{α cl B∩NS(x)}(v). (9.12)

Substituting the above relation in the inequality (9.11) leads to

σ∂f0 (x) (v) ≥ σα cl B∩NS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn . (9.13)

By Proposition 2.82, ∂f0 (x) is a closed convex set which along with Proposi-
tion 2.61 (iv) and (ii) implies that

α cl B ∩ NS (x) ⊂ ∂f0 (x), ∀ x ∈ S,

thereby proving (iii).


[(iii) =⇒ (i)] By Proposition 2.61 (i), relation (9.13) holds which along with
(9.12) implies that

σ∂f0 (x) (v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn .

By Theorem 2.79, the above inequality leads to

f0′ (x, v) ≥ α dTS (x) (v), ∀ x ∈ S, ∀ v ∈ Rn ,

that is, (9.5) is satisfied. Therefore by Theorem 9.5, (i) holds.


[(iii) ⇐⇒ (iv)] This holds by Lemma 9.4.
[(v) =⇒ (vi)] Because dom f = Rn, by Theorem 2.79, the relation (v) becomes

σ_{∂f(x)}(v) ≥ α sup_{v*∈cl B} ⟨v*, v⟩, ∀ x ∈ S, ∀ v ∈ TC(x) ∩ NS(x).


As α > 0, for every x ∈ S and every v ∈ TC(x) ∩ NS(x), the above inequality
is equivalent to

σ_{∂f(x)}(v) ≥ sup_{α v*∈α cl B} ⟨α v*, v⟩ = σ_{α cl B}(v).

Because TC(x) ∩ NS(x) is a closed convex cone, by Proposition 2.61 (v), the
above condition yields that for every x ∈ S,

α cl B ⊂ cl {∂f(x) + (TC(x) ∩ NS(x))◦}.

Invoking Proposition 2.15,

α B = int (α cl B) ⊂ int {∂f(x) + (TC(x) ∩ NS(x))◦} ⊂ ∂f(x) + (TC(x) ∩ NS(x))◦, ∀ x ∈ S,

thereby leading to (vi).


[(vi) =⇒ (v)] Applying Proposition 2.61 (v) to condition (vi) leads to

σ∂f (x) (v) ≥ σα B (v), ∀ x ∈ S, ∀ v ∈ TC (x) ∩ NS (x).

As dom f = Rn , by Theorem 2.79, the above inequality leads to

f ′ (x, v) ≥ α kvk, ∀ x ∈ S, ∀ v ∈ TC (x) ∩ NS (x),

thereby establishing (v).


[(ii) =⇒ (v)] By Proposition 9.3 (v), for every x ∈ S,

d_{TS(x)}(v) = σ_{cl B∩NS(x)}(v).

For every v ∈ NS(x), this gives

d_{TS(x)}(v) = ‖v‖. (9.14)
Therefore, for every x ∈ S and every v ∈ TC (x)∩NS (x), the relation (ii) along
with (9.14) leads to

f ′ (x, v) ≥ α kvk,

thereby deriving (v).


[(v) =⇒ (vii)] Consider x ∈ C and let x̄ ∈ projS (x). By Theorem 2.35,
x − x̄ ∈ TC (x̄). As x̄ ∈ projS (x), by Proposition 2.52,

hx − x̄, ȳ − x̄i ≤ 0, ∀ ȳ ∈ S,

which by Definition 2.36 of normal cone, x − x̄ ∈ NS (x̄). Therefore,

x − x̄ ∈ TC (x̄) ∩ NS (x̄).


FIGURE 9.1: Pictorial representation of Theorem 9.6 (diagram of implications among the conditions (i)–(vii)).

Now by relation (v),

f ′ (x̄, x − x̄) ≥ α kx − x̄k.

As x̄ ∈ projS (x), dS (x) = kx − x̄k. Thus the above inequality becomes

f ′ (x̄, x − x̄) ≥ α dS (x).

Because x ∈ C and x̄ ∈ projS (x) were arbitrary, the inequality holds for every
x ∈ C and x̄ ∈ projS (x), thereby yielding the relation (vii).
[(vii) =⇒ (i)] As dom f = Rn, Theorem 2.79 along with Definition 2.77 of
subdifferential and the relation (vii) leads to

f(x) − f(x̄) ≥ f′(x̄, x − x̄) ≥ α dS(x), ∀ x ∈ C, (9.15)

with x̄ ∈ projS(x). Moreover, for any ȳ ∈ S with ȳ ≠ x̄, f(ȳ) = f(x̄). Thus,
(9.15) holds for every x ∈ C and every x̄ ∈ S, thereby leading to (i). □
Figure 9.1 presents the pictorial representation of Theorem 9.6. We have
devoted this chapter only to the theoretical aspect of weak sharp minimizers,
though as mentioned in the beginning this notion plays an important role
from the algorithmic point of view. For readers interested in its computational
aspects, one may refer to Burke and Deng [23, 24] and Ferris [46].

Chapter 10
Approximate Optimality Conditions

10.1 Introduction
We have discussed the various aspects of studying optimality conditions for
the convex programming problem (CP ). Throughout, we concentrated on es-
tablishing the standard or the sequential optimality conditions at the exact
point of minima. But it may not always be possible to find the point of min-
imizer. There may be cases where the infimum exists but is not attainable.
For instance, consider

min e^x subject to x ∈ R.

As we know, the infimum for the above problem is zero but it is not attained
over the whole real line. Thus in such scenarios we try to approximate the solution.
In this example, for a given ε > 0, one can always find x̄ ∈ R such that e^x̄ < ε.
This leads to the notion of approximate solutions, which play a crucial role in
algorithmic study of optimization problems. Recall the convex optimization
problem
min f (x) subject to x ∈ C, (CP )
where f : Rn → R is a convex function and C is a convex subset of Rn .

Definition 10.1 Let ε ≥ 0 be given. Then x̄ ∈ C is said to be an ε-solution
of (CP ), or an approximate solution up to ε for (CP ), if

f (x̄) ≤ f (x) + ε, ∀ x ∈ C.
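For the introductory example min e^x over R, Definition 10.1 reads e^x̄ ≤ 0 + ε, so every x̄ ≤ ln ε is an ε-solution even though the infimum is not attained; the following lines (a trivial illustrative check, not part of the original development) confirm this.

```python
# Illustrative sketch (not from the text): for min e^x over R the infimum 0 is not
# attained, but any x_bar with exp(x_bar) <= eps, i.e. x_bar <= log(eps), is an
# eps-solution in the sense of Definition 10.1.
import math

def is_eps_solution(x_bar, eps, inf_value=0.0):
    return math.exp(x_bar) <= inf_value + eps

eps = 1e-3
x_bar = math.log(eps) - 1.0          # comfortably below log(eps)
print(is_eps_solution(x_bar, eps))   # True
```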

This is not the only way to study approximate solutions. In the literature,
one finds the notions of various approximate solutions introduced over the
years, such as quasi ε-solution, regular ε-solution, almost ε-solution [76], to
name a few. We will define these solution concepts before moving on to study
the approximate optimality conditions. The classes of quasi ε-solution and
regular ε-solution are motivated by Ekeland’s variational principle stated in
Chapter 2.


Definition 10.2 Let ε ≥ 0 be given. Then x̄ ∈ C is said to be a quasi ε-solution
of (CP ) if

f(x̄) ≤ f(x) + √ε ‖x − x̄‖, ∀ x ∈ C.
A point x̄ ∈ C, which is an ε-solution as well as a quasi ε-solution of (CP ), is
known as the regular ε-solution of (CP ).
The class of almost ε-solution, as the name itself suggests, seems to be
an approximation to the ε-solution. Actually, it is the approximate solution
concept associated with the perturbed problem. Before defining the almost
ε-solution, recall the feasible set C given by (3.1), that is,
C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},
where gi : Rn → R, i = 1, 2, . . . , m, are convex functions.
Definition 10.3 Let ε ≥ 0 be given. The ε-feasible set of (CP ) with the
feasible set C given by (3.1) is defined as
Cε = {x ∈ Rn : gi (x) ≤ ε, i = 1, 2, . . . , m}.
Then x̄ ∈ Rn is said to be an almost ε-solution of (CP ) if
x̄ ∈ Cε and f (x̄) ≤ f (x) + ε, ∀ x ∈ C.
Observe that here the almost ε-solution need not be from the actual feasible
set but should belong to the perturbed feasible set that is ε-feasible set.
Now we move on to discuss the approximate optimality conditions for the
various classes of approximate solutions. In this chapter we concentrate on the
ε-solution, quasi ε-solution, and almost ε-solution. We begin with the study
of ε-solutions.

10.2 ε-Subdifferential Approach


Consider the unconstrained convex programming problem (CPu )

min f(x) subject to x ∈ Rn. (CPu )

If x̄ ∈ Rn is an ε-solution, then by Definition 10.1,

f(x) − f(x̄) ≥ −ε, ∀ x ∈ Rn.
Using the definition of ε-subdifferential, Definition 2.109, 0 ∈ ∂ε f (x̄). The con-
verse can be established by directly applying the definition of ε-subdifferential.
This has been stated as a result characterizing the ε-solution in Theorem 2.121
as follows.


Theorem 10.4 Consider the unconstrained problem (CPu ). Then x̄ ∈ Rn is


an ε-solution of (CPu ) if and only if 0 ∈ ∂ε f (x̄).
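To see Theorem 10.4 at work on a concrete function, take f(x) = x² on R; a direct computation from the definition of the ε-subdifferential shows ∂εf(x̄) = [2x̄ − 2√ε, 2x̄ + 2√ε], so 0 ∈ ∂εf(x̄) precisely when |x̄| ≤ √ε, that is, when f(x̄) ≤ inf f + ε. The sketch below (illustrative only, using a grid test of the defining inequality) reflects this.

```python
# Illustrative sketch (not from the text): for f(x) = x^2 the eps-subdifferential at
# x_bar is the interval [2*x_bar - 2*sqrt(eps), 2*x_bar + 2*sqrt(eps)], so 0 lies in
# it exactly when |x_bar| <= sqrt(eps), i.e. when x_bar is an eps-solution.
import numpy as np

def in_eps_subdiff(xi, x_bar, eps, grid=np.linspace(-10, 10, 2001)):
    f = lambda x: x**2
    # grid test of f(x) >= f(x_bar) + xi*(x - x_bar) - eps for all x
    return np.all(f(grid) >= f(x_bar) + xi * (grid - x_bar) - eps)

eps = 0.25
print(in_eps_subdiff(0.0, 0.4, eps))   # True: |0.4| <= sqrt(eps) = 0.5
print(in_eps_subdiff(0.0, 0.6, eps))   # False: 0.6 is not an eps-solution
```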
As the convex programming problem (CP ) can be reformulated as an
unconstrained problem with the objective function f replaced by (f + δC ),
from the above theorem one has that x̄ is an ε-solution of (CP ) if and only if

0 ∈ ∂ε (f + δC )(x̄).

Observe that dom f = Rn. If, in addition, the Slater constraint qualification
holds, that is, C has a nonempty relative interior, then invoking the Sum
Rule of ε-subdifferential, Theorem 2.115, along with the definition of ε-normal
set, Definition 2.110, leads to

0 ∈ ∂ε1 f(x̄) + NC,ε2(x̄)

for some εi ≥ 0, i = 1, 2, with ε1 + ε2 = ε. This may be stated as the following
theorem.
Theorem 10.5 Consider the convex optimization problem (CP ). Assume
that the Slater constraint qualification holds, that is ri C is nonempty. Let
ε ≥ 0 be given. Then x̄ ∈ C is an ε-solution of (CP ) if and only if there exist
εi ≥ 0, i = 1, 2, satisfying ε1 + ε2 = ε such that
0 ∈ ∂ε1 f (x̄) + NC,ε2 (x̄).
Note that for a nonempty convex set C, by Proposition 2.14 (i), ri C is
nonempty and hence the Slater constraint qualification holds. From the above
theorem it is obvious that to obtain the approximate optimality conditions in
terms of the constraint functions gi , i = 1, 2, . . . , m, NC,ε (x) must be explicitly
expressed in their terms. Below we present the result from Strodiot, Nguyen,
and Heukemes [106], which acts as the tool in establishing the approximate
optimality conditions. But before that, we define the right scalar multiplication
from Rockafellar [97].
Definition 10.6 Let φ : Rn → R̄ be a proper convex function and λ ≥ 0.
The right scalar multiplication, φλ, is defined as

(φλ)(x) = λ φ(λ^{−1}x) for λ > 0 and (φλ)(x) = δ{0}(x) for λ = 0.

The positively homogeneous convex function ψ generated by φ is defined as

ψ(x) = inf{(φλ)(x) : λ ≥ 0}.
Proposition 10.7 Consider ε ≥ 0 and a convex function g : Rn → R. Let
x̄ ∈ C̄ = {x ∈ Rn : g(x) ≤ 0}. Assume that the Slater constraint qualification
holds, that is, there exists x̂ ∈ Rn such that g(x̂) < 0. Then ξ ∈ NC̄,ε(x̄) if and
only if there exist λ ≥ 0 and ε̄ ≥ 0 such that

ε̄ ≤ λg(x̄) + ε and ξ ∈ ∂ε̄(λg)(x̄).


Proof. Using the definition of an ε-normal set, Definition 2.110,

NC̄,ε (x̄) = {ξ ∈ Rn : hξ, x − x̄i ≤ ε, ∀ x ∈ C̄}


= {ξ ∈ Rn : σC̄ (ξ) ≤ hξ, x̄i + ε},

where σC̄(ξ) denotes the support function of the set C̄ at ξ. Observe that
dom g = Rn and hence, by Theorem 2.69, g is continuous over the whole of Rn.
Now invoking Theorem 13.5 from Rockafellar [97] (see also Remark 10.8), the
support function σC̄ is the closure of the positively homogeneous function φ
generated by g∗, which is defined as

φ(ξ) = inf_{λ≥0} (g∗λ)(ξ) = inf_{λ≥0} λ g∗(λ^{−1}ξ) = inf_{λ≥0} (λg)∗(ξ).

Therefore,

NC̄,ε(x̄) = {ξ ∈ Rn : inf_{λ≥0} (λg)∗(ξ) ≤ ⟨ξ, x̄⟩ + ε}
         = {ξ ∈ Rn : there exists λ ≥ 0 such that (λg)∗(ξ) ≤ ⟨ξ, x̄⟩ + ε}
         = {ξ ∈ Rn : there exists λ ≥ 0 such that (λg)∗(ξ) + (λg)(x̄) ≤ ⟨ξ, x̄⟩ + ε + (λg)(x̄)}
         = {ξ ∈ Rn : there exists λ ≥ 0 such that (λg)(x) − (λg)(x̄) ≥ ⟨ξ, x − x̄⟩ − ε − (λg)(x̄), ∀ x ∈ Rn}.

From the above condition, there exists λ ≥ 0 such that ξ ∈ ∂_{ε+(λg)(x̄)}(λg)(x̄).
As ∂ε1 φ(x) ⊂ ∂ε2 φ(x) whenever ε1 ≤ ε2, there exists ε̄ satisfying
0 ≤ ε̄ ≤ ε + (λg)(x̄) such that ξ ∈ ∂ε̄(λg)(x̄). Therefore,

NC̄,ε(x̄) = ∪_{0≤ε̄≤ε+(λg)(x̄)} {ξ ∈ Rn : there exists λ ≥ 0 such that (λg)∗(ξ) + (λg)(x̄) ≤ ⟨ξ, x̄⟩ + ε̄}
         = ∪_{0≤ε̄≤ε+(λg)(x̄)} ∪_{λ≥0} ∂ε̄(λg)(x̄),

thereby leading to the desired result. □

Remark 10.8 We state Theorem 13.5 from Rockafellar [97].

Let φ : Rn → R̄ be a proper lsc convex function. The support function


of the set C = {x ∈ Rn : φ(x) ≤ 0} is then cl ψ, where ψ is the
positively homogeneous convex function generated by φ∗ . Dually, the
closure of the positively homogeneous convex function ψ generated by
φ is the support function of the set {x∗ ∈ Rn : φ∗ (x∗ ) ≤ 0}.

For more details, readers are advised to refer to Rockafellar [97].


Next we present the approximate optimality conditions for the convex


programming problem (CP ).

Theorem 10.9 Consider the convex programming problem (CP ) with C


given by (3.1). Assume that the Slater constraint qualification is satisfied.
Let ε ≥ 0. Then x̄ is an ε-solution of (CP ) if and only if there exist ε̄0 ≥ 0,
ε̄i ≥ 0, and λ̄i ≥ 0, i = 1, . . . , m, such that
0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄) and Σ_{i=0}^m ε̄i − ε ≤ Σ_{i=1}^m λ̄i gi(x̄) ≤ 0.

Proof. Observe that (CP ) is equivalent to the unconstrained problem

min (f + Σ_{i=1}^m δCi)(x) subject to x ∈ Rn,

where Ci = {x ∈ Rn : gi(x) ≤ 0}, i = 1, 2, . . . , m. By the Slater constraint
qualification, there exists x̂ ∈ Rn such that gi(x̂) < 0 for every i = 1, 2, . . . , m,
which implies ri Ci, i = 1, 2, . . . , m, is nonempty. Invoking Theorem 10.5,
there exist εi ≥ 0, i = 0, 1, . . . , m, with ε0 + Σ_{i=1}^m εi = ε such that

0 ∈ ∂ε0 f(x̄) + Σ_{i=1}^m NCi,εi(x̄).

Applying Proposition 10.7 to Ci, i = 1, 2, . . . , m, there exist λ̄i ≥ 0 and ε̄i ≥ 0,
i = 1, 2, . . . , m, such that

0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄) and ε̄i − εi ≤ λ̄i gi(x̄) ≤ 0, i = 1, 2, . . . , m, (10.1)

where ε̄0 = ε0. Now summing (10.1) over i = 1, 2, . . . , m, and using the
condition ε0 + Σ_{i=1}^m εi = ε leads to

Σ_{i=0}^m ε̄i − ε ≤ Σ_{i=1}^m λ̄i gi(x̄) ≤ 0, (10.2)

as desired.
Conversely, define εi = ε̄i − λ̄i gi (x̄), i = 1, 2, . . . , m. Applying Proposi-
tion 10.7, ξi ∈ ∂ε̄i (λ̄i gi )(x̄) is equivalent to ξi ∈ NCi ,εi (x̄) for i = 1, 2, . . . , m.
Also, from the condition (10.2),
ε̄0 + Σ_{i=1}^m εi + Σ_{i=1}^m λ̄i gi(x̄) − ε ≤ Σ_{i=1}^m λ̄i gi(x̄) ≤ 0,

which implies ε̄0 + Σ_{i=1}^m εi ≤ ε. Define ε0 = ε̄0 + εs, where εs = ε − ε̄0 − Σ_{i=1}^m εi.
Observe that εs ≥ 0. Therefore,

0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m NCi,εi(x̄) ⊂ ∂ε0 f(x̄) + Σ_{i=1}^m NCi,εi(x̄),

where ε0 + Σ_{i=1}^m εi = ε. By Theorem 10.5, x̄ is an ε-solution of (CP ). □
Observe that in the above approximate optimality conditions instead of the
complementary slackness conditions, we have an ε-complementary slackness
condition. Also, we derived the approximate optimality conditions in terms of
the ε-subdifferentials of the objective function as well as the constraint func-
tions at the ε-solution of (CP ) by equivalent characterization of ε-normal set
in terms of the ε-subdifferentials of the constraint functions gi , i = 1, 2, . . . , m.

10.3 Max-Function Approach


As discussed in the Section 3.5, another approach that is well known in estab-
lishing the standard KKT optimality conditions is the max-function approach.
Applying a similar approach for an ε-solution, x̄, of (CP ) we introduce an un-
constrained minimization problem
min F (x) subject to x ∈ Rn , (CPmax )
where F (x) = max{f (x)−f (x̄)+ε, g1 (x), . . . , gm (x)}. Using this max-function,
an alternative proof is provided to derive the approximate optimality condi-
tions. But before that we present a result to study the relation between the
ε-solution of (CP ) and those of the unconstrained problem (CPmax ).

Theorem 10.10 Consider the convex programming problem (CP ) with C


given by (3.1). If x̄ is an ε̄-solution of (CP ), then x̄ is an ε-solution of the un-
constrained problem (CPmax ) for every ε ≥ ε̄. Conversely, if x̄ is an ε-solution
of (CPmax ), then it is an almost 2ε-solution of (CP ).

Proof. Because x̄ is an ε̄-solution of (CP ), x̄ ∈ C with

f (x̄) ≤ f (x) + ε̄, ∀ x ∈ C. (10.3)

Observe that F (x̄) = ε̄. To show that for every ε ≥ ε̄, x̄ ∈ Rn is an ε-solution
for (CPmax ), it is sufficient to establish that

F (x̄) ≤ F (x) + ε̄, ∀ x ∈ Rn ,

which is equivalent to proving that F (x) ≥ 0 for every x ∈ Rn .


For x ∈ C, gi (x) ≤ 0, i = 1, 2, . . . , m, while condition (10.3) ensures that


f(x) − f(x̄) + ε̄ ≥ 0. Therefore, F(x) ≥ 0 for every x ∈ C. If x ∉ C, then for
some i ∈ {1, 2, . . . , m}, gi(x) > 0 and thus F(x) > 0 for every x ∉ C. Hence,
x̄ is an ε-solution of (CPmax ).
Conversely, as x̄ is an ε-solution of (CPmax ),

F (x̄) ≤ F (x) + ε, ∀ x ∈ Rn .

Therefore,

0 < ε = max{ε, g1 (x̄), g2 (x̄), . . . , gm (x̄)} ≤ F (x) + ε, ∀ x ∈ Rn .

The above condition yields

F (x) > 0 and gi (x̄) ≤ F (x) + ε, i = 1, 2, . . . , m, ∀ x ∈ Rn .

From the first condition, in particular for x ∈ C,

f (x̄) ≤ f (x) + ε ≤ f (x) + 2ε

while in the second condition, taking x = x̄ leads to

gi (x̄) ≤ 2ε, i = 1, 2, . . . , m,

thereby implying that x̄ is an almost 2ε-solution of (CP ). 

Theorem 10.11 Consider the convex programming problem (CP ) with C de-
fined by (3.1). Assume that the Slater constraint qualification is satisfied and
let ε ≥ 0. Then x̄ is an ε-solution of (CP ) if and only if there exist ε̄0 ≥ 0,
ε̄i ≥ 0, and λ̄i ≥ 0, i = 1, . . . , m, such that
0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄) and Σ_{i=0}^m ε̄i − ε = Σ_{i=1}^m λ̄i gi(x̄) ≤ 0.

Proof. As x̄ is an ε-solution of (CP ), then by Theorem 10.10, x̄ is also an


ε-solution of the unconstrained minimization problem (CPmax ). By the ap-
proximate optimality condition, Theorem 10.4, for the unconstrained problem,

0 ∈ ∂ε F (x̄).

By the ε-subdifferential Max-Function Rule, Remark 2.119, there exist εi ≥ 0,
λi ≥ 0, i = 0, 1, . . . , m, with Σ_{i=0}^m λi = 1, and ξ0 ∈ ∂ε0(λ0 f)(x̄) provided
λ0 > 0 and ξi ∈ ∂εi(λi gi)(x̄) for those i ∈ {1, 2, . . . , m} satisfying λi > 0, such
that

0 = ξ0 + Σ_{i∈Ī} ξi and Σ_{i=0}^m εi + F(x̄) − λ0 ε − Σ_{i∈Ī} λi gi(x̄) = ε, (10.4)

where I¯ = {i ∈ {1, 2, . . . , m} : λi > 0}. Now if λ0 = 0, again invoking


the ε-subdifferential Max-Function Rule, Remark 2.119, there exists some
i ∈ {1, 2, . . . , m} such that λi > 0, which implies Ī is nonempty. Thus, corre-
sponding to i ∈ Ī, there exist ξi ∈ ∂εi(λi gi)(x̄) such that

0 = Σ_{i∈Ī} ξi and Σ_{i=0}^m εi + F(x̄) − Σ_{i∈Ī} λi gi(x̄) = ε. (10.5)

As F(x̄) = ε, the second equality condition reduces to

Σ_{i=0}^m εi = Σ_{i∈Ī} λi gi(x̄). (10.6)

By the definition of ε-subdifferentiability, Definition 2.109,

λi gi(x) ≥ λi gi(x̄) + ⟨ξi, x − x̄⟩ − εi, ∀ x ∈ Rn, i ∈ Ī.

Therefore, the above inequality along with (10.5) and the nonnegativity of
εi, i = 0, 1, . . . , m, leads to

Σ_{i=1}^m λi gi(x) = Σ_{i∈Ī} λi gi(x) ≥ Σ_{i∈Ī} λi gi(x̄) − Σ_{i=0}^m εi,

which by the condition (10.6) yields

Σ_{i=1}^m λi gi(x) ≥ 0, ∀ x ∈ Rn. (10.7)

As the Slater constraint qualification holds, there exists x̂ ∈ Rn such that
gi(x̂) < 0, i = 1, 2, . . . , m. Thus,

Σ_{i=1}^m λi gi(x̂) < 0,

thereby contradicting the inequality (10.7). Therefore, λ0 ≠ 0. Now divid-
ing both relations of (10.4) throughout by λ0 > 0, along with F(x̄) = ε and
Theorem 2.117, leads to

0 ∈ ξ̄0 + Σ_{i∈Ī} ξ̄i and Σ_{i=0}^m ε̄i − ε = Σ_{i=1}^m λ̄i gi(x̄) ≤ 0,

where ξ̄0 ∈ ∂ε̄0 f(x̄), ξ̄i ∈ ∂ε̄i(λ̄i gi)(x̄), i ∈ Ī, ε̄i = εi/λ0, i = 0, 1, . . . , m, and
λ̄i = λi/λ0, i ∈ Ī. Corresponding to i ∉ Ī, take λ̄i = 0 with ξ̄i = 0 ∈ ∂ε̄i(λ̄i gi)(x̄),
thereby leading to the approximate optimality condition

0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄)


along with the ε-complementary slackness condition. The converse can be


worked along the lines of Theorem 10.9 taking εs = 0. 
Note that in the ε-complementary slackness condition of Theorem 10.9, we
had inequality whereas in the above theorem it is in the form of an equation.
Actually, this condition can also be treated as an inequality if for condition
(10.4) we consider F (x̄) = max{ε, 0} ≥ ε instead of F (x̄) = ε.

10.4 ε-Saddle Point Approach


While studying the optimality conditions for the convex programming problem
(CP ), we have already devoted a chapter on saddle point theory. Now to
derive the approximate optimality conditions, we make use of the ε-saddle
point approach. Recall that the Lagrangian function L : Rn × Rm+ → R associated
with the convex programming problem (CP ) with C given by (3.1), that is,
involving convex inequalities, introduced in Chapter 4, is given by

L(x, λ) = f(x) + Σ_{i=1}^m λi gi(x).

Definition 10.12 A point (x̄, λ̄) ∈ Rn × Rm+ is said to be an ε-saddle point
of (CP ) if

L(x̄, λ) − ε ≤ L(x̄, λ̄) ≤ L(x, λ̄) + ε, ∀ x ∈ Rn, ∀ λ ∈ Rm+.

Below we present a saddle point result established by Dutta [37].

Theorem 10.13 Consider the convex programming problem (CP ) with C
given by (3.1). Let ε ≥ 0 be given and let x̄ be an ε-solution of (CP ).
Assume that the Slater constraint qualification holds. Then there exist
λ̄i ≥ 0, i = 1, 2, . . . , m, such that (x̄, λ̄) is an ε-saddle point of (CP ) and
ε + Σ_{i=1}^m λ̄i gi(x̄) ≥ 0.

Proof. As x̄ is an ε-solution of (CP ), the following system

f (x) − f (x̄) + ε < 0,


gi (x) < 0, i = 1, 2, . . . , m,

has no solution x ∈ Rn . Define the set

Λ = {(y, z) ∈ R × Rm : f(x) − f(x̄) + ε < y, gi(x) < zi, i = 1, 2, . . . , m, for some x ∈ Rn}.

The reader is urged to verify that Λ is an open convex set. Observe that
(0, 0) ∈
/ Λ. Therefore, by the Separation Theorem, Theorem 2.26 (ii), there


exists (λ0, λ) ∈ R × Rm with (λ0, λ) ≠ (0, 0) such that

λ0 (f(x) − f(x̄) + ε) + Σ_{i=1}^m λi gi(x) ≥ 0, ∀ x ∈ Rn. (10.8)

Working along the lines of the proof of Theorem 4.2, it can be proved that
(λ0, λ) ∈ R+ × Rm+.
We claim that λ0 ≠ 0. On the contrary, suppose that λ0 = 0. By the
Slater constraint qualification, there exists x̂ ∈ Rn such that gi(x̂) < 0,
i = 1, 2, . . . , m, which implies

Σ_{i=1}^m λi gi(x̂) < 0,

thereby contradicting (10.8). Therefore, λ0 ≠ 0 and thus the condition (10.8)
can be expressed as

f(x) − f(x̄) + ε + Σ_{i=1}^m λ̄i gi(x) ≥ 0, ∀ x ∈ Rn, (10.9)

where λ̄i = λi/λ0 for i = 1, 2, . . . , m. In particular, taking x = x̄, the above
inequality reduces to

ε + Σ_{i=1}^m λ̄i gi(x̄) ≥ 0. (10.10)
As gi(x̄) ≤ 0, i = 1, 2, . . . , m, which along with (10.9) leads to

f(x̄) + Σ_{i=1}^m λ̄i gi(x̄) ≤ f(x) + Σ_{i=1}^m λ̄i gi(x) + ε, ∀ x ∈ Rn,

which implies

L(x̄, λ̄) ≤ L(x, λ̄) + ε, ∀ x ∈ Rn. (10.11)

For any λi ≥ 0, i = 1, 2, . . . , m, the feasibility of x̄ along with the nonnega-
tivity of ε and (10.10) leads to

f(x̄) + Σ_{i=1}^m λi gi(x̄) − ε ≤ f(x̄) − ε ≤ f(x̄) + Σ_{i=1}^m λ̄i gi(x̄),

that is,
L(x̄, λ) − ε ≤ L(x̄, λ̄), ∀ λ ∈ Rm
+.

The above inequality along with (10.11) implies that (x̄, λ̄) is an ε-saddle point
of (CP ), which satisfies (10.10), thereby yielding the desired result. 
Using this ε-saddle point result, we establish the approximate optimal-
ity conditions. But unlike Theorems 10.9 and 10.11, the result below is only
necessary with a relaxed ε-complementary slackness condition.


Theorem 10.14 Consider the convex programming problem (CP ) with C
given by (3.1). Let ε ≥ 0 be given and let x̄ be an ε-solution of (CP ). Assume
that the Slater constraint qualification holds. Then there exist ε̄0 ≥ 0, ε̄i ≥ 0,
and λ̄i ≥ 0, i = 1, 2, . . . , m, with ε̄0 + Σ_{i=1}^m ε̄i = ε such that

0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄) and ε + Σ_{i=1}^m λ̄i gi(x̄) ≥ 0.

Proof. By the previous theorem, there exist λ̄i ≥ 0, i = 1, 2, . . . , m, such that

L(x̄, λ̄) ≤ L(x, λ̄) + ε, ∀ x ∈ Rn,

along with ε + Σ_{i=1}^m λ̄i gi(x̄) ≥ 0. By Definition 10.1 of ε-solution, the above
inequality implies that x̄ is an ε-solution of the unconstrained problem

inf f(x) + Σ_{i=1}^m λ̄i gi(x) subject to x ∈ Rn.

By Theorem 10.4, the approximate optimality condition is

0 ∈ ∂ε(f + Σ_{i=1}^m λ̄i gi)(x̄). (10.12)

As dom f = Rn and dom gi = Rn, i = 1, 2, . . . , m, applying the Sum Rule of
ε-subdifferential, Theorem 2.115, there exist ε̄i ≥ 0, i = 0, 1, . . . , m, satisfying
ε̄0 + Σ_{i=1}^m ε̄i = ε such that (10.12) becomes

0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄),

thereby establishing the result. 

Observe that the conditions obtained in Theorem 10.14 are only neces-
sary and not sufficient. The approach used in Theorems 10.9 and 10.11 for
the sufficiency part cannot be invoked here. But if instead of the relaxed
ε-complementary slackness condition one has the standard complementary
slackness condition, which is equivalent to

Σ_{i=1}^m λ̄i gi(x̄) = 0,

then working along the lines of Theorem 10.9 the sufficiency can also be estab-
lished. The result below shows that the optimality conditions derived in the
above theorem lead to a 2ε-solution of (CP ) instead of an ε-solution.


Theorem 10.15 Consider the convex programming problem (CP ) with C


given by (3.1). Let ε ≥ 0 be given. Assume that the approximate optimal-
ity condition and the relaxed ε-complementary slackness condition of Theo-
rem 10.14 hold for (x̄, λ̄) ∈ Rn × Rm+ and εi ≥ 0, i = 0, 1, . . . , m, satisfying
ε0 + Σ_{i=1}^m εi = ε. Then x̄ is a 2ε-solution of (CP ).

Proof. From the approximate optimality condition of Theorem 10.14, there
exist λ̄i ≥ 0, i = 1, 2, . . . , m, and εi ≥ 0, i = 0, 1, . . . , m, with ε0 + Σ_{i=1}^m εi = ε,
ξ0 ∈ ∂ε0 f(x̄), and ξi ∈ ∂εi(λ̄i gi)(x̄), i = 1, 2, . . . , m, such that

0 = ξ0 + Σ_{i=1}^m ξi. (10.13)

By Definition 2.109 of the ε-subdifferential,

f(x) − f(x̄) ≥ ⟨ξ0, x − x̄⟩ − ε0,
λ̄i gi(x) − λ̄i gi(x̄) ≥ ⟨ξi, x − x̄⟩ − εi, i = 1, 2, . . . , m.

Summing the above inequalities along with the condition (10.13) leads to

f(x) + Σ_{i=1}^m λ̄i gi(x) ≥ f(x̄) + Σ_{i=1}^m λ̄i gi(x̄) − (ε0 + Σ_{i=1}^m εi).

For any x feasible for (CP ), gi(x) ≤ 0, i = 1, 2, . . . , m, which along with the re-
laxed ε-complementary slackness condition and the fact that ε0 + Σ_{i=1}^m εi = ε
implies that

f(x) ≥ f(x̄) − 2ε, ∀ x ∈ C.

Thus, x̄ is a 2ε-solution of (CP ). □


From Definition 10.12, (x̄, λ̄) is an ε-saddle point of (CP ) if

L(x̄, λ) − ε ≤ L(x̄, λ̄) ≤ L(x, λ̄) + ε, ∀ x ∈ Rn , ∀ λ ∈ Rm


+.

With respect to the ε-solution, we will call x̄ an ε-minimum solution of L(., λ̄)
and similarly, call λ̄ an ε-maximum solution of L(x̄, .).
We end this section by presenting a result relating the ε-solutions of the
saddle point to the almost ε-solution of (CP ) that was derived by Dutta [37].

Theorem 10.16 Consider the convex programming problem (CP ) with C


given by (3.1). Let (x̄, λ̄) ∈ Rn ×Rm
+ be such that x̄ is an ε1 -minimum solution
of L(., λ̄) and λ̄ is an ε2 -maximum solution of L(x̄, .). Then x̄ is an almost
(ε1 + ε2 )-solution of (CP ).

Proof. Because λ̄ ∈ Rm+ is an ε2-maximum solution of L(x̄, λ) over Rm+,

L(x̄, λ) − ε2 ≤ L(x̄, λ̄), ∀ λ ∈ Rm+.

As L(x, λ) = f(x) + Σ_{i=1}^m λi gi(x), the above inequality reduces to

Σ_{i=1}^m (λi − λ̄i) gi(x̄) ≤ ε2, ∀ λi ≥ 0, i = 1, 2, . . . , m. (10.14)

We claim that x̄ ∈ Cε2 = {x ∈ Rn : gi(x) ≤ ε2, i = 1, 2, . . . , m}. On the
contrary, suppose that x̄ ∉ Cε2, which implies that the system

gi(x̄) − ε2 ≤ 0, i = 1, 2, . . . , m,

does not hold. Equivalently, the above condition implies that

(g1(x̄) − ε2, g2(x̄) − ε2, . . . , gm(x̄) − ε2) ∉ Rm−.

As Rm− is a closed convex set, by the Strict Separation Theorem, Theo-
rem 2.26 (iii), there exists γ ∈ Rm with γ ≠ 0 such that

Σ_{i=1}^m γi gi(x̄) − Σ_{i=1}^m γi ε2 > 0 ≥ Σ_{i=1}^m γi yi, ∀ y ∈ Rm−. (10.15)

We claim that γ ∈ Rm+. On the contrary, assume that γ ∉ Rm+, which implies
that for some i ∈ {1, 2, . . . , m}, γi < 0. As the inequality (10.15) holds for every
y ∈ Rm−, taking the corresponding yi → −∞ leads to a contradiction. Hence,
γ ∈ Rm+. Because γ ≠ 0, it can be chosen to satisfy Σ_{i=1}^m γi = 1. Therefore, the
strict inequality condition in (10.15) reduces to

Σ_{i=1}^m γi gi(x̄) > ε2. (10.16)

As λ̄ ∈ Rm+ and γ ∈ Rm+, λ̄ + γ ∈ Rm+. Therefore, taking λ = λ̄ + γ in (10.14)
leads to

Σ_{i=1}^m γi gi(x̄) ≤ ε2,

which contradicts (10.16). Thus, x̄ ∈ Cε2 ⊂ Cε1 +ε2 , where

Cε1 +ε2 = {x ∈ Rn : gi (x) ≤ ε1 + ε2 , i = 1, 2, . . . , m}.

As x̄ is an ε1 -minimum solution of L(x, λ̄) over Rn ,

L(x̄, λ̄) ≤ L(x, λ̄) + ε1 , ∀ x ∈ Rn ,

which implies

f(x̄) + Σ_{i=1}^m λ̄i gi(x̄) ≤ f(x) + Σ_{i=1}^m λ̄i gi(x) + ε1, ∀ x ∈ Rn.


For any x feasible to (CP ), gi(x) ≤ 0, i = 1, 2, . . . , m, which implies
λ̄i gi(x) ≤ 0, i = 1, 2, . . . , m. Taking λi = 0, i = 1, 2, . . . , m, in (10.14), we get
Σ_{i=1}^m λ̄i gi(x̄) ≥ −ε2. Thus, the preceding inequality reduces to

f(x̄) ≤ f(x) + ε1 + ε2, ∀ x ∈ C.

Therefore, x̄ is an almost (ε1 + ε2)-solution of (CP ). □

10.5 Exact Penalization Approach


We have discussed different approaches like the ε-subdifferential approach,
max-function approach, and saddle point approach to study the approximate
optimality conditions. Another approach to deal with the relationship between
the different classes of approximate solutions is the penalty function approach
by Loridan [76]. In the work of Loridan that appeared in 1982, he dealt with
the notion of regular and almost regular approximate solutions. But here
we will concentrate more on ε-solutions and almost ε-solutions for which we
move on to study the work done by Loridan and Morgan [77]. This approach
helps in dealing with the stability analysis with respect to the perturbed
problem, thereby relating the ε-solutions of the perturbed problem and almost
ε-solutions of (CP ).
We consider the exact penalty function

fρ(x) = f(x) + Σ_{i=1}^m ρi max{0, gi(x)},

where ρ = (ρ1, ρ2, . . . , ρm) with ρi > 0, i = 1, 2, . . . , m, and we associate with it the
following unconstrained problem:

min fρ(x) subject to x ∈ Rn. (CP )ρ

The convergence of the ε-solutions of the sequence of
problems (CP )ρ under certain assumptions leads to an ε-solution of the prob-
lem (CP ). So before moving on to establish the convergence result, we present
a result relating the ε-solution of (CP )ρ with the almost ε-solution of (CP ).
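Before stating the result, the construction can be visualized on a toy instance (a sketch of our own, not taken from Loridan and Morgan [77]): for min x² subject to 1 − x ≤ 0 the constrained minimizer is x = 1, and once the penalty parameter is large enough the unconstrained minimizers of fρ land at this feasible point, so near-minimizers of fρ behave like (almost) ε-solutions of the constrained problem.

```python
# Illustrative sketch (not from the text): exact penalty for min x^2 s.t. 1 - x <= 0.
# The constrained minimizer is x = 1; for rho large enough the penalized objective
# f_rho(x) = x^2 + rho * max(0, 1 - x) attains its unconstrained minimum there too.
import numpy as np

f = lambda x: x**2
g = lambda x: 1.0 - x                        # constraint g(x) <= 0
f_rho = lambda x, rho: f(x) + rho * np.maximum(0.0, g(x))

xs = np.linspace(-3.0, 3.0, 60001)
for rho in (0.5, 1.0, 2.0, 5.0):
    x_min = xs[np.argmin(f_rho(xs, rho))]
    print(rho, round(float(x_min), 3))       # the minimizer reaches the feasible point 1 once rho >= 2
```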

Theorem 10.17 Assume that f is bounded below on Rn. Then there ex-
ists ρε = (α + ε)/ε, where α = inf_{x∈C} f(x) − inf_{x∈Rn} f(x), such that whenever
ρi ≥ ρε, i = 1, 2, . . . , m, every ε-solution of (CP )ρ is an almost ε-solution of
(CP ).

Proof. Suppose that xρ is an ε-solution for (CP )ρ . Then

fρ (xρ ) ≤ fρ (x) + ε, ∀ x ∈ Rn . (10.17)


Observe that for x ∈ C, as gi (x) ≤ 0, i = 1, 2, . . . , m, fρ (x) = f (x). This


along with the condition (10.17) and the definition of fρ implies

f (xρ ) ≤ fρ (xρ ) ≤ f (x) + ε, ∀ x ∈ C. (10.18)

Again, from the definition of fρ along with (10.18),

inf_{x∈Rn} f(x) + Σ_{i=1}^m ρi max{0, gi(xρ)} ≤ fρ(xρ) ≤ inf_{x∈C} f(x) + ε,

which implies

Σ_{i=1}^m ρi max{0, gi(xρ)} ≤ α + ε. (10.19)

Now consider ρ = (ρ1, ρ2, . . . , ρm) such that ρi ≥ ρε = (α + ε)/ε for every
i = 1, 2, . . . , m. Therefore, for the ε-solution xρ of (CP )ρ, the condition (10.19)
leads to

gi(xρ) ≤ max{0, gi(xρ)} ≤ Σ_{i=1}^m max{0, gi(xρ)} ≤ ε, ∀ i = 1, 2, . . . , m,

which implies xρ ∈ Cε . This along with (10.18) yields that xρ is an almost


ε-solution of (CP ). 
In the above theorem, it was shown that the ε-solutions of the penal-
ized problem (CP )ρ are almost ε-solutions of (CP ). But we are more in-
terested in deriving an ε-solution rather than an almost ε-solution of (CP ).
The next result paves a way in this direction by obtaining an ε-solution of
(CP ) from the ε-solutions of the sequence of problems {(CP )ρk }k , where
ρk = (ρk1 , ρk2 , . . . , ρkm ).

Theorem 10.18 Assume that f is bounded below on Rn and satisfies the
coercivity condition

lim_{‖x‖→+∞} f(x) = +∞.

Let {ρk}k be a sequence such that lim_{k→+∞} ρki = +∞ for every i = 1, 2, . . . , m,
and let xρk be an ε-solution of (CP )ρk. Then every convergent subsequence of {xρk}
has a limit that is an ε-solution of (CP ).

Proof. As {xρk } is the ε-solution of (CP )ρk , by Theorem 10.17, {xρk } is an


almost ε-solution of (CP ) and thus satisfies

f (xρk ) ≤ f (x) + ε, ∀ x ∈ C.

Because f (xρk ) is bounded above for every k, therefore by the given hypothesis


{xρk } is a bounded sequence and thus by the Bolzano–Weierstrass Theorem,


Proposition 1.3, has a convergent subsequence. Without loss of generality,
assume that xρk → xρ . As dom f = Rn , by Theorem 2.69, f is continuous on
Rn . Thus, taking the limit as k → +∞, the above inequality leads to

f (xρ ) ≤ f (x) + ε, ∀ x ∈ C. (10.20)

Using the condition (10.19) in the proof of Theorem 10.17

gi (xρk ) ≤ (α + ε)/ρki .

Again, by Theorem 2.69, gi , i = 1, 2, . . . , m, is continuous on dom gi = Rn ,


i = 1, 2, . . . , m. Therefore, taking the limit as k → +∞, the above inequality
leads to

gi (xρ ) ≤ 0, ∀ i = 1, 2, . . . , m.

Thus, xρ ∈ C along with the condition (10.20) implies that xρ is an ε-solution


of (CP ). 
From the above discussions it is obvious that an ε-solution of (CP )ρ need
not be an ε-solution of (CP ) when xρ ∈ / C. But in case xρ ∈ C, it may
be considered as an ε-solution of (CP ). The result below tries to find an
ε-solution for (CP ) by using an ε-solution of (CP )ρ under the Slater constraint
qualification. Even though the result is from Loridan and Morgan [77] but the
proof is based on the work by Zangwill [114] on penalty functions. Here we
present the detailed proof for a better understanding.

Theorem 10.19 Consider the convex programming problem (CP ) with C


given by (3.1). Assume that f is bounded below on Rn and the Slater con-
straint qualification is satisfied, that is, there exists x̂ ∈ Rn such that gi (x̂) < 0,
i = 1, 2, . . . , m. Define β = inf x∈C f (x) − f (x̂) and γ = maxi=1,...,m gi (x̂) < 0.
Let ρ0 = (β − 1)/γ > 0. For ρ = (ρ1 , ρ2 , . . . , ρm ) with ρi ≥ ρ0 , i = 1, 2, . . . , m,
let xρ ∈/ C be an ε-solution for (CP )ρ . Let x̄ be the unique point on the line
segment joining xρ and x̂ lying on the boundary of C. Then x̄ is an ε-solution
of (CP ).

Proof. Because x̄ is a unique point on the line segment joining xρ and x̂ lying
on the boundary, the active index set I(x̄) = {i ∈ {1, 2, . . . , m} : gi (x̄) = 0} is
nonempty. Define a convex auxiliary function as

F(x) = f(x) + ρ0 Σ_{i∈I(x̄)} gi(x).

Observe that for i ∈ I(x̄), gi (x̄) = 0 while for i ∈


/ I(x̄), gi (x̄) < 0. Therefore,

F(x̄) = f (x̄) = fρ0 (x̄). (10.21)


As x̄ lies on the line segment joining xρ and x̂, there exists λ ∈ (0, 1) such
that x̄ = λxρ + (1 − λ)x̂. Then by the convexity of gi , i = 1, 2, . . . , m,

gi (x̄) ≤ λgi (xρ ) + (1 − λ)gi (x̂), i = 1, 2, . . . , m.

For i ∈ I(x̄), gi (x̄) = 0, which along with the Slater constraint qualification
reduces the above inequality to

0 < −(1 − λ)gi (x̂) ≤ λgi (xρ ), ∀ i ∈ I(x̄).

Therefore, for i ∈ I(x̄), gi(xρ) > 0, which implies

Σ_{i∈I(x̄)} gi(xρ) = Σ_{i∈I(x̄)} max{0, gi(xρ)} ≤ Σ_{i=1}^m max{0, gi(xρ)},

thereby leading to the fact that

F(xρ ) ≤ fρ0 (xρ ). (10.22)

To prove the result, it is sufficient to show that F(x̄) < F(xρ ). But first
we will show that F(x̂) < F(x̄). Consider

F(x̂) = f(x̂) + ρ0 Σ_{i∈I(x̄)} gi(x̂).

Because gi(x̂) < 0, i = 1, 2, . . . , m, we have Σ_{i∈I(x̄)} gi(x̂) ≤ max_{i=1,...,m} gi(x̂), which
by the given hypothesis implies

F(x̂) ≤ f(x̂) + ρ0 γ = inf_{x∈C} f(x) − 1 < f(x̄) = F(x̄). (10.23)

The convexity of F along with (10.23) leads to

F(x̄) < λF(xρ ) + (1 − λ)F(x̄),

which implies F(x̄) < F(xρ ). Therefore, by (10.21) and (10.22),


f (x̄) < fρ0 (xρ ). By the definition of fρ , fρ0 (x) ≤ fρ (x) for every
ρ = (ρ1 , ρ2 , . . . , ρm ) with ρi ≥ ρ0 , i = 1, 2, . . . , m, which along with the fact
that xρ is an ε-solution of (CP )ρ implies

f (x̄) < fρ (xρ ) ≤ fρ (x) + ε, ∀ x ∈ Rn .

For x ∈ C, fρ (x) = f (x), which reduces the above condition to

f (x̄) ≤ f (x) + ε, ∀ x ∈ C,

thereby implying that x̄ is an ε-solution of (CP ). 


For a better understanding of the above result, let us consider the following
example. Consider

inf e^x subject to x ≤ 0.

Obviously the Slater constraint qualification holds. Consider x̂ = −1 and then
ρ0 = e^{−1} + 1. For ε = 2, xρ = 1/2 > 0 is an ε-solution for every ρ ≥ ρ0. Here,
x̄ = 0 ∈ [−1, 1/2] is an ε-solution for the constrained problem.
Observe that one requires the fact that xρ is an ε-solution of (CP )ρ only to
establish that x̄ is an ε-solution of (CP ). So from the proof of Theorem 10.19
it can also be worked out that under the Slater constraint qualification, cor-
responding to xρ 6∈ C, there exists x̄ ∈ C such that

f (x̄) = fρ (x̄) < fρ (xρ )

for every ρ ≥ ρ0 , where ρ0 is the same as in the previous theorem. As a matter


of fact, because the set C is closed convex, one can always find such an x̄ on
the boundary of C. As xρ 6∈ C is arbitrarily chosen, then from the above
inequality it is obvious that

inf f (x) ≤ fρ (x), ∀ x ∈


/ C.
x∈C

Also, for any x ∈ C, f (x) = fρ (x), which along with the above condition
implies

inf fρ (x) = inf f (x) ≤ infn fρ (x).


x∈C x∈C x∈R

The reverse inequality holds trivially. Therefore,

inf f (x) = infn fρ (x). (10.24)


x∈C x∈R

This leads to the fact that every ε-solution of (CP ) is also an ε-solution of the
penalized unconstrained problem. Next we derive the approximate optimality
conditions for (CP ) using the penalized unconstrained problem.

Theorem 10.20 Consider the convex programming problem (CP ) with C de-
fined by (3.1). Assume that the Slater constraint qualification is satisfied. Let
ε ≥ 0. Then x̄ is an ε-solution of (CP ) if and only if there exist ε̄0 ≥ 0, ε̄i ≥ 0
and λ̄i ≥ 0, i = 1, . . . , m, such that
0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄) and Σ_{i=0}^m ε̄i − ε = Σ_{i=1}^m λ̄i gi(x̄) ≤ 0.

Proof. As x̄ ∈ C is an ε-solution of (CP ), from the above discussion it is also


an ε-solution of the penalized unconstrained problem for ρ = (ρ1 , ρ2 , . . . , ρm )
with ρi ≥ ρ0 > 0, where ρ0 is defined in Theorem 10.19. Therefore, by the


approximate optimality condition, Theorem 10.4, for the unconstrained pe-


nalized problem (CP )ρ ,

0 ∈ ∂ε fρ (x̄).

As dom f = dom gi = Rn, applying the ε-subdifferential Sum Rule, Theo-
rem 2.115, there exist εi ≥ 0, i = 0, 1, . . . , m, satisfying Σ_{i=0}^m εi = ε such
that

0 ∈ ∂ε0 f(x̄) + Σ_{i=1}^m ∂εi(max{0, ρi gi(.)})(x̄).

By the ε-subdifferential Max-Function Rule, Remark 2.119, there exist


0 ≤ λi ≤ 1 and ε̄i ≥ 0 satisfying

εi = ε̄i + max{0, ρi gi (x̄)} − λi ρi gi (x̄) = ε̄i − λi ρi gi (x̄) (10.25)

for every i = 1, 2, . . . , m such that


0 ∈ ∂ε̄0 f(x̄) + Σ_{i=1}^m ∂ε̄i(λ̄i gi)(x̄),

where ε̄0 = ε0 ≥ 0 and λ̄i = ρi λi ≥ 0, i = 1, 2, . . . , m. The condition (10.25)
along with Σ_{i=0}^m εi = ε implies that

Σ_{i=0}^m ε̄i − Σ_{i=1}^m λ̄i gi(x̄) = ε,

thereby leading to the requisite conditions. The converse can be proved in a


similar fashion, as done in Theorem 10.9 with εs = 0. 
Note that the conditions obtained in the above theorem are the same as
those in Theorem 10.11.

10.6 Ekeland’s Variational Principle Approach


In all the earlier sections, we concentrated on the ε-solutions. If x̄ is an
ε-solution of (CP ), then by Ekeland's variational principle, Theorem 2.113,
mentioned in Chapter 2, there exists x̂ ∈ C such that

f(x̂) ≤ f(x) + √ε ‖x − x̂‖, ∀ x ∈ C.

Any x̂ satisfying the above condition is a quasi ε-solution of (CP ). Observe


that we are emphasizing only one of the conditions of the Ekeland’s variational


principle and the other two need not be satisfied. In this section we deal
with the quasi ε-solution and derive the approximate optimality conditions
for this class of approximate solutions for (CP ). But before doing so, let
us illustrate by an example that a quasi ε-solution may or may not be an
ε-solution. Consider the problem
1
inf subject to x > 0.
x
1
Note that the infimum of the problem is zero, which is not attained. For ε = ,
4
it is easy to note that xε = 4 is an ε-solution. Now x̄ > 0 is a quasi ε-solution
if
1 1 1
≤ + |x − x̄|, ∀ x > 0.
x̄ x 2
Observe that x̄ = 4.5 is a quasi ε-solution that is also an ε-solution satisfying
all the conditions of the Ekeland’s variational principle, while x̄ = 3.5 is a
quasi ε-solution that is not ε-solution. Also, it does not satisfy the condition
1 1
≤ . These are not the only quasi ε-solutions. Even points that satisfy
x̄ xε
only the unique minimizer condition of the variational principle, like x̄ = 3,
are also the quasi ε-solution to the above problem.
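These claims are easy to confirm numerically; the grid check below (illustrative only, not part of the original development) tests the quasi ε-solution inequality with √ε = 1/2 and the ε-solution inequality for the three points discussed above.

```python
# Illustrative sketch (not from the text): grid check of the quasi eps-solution
# inequality 1/x_bar <= 1/x + sqrt(eps)*|x - x_bar| for min 1/x over x > 0, eps = 1/4.
import numpy as np

eps = 0.25
xs = np.linspace(1e-3, 50.0, 200000)

def is_quasi(x_bar):
    return bool(np.all(1.0/x_bar <= 1.0/xs + np.sqrt(eps)*np.abs(xs - x_bar) + 1e-12))

def is_eps_solution(x_bar):              # infimum of 1/x over x > 0 is 0
    return 1.0/x_bar <= eps

for x_bar in (4.5, 3.5, 3.0):
    print(x_bar, is_quasi(x_bar), is_eps_solution(x_bar))
# 4.5: quasi and eps-solution; 3.5 and 3.0: quasi but not eps-solutions
```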
Now we move on to discuss the approximate optimality conditions for the
quasi ε-solutions.

Theorem 10.21 Consider the convex programming problem (CP ) with C


given by (3.1). Let ε ≥ 0 be given. Assume that the Slater constraint qual-
ification holds. Then x̄ is a quasi ε-solution of (CP ) if and only if there exist
λi ≥ 0, i = 1, 2, . . . , m, such that
0 ∈ ∂f(x̄) + Σ_{i=1}^m λi ∂gi(x̄) + √ε B and λi gi(x̄) = 0, i = 1, 2, . . . , m.

Proof. A quasi ε-solution x̄ of (CP ) can be considered a minimizer of the
convex programming problem

min f(x) + √ε ‖x − x̄‖ subject to x ∈ C,

where C = {x ∈ Rn : gi(x) ≤ 0, i = 1, 2, . . . , m}. By the KKT optimality
condition, Theorem 3.7, x̄ is a minimizer of the above problem if and only if
there exist λi ≥ 0, i = 1, 2, . . . , m, such that

0 ∈ ∂(f + √ε ‖. − x̄‖)(x̄) + Σ_{i=1}^m λi ∂gi(x̄).

As dom f = dom k. − x̄k = Rn , invoking the Sum Rule, Theorem 2.91, along


with the fact that ∂‖. − x̄‖ = B, the above inclusion becomes

0 ∈ ∂f(x̄) + √ε B + Σ_{i=1}^m λi ∂gi(x̄),

along with λi gi (x̄) = 0, i = 1, 2, . . . , m, thereby yielding the requisite condi-


tions.
Conversely, by the approximate optimality condition, there exist
ξ0 ∈ ∂f (x̄), ξi ∈ ∂gi (x̄), i = 1, 2, . . . , m, and b ∈ B such that
0 = ξ0 + Σ_{i=1}^m λi ξi + √ε b. (10.26)

By Definition 2.77 of the subdifferential,

f (x) − f (x̄) ≥ hξ0 , x − x̄i, ∀ x ∈ Rn , (10.27)


gi (x) − gi (x̄) ≥ hξi , x − x̄i, ∀ x ∈ Rn , i = 1, 2, . . . , m, (10.28)

and by the Cauchy–Schwartz inequality, Proposition 1.1,

kbk kx − x̄k ≥ hb, x − x̄i, ∀ x ∈ Rn . (10.29)

Combining the inequalities (10.27), (10.28), and (10.29) along with (10.26)
implies
f (x) − f (x̄) + Σ_{i=1}^m λi gi (x) − Σ_{i=1}^m λi gi (x̄) + √ε kbk kx − x̄k ≥ 0, ∀ x ∈ Rn .

For any x feasible to (CP ), gi (x) ≤ 0, i = 1, 2, . . . , m, which along with the


complementary slackness condition and the fact that λi ≥ 0, i = 1, 2, . . . , m,
reduces the above inequality to

f (x) − f (x̄) + √ε kbk kx − x̄k ≥ 0, ∀ x ∈ C.

As b ∈ B, kbk ≤ 1, thereby leading to



f (x) − f (x̄) + √ε kx − x̄k ≥ 0, ∀ x ∈ C

and thus establishing the requisite result. □


Observe that the above theorem provides a necessary as well as sufficient
characterization to the quasi ε-solution. Here the approximate optimality con-
dition is in terms of B and the subdifferentials, unlike the earlier results of
this chapter where the approximate optimality conditions were expressed in
terms of the ε-subdifferentials. Also, here we obtain the standard complemen-
tary slackness condition instead of the ε-complementary slackness or relaxed
ε-complementary slackness conditions. Results similar to the ε-saddle point
can also be worked out for quasi ε-saddle points. For more details, one can
look into Dutta [37].


10.7 Modified ε-KKT Conditions


In all discussions regarding the KKT optimality conditions in the earlier chap-
ters, it was observed that under some constraint qualification, the optimality
conditions are established at the point of minimizer, that is, the KKT opti-
mality conditions are nothing but point conditions. Due to this very reason,
the KKT conditions have not been widely incorporated in the optimization
algorithm design but only used as stopping criteria. However, if one could find
the direction of the minima using the deviations from the KKT conditions,
it could be useful from an algorithmic point of view. Work has recently been
done in this respect by Dutta, Deb, Arora, and Tulshyan [39]. They introduced
a new notion of modified ε-KKT point and used it to study the convergence
of the sequences of modified ε-KKT points to the minima of the convex pro-
gramming problem (CP ). Below we define this new concept, which is again
motivated by Ekeland's variational principle.
Definition 10.22 A feasible point x̄ of (CP ) is said to be a modified ε-KKT point for a given ε > 0 if there exists x̃ ∈ Rn satisfying kx̃ − x̄k ≤ √ε and there exist ξ̃0 ∈ ∂f (x̃), ξ̃i ∈ ∂gi (x̃) and λi ≥ 0, i = 1, 2, . . . , m, such that

kξ̃0 + Σ_{i=1}^m λi ξ̃i k ≤ √ε and ε + Σ_{i=1}^m λi gi (x̄) ≥ 0.

Observe that in the ε-KKT condition, the subdifferentials are calculated at


some x̃ ∈ B√ε (x̄), whereas the relaxed ε-complementary slackness condition
is satisfied at x̄ itself. Before moving on to the stability part, we try to relate
the already discussed ε-solution with the modified ε-KKT point.
Theorem 10.23 Consider the convex programming problem (CP ) with C
given by (3.1). Assume that the Slater constraint qualification holds and let x̄
be an ε-solution of (CP ). Then x̄ is a modified ε-KKT point.
Proof. Because x̄ is an ε-solution of (CP ), by Theorem 10.13, there exist
λi ≥ 0, i = 1, 2, . . . , m, such that x̄ is also an ε-saddle point along with
ε + Σ_{i=1}^m λi gi (x̄) ≥ 0. (10.30)

As x̄ is an ε-saddle point,

L(x̄, λ) ≤ L(x, λ) + ε, ∀ x ∈ Rn ,

which implies x̄ is an ε-solution of L(., λ) over Rn . Applying Ekeland's variational principle, Theorem 2.113, for √ε, there exists x̃ ∈ Rn satisfying kx̃ − x̄k ≤ √ε such that x̃ is a minimizer of the problem

min L(x, λ) + √ε kx − x̃k subject to x ∈ Rn .


By the unconstrained optimality condition, Theorem 2.89,



0 ∈ ∂(L(., λ) + √ε k. − x̃k)(x̃).

As dom f = dom gi = Rn , i = 1, 2, . . . , m, applying the Sum Rule, Theorem 2.91,

0 ∈ ∂f (x̃) + Σ_{i=1}^m λi ∂gi (x̃) + √ε B,

which implies there exist ξ˜0 ∈ ∂f (x̃), ξ˜i ∈ ∂gi (x̃), i = 1, 2, . . . , m, and b ∈ B
such that
0 = ξ̃0 + Σ_{i=1}^m λi ξ̃i + √ε b,

thereby leading to
kξ̃0 + Σ_{i=1}^m λi ξ̃i k ≤ √ε,

which along with the condition (10.30) implies that x̄ is a modified ε-KKT
point as desired. □
In the case of the exact penalization approach, from Theorem 10.16 we
have that every convergent sequence of ε-solutions of a sequence of penalized
problems converges to an ε-solution of (CP ). Now, is it possible to establish such a result by studying a sequence of modified ε-KKT points? The answer is yes, as shown in the following theorem.

Theorem 10.24 Consider the convex programming problem (CP ) with C


given by (3.1). Assume that the Slater constraint qualification holds and let
{εk } ⊂ R+ such that εk ↓ 0 as k → +∞. For every k, let xk be a modified
εk -KKT point of (CP ) such that xk → x̄ as k → +∞. Then x̄ is a point of
minimizer of (CP ).

Proof. As for every k, xk is a modified εk -KKT point of (CP ), there exists x̃k ∈ Rn satisfying kx̃k − xk k ≤ √εk and there exist ξ0k ∈ ∂f (x̃k ), ξik ∈ ∂gi (x̃k ), and λki ≥ 0, i = 1, 2, . . . , m, such that

kξ0k + Σ_{i=1}^m λki ξik k ≤ √εk and εk + Σ_{i=1}^m λki gi (xk ) ≥ 0. (10.31)

We claim that {λk } ⊂ Rm+ is a bounded sequence. Suppose that {λk } is an unbounded sequence. Define a bounded sequence γ k = λk / kλk k with kγ k k = 1.


Because {γ k } is a bounded sequence, by the Bolzano–Weierstrass Theorem,


Proposition 1.3, it has a convergent subsequence. Without loss of generality,
assume that γ k → γ with kγk = 1. Observe that

kx̃k − x̄k ≤ kx̃k − xk k + kxk − x̄k ≤ √εk + kxk − x̄k.

By the given hypothesis, as k → +∞, εk ↓ 0 and xk → x̄, which implies


x̃k → x̄.
Now dividing both the conditions of (10.31) throughout by kλk k yields
k (1/kλk k) ξ0k + Σ_{i=1}^m γik ξik k ≤ √εk / kλk k and Σ_{i=1}^m γik gi (xk ) ≥ − εk / kλk k.

By Proposition 2.83, f and gi , i = 1, 2, . . . , m, have compact subdifferentials.


Thus, {ξ0k } and {ξik }, i = 1, 2, . . . , m, are bounded sequences and hence by
the Bolzano–Weierstrass Theorem, Proposition 1.3, have a convergent subse-
quence. Without loss of generality, let ξ0k → ξ0 and ξik → ξi , i = 1, 2, . . . , m.
By the Closed Graph Theorem, Theorem 2.84, ξ0 ∈ ∂f (x̄) and ξi ∈ ∂gi (x̄),
i = 1, 2, . . . , m. Therefore, as k → +∞,

(1/kλk k) ξ0k → 0, √εk / kλk k → 0 and εk / kλk k → 0,

which implies kΣ_{i=1}^m γi ξi k ≤ 0, that is, Σ_{i=1}^m γi ξi = 0 and Σ_{i=1}^m γi gi (x̄) ≥ 0.
By Definition 2.77 of the subdifferential,

gi (x) − gi (x̄) ≥ hξi , x − x̄i, ∀ x ∈ Rn , i = 1, 2, . . . , m,

which yields
Σ_{i=1}^m γi gi (x) ≥ Σ_{i=1}^m γi gi (x̄) ≥ 0, ∀ x ∈ Rn ,

thereby contradicting the existence of a point x̂ satisfying gi (x̂) < 0,


i = 1, 2, . . . , m by the Slater constraint qualification. Therefore, the sequence
{λk } is a bounded sequence and hence by the Bolzano–Weierstrass Theorem,
Proposition 1.3, has a convergent subsequence. Without loss of generality, let
λki → λi , i = 1, 2, . . . , m. Taking the limit as k → +∞ in (10.31) yields
kξ0 + Σ_{i=1}^m λi ξi k ≤ 0 and Σ_{i=1}^m λi gi (x̄) ≥ 0. (10.32)

The norm condition in (10.32) implies that


0 = ξ0 + Σ_{i=1}^m λi ξi ,


thereby leading to the optimality condition


0 ∈ ∂f (x̄) + Σ_{i=1}^m λi ∂gi (x̄).

As xk is a modified εk -KKT point for every k, it is a feasible point of (CP ), that is, gi (xk ) ≤ 0, i = 1, 2, . . . , m, which implies gi (x̄) ≤ 0, i = 1, 2, . . . , m, as k → +∞. This along with the condition in (10.32) leads to the complementary slackness condition

Σ_{i=1}^m λi gi (x̄) = 0.

Hence, x̄ satisfies the standard KKT optimality conditions. As (CP ) is a


convex programming problem, by the sufficient optimality conditions, x̄ is a
point of minimizer of (CP ), thereby establishing the desired result. □
Theorems 10.23 and 10.24 can be combined and stated as follows.

Theorem 10.25 Consider the convex programming problem (CP ) with C


given by (3.1). Assume that the Slater constraint qualification holds. Let {xk }
be a sequence of εk -solutions of (CP ) such that xk → x̄ and εk ↓ 0 as
k → +∞. Then x̄ is a point of minimizer of (CP ).
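Theorems 10.23–10.25 can be visualized on a smooth toy problem, where the subdifferentials reduce to gradients. The following Python sketch (ours, not from the text; the problem data, the simple choice x̃ = xk, and the multiplier formula are illustrative assumptions) checks Definition 10.22 along a sequence of feasible points converging to the minimizer as εk ↓ 0.

```python
import numpy as np

# toy convex program:  min (x1-2)^2 + (x2-2)^2   s.t.  g(x) = x1^2 + x2^2 - 2 <= 0,
# whose minimizer is xbar = (1, 1) with Lagrange multiplier 1.
grad_f = lambda x: 2.0 * (x - np.array([2.0, 2.0]))
g      = lambda x: x[0] ** 2 + x[1] ** 2 - 2.0
grad_g = lambda x: 2.0 * x

def is_modified_eps_kkt(xk, eps):
    """Check Definition 10.22 with the simplest admissible choice xtilde = xk
    (so ||xtilde - xk|| = 0 <= sqrt(eps)); lambda is the nonnegative minimizer
    of ||grad f(xk) + lambda * grad g(xk)||."""
    if g(xk) > 0.0:                                   # the point must be feasible
        return False
    gf, gg = grad_f(xk), grad_g(xk)
    lam = max(0.0, -float(gf @ gg) / float(gg @ gg))  # best multiplier for xtilde = xk
    norm_ok = np.linalg.norm(gf + lam * gg) <= np.sqrt(eps)
    comp_ok = eps + lam * g(xk) >= 0.0
    return norm_ok and comp_ok

xbar = np.array([1.0, 1.0])
for k in range(1, 7):
    eps_k = 10.0 ** (-k)
    xk = (1.0 - 0.1 * eps_k) * xbar                   # feasible points converging to xbar
    print(f"eps_k = {eps_k:.0e}  modified eps_k-KKT: {is_modified_eps_kkt(xk, eps_k)}")
```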

10.8 Duality-Based Approach to ε-Optimality


In this chapter, in all the results on approximate optimality conditions, we
have assumed the Slater constraint qualification. But what if neither the Slater
nor any other constraint qualification is satisfied? Work has been done in this
respect by Yokoyama [113] using the exact penalization approach. In this
work, he replaced the assumption of Slater constraint qualification by relating
the penalty parameter with the ε-maximum solution of the dual problem
associated with (CP ). The results were obtained relating the ε-solutions of the
given problem (CP ), its dual problem, and the penalized problem. Here we will
discuss some of his results in comparison to the ones derived in Section 10.5.
For that purpose, we associate the dual problem

sup w(λ) subject to λ ∈ Rm , (DP )

where w(λ) = inf_{x∈Rn} L(x, λ) and L(x, λ) is the Lagrange function given by

L(x, λ) = f (x) + Σ_{i=1}^m λi gi (x) if λi ≥ 0, i = 1, 2, . . . , m, and L(x, λ) = −∞ otherwise.


Denote the duality gap by θ = inf x∈C f (x) − supλ∈Rm w(λ). Next we present
the theorem relating the ε-solution of (CP )ρ with the almost ε-solution of
(CP ) under the assumption of the ε-maximum solution of (DP ). Recall the
penalized problem
min fρ (x) subject to x ∈ Rn , (CP )ρ
where fρ (x) = f (x) + Σ_{i=1}^m ρi max{0, gi (x)} and ρ = (ρ1 , ρ2 , . . . , ρm ) with ρi > 0, i = 1, 2, . . . , m.
Theorem 10.26 Consider the convex programming problem (CP ) with C
given by (3.1) and its associated dual problem (DP ). Then for ρ satisfying
ρ ≥ 3 + max_{i=1,...,m} λ̄i + θ/ε,
where λ̄ = (λ̄1 , λ̄2 , . . . , λ̄m ) is an ε-maximum solution of (DP ), every x̄ that
is an ε-solution of (CP )ρ is also an almost ε-solution of (CP ).
Proof. Consider an ε-solution x̂ ∈ C of (CP ), that is,
f (x̂) ≤ inf_{x∈C} f (x) + ε.

As x̄ is an ε-solution of (CP )ρ , in particular,


fρ (x̄) ≤ fρ (x̂) + ε = f (x̂) + ε,
which implies that
f (x̄) + ρ Σ_{i=1}^m max{0, gi (x̄)} ≤ inf_{x∈C} f (x) + 2ε. (10.33)

By the definition of duality gap θ,


inf_{x∈C} f (x) = sup_{λ∈Rm} w(λ) + θ. (10.34)

For an ε-maximum solution λ̄ of the dual problem (DP ),


sup_{λ∈Rm} w(λ) ≤ w(λ̄) + ε ≤ f (x̄) + Σ_{i=1}^m λ̄i gi (x̄) + ε. (10.35)

Therefore, using the conditions (10.34) and (10.35), (10.33) becomes


f (x̄) + ρ Σ_{i=1}^m max{0, gi (x̄)} ≤ f (x̄) + Σ_{i=1}^m λ̄i gi (x̄) + 3ε + θ,

that is,
ρ Σ_{i=1}^m max{0, gi (x̄)} ≤ Σ_{i=1}^m λ̄i gi (x̄) + 3ε + θ.


Define the index set I> = {i ∈ {1, 2, . . . , m} : gi (x̄) > 0}. Thus

ρ Σ_{i∈I>} gi (x̄) = ρ Σ_{i=1}^m max{0, gi (x̄)} ≤ Σ_{i=1}^m λ̄i gi (x̄) + 3ε + θ ≤ Σ_{i∈I>} λ̄i gi (x̄) + 3ε + θ,

which implies
(ρ − max_{i=1,...,m} λ̄i ) Σ_{i∈I>} gi (x̄) ≤ Σ_{i∈I>} (ρ − λ̄i ) gi (x̄) ≤ 3ε + θ.

From the above condition and the given hypothesis on ρ,


Σ_{i∈I>} gi (x̄) ≤ (3ε + θ)/(ρ − max_{i=1,...,m} λ̄i ) ≤ ε,

thereby implying that x̄ ∈ Cε = {x ∈ Rn : gi (x) ≤ ε, i = 1, 2, . . . , m}.


Also, f (x̄) ≤ fρ (x̄). As x̄ is an ε-solution of (CP )ρ ,
fρ (x̄) ≤ fρ (x) + ε, ∀ x ∈ Rn ,
which along with the fact that f (x) = fρ (x) for every x ∈ C leads to
f (x̄) ≤ f (x) + ε, ∀ x ∈ C,
thus implying that x̄ is an almost ε-solution of (CP ). □
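A one-dimensional numerical illustration of Theorem 10.26 (ours; the data, the grid search, and the candidate ε-solution are arbitrary choices, and the dual maximizer λ̄ = 2 with θ = 0 is computed by hand) is sketched below: for a penalty parameter above the stated bound, an ε-solution of (CP )ρ found on a grid is verified to be an almost ε-solution of (CP ).

```python
import numpy as np

# (CP):  min f(x) = (x - 2)^2  s.t.  g(x) = x - 1 <= 0     (minimum value 1 at x = 1)
# The dual optimum is attained at lambda_bar = 2 with duality gap theta = 0, so the
# theorem asks for a penalty parameter rho >= 3 + lambda_bar + theta/eps = 5.
f = lambda x: (x - 2.0) ** 2
g = lambda x: x - 1.0

eps, theta, lam_bar = 0.1, 0.0, 2.0
rho = 3.0 + lam_bar + theta / eps + 0.5          # any rho above the bound works

f_rho = lambda x: f(x) + rho * np.maximum(0.0, g(x))

xs = np.linspace(-5.0, 5.0, 200001)              # grid over R (illustrative only)
inf_f_rho = f_rho(xs).min()
inf_f_C   = f(xs[g(xs) <= 0.0]).min()

# pick some eps-solution of the penalized problem (CP)_rho
candidates = xs[f_rho(xs) <= inf_f_rho + eps]
xbar = candidates[-1]                            # the largest one violates g the most

almost_feasible = g(xbar) <= eps                 # g_i(xbar) <= eps
almost_optimal  = f(xbar) <= inf_f_C + eps       # f(xbar) <= inf_C f + eps
print(f"xbar = {xbar:.4f}, g(xbar) = {g(xbar):.4f}, f(xbar) = {f(xbar):.4f}")
print("almost eps-solution of (CP):", almost_feasible and almost_optimal)
```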
This result is the same as Theorem 10.17 except for the bound on the penalty parameter. Recall from Theorem 10.17 that ρ ≥ (α + ε)/ε, where α = inf_{x∈C} f (x) − inf_{x∈Rn} f (x). Also in that result the Slater constraint qual-
ification was not assumed. Observe that both the results are similar but the
parameter bounds are different. Under the Slater constraint qualification, it is
known that strong duality holds and thus the duality gap θ = 0 and the dual
problem (DP ) is solvable. Consequently, under the Slater constraint qualifi-
cation, the bound on the parameter now becomes
ρ ≥ 3 + max_{i=1,...,m} λ̄i ,

where λ̄ = (λ̄1 , λ̄2 , . . . , λ̄m ) is a maximizer of (DP ). Here we were discussing


the existence of an almost ε-solution of (CP ), given an ε-solution of (CP )ρ .
From the discussion in Section 10.5, it is seen that under the Slater con-
straint qualification and for ρ ≥ ρ0 with ρ0 given in Theorem 10.19,
inf_{x∈C} f (x) = inf_{x∈Rn} fρ (x),

thereby implying that every x̄ that is an ε-solution of (CP ) is also an ε-solution


of (CP )ρ . So in the absence of any constraint qualification, Yokoyama [113] obtained that x̄ is a (2ε + θ)-solution of (CP )ρ , as presented below.


Theorem 10.27 Consider the convex programming problem (CP ) with C


given by (3.1) and its associated dual problem (DP ). Then for ρ satisfying

ρ ≥ max λ̄i ,
i=1,...,m

where λ̄ = (λ̄1 , λ̄2 , . . . , λ̄m ) is an ε-maximum solution of (DP ), every x̄ that


is an ε-solution of (CP ) is also a (2ε + θ)-solution of (CP )ρ .

Proof. As x̄ is an ε-solution of (CP ), x̄ ∈ C, which implies

fρ (x̄) = f (x̄) ≤ inf_{x∈C} f (x) + ε.

As λ̄ is an ε-maximum solution of (DP ), working along the lines of Theo-


rem 10.26, the above condition becomes
fρ (x̄) ≤ f (x) + Σ_{i=1}^m λ̄i gi (x) + 2ε + θ, ∀ x ∈ Rn .

Using the hypothesis on ρ, the above inequality leads to


fρ (x̄) ≤ f (x) + ρ Σ_{i=1}^m max{0, gi (x)} + 2ε + θ = fρ (x) + 2ε + θ, ∀ x ∈ Rn ,

thereby implying that x̄ is a (2ε + θ)-solution of (CP )ρ . □


It was mentioned by Yokoyama [113] that in the presence of the Slater con-
straint qualification and with λ̄ as some optimal Lagrange multiplier, every
x̄ that is an ε-solution of (CP ) is also an ε-solution of (CP )ρ . In his work,
Yokoyama also derived the necessary approximate optimality conditions as
established in this chapter in the absence of any constraint qualification. The
sufficiency could be established only under the assumption of the Slater con-
straint qualification.



Chapter 11
Convex Semi-Infinite Optimization

11.1 Introduction
In all the preceding chapters we considered the convex programming problem
(CP ) with the feasible set C of the form (3.1), that is,

C = {x ∈ Rn : gi (x) ≤ 0, i = 1, 2, . . . , m},

where gi : Rn → R, i = 1, 2, . . . , m, are convex functions. Observe that the


problem involved only a finite number of constraints. Now in situations where
the number of constraints involved is infinite, the problem extends to the class
of semi-infinite programming problems. Such problems come into existence in
many physical and social sciences models where it is necessary to consider
the constraints on the state or the control of the system during a period of
time. For examples from real-life scenarios where semi-infinite programming
problems are involved, readers may refer to Hettich and Kortanek [57] and
references therein. We consider the following convex semi-infinite programming
problem,
inf f (x) subject to g(x, i) ≤ 0, i ∈ I (SIP )
where f, g(., i) : Rn → R, i ∈ I are convex functions with infinite index set
I ⊂ Rm . The term “semi-infinite programming” is derived from the fact that
the decision variable x is finite while the index set I is infinite. But before
moving on with the derivation of KKT optimality conditions for (SIP ), we
present some notations that will be used in subsequent sections.
Denote the feasible set of (SIP ) by CI , that is,

CI = {x ∈ Rn : gi (x) ≤ 0, i ∈ I}.

Let RI be the product space of λ = (λi ∈ R : i ∈ I) and

R[I] = {λ ∈ RI : λi ≠ 0 for finitely many i ∈ I},

while the positive cone in R[I] , R+[I] , is defined as

R+[I] = {λ ∈ R[I] : λi ≥ 0, ∀ i ∈ I}.


For a given z ∈ RI and λ ∈ R[I] , define the supporting set of λ as


supp λ = {i ∈ I : λi ≠ 0},

hλ, zi = Σ_{i∈I} λi zi = Σ_{i∈supp λ} λi zi .
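Computationally, an element λ ∈ R[I] is conveniently stored through its finite support. The following Python fragment (purely illustrative, not from the text) represents such a λ as a dictionary and evaluates hλ, zi by summing only over supp λ.

```python
# lambda in R^[I]: only finitely many indices carry a nonzero value,
# so a dictionary {i: lambda_i} is a natural representation via supp lambda.
lam = {0.0: 1.5, 0.25: 2.0, 1.0: 0.5}       # supp lam = {0.0, 0.25, 1.0} inside I = [0, 1]

def pairing(lam, z):
    """<lam, z> = sum over supp lam of lam_i * z(i), for z given as a function on I."""
    return sum(lam_i * z(i) for i, lam_i in lam.items())

z = lambda i: i ** 2                         # a sample element z of R^I, viewed as a function
print(sorted(lam))                           # the support of lam
print(pairing(lam, z))                       # 1.5*0 + 2.0*0.0625 + 0.5*1 = 0.625
```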

With these notations, we now move on to study the various approaches to


obtain the KKT optimality conditions for (SIP ).

11.2 Sup-Function Approach


A possible approach to solve (SIP ) is to associate a problem with a finite number of constraints, that is, the reduced form of (SIP )

inf f (x) subject to g(x, i) ≤ 0, i ∈ Ĩ, (SIP~)

where Ĩ ⊂ I is finite and f and g(., i), i ∈ Ĩ, are as in (SIP ) such that the optimal values of (SIP ) and the reduced problem (SIP~) coincide. Then (SIP~) is said to be the equivalent reduced problem of (SIP ). One way to reduce (SIP ) to an equivalent (SIP~) is to replace the infinite inequality constraints by a single constraint,

g̃(x) = sup_{i∈I} g(x, i),

where g̃ : Rn → R̄ is a convex function by Proposition 2.53 (iii). Therefore, the reduced problem is

inf f (x) subject to g̃(x) ≤ 0. (SIP~sup)
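When the index set can be sampled, g̃ is straightforward to evaluate approximately. The sketch below (our illustration; the data g(x, i) = x1 cos i + x2 sin i − 1 on I = [0, 2π] and the uniform sampling are assumptions) forms a sampled sup-function whose exact value here is kxk − 1, so that CI is the closed unit disk.

```python
import numpy as np

# g(x, i) = x1*cos(i) + x2*sin(i) - 1,  i in I = [0, 2*pi]; each g(., i) is affine,
# and g_tilde(x) = sup_i g(x, i) = ||x|| - 1, so C_I is the closed unit disk.
I_samples = np.linspace(0.0, 2.0 * np.pi, 2001)      # discretization of the index set

def g(x, i):
    return x[0] * np.cos(i) + x[1] * np.sin(i) - 1.0

def g_tilde(x):
    return max(g(x, i) for i in I_samples)            # sampled sup-function

for x in [np.array([0.3, 0.4]), np.array([0.6, 0.8]), np.array([1.5, 0.0])]:
    print(x, round(g_tilde(x), 6), round(np.linalg.norm(x) - 1.0, 6))
# the sampled g_tilde(x) matches ||x|| - 1 up to the discretization error
```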
Such a formulation was studied by Pshenichnyi [96], where g(., i), for every i ∈ I, were taken to be convex differentiable functions. Observe that (SIP~sup) is of the form (CP ) studied in Chapter 3. It was seen that under the Slater constraint qualification, the standard KKT optimality conditions for (CP ) can be obtained. Therefore, to apply Theorem 3.7, (SIP~sup) should satisfy
the Slater constraint qualification. But this problem is equivalent to (SIP ),
for which we introduce the following Slater constraint qualification for (SIP ).

Definition 11.1 The Slater constraint qualification for (SIP ) is

(i) I ⊂ Rm is a compact set,

(ii) g(x, i) is a continuous function of (x, i) ∈ Rn × I,

(iii) There exists x̂ ∈ Rn such that g(x̂, i) < 0 for every i ∈ I.


Observe that in the Slater constraint qualification for (CP ), only condition
(iii) is considered. Here the additional conditions (i) and (ii) ensure that the
supremum is attained over I, which holds trivially in the finite index set
scenario. We now present the KKT optimality condition for (SIP ).
Theorem 11.2 Consider the convex semi-infinite programming problem
(SIP ). Assume that the Slater constraint qualification for (SIP ) holds. Then
x̄ ∈ Rn is a point of minimizer of (SIP ) if and only if there exists λ ∈ R+[I(x̄)] such that

0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i),

where I(x̄) = {i ∈ I : g(x̄, i) = 0} denotes the active index set and the
subdifferential ∂g(x̄, i) is with respect to x.

Proof. As already observed, (SIP ) is equivalent to (SIP~sup) and thus x̄ is also a point of minimizer of (SIP~sup). As the Slater constraint qualification for (SIP ) holds, by conditions (i) and (ii) the supremum is attained over I. Therefore, by condition (iii) of the Slater constraint qualification for (SIP ), there exists x̂ ∈ Rn such that

g̃(x̂) = sup_{i∈I} g(x̂, i) < 0,

which implies that (SIP~sup) satisfies the Slater constraint qualification. In-

which implies that (SIP
voking Theorem 3.7, there exists λ′ ≥ 0 such that
0 ∈ ∂f (x̄) + λ′ ∂g̃(x̄) and λ′ g̃(x̄) = 0. (11.1)
Now we consider two cases depending on g̃(x̄).
(i) g̃(x̄) < 0: By the complementary slackness condition λ′ = 0. Also, because
g̃(x̄) < 0, g(x̄, i) < 0 for every i ∈ I, which implies the active index set I(x̄)
is empty. Thus the optimality condition (11.1) reduces to
0 ∈ ∂f (x̄),
and the KKT optimality condition for (SIP ) holds with λ = 0 ∈ R+[I] .
(ii) g̃(x̄) = 0: Here the complementary slackness condition holds trivially and λ′ ≥ 0. Define the supremum set as

Î(x̄) = {i ∈ I : g(x̄, i) = g̃(x̄)} = {i ∈ I : g(x̄, i) = 0},
which implies that Î(x̄) = I(x̄). By the conditions (i) and (ii) of the Slater constraint qualification for (SIP ), Î(x̄) and hence I(x̄) is nonempty. By the Valadier formula, Theorem 2.97, the optimality condition becomes

0 ∈ ∂f (x̄) + Σ_{i∈I(x̄)} λi ∂g(x̄, i),

where λi = λ′ λ̄i ≥ 0, i ∈ I(x̄), with λ̄ ∈ R+[I(x̄)] satisfying Σ_{i∈supp λ̄} λ̄i = 1. As λ′ ≥ 0, λ ∈ R+[I(x̄)] and thus the preceding optimality condition can be expressed as

0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i).

Thus, the KKT optimality condition is obtained for (SIP ).


Conversely, suppose that the optimality condition holds, which implies
that there exist ξ ∈ ∂f (x̄) and ξi ∈ ∂g(x̄, i) such that
0 = ξ + Σ_{i∈supp λ} λi ξi , (11.2)

where λ ∈ R+[I(x̄)] . By Definition 2.77 of the subdifferential, for every x ∈ Rn ,

f (x) ≥ f (x̄) + hξ, x − x̄i,


g(x, i) ≥ g(x̄, i) + hξi , x − x̄i, i ∈ supp λ,

which along with the condition (11.2) implies that


f (x) + Σ_{i∈supp λ} λi g(x, i) ≥ f (x̄) + Σ_{i∈supp λ} λi g(x̄, i), ∀ x ∈ Rn .

The above inequality along with the fact that g(x̄, i) = 0, i ∈ I(x̄) leads to
f (x) + Σ_{i∈supp λ} λi g(x, i) ≥ f (x̄), ∀ x ∈ Rn .

In particular, for x ∈ CI , that is, g(x, i) ≤ 0, i ∈ I, the above condition


reduces to

f (x) ≥ f (x̄), ∀ x ∈ CI ,

thereby implying that x̄ is a minimizer of (SIP ). □
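As a concrete check of the optimality condition just proved (our own illustration, not from the text), take f(x) = x1 and g(x, i) = x1 cos i + x2 sin i − 1 with I = [0, 2π]; then CI is the closed unit disk, the minimizer is x̄ = (−1, 0), the only active index is i = π, and the choice λπ = 1 satisfies the KKT condition. The sketch below verifies this numerically.

```python
import numpy as np

# (SIP) data:  f(x) = x1,  g(x, i) = x1*cos(i) + x2*sin(i) - 1,  I = [0, 2*pi].
# C_I is the closed unit disk; the minimizer of f over C_I is xbar = (-1, 0).
xbar = np.array([-1.0, 0.0])
grad_f = np.array([1.0, 0.0])
grad_g = lambda i: np.array([np.cos(i), np.sin(i)])     # gradient of g(., i) at any x

# active index set I(xbar) = {i : g(xbar, i) = 0}; here g(xbar, i) = -cos(i) - 1,
# which vanishes only at i = pi
I_samples = np.linspace(0.0, 2.0 * np.pi, 20001)
g_at_xbar = -np.cos(I_samples) - 1.0
active = I_samples[np.abs(g_at_xbar) < 1e-6]
print("active indices (approx):", active)               # clustered around pi

# KKT condition of Theorem 11.2 with supp(lambda) = {pi} and lambda_pi = 1:
lam_pi = 1.0
residual = grad_f + lam_pi * grad_g(np.pi)
print("KKT residual:", residual)                        # (0, 0) up to round-off
```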

11.3 Reduction Approach


As already mentioned in the preceding section, the reduction approach is one
possible method to establish the KKT optimality condition for (SIP ). The
sup-function approach was one such reduction technique. Another way to formulate an equivalent (SIP~) is to use the approach by Ben-Tal, Rosinger, and
Ben-Israel [9] to derive a Helly-type Theorem for open convex sets using the


result by Klee [72]. But this approach was a bit difficult to follow. So Bor-
wein [16] provided a self-contained proof of the reduction approach involving
quasiconvex functions. Here we present the same under the assumptions that
g(., i) is convex for every i ∈ I and g(., .) is jointly continuous as a function
of (x, i) ∈ Rn × I. In the proof, one only needs g(x, i) to be jointly usc as a
function of (x, i) ∈ Rn × I along with the convexity assumption.

Proposition 11.3 Consider open and closed convex sets U ⊂ Rn and


C ⊂ Rn , respectively. The following are equivalent when I is compact.

(i) There exists x ∈ C and ε > 0 such that

x + εB ⊂ U, g(y, i) < 0, ∀ y ∈ x + εB, ∀ i ∈ I.

(ii) (a) For every set of n + 1 points {i0 , i1 , . . . , in } ⊂ I, there exists x ∈ C


such that

g(x, i0 ) < 0, g(x, i1 ) < 0, . . . , g(x, in ) < 0.

(b) For every set of n points {i1 , i2 , . . . , in } ⊂ I, there exists x ∈ C such


that

x ∈ U, g(x, i1 ) < 0, g(x, i2 ) < 0, . . . , g(x, in ) < 0.

Proof. It is obvious that (i) implies (ii)(b). Also, in particular, taking


y = x ∈ x + εB in (i) yields (ii)(a). Therefore, to establish the result, we show
that (ii) implies (i).
Suppose that both (ii)(a) and (b) are satisfied. We first prove that (ii)(a)
implies (i) with U = Rn . For any r ∈ N and any i ∈ I, define the set

C r (i) = {x ∈ C ∩ r cl B : g(y, i) < 0, ∀ y ∈ x + (1/r) B}.
Observe that C r (i) ⊂ r cl B and hence is bounded.
We claim that C r (i) is convex. Consider x1 , x2 ∈ C r (i), which implies that
xj ∈ C ∩ r cl B, j = 1, 2. Because C and cl B are convex sets, C ∩ r cl B is
also convex. Thus,

(1 − λ)x1 + λx2 ∈ C ∩ r cl B, ∀ λ ∈ [0, 1].

For any yj ∈ xj + (1/r) B, j = 1, 2,

y = (1 − λ)y1 + λy2 ∈ (1 − λ)(x1 + (1/r) B) + λ(x2 + (1/r) B) ⊂ (1 − λ)x1 + λx2 + (1/r) B.


As x1 , x2 ∈ C r (i), for j = 1, 2,
g(yj , i) < 0, ∀ yj ∈ xj + (1/r) B.
By the convexity of g(., i), for any λ ∈ [0, 1],
g(y, i) ≤ (1 − λ)g(y1 , i) + λg(y2 , i) < 0.
Because the above conditions hold for arbitrary yj ∈ xj + (1/r) B, j = 1, 2,

g(y, i) < 0, ∀ y ∈ (1 − λ)x1 + λx2 + (1/r) B.
Therefore, from the definition of C r (i), it is obvious that
(1 − λ)x1 + λx2 ∈ C r (i), ∀ λ ∈ [0, 1].
Because x1 , x2 ∈ C r (i) are arbitrary, C r (i) is a convex set.
Next we prove that C r (i) is closed. Suppose that x̄ ∈ cl C r (i), which
implies there exists a sequence {xk } ⊂ C r (i) with xk → x̄. Because xk ∈ C r (i),
xk ∈ C ∩ r cl B such that
g(y, i) < 0, ∀ y ∈ xk + (1/r) B. (11.3)

Because C and cl B are closed sets, C ∩ r cl B is also closed and thus x̄ ∈ C ∩ r cl B. Now if x̄ ∉ C r (i), there exists some ȳ ∈ x̄ + (1/r) B such that g(ȳ, i) ≥ 0. As xk → x̄, for sufficiently large k, ȳ ∈ xk + (1/r) B with g(ȳ, i) ≥ 0,
which is a contradiction to condition (11.3). Thus C r (i) is a closed set.
Finally, we claim that for some r̄ ∈ N and every set of n + 1 points
{i0 , i1 , . . . , in } ⊂ I,
∩_{j=0}^n C r̄ (ij ) ≠ ∅.

On the contrary, suppose that for every r ∈ N, there exist n + 1 points


{ir0 , ir1 , . . . , irn } ⊂ I such that
∩_{j=0}^n C r (irj ) = ∅. (11.4)

Define the sequence sr = (ir0 , ir1 , . . . , irn ) ∈ I n+1 . As I is a compact set,


I n+1 is also compact and thus {sr } is a bounded sequence. By the Bolzano–
Weierstrass Theorem, Proposition 1.3, it has a convergent subsequence. With-
out loss of generality, assume that sr → s̄, where s̄ = (ī0 , ī1 , . . . , īn ) ∈ I n+1 .
As (ii)(a) is satisfied, there exists x̄ ∈ C such that
g(x̄, ī0 ) < 0, g(x̄, ī1 ) < 0, . . . , g(x̄, īn ) < 0.


Because g(., .) is jointly continuous on (x, i) ∈ Rn × I, hence jointly usc on


(x, i) ∈ Rn × I. Therefore, by the above condition there exist ε > 0 and a
neighborhood of īj , N (īj ), j = 0, 1, . . . , n, such that

g(y, ij ) < 0, ∀ y ∈ x̄ + εB, ∀ ij ∈ N (īj ), j = 0, 1, . . . , n. (11.5)

As irj → īj , one may choose r̄ ∈ N sufficiently large such that

kx̄k ≤ r̄, ε > 1/r̄ and ir̄j ∈ N (īj ), j = 0, 1, . . . , n. (11.6)

Combining (11.5) and (11.6), x̄ ∈ C ∩ r̄ cl B such that
g(y, ir̄j ) < 0, ∀ y ∈ x̄ + (1/r̄) B.

Therefore, x̄ ∈ C r̄ (ir̄j ) for every j = 0, 1, . . . , n, which contradicts our as-
sumption (11.4). Thus, for some r̄ ∈ N and every set of n + 1 points
{i0 , i1 , . . . , in } ⊂ I,
∩_{j=0}^n C r̄ (ij ) ≠ ∅.

As C r̄ (ij ), j = 0, 1, . . . , n, are nonempty compact convex sets, invoking Helly’s


Theorem, Proposition 2.28,
∩_{i∈I} C r̄ (i) ≠ ∅.

From the above condition, there exists x̃ ∈ C r̄ (i) for every i ∈ I, which implies x̃ ∈ C such that

g(y, i) < 0, ∀ y ∈ x̃ + (1/r̄) B, ∀ i ∈ I.

Taking U = Rn and defining ε = 1/r̄, the above condition yields (i).
To complete the proof, we finally have to show that (ii)(b) also implies (i). This can be done by expressing (ii)(b) in the form of (ii)(a). Consider a point i′ ∉ I and define I ′ = {i′ } ∪ I, which is again a compact set. Also define the function g ′ on Rn × I ′ as

g ′ (x, i′ ) = −δ if x ∈ U, g ′ (x, i′ ) = +∞ if x ∉ U, and g ′ (x, i) = g(x, i), i ∈ I,

where δ > 0. Observe that g ′ (., i), i ∈ I ′ , satisfies the convexity assumption and is jointly usc on Rn × I ′ . Therefore, (ii)(b) is equivalent to the existence, for every n points {i1 , i2 , . . . , in } ⊂ I, of x ∈ C such that

g ′ (x, i′ ) < 0, g ′ (x, i1 ) < 0, . . . , g ′ (x, in ) < 0. (11.7)


As (ii)(a) is also satisfied, for every n+1 points {i0 , i1 , . . . , in } ⊂ I there exists
x ∈ C such that

g ′ (x, i0 ) < 0, g ′ (x, i1 ) < 0, . . . , g ′ (x, in ) < 0. (11.8)

Combining the conditions (11.7) and (11.8), (ii)(b) implies that for every n+1
points {i0 , i1 , . . . , in } ⊂ I ′ there exists x ∈ C such that

g ′ (x, i0 ) < 0, g ′ (x, i1 ) < 0, . . . , g ′ (x, in ) < 0,

which is of the form (ii)(a). As we have already seen that (ii)(a) implies (i) with U = Rn , there exist x ∈ C and ε > 0 such that

g ′ (y, i) < 0, ∀ y ∈ x + εB, ∀ i ∈ I ′ ,

which by the definition of the function g ′ implies that

y ∈ U, g(y, i) < 0, ∀ y ∈ x + εB, ∀ i ∈ I,

that is,

x + εB ⊂ U, g(y, i) < 0, ∀ y ∈ x + εB, ∀ i ∈ I.

Thus, (ii)(b) implies (i) and hence establishes the result. □


Using the above proposition, Borwein [16] obtained the equivalent reduced
form of (SIP ) under the relaxed Slater constraint qualification. The convex
semi-infinite programming (SIP ) is said to satisfy the relaxed Slater constraint
qualification for (SIP ) if given any n+1 points {i0 , i1 , . . . , in } ⊂ I, there exists
x̂ ∈ Rn such that

g(x̂, i0 ) < 0, g(x̂, i1 ) < 0, . . . , g(x̂, in ) < 0.

Observe that the Slater constraint qualification for (SIP ) also implies the relaxed Slater constraint qualification for (SIP ). Now we present the KKT optimality condition for (SIP ) by reducing it to the equivalent (SIP~).

Theorem 11.4 Consider the convex semi-infinite programming problem


(SIP ). Suppose that the relaxed Slater constraint qualification for (SIP ) holds.
Then x̄ is a point of minimizer of (SIP ) if and only if there exist n points
{i1 , i2 , . . . , in } ⊂ I, λij ≥ 0, j = 1, 2, . . . , n, such that
0 ∈ ∂f (x̄) + Σ_{j=1}^n λij ∂g(x̄, ij ).

Proof. Define an open set

U = {x ∈ Rn : f (x) < f (x̄)}.


Consider x1 , x2 ∈ U . By the convexity of f ,

f ((1 − λ)x1 + λx2 ) ≤ (1 − λ)f (x1 ) + λf (x2 ) < f (x̄), ∀ λ ∈ [0, 1],

which implies (1 − λ)x1 + λx2 ∈ U . Because x1 , x2 ∈ U were arbitrary, U is


a convex set. As x̄ is a point of minimizer of (SIP ), there does not exist any
x ∈ Rn and ε > 0 such that

x + εB ⊂ U, g(y, i) < 0, ∀ y ∈ x + εB, ∀ i ∈ I,

which implies that (i) of Proposition 11.3 does not hold. Therefore either
(ii)(a) or (ii)(b) is not satisfied. As the relaxed Slater constraint qualification
for (SIP ), which is the same as (ii)(a), holds, (ii)(b) cannot be satisfied. Thus,
there exist n points {i1 , i2 , . . . , in } ⊂ I such that

f (x) < f (x̄), g(x, ij ) < 0, j = 1, 2, . . . , n, (11.9)

has no solution. We claim that x̄ is a point of minimizer of the reduced problem


inf f (x) subject to g(x, ij ) ≤ 0, j = 1, 2, . . . , n. (SIP~)

Consider a feasible point x̃ of (SIP~), that is,

g(x̃, ij ) ≤ 0, j = 1, 2, . . . , n. (11.10)

Also, by the relaxed Slater constraint qualification for (SIP ), corresponding to the n + 1 points {i0 , i1 , i2 , . . . , in } ⊂ I with {i1 , i2 , . . . , in } as in (SIP~), there exists x̂ such that

g(x̂, ij ) < 0, j = 0, 1, 2, . . . , n. (11.11)

By the convexity of g(., ij ), j = 1, 2, . . . , n, along with the conditions (11.10)


and (11.11),

g((1 − λ)x̃ + λx̂, ij ) ≤ (1 − λ)g(x̃, ij ) + λg(x̂, ij ) < 0, ∀ λ ∈ (0, 1).

Because the system (11.9) has no solution,

f ((1 − λ)x̃ + λx̂) ≥ f (x̄), ∀ λ ∈ (0, 1). (11.12)

As dom f = Rn , by Theorem 2.69, f is continuous on Rn . Thus, taking the


limit as λ → 0, the inequality (11.12) leads to

f (x̃) ≥ f (x̄).

Because x̃ is an arbitrary feasible point of (SIP~), x̄ is a point of minimizer of (SIP~). Observe that by (11.11), the Slater constraint qualification is satisfied


by the reduced problem. Therefore, invoking Theorem 3.7, there exist λij ≥ 0,
j = 1, 2, . . . , n, such that
0 ∈ ∂f (x̄) + Σ_{j=1}^n λij ∂g(x̄, ij ) and λij g(x̄, ij ) = 0, j = 1, 2, . . . , n. (11.13)

We claim that λ ∈ R+[I(x̄)] . From the complementary slackness condition in the optimality condition (11.13), if ij ∉ I(x̄), λij = 0, whereas for ij ∈ I(x̄), λij ≥ 0. For i ∉ {i1 , i2 , . . . , in } but i ∈ I(x̄), define λi = 0. Therefore, λ ∈ R+[I(x̄)] such that

0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i),

thereby yielding the KKT optimality condition for (SIP ). The converse can
be worked out along the lines of Theorem 11.2. □

11.4 Lagrangian Regular Point


In both the preceding sections on reduction techniques to establish the KKT
optimality condition for (SIP ), the index set I was taken to be compact. But
then what about the scenarios where the index set I need not be compact?
To look into such situations, López and Vercher [75] introduced the concept
of Lagrangian regular point, which we present next. Before we define this
concept, we introduce the following notations.
For x̄ ∈ CI having nonempty I(x̄), define
S̄(x̄) = {∂g(x̄, i) ∈ Rn : i ∈ I(x̄)} = ∪_{i∈I(x̄)} ∂g(x̄, i)

and
Ŝ(x̄) = cone co S̄(x̄) = { Σ_{i∈supp λ} λi ∂g(x̄, i) ∈ Rn : λ ∈ R+[I(x̄)] }.

For any i ∈ I(x̄), consider ξi ∈ ∂g(x̄, i) ⊂ S̄(x̄). By Definition 2.77 of the


subdifferential,

g(x, i) − g(x̄, i) ≥ hξi , x − x̄i, ∀ x ∈ Rn .

In particular, for {xk } ⊂ CI , that is, g(xk , i) ≤ 0 along with the fact that
g(x̄, i) = 0, i ∈ I(x̄), the above inequality reduces to

hξi , xk − x̄i ≤ 0, ∀ k ∈ N.


For any {αk } ⊂ R+ ,

hξi , αk (xk − x̄)i ≤ 0, ∀ k ∈ N.

Taking the limit as k → +∞ in the above inequality,

hξi , zi = lim_{k→∞} hξi , αk (xk − x̄)i ≤ 0,

where z ∈ cl cone (CI − x̄). By Theorem 2.35, z ∈ TCI (x̄). Because i ∈ I(x̄)
and ξi ∈ S̄(x̄) were arbitrary,

hξ, zi ≤ 0, ∀ ξ ∈ S̄(x̄),

which implies z ∈ (S̄(x̄))◦ . Because z ∈ TCI (x̄) is arbitrary,

TCI (x̄) ⊂ (S̄(x̄))◦ , (11.14)


which by Propositions 2.31(iii) and 2.37 implies that cl Ŝ(x̄) ⊂ NCI (x̄). As
(SIP ) is equivalent to the unconstrained problem,
inf (f + δCI )(x) subject to x ∈ Rn , (SIPu )
therefore, if x̄ is a point of minimizer of (SIP ), it is also a minimizer of (SIPu ).
By Theorem 3.1, the following optimality condition

0 ∈ ∂f (x̄) + NCI (x̄) (11.15)


holds. So rather than cl Ŝ(x̄) ⊂ NCI (x̄), one would prefer the reverse relation
so that the above condition may be explicitly expressed in terms of the subd-
ifferential of the constraints. Thus, we move on with the notion of Lagrangian
regular point studied in López and Vercher [75].

Definition 11.5 x̄ ∈ CI is said to be a Lagrangian regular point if


(i) I(x̄) is empty: TCI (x̄) = Rn .
(ii) I(x̄) is nonempty: (S̄(x̄))◦ ⊂ TCI (x̄) and Ŝ(x̄) is closed.
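Case (ii) of this definition can be checked by hand on the unit-disk data used earlier (our own illustration, not from the text): for g(x, i) = x1 cos i + x2 sin i − 1, I = [0, 2π] and x̄ = (−1, 0), one has I(x̄) = {π}, S̄(x̄) = {(−1, 0)}, Ŝ(x̄) = cone {(−1, 0)} (closed), and (S̄(x̄))◦ = {d : d1 ≥ 0} = TCI (x̄). The sketch below samples directions and confirms that every sampled d ∈ (S̄(x̄))◦ lies in cl DCI (x̄) after an arbitrarily small tilt into the disk.

```python
import numpy as np

# Unit-disk example: C_I = {x : x1*cos(i) + x2*sin(i) - 1 <= 0, i in [0, 2*pi]},
# xbar = (-1, 0), I(xbar) = {pi}, S_bar(xbar) = {grad g(xbar, pi)} = {(-1, 0)}.
xbar = np.array([-1.0, 0.0])
xi = np.array([-1.0, 0.0])                        # the only subgradient in S_bar(xbar)

def in_polar(d):                                   # d in (S_bar(xbar))^o  iff  <xi, d> <= 0
    return xi @ d <= 1e-12

def feasible_direction(d, t=1e-6):                 # d in D_{C_I}(xbar): xbar + t*d stays in C_I
    return np.linalg.norm(xbar + t * d) <= 1.0 + 1e-12

# check (S_bar(xbar))^o subset of T_{C_I}(xbar) = cl D_{C_I}(xbar) on sampled directions:
# every polar direction becomes a feasible direction after an arbitrarily small tilt toward e1.
ok = True
for theta in np.linspace(0.0, 2.0 * np.pi, 721):
    d = np.array([np.cos(theta), np.sin(theta)])
    if in_polar(d):
        tilted = d + np.array([1e-3, 0.0])         # small perturbation into the interior
        ok = ok and feasible_direction(tilted)
print("sampled polar directions lie in cl D_{C_I}(xbar):", ok)
```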

Recall the equivalent Abadie constraint qualification for (CP ) studied in


Chapter 3, that is, S(x̄) ⊂ TC (x̄), where

S(x̄) = {v ∈ Rn : gi′ (x̄, v) ≤ 0, ∀ i ∈ I(x̄)}.

By Proposition 3.9,

(S(x̄))◦ = cl Ŝ(x̄),

which by Proposition 2.31(ii) and (iii) along with the fact that S(x̄) is a closed convex cone implies that

(Ŝ(x̄))◦ = (S̄(x̄))◦ = S(x̄),


where
S̄(x̄) = {∂gi (x̄) : i = 1, 2, . . . , m} = ∪_{i=1}^m ∂gi (x̄).

Therefore, the Abadie constraint qualification is equivalent to

(S̄(x̄))◦ ⊂ TC (x̄).

Moreover, in the derivation of the standard KKT optimality condition for


(CP ), Theorem 3.10, under the Abadie constraint qualification, we further
assumed that Ŝ(x̄) was closed. A careful look at the Lagrangian regular point
when I(x̄) is nonempty shows that it is an extension of the Abadie constraint
qualification to (SIP ) along with the closedness condition. Next we derive
the KKT optimality condition for (SIP ) under the Lagrangian regularity.
The result is due to López and Vercher [75].

Theorem 11.6 Consider the convex semi-infinite programming problem


(SIP ). Assume that x̄ ∈ CI is a Lagrangian regular point. Then x̄ is a point
of minimizer of (SIP ) if and only if there exists λ ∈ R+[I(x̄)] such that

0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i).

Proof. Suppose that x̄ is a point of minimizer of (SIP ), which by the condi-


tion (11.15) implies

0 ∈ ∂f (x̄) + NCI (x̄).

Therefore, there exists ξ ∈ ∂f (x̄) such that

−ξ ∈ NCI (x̄).

Depending on the emptiness and nonemptiness of I(x̄), we consider the fol-


lowing two cases.
(i) I(x̄) is empty: As x̄ is a Lagrangian regular point, TCI (x̄) = Rn , which by
Proposition 2.37 implies that

NCI (x̄) = (TCI (x̄))◦ = {0}.

Therefore, the optimality condition reduces to

0 ∈ ∂f (x̄).

(ii) I(x̄) is nonempty: As x̄ is a Lagrangian regular point, (S̄(x̄))◦ ⊂ TCI (x̄),


which by Proposition 2.31(i) and (iii) yields that

NCI (x̄) ⊂ (S̄(x̄))◦◦ = cl cone co S̄(x̄) = cl Ŝ(x̄). (11.16)


Also, by relation (11.14), TCI (x̄) ⊂ (S̄(x̄))◦ , which implies that


cl Ŝ(x̄) ⊂ NCI (x̄) (11.17)

is always true. Combining the conditions (11.16) and (11.17),

NCI (x̄) = cl Ŝ(x̄).
Again, by Definition 11.5 of the Lagrangian regular point, Ŝ(x̄) is closed and hence the KKT optimality condition becomes

0 ∈ ∂f (x̄) + Ŝ(x̄),

which implies that there exists λ ∈ R+[I(x̄)] such that

0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i),

as desired. The converse can be worked out as in Theorem 11.2, thereby es-
tablishing the requisite result. □
Goberna and López [50] consider the feasible direction cone to
F ⊂ Rn at x̄ ∈ F , DF (x̄), defined as

DF (x̄) = {d ∈ Rn : there exists λ > 0 such that x̄ + λd ∈ F }.

It is easy to observe that

DF (x̄) ⊂ cone (F − x̄). (11.18)

In case F is a convex set, by Definition 2.46 of convex set, for every x ∈ F ,

x̄ + λ(x − x̄) ∈ F, ∀ λ ∈ (0, 1),

which implies x − x̄ ∈ DF (x̄). Because x ∈ F was arbitrary, F − x̄ ⊂ DF (x̄).


As DF (x̄) is a cone,
cone (F − x̄) ⊂ DF (x̄). (11.19)
Combining the conditions (11.18) and (11.19),

DF (x̄) = cone (F − x̄)

and hence, the tangent cone to F at x̄ is related to the feasible direction set
as

TF (x̄) = cl DF (x̄).

For the convex semi-infinite programming problem (SIP ), the feasible set is
CI . In particular, taking F = CI in the above condition yields

TCI (x̄) = cl DCI (x̄).


From (11.14) we have that TCI (x̄) ⊂ (S̄(x̄))◦ ; thus the above condition yields

DCI (x̄) ⊂ TCI (x̄) ⊂ (S̄(x̄))◦ . (11.20)

In Definition 11.5 of the Lagrangian regular point, for x̄ ∈ CI with nonempty


I(x̄),

(S̄(x̄))◦ ⊂ TCI (x̄) = cl DCI (x̄).

Combining the above condition with (11.20), along with Proposition 2.31 and the fact that Ŝ(x̄) = cone co S̄(x̄), implies that

cl Ŝ(x̄) = (S̄(x̄))◦◦ = (DCI (x̄))◦ .

By the closedness condition of Ŝ(x̄) at the Lagrangian regular point x̄, the preceding condition reduces to

Ŝ(x̄) = (DCI (x̄))◦ .

The above qualification condition is referred to as the convex locally Farkas-


Minkowski problem in Goberna and López [50]. For x̄ ∈ ri CI , TCI (x̄) = Rn ,
which by Proposition 2.37 implies that

NCI (x̄) = (TCI (x̄))◦ = {0}.

As TCI (x̄) ⊂ (S̄(x̄))◦ always holds, by Proposition 2.31,

{0} ⊂ (S̄(x̄))◦◦ ⊂ (TCI (x̄))◦ = {0}

which implies
Ŝ(x̄) = cl cone co S̄(x̄) = (S̄(x̄))◦◦ = {0}.

Thus, Ŝ(x̄) = NCI (x̄) for every x̄ ∈ ri CI . Therefore one needs to impose
the Lagrangian regular point condition to boundary points only. This fact
was mentioned in Goberna and López [50] and was proved in Fajardo and
López [44].
Recall that in Chapter 3, under the Slater constraint qualification, Proposition 3.3 leads to NC (x̄) = Ŝ(x̄), which by Propositions 2.31 and 2.37 is equivalent to

TC (x̄) = (Ŝ(x̄))◦ = (cl Ŝ(x̄))◦ = S(x̄).

Also, under the Slater constraint qualification, by Lemma 3.5, Ŝ(x̄) is closed.
Hence the Slater constraint qualification leads to the Abadie constraint qualifi-
cation along with the closedness criteria. A similar result also holds for (SIP ).
But before that we present Gordan’s Theorem of Alternative which plays an
important role in establishing the result.


Proposition 11.7 (Gordan’s Theorem of Alternative) Consider xi ∈ Rn for


i ∈ I, where I is an arbitrary index set. If co{xi : i ∈ I} is a closed set, then
the equivalence holds between the negation of system (I) and system (II),
where
(I) {x ∈ Rn : hxi , xi < 0, i ∈ I} ≠ ∅,
(II) 0 ∈ co {xi : i ∈ I}.

Proof. If xi = 0 for some i ∈ I, then the result holds trivially as system (I) is not satisfied while system (II) holds. So without loss of generality, assume that xi ≠ 0 for every i ∈ I.
Suppose (I) does not hold. Let 0 ∉ co {xi : i ∈ I}. As by hypothesis co {xi : i ∈ I} is closed, by the Strict Separation Theorem, Theorem 2.26(iii), there exists a ∈ Rn with a ≠ 0 such that

ha, xi < 0, ∀ x ∈ co {xi : i ∈ I}.

In particular, for xi ∈ co {xi : i ∈ I},

ha, xi i < 0, ∀ i ∈ I,

which implies system (I) holds, a contradiction to our supposition. Thus


0 ∈ co {xi : i ∈ I}, that is, system (II) holds.
Suppose that system (II) holds, which implies that there exists λ ∈ R+[I] with Σ_{i∈supp λ} λi = 1 such that

0 = Σ_{i∈supp λ} λi xi .

Let x̄ ∈ {x ∈ Rn : hxi , xi < 0, i ∈ I}. Therefore,


0 = h0, x̄i = Σ_{i∈supp λ} λi hxi , x̄i < 0,

which is a contradiction. Thus, system (I) does not hold, thereby completing the proof. □
The hypothesis that co {xi : i ∈ I} is a closed set is required as shown
in the example from López and Vercher [75]. Consider xi = (cos i, sin i) and I = [−π/2, π/2). Observe that (0, 0) ∉ co {xi : i ∈ I} as
0 ∈ co {cos i : i ∈ I} and 0 ∈ co {sin i : i ∈ I}

cannot hold simultaneously because 0 ∈ co {cos i : i ∈ I} is possible only if


i = −π/2, at which sin i = −1. Thus system (II) is not satisfied. Also, there
does not exist any x = (xc , xs ) such that

xc cos i + xs sin i < 0, ∀ i ∈ I. (11.21)

On the contrary, suppose that such an x exists. In particular, taking i = −π/2 and i = 0 yields that xs > 0 and xc < 0, respectively. But as the limit i → π/2,

xc cos i + xs sin i → xs ,

that is, for some i ∈ I, xc cos i + xs sin i > 0, which is a contradiction to


(11.21). Hence, system (I) is also not satisfied. Note that co {xi : i ∈ I} is not closed because, taking the limit as i → π/2,

xi = (cos i, sin i) → (0, 1)

and (0, 1) ∉ co {xi : i ∈ I}.
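The failure of both systems in this example can be explored numerically. The sketch below (ours; it uses scipy's LP solver and a finite sample of I, so it only illustrates the argument rather than proving it) checks that 0 is not a convex combination of the sampled xi, and that any candidate solution of system (I), which is forced to have xc < 0 and xs > 0, yields positive values as i approaches π/2.

```python
import numpy as np
from scipy.optimize import linprog

# points x_i = (cos i, sin i) for i in I = [-pi/2, pi/2); co{x_i} is not closed.
I_samples = np.linspace(-np.pi / 2, np.pi / 2, 1001)[:-1]    # pi/2 itself is excluded
X = np.column_stack((np.cos(I_samples), np.sin(I_samples)))  # one x_i per row

# system (II): is 0 a convex combination of the (sampled) x_i?
# feasibility LP:  X^T lam = 0,  sum(lam) = 1,  lam >= 0.
A_eq = np.vstack((X.T, np.ones(len(I_samples))))
b_eq = np.array([0.0, 0.0, 1.0])
res = linprog(c=np.zeros(len(I_samples)), A_eq=A_eq, b_eq=b_eq,
              bounds=[(0.0, None)] * len(I_samples))
print("system (II) solvable on samples:", res.success)       # False: 0 not in co{x_i}

# system (I): any x with <x_i, x> < 0 for all i must have x_s > 0 (from i = -pi/2)
# and x_c < 0 (from i = 0); but then <x_i, x> -> x_s > 0 as i -> pi/2.
x = np.array([-1.0, 0.5])                                     # a candidate of that form
for k in [2, 4, 6, 8]:
    i = np.pi / 2 - 10.0 ** (-k)
    print(f"i = pi/2 - 1e-{k}:  <x_i, x> = {np.cos(i) * x[0] + np.sin(i) * x[1]:.6f}")
# the values approach x_s = 0.5 > 0, so system (I) has no solution either
```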
Now we present the result from López and Vercher [75] showing that
the Slater constraint qualification for (SIP ) implies that every feasible point
x ∈ CI is a Lagrangian regular point.

Proposition 11.8 Consider the convex semi-infinite programming problem


(SIP ). If the Slater constraint qualification for (SIP ) holds, then every x̄ ∈ CI
is a Lagrangian regular point.

Proof. Suppose that x̄ ∈ CI , that is,

g(x̄, i) ≤ 0, i ∈ I.

Define g(x) = supi∈I g(x, i).


(i) I(x̄) is empty: By conditions (i) and (ii) of the Slater constraint qualifica-
tion for (SIP ), g(x̄) < 0 which by Proposition 2.67 implies that x̄ ∈ ri CI .
Therefore,

TCI (x̄) = cl cone (CI − x̄) = Rn .

(ii) I(x̄) is nonempty: We claim that I(x̄) is compact. By condition (i) of the
Slater constraint qualification for (SIP ), I is compact. Because I(x̄) ⊂ I, I(x̄)
is bounded. Now consider {ik } ⊂ I(x̄) such that ik → i. By the compactness
of I and the fact that I(x̄) ⊂ I, i ∈ I. As ik ∈ I(x̄),

g(x̄, ik ) = 0, ∀ k ∈ N,

which by condition (ii) of the Slater constraint qualification for (SIP ), that
is, the continuity of g(x, i) with respect to (x, i) in Rn × I, implies that as the
limit k → +∞, g(x̄, i) = 0 and thus i ∈ I(x̄). Therefore, I(x̄) is closed, which
along with the boundedness implies that I(x̄) is compact.
Next we will show that S̄(x̄) = ∪_{i∈I(x̄)} ∂g(x̄, i) is compact. Suppose that
{ξk } ⊂ S̄(x̄) with ξk → ξ. As ξk ∈ S̄(x̄), there exists ik ∈ I(x̄) such that
ξk ∈ ∂g(x̄, ik ), that is, by Definition 2.77 of the subdifferential,

g(x, ik ) − g(x̄, ik ) ≥ hξk , x − x̄i, ∀ x ∈ Rn .


As I(x̄) is compact, without loss of generality, assume that ik → i ∈ I(x̄).


Taking the limit as k → +∞ in the above inequality along with condition (ii)
of the Slater constraint qualification for (SIP ) yields

g(x, i) − g(x̄, i) ≥ hξ, x − x̄i, ∀ x ∈ Rn ,

that is, ξ ∈ ∂g(x̄, i) with i ∈ I(x̄). Therefore ξ ∈ S̄(x̄), thereby implying that
S̄(x̄) is closed. As dom g(., i) = Rn , i ∈ I(x̄), by Proposition 2.83, ∂g(x̄, i) is
compact for i ∈ I(x̄), which implies for every ξi ∈ ∂g(x̄, i) there exists Mi > 0
such that

kξi k ≤ Mi , ∀ i ∈ I(x̄).

Because I(x̄) is compact, the supremum of Mi over I(x̄) is attained, that is,

sup_{i∈I(x̄)} Mi = M < +∞.

Therefore, for every i ∈ I(x̄),

kξk ≤ M, ∀ ξ ∈ ∂g(x̄, i),

which implies that S̄(x̄) is bounded. Thus S̄(x̄) is compact.


As I(x̄) is nonempty, g(x̄) = 0. By condition (iii) of the Slater constraint
qualification for (SIP ), there exists x̂ such that g(x̂, i) < 0, i ∈ I, which along
with condition (i) yields that

g(x̂) < 0 = g(x̄).

Thus, x̄ is not a point of minimizer of g and hence 0 ∉ ∂g(x̄). By the Valadier formula, Theorem 2.97,

0 ∉ co ∪_{i∈I(x̄)} ∂g(x̄, i) = co S̄(x̄).

Because S̄(x̄) is compact, by Theorem 2.9, co S̄(x̄) is also compact. By Propo-


sition 3.4, cone co S̄(x̄) and hence Ŝ(x̄) is closed.
Finally to establish that x̄ is a Lagrangian regular point, we will prove
that (S̄(x̄))◦ ⊂ TCI (x̄). Define the set

S ′ (x̄) = {x ∈ Rn : hξ, xi < 0, ∀ ξ ∈ S̄(x̄)}.

As co S̄(x̄) is closed with 0 ∉ co S̄(x̄), by Gordan's Theorem of Alternative,


Proposition 11.7, S ′ (x̄) is nonempty. Therefore, by Proposition 2.67, ri (S̄(x̄))◦
is nonempty. Consider z ∈ ri (S̄(x̄))◦ , which implies

hξ, zi < 0, ∀ ξ ∈ S̄(x̄),


which leads to

hξ, zi < 0, ∀ ξ ∈ co S̄(x̄).

Again, by the Valadier formula, Theorem 2.97,


∂g(x̄) = co ∪_{i∈I(x̄)} ∂g(x̄, i) = co S̄(x̄).

Thus,

hξ, zi < 0, ∀ ξ ∈ ∂g(x̄).

By the compactness of I, the supremum is attained by g(., i) over I. As


dom g(., i) = Rn , dom g = Rn . Therefore, by Theorem 2.79,

g ′ (x̄, z) = max_{ξ∈∂g(x̄)} hξ, zi.

Also, because dom g = Rn , by Proposition 2.83, ∂g(x̄) is compact and thus


g ′ (x̄, z) < 0, which implies that there exists λ̄ > 0 such that

g(x̄ + λz) < 0, ∀ λ ∈ (0, λ̄),

which implies that for every λ ∈ (0, λ̄),

g(x̄ + λz, i) < 0, ∀ i ∈ I.

Hence, x̄ + λz ∈ CI for every λ ∈ (0, λ̄), which yields


z ∈ (1/λ)(CI − x̄) ⊂ cl cone (CI − x̄) = TCI (x̄).
Because z ∈ ri (S̄(x̄))◦ was arbitrary, ri (S̄(x̄))◦ ⊂ TCI (x̄), which along with
the closedness of the tangent cone, TCI (x̄), leads to

(S̄(x̄))◦ = cl (ri (S̄(x̄))◦ ) ⊂ TCI (x̄).

From both cases, we obtain that x̄ ∈ CI is a Lagrangian regular point. Because


x̄ was arbitrary, every feasible point is a Lagrangian regular point under the
assumption of the Slater constraint qualification for (SIP ), thereby establish-
ing the result. □

11.5 Farkas–Minkowski Linearization


In the previous section on the Lagrangian regular point, observe that it is
defined at a point and hence is a local qualification condition. We observed


that the notion of Lagrangian regular point is also known as the convex locally
Farkas–Minkowski problem. In this section, we will discuss the global
qualification condition, namely the Farkas–Minkowski qualification studied
in Goberna, López, and Pastor [51] and López and Vercher [75]. Before in-
troducing this qualification condition, let us briefly discuss the concept of
Farkas–Minkowski system for a linear semi-infinite system from Goberna and
López [50].
Consider a linear semi-infinite system
Θ = {hxi , xi ≥ ci , i ∈ I} (LSIS)
The relation hx̃, xi ≥ c̃ is a consequence relation of the system Θ if every solu-
tion of Θ satisfies the relation. A consistent (LSIS) Θ is said to be a Farkas–
Minkowski system, in short, an FM system, if every consequence relation is a
consequence of some finite subsystem. Before we state the Farkas–Minkowski
qualification for convex semi-infinite programming problem (SIP ), we present
some results on the consequence relation and the FM system from Goberna
and López [50].

Proposition 11.9 hx̃, xi ≥ c̃ is a consequence relation of the consistent


(LSIS) Θ if and only if

(x̃, c̃) ∈ cl cone co {(xi , ci ), i ∈ I; (0, −1)}.

Proof. Denote by K ⊂ Rn+1 the convex cone

K = cone co {(xi , ci ), i ∈ I; (0, −1)}.

Consider i′ ∉ I and define (xi′ , ci′ ) = (0, −1) and I ′ = {i′ } ∪ I. Thus

K = cone co {(xi , ci ), i ∈ I ′ }.
Suppose that (x̃, c̃) ∈ cl K, which implies there exist {λk } ⊂ R+[I′] , {sk } ⊂ N, {xikj } ⊂ Rn and {cikj } ⊂ R satisfying ikj ∈ I ′ for j = 1, 2, . . . , sk , such that

(x̃, c̃) = lim_{k→∞} Σ_{j=1}^{sk} λkj (xikj , cikj ). (11.22)

As K ⊂ Rn+1 , by the Carathéodory Theorem, Theorem 2.8, 1 ≤ sk ≤ n + 2.


For any k ∈ N with sk < n + 2, define λkj = 0 and any arbitrary (xikj , cikj )
with ikj ∈ I ′ for j = sk + 1, sk + 2, . . . , n + 2. Therefore, condition (11.22)
becomes
(x̃, c̃) = lim_{k→∞} Σ_{j=1}^{n+2} λkj (xikj , cikj ). (11.23)

If x̄ is a solution of (LSIS) Θ,

hxikj , x̄i ≥ cikj , j = 1, 2, . . . , n + 2,


which along with the fact λkj ≥ 0, j = 1, 2, . . . , n + 2, leads to


Σ_{j=1}^{n+2} λkj hxikj , x̄i ≥ Σ_{j=1}^{n+2} λkj cikj .

Taking the limit as k → +∞ in the above condition along with (11.23) yields

hx̃, x̄i ≥ c̃.

Because x̄ was arbitrary, hx̃, xi ≥ c̃ is a consequence relation of (LSIS) Θ.


Conversely, suppose that hx̃, xi ≥ c̃ is a consequence relation of (LSIS) Θ.
We claim that (x̃, c̃) ∈ cl K. On the contrary, suppose that (x̃, c̃) ∉ cl K. By the Strict Separation Theorem, Theorem 2.26 (iii), there exist (γ, γn+1 ) ∈ Rn × R with (γ, γn+1 ) ≠ (0, 0) such that

hγ, xi + γn+1 c > α̃ = hγ, x̃i + γn+1 c̃, ∀ (x, c) ∈ cl K. (11.24)

As 0 ∈ K ⊂ cl K, the above condition implies that α̃ < 0. We claim that

hγ, xi + γn+1 c ≥ 0, ∀ (x, c) ∈ cl K.

On the contrary, suppose that there exists (x̄, c̄) ∈ cl K such that

α̃ < hγ, x̄i + γn+1 c̄ < 0.

Because cl K is a cone, λ(x̄, c̄) ∈ cl K for λ > 0. Therefore, the above inequality
along with the condition (11.24) implies that

α̃ < hγ, λx̄i + γn+1 λc̄ < 0. (11.25)

Taking the limit as λ → +∞,

hγ, λx̄i + γn+1 λc̄ → −∞,

thereby contradicting the relation (11.25). Thus,

hγ, xi + γn+1 c ≥ 0 > α̃, ∀ (x, c) ∈ cl K. (11.26)

As (0, −1) ∈ K ⊂ cl K, for λ > 0, (0, −λ) ∈ cl K. Therefore, (11.26) leads to

−λγn+1 ≥ 0, ∀ λ > 0,

which implies that γn+1 ≤ 0. We now consider the following two cases.
(i) γn+1 = 0: The condition (11.26) reduces to

hγ, xi ≥ 0 > hγ, x̃i, ∀ (x, c) ∈ cl K.

In particular, for (xi , ci ) ∈ cl K, i ∈ I ′ ,

hγ, xi i ≥ 0 > hγ, x̃i, ∀ i ∈ I ′ . (11.27)


As (LSIS) Θ is consistent, there exists x̄ ∈ Rn such that

hx̄, xi i ≥ ci , ∀ i ∈ I. (11.28)

Therefore, from the inequalities (11.27) and (11.28), for any λ > 0,

hx̄ + λγ, xi i ≥ ci , ∀ i ∈ I,

which implies x̄ + λγ is a solution of (LSIS) Θ. By our supposition, hx̃, xi ≥ c̃


is a consequence relation of Θ, which implies that

hx̃, x̄ + λγi ≥ c̃. (11.29)

By the condition (11.27), as the limit λ → +∞,

hx̄ + λγ, x̃i → −∞,

thereby contradicting the inequality (11.29).


(ii) γn+1 < 0: As γn+1 ≠ 0, dividing (11.26) throughout by −γn+1 and setting x̄ = −γ/γn+1 ,

hx̄, xi − c ≥ 0 > hx̄, x̃i − c̃, ∀ (x, c) ∈ cl K.

The above condition holds in particular for (xi , ci ) ∈ K ⊂ cl K, i ∈ I. Thus,

hx̄, xi i − ci ≥ 0 > hx̄, x̃i − c̃, ∀ i ∈ I,

that is,

hx̄, xi i ≥ ci , i ∈ I and hx̄, x̃i < c̃.

Therefore, x̄ is a solution of (LSIS) Θ but does not satisfy the consequence


relation hx̃, xi ≥ c̃, which is again a contradiction.
Hence, our assumption was wrong and (x̃, c̃) ∈ cl K, thereby completing the
proof. □
Next we present the characterization of the FM system Θ from Goberna,
López, and Pastor [51].

Proposition 11.10 (LSIS) Θ is an FM system if and only if every consequence relation hx̃, xi ≥ c̃ of Θ satisfies

(x̃, c̃) ∈ cone co {(xi , ci ), i ∈ I; (0, −1)}.

Proof. Suppose that

(x̃, c̃) ∈ cone co {(xi , ci ), i ∈ I; (0, −1)}.


As in the proof of Proposition 11.9, consider i′ ∉ I. Define (xi′ , ci′ ) = (0, −1)
and I ′ = {i′ } ∪ I. Therefore,

(x̃, c̃) ∈ cone co {(xi , ci ), i ∈ I ′ } ⊂ Rn+1 ,

which by the Carathéodory Theorem, Theorem 2.8, implies that there exist
λj ≥ 0 and ij ∈ I ′ , j = 1, . . . , s, with 1 ≤ s ≤ n + 2 such that
(x̃, c̃) = Σ_{j=1}^s λj (xij , cij ).

Invoking Proposition 11.9 to the finite system, hx̃, xi ≥ c̃ is a consequence


relation of the finite system

hxij , xi ≥ cij , j = 1, 2, . . . , s.

Therefore, (LSIS) Θ is an FM system.


Conversely, suppose that Θ is an FM system, which implies that a conse-
quence relation hx̃, xi ≥ c̃ of the infinite system

hxi , xi ≥ ci , i ∈ I

can be expressed as a consequence of a finite subsystem, that is, hx̃, xi ≥ c̃ is a


consequence relation of a finite subsystem

hxi , xi ≥ ci , i = 1, 2, . . . , s.

Applying Proposition 11.9 to the above finite system,

(x̃, c̃) ∈ cl cone co {(xi , ci ), i = 1, 2, . . . , s; (0, −1)}


= cone co {(xi , ci ), i = 1, 2, . . . , s; (0, −1)}

which implies that

(x̃, c̃) ∈ cone co {(xi , ci ), i ∈ I; (0, −1)},

thereby establishing the desired result. □


Now we introduce the Farkas–Minkowski qualification for (SIP ) from Gob-
erna, López, and Pastor [51].

Definition 11.11 The convex semi-infinite programming problem (SIP ) is


said to satisfy the Farkas–Minkowski (FM) qualification if (LSIS)

Θ = {g(y, i) + hξ, x − yi ≤ 0 : (y, i) ∈ Rn × I, ξ ∈ ∂g(y, i)}

is an FM system.
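To see what this linearization looks like in practice, the following sketch (our own illustration; the constraint data, sampling grids, and tolerances are arbitrary) builds a sampled version of Θ for g(x, i) = (x1 − i)² + x2² − 1 on I = [0, 1] and checks on a few test points that membership in CI agrees with satisfying the sampled subgradient inequalities.

```python
import numpy as np

# (SIP) data with a genuinely nonlinear constraint:
#   g(x, i) = (x1 - i)^2 + x2^2 - 1,  i in I = [0, 1],
# so C_I is the intersection of the unit disks centred at (i, 0), i in [0, 1].
# The linearized system Theta collects all subgradient inequalities
#   g(y, i) + <grad g(y, i), x - y>  <=  0.
def g(x, i):
    return (x[0] - i) ** 2 + x[1] ** 2 - 1.0

def grad_g(x, i):
    return np.array([2.0 * (x[0] - i), 2.0 * x[1]])

I_samples = np.linspace(0.0, 1.0, 51)
Y_samples = [np.array([a, b]) for a in np.linspace(-2, 2, 21)
                               for b in np.linspace(-2, 2, 21)]

def in_C_I(x):
    return all(g(x, i) <= 1e-12 for i in I_samples)

def satisfies_Theta(x):
    # checks the (sampled) linearization; y = x is included among the sample points
    return all(g(y, i) + grad_g(y, i) @ (x - y) <= 1e-9
               for y in Y_samples + [x] for i in I_samples)

for x in [np.array([0.5, 0.0]), np.array([0.5, 0.85]), np.array([1.5, 0.0])]:
    print(x, "in C_I:", in_C_I(x), " satisfies Theta:", satisfies_Theta(x))
# on each test point the two answers coincide, illustrating C_I = solution set of Theta
```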


Using the FM qualification, we present the KKT optimality condition for


(SIP ) from Goberna, López, and Pastor [51].

Theorem 11.12 Consider the convex semi-infinite programming problem


(SIP ). Assume that the FM qualification holds. Then x̄ ∈ Rn is a point of
minimizer of (SIP ) if and only if there exists λ ∈ R+[I(x̄)] such that

0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i).

Proof. Define the solution set of (LSIS) Θ by C̃, that is,

C̃ = {x ∈ Rn : g(y, i) + hξ, x − yi ≤ 0, ∀ (y, i) ∈ Rn × I, ∀ ξ ∈ ∂g(y, i)}.

Note that as dom g(., i) = Rn , i ∈ I, by Proposition 2.83, ∂g(y, i) is nonempty


for every y ∈ Rn and hence C̃ is defined. We claim that CI = C̃.
Suppose that x̃ ∈ CI , which implies that g(x̃, i) ≤ 0, i ∈ I. For any y ∈ Rn
and ξi ∈ ∂g(y, i) for i ∈ I, by Definition 2.77 of the subdifferential,

g(y, i) + hξi , x − yi ≤ g(x, i), ∀ x ∈ Rn .

In particular, for x = x̃, the above inequality becomes

g(y, i) + hξi , x̃ − yi ≤ g(x̃, i) ≤ 0, ∀ i ∈ I,


which leads to x̃ ∈ C̃. Because x̃ ∈ CI was arbitrary, CI ⊂ C̃.
Conversely, suppose that x̃ ∈ C̃, which implies that for every y ∈ Rn and
i ∈ I,

g(y, i) + hξ, x̃ − yi ≤ 0, ∀ ξ ∈ ∂g(y, i).

In particular, taking y = x̃, the above condition reduces to

g(x̃, i) ≤ 0, i ∈ I,

thereby implying that x̃ ∈ CI . Because x̃ ∈ C̃ was arbitrary, C̃ ⊂ CI . Hence CI = C̃ and thus the FM system Θ is a linearization of CI .
As x̄ is a point of minimizer of (SIP ), it is also the point of minimizer of
the equivalent problem

inf f (x) subject to x ∈ CI .

Because dom f = Rn , by Theorem 3.1,

0 ∈ ∂f (x̄) + NCI (x̄),

which implies that there exists ξ ∈ ∂f (x̄) such that −ξ ∈ NCI (x̄). By Defini-
tion 2.36 of the normal cone,

hξ, x − x̄i ≥ 0, ∀ x ∈ CI ,


and thus it is a consequence relation of (LSIS) Θ. As Θ is an FM system, by


Proposition 11.10, there exist λj ≥ 0, yj ∈ Rn , ξj ∈ ∂g(yj , ij ), ij ∈ I, j = 1, 2, . . . , s, and µ ≥ 0 such that

(ξ, hξ, x̄i) = Σ_{j=1}^s λj (−ξj , g(yj , ij ) − hξj , yj i) + (0, −µ).

Without loss of generality, assume that λj > 0, j = 1, 2, . . . , s. Now taking the inner product of the above condition with (−x, 1) leads to

hξ, x̄ − xi = Σ_{j=1}^s λj (g(yj , ij ) + hξj , x − yj i) − µ.

As µ ≥ 0, the above relation leads to


hξ, x̄ − xi ≤ Σ_{j=1}^s λj (g(yj , ij ) + hξj , x − yj i). (11.30)

As ξj ∈ ∂g(yj , ij ), j = 1, 2, . . . , s,

g(yj , ij ) + hξj , x − yj i ≤ g(x, ij ), ∀ x ∈ Rn . (11.31)

Also, because ξ ∈ ∂f (x̄),

f (x̄) − f (x) ≤ hξ, x̄ − xi, ∀ x ∈ Rn . (11.32)

Combining the conditions (11.30), (11.31), and (11.32) yields that

f (x̄) ≤ f (x) + Σ_{j=1}^s λj g(x, ij ), ∀ x ∈ Rn . (11.33)

In particular, taking x = x̄ in the above inequality along with the feasibility of x̄ leads to

0 ≤ Σ_{j=1}^s λj g(x̄, ij ) ≤ 0,

that is, Σ_{j=1}^s λj g(x̄, ij ) = 0. Thus,

λj g(x̄, ij ) = 0, ∀ j = 1, 2, . . . , s.

By our supposition, λj > 0, j = 1, 2, . . . , s which implies that g(x̄, ij ) = 0, that


is, ij ∈ I(x̄), j = 1, 2, . . . , s. Define λi = 0 for i ∈ I(x̄) and i ∉ {i1 , i2 , . . . , is }.


Therefore, from the inequality (11.33), x̄ is a minimizer of the unconstrained problem

inf f (x) + Σ_{i∈supp λ} λi g(x, i) subject to x ∈ Rn ,

where λ ∈ R+[I(x̄)] . By the optimality condition for the unconstrained problem, Theorem 2.89,

0 ∈ ∂(f + Σ_{i∈supp λ} λi g(., i))(x̄).

As dom f = dom g(., i) = Rn for i ∈ I(x̄), by the Sum Rule, Theorem 2.91,
0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i),

thereby establishing the KKT optimality conditions for (SIP ). The converse
can be worked out along the lines of Theorem 11.2. □
Another notion that implies (LSIS) Θ is an FM system is that of the
canonically closed system. Below we define this concept and present a result relating
a canonically closed system and an FM system from Goberna, López, and
Pastor [51].

Definition 11.13 (LSIS) Θ = {hxi , xi ≥ ci , i ∈ I} is canonically closed if


the following conditions hold:

(i) There exists x̂ ∈ Rn such that hxi , x̂i > ci , i ∈ I.

(ii) The set {(xi , ci ), i ∈ I} is compact.
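For instance, if I is a compact metric space, the maps i 7→ xi and i 7→ ci are continuous on I, and some x̂ ∈ Rn satisfies hxi , x̂i > ci for every i ∈ I, then {(xi , ci ), i ∈ I} is the continuous image of a compact set and hence compact, so both requirements above are met and the system is canonically closed. In particular, any finite system hxi , xi ≥ ci , i = 1, . . . , m, admitting such a strictly feasible x̂ is canonically closed.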

The following result provides different conditions under which Θ is an FM


system; part of the proof is due to Hestenes [55].

Proposition 11.14 If the consistent (LSIS) Θ satisfies one of the following


conditions, then it is an FM system:

(i) cone co {(xi , ci ), i ∈ I; (0, −1)} is closed.

(ii) cone co {(xi , ci ), i ∈ I} is closed.

(iii) (LSIS) Θ is canonically closed.

Proof. (i) Suppose that cone co {(xi , ci ), i ∈ I; (0, −1)} is closed. Then by
Proposition 11.9, hx̃, xi ≥ c̃ is the consequence relation of (LSIS) Θ if and
only if

(x̃, c̃) ∈ cone co {(xi , ci ), i ∈ I; (0, −1)},


which by Proposition 11.10 is equivalent to Θ being an FM system.


(ii) Define

K = cone co {(xi , ci ), i ∈ I; (0, −1)} and K̃ = cone co {(xi , ci ), i ∈ I}.

It is easy to observe that


K = K̃ + cone (0, −1). (11.34)

Suppose that K̃ is closed. We claim that K is closed, which by (i) will then
imply that Θ is an FM system. Consider a bounded sequence {(x̃k , c̃k )} ⊂ K
such that (x̃k , c̃k ) → (x̃, c̃). Note that (x̃k , c̃k ) for k ∈ N can be expressed as

(x̃k , c̃k ) = (xk , ck ) + λk (0, −1), ∀ k ∈ N, (11.35)


where {(xk , ck )} ⊂ K̃ and {λk } ⊂ R+ . Assume that {λk } is an unbounded
sequence, which implies λk → +∞. From the condition (11.35)
(1/λk ) (x̃k , c̃k ) = (1/λk ) (xk , ck ) + (0, −1), ∀ k ∈ N.
As (x̃k , c̃k ) → (x̃, c̃), taking the limit as k → +∞ in the above condition
implies that
(1/λk ) (xk , ck ) → (0, 1),
that is, (0, 1) ∈ cl K̃ ⊂ cl K. By Proposition 11.9,

0 = h0, xi ≥ 1

is a consequence relation of (LSIS) Θ, which is impossible. Thus, {λk } is a


bounded sequence. By the Bolzano–Weierstrass Theorem, Proposition 1.3, it
has a convergent subsequence. Without loss of generality, assume that λk → λ.
By the condition (11.35) and boundedness of the sequences {(x̃k , c̃k )} and
{λk }, {(xk , ck )} is also a bounded sequence. Without loss of generality, by the
Bolzano–Weierstrass Theorem, let (xk , ck ) → (x, c). As K̃ is closed, (x, c) ∈ K̃.
Therefore, taking the limit as k → +∞, (11.35) along with (11.34) yields that

(x̃, c̃) = (x, c) + λ(0, −1) ∈ K,

and hence K is closed, which by (i) implies that (LSIS) Θ is an FM system.


(iii) Suppose that Θ is a canonically closed system. Therefore, the set
{(xi , ci ), i ∈ I} is compact. We claim that
K̃ = cone co {(xi , ci ), i ∈ I} ⊂ Rn+1

is closed. On the contrary, assume that it is not closed, which implies there


exists a convergent sequence {(x̃k , c̃k )} ⊂ K̃ such that (x̃k , c̃k ) → (x̃, c̃) with
(x̃, c̃) ∈ cl K̃ but (x̃, c̃) 6∈ K̃. Because (x̃k , c̃k ) ∈ K̃, there exist λikj ≥ 0, ikj ∈ I
for j = 1, 2, . . . , sk , {sk } ⊂ N such that


(x̃k , c̃k ) = Σ_{j=1}^{sk} λikj (xikj , cikj ).

By the Carathéodory Theorem, Theorem 2.8, 1 ≤ sk ≤ n + 2. For sk < n + 2,


choose any ikj ∈ I with λikj = 0, j = sk + 1, sk + 2, . . . , n + 2. Therefore the
above condition can be rewritten as
(x̃k , c̃k ) = Σ_{j=1}^{n+2} λikj (xikj , cikj ). (11.36)

As {(xi , ci ), i ∈ I} is compact, by the Bolzano–Weierstrass Theorem,


{(xikj , cikj )} has a convergent subsequence. Without loss of generality, assume
that (xikj , cikj ) → (xij , cij ) ∈ {(xi , ci ), i ∈ I}. By assumption, (x̃, c̃) 6∈ K̃,

which implies {λikj } is unbounded. Otherwise, there exists some convergent


subsequence of {λikj }. Without relabeling, assume that λikj → λij . Therefore,
taking the limit as k → +∞ in (11.36) leads to
(x̃, c̃) = Σ_{j=1}^{n+2} λij (xij , cij ) ∈ K̃,

which is a contradiction of our assumption.


Denote λk = Σ_{j=1}^{n+2} λikj . Observe that the sequence {λikj /λk } ⊂ R+ is a
bounded sequence and hence by the Bolzano–Weierstrass Theorem has a convergent
subsequence. Without loss of generality, assume that λikj /λk → λij ≥ 0,
j = 1, 2, . . . , n + 2, with Σ_{j=1}^{n+2} λij = 1. Dividing the condition (11.36) throughout
by λk and taking the limit as k → +∞, which along with the fact that
λk → +∞ yields

Σ_{j=1}^{n+2} λij (xij , cij ) = 0 with Σ_{j=1}^{n+2} λij = 1. (11.37)

As (LSIS) Θ is canonically closed, there exists x̂ ∈ Rn such that hxi , x̂i > ci ,
i ∈ I, that is,
h(xi , ci ), (x̂, −1)i = hxi , x̂i − ci > 0, ∀ i ∈ I. (11.38)
Combining the relations (11.37) and (11.38) along with the fact that λij ≥ 0,
j = 1, 2, . . . , n + 2, not all simultaneously zero, implies that
0 = Σ_{j=1}^{n+2} λij h(xij , cij ), (x̂, −1)i = Σ_{j=1}^{n+2} λij (hxij , x̂i − cij ) > 0,

which is impossible. Thus our assumption was wrong and hence K̃ is closed,
which by (ii) yields that Θ is an FM system. 
As seen in Section 11.4, the Slater constraint qualification for (SIP ) im-
plies that every feasible point is a Lagrangian regular point; we will now
present the relation between the Slater constraint qualification for (SIP ) and
the FM qualification. For this we will need the following result from Goberna,
López, and Pastor [51].

Proposition 11.15 Consider a closed convex set F ⊂ Rn and let Fb denote


the boundary points of F . Also consider (LSIS)

Θ = {hxi , xi ≤ ci , i ∈ I}

such that
(i) every point of F is a solution of the system Θ,
(ii) there exists x̂ ∈ F such that hxi , x̂i < ci , i ∈ I,
(iii) given any y ∈ Fb , there exists some i ∈ I such that hxi , yi = ci .
Then F is the solution set of Θ, that is,

F = {x ∈ Rn : hxi , xi ≤ ci , i ∈ I}.

Proof. Observe that by condition (i), F ⊂ {x ∈ Rn : hxi , xi ≤ ci , i ∈ I}.


Conversely, suppose that there exists

z ∈ {x ∈ Rn : hxi , xi ≤ ci , i ∈ I}

and z 6∈ F . By (ii), there exists x̂ ∈ F such that hxi , x̂i < ci for every i ∈ I. As
F is a closed convex set, the line segment joining x̂ and z meets the boundary
Fb at only one point, say y ∈ (x̂, z). Therefore, there exists λ ∈ (0, 1) such
that y = (1 − λ)x̂ + λz ∈ Fb . By condition (iii), there exists ī ∈ I such that

hxī , yi = cī . (11.39)

By the conditions on x̂ and z,

hxī , x̂i < cī and hxī , zi ≤ cī ,

respectively. Thus

hxī , yi = (1 − λ)hxī , x̂i + λhxī , zi < cī ,

which is a contradiction to (11.39). Hence, F ⊃ {x ∈ Rn : hxi , xi ≤ ci , i ∈ I},


thereby establishing the result. 
Now we are in a position to present the implication that the Slater con-
straint qualification for (SIP ) leads to the FM qualification from López and
Vercher [75].


Proposition 11.16 Consider the convex semi-infinite programming problem


(SIP ) with bounded feasible set CI . If the Slater constraint qualification for
(SIP ) holds, then the FM qualification condition is also satisfied.
Proof. Define g(x) = sup_{i∈I} g(x, i) and CIb = {x ∈ CI : g(x) = 0}. Consider the
(LSIS)

Θ̃ = {hξ, xi ≤ hξ, yi, y ∈ CIb , ξ ∈ ∂g(y)}.

We claim that Θ̃ is a linear representation of CI . For any ξ ∈ ∂g(y), by


Definition 2.77 of the subdifferential,
hξ, x − yi ≤ g(x) − g(y), ∀ x ∈ Rn . (11.40)
As the Slater constraint qualification for (SIP ) holds, by condition (i) and
(ii), the supremum g(x) is attained. Therefore, in particular, for y ∈ CIb and
x ∈ CI , that is, g(y) = 0 and g(x) = supi∈I g(x, i) ≤ 0, respectively, the above
inequality reduces to
hξ, xi ≤ hξ, yi, ∀ ξ ∈ ∂g(y).

Because x ∈ CI was arbitrary, every point of CI is a solution of Θ̃.


By condition (iii) of the Slater constraint qualification for (SIP ), there
exists x̂ ∈ Rn such that
g(x̂, i) < 0, ∀ i ∈ I,
that is, x̂ ∈ CI . By the conditions (i) and (ii) of the Slater constraint qualifi-
cation for (SIP ), g(x̂) < 0. Therefore, in particular, taking y ∈ CIb and x = x̂,
the condition (11.40) becomes
hξ, x̂ − yi ≤ g(x̂) < 0, ∀ ξ ∈ ∂g(y) (11.41)
for every y ∈ CIb . Also, in particular, taking y = ȳ ∈ CIb and x = ȳ in the
inequality (11.40), the relation holds with equality.
From the above discussion, it is obvious that the conditions of Proposi-
tion 11.15 are satisfied and thus, CI coincides with the solution set of (LSIS)
e that is,
Θ,
CI = {x ∈ Rn : hξ, xi ≤ hξ, yi, ∀ y ∈ CIb , ∀ ξ ∈ ∂g(y)}. (11.42)
We now show that Θ̃ is canonically closed and hence is an FM system.
By the condition (11.41),
hξ, x̂i < hξ, yi, ∀ y ∈ CIb , ∀ ξ ∈ ∂g(y),

and thus condition (i) of Definition 11.13 is satisfied. Therefore, for Θ̃
to be a canonically closed system, we need to show that the set

K̃ = {(ξ, hξ, yi), y ∈ CIb , ξ ∈ ∂g(y)}


is compact.
As CI is bounded and CIb ⊂ CI , therefore CIb is bounded. Also by condition
(i) and (ii) of the Slater constraint qualification for (SIP ), the supremum is
attained over I. Therefore, as dom g(., i) = Rn , i ∈ I, dom g = Rn , and hence,
by Theorem 2.69, g is continuous over Rn . Thus, for a sequence {yk } ⊂ CIb with
yk → ȳ, g(yk ) → g(ȳ). Also, as g(yk ) = 0 for every k ∈ N, g(ȳ) = 0, which
implies ȳ ∈ CIb . Hence, CIb is closed, thereby yielding the compactness of CIb .
By Proposition 2.85,
∂g(CIb ) = {ξ ∈ Rn : ξ ∈ ∂g(y), y ∈ CIb } = ∪_{y∈CIb} ∂g(y)

is compact.
Now consider a convergent sequence {(ξk , hξk , yk i)} ⊂ K̃ where {yk } ⊂ CIb
and ξk ∈ ∂g(yk ) ⊂ ∂g(CIb ). Suppose that (ξk , hξk , yk i) → (ξ̃, γ̃), that is, ξk → ξ̃
and hξk , yk i → γ̃. As ξk → ξ̃, which by the compactness of ∂g(CIb ) implies that
ξ̃ ∈ ∂g(CIb ). Because {yk } ⊂ CIb , {yk } is a bounded sequence. By the Bolzano–

Weierstrass Theorem, Proposition 1.3, it has a convergent subsequence. With-


out loss of generality, assume that yk → ỹ, which by compactness of CIb leads
to ỹ ∈ CIb . As hξk , yk i → γ, which by the convergence of {ξk } and {yk } im-
plies that γ̃ = hξ, ˜ ỹi. Because ξk ∈ ∂g(yk ) with ξk → ξ˜ and yk → ỹ, by
the Closed Graph Theorem, Theorem 2.84, ξ˜ ∈ ∂g(ỹ). Thus (ξ, ˜ hξ, e
˜ ỹi) ∈ K,
thereby yielding the closedness of K. e
By the compactness of CIb and ∂g(CIb ), kyk ≤ M1 for every y ∈ CIb and
kξk ≤ M2 for every ξ ∈ ∂g(CIb ), respectively. Therefore, for any (ξ, hξ, yi) ∈ K̃
along with the Cauchy–Schwarz inequality, Proposition 1.1,

kξk2 + |hξ, yi| ≤ kξk2 + kξk kyk ≤ M2 (M1 + M2 )

and hence K̃ is bounded. Therefore, K̃ is compact, thus implying that the
system Θ̃ is canonically closed, which by Proposition 11.14 yields that Θ̃ is
an FM system.
Next we claim that (LSIS)

Θ = {hξ, x − yi ≤ −g(y, i), (y, i) ∈ Rn × I, ξ ∈ ∂g(y, i)}

is equivalent to Θ̃, that is, both Θ and Θ̃ have the same solution set. To
establish this claim, we will prove that CI is the solution set of Θ.
For any (y, i) ∈ Rn × I and ξi ∈ ∂g(y, i), by Definition 2.77 of the subdif-
ferential,
hξi , x − yi ≤ g(x, i) − g(y, i), ∀ x ∈ Rn . (11.43)
In particular, taking x ∈ CI , that is,

g(x, i) ≤ 0, ∀ i ∈ I,


the inequality (11.43) reduces to

hξi , x − yi ≤ −g(y, i), ∀ (y, i) ∈ Rn × I, ∀ ξi ∈ ∂g(y, i),

which implies x is a solution of (LSIS) Θ. Because x ∈ CI was arbitrary,


every point of CI is a solution of Θ.
By condition (iii) of the Slater constraint qualification, there exists x̂ ∈ Rn
such that

g(x̂, i) < 0, ∀ i ∈ I.

In particular, taking x = x̂ in (11.43) yields that for every (y, i) ∈ Rn × I,

hξi , x̂ − yi ≤ g(x̂, i) − g(y, i) < −g(y, i), ∀ ξi ∈ ∂g(y, i).

Also, taking y = ỹ ∈ CIb , where

CIb = {y ∈ Rn : there exists some i ∈ I such that g(y, i) = 0},

along with x = ỹ and ĩ ∈ I(ỹ) in the condition (11.43) leads to

hξĩ , ỹ − ỹi = 0 = −g(ỹ, ĩ), ∀ ξĩ ∈ ∂g(ỹ, ĩ).

As the conditions of Proposition 11.15 are satisfied,

CI = {x ∈ Rn : hξ, x − yi ≤ −g(y, i), ∀ (y, i) ∈ Rn × I, ∀ ξ ∈ ∂g(y, i)}, (11.44)
that is, CI is a solution set of (LSIS) Θ.
From the conditions (11.42) and (11.44), both Θ̃ and Θ are equivalent
(LSIS). Because Θ̃ is an FM system, Θ is also an FM system, which along with
Definition 11.11 yields that (SIP ) satisfies the FM qualification condition,
thereby establishing the requisite result. 

11.6 Noncompact Scenario: An Alternate Approach


In this section we discuss the recent epigraphical approach, or more precisely
the sequential approach studied in Chapter 7 as a tool to establish the KKT
optimality conditions for (SIP ). This approach has been studied for convex
programming problems with infinite constraints by Jeyakumar [66, 67] and
Goberna, Jeyakumar, and López [49]. Here we will present the KKT optimality
conditions for (SIP ) from the work of Dinh, Mordukhovich, and Nghia [33]
under the following relaxed closed cone constraint qualification for (SIP ),
that is,
cone co ∪_{i∈I} epi g ∗ (., i) is closed.
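For instance, if every constraint happens to be affine, say g(x, i) = hai , xi − bi with ai ∈ Rn and bi ∈ R, then a direct computation gives g ∗ (ξ, i) = bi when ξ = ai and +∞ otherwise, so that epi g ∗ (., i) = {ai } × [bi , +∞); the qualification above then concerns the closedness of the convex cone generated by the points (ai , bi ) together with the vertical rays above them.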


But before establishing the optimality conditions for (SIP ) as a consequence


of the optimality conditions expressed in terms of the epigraph of the conjugate
functions, we present a result from Jeyakumar [67].

Proposition 11.17 Consider an lsc proper convex function φ : Rn → R̄ and


define

F = {x ∈ Rn : φ(x) ≤ 0}.

If F is nonempty, then epi δF∗ = cl cone co epi φ∗ .

Proof. Suppose that F is nonempty. From the definition of the indicator


function to the set F , δF ,

φ(x) ≤ 0 = δF (x), for x ∈ F,


φ(x) ≤ +∞ = δF (x), for x 6∈ F.

Therefore, φ(x) ≤ δF (x) for every x ∈ Rn . By Proposition 2.103,

δF∗ (ξ) ≤ φ∗ (ξ), ∀ ξ ∈ Rn . (11.45)

We claim that cl cone co epi φ∗ ⊂ epi δF∗ . By Definition 2.101 of the conjugate
function, δF∗ is the same as the support function to the set F , that is, δF∗ = σF .
By Proposition 2.102, δF∗ is lsc, hence by Theorem 1.9, epi δF∗ is closed. Also,
as σF is a sublinear function, by Theorem 2.59 epi σF is a convex cone. So it
is sufficient to establish that epi φ∗ ⊂ epi δF∗ . Consider any (ξ, α) ∈ epi φ∗ ,
which by condition (11.45) implies that

δF∗ (ξ) ≤ φ∗ (ξ) ≤ α.

Therefore, (ξ, α) ∈ epi δF∗ . As (ξ, α) ∈ epi φ∗ was arbitrary, epi φ∗ ⊂ epi δF∗ .
Because epi δF∗ is a closed convex cone,

cl cone co epi φ∗ ⊂ epi δF∗ . (11.46)

To complete the proof, we will prove the converse inclusion, that is,
epi δF∗ ⊂ cl cone co epi φ∗ . Suppose that (ξ, α) 6∈ cl cone co epi φ∗ . As
δF∗ = σF is a sublinear function with δF∗ (0) = 0, we have (0, −1) 6∈ epi δF∗ ,

which by the relation (11.46) implies that (0, −1) 6∈ cl cone co epi φ∗ . Define
the convex set

F̃ = {(1 − λ)(ξ, α) + λ(0, −1) ∈ Rn × R : λ ∈ [0, 1]}.

We claim that

F̃ ∩ cl cone co epi φ∗ = ∅.

On the contrary, suppose that there exists λ̃ ∈ (0, 1) such that

(1 − λ̃)(ξ, α) + λ̃(0, −1) ∈ cl cone co epi φ∗ . (11.47)


We claim that {0} × R+ ⊂ cl cone co epi φ∗ . To establish this fact, it is


sufficient to show that (0, 1) ∈ cl cone co epi φ∗ . On the contrary, suppose
that

(0, 1) 6∈ cl cone co epi φ∗ .

Then by the Strict Separation Theorem, Theorem 2.26 (iii), there exists
(a, γ) ∈ Rn × R with (a, γ) 6= (0, 0) such that

ha, ξi + γα > γ, ∀ (ξ, α) ∈ cl cone co epi φ∗ . (11.48)

As (0, 0) ∈ cl cone co epi φ∗ , γ < 0. We will show that

ha, ξi + γα ≥ 0 > γ, ∀ (ξ, α) ∈ cl cone co epi φ∗ .

On the contrary, suppose that (ξ, α) ∈ cl cone co epi φ∗ such that

0 > ha, ξi + γα > γ. (11.49)

For any λ > 0, λ(ξ, α) ∈ cl cone co epi φ∗ , which by the conditions (11.48)
and (11.49) implies that

0 > λ(ha, ξi + γα) > γ.

Taking the limit as λ → +∞ in the above inequality,

λ(ha, ξi + γα) → −∞,

which is a contradiction. Therefore,

ha, ξi + γα ≥ 0 > γ, ∀ (ξ, α) ∈ cl cone co epi φ∗ . (11.50)

Consider any ξ ∈ dom φ∗ and ε > 0. Thus, (ξ, φ∗ (ξ) + ε) ∈ cl cone co epi φ∗ .
Therefore, from the condition (11.50),

ha, ξi + γ(φ∗ (ξ) + ε) ≥ 0,

which implies
(1/ε) (ha, ξi + γφ∗ (ξ)) + γ ≥ 0.
Taking the limit as ε → +∞ in the above inequality, which along with (11.50)
yields that 0 > γ ≥ 0, which is a contradiction. Thus, (0, 1) ∈ cl cone co epi φ∗
and hence
{0} × R+ = cone (0, 1) ⊂ cl cone co epi φ∗ . (11.51)
From the relations (11.47) and (11.51),

(1 − λ̃)(ξ, α) = (1 − λ̃)(ξ, α) + λ̃(0, −1) + (0, λ̃) ∈ cl cone co epi φ∗ ,


which implies
(ξ, α) = (1/(1 − λ̃)) {(1 − λ̃)(ξ, α)} ∈ cl cone co epi φ∗ ,

thereby contradicting our assumption. Thus

F̃ ∩ cl cone co epi φ∗ = ∅.

As F̃ is a compact convex set and cl cone co epi φ∗ is a closed convex cone, by


the Strict Separation Theorem, Theorem 2.26 (iii), there exists (a, γ) ∈ Rn ×R
with (a, γ) 6= (0, 0) such that

ha, zi + γβ > ha, z̃i + γ β̃ (11.52)

for every (z, β) ∈ cl cone co epi φ∗ and (z̃, β̃) ∈ F̃. As (0, 0) ∈ cl cone co epi φ∗ ,

0 > ha, z̃i + γ β̃, ∀ (z̃, β̃) ∈ F̃.

Also, as (0, −1), (ξ, α) ∈ F̃, from condition (11.52),

γ>0 and ha, ξi + γα < 0. (11.53)

Repeating the discussion as before, we can show that

ha, zi + γβ ≥ 0 > ha, z̃i + γ β̃

for every (z, β) ∈ cl cone co epi φ∗ and (z̃, β̃) ∈ F̃. For any ξ ∈ dom φ∗ ,
(ξ, φ∗ (ξ)) ∈ cl cone co epi φ∗ , which by the above inequality implies that

ha, ξi + γφ∗ (ξ) ≥ 0, ∀ ξ ∈ dom φ∗ . (11.54)

Because φ is lsc, by Theorem 2.105, φ = φ∗∗ . Therefore, by the conditions


(11.53) and (11.54),
φ(−a/γ) = φ∗∗ (−a/γ) = sup_{ξ∈Rn} {hξ, −a/γi − φ∗ (ξ)} ≤ 0,

which implies that −a/γ ∈ F . Again using the condition (11.53),

δF∗ (ξ) = σF (ξ) ≥ hξ, −a/γi > α,

which implies (ξ, α) 6∈ epi δF∗ , thereby establishing the desired result. 
Now we move on to derive the optimality conditions in epigraphical form.
Similar results have been studied in the form of generalized Farkas’ Lemma
in Dinh, Goberna, and López [31] and Dinh, Goberna, López, and Son [32].


Theorem 11.18 Consider the convex semi-infinite programming problem


(SIP ). Then x̄ is a point of minimizer of (SIP ) if and only if
(0, −f (x̄)) ∈ epi f ∗ + cl cone co ∪_{i∈I} epi g ∗ (., i). (11.55)

Proof. Suppose that x̄ is a point of minimizer of (SIP ) and hence of the


following unconstrained problem

inf f (x) + δCI (x) subject to x ∈ Rn .

Therefore, by Theorem 2.89,

0 ∈ ∂(f + δCI )(x̄),

which by Theorem 2.108 and the fact that x̄ ∈ CI implies that

f (x̄) + (f + δCI )∗ (0) = h0, x̄i = 0.

Therefore, the above condition leads to

(0, −f (x̄)) ∈ epi (f + δCI )∗ .

As dom f = Rn , by Theorem 2.69, f is continuous over Rn . Thus, by Propo-


sition 2.124,
(0, −f (x̄)) ∈ epi f ∗ + epi (δCI )∗ . (11.56)
Define the supremum function g(x) = supi∈I g(x, i), which implies that

CI = {x ∈ Rn : g(x) ≤ 0}.

Because x̄ ∈ CI , CI is nonempty. Invoking Proposition 11.17, the condition


(11.56) yields

(0, −f (x̄)) ∈ epi f ∗ + cl cone co epi g ∗ .

Applying Theorem 2.123 to the above relation leads to


(0, −f (x̄)) ∈ epi f ∗ + cl cone co ∪_{i∈I} epi g ∗ (., i),

thereby leading to the desired condition.


Conversely, suppose that the epigraphical condition (11.55) is satisfied,
which implies that there exist ξ ∈ dom f ∗ , α ≥ 0, λki ≥ 0, ξik ∈ dom g ∗ (., i)
and αik ≥ 0 for i ∈ I such that

(0, −f (x̄)) = (ξ, f ∗ (ξ) + α) + lim_{k→∞} Σ_{i∈I} λki (ξik , g ∗ (ξik , i) + αik ).

As cone co ∪_{i∈I} epi g ∗ (., i) ⊂ Rn+1 , by the Carathéodory Theorem, Theorem 2.8,
any element in the convex cone can be expressed as a sum of n + 2
elements from ∪_{i∈I} epi g ∗ (., i). Therefore the above condition becomes

(0, −f (x̄)) = (ξ, f ∗ (ξ) + α) + lim_{k→∞} Σ_{j=1}^{n+2} λkij (ξikj , g ∗ (ξikj , ij ) + αikj ),

where ij ∈ I, j = 1, 2, . . . , n + 2. Componentwise comparison leads to


0 = ξ + lim_{k→∞} Σ_{j=1}^{n+2} λkij ξikj , (11.57)

−f (x̄) = f ∗ (ξ) + α + lim_{k→∞} Σ_{j=1}^{n+2} λkij (g ∗ (ξikj , ij ) + αikj ). (11.58)

By Definition 2.101 of the conjugate function, condition (11.58) yields

f (x̄) − f (x) ≤ −hξ, xi − α − lim_{k→∞} Σ_{j=1}^{n+2} λkij (g ∗ (ξikj , ij ) + αikj )
≤ −hξ, xi − α − lim_{k→∞} Σ_{j=1}^{n+2} λkij (hξikj , xi − g(x, ij ) + αikj ), ∀ x ∈ Rn .

Using condition (11.57), for every x ∈ CI , the above inequality leads to


f (x̄) − f (x) ≤ −α − lim_{k→∞} Σ_{j=1}^{n+2} λkij αikj ,

which by the nonnegativity of α and αikj , j = 1, 2, . . . , n + 2, yields

f (x̄) ≤ f (x), ∀ x ∈ CI .

Thus, x̄ is a point of minimizer of (SIP ), thereby establishing the result. 


We end this chapter by presenting the KKT optimality condition for (SIP )
from Dinh, Mordukhovich, and Nghia [33]. But before that we define the set
of active constraint multipliers as
A(x̄) = {λ ∈ R+^{[I]} : λi g(x̄, i) = 0, ∀ i ∈ supp λ}.

Theorem 11.19 Consider the convex semi-infinite programming problem


(SIP ). Assume that the closed cone constraint qualification holds. Then x̄
is a point of minimizer of (SIP ) if and only if there exists λ ∈ A(x̄) such that
0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i).


Proof. By Theorem 11.18, x̄ is a point of minimizer of (SIP ) if and only


if condition (11.55) is satisfied. As the closed cone constraint qualification is
satisfied, (11.55) reduces to
(0, −f (x̄)) ∈ epi f ∗ + cone co ∪_{i∈I} epi g ∗ (., i).

By Theorem 2.122, there exist ξ ∈ ∂ε f (x̄), ε ≥ 0, λi ≥ 0, ξi ∈ ∂εi g(x̄, i) and


εi ≥ 0, i ∈ I such that
(0, −f (x̄)) = (ξ, hξ, x̄i − f (x̄) + ε) + Σ_{i∈I} λi (ξi , hξi , x̄i − g(x̄, i) + εi ).

Componentwise comparison leads to


0 = ξ + Σ_{i∈I} λi ξi , (11.59)

−f (x̄) = (hξ, x̄i − f (x̄) + ε) + Σ_{i∈I} λi (hξi , x̄i − g(x̄, i) + εi ). (11.60)

Using the condition (11.59), (11.60) reduces to


0 = ε + Σ_{i∈I} λi (−g(x̄, i) + εi ).

The above condition along with the fact that x̄ ∈ CI , that is, g(x̄, i) ≤ 0, i ∈ I
and the nonnegativity of ε, εi and λi , i ∈ I, implies that

ε = 0, λi εi = 0 and λi g(x̄, i) = 0, i ∈ I.

Thus, for i ∈ supp λ, εi = 0 and λ ∈ A(x̄). Therefore, ξ ∈ ∂f (x̄) and


ξi ∈ ∂g(x̄, i), i ∈ supp λ satisfying
0 = ξ + Σ_{i∈supp λ} λi ξi .

Therefore, for λ ∈ A(x̄),


0 ∈ ∂f (x̄) + Σ_{i∈supp λ} λi ∂g(x̄, i), (11.61)

thereby yielding the KKT optimality condition for (SIP ).


Conversely, suppose that (11.61) holds, which implies that there exist
ξ ∈ ∂f (x̄) and ξi ∈ ∂g(x̄, i), i ∈ supp λ such that
0 = ξ + Σ_{i∈supp λ} λi ξi . (11.62)


By Definition 2.77 of the subdifferential, for every x ∈ Rn ,

f (x) ≥ f (x̄) + hξ, x − x̄i,


g(x, i) ≥ g(x̄, i) + hξi , x − x̄i, i ∈ supp λ,

which along with the condition (11.62) implies that


f (x) + Σ_{i∈supp λ} λi g(x, i) ≥ f (x̄) + Σ_{i∈supp λ} λi g(x̄, i), ∀ x ∈ Rn .

As λ ∈ A(x̄), λi g(x̄, i) = 0 for i ∈ supp λ, which for every x ∈ CI reduces the


above inequality to
f (x) ≥ f (x) + Σ_{i∈supp λ} λi g(x, i) ≥ f (x̄), ∀ x ∈ CI .

Therefore, x̄ is a point of minimizer of (SIP ), hence completing the proof. 



Chapter 12
Convexity in Nonconvex Optimization

12.1 Introduction
This is the final chapter of the book. What we want to discuss here is essen-
tially outside the purview of convex optimization. Yet as we will see, convexity
will play a fundamental role in the issues discussed. We will discuss here two
major areas in nonconvex optimization, namely maximization of a convex
function and minimization of a d.c. function. The acronym d.c. stands for
difference convex function, that is, functions expressed as the difference of
two convex functions. Thus, more precisely, we would look into the following
problems:
max f (x) subject to x∈C (P 1)
and min f (x) − g(x) subject to x ∈ C, (P 2)
where f, g : Rn → R are convex functions and C ⊂ Rn is a convex set. A large
class of nonconvex optimization problems actually falls into this setting. Note
that (P 1) can be posed as

min − f (x) subject to x∈C

and thus as

min φ(x) − f (x) subject to x ∈ C,

where φ is the zero function. Thus the problem (P 1) can also be viewed as
a special case of (P 2), though we will consider them separately for a better
understanding.
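To get a feel for how broad the class of d.c. functions is, note for instance that any twice continuously differentiable function h : Rn → R whose Hessian is bounded below, say ∇2 h(x) + ρI is positive semidefinite for every x ∈ Rn for some ρ > 0, is a d.c. function, since writing h(x) = (h(x) + (ρ/2)kxk2 ) − (ρ/2)kxk2 expresses it as a difference of two convex functions. In particular, every quadratic function, indefinite ones included, is d.c.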

12.2 Maximization of a Convex Function


The problem of maximizing a convex function over a convex set is a complete
paradigm shift from that of minimization of a convex function over a convex


set. The problem of maximization of a convex function is, in fact, a hard


nonconvex minimization problem. Some of the early results in this direction
appear in the classic text of Rockafellar [97], and we will mention a few of
them here in order to motivate the reader. The first point that the reader
should note is that local maxima of a convex function need not be global
maxima. We leave it to the reader to create some examples that bring out
this fact. The following result is given in Rockafellar [97]. We will not provide
any proof. See Rockafellar [97] for the proof.
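One simple instance of the first point: for f (x) = x2 and C = [−1, 2], the point x̄ = −1 is a local maximizer of f relative to C, since x2 ≤ 1 = f (−1) for all x ∈ C near −1, while the global maximum of f over C is attained at x = 2 with value 4.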

Theorem 12.1 Consider a convex function f : Rn → R and a convex set


C ⊂ Rn . If f attains its supremum relative to C at some point in ri C, then
f is constant on C.

The above theorem says that if f is a nonconstant convex function and if


it attains its supremum on C, then it must be attained at the boundary. Of
course the more interesting question is when does the convex function actually
attain its maximum? In this respect, one has the following interesting result
from [97] where the set C is assumed to be polyhedral.

Theorem 12.2 Consider a convex function f : Rn → R and a convex set


C ⊂ Rn that is polyhedral. Suppose that there are no half lines in C on which
f is unbounded above. Then f attains its supremum over C.
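The polyhedrality assumption cannot simply be dropped here. For instance, take C = {(x, y) ∈ R2 : y ≥ x2 } and f (x, y) = x. Every half line contained in C has direction (0, 1), so f is bounded above (indeed constant) on each of them, yet sup{f (x, y) : (x, y) ∈ C} = +∞ is not attained.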

For more general results, see [97]. One of the earliest papers dealing exclu-
sively with the optimality conditions of maximizing a convex function over a
convex set is due to Strekalovskii [104]. Though Strekalovskii [104] frames his
problem in a general setting, his results are essentially useful for the convex
case and the main results in his paper are given only for the convex case.
Observe that if f : Rn → R is a convex function and x̄ ∈ C is the point
where f attains a global maximum, then for every x ∈ C,

0 ≥ f (x) − f (x̄) ≥ hξ, x − x̄i, ∀ ξ ∈ ∂f (x̄),

which implies

hξ, x − x̄i ≤ 0, ∀ ξ ∈ ∂f (x̄), ∀ x ∈ C.

Thus the necessary condition is ∂f (x̄) ⊂ NC (x̄). The reader should try to find
a necessary condition when x̄ is a local maximum. Can we find a sufficient
condition for a global maximum? Strekalovskii [104] attempts to answer this
question by developing a set of necessary and sufficient conditions.

Theorem 12.3 Consider the problem of maximizing a convex function f over


a closed convex set C. Assume that x̄ ∈ C is a point such that

−∞ < inf_{x∈Rn} f (x) < f (x̄) < +∞


and the set

C̄ = {x ∈ Rn : f (x) ≤ f (x̄)}

is compact having a nonempty interior, that is, int C̄ 6= ∅. Then x̄ ∈ C is a


global maximum of f on C if and only if

(a) for every x∗ ∈ ∂f (x̄), hx∗ , x − x̄i ≤ 0, ∀ x ∈ C or,


(b) for every y ∗ ∈ S(f, x̄), hy ∗ , x − x̄i ≤ 1, ∀ x ∈ C where

S(f, x̄) = {y ∗ ∈ Rn : ∃ y ∈ Rn , y 6= x̄, f (y) = f (x̄) and


∃ α > 0, αy ∗ ∈ ∂f (y) satisfying hy ∗ , y − x̄i = 1}.

Proof. We will only prove (a) and leave (b) to the readers. If x̄ is a global
maximum, then our discussion preceding the theorem shows that (a) holds,
that is, the condition in (a) is necessary. Now we will look into the reverse, that
is, whether (a) is sufficient for a global maximum or not. Observe that under
the given hypothesis, for every x∗ ∈ ∂f (x̄),

hx∗ , x − x̄i ≤ 0, ∀ x ∈ C.

As dom f = Rn , by Theorem 2.69, f is a continuous convex function, thus


the set C̄ is closed and convex. Also, from the above inequality,

cone ∂f (x̄) ⊂ NC (x̄).

Further, as C̄ has a nonempty interior, there exists x̂ such that f (x̂) < f (x̄).
Hence

NC̄ (x̄) = {λξ : λ ≥ 0, ξ ∈ ∂f (x̄)}.

Thus, NC̄ (x̄) = cone ∂f (x̄). This shows that NC̄ (x̄) ⊂ NC (x̄), which implies
that C ⊂ C̄. Hence x̄ is the point where the maximum is achieved as x̄ is
already given to be an element of C. 
It is important to note that without the additional conditions,
∂f (x̄) ⊂ NC (x̄) does not render a global maximum. Here we put forward an
example from Dutta [38]. Consider f : R → R defined as

f (x) = max{x2 , x}.

Now suppose that we want to maximize f over C = [−1, 0]. Consider x̄ = 0.


Thus NC (x̄) = R+ = {x ∈ R : x ≥ 0}. Observe that ∂f (0) = [0, 1]. Therefore,
∂f (0) ⊂ NC (0). However, x̄ = 0 is a global minimizer of f over C and not a
global maximizer.
Strekalovskii refined the above result slightly to provide the following re-
sult. This appeared in [105].


Theorem 12.4 Consider a closed convex set C ⊂ Rn and let x̄ ∈ C. Assume


that

−∞ ≤ inf_{x∈Rn} f (x) < f (x̄),

where f : Rn → R is a convex function. Then x̄ ∈ C is a global maximum for


(P 1) if and only if

∂f (x) ⊂ NC (x), ∀ x ∈ Rn satisfying f (x) = f (x̄).

Readers are requested to have a look at the difference between


Strekalovskii’s result in Theorem 12.3 and this result. Though the above re-
sult is elegant, it suffers from a drawback, that is, one needs to calculate
NC (x) for every x ∈ Rn satisfying f (x) = f (x̄). Now if x 6∈ C, then tradi-
tionally we define NC (x) = ∅. However, for a convex function f : Rn → R,
∂f (x) 6= ∅ for every x ∈ Rn . This drawback was overcome by Hiriart-Urruty
and Ledyaev [61]. We now present their result but with a different approach
to the proof.

Theorem 12.5 Consider a convex function f : Rn → R and a closed convex


set C ⊂ Rn . Let x̄ ∈ C be such that

−∞ ≤ inf_{x∈C} f (x) < f (x̄).

Then x̄ ∈ C is a maximizer for (P 1) if and only if

∂f (x) ⊂ NC (x), ∀ x ∈ C satisfying f (x) = f (x̄).

Proof. If x̄ ∈ C is the global maximizer of the function f over C, then we


have already seen that ∂f (x̄) ⊂ NC (x̄). It is simple to see that if f (x) = f (x̄),
∂f (x) ⊂ NC (x). We leave this very simple proof to the reader.
Conversely, assume on the contrary that x̄ ∈ C is not a global maximizer
of (P 1). Therefore, there exists x̂ ∈ C such that f (x̂) > f (x̄). Consider the
following level set

S(x̄) = {x ∈ C : f (x) ≤ f (x̄)},

which is a closed convex set. It is clear that x̂ 6∈ S(x̄). Thus, the following
projection problem:
min (1/2) kx − x̂k2 subject to f (x) ≤ f (x̄), x ∈ C
has a unique solution. Let x̃ ∈ C be that unique solution. Now using the Fritz
John optimality conditions for a convex optimization problem, Theorem 5.1,
there exist λ0 ≥ 0 and λ1 ≥ 0 with (λ0 , λ1 ) 6= (0, 0) such that
(i) 0 ∈ λ0 (x̃ − x̂) + λ1 ∂f (x̃) + NC (x̃),


(ii) λ1 (f (x̃) − f (x̄)) = 0.


Assume that λ0 = 0, which implies λ1 > 0. Thus the above conditions reduce
to

0 ∈ λ1 ∂f (x̃) + NC (x̃) and f (x̃) = f (x̄).

The condition 0 ∈ λ1 ∂f (x̃) + NC (x̃) leads to the expression

0 ∈ ∂f (x̃) + NC (x̃).

This is obtained by dividing both sides by λ1 and noting that NC (x̃) is a cone.
As f is convex, invoking Theorem 3.1, x̃ ∈ C is a point of minimizer of f over
C, that is,

f (x̃) = inf_{x∈C} f (x).

The condition f (x̃) = f (x̄) along with the given hypothesis yields that

−∞ ≤ inf_{x∈C} f (x) < f (x̃),

thereby contradicting the fact that x̃ is the point of minimizer of f over C.


Hence λ0 > 0. Now assume that λ1 = 0. Therefore, the facts that λ0 > 0 and
NC (x̃) is a cone yield that

0 ∈ (x̃ − x̂) + NC (x̃),

that is,

x̂ − x̃ ∈ NC (x̃).

Because x̂ ∈ C,

0 ≥ hx̂ − x̃, x̂ − x̃i = kx̂ − x̃k2 ,

implying that x̃ = x̂. This is indeed a contradiction. Hence λ1 > 0. Thus there
exist ξ ∈ ∂f (x̃) and η ∈ NC (x̃) such that

0 = λ0 (x̃ − x̂) + λ1 ξ + η. (12.1)

As f (x̃) = f (x̄), by the given hypothesis, ∂f (x̃) ⊂ NC (x̃), which implies

−hλ1 ξ, x̂ − x̃i ≥ 0. (12.2)

Further, it is simple to see that

−hη, x̂ − x̃i + λ0 kx̂ − x̃k2 > 0. (12.3)

The conditions (12.1), (12.2), and (12.3) lead to a contradiction, thereby es-
tablishing the result. 


12.3 Minimization of d.c. Functions


In this section we will concentrate on deriving the optimality condition for
local and global minimization of a very important class of nonconvex problems.
These problems are the ones where the objective function can be expressed
as the difference of two convex functions. Such functions are referred to as
difference convex functions or d.c. functions. Thus we will concentrate on the
problem
min f (x) − g(x) subject to x∈C (P 2)
where f, g : Rn → R are convex functions and C ⊂ Rn is a convex set. Note
that f (x) − g(x) need not be a convex function unless g is a linear or affine
function. So in general it is a nonconvex function. We begin by providing a
necessary optimality condition for a local optimal point.

Theorem 12.6 Consider the problem (P 2) and let x̄ be a local minimizer of


(P 2) where C = Rn . Then ∂f (x̄) ∩ ∂g(x̄) 6= ∅.

Proof. Let x̄ be a local minimum. As f − g is locally Lipschitz,

0 ∈ ∂ ◦ (f − g)(x̄).

For details, see Clarke [27] or Chapter 3. Hence, by the Sum Rule of the Clarke
subdifferential,

0 ∈ ∂ ◦ f (x̄) + ∂ ◦ (−g)(x̄).

Noting that ∂ ◦ f (x̄) = ∂f (x̄) and ∂ ◦ (−g)(x̄) = −∂ ◦ g(x̄) = −∂g(x̄), the above
condition becomes

0 ∈ ∂f (x̄) − ∂g(x̄).

This yields that

∂f (x̄) ∩ ∂g(x̄) 6= ∅.

We would again like to stress that for details on the Clarke subdifferential,
see Clarke [27]. 
Let us note that the above condition is only necessary and not sufficient.
Consider h(x) = f (x) − g(x), where f (x) = x2 and g(x) = |x|. At x̄ = 0,
∂f (0) = {0} and ∂g(0) = [−1, 1]. Thus,

∂f (0) ∩ ∂g(0) = {0}.

But it is clear that x̄ = 0 is not a local minimizer of h.


Now let us see what happens if we consider C ⊂ Rn . In this case, one


would have

0 ∈ ∂ ◦ (f − g)(x̄) + NC (x̄),

(see Clarke [27] for more details). Hence,

0 ∈ ∂f (x̄) − ∂g(x̄) + NC (x̄).

Thus there exist ξf ∈ ∂f (x̄), ξg ∈ ∂g(x̄) and η ∈ NC (x̄) such that

ξg = ξf + η.

Thus, the optimality condition can now be stated as follows:


If x̄ is a local minimum for (P 2), then there exists ξg ∈ ∂g(x̄)
such that

ξg ∈ ∂f (x̄) + NC (x̄).

For C = Rn , if x̄ is a global minimum for (P 2),

f (x) − g(x) ≥ f (x̄) − g(x̄), ∀ x ∈ Rn .

Therefore,

f (x) − f (x̄) ≥ g(x) − g(x̄) ≥ hξg , x − x̄i, ∀ ξg ∈ ∂g(x̄),

thereby implying that

∂g(x̄) ⊂ ∂f (x̄).

Note that this is again a necessary condition and not sufficient. We urge the
reader to find an example demonstrating this fact.
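One such example, in the same spirit as the one above: take f (x) = |x| and g(x) = x2 with C = Rn , n = 1. At x̄ = 0 one has ∂g(0) = {0} ⊂ [−1, 1] = ∂f (0), so the necessary condition ∂g(x̄) ⊂ ∂f (x̄) holds, yet x̄ = 0 is not a global minimizer of f − g, since f (2) − g(2) = −2 < 0 = f (0) − g(0).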
We now present interesting and important necessary and sufficient optimal-
ity conditions for the global optimization of problem (P 2). Here the optimality
conditions will be expressed in terms of the ε-subdifferential. We present this
result as given in Bomze [15].

Theorem 12.7 Consider the problem (P 2) with C = Rn . Then x̄ ∈ Rn is a


global minimizer of (P 2) if and only if

∂ε g(x̄) ⊂ ∂ε f (x̄), ∀ ε > 0.

Proof. As x̄ ∈ Rn is a global minimizer of (f − g) over Rn ,

f (x) − f (x̄) ≥ g(x) − g(x̄), ∀ x ∈ Rn .

If ξ ∈ ∂ε g(x̄) for any ε > 0,

f (x) − f (x̄) ≥ g(x) − g(x̄) ≥ hξ, x − x̄i − ε, ∀ x ∈ Rn ,


thereby implying that ξ ∈ ∂ε f (x̄). Because ε > 0 was arbitrary, this establishes
that

∂ε g(x̄) ⊂ ∂ε f (x̄), ∀ ε > 0.

Let us now look at the converse. On the contrary, assume that x̄ is not a
global minimizer of (P 2), which implies that there exists x̂ ∈ Rn such that

f (x̂) − g(x̂) < f (x̄) − g(x̄).

This yields that

f (x̄) − f (x̂) − g(x̄) + g(x̂) > 0.

Set δ = (1/2)(f (x̄) − f (x̂) − g(x̄) + g(x̂)). It is simple to see that δ > 0.
Now consider ξ̂ ∈ ∂g(x̂), which implies that

g(x̄) − g(x̂) − hξ̂, x̄ − x̂i ≥ 0.

Because δ > 0,
g(x̄) − g(x̂) − hξ̂, x̄ − x̂i + δ > 0.

Set ε = g(x̄) − g(x̂) − hξ̂, x̄ − x̂i + δ. Then for any x ∈ Rn ,
hξ̂, x − x̄i − ε = hξ̂, x − x̂ + x̂ − x̄i − ε = hξ̂, x − x̂i − δ + g(x̂) − g(x̄).

As ξ̂ ∈ ∂g(x̂), it is clear that ξ̂ ∈ ∂δ g(x̂), which leads to

hξ̂, x − x̂i − δ + g(x̂) ≤ g(x).

Thus,
hξ̂, x − x̄i − ε ≤ g(x) − g(x̄), ∀ x ∈ Rn ,

thereby implying that ξ̂ ∈ ∂ε g(x̄). By the given hypothesis, ξ̂ ∈ ∂ε f (x̄). Therefore,
in particular for x = x̂,

f (x̂) − f (x̄) ≥ hξ̂, x̂ − x̄i − ε.

Now

2δ = f (x̄) − f (x̂) − (g(x̄) − g(x̂)) ≤ ε − hξ̂, x̂ − x̄i − (g(x̄) − g(x̂)).

The way in which ε is defined leads to


ε − (g(x̄) − g(x̂)) = δ + hξ̂, x̂ − x̄i.


Hence,
2δ ≤ δ + hξ̂, x̂ − x̄i − hξ̂, x̂ − x̄i = δ < 2δ,

which is a contradiction. Thus, x̄ is indeed a global solution for (P 2). 
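To see Theorem 12.7 at work on a concrete instance, consider again f (x) = x2 and g(x) = |x| on R with x̄ = 0. A short computation shows that ∂ε f (0) = [−2√ε, 2√ε] while ∂ε g(0) = [−1, 1] for every ε ≥ 0. Thus for any ε < 1/4 the inclusion ∂ε g(0) ⊂ ∂ε f (0) fails, and the theorem correctly detects that x̄ = 0 is not a global minimizer of f − g; indeed f (1/2) − g(1/2) = −1/4 < 0 = f (0) − g(0).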


Note that the above result also holds true if we assume f : Rn → R∪{+∞}
and g : Rn → R. In that case, one just has to assume that x̄ ∈ dom f . The
reader is encouraged to sketch the proof for such a scenario. However, we
present the result here for the sake of convenience.

Theorem 12.8 Consider the problem (P 2) with C = Rn and a lower semi-


continuous convex function f : Rn → R ∪ {+∞} with dom f 6= ∅. Let
x̄ ∈ dom f . Then x̄ is a global minimum for (P 2) if and only if

∂ε g(x̄) ⊂ ∂ε f (x̄), ∀ ε > 0.

Using the above result, one can deduce an optimality condition for the case
when C ⊂ Rn and f : Rn → R. Observe that when C ⊂ Rn and f : Rn → R,
the problem (P 2) can be equivalently written as

min (f + δC )(x) − g(x) subject to x ∈ Rn .

Hence, x̄ is a global minimum for (P 2) if and only if

∂ε g(x̄) ⊂ ∂ε (f + δC )(x̄), ∀ ε > 0.

This is done of course by applying Theorem 12.8. Invoking the Sum Rule of
ε-subdifferential, Theorem 2.115,
∂ε (f + δC )(x̄) = ∪_{ε1 ≥0, ε2 ≥0, ε1 +ε2 =ε} (∂ε1 f (x̄) + ∂ε2 δC (x̄)).

Hence,
∂ε g(x̄) ⊂ ∪_{ε1 ≥0, ε2 ≥0, ε1 +ε2 =ε} (∂ε1 f (x̄) + Nε2 ,C (x̄)), ∀ ε > 0.

We just recall that ∂ε2 δC (x̄) = Nε2 ,C (x̄) for any ε2 ≥ 0.


Theorem 12.8 can also be used to deduce necessary and sufficient
optimality conditions for the problem (P 1).

Corollary 12.9 Consider the problem (P 1). Assume that C ⊂ Rn is a closed


convex set. Then x̄ ∈ C is a global maximum for (P 1) if and only if

∂ε f (x̄) ⊂ Nε,C (x̄), ∀ ε > 0.


Proof. Observe that the problem (P 1) can be written as

min − f (x) subject to x ∈ C.

A further equivalent version can be given by

min (δC − f )(x) subject to x ∈ Rn .

Using Theorem 12.8, the optimality condition is

∂ε f (x̄) ⊂ ∂ε δC (x̄), ∀ ε > 0,

that is,

∂ε f (x̄) ⊂ Nε,C (x̄), ∀ ε > 0,

thereby establishing the result. 


We end our discussion and the book here. However for more details on the
use of the above results, see for example Bomze [15], Hiriart-Urruty [60], and
the references therein.



Bibliography

[1] F. A. Al-Khayyal and J. Kyparisis. Finite convergence of algorithms


for nonlinear programs and variational inequalities. J. Optim. Theory
Appl., 70:319–332, 1991.
[2] H. Attouch and H. Brézis. Duality for the sum of convex functions in
general Banach spaces. In Aspects of Mathematics and its Applications,
pages 125–133. Amsterdam, 1986.
[3] H. Attouch, G. Buttazzo, and G. Michaille. Variational Analysis
in Sobolev and BV Spaces: Applications to PDEs and Optimization.
MPS/SIAM Series on Optimization, SIAM, Philadelphia, PA; MPS,
Philadelphia, PA, 2006.
[4] J.-P. Aubin and I. Ekeland. Applied Nonlinear Analysis. Wiley-
Interscience, New York, 1984.
[5] A. Auslender. Optimisation. Méthodes numériques. Maîtrise de Mathématiques
et Applications Fondamentales. Masson, Paris-New York-Barcelona, 1976.
[6] A. Ben-Tal and A. Ben-Israel. Characterizations of optimality in convex
programming: The nondifferentiable case. Applicable Anal., 9:137–156,
1979.
[7] A. Ben-Tal, A. Ben-Israel, and S. Zlobec. Characterization of optimality
in convex programming without a constraint qualification. J. Optim.
Theory Appl., 20:417–437, 1976.
[8] A. Ben-Tal and A. Nemirovskii. Lectures on Modern Convex Optimiza-
tion: Analysis, Algorithms, and Engineering Applications. MPS/SIAM
Series on Optimization, SIAM, Philadelphia, PA, 2001.
[9] A. Ben-Tal, E. E. Rosinger, and A. Ben-Israel. A Helly-type theorem and
semi-infinite programming. In Constructive Approaches to Mathemati-
cal Models, pages 127–135. Academic Press, New York-London-Toronto,
Ont., 1979.
[10] C. Berge. Topological Spaces. Including a Treatment of Multi-Valued
Functions, Vector Spaces and Convexity. Dover Publications, Inc., Mi-
neola, NY, 1997.


[11] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont,


MA, 1999.
[12] D. P. Bertsekas. Convex Analysis and Optimization. Athena Scientific,
Belmont, MA, 2003.
[13] D. P. Bertsekas and A. E. Ozdaglar. Pseudonormality and Lagrange
multipler theory for constrained optimization. J. Optim. Theory Appl.,
114:287–343, 2002.
[14] D. P. Bertsekas, A. E. Ozdaglar, and P. Tseng. Enhanced Fritz John
for convex programming. SIAM J. Optim., 16:766–797, 2006.
[15] I. M. Bomze. Global optimization: A quadratic programming perspec-
tive. In Nonlinear Optimization, Lecture Notes in Math., volume 1989.
Springer-Verlag, Berlin, 2010.
[16] J. M. Borwein. Direct theorems in semi-infinite convex programming.
Math. Programming, 21:301–318, 1981.
[17] J. M. Borwein and A. S. Lewis. Convex Analysis and Nonlinear Opti-
mization: Theory and Examples. CMS Books in Mathematics, Springer-
Verlag, New York, 2000.
[18] J. M. Borwein and Q. J. Zhu. Techniques of Variational Analysis.
Springer, New York, 2005.
[19] A. Brøndsted and R. T. Rockafellar. On the subdifferentiability of con-
vex functions. Proc. Am. Math. Soc., 16:605–611, 1965.
[20] R. S. Burachik and V. Jeyakumar. A dual condition for the convex
subdifferential sum formula with applications. J. Convex Anal., 12:279–
290, 2005.
[21] R. S. Burachik and V. Jeyakumar. A new geometric condition for
Fenchel’s duality in infinite dimensional spaces. Math. Programming,
104:229–233, 2005.
[22] J. V. Burke and S. Deng. Weak sharp minima revisited. Part I: Basic
theory. Control Cybernetics, 31:439–469, 2002.
[23] J. V. Burke and S. Deng. Weak sharp minima revisited. Part II: Ap-
plication to linear regularity and error bounds. Math. Programming,
104:235–261, 2005.
[24] J. V. Burke and S. Deng. Weak sharp minima revisited Part III: Error
bounds for differentiable convex inclusions. Math. Programming, 116:37–
56, 2009.
[25] J. V. Burke and M. C. Ferris. Weak sharp minima in mathematical
programming. SIAM J. Control Optim., 31:1340–1359, 1993.


[26] E. W. Cheney. Introduction to Approximation Theory. McGraw-Hill,


New York, 1966.

[27] F. H. Clarke. Optimization and Nonsmooth Analysis. Wiley Interscience,


New York, 1983.

[28] F. H. Clarke, Y. S. Ledyaev, R. J. Stern, and P. R. Wolenski. Non-


smooth Analysis and Control Theory, volume 178: Graduate Texts in
Mathematics. Springer-Verlag, New York.

[29] L. Cromme. Strong uniqueness. Numer. Math., 29:179–193, 1978.

[30] V. F. Demyanov and A. M. Rubinov. Constructive Nonsmooth Analysis.


Approximation & Optimization. 7: Peter Lang, Frankfurt am Main,
1995.

[31] N. Dinh, M. A. Goberna, and M. A. López. From linear to convex


systems: Consistency, Farkas’ lemma and applications. J. Convex Anal.,
13:279–290, 2006.

[32] N. Dinh, M. A. Goberna, M. A. López, and T. Q. Son. New Farkas-


type constraint qualifications in convex infinite programming. ESAIM
Control Optim. Calc. Var., 13:580–597, 2007.

[33] N. Dinh, B. S. Mordukhovich, and T. T. A. Nghia. Subdifferentials of


value functions and optimality conditions for DC and bilevel infinite and
semi-infinite programs. Math. Programming, 123:101–138, 2010.

[34] N. Dinh, T. T. A. Nghia, and G. Vallet. A closedness condition and


its application to DC programs with convex constraints. Optimization,
59:541–560, 2010.

[35] R. J. Duffin. Convex analysis treated by linear programming. Math.


Programming, 4:125–143, 1973.

[36] J. Dutta. Generalized derivatives and nonsmooth optimization, a finite


dimensional tour. Top, 13:185–314, 2005.

[37] J. Dutta. Necessary optimality conditions and saddle points for approx-
imate optimization in Banach spaces. Top, 13:127–144, 2005.

[38] J. Dutta. Optimality conditions for maximizing a locally Lipschitz func-


tion. Optimization, 54:377–389, 2005.

[39] J. Dutta, K. Deb, R. Arora, and R. Tulshyan. Approximate KKT points:


Theoretical and numerical study. 2010. Preprint.

[40] J. Dutta and C. S. Lalitha. Optimality conditions in convex optimization


revisited. 2010. Preprint.


[41] I. Ekeland. On the variational principle. J. Math. Anal. Appl., 47:324–


353, 1974.
[42] I. Ekeland. Nonconvex minimization problems. Bull. Am. Math. Soc.,
1:443–474, 1979.
[43] I. Ekeland and R. Temam. Convex Analysis and Variational Problems,
volume 1: Studies in Mathematics and its Applications. North-Holland
Publishing Co., Amsterdam-Oxford and American Elsevier Publishing
Co., Inc., New York, 1976.
[44] M. D. Fajardo and M. A. López. Locally Farkas-Minkowski systems in
convex semi-infinite programming. J. Optim. Theory Appl., 103:313–
335, 1999.
[45] W. Fenchel. On conjugate convex functions. Canadian J. Math., 1:73–
77, 1949.
[46] M. C. Ferris. Weak Sharp Minima and Penalty Functions in Mathemat-
ical Programming. University of Cambridge, Cambridge, 1988. Ph. D.
thesis.
[47] M. Florenzano and C. Le Van. Finite Dimensional Convexity and Op-
timization, volume 13: Studies in Economic Theory. Springer-Verlag,
Berlin, 2001.
[48] M. Fukushima. Equivalent differentiable optimization problems and de-
scent methods for asymmetric variational inequality problems. Math.
Programming, 53:99–110, 1992.
[49] M. A. Goberna, V. Jeyakumar, and M. A. López. Necessary and suffi-
cient constraint qualifications for solvability of systems of infinite convex
inequalities. Nonlinear Anal., 68:1184–1194, 2008.
[50] M. A. Goberna and M. A. López. Linear Semi-Infinite Optimization,
volume 2: Wiley Series in Mathematical Methods in Practice. John
Wiley & Sons, Ltd., Chichester, 1998.
[51] M. A. Goberna, M. A. López, and J. Pastor. Farkas-Minkowski systems
in semi-infinite programming. Appl. Math. Optim., 7:295–308, 1981.
[52] F. J. Gould and J. W. Tolle. A necessary and sufficient qualification for
constrained optimization. SIAM J. Appl. Math., 20:164–172, 1971.
[53] F. J. Gould and J. W. Tolle. Geometry of optimality conditions and
constraint qualifications. Math. Programming, 2:1–18, 1972.
[54] M. Guignard. Generalized Kuhn-Tucker conditions for mathematical
programming problems in a Banach space. SIAM J. Control, 7:232–
241, 1969.


[55] M. R. Hestenes. Optimization Theory: The Finite Dimensional Case.


Wiley, New York, 1975.
[56] R. Hettich. A review of numerical methods for semi-infinite optimiza-
tion. In Semi-Infinite Programming and Applications, Lecture Notes
in Econom. and Math. System, volume 215, pages 158–178. Springer-
Verlag, Berlin, 1983.
[57] R. Hettich and K. O. Kortanek. Semi-infinite programming: theory,
methods and applications. SIAM Review, 35:380–429, 1993.
[58] J.-B. Hiriart-Urruty. ε-Subdifferential Calculus. In Convex Analysis
and Optimization, pages 43–92. Pitman, London, 1982.
[59] J.-B. Hiriart-Urruty. What conditions are satisfied at points minimizing
the maximum of a finite number of differentiable functions? In Non-
smooth Optimization: Methods and Applications (Erice, 1991), pages
166–174. Gordon and Breach, Montreux, 1992.
[60] J. B. Hiriart-Urruty. Global optimality conditions for maximizing a
convex quadratic function under convex quadratic constraints. J. Global
Optim., 21:445–455, 2001.
[61] J. B. Hiriart-Urruty and Y. S. Ledyaev. A note on the characterization
of the global maxima of a (tangentially) convex function over a convex
set. J. Convex Anal., 3:55–61, 1996.
[62] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Min-
imization Algorithms I & II, volume 306: Fundamental Principles of
Mathematical Sciences. Springer-Verlag, Berlin, 1993.
[63] J.-B. Hiriart-Urruty and C. Lemaréchal. Fundamentals of Convex
Analysis. Grundlehren Text Editions, Springer-Verlag, Berlin, 2001.
[64] J. B. Hiriart-Urruty and R. R. Phelps. Subdifferential calculus using
ε-subdifferentials. J. Funct. Anal., 118:154–166, 1993.
[65] J. Y. Jaffray and J. Ch. Pomerol. A direct proof of the Kuhn-Tucker
necessary optimality theorem for convex and affine inequalities. SIAM
Review, 31:671–674, 1989.
[66] V. Jeyakumar. Asymptotic dual conditions characterizing optimality for
infinite convex programs. J. Optim. Theory Appl., 93:153–165, 1997.
[67] V. Jeyakumar. Characterizing set containments involving infinite convex
constraints and reverse-convex constraints. SIAM J. Optim., 13:947–
959, 2003.
[68] V. Jeyakumar, G. M. Lee, and N. Dinh. New sequential Lagrange mul-
tiplier conditions characterizing optimality without constraint qualifica-
tion for convex programs. SIAM J. Optim., 14:534–547, 2003.


[69] V. Jeyakumar and G. Y. Li. Farkas’ lemma for separable sublinear


inequalities without qualifications. Optim. Lett., 3:537–545, 2009.

[70] V. Jeyakumar, A. M. Rubinov, B. M. Glover, and Y. Ishizuka. Inequality


systems and global optimization. J. Math. Anal. Appl., 202:900–919,
1996.

[71] F. John. Extremum problems with inequalities as subsidiary conditions.


In Studies and Essays Presented to R. Courant on His 60th Birthday,
pages 187–204. Interscience Publishers, Inc., New York, 1948.

[72] V. L. Klee. The critical set of a convex body. Am. J. Math., 75:178–188,
1953.

[73] H. W. Kuhn and A. W. Tucker. Nonlinear programming. In Proceed-


ings of the Second Berkeley Symposium on Mathematical Statistics and
Probability, 1950, pages 481–492. University of California Press, Berke-
ley and Los Angeles, 1951.

[74] J. B. Lasserre. On representation of the feasible set in convex optimiza-


tion. Optim. Lett., 4:1–5, 2010.

[75] M. A. López and E. Vercher. Optimality conditions for nondifferen-


tiable convex semi-infinite programming. Math. Programming, 27:307–
319, 1983.

[76] P. Loridan. Necessary conditions for ε-optimality. Math. Programming


Stud., 19:140–152, 1982.

[77] P. Loridan and J. Morgan. Penalty function in ε-programming and


ε-minimax problems. Math. Programming, 26:213–231, 1983.

[78] D. T. Luc, N. X. Tan, and P. N. Tinh. Convex vector functions and


their subdifferential. Acta Math. Vietnam., 23:107–127, 1998.

[79] R. Lucchetti. Convexity and Well-Posed Problems. Springer Science +


Business Media, Inc., New York, 2006.

[80] D. G. Luenberger. Optimization by Vector Space Methods. John Wiley


& Sons, Inc., New York, 1968.

[81] T. L. Magnanti. Fenchel and Lagrange duality are equivalent. Math.


Programming, 7:253–258, 1974.

[82] O. L. Mangasarian. Nonlinear Programming. McGraw-Hill Book Com-


pany, New York, 1969.

[83] E. J. McShane. The Lagrange multiplier rule. Am. Math. Monthly,


80:922–925, 1973.


[84] B. S. Mordukhovich. Approximation and maximum principle for non-


smooth problems of optimal control. Russian Math. Surveys, 196:263–
264, 1977.

[85] B. S. Mordukhovich. Metric approximations and necessary optimality


conditions for general class of nonsmooth extremal problems. Soviet
Math. Doklady, 22:526–530, 1980.

[86] B. S. Mordukhovich. Variational Analysis and Generalized Differentia-


tion I & II. Springer-Verlag, Berlin, 2006.

[87] J. J. Moreau. Fonctions convexes en dualité. Séminaire de Mathématiques
de la Faculté des Sciences de Montpellier, (1), 1962.

[88] J. J. Moreau. Convexity and duality. In Functional Analysis and Opti-


mization, pages 145–169. Academic Press, New York, 1966.

[89] J. J. Moreau. Inf-convolution, sous-additivité, convexité des fonctions


numériques. J. Math. Pures Appl., 49:109–154, 1970.

[90] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic


Course. In Applied Optimization, volume 87. Kluwer Academic Publish-
ers, 2004.

[91] A. E. Ozdaglar. Pseudonormality and a Lagrange Multiplier Theory for


Constrained Optimization. Mass. Institute of Technology, Cambridge,
MA, 2003. Ph. D. thesis.

[92] A. E. Ozdaglar and D. P. Bertsekas. The relation between pseudonor-


mality and quasiregularity in constrained optimization. Optim. Methods
Softw., 19:493–506, 2004.

[93] R. R. Phelps. Convex Functions, Monotone Operators and Differen-


tiability, volume 1364: Lecture Notes in Mathematics. Springer-Verlag,
Berlin.

[94] B. T. Polyak. Sharp Minima. In Institute of Control Sciences Lecture


Notes, Moscow, 1979. Presented at the IIASA Workshop on Generalized
Lagrangians and Their Applications, IIASA, Laxenburg, Austria.

[95] B. T. Polyak. Introduction to Optimization. Optimization Software,


Inc., Publications Division, New York, 1987.

[96] B. N. Pshenichnyi. Necessary Conditions for an Extremum, volume 4:


Pure and Applied Mathematics. Marcel Dekker, Inc., New York, 1971.

[97] R. T. Rockafellar. Convex Analysis, volume 28: Princeton Mathematical


Series. Princeton University Press, Princeton, NJ, 1970.


[98] R. T. Rockafellar. Some convex programs whose duals are linearly con-
strained. In Nonlinear Programming (Proc. Sympos., Univ. of Wiscon-
sin, Madison, Wis., 1970), pages 293–322. Academic Press, New York,
1970.

[99] R. T. Rockafellar. Conjugate Duality and Optimization. Society for


Industrial and Applied Mathematics, Philadelphia, 1974.

[100] R. T. Rockafellar. Lagrange multipliers and optimality. SIAM Review,


35:183–238, 1993.

[101] R. T. Rockafellar and R. J.-B. Wets. Variational Analysis, volume


317: Fundamental Principles of Mathematical Sciences. Springer-Verlag,
Berlin.

[102] A. Ruszczynski. Nonlinear Optimization. Princeton University Press,


Princeton, NJ, 2006.

[103] R. Schneider. Convex Bodies: The Brunn–Minkowski Theory, volume


44: Encyclopedia of Mathematics and its Applications. Cambridge Uni-
versity Press, Cambridge, 1993.

[104] A. S. Strekalovskiĭ. On the problem of the global extremum. Soviet


Math. Doklady, 292:1062–1066, 1987.

[105] A. S. Strekalovskiĭ. Search for the global maximum of a convex func-


tional on an admissible set. Comp. Math. Math. Phys., 33:349–363,
1993.

[106] J.-J. Strodiot, V. H. Nguyen, and N. Heukemes. ε-Optimal solutions


in nondifferentiable convex programming and some related questions.
Math. Programming, 25:307–328, 1983.

[107] T. Strömberg. The operation of infimal convolution. Dissertationes


Math., 352:1–61, 1996.

[108] L. Thibault. Sequential convex subdifferential calculus and sequential


Lagrange multipliers. SIAM J. Control Optim., 35:1434–1444, 1997.

[109] M. Valadier. Sous-différentiels d'une borne supérieure et d'une somme


continue de fonctions convexes. C. R. Acad. Sci. Paris, 268:39–42, 1969.

[110] J. van Tiel. Convex Analysis. An Introductory Text. John Wiley & Sons,
Inc., New York, 1984.

[111] R.-J. Wets. Elementary constructive proofs of the theorems of Farkas,


Minkowski and Weyl. In Economic Decision Making: Games, Econo-
metrics and Optimization, pages 427–432. Elsevier-Science, Amsterdam,
1990.


[112] H. Wolkowicz. Geometry of optimality conditions and constraint quali-


fications: The convex case. Math. Programming, 19:32–60, 1980.

[113] K. Yokoyama. ε-Optimality criteria for convex programming problems


via exact penalty functions. Math. Programming, 56:233–243, 1992.
[114] W. I. Zangwill. Non-linear programming via penalty functions. Manag.
Sci., 13:344–358, 1967.



Index

S-convex function, 161

ε-complementary slackness, 342
ε-feasible set, 338
ε-maximum solution, 348
ε-minimum solution, 348
ε-normal set, 123, 339
ε-saddle point, 345
ε-solution, 135, 337
ε-subdifferential, 96, 122, 123, 338, 409

Abadie constraint qualification, 154, 214, 270, 375
abstract constraint, 157
active constraint multipliers, 400
active index set, 58, 104, 146, 213, 249, 255, 316, 367
affine combination, 31
affine function, 3
affine hull, 31
affine set, 24
affine support, 113
almost ε-solution, 236, 338, 348, 350
approximate optimality conditions, 337
approximate solutions, 337
approximate up to ε, 337

badly behaved constraints, 255
biconjugate function, 114
bilevel programming, 308
bipolar cone, 50
blunt cone, 243
Bolzano–Weierstrass Theorem, 5
bounded sequence, 5
bounded-valued map, 15
Brønsted–Rockafellar Theorem, 128

canonically closed system, 389
Carathéodory Theorem, 28
Cauchy–Schwarz inequality, 4
CC qualification condition, 305
Chain Rule, 101, 164
Clarke directional derivative, 320
Clarke generalized gradient, 163, 289
Clarke Jacobian, 163
Clarke subdifferential, 163, 289, 320
closed cone constraint qualification, 281, 300, 395
closed convex hull of function, 70
closed function, 10
closed graph theorem, 93
closed half space, 23, 44
closed map, 15
closed-valued map, 15
closure of function, 10, 77, 79
closure of set, 4, 31
coercive, 12, 351
complementary slackness, 151, 172
complementary violation condition, 212
concave function, 63
cone, 40, 243
cone constrained problem, 161, 162
cone convex function, 162
cone generated by set, 40
conic combination, 40
conjugate function, 68, 111, 112, 114, 198
consequence relation, 383
constraint qualification, 213
continuous, 6, 75
convergent sequence, 4
convex analysis, 2, 23
convex combination, 25
convex cone, 39, 40
convex cone generated by set, 40
convex function, 2, 3, 62, 286
convex hull, 27
convex hull of function, 70
convex locally Farkas–Minkowski problem, 378
convex optimization, 2, 315
convex programming, 3
convex set, 2, 3, 23
convex-valued map, 15
core, 39

d.c. function, 403, 408
derivative, 13, 14, 85
direction of constancy, 244
direction of recession, 41
direction sets, 243
directional derivative, 85, 320
distance function, 65
domain, 3, 62
dual problem, 170, 185, 235, 361
duality, 170
Dubovitskii–Milyutin Theorem, 252

Ekeland’s variational principle, 127, 135, 355
enhanced dual Fritz John condition, 235
enhanced Fritz John optimality condition, 207, 208
epigraph, 4, 62, 75, 136, 286
equality set, 255
error bound, 19
exact penalty function, 350
extended-valued function, 3

faithfully convex function, 244
Farkas’ Lemma, 275, 398
Farkas–Minkowski (FM) constraint qualification, 300
Farkas–Minkowski (FM) system, 383
Farkas–Minkowski qualification, 382, 383, 386
feasible direction, 54
feasible direction cone, 377
Fenchel duality, 196
Fenchel–Young inequality, 116
finitely generated cone, 61
finitely generated set, 61
Fritz John optimality condition, 207

gap function, 19
generalized Lagrange multiplier, 190
generators of cone, 61
generators of set, 61
geometric optimality condition, 249, 255
Gordan’s theorem of alternative, 379
gradient, 13
graph, 15

Helly’s Theorem, 49
Hessian, 14
hyperplane, 23, 44

improper function, 62, 75
indicator function, 65, 123
Inf-Convolution Rule, 118
infimal/inf-convolution, 68
Infimum Rule, 118
inner product, 4
interior, 4, 31

Jacobian, 14
Jensen’s inequality, 64

Karush–Kuhn–Tucker (KKT) optimality condition, 2, 151
Karush–Kuhn–Tucker multiplier, 151

Lagrange multiplier, 1, 146, 151, 188
Lagrangian duality, 185
Lagrangian function, 172, 238, 345
Lagrangian regular point, 374, 375
limit infimum of function, 6
limit infimum of sequence, 5
limit point, 5
limit supremum of function, 6
limit supremum of sequence, 5
line segment principle, 32
linear programming, 1, 25, 61, 327
linear semi-infinite system, 383
linearity criteria, 213, 221
Lipschitz constant, 82, 163
Lipschitz function, 2, 82, 320
locally bounded map, 15
locally Lipschitz function, 82, 163
lower limit of function, 6
lower limit of sequence, 5
lower semicontinuous (lsc), 5
lower semicontinuous hull, 10
lower-level set, 8

Mangasarian–Fromovitz constraint qualification, 317
marginal function, 190
max-function, 104, 159, 342
Max-Function Rule, 106, 132
maximal monotone, 95
Mean Value Theorem, 14
merit function, 19
metric approximation, 209
minimax equality, 170
minimax inequality, 170
modified ε-KKT conditions, 358
modified ε-KKT point, 358
modified Slater constraint qualification, 176, 183
monotone, 18
multifunction, 93

nonconvex optimization, 403
nondecreasing function, 100, 286
nondegeneracy condition, 316, 321
nonsmooth function, 13, 243, 320
nonsmooth optimization, 2
norm, 4
normal cone, 40, 54, 57, 89

open ball, 4
open half space, 24, 44
orthogonal complement, 48

parameter, 199
parameterized family, 199
penalty function, 209
polar cone, 40, 50
polyhedral cone, 61
polyhedral set, 25, 58, 60
positive cone, 365
positive polar cone, 53
positively homogeneous, 71
primal problem, 170
product space, 365
projection, 65
prolongation principle, 32
proper function, 3, 62, 75
proper map, 15
proper separation, 45, 221
pseudonormality, 213, 220

quasi ε-solution, 338, 355
quasinormality, 215
quasiregularity, 215

Rademacher Theorem, 163
recession cone, 41
regular ε-solution, 338
regular function, 320
regular point, 270
regularization condition, 254
relative interior, 31
relaxed ε-complementary slackness, 346, 358
relaxed Slater constraint qualification, 372
right scalar multiplication, 339

saddle point, 169, 170
saddle point condition, 170, 216
Saddle Point Theorem, 171
Scalar Product Rule, 131
second-order derivative, 14
semi-infinite programming, 365
separable sublinear function, 274
separating hyperplane, 44
separation theorem, 44, 45
Sequential Chain Rule, 286, 290
sequential optimality conditions, 243, 281, 291, 395
Sequential Sum Rule, 282
set-valued map, 15, 93
sharp minimum, 327
Slater constraint qualification, 145, 146, 167, 172, 214, 254, 272, 302, 316, 339, 366
Slater-type constraint qualification, 157, 162, 213, 221, 236
slope inequality, 86
smooth function, 13, 243, 315
strict convex function, 63
strict epigraph, 64
strict separation, 45
strong duality, 186
strongly convex function, 19
strongly convex optimization, 19
strongly monotone, 20
strongly unique local minimum, 327
subadditive, 71
subdifferential, 89, 162
subdifferential calculus, 98
subgradient, 89, 162
sublinear function, 66, 71, 274, 320
subsequence, 5
Sum Rule, 98, 118, 129, 137, 163
sup-function approach, 366
support function, 66, 71, 72
support set, 113, 366
supporting hyperplane, 45
Supremum Rule, 118, 137

tangent cone, 40, 54
two-person-zero-sum game, 169

upper limit of function, 6
upper limit of sequence, 5
upper semicontinuous (usc), 6, 94
upper semicontinuous (usc) map, 15

Valadier Formula, 107
value function, 190, 197
value of game, 170
variational inequality, 17

weak duality, 186
weak sharp minimum, 327, 328
weakest constraint qualification, 270
Weierstrass Theorem, 12