LECTURE NOTES
Convexity, Duality, and
Lagrange Multipliers
Dimitri P. Bertsekas
with
Angelia Nedić and Asuman E. Ozdaglar
Massachusetts Institute of Technology
Spring 2001
These notes were developed for the needs of the 6.291 class at
M.I.T. (Spring 2001). They are copyright-protected, but they
may be reproduced freely for noncommercial purposes.
Contents
1. Convex Analysis and Optimization
1.1. Linear Algebra and Analysis . . . . . . . . . . . . . . . . .
1.1.1. Vectors and Matrices . . . . . . . . . . . . . . . . . .
1.1.2. Topological Properties . . . . . . . . . . . . . . . . . .
1.1.3. Square Matrices . . . . . . . . . . . . . . . . . . . .
1.1.4. Derivatives . . . . . . . . . . . . . . . . . . . . . . .
1.2. Convex Sets and Functions . . . . . . . . . . . . . . . . . .
1.2.1. Basic Properties . . . . . . . . . . . . . . . . . . . .
1.2.2. Convex and Aﬃne Hulls . . . . . . . . . . . . . . . . .
1.2.3. Closure, Relative Interior, and Continuity . . . . . . . . .
1.2.4. Recession Cones . . . . . . . . . . . . . . . . . . . .
1.3. Convexity and Optimization . . . . . . . . . . . . . . . . .
1.3.1. Global and Local Minima . . . . . . . . . . . . . . . .
1.3.2. The Projection Theorem . . . . . . . . . . . . . . . . .
1.3.3. Directions of Recession
and Existence of Optimal Solutions . . . . . . . . . . . .
1.3.4. Existence of Saddle Points . . . . . . . . . . . . . . . .
1.4. Hyperplanes . . . . . . . . . . . . . . . . . . . . . . . .
1.4.1. Support and Separation Theorems . . . . . . . . . . . .
1.4.2. Nonvertical Hyperplanes and the Minimax Theorem . . . . .
1.5. Conical Approximations and Constrained Optimization . . . . .
1.6. Polyhedral Convexity . . . . . . . . . . . . . . . . . . . .
1.6.1. Polyhedral Cones . . . . . . . . . . . . . . . . . . . .
1.6.2. Polyhedral Sets . . . . . . . . . . . . . . . . . . . . .
1.6.3. Extreme Points . . . . . . . . . . . . . . . . . . . . .
1.6.4. Extreme Points and Linear Programming . . . . . . . . .
1.7. Subgradients . . . . . . . . . . . . . . . . . . . . . . . .
1.7.1. Directional Derivatives . . . . . . . . . . . . . . . . . .
1.7.2. Subgradients and Subdiﬀerentials . . . . . . . . . . . . .
1.7.3. ε-Subgradients . . . . . . . . . . . . . . . . . . . .
1.7.4. Subgradients of Extended Real-Valued Functions . . . . . .
1.7.5. Directional Derivative of the Max Function . . . . . . . . .
1.8. Optimality Conditions . . . . . . . . . . . . . . . . . . . .
1.9. Notes and Sources . . . . . . . . . . . . . . . . . . . . .
2. Lagrange Multipliers
2.1. Introduction to Lagrange Multipliers . . . . . . . . . . . . .
2.2. Enhanced Fritz John Optimality Conditions . . . . . . . . . .
2.3. Informative Lagrange Multipliers . . . . . . . . . . . . . . .
2.4. Pseudonormality and Constraint Qualiﬁcations . . . . . . . . .
2.5. Exact Penalty Functions . . . . . . . . . . . . . . . . . . .
2.6. Using the Extended Representation . . . . . . . . . . . . . .
2.7. Extensions to the Nondiﬀerentiable Case . . . . . . . . . . . .
2.8. Notes and Sources . . . . . . . . . . . . . . . . . . . . .
3. Lagrangian Duality
3.1. Geometric Multipliers . . . . . . . . . . . . . . . . . . . .
3.2. Duality Theory . . . . . . . . . . . . . . . . . . . . . . .
3.3. Linear and Quadratic Programming Duality . . . . . . . . . .
3.4. Strong Duality Theorems . . . . . . . . . . . . . . . . . .
3.4.1. Convex Cost – Linear Constraints . . . . . . . . . . . . .
3.4.2. Convex Cost – Convex Constraints . . . . . . . . . . . .
3.5. Fritz John Conditions when there is no Optimal Solution . . . .
3.6. Notes and Sources . . . . . . . . . . . . . . . . . . . . .
4. Conjugate Duality and Applications
4.1. Conjugate Functions . . . . . . . . . . . . . . . . . . . .
4.2. The Fenchel Duality Theorem . . . . . . . . . . . . . . . .
4.3. The Primal Function and Sensitivity Analysis . . . . . . . . .
4.4. Exact Penalty Functions . . . . . . . . . . . . . . . . . . .
4.5. Notes and Sources . . . . . . . . . . . . . . . . . . . . .
5. Dual Computational Methods
5.1. Dual Subgradients . . . . . . . . . . . . . . . . . . . . .
5.2. Subgradient Methods . . . . . . . . . . . . . . . . . . . .
5.2.1. Analysis of Subgradient Methods . . . . . . . . . . . . .
5.2.2. Subgradient Methods with Randomization . . . . . . . . .
5.3. Cutting Plane Methods . . . . . . . . . . . . . . . . . . .
5.4. Ascent Methods . . . . . . . . . . . . . . . . . . . . . .
5.5. Notes and Sources . . . . . . . . . . . . . . . . . . . . .
Preface
These lecture notes were developed for the needs of a graduate course at
the Electrical Engineering and Computer Science Department at M.I.T.
They focus selectively on a number of fundamental analytical and
computational topics in (deterministic) optimization that span a broad range
from continuous to discrete optimization, but are connected through the
recurring theme of convexity, Lagrange multipliers, and duality. These
topics include Lagrange multiplier theory, Lagrangian and conjugate duality,
and nondifferentiable optimization. The notes contain substantial portions
that are adapted from my textbook “Nonlinear Programming: 2nd Edition,”
Athena Scientific, 1999. However, the notes are more advanced,
more mathematical, and more research-oriented.
As part of the course I have also decided to develop in detail those
aspects of the theory of convex sets and functions that are essential for an
in-depth coverage of Lagrange multipliers and duality. I have long thought
that convexity, aside from being an eminently useful subject in engineering
and operations research, is also an excellent vehicle for assimilating some
of the basic concepts of analysis within an intuitive geometrical setting.
Unfortunately, the subject’s coverage in mathematics and engineering
curricula is scant and incidental. I believe that at least part of the reason
is that while there are a number of excellent books on convexity, as well
as a true classic (Rockafellar’s 1970 book), none of them is well suited for
teaching nonmathematicians who form the largest part of the potential
audience.
I have therefore tried in these notes to make convex analysis
accessible by limiting somewhat its scope and by emphasizing its geometrical
character, while at the same time maintaining mathematical rigor. The
coverage of the theory is significantly extended in the exercises, whose
detailed solutions will eventually be posted on the internet. I have included
as many insightful illustrations as I could come up with, and I have tried to
use geometric visualization as a principal tool for maintaining the students’
interest in mathematical proofs.
An important part of my approach has been to maintain a close link
between the theoretical treatment of convexity concepts and their
application to optimization. For example, in Chapter 1, soon after the
development of some of the basic facts about convexity, I discuss some of their
applications to optimization and saddle point theory; soon after the
discussion of hyperplanes and cones, I discuss conical approximations and
necessary conditions for optimality; soon after the discussion of polyhedral
convexity, I discuss its application in linear and integer programming; and
soon after the discussion of subgradients, I discuss their use in optimality
conditions. I follow this style consistently in the remaining chapters,
although, having developed in Chapter 1 most of the needed convexity theory,
the discussion in the subsequent chapters is more heavily weighted towards
optimization.
In addition to their educational purpose, these notes aim to develop
two topics that I have recently researched with two of my students, and to
integrate them into the overall landscape of convexity, duality, and
optimization. These topics are:
(a) A new approach to Lagrange multiplier theory, based on a set of
enhanced Fritz John conditions and the notion of constraint
pseudonormality. This work, joint with my Ph.D. student Asuman Ozdaglar,
aims to generalize, unify, and streamline the theory of constraint
qualifications. It allows for an abstract set constraint (in addition
to equalities and inequalities), it highlights the essential structure of
constraints using the new notion of pseudonormality, and it develops
the connection between Lagrange multipliers and exact penalty
functions.
(b) A new approach to the computational solution of (nondifferentiable)
dual problems via incremental subgradient methods. These methods,
developed jointly with my Ph.D. student Angelia Nedić, include
some interesting randomized variants, which, according to both analysis
and experiment, perform substantially better than the standard
subgradient methods for large-scale problems that typically arise in
the context of duality.
The lecture notes may be freely reproduced and distributed for
noncommercial purposes. They represent work in progress, and your feedback
and suggestions for improvements in content and style will be most
welcome.
Dimitri P. Bertsekas
dimitrib@mit.edu
Spring 2001
1
Convex Analysis and
Optimization
Date: July 26, 2001
Contents
1.1. Linear Algebra and Analysis . . . . . . . . . . . . . . . .
1.1.1. Vectors and Matrices . . . . . . . . . . . . . . . . . .
1.1.2. Topological Properties . . . . . . . . . . . . . . . . .
1.1.3. Square Matrices . . . . . . . . . . . . . . . . . . . .
1.1.4. Derivatives . . . . . . . . . . . . . . . . . . . . . .
1.2. Convex Sets and Functions . . . . . . . . . . . . . . . . .
1.2.1. Basic Properties . . . . . . . . . . . . . . . . . . . .
1.2.2. Convex and Affine Hulls . . . . . . . . . . . . . . . . .
1.2.3. Closure, Relative Interior, and Continuity . . . . . . . . .
1.2.4. Recession Cones . . . . . . . . . . . . . . . . . . . .
1.3. Convexity and Optimization . . . . . . . . . . . . . . . . .
1.3.1. Global and Local Minima . . . . . . . . . . . . . . . .
1.3.2. The Projection Theorem . . . . . . . . . . . . . . . . .
1.3.3. Directions of Recession
and Existence of Optimal Solutions . . . . . . . . . . . .
1.3.4. Existence of Saddle Points . . . . . . . . . . . . . . . .
1.4. Hyperplanes . . . . . . . . . . . . . . . . . . . . . . . .
1.4.1. Support and Separation Theorems . . . . . . . . . . . .
1.4.2. Nonvertical Hyperplanes and the Minimax Theorem . . . . .
1.5. Conical Approximations and Constrained Optimization . . . . .
1.6. Polyhedral Convexity . . . . . . . . . . . . . . . . . . . .
1.6.1. Polyhedral Cones . . . . . . . . . . . . . . . . . . . .
1.6.2. Polyhedral Sets . . . . . . . . . . . . . . . . . . . . .
1.6.3. Extreme Points . . . . . . . . . . . . . . . . . . . . .
1.6.4. Extreme Points and Linear Programming . . . . . . . . .
1.7. Subgradients . . . . . . . . . . . . . . . . . . . . . . . .
1.7.1. Directional Derivatives . . . . . . . . . . . . . . . . . .
1.7.2. Subgradients and Subdifferentials . . . . . . . . . . . . .
1.7.3. ε-Subgradients . . . . . . . . . . . . . . . . . . . . .
1.7.4. Subgradients of Extended Real-Valued Functions . . . . . .
1.7.5. Directional Derivative of the Max Function . . . . . . . . .
1.8. Optimality Conditions . . . . . . . . . . . . . . . . . . . .
1.9. Notes and Sources . . . . . . . . . . . . . . . . . . . . .
In this chapter we provide the mathematical foundations for Lagrange
multiplier theory and duality, which are developed in the subsequent chapters.
After an introductory first section, we focus on convex analysis with an
emphasis on optimization-related topics. We assume no prior knowledge of
the subject, and we develop it in detail.
As we embark on the study of convexity, it is worth listing some of
the properties of convex sets and functions that make them so special in
optimization.
(a) Convex functions have no local minima that are not global. Thus the
difficulties associated with multiple disconnected local minima, whose
global optimality is hard to verify in practice, are avoided.
(b) Convex sets are connected and have feasible directions at any point
(assuming they consist of more than one point). By this we mean that
given any point x in a convex set X, it is possible to move from x
along some directions and stay within X for at least a nontrivial interval.
In fact a stronger property holds: given any two distinct points
x and x̄ in X, the direction x̄ − x is a feasible direction at x, and
all feasible directions can be characterized this way. For optimization
purposes, this is important because it allows a calculus-based
comparison of the cost of x with the cost of its close neighbors, and forms
the basis for some important algorithms. Furthermore, much of the
difficulty commonly associated with discrete constraint sets (arising
for example in combinatorial optimization) is not encountered under
convexity.
(c) Convex sets have a nonempty relative interior. In other words, when
viewed within the smallest affine set containing it, a convex set has a
nonempty interior. Thus convex sets avoid the analytical and
computational optimization difficulties associated with “thin” and “curved”
constraint surfaces.
(d) A nonconvex function can be “convexified” while maintaining the
optimality of its global minima, by forming the convex hull of the epigraph
of the function.
(e) The existence of a global minimum of a convex function over a convex
set is conveniently characterized in terms of directions of recession
(see Section 1.3).
(f) Polyhedral convex sets (those specified by linear equality and
inequality constraints) are characterized in terms of a finite set of extreme
points and extreme directions. This is the basis for finitely terminating
methods for linear programming, including the celebrated simplex
method (see Section 1.6).
(g) Convex functions are continuous and have nice differentiability
properties. In particular, a real-valued convex function is directionally
differentiable at any point. Furthermore, while a convex function
need not be differentiable, it possesses subgradients, which are nice
and geometrically intuitive substitutes for a gradient (see Section 1.7).
Just like gradients, subgradients figure prominently in optimality
conditions and computational algorithms.
(h) Convex functions are central in duality theory. Indeed, the dual
problem of a given optimization problem (discussed in Chapters 3 and 4)
consists of minimization of a convex function over a convex set, even
if the original problem is not convex.
(i) Closed convex cones are self-dual with respect to polarity. In words,
if X∗ denotes the set of vectors that form a nonpositive inner product
with all vectors in a set X, we have C = (C∗)∗ for any closed
and convex cone C. This simple and geometrically intuitive property
(discussed in Section 1.5) underlies important aspects of Lagrange
multiplier theory.
(j) Convex, lower semicontinuous functions are self-dual with respect to
conjugacy. It will be seen in Chapter 4 that a certain geometrically
motivated conjugacy operation on a given convex, lower semicontinuous
function generates a convex, lower semicontinuous function, and
when applied for a second time regenerates the original function. The
conjugacy operation is central in duality theory, and has a nice
interpretation that can be used to visualize and understand some of the
most profound aspects of optimization.
Our approach in this chapter is to maintain a close link between the
theoretical treatment of convexity concepts and their application to
optimization. For example, soon after the development of some of the basic
facts about convexity in Section 1.2, we discuss some of their applications
to optimization in Section 1.3; and soon after the discussion of hyperplanes
and cones in Sections 1.4 and 1.5, we discuss conditions for optimality.
We follow this approach throughout the book, although, given the extensive
development of convexity theory in Chapter 1, the discussion in the
subsequent chapters is more heavily weighted towards optimization.
1.1 LINEAR ALGEBRA AND ANALYSIS
In this section, we list some basic deﬁnitions, notational conventions, and
results from linear algebra and analysis. We assume that the reader is
familiar with this material, so no proofs are given. For related and
additional material, we recommend the books by Hoffman and Kunze [HoK71],
Lancaster and Tismenetsky [LaT85], and Strang [Str76] (linear algebra),
and the books by Ash [Ash72], Ortega and Rheinboldt [OrR70], and Rudin
[Rud76] (analysis).
Notation
If X is a set and x is an element of X, we write x ∈ X. A set can be
specified in the form X = {x | x satisfies P}, as the set of all elements
satisfying property P. The union of two sets X1 and X2 is denoted by
X1 ∪ X2 and their intersection by X1 ∩ X2. The symbols ∃ and ∀ have
the meanings “there exists” and “for all,” respectively. The empty set is
denoted by Ø.
The set of real numbers (also referred to as scalars) is denoted by ℜ.
The set ℜ augmented with +∞ and −∞ is called the set of extended real
numbers. We denote by [a, b] the set of (possibly extended) real numbers x
satisfying a ≤ x ≤ b. A rounded, instead of square, bracket denotes strict
inequality in the definition. Thus (a, b], [a, b), and (a, b) denote the set of
all x satisfying a < x ≤ b, a ≤ x < b, and a < x < b, respectively. When
working with extended real numbers, we use the natural extensions of the
rules of arithmetic: x · 0 = 0 for every extended real number x, x · ∞ = ∞
if x > 0, x · ∞ = −∞ if x < 0, and x + ∞ = ∞ and x − ∞ = −∞ for
every scalar x. The expression ∞ − ∞ is meaningless and is never allowed
to occur.
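These conventions can be compared with IEEE floating-point arithmetic. A small sketch (the helper name `ext_mul` is ours, not from the text): Python, following IEEE 754, agrees with the rules above for sums with a scalar, but evaluates 0 * inf as nan, whereas the convention here defines x · 0 = 0 even for x = ±∞.

```python
import math

def ext_mul(x, y):
    """Multiply extended reals by the convention of the text:
    x * 0 = 0 for every extended real x (unlike IEEE, where 0*inf is nan)."""
    if x == 0 or y == 0:
        return 0.0
    return x * y

# IEEE semantics agree with the text for sums with a scalar...
assert 5.0 + math.inf == math.inf
assert 5.0 - math.inf == -math.inf
# ...but differ on 0 * inf, which IEEE leaves undefined (nan):
assert math.isnan(0.0 * math.inf)
assert ext_mul(0.0, math.inf) == 0.0
assert ext_mul(-3.0, math.inf) == -math.inf
```

Note that IEEE also evaluates inf - inf as nan, consistent with the text's rule that ∞ − ∞ is never allowed to occur.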
If f is a function, we use the notation f : X → Y to indicate the fact
that f is defined on a set X (its domain) and takes values in a set Y (its
range). If f : X → Y is a function, and U and V are subsets of X and Y,
respectively, the set {f(x) | x ∈ U} is called the image or forward image
of U, and the set {x ∈ X | f(x) ∈ V} is called the inverse image of V.
1.1.1 Vectors and Matrices

We denote by ℜ^n the set of n-dimensional real vectors. For any x ∈ ℜ^n,
we use xi to indicate its ith coordinate, also called its ith component.
Vectors in ℜ^n will be viewed as column vectors, unless the contrary
is explicitly stated. For any x ∈ ℜ^n, x′ denotes the transpose of x, which is
an n-dimensional row vector. The inner product of two vectors x, y ∈ ℜ^n is
defined by x′y = x1y1 + · · · + xnyn. Any two vectors x, y ∈ ℜ^n satisfying x′y = 0
are called orthogonal.
If x is a vector in ℜ^n, the notations x > 0 and x ≥ 0 indicate that all
coordinates of x are positive and nonnegative, respectively. For any two
vectors x and y, the notation x > y means that x − y > 0. The notations
x ≥ y, x < y, etc., are to be interpreted accordingly.
If X is a set and λ is a scalar, we denote by λX the set {λx | x ∈ X}.
If X1 and X2 are two subsets of ℜ^n, we denote by X1 + X2 the vector sum
{x1 + x2 | x1 ∈ X1, x2 ∈ X2}.
We use a similar notation for the sum of any finite number of subsets. In
the case where one of the subsets consists of a single vector x, we simplify
this notation as follows:
x + X = {x + x̄ | x̄ ∈ X}.
Given sets Xi ⊂ ℜ^{ni}, i = 1, . . . , m, the Cartesian product of the Xi,
denoted by X1 × · · · × Xm, is the subset
{(x1, . . . , xm) | xi ∈ Xi, i = 1, . . . , m}
of ℜ^{n1+···+nm}.
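For finite sets, these constructions can be written out directly. A small illustration in Python (the finite point sets here are our own toy examples, standing in for subsets of ℜ^2):

```python
from itertools import product

X1 = {(0, 0), (1, 0)}
X2 = {(0, 1), (0, 2)}

# vector sum X1 + X2 = {x1 + x2 | x1 in X1, x2 in X2}
vector_sum = {(a[0] + b[0], a[1] + b[1]) for a in X1 for b in X2}
assert vector_sum == {(0, 1), (0, 2), (1, 1), (1, 2)}

# scalar multiple 2*X1 = {2x | x in X1}
assert {(2 * a[0], 2 * a[1]) for a in X1} == {(0, 0), (2, 0)}

# Cartesian product X1 x X2: pairs (x1, x2); flattening the tuples
# gives a subset of R^4, matching the n1 + n2 dimension count
cart = {a + b for a, b in product(X1, X2)}  # tuple concatenation flattens
assert (1, 0, 0, 2) in cart
assert len(cart) == 4
```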
Subspaces and Linear Independence

A subset S of ℜ^n is called a subspace if ax + by ∈ S for every x, y ∈ S and
every a, b ∈ ℜ. An affine set in ℜ^n is a translated subspace, i.e., a set of
the form x + S = {x + x̄ | x̄ ∈ S}, where x is a vector in ℜ^n and S is a
subspace of ℜ^n. The span of a finite collection {x1, . . . , xm} of elements
of ℜ^n (also called the subspace generated by the collection) is the subspace
consisting of all vectors y of the form y = a1x1 + · · · + amxm, where each ak is a
scalar.
The vectors x1, . . . , xm ∈ ℜ^n are called linearly independent if there
exists no set of scalars a1, . . . , am such that a1x1 + · · · + amxm = 0, unless ak = 0
for each k. An equivalent definition is that x1 ≠ 0, and for every k > 1,
the vector xk does not belong to the span of x1, . . . , xk−1.
If S is a subspace of ℜ^n containing at least one nonzero vector, a basis
for S is a collection of vectors that are linearly independent and whose
span is equal to S. Every basis of a given subspace has the same number
of vectors. This number is called the dimension of S. By convention, the
subspace {0} is said to have dimension zero. The dimension of an affine set
x + S is the dimension of the corresponding subspace S. Every subspace
of nonzero dimension has an orthogonal basis, i.e., a basis consisting of
mutually orthogonal vectors.
Given any set X, the set of vectors that are orthogonal to all elements
of X is a subspace denoted by X⊥:
X⊥ = {y | y′x = 0, ∀ x ∈ X}.
If S is a subspace, S⊥ is called the orthogonal complement of S. It can
be shown that (S⊥)⊥ = S (see the Polar Cone Theorem in Section 1.5).
Furthermore, any vector x can be uniquely decomposed as the sum of a
vector from S and a vector from S⊥ (see the Projection Theorem in Section
1.3.2).
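When an orthogonal basis of S is available, this decomposition can be computed explicitly. A sketch in Python (the helper names `dot` and `project` are ours; the basis vectors are assumed mutually orthogonal and nonzero, as guaranteed above):

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(x, basis):
    """Project x onto span(basis), assuming the basis vectors are
    mutually orthogonal and nonzero."""
    p = [0.0] * len(x)
    for b in basis:
        c = dot(x, b) / dot(b, b)          # coefficient along b
        p = [pi + c * bi for pi, bi in zip(p, b)]
    return p

# S = span{(1,0,0), (0,1,0)} in R^3, so S-perp = span{(0,0,1)}
basis_S = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
x = [2.0, -3.0, 5.0]
xS = project(x, basis_S)                   # component in S
x_perp = [xi - si for xi, si in zip(x, xS)]  # component in S-perp

assert xS == [2.0, -3.0, 0.0]
assert x_perp == [0.0, 0.0, 5.0]
assert abs(dot(xS, x_perp)) < 1e-12        # the two pieces are orthogonal
```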
Matrices

For any matrix A, we use Aij, [A]ij, or aij to denote its ijth element. The
transpose of A, denoted by A′, is defined by [A′]ij = aji. For any two
matrices A and B of compatible dimensions, the transpose of the product
matrix AB satisfies (AB)′ = B′A′.
If X is a subset of ℜ^n and A is an m × n matrix, then the image of
X under A is denoted by AX (or A · X if this enhances notational clarity):
AX = {Ax | x ∈ X}.
If X is a subspace, then AX is also a subspace.
Let A be a square matrix. We say that A is symmetric if A′ = A. We
say that A is diagonal if [A]ij = 0 whenever i ≠ j. We use I to denote the
identity matrix. The determinant of A is denoted by det(A).
Let A be an m × n matrix. The range space of A, denoted by R(A),
is the set of all vectors y ∈ ℜ^m such that y = Ax for some x ∈ ℜ^n. The
null space of A, denoted by N(A), is the set of all vectors x ∈ ℜ^n such
that Ax = 0. It is seen that the range space and the null space of A are
subspaces. The rank of A is the dimension of the range space of A. The
rank of A is equal to the maximal number of linearly independent columns
of A, and is also equal to the maximal number of linearly independent rows
of A. The matrix A and its transpose A′ have the same rank. We say that
A has full rank, if its rank is equal to min{m, n}. This is true if and only
if either all the rows of A are linearly independent, or all the columns of A
are linearly independent.
The range of an m × n matrix A and the orthogonal complement of
the nullspace of its transpose are equal, i.e.,
R(A) = N(A′)⊥.
Another way to state this result is that given vectors a1, . . . , an ∈ ℜ^m (the
columns of A) and a vector x ∈ ℜ^m, we have x′y = 0 for all y such that
ai′y = 0 for all i if and only if x = λ1a1 + · · · + λnan for some scalars
λ1, . . . , λn. This is a special case of Farkas’ Lemma, an important result
for constrained optimization, which will be discussed later in Section 1.6.
A useful application of this result is that if S1 and S2 are two subspaces of
ℜ^n, then
S1⊥ + S2⊥ = (S1 ∩ S2)⊥.
This follows by introducing matrices B1 and B2 such that S1 = {x | B1x =
0} = N(B1) and S2 = {x | B2x = 0} = N(B2), and writing
S1⊥ + S2⊥ = R([B1′ B2′]) = N([B1; B2])⊥ = (N(B1) ∩ N(B2))⊥ = (S1 ∩ S2)⊥,
where [B1′ B2′] denotes the matrix with blocks B1′ and B2′ placed side by side,
and [B1; B2] the matrix with B1 stacked on top of B2.
A function f : ℜ^n → ℜ is said to be affine if it has the form f(x) =
a′x + b for some a ∈ ℜ^n and b ∈ ℜ. Similarly, a function f : ℜ^n → ℜ^m is
said to be affine if it has the form f(x) = Ax + b for some m × n matrix
A and some b ∈ ℜ^m. If b = 0, f is said to be a linear function or linear
transformation.
1.1.2 Topological Properties

Definition 1.1.1: A norm ‖·‖ on ℜ^n is a function that assigns a
scalar ‖x‖ to every x ∈ ℜ^n and that has the following properties:
(a) ‖x‖ ≥ 0 for all x ∈ ℜ^n.
(b) ‖αx‖ = |α| ‖x‖ for every scalar α and every x ∈ ℜ^n.
(c) ‖x‖ = 0 if and only if x = 0.
(d) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ ℜ^n (this is referred to as the
triangle inequality).

The Euclidean norm of a vector x = (x1, . . . , xn) is defined by
‖x‖ = (x′x)^{1/2} = (|x1|² + · · · + |xn|²)^{1/2}.
The space ℜ^n, equipped with this norm, is called a Euclidean space. We
will use the Euclidean norm almost exclusively in this book. In particular,
in the absence of a clear indication to the contrary, ‖·‖ will denote the
Euclidean norm. Two important results for the Euclidean norm are:

Proposition 1.1.1: (Pythagorean Theorem) For any two vectors
x and y that are orthogonal, we have
‖x + y‖² = ‖x‖² + ‖y‖².

Proposition 1.1.2: (Schwartz inequality) For any two vectors x
and y, we have
|x′y| ≤ ‖x‖ ‖y‖,
with equality holding if and only if x = αy for some scalar α.
Two other important norms are the maximum norm ‖·‖∞ (also called
sup-norm or ∞-norm), defined by
‖x‖∞ = max_i |xi|,
and the 1-norm ‖·‖1, defined by
‖x‖1 = |x1| + · · · + |xn|.
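These norms are easy to compare numerically. The sketch below (the function names are ours) also spot-checks the triangle inequality and the Schwartz inequality on one pair of vectors:

```python
import math

def norm2(x):      # Euclidean norm
    return math.sqrt(sum(xi * xi for xi in x))

def norm_inf(x):   # maximum norm
    return max(abs(xi) for xi in x)

def norm1(x):      # 1-norm
    return sum(abs(xi) for xi in x)

x = [3.0, -4.0]
y = [1.0, 2.0]

assert norm2(x) == 5.0
assert norm_inf(x) == 4.0
assert norm1(x) == 7.0

# the standard ordering ||x||_inf <= ||x|| <= ||x||_1
assert norm_inf(x) <= norm2(x) <= norm1(x)

# triangle inequality and Schwartz inequality for the Euclidean norm
assert norm2([a + b for a, b in zip(x, y)]) <= norm2(x) + norm2(y)
inner = sum(a * b for a, b in zip(x, y))
assert abs(inner) <= norm2(x) * norm2(y)
```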
Sequences
We use both subscripts and superscripts in sequence notation. Generally,
we prefer subscripts, but we use superscripts whenever we need to reserve
the subscript notation for indexing coordinates or components of vectors
and functions. The meaning of the subscripts and superscripts should be
clear from the context in which they are used.
A sequence {xk | k = 1, 2, . . .} (or {xk} for short) of scalars is said
to converge if there exists a scalar x such that for every ε > 0 we have
|xk − x| < ε for every k greater than some integer K (depending on ε). We
call the scalar x the limit of {xk}, and we also say that {xk} converges to
x; symbolically, xk → x or lim_{k→∞} xk = x. If for every scalar b there exists
some K (depending on b) such that xk ≥ b for all k ≥ K, we write xk → ∞
and lim_{k→∞} xk = ∞. Similarly, if for every scalar b there exists some K
such that xk ≤ b for all k ≥ K, we write xk → −∞ and lim_{k→∞} xk = −∞.
A sequence {xk} is called a Cauchy sequence if for every ε > 0, there
exists some K (depending on ε) such that |xk − xm| < ε for all k ≥ K and
m ≥ K.
A sequence {xk} is said to be bounded above (respectively, below) if
there exists some scalar b such that xk ≤ b (respectively, xk ≥ b) for all
k. It is said to be bounded if it is bounded above and bounded below.
The sequence {xk} is said to be monotonically nonincreasing (respectively,
nondecreasing) if xk+1 ≤ xk (respectively, xk+1 ≥ xk) for all k. If {xk}
converges to x and is nonincreasing (nondecreasing), we also use the notation
xk ↓ x (xk ↑ x, respectively).
Proposition 1.1.3: Every bounded and monotonically nonincreasing
or nondecreasing scalar sequence converges.
Note that a monotonically nondecreasing sequence {xk} is either
bounded, in which case it converges to some scalar x by the above
proposition, or else it is unbounded, in which case xk → ∞. Similarly, a
monotonically nonincreasing sequence {xk} is either bounded and converges, or
it is unbounded, in which case xk → −∞.
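As a quick numerical illustration of Prop. 1.1.3 (the names below are ours): the sequence xk = 1 − 1/k is monotonically nondecreasing and bounded above by 1, and its terms approach that bound, which is its limit.

```python
# xk = 1 - 1/k for k = 1, ..., 2000
xs = [1.0 - 1.0 / k for k in range(1, 2001)]

# monotonically nondecreasing and bounded above by 1
assert all(a <= b for a, b in zip(xs, xs[1:]))
assert all(x <= 1.0 for x in xs)

# the tail is within a small tolerance of the limit 1
assert abs(xs[-1] - 1.0) < 1e-3
```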
The supremum of a nonempty set X of scalars, denoted by sup X, is
defined as the smallest scalar x such that x ≥ y for all y ∈ X. If no such
scalar exists, we say that the supremum of X is ∞. Similarly, the infimum
of X, denoted by inf X, is defined as the largest scalar x such that x ≤ y
for all y ∈ X, and is equal to −∞ if no such scalar exists. For the empty
set, we use the convention
sup(Ø) = −∞, inf(Ø) = ∞.
(This is somewhat paradoxical, since we have that the sup of a set is less
than its inf, but it works well for our analysis.) If sup X is equal to a scalar
x that belongs to the set X, we say that x is the maximum point of X and
we often write
x = sup X = max X.
Similarly, if inf X is equal to a scalar x that belongs to the set X, we often
write
x = inf X = min X.
Thus, when we write max X (or min X) in place of sup X (or inf X,
respectively), we do so just for emphasis: we indicate that it is either evident,
or it is known through earlier analysis, or it is about to be shown, that the
maximum (or minimum, respectively) of the set X is attained at one of its
points.
Given a scalar sequence {xk}, the supremum of the sequence, denoted
by sup_k xk, is defined as sup{xk | k = 1, 2, . . .}. The infimum of a sequence
is similarly defined. Given a sequence {xk}, let ym = sup{xk | k ≥ m} and
zm = inf{xk | k ≥ m}. The sequences {ym} and {zm} are nonincreasing
and nondecreasing, respectively, and therefore have a limit whenever {xk}
is bounded above or is bounded below, respectively (Prop. 1.1.3). The
limit of ym is denoted by limsup_{k→∞} xk, and is referred to as the limit
superior of {xk}. The limit of zm is denoted by liminf_{k→∞} xk, and is
referred to as the limit inferior of {xk}. If {xk} is unbounded above,
we write limsup_{k→∞} xk = ∞, and if it is unbounded below, we write
liminf_{k→∞} xk = −∞.
Proposition 1.1.4: Let {xk} and {yk} be scalar sequences.
(a) There holds
inf_k xk ≤ liminf_{k→∞} xk ≤ limsup_{k→∞} xk ≤ sup_k xk.
(b) {xk} converges if and only if liminf_{k→∞} xk = limsup_{k→∞} xk
and, in that case, both of these quantities are equal to the limit
of xk.
(c) If xk ≤ yk for all k, then
liminf_{k→∞} xk ≤ liminf_{k→∞} yk, limsup_{k→∞} xk ≤ limsup_{k→∞} yk.
(d) We have
liminf_{k→∞} xk + liminf_{k→∞} yk ≤ liminf_{k→∞} (xk + yk),
limsup_{k→∞} xk + limsup_{k→∞} yk ≥ limsup_{k→∞} (xk + yk).
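The inequalities in (d) can be strict. For xk = (−1)^k and yk = (−1)^{k+1} we have xk + yk = 0 for all k, so liminf (xk + yk) = 0, while liminf xk + liminf yk = −2. A finite-horizon sketch in Python (tail sup/inf over a long prefix stands in for the limit, which is exact here because the sequences are periodic; the helper names are ours):

```python
def tail_sup(seq, m):
    return max(seq[m:])

def tail_inf(seq, m):
    return min(seq[m:])

N = 1000
x = [(-1) ** k for k in range(1, N + 1)]        # -1, 1, -1, 1, ...
y = [(-1) ** (k + 1) for k in range(1, N + 1)]  # 1, -1, 1, -1, ...
s = [a + b for a, b in zip(x, y)]               # identically zero

m = N // 2
# tail inf/sup approximate liminf/limsup (exact for periodic sequences)
assert tail_inf(x, m) == -1 and tail_sup(x, m) == 1
assert tail_inf(y, m) == -1
assert tail_inf(s, m) == 0 == tail_sup(s, m)

# strict form of (d): liminf x + liminf y = -2 < 0 = liminf (x + y)
assert tail_inf(x, m) + tail_inf(y, m) == -2 < tail_inf(s, m)
```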
A sequence {xk} of vectors in ℜ^n is said to converge to some x ∈ ℜ^n if
the ith coordinate of xk converges to the ith coordinate of x for every i. We
use the notations xk → x and lim_{k→∞} xk = x to indicate convergence for
vector sequences as well. The sequence {xk} is called bounded (respectively,
a Cauchy sequence) if each of its corresponding coordinate sequences is
bounded (respectively, a Cauchy sequence). It can be seen that {xk} is
bounded if and only if there exists a scalar c such that ‖xk‖ ≤ c for all k.
Definition 1.1.2: We say that a vector x ∈ ℜ^n is a limit point of a
sequence {xk} in ℜ^n if there exists a subsequence of {xk} that converges
to x.

Proposition 1.1.5: Let {xk} be a sequence in ℜ^n.
(a) If {xk} is bounded, it has at least one limit point.
(b) {xk} converges if and only if it is bounded and it has a unique
limit point.
(c) {xk} converges if and only if it is a Cauchy sequence.
o(·) Notation

If p is a positive integer and h : ℜ^n → ℜ^m, then we write
h(x) = o(‖x‖^p)
if
lim_{k→∞} h(xk)/‖xk‖^p = 0,
for all sequences {xk}, with xk ≠ 0 for all k, that converge to 0.
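For instance, h(x) = ‖x‖² is o(‖x‖), since along any sequence xk → 0 the ratio h(xk)/‖xk‖ = ‖xk‖ tends to 0. A numerical check along xk = (1/k, 0) (the names are ours):

```python
import math

def h(x):  # h(x) = ||x||^2, which should be o(||x||)
    return sum(xi * xi for xi in x)

ratios = []
for k in range(1, 101):
    xk = (1.0 / k, 0.0)
    nxk = math.sqrt(sum(v * v for v in xk))  # ||x_k||
    ratios.append(h(xk) / nxk)

# the ratio h(x_k)/||x_k|| shrinks toward 0 as x_k -> 0
assert all(a > b for a, b in zip(ratios, ratios[1:]))
assert ratios[-1] < 0.011
```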
Closed and Open Sets

We say that x is a closure point or limit point of a set X ⊂ ℜ^n if there
exists a sequence {xk}, consisting of elements of X, that converges to x.
The closure of X, denoted cl(X), is the set of all limit points of X.

Definition 1.1.3: A set X ⊂ ℜ^n is called closed if it is equal to its
closure. It is called open if its complement (the set {x | x ∉ X}) is
closed. It is called bounded if there exists a scalar c such that the
magnitude of any coordinate of any element of X is less than c. It is
called compact if it is closed and bounded.

Definition 1.1.4: A neighborhood of a vector x is an open set
containing x. We say that x is an interior point of a set X ⊂ ℜ^n if there
exists a neighborhood of x that is contained in X. A vector x ∈ cl(X)
which is not an interior point of X is said to be a boundary point of
X.

Let ‖·‖ be a given norm in ℜ^n. For any ε > 0 and x∗ ∈ ℜ^n, consider
the sets
{x | ‖x − x∗‖ < ε},   {x | ‖x − x∗‖ ≤ ε}.
The first set is open and is called an open sphere centered at x∗, while the
second set is closed and is called a closed sphere centered at x∗. Sometimes
the terms open ball and closed ball are used, respectively.
Proposition 1.1.6:
(a) The union of ﬁnitely many closed sets is closed.
(b) The intersection of closed sets is closed.
(c) The union of open sets is open.
(d) The intersection of ﬁnitely many open sets is open.
(e) A set is open if and only if all of its elements are interior points.
(f) Every subspace of ℜ^n is closed.
(g) A set X is compact if and only if every sequence of elements of
X has a subsequence that converges to an element of X.
(h) If {X_k} is a sequence of nonempty and compact sets such that X_k ⊃ X_{k+1} for all k, then the intersection ∩_{k=0}^∞ X_k is nonempty and compact.
The topological properties of subsets of ℜ^n, such as being open, closed, or compact, do not depend on the norm being used. This is a consequence of the following proposition, referred to as the norm equivalence property in ℜ^n, which shows that if a sequence converges with respect to one norm, it converges with respect to all other norms.
Proposition 1.1.7: For any two norms ‖·‖ and ‖·‖′ on ℜ^n, there exists a scalar c such that ‖x‖ ≤ c‖x‖′ for all x ∈ ℜ^n.
Using the preceding proposition, we obtain the following.
Proposition 1.1.8: If a subset of ℜ^n is open (respectively, closed, bounded, or compact) with respect to some norm, it is open (respectively, closed, bounded, or compact) with respect to all other norms.
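For the common ℓ1, ℓ2, and ℓ∞ norms, explicit equivalence constants are in fact known: ‖x‖∞ ≤ ‖x‖2 ≤ ‖x‖1 ≤ n‖x‖∞. A minimal sketch checking these particular constants on random vectors (our own illustration of Prop. 1.1.7):

```python
# Sketch: explicit norm-equivalence constants on R^n, checked on
# random vectors: ||x||_inf <= ||x||_2 <= ||x||_1 <= n * ||x||_inf.
import random

def norm1(x):   return sum(abs(v) for v in x)
def norm2(x):   return sum(v * v for v in x) ** 0.5
def norminf(x): return max(abs(v) for v in x)

random.seed(0)
n = 5
for _ in range(100):
    x = [random.uniform(-10, 10) for _ in range(n)]
    assert norminf(x) <= norm2(x) + 1e-12
    assert norm2(x) <= norm1(x) + 1e-12
    assert norm1(x) <= n * norminf(x) + 1e-12
```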
Sequences of Sets
Let {X_k} be a sequence of nonempty subsets of ℜ^n. The outer limit of {X_k}, denoted limsup_{k→∞} X_k, is the set of all x ∈ ℜ^n such that every neighborhood of x has a nonempty intersection with infinitely many of the sets X_k, k = 1, 2, .... Equivalently, limsup_{k→∞} X_k is the set of all limits of subsequences {x_k}_K such that x_k ∈ X_k, for all k ∈ K.
The inner limit of {X_k}, denoted liminf_{k→∞} X_k, is the set of all x ∈ ℜ^n such that every neighborhood of x has a nonempty intersection with all except finitely many of the sets X_k, k = 1, 2, .... Equivalently, liminf_{k→∞} X_k is the set of all limits of sequences {x_k} such that x_k ∈ X_k, for all k = 1, 2, ....
The sequence {X_k} is said to converge to a set X if

X = liminf_{k→∞} X_k = limsup_{k→∞} X_k,

in which case X is said to be the limit of {X_k}. The inner and outer limits are closed (possibly empty) sets. If each set X_k consists of a single point x_k, limsup_{k→∞} X_k is the set of limit points of {x_k}, while liminf_{k→∞} X_k is just the limit of {x_k} if {x_k} converges, and otherwise it is empty.
Continuity
Let f : X → ℜ^m be a function, where X is a subset of ℜ^n, and let x be a point in X. If there exists a vector y ∈ ℜ^m such that the sequence {f(x_k)} converges to y for every sequence {x_k} ⊂ X such that lim_{k→∞} x_k = x, we write lim_{z→x} f(z) = y. If there exists a vector y ∈ ℜ^m such that the sequence {f(x_k)} converges to y for every sequence {x_k} ⊂ X such that lim_{k→∞} x_k = x and x_k ≤ x (respectively, x_k ≥ x) for all k, we write lim_{z↑x} f(z) = y [respectively, lim_{z↓x} f(z) = y].
Definition 1.1.5: Let X be a subset of ℜ^n.

(a) A function f : X → ℜ^m is called continuous at a point x ∈ X if lim_{z→x} f(z) = f(x).

(b) A function f : X → ℜ^m is called right-continuous (respectively, left-continuous) at a point x ∈ X if lim_{z↓x} f(z) = f(x) [respectively, lim_{z↑x} f(z) = f(x)].

(c) A real-valued function f : X → ℜ is called upper semicontinuous (respectively, lower semicontinuous) at a point x ∈ X if f(x) ≥ limsup_{k→∞} f(x_k) [respectively, f(x) ≤ liminf_{k→∞} f(x_k)] for every sequence {x_k} of elements of X converging to x.
If f : X → ℜ^m is continuous at every point of a subset of its domain X, we say that f is continuous over that subset. If f : X → ℜ^m is continuous at every point of its domain X, we say that f is continuous. We use similar terminology for right-continuous, left-continuous, upper semicontinuous, and lower semicontinuous functions.
Proposition 1.1.9:

(a) The composition of two continuous functions is continuous.

(b) Any vector norm on ℜ^n is a continuous function.

(c) Let f : ℜ^n → ℜ^m be continuous, and let Y ⊂ ℜ^m be open (respectively, closed). Then the inverse image of Y, {x ∈ ℜ^n | f(x) ∈ Y}, is open (respectively, closed).

(d) Let f : ℜ^n → ℜ^m be continuous, and let X ⊂ ℜ^n be compact. Then the forward image of X, {f(x) | x ∈ X}, is compact.
Matrix Norms
A norm   on the set of nn matrices is a realvalued function that has
the same properties as vector norms do when the matrix is viewed as an
element of '
n
2
. The norm of an n n matrix A is denoted by A.
We are mainly interested in induced norms, which are constructed as
follows. Given any vector norm  , the corresponding induced matrix
norm, also denoted by  , is deﬁned by
A = sup
_
x∈
n
x=1
_
Ax.
It is easily veriﬁed that for any vector norm, the above equation deﬁnes a
bona ﬁde matrix norm having all the required properties.
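For the ℓ1 vector norm, the induced matrix norm is known in closed form as the maximum absolute column sum, ‖A‖₁ = max_j Σ_i |a_ij|. A minimal sketch (the matrix and sample count are our own) verifying that no sampled unit vector beats this supremum:

```python
# Sketch: the l1-induced matrix norm equals the maximum absolute
# column sum; sampled unit vectors never exceed it.
import random

A = [[1.0, -2.0],
     [3.0,  0.5]]

col_sums = [sum(abs(A[i][j]) for i in range(2)) for j in range(2)]
induced_1 = max(col_sums)          # = max(4.0, 2.5) = 4.0

random.seed(1)
best = 0.0
for _ in range(2000):
    x = [random.uniform(-1, 1) for _ in range(2)]
    s = abs(x[0]) + abs(x[1])      # l1 norm of x
    x = [v / s for v in x]         # normalize so ||x||_1 = 1
    Ax = [A[i][0] * x[0] + A[i][1] * x[1] for i in range(2)]
    best = max(best, abs(Ax[0]) + abs(Ax[1]))

assert best <= induced_1 + 1e-12
```

The supremum is attained at the unit coordinate vector selecting the heaviest column, which is why the sampled values approach, but never exceed, 4.0.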
Note that by the Schwarz inequality (Prop. 1.1.2), we have

‖A‖ = sup_{‖x‖=1} ‖Ax‖ = sup_{‖y‖=‖x‖=1} |y′Ax|.

By reversing the roles of x and y in the above relation and by using the equality y′Ax = x′A′y, it follows that ‖A‖ = ‖A′‖.
1.1.3 Square Matrices
Definition 1.1.6: A square matrix A is called singular if its determinant is zero. Otherwise it is called nonsingular or invertible.
Proposition 1.1.10:

(a) Let A be an n × n matrix. The following are equivalent:

(i) The matrix A is nonsingular.

(ii) The matrix A′ is nonsingular.

(iii) For every nonzero x ∈ ℜ^n, we have Ax ≠ 0.

(iv) For every y ∈ ℜ^n, there is a unique x ∈ ℜ^n such that Ax = y.

(v) There is an n × n matrix B such that AB = I = BA.

(vi) The columns of A are linearly independent.

(vii) The rows of A are linearly independent.

(b) Assuming that A is nonsingular, the matrix B of statement (v) (called the inverse of A and denoted by A^{−1}) is unique.

(c) For any two square invertible matrices A and B of the same dimensions, we have (AB)^{−1} = B^{−1}A^{−1}.
Definition 1.1.7: The characteristic polynomial φ of an n × n matrix A is defined by φ(λ) = det(λI − A), where I is the identity matrix of the same size as A. The n (possibly repeated or complex) roots of φ are called the eigenvalues of A. A nonzero vector x (with possibly complex coordinates) such that Ax = λx, where λ is an eigenvalue of A, is called an eigenvector of A associated with λ.
Proposition 1.1.11: Let A be a square matrix.
(a) A complex number λ is an eigenvalue of A if and only if there
exists a nonzero eigenvector associated with λ.
(b) A is singular if and only if it has an eigenvalue that is equal to
zero.
Note that the only use of complex numbers in this book is in relation
to eigenvalues and eigenvectors. All other matrices or vectors are implicitly
assumed to have real components.
Proposition 1.1.12: Let A be an n × n matrix.

(a) If T is a nonsingular matrix and B = TAT^{−1}, then the eigenvalues of A and B coincide.

(b) For any scalar c, the eigenvalues of cI + A are equal to c + λ_1, ..., c + λ_n, where λ_1, ..., λ_n are the eigenvalues of A.

(c) The eigenvalues of A^k are equal to λ_1^k, ..., λ_n^k, where λ_1, ..., λ_n are the eigenvalues of A.

(d) If A is nonsingular, then the eigenvalues of A^{−1} are the reciprocals of the eigenvalues of A.

(e) The eigenvalues of A and A′ coincide.
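For a 2 × 2 matrix the eigenvalues can be read off the characteristic polynomial λ² − tr(A)λ + det(A) = 0 directly, which makes part (b) easy to check numerically (the matrix and shift below are our own illustration; the helper assumes real eigenvalues):

```python
# Sketch: 2x2 eigenvalues via the characteristic polynomial, used to
# check Prop. 1.1.12(b): the eigenvalues of cI + A are c + lambda_i.
import math

def eig2(a, b, c, d):
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4.0 * det)   # assumes real eigenvalues
    return sorted(((tr - disc) / 2.0, (tr + disc) / 2.0))

A = (2.0, 1.0, 1.0, 2.0)          # symmetric, eigenvalues 1 and 3
lam = eig2(*A)
assert lam == [1.0, 3.0]

c = 5.0
shifted = (A[0] + c, A[1], A[2], A[3] + c)   # cI + A
lam_shifted = eig2(*shifted)
assert all(abs(ls - (c + l)) < 1e-12 for ls, l in zip(lam_shifted, lam))
```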
Symmetric and Positive Definite Matrices

Symmetric matrices have several special properties, particularly regarding their eigenvalues and eigenvectors. In what follows in this section, ‖·‖ denotes the Euclidean norm.
Proposition 1.1.13: Let A be a symmetric n × n matrix. Then:

(a) The eigenvalues of A are real.

(b) The matrix A has a set of n mutually orthogonal, real, and nonzero eigenvectors x_1, ..., x_n.

(c) Suppose that the eigenvectors in part (b) have been normalized so that ‖x_i‖ = 1 for each i. Then

A = Σ_{i=1}^n λ_i x_i x_i′,

where λ_i is the eigenvalue corresponding to x_i.
Proposition 1.1.14: Let A be a symmetric n × n matrix, and let λ_1 ≤ ··· ≤ λ_n be its (real) eigenvalues. Then:

(a) ‖A‖ = max{|λ_1|, |λ_n|}, where ‖·‖ is the matrix norm induced by the Euclidean norm.

(b) λ_1‖y‖² ≤ y′Ay ≤ λ_n‖y‖² for all y ∈ ℜ^n.

(c) If A is nonsingular, then

‖A^{−1}‖ = 1 / min{|λ_1|, |λ_n|}.
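The two-sided bound of part (b) is easy to exercise numerically; the sketch below (our own example matrix, with eigenvalues 1 and 3) checks λ_1‖y‖² ≤ y′Ay ≤ λ_n‖y‖² on random vectors:

```python
# Sketch: Rayleigh-quotient bounds of Prop. 1.1.14(b) for a symmetric
# 2x2 matrix with eigenvalues 1 and 3, checked on random vectors.
import random

A = [[2.0, 1.0], [1.0, 2.0]]      # eigenvalues: 1 and 3
lam1, lamn = 1.0, 3.0

random.seed(2)
for _ in range(200):
    y = [random.uniform(-5, 5) for _ in range(2)]
    ysq = y[0] ** 2 + y[1] ** 2                     # ||y||^2
    yAy = sum(y[i] * A[i][j] * y[j] for i in range(2) for j in range(2))
    assert lam1 * ysq - 1e-9 <= yAy <= lamn * ysq + 1e-9
```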
Proposition 1.1.15: Let A be a square matrix, and let ‖·‖ be the matrix norm induced by the Euclidean norm. Then:

(a) If A is symmetric, then ‖A^k‖ = ‖A‖^k for any positive integer k.

(b) ‖A‖² = ‖A′A‖ = ‖AA′‖.
Definition 1.1.8: A symmetric n × n matrix A is called positive definite if x′Ax > 0 for all x ∈ ℜ^n, x ≠ 0. It is called positive semidefinite if x′Ax ≥ 0 for all x ∈ ℜ^n.
Throughout this book, the notion of positive definiteness applies exclusively to symmetric matrices. Thus whenever we say that a matrix is positive (semi)definite, we implicitly assume that the matrix is symmetric.
Proposition 1.1.16:

(a) The sum of two positive semidefinite matrices is positive semidefinite. If one of the two matrices is positive definite, the sum is positive definite.

(b) If A is a positive semidefinite n × n matrix and T is an m × n matrix, then the matrix TAT′ is positive semidefinite. If A is positive definite and T is invertible, then TAT′ is positive definite.
Proposition 1.1.17:

(a) For any m × n matrix A, the matrix A′A is symmetric and positive semidefinite. A′A is positive definite if and only if A has rank n. In particular, if m = n, A′A is positive definite if and only if A is nonsingular.

(b) A square symmetric matrix is positive semidefinite (respectively, positive definite) if and only if all of its eigenvalues are nonnegative (respectively, positive).

(c) The inverse of a symmetric positive definite matrix is symmetric and positive definite.
Proposition 1.1.18: Let A be a symmetric positive semidefinite n × n matrix of rank m. There exists an n × m matrix C of rank m such that

A = CC′.

Furthermore, for any such matrix C:

(a) A and C′ have the same null space: N(A) = N(C′).

(b) A and C have the same range space: R(A) = R(C).
Proposition 1.1.19: Let A be a square symmetric positive semidefinite matrix.

(a) There exists a symmetric matrix Q with the property Q² = A. Such a matrix is called a symmetric square root of A and is denoted by A^{1/2}.

(b) There is a unique symmetric square root if and only if A is positive definite.

(c) A symmetric square root A^{1/2} is invertible if and only if A is invertible. Its inverse is denoted by A^{−1/2}.

(d) There holds A^{−1/2}A^{−1/2} = A^{−1}.

(e) There holds AA^{1/2} = A^{1/2}A.
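In the 2 × 2 positive definite case there is even a closed form, which follows from the Cayley–Hamilton theorem (this formula is our own addition, not from the text): A^{1/2} = (A + √(det A) I) / √(tr A + 2√(det A)). The sketch verifies Q² = A and the commutation property of part (e):

```python
# Sketch (assumption: 2x2 symmetric positive definite A): closed-form
# symmetric square root via Cayley-Hamilton; verify Q*Q = A and that
# A and Q commute, as in Prop. 1.1.19(e).
import math

A = [[2.0, 1.0], [1.0, 2.0]]      # symmetric positive definite
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
tr = A[0][0] + A[1][1]
s = math.sqrt(det)
t = math.sqrt(tr + 2.0 * s)

Q = [[(A[i][j] + (s if i == j else 0.0)) / t for j in range(2)]
     for i in range(2)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

QQ = matmul(Q, Q)
assert all(abs(QQ[i][j] - A[i][j]) < 1e-9 for i in range(2) for j in range(2))

AQ, QA = matmul(A, Q), matmul(Q, A)
assert all(abs(AQ[i][j] - QA[i][j]) < 1e-9 for i in range(2) for j in range(2))
```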
1.1.4 Derivatives
Let f : ℜ^n → ℜ be some function, fix some x ∈ ℜ^n, and consider the expression

lim_{α→0} [f(x + αe_i) − f(x)] / α,

where e_i is the ith unit vector (all components are 0 except for the ith component, which is 1). If the above limit exists, it is called the ith partial derivative of f at the point x and it is denoted by (∂f/∂x_i)(x) or ∂f(x)/∂x_i (x_i in this section will denote the ith coordinate of the vector x). Assuming all of these partial derivatives exist, the gradient of f at x is defined as the column vector

∇f(x) = ( ∂f(x)/∂x_1 , ... , ∂f(x)/∂x_n )′.
For any y ∈ ℜ^n, we define the one-sided directional derivative of f in the direction y to be

f′(x; y) = lim_{α↓0} [f(x + αy) − f(x)] / α,

provided that the limit exists. We note from the definitions that

f′(x; e_i) = −f′(x; −e_i)  ⟹  f′(x; e_i) = (∂f/∂x_i)(x).
If the directional derivative of f at a vector x exists in all directions y and f′(x; y) is a linear function of y, we say that f is differentiable at x. This type of differentiability is also called Gateaux differentiability. It is seen that f is differentiable at x if and only if the gradient ∇f(x) exists and satisfies ∇f(x)′y = f′(x; y) for every y ∈ ℜ^n. The function f is called differentiable over a given subset U of ℜ^n if it is differentiable at every x ∈ U. The function f is called differentiable (without qualification) if it is differentiable at all x ∈ ℜ^n.
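The identity f′(x; y) = ∇f(x)′y is easy to check numerically for a smooth function (the function, point, direction, and step size below are our own illustration):

```python
# Sketch: for the smooth f(x) = x1^2 + 3*x1*x2, the one-sided
# directional derivative f'(x; y) agrees with grad f(x)' y.
def f(x):
    return x[0] ** 2 + 3.0 * x[0] * x[1]

def grad_f(x):
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]

x = [1.0, 2.0]
y = [0.5, -1.0]

alpha = 1e-6   # small step approximating the one-sided limit
dir_deriv = (f([x[0] + alpha * y[0], x[1] + alpha * y[1]]) - f(x)) / alpha
inner = grad_f(x)[0] * y[0] + grad_f(x)[1] * y[1]

assert abs(dir_deriv - inner) < 1e-4
```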
If f is differentiable over an open set U and the gradient ∇f(x) is continuous at all x ∈ U, f is said to be continuously differentiable over U. Such a function has the property

lim_{y→0} [f(x + y) − f(x) − ∇f(x)′y] / ‖y‖ = 0,  ∀ x ∈ U,  (1.1)

where ‖·‖ is an arbitrary vector norm. If f is continuously differentiable over ℜ^n, then f is also called a smooth function.

The preceding equation can also be used as an alternative definition of differentiability. In particular, f is called Frechet differentiable at x
if there exists a vector g satisfying Eq. (1.1) with ∇f(x) replaced by g. If such a vector g exists, it can be seen that all the partial derivatives (∂f/∂x_i)(x) exist and that g = ∇f(x). Frechet differentiability implies (Gateaux) differentiability but not conversely (see for example [OrR70] for a detailed discussion). In this book, when dealing with a differentiable function f, we will always assume that f is continuously differentiable (smooth) over a given open set [∇f(x) is a continuous function of x over that set], in which case f is both Gateaux and Frechet differentiable, and the distinctions made above are of no consequence.
The definitions of differentiability of f at a point x only involve the values of f in a neighborhood of x. Thus, these definitions can be used for functions f that are not defined on all of ℜ^n, but are defined instead in a neighborhood of the point at which the derivative is computed. In particular, for functions f : X → ℜ that are defined over a strict subset X of ℜ^n, we use the above definition of differentiability of f at a vector x provided x is an interior point of the domain X. Similarly, we use the above definition of differentiability or continuous differentiability of f over a subset U, provided U is an open subset of the domain X. Thus any mention of differentiability of a function f over a subset implicitly assumes that this subset is open.
If f : ℜ^n → ℜ^m is a vector-valued function, it is called differentiable (or smooth) if each component f_i of f is differentiable (or smooth, respectively). The gradient matrix of f, denoted ∇f(x), is the n × m matrix whose ith column is the gradient ∇f_i(x) of f_i. Thus,

∇f(x) = [ ∇f_1(x) ··· ∇f_m(x) ].

The transpose of ∇f is called the Jacobian of f and is a matrix whose ijth entry is equal to the partial derivative ∂f_i/∂x_j.
Now suppose that each one of the partial derivatives of a function f : ℜ^n → ℜ is a smooth function of x. We use the notation (∂²f/∂x_i∂x_j)(x) to indicate the ith partial derivative of ∂f/∂x_j at a point x ∈ ℜ^n. The Hessian of f is the matrix whose ijth entry is equal to (∂²f/∂x_i∂x_j)(x), and is denoted by ∇²f(x). We have (∂²f/∂x_i∂x_j)(x) = (∂²f/∂x_j∂x_i)(x) for every x, which implies that ∇²f(x) is symmetric.
If f : ℜ^{m+n} → ℜ is a function of (x, y), where x = (x_1, ..., x_m) ∈ ℜ^m and y = (y_1, ..., y_n) ∈ ℜ^n, we write

∇_x f(x, y) = ( ∂f(x, y)/∂x_1 , ... , ∂f(x, y)/∂x_m )′,
∇_y f(x, y) = ( ∂f(x, y)/∂y_1 , ... , ∂f(x, y)/∂y_n )′.
We denote by ∇²_{xx} f(x, y), ∇²_{xy} f(x, y), and ∇²_{yy} f(x, y) the matrices with components

[∇²_{xx} f(x, y)]_{ij} = ∂²f(x, y) / ∂x_i∂x_j,
[∇²_{xy} f(x, y)]_{ij} = ∂²f(x, y) / ∂x_i∂y_j,
[∇²_{yy} f(x, y)]_{ij} = ∂²f(x, y) / ∂y_i∂y_j.
If f : ℜ^{m+n} → ℜ^r, f = (f_1, f_2, ..., f_r), we write

∇_x f(x, y) = [ ∇_x f_1(x, y) ··· ∇_x f_r(x, y) ],
∇_y f(x, y) = [ ∇_y f_1(x, y) ··· ∇_y f_r(x, y) ].
Let f : ℜ^k → ℜ^m and g : ℜ^m → ℜ^n be smooth functions, and let h be their composition, i.e.,

h(x) = g( f(x) ).

Then, the chain rule for differentiation states that

∇h(x) = ∇f(x)∇g( f(x) ),  ∀ x ∈ ℜ^k.
Some examples of useful relations that follow from the chain rule are:

∇( f(Ax) ) = A′∇f(Ax),    ∇²( f(Ax) ) = A′∇²f(Ax)A,

where A is a matrix, and

∇_x ( f(h(x), y) ) = ∇h(x)∇_h f( h(x), y ),
∇_x ( f(h(x), g(x)) ) = ∇h(x)∇_h f( h(x), g(x) ) + ∇g(x)∇_g f( h(x), g(x) ).
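The first of these relations can be checked against finite differences (the matrix A, the test function f, and the step size are our own illustration):

```python
# Sketch: checking grad(f(Ax)) = A' grad f(Ax) for f(z) = z1^2 + z2^2,
# against central finite differences of g(x) = f(Ax).
A = [[1.0, 2.0], [3.0, 4.0]]

def f(z):
    return z[0] ** 2 + z[1] ** 2

def Ax(x):
    return [A[0][0] * x[0] + A[0][1] * x[1],
            A[1][0] * x[0] + A[1][1] * x[1]]

def g(x):
    return f(Ax(x))

x = [0.7, -0.3]
z = Ax(x)
grad_f_at_Ax = [2.0 * z[0], 2.0 * z[1]]
# Multiply by A' (note the transposed indexing of A below).
formula = [A[0][0] * grad_f_at_Ax[0] + A[1][0] * grad_f_at_Ax[1],
           A[0][1] * grad_f_at_Ax[0] + A[1][1] * grad_f_at_Ax[1]]

h = 1e-6
for i in range(2):
    xp = list(x); xp[i] += h
    xm = list(x); xm[i] -= h
    fd = (g(xp) - g(xm)) / (2.0 * h)
    assert abs(fd - formula[i]) < 1e-4
```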
We now state some theorems relating to diﬀerentiable functions that
will be useful for our purposes.
Proposition 1.1.20: (Mean Value Theorem) Let f : ℜ^n → ℜ be continuously differentiable over an open sphere S, and let x be a vector in S. Then for all y such that x + y ∈ S, there exists an α ∈ [0, 1] such that

f(x + y) = f(x) + ∇f(x + αy)′y.
Proposition 1.1.21: (Second Order Expansions) Let f : ℜ^n → ℜ be twice continuously differentiable over an open sphere S, and let x be a vector in S. Then for all y such that x + y ∈ S:

(a) There holds

f(x + y) = f(x) + y′∇f(x) + y′( ∫₀¹ ( ∫₀ᵗ ∇²f(x + τy) dτ ) dt ) y.

(b) There exists an α ∈ [0, 1] such that

f(x + y) = f(x) + y′∇f(x) + (1/2) y′∇²f(x + αy)y.

(c) There holds

f(x + y) = f(x) + y′∇f(x) + (1/2) y′∇²f(x)y + o(‖y‖²).
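For a quadratic f the expansion in part (c) is exact, since the Hessian is constant and the o(‖y‖²) term vanishes; the sketch below (our own example) checks this identity directly:

```python
# Sketch: for a quadratic f, the second order expansion of
# Prop. 1.1.21(c) is exact. Here f(x) = x1^2 + x1*x2 + 2*x2^2,
# with constant Hessian H.
def f(x):
    return x[0] ** 2 + x[0] * x[1] + 2.0 * x[1] ** 2

def grad(x):
    return [2.0 * x[0] + x[1], x[0] + 4.0 * x[1]]

H = [[2.0, 1.0], [1.0, 4.0]]

x = [1.0, -2.0]
y = [0.3, 0.7]

quad = sum(y[i] * H[i][j] * y[j] for i in range(2) for j in range(2))
expansion = f(x) + grad(x)[0] * y[0] + grad(x)[1] * y[1] + 0.5 * quad
assert abs(f([x[0] + y[0], x[1] + y[1]]) - expansion) < 1e-12
```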
Proposition 1.1.22: (Implicit Function Theorem) Consider a function f : ℜ^{n+m} → ℜ^m of x ∈ ℜ^n and y ∈ ℜ^m, and a point (x̄, ȳ) such that:

(1) f(x̄, ȳ) = 0.

(2) f is continuous, and has a continuous and nonsingular gradient matrix ∇_y f(x, y) in an open set containing (x̄, ȳ).

Then there exist open sets S_x̄ ⊂ ℜ^n and S_ȳ ⊂ ℜ^m containing x̄ and ȳ, respectively, and a continuous function φ : S_x̄ → S_ȳ such that ȳ = φ(x̄) and f( x, φ(x) ) = 0 for all x ∈ S_x̄. The function φ is unique in the sense that if x ∈ S_x̄, y ∈ S_ȳ, and f(x, y) = 0, then y = φ(x). Furthermore, if for some integer p > 0, f is p times continuously differentiable, the same is true for φ, and we have

∇φ(x) = −∇_x f( x, φ(x) ) ( ∇_y f( x, φ(x) ) )^{−1},  ∀ x ∈ S_x̄.
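A standard concrete case (our own illustration, with n = m = 1): for f(x, y) = x² + y² − 1 with f(0.6, 0.8) = 0, the implicit function near x̄ = 0.6 is φ(x) = √(1 − x²), and the gradient formula gives φ′(x) = −(2x)(2y)⁻¹ = −x/φ(x):

```python
# Sketch: implicit function theorem for f(x, y) = x^2 + y^2 - 1 at
# (0.6, 0.8); the formula derivative -x/phi(x) matches a finite
# difference of phi(x) = sqrt(1 - x^2).
import math

def phi(x):
    return math.sqrt(1.0 - x ** 2)

x0 = 0.6
assert abs(x0 ** 2 + phi(x0) ** 2 - 1.0) < 1e-12   # f(x0, phi(x0)) = 0

# Derivative from the implicit function theorem formula.
dphi_formula = -x0 / phi(x0)                        # = -0.75

# Compare with a central finite difference of phi.
h = 1e-6
dphi_fd = (phi(x0 + h) - phi(x0 - h)) / (2.0 * h)
assert abs(dphi_fd - dphi_formula) < 1e-6
```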
As a ﬁnal word of caution to the reader, let us mention that one can
easily get confused with gradient notation and its use in various formulas,
such as for example the order of multiplication of various gradients in the
chain rule and the Implicit Function Theorem. Perhaps the safest guideline
to minimize errors is to remember our conventions:
(a) A vector is viewed as a column vector (an n × 1 matrix).

(b) The gradient ∇f of a scalar function f : ℜ^n → ℜ is also viewed as a column vector.

(c) The gradient matrix ∇f of a vector function f : ℜ^n → ℜ^m with components f_1, ..., f_m is the n × m matrix whose columns are the (column) vectors ∇f_1, ..., ∇f_m.

With these rules in mind, one can use “dimension matching” as an effective guide to writing correct formulas quickly.
1.2 CONVEX SETS AND FUNCTIONS
We now introduce some of the basic notions relating to convex sets and
functions. The material of this section permeates all subsequent developments in this book, and will be used in the next section for the discussion of important issues in optimization, such as the existence of optimal solutions.
1.2.1 Basic Properties
The notion of a convex set is deﬁned below and is illustrated in Fig. 1.2.1.
Definition 1.2.1: Let C be a subset of ℜ^n. We say that C is convex if

αx + (1 − α)y ∈ C,  ∀ x, y ∈ C, ∀ α ∈ [0, 1].  (1.2)
Note that the empty set is by convention considered to be convex. Generally, when referring to a convex set, it will usually be apparent from the context whether this set can be empty, but we will often be specific in order to minimize ambiguities.

The following proposition lists some operations that preserve convexity of a set.
Figure 1.2.1. Illustration of the definition of a convex set. For convexity, linear interpolation between any two points of the set must yield points that lie within the set.
Proposition 1.2.1:

(a) The intersection ∩_{i∈I} C_i of any collection {C_i | i ∈ I} of convex sets is convex.

(b) The vector sum C_1 + C_2 of two convex sets C_1 and C_2 is convex.

(c) The set λC is convex for any convex set C and scalar λ. Furthermore, if C is a convex set and λ_1, λ_2 are positive scalars, we have

(λ_1 + λ_2)C = λ_1C + λ_2C.

(d) The closure and the interior of a convex set are convex.

(e) The image and the inverse image of a convex set under an affine function are convex.
Proof: The proof is straightforward using the definition of convexity, cf. Eq. (1.2). For example, to prove part (a), we take two points x and y from ∩_{i∈I} C_i, and we use the convexity of each C_i to argue that the line segment connecting x and y belongs to all the sets C_i, and hence to their intersection. The proofs of parts (b)-(e) are similar and are left as exercises for the reader. Q.E.D.
A set C is said to be a cone if for all x ∈ C and λ > 0, we have λx ∈ C.
A cone need not be convex and need not contain the origin (although the
origin always lies in the closure of a nonempty cone). Several of the results
of the above proposition have analogs for cones (see the exercises).
Convex Functions
The notion of a convex function is deﬁned below and is illustrated in Fig.
1.2.2.
Definition 1.2.2: Let C be a convex subset of ℜ^n. A function f : C → ℜ is called convex if

f( αx + (1 − α)y ) ≤ αf(x) + (1 − α)f(y),  ∀ x, y ∈ C, ∀ α ∈ [0, 1].  (1.3)

The function f is called concave if −f is convex. The function f is called strictly convex if the above inequality is strict for all x, y ∈ C with x ≠ y, and all α ∈ (0, 1). For a function f : X → ℜ, we also say that f is convex over the convex set C if the domain X of f contains C and Eq. (1.3) holds, i.e., when the domain of f is restricted to C, f becomes convex.
Figure 1.2.2. Illustration of the definition of a function that is convex over a convex set C. The linear interpolation αf(x) + (1 − α)f(y) overestimates the function value f( αx + (1 − α)y ) for all α ∈ [0, 1].
If C is a convex set and f : C → ℜ is a convex function, the level sets {x ∈ C | f(x) ≤ γ} and {x ∈ C | f(x) < γ} are convex for all scalars γ. To see this, note that if x, y ∈ C are such that f(x) ≤ γ and f(y) ≤ γ, then for any α ∈ [0, 1], we have αx + (1 − α)y ∈ C, by the convexity of C, and f( αx + (1 − α)y ) ≤ αf(x) + (1 − α)f(y) ≤ γ, by the convexity of f. However, the converse is not true; for example, the function f(x) = √|x| has convex level sets but is not convex.
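The counterexample f(x) = √|x| from the text is easy to verify numerically (the specific test points are our own): every level set is an interval, hence convex, yet the convexity inequality (1.3) fails at the midpoint of 0 and 1.

```python
# Sketch: f(x) = sqrt(|x|) has convex level sets but is not convex.
import math

def f(x):
    return math.sqrt(abs(x))

# The level set {x | f(x) <= 1} is the interval [-1, 1]; interpolation
# between points of the level set stays inside it.
for x, y in [(-1.0, 1.0), (-0.5, 0.9)]:
    for a in [0.0, 0.25, 0.5, 0.75, 1.0]:
        z = a * x + (1 - a) * y
        assert f(z) <= 1.0 + 1e-12

# But f violates the convexity inequality at the midpoint of 0 and 1:
# f(0.5) = sqrt(0.5) > 0.5 = 0.5*f(0) + 0.5*f(1).
assert f(0.5 * 0.0 + 0.5 * 1.0) > 0.5 * f(0.0) + 0.5 * f(1.0)
```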
We generally prefer to deal with convex functions that are real-valued and are defined over the entire space ℜ^n (rather than over just a convex subset). However, in some situations, prominently arising in the context of duality theory, we will encounter functions f that are convex over a convex subset C and cannot be extended to functions that are convex over ℜ^n. In such situations, it may be convenient, instead of restricting the domain of f to the subset C where f takes real values, to extend the domain to the entire space ℜ^n, but allow f to take the value ∞.
We are thus motivated to introduce extended real-valued convex functions that can take the value of ∞ at some points. In particular, if C ⊂ ℜ^n is a convex set, a function f : C → (−∞, ∞] is called convex over C (or simply convex when C = ℜ^n) if the condition

f( αx + (1 − α)y ) ≤ αf(x) + (1 − α)f(y),  ∀ x, y ∈ C, ∀ α ∈ [0, 1]

holds. The function f is called strictly convex if the above inequality is strict for all x and y in C with x ≠ y such that f(x) < ∞ and f(y) < ∞, and all α ∈ (0, 1). It can be seen that if f is convex, the level sets {x ∈ C | f(x) ≤ γ} and {x ∈ C | f(x) < γ} are convex for all scalars γ.
One complication when dealing with extended real-valued functions is that we sometimes need to be careful to exclude the unusual case where f(x) = ∞ for all x ∈ C, in which case f is still convex, since it trivially satisfies the above inequality (such functions are sometimes called improper in the literature of convex functions, but we will not use this terminology here). Another area of concern is working with functions that can take both values −∞ and ∞, because of the possibility of the forbidden expression ∞ − ∞ arising when sums of function values are considered. For this reason, we will try to minimize the occurrences of such functions. On occasion we will deal with extended real-valued functions that can take the value −∞ but not the value ∞. In particular, a function f : C → [−∞, ∞), where C is a convex set, is called concave if the function −f : C → (−∞, ∞] is convex as per the preceding definition.
We define the effective domain of an extended real-valued function f : X → (−∞, ∞] to be the set

dom(f) = { x ∈ X | f(x) < ∞ }.

Note that if X is convex and f is convex over X, then dom(f) is a convex set. Similarly, we define the effective domain of an extended real-valued function f : X → [−∞, ∞) to be the set dom(f) = { x ∈ X | f(x) > −∞ }.
Note that by replacing the domain of an extended real-valued convex function with its effective domain, we can convert it to a real-valued function. In this way, we can use results stated in terms of real-valued functions, and we can also avoid calculations with ∞. Thus, the entire subject of convex functions can be developed without resorting to extended real-valued functions. The reverse is also true, namely that extended real-valued functions can be adopted as the norm; for example, the classical treatment of Rockafellar [Roc70] uses this approach.

Generally, functions that are real-valued over the entire space ℜ^n are more convenient (and even essential) in numerical algorithms and also in optimization analyses where a calculus-oriented approach based on differentiability is adopted. This is typically the case in nonconvex optimization, where nonlinear equality and nonconvex inequality constraints are involved (see Chapter 2). On the other hand, extended real-valued functions offer notational advantages in a convex programming setting (see Chapters 3 and 4), and in fact may be more natural because some basic constructions around duality involve extended real-valued functions. Since we plan to deal with nonconvex as well as convex problems, and with analysis as well as numerical methods, we will adopt a flexible approach and use both real-valued and extended real-valued functions. However, consistently with prevailing optimization practice, we will generally prefer to avoid using extended real-valued functions, unless there are strong notational or other advantages for doing so.
Epigraphs and Semicontinuity
An extended real-valued function f : X → (−∞, ∞] is called lower semicontinuous at a vector x ∈ X if f(x) ≤ liminf_{k→∞} f(x_k) for every sequence {x_k} converging to x. This definition is consistent with the corresponding definition for real-valued functions [cf. Def. 1.1.5(c)]. If f is lower semicontinuous at every x in a subset U ⊂ X, we say that f is lower semicontinuous over U.
The epigraph of a function f : X → (−∞, ∞], where X ⊂ ℜ^n, is the subset of ℜ^{n+1} given by

epi(f) = { (x, w) | x ∈ X, w ∈ ℜ, f(x) ≤ w };

(see Fig. 1.2.3). Note that if we restrict f to its effective domain { x ∈ X | f(x) < ∞ }, so that it becomes real-valued, the epigraph remains unaffected. Epigraphs are very useful for our purposes because questions about convexity and lower semicontinuity of functions can be reduced to corresponding questions about convexity and closure of their epigraphs, as shown in the following proposition.
Figure 1.2.3. Illustration of the epigraphs of extended real-valued convex and nonconvex functions.
Proposition 1.2.2: Let f : X → (−∞, ∞] be a function. Then:

(a) epi(f) is convex if and only if dom(f) is convex and f is convex over dom(f).

(b) Assuming X = ℜ^n, the following are equivalent:

(i) The level set { x | f(x) ≤ γ } is closed for all scalars γ.

(ii) f is lower semicontinuous over ℜ^n.

(iii) epi(f) is closed.
Proof: (a) Assume that dom(f) is convex and f is convex over dom(f). If (x_1, w_1) and (x_2, w_2) belong to epi(f) and α ∈ [0, 1], we have x_1, x_2 ∈ dom(f) and

f(x_1) ≤ w_1,    f(x_2) ≤ w_2,

and by multiplying these inequalities with α and (1 − α), respectively, by adding, and by using the convexity of f, we obtain

f( αx_1 + (1 − α)x_2 ) ≤ αf(x_1) + (1 − α)f(x_2) ≤ αw_1 + (1 − α)w_2.

Hence the vector ( αx_1 + (1 − α)x_2, αw_1 + (1 − α)w_2 ), which is equal to α(x_1, w_1) + (1 − α)(x_2, w_2), belongs to epi(f), showing the convexity of epi(f).
Conversely, assume that epi(f) is convex, and let x_1, x_2 ∈ dom(f) and α ∈ [0, 1]. The pairs ( x_1, f(x_1) ) and ( x_2, f(x_2) ) belong to epi(f), so by convexity, we have

( αx_1 + (1 − α)x_2, αf(x_1) + (1 − α)f(x_2) ) ∈ epi(f).
Therefore, by the definition of epi(f), it follows that αx_1 + (1 − α)x_2 ∈ dom(f), so dom(f) is convex, while

f( αx_1 + (1 − α)x_2 ) ≤ αf(x_1) + (1 − α)f(x_2),

so f is convex over dom(f).
(b) We first show that (i) implies (ii). Assume that the level set { x | f(x) ≤ γ } is closed for all scalars γ, fix a vector x ∈ ℜ^n, and let {x_k} be a sequence converging to x. If liminf_{k→∞} f(x_k) = ∞, then we are done, so assume that liminf_{k→∞} f(x_k) < ∞. Let {x_k}_K ⊂ {x_k} be a subsequence along which the limit inferior of {f(x_k)} is attained. Then for γ such that liminf_{k→∞} f(x_k) < γ and all sufficiently large k with k ∈ K, we have f(x_k) ≤ γ. Since each level set { x | f(x) ≤ γ } is closed, it follows that x belongs to all the level sets with liminf_{k→∞} f(x_k) < γ, implying that f(x) ≤ liminf_{k→∞} f(x_k), and that f is lower semicontinuous at x. Therefore, (i) implies (ii).
We next show that (ii) implies (iii). Assume that f is lower semicontinuous over ℜ^n, and let (x, w) be the limit of a sequence {(x_k, w_k)} ⊂ epi(f). Then we have f(x_k) ≤ w_k, and by taking the limit as k → ∞ and by using the lower semicontinuity of f at x, we obtain f(x) ≤ liminf_{k→∞} f(x_k) ≤ w. Hence, (x, w) ∈ epi(f) and epi(f) is closed. Thus, (ii) implies (iii).
We finally show that (iii) implies (i). Assume that epi(f) is closed, and let {x_k} be a sequence that converges to some x and belongs to the level set { x | f(x) ≤ γ } for some γ. Then (x_k, γ) ∈ epi(f) for all k and (x_k, γ) → (x, γ), so since epi(f) is closed, we have (x, γ) ∈ epi(f). Hence, x belongs to the level set { x | f(x) ≤ γ }, implying that this set is closed. Therefore, (iii) implies (i). Q.E.D.
If the epigraph of a function f : X → (−∞, ∞] is a closed set, we say that f is a closed function. Thus, if we extend the domain of f to ℜ^n and consider the function f̃ given by

f̃(x) = f(x) if x ∈ X,  and  f̃(x) = ∞ if x ∉ X,

we see that according to the preceding proposition, f is closed if and only if f̃ is lower semicontinuous over ℜ^n. Note that if f is lower semicontinuous over dom(f), it is not necessarily closed; take for example f to be constant for x in some nonclosed set and ∞ otherwise. Furthermore, if f is closed it is not necessarily true that dom(f) is closed; for example, the function

f(x) = 1/x if x > 0,  and  f(x) = ∞ otherwise,

is closed, but dom(f) is the open halfline of positive numbers. On the other hand, if dom(f) is closed and f is lower semicontinuous over dom(f), then f is closed because epi(f) is closed, as can be seen by reviewing the relevant part of the proof of Prop. 1.2.2(b).
Common examples of convex functions are affine functions and norms; this is straightforward to verify using the definition of convexity. For example, for any x, y ∈ ℜ^n and any α ∈ [0, 1], by using the triangle inequality, we have

‖αx + (1 − α)y‖ ≤ ‖αx‖ + ‖(1 − α)y‖ = α‖x‖ + (1 − α)‖y‖,

so the norm function ‖·‖ is convex. The following proposition provides some means for recognizing convex functions, and gives some algebraic operations that preserve convexity of a function.
Proposition 1.2.3:

(a) Let f_1, ..., f_m : ℜ^n → (−∞, ∞] be given functions, let λ_1, ..., λ_m be positive scalars, and consider the function g : ℜ^n → (−∞, ∞] given by

g(x) = λ_1f_1(x) + ··· + λ_mf_m(x).

If f_1, ..., f_m are convex, then g is also convex, while if f_1, ..., f_m are closed, then g is also closed.

(b) Let f : ℜ^n → (−∞, ∞] be a given function, let A be an m × n matrix, and consider the function g : ℜ^n → (−∞, ∞] given by

g(x) = f(Ax).

If f is convex, then g is also convex, while if f is closed, then g is also closed.

(c) Let f_i : ℜ^n → (−∞, ∞] be given functions for i ∈ I, where I is an arbitrary index set, and consider the function g : ℜ^n → (−∞, ∞] given by

g(x) = sup_{i∈I} f_i(x).

If f_i, i ∈ I, are convex, then g is also convex, while if f_i, i ∈ I, are closed, then g is also closed.
Proof: (a) Let f_1, . . . , f_m be convex. We use the definition of convexity
32 Convex Analysis and Optimization Chap. 1
to write for any x, y ∈ ℜ^n and α ∈ [0, 1],

    g(αx + (1 − α)y) = Σ_{i=1}^m λ_i f_i(αx + (1 − α)y)
                     ≤ Σ_{i=1}^m λ_i (α f_i(x) + (1 − α) f_i(y))
                     = α Σ_{i=1}^m λ_i f_i(x) + (1 − α) Σ_{i=1}^m λ_i f_i(y)
                     = α g(x) + (1 − α) g(y).

Hence g is convex.

Let the functions f_1, . . . , f_m be closed. Then they are lower
semicontinuous at every x ∈ ℜ^n [cf. Prop. 1.2.2(b)], so for every sequence
{x_k} converging to x, we have f_i(x) ≤ liminf_{k→∞} f_i(x_k) for all i.
Hence

    g(x) ≤ Σ_{i=1}^m λ_i liminf_{k→∞} f_i(x_k)
         ≤ liminf_{k→∞} Σ_{i=1}^m λ_i f_i(x_k) = liminf_{k→∞} g(x_k),

where we have used Prop. 1.1.4(d) (the sum of the limit inferiors of
sequences is less than or equal to the limit inferior of the sum sequence).
Therefore, g is lower semicontinuous at all x ∈ ℜ^n, so by Prop. 1.2.2(b),
it is closed.
(b) This is straightforward, along the lines of the proof of part (a).
(c) A pair (x, w) belongs to the epigraph

    epi(g) = { (x, w) | g(x) ≤ w }

if and only if f_i(x) ≤ w for all i ∈ I, or (x, w) ∈ ∩_{i∈I} epi(f_i).
Therefore,

    epi(g) = ∩_{i∈I} epi(f_i).

If the f_i are convex, the epigraphs epi(f_i) are convex, so epi(g) is
convex, and g is convex by Prop. 1.2.2(a). If the f_i are closed, then the
epigraphs epi(f_i) are closed, so epi(g) is closed, and g is closed. Q.E.D.
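The closure of convexity under positive combinations and pointwise suprema, as in parts (a) and (c), can be spot-checked numerically. The sketch below is illustrative only; the functions f1(x) = x² and f2(x) = |x| and the weights are hypothetical choices, not taken from the text.

```python
import random

# Spot-check the convexity inequality g(a*x + (1-a)*y) <= a*g(x) + (1-a)*g(y)
# for a positive combination and a pointwise supremum of two convex functions.
f1 = lambda x: x ** 2
f2 = lambda x: abs(x)

g_sum = lambda x: 2.0 * f1(x) + 3.0 * f2(x)   # positive combination (lambda_i > 0)
g_sup = lambda x: max(f1(x), f2(x))           # pointwise supremum

random.seed(0)
for _ in range(1000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    a = random.random()
    z = a * x + (1 - a) * y
    tol = 1e-12  # guard against floating-point rounding; the inequality is exact
    assert g_sum(z) <= a * g_sum(x) + (1 - a) * g_sum(y) + tol
    assert g_sup(z) <= a * g_sup(x) + (1 - a) * g_sup(y) + tol
print("convexity inequalities verified")
```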
Characterizations of Diﬀerentiable Convex Functions
For differentiable functions, there is an alternative characterization of
convexity, given in the following proposition and illustrated in Fig. 1.2.4.
Figure 1.2.4. Characterization of convexity in terms of first derivatives.
The condition f(z) ≥ f(x) + (z − x)′∇f(x) states that a linear approximation,
based on the first order Taylor series expansion, underestimates a convex
function.
Proposition 1.2.4: Let C ⊂ ℜ^n be a convex set and let f : ℜ^n → ℜ be
differentiable over ℜ^n.

(a) f is convex over C if and only if

    f(z) ≥ f(x) + (z − x)′∇f(x),   ∀ x, z ∈ C.   (1.4)

(b) f is strictly convex over C if and only if the above inequality is
strict whenever x ≠ z.
Proof: We prove (a) and (b) simultaneously. Assume that the inequality
(1.4) holds. Choose any x, y ∈ C and α ∈ [0, 1], and let z = αx + (1 − α)y.
Using the inequality (1.4) twice, we obtain

    f(x) ≥ f(z) + (x − z)′∇f(z),
    f(y) ≥ f(z) + (y − z)′∇f(z).

We multiply the first inequality by α, the second by (1 − α), and add them
to obtain

    αf(x) + (1 − α)f(y) ≥ f(z) + (αx + (1 − α)y − z)′∇f(z) = f(z),

which proves that f is convex. If the inequality (1.4) is strict as stated
in part (b), then if we take x ≠ y and α ∈ (0, 1) above, the three preceding
inequalities become strict, thus showing the strict convexity of f.

Conversely, assume that f is convex, let x and z be any vectors in C
with x ≠ z, and consider the function

    g(α) = ( f(x + α(z − x)) − f(x) ) / α,   α ∈ (0, 1].
We will show that g(α) is monotonically increasing with α, and is strictly
monotonically increasing if f is strictly convex. This will imply that

    (z − x)′∇f(x) = lim_{α↓0} g(α) ≤ g(1) = f(z) − f(x),

with strict inequality if g is strictly monotonically increasing, thereby
showing that the desired inequality (1.4) holds (and holds strictly if f
is strictly convex). Indeed, consider any α_1, α_2, with 0 < α_1 < α_2 < 1,
and let

    ᾱ = α_1/α_2,    z̄ = x + α_2(z − x).   (1.5)

We have

    f(x + ᾱ(z̄ − x)) ≤ ᾱ f(z̄) + (1 − ᾱ) f(x),

or

    ( f(x + ᾱ(z̄ − x)) − f(x) ) / ᾱ ≤ f(z̄) − f(x),   (1.6)

and the above inequalities are strict if f is strictly convex. Substituting
the definitions (1.5) in Eq. (1.6), we obtain after a straightforward
calculation

    ( f(x + α_1(z − x)) − f(x) ) / α_1 ≤ ( f(x + α_2(z − x)) − f(x) ) / α_2,

or

    g(α_1) ≤ g(α_2),

with strict inequality if f is strictly convex. Hence g is monotonically
increasing with α, and strictly so if f is strictly convex. Q.E.D.
Note a simple consequence of Prop. 1.2.4(a): if f : ℜ^n → ℜ is a convex
function and ∇f(x*) = 0, then x* minimizes f over ℜ^n. This is a classical
sufficient condition for unconstrained optimality, originally formulated
(in one dimension) by Fermat in 1637.
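The inequality (1.4) and the optimality remark above can be illustrated numerically. This is a sketch under assumed data: the quadratic f and its minimizer x* below are hypothetical examples, not taken from the text.

```python
import random

# For the convex function f(x1, x2) = (x1 - 1)**2 + (x2 + 2)**2 the gradient
# vanishes at x* = (1, -2); we check that the linearization (1.4)
# underestimates f everywhere, and that x* is a global minimizer.
def f(x):
    return (x[0] - 1) ** 2 + (x[1] + 2) ** 2

def grad_f(x):
    return (2 * (x[0] - 1), 2 * (x[1] + 2))

x_star = (1.0, -2.0)
assert grad_f(x_star) == (0.0, 0.0)

random.seed(1)
for _ in range(1000):
    x = (random.uniform(-10, 10), random.uniform(-10, 10))
    z = (random.uniform(-10, 10), random.uniform(-10, 10))
    g = grad_f(x)
    linear = f(x) + (z[0] - x[0]) * g[0] + (z[1] - x[1]) * g[1]
    assert f(z) >= linear - 1e-9   # Eq. (1.4): linearization underestimates f
    assert f(z) >= f(x_star)       # x* minimizes f over the whole space
print("gradient inequality and global optimality verified")
```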
For twice differentiable convex functions, there is another
characterization of convexity, as shown by the following proposition.
Proposition 1.2.5: Let C ⊂ ℜ^n be a convex set and let f : ℜ^n → ℜ be
twice continuously differentiable over ℜ^n.

(a) If ∇²f(x) is positive semidefinite for all x ∈ C, then f is convex
over C.

(b) If ∇²f(x) is positive definite for all x ∈ C, then f is strictly
convex over C.

(c) If C = ℜ^n and f is convex, then ∇²f(x) is positive semidefinite for
all x.
Proof: (a) By Prop. 1.1.21(b), for all x, y ∈ C we have

    f(y) = f(x) + (y − x)′∇f(x) + (1/2)(y − x)′∇²f(x + α(y − x))(y − x)

for some α ∈ [0, 1]. Therefore, using the positive semidefiniteness of ∇²f,
we obtain

    f(y) ≥ f(x) + (y − x)′∇f(x),   ∀ x, y ∈ C.

From Prop. 1.2.4(a), we conclude that f is convex.

(b) Similar to the proof of part (a), we have f(y) > f(x) + (y − x)′∇f(x)
for all x, y ∈ C with x ≠ y, and the result follows from Prop. 1.2.4(b).

(c) Suppose that f : ℜ^n → ℜ is convex and suppose, to obtain a
contradiction, that there exist some x ∈ ℜ^n and some z ∈ ℜ^n such that
z′∇²f(x)z < 0. Using the continuity of ∇²f, we see that we can choose z
with small enough norm so that z′∇²f(x + αz)z < 0 for every α ∈ [0, 1].
Then, using again Prop. 1.1.21(b), we obtain f(x + z) < f(x) + z′∇f(x),
which, in view of Prop. 1.2.4(a), contradicts the convexity of f. Q.E.D.
If f is convex over a strict subset C ⊂ ℜ^n, it is not necessarily true
that ∇²f(x) is positive semidefinite at any point of C [take for example
n = 2, C = {(x_1, 0) | x_1 ∈ ℜ}, and f(x) = x_1² − x_2²]. A strengthened
version of Prop. 1.2.5 is given in the exercises. It can be shown that the
conclusion of Prop. 1.2.5(c) also holds if C is assumed to have nonempty
interior instead of being equal to ℜ^n.
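The counterexample just mentioned can be checked directly. The following minimal script (illustrative, pure Python) verifies both that the Hessian of f(x) = x_1² − x_2² is indefinite and that f nevertheless satisfies the convexity inequality along C.

```python
import random

# f(x) = x1**2 - x2**2 has the constant Hessian diag(2, -2), which is not
# positive semidefinite anywhere, yet f restricted to C = {(x1, 0)} is x1**2.
f = lambda x1, x2: x1 ** 2 - x2 ** 2

# The vector z = (0, 1) certifies indefiniteness: z' H z = -2 < 0.
H = [[2.0, 0.0], [0.0, -2.0]]
z = (0.0, 1.0)
quad = sum(z[i] * H[i][j] * z[j] for i in range(2) for j in range(2))
assert quad < 0

random.seed(2)
for _ in range(1000):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    a = random.random()
    # convexity inequality for f restricted to C = {(x1, 0) | x1 in R}
    assert f(a * x + (1 - a) * y, 0) <= a * f(x, 0) + (1 - a) * f(y, 0) + 1e-12
print("Hessian indefinite, yet f is convex along C")
```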
The following proposition considers a strengthened form of strict
convexity characterized by the following condition:

    (∇f(x) − ∇f(y))′(x − y) ≥ α‖x − y‖²,   ∀ x, y ∈ ℜ^n,   (1.7)

where α is some positive number. Convex functions with this property are
called strongly convex with coefficient α.
Proposition 1.2.6: (Strong Convexity) Let f : ℜ^n → ℜ be smooth. If f is
strongly convex with coefficient α, then f is strictly convex. Furthermore,
if f is twice continuously differentiable, then strong convexity of f with
coefficient α is equivalent to the positive semidefiniteness of ∇²f(x) − αI
for every x ∈ ℜ^n, where I is the identity matrix.
Proof: Fix some x, y ∈ ℜ^n such that x ≠ y, and define the function
h : ℜ → ℜ by h(t) = f(x + t(y − x)). Consider scalars t and t′ such that
t < t′. Using the chain rule and Eq. (1.7), we have

    ( (dh/dt)(t′) − (dh/dt)(t) )(t′ − t)
        = ( ∇f(x + t′(y − x)) − ∇f(x + t(y − x)) )′(y − x)(t′ − t)
        ≥ α(t′ − t)²‖x − y‖² > 0.

Thus, dh/dt is strictly increasing and for any t ∈ (0, 1), we have

    (h(t) − h(0))/t = (1/t) ∫_0^t (dh/dτ)(τ) dτ
                    < (1/(1 − t)) ∫_t^1 (dh/dτ)(τ) dτ = (h(1) − h(t))/(1 − t).

Equivalently, th(1) + (1 − t)h(0) > h(t). The definition of h yields
tf(y) + (1 − t)f(x) > f(ty + (1 − t)x). Since this inequality has been
proved for arbitrary t ∈ (0, 1) and x ≠ y, we conclude that f is strictly
convex.
Suppose now that f is twice continuously differentiable and Eq. (1.7)
holds. Let c be a scalar. We use Prop. 1.1.21(b) twice to obtain

    f(x + cy) = f(x) + cy′∇f(x) + (c²/2) y′∇²f(x + tcy)y,

and

    f(x) = f(x + cy) − cy′∇f(x + cy) + (c²/2) y′∇²f(x + scy)y,

for some t and s belonging to [0, 1]. Adding these two equations and using
Eq. (1.7), we obtain

    (c²/2) y′( ∇²f(x + scy) + ∇²f(x + tcy) )y = ( ∇f(x + cy) − ∇f(x) )′(cy)
                                              ≥ αc²‖y‖².

We divide both sides by c² and then take the limit as c → 0 to conclude
that y′∇²f(x)y ≥ α‖y‖². Since this inequality is valid for every y ∈ ℜ^n,
it follows that ∇²f(x) − αI is positive semidefinite.

For the converse, assume that ∇²f(x) − αI is positive semidefinite
for all x ∈ ℜ^n. Consider the function g : ℜ → ℜ defined by

    g(t) = ∇f(tx + (1 − t)y)′(x − y).

Using the Mean Value Theorem (Prop. 1.1.20), we have

    (∇f(x) − ∇f(y))′(x − y) = g(1) − g(0) = (dg/dt)(t)

for some t ∈ [0, 1]. The result follows because

    (dg/dt)(t) = (x − y)′∇²f(tx + (1 − t)y)(x − y) ≥ α‖x − y‖²,

where the last inequality is a consequence of the positive semidefiniteness
of ∇²f(tx + (1 − t)y) − αI. Q.E.D.
As an example, consider the quadratic function

    f(x) = x′Qx,

where Q is a symmetric matrix. By Prop. 1.2.5, the function f is convex
if and only if Q is positive semidefinite. Furthermore, by Prop. 1.2.6, f
is strongly convex with coefficient α if and only if ∇²f(x) − αI = 2Q − αI
is positive semidefinite for some α > 0. Thus f is strongly convex with
some positive coefficient (as well as strictly convex) if and only if Q is
positive definite.
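A numerical sketch of this example, with an assumed positive definite matrix Q = diag(2, 1) chosen purely for illustration, checks the strong convexity inequality (1.7) with α = 2·λ_min(Q).

```python
import random

# f(x) = x'Qx with Q = diag(2, 1), so grad f(x) = 2Qx, and strong convexity
# holds with coefficient alpha = 2 * lambda_min(Q) = 2:
#   (grad f(x) - grad f(y))'(x - y) >= alpha * ||x - y||**2.
Q = [[2.0, 0.0], [0.0, 1.0]]
alpha = 2.0 * 1.0  # 2 times the smallest eigenvalue of Q

def grad_f(x):
    return (2 * (Q[0][0] * x[0] + Q[0][1] * x[1]),
            2 * (Q[1][0] * x[0] + Q[1][1] * x[1]))

random.seed(3)
for _ in range(1000):
    x = (random.uniform(-5, 5), random.uniform(-5, 5))
    y = (random.uniform(-5, 5), random.uniform(-5, 5))
    gx, gy = grad_f(x), grad_f(y)
    d = (x[0] - y[0], x[1] - y[1])
    lhs = (gx[0] - gy[0]) * d[0] + (gx[1] - gy[1]) * d[1]
    assert lhs >= alpha * (d[0] ** 2 + d[1] ** 2) - 1e-9   # Eq. (1.7)
print("strong convexity inequality verified")
```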
1.2.2 Convex and Aﬃne Hulls
Let X be a subset of ℜ^n. A convex combination of elements of X is a
vector of the form Σ_{i=1}^m α_i x_i, where m is a positive integer,
x_1, . . . , x_m belong to X, and α_1, . . . , α_m are scalars such that

    α_i ≥ 0,  i = 1, . . . , m,      Σ_{i=1}^m α_i = 1.
Note that if X is convex, then the convex combination Σ_{i=1}^m α_i x_i
belongs to X (this is easily shown by induction; see the exercises), and
for any function f : ℜ^n → ℜ that is convex over X, we have

    f( Σ_{i=1}^m α_i x_i ) ≤ Σ_{i=1}^m α_i f(x_i).   (1.8)

This follows by using repeatedly the definition of convexity. The preceding
relation is a special case of Jensen's inequality and can be used to prove
a number of interesting inequalities in applied mathematics and probability
theory.
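Inequality (1.8) can be spot-checked numerically. The convex function f(x) = e^x and the random data below are illustrative choices, not from the text.

```python
import math
import random

# Check Jensen's inequality f(sum a_i x_i) <= sum a_i f(x_i) for random
# convex combinations of m = 4 points and the convex function exp.
random.seed(4)
f = math.exp
for _ in range(500):
    xs = [random.uniform(-2, 2) for _ in range(4)]
    ws = [random.random() for _ in range(4)]
    s = sum(ws)
    alphas = [w / s for w in ws]              # alpha_i >= 0, sum = 1
    mean = sum(a * x for a, x in zip(alphas, xs))
    assert f(mean) <= sum(a * f(x) for a, x in zip(alphas, xs)) + 1e-12
print("Jensen's inequality verified")
```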
The convex hull of a set X, denoted conv(X), is the intersection of
all convex sets containing X, and is a convex set by Prop. 1.2.1(a). It is
straightforward to verify that the set of all convex combinations of
elements of X is convex, and is equal to the convex hull conv(X) (see the
exercises). In particular, if X consists of a finite number of vectors
x_1, . . . , x_m, its convex hull is

    conv({x_1, . . . , x_m})
        = { Σ_{i=1}^m α_i x_i | α_i ≥ 0, i = 1, . . . , m,  Σ_{i=1}^m α_i = 1 }.
We recall that an affine set M is a set of the form x + S, where S
is a subspace, called the subspace parallel to M. If X is a subset of ℜ^n,
the affine hull of X, denoted aff(X), is the intersection of all affine sets
containing X. Note that aff(X) is itself an affine set and that it contains
conv(X). It can be seen that the affine hull of X, the affine hull of the
convex hull conv(X), and the affine hull of the closure cl(X) coincide (see
the exercises). For a convex set C, the dimension of C is defined to be the
dimension of aff(C).
Given a subset X ⊂ ℜ^n, a nonnegative combination of elements of X
is a vector of the form Σ_{i=1}^m α_i x_i, where m is a positive integer,
x_1, . . . , x_m belong to X, and α_1, . . . , α_m are nonnegative scalars.
If the scalars α_i are all positive, the combination Σ_{i=1}^m α_i x_i is
said to be positive. The cone generated by X, denoted by cone(X), is the
set of all nonnegative combinations of elements of X. It is easily seen
that cone(X) is a convex cone, although it need not be closed [cone(X) can
be shown to be closed in special cases, such as when X is a finite set;
this is one of the central results of polyhedral convexity and will be
shown in Section 1.6].
The following is a fundamental characterization of convex hulls.
Proposition 1.2.7: (Caratheodory's Theorem) Let X be a subset of ℜ^n.

(a) Every x in cone(X) can be represented as a positive combination of
vectors x_1, . . . , x_m ∈ X that are linearly independent, where m is a
positive integer with m ≤ n.

(b) Every x in conv(X) can be represented as a convex combination of
vectors x_1, . . . , x_m ∈ X such that x_2 − x_1, . . . , x_m − x_1 are
linearly independent, where m is a positive integer with m ≤ n + 1.
Proof: (a) Let x be a nonzero vector in cone(X), and let m be the
smallest integer such that x has the form Σ_{i=1}^m α_i x_i, where α_i > 0
and x_i ∈ X for all i = 1, . . . , m. If the vectors x_i were linearly
dependent, there would exist scalars λ_1, . . . , λ_m, with
Σ_{i=1}^m λ_i x_i = 0 and at least one of the λ_i positive. Consider the
linear combination Σ_{i=1}^m (α_i − γ̄λ_i)x_i, where γ̄ is the largest γ
such that α_i − γλ_i ≥ 0 for all i. This combination provides a
representation of x as a positive combination of fewer than m vectors of
X, a contradiction. Therefore, x_1, . . . , x_m are linearly independent,
and since any linearly independent set of vectors contains at most n
elements, we must have m ≤ n.
(b) The proof will be obtained by applying part (a) to the subset of ℜ^{n+1}
given by

    Y = { (x, 1) | x ∈ X }.

If x ∈ conv(X), then x = Σ_{i=1}^m γ_i x_i for some positive γ_i with
Σ_{i=1}^m γ_i = 1, i.e., (x, 1) ∈ cone(Y). By part (a), we have

    (x, 1) = Σ_{i=1}^m α_i (x_i, 1)

for some positive α_1, . . . , α_m and vectors (x_1, 1), . . . , (x_m, 1)
that are linearly independent (implying that m ≤ n + 1), or equivalently

    x = Σ_{i=1}^m α_i x_i,    1 = Σ_{i=1}^m α_i.

Finally, to show that x_2 − x_1, . . . , x_m − x_1 are linearly independent,
assume, to arrive at a contradiction, that there exist λ_2, . . . , λ_m, not
all 0, such that

    Σ_{i=2}^m λ_i (x_i − x_1) = 0.

Equivalently, defining λ_1 = −(λ_2 + · · · + λ_m), we have

    Σ_{i=1}^m λ_i (x_i, 1) = 0,

which contradicts the linear independence of (x_1, 1), . . . , (x_m, 1).
Q.E.D.
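The reduction step used in the proof can be traced on a concrete instance. The data below (four points of ℜ² and the dependence vector λ) are hand-picked for illustration, and exact rational arithmetic avoids rounding issues.

```python
from fractions import Fraction as F

# x = (1, 1) as a convex combination of m = 4 points of R^2; one reduction
# step of Caratheodory's Theorem expresses it with fewer points (at most
# n + 1 = 3 are ever needed).
pts = [(F(0), F(0)), (F(2), F(0)), (F(0), F(2)), (F(2), F(2))]
alphas = [F(1, 4)] * 4                      # convex combination of all 4 points

# The lifted vectors (x_i, 1) in R^3 are linearly dependent; lam is a nonzero
# dependence, sum lam_i * (x_i, 1) = 0 (stated and checked, not derived here).
lam = [F(1), F(-1), F(-1), F(1)]
assert sum(l * p[0] for l, p in zip(lam, pts)) == 0
assert sum(l * p[1] for l, p in zip(lam, pts)) == 0
assert sum(lam) == 0

# gamma = largest value keeping alpha_i - gamma * lam_i >= 0 for all i.
gamma = min(a / l for a, l in zip(alphas, lam) if l > 0)
new_alphas = [a - gamma * l for a, l in zip(alphas, lam)]

assert min(new_alphas) == 0                 # at least one point is dropped
assert sum(new_alphas) == 1                 # still a convex combination
x = tuple(sum(a * p[i] for a, p in zip(new_alphas, pts)) for i in range(2))
assert x == (1, 1)                          # it still represents x = (1, 1)
print("reduced to", sum(1 for a in new_alphas if a > 0), "points")
```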
It is not generally true that the convex hull of a closed set is closed
[take for instance the convex hull of the set consisting of the origin and
the subset {(x_1, x_2) | x_1 x_2 = 1, x_1 ≥ 0, x_2 ≥ 0} of ℜ²]. We have,
however, the following.
Proposition 1.2.8: The convex hull of a compact set is compact.
Proof: Let X be a compact subset of ℜ^n. By Caratheodory's Theorem, a
sequence in conv(X) can be expressed as { Σ_{i=1}^{n+1} α_i^k x_i^k },
where for all k and i, α_i^k ≥ 0, x_i^k ∈ X, and Σ_{i=1}^{n+1} α_i^k = 1.
Since the sequence

    { (α_1^k, . . . , α_{n+1}^k, x_1^k, . . . , x_{n+1}^k) }

belongs to a compact set, it has a limit point

    (α_1, . . . , α_{n+1}, x_1, . . . , x_{n+1})

such that Σ_{i=1}^{n+1} α_i = 1, and for all i, α_i ≥ 0 and x_i ∈ X. Thus,
the vector Σ_{i=1}^{n+1} α_i x_i, which belongs to conv(X), is a limit point
of the sequence { Σ_{i=1}^{n+1} α_i^k x_i^k }, showing that conv(X) is
compact. Q.E.D.
1.2.3 Closure, Relative Interior, and Continuity
We now consider some generic topological properties of convex sets and
functions. Let C be a nonempty convex subset of ℜ^n. The closure cl(C)
of C is also a nonempty convex set (Prop. 1.2.1). While the interior of C
may be empty, it turns out that convexity implies the existence of interior
points relative to the affine hull of C. This is an important property,
which we now formalize.
We say that x is a relative interior point of C if x ∈ C and there
exists a neighborhood N of x such that N ∩ aff(C) ⊂ C, i.e., x is an
interior point of C relative to aff(C). The relative interior of C, denoted
ri(C), is the set of all relative interior points of C. For example, if C
is a line segment connecting two distinct points in the plane, then ri(C)
consists of all points of C except for the end points. The set C is said
to be relatively open if ri(C) = C. A point x ∈ cl(C) such that x ∉ ri(C)
is said to be a relative boundary point of C. The set of all relative
boundary points of C is called the relative boundary of C.
The following proposition gives some basic facts about relative interior
points.
Proposition 1.2.9: Let C be a nonempty convex set.

(a) (Line Segment Principle) If x ∈ ri(C) and x̄ ∈ cl(C), then all points
on the line segment connecting x and x̄, except possibly x̄, belong to
ri(C).

(b) (Nonemptiness of Relative Interior) ri(C) is a nonempty and convex
set, and has the same affine hull as C. In fact, if m is the dimension of
aff(C) and m > 0, there exist vectors x_0, x_1, . . . , x_m ∈ ri(C) such
that x_1 − x_0, . . . , x_m − x_0 span the subspace parallel to aff(C).

(c) x ∈ ri(C) if and only if every line segment in C having x as one
endpoint can be prolonged beyond x without leaving C [i.e., for every
x̄ ∈ C, there exists a γ > 1 such that x + (γ − 1)(x − x̄) ∈ C].
Proof: (a) In the case where x̄ ∈ C, see Fig. 1.2.5. In the case where
x̄ ∉ C, to show that for any α ∈ (0, 1] we have x_α = αx + (1 − α)x̄ ∈ ri(C),
consider a sequence {x_k} ⊂ C that converges to x̄, and let
x_{k,α} = αx + (1 − α)x_k. Then, as in Fig. 1.2.5, we see that
{z | ‖z − x_{k,α}‖ < αε} ∩ aff(C) ⊂ C for all k. Since for large enough k
we have

    {z | ‖z − x_α‖ < αε/2} ⊂ {z | ‖z − x_{k,α}‖ < αε},

it follows that {z | ‖z − x_α‖ < αε/2} ∩ aff(C) ⊂ C, which shows that
x_α ∈ ri(C).
(b) Convexity of ri(C) follows from the Line Segment Principle of part (a).
By using a translation argument if necessary, we assume without loss of
generality that 0 ∈ C. Then, the aﬃne hull of C is a subspace of dimension
m. If m = 0, then C and aﬀ(C) consist of a single point, which is a unique
Figure 1.2.5. Proof of the Line Segment Principle for the case where
x̄ ∈ C. Since x ∈ ri(C), there exists a sphere S = {z | ‖z − x‖ < ε} such
that S ∩ aff(C) ⊂ C. For all α ∈ (0, 1], let x_α = αx + (1 − α)x̄ and let
S_α = {z | ‖z − x_α‖ < αε}. It can be seen that each point of S_α ∩ aff(C)
is a convex combination of x̄ and some point of S ∩ aff(C). Therefore,
S_α ∩ aff(C) ⊂ C, implying that x_α ∈ ri(C).
relative interior point. If m > 0, we can find m linearly independent
vectors z_1, . . . , z_m in C that span aff(C); otherwise there would exist
r < m linearly independent vectors in C whose span contains C,
contradicting the fact that the dimension of aff(C) is m. Thus
z_1, . . . , z_m form a basis for aff(C).

Consider the set

    X = { x | x = Σ_{i=1}^m α_i z_i,  Σ_{i=1}^m α_i < 1,
              α_i > 0, i = 1, . . . , m }
(see Fig. 1.2.6). This set is open relative to aff(C), i.e., for every
x ∈ X, there exists an open set N such that x ∈ N and N ∩ aff(C) ⊂ X.
[To see this, note that X is the inverse image of the open set in ℜ^m

    { (α_1, . . . , α_m) | Σ_{i=1}^m α_i < 1,  α_i > 0, i = 1, . . . , m }

under the linear transformation from aff(C) to ℜ^m that maps
Σ_{i=1}^m α_i z_i into (α_1, . . . , α_m); the relative openness of X then
follows from the continuity of this linear transformation.] Therefore all
points of X are relative interior points of C, and ri(C) is nonempty.
Since by construction aff(X) = aff(C) and X ⊂ ri(C), it follows that ri(C)
and C have the same affine hull.
To show the last assertion of part (b), consider vectors

    x_0 = α Σ_{i=1}^m z_i,    x_i = x_0 + αz_i,  i = 1, . . . , m,

where α is a positive scalar such that α(m + 1) < 1. The vectors
x_0, . . . , x_m are in the set X and in the relative interior of C, since
X ⊂ ri(C). Furthermore, because x_i − x_0 = αz_i for all i and the vectors
z_1, . . . , z_m span aff(C), the vectors x_1 − x_0, . . . , x_m − x_0 also
span aff(C).
(c) If x ∈ ri(C), the given condition holds by the Line Segment Principle.
Conversely, let x satisfy the given condition. We will show that x ∈ ri(C).
By part (b), there exists a vector x̄ ∈ ri(C). We may assume that x̄ ≠ x,
since otherwise we are done. By the given condition, since x̄ is in C,
there is a γ > 1 such that y = x + (γ − 1)(x − x̄) ∈ C. Then we have
x = (1 − α)x̄ + αy, where α = 1/γ ∈ (0, 1), so by part (a), we obtain
x ∈ ri(C). Q.E.D.
Figure 1.2.6. Construction of the relatively open set X in the proof of
nonemptiness of the relative interior of a convex set C that contains the
origin. We choose m linearly independent vectors z_1, . . . , z_m ∈ C, where
m is the dimension of aff(C), and let

    X = { Σ_{i=1}^m α_i z_i | Σ_{i=1}^m α_i < 1,  α_i > 0, i = 1, . . . , m }.
In view of Prop. 1.2.9(b), C and ri(C) have the same dimension. It can
also be shown that C and cl(C) have the same dimension (see the exercises).
The next proposition gives several properties of closures and relative
interiors of convex sets.
Proposition 1.2.10: Let C be a nonempty convex set.

(a) cl(C) = cl(ri(C)).

(b) ri(C) = ri(cl(C)).

(c) Let C̄ be another nonempty convex set. Then the following three
conditions are equivalent:

    (i) C and C̄ have the same relative interior.
    (ii) C and C̄ have the same closure.
    (iii) ri(C) ⊂ C̄ ⊂ cl(C).

(d) A · ri(C) = ri(A · C) for all m × n matrices A.

(e) If C is bounded, then A · cl(C) = cl(A · C) for all m × n matrices A.
Proof: (a) Since ri(C) ⊂ C, we have cl(ri(C)) ⊂ cl(C). Conversely, let
x̄ ∈ cl(C). We will show that x̄ ∈ cl(ri(C)). Let x be any point in ri(C)
[there exists such a point by Prop. 1.2.9(b)], and assume that x ≠ x̄
(otherwise we are done). By the Line Segment Principle [Prop. 1.2.9(a)],
we have αx + (1 − α)x̄ ∈ ri(C) for all α ∈ (0, 1]. Thus, x̄ is the limit of
the sequence {(1/k)x + (1 − 1/k)x̄ | k ≥ 1} that lies in ri(C), so
x̄ ∈ cl(ri(C)).
(b) The inclusion ri(C) ⊂ ri(cl(C)) follows from the definition of a
relative interior point and the fact aff(C) = aff(cl(C)) (see the
exercises). To prove the reverse inclusion, let z ∈ ri(cl(C)). We will
show that z ∈ ri(C). By Prop. 1.2.9(b), there exists an x ∈ ri(C). We may
assume that x ≠ z (otherwise we are done). We choose γ > 1, with γ
sufficiently close to 1, so that the vector y = z + (γ − 1)(z − x) belongs
to ri(cl(C)) [cf. Prop. 1.2.9(c)], and hence also to cl(C). Then we have
z = (1 − α)x + αy, where α = 1/γ ∈ (0, 1), so by the Line Segment Principle
[Prop. 1.2.9(a)], we obtain z ∈ ri(C).
(c) If ri(C) = ri(C̄), part (a) implies that cl(C) = cl(C̄). Similarly, if
cl(C) = cl(C̄), part (b) implies that ri(C) = ri(C̄). Furthermore, if these
conditions hold, the relation ri(C̄) ⊂ C̄ ⊂ cl(C̄) implies condition (iii).
Finally, assume that condition (iii) holds. Then by taking closures, we
have cl(ri(C)) ⊂ cl(C̄) ⊂ cl(C), and by using part (a), we obtain
cl(C) ⊂ cl(C̄) ⊂ cl(C). Hence C and C̄ have the same closure.
(d) For any set X, we have A · cl(X) ⊂ cl(A · X), since if a sequence
{x_k} ⊂ X converges to some x ∈ cl(X), then the sequence {Ax_k} ⊂ A · X
converges to Ax, implying that Ax ∈ cl(A · X). We use this fact and part
(a) to write

    A · ri(C) ⊂ A · C ⊂ A · cl(C) = A · cl(ri(C)) ⊂ cl(A · ri(C)).

Thus A · C lies between the set A · ri(C) and the closure of that set,
implying that the relative interiors of the sets A · C and A · ri(C) are
equal [part (c)]. Hence ri(A · C) ⊂ A · ri(C). We will show the reverse
inclusion by taking any z ∈ A · ri(C) and showing that z ∈ ri(A · C). Let
x̄ be any vector in A · C, and let z̄ ∈ ri(C) and x ∈ C be such that
Az̄ = z and Ax = x̄. By Prop. 1.2.9(c), there exists γ > 1 such that the
vector y = z̄ + (γ − 1)(z̄ − x) belongs to C. Thus we have Ay ∈ A · C and
Ay = z + (γ − 1)(z − x̄), so by Prop. 1.2.9(c) it follows that
z ∈ ri(A · C).
(e) By the argument given in part (d), we have A · cl(C) ⊂ cl(A · C). To
show the converse, choose any x ∈ cl(A · C). Then, there exists a sequence
{x_k} ⊂ C such that Ax_k → x. Since C is bounded, {x_k} has a subsequence
that converges to some x̄ ∈ cl(C), and we must have Ax̄ = x. It follows
that x ∈ A · cl(C). Q.E.D.
Note that if C is closed but unbounded, the set A · C need not be closed
[cf. part (e) of the above proposition]. For example, take the closed set

    C = { (x_1, x_2) | x_1 x_2 ≥ 1, x_1 ≥ 0, x_2 ≥ 0 }

and let A have the effect of projecting the typical vector x on the
horizontal axis, i.e., A(x_1, x_2) = (x_1, 0). Then A · C is the (nonclosed)
halfline { (x_1, x_2) | x_1 > 0, x_2 = 0 }.
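This example can be checked mechanically. The script below (illustrative only) verifies that points of C project arbitrarily close to the origin while the origin itself has no preimage in C.

```python
# C = {(x1, x2) | x1*x2 >= 1, x1 >= 0, x2 >= 0} is closed and unbounded; its
# projection on the horizontal axis is {(x1, 0) | x1 > 0}, which is not
# closed: the projections of the points (1/k, k) in C approach the origin,
# but the origin has no preimage in C.
in_C = lambda x1, x2: x1 >= 0 and x2 >= 0 and x1 * x2 >= 1

for k in [1, 2, 4, 8, 16]:                  # powers of 2: 1/k * k is exact
    assert in_C(1.0 / k, float(k))          # (1/k, k) lies in C ...
# ... so (1/k, 0) lies in A*C and converges to (0, 0), yet (0, 0) is not in
# A*C: any preimage (0, x2) would need 0 * x2 >= 1, which is impossible.
assert not any(in_C(0.0, float(x2)) for x2 in range(0, 1000))
print("origin is a limit of points of A*C but does not belong to A*C")
```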
Proposition 1.2.11: Let C_1 and C_2 be nonempty convex sets.

(a) Assume that the sets ri(C_1) and ri(C_2) have a nonempty intersection.
Then

    cl(C_1 ∩ C_2) = cl(C_1) ∩ cl(C_2),
    ri(C_1 ∩ C_2) = ri(C_1) ∩ ri(C_2).

(b) ri(C_1 + C_2) = ri(C_1) + ri(C_2).
Proof: (a) Let y ∈ cl(C_1) ∩ cl(C_2). Let x be a vector in the intersection
ri(C_1) ∩ ri(C_2) (which is nonempty by assumption). By the Line Segment
Principle [Prop. 1.2.9(a)], the vector αx + (1 − α)y belongs to
ri(C_1) ∩ ri(C_2) for all α ∈ (0, 1]. Hence y is the limit of a sequence
{α_k x + (1 − α_k)y} ⊂ ri(C_1) ∩ ri(C_2) with α_k → 0, implying that
y ∈ cl(ri(C_1) ∩ ri(C_2)). Hence, we have

    cl(C_1) ∩ cl(C_2) ⊂ cl(ri(C_1) ∩ ri(C_2)) ⊂ cl(C_1 ∩ C_2).

Also C_1 ∩ C_2 is contained in cl(C_1) ∩ cl(C_2), which is a closed set,
so we have

    cl(C_1 ∩ C_2) ⊂ cl(C_1) ∩ cl(C_2).

Thus, equality holds throughout in the preceding two relations, so that
cl(C_1 ∩ C_2) = cl(C_1) ∩ cl(C_2). Furthermore, the sets ri(C_1) ∩ ri(C_2)
and C_1 ∩ C_2 have the same closure. Therefore, by Prop. 1.2.10(c), they
have the same relative interior, implying that

    ri(C_1 ∩ C_2) ⊂ ri(C_1) ∩ ri(C_2).

To show the converse, take any x ∈ ri(C_1) ∩ ri(C_2) and any
y ∈ C_1 ∩ C_2. By Prop. 1.2.9(c), the line segment connecting x and y can
be prolonged beyond x by a small amount without leaving C_1 ∩ C_2. By the
same proposition, it follows that x ∈ ri(C_1 ∩ C_2).
(b) Consider the linear transformation A : ℜ^{2n} → ℜ^n given by
A(x_1, x_2) = x_1 + x_2 for all x_1, x_2 ∈ ℜ^n. The relative interior of
the Cartesian product C_1 × C_2 (viewed as a subset of ℜ^{2n}) is easily
seen to be ri(C_1) × ri(C_2) (see the exercises). Since
A(C_1 × C_2) = C_1 + C_2, the result follows from Prop. 1.2.10(d). Q.E.D.
The requirement that ri(C_1) ∩ ri(C_2) ≠ Ø is essential in part (a) of
the above proposition. As an example, consider the subsets of the real
line C_1 = {x | x > 0} and C_2 = {x | x < 0}. Then we have
cl(C_1 ∩ C_2) = Ø ≠ {0} = cl(C_1) ∩ cl(C_2). Also, consider
C_1 = {x | x ≥ 0} and C_2 = {x | x ≤ 0}. Then we have
ri(C_1 ∩ C_2) = {0} ≠ Ø = ri(C_1) ∩ ri(C_2).
Continuity of Convex Functions
We close this section with a basic result on the continuity properties of
convex functions.
Proposition 1.2.12: If f : ℜ^n → ℜ is convex, then it is continuous.
More generally, if f : ℜ^n → (−∞, ∞] is convex, then f, when restricted
to dom(f), is continuous over the relative interior of dom(f).
Proof: Restricting attention to the affine hull of dom(f) and using a
transformation argument if necessary, we assume without loss of generality
that the origin is an interior point of dom(f) and that the unit cube
X = {x | ‖x‖_∞ ≤ 1} is contained in dom(f). It will suffice to show that f
is continuous at 0, i.e., that for any sequence {x_k} ⊂ ℜ^n that converges
to 0, we have f(x_k) → f(0).

Let e_i, i = 1, . . . , 2^n, be the corners of X, i.e., each e_i is a
vector whose entries are either 1 or −1. It is not difficult to see that
any x ∈ X can be expressed in the form x = Σ_{i=1}^{2^n} α_i e_i, where
each α_i is a nonnegative scalar and Σ_{i=1}^{2^n} α_i = 1. Let
A = max_i f(e_i). From Jensen's inequality [Eq. (1.8)], it follows that
f(x) ≤ A for every x ∈ X.

For the purpose of proving continuity at zero, we can assume that
x_k ∈ X and x_k ≠ 0 for all k. Consider the sequences {y_k} and {z_k}
given by

    y_k = x_k / ‖x_k‖_∞,    z_k = − x_k / ‖x_k‖_∞

(cf. Fig. 1.2.7). Using the definition of a convex function for the line
segment that connects y_k, x_k, and 0, we have

    f(x_k) ≤ (1 − ‖x_k‖_∞) f(0) + ‖x_k‖_∞ f(y_k).

Since ‖x_k‖_∞ → 0 and f(y_k) ≤ A for all k, by taking the limit as k → ∞,
we obtain

    limsup_{k→∞} f(x_k) ≤ f(0).

Using the definition of a convex function for the line segment that
connects x_k, 0, and z_k, we have

    f(0) ≤ ( ‖x_k‖_∞ / (‖x_k‖_∞ + 1) ) f(z_k) + ( 1 / (‖x_k‖_∞ + 1) ) f(x_k),

and letting k → ∞, we obtain

    f(0) ≤ liminf_{k→∞} f(x_k).

Thus, lim_{k→∞} f(x_k) = f(0) and f is continuous at zero. Q.E.D.
Figure 1.2.7. Construction for proving continuity of a convex function
(cf. Prop. 1.2.12).
A straightforward consequence of the continuity of a real-valued function
f that is convex over ℜ^n is that its epigraph as well as the level sets
{x | f(x) ≤ γ} are closed and convex (cf. Prop. 1.2.2).
1.2.4 Recession Cones
Some of the preceding results [Props. 1.2.8, 1.2.10(e)] have illustrated
how boundedness affects the topological properties of sets obtained through
various operations on convex sets. In this section we take a closer look at
this issue.

Given a convex set C, we say that a vector y is a direction of recession
of C if x + αy ∈ C for all x ∈ C and α ≥ 0. In words, y is a direction of
recession of C if, starting at any x in C and going indefinitely along y,
we never cross the relative boundary of C to points outside C. The set of
all directions of recession is a cone containing the origin. It is called
the recession cone of C and is denoted by R_C (see Fig. 1.2.8). The
following proposition gives some properties of recession cones.
Figure 1.2.8. Illustration of the recession cone R_C of a convex set C.
A direction of recession y has the property that x + αy ∈ C for all x ∈ C
and α ≥ 0.
Proposition 1.2.13: (Recession Cone Theorem) Let C be a nonempty closed
convex set.

(a) The recession cone R_C is a closed convex cone.

(b) A vector y belongs to R_C if and only if there exists a vector x ∈ C
such that x + αy ∈ C for all α ≥ 0.

(c) R_C contains a nonzero direction if and only if C is unbounded.

(d) The recession cones of C and ri(C) are equal.

(e) If D is another closed convex set such that C ∩ D ≠ Ø, we have

    R_{C∩D} = R_C ∩ R_D.
Proof: (a) If y_1, y_2 belong to R_C and λ_1, λ_2 are positive scalars
such that λ_1 + λ_2 = 1, we have for any x ∈ C and α ≥ 0

    x + α(λ_1 y_1 + λ_2 y_2) = λ_1 (x + αy_1) + λ_2 (x + αy_2) ∈ C,

where the last inclusion holds because x + αy_1 and x + αy_2 belong to C
by the definition of R_C. Hence λ_1 y_1 + λ_2 y_2 ∈ R_C, implying that
R_C is convex.

Let y be in the closure of R_C, and let {y_k} ⊂ R_C be a sequence
converging to y. For any x ∈ C and α ≥ 0 we have x + αy_k ∈ C for all k,
and because C is closed, we have x + αy ∈ C. This implies that y ∈ R_C
and that R_C is closed.
(b) If y ∈ R_C, every vector x ∈ C has the required property by the
definition of R_C. Conversely, let y be such that there exists a vector
x̄ ∈ C with x̄ + αy ∈ C for all α ≥ 0. We fix x ∈ C and α > 0, and we show
that x + αy ∈ C. We may assume that y ≠ 0 (otherwise we are done) and,
without loss of generality, we may assume that ‖y‖ = 1. Let

    z_k = x̄ + kαy,  k = 1, 2, . . . .

If x = z_k for some k, then x + αy = x̄ + α(k + 1)y, which belongs to C,
and we are done. We thus assume that x ≠ z_k for all k, and we define

    y_k = (z_k − x)/‖z_k − x‖,  k = 1, 2, . . .

(see the construction of Fig. 1.2.9).
Figure 1.2.9. Construction for the proof of Prop. 1.2.13(b).
We have

    y_k = (‖z_k − x̄‖/‖z_k − x‖) · (z_k − x̄)/‖z_k − x̄‖ + (x̄ − x)/‖z_k − x‖
        = (‖z_k − x̄‖/‖z_k − x‖) y + (x̄ − x)/‖z_k − x‖.

Because {z_k} is an unbounded sequence,

    ‖z_k − x̄‖/‖z_k − x‖ → 1,      (x̄ − x)/‖z_k − x‖ → 0,

so by combining the preceding relations, we have y_k → y. Thus x + αy is
the limit of {x + αy_k}. The vector x + αy_k lies between x and z_k in the
line segment connecting x and z_k for all k such that ‖z_k − x‖ ≥ α, so by
convexity of C, we have x + αy_k ∈ C for all sufficiently large k. Since
x + αy_k → x + αy and C is closed, it follows that x + αy must belong to C.
(c) Assuming that C is unbounded, we will show that R_C contains a nonzero
direction (the reverse is clear). Choose any x ∈ C and any unbounded
sequence {z_k} ⊂ C. Consider the sequence {y_k}, where

    y_k = (z_k − x)/‖z_k − x‖,

and let y be a limit point of {y_k} (compare with the construction of
Fig. 1.2.9). For any fixed α ≥ 0, the vector x + αy_k lies between x and
z_k in the line segment connecting x and z_k for all k such that
‖z_k − x‖ ≥ α. Hence by convexity of C, we have x + αy_k ∈ C for all
sufficiently large k. Since x + αy is a limit point of {x + αy_k}, and C
is closed, we have x + αy ∈ C. Hence the nonzero vector y is a direction
of recession.
(d) If y ∈ R_ri(C), then for a fixed x ∈ ri(C) and all α ≥ 0, we have x + αy ∈ ri(C) ⊂ C. Hence, by part (b), we have y ∈ R_C. Conversely, if y ∈ R_C, then for any x ∈ ri(C), we have x + αy ∈ C for all α ≥ 0. It follows from the Line Segment Principle [cf. Prop. 1.2.9(a)] that x + αy ∈ ri(C) for all α ≥ 0, so that y belongs to R_ri(C).
(e) By the definition of direction of recession, y ∈ R_{C∩D} implies that x + αy ∈ C ∩ D for all x ∈ C ∩ D and all α > 0. By part (b), this in turn implies that y ∈ R_C and y ∈ R_D, so that R_{C∩D} ⊂ R_C ∩ R_D. Conversely, by the definition of direction of recession, if y ∈ R_C ∩ R_D and x ∈ C ∩ D, we have x + αy ∈ C ∩ D for all α > 0, so y ∈ R_{C∩D}. Thus, R_C ∩ R_D ⊂ R_{C∩D}. Q.E.D.
Note that part (c) of the above proposition yields a characterization of compact and convex sets, namely that a closed convex set C is bounded if and only if R_C = {0}. A useful generalization is that for a compact set W ⊂ ℝ^m and an m × n matrix A, the set

    V = {x ∈ C | Ax ∈ W}

is compact if and only if R_C ∩ N(A) = {0}. To see this, note that the recession cone of the set

    V̄ = {x ∈ ℝ^n | Ax ∈ W}
50 Convex Analysis and Optimization Chap. 1
is N(A) [clearly N(A) ⊂ R_V̄; if x ∉ N(A) but x ∈ R_V̄, we must have αAx ∈ W for all α > 0, which contradicts the boundedness of W]. Hence, the recession cone of V is R_C ∩ N(A), so by Prop. 1.2.13(c), V is compact if and only if R_C ∩ N(A) = {0}.
One possible use of recession cones is to obtain conditions guaranteeing the closedness of images under linear transformations and of vector sums of convex sets in the absence of boundedness, as in the following two propositions (some refinements are given in the exercises).
Proposition 1.2.14: Let C be a nonempty closed convex subset of ℝ^n and let A be an m × n matrix with nullspace N(A) such that R_C ∩ N(A) = {0}. Then AC is closed.
Proof: For any y ∈ cl(AC), the set

    C_ε = {x ∈ C | ‖y − Ax‖ ≤ ε}

is nonempty for all ε > 0. Furthermore, by the discussion following the proof of Prop. 1.2.13, the assumption R_C ∩ N(A) = {0} implies that C_ε is compact. It follows that the set ∩_{ε>0} C_ε is nonempty, and any x ∈ ∩_{ε>0} C_ε satisfies Ax = y, so y ∈ AC. Q.E.D.
Proposition 1.2.15: Let C_1, . . . , C_m be nonempty closed convex subsets of ℝ^n such that the equality y_1 + · · · + y_m = 0 for some vectors y_i ∈ R_{C_i} implies that y_i = 0 for all i = 1, . . . , m. Then the vector sum C_1 + · · · + C_m is a closed set.
Proof: Let C be the Cartesian product C_1 × · · · × C_m, viewed as a subset of ℝ^{mn}, and let A be the linear transformation that maps a vector (x_1, . . . , x_m) ∈ ℝ^{mn} into x_1 + · · · + x_m. We have

    R_C = R_{C_1} × · · · × R_{C_m}

(see the exercises) and

    N(A) = {(y_1, . . . , y_m) | y_1 + · · · + y_m = 0, y_i ∈ ℝ^n},

so under the given condition, we obtain R_C ∩ N(A) = {0}. Since AC = C_1 + · · · + C_m, the result follows by applying Prop. 1.2.14. Q.E.D.
When specialized to just two sets C_1 and C_2, the above proposition says that if there is no nonzero direction of recession of C_1 that is the opposite of a direction of recession of C_2, then C_1 + C_2 is closed. This is true in particular if R_{C_1} = {0}, which is equivalent to C_1 being compact [cf. Prop. 1.2.13(c)]. We thus obtain the following proposition.
Proposition 1.2.16: Let C_1 and C_2 be closed, convex sets. If C_1 is bounded, then C_1 + C_2 is a closed and convex set. If both C_1 and C_2 are bounded, then C_1 + C_2 is a compact and convex set.
Proof: Closedness of C_1 + C_2 follows from the preceding discussion. If both C_1 and C_2 are bounded, then C_1 + C_2 is also bounded, and hence also compact. Q.E.D.
Note that if C_1 and C_2 are both closed and unbounded, the vector sum C_1 + C_2 need not be closed. For example, consider the closed subsets of ℝ^2 given by

    C_1 = {(x_1, x_2) | x_1 x_2 ≥ 1, x_1 ≥ 0, x_2 ≥ 0},    C_2 = {(x_1, x_2) | x_1 = 0}.

Then C_1 + C_2 is the open halfspace {(x_1, x_2) | x_1 > 0}.
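A quick numeric sketch of this example (an illustrative check, not part of the original text): the points (1/k, k) ∈ C_1 and (0, −k) ∈ C_2 sum to (1/k, 0), which approaches (0, 0), yet every point of C_1 + C_2 has a strictly positive first coordinate.

```python
# Numeric sketch: C1 = {x1*x2 >= 1, x1 >= 0, x2 >= 0}, C2 = {x1 = 0}.
# The sums (1/k, k) + (0, -k) = (1/k, 0) approach (0, 0), but (0, 0) is
# not in C1 + C2: every sum has a strictly positive first coordinate.

def in_C1(p):
    x1, x2 = p
    return x1 >= 0 and x2 >= 0 and x1 * x2 >= 1

def in_C2(p):
    return p[0] == 0

def sum_point(k):
    c1 = (1.0 / k, float(k))   # in C1 since (1/k) * k = 1
    c2 = (0.0, -float(k))      # in C2
    assert in_C1(c1) and in_C2(c2)
    return (c1[0] + c2[0], c1[1] + c2[1])

points = [sum_point(k) for k in (1, 10, 100, 1000)]
# Each sum lies in the halfspace {x1 > 0} ...
assert all(p[0] > 0 for p in points)
# ... but their limit (0, 0) does not, so the sum set is not closed.
```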
E X E R C I S E S
1.2.1
(a) Show that a set is convex if and only if it contains all the convex combinations of its elements.
(b) Show that the convex hull of a set coincides with the set of all the convex combinations of its elements.
1.2.2
Let C be a nonempty set in ℝ^n, and let λ_1 and λ_2 be positive scalars. Show by example that the sets (λ_1 + λ_2)C and λ_1 C + λ_2 C may differ when C is not convex [cf. Prop. 1.2.1].
1.2.3 (Properties of Cones)
(a) For any collection {C_i | i ∈ I} of cones, the intersection ∩_{i∈I} C_i is a cone.
(b) The Cartesian product C_1 × C_2 of two cones C_1 and C_2 is a cone.
(c) The vector sum C_1 + C_2 of two cones C_1 and C_2 is a cone.
(d) The closure of a cone is a cone.
(e) The image and the inverse image of a cone under a linear transformation are cones.
1.2.4 (Convex Cones)
(a) For any collection of vectors {a_i | i ∈ I}, the set C = {x | a_i′x ≤ 0, i ∈ I} is a closed convex cone.
(b) Show that a cone C is convex if and only if C + C ⊂ C.
(c) Let C_1 and C_2 be convex cones containing the origin. Show that

    C_1 + C_2 = conv(C_1 ∪ C_2),
    C_1 ∩ C_2 = ∪_{α∈[0,1]} ((1 − α)C_1 ∩ αC_2).
1.2.5 (Properties of Cartesian Product)
Given sets X_i ⊂ ℝ^{n_i}, i = 1, . . . , m, let X = X_1 × · · · × X_m be their Cartesian product.
(a) Show that the convex hull (closure, affine hull) of X is equal to the Cartesian product of the convex hulls (closures, affine hulls, respectively) of the X_i's.
(b) Show that cone(X) is equal to the Cartesian product of cone(X_i), i = 1, . . . , m.
(c) Assuming X_1, . . . , X_m are convex, show that the relative interior (recession cone) of X is equal to the Cartesian product of the relative interiors (recession cones) of the X_i's.
1.2.6
Let {C_i | i ∈ I} be an arbitrary collection of convex sets in ℝ^n, and let C be the convex hull of the union of the collection. Show that

    C = ∪ { Σ_{i∈I} α_i C_i },

where the union is taken over all convex combinations such that only finitely many coefficients α_i are nonzero.
1.2.7
Let X be a nonempty set.
(a) Show that X, conv(X), and cl(X) have the same dimension.
(b) Show that cone(X) = cone(conv(X)).
.
(c) Show that the dimension of conv(X) is at most as large as the dimension
of cone(X). Give an example where the dimension of conv(X) is smaller
than the dimension of cone(X).
(d) Assuming that the origin belongs to conv(X), show that conv(X) and
cone(X) have the same dimension.
1.2.8 (Lower Semicontinuity under Composition)
(a) Let f : ℝ^n → ℝ^m be a continuous function and g : ℝ^m → ℝ be a lower semicontinuous function. Show that the function h defined by h(x) = g(f(x)) is lower semicontinuous.
(b) Let f : ℝ^n → ℝ be a lower semicontinuous function, and g : ℝ → ℝ be a lower semicontinuous and monotonically nondecreasing function. Show that the function h defined by h(x) = g(f(x)) is lower semicontinuous.
1.2.9 (Convexity under Composition)
(a) Let f be a convex function defined on a convex set C ⊂ ℝ^n, and let g be a convex monotonically nondecreasing function of a single variable [i.e., g(y) ≤ g(ȳ) for y < ȳ]. Show that the function h defined by h(x) = g(f(x)) is convex over C. In addition, if g is monotonically increasing and f is strictly convex, then h is strictly convex.
(b) Let f = (f_1, . . . , f_m), where each f_i : ℝ^n → ℝ is a convex function, and let g : ℝ^m → ℝ be a function such that g(u_1, . . . , u_m) is convex and nondecreasing in each u_i. Show that the function h defined by h(x) = g(f(x)) is convex.
1.2.10 (Convex Functions)
Show that the following functions are convex:
(a) f_1 : X → ℝ, given by

    f_1(x_1, . . . , x_n) = −(x_1 x_2 · · · x_n)^{1/n},

where X = {(x_1, . . . , x_n) | x_1 ≥ 0, . . . , x_n ≥ 0}.
(b) f_2(x) = ln(e^{x_1} + · · · + e^{x_n}) with x = (x_1, . . . , x_n) ∈ ℝ^n.
(c) f_3(x) = ‖x‖^p with p ≥ 1 and x ∈ ℝ^n.
(d) f_4(x) = 1/f(x) with f a concave function over ℝ^n and f(x) > 0 for all x.
(e) f_5(x) = αf(x) + β with f a convex function over ℝ^n, and α and β scalars such that α ≥ 0.
(f) f_6(x) = max{0, f(x)} with f a convex function over ℝ^n.
(g) f_7(x) = ‖Ax − b‖ with A an m × n matrix and b a vector in ℝ^m.
(h) f_8(x) = x′Ax + b′x + β with A an n × n positive semidefinite symmetric matrix, b a vector in ℝ^n, and β a scalar.
(i) f_9(x) = e^{βx′Ax} with A an n × n positive semidefinite symmetric matrix and β a positive scalar.
(j) f_10(x) = f(Ax + b) with f a convex function over ℝ^m, A an m × n matrix, and b a vector in ℝ^m.
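Several of the claims above lend themselves to a quick numeric midpoint-convexity test, f((x + y)/2) ≤ (f(x) + f(y))/2, on random points. The sketch below (an illustration, not a proof) checks f_2 and an instance of f_6 with a linear inner function:

```python
import math, random

# Spot-check midpoint convexity for f2 (log-sum-exp) and for f6 = max{0, f}
# with a hypothetical linear inner function f(x) = sum(x) - 1.

def logsumexp(x):
    m = max(x)                      # stabilized ln(e^{x_1} + ... + e^{x_n})
    return m + math.log(sum(math.exp(xi - m) for xi in x))

def midpoint_convex(f, x, y):
    mid = [(a + b) / 2 for a, b in zip(x, y)]
    return f(mid) <= (f(x) + f(y)) / 2 + 1e-12  # tolerance for rounding

random.seed(0)
f6 = lambda x: max(0.0, sum(x) - 1.0)           # linear f, so f6 is convex
for _ in range(1000):
    x = [random.uniform(-5, 5) for _ in range(4)]
    y = [random.uniform(-5, 5) for _ in range(4)]
    assert midpoint_convex(logsumexp, x, y)
    assert midpoint_convex(f6, x, y)
```

Such a test can falsify convexity (a single violated midpoint is a disproof) but can never establish it; the exercise asks for the actual proofs.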
1.2.11
Use the Line Segment Principle and the method of proof of Prop. 1.2.5(c) to show that if C is a convex set with nonempty interior, and f : ℝ^n → ℝ is convex and twice continuously differentiable over C, then ∇²f(x) is positive semidefinite for all x ∈ C.
1.2.12
Let C ⊂ ℝ^n be a convex set and let f : ℝ^n → ℝ be twice continuously differentiable over C. Let S be the subspace that is parallel to the affine hull of C. Show that f is convex over C if and only if y′∇²f(x)y ≥ 0 for all x ∈ C and y ∈ S.
1.2.13
Let f : ℝ^n → ℝ be a differentiable function. Show that f is convex over a convex set C if and only if

    (∇f(x) − ∇f(y))′(x − y) ≥ 0,    ∀ x, y ∈ C.

Hint: The condition above says that the function f, restricted to the line segment connecting x and y, has monotonically nondecreasing gradient; see also the proof of Prop. 1.2.6.
1.2.14 (Ascent/Descent Behavior of a Convex Function)
Let f : ℝ → ℝ be a convex function of a single variable.
(a) (Monotropic Property) Use the definition of convexity to show that f is "turning upwards" in the sense that if x_1, x_2, x_3 are three scalars such that x_1 < x_2 < x_3, then

    (f(x_2) − f(x_1))/(x_2 − x_1) ≤ (f(x_3) − f(x_2))/(x_3 − x_2).

(b) Use part (a) to show that there are four possibilities as x increases to ∞: (1) f(x) decreases monotonically to −∞, (2) f(x) decreases monotonically to a finite value, (3) f(x) reaches some value and stays at that value, (4) f(x) increases monotonically to ∞ when x ≥ x̄ for some x̄ ∈ ℝ.
1.2.15 (Posynomials)
A posynomial of scalar variables y_1, . . . , y_n is a function of the form

    g(y_1, . . . , y_n) = Σ_{i=1}^m β_i y_1^{a_{i1}} · · · y_n^{a_{in}},

where the scalars y_i and β_i are positive, and the exponents a_{ij} are real numbers. Show the following:
(a) A posynomial need not be convex.
(b) By a logarithmic change of variables, where we set

    f(x) = ln(g(y_1, . . . , y_n)),    b_i = ln β_i,    x_i = ln y_i,

we obtain a convex function

    f(x) = lnexp(Ax + b),

defined for all x ∈ ℝ^n, where lnexp(z) = ln(e^{z_1} + · · · + e^{z_m}), A is an m × n matrix with entries a_{ij}, and b ∈ ℝ^m is a vector with components b_i.
(c) In general, every function g : ℝ^n → ℝ of the form

    g(y) = g_1(y)^{γ_1} · · · g_r(y)^{γ_r},

where g_k is a posynomial and γ_k > 0 for all k, can be transformed by the logarithmic change of variables into a convex function f given by

    f(x) = Σ_{k=1}^r γ_k lnexp(A_k x + b_k),

with the matrix A_k and the vector b_k being associated with the posynomial g_k for each k.
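The change of variables in part (b) is easy to check numerically. The sketch below uses a hypothetical two-term posynomial g(y_1, y_2) = 2 y_1^{0.5} y_2^{−1} + 3 y_1^{−2} y_2^{3} (the coefficients and exponents are made-up examples), verifies that ln g(e^{x_1}, e^{x_2}) = lnexp(Ax + b), and spot-checks midpoint convexity of f:

```python
import math, random

# Hypothetical posynomial: rows of A are the exponent vectors, b_i = ln(beta_i).
A = [[0.5, -1.0], [-2.0, 3.0]]
beta = [2.0, 3.0]
b = [math.log(bi) for bi in beta]

def g(y):
    return sum(bi * (y[0] ** ai[0]) * (y[1] ** ai[1]) for bi, ai in zip(beta, A))

def f(x):  # lnexp(Ax + b), computed in a numerically stable way
    z = [sum(aij * xj for aij, xj in zip(ai, x)) + bi for ai, bi in zip(A, b)]
    m = max(z)
    return m + math.log(sum(math.exp(zi - m) for zi in z))

random.seed(1)
for _ in range(200):
    x = [random.uniform(-2, 2), random.uniform(-2, 2)]
    y = [math.exp(xi) for xi in x]
    assert abs(f(x) - math.log(g(y))) < 1e-9        # the change of variables
    u = [random.uniform(-2, 2), random.uniform(-2, 2)]
    mid = [(a + c) / 2 for a, c in zip(x, u)]
    assert f(mid) <= (f(x) + f(u)) / 2 + 1e-12      # midpoint convexity of f
```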
1.2.16 (Arithmetic-Geometric Mean Inequality)
Show that if α_1, . . . , α_n are positive scalars with Σ_{i=1}^n α_i = 1, then for every set of positive scalars x_1, . . . , x_n, we have

    x_1^{α_1} x_2^{α_2} · · · x_n^{α_n} ≤ α_1 x_1 + α_2 x_2 + · · · + α_n x_n,

with equality if and only if x_1 = x_2 = · · · = x_n. Hint: Show that −ln x is a strictly convex function on (0, ∞).
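A numeric spot check of the inequality, including the equality case, can look as follows (a sanity test on random data, not a substitute for the proof via strict convexity of −ln x):

```python
import math, random

# Check the weighted arithmetic-geometric mean inequality on random data,
# and the equality case x_1 = ... = x_n.

def weighted_means(alphas, xs):
    geo = math.prod(x ** a for x, a in zip(xs, alphas))
    arith = sum(a * x for x, a in zip(xs, alphas))
    return geo, arith

random.seed(2)
for _ in range(500):
    raw = [random.uniform(0.1, 1.0) for _ in range(4)]
    alphas = [r / sum(raw) for r in raw]          # positive weights, sum to 1
    xs = [random.uniform(0.1, 10.0) for _ in range(4)]
    geo, arith = weighted_means(alphas, xs)
    assert geo <= arith + 1e-12

# Equality when all x_i coincide:
geo, arith = weighted_means([0.25] * 4, [3.0] * 4)
assert abs(geo - arith) < 1e-12
```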
1.2.17
Use the result of Exercise 1.2.16 to verify Young's inequality

    xy ≤ x^p/p + y^q/q,

where p > 0, q > 0, 1/p + 1/q = 1, x ≥ 0, and y ≥ 0. Then, use Young's inequality to verify Hölder's inequality

    Σ_{i=1}^n |x_i y_i| ≤ (Σ_{i=1}^n |x_i|^p)^{1/p} (Σ_{i=1}^n |y_i|^q)^{1/q}.
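Both inequalities are cheap to spot-check numerically; the sketch below uses the hypothetical conjugate pair p = 3, q = 3/2 on random nonnegative scalars and random vectors:

```python
import random

# Numeric spot check of Young's inequality and Holder's inequality for the
# conjugate exponents p = 3, q = 1.5 (so that 1/p + 1/q = 1).

def young(x, y, p, q):
    return x * y <= x ** p / p + y ** q / q + 1e-12

def holder(xs, ys, p, q):
    lhs = sum(abs(a * b) for a, b in zip(xs, ys))
    rhs = (sum(abs(a) ** p for a in xs) ** (1 / p)
           * sum(abs(b) ** q for b in ys) ** (1 / q))
    return lhs <= rhs + 1e-9

p, q = 3.0, 1.5
random.seed(3)
for _ in range(500):
    assert young(random.uniform(0, 5), random.uniform(0, 5), p, q)
    xs = [random.uniform(-3, 3) for _ in range(5)]
    ys = [random.uniform(-3, 3) for _ in range(5)]
    assert holder(xs, ys, p, q)
```

For p = q = 2, the same check reduces to the Cauchy-Schwarz inequality.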
1.2.18 (Convexity under Minimization I)
Let h : ℝ^{n+m} → ℝ be a convex function. Consider the function f : ℝ^n → ℝ given by

    f(x) = inf_{u∈U} h(x, u),

where we assume that U is a nonempty and convex subset of ℝ^m, and that f(x) > −∞ for all x. Show that f is convex, and that for each x, the set {u ∈ U | h(x, u) = f(x)} is convex. Hint: There cannot exist α ∈ [0, 1], x_1, x_2, u_1 ∈ U, u_2 ∈ U such that f(αx_1 + (1 − α)x_2) > αh(x_1, u_1) + (1 − α)h(x_2, u_2).
1.2.19 (Convexity under Minimization II)
(a) Let C be a convex set in ℝ^{n+1} and let

    f(x) = inf{w | (x, w) ∈ C}.

Show that f is convex over ℝ^n.
(b) Let f_1, . . . , f_m be convex functions over ℝ^n and let

    f(x) = inf{ Σ_{i=1}^m f_i(x_i) | Σ_{i=1}^m x_i = x }.

Assuming that f(x) > −∞ for all x, show that f is convex over ℝ^n.
(c) Let h : ℝ^m → ℝ be a convex function and let

    f(x) = inf_{By=x} h(y),

where B is an n × m matrix. Assuming that f(x) > −∞ for all x, show that f is convex over the range space of B.
(d) In parts (b) and (c), show by example that if the assumption f(x) > −∞ for all x is violated, then the set {x | f(x) > −∞} need not be convex.
1.2.20 (Convexity under Minimization III)
Let {f_i | i ∈ I} be an arbitrary collection of convex functions on ℝ^n. Define the convex hull of these functions f : ℝ^n → ℝ as the pointwise infimum of the collection, i.e.,

    f(x) = inf{ w | (x, w) ∈ conv(∪_{i∈I} epi(f_i)) }.

Show that f(x) is given by

    f(x) = inf{ Σ_{i∈I} α_i f_i(x_i) | Σ_{i∈I} α_i x_i = x },

where the infimum is taken over all representations of x as a convex combination of elements x_i, such that only finitely many coefficients α_i are nonzero.
1.2.21 (Convexiﬁcation of Nonconvex Functions)
Let X be a nonempty subset of ℝ^n, and let f : X → ℝ be a function that is bounded below over X. Define the function F : conv(X) → ℝ by

    F(x) = inf{ w | (x, w) ∈ conv(epi(f)) }.

(a) Show that F is convex over conv(X) and that it is given by

    F(x) = inf{ Σ_{i=1}^m α_i f(x_i) | Σ_{i=1}^m α_i x_i = x, x_i ∈ X, Σ_{i=1}^m α_i = 1, α_i ≥ 0, m ≥ 1 }.

(b) Show that

    inf_{x∈conv(X)} F(x) = inf_{x∈X} f(x).

(c) Show that the set of global minima of F over conv(X) includes all global minima of f over X.
1.2.22 (Minimization of Linear Functions)
Show that minimization of a linear function over a set is equivalent to minimization over its convex hull. In particular, if X ⊂ ℝ^n and c ∈ ℝ^n, then

    inf_{x∈X} c′x = inf_{x∈conv(X)} c′x.

Furthermore, the infimum on the left-hand side above is attained if and only if the infimum on the right-hand side is attained.
1.2.23 (Extension of Caratheodory’s Theorem)
Let X_1 and X_2 be subsets of ℝ^n, and let X = conv(X_1) + cone(X_2). Show that every vector x in X can be represented in the form

    x = Σ_{i=1}^k α_i x_i + Σ_{i=k+1}^m α_i x_i,

where m is a positive integer with m ≤ n + 1, the vectors x_1, . . . , x_k belong to X_1, the vectors x_{k+1}, . . . , x_m belong to X_2, and the scalars α_1, . . . , α_m are nonnegative with α_1 + · · · + α_k = 1. Furthermore, the vectors x_2 − x_1, . . . , x_k − x_1, x_{k+1}, . . . , x_m are linearly independent.
1.2.24
Let x_0, . . . , x_m be vectors in ℝ^n such that x_1 − x_0, . . . , x_m − x_0 are linearly independent. The convex hull of x_0, . . . , x_m is called an m-dimensional simplex, and x_0, . . . , x_m are called the vertices of the simplex.
(a) Show that the dimension of a convex set C is the maximum of the dimensions of the simplices included in C.
(b) Use part (a) to show that a nonempty convex set has a nonempty relative interior.
1.2.25
Let X be a bounded subset of ℝ^n. Show that

    cl(conv(X)) = conv(cl(X)).

In particular, if X is closed and bounded, then conv(X) is closed and bounded (cf. Prop. 1.2.8).
1.2.26
Let C1 and C2 be two nonempty convex sets such that C1 ⊂ C2.
(a) Give an example showing that ri(C1) need not be a subset of ri(C2).
(b) Assuming that the sets ri(C1) and ri(C2) have nonempty intersection, show
that ri(C1) ⊂ ri(C2).
(c) Assuming that the sets C1 and ri(C2) have nonempty intersection, show
that the set ri(C1) ∩ ri(C2) is nonempty.
1.2.27
Let C be a nonempty convex set.
(a) Show the following refinement of the Line Segment Principle [Prop. 1.2.9(c)]: x ∈ ri(C) if and only if for every x̄ ∈ aff(C), there exists γ > 1 such that x + (γ − 1)(x − x̄) ∈ C.
(b) Assuming that the origin lies in ri(C), show that cone(C) coincides with aff(C).
(c) Show the following extension of part (b) to a nonconvex set: If X is a nonempty set such that the origin lies in the relative interior of conv(X), then cone(X) coincides with aff(X).
1.2.28
Let C be a compact set.
(a) Assuming that C is a convex set not containing the origin in its relative
boundary, show that cone(C) is closed.
(b) Give examples showing that the assertion of part (a) fails if C is unbounded
or C contains the origin in its relative boundary.
(c) The convexity assumption in part (a) can be relaxed as follows: assuming
that conv(C) does not contain the origin in its boundary, show that cone(C)
is closed. Hint: Use part (a) and Exercise 1.2.7(b).
1.2.29
(a) Let C be a convex cone. Show that ri(C) is also a convex cone.
(b) Let C = cone({x_1, . . . , x_m}). Show that

    ri(C) = { Σ_{i=1}^m α_i x_i | α_i > 0, i = 1, . . . , m }.
1.2.30
Let A be an m × n matrix and let C be a nonempty convex set in ℝ^m. Assuming that the inverse image A⁻¹·C is nonempty, show that

    ri(A⁻¹·C) = A⁻¹·ri(C),    cl(A⁻¹·C) = A⁻¹·cl(C).

[Compare these relations with those of Prop. 1.2.10(d) and (e), respectively.]
1.2.31 (Lipschitz Property of Convex Functions)
Let f : ℝ^n → ℝ be a convex function and let X be a bounded set in ℝ^n. Show that f has the Lipschitz property over X, i.e., there exists a positive scalar c such that

    |f(x) − f(y)| ≤ c‖x − y‖,    ∀ x, y ∈ X.
1.2.32
Let C be a closed convex set and let M be an affine set such that the intersection C ∩ M is nonempty and bounded. Show that for every affine set M̄ that is parallel to M, the intersection C ∩ M̄ is bounded when nonempty.
1.2.33 (Recession Cones of Nonclosed Sets)
Let C be a nonempty convex set.
(a) Show by counterexample that part (b) of the Recession Cone Theorem need not hold when C is not closed.
(b) Show that

    R_C ⊂ R_cl(C),    cl(R_C) ⊂ R_cl(C).

Give an example where the inclusion cl(R_C) ⊂ R_cl(C) is strict, and another example where cl(R_C) = R_cl(C). Also, give an example showing that R_ri(C) need not be a subset of R_C.
(c) Let C̄ be a closed convex set such that C ⊂ C̄. Show that R_C ⊂ R_C̄. Give an example showing that the inclusion can fail if C̄ is not closed.
1.2.34 (Recession Cones of Relative Interiors)
Let C be a nonempty convex set.
(a) Show that a vector y belongs to R_ri(C) if and only if there exists a vector x ∈ ri(C) such that x + αy ∈ C for every α ≥ 0.
(b) Show that R_ri(C) = R_cl(C).
(c) Let C̄ be a relatively open convex set such that C ⊂ C̄. Show that R_C ⊂ R_C̄. Give an example showing that the inclusion can fail if C̄ is not relatively open. [Compare with Exercise 1.2.33(b).]
Hint: In part (a), follow the proof of part (b) of the Recession Cone Theorem. In parts (b) and (c), use the result of part (a).
1.2.35
Let C be a nonempty convex set in ℝ^n and let A be an m × n matrix.
(a) Show the following refinement of Prop. 1.2.14: if R_cl(C) ∩ N(A) = {0}, then

    cl(A·C) = A·cl(C),    A·R_cl(C) = R_{A·cl(C)}.

(b) Give an example showing that A·R_cl(C) and R_{A·cl(C)} can differ when R_cl(C) ∩ N(A) ≠ {0}.
1.2.36 (Lineality Space and Recession Cone)
Let C be a nonempty convex set in ℝ^n. Define the lineality space of C, denoted by L, to be the subspace of vectors y such that simultaneously y ∈ R_C and −y ∈ R_C.
(a) Show that for every subspace S ⊂ L,

    C = (C ∩ S⊥) + S.

(b) Show the following refinement of Prop. 1.2.14 and Exercise 1.2.35: if A is an m × n matrix and R_cl(C) ∩ N(A) is a subspace of L, then

    cl(A·C) = A·cl(C),    R_{A·cl(C)} = A·R_cl(C).
1.2.37
This exercise is a refinement of Prop. 1.2.15.
(a) Let C_1, . . . , C_m be nonempty closed convex sets in ℝ^n such that the equality y_1 + · · · + y_m = 0 with y_i ∈ R_{C_i} implies that each y_i belongs to the lineality space of C_i. Then the vector sum C_1 + · · · + C_m is a closed set and

    R_{C_1+···+C_m} = R_{C_1} + · · · + R_{C_m}.

(b) Show the following extension of part (a) to nonclosed sets: Let C_1, . . . , C_m be nonempty convex sets in ℝ^n such that the equality y_1 + · · · + y_m = 0 with y_i ∈ R_{cl(C_i)} implies that each y_i belongs to the lineality space of cl(C_i). Then we have

    cl(C_1 + · · · + C_m) = cl(C_1) + · · · + cl(C_m),
    R_{cl(C_1+···+C_m)} = R_{cl(C_1)} + · · · + R_{cl(C_m)}.
1.3 CONVEXITY AND OPTIMIZATION
In this section we discuss applications of convexity to some basic optimization issues, such as the existence and uniqueness of global minima. Several other applications, relating to optimality conditions and polyhedral convexity, will be discussed in subsequent sections.
1.3.1 Global and Local Minima
Let X be a nonempty subset of ℝ^n and let f : ℝ^n → (−∞, ∞] be a function. We say that a vector x* ∈ X is a minimum of f over X if f(x*) = inf_{x∈X} f(x). We also call x* a minimizing point or minimizer or global minimum over X. Alternatively, we say that f attains a minimum over X at x*, and we indicate this by writing

    x* ∈ arg min_{x∈X} f(x).

We use similar terminology for maxima, i.e., a vector x* ∈ X such that f(x*) = sup_{x∈X} f(x) is said to be a maximum of f over X, and we indicate this by writing

    x* ∈ arg max_{x∈X} f(x).

If the domain of f is the set X (instead of ℝ^n), we also call x* a (global) minimum or (global) maximum of f (without the qualifier "over X").
A basic question in minimization problems is whether an optimal solution exists. This question can often be resolved with the aid of the classical theorem of Weierstrass, which states that a continuous function attains a minimum over a compact set. We will provide a more general version of this theorem, and to this end, we introduce some terminology. We say that a function f : ℝ^n → (−∞, ∞] is coercive if

    lim_{k→∞} f(x_k) = ∞

for every sequence {x_k} such that ‖x_k‖ → ∞ for some norm ‖·‖. Note that as a consequence of the definition, the level sets {x | f(x) ≤ γ} of a coercive function f are bounded whenever they are nonempty.
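The contrast between coercive and non-coercive functions is easy to see on a one-dimensional grid. In the hypothetical sketch below, f(x) = x² is coercive and its sublevel sets are bounded, while g(x) = eˣ is not coercive (it stays bounded as x → −∞) and its sublevel set at level 1 fills the entire left half of any grid, however wide:

```python
import math

# Compare sublevel sets {x | f(x) <= gamma} of a coercive and a
# non-coercive function on a grid over [-100, 100].

def level_set(f, gamma, grid):
    return [x for x in grid if f(x) <= gamma]

grid = [x / 10.0 for x in range(-1000, 1001)]    # [-100, 100], step 0.1

bounded = level_set(lambda x: x * x, 4.0, grid)
assert min(bounded) == -2.0 and max(bounded) == 2.0   # contained in [-2, 2]

unbounded = level_set(math.exp, 1.0, grid)
# exp(x) <= 1 exactly when x <= 0, so this set always reaches the left
# edge of the grid: no bounded set contains it as the grid widens.
assert min(unbounded) == -100.0 and max(unbounded) == 0.0
```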
Proposition 1.3.1: (Weierstrass' Theorem) Let f : ℝ^n → (−∞, ∞] be a closed function. Assume that one of the following three conditions holds:
(1) dom(f) is compact.
(2) dom(f) is closed and f is coercive.
(3) There exists a scalar γ such that the set {x | f(x) ≤ γ} is nonempty and compact.
Then f attains a minimum over ℝ^n.
Proof: If f(x) = ∞ for all x ∈ ℝ^n, then every x ∈ ℝ^n attains the minimum of f over ℝ^n. Thus, with no loss of generality, we assume that inf_{x∈ℝ^n} f(x) < ∞.

Assume condition (1). Let {x_k} ⊂ dom(f) be a sequence such that

    lim_{k→∞} f(x_k) = inf_{x∈ℝ^n} f(x).

Since dom(f) is bounded, this sequence has at least one limit point x* [Prop. 1.1.5(a)]. Since f is closed, f is lower semicontinuous at x* [cf. Prop. 1.2.2(b)], so that f(x*) ≤ lim_{k→∞} f(x_k) = inf_{x∈ℝ^n} f(x), and x* is a minimum of f.

Assume condition (2). Consider a sequence {x_k} as in the proof under condition (1). Since f is coercive, {x_k} must be bounded, and the proof proceeds similar to the proof under condition (1).

Assume condition (3). If the given γ is equal to inf_{x∈ℝ^n} f(x), the set of minima of f over ℝ^n is {x | f(x) ≤ γ}, and since by assumption this set is nonempty, we are done. If inf_{x∈ℝ^n} f(x) < γ, consider a sequence {x_k} as in the proof under condition (1). Then, for all k sufficiently large, x_k must belong to the set {x | f(x) ≤ γ}. Since this set is compact, {x_k} must be bounded, and the proof proceeds similar to the proof under condition (1). Q.E.D.
The most common application of the above proposition is when we want to minimize a real-valued function f : ℝ^n → ℝ over a nonempty set X. Then, by applying Prop. 1.3.1 to the extended real-valued function

    f̃(x) = f(x) if x ∈ X,  ∞ otherwise,    (1.9)

we see that f attains a minimum over X if X is closed and f is lower semicontinuous over X (implying that f̃ is closed), and either (1) X is bounded, or (2) f is coercive, or (3) some level set {x ∈ X | f(x) ≤ γ} is nonempty and compact.
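The device of Eq. (1.9) translates directly into code: encode the constraint set X by assigning the value +∞ outside it, then minimize the extension everywhere. The sketch below uses the hypothetical choices f(x) = (x − 3)² and X = [0, 1], where the constrained minimum is attained at the boundary point x = 1:

```python
import math

# Extended real-valued function of Eq. (1.9): f_tilde = f on X, +inf outside.
X_LO, X_HI = 0.0, 1.0

def f(x):
    return (x - 3.0) ** 2

def f_tilde(x):
    return f(x) if X_LO <= x <= X_HI else math.inf

# Minimizing f_tilde over a grid of all of R recovers the minimum over X.
grid = [x / 1000.0 for x in range(-5000, 5001)]   # grid over [-5, 5]
x_star = min(grid, key=f_tilde)
assert x_star == 1.0                 # constrained minimizer, on the boundary
assert f_tilde(2.0) == math.inf      # points outside X are "infeasible"
```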
For another example, suppose that we want to minimize a closed convex function f : ℝ^n → (−∞, ∞] over a closed convex set X. Then, by using the function f̃ of Eq. (1.9), we see that f attains a minimum over X if the set X ∩ dom(f) is compact, or if f is coercive, or if some level set {x ∈ X | f(x) ≤ γ} is nonempty and compact.

Note that with appropriate adjustments, the preceding analysis applies to the existence of maxima of f over X. For example, if a real-valued function f is upper semicontinuous at all points of a compact set X, then f attains a maximum over X.
We say that a vector x* ∈ X is a local minimum of f over X if there exists some ε > 0 such that f(x*) ≤ f(x) for every x ∈ X satisfying ‖x − x*‖ ≤ ε, where ‖·‖ is some vector norm. If the domain of f is the set X (instead of ℝ^n), we also call x* a local minimum of f (without the qualifier "over X"). Local maxima are defined similarly.

An important implication of convexity of f and X is that all local minima are also global, as shown in the following proposition and in Fig. 1.3.1.
Figure 1.3.1. Proof of why local minima of convex functions are also global. Suppose that f is convex, and assume, to arrive at a contradiction, that x* is a local minimum that is not global. Then there must exist an x ∈ X such that f(x) < f(x*). By convexity, for all α ∈ (0, 1),

    f(αx* + (1 − α)x) ≤ αf(x*) + (1 − α)f(x) < f(x*).

Thus, f has strictly lower value than f(x*) at every point on the line segment connecting x* with x, except at x*. This contradicts the local minimality of x* over X.
Proposition 1.3.2: If X ⊂ ℝ^n is a convex set and f : X → (−∞, ∞] is a convex function, then a local minimum of f is also a global minimum. If in addition f is strictly convex, then there exists at most one global minimum of f.
Proof: See Fig. 1.3.1 for a proof that a local minimum of f is also global.
Let f be strictly convex, and to obtain a contradiction, assume that two
distinct global minima x and y exist. Then the midpoint (x + y)/2 must
belong to X, since X is convex. Furthermore, the value of f must be
smaller at the midpoint than at x and y by the strict convexity of f. Since
x and y are global minima, we obtain a contradiction. Q.E.D.
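One practical consequence of Prop. 1.3.2 is that a local descent method applied to a strictly convex function cannot get trapped: started anywhere, it converges to the one global minimizer. A minimal sketch for the hypothetical function f(x) = (x − 2)² + 1, whose unique minimizer is x* = 2:

```python
# Plain gradient descent on the strictly convex f(x) = (x - 2)^2 + 1.
# By Prop. 1.3.2 there are no non-global local minima, so every starting
# point leads to the same minimizer x* = 2.

def grad_descent(x0, grad, step=0.1, iters=200):
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x

grad = lambda x: 2.0 * (x - 2.0)     # gradient of (x - 2)^2 + 1
starts = [-50.0, 0.0, 3.0, 100.0]
sols = [grad_descent(x0, grad) for x0 in starts]
assert all(abs(s - 2.0) < 1e-6 for s in sols)   # same minimizer from all starts
```

The step size 0.1 here is an ad hoc choice that happens to contract the iteration (each step scales the error by 0.8); for general functions, step selection needs more care.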
1.3.2 The Projection Theorem
In this section we develop a basic result of analysis and optimization.
Proposition 1.3.3: (Projection Theorem) Let C be a nonempty closed convex set and let ‖·‖ be the Euclidean norm.
(a) For every x ∈ ℝ^n, there exists a unique vector z ∈ C that minimizes ‖z − x‖ over all z ∈ C. This vector is called the projection of x on C, and is denoted by P_C(x), i.e.,

    P_C(x) = arg min_{z∈C} ‖z − x‖.

(b) For every x ∈ ℝ^n, a vector z ∈ C is equal to P_C(x) if and only if

    (y − z)′(x − z) ≤ 0,    ∀ y ∈ C.

(c) The function f : ℝ^n → C defined by f(x) = P_C(x) is continuous and nonexpansive, i.e.,

    ‖P_C(x) − P_C(y)‖ ≤ ‖x − y‖,    ∀ x, y ∈ ℝ^n.

(d) The distance function d : ℝ^n → ℝ, defined by

    d(x, C) = min_{z∈C} ‖z − x‖,

is convex.
Proof: (a) Fix x and let w be some element of C. Minimizing ‖x − z‖ over all z ∈ C is equivalent to minimizing the same function over all z ∈ C such that ‖x − z‖ ≤ ‖x − w‖, which is a compact set. Furthermore, the function g defined by g(z) = ‖z − x‖² is continuous. Existence of a minimizing vector follows by Weierstrass' Theorem (Prop. 1.3.1).

To prove uniqueness, notice that the square of the Euclidean norm is a strictly convex function of its argument because its Hessian matrix is the identity matrix, which is positive definite [Prop. 1.2.5(b)]. Therefore, g is strictly convex, and it follows that its minimum is attained at a unique point (Prop. 1.3.2).
(b) For all y and z in C we have

    ‖y − x‖² = ‖y − z‖² + ‖z − x‖² − 2(y − z)′(x − z) ≥ ‖z − x‖² − 2(y − z)′(x − z).

Therefore, if z is such that (y − z)′(x − z) ≤ 0 for all y ∈ C, we have ‖y − x‖² ≥ ‖z − x‖² for all y ∈ C, implying that z = P_C(x).

Conversely, let z = P_C(x), consider any y ∈ C, and for α > 0, define y_α = αy + (1 − α)z. We have

    ‖x − y_α‖² = ‖(1 − α)(x − z) + α(x − y)‖²
               = (1 − α)²‖x − z‖² + α²‖x − y‖² + 2(1 − α)α(x − z)′(x − y).

Viewing ‖x − y_α‖² as a function of α, we have

    (∂/∂α)‖x − y_α‖² |_{α=0} = −2‖x − z‖² + 2(x − z)′(x − y) = −2(y − z)′(x − z).

Therefore, if (y − z)′(x − z) > 0 for some y ∈ C, then

    (∂/∂α)‖x − y_α‖² |_{α=0} < 0,

and for positive but small enough α, we obtain ‖x − y_α‖ < ‖x − z‖. This contradicts the fact that z = P_C(x) and shows that (y − z)′(x − z) ≤ 0 for all y ∈ C.
(c) Let x and y be elements of ℝ^n. From part (b), we have (w − P_C(x))′(x − P_C(x)) ≤ 0 for all w ∈ C. Since P_C(y) ∈ C, we obtain

    (P_C(y) − P_C(x))′(x − P_C(x)) ≤ 0.

Similarly,

    (P_C(x) − P_C(y))′(y − P_C(y)) ≤ 0.

Adding these two inequalities, we obtain

    (P_C(y) − P_C(x))′(x − P_C(x) − y + P_C(y)) ≤ 0.

By rearranging and by using the Schwarz inequality, we have

    ‖P_C(y) − P_C(x)‖² ≤ (P_C(y) − P_C(x))′(y − x) ≤ ‖P_C(y) − P_C(x)‖ · ‖y − x‖,

showing that P_C(·) is nonexpansive and a fortiori continuous.
(d) Assume, to arrive at a contradiction, that there exist x_1, x_2 ∈ ℝ^n and an α ∈ (0, 1) such that

    d(αx_1 + (1 − α)x_2, C) > αd(x_1, C) + (1 − α)d(x_2, C).

Then there must exist z_1, z_2 ∈ C such that

    d(αx_1 + (1 − α)x_2, C) > α‖z_1 − x_1‖ + (1 − α)‖z_2 − x_2‖,

which implies that

    ‖αz_1 + (1 − α)z_2 − αx_1 − (1 − α)x_2‖ > α‖x_1 − z_1‖ + (1 − α)‖x_2 − z_2‖.

This contradicts the triangle inequality in the definition of norm. Q.E.D.
Figure 1.3.2 illustrates the necessary and sufficient condition of part (b) of the Projection Theorem.

Figure 1.3.2. Illustration of the condition satisfied by the projection P_C(x). For each vector y ∈ C, the vectors x − P_C(x) and y − P_C(x) form an angle greater than or equal to π/2 or, equivalently,

    (y − P_C(x))′(x − P_C(x)) ≤ 0.
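For a set with a closed-form projection, parts (b) and (c) of the Projection Theorem can be checked directly. The sketch below takes C to be the closed unit Euclidean ball, for which P_C(x) = x / max(1, ‖x‖), and tests the variational inequality and nonexpansiveness on random points (a numeric illustration, not a proof):

```python
import math, random

# Projection onto the closed unit Euclidean ball, and numeric checks of
# the Projection Theorem, parts (b) and (c).

def norm(v):
    return math.sqrt(sum(vi * vi for vi in v))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def proj_ball(x):
    s = max(1.0, norm(x))          # shrink toward the origin if outside
    return [xi / s for xi in x]

random.seed(4)
for _ in range(300):
    x = [random.uniform(-3, 3) for _ in range(3)]
    y = [random.uniform(-3, 3) for _ in range(3)]
    z, w = proj_ball(x), proj_ball(y)
    # (b): (y_c - z)'(x - z) <= 0 for every y_c in C
    yc = proj_ball([random.uniform(-1, 1) for _ in range(3)])
    assert dot([a - b for a, b in zip(yc, z)],
               [a - b for a, b in zip(x, z)]) <= 1e-12
    # (c): ||P_C(x) - P_C(y)|| <= ||x - y||
    assert (norm([a - b for a, b in zip(z, w)])
            <= norm([a - b for a, b in zip(x, y)]) + 1e-12)
```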
1.3.3 Directions of Recession and Existence of Optimal Solutions
The recession cone, discussed in Section 1.2.4, is also useful for analyzing
the existence of optimal solutions of convex optimization problems. A key
idea here is that a convex function can be described in terms of its epigraph,
which is a convex set. The recession cone of the epigraph can be used to
obtain the directions along which the function decreases monotonically.
This is the idea underlying the following proposition.
Proposition 1.3.4: Let f : ℝ^n → (−∞, ∞] be a closed convex function and consider the level sets L_a = {x | f(x) ≤ a}.
(a) All the level sets L_a that are nonempty have the same recession cone, given by

    R_{L_a} = { y | (y, 0) ∈ R_epi(f) },

where R_epi(f) is the recession cone of the epigraph of f.
(b) If one nonempty level set L_a is compact, then all nonempty level sets are compact.
Proof: From the formula for the epigraph,

    epi(f) = {(x, w) | f(x) ≤ w},

it can be seen that for all a for which L_a is nonempty, we have

    {(x, a) | x ∈ L_a} = epi(f) ∩ {(x, a) | x ∈ ℝ^n}.

The recession cone of the set on the left-hand side above is {(y, 0) | y ∈ R_{L_a}}. By Prop. 1.2.13(e), the recession cone of the set on the right-hand side is equal to the intersection of the recession cone of epi(f) and the recession cone of {(x, a) | x ∈ ℝ^n}, which is equal to {(y, 0) | y ∈ ℝ^n}, the horizontal subspace that passes through the origin. Thus we have

    {(y, 0) | y ∈ R_{L_a}} = {(y, 0) | (y, 0) ∈ R_epi(f)},

from which it follows that

    R_{L_a} = { y | (y, 0) ∈ R_epi(f) }.

This proves part (a). Part (b) follows by applying Prop. 1.2.13(c) to the recession cone of epi(f). Q.E.D.
For a closed convex function f : ℝ^n → (−∞, ∞], the (common) recession cone of the nonempty level sets of f is referred to as the recession cone of f, and is denoted by R_f (see Fig. 1.3.3). Thus

    R_f = { y | f(x) ≥ f(x + αy), ∀ x ∈ ℝ^n, α ≥ 0 }.

Each y ∈ R_f is called a direction of recession of f. Since f is closed, its level sets are closed, so by the Recession Cone Theorem (cf. Prop. 1.2.13), R_f is also closed.
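The defining condition f(x) ≥ f(x + αy) is straightforward to test numerically on a finite set of points and step sizes. A sketch for the hypothetical closed convex function f(x_1, x_2) = x_1² (constant in x_2), whose recession cone is the x_2-axis:

```python
# For f(x1, x2) = x1**2, the direction (0, 1) never increases f, so it is
# a direction of recession; the direction (1, 0) eventually increases f
# from some starting points, so it is not.  A finite check over sample
# points and step sizes can falsify membership in R_f, not prove it.

def f(x):
    return x[0] ** 2

def is_nonascent_dir(y, points, alphas):
    return all(f([x[0] + a * y[0], x[1] + a * y[1]]) <= f(x) + 1e-12
               for x in points for a in alphas)

points = [(-2.0, 1.0), (0.0, 0.0), (3.5, -4.0)]
alphas = [0.0, 0.5, 1.0, 10.0, 100.0]
assert is_nonascent_dir((0.0, 1.0), points, alphas)       # in R_f
assert not is_nonascent_dir((1.0, 0.0), points, alphas)   # leaves level sets
```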
Figure 1.3.3. Recession cone R_f of a closed convex function f. It is the (common) recession cone of the nonempty level sets of f.
The most intuitive way to look at directions of recession is from a descent viewpoint: if we start at any x ∈ ℝ^n and move indefinitely along a direction of recession y, we must stay within each level set that contains x, or equivalently we must encounter exclusively points z with f(z) ≤ f(x). In words, a direction of recession of f is a direction of uninterrupted nonascent for f. Conversely, if we start at some x ∈ ℝ^n and, while moving along a direction y, we encounter a point z with f(z) > f(x), then y cannot be a direction of recession. It is easily seen via a convexity argument that once we cross the relative boundary of a level set of f we never cross it back again, and with a little thought (see Fig. 1.3.4 and the exercises), it follows that a direction that is not a direction of recession of f is a direction of eventual uninterrupted ascent of f. In view of these observations, it is not surprising that directions of recession play a prominent role in characterizing the existence of solutions of convex optimization problems, as shown in the following proposition.
Proposition 1.3.5: (Existence of Solutions of Convex Programs) Let $X$ be a closed convex subset of $\Re^n$, and let $f : \Re^n \to (-\infty, \infty]$ be a closed convex function such that $f(x) < \infty$ for at least one vector $x \in X$. Let $X^*$ be the set of minimizing points of $f$ over $X$. Then $X^*$ is nonempty and compact if and only if $X$ and $f$ have no common nonzero direction of recession.
Proof: Let $f^* = \inf_{x \in X} f(x)$, and note that

$X^* = X \cap \{x \mid f(x) \le f^*\}$.

Since by assumption $f^* < \infty$, and $X$ and $f$ are closed, the two sets in the above intersection are closed, so $X^*$ is closed as well as convex. If $X^*$ is nonempty and compact, it has no nonzero direction of recession [Prop. 1.2.13(c)]. Therefore, there is no nonzero vector in the intersection of the recession cones of $X$ and $\{x \mid f(x) \le f^*\}$. This is equivalent to $X$ and $f$ having no common nonzero direction of recession.

Figure 1.3.4. Ascent/descent behavior of a closed convex function starting at some $x \in \Re^n$ and moving along a direction $y$ [each panel plots $f(x + \alpha y)$ as a function of $\alpha$]. If $y$ is a direction of recession of $f$, there are two possibilities: either $f$ decreases monotonically to a finite value or to $-\infty$ [figures (a) and (b), respectively], or $f$ reaches a value that is less than or equal to $f(x)$ and stays at that value [figures (c) and (d)]. If $y$ is not a direction of recession of $f$, then eventually $f$ increases monotonically to $\infty$ [figures (e) and (f)]; i.e., for some $\bar\alpha \ge 0$ and all $\alpha_1, \alpha_2 \ge \bar\alpha$ with $\alpha_1 < \alpha_2$ we have $f(x + \alpha_1 y) < f(x + \alpha_2 y)$.
Conversely, let $a$ be a scalar such that the set

$X_a = X \cap \{x \mid f(x) \le a\}$

is nonempty and has no nonzero direction of recession. Then $X_a$ is closed [since $X$ is closed and $\{x \mid f(x) \le a\}$ is closed by the closedness of $f$], and by Prop. 1.2.13(c), $X_a$ is compact. Since minimization of $f$ over $X$ and over $X_a$ yields the same set of minima, $X^*$, by Weierstrass' Theorem (Prop. 1.3.1) $X^*$ is nonempty, and since $X^* \subset X_a$, we see that $X^*$ is bounded. Since $X^*$ is closed, it is compact. Q.E.D.
If the closed convex set $X$ and the closed convex function $f$ of the above proposition have a common direction of recession, then either $X^*$ is empty [take, for example, $X = (-\infty, 0]$ and $f(x) = e^x$] or else $X^*$ is nonempty and unbounded [take, for example, $X = (-\infty, 0]$ and $f(x) = \max\{0, x\}$].
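The two cases in this remark can be observed numerically. The sketch below (a minimal illustration using the two functions just cited, with sample points chosen as an assumption) samples each function on $X = (-\infty, 0]$:

```python
import math

# Case 1: f(x) = exp(x) on X = (-inf, 0]. The infimum is 0 but is never
# attained: the values keep decreasing as x -> -inf, so X* is empty.
vals = [math.exp(x) for x in (-1.0, -10.0, -100.0)]
print(vals[0] > vals[1] > vals[2] > 0.0)  # True: strictly decreasing toward 0

# Case 2: f(x) = max(0, x) on X = (-inf, 0]. Every x <= 0 attains the
# minimum value 0, so X* = (-inf, 0] is nonempty and unbounded.
print(all(max(0.0, x) == 0.0 for x in (-1.0, -50.0, -1e6)))  # True
```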
Another interesting question is what happens when $X$ and $f$ have a common direction of recession, call it $y$, but $f$ is bounded below over $X$:

$f^* = \inf_{x \in X} f(x) > -\infty$.

Then for any $x \in X$, we have $x + \alpha y \in X$ (since $y$ is a direction of recession of $X$), and $f(x + \alpha y)$ is monotonically nonincreasing and converges to a finite value as $\alpha \to \infty$ (since $y$ is a direction of recession of $f$ and $f^* > -\infty$). Generally, the minimum of $f$ over $X$ need not be attained. However, it turns out that the minimum is attained in an important special case: when $f$ is quadratic and $X$ is polyhedral (i.e., is defined by linear equality and inequality constraints).
To understand the main idea, consider the problem

minimize $f(x) = c'x + \tfrac{1}{2}x'Qx$
subject to $Ax = 0$,    (1.10)

where $Q$ is a positive semidefinite symmetric $n \times n$ matrix, $c \in \Re^n$ is a given vector, and $A$ is an $m \times n$ matrix. Let $N(Q)$ and $N(A)$ denote the nullspaces of $Q$ and $A$, respectively. There are two possibilities:

(a) For some $x \in N(A) \cap N(Q)$, we have $c'x \ne 0$. Then, since $f(\alpha x) = \alpha c'x$ for all $\alpha \in \Re$, it follows that $f$ becomes unbounded from below either along $x$ or along $-x$.

(b) For all $x \in N(A) \cap N(Q)$, we have $c'x = 0$. In this case, we have $f(x) = 0$ for all $x \in N(A) \cap N(Q)$. For $x \in N(A)$ such that $x \notin N(Q)$, since $N(Q)$ and $R(Q)$, the range of $Q$, are orthogonal subspaces, $x$ can be uniquely decomposed as $x_R + x_N$, where $x_N \in N(Q)$ and $x_R \in R(Q)$, and we have $f(x) = c'x + (1/2)x_R'Qx_R$, where $x_R$ is the (nonzero) component of $x$ along $R(Q)$. Hence $f(\alpha x) = \alpha c'x + (1/2)\alpha^2 x_R'Qx_R$ for all $\alpha > 0$, with $x_R'Qx_R > 0$. It follows that $f$ is bounded below along all feasible directions $x \in N(A)$.

We thus conclude that for $f$ to be bounded from below along all directions in $N(A)$ it is necessary and sufficient that $c'x = 0$ for all $x \in N(A) \cap N(Q)$. However, boundedness from below of a convex cost function $f$ along all directions of recession of a constraint set does not guarantee existence of an optimal solution, or even boundedness from below over the constraint set (see the exercises). On the other hand, since the constraint set $N(A)$ is a subspace, it is possible to use a transformation $x = Bz$, where the columns of the matrix $B$ are basis vectors for $N(A)$, and view the problem as an unconstrained minimization over $z$ of the cost function $h(z) = f(Bz)$, which is positive semidefinite quadratic. We can then argue that boundedness from below of this function along all directions $z$ is necessary and sufficient for existence of an optimal solution. This argument indicates that problem (1.10) has an optimal solution if and only if $c'x = 0$ for all $x \in N(A) \cap N(Q)$. By using a translation argument, this result can also be extended to the case where the constraint set is a general affine set of the form $\{x \mid Ax = b\}$ rather than the subspace $\{x \mid Ax = 0\}$.
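As a concrete sanity check of case (a) above, the following sketch builds a small instance of problem (1.10) for which the condition $c'x = 0$ on $N(A) \cap N(Q)$ fails; the particular $Q$, $A$, and $c$ are assumptions chosen for illustration, not data from the text.

```python
# Hypothetical 2x2 instance (an assumption for illustration) of
#   minimize c'x + (1/2) x'Qx  subject to Ax = 0.
def f(c, Q, x):
    # Evaluate c'x + (1/2) x'Qx for 2-vectors, written out by hand.
    return (c[0] * x[0] + c[1] * x[1]
            + 0.5 * (x[0] * (Q[0][0] * x[0] + Q[0][1] * x[1])
                     + x[1] * (Q[1][0] * x[0] + Q[1][1] * x[1])))

Q = [[0.0, 0.0], [0.0, 1.0]]   # positive semidefinite, N(Q) = {(t, 0)}
A = [0.0, 1.0]                 # single constraint row, N(A) = {(t, 0)}
c = [1.0, 0.0]                 # c'x = t on N(A) ∩ N(Q), nonzero for t != 0

# Along the feasible direction x = (-t, 0), f decreases without bound,
# so problem (1.10) has no optimal solution for this instance:
values = [f(c, Q, [-t, 0.0]) for t in (1.0, 10.0, 100.0)]
print(values)  # [-1.0, -10.0, -100.0]
```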
In part (a) of the following proposition we state the result just described (equality constraints only). While we can prove the result by formalizing the argument outlined above, we will use instead a more elementary variant of this argument, whereby the constraints are eliminated via a penalty function; this will give us the opportunity to introduce a line of proof that we will frequently employ in other contexts as well. In part (b) of the proposition, we allow linear inequality constraints, and we show that a convex quadratic program has an optimal solution if and only if its optimal value is bounded below. Note that the cost function may be linear, so the proposition applies to linear programs as well.
Proposition 1.3.6: (Existence of Solutions of Quadratic Programs) Let $f : \Re^n \to \Re$ be a quadratic function of the form

$f(x) = c'x + \tfrac{1}{2}x'Qx$,

where $Q$ is a positive semidefinite symmetric $n \times n$ matrix and $c \in \Re^n$ is a given vector. Let also $A$ be an $m \times n$ matrix and $b \in \Re^m$ be a vector. Denote by $N(A)$ and $N(Q)$ the nullspaces of $A$ and $Q$, respectively.

(a) Let $X = \{x \mid Ax = b\}$ and assume that $X$ is nonempty. The following are equivalent:

(i) $f$ attains a minimum over $X$.
(ii) $f^* = \inf_{x \in X} f(x) > -\infty$.
(iii) $c'y = 0$ for all $y \in N(A) \cap N(Q)$.

(b) Let $X = \{x \mid Ax \le b\}$ and assume that $X$ is nonempty. The following are equivalent:

(i) $f$ attains a minimum over $X$.
(ii) $f^* = \inf_{x \in X} f(x) > -\infty$.
(iii) $c'y \ge 0$ for all $y \in N(Q)$ such that $Ay \le 0$.
Proof: (a) (i) clearly implies (ii).

We next show that (ii) implies (iii). For all $x \in X$, $y \in N(A) \cap N(Q)$, and $\alpha \in \Re$, we have $x + \alpha y \in X$ and

$f(x + \alpha y) = c'(x + \alpha y) + \tfrac{1}{2}(x + \alpha y)'Q(x + \alpha y) = f(x) + \alpha c'y$.

If $c'y \ne 0$, then either $\lim_{\alpha \to \infty} f(x + \alpha y) = -\infty$ or $\lim_{\alpha \to -\infty} f(x + \alpha y) = -\infty$, and we must have $f^* = -\infty$. Hence (ii) implies that $c'y = 0$ for all $y \in N(A) \cap N(Q)$.

We finally show that (iii) implies (i) by first using a translation argument and then using a penalty function argument. Choose any $x \in X$, so that $X = x + N(A)$. Then minimizing $f$ over $X$ is equivalent to minimizing $f(x + y)$ over $y \in N(A)$, or

minimize $h(y)$
subject to $Ay = 0$,

where

$h(y) = f(x + y) = f(x) + \nabla f(x)'y + \tfrac{1}{2}y'Qy$.
For any integer $k > 0$, let

$h_k(y) = h(y) + \tfrac{k}{2}\|Ay\|^2 = f(x) + \nabla f(x)'y + \tfrac{1}{2}y'Qy + \tfrac{k}{2}\|Ay\|^2$.    (1.11)

Note that for all $k$,

$h_k(y) \le h_{k+1}(y), \quad \forall\, y \in \Re^n$,

and

$\inf_{y \in \Re^n} h_k(y) \le \inf_{y \in \Re^n} h_{k+1}(y) \le \inf_{Ay = 0} h(y) \le h(0) = f(x)$.    (1.12)

Denote

$S = \big(N(A) \cap N(Q)\big)^\perp$

and write any $y \in \Re^n$ as $y = z + w$, where $z \in S$ and $w \in S^\perp = N(A) \cap N(Q)$. Then, by using the assumption $c'w = 0$ [implying that $\nabla f(x)'w = (c + Qx)'w = 0$], we see from Eq. (1.11) that

$h_k(y) = h_k(z + w) = h_k(z)$,    (1.13)

i.e., $h_k$ is determined in terms of its restriction to the subspace $S$. It can be seen from Eq. (1.11) that the function $h_k$ has no nonzero direction of recession in common with $S$, so $h_k(z)$ attains a minimum over $S$, call it $y_k$, and in view of Eq. (1.13), $y_k$ also attains the minimum of $h_k(y)$ over $\Re^n$.

From Eq. (1.12), we have

$h_k(y_k) \le h_{k+1}(y_{k+1}) \le \inf_{Ay = 0} h(y) \le f(x)$,    (1.14)
and we will use this relation to show that $\{y_k\}$ is bounded and each of its limit points minimizes $h(y)$ subject to $Ay = 0$. Indeed, from Eq. (1.14), the sequence $\{h_k(y_k)\}$ is bounded, so if $\{y_k\}$ were unbounded, then assuming without loss of generality that $y_k \ne 0$, we would have $h_k(y_k)/\|y_k\| \to 0$, or

$\lim_{k \to \infty} \left\{ \frac{1}{\|y_k\|}f(x) + \nabla f(x)'\hat y_k + \|y_k\|\left(\tfrac{1}{2}\hat y_k'Q\hat y_k + \tfrac{k}{2}\|A\hat y_k\|^2\right) \right\} = 0$,

where $\hat y_k = y_k/\|y_k\|$. For this to be true, all limit points $\hat y$ of the bounded sequence $\{\hat y_k\}$ must be such that $\hat y'Q\hat y = 0$ and $A\hat y = 0$, which is impossible since $\|\hat y\| = 1$ and $\hat y \in S$. Thus $\{y_k\}$ is bounded, and for any one of its limit points, call it $\bar y$, we have $\bar y \in S$ and

$\limsup_{k \to \infty} h_k(y_k) = f(x) + \nabla f(x)'\bar y + \tfrac{1}{2}\bar y'Q\bar y + \limsup_{k \to \infty} \tfrac{k}{2}\|Ay_k\|^2 \le \inf_{Ay = 0} h(y)$.

It follows that $A\bar y = 0$ and that $\bar y$ minimizes $h(y)$ subject to $Ay = 0$. This implies that the vector $x + \bar y$ minimizes $f(x)$ subject to $Ax = b$.
(b) Clearly (i) implies (ii), and similar to the proof of part (a), (ii) implies that $c'y \ge 0$ for all $y \in N(Q)$ with $Ay \le 0$.

Finally, we show that (iii) implies (i) by using the corresponding result of part (a). For any $x \in X$, let $J(x)$ denote the index set of active constraints at $x$, i.e., $J(x) = \{j \mid a_j'x = b_j\}$, where the $a_j'$ are the rows of $A$.

For any sequence $\{x_k\} \subset X$ with $f(x_k) \to f^*$, we can extract a subsequence such that $J(x_k)$ is constant and equal to some $J$. Accordingly, we select a sequence $\{x_k\} \subset X$ such that $f(x_k) \to f^*$, and the index set $J(x_k)$ is equal for all $k$ to a set $J$ that is maximal over all such sequences [for any other sequence $\{\bar x_k\} \subset X$ with $f(\bar x_k) \to f^*$ and such that $J(\bar x_k) = \bar J$ for all $k$, we cannot have $J \subset \bar J$ unless $J = \bar J$].

Consider the problem

minimize $f(x)$
subject to $a_j'x = b_j$, $j \in J$.    (1.15)

Assume, to come to a contradiction, that this problem does not have a solution. Then, by part (a), we have $c'\bar y < 0$ for some $\bar y \in N(\bar A) \cap N(Q)$, where $\bar A$ is the matrix having as rows the $a_j'$, $j \in J$. Consider the line $\{x_k + \gamma \bar y \mid \gamma > 0\}$. Since $\bar y \in N(Q)$, we have

$f(x_k + \gamma \bar y) = f(x_k) + \gamma c'\bar y$,

so that

$f(x_k + \gamma \bar y) < f(x_k), \quad \forall\, \gamma > 0$.

Furthermore, since $\bar y \in N(\bar A)$, we have

$a_j'(x_k + \gamma \bar y) = b_j, \quad \forall\, j \in J,\ \gamma > 0$.

We must also have $a_j'\bar y > 0$ for at least one $j \notin J$ [otherwise (iii) would be violated], so the line $\{x_k + \gamma \bar y \mid \gamma > 0\}$ crosses the relative boundary of $X$ for some $\gamma_k > 0$. The sequence $\{\bar x_k\}$, where $\bar x_k = x_k + \gamma_k \bar y$, satisfies $\{\bar x_k\} \subset X$, $f(\bar x_k) \to f^*$ [since $f(\bar x_k) \le f(x_k)$], and the active index set $J(\bar x_k)$ strictly contains $J$ for all $k$. This contradicts the maximality of $J$, and shows that problem (1.15) has an optimal solution, call it $\tilde x$.

Since $x_k$ is a feasible solution of problem (1.15), we have

$f(\tilde x) \le f(x_k), \quad \forall\, k$,

so that

$f(\tilde x) \le f^*$.

We will now show that $\tilde x$ minimizes $f$ over $X$, by showing that $\tilde x \in X$, thereby completing the proof. Assume, to arrive at a contradiction, that $\tilde x \notin X$. Let $\hat x_k$ be a point in the interval connecting $x_k$ and $\tilde x$ that belongs to $X$ and is closest to $\tilde x$. We have that $J(\hat x_k)$ strictly contains $J$ for all $k$. Since $f(\tilde x) \le f(x_k)$ and $f$ is convex over the interval $[x_k, \tilde x]$, it follows that

$f(\hat x_k) \le \max\{f(x_k), f(\tilde x)\} = f(x_k)$.

Thus $f(\hat x_k) \to f^*$, which contradicts the maximality of $J$. Q.E.D.
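A one-dimensional instance of Prop. 1.3.6(b) with $Q = 0$ (so the program is linear) can be checked numerically. With a single constraint $x \le b$, condition (iii) reads $c \cdot y \ge 0$ for every $y \le 0$, i.e., $c \le 0$. The sampling below is a crude numeric proxy for boundedness, an assumption made for illustration only.

```python
# 1-D sketch of Prop. 1.3.6(b) with Q = 0: minimize c*x subject to x <= b.
# Condition (iii) here reduces to c <= 0.
def bounded_below(c, b, xs=(-1e6, -1e3, 0.0)):
    """Crude numeric proxy: sample c*x over feasible points and check
    the smallest sampled value is not driven far toward -infinity."""
    vals = [c * x for x in xs if x <= b]
    return min(vals) > -1e5

print(bounded_below(c=-2.0, b=1.0))  # True: condition (iii) holds, minimum attained at x = b
print(bounded_below(c=2.0, b=1.0))   # False: c*x -> -inf as x -> -inf, no minimum
```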
1.3.4 Existence of Saddle Points

Suppose that we are given a function $\phi : X \times Z \to \Re$, where $X \subset \Re^n$ and $Z \subset \Re^m$, and we wish to either

minimize $\sup_{z \in Z} \phi(x, z)$
subject to $x \in X$

or

maximize $\inf_{x \in X} \phi(x, z)$
subject to $z \in Z$.

These problems are encountered in at least three major optimization contexts:

(1) Worst-case design, whereby we view $z$ as a parameter and we wish to minimize over $x$ a cost function, assuming the worst possible value of $z$. A special case of this is the discrete minimax problem, where we want to minimize over $x \in X$

$\max\{f_1(x), \ldots, f_m(x)\}$,

where the $f_i$ are some given functions. Here, $Z$ is the finite set $\{1, \ldots, m\}$. Within this context, it is important to provide characterizations of the max function

$\max_{z \in Z} \phi(x, z)$,

particularly its directional derivative. We do this in Section 1.7, where we discuss the differentiability properties of convex functions.

(2) Exact penalty functions, which can be used for example to convert constrained optimization problems of the form

minimize $f(x)$
subject to $x \in X$, $g_j(x) \le 0$, $j = 1, \ldots, r$,    (1.16)

to (less constrained) minimax problems of the form

minimize $f(x) + c \max\{0, g_1(x), \ldots, g_r(x)\}$
subject to $x \in X$,

where $c$ is a large positive penalty parameter. This conversion is useful for both analytical and computational purposes, and will be discussed in Chapters 2 and 4.

(3) Duality theory, where, using problem (1.16) as an example, we introduce the so-called Lagrangian function

$L(x, \mu) = f(x) + \sum_{j=1}^r \mu_j g_j(x)$

involving the vector $\mu = (\mu_1, \ldots, \mu_r) \in \Re^r$, and the dual problem

maximize $\inf_{x \in X} L(x, \mu)$
subject to $\mu \ge 0$.    (1.17)

The original (primal) problem (1.16) can also be written as

minimize $\sup_{\mu \ge 0} L(x, \mu)$
subject to $x \in X$

[if $x$ violates any of the constraints $g_j(x) \le 0$, we have $\sup_{\mu \ge 0} L(x, \mu) = \infty$, and if it does not, we have $\sup_{\mu \ge 0} L(x, \mu) = f(x)$]. Thus the primal and the dual problems (1.16) and (1.17) can be viewed in terms of a minimax problem.
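The bracketed observation about $\sup_{\mu \ge 0} L(x, \mu)$ can be illustrated with a toy one-constraint problem; the functions $f$ and $g$ below are assumptions chosen for illustration, and a finite grid of multipliers stands in for the exact supremum.

```python
# Toy illustration (assumed example): f(x) = x**2 with one constraint
# g(x) = x - 1 <= 0, and L(x, mu) = f(x) + mu * g(x).
def f(x): return x ** 2
def g(x): return x - 1.0
def L(x, mu): return f(x) + mu * g(x)

def sup_L(x, mus=(0.0, 1.0, 10.0, 100.0, 1e6)):
    # Finite grid of mu >= 0 as a stand-in for the exact supremum.
    return max(L(x, mu) for mu in mus)

print(sup_L(0.5))  # feasible point: g(0.5) < 0, sup attained at mu = 0, giving f(0.5) = 0.25
print(sup_L(2.0))  # infeasible point: g(2.0) > 0, the value grows without bound in mu
```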
We will now derive conditions guaranteeing that

$\sup_{z \in Z} \inf_{x \in X} \phi(x, z) = \inf_{x \in X} \sup_{z \in Z} \phi(x, z)$,    (1.18)

and that the inf and the sup above are attained. This is a major issue in duality theory because it connects the primal and the dual problems [cf. Eqs. (1.16) and (1.17)] through their optimal values and optimal solutions. In particular, when we discuss duality in Chapter 3, we will see that a major question is whether there is no duality gap, i.e., whether the optimal primal and dual values are equal. This is so if and only if

$\sup_{\mu \ge 0} \inf_{x \in X} L(x, \mu) = \inf_{x \in X} \sup_{\mu \ge 0} L(x, \mu)$.    (1.19)

We will prove in this section one major result, the Saddle Point Theorem, which guarantees the equality (1.18), as well as the attainment of the inf and sup, assuming convexity/concavity assumptions on $\phi$ and (essentially) compactness assumptions on $X$ and $Z$.† Unfortunately, the Saddle Point Theorem is only partially adequate for the development of duality theory, because compactness of $Z$ and, to some extent, compactness of $X$ turn out to be restrictive assumptions [for example, $Z$ corresponds to the set $\{\mu \mid \mu \ge 0\}$ in Eq. (1.19), which is not compact]. We will state another major result, the Minimax Theorem, which is more relevant to duality theory and gives conditions guaranteeing the minimax equality (1.18), although it does not guarantee the attainment of the inf and sup. We will also give additional theorems of the minimax type in Chapter 3, when we discuss duality and make a closer connection with the theory of Lagrange multipliers.
A first observation regarding the potential validity of the minimax equality (1.18) is that we always have the inequality

$\sup_{z \in Z} \inf_{x \in X} \phi(x, z) \le \inf_{x \in X} \sup_{z \in Z} \phi(x, z)$    (1.20)

[for every $\bar z \in Z$, write $\inf_{x \in X} \phi(x, \bar z) \le \inf_{x \in X} \sup_{z \in Z} \phi(x, z)$ and take the supremum over $\bar z \in Z$ of the left-hand side]. However, special conditions are required to guarantee equality.
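Note that the inequality (1.20) holds for an arbitrary function $\phi$; no convexity or concavity is needed. A quick numerical check on finite grids, with an illustrative (assumed) choice of $\phi$ that exhibits a strict gap:

```python
# Grid check of the minimax inequality (1.20) with the assumed choice
# phi(x, z) = (x - z)**2 on X = Z = [-1, 1] (discretized).
X = [i / 10.0 for i in range(-10, 11)]
Z = [i / 10.0 for i in range(-10, 11)]

def phi(x, z):
    return (x - z) ** 2

# sup inf: for each z the inner min is 0 (take x = z), so the sup is 0.
sup_inf = max(min(phi(x, z) for x in X) for z in Z)
# inf sup: for each x the inner max is max((x+1)^2, (x-1)^2) >= 1.
inf_sup = min(max(phi(x, z) for z in Z) for x in X)

print(sup_inf, inf_sup)  # 0.0 1.0 -- a strict gap, consistent with (1.20)
```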
Suppose that $x^*$ is an optimal solution of the problem

minimize $\sup_{z \in Z} \phi(x, z)$
subject to $x \in X$,    (1.21)

and $z^*$ is an optimal solution of the problem

maximize $\inf_{x \in X} \phi(x, z)$
subject to $z \in Z$.    (1.22)
† The Saddle Point Theorem is also central in game theory, as we now briefly explain. In the simplest type of zero sum game, there are two players: the first may choose one out of $n$ moves and the second may choose one out of $m$ moves. If moves $i$ and $j$ are selected by the first and the second player, respectively, the first player gives a specified amount $a_{ij}$ to the second. The objective of the first player is to minimize the amount given to the other player, and the objective of the second player is to maximize this amount. The players use mixed strategies, whereby the first player selects a probability distribution $x = (x_1, \ldots, x_n)$ over his $n$ possible moves and the second player selects a probability distribution $z = (z_1, \ldots, z_m)$ over his $m$ possible moves. Since the probability of selecting $i$ and $j$ is $x_i z_j$, the expected amount to be paid by the first player to the second is $\sum_{i,j} a_{ij} x_i z_j$, or $x'Az$, where $A$ is the $n \times m$ matrix with elements $a_{ij}$.

If each player adopts a worst case viewpoint, whereby he optimizes his choice against the worst possible selection by the other player, the first player must minimize $\max_z x'Az$ and the second player must maximize $\min_x x'Az$. The main result, a special case of the existence result we will prove shortly, is that these two optimal values are equal, implying that there is an amount that can be meaningfully viewed as the value of the game for its participants.
Then we have

$\sup_{z \in Z} \inf_{x \in X} \phi(x, z) = \inf_{x \in X} \phi(x, z^*) \le \phi(x^*, z^*) \le \sup_{z \in Z} \phi(x^*, z) = \inf_{x \in X} \sup_{z \in Z} \phi(x, z)$.    (1.23)

If the minimax equality [cf. Eq. (1.18)] holds, then equality holds throughout above, so that

$\sup_{z \in Z} \phi(x^*, z) = \phi(x^*, z^*) = \inf_{x \in X} \phi(x, z^*)$,    (1.24)

or equivalently

$\phi(x^*, z) \le \phi(x^*, z^*) \le \phi(x, z^*), \quad \forall\, x \in X,\ z \in Z$.    (1.25)

A pair of vectors $x^* \in X$ and $z^* \in Z$ satisfying the two above (equivalent) relations is called a saddle point of $\phi$ (cf. Fig. 1.3.5).

We have thus shown that if the minimax equality (1.18) holds, any vectors $x^*$ and $z^*$ that are optimal solutions of problems (1.21) and (1.22), respectively, form a saddle point. Conversely, if $(x^*, z^*)$ is a saddle point, then the definition (1.24) implies that

$\inf_{x \in X} \sup_{z \in Z} \phi(x, z) \le \sup_{z \in Z} \inf_{x \in X} \phi(x, z)$.

This, together with the minimax inequality (1.20), guarantees that the minimax equality (1.18) holds, and from Eq. (1.23), $x^*$ and $z^*$ are optimal solutions of problems (1.21) and (1.22), respectively.

We summarize the above discussion in the following proposition.
Proposition 1.3.7: A pair $(x^*, z^*)$ is a saddle point of $\phi$ if and only if the minimax equality (1.18) holds, and $x^*$ and $z^*$ are optimal solutions of problems (1.21) and (1.22), respectively.
Note a simple consequence of the above proposition: the set of saddle points, when nonempty, is the Cartesian product $X^* \times Z^*$, where $X^*$ and $Z^*$ are the sets of optimal solutions of problems (1.21) and (1.22), respectively. In other words, $x^*$ and $z^*$ can be independently chosen within the sets $X^*$ and $Z^*$, respectively, to form a saddle point. Note also that if the minimax equality (1.18) does not hold, there is no saddle point, even if the sets $X^*$ and $Z^*$ are nonempty.

One can visualize saddle points in terms of the sets of minimizing points over $X$ for fixed $z \in Z$ and maximizing points over $Z$ for fixed $x \in X$:

$\hat X(z) = \{\hat x \mid \hat x \text{ minimizes } \phi(x, z) \text{ over } X\}$,
Figure 1.3.5. Illustration of a saddle point of a function $\phi(x, z)$ over $x \in X$ and $z \in Z$; the function plotted here is $\phi(x, z) = \frac{1}{2}(x^2 + 2xz - z^2)$. The plot shows the saddle point $(x^*, z^*)$ together with the curve of maxima $\phi(x, \hat z(x))$ and the curve of minima $\phi(\hat x(z), z)$. Let

$\hat x(z) = \arg\min_{x \in X} \phi(x, z), \quad \hat z(x) = \arg\max_{z \in Z} \phi(x, z)$.

In the case illustrated, $\hat x(z)$ and $\hat z(x)$ consist of unique minimizing and maximizing points, respectively, so we view $\hat x(z)$ and $\hat z(x)$ as (single-valued) functions; otherwise $\hat x(z)$ and $\hat z(x)$ should be viewed as set-valued mappings. We consider the corresponding curves $\phi(\hat x(z), z)$ and $\phi(x, \hat z(x))$. By definition, a pair $(x^*, z^*)$ is a saddle point if and only if

$\max_{z \in Z} \phi(x^*, z) = \phi(x^*, z^*) = \min_{x \in X} \phi(x, z^*)$,

or equivalently, if $(x^*, z^*)$ lies on both curves [$x^* = \hat x(z^*)$ and $z^* = \hat z(x^*)$]. At such a pair, we also have

$\max_{z \in Z} \phi(\hat x(z), z) = \max_{z \in Z} \min_{x \in X} \phi(x, z) = \phi(x^*, z^*) = \min_{x \in X} \max_{z \in Z} \phi(x, z) = \min_{x \in X} \phi(x, \hat z(x))$,

so that

$\phi(\hat x(z), z) \le \phi(x^*, z^*) \le \phi(x, \hat z(x)), \quad \forall\, x \in X,\ z \in Z$

(see Prop. 1.3.7). Visually, the curve of maxima $\phi(x, \hat z(x))$ must lie "above" the curve of minima $\phi(\hat x(z), z)$ (completely, i.e., for all $x \in X$ and $z \in Z$), but the two curves should meet at $(x^*, z^*)$.
$\hat Z(x) = \{\hat z \mid \hat z \text{ maximizes } \phi(x, z) \text{ over } Z\}$.

The definition implies that the pair $(x^*, z^*)$ is a saddle point if and only if it is a "point of intersection" of $\hat X(\cdot)$ and $\hat Z(\cdot)$ in the sense that

$x^* \in \hat X(z^*), \quad z^* \in \hat Z(x^*)$;

see Fig. 1.3.5.
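For the function plotted in Fig. 1.3.5, $\phi(x, z) = \frac{1}{2}(x^2 + 2xz - z^2)$, the pair $(x^*, z^*) = (0, 0)$ satisfies the saddle point relations (1.25). A grid check (the grid bounds below are an assumed stand-in for $X$ and $Z$):

```python
# Verify the saddle point relations phi(x*, z) <= phi(x*, z*) <= phi(x, z*)
# at (x*, z*) = (0, 0) for the function of Fig. 1.3.5, on a finite grid.
def phi(x, z):
    return 0.5 * (x * x + 2 * x * z - z * z)

grid = [i / 20.0 for i in range(-20, 21)]
left = all(phi(0.0, z) <= phi(0.0, 0.0) for z in grid)   # z* maximizes phi(x*, .)
right = all(phi(0.0, 0.0) <= phi(x, 0.0) for x in grid)  # x* minimizes phi(., z*)
print(left and right)  # True
```

Indeed, $\phi(0, z) = -z^2/2 \le 0$ and $\phi(x, 0) = x^2/2 \ge 0$, so both relations hold with $\phi(x^*, z^*) = 0$.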
We now consider conditions that guarantee the existence of a saddle point. We have the following classical result.

Proposition 1.3.8: (Saddle Point Theorem) Let $X$ and $Z$ be closed, convex subsets of $\Re^n$ and $\Re^m$, respectively, and let $\phi : X \times Z \to \Re$ be a function. Assume that for each $z \in Z$, the function $\phi(\cdot, z) : X \to \Re$ is convex and lower semicontinuous, and for each $x \in X$, the function $\phi(x, \cdot) : Z \to \Re$ is concave and upper semicontinuous. Then there exists a saddle point of $\phi$ under any one of the following four conditions:

(1) $X$ and $Z$ are compact.

(2) $Z$ is compact and there exists a vector $\bar z \in Z$ such that $\phi(\cdot, \bar z)$ is coercive.

(3) $X$ is compact and there exists a vector $\bar x \in X$ such that $-\phi(\bar x, \cdot)$ is coercive.

(4) There exist vectors $\bar x \in X$ and $\bar z \in Z$ such that $\phi(\cdot, \bar z)$ and $-\phi(\bar x, \cdot)$ are coercive.
Proof: We first prove the result under the assumption that $X$ and $Z$ are compact [condition (1)], and the additional assumption that $\phi(x, \cdot)$ is strictly concave for each $x \in X$.

By Weierstrass' Theorem (Prop. 1.3.1), the function

$f(x) = \max_{z \in Z} \phi(x, z), \quad x \in X$,

is real-valued and the maximum above is attained for each $x \in X$ at a point denoted $\hat z(x)$, which is unique by the strict concavity assumption. Furthermore, by Prop. 1.2.3(c), $f$ is lower semicontinuous, so again by Weierstrass' Theorem, $f$ attains a minimum over $x \in X$ at some point $x^*$. Let $z^* = \hat z(x^*)$, so that

$\phi(x^*, z) \le \phi(x^*, z^*) = f(x^*), \quad \forall\, z \in Z$.    (1.26)

We will show that $(x^*, z^*)$ is a saddle point of $\phi$, and in view of the above relation, it will suffice to show that $\phi(x^*, z^*) \le \phi(x, z^*)$ for all $x \in X$.
Choose any $x \in X$ and let

$x_k = \tfrac{1}{k}x + \left(1 - \tfrac{1}{k}\right)x^*, \quad z_k = \hat z(x_k), \quad k = 1, 2, \ldots.$    (1.27)

Let $\bar z$ be any limit point of $\{z_k\}$ corresponding to a subsequence $\mathcal{K}$ of positive integers. Using the convexity of $\phi(\cdot, z_k)$, we have

$f(x^*) \le f(x_k) = \phi(x_k, z_k) \le \tfrac{1}{k}\phi(x, z_k) + \left(1 - \tfrac{1}{k}\right)\phi(x^*, z_k)$.    (1.28)

Taking the limit as $k \to \infty$, $k \in \mathcal{K}$, and using the upper semicontinuity of $\phi(x^*, \cdot)$, we obtain

$f(x^*) \le \limsup_{k \to \infty,\, k \in \mathcal{K}} \phi(x^*, z_k) \le \phi(x^*, \bar z) \le \max_{z \in Z} \phi(x^*, z) = f(x^*)$.

Hence equality holds throughout above, and it follows that $\phi(x^*, z) \le \phi(x^*, \bar z) = f(x^*)$ for all $z \in Z$. Since $z^*$ is the unique maximizer of $\phi(x^*, \cdot)$ over $Z$, we see that $\bar z = \hat z(x^*) = z^*$, so that $\{z_k\}$ has $z^*$ as its unique limit point, independently of the choice of the vector $x$ within $X$ (this is the fine point in the argument where the strict concavity assumption is needed).

We have shown that for any $x \in X$, the corresponding sequence $\{z_k\}$ [cf. Eq. (1.27)] converges to the vector $z^* = \hat z(x^*)$, satisfies Eq. (1.28), and also, in view of Eq. (1.26), satisfies

$\phi(x^*, z_k) \le \phi(x^*, z^*) = f(x^*), \quad k = 1, 2, \ldots.$

Combining this last relation with Eq. (1.28), we have

$\phi(x^*, z^*) \le \tfrac{1}{k}\phi(x, z_k) + \left(1 - \tfrac{1}{k}\right)\phi(x^*, z^*), \quad \forall\, x \in X$,

or

$\phi(x^*, z^*) \le \phi(x, z_k), \quad \forall\, x \in X$.

Taking the limit as $z_k \to z^*$, and using the upper semicontinuity of $\phi(x, \cdot)$, we obtain

$\phi(x^*, z^*) \le \phi(x, z^*), \quad \forall\, x \in X$.

Combining this relation with Eq. (1.26), we see that $(x^*, z^*)$ is a saddle point of $\phi$.
Next, we remove the assumption of strict concavity of $\phi(x, \cdot)$. We introduce the functions $\phi_k : X \times Z \to \Re$ given by

$\phi_k(x, z) = \phi(x, z) - \tfrac{1}{k}\|z\|^2, \quad k = 1, 2, \ldots.$

Since $\phi_k(x, \cdot)$ is strictly concave for each $x \in X$, there exists (based on what has already been proved) a saddle point $(x_k^*, z_k^*)$ of $\phi_k$, satisfying

$\phi(x_k^*, z) - \tfrac{1}{k}\|z\|^2 \le \phi(x_k^*, z_k^*) - \tfrac{1}{k}\|z_k^*\|^2 \le \phi(x, z_k^*) - \tfrac{1}{k}\|z_k^*\|^2, \quad \forall\, x \in X,\ z \in Z$.

Let $(x^*, z^*)$ be a limit point of $\{(x_k^*, z_k^*)\}$. By taking the limit as $k \to \infty$ and by using the semicontinuity assumptions on $\phi$, it follows that

$\phi(x^*, z) \le \liminf_{k \to \infty} \phi(x_k^*, z) \le \limsup_{k \to \infty} \phi(x, z_k^*) \le \phi(x, z^*), \quad \forall\, x \in X,\ z \in Z$.    (1.29)

By setting alternately $x = x^*$ and $z = z^*$ in the above relation, we obtain

$\phi(x^*, z) \le \phi(x^*, z^*) \le \phi(x, z^*), \quad \forall\, x \in X,\ z \in Z$,

so $(x^*, z^*)$ is a saddle point of $\phi$.
We now prove the existence of a saddle point under conditions (2), (3), or (4). For each integer $k$, we introduce the convex and compact sets

$X_k = \{x \in X \mid \|x\| \le k\}, \quad Z_k = \{z \in Z \mid \|z\| \le k\}$.

Choose $\bar k$ large enough so that

$\bar k \ge \max_{z \in Z} \|z\|$ if condition (2) holds; $\bar k \ge \max_{x \in X} \|x\|$ if condition (3) holds; $\bar k \ge \max\{\|\bar x\|, \|\bar z\|\}$ if condition (4) holds;

and $X_{\bar k}$ is nonempty under condition (2), while $Z_{\bar k}$ is nonempty under condition (3). Note that for $k \ge \bar k$, the sets $X_k$ and $Z_k$ are nonempty. Furthermore, $Z = Z_k$ under condition (2) and $X = X_k$ under condition (3). Using the result already shown under condition (1), for each $k \ge \bar k$ there exists a saddle point of $\phi$ over $X_k \times Z_k$, i.e., a pair $(x_k, z_k)$ such that

$\phi(x_k, z) \le \phi(x_k, z_k) \le \phi(x, z_k), \quad \forall\, x \in X_k,\ z \in Z_k$.    (1.30)
Assume that condition (2) holds. Then, since $Z$ is compact, $\{z_k\}$ is bounded. If $\{x_k\}$ were unbounded, the coercivity of $\phi(\cdot, \bar z)$ would imply that $\phi(x_k, \bar z) \to \infty$, and from Eq. (1.30) it would follow that $\phi(x, z_k) \to \infty$ for all $x \in X$. By the upper semicontinuity of $\phi(x, \cdot)$, this contradicts the boundedness of $\{z_k\}$. Hence $\{x_k\}$ must be bounded, and $(x_k, z_k)$ must have a limit point $(x^*, z^*)$. Taking the limit as $k \to \infty$ in Eq. (1.30), and using the semicontinuity assumptions on $\phi$, it follows that

$\phi(x^*, z) \le \liminf_{k \to \infty} \phi(x_k, z) \le \limsup_{k \to \infty} \phi(x, z_k) \le \phi(x, z^*), \quad \forall\, x \in X,\ z \in Z$.    (1.31)

By alternately setting $x = x^*$ and $z = z^*$ in the above relation, we see that $(x^*, z^*)$ is a saddle point of $\phi$.

A symmetric argument, with the obvious modifications, shows the result under condition (3). Finally, under condition (4), note that Eq. (1.30) yields for all $k$,

$\phi(x_k, \bar z) \le \phi(x_k, z_k) \le \phi(\bar x, z_k)$.

If $\{x_k\}$ were unbounded, the coercivity of $\phi(\cdot, \bar z)$ would imply that $\phi(x_k, \bar z) \to \infty$ and hence $\phi(\bar x, z_k) \to \infty$, which together with the upper semicontinuity of $\phi(\bar x, \cdot)$ violates the coercivity of $-\phi(\bar x, \cdot)$. Hence $\{x_k\}$ must be bounded, and a symmetric argument shows that $\{z_k\}$ must be bounded. Thus $(x_k, z_k)$ must have a limit point $(x^*, z^*)$. The result then follows from Eq. (1.31), similar to the case where condition (2) holds. Q.E.D.
It is easy to construct examples showing that the convexity of $X$ and $Z$ are essential assumptions for the above proposition (this is also evident from Fig. 1.3.5). The assumptions of compactness/coercivity and lower/upper semicontinuity of $\phi$ are essential for existence of a saddle point (just as they are essential in Weierstrass' Theorem). An interesting question is whether convexity/concavity and lower/upper semicontinuity of $\phi$ are sufficient to guarantee the minimax equality (1.18). Unfortunately this is not so, for reasons that also touch upon some of the deeper aspects of duality theory (see Chapter 3). Here is an example:
Example 1.3.1

Let

$X = \{x \in \Re^2 \mid x \ge 0\}, \quad Z = \{z \in \Re \mid z \ge 0\}$,

and let

$\phi(x, z) = e^{-\sqrt{x_1 x_2}} + z x_1$,

which satisfy the convexity/concavity and lower/upper semicontinuity assumptions of Prop. 1.3.8. For all $z \ge 0$, we have

$\inf_{x \ge 0} \{e^{-\sqrt{x_1 x_2}} + z x_1\} = 0$,

since the expression in braces is nonnegative for $x \ge 0$ and can approach zero by taking $x_1 \to 0$ and $x_1 x_2 \to \infty$. Hence

$\sup_{z \ge 0} \inf_{x \ge 0} \phi(x, z) = 0$.

We also have for all $x \ge 0$

$\sup_{z \ge 0} \{e^{-\sqrt{x_1 x_2}} + z x_1\} = 1$ if $x_1 = 0$, and $= \infty$ if $x_1 > 0$.

Hence

$\inf_{x \ge 0} \sup_{z \ge 0} \phi(x, z) = 1$,

so $\inf_{x \ge 0} \sup_{z \ge 0} \phi(x, z) > \sup_{z \ge 0} \inf_{x \ge 0} \phi(x, z)$. The difficulty here is that the compactness/coercivity assumptions of Prop. 1.3.8 are violated.
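The gap in this example can be observed numerically; the specific sample points below are assumptions chosen to approximate the relevant inf and sup, not exact optimizers.

```python
import math

# Numerical illustration of Example 1.3.1:
# phi(x, z) = exp(-sqrt(x1*x2)) + z*x1 on x >= 0, z >= 0.
def phi(x1, x2, z):
    return math.exp(-math.sqrt(x1 * x2)) + z * x1

# For fixed z, phi can be driven toward 0 by x1 -> 0 with x1*x2 -> infinity:
print(phi(1e-8, 1e12, z=1.0))       # tiny: close to the inf over x >= 0, which is 0
# For fixed x with x1 = 0, the value is 1 for every z (so the sup is 1):
print(phi(0.0, 5.0, z=1e9))         # 1.0
# For fixed x with x1 > 0, the value grows without bound in z:
print(phi(0.1, 5.0, z=1e9) > 1e7)   # True
```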
On the other hand, with convexity/concavity and lower/upper semicontinuity of $\phi$, together with one additional assumption, given in the following proposition, it is possible to guarantee the minimax equality (1.18) without guaranteeing the existence of a saddle point. The proof of the proposition requires the machinery of hyperplanes, and will be given at the end of the next section.

Proposition 1.3.9: (Minimax Theorem) Let $X$ and $Z$ be nonempty convex subsets of $\Re^n$ and $\Re^m$, respectively, and let $\phi : X \times Z \to \Re$ be a function. Assume that for each $z \in Z$, the function $\phi(\cdot, z) : X \to \Re$ is convex and lower semicontinuous, and for each $x \in X$, the function $\phi(x, \cdot) : Z \to \Re$ is concave and upper semicontinuous. Let the function $p : \Re^m \to [-\infty, \infty]$ be defined by

$p(u) = \inf_{x \in X} \sup_{z \in Z} \{\phi(x, z) - u'z\}$.    (1.32)

Assume that

$-\infty < p(0) < \infty$,

and that $p(0) \le \liminf_{k \to \infty} p(u_k)$ for every sequence $\{u_k\}$ with $u_k \to 0$. Then the minimax equality holds, i.e.,

$\sup_{z \in Z} \inf_{x \in X} \phi(x, z) = \inf_{x \in X} \sup_{z \in Z} \phi(x, z)$.
Within the duality context discussed in the beginning of this subsection, the minimax equality implies that there is no duality gap. The above theorem indicates that, aside from convexity and semicontinuity assumptions, the properties of the function $p$ around $u = 0$ are critical. As an illustration, note that in Example 1.3.1, $p$ is given by

$p(u) = \inf_{x \ge 0} \sup_{z \ge 0} \{e^{-\sqrt{x_1 x_2}} + z(x_1 - u)\} = \infty$ if $u < 0$, $= 1$ if $u = 0$, and $= 0$ if $u > 0$.

Thus it can be seen that even though $p(0)$ is finite, $p$ is not lower semicontinuous at 0. As a result, the assumptions of Prop. 1.3.9 are violated and the minimax equality does not hold.

The significance of the function $p$ within the context of convex programming and duality will be elaborated on and clarified further in Chapter 4. Furthermore, the proof of the above theorem, given in Section 1.4.2, will indicate some common circumstances where $p$ has the desired properties around $u = 0$.
EXERCISES
1.3.1

Let $f : \Re^n \to \Re$ be a convex function, let $X$ be a closed convex set, and assume that $f$ and $X$ have no common direction of recession. Let $X^*$ be the optimal solution set (nonempty and compact by Prop. 1.3.5) and let $f^* = \inf_{x \in X} f(x)$. Show that:

(a) For every $\epsilon > 0$ there exists a $\delta > 0$ such that every vector $x \in X$ with $f(x) \le f^* + \delta$ satisfies $\min_{x^* \in X^*} \|x - x^*\| \le \epsilon$.

(b) Every sequence $\{x_k\} \subset X$ satisfying $f(x_k) \to f^*$ is bounded and all its limit points belong to $X^*$.
1.3.2
Let C be a convex set and S be a subspace. Show that projection on S is a
linear transformation and use this to show that the projection of the set C on S
is a convex set, which is compact if C is compact. Is the projection of C always
closed if C is closed?
1.3.3 (Existence of Solution of Quadratic Programs)
This exercise deals with an extension of Prop. 1.3.6 to the case where the quadratic
cost may not be convex. Consider a problem of the form
    minimize    c'x + (1/2)x'Qx
    subject to  Ax ≤ b,

where Q is a symmetric (not necessarily positive semidefinite) matrix, c and b are given vectors, and A is a matrix. Show that the following are equivalent:
(a) There exists at least one optimal solution.
(b) The cost function is bounded below over the constraint set.
(c) The problem has at least one feasible solution, and for any feasible x, there is no y ∈ ℝ^n such that Ay ≤ 0 and either y'Qy < 0, or y'Qy = 0 and (c + Qx)'y < 0.
1.3.4
Let C ⊂ ℝ^n be a nonempty convex set, and let f : ℝ^n → ℝ be an affine function such that inf_{x∈C} f(x) is attained at some x* ∈ ri(C). Show that f(x) = f(x*) for all x ∈ C.
1.3.5 (Existence of Optimal Solutions)
Let f : ℝ^n → ℝ be a convex function, and consider the problem of minimizing f over a closed and convex set X. Suppose that f attains a minimum along all halflines of the form {x + αy | α ≥ 0}, where x ∈ X and y is in the recession cone of X. Show that we may nonetheless have inf_{x∈X} f(x) = −∞. Hint: Use the case n = 2, X = ℝ², f(x) = min_{z∈C} ‖z − x‖² − x1, where C = {(x1, x2) | x1² ≤ x2}.
1.3.6 (Saddle Points in Two Dimensions)

Consider a function φ of two real variables x and z taking values in compact intervals X and Z, respectively. Assume that for each z ∈ Z, the function φ(·, z) is minimized over X at a unique point denoted x̂(z). Similarly, assume that for each x ∈ X, the function φ(x, ·) is maximized over Z at a unique point denoted ẑ(x). Assume further that the functions x̂(z) and ẑ(x) are continuous over Z and X, respectively. Show that φ has a unique saddle point (x*, z*). Use this to investigate the existence of saddle points of φ(x, z) = x² + z² over X = [0, 1] and Z = [0, 1].
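For the last part of this exercise, a grid scan is a quick sanity check (an illustration, not a proof): a saddle point (xs, zs) must satisfy φ(xs, z) ≤ φ(xs, zs) ≤ φ(x, zs) for all x ∈ X, z ∈ Z. The sketch below applies this to φ(x, z) = x² + z² on [0, 1] × [0, 1].

```python
import numpy as np

# Grid-based saddle point search for phi(x, z) = x^2 + z^2 over X = Z = [0, 1]:
# keep (xs, zs) if phi(xs, z) <= phi(xs, zs) <= phi(x, zs) on the whole grid.
phi = lambda x, z: x**2 + z**2
grid = np.linspace(0.0, 1.0, 101)     # discretization of X = Z = [0, 1]

saddles = []
for xs in grid:
    for zs in grid:
        if (phi(grid, zs) >= phi(xs, zs)).all() and (phi(xs, grid) <= phi(xs, zs)).all():
            saddles.append((float(round(xs, 2)), float(round(zs, 2))))

print(saddles)  # [(0.0, 1.0)]: the minimizer in x is 0, the maximizer in z is 1
```

The grid confirms what the continuity argument of the exercise predicts here: x̂(z) = 0 for every z and ẑ(x) = 1 for every x, so (0, 1) is the unique saddle point.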
1.4 HYPERPLANES
Some of the most important principles in convexity and optimization, including duality, revolve around the use of hyperplanes, i.e., (n − 1)-dimensional affine sets. A hyperplane has the property that it divides the space into two halfspaces. We will see shortly that a distinguishing feature of a closed convex set is that it is the intersection of all the halfspaces that contain it. Thus any closed convex set can be described in dual fashion: (a) as the union of all points contained in the set, and (b) as the intersection of all halfspaces containing the set. This fundamental principle carries over to a closed convex function, once the function is specified in terms of its (closed and convex) epigraph. In this section, we develop the principle just outlined, and we apply it to a construction that is central in duality theory. We then use this construction to prove the minimax theorem given at the end of the preceding section.
1.4.1 Support and Separation Theorems
A hyperplane is a set of the form {x | a'x = b}, where a ∈ ℝ^n, a ≠ 0, and b ∈ ℝ, as illustrated in Fig. 1.4.1. An equivalent definition is that a hyperplane in ℝ^n is an affine set of dimension n − 1. The vector a is called the normal vector of the hyperplane (it is orthogonal to the difference x − y of any two vectors x and y that belong to the hyperplane). The two sets

    {x | a'x ≥ b},    {x | a'x ≤ b},

are called the halfspaces associated with the hyperplane (also referred to as the positive and negative halfspaces, respectively). We have the following result, which is also illustrated in Fig. 1.4.1. The proof is based on the Projection Theorem and is illustrated in Fig. 1.4.2.
Figure 1.4.1. (a) A hyperplane {x | a'x = b} with normal a divides the space into the positive halfspace {x | a'x ≥ b} and the negative halfspace {x | a'x ≤ b}. (b) Geometric interpretation of the Supporting Hyperplane Theorem. (c) Geometric interpretation of the Separating Hyperplane Theorem.
Figure 1.4.2. Illustration of the proof of the Supporting Hyperplane Theorem for the case where the vector x̄ belongs to the closure of C. We choose a sequence {x_k} of vectors not belonging to the closure of C which converges to x̄, and we project x_k on the closure of C. We then consider, for each k, the hyperplane that is orthogonal to the line segment connecting x_k and its projection, and passes through x_k. These hyperplanes "converge" to a hyperplane that supports C at x̄.
Proposition 1.4.1: (Supporting Hyperplane Theorem) If C ⊂ ℝ^n is a convex set and x̄ is a point that does not belong to the interior of C, there exists a vector a ≠ 0 such that

    a'x ≥ a'x̄,    ∀ x ∈ C.     (1.33)
Proof: Consider the closure cl(C) of C, which is a convex set by Prop. 1.2.1(d). Let {x_k} be a sequence of vectors not belonging to cl(C), which converges to x̄; such a sequence exists because x̄ does not belong to the interior of C. If x̂_k is the projection of x_k on cl(C), we have by part (b) of the Projection Theorem (Prop. 1.3.3)

    (x̂_k − x_k)'(x − x̂_k) ≥ 0,    ∀ x ∈ cl(C).

Hence we obtain for all x ∈ cl(C) and k,

    (x̂_k − x_k)'x ≥ (x̂_k − x_k)'x̂_k = (x̂_k − x_k)'(x̂_k − x_k) + (x̂_k − x_k)'x_k ≥ (x̂_k − x_k)'x_k.

We can write this inequality as

    a_k'x ≥ a_k'x_k,    ∀ x ∈ cl(C), k = 0, 1, . . . ,     (1.34)

where

    a_k = (x̂_k − x_k)/‖x̂_k − x_k‖.

We have ‖a_k‖ = 1 for all k, and hence the sequence {a_k} has a subsequence that converges to a nonzero limit a. By considering Eq. (1.34) for all a_k belonging to this subsequence and by taking the limit as k → ∞, we obtain Eq. (1.33). Q.E.D.
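The construction in this proof is easy to carry out numerically for a concrete set. The sketch below uses a hypothetical example: C is the closed unit disk in ℝ² and x̄ = (1, 0) lies on its boundary (hence not in the interior of C); the exterior points x_k are projected on C and the normals a_k of the proof are formed.

```python
import numpy as np

# Numerical sketch of the proof construction: project exterior points
# x_k -> x_bar on the closed unit disk and form a_k = (x_hat_k - x_k)/||...||.
def project_to_disk(y):
    n = np.linalg.norm(y)
    return y if n <= 1.0 else y / n          # projection on cl(C)

x_bar = np.array([1.0, 0.0])

for k in range(1, 6):
    x_k = x_bar + np.array([1.0 / k, 0.0])   # exterior points with x_k -> x_bar
    x_hat_k = project_to_disk(x_k)           # projection of x_k on cl(C)
    a_k = (x_hat_k - x_k) / np.linalg.norm(x_hat_k - x_k)
    print(k, a_k)                            # normals a_k; here a_k = (-1, 0)
```

For this example the normals are constant, a_k = (−1, 0), and the limiting normal a = (−1, 0) indeed satisfies a'x ≥ a'x̄ = −1 for all x in the disk.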
Proposition 1.4.2: (Separating Hyperplane Theorem) If C1 and C2 are two nonempty, disjoint, and convex subsets of ℝ^n, there exists a hyperplane that separates them, i.e., a vector a ≠ 0 such that

    a'x1 ≤ a'x2,    ∀ x1 ∈ C1, x2 ∈ C2.     (1.35)
Proof: Consider the convex set

    C = {x | x = x2 − x1, x1 ∈ C1, x2 ∈ C2}.

Since C1 and C2 are disjoint, the origin does not belong to C, so by the Supporting Hyperplane Theorem there exists a vector a ≠ 0 such that

    0 ≤ a'x,    ∀ x ∈ C,

which is equivalent to Eq. (1.35). Q.E.D.
Proposition 1.4.3: (Strict Separation Theorem) If C1 and C2 are two nonempty, disjoint, and convex sets such that C1 is closed and C2 is compact, there exists a hyperplane that strictly separates them, i.e., a vector a ≠ 0 and a scalar b such that

    a'x1 < b < a'x2,    ∀ x1 ∈ C1, x2 ∈ C2.     (1.36)
Proof: Consider the problem

    minimize    ‖x1 − x2‖
    subject to  x1 ∈ C1, x2 ∈ C2.

This problem is equivalent to minimizing the distance d(x2, C1) over x2 ∈ C2. Since C2 is compact, and the distance function is convex (cf. Prop. 1.3.3) and hence continuous (cf. Prop. 1.2.12), the problem has at least one solution (x̄1, x̄2) by Weierstrass' Theorem (cf. Prop. 1.3.1). Let

    a = (x̄2 − x̄1)/2,    x̄ = (x̄1 + x̄2)/2,    b = a'x̄.

Then, a ≠ 0, since x̄1 ∈ C1, x̄2 ∈ C2, and C1 and C2 are disjoint. The hyperplane

    {x | a'x = b}

contains x̄, and it can be seen from the preceding discussion that x̄1 is the projection of x̄ on C1, and x̄2 is the projection of x̄ on C2 (see Fig. 1.4.3). By Prop. 1.3.3(b), we have

    (x̄ − x̄1)'(x1 − x̄1) ≤ 0,    ∀ x1 ∈ C1,

or equivalently, since x̄ − x̄1 = a,

    a'x1 ≤ a'x̄1 = a'x̄ + a'(x̄1 − x̄) = b − ‖a‖² < b,    ∀ x1 ∈ C1.

Thus, the left-hand side of Eq. (1.36) is proved. The right-hand side is proved similarly. Q.E.D.
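The quantities a, x̄, and b of this proof can be computed directly when the closest points are known. The sketch below uses two hypothetical sets, the closed unit disks centered at (0, 0) and (4, 0), whose minimum distance is attained at x̄1 = (1, 0) and x̄2 = (3, 0), and checks strict separation on random samples.

```python
import numpy as np

# Construction of Prop. 1.4.3 for two disjoint closed disks in R^2.
x1_bar, x2_bar = np.array([1.0, 0.0]), np.array([3.0, 0.0])

a = (x2_bar - x1_bar) / 2.0        # normal of the strictly separating hyperplane
x_bar = (x1_bar + x2_bar) / 2.0    # midpoint, lies on the hyperplane
b = a @ x_bar

# Check a'x1 < b < a'x2 on random samples from the two disks.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 1000)
r = np.sqrt(rng.uniform(0.0, 1.0, 1000))
pts = np.stack([r * np.cos(theta), r * np.sin(theta)], axis=1)  # unit disk
print((pts @ a < b).all(), ((pts + [4.0, 0.0]) @ a > b).all())  # True True
```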
Figure 1.4.3. (a) Illustration of the construction of a strictly separating hyperplane of two disjoint closed convex sets C1 and C2, one of which is also bounded (cf. Prop. 1.4.3). (b) An example showing that if neither of the two sets is compact, there may not exist a strictly separating hyperplane. Here

    C1 = {(ξ1, ξ2) | ξ1 ≤ 0},    C2 = {(ξ1, ξ2) | ξ1 > 0, ξ2 > 0, ξ1ξ2 ≥ 1},

and

    C = {x1 − x2 | x1 ∈ C1, x2 ∈ C2} = {(ξ1, ξ2) | ξ1 < 0}.

This is due to the fact that the distance d(x2, C1) can approach 0 arbitrarily closely as x2 ranges over C2, but can never reach 0.
The preceding proposition may be used to provide a fundamental
characterization of closed convex sets.
Proposition 1.4.4: The closure of the convex hull of a set C ⊂ ℝ^n is the intersection of the halfspaces that contain C. In particular, a closed and convex set is the intersection of the halfspaces that contain it.
Proof: Assume first that C is closed and convex. Then C is contained in the intersection of the halfspaces that contain C, so we focus on proving the reverse inclusion. Let x ∉ C. Applying the Strict Separation Theorem (Prop. 1.4.3) to the sets C and {x}, we see that there exists a halfspace containing C but not containing x. Hence, if x ∉ C, then x cannot belong to the intersection of the halfspaces containing C, proving that C contains that intersection. Thus the result is proved for the case where C is closed and convex.
Consider the case of a general set C. Each halfspace H that contains C must also contain conv(C) (since H is convex), and also cl(conv(C)) (since H is closed). Hence the intersection of all halfspaces containing C and the intersection of all halfspaces containing cl(conv(C)) coincide. From what has been proved for the case of a closed convex set, the latter intersection is equal to cl(conv(C)). Q.E.D.
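Prop. 1.4.4 suggests a simple numerical membership test: x belongs to cl(conv(C)) if and only if a'x ≤ max_{c∈C} a'c for every direction a. The sketch below checks this for a hypothetical finite point set in ℝ²; only finitely many directions are sampled, so the test is approximate near the boundary.

```python
import numpy as np

# Membership test sketch based on Prop. 1.4.4: x lies in cl(conv(C)) iff
# every halfspace {y | a'y <= max_{c in C} a'c} containing C also contains x.
C = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # vertices of a triangle

def in_hull(x, C, n_dirs=720):
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    A = np.stack([np.cos(angles), np.sin(angles)], axis=1)  # unit normals a
    support = (A @ C.T).max(axis=1)        # support values max_{c in C} a'c
    return bool(np.all(A @ x <= support + 1e-12))

print(in_hull(np.array([0.2, 0.2]), C))   # True: no containing halfspace is violated
print(in_hull(np.array([0.8, 0.8]), C))   # False: some halfspace separates it from C
```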
1.4.2 Nonvertical Hyperplanes and the Minimax Theorem

In the context of optimization theory, supporting hyperplanes are often used in conjunction with epigraphs of functions f on ℝ^n. Since epi(f) = {(x, w) | w ≥ f(x)} is a subset of ℝ^{n+1}, the normal vector of a hyperplane is a nonzero vector of the form (µ, β), where µ ∈ ℝ^n and β ∈ ℝ. We say that the hyperplane is horizontal if µ = 0 and we say that it is vertical if β = 0.
Note that f is bounded below if and only if epi(f) is contained in a halfspace corresponding to a horizontal hyperplane. Note also that if a hyperplane with normal (µ, β) is nonvertical (i.e., β ≠ 0), then it crosses the (n + 1)st axis (the axis associated with w) at a unique point. If (x̄, w̄) is any vector on the hyperplane, the crossing point has the form (0, ξ), where

    ξ = (µ/β)'x̄ + w̄,     (1.37)

since from the hyperplane equation, we have (0, ξ)'(µ, β) = (x̄, w̄)'(µ, β). On the other hand, it can be seen that if the hyperplane is vertical, it either contains the entire (n + 1)st axis, or else it does not cross it at all; see Fig. 1.4.4.
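Equation (1.37) is easy to verify numerically. The sketch below uses hypothetical data: a nonvertical hyperplane in ℝ³ (so n = 2) with normal (µ, β), β ≠ 0, passing through a chosen point (x̄, w̄).

```python
import numpy as np

# Check of Eq. (1.37): the hyperplane is mu'x + beta*w = mu'x_bar + beta*w_bar,
# and setting x = 0 gives the crossing point (0, xi), xi = (mu/beta)'x_bar + w_bar.
mu, beta = np.array([2.0, -1.0]), 0.5
x_bar, w_bar = np.array([1.0, 3.0]), 4.0

xi = (mu / beta) @ x_bar + w_bar
c = mu @ x_bar + beta * w_bar              # constant of the hyperplane equation
print(np.isclose(beta * xi, c))            # True: (0, xi) lies on the hyperplane
```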
Vertical lines in ℝ^{n+1} are sets of the form {(x, w) | w ∈ ℝ}, where x is a fixed vector in ℝ^n. It can be seen that vertical hyperplanes, as well as the corresponding halfspaces, consist of the union of the vertical lines that pass through their points. If f(x) > −∞ for all x, then epi(f) cannot contain a vertical line, and it appears plausible that epi(f) is contained in some halfspace corresponding to a nonvertical hyperplane. We prove this fact in greater generality in the following proposition, which will also be useful as a first step in the subsequent development.
Figure 1.4.4. Illustration of vertical and nonvertical hyperplanes in ℝ^{n+1}. A nonvertical hyperplane has normal (µ, β) with β ≠ 0 and crosses the w-axis at (µ/β)'x̄ + w̄, where (x̄, w̄) is any of its points; a vertical hyperplane has normal of the form (µ, 0).
Proposition 1.4.5: Let C be a nonempty convex subset of ℝ^{n+1}, and let the points of ℝ^{n+1} be denoted by (u, w), where u ∈ ℝ^n and w ∈ ℝ.

(a) If C contains no vertical lines, then C is contained in a halfspace corresponding to a nonvertical hyperplane; that is, there exist µ ∈ ℝ^n, β ∈ ℝ with β ≠ 0, and γ ∈ ℝ such that

    µ'u + βw ≥ γ,    ∀ (u, w) ∈ C.

(b) If C contains no vertical lines and (ū, w̄) does not belong to the closure of C, there exists a nonvertical hyperplane strictly separating (ū, w̄) from C.
Proof: We first note that if C contains no vertical lines, then ri(C) contains no vertical lines, which implies that cl(C) contains no vertical lines, since the recession cones of cl(C) and ri(C) coincide, as shown in the exercises to Section 1.2. Thus if we prove the result assuming that C is closed, the proof for the case where C is not closed will readily follow by replacing C with cl(C). Hence we assume without loss of generality that C is closed.
(a) By Prop. 1.4.4, C is the intersection of all halfspaces that contain it. If every hyperplane containing C in one of its halfspaces were vertical, we would have

    C = ∩_{i∈I} {(u, w) | µ_i'u ≥ γ_i}

for a collection of nonzero vectors µ_i, i ∈ I, and scalars γ_i, i ∈ I. Then for every (u, w) ∈ C, the vertical line {(u, w̄) | w̄ ∈ ℝ} would also belong to C. It follows that if no vertical line belongs to C, there exists a nonvertical hyperplane containing C in one of its halfspaces.
(b) Since (ū, w̄) does not belong to C, there exists a hyperplane strictly separating (ū, w̄) from C (cf. Prop. 1.4.3). If this hyperplane is nonvertical, we are done, so assume otherwise. Then we have a nonzero vector µ̄ and a scalar γ̄ such that

    µ̄'u > γ̄ > µ̄'ū,    ∀ (u, w) ∈ C.

Consider a nonvertical hyperplane containing C in one of its halfspaces, so that for some (µ, β) and γ, with β ≠ 0, we have

    µ'u + βw ≥ γ,    ∀ (u, w) ∈ C.

By multiplying this relation with any ε > 0 and adding it to the preceding relation, we obtain

    (µ̄ + εµ)'u + εβw > γ̄ + εγ,    ∀ (u, w) ∈ C.

Since γ̄ > µ̄'ū, there is a small enough ε such that

    γ̄ + εγ > (µ̄ + εµ)'ū + εβw̄.

From the above two relations, we obtain

    (µ̄ + εµ)'u + εβw > (µ̄ + εµ)'ū + εβw̄,    ∀ (u, w) ∈ C,

implying that there is a nonvertical hyperplane with normal (µ̄ + εµ, εβ) that strictly separates (ū, w̄) from C. Q.E.D.
Min Common/Max Crossing Duality

Hyperplanes allow insightful visualization of duality concepts. This is particularly so in a construction involving two simple optimization problems, which we now discuss. These problems will prove particularly relevant in the context of duality, and will also form the basis for the proof of the Minimax Theorem, stated at the end of the preceding section.
Let S be a subset of ℝ^{n+1} and consider the following two problems.
(a) Min Common Point Problem: Among all points that are common to both S and the (n + 1)st axis, we want to find one whose (n + 1)st component is minimum.
(b) Max Crossing Point Problem: Consider nonvertical hyperplanes that contain S in their corresponding "upper" halfspace, i.e., the halfspace that contains the vertical halfline {(0, w) | w ≥ 0} in its recession cone (see Fig. 1.4.5). We want to find the maximum crossing point of the (n + 1)st axis with such a hyperplane.

Figure 1.4.5. Illustration of the optimal values of the min common and max crossing problems. In (a), the two optimal values are not equal. In (b), when S is "extended upwards" along the (n + 1)st axis it yields the set

    S̄ = {(u, w) | there exists w̄ with w̄ ≤ w and (u, w̄) ∈ S},

which is closed and convex. As a result, the two optimal values are equal. In (c), the set S is convex but not closed, and there are points (0, w) on the vertical axis with w < w* that lie in the closure of S. Here q* is equal to the minimum such value of w, and we have q* < w*.
Figure 1.4.5 suggests that the optimal value of the max crossing point problem is no larger than the optimal value of the min common point problem, and that under favorable circumstances the two optimal values are equal.
We now formalize the analysis of the two above problems and provide conditions that guarantee equality of their optimal values. The min common point problem is

    minimize    w
    subject to  (0, w) ∈ S,     (1.38)

and its optimal value is denoted by w*, i.e.,

    w* = inf_{(0,w)∈S} w.
Given a nonvertical hyperplane in ℝ^{n+1}, multiplication of its normal vector (µ, β), β ≠ 0, by a nonzero scalar maintains the normality property. Hence, the set of nonvertical hyperplanes can be equivalently described as the set of all hyperplanes with normals of the form (µ, 1). A hyperplane of this type crosses the (n + 1)st axis at

    ξ = w̄ + µ'ū,

where (ū, w̄) is any vector that lies on the hyperplane [cf. Eq. (1.37)]. In order for the upper halfspace corresponding to the hyperplane to contain S, we must have

    ξ = w̄ + µ'ū ≤ w + µ'u,    ∀ (u, w) ∈ S.

Combining the two above equations, we see that the max crossing point problem can be expressed as

    maximize    ξ
    subject to  ξ ≤ w + µ'u, ∀ (u, w) ∈ S,  µ ∈ ℝ^n,

or equivalently,

    maximize    ξ
    subject to  ξ ≤ inf_{(u,w)∈S} {w + µ'u},  µ ∈ ℝ^n.

Thus, the max crossing point problem is given by

    maximize    q(µ)
    subject to  µ ∈ ℝ^n,     (1.39)

where

    q(µ) = inf_{(u,w)∈S} {w + µ'u}.     (1.40)
We denote by q* the corresponding optimal value,

    q* = sup_{µ∈ℝ^n} q(µ).

Note that for every (u, w) ∈ S and every µ ∈ ℝ^n, we have

    q(µ) = inf_{(u,w)∈S} {w + µ'u} ≤ inf_{(0,w)∈S} w = w*,

so by taking the supremum of the left-hand side over µ ∈ ℝ^n, we obtain

    q* ≤ w*,     (1.41)

i.e., the max crossing point is no higher than the min common point, as suggested by Fig. 1.4.5. Note also that if −∞ < w* < ∞, then (0, w*) is seen to be a closure point of the set S, so if in addition S is closed and convex, and admits a nonvertical supporting hyperplane at (0, w*), then we have q* = w* and the optimal values q* and w* are attained. Between the "unfavorable" case where q* < w*, and the "most favorable" case where q* = w* while the optimal values q* and w* are attained, there are several intermediate cases. The following proposition provides assumptions that guarantee the equality q* = w*, but not necessarily the attainment of the optimal values.
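Before stating the theorem, a small worked instance may help. The sketch below chooses S (an assumption for illustration) as the epigraph of u² in ℝ², so n = 1 and S = {(u, w) | w ≥ u²}. The min common value is w* = 0, and the crossing function q(µ) = inf_u {u² + µu} = −µ²/4 can be maximized numerically.

```python
import numpy as np

# Min common / max crossing pair for S = {(u, w) | w >= u^2}:
# w* = min{w | (0, w) in S} = 0, and
# q(mu) = inf_{(u,w) in S} {w + mu*u} = inf_u {u^2 + mu*u} = -mu^2/4.
w_star = 0.0
q = lambda mu: -mu**2 / 4.0

mus = np.linspace(-5.0, 5.0, 1001)
q_star = q(mus).max()                 # attained near mu = 0
print(abs(q_star - w_star) < 1e-9)    # True: q* = w* here, no gap
```

Here S is closed and convex, so, consistent with the discussion above, q* = w* (both equal to 0).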
Proposition 1.4.6: (Min Common/Max Crossing Theorem) Consider the min common point and max crossing point problems, and assume the following:

(1) −∞ < w* < ∞.

(2) The set

    S̄ = {(u, w) | there exists w̄ with w̄ ≤ w and (u, w̄) ∈ S}

is convex.

(3) For every sequence {(u_k, w_k)} ⊂ S with u_k → 0,

    w* ≤ liminf_{k→∞} w_k.

Then we have q* = w*.
Proof: We first note that (0, w*) is a closure point of S̄, since by the definition of w*, there exists a sequence {(0, w_k)} ⊂ S ⊂ S̄ such that w_k → w*.
We next show by contradiction that S̄ does not contain any vertical lines. If this were not so, since S̄ is convex, every direction (0, −γ) with γ ≥ 0 would be a direction of recession of cl(S̄) and hence also a direction of recession of ri(S̄) (see the exercises in Section 1.2). Since (0, w*) is a closure point of S̄, there exists a sequence {(u_k, w_k)} ⊂ ri(S̄) converging to (0, w*), so the sequence {(u_k, w_k − γ)} belongs to ri(S̄) for each fixed γ > 0. In view of the definition of S̄, for any γ > 0, there is a sequence {(u_k, w̄_k)} ⊂ S with w̄_k ≤ w_k − γ for all k, so that liminf_{k→∞} w̄_k ≤ w* − γ. Since u_k → 0, this contradicts assumption (3).
As shown above, S̄ does not contain any vertical lines. Furthermore, using condition (3) and the fact that w* is the optimal value of the min common point problem, it follows that for any ε > 0, the vector (0, w* − ε) does not belong to the closure of S̄. It follows, by using Prop. 1.4.5(b), that there exists a nonvertical hyperplane strictly separating (0, w* − ε) from S̄. This hyperplane crosses the (n + 1)st axis at a unique point, which must lie between w* − ε and w*, and must also be less than or equal to the optimal value q* of the max crossing point problem. Since ε can be arbitrarily small, it follows that q* ≥ w*. In view of the fact q* ≤ w*, which holds always [cf. Eq. (1.41)], the result follows. Q.E.D.
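When assumption (3) fails, a gap q* < w* can occur. The sketch below illustrates this numerically with the function p of Example 1.3.1, taking S = epi(p); here p(u) = +∞ for u < 0, p(0) = 1, and p(u) = 0 for u > 0, so p is not lower semicontinuous at 0.

```python
import numpy as np

# Duality gap sketch (cf. Fig. 1.4.5(c)): S = epi(p) with p from Example 1.3.1.
# Then w* = p(0) = 1, while q(mu) = inf_{u in P} {p(u) + mu*u}.
def q(mu, us=np.linspace(1e-9, 10.0, 100001)):
    # Sample u > 0 (where p = 0) and include u = 0 separately (where p(0) = 1).
    return min(1.0, (mu * us).min())

# For mu < 0 we have q(mu) = -infinity, so it suffices to scan mu >= 0.
w_star = 1.0                                           # w* = p(0)
q_star = max(q(mu) for mu in np.linspace(0.0, 5.0, 51))
print(q_star < w_star)  # True: q* is approximately 0, strictly below w* = 1
```

This matches the analytic values q* = 0 < 1 = w* computed for Example 1.3.1 earlier in the section.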
Proof of the Minimax Theorem

We will now prove the Minimax Theorem stated at the end of the preceding section (cf. Prop. 1.3.9). In particular, assume that X and Z are nonempty convex subsets of ℝ^n and ℝ^m, respectively, and that φ : X × Z → ℝ is a function such that for each z ∈ Z, the function φ(·, z) : X → ℝ is convex and lower semicontinuous, and for each x ∈ X, the function φ(x, ·) : Z → ℝ is concave and upper semicontinuous. Then, if the additional assumptions of Prop. 1.3.9 regarding the function

    p(u) = inf_{x∈X} sup_{z∈Z} {φ(x, z) − u'z}

hold, we will show that

    sup_{z∈Z} inf_{x∈X} φ(x, z) = inf_{x∈X} sup_{z∈Z} φ(x, z).

We first prove the following lemma.
Lemma 1.4.1: Let X and Z be nonempty convex subsets of ℝ^n and ℝ^m, respectively, and let φ : X × Z → ℝ be a function. Assume that for each z ∈ Z, the function φ(·, z) : X → ℝ is convex, and consider the function p defined by

    p(u) = inf_{x∈X} sup_{z∈Z} {φ(x, z) − u'z},    ∀ u ∈ ℝ^m.

Then the set

    P = {u | p(u) < ∞}

is convex and p satisfies

    p(αu1 + (1 − α)u2) ≤ αp(u1) + (1 − α)p(u2)     (1.42)

for all α ∈ [0, 1] and all u1, u2 ∈ P.
Proof: Let u1 ∈ P and u2 ∈ P. Rewriting p(u) as

    p(u) = inf_{x∈X} l(x, u),

where l(x, u) = sup_{z∈Z} {φ(x, z) − u'z}, we have that there exist sequences {x_k^1} ⊂ X and {x_k^2} ⊂ X such that

    l(x_k^1, u1) → p(u1),    l(x_k^2, u2) → p(u2).

By convexity of X, we have αx_k^1 + (1 − α)x_k^2 ∈ X for all α ∈ [0, 1]. Using also the convexity of φ(·, z) for each z ∈ Z, we have

    p(αu1 + (1 − α)u2)
      ≤ l(αx_k^1 + (1 − α)x_k^2, αu1 + (1 − α)u2)
      = sup_{z∈Z} {φ(αx_k^1 + (1 − α)x_k^2, z) − (αu1 + (1 − α)u2)'z}
      ≤ sup_{z∈Z} {αφ(x_k^1, z) + (1 − α)φ(x_k^2, z) − (αu1 + (1 − α)u2)'z}
      ≤ α sup_{z∈Z} {φ(x_k^1, z) − u1'z} + (1 − α) sup_{z∈Z} {φ(x_k^2, z) − u2'z}
      = α l(x_k^1, u1) + (1 − α) l(x_k^2, u2).

Since

    α l(x_k^1, u1) + (1 − α) l(x_k^2, u2) → α p(u1) + (1 − α) p(u2),

it follows that

    p(αu1 + (1 − α)u2) ≤ α p(u1) + (1 − α) p(u2).

Furthermore, since u1 and u2 belong to P, the left-hand side of the above relation is less than ∞, so αu1 + (1 − α)u2 ∈ P for all α ∈ [0, 1], implying that P is convex. Q.E.D.
Note that Lemma 1.4.1 allows the possibility that the function p not only takes finite values and the value ∞, but also the value −∞. However, under the additional assumption of Prop. 1.3.9 that p(0) ≤ liminf_{k→∞} p(u_k) for all sequences {u_k} with u_k → 0, we must have p(u) > −∞ for all u ∈ ℝ^m. The reason is that if p(u) = −∞ for some u ∈ ℝ^m, then from Eq. (1.42), we would have p(αu) = −∞ for all α ∈ (0, 1], so the sequence u_k = u/k would satisfy u_k → 0 and liminf_{k→∞} p(u_k) = −∞, contradicting the assumption that p(0) ≤ liminf_{k→∞} p(u_k) for all sequences {u_k} with u_k → 0. Thus, under the assumptions of Prop. 1.3.9, p maps ℝ^m into (−∞, ∞], and in view of Lemma 1.4.1, p is an extended-real-valued convex function. Furthermore, p is lower semicontinuous at u = 0, in view again of the assumption that p(0) ≤ liminf_{k→∞} p(u_k) for all sequences {u_k} with u_k → 0.
We now prove Prop. 1.3.9 by reducing it to an application of the Min Common/Max Crossing Theorem (cf. Prop. 1.4.6) of the preceding section. In particular, we show that with an appropriate selection of the set S, the assumptions of Prop. 1.3.9 are essentially equivalent to the corresponding assumptions of the Min Common/Max Crossing Theorem. Furthermore, the optimal values of the corresponding min common and max crossing point problems are

    q* = sup_{z∈Z} inf_{x∈X} φ(x, z),    w* = inf_{x∈X} sup_{z∈Z} φ(x, z).

We choose the set S (as well as the set S̄) in that theorem to be the epigraph of p, i.e.,

    S = S̄ = {(u, w) | u ∈ P, p(u) ≤ w},

which is convex in view of the convexity of the function p shown above. Thus assumption (2) of the Min Common/Max Crossing Theorem is satisfied. We clearly have

    w* = p(0),

so the assumption −∞ < p(0) < ∞ of Prop. 1.3.9 is equivalent to assumption (1) of the Min Common/Max Crossing Theorem. Furthermore, the assumption that p(0) ≤ liminf_{k→∞} p(u_k) for all sequences {u_k} with u_k → 0 is equivalent to the last remaining assumption (3) of the Min Common/Max Crossing Theorem. Thus the conclusion q* = w* of the theorem holds.
From the definition of p, it follows that

    w* = p(0) = inf_{x∈X} sup_{z∈Z} φ(x, z).
Thus, to complete the proof of Prop. 1.3.9, all that remains is to show that for the max crossing point problem corresponding to the epigraph S of p, we have

    q* = sup_{µ∈ℝ^m} inf_{(u,w)∈S} {w + µ'u} = sup_{z∈Z} inf_{x∈X} φ(x, z).

To prove this, let us write for every µ ∈ ℝ^m,

    q(µ) = inf_{(u,w)∈S} {w + µ'u}
         = inf_{(u,w) | u∈P, p(u)≤w} {w + µ'u}
         = inf_{u∈P} {p(u) + µ'u}.

From the definition of p, we have for every µ ∈ ℝ^m,

    q(µ) = inf_{u∈P} {p(u) + u'µ}
         = inf_{u∈P} inf_{x∈X} sup_{z∈Z} {φ(x, z) + u'(µ − z)}
         = inf_{x∈X} inf_{u∈P} sup_{z∈Z} {φ(x, z) + u'(µ − z)}.

We now show that q(µ) is given by

    q(µ) = { inf_{x∈X} φ(x, µ)  if µ ∈ Z,
             −∞                 if µ ∉ Z.     (1.43)
Indeed, if µ ∈ Z, then

    sup_{z∈Z} {φ(x, z) + u'(µ − z)} ≥ φ(x, µ),    ∀ x ∈ X, ∀ u ∈ P,

so

    inf_{u∈P} sup_{z∈Z} {φ(x, z) + u'(µ − z)} ≥ φ(x, µ),    ∀ x ∈ X.     (1.44)

To show that we have equality in Eq. (1.44), we define the function r_x : ℝ^m → (−∞, ∞] by

    r_x(z) = { −φ(x, z)  if z ∈ Z,
               +∞        otherwise,

which is closed and convex for all x ∈ X by the given concavity/upper semicontinuity assumptions, so the epigraph of r_x, epi(r_x), is a closed and convex set. Since µ ∈ Z, the point (µ, r_x(µ)) belongs to epi(r_x). For some ε > 0, we consider the point (µ, r_x(µ) − ε), which does not belong to epi(r_x). Since r_x(z) > −∞ for all z, the set epi(r_x) does not contain any vertical lines. Therefore, by Prop. 1.4.5(b), for each x ∈ X, there exists a nonvertical hyperplane with normal (u, ζ), ζ ≠ 0, and a scalar c such that

    u'µ + ζ(r_x(µ) − ε) < c < u'z + ζw,    ∀ (z, w) ∈ epi(r_x).

Since w can be made arbitrarily large, we have ζ > 0, and we can take ζ = 1. In particular, for w = r_x(z), with z ∈ Z, we have

    u'µ + r_x(µ) − ε < c < u'z + r_x(z),    ∀ z ∈ Z,

or equivalently,

    φ(x, z) + u'(µ − z) < φ(x, µ) + ε,    ∀ z ∈ Z.

Letting ε → 0, we obtain

    inf_{u∈ℝ^m} sup_{z∈Z} {φ(x, z) + u'(µ − z)} ≤ sup_{z∈Z} {φ(x, z) + u'(µ − z)} ≤ φ(x, µ),    ∀ x ∈ X,

which together with Eq. (1.44) implies that

    inf_{u∈ℝ^m} sup_{z∈Z} {φ(x, z) + u'(µ − z)} = φ(x, µ),    ∀ x ∈ X.
Therefore, if µ ∈ Z, q(µ) has the form given in Eq. (1.43).
If µ ∉ Z, we consider a sequence {w_k} that goes to −∞. Since µ ∉ Z, the sequence of points (µ, w_k) does not belong to epi(r_x). Therefore, similar to the argument above, there exists a sequence of nonvertical hyperplanes with normals (u_k, 1) such that

    φ(x, z) + u_k'(µ − z) < w_k,    ∀ z ∈ Z, ∀ k,

so that

    inf_{u∈ℝ^m} sup_{z∈Z} {φ(x, z) + u'(µ − z)} ≤ sup_{z∈Z} {φ(x, z) + u_k'(µ − z)} ≤ w_k,    ∀ k.

Taking the limit in the above relation as k → ∞, we obtain

    inf_{u∈ℝ^m} sup_{z∈Z} {φ(x, z) + u'(µ − z)} = −∞,    ∀ x ∈ X,

which proves that q(µ) has the form given in Eq. (1.43), and it follows that

    q* = sup_{z∈Z} inf_{x∈X} φ(x, z).
E X E R C I S E S
1.4.1 (Strong Separation)

Let C1 and C2 be two nonempty, convex subsets of ℝ^n, and let B denote the unit ball in ℝ^n. A hyperplane H is said to separate strongly C1 and C2 if there exists an ε > 0 such that C1 + εB is contained in one of the open halfspaces associated with H and C2 + εB is contained in the other. Show that:

(a) The following three conditions are equivalent:

    (i) There exists a hyperplane strongly separating C1 and C2.

    (ii) There exists a vector b ∈ ℝ^n such that inf_{x∈C1} b'x > sup_{x∈C2} b'x.

    (iii) inf_{x1∈C1, x2∈C2} ‖x1 − x2‖ > 0.

(b) If C1 and C2 are closed and have no common directions of recession, there exists a hyperplane strongly separating C1 and C2.

(c) If the two sets C1 and C2 have disjoint closures, and at least one of the two is bounded, there exists a hyperplane strongly separating them.
1.4.2 (Proper Separation)
Let C1 and C2 be two nonempty, convex subsets of ℝ^n. A hyperplane H is said to separate properly C1 and C2 if C1 and C2 are not both contained in H. Show that the following three conditions are equivalent:

(i) There exists a hyperplane properly separating C1 and C2.

(ii) There exists a vector b ∈ ℝ^n such that

    inf_{x∈C1} b'x ≥ sup_{x∈C2} b'x,    sup_{x∈C1} b'x > inf_{x∈C2} b'x.

(iii) The relative interiors ri(C1) and ri(C2) have no point in common.
1.5 CONICAL APPROXIMATIONS AND CONSTRAINED OPTIMIZATION

Optimality conditions for constrained optimization problems revolve around the behavior of the cost function at points of the constraint set around a candidate for optimality x*. Within this context, approximating the constraint set locally around x* by a conical set turns out to be particularly useful. In this section, we develop this approach and we derive necessary conditions for constrained optimality. We first introduce some basic notions relating to cones.
Given a set C, the cone given by

    C* = {y | y'x ≤ 0, ∀ x ∈ C}
is called the polar cone of C (see Fig. 1.5.1). Clearly, the polar cone C*, being the intersection of a collection of closed halfspaces, is closed and convex (regardless of whether C is closed and/or convex). If C is a subspace, it can be seen that the polar cone C* is equal to the orthogonal subspace C⊥. The following basic result generalizes the equality C = (C⊥)⊥, which holds in the case where C is a subspace.
Figure 1.5.1. Illustration of the polar cone C* of a set C ⊂ ℝ². In (a) C consists of just two points, a1 and a2, while in (b) C is the convex cone {x | x = µ1a1 + µ2a2, µ1 ≥ 0, µ2 ≥ 0}. The polar cone C* is the same in both cases.
Proposition 1.5.1: (Polar Cone Theorem)

(a) For any set C, we have

    C* = (cl(C))* = (conv(C))* = (cone(C))*.

(b) For any cone C, we have

    (C*)* = cl(conv(C)).

In particular, if C is closed and convex, then (C*)* = C.
Proof: (a) Clearly, we have (cl(C))* ⊂ C*. Conversely, if y ∈ C*, then y'x_k ≤ 0 for all k and all sequences {x_k} ⊂ C, so that y'x ≤ 0 for all limits x of such sequences. Hence, y ∈ (cl(C))* and C* ⊂ (cl(C))*.
Similarly, we have (conv(C))* ⊂ C*. Conversely, if y ∈ C*, then y'x ≤ 0 for all x ∈ C, so that y'z ≤ 0 for all z that are convex combinations of vectors x ∈ C. Hence y ∈ (conv(C))* and C* ⊂ (conv(C))*. A nearly identical argument also shows that C* = (cone(C))*.
(b) Figure 1.5.2 shows that if C is closed and convex, then (C*)* = C. From this it follows that

    ((cl(conv(C)))*)* = cl(conv(C)),

and by using part (a) in the left-hand side above, we obtain (C*)* = cl(conv(C)). Q.E.D.
Figure 1.5.2. Proof of the Polar Cone Theorem for the case where C is a closed and convex cone. If x ∈ C, then for all y ∈ C*, we have x'y ≤ 0, which implies that x ∈ (C*)*. Hence, C ⊂ (C*)*. To prove the reverse inclusion, take z ∈ (C*)*, and let ẑ be the unique projection of z on C, as shown in the figure. Since C is closed, the projection exists by the Projection Theorem (Prop. 1.3.3), which also implies that

    (z − ẑ)'(x − ẑ) ≤ 0,    ∀ x ∈ C.

By taking in the preceding relation x = 0 and x = 2ẑ (which belong to C since C is a closed cone), it is seen that

    (z − ẑ)'ẑ = 0.

Combining the last two relations, we obtain (z − ẑ)'x ≤ 0 for all x ∈ C. Therefore, (z − ẑ) ∈ C*, and since z ∈ (C*)*, we obtain (z − ẑ)'z ≤ 0, which when added to (z − ẑ)'ẑ = 0 yields ‖z − ẑ‖² ≤ 0. Therefore, z = ẑ and z ∈ C, implying that (C*)* ⊂ C.
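The bipolar relation (C*)* = C can be checked numerically for a concrete cone. The sketch below uses a hypothetical example, the closed convex cone in ℝ² generated by a1 = (1, 0) and a2 = (1, 1); C* = {y | a1'y ≤ 0, a2'y ≤ 0} is approximated by sampling unit directions, so the bipolar test is approximate.

```python
import numpy as np

# Polar cone sketch: C = cone{a1, a2}, C* = {y | a1'y <= 0, a2'y <= 0},
# and (C*)* should agree with C (Polar Cone Theorem, C closed and convex).
a1, a2 = np.array([1.0, 0.0]), np.array([1.0, 1.0])

def in_C(x, tol=1e-9):
    # x in C iff x = mu1*a1 + mu2*a2 with mu1, mu2 >= 0 (solve the 2x2 system).
    mu = np.linalg.solve(np.stack([a1, a2], axis=1), x)
    return bool((mu >= -tol).all())

# Sample unit directions and keep those lying in C*.
angles = np.linspace(0.0, 2.0 * np.pi, 3600, endpoint=False)
Y = np.stack([np.cos(angles), np.sin(angles)], axis=1)
polar = Y[(Y @ a1 <= 1e-9) & (Y @ a2 <= 1e-9)]

def in_C_bipolar(x, tol=1e-9):
    # x in (C*)* iff y'x <= 0 for all y in C* (checked on the sample).
    return bool((polar @ x <= tol).all())

for x in [np.array([2.0, 1.0]), np.array([-1.0, 0.5]), np.array([0.5, 1.0])]:
    print(in_C(x), in_C_bipolar(x))   # the two membership tests agree
```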
The analysis of a constrained optimization problem often centers on how the cost function behaves along directions leading away from a local minimum to some neighboring feasible points. The sets of the relevant directions constitute cones that can be viewed as approximations to the constraint set, locally near a point of interest. Let us introduce two such cones, which are important in connection with optimality conditions.
Definition 1.5.1: Given a subset X of ℝ^n and a vector x ∈ X, a feasible direction of X at x is a vector y ∈ ℝ^n such that there exists an ᾱ > 0 with x + αy ∈ X for all α ∈ [0, ᾱ]. The set of all feasible directions of X at x is a cone denoted by F_X(x).
It can be seen that if X is convex, the feasible directions at x are the vectors of the form α(x̄ − x) with α > 0 and x̄ ∈ X. However, when X is nonconvex, the cone of feasible directions need not provide interesting information about the local structure of the set X near the point x. For example, often there is no nonzero feasible direction at x when X is nonconvex [think of the set X = {x | h(x) = 0}, where h : ℝ^n → ℝ is a nonlinear function]. The next definition introduces a cone that provides information on the local structure of X even when there is no nonzero feasible direction.
Definition 1.5.2: Given a subset X of ℝ^n and a vector x ∈ X, a vector y is said to be a tangent of X at x if either y = 0 or there exists a sequence {x_k} ⊂ X such that x_k ≠ x for all k and

x_k → x,   (x_k − x)/‖x_k − x‖ → y/‖y‖.

The set of all tangents of X at x is a cone called the tangent cone of X at x, and is denoted by T_X(x).
Figure 1.5.3. Illustration of a tangent y at a vector x ∈ X. There is a sequence {x_k} ⊂ X that converges to x and is such that the normalized direction sequence (x_k − x)/‖x_k − x‖ converges to y/‖y‖, the normalized direction of y, or equivalently, the sequence

y_k = ‖y‖(x_k − x)/‖x_k − x‖

illustrated in the figure converges to y.
Sec. 1.5 Conical Approximations 107
Thus a nonzero vector y is a tangent at x if it is possible to approach x with a feasible sequence {x_k} such that the normalized direction sequence (x_k − x)/‖x_k − x‖ converges to y/‖y‖, the normalized direction of y (see Fig. 1.5.3). The following proposition provides an equivalent definition of a tangent, which is occasionally more convenient.
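To make the definition concrete, take X to be the unit circle in ℝ², a nonconvex set with no nonzero feasible direction at any point, and x = (1, 0). In the minimal sketch below (the sequence x_k is chosen here purely for illustration), the normalized chord directions (x_k − x)/‖x_k − x‖ converge to (0, 1), so y = (0, 1) is a tangent of X at x even though it is not a feasible direction.

```python
import math

# X = unit circle {x : ||x|| = 1}, x = (1, 0).
# Take x_k = (cos t_k, sin t_k) with t_k -> 0; the normalized chord
# directions should converge to (0, 1), a tangent of X at x.

def normalized_chord(t):
    xk = (math.cos(t), math.sin(t))
    d = (xk[0] - 1.0, xk[1] - 0.0)       # x_k - x
    n = math.hypot(d[0], d[1])           # ||x_k - x||
    return (d[0] / n, d[1] / n)

dirs = [normalized_chord(1.0 / k) for k in (1, 10, 100, 1000)]
last = dirs[-1]
gap = math.hypot(last[0] - 0.0, last[1] - 1.0)   # distance to y/||y|| = (0, 1)
```

A short calculation confirms what the code observes: x_k − x = 2 sin(t/2)(−sin(t/2), cos(t/2)), so the normalized chord is (−sin(t/2), cos(t/2)), which tends to (0, 1) as t → 0.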
Figure 1.5.4. Examples of the cones F_X(x) and T_X(x) of a set X at the vector x = (0, 1). In (a), we have

X = { (x₁, x₂) | (x₁ + 1)² − x₂ ≤ 0, (x₁ − 1)² − x₂ ≤ 0 }.

Here X is convex and the tangent cone T_X(x) is equal to the closure of the cone of feasible directions F_X(x) (which is an open set in this example). Note, however, that the vectors (1, 2) and (−1, 2) (as well as the origin) belong to T_X(x) and also to the closure of F_X(x), but are not feasible directions. In (b), we have

X = { (x₁, x₂) | ((x₁ + 1)² − x₂)((x₁ − 1)² − x₂) = 0 }.

Here the set X is nonconvex, and T_X(x) is closed but not convex. Furthermore, F_X(x) consists of just the zero vector.
Proposition 1.5.2: A vector y is a tangent of a set X at a vector x ∈ X if and only if there exists a sequence {x_k} ⊂ X with x_k → x, and a positive sequence {α_k} such that α_k → 0 and (x_k − x)/α_k → y.
Proof: If {x_k} is the sequence in the definition of a tangent, take α_k = ‖x_k − x‖/‖y‖. Conversely, if α_k → 0, (x_k − x)/α_k → y, and y ≠ 0, clearly x_k → x and

(x_k − x)/‖x_k − x‖ = ((x_k − x)/α_k) / ‖(x_k − x)/α_k‖ → y/‖y‖,

so {x_k} satisfies the definition of a tangent. Q.E.D.
Figure 1.5.4 illustrates the cones T_X(x) and F_X(x) with examples. The following proposition gives some of the properties of the cones F_X(x) and T_X(x).
Proposition 1.5.3: Let X be a nonempty subset of ℝ^n and let x be a vector in X. The following hold regarding the cone of feasible directions F_X(x) and the tangent cone T_X(x).

(a) T_X(x) is a closed cone.

(b) cl(F_X(x)) ⊂ T_X(x).

(c) If X is convex, then F_X(x) and T_X(x) are convex, and we have cl(F_X(x)) = T_X(x).
Proof: (a) Let {y_k} be a sequence in T_X(x) that converges to y. We will show that y ∈ T_X(x). If y = 0, then y ∈ T_X(x), so assume that y ≠ 0. By the definition of a tangent, for every y_k there is a sequence {x_k^i} ⊂ X with x_k^i ≠ x such that

lim_{i→∞} x_k^i = x,   lim_{i→∞} (x_k^i − x)/‖x_k^i − x‖ = y_k/‖y_k‖.

For k = 1, 2, . . ., choose an index i_k such that i_1 < i_2 < . . . < i_k and

lim_{k→∞} ‖x_k^{i_k} − x‖ = 0,   lim_{k→∞} ∥ (x_k^{i_k} − x)/‖x_k^{i_k} − x‖ − y_k/‖y_k‖ ∥ = 0.

We also have for all k

∥ (x_k^{i_k} − x)/‖x_k^{i_k} − x‖ − y/‖y‖ ∥ ≤ ∥ (x_k^{i_k} − x)/‖x_k^{i_k} − x‖ − y_k/‖y_k‖ ∥ + ∥ y_k/‖y_k‖ − y/‖y‖ ∥,

so the fact y_k → y and the preceding two relations imply that

lim_{k→∞} ‖x_k^{i_k} − x‖ = 0,   lim_{k→∞} ∥ (x_k^{i_k} − x)/‖x_k^{i_k} − x‖ − y/‖y‖ ∥ = 0.

Hence y ∈ T_X(x) and T_X(x) is closed.
(b) Clearly every feasible direction is also a tangent, so F_X(x) ⊂ T_X(x). Since by part (a), T_X(x) is closed, the result follows.
(c) If X is convex, the feasible directions at x are the vectors of the form α(x̄ − x) with α > 0 and x̄ ∈ X. From this it can be seen that F_X(x) is convex. Convexity of T_X(x) will follow from the convexity of F_X(x) once we show that cl(F_X(x)) = T_X(x).

In view of part (b), it will suffice to show that T_X(x) ⊂ cl(F_X(x)). Let y ∈ T_X(x) and, using Prop. 1.5.2, let {x_k} be a sequence in X and {α_k} be a positive sequence such that α_k → 0 and (x_k − x)/α_k → y. Since X is a convex set, the direction (x_k − x)/α_k is feasible at x for all k. Hence y ∈ cl(F_X(x)), and it follows that T_X(x) ⊂ cl(F_X(x)). Q.E.D.
The tangent cone ﬁnds an important application in the following basic
necessary condition for local optimality:
Proposition 1.5.4: Let f : ℝ^n → ℝ be a smooth function, and let x* be a local minimum of f over a set X ⊂ ℝ^n. Then

∇f(x*)′y ≥ 0,   ∀ y ∈ T_X(x*).

If X is convex, this condition can be equivalently written as

∇f(x*)′(x − x*) ≥ 0,   ∀ x ∈ X,

and in the case where X = ℝ^n, reduces to ∇f(x*) = 0.
Proof: Let y be a nonzero tangent of X at x*. Then there exists a sequence {ξ_k} and a sequence {x_k} ⊂ X such that x_k ≠ x* for all k,

ξ_k → 0,   x_k → x*,

and

(x_k − x*)/‖x_k − x*‖ = y/‖y‖ + ξ_k.

By the Mean Value Theorem, we have for all k

f(x_k) = f(x*) + ∇f(x̃_k)′(x_k − x*),

where x̃_k is a vector that lies on the line segment joining x_k and x*. Combining the last two equations, we obtain

f(x_k) = f(x*) + (‖x_k − x*‖/‖y‖) ∇f(x̃_k)′y_k,   (1.45)

where

y_k = y + ‖y‖ξ_k.

If ∇f(x*)′y < 0, since x̃_k → x* and y_k → y, it follows that for all sufficiently large k, ∇f(x̃_k)′y_k < 0 and [from Eq. (1.45)] f(x_k) < f(x*). This contradicts the local optimality of x*.

When X is convex, we have cl(F_X(x)) = T_X(x) (cf. Prop. 1.5.3). Thus the condition shown can be written as

∇f(x*)′y ≥ 0,   ∀ y ∈ cl(F_X(x*)),

which in turn is equivalent to

∇f(x*)′(x − x*) ≥ 0,   ∀ x ∈ X.

If X = ℝ^n, by setting x = x* + e_i and x = x* − e_i, where e_i is the ith unit vector (all components are 0 except for the ith, which is 1), we obtain ∂f(x*)/∂x_i = 0 for all i = 1, . . . , n, so ∇f(x*) = 0. Q.E.D.
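The convex-case condition of Prop. 1.5.4 can be verified numerically on a small example. The sketch below uses an illustrative choice of data: f(x) = ‖x − c‖² minimized over the box X = [0, 1]², whose minimizer x* is the componentwise clipping of c; it then checks ∇f(x*)′(x − x*) ≥ 0 on a grid of feasible points.

```python
# f(x) = ||x - c||^2 over the box X = [0, 1]^2.  The minimizer is the
# projection of c on X (componentwise clipping), and Prop. 1.5.4 requires
# grad f(x*)'(x - x*) >= 0 for every x in X.

c = (1.5, -0.25)
xstar = tuple(min(max(ci, 0.0), 1.0) for ci in c)          # clip c into X
grad = tuple(2.0 * (xs - ci) for xs, ci in zip(xstar, c))  # grad f = 2(x* - c)

def violates(x):
    """True if x in X violates the necessary optimality condition."""
    return sum(g * (xi - xs) for g, xi, xs in zip(grad, x, xstar)) < -1e-12

grid = [(i / 10.0, j / 10.0) for i in range(11) for j in range(11)]
all_ok = not any(violates(x) for x in grid)
```

Here grad = (−1.0, 0.5), so the inner product with x − x* equals (1 − x₁) + 0.5 x₂, which is indeed nonnegative over the box.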
A direction y for which ∇f(x*)′y < 0 may be viewed as a descent direction of f at x*, in the sense that we have (by Taylor's theorem)

f(x* + αy) = f(x*) + α∇f(x*)′y + o(α) < f(x*)

for sufficiently small but positive α. Thus Prop. 1.5.4 says that if x* is a local minimum, there is no descent direction within the tangent cone T_X(x*).
Note that the necessary condition of Prop. 1.5.4 can equivalently be written as

−∇f(x*) ∈ T_X(x*)*

(see Fig. 1.5.5). There is an interesting converse of this result, namely that given any vector z ∈ T_X(x*)*, there exists a smooth function f such that −∇f(x*) = z and x* is a local minimum of f over X. We will return to this result and to the subject of conical approximations when we discuss Lagrange multipliers in Chapter 2.
The Normal Cone

In addition to the cone of feasible directions and the tangent cone, there is one more conical approximation that is of special interest for the optimization topics covered in this book. This is the normal cone of X at a vector x ∈ X, denoted by N_X(x), and obtained from the polar cone T_X(x)* by means of a closure operation. In particular, we have z ∈ N_X(x) if there exist sequences {x_k} ⊂ X and {z_k} such that x_k → x, z_k → z, and z_k ∈ T_X(x_k)* for all k. Equivalently, the graph of N_X(·), viewed as a point-to-set mapping, is the intersection of the closure of the graph of T_X(·)* with the set {(x, z) | x ∈ X}. In the case where X is a closed set,
Figure 1.5.5. Illustration of the necessary optimality condition

−∇f(x*) ∈ T_X(x*)*

for x* to be a local minimum of f over X. (The figure shows the constraint set X, the surfaces of equal cost f(x), and the cone x* + T_X(x*)*.)
the set {(x, z) | x ∈ X} contains the closure of the graph of T_X(·)*, so the graph of N_X(·) is equal to the closure of the graph of T_X(·)*:

{(x, z) | x ∈ X, z ∈ N_X(x)} = cl({(x, z) | x ∈ X, z ∈ T_X(x)*})   if X is closed.
It can be seen that N_X(x) is a closed cone containing T_X(x)*, but it need not be convex like T_X(x)* (see the examples of Fig. 1.5.6). In the case where T_X(x)* = N_X(x), we say that X is regular at x. An important consequence of convexity of X is that it implies regularity, as shown in the following proposition.
Proposition 1.5.5: Let X be a convex set. Then for all x ∈ X, we have

z ∈ T_X(x)*   if and only if   z′(x̄ − x) ≤ 0, ∀ x̄ ∈ X.   (1.46)

Furthermore, X is regular at all x ∈ X.
Figure 1.5.6. Examples of normal cones. In the case of figure (a), X is the union of two lines passing through the origin:

X = {x | (a_1′x)(a_2′x) = 0}.

For x = 0 we have T_X(x) = X, T_X(x)* = {0}, while N_X(x) is the nonconvex set consisting of the two lines of vectors that are collinear to either a_1 or a_2. Thus X is not regular at x = 0. At all other vectors x ∈ X, we have regularity with T_X(x)* and N_X(x) equal to either the line of vectors that are collinear to a_1 or the line of vectors that are collinear to a_2.

In the case of figure (b), X is regular at all points except at x = 0, where we have T_X(x) = ℝ^n, T_X(x)* = {0}, while N_X(x) is equal to the horizontal axis.
Proof: Since (x̄ − x) ∈ F_X(x) ⊂ T_X(x) for all x̄ ∈ X, it follows that if z ∈ T_X(x)*, then z′(x̄ − x) ≤ 0 for all x̄ ∈ X. Conversely, let z be such that z′(x̄ − x) ≤ 0 for all x̄ ∈ X, and to arrive at a contradiction, assume that z ∉ T_X(x)*. Then there exists some y ∈ T_X(x) such that z′y > 0. Since cl(F_X(x)) = T_X(x) [cf. Prop. 1.5.3(c)], there exists a sequence {y_k} ⊂ F_X(x) such that y_k → y, so that y_k = α_k(x_k − x) for some α_k > 0 and some x_k ∈ X. Since z′y > 0, we have α_k z′(x_k − x) > 0 for large enough k, which is a contradiction.

If x ∈ X and z ∈ N_X(x), there must exist sequences {x_k} ⊂ X and {z_k} such that x_k → x, z_k → z, and z_k ∈ T_X(x_k)*. By Eq. (1.46) (which critically depends on the convexity of X), we must have z_k′(x̄ − x_k) ≤ 0 for all x̄ ∈ X. Taking the limit as k → ∞, we obtain z′(x̄ − x) ≤ 0 for all x̄ ∈ X, which by Eq. (1.46), implies that z ∈ T_X(x)*. Thus, we have N_X(x) ⊂ T_X(x)*. Since the reverse inclusion always holds, it follows that N_X(x) = T_X(x)*, so that X is regular at x. Q.E.D.
Note that convexity of T_X(x) does not imply regularity of X at x, as the example of Fig. 1.5.6(b) shows. However, a converse can be shown, namely that if X is closed and is regular at x, then T_X(x) is equal to the polar of N_X(x):

T_X(x) = N_X(x)*   if X is closed and is regular at x

(see Rockafellar and Wets [RoW98], p. 221). Thus, for a closed X, regularity at x implies that the closed cone T_X(x) is convex. This result, which is central in nonsmooth analysis, will not be needed for our development in this book.
EXERCISES
1.5.1 (Fermat's Principle in Optics)

Let C ⊂ ℝ^n be a closed convex set, and let y and z be given vectors in ℝ^n such that the line segment connecting y and z does not intersect with C. Consider the problem of minimizing the sum of distances ‖y − x‖ + ‖z − x‖ over x ∈ C. Derive a necessary and sufficient optimality condition. Does an optimal solution exist and if so, is it unique? Discuss the case where C is closed but not convex.
1.5.2

Let C_1, C_2, and C_3 be three closed subsets of ℝ^n. Consider the problem of finding a triangle with minimum perimeter that has one vertex on each of the three sets, i.e., the problem of minimizing ‖x_1 − x_2‖ + ‖x_2 − x_3‖ + ‖x_3 − x_1‖ subject to x_i ∈ C_i, i = 1, 2, 3, and the additional condition that x_1, x_2, and x_3 do not lie on the same line. Show that if (x_1*, x_2*, x_3*) defines an optimal triangle, there exists a vector z* in the triangle such that

(z* − x_i*) ∈ T_{C_i}(x_i*)*,   i = 1, 2, 3.
1.5.3 (Cone Decomposition Theorem)

Let C ⊂ ℝ^n be a closed convex cone and let x be a given vector in ℝ^n. Show that:

(a) x̂ is the projection of x on C if and only if

x̂ ∈ C,   x − x̂ ∈ C*,   (x − x̂)′x̂ = 0.

(b) The following two properties are equivalent:

(i) x_1 and x_2 are the projections of x on C and C*, respectively.

(ii) x = x_1 + x_2 with x_1 ∈ C, x_2 ∈ C*, and x_1′x_2 = 0.
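Part (b) is the conical analogue of orthogonal decomposition (often called the Moreau decomposition). A minimal numerical sketch, using the illustrative cone C = {x ∈ ℝ² | x₂ ≥ |x₁|} whose polar is C* = {y | y₂ ≤ −|y₁|}, projects a point on both cones and checks that x = x₁ + x₂ with x₁′x₂ = 0.

```python
# Cone decomposition for C = {x in R^2 | x2 >= |x1|},
# with polar C* = {y | y2 <= -|y1|}.  Projections are computed by hand:
# inside either cone the point is fixed; otherwise project on the
# nearest boundary ray of C.

def proj_C(z):
    z1, z2 = z
    if z2 >= abs(z1):            # already in C
        return (z1, z2)
    if z2 <= -abs(z1):           # z in C*: projection on C is the origin
        return (0.0, 0.0)
    if z1 > 0:                   # project on the ray spanned by (1, 1)
        t = (z1 + z2) / 2.0
        return (t, t)
    t = (-z1 + z2) / 2.0         # project on the ray spanned by (-1, 1)
    return (-t, t)

def proj_Cstar(z):
    # C* is C reflected through the x1-axis, and reflection is an
    # isometry, so reuse proj_C on the reflected point.
    x1, x2 = proj_C((z[0], -z[1]))
    return (x1, -x2)

z = (2.0, 0.0)
p = proj_C(z)            # projection on C
q = proj_Cstar(z)        # projection on C*
sum_ok = abs(p[0] + q[0] - z[0]) < 1e-12 and abs(p[1] + q[1] - z[1]) < 1e-12
orth = p[0] * q[0] + p[1] * q[1]     # should vanish, per part (b)(ii)
```

For z = (2, 0) the two projections land on the two boundary rays, at (1, 1) and (1, −1), and they sum to z with zero inner product, as the exercise asserts.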
1.5.4

Let C ⊂ ℝ^n be a closed convex cone and let a be a given vector in ℝ^n. Show that for any positive scalars β and γ, we have

max_{‖x‖≤β, x∈C} a′x ≤ γ   if and only if   a ∈ C* + {x | ‖x‖ ≤ γ/β}.
1.5.5 (Quasiregularity)

Consider the set

X = {x | h_i(x) = 0, i = 1, . . . , m, g_j(x) ≤ 0, j = 1, . . . , r},

where the functions h_i : ℝ^n → ℝ and g_j : ℝ^n → ℝ are smooth. For any x ∈ X let A(x) = {j | g_j(x) = 0} and consider the cone

V(x) = {y | ∇h_i(x)′y = 0, i = 1, . . . , m, ∇g_j(x)′y ≤ 0, j ∈ A(x)}.

Show that:

(a) T_X(x) ⊂ V(x).

(b) T_X(x) = V(x) if either the gradients ∇h_i(x), i = 1, . . . , m, ∇g_j(x), j ∈ A(x), are linearly independent, or all the functions h_i and g_j are linear.

Note: The property T_X(x) = V(x) is called quasiregularity, and will be significant in the Lagrange multiplier theory of Chapter 2.
Sec. 1.6 Polyhedral Convexity 115
1.5.6 (Regularity of Constraint Systems)

Consider the set X of Exercise 1.5.5. Show that if at a given x ∈ X, the quasiregularity condition T_X(x) = V(x) of that exercise holds, then X is regular at x.
1.5.7 (Necessary and Sufficient Optimality Condition)

Let f : ℝ^n → ℝ be a smooth convex function and let X be a nonempty subset of ℝ^n. Show that the condition

∇f(x*)′(x − x*) ≥ 0,   ∀ x ∈ X,

is necessary and sufficient for a vector x* ∈ X to be a global minimum of f over X.
1.6 POLYHEDRAL CONVEXITY

In this section, we develop some basic results regarding the geometry of polyhedral sets. We start with properties of cones that have a polyhedral structure, and proceed to discuss characterizations of more general polyhedral sets. We then apply the results obtained to linear and integer programming.
1.6.1 Polyhedral Cones
We introduce two different ways to view cones with polyhedral structure, and our first objective in this section is to show that these two views are related through the polarity relation and are in some sense equivalent.

We say that a cone C ⊂ ℝ^n is polyhedral, if it has the form

C = {x | a_j′x ≤ 0, j = 1, . . . , r},

where a_1, . . . , a_r are some vectors in ℝ^n. We say that a cone C ⊂ ℝ^n is finitely generated, if it is generated by a finite set of vectors, i.e., if it has the form

C = cone({a_1, . . . , a_r}) = { x | x = Σ_{j=1}^r μ_j a_j, μ_j ≥ 0, j = 1, . . . , r },
Figure 1.6.1. (a) Polyhedral cone defined by the inequality constraints a_j′x ≤ 0, j = 1, 2, 3. (b) Cone generated by the vectors a_1, a_2, a_3.
where a_1, . . . , a_r are some vectors in ℝ^n. These definitions are illustrated in Fig. 1.6.1.
Note that sets defined by linear equality constraints (as well as linear inequality constraints) are also polyhedral cones, since a linear equality constraint may be converted into two inequality constraints. In particular, if e_1, . . . , e_m and a_1, . . . , a_r are given vectors, the set

{x | e_i′x = 0, i = 1, . . . , m, a_j′x ≤ 0, j = 1, . . . , r}

is a polyhedral cone since it can be written as

{x | e_i′x ≤ 0, −e_i′x ≤ 0, i = 1, . . . , m, a_j′x ≤ 0, j = 1, . . . , r}.

Using a related conversion, it can be seen that the cone

{ x | x = Σ_{i=1}^m λ_i e_i + Σ_{j=1}^r μ_j a_j, λ_i ∈ ℝ, i = 1, . . . , m, μ_j ≥ 0, j = 1, . . . , r }

is finitely generated, since it can be written as

{ x | x = Σ_{i=1}^m λ_i⁺ e_i + Σ_{i=1}^m λ_i⁻(−e_i) + Σ_{j=1}^r μ_j a_j, λ_i⁺ ≥ 0, λ_i⁻ ≥ 0, i = 1, . . . , m, μ_j ≥ 0, j = 1, . . . , r }.
A polyhedral cone is closed, since it is the intersection of closed halfspaces. A finitely generated cone is also closed and is in fact polyhedral, but this is a fairly deep fact, which is shown in the following proposition.
Proposition 1.6.1:

(a) Let a_1, . . . , a_r be vectors of ℝ^n. Then, the finitely generated cone

C = { x | x = Σ_{j=1}^r μ_j a_j, μ_j ≥ 0, j = 1, . . . , r }   (1.47)

is closed and its polar cone is the polyhedral cone given by

C* = { x | a_j′x ≤ 0, j = 1, . . . , r }.   (1.48)

(b) (Farkas' Lemma) Let x, e_1, . . . , e_m, and a_1, . . . , a_r be vectors of ℝ^n. We have x′y ≤ 0 for all vectors y ∈ ℝ^n such that

y′e_i = 0, ∀ i = 1, . . . , m,   y′a_j ≤ 0, ∀ j = 1, . . . , r,

if and only if x can be expressed as

x = Σ_{i=1}^m λ_i e_i + Σ_{j=1}^r μ_j a_j,

where λ_i and μ_j are some scalars with μ_j ≥ 0 for all j.

(c) (Minkowski-Weyl Theorem) A cone is polyhedral if and only if it is finitely generated.
Proof: (a) We first show that the polar cone of C has the desired form (1.48). If y satisfies y′a_j ≤ 0 for all j, then y′x ≤ 0 for all x ∈ C, so the set in the right-hand side of Eq. (1.48) is a subset of C*. Conversely, if y ∈ C*, i.e., if y′x ≤ 0 for all x ∈ C, then (since the a_j belong to C) we have y′a_j ≤ 0 for all j. Thus, C* is a subset of the set in the right-hand side of Eq. (1.48).

There remains to show that C is closed. We will give two proofs for this. The first proof is simpler and suffices for the purpose of showing Farkas' Lemma [part (b)]. The second proof shows a stronger result, namely that C is polyhedral. This not only shows that C is closed, but also proves half of the Minkowski-Weyl Theorem [part (c)].

The first proof that C is closed is based on induction on the number of vectors r. When r = 1, C is either {0} (if a_1 = 0) or a halfline, and is therefore closed. Suppose, for some r ≥ 1, all cones of the form

{ x | x = Σ_{j=1}^r μ_j a_j, μ_j ≥ 0, j = 1, . . . , r }

are closed. Then, we will show that a cone of the form

C_{r+1} = { x | x = Σ_{j=1}^{r+1} μ_j a_j, μ_j ≥ 0, j = 1, . . . , r + 1 }

is also closed. Without loss of generality, assume that ‖a_j‖ = 1 for all j. If the vectors −a_1, . . . , −a_{r+1} belong to C_{r+1}, then C_{r+1} is the subspace spanned by a_1, . . . , a_{r+1} and is therefore closed. If the negative of one of the vectors, say −a_{r+1}, does not belong to C_{r+1}, consider the cone

C_r = { x | x = Σ_{j=1}^r μ_j a_j, μ_j ≥ 0, j = 1, . . . , r },

which is closed by the induction hypothesis. Let

γ = min_{x∈C_r, ‖x‖=1} a_{r+1}′x.

Since the set {x ∈ C_r | ‖x‖ = 1} is nonempty and compact, the minimum above is attained at some x* ∈ C_r with ‖x*‖ = 1 by Weierstrass' Theorem. Using the Schwarz Inequality, we have

γ = a_{r+1}′x* ≥ −‖a_{r+1}‖ ‖x*‖ = −1,

with equality if and only if x* = −a_{r+1}. Because x* ∈ C_r and −a_{r+1} ∉ C_r, equality cannot hold above, so that γ > −1. Let {x_k} ⊂ C_{r+1} be a convergent sequence. We will prove that its limit belongs to C_{r+1}, thereby showing that C_{r+1} is closed. Indeed, for all k, we have x_k = ξ_k a_{r+1} + y_k, where ξ_k ≥ 0 and y_k ∈ C_r. Using the fact ‖a_{r+1}‖ = 1 and the definition of γ, we obtain

‖x_k‖² = ξ_k² + ‖y_k‖² + 2ξ_k a_{r+1}′y_k ≥ ξ_k² + ‖y_k‖² + 2γξ_k‖y_k‖ = (ξ_k − ‖y_k‖)² + 2(1 + γ)ξ_k‖y_k‖.

Since {x_k} converges, ξ_k ≥ 0, and 1 + γ > 0, it follows that the sequences {ξ_k} and {y_k} are bounded and hence, they have limit points denoted by ξ and y, respectively. The limit of {x_k} is

lim_{k→∞} (ξ_k a_{r+1} + y_k) = ξa_{r+1} + y,

which belongs to C_{r+1}, since ξ ≥ 0 and y ∈ C_r (by the closedness hypothesis on C_r). We conclude that C_{r+1} is closed, completing the proof.
We now give an alternative proof that C is closed, by showing that it is polyhedral. The proof is constructive and uses induction on the number of vectors r. When r = 1, there are two possibilities: (a) a_1 = 0, in which case C = {0}, and

C = {x | u_i′x ≤ 0, −u_i′x ≤ 0, i = 1, . . . , n},

where u_i is the ith unit coordinate vector; (b) a_1 ≠ 0, in which case C is a closed halfline, which using the Polar Cone Theorem, is characterized as the set of vectors that are orthogonal to the subspace orthogonal to a_1 and also make a nonnegative inner product with a_1, i.e.,

C = {x | b_i′x ≤ 0, −b_i′x ≤ 0, i = 1, . . . , n − 1, −a_1′x ≤ 0},

where b_1, . . . , b_{n−1} are basis vectors for the (n − 1)-dimensional subspace that is orthogonal to the vector a_1. In both cases (a) and (b), C is polyhedral.
Assume that for some r ≥ 1, a set of the form

C_r = { x | x = Σ_{j=1}^r μ_j a_j, μ_j ≥ 0, j = 1, . . . , r }

has a polyhedral representation

P_r = { x | b_j′x ≤ 0, j = 1, . . . , m }.

Let a_{r+1} be a given vector in ℝ^n, and consider the set

C_{r+1} = { x | x = Σ_{j=1}^{r+1} μ_j a_j, μ_j ≥ 0, j = 1, . . . , r + 1 }.

We will show that C_{r+1} has a polyhedral representation.
Let

β_j = a_{r+1}′b_j,   j = 1, . . . , m,

and define the index sets

J⁻ = {j | β_j < 0},   J⁰ = {j | β_j = 0},   J⁺ = {j | β_j > 0}.

If J⁻ ∪ J⁰ = Ø, or equivalently a_{r+1}′b_j > 0 for all j, we see that −a_{r+1} is an interior point of P_r. Hence there is an ε > 0 such that for each μ_{r+1} > 0, the open sphere centered at −μ_{r+1}a_{r+1} of radius εμ_{r+1} is contained in P_r. It follows that for any x ∈ ℝ^n we have that −μ_{r+1}a_{r+1} + x ∈ P_r for all sufficiently large μ_{r+1}, implying that x belongs to C_{r+1}. Therefore C_{r+1} = ℝ^n, so that C_{r+1} is a polyhedral set.
We may thus assume that J⁻ ∪ J⁰ ≠ Ø. We will show that if J⁺ = Ø or J⁻ = Ø, the set C_{r+1} has the polyhedral representation

P_{r+1} = { x | b_j′x ≤ 0, j ∈ J⁻ ∪ J⁰ },

and if J⁺ ≠ Ø and J⁻ ≠ Ø, it has the polyhedral representation

P_{r+1} = { x | b_j′x ≤ 0, j ∈ J⁻ ∪ J⁰, b_{l,k}′x ≤ 0, l ∈ J⁺, k ∈ J⁻ },

where

b_{l,k} = b_l − (β_l/β_k) b_k,   ∀ l ∈ J⁺, ∀ k ∈ J⁻.

This will complete the induction.

Indeed, we have C_{r+1} ⊂ P_{r+1} since by construction, the vectors a_1, . . . , a_{r+1} satisfy the inequalities defining P_{r+1}. To show the reverse inclusion, we fix a vector x ∈ P_{r+1} and we verify that there exist μ_1, . . . , μ_{r+1} ≥ 0 such that x = Σ_{j=1}^{r+1} μ_j a_j, or equivalently that there exists μ_{r+1} ≥ 0 such that

x − μ_{r+1}a_{r+1} ∈ P_r.
We consider three cases.

(1) J⁺ = Ø. Then, since x ∈ P_{r+1}, we have b_j′x ≤ 0 for all j ∈ J⁻ ∪ J⁰, implying x ∈ P_r, so that x − μ_{r+1}a_{r+1} ∈ P_r for μ_{r+1} = 0.

(2) J⁺ ≠ Ø, J⁰ ≠ Ø, and J⁻ = Ø. Then, since x ∈ P_{r+1}, we have b_j′x ≤ 0 for all j ∈ J⁰, so that

b_j′(x − μa_{r+1}) ≤ 0,   ∀ j ∈ J⁰, ∀ μ ≥ 0,

b_j′(x − μa_{r+1}) ≤ 0,   ∀ j ∈ J⁺, ∀ μ ≥ max_{j∈J⁺} b_j′x/β_j.

Hence, for μ_{r+1} sufficiently large, we have x − μ_{r+1}a_{r+1} ∈ P_r.

(3) J⁺ ≠ Ø and J⁻ ≠ Ø. Then since x ∈ P_{r+1}, we have b_{l,k}′x ≤ 0 for all l ∈ J⁺, k ∈ J⁻, implying that

b_l′x/β_l ≤ b_k′x/β_k,   ∀ l ∈ J⁺, ∀ k ∈ J⁻.   (1.49)

Furthermore, for x ∈ P_{r+1}, there holds b_k′x/β_k ≥ 0 for all k ∈ J⁻, so that for μ_{r+1} satisfying

max{ 0, max_{l∈J⁺} b_l′x/β_l } ≤ μ_{r+1} ≤ min_{k∈J⁻} b_k′x/β_k,

we have

b_j′x − μ_{r+1}b_j′a_{r+1} ≤ 0,   ∀ j ∈ J⁰,

b_k′x − μ_{r+1}b_k′a_{r+1} = β_k(b_k′x/β_k − μ_{r+1}) ≤ 0,   ∀ k ∈ J⁻,

b_l′x − μ_{r+1}b_l′a_{r+1} = β_l(b_l′x/β_l − μ_{r+1}) ≤ 0,   ∀ l ∈ J⁺.

Hence b_j′(x − μ_{r+1}a_{r+1}) ≤ 0 for all j, implying that x − μ_{r+1}a_{r+1} ∈ P_r, and completing the proof.
(b) Define a_{r+i} = e_i and a_{r+m+i} = −e_i, i = 1, . . . , m. The result to be shown translates to

x ∈ P*   ⟺   x ∈ C,

where

P = { y | y′a_j ≤ 0, j = 1, . . . , r + 2m },

C = { x | x = Σ_{j=1}^{r+2m} μ_j a_j, μ_j ≥ 0 }.

Since by part (a), P = C* and C is closed and convex, we have P* = (C*)* = C by the Polar Cone Theorem (Prop. 1.5.1).
(c) We have already shown in the alternative proof of part (a) that a finitely generated cone is polyhedral, so there remains to show the converse. Let P be a polyhedral cone given by

P = { x | a_j′x ≤ 0, j = 1, . . . , r }.

By part (a),

P = { x | x = Σ_{j=1}^r μ_j a_j, μ_j ≥ 0, j = 1, . . . , r }*,

and by using also the closedness of finitely generated cones proved in part (a) and the Polar Cone Theorem, we have

P* = { x | x = Σ_{j=1}^r μ_j a_j, μ_j ≥ 0, j = 1, . . . , r }.

Thus, P* is finitely generated and therefore polyhedral, so that by part (a), its polar (P*)* is finitely generated. By the Polar Cone Theorem, (P*)* = P, implying that P is finitely generated. Q.E.D.
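Part (a) of Prop. 1.6.1 can be checked on a small example. Taking generators a_1 = (1, 0) and a_2 = (1, 1) in ℝ² (a choice made purely for illustration), membership in C can be decided by solving for the multipliers μ by hand, and membership in C* by the two inequalities a_j′x ≤ 0; the sketch below confirms that every sampled point of C* has nonpositive inner product with every sampled point of C.

```python
# C = cone{(1,0), (1,1)}: writing x = m1*(1,0) + m2*(1,1) gives
# m2 = x2 and m1 = x1 - x2, so x is in C iff x2 >= 0 and x1 >= x2.
# By Prop. 1.6.1(a), C* = {x | x1 <= 0, x1 + x2 <= 0}.

def in_C(x):
    return x[1] >= 0 and x[0] >= x[1]

def in_Cstar(x):
    return x[0] <= 0 and x[0] + x[1] <= 0

grid = [(i * 0.5, j * 0.5) for i in range(-6, 7) for j in range(-6, 7)]
C_pts = [x for x in grid if in_C(x)]
Cs_pts = [x for x in grid if in_Cstar(x)]

# Polarity: y'x <= 0 for every sampled y in C* and x in C.
polar_ok = all(y[0] * x[0] + y[1] * x[1] <= 1e-12
               for y in Cs_pts for x in C_pts)
```

The same closed-form multiplier computation is what fails in higher dimensions, where deciding membership in a finitely generated cone generally requires a linear feasibility solver.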
1.6.2 Polyhedral Sets
A nonempty set P ⊂ ℝ^n is said to be a polyhedral set (or polyhedron) if it has the form

P = { x | a_j′x ≤ b_j, j = 1, . . . , r },

where a_j are some vectors in ℝ^n and b_j are some scalars. A polyhedron may also involve linear equalities, which may be converted into two linear inequalities. In particular, if e_1, . . . , e_m and a_1, . . . , a_r are given vectors, and d_1, . . . , d_m and b_1, . . . , b_r are given scalars, the set

{x | e_i′x = d_i, i = 1, . . . , m, a_j′x ≤ b_j, j = 1, . . . , r}

is polyhedral, since it can be written as

{x | e_i′x ≤ d_i, −e_i′x ≤ −d_i, i = 1, . . . , m, a_j′x ≤ b_j, j = 1, . . . , r}.

The following is a fundamental result, showing that a polyhedral set can be represented as the sum of the convex hull of a finite set of points and a finitely generated cone.
Proposition 1.6.2: A set P is polyhedral if and only if there exist a nonempty finite set of vectors {v_1, . . . , v_m} and a finitely generated cone C such that P = conv({v_1, . . . , v_m}) + C, i.e.,

P = { x | x = Σ_{j=1}^m μ_j v_j + y, Σ_{j=1}^m μ_j = 1, μ_j ≥ 0, j = 1, . . . , m, y ∈ C }.
Proof: Assume that P is polyhedral. Then, it has the form

P = { x | a_j′x ≤ b_j, j = 1, . . . , r },

for some vectors a_j and scalars b_j. Consider the polyhedral cone in ℝ^{n+1} given by

P̂ = { (x, w) | 0 ≤ w, a_j′x ≤ b_j w, j = 1, . . . , r }

and note that

P = { x | (x, 1) ∈ P̂ }.

By the Minkowski-Weyl Theorem [Prop. 1.6.1(c)], P̂ is finitely generated, so it has the form

P̂ = { (x, w) | x = Σ_{j=1}^m μ_j v_j, w = Σ_{j=1}^m μ_j d_j, μ_j ≥ 0, j = 1, . . . , m },

for some vectors v_j and scalars d_j. Since w ≥ 0 for all vectors (x, w) ∈ P̂, we see that d_j ≥ 0 for all j. Let

J⁺ = { j | d_j > 0 },   J⁰ = { j | d_j = 0 }.

By replacing μ_j by μ_j/d_j for all j ∈ J⁺, we obtain the equivalent description

P̂ = { (x, w) | x = Σ_{j=1}^m μ_j v_j, w = Σ_{j∈J⁺} μ_j, μ_j ≥ 0, j = 1, . . . , m }.

Since P = { x | (x, 1) ∈ P̂ }, we obtain

P = { x | x = Σ_{j∈J⁺} μ_j v_j + Σ_{j∈J⁰} μ_j v_j, Σ_{j∈J⁺} μ_j = 1, μ_j ≥ 0, j = 1, . . . , m }.

Thus, P is the vector sum of the convex hull of the vectors v_j, j ∈ J⁺, and the finitely generated cone

{ Σ_{j∈J⁰} μ_j v_j | μ_j ≥ 0, j ∈ J⁰ }.

To prove that the vector sum of conv({v_1, . . . , v_m}) and a finitely generated cone is a polyhedral set, we use a reverse argument; we pass to a finitely generated cone description, we use the Minkowski-Weyl Theorem to assert that this cone is polyhedral, and we finally construct a polyhedral set description. The details are left as an exercise for the reader. Q.E.D.
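A concrete instance of the representation (chosen here purely for illustration): P = {x ∈ ℝ² | x₁ ≥ 0, x₂ ≥ 0, x₁ + x₂ ≥ 1} equals conv({(1, 0), (0, 1)}) plus the nonnegative orthant. For this P the decomposition can even be written explicitly: scaling x down onto the segment x₁ + x₂ = 1 gives the convex-hull part, and the remainder is a nonnegative vector.

```python
# P = {x | x1 >= 0, x2 >= 0, x1 + x2 >= 1}
#   = conv{(1,0), (0,1)} + {y | y >= 0}     (the form of Prop. 1.6.2).

def decompose(x):
    """Return (v, y) with x = v + y, where v lies on the segment
    conv{(1,0), (0,1)} and y lies in the nonnegative orthant.
    Assumes x is in P, so that s = x1 + x2 >= 1."""
    s = x[0] + x[1]
    v = (x[0] / s, x[1] / s)            # v1 + v2 = 1, v >= 0
    y = (x[0] - v[0], x[1] - v[1])      # y = x * (1 - 1/s) >= 0
    return v, y

x = (2.0, 3.0)
v, y = decompose(x)
on_segment = abs(v[0] + v[1] - 1.0) < 1e-12 and v[0] >= 0 and v[1] >= 0
in_cone = y[0] >= 0 and y[1] >= 0
recombines = (abs(v[0] + y[0] - x[0]) < 1e-12
              and abs(v[1] + y[1] - x[1]) < 1e-12)
```

For a general polyhedron no such closed-form split exists; the proposition guarantees only that some finite vertex set and finitely generated cone work.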
1.6.3 Extreme Points
Given a convex set C, a vector x ∈ C is said to be an extreme point of C if there do not exist vectors y ∈ C and z ∈ C, with y ≠ x and z ≠ x, and a scalar α ∈ (0, 1) such that x = αy + (1 − α)z. An equivalent definition is that x cannot be expressed as a convex combination of some vectors of C, all of which are different from x.

Thus an extreme point of a set cannot lie strictly between the endpoints of any line segment contained in the set. It follows that an extreme point of a set must lie on the relative boundary of the set. As a result, an open set has no extreme points, and more generally a convex set that coincides with its relative interior has no extreme points, except in the special case where the set consists of a single point. As another consequence of the definition, a convex cone may have at most one extreme point, the origin.
Figure 1.6.2. Illustration of extreme points of various convex sets. In the set of (c), the extreme points are the ones that lie on the circular arc.
We also show in what follows that a bounded polyhedral set has a finite number of extreme points and in fact it is equal to the convex hull of its extreme points. Figure 1.6.2 illustrates the extreme points of various types of sets. The following proposition provides some intuition into the nature of extreme points.
Proposition 1.6.3: Let C be a nonempty closed convex set in ℝ^n.

(a) If H is a hyperplane that passes through a relative boundary point of C and contains C in one of its halfspaces, then every extreme point of C ∩ H is also an extreme point of C.

(b) C has at least one extreme point if and only if it does not contain a line, i.e., a set L of the form L = {x + αd | α ∈ ℝ}, where d is a nonzero vector in ℝ^n and x is some vector in C.

(c) (Krein-Milman Theorem) If C is bounded (in addition to being convex and closed), then C is equal to the convex hull of its extreme points.
Proof: (a) Let x be an element of C ∩ H which is not an extreme point of C. Then we have x = αy + (1 − α)z for some α ∈ (0, 1), and some y ∈ C and z ∈ C, with y ≠ x and z ≠ x. Since x ∈ H, the halfspace containing C is of the form {x̄ | a′x̄ ≥ a′x}, where a ≠ 0. Then a′y ≥ a′x and a′z ≥ a′x, which in view of x = αy + (1 − α)z, implies that a′y = a′x and a′z = a′x. Therefore, y ∈ C ∩ H and z ∈ C ∩ H, showing that x is not an extreme point of C ∩ H.

(b) To arrive at a contradiction, assume that C has an extreme point x and contains a line {x̄ + αd | α ∈ ℝ}, where x̄ ∈ C and d ≠ 0. Then by the Recession Cone Theorem [Prop. 1.2.13(b)] and the closedness of C, both d and −d are directions of recession of C, so the line {x + αd | α ∈ ℝ} belongs to C. This contradicts the fact that x is an extreme point.

Conversely, we use induction on the dimension of the space to show that if C does not contain a line, it must have an extreme point. This is true in the real line ℝ, so assume it is true in ℝ^{n−1}. Since C contains no line, we must have C ≠ ℝ^n, so that C has a relative boundary point. Take any x̄ on the relative boundary of C, and any hyperplane H passing through x̄ and containing C in one of its halfspaces. By using a translation argument if necessary, we may assume without loss of generality that x̄ = 0. Then H is an (n − 1)-dimensional subspace, so the set C ∩ H lies in an (n − 1)-dimensional space and contains no line. Hence, by the induction hypothesis, it must have an extreme point. By part (a), this extreme point must also be an extreme point of C.

(c) The proof is based on induction on the dimension of the space. On the real line, every compact convex set C is a line segment whose endpoints are the extreme points of C, so that C is the convex hull of its extreme points. Suppose now that every compact convex set in ℝ^{n−1} is the convex hull of its extreme points. Let C be a compact convex set in ℝ^n. Since C is bounded, it contains no line and by part (b), it has at least one extreme point. By convexity of C, the convex hull of all extreme points of C is contained in C. To show the reverse inclusion, choose any x ∈ C, and consider a line that lies in aff(C) and passes through x. Since C is compact, the intersection of this line and C is a line segment whose endpoints, say x_1 and x_2, belong to the relative boundary of C. Let H_1 be a hyperplane that passes through x_1 and contains C in one of its halfspaces. Similarly, let H_2 be a hyperplane that passes through x_2 and contains C in one of its halfspaces. The intersections C ∩ H_1 and C ∩ H_2 are compact convex sets that lie in the hyperplanes H_1 and H_2, respectively. By viewing H_1 and H_2 as (n − 1)-dimensional spaces, and by using the inductive hypothesis, we see that each of the sets C ∩ H_1 and C ∩ H_2 is the convex hull of its extreme points. Hence x_1 is a convex combination of some extreme points of C ∩ H_1, and x_2 is a convex combination of some extreme points of C ∩ H_2. By part (a), every extreme point of C ∩ H_1 and every extreme point of C ∩ H_2 is also an extreme point of C, so that each of x_1 and x_2 is a convex combination of some extreme points of C. Since x lies in the line segment connecting x_1 and x_2, it follows that x is a convex combination of some extreme points of C, showing that C is contained in the convex hull of all extreme points of C. Q.E.D.
The boundedness assumption is essential in the Krein-Milman Theorem. For example, the only extreme point of the ray $C = \{\lambda y \mid \lambda \geq 0\}$, where $y$ is a given nonzero vector, is the origin, but none of its other points can be generated as a convex combination of extreme points of $C$.
126 Convex Analysis and Optimization Chap. 1
As an example of application of the preceding proposition, consider a nonempty polyhedral set of the form
$$\{x \mid Ax = b,\ x \geq 0\},$$
or of the form
$$\{x \mid Ax \leq b,\ x \geq 0\},$$
where $A$ is an $m \times n$ matrix and $b \in \Re^m$. Such polyhedra arise commonly in linear programming formulations, and clearly do not contain a line. Hence by Prop. 1.6.3(b) they always possess at least one extreme point.
One of the fundamental linear programming results is that if a linear
function f attains a minimum over a polyhedral set C having at least one
extreme point, then f attains a minimum at some extreme point of C (as
well as possibly at some other nonextreme points). We will come to this
fact after considering the more general case where f is concave, and C is
closed and convex.
Proposition 1.6.4: Let $C \subset \Re^n$ be a closed convex set that does not contain a line, and let $f : C \to \Re$ be a concave function attaining a minimum over $C$. Then $f$ attains a minimum at some extreme point of $C$.
Proof: If $C$ consists of just a single point, the result holds trivially. We may thus assume that $C$ contains at least two distinct points. We first show that $f$ attains a minimum at some vector that is not in $\mathrm{ri}(C)$. Let $x^*$ be a vector where $f$ attains a minimum over $C$. If $x^* \notin \mathrm{ri}(C)$ we are done, so assume that $x^* \in \mathrm{ri}(C)$. By Prop. 1.2.9(c), every line segment having $x^*$ as one endpoint can be prolonged beyond $x^*$ without leaving $C$, i.e., for any $x \in C$, there exists $\gamma > 1$ such that
$$\hat{x} = x^* + (\gamma - 1)(x^* - x) \in C.$$
Therefore
$$x^* = \frac{\gamma - 1}{\gamma}\, x + \frac{1}{\gamma}\, \hat{x},$$
and by the concavity of $f$ and the optimality of $x^*$, we have
$$f(x^*) \geq \frac{\gamma - 1}{\gamma}\, f(x) + \frac{1}{\gamma}\, f(\hat{x}) \geq f(x^*),$$
implying that $f(x) = f(x^*)$ for any $x \in C$. Furthermore, there exists an $x \in C$ with $x \notin \mathrm{ri}(C)$, since $C$ is closed, contains at least two points, and does not contain a line.
We have shown so far that the minimum of $f$ is attained at some $x \notin \mathrm{ri}(C)$. If $x$ is an extreme point of $C$, we are done. If it is not an extreme point, consider a hyperplane $H$ passing through $x$ and containing $C$ in one of its halfspaces. The intersection $C_1 = C \cap H$ is closed, convex, does not contain a line, and lies in an affine set $M_1$ of dimension $n - 1$. Furthermore, $f$ attains its minimum over $C_1$ at $x$. Thus, by the preceding argument, it also attains its minimum at some $x_1 \in C_1$ with $x_1 \notin \mathrm{ri}(C_1)$. If $x_1$ is an extreme point of $C_1$, then by Prop. 1.6.3, it is also an extreme point of $C$ and the result follows. If $x_1$ is not an extreme point of $C_1$, then we may view $M_1$ as a space of dimension $n - 1$ and we form $C_2$, the intersection of $C_1$ with a hyperplane in $M_1$ that passes through $x_1$ and contains $C_1$ in one of its halfspaces. This hyperplane will be of dimension $n - 2$. We can continue this process for at most $n$ times, until a set $C_n$ consisting of a single point is obtained. This point is an extreme point of $C_n$ and, by repeated application of Prop. 1.6.3, an extreme point of $C$. Q.E.D.
The following is an important special case of the preceding proposition.
Proposition 1.6.5: Let $C$ be a closed convex set and let $f : C \to \Re$ be a concave function. Assume that for some $m \times n$ matrix $A$ of rank $n$ and some $b \in \Re^m$ we have
$$Ax \geq b, \qquad \forall\ x \in C.$$
Then if $f$ attains a minimum over $C$, it attains a minimum at some extreme point of $C$.
Proof: In view of Prop. 1.6.4, it suffices to show that $C$ does not contain a line. Assume, to obtain a contradiction, that $C$ contains the line
$$L = \{x + \lambda d \mid \lambda \in \Re\},$$
where $x \in C$ and $d$ is a nonzero vector. Since $A$ has rank $n$, the vector $Ad$ is nonzero, implying that the image
$$AL = \{Ax + \lambda Ad \mid \lambda \in \Re\}$$
is also a line, and therefore cannot be contained in the set $\{y \mid y \geq b\}$. This contradicts the assumption $Ax \geq b$ for all $x \in C$. Q.E.D.
1.6.4 Extreme Points and Linear Programming

We now consider a polyhedral set $P$ and we characterize the set of its extreme points (also called vertices). By Prop. 1.6.2, $P$ can be represented as
$$P = \hat{P} + C,$$
where $\hat{P}$ is the convex hull of some vectors $v_1, \ldots, v_m$ and $C$ is a finitely generated cone:
$$\hat{P} = \left\{ x \;\Big|\; x = \sum_{j=1}^m \mu_j v_j,\ \sum_{j=1}^m \mu_j = 1,\ \mu_j \geq 0,\ j = 1, \ldots, m \right\}.$$
We note that an extreme point $x$ of $P$ cannot be of the form $x = \hat{x} + y$, where $\hat{x} \in \hat{P}$ and $y \neq 0$, $y \in C$, since in this case $x$ would be the midpoint of the line segment connecting the distinct vectors $\hat{x}$ and $\hat{x} + 2y$. Therefore, an extreme point of $P$ must belong to $\hat{P}$, and since $\hat{P} \subset P$, it must also be an extreme point of $\hat{P}$. An extreme point of $\hat{P}$ must be one of the vectors $v_1, \ldots, v_m$, since otherwise this point would be expressible as a convex combination of $v_1, \ldots, v_m$. Thus the set of extreme points of $P$ is either empty, or nonempty and finite. Using Prop. 1.6.3(b), it follows that the set of extreme points of $P$ is nonempty and finite if and only if $P$ contains no line.
If $P$ is bounded, then we must have $P = \hat{P}$, and it follows from the Krein-Milman Theorem that $P$ is equal to the convex hull of its extreme points (not just the convex hull of the vectors $v_1, \ldots, v_m$). The following proposition gives another and more specific characterization of extreme points of polyhedral sets, which is central in the theory of linear programming.
Proposition 1.6.6: Let $P$ be a polyhedral set in $\Re^n$.

(a) If $P$ has the form
$$P = \{ x \mid a_j' x \leq b_j,\ j = 1, \ldots, r \},$$
where $a_j$ and $b_j$ are given vectors and scalars, respectively, then a vector $v \in P$ is an extreme point of $P$ if and only if the set
$$A_v = \{ a_j \mid a_j' v = b_j,\ j \in \{1, \ldots, r\} \}$$
contains $n$ linearly independent vectors.

(b) If $P$ has the form
$$P = \{ x \mid Ax = b,\ x \geq 0 \},$$
where $A$ is a given $m \times n$ matrix and $b$ is a given vector in $\Re^m$, then a vector $v \in P$ is an extreme point of $P$ if and only if the columns of $A$ corresponding to the nonzero coordinates of $v$ are linearly independent.

(c) If $P$ has the form
$$P = \{ x \mid Ax = b,\ c \leq x \leq d \},$$
where $A$ is a given $m \times n$ matrix, $b$ is a given vector in $\Re^m$, and $c$, $d$ are given vectors in $\Re^n$, then a vector $v \in P$ is an extreme point of $P$ if and only if the columns of $A$ corresponding to the coordinates of $v$ that lie strictly between the corresponding coordinates of $c$ and $d$ are linearly independent.
Proof: (a) If the set $A_v$ contains fewer than $n$ linearly independent vectors, then the system of equations
$$a_j' w = 0, \qquad \forall\ a_j \in A_v,$$
has a nonzero solution $w$. For sufficiently small $\gamma > 0$, we have $v + \gamma w \in P$ and $v - \gamma w \in P$, thus showing that $v$ is not an extreme point. Thus, if $v$ is an extreme point, $A_v$ must contain $n$ linearly independent vectors.

Conversely, suppose that $A_v$ contains a subset $\overline{A}_v$ consisting of $n$ linearly independent vectors. Suppose that for some $y \in P$, $z \in P$, and $\alpha \in (0, 1)$, we have $v = \alpha y + (1 - \alpha) z$. Then for all $a_j \in \overline{A}_v$, we have
$$b_j = a_j' v = \alpha a_j' y + (1 - \alpha) a_j' z \leq \alpha b_j + (1 - \alpha) b_j = b_j.$$
Thus $v$, $y$, and $z$ are all solutions of the system of $n$ linearly independent equations
$$a_j' w = b_j, \qquad \forall\ a_j \in \overline{A}_v.$$
Hence $v = y = z$, implying that $v$ is an extreme point of $P$.
(b) Let $k$ be the number of zero coordinates of $v$, and consider the matrix $\overline{A}$, which is the same as $A$ except that all the columns corresponding to the zero coordinates of $v$ are set to zero. We write $P$ in the form
$$P = \{ x \mid Ax \leq b,\ -Ax \leq -b,\ -x \leq 0 \},$$
and apply the result of part (a). We obtain that $v$ is an extreme point if and only if $\overline{A}$ contains $n - k$ linearly independent rows, which is equivalent to the $n - k$ nonzero columns of $A$ (corresponding to the nonzero coordinates of $v$) being linearly independent.

(c) The proof is essentially the same as the proof of part (b). Q.E.D.
The following theorem illustrates the importance of extreme points in
linear programming. In addition to its analytical signiﬁcance, it forms the
basis for algorithms, such as the celebrated simplex method (see any linear
programming book, e.g., Dantzig [Dan63], Chvatal [Chv83], or Bertsimas
and Tsitsiklis [BeT97]), which systematically search the set of extreme
points of the constraint polyhedron for an optimal extreme point.
Proposition 1.6.7: (Fundamental Theorem of Linear Programming) Let $P$ be a polyhedral set that has at least one extreme point. Then if a linear function attains a minimum over $P$, it attains a minimum at some extreme point of $P$.
Proof: Since $P$ is polyhedral, it has a representation
$$P = \{ x \mid Ax \geq b \},$$
for some $m \times n$ matrix $A$ and some $b \in \Re^m$. If $A$ had rank less than $n$, then its nullspace would contain some nonzero vector $d$, so $P$ would contain a line parallel to $d$, contradicting the existence of an extreme point [cf. Prop. 1.6.3(b)]. Thus $A$ has rank $n$ and the result follows from Prop. 1.6.5. Q.E.D.
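The theorem can be illustrated by brute force in low dimension: by Prop. 1.6.6(a), each vertex of a polyhedron $\{x \mid a_j'x \leq b_j\}$ solves an $n \times n$ system of active constraints, so one can enumerate these systems, keep the feasible solutions, and compare the best vertex cost against a grid of feasible points. This is only a sketch with hypothetical data (the unit square and cost vector), not a practical LP method.

```python
# Brute-force illustration of the Fundamental Theorem of Linear Programming:
# enumerate vertices of P = {x | a_j'x <= b_j} in the plane by solving all
# 2x2 systems of active constraints (cf. Prop. 1.6.6(a)), then check that a
# linear cost is minimized at one of them.
from itertools import combinations, product

def solve2(a1, a2, b1, b2, tol=1e-9):
    """Solve the 2x2 system a1'x = b1, a2'x = b2 by Cramer's rule, or None."""
    det = a1[0] * a2[1] - a1[1] * a2[0]
    if abs(det) < tol:
        return None
    return ((b1 * a2[1] - b2 * a1[1]) / det,
            (a1[0] * b2 - a2[0] * b1) / det)

def vertices(A, b, tol=1e-9):
    """All basic feasible points of {x | Ax <= b} in the plane."""
    verts = []
    for (i, j) in combinations(range(len(A)), 2):
        x = solve2(A[i], A[j], b[i], b[j])
        if x is not None and all(sum(ai * xi for ai, xi in zip(row, x)) <= bi + tol
                                 for row, bi in zip(A, b)):
            verts.append(x)
    return verts

# Unit square: -x1 <= 0, -x2 <= 0, x1 <= 1, x2 <= 1
A = [(-1.0, 0.0), (0.0, -1.0), (1.0, 0.0), (0.0, 1.0)]
b = [0.0, 0.0, 1.0, 1.0]
c = (1.0, 2.0)
vert_min = min(c[0] * x + c[1] * y for (x, y) in vertices(A, b))
# Compare against a grid of feasible points: none does better than a vertex.
grid_min = min(c[0] * x / 10 + c[1] * y / 10
               for x, y in product(range(11), repeat=2))
print(vert_min, grid_min)   # both 0.0, attained at the vertex (0, 0)
```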
Extreme Points and Integer Programming
Many important optimization problems, in addition to the usual equality and inequality constraints, include the requirement that the optimization variables take integer values, such as 0 or 1. We refer to such problems as integer programming problems, and we note that they arise in a broad variety of practical settings, such as for example in scheduling, resource allocation, and engineering design, as well as in many combinatorial optimization contexts, such as matching and traveling salesman problems.

The methodology for solving an integer programming problem is very diverse, but an important subset of this methodology relies on the solution of a continuous optimization problem, called the relaxed problem, which is derived from the original by neglecting the integer constraints while maintaining all the other constraints. In many important situations the relaxed problem is a linear program that can be solved for an extreme point optimal solution by a variety of algorithms, including the simplex method. A particularly fortuitous situation arises if this extreme point solution happens to have integer components, since it will then solve optimally not just the relaxed problem, but also the original integer programming problem. Thus polyhedra whose extreme points have integer components are of special significance. We will now use Prop. 1.6.6 to characterize an important class of polyhedra with this property.
Consider a polyhedral set $P$ of the form
$$P = \{ x \mid Ax = b,\ c \leq x \leq d \}, \tag{1.50}$$
where $A$ is a given $m \times n$ matrix, $b$ is a given vector in $\Re^m$, and $c$ and $d$ are given vectors in $\Re^n$. We assume that all the components of the matrix $A$ and the vectors $b$, $c$, and $d$ are integer. Given an extreme point $v$ of $P$, we want to determine conditions under which the components of $v$ are integer.

Let us say that a square matrix with integer components is unimodular if its determinant is 0, 1, or $-1$, and let us say that a rectangular matrix with integer components is totally unimodular if each of its square submatrices is unimodular. We have the following proposition.
Proposition 1.6.8: Let $P$ be a polyhedral set
$$P = \{ x \mid Ax = b,\ c \leq x \leq d \},$$
where $A$ is a given $m \times n$ matrix, $b$ is a given vector in $\Re^m$, and $c$ and $d$ are given vectors in $\Re^n$. Assume that all the components of the matrix $A$ and the vectors $b$, $c$, and $d$ are integer, and that the matrix $A$ is totally unimodular. Then all the extreme points of $P$ have integer components.
Proof: Let $v$ be an extreme point of $P$. Consider the subset of indices
$$I = \{ i \mid c_i < v_i < d_i \},$$
and without loss of generality, assume that
$$I = \{1, \ldots, \overline{m}\}$$
for some integer $\overline{m}$. Let $\overline{A}$ be the matrix consisting of the first $\overline{m}$ columns of $A$ and let $\overline{v}$ be the vector consisting of the first $\overline{m}$ components of $v$. Note that each of the last $n - \overline{m}$ components of $v$ is equal to either the corresponding component of $c$ or to the corresponding component of $d$, which are integer. Thus the extreme point $v$ has integer components if and only if the subvector $\overline{v}$ does.

From Prop. 1.6.6, $\overline{A}$ has linearly independent columns, so $\overline{v}$ is the unique solution of the system of equations
$$\overline{A}\, y = \overline{b},$$
where $\overline{b}$ is equal to $b$ minus the last $n - \overline{m}$ columns of $A$ multiplied with the corresponding components of $v$ (each of which is equal to either the corresponding component of $c$ or the corresponding component of $d$, so that $\overline{b}$ has integer components). Equivalently, there exists an invertible $\overline{m} \times \overline{m}$ submatrix $\tilde{A}$ of $\overline{A}$ and a subvector $\tilde{b}$ of $\overline{b}$ with $\overline{m}$ components such that
$$\overline{v} = (\tilde{A})^{-1} \tilde{b}.$$
The components of $\overline{v}$ will be integer if we can guarantee that the components of the inverse $(\tilde{A})^{-1}$ are integer. By Cramer's formula, each of the components of the inverse of a matrix is a fraction with a sum of products of the components of the matrix in the numerator and the determinant of the matrix in the denominator. Since by hypothesis $A$ is totally unimodular, the invertible submatrix $\tilde{A}$ is unimodular, and its determinant is either equal to 1 or to $-1$. Hence $(\tilde{A})^{-1}$ has integer components, and it follows that $\overline{v}$ (and hence also the extreme point $v$) has integer components. Q.E.D.
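The key step in this proof, that Cramer's formula combined with a determinant of $\pm 1$ yields an integer inverse, can be checked directly on a small example. The sketch below uses exact rational arithmetic; the $2 \times 2$ matrix and the helper names are hypothetical illustration data.

```python
# If an integer matrix has determinant 1 or -1 (unimodular), then by Cramer's
# formula its inverse has integer components, so Ax = b has an integer solution
# for every integer b. (Exact rational arithmetic via fractions.Fraction.)
from fractions import Fraction

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j+1:] for r in M[1:]])
               for j in range(len(M)))

def inverse(M):
    """Matrix inverse via the adjugate, i.e., Cramer's formula."""
    n = len(M)
    d = Fraction(det(M))
    def minor(i, j):   # delete row i and column j
        return [r[:j] + r[j+1:] for k, r in enumerate(M) if k != i]
    # inverse[i][j] = cofactor(j, i) / det(M)
    return [[Fraction((-1) ** (i + j) * det(minor(j, i))) / d
             for j in range(n)] for i in range(n)]

A = [[2, 1],
     [1, 1]]                        # integer components, det(A) = 1
inv = inverse(A)
print([[int(v) for v in row] for row in inv])                # [[1, -1], [-1, 2]]
print(all(v.denominator == 1 for row in inv for v in row))   # True
```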
The total unimodularity of the matrix $A$ in the above proposition can be verified in a number of important special cases, some of which are discussed in the exercises. The most important such case arises in network optimization problems, where $A$ is the so-called arc incidence matrix of a given directed graph (we refer to network optimization books such as Rockafellar [Roc84] or Bertsekas [Ber98] for a detailed discussion of these problems and their broad applications). Here $A$ has a row for each node and a column for each arc of the graph. The component corresponding to the $i$th row and a given arc is a 1 if the arc is outgoing from node $i$, is a $-1$ if the arc is incoming to $i$, and is a 0 otherwise. Then we can show that the determinant of each square submatrix of $A$ is 0, 1, or $-1$ by induction on the dimension of the submatrix. In particular, the submatrices of dimension 1 of $A$ are the scalar components of $A$, which are 0, 1, or $-1$. Suppose that the determinant of each square submatrix of dimension $n \geq 1$ is 0, 1, or $-1$. Consider a square submatrix of dimension $n + 1$. If this matrix has a column with all components 0, the matrix is singular, and its determinant is 0. If the matrix has a column with a single nonzero component (a 1 or a $-1$), by expanding its determinant along that component and using the induction hypothesis, we see that the determinant is 0, 1, or $-1$. Finally, if each column of the matrix has two nonzero components (a 1 and a $-1$), the sum of its rows is 0, so the matrix is singular, and its determinant is 0. Thus, in linear network optimization where the only constraints, other than upper and lower bounds on the variables, are equality constraints corresponding to an arc incidence matrix, integer extreme point optimal solutions can generically be found.
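For small matrices, total unimodularity can also be verified by brute force, checking that every square submatrix has determinant $-1$, 0, or 1. The sketch below builds the arc incidence matrix of a small hypothetical directed graph and runs this check; the cost is exponential in the matrix size, so it is illustrative only.

```python
# Brute-force check that the arc incidence matrix of a directed graph is
# totally unimodular: every square submatrix has determinant -1, 0, or 1.
from itertools import combinations

def incidence_matrix(num_nodes, arcs):
    """Arc incidence matrix: +1 at the start node of each arc, -1 at its end."""
    A = [[0] * len(arcs) for _ in range(num_nodes)]
    for j, (s, t) in enumerate(arcs):
        A[s][j] = 1
        A[t][j] = -1
    return A

def det(M):
    """Determinant by cofactor expansion along the first row (small matrices)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j+1:] for r in M[1:]])
               for j in range(len(M)))

def is_totally_unimodular(A):
    """Exponential-cost check over all square submatrices."""
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                if det([[A[i][j] for j in cols] for i in rows]) not in (-1, 0, 1):
                    return False
    return True

# Directed graph on 4 nodes with 5 arcs
arcs = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
print(is_totally_unimodular(incidence_matrix(4, arcs)))   # True
```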
E X E R C I S E S
1.6.1

Let $V$ be the convex hull of a finite set of points $\{v_1, \ldots, v_m\}$ in $\Re^n$, and let $C \subset \Re^n$ be a finitely generated cone. Show that $V + C$ is a polyhedral set. Hint: Use the Minkowski-Weyl Theorem to construct a polyhedral description of $V + C$.
1.6.2

Let $P$ be a polyhedral set represented as
$$P = \left\{ x \;\Big|\; x = \sum_{j=1}^m \mu_j v_j + y,\ \sum_{j=1}^m \mu_j = 1,\ \mu_j \geq 0,\ j = 1, \ldots, m,\ y \in C \right\},$$
where $v_1, \ldots, v_m$ are some vectors and $C$ is a finitely generated cone (cf. Prop. 1.6.2). Show that the recession cone of $P$ is equal to $C$.
1.6.3
Show that the image and the inverse image of a polyhedral set under a linear transformation are polyhedral sets.
1.6.4

Let $C_1 \subset \Re^n$ and $C_2 \subset \Re^m$ be polyhedral sets. Show that the Cartesian product $C_1 \times C_2$ is a polyhedral set. Show also that if $C_1$ and $C_2$ are polyhedral cones, then $C_1 \times C_2$ is a polyhedral cone.
1.6.5

Let $C_1$ and $C_2$ be two polyhedral sets in $\Re^n$. Show that $C_1 + C_2$ is a polyhedral set. Show also that if $C_1$ and $C_2$ are polyhedral cones, then $C_1 + C_2$ is a polyhedral cone.
1.6.6 (Polyhedral Strong Separation)

Let $C_1$ and $C_2$ be two nonempty disjoint polyhedral sets in $\Re^n$. Show that $C_1$ and $C_2$ can be strongly separated, i.e., there is a nonzero vector $a \in \Re^n$ such that
$$\sup_{x \in C_1} a'x < \inf_{z \in C_2} a'z.$$
(Compare this with Exercise 1.4.1.)
1.6.7
Let C be a polyhedral set containing the origin.
(a) Show that cone(C) is a polyhedral cone.
(b) Show by counterexample that if C does not contain the origin, cone(C)
may not be a polyhedral cone.
1.6.8 (Polyhedral Functions)

A function $f : \Re^n \to (-\infty, \infty]$ is polyhedral if its epigraph is a polyhedral set in $\Re^{n+1}$ not containing vertical lines [i.e., lines of the form $\{(x, w) \mid w \in \Re\}$ for some $x \in \Re^n$]. Show the following:

(a) A polyhedral function is convex over $\Re^n$.

(b) $f$ is polyhedral if and only if $\mathrm{dom}(f)$ is a polyhedral set and
$$f(x) = \max\{ a_1'x + b_1, \ldots, a_m'x + b_m \}, \qquad \forall\ x \in \mathrm{dom}(f),$$
for some vectors $a_i \in \Re^n$ and scalars $b_i$.

(c) The function given by
$$F(y) = \sup_{\sum_{i=1}^m \mu_i = 1,\ \mu_i \geq 0} \left\{ \left( \sum_{i=1}^m \mu_i a_i \right)' y + \sum_{i=1}^m \mu_i b_i \right\}$$
is polyhedral.
1.6.9

Let $A$ be an $m \times n$ matrix and let $g : \Re^m \to (-\infty, \infty]$ be a polyhedral function. Show that the function $f$ given by $f(x) = g(Ax)$ is polyhedral on $\Re^n$.
1.6.10

Let $P$ be a polyhedral set represented as
$$P = \left\{ x \;\Big|\; x = \sum_{j=1}^m \mu_j v_j + y,\ \sum_{j=1}^m \mu_j = 1,\ \mu_j \geq 0,\ j = 1, \ldots, m,\ y \in C \right\},$$
where $v_1, \ldots, v_m$ are some vectors and $C$ is a finitely generated cone (cf. Prop. 1.6.2). Show that each extreme point of $P$ is equal to some vector $v_i$ that cannot be represented as a convex combination of the remaining vectors $v_j$, $j \neq i$.
1.6.11

Consider a nonempty polyhedral set of the form
$$P = \{ x \mid a_j'x \leq b_j,\ j = 1, \ldots, r \},$$
where $a_j$ are some vectors and $b_j$ are some scalars. Show that $P$ has an extreme point if and only if the set of vectors $\{ a_j \mid j = 1, \ldots, r \}$ contains a subset of $n$ linearly independent vectors.
1.6.12 (Isomorphic Polyhedral Sets)

It is easily seen that if a linear transformation is applied to a polyhedron, the result is a polyhedron whose extreme points are the images of some (but not necessarily all) of the extreme points of the original polyhedron. This exercise clarifies the circumstances where the extreme points of the original and the transformed polyhedra are in one-to-one correspondence. Let $P$ and $Q$ be two polyhedra of $\Re^n$ and $\Re^m$, respectively.

(a) Show that there is an affine function $f : \Re^n \to \Re^m$ that maps each extreme point of $P$ into a distinct extreme point of $Q$ if and only if there exists an affine function $g : \Re^m \to \Re^n$ that maps each extreme point of $Q$ into a distinct extreme point of $P$. Note: If there exist such affine mappings $f$ and $g$, we say that $P$ and $Q$ are isomorphic.

(b) Show that $P$ and $Q$ are isomorphic if and only if there exist affine functions $f : \Re^n \to \Re^m$ and $g : \Re^m \to \Re^n$ such that $x = g\big(f(x)\big)$ for all $x \in P$ and $y = f\big(g(y)\big)$ for all $y \in Q$.

(c) Show that if $P$ and $Q$ are isomorphic then $P$ and $Q$ have the same dimension.

(d) Let $A$ be a $k \times n$ matrix, let $b$ be a vector in $\Re^k$, and let
$$P = \{ x \in \Re^n \mid Ax \leq b,\ x \geq 0 \},$$
$$Q = \{ (x, z) \in \Re^{n+k} \mid Ax + z = b,\ x \geq 0,\ z \geq 0 \}.$$
Show that $P$ and $Q$ are isomorphic.
1.6.13

Show by an example that the set of extreme points of a compact set need not be closed. Hint: Consider the line segment $C_1 = \{ (x_1, x_2, x_3) \mid x_1 = 0,\ x_2 = 0,\ -1 \leq x_3 \leq 1 \}$ and the circular disk $C_2 = \{ (x_1, x_2, x_3) \mid (x_1 - 1)^2 + x_2^2 \leq 1,\ x_3 = 0 \}$, and verify that $\mathrm{conv}(C_1 \cup C_2)$ is compact, while its set of extreme points is not closed.
1.6.14

Show that a compact convex set is polyhedral if and only if it has a finite number of extreme points. Give an example showing that the assertion fails if boundedness of the set is replaced by the weaker assumption that the set contains no lines.
1.6.15 (Gordan's Theorem of the Alternative)

Let $a_1, \ldots, a_r$ be vectors in $\Re^n$.

(a) Show that exactly one of the following two conditions holds:

(i) There exists a vector $x \in \Re^n$ such that
$$a_1'x < 0,\ \ldots,\ a_r'x < 0.$$

(ii) There exists a vector $\mu \in \Re^r$ such that $\mu \neq 0$, $\mu \geq 0$, and
$$\mu_1 a_1 + \cdots + \mu_r a_r = 0.$$

(b) Show that an alternative and equivalent statement of part (a) is the following: a polyhedral cone has nonempty interior if and only if its polar cone does not contain a line, i.e., a set of the form $\{ \alpha z \mid \alpha \in \Re \}$, where $z$ is a nonzero vector.
1.6.16

Let $a_1, \ldots, a_r$ be vectors in $\Re^n$ and let $b_1, \ldots, b_r$ be scalars. Show that exactly one of the following two conditions holds:

(i) There exists a vector $x \in \Re^n$ such that
$$a_1'x \leq b_1,\ \ldots,\ a_r'x \leq b_r.$$

(ii) There exists a vector $\mu \in \Re^r$ such that $\mu \geq 0$ and
$$\mu_1 a_1 + \cdots + \mu_r a_r = 0, \qquad \mu_1 b_1 + \cdots + \mu_r b_r < 0.$$
1.6.17 (Convex System Alternatives)

Let $C$ be a nonempty convex set in $\Re^n$ and let $f_i : C \to \Re$, $i = 1, \ldots, r$, be convex functions. Show that exactly one of the following two conditions holds:

(i) There exists a vector $x \in C$ such that
$$f_1(x) < 0,\ \ldots,\ f_r(x) < 0.$$

(ii) There exists a vector $\mu \in \Re^r$ such that $\mu \neq 0$, $\mu \geq 0$, and
$$\mu_1 f_1(x) + \cdots + \mu_r f_r(x) \geq 0, \qquad \forall\ x \in C.$$
1.6.18 (Convex-Affine System Alternatives)

Let $C$ be a nonempty convex set in $\Re^n$. Let $f_i : C \to \Re$, $i = 1, \ldots, \overline{r}$, be convex functions, and let $f_i : C \to \Re$, $i = \overline{r}+1, \ldots, r$, be affine functions such that the system
$$f_{\overline{r}+1}(x) \leq 0,\ \ldots,\ f_r(x) \leq 0$$
has a solution $x \in \mathrm{ri}(C)$. Show that exactly one of the following two conditions holds:

(i) There exists a vector $x \in C$ such that
$$f_1(x) < 0,\ \ldots,\ f_{\overline{r}}(x) < 0,\ f_{\overline{r}+1}(x) \leq 0,\ \ldots,\ f_r(x) \leq 0.$$

(ii) There exists a vector $\mu \in \Re^r$ such that $\mu \neq 0$, $\mu \geq 0$, and
$$\mu_1 f_1(x) + \cdots + \mu_r f_r(x) \geq 0, \qquad \forall\ x \in C.$$
1.6.19 (Facets)

Let $P$ be a polyhedral set and let $H$ be a hyperplane that passes through a relative boundary point of $P$ and contains $P$ in one of its halfspaces (cf. Prop. 1.6.3). The set $F = P \cap H$ is called a facet of $P$.

(a) Show that each facet is a polyhedral set.

(b) Show that the number of distinct facets of $P$ is finite.

(c) Show that each extreme point of $P$, viewed as a singleton set, is a facet.

(d) Show that if $\dim(P) > 0$, there exists a facet of $P$ whose dimension is $\dim(P) - 1$.
1.6.20

Let $f : \Re^n \to (-\infty, \infty]$ be a polyhedral function and let $P \subset \Re^n$ be a polyhedral set. Show that if $\inf_{x \in P} f(x)$ is finite, then the set of minimizers of $f$ over $P$ is nonempty. Hint: Use Exercise 1.6.8(b), and replace the problem of minimizing $f$ over $P$ by an equivalent linear program.
1.6.21

Show that an $m \times n$ matrix $A$ is totally unimodular if and only if every subset $J$ of $\{1, \ldots, n\}$ can be partitioned into two subsets $J_1$ and $J_2$ such that
$$\left| \sum_{j \in J_1} a_{ij} - \sum_{j \in J_2} a_{ij} \right| \leq 1, \qquad \forall\ i = 1, \ldots, m.$$
1.6.22

Let $A$ be a matrix with components that are either $-1$, 0, or 1. Suppose that $A$ has at most two nonzero components in each column. Show that $A$ is totally unimodular.
1.6.23
Let A be a matrix with components that are either 0 or 1. Suppose that in each
column, all the components that are equal to 1 appear consecutively. Show that
A is totally unimodular.
Sec. 1.7 Subgradients 139
1.6.24

Let $A$ be a square invertible matrix such that all the components of $A$ are integer. Show that $A$ is unimodular if and only if the solution of the system $Ax = b$ has integer components for every vector $b$ that has integer components. Hint: To prove that $A$ is unimodular, use the system $Ax = u_i$, where $u_i$ is the $i$th unit vector, to show that $A^{-1}$ has integer components. Then use the equality $\det(A)\det(A^{-1}) = 1$.
1.7 SUBGRADIENTS

Much of optimization theory revolves around comparing the value of the cost function at a given point with its values at neighboring points. This calls for an analysis approach that uses derivatives, as we have seen in Section 1.5. When the cost function is nondifferentiable, this approach breaks down, but fortunately, it turns out that in the case of a convex cost function there is a convenient substitute, the notion of directional differentiability and the related notion of a subgradient, which are the subject of this section.
1.7.1 Directional Derivatives
Convex sets and functions can be characterized in many ways by their behavior along lines. For example, a set is convex if and only if its intersection with any line is convex, a convex set is bounded if and only if its intersection with every line is bounded, a function is convex if and only if it is convex along any line, and a convex function is coercive if and only if it is coercive along any line. Similarly, it turns out that the differentiability properties of a convex function are determined by the corresponding properties along lines. With this in mind, we first consider convex functions of a single variable.
Let $I$ be an interval of real numbers, and let $f : I \to \Re$ be convex. If $x, y, z \in I$ and $x < y < z$, then we can show the relation
$$\frac{f(y) - f(x)}{y - x} \leq \frac{f(z) - f(x)}{z - x} \leq \frac{f(z) - f(y)}{z - y}, \tag{1.51}$$
which is illustrated in Fig. 1.7.1. For a formal proof, note that, using the definition of a convex function [cf. Eq. (1.3)], we obtain
$$f(y) \leq \left( \frac{y - x}{z - x} \right) f(z) + \left( \frac{z - y}{z - x} \right) f(x),$$
Figure 1.7.1. Illustration of the inequalities (1.51), showing the three slopes $\big(f(y) - f(x)\big)/(y - x)$, $\big(f(z) - f(x)\big)/(z - x)$, and $\big(f(z) - f(y)\big)/(z - y)$. The rate of change of the function $f$ is nondecreasing with its argument.
and either of the desired inequalities follows by appropriately rearranging
terms.
Let $a$ and $b$ be the infimum and the supremum, respectively, of $I$, also referred to as the left and right end points of $I$, respectively. For any $x \in I$ that is not equal to the right end point, and for any $\alpha > 0$ such that $x + \alpha \in I$, we define
$$s^+(x, \alpha) = \frac{f(x + \alpha) - f(x)}{\alpha}.$$
Let $0 < \alpha \leq \alpha'$. We use the first inequality in Eq. (1.51) with $y = x + \alpha$ and $z = x + \alpha'$ to obtain $s^+(x, \alpha) \leq s^+(x, \alpha')$. Therefore, $s^+(x, \alpha)$ is a nondecreasing function of $\alpha$ and, as $\alpha$ decreases to zero, it converges either to a finite number or to $-\infty$. Let $f^+(x)$ be the value of the limit, which we call the right derivative of $f$ at the point $x$. Similarly, for any $x \in I$ that is not equal to the left end point, and for any $\alpha > 0$ such that $x - \alpha \in I$, we define
$$s^-(x, \alpha) = \frac{f(x) - f(x - \alpha)}{\alpha}.$$
By a symmetrical argument, $s^-(x, \alpha)$ is a nonincreasing function of $\alpha$. Its limit as $\alpha$ decreases to zero, denoted by $f^-(x)$, is called the left derivative of $f$ at the point $x$, and is either finite or equal to $\infty$.

In the case where the end points $a$ and $b$ belong to the domain $I$ of $f$, we define for completeness $f^-(a) = -\infty$ and $f^+(b) = \infty$. The basic facts about the differentiability properties of one-dimensional convex functions can be easily visualized, and are given in the following proposition.
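The behavior of the difference quotients $s^+$ and $s^-$ is easy to observe numerically. A brief sketch (the functions and stepsizes are illustrative only): for $f(x) = |x|$ the quotients at the kink give $f^-(0) = -1$ and $f^+(0) = 1$, and for $f(x) = x^2$ the quotient $s^+(1, \alpha) = 2 + \alpha$ decreases to $f^+(1) = 2$ as $\alpha$ decreases to zero.

```python
# Finite-difference sketch of the one-sided derivatives of scalar convex
# functions: s+(x, a) is nondecreasing in a, and its limit as a -> 0 is f+(x).

def s_plus(f, x, a):
    """Right difference quotient s+(x, a) = (f(x + a) - f(x)) / a."""
    return (f(x + a) - f(x)) / a

def s_minus(f, x, a):
    """Left difference quotient s-(x, a) = (f(x) - f(x - a)) / a."""
    return (f(x) - f(x - a)) / a

alphas = [1.0, 0.5, 0.25, 0.125]

# Kink of |x| at 0: the left and right derivatives differ.
right = [s_plus(abs, 0.0, a) for a in alphas]
left = [s_minus(abs, 0.0, a) for a in alphas]
print(right, left)   # [1.0, 1.0, 1.0, 1.0] [-1.0, -1.0, -1.0, -1.0]

def g(x):
    return x * x

vals = [s_plus(g, 1.0, a) for a in alphas]
print(vals)          # [3.0, 2.5, 2.25, 2.125], decreasing toward f+(1) = 2
```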
Proposition 1.7.1: Let $I \subset \Re$ be an interval with end points $a$ and $b$, and let $f : I \to \Re$ be a convex function.

(a) We have $f^-(x) \leq f^+(x)$ for every $x \in I$.

(b) If $x$ belongs to the interior of $I$, then $f^+(x)$ and $f^-(x)$ are finite.

(c) If $x, z \in I$ and $x < z$, then $f^+(x) \leq f^-(z)$.

(d) The functions $f^-, f^+ : I \to [-\infty, +\infty]$ are nondecreasing.

(e) The function $f^+$ (respectively, $f^-$) is right- (respectively, left-) continuous at every interior point of $I$. Also, if $a \in I$ (respectively, $b \in I$) and $f$ is continuous at $a$ (respectively, $b$), then $f^+$ (respectively, $f^-$) is right- (respectively, left-) continuous at $a$ (respectively, $b$).

(f) If $f$ is differentiable at a point $x$ belonging to the interior of $I$, then $f^+(x) = f^-(x) = (df/dx)(x)$.

(g) For any $x, z \in I$ and any $d$ satisfying $f^-(x) \leq d \leq f^+(x)$, we have
$$f(z) \geq f(x) + d(z - x).$$

(h) The function $f^+ : I \to (-\infty, \infty]$ [respectively, $f^- : I \to [-\infty, \infty)$] is upper (respectively, lower) semicontinuous at every $x \in I$.
Proof: (a) If $x$ is an end point of $I$, the result is trivial because $f^-(a) = -\infty$ and $f^+(b) = \infty$. We assume that $x$ is an interior point, we let $\alpha > 0$, and use Eq. (1.51), with $x - \alpha$, $x$, and $x + \alpha$ in place of $x$, $y$, and $z$, respectively, to obtain $s^-(x, \alpha) \leq s^+(x, \alpha)$. Taking the limit as $\alpha$ decreases to zero, we obtain $f^-(x) \leq f^+(x)$.

(b) Let $x$ belong to the interior of $I$ and let $\alpha > 0$ be such that $x - \alpha \in I$. Then $f^-(x) \geq s^-(x, \alpha) > -\infty$. For similar reasons, we obtain $f^+(x) < \infty$. Part (a) then implies that $f^-(x) < \infty$ and $f^+(x) > -\infty$.

(c) We use Eq. (1.51), with $y = (z + x)/2$, to obtain $s^+\big(x, (z - x)/2\big) \leq s^-\big(z, (z - x)/2\big)$. The result then follows because $f^+(x) \leq s^+\big(x, (z - x)/2\big)$ and $s^-\big(z, (z - x)/2\big) \leq f^-(z)$.

(d) This follows by combining parts (a) and (c).

(e) Fix some $x \in I$, $x \neq b$, and some positive $\delta$ and $\alpha$ such that $x + \delta + \alpha < b$. We allow $x$ to be equal to $a$, in which case $f$ is assumed to be continuous at $a$. We have $f^+(x + \delta) \leq s^+(x + \delta, \alpha)$. We take the limit, as $\delta$ decreases to zero, to obtain $\lim_{\delta \downarrow 0} f^+(x + \delta) \leq s^+(x, \alpha)$. We have used here the fact that $s^+(x, \alpha)$ is a continuous function of $x$, which is a consequence of the continuity of $f$ (Prop. 1.2.12). We now let $\alpha$ decrease to zero to obtain $\lim_{\delta \downarrow 0} f^+(x + \delta) \leq f^+(x)$. The reverse inequality is also true because $f^+$ is nondecreasing, and this proves the right-continuity of $f^+$. The proof for $f^-$ is similar.

(f) This is immediate from the definition of $f^+$ and $f^-$.

(g) Fix some $x, z \in I$. The result is trivially true for $x = z$. We only consider the case $x < z$; the proof for the case $x > z$ is similar. Since $s^+(x, \alpha)$ is nondecreasing in $\alpha$, we have $\big(f(z) - f(x)\big)/(z - x) \geq s^+(x, \alpha)$ for $\alpha$ belonging to $(0, z - x)$. Letting $\alpha$ decrease to zero, we obtain $\big(f(z) - f(x)\big)/(z - x) \geq f^+(x) \geq d$, and the result follows.

(h) This follows from parts (a), (d), (e), and the defining property of semicontinuity (Definition 1.1.5). Q.E.D.
We will now discuss notions of directional differentiability of multidimensional real-valued functions (the extended real-valued case will be discussed later). Given a function $f : \Re^n \to \Re$, a point $x \in \Re^n$, and a vector $y \in \Re^n$, we say that $f$ is directionally differentiable at $x$ in the direction $y$ if there is a scalar $f'(x; y)$ such that
$$f'(x; y) = \lim_{\alpha \downarrow 0} \frac{f(x + \alpha y) - f(x)}{\alpha}.$$
We call $f'(x; y)$ the directional derivative of $f$ at $x$ in the direction $y$. We say that $f$ is directionally differentiable at a point $x$ if it is directionally differentiable at $x$ in all directions. As in Section 1.1.4, we say that $f$ is differentiable at $x$ if it is directionally differentiable at $x$ and $f'(x; y)$ is a linear function of $y$ denoted by
$$f'(x; y) = \nabla f(x)' y,$$
where $\nabla f(x)$ is the gradient of $f$ at $x$.

An interesting property of real-valued convex functions is that they are directionally differentiable at all points $x$. This is a consequence of the directional differentiability of scalar convex functions, as can be seen from the relation
$$f'(x; y) = \lim_{\alpha \downarrow 0} \frac{f(x + \alpha y) - f(x)}{\alpha} = \lim_{\alpha \downarrow 0} \frac{F_y(\alpha) - F_y(0)}{\alpha} = F_y^+(0), \tag{1.52}$$
where $F_y^+(0)$ is the right derivative of the convex scalar function
$$F_y(\alpha) = f(x + \alpha y)$$
at $\alpha = 0$. Note that the above calculation also shows that the left derivative $F_y^-(0)$ of $F_y$ is equal to $-f'(x; -y)$ and, by using Prop. 1.7.1(a), we obtain $F_y^-(0) \leq F_y^+(0)$, or equivalently,
$$-f'(x; -y) \leq f'(x; y), \qquad \forall\ y \in \Re^n. \tag{1.53}$$
Note also that for a convex function, the difference quotient (f(x + αy) − f(x))/α is a monotonically nondecreasing function of α, so an equivalent definition of the directional derivative is

    f'(x; y) = inf_{α>0} (f(x + αy) − f(x))/α.
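As a concrete numerical illustration (not part of the text), the monotonicity of the difference quotient and its convergence to the directional derivative can be observed directly; the point and direction below are arbitrary choices for the ℓ1 norm:

```python
# Illustrative check: for convex f, the quotient (f(x + a*y) - f(x))/a is
# nondecreasing in a, so it decreases to f'(x; y) as a decreases to 0.
import numpy as np

def f(x):                       # convex example: the l1 norm
    return np.abs(x).sum()

x = np.array([1.0, 0.0, -2.0])  # arbitrary point (one zero component)
y = np.array([0.5, 1.0, 3.0])   # arbitrary direction

alphas = [2.0 ** (-k) for k in range(12)]          # decreasing step sizes
quotients = [(f(x + a * y) - f(x)) / a for a in alphas]

# quotients are nonincreasing as alpha shrinks ...
assert all(q1 >= q2 - 1e-12 for q1, q2 in zip(quotients, quotients[1:]))

# ... with limit f'(x; y) = sum_i sign(x_i) y_i over x_i != 0, plus |y_i|
# over x_i == 0 (the known directional derivative of the l1 norm)
expected = (np.sign(x) * y)[x != 0].sum() + np.abs(y)[x == 0].sum()
assert abs(quotients[-1] - expected) < 1e-9
```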
The following proposition generalizes the upper semicontinuity property of right derivatives of scalar convex functions [Prop. 1.7.1(h)], and shows that if f is differentiable, then its gradient is continuous.
Proposition 1.7.2: Let f : ℜ^n → ℜ be convex, and let {f_k} be a sequence of convex functions f_k : ℜ^n → ℜ with the property that lim_{k→∞} f_k(x_k) = f(x) for every x ∈ ℜ^n and every sequence {x_k} that converges to x. Then for any x ∈ ℜ^n and y ∈ ℜ^n, and any sequences {x_k} and {y_k} converging to x and y, respectively, we have

    limsup_{k→∞} f_k'(x_k; y_k) ≤ f'(x; y).    (1.54)

Furthermore, if f is differentiable over ℜ^n, then it is continuously differentiable over ℜ^n.
Proof: For any µ > f'(x; y), there exists an ᾱ > 0 such that

    (f(x + αy) − f(x))/α < µ,  ∀ α ≤ ᾱ.

Hence, for α ≤ ᾱ, we have

    (f_k(x_k + αy_k) − f_k(x_k))/α ≤ µ

for all sufficiently large k, and using Eq. (1.52), we obtain

    limsup_{k→∞} f_k'(x_k; y_k) < µ.

Since this is true for all µ > f'(x; y), inequality (1.54) follows.
If f is differentiable at all x ∈ ℜ^n, then using the continuity of f and the part of the proposition just proved, we have for every sequence {x_k} converging to x and every y ∈ ℜ^n,

    limsup_{k→∞} ∇f(x_k)'y = limsup_{k→∞} f'(x_k; y) ≤ f'(x; y) = ∇f(x)'y.

By replacing y by −y in the preceding argument, we obtain

    −liminf_{k→∞} ∇f(x_k)'y = limsup_{k→∞} (−∇f(x_k)'y) ≤ −∇f(x)'y.

Therefore, we have ∇f(x_k)'y → ∇f(x)'y for every y, which implies that ∇f(x_k) → ∇f(x). Hence, the gradient is continuous. Q.E.D.
144 Convex Analysis and Optimization Chap. 1
1.7.2 Subgradients and Subdifferentials

Given a convex function f : ℜ^n → ℜ, we say that a vector d ∈ ℜ^n is a subgradient of f at a point x ∈ ℜ^n if

    f(z) ≥ f(x) + (z − x)'d,  ∀ z ∈ ℜ^n.    (1.55)

If instead f is a concave function, we say that d is a subgradient of f at x if −d is a subgradient of the convex function −f at x. The set of all subgradients of a convex (or concave) function f at x ∈ ℜ^n is called the subdifferential of f at x, and is denoted by ∂f(x).
A subgradient admits an intuitive geometrical interpretation: it can be identified with a nonvertical supporting hyperplane to the graph of the function, as illustrated in Fig. 1.7.2. Such a hyperplane provides a linear approximation to the function, which is an underestimate in the case of a convex function and an overestimate in the case of a concave function. Figure 1.7.3 provides some examples of subdifferentials.
Figure 1.7.2. Illustration of a subgradient of a convex function f. The defining relation (1.55) can be written as

    f(z) − z'd ≥ f(x) − x'd,  ∀ z ∈ ℜ^n.

Equivalently, the hyperplane in ℜ^{n+1} that has normal (−d, 1) and passes through (x, f(x)) supports the epigraph of f, as shown in the figure.
The directional derivative and the subdifferential of a convex function are closely linked, as shown in the following proposition.
Sec. 1.7 Subgradients 145
Figure 1.7.3. The subdifferential of some scalar convex functions as a function of the argument x: f(x) = |x| (left) and f(x) = max{0, (1/2)(x² − 1)} (right).
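The subgradient inequality (1.55) can also be checked by brute force. The sketch below (an illustration, not part of the text) verifies on a grid of points z that the figure's second function has ∂f(1) = [0, 1]:

```python
# Brute-force illustration: d is a subgradient of f at x iff
# f(z) >= f(x) + (z - x)*d for all z.  For f(x) = max{0, (1/2)(x^2 - 1)},
# the subdifferential at x = 1 is the interval [0, 1].
import numpy as np

def f(x):
    return max(0.0, 0.5 * (x * x - 1.0))

def is_subgradient(d, x, zs):
    return all(f(z) >= f(x) + (z - x) * d - 1e-12 for z in zs)

zs = np.linspace(-3.0, 3.0, 601)          # grid standing in for "all z"
assert all(is_subgradient(d, 1.0, zs) for d in [0.0, 0.5, 1.0])   # inside [0, 1]
assert not any(is_subgradient(d, 1.0, zs) for d in [-0.1, 1.1])   # outside [0, 1]
```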
Proposition 1.7.3: Let f : ℜ^n → ℜ be convex. For every x ∈ ℜ^n, the following hold:
(a) A vector d is a subgradient of f at x if and only if

    f'(x; y) ≥ y'd,  ∀ y ∈ ℜ^n.

(b) The subdifferential ∂f(x) is a nonempty, convex, and compact set, and there holds

    f'(x; y) = max_{d∈∂f(x)} y'd,  ∀ y ∈ ℜ^n.    (1.56)

In particular, f is differentiable at x with gradient ∇f(x) if and only if it has ∇f(x) as its unique subgradient at x.
(c) If X is a bounded set, the set ∪_{x∈X} ∂f(x) is bounded.
Proof: (a) The subgradient inequality (1.55) is equivalent to

    (f(x + αy) − f(x))/α ≥ y'd,  ∀ y ∈ ℜ^n, α > 0.

Since the quotient on the left above decreases monotonically to f'(x; y) as α ↓ 0 [Eq. (1.51)], we conclude that the subgradient inequality (1.55) is equivalent to f'(x; y) ≥ y'd for all y ∈ ℜ^n. Therefore we obtain

    d ∈ ∂f(x) ⟺ f'(x; y) ≥ y'd,  ∀ y ∈ ℜ^n.    (1.57)

(b) From Eq. (1.57), we see that ∂f(x) is the intersection of the closed halfspaces {d | y'd ≤ f'(x; y)}, where y ranges over the nonzero vectors of ℜ^n. It follows that ∂f(x) is closed and convex.
To show that ∂f(x) is also bounded, suppose to arrive at a contradiction that there is a sequence {d_k} ⊂ ∂f(x) with ‖d_k‖ → ∞. Let y_k = d_k/‖d_k‖. Then, from the subgradient inequality (1.55), we have

    f(x + y_k) ≥ f(x) + y_k'd_k = f(x) + ‖d_k‖,  ∀ k,

so it follows that f(x + y_k) → ∞. This is a contradiction since f is convex and hence continuous, so it is bounded on any bounded set. Thus ∂f(x) is bounded.
To show that ∂f(x) is nonempty and that Eq. (1.56) holds, we first observe that Eq. (1.57) implies that f'(x; y) ≥ max_{d∈∂f(x)} y'd [where the maximum is −∞ if ∂f(x) is empty]. To show the reverse inequality, take any x and y in ℜ^n, and consider the subset of ℜ^{n+1}

    C_1 = {(µ, z) | µ > f(z)},

and the halfline

    C_2 = {(µ, z) | µ = f(x) + αf'(x; y), z = x + αy, α ≥ 0};

see Fig. 1.7.4. Using the definition of directional derivative and the convexity of f, it follows that these two sets are nonempty, convex, and disjoint. By applying the Separating Hyperplane Theorem (Prop. 1.4.2), we see that there exists a nonzero vector (γ, w) ∈ ℜ^{n+1} such that

    γµ + w'z ≥ γ(f(x) + αf'(x; y)) + w'(x + αy),  ∀ α ≥ 0, z ∈ ℜ^n, µ > f(z).    (1.58)

We cannot have γ < 0, since then the left-hand side above could be made arbitrarily small by choosing µ sufficiently large. Also if γ = 0, then Eq. (1.58) implies that w = 0, which is a contradiction. Therefore, γ > 0 and by dividing with γ in Eq. (1.58), we obtain

    µ + (z − x)'(w/γ) ≥ f(x) + αf'(x; y) + αy'(w/γ),  ∀ α ≥ 0, z ∈ ℜ^n, µ > f(z).    (1.59)

By setting α = 0 in the above relation and by taking the limit as µ ↓ f(z), we obtain f(z) ≥ f(x) + (z − x)'(−w/γ) for all z ∈ ℜ^n, implying that (−w/γ) ∈ ∂f(x). Hence ∂f(x) is nonempty. By setting z = x and α = 1 in Eq. (1.59), and by taking the limit as µ ↓ f(x), we obtain y'(−w/γ) ≥
Figure 1.7.4. Illustration of the sets C_1 and C_2 used in the hyperplane separation argument of the proof of Prop. 1.7.3(b).
f'(x; y), which implies that max_{d∈∂f(x)} y'd ≥ f'(x; y). The proof of Eq. (1.56) is complete.
From the definition of directional derivative, we see that f is differentiable at x with gradient ∇f(x) if and only if the directional derivative f'(x; y) is a linear function of the form f'(x; y) = ∇f(x)'y. Thus, from Eq. (1.56), f is differentiable at x with gradient ∇f(x) if and only if it has ∇f(x) as its unique subgradient at x.
(c) Assume the contrary, i.e., that there exists a sequence {x_k} ⊂ X and a sequence {d_k} with d_k ∈ ∂f(x_k) for all k and ‖d_k‖ → ∞. Without loss of generality, we assume that d_k ≠ 0 for all k, and we denote y_k = d_k/‖d_k‖. Since both {x_k} and {y_k} are bounded, they must contain convergent subsequences. We assume without loss of generality that x_k converges to some x and y_k converges to some y with ‖y‖ = 1. By Eq. (1.56), we have

    f'(x_k; y_k) ≥ d_k'y_k = ‖d_k‖,

so it follows that f'(x_k; y_k) → ∞. This contradicts, however, Eq. (1.54), which requires that limsup_{k→∞} f'(x_k; y_k) ≤ f'(x; y). Q.E.D.
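Proposition 1.7.3(b) lends itself to a simple numerical sketch. For f(x) = ‖x‖₁ at the (arbitrarily chosen) point x = (1, 0, −2), the subdifferential is {(1, t, −1) : t ∈ [−1, 1]}, and the code below compares max_{d∈∂f(x)} y'd with a small-step difference quotient:

```python
# Numeric sketch of Prop. 1.7.3(b) for f(x) = ||x||_1 at x = (1, 0, -2),
# where the subdifferential is {(1, t, -1) : t in [-1, 1]}.
import numpy as np

def f(x):
    return np.abs(x).sum()

x = np.array([1.0, 0.0, -2.0])

def dir_deriv(y, a=1e-8):
    # small-step difference quotient approximating f'(x; y)
    return (f(x + a * y) - f(x)) / a

rng = np.random.default_rng(0)
for _ in range(20):
    y = rng.normal(size=3)
    # max of y'd over d = (1, t, -1), t in [-1, 1], attained at t = sign(y[1])
    best = y[0] - y[2] + abs(y[1])
    assert abs(dir_deriv(y) - best) < 1e-6
```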
The next proposition shows some basic properties of subgradients.
Proposition 1.7.4: Let f : ℜ^n → ℜ be convex.
(a) If a sequence {x_k} converges to x and d_k ∈ ∂f(x_k) for all k, the sequence {d_k} is bounded and each of its limit points is a subgradient of f at x.
(b) If f is equal to the sum f_1 + ⋯ + f_m of convex functions f_j : ℜ^n → ℜ, j = 1, . . . , m, then ∂f(x) is equal to the vector sum ∂f_1(x) + ⋯ + ∂f_m(x).
Proof: (a) By Prop. 1.7.3(c), the sequence {d_k} is bounded, and by Prop. 1.7.3(a), we have

    y'd_k ≤ f'(x_k; y),  ∀ y ∈ ℜ^n.

If d is a limit point of {d_k}, we have, by taking the limit in the above relation and by using Prop. 1.7.2,

    y'd ≤ limsup_{k→∞} f'(x_k; y) ≤ f'(x; y),  ∀ y ∈ ℜ^n.

Therefore, by Prop. 1.7.3(a), we have d ∈ ∂f(x).
(b) It will suffice to prove the result for the case where f = f_1 + f_2. If d_1 ∈ ∂f_1(x) and d_2 ∈ ∂f_2(x), then from the subgradient inequality (1.55), we have

    f_1(z) ≥ f_1(x) + (z − x)'d_1,  ∀ z ∈ ℜ^n,
    f_2(z) ≥ f_2(x) + (z − x)'d_2,  ∀ z ∈ ℜ^n,

so by adding, we obtain

    f(z) ≥ f(x) + (z − x)'(d_1 + d_2),  ∀ z ∈ ℜ^n.

Hence d_1 + d_2 ∈ ∂f(x), implying that ∂f_1(x) + ∂f_2(x) ⊂ ∂f(x).
To prove the reverse inclusion, suppose, to arrive at a contradiction, that there exists a d ∈ ∂f(x) such that d ∉ ∂f_1(x) + ∂f_2(x). Since by Prop. 1.7.3(b), the sets ∂f_1(x) and ∂f_2(x) are compact, the set ∂f_1(x) + ∂f_2(x) is compact (cf. Prop. 1.2.16), and by Prop. 1.4.3, there exists a hyperplane strictly separating d from ∂f_1(x) + ∂f_2(x), i.e., a vector y and a scalar b such that

    y'(d_1 + d_2) < b < y'd,  ∀ d_1 ∈ ∂f_1(x), ∀ d_2 ∈ ∂f_2(x).

Therefore,

    max_{d_1∈∂f_1(x)} y'd_1 + max_{d_2∈∂f_2(x)} y'd_2 < y'd,

and by Prop. 1.7.3(b),

    f_1'(x; y) + f_2'(x; y) < y'd.

By using the definition of directional derivative, we have f_1'(x; y) + f_2'(x; y) = f'(x; y), so that

    f'(x; y) < y'd,

which is a contradiction in view of Prop. 1.7.3(a). Q.E.D.
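In one dimension the sum rule of Prop. 1.7.4(b) is easy to see numerically. For the illustrative choice f₁(x) = |x| and f₂(x) = |x − 1| we have ∂f₁(0) = [−1, 1] and ∂f₂(0) = {−1}, so the sum rule predicts ∂(f₁ + f₂)(0) = [−2, 0]; the sketch checks this by brute force:

```python
# Sketch of Prop. 1.7.4(b) in one dimension: for f = f1 + f2 with
# f1(x) = |x| and f2(x) = |x - 1|, the sum rule gives ∂f(0) = [-2, 0].
import numpy as np

def f(x):
    return abs(x) + abs(x - 1.0)

def is_subgradient(d, x, zs):
    return all(f(z) >= f(x) + (z - x) * d - 1e-12 for z in zs)

zs = np.linspace(-3.0, 3.0, 601)     # grid standing in for "all z"
assert all(is_subgradient(d, 0.0, zs) for d in [-2.0, -1.0, 0.0])  # inside [-2, 0]
assert not any(is_subgradient(d, 0.0, zs) for d in [-2.1, 0.1])    # outside [-2, 0]
```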
We close this section with some versions of the chain rule for directional derivatives and subgradients.
Proposition 1.7.5: (Chain Rule)
(a) Let f : ℜ^m → ℜ be a convex function and let A be an m × n matrix. Then the subdifferential of the function F defined by

    F(x) = f(Ax)

is given by

    ∂F(x) = A'∂f(Ax) = {A'g | g ∈ ∂f(Ax)}.

(b) Let f : ℜ^n → ℜ be a convex function and let g : ℜ → ℜ be a smooth scalar function. Then the function F defined by

    F(x) = g(f(x))

is directionally differentiable at all x, and its directional derivative is given by

    F'(x; y) = ∇g(f(x)) f'(x; y),  ∀ x ∈ ℜ^n, y ∈ ℜ^n.    (1.60)

Furthermore, if g is convex and monotonically nondecreasing, then F is convex and its subdifferential is given by

    ∂F(x) = ∇g(f(x)) ∂f(x),  ∀ x ∈ ℜ^n.    (1.61)
Proof: (a) It is seen using the definition of directional derivative that

    F'(x; y) = f'(Ax; Ay),  ∀ y ∈ ℜ^n.

Let g ∈ ∂f(Ax) and d = A'g. Then by Prop. 1.7.3(a), we have

    g'z ≤ f'(Ax; z),  ∀ z ∈ ℜ^m,

and in particular,

    g'Ay ≤ f'(Ax; Ay),  ∀ y ∈ ℜ^n,

or

    (A'g)'y ≤ F'(x; y),  ∀ y ∈ ℜ^n.

Hence, by Prop. 1.7.3(a), we have A'g ∈ ∂F(x), so that A'∂f(Ax) ⊂ ∂F(x).
To prove the reverse inclusion, suppose, to come to a contradiction, that there exists a d ∈ ∂F(x) such that d ∉ A'∂f(Ax). Since by Prop. 1.7.3(b), the set ∂f(Ax) is compact, the set A'∂f(Ax) is also compact [cf. Prop. 1.1.9(d)], and by Prop. 1.4.3, there exists a hyperplane strictly separating d from A'∂f(Ax), i.e., a vector y and a scalar b such that

    y'(A'g) < b < y'd,  ∀ g ∈ ∂f(Ax).

From this we obtain

    max_{g∈∂f(Ax)} (Ay)'g < y'd,

or, by using Prop. 1.7.3(b),

    f'(Ax; Ay) < y'd.

Since f'(Ax; Ay) = F'(x; y), it follows that

    F'(x; y) < y'd,

which is a contradiction in view of Prop. 1.7.3(a).
(b) We have

    F'(x; y) = lim_{α↓0} (F(x + αy) − F(x))/α = lim_{α↓0} (g(f(x + αy)) − g(f(x)))/α.    (1.62)

From the convexity of f it follows that there are three possibilities: (1) for some ᾱ > 0, f(x + αy) = f(x) for all α ∈ (0, ᾱ]; (2) for some ᾱ > 0, f(x + αy) > f(x) for all α ∈ (0, ᾱ]; (3) for some ᾱ > 0, f(x + αy) < f(x) for all α ∈ (0, ᾱ].
In case (1), from Eq. (1.62), we have F'(x; y) = f'(x; y) = 0 and the given formula (1.60) holds. In case (2), from Eq. (1.62), we have

    F'(x; y) = lim_{α↓0} [(f(x + αy) − f(x))/α] · [(g(f(x + αy)) − g(f(x)))/(f(x + αy) − f(x))].

As α ↓ 0, we have f(x + αy) → f(x), so the preceding equation yields F'(x; y) = ∇g(f(x)) f'(x; y). The proof for case (3) is similar.
If g is convex and monotonically nondecreasing, then F is convex, since for any x_1, x_2 ∈ ℜ^n and α ∈ [0, 1], we have

    αF(x_1) + (1 − α)F(x_2) = αg(f(x_1)) + (1 − α)g(f(x_2))
        ≥ g(αf(x_1) + (1 − α)f(x_2))
        ≥ g(f(αx_1 + (1 − α)x_2))
        = F(αx_1 + (1 − α)x_2),

where for the first inequality we use the convexity of g, and for the second inequality we use the convexity of f and the monotonicity of g. To obtain the formula for the subdifferential of F, we note that by Prop. 1.7.3(a), d ∈ ∂F(x) if and only if y'd ≤ F'(x; y) for all y ∈ ℜ^n, or equivalently (from what has been already shown)

    y'd ≤ ∇g(f(x)) f'(x; y),  ∀ y ∈ ℜ^n.

If ∇g(f(x)) = 0, this relation yields d = 0, so ∂F(x) = {0} and the desired formula (1.61) holds. If ∇g(f(x)) ≠ 0, we have ∇g(f(x)) > 0 by the monotonicity of g, so we obtain

    y'd/∇g(f(x)) ≤ f'(x; y),  ∀ y ∈ ℜ^n,

which, by Prop. 1.7.3(a), is equivalent to d/∇g(f(x)) ∈ ∂f(x). Thus we have shown that d ∈ ∂F(x) if and only if d/∇g(f(x)) ∈ ∂f(x), which proves the desired formula (1.61). Q.E.D.
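A quick numeric sketch of part (a) of the chain rule, with an arbitrarily chosen matrix A and f(u) = ‖u‖₁: at a point where Ax has no zero component, ∂f(Ax) is the singleton {sign(Ax)}, so F should be differentiable with gradient A'sign(Ax):

```python
# Illustration of Prop. 1.7.5(a): for F(x) = f(Ax) with f(u) = ||u||_1,
# we have ∂F(x) = A'∂f(Ax).  Where Ax has no zero component, this is the
# single gradient A' sign(Ax).
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, -1.0]])       # arbitrary illustrative matrix
x = np.array([1.0, 0.5])          # Ax = (2.5, -0.5): no zero component

grad = A.T @ np.sign(A @ x)       # the unique subgradient A' sign(Ax)

def F(v):
    return np.abs(A @ v).sum()

# compare with central finite differences of F
for i in range(2):
    e = np.zeros(2); e[i] = 1e-6
    fd = (F(x + e) - F(x - e)) / 2e-6
    assert abs(fd - grad[i]) < 1e-6
```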
1.7.3 ε-Subgradients

We now consider a notion of approximate subgradient. Given a convex function f : ℜ^n → ℜ and a scalar ε > 0, we say that a vector d ∈ ℜ^n is an ε-subgradient of f at a point x ∈ ℜ^n if

    f(z) ≥ f(x) + (z − x)'d − ε,  ∀ z ∈ ℜ^n.    (1.63)

If instead f is a concave function, we say that d is an ε-subgradient of f at x if −d is an ε-subgradient of the convex function −f at x. The set of all ε-subgradients of a convex (or concave) function f at x ∈ ℜ^n is called the ε-subdifferential of f at x, and is denoted by ∂_ε f(x). The ε-subdifferential is illustrated geometrically in Fig. 1.7.5.
ε-subgradients find several applications in nondifferentiable optimization, particularly in the context of computational methods. Some of their important properties are given in the following proposition.
Figure 1.7.5. Illustration of the ε-subdifferential ∂_ε f(x) of a convex one-dimensional function f : ℜ → ℜ. The ε-subdifferential is a bounded interval, and corresponds to the set of slopes indicated in the figure. Its left endpoint is

    f_ε^−(x) = sup_{δ<0} (f(x + δ) − f(x) + ε)/δ,

and its right endpoint is

    f_ε^+(x) = inf_{δ>0} (f(x + δ) − f(x) + ε)/δ.
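The sup/inf formulas of Fig. 1.7.5 can be exercised numerically. For the illustrative choice f(x) = x², the ε-subgradient inequality works out in closed form to (d − 2x)² ≤ 4ε, i.e., ∂_ε f(x) = [2x − 2√ε, 2x + 2√ε], which the sketch recovers on a grid of δ values:

```python
# Numeric sketch: endpoints of the eps-subdifferential of f(x) = x^2 via
# the sup/inf slope formulas, compared to the closed form
# [2x - 2*sqrt(eps), 2x + 2*sqrt(eps)].
import math
import numpy as np

def f(x):
    return x * x

def eps_endpoints(x, eps, deltas):
    left = max((f(x + d) - f(x) + eps) / d for d in -deltas)   # sup over d < 0
    right = min((f(x + d) - f(x) + eps) / d for d in deltas)   # inf over d > 0
    return left, right

x, eps = 1.0, 0.25
deltas = np.linspace(1e-4, 5.0, 50001)
lo, hi = eps_endpoints(x, eps, deltas)
assert abs(lo - (2 * x - 2 * math.sqrt(eps))) < 1e-3   # left endpoint  = 1.0
assert abs(hi - (2 * x + 2 * math.sqrt(eps))) < 1e-3   # right endpoint = 3.0
```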
Proposition 1.7.6: Let f : ℜ^n → ℜ be convex and let ε be a positive scalar. For every x ∈ ℜ^n, the following hold:
(a) The ε-subdifferential ∂_ε f(x) is a nonempty, convex, and compact set, and for all y ∈ ℜ^n there holds

    inf_{α>0} (f(x + αy) − f(x) + ε)/α = max_{d∈∂_ε f(x)} y'd.

(b) We have 0 ∈ ∂_ε f(x) if and only if

    f(x) ≤ inf_{z∈ℜ^n} f(z) + ε.

(c) If a direction y is such that y'd < 0 for all d ∈ ∂_ε f(x), then

    inf_{α>0} f(x + αy) < f(x) − ε.

(d) If 0 ∉ ∂_ε f(x), then the direction y = −d̄, where

    d̄ = arg min_{d∈∂_ε f(x)} ‖d‖,

satisfies y'd < 0 for all d ∈ ∂_ε f(x).
(e) If f is equal to the sum f_1 + ⋯ + f_m of convex functions f_j : ℜ^n → ℜ, j = 1, . . . , m, then

    ∂_ε f(x) ⊂ ∂_ε f_1(x) + ⋯ + ∂_ε f_m(x) ⊂ ∂_{mε} f(x).
Proof: (a) We have

    d ∈ ∂_ε f(x) ⟺ f(x + αy) ≥ f(x) + αy'd − ε,  ∀ α > 0, ∀ y ∈ ℜ^n.    (1.64)

Hence

    d ∈ ∂_ε f(x) ⟺ inf_{α>0} (f(x + αy) − f(x) + ε)/α ≥ y'd,  ∀ y ∈ ℜ^n.    (1.65)

It follows that ∂_ε f(x) is the intersection of the closed halfspaces

    {d | inf_{α>0} (f(x + αy) − f(x) + ε)/α ≥ y'd}
as y ranges over ℜ^n. Hence ∂_ε f(x) is closed and convex. To show that ∂_ε f(x) is also bounded, suppose to arrive at a contradiction that there is a sequence {d_k} ⊂ ∂_ε f(x) with ‖d_k‖ → ∞. Let y_k = d_k/‖d_k‖. Then, from Eq. (1.64), we have for α = 1

    f(x + y_k) ≥ f(x) + ‖d_k‖ − ε,  ∀ k,

so it follows that f(x + y_k) → ∞. This is a contradiction since f is convex and hence continuous, so it is bounded on any bounded set. Thus ∂_ε f(x) is bounded.
To show that ∂_ε f(x) is nonempty and satisfies

    inf_{α>0} (f(x + αy) − f(x) + ε)/α = max_{d∈∂_ε f(x)} y'd,  ∀ y ∈ ℜ^n,

we argue similar to the proof of Prop. 1.7.3(b). Consider the subset of ℜ^{n+1}

    C_1 = {(µ, z) | µ > f(z)},

and the halfline

    C_2 = {(µ, z) | µ = f(x) − ε + β inf_{α>0} (f(x + αy) − f(x) + ε)/α, z = x + βy, β ≥ 0}.

These sets are nonempty and convex. They are also disjoint, since we have for all (µ, z) ∈ C_2

    µ = f(x) − ε + β inf_{α>0} (f(x + αy) − f(x) + ε)/α
      ≤ f(x) − ε + β (f(x + βy) − f(x) + ε)/β
      = f(x + βy)
      = f(z).

Hence there exists a hyperplane separating them, i.e., a vector (γ, w) ≠ (0, 0) such that for all β ≥ 0, z ∈ ℜ^n, and µ > f(z),

    γµ + w'z ≤ γ(f(x) − ε + β inf_{α>0} (f(x + αy) − f(x) + ε)/α) + w'(x + βy).    (1.66)

We cannot have γ > 0, since then the left-hand side above could be made arbitrarily large by choosing µ to be sufficiently large. Also, if γ = 0, Eq. (1.66) implies that w = 0, contradicting the fact that (γ, w) ≠ (0, 0). Therefore, γ < 0 and after dividing Eq. (1.66) by γ, we obtain for all β ≥ 0, z ∈ ℜ^n, and µ > f(z),

    µ + (w/γ)'(z − x) ≥ f(x) − ε + β inf_{α>0} (f(x + αy) − f(x) + ε)/α + β(w/γ)'y.    (1.67)
Taking the limit above as µ ↓ f(z) and setting β = 0, we obtain

    f(z) ≥ f(x) − ε + (−w/γ)'(z − x),  ∀ z ∈ ℜ^n.

Hence −w/γ belongs to ∂_ε f(x), showing that ∂_ε f(x) is nonempty. Also, by taking z = x in Eq. (1.67), letting µ ↓ f(x), and dividing with β, we obtain

    −(w/γ)'y ≥ −ε/β + inf_{α>0} (f(x + αy) − f(x) + ε)/α.

Since β can be chosen as large as desired, we see that

    −(w/γ)'y ≥ inf_{α>0} (f(x + αy) − f(x) + ε)/α.

Combining this relation with Eq. (1.65), we obtain

    max_{d∈∂_ε f(x)} d'y = inf_{α>0} (f(x + αy) − f(x) + ε)/α.

(b) By definition, 0 ∈ ∂_ε f(x) if and only if f(z) ≥ f(x) − ε for all z ∈ ℜ^n, which is equivalent to inf_{z∈ℜ^n} f(z) ≥ f(x) − ε.
(c) Assume that a direction y is such that

    max_{d∈∂_ε f(x)} y'd < 0,    (1.68)

while inf_{α>0} f(x + αy) ≥ f(x) − ε. Then f(x + αy) − f(x) ≥ −ε for all α > 0, or equivalently

    (f(x + αy) − f(x) + ε)/α ≥ 0,  ∀ α > 0.

Consequently, using part (a), we have

    max_{d∈∂_ε f(x)} y'd = inf_{α>0} (f(x + αy) − f(x) + ε)/α ≥ 0,

which contradicts Eq. (1.68).
(d) The vector d̄ is the projection of the zero vector on the convex and compact set ∂_ε f(x). If 0 ∉ ∂_ε f(x), we have ‖d̄‖ > 0, while by Prop. 1.3.3,

    (d − d̄)'d̄ ≥ 0,  ∀ d ∈ ∂_ε f(x).

Hence

    d'd̄ ≥ ‖d̄‖² > 0,  ∀ d ∈ ∂_ε f(x).
(e) It will suffice to prove the result for the case where f = f_1 + f_2. If d_1 ∈ ∂_ε f_1(x) and d_2 ∈ ∂_ε f_2(x), then from Eq. (1.64), we have

    f_1(x + αy) ≥ f_1(x) + αy'd_1 − ε,  ∀ α > 0, ∀ y ∈ ℜ^n,
    f_2(x + αy) ≥ f_2(x) + αy'd_2 − ε,  ∀ α > 0, ∀ y ∈ ℜ^n,

so by adding, we obtain

    f(x + αy) ≥ f(x) + αy'(d_1 + d_2) − 2ε,  ∀ α > 0, ∀ y ∈ ℜ^n.

Hence from Eq. (1.64), we have d_1 + d_2 ∈ ∂_{2ε} f(x), implying that ∂_ε f_1(x) + ∂_ε f_2(x) ⊂ ∂_{2ε} f(x).
To prove that ∂_ε f(x) ⊂ ∂_ε f_1(x) + ∂_ε f_2(x), we use an argument similar to the one of the proof of Prop. 1.7.4(b). Suppose, to come to a contradiction, that there exists a d ∈ ∂_ε f(x) such that d ∉ ∂_ε f_1(x) + ∂_ε f_2(x). Since by part (a), the sets ∂_ε f_1(x) and ∂_ε f_2(x) are compact, the set ∂_ε f_1(x) + ∂_ε f_2(x) is compact (cf. Prop. 1.2.16), and by Prop. 1.4.3, there exists a hyperplane strictly separating d from ∂_ε f_1(x) + ∂_ε f_2(x), i.e., a vector y and a scalar b such that

    y'(d_1 + d_2) < b < y'd,  ∀ d_1 ∈ ∂_ε f_1(x), ∀ d_2 ∈ ∂_ε f_2(x).

From this we obtain

    max_{d_1∈∂_ε f_1(x)} y'd_1 + max_{d_2∈∂_ε f_2(x)} y'd_2 < y'd,

and by part (a),

    inf_{α>0} (f_1(x + αy) − f_1(x) + ε)/α + inf_{α>0} (f_2(x + αy) − f_2(x) + ε)/α < y'd.

Let α_j, j = 1, 2, be positive scalars such that

    (f_1(x + α_1 y) − f_1(x) + ε)/α_1 + (f_2(x + α_2 y) − f_2(x) + ε)/α_2 < y'd.    (1.69)

Define

    ᾱ = 1/(1/α_1 + 1/α_2).

As a consequence of the convexity of f_j, j = 1, 2, the ratio (f_j(x + αy) − f_j(x))/α is monotonically nondecreasing in α. Thus, since α_j ≥ ᾱ, we have

    (f_j(x + α_j y) − f_j(x))/α_j ≥ (f_j(x + ᾱy) − f_j(x))/ᾱ,  j = 1, 2,
and Eq. (1.69) together with the definition of ᾱ yields

    y'd > (f_1(x + α_1 y) − f_1(x) + ε)/α_1 + (f_2(x + α_2 y) − f_2(x) + ε)/α_2
        ≥ ε/ᾱ + (f_1(x + ᾱy) − f_1(x))/ᾱ + (f_2(x + ᾱy) − f_2(x))/ᾱ
        ≥ inf_{α>0} (f(x + αy) − f(x) + ε)/α.

This contradicts Eq. (1.65), thus implying that ∂_ε f(x) ⊂ ∂_ε f_1(x) + ∂_ε f_2(x). Q.E.D.
Parts (b)-(d) of Prop. 1.7.6 contain the elements of an important algorithm (called the ε-descent algorithm, and introduced by Bertsekas and Mitter [BeM71], [BeM73]) for minimizing convex functions to within a tolerance of ε. At a given point x, we check whether 0 ∈ ∂_ε f(x). If this is so, then by Prop. 1.7.6(b), x is an ε-optimal solution. If not, then by going along the direction opposite to the vector of minimum norm on ∂_ε f(x), we are guaranteed a cost improvement of at least ε. An implementation of this algorithm due to Lemarechal [Lem74], [Lem75] is given in the exercises.
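A minimal sketch of this ε-descent scheme for the one-dimensional f(x) = x², where ∂_ε f(x) = [2x − 2√ε, 2x + 2√ε] is available in closed form (the function and starting point are illustrative choices, not from the text):

```python
# Minimal eps-descent sketch for f(x) = x^2, using the closed form
# eps-subdifferential [2x - 2*sqrt(eps), 2x + 2*sqrt(eps)].
import math

eps = 0.01
x = 10.0                             # illustrative starting point
costs = [x * x]
while True:
    lo = 2 * x - 2 * math.sqrt(eps)
    hi = 2 * x + 2 * math.sqrt(eps)
    if lo <= 0.0 <= hi:              # 0 in the eps-subdifferential: stop
        break
    d = lo if lo > 0 else hi         # minimum-norm element of [lo, hi]
    alpha = x / d                    # exact line search: (x - a*d)^2 is minimized at a = x/d
    x = x - alpha * d
    costs.append(x * x)

assert x * x <= eps                  # inf f = 0 here, so x is eps-optimal
assert all(c1 - c2 > eps for c1, c2 in zip(costs, costs[1:]))  # > eps drop per step
```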
1.7.4 Subgradients of Extended Real-Valued Functions

We have focused so far on real-valued convex functions f : ℜ^n → ℜ, which are defined over the entire space ℜ^n and are convex over ℜ^n. The notion of a subdifferential and a subgradient of an extended real-valued convex function f : ℜ^n → (−∞, ∞] can be developed along similar lines. In particular, a vector d is a subgradient of f at a vector x ∈ dom(f) [i.e., with f(x) < ∞] if the subgradient inequality holds, i.e.,

    f(z) ≥ f(x) + (z − x)'d,  ∀ z ∈ ℜ^n.

If the extended real-valued function f : ℜ^n → [−∞, ∞) is concave, d is said to be a subgradient of f at a vector x with f(x) > −∞ if −d is a subgradient of the convex function −f at x.
The subdifferential ∂f(x) is the set of all subgradients of the convex (or concave) function f. By convention, ∂f(x) is considered empty for all x with f(x) = ∞. Note that, contrary to the case of real-valued functions, ∂f(x) may be empty, or closed but unbounded. For example, the extended real-valued convex function given by

    f(x) = −√x if 0 ≤ x ≤ 1,  and f(x) = ∞ otherwise,

has the subdifferential

    ∂f(x) = {−1/(2√x)}  if 0 < x < 1,
    ∂f(x) = [−1/2, ∞)   if x = 1,
    ∂f(x) = Ø           if x ≤ 0 or 1 < x.

Thus, ∂f(x) can be empty and can be unbounded at points x that belong to the effective domain of f (as in the cases x = 0 and x = 1, respectively, of the above example). However, it can be shown that ∂f(x) is nonempty and compact at points x that are interior points of the effective domain of f, as also illustrated by the above example.
Similarly, a vector d is an ε-subgradient of f at a vector x such that f(x) < ∞ if

    f(z) ≥ f(x) + (z − x)'d − ε,  ∀ z ∈ ℜ^n.

The ε-subdifferential ∂_ε f(x) is the set of all ε-subgradients of f. Figure 1.7.6 illustrates the definition of ∂_ε f(x) for the case of a one-dimensional function f. The figure illustrates that even when f is extended real-valued, the ε-subdifferential ∂_ε f(x) is nonempty at all points of its effective domain. This can be shown for multidimensional functions f as well.
Figure 1.7.6. Illustration of the ε-subdifferential ∂_ε f(x) of a one-dimensional function f : ℜ → (−∞, ∞], which is convex and has as effective domain an interval D. The ε-subdifferential corresponds to the set of slopes indicated in the figure. Note that ∂_ε f(x) is nonempty at all x ∈ D. Its left endpoint is

    f_ε^−(x) = sup_{δ<0, x+δ∈D} (f(x + δ) − f(x) + ε)/δ  if inf D < x,
    f_ε^−(x) = −∞  if inf D = x,

and its right endpoint is

    f_ε^+(x) = inf_{δ>0, x+δ∈D} (f(x + δ) − f(x) + ε)/δ  if x < sup D,
    f_ε^+(x) = ∞  if x = sup D.

Note that these endpoints can be −∞ (as in the figure on the right) or ∞. For ε = 0, the above formulas also give the endpoints of the subdifferential ∂f(x). Note that while ∂f(x) is nonempty for all x in the interior of D, it may be empty for x at the relative boundary of D (as in the figure on the right).
One can provide generalized versions of the results of Props. 1.7.3-1.7.6 within the context of extended real-valued convex functions, but with appropriate adjustments and additional assumptions to deal with cases where ∂f(x) may be empty or noncompact. Some of these generalizations are discussed in the exercises.
1.7.5 Directional Derivative of the Max Function

As mentioned in Section 1.3.4, the max function f(x) = max_{z∈Z} φ(x, z) arises in a variety of interesting contexts, including duality. It is therefore important to characterize this function, and the following proposition gives the directional derivative and the subdifferential of f for the case where φ(·, z) is convex for all z ∈ Z.
Proposition 1.7.7: (Danskin's Theorem) Let Z ⊂ ℜ^m be a compact set, and let φ : ℜ^n × Z → ℜ be continuous and such that φ(·, z) : ℜ^n → ℜ is convex for each z ∈ Z.
(a) The function f : ℜ^n → ℜ given by

    f(x) = max_{z∈Z} φ(x, z)    (1.70)

is convex and has directional derivative given by

    f'(x; y) = max_{z∈Z(x)} φ'(x, z; y),    (1.71)

where φ'(x, z; y) is the directional derivative of the function φ(·, z) at x in the direction y, and Z(x) is the set of maximizing points in Eq. (1.70),

    Z(x) = {z̄ | φ(x, z̄) = max_{z∈Z} φ(x, z)}.

In particular, if Z(x) consists of a unique point z̄ and φ(·, z̄) is differentiable at x, then f is differentiable at x, and ∇f(x) = ∇_x φ(x, z̄), where ∇_x φ(x, z̄) is the vector with coordinates

    ∂φ(x, z̄)/∂x_i,  i = 1, . . . , n.

(b) If φ(·, z) is differentiable for all z ∈ Z and ∇_x φ(x, ·) is continuous on Z for each x, then

    ∂f(x) = conv{∇_x φ(x, z) | z ∈ Z(x)},  ∀ x ∈ ℜ^n.

(c) The conclusion of part (a) also holds if, instead of assuming that Z is compact, we assume that Z(x) is nonempty for all x ∈ ℜ^n, and that φ and Z are such that for every convergent sequence {x_k}, there exists a bounded sequence {z_k} with z_k ∈ Z(x_k) for all k.
Proof: (a) The convexity of f has been established in Prop. 1.2.3(c). We note that since φ is continuous and Z is compact, the set Z(x) is nonempty by Weierstrass' Theorem (Prop. 1.3.1) and f takes real values. For any z ∈ Z(x), y ∈ ℜ^n, and α > 0, we use the definition of f to obtain

    (f(x + αy) − f(x))/α ≥ (φ(x + αy, z) − φ(x, z))/α.

Taking the limit as α decreases to zero, we obtain f'(x; y) ≥ φ'(x, z; y). Since this is true for every z ∈ Z(x), we conclude that

    f'(x; y) ≥ sup_{z∈Z(x)} φ'(x, z; y),  ∀ y ∈ ℜ^n.    (1.72)

To prove the reverse inequality and that the supremum in the right-hand side of the above inequality is attained, consider a sequence {α_k} with α_k ↓ 0 and let x_k = x + α_k y. For each k, let z_k be a vector in Z(x_k). Since {z_k} belongs to the compact set Z, it has a subsequence converging to some z̄ ∈ Z. Without loss of generality, we assume that the entire sequence {z_k} converges to z̄. We have

    φ(x_k, z_k) ≥ φ(x_k, z),  ∀ z ∈ Z,

so by taking the limit as k → ∞ and by using the continuity of φ, we obtain

    φ(x, z̄) ≥ φ(x, z),  ∀ z ∈ Z.

Therefore, z̄ ∈ Z(x). We now have

    f'(x; y) ≤ (f(x + α_k y) − f(x))/α_k
             = (φ(x + α_k y, z_k) − φ(x, z̄))/α_k
             ≤ (φ(x + α_k y, z_k) − φ(x, z_k))/α_k
             = −(φ(x + α_k y + α_k(−y), z_k) − φ(x + α_k y, z_k))/α_k
             ≤ −φ'(x + α_k y, z_k; −y)
             ≤ φ'(x + α_k y, z_k; y),    (1.73)

where the last inequality follows from inequality (1.53). By letting k → ∞, we obtain

    f'(x; y) ≤ limsup_{k→∞} φ'(x + α_k y, z_k; y) ≤ φ'(x, z̄; y),

where the right inequality in the preceding relation follows from Prop. 1.7.2 with f_k(·) = φ(·, z_k), x_k = x + α_k y, and y_k = y. This relation together with inequality (1.72) proves Eq. (1.71).
For the last statement of part (a), if Z(x) consists of the unique point z̄, Eq. (1.71) and the differentiability assumption on φ yield

    f'(x; y) = φ'(x, z̄; y) = y'∇_x φ(x, z̄),  ∀ y ∈ ℜ^n,

which implies that ∇f(x) = ∇_x φ(x, z̄).
(b) By part (a), we have

    f'(x; y) = max_{z∈Z(x)} ∇_x φ(x, z)'y,

while by Prop. 1.7.3, we have

    f'(x; y) = max_{d∈∂f(x)} d'y.

For all z ∈ Z(x) and y ∈ ℜ^n, we have

    f(y) = max_{z̃∈Z} φ(y, z̃)
         ≥ φ(y, z)
         ≥ φ(x, z) + ∇_x φ(x, z)'(y − x)
         = f(x) + ∇_x φ(x, z)'(y − x).

Therefore, ∇_x φ(x, z) is a subgradient of f at x, implying that

    conv{∇_x φ(x, z) | z ∈ Z(x)} ⊂ ∂f(x).

To prove the reverse inclusion, we use a hyperplane separation argument. By the continuity of φ(x, ·) and the compactness of Z, we see that Z(x) is compact, so by the continuity of ∇_x φ(x, ·), the set {∇_x φ(x, z) | z ∈ Z(x)} is compact. By Prop. 1.2.8, it follows that conv{∇_x φ(x, z) | z ∈ Z(x)} is compact. If d ∈ ∂f(x) while d ∉ conv{∇_x φ(x, z) | z ∈ Z(x)}, by the Strict Separation Theorem (Prop. 1.4.3), there exist y ≠ 0 and γ ∈ ℜ such that

    d'y > γ > ∇_x φ(x, z)'y,  ∀ z ∈ Z(x).

Therefore, we have

    d'y > max_{z∈Z(x)} ∇_x φ(x, z)'y = f'(x; y),

contradicting Prop. 1.7.3. Thus ∂f(x) ⊂ conv{∇_x φ(x, z) | z ∈ Z(x)} and the proof is complete.
(c) The proof of this part is nearly identical to the proof of part (a). Q.E.D.
A simple but important application of the above proposition is when Z is a finite set and φ is linear in x for all z. In this case, f(x) can be represented as

    f(x) = max{a_1'x + b_1, . . . , a_m'x + b_m},

where a_1, . . . , a_m are given vectors in ℜ^n and b_1, . . . , b_m are given scalars. Then ∂f(x) is the convex hull of the set of all vectors a_i such that a_i'x + b_i = max{a_1'x + b_1, . . . , a_m'x + b_m}.
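This finite-max case is easy to verify numerically. The sketch below (with arbitrarily chosen vectors a_i and b_i = 0) checks that the difference quotient of f matches the maximum of a_i'y over the active indices:

```python
# Numeric sketch: f(x) = max_i {a_i'x + b_i} has f'(x; y) equal to the
# max of a_i'y over the indices i active at x.  The data are illustrative.
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [-1.0, -1.0]])       # rows are the vectors a_i
b = np.zeros(3)

def f(x):
    return (A @ x + b).max()

x = np.array([0.0, 0.0])           # all three pieces are active here
rng = np.random.default_rng(1)
for _ in range(20):
    y = rng.normal(size=2)
    active = np.isclose(A @ x + b, f(x))
    expected = (A @ y)[active].max()      # max over active gradients
    a = 1e-8
    assert abs((f(x + a * y) - f(x)) / a - expected) < 1e-6
```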
E X E R C I S E S
1.7.1 (Properties of Directional Derivative)

Let f : ℜ^n → ℜ be a convex function and let x ∈ ℜ^n be a fixed vector. Then for the directional derivative function f'(x; ·) : ℜ^n → ℜ the following hold:
(a) f'(x; λy) = λf'(x; y) for all λ ≥ 0 and y ∈ ℜ^n.
(b) f'(x; ·) is convex over ℜ^n.
(c) The level set {y | f'(x; y) ≤ 0} is a closed convex cone.
1.7.2 (Chain Rule for Directional Derivatives)

Let f : ℜ^n → ℜ^m and g : ℜ^m → ℜ^k be given functions, and let x ∈ ℜ^n be a given point. Suppose that f and g are directionally differentiable at x, and g is such that for all w ∈ ℜ^m

    g'(y; w) = lim_{α↓0, z→w} (g(y + αz) − g(y))/α.

Then the composite function F(x) = g(f(x)) is directionally differentiable at x and the following chain rule applies:

    F'(x; d) = g'(f(x); f'(x; d)),  ∀ d ∈ ℜ^n.
1.7.3

Let f : ℜ^n → ℜ be a convex function. Show that a vector d ∈ ℜ^n is a subgradient of f at x if and only if the function d'y − f(y) achieves its maximum at y = x.
1.7.4

Show the following:
(a) For f(x) = ‖x‖, we have

    ∂f(x) = {x/‖x‖} if x ≠ 0,  and  ∂f(x) = {x | ‖x‖ ≤ 1} if x = 0.

(b) For f(x) = max_{1≤i≤n} |x_i|, we have

    ∂f(x) = conv{sign(x_i)e_i | i ∈ J_x} if x ≠ 0,  and  ∂f(x) = conv{±e_1, . . . , ±e_n} if x = 0,

where sign(t) denotes the sign of a scalar t, e_i is the i-th basis vector in ℜ^n, and J_x is the set of all i such that f(x) = |x_i|.
(c) For the function

    f(x) = 0 if x ∈ C,  and  f(x) = ∞ if x ∉ C,

where C is a nonempty convex set, we have

    ∂f(x) = N_C(x) if x ∈ C,  and  ∂f(x) = Ø if x ∉ C.
1.7.5

Let f : ℜ^n → ℜ be a convex function and X ⊂ ℜ^n be a bounded set. Show that f is Lipschitz continuous over X, i.e., that there exists a scalar L such that

    |f(x) − f(y)| ≤ L‖x − y‖,  ∀ x, y ∈ X.

Show also that

    f'(x; y) ≤ L‖y‖,  ∀ x ∈ X, ∀ y ∈ ℜ^n.

Hint: Use the boundedness property of the subdifferential (Prop. 1.7.3).
1.7.6 (Polar Cones of Level Sets)

Let f : ℜ^n → ℜ be a convex function and let x ∈ ℜ^n be a fixed vector.
(a) Show that

    {y | f'(x; y) ≤ 0} = (cone(∂f(x)))*.

(b) Assuming that the level set {z | f(z) ≤ f(x)} is nonempty, show that the normal cone of {z | f(z) ≤ f(x)} at the point x coincides with cone(∂f(x)).
1.7.7

Let f : ℜ^n → (−∞, ∞] be a convex function. Show that ∂f(x) is nonempty if x belongs to the relative interior of dom(f). Show also that ∂f(x) is nonempty and bounded if and only if x belongs to the interior of dom(f).
1.7.8

Let f_i : ℜ^n → (−∞, ∞], i = 1, . . . , m, be convex functions, and let f = f_1 + ⋯ + f_m. Show that

    ∂f_1(x) + ⋯ + ∂f_m(x) ⊂ ∂f(x),  ∀ x ∈ ℜ^n.

Furthermore, assuming that the relative interiors of dom(f_i) have a nonempty intersection, show that

    ∂f_1(x) + ⋯ + ∂f_m(x) = ∂f(x).

Hint: Follow the line of proof of Prop. 1.7.4(b).
1.7.9

Let C_1, . . . , C_m be convex sets in ℜ^n such that ri(C_1) ∩ ⋯ ∩ ri(C_m) is nonempty. Show that

    N_{C_1∩⋯∩C_m}(x) = N_{C_1}(x) + ⋯ + N_{C_m}(x).

Hint: For each i, let f_i(x) = 0 when x ∈ C_i and f_i(x) = ∞ otherwise. Then use Exercises 1.7.4(c) and 1.7.8.
1.7.10

Let f : ℜ^n → ℜ be a convex function. Show that

    ∩_{ε>0} ∂_ε f(x) = ∂f(x),  ∀ x ∈ ℜ^n.
1.7.11

Let f : ℜ^n → ℜ be a convex function and X ⊂ ℜ^n a bounded set. Show that for every ε > 0, the set ∪_{x∈X} ∂_ε f(x) is bounded.
166 Convex Analysis and Optimization Chap. 1
1.7.12 (Continuity of ε-Subdifferential [Nur77])
Let f : ℜⁿ → ℜ be a convex function and ε > 0 be a scalar. For every x ∈ ℜⁿ,
the following hold:
(a) If a sequence {xk} converges to x and dk ∈ ∂εf(xk) for all k, then the
sequence {dk} is bounded and each of its limit points is an ε-subgradient
of f at x.
(b) If d ∈ ∂εf(x), then for every sequence {xk} converging to x, there is a
sequence {dk} such that dk ∈ ∂εf(xk) and dk → d.
Note: This exercise shows that the point-to-set mapping x → ∂εf(x) is continuous.
In particular, part (a) shows that the mapping x → ∂εf(x) is upper semicontinuous,
while part (b) shows that this mapping is lower semicontinuous.
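The ε-subgradient inequality underlying the preceding exercises — g ∈ ∂εf(x) means f(z) ≥ f(x) + g′(z − x) − ε for all z ∈ ℜⁿ — is easy to probe numerically. The sketch below is illustrative and not part of the text: for f(z) = |z| and x > 0, a direct calculation gives ∂εf(x) = [max(−1, 1 − ε/x), 1], and a sampled check of the defining inequality reproduces that interval.

```python
import numpy as np

def is_eps_subgradient(f, x, g, eps, zs):
    # g ∈ ∂εf(x) iff f(z) >= f(x) + g*(z - x) - eps for all z (sampled grid,
    # with a tiny tolerance so that boundary subgradients are accepted).
    return all(f(z) >= f(x) + g * (z - x) - eps - 1e-12 for z in zs)

f, x, eps = abs, 1.0, 0.5
zs = np.linspace(-10.0, 10.0, 2001)

# Here ∂εf(x) = [max(-1, 1 - eps/x), 1] = [0.5, 1].
print(is_eps_subgradient(f, x, 0.5, eps, zs))   # True  (lower boundary)
print(is_eps_subgradient(f, x, 1.0, eps, zs))   # True  (upper boundary)
print(is_eps_subgradient(f, x, 0.4, eps, zs))   # False (below the interval)
print(is_eps_subgradient(f, x, 1.1, eps, zs))   # False (above the interval)
```

Note how g = 0.5 < f′(x) = 1 is accepted: the ε-subdifferential is strictly larger than the ordinary subdifferential {1}, which is what makes the mapping x → ∂εf(x) continuous rather than merely upper semicontinuous.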
1.7.13 (Steepest Descent Directions of Convex Functions)
Let f : ℜⁿ → ℜ be a convex function and fix a vector x ∈ ℜⁿ. Show that a
vector d is the vector of minimum norm in ∂f(x) if and only if either d = 0 or
else −d/‖d‖ minimizes f′(x; y) over all y with ‖y‖ ≤ 1.
1.7.14 (Generating Descent Directions of Convex Functions)
Let f : ℜⁿ → ℜ be a convex function and fix a vector x ∈ ℜⁿ. A vector d ∈ ℜⁿ is
said to be a descent direction of f at x if the corresponding directional derivative
of f satisfies
    f′(x; d) < 0.
This exercise provides a method for generating a descent direction in circumstances
where obtaining a single subgradient at x is relatively easy.
Assume that x does not minimize f and let g1 be some subgradient of f at
x. For k = 2, 3, . . ., let wk be the vector of minimum norm in the convex hull of
g1, . . . , gk−1,
    wk = arg min_{g∈conv{g1,...,gk−1}} ‖g‖.
If −wk is a descent direction, stop; else let gk be an element of ∂f(x) such that
    gk′wk = min_{g∈∂f(x)} g′wk = −f′(x; −wk).
Show that this process terminates in a finite number of steps with a descent
direction. Hint: If −wk is not a descent direction, then
    gi′wk ≥ ‖wk‖² ≥ ‖g∗‖² > 0
for all i = 1, . . . , k − 1, where g∗ is the subgradient of minimum norm, while at
the same time gk′wk ≤ 0. Consider a limit point of {(wk, gk)}.
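The procedure can be sketched concretely for the special case f(x) = max_i ai′x, where ∂f(x) is the convex hull of the gradients ai attaining the maximum, so both the min-norm step and the subgradient oracle are computable. The Python sketch below is illustrative, not part of the exercise: the Frank-Wolfe routine for the min-norm point and the example function are our own choices.

```python
import numpy as np

def min_norm_in_hull(G, iters=2000):
    """Frank-Wolfe sketch for the minimum-norm point in conv(G)."""
    G = np.asarray(G, dtype=float)
    w = G[0].copy()
    for _ in range(iters):
        g = G[np.argmin(G @ w)]      # vertex minimizing g'w (linear oracle for ||w||^2)
        d = g - w
        if d @ d == 0.0:
            break
        gamma = float(np.clip(-(w @ d) / (d @ d), 0.0, 1.0))  # exact line search
        if gamma == 0.0:
            break
        w = w + gamma * d
    return w

def generate_descent_direction(active_grads, max_steps=100):
    """Procedure of Exercise 1.7.14 for f(x) = max_i ai'x: there
    ∂f(x) = conv(active_grads) and f'(x; d) = max_i ai'd, so the
    oracle for gk is an exact argmin over the active gradients."""
    A = np.asarray(active_grads, dtype=float)
    hull = [A[0]]                               # g1: any single subgradient
    for _ in range(max_steps):
        w = min_norm_in_hull(hull)
        if max(A @ -w) < 0:                     # f'(x; -wk) < 0: -wk is a descent direction
            return -w
        hull.append(A[np.argmin(A @ w)])        # gk with gk'wk = min_{g in ∂f(x)} g'wk
    return None

# f(x) = |x1| + |x2| at x = (0, 1): the active gradients are (1, 1) and (-1, 1).
d = generate_descent_direction([[1.0, 1.0], [-1.0, 1.0]])
print(d)    # a multiple of (0, -1): moving straight down decreases f
```

On this example the loop terminates in two steps, exactly as in the hint: g1 = (1, 1) alone gives f′(x; −w2) = 0, the oracle then supplies g2 = (−1, 1), and the min-norm point of the enlarged hull yields the descent direction.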
1.7.15 (Implementing the ε-Descent Method [Lem74])
Let f : ℜⁿ → ℜ be a convex function. This exercise shows how the procedure
of Exercise 1.7.14 can be modified so that it generates an ε-descent direction at
a given vector x. At the typical step of this procedure, we have g1, . . . , gk−1 ∈
∂εf(x). Let wk be the vector of minimum norm in the convex hull of g1, . . . , gk−1,
    wk = arg min_{g∈conv{g1,...,gk−1}} ‖g‖.
If wk = 0, stop; we have 0 ∈ ∂εf(x), so by Prop. 1.7.6(b), f(x) ≤ inf_{z∈ℜⁿ} f(z) + ε
and x is ε-optimal. Otherwise, by a search along the line {x − αwk | α ≥ 0},
determine whether there exists a scalar ᾱ such that f(x − ᾱwk) < f(x) − ε. If
such an ᾱ can be found, stop and replace x with x − ᾱwk (the value of f has been
improved by at least ε). Otherwise let gk be an element of ∂εf(x) such that
    gk′wk = min_{g∈∂εf(x)} g′wk.
Show that this process will terminate in a finite number of steps with either an
improvement of the value of f by at least ε, or with confirmation that x is an
ε-optimal solution.
1.8 OPTIMALITY CONDITIONS
In this section we generalize the constrained optimality conditions of Section
1.5 to problems involving a nondifferentiable but convex cost function.
We will first use directional derivatives to provide a necessary and sufficient
condition for optimality in the problem of minimizing a convex function
f : ℜⁿ → ℜ over a convex set X ⊂ ℜⁿ. It can be seen that x∗ is a global
minimum of f over X if and only if
    f′(x∗; x − x∗) ≥ 0,    ∀ x ∈ X.
This follows using the definition (1.52) of directional derivative and the fact
that the difference quotient
    ( f(x∗ + α(x − x∗)) − f(x∗) ) / α
is a monotonically nondecreasing function of α. This leads to the following
optimality condition.
Proposition 1.8.1: Let f : ℜⁿ → ℜ be a convex function. A vector
x∗ minimizes f over a convex set X ⊂ ℜⁿ if and only if there exists a
subgradient d ∈ ∂f(x∗) such that
    d′(x − x∗) ≥ 0,    ∀ x ∈ X.
Equivalently, x∗ minimizes f over a convex set X ⊂ ℜⁿ if and only if
    0 ∈ ∂f(x∗) + TX(x∗)∗,
where TX(x∗)∗ is the polar of the tangent cone of X at x∗.
Proof: Suppose that for some d ∈ ∂f(x∗) and all x ∈ X, we have
d′(x − x∗) ≥ 0. Then, since from the definition of a subgradient we have
f(x) − f(x∗) ≥ d′(x − x∗) for all x ∈ X, we obtain f(x) − f(x∗) ≥ 0 for all
x ∈ X, so x∗ minimizes f over X. Conversely, suppose that x∗ minimizes f
over X. Consider the set of feasible directions of X at x∗,
    FX(x∗) = {w ≠ 0 | x∗ + αw ∈ X for some α > 0},
and the cone
    W = −FX(x∗)∗ = {d | d′w ≥ 0, ∀ w ∈ FX(x∗)}.
If ∂f(x∗) and W have a point in common, we are done, so to arrive at a
contradiction, assume the opposite, i.e., ∂f(x∗) ∩ W = Ø. Since ∂f(x∗) is
compact and W is closed, by Prop. 1.4.3 there exists a hyperplane strictly
separating ∂f(x∗) and W, i.e., a vector y and a scalar c such that
    g′y < c < d′y,    ∀ g ∈ ∂f(x∗), ∀ d ∈ W.
Using the fact that W is a closed cone, it follows that
    c < 0 ≤ d′y,    ∀ d ∈ W,    (1.74)
which when combined with the preceding inequality, also yields
    max_{g∈∂f(x∗)} g′y < c < 0.
Thus, using Prop. 1.7.3(b), we have f′(x∗; y) < 0, while from Eq. (1.74), we
see that y belongs to the polar cone of FX(x∗)∗, which by the Polar Cone
Theorem (Prop. 1.5.1), implies that y is in the closure of the set of feasible
directions FX(x∗). Hence for a sequence {yk} of feasible directions converging
to y we have f′(x∗; yk) < 0 for all sufficiently large k, which contradicts the
optimality of x∗.
The last statement follows from the convexity of X, which implies that
TX(x∗)∗ is the set of all z such that z′(x − x∗) ≤ 0 for all x ∈ X (cf. Props.
1.5.3 and 1.5.5). Q.E.D.
Note that the above proposition generalizes the optimality condition
of Prop. 1.6.2 for the case where f is convex and smooth:
    ∇f(x∗)′(x − x∗) ≥ 0,    ∀ x ∈ X.
In the special case where X = ℜⁿ, we obtain a basic necessary and sufficient
condition for unconstrained optimality of x∗:
    0 ∈ ∂f(x∗).
This optimality condition is also evident from the subgradient inequality
(1.55).
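As a quick numerical sketch (the specific function is illustrative, not from the text), consider f(x) = |x| + (x − 0.3)²: the smooth term pulls toward 0.3, but the kink of |x| at the origin wins, and the condition 0 ∈ ∂f(x∗) certifies x∗ = 0 even though f is not differentiable there.

```python
import numpy as np

# f(x) = |x| + (x - 0.3)**2 is convex with a kink at x = 0, where
# ∂f(0) = [-1, 1] + {2(0 - 0.3)} = [-1.6, 0.4].
f = lambda x: np.abs(x) + (x - 0.3) ** 2

lo, hi = -1.0 + 2 * (0 - 0.3), 1.0 + 2 * (0 - 0.3)
print(lo <= 0 <= hi)    # True: 0 ∈ ∂f(0), so x* = 0 is the unconstrained minimum

# Cross-check against a dense sample of the function values.
xs = np.linspace(-2.0, 2.0, 4001)
print(abs(xs[np.argmin(f(xs))]) < 1e-9)    # True: the sampled minimizer is x = 0
```

The gradient of the smooth part alone is −0.6 ≠ 0 at x∗ = 0; it is the interval [−1, 1] contributed by |x| that absorbs it, which is exactly what 0 ∈ ∂f(x∗) captures and ∇f(x∗) = 0 could not.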
We ﬁnally extend the optimality conditions of Props. 1.6.2 and 1.8.1
to the case where the cost function involves a (possibly nonconvex) smooth
component and a convex (possibly nondiﬀerentiable) component.
Proposition 1.8.2: Let x∗ be a local minimum of a function f :
ℜⁿ → ℜ over a subset X of ℜⁿ. Assume that the tangent cone TX(x∗)
is convex, and that f has the form
    f(x) = f1(x) + f2(x),
where f1 is convex and f2 is smooth. Then
    −∇f2(x∗) ∈ ∂f1(x∗) + TX(x∗)∗.
Proof: The proof is analogous to the one of Prop. 1.6.2. Let y be a
nonzero tangent of X at x∗. Then there exists a sequence {ξk} and a
sequence {xk} ⊂ X such that xk ≠ x∗ for all k,
    ξk → 0,    xk → x∗,
and
    (xk − x∗)/‖xk − x∗‖ = y/‖y‖ + ξk.
We write this equation as
    xk − x∗ = (‖xk − x∗‖/‖y‖) yk,    (1.75)
where
    yk = y + ‖y‖ξk.
By the convexity of f1, we have
    −f1′(xk; xk − x∗) ≤ f1′(xk; x∗ − xk)
and
    f1(xk) + f1′(xk; x∗ − xk) ≤ f1(x∗),
and by adding these inequalities, we obtain
    f1(xk) ≤ f1(x∗) + f1′(xk; xk − x∗),    ∀ k.
Also, by the Mean Value Theorem, we have
    f2(xk) = f2(x∗) + ∇f2(x̃k)′(xk − x∗),    ∀ k,
where x̃k is a vector on the line segment joining xk and x∗. By adding the
last two relations,
    f(xk) ≤ f(x∗) + f1′(xk; xk − x∗) + ∇f2(x̃k)′(xk − x∗),    ∀ k,
so using Eq. (1.75), we obtain
    f(xk) ≤ f(x∗) + (‖xk − x∗‖/‖y‖) ( f1′(xk; yk) + ∇f2(x̃k)′yk ),    ∀ k.    (1.76)
We now show by contradiction that f1′(x∗; y) + ∇f2(x∗)′y ≥ 0. Indeed,
if f1′(x∗; y) + ∇f2(x∗)′y < 0, then since x̃k → x∗ and yk → y, it follows,
using also Prop. 1.7.2, that for all sufficiently large k,
    f1′(xk; yk) + ∇f2(x̃k)′yk < 0
and [from Eq. (1.76)] f(xk) < f(x∗). This contradicts the local optimality
of x∗.
We have thus shown that f1′(x∗; y) + ∇f2(x∗)′y ≥ 0 for all y ∈ TX(x∗),
or equivalently, by Prop. 1.7.3(b),
    max_{d∈∂f1(x∗)} d′y + ∇f2(x∗)′y ≥ 0,
or equivalently [since TX(x∗) is a cone]
    min_{‖y‖≤1, y∈TX(x∗)} max_{d∈∂f1(x∗)} (d + ∇f2(x∗))′y = 0.
We can now apply the Saddle Point Theorem (Prop. 1.3.8 – the
convexity/concavity and compactness assumptions of that proposition are
satisfied) to assert that there exists a d ∈ ∂f1(x∗) such that
    min_{‖y‖≤1, y∈TX(x∗)} (d + ∇f2(x∗))′y = 0.
This implies that
    −(d + ∇f2(x∗)) ∈ TX(x∗)∗,
which in turn is equivalent to −∇f2(x∗) ∈ ∂f1(x∗) + TX(x∗)∗. Q.E.D.
Note that in the special case where f1(x) ≡ 0, we obtain Prop. 1.6.2.
The convexity assumption on TX(x∗) is unnecessary in this case, but it is
essential in general. [Consider the subset X = {(x1, x2) | x1x2 = 0} of
ℜ²; it is easy to construct a convex nondifferentiable function that has a
global minimum at x∗ = 0 without satisfying the necessary condition of
Prop. 1.8.2.]
In the special case where f2(x) ≡ 0 and X is convex, Prop. 1.8.2
yields the necessity part of Prop. 1.8.1. More generally, when X is convex,
an equivalent statement of Prop. 1.8.2 is that if x∗ is a local minimum of
f over X, there exists a subgradient d ∈ ∂f1(x∗) such that
    (d + ∇f2(x∗))′(x − x∗) ≥ 0,    ∀ x ∈ X.
This is because for a convex set X, we have z ∈ TX(x∗)∗ if and only if
z′(x − x∗) ≤ 0 for all x ∈ X.
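For a concrete check of Prop. 1.8.2 with X = ℜⁿ (so that TX(x∗)∗ = {0}), take f1(x) = ‖x‖₁ and f2(x) = ½‖x − c‖². A standard computation for this pair gives the minimizer as the componentwise soft-thresholding of c, and the condition of the proposition reduces to −∇f2(x∗) ∈ ∂f1(x∗). The example below is illustrative; the vector c is an arbitrary choice.

```python
import numpy as np

# Minimize f(x) = ||x||_1 + 0.5 * ||x - c||^2 over R^n
# (f1 nonsmooth convex, f2 smooth, X = R^n).
c = np.array([2.0, 0.4, -1.5])
x_star = np.sign(c) * np.maximum(np.abs(c) - 1.0, 0.0)   # soft-threshold: (1, 0, -0.5)

g = c - x_star    # -grad f2(x*) = -(x* - c)
# ∂f1(x*) componentwise: {sign(xi*)} if xi* != 0, else the interval [-1, 1].
ok = all((g[i] == np.sign(x_star[i])) if x_star[i] != 0 else (abs(g[i]) <= 1.0)
         for i in range(len(c)))
print(ok)    # True: -grad f2(x*) lies in ∂f1(x*), confirming the optimality condition
```

Note the middle component: x∗ sits exactly at the kink of ‖·‖₁, and the leftover gradient 0.4 is absorbed by the interval [−1, 1] of subgradients there, just as in the statement of the proposition.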
E X E R C I S E S
1.8.1
Consider the problem of minimizing a convex function f : ℜⁿ → ℜ over the
polyhedral set
    X = {x | aj′x ≤ bj, j = 1, . . . , r}.
Show that x∗ is an optimal solution if and only if there exist scalars µ1∗, . . . , µr∗
such that
(i) µj∗ ≥ 0 for all j, and µj∗ = 0 for all j such that aj′x∗ < bj.
(ii) 0 ∈ ∂f(x∗) + Σ_{j=1}^r µj∗aj.
Hint: Characterize the cone TX(x∗)∗, and use Prop. 1.8.1 and Farkas’ Lemma.
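As an illustrative check of these conditions (the problem instance is our own), minimize f(x) = ‖x‖₁ over the half-plane {x | −x1 − x2 ≤ −2}. At the candidate x∗ = (2, 0) the single constraint is active, ∂f(x∗) = {1} × [−1, 1], and a small grid search recovers the unique multiplier µ1∗ = 1 together with the subgradient (1, 1) that makes condition (ii) hold.

```python
import numpy as np

x_star = np.array([2.0, 0.0])
a1, b1 = np.array([-1.0, -1.0]), -2.0
print(a1 @ x_star == b1)    # True: the constraint is active at x*, so mu1 may be positive

# Condition (ii): find mu1 >= 0 and s in [-1, 1] with (1, s) + mu1 * a1 = 0,
# where (1, s) ranges over the subdifferential {1} x [-1, 1] of ||.||_1 at x*.
found = [(mu, s)
         for mu in np.linspace(0.0, 3.0, 301)
         for s in np.linspace(-1.0, 1.0, 201)
         if np.allclose(np.array([1.0, s]) + mu * a1, 0.0)]
print(len(found) == 1 and np.allclose(found[0], [1.0, 1.0]))    # True: mu1* = 1, s = 1
```

The grid search stands in for the algebra of the hint: characterizing TX(x∗)∗ via Farkas' Lemma shows that it is exactly the nonnegative combinations µ1a1 of the active constraint normal, which is what the one-parameter search enumerates.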
1.9 NOTES AND SOURCES
The material in this chapter is classical and is developed in various books.
Most of these books relate to both convex analysis and optimization, but
diﬀer in their scope and emphasis.
The book by Rockafellar [Roc70] is widely viewed as the classic convex
analysis text, and contains a more extensive development of convexity
than the one given here, although it does not cross over into nonconvex
optimization. The book by Rockafellar and Wets [RoW98] is a deep and
detailed treatment of “variational analysis,” a broad spectrum of topics that
integrate classical analysis, convexity, and optimization of both convex and
nonconvex (possibly nonsmooth) functions. The normal cone, introduced
by Mordukhovich [Mor76], and the work of Clarke on nonsmooth analysis
[Cla83] play a central role in this subject. The book by Rockafellar
and Wets contains a wealth of material beyond what is covered here, and
is strongly recommended for the advanced reader who wishes to obtain a
comprehensive view of the connecting links between convex and classical
analysis.
Among other books with detailed accounts of convexity, Stoer and
Witzgall [StW70] discuss topics similar to those of Rockafellar [Roc70] but less
comprehensively. Ekeland and Temam [EkT76] develop the subject in infinite-dimensional
spaces. Hiriart-Urruty and Lemarechal [HiL93] emphasize algorithms
for dual and nondifferentiable optimization. Rockafellar [Roc84]
focuses on convexity and duality in network optimization and an important
generalization, known as monotropic programming. Bertsekas [Ber98] also
gives a detailed coverage of this material, which owes much to the early work
of Minty [Min60] on network optimization. Bonnans and Shapiro [BoS00]
emphasize sensitivity analysis and discuss inﬁnite dimensional problems as
well. Borwein and Lewis [BoL00] develop many of the concepts in Rockafellar
and Wets [RoW98], but more succinctly. Schrijver [Sch86] provides
an extensive account of polyhedral convexity with applications to integer
programming and combinatorial optimization, and gives many historical
references.
Among the early contributors to convexity theory, we note Steinitz
[Ste13], [Ste14], [Ste16], who developed the theory of relative interiors,
recession cones, and polar cones; Minkowski [Min11], who first investigated
supporting hyperplane theorems; Caratheodory [Car11], who also investigated
hyperplane separation and gave the theorem on convex hulls that
carries his name. The Minkowski-Weyl Theorem given here is due to Weyl
[Wey35]. The notion of a tangent and the tangent cone originated with
Bouligand [Bou30], [Bou32]. Subdifferential theory owes much to the work
of Fenchel, who in his 1951 lecture notes [Fen51] laid the foundations for
the subsequent work on convexity and its connection with optimization.
REFERENCES
[BeM71] Bertsekas, D. P., and Mitter, S. K., 1971. “Steepest Descent for
Optimization Problems with Nondifferentiable Cost Functionals,” Proc.
5th Annual Princeton Confer. Inform. Sci. Systems, Princeton, N. J., pp.
347-351.
[BeM73] Bertsekas, D. P., and Mitter, S. K., 1973. “A Descent Numerical
Method for Optimization Problems with Nondifferentiable Cost Functionals,”
SIAM J. on Control, Vol. 11, pp. 637-652.
[BeT97] Bertsimas, D., and Tsitsiklis, J. N., 1997. Introduction to Linear
Optimization, Athena Scientiﬁc, Belmont, MA.
[Ber98] Bertsekas, D. P., 1998. Network Optimization: Continuous and
Discrete Models, Athena Scientiﬁc, Belmont, MA.
[Ber99] Bertsekas, D. P., 1999. Nonlinear Programming: 2nd Edition, Athena
Scientiﬁc, Belmont, MA.
[BoS00] Bonnans, J. F., and Shapiro, A., 2000. Perturbation Analysis of
Optimization Problems, Springer-Verlag, Berlin and N. Y.
[BoL00] Borwein, J. M., and Lewis, A. S., 2000. Convex Analysis and
Nonlinear Optimization, Springer-Verlag, N. Y.
[Bou30] Bouligand, G., 1930. “Sur les Surfaces Depourvues de Points
Hyperlimites,” Annales de la Societe Polonaise de Mathematique, Vol. 9, pp.
32-41.
[Bou32] Bouligand, G., 1932. Introduction a la Geometrie Infinitesimale
Directe, Gauthier-Villars, Paris.
[Car11] Caratheodory, C., 1911. “Uber den Variabilitatsbereich der Fourierschen
Konstanten von Positiven Harmonischen Funktionen,” Rendiconto
del Circolo Matematico di Palermo, Vol. 32, pp. 193-217; reprinted in Constantin
Caratheodory, Gesammelte Mathematische Schriften, Band III (H.
Tietze, ed.), C. H. Beck’sche Verlagsbuchhandlung, Munchen, 1955, pp.
78-110.
[Chv83] Chvatal, V., 1983. Linear Programming, W. H. Freeman and Co.,
N. Y.
[Cla83] Clarke, F. H., 1983. Optimization and Nonsmooth Analysis, Wiley,
N. Y.
[Dan63] Dantzig, G. B., 1963. Linear Programming and Extensions,
Princeton Univ. Press, Princeton, N. J.
[EkT76] Ekeland, I., and Temam, R., 1976. Convex Analysis and Variational
Problems, North-Holland Publ., Amsterdam.
[Fen51] Fenchel, W., 1951. “Convex Cones, Sets, and Functions,” Mimeographed
Notes, Princeton Univ.
[HiL93] Hiriart-Urruty, J.-B., and Lemarechal, C., 1993. Convex Analysis
and Minimization Algorithms, Vols. I and II, Springer-Verlag, Berlin and
N. Y.
[HoK71] Hoffman, K., and Kunze, R., 1971. Linear Algebra, Prentice-Hall,
Englewood Cliﬀs, N. J.
[LaT85] Lancaster, P., and Tismenetsky, M., 1985. The Theory of Matrices,
Academic Press, N. Y.
[Lem74] Lemarechal, C., 1974. “An Algorithm for Minimizing Convex Functions,”
in Information Processing ’74, Rosenfeld, J. L., (Ed.), pp. 552-556,
North-Holland, Amsterdam.
[Lem75] Lemarechal, C., 1975. “An Extension of Davidon Methods to
Nondifferentiable Problems,” Math. Programming Study 3, Balinski, M., and
Wolfe, P., (Eds.), North-Holland, Amsterdam, pp. 95-109.
[Min11] Minkowski, H., 1911. “Theorie der Konvexen Korper, Insbesondere
Begrundung Ihres Ober Flachenbegriffs,” Gesammelte Abhandlungen, II,
Teubner, Leipzig.
[Min60] Minty, G. J., 1960. “Monotone Networks,” Proc. Roy. Soc. London,
A, Vol. 257, pp. 194-212.
[Mor76] Mordukhovich, B. S., 1976. “Maximum Principle in the Problem
of Time Optimal Response with Nonsmooth Constraints,” J. of Applied
Mathematics and Mechanics, Vol. 40, pp. 960-969.
[Nur77] Nurminskii, E. A., “The Continuity of Subgradient Mappings,”
(Russian) Kibernetika (Kiev) 1977, No. 5, pp. 148–149.
[OrR70] Ortega, J. M., and Rheinboldt, W. C., 1970. Iterative Solution of
Nonlinear Equations in Several Variables, Academic Press, N. Y.
[RoW98] Rockafellar, R. T., and Wets, R. J.-B., 1998. Variational Analysis,
Springer-Verlag, Berlin.
[Roc70] Rockafellar, R. T., 1970. Convex Analysis, Princeton Univ. Press,
Princeton, N. J.
[Roc84] Rockafellar, R. T., 1984. Network Flows and Monotropic Optimization,
Wiley, N. Y.; republished by Athena Scientific, Belmont, MA,
1998.
[Rud76] Rudin, W., 1976. Principles of Mathematical Analysis, McGraw-Hill,
N. Y.
[Sch86] Schrijver, A., 1986. Theory of Linear and Integer Programming,
Wiley, N. Y.
[Ste13] Steinitz, H., 1913. “Bedingt Konvergente Reihen und Konvexe
Systeme, I,” J. of Math., Vol. 143, pp. 128-175.
[Ste14] Steinitz, H., 1914. “Bedingt Konvergente Reihen und Konvexe
Systeme, II,” J. of Math., Vol. 144, pp. 1-40.
[Ste16] Steinitz, H., 1916. “Bedingt Konvergente Reihen und Konvexe
Systeme, III,” J. of Math., Vol. 146, pp. 1-52.
[StW70] Stoer, J., and Witzgall, C., 1970. Convexity and Optimization in
Finite Dimensions, Springer-Verlag, Berlin.
[Str76] Strang, G., 1976. Linear Algebra and Its Applications, Academic
Press, N. Y.
[Wey35] Weyl, H., 1935. “Elementare Theorie der Konvexen Polyeder,”
Commentarii Mathematici Helvetici, Vol. 7, pp. 290-306.
Contents
1. Convex Analysis and Optimization 1.1. Linear Algebra and Analysis . . . . . . . . . . . . 1.1.1. Vectors and Matrices . . . . . . . . . . . . . 1.1.2. Topological Properties . . . . . . . . . . . . . 1.1.3. Square Matrices . . . . . . . . . . . . . . . 1.1.4. Derivatives . . . . . . . . . . . . . . . . . . 1.2. Convex Sets and Functions . . . . . . . . . . . . . 1.2.1. Basic Properties . . . . . . . . . . . . . . . 1.2.2. Convex and Aﬃne Hulls . . . . . . . . . . . . 1.2.3. Closure, Relative Interior, and Continuity . . . . 1.2.4. Recession Cones . . . . . . . . . . . . . . . 1.3. Convexity and Optimization . . . . . . . . . . . . 1.3.1. Global and Local Minima . . . . . . . . . . . 1.3.2. The Projection Theorem . . . . . . . . . . . . 1.3.3. Directions of Recession and Existence of Optimal Solutions . . . . . . . 1.3.4. Existence of Saddle Points . . . . . . . . . . . 1.4. Hyperplanes . . . . . . . . . . . . . . . . . . . 1.4.1. Support and Separation Theorems . . . . . . . 1.4.2. Nonvertical Hyperplanes and the Minimax Theorem 1.5. Conical Approximations and Constrained Optimization 1.6. Polyhedral Convexity . . . . . . . . . . . . . . . 1.6.1. Polyhedral Cones . . . . . . . . . . . . . . . 1.6.2. Polyhedral Sets . . . . . . . . . . . . . . . . 1.6.3. Extreme Points . . . . . . . . . . . . . . . . 1.6.4. Extreme Points and Linear Programming . . . . 1.7. Subgradients . . . . . . . . . . . . . . . . . . . 1.7.1. Directional Derivatives . . . . . . . . . . . . . 1.7.2. Subgradients and Subdiﬀerentials . . . . . . . . 1.7.3. Subgradients . . . . . . . . . . . . . . . . 1.7.4. Subgradients of Extended RealValued Functions . 1.7.5. Directional Derivative of the Max Function . . . . 1.8. Optimality Conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
iii
Analysis of Subgradient Methods . . . .1. Using the Extended Representation . . . . . . . . . . . . . . . . . . . . . . . 3. . . . . . .iv Contents 1. . . 3. 3. . . . . . . . . .3. . . . . Subgradient Methods . . . . . . . . . . Linear and Quadratic Programming Duality . . . 5. . . . . .4. . . . . Conjugate Functions . . . .5. . . . . . . . Analysis . .4. . . 2. .6. . . . 5. . . . Duality Theory . . 4. .2. . Ascent Methods . . .3.2. . . Cutting Plane Methods . . Notes and Sources . . . . . 5. .2.7. . Strong Duality Theorems .4. . Conjugate Duality and Applications 4. . . . . . . . . . . 5. Dual Subgradients . . . . Notes and Sources 2. . . . . . . .4. . . . 2. . . . . . . . . . . . . . Pseudonormality and Constraint Qualiﬁcations Exact Penalty Functions .1. . Introduction to Lagrange Multipliers . . 3. . Convex Cost – Linear Constraints . . . . . . . . . . .2. Notes and Sources . . . . . . . . . Geometric Multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . The Primal Function and Sensitivity Exact Penalty Functions . . . . . . .2. . . . 5. . . . . . . . . . . . . . . . . . . . Extensions to the Nondiﬀerentiable Case . . . . . . . . Lagrange Multipliers 2. . . . . Notes and Sources . . . . . . . . . Informative Lagrange Multipliers . . . . . . . . . . . . . . . . . . . . Enhanced Fritz John Optimality Conditions . . . . . . . . .2. . . . 4. . . . .9. . . . . . . . . . . . . . . . .3. . 3. .1. . . . . . . . . . . . . . . . . .3. . . . Subgradient Methods with Randomization 5. . 2. . . 3. . . . . . . . . . .6. . 3. . 2. . . . . . . . . . . . . .8. 4. Lagrangian Duality 3. . . . . 2. . . . . . . . . . . . 2. . . . . . . . . 5. Fritz John Conditions when there is no Optimal Solution 3. . 4. . .1. . . . . . .4. . Notes and Sources . . . . . . . . . . .2. . . . . . . . . . The Fenchel Duality Theorem . . . . Dual Computational Methods 5. .5. . . . Convex Cost – Convex Constraints . 2. . . . . . . . . . . 4.5. . .4. . . . . . . . . . 
.1.5. . . . . . . . . . . . .1. . .2. . . . .
Preface
These lecture notes were developed for the needs of a graduate course at the Electrical Engineering and Computer Science Department at M.I.T. They focus selectively on a number of fundamental analytical and computational topics in (deterministic) optimization that span a broad range from continuous to discrete optimization, but are connected through the recurring theme of convexity, Lagrange multipliers, and duality. These topics include Lagrange multiplier theory, Lagrangian and conjugate duality, and nondiﬀerentiable optimization. The notes contain substantial portions that are adapted from my textbook “Nonlinear Programming: 2nd Edition,” Athena Scientiﬁc, 1999. However, the notes are more advanced, more mathematical, and more researchoriented. As part of the course I have also decided to develop in detail those aspects of the theory of convex sets and functions that are essential for an indepth coverage of Lagrange multipliers and duality. I have long thought that convexity, aside from being an eminently useful subject in engineering and operations research, is also an excellent vehicle for assimilating some of the basic concepts of analysis within an intuitive geometrical setting. Unfortunately, the subject’s coverage in mathematics and engineering curricula is scant and incidental. I believe that at least part of the reason is that while there are a number of excellent books on convexity, as well as a true classic (Rockafellar’s 1970 book), none of them is well suited for teaching nonmathematicians who form the largest part of the potential audience. I have therefore tried in these notes to make convex analysis accessible by limiting somewhat its scope and by emphasizing its geometrical character, while at the same time maintaining mathematical rigor. The coverage of the theory is signiﬁcantly extended in the exercises, whose detailed solutions will be eventually posted on the internet. 
I have included as many insightful illustrations as I could come up with, and I have tried to use geometric visualization as a principal tool for maintaining the students’ interest in mathematical proofs. An important part of my approach has been to maintain a close link between the theoretical treatment of convexity concepts with their appliv
vi
Preface
cation to optimization. For example, in Chapter 1, soon after the development of some of the basic facts about convexity, I discuss some of their applications to optimization and saddle point theory; soon after the discussion of hyperplanes and cones, I discuss conical approximations and necessary conditions for optimality; soon after the discussion of polyhedral convexity, I discuss its application in linear and integer programming; and soon after the discussion of subgradients, I discuss their use in optimality conditions. I follow consistently this style in the remaining chapters, although having developed in Chapter 1 most of the needed convexity theory, the discussion in the subsequent chapters is more heavily weighted towards optimization. In addition to their educational purpose, these notes aim to develop two topics that I have recently researched with two of my students, and to integrate them into the overall landscape of convexity, duality, and optimization. These topics are: (a) A new approach to Lagrange multiplier theory, based on a set of enhanced FritzJohn conditions and the notion of constraint pseudonormality. This work, joint with my Ph.D. student Asuman Ozdaglar, aims to generalize, unify, and streamline the theory of constraint qualiﬁcations. It allows for an abstract set constraint (in addition to equalities and inequalities), it highlights the essential structure of constraints using the new notion of pseudonormality, and it develops the connection between Lagrange multipliers and exact penalty functions. (b) A new approach to the computational solution of (nondiﬀerentiable) dual problems via incremental subgradient methods. These methods, developed jointly with my Ph.D. student Angelia Nedi´, include c some interesting randomized variants, which according to both analysis and experiment, perform substantially better than the standard subgradient methods for large scale problems that typically arise in the context of duality. 
The lecture notes may be freely reproduced and distributed for noncommercial purposes. They represent workinprogress, and your feedback and suggestions for improvements in content and style will be most welcome.
Dimitri P. Bertsekas dimitrib@mit.edu Spring 2001
1 Convex Analysis and Optimization
Date: July 26, 2001
Contents 1.1. Linear Algebra and Analysis . . . . . . . . 1.1.1. Vectors and Matrices . . . . . . . . . 1.1.2. Topological Properties . . . . . . . . 1.1.3. Square Matrices . . . . . . . . . . . 1.1.4. Derivatives . . . . . . . . . . . . . . 1.2. Convex Sets and Functions . . . . . . . . . 1.2.1. Basic Properties . . . . . . . . . . . 1.2.2. Convex and Aﬃne Hulls . . . . . . . . 1.2.3. Closure, Relative Interior, and Continuity 1.2.4. Recession Cones . . . . . . . . . . . 1.3. Convexity and Optimization . . . . . . . . 1.3.1. Global and Local Minima . . . . . . . 1.3.2. The Projection Theorem . . . . . . . . 1.3.3. Directions of Recession and Existence of Optimal Solutions . . . 1.3.4. Existence of Saddle Points . . . . . . . 1.4. Hyperplanes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . p. 4 p. 5 p. 8 p. 15 p. 20 p. 24 p. 24 p. 37 p. 39 p. 47 p. 61 p. 61 p. 64
. . . . . p. 67 . . . . . p. 75 . . . . . p. 87
1
7. . . . 1.2 Convex Analysis and Optimization Chap.7. 139 p. . . .6.7. . 1. . . . . . p. .4. . 1. . Notes and Sources . 1. . . . 102 p. 167 p. . . Subgradients . . 1. . . . .7. . . .4. . Optimality Conditions . . . . . Polyhedral Cones .3.6. . . 157 p. Polyhedral Sets .2. . . 115 p. . . Extreme Points and Linear Programming . . . . . 1. .2. . . . . 91 p.1. . . . 139 p. . . . . . . .4. 1. . .6. . 1. . . .5. . . . 1. . Extreme Points . . . . . . . . .6. 115 p. . Subgradients and Subdiﬀerentials . Subgradients . . 151 p.6. . . Subgradients of Extended RealValued Functions . . . . . . . . . . 128 p. . . . . . . . p. 159 p. . 1 1. Directional Derivative of the Max Function .3. . . .8. . . 122 p. . . . . . 1. . . . . 1. . . Nonvertical Hyperplanes and the Minimax Theorem 1. 172 . . . . . . .4. . . Support and Separation Theorems . Directional Derivatives . .1. . . .2. .7. 1. 123 p. . . .7. . . . . Polyhedral Convexity . .5. . . . . . . . . 1.9. 144 p. . . . Conical Approximations and Constrained Optimization 1. . . . . . .1. 87 .
0 3 In this chapter we provide the mathematical foundations for Lagrange multiplier theory and duality. a realvalued convex function is directionally . (b) Convex sets are connected and have feasible directions at any point (assuming they consist of more than one point). 1. including the celebrated simplex method (see Section 1. For optimization purposes. This is the basis for ﬁnitely terminating methods for linear programming. the direction x − x is a feasible direction at x. it is worth listing some of the properties of convex sets and functions that make them so special in optimization. In other words. which are developed in the subsequent chapters. By this we mean that given any point x in a convex set X. when viewed within the smallest aﬃne set containing it. We assume no prior knowledge of the subject.3). (g) Convex functions are continuous and have nice diﬀerentiability properties.Sec. is not encountered under convexity. it is possible to move from x along some directions and stay within X for at least a nontrivial interval. are avoided. (e) The existence of a global minimum of a convex function over a convex set is conveniently characterized in terms of directions of recession (see Section 1. (a) Convex functions have no local minima that are not global . much of the diﬃculty commonly associated with discrete constraint sets (arising for example in combinatorial optimization). a convex set has a nonempty interior. In particular.6). whose global optimality is hard to verify in practice. Furthermore. After an introductory ﬁrst section. (f) Polyhedral convex sets (those speciﬁed by linear equality and inequality constraints) are characterized in terms of a ﬁnite set of extreme points and extreme directions. we focus on convex analysis with an emphasis on optimizationrelated topics. Thus the diﬃculties associated with multiple disconnected local minima. and all feasible directions can be characterized this way. 
In fact a stronger property holds: given any two distinct points x and x̄ in X, the direction x̄ − x is a feasible direction at x. This is important because it allows a calculus-based comparison of the cost of x with the cost of its close neighbors, and forms the basis for some important algorithms.

(c) Convex sets have a nonempty relative interior. Thus convex sets avoid the analytical and computational optimization difficulties associated with "thin" and "curved" constraint surfaces.

(d) A nonconvex function can be "convexified," while maintaining the optimality of its global minima, by forming the convex hull of the epigraph of the function.

Furthermore, while a convex function need not be differentiable at any given point, it possesses subgradients, which are nice and geometrically intuitive substitutes for a gradient (see Section 1.7). Just like gradients, subgradients figure prominently in optimality conditions and computational algorithms.

(h) Convex functions are central in duality theory. Indeed, the dual problem of a given optimization problem (discussed in Chapters 3 and 4) consists of minimization of a convex function over a convex set, even if the original problem is not convex.

(i) Closed convex cones are self-dual with respect to polarity. In words, if X* denotes the set of vectors that form a nonpositive inner product with all vectors in a set X, we have C = (C*)* for any closed and convex cone C. This simple and geometrically intuitive property (discussed in Section 1.5) underlies important aspects of Lagrange multiplier theory.

(j) Convex, lower semicontinuous functions are self-dual with respect to conjugacy. It will be seen in Chapter 4 that a certain geometrically motivated conjugacy operation on a given convex, lower semicontinuous function generates a convex, lower semicontinuous function, and when applied for a second time regenerates the original function. The conjugacy operation is central in duality theory, and has a nice interpretation that can be used to visualize and understand some of the most profound aspects of optimization.

As we embark on the study of convexity, we develop the subject in detail. Our approach in this chapter is to maintain a close link between the theoretical treatment of convexity concepts and their application to optimization. For example, soon after the development of some of the basic facts about convexity in Section 1.2, we discuss some of their applications to optimization in Section 1.3; and soon after the discussion of hyperplanes and cones in Sections 1.4 and 1.5, we discuss conditions for optimality. We follow this approach throughout the book, although given the extensive development of convexity theory in Chapter 1, the discussion in the subsequent chapters is more heavily weighted towards optimization.

1.1 LINEAR ALGEBRA AND ANALYSIS

In this section, we list some basic definitions, notational conventions, and results from linear algebra and analysis. We assume that the reader is familiar with this material, so no proofs are given. For related and additional material, we recommend the books by Hoffman and Kunze [HoK71], Lancaster and Tismenetsky [LaT85], and Strang [Str76] (linear algebra), and the books by Ash [Ash72], Ortega and Rheinboldt [OrR70], and Rudin [Rud76] (analysis).
Notation

If X is a set and x is an element of X, we write x ∈ X. A set can be specified in the form X = {x | x satisfies P}, i.e., as the set of all elements satisfying property P. The union of two sets X1 and X2 is denoted by X1 ∪ X2, and their intersection by X1 ∩ X2. The symbols ∃ and ∀ have the meanings "there exists" and "for all," respectively. The empty set is denoted by Ø.

The set of real numbers (also referred to as scalars) is denoted by ℜ. The set ℜ augmented with +∞ and −∞ is called the set of extended real numbers. We denote by [a, b] the set of (possibly extended) real numbers x satisfying a ≤ x ≤ b. A rounded, instead of square, bracket denotes strict inequality in the definition. Thus (a, b], [a, b), and (a, b) denote the sets of all x satisfying a < x ≤ b, a ≤ x < b, and a < x < b, respectively. When working with extended real numbers, we use the natural extensions of the rules of arithmetic: x · 0 = 0 for every extended real number x, x · ∞ = ∞ if x > 0, x · ∞ = −∞ if x < 0, and x + ∞ = ∞ and x − ∞ = −∞ for every scalar x. The expression ∞ − ∞ is meaningless and is never allowed to occur.

If f is a function, we use the notation f : X → Y to indicate the fact that f is defined on a set X (its domain) and takes values in a set Y (its range). If f : X → Y is a function, and U and V are subsets of X and Y, respectively, the set {f(x) | x ∈ U} is called the image or forward image of U, and the set {x ∈ X | f(x) ∈ V} is called the inverse image of V.

1.1.1 Vectors and Matrices

We denote by ℜ^n the set of n-dimensional real vectors. For any x ∈ ℜ^n, we use xi to indicate its ith coordinate, also called its ith component. Vectors in ℜ^n will be viewed as column vectors, unless the contrary is explicitly stated. For any x ∈ ℜ^n, x′ denotes the transpose of x, which is an n-dimensional row vector. The inner product of two vectors x, y ∈ ℜ^n is defined by x′y = Σ_{i=1}^n xi yi. Any two vectors x, y ∈ ℜ^n satisfying x′y = 0 are called orthogonal.

If x is a vector in ℜ^n, the notations x > 0 and x ≥ 0 indicate that all coordinates of x are positive and nonnegative, respectively. For any two vectors x and y, the notation x > y means that x − y > 0. The notations x ≥ y, x < y, etc., are to be interpreted accordingly.

If X is a set and λ is a scalar, we denote by λX the set {λx | x ∈ X}. If X1 and X2 are two subsets of ℜ^n, we denote by X1 + X2 the vector sum {x1 + x2 | x1 ∈ X1, x2 ∈ X2}.
We use a similar notation for the sum of any finite number of subsets. In the case where one of the subsets consists of a single vector x̄, we simplify this notation as follows: x̄ + X = {x̄ + x | x ∈ X}. Given sets Xi ⊂ ℜ^{ni}, i = 1, . . . , m, the Cartesian product of the Xi, denoted by X1 × · · · × Xm, is the subset {(x1, . . . , xm) | xi ∈ Xi, i = 1, . . . , m} of ℜ^{n1+···+nm}.

Subspaces and Linear Independence

A subset S of ℜ^n is called a subspace if ax + by ∈ S for every x, y ∈ S and every a, b ∈ ℜ. An affine set in ℜ^n is a translated subspace, i.e., a set of the form x̄ + S = {x̄ + x | x ∈ S}, where x̄ is a vector in ℜ^n and S is a subspace of ℜ^n. The span of a finite collection {x1, . . . , xm} of elements of ℜ^n (also called the subspace generated by the collection) is the subspace consisting of all vectors y of the form y = Σ_{k=1}^m ak xk, where each ak is a scalar.

The vectors x1, . . . , xm ∈ ℜ^n are called linearly independent if there exists no set of scalars a1, . . . , am such that Σ_{k=1}^m ak xk = 0, unless ak = 0 for each k. An equivalent definition is that x1 ≠ 0, and for every k > 1, the vector xk does not belong to the span of x1, . . . , xk−1.

If S is a subspace of ℜ^n containing at least one nonzero vector, a basis for S is a collection of vectors that are linearly independent and whose span is equal to S. Every basis of a given subspace has the same number of vectors. This number is called the dimension of S. By convention, the subspace {0} is said to have dimension zero. The dimension of an affine set x̄ + S is the dimension of the corresponding subspace S. Every subspace of nonzero dimension has an orthogonal basis, i.e., a basis consisting of mutually orthogonal vectors.

Given any set X, the set of vectors that are orthogonal to all elements of X is a subspace denoted by X⊥:

X⊥ = {y | y′x = 0, ∀ x ∈ X}.

If S is a subspace, S⊥ is called the orthogonal complement of S. It can be shown that (S⊥)⊥ = S (see the Polar Cone Theorem in Section 1.5). Furthermore, any vector x can be uniquely decomposed as the sum of a vector from S and a vector from S⊥ (see the Projection Theorem in Section 1.3.2).
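This decomposition is easy to illustrate in the simplest case, where S is spanned by a single vector v. The sketch below (added here as an illustration; the vectors v and x are arbitrary choices, not from the text) computes the component of x in S by projection and checks that the remainder lies in S⊥:

```python
# Decomposition x = s + t with s in S = span{v} and t in S-perp,
# using the projection s = (v'x / v'v) v onto the one-dimensional
# subspace spanned by v (a special case of the Projection Theorem).
def dot(a, b):
    """Inner product a'b of two vectors given as lists."""
    return sum(p * q for p, q in zip(a, b))

v = [1.0, 1.0, 0.0]
x = [2.0, -1.0, 3.0]

c = dot(v, x) / dot(v, v)
s = [c * t for t in v]                 # component in S
t = [x[i] - s[i] for i in range(3)]    # component in S-perp

assert all(abs(s[i] + t[i] - x[i]) < 1e-12 for i in range(3))
assert abs(dot(t, v)) < 1e-12          # t is orthogonal to every vector in S
```

Uniqueness of the decomposition corresponds to the fact that s is determined by v and x alone.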
Matrices

For any matrix A, we use Aij, [A]ij, or aij to denote its ijth element. The transpose of A, denoted by A′, is defined by [A′]ij = aji. For any two matrices A and B of compatible dimensions, the transpose of the product matrix AB satisfies (AB)′ = B′A′. We use I to denote the identity matrix. We say that A is symmetric if A′ = A. We say that A is diagonal if [A]ij = 0 whenever i ≠ j.

If X is a subset of ℜ^n and A is an m × n matrix, then the image of X under A is denoted by AX (or A · X if this enhances notational clarity):

AX = {Ax | x ∈ X}.

If X is a subspace, then AX is also a subspace.

Let A be a square matrix. The determinant of A is denoted by det(A).

Let A be an m × n matrix. The range space of A, denoted by R(A), is the set of all vectors y ∈ ℜ^m such that y = Ax for some x ∈ ℜ^n. The null space of A, denoted by N(A), is the set of all vectors x ∈ ℜ^n such that Ax = 0. It is seen that the range space and the null space of A are subspaces. The rank of A is the dimension of the range space of A. The rank of A is equal to the maximal number of linearly independent columns of A, and is also equal to the maximal number of linearly independent rows of A. The matrix A and its transpose A′ have the same rank. We say that A has full rank, if its rank is equal to min{m, n}. This is true if and only if either all the rows of A are linearly independent, or all the columns of A are linearly independent.

The range of an m × n matrix A and the orthogonal complement of the nullspace of its transpose are equal, i.e.,

R(A) = N(A′)⊥.

Another way to state this result is that given vectors a1, . . . , an ∈ ℜ^m (the columns of A) and a vector x ∈ ℜ^m, we have x′y = 0 for all y such that ai′y = 0 for all i if and only if x = λ1 a1 + · · · + λn an for some scalars λ1, . . . , λn. This is a special case of Farkas' Lemma, an important result for constrained optimization, which will be discussed later in Section 1.6. A useful application of this result is that if S1 and S2 are two subspaces of ℜ^n, then

S1⊥ + S2⊥ = (S1 ∩ S2)⊥.

This follows by introducing matrices B1 and B2 such that S1 = {x | B1 x = 0} = N(B1) and S2 = {x | B2 x = 0} = N(B2), and writing

S1⊥ + S2⊥ = R([B1′ B2′]) = N([B1′ B2′]′)⊥ = (N(B1) ∩ N(B2))⊥ = (S1 ∩ S2)⊥.
A function f : ℜ^n → ℜ is said to be affine if it has the form f(x) = a′x + b for some a ∈ ℜ^n and b ∈ ℜ. Similarly, a function f : ℜ^n → ℜ^m is said to be affine if it has the form f(x) = Ax + b for some m × n matrix A and some b ∈ ℜ^m. If b = 0, f is said to be a linear function or linear transformation.

1.1.2 Topological Properties

Definition 1.1.1: A norm ‖ · ‖ on ℜ^n is a function that assigns a scalar ‖x‖ to every x ∈ ℜ^n and that has the following properties:
(a) ‖x‖ ≥ 0 for all x ∈ ℜ^n.
(b) ‖αx‖ = |α| · ‖x‖ for every scalar α and every x ∈ ℜ^n.
(c) ‖x‖ = 0 if and only if x = 0.
(d) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ ℜ^n (this is referred to as the triangle inequality).

The Euclidean norm of a vector x = (x1, . . . , xn) is defined by

‖x‖ = (x′x)^{1/2} = ( Σ_{i=1}^n |xi|² )^{1/2}.

The space ℜ^n, equipped with this norm, is called a Euclidean space. We will use the Euclidean norm almost exclusively in this book. In particular, in the absence of a clear indication to the contrary, ‖ · ‖ will denote the Euclidean norm. Two important results for the Euclidean norm are:

Proposition 1.1.1: (Pythagorean Theorem) For any two vectors x and y that are orthogonal, we have

‖x + y‖² = ‖x‖² + ‖y‖².

Proposition 1.1.2: (Schwartz inequality) For any two vectors x and y, we have

|x′y| ≤ ‖x‖ · ‖y‖,

with equality holding if and only if x = αy for some scalar α.
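Both results are easy to check numerically. The following pure-Python sketch (added as an illustration; the sample vectors are arbitrary) verifies the Pythagorean Theorem for an orthogonal pair and the Schwartz inequality, including the equality case x = αy:

```python
import math

def dot(x, y):
    """Inner product x'y of two vectors given as lists."""
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    """Euclidean norm ||x|| = (x'x)^(1/2)."""
    return math.sqrt(dot(x, x))

# Orthogonal pair: x'y = 0, so the Pythagorean Theorem applies.
x, y = [1.0, 2.0, 0.0], [-2.0, 1.0, 3.0]
assert dot(x, y) == 0.0
xy = [a + b for a, b in zip(x, y)]
assert abs(norm(xy) ** 2 - (norm(x) ** 2 + norm(y) ** 2)) < 1e-12

# Schwartz inequality |x'y| <= ||x|| * ||y||.
u, v = [1.0, -1.0, 2.0], [3.0, 0.5, -1.0]
assert abs(dot(u, v)) <= norm(u) * norm(v)

# Equality holds when one vector is a scalar multiple of the other.
w = [2.0 * t for t in v]
assert abs(abs(dot(w, v)) - norm(w) * norm(v)) < 1e-12
```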
Two other important norms are the maximum norm ‖ · ‖∞ (also called sup-norm or ℓ∞-norm), defined by

‖x‖∞ = max_i |xi|,

and the ℓ1-norm ‖ · ‖1, defined by

‖x‖1 = Σ_{i=1}^n |xi|.

Sequences

We use both subscripts and superscripts in sequence notation. Generally, we prefer subscripts, but we use superscripts whenever we need to reserve the subscript notation for indexing coordinates or components of vectors and functions. The meaning of the subscripts and superscripts should be clear from the context in which they are used.

A sequence {xk | k = 1, 2, . . .} (or {xk} for short) of scalars is said to converge if there exists a scalar x such that for every ε > 0 we have |xk − x| < ε for every k greater than some integer K (depending on ε). We call the scalar x the limit of {xk}, and we also say that {xk} converges to x; symbolically, xk → x or lim_{k→∞} xk = x.

A sequence {xk} is called a Cauchy sequence if for every ε > 0, there exists some K (depending on ε) such that |xk − xm| < ε for all k ≥ K and m ≥ K.

A sequence {xk} is said to be bounded above (respectively, below) if there exists some scalar b such that xk ≤ b (respectively, xk ≥ b) for all k. It is said to be bounded if it is bounded above and bounded below. The sequence {xk} is said to be monotonically nonincreasing (respectively, nondecreasing) if xk+1 ≤ xk (respectively, xk+1 ≥ xk) for all k. If {xk} converges to x and is nonincreasing (nondecreasing), we also use the notation xk ↓ x (xk ↑ x, respectively).

Proposition 1.1.3: Every bounded and monotonically nonincreasing or nondecreasing scalar sequence converges.

Note that a monotonically nondecreasing sequence {xk} is either bounded, in which case it converges to some scalar x by the above proposition, or else it is unbounded, in which case xk → ∞. Similarly, a monotonically nonincreasing sequence {xk} is either bounded and converges, or it is unbounded, in which case xk → −∞. More generally, if for every scalar b there exists some K (depending on b) such that xk ≥ b for all k ≥ K, we write xk → ∞ and lim_{k→∞} xk = ∞. Similarly, if for every scalar b there exists some K such that xk ≤ b for all k ≥ K, we write xk → −∞ and lim_{k→∞} xk = −∞.

The supremum of a nonempty set X of scalars, denoted by sup X, is defined as the smallest scalar x such that x ≥ y for all y ∈ X. If no such scalar exists, we say that the supremum of X is ∞. Similarly, the infimum of X, denoted by inf X, is defined as the largest scalar x such that x ≤ y for all y ∈ X, and is equal to −∞ if no such scalar exists. For the empty set, we use the convention

sup(Ø) = −∞,    inf(Ø) = ∞.

(This is somewhat paradoxical, since we have that the sup of a set is less than its inf, but it works well for our analysis.) If sup X is equal to a scalar x̄ that belongs to the set X, we say that x̄ is the maximum point of X and we often write x̄ = sup X = max X. Similarly, if inf X is equal to a scalar x̄ that belongs to the set X, we often write x̄ = inf X = min X. Thus, when we write max X (or min X) in place of sup X (or inf X, respectively) we do so just for emphasis: we indicate that it is either evident, or it is known through earlier analysis, or it is about to be shown that the maximum (or minimum, respectively) of the set X is attained at one of its points.

Given a scalar sequence {xk}, the supremum of the sequence, denoted by sup_k xk, is defined as sup{xk | k = 1, 2, . . .}. The infimum of a sequence is similarly defined. Given a sequence {xk}, let

ym = sup{xk | k ≥ m},    zm = inf{xk | k ≥ m}.

The sequences {ym} and {zm} are nonincreasing and nondecreasing, respectively, and therefore have a limit whenever {xk} is bounded above or is bounded below, respectively (Prop. 1.1.3). The limit of ym is denoted by lim sup_{k→∞} xk, and is referred to as the limit superior of {xk}. The limit of zm is denoted by lim inf_{k→∞} xk, and is referred to as the limit inferior of {xk}. If {xk} is unbounded above, we write lim sup_{k→∞} xk = ∞, and if it is unbounded below, we write lim inf_{k→∞} xk = −∞.
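For a concrete feel for these definitions, the tail suprema ym and tail infima zm of an oscillating sequence can be computed directly over a finite horizon. The sketch below (an added illustration; the particular sequence is an arbitrary choice) uses xk = (−1)^k (1 + 1/k), for which lim sup = 1 and lim inf = −1:

```python
# x_k = (-1)^k * (1 + 1/k): oscillates, with lim sup = 1, lim inf = -1.
N = 10000
xs = [(-1) ** k * (1 + 1 / k) for k in range(1, N + 1)]

def tail_sup(seq, m):
    """y_m = sup{x_k : k >= m} (finite-horizon approximation)."""
    return max(seq[m - 1:])

def tail_inf(seq, m):
    """z_m = inf{x_k : k >= m}."""
    return min(seq[m - 1:])

# {y_m} is nonincreasing and {z_m} is nondecreasing, as stated.
ys = [tail_sup(xs, m) for m in (1, 10, 100, 1000)]
zs = [tail_inf(xs, m) for m in (1, 10, 100, 1000)]
assert all(a >= b for a, b in zip(ys, ys[1:]))
assert all(a <= b for a, b in zip(zs, zs[1:]))

# They approach lim sup = 1 from above and lim inf = -1 from below.
assert abs(tail_sup(xs, 1000) - 1.0) < 1e-2
assert abs(tail_inf(xs, 1000) + 1.0) < 1e-2
```

Note that the finite horizon only approximates the true tail supremum; for this sequence the approximation is exact once the horizon passes the first even (respectively, odd) index beyond m.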
Proposition 1.1.4: Let {xk} and {yk} be scalar sequences.
(a) There holds

inf_k xk ≤ lim inf_{k→∞} xk ≤ lim sup_{k→∞} xk ≤ sup_k xk.

(b) {xk} converges if and only if lim inf_{k→∞} xk = lim sup_{k→∞} xk and, in that case, both of these quantities are equal to the limit of xk.
(c) If xk ≤ yk for all k, then

lim inf_{k→∞} xk ≤ lim inf_{k→∞} yk,    lim sup_{k→∞} xk ≤ lim sup_{k→∞} yk.

(d) We have

lim inf_{k→∞} xk + lim inf_{k→∞} yk ≤ lim inf_{k→∞} (xk + yk),
lim sup_{k→∞} xk + lim sup_{k→∞} yk ≥ lim sup_{k→∞} (xk + yk).

A sequence {xk} of vectors in ℜ^n is said to converge to some x ∈ ℜ^n if the ith coordinate of xk converges to the ith coordinate of x for every i. We use the notations xk → x and lim_{k→∞} xk = x to indicate convergence for vector sequences as well. The sequence {xk} is called bounded (respectively, a Cauchy sequence) if each of its corresponding coordinate sequences is bounded (respectively, a Cauchy sequence). It can be seen that {xk} is bounded if and only if there exists a scalar c such that ‖xk‖ ≤ c for all k.

Definition 1.1.2: We say that a vector x ∈ ℜ^n is a limit point of a sequence {xk} in ℜ^n if there exists a subsequence of {xk} that converges to x.

Proposition 1.1.5: Let {xk} be a sequence in ℜ^n.
(a) If {xk} is bounded, it has at least one limit point.
(b) {xk} converges if and only if it is bounded and it has a unique limit point.
(c) {xk} converges if and only if it is a Cauchy sequence.

o(·) Notation

If p is a positive integer and h : ℜ^n → ℜ^m, then we write

h(x) = o(‖x‖^p)

if

lim_{k→∞} h(xk) / ‖xk‖^p = 0,

for all sequences {xk}, with xk ≠ 0 for all k, that converge to 0.

Closed and Open Sets

We say that x is a closure point or limit point of a set X ⊂ ℜ^n if there exists a sequence {xk}, consisting of elements of X, that converges to x. The closure of X, denoted cl(X), is the set of all limit points of X.

Definition 1.1.3: A set X ⊂ ℜ^n is called closed if it is equal to its closure. It is called open if its complement (the set {x | x ∉ X}) is closed. It is called bounded if there exists a scalar c such that the magnitude of any coordinate of any element of X is less than c. It is called compact if it is closed and bounded.

Definition 1.1.4: A neighborhood of a vector x is an open set containing x. We say that x is an interior point of a set X ⊂ ℜ^n if there exists a neighborhood of x that is contained in X. A vector x ∈ cl(X) which is not an interior point of X is said to be a boundary point of X.

Let ‖ · ‖ be a given norm in ℜ^n. For any ε > 0 and x* ∈ ℜ^n, consider the sets

{x | ‖x − x*‖ < ε},    {x | ‖x − x*‖ ≤ ε}.

The first set is open and is called an open sphere centered at x*, while the second set is closed and is called a closed sphere centered at x*. Sometimes the terms open ball and closed ball are used, respectively.

Proposition 1.1.6:
(a) The union of finitely many closed sets is closed.
(b) The intersection of closed sets is closed.
(c) The union of open sets is open.
(d) The intersection of finitely many open sets is open.
(e) A set is open if and only if all of its elements are interior points.
(f) Every subspace of ℜ^n is closed.
(g) A set X is compact if and only if every sequence of elements of X has a subsequence that converges to an element of X.
(h) If {Xk} is a sequence of nonempty and compact sets such that Xk ⊃ Xk+1 for all k, then the intersection ∩_{k=0}^∞ Xk is nonempty and compact.

The topological properties of subsets of ℜ^n, such as being open, closed, or compact, do not depend on the norm being used. This is a consequence of the following proposition, referred to as the norm equivalence property in ℜ^n, which shows that if a sequence converges with respect to one norm, it converges with respect to all other norms.

Proposition 1.1.7: For any two norms ‖ · ‖ and ‖ · ‖′ on ℜ^n, there exists a scalar c such that ‖x‖ ≤ c‖x‖′ for all x ∈ ℜ^n.

Using the preceding proposition, we obtain the following.

Proposition 1.1.8: If a subset of ℜ^n is open (respectively, closed, bounded, or compact) with respect to some norm, it is open (respectively, closed, bounded, or compact) with respect to all other norms.

Sequences of Sets

Let {Xk} be a sequence of nonempty subsets of ℜ^n. The outer limit of {Xk}, denoted lim sup_{k→∞} Xk, is the set of all x ∈ ℜ^n such that every neighborhood of x has a nonempty intersection with infinitely many of the sets Xk, k = 1, 2, . . .. Equivalently, lim sup_{k→∞} Xk is the set of all limits of subsequences {xk}_K such that xk ∈ Xk, for all k ∈ K.
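The norm equivalence property can be seen concretely for the three norms introduced so far, which satisfy ‖x‖∞ ≤ ‖x‖ ≤ ‖x‖1 ≤ n‖x‖∞ (each a standard instance of Prop. 1.1.7 with an explicit constant c). The sketch below (an added illustration using random test vectors) checks these inequalities:

```python
import math, random

def norm1(x):
    return sum(abs(t) for t in x)

def norm2(x):
    return math.sqrt(sum(t * t for t in x))

def norminf(x):
    return max(abs(t) for t in x)

random.seed(0)
n = 5
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    # Instances of the norm equivalence ||x|| <= c ||x||' :
    assert norminf(x) <= norm2(x) + 1e-12
    assert norm2(x) <= norm1(x) + 1e-12
    assert norm1(x) <= n * norminf(x) + 1e-12
```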
The inner limit of {Xk}, denoted lim inf_{k→∞} Xk, is the set of all x ∈ ℜ^n such that every neighborhood of x has a nonempty intersection with all except finitely many of the sets Xk, k = 1, 2, . . .. Equivalently, lim inf_{k→∞} Xk is the set of all limits of sequences {xk} such that xk ∈ Xk, for all k = 1, 2, . . ..

The sequence {Xk} is said to converge to a set X if

X = lim inf_{k→∞} Xk = lim sup_{k→∞} Xk,

in which case X is said to be the limit of {Xk}. The inner and outer limits are closed (possibly empty) sets. If each set Xk consists of a single point xk, lim sup_{k→∞} Xk is the set of limit points of {xk}, while lim inf_{k→∞} Xk is just the limit of {xk} if {xk} converges, and otherwise it is empty.

Continuity

Let f : X → ℜ^m be a function, where X is a subset of ℜ^n, and let x be a point in X. If there exists a vector y ∈ ℜ^m such that the sequence {f(xk)} converges to y for every sequence {xk} ⊂ X such that lim_{k→∞} xk = x, we write lim_{z→x} f(z) = y. If there exists a vector y ∈ ℜ^m such that the sequence {f(xk)} converges to y for every sequence {xk} ⊂ X such that lim_{k→∞} xk = x and xk ≤ x (respectively, xk ≥ x) for all k, we write lim_{z↑x} f(z) = y [respectively, lim_{z↓x} f(z) = y].

Definition 1.1.5: Let X be a subset of ℜ^n.
(a) A function f : X → ℜ^m is called continuous at a point x ∈ X if lim_{z→x} f(z) = f(x).
(b) A function f : X → ℜ^m is called right-continuous (respectively, left-continuous) at a point x ∈ X if lim_{z↓x} f(z) = f(x) [respectively, lim_{z↑x} f(z) = f(x)].
(c) A real-valued function f : X → ℜ is called upper semicontinuous (respectively, lower semicontinuous) at a point x ∈ X if f(x) ≥ lim sup_{k→∞} f(xk) [respectively, f(x) ≤ lim inf_{k→∞} f(xk)] for every sequence {xk} of elements of X converging to x.

If f : X → ℜ^m is continuous at every point of a subset of its domain X, we say that f is continuous over that subset. If f : X → ℜ^m is continuous at every point of its domain X, we say that f is continuous. We use similar terminology for right-continuous, left-continuous, upper semicontinuous, and lower semicontinuous functions.
Proposition 1.1.9:
(a) The composition of two continuous functions is continuous.
(b) Any vector norm on ℜ^n is a continuous function.
(c) Let f : ℜ^n → ℜ^m be continuous, and let X ⊂ ℜ^n be compact. Then the forward image {f(x) | x ∈ X} is compact.
(d) Let f : ℜ^n → ℜ^m be continuous, and let Y ⊂ ℜ^m be open (respectively, closed). Then the inverse image {x ∈ ℜ^n | f(x) ∈ Y} is open (respectively, closed).

1.1.3 Square Matrices

Definition 1.1.6: A square matrix A is called singular if its determinant is zero. Otherwise it is called nonsingular or invertible.

Matrix Norms

A norm ‖ · ‖ on the set of n × n matrices is a real-valued function that has the same properties as vector norms do when the matrix is viewed as an element of ℜ^{n²}. The norm of an n × n matrix A is denoted by ‖A‖. We are mainly interested in induced norms, which are constructed as follows. Given any vector norm ‖ · ‖, the corresponding induced matrix norm, also denoted by ‖ · ‖, is defined by

‖A‖ = sup_{x ∈ ℜ^n, ‖x‖=1} ‖Ax‖.

It is easily verified that for any vector norm, the above equation defines a bona fide matrix norm having all the required properties. Note that by the Schwartz inequality (Prop. 1.1.2), we have

‖A‖ = sup_{‖x‖=1} ‖Ax‖ = sup_{‖y‖=‖x‖=1} |y′Ax|.

By reversing the roles of x and y in the above relation and by using the equality y′Ax = x′A′y, it follows that ‖A‖ = ‖A′‖.
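The supremum in the induced norm can be approximated directly. The sketch below (an added illustration; the 2 × 2 matrix is an arbitrary choice) sweeps the unit circle and compares the result against the standard fact that the induced Euclidean norm equals the square root of the largest eigenvalue of A′A, computed here from the quadratic formula:

```python
import math

# Approximate ||A|| = sup_{||x||=1} ||Ax|| for a 2x2 matrix by sweeping
# the unit circle, and compare with sqrt(lambda_max(A'A)).
A = [[2.0, 1.0],
     [0.0, 3.0]]

def matvec(M, x):
    return [M[0][0] * x[0] + M[0][1] * x[1],
            M[1][0] * x[0] + M[1][1] * x[1]]

grid_max = 0.0
N = 20000
for k in range(N):
    t = 2 * math.pi * k / N
    Ax = matvec(A, [math.cos(t), math.sin(t)])
    grid_max = max(grid_max, math.hypot(Ax[0], Ax[1]))

# A'A = [[a, b], [b, c]]; its eigenvalues solve l^2 - (a+c) l + (ac - b^2) = 0.
a = A[0][0] ** 2 + A[1][0] ** 2
b = A[0][0] * A[0][1] + A[1][0] * A[1][1]
c = A[0][1] ** 2 + A[1][1] ** 2
lam_max = ((a + c) + math.sqrt((a - c) ** 2 + 4 * b * b)) / 2
exact = math.sqrt(lam_max)

assert abs(grid_max - exact) < 1e-4
assert grid_max <= exact + 1e-12   # a grid value can never exceed the supremum
```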
Proposition 1.1.10:
(a) Let A be an n × n matrix. The following are equivalent:
(i) The matrix A is nonsingular.
(ii) The matrix A′ is nonsingular.
(iii) For every nonzero x ∈ ℜ^n, we have Ax ≠ 0.
(iv) For every y ∈ ℜ^n, there is a unique x ∈ ℜ^n such that Ax = y.
(v) There is an n × n matrix B such that AB = I = BA.
(vi) The columns of A are linearly independent.
(vii) The rows of A are linearly independent.
(b) Assuming that A is nonsingular, the matrix B of statement (v) (called the inverse of A and denoted by A⁻¹) is unique.
(c) For any two square invertible matrices A and B of the same dimensions, we have (AB)⁻¹ = B⁻¹A⁻¹.

Definition 1.1.7: The characteristic polynomial φ of an n × n matrix A is defined by φ(λ) = det(λI − A), where I is the identity matrix of the same size as A. The n (possibly repeated or complex) roots of φ are called the eigenvalues of A. A nonzero vector x (with possibly complex coordinates) such that Ax = λx, where λ is an eigenvalue of A, is called an eigenvector of A associated with λ.

Proposition 1.1.11: Let A be a square matrix.
(a) A complex number λ is an eigenvalue of A if and only if there exists a nonzero eigenvector associated with λ.
(b) A is singular if and only if it has an eigenvalue that is equal to zero.

Note that the only use of complex numbers in this book is in relation to eigenvalues and eigenvectors. All other matrices or vectors are implicitly assumed to have real components.
Proposition 1.1.12: Let A be an n × n matrix.
(a) If T is a nonsingular matrix and B = TAT⁻¹, then the eigenvalues of A and B coincide.
(b) For any scalar c, the eigenvalues of cI + A are equal to c + λ1, . . . , c + λn, where λ1, . . . , λn are the eigenvalues of A.
(c) The eigenvalues of A^k are equal to λ1^k, . . . , λn^k, where λ1, . . . , λn are the eigenvalues of A.
(d) If A is nonsingular, then the eigenvalues of A⁻¹ are the reciprocals of the eigenvalues of A.
(e) The eigenvalues of A and A′ coincide.

Symmetric and Positive Definite Matrices

Symmetric matrices have several special properties, particularly regarding their eigenvalues and eigenvectors. In what follows in this section, ‖ · ‖ denotes the Euclidean norm.

Proposition 1.1.13: Let A be a symmetric n × n matrix. Then:
(a) The eigenvalues of A are real.
(b) The matrix A has a set of n mutually orthogonal, real, and nonzero eigenvectors x1, . . . , xn.
(c) Suppose that the eigenvectors in part (b) have been normalized so that ‖xi‖ = 1 for each i. Then

A = Σ_{i=1}^n λi xi xi′,

where λi is the eigenvalue corresponding to xi.
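The spectral decomposition of part (c) can be verified by hand for a small symmetric matrix with known eigenpairs. The sketch below (an added illustration; the matrix and eigenpairs are a standard textbook-style example, not from the notes) reconstructs A from Σ λi xi xi′:

```python
import math

# Symmetric matrix with known eigenpairs:
#   A = [[2, 1], [1, 2]],  lambda = 3 with x = (1, 1)/sqrt(2),
#                          lambda = 1 with x = (1, -1)/sqrt(2).
A = [[2.0, 1.0], [1.0, 2.0]]
s = 1 / math.sqrt(2)
pairs = [(3.0, [s, s]), (1.0, [s, -s])]

# Eigenvector check: A x = lambda x, and the eigenvectors are orthogonal.
for lam, x in pairs:
    Ax = [A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1]]
    assert all(abs(Ax[i] - lam * x[i]) < 1e-12 for i in range(2))
x1, x2 = pairs[0][1], pairs[1][1]
assert abs(x1[0] * x2[0] + x1[1] * x2[1]) < 1e-12

# Spectral decomposition: A = sum_i lambda_i x_i x_i'.
recon = [[0.0, 0.0], [0.0, 0.0]]
for lam, x in pairs:
    for i in range(2):
        for j in range(2):
            recon[i][j] += lam * x[i] * x[j]
assert all(abs(recon[i][j] - A[i][j]) < 1e-12 for i in range(2) for j in range(2))
```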
Proposition 1.1.14: Let A be a symmetric n × n matrix, and let λ1 ≤ · · · ≤ λn be its (real) eigenvalues. Then:
(a) ‖A‖ = max{|λ1|, |λn|}, where ‖ · ‖ is the matrix norm induced by the Euclidean norm.
(b) λ1‖y‖² ≤ y′Ay ≤ λn‖y‖² for all y ∈ ℜ^n.
(c) If A is nonsingular, then

‖A⁻¹‖ = 1 / min{|λ1|, |λn|}.

Proposition 1.1.15: Let A be a square matrix, and let ‖ · ‖ be the matrix norm induced by the Euclidean norm. Then:
(a) If A is symmetric, then ‖A^k‖ = ‖A‖^k for any positive integer k.
(b) ‖A‖² = ‖A′A‖ = ‖AA′‖.

Definition 1.1.8: A symmetric n × n matrix A is called positive definite if x′Ax > 0 for all x ∈ ℜ^n, x ≠ 0. It is called positive semidefinite if x′Ax ≥ 0 for all x ∈ ℜ^n.

Throughout this book, the notion of positive definiteness applies exclusively to symmetric matrices. Thus whenever we say that a matrix is positive (semi)definite, we implicitly assume that the matrix is symmetric.

Proposition 1.1.16:
(a) The sum of two positive semidefinite matrices is positive semidefinite. If one of the two matrices is positive definite, the sum is positive definite.
(b) If A is a positive semidefinite n × n matrix and T is an m × n matrix, then the matrix TAT′ is positive semidefinite. If A is positive definite and T is invertible, then TAT′ is positive definite.
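The eigenvalue bounds of Prop. 1.1.14(b) can be checked numerically. The sketch below (an added illustration) reuses the matrix A = [[2, 1], [1, 2]], whose eigenvalues are λ1 = 1 and λn = 3, and verifies the quadratic form bounds at random points:

```python
import random

# For A = [[2, 1], [1, 2]], lambda_1 = 1 and lambda_n = 3, so
#   lambda_1 ||y||^2 <= y'Ay <= lambda_n ||y||^2   must hold for all y.
A = [[2.0, 1.0], [1.0, 2.0]]
lam1, lamn = 1.0, 3.0

random.seed(1)
for _ in range(1000):
    y = [random.uniform(-5, 5), random.uniform(-5, 5)]
    ysq = y[0] ** 2 + y[1] ** 2
    quad = (A[0][0] * y[0] * y[0]
            + (A[0][1] + A[1][0]) * y[0] * y[1]
            + A[1][1] * y[1] * y[1])
    assert lam1 * ysq - 1e-9 <= quad <= lamn * ysq + 1e-9

# Since lambda_1 > 0, the lower bound also shows A is positive definite.
```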
Proposition 1.1.17:
(a) For any m × n matrix A, the matrix A′A is symmetric and positive semidefinite. A′A is positive definite if and only if A has rank n. In particular, if m = n, A′A is positive definite if and only if A is nonsingular.
(b) A square symmetric matrix is positive semidefinite (respectively, positive definite) if and only if all of its eigenvalues are nonnegative (respectively, positive).
(c) The inverse of a symmetric positive definite matrix is symmetric and positive definite.

Proposition 1.1.18: Let A be a square symmetric positive semidefinite matrix.
(a) There exists a symmetric matrix Q with the property Q² = A. Such a matrix is called a symmetric square root of A and is denoted by A^{1/2}.
(b) There is a unique symmetric square root if and only if A is positive definite.
(c) A symmetric square root A^{1/2} is invertible if and only if A is invertible. Its inverse is denoted by A^{−1/2}.
(d) There holds A^{−1/2}A^{−1/2} = A^{−1}.
(e) There holds AA^{1/2} = A^{1/2}A.

Proposition 1.1.19: Let A be a symmetric positive semidefinite n × n matrix of rank m. There exists an n × m matrix C of rank m such that

A = CC′.

Furthermore, for any such matrix C:
(a) A and C have the same null space: N(A) = N(C′).
(b) A and C have the same range space: R(A) = R(C).
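One standard way to construct a symmetric square root is through the spectral decomposition of Prop. 1.1.13(c): A^{1/2} = Σ √λi xi xi′. The sketch below (an added illustration, again using the 2 × 2 example with known eigenpairs) builds Q this way and checks Q² = A:

```python
import math

# Build a symmetric square root of A = [[2, 1], [1, 2]] from its
# spectral decomposition: A^(1/2) = sum_i sqrt(lambda_i) x_i x_i'.
s = 1 / math.sqrt(2)
pairs = [(3.0, [s, s]), (1.0, [s, -s])]   # known eigenpairs of A

Q = [[0.0, 0.0], [0.0, 0.0]]
for lam, x in pairs:
    r = math.sqrt(lam)
    for i in range(2):
        for j in range(2):
            Q[i][j] += r * x[i] * x[j]

# Q is symmetric and Q*Q = A.
A = [[2.0, 1.0], [1.0, 2.0]]
assert abs(Q[0][1] - Q[1][0]) < 1e-12
QQ = [[sum(Q[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
      for i in range(2)]
assert all(abs(QQ[i][j] - A[i][j]) < 1e-12 for i in range(2) for j in range(2))
```

The cross terms in Q² vanish because the eigenvectors are orthonormal, which is exactly why taking square roots of the eigenvalues produces a square root of the matrix.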
1.1.4 Derivatives

Let f : ℜ^n → ℜ be some function, fix some x ∈ ℜ^n, and consider the expression

lim_{α→0} ( f(x + αei) − f(x) ) / α,

where ei is the ith unit vector (all components are 0 except for the ith component which is 1). If the above limit exists, it is called the ith partial derivative of f at the point x and it is denoted by (∂f/∂xi)(x) or ∂f(x)/∂xi (xi in this section will denote the ith coordinate of the vector x). Assuming all of these partial derivatives exist, the gradient of f at x is defined as the column vector

∇f(x) = ( ∂f(x)/∂x1, . . . , ∂f(x)/∂xn )′.

For any y ∈ ℜ^n, we define the one-sided directional derivative of f in the direction y, to be

f′(x; y) = lim_{α↓0} ( f(x + αy) − f(x) ) / α,

provided that the limit exists. We note from the definitions that f′(x; ei) = (∂f/∂xi)(x).

If the directional derivative of f at a vector x exists in all directions y and f′(x; y) is a linear function of y, we say that f is differentiable at x; in this case f′(x; −ei) = −f′(x; ei). This type of differentiability is also called Gateaux differentiability. It is seen that f is differentiable at x if and only if the gradient ∇f(x) exists and satisfies ∇f(x)′y = f′(x; y) for every y ∈ ℜ^n. The function f is called differentiable over a given subset U of ℜ^n if it is differentiable at every x ∈ U. The function f is called differentiable (without qualification) if it is differentiable at all x ∈ ℜ^n. If f is differentiable over an open set U and the gradient ∇f(x) is continuous at all x ∈ U, f is said to be continuously differentiable over U. If f is continuously differentiable over ℜ^n, then f is also called a smooth function. Such a function has the property

lim_{y→0} ( f(x + y) − f(x) − y′∇f(x) ) / ‖y‖ = 0,    ∀ x ∈ U,    (1.1)

where ‖ · ‖ is an arbitrary vector norm.

The preceding equation can also be used as an alternative definition of differentiability. In particular, f is called Frechet differentiable at x if there exists a vector g satisfying Eq. (1.1) with ∇f(x) replaced by g. If such a vector g exists, it can be seen that all the partial derivatives (∂f/∂xi)(x) exist and that g = ∇f(x). Frechet differentiability implies (Gateaux) differentiability but not conversely (see for example [OrR70] for a detailed discussion). In this book, when dealing with a differentiable function f, we will always assume that f is continuously differentiable (smooth) over a given open set [∇f(x) is a continuous function of x over that set], in which case f is both Gateaux and Frechet differentiable, and the distinctions made above are of no consequence.

The definitions of differentiability of f at a point x only involve the values of f in a neighborhood of x. Thus, these definitions can be used for functions f that are not defined on all of ℜ^n, but are defined instead in a neighborhood of the point at which the derivative is computed. In particular, for functions f : X → ℜ that are defined over a strict subset X of ℜ^n, we use the above definition of differentiability of f at a vector x provided x is an interior point of the domain X. Similarly, we use the above definition of differentiability or continuous differentiability of f over a subset U, provided U is an open subset of the domain X. Thus any mention of differentiability of a function f over a subset implicitly assumes that this subset is open.

Now suppose that each one of the partial derivatives of a function f : ℜ^n → ℜ is a smooth function of x. We use the notation (∂²f/∂xi∂xj)(x) to indicate the ith partial derivative of ∂f/∂xj at a point x ∈ ℜ^n. The Hessian of f is the matrix whose ijth entry is equal to (∂²f/∂xi∂xj)(x), and is denoted by ∇²f(x). We have (∂²f/∂xi∂xj)(x) = (∂²f/∂xj∂xi)(x) for every x, which implies that ∇²f(x) is symmetric.

If f : ℜ^{m+n} → ℜ is a function of (x, y), where x = (x1, . . . , xm) ∈ ℜ^m and y = (y1, . . . , yn) ∈ ℜ^n, we write

∇x f(x, y) = ( ∂f(x, y)/∂x1, . . . , ∂f(x, y)/∂xm )′,
∇y f(x, y) = ( ∂f(x, y)/∂y1, . . . , ∂f(x, y)/∂yn )′.

We denote by ∇²xx f(x, y), ∇²xy f(x, y), and ∇²yy f(x, y) the matrices with components

[∇²xx f(x, y)]ij = ∂²f(x, y)/∂xi∂xj,  [∇²xy f(x, y)]ij = ∂²f(x, y)/∂xi∂yj,  [∇²yy f(x, y)]ij = ∂²f(x, y)/∂yi∂yj.

If f : ℜ^n → ℜ^m is a vector-valued function, it is called differentiable (or smooth) if each component fi of f is differentiable (or smooth, respectively). The gradient matrix of f, denoted ∇f(x), is the n × m matrix whose ith column is the gradient ∇fi(x) of fi. Thus,

∇f(x) = [ ∇f1(x) · · · ∇fm(x) ].

The transpose of ∇f is called the Jacobian of f and is a matrix whose ijth entry is equal to the partial derivative ∂fi/∂xj.
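The partial derivative definition lends itself to a direct numerical check via finite differences, as does the symmetry of the mixed second derivatives. The sketch below (an added illustration; the function f and the test point are arbitrary choices) uses central differences:

```python
# Finite-difference check of the gradient definition for
# f(x) = x1^2 * x2 + x2^3, whose gradient is (2*x1*x2, x1^2 + 3*x2^2).
def f(x):
    return x[0] ** 2 * x[1] + x[1] ** 3

def grad_exact(x):
    return [2 * x[0] * x[1], x[0] ** 2 + 3 * x[1] ** 2]

def partial(f, x, i, h=1e-6):
    """Central difference (f(x + h e_i) - f(x - h e_i)) / (2h)."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    return (f(xp) - f(xm)) / (2 * h)

x = [1.3, -0.7]
for i in range(2):
    assert abs(partial(f, x, i) - grad_exact(x)[i]) < 1e-6

# Symmetry of second derivatives: d2f/dx1 dx2 == d2f/dx2 dx1 (= 2*x1 here).
mixed_12 = partial(lambda z: partial(f, z, 0, 1e-4), x, 1, 1e-4)
mixed_21 = partial(lambda z: partial(f, z, 1, 1e-4), x, 0, 1e-4)
assert abs(mixed_12 - mixed_21) < 1e-4
assert abs(mixed_12 - 2 * x[0]) < 1e-3
```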
Let f : ℜ^k → ℜ^m and g : ℜ^m → ℜ^n be smooth functions, and let h be their composition, i.e.,

h(x) = g( f(x) ).

Then, the chain rule for differentiation states that

∇h(x) = ∇f(x) ∇g( f(x) ),    ∀ x ∈ ℜ^k.

Some examples of useful relations that follow from the chain rule are:

∇( f(Ax) ) = A′∇f(Ax),    ∇²( f(Ax) ) = A′∇²f(Ax)A,

where A is a matrix, and

∇x( f(h(x), y) ) = ∇h(x) ∇h f(h(x), y),
∇x( f(h(x), g(x)) ) = ∇h(x) ∇h f(h(x), g(x)) + ∇g(x) ∇g f(h(x), g(x)).

We now state some theorems relating to differentiable functions that will be useful for our purposes.

Proposition 1.1.20: (Mean Value Theorem) Let f : ℜ^n → ℜ be continuously differentiable over an open sphere S, and let x be a vector in S. Then for all y such that x + y ∈ S, there exists an α ∈ [0, 1] such that

f(x + y) = f(x) + ∇f(x + αy)′y.
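The first chain-rule example above can be verified numerically. The sketch below (an added illustration; the matrix A and test point are arbitrary choices) takes f(y) = ½‖y‖², so that ∇f(y) = y and the identity reduces to ∇(f(Ax)) = A′(Ax), and checks it against central finite differences:

```python
# Numerical check of the chain-rule identity grad(f(Ax)) = A' grad_f(Ax)
# for f(y) = (1/2)||y||^2, whose gradient is grad_f(y) = y.
A = [[1.0, 2.0],
     [0.0, 1.0],
     [3.0, -1.0]]          # 3 x 2, so h(x) = f(Ax) maps R^2 -> R

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def h(x):
    Ax = matvec(A, x)
    return 0.5 * sum(t * t for t in Ax)

x = [0.4, -1.1]
Ax = matvec(A, x)
# Right-hand side: A' grad_f(Ax) = A' (Ax).
rhs = [sum(A[i][j] * Ax[i] for i in range(3)) for j in range(2)]

# Left-hand side: central finite differences of h.
eps = 1e-6
for j in range(2):
    xp, xm = list(x), list(x)
    xp[j] += eps
    xm[j] -= eps
    lhs_j = (h(xp) - h(xm)) / (2 * eps)
    assert abs(lhs_j - rhs[j]) < 1e-6
```

Note how the dimensions match the conventions stated later in this section: A′ is 2 × 3 and Ax is in ℜ³, so the product is a column vector in ℜ², the same shape as the gradient of h.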
Proposition 1.1.21: (Second Order Expansions) Let f : ℜ^n → ℜ be twice continuously differentiable over an open sphere S, and let x be a vector in S. Then for all y such that x + y ∈ S:
(a) There holds

f(x + y) = f(x) + y′∇f(x) + y′( ∫₀¹ ∫₀ᵗ ∇²f(x + τy) dτ dt ) y.

(b) There exists an α ∈ [0, 1] such that

f(x + y) = f(x) + y′∇f(x) + (1/2) y′∇²f(x + αy) y.

(c) There holds

f(x + y) = f(x) + y′∇f(x) + (1/2) y′∇²f(x) y + o(‖y‖²).

Proposition 1.1.22: (Implicit Function Theorem) Consider a function f : ℜ^{n+m} → ℜ^m of x ∈ ℜ^n and y ∈ ℜ^m such that:
(1) f(x̄, ȳ) = 0.
(2) f is continuous, and has a continuous and nonsingular gradient matrix ∇y f(x, y) in an open set containing (x̄, ȳ).
Then there exist open sets Sx̄ ⊂ ℜ^n and Sȳ ⊂ ℜ^m containing x̄ and ȳ, respectively, and a continuous function φ : Sx̄ → Sȳ such that ȳ = φ(x̄) and f(x, φ(x)) = 0 for all x ∈ Sx̄. The function φ is unique in the sense that if x ∈ Sx̄, y ∈ Sȳ, and f(x, y) = 0, then y = φ(x). Furthermore, if for some integer p > 0, f is p times continuously differentiable the same is true for φ, and we have

∇φ(x) = −∇x f(x, φ(x)) ( ∇y f(x, φ(x)) )⁻¹.

As a final word of caution to the reader, let us mention that one can easily get confused with gradient notation and its use in various formulas, such as for example the order of multiplication of various gradients in the chain rule and the Implicit Function Theorem. Perhaps the safest guideline to minimize errors is to remember our conventions:
(a) A vector is viewed as a column vector (an n × 1 matrix).

(b) The gradient ∇f of a scalar function f : ℝ^n → ℝ is also viewed as a column vector.

(c) The gradient matrix ∇f of a vector function f : ℝ^n → ℝ^m with components f1, ..., fm is the n × m matrix whose columns are the (column) vectors ∇f1, ..., ∇fm.

With these rules in mind one can use “dimension matching” as an effective guide to writing correct formulas quickly.

1.2 CONVEX SETS AND FUNCTIONS

We now introduce some of the basic notions relating to convex sets and functions. The material of this section permeates all subsequent developments in this book, and will be used in the next section for the discussion of important issues in optimization, such as the existence of optimal solutions.

1.2.1 Basic Properties

The notion of a convex set is defined below and is illustrated in Fig. 1.2.1.

Definition 1.2.1: Let C be a subset of ℝ^n. We say that C is convex if

    αx + (1 − α)y ∈ C,   ∀ x, y ∈ C, ∀ α ∈ [0, 1].    (1.2)

Note that the empty set is by convention considered to be convex. Generally, when referring to a convex set, it will usually be apparent from the context whether this set can be empty, but we will often be specific in order to minimize ambiguities.

The following proposition lists some operations that preserve convexity of a set.
Figure 1.2.1. Illustration of the definition of a convex set (panels show convex sets and nonconvex sets). For convexity, linear interpolation αx + (1 − α)y, 0 < α < 1, between any two points x, y of the set must yield points that lie within the set.

Proposition 1.2.1:

(a) The intersection ∩_{i∈I} C_i of any collection {C_i | i ∈ I} of convex sets is convex.

(b) The vector sum C1 + C2 of two convex sets C1 and C2 is convex.

(c) The set λC is convex for any convex set C and scalar λ. Furthermore, if C is a convex set and λ1, λ2 are positive scalars, we have (λ1 + λ2)C = λ1C + λ2C.

(d) The closure and the interior of a convex set are convex.

(e) The image and the inverse image of a convex set under an affine function are convex.

Proof: The proof is straightforward using the definition of convexity, cf. Eq. (1.2). For example, to prove part (a), we take two points x and y from ∩_{i∈I} C_i, and we use the convexity of C_i to argue that the line segment connecting x and y belongs to all the sets C_i, and hence, to their intersection. The proofs of parts (b)-(e) are similar and are left as exercises for the reader. Q.E.D.
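The role of convexity in part (c) can be seen in a small computational sketch. The finite set below is an illustrative choice: for the nonconvex set C = {0, 1}, only the inclusion (λ1 + λ2)C ⊂ λ1C + λ2C holds, and equality fails.

```python
# Prop. 1.2.1(c): the identity (l1 + l2)C = l1*C + l2*C relies on the
# convexity of C.  With the nonconvex finite set C = {0, 1}
# (an illustrative choice), the inclusion is strict.

def scale(lam, C):
    return {lam * x for x in C}

def vector_sum(C1, C2):
    return {x + y for x in C1 for y in C2}

C = {0.0, 1.0}                                    # nonconvex: 0.5 is missing
lhs = scale(1.0 + 1.0, C)                         # (l1 + l2)C = {0, 2}
rhs = vector_sum(scale(1.0, C), scale(1.0, C))    # l1*C + l2*C = {0, 1, 2}
assert lhs < rhs    # strict inclusion: equality fails without convexity
```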
Several of the results of the above proposition have analogs for cones (see the exercises). A set C is said to be a cone if for all x ∈ C and λ > 0, we have λx ∈ C. A cone need not be convex and need not contain the origin (although the origin always lies in the closure of a nonempty cone).

Convex Functions

The notion of a convex function is defined below and is illustrated in Fig. 1.2.2.

Definition 1.2.2: Let C be a convex subset of ℝ^n. A function f : C → ℝ is called convex if

    f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y),   ∀ x, y ∈ C, ∀ α ∈ [0, 1].    (1.3)

The function f is called concave if −f is convex. The function f is called strictly convex if the above inequality is strict for all x, y ∈ C with x ≠ y, and all α ∈ (0, 1). For a function f : X → ℝ, we also say that f is convex over the convex set C if the domain X of f contains C and Eq. (1.3) holds, i.e., when the domain of f is restricted to C, f becomes convex.

Figure 1.2.2. Illustration of the definition of a function that is convex over a convex set C. The linear interpolation αf(x) + (1 − α)f(y) overestimates the function value f(z) at z = αx + (1 − α)y, for all α ∈ [0, 1].

If C is a convex set and f : C → ℝ is a convex function, the level sets {x ∈ C | f(x) ≤ γ} and {x ∈ C | f(x) < γ} are convex for all scalars γ. To see this, note that if x, y ∈ C are such that f(x) ≤ γ and f(y) ≤ γ, then for any α ∈ [0, 1], we have αx + (1 − α)y ∈ C, by the convexity of C, and

    f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y) ≤ γ,

by the convexity of f. However, the converse is not true; for example, the function f(x) = √|x| has convex level sets but is not convex.

We generally prefer to deal with convex functions that are real-valued and are defined over the entire space ℝ^n (rather than over just a convex subset). However, in some situations, prominently arising in the context of duality theory, we will encounter functions f that are convex over a convex subset C and cannot be extended to functions that are convex over ℝ^n. In such situations, it may be convenient, instead of restricting the domain of f to the subset C where f takes real values, to extend the domain to the entire space ℝ^n, but allow f to take the value ∞. We are thus motivated to introduce extended real-valued convex functions that can take the value of ∞ at some points. In particular, if C ⊂ ℝ^n is a convex set, a function f : C → (−∞, ∞] is called convex over C (or simply convex when C = ℝ^n) if the condition

    f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y),   ∀ x, y ∈ C, ∀ α ∈ [0, 1]

holds. The function f is called strictly convex if the above inequality is strict for all x and y in C such that f(x) < ∞ and f(y) < ∞. Similarly, a function f : C → [−∞, ∞), where C is a convex set, is called concave if the function −f : C → (−∞, ∞] is convex as per the preceding definition.

We define the effective domain of an extended real-valued function f : X → (−∞, ∞] to be the set

    dom(f) = {x ∈ X | f(x) < ∞}.

It can be seen that if f is convex, then dom(f) is a convex set. One complication when dealing with extended real-valued functions is that we sometimes need to be careful to exclude the unusual case where f(x) = ∞ for all x ∈ C, since such a function trivially satisfies the above inequality (such functions are sometimes called improper in the literature of convex functions, but we will not use this terminology here). On occasion we will deal with extended real-valued functions that can take the value −∞ but not the value ∞; for such a function f : X → [−∞, ∞), the effective domain is defined to be the set

    dom(f) = {x ∈ X | f(x) > −∞}.

Another area of concern is working with functions that can take both values −∞ and ∞, because of the possibility of the forbidden expression ∞ − ∞ arising when sums of function values are considered; we will try to minimize the occurrences of such functions.
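The definitions above admit quick numerical spot checks. The sketch below (both test functions are our illustrative choices) verifies Eq. (1.3) at random points for the convex function f(x) = ‖x‖², and exhibits a violation for f(x) = √|x|, which has convex level sets [−γ², γ²] but is not convex:

```python
import math
import numpy as np

# Convexity check per Eq. (1.3): f(a x + (1-a) y) <= a f(x) + (1-a) f(y).
rng = np.random.default_rng(1)
f = lambda x: float(np.dot(x, x))          # f(x) = ||x||^2, convex

for _ in range(1000):
    x = rng.standard_normal(3)
    y = rng.standard_normal(3)
    a = rng.uniform()
    assert f(a * x + (1 - a) * y) <= a * f(x) + (1 - a) * f(y) + 1e-9

# g(x) = sqrt(|x|) has convex level sets {x : g(x) <= gamma} = [-g^2, g^2],
# yet Eq. (1.3) fails, e.g. at x = 0, y = 1, a = 1/2:
g = lambda x: math.sqrt(abs(x))
assert g(0.5 * 0.0 + 0.5 * 1.0) > 0.5 * g(0.0) + 0.5 * g(1.0)  # 0.707 > 0.5
```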
One approach is to develop the entire subject of convex functions in terms of extended real-valued functions; indeed, the subject can be developed in this way, with extended real-valued functions adopted as the norm, as in the classical treatment of Rockafellar [Roc70]. Extended real-valued functions offer notational advantages in a convex programming setting (see Chapters 3 and 4), and in fact may be more natural because some basic constructions around duality involve extended real-valued functions. On the other hand, functions that are real-valued over the entire space ℝ^n are more convenient (and even essential) in numerical algorithms, and also in optimization analyses where a calculus-oriented approach based on differentiability is adopted, as in problems where nonlinear equality and nonconvex inequality constraints are involved (see Chapter 2). This is typically the case in nonconvex optimization. Since we plan to deal with nonconvex as well as convex problems, we will adopt a flexible approach and use both real-valued and extended real-valued functions; consistently with prevailing optimization practice, we will generally prefer to avoid using extended real-valued functions, unless there are strong notational or other advantages for doing so. Note that by replacing the domain of an extended real-valued convex function with its effective domain, we can convert it to a real-valued function; in this way, we can use results stated in terms of real-valued functions, and we can also avoid calculations with ∞.

Epigraphs and Semicontinuity

The epigraph of a function f : X → (−∞, ∞], where X ⊂ ℝ^n, is the subset of ℝ^(n+1) given by

    epi(f) = {(x, w) | x ∈ X, w ∈ ℝ, f(x) ≤ w}

(see Fig. 1.2.3). Note that if we restrict f to its effective domain {x ∈ X | f(x) < ∞}, the epigraph remains unaffected. An extended real-valued function f : X → (−∞, ∞] is called lower semicontinuous at a vector x ∈ X if f(x) ≤ lim inf_{k→∞} f(x_k) for every sequence {x_k} converging to x. If f is lower semicontinuous at every x in a subset U ⊂ X, we say that f is lower semicontinuous over U. This definition is consistent with the corresponding definition for real-valued functions [cf. Def. 1.1.5(c)]. Epigraphs are very useful for our purposes because questions about convexity and lower semicontinuity of functions can be reduced to corresponding questions about convexity and closure of their epigraphs, as shown in Prop. 1.2.2 below.
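For a convex function the epigraph is a convex subset of ℝ^(n+1). A brief numerical sketch (the test function f(x) = x² is our illustrative choice) checks that convex combinations of epigraph points remain in the epigraph:

```python
import numpy as np

# epi(f) = {(x, w) | f(x) <= w}.  For convex f, convex combinations of
# epigraph points stay in the epigraph.
rng = np.random.default_rng(2)
f = lambda x: x * x                      # illustrative convex function

for _ in range(1000):
    x1, x2 = rng.standard_normal(2)
    w1 = f(x1) + rng.uniform(0.0, 1.0)   # points on or above the graph
    w2 = f(x2) + rng.uniform(0.0, 1.0)
    a = rng.uniform()
    xm = a * x1 + (1 - a) * x2
    wm = a * w1 + (1 - a) * w2
    assert f(xm) <= wm + 1e-9            # (xm, wm) is still in epi(f)
```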
Figure 1.2.3. Illustration of the epigraphs of an extended real-valued convex function and an extended real-valued nonconvex function.

Proposition 1.2.2: Let f : X → (−∞, ∞] be a function. Then:

(a) epi(f) is convex if and only if dom(f) is convex and f is convex over dom(f).

(b) Assuming X = ℝ^n, the following are equivalent:
    (i) The level set {x | f(x) ≤ γ} is closed for all scalars γ.
    (ii) f is lower semicontinuous over ℝ^n.
    (iii) epi(f) is closed.

Proof: (a) Assume that dom(f) is convex and f is convex over dom(f). Let (x1, w1) and (x2, w2) belong to epi(f), and let α ∈ [0, 1]. Then x1, x2 ∈ dom(f) and f(x1) ≤ w1, f(x2) ≤ w2, and by multiplying these inequalities with α and (1 − α), respectively, by adding, and by using the convexity of f, we obtain

    f(αx1 + (1 − α)x2) ≤ αf(x1) + (1 − α)f(x2) ≤ αw1 + (1 − α)w2.

Hence the vector (αx1 + (1 − α)x2, αw1 + (1 − α)w2), which is equal to α(x1, w1) + (1 − α)(x2, w2), belongs to epi(f), showing the convexity of epi(f).

Conversely, assume that epi(f) is convex, and let x1, x2 ∈ dom(f) and α ∈ [0, 1]. The pairs (x1, f(x1)) and (x2, f(x2)) belong to epi(f), so by convexity, we have

    (αx1 + (1 − α)x2, αf(x1) + (1 − α)f(x2)) ∈ epi(f).

Therefore, by the definition of epi(f), it follows that αx1 + (1 − α)x2 ∈ dom(f), so dom(f) is convex, while

    f(αx1 + (1 − α)x2) ≤ αf(x1) + (1 − α)f(x2),

so f is convex over dom(f).

(b) We first show that (i) implies (ii). Assume that the level set {x | f(x) ≤ γ} is closed for all scalars γ, fix a vector x ∈ ℝ^n, and let {x_k} be a sequence converging to x. If lim inf_{k→∞} f(x_k) = ∞, then we are done, so assume that lim inf_{k→∞} f(x_k) < ∞. Let {x_k}_K ⊂ {x_k} be a subsequence along which the limit inferior of {f(x_k)} is attained. Then for γ such that lim inf_{k→∞} f(x_k) < γ and all sufficiently large k with k ∈ K, we have f(x_k) ≤ γ. Since each level set {x | f(x) ≤ γ} is closed, it follows that x belongs to all the level sets with lim inf_{k→∞} f(x_k) < γ, implying that f(x) ≤ lim inf_{k→∞} f(x_k). Hence f is lower semicontinuous at x, and (i) implies (ii).

We next show that (ii) implies (iii). Assume that f is lower semicontinuous over ℝ^n, and let (x, w) be the limit of a sequence {(x_k, w_k)} ⊂ epi(f). Then we have f(x_k) ≤ w_k, and by taking the limit as k → ∞ and by using the lower semicontinuity of f at x, we obtain

    f(x) ≤ lim inf_{k→∞} f(x_k) ≤ w.

Hence (x, w) ∈ epi(f), so epi(f) is closed, and (ii) implies (iii).

We finally show that (iii) implies (i). Assume that epi(f) is closed, and let {x_k} be a sequence that converges to some x and belongs to the level set {x | f(x) ≤ γ} for some γ. Then (x_k, γ) ∈ epi(f) for all k and (x_k, γ) → (x, γ), so since epi(f) is closed, we have (x, γ) ∈ epi(f). Hence f(x) ≤ γ, so x belongs to the level set {x | f(x) ≤ γ}, implying that this set is closed. Thus, (iii) implies (i). Q.E.D.

If the epigraph of a function f : X → (−∞, ∞] is a closed set, we say that f is a closed function. Also, if we extend the domain of f to ℝ^n and consider the function f̃ given by

    f̃(x) = f(x) if x ∈ X,  ∞ otherwise,

we see that, according to the preceding proposition, f is closed if and only if f̃ is lower semicontinuous over ℝ^n. Note that if f is lower semicontinuous over dom(f), it is not necessarily closed; take for example f to be constant for x in some nonclosed set and ∞ otherwise. On the other hand, if f is closed it is not necessarily true that dom(f) is closed; for example, the function

    f(x) = 1/x if x > 0,  ∞ otherwise,

is closed but dom(f) is the open halfline of positive numbers. However, if dom(f) is closed and f is lower semicontinuous over dom(f), then
f is closed, because epi(f) is closed, as can be seen by reviewing the relevant part of the proof of Prop. 1.2.2(b).

Common examples of convex functions are affine functions and norms; this is straightforward to verify using the definition of convexity. For example, for any x, y ∈ ℝ^n and any α ∈ [0, 1], we have, by using the triangle inequality,

    ‖αx + (1 − α)y‖ ≤ ‖αx‖ + ‖(1 − α)y‖ = α‖x‖ + (1 − α)‖y‖,

so the norm function ‖·‖ is convex. The following proposition provides some means for recognizing convex functions, and gives some algebraic operations that preserve convexity and closedness of a function.

Proposition 1.2.3:

(a) Let f1, ..., fm : ℝ^n → (−∞, ∞] be given functions, let λ1, ..., λm be positive scalars, and consider the function g : ℝ^n → (−∞, ∞] given by

    g(x) = λ1 f1(x) + · · · + λm fm(x).

If f1, ..., fm are convex, then g is also convex, while if f1, ..., fm are closed, then g is also closed.

(b) Let f : ℝ^n → (−∞, ∞] be a given function, let A be an m × n matrix, and consider the function g : ℝ^n → (−∞, ∞] given by

    g(x) = f(Ax).

If f is convex, then g is also convex, while if f is closed, then g is also closed.

(c) Let f_i : ℝ^n → (−∞, ∞] be given functions for i ∈ I, where I is an arbitrary index set, and consider the function g : ℝ^n → (−∞, ∞] given by

    g(x) = sup_{i∈I} f_i(x).

If f_i, i ∈ I, are convex, then g is also convex, while if f_i, i ∈ I, are closed, then g is also closed.

Proof: (a) Let f1, ..., fm be convex. We use the definition of convexity to write, for any x, y ∈ ℝ^n and α ∈ [0, 1],

    g(αx + (1 − α)y) = Σ_{i=1}^m λ_i f_i(αx + (1 − α)y)
                     ≤ Σ_{i=1}^m λ_i (αf_i(x) + (1 − α)f_i(y))
                     = α Σ_{i=1}^m λ_i f_i(x) + (1 − α) Σ_{i=1}^m λ_i f_i(y)
                     = αg(x) + (1 − α)g(y).

Hence g is convex. Let f1, ..., fm be closed. Then they are lower semicontinuous at every x ∈ ℝ^n [cf. Prop. 1.2.2(b)], so for every sequence {x_k} converging to x, we have f_i(x) ≤ lim inf_{k→∞} f_i(x_k) for all i. Hence

    g(x) ≤ Σ_{i=1}^m λ_i lim inf_{k→∞} f_i(x_k) ≤ lim inf_{k→∞} Σ_{i=1}^m λ_i f_i(x_k) = lim inf_{k→∞} g(x_k),

where we have used Prop. 1.1.4(d) (the sum of the limit inferiors of sequences is less than or equal to the limit inferior of the sum sequence). Therefore, g is lower semicontinuous at all x ∈ ℝ^n, so by Prop. 1.2.2(b), it is closed.

(b) This is straightforward, along the lines of the proof of part (a).

(c) A pair (x, w) belongs to the epigraph

    epi(g) = {(x, w) | g(x) ≤ w}

if and only if f_i(x) ≤ w for all i ∈ I, or equivalently (x, w) ∈ ∩_{i∈I} epi(f_i). Therefore,

    epi(g) = ∩_{i∈I} epi(f_i).

If the f_i are convex, the epigraphs epi(f_i) are convex, so epi(g) is convex, and g is convex by Prop. 1.2.2(a). If the f_i are closed, then the epigraphs epi(f_i) are closed, so epi(g) is closed, and g is closed. Q.E.D.

Characterizations of Differentiable Convex Functions

For differentiable functions, there is an alternative characterization of convexity, given in the following proposition and illustrated in Fig. 1.2.4.
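Prop. 1.2.3(c) can be spot-checked numerically. The family below (a pointwise maximum of random affine functions, our illustrative choice) satisfies the convexity inequality at random points:

```python
import numpy as np

# Pointwise supremum of convex functions is convex (Prop. 1.2.3(c)).
# Here g(x) = max_i (a_i x + b_i) is a pointwise max of affine (hence
# convex) functions; the coefficients a_i, b_i are arbitrary.
rng = np.random.default_rng(3)
a = rng.standard_normal(5)
b = rng.standard_normal(5)
g = lambda x: float(np.max(a * x + b))

for _ in range(1000):
    x, y = rng.standard_normal(2)
    t = rng.uniform()
    assert g(t * x + (1 - t) * y) <= t * g(x) + (1 - t) * g(y) + 1e-9
```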
Figure 1.2.4. Characterization of convexity in terms of first derivatives. The condition f(z) ≥ f(x) + (z − x)′∇f(x) states that a linear approximation, based on the first order Taylor series expansion, underestimates a convex function.

Proposition 1.2.4: Let C ⊂ ℝ^n be a convex set and let f : ℝ^n → ℝ be differentiable over ℝ^n.

(a) f is convex over C if and only if

    f(z) ≥ f(x) + (z − x)′∇f(x),   ∀ x, z ∈ C.    (1.4)

(b) f is strictly convex over C if and only if the above inequality is strict whenever x ≠ z.

Proof: We prove (a) and (b) simultaneously. Assume that the inequality (1.4) holds. Choose any x, y ∈ C and α ∈ [0, 1], and let z = αx + (1 − α)y. Using the inequality (1.4) twice, we obtain

    f(x) ≥ f(z) + (x − z)′∇f(z),   f(y) ≥ f(z) + (y − z)′∇f(z).

We multiply the first inequality by α, the second by (1 − α), and add them to obtain

    αf(x) + (1 − α)f(y) ≥ f(z) + (αx + (1 − α)y − z)′∇f(z) = f(z),

which proves that f is convex. If the inequality (1.4) is strict as stated in part (b), then if we take x ≠ y and α ∈ (0, 1) above, the three preceding inequalities become strict, thus showing the strict convexity of f.

Conversely, assume that f is convex, let x and z be any vectors in C with x ≠ z, and for α ∈ (0, 1], consider the function

    g(α) = ( f(x + α(z − x)) − f(x) ) / α,   α ∈ (0, 1].    (1.5)

We will show that g(α) is monotonically increasing with α, and strictly monotonically increasing if f is strictly convex. This will imply that

    (z − x)′∇f(x) = lim_{α↓0} g(α) ≤ g(1) = f(z) − f(x),

with strict inequality if g is strictly monotonically increasing, thereby showing that the desired inequality (1.4) holds (and holds strictly if f is strictly convex). Indeed, consider any α1, α2, with 0 < α1 < α2 < 1, and let

    ᾱ = α1/α2,   z̄ = x + α2(z − x).    (1.6)

We have

    f(x + ᾱ(z̄ − x)) ≤ ᾱf(z̄) + (1 − ᾱ)f(x),

or

    ( f(x + ᾱ(z̄ − x)) − f(x) ) / ᾱ ≤ f(z̄) − f(x),

and the above inequalities are strict if f is strictly convex. Substituting the definitions (1.6) in these relations, we obtain after a straightforward calculation

    ( f(x + α1(z − x)) − f(x) ) / α1 ≤ ( f(x + α2(z − x)) − f(x) ) / α2,

or g(α1) ≤ g(α2), with strict inequality if f is strictly convex. Hence g is monotonically increasing with α, and strictly so if f is strictly convex. Q.E.D.

Note a simple consequence of Prop. 1.2.4(a): if f : ℝ^n → ℝ is a convex function and ∇f(x*) = 0, then x* minimizes f over ℝ^n. This is a classical sufficient condition for unconstrained optimality, originally formulated (in one dimension) by Fermat in 1637.

For twice differentiable convex functions, there is another characterization of convexity, as shown by the following proposition.

Proposition 1.2.5: Let C ⊂ ℝ^n be a convex set and let f : ℝ^n → ℝ be twice continuously differentiable over ℝ^n.

(a) If ∇²f(x) is positive semidefinite for all x ∈ C, then f is convex over C.

(b) If ∇²f(x) is positive definite for all x ∈ C, then f is strictly convex over C.

(c) If C = ℝ^n and f is convex, then for all x, ∇²f(x) is positive semidefinite.
Proof: (a) By Prop. 1.1.21(b), for all x, y ∈ C we have

    f(y) = f(x) + (y − x)′∇f(x) + (1/2)(y − x)′∇²f(x + α(y − x))(y − x)

for some α ∈ [0, 1]. Therefore, using the positive semidefiniteness of ∇²f, we obtain

    f(y) ≥ f(x) + (y − x)′∇f(x),   ∀ x, y ∈ C.

From Prop. 1.2.4(a), we conclude that f is convex.

(b) Similar to the proof of part (a), we have f(y) > f(x) + (y − x)′∇f(x) for all x, y ∈ C with x ≠ y, and the result follows from Prop. 1.2.4(b).

(c) Suppose that f : ℝ^n → ℝ is convex and suppose, to obtain a contradiction, that there exist some x ∈ ℝ^n and some z ∈ ℝ^n such that z′∇²f(x)z < 0. Using the continuity of ∇²f, we see that we can choose z with small enough norm so that z′∇²f(x + αz)z < 0 for every α ∈ [0, 1]. Then, using again Prop. 1.1.21(b), we obtain f(x + z) < f(x) + z′∇f(x), which, in view of Prop. 1.2.4(a), contradicts the convexity of f. Q.E.D.

If f is convex over a strict subset C ⊂ ℝ^n, it is not necessarily true that ∇²f(x) is positive semidefinite at any point of C [take for example n = 2, f(x) = x1² − x2², and C = {(x1, 0) | x1 ∈ ℝ}]. A strengthened version of Prop. 1.2.5 is given in the exercises. It can be shown that the conclusion of Prop. 1.2.5(c) also holds if C is assumed to have nonempty interior instead of being equal to ℝ^n.

The following proposition considers a strengthened form of strict convexity characterized by the following equation:

    (∇f(x) − ∇f(y))′(x − y) ≥ α‖x − y‖²,   ∀ x, y ∈ ℝ^n,    (1.7)

where α is some positive number. Convex functions with this property are called strongly convex with coefficient α.

Proposition 1.2.6: (Strong Convexity) Let f : ℝ^n → ℝ be smooth. If f is strongly convex with coefficient α, then f is strictly convex. Furthermore, if f is twice continuously differentiable, then strong convexity of f with coefficient α is equivalent to the positive semidefiniteness of ∇²f(x) − αI for every x ∈ ℝ^n, where I is the identity matrix.

Proof: Fix some x, y ∈ ℝ^n such that x ≠ y, and define the function h : ℝ → ℝ by h(t) = f(x + t(y − x)). Consider some t, t′ ∈ ℝ such that t < t′. Using the chain rule and Eq. (1.7), we have

    ( (dh/dt)(t′) − (dh/dt)(t) )(t′ − t)
        = ( ∇f(x + t′(y − x)) − ∇f(x + t(y − x)) )′(y − x)(t′ − t)
        ≥ α(t′ − t)²‖x − y‖² > 0.

Thus, dh/dt is strictly increasing and for any t ∈ (0, 1), we have

    (h(t) − h(0))/t = (1/t) ∫₀^t (dh/dτ)(τ) dτ < (1/(1 − t)) ∫_t^1 (dh/dτ)(τ) dτ = (h(1) − h(t))/(1 − t).

Equivalently, th(1) + (1 − t)h(0) > h(t), and the definition of h yields tf(y) + (1 − t)f(x) > f(ty + (1 − t)x). Since this inequality has been proved for arbitrary t ∈ (0, 1) and x ≠ y, we conclude that f is strictly convex.

Suppose now that f is twice continuously differentiable and Eq. (1.7) holds. Let c be a scalar. We use Prop. 1.1.21(b) twice to obtain

    f(x + cy) = f(x) + cy′∇f(x) + (c²/2) y′∇²f(x + tcy)y,

and

    f(x) = f(x + cy) − cy′∇f(x + cy) + (c²/2) y′∇²f(x + scy)y,

for some t and s belonging to [0, 1]. Adding these two equations and using Eq. (1.7), we obtain

    (c²/2) y′( ∇²f(x + scy) + ∇²f(x + tcy) )y = ( ∇f(x + cy) − ∇f(x) )′(cy) ≥ αc²‖y‖².

We divide both sides by c² and then take the limit as c → 0 to conclude that y′∇²f(x)y ≥ α‖y‖². Since this inequality is valid for every y ∈ ℝ^n, it follows that ∇²f(x) − αI is positive semidefinite.

For the converse, assume that ∇²f(x) − αI is positive semidefinite for all x ∈ ℝ^n. Consider the function g : ℝ → ℝ defined by

    g(t) = ( ∇f(tx + (1 − t)y) )′(x − y).

Using the Mean Value Theorem (Prop. 1.1.20), we have

    ( ∇f(x) − ∇f(y) )′(x − y) = g(1) − g(0) = (dg/dt)(t̄)

for some t̄ ∈ [0, 1]. The result follows because

    (dg/dt)(t̄) = (x − y)′∇²f(t̄x + (1 − t̄)y)(x − y) ≥ α‖x − y‖²,

where the last inequality is a consequence of the positive semidefiniteness of ∇²f(t̄x + (1 − t̄)y) − αI. Q.E.D.
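Propositions 1.2.4-1.2.6 can be exercised numerically. In the sketch below, the positive definite matrix Q is an arbitrary illustrative choice; the gradient inequality (1.4) and the strong convexity inequality (1.7) are checked at random points, with the coefficient α taken as the smallest eigenvalue of the (constant) Hessian:

```python
import numpy as np

# For f(x) = x'Qx with Q symmetric positive definite:
#   Prop. 1.2.4(a):  f(z) >= f(x) + (z - x)' grad f(x),
#   Eq. (1.7):       (grad f(x) - grad f(y))'(x - y) >= alpha ||x - y||^2,
# with grad f(x) = 2Qx and, per Prop. 1.2.6, alpha = min eigenvalue of 2Q.
rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
Q = A.T @ A + np.eye(3)                       # symmetric positive definite
f = lambda x: float(x @ Q @ x)
grad = lambda x: 2 * Q @ x
alpha = float(np.linalg.eigvalsh(2 * Q)[0])   # min eigenvalue of hess f = 2Q
assert alpha > 0

for _ in range(500):
    x = rng.standard_normal(3)
    z = rng.standard_normal(3)
    assert f(z) >= f(x) + grad(x) @ (z - x) - 1e-9           # Prop. 1.2.4(a)
    assert (grad(x) - grad(z)) @ (x - z) >= alpha * np.dot(x - z, x - z) - 1e-9
```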
As an example, consider the quadratic function f(x) = x′Qx, where Q is a symmetric matrix. By Prop. 1.2.5, the function f is convex if and only if Q is positive semidefinite. Furthermore, f is strongly convex with coefficient α if and only if ∇²f(x) − αI = 2Q − αI is positive semidefinite for some α > 0. Thus f is strongly convex with some positive coefficient (as well as strictly convex) if and only if Q is positive definite.

1.2.2 Convex and Affine Hulls

Let X be a subset of ℝ^n. A convex combination of elements of X is a vector of the form Σ_{i=1}^m α_i x_i, where m is a positive integer, x1, ..., xm belong to X, and α1, ..., αm are scalars such that

    α_i ≥ 0, i = 1, ..., m,   Σ_{i=1}^m α_i = 1.

The convex hull of a set X, denoted conv(X), is the intersection of all convex sets containing X, and is a convex set by Prop. 1.2.1(a). It is straightforward to verify that the set of all convex combinations of elements of X is convex, and is equal to the convex hull conv(X) (see the exercises). In particular, if X consists of a finite number of vectors x1, ..., xm, its convex hull is

    conv({x1, ..., xm}) = { Σ_{i=1}^m α_i x_i | α_i ≥ 0, Σ_{i=1}^m α_i = 1 }.

Note that if X is convex, then any convex combination Σ_{i=1}^m α_i x_i belongs to X (this is easily shown by induction; see the exercises). Furthermore, for any function f : ℝ^n → ℝ that is convex over X, we have

    f( Σ_{i=1}^m α_i x_i ) ≤ Σ_{i=1}^m α_i f(x_i).    (1.8)

This follows by using repeatedly the definition of convexity. The preceding relation is a special case of Jensen’s inequality and can be used to prove a number of interesting inequalities in applied mathematics and probability theory.

We recall that an affine set M is a set of the form x + S, where S is a subspace, called the subspace parallel to M. If X is a subset of ℝ^n, the affine hull of X, denoted aff(X), is the intersection of all affine sets containing X. Note that aff(X) is itself an affine set and that it contains conv(X). It can be seen that the affine hull of X, the affine hull of the convex hull conv(X), and the affine hull of the closure cl(X) coincide (see the exercises). For a convex set C, the dimension of C is defined to be the dimension of aff(C).

Given a subset X ⊂ ℝ^n, a nonnegative combination of elements of X is a vector of the form Σ_{i=1}^m α_i x_i, where m is a positive integer, x1, ..., xm belong to X, and α1, ..., αm are nonnegative scalars. If the scalars α_i are all positive, the combination is said to be positive. The cone generated by X, denoted by cone(X), is the set of all nonnegative combinations of elements of X. It is easily seen that cone(X) is a convex cone, although it need not be closed [cone(X) can be shown to be closed in special cases, such as when X is a finite set — this is one of the central results of polyhedral convexity and will be shown in Section 1.6].

The following is a fundamental characterization of convex hulls.

Proposition 1.2.7: (Caratheodory’s Theorem) Let X be a subset of ℝ^n.

(a) Every x in cone(X) can be represented as a positive combination of vectors x1, ..., xm ∈ X that are linearly independent, where m is a positive integer with m ≤ n.

(b) Every x in conv(X) can be represented as a convex combination of vectors x1, ..., xm ∈ X such that x2 − x1, ..., xm − x1 are linearly independent, where m is a positive integer with m ≤ n + 1.

Proof: (a) Let x be a nonzero vector in cone(X), and let m be the smallest integer such that x has the form Σ_{i=1}^m α_i x_i, where α_i > 0 and x_i ∈ X for all i = 1, ..., m. If the vectors x_i were linearly dependent, there would exist scalars λ1, ..., λm, with Σ_{i=1}^m λ_i x_i = 0 and at least one of the λ_i positive. Consider the linear combination Σ_{i=1}^m (α_i − γ̄λ_i)x_i, where γ̄ is the largest γ such that α_i − γλ_i ≥ 0 for all i. This combination provides a representation of x as a positive combination of fewer than m vectors of X — a contradiction. Therefore, x1, ..., xm are linearly independent, and since any linearly independent set of vectors contains at most n elements, we must have m ≤ n.

(b) The proof will be obtained by applying part (a) to the subset of ℝ^(n+1) given by

    Y = {(x, 1) | x ∈ X}.

If x ∈ conv(X), then x = Σ_i γ_i x_i for some positive γ_i with Σ_i γ_i = 1, so that (x, 1) ∈ cone(Y). By part (a), we have

    (x, 1) = Σ_{i=1}^m α_i (x_i, 1)

for some positive α1, ..., αm and vectors (x1, 1), ..., (xm, 1) that are linearly independent (implying that m ≤ n + 1), or equivalently

    x = Σ_{i=1}^m α_i x_i,   1 = Σ_{i=1}^m α_i.

Finally, to show that x2 − x1, ..., xm − x1 are linearly independent, assume to arrive at a contradiction that there exist λ2, ..., λm, not all 0, such that

    Σ_{i=2}^m λ_i (x_i − x1) = 0.

Then, defining λ1 = −(λ2 + · · · + λm), we have

    Σ_{i=1}^m λ_i (x_i, 1) = 0,

which contradicts the linear independence of (x1, 1), ..., (xm, 1). Q.E.D.
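Caratheodory’s Theorem guarantees that any point of conv(X) in ℝ² is a convex combination of at most n + 1 = 3 points of X. A small brute-force sketch (the point set X and the target point are illustrative choices) finds such a representation:

```python
import itertools
import numpy as np

# Caratheodory in R^2: every x in conv(X) is a convex combination of at
# most 3 points of X.  Search all small subsets and solve for weights.
X = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0], [1.0, 0.5]])
x = np.array([1.0, 1.0])            # lies in conv(X)

def caratheodory_weights(x, X):
    for m in (1, 2, 3):
        for idx in itertools.combinations(range(len(X)), m):
            P = X[list(idx)]                      # m candidate points
            A = np.vstack([P.T, np.ones(m)])      # rows: coords and sum(w)=1
            b = np.append(x, 1.0)
            w, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
            if np.allclose(A @ w, b) and np.all(w >= -1e-9):
                return idx, w                     # convex weights found
    return None

idx, w = caratheodory_weights(x, X)
assert len(idx) <= 3 and abs(w.sum() - 1.0) < 1e-9
assert np.allclose(w @ X[list(idx)], x)           # x = sum_i w_i x_i
```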
Proposition 1.2.8: The convex hull of a compact set is compact.

Proof: Let X be a compact subset of ℝ^n. By Caratheodory’s Theorem, a sequence in conv(X) can be expressed as { Σ_{i=1}^{n+1} α_i^k x_i^k }, where for all k and i, α_i^k ≥ 0, x_i^k ∈ X, and Σ_{i=1}^{n+1} α_i^k = 1. Since the sequence

    (α_1^k, ..., α_{n+1}^k, x_1^k, ..., x_{n+1}^k)

belongs to a compact set, it has a limit point (α_1, ..., α_{n+1}, x_1, ..., x_{n+1}) such that Σ_{i=1}^{n+1} α_i = 1, and for all i, α_i ≥ 0 and x_i ∈ X. Thus, the vector Σ_{i=1}^{n+1} α_i x_i, which belongs to conv(X), is a limit point of the sequence { Σ_{i=1}^{n+1} α_i^k x_i^k }, showing that conv(X) is compact. Q.E.D.

It is not generally true that the convex hull of a closed set is closed [take for instance the convex hull of the set consisting of the origin and the subset {(x1, x2) | x1 x2 = 1, x1 ≥ 0, x2 ≥ 0} of ℝ²].

1.2.3 Closure, Relative Interior, and Continuity

We now consider some generic topological properties of convex sets and functions. Let C be a nonempty convex subset of ℝ^n. The closure cl(C) of C is also a nonempty convex set (Prop. 1.2.1). While the interior of C may be empty, it turns out that convexity implies the existence of interior points relative to the affine hull of C. This is an important property, which we now formalize.

We say that x is a relative interior point of C if x ∈ C and there exists a neighborhood N of x such that N ∩ aff(C) ⊂ C, i.e., x is an interior point of C relative to aff(C). The relative interior of C, denoted ri(C), is the set of all relative interior points of C. The set C is said to be relatively open if ri(C) = C. A point x ∈ cl(C) such that x ∉ ri(C) is said to be a relative boundary point of C, and the set of all relative boundary points of C is called the relative boundary of C. For example, if C is a line segment connecting two distinct points in the plane, then ri(C) consists of all points of C except for the end points.

The following proposition gives some basic facts about relative interior points.

Proposition 1.2.9: Let C be a nonempty convex set.

(a) (Line Segment Principle) If x ∈ ri(C) and x̄ ∈ cl(C), then all points on the line segment connecting x and x̄, except possibly x̄, belong to ri(C).

(b) (Nonemptiness of Relative Interior) ri(C) is a nonempty and convex set, and has the same affine hull as C. In fact, if m is the dimension of aff(C) and m > 0, there exist vectors x0, x1, ..., xm ∈ ri(C) such that x1 − x0, ..., xm − x0 span the subspace parallel to aff(C).

(c) x ∈ ri(C) if and only if every line segment in C having x as one endpoint can be prolonged beyond x without leaving C [i.e., for every x̄ ∈ C, there exists a γ > 1 such that x + (γ − 1)(x − x̄) ∈ C].

Proof: (a) In the case where x̄ ∈ C, see Fig. 1.2.5. Since x ∈ ri(C), there exists a sphere S = {z | ‖z − x‖ < ε} such that S ∩ aff(C) ⊂ C. For any α ∈ (0, 1], let x_α = αx + (1 − α)x̄ and let S_α = {z | ‖z − x_α‖ < αε}. It can be seen that each point of S_α ∩ aff(C) is a convex combination of x̄ and some point of S ∩ aff(C). Therefore, S_α ∩ aff(C) ⊂ C, implying that x_α ∈ ri(C).

Figure 1.2.5. Proof of the Line Segment Principle for the case where x̄ ∈ C: the sphere S of radius ε around x ∈ ri(C), and the sphere S_α of radius αε around x_α = αx + (1 − α)x̄.

In the case where x̄ ∉ C, consider a sequence {x_k} ⊂ C that converges to x̄, and let x_{k,α} = αx + (1 − α)x_k. Then, as in the preceding argument, we see that {z | ‖z − x_{k,α}‖ < αε} ∩ aff(C) ⊂ C for all k. Since for large enough k, we have {z | ‖z − x_α‖ < αε/2} ⊂ {z | ‖z − x_{k,α}‖ < αε}, it follows that {z | ‖z − x_α‖ < αε/2} ∩ aff(C) ⊂ C, which shows that x_α ∈ ri(C).

(b) Convexity of ri(C) follows from the Line Segment Principle of part (a). If m = 0, then C and aff(C) consist of a single point, which is a unique relative interior point, and we are done, so assume m > 0. By using a translation argument if necessary, we assume without loss of generality that 0 ∈ C. Then, the affine hull of C is a subspace of dimension m.
We can find m linearly independent vectors z1, ..., zm in C that span aff(C); otherwise there would exist r < m linearly independent vectors in C whose span contains C, contradicting the fact that the dimension of aff(C) is m. Thus z1, ..., zm form a basis for aff(C).

Consider the set

    X = { x | x = Σ_{i=1}^m α_i z_i, Σ_{i=1}^m α_i < 1, α_i > 0, i = 1, ..., m }

(see Fig. 1.2.6). This set is open relative to aff(C), i.e., for every x ∈ X, there exists an open set N such that x ∈ N and N ∩ aff(C) ⊂ X. [To see this, note that X is the inverse image of the open set in ℝ^m

    { (α1, ..., αm) | Σ_{i=1}^m α_i < 1, α_i > 0, i = 1, ..., m }

under the linear transformation from aff(C) to ℝ^m that maps Σ_{i=1}^m α_i z_i into (α1, ..., αm); openness of the above set follows by continuity of linear transformations.] Therefore, all points of X are relative interior points of C, and ri(C) is nonempty. Since by construction, aff(X) = aff(C) and X ⊂ ri(C), it follows that ri(C) and C have the same affine hull.

To show the last assertion of part (b), consider vectors

    x0 = α̂ Σ_{i=1}^m z_i,   x_i = x0 + α̂z_i,   i = 1, ..., m,

where α̂ is a positive scalar such that α̂(m + 1) < 1. The vectors x0, x1, ..., xm are in the set X, and hence in the relative interior of C, since X ⊂ ri(C). Furthermore, because x_i − x0 = α̂z_i for all i and the vectors z1, ..., zm span aff(C), the vectors x1 − x0, ..., xm − x0 also span aff(C).

Figure 1.2.6. Construction of the relatively open set X in the proof of nonemptiness of the relative interior of a convex set C that contains the origin.

(c) If x ∈ ri(C), the given condition holds by the Line Segment Principle. Conversely, let x satisfy the given condition. We will show that x ∈ ri(C). By part (b), there exists a vector x̄ ∈ ri(C). We may assume that x̄ ≠ x, since otherwise we are done. By the given condition, since x̄ is in C, there is a γ > 1 such that y = x + (γ − 1)(x − x̄) ∈ C. Then we have x = (1 − α)x̄ + αy, where α = 1/γ ∈ (0, 1), so by part (a), we obtain x ∈ ri(C). Q.E.D.

In view of Prop. 1.2.9(b), C and ri(C) have the same dimension. It can also be shown that C and cl(C) have the same dimension (see the exercises).

The next proposition gives several properties of closures and relative interiors of convex sets.

Proposition 1.2.10: Let C be a nonempty convex set.

(a) cl(C) = cl(ri(C)).

(b) ri(C) = ri(cl(C)).

(c) Let C̄ be another nonempty convex set. Then the following three conditions are equivalent:
    (i) C and C̄ have the same relative interior.
    (ii) C and C̄ have the same closure.
    (iii) ri(C) ⊂ C̄ ⊂ cl(C).

(d) A · ri(C) = ri(A · C) for all m × n matrices A. Furthermore, if C is bounded, then A · cl(C) = cl(A · C) for all m × n matrices A.

Proof: (a) Since ri(C) ⊂ C, we have cl(ri(C)) ⊂ cl(C). Conversely, let x̄ ∈ cl(C). We will show that x̄ ∈ cl(ri(C)). Let x be any point in ri(C) [there exists such a point by Prop. 1.2.9(b)], and assume that x ≠ x̄ (otherwise we are done). By the Line Segment Principle [Prop. 1.2.9(a)], we have αx + (1 − α)x̄ ∈ ri(C) for all α ∈ (0, 1]. Thus, x̄ is the limit of a sequence of points of ri(C), so x̄ ∈ cl(ri(C)).

(b) The inclusion ri(C) ⊂ ri(cl(C)) follows from the definition of a relative interior point and the fact aff(C) = aff(cl(C)) (see the exercises). To prove the reverse inclusion, let z ∈ ri(cl(C)). We will show that z ∈ ri(C). By Prop. 1.2.9(b), there exists an x ∈ ri(C). We may assume that x ≠ z (otherwise we are done). Since z ∈ ri(cl(C)) and x ∈ cl(C), by Prop. 1.2.9(c) applied to the set cl(C), we can choose γ > 1 such that the vector y = z + (γ − 1)(z − x) belongs to cl(C). Then we have z = (1 − α)x + αy, where α = 1/γ ∈ (0, 1), so by the Line Segment Principle [Prop. 1.2.9(a)] applied to the set C, we obtain z ∈ ri(C).

(c) If ri(C) = ri(C̄), then by part (a), cl(C) = cl(ri(C)) = cl(ri(C̄)) = cl(C̄), so C and C̄ have the same closure. Similarly, if cl(C) = cl(C̄), part (b) implies that ri(C) = ri(C̄). Thus, conditions (i) and (ii) are equivalent, and if either of them holds, the relation ri(C) ⊂ C̄ ⊂ cl(C), i.e., condition (iii), follows. Conversely, assume that condition (iii) holds. Then by taking closures, we obtain cl(ri(C)) ⊂ cl(C̄) ⊂ cl(C), and by using part (a), we obtain cl(C) ⊂ cl(C̄) ⊂ cl(C). Hence C and C̄ have the same closure.

(d) For any set X, we have A · cl(X) ⊂ cl(A · X), since if a sequence {x_k} ⊂ X converges to some x ∈ cl(X), then the sequence {Ax_k} ⊂ A · X converges to Ax, implying that Ax ∈ cl(A · X). We use this fact and part (a) to write

    A · ri(C) ⊂ A · C ⊂ A · cl(C) = A · cl(ri(C)) ⊂ cl(A · ri(C)).
with γ suﬃciently close to 1 so that the vector y = z + (γ − 1)(z − x) belongs to ri cl(C) [cf. By Prop. assume that condition (iii) holds. we have A · cl(X) ⊂ cl(A · X). x is the limit of the sequence (1/k)x + (1 − 1/k)x  k ≥ 1 that lies in ri(C). (e) If C is bounded.
2. Note that if C is closed but unbounded. so by Prop. x2 )  x1 > 0. Then A · C is the (nonclosed) halﬂine (x1 . (e) By the argument given in part (d). so we have cl(C1 ∩ C2 ) ⊂ cl(C1 ) ∩ cl(C2 ). x2 ≥ 0 and let A have the eﬀect of projecting the typical vector x on the horizontal axis. {xk } has a subsequence that converges to some x ∈ cl(C).44 Convex Analysis and Optimization Chap. there exists γ > 1 such that the vector y = z + (γ − 1)(z − x) belongs to C. Thus we have Ay ∈ A · C and Ay = z + (γ − 1)(z − x). Then. there exists a sequence {xk } ⊂ C such that Axk → x. x2) = (x1 . choose any x ∈ cl(A · C). Proof: (a) Let y ∈ cl(C1 ) ∩ cl(C2 ). x2)  x1 x2 ≥ 1. 0). Let x be a vector in the intersection ri(C1) ∩ ri(C2 ) (which is nonempty by assumption). Also C1 ∩ C2 is contained in cl(C1) ∩ cl(C2 ). 1. we have A · cl(C) ⊂ cl(A · C). and let z ∈ ri(C) and x ∈ C be such that Az = z and Ax = x. the sets ri(C1 ) ∩ ri(C2) and . 1]. x2 = 0 . 1. It follows that x ∈ A · cl(C). Q. 1. Proposition 1. ri(C1 ∩ C2 ) = ri(C1 ) ∩ ri(C2 ). For example take the closed set C = (x1 . we have cl(C1 ) ∩ cl(C2 ) ⊂ cl ri(C1 ) ∩ ri(C2 ) ⊂ cl(C1 ∩ C2 ).9(c) it follows that z ∈ ri(A · C). so that cl(C1 ∩ C2 ) = cl(C1 ) ∩ cl(C2 ).e..2. the set A · C need not be closed [cf. x1 ≥ 0. the vector αx+(1−α)y belongs to ri(C1)∩ri(C2 ) for all α ∈ (0. implying that the relative interiors of the sets A · C and A · ri(C) are equal [part (c)].9(c). 1 Thus A·C lies between the set A·ri(C) and the closure of that set.D.11: Let C1 and C2 be nonempty convex sets: (a) Assume that the sets ri(C1 ) and ri(C2 ) have a nonempty intersection. part (e) of the above proposition]. We will show the reverse inclusion by taking any z ∈ A · ri(C) and showing that z ∈ ri(A · C). implying that y ∈ cl ri(C1 ) ∩ ri(C2 ) . which is a closed set. Hence. Since C is bounded. Let x be any vector in A · C. Then cl(C1 ∩ C2 ) = cl(C1 ) ∩ cl(C2 ). Thus. To show the converse. and we must have Ax = x. 
Hence y is the limit of a sequence αk x + (1 − αk )y ⊂ ri(C1) ∩ ri(C2 ) with αk → 0. By part Prop.2. (b) ri(C1 + C2 ) = ri(C1 ) + ri(C2 ).9(a)]. i. Furthermore. Hence ri(A·C) ⊂ A·ri(C). A · (x1 . By the Line Segment Principle [Prop.E. equality holds throughout in the preceding two relations.2.
Also. the line segment connecting x and y can be prolonged beyond x by a small amount without leaving C1 ∩ C2 . we assume without loss of generality. By Prop.. The requirement that ri(C1 ) ∩ ri(C2 ) = Ø is essential in part (a) of the above proposition. x2 ∈ n . Proposition 1. consider C1 = {x  x ≥ 0} and C2 = {x  x ≤ 0}. By the same proposition. 1. 1. From Jensen’s inequality [Eq. Since A(C1 × C2 ) = C1 + C2 .12: If f : n → is convex. For the purpose of proving continuity at zero.e. ∞] is convex. 1. i = 1. Therefore. then it is continuous.9(c).2. It will suﬃce to show that f is continuous at 0. Continuity of Convex Functions We close this section with a basic result on the continuity properties of convex functions.2. it follows that f (x) ≤ A for every x ∈ X . 2n . Then we have ri(C1 ∩ C2 ) = {0} = Ø = ri(C1 ) ∩ ri(C2 ).e. Consider the sequences {yk } and {zk } given . if f : n → (−∞. implying that ri(C1 ∩ C2 ) ⊂ ri(C1 ) ∩ ri(C2 ).Sec. .E. by Prop. it follows that x ∈ ri(C1 ∩ C2 ). Then we have cl(C1 ∩ C2 ) = Ø = {0} = cl(C1 ) ∩ cl(C2 ).2. be the corners of X. More generally. The relative interior of the Cartesian product C1 × C2 (viewed as a subset of 2n ) is easily seen to be ri(C1 ) × ri(C2 ) (see the exercises). i.2. we can assume that xk ∈ X and xk = 0 for all k. Let A = maxi f (ei ). . then f when restricted to dom(f ). . take any x ∈ ri(C1 ) ∩ ri(C2 ) and any y ∈ C1 ∩ C2 . they have the same relative interior. Proof: Restricting attention to the aﬃne hull of dom(f ) and using a transformation argument if necessary.10(d).D. that the origin is an interior point of dom(f ) and that the unit cube X = {x  x ∞ ≤ 1} is contained in dom(f). 1. It is not diﬃcult to see that any x ∈ X can 2n be expressed in the form x = i=1 αi ei . the result follows from Prop. As an example. Q. that for any sequence {xk } ⊂ n that converges to 0.8)]. . where each αi is a nonnegative 2n scalar and i=1 αi = 1. x2) = x1 + x2 for all x1 . 
consider the subsets of the real line C1 = {x  x > 0} and C2 = {x  x < 0}.2 Convex Sets and Functions 45 C1 ∩ C2 have the same closure. To show the converse. we have f (xk ) → f (0). Let ei . is continuous over the relative interior of dom(f ). each ei is a vector whose entries are either 1 or −1. (b) Consider the linear transformation A : 2n → n given by A(x1 . i. (1.10(c).
2. xk . we have f(xk ) ≤ 1 − xk Since xk we obtain ∞ ∞ f (0) + xk ∞ f (yk ). we have f (0) ≤ xk ∞ 1 f (zk ) + f (xk ) xk ∞ + 1 xk ∞ + 1 and letting k → ∞. 0. 1. limk→∞ f (xk ) = f (0) and f is continuous at zero. and zk . by taking the limit as k → ∞. Construction for proving continuity of a convex function (cf.46 Convex Analysis and Optimization Chap. 1 by yk = xk .2. Using the deﬁnition of a convex function for the line segment that connects yk . 1. Prop. e4 zk e1 A straightforward consequence of the continuity of a realvalued function f that is convex over n is that its epigraph as well as the level sets x  f (x) ≤ γ are closed and convex (cf. Prop. k→∞ Using the deﬁnition of a convex function for the line segment that connects xk .2.7). xk ∞ (cf. Fig. xk ∞ zk = − xk .2).12).D. and 0. 1. we obtain f (0) ≤ lim inf f (xk ). Q. e3 yk e2 xk xk+1 0 Figure 1. lim sup f (xk ) ≤ f (0).7.E. → 0 and f (yk ) ≤ A for all k. .2. k→∞ Thus.
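The bound used at the start of the continuity proof, namely that f(x) ≤ A = maxᵢ f(eᵢ) for every x in the unit cube, follows from Jensen's inequality because every such x is a convex combination of the 2ⁿ corners eᵢ. A minimal numeric sketch (the convex test function below is an assumption for illustration, not from the text):

```python
import itertools
import random

# Sketch of the bound f(x) <= A = max_i f(e_i) on the unit cube
# X = {x : ||x||_inf <= 1}, whose corners e_i have entries +1 or -1.

def f(x):
    # A convex test function: sum of squares plus a max of coordinates.
    return sum(xi * xi for xi in x) + max(x)

n = 3
corners = list(itertools.product([-1.0, 1.0], repeat=n))
A = max(f(e) for e in corners)

# Sample points of the cube and check that f never exceeds A there.
random.seed(0)
ok = True
for _ in range(1000):
    x = tuple(random.uniform(-1, 1) for _ in range(n))
    ok = ok and f(x) <= A + 1e-12
print(ok)  # True
```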
1.2.4 Recession Cones

Some of the preceding results [e.g., Prop. 1.2.10(e)] have illustrated how boundedness affects the topological properties of sets obtained through various operations on convex sets. In this section we take a closer look at this issue.

Given a convex set C, we say that a vector y is a direction of recession of C if x + αy ∈ C for all x ∈ C and α ≥ 0. In words, y is a direction of recession of C if, starting at any x in C and going indefinitely along y, we never cross the relative boundary of C to points outside C. The set of all directions of recession is a cone containing the origin. It is called the recession cone of C and it is denoted by RC (see Fig. 1.2.8). The following proposition gives some properties of recession cones.

Figure 1.2.8. Illustration of the recession cone RC of a convex set C. A direction of recession y has the property that x + αy ∈ C for all x ∈ C and α ≥ 0.

Proposition 1.2.13: (Recession Cone Theorem) Let C be a nonempty closed convex set.
(a) The recession cone RC is a closed convex cone.
(b) A vector y belongs to RC if and only if there exists a vector x ∈ C such that x + αy ∈ C for all α ≥ 0.
(c) RC contains a nonzero direction if and only if C is unbounded.
(d) The recession cones of C and ri(C) are equal.
(e) If D is another closed convex set such that C ∩ D ≠ Ø, we have RC∩D = RC ∩ RD.
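Part (b) of the Recession Cone Theorem can be illustrated numerically. The following sketch (the set below is an assumed example, not from the text) takes the closed convex set C = {(x1, x2) | x2 ≥ x1²} and checks that y = (0, 1) is a direction of recession: x + αy stays in C from one starting point, and, as part (b) asserts, from every other point of C as well.

```python
# Numeric illustration of Prop. 1.2.13(b) for C = {(x1, x2) : x2 >= x1^2},
# whose recession cone is {(0, y2) : y2 >= 0}.

def in_C(x):
    return x[1] >= x[0] ** 2 - 1e-12

y = (0.0, 1.0)                      # candidate direction of recession
x0 = (2.0, 4.0)                     # a single point of C
assert in_C(x0)

# x0 + a*y stays in C for a >= 0 ...
one_point_ok = all(in_C((x0[0], x0[1] + a)) for a in [0.0, 0.5, 1.0, 10.0, 1e6])

# ... and, consistent with part (b), so does x + a*y for other points x of C.
others_ok = all(
    in_C((x[0], x[1] + a))
    for x in [(-3.0, 9.0), (0.0, 0.0), (1.5, 5.0)]
    for a in [0.0, 1.0, 100.0]
)
print(one_point_ok, others_ok)  # True True
```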
13(b). zk z zk2 k1 x + αy x + yk x+y x + αy Convex Set C x x Unit Ball Figure 1. k = 1. we may assume that y = 1. y2 belong to RC and λ1 . We have yk = zk − x zk − x x−x zk − x x−x · + = ·y + . and because C is closed. . 2. 1. . If x = zk for some k. Hence λ1y1 + λ2y2 ∈ RC . and we show that x + αy ∈ C. We ﬁx x ∈ C and α > 0.2. . Let zk = x + kαy. (see the construction of Fig. zk − x k = 1. implying that RC is convex. For any x ∈ C and α ≥ 0 we have x + αyk ∈ C for all k. let y be such that there exists a vector x ∈ C with x + αy ∈ C for all α ≥ 0. We may assume that y = 0 (otherwise we are done) and without loss of generality. and let {yk } ⊂ RC be a sequence converging to y.48 Convex Analysis and Optimization Chap. every vector x ∈ C has the required property by the definition of RC .2. Conversely. . . This implies that y ∈ RC and that RC is closed. where the last inclusion holds because x + αy1 and x + αy2 belong to C by the deﬁnition of RC . and we deﬁne yk = zk − x . zk − x zk − x zk − x zk − x zk − x . 2. 1. .9.2. we have for any x ∈ C and α ≥ 0 x + α(λ1 y1 + λ2y2 ) = λ1 (x + αy1 ) + λ2 (x + αy2 ) ∈ C. we have x + αy ∈ C. 1 Proof: (a) If y1 . We thus assume that x = zk for all k. Construction for the proof of Prop. .9). then x + αy = x + α(k + 1)y which belongs to C and we are done. Let y be in the closure of RC . λ2 are positive scalars such that λ1 + λ2 = 1. (b) If y ∈ RC .
For any ﬁxed α ≥ 0.9). this in turn implies that y ∈ RC and y ∈ RD . By part (b).E. Conversely. 1. zk − x → 1. Q. if y ∈ RC . then for a ﬁxed x ∈ ri(C) and all α ≥ 0. we have x + αy ∈ C. 1. the vector x + αyk lies between x and zk in the line segment connecting x and zk for all k such that zk − x ≥ α. we have x + αy ∈ C ∩ D for all α > 0. for any x ∈ ri(C). (e) By the deﬁnition of direction of recession. we have x + αyk ∈ C for all suﬃciently large k. y ∈ RC∩D implies that x + αy ∈ C ∩ D for all x ∈ C ∩ D and all α > 0. Hence by convexity of C. we have yk → y. zk − x x−x → 0. Thus. namely that a closed convex set is bounded if and only if RC = {0}. if y ∈ RC ∩RD and x ∈ C ∩D. Choose any x ∈ C and any unbounded sequence {zk } ⊂ C. RC ∩ RD ⊂ RC∩D . Thus x + αy is the limit of {x + αyk }. we will show that RC contains a nonzero direction (the reverse is clear). Since x + αy is a limit point of {x + αyk }. by part (b). note that the recession cone of the set V = {x ∈ n  Ax ∈ W } . Hence.D.2. by the deﬁnition of direction of recession. so y ∈ RC∩D . (d) If y ∈ Rri(C) . the set V = {x ∈ C  Ax ∈ W } is compact if and only if RC ∩ N (A) = {0}. where yk = zk − x . Conversely. and C is closed. It follows from the Line Segment Principle [cf.9(a)] that x + αy ∈ ri(C) for all α ≥ 0. Since x + αyk → x + αy and C is closed. so that y belongs to Rri(C) . so by convexity of C.Sec. we have x + αy ∈ C for all α ≥ 0. we have x + αy ∈ ri(C) ⊂ C. Note that part (c) of the above proposition yields a characterization of compact and convex sets. so that RC∩D ⊂ RC ∩ RD .2 Convex Sets and Functions 49 Because zk is an unbounded sequence.2. 1. it follows that x + αy must belong to C. To see this. Prop. zk − x so by combining the preceding relations. The vector x + αyk lies between x and zk in the line segment connecting x and zk for all k such that zk − x ≥ α. we have y ∈ RC . (c) Assuming that C is unbounded. A useful generalization is that for a compact set W ⊂ m and an m × n matrix A. 
Consider the sequence {yk }. we have x + αyk ∈ C for all suﬃciently large k. Hence the nonzero vector y is a direction of recession. zk − x and let y be a limit point of {yk } (compare with the construction of Fig.
15: Let C1 . V is compact if and only if RC ∩ N (A) = {0}. which contradicts the boundedness of W ]. .14. .14: Let C be a nonempty closed convex subset of n and let A be an m × n matrix with nullspace N (A) such that RC ∩ N (A) = {0}. 1 is N (A) [clearly N (A) ⊂ RV . . so by Prop.E.13(c). .13. Proof: Let C be the Cartesian product C1 × · · · × Cm viewed as a subset of mn and let A be the linear transformation that maps a vector (x1 . Then the vector sum C1 + · · · + Cm is a closed set. We have RC = RC1 + · · · + RCm (see the exercises) and N (A) = (y1 .D. Proposition 1. Since AC = C1 + · · · + Cm . . 1. so y ∈ AC. It follows that the set ∩ >0 C is nonempty and any x ∈ ∩ >0 C satisﬁes Ax = y. Hence. the assumption RC ∩ N (A) = {0} implies that C is compact.D. 1. . . . Cm be nonempty closed convex subsets of n such that the equality y1 + · · · + ym = 0 for some vectors yi ∈ RCi implies that yi = 0 for all i = 1. by the discussion following the proof of Prop. so under the given condition. Q.50 Convex Analysis and Optimization Chap. m. One possible use of recession cones is to obtain conditions guaranteeing the closure of linear transformations and vector sums of convex sets in the absence of boundedness. . the result follows by applying Prop.2. the recession cone of V is RC ∩ N (A).2. When specialized to just two sets C1 and C2 . . . 1.2. yi ∈ n . Proposition 1. . Furthermore. ym )  y1 + · · · + ym = 0.2. Q. Proof: For any y ∈ cl(AC).2. the above proposition says that if there is no nonzero direction of recession of C1 that is the . as in the following two propositions (some reﬁnements are given in the exercises). the set C = x ∈ C  y − Ax ≤ is nonempty for all > 0. . Then AC is closed. if x ∈ N (A) but x ∈ RV we must have / αAx ∈ W for all α > 0. . .E. we obtain RC ∩ N (A) = {0}. . xm ) ∈ mn into x1 + · · · + xm .
then C1 + C2 is also bounded and hence also compact. 1. . 1. x2)  x1 = 0 . If both C1 and C2 are bounded.13(c)]. x1 ≥ 0. then C1 + C2 is a compact and convex set.3 (Properties of Cones) (a) For any collection {Ci  i ∈ I} of cones. the intersection ∩i∈I Ci is a cone.E. (b) Show that the convex hull of a set coincides with the set of all the convex combinations of its elements. Prop.2. 1. and let λ1 and λ2 be positive scalars. If C1 is bounded.2. Proof: Closedness of C1 + C2 follows from the preceding discussion.2.2. then C1 + C2 is closed.2. Proposition 1. x2 )  x1 > 0}. (b) The Cartesian product C1 × C2 of two cones C1 and C2 is a cone. x2 ≥ 0 and C2 = {(x1 .1 (a) Show that a set is convex if and only if it contains all the convex combinations of its elements.2. Prop. If both C1 and C2 are bounded.16: Let C1 and C2 be closed.1]. x2 )  x1 x2 ≥ 1. 1. Then C1 + C2 is the open halfspace {(x1 . Q. then C1 + C2 is a closed and convex set. We thus obtain the following proposition.2 Convex Sets and Functions 51 opposite of a direction of recession of C2 . Show by example that the sets (λ1 + λ2 )C and λ1 C + λ2 C may diﬀer when C is not convex [cf. EXE RC ISES 1. This is true in particular if RC1 = {0} which is equivalent to C1 being compact [cf. the vector sum C1 + C2 need not be closed. For example consider the closed sets of 2 given by C1 = (x1 .2 Let C be a nonempty set in n . convex sets.D. Note that if C1 and C2 are both closed and unbounded. 1.Sec.
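The counterexample above, where the sum of two closed unbounded sets is not closed, also lends itself to a quick numeric sketch (not part of the original notes): with C1 = {(x1, x2) | x1 x2 ≥ 1, x1 ≥ 0, x2 ≥ 0} and C2 the vertical axis, the sums (1/k, 0) converge to (0, 0), which lies outside C1 + C2.

```python
# Sketch of the counterexample: C1 + C2 is the open halfspace
# {(x1, x2) : x1 > 0}, which is not closed.

def in_C1(x):
    return x[0] >= 0 and x[1] >= 0 and x[0] * x[1] >= 1 - 1e-12

def in_C2(x):
    return x[0] == 0          # C2 = {(x1, x2) : x1 = 0}

# (1/k, 0) = (1/k, k) + (0, -k) belongs to C1 + C2 for every k >= 1 ...
sums = []
for k in range(1, 1001):
    a, b = (1.0 / k, float(k)), (0.0, -float(k))
    assert in_C1(a) and in_C2(b)
    sums.append((a[0] + b[0], a[1] + b[1]))

# ... yet the limit (0, 0) is not: any decomposition (0, 0) = u + v with
# v1 = 0 would force u1 = 0, contradicting u1 * u2 >= 1.
first_coords = [s[0] for s in sums]
print(min(first_coords))  # approaches 0, which lies outside C1 + C2
```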
. i = 1. aﬃne hulls. (e) The image and the inverse image of a cone under a linear transformation is a cone. (c) Assuming X1 . aﬃne hull) of X is equal to the Cartesian product of the convex hulls (closures.2. . (c) Let C1 and C2 be convex cones containing the origin. Show that C1 + C2 = conv(C1 ∪ C2 ).2. C1 ∩ C2 = (1 − α)C1 ∩ αC2 . let X = X1 × · · · × Xm be their Cartesian (a) Show that the convex hull (closure.52 Convex Analysis and Optimization Chap. 1. 1.5 (Properties of Cartesian Product) Given sets Xi ⊂ product. .1] 1. . show that the relative interior (recession cone) of X is equal to the Cartesian product of the relative interiors (recession cones) of the Xi ’s. . where the union is taken over all convex combinations such that only ﬁnitely many coeﬃcients αi are nonzero. . m. (b) Show that cone(X) is equal to the Cartesian product of cone(Xi ). i ∈ I} is a closed convex cone.4 (Convex Cones) (a) For any collection of vectors {ai  i ∈ I}. Show that n . ni . . α∈[0.6 Let {Ci  i ∈ I} be an arbitrary collection of convex sets in convex hull of the union of the collection. the set C = {x  ai x ≤ 0. . 1 (c) The vector sum C1 + C2 of two cones C1 and C2 is a cone. Xm are convex. and let C be the C= i∈I αi Ci .2. . (d) The closure of a cone is a cone. respectively) of the Xi ’s. (b) Show that a cone C is convex if and only if C + C ⊂ C.
xn )  x1 ≥. . In addition. . . Show that the function h deﬁned by h(x) = g f (x) is lower semicontinuous. . . . .e. and cl(X) have the same dimension.2. and let g be a convex monotonically nondecreasing function of a single variable [i.2 Convex Sets and Functions 53 1. . . . then h is strictly convex. .2. and let g : m → be a function such that g(u1 .. (d) Assuming that the origin belongs to conv(X). g(y) ≤ g(y) for y < y]. if g is monotonically increasing and f is strictly convex. Show that the function h deﬁned by h(x) = g f (x) is convex. 1. .10 (Convex Functions) Show that the following functions are convex: (a) f1 : X → is given by f1 (x1 . Show that the function h deﬁned by h(x) = g f (x) is convex over C. fm ) where each fi : n → is a convex function.8 (Lower Semicontinuity under Composition) (a) Let f : n → m be a continuous function and g : m → be a lower semicontinuous function. and g : → be a lower semicontinuous and monotonically nondecreasing function.2.2. (b) Show that cone(X) = cone conv(X) . . . .7 Let X be a nonempty set. . Give an example where the dimension of conv(X ) is smaller than the dimension of cone(X ). (a) Show that X. xn ) ∈ (c) f3 (x) = x p 1 where X = {(x1 . xn ) = −(x1 x2 · · · xn ) n (b) f2 (x) = ln ex1 + · · · + exn with (x1 . conv(X). with p ≥ 1 and x ∈ n n . . . (b) Let f : n → be a lower semicontinuous function. . . Show that the function h deﬁned by h(x) = g f(x) is lower semicontinuous. . . show that conv(X) and cone(X) have the same dimension. .Sec. . . (b) Let f = (f1 . (c) Show that the dimension of conv(X ) is at most as large as the dimension of cone(X ).9 (Convexity under Composition) (a) Let f be a convex function deﬁned on a convex set C ⊂ n . 1. 1. xn ≥ 0}. . 1. um ) is convex and nondecreasing in ui for each i.
2. x3 are three scalars such that x1 < x2 < x3 . has monotonically nondecreasing gradient.2. Hint : The condition above says that the function f . and α and β scalars n (f) f6 (x) = max{0. Let S be the subspace that is parallel to the aﬃne hull of C.2. see also the proof of Prop. 1 and f(x) > 0 for all x. 1.13 Let f : n → be a diﬀerentiable function. 1.5(c) to show that if C is a convex set with nonempty interior. and f : n → is convex and twice continuously diﬀerentiable over C. m . (a) (Monotropic Property) Use the deﬁnition of convexity to show that f is “turning upwards” in the sense that if x1 . y ∈ C. x2 − x1 x3 − x2 .2. n (e) f5 (x) = αf (x) + β with f a convex function over such that α ≥ 0. 1. and β a scalar. A an m × n matrix. (j) f10 (x) = f (Ax + b) with f a convex function over and b a vector in m . 1. restricted to the line segment connecting x and y. (h) f8 (x) = x Ax + b x + β with A an n × n positive semideﬁnite symmetric matrix. f(x)} with f a convex function over .2. 1. b a vector in n . ∀ x. x2 .12 Let C ⊂ n be a convex set and let f : n → be twice continuously diﬀerentiable over C.54 (d) f4 (x) = 1 f (x) Convex Analysis and Optimization with f a concave function over n Chap. (i) f9 (x) = eβx Ax with A an n ×n positive semideﬁnite symmetric matrix and β a positive scalar. . Show that f is convex over a convex set C if and only if f (x) − f (y) (x − y) ≥ 0. 1.2.11 Use the Line Segment Principle and the method of proof of Prop. then 2 f (x) is positive semideﬁnite for all x ∈ C. m (g) f7 (x) = Ax − b with A an m × n matrix and b a vector in .6. then f (x2 ) − f (x1 ) f(x3 ) − f(x2 ) ≤ .14 (Ascent/Descent Behavior of a Convex Function) Let f : → be a convex function of a single variable. Show that f is convex over C if and only if y 2 f (x)y ≥ 0 for all x ∈ C and y ∈ S.
(b) Use part (a) to show that there are four possibilities as x increases to ∞: (1) f(x) decreases monotonically to −∞, (2) f(x) decreases monotonically to a finite value, (3) f(x) reaches some value and stays at that value, (4) f(x) increases monotonically to ∞ for x ≥ x̄, for some x̄ ∈ ℝ.
1.2.15 (Posynomials)
A posynomial of scalar variables y1 , . . . , yn is a function of the form
g(y1, . . . , yn) = Σ_{i=1}^{m} βi y1^{a_{i1}} · · · yn^{a_{in}},
where the scalars yi and βi are positive, and the exponents aij are real numbers. Show the following:
(a) A posynomial need not be convex.
(b) By a logarithmic change of variables, where we set
    f(x) = ln(g(y1, . . . , yn)),    xi = ln yi, ∀ i = 1, . . . , n,    bi = ln βi, ∀ i = 1, . . . , m,
we obtain a convex function
    f(x) = lnexp(Ax + b)
defined for all x ∈ ℝⁿ, where lnexp(z) = ln(e^{z1} + · · · + e^{zm}) for z ∈ ℝᵐ, A is the m × n matrix with entries aij, and b ∈ ℝᵐ is the vector with components bi.
(c) In general, every function g : ℝⁿ → ℝ of the form
    g(y) = g1(y)^{γ1} · · · gr(y)^{γr},
where gk is a posynomial and γk > 0 for all k, can be transformed by the logarithmic change of variables into a convex function f given by
    f(x) = Σ_{k=1}^{r} γk lnexp(Ak x + bk),
with the matrix Ak and the vector bk being associated with the posynomial gk for each k.
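The convexity claimed in part (b) can be spot-checked numerically. The sketch below (the matrix A and the weights are arbitrary test data, not from the text) verifies midpoint convexity of the transformed posynomial f(x) = ln Σᵢ exp(Σⱼ aᵢⱼ xⱼ + bᵢ) on random pairs of points:

```python
import math
import random

# Hedged numeric check of Exercise 1.2.15(b): after x = ln y, the transformed
# posynomial f(x) = lnexp(Ax + b) should be convex (log-sum-exp of an affine map).

random.seed(1)
m, n = 4, 3
A = [[random.uniform(-2, 2) for _ in range(n)] for _ in range(m)]
b = [random.uniform(-1, 1) for _ in range(m)]          # b_i = ln beta_i

def f(x):
    return math.log(sum(math.exp(sum(A[i][j] * x[j] for j in range(n)) + b[i])
                        for i in range(m)))

# Midpoint convexity on random pairs: f((x + z)/2) <= (f(x) + f(z))/2.
convex_ok = True
for _ in range(500):
    x = [random.uniform(-3, 3) for _ in range(n)]
    z = [random.uniform(-3, 3) for _ in range(n)]
    mid = [(xi + zi) / 2 for xi, zi in zip(x, z)]
    convex_ok = convex_ok and f(mid) <= (f(x) + f(z)) / 2 + 1e-9
print(convex_ok)  # True
```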
1.2.16 (Arithmetic-Geometric Mean Inequality)

Show that if α1, . . . , αn are positive scalars with Σ_{i=1}^{n} αi = 1, then for every set of positive scalars x1, . . . , xn, we have
    x1^{α1} x2^{α2} · · · xn^{αn} ≤ α1 x1 + α2 x2 + · · · + αn xn,
with equality if and only if x1 = x2 = · · · = xn. Hint: Show that − ln x is a strictly convex function on (0, ∞).
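A quick numeric sanity check of the inequality (a sketch, not a proof) can be run as follows, using random positive weights summing to 1 and random positive xᵢ, plus the equality case where all xᵢ coincide:

```python
import math
import random

# Spot check of the arithmetic-geometric mean inequality:
# prod x_i^{alpha_i} <= sum alpha_i x_i for convex weights alpha_i.

random.seed(2)
ok = True
for _ in range(1000):
    n = random.randint(2, 6)
    raw = [random.uniform(0.1, 1.0) for _ in range(n)]
    alpha = [r / sum(raw) for r in raw]                # positive, sums to 1
    x = [random.uniform(0.1, 10.0) for _ in range(n)]
    geo = math.prod(xi ** ai for xi, ai in zip(x, alpha))
    ari = sum(ai * xi for ai, xi in zip(alpha, x))
    ok = ok and geo <= ari + 1e-12

# Equality case: all x_i equal (here x_i = 4), so both sides equal 4.
alpha = [0.2, 0.3, 0.5]
geo_eq = math.prod(4.0 ** a for a in alpha)
print(ok, abs(geo_eq - 4.0) < 1e-12)  # True True
```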
1.2.17
Use the result of Exercise 1.2.16 to verify Young's inequality
    xy ≤ x^p/p + y^q/q,
where p > 0, q > 0, 1/p + 1/q = 1, x ≥ 0, and y ≥ 0. Then, use Young's inequality to verify Hölder's inequality
    Σ_{i=1}^{n} |xi yi| ≤ (Σ_{i=1}^{n} |xi|^p)^{1/p} (Σ_{i=1}^{n} |yi|^q)^{1/q}.
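Both inequalities are easy to spot-check numerically; a small illustrative sketch (random conjugate exponents and random data, not from the text):

```python
import random

# Numeric spot check: Young's inequality x*y <= x^p/p + y^q/q for conjugate
# exponents, and Hölder's inequality for the corresponding p- and q-norms.

random.seed(3)
young_ok, holder_ok = True, True
for _ in range(1000):
    p = random.uniform(1.1, 5.0)
    q = p / (p - 1.0)                       # conjugate: 1/p + 1/q = 1
    x = random.uniform(0.0, 10.0)
    y = random.uniform(0.0, 10.0)
    young_ok = young_ok and x * y <= x ** p / p + y ** q / q + 1e-9

    n = 5
    xs = [random.uniform(-5, 5) for _ in range(n)]
    ys = [random.uniform(-5, 5) for _ in range(n)]
    lhs = sum(abs(a * b) for a, b in zip(xs, ys))
    rhs = (sum(abs(a) ** p for a in xs)) ** (1 / p) * \
          (sum(abs(b) ** q for b in ys)) ** (1 / q)
    holder_ok = holder_ok and lhs <= rhs + 1e-9
print(young_ok, holder_ok)  # True True
```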
1.2.18 (Convexity under Minimization I)
Let h : ℝ^{n+m} → ℝ be a convex function. Consider the function f : ℝⁿ → ℝ given by
    f(x) = inf_{u∈U} h(x, u),
where we assume that U is a nonempty and convex subset of ℝᵐ, and that f(x) > −∞ for all x. Show that f is convex, and for each x, the set {u ∈ U | h(x, u) = f(x)} is convex. Hint: There cannot exist α ∈ [0, 1], x1, x2, u1 ∈ U, u2 ∈ U such that f(αx1 + (1 − α)x2) > αh(x1, u1) + (1 − α)h(x2, u2).
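The preservation of convexity under partial minimization can be illustrated numerically. The sketch below (the test function h and the grid over U are assumptions for illustration) computes f(x) = inf_{u∈U} h(x, u) by a grid search and verifies midpoint convexity of f:

```python
# Illustrative check of Exercise 1.2.18: with h(x, u) = (x - u)^2 + u^2,
# convex in (x, u), and U = [0, 1], the marginal f(x) = inf_{u in U} h(x, u)
# should again be convex in x.

def h(x, u):
    return (x - u) ** 2 + u ** 2

U = [i / 200.0 for i in range(201)]         # grid over U = [0, 1]

def f(x):
    return min(h(x, u) for u in U)          # grid approximation of the infimum

# Midpoint convexity sampled over a range of x values.
xs = [-2.0 + 0.2 * i for i in range(21)]
convex_ok = all(f((a + b) / 2) <= (f(a) + f(b)) / 2 + 1e-4
                for a in xs for b in xs)
print(convex_ok)  # True
```

For x ∈ [0, 2] the exact minimizer is u = x/2, giving f(x) = x²/2; the grid search reproduces this, e.g. f(1) = 1/2 at u = 1/2.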
1.2.19 (Convexity under Minimization II)
(a) Let C be a convex set in ℝ^{n+1} and let
    f(x) = inf{ w | (x, w) ∈ C }.
Show that f is convex over ℝⁿ.
(b) Let f1, . . . , fm be convex functions over ℝⁿ and let
    f(x) = inf{ Σ_{i=1}^{m} fi(xi) | Σ_{i=1}^{m} xi = x }.
Assuming that f(x) > −∞ for all x, show that f is convex over ℝⁿ.
(c) Let h : ℝᵐ → ℝ be a convex function and let
    f(x) = inf_{By=x} h(y),
where B is an n × m matrix. Assuming that f(x) > −∞ for all x, show that f is convex over the range space of B.
(d) In parts (b) and (c), show by example that if the assumption f(x) > −∞ for all x is violated, then the set {x | f(x) > −∞} need not be convex.
1.2.20 (Convexity under Minimization III)
Let {fi | i ∈ I} be an arbitrary collection of convex functions on ℝⁿ. Define the convex hull of these functions f : ℝⁿ → ℝ as the pointwise infimum of the collection, i.e.,
    f(x) = inf{ w | (x, w) ∈ conv(∪_{i∈I} epi(fi)) }.
Show that f(x) is given by
    f(x) = inf{ Σ_{i∈I} αi fi(xi) | Σ_{i∈I} αi xi = x },
where the infimum is taken over all representations of x as a convex combination of elements xi, such that only finitely many coefficients αi are nonzero.
1.2.21 (Convexiﬁcation of Nonconvex Functions)
Let X be a nonempty subset of ℝⁿ, and let f : X → ℝ be a function that is bounded below over X. Define the function F : conv(X) → ℝ by
    F(x) = inf{ w | (x, w) ∈ conv(epi(f)) }.
(a) Show that F is convex over conv(X) and it is given by
    F(x) = inf{ Σ_{i=1}^{m} αi f(xi) | Σ_{i=1}^{m} αi xi = x, xi ∈ X, Σ_{i=1}^{m} αi = 1, αi ≥ 0, m ≥ 1 }.
(b) Show that
    inf_{x∈conv(X)} F(x) = inf_{x∈X} f(x).
(c) Show that the set of global minima of F over conv(X) includes all global minima of f over X.
1.2.22 (Minimization of Linear Functions)
Show that minimization of a linear function over a set is equivalent to minimization over its convex hull. In particular, if X ⊂ ℝⁿ and c ∈ ℝⁿ, then
    inf_{x∈X} c′x = inf_{x∈conv(X)} c′x.
Furthermore, the infimum in the left-hand side above is attained if and only if the infimum in the right-hand side is attained.
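For a finite set X the claim is easy to verify numerically, since c′(Σᵢ αᵢ xᵢ) = Σᵢ αᵢ (c′xᵢ) ≥ minᵢ c′xᵢ for convex weights αᵢ. A sketch with made-up data (not from the text):

```python
import random

# Sketch of Exercise 1.2.22 for a finite set X: the minimum of a linear
# function c'x over X is not improved by passing to conv(X).

random.seed(4)
X = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(20)]
c = (1.3, -0.7)

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

min_over_X = min(dot(c, x) for x in X)

# Random points of conv(X) never achieve a smaller value.
hull_ok = True
for _ in range(2000):
    raw = [random.random() for _ in X]
    a = [r / sum(raw) for r in raw]                    # convex weights
    p = (sum(ai * x[0] for ai, x in zip(a, X)),
         sum(ai * x[1] for ai, x in zip(a, X)))
    hull_ok = hull_ok and dot(c, p) >= min_over_X - 1e-9
print(hull_ok)  # True
```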
1.2.23 (Extension of Caratheodory's Theorem)

Let X1 and X2 be subsets of ℝⁿ, and let X = conv(X1) + cone(X2). Show that every vector x in X can be represented in the form
    x = Σ_{i=1}^{k} αi xi + Σ_{i=k+1}^{m} αi xi,
where m is a positive integer with m ≤ n + 1, the vectors x1, . . . , xk belong to X1, the vectors xk+1, . . . , xm belong to X2, and the scalars α1, . . . , αm are nonnegative with α1 + · · · + αk = 1. Furthermore, the vectors x2 − x1, . . . , xk − x1, xk+1, . . . , xm are linearly independent.

1.2.24

Let x0, . . . , xm be vectors in ℝⁿ such that x1 − x0, . . . , xm − x0 are linearly independent. The convex hull of x0, . . . , xm is called an m-dimensional simplex, and x0, . . . , xm are called the vertices of the simplex.
(a) Show that the dimension of a convex set C is the maximum of the dimensions of the simplices included in C.
(b) Use part (a) to show that a nonempty convex set has a nonempty relative interior.

1.2.25

Let X be a bounded subset of ℝⁿ. Show that
    cl(conv(X)) = conv(cl(X)).
In particular, if X is closed and bounded, then conv(X) is closed and bounded (cf. Prop. 1.2.8).

1.2.26

Let C1 and C2 be two nonempty convex sets such that C1 ⊂ C2.
(a) Give an example showing that ri(C1) need not be a subset of ri(C2).
(b) Assuming that the sets ri(C1) and ri(C2) have nonempty intersection, show that ri(C1) ⊂ ri(C2).
(c) Assuming that the sets C1 and ri(C2) have nonempty intersection, show that the set ri(C1) ∩ ri(C2) is nonempty.
1.2.27

Let C be a nonempty convex set.
(a) Show the following refinement of the Line Segment Principle [Prop. 1.2.9(c)]: x ∈ ri(C) if and only if for every x̄ ∈ aff(C), there exists γ > 1 such that x + (γ − 1)(x − x̄) ∈ C.
(b) Assuming that the origin lies in ri(C), show that cone(C) coincides with aff(C).
(c) Show the following extension of part (b) to a nonconvex set: If X is a nonempty set such that the origin lies in the relative interior of conv(X), then cone(X) coincides with aff(X).

1.2.28

Let C be a compact set.
(a) Assuming that C is a convex set not containing the origin in its relative boundary, show that cone(C) is closed.
(b) Give examples showing that the assertion of part (a) fails if C is unbounded or C contains the origin in its relative boundary.
(c) The convexity assumption in part (a) can be relaxed as follows: assuming that conv(C) does not contain the origin in its boundary, show that cone(C) is closed. Hint: Use part (a) and Exercise 1.2.7(b).

1.2.29

(a) Let C be a convex cone. Show that ri(C) is also a convex cone.
(b) Let C = cone({x1, . . . , xm}). Show that
    ri(C) = { Σ_{i=1}^{m} αi xi | αi > 0, i = 1, . . . , m }.

1.2.30

Let A be an m × n matrix and let C be a nonempty convex set in ℝᵐ. Assuming that the inverse image A⁻¹ · C is nonempty, show that
    cl(A⁻¹ · C) = A⁻¹ · cl(C),    ri(A⁻¹ · C) = A⁻¹ · ri(C).
[Compare these relations with those of Prop. 1.2.10(d) and (e).]
1.2.31 (Lipschitz Property of Convex Functions)

Let f : ℝⁿ → ℝ be a convex function and X be a bounded set in ℝⁿ. Then f has the Lipschitz property over X, i.e., there exists a positive scalar c such that
    |f(x) − f(y)| ≤ c ‖x − y‖,    ∀ x, y ∈ X.

1.2.32

Let C be a closed convex set and let M be an affine set such that the intersection C ∩ M is nonempty and bounded. Show that for every affine set M̄ that is parallel to M, the intersection C ∩ M̄ is bounded when nonempty.

1.2.33 (Recession Cones of Nonclosed Sets)

Let C be a nonempty convex set.
(a) Show by counterexample that part (b) of the Recession Cone Theorem need not hold when C is not closed.
(b) Show that RC ⊂ Rcl(C). Give an example where the inclusion cl(RC) ⊂ Rcl(C) is strict, and another example where cl(RC) = Rcl(C).
(c) Let C̄ be a closed convex set such that C ⊂ C̄. Show that RC ⊂ RC̄. Give an example showing that the inclusion can fail if C̄ is not closed.

1.2.34 (Recession Cones of Relative Interiors)

Let C be a nonempty convex set.
(a) Show that a vector y belongs to Rri(C) if and only if there exists a vector x ∈ ri(C) such that x + αy ∈ C for every α ≥ 0.
(b) Show that Rri(C) = Rcl(C). Also, give an example showing that Rri(C) need not be a subset of RC. [Compare with Exercise 1.2.33(b).]
(c) Let C̄ be a relatively open convex set such that C ⊂ C̄. Show that RC ⊂ RC̄. Give an example showing that the inclusion can fail if C̄ is not relatively open.
Hint: In part (a), follow the proof of part (b) of the Recession Cone Theorem. In parts (b) and (c), use the result of part (a).
1.2.35

Let C be a nonempty convex set in ℝⁿ and let A be an m × n matrix.
(a) Show the following refinement of Prop. 1.2.14: if Rcl(C) ∩ N(A) = {0}, then
    cl(A · C) = A · cl(C),    A · Rcl(C) = RA·cl(C).
(b) Give an example showing that A · Rcl(C) and RA·cl(C) can differ when Rcl(C) ∩ N(A) ≠ {0}.

1.2.36 (Lineality Space and Recession Cone)

Let C be a nonempty convex set in ℝⁿ. Define the lineality space of C, denoted by L, to be the subspace of vectors y such that simultaneously y ∈ RC and −y ∈ RC.
(a) Show that for every subspace S ⊂ L,
    C = (C ∩ S⊥) + S.
(b) Show the following refinement of Prop. 1.2.14 and Exercise 1.2.35: if A is an m × n matrix and Rcl(C) ∩ N(A) is a subspace of L, then
    cl(A · C) = A · cl(C),    RA·cl(C) = A · Rcl(C).

1.2.37

This exercise is a refinement of Prop. 1.2.15.
(a) Let C1, . . . , Cm be nonempty closed convex sets in ℝⁿ such that the equality y1 + · · · + ym = 0 with yi ∈ RCi implies that each yi belongs to the lineality space of Ci. Then the vector sum C1 + · · · + Cm is a closed set and
    RC1+···+Cm = RC1 + · · · + RCm.
(b) Show the following extension of part (a) to nonclosed sets: Let C1, . . . , Cm be nonempty convex sets in ℝⁿ such that the equality y1 + · · · + ym = 0 with yi ∈ Rcl(Ci) implies that each yi belongs to the lineality space of cl(Ci). Then we have
    cl(C1 + · · · + Cm) = cl(C1) + · · · + cl(Cm),
    Rcl(C1+···+Cm) = Rcl(C1) + · · · + Rcl(Cm).
1.3 CONVEXITY AND OPTIMIZATION

In this section we discuss applications of convexity to some basic optimization issues, such as the existence and uniqueness of global minima. Several other applications, relating to optimality conditions and polyhedral convexity, will be discussed in subsequent sections.

1.3.1 Global and Local Minima

Let X be a nonempty subset of R^n and let f : R^n -> (-∞, ∞] be a function. We say that a vector x* ∈ X is a minimum of f over X if f(x*) = inf_{x∈X} f(x). Alternatively, we say that f attains a minimum over X at x*, and we indicate this by writing

    x* ∈ arg min_{x∈X} f(x).

We also call x* a minimizing point or minimizer or global minimum over X. We use similar terminology for maxima: a vector x* ∈ X such that f(x*) = sup_{x∈X} f(x) is said to be a maximum of f over X, and we indicate this by writing x* ∈ arg max_{x∈X} f(x). If the domain of f is the set X (instead of R^n), we also call x* a (global) minimum or (global) maximum of f (without the qualifier "over X").

A basic question in minimization problems is whether an optimal solution exists. This question can often be resolved with the aid of the classical theorem of Weierstrass, which states that a continuous function attains a minimum over a compact set. We will provide a more general version of this theorem, and to this end, we introduce some terminology. We say that a function f : R^n -> (-∞, ∞] is coercive if

    lim_{k→∞} f(x_k) = ∞

for every sequence {x_k} such that ||x_k|| -> ∞ for some norm ||·||. Note that as a consequence of the definition, the level sets {x | f(x) <= γ} of a coercive function f are bounded whenever they are nonempty.
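The boundedness of level sets of coercive functions can be checked numerically. The following sketch (ours, not part of the notes) compares a coercive quadratic with a non-coercive linear function on a finite grid; the particular functions and the grid radius are illustrative choices, not from the text.

```python
import numpy as np

def coercive_f(x):
    # f(x) = ||x||^2 - 3*x[0]: coercive, since the quadratic term dominates
    return float(np.dot(x, x) - 3 * x[0])

def linear_f(x):
    # f(x) = x[0]: not coercive (constant along the x[1]-axis)
    return float(x[0])

def level_set_radius(f, gamma, grid):
    # largest norm of a grid point in the level set {x : f(x) <= gamma}
    pts = [x for x in grid if f(x) <= gamma]
    return max(np.linalg.norm(x) for x in pts) if pts else None

grid = [np.array([a, b]) for a in np.linspace(-50, 50, 201)
                         for b in np.linspace(-50, 50, 201)]

r_coercive = level_set_radius(coercive_f, 10.0, grid)
r_linear = level_set_radius(linear_f, 10.0, grid)
print(r_coercive)  # the level set {x : ||x||^2 - 3 x1 <= 10} is a disk of radius 3.5 about (1.5, 0)
print(r_linear)    # the linear level set is a half-plane: it fills the grid out to its corners
```

Enlarging the grid leaves `r_coercive` fixed but pushes `r_linear` out with the grid boundary, which is the numerical signature of an unbounded level set.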
Proposition 1.3.1: (Weierstrass' Theorem) Let f : R^n -> (-∞, ∞] be a closed function. Assume that one of the following three conditions holds:

(1) dom(f) is compact.

(2) dom(f) is closed and f is coercive.

(3) There exists a scalar γ such that the set {x | f(x) <= γ} is nonempty and compact.

Then f attains a minimum over R^n.

Proof: If f(x) = ∞ for all x ∈ R^n, then every x ∈ R^n attains the minimum of f over R^n, and we are done. Thus, with no loss of generality, we assume that inf_{x∈R^n} f(x) < ∞. Let {x_k} ⊂ dom(f) be a sequence such that

    lim_{k→∞} f(x_k) = inf_{x∈R^n} f(x).

Assume condition (1). Since dom(f) is bounded, this sequence has at least one limit point x* [Prop. 1.1.5(a)]. Since f is closed, f is lower semicontinuous at x* [cf. Prop. 1.2.2(b)], so that f(x*) <= lim_{k→∞} f(x_k) = inf_{x∈R^n} f(x), and x* is a minimum of f.

Assume condition (2). Consider a sequence {x_k} as in the proof under condition (1). Since f is coercive, {x_k} must be bounded and the proof proceeds similar to the proof under condition (1).

Assume condition (3). If the given γ is equal to inf_{x∈R^n} f(x), the set of minima of f over R^n is {x | f(x) <= γ}, and since by assumption this set is nonempty, we are done. If inf_{x∈R^n} f(x) < γ, consider a sequence {x_k} as in the proof under condition (1). Then, for all k sufficiently large, x_k must belong to the set {x | f(x) <= γ}. Since this set is compact, {x_k} must be bounded and the proof proceeds similar to the proof under condition (1). Q.E.D.

The most common application of the above proposition is when we want to minimize a real-valued function f : R^n -> R over a nonempty set X. Then, by applying Prop. 1.3.1 to the extended real-valued function

    f~(x) = f(x) if x ∈ X,  ∞ otherwise,    (1.9)

we see that f attains a minimum over X if X is closed, f is lower semicontinuous over X (implying that f~ is closed), and either (1) X is bounded, or (2) f is coercive, or (3) some level set {x ∈ X | f(x) <= γ} is nonempty and compact.

For another example, suppose that we want to minimize a closed convex function f : R^n -> (-∞, ∞] over a closed convex set X. Then, by using the function f~ of Eq. (1.9), we see that f attains a minimum over X if the set X ∩ dom(f) is compact, or if f is coercive, or if some level set {x ∈ X | f(x) <= γ} is nonempty and compact.

Note that with appropriate adjustments, the preceding analysis applies to the existence of maxima of f over X. For example, if a real-valued function f is upper semicontinuous at all points of a compact set X, then f attains a maximum over X.

We say that a vector x* ∈ X is a local minimum of f over X if there exists some ε > 0 such that f(x*) <= f(x) for every x ∈ X satisfying ||x - x*|| <= ε, where ||·|| is some vector norm. If the domain of f is the set X (instead of R^n), we also call x* a local minimum of f (without the qualifier "over X"). Local maxima are defined similarly.

An important implication of convexity of f and X is that all local minima are also global, as shown in the following proposition and in Fig. 1.3.1.

[Figure 1.3.1. Proof of why local minima of convex functions are also global. Suppose that f is convex, and assume to arrive at a contradiction, that x* is a local minimum that is not global. Then there must exist an x ∈ X such that f(x) < f(x*). By convexity, for all α ∈ (0, 1),

    f(αx* + (1 - α)x) <= αf(x*) + (1 - α)f(x) < f(x*).

Thus, f has strictly lower value than f(x*) at every point on the line segment connecting x* with x, except at x*. This contradicts the local minimality of x* over X.]
Proposition 1.3.2: If X ⊂ R^n is a convex set and f : X -> (-∞, ∞] is a convex function, then a local minimum of f is also a global minimum. If in addition f is strictly convex, then there exists at most one global minimum of f.

Proof: See Fig. 1.3.1 for a proof that a local minimum of f is also global. Let f be strictly convex, and to obtain a contradiction, assume that two distinct global minima x and y exist. Then the midpoint (x + y)/2 must belong to X, since X is convex. By the strict convexity of f, the value of f must be smaller at the midpoint than at x and y. Since x and y are global minima, we obtain a contradiction. Q.E.D.

1.3.2 The Projection Theorem

In this section we develop a basic result of analysis and optimization.

Proposition 1.3.3: (Projection Theorem) Let C be a nonempty closed convex set and let ||·|| be the Euclidean norm.

(a) For every x ∈ R^n, there exists a unique vector z ∈ C that minimizes ||z - x|| over all z ∈ C. This vector is called the projection of x on C, and is denoted by P_C(x), i.e.,

    P_C(x) = arg min_{z∈C} ||z - x||.

(b) For every x ∈ R^n, a vector z ∈ C is equal to P_C(x) if and only if

    (y - z)'(x - z) <= 0,    for all y ∈ C.

(c) The function f : R^n -> C defined by f(x) = P_C(x) is continuous and nonexpansive, i.e.,

    ||P_C(x) - P_C(y)|| <= ||x - y||,    for all x, y ∈ R^n.

(d) The distance function d : R^n -> R, defined by

    d(x, C) = min_{z∈C} ||z - x||,

is convex.
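Parts (b) and (c) of the Projection Theorem are easy to test numerically for a set with a closed-form projection. The sketch below (ours, not from the notes) uses the unit Euclidean ball, for which P_C(x) simply rescales any point outside the ball.

```python
import numpy as np

rng = np.random.default_rng(0)

def project_ball(x, r=1.0):
    # projection on the closed Euclidean ball C = {z : ||z|| <= r}
    nx = np.linalg.norm(x)
    return x if nx <= r else (r / nx) * x

# part (b): (y - P_C(x))'(x - P_C(x)) <= 0 for all y in C
x = np.array([3.0, -4.0])
z = project_ball(x)                       # z = (0.6, -0.8)
for _ in range(1000):
    y = rng.normal(size=2)
    y = y / max(1.0, np.linalg.norm(y))   # a point of C
    assert (y - z) @ (x - z) <= 1e-9

# part (c): the projection is nonexpansive
for _ in range(1000):
    u, v = rng.normal(size=2), rng.normal(size=2)
    assert (np.linalg.norm(project_ball(u) - project_ball(v))
            <= np.linalg.norm(u - v) + 1e-12)

print("projection checks passed")
```

Here the angle condition of part (b) holds with equality exactly for the points y on the supporting hyperplane of C at P_C(x), in agreement with Fig. 1.3.2.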
Proof: (a) Fix x and let w be some element of C. Minimizing ||x - z|| over all z ∈ C is equivalent to minimizing the same function over all z ∈ C such that ||x - z|| <= ||x - w||, which is a compact set. Furthermore, the function g defined by g(z) = ||z - x||^2 is continuous. Existence of a minimizing vector follows by Weierstrass' Theorem (Prop. 1.3.1).

To prove uniqueness, notice that the square of the Euclidean norm is a strictly convex function of its argument because its Hessian matrix is the identity matrix, which is positive definite [Prop. 1.2.5(b)]. Therefore, g is strictly convex and it follows that its minimum is attained at a unique point (Prop. 1.3.2).

(b) For all y and z in C we have

    ||y - x||^2 = ||y - z||^2 + ||z - x||^2 - 2(y - z)'(x - z) >= ||z - x||^2 - 2(y - z)'(x - z).

Therefore, if z is such that (y - z)'(x - z) <= 0 for all y ∈ C, we have ||y - x||^2 >= ||z - x||^2 for all y ∈ C, implying that z = P_C(x).

Conversely, let z = P_C(x), consider any y ∈ C, and for α > 0, define y_α = αy + (1 - α)z. We have

    ||x - y_α||^2 = ||(1 - α)(x - z) + α(x - y)||^2
                  = (1 - α)^2 ||x - z||^2 + α^2 ||x - y||^2 + 2(1 - α)α (x - z)'(x - y).

Viewing ||x - y_α||^2 as a function of α, we have

    ∂/∂α ||x - y_α||^2 |_{α=0} = -2||x - z||^2 + 2(x - z)'(x - y) = -2(y - z)'(x - z).

Therefore, if (y - z)'(x - z) > 0 for some y ∈ C, then

    ∂/∂α ||x - y_α||^2 |_{α=0} < 0,

and for positive but small enough α, we obtain ||x - y_α|| < ||x - z||. This contradicts the fact z = P_C(x) and shows that (y - z)'(x - z) <= 0 for all y ∈ C.

(c) Let x and y be elements of R^n. From part (b), we have

    (w - P_C(x))'(x - P_C(x)) <= 0,    for all w ∈ C.

Since P_C(y) ∈ C, we obtain

    (P_C(y) - P_C(x))'(x - P_C(x)) <= 0.

Similarly,

    (P_C(x) - P_C(y))'(y - P_C(y)) <= 0.

Adding these two inequalities, we obtain

    (P_C(y) - P_C(x))'(x - P_C(x) - y + P_C(y)) <= 0.

By rearranging and by using the Schwartz inequality, we have

    ||P_C(y) - P_C(x)||^2 <= (P_C(y) - P_C(x))'(y - x) <= ||P_C(y) - P_C(x)|| · ||y - x||,

showing that P_C(·) is nonexpansive and a fortiori continuous.

(d) Assume, to arrive at a contradiction, that there exist x_1, x_2 ∈ R^n and an α ∈ (0, 1) such that

    d(αx_1 + (1 - α)x_2, C) > α d(x_1, C) + (1 - α) d(x_2, C).

Then there must exist z_1, z_2 ∈ C such that

    d(αx_1 + (1 - α)x_2, C) > α||z_1 - x_1|| + (1 - α)||z_2 - x_2||,

which implies that

    ||αz_1 + (1 - α)z_2 - αx_1 - (1 - α)x_2|| > α||x_1 - z_1|| + (1 - α)||x_2 - z_2||.

This contradicts the triangle inequality in the definition of norm, since αz_1 + (1 - α)z_2 belongs to C, which is a convex set. Q.E.D.

Figure 1.3.2 illustrates the necessary and sufficient condition of part (b) of the Projection Theorem.

[Figure 1.3.2. Illustration of the condition satisfied by the projection P_C(x). For each vector y ∈ C, the vectors x - P_C(x) and y - P_C(x) form an angle greater than or equal to π/2 or, equivalently, (y - P_C(x))'(x - P_C(x)) <= 0.]

1.3.3 Directions of Recession and Existence of Optimal Solutions

The recession cone, discussed in Section 1.2.4, is also useful for analyzing the existence of optimal solutions of convex optimization problems. A key idea here is that a convex function can be described in terms of its epigraph, which is a convex set. The recession cone of the epigraph can be used to obtain the directions along which the function decreases monotonically. This is the idea underlying the following proposition.
Proposition 1.3.4: Let f : R^n -> (-∞, ∞] be a closed convex function and consider the level sets

    L_a = {x | f(x) <= a}.

(a) All the level sets L_a that are nonempty have the same recession cone, given by

    R_{L_a} = {y | (y, 0) ∈ R_epi(f)},

where R_epi(f) is the recession cone of the epigraph of f.

(b) If one nonempty level set L_a is compact, then all nonempty level sets are compact.

Proof: From the formula for the epigraph

    epi(f) = {(x, w) | f(x) <= w},

it can be seen that for all a for which L_a is nonempty, we have

    {(x, a) | x ∈ L_a} = epi(f) ∩ {(x, a) | x ∈ R^n}.

The recession cone of the set in the left-hand side above is {(y, 0) | y ∈ R_{L_a}}. Since f is closed, its level sets are closed, so by the Recession Cone Theorem (cf. Prop. 1.2.13), the recession cone of the set in the right-hand side is equal to the intersection of the recession cone of epi(f) and the recession cone of {(x, a) | x ∈ R^n}, which is equal to {(y, 0) | y ∈ R^n}, the horizontal subspace that passes through the origin. Thus we have

    {(y, 0) | y ∈ R_{L_a}} = {(y, 0) | (y, 0) ∈ R_epi(f)},

from which it follows that

    R_{L_a} = {y | (y, 0) ∈ R_epi(f)}.

This proves part (a). Part (b) follows by applying Prop. 1.2.13(c) to the recession cone of epi(f). Q.E.D.

For a closed convex function f : R^n -> (-∞, ∞], the (common) recession cone of the nonempty level sets of f is referred to as the recession cone of f, and is denoted by R_f (see Fig. 1.3.3). By Prop. 1.2.13(e), R_f is also closed. Each y ∈ R_f is called a direction of recession of f. Thus

    R_f = {y | f(x + αy) <= f(x), for all x ∈ R^n and α >= 0}.
[Figure 1.3.3. Recession cone R_f of a closed convex function f. It is the (common) recession cone of the nonempty level sets of f.]

The most intuitive way to look at directions of recession is from a descent viewpoint: if we start at any x ∈ R^n and move indefinitely along a direction of recession y, we must stay within each level set that contains x, or equivalently we must encounter exclusively points z with f(z) <= f(x). In words, a direction of recession of f is a direction of uninterrupted nonascent for f. Conversely, if we start at some x ∈ R^n and while moving along a direction y, we encounter a point z with f(z) > f(x), then y cannot be a direction of recession. It is easily seen via a convexity argument that once we cross the relative boundary of a level set of f we never cross it back again, and with a little thought (see Fig. 1.3.4 and the exercises), it follows that a direction that is not a direction of recession of f is a direction of eventual uninterrupted ascent of f.

[Figure 1.3.4. Ascent/descent behavior of a closed convex function starting at some x ∈ R^n and moving along a direction y. If y is a direction of recession of f, there are two possibilities: either f decreases monotonically to a finite value or to -∞ [figures (a) and (b), respectively], or f reaches a value that is less or equal to f(x) and stays at that value [figures (c) and (d)]. If y is not a direction of recession of f, then eventually f increases monotonically to ∞ [figures (e) and (f)]; i.e., for some α' >= 0 and all α_1, α_2 >= α' with α_1 < α_2 we have f(x + α_1 y) < f(x + α_2 y).]

In view of these observations, it is not surprising that directions of recession play a prominent role in characterizing the existence of solutions of convex optimization problems, as shown in the following proposition.

Proposition 1.3.5: (Existence of Solutions of Convex Programs) Let X be a closed convex subset of R^n, and let f : R^n -> (-∞, ∞] be a closed convex function such that f(x) < ∞ for at least one vector x ∈ X. Let X* be the set of minimizing points of f over X. Then X* is nonempty and compact if and only if X and f have no common nonzero direction of recession.

Proof: Let f* = inf_{x∈X} f(x), and note that

    X* = X ∩ {x | f(x) <= f*}.

Since by assumption f* < ∞, and X and f are closed, the two sets in the above intersection are closed, so X* is closed as well as convex. If X* is nonempty and compact, it has no nonzero direction of recession [Prop. 1.2.13(c)]. Therefore, there is no nonzero vector in the intersection of the recession cones of X and {x | f(x) <= f*}. This is equivalent to X and f having no common nonzero direction of recession.
Conversely, let a' be a scalar such that the set

    X_a' = X ∩ {x | f(x) <= a'}

is nonempty and has no nonzero direction of recession. Then, X_a' is closed [since X is closed and {x | f(x) <= a'} is closed by the closedness of f], and by Prop. 1.2.13(c), X_a' is compact. Since minimization of f over X and over X_a' yields the same set of minima, by Weierstrass' Theorem (Prop. 1.3.1) X* is nonempty, and since X* ⊂ X_a', we see that X* is bounded. Since X* is closed, it is compact. Q.E.D.

If the closed convex set X and the closed convex function f of the above proposition have a common direction of recession, then either X* is empty [take for example, X = (-∞, 0] and f(x) = e^x] or else X* is nonempty and unbounded [take for example, X = (-∞, 0] and f(x) = max{0, x}].

Another interesting question is what happens when X and f have a common direction of recession, call it y, but f is bounded below over X:

    f* = inf_{x∈X} f(x) > -∞.

Then for any x ∈ X, we have x + αy ∈ X (since y is a direction of recession of X), and f(x + αy) is monotonically nondecreasing to a finite value as α -> ∞ (since y is a direction of recession of f and f* > -∞). Generally, the minimum of f over X need not be attained. However, it turns out that the minimum is attained in an important special case: when f is quadratic and X is polyhedral (i.e., X is defined by linear equality and inequality constraints).

To understand the main idea, consider the problem

    minimize  f(x) = c'x + (1/2) x'Qx
    subject to  Ax = 0,    (1.10)

where Q is a positive semidefinite symmetric n × n matrix, c ∈ R^n is a given vector, and A is an m × n matrix. Let N(Q) and N(A) denote the nullspaces of Q and A, respectively. There are two possibilities:

(a) For some x ∈ N(A) ∩ N(Q), we have c'x ≠ 0. Then, since f(αx) = αc'x for all α ∈ R, it follows that f becomes unbounded from below either along x or along -x.

(b) For all x ∈ N(A) ∩ N(Q), we have c'x = 0. In this case, we have f(x) = 0 for all x ∈ N(A) ∩ N(Q). For x ∈ N(A) such that x ∉ N(Q), x can be uniquely decomposed as x_R + x_N, where x_N ∈ N(Q) and x_R is the (nonzero) component of x along R(Q), the range of Q, since N(Q) and R(Q) are orthogonal subspaces. We then have f(x) = c'x + (1/2) x_R' Q x_R, with x_R' Q x_R > 0. Hence

    f(αx) = αc'x + (1/2) α^2 x_R' Q x_R,    for all α > 0.

It follows that f is bounded below along all feasible directions x ∈ N(A).

We thus conclude that for f to be bounded from below along all directions in N(A) it is necessary and sufficient that c'x = 0 for all x ∈ N(A) ∩ N(Q). This argument indicates that problem (1.10) has an optimal solution if and only if c'x = 0 for all x ∈ N(A) ∩ N(Q). By using a translation argument, this result can also be extended to the case where the constraint set is a general affine set of the form {x | Ax = b} rather than the subspace {x | Ax = 0}.

In part (a) of the following proposition we state the result just described (equality constraints only). In part (b) of the proposition, we allow linear inequality constraints, and we show that a convex quadratic program has an optimal solution if and only if its optimal value is bounded below. Note that the cost function may be linear, so the proposition applies to linear programs as well. On the other hand, boundedness from below of a convex cost function f along all directions of recession of a constraint set does not guarantee existence of an optimal solution, or even boundedness from below over the constraint set (see the exercises).

While we can prove the result by formalizing the argument outlined above, we will use instead a more elementary variant of this argument, whereby the constraints are eliminated via a penalty function. In particular, since the constraint set N(A) is a subspace, it is possible to use a transformation x = Bz, where the columns of the matrix B are basis vectors for N(A), and view the problem as an unconstrained minimization over z of the cost function h(z) = f(Bz), which is positive semidefinite quadratic. We can then argue that boundedness from below of this function along all directions z is necessary and sufficient for existence of an optimal solution. The penalty function variant will give us the opportunity to introduce a line of proof that we will frequently employ in other contexts as well.
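The dichotomy (a)/(b) above is easy to observe numerically. The sketch below (ours, not from the notes) uses Q = diag(1, 0) and A = [1 0], so that N(A) ∩ N(Q) is the span of the second coordinate vector, and compares a vector c that violates the condition c'x = 0 on this subspace with one that satisfies it.

```python
import numpy as np

Q = np.diag([1.0, 0.0])        # positive semidefinite, N(Q) = span{e2}
A = np.array([[1.0, 0.0]])     # N(A) = span{e2}, so N(A) ∩ N(Q) = span{e2}

def f(x, c):
    # quadratic cost f(x) = c'x + (1/2) x'Qx of problem (1.10)
    return float(c @ x + 0.5 * x @ Q @ x)

e2 = np.array([0.0, 1.0])              # a basis of N(A) ∩ N(Q)
ts = np.linspace(-1e6, 1e6, 5)         # step sizes along the feasible subspace

c_bad = np.array([0.0, 1.0])           # c'e2 = 1 != 0: case (a)
c_good = np.array([1.0, 0.0])          # c'e2 = 0: case (b)

vals_bad = [f(t * e2, c_bad) for t in ts]    # f(t*e2) = t: unbounded below
vals_good = [f(t * e2, c_good) for t in ts]  # f is identically 0 on N(A) ∩ N(Q)
print(min(vals_bad), min(vals_good))
```

With `c_bad` the cost decreases without bound along the feasible direction -e2, so no minimum exists; with `c_good` the cost is constant along N(A) ∩ N(Q) and the minimum over {Ax = 0} is attained (at every point of the subspace).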
Proposition 1.3.6: (Existence of Solutions of Quadratic Programs) Let f : R^n -> R be a quadratic function of the form

    f(x) = c'x + (1/2) x'Qx,

where Q is a positive semidefinite symmetric n × n matrix and c ∈ R^n is a given vector. Let also A be an m × n matrix and b ∈ R^m be a vector. Denote by N(A) and N(Q) the nullspaces of A and Q, respectively.

(a) Let X = {x | Ax = b} and assume that X is nonempty. The following are equivalent:

    (i) f attains a minimum over X.
    (ii) f* = inf_{x∈X} f(x) > -∞.
    (iii) c'y = 0 for all y ∈ N(A) ∩ N(Q).

(b) Let X = {x | Ax <= b} and assume that X is nonempty. The following are equivalent:

    (i) f attains a minimum over X.
    (ii) f* = inf_{x∈X} f(x) > -∞.
    (iii) c'y >= 0 for all y ∈ N(Q) such that Ay <= 0.

Proof: (a) (i) clearly implies (ii). We next show that (ii) implies (iii). For all x ∈ X, y ∈ N(A) ∩ N(Q), and α ∈ R, we have x + αy ∈ X and

    f(x + αy) = c'(x + αy) + (1/2)(x + αy)'Q(x + αy) = f(x) + αc'y.

If c'y ≠ 0, then either lim_{α→∞} f(x + αy) = -∞ or lim_{α→-∞} f(x + αy) = -∞, and we must have f* = -∞. Hence (ii) implies that c'y = 0 for all y ∈ N(A) ∩ N(Q).

We finally show that (iii) implies (i) by first using a translation argument and then using a penalty function argument. Choose any x' ∈ X, so that X = x' + N(A). Then minimizing f over X is equivalent to minimizing f(x' + y) over y ∈ N(A), or

    minimize  h(y)
    subject to  Ay = 0,

where

    h(y) = f(x' + y) = f(x') + ∇f(x')'y + (1/2) y'Qy.
For any integer k > 0, let

    h_k(y) = h(y) + (k/2) ||Ay||^2,    (1.11)

and note that for all k,

    h_k(y) <= h_{k+1}(y),    for all y ∈ R^n,    (1.12)

and

    inf_{y∈R^n} h_k(y) <= inf_{y∈R^n} h_{k+1}(y) <= inf_{Ay=0} h(y) <= h(0) = f(x').    (1.13)

Denote S = N(A) ∩ N(Q), and write any y ∈ R^n as y = z + w, where z ∈ S⊥ and w ∈ S. Then, by using the assumption c'w = 0 [implying that ∇f(x')'w = 0], it can be seen from Eq. (1.11) that

    h_k(y) = h_k(z + w) = h_k(z),

i.e., h_k is determined in terms of its restriction to the subspace S⊥. It can also be seen from Eq. (1.11) that the function h_k has no nonzero direction of recession in common with S⊥, so h_k attains a minimum over S⊥, call it y_k. In view of Eq. (1.12), y_k also attains the minimum of h_k over R^n. From Eq. (1.13), we have

    h_k(y_k) <= h_{k+1}(y_{k+1}) <= inf_{Ay=0} h(y) <= f(x'),    (1.14)

and we will use this relation to show that {y_k} is bounded and each of its limit points minimizes h(y) subject to Ay = 0.

Indeed, from Eq. (1.14), the sequence {h_k(y_k)} is bounded, so if {y_k} were unbounded, then assuming without loss of generality that y_k ≠ 0, we would have h_k(y_k)/||y_k||^2 -> 0, or

    lim_{k→∞} [ f(x')/||y_k||^2 + ∇f(x')'y^_k/||y_k|| + (1/2) y^_k'Q y^_k + (k/2) ||A y^_k||^2 ] = 0,

where y^_k = y_k/||y_k||. Thus, all limit points y^ of the bounded sequence {y^_k} must be such that y^'Q y^ = 0 and A y^ = 0, i.e., y^ ∈ S, which is impossible since ||y^|| = 1 and y^ ∈ S⊥.

Thus {y_k} is bounded, and for any one of its limit points, call it y', we have y' ∈ S⊥ and

    lim sup_{k→∞} h_k(y_k) = f(x') + ∇f(x')'y' + (1/2) y''Qy' + lim sup_{k→∞} (k/2) ||Ay_k||^2 <= inf_{Ay=0} h(y).
i. since y ∈ N (A). let J (x) denote the index set of active constraints at x. and shows that problem (1. 1. we cannot have J ⊂ J unless J = J ]. For any x ∈ X . and the index set J (xk ) is equal for all k to a set J that is maximal over all such sequences [for any other sequence {xk } ⊂ X with f (xk ) → f ∗ and such that J (xk ) = J for all k. γ > 0.3 Convexity and Optimization 75 It follows that Ay = 0 and that y minimizes h(y) over Ay = 0. to come to a contradiction. This implies that the vector y = x + y minimizes f(x) subject to Ax = b. J (x) = {j  aj x = bj }. The sequence {xk }. j ∈ J. f (xk ) → f ∗ [since f (xk ) ≤ f(xk )]. where the aj are the rows of A.. where A is the matrix having as rows the aj . ∀ γ > 0. Consider the line {xk + γy  γ > 0}. call it x. For any sequence {xk } ⊂ X with f (xk ) → f ∗ .15) has an optimal solution. Since xk is a feasible solution of problem (1. we have f (xk + γy) = f (xk ) + γc y. where xk = xk + γ k y. we show that (iii) implies (i) by using the corresponding result of part (a). we select a sequence {xk } ⊂ X such that f(xk ) → f ∗ . (1. we can extract a subsequence such that J (xk ) is constant and equal to some J . ∀ k. This contradicts the maximality of J . Since y ∈ N (Q). that this problem does not have a solution. (b) Clearly (i) implies (ii).Sec. and the active index set J (xk ) strictly contains J for all k. we have c y < 0 for some y ∈ N (A) ∩ N (Q). we have f(x) ≤ f (xk ).15). and similar to the proof of part (a). satisﬁes {xk } ⊂ X. . Furthermore. we have aj (xk + γy) = bj . j ∈ J .e. Accordingly. by part (a).15) Assume. (ii) implies that c y ≥ 0 for all y ∈ N (Q) with Ay ≤ 0. so the line {xk + γy  γ > 0} crosses the relative boundary of X for some γ k > 0. Consider the problem minimize f (x) subject to aj x = bj . so that f(x) ≤ f ∗ . Then. ∀ j ∈ J. Finally. so that f(xk + γy) < f (xk ). We must also have aj y > 0 for at least one j ∈ A [otherwise (iii) would / be violated].
We will now show that x' minimizes f over X, by showing that x' ∈ X. Assume, to arrive at a contradiction, that x' ∉ X. Let x^_k be a point in the interval connecting x_k and x' that belongs to X and is closest to x'. We have that J(x^_k) strictly contains J' for all k. Since f(x') <= f(x_k) and f is convex over the interval [x_k, x'], it follows that

    f(x^_k) <= max{f(x_k), f(x')} = f(x_k).

Thus f(x^_k) -> f*, which contradicts the maximality of J', thereby completing the proof. Q.E.D.

1.3.4 Existence of Saddle Points

Suppose that we are given a function φ : X × Z -> R, where X ⊂ R^n, Z ⊂ R^m, and we wish to either

    minimize  sup_{z∈Z} φ(x, z)
    subject to  x ∈ X

or

    maximize  inf_{x∈X} φ(x, z)
    subject to  z ∈ Z.

These problems are encountered in at least three major optimization contexts:

(1) Worst-case design, whereby we view z as a parameter and we wish to minimize over x a cost function, assuming the worst possible value of z. A special case of this is the discrete minimax problem, where we want to minimize over x ∈ X

    max{f_1(x), ..., f_m(x)},

where the f_i are some given functions. Here, Z is the finite set {1, ..., m}. Within this context, it is important to provide characterizations of the max function

    max_{z∈Z} φ(x, z),

particularly its directional derivative. We do this in Section 1.7, where we discuss the differentiability properties of convex functions.

(2) Exact penalty functions, which can be used for example to convert constrained optimization problems of the form

    minimize  f(x)
    subject to  x ∈ X,    g_j(x) <= 0,    j = 1, ..., r,    (1.16)

to (less constrained) minimax problems of the form

    minimize  f(x) + c max{0, g_1(x), ..., g_r(x)}
    subject to  x ∈ X,

where c is a large positive penalty parameter. This conversion is useful for both analytical and computational purposes, and will be discussed in Chapters 2 and 4.

(3) Duality theory, where using problem (1.16) as an example, we introduce the, so called, Lagrangian function

    L(x, µ) = f(x) + Σ_{j=1}^r µ_j g_j(x),

involving the vector µ = (µ_1, ..., µ_r) ∈ R^r, and the dual problem

    maximize  inf_{x∈X} L(x, µ)
    subject to  µ >= 0.    (1.17)

The original (primal) problem (1.16) can also be written as

    minimize  sup_{µ>=0} L(x, µ)
    subject to  x ∈ X

[if x violates any of the constraints g_j(x) <= 0, we have sup_{µ>=0} L(x, µ) = ∞, and if it does not, we have sup_{µ>=0} L(x, µ) = f(x)]. Thus the primal and the dual problems (1.16) and (1.17) can be viewed in terms of a minimax problem. In particular, when we discuss duality in Chapter 3, we will see that a major question is whether there is no duality gap, i.e., whether the optimal primal and dual values are equal. This is so if and only if

    sup_{µ>=0} inf_{x∈X} L(x, µ) = inf_{x∈X} sup_{µ>=0} L(x, µ).    (1.19)

We will now derive conditions guaranteeing that

    sup_{z∈Z} inf_{x∈X} φ(x, z) = inf_{x∈X} sup_{z∈Z} φ(x, z),    (1.18)

and that the inf and the sup above are attained. This is a major issue in duality theory because it connects the primal and the dual problems [cf. Eqs. (1.16) and (1.17)] through their optimal values and optimal solutions. We will prove in this section one major result, the Saddle Point Theorem, which guarantees the equality (1.18), as well as the attainment of the inf and sup, assuming convexity/concavity assumptions on φ and (essentially) compactness assumptions on X and Z.
Unfortunately, the Saddle Point Theorem is only partially adequate for the development of duality theory, because compactness of Z and, to some extent, compactness of X turn out to be restrictive assumptions [for example, Z corresponds to the set {µ | µ >= 0} in Eq. (1.19), which is not compact]. We will state another major result, the Minimax Theorem, which is more relevant to duality theory and gives conditions guaranteeing the minimax equality (1.18), although it does not guarantee the attainment of the inf and sup. We will also give additional theorems of the minimax type in Chapter 3, when we discuss duality and make a closer connection with the theory of Lagrange multipliers.

A first observation regarding the potential validity of the minimax equality (1.18) is that we always have the inequality

    sup_{z∈Z} inf_{x∈X} φ(x, z) <= inf_{x∈X} sup_{z∈Z} φ(x, z)    (1.20)

[for every z' ∈ Z, write inf_{x∈X} φ(x, z') <= inf_{x∈X} sup_{z∈Z} φ(x, z) and take the supremum over z' ∈ Z of the left-hand side]. However, special conditions are required to guarantee equality.

Suppose that x* is an optimal solution of the problem

    minimize  sup_{z∈Z} φ(x, z)
    subject to  x ∈ X    (1.21)

and z* is an optimal solution of the problem

    maximize  inf_{x∈X} φ(x, z)
    subject to  z ∈ Z.    (1.22)

† The Saddle Point Theorem is also central in game theory, as we now briefly explain. In the simplest type of zero sum game, there are two players: the first may choose one out of n moves and the second may choose one out of m moves. If moves i and j are selected by the first and the second player, respectively, the first player gives a specified amount a_ij to the second. The objective of the first player is to minimize the amount given to the other player, and the objective of the second player is to maximize this amount. The players use mixed strategies, whereby the first player selects a probability distribution x = (x_1, ..., x_n) over his n possible moves and the second player selects a probability distribution z = (z_1, ..., z_m) over his m possible moves. Since the probability of selecting i and j is x_i z_j, the expected amount to be paid by the first player to the second is Σ_{i,j} a_ij x_i z_j or x'Az, where A is the n × m matrix with elements a_ij. If each player adopts a worst case viewpoint, whereby he optimizes his choice against the worst possible selection by the other player, the first player must minimize max_z x'Az and the second player must maximize min_x x'Az. The main result, a special case of the existence result we will prove shortly, is that these two optimal values are equal, implying that there is an amount that can be meaningfully viewed as the value of the game for its participants.
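The footnote's claim that min_x max_z x'Az = max_z min_x x'Az can be checked numerically for a small game. The sketch below (ours, not from the notes) uses the 2 × 2 matrix A = [[0, 1], [1, 0]], whose value is 1/2, attained by the uniform mixed strategy for both players; the mixed strategies are scanned over a grid rather than solved for exactly.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # payoff a_ij from player 1 to player 2

ps = np.linspace(0.0, 1.0, 101)   # mixed strategies x = (p, 1 - p)
qs = np.linspace(0.0, 1.0, 101)   # mixed strategies z = (q, 1 - q)

def payoff(p, q):
    # expected payment x'Az for the mixed strategies above
    x = np.array([p, 1.0 - p])
    z = np.array([q, 1.0 - q])
    return float(x @ A @ z)

# each player's worst-case optimal value over the strategy grids
minimax = min(max(payoff(p, q) for q in qs) for p in ps)
maximin = max(min(payoff(p, q) for p in ps) for q in qs)
print(minimax, maximin)  # both approximately 0.5: the game has a value
```

Since x'Az is linear in each player's strategy for the other fixed, the grid scan suffices here; for larger games the two values are computed by a pair of dual linear programs.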
Then we have

    sup_{z∈Z} inf_{x∈X} φ(x, z) = inf_{x∈X} φ(x, z*) <= φ(x*, z*) <= sup_{z∈Z} φ(x*, z) = inf_{x∈X} sup_{z∈Z} φ(x, z).    (1.23)

If the minimax equality [cf. Eq. (1.18)] holds, then equality holds throughout above, so that

    sup_{z∈Z} φ(x*, z) = φ(x*, z*) = inf_{x∈X} φ(x, z*),    (1.24)

or equivalently

    φ(x*, z) <= φ(x*, z*) <= φ(x, z*),    for all x ∈ X, z ∈ Z.    (1.25)

A pair of vectors x* ∈ X and z* ∈ Z satisfying the two above (equivalent) relations is called a saddle point of φ (cf. Fig. 1.3.5).

We have thus shown that if the minimax equality (1.18) holds, any vectors x* and z* that are optimal solutions of problems (1.21) and (1.22), respectively, form a saddle point. Conversely, if (x*, z*) is a saddle point, then the definition (1.24) implies that

    inf_{x∈X} sup_{z∈Z} φ(x, z) <= sup_{z∈Z} φ(x*, z) = φ(x*, z*) = inf_{x∈X} φ(x, z*) <= sup_{z∈Z} inf_{x∈X} φ(x, z).

This, together with the minimax inequality (1.20), implies that the minimax equality (1.18) holds, and x* and z* are optimal solutions of problems (1.21) and (1.22), respectively. We summarize the above discussion in the following proposition.

Proposition 1.3.7: A pair (x*, z*) is a saddle point of φ if and only if the minimax equality (1.18) holds, and x* and z* are optimal solutions of problems (1.21) and (1.22), respectively.

Note a simple consequence of the above proposition: the set of saddle points, when nonempty, is the Cartesian product X* × Z*, where X* and Z* are the sets of optimal solutions of problems (1.21) and (1.22), respectively. In other words, x* and z* can be independently chosen within the sets X* and Z* to form a saddle point. Note also that if the minimax equality (1.18) does not hold, there is no saddle point, even if the sets X* and Z* are nonempty.

One can visualize saddle points in terms of the sets of minimizing points over X for fixed z ∈ Z and maximizing points over Z for fixed x ∈ X:

    X^(z) = {x | x minimizes φ(x, z) over X},
    Z^(x) = {z | z maximizes φ(x, z) over Z}.
[Figure 1.3.5. Illustration of a saddle point of a function φ(x, z) over x ∈ X and z ∈ Z; the function plotted here is φ(x, z) = (1/2)(x^2 + 2xz - z^2). Let

    x^(z) = arg min_{x∈X} φ(x, z),    z^(x) = arg max_{z∈Z} φ(x, z).

In the case illustrated, x^(z) and z^(x) consist of unique minimizing and maximizing points, respectively, so we view x^(z) and z^(x) as (single-valued) functions; otherwise x^(z) and z^(x) should be viewed as set-valued mappings. We consider the corresponding curves φ(x^(z), z) (the curve of minima) and φ(x, z^(x)) (the curve of maxima). By definition, a pair (x*, z*) is a saddle point if and only if

    max_{z∈Z} φ(x*, z) = φ(x*, z*) = min_{x∈X} φ(x, z*),

or equivalently, if (x*, z*) lies on both curves [x* = x^(z*) and z* = z^(x*)]. At such a pair, we also have

    max_{z∈Z} φ(x^(z), z) = max_{z∈Z} min_{x∈X} φ(x, z) = φ(x*, z*) = min_{x∈X} max_{z∈Z} φ(x, z) = min_{x∈X} φ(x, z^(x)),

so that

    φ(x^(z), z) <= φ(x*, z*) <= φ(x, z^(x)),    for all x ∈ X, z ∈ Z

(see Prop. 1.3.7). Visually, the curve of maxima φ(x, z^(x)) must lie "above" the curve of minima φ(x^(z), z) (completely, i.e., for all x ∈ X and z ∈ Z), but the two curves should meet at (x*, z*).]
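The function of Fig. 1.3.5 makes the saddle point relations concrete. The following sketch (ours, not part of the notes) verifies numerically that (x*, z*) = (0, 0) satisfies the saddle inequalities (1.25) for φ(x, z) = (1/2)(x^2 + 2xz - z^2), and that the minimax equality (1.18) holds on a grid over X = Z = [-1, 1].

```python
import numpy as np

def phi(x, z):
    # the function of Fig. 1.3.5: convex in x, concave in z
    return 0.5 * (x**2 + 2.0 * x * z - z**2)

# x^(z) = argmin_x phi = -z and z^(x) = argmax_z phi = x;
# their intersection x* = -z*, z* = x* gives the saddle point (0, 0)
xs = np.linspace(-1.0, 1.0, 201)
zs = np.linspace(-1.0, 1.0, 201)

# saddle inequalities (1.25): phi(0, z) <= phi(0, 0) <= phi(x, 0)
assert all(phi(0.0, z) <= phi(0.0, 0.0) + 1e-12 for z in zs)
assert all(phi(0.0, 0.0) <= phi(x, 0.0) + 1e-12 for x in xs)

# minimax equality (1.18) on the grid
minimax = min(max(phi(x, z) for z in zs) for x in xs)
maximin = max(min(phi(x, z) for x in xs) for z in zs)
print(minimax, maximin)  # both approximately 0 = phi(x*, z*)
```

Here max_z phi(x, z) = x^2 and min_x phi(x, z) = -z^2, so the curve of maxima indeed lies above the curve of minima, and the two touch exactly at the saddle point (0, 0).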
  Ẑ(x) = { z̄ | z̄ maximizes φ(x, z) over Z }.

The definition implies that the pair (x*, z*) is a saddle point if and only if it is a "point of intersection" of X̂(·) and Ẑ(·) in the sense that

  x* ∈ X̂(z*),   z* ∈ Ẑ(x*);

see Fig. 1.3.5.

We now consider conditions that guarantee the existence of a saddle point. We have the following classical result.

Proposition 1.3.8: (Saddle Point Theorem) Let X and Z be closed, convex subsets of ℜⁿ and ℜᵐ, respectively, and let φ : X × Z → ℜ be a function. Assume that for each z ∈ Z, the function φ(·, z) : X → ℜ is convex and lower semicontinuous, and for each x ∈ X, the function φ(x, ·) : Z → ℜ is concave and upper semicontinuous. Then there exists a saddle point of φ under any one of the following four conditions:
(1) X and Z are compact.
(2) Z is compact and there exists a vector z̄ ∈ Z such that φ(·, z̄) is coercive.
(3) X is compact and there exists a vector x̄ ∈ X such that −φ(x̄, ·) is coercive.
(4) There exist vectors x̄ ∈ X and z̄ ∈ Z such that φ(·, z̄) and −φ(x̄, ·) are coercive.

Proof: We first prove the result under the assumption that X and Z are compact [condition (1)], and the additional assumption that φ(x, ·) is strictly concave for each x ∈ X. By Weierstrass' Theorem (Prop. 1.3.1), the function

  f(x) = max_{z∈Z} φ(x, z),   x ∈ X,

is real-valued, and the maximum above is attained for each x ∈ X at a point denoted ẑ(x), which is unique by the strict concavity assumption. Furthermore, by Prop. 1.3.3(c), f is lower semicontinuous, so again by Weierstrass' Theorem, f attains a minimum over x ∈ X at some point x*. Let z* = ẑ(x*), so that

  φ(x*, z*) = f(x*).

We will show that (x*, z*) is a saddle point of φ. Since z* maximizes φ(x*, ·) over Z, we have φ(x*, z) ≤ φ(x*, z*) for all z ∈ Z, and in view of the above relation, it will suffice to show that

  φ(x*, z*) ≤ φ(x, z*),   ∀ x ∈ X.   (1.26)
Choose any x ∈ X and let

  x_k = (1/k) x + (1 − 1/k) x*,   z_k = ẑ(x_k),   k = 1, 2, ....   (1.27)

Then, since x* minimizes f over X and φ(·, z_k) is convex, we have

  f(x*) ≤ f(x_k) = φ(x_k, z_k) ≤ (1/k) φ(x, z_k) + (1 − 1/k) φ(x*, z_k),   k = 1, 2, ....   (1.28)

Let z̄ be any limit point of {z_k} corresponding to a subsequence K of positive integers. Taking the limit as k → ∞ and k ∈ K in Eq. (1.28), and using the upper semicontinuity of φ(x*, ·) over Z, we obtain

  f(x*) ≤ lim sup_{k→∞, k∈K} φ(x*, z_k) ≤ φ(x*, z̄) ≤ max_{z∈Z} φ(x*, z) = f(x*).

Hence equality holds throughout above, and it follows that z̄ maximizes φ(x*, ·) over Z. Since z* is the unique maximizer of φ(x*, ·), we see that z̄ = ẑ(x*) = z*, so that {z_k} has z* as its unique limit point, i.e., z_k → z*.

Combining Eq. (1.28) with the relation φ(x*, z_k) ≤ max_{z∈Z} φ(x*, z) = f(x*), we obtain

  f(x*) ≤ (1/k) φ(x, z_k) + (1 − 1/k) f(x*),

or equivalently φ(x*, z*) = f(x*) ≤ φ(x, z_k) for all k. Taking the limit as z_k → z*, and using the upper semicontinuity of φ(x, ·), we obtain

  φ(x*, z*) ≤ lim sup_{k→∞} φ(x, z_k) ≤ φ(x, z*).

We have shown that for any x ∈ X, the pair (x*, z*) satisfies Eq. (1.26), independently of the choice of the vector x within X (this is the fine point in the argument where the strict concavity assumption is needed), so (x*, z*) is a saddle point of φ.

Next, we remove the assumption of strict concavity of φ(x, ·). We introduce the functions φ_k : X × Z → ℜ given by

  φ_k(x, z) = φ(x, z) − (1/k) ‖z‖²,   k = 1, 2, ....

Since X and Z are compact, and φ_k(x, ·) is strictly concave for each x ∈ X, by what has already been proved there exists for each k a saddle point (x*_k, z*_k) of φ_k.
Since (x*_k, z*_k) is a saddle point of φ_k, we have

  φ(x*_k, z) − (1/k)‖z‖² ≤ φ(x*_k, z*_k) − (1/k)‖z*_k‖² ≤ φ(x, z*_k) − (1/k)‖z*_k‖²,   ∀ x ∈ X, z ∈ Z.   (1.29)

Since X and Z are compact, the sequence {(x*_k, z*_k)} is bounded. Let (x*, z*) be a limit point of {(x*_k, z*_k)}. By taking the limit as k → ∞ along the corresponding subsequence and by using the semicontinuity assumptions on φ, it follows from Eq. (1.29) that

  φ(x*, z) ≤ lim inf_{k→∞} φ(x*_k, z) ≤ lim sup_{k→∞} φ(x, z*_k) ≤ φ(x, z*),   ∀ x ∈ X, z ∈ Z.   (1.30)

By setting alternately x = x* and z = z* in the above relation, we obtain

  φ(x*, z) ≤ φ(x*, z*) ≤ φ(x, z*),   ∀ x ∈ X, z ∈ Z,

so (x*, z*) is a saddle point of φ.

We now prove the existence of a saddle point under conditions (2), (3), or (4). For each integer k, we introduce the convex and compact sets

  X_k = { x ∈ X | ‖x‖ ≤ k },   Z_k = { z ∈ Z | ‖z‖ ≤ k },

with the convention that Z_k = Z under condition (2) and X_k = X under condition (3). Choose k̄ large enough so that for all k ≥ k̄ the sets X_k and Z_k are nonempty and contain the vectors x̄ and/or z̄ of the corresponding condition [thus k̄ ≥ ‖z̄‖ if condition (2) holds, k̄ ≥ ‖x̄‖ if condition (3) holds, and k̄ ≥ max{‖x̄‖, ‖z̄‖} if condition (4) holds]. Note that every x ∈ X and every z ∈ Z belong to X_k and Z_k, respectively, for all sufficiently large k. Using the result already shown under condition (1), for each k ≥ k̄ there exists a saddle point of φ over X_k × Z_k, i.e., a pair (x_k, z_k) such that

  φ(x_k, z) ≤ φ(x_k, z_k) ≤ φ(x, z_k),   ∀ x ∈ X_k, z ∈ Z_k.   (1.31)

Assume that condition (2) holds. Then, since Z is compact, {z_k} is bounded. If {x_k} were unbounded, the coercivity of φ(·, z̄) would imply that φ(x_k, z̄) → ∞, while from Eq. (1.31) it would follow that φ(x_k, z̄) ≤ φ(x̄, z_k) for all k ≥ k̄; since {z_k} is bounded and φ(x̄, ·) is upper semicontinuous, the right-hand side is bounded above, a contradiction. Hence {x_k} must be bounded, so (x_k, z_k) must have a limit point (x*, z*). Taking the limit as k → ∞ in Eq. (1.31), and using the semicontinuity assumptions on φ, we obtain

  φ(x*, z) ≤ φ(x*, z*) ≤ φ(x, z*),   ∀ x ∈ X, z ∈ Z,

so (x*, z*) is a saddle point of φ.
A symmetric argument, with the obvious modifications, shows the result under condition (3). Finally, assume that condition (4) holds. If {x_k} were unbounded, the coercivity of φ(·, z̄) would imply that φ(x_k, z̄) → ∞, contradicting Eq. (1.31) as in the case of condition (2). Hence {x_k} must be bounded, and a symmetric argument, using the coercivity of −φ(x̄, ·), shows that {z_k} must be bounded. Thus (x_k, z_k) must have a limit point (x*, z*), and the result then follows from Eq. (1.31), similar to the case where condition (2) holds. Q.E.D.

The assumptions of compactness/coercivity and lower/upper semicontinuity of φ are essential for the existence of a saddle point (just as they are essential in Weierstrass' Theorem). It is also easy to construct examples showing that the convexity of X and Z are essential assumptions for the above proposition (this is also evident from Fig. 1.3.5). An interesting question is whether convexity/concavity and lower/upper semicontinuity of φ are sufficient to guarantee the minimax equality (1.18). Unfortunately this is not so, for reasons that also touch upon some of the deeper aspects of duality theory (see Chapter 3). Here is an example:

Example 1.3.1: Let

  X = { x ∈ ℜ² | x ≥ 0 },   Z = { z ∈ ℜ | z ≥ 0 },

and let

  φ(x, z) = e^{−√(x₁x₂)} + z x₁,

which satisfy the convexity/concavity and lower/upper semicontinuity assumptions of Prop. 1.3.8. For all z ≥ 0, we have

  inf_{x≥0} { e^{−√(x₁x₂)} + z x₁ } = 0,

since the expression in braces is nonnegative for x ≥ 0 and can approach zero by taking x₁ → 0 and x₁x₂ → ∞. Hence

  sup_{z≥0} inf_{x≥0} φ(x, z) = 0.

We also have for all x ≥ 0,

  sup_{z≥0} { e^{−√(x₁x₂)} + z x₁ } = { 1 if x₁ = 0;  ∞ if x₁ > 0 }.
Hence

  inf_{x≥0} sup_{z≥0} φ(x, z) = 1,

so inf_{x≥0} sup_{z≥0} φ(x, z) > sup_{z≥0} inf_{x≥0} φ(x, z), and the minimax equality (1.18) does not hold. The difficulty here is that the compactness/coercivity assumptions of Prop. 1.3.8 are violated; in particular, Z is not compact, and since φ(x, ·) is linear in z for every x ∈ X, it violates the coercivity of −φ(x, ·).

On the other hand, it is possible to guarantee the minimax equality (1.18) without guaranteeing the existence of a saddle point. This can be done with convexity/concavity and lower/upper semicontinuity of φ, together with one additional assumption, given in the following proposition. Let the function p : ℜᵐ → [−∞, ∞] be defined by

  p(u) = inf_{x∈X} sup_{z∈Z} { φ(x, z) − u'z }.   (1.32)

It turns out that, aside from convexity and semicontinuity assumptions, the properties of the function p around u = 0 are critical.

Proposition 1.3.9: (Minimax Theorem) Let X and Z be nonempty convex subsets of ℜⁿ and ℜᵐ, respectively, and let φ : X × Z → ℜ be a function. Assume that for each z ∈ Z, the function φ(·, z) : X → ℜ is convex and lower semicontinuous, and for each x ∈ X, the function φ(x, ·) : Z → ℜ is concave and upper semicontinuous. Assume further that −∞ < p(0) < ∞, and that

  p(0) ≤ lim inf_{k→∞} p(u_k)

for every sequence {u_k} with u_k → 0. Then the minimax equality holds, i.e.,

  sup_{z∈Z} inf_{x∈X} φ(x, z) = inf_{x∈X} sup_{z∈Z} φ(x, z).

Within the duality context discussed in the beginning of this subsection, the minimax equality implies that there is no duality gap. The proof of the proposition requires the machinery of hyperplanes, and will be given at the end of the next section. As an illustration of the role of p, note that in Example 1.3.1, p is given by

  p(u) = inf_{x≥0} sup_{z≥0} { e^{−√(x₁x₂)} + z(x₁ − u) } = { ∞ if u < 0;  1 if u = 0;  0 if u > 0 }.
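The two computed values of Example 1.3.1, and the failure of lower semicontinuity of p at u = 0, can be sketched numerically. This is a hypothetical check, not part of the original notes; the sampling path x₁ = 1/t, x₂ = t³ is one choice that drives x₁ → 0 while x₁x₂ → ∞:

```python
import math

def phi(x1, x2, z):
    # phi(x, z) = exp(-sqrt(x1*x2)) + z*x1 on X = {x >= 0}, Z = {z >= 0}
    return math.exp(-math.sqrt(x1 * x2)) + z * x1

# inf over x >= 0 for fixed z = 1: along x1 = 1/t, x2 = t**3 the value is
# exp(-t) + 1/t, which tends to 0, so sup_z inf_x phi = 0
inner_inf = min(phi(1.0 / t, float(t) ** 3, 1.0) for t in range(1, 1000))

# sup over z >= 0 for x with x1 = 0: the value is exp(0) = 1 for every z,
# while it is +infinity for x1 > 0, so inf_x sup_z phi = 1
sup_at_x1_zero = max(phi(0.0, x2, z) for x2 in (0.0, 1.0, 10.0)
                     for z in (0.0, 5.0, 50.0))

# p(u) for u = 0.1 > 0: taking x1 = u kills the z-term, and letting x2 grow
# drives the value to 0, exhibiting p(0) = 1 > 0 = lim p(u_k), i.e., p is not
# lower semicontinuous at 0
p_small = min(math.exp(-math.sqrt(0.1 * x2)) for x2 in (1e2, 1e4, 1e6))

print(inner_inf < 0.01, sup_at_x1_zero, p_small < 1e-3)  # prints: True 1.0 True
```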
Thus it can be seen that even though p(0) is finite, p is not lower semicontinuous at 0. As a result, the assumptions of Prop. 1.3.9 are violated, and the minimax equality does not hold. Furthermore, the proof of the above theorem, given in Section 1.4, will indicate some common circumstances where p has the desired properties around u = 0. The significance of the function p within the context of convex programming and duality will be elaborated on and clarified further in Chapter 4.

EXERCISES

1.3.1

Let f : ℜⁿ → ℜ be a convex function, let X be a closed convex set, and assume that f and X have no common direction of recession. Let X* be the optimal solution set (nonempty and compact by Prop. 1.3.5) and let f* = inf_{x∈X} f(x). Show that:
(a) For every ε > 0 there exists a δ > 0 such that every vector x ∈ X with f(x) ≤ f* + δ satisfies min_{x*∈X*} ‖x − x*‖ ≤ ε.
(b) Every sequence {x_k} ⊂ X satisfying f(x_k) → f* is bounded, and all of its limit points belong to X*.

1.3.2

Let C be a convex set and S be a subspace. Show that projection on S is a linear transformation, and use this to show that the projection of the set C on S is a convex set, which is compact if C is compact. Is the projection of C on S always closed if C is closed?

1.3.3 (Existence of Solution of Quadratic Programs)

This exercise deals with an extension of Prop. 1.3.6 to the case where the quadratic cost may not be convex. Consider a problem of the form

  minimize c'x + (1/2) x'Qx
  subject to Ax ≤ b,

where Q is a symmetric (not necessarily positive semidefinite) matrix, c and b are given vectors, and A is a matrix. Show that the following are equivalent:
(a) There exists at least one optimal solution.
(b) The cost function is bounded below over the constraint set.
(c) The problem has at least one feasible solution, and for any feasible x, there is no y ∈ ℜⁿ such that Ay ≤ 0 and either y'Qy < 0, or y'Qy = 0 and (c + Qx)'y < 0.

1.3.4

Let C ⊂ ℜⁿ be a nonempty convex set, and let f : ℜⁿ → ℜ be an affine function such that inf_{x∈C} f(x) is attained at some x* ∈ ri(C). Show that f(x) = f(x*) for all x ∈ C.

1.3.5 (Existence of Optimal Solutions)

Let f : ℜⁿ → ℜ be a convex function, and consider the problem of minimizing f over a closed and convex set X. Suppose that f attains a minimum along all half lines of the form {x + αy | α ≥ 0}, where x ∈ X and y is in the recession cone of X. Show that we may still have inf_{x∈X} f(x) = −∞. Hint: Use the case n = 2, X = ℜ², and f(x) = min_{z∈C} ‖z − x‖² − x₁, where C = {(x₁, x₂) | x₁² ≤ x₂}.

1.3.6 (Saddle Points in Two Dimensions)

Consider a function φ of two real variables x and z taking values in compact intervals X and Z, respectively. Assume that for each z ∈ Z, the function φ(·, z) is minimized over X at a unique point denoted x̂(z), and for each x ∈ X, the function φ(x, ·) is maximized over Z at a unique point denoted ẑ(x). Assume further that the functions x̂(z) and ẑ(x) are continuous over Z and X, respectively. Show that φ has a unique saddle point (x*, z*). Use this to investigate the existence of saddle points of φ(x, z) = x² + z² over X = [0, 1] and Z = [0, 1].

1.4 HYPERPLANES

Some of the most important principles in convexity and optimization, including duality, revolve around the use of hyperplanes, i.e., (n − 1)-dimensional affine sets. A hyperplane has the property that it divides the space into two halfspaces. We will see shortly that a distinguishing feature of a closed convex set is that it is the intersection of all the halfspaces that contain it. Thus any closed convex set can be described in dual fashion: (a) as the union of all points contained in the set, and (b) as the intersection of all halfspaces containing the set. This fundamental principle carries over to a closed convex function, once the function is specified in terms of its
(closed and convex) epigraph.

In this section, we develop the principle just outlined, and we apply it to a construction that is central in duality theory. We then use this construction to prove the minimax theorem given at the end of the preceding section.

1.4.1 Support and Separation Theorems

A hyperplane is a set of the form {x | a'x = b}, where a ∈ ℜⁿ, a ≠ 0, and b ∈ ℜ. An equivalent definition is that a hyperplane in ℜⁿ is an affine set of dimension n − 1. The vector a is called the normal vector of the hyperplane (it is orthogonal to the difference x − y of any two vectors x and y that belong to the hyperplane). The two sets

  { x | a'x ≥ b },   { x | a'x ≤ b },

are called the halfspaces associated with the hyperplane (also referred to as the positive and negative halfspaces, respectively).

[Figure: three panels illustrating hyperplanes and halfspaces.]

Figure 1.4.1. (a) A hyperplane {x | a'x = b} divides the space into the two halfspaces {x | a'x ≥ b} and {x | a'x ≤ b}, as illustrated. (b) Geometric interpretation of the Supporting Hyperplane Theorem. (c) Geometric interpretation of the Separating Hyperplane Theorem.

We have the following result, which is illustrated in Fig. 1.4.1(b). The proof is based on the Projection Theorem and is illustrated in Fig. 1.4.2.
Proposition 1.4.1: (Supporting Hyperplane Theorem) If C ⊂ ℜⁿ is a convex set and x̄ is a point that does not belong to the interior of C, there exists a vector a ≠ 0 such that

  a'x ≥ a'x̄,   ∀ x ∈ C.   (1.33)

[Figure: a convex set C, a sequence x_k → x̄, the projections x̂_k, and the normals a_k.]

Figure 1.4.2. Illustration of the proof of the Supporting Hyperplane Theorem for the case where the vector x̄ belongs to the closure of C. We choose a sequence {x_k} of vectors not belonging to the closure of C which converges to x̄, and we project x_k on the closure of C. We then consider, for each k, the hyperplane that is orthogonal to the line segment connecting x_k and its projection x̂_k, and passes through x_k. These hyperplanes "converge" to a hyperplane that supports C at x̄.

Proof: Consider the closure cl(C) of C, which is a convex set by Prop. 1.2.1(d). Let {x_k} be a sequence of vectors not belonging to cl(C), which converges to x̄; such a sequence exists because x̄ does not belong to the interior of C. If x̂_k is the projection of x_k on cl(C), we have by part (b) of the Projection Theorem (Prop. 1.3.2)

  (x̂_k − x_k)'(x − x̂_k) ≥ 0,   ∀ x ∈ cl(C).

Hence we obtain for all x ∈ cl(C) and k,

  (x̂_k − x_k)'x ≥ (x̂_k − x_k)'x̂_k = (x̂_k − x_k)'(x̂_k − x_k) + (x̂_k − x_k)'x_k ≥ (x̂_k − x_k)'x_k.

We can write this inequality as

  a_k'x ≥ a_k'x_k,   ∀ x ∈ cl(C),   k = 0, 1, ...,   (1.34)

where

  a_k = (x̂_k − x_k) / ‖x̂_k − x_k‖.

We have ‖a_k‖ = 1 for all k, and hence the sequence {a_k} has a subsequence that converges to a nonzero limit a. By considering Eq. (1.34) for all a_k belonging to this subsequence and by taking the limit as k → ∞, we obtain Eq. (1.33). Q.E.D.

Proposition 1.4.2: (Separating Hyperplane Theorem) If C1 and C2 are two nonempty, disjoint, and convex subsets of ℜⁿ, there exists a hyperplane that separates them, i.e., a vector a ≠ 0 such that

  a'x1 ≤ a'x2,   ∀ x1 ∈ C1, x2 ∈ C2.   (1.35)

Proof: Consider the convex set

  C = { x | x = x2 − x1, x1 ∈ C1, x2 ∈ C2 }.

Since C1 and C2 are disjoint, the origin does not belong to C, so by the Supporting Hyperplane Theorem there exists a vector a ≠ 0 such that

  0 ≤ a'x,   ∀ x ∈ C,

which is equivalent to Eq. (1.35). Q.E.D.

Proposition 1.4.3: (Strict Separation Theorem) If C1 and C2 are two nonempty, disjoint, and convex sets such that C1 is closed and C2 is compact, there exists a hyperplane that strictly separates them, i.e., a vector a ≠ 0 and a scalar b such that

  a'x1 < b < a'x2,   ∀ x1 ∈ C1, x2 ∈ C2.   (1.36)

Proof: Consider the problem

  minimize ‖x1 − x2‖
  subject to x1 ∈ C1, x2 ∈ C2.

This problem is equivalent to minimizing the distance d(x2, C1) over x2 ∈ C2. Since C2 is compact, and the distance function is convex and hence continuous, the problem has at least one solution (x̄1, x̄2) by Weierstrass' Theorem (cf. Prop. 1.3.1). Let

  a = (x̄2 − x̄1)/2,   x̄ = (x̄1 + x̄2)/2,   b = a'x̄.
Then, since x̄1 ∈ C1 and, as can be seen from the preceding discussion, x̄1 is the projection of x̄ on C1 (see Fig. 1.4.3), we have by the Projection Theorem

  (x1 − x̄1)'(x̄ − x̄1) ≤ 0,   ∀ x1 ∈ C1,

or equivalently, since x̄ − x̄1 = a,

  a'x1 ≤ a'x̄1 = a'x̄ + a'(x̄1 − x̄) = b − ‖a‖² < b,   ∀ x1 ∈ C1.

Thus, the left-hand side of Eq. (1.36) is proved. The right-hand side is proved similarly, using the fact that x̄2 is the projection of x̄ on C2. Q.E.D.

[Figure: two panels; (a) the strictly separating hyperplane between two discs, (b) two unbounded sets with no strictly separating hyperplane.]

Figure 1.4.3. (a) Illustration of the construction of a strictly separating hyperplane of two disjoint closed convex sets C1 and C2, one of which is also bounded (cf. Prop. 1.4.3). (b) An example showing that if none of the two sets is compact, there may not exist a strictly separating hyperplane:

  C1 = { (ξ1, ξ2) | ξ1 ≤ 0 },   C2 = { (ξ1, ξ2) | ξ1 > 0, ξ2 > 0, ξ1ξ2 ≥ 1 },

for which C = { x1 − x2 | x1 ∈ C1, x2 ∈ C2 } = { (ξ1, ξ2) | ξ1 < 0 }. This is due to the fact that the distance d(x2, C1) can approach 0 arbitrarily closely as x2 ranges over C2, but can never reach 0.

The preceding proposition may be used to provide a fundamental characterization of closed convex sets. In particular, a closed and convex set is the intersection of the halfspaces that contain it. We prove this fact in greater generality in the following proposition.

Proposition 1.4.4: The closure of the convex hull of a set C ⊂ ℜⁿ is the intersection of the halfspaces that contain C.
Proof: Assume first that C is closed and convex. Then C is contained in the intersection of the halfspaces that contain C, so we focus on proving the reverse inclusion. Let x ∉ C. Applying the Strict Separation Theorem (Prop. 1.4.3) to the sets C and {x}, we see that there exists a halfspace containing C but not containing x. Hence, if x ∉ C, then x cannot belong to the intersection of the halfspaces containing C, proving that C contains that intersection. Thus the result is proved for the case where C is closed and convex.

Consider the case of a general set C. Each halfspace H that contains C must also contain conv(C) (since H is convex), and also cl(conv(C)) (since H is closed). Hence the intersection of all halfspaces containing C and the intersection of all halfspaces containing cl(conv(C)) coincide. From what has been proved for the case of a closed convex set, the latter intersection is equal to cl(conv(C)). Q.E.D.

1.4.2 Nonvertical Hyperplanes and the Minimax Theorem

In the context of optimization theory, supporting hyperplanes are often used in conjunction with epigraphs of functions f on ℜⁿ. Since epi(f) = {(x, w) | w ≥ f(x)} is a subset of ℜⁿ⁺¹, the normal vector of a hyperplane here is a nonzero vector of the form (µ, β), where µ ∈ ℜⁿ and β ∈ ℜ. We say that the hyperplane is horizontal if µ = 0, and we say that it is vertical if β = 0. Note that f is bounded below if and only if epi(f) is contained in a halfspace corresponding to a horizontal hyperplane.

Vertical lines in ℜⁿ⁺¹ are sets of the form {(x̄, w) | w ∈ ℜ}, where x̄ is a fixed vector in ℜⁿ. It can be seen that vertical hyperplanes, as well as the corresponding halfspaces, consist of the union of the vertical lines that pass through their points. If f(x) > −∞ for all x, then epi(f) cannot contain a vertical line, and it appears plausible that epi(f) is contained in some halfspace corresponding to a nonvertical hyperplane.

Note also that if a hyperplane with normal (µ, β) is nonvertical (i.e., β ≠ 0), then it crosses the (n + 1)st axis (the axis associated with w) at a unique point. If (x̄, w̄) is any vector on the hyperplane, the crossing point has the form (0, ξ), where

  ξ = (µ/β)'x̄ + w̄,   (1.37)

since from the hyperplane equation, we have (0, ξ)'(µ, β) = (x̄, w̄)'(µ, β). On the other hand, it can be seen that if the hyperplane is vertical, it either contains the entire (n + 1)st axis, or else it does not cross it at all; see Fig. 1.4.4.
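Before proceeding, here is a concrete numerical sketch of the strict separation construction of Prop. 1.4.3 from the preceding subsection (a hypothetical instance, not from the notes): for two disjoint closed discs, the minimum-distance pair (x̄1, x̄2) is known in closed form, and the proof's choices a = (x̄2 − x̄1)/2 and b = a'(x̄1 + x̄2)/2 yield a strictly separating hyperplane:

```python
import random

# Hypothetical instance: C1 = closed unit disc centered at (0, 0),
# C2 = closed unit disc centered at (3, 0); C1 and C2 are disjoint.
# The minimum-distance pair is x1_bar = (1, 0), x2_bar = (2, 0); following
# the proof of the Strict Separation Theorem, take a = (x2_bar - x1_bar)/2
# and b = a'(x1_bar + x2_bar)/2.
x1_bar, x2_bar = (1.0, 0.0), (2.0, 0.0)
a = ((x2_bar[0] - x1_bar[0]) / 2, (x2_bar[1] - x1_bar[1]) / 2)
x_bar = ((x1_bar[0] + x2_bar[0]) / 2, (x1_bar[1] + x2_bar[1]) / 2)
b = a[0] * x_bar[0] + a[1] * x_bar[1]  # b = 0.75 here

def sample_disc(cx, cy, rng):
    # rejection sampling from the closed unit disc centered at (cx, cy)
    while True:
        u, v = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if u * u + v * v <= 1.0:
            return (cx + u, cy + v)

rng = random.Random(0)
strictly_separated = all(
    a[0] * p1[0] + a[1] * p1[1] < b < a[0] * p2[0] + a[1] * p2[1]
    for p1, p2 in ((sample_disc(0.0, 0.0, rng), sample_disc(3.0, 0.0, rng))
                   for _ in range(1000))
)
print(strictly_separated)  # prints: True
```

Here a = (0.5, 0), so a'x ≤ 0.5 < 0.75 = b on C1 and a'x ≥ 1.0 > b on C2, matching the strict inequalities of Eq. (1.36).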
[Figure: a vertical hyperplane with normal (µ, 0) and a nonvertical hyperplane with normal (µ, β), the latter crossing the w-axis at (µ/β)'x̄ + w̄.]

Figure 1.4.4. Illustration of vertical and nonvertical hyperplanes in ℜⁿ⁺¹.

Proposition 1.4.5: Let C be a nonempty convex subset of ℜⁿ⁺¹, and let the points of ℜⁿ⁺¹ be denoted by (u, w), where u ∈ ℜⁿ and w ∈ ℜ.
(a) If C contains no vertical lines, then C is contained in a halfspace corresponding to a nonvertical hyperplane, i.e., there exist a vector (µ, β) such that µ ∈ ℜⁿ, β ∈ ℜ, β ≠ 0, and γ ∈ ℜ such that

  µ'u + βw ≥ γ,   ∀ (u, w) ∈ C.

(b) If C contains no vertical lines and (ū, w̄) does not belong to the closure of C, there exists a nonvertical hyperplane strictly separating (ū, w̄) from C.

Proof: We first note that if C contains no vertical lines, then ri(C) contains no vertical lines, which implies that cl(C) contains no vertical lines, since the recession cones of cl(C) and ri(C) coincide, as shown in the exercises to Section 1.2. Thus if we prove the result assuming that C is closed, the proof for the case where C is not closed will readily follow by replacing C with cl(C). Hence we assume without loss of generality that C is closed.

(a) By Prop. 1.4.4, C is the intersection of all halfspaces that contain it. If every hyperplane containing C in one of its halfspaces were vertical, we would have

  C = ∩_{i∈I} { (u, w) | µ_i'u ≥ γ_i }

for a collection of nonzero vectors µ_i, i ∈ I, and scalars γ_i, i ∈ I. Then, for every (ū, w̄) ∈ C, the vertical line {(ū, w) | w ∈ ℜ} would also belong to C. It follows that, since no vertical line belongs to C, there exists a nonvertical hyperplane containing C in one of its halfspaces.

(b) Since (ū, w̄) does not belong to the closure of C, there exists a hyperplane strictly separating (ū, w̄) from C (cf. Prop. 1.4.3). If this hyperplane is nonvertical, we are done, so assume otherwise. Then we have a nonzero vector µ̄ and a scalar γ̄ such that

  µ̄'u > γ̄ > µ̄'ū,   ∀ (u, w) ∈ C.

Since C contains no vertical lines, by part (a) there exist (µ, β) with β ≠ 0 and γ ∈ ℜ such that

  µ'u + βw ≥ γ,   ∀ (u, w) ∈ C.

By multiplying this relation with an ε > 0 and adding it to the preceding relation, we obtain

  (µ̄ + εµ)'u + εβw > γ̄ + εγ,   ∀ (u, w) ∈ C.

Since γ̄ > µ̄'ū, there is a small enough ε > 0 such that

  γ̄ + εγ > (µ̄ + εµ)'ū + εβw̄.

From the above two relations, we obtain

  (µ̄ + εµ)'u + εβw > (µ̄ + εµ)'ū + εβw̄,   ∀ (u, w) ∈ C,

implying that there is a nonvertical hyperplane with normal (µ̄ + εµ, εβ) that strictly separates (ū, w̄) from C. Q.E.D.

Min Common/Max Crossing Duality

Hyperplanes allow insightful visualization of duality concepts. This is particularly so in a construction involving two simple optimization problems, which we now discuss. These problems will prove particularly relevant in the context of duality, and will also form the basis for the proof of the Minimax Theorem, stated at the end of the preceding section.

Let S be a subset of ℜⁿ⁺¹ and consider the following two problems.
(a) Min Common Point Problem: Among all points that are common to both S and the (n + 1)st axis, we want to find one whose (n + 1)st component is minimum.
(b) Max Crossing Point Problem: Consider nonvertical hyperplanes that contain S in their corresponding "upper" halfspace, i.e., the halfspace that contains the vertical halfline {(0, w) | w ≥ 0} in its recession cone
(see Fig. 1.4.5). We want to find the maximum crossing point of the (n + 1)st axis with such a hyperplane.

[Figure: three panels showing the min common point w*, the max crossing point q*, and the sets M (and S in panel (c)).]

Figure 1.4.5. Illustration of the optimal values of the min common and max crossing problems. In (a), the two optimal values are not equal. In (b), when S is "extended upwards" along the (n + 1)st axis, it yields the set

  S̄ = { (u, w) | there exists w̄ with w̄ ≤ w and (u, w̄) ∈ S },

which is closed and convex; the two optimal values are equal. In (c), the set S is convex but not closed, and there are points (0, w) on the vertical axis with w < w* that lie in the closure of S; here q* is equal to the minimum such value of w, and we have q* < w*.

Figure 1.4.5 suggests that the optimal value of the max crossing point problem is no larger than the optimal value of the min common point problem, and that under favorable circumstances the two optimal values are equal. We now formalize the analysis of the two above problems and provide conditions that guarantee equality of their optimal values.

The min common point problem is

  minimize w
  subject to (0, w) ∈ S,   (1.38)

and its optimal value is denoted by w*, i.e.,

  w* = inf_{(0,w)∈S} w.

Given a nonvertical hyperplane in ℜⁿ⁺¹, multiplication of its normal vector (µ, β), β ≠ 0, by a nonzero scalar maintains the normality property. Hence, the set of nonvertical hyperplanes can be equivalently described as the set of all hyperplanes with normals of the form (µ, 1). A hyperplane of this type crosses the (n + 1)st axis at

  ξ = µ'ū + w̄,

where (ū, w̄) is any vector that lies on the hyperplane [cf. Eq. (1.37)]. In order for the upper halfspace corresponding to the hyperplane to contain S, we must have

  ξ ≤ w + µ'u,   ∀ (u, w) ∈ S.

Combining the two above equations, we see that the max crossing point problem can be expressed as

  maximize ξ
  subject to ξ ≤ w + µ'u, ∀ (u, w) ∈ S,   µ ∈ ℜⁿ,   (1.39)

or equivalently,

  maximize q(µ)
  subject to µ ∈ ℜⁿ,   (1.40)

where

  q(µ) = inf_{(u,w)∈S} { w + µ'u }.
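As a concrete sketch of the two problems (a hypothetical instance, not from the notes), take S to be the epigraph of p(u) = u² in ℜ². The min common value is w* = p(0) = 0, and Eq. (1.40) gives q(µ) = inf_u {u² + µu} = −µ²/4, so sup_µ q(µ) = 0 and the two values agree, consistent with the favorable case of Fig. 1.4.5(b):

```python
# Hypothetical instance of the min common / max crossing framework:
# S = epigraph of p(u) = u**2 in R^2 (closed and convex).
# Min common point: w* = p(0) = 0.
# Crossing level: q(mu) = inf_{(u,w) in S} {w + mu*u} = inf_u {u**2 + mu*u},
# which equals -mu**2/4, so the max crossing value is attained at mu = 0.
us = [i / 100.0 for i in range(-500, 501)]

def q(mu):
    return min(u * u + mu * u for u in us)

w_star = 0.0
q_star = max(q(mu) for mu in [j / 10.0 for j in range(-30, 31)])
print(abs(q_star - w_star) < 1e-9)  # prints: True — q* = w* for this S
```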
We denote by q* the corresponding optimal value, i.e.,

  q* = sup_{µ∈ℜⁿ} q(µ).

Note that for every µ ∈ ℜⁿ, we have

  q(µ) = inf_{(u,w)∈S} { w + µ'u } ≤ inf_{(0,w)∈S} w = w*,

so by taking the supremum of the left-hand side over µ ∈ ℜⁿ, we obtain

  q* ≤ w*,   (1.41)

i.e., the max crossing point is no higher than the min common point. Note also that if −∞ < w* < ∞, then (0, w*) is seen to be a closure point of the set

  S̄ = { (u, w) | there exists w̄ with w̄ ≤ w and (u, w̄) ∈ S },

so if in addition S̄ is closed and convex, and admits a nonvertical supporting hyperplane at (0, w*), then we have q* = w* and the optimal values q* and w* are attained. Between the "unfavorable" case where q* < w*, and the "most favorable" case where q* = w* while the optimal values q* and w* are attained, there are several intermediate cases, as suggested by Fig. 1.4.5. The following proposition provides assumptions that guarantee the equality q* = w*, but not necessarily the attainment of the optimal values.

Proposition 1.4.6: (Min Common/Max Crossing Theorem) Consider the min common point and max crossing point problems, and assume the following:
(1) −∞ < w* < ∞.
(2) The set

  S̄ = { (u, w) | there exists w̄ with w̄ ≤ w and (u, w̄) ∈ S }

is convex.
(3) For every sequence {(u_k, w_k)} ⊂ S with u_k → 0, we have

  w* ≤ lim inf_{k→∞} w_k.

Then we have q* = w*.

Proof: We first note that (0, w*) is a closure point of S̄, since by the definition of w*, there exists a sequence {(0, w_k)} ⊂ S ⊂ S̄ such that w_k → w*.

We next show by contradiction that S̄ does not contain any vertical lines. If this were not so, since S̄ is convex, the direction set {(0, −γ) | γ ≥ 0} would be contained in the recession cone of cl(S̄), and hence also in the recession cone of ri(S̄) (see the exercises in Section 1.2). Since (0, w*) is a closure point of S̄, there is a sequence {(u_k, w_k)} ⊂ ri(S̄) converging to (0, w*), so for each fixed γ > 0, the sequence {(u_k, w_k − γ)} belongs to ri(S̄). In view of the definition of S̄, there then exists a sequence {(u_k, w̄_k)} ⊂ S with w̄_k ≤ w_k − γ for all k, so that lim inf_{k→∞} w̄_k ≤ w* − γ. Since u_k → 0, this contradicts assumption (3).

It follows, using assumption (3) and the fact that w* is the optimal value of the min common point problem, that for any ε > 0, the vector (0, w* − ε) does not belong to the closure of S̄. As shown above, S̄ does not contain any vertical lines, so by using Prop. 1.4.5(b), there exists a nonvertical hyperplane strictly separating (0, w* − ε) from S̄. This hyperplane crosses the (n + 1)st axis at a unique point, which must lie between w* − ε and w* [the lower bound follows from the strict separation, and the upper bound follows because (0, w*) is a closure point of S̄], and must also be less than or equal to the optimal value q* of the max crossing point problem. Hence q* ≥ w* − ε. Since ε can be arbitrarily small, it follows that q* ≥ w*. In view of the fact q* ≤ w*, which holds always [cf. Eq. (1.41)], the result follows. Q.E.D.

Proof of the Minimax Theorem

We will now prove the Minimax Theorem stated at the end of the last section (cf. Prop. 1.3.9). In particular, assume that X and Z are nonempty convex subsets of ℜⁿ and ℜᵐ, respectively, and that φ : X × Z → ℜ is a function such that for each z ∈ Z, the function φ(·, z) : X → ℜ is convex and lower semicontinuous, and for each x ∈ X, the function φ(x, ·) : Z → ℜ is concave and upper semicontinuous. Then, if the additional assumptions of Prop. 1.3.9 regarding the function

  p(u) = inf_{x∈X} sup_{z∈Z} { φ(x, z) − u'z }

hold, we will show that

  sup_{z∈Z} inf_{x∈X} φ(x, z) = inf_{x∈X} sup_{z∈Z} φ(x, z).

We first prove the following lemma.
Lemma 1.4.1: Let X and Z be nonempty convex subsets of ℜⁿ and ℜᵐ, respectively, and let φ : X × Z → ℜ be a function. Assume that for each z ∈ Z, the function φ(·, z) : X → ℜ is convex, and consider the function p defined by

  p(u) = inf_{x∈X} sup_{z∈Z} { φ(x, z) − u'z },   ∀ u ∈ ℜᵐ.   (1.42)

Then the set P = { u | p(u) < ∞ } is convex, and p satisfies

  p(αu1 + (1 − α)u2) ≤ αp(u1) + (1 − α)p(u2)

for all α ∈ [0, 1] and all u1, u2 ∈ P.

Proof: Let u1 ∈ P and u2 ∈ P. Rewriting p(u) as

  p(u) = inf_{x∈X} l(x, u),

where l(x, u) = sup_{z∈Z} { φ(x, z) − u'z }, we have that there exist sequences {x_k¹} ⊂ X and {x_k²} ⊂ X such that

  l(x_k¹, u1) → p(u1),   l(x_k², u2) → p(u2).

By convexity of X, we have αx_k¹ + (1 − α)x_k² ∈ X for all α ∈ [0, 1]. Using also the convexity of φ(·, z) for each z ∈ Z, we have for all α ∈ [0, 1],

  p(αu1 + (1 − α)u2) ≤ l(αx_k¹ + (1 − α)x_k², αu1 + (1 − α)u2)
    = sup_{z∈Z} { φ(αx_k¹ + (1 − α)x_k², z) − (αu1 + (1 − α)u2)'z }
    ≤ sup_{z∈Z} { αφ(x_k¹, z) + (1 − α)φ(x_k², z) − (αu1 + (1 − α)u2)'z }
    ≤ α sup_{z∈Z} { φ(x_k¹, z) − u1'z } + (1 − α) sup_{z∈Z} { φ(x_k², z) − u2'z }
    = αl(x_k¹, u1) + (1 − α)l(x_k², u2).

Since αl(x_k¹, u1) + (1 − α)l(x_k², u2) → αp(u1) + (1 − α)p(u2), it follows that

  p(αu1 + (1 − α)u2) ≤ αp(u1) + (1 − α)p(u2).
Furthermore, since u_1 and u_2 belong to P, the left-hand side of the above relation is less than ∞, so αu_1 + (1 − α)u_2 ∈ P for all α ∈ [0, 1], implying that P is convex. Q.E.D.

Note that Lemma 1.4.1 allows the possibility that the function p not only takes finite values and the value ∞, but also the value −∞. However, under the additional assumption of Prop. 1.3.9 that p(0) ≤ lim inf_{k→∞} p(u_k) for all sequences {u_k} with u_k → 0, we must have p(u) > −∞ for all u ∈ ℜm. The reason is that if p(u) = −∞ for some u ∈ ℜm, then from Eq. (1.42), we must have p(αu) = −∞ for all α ∈ (0, 1], contradicting the assumption that p(0) ≤ lim inf_{k→∞} p(u_k) for all sequences {u_k} with u_k → 0. Thus, under the assumptions of Prop. 1.3.9, p maps ℜm into (−∞, ∞], and in view of Lemma 1.4.1, p is an extended real-valued convex function. Furthermore, p is lower semicontinuous at u = 0, in view again of the assumption that p(0) ≤ lim inf_{k→∞} p(u_k) for all sequences {u_k} with u_k → 0.

We now prove Prop. 1.3.9 by reducing it to an application of the Min Common/Max Crossing Theorem (cf. Prop. 1.4.6) of the preceding section. In particular, we show that with an appropriate selection of the set S, the assumptions of Prop. 1.3.9 are essentially equivalent to the corresponding assumptions of the Min Common/Max Crossing Theorem. Furthermore, the optimal values of the corresponding min common and max crossing point problems are

q∗ = sup_{z∈Z} inf_{x∈X} φ(x, z),  w∗ = inf_{x∈X} sup_{z∈Z} φ(x, z).

We choose the set S (as well as the set S̄) in that theorem to be the epigraph of p, i.e.,

S = S̄ = {(u, w) | u ∈ P, p(u) ≤ w},

which is convex in view of the convexity of the function p shown above. Thus assumption (2) of the Min Common/Max Crossing Theorem is satisfied. We clearly have w∗ = p(0), so the assumption −∞ < p(0) < ∞ of Prop. 1.3.9 is equivalent to assumption (1) of the Min Common/Max Crossing Theorem. Furthermore, the assumption that p(0) ≤ lim inf_{k→∞} p(u_k) for all sequences {u_k} with u_k → 0 is equivalent to the last remaining assumption (3) of the Min Common/Max Crossing Theorem. Thus the conclusion q∗ = w∗ of the theorem holds. From the definition of p, it follows that

w∗ = p(0) = inf_{x∈X} sup_{z∈Z} φ(x, z).
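As a quick numerical sanity check of Lemma 1.4.1 (this example is not part of the original text; the choice of φ, the grids, and the tolerances are arbitrary assumptions), take φ(x, z) = x·z with X = [0, 1] and Z = [−1, 1]. Then sup_{z∈Z} {φ(x, z) − u′z} = |x − u|, so p(u) reduces to the distance from u to the interval [0, 1], which is indeed convex, as the lemma asserts:

```python
# Sketch only: verify midpoint convexity of
#   p(u) = inf_{x in X} sup_{z in Z} { phi(x,z) - u*z }
# for phi(x,z) = x*z, X = [0,1], Z = [-1,1], on coarse grids.

def p(u, n=100):
    xs = [i / n for i in range(n + 1)]            # grid on X = [0, 1]
    zs = [-1 + 2 * i / n for i in range(n + 1)]   # grid on Z = [-1, 1]
    return min(max((x - u) * z for z in zs) for x in xs)

N = 80
us = [-2 + 0.05 * i for i in range(N + 1)]        # test points for u
pv = [p(u) for u in us]

for i in range(N + 1):
    for j in range(i, N + 1):
        if (i + j) % 2 == 0:                      # midpoint lies on the u-grid
            assert pv[(i + j) // 2] <= (pv[i] + pv[j]) / 2 + 1e-9
print("p passes the midpoint-convexity check on the grid")
```

Here p works out to dist(u, [0, 1]): it is 0 on [0, 1] and grows linearly outside, consistent with the convexity asserted by the lemma.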
Thus, to complete the proof of Prop. 1.3.9, all that remains is to show that for the max crossing point problem corresponding to the epigraph S of p, we have

q∗ = sup_{µ∈ℜm} inf_{(u,w)∈S} {w + µ′u} = sup_{z∈Z} inf_{x∈X} φ(x, z).

To prove this, let us write for every µ ∈ ℜm,

q(µ) = inf_{(u,w)∈S} {w + µ′u} = inf_{(u,w) | u∈P, p(u)≤w} {w + µ′u} = inf_{u∈P} {p(u) + µ′u}.

From the definition of p, we have for every µ ∈ ℜm,

q(µ) = inf_{u∈P} {p(u) + u′µ} = inf_{u∈ℜm} {p(u) + u′µ}  [since p(u) = ∞ for u ∉ P]
     = inf_{u∈ℜm} inf_{x∈X} sup_{z∈Z} {φ(x, z) + u′(µ − z)}
     = inf_{x∈X} inf_{u∈ℜm} sup_{z∈Z} {φ(x, z) + u′(µ − z)}.

We now show that q(µ) is given by

q(µ) = { inf_{x∈X} φ(x, µ)  if µ ∈ Z,
       { −∞                 if µ ∉ Z.  (1.43)

Indeed, if µ ∈ Z, then

sup_{z∈Z} {φ(x, z) + u′(µ − z)} ≥ φ(x, µ),  ∀ x ∈ X, ∀ u ∈ ℜm,

so

inf_{u∈ℜm} sup_{z∈Z} {φ(x, z) + u′(µ − z)} ≥ φ(x, µ),  ∀ x ∈ X.  (1.44)
To show that we have equality in Eq. (1.44), we define the function r_x : ℜm → (−∞, ∞] as

r_x(z) = { −φ(x, z)  if z ∈ Z,
         { +∞        otherwise,
which is closed and convex for all x ∈ X by the given concavity/upper semicontinuity assumptions, so the epigraph of r_x, denoted epi(r_x), is a closed and convex set. Since µ ∈ Z, the point (µ, r_x(µ)) belongs to epi(r_x). For some ε > 0, we consider the point (µ, r_x(µ) − ε), which does not belong to epi(r_x). Since r_x(z) > −∞ for all z, the set epi(r_x) does not contain any vertical lines. Therefore, by Prop. 1.4.5(b), for all x ∈ X, there exists a
nonvertical hyperplane with normal (u, ζ), where ζ ≠ 0, and a scalar c such that

u′µ + ζ(r_x(µ) − ε) < c < u′z + ζw,  ∀ (z, w) ∈ epi(r_x).

Since w can be made arbitrarily large, we have ζ > 0, and we can take ζ = 1. In particular, for w = r_x(z) with z ∈ Z, we have

u′µ + r_x(µ) − ε < c < u′z + r_x(z),  ∀ z ∈ Z,

or equivalently,

φ(x, z) + u′(µ − z) < φ(x, µ) + ε,  ∀ z ∈ Z.

Letting ε → 0, we obtain

inf_{u∈ℜm} sup_{z∈Z} {φ(x, z) + u′(µ − z)} ≤ sup_{z∈Z} {φ(x, z) + u′(µ − z)} ≤ φ(x, µ),  ∀ x ∈ X,

which together with Eq. (1.44) implies that

inf_{u∈ℜm} sup_{z∈Z} {φ(x, z) + u′(µ − z)} = φ(x, µ),  ∀ x ∈ X.
Therefore, if µ ∈ Z, q(µ) has the form of Eq. (1.43).

If µ ∉ Z, we consider a sequence {w_k} that goes to −∞. Since µ ∉ Z, the sequence of points (µ, w_k) does not belong to epi(r_x). Therefore, similar to the argument above, there exists a sequence of nonvertical hyperplanes with normals (u_k, 1) such that

φ(x, z) + u_k′(µ − z) < w_k,  ∀ z ∈ Z, ∀ k,

so that

inf_{u∈ℜm} sup_{z∈Z} {φ(x, z) + u′(µ − z)} ≤ sup_{z∈Z} {φ(x, z) + u_k′(µ − z)} ≤ w_k,  ∀ k.

Taking the limit in the above relation as k → ∞, we obtain

inf_{u∈ℜm} sup_{z∈Z} {φ(x, z) + u′(µ − z)} = −∞,  ∀ x ∈ X,

which proves that q(µ) has the form given in Eq. (1.43). It follows that

q∗ = sup_{µ∈ℜm} q(µ) = sup_{z∈Z} inf_{x∈X} φ(x, z).

Q.E.D.
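The equality q∗ = w∗ established above can be illustrated numerically (this example is not from the text; the choice φ(x, z) = x·z on X = Z = [−1, 1] and the grid size are assumptions). Here φ is linear, hence convex in x and concave in z, and both values equal 0:

```python
# Sketch only: compare inf sup and sup inf for phi(x, z) = x*z
# on X = Z = [-1, 1], discretized on a grid.

n = 400
grid = [-1 + 2 * i / n for i in range(n + 1)]

w_star = min(max(x * z for z in grid) for x in grid)   # inf_x sup_z phi
q_star = max(min(x * z for x in grid) for z in grid)   # sup_z inf_x phi

assert q_star <= w_star + 1e-12      # weak duality, always true
assert abs(q_star) < 1e-12 and abs(w_star) < 1e-12
print("q* =", q_star, " w* =", w_star)
```

For each x, sup_z x·z = |x|, minimized at x = 0; symmetrically for the other order, so the two values coincide, in agreement with the minimax equality.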
EXERCISES
1.4.1 (Strong Separation)
Let C1 and C2 be two nonempty, convex subsets of ℜn, and let B denote the unit ball in ℜn. A hyperplane H is said to separate strongly C1 and C2 if there exists an ε > 0 such that C1 + εB is contained in one of the open halfspaces associated with H and C2 + εB is contained in the other. Show that:

(a) The following three conditions are equivalent:

(i) There exists a hyperplane strongly separating C1 and C2.

(ii) There exists a vector b ∈ ℜn such that inf_{x∈C1} b′x > sup_{x∈C2} b′x.

(iii) inf_{x1∈C1, x2∈C2} ‖x1 − x2‖ > 0.

(b) If C1 and C2 are closed and have no common directions of recession, there exists a hyperplane strongly separating C1 and C2.

(c) If the two sets C1 and C2 have disjoint closures, and at least one of the two is bounded, there exists a hyperplane strongly separating them.
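The implication (iii) ⇒ (ii) above can be sketched numerically (this is an illustrative example, not part of the text; the two boxes and the use of alternating projections are assumptions): a closest-point pair between two disjoint boxes yields a vector b with a strict gap between inf over C1 and sup over C2.

```python
# Sketch only: strong separation of C1 = [0,1]^2 and C2 = [3,4]^2
# via the closest-point pair, found by alternating projections.

def clamp(p, lo, hi):
    # componentwise projection of p onto the box [lo, hi]
    return tuple(max(lo[i], min(hi[i], p[i])) for i in range(len(p)))

x1, x2 = (0.0, 0.0), (4.0, 4.0)
for _ in range(50):                          # alternating projections
    x1 = clamp(x2, (0.0, 0.0), (1.0, 1.0))   # project onto C1
    x2 = clamp(x1, (3.0, 3.0), (4.0, 4.0))   # project onto C2

b = (x1[0] - x2[0], x1[1] - x2[1])           # normal of a separating hyperplane

corners1 = [(i, j) for i in (0.0, 1.0) for j in (0.0, 1.0)]
corners2 = [(i, j) for i in (3.0, 4.0) for j in (3.0, 4.0)]
inf_c1 = min(b[0] * q[0] + b[1] * q[1] for q in corners1)
sup_c2 = max(b[0] * q[0] + b[1] * q[1] for q in corners2)

assert inf_c1 > sup_c2                       # strict gap: strong separation
print("b =", b, " inf over C1 =", inf_c1, " sup over C2 =", sup_c2)
```

Since a linear function attains its extrema over a box at the corners, checking the corners suffices; the strict gap inf_{C1} b′x > sup_{C2} b′x is exactly condition (ii).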
1.4.2 (Proper Separation)
Let C1 and C2 be two nonempty, convex subsets of ℜn. A hyperplane H is said to separate properly C1 and C2 if C1 and C2 are not both contained in H. Show that the following three conditions are equivalent:

(i) There exists a hyperplane properly separating C1 and C2.

(ii) There exists a vector b ∈ ℜn such that

sup_{x∈C1} b′x > inf_{x∈C2} b′x,  inf_{x∈C1} b′x ≥ sup_{x∈C2} b′x.

(iii) The relative interiors ri(C1) and ri(C2) have no point in common.
1.5 CONICAL APPROXIMATIONS AND CONSTRAINED OPTIMIZATION

Optimality conditions for constrained optimization problems revolve around the behavior of the cost function at points of the constraint set around a candidate for optimality x∗. Within this context, approximating the constraint set locally around x∗ by a conical set turns out to be particularly useful. In this section, we develop this approach and we derive necessary conditions for constrained optimality.

We first introduce some basic notions relating to cones. Given a set C, the cone given by

C∗ = {y | y′x ≤ 0, ∀ x ∈ C},
is called the polar cone of C (see Fig. 1.5.1).

Figure 1.5.1. Illustration of the polar cone C∗ of a set C ⊂ ℜ2. In (a), C consists of just two points, a1 and a2, while in (b), C is the convex cone {x | x = µ1 a1 + µ2 a2, µ1 ≥ 0, µ2 ≥ 0}. The polar cone C∗ is the same in both cases.

Clearly, the polar cone C∗, being the intersection of a collection of closed halfspaces, is closed and convex (regardless of whether C is closed and/or convex). If C is a subspace, it can be seen that the polar cone C∗ is equal to the orthogonal subspace C⊥. The following basic result generalizes the equality C = (C⊥)⊥, which holds in the case where C is a subspace.

Proposition 1.5.1: (Polar Cone Theorem)

(a) For any set C, we have C∗ = (cl(C))∗ = (conv(C))∗ = (cone(C))∗.

(b) For any cone C, we have (C∗)∗ = cl(conv(C)). In particular, if C is closed and convex, then (C∗)∗ = C.

Proof: (a) Clearly, we have (cl(C))∗ ⊂ C∗, since C ⊂ cl(C). Conversely, if y ∈ C∗, then y′x_k ≤ 0 for all k and all sequences {x_k} ⊂ C, so that y′x ≤ 0 for all limits x of such sequences. Hence y ∈ (cl(C))∗, and C∗ ⊂ (cl(C))∗. Similarly, we have (conv(C))∗ ⊂ C∗, since C ⊂ conv(C). Conversely, if y ∈ C∗, then y′x ≤ 0 for all x ∈ C, so that y′z ≤ 0 for all z that are convex combinations of vectors x ∈ C. Hence y ∈ (conv(C))∗, and C∗ ⊂ (conv(C))∗. A nearly identical argument also shows that C∗ = (cone(C))∗.
Therefore. we have x y ≤ 0. implying that z z z z (C ∗ )∗ ⊂ C. z = ˆ and z ∈ C. and by using part (a) in the lefthand side above.3). 1. Therefore.3.5. Proof of the Polar Cone Theorem for the case where C is a closed and convex cone. Since C is z closed. the projection exists by the Projection Theorem (Prop. as shown in the ﬁgure. which when added z ˆ to (z − ˆ) ˆ = 0 yields z − ˆ 2 ≤ 0. x C ^ z ^ 2z 0 z. From this it follows that cl conv(C) ∗ ∗ = cl conv(C) .Sec.5. To prove the reverse inclusion. and let ˆ be the unique projection of z on C. ˆ z Combining the last two relations. C ⊂ (C ∗ )∗ . we obtain (C ∗ )∗ = cl conv(C) . Q.2 shows that if C is closed and convex. then (C ∗ )∗ = C. Hence. it is seen that (z − z ) ˆ = 0. 1. (b) Figure 1.E. we obtain (z − z ) z ≤ 0.5 Conical Approximations 105 y x ≤ 0 for all x ∈ C so that y z ≤ 0 for all z that are convex combinations ∗ ∗ of vectors x ∈ C. A nearly ∗ identical argument also shows that C ∗ = cone(C) . By taking in the preceding relation x = 0 and x = 2ˆ (which belong to C since C z is a closed cone). z (z − ˆ) ∈ C ∗ .^ z C∗ z y Figure 1. which also implies that (z − ˆ) (x − ˆ) ≤ 0. The analysis of a constrained optimization problem often centers on how the cost function behaves along directions leading away from a local . then for all y ∈ C ∗ .D. which implies that x ∈ (C ∗ )∗ . z z ∀ x ∈ C. If x ∈ C. and since z ∈ (C ∗ )∗ .2. take z ∈ (C ∗ )∗ . Hence y ∈ conv(C) and C ∗ ⊂ conv(C) . we obtain (z − ˆ) x ≤ 0 for all x ∈ C.
The sets of the relevant directions constitute cones that can be viewed as approximations to the constraint set, locally near a point of interest. Let us introduce two such cones, which are important in connection with optimality conditions.

Definition 1.5.1: Given a subset X of ℜn and a vector x ∈ X, a feasible direction of X at x is a vector y ∈ ℜn such that there exists an ᾱ > 0 with x + αy ∈ X for all α ∈ [0, ᾱ]. The set of all feasible directions of X at x is a cone denoted by FX(x).

It can be seen that if X is convex, the feasible directions at x are the vectors of the form α(x̄ − x) with α > 0 and x̄ ∈ X. However, when X is nonconvex, the cone of feasible directions need not provide interesting information about the local structure of the set X near the point x. For example, often there is no nonzero feasible direction at x when X is nonconvex [think of the set X = {x | h(x) = 0}, where h : ℜn → ℜ is a nonlinear function]. The next definition introduces a cone that provides information on the local structure of X even when there is no nonzero feasible direction.

Definition 1.5.2: Given a subset X of ℜn and a vector x ∈ X, a vector y is said to be a tangent of X at x if either y = 0 or there exists a sequence {x_k} ⊂ X such that x_k ≠ x for all k and

x_k → x,  (x_k − x)/‖x_k − x‖ → y/‖y‖.

The set of all tangents of X at x is a cone called the tangent cone of X at x, and is denoted by TX(x).

Figure 1.5.3. Illustration of a tangent y at a vector x ∈ X. There is a sequence {x_k} ⊂ X that converges to x and is such that the normalized direction sequence (x_k − x)/‖x_k − x‖ converges to y/‖y‖, the normalized direction of y, or equivalently, the sequence y_k = ‖y‖(x_k − x)/‖x_k − x‖ illustrated in the figure converges to y.
Thus a nonzero vector y is a tangent at x if it is possible to approach x with a feasible sequence {x_k} such that the normalized direction sequence (x_k − x)/‖x_k − x‖ converges to y/‖y‖, the normalized direction of y (see Fig. 1.5.3). The following proposition provides an equivalent definition of a tangent, which is occasionally more convenient.

Proposition 1.5.2: A vector y is a tangent of a set X at a vector x ∈ X if and only if there exists a sequence {x_k} ⊂ X with x_k → x, and a positive sequence {α_k} such that α_k → 0 and (x_k − x)/α_k → y.

Proof: If {x_k} satisfies the definition of a tangent and y ≠ 0, take α_k = ‖x_k − x‖/‖y‖; then α_k → 0 and (x_k − x)/α_k → y (the case y = 0 is trivial). Conversely, if x_k → x and (x_k − x)/α_k → y for a positive sequence {α_k} with α_k → 0, and y ≠ 0, then x_k ≠ x for all sufficiently large k, and

(x_k − x)/‖x_k − x‖ = [(x_k − x)/α_k] / ‖(x_k − x)/α_k‖ → y/‖y‖,

so {x_k} satisfies the definition of a tangent. Q.E.D.

Figure 1.5.4. Examples of the cones FX(x) and TX(x) of a set X at the vector x = (0, 1). In (a), we have

X = {(x1, x2) | (x1 + 1)^2 − x2 ≤ 0, (x1 − 1)^2 − x2 ≤ 0}.

Here X is convex and the tangent cone TX(x) is equal to the closure of the cone of feasible directions FX(x) (which is an open set in this example). Note, however, that the vectors (1, 2) and (−1, 2) (as well as the origin) belong to TX(x) and also to the closure of FX(x), but are not feasible directions. In (b), we have

X = {(x1, x2) | ((x1 + 1)^2 − x2)((x1 − 1)^2 − x2) = 0}.

Here the set X is nonconvex, FX(x) consists of just the zero vector, and TX(x) is closed but not convex.
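The tangent cone of Definition 1.5.2 can be illustrated with a short computation (an assumed example, not from the text): for the unit circle X = {x | x1^2 + x2^2 = 1}, which has no nonzero feasible direction at any point, the normalized directions of a sequence on the circle approaching x = (1, 0) converge to the tangent direction (0, 1).

```python
# Sketch only: approximate a tangent direction of the unit circle at (1, 0)
# by normalizing (x_k - x) for points x_k on the circle with x_k -> x.
import math

x = (1.0, 0.0)
dirs = []
for k in range(1, 16):
    t = 2.0 ** (-k)                       # angle shrinking to 0
    xk = (math.cos(t), math.sin(t))       # xk in X, xk != x, xk -> x
    d = (xk[0] - x[0], xk[1] - x[1])
    norm = math.hypot(d[0], d[1])
    dirs.append((d[0] / norm, d[1] / norm))

# the normalized directions approach the tangent y = (0, 1)
print("limit direction ~", dirs[-1])
```

Symmetrically, approaching along negative angles gives the tangent (0, −1); together with the zero vector these make up TX(x) = {(0, λ) | λ ∈ ℜ}, even though FX(x) = {0}.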
.. the result follows.3: Let X be a nonempty subset of n and let x be a vector in X. and we have cl FX (x) = TX (x). i→∞ xi − x yk  k lim i For k = 1. 1 so {xk } satisﬁes the deﬁnition of a tangent. . so assume that y = 0. The following hold regarding the cone of feasible directions FX (x) and the tangent cone TX (x). (a) TX (x) is a closed cone. TX (x) is closed. ik y yk  y xk − x yk  i i so the fact yk → y. . Proof: (a) Let {yk } be a sequence in TX (x) that converges to y.5.4 illustrates the cones TX (x) and FX (x) with examples. < ik and lim xkk − x = 0.E. i k→∞ k→∞ lim i xkk xkk − x − x − yk = 0.5. choose an index ik such that i1 < i2 < . Q. then FX (x) and TX (x) are convex. . (b) Clearly every feasible direction is also a tangent. k xi − x yk k = . If y = 0. The following proposition gives some of the properties of the cones FX (x) and TX (x). yk  We also have for all k xkk − x i i xkk − x − y xkk − x yk yk y ≤ − + − . Proposition 1. and the preceding two relations imply that k→∞ lim xkk − x = 0. so FX (x) ⊂ TX (x). .D. i k→∞ lim i xkk xkk − x − x − y = 0. Since by part (a). y Hence y ∈ TX (x) and TX (x) is closed. . for every yk there is a sequence {xi } ⊂ X with k xi = x such that k i→∞ lim xi = x. (b) cl FX (x) ⊂ TX (x). By the deﬁnition of a tangent. 2. We will show that y ∈ TX (x). Figure 1. then y ∈ TX (x). (c) If X is convex.108 Convex Analysis and Optimization Chap.
Q. and let x∗ be a local minimum of f over a set X ⊂ n . f (x∗ ) = 0.E.2. ∀ y ∈ TX (x∗ ). the feasible directions at x are the vectors of the form α(x − x) with α > 0 and x ∈ X. In view of part (b).4: Let f : n → be a smooth function. the direction (xk − x)/αk is feasible at x for all k.Sec. Then f (x∗ ) y ≥ 0. x where xk is a vector that lies on the line segment joining xk and x∗ . reduces to Proof: Let y be a nonzero tangent of X at x∗ . 1. and in the case where X = n. let {xk } be a sequence in X and {αk } be a positive sequence such that αk → 0 and (xk − x)/αk → y. Com˜ bining the last two equations.D. we obtain f (xk ) = f(x∗ ) + where yk = y + y ξk . From this it can be seen that FX (x) is convex. Let y ∈ TX (x) and. and xk → x∗ . Hence y ∈ cl FX (x) . Convexity of TX (x) will follow from the convexity of FX (x) once we show that cl FX (x) = TX (x).5. and it follows that TX (x) ⊂ cl FX (x) . Then there exists a sequence {ξk } and a sequence {xk } ⊂ X such that xk = x∗ for all k. this condition can be equivalently written as f(x∗ ) (x − x∗ ) ≥ 0. Since X is a convex set. ξk → 0. ∗ xk − x y f (˜k ) (xk − x∗ ). The tangent cone ﬁnds an important application in the following basic necessary condition for local optimality: Proposition 1.5. ∀ x ∈ X.5 Conical Approximations 109 (c) If X is convex. it will suﬃce to show that TX (x) ⊂ cl FX (x) .45) . we have for all k f (xk ) = f (x∗ ) + xk − x∗ y = + ξk . 1. By the Mean Value Theorem. xk − x∗ y f (˜k ) yk . If X is convex. x (1. using Prop.
If X = n . 1 If f (x∗ ) y < 0. by setting x = x∗ + ei and x = x∗ − ei . where ei is the ith unit vector (all components are 0 except for the ith. Q. ∀ x ∈ X. we have z ∈ NX (x) if there exist sequences {xk } ⊂ X and {zk } such that xk → x. n. z)  x ∈ X . . Note that the necessary condition of Prop. we have cl FX (x) = TX (x) (cf. in the sense that we have (by Taylor’s theorem) f (x∗ + αy) = f (x∗ ) + α f (x∗ ) y + o(α) < f (x∗ ) for suﬃciently small but positive α. it follows that for all suﬃ˜ ciently large k.E. denoted by NX (x). viewed as a pointtoset mapping.4 can equivalently be written as − f (x∗ ) ∈ TX (x∗ )∗ (see Fig. Prop.5.5. is the intersection of the closure of the graph of TX (·)∗ with the set (x. . there is no descent direction within the tangent cone TX (x∗ ). zk → z.3). 1. there exists a smooth function f such that − f (x∗ ) = z and x∗ is a local minimum of f over X. (1. The Normal Cone In addition to the cone of feasible directions and the tangent cone. f(˜k ) yk < 0 and [from Eq. namely that given any vector z ∈ TX (x∗ )∗ .110 Convex Analysis and Optimization Chap. which is 1). there is one more conical approximation that is of special interest for the optimization topics covered in this book. the graph of NX (·). Thus the condition shown can be written as f (x∗ ) y ≥ 0. Equivalently. and obtained from the polar cone TX (x)∗ by means of a closure operation. . . There is an interesting converse of this result. A direction y for which f (x∗ ) y < 0 may be viewed as a descent direction of f at x∗ . ∀ y ∈ cl FX (x) .D. 1. In particular.4 says that if x∗ is a local minimum.5). so f (x∗ ) = 0. which in turn is equivalent to f(x∗ ) (x − x∗ ) ≥ 0. we obtain ∂f (x∗ )/∂xi = 0 for all i = 1. When X is convex. This is the normal cone of X at a vector x ∈ X. .5. since xk → x∗ and yk → y. 1.45)] f (xk ) < f (x∗ ). We will return to this result and to the subject of conical approximations when we discuss Lagrange multipliers in Chapter 2. 
This x contradicts the local optimality of x∗ . In the case where X is a closed set. and zk ∈ TX (xk )∗ for all k.5. 1. Thus Prop.
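The optimality condition for convex X can be checked numerically in a simple case (an assumed example, not from the text): for f(x) = 0.5‖x − w‖^2 over the box X = [0, 1]^2, the minimizer x∗ is the projection of w on X, and ∇f(x∗)′(x − x∗) ≥ 0 holds for every x ∈ X.

```python
# Sketch only: verify grad f(x*)'(x - x*) >= 0 on a grid of the box
# X = [0,1]^2, where x* is the projection of w and f(x) = 0.5*||x - w||^2.

w = (2.0, 0.5)
xs = (min(1.0, max(0.0, w[0])), min(1.0, max(0.0, w[1])))   # x* = proj of w
g = (xs[0] - w[0], xs[1] - w[1])                            # grad f(x*)

n = 50
for i in range(n + 1):
    for j in range(n + 1):
        x = (i / n, j / n)                                   # grid point of X
        inner = g[0] * (x[0] - xs[0]) + g[1] * (x[1] - xs[1])
        assert inner >= -1e-12, (x, inner)
print("grad f(x*)'(x - x*) >= 0 holds on the whole grid; x* =", xs)
```

Here ∇f(x∗) = x∗ − w = (−1, 0), so the inner product equals 1 − x1 ≥ 0 on the box: −∇f(x∗) points out of X, i.e., −∇f(x∗) ∈ TX(x∗)∗, as the necessary condition requires.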
In the case where X is a closed set, the set {(x, z) | x ∈ X} contains the closure of the graph of TX(·)∗, so that the graph of NX(·) is equal to the closure of the graph of TX(·)∗:

{(x, z) | x ∈ X, z ∈ NX(x)} = cl({(x, z) | x ∈ X, z ∈ TX(x)∗})  if X is closed.

Figure 1.5.5. Illustration of the necessary optimality condition −∇f(x∗) ∈ TX(x∗)∗ for x∗ to be a local minimum of f over X. The figure shows the constraint set X, the surfaces of equal cost f(x), and the vector x∗ − ∇f(x∗).

It can be seen that NX(x) is a closed cone containing TX(x)∗, but it need not be convex like TX(x)∗ (see the examples of Fig. 1.5.6). In the case where TX(x)∗ = NX(x), we say that X is regular at x. An important consequence of convexity of X is that it implies regularity, as shown in the following proposition.

Proposition 1.5.5: Let X be a convex set. Then for all x ∈ X, we have

z ∈ TX(x)∗ if and only if z′(x̄ − x) ≤ 0, ∀ x̄ ∈ X.  (1.46)

Furthermore, TX(x)∗ = NX(x), so that X is regular at all x ∈ X.
Figure 1.5.6. Examples of normal cones. In the case of figure (a), X is the union of two lines passing through the origin:

X = {x | (a1′x)(a2′x) = 0}.

For x = 0 we have TX(x) = X, TX(x)∗ = {0}, while NX(x) is the nonconvex set consisting of the two lines of vectors that are collinear to either a1 or a2. Thus X is not regular at x = 0. At all other vectors x ∈ X, we have regularity, with TX(x)∗ and NX(x) equal to either the line of vectors that are collinear to a1 or the line of vectors that are collinear to a2. In the case of figure (b), we have TX(x) = ℜn and TX(x)∗ = {0} at x = 0, while NX(x) is equal to the horizontal axis; X is regular at all points except at x = 0.
Proof: Since (x̄ − x) ∈ FX(x) ⊂ TX(x) for all x̄ ∈ X, it follows that if z ∈ TX(x)∗, then z′(x̄ − x) ≤ 0 for all x̄ ∈ X. Conversely, let z be such that z′(x̄ − x) ≤ 0 for all x̄ ∈ X, and, to arrive at a contradiction, assume that z ∉ TX(x)∗. Then there exists some y ∈ TX(x) such that z′y > 0. Since cl(FX(x)) = TX(x) [cf. Prop. 1.5.3(c)], there exists a sequence {y_k} ⊂ FX(x) such that y_k → y, so that y_k = α_k(x_k − x) for some α_k > 0 and some x_k ∈ X. Since z′y > 0, we have α_k z′(x_k − x) > 0 for large enough k, which is a contradiction. This proves Eq. (1.46).

If x ∈ X and z ∈ NX(x), there must exist sequences {x_k} ⊂ X and {z_k} such that x_k → x, z_k → z, and z_k ∈ TX(x_k)∗. By Eq. (1.46), we must have z_k′(x̄ − x_k) ≤ 0 for all x̄ ∈ X. Taking the limit as k → ∞, we obtain z′(x̄ − x) ≤ 0 for all x̄ ∈ X, which by Eq. (1.46) (which critically depends on the convexity of X) implies that z ∈ TX(x)∗. Thus, we have NX(x) ⊂ TX(x)∗. Since the reverse inclusion always holds, it follows that NX(x) = TX(x)∗, so that X is regular at x. Q.E.D.

Note that convexity of TX(x) does not imply regularity of X at x, as the example of Fig. 1.5.6(b) shows. However, a converse can be shown for a closed set X, namely that if X is closed and is regular at x, then the closed cone TX(x) is convex and is equal to the polar of NX(x):

TX(x) = NX(x)∗  if X is closed and is regular at x

(see Rockafellar and Wets [RoW98], p. 221). This result, which is central in nonsmooth analysis, will not be needed for our development in this book.

EXERCISES

1.5.1 (Fermat's Principle in Optics)

Let C ⊂ ℜn be a closed convex set, and let y and z be given vectors in ℜn such that the line segment connecting y and z does not intersect with C. Consider the problem of minimizing the sum of distances ‖y − x‖ + ‖z − x‖ over x ∈ C. Derive a necessary and sufficient optimality condition. Does an optimal solution exist and, if so, is it unique? Discuss the case where C is closed but not convex.

1.5.2

Let C1, C2, and C3 be three closed subsets of ℜn. Consider the problem of finding a triangle with minimum perimeter that has one vertex on each of the three sets,
i.e., the problem of minimizing

‖x_1 − x_2‖ + ‖x_2 − x_3‖ + ‖x_3 − x_1‖

subject to x_i ∈ C_i, i = 1, 2, 3, and the additional condition that x_1, x_2, and x_3 do not lie on the same line. Show that if (x_1∗, x_2∗, x_3∗) defines an optimal triangle, there exists a vector z∗ in the triangle such that

(z∗ − x_i∗) ∈ TC_i(x_i∗)∗,  i = 1, 2, 3.

1.5.3 (Cone Decomposition Theorem)

Let C ⊂ ℜn be a closed convex cone and let x be a given vector in ℜn. Show that:

(a) x̂ is the projection of x on C if and only if

x̂ ∈ C,  (x − x̂)′x̂ = 0,  x − x̂ ∈ C∗.

(b) The following two properties are equivalent:

(i) x_1 and x_2 are the projections of x on C and C∗, respectively.

(ii) x = x_1 + x_2 with x_1 ∈ C, x_2 ∈ C∗, and x_1′x_2 = 0.

1.5.4

Let C ⊂ ℜn be a closed convex cone and let a be a given vector in ℜn. Show that for any positive scalars β and γ, we have

max_{‖x‖≤β, x∈C} a′x ≤ γ  if and only if  a ∈ C∗ + {x | ‖x‖ ≤ γ/β}.

1.5.5 (Quasiregularity)

Consider the set

X = {x | h_i(x) = 0, i = 1, …, m, g_j(x) ≤ 0, j = 1, …, r},

where the functions h_i : ℜn → ℜ and g_j : ℜn → ℜ are smooth. For any x ∈ X, let A(x) = {j | g_j(x) = 0} and consider the cone

V(x) = {y | ∇h_i(x)′y = 0, i = 1, …, m, ∇g_j(x)′y ≤ 0, j ∈ A(x)}.

Show that:

(a) TX(x) ⊂ V(x).

(b) TX(x) = V(x) if either the gradients ∇h_i(x), i = 1, …, m, ∇g_j(x), j ∈ A(x), are linearly independent, or all the functions h_i and g_j are linear.

Note: The property TX(x) = V(x) is called quasiregularity, and will be significant in the Lagrange multiplier theory of Chapter 2.
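The cone decomposition property discussed above admits a very simple instance that can be checked directly (an assumed example, not from the text): for C equal to the nonnegative orthant, C∗ is the nonpositive orthant, and the projections are the componentwise positive and negative parts.

```python
# Sketch only: cone decomposition x = x1 + x2 for C = nonnegative orthant
# in R^2, where x1 = max(x, 0) and x2 = min(x, 0) componentwise are the
# projections of x on C and on C* (the nonpositive orthant).

pts = [(3.0, -2.0), (-1.5, 4.0), (0.0, -7.0), (2.0, 5.0)]
for x in pts:
    x1 = tuple(max(c, 0.0) for c in x)    # projection on C
    x2 = tuple(min(c, 0.0) for c in x)    # projection on C*
    assert all(abs(x[i] - x1[i] - x2[i]) < 1e-12 for i in range(2))
    assert sum(x1[i] * x2[i] for i in range(2)) == 0.0   # x1'x2 = 0
print("x = x1 + x2 with x1 in C, x2 in C*, x1'x2 = 0 for all test points")
```

Each component of x goes either into x1 or into x2 (never both), which is why the decomposition is exact and the two parts are orthogonal, in agreement with the exercise.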
1.5.6 (Regularity of Constraint Systems)

Consider the set X of Exercise 1.5.5. Show that if at a given x ∈ X the quasiregularity condition TX(x) = V(x) of that exercise holds, then X is regular at x.

1.5.7 (Necessary and Sufficient Optimality Condition)

Let f : ℜn → ℜ be a smooth convex function and let X be a nonempty subset of ℜn. Show that the condition

∇f(x∗)′(x − x∗) ≥ 0,  ∀ x ∈ X,

is necessary and sufficient for a vector x∗ ∈ X to be a global minimum of f over X.

1.6 POLYHEDRAL CONVEXITY

In this section, we develop some basic results regarding the geometry of polyhedral sets. We then apply the results obtained to linear and integer programming.

1.6.1 Polyhedral Cones

We start with properties of cones that have a polyhedral structure. We introduce two different ways to view such cones, and our first objective in this section is to show that these two views are related through the polarity relation and are in some sense equivalent. We say that a cone C ⊂ ℜn is polyhedral, if it has the form

C = {x | a_j′x ≤ 0, j = 1, …, r},

where a_1, …, a_r are some vectors in ℜn. We say that a cone C ⊂ ℜn is finitely generated, if it is generated by a finite set of vectors, i.e., if it has the form

C = cone({a_1, …, a_r}) = {x | x = Σ_{j=1}^r µ_j a_j, µ_j ≥ 0, j = 1, …, r},

where a_1, …, a_r are some vectors in ℜn.
These definitions are illustrated in Fig. 1.6.1.

Figure 1.6.1. (a) Polyhedral cone defined by the inequality constraints a_j′x ≤ 0, j = 1, 2, 3. (b) Cone generated by the vectors a_1, a_2, a_3.

Note that sets defined by linear equality constraints (as well as linear inequality constraints) are also polyhedral cones, since a linear equality constraint may be converted into two inequality constraints. In particular, the set

{x | e_i′x = 0, i = 1, …, m, a_j′x ≤ 0, j = 1, …, r}

is a polyhedral cone, since it can be written as

{x | e_i′x ≤ 0, −e_i′x ≤ 0, i = 1, …, m, a_j′x ≤ 0, j = 1, …, r}.

Using a related conversion, it can be seen that the cone

{x | x = Σ_{i=1}^m λ_i e_i + Σ_{j=1}^r µ_j a_j, λ_i ∈ ℜ, µ_j ≥ 0}

is finitely generated, since it can be written as

{x | x = Σ_{i=1}^m λ_i^+ e_i + Σ_{i=1}^m λ_i^−(−e_i) + Σ_{j=1}^r µ_j a_j, λ_i^+ ≥ 0, λ_i^− ≥ 0, µ_j ≥ 0}.

A polyhedral cone is closed, since it is the intersection of closed halfspaces. A finitely generated cone is also closed, and is in fact polyhedral, but this is a fairly deep fact, which is shown in the following proposition.
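The polarity relation between the two descriptions, which the next proposition (Farkas' Lemma in particular) makes precise, can be previewed with a tiny numeric example (an assumed example, not from the text; the generators are chosen as coordinate vectors purely for transparency):

```python
# Sketch only: Farkas-style check with e1 = (1,0,0), a1 = (0,1,0),
# a2 = (0,0,1).  x = (5,2,3) = 5*e1 + 2*a1 + 3*a2 (mu >= 0), so x'y <= 0
# for every y with y'e1 = 0, y'a1 <= 0, y'a2 <= 0.  For x_bad = (0,-1,0),
# the vector y = (0,-1,0) certifies that no such representation exists.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

e1, a1, a2 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
x = (5, 2, 3)

# sample admissible y = (0, y2, y3) with y2, y3 <= 0
ys = [(0, -i / 10, -j / 10) for i in range(11) for j in range(11)]
assert all(dot(y, e1) == 0 and dot(y, a1) <= 0 and dot(y, a2) <= 0 for y in ys)
assert all(dot(x, y) <= 0 for y in ys)      # consistent with the representation

x_bad, y_cert = (0, -1, 0), (0, -1, 0)
assert dot(y_cert, e1) == 0 and dot(y_cert, a1) <= 0 and dot(y_cert, a2) <= 0
assert dot(x_bad, y_cert) > 0               # separating certificate for x_bad
print("Farkas certificate value:", dot(x_bad, y_cert))
```

The two alternatives are exclusive: either x has a representation with nonnegative multipliers on the a_j, or there is a vector y (as for x_bad here) witnessing that it does not.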
Proposition 1.6.1:

(a) Let a_1, …, a_r be vectors of ℜn. Then the finitely generated cone

C = {x | x = Σ_{j=1}^r µ_j a_j, µ_j ≥ 0, j = 1, …, r}  (1.47)

is closed and its polar cone is the polyhedral cone given by

C∗ = {x | a_j′x ≤ 0, j = 1, …, r}.  (1.48)

(b) (Farkas' Lemma) Let x, e_1, …, e_m, and a_1, …, a_r be vectors of ℜn. We have x′y ≤ 0 for all vectors y ∈ ℜn such that y′e_i = 0 for all i = 1, …, m, and y′a_j ≤ 0 for all j = 1, …, r, if and only if x can be expressed as

x = Σ_{i=1}^m λ_i e_i + Σ_{j=1}^r µ_j a_j,

where λ_i and µ_j are some scalars with µ_j ≥ 0 for all j.

(c) (Minkowski-Weyl Theorem) A cone is polyhedral if and only if it is finitely generated.

Proof: (a) We first show that the polar cone of C has the desired form (1.48). If y satisfies y′a_j ≤ 0 for all j, then y′x ≤ 0 for all x ∈ C, so the set in the right-hand side of Eq. (1.48) is a subset of C∗. Conversely, if y ∈ C∗, then y′x ≤ 0 for all x ∈ C, and (since the a_j belong to C) we have y′a_j ≤ 0 for all j. Thus, C∗ is a subset of the set in the right-hand side of Eq. (1.48).

There remains to show that C is closed. We will give two proofs. The first proof is simpler and suffices for the purpose of showing Farkas' Lemma [part (b)]. The second proof shows a stronger result, namely that C is polyhedral; this not only shows that C is closed, but also proves half of the Minkowski-Weyl Theorem [part (c)].

The first proof that C is closed is based on induction on the number of vectors r. When r = 1, C is either {0} (if a_1 = 0) or a halfline, and is therefore closed. Suppose, for some r ≥ 1, all cones of the form

{x | x = Σ_{j=1}^r µ_j a_j, µ_j ≥ 0, j = 1, …, r}

are closed. We will show that a cone of the form

C_{r+1} = {x | x = Σ_{j=1}^{r+1} µ_j a_j, µ_j ≥ 0, j = 1, …, r + 1}

is also closed. Without loss of generality, assume that ‖a_j‖ = 1 for all j. If the vectors −a_1, …, −a_{r+1} all belong to C_{r+1}, then C_{r+1} is the subspace spanned by a_1, …, a_{r+1} and is therefore closed. If the negative of one of the vectors, say −a_{r+1}, does not belong to C_{r+1}, consider the cone

C_r = {x | x = Σ_{j=1}^r µ_j a_j, µ_j ≥ 0, j = 1, …, r},

which is closed by the induction hypothesis. Let

γ = min_{x∈C_r, ‖x‖=1} a_{r+1}′x.

Since the set {x ∈ C_r | ‖x‖ = 1} is nonempty and compact, the minimum above is attained at some x∗ ∈ C_r with ‖x∗‖ = 1, by Weierstrass' Theorem. Using the Schwarz inequality, we have

γ = a_{r+1}′x∗ ≥ −‖a_{r+1}‖ · ‖x∗‖ = −1,

with equality if and only if x∗ = −a_{r+1}. Because x∗ ∈ C_r ⊂ C_{r+1} and −a_{r+1} ∉ C_{r+1}, equality cannot hold above, so that γ > −1 and 1 + γ > 0.

Let {x_k} ⊂ C_{r+1} be a convergent sequence. We will prove that its limit belongs to C_{r+1}, thereby showing that C_{r+1} is closed. For all k, we have

x_k = ξ_k a_{r+1} + y_k,

where ξ_k ≥ 0 and y_k ∈ C_r. Using the fact ‖a_{r+1}‖ = 1 and the definition of γ, we obtain

‖x_k‖^2 = ξ_k^2 + ‖y_k‖^2 + 2ξ_k a_{r+1}′y_k ≥ ξ_k^2 + ‖y_k‖^2 + 2γξ_k‖y_k‖ = (ξ_k − ‖y_k‖)^2 + 2(1 + γ)ξ_k‖y_k‖.

Since {x_k} converges, it follows that the sequences {ξ_k} and {y_k} are bounded and hence have limit points, denoted by ξ and y, respectively. The limit of {x_k} is

lim_{k→∞} (ξ_k a_{r+1} + y_k) = ξ a_{r+1} + y,

which belongs to C_{r+1}, since ξ ≥ 0 and y ∈ C_r (by the closedness of C_r). We conclude that C_{r+1} is closed.
. / equality cannot hold above. x =1 min ar+1 x. . Indeed. r j=1 are closed. r + 1 j=1 is also closed. . we have xk = ξk ar+1 + yk . consider the cone r Cr = x x = µj aj . . then Cr+1 is the subspace spanned by a1 . j = 1. for all k. . Because x∗ ∈ Cr and −ar+1 ∈ Cr . Suppose. . µj ≥ 0. . . Let γ= x∈Cr . ξk ≥ 0. . . . Since the set {x ∈ Cr  x = 1} is nonempty and compact. ar+1 and is therefore closed. If the vectors −a1 . Then. and 1 + γ > 0. . −ar+1 belong to Cr+1 . . r . 1 therefore closed. where ξk ≥ 0 and yk ∈ Cr . we obtain xk 2 2 = ξk + yk 2 2 + 2ξk ar+1 yk + 2γξk yk ≥ 2 ξk + yk = ξk − yk )2 + 2(1 + γ)ξk yk . Without loss of generality. respectively. for some r ≥ 1. Since {xk } converges. . all cones of the form r x x= µj aj . we have γ = ar+1 x∗ ≥ − ar+1 · x∗ = −1. j = 1. does not belong to Cr+1 . Let {xk } ⊂ Cr+1 be a convergent sequence. . Using the fact ar+1 = 1 and the deﬁnition of γ. j=1 which is closed by the induction hypothesis. If the negative of one of the vectors. . assume that aj = 1 for all j. we will show that a cone of the form r+1 Cr+1 = x x = µj aj . . say −ar+1 .118 Convex Analysis and Optimization Chap. The limit of {xk } is k→∞ lim (ξk ar+1 + yk ) = ξar+1 + y. j = 1. thereby showing that Cr+1 is closed. . We will prove that its limit belongs to Cr+1 . . the minimum above is attained at some x∗ ∈ Cr with x∗ = 1 by Weierstrass’ Theorem. µj ≥ 0. . it follows that the sequences {ξk } and {yk } are bounded and hence. with equality if and only if x∗ = −ar+1 . µj ≥ 0. Using the Schwartz Inequality. . they have limit points denoted by ξ and y. so that γ > −1.
. in which case C = {0}. It follows that for any x ∈ n we have that −µr+1 ar+1 + x ∈ Pr . i = 1. . . . r + 1 . . and C = {x  ui x ≤ 0. by showing that it is polyhedral. µj ≥ 0. −a1x ≤ 0}. . n}. In both cases (a) and (b). completing the proof.6 Polyhedral Convexity 119 which belongs to Cr+1 . .. We now give an alternative proof that C is closed. the open sphere centered at −µr+1 ar+1 of radius µr+1 is contained in Pr . . (b) a1 = 0. and deﬁne the index sets J − = {j  βj < 0}. we see that −ar+1 is an interior point of Pr . j = 1. . where b1 . Let βj = ar+1 bj . . If J − ∪J 0 = Ø. −ui x ≤ 0. . . C = {x  bi x ≤ 0. there are two possibilities: (a) a1 = 0. The proof is constructive and uses induction on the number of vectors r. r j=1 has a polyhedral representation Pr = x  bj x ≤ 0. n − 1. in which case C is a closed halﬂine. Hence there is an > 0 such that for each µr+1 > 0. j = 1. . where ui is the ith unit coordinate vector. . . i = 1. since ξ ≥ 0 and y ∈ Cr (by the closedness hypothesis on Cr ). J + = {j  βj > 0}. When r = 1. . j = 1. and consider the set r+1 Cr+1 = x x = µj aj . . . .Sec. 1. C is polyhedral. µj ≥ 0. a set of the form r Cr = x x = µj aj . . . is characterized as the set of vectors that are orthogonal to the subspace orthogonal to a1 and also make a nonnegative inner product with a1 . J 0 = {j  βj = 0}. . i. Assume that for some r ≥ 1. j = 1. . . . m . or equivalently ar+1 bj > 0 for all j. Let ar+1 be a given vector in n . bn−1 are basis vectors for the (n − 1)dimensional subspace that is orthogonal to the vector a1 . . . j=1 We will show that Cr+1 has a polyhedral representation. We conclude that Cr+1 is closed. . −bi x ≤ 0. m. which using the Polar Cone Theorem.e.
the set C or J r+1 has the polyhedral representation Pr+1 = x  bj x ≤ 0. βj (3) J + = Ø and J − = Ø. or equivalently that there exists µr+1 ≥ 0 such that x − µr+1 ar+1 ∈ Pr . µr+1 ≥ r+1 0 such that x = j=1 µj aj . we ﬁx a vector x ∈ Pr+1 and we verify that there exist µ1 . Then since x ∈ Pr+1 . βk . j ∈ J − ∪ J 0 . implying x ∈ Pr . This will complete the induction.k = bl − βl bk . . l ∈ J + . since x ∈ Pr+1 . βk ∀ l ∈ J +. 1 for all suﬃciently large µr+1 . there holds bk x/βk ≥ 0 for all k ∈ J − . it has the polyhedral representation Pr+1 = x  bj x ≤ 0. k ∈ J − . Indeed. implying that x belongs to Cr+1 . Therefore Cr+1 = n . . the vectors a1 . . . so that for µr+1 satisfying max 0. ar+1 satisfy the inequalities deﬁning Pr+1 . . we have Cr+1 ⊂ Pr+1 since by construction. so that x − µr+1 ar+1 ∈ Pr for µr+1 = 0. for µr+1 suﬃciently large. we have x − µr+1 ar+1 ∈ Pr . ∀ k ∈ J − . We will show that if J + = Ø − = Ø.k x ≤ 0 for all l ∈ J + . so that Cr+1 is a polyhedral set. Furthermore. and if J + = Ø and J − = Ø.k x ≤ 0. since x ∈ Pr+1 . We consider three cases. implying that bl x b x ≤ k . To show the reverse inclusion. (2) J + = Ø. where bl. We may thus assume that J − ∪ J 0 = Ø. bl. bj (x − µar+1 ) ≤ 0. (1. we have bj x ≤ 0 for all j ∈ J 0 . and J − = Ø. J 0 = Ø. Then.120 Convex Analysis and Optimization Chap. for x ∈ Pr+1 . k ∈ J − . . . (1) J + = Ø. ∀ j ∈ J 0 . . Then. ∀ k ∈ J −. we have bl. we have bj x ≤ 0 for all j ∈ J − ∪ J 0 . max l∈J + bl x βl ≤ µr+1 ≤ min k∈J − bk x . ∀ j ∈ J + .49) Hence. j ∈ J − ∪ J 0 . βl βk ∀ l ∈ J +. so that bj (x − µar+1 ) ≤ 0. ∀ µ ≥ max j∈J + bj x . ∀ µ ≥ 0.
r}. its polar (P ∗ )∗ is ﬁnitely generated. i = 1.1). P ∗ is ﬁnitely generated and therefore polyhedral. . . . we have r P∗ = x x = µj aj . . . . m. ∀ k ∈ J −. r ∗ . (P ∗ )∗ = P .E. . r + 2m} . . r . Since by part (a). and by using also the closedness of ﬁnitely generated cones proved in part (a) and the Polar Cone Theorem. . . so there remains to show the converse. P = r x x= j=1 µj aj . µj ≥ 0. P = C ∗ and C is closed and convex. j = 1.5. r+2m C= x x= µj aj . Q. (b) Deﬁne ar+i = ei and ar+m+i = −ei . j = 1. we have P ∗ = ∗ C ∗ = C by the Polar Cone Theorem (Prop. j = 1. 1. . . By part (a). . . . The result to be shown translates to x ∈ P∗ where P = {y  y aj ≤ 0. µj ≥ 0. j = 1.Sec. . j=1 ⇐⇒ x ∈ C. ∀ l ∈ J +. bk x − µr+1 bk ar+1 = βk bl x − µr+1 bl ar+1 = βl bk x − µr+1 βk bl x − µr+1 βl ∀ j ∈ J 0. j=1 Thus. . ≤ 0. µj ≥ 0 . . (c) We have already shown in the alternative proof of part (a) that a ﬁnitely generated cone is polyhedral. Hence bj x − µr+1 ar+1 ≤ 0 for all j. ≤ 0. By the Polar Cone Theorem. implying that x − µr+1 ar+1 ∈ Pr .D.6 Polyhedral Convexity 121 we have bj x − µr+1 bj ar+1 ≤ 0. . Let P be a polyhedral cone given by P = {x  aj x ≤ 0. implying that P is ﬁnitely generated. . so that by part (a). . 1. and completing the proof.
1.6.2 Polyhedral Sets

A nonempty set P ⊂ R^n is said to be a polyhedral set (or polyhedron) if it has the form

P = { x : a_j'x ≤ b_j, j = 1, ..., r },

where a_j are some vectors in R^n and b_j are some scalars. A polyhedron may also involve linear equalities, each of which may be converted into two linear inequalities. In particular, if e_1, ..., e_m and a_1, ..., a_r are given vectors, and d_1, ..., d_m and b_1, ..., b_r are given scalars, the set

{ x : e_i'x = d_i, i = 1, ..., m,  a_j'x ≤ b_j, j = 1, ..., r }

is polyhedral, since it can be written as

{ x : e_i'x ≤ d_i, −e_i'x ≤ −d_i, i = 1, ..., m,  a_j'x ≤ b_j, j = 1, ..., r }.

The following is a fundamental result, showing that a polyhedral set can be represented as the sum of the convex hull of a finite set of points and a finitely generated cone.

Proposition 1.6.2: A set P is polyhedral if and only if there exist a nonempty finite set of vectors {v_1, ..., v_m} and a finitely generated cone C such that P = conv{v_1, ..., v_m} + C, i.e.,

P = { x : x = Σ_{j=1}^{m} μ_j v_j + y,  Σ_{j=1}^{m} μ_j = 1,  μ_j ≥ 0, j = 1, ..., m,  y ∈ C }.

Proof: Assume that P is polyhedral. Then, it has the form

P = { x : a_j'x ≤ b_j, j = 1, ..., r },

for some vectors a_j and scalars b_j. Consider the polyhedral cone in R^{n+1} given by

P̂ = { (x, w) : 0 ≤ w,  a_j'x ≤ b_j w, j = 1, ..., r },

and note that P = { x : (x, 1) ∈ P̂ }. By the Minkowski–Weyl Theorem [Prop. 1.6.1(c)], P̂ is finitely generated, so it has the form

P̂ = { (x, w) : x = Σ_{j=1}^{m} μ_j v_j,  w = Σ_{j=1}^{m} μ_j d_j,  μ_j ≥ 0, j = 1, ..., m },
for some vectors v_j and scalars d_j. Since w ≥ 0 for all vectors (x, w) ∈ P̂, we see that d_j ≥ 0 for all j. Let

J+ = { j : d_j > 0 },   J0 = { j : d_j = 0 }.

By replacing μ_j by μ_j/d_j for all j ∈ J+, we obtain the equivalent description

P̂ = { (x, w) : x = Σ_{j=1}^{m} μ_j v_j,  w = Σ_{j∈J+} μ_j,  μ_j ≥ 0, j = 1, ..., m }.

Since P = { x : (x, 1) ∈ P̂ }, we obtain

P = { x : x = Σ_{j∈J+} μ_j v_j + Σ_{j∈J0} μ_j v_j,  Σ_{j∈J+} μ_j = 1,  μ_j ≥ 0 }.

Thus, P is the vector sum of the convex hull of the vectors v_j, j ∈ J+, and the finitely generated cone

{ Σ_{j∈J0} μ_j v_j : μ_j ≥ 0, j ∈ J0 }.

To prove that the vector sum of conv{v_1, ..., v_m} and a finitely generated cone is a polyhedral set, we use a reverse argument: we pass to a finitely generated cone description, we use the Minkowski–Weyl Theorem to assert that this cone is polyhedral, and we finally construct a polyhedral set description. The details are left as an exercise for the reader. Q.E.D.

1.6.3 Extreme Points

Given a convex set C, a vector x ∈ C is said to be an extreme point of C if there do not exist vectors y ∈ C and z ∈ C, with y ≠ x and z ≠ x, and a scalar α ∈ (0, 1) such that x = αy + (1 − α)z. An equivalent definition is that x cannot be expressed as a convex combination of some vectors of C, all of which are different from x.

Thus an extreme point of a set cannot lie strictly between the endpoints of any line segment contained in the set. It follows that an extreme point of a set must lie on the relative boundary of the set. As a consequence, an open set has no extreme points, and more generally a convex set that coincides with its relative interior has no extreme points, except in the special case where the set consists of a single point. Also, a convex cone may have at most one extreme point, the origin.
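The Minkowski–Weyl representation of Prop. 1.6.2 can be spot-checked numerically. The sketch below (an illustration added here, not from the notes) uses the polyhedron P = {x : x1 ≥ 0, x2 ≥ 0, x1 + x2 ≥ 1}, for which P = conv{(1,0), (0,1)} + cone{(1,0), (0,1)}; all function names are hypothetical:

```python
import random

# Illustrative check (not from the text) of a Minkowski-Weyl resolution:
# P = {x in R^2 : x1 >= 0, x2 >= 0, x1 + x2 >= 1}
#   = conv{(1,0), (0,1)} + cone{(1,0), (0,1)}.

def in_h_rep(x, tol=1e-9):
    """Inequality (H-) representation of P."""
    return x[0] >= -tol and x[1] >= -tol and x[0] + x[1] >= 1 - tol

def sample_v_rep(rng):
    """Random point from the V-representation: a convex combination of the
    two vertices plus a nonnegative cone element."""
    mu = rng.random()                      # weights (mu, 1-mu) on (1,0), (0,1)
    y = (3*rng.random(), 3*rng.random())   # nonnegative cone element
    return (mu*1.0 + y[0], (1.0 - mu)*1.0 + y[1])

rng = random.Random(0)
# Every V-representation sample satisfies the H-representation.
assert all(in_h_rep(sample_v_rep(rng)) for _ in range(1000))

# Conversely, a point of P decomposes explicitly: with mu = x1/(x1+x2),
# the point (mu, 1-mu) lies in conv{(1,0),(0,1)}, and the remainder
# y = x - (mu, 1-mu) is componentwise nonnegative, hence in the cone.
def decompose(x):
    s = x[0] + x[1]
    mu = x[0] / s
    y = (x[0] - mu, x[1] - (1.0 - mu))
    return mu, y

mu, y = decompose((2.0, 3.0))
assert y[0] >= 0 and y[1] >= 0
```

The explicit decomposition works because x1 − x1/(x1+x2) = x1(x1+x2−1)/(x1+x2) ≥ 0 whenever x ∈ P.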
Figure 1.6.2 illustrates the extreme points of various types of sets. The following proposition provides some intuition into the nature of extreme points. We also show in what follows that a bounded polyhedral set has a finite number of extreme points and in fact it is equal to the convex hull of its extreme points.

[Figure 1.6.2 here: three panels (a), (b), (c) indicating the extreme points of various convex sets.]

Figure 1.6.2. Illustration of extreme points of various convex sets. In the set of (c), the extreme points are the ones that lie on the circular arc.

Proposition 1.6.3: Let C be a nonempty closed convex set in R^n.

(a) If H is a hyperplane that passes through a relative boundary point of C and contains C in one of its halfspaces, then every extreme point of C ∩ H is also an extreme point of C.

(b) C has at least one extreme point if and only if it does not contain a line, i.e., a set L of the form L = { x̄ + αd : α ∈ R }, where d is a nonzero vector in R^n and x̄ is some vector in C.

(c) (Krein–Milman Theorem) If C is bounded (in addition to being convex and closed), then C is equal to the convex hull of its extreme points.

Proof: (a) Let x be an element of C ∩ H which is not an extreme point of C. Then we have x = αy + (1 − α)z for some α ∈ (0, 1), and some y ∈ C and z ∈ C, with y ≠ x and z ≠ x. Since x ∈ H, the halfspace containing C is of the form { z : a'z ≥ a'x }, where a ≠ 0. Then a'y ≥ a'x and a'z ≥ a'x, which in view of x = αy + (1 − α)z, implies that a'y = a'x and a'z = a'x. Therefore, y ∈ C ∩ H and z ∈ C ∩ H, showing that x is not an extreme point of C ∩ H.

(b) To arrive at a contradiction, assume that C has an extreme point x and contains a line { x̄ + αd : α ∈ R }, where x̄ ∈ C and d ≠ 0. Then by the
Recession Cone Theorem and the closedness of C, both d and −d are directions of recession of C, so the line { x + αd : α ∈ R } through the extreme point x belongs to C. Thus x lies strictly between two points of C, which contradicts the fact that x is an extreme point.

Conversely, we use induction on the dimension of the space to show that if C does not contain a line, it must have an extreme point. This is true in the real line R, so assume it is true in R^{n−1}. Since C contains no line, we must have C ≠ R^n, so that C has a relative boundary point. Take any x on the relative boundary of C, and any hyperplane H passing through x and containing C in one of its halfspaces. By using a translation argument if necessary, we may assume without loss of generality that x = 0. Then H is an (n − 1)-dimensional subspace, so the set C ∩ H lies in an (n − 1)-dimensional space and contains no line. By the induction hypothesis, it must have an extreme point. By part (a), this extreme point must also be an extreme point of C.

(c) The proof is based on induction on the dimension of the space. On the real line, every compact convex set C is a line segment whose endpoints are the extreme points of C, so that C is the convex hull of its extreme points. Suppose now that every compact convex set in R^{n−1} is the convex hull of its extreme points, and let C be a compact convex set in R^n. Since C is compact, it contains no line, and by part (b), it has at least one extreme point. By the convexity and closedness of C, the convex hull of all extreme points of C is contained in C.

To show the reverse inclusion, choose any x ∈ C, and consider a line that lies in aff(C) and passes through x. Since C is compact, the intersection of this line and C is a line segment whose endpoints, say x1 and x2, belong to the relative boundary of C. Let H1 be a hyperplane that passes through x1 and contains C in one of its halfspaces; similarly, let H2 be a hyperplane that passes through x2 and contains C in one of its halfspaces. The intersections C ∩ H1 and C ∩ H2 are compact convex sets that lie in the hyperplanes H1 and H2, respectively. By viewing H1 and H2 as (n − 1)-dimensional spaces, and by using the inductive hypothesis, we see that each of the sets C ∩ H1 and C ∩ H2 is the convex hull of its extreme points. Hence, x1 is a convex combination of some extreme points of C ∩ H1, and x2 is a convex combination of some extreme points of C ∩ H2. By part (a), every extreme point of C ∩ H1 and every extreme point of C ∩ H2 is also an extreme point of C, so that each of x1 and x2 is a convex combination of some extreme points of C. Since x lies in the line segment connecting x1 and x2, it follows that x is a convex combination of some extreme points of C, showing that C is contained in the convex hull of all extreme points of C. Q.E.D.

The boundedness assumption is essential in the Krein–Milman Theorem. For example, the only extreme point of the ray C = { λy : λ ≥ 0 }, where y is a given nonzero vector, is the origin, but none of its other points can be generated as a convex combination of extreme points of C.
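The Krein–Milman property is easy to make concrete on the unit square, whose extreme points are the four corners: every point is a convex combination of the corners with bilinear weights. The following sketch (added here for illustration, not from the notes) verifies this:

```python
# Illustrative check (not from the text) of the Krein-Milman property for
# the unit square [0,1]^2: every point is a convex combination of the four
# extreme points (the corners), with bilinear weights.

CORNERS = [(0, 0), (1, 0), (0, 1), (1, 1)]

def corner_weights(x):
    """Convex weights on the corners reproducing x, for x in [0,1]^2."""
    a, b = x
    return [(1 - a)*(1 - b), a*(1 - b), (1 - a)*b, a*b]

def reconstruct(x):
    """Rebuild x from its corner weights, checking they are convex weights."""
    w = corner_weights(x)
    assert abs(sum(w) - 1.0) < 1e-12 and all(wi >= 0 for wi in w)
    return (sum(wi*c[0] for wi, c in zip(w, CORNERS)),
            sum(wi*c[1] for wi, c in zip(w, CORNERS)))

for x in [(0.3, 0.7), (0.0, 1.0), (0.5, 0.5), (0.9, 0.1)]:
    rx = reconstruct(x)
    assert abs(rx[0] - x[0]) < 1e-12 and abs(rx[1] - x[1]) < 1e-12
```

The weights sum to 1 because ((1−a)+a)((1−b)+b) = 1, mirroring the product structure of the square.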
As an example of application of the preceding proposition, consider a nonempty polyhedral set of the form { x : Ax = b, x ≥ 0 }, where A is an m × n matrix and b ∈ R^m, or of the form { x : Ax ≤ b, x ≥ 0 }. Such polyhedra arise commonly in linear programming formulations. They are closed and convex, and clearly do not contain a line, so by Prop. 1.6.3(b) they always possess at least one extreme point.

One of the fundamental linear programming results is that if a linear function f attains a minimum over a polyhedral set C having at least one extreme point, then f attains a minimum at some extreme point of C (as well as possibly at some other nonextreme points). We will come to this fact after considering the more general case where f is concave.

Proposition 1.6.4: Let C ⊂ R^n be a closed convex set that does not contain a line, and let f : C → R be a concave function attaining a minimum over C. Then f attains a minimum at some extreme point of C.

Proof: If C consists of just a single point, the result holds trivially. We may thus assume that C contains at least two distinct points. Let x* be a vector where f attains a minimum over C. We first show that f attains a minimum at some vector that is not in ri(C). If x* ∉ ri(C) we are done, so assume that x* ∈ ri(C). By Prop. 1.2.9(c), for any x ∈ C, every line segment having x* as one endpoint can be prolonged beyond x* without leaving C, i.e., there exists γ > 1 such that

x̂ = x* + (γ − 1)(x* − x) ∈ C.

Therefore

x* = ((γ − 1)/γ) x + (1/γ) x̂,

and by the concavity of f and the optimality of x*,

f(x*) ≥ ((γ − 1)/γ) f(x) + (1/γ) f(x̂) ≥ f(x*),

implying that f(x) = f(x*) for any x ∈ C. Furthermore, since C is closed, contains at least two points, and does not contain a line, there exists an x̄ ∈ C with x̄ ∉ ri(C), and f attains its minimum at x̄.
We have shown so far that the minimum of f is attained at some x ∉ ri(C). If x is an extreme point of C, we are done. If it is not an extreme point, consider a hyperplane H passing through x and containing C in one of its halfspaces. The intersection C1 = C ∩ H is closed, convex, does not contain a line, and lies in an affine set M1 of dimension n − 1. Furthermore, f attains its minimum over C1 at x, so by the preceding argument, it also attains its minimum at some x1 ∈ C1 with x1 ∉ ri(C1).

If x1 is an extreme point of C1, then by Prop. 1.6.3(a), it is also an extreme point of C and the result follows. If x1 is not an extreme point of C1, then we may view M1 as a space of dimension n − 1 and we form C2, the intersection of C1 with a hyperplane in M1 that passes through x1 and contains C1 in one of its halfspaces. This hyperplane will be of dimension n − 2. We can continue this process for at most n times, when a set Cn consisting of a single point is obtained. This point is an extreme point of Cn and, by repeated application of Prop. 1.6.3(a), an extreme point of C. Q.E.D.

The following is an important special case of the preceding proposition.

Proposition 1.6.5: Let C be a closed convex set and let f : C → R be a concave function. Assume that for some m × n matrix A of rank n and some b ∈ R^m we have

Ax ≥ b,   ∀ x ∈ C.

Then if f attains a minimum over C, it attains a minimum at some extreme point of C.

Proof: In view of Prop. 1.6.4, it suffices to show that C does not contain a line. Assume, to obtain a contradiction, that C contains the line L = { x + λd : λ ∈ R }, where x ∈ C and d is a nonzero vector. Since A has rank n, the vector Ad is nonzero, implying that the image AL = { Ax + λAd : λ ∈ R } is also a line. This contradicts the assumption Ax ≥ b for all x ∈ C. Q.E.D.
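The phenomenon in Prop. 1.6.4 is easy to observe numerically. The sketch below (an added illustration, not from the notes) compares a dense grid search over a box with a search over its vertices for a concave cost:

```python
# Illustrative sketch (not from the text): a concave function minimized over
# a polytope attains its minimum at a vertex.  Here f(x) = -(x1^2 + x2^2)
# (concave) over the box [0,2] x [0,3]; Prop. 1.6.4 predicts that the
# minimum over the box equals the minimum over the four vertices.

def f(x):
    return -(x[0]**2 + x[1]**2)

vertices = [(0, 0), (2, 0), (0, 3), (2, 3)]
vertex_min = min(f(v) for v in vertices)

# Dense grid over the box, as a stand-in for minimizing over all of C.
grid = [(2*i/50, 3*j/50) for i in range(51) for j in range(51)]
grid_min = min(f(x) for x in grid)

# Both searches find the same value, attained at the vertex (2,3).
assert grid_min == vertex_min == f((2, 3)) == -13
```

Checking only the vertices is exactly the computational shortcut that the proposition (and, for linear costs, the simplex method) justifies.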
1.6.4 Extreme Points and Linear Programming

We now consider a polyhedral set P and we characterize the set of its extreme points (also called vertices). By Prop. 1.6.2, P can be represented as

P = P̂ + C,

where P̂ is the convex hull of some vectors v_1, ..., v_m and C is a finitely generated cone:

P̂ = { x : x = Σ_{j=1}^{m} μ_j v_j,  Σ_{j=1}^{m} μ_j = 1,  μ_j ≥ 0, j = 1, ..., m }.

We note that an extreme point x of P cannot be of the form x = x̂ + y, where x̂ ∈ P̂ and y ≠ 0, y ∈ C, since in this case x would be the midpoint of the line segment connecting the distinct vectors x̂ and x̂ + 2y. Therefore, an extreme point of P must belong to P̂, and since P̂ ⊂ P, it must also be an extreme point of P̂. An extreme point of P̂ must be one of the vectors v_1, ..., v_m, since otherwise this point would be expressible as a convex combination of v_1, ..., v_m, all of which are different from it. Thus the set of extreme points of P is either empty, or nonempty and finite. Using Prop. 1.6.3(b), it follows that the set of extreme points of P is nonempty and finite if and only if P contains no line.

If P is bounded, then we must have P = P̂, and it follows from the Krein–Milman Theorem that P is equal to the convex hull of its extreme points (not just the convex hull of the vectors v_1, ..., v_m).

The following proposition gives another and more specific characterization of extreme points of polyhedral sets, which is central in the theory of linear programming.
Proposition 1.6.6: Let P be a polyhedral set in R^n.

(a) If P has the form

P = { x : a_j'x ≤ b_j, j = 1, ..., r },

where a_j and b_j are given vectors and scalars, respectively, then a vector v ∈ P is an extreme point of P if and only if the set

A_v = { a_j : a_j'v = b_j,  j ∈ {1, ..., r} }

contains n linearly independent vectors.

(b) If P has the form P = { x : Ax = b, x ≥ 0 }, where A is a given m × n matrix and b is a given vector in R^m, then a vector v ∈ P is an extreme point of P if and only if the columns of A corresponding to the nonzero coordinates of v are linearly independent.

(c) If P has the form P = { x : Ax = b, c ≤ x ≤ d }, where A is a given m × n matrix, b is a given vector in R^m, and c, d are given vectors in R^n, then a vector v ∈ P is an extreme point of P if and only if the columns of A corresponding to the coordinates of v that lie strictly between the corresponding coordinates of c and d are linearly independent.

Proof: (a) If the set A_v contains fewer than n linearly independent vectors, then the system of equations

a_j'w = 0,   ∀ a_j ∈ A_v,

has a nonzero solution w. For sufficiently small γ > 0, we have v + γw ∈ P and v − γw ∈ P, thus showing that v is not an extreme point. Thus, if v is an extreme point, A_v must contain n linearly independent vectors.

Conversely, suppose that A_v contains a subset Ā_v consisting of n linearly independent vectors. Suppose that for some y ∈ P, z ∈ P, and α ∈ (0, 1), we have v = αy + (1 − α)z. Then for all a_j ∈ Ā_v, we have

b_j = a_j'v = α a_j'y + (1 − α) a_j'z ≤ α b_j + (1 − α) b_j = b_j.
Hence equality holds throughout, so that a_j'y = b_j and a_j'z = b_j for all a_j ∈ Ā_v. Thus v, y, and z are all solutions of the system of n linearly independent equations

a_j'w = b_j,   ∀ a_j ∈ Ā_v.

Hence v = y = z, implying that v is an extreme point of P.

(b) Let k be the number of zero coordinates of v, and consider the matrix Ā, which is the same as A except that all the columns corresponding to the zero coordinates of v are set to zero. We write P in the form P = { x : Ax ≤ b, −Ax ≤ −b, −x ≤ 0 }, and apply the result of part (a). We obtain that v is an extreme point if and only if Ā contains n − k linearly independent rows, which is equivalent to the n − k nonzero columns of Ā (corresponding to the nonzero coordinates of v) being linearly independent.

(c) The proof is essentially the same as the proof of part (b). Q.E.D.

The following theorem illustrates the importance of extreme points in linear programming. In addition to its analytical significance, it forms the basis for algorithms, such as the celebrated simplex method (see any linear programming book, e.g., Chvatal [Chv83], Dantzig [Dan63], or Bertsimas and Tsitsiklis [BeT97]), which systematically search the set of extreme points of the constraint polyhedron for an optimal extreme point.

Proposition 1.6.7: (Fundamental Theorem of Linear Programming) Let P be a polyhedral set that has at least one extreme point. Then if a linear function attains a minimum over P, it attains a minimum at some extreme point of P.

Proof: Since P is polyhedral, it has a representation P = { x : Ax ≥ b }, for some m × n matrix A and some b ∈ R^m. If A had rank less than n, then its nullspace would contain some nonzero vector d, so P would contain a line parallel to d, contradicting the existence of an extreme point [cf. Prop. 1.6.3(b)]. Thus A has rank n and the result follows from Prop. 1.6.5. Q.E.D.

Extreme Points and Integer Programming

Many important optimization problems, in addition to the usual equality and inequality constraints, include the requirement that the optimization
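Together, Prop. 1.6.6(a) and Prop. 1.6.7 suggest a naive but instructive algorithm: enumerate candidate vertices by solving every square subsystem of active constraints, keep the feasible ones, and minimize the linear cost over them. The following sketch (an added illustration, not from the notes; the triangle and all names are hypothetical) does this in R^2:

```python
# Illustrative sketch (not from the text) of Prop. 1.6.6(a) and the
# Fundamental Theorem of LP in R^2: enumerate candidate extreme points of
# P = {x : a_j'x <= b_j} by solving each pair of constraints taken as
# equalities (2 linearly independent active vectors), keep the feasible
# ones, and minimize a linear cost over them.
from itertools import combinations

# P is the triangle: x1 >= 0, x2 >= 0, x1 + x2 <= 1.
A = [(-1.0, 0.0), (0.0, -1.0), (1.0, 1.0)]
b = [0.0, 0.0, 1.0]

def solve2(a1, a2, b1, b2):
    """Solve the 2x2 system a1'x = b1, a2'x = b2 (None if singular)."""
    det = a1[0]*a2[1] - a1[1]*a2[0]
    if abs(det) < 1e-12:
        return None
    return ((b1*a2[1] - b2*a1[1]) / det, (a1[0]*b2 - a2[0]*b1) / det)

def feasible(x, tol=1e-9):
    return all(ai[0]*x[0] + ai[1]*x[1] <= bi + tol for ai, bi in zip(A, b))

vertices = []
for (i, j) in combinations(range(len(A)), 2):
    x = solve2(A[i], A[j], b[i], b[j])
    if x is not None and feasible(x):
        vertices.append(x)

assert sorted(vertices) == [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]

# Minimize c'x over P by scanning the vertices (justified by Prop. 1.6.7).
c = (-1.0, -2.0)
best = min(vertices, key=lambda x: c[0]*x[0] + c[1]*x[1])
assert best == (0.0, 1.0)   # optimal value -2
```

This brute-force enumeration is exponential in general; the simplex method can be viewed as a far more economical walk through the same vertex set.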
variables take integer values, such as 0 or 1. We refer to such problems as integer programming problems, and we note that they arise in a broad variety of practical settings, such as in scheduling, resource allocation, and engineering design, as well as in many combinatorial optimization contexts, such as matching and traveling salesman problems.

The methodology for solving an integer programming problem is very diverse, but an important subset of this methodology relies on the solution of a continuous optimization problem, called the relaxed problem, which is derived from the original by neglecting the integer constraints while maintaining all the other constraints. In many important situations the relaxed problem is a linear program that can be solved for an extreme point optimal solution, by a variety of algorithms, including the simplex method. A particularly fortuitous situation arises if this extreme point solution happens to have integer components, since it will then solve optimally not just the relaxed problem, but also the original integer programming problem. Thus polyhedra whose extreme points have integer components are of special significance.

We will now use Prop. 1.6.6 to characterize an important class of polyhedra with this property. Consider a polyhedral set P of the form

P = { x : Ax = b, c ≤ x ≤ d },    (1.50)

where A is a given m × n matrix, b is a given vector in R^m, and c and d are given vectors in R^n. We assume that all the components of the matrix A and the vectors b, c, and d are integer. Given an extreme point v of P, we want to determine conditions under which the components of v are integer.

Let us say that a square matrix with integer components is unimodular if its determinant is 0, 1, or −1, and let us say that a rectangular matrix with integer components is totally unimodular if each of its square submatrices is unimodular. We have the following proposition.

Proposition 1.6.8: Let P be a polyhedral set

P = { x : Ax = b, c ≤ x ≤ d },

where A is a given m × n matrix, b is a given vector in R^m, and c and d are given vectors in R^n. Assume that all the components of the matrix A and the vectors b, c, and d are integer, and that the matrix A is totally unimodular. Then all the extreme points of P have integer components.

Proof: Let v be an extreme point of P. Consider the subset of indexes

I = { i : c_i < v_i < d_i },
and without loss of generality, assume that

I = { 1, ..., m̄ }

for some integer m̄. Let Ā be the matrix consisting of the first m̄ columns of A and let v̄ be the vector consisting of the first m̄ components of v. Note that each of the last n − m̄ components of v is equal to either the corresponding component of c or to the corresponding component of d, which are integer. Thus the extreme point v has integer components if and only if the subvector v̄ does.

From Prop. 1.6.6(c), Ā has linearly independent columns. Hence there exists an invertible m̄ × m̄ submatrix Ã of Ā and a subvector b̃ with m̄ components such that v̄ is the unique solution of the corresponding system of equations, i.e.,

v̄ = (Ã)^{−1} b̃,

where b̃ is equal to the corresponding subvector of b minus the last n − m̄ columns of A multiplied with the corresponding components of v (each of which is equal to either the corresponding component of c or the corresponding component of d, so that b̃ has integer components).

The components of v̄ will be integer if we can guarantee that the components of the inverse (Ã)^{−1} are integer. By Cramer's formula, each of the components of the inverse of a matrix is a fraction with a sum of products of the components of the matrix in the numerator and the determinant of the matrix in the denominator. Since by hypothesis A is totally unimodular, the invertible submatrix Ã is unimodular, and its determinant is either equal to 1 or is equal to −1. Hence (Ã)^{−1} has integer components, and it follows that v̄ (and hence also the extreme point v) has integer components. Q.E.D.

The total unimodularity of the matrix A in the above proposition can be verified in a number of important special cases, some of which are discussed in the exercises. The most important such case arises in network optimization problems, where A is the so-called
arc incidence matrix of a given directed graph (we refer to network optimization books such as Rockafellar [Roc84] or Bertsekas [Ber98] for a detailed discussion of these problems and their broad applications). Here A has a row for each node and a column for each arc of the graph. The component corresponding to the ith row and a given arc is a 1 if the arc is outgoing from node i, is a −1 if the arc is incoming to i, and is a 0 otherwise. Suppose that
A is such an arc incidence matrix. Then we can show that the determinant of each square submatrix of A is 0, 1, or −1, by induction on the dimension of the submatrix. In particular, the submatrices of dimension 1 of A are the scalar components of A, which are 0, 1, or −1. Suppose that the determinant of each square submatrix of dimension n ≥ 1 is 0, 1, or −1, and consider a square submatrix of dimension n + 1. If this matrix has a column with all components 0, the matrix is singular, and its determinant is 0. If the matrix has a column with a single nonzero component (a 1 or a −1), by expanding its determinant along that component and using the induction hypothesis, we see that the determinant is 0, 1, or −1. Finally, if each column of the matrix has two nonzero components (a 1 and a −1), the sum of its rows is 0, so the matrix is singular, and its determinant is 0. Thus, in linear network optimization, where the only constraints, other than upper and lower bounds on the variables, are equality constraints corresponding to an arc incidence matrix, integer extreme point optimal solutions can generically be found.

EXERCISES

1.6.1 Let V be the convex hull of a finite set of points {v_1, ..., v_m} in R^n, and let C ⊂ R^n be a finitely generated cone. Show that V + C is a polyhedral set. Hint: Use the Minkowski–Weyl Theorem to construct a polyhedral description of V + C.

1.6.2 Let P be a polyhedral set represented as

P = { x : x = Σ_{j=1}^{m} μ_j v_j + y,  Σ_{j=1}^{m} μ_j = 1,  μ_j ≥ 0, j = 1, ..., m,  y ∈ C },

where v_1, ..., v_m are some vectors and C is a finitely generated cone (cf. Prop. 1.6.2). Show that the recession cone of P is equal to C.

1.6.3 Show that the image and the inverse image of a polyhedral set under a linear transformation is a polyhedral set.
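The total unimodularity of incidence matrices discussed above can be verified by brute force on small instances. The sketch below (an added illustration, not from the notes; the graph and all names are hypothetical) enumerates every square submatrix and checks its determinant:

```python
# Illustrative check (not from the text): the node-arc incidence matrix of a
# small directed graph is totally unimodular -- every square submatrix has
# determinant 0, 1, or -1.  Brute force over all square submatrices.
from itertools import combinations

def incidence(n_nodes, arcs):
    """Rows = nodes, columns = arcs; +1 at the start node, -1 at the end."""
    A = [[0]*len(arcs) for _ in range(n_nodes)]
    for k, (i, j) in enumerate(arcs):
        A[i][k] = 1
        A[j][k] = -1
    return A

def det(M):
    """Integer determinant by cofactor expansion (fine for tiny matrices)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    return sum((-1)**c * M[0][c] * det([row[:c] + row[c+1:] for row in M[1:]])
               for c in range(n))

def totally_unimodular(A):
    m, n = len(A), len(A[0])
    for k in range(1, min(m, n) + 1):
        for rows in combinations(range(m), k):
            for cols in combinations(range(n), k):
                sub = [[A[i][j] for j in cols] for i in rows]
                if det(sub) not in (-1, 0, 1):
                    return False
    return True

# Directed graph on nodes {0,1,2,3} with four arcs.
A = incidence(4, [(0, 1), (1, 2), (2, 3), (0, 3)])
assert totally_unimodular(A)

# A 0/1 matrix that is NOT totally unimodular (its own determinant is 2).
assert not totally_unimodular([[1, 1, 0], [0, 1, 1], [1, 0, 1]])
```

The brute-force check is exponential in the matrix size, so it is only a teaching device; the induction argument above is what establishes the property in general.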
1.6.4 Let C1 ⊂ R^n and C2 ⊂ R^m be polyhedral sets. Show that the Cartesian product C1 × C2 is a polyhedral set. Show also that if C1 and C2 are polyhedral cones, then C1 × C2 is a polyhedral cone.

1.6.5 Let C1 and C2 be two polyhedral sets in R^n. Show that C1 + C2 is a polyhedral set. Show also that if C1 and C2 are polyhedral cones, then C1 + C2 is a polyhedral cone. (Compare this with Exercise 1.6.1.)

1.6.6 (Polyhedral Strong Separation) Let C1 and C2 be two nonempty disjoint polyhedral sets in R^n. Show that C1 and C2 can be strongly separated, i.e., there is a nonzero vector a ∈ R^n such that

sup_{x∈C1} a'x < inf_{z∈C2} a'z.

1.6.7 Let C be a polyhedral set containing the origin.
(a) Show that cone(C) is a polyhedral cone.
(b) Show by counterexample that if C does not contain the origin, cone(C) may not be a polyhedral cone.

1.6.8 (Polyhedral Functions) A function f : R^n → (−∞, ∞] is polyhedral if its epigraph is a polyhedral set in R^{n+1} not containing vertical lines [i.e., lines of the form { (x, w) : w ∈ R } for some x ∈ R^n]. Show the following:
(a) A polyhedral function is convex over R^n.
(b) f is polyhedral if and only if dom(f) is a polyhedral set and

f(x) = max{ a_1'x + b_1, ..., a_m'x + b_m },   ∀ x ∈ dom(f),

for some vectors a_i ∈ R^n and scalars b_i.
(c) The function F given by

F(y) = sup_{Σ_{i=1}^{m} μ_i = 1, μ_i ≥ 0} { (Σ_{i=1}^{m} μ_i a_i)'y + Σ_{i=1}^{m} μ_i b_i }

is polyhedral.
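The max-of-affine-functions representation in Exercise 1.6.8(b) is easy to experiment with. The sketch below (an added illustration, not from the notes; the three affine pieces are hypothetical) spot-checks midpoint convexity of such a maximum:

```python
import random

# Illustrative sketch (not from the text) for Exercise 1.6.8(b): a polyhedral
# function is a pointwise maximum of finitely many affine functions, and such
# a maximum is convex -- here spot-checked via the midpoint inequality.

# f(x) = max_i {a_i'x + b_i} in R^2, with three hypothetical affine pieces.
pieces = [((1.0, 0.0), 0.0), ((0.0, 1.0), -1.0), ((-1.0, -1.0), 2.0)]

def f(x):
    return max(a[0]*x[0] + a[1]*x[1] + b for a, b in pieces)

rng = random.Random(1)
for _ in range(1000):
    x = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    y = (rng.uniform(-5, 5), rng.uniform(-5, 5))
    mid = ((x[0] + y[0])/2, (x[1] + y[1])/2)
    # Midpoint convexity: f((x+y)/2) <= (f(x) + f(y))/2.
    assert f(mid) <= (f(x) + f(y))/2 + 1e-12
```

Each affine piece satisfies the midpoint inequality with equality; taking the maximum can only increase the right-hand side, which is the essence of part (a) of the exercise.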
1.6.9 Let A be an m × n matrix and let g : R^m → (−∞, ∞] be a polyhedral function. Show that the function f given by f(x) = g(Ax) is polyhedral on R^n.

1.6.10 Let P be a polyhedral set represented as

P = { x : x = Σ_{j=1}^{m} μ_j v_j + y,  Σ_{j=1}^{m} μ_j = 1,  μ_j ≥ 0, j = 1, ..., m,  y ∈ C },

where v_1, ..., v_m are some vectors and C is a finitely generated cone (cf. Prop. 1.6.2). Show that each extreme point of P is equal to some vector v_i that cannot be represented as a convex combination of the remaining vectors v_j, j ≠ i.

1.6.11 Consider a nonempty polyhedral set of the form

P = { x : a_j'x ≤ b_j, j = 1, ..., r },

where a_j are some vectors and b_j are some scalars. Show that P has an extreme point if and only if the set of vectors { a_j : j = 1, ..., r } contains a subset of n linearly independent vectors.

1.6.12 (Isomorphic Polyhedral Sets) It is easily seen that if a linear transformation is applied to a polyhedron, the result is a polyhedron whose extreme points are the images of some (but not necessarily all) of the extreme points of the original polyhedron. This exercise clarifies the circumstances where the extreme points of the original and the transformed polyhedra are in one-to-one correspondence. Let P and Q be two polyhedra of R^n and R^m, respectively.
(a) Show that there is an affine function f : R^n → R^m that maps each extreme point of P into a distinct extreme point of Q if and only if there exists an affine function g : R^m → R^n that maps each extreme point of Q into a distinct extreme point of P.
(b) Show that P and Q are isomorphic if and only if there exist affine functions f : R^n → R^m and g : R^m → R^n such that x = g(f(x)) for all x ∈ P and y = f(g(y)) for all y ∈ Q. Note: If there exist such affine mappings f and g, we say that P and Q are isomorphic.
(c) Show that if P and Q are isomorphic, then P and Q have the same dimension.
(d) Let A be a k × n matrix and let

P = { x ∈ R^n : Ax ≤ b, x ≥ 0 },

Q = { (x, z) ∈ R^{n+k} : Ax + z = b, x ≥ 0, z ≥ 0 }.

Show that P and Q are isomorphic.

1.6.13 Show by an example that the set of extreme points of a compact set need not be closed. Hint: Consider a line segment

C1 = { (x1, x2, x3) : x1 = 0, x2 = 0, −1 ≤ x3 ≤ 1 }

and a circular disk

C2 = { (x1, x2, x3) : (x1 − 1)^2 + x2^2 ≤ 1, x3 = 0 },

and verify that conv(C1 ∪ C2) is compact, while its set of extreme points is not closed.

1.6.14 Show that a compact convex set is polyhedral if and only if it has a finite number of extreme points. Give an example showing that the assertion fails if boundedness of the set is replaced by the weaker assumption that the set contains no lines.

1.6.15 (Gordan's Theorem of the Alternative) Let a_1, ..., a_r be vectors in R^n.
(a) Then exactly one of the following two conditions holds:
(i) There exists a vector x ∈ R^n such that

a_1'x < 0, ..., a_r'x < 0.

(ii) There exists a vector μ ∈ R^r such that μ ≠ 0, μ ≥ 0, and

μ_1 a_1 + ··· + μ_r a_r = 0.

(b) Show that an alternative and equivalent statement of part (a) is the following: a polyhedral cone has nonempty interior if and only if its polar cone does not contain a line, i.e., a set of the form { αz : α ∈ R }, where z is a nonzero vector.
1.6.16 Let a_1, ..., a_r be vectors in R^n and let b_1, ..., b_r be scalars. Then exactly one of the following two conditions holds:
(i) There exists a vector x ∈ R^n such that

a_1'x ≤ b_1, ..., a_r'x ≤ b_r.

(ii) There exists a vector μ ∈ R^r such that μ ≥ 0,

μ_1 a_1 + ··· + μ_r a_r = 0,   and   μ_1 b_1 + ··· + μ_r b_r < 0.

1.6.17 (Convex System Alternatives) Let C be a nonempty convex set in R^n and let f_i : C → R, i = 1, ..., r, be convex functions. Then exactly one of the following two conditions holds:
(i) There exists a vector x ∈ C such that

f_1(x) < 0, ..., f_r(x) < 0.

(ii) There exists a vector μ ∈ R^r such that μ ≠ 0, μ ≥ 0, and

μ_1 f_1(x) + ··· + μ_r f_r(x) ≥ 0,   ∀ x ∈ C.

1.6.18 (Convex-Affine System Alternatives) Let C be a nonempty convex set in R^n. Let f_i : C → R, i = 1, ..., r̄, be convex functions, and let f_i : C → R, i = r̄ + 1, ..., r, be affine functions such that the system

f_{r̄+1}(x) ≤ 0, ..., f_r(x) ≤ 0

has a solution x ∈ ri(C). Then exactly one of the following two conditions holds:
(i) There exists a vector x ∈ C such that

f_1(x) < 0, ..., f_{r̄}(x) < 0,   f_{r̄+1}(x) ≤ 0, ..., f_r(x) ≤ 0.

(ii) There exists a vector μ ∈ R^r such that μ ≠ 0, μ ≥ 0, and

μ_1 f_1(x) + ··· + μ_r f_r(x) ≥ 0,   ∀ x ∈ C.
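The theorems of the alternative above can be made concrete by exhibiting an explicit witness for exactly one of the two conditions. The sketch below (an added illustration, not from the notes) does this for Gordan's theorem (Exercise 1.6.15) on two tiny examples:

```python
# Illustrative check (not from the text) of Gordan's alternative on two
# small examples, exhibiting an explicit witness in each case.

def dot(u, v):
    return sum(ui*vi for ui, vi in zip(u, v))

# Example 1: a1 = (1, 0), a2 = (0, 1).  Condition (i) holds:
# x = (-1, -1) satisfies a_j'x < 0 for all j.
a = [(1.0, 0.0), (0.0, 1.0)]
x = (-1.0, -1.0)
assert all(dot(aj, x) < 0 for aj in a)

# Example 2: a1 = (1, 0), a2 = (-1, 0).  Condition (ii) holds:
# mu = (1, 1) satisfies mu >= 0, mu != 0, and mu1*a1 + mu2*a2 = 0,
# so no x can make both inner products negative simultaneously.
a = [(1.0, 0.0), (-1.0, 0.0)]
mu = (1.0, 1.0)
combo = tuple(mu[0]*a[0][k] + mu[1]*a[1][k] for k in range(2))
assert combo == (0.0, 0.0)

# Spot check that (i) indeed fails on a grid of candidate vectors x.
grid = [(i*0.5, j*0.5) for i in range(-6, 7) for j in range(-6, 7)]
assert not any(dot(a[0], x) < 0 and dot(a[1], x) < 0 for x in grid)
```

In the second example the failure of (i) is transparent: the two inner products are x1 and −x1, which cannot both be negative.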
1.6.19 (Facets) Let P be a polyhedral set and let H be a hyperplane that passes through a relative boundary point of P and contains P in one of its halfspaces (cf. Prop. 1.6.3). The set F = P ∩ H is called a facet of P.
(a) Show that each facet is a polyhedral set.
(b) Show that the number of distinct facets of P is finite.
(c) Show that each extreme point of P, viewed as a singleton set, is a facet.
(d) Show that if dim(P) > 0, there exists a facet of P whose dimension is dim(P) − 1.

1.6.20 Let f : R^n → (−∞, ∞] be a polyhedral function and let C ⊂ R^n be a polyhedral set. Show that if inf_{x∈C} f(x) is finite, then the set of minimizers of f over C is nonempty. Hint: Use Exercise 1.6.8(b), and replace the problem of minimizing f over C by an equivalent linear program.

1.6.21 Show that an m × n matrix A is totally unimodular if and only if every subset J of {1, ..., n} can be partitioned into two subsets J1 and J2 such that

| Σ_{j∈J1} a_ij − Σ_{j∈J2} a_ij | ≤ 1,   ∀ i = 1, ..., m.

1.6.22 Let A be a matrix with components that are either −1 or 1 or 0. Suppose that A has at most two nonzero components in each column. Show that A is totally unimodular.

1.6.23 Let A be a matrix with components that are either 0 or 1. Suppose that in each column, all the components that are equal to 1 appear consecutively. Show that A is totally unimodular.

1.6.24 Let A be a square invertible matrix such that all the components of A are integer. Show that A is unimodular if and only if the solution of the system Ax = b has integer components for every vector b that has integer components. Hint: To prove that A is unimodular, use the system Ax = u_i, where u_i is the ith unit vector, i = 1, ..., n, to show that A^{−1} has integer components. Then use the equality det(A) · det(A^{−1}) = 1.

1.7 SUBGRADIENTS

Much of optimization theory revolves around comparing the value of the cost function at a given point with its values at neighboring points. This calls for an analysis approach that uses derivatives. When the cost function is nondifferentiable, this approach breaks down, but fortunately,
1.6.24

Let A be a square invertible matrix such that all the components of A are integer. Show that A is unimodular if and only if the solution of the system Ax = b has integer components for every vector b that has integer components. Hint: To prove that A is unimodular, use the system Ax = u_i, where u_i is the ith unit vector, to show that A^{-1} has integer components. Then use the equality det(A) · det(A^{-1}) = 1.

1.7 SUBGRADIENTS

Much of optimization theory revolves around comparing the value of the cost function at a given point with its values at neighboring points. This calls for an analysis approach that uses derivatives. When the cost function is nondifferentiable, this approach breaks down, but fortunately, it turns out that in the case of a convex cost function there is a convenient substitute: the notion of directional differentiability and the related notion of a subgradient, which are the subject of this section.

1.7.1 Directional Derivatives

Convex sets and functions can be characterized in many ways by their behavior along lines. For example, a set is convex if and only if its intersection with any line is convex, and a function is convex if and only if it is convex along any line. Similarly, a convex set is bounded if and only if its intersection with every line is bounded, and, as we have seen in Section 1.3, a convex function is coercive if and only if it is coercive along any line. It turns out that the differentiability properties of a convex function are likewise determined by the corresponding properties along lines. With this in mind, we first consider convex functions of a single variable.

Let I be an interval of real numbers, and let f : I → ℝ be convex. If x, y, z ∈ I and x < y < z, then we can show the relation

    (f(y) − f(x))/(y − x) ≤ (f(z) − f(x))/(z − x) ≤ (f(z) − f(y))/(z − y),   (1.51)

which is illustrated in Fig. 1.7.1. For a formal proof, note that, using the definition of a convex function [cf. Eq. (1.3)], we obtain

    f(y) ≤ ((y − x)/(z − x)) f(z) + ((z − y)/(z − x)) f(x),

and either of the desired inequalities follows by appropriately rearranging terms.
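The three-slope relation (1.51) is easy to sanity-check numerically. The following sketch (the test function and sample points are our own choices, not from the text) verifies it on all sampled triples x < y < z:

```python
import itertools
import math

def slope(f, a, b):
    """Average rate of change of f over [a, b]."""
    return (f(b) - f(a)) / (b - a)

# A convex function of one variable (our choice; nondifferentiable at 0).
f = lambda t: math.exp(t) + abs(t)

points = sorted([-2.0, -0.5, 0.3, 1.7, 2.4])
for x, y, z in itertools.combinations(points, 3):
    # Eq. (1.51): chord slopes are nondecreasing with their argument.
    assert slope(f, x, y) <= slope(f, x, z) + 1e-12
    assert slope(f, x, z) <= slope(f, y, z) + 1e-12
print("inequality (1.51) holds on all sampled triples")
```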
Figure 1.7.1. Illustration of the inequalities (1.51): the three chord slopes (f(y) − f(x))/(y − x), (f(z) − f(x))/(z − x), and (f(z) − f(y))/(z − y). The rate of change of the function f is nondecreasing with its argument.

Let a and b be the infimum and the supremum of I, respectively, also referred to as the left and right end points of I. For any x ∈ I that is not equal to the right end point, and for any α > 0 such that x + α ∈ I, we define

    s+(x, α) = (f(x + α) − f(x))/α.

Let 0 < α ≤ α'. We use the first inequality in Eq. (1.51) with y = x + α and z = x + α' to obtain s+(x, α) ≤ s+(x, α'). Therefore, s+(x, α) is a nondecreasing function of α and, as α decreases to zero, it converges either to a finite number or to −∞. Let f+(x) be the value of the limit, which we call the right derivative of f at the point x. Similarly, for any x ∈ I that is not equal to the left end point, and for any α > 0 such that x − α ∈ I, we define

    s−(x, α) = (f(x) − f(x − α))/α.

By a symmetrical argument, s−(x, α) is a nonincreasing function of α. Its limit as α decreases to zero, which we call the left derivative of f at the point x and denote by f−(x), is either finite or equal to ∞. In the case where the end points a and b belong to the domain I of f, we define for completeness f−(a) = −∞ and f+(b) = ∞.

The basic facts about the differentiability properties of one-dimensional convex functions can be easily visualized, and are given in the following proposition.
Proposition 1.7.1: Let I ⊂ ℝ be an interval with end points a and b, and let f : I → ℝ be a convex function.

(a) We have f−(x) ≤ f+(x) for every x ∈ I.

(b) If x belongs to the interior of I, then f+(x) and f−(x) are finite.

(c) If x, z ∈ I and x < z, then f+(x) ≤ f−(z).

(d) The functions f−, f+ : I → [−∞, +∞] are nondecreasing.

(e) The function f+ (respectively, f−) is right– (respectively, left–) continuous at every interior point of I. Also, if a ∈ I (respectively, b ∈ I) and f is continuous at a (respectively, b), then f+ (respectively, f−) is right– (respectively, left–) continuous at a (respectively, b).

(f) If f is differentiable at a point x belonging to the interior of I, then f+(x) = f−(x) = (df/dx)(x).

(g) For any x, z ∈ I and any d satisfying f−(x) ≤ d ≤ f+(x), we have

    f(z) ≥ f(x) + d(z − x).

(h) The function f+ : I → (−∞, ∞] [respectively, f− : I → [−∞, ∞)] is upper (respectively, lower) semicontinuous at every x ∈ I.

Proof: (a) If x is an end point of I, the result is trivial because f−(a) = −∞ and f+(b) = ∞. We assume that x is an interior point, we let α > 0 be such that x − α ∈ I and x + α ∈ I, and we use Eq. (1.51), applied with the points x − α, x, and x + α, to obtain s−(x, α) ≤ s+(x, α). Taking the limit as α decreases to zero, we obtain f−(x) ≤ f+(x).

(b) Let x belong to the interior of I and let α > 0 be such that x − α ∈ I. Then f−(x) ≥ s−(x, α) > −∞. Similarly, taking α > 0 such that x + α ∈ I, we have f+(x) ≤ s+(x, α), so we obtain f+(x) < ∞. Part (a) then implies that f−(x) < ∞ and f+(x) > −∞.

(c) We use Eq. (1.51), with y = (z + x)/2, to obtain s+(x, (z − x)/2) ≤ s−(z, (z − x)/2). The result then follows because f+(x) ≤ s+(x, (z − x)/2) and s−(z, (z − x)/2) ≤ f−(z).

(d) This follows by combining parts (a) and (c).

(e) Fix some x ∈ I, x ≠ b, and some positive δ and α such that x + δ + α < b. We allow x to be equal to a, in which case f is assumed to be continuous at a. We have f+(x + δ) ≤ s+(x + δ, α). We take the limit, as δ decreases to zero, to obtain lim_{δ↓0} f+(x + δ) ≤ s+(x, α). We have used here the fact that s+(x, α) is a continuous function of x, which is a consequence of the continuity of f. We now let α decrease to zero to obtain lim_{δ↓0} f+(x + δ) ≤ f+(x). The reverse inequality is also true because f+ is nondecreasing, and this proves the right–continuity of f+. The proof for f− is similar.

(f) This is immediate from the definition of f+ and f−.

(g) The result is trivially true for x = z. We only consider the case x < z; the proof for the case x > z is similar. Since s+(x, α) is nondecreasing in α, we have, for α belonging to (0, z − x],

    (f(z) − f(x))/(z − x) ≥ s+(x, α).

Letting α decrease to zero, we obtain (f(z) − f(x))/(z − x) ≥ f+(x) ≥ d, and the result follows.

(h) This follows from parts (a), (d), (e), and the defining property of semicontinuity. Q.E.D.

We will now discuss notions of directional differentiability of multidimensional real-valued functions (the extended real-valued case will be discussed later). Given a function f : ℝ^n → ℝ, a point x ∈ ℝ^n, and a vector y ∈ ℝ^n, we say that f is directionally differentiable at x in the direction y if there is a scalar f'(x; y) such that

    f'(x; y) = lim_{α↓0} (f(x + αy) − f(x))/α.   (1.52)

We call f'(x; y) the directional derivative of f at x in the direction y. We say that f is directionally differentiable at a point x if it is directionally differentiable at x in all directions. We say that f is differentiable at x if it is directionally differentiable at x and f'(x; y) is a linear function of y, of the form f'(x; y) = ∇f(x)'y, where ∇f(x) is the gradient of f at x.

An interesting property of real-valued convex functions is that they are directionally differentiable at all points x. This is a consequence of the directional differentiability of scalar convex functions, as can be seen from the relation

    f'(x; y) = lim_{α↓0} (f(x + αy) − f(x))/α = lim_{α↓0} (F_y(α) − F_y(0))/α = F_y+(0),   (1.53)

where F_y+(0) is the right derivative of the convex scalar function F_y(α) = f(x + αy) at α = 0. Note that the above calculation also shows that the left derivative F_y−(0) of F_y is equal to −f'(x; −y), and, by using Prop. 1.7.1(a), we obtain F_y−(0) ≤ F_y+(0), or equivalently,

    −f'(x; −y) ≤ f'(x; y),   ∀ y ∈ ℝ^n.
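The limit in Eq. (1.52) can be explored numerically: for a convex f the difference quotients decrease monotonically as α ↓ 0. A small sketch (the function, the ℓ1 norm, and the points are our own choices):

```python
# For convex f, q(a) = (f(x + a*y) - f(x))/a is nondecreasing in a, so
# the directional derivative f'(x; y) of Eq. (1.52) is its limit as a
# decreases to zero.  Here f(x) = ||x||_1, x = (1, 0), y = (-1, 1).
def f(x):
    return sum(abs(t) for t in x)   # the l1 norm: convex, nondifferentiable

def quotient(x, y, a):
    xa = [xi + a * yi for xi, yi in zip(x, y)]
    return (f(xa) - f(x)) / a

x, y = [1.0, 0.0], [-1.0, 1.0]
alphas = [2.0 ** (-k) for k in range(12)]        # decreasing to 0
quots = [quotient(x, y, a) for a in alphas]
assert all(q1 >= q2 - 1e-12 for q1, q2 in zip(quots, quots[1:]))
print("f'(x; y) =", quots[-1])   # -> 0.0: slope -1 from |1-a|, +1 from |a|
```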
Note also that for a convex function, the difference quotient (f(x + αy) − f(x))/α is a monotonically nondecreasing function of α, so an equivalent definition of the directional derivative is

    f'(x; y) = inf_{α>0} (f(x + αy) − f(x))/α.

The following proposition generalizes the upper semicontinuity property of right derivatives of scalar convex functions [Prop. 1.7.1(h)], and shows that if f is differentiable, then its gradient is continuous.

Proposition 1.7.2: Let f : ℝ^n → ℝ be convex, and let {f_k} be a sequence of convex functions f_k : ℝ^n → ℝ with the property that lim_{k→∞} f_k(x_k) = f(x) for every x ∈ ℝ^n and every sequence {x_k} that converges to x. Then for any x ∈ ℝ^n and y ∈ ℝ^n, and any sequences {x_k} and {y_k} converging to x and y, respectively, we have

    lim sup_{k→∞} f_k'(x_k; y_k) ≤ f'(x; y).   (1.54)

Furthermore, if f is differentiable over ℝ^n, then it is continuously differentiable over ℝ^n.

Proof: For any µ > f'(x; y), there exists an ᾱ > 0 such that

    (f(x + ᾱy) − f(x))/ᾱ < µ.

Since f_k(x_k + ᾱy_k) → f(x + ᾱy) and f_k(x_k) → f(x), we have

    (f_k(x_k + ᾱy_k) − f_k(x_k))/ᾱ < µ

for all sufficiently large k. Since the difference quotient of the convex function f_k is monotonically nondecreasing in α, it follows that for all sufficiently large k,

    f_k'(x_k; y_k) ≤ (f_k(x_k + ᾱy_k) − f_k(x_k))/ᾱ < µ.

Hence lim sup_{k→∞} f_k'(x_k; y_k) ≤ µ. Since this is true for all µ > f'(x; y), inequality (1.54) follows.

If f is differentiable at all x ∈ ℝ^n, then using the continuity of f and the part of the proposition just proved, we have for every sequence {x_k} converging to x and every y ∈ ℝ^n,

    lim sup_{k→∞} ∇f(x_k)'y = lim sup_{k→∞} f'(x_k; y) ≤ f'(x; y) = ∇f(x)'y.

By replacing y by −y in the preceding argument, we obtain

    −lim inf_{k→∞} ∇f(x_k)'y = lim sup_{k→∞} (−∇f(x_k)'y) ≤ −∇f(x)'y.

Therefore, ∇f(x_k)'y → ∇f(x)'y for every y, which implies that ∇f(x_k) → ∇f(x). Hence, the gradient is continuous. Q.E.D.
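With f_k = f for all k, Prop. 1.7.2 says that x ↦ f'(x; y) is upper semicontinuous, and the inequality (1.54) can be strict. A sketch (the example, f = |·| at x = 0, is our own choice):

```python
# Upper semicontinuity of directional derivatives (Prop. 1.7.2 with
# f_k = f): for x_k -> x, limsup f'(x_k; y) <= f'(x; y), possibly
# strictly.  Here f = |.|, x_k = 1/k -> 0, y = -1.
def dir_deriv(f, x, y, a=1e-9):
    return (f(x + a * y) - f(x)) / a    # quotient at a tiny step

f = abs
vals = [dir_deriv(f, 1.0 / k, -1.0) for k in range(1, 50)]
limit_point = dir_deriv(f, 0.0, -1.0)           # f'(0; -1) = |-1| = 1
assert all(abs(v + 1.0) < 1e-6 for v in vals)   # f'(x_k; -1) = -1 for x_k > 0
assert limit_point == 1.0                       # strictly above the limsup
print("limsup f'(x_k; y) = -1.0 <= f'(x; y) =", limit_point)
```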
1.7.2 Subgradients and Subdifferentials

Given a convex function f : ℝ^n → ℝ, we say that a vector d ∈ ℝ^n is a subgradient of f at a point x ∈ ℝ^n if

    f(z) ≥ f(x) + (z − x)'d,   ∀ z ∈ ℝ^n.   (1.55)

If instead f is a concave function, we say that d is a subgradient of f at x if −d is a subgradient of the convex function −f at x. The set of all subgradients of a convex (or concave) function f at x ∈ ℝ^n is called the subdifferential of f at x, and is denoted by ∂f(x).

A subgradient admits an intuitive geometrical interpretation: it can be identified with a nonvertical supporting hyperplane to the graph of the function, as illustrated in Fig. 1.7.2. Such a hyperplane provides a linear approximation to the function, which is an underestimate in the case of a convex function and an overestimate in the case of a concave function. The defining relation (1.55) can be written as

    f(z) − z'd ≥ f(x) − x'd,   ∀ z ∈ ℝ^n.

Equivalently, the hyperplane in ℝ^{n+1} that has normal (−d, 1) and passes through (x, f(x)) supports the epigraph of f, as shown in the figure.

Figure 1.7.2. Illustration of a subgradient of a convex function f.

Figure 1.7.3 provides some examples of subdifferentials.
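The defining inequality (1.55) can be tested pointwise on a grid. For the scalar function f(x) = |x| at x = 0, exactly the slopes d ∈ [−1, 1] qualify (a sketch; grid and function are ours):

```python
# Grid test of the subgradient inequality f(z) >= f(x) + d*(z - x):
# for f(x) = |x| at x = 0, the accepted slopes are exactly [-1, 1].
def is_subgradient(f, x, d, zs):
    return all(f(z) >= f(x) + d * (z - x) - 1e-12 for z in zs)

f = abs
zs = [z / 10.0 for z in range(-30, 31)]          # test points in [-3, 3]
accepted = [d / 10.0 for d in range(-20, 21)     # candidate slopes in [-2, 2]
            if is_subgradient(f, 0.0, d / 10.0, zs)]
assert min(accepted) == -1.0 and max(accepted) == 1.0
print("subdifferential of |x| at 0 on this grid:",
      min(accepted), "to", max(accepted))
```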
Figure 1.7.3. The subdifferential of some scalar convex functions [among them f(x) = |x| and f(x) = max{0, (1/2)(x² − 1)}] as a function of the argument x.

The directional derivative and the subdifferential of a convex function are closely linked, as shown in the following proposition.

Proposition 1.7.3: Let f : ℝ^n → ℝ be convex. For every x ∈ ℝ^n, the following hold:

(a) A vector d is a subgradient of f at x if and only if

    f'(x; y) ≥ y'd,   ∀ y ∈ ℝ^n.

(b) The subdifferential ∂f(x) is a nonempty, convex, and compact set, and there holds

    f'(x; y) = max_{d∈∂f(x)} y'd,   ∀ y ∈ ℝ^n.   (1.56)

In particular, f is differentiable at x with gradient ∇f(x), if and only if it has ∇f(x) as its unique subgradient at x.

(c) If X is a bounded set, the set ∪_{x∈X} ∂f(x) is bounded.

Proof: (a) The subgradient inequality (1.55) is equivalent to

    (f(x + αy) − f(x))/α ≥ y'd,   ∀ y ∈ ℝ^n, ∀ α > 0.

Since the quotient on the left above decreases monotonically to f'(x; y) as α ↓ 0 [cf. Eq. (1.51)], we conclude that the subgradient inequality (1.55) is
equivalent to

    f'(x; y) ≥ y'd,   ∀ y ∈ ℝ^n.

Therefore we obtain

    d ∈ ∂f(x)  ⟺  f'(x; y) ≥ y'd,  ∀ y ∈ ℝ^n.   (1.57)

(b) From Eq. (1.57), we see that ∂f(x) is the intersection of the closed halfspaces {d | y'd ≤ f'(x; y)}, where y ranges over the nonzero vectors of ℝ^n. It follows that ∂f(x) is closed and convex.

To show that ∂f(x) is also bounded, suppose to arrive at a contradiction that there is a sequence {d_k} ⊂ ∂f(x) with ||d_k|| → ∞. Let y_k = d_k/||d_k||. Then, from the subgradient inequality (1.55), we have

    f(x + y_k) ≥ f(x) + y_k'd_k = f(x) + ||d_k||,   ∀ k,

so it follows that f(x + y_k) → ∞. This is a contradiction since f is convex and hence continuous, so it is bounded on any bounded set. Thus ∂f(x) is bounded.

To show that ∂f(x) is nonempty and that Eq. (1.56) holds, we first observe that Eq. (1.57) implies that f'(x; y) ≥ max_{d∈∂f(x)} y'd [where the maximum is −∞ if ∂f(x) is empty]. To show the reverse inequality, take any x and y in ℝ^n, and consider the subset of ℝ^{n+1}

    C1 = {(µ, z) | µ > f(z)},

and the halfline

    C2 = {(µ, z) | µ = f(x) + αf'(x; y), z = x + αy, α ≥ 0};

see Fig. 1.7.4. Using the definition of directional derivative and the convexity of f, it follows that these two sets are nonempty, convex, and disjoint. By applying the Separating Hyperplane Theorem, we see that there exists a nonzero vector (γ, w) ∈ ℝ^{n+1} such that

    γµ + w'z ≥ γ(f(x) + αf'(x; y)) + w'(x + αy),   ∀ α ≥ 0, z ∈ ℝ^n, µ > f(z).   (1.58)

We cannot have γ < 0, since then the left-hand side above could be made arbitrarily small by choosing µ sufficiently large. Also if γ = 0, then Eq. (1.58) implies that w = 0, which is a contradiction. Therefore, γ > 0 and, by dividing with γ in Eq. (1.58), we obtain

    µ + (z − x)'(w/γ) ≥ f(x) + αf'(x; y) + αy'(w/γ),   ∀ α ≥ 0, z ∈ ℝ^n, µ > f(z).   (1.59)

By setting α = 0 in the above relation and by taking the limit as µ ↓ f(z), we obtain

    f(z) ≥ f(x) + (z − x)'(−w/γ),   ∀ z ∈ ℝ^n,

implying that (−w/γ) ∈ ∂f(x). Hence ∂f(x) is nonempty. By setting z = x and α = 1 in Eq. (1.59), and by taking the limit as µ ↓ f(x), we obtain y'(−w/γ) ≥ f'(x; y).
Since (−w/γ) ∈ ∂f(x), it follows that max_{d∈∂f(x)} y'd ≥ f'(x; y), and the proof of Eq. (1.56) is complete.

From the definition of directional derivative, we see that f is differentiable at x with gradient ∇f(x) if and only if the directional derivative f'(x; y) is a linear function of the form f'(x; y) = ∇f(x)'y. Thus, by Eq. (1.56), f is differentiable at x with gradient ∇f(x), if and only if it has ∇f(x) as its unique subgradient at x.

(c) Assume the contrary, i.e., that there exists a sequence {x_k} ⊂ X, and a sequence {d_k} with d_k ∈ ∂f(x_k) for all k and ||d_k|| → ∞. Without loss of generality, we assume that d_k ≠ 0 for all k, and we denote y_k = d_k/||d_k||. By Eq. (1.56), we have

    f'(x_k; y_k) ≥ d_k'y_k = ||d_k||,

so it follows that f'(x_k; y_k) → ∞. Since both {x_k} and {y_k} are bounded, they must contain convergent subsequences. We assume without loss of generality that x_k converges to some x and y_k converges to some y with ||y|| = 1. This contradicts, however, Eq. (1.54), which requires that lim sup_{k→∞} f'(x_k; y_k) ≤ f'(x; y). Q.E.D.

Figure 1.7.4. Illustration of the sets C1 and C2 used in the hyperplane separation argument of the proof of Prop. 1.7.3(b).

The next proposition shows some basic properties of subgradients.
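First, a quick numerical illustration of Eq. (1.56): the directional derivative is the support function of the subdifferential. Our example is f(x) = |x| at x = 0, where ∂f(0) = [−1, 1] and hence f'(0; y) = |y|:

```python
# Eq. (1.56): f'(x; y) = max over d in the subdifferential of y*d.
# For f = |.| at x = 0, the subdifferential is [-1, 1], so the support
# function is |y|; we compare it with a small-step difference quotient.
def dir_deriv(f, x, y, a=1e-8):
    return (f(x + a * y) - f(x)) / a

subdiff = [-1.0, -0.5, 0.0, 0.5, 1.0]      # sample of [-1, 1]
for y in (-2.0, -1.0, 0.5, 3.0):
    support = max(d * y for d in subdiff)  # attained at d = sign(y)
    assert abs(dir_deriv(abs, 0.0, y) - support) < 1e-6
print("f'(0; y) equals the support function of [-1, 1]")
```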
Proposition 1.7.4: Let f : ℝ^n → ℝ be convex.

(a) If a sequence {x_k} converges to x and d_k ∈ ∂f(x_k) for all k, the sequence {d_k} is bounded and each of its limit points is a subgradient of f at x.

(b) If f is equal to the sum f1 + · · · + fm of convex functions fj : ℝ^n → ℝ, j = 1, . . . , m, then ∂f(x) is equal to the vector sum ∂f1(x) + · · · + ∂fm(x).

Proof: (a) By Prop. 1.7.3(a), we have

    y'd_k ≤ f'(x_k; y),   ∀ y ∈ ℝ^n.

Since {x_k} is bounded, the sequence {d_k} is bounded by Prop. 1.7.3(c). If d is a limit point of {d_k}, then by taking the limit in the above relation and by using Prop. 1.7.2, we have

    y'd ≤ lim sup_{k→∞} f'(x_k; y) ≤ f'(x; y),   ∀ y ∈ ℝ^n.

Therefore, by Prop. 1.7.3(a), we have d ∈ ∂f(x).

(b) It will suffice to prove the result for the case where f = f1 + f2. If d1 ∈ ∂f1(x) and d2 ∈ ∂f2(x), then from the subgradient inequality (1.55), we have

    f1(z) ≥ f1(x) + (z − x)'d1,   ∀ z ∈ ℝ^n,
    f2(z) ≥ f2(x) + (z − x)'d2,   ∀ z ∈ ℝ^n,

so by adding, we obtain

    f(z) ≥ f(x) + (z − x)'(d1 + d2),   ∀ z ∈ ℝ^n.

Hence d1 + d2 ∈ ∂f(x), implying that ∂f1(x) + ∂f2(x) ⊂ ∂f(x).

To prove the reverse inclusion, suppose to arrive at a contradiction, that there exists a d ∈ ∂f(x) such that d ∉ ∂f1(x) + ∂f2(x). By Prop. 1.7.3(b), the sets ∂f1(x) and ∂f2(x) are compact, so the set ∂f1(x) + ∂f2(x) is compact (the vector sum of two compact sets is compact), and there exists a hyperplane strictly separating d from ∂f1(x) + ∂f2(x), i.e., a vector y and a scalar b such that

    y'(d1 + d2) < b < y'd,   ∀ d1 ∈ ∂f1(x), ∀ d2 ∈ ∂f2(x).

Therefore,

    max_{d1∈∂f1(x)} y'd1 + max_{d2∈∂f2(x)} y'd2 < y'd,

and by Prop. 1.7.3(b),

    f1'(x; y) + f2'(x; y) < y'd,
so that f (x.7. 1.5: (Chain Rule) (a) Let f : n → be a convex function and let A be an m × n matrix. Ay). if g is convex and monotonically nondecreasing. we have f1 (x. (b) Let f : n → be a convex function and let g : → smooth scalar function. Then the function F deﬁned by F (x) = g f (x) is directionally diﬀerentiable at all x. be a y∈ n. 1. then F is convex and its subdiﬀerential is given by ∂F (x) = g f(x) ∂f (x). ∀x∈ n. y)+f2 (x. which is a contradiction in view of Prop. we have g z ≤ f (Ax.Sec. (1.7. y) = f (Ax. and its directional derivative is given by F (x.D.3(a).7.61) Proof: (a) It is seen using the deﬁnition of directional derivative that F (x.60) Furthermore. 1. We close this section with some versions of the chain rule for directional derivatives and subgradients. .3(a). z) ∀z∈ m. Then the subdiﬀerential of the function F deﬁned by F (x) = f (Ax). y) < y d. ∀y∈ n. Proposition 1. y). Q. Then by Prop. ∀x∈ n. y).7 Subgradients 149 By using the deﬁnition of directional derivative. Let g ∈ ∂f (Ax) and d = A g. (1.E. y) = g f (x) f (x. y) = f (x. is given by ∂F (x) = A ∂f (Ax) = A g  g ∈ ∂f (Ax) .
3(b). which is a contradiction in view of Prop. 1. suppose to come to a contradiction. y) = f (x. . Ay) < y d. 1. y). ∀y∈ n.1. by Prop. In case (2). Prop.60) holds. there exists a hyperplane strictly separating d from A ∂f (Ax). α f (x + αy) − f(x) As α ↓ 0. ∀y∈ n. y).7. 1. we have for all α ∈ (0. In case (1). from Eq. y) < y d. max (Ay) g < y d. f (x + αy) > f (x) for all α ∈ (0. 1 and in particular. that there exists a d ∈ ∂F (x) such that d ∈ A ∂f(Ax). and by Prop.3(a). To prove the reverse inclusion. Ay) = F (x. y).3. Ay) or (A g) y ≤ F (x. (1. the set A ∂f (Ax) is also compact [cf. (b) We have F (x. y) = 0 and the given formula (1. From this we obtain g∈∂f(Ax) ∀ g ∈ ∂f (Ax). from Eq. α]..7. Since by Prop. (2) for some α > 0. f (x + αy) < f (x) for all α ∈ (0. α]. 1. so the preceding equation yields F (x. or.62). g Ay ≤ f (Ax. / 1.9(d)]. f (x + αy) = f (x) for all α ∈ (0.62). f (Ax. y) = g f (x) f (x. i.3(a). The proof for case (3) is similar. we have A g ∈ ∂F (x). α] F (x. so that A ∂f (Ax) ⊂ ∂F (x).e. y) = lim α↓0 f(x + αy) − f (x) g f (x + αy) − g f (x) · .4. a vector y and a scalar b such that y (A g) < b < y d. (1.7.7. Since f (Ax.3(b). (3) for some α > 0. it follows that F (x. we have F (x. (1. we have f(x + αy) → f (x).62) α↓0 α α From the convexity of f it follows that there are three possibilities: (1) for some α > 0. y) = lim α↓0 g f (x + αy) − g f (x) F (x + αy) − F (x) = lim . by using Prop. 1. α]. Hence.150 Convex Analysis and Optimization Chap. the set ∂f(Ax) is compact.
If g is convex and monotonically nondecreasing, then F is convex, since for any x1, x2 ∈ ℝ^n and α ∈ [0, 1], we have

    αF(x1) + (1 − α)F(x2) = αg(f(x1)) + (1 − α)g(f(x2))
                          ≥ g(αf(x1) + (1 − α)f(x2))
                          ≥ g(f(αx1 + (1 − α)x2)) = F(αx1 + (1 − α)x2),

where for the first inequality we use the convexity of g, and for the second inequality we use the convexity of f and the monotonicity of g.

To obtain the formula for the subdifferential of F, we note that by Prop. 1.7.3(a), d ∈ ∂F(x) if and only if y'd ≤ F'(x; y) for all y ∈ ℝ^n, or equivalently (from what has been already shown),

    y'd ≤ ∇g(f(x)) f'(x; y),   ∀ y ∈ ℝ^n.

If ∇g(f(x)) = 0, this relation yields d = 0, so ∂F(x) = {0} and the desired formula (1.61) holds. If ∇g(f(x)) ≠ 0, we have ∇g(f(x)) > 0 by the monotonicity of g, so the above relation is equivalent to

    y'(d/∇g(f(x))) ≤ f'(x; y),   ∀ y ∈ ℝ^n,

which, by Prop. 1.7.3(a), is equivalent to d/∇g(f(x)) ∈ ∂f(x). Thus we have shown that d ∈ ∂F(x) if and only if d/∇g(f(x)) ∈ ∂f(x), which proves the desired formula (1.61). Q.E.D.

1.7.3 ε-Subgradients

We now consider a notion of approximate subgradient. Given a convex function f : ℝ^n → ℝ and a scalar ε > 0, we say that a vector d ∈ ℝ^n is an ε-subgradient of f at a point x ∈ ℝ^n if

    f(z) ≥ f(x) + (z − x)'d − ε,   ∀ z ∈ ℝ^n.   (1.63)

If instead f is a concave function, we say that d is an ε-subgradient of f at x if −d is an ε-subgradient of the convex function −f at x. The set of all ε-subgradients of a convex (or concave) function f at x ∈ ℝ^n is called the ε-subdifferential of f at x, and is denoted by ∂_ε f(x). The ε-subdifferential is illustrated geometrically in Fig. 1.7.5. ε-subgradients find several applications in nondifferentiable optimization, particularly in the context of computational methods. Some of their important properties are given in the following proposition.
Figure 1.7.5. Illustration of the ε-subdifferential ∂_ε f(x) of a convex one-dimensional function f : ℝ → ℝ. The ε-subdifferential is a bounded interval, and corresponds to the set of slopes indicated in the figure. Its left endpoint is

    f_ε−(x) = sup_{δ<0} (f(x + δ) − f(x) + ε)/δ,

and its right endpoint is

    f_ε+(x) = inf_{δ>0} (f(x + δ) − f(x) + ε)/δ.
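The endpoint formulas in the caption are directly computable. For f(x) = x² at x = 0, minimizing (δ² + ε)/δ over δ > 0 gives the right endpoint 2√ε at δ = √ε, and by symmetry the left endpoint is −2√ε. A sketch (the function is our own choice):

```python
# Endpoints of the eps-subdifferential of a one-dimensional convex f:
#   right endpoint = inf over d > 0 of (f(x+d) - f(x) + eps)/d,
#   left  endpoint = sup over d < 0 of (f(x+d) - f(x) + eps)/d.
# For f(x) = x**2 at x = 0 they are -2*sqrt(eps) and +2*sqrt(eps).
import math

def eps_endpoints(f, x, eps, grid):
    right = min((f(x + d) - f(x) + eps) / d for d in grid)
    left = max((f(x - d) - f(x) + eps) / (-d) for d in grid)
    return left, right

f = lambda t: t * t
eps = 0.04
grid = [k / 1000.0 for k in range(1, 2001)]     # step sizes d in (0, 2]
left, right = eps_endpoints(f, 0.0, eps, grid)
assert abs(right - 2 * math.sqrt(eps)) < 1e-3   # 2*sqrt(0.04) = 0.4
assert abs(left + 2 * math.sqrt(eps)) < 1e-3
print("eps-subdifferential of x**2 at 0 ~ [%.3f, %.3f]" % (left, right))
```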
Proposition 1.7.6: Let f : ℝ^n → ℝ be convex and let ε be a positive scalar. For every x ∈ ℝ^n, the following hold:

(a) The ε-subdifferential ∂_ε f(x) is a nonempty, convex, and compact set, and for all y ∈ ℝ^n there holds

    inf_{α>0} (f(x + αy) − f(x) + ε)/α = max_{d∈∂_ε f(x)} y'd.

(b) We have 0 ∈ ∂_ε f(x) if and only if

    f(x) ≤ inf_{z∈ℝ^n} f(z) + ε.

(c) If a direction y is such that y'd < 0 for all d ∈ ∂_ε f(x), then

    inf_{α>0} f(x + αy) < f(x) − ε.

(d) If 0 ∉ ∂_ε f(x), then the direction y = −d̄, where

    d̄ = arg min_{d∈∂_ε f(x)} ||d||,

satisfies y'd < 0 for all d ∈ ∂_ε f(x).

(e) If f is equal to the sum f1 + · · · + fm of convex functions fj : ℝ^n → ℝ, j = 1, . . . , m, then

    ∂_ε f(x) ⊂ ∂_ε f1(x) + · · · + ∂_ε fm(x) ⊂ ∂_mε f(x).

Proof: (a) We have

    d ∈ ∂_ε f(x)  ⟺  f(x + αy) ≥ f(x) + αy'd − ε,  ∀ α > 0, ∀ y ∈ ℝ^n.   (1.64)

Hence

    d ∈ ∂_ε f(x)  ⟺  inf_{α>0} (f(x + αy) − f(x) + ε)/α ≥ y'd,  ∀ y ∈ ℝ^n.   (1.65)

It follows that ∂_ε f(x) is the intersection of the closed halfspaces

    { d | inf_{α>0} (f(x + αy) − f(x) + ε)/α ≥ y'd },

as y ranges over ℝ^n. Hence ∂_ε f(x) is closed and convex.

To show that ∂_ε f(x) is also bounded, suppose to arrive at a contradiction that there is a sequence {d_k} ⊂ ∂_ε f(x) with ||d_k|| → ∞. Let y_k = d_k/||d_k||. Then, from Eq. (1.64), we have for α = 1

    f(x + y_k) ≥ f(x) + ||d_k|| − ε,   ∀ k,

so it follows that f(x + y_k) → ∞. This is a contradiction since f is convex and hence continuous, so it is bounded on any bounded set. Thus ∂_ε f(x) is bounded.

To show that ∂_ε f(x) is nonempty and satisfies

    inf_{α>0} (f(x + αy) − f(x) + ε)/α = max_{d∈∂_ε f(x)} y'd,   ∀ y ∈ ℝ^n,

we argue similar to the proof of Prop. 1.7.3(b). Consider the subset of ℝ^{n+1}

    C1 = {(µ, z) | µ > f(z)},

and the halfline

    C2 = {(µ, z) | µ = f(x) − ε + β inf_{α>0} (f(x + αy) − f(x) + ε)/α, z = x + βy, β ≥ 0}.

These sets are nonempty and convex. They are also disjoint, since we have for all (µ, z) ∈ C2

    µ = f(x) − ε + β inf_{α>0} (f(x + αy) − f(x) + ε)/α
      ≤ f(x) − ε + (f(x + βy) − f(x) + ε)
      = f(x + βy) = f(z).

Hence there exists a hyperplane separating them, i.e., a vector (γ, w) ≠ (0, 0) such that for all β ≥ 0, z ∈ ℝ^n, and µ > f(z),

    γµ + w'z ≤ γ( f(x) − ε + β inf_{α>0} (f(x + αy) − f(x) + ε)/α ) + w'(x + βy).   (1.66)

We cannot have γ > 0, since then the left-hand side above could be made arbitrarily large by choosing µ to be sufficiently large. Also, if γ = 0, then Eq. (1.66) implies that w = 0, contradicting the fact that (γ, w) ≠ (0, 0). Therefore, γ < 0 and, after dividing Eq. (1.66) by γ, we obtain for all β ≥ 0, z ∈ ℝ^n, and µ > f(z),

    µ + (w/γ)'(z − x) ≥ f(x) − ε + β inf_{α>0} (f(x + αy) − f(x) + ε)/α + β(w/γ)'y.   (1.67)

Taking the limit above as µ ↓ f(z) and setting β = 0, we obtain

    f(z) ≥ f(x) − ε + (−w/γ)'(z − x),   ∀ z ∈ ℝ^n.

Hence −w/γ belongs to ∂_ε f(x), showing that ∂_ε f(x) is nonempty. Also, by taking z = x in Eq. (1.67), and by letting µ ↓ f(x) and by dividing with β, we obtain

    −(w/γ)'y ≥ −ε/β + inf_{α>0} (f(x + αy) − f(x) + ε)/α.

Since β can be chosen as large as desired, we see that

    −(w/γ)'y ≥ inf_{α>0} (f(x + αy) − f(x) + ε)/α.

Combining this relation with Eq. (1.65), we obtain

    max_{d∈∂_ε f(x)} y'd = inf_{α>0} (f(x + αy) − f(x) + ε)/α.

(b) By definition, 0 ∈ ∂_ε f(x) if and only if f(z) ≥ f(x) − ε for all z ∈ ℝ^n, which is equivalent to inf_{z∈ℝ^n} f(z) ≥ f(x) − ε.

(c) Assume that a direction y is such that

    max_{d∈∂_ε f(x)} y'd < 0,   (1.68)

while inf_{α>0} f(x + αy) ≥ f(x) − ε. Then f(x + αy) − f(x) ≥ −ε for all α > 0, or equivalently

    (f(x + αy) − f(x) + ε)/α ≥ 0,   ∀ α > 0.

Consequently, using part (a), we have

    max_{d∈∂_ε f(x)} y'd = inf_{α>0} (f(x + αy) − f(x) + ε)/α ≥ 0,

which contradicts Eq. (1.68).

(d) The vector d̄ is the projection of the zero vector on the convex and compact set ∂_ε f(x). If 0 ∉ ∂_ε f(x), we have ||d̄|| > 0, while by the Projection Theorem,

    (d − d̄)'d̄ ≥ 0,   ∀ d ∈ ∂_ε f(x).

Hence

    d'd̄ ≥ ||d̄||² > 0,   ∀ d ∈ ∂_ε f(x),

so that y = −d̄ satisfies y'd < 0 for all d ∈ ∂_ε f(x).

(e) It will suffice to prove the result for the case where f = f1 + f2. If d1 ∈ ∂_ε f1(x) and d2 ∈ ∂_ε f2(x), then from Eq. (1.64), we have

    f1(x + αy) ≥ f1(x) + αy'd1 − ε,   ∀ α > 0, ∀ y ∈ ℝ^n,
    f2(x + αy) ≥ f2(x) + αy'd2 − ε,   ∀ α > 0, ∀ y ∈ ℝ^n,

so by adding, we obtain

    f(x + αy) ≥ f(x) + αy'(d1 + d2) − 2ε,   ∀ α > 0, ∀ y ∈ ℝ^n.

Hence from Eq. (1.64), we have d1 + d2 ∈ ∂_2ε f(x), implying that ∂_ε f1(x) + ∂_ε f2(x) ⊂ ∂_2ε f(x).

To prove that ∂_ε f(x) ⊂ ∂_ε f1(x) + ∂_ε f2(x), we use an argument similar to the one of the proof of Prop. 1.7.4(b). Suppose to come to a contradiction, that there exists a d ∈ ∂_ε f(x) such that d ∉ ∂_ε f1(x) + ∂_ε f2(x). Since by part (a), the sets ∂_ε f1(x) and ∂_ε f2(x) are compact, the set ∂_ε f1(x) + ∂_ε f2(x) is compact, and there exists a hyperplane strictly separating d from ∂_ε f1(x) + ∂_ε f2(x), i.e., a vector y and a scalar b such that

    y'(d1 + d2) < b < y'd,   ∀ d1 ∈ ∂_ε f1(x), ∀ d2 ∈ ∂_ε f2(x).

From this we obtain

    max_{d1∈∂_ε f1(x)} y'd1 + max_{d2∈∂_ε f2(x)} y'd2 < y'd,

and by part (a),

    inf_{α>0} (f1(x + αy) − f1(x) + ε)/α + inf_{α>0} (f2(x + αy) − f2(x) + ε)/α < y'd.

Let α1, α2 be positive scalars such that

    (f1(x + α1y) − f1(x) + ε)/α1 + (f2(x + α2y) − f2(x) + ε)/α2 < y'd,   (1.69)

and define

    α = 1/(1/α1 + 1/α2).

As a consequence of the convexity of fj, the ratio (fj(x + αy) − fj(x))/α is monotonically nondecreasing in α. Thus, since αj ≥ α, we have

    (fj(x + αjy) − fj(x))/αj ≥ (fj(x + αy) − fj(x))/α,   j = 1, 2.
Eq. (1.69) together with the definition of α then yields

    y'd > (f1(x + α1y) − f1(x) + ε)/α1 + (f2(x + α2y) − f2(x) + ε)/α2
        ≥ (f1(x + αy) − f1(x))/α + (f2(x + αy) − f2(x))/α + ε(1/α1 + 1/α2)
        = (f(x + αy) − f(x) + ε)/α
        ≥ inf_{α>0} (f(x + αy) − f(x) + ε)/α.

This contradicts Eq. (1.65) [i.e., the fact that d ∈ ∂_ε f(x)], thus implying that ∂_ε f(x) ⊂ ∂_ε f1(x) + ∂_ε f2(x). Q.E.D.

Parts (b)-(d) of Prop. 1.7.6 contain the elements of an important algorithm (called the ε-descent algorithm and introduced by Bertsekas and Mitter [BeM71], [BeM73]) for minimizing convex functions to within a tolerance of ε. At a given point x, we check whether 0 ∈ ∂_ε f(x). If this is so, then by Prop. 1.7.6(b), x is an ε-optimal solution. If not, then by going along the direction opposite to the vector of minimum norm on ∂_ε f(x), we are guaranteed a cost improvement of at least ε. An implementation of this algorithm due to Lemarechal [Lem74], [Lem75] is given in the exercises.

1.7.4 Subgradients of Extended Real-Valued Functions

We have focused so far on real-valued convex functions f : ℝ^n → ℝ, which are defined over the entire space ℝ^n and are convex over ℝ^n. The notion of a subdifferential and a subgradient of an extended real-valued convex function f : ℝ^n → (−∞, ∞] can be developed along similar lines. In particular, a vector d is a subgradient of f at a vector x ∈ dom(f) [i.e., f(x) < ∞] if the subgradient inequality holds:

    f(z) ≥ f(x) + (z − x)'d,   ∀ z ∈ ℝ^n.

The subdifferential ∂f(x) is the set of all subgradients of the convex (or concave) function f. By convention, ∂f(x) is considered empty for all x with f(x) = ∞. Note that contrary to the case of real-valued functions, ∂f(x) may be empty, or closed but unbounded. For example, the extended real-valued convex function given by

    f(x) = −√x  if 0 ≤ x ≤ 1,   f(x) = ∞  otherwise,

has the subdifferential

    ∂f(x) = {−1/(2√x)}   if 0 < x < 1,
    ∂f(x) = [−1/2, ∞)    if x = 1,
    ∂f(x) = Ø            if x ≤ 0 or 1 < x.
Similarly, if f : ℝ^n → [−∞, ∞) is an extended real-valued concave function, d is said to be a subgradient of f at a vector x with f(x) > −∞ if −d is a subgradient of the convex function −f at x.
Similarly, given a scalar ε > 0, a vector d is an ε-subgradient of f at a vector x such that f(x) < ∞ if

    f(z) ≥ f(x) + (z − x)'d − ε,   ∀ z ∈ ℝ^n.

The ε-subdifferential ∂_ε f(x) is the set of all ε-subgradients of f. Figure 1.7.6 illustrates the definition of ∂_ε f(x) for the case of a one-dimensional function f. The figure illustrates that even when f is extended real-valued, the ε-subdifferential ∂_ε f(x) is nonempty at all points of its effective domain. This can be shown for multidimensional functions f as well. However, ∂f(x) can be empty and can be unbounded at points x that belong to the effective domain of f (as in the cases x = 0 and x = 1, respectively, of the above example). It can be shown that ∂f(x) is nonempty and compact at points x that are interior points of the effective domain of f, as also illustrated by the above example.

Figure 1.7.6. Illustration of the ε-subdifferential ∂_ε f(x) of a one-dimensional function f : ℝ → (−∞, ∞], which is convex and has as effective domain an interval D. The ε-subdifferential corresponds to the set of slopes indicated in the figure. Its left endpoint is

    f_ε−(x) = sup_{δ<0, x+δ∈D} (f(x + δ) − f(x) + ε)/δ   if inf D < x,   and −∞ if inf D = x,

and its right endpoint is

    f_ε+(x) = inf_{δ>0, x+δ∈D} (f(x + δ) − f(x) + ε)/δ   if x < sup D,   and ∞ if x = sup D.

Note that these endpoints can be −∞ (as in the figure on the right) or ∞. For ε = 0, the above formulas also give the endpoints of the subdifferential ∂f(x). Note that while ∂f(x) is nonempty for all x in the interior of D, it may be empty for x at the relative boundary of D (as in the figure on the right), whereas ∂_ε f(x) is nonempty at all x ∈ D.

One can provide generalized versions of the results of Props. 1.7.3 and 1.7.6 within the context of extended real-valued convex functions, but with
appropriate adjustments and additional assumptions to deal with cases where ∂f(x) may be empty or noncompact. Some of these generalizations are discussed in the exercises.

1.7.5 Directional Derivative of the Max Function

As mentioned in Section 1.3, the max function f(x) = max_{z∈Z} φ(x, z) arises in a variety of interesting contexts, including duality. It is therefore important to characterize this function, and the following proposition gives the directional derivative and the subdifferential of f for the case where φ(·, z) is convex for all z ∈ Z.
70) Z(x) = z φ(x. z) at x in the direction y. . and let φ : n × Z → be continuous and such that φ(·.70) (1. x φ(x. z) is convex and has directional derivative given by f (x.3. and Z(x) is the set of maximizing points in Eq. z) is the vector with coordinates ∂φ(x. (c) The conclusion of part (a) also holds if. 1. z) is continuous  z ∈ Z(x) . Proof: (a) The convexity of f has been established in Prop. z) is diﬀerentiable for all z ∈ Z and on Z for each x. y) is the directional derivative of the function φ(·. y). ·) (b) If φ(·. where x φ(x. we assume that Z(x) is nonempty for all x ∈ n . z. z) . (a) The function f : n → given by z∈Z f (x) = max φ(x. z) is diﬀerentiable at x.2. z. z) = max φ(x. 1. instead of assuming the Z is compact. We note that since φ is continuous and Z is compact. if Z(x) consists of a unique point z and φ(·. . ∀x∈ n. z∈Z(x) (1.3(c).7: (Danskin’s Theorem) Let Z ⊂ m be a compact set. then f is diﬀerentiable at x. z). and f (x) = x φ(x. n. z) : n → is convex for each z ∈ Z. .1) and f takes real values. the set Z(x) is nonempty by Weierstrass’ Theorem (Prop. z∈Z In particular. then ∂f (x) = conv x φ(x. . For any . ∂xi i = 1.7. z) .71) where φ (x. and that φ and Z are such that for every convergent sequence {xk }. there exists a bounded sequence {zk } with zk ∈ Z(xk ) for all k. 1 Proposition 1. (1. y) = max φ (x.160 Convex Analysis and Optimization Chap.
We now have f (x. consider a sequence {αk } with αk ↓ 0 and let xk = x + αk y. we obtain f (x.7 Subgradients n. Since {zk } belongs to the compact set Z. y) ≥ φ (x. zk . y) ≤ lim sup φ (x + αk y. z). α α Taking the limit as α decreases to zero.4. z) ≥ . zk ) ≥ φ(xk . z. y). we use the deﬁnition of f to obtain f (x + αy) − f (x) φ(x + αy. zk ). (1. zk . 1. z∈Z(x) ∀y∈ n. (1. z.53). zk ) − φ(x. zk ) =− αk ≤ −φ (x + αk y. it has a subsequence converging to some z ∈ Z. 1. z). we conclude that f (x.72) To prove the reverse inequality and that the supremum in the righthand side of the above inequality is attained. Since this is true for every z ∈ Z(x). . we assume that the entire sequence {zk } converges to z. zk ) ≤ αk φ(x + αk y + αk (−y). For each k. y). z ∈ Z(x). k→∞ where the right inequality in the preceding relation follows from Prop. (1. ∀ z ∈ Z. z. and yk = y. zk ) − φ(x + αk y. z) = αk φ(x + αk y. y). xk = x + αk y.2 with fk (·) = φ(·. so by taking the limit as k → ∞ and by using the continuity of φ. let zk be a vector in Z(xk ). we obtain φ(x. ∀ z ∈ Z. y) ≥ sup φ (x. y ∈ and α > 0. Without loss of generality.73) where the last inequality follows from inequality (1.71).Sec. y). By letting k → ∞. This relation together with inequality (1. we obtain f (x. −y) ≤ φ (x + αk y.72) proves Eq. zk ) − φ(x. zk . z) ≥ φ(x. 161 z ∈ Z(x). y) ≤ φ (x. We have φ(xk . z) − φ(x. y) ≤ f (x + αk y) − f(x) αk φ(x + αk y. Therefore.
For the last statement of part (a), if Z(x) consists of the unique point z̄, Eq. (1.71) and the differentiability assumption on φ yield

    f′(x, y) = φ′(x, z̄, y) = y′∇_x φ(x, z̄),  ∀ y ∈ ℝ^n,

which implies that f is differentiable at x with ∇f(x) = ∇_x φ(x, z̄).

(b) By part (a), we have

    f′(x, y) = max_{z ∈ Z(x)} y′∇_x φ(x, z),

while by Prop. 1.7.3, we have

    f′(x, y) = max_{d ∈ ∂f(x)} y′d.

For all z ∈ Z(x) and y ∈ ℝ^n, we have

    f(y) = max_{z̄ ∈ Z} φ(y, z̄)
         ≥ φ(y, z)
         ≥ φ(x, z) + (y − x)′∇_x φ(x, z)
         = f(x) + (y − x)′∇_x φ(x, z).

Therefore, ∇_x φ(x, z) is a subgradient of f at x, implying that

    conv{ ∇_x φ(x, z) | z ∈ Z(x) } ⊂ ∂f(x).

To prove the reverse inclusion, we use a hyperplane separation argument. By the continuity of φ(x, ·) and the compactness of Z, we see that Z(x) is compact, so by the continuity of ∇_x φ(x, ·), the set { ∇_x φ(x, z) | z ∈ Z(x) } is compact. Therefore, the set conv{ ∇_x φ(x, z) | z ∈ Z(x) } is compact. If d ∈ ∂f(x) while d ∉ conv{ ∇_x φ(x, z) | z ∈ Z(x) }, then by the Strict Separation Theorem (Prop. 1.4.3), there exist y ≠ 0 and γ ∈ ℝ, such that

    y′d > γ > y′∇_x φ(x, z),  ∀ z ∈ Z(x).

Therefore, we have

    y′d > max_{z ∈ Z(x)} y′∇_x φ(x, z) = f′(x, y),

contradicting Prop. 1.7.3. Thus ∂f(x) ⊂ conv{ ∇_x φ(x, z) | z ∈ Z(x) } and the proof is complete.

(c) The proof of this part is nearly identical to the proof of part (a). Q.E.D.
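Danskin's formula can be checked numerically. The sketch below is my own addition (not part of the notes): it uses the assumed data Z = {−1, 0, 1} and φ(x, z) = z x_1 + z² x_2, which is continuous and linear (hence convex) in x for each z, and compares the directional derivative given by Eq. (1.71) against a one-sided difference quotient at a point where f has two maximizers and is therefore nondifferentiable.

```python
# Hypothetical example (my choice, not from the text): phi(x, z) = z*x1 + z^2*x2
# over the finite set Z = {-1, 0, 1}.  Danskin's Theorem gives
#   f'(x, y) = max over maximizing z of  grad_x phi(x, z)' y,
# where grad_x phi(x, z) = (z, z^2).

Z = [-1.0, 0.0, 1.0]

def phi(x, z):
    return z * x[0] + z**2 * x[1]

def f(x):
    return max(phi(x, z) for z in Z)

def maximizers(x, tol=1e-9):
    # the set Z(x) of Eq. (1.70)
    fx = f(x)
    return [z for z in Z if abs(phi(x, z) - fx) <= tol]

def danskin_dir_deriv(x, y):
    # right-hand side of Eq. (1.71)
    return max(z * y[0] + z**2 * y[1] for z in maximizers(x))

def numeric_dir_deriv(x, y, h=1e-7):
    # one-sided difference quotient approximating f'(x, y)
    xh = (x[0] + h * y[0], x[1] + h * y[1])
    return (f(xh) - f(x)) / h

x = (0.0, 1.0)   # here Z(x) = {-1, 1}: two maximizers, so f has a kink at x
for y in [(1.0, 0.0), (-1.0, 0.0), (0.3, -0.7)]:
    print(danskin_dir_deriv(x, y), numeric_dir_deriv(x, y))  # the two agree
```

At x = (0, 1) the formula reduces to f′(x, y) = max{−y_1 + y_2, y_1 + y_2}, so in particular f′(x, y) and f′(x, −y) are both positive for y = (1, 0), as expected at a kink.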
A simple but important application of the above proposition is when Z is a finite set and φ is linear in x for all z. In this case, f(x) can be represented as

    f(x) = max{ a_1′x + b_1, …, a_m′x + b_m },

where a_1, …, a_m are given vectors in ℝ^n and b_1, …, b_m are given scalars. Then ∂f(x) is the convex hull of the set of all vectors a_i such that

    a_i′x + b_i = max{ a_1′x + b_1, …, a_m′x + b_m }.

EXERCISES

1.7.1 (Properties of Directional Derivative)

Let f : ℝ^n → ℝ be a convex function and let x ∈ ℝ^n be a fixed vector. Show that for the directional derivative function f′(x, ·) : ℝ^n → ℝ the following hold:

(a) f′(x, λy) = λf′(x, y) for all λ ≥ 0 and y ∈ ℝ^n.

(b) f′(x, ·) is convex over ℝ^n.

(c) The level set { y | f′(x, y) ≤ 0 } is a closed convex cone.

1.7.2 (Chain Rule for Directional Derivatives)

Let f : ℝ^n → ℝ^m and g : ℝ^m → ℝ^k be given functions, and let x ∈ ℝ^n be a given point. Suppose that f and g are directionally differentiable at x, and that g is such that for all w ∈ ℝ^m,

    g′(y, w) = lim_{α↓0, z→w} ( g(y + αz) − g(y) ) / α.

Then the composite function F(x) = g( f(x) ) is directionally differentiable at x, and the following chain rule applies:

    F′(x, d) = g′( f(x), f′(x, d) ),  ∀ d ∈ ℝ^n.

1.7.3

Let f : ℝ^n → ℝ be a convex function. Show that a vector d ∈ ℝ^n is a subgradient of f at x if and only if the function d′y − f(y) achieves its maximum at y = x.
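The finite-max rule stated above is easy to trace in one dimension. The sketch below is my own example (not from the text): f(x) = max_i (a_i x + b_i) with three affine pieces; at the kink x = 0 the active slopes are 1 and −1, so the subdifferential is their convex hull, the interval [−1, 1], and membership can be cross-checked against the subgradient inequality on a grid.

```python
# Hypothetical data: f(x) = max(x, -x, 0.5*x - 2) in one dimension.
# At x = 0 the first two pieces are active, so subdiff f(0) = [-1, 1].

a = [1.0, -1.0, 0.5]
b = [0.0, 0.0, -2.0]

def f(x):
    return max(ai * x + bi for ai, bi in zip(a, b))

def subdifferential(x, tol=1e-9):
    # interval spanned by the slopes of the pieces active at x
    fx = f(x)
    active = [ai for ai, bi in zip(a, b) if abs(ai * x + bi - fx) <= tol]
    return min(active), max(active)

def is_subgradient(d, x, samples):
    # check the subgradient inequality f(y) >= f(x) + d*(y - x) on a grid
    return all(f(y) >= f(x) + d * (y - x) - 1e-12 for y in samples)

samples = [i / 10.0 for i in range(-50, 51)]
lo, hi = subdifferential(0.0)
print(lo, hi)                              # expect -1.0 1.0
print(is_subgradient(0.3, 0.0, samples))   # inside [-1, 1]: True
print(is_subgradient(1.5, 0.0, samples))   # outside: False
```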
1.7.4

Show the following:

(a) For f(x) = ‖x‖, we have

    ∂f(x) = { x/‖x‖ }  if x ≠ 0,    ∂f(x) = { x | ‖x‖ ≤ 1 }  if x = 0.

(b) For f(x) = max_{1≤i≤n} |x_i|, we have

    ∂f(x) = conv{ sign(x_i) e_i | i ∈ J_x }  if x ≠ 0,    ∂f(x) = conv{ ±e_1, …, ±e_n }  if x = 0,

where sign(t) denotes the sign of a scalar t, e_i is the ith basis vector in ℝ^n, and J_x is the set of all i such that f(x) = |x_i|.

(c) For the function

    f(x) = 0  if x ∈ C,    f(x) = ∞  if x ∉ C,

where C is a nonempty convex set, we have

    ∂f(x) = N_C(x)  if x ∈ C,    ∂f(x) = Ø  if x ∉ C.

1.7.5

Let f : ℝ^n → ℝ be a convex function and X ⊂ ℝ^n be a bounded set. Show that f is Lipschitz continuous over X, i.e., that there exists a scalar L such that

    |f(x) − f(y)| ≤ L‖x − y‖,  ∀ x, y ∈ X.

Show also that

    f′(x, y) ≤ L‖y‖,  ∀ x ∈ X, ∀ y ∈ ℝ^n.

Hint: Use the boundedness property of the subdifferential (Prop. 1.7.3).

1.7.6 (Polar Cones of Level Sets)

Let f : ℝ^n → ℝ be a convex function and let x ∈ ℝ^n be a fixed vector.

(a) Show that

    { y | f′(x, y) ≤ 0 }∗ = cl( cone(∂f(x)) ).

(b) Assuming that the level set { z | f(z) < f(x) } is nonempty, show that the normal cone of { z | f(z) ≤ f(x) } at the point x coincides with cone(∂f(x)).
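Part (a) of Exercise 1.7.4 can be spot-checked. The snippet below is an illustration I added (the point x = 0 and the candidate vectors are arbitrary choices): the subgradient inequality at 0 reads ‖y‖ ≥ d′y, which holds for every y whenever ‖d‖ ≤ 1 by the Cauchy-Schwarz inequality, and fails at y = d whenever ‖d‖ > 1.

```python
import math, random

# Check numerically that d is a subgradient of f(x) = ||x|| at x = 0
# iff ||d|| <= 1 (Exercise 1.7.4(a)); the sampling is a heuristic test only.

def norm(v):
    return math.sqrt(sum(c * c for c in v))

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def is_subgradient_at_zero(d, trials=1000, rng=random.Random(0)):
    # test ||y|| >= d'y on random points y
    for _ in range(trials):
        y = [rng.uniform(-1, 1) for _ in d]
        if norm(y) < dot(d, y) - 1e-12:
            return False
    return True

print(is_subgradient_at_zero([0.6, -0.8]))   # ||d|| = 1, so True

# for ||d|| > 1 the inequality already fails at y = d: ||d|| < d'd = ||d||^2
d_bad = [0.9, 0.9]
print(norm(d_bad) >= dot(d_bad, d_bad))      # False
```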
1.7.7

Let f : ℝ^n → (−∞, ∞] be a convex function. Show that ∂f(x) is nonempty if x belongs to the relative interior of dom(f). Show also that ∂f(x) is nonempty and bounded if and only if x belongs to the interior of dom(f).

1.7.8

Let f_i : ℝ^n → (−∞, ∞], i = 1, …, m, be convex functions, and let f = f_1 + ⋯ + f_m. Show that

    ∂f_1(x) + ⋯ + ∂f_m(x) ⊂ ∂f(x),  ∀ x ∈ ℝ^n.

Furthermore, assuming that the relative interiors of the sets dom(f_i) have a nonempty intersection, show that

    ∂f_1(x) + ⋯ + ∂f_m(x) = ∂f(x),  ∀ x ∈ ℝ^n.

Hint: Follow the line of proof of the corresponding result for real-valued convex functions in Section 1.7.

1.7.9

Let C_1, …, C_m be convex sets in ℝ^n such that ri(C_1) ∩ ⋯ ∩ ri(C_m) is nonempty. Show that

    N_{C_1 ∩ ⋯ ∩ C_m}(x) = N_{C_1}(x) + ⋯ + N_{C_m}(x).

Hint: For each i, let f_i(x) = 0 when x ∈ C_i and f_i(x) = ∞ otherwise. Then use Exercises 1.7.4(c) and 1.7.8.

1.7.10

Let f : ℝ^n → ℝ be a convex function. Show that for every x ∈ ℝ^n,

    ∩_{ε>0} ∂_ε f(x) = ∂f(x).

1.7.11

Let f : ℝ^n → ℝ be a convex function and X ⊂ ℝ^n a bounded set. Show that for every ε > 0, the set ∪_{x∈X} ∂_ε f(x) is bounded.
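The inclusion of Exercise 1.7.8 can be traced on a one-dimensional instance. The sketch below is my own example (not from the text): with f_1(x) = |x| and f_2(x) = max(0, x), we have ∂f_1(0) = [−1, 1] and ∂f_2(0) = [0, 1], so every d in the sum [−1, 2] should satisfy the subgradient inequality for f = f_1 + f_2 at 0, while points just outside should fail.

```python
# f(x) = |x| + max(0, x); its subdifferential at 0 is exactly [-1, 2],
# the (elementwise) sum of [-1, 1] and [0, 1].

def f(x):
    return abs(x) + max(0.0, x)

def is_subgradient(d, x, samples):
    # check f(y) >= f(x) + d*(y - x) on a grid of test points
    return all(f(y) >= f(x) + d * (y - x) - 1e-12 for y in samples)

samples = [i / 20.0 for i in range(-60, 61)]
print(all(is_subgradient(d / 10.0, 0.0, samples) for d in range(-10, 21)))   # d in [-1, 2]: True
print(is_subgradient(2.1, 0.0, samples), is_subgradient(-1.1, 0.0, samples))  # False False
```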
1.7.12 (Continuity of ε-Subdifferential [Nur77])

Let f : ℝ^n → ℝ be a convex function and ε > 0 be a scalar. Show that for every x ∈ ℝ^n, the following hold:

(a) If a sequence {x_k} converges to x and d_k ∈ ∂_ε f(x_k) for all k, then the sequence {d_k} is bounded and each of its limit points is an ε-subgradient of f at x.

(b) If d ∈ ∂_ε f(x), then for every sequence {x_k} converging to x, there is a sequence {d_k} such that d_k ∈ ∂_ε f(x_k) and d_k → d.

Note: This exercise shows that the point-to-set mapping x → ∂_ε f(x) is continuous. In particular, part (a) shows that the mapping is upper semicontinuous, while part (b) shows that it is lower semicontinuous.

1.7.13 (Steepest Descent Directions of Convex Functions)

Let f : ℝ^n → ℝ be a convex function and fix a vector x ∈ ℝ^n. A vector d ∈ ℝ^n is said to be a descent direction of f at x if the corresponding directional derivative of f satisfies

    f′(x, d) < 0.

Show that a vector d is the vector of minimum norm in ∂f(x) if and only if either d = 0 or else −d/‖d‖ minimizes f′(x, y) over all y with ‖y‖ ≤ 1.

1.7.14 (Generating Descent Directions of Convex Functions)

Let f : ℝ^n → ℝ be a convex function and fix a vector x ∈ ℝ^n that does not minimize f. This exercise provides a method for generating a descent direction in circumstances where obtaining a single subgradient at x is relatively easy. Let g_1 be some subgradient of f at x. For k = 2, 3, …, let w_k be the vector of minimum norm in the convex hull of g_1, …, g_{k−1},

    w_k = arg min_{g ∈ conv{g_1, …, g_{k−1}}} ‖g‖.

If −w_k is a descent direction, stop; else let g_k be an element of ∂f(x) such that

    g_k′w_k = min_{g ∈ ∂f(x)} g′w_k = −f′(x, −w_k).

Show that this process terminates in a finite number of steps with a descent direction. Hint: If −w_k is not a descent direction, then g_i′w_k ≥ ‖w_k‖² ≥ ‖g*‖² > 0 for all i = 1, …, k − 1, where g* is the subgradient of minimum norm in ∂f(x), while at the same time g_k′w_k ≤ 0. Consider a limit point of {(w_k, g_k)}.
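The procedure of Exercise 1.7.14 can be traced on a small polyhedral example. The sketch below is my own implementation under stated assumptions (not from the text): f(x) = max{a_1′x, a_2′x} at x = 0, so ∂f(0) = conv{a_1, a_2}; the oracle step returns a vertex of ∂f(0) attaining min_g g′w_k; and the minimum-norm point in the current convex hull is computed by projected gradient over the unit simplex, with step size and iteration count chosen ad hoc for this tiny instance.

```python
# Descent-direction generation at x = 0 for f(x) = max(a1'x, a2'x),
# with assumed data a1 = (1, 1), a2 = (1, -1).  Here -w is a descent
# direction iff f'(0, -w) = max_i a_i'(-w) < 0, i.e., a_i'w > 0 for all i.

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def project_simplex(v):
    # Euclidean projection onto {lam >= 0, sum lam = 1} (sorting method)
    u = sorted(v, reverse=True)
    css, theta = 0.0, 0.0
    for i, ui in enumerate(u):
        css += ui
        if ui - (css - 1.0) / (i + 1) > 0:
            theta = (css - 1.0) / (i + 1)
    return [max(vi - theta, 0.0) for vi in v]

def min_norm_in_hull(G, iters=2000, step=0.05):
    # minimize ||sum_j lam_j G[j]||^2 over the simplex by projected gradient
    m, n = len(G), len(G[0])
    lam = [1.0 / m] * m
    for _ in range(iters):
        w = [sum(lam[j] * G[j][i] for j in range(m)) for i in range(n)]
        grad = [2.0 * dot(G[j], w) for j in range(m)]
        lam = project_simplex([lam[j] - step * grad[j] for j in range(m)])
    return [sum(lam[j] * G[j][i] for j in range(m)) for i in range(n)]

A = [(1.0, 1.0), (1.0, -1.0)]   # active vectors: the vertices of subdiff f(0)
hull = [A[0]]                   # start from a single known subgradient g1
while True:
    w = min_norm_in_hull(hull)
    if all(dot(a, w) > 1e-8 for a in A):   # then -w is a descent direction
        break
    hull.append(min(A, key=lambda a: dot(a, w)))   # oracle: vertex minimizing g'w

print([round(c, 3) for c in w])   # min-norm subgradient; -w is a descent direction
```

On this data the first trial vector w_2 = g_1 = (1, 1) is rejected (a_2′w_2 = 0), the oracle adds g_2 = a_2, and the method stops with w_3 = (1, 0), the minimum-norm point of ∂f(0), so −w_3 = (−1, 0) is a descent direction.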
1.7.15 (Implementing the ε-Descent Method [Lem74])

Let f : ℝ^n → ℝ be a convex function. This exercise shows how the procedure of Exercise 1.7.14 can be modified so that it generates an ε-descent direction at a given vector x. At the typical step of this procedure, we have g_1, …, g_{k−1} ∈ ∂_ε f(x). Let w_k be the vector of minimum norm in the convex hull of g_1, …, g_{k−1},

    w_k = arg min_{g ∈ conv{g_1, …, g_{k−1}}} ‖g‖.

If w_k = 0, stop; we have 0 ∈ ∂_ε f(x), so by Prop. 1.7.6(b),

    f(x) ≤ inf_{z ∈ ℝ^n} f(z) + ε,

and x is ε-optimal. Otherwise, determine whether there exists a scalar α such that f(x − αw_k) < f(x) − ε, by a search along the line { x − αw_k | α ≥ 0 }. If such an α can be found, stop and replace x with x − αw_k (the value of f has been improved by at least ε). Otherwise let g_k be an element of ∂_ε f(x) such that

    g_k′w_k = min_{g ∈ ∂_ε f(x)} g′w_k.

Show that this process will terminate in a finite number of steps with either an improvement of the value of f by at least ε, or a confirmation that x is an ε-optimal solution.

1.8 OPTIMALITY CONDITIONS

In this section we generalize the constrained optimality conditions of Section 1.5 to problems involving a nondifferentiable but convex cost function. We will first use directional derivatives to provide a necessary and sufficient condition for optimality in the problem of minimizing a convex function f : ℝ^n → ℝ over a convex set X ⊂ ℝ^n. It can be seen that x* is a global minimum of f over X if and only if

    f′(x*, x − x*) ≥ 0,  ∀ x ∈ X.

This follows using the definition (1.52) of directional derivative and the fact that the difference quotient

    ( f( x* + α(x − x*) ) − f(x*) ) / α

is a monotonically nondecreasing function of α. This leads to the following optimality condition.
Proposition 1.8.1: Let f : ℝ^n → ℝ be a convex function. A vector x* minimizes f over a convex set X ⊂ ℝ^n if and only if there exists a subgradient d ∈ ∂f(x*) such that

    d′(x − x*) ≥ 0,  ∀ x ∈ X.

Equivalently, x* minimizes f over a convex set X ⊂ ℝ^n if and only if

    0 ∈ ∂f(x*) + T_X(x*)∗,

where T_X(x*)∗ is the polar of the tangent cone of X at x*.

Proof: Suppose that for some d ∈ ∂f(x*) and all x ∈ X, we have d′(x − x*) ≥ 0. Then, since from the definition of a subgradient we have f(x) − f(x*) ≥ d′(x − x*) for all x ∈ X, we obtain f(x) − f(x*) ≥ 0 for all x ∈ X, so x* minimizes f over X.

Conversely, suppose that x* minimizes f over X. Consider the set of feasible directions of X at x*,

    F_X(x*) = { w ≠ 0 | x* + αw ∈ X for some α > 0 },

and the cone

    W = −F_X(x*)∗ = { d | d′w ≥ 0, ∀ w ∈ F_X(x*) }.

If ∂f(x*) and W have a point in common, we are done, since any d in the intersection satisfies d′(x − x*) ≥ 0 for all x ∈ X [for x ∈ X with x ≠ x*, the vector x − x* is a feasible direction at x*]. So, to arrive at a contradiction, assume the opposite, i.e., ∂f(x*) ∩ W = Ø. Since ∂f(x*) is compact and W is closed, by Prop. 1.4.3 there exists a hyperplane strictly separating ∂f(x*) and W, i.e., a vector y and a scalar c such that

    g′y < c < d′y,  ∀ g ∈ ∂f(x*), ∀ d ∈ W.

Using the fact that W is a closed cone, it follows that

    c < 0 ≤ d′y,  ∀ d ∈ W,    (1.74)

which when combined with the preceding inequality, also yields

    max_{g ∈ ∂f(x*)} g′y < c < 0.

From Eq. (1.74), we see that y belongs to the polar cone ( F_X(x*)∗ )∗, which by the Polar Cone Theorem implies that y is in the closure of the set of feasible directions F_X(x*). By Prop. 1.7.3 and the preceding relation, f′(x*, y) = max_{g ∈ ∂f(x*)} g′y < 0, so by the continuity of f′(x*, ·), for a sequence {y_k} of feasible directions converging to y, we have f′(x*, y_k) < 0 for all sufficiently large k. This contradicts the optimality of x*, since a feasible descent direction would then exist at x*.

The last statement follows from the convexity of X, which implies that T_X(x*)∗ is the set of all z such that z′(x − x*) ≤ 0 for all x ∈ X (cf. Section 1.5). Q.E.D.

Note that the above proposition generalizes the optimality condition of Section 1.5 for the case where f is convex and smooth:

    ∇f(x*)′(x − x*) ≥ 0,  ∀ x ∈ X.

In the special case where X = ℝ^n, we obtain a basic necessary and sufficient condition for unconstrained optimality of x*:

    0 ∈ ∂f(x*).

This optimality condition is also evident from the subgradient inequality (1.55).

We finally extend the optimality conditions of Section 1.5 and of Prop. 1.8.1 to the case where the cost function involves a (possibly nonconvex) smooth component and a convex (possibly nondifferentiable) component.

Proposition 1.8.2: Let x* be a local minimum of a function f : ℝ^n → ℝ over a subset X of ℝ^n. Assume that the tangent cone T_X(x*) is convex, and that f has the form

    f(x) = f_1(x) + f_2(x),

where f_1 is convex and f_2 is smooth. Then

    −∇f_2(x*) ∈ ∂f_1(x*) + T_X(x*)∗.

Proof: The proof is analogous to the one of the corresponding smooth optimality condition of Section 1.5. Let y be a nonzero tangent of X at x*. Then there exists a sequence {ξ_k} and a sequence {x_k} ⊂ X such that x_k ≠ x* for all k, ξ_k → 0, x_k → x*, and

    ( x_k − x* ) / ‖x_k − x*‖ = y/‖y‖ + ξ_k.

We write this equation as

    x_k − x* = ( ‖x_k − x*‖ / ‖y‖ ) y_k,    (1.75)

where y_k = y + ‖y‖ξ_k. By the Mean Value Theorem, we have

    f_2(x_k) = f_2(x*) + ∇f_2(x̃_k)′(x_k − x*),  ∀ k,

where x̃_k is a vector on the line segment joining x_k and x*. By the convexity of f_1, we have

    −f_1′(x_k, x* − x_k) ≤ f_1′(x_k, x_k − x*)

and

    f_1(x_k) + f_1′(x_k, x* − x_k) ≤ f_1(x*),

and by adding these inequalities, we obtain

    f_1(x_k) ≤ f_1(x*) + f_1′(x_k, x_k − x*),  ∀ k.

By adding this relation and the Mean Value relation above, we obtain

    f(x_k) ≤ f(x*) + f_1′(x_k, x_k − x*) + ∇f_2(x̃_k)′(x_k − x*),  ∀ k,

so using Eq. (1.75) and the positive homogeneity of f_1′(x_k, ·),

    f(x_k) ≤ f(x*) + ( ‖x_k − x*‖ / ‖y‖ ) ( f_1′(x_k, y_k) + ∇f_2(x̃_k)′y_k ),  ∀ k.    (1.76)

We now show by contradiction that

    f_1′(x*, y) + ∇f_2(x*)′y ≥ 0.

Indeed, if f_1′(x*, y) + ∇f_2(x*)′y < 0, then since x_k → x* (and hence also x̃_k → x*) and y_k → y, it follows, using also Prop. 1.7.2, that for all sufficiently large k,

    f_1′(x_k, y_k) + ∇f_2(x̃_k)′y_k < 0,

and, from Eq. (1.76), f(x_k) < f(x*). This contradicts the local optimality of x*.

We have thus shown that f_1′(x*, y) + ∇f_2(x*)′y ≥ 0 for all y ∈ T_X(x*), or equivalently, by Prop. 1.7.3(b),

    max_{d ∈ ∂f_1(x*)} ( d + ∇f_2(x*) )′y ≥ 0,  ∀ y ∈ T_X(x*),

or equivalently, since T_X(x*) is a cone,

    min_{‖y‖≤1, y ∈ T_X(x*)}  max_{d ∈ ∂f_1(x*)} ( d + ∇f_2(x*) )′y = 0.

We can now apply the Saddle Point Theorem (cf. Section 1.3.4; the convexity/concavity and compactness assumptions of that result are satisfied) to assert that there exists a d̄ ∈ ∂f_1(x*) such that

    min_{‖y‖≤1, y ∈ T_X(x*)} ( d̄ + ∇f_2(x*) )′y = 0.

This implies that

    ( d̄ + ∇f_2(x*) )′y ≥ 0,  ∀ y ∈ T_X(x*),

which in turn is equivalent to

    −( d̄ + ∇f_2(x*) ) ∈ T_X(x*)∗,

which in turn is equivalent to −∇f_2(x*) ∈ ∂f_1(x*) + T_X(x*)∗. Q.E.D.

Note that in the special case where f_1(x) ≡ 0, we obtain the optimality condition of Section 1.5 for smooth cost functions. The convexity assumption on T_X(x*) is unnecessary in this case, but it is essential in general. [Consider the subset X = { (x_1, x_2) | x_1x_2 = 0 } of ℝ²; it is easy to construct a convex nondifferentiable function that has a global minimum at x* = 0 without satisfying the necessary condition of Prop. 1.8.2.]

In the special case where f_2(x) ≡ 0 and X is convex, Prop. 1.8.2 yields the necessity part of Prop. 1.8.1. More generally, when X is convex, an equivalent statement of Prop. 1.8.2 is that if x* is a local minimum of f over X, there exists a subgradient d ∈ ∂f_1(x*) such that

    ( d + ∇f_2(x*) )′(x − x*) ≥ 0,  ∀ x ∈ X.

This is because for a convex set X, we have z ∈ T_X(x*)∗ if and only if z′(x − x*) ≤ 0 for all x ∈ X.

EXERCISES

1.8.1

Consider the problem of minimizing a convex function f : ℝ^n → ℝ over the polyhedral set

    X = { x | a_j′x ≤ b_j, j = 1, …, r },

where a_1, …, a_r are given vectors in ℝ^n and b_1, …, b_r are given scalars. Show that x* is an optimal solution if and only if there exist scalars μ_1*, …, μ_r* such that

(i) μ_j* ≥ 0 for all j, and μ_j* = 0 for all j such that a_j′x* < b_j;

(ii) 0 ∈ ∂f(x*) + Σ_{j=1}^{r} μ_j* a_j.

Hint: Characterize the cone T_X(x*)∗, and use Prop. 1.8.1 and Farkas' Lemma.
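The multiplier condition of Exercise 1.8.1 can be checked on a concrete instance. The example below is my own (not from the text): minimize f(x) = |x_1| + |x_2| over X = { x | −x_1 ≤ −1 }, i.e., x_1 ≥ 1. The minimum is at x* = (1, 0), where ∂f(x*) = {1} × [−1, 1] and the single constraint is active, and the condition holds with μ_1* = 1.

```python
# f(x) = |x1| + |x2| minimized over x1 >= 1, written as a1'x <= b1
# with a1 = (-1, 0), b1 = -1.  Optimal solution: x* = (1, 0).

def f(x):
    return abs(x[0]) + abs(x[1])

a1, b1 = (-1.0, 0.0), -1.0
x_star = (1.0, 0.0)

# (i) the constraint is active at x*, so mu1 may be positive
assert a1[0] * x_star[0] + a1[1] * x_star[1] == b1

# (ii) 0 in subdiff f(x*) + mu1*a1 holds with mu1 = 1, since
# -mu1*a1 = (1, 0) lies in subdiff f(x*) = {1} x [-1, 1]
mu1 = 1.0
d = (-mu1 * a1[0], -mu1 * a1[1])
print(d[0] == 1.0 and -1.0 <= d[1] <= 1.0)     # True

# cross-check optimality of x* on a grid of feasible points
grid = [(1.0 + i / 10.0, j / 10.0) for i in range(0, 30) for j in range(-15, 16)]
print(all(f(x) >= f(x_star) for x in grid))    # True
```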
1.9 NOTES AND SOURCES

The material in this chapter is classical and is developed in various books. Among the early contributors to convexity theory, we note Steinitz [Ste13], [Ste14], [Ste16], who developed the theory of relative interiors, recession cones, and polar cones; Minkowski [Min11], who first investigated supporting hyperplane theorems; and Caratheodory [Car11], who also investigated hyperplane separation and gave the theorem on convex hulls that carries his name. The Minkowski-Weyl Theorem given here is due to Weyl [Wey35]. Subdifferential theory owes much to the work of Fenchel, who in his 1951 lecture notes [Fen51] laid the foundations for the subsequent work on convexity and its connection with optimization. The notion of a tangent and the tangent cone originated with Bouligand [Bou30], [Bou32]. The normal cone, introduced by Mordukhovich [Mor76], and the work of Clarke on nonsmooth analysis [Cla83] play a central role in this subject.

Most of these books relate to both convex analysis and optimization, but differ in their scope and emphasis. The book by Rockafellar [Roc70], which is widely viewed as the classic convex analysis text, contains a more extensive development of convexity than the one given here, although it does not cross over into nonconvex optimization. The book by Rockafellar and Wets [RoW98] is a deep and detailed treatment of "variational analysis," a broad spectrum of topics that integrate classical analysis, convexity, and optimization of both convex and nonconvex (possibly nonsmooth) functions; it contains a wealth of material beyond what is covered here, gives many historical references, and is strongly recommended for the advanced reader who wishes to obtain a comprehensive view of the connecting links between convex and classical analysis. Borwein and Lewis [BoL00] develop many of the concepts in Rockafellar and Wets [RoW98], but more succinctly. Stoer and Witzgall [StW70] discuss similar topics as Rockafellar [Roc70] but less comprehensively. Ekeland and Temam [EkT76] develop the subject in infinite dimensional spaces. Hiriart-Urruty and Lemarechal [HiL93] emphasize algorithms for dual and nondifferentiable optimization. Bonnans and Shapiro [BoS00] emphasize sensitivity analysis and discuss infinite dimensional problems as well. Schrijver [Sch86] provides an extensive account of polyhedral convexity with applications to integer programming and combinatorial optimization. Bertsekas [Ber98] also gives a detailed coverage of this material. Rockafellar [Roc84] focuses on convexity and duality in network optimization and an important generalization, known as monotropic programming, which owes much to the early work of Minty [Min60] on network optimization.
REFERENCES

[BeM71] Bertsekas, D. P., and Mitter, S. K., "Steepest Descent for Optimization Problems with Nondifferentiable Cost Functionals," Proc. 5th Annual Princeton Confer. Inform. Sci. Systems, Princeton, N. J., 1971, pp. 347-351.

[BeM73] Bertsekas, D. P., and Mitter, S. K., "A Descent Numerical Method for Optimization Problems with Nondifferentiable Cost Functionals," SIAM J. on Control, Vol. 11, 1973, pp. 637-652.

[Ber98] Bertsekas, D. P., Network Optimization: Continuous and Discrete Models, Athena Scientific, Belmont, MA, 1998.

[Ber99] Bertsekas, D. P., Nonlinear Programming: 2nd Edition, Athena Scientific, Belmont, MA, 1999.

[BeT97] Bertsimas, D., and Tsitsiklis, J. N., Introduction to Linear Optimization, Athena Scientific, Belmont, MA, 1997.

[BoL00] Borwein, J. M., and Lewis, A. S., Convex Analysis and Nonlinear Optimization, Springer-Verlag, N. Y., 2000.

[BoS00] Bonnans, J. F., and Shapiro, A., Perturbation Analysis of Optimization Problems, Springer-Verlag, N. Y., 2000.

[Bou30] Bouligand, G., "Sur les Surfaces Depourvues de Points Hyperlimites," Annales de la Societe Polonaise de Mathematique, Vol. 9, 1930, pp. 32-41.

[Bou32] Bouligand, G., Introduction a la Geometrie Infinitesimale Directe, Gauthiers-Villars, Paris, 1932.

[Car11] Caratheodory, C., "Uber den Variabilitatsbereich der Fourierschen Konstanten von Positiven Harmonischen Funktionen," Rendiconto del Circolo Matematico di Palermo, Vol. 32, 1911, pp. 193-217; reprinted in Constantin Caratheodory, Gesammelte Mathematische Schriften, Band III (H. Tietze, ed.), C. H. Beck'sche Verlagsbuchhandlung, Munchen, 1955, pp. 78-110.

[Chv83] Chvatal, V., Linear Programming, W. H. Freeman and Co., N. Y., 1983.

[Cla83] Clarke, F. H., Optimization and Nonsmooth Analysis, Wiley, N. Y., 1983.

[Dan63] Dantzig, G. B., Linear Programming and Extensions, Princeton Univ. Press, Princeton, N. J., 1963.

[EkT76] Ekeland, I., and Temam, R., Convex Analysis and Variational Problems, North-Holland Publ., Amsterdam, 1976.

[Fen51] Fenchel, W., "Convex Cones, Sets, and Functions," Mimeographed Notes, Princeton Univ., 1951.

[HiL93] Hiriart-Urruty, J.-B., and Lemarechal, C., Convex Analysis and Minimization Algorithms, Vols. I and II, Springer-Verlag, Berlin and N. Y., 1993.

[HoK71] Hoffman, K., and Kunze, R., Linear Algebra, Prentice-Hall, Englewood Cliffs, N. J., 1971.

[LaT85] Lancaster, P., and Tismenetsky, M., The Theory of Matrices, Academic Press, N. Y., 1985.

[Lem74] Lemarechal, C., "An Algorithm for Minimizing Convex Functions," in Information Processing '74 (J. L. Rosenfeld, ed.), North-Holland, Amsterdam, 1974, pp. 552-556.

[Lem75] Lemarechal, C., "An Extension of Davidon Methods to Nondifferentiable Problems," Math. Programming Study 3 (M. L. Balinski and P. Wolfe, eds.), North-Holland, Amsterdam, 1975, pp. 95-109.

[Min11] Minkowski, H., "Theorie der Konvexen Korper, Insbesondere Begrundung Ihres Ober Flachenbegriffs," Gesammelte Abhandlungen, II, Teubner, Leipsig, 1911.

[Min60] Minty, G. J., "Monotone Networks," Proc. Roy. Soc. London, A, Vol. 257, 1960, pp. 194-212.

[Mor76] Mordukhovich, B. S., "Maximum Principle in the Problem of Time Optimal Response with Nonsmooth Constraints," J. of Applied Mathematics and Mechanics, Vol. 40, 1976, pp. 960-969.

[Nur77] Nurminskii, E. A., "The Continuity of ε-Subgradient Mappings," (Russian) Kibernetika (Kiev), 1977, No. 5, pp. 148-149.

[OrR70] Ortega, J. M., and Rheinboldt, W. C., Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, N. Y., 1970.

[Roc70] Rockafellar, R. T., Convex Analysis, Princeton Univ. Press, Princeton, N. J., 1970.

[Roc84] Rockafellar, R. T., Network Flows and Monotropic Optimization, Wiley, N. Y., 1984; republished by Athena Scientific, Belmont, MA, 1998.

[RoW98] Rockafellar, R. T., and Wets, R. J.-B., Variational Analysis, Springer-Verlag, Berlin, 1998.

[Rud76] Rudin, W., Principles of Mathematical Analysis, McGraw-Hill, N. Y., 1976.

[Sch86] Schrijver, A., Theory of Linear and Integer Programming, Wiley, N. Y., 1986.

[Ste13] Steinitz, H., "Bedingt Konvergente Reihen und Konvexe Systeme, I," J. of Math., Vol. 143, 1913, pp. 128-175.

[Ste14] Steinitz, H., "Bedingt Konvergente Reihen und Konvexe Systeme, II," J. of Math., Vol. 144, 1914, pp. 1-40.

[Ste16] Steinitz, H., "Bedingt Konvergente Reihen und Konvexe Systeme, III," J. of Math., Vol. 146, 1916, pp. 1-52.

[StW70] Stoer, J., and Witzgall, C., Convexity and Optimization in Finite Dimensions I, Springer-Verlag, Berlin, 1970.

[Str76] Strang, G., Linear Algebra and Its Applications, Academic Press, N. Y., 1976.

[Wey35] Weyl, H., "Elementare Theorie der Konvexen Polyeder," Commentarii Mathematici Helvetici, Vol. 7, 1935, pp. 290-306.