
ENGG 5501: Foundations of Optimization 2018–19 First Term

Handout 2: Elements of Convex Analysis


Instructor: Anthony Man–Cho So September 10, 2018

As briefly mentioned in Handout 1, the notion of convexity plays a very important role in both
the theoretical and algorithmic aspects of optimization. Before we discuss in depth the relevance
of convexity in optimization, however, let us first introduce the notions of convex sets and convex
functions and study some of their properties.

1 Affine and Convex Sets


1.1 Basic Definitions and Properties
We begin with some definitions.

Definition 1 Let S ⊆ Rn be a set. We say that

1. S is affine if αx + (1 − α)y ∈ S whenever x, y ∈ S and α ∈ R;

2. S is convex if αx + (1 − α)y ∈ S whenever x, y ∈ S and α ∈ [0, 1].

Given x, y ∈ Rn and α ∈ R, the vector z = αx + (1 − α)y is called an affine combination of x and y. If α ∈ [0, 1], then z is called a convex combination of x and y.
Geometrically, when x and y are distinct points in Rn , the set
Geometrically, when x and y are distinct points in Rn , the set

L = {z ∈ Rn : z = αx + (1 − α)y, α ∈ R}

of all affine combinations of x and y is simply the line determined by x and y, and the set

S = {z ∈ Rn : z = αx + (1 − α)y, α ∈ [0, 1]}

is the line segment between x and y. By convention, the empty set ∅ is affine and hence also convex.
The notion of an affine (resp. convex) combination of two points can be easily generalized to
any finite number of points. In particular, an affine combination of the points x1 , . . . , xk ∈ Rn is a
point of the form z = Σ_{i=1}^k αi xi , where Σ_{i=1}^k αi = 1. Similarly, a convex combination of the points
x1 , . . . , xk ∈ Rn is a point of the form z = Σ_{i=1}^k αi xi , where Σ_{i=1}^k αi = 1 and α1 , . . . , αk ≥ 0.

The following proposition reveals the structure of a non–empty affine set.

Proposition 1 Let S ⊆ Rn be non–empty. Then, the following are equivalent:

(a) S is affine.

(b) Any affine combination of points in S belongs to S.

(c) S is the translation of some linear subspace V ⊆ Rn ; i.e., S is of the form {x} + V = {x + v ∈
Rn : v ∈ V } for some x ∈ Rn .

Proof We first establish (a)⇒(b) by induction on the number of points k ≥ 1 in S used to form
the affine combination. The case where k = 1 is trivial, and the case where k = 2 follows from the
assumption that S is affine. For the inductive step, let k ≥ 3 and suppose that x = Σ_{i=1}^k αi xi ,
where x1 , . . . , xk ∈ S and Σ_{i=1}^k αi = 1. If αi = 0 for some i ∈ {1, . . . , k}, then x is an affine
combination of k − 1 points in S and hence x ∈ S by the inductive hypothesis. On the other hand,
if αi ≠ 0 for all i ∈ {1, . . . , k}, then we may assume without loss of generality that α1 ≠ 1 and
write

x = α1 x1 + (1 − α1) Σ_{i=2}^k (αi/(1 − α1)) xi .

Since Σ_{i=2}^k (αi/(1 − α1)) = 1, the point x̄ = Σ_{i=2}^k (αi/(1 − α1)) xi is an affine combination of k − 1
points in S. Hence, we have x̄ ∈ S by the inductive hypothesis. Since S is affine by assumption
and x is an affine combination of x1 ∈ S and x̄ ∈ S, we conclude that x ∈ S, as desired.
Next, we establish (b)⇒(c). Let x ∈ S and set V = S − {x}. Our goal is to show that V ⊆ Rn
is a linear subspace. Towards that end, let v1 , v2 ∈ V and α, β ∈ R. By definition of V , there exist
z1 , z2 ∈ S such that v1 = z1 − x and v2 = z2 − x. Using the fact that z(t) = tz1 + (1 − t)z2 ∈ S for
any t ∈ R, we have

αv1 + βv2 = α(β + (1 − β))z1 + β(α + (1 − α))z2 + (1 − α − β)x − x
          = α((1 − β)z1 + βz2 ) + β(αz1 + (1 − α)z2 ) + (1 − α − β)x − x
          = αz(1 − β) + βz(α) + (1 − α − β)x − x
          = z̄ − x,

where z̄ = αz(1 − β) + βz(α) + (1 − α − β)x is an affine combination of the points z(1 − β), z(α), x ∈ S
and hence belongs to S. It follows that αv1 + βv2 ∈ V , as desired.
Lastly, we establish (c)⇒(a). Suppose that S = {x} + V for some vector x ∈ Rn and linear
subspace V ⊆ Rn . Then, for any z1 , z2 ∈ S, there exist v1 , v2 ∈ V such that z1 = x + v1 and
z2 = x + v2 . Since V is linear, for any α ∈ R, we have z(α) = αz1 + (1 − α)z2 = x + v(α) with
v(α) = αv1 + (1 − α)v2 ∈ V . This shows that z(α) ∈ S. □
Proposition 1 provides alternative characterizations of an affine set and furnishes examples of
affine sets in Rn . Motivated by the equivalence of (a) and (c) in Proposition 1, we shall also refer
to an affine set as an affine subspace.
Let us now turn our attention to convex sets. In view of Proposition 1, the following proposition
should come as no surprise. We leave its proof to the reader.

Proposition 2 Let S ⊆ Rn be arbitrary. Then, the following are equivalent:

(a) S is convex.

(b) Any convex combination of points in S belongs to S.

Before we proceed, it is helpful to have some concrete examples of convex sets.

Example 1 (Some Examples of Convex Sets)

1. Non–Negative Orthant: Rn+ = {x ∈ Rn : x ≥ 0}

2. Hyperplane: H(s, c) = {x ∈ Rn : sT x = c}

3. Halfspaces: H − (s, c) = {x ∈ Rn : sT x ≤ c}, H + (s, c) = {x ∈ Rn : sT x ≥ c}

4. Euclidean Ball: B(x̄, r) = {x ∈ Rn : kx − x̄k2 ≤ r}

5. Ellipsoid: E(x̄, Q) = {x ∈ Rn : (x − x̄)ᵀQ(x − x̄) ≤ 1}, where Q is an n × n symmetric,
positive definite matrix (i.e., xᵀQx > 0 for all x ∈ Rn \ {0}, denoted by Q ∈ S^n_{++})
6. Simplex: ∆ = {Σ_{i=0}^n αi xi : Σ_{i=0}^n αi = 1, αi ≥ 0 for i = 0, 1, . . . , n}, where x0 , x1 , . . . , xn
are vectors in Rn such that the vectors x1 − x0 , x2 − x0 , . . . , xn − x0 are linearly independent
(equivalently, the vectors x0 , x1 , . . . , xn are affinely independent)

7. Convex Cone: A set K ⊆ Rn is called a cone if {αx : α > 0} ⊆ K whenever x ∈ K. If K
is also convex, then K is called a convex cone.

8. Positive Semidefinite Cone: S^n_+ = {Q ∈ S^n : xᵀQx ≥ 0 for all x ∈ Rn}

The convexity of the sets in Example 1 can be easily established by first principles. We leave this
task to the reader.
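These sets also lend themselves to quick numerical experiments. The following Python sketch (not part of the handout; the helper names in_ball, in_halfspace, and the sampling scheme are our own) draws random pairs of points from the ball B(0, 1) and a halfspace H−(s, c) and checks that sampled convex combinations stay inside. Such sampling can refute convexity but never certify it, so the proofs are still left to the reader.

```python
import math
import random

random.seed(0)

def in_ball(x, r=1.0):
    # membership test for B(0, r) = {x : ||x||_2 <= r}
    return math.sqrt(sum(t * t for t in x)) <= r + 1e-12

def in_halfspace(x, s=(1.0, -2.0), c=0.5):
    # membership test for H^-(s, c) = {x : s^T x <= c}
    return sum(si * xi for si, xi in zip(s, x)) <= c + 1e-12

def sample_in(pred, lo=-2.0, hi=2.0, dim=2):
    # rejection-sample a point of the set inside the box [lo, hi]^dim
    while True:
        x = [random.uniform(lo, hi) for _ in range(dim)]
        if pred(x):
            return x

def convexity_spot_check(pred, trials=1000):
    # check that alpha*x + (1 - alpha)*y stays in the set for sampled x, y, alpha
    for _ in range(trials):
        x, y = sample_in(pred), sample_in(pred)
        a = random.random()
        z = [a * xi + (1 - a) * yi for xi, yi in zip(x, y)]
        if not pred(z):
            return False
    return True

ball_ok = convexity_spot_check(in_ball)
halfspace_ok = convexity_spot_check(in_halfspace)
```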
It is clear from the definition that the intersection of an arbitrary family of affine (resp. convex)
sets is an affine (resp. convex) set. Thus, for any arbitrary S ⊆ Rn , there is a smallest (by
inclusion) affine (resp. convex) set containing S; namely, the intersection of all affine (resp. convex)
sets containing S. This leads us to the following definitions:

Definition 2 Let S ⊆ Rn be arbitrary.

1. The affine hull of S, denoted by aff(S), is the intersection of all affine subspaces containing
S. In particular, aff(S) is the smallest affine subspace that contains S.

2. The convex hull of S, denoted by conv(S), is the intersection of all convex sets containing
S. In particular, conv(S) is the smallest convex set that contains S.

The above definitions can be viewed as characterizing the affine hull and convex hull of a set S
from the outside. However, given a point in the affine or convex hull of S, it is not immediately clear
from the above definitions how it is related to the points in S. This motivates our next proposition,
which in some sense provides a characterization of the affine hull and convex hull of S from the
inside.

Proposition 3 Let S ⊆ Rn be arbitrary. Then, the following hold:

(a) aff(S) is the set of all affine combinations of points in S.

(b) conv(S) is the set of all convex combinations of points in S.

Proof Let us prove (a) and leave the proof of (b) as a straightforward exercise to the reader. Let
T be the set of all affine combinations of points in S. Since S ⊆ aff(S), every x ∈ T is an affine
combination of points in aff(S). Hence, by Proposition 1, we have T ⊆ aff(S).
To establish the reverse inclusion, we show that T is an affine subspace containing S. As aff(S)
is the smallest affine subspace that contains S, this would show that aff(S) ⊆ T . To begin, we
note that S ⊆ T . Thus, it remains to show that T is an affine subspace. Towards that end, let

x1 , x2 ∈ T . By definition, there exist y1 , . . . , yp , yp+1 , . . . , yq ∈ S and α1 , . . . , αp , αp+1 , . . . , αq ∈ R
such that

x1 = Σ_{i=1}^p αi yi ,   x2 = Σ_{i=p+1}^q αi yi

and

Σ_{i=1}^p αi = Σ_{i=p+1}^q αi = 1.

It follows that for any β ∈ R, we have

x̄ = βx1 + (1 − β)x2 = Σ_{i=1}^p βαi yi + Σ_{i=p+1}^q (1 − β)αi yi

with

Σ_{i=1}^p βαi + Σ_{i=p+1}^q (1 − β)αi = 1.

In other words, x̄ is an affine combination of points in S. Thus, we have x̄ ∈ T , which shows that
T is an affine subspace, as desired. □
Since an affine subspace is a translation of a linear subspace and the dimension of a linear
subspace is a well–defined notion, we can define the dimension of an affine subspace as the dimension
of its underlying linear subspace (see Section 1.5 of Handout B). This, together with the definition
of affine hull, makes it possible to define the dimension of an arbitrary set in Rn . Specifically, we
have the following definition:

Definition 3 Let S ⊆ Rn be arbitrary. The dimension of S, denoted by dim(S), is the dimension
of the affine hull of S.

Given a non–empty set S ⊆ Rn , we always have 0 ≤ dim(S) ≤ n. Roughly speaking, dim(S) is
the intrinsic dimension of S. As we shall see, this quantity plays a fundamental role in optimization.
To better understand the notion of the dimension of a set, let us consider the following example:

Example 2 (Dimension of a Set) Consider the two–point set S = {(1, 1), (3, 2)} ⊆ R2 . By
Proposition 3(a), we have aff(S) = {α(1, 1) + (1 − α)(3, 2) : α ∈ R} ⊆ R2 . It is easy to verify that
aff(S) = {(0, 1/2)} + V , where V = {t(1, 1/2) : t ∈ R} is the linear subspace generated by the vector
(1, 1/2). Hence, we have dim(S) = dim(V ) = 1.
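For a finite set, Definition 3 and Example 2 translate directly into a rank computation: dim({x0 , . . . , xk}) is the rank of the matrix with rows x1 − x0 , . . . , xk − x0. Below is a minimal pure-Python sketch (the function names are ours, and the elimination routine is a textbook one, not from the handout).

```python
def rank(rows, tol=1e-9):
    # matrix rank via Gaussian elimination with partial pivoting
    rows = [list(r) for r in rows]
    m = len(rows)
    n = len(rows[0]) if m else 0
    r = 0
    for col in range(n):
        pivot = max(range(r, m), key=lambda i: abs(rows[i][col]), default=None)
        if pivot is None or abs(rows[pivot][col]) < tol:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, m):
            f = rows[i][col] / rows[r][col]
            for j in range(col, n):
                rows[i][j] -= f * rows[r][j]
        r += 1
    return r

def dim_of_finite_set(points):
    # dim(S) = dim aff(S) = rank of the difference vectors x_i - x_0
    x0 = points[0]
    diffs = [[xi - x0i for xi, x0i in zip(x, x0)] for x in points[1:]]
    return rank(diffs) if diffs else 0

# the two-point set of Example 2
d = dim_of_finite_set([(1.0, 1.0), (3.0, 2.0)])
```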

1.2 Convexity–Preserving Operations


Although in theory one can establish the convexity of a set from the definition directly, in practice
this may not be the easiest route to take. Moreover, on many occasions, we need to apply a certain
operation to a collection of convex sets, and we would like to know whether the resulting set is
convex. Thus, it is natural to ask which set operations are convexity–preserving. We have already
seen that the intersection of an arbitrary family of convex sets is convex. Thus, set intersection
is convexity–preserving. However, it is easy to see that set union is not. In the following, let us
introduce some other convexity–preserving operations.

1.2.1 Affine Functions
We say that a map A : Rn → Rm is affine if

A(αx1 + (1 − α)x2 ) = αA(x1 ) + (1 − α)A(x2 )

for all x1 , x2 ∈ Rn and α ∈ R. It can be shown that A is affine iff there exist A0 ∈ Rm×n and
y0 ∈ Rm such that A(x) = A0 x + y0 for all x ∈ Rn . As the following proposition shows, convexity
is preserved under affine mappings.

Proposition 4 Let A : Rn → Rm be an affine mapping and S ⊆ Rn be a convex set. Then, the
image A(S) = {A(x) ∈ Rm : x ∈ S} is convex. Conversely, if T ⊆ Rm is a convex set, then the
inverse image A^{−1}(T ) = {x ∈ Rn : A(x) ∈ T } is convex.

Proof The proposition follows from the easily verifiable fact that for any x1 , x2 ∈ Rn , A([x1 , x2 ]) =
[A(x1 ), A(x2 )] ⊆ Rm . □

Example 3 (Convexity of the Ellipsoid) Consider the ball B(0, r) = {x ∈ Rn : xᵀx ≤ r²} ⊆
Rn , where r > 0. Clearly, B(0, r) is convex. Now, let Q be an n × n symmetric positive definite
matrix. Then, it is well–known that Q is invertible and the n × n symmetric matrix Q^{−1} is also
positive definite. Moreover, there exists an n × n symmetric matrix Q^{−1/2} such that Q^{−1} =
Q^{−1/2} Q^{−1/2}. (See [6, Chapter 7] if you are not familiar with these facts.) Thus, we may define
an affine mapping A : Rn → Rn by A(x) = Q^{−1/2} x + x̄. We claim that

A(B(0, r)) = {x ∈ Rn : (x − x̄)ᵀQ(x − x̄) ≤ r²} = E(x̄, Q/r²).

Indeed, let x ∈ B(0, r) and consider the point A(x). We compute

(A(x) − x̄)ᵀQ(A(x) − x̄) = xᵀQ^{−1/2} QQ^{−1/2} x = xᵀx ≤ r²;

i.e., A(B(0, r)) ⊆ E(x̄, Q/r²). Conversely, let x ∈ E(x̄, Q/r²) and consider the point y = Q^{1/2}(x −
x̄) = A^{−1}(x). Then, we have yᵀy ≤ r², which implies that E(x̄, Q/r²) ⊆ A(B(0, r)). Hence, we
conclude from the above calculation and Proposition 4 that E(x̄, Q/r²) is convex.
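The construction in Example 3 can be checked numerically. The sketch below takes a diagonal positive definite Q (so that Q^{−1/2} is available by hand), pushes random points of B(0, 1) through A(x) = Q^{−1/2}x + x̄, and verifies that every image lies in E(x̄, Q). The specific Q and x̄ are our own illustrative choices, not from the handout.

```python
import math
import random

random.seed(1)

# A diagonal positive definite Q, so Q^{-1/2} is just 1/sqrt of each diagonal entry.
Q = (4.0, 9.0)          # represents Q = diag(4, 9)
x_bar = (1.0, -2.0)

def A(x):
    # the affine map A(x) = Q^{-1/2} x + x_bar from Example 3 (diagonal case)
    return tuple(xi / math.sqrt(qi) + bi for xi, qi, bi in zip(x, Q, x_bar))

def in_ellipsoid(y, r=1.0):
    # membership in E(x_bar, Q/r^2) = {y : (y - x_bar)^T Q (y - x_bar) <= r^2}
    return sum(qi * (yi - bi) ** 2 for yi, qi, bi in zip(y, Q, x_bar)) <= r * r + 1e-9

ok = True
for _ in range(1000):
    # sample x uniformly from the ball B(0, 1) by rejection
    while True:
        x = (random.uniform(-1, 1), random.uniform(-1, 1))
        if x[0] ** 2 + x[1] ** 2 <= 1.0:
            break
    ok = ok and in_ellipsoid(A(x))
```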

1.2.2 Perspective Functions


Define the perspective function P : Rn × R++ → Rn by P (x, t) = x/t. The following proposition
shows that convexity is preserved by perspective functions.

Proposition 5 Let P : Rn × R++ → Rn be the perspective function and S ⊆ Rn × R++ be a convex
set. Then, the image P (S) = {x/t ∈ Rn : (x, t) ∈ S} is convex. Conversely, if T ⊆ Rn is a convex
set, then the inverse image P^{−1}(T ) = {(x, t) ∈ Rn × R++ : x/t ∈ T } is convex.

Proof For any x1 = (x̄1 , t1 ) ∈ Rn × R++ , x2 = (x̄2 , t2 ) ∈ Rn × R++ , and α ∈ [0, 1], we have

P (αx1 + (1 − α)x2 ) = (αx̄1 + (1 − α)x̄2)/(αt1 + (1 − α)t2) = βP (x1 ) + (1 − β)P (x2 ),

where

β = αt1/(αt1 + (1 − α)t2) ∈ [0, 1].

Moreover, as α increases from 0 to 1, β increases from 0 to 1. It follows that P ([x1 , x2 ]) =
[P (x1 ), P (x2 )] ⊆ Rn . This completes the proof. □
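The coefficient β appearing in the proof of Proposition 5 can be verified numerically: the snippet below checks the identity P(αx1 + (1 − α)x2) = βP(x1) + (1 − β)P(x2) on random inputs. It is a sketch of ours, not part of the handout.

```python
import random

random.seed(2)

def perspective(x, t):
    # P(x, t) = x / t for t > 0
    return tuple(xi / t for xi in x)

ok = True
for _ in range(1000):
    x1, t1 = (random.uniform(-5, 5), random.uniform(-5, 5)), random.uniform(0.1, 5)
    x2, t2 = (random.uniform(-5, 5), random.uniform(-5, 5)), random.uniform(0.1, 5)
    a = random.random()
    # left-hand side: P applied to the convex combination in R^n x R_++
    lhs = perspective(tuple(a * u + (1 - a) * v for u, v in zip(x1, x2)),
                      a * t1 + (1 - a) * t2)
    # right-hand side: convex combination of the images, with the coefficient
    # beta = a*t1 / (a*t1 + (1 - a)*t2) from the proof of Proposition 5
    beta = a * t1 / (a * t1 + (1 - a) * t2)
    rhs = tuple(beta * u + (1 - beta) * v
                for u, v in zip(perspective(x1, t1), perspective(x2, t2)))
    ok = ok and all(abs(l - r) < 1e-9 for l, r in zip(lhs, rhs))
```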
The following result, which is a straightforward application of Propositions 4 and 5, shows that
convexity is also preserved under linear–fractional mappings (also known as projective mappings).
Corollary 1 Let A : Rn → R^{m+1} be the affine map given by

A(x) = (Qx + u, cᵀx + d) ∈ Rm × R,

where Q ∈ R^{m×n}, c ∈ Rn , u ∈ Rm , and d ∈ R. Furthermore, let D = {x ∈ Rn : cᵀx + d > 0}.
Define the linear–fractional map f : D → Rm by f = P ◦ A, where P : Rm × R++ → Rm is the
perspective function. If S ⊆ D is convex, then the image f (S) is convex. Conversely, if T ⊆ Rm is
convex, then the inverse image f^{−1}(T ) is convex.

1.3 Extremal Elements of a Convex Set

Let S ⊆ Rn be a convex set. By Proposition 3(b), any x ∈ S can be represented as x = Σ_{i=1}^k αi xi ,
where x1 , . . . , xk ∈ S, Σ_{i=1}^k αi = 1, and α1 , . . . , αk ≥ 0. However, there is no a priori bound on k,
the number of points needed. The following theorem of Carathéodory remedies this situation:
Theorem 1 (Carathéodory’s Theorem) Let S ⊆ Rn be arbitrary. Then, any x ∈ conv(S) can
be represented as a convex combination of at most n + 1 points in S.
Proof Consider an arbitrary convex combination x = Σ_{i=1}^k αi xi , with x1 , . . . , xk ∈ S and k ≥ n + 2.
The plan is to show that one of the coefficients αi can be set to 0 without changing x. To begin,
observe that since k ≥ n + 2, the vectors x2 − x1 , x3 − x1 , . . . , xk − x1 must be linearly dependent in
Rn . In particular, there exist β1 , . . . , βk ∈ R, not all zero, such that Σ_{i=1}^k βi xi = 0 and Σ_{i=1}^k βi = 0.
For i = 1, . . . , k, define

αi′ = αi − t*βi , where t* = min_{j:βj>0} αj/βj = max {t ≥ 0 : αi − tβi ≥ 0 for i = 1, . . . , k} .

Note that t* < ∞, since there exists at least one index j such that βj > 0. Now, it is straightforward
to verify that x = Σ_{i=1}^k αi′ xi , Σ_{i=1}^k αi′ = 1, and α1′ , . . . , αk′ ≥ 0, and that |{i : αi′ > 0}| ≤ k − 1.
This process can be repeated until at most n + 1 of the coefficients are non–zero. This completes
the proof. □
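The reduction step in the proof is entirely mechanical once a dependency vector β is in hand. The sketch below carries out one step for four points on the real line (n = 1, so at most two points should eventually suffice); the particular β is found by inspection and is just one of many valid choices, and the whole snippet is our own illustration.

```python
# four points on the real line and a convex combination x = sum(alpha_i * x_i)
xs = [0.0, 1.0, 2.0, 3.0]
alphas = [0.1, 0.2, 0.3, 0.4]
x = sum(a * p for a, p in zip(alphas, xs))

# a dependency vector: sum(beta_i) = 0 and sum(beta_i * x_i) = 0 (by inspection)
betas = [1.0, -2.0, 1.0, 0.0]

def reduce_once(alphas, betas):
    # one Caratheodory step: t* = min_{beta_j > 0} alpha_j / beta_j, and then
    # alpha'_i = alpha_i - t* * beta_i zeroes out at least one coefficient
    # while keeping all coefficients non-negative and summing to 1
    t_star = min(a / b for a, b in zip(alphas, betas) if b > 0)
    return [a - t_star * b for a, b in zip(alphas, betas)]

new_alphas = reduce_once(alphas, betas)
new_x = sum(a * p for a, p in zip(new_alphas, xs))
```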
Although Carathéodory’s theorem asserts that any x ∈ conv(S) can be obtained as a convex
combination of at most n + 1 points in S, it does not imply the existence of a “basis” of size n + 1
for the points in S. In other words, there may not exist a fixed set of n + 1 points x0 , x1 , . . . , xn in
S such that conv(S) = conv({x0 , x1 , . . . , xn }) (consider, for instance, the unit disk B(0, 1)). This
should be contrasted with the case of linear combinations, where a subspace of dimension n can be
generated by n fixed basis vectors.
However, not all is lost. It turns out that under suitable conditions, there exists a fixed, though
possibly infinite, set S 0 of points in S such that S = conv(S 0 ). In order to introduce this result, we
need a definition.
Definition 4 Let S ⊆ Rn be non–empty and convex. We say that x ∈ Rn is an extreme point
of S if x ∈ S and there do not exist two different points x1 , x2 ∈ S such that x = ½(x1 + x2 ).

The above definition can be rephrased as follows: x ∈ S is an extreme point of S if x = x1 = x2
whenever x1 , x2 ∈ S are such that x = αx1 + (1 − α)x2 for some α ∈ (0, 1).

Example 4 (Extreme Points of a Ball) Consider the unit ball B(0, 1). We claim that any
x ∈ B(0, 1) with ‖x‖₂ = 1 is an extreme point of B(0, 1). Indeed, let x ∈ B(0, 1) be such that
‖x‖₂ = 1, and let x1 , x2 ∈ B(0, 1) be such that x = ½(x1 + x2 ). We compute

1 = ‖x‖₂² = ¼‖x1 + x2 ‖₂² = ½‖x1 ‖₂² + ½‖x2 ‖₂² − ¼‖x1 − x2 ‖₂² ≤ 1 − ¼‖x1 − x2 ‖₂².

This shows that x1 = x2 .
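The norm identity used in Example 4 is an instance of the parallelogram law, and it is easy to spot-check numerically. This is a sketch of ours; the helper sq_norm is not from the handout.

```python
import random

random.seed(3)

def sq_norm(v):
    # squared Euclidean norm ||v||_2^2
    return sum(t * t for t in v)

ok = True
for _ in range(1000):
    x1 = (random.uniform(-1, 1), random.uniform(-1, 1))
    x2 = (random.uniform(-1, 1), random.uniform(-1, 1))
    mid = tuple(0.5 * (a + b) for a, b in zip(x1, x2))
    # ||(x1 + x2)/2||^2 = 1/2 ||x1||^2 + 1/2 ||x2||^2 - 1/4 ||x1 - x2||^2
    lhs = sq_norm(mid)
    rhs = 0.5 * sq_norm(x1) + 0.5 * sq_norm(x2) \
        - 0.25 * sq_norm(tuple(a - b for a, b in zip(x1, x2)))
    ok = ok and abs(lhs - rhs) < 1e-9
```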

We are now ready to state the result alluded to earlier.

Theorem 2 (Minkowski’s Theorem) Let S ⊆ Rn be compact and convex. Then, S is the convex
hull of its extreme points.

In particular, when combined with Carathéodory's theorem, we see that if S ⊆ Rn is a compact
convex set, then any element in S can be expressed as a convex combination of at most n + 1
extreme points of S. We remark that Theorem 2 is false without the compactness assumption. For
instance, consider the set Rn+ .
We shall defer the proof of Theorem 2 to Section 1.6, after we have developed some additional
machinery.
Note that extreme points are zero–dimensional objects. By generalizing Definition 4, we can
define higher dimensional analogues of extreme points.

Definition 5 Let S ⊆ Rn be non–empty and convex. We say that a non–empty convex set F ⊆ Rn
is a face of S if F ⊆ S and there do not exist two points x1 , x2 ∈ S such that (i) x1 ∈ S \ F or
x2 ∈ S \ F , and (ii) ½(x1 + x2 ) ∈ F .

In particular, a non–empty convex set F ⊆ S is a face of S if [x1 , x2 ] ⊆ F whenever x1 , x2 ∈ S and
αx1 + (1 − α)x2 ∈ F for some α ∈ (0, 1).
The collection of faces of a convex set S contains important geometric information about S and
has implications in the design and analysis of optimization algorithms. As an illustration of the
notion of face, consider the following example.

Example 5 (Faces of a Simplex) Consider the simplex

∆ = {Σ_{i=0}^n αi ei : Σ_{i=0}^n αi = 1, αi ≥ 0 for i = 0, 1, . . . , n},

where ei ∈ Rn is the i–th standard basis vector for i = 1, . . . , n and e0 = 0 ∈ Rn . Let I be a
non–empty subset of {0, 1, . . . , n} and define

∆I = {Σ_{i∈I} αi ei : Σ_{i∈I} αi = 1, αi ≥ 0 for i ∈ I, αi = 0 for i ∉ I}.

Clearly, ∆I is a non–empty convex subset of ∆. We claim that ∆I is a face of ∆. Indeed, let
x1 , x2 ∈ ∆ and suppose that βx1 + (1 − β)x2 ∈ ∆I for some β ∈ (0, 1). Then, there exist multipliers
{α^1_i}_{i=0}^n , {α^2_i}_{i=0}^n , and {αi}_{i=0}^n such that

Σ_{i=0}^n α^1_i = Σ_{i=0}^n α^2_i = Σ_{i∈I} αi = 1,   α^1_i, α^2_i, αi ≥ 0 for i = 0, 1, . . . , n,   αi = 0 for i ∉ I,

x1 = Σ_{i=0}^n α^1_i ei ,   x2 = Σ_{i=0}^n α^2_i ei ,

and

β Σ_{i=0}^n α^1_i ei + (1 − β) Σ_{i=0}^n α^2_i ei = Σ_{i∈I} αi ei .   (1)

Upon comparing coefficients in (1), we have

βα^1_i + (1 − β)α^2_i = αi for i = 1, . . . , n.   (2)

In particular, we see that α^1_i = α^2_i = 0 whenever i ∉ I ∪ {0}, since β ∈ (0, 1) and α^1_i, α^2_i ≥ 0 for
i = 0, 1, . . . , n. Now, in order to complete the proof of the claim, it suffices to show that x1 , x2 ∈ ∆I .
Towards that end, observe that if 0 ∈ I, then we have α^1_i = α^2_i = 0 for i ∉ I, which implies that
x1 , x2 ∈ ∆I . Otherwise, we have 0 ∉ I, and our goal is to show that α^1_0 = α^2_0 = 0. To achieve that,
we first sum the equations in (2) and use the fact that α0 = 0 to get

β Σ_{i=1}^n α^1_i + (1 − β) Σ_{i=1}^n α^2_i = Σ_{i=1}^n αi = 1.

Now, upon adding βα^1_0 + (1 − β)α^2_0 to both sides, we have

1 = 1 + βα^1_0 + (1 − β)α^2_0.

Since β ∈ (0, 1) and α^1_0, α^2_0 ≥ 0, this implies that α^1_0 = α^2_0 = 0, as desired. In particular, we
conclude that ∆I is a face of ∆.
A useful property of extreme points, which is captured in the following proposition, is that they
behave transitively with respect to subsets and faces. The result should be intuitively obvious, and
we leave the proof to the reader.
Proposition 6 Let S ⊆ Rn be non–empty and convex. Then, the following hold:
(a) If S 0 is a convex subset of S and x ∈ S 0 is an extreme point of S, then x is an extreme point
of S 0 .
(b) If F ⊆ S is a face of S, then any extreme point of F is an extreme point of S.

1.4 Topological Properties


Given an arbitrary set S ⊆ Rn , recall that its interior is defined by

int(S) = {x ∈ S : B(x, ε) ⊆ S for some ε > 0}.

The notion of interior is intimately related to the space in which the set S lies. For instance,
consider the set S = [0, 1]. When viewed as a set in R, we have int(S) = (0, 1). However, if we treat
S as a set in R2 , then int(S) = ∅, since no 2–dimensional ball of positive radius is contained in S.
Such ambiguity motivates the following definition.

Definition 6 Let S ⊆ Rn be arbitrary. We say that x ∈ S belongs to the relative interior of S,
denoted by x ∈ rel int(S), if there exists an ε > 0 such that B(x, ε) ∩ aff(S) ⊆ S. The relative
boundary of S, denoted by rel bd(S), is defined by rel bd(S) = cl(S) \ rel int(S).

The following result demonstrates the relevance of the above definition when S is convex.

Theorem 3 Let S ⊆ Rn be non–empty and convex. Then, rel int(S) is non–empty.

Proof Let k = dim(S) ≥ 0. Then, S contains k + 1 affinely independent points x0 , x1 , . . . , xk ,
which generate the simplex ∆ = conv({x0 , . . . , xk }). Clearly, we have ∆ ⊆ S. Moreover, since
dim(∆) = k, we have aff(∆) = aff(S). Thus, it suffices to show that rel int(∆) is non–empty.
Towards that end, let

x̄ = (1/(k + 1)) Σ_{i=0}^k xi .

Clearly, we have x̄ ∈ aff(∆) = aff(S). Now, define V^i = aff({x0 , x1 , . . . , x_{i−1} , x_{i+1} , . . . , xk }) and let
εi = min_{x∈V^i} ‖x − x̄‖₂. Then, we have εi > 0 for i = 0, 1, . . . , k. Upon setting ε = min_{0≤i≤k} εi ,
we conclude that B(x̄, ε) ∩ aff(S) ⊆ S, as required. □
As one would expect, if we move from a point in the closure of a non–empty convex set S to a
point in the relative interior of S, then all the intermediate points remain in the relative interior of
S. This observation is made precise in the following proposition.

Proposition 7 Let S ⊆ Rn be non–empty and convex. For any x ∈ cl(S) and x0 ∈ rel int(S), we
have

(x, x0 ] = {αx + (1 − α)x0 ∈ Rn : α ∈ [0, 1)} ⊆ rel int(S).

Proof Let α ∈ [0, 1) be fixed and consider x̄ = αx + (1 − α)x0 . Since x ∈ cl(S), for any ε > 0, we
have x ∈ S + (B(0, ε) ∩ aff(S)). Now, we compute

B(x̄, ε) ∩ aff(S) = αx + (1 − α)x0 + (B(0, ε) ∩ aff(S))
                 ⊆ (1 − α)x0 + αS + (1 + α)(B(0, ε) ∩ aff(S))
                 = αS + (1 − α)(B(x0 , ((1 + α)/(1 − α))ε) ∩ aff(S)).

Since x0 ∈ rel int(S), by taking ε > 0 to be sufficiently small, we have

B(x0 , ((1 + α)/(1 − α))ε) ∩ aff(S) ⊆ S.

It follows that for sufficiently small ε > 0, we have B(x̄, ε) ∩ aff(S) ⊆ αS + (1 − α)S = S, where
the last equality is due to the convexity of S. This implies that x̄ ∈ rel int(S), as desired. □
Another topological concept of interest is that of compactness. Recall that a set S ⊆ Rn is
compact if it is closed and bounded. Moreover, the Weierstrass theorem asserts that if S ⊆ Rn is
a non–empty compact set and f : S → R is a continuous function, then f attains its maximum
and minimum on S (see, e.g., [10, Chapter 7, Theorem 18] or Section 3.1 of Handout C). Thus, in
the context of optimization, it would be good to know when a convex set S ⊆ Rn is compact. The
following proposition provides one sufficient criterion.

Proposition 8 Let S ⊆ Rn be compact. Then, conv(S) is compact.

Proof By Carathéodory's theorem (Theorem 1), for every x ∈ conv(S), there exist x1 , . . . , xn+1 ∈
S and α = (α1 , . . . , αn+1 ) ∈ ∆, where

∆ = {α ∈ R^{n+1} : Σ_{i=1}^{n+1} αi = 1, α ≥ 0},

such that x = Σ_{i=1}^{n+1} αi xi . Conversely, it is obvious that any convex combination of n + 1 points in
S belongs to conv(S). Thus, if we define the continuous function f : S^{n+1} × ∆ → Rn by

f (x1 , . . . , xn+1 ; α1 , . . . , αn+1 ) = Σ_{i=1}^{n+1} αi xi ,

then conv(S) = f (S^{n+1} × ∆). Now, observe that S^{n+1} × ∆ is compact, as both S and ∆ are compact.
Since the continuous image of a compact set is compact (see, e.g., [10, Chapter 7, Proposition 24]
or Section 3.1 of Handout C), we conclude that conv(S) is compact, as desired. □

1.5 Projection onto Closed Convex Sets


Given a non–empty set S ⊆ Rn and a point x ∈ Rn \ S, a natural problem is to find a point in S
that is closest, say, in the Euclidean norm, to x. However, it is easy to see that such a point may
not exist (e.g., when S is open) or may not be unique (e.g., when S is an arc on a circle and x is
the center of the circle). Nevertheless, as the following result shows, the aforementioned difficulties
do not arise when S is closed and convex.

Theorem 4 Let S ⊆ Rn be non–empty, closed, and convex. Then, for every x ∈ Rn , there exists
a unique point z ∗ ∈ S that is closest (in the Euclidean norm) to x.

Proof Let x0 ∈ S be arbitrary and consider the set T = {z ∈ S : ‖x − z‖₂ ≤ ‖x − x0 ‖₂}. Note
that T is compact, as it is closed and bounded. Since the function z ↦ ‖x − z‖₂² is continuous,
we conclude by Weierstrass' theorem that its minimum over the compact set T is attained at some
z* ∈ T . Clearly, z* is a point in S that is closest to x. This establishes the existence.
Now, let µ* = ‖x − z*‖₂ and suppose that z1 , z2 ∈ S are such that µ* = ‖x − z1 ‖₂ = ‖x − z2 ‖₂.
Consider the point z̄ = ½(z1 + z2 ). By Pythagoras' theorem, we have

‖z̄ − x‖₂² = (µ*)² − ‖z1 − z̄‖₂² = (µ*)² − ¼‖z1 − z2 ‖₂².

In particular, if z1 ≠ z2 , then ‖z̄ − x‖₂² < (µ*)², which is a contradiction. This establishes the
uniqueness and completes the proof of the theorem. □
In the sequel, we shall refer to the point z ∗ in Theorem 4 as the projection of x on S and
denote it by ΠS (x). In other words, we have

ΠS (x) = arg min_{z∈S} ‖x − z‖₂².

The following theorem provides a useful characterization of the projection.

Theorem 5 Let S ⊆ Rn be non–empty, closed, and convex. Given any x ∈ Rn , we have z* = ΠS (x)
iff z* ∈ S and (z − z*)ᵀ(x − z*) ≤ 0 for all z ∈ S.

Proof Let z* = ΠS (x) and z ∈ S. Consider points of the form z(α) = αz + (1 − α)z*, where
α ∈ [0, 1]. By convexity, we have z(α) ∈ S. Moreover, we have ‖z* − x‖₂ ≤ ‖z(α) − x‖₂ for all
α ∈ [0, 1]. On the other hand, note that

‖z(α) − x‖₂² = (z* + α(z − z*) − x)ᵀ(z* + α(z − z*) − x)
             = ‖z* − x‖₂² + 2α(z − z*)ᵀ(z* − x) + α²‖z − z*‖₂².

Thus, we see that ‖z(α) − x‖₂² ≥ ‖z* − x‖₂² for all α ∈ [0, 1] iff (z − z*)ᵀ(z* − x) ≥ 0. This is
precisely the stated condition.
Conversely, suppose that for some z′ ∈ S, we have (z − z′)ᵀ(x − z′) ≤ 0 for all z ∈ S. Upon setting
z = ΠS (x), we have

(ΠS (x) − z′)ᵀ(x − z′) ≤ 0.   (3)

On the other hand, by our argument in the preceding paragraph, the point ΠS (x) satisfies

(z′ − ΠS (x))ᵀ(x − ΠS (x)) ≤ 0.   (4)

Upon adding (3) and (4), we obtain

(ΠS (x) − z′)ᵀ(ΠS (x) − z′) = ‖ΠS (x) − z′‖₂² ≤ 0,

which is possible only when z′ = ΠS (x). □
We remark that the projection operator ΠS plays an important role in many optimization
algorithms. In particular, the efficiency of those algorithms depends in part on the efficient com-
putability of ΠS . We refer the interested reader to the recent paper [4] for details and further
references.
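For sets with a closed-form projection, the results of this sub–section are easy to exercise numerically. For the Euclidean ball, Π_{B(0,r)}(x) equals x when ‖x‖₂ ≤ r and rx/‖x‖₂ otherwise (a standard formula); the sketch below computes this projection and spot-checks the variational inequality (z − z*)ᵀ(x − z*) ≤ 0 of Theorem 5 on sampled z ∈ S. The helper names are our own.

```python
import math
import random

random.seed(4)

def project_ball(x, r=1.0):
    # Euclidean projection onto B(0, r): scale x back to the sphere if needed
    norm = math.sqrt(sum(t * t for t in x))
    if norm <= r:
        return tuple(x)
    return tuple(r * t / norm for t in x)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x = (3.0, 4.0)                      # a point outside B(0, 1)
z_star = project_ball(x)            # should be (0.6, 0.8)

ok = True
for _ in range(1000):
    # sample z in the unit ball by rejection and test (z - z*)^T (x - z*) <= 0
    while True:
        z = (random.uniform(-1, 1), random.uniform(-1, 1))
        if z[0] ** 2 + z[1] ** 2 <= 1.0:
            break
    ok = ok and dot([zi - si for zi, si in zip(z, z_star)],
                    [xi - si for xi, si in zip(x, z_star)]) <= 1e-9
```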

1.6 Separation Theorems


The results in the previous sub–section allow us to establish various separation theorems of convex
sets, which are of fundamental importance in convex analysis and optimization. We begin with the
following simple yet powerful result.
Theorem 6 Let S ⊆ Rn be non–empty, closed, and convex. Furthermore, let x ∈ Rn \ S be
arbitrary. Then, there exists a y ∈ Rn such that

max_{z∈S} yᵀz < yᵀx.

Proof Since S is a non–empty closed convex set, by Theorem 4, there exists a unique point z* ∈ S
that is closest to x. Set y = x − z*. By Theorem 5, for all z ∈ S, we have (z − z*)ᵀy ≤ 0, which
implies that

yᵀz ≤ yᵀz* = yᵀx + yᵀ(z* − x) = yᵀx − ‖y‖₂².

Since x ∉ S, we have y ≠ 0. It follows that

max_{z∈S} yᵀz = yᵀz* = yᵀx − ‖y‖₂² < yᵀx,

as desired. □
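The separating vector constructed in the proof, y = x − z* with z* = ΠS(x), is directly computable whenever ΠS is. For S = B(0, 1) one has max_{z∈S} yᵀz = ‖y‖₂ (attained at z = y/‖y‖₂), so the strict inequality of Theorem 6 can be checked concretely. This is a sketch of ours; project_ball is the standard closed-form ball projection, not a routine from the handout.

```python
import math

def project_ball(x, r=1.0):
    # Euclidean projection onto B(0, r)
    norm = math.sqrt(sum(t * t for t in x))
    return tuple(x) if norm <= r else tuple(r * t / norm for t in x)

x = (3.0, 4.0)                                   # a point outside B(0, 1)
z_star = project_ball(x)
y = tuple(a - b for a, b in zip(x, z_star))      # y = x - z*, as in the proof

# for the unit ball, max_{z in S} y^T z = ||y||_2
max_over_S = math.sqrt(sum(t * t for t in y))
y_dot_x = sum(a * b for a, b in zip(y, x))
strictly_separated = max_over_S < y_dot_x
```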
To demonstrate the power of Theorem 6, we shall use it to establish three useful results. The
first states that every closed convex set can be represented as an intersection of halfspaces that
contain it.

Theorem 7 A closed convex set S ⊆ Rn is the intersection of all the halfspaces containing S.

Proof We may assume that ∅ ⊊ S ⊊ Rn , for otherwise the theorem is trivial. Let x ∈ Rn \ S
be arbitrary. Then, by Theorem 6, there exist y ∈ Rn and c = max_{z∈S} yᵀz ∈ R such that the
halfspace H − (y, c) = {z ∈ Rn : yᵀz ≤ c} contains S but not x. It follows that the intersection of
all the halfspaces containing S is precisely S itself. □
Theorem 7 suggests that at any boundary point x of a closed convex set S, there should be a
hyperplane supporting S at x. To formalize this intuition, we need some definitions.

Definition 7 Let S ⊆ Rn be non–empty. We say that the hyperplane H(s, c)

1. supports S if S is contained in one of the halfspaces defined by H(s, c), say, S ⊆ H − (s, c);

2. supports S at x ∈ S if H(s, c) supports S and x ∈ H(s, c);

3. supports S non–trivially if H(s, c) supports S and S ⊄ H(s, c).

We are now ready to establish the second result, which concerns the existence of supporting hyper-
planes to convex sets.

Theorem 8 Let S ⊆ Rn be non–empty, closed, and convex. Then, for any x ∈ rel bd(S), there
exists a hyperplane supporting S non–trivially at x.

Proof Let x ∈ rel bd(S) be fixed. Set S′ = S − {x} and V = aff(S) − {x}. Then, S′ is a non–empty
closed convex set in the Euclidean space V with 0 ∈ rel bd(S′). Moreover, since rel bd(S) ≠ ∅, we
have S ≠ aff(S), which implies that V ≠ S′. Now, consider a sequence {xk }_{k≥1} in V \ S′ such that
xk → 0. By Theorem 6, for k = 1, 2, . . ., there exists a yk ∈ V such that ‖yk ‖₂ = 1 and ykᵀw < ykᵀxk
for all w ∈ S′. The latter is equivalent to

ykᵀ(z − x) < ykᵀxk for all z ∈ S.   (5)

Since the sequence {yk }_{k≥1} is bounded, by considering a subsequence if necessary, we may assume
that yk → y with ‖y‖₂ = 1. It follows from (5) that yᵀz ≤ yᵀx for all z ∈ S; i.e., S ⊆ H − (y, yᵀx).
In addition, if S ⊆ H(y, yᵀx), or equivalently, yᵀz = yᵀx for all z ∈ S, then yᵀz = yᵀx for all
z ∈ aff(S). This implies that y ∈ V⊥, which is impossible because y ∈ V and y ≠ 0. Hence, we have
S ⊄ H(y, yᵀx). Consequently, we deduce that the hyperplane H(y, yᵀx) supports S non–trivially
at x, as desired. □
Armed with Theorem 8, we can complete the proof of Minkowski’s theorem (Theorem 2).
Proof of Minkowski’s Theorem We proceed by induction on dim(S). The base case, which
corresponds to dim(S) = 0, is trivial, because in this case S consists of a single point, which by
definition is an extreme point of S. Now, suppose that dim(S) = k ≥ 1 and let x ∈ S be arbitrary.
We consider two cases:

Case 1: x ∈ rel bd(S). By Theorem 8, we can find a hyperplane H supporting S at x. Observe that
H ∩ S is a compact convex set whose dimension is at most k − 1. Thus, by the inductive hypothesis,
x ∈ H ∩ S can be expressed as a convex combination of extreme points of H ∩ S. Now, it remains to
show that the extreme points of H ∩S are also the extreme points of S. Towards that end, it suffices
to show that H ∩ S is a face of S and then invoke Proposition 6. Let x1 , x2 ∈ S and α ∈ (0, 1) be
such that αx1 + (1 − α)x2 ∈ H ∩ S. Suppose that H takes the form H = {z ∈ Rn : sᵀz = sᵀx} for
some s ∈ Rn \ {0}. Since H supports S, we may assume without loss of generality that sᵀx1 ≤ sᵀx
and sᵀx2 ≤ sᵀx. This, together with the fact that αx1 + (1 − α)x2 ∈ H, implies that x1 , x2 ∈ H.
It follows that [x1 , x2 ] ⊆ H ∩ S, as desired.
Case 2: x ∈ rel int(S). Since dim(S) ≥ 1, there exists an x0 ∈ S such that x0 6= x. Then,
Proposition 7 and the compactness of S imply that the line joining x and x0 intersects rel bd(S) at
two points, say, y, z ∈ rel bd(S). By the result in Case 1, both y and z can be expressed as a convex
combination of the extreme points of S. Since x is a convex combination of y and z, it follows that
x can also be expressed as a convex combination of the extreme points of S. □
To motivate the third result, observe that Theorem 6 is a result on point–set separation. A
natural question is whether we can derive an analogous result for set–set separation. In other
words, given two non–empty and non–intersecting convex sets, is it possible to separate them using
a hyperplane? As it turns out, under suitable conditions, we can reduce this question to that
of point–set separation. Consequently, we can apply Theorem 6 to obtain the desired separation
result.
Theorem 9 Let S1, S2 ⊆ Rn be non–empty, closed, and convex with S1 ∩ S2 = ∅. Furthermore,
suppose that S2 is bounded. Then, there exists a y ∈ Rn such that

    max_{z∈S1} y^T z < min_{u∈S2} y^T u.
Proof First, note that the set S1 − S2 = {z − u ∈ Rn : z ∈ S1, u ∈ S2} is non–empty and convex.
Moreover, we claim that it is closed. To see this, let x1, x2, . . . be a sequence in S1 − S2 such that
xk → x. We need to show that x ∈ S1 − S2. Since xk ∈ S1 − S2, there exist zk ∈ S1 and uk ∈ S2
such that xk = zk − uk for k = 1, 2, . . .. Since S2 is compact, there exists a subsequence {uki} such
that uki → u ∈ S2. Since xki → x, we conclude that zki → x + u. Since S1 is closed, we conclude
that x + u ∈ S1. It then follows that x = (x + u) − u ∈ S1 − S2, as desired.
We are now in a position to apply Theorem 6 to the non–empty closed convex set S1 − S2. Indeed,
since S1 ∩ S2 = ∅, we see that 0 ∉ S1 − S2. By Theorem 6, there exist y ∈ Rn, z* ∈ S1, and u* ∈ S2
such that

    y^T(z* − u*) = max_{v∈S1−S2} y^T v < 0.
Since S2 is compact, we have y^T u* = min_{u∈S2} y^T u. This implies that y^T z* = max_{z∈S1} y^T z. Hence,
we obtain

    max_{z∈S1} y^T z < min_{u∈S2} y^T u,

as desired. ⊓⊔
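To see Theorem 9 in action, one can take two disjoint Euclidean balls, for which both optimal values have closed forms (the maximum of a linear function y^T z over a ball B(c, r) is y^T c + r||y||). The following is a small numerical sketch with illustrative centres and radii:

```python
import numpy as np

# Strict separation of two disjoint closed balls (illustrative data):
# S1 = B(c1, r1), S2 = B(c2, r2) with ||c1 - c2|| > r1 + r2.
c1, r1 = np.array([0.0, 0.0]), 1.0
c2, r2 = np.array([4.0, 0.0]), 1.5

y = c2 - c1                          # a natural candidate separating direction
norm_y = np.linalg.norm(y)

# Closed forms: max over B(c, r) of y^T z is y^T c + r ||y||, and similarly for min.
max_over_S1 = y @ c1 + r1 * norm_y   # = 4.0
min_over_S2 = y @ c2 - r2 * norm_y   # = 10.0

assert max_over_S1 < min_over_S2     # the strict inequality of Theorem 9
```

Any positive multiple of y works equally well, since the strict inequality is scale-invariant.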

2 Convex Functions
2.1 Basic Definitions and Properties
Let us now turn to the notion of a convex function.

Definition 8 Let f : Rn → R ∪ {+∞} be an extended real–valued function that is not identically
+∞.

1. We say that f is convex if

f (αx1 + (1 − α)x2 ) ≤ αf (x1 ) + (1 − α)f (x2 ) (6)

for all x1 , x2 ∈ Rn and α ∈ [0, 1]. We say that f is concave if −f is convex.

2. The epigraph of f is the set epi(f ) = {(x, t) ∈ Rn × R : f (x) ≤ t}.

3. The effective domain of f is the set dom(f ) = {x ∈ Rn : f (x) < +∞}.

A major advantage of allowing f to take the value +∞ is that we do not need to explicitly specify
the domain of f . Indeed, if the function f is defined on a set S ⊆ Rn , then we may simply extend
f to Rn by setting f(x) = +∞ for all x ∉ S and obtain dom(f) = S.
The relationship between convex sets and convex functions is explained in the following propo-
sition, whose proof is left as an exercise to the reader.

Proposition 9 Let f : Rn → R ∪ {+∞} be as in Definition 8. Then, f is convex (as a function)
iff epi(f) is convex (as a set).

A simple consequence of Proposition 9 is that if f is convex, then dom(f ) is convex. This follows
from Proposition 4 and the observation that dom(f ) is the image of the convex set epi(f ) under a
projection (which is a linear mapping). Another useful consequence of Proposition 9 is the following
inequality.

Corollary 2 (Jensen’s Inequality) Let f : Rn → R ∪ {+∞} be as in Definition 8. Then, f is
convex iff

    f( Σ_{i=1}^k αi xi ) ≤ Σ_{i=1}^k αi f(xi)

for any x1, . . . , xk ∈ Rn and α1, . . . , αk ∈ [0, 1] such that Σ_{i=1}^k αi = 1.

Proof The “if” part is clear. To prove the “only if” part, we may assume without loss of generality
that x1 , . . . , xk ∈ dom(f ). Then, we have (xi , f (xi )) ∈ epi(f ) for i = 1, . . . , k. Since epi(f ) is convex
by Proposition 9, we may invoke Proposition 2 to conclude that
    Σ_{i=1}^k αi (xi, f(xi)) = ( Σ_{i=1}^k αi xi , Σ_{i=1}^k αi f(xi) ) ∈ epi(f).

However, this is equivalent to

    f( Σ_{i=1}^k αi xi ) ≤ Σ_{i=1}^k αi f(xi),

which completes the proof. ⊓⊔
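As a quick numerical illustration of Jensen's inequality, one can take, say, the convex function f(x) = exp(x); the points and weights below are arbitrary illustrative choices:

```python
import numpy as np

# Jensen's inequality for the convex function f(x) = exp(x):
# f(sum_i alpha_i x_i) <= sum_i alpha_i f(x_i).
f = np.exp
x = np.array([-1.0, 0.5, 2.0])       # illustrative points
alpha = np.array([0.2, 0.5, 0.3])    # nonnegative weights summing to 1

lhs = f(alpha @ x)                   # f of the convex combination
rhs = alpha @ f(x)                   # the convex combination of the values
assert lhs <= rhs
```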
The epigraph epi(f ) of f is closely related to, but not the same as, the t–level set Lt (f ) of f ,
where Lt (f ) = {x ∈ Rn : f (x) ≤ t} and t ∈ R is arbitrary. One obvious difference is that epi(f ) is
a subset of Rn × R, but Lt (f ) is a subset of Rn . However, there is another, more subtle, difference:
Even if Lt(f) is convex for all t ∈ R, the function f may not be convex. This can be seen, e.g.,
from the function x ↦ x^3. A function whose domain is convex and whose t–level sets are convex
for all t ∈ R is called a quasi–convex function. The class of quasi–convex functions possesses nice
properties and is important in its own right. We refer the interested reader to [5, 8] for details.
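The cubic mentioned above can be checked directly: its level sets are intervals (hence convex), yet the convexity inequality (6) fails. A small sketch:

```python
# x |-> x^3 is quasi-convex but not convex.  Each level set
# L_t = {x : x^3 <= t} = (-inf, t^(1/3)] is an interval, hence convex,
# yet inequality (6) fails at, e.g., x1 = -2, x2 = 0, alpha = 1/2:
f = lambda x: x**3
x1, x2, a = -2.0, 0.0, 0.5
midpoint_value = f(a * x1 + (1 - a) * x2)   # f(-1) = -1.0
chord_value = a * f(x1) + (1 - a) * f(x2)   # 0.5*(-8) + 0.5*0 = -4.0
assert midpoint_value > chord_value          # (6) is violated, so f is not convex
```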
Now, let f : Rn → R ∪ {+∞} be a convex function, so that epi(f ) is a non–empty convex set
by Proposition 9. Suppose in addition that epi(f ) is closed. By Theorem 8, we know that epi(f )
can be represented as an intersection of all the halfspaces containing it. Such halfspaces take the
form H^−((y, y0), c) = {(x, t) ∈ Rn × R : y^T x + y0 t ≤ c} for some y ∈ Rn and y0, c ∈ R. In fact, it
suffices to use only those halfspaces with y0 = −1 in the representation. To see this, suppose that
epi(f) ⊆ H^−((y, y0), c) and consider the following cases:
Case 1: y0 ≠ 0. Then, we must have y0 < 0 (since we can fix x ∈ dom(f) and take t ≥ f(x) to be
arbitrarily large), in which case we may assume without loss of generality that y0 = −1 (by scaling
y and c).
Case 2: y0 = 0. We claim that there exists another halfspace H^−((ȳ, −1), c̄) such that epi(f) ⊆
H^−((ȳ, −1), c̄). Indeed, if this is not the case, then every halfspace containing epi(f) is of the form
H^−((y, 0), c) for some y ∈ Rn and c ∈ R. This implies that f = −∞, which is impossible.
Next, we show that for every (x̄, t̄) ∉ H^−((y, 0), c), there exists a halfspace H^−((y′, −1), c′)
with y′ ∈ Rn, c′ ∈ R such that epi(f) ⊆ H^−((y′, −1), c′) and (x̄, t̄) ∉ H^−((y′, −1), c′). This
would imply that halfspaces of the form H^−((y, 0), c) with y ∈ Rn and c ∈ R are not needed in the
representation of epi(f). To begin, observe that for any (x, t) ∈ epi(f), we have (x, t) ∈ H^−((y, 0), c)
and (x, t) ∈ H^−((ȳ, −1), c̄), which means that y^T x − c ≤ 0 and ȳ^T x − c̄ ≤ t. It follows that for any
λ ≥ 0,

    λ(y^T x − c) + ȳ^T x − c̄ ≤ t.

Moreover, since y^T x̄ − c > 0, for sufficiently large λ ≥ 0, we have λ(y^T x̄ − c) + ȳ^T x̄ − c̄ > t̄.
Thus, by setting y′ = λy + ȳ and c′ = λc + c̄, we conclude that epi(f) ⊆ H^−((y′, −1), c′) and
(x̄, t̄) ∉ H^−((y′, −1), c′), as desired.
It is not hard to see that the halfspace H^−((y, −1), c) is the epigraph of the affine function
h : Rn → R given by h(x) = y^T x − c. Moreover, if epi(f) ⊆ H^−((y, −1), c), then h(x) ≤ f(x)
for all x ∈ Rn. (This is obvious for x ∉ dom(f). For x ∈ dom(f), note that (x, f(x)) ∈ epi(f) ⊆
H^−((y, −1), c) implies y^T x − f(x) ≤ c, or equivalently, h(x) ≤ f(x).) Since the intersection of
halfspaces yields the epigraph of the pointwise supremum of the affine functions induced by those
halfspaces, we obtain the following theorem:

Theorem 10 Let f : Rn → R ∪ {+∞} be a convex function such that epi(f ) is closed. Then, f
can be represented as the pointwise supremum of all affine functions h : Rn → R satisfying h ≤ f .

Motivated by Theorem 10, given a convex function f : Rn → R ∪ {+∞}, we consider the set

    Sf = { (y, c) ∈ Rn × R : y^T x − c ≤ f(x) for all x ∈ Rn },

which consists of the coefficients of those affine functions h : Rn → R satisfying h ≤ f. Clearly,
we have y^T x − c ≤ f(x) for all x ∈ Rn iff sup_{x∈Rn} ( y^T x − f(x) ) ≤ c. This shows that Sf is the
epigraph of the function f* : Rn → R ∪ {+∞} given by

    f*(y) = sup_{x∈Rn} ( y^T x − f(x) ).

Moreover, observe that Sf is closed and convex. This implies that f* is convex. The function f*
is called the conjugate of f. It plays a very important role in convex analysis and optimization.
We shall study it in greater detail later.
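As a concrete instance, for the one-dimensional quadratic f(x) = x^2/2 the conjugate works out to f*(y) = y^2/2 (the supremum is attained at x = y). A hedged numerical sketch, approximating the supremum over a fine grid:

```python
import numpy as np

# Conjugate of f(x) = x^2/2: f*(y) = sup_x (y*x - f(x)) = y^2/2.
f = lambda x: 0.5 * x**2
xs = np.linspace(-10.0, 10.0, 200001)   # fine grid standing in for the sup over R

def conjugate(y):
    # grid approximation of sup_x (y*x - f(x))
    return float(np.max(y * xs - f(xs)))

for y in (-2.0, 0.0, 1.5):
    assert abs(conjugate(y) - 0.5 * y**2) < 1e-6
```

The grid approximation is accurate here because the maximizer x = y lies well inside the grid; for conjugates of functions with unbounded maximizers, a grid check of this kind would not be adequate.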

2.2 Convexity–Preserving Transformations
As in the case of convex sets, it is sometimes difficult to check directly from the definition whether a
given function is convex or not. In this sub–section we describe some transformations that preserve
convexity.

Theorem 11 The following hold:

(a) (Non–Negative Combinations) Let f1, . . . , fm : Rn → R ∪ {+∞} be convex functions
    satisfying ∩_{i=1}^m dom(fi) ≠ ∅. Then, for any α1, . . . , αm ≥ 0, the function f : Rn → R ∪ {+∞}
    defined by

        f(x) = Σ_{i=1}^m αi fi(x)

    is convex.

(b) (Pointwise Supremum) Let I be an index set and {fi }i∈I , where fi : Rn → R∪{+∞} for all
i ∈ I, be a family of convex functions. Define the pointwise supremum f : Rn → R∪{+∞}
of {fi }i∈I by
    f(x) = sup_{i∈I} fi(x).

Suppose that dom(f ) 6= ∅. Then, the function f is convex.

(c) (Affine Composition) Let g : Rn → R ∪ {+∞} be a convex function and A : Rm → Rn be an
    affine mapping. Suppose that range(A) ∩ dom(g) ≠ ∅. Then, the function f : Rm → R ∪ {+∞}
    defined by f(x) = g(A(x)) is convex.

(d) (Composition with an Increasing Convex Function) Let g : Rn → R ∪ {+∞} and
    h : R → R ∪ {+∞} be convex functions that are not identically +∞. Suppose that h is
    increasing on dom(h). Define the function f : Rn → R ∪ {+∞} by f(x) = h(g(x)), with the
    convention that h(+∞) = +∞. Suppose that dom(f) ≠ ∅. Then, the function f is convex.

(e) (Restriction on Lines) Given a function f : Rn → R ∪ {+∞} that is not identically +∞,
    a point x0 ∈ Rn, and a direction h ∈ Rn, define the function f̃x0,h : R → R ∪ {+∞} by
    f̃x0,h(t) = f(x0 + th). Then, the function f is convex iff the function f̃x0,h is convex for any
    x0 ∈ Rn and h ∈ Rn.

The above results can be derived directly from the definition. We shall prove (e) and leave the rest
as exercises to the reader.

Proof of Theorem 11(e) Suppose that f is convex. Let x0, h ∈ Rn be arbitrary. Then, for any
t1, t2 ∈ R and α ∈ [0, 1], we have

    f̃x0,h(αt1 + (1 − α)t2) = f(x0 + (αt1 + (1 − α)t2)h)
                            = f(α(x0 + t1h) + (1 − α)(x0 + t2h))
                            ≤ αf(x0 + t1h) + (1 − α)f(x0 + t2h)
                            = αf̃x0,h(t1) + (1 − α)f̃x0,h(t2);

i.e., f̃x0,h is convex. Conversely, suppose that f̃x0,h is convex for any x0, h ∈ Rn. Let x1, x2 ∈ Rn
and α ∈ [0, 1]. Upon setting x0 = x1 ∈ Rn and h = x2 − x1 ∈ Rn, we have

    f((1 − α)x1 + αx2) = f̃x0,h(α)
                       = f̃x0,h(α · 1 + (1 − α) · 0)
                       ≤ αf̃x0,h(1) + (1 − α)f̃x0,h(0)
                       = αf(x2) + (1 − α)f(x1);

i.e., f is convex. This completes the proof. ⊓⊔
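In practice, part (e) gives a convenient spot-check: sample a few lines and test the one-dimensional restriction numerically. A sketch for f(x) = ||x||_2^2, whose restrictions are one-dimensional quadratics:

```python
import numpy as np

# Restriction-on-lines check for f(x) = ||x||_2^2: each restriction
# t |-> f(x0 + t*h) should satisfy the midpoint convexity inequality.
rng = np.random.default_rng(0)
f = lambda x: float(x @ x)

for _ in range(5):
    x0, h = rng.normal(size=2), rng.normal(size=2)   # a random line x0 + t*h
    g = lambda t: f(x0 + t * h)                      # the restriction f~_{x0,h}
    t1, t2 = rng.normal(), rng.normal()
    assert g(0.5 * (t1 + t2)) <= 0.5 * g(t1) + 0.5 * g(t2) + 1e-9
```

A check of this kind can refute convexity (by finding a violating line) but, of course, finitely many samples can never prove it.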

2.3 Differentiable Convex Functions
When a given function is differentiable, it is possible to characterize its convexity via its derivatives.
We begin with the following result, which makes use of the gradient of the given function.
Theorem 12 Let f : Ω → R be a differentiable function on the open set Ω ⊆ Rn and S ⊆ Ω be a
convex set. Then, f is convex on S (i.e., the inequality (6) holds for all x1 , x2 ∈ S and α ∈ [0, 1])
iff
    f(x) ≥ f(x̄) + (∇f(x̄))^T (x − x̄)
for all x, x̄ ∈ S.
To appreciate the geometric content of the above theorem, observe that x ↦ f(x̄) + (∇f(x̄))^T (x − x̄)
is an affine function whose level sets are hyperplanes with normal ∇f(x̄) and which takes the value f(x̄)
at x̄. Thus, Theorem 12 stipulates that at every x̄ ∈ S, the function f is minorized by an affine
function that coincides with f at x̄.
Proof Suppose that f is convex on S. Let x, x̄ ∈ S and α ∈ (0, 1). Then, we have

    f(x) ≥ [f(αx + (1 − α)x̄) − (1 − α)f(x̄)]/α = f(x̄) + [f(x̄ + α(x − x̄)) − f(x̄)]/α.    (7)
Now, recall that
    lim_{α↓0} [f(x̄ + α(x − x̄)) − f(x̄)]/α

is the directional derivative of f at x̄ in the direction x − x̄ and is equal to (∇f(x̄))^T (x − x̄) (see
Section 3.2.2 of Handout C). Hence, upon letting α ↓ 0 in (7), we have

    f(x) ≥ f(x̄) + (∇f(x̄))^T (x − x̄),

as desired.
Conversely, let x1 , x2 ∈ S and α ∈ (0, 1). Then, we have x̄ = αx1 + (1 − α)x2 ∈ S, which implies
that

    f(x1) ≥ f(x̄) + (1 − α)(∇f(x̄))^T (x1 − x2),    (8)

    f(x2) ≥ f(x̄) + α(∇f(x̄))^T (x2 − x1).    (9)

Upon multiplying (8) by α and (9) by 1 − α and summing, we obtain the desired result. ⊓⊔
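The gradient inequality of Theorem 12 is easy to spot-check numerically for a convex quadratic, e.g. f(x) = x^T A x with A positive semidefinite; the matrix below is an illustrative choice:

```python
import numpy as np

# For f(x) = x^T A x with A PSD: f(x) >= f(xbar) + grad f(xbar)^T (x - xbar),
# the gap being exactly (x - xbar)^T A (x - xbar) >= 0.
A = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3, so A is PSD
f = lambda x: float(x @ A @ x)
grad = lambda x: 2.0 * A @ x

rng = np.random.default_rng(1)
for _ in range(100):
    x, xbar = rng.normal(size=2), rng.normal(size=2)
    assert f(x) >= f(xbar) + grad(xbar) @ (x - xbar) - 1e-9
```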
In the case where f is twice continuously differentiable, we have the following characterization:
Theorem 13 Let f : S → R be a twice continuously differentiable function on the open convex set
S ⊆ Rn. Then, f is convex on S iff ∇²f(x) ⪰ 0 for all x ∈ S.
Proof Suppose that ∇²f(x) ⪰ 0 for all x ∈ S. Let x1, x2 ∈ S. By Taylor's theorem, there exists
an x̄ ∈ [x1, x2] ⊆ S such that

    f(x2) = f(x1) + (∇f(x1))^T (x2 − x1) + (1/2)(x2 − x1)^T ∇²f(x̄)(x2 − x1).    (10)

Since ∇²f(x̄) ⪰ 0, we have (x2 − x1)^T ∇²f(x̄)(x2 − x1) ≥ 0. Upon substituting this inequality
into (10) and invoking Theorem 12, we conclude that f is convex on S.
Conversely, suppose that ∇²f(x̄) ⋡ 0 for some x̄ ∈ S. Then, there exists a v ∈ Rn such
that v^T ∇²f(x̄)v < 0. Since S is open and ∇²f is continuous, there exists an ε > 0 such that
x̄′ = x̄ + εv ∈ S and v^T ∇²f(x̄ + α(x̄′ − x̄))v < 0 for all α ∈ [0, 1]. This implies (by taking x1 = x̄
and x2 = x̄′ in (10)) that f(x̄′) < f(x̄) + (∇f(x̄))^T (x̄′ − x̄). Hence, by Theorem 12, we conclude
that f is not convex on S. This completes the proof. ⊓⊔
We point out that Theorem 13 only applies to functions f that are twice continuously differentiable
on an open convex set S. This should be contrasted with Theorem 12, where the function f need
only be differentiable on a convex subset S′ of an open set. In particular, the set S′ need not
be open. To see why S must be open in Theorem 13, consider the function f : R2 → R given by
f(x, y) = x^2 − y^2. This function is convex on the set S = R × {0}. However, its Hessian, which is
given by

    ∇²f(x, y) = [ 2  0 ; 0  −2 ]  for (x, y) ∈ R2,

is nowhere positive semidefinite.
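This counterexample is easy to verify numerically: the Hessian has a negative eigenvalue everywhere, while the restriction of f to S = R × {0} is just x ↦ x^2, which is convex. A sketch:

```python
import numpy as np

# Hessian of f(x, y) = x^2 - y^2 (constant in (x, y)): not PSD anywhere.
H = np.array([[2.0, 0.0], [0.0, -2.0]])
assert np.linalg.eigvalsh(H).min() < 0

# Yet on S = R x {0}, f reduces to x |-> x^2, which is convex:
f_on_S = lambda x: x**2
x1, x2 = -3.0, 5.0
assert f_on_S(0.5 * (x1 + x2)) <= 0.5 * f_on_S(x1) + 0.5 * f_on_S(x2)
```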

2.4 Establishing Convexity of Functions
Armed with the tools developed in previous sub–sections, we are already able to establish the
convexity of many functions. Here are some examples.
Example 6 (Some Examples of Convex Functions)
1. Let f : Rn × Sn++ → R be given by f(x, Y) = x^T Y^{−1} x. We compute

       epi(f) = {(x, Y, r) ∈ Rn × Sn++ × R : Y ≻ 0, x^T Y^{−1} x ≤ r}
              = {(x, Y, r) ∈ Rn × Sn++ × R : [ Y  x ; x^T  r ] ⪰ 0, Y ≻ 0},

   where the last equality follows from the Schur complement (see, e.g., [2, Section A.5.5]). This
   shows that epi(f) is a convex set, which implies that f is convex on Rn × Sn++.
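The Schur-complement equivalence invoked in the first item can be spot-checked numerically; the 2 × 2 data below is an illustrative choice with x^T Y^{−1} x = 1.5:

```python
import numpy as np

Y = np.array([[2.0, 0.0], [0.0, 1.0]])   # Y is positive definite
x = np.array([1.0, 1.0])                 # x^T Y^{-1} x = 1/2 + 1 = 1.5

def block_is_psd(r):
    """Test whether the block matrix [[Y, x], [x^T, r]] is PSD."""
    B = np.block([[Y, x[:, None]], [x[None, :], np.array([[r]])]])
    return bool(np.linalg.eigvalsh(B).min() >= -1e-10)

quad = float(x @ np.linalg.solve(Y, x))     # x^T Y^{-1} x
assert block_is_psd(2.0) == (quad <= 2.0)   # both hold  (1.5 <= 2.0)
assert block_is_psd(1.0) == (quad <= 1.0)   # both fail  (1.5 >  1.0)
```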

2. Let f : Rm×n → R+ be given by f(X) = ||X||2, where || · ||2 denotes the spectral norm or
   largest singular value of the m × n matrix X. It is well known that (see, e.g., [6])

       f(X) = sup { u^T X v : ||u||2 = 1, ||v||2 = 1 }.

   This shows that f is a pointwise supremum of a family of linear functions of X. Hence, f is
   convex.
3. Let || · || : Rn → R+ be a norm on Rn and f : Rn → R+ be given by f(x) = ||x||^p, where p ≥ 1.
   Then, for any x ∈ Rn, we have f(x) = g(||x||), where g : R+ → R+ is given by g(z) = z^p.
   Clearly, g is increasing on R+ and convex on R++ (e.g., by verifying g″(z) ≥ 0 for all z > 0).
   To show that g is convex on R+, it remains to verify that for any z ∈ R+ and α ∈ [0, 1],

       g(α · 0 + (1 − α)z) = (1 − α)^p z^p ≤ (1 − α)z^p = αg(0) + (1 − α)g(z).

   Hence, by Theorem 11(d), we conclude that f is convex.
4. Let f : Rn → R be given by f(x) = log( Σ_{i=1}^n exp(xi) ). We compute

       ∂²f/∂xi∂xj = exp(xi)/(Σ_{k=1}^n exp(xk)) − exp(2xi)/(Σ_{k=1}^n exp(xk))^2   if i = j,
       ∂²f/∂xi∂xj = −exp(xi + xj)/(Σ_{k=1}^n exp(xk))^2                            if i ≠ j.

   This gives

       ∇²f(x) = (1/(e^T z)^2) [ (e^T z) diag(z) − z z^T ],

   where z = (exp(x1), . . . , exp(xn)). Now, for any v ∈ Rn, we have

       v^T ∇²f(x) v = (1/(e^T z)^2) [ (Σ_{i=1}^n zi)(Σ_{i=1}^n zi vi^2) − (Σ_{i=1}^n zi vi)^2 ]

                    = (1/(e^T z)^2) [ (Σ_{i=1}^n (√zi)^2)(Σ_{i=1}^n (√zi vi)^2) − (Σ_{i=1}^n √zi · (√zi vi))^2 ]

                    ≥ 0

   by the Cauchy–Schwarz inequality. Hence, f is convex.
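The Hessian formula for log-sum-exp can be verified to be positive semidefinite at a sample point; the point below is an illustrative choice:

```python
import numpy as np

# Hessian of f(x) = log(sum_i exp(x_i)) at a sample point:
# H = (1/(e^T z)^2) * ((e^T z) diag(z) - z z^T), where z_i = exp(x_i).
x = np.array([0.3, -1.2, 2.0])
z = np.exp(x)
s = z.sum()                                    # e^T z
H = (s * np.diag(z) - np.outer(z, z)) / s**2

assert np.linalg.eigvalsh(H).min() >= -1e-12   # PSD, up to rounding
```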
5. ([2, Chapter 3, Exercise 3.17]) Suppose that p ∈ (0, 1). Let f : Rn++ → R be given by
   f(x) = (Σ_{i=1}^n xi^p)^{1/p}. We compute

       ∂²f/∂xi∂xj = (1 − p)(Σ_{k=1}^n xk^p)^{1/p−2} [ xi^{2(p−1)} − (Σ_{k=1}^n xk^p) xi^{p−2} ]   if i = j,
       ∂²f/∂xi∂xj = (1 − p)(Σ_{k=1}^n xk^p)^{1/p−2} xi^{p−1} xj^{p−1}                            if i ≠ j.

   This gives

       ∇²f(x) = (1 − p)(Σ_{k=1}^n xk^p)^{1/p−2} [ z z^T − (Σ_{k=1}^n xk^p) diag(x1^{p−2}, . . . , xn^{p−2}) ],

   where zi = xi^{p−1} for i = 1, . . . , n. Now, for any v ∈ Rn, we have

       v^T ∇²f(x) v = (1 − p)(Σ_{k=1}^n xk^p)^{1/p−2} [ (Σ_{i=1}^n vi xi^{p−1})^2 − (Σ_{i=1}^n xi^p)(Σ_{i=1}^n xi^{p−2} vi^2) ] ≤ 0,

   since

       (Σ_{i=1}^n vi xi^{p−1})^2 = (Σ_{i=1}^n (vi xi^{(p−2)/2}) · xi^{p/2})^2 ≤ (Σ_{i=1}^n xi^{p−2} vi^2)(Σ_{i=1}^n xi^p)

   by the Cauchy–Schwarz inequality. It follows that f is concave on Rn++.
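Likewise, the Hessian formula for f(x) = (Σ xi^p)^{1/p} with p ∈ (0, 1) can be checked to be negative semidefinite at a sample point of Rn++. The data below is illustrative; note that the largest eigenvalue is exactly 0, since f is positively homogeneous of degree 1 and hence ∇²f(x)x = 0:

```python
import numpy as np

# Hessian of f(x) = (sum_i x_i^p)^(1/p), p in (0,1), from the formula above.
p = 0.5
x = np.array([1.0, 2.0, 0.5])              # a point of R^3_{++}
s = np.sum(x**p)                           # sum_i x_i^p
z = x**(p - 1)                             # z_i = x_i^{p-1}
H = (1 - p) * s**(1/p - 2) * (np.outer(z, z) - s * np.diag(x**(p - 2)))

assert np.linalg.eigvalsh(H).max() <= 1e-9   # NSD, up to rounding
```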
6. Let f : Sn++ → R be given by f(X) = − ln det(X). For those readers who are well versed in
   matrix calculus (see, e.g., [7] for a comprehensive treatment), the following formulas should
   be familiar:

       ∇f(X) = −X^{−1},    ∇²f(X) = X^{−1} ⊗ X^{−1}.

   Here, ⊗ denotes the Kronecker product. Since X^{−1} ≻ 0, it can be shown that X^{−1} ⊗ X^{−1} ≻ 0.
   It follows that f is convex on Sn++.
   Alternatively, we can establish the convexity of f on Sn++ by applying Theorem 11(e). To
   begin, let X0 ∈ Sn++ and H ∈ Sn. Define the set D = {t ∈ R : X0 + tH ≻ 0} = {t ∈ R :
   λmin(X0 + tH) > 0}. Since λmin is continuous (see, e.g., [12, Chapter IV, Theorem 4.11]),
   we see that D is open and convex. Now, consider the function f̃X0,H : D → R given by
   f̃X0,H(t) = f(X0 + tH). For any t ∈ D, we compute

       f̃X0,H(t) = − ln det(X0 + tH)
                 = − ln det( X0^{1/2} (I + t X0^{−1/2} H X0^{−1/2}) X0^{1/2} )
                 = −( Σ_{i=1}^n ln(1 + tλi) + ln det X0 )

   and

       f̃″X0,H(t) = Σ_{i=1}^n λi^2/(1 + tλi)^2 ≥ 0,

   where λ1, . . . , λn are the eigenvalues of X0^{−1/2} H X0^{−1/2}. It follows that f̃X0,H is convex on D.
   This, together with Theorem 11(e), implies that f is convex on Sn++.
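The eigenvalue computation in the last item can be checked numerically: the λi can be obtained as the eigenvalues of X0^{−1} H, which has the same spectrum as X0^{−1/2} H X0^{−1/2}. A sketch with random sample data:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.normal(size=(3, 3))
X0 = M @ M.T + 3.0 * np.eye(3)                     # a point of S^3_{++}
H = rng.normal(size=(3, 3)); H = 0.5 * (H + H.T)   # a symmetric direction

lam = np.linalg.eigvals(np.linalg.solve(X0, H)).real  # the lambda_i above
t = 0.1                                            # small enough that X0 + tH is PD

lhs = -np.log(np.linalg.det(X0 + t * H))           # f~(t) evaluated directly
rhs = -(np.sum(np.log(1.0 + t * lam)) + np.log(np.linalg.det(X0)))
assert abs(lhs - rhs) < 1e-8                       # the eigenvalue formula matches

assert np.sum(lam**2 / (1.0 + t * lam)**2) >= 0.0  # second derivative of f~ is >= 0
```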

2.5 Non–Differentiable Convex Functions
In previous sub–sections we developed techniques to check whether a given differentiable function
is convex or not. In particular, we showed that a differentiable function f : Rn → R is convex iff
at every x̄ ∈ Rn, we have f(x) ≥ f(x̄) + (∇f(x̄))^T (x − x̄). The latter condition has a geometric
interpretation, namely, the epigraph of f is supported by its tangent hyperplane at every (x̄, f(x̄)) ∈
Rn × R. Of course, as can be easily seen from the function x ↦ |x|, not every convex function
is differentiable. However, the above geometric interpretation seems to hold for such functions as
well. In order to make such an observation rigorous, we need to first generalize the notion of a
gradient to that of a subgradient.

Definition 9 Let f : Rn → R ∪ {+∞} be as in Definition 8. A vector s ∈ Rn is called a
subgradient of f at x̄ if

    f(x) ≥ f(x̄) + s^T(x − x̄) for all x ∈ Rn.    (11)

The set of vectors s such that (11) holds is called the subdifferential of f at x̄ and is denoted by
∂f(x̄).

Note that ∂f(x̄) may be non–empty even though f is not differentiable at x̄. For example, consider
the function f : R → R given by f (x) = |x|. Clearly, f is not differentiable at the origin, but it can
be easily verified from the definition that ∂f (0) = [−1, 1].
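The claim ∂f(0) = [−1, 1] for f(x) = |x| can be checked directly against (11): every s ∈ [−1, 1] satisfies |x| ≥ sx for all x, while any |s| > 1 fails. A small sketch over a sample grid:

```python
import numpy as np

xs = np.linspace(-5.0, 5.0, 101)   # sample points standing in for "all x"

# Every s in [-1, 1] is a subgradient of |.| at 0: |x| >= s*x for all x.
for s in np.linspace(-1.0, 1.0, 21):
    assert np.all(np.abs(xs) >= s * xs - 1e-12)

# Any s with |s| > 1 violates the subgradient inequality somewhere:
s_bad = 1.5
assert np.any(np.abs(xs) < s_bad * xs)   # e.g. at x = 1: |1| = 1 < 1.5
```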
In many ways, the subdifferential behaves like a derivative, although one should note that the
former is a set, while the latter is a unique element. Here we state some important properties of
the subdifferential without proof. For details, we refer the reader to [11, Chapter 2, Section 2.5].

Theorem 14 Let f : Rn → R ∪ {+∞} be a convex function that is not identically +∞.

(a) (Subgradient and Directional Derivative) Let

        f′(x, d) = lim_{t↓0} [f(x + td) − f(x)]/t

    be the directional derivative of f at x ∈ Rn in the direction d ∈ Rn \ {0}, and let x ∈
    int dom(f). Then, ∂f(x) is a non–empty compact convex set. Moreover, for any d ∈ Rn,
    we have f′(x, d) = max_{s∈∂f(x)} s^T d.

(b) (Subdifferential of a Differentiable Function) The convex function f is differentiable at
    x ∈ Rn iff the subdifferential ∂f(x) is a singleton, in which case it consists of the gradient of
    f at x.

(c) (Additivity of Subdifferentials) Suppose that f = f1 + f2, where f1 : Rn → R ∪ {+∞}
    and f2 : Rn → R ∪ {+∞} are convex functions that are not identically +∞. Furthermore,
    suppose that there exists an x0 ∈ dom(f) such that f1 is continuous at x0. Then, we have

        ∂f(x) = ∂f1(x) + ∂f2(x) for all x ∈ dom(f).

Theorem 14(a) states that the subdifferential ∂f of a convex function f is non–empty at any
x ∈ int dom(f ). This leads to the natural question of what happens when x ∈ bd dom(f ). The
following example shows that ∂f can be empty at such points.

Example 7 (A Convex Function with Empty Subdifferential at a Point) Let f : Rn →
R ∪ {+∞} be given by

    f(x) = −√(1 − ||x||2^2)  if ||x||2 ≤ 1,   and   f(x) = +∞  otherwise.

It is clear that f is a convex function with dom(f) = B(0, 1) and is differentiable on int dom(f) =
{x ∈ Rn : ||x||2 < 1}. It is also clear that ∂f(x) = ∅ for all x ∉ dom(f). Now, let x̄ ∈ bd dom(f) =
{x ∈ Rn : ||x||2 = 1} be arbitrary. If s ∈ ∂f(x̄), then we must have

    f(x) = −√(1 − ||x||2^2) ≥ s^T(x − x̄)  for all x ∈ B(0, 1).    (12)

Without loss of generality, we may assume that x̄ = e1. For α ∈ [−1, 1], define x(α) = αe1 ∈
B(0, 1). From (12), we see that s ∈ Rn satisfies

    f(x(α)) = −√(1 − α^2) ≥ αs1 − s1  for all α ∈ [−1, 1].

However, this implies that s1 ≥ √((1 + α)/(1 − α)) for all α ∈ [−1, 1), which is impossible. Hence,
we have ∂f(x̄) = ∅ for all x̄ ∈ bd dom(f).

To further illustrate the concepts and results discussed above, let us compute the subdifferential
of the Euclidean norm.

Example 8 (Subdifferential of the Euclidean Norm) Let f : Rn → R be given by f(x) =
||x||2. Note that f is differentiable whenever x ≠ 0. Hence, by Theorem 14(b) we have ∂f(x) =
{∇f(x)} = {x/||x||2} for all x ∈ Rn \ {0}.
Now, recall from Definition 9 that s ∈ Rn is a subgradient of f at 0 iff ||x||2 ≥ s^T x for all
x ∈ Rn. It follows that ∂f(0) = B(0, 1).
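This last computation can also be checked numerically from Definition 9; the sample points and the failing direction below are illustrative choices:

```python
import numpy as np

# Any s with ||s||_2 <= 1 satisfies ||x||_2 >= s^T x for all x (Cauchy-Schwarz),
# so the unit ball B(0, 1) is contained in the subdifferential of ||.||_2 at 0.
rng = np.random.default_rng(3)
for _ in range(100):
    s = rng.normal(size=3)
    s = s / max(1.0, np.linalg.norm(s))    # force s into the unit ball
    x = rng.normal(size=3)
    assert np.linalg.norm(x) >= s @ x - 1e-12

# Conversely, ||s||_2 > 1 fails the subgradient inequality at x = s:
s = np.array([0.0, 0.0, 2.0])
x = s
assert np.linalg.norm(x) < s @ x           # 2 < 4
```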

3 Further Reading
Convex analysis is a rich subject with many deep and beautiful results. For more details, one can
consult the general references listed on the course website and/or the books [3, 13, 9, 5, 1, 11].

References
[1] A. Barvinok. A Course in Convexity, volume 54 of Graduate Studies in Mathematics. American
Mathematical Society, Providence, Rhode Island, 2002.

[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge,
2004. Available online at http://www.stanford.edu/~boyd/cvxbook/.

[3] A. Brøndsted. An Introduction to Convex Polytopes, volume 90 of Graduate Texts in Mathe-
matics. Springer–Verlag, New York, 1983.

[4] C. Ding, D. Sun, and K.-C. Toh. An Introduction to a Class of Matrix Cone Programming.
Mathematical Programming, Series A, 144(1–2):141–179, 2014.

[5] J.-B. Hiriart-Urruty and C. Lemaréchal. Fundamentals of Convex Analysis. Grundlehren Text
Editions. Springer–Verlag, Berlin/Heidelberg, 2001.

[6] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge,
1985.

[7] J. R. Magnus and H. Neudecker. Matrix Differential Calculus with Applications in Statis-
tics and Econometrics. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc.,
Chichester, England, revised edition, 1999.

[8] E. A. Papa Quiroz, L. Mallma Ramirez, and P. R. Oliveira. An Inexact Proximal Method for
Quasiconvex Minimization. European Journal of Operational Research, 246(3):721–729, 2015.

[9] R. T. Rockafellar. Convex Analysis. Princeton Landmarks in Mathematics and Physics.
Princeton University Press, Princeton, New Jersey, 1997.

[10] H. L. Royden. Real Analysis. Macmillan Publishing Company, New York, third edition, 1988.

[11] A. Ruszczyński. Nonlinear Optimization. Princeton University Press, Princeton, New Jersey,
2006.

[12] G. W. Stewart and J. Sun. Matrix Perturbation Theory. Academic Press, Boston, 1990.

[13] G. M. Ziegler. Lectures on Polytopes, volume 152 of Graduate Texts in Mathematics. Springer–
Verlag, New York, revised first edition, 1995.
