As briefly mentioned in Handout 1, the notion of convexity plays a very important role in both
the theoretical and algorithmic aspects of optimization. Before we discuss in depth the relevance
of convexity in optimization, however, let us first introduce the notions of convex sets and convex
functions and study some of their properties.
L = {z ∈ Rn : z = αx + (1 − α)y, α ∈ R}

of all affine combinations of x and y is simply the line determined by x and y, and the set

[x, y] = {z ∈ Rn : z = αx + (1 − α)y, α ∈ [0, 1]}

of all convex combinations of x and y is the line segment between x and y. By convention, the empty set ∅ is affine and hence also convex.
The notion of an affine (resp. convex) combination of two points can be easily generalized to
any finite number of points. In particular, an affine combination of the points x1, . . . , xk ∈ Rn is a
point of the form z = ∑_{i=1}^k αi xi, where ∑_{i=1}^k αi = 1. Similarly, a convex combination of the points
x1, . . . , xk ∈ Rn is a point of the form z = ∑_{i=1}^k αi xi, where ∑_{i=1}^k αi = 1 and α1, . . . , αk ≥ 0.
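The distinction between affine and convex combinations can be checked mechanically. The following Python sketch (the helper names are ours, not part of the notes) forms combinations and tests the two coefficient conditions:

```python
import numpy as np

def combine(points, coeffs):
    """Form the combination sum_i coeffs[i] * points[i]."""
    return np.asarray(coeffs, dtype=float) @ np.asarray(points, dtype=float)

def is_affine_combination(coeffs, tol=1e-12):
    """Coefficients of an affine combination must sum to 1."""
    return abs(np.sum(coeffs) - 1.0) <= tol

def is_convex_combination(coeffs, tol=1e-12):
    """A convex combination additionally requires nonnegative coefficients."""
    return is_affine_combination(coeffs, tol) and bool(np.all(np.asarray(coeffs) >= -tol))

pts = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
z = combine(pts, [0.2, 0.3, 0.5])      # convex combination: lies inside the triangle
w = combine(pts, [-0.5, 0.75, 0.75])   # affine but not convex combination
```

Here z = 0.2(0, 0) + 0.3(2, 0) + 0.5(0, 2) = (0.6, 1.0), while w uses a negative coefficient and so is only an affine combination.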
(a) S is affine.
(b) S contains all affine combinations of its points.
(c) S is the translation of some linear subspace V ⊆ Rn ; i.e., S is of the form {x} + V = {x + v ∈
Rn : v ∈ V } for some x ∈ Rn .
Proof We first establish (a)⇒(b) by induction on the number of points k ≥ 1 in S used to form
the affine combination. The case where k = 1 is trivial, and the case where k = 2 follows from the
assumption that S is affine. For the inductive step, let k ≥ 3 and suppose that x = ∑_{i=1}^k αi xi,
where x1, . . . , xk ∈ S and ∑_{i=1}^k αi = 1. If αi = 0 for some i ∈ {1, . . . , k}, then x is an affine
combination of k − 1 points in S and hence x ∈ S by the inductive hypothesis. On the other hand,
if αi ≠ 0 for all i ∈ {1, . . . , k}, then we may assume without loss of generality that α1 ≠ 1 and
write
x = α1 x1 + (1 − α1) ∑_{i=2}^k (αi/(1 − α1)) xi .

Since ∑_{i=2}^k (αi/(1 − α1)) = 1, the point x̄ = ∑_{i=2}^k (αi/(1 − α1)) xi is an affine combination of k − 1
points in S. Hence, we have x̄ ∈ S by the inductive hypothesis. Since S is affine by assumption
and x is an affine combination of x1 ∈ S and x̄ ∈ S, we conclude that x ∈ S, as desired.
Next, we establish (b)⇒(c). Let x ∈ S and set V = S − {x}. Our goal is to show that V ⊆ Rn
is a linear subspace. Towards that end, let v1 , v2 ∈ V and α, β ∈ R. By definition of V , there exist
z1 , z2 ∈ S such that v1 = z1 − x and v2 = z2 − x. Using the fact that z(t) = tz1 + (1 − t)z2 ∈ S for
any t ∈ R, we have
(a) S is convex.
2. Hyperplane: H(s, c) = {x ∈ Rn : sT x = c}
positive definite matrix (i.e., xT Qx > 0 for all x ∈ Rn \ {0}), denoted by Q ∈ Sn++
The convexity of the sets in Example 1 can be easily established by first principles. We leave this
task to the reader.
It is clear from the definition that the intersection of an arbitrary family of affine (resp. convex)
sets is an affine (resp. convex) set. Thus, for any arbitrary S ⊆ Rn , there is a smallest (by
inclusion) affine (resp. convex) set containing S; namely, the intersection of all affine (resp. convex)
sets containing S. This leads us to the following definitions:
1. The affine hull of S, denoted by aff(S), is the intersection of all affine subspaces containing
S. In particular, aff(S) is the smallest affine subspace that contains S.
2. The convex hull of S, denoted by conv(S), is the intersection of all convex sets containing
S. In particular, conv(S) is the smallest convex set that contains S.
The above definitions can be viewed as characterizing the affine hull and convex hull of a set S
from the outside. However, given a point in the affine or convex hull of S, it is not immediately clear
from the above definitions how it is related to the points in S. This motivates our next proposition,
which in some sense provides a characterization of the affine hull and convex hull of S from the
inside.
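The inner characterization says that a point lies in conv(S) exactly when it is a convex combination of points of S. For a finite S, this membership question is a small feasibility linear program; the following sketch (our own, assuming SciPy's `linprog` is available) makes the test concrete:

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, points):
    """Decide whether x is a convex combination of the given points, i.e.,
    x ∈ conv({points}): find α ≥ 0 with Σ αᵢpᵢ = x and Σ αᵢ = 1."""
    P = np.asarray(points, dtype=float)        # shape (k, n)
    x = np.asarray(x, dtype=float)
    k = P.shape[0]
    A_eq = np.vstack([P.T, np.ones((1, k))])   # stack the two equality systems
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(k), A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * k)
    return res.status == 0                     # 0 = feasible optimum found

square = [[0, 0], [1, 0], [0, 1], [1, 1]]
in_convex_hull([0.5, 0.5], square)   # centre of the unit square
in_convex_hull([1.5, 0.5], square)   # outside the square
```

The LP has a zero objective, so it only tests feasibility of the convex-combination system.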
Proof Let us prove (a) and leave the proof of (b) as a straightforward exercise to the reader. Let
T be the set of all affine combinations of points in S. Since S ⊆ aff(S), every x ∈ T is an affine
combination of points in aff(S). Hence, by Proposition 1, we have T ⊆ aff(S).
To establish the reverse inclusion, we show that T is an affine subspace containing S. As aff(S)
is the smallest affine subspace that contains S, this would show that aff(S) ⊆ T . To begin, we
note that S ⊆ T . Thus, it remains to show that T is an affine subspace. Towards that end, let
x1 , x2 ∈ T . By definition, there exist y1 , . . . , yp , yp+1 , . . . , yq ∈ S and α1 , . . . , αp , αp+1 , . . . , αq ∈ R
such that

x1 = ∑_{i=1}^p αi yi ,   x2 = ∑_{i=p+1}^q αi yi

and

∑_{i=1}^p αi = ∑_{i=p+1}^q αi = 1.

Now, let β ∈ R be arbitrary and consider the point x̄ = βx1 + (1 − β)x2 = ∑_{i=1}^p βαi yi + ∑_{i=p+1}^q (1 − β)αi yi , with

∑_{i=1}^p βαi + ∑_{i=p+1}^q (1 − β)αi = β + (1 − β) = 1.

In other words, x̄ is an affine combination of points in S. Thus, we have x̄ ∈ T , which shows that
T is an affine subspace, as desired. □
Since an affine subspace is a translation of a linear subspace and the dimension of a linear
subspace is a well–defined notion, we can define the dimension of an affine subspace as the dimension
of its underlying linear subspace (see Section 1.5 of Handout B). This, together with the definition
of affine hull, makes it possible to define the dimension of an arbitrary set in Rn . Specifically, we
have the following definition:
Example 2 (Dimension of a Set) Consider the two–point set S = {(1, 1), (3, 2)} ⊆ R2 . By
Proposition 3(a), we have aff(S) = {α(1, 1) + (1 − α)(3, 2) : α ∈ R} ⊆ R2 . It is easy to verify that
aff(S) = {(0, 1/2)} + V , where V = {t(1, 1/2) : t ∈ R} is the linear subspace generated by the vector
(1, 1/2). Hence, we have dim(S) = dim(V ) = 1.
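Example 2's claimed decomposition can be verified numerically. The sketch below (helper names are ours) checks that a point lies on aff(S) = {(0, 1/2)} + V by testing collinearity of the offset with the direction vector (1, 1/2):

```python
import numpy as np

# Decomposition from Example 2: aff(S) = {(0, 1/2)} + V, V = {t(1, 1/2) : t ∈ R}.
base = np.array([0.0, 0.5])
direction = np.array([1.0, 0.5])

def on_affine_hull(p, tol=1e-12):
    """p ∈ aff(S) iff p - base is a scalar multiple of `direction`
    (2x2 determinant test for collinearity)."""
    d = np.asarray(p, dtype=float) - base
    return abs(d[0] * direction[1] - d[1] * direction[0]) <= tol

on_affine_hull([1.0, 1.0])   # (1, 1) ∈ S
on_affine_hull([3.0, 2.0])   # (3, 2) ∈ S
on_affine_hull([2.0, 2.0])   # a point off the line
```

Both points of S pass the test, confirming the one-dimensional decomposition and hence dim(S) = 1.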
1.2.1 Affine Functions
We say that a map A : Rn → Rm is affine if

A(αx1 + (1 − α)x2) = αA(x1) + (1 − α)A(x2)

for all x1 , x2 ∈ Rn and α ∈ R. It can be shown that A is affine iff there exist A0 ∈ Rm×n and
y0 ∈ Rm such that A(x) = A0 x + y0 for all x ∈ Rn . As the following proposition shows, convexity
is preserved under affine mappings.
Proof The proposition follows from the easily verifiable fact that for any x1 , x2 ∈ Rn , A([x1 , x2 ]) =
[A(x1 ), A(x2 )] ⊆ Rm . □
Rn , where r > 0. Clearly, B(0, r) is convex. Now, let Q be an n×n symmetric positive definite ma-
trix. Then, it is well–known that Q is invertible and the n×n symmetric matrix Q−1 is also positive
definite. Moreover, there exists an n × n symmetric matrix Q−1/2 such that Q−1 = Q−1/2 Q−1/2 .
(See [6, Chapter 7] if you are not familiar with these facts.) Thus, we may define an affine mapping
A : Rn → Rn by A(x) = Q−1/2 x + x̄. We claim that
i.e., A(B(0, r)) ⊆ E(x̄, Q/r2 ). Conversely, let x ∈ E(x̄, Q/r2 ). Consider the point y = Q1/2 (x −
x̄) = A−1 (x). Then, we have y T y ≤ r2 , which implies that E(x̄, Q/r2 ) ⊆ A(B(0, r)). Hence, we
conclude from the above calculation and Proposition 4 that E(x̄, Q/r2 ) is convex.
Proof For any x1 = (x̄1 , t1 ) ∈ Rn × R++ , x2 = (x̄2 , t2 ) ∈ Rn × R++ , and α ∈ [0, 1], we have

P (αx1 + (1 − α)x2) = (αx̄1 + (1 − α)x̄2)/(αt1 + (1 − α)t2) = βP (x1) + (1 − β)P (x2),

where

β = αt1/(αt1 + (1 − α)t2) ∈ [0, 1].

Moreover, as α increases from 0 to 1, β increases from 0 to 1. It follows that P ([x1 , x2 ]) =
[P (x1 ), P (x2 )] ⊆ Rn . This completes the proof. □
The following result, which is a straightforward application of Propositions 4 and 5, shows that
convexity is also preserved under linear–fractional mappings (also known as projective mappings).
Corollary 1 Let A : Rn → Rm+1 be the affine map given by

A(x) = (Qx + u, cT x + d),
Note that t∗ < ∞, since there exists at least one index j such that βj > 0. Now, it is straightforward
to verify that x = ∑_{i=1}^k αi′ xi , ∑_{i=1}^k αi′ = 1, and α1′ , . . . , αk′ ≥ 0, and that |{i : αi′ > 0}| ≤ k − 1.
This process can be repeated until there are only at most n + 1 non–zero coefficients. This
completes the proof. □
Although Carathéodory’s theorem asserts that any x ∈ conv(S) can be obtained as a convex
combination of at most n + 1 points in S, it does not imply the existence of a “basis” of size n + 1
for the points in S. In other words, there may not exist a fixed set of n + 1 points x0 , x1 , . . . , xn in
S such that conv(S) = conv({x0 , x1 , . . . , xn }) (consider, for instance, the unit disk B(0, 1)). This
should be contrasted with the case of linear combinations, where a subspace of dimension n can be
generated by n fixed basis vectors.
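The reduction step in Carathéodory's theorem is constructive, and it can be implemented directly: find an affine dependence among the supported points, then shift weight along it until a coefficient vanishes. The following Python sketch (our own implementation of that idea; the function name is ours) iterates this until at most n + 1 weights remain:

```python
import numpy as np

def caratheodory_reduce(points, alpha, tol=1e-10):
    """Given x = Σ αᵢxᵢ with convex weights α over points in R^n, return
    weights for the same point x with at most n + 1 nonzero entries."""
    P = np.asarray(points, dtype=float)          # shape (k, n)
    alpha = np.asarray(alpha, dtype=float).copy()
    n = P.shape[1]
    while np.count_nonzero(alpha > tol) > n + 1:
        supp = np.where(alpha > tol)[0]
        # Affine dependence among supported points: Σ βᵢxᵢ = 0 and Σ βᵢ = 0.
        A = np.vstack([P[supp].T, np.ones(len(supp))])
        beta = np.linalg.svd(A)[2][-1]           # a nullspace vector (support > n+1)
        if beta.max() <= 0:
            beta = -beta                         # ensure some βⱼ > 0
        pos = beta > tol
        t_star = np.min(alpha[supp][pos] / beta[pos])
        full_beta = np.zeros_like(alpha)
        full_beta[supp] = beta
        alpha = alpha - t_star * full_beta       # point and weight sum are preserved
        alpha[alpha < tol] = 0.0                 # at least one coefficient vanishes
    return alpha

# Five points in R^2: the reduced representation uses at most 3 of them.
pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.5]], dtype=float)
alpha0 = np.full(5, 0.2)
x = alpha0 @ pts
alpha1 = caratheodory_reduce(pts, alpha0)
```

Subtracting t∗β preserves both the point and the total weight, and the choice of t∗ keeps all coefficients nonnegative, exactly as in the proof.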
However, not all is lost. It turns out that under suitable conditions, there exists a fixed, though
possibly infinite, set S 0 of points in S such that S = conv(S 0 ). In order to introduce this result, we
need a definition.
Definition 4 Let S ⊆ Rn be non–empty and convex. We say that x ∈ Rn is an extreme point
of S if x ∈ S and there do not exist two different points x1 , x2 ∈ S such that x = (1/2)(x1 + x2 ).
The above definition can be rephrased as follows: x ∈ S is an extreme point of S if x = x1 = x2
whenever x1 , x2 ∈ S are such that x = αx1 + (1 − α)x2 for some α ∈ (0, 1).
Example 4 (Extreme Points of a Ball) Consider the unit ball B(0, 1). We claim that any
x ∈ B(0, 1) with ‖x‖2 = 1 is an extreme point of B(0, 1). Indeed, let x ∈ B(0, 1) be such that
‖x‖2 = 1, and let x1 , x2 ∈ B(0, 1) be such that x = (1/2)(x1 + x2 ). We compute

1 = ‖x‖2² = (1/4)‖x1 + x2‖2² = (1/2)‖x1‖2² + (1/2)‖x2‖2² − (1/4)‖x1 − x2‖2² ≤ 1 − (1/4)‖x1 − x2‖2².

This shows that x1 = x2 .
Theorem 2 (Minkowski’s Theorem) Let S ⊆ Rn be compact and convex. Then, S is the convex
hull of its extreme points.
Definition 5 Let S ⊆ Rn be non–empty and convex. We say that a non–empty convex set F ⊆ Rn
is a face of S if F ⊆ S and there do not exist two points x1 , x2 ∈ S such that (i) x1 ∈ S \ F or
x2 ∈ S \ F , and (ii) (1/2)(x1 + x2 ) ∈ F .
Clearly, ∆I is a non–empty convex subset of ∆. We claim that ∆I is a face of ∆. Indeed, let
x1 , x2 ∈ ∆ and suppose that βx1 + (1 − β)x2 ∈ ∆I for some β ∈ (0, 1). Then, there exist multipliers
{αi1}ni=0 , {αi2}ni=0 , and {αi}ni=0 such that

∑_{i=0}^n αi1 = ∑_{i=0}^n αi2 = ∑_{i∈I} αi = 1,   αi1 , αi2 , αi ≥ 0 for i = 0, 1, . . . , n,   αi = 0 for i ∉ I,

x1 = ∑_{i=0}^n αi1 ei ,   x2 = ∑_{i=0}^n αi2 ei ,

and

β ∑_{i=0}^n αi1 ei + (1 − β) ∑_{i=0}^n αi2 ei = ∑_{i∈I} αi ei .   (1)
Upon comparing coefficients in (1), we have
βαi1 + (1 − β)αi2 = αi for i = 1, . . . , n. (2)
In particular, we see that αi1 = αi2 = 0 whenever i ∉ I ∪ {0}, since β ∈ (0, 1) and αi1 , αi2 ≥ 0 for
i = 0, 1, . . . , n. Now, in order to complete the proof of the claim, it suffices to show that x1 , x2 ∈ ∆I .
Towards that end, observe that if 0 ∈ I, then we have αi1 = αi2 = 0 for i ∉ I, which implies that
x1 , x2 ∈ ∆I . Otherwise, we have 0 ∉ I, and our goal is to show that α01 = α02 = 0. To achieve that,
we first sum the equations in (2) and use the fact that α0 = 0 to get
β ∑_{i=1}^n αi1 + (1 − β) ∑_{i=1}^n αi2 = ∑_{i=1}^n αi = 1.
Definition 6 Let S ⊆ Rn be arbitrary. We say that x ∈ S belongs to the relative interior of S,
denoted by x ∈ rel int(S), if there exists an ε > 0 such that B(x, ε) ∩ aff(S) ⊆ S. The relative
boundary of S, denoted by rel bd(S), is defined by rel bd(S) = cl(S) \ rel int(S).
The following result demonstrates the relevance of the above definition when S is convex.
Clearly, we have x̄ ∈ aff(∆) = aff(S). Now, define V i = aff({x0 , x1 , . . . , xi−1 , xi+1 , . . . , xk }) and let
εi = min_{x∈V i} ‖x − x̄‖2 . Then, we have εi > 0 for i = 0, 1, . . . , k. Upon setting ε = min_{0≤i≤k} εi , we
conclude that B(x̄, ε) ∩ aff(S) ⊆ S, as required. □
As one would expect, if we move from a point in the relative interior of a non–empty convex set
S to any other point in S, then all the intermediate points should remain in the relative interior of
S. This observation is made precise in the following proposition.
Proposition 7 Let S ⊆ Rn be non–empty and convex. For any x ∈ cl(S) and x0 ∈ rel int(S), we
have
(x, x0 ] = {αx + (1 − α)x0 ∈ Rn : α ∈ [0, 1)} ⊆ rel int(S).
Proof Let α ∈ [0, 1) be fixed and consider x̄ = αx + (1 − α)x0 . Since x ∈ cl(S), for any ε > 0, we
have x ∈ S + (B(0, ε) ∩ aff(S)). Now, we compute
It follows that for sufficiently small ε > 0, we have B(x̄, ε) ∩ aff(S) ⊆ αS + (1 − α)S = S, where
the last equality is due to the convexity of S. This implies that x̄ ∈ rel int(S), as desired. u
t
Another topological concept of interest is that of compactness. Recall that a set S ⊆ Rn is
compact if it is closed and bounded. Moreover, the Weierstrass theorem asserts that if S ⊆ Rn is
a non–empty compact set and f : S → R is a continuous function, then f attains its maximum
and minimum on S (see, e.g., [10, Chapter 7, Theorem 18] or Section 3.1 of Handout C). Thus, in
the context of optimization, it would be good to know when a convex set S ⊆ Rn is compact. The
following proposition provides one sufficient criterion.
Proposition 8 Let S ⊆ Rn be compact. Then, conv(S) is compact.
Proof By Carathéodory’s theorem (Theorem 1), for every x ∈ conv(S), there exist x1 , . . . , xn+1 ∈
S and α = (α1 , . . . , αn+1 ) ∈ ∆, where

∆ = {α ∈ Rn+1 : ∑_{i=1}^{n+1} αi = 1, α ≥ 0},

such that x = ∑_{i=1}^{n+1} αi xi . Thus, if we define the continuous map f : (Rn)n+1 × Rn+1 → Rn by
f (x1 , . . . , xn+1 , α) = ∑_{i=1}^{n+1} αi xi ,
then conv(S) = f (S n+1 ×∆). Now, observe that S n+1 ×∆ is compact, as both S and ∆ are compact.
Since the continuous image of a compact set is compact (see, e.g., [10, Chapter 7, Proposition 24]
or Section 3.1 of Handout C), we conclude that conv(S) is compact, as desired. □
Theorem 4 Let S ⊆ Rn be non–empty, closed, and convex. Then, for every x ∈ Rn , there exists
a unique point z ∗ ∈ S that is closest (in the Euclidean norm) to x.
‖z̄ − x‖2² = (µ∗)² − ‖z1 − z̄‖2² = (µ∗)² − (1/4)‖z1 − z2‖2² .

In particular, if z1 ≠ z2 , then ‖z̄ − x‖2² < (µ∗)², which is a contradiction. This establishes the
uniqueness and completes the proof of the theorem. □
In the sequel, we shall refer to the point z ∗ in Theorem 4 as the projection of x on S and
denote it by ΠS (x). In other words, we have

ΠS (x) = arg min_{z∈S} ‖z − x‖2 .
Theorem 5 Let S ⊆ Rn be non–empty, closed, and convex. Given any x ∈ Rn , we have z ∗ = ΠS (x)
iff z ∗ ∈ S and (z − z ∗ )T (x − z ∗ ) ≤ 0 for all z ∈ S.
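Theorem 5's variational characterization can be observed numerically for a set where the projection has a closed form. The sketch below (our own example, using the unit ball, whose projection is simple rescaling) samples points of the set and records the largest value of (z − z∗)T(x − z∗):

```python
import numpy as np

def project_ball(x, radius=1.0):
    """Projection onto the closed Euclidean ball B(0, radius): rescale if outside."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else (radius / norm) * x

x = np.array([2.0, -3.0, 0.5])
z_star = project_ball(x)

# Theorem 5: z* = Π_S(x) iff (z - z*)^T (x - z*) <= 0 for every z ∈ S.
rng = np.random.default_rng(0)
samples = [project_ball(rng.normal(size=3)) for _ in range(1000)]
worst = max((z - z_star) @ (x - z_star) for z in samples)
```

For every sampled z ∈ B(0, 1) the inner product is nonpositive, as the theorem predicts.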
Proof Let z ∗ = ΠS (x) and z ∈ S. Consider points of the form z(α) = αz + (1 − α)z ∗ , where
α ∈ [0, 1]. By convexity, we have z(α) ∈ S. Moreover, we have ‖z ∗ − x‖2 ≤ ‖z(α) − x‖2 for all
α ∈ [0, 1]. On the other hand, note that

‖z(α) − x‖2² = ‖z ∗ − x‖2² + 2α(z − z ∗)T (z ∗ − x) + α²‖z − z ∗‖2² .

Thus, we see that ‖z(α) − x‖2² ≥ ‖z ∗ − x‖2² for all α ∈ [0, 1] iff (z − z ∗)T (z ∗ − x) ≥ 0. This is
precisely the stated condition.
Conversely, suppose that for some z′ ∈ S, we have (z − z′)T (x − z′) ≤ 0 for all z ∈ S. Upon setting
z = ΠS (x), we have

(ΠS (x) − z′)T (x − z′) ≤ 0.   (3)

On the other hand, by our argument in the preceding paragraph, the point ΠS (x) satisfies
max_{z∈S} yT z < yT x.
Proof Since S is a non–empty closed convex set, by Theorem 4, there exists a unique point z ∗ ∈ S
that is closest to x. Set y = x − z ∗ . By Theorem 5, for all z ∈ S, we have (z − z ∗)T y ≤ 0, which
implies that

yT z ≤ yT z ∗ = yT x + yT (z ∗ − x) = yT x − ‖y‖2² .

Since x ∉ S, we have y ≠ 0. It follows that

max_{z∈S} yT z ≤ yT x − ‖y‖2² < yT x,

as desired. □
To demonstrate the power of Theorem 6, we shall use it to establish three useful results. The
first states that every closed convex set can be represented as an intersection of halfspaces that
contain it.
Theorem 7 A closed convex set S ⊆ Rn is the intersection of all the halfspaces containing S.
Proof We may assume that ∅ ⊊ S ⊊ Rn , for otherwise the theorem is trivial. Let x ∈ Rn \ S
be arbitrary. Then, by Theorem 6, there exist y ∈ Rn and c = max_{z∈S} yT z ∈ R such that the
halfspace H − (y, c) = {z ∈ Rn : yT z ≤ c} contains S but not x. It follows that the intersection of
all the halfspaces containing S is precisely S itself. □
Theorem 7 suggests that at any boundary point x of a closed convex set S, there should be a
hyperplane supporting S at x. To formalize this intuition, we need some definitions.
1. supports S if S is contained in one of the halfspaces defined by H(s, c), say, S ⊆ H − (s, c);
2. supports S non–trivially at x ∈ S if it supports S, x ∈ H(s, c), and S ⊄ H(s, c).
We are now ready to establish the second result, which concerns the existence of supporting hyper-
planes to convex sets.
Theorem 8 Let S ⊆ Rn be non–empty, closed, and convex. Then, for any x ∈ rel bd(S), there
exists a hyperplane supporting S non–trivially at x.
Proof Let x ∈ rel bd(S) be fixed. Set S′ = S − {x} and V = aff(S) − {x}. Then, S′ is a non–empty
closed convex set in the Euclidean space V with 0 ∈ rel bd(S′). Moreover, since rel bd(S) ≠ ∅, we
have S ≠ aff(S), which implies that V ≠ S′. Now, consider a sequence {xk }k≥1 in V \ S′ such that
xk → 0. By Theorem 6, for k = 1, 2, . . ., there exists a yk ∈ V such that ‖yk‖2 = 1 and ykT w < ykT xk
for all w ∈ S′. The latter is equivalent to

ykT z < ykT (x + xk ) for all z ∈ S.   (5)

Since the sequence {yk }k≥1 is bounded, by considering a subsequence if necessary, we may assume
that yk → y with ‖y‖2 = 1. It follows from (5) that yT z ≤ yT x for all z ∈ S; i.e., S ⊆ H − (y, yT x).
In addition, if S ⊆ H(y, yT x), or equivalently, yT z = yT x for all z ∈ S, then yT z = yT x for all
z ∈ aff(S). This implies that y ∈ V ⊥, which is impossible because y ∈ V and y ≠ 0. Hence, we have
S ⊄ H(y, yT x). Consequently, we deduce that the hyperplane H(y, yT x) supports S non–trivially
at x, as desired. □
Armed with Theorem 8, we can complete the proof of Minkowski’s theorem (Theorem 2).
Proof of Minkowski’s Theorem We proceed by induction on dim(S). The base case, which
corresponds to dim(S) = 0, is trivial, because in this case S consists of a single point, which by
definition is an extreme point of S. Now, suppose that dim(S) = k ≥ 1 and let x ∈ S be arbitrary.
We consider two cases:
Case 1: x ∈ rel bd(S). By Theorem 8, we can find a hyperplane H supporting S at x. Observe that
H ∩ S is a compact convex set whose dimension is at most k − 1. Thus, by the inductive hypothesis,
x ∈ H ∩ S can be expressed as a convex combination of extreme points of H ∩ S. Now, it remains to
show that the extreme points of H ∩ S are also extreme points of S. Towards that end, it suffices
to show that H ∩ S is a face of S and then invoke Proposition 6. Let x1 , x2 ∈ S and α ∈ (0, 1) be
such that αx1 + (1 − α)x2 ∈ H ∩ S. Suppose that H takes the form H = {z ∈ Rn : sT z = sT x} for
some s ∈ Rn \ {0}. Since H supports S, we may assume without loss of generality that sT x1 ≤ sT x
and sT x2 ≤ sT x. This, together with the fact that αx1 + (1 − α)x2 ∈ H, implies that x1 , x2 ∈ H.
It follows that [x1 , x2 ] ⊆ H ∩ S, as desired.
Case 2: x ∈ rel int(S). Since dim(S) ≥ 1, there exists an x0 ∈ S such that x0 ≠ x. Then,
Proposition 7 and the compactness of S imply that the line joining x and x0 intersects rel bd(S) at
two points, say, y, z ∈ rel bd(S). By the result in Case 1, both y and z can be expressed as a convex
combination of the extreme points of S. Since x is a convex combination of y and z, it follows that
x can also be expressed as a convex combination of the extreme points of S. □
To motivate the third result, observe that Theorem 6 is a result on point–set separation. A
natural question is whether we can derive an analogous result for set–set separation. In other
words, given two non–empty and non–intersecting convex sets, is it possible to separate them using
a hyperplane? As it turns out, under suitable conditions, we can reduce this question to that
of point–set separation. Consequently, we can apply Theorem 6 to obtain the desired separation
result.
as desired. □
2 Convex Functions
2.1 Basic Definitions and Properties
Let us now turn to the notion of a convex function.
A major advantage of allowing f to take the value +∞ is that we do not need to explicitly specify
the domain of f . Indeed, if the function f is defined on a set S ⊆ Rn , then we may simply extend
f to Rn by setting f (x) = +∞ for all x ∉ S and obtain dom(f ) = S.
The relationship between convex sets and convex functions is explained in the following propo-
sition, whose proof is left as an exercise to the reader.
A simple consequence of Proposition 9 is that if f is convex, then dom(f ) is convex. This follows
from Proposition 4 and the observation that dom(f ) is the image of the convex set epi(f ) under a
projection (which is a linear mapping). Another useful consequence of Proposition 9 is the following
inequality.
Proof The “if” part is clear. To prove the “only if” part, we may assume without loss of generality
that x1 , . . . , xk ∈ dom(f ). Then, we have (xi , f (xi )) ∈ epi(f ) for i = 1, . . . , k. Since epi(f ) is convex
by Proposition 9, we may invoke Proposition 2 to conclude that

∑_{i=1}^k αi (xi , f (xi )) = (∑_{i=1}^k αi xi , ∑_{i=1}^k αi f (xi )) ∈ epi(f ).

By the definition of epi(f ), this gives f (∑_{i=1}^k αi xi ) ≤ ∑_{i=1}^k αi f (xi ), which completes the proof. □
The epigraph epi(f ) of f is closely related to, but not the same as, the t–level set Lt (f ) of f ,
where Lt (f ) = {x ∈ Rn : f (x) ≤ t} and t ∈ R is arbitrary. One obvious difference is that epi(f ) is
a subset of Rn × R, but Lt (f ) is a subset of Rn . However, there is another, more subtle, difference:
Even if Lt (f ) is convex for all t ∈ R, the function f may not be convex. This can be seen, e.g.,
from the function x ↦ x³. A function whose domain is convex and whose t–level sets are convex
for all t ∈ R is called a quasi–convex function. The class of quasi–convex functions possesses nice
properties and is important in its own right. We refer the interested reader to [5, 8] for details.
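The cubic example is easy to check by hand or by machine: every level set of x ↦ x³ is an interval (hence convex), yet the function violates midpoint convexity. A quick numeric check (our own sketch):

```python
# Level sets {x : x**3 <= t} = (-inf, t^(1/3)] are intervals, hence convex,
# so x ↦ x³ is quasi-convex; yet the function itself is not convex.
f = lambda x: x ** 3

x1, x2 = -1.0, 0.0
lhs = f(0.5 * (x1 + x2))           # f(-0.5) = -0.125
rhs = 0.5 * f(x1) + 0.5 * f(x2)    # (f(-1) + f(0)) / 2 = -0.5
violates_convexity = lhs > rhs     # convexity would require lhs <= rhs
```

Here f at the midpoint lies strictly above the chord, so convexity fails even though quasi-convexity holds.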
Now, let f : Rn → R ∪ {+∞} be a convex function, so that epi(f ) is a non–empty convex set
by Proposition 9. Suppose in addition that epi(f ) is closed. By Theorem 7, we know that epi(f )
can be represented as an intersection of all the halfspaces containing it. Such halfspaces take the
form H − ((y, y0 ), c) = {(x, t) ∈ Rn × R : y T x + y0 t ≤ c} for some y ∈ Rn and y0 , c ∈ R. In fact, it
suffices to use only those halfspaces with y0 = −1 in the representation. To see this, suppose that
epi(f ) ⊆ H − ((y, y0 ), c) and consider the following cases:
Case 1: y0 ≠ 0. Then, we must have y0 < 0 (since we can fix x ∈ dom(f ) and take t ≥ f (x) to be
arbitrarily large), in which case we may assume without loss of generality that y0 = −1 (by scaling y and c).
Case 2: y0 = 0. We claim that there exists another halfspace H − ((ȳ, −1), c̄) such that epi(f ) ⊆
H − ((ȳ, −1), c̄). Indeed, if this is not the case, then every halfspace containing epi(f ) is of the form
H − ((y, 0), c) for some y ∈ Rn and c ∈ R. This implies that f = −∞, which is impossible.
Next, we show that for every (x̄, t̄) ∉ H − ((y, 0), c), there exists a halfspace H − ((y′, −1), c′)
with y′ ∈ Rn , c′ ∈ R such that epi(f ) ⊆ H − ((y′, −1), c′) and (x̄, t̄) ∉ H − ((y′, −1), c′). This
would imply that halfspaces of the form H − ((y, 0), c) with y ∈ Rn and c ∈ R are not needed in the
representation of epi(f ). To begin, observe that for any (x, t) ∈ epi(f ), we have (x, t) ∈ H − ((y, 0), c)
and (x, t) ∈ H − ((ȳ, −1), c̄), which means that yT x − c ≤ 0 and ȳT x − c̄ ≤ t. It follows that for any
λ ≥ 0,

λ(yT x − c) + ȳT x − c̄ ≤ t.

Moreover, since yT x̄ − c > 0, for sufficiently large λ ≥ 0, we have λ(yT x̄ − c) + ȳT x̄ − c̄ > t̄.
Thus, by setting y′ = λy + ȳ and c′ = λc + c̄, we conclude that epi(f ) ⊆ H − ((y′, −1), c′) and
(x̄, t̄) ∉ H − ((y′, −1), c′), as desired.
It is not hard to see that the halfspace H − ((y, −1), c) is the epigraph of the affine function
h : Rn → R given by h(x) = yT x − c. Moreover, if epi(f ) ⊆ H − ((y, −1), c), then h(x) ≤ f (x)
for all x ∈ Rn . (This is obvious for x ∉ dom(f ). For x ∈ dom(f ), note that (x, f (x)) ∈ epi(f ) ⊆
H − ((y, −1), c) implies yT x − f (x) ≤ c, or equivalently, h(x) ≤ f (x).) Since the intersection of
halfspaces yields the epigraph of the pointwise supremum of the affine functions induced by those
halfspaces, we obtain the following theorem:
Theorem 10 Let f : Rn → R ∪ {+∞} be a convex function such that epi(f ) is closed. Then, f
can be represented as the pointwise supremum of all affine functions h : Rn → R satisfying h ≤ f .
Motivated by Theorem 10, given a convex function f : Rn → R ∪ {+∞}, we consider the set

Sf = {(y, c) ∈ Rn × R : yT x − c ≤ f (x) for all x ∈ Rn }

of all affine minorants of f . It is not hard to verify that Sf is the
epigraph of the function f ∗ : Rn → R ∪ {+∞} given by

f ∗(y) = sup_{x∈Rn} {yT x − f (x)}.

Moreover, observe that Sf is closed and convex. This implies that f ∗ is convex. The function f ∗
is called the conjugate of f . It plays a very important role in convex analysis and optimization.
We shall study it in greater detail later.
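The conjugate can be approximated numerically by maximizing y·x − f(x) over a fine grid. The sketch below (our own, one-dimensional for simplicity) compares the numeric supremum against the known closed form for f(x) = x²/2, whose conjugate is f∗(y) = y²/2:

```python
import numpy as np

def conjugate_1d(f, y, grid):
    """Numerically approximate f*(y) = sup_x { y*x - f(x) } over a grid."""
    return np.max(y * grid - f(grid))

f = lambda x: 0.5 * x ** 2
grid = np.linspace(-10.0, 10.0, 200001)
values = {y: conjugate_1d(f, y, grid) for y in [-2.0, 0.0, 1.5, 3.0]}
# For this f the conjugate has the closed form f*(y) = y**2 / 2.
```

For each test value of y the grid maximum matches y²/2 to within the grid resolution.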
(b) (Pointwise Supremum) Let I be an index set and {fi }i∈I , where fi : Rn → R∪{+∞} for all
i ∈ I, be a family of convex functions. Define the pointwise supremum f : Rn → R∪{+∞}
of {fi }i∈I by
f (x) = sup fi (x).
i∈I
(e) (Restriction on Lines) Given a function f : Rn → R ∪ {+∞} that is not identically +∞,
a point x0 ∈ Rn , and a direction h ∈ Rn , define the function f˜x0 ,h : R → R ∪ {+∞} by
f˜x0 ,h (t) = f (x0 + th). Then, the function f is convex iff the function f˜x0 ,h is convex for any
x0 ∈ Rn and h ∈ Rn .
The above results can be derived directly from the definition. We shall prove (e) and leave the rest
as exercises to the reader.
Proof of Theorem 11(e) Suppose that f is convex. Let x0 , h ∈ Rn be arbitrary. Then, for any
t1 , t2 ∈ R and α ∈ [0, 1], we have

f̃x0,h (αt1 + (1 − α)t2 ) = f (α(x0 + t1 h) + (1 − α)(x0 + t2 h)) ≤ αf̃x0,h (t1 ) + (1 − α)f̃x0,h (t2 );

i.e., f̃x0,h is convex. Conversely, suppose that f̃x0,h is convex for any x0 , h ∈ Rn . Let x1 , x2 ∈ S
and α ∈ [0, 1]. Upon setting x0 = x1 ∈ Rn and h = x2 − x1 ∈ Rn , we have

f (αx2 + (1 − α)x1 ) = f̃x0,h (α · 1 + (1 − α) · 0) ≤ αf̃x0,h (1) + (1 − α)f̃x0,h (0) = αf (x2 ) + (1 − α)f (x1 ),
as desired.
Conversely, let x1 , x2 ∈ S and α ∈ (0, 1). Then, we have x̄ = αx1 + (1 − α)x2 ∈ S, which implies
that
Upon multiplying (8) by α and (9) by 1 − α and summing, we obtain the desired result. □
In the case where f is twice continuously differentiable, we have the following characterization:
Theorem 13 Let f : S → R be a twice continuously differentiable function on the open convex set
S ⊆ Rn . Then, f is convex on S iff ∇²f (x) ⪰ 0 for all x ∈ S.
Proof Suppose that ∇²f (x) ⪰ 0 for all x ∈ S. Let x1 , x2 ∈ S. By Taylor’s theorem, there exists
an x̄ ∈ [x1 , x2 ] ⊆ S such that

f (x2 ) = f (x1 ) + (∇f (x1 ))T (x2 − x1 ) + (1/2)(x2 − x1 )T ∇²f (x̄)(x2 − x1 ).   (10)

Since ∇²f (x̄) ⪰ 0, we have (x2 − x1 )T ∇²f (x̄)(x2 − x1 ) ≥ 0. Upon substituting this inequality
into (10) and invoking Theorem 12, we conclude that f is convex on S.
Conversely, suppose that ∇²f (x̄) ⋡ 0 for some x̄ ∈ S. Then, there exists a v ∈ Rn such
that vT ∇²f (x̄)v < 0. Since S is open and ∇²f is continuous, there exists an ε > 0 such that
x̄′ = x̄ + εv ∈ S and vT ∇²f (x̄ + α(x̄′ − x̄))v < 0 for all α ∈ [0, 1]. This implies (by taking x1 = x̄
and x2 = x̄′ in (10)) that f (x̄′) < f (x̄) + (∇f (x̄))T (x̄′ − x̄). Hence, by Theorem 12, we conclude
that f is not convex on S. This completes the proof. □
We point out that Theorem 13 only applies to functions f that are twice continuously differen-
tiable on an open convex set S. This should be contrasted with Theorem 12, where the function f
needs only be differentiable on a convex subset S′ of an open set. In particular, the set S′ need not
be open. To see why S must be open in Theorem 13, consider the function f : R2 → R given by
f (x, y) = x² − y². This function is convex on the set S = R × {0}. However, its Hessian, which is
given by

∇²f (x, y) = [ 2  0 ; 0  −2 ] for (x, y) ∈ R2 ,

is nowhere positive semidefinite.
= {(x, Y, r) ∈ Rn × Sn++ × R : [ Y  x ; xT  r ] ⪰ 0, Y ≻ 0},

where the last equality follows from the Schur complement (see, e.g., [2, Section A.5.5]). This
shows that epi(f ) is a convex set, which implies that f is convex on Rn × Sn++ .
2. Let f : Rm×n → R+ be given by f (X) = ‖X‖2 , where ‖ · ‖2 denotes the spectral norm or
largest singular value of the m × n matrix X. It is well known that (see, e.g., [6])

f (X) = sup {uT Xv : ‖u‖2 = 1, ‖v‖2 = 1} .
∂²f /(∂xi ∂xj ) =
    exp(xi)/(∑_{l=1}^n exp(xl)) − exp(2xi)/(∑_{l=1}^n exp(xl))²    if i = j,
    −exp(xi + xj)/(∑_{l=1}^n exp(xl))²    if i ≠ j.

This gives

∇²f (x) = (1/(eT z)²) (eT z · diag(z) − zzT),

where z = (exp(x1 ), . . . , exp(xn )). Now, for any v ∈ Rn , we have

vT ∇²f (x)v = (1/(eT z)²) [ (∑_{i=1}^n zi)(∑_{i=1}^n zi vi²) − (∑_{i=1}^n zi vi)² ]
            = (1/(eT z)²) [ (∑_{i=1}^n (√zi)²)(∑_{i=1}^n (√zi vi)²) − (∑_{i=1}^n √zi · (√zi vi))² ]
            ≥ 0

by the Cauchy–Schwarz inequality. Hence, f is convex.
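The partial derivatives above are those of the log–sum–exp function f(x) = ln ∑ exp(xi). As a numerical sanity check (our own sketch, not part of the notes), the Hessian formula can be evaluated at random points and its smallest eigenvalue inspected:

```python
import numpy as np

def lse_hessian(x):
    """Hessian of f(x) = ln(sum_i exp(x_i)):
    (1/(eᵀz)²)(eᵀz·diag(z) − zzᵀ) with z = exp(x)."""
    z = np.exp(x)
    s = z.sum()
    return (s * np.diag(z) - np.outer(z, z)) / s ** 2

rng = np.random.default_rng(1)
min_eig = min(np.linalg.eigvalsh(lse_hessian(rng.normal(size=4))).min()
              for _ in range(100))
```

The smallest eigenvalue is (numerically) zero at every sampled point: the Hessian is positive semidefinite but singular, since the all-ones vector is always in its null space.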
5. ([2, Chapter 3, Exercise 3.17]) Suppose that p ∈ (0, 1). Let f : Rn++ → R be given by
f (x) = (∑_{i=1}^n xi^p)^{1/p} . We compute

∂²f /(∂xi ∂xj ) =
    (1 − p) (∑_{l=1}^n xl^p)^{1/p − 2} [ −(∑_{l=1}^n xl^p) xi^{p−2} + xi^{2(p−1)} ]    if i = j,
    (1 − p) (∑_{l=1}^n xl^p)^{1/p − 2} xi^{p−1} xj^{p−1}    if i ≠ j.

This gives

∇²f (x) = (1 − p) (∑_{i=1}^n xi^p)^{1/p − 2} [ −(∑_{i=1}^n xi^p) diag(x1^{p−2}, . . . , xn^{p−2}) + zzT ],

where zi = xi^{p−1} for i = 1, . . . , n. Now, for any v ∈ Rn , we have

vT ∇²f (x)v = (1 − p) (∑_{i=1}^n xi^p)^{1/p − 2} [ −(∑_{i=1}^n xi^p)(∑_{i=1}^n vi² xi^{p−2}) + (∑_{i=1}^n vi xi^{p−1})² ] ≤ 0,

since

−(∑_{i=1}^n xi^p)(∑_{i=1}^n vi² xi^{p−2}) + (∑_{i=1}^n vi xi^{(p−2)/2} · xi^{p/2})² ≤ 0

by the Cauchy–Schwarz inequality. Hence, f is concave on Rn++ .
= −(∑_{i=1}^n ln(1 + tλi) + ln det X0)

and

f̃″X0,H (t) = ∑_{i=1}^n λi²/(1 + tλi)² ≥ 0,

where λ1 , . . . , λn are the eigenvalues of X0^{−1/2} H X0^{−1/2} . It follows that f̃X0,H is convex on D.
This, together with Theorem 11(e), implies that f is convex on Sn++ .
2.5 Non–Differentiable Convex Functions
In the previous sub–sections we developed techniques for checking whether a differentiable function
is convex. In particular, we showed that a differentiable function f : Rn → R is convex iff
at every x̄ ∈ Rn , we have f (x) ≥ f (x̄) + (∇f (x̄))T (x − x̄). The latter condition has a geometric
interpretation, namely, the epigraph of f is supported by its tangent hyperplane at every (x̄, f (x̄)) ∈
Rn × R. Of course, as can be easily seen from the function x ↦ |x|, not every convex function
is differentiable. However, the above geometric interpretation seems to hold for such functions as
well. In order to make such an observation rigorous, we need to first generalize the notion of a
gradient to that of a subgradient.
Note that ∂f (x̄) can be non–empty even when f is not differentiable at x̄. For example, consider
the function f : R → R given by f (x) = |x|. Clearly, f is not differentiable at the origin, but it can
be easily verified from the definition that ∂f (0) = [−1, 1].
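The claim ∂f(0) = [−1, 1] amounts to checking the subgradient inequality |x| ≥ s·x for each candidate slope s. A quick numerical check on a grid (our own sketch):

```python
import numpy as np

f = np.abs
xs = np.linspace(-5.0, 5.0, 1001)

def is_subgradient_at_zero(s):
    """s ∈ ∂f(0) iff f(x) ≥ f(0) + s·(x − 0), i.e., |x| ≥ s·x for all x."""
    return bool(np.all(f(xs) >= s * xs - 1e-12))

inside = is_subgradient_at_zero(0.5)     # s in the interior of [-1, 1]
boundary = is_subgradient_at_zero(-1.0)  # s on the boundary of the subdifferential
outside = is_subgradient_at_zero(1.2)    # s outside [-1, 1]: inequality fails
```

Slopes with |s| ≤ 1 satisfy the inequality everywhere, while any |s| > 1 is violated for x of the same sign as s.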
In many ways, the subdifferential behaves like a derivative, although one should note that the
former is a set, while the latter is a unique element. Here we state some important properties of
the subdifferential without proof. For details, we refer the reader to [11, Chapter 2, Section 2.5].
f (x + td) − f (x)
f 0 (x, d) = lim
t&0 t
Theorem 14(a) states that the subdifferential ∂f of a convex function f is non–empty at any
x ∈ int dom(f ). This leads to the natural question of what happens when x ∈ bd dom(f ). The
following example shows that ∂f can be empty at such points.
Example 7 (A Convex Function with Empty Subdifferential at a Point) Let f : Rn →
R ∪ {+∞} be given by

f (x) = −√(1 − ‖x‖2²) if ‖x‖2 ≤ 1, and f (x) = +∞ otherwise.
It is clear that f is a convex function with dom(f ) = B(0, 1) and is differentiable on int dom(f ) =
{x ∈ Rn : ‖x‖2 < 1}. It is also clear that ∂f (x) = ∅ for all x ∉ dom(f ). Now, let x̄ ∈ bd dom(f ) =
{x ∈ Rn : ‖x‖2 = 1} be arbitrary. If s ∈ ∂f (x̄), then we must have

f (x) = −√(1 − ‖x‖2²) ≥ sT (x − x̄) for all x ∈ B(0, 1).   (12)

Without loss of generality, we may assume that x̄ = e1 . For α ∈ [−1, 1], define x(α) = αe1 ∈
B(0, 1). From (12), we see that s ∈ Rn satisfies

f (x(α)) = −√(1 − α²) ≥ αs1 − s1 for all α ∈ [−1, 1].

However, this implies that s1 ≥ √((1 + α)/(1 − α)) for all α ∈ [−1, 1), which is impossible. Hence,
we have ∂f (x̄) = ∅ for all x̄ ∈ bd dom(f ).
To further illustrate the concepts and results discussed above, let us compute the subdifferential
of the Euclidean norm.
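Before the formal computation, the answer can be probed numerically. The standard facts (stated here as assumptions for this check, not derived in the text above) are that ∂‖·‖2(x) = {x/‖x‖2} for x ≠ 0 and ∂‖·‖2(0) = B(0, 1). The following sketch tests the subgradient inequality on random sample points:

```python
import numpy as np

def satisfies_subgradient_ineq(s, x_bar, trials=2000, seed=0):
    """Check the subgradient inequality ‖x‖₂ ≥ ‖x̄‖₂ + sᵀ(x − x̄)
    for f = ‖·‖₂ on randomly sampled points x."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x = rng.normal(size=x_bar.shape) * 3
        if np.linalg.norm(x) < np.linalg.norm(x_bar) + s @ (x - x_bar) - 1e-9:
            return False
    return True

x_bar = np.array([3.0, 4.0])
satisfies_subgradient_ineq(x_bar / 5.0, x_bar)                  # s = x̄/‖x̄‖₂ at x̄ ≠ 0
satisfies_subgradient_ineq(np.array([0.6, 0.6]), np.zeros(2))   # ‖s‖₂ < 1 at x̄ = 0
satisfies_subgradient_ineq(np.array([1.0, 1.0]), np.zeros(2))   # ‖s‖₂ > 1: fails
```

Random sampling cannot prove the inequality, but it reliably refutes candidates with ‖s‖2 > 1 at the origin, in line with ∂‖·‖2(0) = B(0, 1).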
3 Further Reading
Convex analysis is a rich subject with many deep and beautiful results. For more details, one can
consult the general references listed on the course website and/or the books [3, 13, 9, 5, 1, 11].
References
[1] A. Barvinok. A Course in Convexity, volume 54 of Graduate Studies in Mathematics. American
Mathematical Society, Providence, Rhode Island, 2002.
[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge,
2004. Available online at http://www.stanford.edu/~boyd/cvxbook/.
[4] C. Ding, D. Sun, and K.-C. Toh. An Introduction to a Class of Matrix Cone Programming.
Mathematical Programming, Series A, 144(1–2):141–179, 2014.
[5] J.-B. Hiriart-Urruty and C. Lemaréchal. Fundamentals of Convex Analysis. Grundlehren Text
Editions. Springer–Verlag, Berlin/Heidelberg, 2001.
[6] R. A. Horn and C. R. Johnson. Matrix Analysis. Cambridge University Press, Cambridge,
1985.
[7] J. R. Magnus and H. Neudecker. Matrix Differential Calculus with Applications in Statis-
tics and Econometrics. Wiley Series in Probability and Statistics. John Wiley & Sons, Inc.,
Chichester, England, revised edition, 1999.
[8] E. A. Papa Quiroz, L. Mallma Ramirez, and P. R. Oliveira. An Inexact Proximal Method for
Quasiconvex Minimization. European Journal of Operational Research, 246(3):721–729, 2015.
[10] H. L. Royden. Real Analysis. Macmillan Publishing Company, New York, third edition, 1988.
[11] A. Ruszczyński. Nonlinear Optimization. Princeton University Press, Princeton, New Jersey,
2006.
[12] G. W. Stewart and J. Sun. Matrix Perturbation Theory. Academic Press, Boston, 1990.
[13] G. M. Ziegler. Lectures on Polytopes, volume 152 of Graduate Texts in Mathematics. Springer–
Verlag, New York, revised first edition, 1995.