CONVEX OPTIMIZATION

1 ONE-DIMENSIONAL

2 MULTI-DIMENSIONAL CASE

Consider a smooth function f : Ω → R, where Ω is a nonempty open
subset of R^n. We focus on the problem of finding x_0 ∈ Ω so that
f(x_0) = max_{x∈Ω} f(x) or f(x_0) = min_{x∈Ω} f(x), denoted by

x_0 = arg max_{x∈Ω} f(x) or x_0 = arg min_{x∈Ω} f(x).

Such an x_0 is called a global maximum or global minimum of f on
Ω. When f is a differentiable function, we only have the necessary
condition ∇f(x_0) = 0 for x_0. In convex optimization, this condition
is also sufficient.

ONE-DIMENSIONAL

Let f : (a, b) → R, −∞ ≤ a < b ≤ ∞.


Definition 1. The function f : (a, b) → R is convex if for all
x, y ∈ (a, b) and 0 ≤ θ ≤ 1, we have

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y).   (1)

The function f is concave if −f is convex.
Geometrically, inequality (1) means that the line segment between
(x, f(x)) and (y, f(y)), which is a chord from x to y, lies above the
graph of f.


Theorem 2. Let f : (a, b) → R, where −∞ ≤ a < b ≤ ∞, be of
class C^2, i.e., the second derivative f'' exists and is a continuous
function. The following statements are equivalent:
(i) f is a convex function.
(ii) f(y) ≥ f(x) + f'(x)(y − x), for all x, y ∈ (a, b).
(iii) f''(x) ≥ 0, for all x ∈ (a, b).
To prove it, we need the following result, which is a particular
case of Taylor's formula.
Proposition 3. Let f : (a, b) → R be of class C^2. For each
x, y ∈ (a, b), x < y, there exists z ∈ [x, y] such that

f(y) = f(x) + f'(x)(y − x) + f''(z)(y − x)^2 / 2.

Proof of Theorem 2. We shall prove (i) ⇔ (ii) and (ii) ⇔ (iii).
When (i) holds, with x, y ∈ (a, b) and 0 ≤ θ ≤ 1, (1) gives

f(y + θ(x − y)) ≤ f(y) + θ(f(x) − f(y)),

so, for 0 < θ ≤ 1,

f(x) − f(y) ≥ [f(y + θ(x − y)) − f(y)] / θ → f'(y)(x − y)

as θ ↓ 0, and (ii) is satisfied. Conversely, when (ii) holds, then with
z = θx + (1 − θ)y, we have

f(x) ≥ f(z) + f'(z)(x − z),   (2)

and

f(y) ≥ f(z) + f'(z)(y − z).   (3)

Multiplying both sides of (2) by θ and of (3) by 1 − θ, and adding,
we get

θf(x) + (1 − θ)f(y) ≥ f(z) = f(θx + (1 − θ)y),

and (i) is proved.



Now, when (ii) holds, we have, for x, y ∈ (a, b) with x < y,

f(y) ≥ f(x) + f'(x)(y − x) and f(x) ≥ f(y) + f'(y)(x − y).

Therefore

f'(x)(y − x) ≤ f(y) − f(x) ≤ f'(y)(y − x).

Subtracting and dividing by (y − x)^2 > 0, we get

[f'(y) − f'(x)] / (y − x) ≥ 0,

for all x, y ∈ (a, b), x < y. Letting y → x, we get (iii). Conversely, by
Proposition 3, there exists z ∈ [x, y] so that

f(y) = f(x) + f'(x)(y − x) + (1/2) f''(z)(y − x)^2.

Since f''(z) ≥ 0, we must have f(y) ≥ f(x) + f'(x)(y − x), i.e.,
(ii) holds.
Applying Theorem 2 to −f when f is a concave function, we have

Corollary 4. Let f : (a, b) → R, where −∞ ≤ a < b ≤ ∞, be of
class C^2. The following statements are equivalent:
(i) f is a concave function.
(ii) f(y) ≤ f(x) + f'(x)(y − x), for all x, y ∈ (a, b).
(iii) f''(x) ≤ 0, for all x ∈ (a, b).

Example 1. The functions y = x^n, for even n ∈ N, and y = e^x are
convex. The function y = ln x is concave.

Let f be a convex/concave function on (a, b) and let x_0 ∈ (a, b)
satisfy f'(x_0) = 0. Inequality (ii) of Theorem 2 or Corollary 4,
with x = x_0, yields f(x) ≥ f(x_0) or f(x) ≤ f(x_0), for all
x ∈ (a, b). Hence, x_0 is a global minimum or a global maximum of f
on (a, b), which shows that f'(x_0) = 0 is also a sufficient condition
for a global extremum.


Corollary 5. Let f : (a, b) → R, where −∞ ≤ a < b ≤ ∞, be of
class C^1, i.e., the first derivative f' exists and is continuous, and
let x_0 ∈ (a, b) be such that f'(x_0) = 0.
(i) If f is convex then x_0 = arg min_{x∈(a,b)} f(x).
(ii) If f is concave then x_0 = arg max_{x∈(a,b)} f(x).

Example 2. (i) The function y = x^4 + 6x^2, defined on R, has
y' = 4x^3 + 12x and y'' = 12x^2 + 12. Since y'' > 0, for all x ∈ R,
y = x^4 + 6x^2 is convex on R. Now, y' = 0 ⇔ x = 0 implies that
arg min_{x∈R} (x^4 + 6x^2) = 0.
(ii) The function y = −x ln x, defined on (0, +∞), has
y' = −ln x − 1 and y'' = −1/x. Since y'' < 0, for all x ∈ (0, +∞),
y = −x ln x is concave on (0, +∞). Now, y' = 0 ⇔ x = 1/e implies
that arg max_{x∈(0,+∞)} (−x ln x) = 1/e.
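The two optimizations in Example 2 can be sanity-checked numerically. The following is a sketch in Python (the function names `f` and `g` and the sampling grids are our choices, not from the slides): it confirms that the stationary points x = 0 and x = 1/e behave as global extrema on dense grids.

```python
import math

# Sketch verifying Example 2 (grids and names are our own choices).
f = lambda x: x**4 + 6 * x**2          # convex on R
g = lambda x: -x * math.log(x)         # concave on (0, +inf)

# (i) the stationary point x = 0 is a global minimum of f
xs = [k / 100.0 for k in range(-500, 501)]
assert all(f(x) >= f(0.0) for x in xs)

# (ii) the stationary point x = 1/e is a global maximum of g
ys = [k / 1000.0 for k in range(1, 5001)]
assert max(g(y) for y in ys) <= g(1.0 / math.e) + 1e-12

print("stationary points are global extrema on the sampled grids")
```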

MULTI-DIMENSIONAL CASE
Let f : Ω → R, where Ω is a nonempty subset of R^n.
First, we set some notation and terminology.
Definition 6. A nonempty subset Ω ⊂ R^n is said to be an open
convex set in R^n if for all x ∈ Ω and h ∈ R^n, {t ∈ R | x + th ∈ Ω}
is an open interval (a, b) in R, for some −∞ ≤ a < b ≤ ∞.


Example 3. Any open convex set Ω in R must be an open interval.
R^n is itself an open convex set.
R × (0, ∞) = {(x, y) ∈ R^2 | y > 0} and
(0, ∞) × (0, ∞) = {(x, y) ∈ R^2 | x, y > 0} are two open convex
sets in R^2.
Definition 7. A square matrix A ∈ R^{n×n} is said to be positive
semidefinite (negative semidefinite, resp.), denoted A ≥ 0 (A ≤ 0,
resp.), if h^T A h ≥ 0 (h^T A h ≤ 0, resp.) for all h ∈ R^n.
Example 4. Let

A =
  [ 1  2 ]
  [ 2  5 ].

For all h = (h, k) ∈ R^2,

h^T A h = [h k] [ 1  2 ] [h]  = [h k] [ h + 2k  ]
                [ 2  5 ] [k]          [ 2h + 5k ]
        = h(h + 2k) + k(2h + 5k) = h^2 + 4hk + 5k^2.

By writing h^2 + 4hk = (h + 2k)^2 − 4k^2, we have

h^2 + 4hk + 5k^2 = (h + 2k)^2 − 4k^2 + 5k^2 = (h + 2k)^2 + k^2 ≥ 0,

for all h = (h, k) ∈ R^2. Therefore, A is positive semidefinite.


Let A ∈ R^{n×n} be a symmetric matrix with eigenvalues λ_i,
i = 1, ..., n, counted with multiplicity. We have
Theorem 8. The symmetric matrix A is positive semidefinite if
λ_i ≥ 0 for all i = 1, ..., n, and is negative semidefinite if λ_i ≤ 0
for all i = 1, ..., n.
Proof. Let Q be an orthogonal matrix that orthogonally diagonalizes
A: Q^T A Q = D with D = diag(λ_1, λ_2, ..., λ_n). Since Q^T = Q^{-1},
we have A = Q D Q^T, and for each h ∈ R^n,

h^T A h = h^T Q D Q^T h = (Q^T h)^T D (Q^T h).

Let Q^T h = (y_1, y_2, ..., y_n) ∈ R^n; the right-hand side becomes



  
(Q^T h)^T D (Q^T h) = [y_1 y_2 ... y_n] diag(λ_1, λ_2, ..., λ_n) [y_1; y_2; ...; y_n]
                    = Σ_{i=1}^n λ_i y_i^2,

which shows that h^T A h ≥ 0 when λ_i ≥ 0, for all i = 1, ..., n,
and h^T A h ≤ 0 when λ_i ≤ 0, for all i = 1, ..., n.
Example 5. Consider again the matrix A of Example 4. The
characteristic equation

det(A − λI_2) = (1 − λ)(5 − λ) − 4 = λ^2 − 6λ + 1 = 0

has roots λ = 3 ± 2√2, so A has two positive eigenvalues, 3 − 2√2
and 3 + 2√2. Therefore, A is positive semidefinite.
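Examples 4 and 5 can be checked numerically. A sketch (variable names are ours): compute the eigenvalues of A with NumPy and spot-check h^T A h ≥ 0 on random directions.

```python
import numpy as np

# Sketch verifying Examples 4 and 5.
A = np.array([[1.0, 2.0],
              [2.0, 5.0]])

# Eigenvalues of the symmetric matrix A: 3 ± 2*sqrt(2), both positive.
eigvals = np.sort(np.linalg.eigvalsh(A))
assert np.allclose(eigvals, [3 - 2 * np.sqrt(2), 3 + 2 * np.sqrt(2)])

# Spot-check h^T A h >= 0 on random vectors (positive semidefiniteness).
rng = np.random.default_rng(0)
for _ in range(1000):
    h = rng.standard_normal(2)
    assert h @ A @ h >= 0

print("A is positive semidefinite on the sampled directions")
```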

Definition 9. Let f : Ω → R, where Ω is a nonempty open convex
set in R^n.
(i) f is said to be a convex function if for all x, y ∈ Ω and
0 ≤ θ ≤ 1, we have

f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y).

(ii) f is said to be a concave function if −f is a convex function,
i.e., if for all x, y ∈ Ω and 0 ≤ θ ≤ 1,

f(θx + (1 − θ)y) ≥ θf(x) + (1 − θ)f(y).

Given a function f : Ω → R, where Ω is a nonempty open convex set
in R^n, for each x ∈ Ω and h ∈ R^n, let I = {t ∈ R | x + th ∈ Ω},
which is an open interval in R, and let g_{x,h} : I → R be defined by
g_{x,h}(t) = f(x + th), for each t ∈ I.
This gives the following important connection between the
multivariate function f and the one-dimensional functions g_{x,h}.

Proposition 10. Let f : Ω → R be a function defined on an open
convex set Ω in R^n and let {g_{x,h}} be the family of one-variable
functions defined above by g_{x,h}(t) = f(x + th). Then
f is a convex function if and only if g_{x,h} is a convex function,
for all x ∈ Ω and h ∈ R^n,
and
f is a concave function if and only if g_{x,h} is a concave function,
for all x ∈ Ω and h ∈ R^n.

Let f : R^n → R be of class C^2. The gradient and the Hessian of f
at x = (x_1, x_2, ..., x_n) ∈ R^n, denoted ∇f(x) and ∇^2 f(x),
respectively, are defined by

∇f(x) = ( ∂f/∂x_1 (x), ∂f/∂x_2 (x), ..., ∂f/∂x_n (x) )^T

and

∇^2 f(x) =
  [ ∂²f/∂x_1² (x)       ∂²f/∂x_2∂x_1 (x)  ...  ∂²f/∂x_n∂x_1 (x) ]
  [ ∂²f/∂x_1∂x_2 (x)    ∂²f/∂x_2² (x)     ...  ∂²f/∂x_n∂x_2 (x) ]
  [ ...                 ...                    ...              ]
  [ ∂²f/∂x_1∂x_n (x)    ∂²f/∂x_2∂x_n (x)  ...  ∂²f/∂x_n² (x)    ].

When f is of class C^2, ∂²f/(∂x_j ∂x_i)(x) = ∂²f/(∂x_i ∂x_j)(x),
for all i, j = 1, ..., n. It follows that ∇^2 f(x) is a symmetric
matrix: ∇^2 f(x)^T = ∇^2 f(x).
Example 6. (a) Let f : R^3 → R be defined by
f(x_1, x_2, x_3) = 2x_1 − x_2 + 3x_3. We have
∇f(x_1, x_2, x_3) = (2, −1, 3)^T and ∇^2 f(x_1, x_2, x_3) = 0, the
zero matrix in R^{3×3}.
(b) Let f : R^3 → R be defined by
f(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 − x_1 x_2 + x_1 x_3. We have

 
∇f(x_1, x_2, x_3) = (2x_1 − x_2 + x_3, 2x_2 − x_1, 2x_3 + x_1)^T and

∇^2 f(x_1, x_2, x_3) =
  [  2  −1   1 ]
  [ −1   2   0 ]
  [  1   0   2 ].
Proposition 11. Let f : Ω → R be of class C^2, where Ω is a
nonempty open convex set in R^n. For each x ∈ Ω and
h = (h_1, h_2, ..., h_n) ∈ R^n, let I = {t ∈ R | x + th ∈ Ω} and let
g_{x,h} : I → R be defined by g_{x,h}(t) = f(x + th). We have

g'_{x,h}(t) = ∇f(x + th)^T h and g''_{x,h}(t) = h^T ∇^2 f(x + th) h.

Proof. The differentiability of f gives the existence of a function
ε(h), defined for ‖h‖ < r, such that ε(h) → 0 as h → 0, and

f(x + h) − f(x) = ∇f(x)^T h + ‖h‖ ε(h).

Now, let t ∈ I be fixed. For all k ∈ R so that t + k ∈ I, we have

g_{x,h}(t + k) − g_{x,h}(t) = f(x + (t + k)h) − f(x + th)
                            = ∇f(x + th)^T (kh) + ‖kh‖ ε(kh)
                            = k ∇f(x + th)^T h + |k| ‖h‖ ε(kh),

and then

[g_{x,h}(t + k) − g_{x,h}(t)] / k = ∇f(x + th)^T h + (|k|/k) ‖h‖ ε(kh).

Letting k → 0, we have ‖kh‖ → 0, which gives

| (|k|/k) ‖h‖ ε(kh) | = ‖h‖ |ε(kh)| → 0.

Therefore

g'_{x,h}(t) = lim_{k→0} [g_{x,h}(t + k) − g_{x,h}(t)] / k
            = ∇f(x + th)^T h = Σ_{i=1}^n ∂f/∂x_i (x + th) h_i.

Now, for each i = 1, 2, ..., n, let g_i(t) = ∂f/∂x_i (x + th).
Applying the above result with ∂f/∂x_i in place of f, we get

g'_i(t) = Σ_{j=1}^n ∂/∂x_j (∂f/∂x_i)(x + th) h_j
        = Σ_{j=1}^n ∂²f/(∂x_j ∂x_i)(x + th) h_j,

and then

g''_{x,h}(t) = Σ_{i=1}^n h_i g'_i(t)
             = Σ_{i=1}^n h_i Σ_{j=1}^n ∂²f/(∂x_j ∂x_i)(x + th) h_j
             = h^T ∇^2 f(x + th) h,

as desired.
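Proposition 11 can be illustrated with finite differences. A sketch (the concrete f is our choice, taken from Example 6 (b); names `grad`, `H`, `g` are ours): central differences of g(t) = f(x + th) should match ∇f(x)·h and h^T ∇²f(x) h.

```python
import numpy as np

# Sketch checking Proposition 11 on a concrete smooth f:
# f(x) = x1^2 + x2^2 + x3^2 - x1*x2 + x1*x3, as in Example 6 (b).
def f(x):
    return x[0]**2 + x[1]**2 + x[2]**2 - x[0]*x[1] + x[0]*x[2]

def grad(x):
    return np.array([2*x[0] - x[1] + x[2], 2*x[1] - x[0], 2*x[2] + x[0]])

H = np.array([[2.0, -1.0, 1.0],
              [-1.0, 2.0, 0.0],
              [1.0, 0.0, 2.0]])   # Hessian (constant for a quadratic)

x = np.array([0.3, -1.2, 0.7])
h = np.array([1.0, 2.0, -1.0])
g = lambda t: f(x + t * h)

dt = 1e-5
# g'(0) ≈ ∇f(x)·h and g''(0) ≈ h^T H h, by central differences
g1 = (g(dt) - g(-dt)) / (2 * dt)
g2 = (g(dt) - 2 * g(0.0) + g(-dt)) / dt**2
assert abs(g1 - grad(x) @ h) < 1e-6
assert abs(g2 - h @ H @ h) < 1e-4
print("finite differences match ∇f(x)·h and h^T ∇²f(x) h")
```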
We have the following multidimensional version of Theorem 2.
Theorem 12. Let f : Ω → R be a function of class C^2 on an open
convex set Ω in R^n. The following statements are equivalent.
(i) f is a convex function.
(ii) f(y) ≥ f(x) + ∇f(x)^T (y − x), for all x, y ∈ Ω.
(iii) ∇^2 f(x) ≥ 0, for all x ∈ Ω.


Proof. For each z ∈ Ω and h ∈ R^n, let I = {t ∈ R | z + th ∈ Ω}
and let g_{z,h} : I → R be defined by g_{z,h}(t) = f(z + th), for all
t ∈ I. By Proposition 10, (i) is satisfied if and only if all the
functions g_{z,h} are convex. Using Theorem 2, the statement that
g_{z,h} is convex is equivalent to the two following statements:
(a) g_{z,h}(s) ≥ g_{z,h}(t) + g'_{z,h}(t)(s − t), for all s, t ∈ I.
(b) g''_{z,h}(t) ≥ 0, for all t ∈ I.
By Proposition 11,

g'_{z,h}(t) = ∇f(z + th)^T h and g''_{z,h}(t) = h^T ∇^2 f(z + th) h.

Letting z = x, h = y − x, s = 1, and t = 0 in (a), we have

f(y) = g_{x,h}(1) ≥ g_{x,h}(0) + g'_{x,h}(0) = f(x) + ∇f(x)^T (y − x),

which is (ii). Letting z = x and t = 0 in (b), we have

h^T ∇^2 f(x) h = g''_{x,h}(0) ≥ 0, for all h ∈ R^n,


which implies that ∇^2 f(x) ≥ 0, i.e., (iii) is proved. Conversely,
consider the function g_{z,h}(t) = f(z + th), z ∈ Ω and h ∈ R^n. For
s, t ∈ I, letting x = z + th and y = z + sh, we get

g_{z,h}(s) = f(z + sh) ≥ f(z + th) + ∇f(z + th)^T ((s − t)h)
           = g_{z,h}(t) + g'_{z,h}(t)(s − t)

whenever (ii) holds, and g''_{z,h}(t) = h^T ∇^2 f(z + th) h ≥ 0
whenever (iii) holds. Therefore, if (ii) or (iii) holds, then, by
Theorem 2, g_{z,h} is a convex function and the theorem is proved.
Condition (iii) is used to check that f is a convex function, and
condition (ii) shows that a solution of the equation ∇f(x) = 0 is a
global minimum of f on Ω.
As in the one-dimensional case, applying Theorem 12 to −f, we get
the following result for concave functions.
Corollary 13. Let f : Ω → R be a function of class C^2 on an open
convex set Ω in R^n. The following statements are equivalent.
(i) f is a concave function.
(ii) f(y) ≤ f(x) + ∇f(x)^T (y − x), for all x, y ∈ Ω.
(iii) ∇^2 f(x) ≤ 0, for all x ∈ Ω.
By combining these results, we have the following important result
for applications.
Corollary 14. Let Ω be an open convex set of R^n, let f : Ω → R be
of class C^1, i.e., all partial derivatives of f exist and are
continuous, and let x_0 ∈ Ω be such that ∇f(x_0) = 0.
(i) If f is a convex function then x_0 = arg min_{x∈Ω} f(x).
(ii) If f is a concave function then x_0 = arg max_{x∈Ω} f(x).

Example 7. Consider the quadratic form of Example 6 (b), with

f(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 − x_1 x_2 + x_1 x_3, (x_1, x_2, x_3) ∈ R^3,

∇f(x_1, x_2, x_3) = (2x_1 − x_2 + x_3, 2x_2 − x_1, 2x_3 + x_1)^T and

∇^2 f(x_1, x_2, x_3) =
  [  2  −1   1 ]
  [ −1   2   0 ]
  [  1   0   2 ].

The matrix ∇^2 f(x_1, x_2, x_3) has positive eigenvalues 2, 2 + √2,
and 2 − √2, which shows that ∇^2 f(x_1, x_2, x_3) ≥ 0, and then f is
a convex function. Therefore, the unique solution of the equation
∇f(x_1, x_2, x_3) = (0, 0, 0), namely (x_1, x_2, x_3) = (0, 0, 0), is
the global minimum of f on R^3.
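Example 7 can be verified numerically. A sketch (names are ours; note f(x) = (1/2) x^T H x for the Hessian H above, so ∇f(x) = Hx): check the eigenvalues and that x = 0 beats random points.

```python
import numpy as np

# Sketch verifying Example 7.
H = np.array([[2.0, -1.0, 1.0],
              [-1.0, 2.0, 0.0],
              [1.0, 0.0, 2.0]])

# Eigenvalues should be 2 - sqrt(2), 2, 2 + sqrt(2), all positive.
eig = np.sort(np.linalg.eigvalsh(H))
assert np.allclose(eig, [2 - np.sqrt(2), 2.0, 2 + np.sqrt(2)])

# f(x) = (1/2) x^T H x here, so ∇f(x) = H x and the unique stationary
# point is x = 0; check it beats random points.
f = lambda x: 0.5 * x @ H @ x
rng = np.random.default_rng(1)
assert all(f(rng.standard_normal(3)) >= f(np.zeros(3)) for _ in range(1000))
print("∇²f ≥ 0 and x = 0 is the global minimum")
```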


Definition 15. Let q = (q_1, q_2, ..., q_n) ∈ R^n and let
A = [a_ij] ∈ R^{n×n}.
(i) The function f_q : R^n → R defined by

f_q(x) = <q, x> = Σ_{i=1}^n q_i x_i ≡ q^T x,

for x = (x_1, x_2, ..., x_n) ∈ R^n, is called a linear map.
(ii) The function f_A : R^n → R defined by

f_A(x) = Σ_{i=1}^n Σ_{j=1}^n a_ij x_i x_j = x^T A x,

for x = (x_1, x_2, ..., x_n) ∈ R^n, is called a quadratic form.

Example 8. (a) The function f(x_1, x_2, x_3) = 2x_1 − x_2 + 3x_3 of
Example 6 (a) is a linear map, f = f_q, with q = (2, −1, 3).
(b) The function f(x_1, x_2, x_3) = x_1^2 + x_2^2 + x_3^2 − x_1 x_2 + x_1 x_3
of Example 6 (b) is a quadratic form, f = f_A, with

A =
  [   1   −1/2   1/2 ]
  [ −1/2    1     0  ]
  [  1/2    0     1  ].

Remark. Let f_A(x) = x^T A x be a quadratic form on R^n and let
B = (1/2)(A + A^T). Then B is symmetric and

x^T B x = (1/2)(x^T A x + x^T A^T x) = x^T A x,

since x^T A^T x = (x^T A^T x)^T = x^T A x ∈ R^{1×1} = R. So we can
assume that the matrix A that defines the quadratic form
f_A(x) = x^T A x is symmetric.
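The Remark is easy to confirm numerically. A sketch (the random matrix and names are ours): the quadratic form of an arbitrary A agrees everywhere with that of its symmetric part B = (1/2)(A + Aᵀ).

```python
import numpy as np

# Sketch of the Remark: symmetrizing A leaves the quadratic form unchanged.
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))       # an arbitrary (non-symmetric) matrix
B = 0.5 * (A + A.T)                   # its symmetric part

for _ in range(100):
    x = rng.standard_normal(4)
    assert np.isclose(x @ A @ x, x @ B @ x)
print("x^T A x == x^T B x for all sampled x")
```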

Proposition 16. Let f_q(x) be a linear map with
q = (q_1, q_2, ..., q_n) ∈ R^n and let f_A(x) = x^T A x be a quadratic
form with symmetric matrix A = [a_ij] ∈ R^{n×n}. We have
(i) ∇f_q(x) = q and ∇^2 f_q(x) = 0, for all x ∈ R^n.
(ii) ∇f_A(x) = 2Ax and ∇^2 f_A(x) = 2A, for all x ∈ R^n.
Proof. (i) The identity f_q(x) = q_1 x_1 + ... + q_i x_i + ... + q_n x_n
gives ∂f_q/∂x_i (x) = q_i, for all i = 1, 2, ..., n, and
∂²f_q/(∂x_j ∂x_i)(x) = 0, for all i, j = 1, 2, ..., n.
(ii) Since ∂(x_i x_j)/∂x_k = 0 when k ≠ i, j; ∂(x_k x_j)/∂x_k = x_j
when j ≠ k; and ∂(x_k x_k)/∂x_k = 2x_k, it follows from the identity
f_A(x) = Σ_{i=1}^n Σ_{j=1}^n a_ij x_i x_j that, for each
k = 1, 2, ..., n,


∂f_A/∂x_k (x) = Σ_{i=1}^n Σ_{j=1}^n a_ij ∂(x_i x_j)/∂x_k
             = 2a_kk x_k + Σ_{i≠k} a_ik x_i + Σ_{j≠k} a_kj x_j
             = 2 Σ_{i=1}^n a_ki x_i,   (4)

using the symmetry a_ik = a_ki. This is the k-th entry of 2Ax; hence
∇f_A(x) = 2Ax. Now, by (4), ∂f_A/∂x_k (x) is a linear map with
q = (2a_k1, 2a_k2, ..., 2a_kn), the k-th row of 2A. By (i),
∂²f_A/(∂x_l ∂x_k)(x) = 2a_kl, for each l, k = 1, 2, ..., n, which
gives ∇^2 f_A(x) = 2A, as desired.
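Proposition 16 (ii) can be checked with a numerical gradient. A sketch (random symmetric A and the names are ours): the central-difference gradient of x^T A x should match 2Ax.

```python
import numpy as np

# Sketch checking Proposition 16 (ii): for symmetric A, ∇(x^T A x) = 2Ax.
rng = np.random.default_rng(3)
M = rng.standard_normal((3, 3))
A = 0.5 * (M + M.T)                   # a random symmetric matrix

fA = lambda x: x @ A @ x
x = rng.standard_normal(3)

# central-difference gradient
eps = 1e-6
num_grad = np.array([
    (fA(x + eps * e) - fA(x - eps * e)) / (2 * eps)
    for e in np.eye(3)
])
assert np.allclose(num_grad, 2 * A @ x, atol=1e-6)
print("numerical gradient matches 2Ax")
```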


Finally, consider a matrix A ∈ R^{N×n}, with n ≤ N and rank A = n,
and a vector b ∈ R^N. Let the function f : R^n → R be defined by
f(x) = ‖Ax − b‖^2. We have

f(x) = <Ax − b, Ax − b> = (Ax − b)^T (Ax − b)
     = (x^T A^T − b^T)(Ax − b)
     = x^T A^T A x − b^T A x − x^T A^T b + b^T b
     = x^T (A^T A) x − 2(b^T A) x + b^T b,

since b^T A x = x^T A^T b. Therefore

∇f(x) = 2(A^T A) x − 2A^T b and ∇^2 f(x) = 2(A^T A).

Since

h^T ∇^2 f(x) h = 2 h^T (A^T A) h = 2 (h^T A^T)(Ah) = 2 ‖Ah‖^2 ≥ 0,

for all h ∈ R^n, the Hessian ∇^2 f(x) is a positive semidefinite
matrix, so f is convex. Moreover, rank(A^T A) = rank A = n implies
that the matrix A^T A is invertible.

Hence, the solution x = (A^T A)^{-1} A^T b of the equation
∇f(x) = 0 is the global minimum of f.
We state this result in the following theorem for future applications.
Theorem 17. Let A ∈ R^{N×n}, with n ≤ N, be such that
rank A = n = min(N, n), and let b ∈ R^N. We have

(A^T A)^{-1} A^T b = arg min_{x∈R^n} ‖Ax − b‖^2.

Example 9. Let b = (2, 2, 5, 8) ∈ R^4 and let

A =
  [ 1  1 ]
  [ 1  2 ]
  [ 1  3 ]
  [ 1  4 ]  ∈ R^{4×2}.

It is easy to see that dim(col(A)) = 2 = rank(A) = min(4, 2),

A^T A =
  [  4  10 ]
  [ 10  30 ]

is an invertible matrix, and

(A^T A)^{-1} A^T b =
  [  4  10 ]^{-1} [ 17 ]   [ −1  ]
  [ 10  30 ]      [ 53 ] = [ 2.1 ].

Hence

(−1, 2.1) = arg min_{x∈R^2} ‖Ax − b‖^2.

Now, with

A =
  [ 1  1   1 ]
  [ 1  2   4 ]
  [ 1  3   9 ]
  [ 1  4  16 ]  ∈ R^{4×3},

we have dim(col(A)) = 3 = rank(A) = min(4, 3),

A^T A =
  [  4   10   30 ]
  [ 10   30  100 ]
  [ 30  100  354 ]

is an invertible matrix, and

(A^T A)^{-1} A^T b =
  [  4   10   30 ]^{-1} [  17 ]   [  2.75 ]
  [ 10   30  100 ]      [  53 ] = [ −1.65 ]
  [ 30  100  354 ]      [ 183 ]   [  0.75 ].

Hence

(2.75, −1.65, 0.75) = arg min_{x∈R^3} ‖Ax − b‖^2.
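Both least-squares fits in Example 9 can be reproduced with NumPy. A sketch (variable names are ours): solve the normal equations (A^T A) x = A^T b and cross-check against `numpy.linalg.lstsq`.

```python
import numpy as np

# Sketch verifying Example 9 with NumPy.
b = np.array([2.0, 2.0, 5.0, 8.0])
t = np.array([1.0, 2.0, 3.0, 4.0])

# Linear fit: columns [1, t]
A1 = np.column_stack([np.ones(4), t])
x1 = np.linalg.solve(A1.T @ A1, A1.T @ b)      # normal equations
assert np.allclose(x1, [-1.0, 2.1])
assert np.allclose(x1, np.linalg.lstsq(A1, b, rcond=None)[0])

# Quadratic fit: columns [1, t, t^2]
A2 = np.column_stack([np.ones(4), t, t**2])
x2 = np.linalg.solve(A2.T @ A2, A2.T @ b)
assert np.allclose(x2, [2.75, -1.65, 0.75])
print("normal-equation solutions match the slides")
```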
