
E1 251 Linear and Nonlinear Optimization

Chapter 4: Convex and quadratic functions
4.1 Definition: Convex function

(1) A function f(x) is called a convex function if, for every x and y in ℝⁿ and for every θ in [0,1], we have

    f(θx + (1 − θ)y) ≤ θ f(x) + (1 − θ) f(y).

(2) A function f(x) is convex if and only if, for every vector d, the one-dimensional function g_d(α) = f(x + αd) is convex.

[Figure: graph of f with the chord between (x, f(x)) and (y, f(y)); at θx + (1 − θ)y the chord lies above the graph.]
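The definition can be spot-checked numerically. The following is a minimal sketch, not part of the slides; the choice f(x) = ‖x‖² and the random sampling are purely illustrative.

```python
# Spot-check the convexity inequality f(θx + (1-θ)y) <= θ f(x) + (1-θ) f(y)
# for a sample convex function (illustrative choice: f(x) = ||x||^2).
import numpy as np

def f(x):
    return np.dot(x, x)

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=3)
    y = rng.normal(size=3)
    theta = rng.uniform(0.0, 1.0)
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    assert lhs <= rhs + 1e-12   # inequality holds up to round-off
print("convexity inequality held on all random trials")
```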
Functions taxonomy

Convex:
• Linear: f(x) = bᵀx
• Affine: f(x) = bᵀx + c
• Convex quadratic: f(x) = xᵀQx − 2bᵀx + c, with Q positive semidefinite

Non-convex:
• Others
Local minimum is global minimum

Theorem 4.1A
For a convex function f(x), if x* is a local minimum in ℝⁿ, it is also a global minimum in ℝⁿ.
Proof:
1) If x* is not a global minimum, then there exists a y such that f(y) < f(x*).

2) For any point on the segment {αy + (1 − α)x*; 0 < α < 1}, convexity of f(x) gives
   f(αy + (1 − α)x*) ≤ α f(y) + (1 − α) f(x*),
   where the left-hand side is denoted T1 and the right-hand side T2.

3) Since f(y) < f(x*) by assumption, T2 < f(x*). This implies
   T1 = f(αy + (1 − α)x*) < f(x*).

4) For α arbitrarily small, this contradicts the fact that x* is a local minimum.

5) Hence, for a convex function f(x) having a local minimum at x*, there is no y satisfying f(y) < f(x*). Hence, a local minimum is also a global minimum.
4.2 Derivative-based characterization

Theorem 4.2A
A function f(x) is convex in ℝⁿ if and only if, for every pair of vectors x and y in ℝⁿ, we have
    f(y) ≥ f(x) + (y − x)ᵀ∇f(x)

• The first-order Taylor approximation is a global underestimator of f at all points.
• ∇f(x*) = 0 gives a global minimizer.
[Figure: graph of f(x) together with the tangent line f(y) + (x − y)ᵀ∇f(y) at the point (y, f(y)); the graph lies above the tangent, i.e., f(x) ≥ f(y) + (x − y)ᵀ∇f(y).]
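A minimal numerical sketch of the underestimator property, not part of the slides; the quadratic f(x) = 0.5xᵀQx − bᵀx with a positive definite Q and its gradient Qx − b (derived later in Section 4.5) are illustrative choices.

```python
# Check f(y) >= f(x) + (y - x)^T ∇f(x) for a convex quadratic example.
import numpy as np

Q = np.array([[4.0, 2.0], [2.0, 3.0]])   # positive definite (illustrative)
b = np.array([1.0, 2.0])

def f(x):
    return 0.5 * x @ Q @ x - b @ x

def grad_f(x):
    return Q @ x - b

rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-10
print("first-order underestimator property held on all random trials")
```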
Theorem 4.2B
A twice-differentiable function f(x) is convex if and only if its Hessian F(x) is positive semidefinite for all x.

Implications of Theorems 4.2A and 4.2B in unconstrained optimization

1) Any point satisfying ∇f(x) = 0 is a global minimizer.

2) If two points satisfy ∇f(x) = 0, then the function values at those points are equal.

3) If FONC-Min is satisfied, SONC-Min is also satisfied.
4.3 Examples of convex functions

Single-variable functions

f1(x) = x²
f2(x) = x⁴
f3(x) = 0 for |x| ≤ a, (x − a)² for x > a, (x + a)² for x < −a   (a > 0)
f4(x) = e^(ax)
f5(x) = xᵃ, for x > 0 (a ≥ 1 or a ≤ 0)
f6(x) = −log(x), for x > 0
f7(x) = x log(x), for x > 0
Multivariable functions

f1(x1, x2) = (x1 + x2)² + (x1 − x2)³

f2(x1, x2) = (x1 + x2)² + (x1 − x2)⁴

f3(x1, x2) = x1² / x2, for x2 > 0
Convexity of norms

A function f(x) is a norm if it satisfies the following:

(1) Positivity: f(x) ≥ 0, and f(x) = 0 if and only if x = 0.
(2) Homogeneity: f(rx) = |r| f(x), for every r ∈ ℝ.
(3) Triangle inequality: f(x + y) ≤ f(x) + f(y).

Proof of convexity: for θ ∈ [0, 1],
f(θx + (1 − θ)y) ≤ f(θx) + f((1 − θ)y)       (triangle inequality)
                = θ f(x) + (1 − θ) f(y)       (homogeneity, since θ ≥ 0 and 1 − θ ≥ 0)
Convexity of the max function

f(x) = max(x1, ..., xn)

f(θx + (1 − θ)y) = max(θx1 + (1 − θ)y1, ..., θxn + (1 − θ)yn)
                 ≤ max(θx1, ..., θxn) + max((1 − θ)y1, ..., (1 − θ)yn)
                 ≤ θ max(x1, ..., xn) + (1 − θ) max(y1, ..., yn)
4.4 The quadratic function in standard form

4.4.1 Definition

f(x) = 0.5 xᵀQx − bᵀx + c

x: n × 1 vector of optimization variables
b: n × 1 constant vector
Q: n × n symmetric matrix
c: scalar constant

If Q is not symmetric, let Q′ = ½(Q + Qᵀ), and let

f′(x) = 0.5 xᵀQ′x − bᵀx + c.

f(x) and f′(x) are identical.
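A minimal numerical sketch of the symmetrization step, not part of the slides; the matrix sizes and values are illustrative.

```python
# Replacing a non-symmetric Q with Q' = (Q + Q^T)/2 leaves the quadratic unchanged.
import numpy as np

rng = np.random.default_rng(2)
Q = rng.normal(size=(4, 4))          # deliberately non-symmetric
Qp = 0.5 * (Q + Q.T)                 # symmetrized matrix
b = rng.normal(size=4)
c = 8.0

for _ in range(100):
    x = rng.normal(size=4)
    fx  = 0.5 * x @ Q  @ x - b @ x + c
    fpx = 0.5 * x @ Qp @ x - b @ x + c
    assert abs(fx - fpx) < 1e-10     # f(x) and f'(x) agree at every x
print("f(x) == f'(x) for all sampled x")
```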
4.4.2 Properties of symmetric matrices

(1) Eigenvalues are real.

(2) Eigenvectors can be assumed to be real.

(3) Eigenvectors can be chosen to be mutually orthogonal.

(4) Symmetric matrices are diagonalizable.

(5) If Q is symmetric, then Q can be expressed as Q = UΛUᵀ, where U is an orthogonal matrix and Λ is the diagonal matrix of eigenvalues.
(6) Positive definiteness

(a) An n × n symmetric matrix Q is called positive definite if vᵀQv > 0 for every nonzero v ∈ ℝⁿ.

(b) Since Q is symmetric, we have Q = UΛUᵀ. Also, any v can be expressed as v = Us for some s (and v ≠ 0 if and only if s ≠ 0).

(a) and (b) imply:

An n × n symmetric matrix Q is positive definite if and only if sᵀΛs > 0 for every nonzero s ∈ ℝⁿ,

which in turn implies:

An n × n symmetric matrix Q is positive definite if and only if all of its eigenvalues are positive.
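A minimal sketch of the eigenvalue test, not part of the slides; the two test matrices are illustrative.

```python
# Check positive definiteness both by the sign of the eigenvalues and by
# sampling v^T Q v over random nonzero vectors v.
import numpy as np

def is_positive_definite(Q, n_samples=1000, seed=3):
    eigvals = np.linalg.eigvalsh(Q)              # eigenvalues of a symmetric matrix
    by_eigs = np.all(eigvals > 0)
    rng = np.random.default_rng(seed)
    vs = rng.normal(size=(n_samples, Q.shape[0]))
    by_quad = np.all(np.einsum('ij,jk,ik->i', vs, Q, vs) > 0)
    return by_eigs, by_quad

print(is_positive_definite(np.array([[4.0, 2.0], [2.0, 3.0]])))   # both eigenvalues positive
print(is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]])))   # has a negative eigenvalue
```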
4.5 Gradient and Hessian of the quadratic form

Exercise
Compute the gradient and Hessian of the following functions:

f(x) = xᵀ [ 4  2 ] x − [ 1 ]ᵀ x + 8
          [ 2  3 ]     [ 2 ]

f(x) = xᵀ [ 4  1 ] x − [ 1 ]ᵀ x + 8
          [ 3  3 ]     [ 2 ]
4.5.1 Gradient of the general quadratic form

Direct approach

1) Show that ∂/∂xₗ (xᵀQx) = (Qx)ₗ + (Qᵀx)ₗ, i.e., the l-th row of Q times x plus the l-th column of Q times x.

2) Then ∂/∂xₗ (0.5 xᵀQx) − ∂/∂xₗ (bᵀx) = 0.5[(Qx)ₗ + (Qᵀx)ₗ] − bₗ.

3) Hence ∇(0.5 xᵀQx) − ∇(bᵀx) = 0.5(Qᵀx + Qx) − b.

4) Because of the symmetry of Q, we have

∇f(x) = Qx − b
Alternative approach

f(x) = 0.5 xᵀQx − bᵀx + c

g[x,d](α) = f(x + αd)
          = 0.5 xᵀQx + 0.5 α² dᵀQd + α dᵀQx − bᵀx − α bᵀd + c

We know that
g′[x,d](α) = dᵀ∇f(x + αd)
g′[x,d](0) = dᵀ∇f(x)                         (E1)
We can also compute g′[x,d](0) directly:

g[x,d](α) = 0.5 xᵀQx + 0.5 α² dᵀQd + α dᵀQx − bᵀx − α bᵀd + c
g′[x,d](α) = α dᵀQd + dᵀQx − bᵀd
g′[x,d](0) = dᵀQx − bᵀd                      (E2)

Comparing (E1) and (E2), we get

∇f(x) = Qx − b
4.5.2 Hessian of the quadratic form

Direct approach

∇(f(x)) = Qx − b                             (E1)

{F(x)}_{i,j} = ∂²f(x)/∂x_i∂x_j = ∂/∂x_i ({∇f(x)}_j)

From (E1) we get {∇f(x)}_j = ∑_{k=1}^{n} Q_{j,k} x_k − b_j

Hence {F(x)}_{i,j} = ∂/∂x_i ({∇f(x)}_j) = Q_{j,i} = Q_{i,j}   (by symmetry)

⇒ F(x) = Q
Alternative approach

f(x) = 0.5 xᵀQx − bᵀx + c
g[x,d](α) = f(x + αd)
          = 0.5 xᵀQx + 0.5 α² dᵀQd + α dᵀQx − bᵀx − α bᵀd + c

g″[x,d](α) = dᵀQd                            (A)

We know from Chapter 2 that
g″[x,d](α) = dᵀ H(x + αd) d,                 (B)
where H denotes the Hessian.

(A) and (B) imply H(x) = Q.
Summary: Gradient and Hessian of the quadratic form

f(x) = 0.5 xᵀQx − bᵀx + c

Gradient:
∇(f(x)) = Qx − b

Hessian:
F(x) = Q
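A minimal numerical sketch, not part of the slides, that verifies both formulas against finite differences; Q, b, c, and the test point are illustrative.

```python
# Verify ∇f(x) = Qx − b and F(x) = Q by central finite differences.
import numpy as np

Q = np.array([[4.0, 2.0], [2.0, 3.0]])
b = np.array([1.0, 2.0])
c = 8.0
f = lambda x: 0.5 * x @ Q @ x - b @ x + c

x0 = np.array([0.7, -1.3])
h = 1e-5
n = x0.size
I = np.eye(n)
grad_fd = np.zeros(n)
hess_fd = np.zeros((n, n))
for i in range(n):
    grad_fd[i] = (f(x0 + h * I[i]) - f(x0 - h * I[i])) / (2 * h)
    for j in range(n):
        hess_fd[i, j] = (f(x0 + h * I[i] + h * I[j]) - f(x0 + h * I[i] - h * I[j])
                         - f(x0 - h * I[i] + h * I[j]) + f(x0 - h * I[i] - h * I[j])) / (4 * h**2)

print(np.allclose(grad_fd, Q @ x0 - b, atol=1e-5))   # matches Qx − b
print(np.allclose(hess_fd, Q, atol=1e-3))            # matches Q
```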
4.6 Minimization of quadratic forms

(1) FONC
∇(f(x*)) = Qx* − b = 0, i.e., Qx* = b must be solvable.

(2a) SONC
The Hessian F(x) = Q is positive semidefinite.

(2b) SOSC-SMin
Q is positive definite.
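A minimal sketch, not part of the slides, of how these conditions translate into computation; the data are illustrative.

```python
# Minimize f(x) = 0.5 x^T Q x − b^T x + c by solving Qx = b, and use the
# eigenvalues of Q to comment on existence/uniqueness of the minimizer.
import numpy as np

Q = np.array([[4.0, 2.0], [2.0, 3.0]])   # positive definite -> unique minimizer
b = np.array([1.0, 2.0])

eigvals = np.linalg.eigvalsh(Q)
if np.all(eigvals > 0):
    x_star = np.linalg.solve(Q, b)       # FONC: Qx* = b; SOSC-SMin holds
    print("unique minimizer:", x_star)
elif np.all(eigvals >= 0):
    # Q only positive semidefinite: minimizers exist iff Qx = b is solvable,
    # and then they are non-unique (any null-space vector can be added).
    x_star, *_ = np.linalg.lstsq(Q, b, rcond=None)
    print("candidate minimizer (non-unique if consistent):", x_star)
else:
    print("Q has a negative eigenvalue: f is unbounded below, no minimizer")
```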
Write the following functions in quadratic form and comment on the existence and uniqueness of minima.

f1(x1, x2) = x1² + x2²

f2(x1, x2) = x1² + x2² + x1x2

f3(x1, x2) = x1² − x2²

f4(x1, x2) = x1x2

f5(x1, x2) = x1² + x2² + 2x1x2

f6(x1, x2) = x1² + x2² + 2x1x2 + x1 + x2

f7(x1, x2) = x1² + x2² + 2x1x2 + x1 − x2
f1(x) = ∑_{j=1}^{n} ∑_{k=j+1}^{n} x_j x_k

f2(x) = ∑_{j=1}^{n} ∑_{k=1}^{n} x_j x_k

f3(x) = ∑_{j=1}^{n} ∑_{k=1}^{n} x_j x_k + ∑_{k=1}^{n} x_k

f4(x) = ∑_{j=1}^{n} ∑_{k=1}^{n} x_j x_k + ∑_{k=1}^{n} (−1)^k x_k
Least squares solution of Ax = b

The least squares solution is defined as the minimizer of
f(x) = ‖Ax − b‖²
     = (Ax − b)ᵀ(Ax − b)
     = xᵀAᵀAx − 2bᵀAx + bᵀb

The minimum is given by ∇f(x*) = 2AᵀAx* − 2Aᵀb = 0
⇒ AᵀAx* = Aᵀb, for which a solution always exists (why?).

The Hessian F(x) = 2AᵀA, which is always positive semidefinite (why?).
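A minimal sketch, not part of the slides; the random tall matrix A and vector b are illustrative.

```python
# Least squares via the normal equations A^T A x = A^T b, compared with
# the library routine np.linalg.lstsq.
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(20, 5))     # tall matrix, full column rank with probability 1
b = rng.normal(size=20)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)        # solve the normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)     # library least squares
print(np.allclose(x_normal, x_lstsq))                # the two solutions agree
```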
Suppose A has full column rank. Then AᵀA is invertible, and
x* = (AᵀA)⁻¹Aᵀb.

Then b̂ = A(AᵀA)⁻¹Aᵀb is the least squares approximation of b in the column space of A.

(1) Applying this approximation to b̂ leaves it unchanged, i.e., P² = P, where P = A(AᵀA)⁻¹Aᵀ.

(2) P is called the projection matrix, and the least squares approximation is also called the least squares projection (onto the column space of A).
(3) The projection error b̃ = b − b̂ = (I − P)b is orthogonal to b̂, as a consequence of the properties P² = P and Pᵀ = P (verify).

(4) Hence least squares projection is synonymous with orthogonal projection.

(5) The converse is also true: any matrix P satisfying P² = P and Pᵀ = P corresponds to a least squares projection onto a subspace.

(6) R(P) = R(A) and N(P) = N(Aᵀ). Hence R(P)⊥ = N(P).
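A minimal sketch, not part of the slides, that checks these projection properties on random illustrative data.

```python
# Verify P^2 = P, P^T = P, and orthogonality of the residual (I − P)b to b̂ = Pb.
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

P = A @ np.linalg.inv(A.T @ A) @ A.T     # projection onto the column space of A
b_hat = P @ b                            # least squares approximation of b
b_err = b - b_hat                        # projection error (I − P)b

print(np.allclose(P @ P, P))             # idempotent
print(np.allclose(P.T, P))               # symmetric
print(abs(b_hat @ b_err) < 1e-10)        # residual orthogonal to the projection
```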
(7) Suppose B is a matrix whose columns form a basis for the null space of Aᵀ. Then P′ = B(BᵀB)⁻¹Bᵀ is an orthogonal projection onto the column space of B.

(8) P′ = I − P (verify).

(9) R(P′) = R(B) and N(P′) = N(Bᵀ). Hence R(P′)⊥ = N(P′).

(10) Combine the relations in (6), (8), and (9) by using the relationship between B and A.
Regularized least squares solution of Ax = b

The regularized least squares solution is defined as the minimizer of
f(x) = ‖Ax − b‖² + λ‖x‖²
     = (Ax − b)ᵀ(Ax − b) + λ‖x‖²
     = xᵀ(AᵀA + λI)x − 2bᵀAx + bᵀb

The minimum is given by
∇f(x*) = 2(AᵀA + λI)x* − 2Aᵀb = 0
⇒ (AᵀA + λI)x* = Aᵀb, for which the solution always exists.

The Hessian F(x) = 2(AᵀA + λI), which is always positive definite (for λ > 0).
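A minimal sketch, not part of the slides; λ and the random data are illustrative.

```python
# Regularized (ridge) least squares: solve (A^T A + λI) x = A^T b.
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)
lam = 0.1

x_reg = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
print(x_reg)
# As λ -> 0 this approaches the ordinary least squares solution;
# larger λ shrinks x_reg toward zero.
```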
Exercise: Polynomial approximation

We have samples of a function g(x) at locations {s_i; i = 1, ..., N}, given by {g(s_i) = g_i; i = 1, ..., N}. Find an approximating function of the form

h(x) = ∑_{k=1}^{n} a_k x^(k−1),

such that

f(a_1, ..., a_n) = ∑_{i=1}^{N} (h(s_i) − g_i)²

is minimized.
Let a = [a_1 ... a_n]ᵀ, h = [h(s_1) ... h(s_N)]ᵀ, g = [g(s_1) ... g(s_N)]ᵀ.
Then
h = Sa, where {S}_{i,j} = (s_i)^(j−1).

Then f(a_1, ..., a_n) = ‖h − g‖² = ‖Sa − g‖² = (Sa − g)ᵀ(Sa − g)
                     = aᵀ(SᵀS)a − 2aᵀSᵀg + gᵀg
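A minimal sketch, not part of the slides; the target g(x) = sin(2πx), the sample locations, and the polynomial degree are illustrative.

```python
# Polynomial least squares fit: form {S}_{i,j} = s_i^(j−1) and solve (S^T S) a = S^T g.
import numpy as np

N, n = 50, 4                                  # number of samples, number of coefficients
s = np.linspace(0.0, 1.0, N)                  # sample locations s_i
g = np.sin(2 * np.pi * s)                     # sampled values g_i (illustrative)

S = np.vander(s, n, increasing=True)          # columns 1, x, x^2, ..., x^(n-1)
a = np.linalg.solve(S.T @ S, S.T @ g)         # coefficients a_1, ..., a_n
print("fitted coefficients:", a)
print("fit error:", np.linalg.norm(S @ a - g))
```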
Exercise: Polynomial approximation II

We have samples of a function g(x) at locations {s_i; i = 1, ..., N}, given by {g(s_i) = g_i; i = 1, ..., N}. Find an approximating function of the form

h(x) = ∑_{k=1}^{n} a_k x^(k−1),

such that

f(a_1, ..., a_n) = ∑_{i=1}^{N} (h(s_i) − g_i)² + λ ∑_{i=1}^{n} a_i²

is minimized.

Then f(a_1, ..., a_n) = aᵀ(SᵀS + λI)a − 2aᵀSᵀg + gᵀg.
Exercise: Polynomial approximation III

We have samples of a function g(x) at locations {s_i; i = 1, ..., N}, given by {g(s_i) = g_i; i = 1, ..., N}. Find an approximating function of the form

h(x) = ∑_{k=1}^{n} a_k x^(k−1),

such that

f(a_1, ..., a_n) = ∑_{i=1}^{N} (h(s_i) − g_i)² + λ ∫₀¹ h²(x) dx

is minimized.

f(a_1, ..., a_n) = aᵀ(SᵀS + λP)a − 2aᵀSᵀg + gᵀg, where

{P}_{i,j} = ∫₀¹ x^(i+j−2) dx
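A minimal sketch, not part of the slides; the data are illustrative, and the closed form {P}_{i,j} = ∫₀¹ x^(i+j−2) dx = 1/(i + j − 1) follows directly from the integral above.

```python
# Regularized polynomial fit with the smoothing matrix P and (S^T S + λP) a = S^T g.
import numpy as np

N, n, lam = 50, 4, 0.1
s = np.linspace(0.0, 1.0, N)
g = np.sin(2 * np.pi * s)                       # illustrative samples

S = np.vander(s, n, increasing=True)
i, j = np.indices((n, n)) + 1                   # 1-based indices
P = 1.0 / (i + j - 1)                           # {P}_{i,j} = 1/(i + j − 1)

a = np.linalg.solve(S.T @ S + lam * P, S.T @ g)
print("regularized coefficients:", a)
```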
Exercise: Polynomial approximation IV

We have samples of a function g(x) at locations {s_i; i = 1, ..., N}, given by {g(s_i) = g_i; i = 1, ..., N}. Find an approximating function of the form

h(x) = ∑_{k=1}^{n} a_k x^(k−1),

such that

f(a_1, ..., a_n) = ∑_{i=1}^{N} (h(s_i) − g_i)² + λ ∫₀¹ (d²h(x)/dx²)² dx

is minimized.
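A minimal sketch, not part of the slides; the data are illustrative, and the curvature-penalty matrix is built by numerical quadrature rather than a closed form.

```python
# Curvature-penalized polynomial fit: {R}_{i,j} ≈ ∫_0^1 h_i''(x) h_j''(x) dx for the
# monomial basis h_k(x) = x^(k−1), then solve (S^T S + λR) a = S^T g.
import numpy as np

N, n, lam = 50, 5, 0.1
s = np.linspace(0.0, 1.0, N)
g = np.sin(2 * np.pi * s)                        # illustrative samples
S = np.vander(s, n, increasing=True)

M = 4000
xq = (np.arange(M) + 0.5) / M                    # midpoint rule on [0, 1]
dx = 1.0 / M
# Second derivative of x^(k−1) is (k−1)(k−2) x^(k−3) for k >= 3, and 0 otherwise.
d2 = np.zeros((n, M))
for k in range(3, n + 1):
    d2[k - 1] = (k - 1) * (k - 2) * xq ** (k - 3)
R = (d2 @ d2.T) * dx                             # R[i, j] ≈ ∫ h_i'' h_j'' dx

a = np.linalg.solve(S.T @ S + lam * R, S.T @ g)
print("coefficients with curvature penalty:", a)
```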
