
E1 251 Linear and Nonlinear Optimization

Chapter 4: Convex and quadratic functions
4.1 Definition: Convex function

(1) A function f(x) is called a convex function if, for every x and y in ℝⁿ and for every θ in [0,1], we have

    f(θx + (1 − θ)y) ≤ θ f(x) + (1 − θ) f(y).

(2) A function f(x) is convex if and only if, for every vector d, the one-dimensional function g_d(α) = f(x + αd) is convex.

[Figure: graph of f with the chord between (x, f(x)) and (y, f(y)); at θx + (1 − θ)y the chord lies above the graph.]
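The definition can be spot-checked numerically. The following is a minimal sketch, not part of the slides; the choice f(x) = ‖x‖² and the random sampling are purely illustrative.

```python
# Spot-check the convexity inequality f(θx + (1-θ)y) <= θ f(x) + (1-θ) f(y)
# for a sample convex function (illustrative choice: f(x) = ||x||^2).
import numpy as np

def f(x):
    return np.dot(x, x)

rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.normal(size=3)
    y = rng.normal(size=3)
    theta = rng.uniform(0.0, 1.0)
    lhs = f(theta * x + (1 - theta) * y)
    rhs = theta * f(x) + (1 - theta) * f(y)
    assert lhs <= rhs + 1e-12   # inequality holds up to round-off
print("convexity inequality held on all random trials")
```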
Functions taxonomy

Convex:
• Linear: f(x) = bᵀx
• Affine: f(x) = bᵀx + c
• Convex quadratic: f(x) = xᵀQx − 2bᵀx + c, with Q positive semidefinite

Non-convex:
• Others
Local minimum is global minimum

Theorem 4.1A
For a convex function f(x), if x* is a local minimum in ℝⁿ, it is also a global minimum in ℝⁿ.
Proof:
1) If x* is not a global minimum, then there exists a y such that f(y) < f(x*).

2) For any point on the segment {αy + (1 − α)x*; 0 < α < 1}, convexity of f(x) gives
   f(αy + (1 − α)x*) ≤ α f(y) + (1 − α) f(x*),
   where the left-hand side is denoted T1 and the right-hand side T2.

3) Since f(y) < f(x*) by assumption, T2 < f(x*). This implies
   T1 = f(αy + (1 − α)x*) < f(x*).

4) For α arbitrarily small, this contradicts the fact that x* is a local minimum.

5) Hence, for a convex function f(x) having a local minimum at x*, there is no y satisfying f(y) < f(x*). Hence, a local minimum is also a global minimum.
4.2 Derivative-based characterization

Theorem 4.2A
A function f(x) is convex in ℝⁿ if and only if, for every pair of vectors x and y in ℝⁿ, we have
    f(y) ≥ f(x) + (y − x)ᵀ∇f(x)

• The first-order Taylor approximation is a global underestimator of f at all points.
• ∇f(x*) = 0 gives a global minimizer.
[Figure: graph of f(x) together with the tangent line f(y) + (x − y)ᵀ∇f(y) at the point (y, f(y)); the graph lies above the tangent, i.e., f(x) ≥ f(y) + (x − y)ᵀ∇f(y).]
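A minimal numerical sketch of the underestimator property, not part of the slides; the quadratic f(x) = 0.5xᵀQx − bᵀx with a positive definite Q and its gradient Qx − b (derived later in Section 4.5) are illustrative choices.

```python
# Check f(y) >= f(x) + (y - x)^T ∇f(x) for a convex quadratic example.
import numpy as np

Q = np.array([[4.0, 2.0], [2.0, 3.0]])   # positive definite (illustrative)
b = np.array([1.0, 2.0])

def f(x):
    return 0.5 * x @ Q @ x - b @ x

def grad_f(x):
    return Q @ x - b

rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.normal(size=2), rng.normal(size=2)
    assert f(y) >= f(x) + grad_f(x) @ (y - x) - 1e-10
print("first-order underestimator property held on all random trials")
```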
Theorem 4.2B
A twice-differentiable function f(x) is convex if and only if its Hessian F(x) is positive semidefinite for all x.

Implications of Theorems 4.2A and 4.2B in unconstrained optimization

1) Any point satisfying ∇f(x) = 0 is a global minimizer.

2) If two points satisfy ∇f(x) = 0, then the function values at those points are equal.

3) If FONC-Min is satisfied, SONC-Min is also satisfied.
4.3 Examples of convex functions

Single-variable functions

f1(x) = x²
f2(x) = x⁴
f3(x) = 0 for |x| ≤ a, (x − a)² for x > a, (x + a)² for x < −a   (a > 0)
f4(x) = e^(ax)
f5(x) = xᵃ, for x > 0 (a ≥ 1 or a ≤ 0)
f6(x) = −log(x), for x > 0
f7(x) = x log(x), for x > 0
Multivariable functions

f1(x1, x2) = (x1 + x2)² + (x1 − x2)³

f2(x1, x2) = (x1 + x2)² + (x1 − x2)⁴

f3(x1, x2) = x1² / x2, for x2 > 0
Convexity of norms

A function f(x) is a norm if it satisfies the following:

(1) Positivity: f(x) ≥ 0, and f(x) = 0 if and only if x = 0.
(2) Homogeneity: f(rx) = |r| f(x), for every r ∈ ℝ.
(3) Triangle inequality: f(x + y) ≤ f(x) + f(y).

Proof of convexity: for θ ∈ [0, 1],
f(θx + (1 − θ)y) ≤ f(θx) + f((1 − θ)y)       (triangle inequality)
                = θ f(x) + (1 − θ) f(y)       (homogeneity, since θ ≥ 0 and 1 − θ ≥ 0)
Convexity of the max function

f(x) = max(x1, ..., xn)

f(θx + (1 − θ)y) = max(θx1 + (1 − θ)y1, ..., θxn + (1 − θ)yn)
                 ≤ max(θx1, ..., θxn) + max((1 − θ)y1, ..., (1 − θ)yn)
                 ≤ θ max(x1, ..., xn) + (1 − θ) max(y1, ..., yn)
4.4 The quadratic function in standard form

4.4.1 Definition

f(x) = 0.5 xᵀQx − bᵀx + c

x: n × 1 vector of optimization variables
b: n × 1 constant vector
Q: n × n symmetric matrix
c: scalar constant

If Q is not symmetric, let Q′ = ½(Q + Qᵀ), and let

f′(x) = 0.5 xᵀQ′x − bᵀx + c.

f(x) and f′(x) are identical.
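A minimal numerical sketch of the symmetrization step, not part of the slides; the matrix sizes and values are illustrative.

```python
# Replacing a non-symmetric Q with Q' = (Q + Q^T)/2 leaves the quadratic unchanged.
import numpy as np

rng = np.random.default_rng(2)
Q = rng.normal(size=(4, 4))          # deliberately non-symmetric
Qp = 0.5 * (Q + Q.T)                 # symmetrized matrix
b = rng.normal(size=4)
c = 8.0

for _ in range(100):
    x = rng.normal(size=4)
    fx  = 0.5 * x @ Q  @ x - b @ x + c
    fpx = 0.5 * x @ Qp @ x - b @ x + c
    assert abs(fx - fpx) < 1e-10     # f(x) and f'(x) agree at every x
print("f(x) == f'(x) for all sampled x")
```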
4.4.2 Properties of symmetric matrices

(1) Eigenvalues are real.

(2) Eigenvectors can be assumed to be real.

(3) Eigenvectors can be chosen to be mutually orthogonal.

(4) Symmetric matrices are diagonalizable.

(5) If Q is symmetric, then Q can be expressed as Q = UΛUᵀ, where U is an orthogonal matrix and Λ is the diagonal matrix of eigenvalues.
(6) Positive definiteness

(a) An n × n symmetric matrix Q is called positive definite if vᵀQv > 0 for every nonzero v ∈ ℝⁿ.

(b) Since Q is symmetric, we have Q = UΛUᵀ. Also, any v can be expressed as v = Us for some s (and v ≠ 0 if and only if s ≠ 0).

(a) and (b) imply:

An n × n symmetric matrix Q is positive definite if and only if sᵀΛs > 0 for every nonzero s ∈ ℝⁿ,

which in turn implies:

An n × n symmetric matrix Q is positive definite if and only if all of its eigenvalues are positive.
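A minimal sketch of the eigenvalue test, not part of the slides; the two test matrices are illustrative.

```python
# Check positive definiteness both by the sign of the eigenvalues and by
# sampling v^T Q v over random nonzero vectors v.
import numpy as np

def is_positive_definite(Q, n_samples=1000, seed=3):
    eigvals = np.linalg.eigvalsh(Q)              # eigenvalues of a symmetric matrix
    by_eigs = np.all(eigvals > 0)
    rng = np.random.default_rng(seed)
    vs = rng.normal(size=(n_samples, Q.shape[0]))
    by_quad = np.all(np.einsum('ij,jk,ik->i', vs, Q, vs) > 0)
    return by_eigs, by_quad

print(is_positive_definite(np.array([[4.0, 2.0], [2.0, 3.0]])))   # both eigenvalues positive
print(is_positive_definite(np.array([[1.0, 2.0], [2.0, 1.0]])))   # has a negative eigenvalue
```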
4.5 Gradient and Hessian of the quadratic form

Exercise
Compute the gradient and Hessian of the following functions:

f(x) = xᵀ [ 4  2 ] x − [ 1 ]ᵀ x + 8
          [ 2  3 ]     [ 2 ]

f(x) = xᵀ [ 4  1 ] x − [ 1 ]ᵀ x + 8
          [ 3  3 ]     [ 2 ]
4.5.1 Gradient of the general quadratic form

Direct approach

1) Show that ∂/∂xₗ (xᵀQx) = (Qx)ₗ + (Qᵀx)ₗ, i.e., the l-th row of Q times x plus the l-th column of Q times x.

2) Then ∂/∂xₗ (0.5 xᵀQx) − ∂/∂xₗ (bᵀx) = 0.5[(Qx)ₗ + (Qᵀx)ₗ] − bₗ.

3) Hence ∇(0.5 xᵀQx) − ∇(bᵀx) = 0.5(Qᵀx + Qx) − b.

4) Because of the symmetry of Q, we have

∇f(x) = Qx − b
Alternative approach

f(x) = 0.5 xᵀQx − bᵀx + c

g[x,d](α) = f(x + αd)
          = 0.5 xᵀQx + 0.5 α² dᵀQd + α dᵀQx − bᵀx − α bᵀd + c

We know that
g′[x,d](α) = dᵀ∇f(x + αd)
g′[x,d](0) = dᵀ∇f(x)                         (E1)
We can also compute g′[x,d](0) directly:

g[x,d](α) = 0.5 xᵀQx + 0.5 α² dᵀQd + α dᵀQx − bᵀx − α bᵀd + c
g′[x,d](α) = α dᵀQd + dᵀQx − bᵀd
g′[x,d](0) = dᵀQx − bᵀd                      (E2)

Comparing (E1) and (E2), we get

∇f(x) = Qx − b
4.5.2 Hessian of the quadratic form

Direct approach

∇(f(x)) = Qx − b                             (E1)

{F(x)}_{i,j} = ∂²f(x)/∂x_i∂x_j = ∂/∂x_i ({∇f(x)}_j)

From (E1) we get {∇f(x)}_j = ∑_{k=1}^{n} Q_{j,k} x_k − b_j

Hence {F(x)}_{i,j} = ∂/∂x_i ({∇f(x)}_j) = Q_{j,i} = Q_{i,j}   (by symmetry)

⇒ F(x) = Q
Alternative approach

f(x) = 0.5 xᵀQx − bᵀx + c
g[x,d](α) = f(x + αd)
          = 0.5 xᵀQx + 0.5 α² dᵀQd + α dᵀQx − bᵀx − α bᵀd + c

g″[x,d](α) = dᵀQd                            (A)

We know from Chapter 2 that
g″[x,d](α) = dᵀ H(x + αd) d,                 (B)
where H denotes the Hessian.

(A) and (B) imply H(x) = Q.
Summary: Gradient and Hessian of the quadratic form

f(x) = 0.5 xᵀQx − bᵀx + c

Gradient:
∇(f(x)) = Qx − b

Hessian:
F(x) = Q
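A minimal numerical sketch, not part of the slides, that verifies both formulas against finite differences; Q, b, c, and the test point are illustrative.

```python
# Verify ∇f(x) = Qx − b and F(x) = Q by central finite differences.
import numpy as np

Q = np.array([[4.0, 2.0], [2.0, 3.0]])
b = np.array([1.0, 2.0])
c = 8.0
f = lambda x: 0.5 * x @ Q @ x - b @ x + c

x0 = np.array([0.7, -1.3])
h = 1e-5
n = x0.size
I = np.eye(n)
grad_fd = np.zeros(n)
hess_fd = np.zeros((n, n))
for i in range(n):
    grad_fd[i] = (f(x0 + h * I[i]) - f(x0 - h * I[i])) / (2 * h)
    for j in range(n):
        hess_fd[i, j] = (f(x0 + h * I[i] + h * I[j]) - f(x0 + h * I[i] - h * I[j])
                         - f(x0 - h * I[i] + h * I[j]) + f(x0 - h * I[i] - h * I[j])) / (4 * h**2)

print(np.allclose(grad_fd, Q @ x0 - b, atol=1e-5))   # matches Qx − b
print(np.allclose(hess_fd, Q, atol=1e-3))            # matches Q
```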
4.6 Minimization of quadratic forms

(1) FONC
∇(f(x*)) = Qx* − b = 0, i.e., Qx* = b must be solvable.

(2a) SONC
The Hessian F(x) = Q is positive semidefinite.

(2b) SOSC-SMin
Q is positive definite.
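A minimal sketch, not part of the slides, of how these conditions translate into computation; the data are illustrative.

```python
# Minimize f(x) = 0.5 x^T Q x − b^T x + c by solving Qx = b, and use the
# eigenvalues of Q to comment on existence/uniqueness of the minimizer.
import numpy as np

Q = np.array([[4.0, 2.0], [2.0, 3.0]])   # positive definite -> unique minimizer
b = np.array([1.0, 2.0])

eigvals = np.linalg.eigvalsh(Q)
if np.all(eigvals > 0):
    x_star = np.linalg.solve(Q, b)       # FONC: Qx* = b; SOSC-SMin holds
    print("unique minimizer:", x_star)
elif np.all(eigvals >= 0):
    # Q only positive semidefinite: minimizers exist iff Qx = b is solvable,
    # and then they are non-unique (any null-space vector can be added).
    x_star, *_ = np.linalg.lstsq(Q, b, rcond=None)
    print("candidate minimizer (non-unique if consistent):", x_star)
else:
    print("Q has a negative eigenvalue: f is unbounded below, no minimizer")
```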
Write the following functions in quadratic form and comment on the existence and uniqueness of minima.

f1(x1, x2) = x1² + x2²

f2(x1, x2) = x1² + x2² + x1x2

f3(x1, x2) = x1² − x2²

f4(x1, x2) = x1x2

f5(x1, x2) = x1² + x2² + 2x1x2

f6(x1, x2) = x1² + x2² + 2x1x2 + x1 + x2

f7(x1, x2) = x1² + x2² + 2x1x2 + x1 − x2
f1(x) = ∑_{j=1}^{n} ∑_{k=j+1}^{n} x_j x_k

f2(x) = ∑_{j=1}^{n} ∑_{k=1}^{n} x_j x_k

f3(x) = ∑_{j=1}^{n} ∑_{k=1}^{n} x_j x_k + ∑_{k=1}^{n} x_k

f4(x) = ∑_{j=1}^{n} ∑_{k=1}^{n} x_j x_k + ∑_{k=1}^{n} (−1)^k x_k
Least squares solution of Ax = b

The least squares solution is defined as the minimizer of
f(x) = ‖Ax − b‖²
     = (Ax − b)ᵀ(Ax − b)
     = xᵀAᵀAx − 2bᵀAx + bᵀb

The minimum is given by ∇f(x*) = 2AᵀAx* − 2Aᵀb = 0
⇒ AᵀAx* = Aᵀb, for which a solution always exists (why?).

The Hessian F(x) = 2AᵀA, which is always positive semidefinite (why?).
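A minimal sketch, not part of the slides; the random tall matrix A and vector b are illustrative.

```python
# Least squares via the normal equations A^T A x = A^T b, compared with
# the library routine np.linalg.lstsq.
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(20, 5))     # tall matrix, full column rank with probability 1
b = rng.normal(size=20)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)        # solve the normal equations
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)     # library least squares
print(np.allclose(x_normal, x_lstsq))                # the two solutions agree
```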
Suppose A has full column rank. Then AᵀA is invertible, and
x* = (AᵀA)⁻¹Aᵀb.

Then b̂ = A(AᵀA)⁻¹Aᵀb is the least squares approximation of b in the column space of A.

(1) Applying this approximation to b̂ leaves it unchanged, i.e., P² = P, where P = A(AᵀA)⁻¹Aᵀ.

(2) P is called the projection matrix, and the least squares approximation is also called the least squares projection (onto the column space of A).
(3) The projection error b̃ = b − b̂ = (I − P)b is orthogonal to b̂, as a consequence of the properties P² = P and Pᵀ = P (verify).

(4) Hence least squares projection is synonymous with orthogonal projection.

(5) The converse is also true: any matrix P satisfying P² = P and Pᵀ = P corresponds to a least squares projection onto a subspace.

(6) R(P) = R(A) and N(P) = N(Aᵀ). Hence R(P)⊥ = N(P).
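A minimal sketch, not part of the slides, that checks these projection properties on random illustrative data.

```python
# Verify P^2 = P, P^T = P, and orthogonality of the residual (I − P)b to b̂ = Pb.
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)

P = A @ np.linalg.inv(A.T @ A) @ A.T     # projection onto the column space of A
b_hat = P @ b                            # least squares approximation of b
b_err = b - b_hat                        # projection error (I − P)b

print(np.allclose(P @ P, P))             # idempotent
print(np.allclose(P.T, P))               # symmetric
print(abs(b_hat @ b_err) < 1e-10)        # residual orthogonal to the projection
```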
(7) Suppose B is a matrix whose columns form a basis for the null space of Aᵀ. Then P′ = B(BᵀB)⁻¹Bᵀ is an orthogonal projection onto the column space of B.

(8) P′ = I − P (verify).

(9) R(P′) = R(B) and N(P′) = N(Bᵀ). Hence R(P′)⊥ = N(P′).

(10) Combine the relations in (6), (8), and (9) by using the relationship between B and A.
Regularized least squares solution of Ax = b

The regularized least squares solution is defined as the minimizer of
f(x) = ‖Ax − b‖² + λ‖x‖²
     = (Ax − b)ᵀ(Ax − b) + λ‖x‖²
     = xᵀ(AᵀA + λI)x − 2bᵀAx + bᵀb

The minimum is given by
∇f(x*) = 2(AᵀA + λI)x* − 2Aᵀb = 0
⇒ (AᵀA + λI)x* = Aᵀb, for which the solution always exists.

The Hessian F(x) = 2(AᵀA + λI), which is always positive definite (for λ > 0).
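A minimal sketch, not part of the slides; λ and the random data are illustrative.

```python
# Regularized (ridge) least squares: solve (A^T A + λI) x = A^T b.
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(20, 5))
b = rng.normal(size=20)
lam = 0.1

x_reg = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
print(x_reg)
# As λ -> 0 this approaches the ordinary least squares solution;
# larger λ shrinks x_reg toward zero.
```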
Exercise: Polynomial approximation

We have samples of a function g(x) at locations {s_i; i = 1, ..., N}, given by {g(s_i) = g_i; i = 1, ..., N}. Find an approximating function of the form

h(x) = ∑_{k=1}^{n} a_k x^(k−1),

such that

f(a_1, ..., a_n) = ∑_{i=1}^{N} (h(s_i) − g_i)²

is minimized.
Let a = [a_1 ... a_n]ᵀ, h = [h(s_1) ... h(s_N)]ᵀ, g = [g(s_1) ... g(s_N)]ᵀ.
Then
h = Sa, where {S}_{i,j} = (s_i)^(j−1).

Then f(a_1, ..., a_n) = ‖h − g‖² = ‖Sa − g‖² = (Sa − g)ᵀ(Sa − g)
                     = aᵀ(SᵀS)a − 2aᵀSᵀg + gᵀg
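A minimal sketch, not part of the slides; the target g(x) = sin(2πx), the sample locations, and the polynomial degree are illustrative.

```python
# Polynomial least squares fit: form {S}_{i,j} = s_i^(j−1) and solve (S^T S) a = S^T g.
import numpy as np

N, n = 50, 4                                  # number of samples, number of coefficients
s = np.linspace(0.0, 1.0, N)                  # sample locations s_i
g = np.sin(2 * np.pi * s)                     # sampled values g_i (illustrative)

S = np.vander(s, n, increasing=True)          # columns 1, x, x^2, ..., x^(n-1)
a = np.linalg.solve(S.T @ S, S.T @ g)         # coefficients a_1, ..., a_n
print("fitted coefficients:", a)
print("fit error:", np.linalg.norm(S @ a - g))
```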
Exercise: Polynomial approximation II

We have samples of a function g(x) at locations {s_i; i = 1, ..., N}, given by {g(s_i) = g_i; i = 1, ..., N}. Find an approximating function of the form

h(x) = ∑_{k=1}^{n} a_k x^(k−1),

such that

f(a_1, ..., a_n) = ∑_{i=1}^{N} (h(s_i) − g_i)² + λ ∑_{i=1}^{n} a_i²

is minimized.

Then f(a_1, ..., a_n) = aᵀ(SᵀS + λI)a − 2aᵀSᵀg + gᵀg.
Exercise: Polynomial approximation III

We have samples of a function g(x) at locations {s_i; i = 1, ..., N}, given by {g(s_i) = g_i; i = 1, ..., N}. Find an approximating function of the form

h(x) = ∑_{k=1}^{n} a_k x^(k−1),

such that

f(a_1, ..., a_n) = ∑_{i=1}^{N} (h(s_i) − g_i)² + λ ∫₀¹ h²(x) dx

is minimized.

f(a_1, ..., a_n) = aᵀ(SᵀS + λP)a − 2aᵀSᵀg + gᵀg, where

{P}_{i,j} = ∫₀¹ x^(i+j−2) dx
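A minimal sketch, not part of the slides; the data are illustrative, and the closed form {P}_{i,j} = ∫₀¹ x^(i+j−2) dx = 1/(i + j − 1) follows directly from the integral above.

```python
# Regularized polynomial fit with the smoothing matrix P and (S^T S + λP) a = S^T g.
import numpy as np

N, n, lam = 50, 4, 0.1
s = np.linspace(0.0, 1.0, N)
g = np.sin(2 * np.pi * s)                       # illustrative samples

S = np.vander(s, n, increasing=True)
i, j = np.indices((n, n)) + 1                   # 1-based indices
P = 1.0 / (i + j - 1)                           # {P}_{i,j} = 1/(i + j − 1)

a = np.linalg.solve(S.T @ S + lam * P, S.T @ g)
print("regularized coefficients:", a)
```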
Exercise: Polynomial approximation IV

We have samples of a function g(x) at locations {s_i; i = 1, ..., N}, given by {g(s_i) = g_i; i = 1, ..., N}. Find an approximating function of the form

h(x) = ∑_{k=1}^{n} a_k x^(k−1),

such that

f(a_1, ..., a_n) = ∑_{i=1}^{N} (h(s_i) − g_i)² + λ ∫₀¹ (d²h(x)/dx²)² dx

is minimized.
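A minimal sketch, not part of the slides; the data are illustrative, and the curvature-penalty matrix is built by numerical quadrature rather than a closed form.

```python
# Curvature-penalized polynomial fit: {R}_{i,j} ≈ ∫_0^1 h_i''(x) h_j''(x) dx for the
# monomial basis h_k(x) = x^(k−1), then solve (S^T S + λR) a = S^T g.
import numpy as np

N, n, lam = 50, 5, 0.1
s = np.linspace(0.0, 1.0, N)
g = np.sin(2 * np.pi * s)                        # illustrative samples
S = np.vander(s, n, increasing=True)

M = 4000
xq = (np.arange(M) + 0.5) / M                    # midpoint rule on [0, 1]
dx = 1.0 / M
# Second derivative of x^(k−1) is (k−1)(k−2) x^(k−3) for k >= 3, and 0 otherwise.
d2 = np.zeros((n, M))
for k in range(3, n + 1):
    d2[k - 1] = (k - 1) * (k - 2) * xq ** (k - 3)
R = (d2 @ d2.T) * dx                             # R[i, j] ≈ ∫ h_i'' h_j'' dx

a = np.linalg.solve(S.T @ S + lam * R, S.T @ g)
print("coefficients with curvature penalty:", a)
```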
