You are on page 1of 53

MA412 exercises

November 22, 2022

Questions & Answers

1 Part 1
1.1 Linear Algebra
Exercises 1
1. Consider the vectors
  
  
1 −1 0
a1 = 2 , a2 =  3  , a3 = 2 .
3 2 2

Use Gaussian elimination (transform the associated matrix in normal


form) to check whether a1 , a2 , a3 are linearly independent. Derive the
associated linear homogeneous system and show that it has ∞(3−2) solu-
tions.

Solution
The matrix A with a1 , a2 , a3 as column vectors is a square matrix of order
3: Gaussian elimination would work as follows:

       
1 −1 0 1 −1 0 1 −1 0 1 0 2/5
A = 2 3 2 →(1) 0 5 2 →(2) 0 5 2 →(3) 0 1 2/5
3 2 2 0 5 2 0 0 0 0 0 0

where: the following elementary operations were performed (Rj indicates


the j-th row): (1) : R2−R1, R3−3·R1; (2) : R3−R2; (3) : R2·1/3, R1+
R2. We see that the last matrix includes a (2, 2) identity submatrix, then
rk(A) = 2 and indeed the associated linear system admits ∞ solutions
with x1 = x2 = −2/5t and x3 = t ∈ R.

1
2. Consider the vectors
   
2 −1
a1 = , a2 = .
2 0

Are a1 , a2 linearly independent?.


If so, compute the inverse matrix of A. Where A is the square matrix
whose columns are the vectors a1 , a2 .

Solution
We see that a1 and a2 are l.i. since matrix A
 
2 −1
(1)
2 0

has determinant different from 0 and this is equivalent to say that the
homogeneous system α1 a1 + α2 a2 = 0 can be satisfied only having both
α1 , α2 = 0 null (this is the very definition of l.i., then we have all the
associated results as the one mentioned above on the determinant).
The inverse of a (2, 2) matrix can be computed straight away. The deter-
minant of A has value 2, then:
   
−1 1 0 1 0 1/2
A = · =
2 −2 2 −1 1

That can be verified through A · A−1 = I the identity matrix of order 2.


3. Given x1 = (1, 0, 1)T , x2 = (2, 0, 0)T . Verify whether they represent a
basis of a subspace X ⊂ R3 and derive the orthonormal basis for X.

Solution
We see immediately that the two vectors are defined in R3 but have the
second coordinate equal to 0. They are l.i. because their linear combina-
tion α1 x1 + α2 x2 generates the null vector only if both coefficients α1 , α2
are equal to 0. Then, yes, they form a basis in the 2-dimensional subspace
X ⊂ R3 . To derive the orthonormal basis we need to go through: (a) nor-
malization of the first vector, (b) derivation of the orthogonal projection
of the second vector, (c) normalization of the second vector. Thus:
 √ 
√ 1/ 2
(a) ||x1 || = 2 → v1 =  0√ 
1/ 2
 
1
(b) (xT2 · v1 ) · v1 = 0
0
   p 
2 1/ (2)
√ 
 
0− √2   0   
(2) 
1/ 2
  p 
x2 −(xT
2 v1 )v1
0 1/ (2)
(c) v2 = ||x2 −(xT
= √ =  0√ 
2 v1 )v1 || 2
−1/ 2

We see that ||v2 || = 1 and v1T · v2 = 0 so the 2 vectors form an orthonormal


basis in R2 .
4. Compute the rank of A as k varies
 
2 k 1 0
A= 1 0 −1 2 .
−k 1 0 1

Hint: the matrix is (3,4) so you already know that its rank cannot be
greater than 3 = min{3, 4}. You check whether the matrix contains a
non-singular (2,2) submatrix and if so, then you verify which conditions
on k should hold for the two (3,3) submatrix containing that one to be
nonsingular as well (Kronecker result).

Solution We apply Kronecker theorem and verify the rank by considering


selected sub-matrices. Then, by indicating the determinant with | · |, we
have: |a11 | = 2 ̸= 0, thus 1 ≤ rk(A) ≤ 3 = min{i, j}. Furthermore we
see that submatrix of order 2 formed by the last 2 columns and the first
2 rows has determinant 1 thus 2 ≤ rk(A) ≤ 3. Now we consider the (3, 3)
submatrix formed by the last 3 columns, the associated determinant does
depend on k:
 
k 1 0
Det 0 −1 2 = (−k + 2) ̸= 0 → k ̸= 2
1 0 1

Thus for any k ̸= 2 we have rk(A) = 3. On the other hand for k = 2 the
other minor submatrix formed by the 1st , 3rd and 4th columns has deter-
minant not equal to 0 and this is sufficient for rk(A) = 3 independently
of the value of k, which completes the solution.

Exercises 2

1. Normalize the vector x = (2, 3, −3)T using the Euclidean norm.
Solution
This is a trivial problem: first we compute ||x|| and then divide the
 vector

q
√ 2 √ 2
by the norm. So ||x|| = 22 + 3 + (−3)2 = 4, then xN = 41  3 =
−3
 1 
√2
q √
 3 .
4
We verify that ||xN || = 1: 0.52 + ( 3 2
4 ) + ( −3 2
4 ) = 1, so xN is
−3
4
a vector of unit length.
   
1 1
2. Consider the vectors v = ,w= . Determine the angle between
0 1
v, w: based on their angle, what can you say about the orthogonality of v
and w? What about their linear independence (l.i.)? Is l.i. sufficient for
orthogonality? What about the ⊥ =⇒ l.i.?
Solution
We indicate the angle between the two vectors with θ and just compute
the associated cosine:
(v T w) 1
cos(θ) = ||v||·||w|| then the angle between the two vectors in R2 is
= √
1· 2 √
given by the arc-cosine and we see that arcsin 1/ 2 = 0.785398 = π4 .
Remark: Based on the angle the two vectors are not orthogonal: we would
have got the same answer by just making the inner product between the
2 vectors, with a result different from 0. On the other hand the matrix
associated with the two vectors has determinant equal to 1 and therefore
the two vectors are l.i. So l.i. obviously does not imply orthogonality while
the opposite is always true.
3. Determine the value of k ∈ R for which v ⊥ w are orthogonal.

(a)    
−1 −5
v =  3k  , w =  0 .
0 2

Solution v T · w = ((−1) · (−5) + 3k · 0 + 0 · 2 = 5 independently of k.


The inner product is just 5 and thus there is no k for which the two
vectors would be orthogonal.
(b)    
2k + 1 4
 −1  k−9
v=
 0 ,
 w=
 5k  .

6 3k − 2

Solution same as before: v T ·w = (2k+1)·4+9−k+0·5k+6·(3k−2) =


8k + 4 + 9 − k + 18k − 12 = 25k + 1 then for this to be 0 we need
1
k = − 25 .
(c)    
k 3
v = 1 , w = −k  .
k 1
Solution same as before: v T · w = 3k − k + k = 3k so only k = 0 can
lead to v ⊥ w.
(d)    
4−k −k
 0  k + 1
 3k  ,
v= 
 −k  .
w= 

7 −2

Solution Finally: v T ·w = (4−k)·(−k)+0·(k+1)+3k·(−k)+7·(−2) =


k 2 − 4k − 3k 2 − 14 = −2k 2 − 4k − 14 then if √
we write the generic form
−b + b2 −4ac
a · k 2 + b · k + c = 0, with roots k1,2 = − 2a , then we see that
b2 − 4ac = 16 − 112 under the square root so the equation admits
only complex solutions and we cannot determine any k leading to
orthogonality.

4. Check if the row vectors in


 
1 0 2
A= 2 2 −1 .
−4 5 2
define an orthogonal basis in R3 .
Hint: The question asks whether the row-vectors of A : a1 := {a1,j }, a2 :=
{a2,j }, a3 := {a3,j }, j = 1, 2, 3 define an orthogonal basis: a sufficient con-
dition is that A · A⊺ is diagonal. Then we can say that those vectors form
an orthogonal basis in the R3 -row space. The result cannot be extended
automatically to the column space, for which AT · A should be diagonal.
Solution The sufficient condition in the Hint is very practical and useful.
We can very well compute the inner products between the row vectors as
usual and verify that they are all equal to 0, but the same result can be
achieved just by multiplying the matrix for its transpose: if orthogonal
the resulting matrix will be diagonal.
     
1 0 2 1 2 −4 5 0 0
2 2 −1 · 0 2 5  = 0 9 0
−4 5 2 2 2 −1 0 0 45

Then, yes they form an orthogonal basis. Due to the lack of commutative
property A · AT ̸= AT · A, then we cannot extend the evidence to the
column space.
Remark: this exercise is applied to a square matrix. What if we consider
a rectangular matrix? If A ∈ Rm,n , assume m < n: we would have m row
vectors defined in Rn and the product A · AT would be a matrix (m, m):
then the resulting vectors, say m = 2, in our exercise will still be orthogonal
but now defined in the subspace R2 . On the contrary if m > n, say m = 4
we would have 4 row vectors in R3 and the resulting matrix A · AT would
be (4, 4) and could not be diagonal due to the linear dependence of the 4
row vectors we started from.

Exercises 3
1. For each of the following
  matrices compute the solution x to the system
−1
Ax = b, where b = .
4
 
3 0
(a) A = .
0 −2
 
3 1
(b) A = .
0 −2
 
3 1
(c) A = .
6 −2

Solution

(a) Observe that det(A) = −6 ̸= 0. Thus, rk(A) = rk(A|b) = 2. Then,


the solution is unique and given by
( (
3x1 + 0x2 = -1 x1 = -1/3
Ax = b =⇒ =⇒
0x1 + (−2)x2 = 4 x2 = -2

(b) Here again det(A) = −6 ̸= 0, there is only one solution


( (
3x1 + 1x2 = -1 x1 = 1/3
Ax = b =⇒ =⇒
0x1 + (−2)x2 =4 x2 = -2

(c) In this case, det(A) = −12 ̸= 0, then using Cramer’s formula we can
compute A−1 . Since there exists the inverse matrix, the solution for
the linear system is x = A−1 b
      
−1 1/6 1/12 −1 1/6 1/12 −1 1/6
A = =⇒ x = A b = =
1/2 −1/4 1/2 −1/4 4 −3/2
(
−x − 2y + z = -3
2. Consider the linear system: Verify first whether it
3x − 4y + z = 9
is compatible and then, in case, determine its solution. Hint: you use
Rouche Capelli to verify whether compatible (rk(A) versus rk(A|b) and
then if compatible you can solve it. To do that, derive an equivalent system
by replacing the second equation with its difference with the first eq.
Solution We write this linear system as Ax = b with A ∈ R(2,3) with three
unknowns. The problem admits solutions since rk(A) = rk(A|b) = 2, the
rank however is smaller than the number of unknowns and the system will
then have infinite solutions. We can see this, by observing that both x
and y can be expressed as functions of z, which is free to change:
(
x = 15 (z + 15)
y = 25 (z)
resulting in infinite solutions as z ∈ R.
Let A be an (m, n) matrix with n column vectors each one with m com-
ponents. We can summarize in this table a set of relevant conditions;

Matrix A specification rk(A)=rk(A|b)=r l.i vectors → basis description


n≥m r=m Rm Vector space of dim=m
n≥m r<m Vmr ⊂ Rm Linear variety of dim=r
n<m r≤n Vnr ⊂ Rn Linear variety of dim=r

If rk(A) ̸= rk(A|b) we know that the system is inconsistent and has no


solution (Rouche-Capelli theorem).

2x − y + z
 =0
3. Consider the linear homogeneous system: −x + 2y − 3z = 0 . Does

3x + y + 2z =0

this system admits solutions other than the trivial solution?.

Solution Observe that det(A) = 14 ̸= 0, thus, the unique solution for


Ax = 0 is the trivial x = 0.
4. Solve by Gaussian elimination the linear system Ax = b, where
   
2 4 −2 2
A= 4 9 −3 , b =  8  .
−2 −3 7 10

Solution
We denote by Ri + a ∗ Rj → Ri the step that sums to the row Ri the
vector aRj . The Gaussian Elimination process is as follows

   
2 4 −2 2 2 4 −1 2
4 9 −3 8  ⇒ R2 + R1 (−1) → R2  0 3 11 28 ⇒
−2 −3 7 10 −2 −3 7 10

   
R3 + R1 → R3 0 4 −2 2 R2 − 2R3 → R2 2 4 −2 2
============⇒ 0 3 11 28 =============⇒ 0 1 1 4  ==⇒
0 1 5 12 0 1 5 12
 
R3 − R2 → R3 2 4 −2 2
============⇒ 0 1 1 4
0 0 4 8
. Thus, the solution for the linear system is given by
 
2x1 + 4x2 − 2x3 = 2
 x 3 = 2

0x1 + 1x2 + 1x3 = 4 ==⇒ x2 = 2
 
0x1 + 0x2 + 4x3 = 8 x1 = −1
 

Exercises 4
 
0.8 0.3
1. Let A = . Compute the associated eigenvalues λ1 , λ2 and
0.2 0.7
eigenvectors v1 , v2 for which A · vj = λj vj . Verify that the eigenvector are
l.i. and that A = V · Λ · V −1

Solution: We have solved this exercise in class. I provide some additional


clarifications to make everything clear. Notice that p(λ) = Det(λI − A) =
(λ − 0.8) · (λ − 0.7) − (0.3 · 0.2) − 0.06 = (0.8 − λ) · (0.7 − λ) − 0.06 =
Det(A − λI) = λ2 − 1.5λ + 0.5.
We want to identify the roots of p(λ). They are λ1 = 1 and λ2 = 1/2.
We need to replace each eigenvalue in the matrix (λI − A) to derive the
eigenvectors.
Since we have 2 distinct eigenvalues, we see already that we will have 2
l.i. eigenvectors v1 , v2 such that, by construction Avj = λj vj , for j = 1, 2.
Remark: this is irrelevant to solve the problem, but we notice as aside that
the eigenvalues are both positive, so matrix A is positive definite.
Let λ1 = 1: We consider the system
    
1 − 0.8 0 − 0.3 v11 0
=
0 − 0.2 1 − 0.7 v12 0
We have two equations: 0.2v11 − 0.3v12 = 0 and −0.2v11 + 0.3v12 = 0:
both then require 0.2v11 = 0.3v12 so the eigenvector is any vector in R2
whose first coordinate is 1.5
 times the second coordinate: this identifies
1.5
the direction v1 = t · for t ∈ R.
1
Consider now λ2 = 1/2. We replace this value in the system
    
0.5 − 0.8 −0.3 v21 0
=
−0.2 0.5 − 0.7 v2 2 0
And you see immediately that the system is satisfied by any vector with
second coordinate
  equal to the first with sign changed: v21 = −v22 , thus
1
v2 = t · for t ∈ R.
−1
You can finally verify the equalities λj vj = Avj . Also you can check that
the diagonal matrix with eigenvalues along the diagonal satisfies: Λ =
V −1 · A · V where V is the matrix whose columns are the two eigenvectors.
 
2 2
2. Let A = . Compute the associated eigenvalues and eigenvectors by
1 3
solving the characteristic polynomial p(λ). Hint: p(λ) = Det(A − λI) =
(2 − λ) · (3 − λ) − 2 whose solution will determine the two eigenvalues.

Solution: A is a square matrix of order 2, nonsingular and we can derive


two eigenvalues and from each of them an eigenvector also defined in R2 .
We follow a slightly different route from above and modify the matrix
(A − λI) with one elementary, rank preserving operation: replace the first
row with a linear combination between itself and (λ − 2) times the second
row ((R1 + (λ − 2)R2 to replace R1). Then
  
2−λ 2 0 2 + (3 − λ) · (λ − 2)
1 3−λ 1 3−λ

Remark 1: why is that a good idea? We first notice that the element a21 =
1 (and we know that the second rwo will not change after the elementary
operation above), then by subtracting (2 − λ) (same as summing (λ − 2))
we get a lower triangular matrix whose determinant will now be entirely
determined by the element first row second column.
    
0 2 + (3 − λ) · (λ − 2) x1 0
=
1 3−λ x2 0

we have now p(λ) = −2 − (3 − λ) · (λ − 2) = λ2 − 5λ + 4 = 0 with roots


λ1 = 1, λ2 = 4. We can derive the eigenvectors.
The eigenvectors must satisfy:
    
0 2 + (3 − λ)(λ − 2) x1 0
=
1 3−λ x2 0

From λ1 = 1: we have from the first equation that x2 can take any
value and from the second equation that the eigenvector must
 satisfy
 the
−2
equation x11 = −2x12 thus again for t ∈ R we have x1 = t · .
1
From λ2 = 4, likewise from the first equation the second coordinate is
free andfrom
 the second we must the first coordinate equal to the second:
1
x2 = t · .
1
Remark 2: what can we notice from this solution? As for any spectral de-
composition since from the beginning we construct a system whose deter-
minant is 0 (singular matrix) then we know that the homogeneous system
will have infinite solutions: we exploit this result to define through the two
eigenvalues two vectors which are l.i., identify two directions and have the
same second coordinate: this is a linear variety of dimension 1.
 
1 0 −1
3. Consider the symmetric matrix A =  0 2 0 . Solve the associated
−1 0 1
characteristic polynomial to determine the eigenvalues and the associated
eigenvectors. Are they orthogonal? If so determine the associated or-
thonormal basis.

Solution: This was solved in class as well, so I just point out the key
elements of the solution.
First, you see that the matrix A is square of order 3 and most importantly
symmetric and of rank 2, since the third column is equal to the first
times −1. From these remarks we already know that, given 2 distinct
eigenvalues, the associated eigenvectors will be orthogonal (due to the
matrix symmetry).
 So
 we derive the characteristic polynomial p(λ) =
(2 − λ) · (1 − λ)2 − 1 = −λ · (2 − λ)2 = 0 with roots λ1 = 0 and λ2 = 2.
We now derive for every eigenvalue the eigenvectors by solving the homo-
geneous system (A − λI)v = 0.
   
v1 − v3 0
• From λ1 = 0:  2v2  = 0 from which we see that the
−v1 + v3 0
second coordinate must be 0 and the firstand  the third must be
1
equal, so to identify the eigenvector v1 = t 0.
1
• From λ2 = 2 exactly same procedure and we notice now that the
second column of the system has three 0’s, so the second coordinate
is free while the first and the third are equal with opposite sign. On
the other hand v1 = v3 = 0 is also a solution.
 
1
We have identified, for t ∈ R, three eigenvectors v1 = t · 0, v2 =
    1
−1 0
t ·  0  and v3 = t · 1. The associated matrix is orthogonal V =
 1  0
1 −1 0
0 0 1 with an associated orthonormal basis ui = vi :
||vi ||
1 1 0
 √ √ 
1/ 2 −1/ 2 0
U =  0√ 0√ 1.
1/ 2 1/ 2 0
Exercises 5
 
2
1. Let v = 3 ∈ R3 , determine its projections onto the plane (x, y) and
1
then onto the z− axis.
Hint: you need to determine the projection matrices P1 and P2 s.t. pj =
Pj · v are defined as requested and p1 + p2 = v, P1 + P2 = I.
Solution: We saw this in class and I summarize the key elements of this
simple but useful exercise.
For the projection
 onto the(x, y) plan, let the projection
  matrix (or pro-
1 0 0 2
jector ) be P1 = 0 1 0, then p1 = P1 · v = 3.
0 0 0 0
 
0 0 0
For the second projection on the z axis, we take likewise P2 = 0 0 0,
  0 0 1
0
then p2 = P2 · v = 0. As a result p1 + p2 = v and P1 + P2 = I. Notice
1
that dim(P1 ) + dim(P2 ) = 3 and p1 ⊥ p2 .
Remark: the above result is general, meaning that once we project onto
orthogonal spaces it must be true that the sum of the dimensions of the
resulting sub-spaces is equal to the dimension of the original space. It is
also a general result that any vector in a linear space admits an orthogonal
decomposition such as the one above. These properties are corollaries to
the main result, namely that a matrix P is an orthogonal projector onto a
subspace V if and only if P 2 = P = P T .
 
4
2. Determine the orthogonal projection of x = , onto the subspace
  −1
3
generated by v = .
2
Solution: Here we have a vector in R2 and you are asked to compute its
projection onto a space which is defined by another vector also in R2 . It is
an easy exercise and we then discuss the solution. The resulting projection
will be x⊥ .
√ √
we normalize v: given ||v|| = 32 + 22 = 13, we define vN =
First !
√3
13 .
√2
13
Then we compute the projection:
!
√3
 30 
⊥ T √3 √2 ) 13
x = (x · vN )vN = (4 · 13
− 12
· √2
= 13
20 .
13 13
The solution is complete.
Remark: The term ”projection onto” may be misleading, as if it meant
that its purpose is to project a vector say from R2 into the line or from
R3 into R2 . No, that is not correct! Indeed in the exercise, we see that
everything occurs in R2 , being both the original vector and its projection
defined in that space. The second however is indeed a subspace, defined by
all those vectors whose first coordinate is 1.5 time the second coordinate,
whereas the original vector was arbitrary. Notice that such property is
defined by the eigenvector and evident if you look at the two coordinates
of vN .

Exercises 6
1. Is the function f (x, y, z) = −x2 − 2y 2 − 8z 2 + 8yz a quadratic form? Study
its sign (is n.d, n.s.d, p.d, p.s.d)?

Solution Observe that we can write the function as f (v) = v T Av, where
 
−1 0 0
 0 −2 4  .
0 4 −8

Thus, f is a quadratic form. A computation shows that the eigenvalues


are −10, −1, 0, then, the function f is n.s.d.
2. Consider the function f (x1 , x2 , x3 ) = 3x21 − 2x2 x3 + 4x23 . Is the associated
quadratic form xT · A · x positive definite (p.d.) or positive semi definite
(p.s.d.)?
Solution The associated quadratic form xT · A · x is given by the matrix
 
3 0 0
A = 0 0 −1 .
0 −1 4

Observe that 3 is an eigenvalue for A, and det(A) = −3. Therefore, the


matrix A has a negative eigenvalue and also a positive eigenvalue, which
shows that f is indefinite.
3. Study the sign of A as α ∈ R varies
 2 
α −1 0 0 0
 0 0 0 0 
 .
 0 0 α−1 0 
0 0 0 2−α

Hint: you see that this is a diagonal matrix already in the form λI so the
diagonal elements are eigenvalues and the question asks to check the sign
as α changes. It is sufficient to look at those values which are relevant:
α ≤ −1, α = 1, α ∈ (1, 2], α > 2.
Solution First of all, note that A is a diagonal matrix. Then its eigenvalues
are the elements in the diagonal.
Let us denote λ1 = α2 − 1, λ2 = 0, λ3 = α − 1, λ4 = 2 − α.
For α ∈ R we have the following cases:
• If α < −1 then λ1 > 0, λ2 = 0, λ3 < 0, λ4 > 0. Thus, A is indefinite.
• If α = −1 then λ1 = 0, λ2 = 0, λ3 < 0, λ4 > 0. Thus, A is indefinite.
• If α = 1 then λ1 = 0, λ2 = 0, λ3 = 0, λ4 > 0. Thus, A is psd.
• If 1 < α ≤ 2 then λ1 > 0, λ2 = 0, λ3 > 0, λ4 ≥ 0. Thus, A is psd.
• If α > 2 then λ1 > 0, λ2 = 0, λ3 > 0, λ4 < 0. Thus, A is indefinite.

1.2 Geometry
Exercises 7
1. Consider the function of two variables x1 , x2 : f (x1 , x2 ) = α1 x21 − α2 x22
with α1 , α2 ∈ R. For which values of αj , j = 1, 2 is the function concave,
convex, neither of the two?
Solution: The function is defined over R and it is sufficient to consider
values of α1 and α2 with different sign to assess the convexity of the
function. From the definition of convexity, we also have that f (x1 , x2 ) is
convex if for any x = (x1 , x2 )T and y = (y1 , y2 )T , with γ ∈ [0, 1], we have
γf (x) + (1 − γ)f (y) ≥ f (γx + (1 − γ)y). According
 to this
 condition, it
x1 y1
is sufficient to identify two points x = and y = and verify
x2 y2
whether a convex combination of the associated function values is above
or below the value of the function of their convex combination. You see
that the result will depend on the sign of α1 , α2 .
Also, you can see that the function defines a quadratic form and for α1 > 0
and α2 < 0, f (x1 , x2 ) is a quadratic function and the associated matrix
diagonal and positive definite. This implies the convexity of the function.
For α1 < 0 and α2 > 0, f (x1 , x2 ) is still a quadratic function but now the
associated matrix is negative definite leading to a concave surface.
While for α1 and α2 either both positive or both negative, the function is
neither concave nor convex, the matrix is indefinite and indeed the surface
has a saddle point at (0, 0).
2. Is the domain of p
1 − x22
f (x1 , x2 ) = ,
x2 − x1
open or closed, bounded or unbounded, convex?
Solution Remark: a domain is open if all points in the domain are inner
points and it is closed if it does include all its limit points. As you know
R is both open (since all its values are inner points) and closed (since all
are limit points). It is bounded if the boundary points are finite otherwise
it is unbounded if in some direction we go to |∞|.
To obtain the domain D we consider the conditions 1−x22 ≥ 0 and x2 −x1 ̸=
0.
Therefore, −1 ≤ x2 ≤ 1 and x1 ̸= x2 .
In particular, the domain D is not open because (0, 1) ∈ D but is not an
inner point. The element (0, 0) is a limit point of D but it does not belong
to D, so D is not closed.
The domain D is unbounded because it contains the sequence {(n, 0)}n∈N .
Finally, it is non convex, the elements (−1, 0) and (1, 0) belong to D but
the midpoint of the segment between them which is (0, 0) does not belong
to D.
3. Let √
x1 x2
f (x1 , x2 ) = .
x1 + x2
Is the domain of the function a convex set? Is there a hyperplane through
the origin that is a separating hyperplane of the domain into convex sets?
Solution
To obtain the domain D we consider the conditions x1 x2 ≥ 0 and x1 +x2 ̸=
0.
Therefore, sign(x1 ) = sign(x2 ) and x2 ̸= −x1 . So, the domain is not
convex, because (1, 1) and (−1, −1) belong to D but (0, 0) ̸= D.
Note that x1 +x2 = 0 is a separating hyperplane through the origin and the
sets D1 = {(x1 , x2 ) ∈ D | x1 + x2 > 0} and D2 = {(x1 , x2 ) ∈ D | x1 + x2 < 0}
are convex because D1 is defined by x1 ≥ 0, x2 ≥ 0, (x1 , x2 ) ̸= (0, 0) and
D2 is defined by x1 ≤ 0, x2 ≤ 0, (x1 , x2 ) ̸= (0, 0).
ln(4−x21 −x22 )
4. Let f : R2 → R, f (x) = √
x1 + x2 .
√ Determine the domain of f , is this
function concave/convex?

Solution The domain of the function consists of the points satisfying



4 − x21 − x22 >0 
4 > x21 + x22

 √x + √x


1 2 ̸= 0 
∼ (x1 , x2 ) ̸= (0, 0)
 x1 ≥0 
x1 , x2 ≥0

 
x2 ≥0

The first condition requires both x1 and x2 to be defined in the open set
(0, 2) and the second excludes the origin from the function domain. Thus,
the domain is neither open nor closed. The point (0, 1) ∈ D is not an
inner point and the point (0, 0) is a limit point but does not belong to D.

x2
2 x21 + x22 < 4

(x1 , x2 ) > 0

2 x1

The domain is thus not open and not closed but bounded and convex,
because any convex combination of points in the domain still belongs to
the domain.

1.3 Multivariable calculus


Exercises 8
√ x3
x1 + 2x2 − x3 + ln(1+x
1. Let f (x1 , x2 , x3 ) = 2 ) . Compute the partial deriva-
1
 
3
tive ∂f /∂x3 and evaluate its value in x0 =  0 . Is the function increas-
−1
ing or decreasing along this direction?
Solution
We apply the derivation rules and compute the partial derivative with
respect to x3 as:

1 1
∂f /∂x3 (x) = − √ + .
2 x1 + 2x2 − x3 ln(1 + x21 )

This is the function that you are asked to evaluate at the point x0 . We
have

1 1
∂f /∂x3 (x0 ) = − √ + = 0.184294
2 3 + 1 ln(10)
We see that the partial derivative is positive so the function is increasing
in that direction.
p
2. Let f (x1 , x2 ) =  x21 + 24 + 3x2 + x1 . Evaluate
 the directional derivative
1 1
of f (x) in x0 = along the direction u = .
2 0
Hint: you need first to derive the gradient of the function and then evaluate
it in x0 : the directional derivative is then Du f (x0 ) = ∇f (x0 )T · u.
Solution The expression for the gradient is

√ x21
!
+1
∇f (x) = x1 +24
3

. Then, the directional derivative in x0 is


 
1
Du f (x0 ) = ∇f (x0 )T · u = (6/5 3) · = 6/5.
0
 
−1/2
3. Determine the equation of the tangent hyperplane at x0 = of the
1
function f (x1 , x2 ) = x2 e−2x1 .
Solution First, we compute the partial derivatives

−2x2 e−2x1
 
∇f (x1 , x2 ) =
e−2x1

Since the tangent hyperplane is generated by the partial derivatives, then


the equation is given by

 
x1 + 1/2
f (x0 ) + ∇f (x0 )T · (x − x0 ) = e + (−2e e) = e · (x2 − 2x1 − 1).
x2 − 1

4. Let f : R3 → R defined by f (x1 , x2 , x3 ) = 2x1 + 3x21 x2 + x22 x3 . Compute


2
Df (x
0 ), D f (x0 ), the first and second order differentials of f in the point
1
x0 =  0 .
−1

Solution
Note that  
2 + 6x1 x2
∇f (x) = 3x21 + 2x2 x3  .
x22
At x0 we have  
2
∇f (x0 ) = 3 .
0
Therefore, Df (x0 ) · (x − x0 ) = ∇f (x0 )T (x − x0 ) = 2(x1 − 1) + 3x2 =
2x1 + 3x2 − 2.
As for the Hessian:
 
6x2 6x1 0
Hf (x) = 6x1 2x3 2x2  .
0 2x2 0
At x0 we have
 
0 6 0
D2 f (x0 ) = Hf (x0 ) = 6 −2 0 .
0 0 0

Remark: the second order differential defines a quadratic form:


(x − x0 )T D2 f (x0 )(x − x0 )
= −2x22 + 12x1 x2 − 12x2 .
and we see below the second term of Taylor expansion in R3 in this case.
5. Compute the directional derivative of the function f : R2 → R, f (x, y) =
ex y in the direction of the vector u = (2, 1)T evaluated at (x0 , y0 )T =
(2, 0)T .
Solution We apply the formula for the directional derivative as the inner
product between the gradient evaluated at x0 and the direction.
 x   
e y 0
Consider the gradient: ∇f (x, y) = =x0 .
ex e2
The directional derivative is then 0 · 2 + e2 · 1 = e2 positive and steep!

Exercises 9
1. Consider the function f (x1 , x2 ) = 2x1 + x2 + 3. Plot the level curves in
R2 as the level c varies. Identify the direction of increasing in R2+ .
Solution For a given c the level curves are defined by 2x1 + x2 = c − 3
which are lines in R2 . Thus, when c increases the function f along the
level curves also increases.
2. Determine and plot the level curves of f (x1 , x2 ) = 4x21 − x22 for c1 = 0,
c2 = 1, c3 = −1. In which direction do we climb or move down the
surface?
Solution For a given c, the level curve is defined by equation 4x21 − x22 = c
which is a hyperbola for c ̸= 0. When c > 0 the hyperbola is transverse in
the x − axis, while for c < 0 the hyperbola is transverse in the y − axis.
For c = 0 the level curve consists of two lines. Observe that the function
f climbs when the value of the level curves c increases.
Figure 1: Relevant figure for problem 9.1

Figure 2: Relevant for problem 9.2


3. Let f (x1 , x2 ) = −x21 +x1 −x22 −2x2 −1: derive the level sets of this function
as level c varies and show that they define a family of concentric circles in
R2 . Compute the gradient ofthe  function and derive the equation of the
1/2
tangent hyperplane at x0 = .
−1
Solution
We derive the contour equation:

x21 − x1 + x22 + 2x2 = −(1 + c)

which simplifies in:


 2
1 2 1
x1 − + (x2 + 1) = −c
2 4
q
1 1

which defines a circle with a radius 4 − c centered at 2 , −1 . In order
1
to have a nonempty curve it is necessary to consider c ≤ 4.
The gradient of f is  
−2x1 + 1
∇f (x) = .
−2x2 − 2
 
1/2
In particular, at x0 = we have ∇f (x0 ) = (0, 0)T and x0 is a
−1
stationary point.
Therefore, the tangent hyperplane at x0 is
1
f (x0 ) + ∇f (x0 )T (x − x0 ) = f (x0 ) = .
4

Exercises 10
1. Let f : R3 → R, f (x) = x1 x2 − x23 . Write down the Taylor expansion of
the second order around x0 = (1, 1, 0).
Solution
   
1 0 1 0
Note that f (x0 ) = 1, ∇f (x0 ) = 1 and Hf (x0 ) = 1 0 0 .
0 0 0 −2
Then,
1 2
f (x) = f (x0 )+∇f (x0 )T (x−x0 )+ (x−x0 )T Hf (x0 )(x−x0 )+o(∥x − x0 ∥ ).
2!

You multiply the vectors and matrices and derive a polynomial of degree
2. The function is the sum of this polynomial and the remainder which is
2
o(∥x − x0 ∥ ).
2. Let f : R3 → R, f (x1 , x2 , x3 ) = x1 x2 + ex3 . Expand the Taylor series to
the second order around x0 = (1, 1, 0)T .
Solution    
1 0 1 0
Note that f (x0 ) = 2, ∇f (x0 ) = 1 and Hf (x0 ) = 1 0 0.
1 0 0 1
Then,
1 2
f (x) = f (x0 )+∇f (x0 )T (x−x0 )+ (x−x0 )T Hf (x0 )(x−x0 )+o(∥x − x0 ∥ ).
2
    
x1 − 1 0 1 0 x1 − 1
We have: f (x) = 2+(1 1 1) x2 − 1+1/2(x1 −1 x2 −1 x3 ) 1 0 0 x2 − 1+
x3 0 0 1 x3
2 2
o(∥x − x0 ∥ ) = ... = x1 x2 + x23 + 1 + o(∥x − x0 ∥ ).
2
Remark: the term o(∥x − x0 ∥ ) is referred to as the rest or error of the
expansion as we move away from x0 . Notice that under the Euclidean
2
norm, at x0 ∥x − x0 ∥ = (x1 − 1)2 + (x2 − 1)2 + x23 , then o(..) implies that
the error will converge to 0 faster than such norm. We come back to this.
3. Consider the function f : R2 → R, f (x1 , x2 ) = −(x21 − 1)2 − x22 . Show
that (1, 0), (−1, 0), and (0, 0) are stationary points and by evaluating the
Hessian at those points infer whether they are also optimal. Hint: a
negative definite Hessian would define maxima at stationary points, while
an indefinite matrix would lead to a saddle point.
Solution The gradient of the function if given by
−4x1 (x21 − 1)
 
∇f (x) = .
−2x2
Observe that for the points (1, 0), (−1, 0), and (0, 0), the gradient is zero,
thus these points are stationary. Also, the Hessian is given by
−12x21 + 4 0
 
Hf (x) = .
0 −2

Observe that the eigenvalues of Hf (x) are −12x21 + 4 and −2, which
are both negative for the points (1, 0), (−1, 0), in that case the points
(1, 0), (−1, 0) are local maxima as stationary points. On the other hand,
the eigenvalues of Hf ((0, 0)) are 4, −2 both with different signs, then
Hf ((0, 0)) is an indefinite matrix which implies that (0, 0) is a saddle
point.

End solution exercises for Part 1


2 Part 2: Unconstrained optimization
Exercises 11
 
5 5 9
1. Consider matrix A −1 1 1. Determine the sign of A and check if
  1 3 5
0
the origin 0 = 0 is a minimum or a maximum of the quadratic form
0
T
x AX.
Solution Matrix A is notsymmetric, so we generate a symmetric matrix
5 2 5
B = 1/2(A + AT ): B = 2 1 2 whose sign is positive semi definite,
5 2 5
being b11 = 5, Det(B2 ) = 1, Det(B) = 0.  The
 quadratic form is then
0
p.s.d. and the function is convex with 0 = 0 a global minimum of the
0
function.
2. Let f (x1 , x2 , x3 ) = x21 +2x1 x2 −2x1 x3 +8x22 +4x2 x3 +3x23 . Determine: (a)
the stationary points of the function, (b) whether the function is concave
or convex and (c) local and/or global maxima and minima.
 
1 1 −1
Solution The function defines a quadratic form xT Ax with A =  1 8 2 .
     −1 2 3
2 2 −2 x1 0
We derive the FONC:  2 16 4  x2  = 0
−2 4 6 x3 0
Given the full rank of the coefficient matrix we can conclude that the null
vector is the only stationary point.
For
 (b) we evaluate
 the Hessian at the origin and see that Hf (0) =
2 2 −2
 2 16 4 . Then: we notice that the Hessian is p.d. and thus
−2 4 6
the function is convex but also that it is equal to 2A according to ma-
trix derivation rules. For (c) due to the convexity, the origin is a global
minimum.

3. Let f (x1 , x2 ) = x21 · (x2 − 1). Determine if any the minimum or the maxi-
mum of this function using Second-Order-Sufficient-Conditions (SOSC).
Solution We have solved
( this in class. Let’s first consider the first order
2x1 (x2 − 1) = 0
conditions: ∇f (x) = Then x1 = 0 and x2 is a free
x21 = 0
 
0
variable. All points along the x2 axis are stationary: x∗ = , c ∈ R.
c
We now  consider second
 order conditions through the Hessian evaluated
∗ 2c − 2 0
at x : .
0 0
Then the Hessian is (i) p.s.d. for c > 1 and all stationary points are local
minima; (ii) n.s.d. and all stationary points
 are local maxima; for c < 1
0
and (iii) indefinite for c = 1 so at x∗ = the function has a saddle
1
point.
4. Determine the local and global minima and maxima of the function f (x1 , x2 ) =
ln(x21 ) · ln(x2 ). Apply First and Second Order Necessary Conditions.
Solution The domain of the function requires x1 ̸= 0 and x2 > 0. So
there may not exist a global min or max. Indeed we see that if you
fix x2 > 1 then ln(x2 ) × limx1 →∞ ln(x21 ) = +∞ while for x2 < 1 then
ln(x2 ) × limx1 →∞ ln(x21 ) = −∞ so the function is unbounded. Consider
now FONC: ! (
2
x1 ln(x2 ) ln(x2 ) = 0
ln(X12 ) = 0. Then leads to x2 = 1 and x1 = +/ − 1,
x2 ln(x21 ) = 0
   
1 −1
with 2 stationary points x1 = and x2 = .
1 1
−2ln(x2 ) 2
!
x21 x1 x2
Consider now SONC, with the Hessian H(x1 ) = 2 −ln(x21 ) =
x1 x2 x22
 
0 2
, then we see that the quadratic form xT1 H(x1 )x1 = 4x1 x2 changes
2 0
sign depending on the signs of the coordinates x1 and x2 . Same conclusion
for x2 and thus we conclude that the Hessian is indefinite and the function
does not admit neither global nor local min or max.

Exercises 12
1. Apply a first-order algorithm to determine the minimum of f (x) = (1−x2 )·
2
e−x . Write the solution of the problem. Hint: determine the stationary
points and study the limits for x going to +/ − ∞. Check if the function
is even.

Solution The derivative of f is


df 2
= e−x (2x(x2 − 2)).
dx
df
The stationary points are determined by 0 = dx , then the stationary
points are √
0, ± 2
Since, the function x → x2 is even, then f is also even. In addition,

limx→∞ f (x) =√limx→−∞ f (x) = 0. Observe that 1 = f (0) ≥ g(± 2) =
−e−2 . Thus, ± 2 are global minimum.

Figure 3:

2. Apply again a first order algorithm to determine min and/or max of f (x) =
x4 + 4x3 + 1.

Solution The derivative of f is


df
= 4x3 + 12x2
dx
df
The stationary points are determined by 0 = dx , which are 0, −3. Since

lim f (x) = lim f (x) = +∞,


x→−∞ x→∞

and f (0) = 1, f (−3) = −26, then −3 is a global minimum. Observe that


df
the derivative satisfies dx ≥ 0 for −3 < x < ∞, therefore f is increasing
in −3 < x < ∞. Thus, the point 0 is neither minimum nor maximum,
then 0 is a saddle point.

Figure 4:
3. Apply 1-d Newton method to find the minimum of f (x) = x4 − 2x − 5.
Let x0 = 0.5.

Solution This problem and the following one are the two I have discussed in
class and for which you have the excel file with the two methods’ iterations.
I refer you to that file as for the solutions. Here I give some more details
to frame properly the two methods.
Under Newton’s method we reach the optimum at x∗ = 0.7937, where
f (x∗ ) = −6.1905 with first derivative f ′ (x∗ ) = 0.000032. The algorithm
stops after 4 iterations because before starting the iterations we have set
a given tolerance, say f ′ (xk ) ≤ 0.00005, and then we take the associated
values as valid solution of the problem. The starting point was x0 = 0.5.
In general the definition of the starting point is based on few function
evaluations after checking the domain of the function and its functional
form. At every iteration in this method we need to keep track of first and
second derivatives.
4. Same function as in the previous question, apply now Fibonacci method.
Hint: let ϵ = 0.1 and N | F1+2ϵ
N +1
= 0.06 .

Solution For the same function as in the previous problem, we see that
here with an iterative method driven by the Fibonacci sequence, after 6
iterations, the minimum of the function is set at f (x∗ ) = −6.1898. Fi-
bonacci’s method requires a series of initial inputs, from which we derive
the starting points a0 and b0 at which first to compute f (.), the initial Fi-
bonacci ratio and the number N of iterations needed to solve the problem.
In particular:
• Given an ϵ > 0 we set N as that integer for which F1+2ϵ
N +1
≤ c, c being
the range reduction we wish to attain between the final and initial
evaluation interval.
• The first Fibonacci ratio will then beρ1 = 1 − FFNN+1 . At every it-
FN −1 FN −2
eration, selectively, given ρ2 = 1 + FN , ρ3 = 1 + FN −1 etc, until
F1
ρN = 1 − F2 , selectively we restrict the range of evaluation.
5. Apply Newton method to find the minimum of f (x) = −5x2 + x. Let
x0 = 0.

Solution
Here you are given the initial value x0 = 0, so you just have to complete
the first evaluation before starting the iterations: f (x0 ) = 0, f ′ (x0 ) =
10x0 − 1 = −1 abd f ′′ (x0 ) = 10 which is positive, the function is convex
and we can set k = 1:
1
k = 1 : x(1) = 0 − (− 10 ) = 0.1, f (x(1) = −0.5, f ′ (x(1) = 10 ∗ 0.1 − 1 =
0, f ”(x(1) = 10. First derivative 0 (FONC) plus second positive
(convexity), we STOP.
• Notice that in this problem the first derivative at the stationary point
is 0, full stop. In the previous exercise we stopped due to the tol-
erance. Then in this problem a second iteration would have been
absolutely ineffective with the optimal value remaining exactly the
same, while in the pervious we may have requested an even smaller
error and thus the algorithms wouldn’t be terminated.

Exercises 13
1. Consider the quadratic function f (x) = 12 xT Qx := x21 + 2x22 + 4x23 + 2x1 x3 .
Apply the steepest descent algorithm to determine the minimum of this
function with initial condition x(0) = (1 1 1)T . Verify the orthogonality
of x(k) − x(k−1) ⊥ x(k+1) − x(k) for k = 2, 3, ... Verify the updating of αk
at every iteration. Estimate the speed of convergence of this algorithm.
Compare the results with the case of constant step length α = 0.2 in
the updating of x(k) . Hint: you can try to solve this type of problem by
computing the iterations in an excel worksheet. You may be asked to show
only a pair of iterations in an exam context.

Solution See the excel file I have uploaded in Blackboard to gather the formulae
implemented. Here I attach a screenshot of the excel sheet with the iterations.
You may notice that I stopped after 6 iterations but indeed the gradient was
not yet 0 (column L).

Exercises 14
Apply Newton’s method to minimize the following functions:
1 T 2 2 2
1. From the previous
  exercise: f (x) = 2 x Qx := x1 + 2x2 + 4x3 + 2x1 x3 .
1
Let x(0) = 1. Show that the method converges in just one iteration.
1
Solution The matrix Q is given by
 
2 0 1
Q = 0 4 0
1 0 8

. For k = 1, we have

x(1) = x(0) − F (x(0) )−1 g (0) = x(0) − Q−1 g (0)


          
1 0, 533 0 −0.0664 3 1 1 0
= 1 −  0 0, 25 0  4 = 1 − 1 = 0
1 −0.0664 0 0, 1333 9 1 1 0
Figure 5: Steepest descent. Exercise 13
   
2 6
Let x(0) = 2 then g (0) =  8 , and
2 18
   
2 0
Q−1 g (0) = 2 =⇒ x(1) = 0
2 0

See also the excel file that was uploaded on BB.

Figure 6: Newton Method. Exercise 14.1


 
1
2. f (x1 , x2 ) = 2x21 + x22 − x1 x2 − 3x1 + x2 − 5, with initial point x (0)
= .
1

Solution We apply Newton to minimize f (x1 , x2 ). The iteration stops for


k = 2, and the minimum value obtained is −6.14286. See the excel file
attached.

3. f (x1 , x2 ) = x21 (x2 − 1). First verify First and Second order conditions,
then apply Newton’s method to determine   a local optimum in the domain
(0) 1
x2 > 1, with initial condition x = . Apply a termination criterion
2
and verify convergence
 toa minimum. Apply the same algorithm now
(0) 0.00001
adopting x = .
1.00001
   
2x1 (x2 − 1) 2x2 − 2 2x
Solution We have ∇f (x) = and Hessian F (x) = .
x2 2x 0
Then we see that from FONC x1 = 0 while x2 is free over R. SONC re-
quire for positive or negative semi-definiteness x2 ̸= 1. Let then x2 > 1
Figure 7: Newton Method. Exercise 14.2

     
(0) 1 (0) (0) 0 2 (0) 2 2
and x = , f (x = 1. Then ∇f (x = g = ,F = ,
2 1 2 0
 
0 0.5
thus [F (0) ]−1 = , leading to
0.5 −0.5
     
1 0.5 0.5
x(1) = − = , f (x(1) = 0.125
2 0.5 1.5
     
(2) 0.5 0.25 0.25
x = − = , f (x(2) = 0.015625
1.5 0.25 1.25
     
(3) 0.25 0.125 0.125
x = − = ,
1.25 0.125 1.125
     
0.125 0.0625 0.0625
x(4) = − = , f (x(4) = 0.000244. And so on
1.125 0.0625 1.0625  
0
until convergence with given tolerance to x∗ = with optimal value
1
f (x∗ ) = 0.
 
(0) 0.00001
Consider the case in which x = . The gradient is g (0) =
1.00001
   
0.00001 0.00 50000.00
and the inverse of the Hessian would be F −1 (x(0) ) =
1E − 10 50000.00 −50000.00
leading to an update from x(0) to x(1) subject to numerical instability
caused by the Hessian inversion step.
Exercises 15
   
2 1 1 1 1
1. Consider the system Ax = b in R with b = and A = 2 :
1 1 + 10−10 1.1 − 10−10
solve to determine x. Let now the element a22 of A to change from
0.55(1 − 10−10 ) to 0.51(1 − 10−10), then −10
 to 0.5(1 − 10 ). Determine
x1
the inverse A−1 and solve for x = . Evaluate the impact on the solu-
x2
tion of a negligible change of vector b. Compute the condition number of
the system. Hint: for the condition number let ||A||2 = max||x||=1 ||Ax||2
and determine k(A) := ||A||2 ·||A−1 ||2 . This problem aims at clarifying the
impact of ill conditioning of the Hessian matrix in the Newton’s updates.

Solution The matrix A is non singular with determinant  Det(A)=0.025. 


∗ −1 22 −20 1
Then we can solve the system through x = A b = =
  −20 20 1
2
. Now assume a minor perturbation of a22 from 0.55(1−10−10
−2 · 10−9
to 0.51(1−10−10 ) andthen to 0.5(1−10 
−10
). We have asinverse matrices,

−1 102 −100 ∗ 2
respectively A = leading to x = and
 −100 100 −1 · 10−8
  
−99999999 9999999 1
then A−1 = leading to x∗ = . In this
9999999
 −9999999  1 
1.01 −1, 999, 999
last case consider b = , then we would have x∗ = .
0.99 +1, 999, 999
Miles away from the solution. The condition number of the A matrix
as (a22 ) → 0.5, computed as k(A) = ||A|| · ||A−1 || → ∞. Notice that
k(A) = ||A|| · ||A−1 || = λλM (A)
m (A)
. Quasi Newton methods are conceived to
avoid such divergences in Newton iterations.

2. Consider again problem f (x1 , x2 ) = x21 (x2 − 1) and let directly consider
the case for x2 > 1. Determine the minimum of the function  by applying
1
directly a rank-1 QN algorithm with H0 = I2 and x(0) = . Show that
2
the algorithm won’t converge to the minimum.

Solution Notice that H0 is given


  (not only symmetric
  but even diagonal
1 2
and equal to I!). For x(0) = , we have g (0) = , then
2 1
       
1 1 0 2 −1
x(1) = − · = , from which: f (x(1) ) = 0 and
2 0 1 1 1
 
0 (1)
g (1) = . Notice that x2 = 1 which does not satisfy x2 > 1. Let’s
1
keep this value for the time being.
   
(0) (1) (0) −2 (0) −2
We can compute ∆x = x − x = and ∆g = .
−1 0
 
The rank 1 Hessian update is based on the increment ∆x(0) − H0 ∆g (0) ·
T T 
given by {∆g (0) ∆x(0) − H0 ∆g (0) }−1 .
 (0) 
∆x − H0 ∆g (0) with a step  length

−1
Now the problem is that at the denominator is 0 and the Hessian
1
is indefinite.
T
[∆x(0) −H0 ∆g(0) ]·[∆x(0) −H0 ∆g(0) ]
H1 = H0 + T
∆g (0) [∆x(0) −H0 ∆g (0) ]
is indefinite.

Still avoiding x2 = 1 and introducing in the iteration at every iteration a


value strictly greater than 1 you can see in the excel file v2 on BB that
the Hessian update won’t work and the gradient is diverging.

3. To the same function of the previous


  problem, apply now a DFP algorithm
(0) 2
now with starting point x = . Let the step size be α = 0.25.
2

   
4 1 0
Solution We now have: f (x(0) ) = 4, g (0) = and H0 = . Then
4 0 1
         
(1) 2 1 0 4 1 (1) (1) 0
x = − 0.25 · = , then f (x ) = 0, g = ,
2 0 1 4 1 1
   
−1 −4
∆x(0) = and ∆g (0) = , leading to the following Hessian update1 :
−1   −3    
1 1 1 1 16 12 0.4 −0.44
H1 = H0 + 25 − 25 = leading to the
1 1 12 9 −0.44 0.68
second iteration:
      
1 0.4 −0.44 0 1.11
x(2) = −0.25 · = , with f (x(2) ) = −0.20946
1 −0.44 0.68 1 0.83
 
(2) −0.3774
and gradient evaluated in this point g = .
1.2321
From which the algorithm iterations will determine ∆x(1) , ∆g (1) , H1 ∆g (1) , ∆x(1) −
H1 ∆g (1) with tight control of the Hessian update: however the algorithm due
to the singularity at the origin will not converge either.

Exercises 16
Least square estimation

1. Apply least square estimation todetermine  the best linear interpolation


1 2 4
Y = aX + b in R2 to the values: where every column denotes
1 2 3
the observations i = 1, 2, 3 of {Yi , Xi }. Estimate the error e = ||Y − aX −
b||2 = (Y − aX − b)T (Y − aX − b).

Solution
h ih iT
T Hk ∆g (k) Hk ∆g (k)
1 We ∆x(k) ·∆x(k)
have Hk+1 = Hk + T − T
∆x(k) ∆g (k) ∆g (k) ·Hk ·∆g (k)
Using Excel and its =LINEST() function we obtain that the fitted regres-
sion line is y = 1.5x − 0.667. The estimated error is 0.166.

Figure 8:

2. Consider now a more extended case problem and determine the LS esti-
mator b ∈ R3 associated with the observations (first row the endogenous
variable Y and following rows the explanatory variables X1 , X2 .
   
Y 12.5 9.8 11.2 6.9 10.0 13.2 11.7
X1  =  6.4 7.3 8.4 6.5 7.6 6.9 8.1 .
X2 13.6 12 11.1 13.3 12.2 11.9 12.5
Compute the linear approximation error. Explain the rationale of the LSE
method. Which function are we minimizing?

Solution
2
We are minimizing the function of the estimated error e(a, b, c) = ∥Y − aX1 − bX2 − c∥ .
The importance of the LSE method lies in the fact that we look for the variables
a, b, c such that square sum of the errors is minimum and that is a way to find
a model which fit the data.
After using the Data Analysis tool in Excel we obtain the following estima-
tions:
Therefore a = −0.127309247, b = −0.667095567 and c = 19.94124422 and
the minimum error at this point is 24.9548771. Thus the endogenous vari-
able Y has a negative dependence on the two endogenous variables X1 and X2
with however a relatively high constant term. The method identifies the linear
combination of X1 and X2 for which the squared error between the sampled
values and the linear fitting is minimum. In general LSE can also be applied
assuming a polynomial fitting rather than a linear fitting, in which case however
it
 will
 become a nonlinear programming problem and the postulated solution
a
 b  = X T · X −1 X T · b
 

c
Figure 9:

END of PART 2 Questions


3 Part 3: Linear programming
Exercises 17
Simplex method
 
2 1 3
1. Let A = 1 0 1. Construct the augmented system [A I] and de-
3 4 5
termine the sequence of elementary  E1 , E2 , ..., En such that En ·
 matrices
En−1 · E1 = A−1 and [A I] → I A−1 . (Hint: solve in an excel file)
Solution

Figure 10: See excel file on BB


2. Apply the simplex method to solve the LP problem: maxx≥0 2x1 + 5x2
s.t. x1 ≤ 4, /x2 ≤ 6, x1 + x2 ≤ 8.
Solution

Figure 11: See excel file on BB

3. Apply simplex to solve the LP problem: maxx≥0 5x1 + 4x2 + 6x3 s.t.
x1 − x2 + x3 ≤ 20, 3x1 + 2x2 + 4x3 ≤ 42, 3x1 + 2x + 2 ≤ 30.
Solution

Figure 12: See excel file on BB


Exercises 18
LP duality

1. Determine the dual problem of the primal LP: maxx≥0 2x1 + 5x2 + x3 s.t.
2x1 − x2 + 7x3 ≤ 6, x1 + 3x2 + 4x3 ≤ 9, 3x1 + 6x + 2 + x3 ≤ 3. Verify the
duality gap and complementary slackness. Hint: always put the primal in
standard form and then derive the dual

Solution
First we remind the standard form of a general program with the same
size and its dual.

(a) The primal linear program in the standard form is:

min[cT 0T ]x (2)
x

subject to
[A I] x = b
with x = (x1 , · · · , x6 ) ≥ 0.
(b) The dual problem is
max λT b (3)
lambda

subject to
λT [A I] ≤ cT 0T
 

which is equivalent to λT A ≤ cT and λ ≤ 0.

The Tableu of (2) is

Table 1: Tableau of the primary linear problem


a1 a2 a3 a4 a5 a5 b
2 -1 7 1 0 0 6
1 3 4 0 1 0 9
3 6 1 0 0 1 3
cT -2 -5 -1 0 0 0

Applying the simplex method we get after few iterations:

a1 a2 a3 a4 a5 a5 b
15 6 1 39
43 0 1 43 0 43 43
4 21 25 146
- 43 0 0 - 43 1 - 43 83
19 1 7 15
43 1 0 - 43 0 43 43
24 1 36
rT 43 0 0 43 0 43

with optimal solution cT x∗ = 114


43 .
We derive the dual solution as:

λT D = cTD − rD
T

 
2 1 0
24 1 36
  
λ1 λ2 λ3 3 1 0 = −2 0 0 − 43 43 43 .
3 0 1
Therefore,
λT = − 43
1
− 36

0 43

is the solution to the dual problem.


T
Note that rD = cTD − cTB B −1 D = cTD − λT D where λT = cTB B −1 . We also
have λT B = cTB then:
 T
λ D = cTD − rD T
, λT B = cTB ⇒ λT A = cT − rT .

The primal problem is:


max cT x
subject to
Ax = b, x ≥ 0
which has solution:
x∗1
   
0
x∗ = x∗2  = 0.348837
x∗3 0.906977

and x∗5 = 4.325581 subject to x∗1 + 3x∗2 + 4x∗3 + x∗5 = 9.


The dual problem is:
min λT b
subject to
λT A = cT , λ ≥ 0
which has solution (λ∗ )T = (0.02326, 0, 0.83721).
Now we verify the duality gap:
(a)
cT x∗ = 2x∗1 + 5x∗2 + x∗3 = 2.651163
(λ∗ )T b = 0.02326 · 6 + 0 · 9 + 0.83721 · 3 = 2.651163

(b) Dual feasibility


T
λ ∗ A ≤ cT
from the LP form with:

cT = (−2, −5, −1)


 
1 36
(λ∗ )T = − , 0, −
43 43
 
2 −1 7
A = 1 3 4
3 6 3
Therefore,
(λ∗ )T A = (−2.558, −5, −2.674).
2. Consider the primal problem maxx≥0 (x1 −x2 ) s.t. x1 +x2 ≤ 23 , 2x1 +x2 ≤
2. Derive the dual problem and provide a graphical solution to both
problems, separately, preserving the symmetric form of PLP and DLP.
Verify the solution by determining the Lagrangean associated to the primal
and dual problems. Verify complementary slackness. Can you provide
an economic interpretation of the primal and dual problems and their
solution?
Solution The primal problem is

max x1 − x2
3
s.t. x1 + x2 ≤
2 (4)
2x1 + x2 ≤ 2
x1 , x2 ≥ 0

Graphically, the solution for the primal problem is given by

Figure 13:
Dual problem, interpretation

3
min λ1 + 2λ2
λ 2
s.t. λ1 + 2λ2 ≥ 1 (5)
2λ1 + λ2 ≥ −1
λ1 , λ2 ≥ 0

Graphically, the solution for the dual problem is

Figure 14:

In summary, the primal problem in the matrix form


 
x1
max (1, −1)
x x2
    3
1 1 x1
s.t. ≤ 2 (6)
2 1 x2 2
 
x1
≥0
x2
has the dual  
3 λ
min ( , 2) 1
λ 2 λ2
    
1 2 λ1 1
s.t. ≥ (7)
2 1 λ2 −1
 
λ1
≥0
λ2
and the solutions are
 
1
max → (1, −1)x∗ (1, −1) =1 (8)
x 0

 
3 ∗ 3 1
min → ( , 2)λ = ( , 2) 1 = 1 (9)
λ 2 2 2

The P / D relationship is satisfied for the optimal solution

cT x − λ T b = 0 (10)

The computation for the Lagrangian is

3
L(x, λ) = x1 − x2 − λ1 (x1 + x2 − ) − λ2 (2x1 + x2 − 2)
2 (11)
3
= (1 − λ1 − 2λ2 )x1 + (−1 − λ1 − λ2 )x2 + λ1 + 2λ2
2

Thus, the optimal max in the primal problem is equal to the min in the
dual problem. We check the complementary slackness condition
• λ∗ (Ax∗ − b) = 0
     3   
1 1 1 1 1 −0.5
[0, ] − 2 = [0, ] =0 (12)
2 2 1 0 2 2 0

• (cT − λ∗ T A)x∗ = 0
      
1 1 1 1 3 1
(−1, 1) − (0, ) = (0, − ) =0 (13)
2 2 1 0 2 0

Exercises 19
Interior point method
Also here you better analyse the problem and solve it in excel.
1. Consider the problem minx≥0 5x1 + 4x2 + 8x3 s.t. x1 + x2 + x3 = 1.
This is in canonical but not restricted form: derive the restricted form
appropriate to apply Karmarkar’s IP method.
Solution
In order to formulate the problem in so called restricted form, all you
need to do is to translate the coefficient vector in such a way that the
cost function (recall that this is an hyperplane) will go through the origin.
Then in this simple case you see that since 1T x = 1 the minimum cost is
reached for x2 = 1 and thus the translated problem will be minx≥0 (5 −
4)x1 + (4 − 4)x2 + (8 − 4)x3 = x1 + 4x3 which is the objective function of
the following problem.

2. Consider the LP problem minx≥0 x1 + 4x3 s.t. 2x1 + x2 = 1, x1 + 3x3 =


4, x1 + 4x2 + x3 = 6, Derive Karmarkar’s canonical formulation and apply
few iteration’s to determine the optimal solution.
Solution We display the two iterations thaT you can find in the excel
file. Karmarkar problem is first formulated in canonical form and then we
apply the IPM iterations, shown here below.

Figure 15: Iteration 1, see excel file on BB


Figure 16: Iteration 2, see excel file on BB

END of PART 3 Questions

4 PART 4: Nonlinear programming


Optimization problems with equality constraints
Exercises 20
1. Let f (x1 , x2 ) = −(x21 + x22 ) subject to h(x1 , x2 ) − (x1 − 2)2 + x22 = 1. Find
the max and min of f (x) s.t. h(x) = 0 with the Lagrange method.
Solution
 We specify the Lagrange function L(x1 , x2 , λ) = −x21 − x22 −
λ( (x1 − 2)2 + x22 − 1 and derive the first order necessary conditions (FONC)
for this problem:

∂L
= −2x1 − 2λ(x1 − 2) = 0
∂x1
∂L
= −2x2 − 2λx2 = 0
∂x2
∂L
= (x1 − 2)2 + x22 − 1 = 0
∂λ
Consider the second partial derivative, we see that as NC we have x2 = 0
∂L
or λ = −1. From λ = −1, however would be inconsistent with ∂x 1
= 0.
Let then x2 = 0:
• From the third partial derivative we derive x1 = 3 or x1 = 1.
• Consider x1 = 3: from ∂L
∂x1 we get λ = −3.
• For x1 = 1 we have λ = 1
We then have two candidates for max and min: x∗1 = (3 0)T and x∗2 =
(1 0)T , the first would lead to f (x∗1 ) = −9, the second to f (x∗2 ) = −1.
The former is then the constrained minimum of the function and the latter
constrained maximum.

2. Consider the function f (x1 , x2 , x3 ) = x21 + 2x22 + 3x23 s.t. h1 (x1 , x2 , x3 ) =


x21 + x22 = 25 and h2 (x) = x1 + x3 = 2. Find the constrained max and min
using the Lagrange method.
Solution We have a constrained problem with 3 unknowns to be deter-
mined and 2 equality constraints. We need to define the Lagrange function
and then consider 5 equations for the first order optimality conditions of
the problem L(x, λ) = x21 + 2x22 + 3x23 − λ1 (x21 + x22 − 25) − λ2 (x1 + x3 − 2),
from which we derive:

∂L
= 2x1 − 2λ1 X1 − λ2 = 0
∂x1
∂L
= 4x2 − 2λ1 x2 = 0
∂x2
∂L
= 6x3 − λ2 = 0
∂x3
∂L
= x21 + x22 − 25 = 0
∂λ1
∂L
= x1 + x3 − 2 = 0
∂λ2

From the second partial derivative with respect to x2 we recover x2 = 0


or λ1 = 2. Consider the two cases separately:
• x2 = 0: then from ∂λ ∂L
1
∂L
= 0 and ∂λ 2
= 0 we derive x1 = 5 and
x3 = −3 as well as x1 = −5 and x3 = 7.
• λ1 = 2:we derive from every set of conditions: λ2 = −2x1 , λ2 = 6x3
then x1 = −3x3 . Furthermore x3 = −1 from which x1 = 3 and
λ2 = −6. Finally from the partial derivative with respect to λ1 we
have x22 = 25 − 9 = 16, then x2 = +/ − 4.
To sum up we have 4 points satisfying the first order necessary conditions:
x∗1 = (5 0 − 3)T , x∗2 = (−5 0 8)T , x∗3 = (3 4 − 1)T and x∗4 =
(3 − 4 − 1)T , with one max equal to 172 at x∗2 and two minima equal
to 44 at x∗3 and x∗4 .

3. Find the max and min of f (x) = x1 · x2 s.t. h(x) = x1 + x22 = 1. Solve by
substitution.
Solution In this problem with only two variables we want to solve by
direct substitution and then compute the optimal with respect to one
variable only. Let x1 = 1 − x22 . Then f (x2 ) = x2 (1 − x22 ) resulting into
first order condition dfdx
(x2 )
= 1 − 3x22 = 0, thus x2 = +/ − √1 . Then
2 (3)
d2 f (x2 )
dx22
= −6x2 = √6 > 0 for x2 = − √1 is a local minimum, while for
(3) (3)
2
d f (x2 )
x2 = − √1 the function is concave, dx22
= −6x2 = − √6 < 0 for is a
(3) (3)
local maximum. Now we can derive x1 from the constraint function, and
we have x1 = 1 − 31 = 23 with value of the objective function f (x1 , x2 ) =
− √2 a local minimum and f (x1 , x2 ) = √2 a local maximum.
3 (3) 3 (3)

4. Determine the min and max of f (x) = x21 + x22 s.t. h(x) = x1 + x2 = 1
first by substitution and then by Lagrange method. Verify the optimality
results from the lecture notes on the solution of linearly constrained QP.
Solution It can be solved by substitution and then we can double check
by solving the problem using FONC from the Lagrange method (remark:
this being a convex function over a convex set, FONC would do and the
stationary point will be unique).

• Substitution: x1 = 1 − x2 , then f (x2 ) = 2x22 − x2 + 1, then f ′ (x2 ) =


4x2 − 2 = 0 leads to x2 = 21 , since f ′′ (x2 ) = 4 the function is convex
and x∗2 = 1/2 = x∗1 is a minimum, resulting in f (x∗1 , x∗2 ) = 12 .
• Lagrange method. Let L(x, λ) = 12 (2x21 + 2x22 ) + λ(1 − x1 − x2 ). Then
we have three optimality conditions to satisfy:

∂L
= 2x1 − λ = 0
∂x1
∂L
= 2x2 − λ = 0
∂x2
∂L
= 1 − x1 − x2 = 0
∂λ

with a unique solution x∗1 = x∗2= 21 λ∗ and since must be λ∗ = 1 we


1/2
recover the solution x∗ = .
1/2
As a linearly constrained QP this problem actually admits a general solution
that can be determined by matrix inversion. Let's verify. This QP has the
form min_x (1/2) x^T Q x s.t. Ax = b. Consider L(x, λ) = (1/2) x^T Q x +
λ^T (b − Ax). First-order conditions on the Lagrange function lead to the
solution x* = Q^{-1} A^T λ* with λ* = (A Q^{-1} A^T)^{-1} b. In this
exercise Q = 2I, A = (1 1) and b = 1, so
λ* = ((1 1) · diag(1/2, 1/2) · (1 1)^T)^{-1} · 1 = 1
and
x* = diag(1/2, 1/2) · (1 1)^T · λ* = (1/2, 1/2)^T,
which confirms the result. This is the approach to be generally considered
when solving a linearly constrained QP program.
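The closed-form KKT solution is straightforward to code once Q, A, b are given; a minimal numpy sketch (the helper name eq_qp is ours):

```python
import numpy as np

def eq_qp(Q, A, b):
    """Closed-form solution of min 0.5 x'Qx s.t. Ax = b."""
    Qinv = np.linalg.inv(Q)
    lam = np.linalg.solve(A @ Qinv @ A.T, b)  # lambda* = (A Q^-1 A')^-1 b
    return Qinv @ A.T @ lam                   # x* = Q^-1 A' lambda*

Q = np.array([[2.0, 0.0], [0.0, 2.0]])  # f(x) = x1^2 + x2^2 = 0.5 x'Qx
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
print(eq_qp(Q, A, b))  # [0.5 0.5]
```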
5. Determine the max and min of f(x1, x2) = x1^2 x2 − x2^3 + x1^2 on the
boundary of the triangle with vertices O = (0, 0), B = (2, 0) and C = (0, 2).
Notice that the feasible region is not the area of the triangle but only its
sides. Determine the max and min by function evaluation along the triangle
edges.
Solution The objective is a rather nasty function, as shown in the figure below.

Figure 17: Objective function for problem 5 in this section
However we know, thanks to the Weierstrass theorem (from calculus), that
on a compact domain any continuous function attains a min and a max.
Consider the edge O−B: then x2 = 0 and we get f(x1, 0) = x1^2 with
x1 ∈ [0, 2], with a minimum at x1 = 0 and a maximum at x1 = 2. Thus the
points x = (0, 0)^T and x = (2, 0)^T are candidate extremes.
Likewise consider the edge O−C: then x1 = 0 and we get f(0, x2) = −x2^3
with x2 ∈ [0, 2]. Again the points x = (0, 0)^T and x = (0, 2)^T are
candidate extremes.
Finally, on the edge B−C we have x2 = −x1 + 2, resulting in the restriction
f(x1, −x1 + 2) = x1^2(−x1 + 2) − (−x1 + 2)^3 + x1^2 = −3x1^2 + 12x1 − 8
for x1 ∈ [0, 2]. This is a concave parabola whose vertex lies at x1 = 2, where
it takes the value 4, so the restriction is increasing on [0, 2] and leads to a
candidate minimum at x1 = 0 and a candidate maximum at x1 = 2.
Taking all three edges into consideration, the extremes must lie among
x* = (0, 0)^T, x** = (2, 0)^T and x*** = (0, 2)^T: we have f(x*) = 0,
f(x**) = 4, which is the constrained max, and f(x***) = −8, which is the
constrained min.
Remark: the problem is solved by substitution and function evaluation.
Why? There are several reasons: first of all, we only have two unknowns
x1 and x2, so the problem can be solved in R^2 by substitution. Second,
we see at first inspection that the Lagrangean would involve a nonconvex
objective function and a feasible region whose linear constraints are
non-differentiable at the vertices! The nonconvexity of the objective also
discourages the adoption of a method based on level sets as in the following
exercise. The problem, however, must have a solution due to the compact
feasible region, no matter how nasty the function is. It is also important to
notice that for x1 = 0 or x2 = 0, since the coordinates are nonnegative along
these edges, the two restrictions are either a (decreasing) concave function
f(0, x2) = −x2^3, x2 ∈ [0, 2], or a convex function f(x1, 0) = x1^2, both
with easy-to-determine min and max. For x1, x2 jointly positive and varying
between 0 and 2 the function is concave when expressed in terms of x1. A
quick numerical sweep along the three edges, sketched below, confirms these
candidates.
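A minimal numerical sweep of f along the three edges, assuming numpy:

```python
import numpy as np

f = lambda x1, x2: x1**2 * x2 - x2**3 + x1**2
t = np.linspace(0.0, 2.0, 201)

edges = {'O-B': (t, 0*t), 'O-C': (0*t, t), 'B-C': (t, 2 - t)}
for name, (x1, x2) in edges.items():
    v = f(x1, x2)
    print(name, 'min =', v.min(), 'max =', v.max())
# O-B: min 0, max 4; O-C: min -8, max 0; B-C: min -8, max 4
```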
6. Find the max and min of the function f(x1, x2) = x1 + x2 s.t. h(x1, x2) =
x1^2 + x2^2 = 1. Apply the Lagrange and the level-curves methods.
Solution This is a quadratically constrained problem with a linear objective:
the objective and the constraint function are both convex, and the feasible
set (the unit circle) is compact. We use this problem to compare the solutions
(which need to be equivalent!) of two methods that in general are alternatives
to each other.
• Level-curves method: let fc(x) = {(x1, x2) ∈ R^2 | f(x) = c} =
{(x1, x2) | x2 = c − x1}, so as c increases these are all negatively sloped
parallel lines. The feasible region, on the other hand, is the circle centered
at the origin with radius equal to 1. For c = 0 the line x1 + x2 = 0,
i.e. x2 = −x1, goes through the origin and is the bisector of the second
and fourth quadrants. For c = −1 the line has equation x2 = −1 − x1 and
for c = 1 the level curve is x2 = 1 − x1. All level curves are linear
functions, and the solutions (one for the min and one for the max) are the
tangency points of the lower and upper tangent lines to the circle
x1^2 + x2^2 = 1. To find these coordinates we need to solve the systems:

x1 + x2 = c
x1^2 + x2^2 = 1
c > 0

and

x1 + x2 = c
x1^2 + x2^2 = 1
c < 0

Consider the substitution x1 = c − x2 into the second equation; we then
discriminate between the two cases of positive (max) or negative (min)
level sets. We must have 2x2^2 − 2cx2 + c^2 − 1 = 0, and tangency
requires this quadratic to have a double root, so we recover the value of c
by imposing a zero discriminant: 4c^2 − 8(c^2 − 1) = 0, i.e. c^2 = 2, and
c = ±√2 leads to the solution of the two systems: in particular c = √2
solves the first system while c = −√2 solves the second. We can then put
those values in the two systems to determine the coordinates of the max
and the min. For the max, c = √2, we have x* = (√2/2, √2/2)^T, while
for the minimum, c = −√2, we have x* = (−√2/2, −√2/2)^T.
• Much simpler is the solution with the Lagrange method. We formulate
the Lagrangean L(x, λ) = x1 + x2 − λ(x1^2 + x2^2 − 1) and set to 0 the
gradient with respect to x1, x2 and the multiplier. We have

∂L/∂x1 = 1 − 2λx1 = 0
∂L/∂x2 = 1 − 2λx2 = 0
∂L/∂λ = 1 − x1^2 − x2^2 = 0

These have solution x1 = x2 = 1/(2λ) with 1 − 1/(4λ^2) − 1/(4λ^2) = 0,
leading to λ = ±1/√2. We see then that for λ = 1/√2, x1 = x2 = √2/2,
and for λ = −1/√2, x1 = x2 = −√2/2 are stationary points, leading to
function values of √2 (the maximum) in the first case and −√2 (the
minimum) in the second, which coincide with the values of the two level
curves tangent to the circle found with the first method.
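Both methods can be reproduced symbolically; a short sketch assuming sympy:

```python
import sympy as sp

x1, x2, c, lam = sp.symbols('x1 x2 c lambda', real=True)

# Level curves: tangency <=> zero discriminant of 2 x2^2 - 2 c x2 + c^2 - 1
print(sp.solve(sp.discriminant(2*x2**2 - 2*c*x2 + c**2 - 1, x2), c))
# [-sqrt(2), sqrt(2)]

# Lagrange: stationarity plus the constraint
print(sp.solve([1 - 2*lam*x1, 1 - 2*lam*x2, 1 - x1**2 - x2**2],
               [x1, x2, lam], dict=True))
# x1 = x2 = ±sqrt(2)/2 with lambda = ±1/sqrt(2)
```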
7. Consider the problem of max f(x) = a^T x s.t. Σ_{k=1}^n x_k^2 = 1 with
a, x ∈ R^n. Apply the Lagrange method. Show that, as a result of the
solution, we have a proof of the Schwarz inequality.
Solution This problem is not so different from the previous one and indeed
it generalizes the previous result to R^n: you are asked to maximize a linear
objective (a hyperplane) over a spherical feasibility set ∥x∥^2 = 1. The
Lagrange function is L(x, λ) = Σ_k a_k x_k − λ(Σ_k x_k^2 − 1), from which
we have as optimality conditions ∂L/∂x_k = a_k − 2λx_k = 0 for every
k = 1, 2, .., n, leading to the solutions x_k = a_k/(2λ). Since we have only
one constraint, from the condition ∂L/∂λ = 0 we recover Σ_k x_k^2 =
(Σ_k a_k^2)/(4λ^2) = 1, from which 2λ = ±√(Σ_k a_k^2), and we can
specify the optimal coordinates as x_k = ±a_k/√(Σ_k a_k^2).
We have a maximum at x1* = (a_1, a_2, ..., a_n)^T / √(Σ_k a_k^2), with
optimal objective value f(x1*) = √(Σ_k a_k^2), and a minimum at
x2* = −x1* with f(x2*) = −√(Σ_k a_k^2).
Remark The above can be taken as a proof of the Schwarz inequality: let a_k
be the coordinates of a vector v and assume another vector w with coordinates
b_k. Then we have

−√(Σ_k a_k^2) √(Σ_k b_k^2) ≤ Σ_k a_k b_k ≤ √(Σ_k a_k^2) √(Σ_k b_k^2)

or −∥v∥∥w∥ ≤ ⟨v, w⟩ ≤ ∥v∥∥w∥.
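The sphere maximizer and the resulting inequality can be spot-checked numerically; a small sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=5)

x_star = a / np.linalg.norm(a)            # maximizer on the unit sphere
print(a @ x_star, np.linalg.norm(a))      # both equal ||a||

v, w = rng.normal(size=5), rng.normal(size=5)
print(abs(v @ w) <= np.linalg.norm(v) * np.linalg.norm(w))  # Schwarz: True
```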
8. Find the max and min of f(x) = x1^2 + x2^2 s.t. 4x1^2 + x2^2 = 4,
adopting the Lagrange method.
Solution Again a quadratic program, now with an elliptic feasible region. We
have one constraint, thus one Lagrange multiplier, and two unknowns. We
know already that without the constraint this is a trivial quadratic program
with solution at (0, 0) and optimal value 0. The Lagrange function is
L(x, λ) = x1^2 + x2^2 − λ(4x1^2 + x2^2 − 4). We have

∂L/∂x1 = 2x1 − 8λx1 = 0
∂L/∂x2 = 2x2 − 2λx2 = 0
∂L/∂λ = 4 − 4x1^2 − x2^2 = 0
From the last condition, ∂L/∂λ = 0, along the axes we have for x1 = 0:
x2 = ±2, and for x2 = 0: x1 = ±1. From the first and second partial
derivatives we see that x1 = x2 = 0 would satisfy the stationarity conditions,
but it violates the constraint. Alternatively, the Lagrange multiplier must be
either λ = 1/4 or λ = 1. In the first case the partial derivative of L with
respect to x1 vanishes for any x1, but the second condition then forces
x2 = 0; vice versa, λ = 1 forces x1 = 0. So we have only the following
stationary points: x1* = (1, 0)^T, x2* = (−1, 0)^T, x3* = (0, 2)^T and
x4* = (0, −2)^T, resulting in the local minima f(x1*) = f(x2*) = 1 and the
local maxima f(x3*) = f(x4*) = 4.
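The four stationary points can be recovered symbolically; a minimal sympy sketch:

```python
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lambda', real=True)
f = x1**2 + x2**2
L = f - lam*(4*x1**2 + x2**2 - 4)

for s in sp.solve([sp.diff(L, v) for v in (x1, x2, lam)],
                  [x1, x2, lam], dict=True):
    print(s, ' f =', f.subs(s))
# (±1, 0) with f = 1 (minima); (0, ±2) with f = 4 (maxima)
```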
9. Consider f(x1, x2, x3) = 2x1x2 + x1x3 + x2x3 − 4x2 under the constraint
h(x) = x1 + x2 + x3 = 0. Determine the constrained maximum of f(x) by
the Lagrange method.
Solution We specify the Lagrange function: L(x, λ) = 2x1x2 + x1x3 + x2x3 −
4x2 − λ(x1 + x2 + x3), with first-order conditions

∂L/∂x1 = 2x2 + x3 − λ = 0
∂L/∂x2 = 2x1 + x3 − 4 − λ = 0
∂L/∂x3 = x1 + x2 − λ = 0
∂L/∂λ = x1 + x2 + x3 = 0
This is a system of 4 equations and 4 unknowns with a unique solution
x1 = 0, x2 = −2, x3 = 2 and λ = −2. We calculate directly the augmented
(bordered) Hessian matrix associated with the Lagrange function:

L̃ = [ 0, Dh(x1, x2, x3) ; Dh(x1, x2, x3)^T, L_xx ] =
[0 1 1 1]
[1 0 2 1]
[1 2 0 1]
[1 1 1 0]

We have n = 3 unknowns and m = 1 constraint, and we study the signs of
the northwest minors of dimension 2m + 1 = 3 and n + m = 4. The 3×3
minor
[0 1 1]
[1 0 2]
[1 2 0]
has determinant 4 > 0 (it has sign (−1)^{m+1} = (−1)^2), while the full 4×4
determinant equals −4 < 0; the signs alternate, so x* = (0, −2, 2)^T is a
local maximum with objective value f(x*) = 4.
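The two minors are quick to check numerically; a sketch assuming numpy:

```python
import numpy as np

H = np.array([[0, 1, 1, 1],
              [1, 0, 2, 1],
              [1, 2, 0, 1],
              [1, 1, 1, 0]], dtype=float)  # bordered Hessian at x* = (0,-2,2)

print(round(np.linalg.det(H[:3, :3])))  # 4  (> 0)
print(round(np.linalg.det(H)))          # -4 (< 0): alternating -> local max
```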
Optimization problems with equality and inequality con-
straints
Exercises 21
1. Solve the problem max_{x∈R^2} −(x1^2 + x2^2) subject to x1 + x2 ≥ 1.
Solution
(a) Since we have one constraint, we have one KKT multiplier:

L(x, µ) = −(x1^2 + x2^2) − µ(1 − x1 − x2).

(b) KKT conditions:

∂L/∂x1 = −2x1 + µ = 0,
∂L/∂x2 = −2x2 + µ = 0,
µ(1 − x1 − x2) = 0.
In the interior of the feasible region µ = 0, therefore x1 = x2 = 0 from
∂L/∂xj = 0 for j = 1, 2. However, this point is not in the feasible region.
(c) On the boundary we have g(x) = 0, i.e. x1 + x2 = 1. Therefore
x1 + x2 = 1 = µ/2 + µ/2 and µ = 1, x1* = x2* = 1/2 and f(x*) = −1/2.
(d) Finally, note that f(x) = −(x1^2 + x2^2) → −∞ as ∥x∥ → ∞ within
the feasible region (where x1 + x2 can grow without bound), so the boundary
point x* = (1/2, 1/2) is the global constrained maximum.
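The KKT point can be confirmed with a numerical solver; a sketch assuming scipy (maximizing by minimizing the negated objective):

```python
from scipy.optimize import minimize

res = minimize(lambda x: x[0]**2 + x[1]**2,        # -f(x)
               x0=[1.0, 1.0],
               constraints=[{'type': 'ineq',
                             'fun': lambda x: x[0] + x[1] - 1}])
print(res.x, -res.fun)  # approx [0.5 0.5], f* = -0.5
```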
2. Let f(x) = −7x1^2 − x2^2. Determine max_{x∈R^2} f(x) subject to
x1 ≥ 0, x2 ≥ −1.
Solution
Let g1(x1, x2) = −x1 and g2(x1, x2) = −x2 − 1, so the feasible region is
given by g1(x1, x2) ≤ 0, g2(x1, x2) ≤ 0.
Remark: f is quadratic, negative definite, hence concave, and g1, g2 are
convex (linear inequalities).
(a) From

L(x, µ1, µ2) = −7x1^2 − x2^2 − µ1(−x1) − µ2(−x2 − 1)
            = −7x1^2 − x2^2 + µ1 x1 + µ2 x2 + µ2,

the five KKT conditions (two stationarity conditions, one complementarity
condition and the two original inequalities), plus the nonnegativity of the
multipliers, are:
∂L/∂x1 = −14x1 + µ1 = 0
∂L/∂x2 = −2x2 + µ2 = 0
µ1 x1 + µ2 (x2 + 1) = 0
−x1 ≤ 0, −x2 − 1 ≤ 0
µ1 ≥ 0, µ2 ≥ 0
Hence, µ1 = 0 or x1 = 0, and µ2 = 0 or x2 = −1. Moreover,

µ1 = 14x1
µ2 = 2x2
x1 ≥ 0, x2 ≥ −1, µ1 ≥ 0, µ2 ≥ 0.

The choice x2 = −1 implies µ2 = −2 < 0, which is not feasible. Therefore
µ2 = 0 and x2 = 0.
On the other hand, µ1 = 0 if and only if x1 = 0, hence µ1 = 0 and x1 = 0.
The solution is then at x* = (0, 0) and max_{x∈R^2} f(x) = f(x*) = 0.
However, we can use a shortcut:
• f is concave.
• The constraints gj, j = 1, 2, are convex.
• The KKT conditions hold at x* = (0, 0).
Therefore the KKT conditions are sufficient for optimality, and at x* = (0, 0)
we have a maximum with value f(x*) = 0.
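Again a quick numerical confirmation, assuming scipy, using simple bounds for the two linear inequalities:

```python
from scipy.optimize import minimize

res = minimize(lambda x: 7*x[0]**2 + x[1]**2,   # -f(x)
               x0=[1.0, 1.0],
               bounds=[(0, None), (-1, None)])  # x1 >= 0, x2 >= -1
print(res.x, -res.fun)  # approx [0 0], f* = 0
```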
3. Consider the problem min_{x∈R^2, x≥0} x1 x2 subject to x1 + x2 ≥ 2,
x2 ≥ x1. Verify its solution by applying the KKT conditions. (This is an
example of a problem that satisfies the SONC for optimality, which however
turn out not to be sufficient.)
Solution
Let us define g1(x1, x2) = 2 − x1 − x2 and g2(x1, x2) = x1 − x2.
(a) The Lagrangean is:

L(x1, x2, µ1, µ2) = x1 x2 + µ1 (2 − x1 − x2) + µ2 (x1 − x2).

(b) The KKT conditions are:

∂L/∂x1 = x2 − µ1 + µ2 = 0
∂L/∂x2 = x1 − µ1 − µ2 = 0
µ1 (2 − x1 − x2) + µ2 (x1 − x2) = 0
2 − x1 − x2 ≤ 0, x1 − x2 ≤ 0
µ1 ≥ 0, µ2 ≥ 0
(c) From the above FONC, assume µ1 > 0 and µ2 = 0; then x2 = µ1,
x1 = x2 and µ1 (2 − x1 − x2) = 0. At the boundary (the point at which the
inequality associated with the multiplier becomes an equality) we have
x1 + x2 = 2, which implies x1* = x2* = 1 and µ1 = 1. You can check that
all first-order KKT conditions are satisfied at this point. Assuming instead
µ1 = 0 and µ2 > 0 gives x2 = −µ2 < 0 and x1 = µ2 > 0, violating
x2 ≥ x1, so no further KKT point arises. Notice that the nonnegativity
condition on the KKT multipliers, together with the conditions ∂L/∂xj = 0
for j = 1, 2, forces one or the other multiplier to be 0.
At the point x* = (1, 1) we also have ∇g1(x1, x2) = (−1, −1) and
∇g2(x1, x2) = (1, −1), so the gradients of the constraint functions active at
x* are l.i. and the point is regular. Furthermore, being regular, T(x*)
includes the origin and the SONC are satisfied with both constraints active.
Recall that we are looking for a minimum of f(x) = x1 x2.
However, if we compute the Hessian of the Lagrange function,

L(x*, µ*) = F(x*) + µ1 G1(x*) + µ2 G2(x*)

with F, G1, G2 the Hessians of f, g1, g2, then from ∇f(x1, x2) = (x2, x1),
∇g1(x1, x2) = (−1, −1) and ∇g2(x1, x2) = (1, −1) we have:

L(x1*, x2*) = [0 1; 1 0] + µ1 [0 0; 0 0] + µ2 [0 0; 0 0] = [0 1; 1 0]

The vector y = (0, 0)^T trivially satisfies y^T L(x*, µ*) y = 0, but the
second-order sufficient conditions are not satisfied. Indeed f has a saddle
point in the feasible region. To see that the SOSC fail, take y = (−1, 1) ∈
T(x*, µ*), which satisfies y1 = −y2: then y^T L(x*, µ*) y = −2 < 0, so the
Hessian is not positive definite on the tangent directions. In fact the point is
not a local minimizer!
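The failure of the SOSC is visible directly: moving from x* = (1, 1) along the feasible direction y = (−1, 1) strictly decreases f. A tiny numeric sketch:

```python
# Feasible perturbations x(t) = (1 - t, 1 + t): the sum stays 2 and x2 >= x1
for t in [0.0, 0.1, 0.2]:
    x1, x2 = 1 - t, 1 + t
    print((x1, x2), x1 * x2)  # f = 1 - t^2 decreases: (1,1) is no local min
```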
4. Let f(x) = x1 − 2x2^2. Determine the x ∈ arg max {f(x) | x1 ≥ 0,
x2 − x1 ≥ 4} which satisfies the KKT conditions.
Solution We write the problem as max_{x∈R^2} (x1 − 2x2^2) subject to
g1(x1, x2) ≤ 0, g2(x1, x2) ≤ 0, with g1(x1, x2) = −x1 and g2(x1, x2) =
x1 − x2 + 4. We see already that ∇g1 = (−1, 0) and ∇g2 = (1, −1) are
linearly independent, so the candidate x* is regular.
(a) The Lagrangean is

L(x1, x2, µ1, µ2) = x1 − 2x2^2 − µ1 (−x1) − µ2 (x1 − x2 + 4).

(b) The KKT conditions can be written as:

∂L/∂x1 = 1 + µ1 − µ2 = 0
∂L/∂x2 = −4x2 + µ2 = 0
−x1 ≤ 0, µ1 x1 = 0
x1 − x2 + 4 ≤ 0, µ2 (x1 − x2 + 4) = 0
µ1 ≥ 0, µ2 ≥ 0.
We use the complementarity conditions µ1 x1 = 0 and µ2 (x1 − x2 + 4) = 0
to obtain µ1 = 0 or x1 = 0, and µ2 = 0 or x1 − x2 + 4 = 0.
• Case 1: µ1 = 0 and µ2 = 0. This is impossible since we must have
1 + µ1 − µ2 = 0.
• Case 2: µ1 = 0 and x1 − x2 + 4 = 0. Therefore µ2 = 1 and x2 = 1/4:
this would imply x1 = −15/4 < 0, which contradicts x1 ≥ 0. Hence
x* = (−15/4, 1/4)^T and µ = (0, 1)^T do not satisfy the KKT conditions.
• Case 3: x1 = 0 and µ2 = 0. This is impossible since it would imply
µ1 = −1 < 0.
• Case 4: x1 = 0 and x1 − x2 + 4 = 0. In this case x2 = 4, and
x* = (0, 4)^T with µ = (15, 16)^T satisfies the KKT conditions.
In conclusion, x* = (0, 4)^T satisfies the KKT conditions and is the
constrained maximum of the function, with f(x*) = −32.
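A numerical check of Case 4, assuming scipy:

```python
from scipy.optimize import minimize

res = minimize(lambda x: -(x[0] - 2*x[1]**2),  # -f(x)
               x0=[0.0, 4.0],
               constraints=[{'type': 'ineq', 'fun': lambda x: x[0]},
                            {'type': 'ineq', 'fun': lambda x: x[1] - x[0] - 4}])
print(res.x, -res.fun)  # approx [0 4], f* = -32
```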
5. Let's now consider a problem with both equality and inequality constraints.
We want to find the max and min of f(x) = x1^2 + 2x2^2 subject to
x1^2 + x2^2 ≤ 1, x1 + x2 − 1 = 0, by the KKT theorem.
Solution
(a) From the constraints we see that the feasible region is closed and
bounded, thus compact, so global min and max exist. Let's consider
g1(x1, x2) = x1^2 + x2^2 − 1 and h1(x1, x2) = x1 + x2 − 1, so that the
constraints are g1(x1, x2) ≤ 0 and h1(x1, x2) = 0.
At (0, 1) and (1, 0) we are on the boundary of g1. Both points are regular
since

∇g1(1, 0) = (2, 0), ∇g1(0, 1) = (0, 2)
∇h1(1, 0) = (1, 1), ∇h1(0, 1) = (1, 1)

are linearly independent pairs.
(b) The Lagrangean is given by

L(x, λ, µ) = x1^2 + 2x2^2 − µ(x1^2 + x2^2 − 1) − λ(x1 + x2 − 1)

The KKT conditions are

∂L/∂x1 = 2x1 − 2µx1 − λ = 0
∂L/∂x2 = 4x2 − 2µx2 − λ = 0
x1^2 + x2^2 − 1 ≤ 0, µ(x1^2 + x2^2 − 1) = 0, x1 + x2 − 1 = 0, µ ≥ 0.
We obtain µ = 0 or x1^2 + x2^2 − 1 = 0. From the equality constraint
x1 + x2 = 1 we have x1 = 1 − x2.
We find the following candidate points:
• x0* = (2/3, 1/3)^T for µ = 0 and λ = 4/3.
• x1* = (1, 0)^T for µ = 1 and λ = 0.
• x2* = (0, 1)^T for µ = 2 and λ = 0.
At these points we have:

f(x0*) = 2/3, f(x1*) = 1, f(x2*) = 2.

Thus, we have a global maximum at x2* and a global minimum at x0*.
Consider these solutions separately and observe that:
• At the minimum the constraint x1^2 + x2^2 ≤ 1 is not binding and we
are in the interior of the disk, since (2/3)^2 + (1/3)^2 = 5/9 < 1; the KKT
multiplier must then be 0, while the Lagrange multiplier for the equality
constraint is positive (λ = 4/3).
• At the maximum, on the other hand, the inequality constraint is binding
and the multiplier µ is positive, while λ is null at the two boundary points
x1* and x2*, where the equality x1 + x2 = 1 is also satisfied.
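Both extremes can be recovered numerically with one equality and one inequality constraint; a sketch assuming scipy:

```python
from scipy.optimize import minimize

f = lambda x: x[0]**2 + 2*x[1]**2
cons = [{'type': 'ineq', 'fun': lambda x: 1 - x[0]**2 - x[1]**2},
        {'type': 'eq',   'fun': lambda x: x[0] + x[1] - 1}]

print(minimize(f, [0.5, 0.5], constraints=cons).x)               # ~ (2/3, 1/3)
print(minimize(lambda x: -f(x), [0.1, 0.9], constraints=cons).x) # ~ (0, 1)
```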