1 Part 1
1.1 Linear Algebra
Exercises 1
1. Consider the vectors

   a1 = (1, 2, 3)^T,  a2 = (−1, 3, 2)^T,  a3 = (0, 2, 2)^T.

   Are they linearly independent?
Solution
The matrix A with a1, a2, a3 as column vectors is a square matrix of order 3. Gaussian elimination works as follows:

    [ 1 −1 0 ]       [ 1 −1 0 ]       [ 1 −1 0 ]       [ 1 0 2/5 ]
A = [ 2  3 2 ] →(1)  [ 0  5 2 ] →(2)  [ 0  5 2 ] →(3)  [ 0 1 2/5 ]
    [ 3  2 2 ]       [ 0  5 2 ]       [ 0  0 0 ]       [ 0 0 0   ]

Step (1) is R2 − 2R1 and R3 − 3R1, step (2) is R3 − R2, step (3) rescales R2 and eliminates above the pivot. The zero row shows that rk(A) = 2 < 3, so the three vectors are linearly dependent.
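The elimination above can be cross-checked numerically; a minimal sketch with numpy, using the vectors of the exercise:

```python
import numpy as np

# Columns are a1, a2, a3 from the exercise.
A = np.column_stack([(1, 2, 3), (-1, 3, 2), (0, 2, 2)])

# matrix_rank plays the role of the hand elimination: a rank below 3
# means the three column vectors are linearly dependent.
print(np.linalg.matrix_rank(A))  # 2
print(np.linalg.det(A))          # 0 (up to rounding)
```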
2. Consider the vectors
   a1 = (2, 2)^T,  a2 = (−1, 0)^T.
Solution
We see that a1 and a2 are l.i. since the matrix

A = [ 2 −1 ]      (1)
    [ 2  0 ]

has determinant different from 0, which is equivalent to saying that the homogeneous system α1 a1 + α2 a2 = 0 is satisfied only when both α1 and α2 are null (this is the very definition of l.i., from which all the associated results follow, such as the one on the determinant mentioned above).
The inverse of a (2, 2) matrix can be computed straight away. The determinant of A has value 2, then:

A^{-1} = (1/2) · [  0 1 ]  =  [  0 1/2 ]
                 [ −2 2 ]     [ −1 1   ]
Solution
We see immediately that the two vectors are defined in R^3 but have the second coordinate equal to 0. They are l.i. because their linear combination α1 x1 + α2 x2 generates the null vector only if both coefficients α1, α2 are equal to 0. So, yes, they form a basis of the 2-dimensional subspace X ⊂ R^3. To derive the orthonormal basis we go through: (a) normalization of the first vector, (b) subtraction of the orthogonal projection of the second vector onto the first, (c) normalization of the resulting second vector. Thus:
(a) ||x1|| = √2  →  v1 = (1/√2, 0, 1/√2)^T

(b) (x2^T · v1) · v1 = (1/√2) · v1 = (1/2, 0, 1/2)^T, so that
    x2 − (x2^T v1) v1 = (1, 0, 0)^T − (1/2, 0, 1/2)^T = (1/2, 0, −1/2)^T

(c) v2 = (x2 − (x2^T v1) v1) / ||x2 − (x2^T v1) v1|| = √2 · (1/2, 0, −1/2)^T = (1/√2, 0, −1/√2)^T
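The three steps can be sketched in numpy; x1 = (1, 0, 1)^T and x2 = (1, 0, 0)^T are the vectors assumed from the computation above:

```python
import numpy as np

x1 = np.array([1.0, 0.0, 1.0])  # assumed first vector
x2 = np.array([1.0, 0.0, 0.0])  # assumed second vector

v1 = x1 / np.linalg.norm(x1)    # (a) normalization of x1
w = x2 - (x2 @ v1) * v1         # (b) remove the component of x2 along v1
v2 = w / np.linalg.norm(w)      # (c) normalization of the residual

print(np.round(v1, 4))  # ≈ (0.7071, 0, 0.7071)
print(np.round(v2, 4))  # ≈ (0.7071, 0, -0.7071)
```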
Hint: the matrix is (3,4), so you already know that its rank cannot be greater than 3 = min{3, 4}. You check whether the matrix contains a non-singular (2,2) submatrix and, if so, you verify which conditions on k should hold for the two (3,3) submatrices containing it to be nonsingular as well (Kronecker's theorem).
Thus for any k ≠ 2 we have rk(A) = 3. For k = 2, on the other hand, the minor formed by the 1st, 3rd and 4th columns has determinant different from 0, which is again sufficient for rk(A) = 3. Hence rk(A) = 3 independently of the value of k, which completes the solution.
Exercises 2
1. Normalize the vector x = (2, √3, −3)^T using the Euclidean norm.
Solution
This is a trivial problem: first we compute ||x|| and then divide the vector by the norm. So

||x|| = √(2² + (√3)² + (−3)²) = √16 = 4,  then  x_N = (1/4) · (2, √3, −3)^T = (1/2, √3/4, −3/4)^T.

We verify that ||x_N|| = 1: (1/2)² + (√3/4)² + (−3/4)² = 1/4 + 3/16 + 9/16 = 1, so x_N is a vector of unit length.
2. Consider the vectors v = (1, 0)^T, w = (1, 1)^T. Determine the angle between
v, w: based on their angle, what can you say about the orthogonality of v
and w? What about their linear independence (l.i.)? Is l.i. sufficient for
orthogonality? What about the ⊥ =⇒ l.i.?
Solution
We denote the angle between the two vectors by θ and compute the associated cosine:

cos(θ) = (v^T w) / (||v|| · ||w||) = 1 / (1 · √2),

so the angle between the two vectors in R² is given by the arc-cosine: θ = arccos(1/√2) = 0.785398 = π/4.
Remark: based on the angle, the two vectors are not orthogonal: we would have reached the same answer by simply computing the inner product of the two vectors, which is different from 0. On the other hand, the matrix associated with the two vectors has determinant equal to 1 and therefore the two vectors are l.i. So l.i. obviously does not imply orthogonality, while the opposite implication always holds.
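A quick numerical check of the angle computation (a sketch):

```python
import numpy as np

v = np.array([1.0, 0.0])
w = np.array([1.0, 1.0])

cos_theta = v @ w / (np.linalg.norm(v) * np.linalg.norm(w))
theta = np.arccos(cos_theta)

print(theta)                                    # 0.785398... = π/4
print(v @ w)                                    # 1.0: nonzero, so not orthogonal
print(np.linalg.det(np.column_stack([v, w])))   # 1.0: linearly independent
```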
3. Determine the value of k ∈ R for which v and w are orthogonal.
(a) v = (−1, 3k, 0)^T,  w = (−5, 0, 2)^T.
(b) v = (6, 7)^T,  w = (3k − 2, −2)^T.
Then, yes, they form an orthogonal basis. Since the product is not commutative, A · A^T ≠ A^T · A, we cannot extend the evidence to the column space.
Remark: this exercise is applied to a square matrix. What if we consider a rectangular matrix? If A ∈ R^{m,n}, assume m < n: we would have m row vectors defined in R^n and the product A · A^T would be an (m, m) matrix; the resulting vectors, say m = 2 as in our exercise, would still be orthogonal but now defined in R^2. On the contrary, if m > n, say m = 4, we would have 4 row vectors in R^3 and the resulting matrix A · A^T would be (4, 4) and could not be diagonal, due to the linear dependence of the 4 row vectors we started from.
Exercises 3
1. For each of the following matrices compute the solution x to the system Ax = b, where b = (−1, 4)^T.

(a) A = [ 3  0 ]
        [ 0 −2 ]

(b) A = [ 3  1 ]
        [ 0 −2 ]

(c) A = [ 3  1 ]
        [ 6 −2 ]
Solution
(c) In this case, det(A) = −12 ≠ 0, so using Cramer's formula we can compute A^{-1}. Since the inverse matrix exists, the solution of the linear system is x = A^{-1} b:

A^{-1} = [ 1/6  1/12 ]   =⇒   x = A^{-1} b = [ 1/6  1/12 ] [ −1 ]  =  [ 1/6  ]
         [ 1/2 −1/4  ]                       [ 1/2 −1/4  ] [  4 ]     [ −3/2 ]
2. Consider the linear system

   −x − 2y + z = −3
   3x − 4y + z = 9

Verify first whether it is compatible and then, in that case, determine its solution. Hint: use the Rouché–Capelli theorem to verify compatibility (rk(A) versus rk(A|b)); if compatible, you can solve it. To do that, derive an equivalent system by replacing the second equation with its difference with the first one.
Solution We write this linear system as Ax = b with A ∈ R^{2,3} and three unknowns. The problem admits solutions since rk(A) = rk(A|b) = 2; the rank, however, is smaller than the number of unknowns, so the system has infinitely many solutions. We can see this by observing that both x and y can be expressed as functions of z, which is free to vary:

   x = (1/5)(z + 15)
   y = (2/5) z

resulting in infinitely many solutions as z ranges over R.
Let A be an (m, n) matrix with n column vectors, each with m components. We can summarize in a table a set of relevant conditions.
Solution
We denote by Ri + a · Rj → Ri the step that adds to row Ri the vector a·Rj. The Gaussian elimination process is as follows:

[  2  4 −2 |  2 ]                 [  2  4 −2 |  2 ]
[  4  9 −3 |  8 ]  R2 − 2R1 → R2  [  0  1  1 |  4 ]  R3 + R1 → R3
[ −2 −3  7 | 10 ]  ============⇒  [ −2 −3  7 | 10 ]  ===========⇒

[ 2 4 −2 |  2 ]                 [ 2 4 −2 | 2 ]
[ 0 1  1 |  4 ]  R3 − R2 → R3   [ 0 1  1 | 4 ]
[ 0 1  5 | 12 ]  ===========⇒   [ 0 0  4 | 8 ]

Thus, the solution of the linear system is given by back substitution:

2x1 + 4x2 − 2x3 = 2        x3 = 2
0x1 + 1x2 + 1x3 = 4   =⇒   x2 = 2
0x1 + 0x2 + 4x3 = 8        x1 = −1
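The same system can be handed to numpy as a check of the elimination above:

```python
import numpy as np

A = np.array([[2.0, 4.0, -2.0],
              [4.0, 9.0, -3.0],
              [-2.0, -3.0, 7.0]])
b = np.array([2.0, 8.0, 10.0])

x = np.linalg.solve(A, b)  # LU factorization under the hood
print(x)                   # ≈ (−1, 2, 2), matching the back substitution
```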
Exercises 4
1. Let A = [ 0.8 0.3 ]
           [ 0.2 0.7 ].
Compute the associated eigenvalues λ1, λ2 and eigenvectors v1, v2 for which A · vj = λj vj. Verify that the eigenvectors are l.i. and that A = V · Λ · V^{-1}.
Remark 1: why is that a good idea? We first notice that the element a21 = 1 (and we know that the second row will not change after the elementary operation above); then by subtracting (2 − λ) times the second row from the first (same as summing (λ − 2) times it) we get a matrix whose determinant is entirely determined by the element in the first row, second column:

[ 0   2 + (3 − λ)(λ − 2) ] [ x1 ]   [ 0 ]
[ 1   3 − λ              ] [ x2 ] = [ 0 ]

From λ1 = 1: the first equation is satisfied for any value of x2, while the second equation requires the eigenvector to satisfy x11 = −2 x12; thus for t ∈ R we have x1 = t · (−2, 1)^T.
From λ2 = 4: likewise the first equation leaves the second coordinate free, and from the second we must have the first coordinate equal to the second: x2 = t · (1, 1)^T.
Remark 2: what can we notice from this solution? As in any spectral decomposition, we construct from the beginning a system whose determinant is 0 (a singular matrix), so we know that the homogeneous system has infinitely many solutions: we exploit this result to define, through the two eigenvalues, two l.i. eigenvectors. Each of them identifies a direction, and the set of its multiples t · v is a linear variety of dimension 1.
3. Consider the symmetric matrix

A = [  1 0 −1 ]
    [  0 2  0 ]
    [ −1 0  1 ].

Solve the associated characteristic polynomial to determine the eigenvalues and the associated eigenvectors. Are they orthogonal? If so, determine the associated orthonormal basis.
Solution: This was solved in class as well, so I just point out the key
elements of the solution.
First, you see that the matrix A is square of order 3 and most importantly
symmetric and of rank 2, since the third column is equal to the first
times −1. From these remarks we already know that, given 2 distinct
eigenvalues, the associated eigenvectors will be orthogonal (due to the
matrix symmetry).
So we derive the characteristic polynomial p(λ) = (2 − λ) · [(1 − λ)² − 1] = −λ · (2 − λ)² = 0, with roots λ1 = 0 (simple) and λ2 = 2 (double). We now derive for every eigenvalue the eigenvectors by solving the homogeneous system (A − λI)v = 0.
• From λ1 = 0: the system reads v1 − v3 = 0, 2v2 = 0, −v1 + v3 = 0, from which we see that the second coordinate must be 0 and the first and the third must be equal, identifying the eigenvector v1 = t · (1, 0, 1)^T.
• From λ2 = 2: exactly the same procedure; we notice now that the second column of the system has three 0's, so the second coordinate is free, while the first and the third are equal with opposite sign. On the other hand, v1 = v3 = 0 with v2 free is also a solution.
We have identified, for t ∈ R, three eigenvectors v1 = t · (1, 0, 1)^T, v2 = t · (−1, 0, 1)^T and v3 = t · (0, 1, 0)^T. The associated matrix

V = [ 1 −1 0 ]
    [ 0  0 1 ]
    [ 1  1 0 ]

is orthogonal, with associated orthonormal basis ui = vi / ||vi||:

U = [ 1/√2 −1/√2 0 ]
    [ 0     0    1 ]
    [ 1/√2  1/√2 0 ].
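The decomposition can be reproduced with numpy.linalg.eigh, the routine for symmetric matrices, which returns an orthonormal eigenvector matrix directly:

```python
import numpy as np

A = np.array([[1.0, 0.0, -1.0],
              [0.0, 2.0, 0.0],
              [-1.0, 0.0, 1.0]])

lam, U = np.linalg.eigh(A)   # eigenvalues in ascending order
print(np.round(lam, 6))                         # [0. 2. 2.]
print(np.allclose(U.T @ U, np.eye(3)))          # True: orthonormal basis
print(np.allclose(U @ np.diag(lam) @ U.T, A))   # True: A = U Λ U^T
```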
Exercises 5
1. Let v = (2, 3, 1)^T ∈ R^3; determine its projections onto the plane (x, y) and then onto the z-axis.
Hint: you need to determine the projection matrices P1 and P2 such that pj = Pj · v are the requested projections, with p1 + p2 = v and P1 + P2 = I.
Solution: We saw this in class and I summarize the key elements of this
simple but useful exercise.
For the projection onto the (x, y) plane, let the projection matrix (or projector) be

P1 = [ 1 0 0 ]                       [ 2 ]
     [ 0 1 0 ] ,  then p1 = P1 · v = [ 3 ] .
     [ 0 0 0 ]                       [ 0 ]

For the second projection, onto the z-axis, we take likewise

P2 = [ 0 0 0 ]                       [ 0 ]
     [ 0 0 0 ] ,  then p2 = P2 · v = [ 0 ] .
     [ 0 0 1 ]                       [ 1 ]

As a result p1 + p2 = v and P1 + P2 = I. Notice that rk(P1) + rk(P2) = 3 and p1 ⊥ p2.
Remark: the above result is general, meaning that once we project onto
orthogonal spaces it must be true that the sum of the dimensions of the
resulting sub-spaces is equal to the dimension of the original space. It is
also a general result that any vector in a linear space admits an orthogonal
decomposition such as the one above. These properties are corollaries to
the main result, namely that a matrix P is an orthogonal projector onto a
subspace V if and only if P 2 = P = P T .
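The two projectors and the stated properties can be verified in a few lines:

```python
import numpy as np

P1 = np.diag([1.0, 1.0, 0.0])   # projector onto the (x, y) plane
P2 = np.diag([0.0, 0.0, 1.0])   # projector onto the z-axis
v = np.array([2.0, 3.0, 1.0])

p1, p2 = P1 @ v, P2 @ v
print(p1, p2)                                            # (2,3,0) and (0,0,1)
print(np.allclose(P1 @ P1, P1), np.allclose(P1, P1.T))   # P² = P = P^T
print(np.allclose(p1 + p2, v), p1 @ p2 == 0.0)           # decomposition, orthogonality
```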
2. Determine the orthogonal projection of x = (4, −1)^T onto the subspace generated by v = (3, 2)^T.
Solution: Here we have a vector in R2 and you are asked to compute its
projection onto a space which is defined by another vector also in R2 . It is
an easy exercise and we then discuss the solution. The resulting projection
will be x⊥ .
First we normalize v: given ||v|| = √(3² + 2²) = √13, we define v_N = (3/√13, 2/√13)^T.
Then we compute the projection:

x⊥ = (x^T · v_N) v_N = (4 · 3/√13 − 1 · 2/√13) · v_N = (10/√13) · (3/√13, 2/√13)^T = (30/13, 20/13)^T.

The solution is complete.
Remark: the term "projection onto" may be misleading, as if it meant that its purpose is to project a vector, say, from R² into a line or from R³ into R². No, that is not correct! Indeed, in the exercise we see that everything occurs in R², both the original vector and its projection being defined in that space. The projection, however, does lie in a subspace, defined by all those vectors whose first coordinate is 1.5 times the second coordinate, whereas the original vector was arbitrary. Notice that this property is determined by the direction of v and is evident if you look at the two coordinates of v_N.
Exercises 6
1. Is the function f(x, y, z) = −x² − 2y² − 8z² + 8yz a quadratic form? Study its sign (is it n.d., n.s.d., p.d., p.s.d.?).
Solution Observe that we can write the function as f(v) = v^T A v, where

A = [ −1  0  0 ]
    [  0 −2  4 ]
    [  0  4 −8 ].

The eigenvalues of A are −1, 0 and −10 (the lower (2,2) block has determinant 0 and trace −10), so A is negative semi-definite.
Hint: you see that A is a diagonal matrix, so the diagonal elements are its eigenvalues, and the question asks you to check their signs as α changes. It is sufficient to look at the relevant ranges of values: α ≤ −1, α ∈ (−1, 1), α = 1, α ∈ (1, 2], α > 2.
Solution First of all, note that A is a diagonal matrix, so its eigenvalues are the elements on the diagonal.
Let us denote λ1 = α² − 1, λ2 = 0, λ3 = α − 1, λ4 = 2 − α.
For α ∈ R we have the following cases:
• If α < −1 then λ1 > 0, λ2 = 0, λ3 < 0, λ4 > 0. Thus, A is indefinite.
• If α = −1 then λ1 = 0, λ2 = 0, λ3 < 0, λ4 > 0. Thus, A is indefinite.
• If −1 < α < 1 then λ1 < 0, λ2 = 0, λ3 < 0, λ4 > 0. Thus, A is indefinite.
• If α = 1 then λ1 = 0, λ2 = 0, λ3 = 0, λ4 > 0. Thus, A is p.s.d.
• If 1 < α ≤ 2 then λ1 > 0, λ2 = 0, λ3 > 0, λ4 ≥ 0. Thus, A is p.s.d.
• If α > 2 then λ1 > 0, λ2 = 0, λ3 > 0, λ4 < 0. Thus, A is indefinite.
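The case analysis can be automated with a small helper that classifies a symmetric matrix from the signs of its eigenvalues; the diagonal entries (α² − 1, 0, α − 1, 2 − α) are the ones assumed in the solution above:

```python
def classify(eigs, tol=1e-12):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    pos = any(e > tol for e in eigs)
    neg = any(e < -tol for e in eigs)
    zero = any(abs(e) <= tol for e in eigs)
    if pos and neg:
        return "indefinite"
    if pos:
        return "p.s.d." if zero else "p.d."
    if neg:
        return "n.s.d." if zero else "n.d."
    return "zero"

for alpha in (-2.0, -1.0, 0.0, 1.0, 1.5, 3.0):
    eigs = [alpha**2 - 1, 0.0, alpha - 1, 2 - alpha]  # assumed diagonal of A
    print(alpha, classify(eigs))
```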
1.2 Geometry
Exercises 7
1. Consider the function of two variables x1, x2: f(x1, x2) = α1x1² − α2x2², with α1, α2 ∈ R. For which values of αj, j = 1, 2 is the function concave, convex, neither of the two?
Solution: The function is defined over R² and it is sufficient to consider the signs of α1 and α2 to assess the convexity of the function. From the definition of convexity, f(x1, x2) is convex if for any x = (x1, x2)^T and y = (y1, y2)^T, with γ ∈ [0, 1], we have γ f(x) + (1 − γ) f(y) ≥ f(γx + (1 − γ)y). According to this condition, it is sufficient to pick two points x and y and verify whether the convex combination of the associated function values is above or below the value of the function at their convex combination. You see that the result will depend on the signs of α1, α2.
Also, you can see that the function defines a quadratic form: for α1 > 0 and α2 < 0, f(x1, x2) is a quadratic function and the associated matrix is diagonal and positive definite. This implies the convexity of the function. For α1 < 0 and α2 > 0, f(x1, x2) is still a quadratic function but now the associated matrix is negative definite, leading to a concave surface. For α1 and α2 either both positive or both negative, the function is neither concave nor convex: the matrix is indefinite and indeed the surface has a saddle point at (0, 0).
2. Is the domain of

   f(x1, x2) = √(1 − x2²) / (x2 − x1)

open or closed, bounded or unbounded, convex?
Solution Remark: a domain is open if all its points are inner points, and it is closed if it includes all its limit points. As you know, R is both open (since all its points are inner points) and closed (since it contains all its limit points). A set is bounded if it is contained in a ball of finite radius; otherwise it is unbounded, i.e. in some direction it extends to infinity.
To obtain the domain D we consider the conditions 1 − x2² ≥ 0 and x2 − x1 ≠ 0. Therefore, −1 ≤ x2 ≤ 1 and x1 ≠ x2.
In particular, the domain D is not open because (0, 1) ∈ D but is not an
inner point. The element (0, 0) is a limit point of D but it does not belong
to D, so D is not closed.
The domain D is unbounded because it contains the sequence {(n, 0)}n∈N .
Finally, it is not convex: the elements (−1, 0) and (1, 0) belong to D but the midpoint of the segment between them, which is (0, 0), does not belong to D.
3. Let

   f(x1, x2) = √(x1 x2) / (x1 + x2).
Is the domain of the function a convex set? Is there a hyperplane through
the origin that is a separating hyperplane of the domain into convex sets?
Solution
To obtain the domain D we consider the conditions x1 x2 ≥ 0 and x1 + x2 ≠ 0. Therefore, sign(x1) = sign(x2) and x2 ≠ −x1. So the domain is not convex, because (1, 1) and (−1, −1) belong to D but (0, 0) ∉ D.
Note that x1 + x2 = 0 is a separating hyperplane through the origin, and the sets D1 = {(x1, x2) ∈ D | x1 + x2 > 0} and D2 = {(x1, x2) ∈ D | x1 + x2 < 0} are convex, because D1 is defined by x1 ≥ 0, x2 ≥ 0, (x1, x2) ≠ (0, 0) and D2 is defined by x1 ≤ 0, x2 ≤ 0, (x1, x2) ≠ (0, 0).
4. Let f : R² → R, f(x) = ln(4 − x1² − x2²) / (√x1 + √x2). Determine the domain of f; is this function concave/convex?
The first condition requires both x1 and x2 to be defined in the open set
(0, 2) and the second excludes the origin from the function domain. Thus,
the domain is neither open nor closed. The point (0, 1) ∈ D is not an
inner point and the point (0, 0) is a limit point but does not belong to D.
[Figure: the domain is the portion of the disc x1² + x2² < 4 lying in the positive quadrant, excluding the origin.]
The domain is thus not open and not closed but bounded and convex,
because any convex combination of points in the domain still belongs to
the domain.
∂f/∂x3(x) = −1/(2√(x1 + 2x2 − x3)) + 1/ln(1 + x1²).

This is the function that you are asked to evaluate at the point x0. We have

∂f/∂x3(x0) = −1/(2√(3 + 1)) + 1/ln(10) = 0.184294.

We see that the partial derivative is positive, so the function is increasing in that direction.
2. Let f(x1, x2) = √(x1² + 24) + 3x2 + x1. Evaluate the directional derivative of f(x) in x0 = (1, 2)^T along the direction u = (1, 0)^T.
Hint: you need first to derive the gradient of the function and then evaluate it in x0: the directional derivative is then D_u f(x0) = ∇f(x0)^T · u.
Solution The expression for the gradient is

∇f(x) = ( x1/√(x1² + 24) + 1 , 3 )^T,

so that ∇f(x0) = (1/5 + 1, 3)^T = (6/5, 3)^T and D_u f(x0) = ∇f(x0)^T · u = 6/5.
For f(x1, x2) = x2 e^{−2x1} at x0 = (−1/2, 1)^T:

∇f(x1, x2) = ( −2x2 e^{−2x1} , e^{−2x1} )^T,

f(x0) + ∇f(x0)^T · (x − x0) = e + (−2e  e) · ( x1 + 1/2 , x2 − 1 )^T = e · (x2 − 2x1 − 1).
Solution
Note that

∇f(x) = ( 2 + 6x1x2 , 3x1² + 2x2x3 , x2² )^T.

At x0 we have ∇f(x0) = (2, 3, 0)^T.
Therefore, Df(x0) · (x − x0) = ∇f(x0)^T (x − x0) = 2(x1 − 1) + 3x2 = 2x1 + 3x2 − 2.
As for the Hessian:

Hf(x) = [ 6x2  6x1  0   ]
        [ 6x1  2x3  2x2 ]
        [ 0    2x2  0   ].

At x0 we have

D²f(x0) = Hf(x0) = [ 0  6  0 ]
                   [ 6 −2  0 ]
                   [ 0  0  0 ].
Exercises 9
1. Consider the function f(x1, x2) = 2x1 + x2 + 3. Plot the level curves in R² as the level c varies. Identify the direction of increase in R²₊.
Solution For a given c the level curves are defined by 2x1 + x2 = c − 3, which are lines in R². As c increases the lines shift in the direction of the gradient (2, 1)^T, along which the function f increases.
2. Determine and plot the level curves of f(x1, x2) = 4x1² − x2² for c1 = 0, c2 = 1, c3 = −1. In which direction do we climb or move down the surface?
Solution For a given c, the level curve is defined by the equation 4x1² − x2² = c, which is a hyperbola for c ≠ 0. When c > 0 the transverse axis of the hyperbola lies along the x-axis, while for c < 0 it lies along the y-axis. For c = 0 the level curve consists of the two lines x2 = ±2x1. Observe that the function f climbs when the value c of the level curves increases.
Figure 1: Relevant figure for problem 9.1
Exercises 10
1. Let f : R³ → R, f(x) = x1x2 − x3². Write down the Taylor expansion of the second order around x0 = (1, 1, 0).
Solution
Note that f(x0) = 1, ∇f(x0) = (1, 1, 0)^T and

Hf(x0) = [ 0 1  0 ]
         [ 1 0  0 ]
         [ 0 0 −2 ].

Then,

f(x) = f(x0) + ∇f(x0)^T (x − x0) + (1/2!)(x − x0)^T Hf(x0)(x − x0) + o(||x − x0||²).
You multiply the vectors and matrices and derive a polynomial of degree 2. The function is the sum of this polynomial and the remainder, which is o(||x − x0||²).
2. Let f : R³ → R, f(x1, x2, x3) = x1x2 + e^{x3}. Expand the Taylor series to the second order around x0 = (1, 1, 0)^T.
Solution
Note that f(x0) = 2, ∇f(x0) = (1, 1, 1)^T and

Hf(x0) = [ 0 1 0 ]
         [ 1 0 0 ]
         [ 0 0 1 ].

Then,

f(x) = f(x0) + ∇f(x0)^T (x − x0) + (1/2)(x − x0)^T Hf(x0)(x − x0) + o(||x − x0||²)
     = 2 + (1 1 1)(x1 − 1, x2 − 1, x3)^T + (1/2)(x1 − 1, x2 − 1, x3) Hf(x0) (x1 − 1, x2 − 1, x3)^T + o(||x − x0||²)
     = x1x2 + x3 + x3²/2 + 1 + o(||x − x0||²).
Remark: the term o(||x − x0||²) is referred to as the remainder, or error, of the expansion as we move away from x0. Notice that under the Euclidean norm ||x − x0||² = (x1 − 1)² + (x2 − 1)² + x3², so the o(·) notation means that the error converges to 0 faster than this quantity. We come back to this.
3. Consider the function f : R² → R, f(x1, x2) = −(x1² − 1)² − x2². Show that (1, 0), (−1, 0), and (0, 0) are stationary points and, by evaluating the Hessian at those points, infer whether they are also optimal. Hint: a negative definite Hessian defines maxima at stationary points, while an indefinite matrix leads to a saddle point.
Solution The gradient of the function is given by

∇f(x) = ( −4x1(x1² − 1) , −2x2 )^T.
Observe that for the points (1, 0), (−1, 0), and (0, 0) the gradient is zero, thus these points are stationary. Also, the Hessian is given by

Hf(x) = [ −12x1² + 4   0 ]
        [ 0           −2 ].
Observe that the eigenvalues of Hf(x) are −12x1² + 4 and −2, which are both negative at the points (1, 0) and (−1, 0); hence these stationary points are local maxima. On the other hand, the eigenvalues of Hf((0, 0)) are 4 and −2, with different signs, so Hf((0, 0)) is an indefinite matrix, which implies that (0, 0) is a saddle point.
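The classification can be confirmed by checking the Hessian eigenvalues at each stationary point:

```python
import numpy as np

def hess(x1, x2):
    # Hessian of f(x1, x2) = -(x1^2 - 1)^2 - x2^2
    return np.array([[-12.0 * x1**2 + 4.0, 0.0],
                     [0.0, -2.0]])

for p in [(1.0, 0.0), (-1.0, 0.0), (0.0, 0.0)]:
    eigs = np.linalg.eigvalsh(hess(*p))
    if eigs.max() < 0:
        kind = "local max"
    elif eigs.min() < 0 < eigs.max():
        kind = "saddle"
    else:
        kind = "inconclusive"
    print(p, eigs, kind)
```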
3. Let f(x1, x2) = x1² · (x2 − 1). Determine, if any, the minimum or the maximum of this function using Second-Order Sufficient Conditions (SOSC).
Solution We have solved this in class. Let's first consider the first-order conditions:

∇f(x) = ( 2x1(x2 − 1) , x1² )^T = 0.

Then x1 = 0 and x2 is a free variable. All points along the x2 axis are stationary: x* = (0, c)^T, c ∈ R. We now consider second-order conditions through the Hessian evaluated at x*:

Hf(x*) = [ 2c − 2  0 ]
         [ 0       0 ].

Then the Hessian is (i) p.s.d. for c > 1, and those stationary points are local minima; (ii) n.s.d. for c < 1, and those stationary points are local maxima; (iii) the zero matrix for c = 1, so the second-order test is inconclusive there: since f takes both signs in every neighborhood of x* = (0, 1)^T (positive for x2 > 1, negative for x2 < 1, with x1 ≠ 0), the function has a saddle point at x* = (0, 1)^T.
4. Determine the local and global minima and maxima of the function f(x1, x2) = ln(x1²) · ln(x2). Apply First- and Second-Order Necessary Conditions.
Solution The domain of the function requires x1 ≠ 0 and x2 > 0, so a global min or max may not exist. Indeed, we see that if you fix x2 > 1 then lim_{x1→∞} ln(x1²) · ln(x2) = +∞, while for x2 < 1, lim_{x1→∞} ln(x1²) · ln(x2) = −∞, so the function is unbounded. Consider
now the FONC:

∇f(x) = ( (2/x1) ln(x2) , ln(x1²)/x2 )^T = 0

leads to ln(x2) = 0 and ln(x1²) = 0, hence x2 = 1 and x1 = ±1, with the two stationary points x̄1 = (1, 1)^T and x̄2 = (−1, 1)^T.
Consider now the SONC, with the Hessian

H(x) = [ −2 ln(x2)/x1²   2/(x1x2)     ]
       [ 2/(x1x2)        −ln(x1²)/x2² ],   so that   H(x̄1) = [ 0 2 ]
                                                             [ 2 0 ].

The associated quadratic form h^T H(x̄1) h = 4 h1 h2 changes sign depending on the signs of the coordinates h1 and h2. The same conclusion holds for x̄2, so the Hessian is indefinite at both stationary points and the function admits neither global nor local minima or maxima.
Exercises 12
1. Apply a first-order algorithm to determine the minimum of f(x) = (1 − x²) · e^{−x²}. Write the solution of the problem. Hint: determine the stationary points and study the limits for x → ±∞. Check if the function is even.
Figure 3:
2. Apply again a first-order algorithm to determine the min and/or max of f(x) = x⁴ + 4x³ + 1.
Figure 4:
3. Apply 1-d Newton method to find the minimum of f (x) = x4 − 2x − 5.
Let x0 = 0.5.
Solution This problem and the following one are the two I have discussed in class and for which you have the Excel file with the two methods' iterations. I refer you to that file for the solutions; here I give some more details to frame the two methods properly.
Under Newton's method we reach the optimum at x* = 0.7937, where f(x*) = −6.1905, with first derivative f'(x*) = 0.000032. The algorithm stops after 4 iterations because, before starting the iterations, we set a given tolerance, say |f'(xk)| ≤ 0.00005, and we then take the associated values as a valid solution of the problem. The starting point was x0 = 0.5. In general the choice of the starting point is based on a few function evaluations, after checking the domain of the function and its functional form. At every iteration of this method we need to keep track of the first and second derivatives.
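The iteration can be sketched in a few lines; the derivatives f'(x) = 4x³ − 2 and f''(x) = 12x² are hard-coded:

```python
def newton_min(x, tol=5e-5, max_iter=100):
    """Newton iteration x_{k+1} = x_k - f'(x_k)/f''(x_k) for f(x) = x^4 - 2x - 5."""
    for _ in range(max_iter):
        fp = 4 * x**3 - 2          # first derivative
        if abs(fp) <= tol:         # stopping rule on |f'|
            break
        x -= fp / (12 * x**2)      # second derivative 12x^2
    return x

print(round(newton_min(0.5), 4))   # 0.7937
```

At the returned point f evaluates to about −6.1905, as in the text.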
4. Same function as in the previous question; apply now the Fibonacci method. Hint: let ε = 0.1 and choose N such that (1 + 2ε)/F_{N+1} ≤ 0.06.
Solution For the same function as in the previous problem, we see that here, with an iterative method driven by the Fibonacci sequence, after 6 iterations the minimum of the function is set at f(x*) = −6.1898. Fibonacci's method requires a series of initial inputs, from which we derive the starting points a0 and b0 at which first to compute f(·), the initial Fibonacci ratio and the number N of iterations needed to solve the problem. In particular:
• Given an ε > 0, we set N as the integer for which (1 + 2ε)/F_{N+1} ≤ c, c being the range reduction we wish to attain between the final and the initial evaluation interval.
• The first Fibonacci ratio will then be ρ1 = 1 − F_N/F_{N+1}. At every iteration, selectively, given ρ2 = 1 − F_{N−1}/F_N, ρ3 = 1 − F_{N−2}/F_{N−1}, etc., until ρN = 1 − F1/F2, we restrict the range of evaluation.
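A sketch of the search for the same f(x) = x⁴ − 2x − 5 on the bracket [0, 2] (the endpoints a0, b0 are not given in the text and are assumed here), with the ε-perturbation applied on the final step:

```python
def fibonacci_search(f, a, b, eps=0.1, c=0.06):
    # Build Fibonacci numbers F0, F1, ... until (1 + 2*eps)/F_{N+1} <= c.
    F = [1, 1]
    while (1 + 2 * eps) / F[-1] > c:
        F.append(F[-1] + F[-2])
    N = len(F) - 2                      # number of iterations
    for k in range(1, N + 1):
        rho = 1 - F[N - k + 1] / F[N - k + 2]
        if k == N:
            rho -= eps                  # epsilon-perturbation on the last step
        x1 = a + rho * (b - a)
        x2 = b - rho * (b - a)
        if f(x1) < f(x2):               # minimizer of a unimodal f lies in [a, x2]
            b = x2
        else:                           # ...otherwise in [x1, b]
            a = x1
    return (a + b) / 2

f = lambda x: x**4 - 2 * x - 5
print(fibonacci_search(f, 0.0, 2.0))    # close to the true minimizer ≈ 0.794
```

With ε = 0.1 and c = 0.06 the sequence stops at F_{N+1} = 21, giving N = 6 iterations as in the solution.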
5. Apply Newton's method to find the minimum of f(x) = 5x² − x. Let x0 = 0.
Solution
Here you are given the initial value x0 = 0, so you just have to complete the first evaluation before starting the iterations: f(x0) = 0, f'(x0) = 10x0 − 1 = −1 and f''(x0) = 10, which is positive; the function is convex and we can set k = 1:
k = 1:  x(1) = 0 − (−1/10) = 0.1,  f(x(1)) = −0.05,  f'(x(1)) = 10 · 0.1 − 1 = 0,  f''(x(1)) = 10. First derivative 0 (FONC) plus second derivative positive (convexity): we STOP.
• Notice that in this problem the first derivative at the stationary point is exactly 0, full stop. In the previous exercise we stopped due to the tolerance. Thus in this problem a second iteration would have been completely ineffective, with the optimal value remaining exactly the same, while in the previous one we might have requested an even smaller error, in which case the algorithm would not have terminated yet.
Exercises 13
1. Consider the quadratic function f(x) = (1/2) x^T Q x := x1² + 2x2² + 4x3² + 2x1x3. Apply the steepest descent algorithm to determine the minimum of this function with initial condition x(0) = (1, 1, 1)^T. Verify the orthogonality x(k) − x(k−1) ⊥ x(k+1) − x(k) for k = 2, 3, . . . Verify the updating of αk at every iteration. Estimate the speed of convergence of this algorithm. Compare the results with the case of constant step length α = 0.2 in the updating of x(k). Hint: you can try to solve this type of problem by computing the iterations in an Excel worksheet. You may be asked to show only a pair of iterations in an exam context.
Solution See the Excel file I have uploaded on Blackboard to gather the formulae implemented. Here I attach a screenshot of the Excel sheet with the iterations. You may notice that I stopped after 6 iterations but indeed the gradient was not yet 0 (column L).
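The iteration can be sketched as follows; Q is assembled so that ½xᵀQx reproduces x1² + 2x2² + 4x3² + 2x1x3, and αk is the exact line-search step for a quadratic, αk = gᵀg / (gᵀQg):

```python
import numpy as np

Q = np.array([[2.0, 0.0, 2.0],
              [0.0, 4.0, 0.0],
              [2.0, 0.0, 8.0]])
x = np.array([1.0, 1.0, 1.0])

for k in range(200):
    g = Q @ x                        # gradient of (1/2) x^T Q x
    if np.linalg.norm(g) < 1e-10:
        break
    alpha = (g @ g) / (g @ Q @ g)    # exact minimizer along -g
    x = x - alpha * g

print(k, x)                          # converges to the minimizer (0, 0, 0)
```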
Exercises 14
Apply Newton’s method to minimize the following functions:
1. From the previous exercise: f(x) = (1/2) x^T Q x := x1² + 2x2² + 4x3² + 2x1x3. Let x(0) = (1, 1, 1)^T. Show that the method converges in just one iteration.
Solution The matrix Q is given by

Q = [ 2 0 2 ]
    [ 0 4 0 ]
    [ 2 0 8 ].

For k = 1, we have x(1) = x(0) − Q^{-1} ∇f(x(0)) = x(0) − Q^{-1} Q x(0) = 0, which is the exact minimizer: the method converges in a single step.
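A numerical check of the one-iteration convergence (Q built from the quadratic form, so that ½xᵀQx has cross term 2x1x3):

```python
import numpy as np

Q = np.array([[2.0, 0.0, 2.0],
              [0.0, 4.0, 0.0],
              [2.0, 0.0, 8.0]])
x0 = np.array([1.0, 1.0, 1.0])

# Newton update: x1 = x0 - (Hessian)^{-1} gradient = x0 - Q^{-1}(Q x0) = 0.
x1 = x0 - np.linalg.solve(Q, Q @ x0)
print(x1)  # the exact minimizer, reached in one step
```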
3. f(x1, x2) = x1²(x2 − 1). First verify the first- and second-order conditions, then apply Newton's method to determine a local optimum in the domain x2 > 1, with initial condition x(0) = (1, 2)^T. Apply a termination criterion and verify convergence to a minimum. Apply the same algorithm now adopting x(0) = (0.00001, 1.00001)^T.
Solution We have

∇f(x) = ( 2x1(x2 − 1) , x1² )^T   and Hessian   F(x) = [ 2x2 − 2  2x1 ]
                                                       [ 2x1      0   ].

Then we see that from the FONC x1 = 0 while x2 is free over R. The SONC require, for positive or negative semi-definiteness, x2 ≠ 1. Let then x2 > 1
Figure 7: Newton Method. Exercise 14.2
and x(0) = (1, 2)^T, with f(x(0)) = 1. Then

g(0) = ∇f(x(0)) = (2, 1)^T,   F(0) = [ 2 2 ],   thus [F(0)]^{-1} = [ 0    0.5  ]
                                     [ 2 0 ]                      [ 0.5 −0.5 ],

leading to

x(1) = (1, 2)^T − (0.5, 0.5)^T = (0.5, 1.5)^T,           f(x(1)) = 0.125
x(2) = (0.5, 1.5)^T − (0.25, 0.25)^T = (0.25, 1.25)^T,   f(x(2)) = 0.015625
x(3) = (0.25, 1.25)^T − (0.125, 0.125)^T = (0.125, 1.125)^T,
x(4) = (0.125, 1.125)^T − (0.0625, 0.0625)^T = (0.0625, 1.0625)^T,   f(x(4)) = 0.000244,

and so on until convergence, within the given tolerance, to x* = (0, 1)^T with optimal value f(x*) = 0.
Consider the case in which x(0) = (0.00001, 1.00001)^T. The gradient is g(0) = (2 × 10^{−10}, 10^{−10})^T and the inverse of the Hessian would be

F^{-1}(x(0)) = [ 0.00      50000.00  ]
               [ 50000.00  −50000.00 ],

leading to an update from x(0) to x(1) subject to numerical instability caused by the Hessian inversion step.
Exercises 15
1. Consider the system Ax = b in R² with b = (1, 2)^T and

A = [ 1               1               ]
    [ 1 + 10^{−10}    1.1 − 10^{−10}  ]:

solve to determine x. Let now the element a22 of A change from 0.55(1 − 10^{−10}) to 0.51(1 − 10^{−10}), then to 0.5(1 − 10^{−10}). Determine the inverse A^{-1} and solve for x = (x1, x2)^T. Evaluate the impact on the solution of a negligible change in the vector b. Compute the condition number of the system. Hint: for the condition number let ||A||2 = max_{||x||=1} ||Ax||2 and determine k(A) := ||A||2 · ||A^{-1}||2. This problem aims at clarifying the impact of ill conditioning of the Hessian matrix on Newton's updates.
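The effect can be illustrated on a nearly singular matrix of the same flavour (a stand-in, not the exercise's exact data); numpy.linalg.cond computes k(A) in the spectral norm by default:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1e-10]])
b = np.array([2.0, 2.0])

x = np.linalg.solve(A, b)
x_pert = np.linalg.solve(A, b + np.array([0.0, 1e-8]))  # negligible change in b

print(np.linalg.cond(A))            # ~4e10: severely ill conditioned
print(np.linalg.norm(x_pert - x))   # large jump in the solution
```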
2. Consider again the problem f(x1, x2) = x1²(x2 − 1) and consider directly the case x2 > 1. Determine the minimum of the function by applying directly a rank-1 QN algorithm with H0 = I2 and x(0) = (2, 2)^T. Show that the algorithm won't converge to the minimum.
Solution We now have: f(x(0)) = 4, g(0) = (4, 4)^T and H0 = I2. Then

x(1) = (2, 2)^T − 0.25 · H0 · (4, 4)^T = (1, 1)^T,   f(x(1)) = 0,   g(1) = (0, 1)^T,

Δx(0) = (−1, −1)^T and Δg(0) = (−4, −3)^T, with Δx(0)^T Δg(0) = 7 and Δg(0)^T H0 Δg(0) = 25, leading to the following Hessian update¹:

H1 = H0 + (1/7) [ 1 1 ]  −  (1/25) [ 16 12 ]  =  [ 0.503  −0.337 ]
                [ 1 1 ]            [ 12  9 ]     [ −0.337  0.783 ],

leading to the second iteration:

x(2) = (1, 1)^T − 0.25 · H1 · (0, 1)^T ≈ (1.084, 0.804)^T,   with f(x(2)) ≈ −0.230

and gradient evaluated at this point g(2) ≈ (−0.424, 1.176)^T.
From here the algorithm iterations determine Δx(1), Δg(1), H1Δg(1), Δx(1) − H1Δg(1), with tight control of the Hessian update: however, due to the singularity at the origin, the algorithm will not converge either.
Exercises 16
Least square estimation
Solution
¹ We have H_{k+1} = H_k + (Δx(k) · Δx(k)^T)/(Δx(k)^T Δg(k)) − [H_k Δg(k)][H_k Δg(k)]^T/(Δg(k)^T · H_k · Δg(k)).
Using Excel and its =LINEST() function we obtain that the fitted regres-
sion line is y = 1.5x − 0.667. The estimated error is 0.166.
Figure 8:
2. Consider now a more extended case problem and determine the LS estimator b ∈ R³ associated with the observations (first row the endogenous variable Y, following rows the explanatory variables X1, X2):

Y  = ( 12.5  9.8   11.2  6.9   10.0  13.2  11.7 )
X1 = ( 6.4   7.3   8.4   6.5   7.6   6.9   8.1  )
X2 = ( 13.6  12.0  11.1  13.3  12.2  11.9  12.5 )

Compute the linear approximation error. Explain the rationale of the LSE method. Which function are we minimizing?
Solution
We are minimizing the squared-error function e(a, b, c) = ||Y − aX1 − bX2 − c||². The importance of the LSE method lies in the fact that we look for the values a, b, c such that the sum of squared errors is minimal, which is a way to find a model that fits the data.
After using the Data Analysis tool in Excel we obtain the following estimates: a = −0.127309247, b = −0.667095567 and c = 19.94124422, and the minimum error at this point is 24.9548771. Thus the endogenous variable Y has a negative dependence on the two explanatory variables X1 and X2, with however a relatively high constant term. The method identifies the linear combination of X1 and X2 for which the squared error between the sampled values and the linear fit is minimal. In general LSE can also be applied assuming a polynomial rather than a linear fit; the problem remains linear in the coefficients, and the postulated solution is

( a, b, c )^T = (X^T X)^{-1} X^T Y.
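The estimates can be reproduced with the normal equations; the third column of ones carries the constant term c:

```python
import numpy as np

Y = np.array([12.5, 9.8, 11.2, 6.9, 10.0, 13.2, 11.7])
X1 = np.array([6.4, 7.3, 8.4, 6.5, 7.6, 6.9, 8.1])
X2 = np.array([13.6, 12.0, 11.1, 13.3, 12.2, 11.9, 12.5])

X = np.column_stack([X1, X2, np.ones_like(X1)])
abc = np.linalg.solve(X.T @ X, X.T @ Y)     # (a, b, c) = (X^T X)^{-1} X^T Y
sse = float(np.sum((Y - X @ abc) ** 2))

print(np.round(abc, 6), round(sse, 4))      # ≈ (−0.127309, −0.667096, 19.941244), SSE ≈ 24.9549
```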
Figure 9:
3. Apply the simplex method to solve the LP problem: max_{x≥0} 5x1 + 4x2 + 6x3 s.t. x1 − x2 + x3 ≤ 20, 3x1 + 2x2 + 4x3 ≤ 42, 3x1 + 2x2 ≤ 30.
Solution
1. Determine the dual problem of the primal LP: max_{x≥0} 2x1 + 5x2 + x3 s.t. 2x1 − x2 + 7x3 ≤ 6, x1 + 3x2 + 4x3 ≤ 9, 3x1 + 6x2 + x3 ≤ 3. Verify the duality gap and complementary slackness. Hint: always put the primal in standard form and then derive the dual.
Solution
First we recall the standard form of a general program of the same size, and its dual:

min_x [c^T 0^T] x                                        (2)
subject to [A I] x = b, with x = (x1, · · · , x6) ≥ 0.

(b) The dual problem is

max_λ λ^T b                                              (3)
subject to λ^T [A I] ≤ [c^T 0^T].
The final simplex tableau is

        a1      a2   a3   a4      a5   a6      b
       15/43    0    1    6/43    0    1/43    39/43
       −4/43    0    0   −21/43   1   −25/43  186/43
       19/43    1    0   −1/43    0    7/43    15/43
 r^T   24/43    0    0    1/43    0   36/43

From λ^T D = c_D^T − r_D^T:

( λ1 λ2 λ3 ) [ 2 1 0 ]
             [ 3 1 0 ]  =  ( −2  0  0 ) − ( 24/43  1/43  36/43 ).
             [ 3 0 1 ]

Therefore,

λ^T = ( −1/43 , 0 , −36/43 ).
max  x1 − x2
s.t. x1 + x2 ≤ 3/2                          (4)
     2x1 + x2 ≤ 2
     x1, x2 ≥ 0
Figure 13:
Dual problem, interpretation:

min_λ  (3/2)λ1 + 2λ2
s.t.   λ1 + 2λ2 ≥ 1                         (5)
       λ1 + λ2 ≥ −1
       λ1, λ2 ≥ 0
Figure 14:
The dual optimum is attained at λ* = (0, 1/2)^T with value

min_λ:  (3/2, 2) λ* = (3/2, 2)(0, 1/2)^T = 1    (9)

matching the primal optimum at x* = (1, 0)^T:

c^T x* − λ*^T b = 0    (10)

The Lagrange function is

L(x, λ) = x1 − x2 − λ1 (x1 + x2 − 3/2) − λ2 (2x1 + x2 − 2)
        = (1 − λ1 − 2λ2) x1 + (−1 − λ1 − λ2) x2 + (3/2) λ1 + 2 λ2    (11)
Thus, the optimal max in the primal problem is equal to the min in the
dual problem. We check the complementary slackness condition
• λ*^T (Ax* − b) = 0:

[0, 1/2] ( [1 1; 2 1] (1, 0)^T − (3/2, 2)^T ) = [0, 1/2] (−1/2, 0)^T = 0    (12)

• (c^T − λ*^T A) x* = 0:

( (1, −1) − (0, 1/2) [1 1; 2 1] ) (1, 0)^T = (0, −3/2) (1, 0)^T = 0    (13)
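Conditions (12) and (13) can be checked mechanically; a minimal sketch with exact rationals:

```python
# Complementary slackness check for  max x1 - x2  s.t.
# x1 + x2 <= 3/2, 2x1 + x2 <= 2, x >= 0, at x* = (1, 0), lambda* = (0, 1/2).
from fractions import Fraction as F

A = [[1, 1], [2, 1]]
b = [F(3, 2), F(2)]
c = [F(1), F(-1)]
x = [F(1), F(0)]
lam = [F(0), F(1, 2)]

Ax = [sum(aij * xj for aij, xj in zip(ai, x)) for ai in A]
slack1 = sum(li * (axi - bi) for li, axi, bi in zip(lam, Ax, b))   # eq. (12)
lamA = [sum(li * A[i][j] for i, li in enumerate(lam)) for j in range(2)]
slack2 = sum((cj - lj) * xj for cj, lj, xj in zip(c, lamA, x))     # eq. (13)

obj_primal = sum(ci * xi for ci, xi in zip(c, x))
obj_dual = sum(li * bi for li, bi in zip(lam, b))
```

Both slackness expressions vanish and the two objective values agree at 1, confirming the zero duality gap.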
Exercises 19
Interior point method
Here too, it is best first to analyse the problem and then solve it in Excel.
1. Consider the problem minx≥0 5x1 + 4x2 + 8x3 s.t. x1 + x2 + x3 = 1.
This is in canonical but not restricted form: derive the restricted form
appropriate to apply Karmarkar’s IP method.
Solution
In order to formulate the problem in the so-called restricted form, all you
need to do is translate the cost vector so that the cost function (recall
that its graph is a hyperplane) goes through the origin. In this simple case,
since 1^T x = 1, the minimum cost is reached for x2 = 1, and the translated
problem is minx≥0 (5 − 4)x1 + (4 − 4)x2 + (8 − 4)x3 = x1 + 4x3, which is the
objective function of the following problem.
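Karmarkar's method itself applies a projective transformation at every step; as a simplified illustration of the same idea, the sketch below runs a plain affine-scaling iteration on the restricted problem min x1 + 4x3 s.t. 1^T x = 1, x ≥ 0 (the step length of 0.9 of the way to the boundary is an arbitrary choice here):

```python
# A plain affine-scaling iteration (a simplified relative of Karmarkar's
# projective method) on  min x1 + 4x3  s.t.  x1 + x2 + x3 = 1, x >= 0,
# started from the centre of the simplex.
c = [1.0, 0.0, 4.0]
x = [1 / 3, 1 / 3, 1 / 3]

for _ in range(200):
    dc = [xi * ci for xi, ci in zip(x, c)]          # scaled cost D c
    # Project D c onto the null space of the scaled constraint x^T y = 1.
    xdc = sum(xi * di for xi, di in zip(x, dc))
    nx2 = sum(xi * xi for xi in x)
    dy = [-(di - xi * xdc / nx2) for di, xi in zip(dc, x)]
    neg = [-d for d in dy if d < 0]
    if not neg:
        break                                       # no descent direction left
    alpha = 0.9 / max(neg)                          # stay strictly feasible
    x = [xi * (1 + alpha * di) for xi, di in zip(x, dy)]

cost = x[0] + 4 * x[2]
```

The iterates stay strictly inside the simplex and converge to the vertex (0, 1, 0) with cost 0, in agreement with the reasoning above.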
∂L/∂x1 = −2x1 − 2λ(x1 − 2) = 0
∂L/∂x2 = −2x2 − 2λx2 = 0
∂L/∂λ = (x1 − 2)^2 + x2^2 − 1 = 0
Consider the second equation: as a necessary condition we have x2 = 0 or
λ = −1. The case λ = −1, however, would be inconsistent with ∂L/∂x1 = 0.
Let then x2 = 0:
• From the third equation we derive x1 = 3 or x1 = 1.
• For x1 = 3, from ∂L/∂x1 = 0 we get λ = −3.
• For x1 = 1 we have λ = 1.
We then have two candidates for max and min: x1* = (3, 0)^T and x2* =
(1, 0)^T; the first leads to f(x1*) = −9, the second to f(x2*) = −1.
The former is then the constrained minimum of the function and the latter
the constrained maximum.
∂L/∂x1 = 2x1 − 2λ1 x1 − λ2 = 0
∂L/∂x2 = 4x2 − 2λ1 x2 = 0
∂L/∂x3 = 6x3 − λ2 = 0
∂L/∂λ1 = x1^2 + x2^2 − 25 = 0
∂L/∂λ2 = x1 + x3 − 2 = 0
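The system above can be solved by cases: the second equation forces x2 = 0 or λ1 = 2. A sketch that enumerates and verifies the resulting candidates with exact rationals (the underlying objective function is not reproduced in this excerpt, so only the first-order system itself is checked):

```python
# Case analysis on the FONC system above.
from fractions import Fraction as F

def residuals(x1, x2, x3, l1, l2):
    return [2 * x1 - 2 * l1 * x1 - l2,
            4 * x2 - 2 * l1 * x2,
            6 * x3 - l2,
            x1 ** 2 + x2 ** 2 - 25,
            x1 + x3 - 2]

# Case x2 = 0: x1 = +/-5, x3 = 2 - x1, lambda2 = 6x3, lambda1 from the 1st eq.
# Case lambda1 = 2: lambda2 = -2x1 and x3 = -x1/3, hence x1 = 3, x2 = +/-4.
candidates = [
    (F(5), F(0), F(-3), F(14, 5), F(-18)),
    (F(-5), F(0), F(7), F(26, 5), F(42)),
    (F(3), F(4), F(-1), F(2), F(-6)),
    (F(3), F(-4), F(-1), F(2), F(-6)),
]
ok = all(all(r == 0 for r in residuals(*cand)) for cand in candidates)
```

All four candidate points satisfy the five first-order conditions exactly; comparing objective values among them then identifies the constrained extrema.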
3. Find the max and min of f(x) = x1 · x2 s.t. h(x) = x1 + x2^2 = 1. Solve by
substitution.
Solution In this problem with only two variables we solve by direct
substitution, computing the optimum with respect to one variable only.
Let x1 = 1 − x2^2. Then f(x2) = x2 (1 − x2^2), with first-order condition
df(x2)/dx2 = 1 − 3x2^2 = 0, thus x2 = ±1/√3. Since d^2f(x2)/dx2^2 = −6x2 =
6/√3 > 0 for x2 = −1/√3, that point is a local minimum, while for
x2 = +1/√3 the function is concave, d^2f(x2)/dx2^2 = −6x2 = −6/√3 < 0,
so it is a local maximum. Now we can derive x1 from the constraint, and we
have x1 = 1 − 1/3 = 2/3, with objective values f(x1, x2) = −2/(3√3), a
local minimum, and f(x1, x2) = 2/(3√3), a local maximum.
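A quick numeric confirmation of the substitution argument, checking that x2 = ±1/√3 are indeed a local maximum and minimum of g(x2) = x2 − x2^3:

```python
import math

# On the constraint x1 = 1 - x2^2 the objective becomes g(x2) = x2 - x2^3.
def g(t):
    return t - t ** 3

t_max = 1 / math.sqrt(3)              # candidate local maximum
t_min = -t_max                        # candidate local minimum

eps = 1e-4                            # compare against immediate neighbours
local_max = g(t_max) >= g(t_max - eps) and g(t_max) >= g(t_max + eps)
local_min = g(t_min) <= g(t_min - eps) and g(t_min) <= g(t_min + eps)
val = g(t_max)                        # = 2/(3*sqrt(3))
```

The neighbourhood comparison confirms the second-order classification, and g(1/√3) evaluates to 2/(3√3) ≈ 0.385 as derived above.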
4. Determine the min and max of f(x) = x1^2 + x2^2 s.t. h(x) = x1 + x2 = 1,
first by substitution and then by the Lagrange method. Verify the optimality
results from the lecture notes on the solution of linearly constrained QP.
Solution It can be solved by substitution and then we can double check
by solving the problem using FONC from the Lagrange method (remark:
this being a convex function over a convex set, FONC would do and the
stationary point will be unique).
∂L/∂x1 = 2x1 − λ = 0
∂L/∂x2 = 2x2 − λ = 0
∂L/∂λ = 1 − x1 − x2 = 0

Hence x1 = x2 = λ/2 and x1 + x2 = 1 give x1 = x2 = 1/2, λ = 1, with the
unique constrained minimum f(x*) = 1/2.
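Since these FONC are linear, the stationary point can be found by solving one 3×3 linear system, in line with the linearly constrained QP recipe from the notes; a minimal sketch with exact rationals:

```python
# Solving the linear FONC system  2x1 - l = 0, 2x2 - l = 0, x1 + x2 = 1
# with Gauss-Jordan elimination over the rationals.
from fractions import Fraction as F

M = [[F(2), F(0), F(-1)],             # 2*x1 - lambda = 0
     [F(0), F(2), F(-1)],             # 2*x2 - lambda = 0
     [F(1), F(1), F(0)]]              # x1 + x2 = 1
rhs = [F(0), F(0), F(1)]

aug = [row + [r] for row, r in zip(M, rhs)]
for col in range(3):
    piv = next(r for r in range(col, 3) if aug[r][col] != 0)
    aug[col], aug[piv] = aug[piv], aug[col]
    aug[col] = [v / aug[col][col] for v in aug[col]]
    for r in range(3):
        if r != col:
            fac = aug[r][col]
            aug[r] = [u - fac * v for u, v in zip(aug[r], aug[col])]

x1, x2, lam = (aug[i][3] for i in range(3))
f_min = x1 ** 2 + x2 ** 2
```

This reproduces x1 = x2 = 1/2 with λ = 1 and minimum value 1/2; since f is convex and the constraint is linear, this stationary point is the unique constrained minimum.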
5. Determine the max and min of f(x1, x2) = x1^2 x2 − x2^3 + x1^2 on the
boundaries of a triangle with vertices O = (0, 0), B = (2, 0) and C = (0, 2).
Notice that the feasible region is not the area of the triangle but only the
sides. Determine the max and min by function evaluation through the triangle
edges.
Solution The objective is a rather nasty function, as we can see in the next
figure:
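Independently of the figure, the edge evaluation can be sketched by sampling each side of the triangle on a fine grid:

```python
# Evaluating f(x1, x2) = x1^2*x2 - x2^3 + x1^2 along the three edges of the
# triangle O=(0,0), B=(2,0), C=(0,2) -- the feasible set is the edges only.
def f(x1, x2):
    return x1 ** 2 * x2 - x2 ** 3 + x1 ** 2

n = 2000                              # samples per edge
pts = []
for i in range(n + 1):
    t = 2 * i / n
    pts.append((t, 0.0))              # edge O-B: x2 = 0
    pts.append((0.0, t))              # edge O-C: x1 = 0
    pts.append((t, 2.0 - t))          # edge B-C: x1 + x2 = 2
vals = [f(*p) for p in pts]
fmin, fmax = min(vals), max(vals)
```

The scan locates the minimum −8 at C = (0, 2) and the maximum 4 at B = (2, 0); on edge O–B the objective reduces to x1^2, on O–C to −x2^3, and on B–C to −3x1^2 + 12x1 − 8, which is monotone increasing on [0, 2].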
6. Find the max and min of the function f(x1, x2) = x1 + x2 s.t. h(x1, x2) =
x1^2 + x2^2 = 1. Apply the Lagrange and the level-curves methods.
Solution This is a quadratically constrained problem with linear objective:
both the obj function and the constraints are convex and defined over a
compact set. We use this problem to compare the solutions (need to be
equivalent!) of two methods that in general are alternative to each other.
and the level-curve system

x1 + x2 = c
x1^2 + x2^2 = 1

with c < 0.
8. Find the max and min of f(x) = x1^2 + x2^2 s.t. 4x1^2 + x2^2 = 4, adopting
the Lagrange method.
Solution Again a quadratic program with a convex elliptic feasible region.
We have one constraint, thus resulting in one Lagrange multiplier and two
unknowns. We know already that without constraints this is a trivial
quadratic program with solution at (0, 0) and optimal value 0. The Lagrange
function is L(x, λ) = x1^2 + x2^2 − λ(4x1^2 + x2^2 − 4). We have

∂L/∂x1 = 2x1 − 8λx1 = 0
∂L/∂x2 = 2x2 − 2λx2 = 0
∂L/∂λ = 4 − 4x1^2 − x2^2 = 0
From the last condition ∂L/∂λ = 0, along the axes, we have x2 = ±2 for
x1 = 0, or x1 = ±1 for x2 = 0. From the first and second partial derivatives
we see that x1 = x2 = 0 would satisfy the stationarity conditions, but it
violates the constraint. Alternatively, the Lagrange multiplier should be
either λ = 1/4 or λ = 1. But in the first case the partial derivative of L
with respect to x1 would be 0 while leading to an inconsistency in the second
condition, and vice versa. So we have only the following stationary points:
x1* = (1, 0)^T, x2* = (−1, 0)^T, x3* = (0, 2)^T, x4* = (0, −2)^T, resulting
in the local minima f(x1*) = f(x2*) = 1 and local maxima f(x3*) = f(x4*) = 4.
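The four stationary points and their multipliers can be verified mechanically; a short sketch with exact rationals:

```python
# Verifying the stationary points of L(x, l) = x1^2 + x2^2 - l*(4x1^2 + x2^2 - 4).
from fractions import Fraction as F

def grad_L(x1, x2, l):
    return (2 * x1 - 8 * l * x1,        # dL/dx1
            2 * x2 - 2 * l * x2,        # dL/dx2
            4 - 4 * x1 ** 2 - x2 ** 2)  # constraint residual

points = [((F(1), F(0)), F(1, 4)),
          ((F(-1), F(0)), F(1, 4)),
          ((F(0), F(2)), F(1)),
          ((F(0), F(-2)), F(1))]
ok = all(grad_L(x1, x2, l) == (0, 0, 0) for (x1, x2), l in points)
vals = [x1 ** 2 + x2 ** 2 for (x1, x2), _ in points]
```

All four points satisfy the first-order system exactly, with objective values 1, 1, 4, 4 matching the classification above.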
∂L/∂x1 = 2x2 + x3 − λ = 0
∂L/∂x2 = 2x1 + x3 − 4 − λ = 0
∂L/∂x3 = x1 + x2 − λ = 0
∂L/∂λ = x1 + x2 + x3 = 0
with g(x) = 1 − x1 − x2 .
2. Let f(x) = −7x1^2 − x2^2. Determine max_{x∈R^2} f(x), subject to x1 ≥ 0,
x2 ≥ −1.
Solution
Let g1 (x1 , x2 ) = −x1 , g2 (x1 , x2 ) = −x2 − 1, therefore the feasible region
is given by g1 (x1 , x2 ) ≤ 0, g2 (x1 , x2 ) ≤ 0.
Remark: f is quadratic, negative definite, concave and g1 , g2 are convex
(linear inequalities).
(a) The Lagrange function is

L(x, µ) = −7x1^2 − x2^2 + µ1 x1 + µ2 (x2 + 1) = −7x1^2 − x2^2 + µ1 x1 + µ2 x2 + µ2.

Therefore the five KKT conditions (5 = 2 + 1 + 2: two stationarity
conditions for the objective, one complementary-slackness condition for the
joint constraints, and the two original inequalities), plus the multipliers'
non-negativity, are:
∂L/∂x1 = −14x1 + µ1 = 0
∂L/∂x2 = −2x2 + µ2 = 0
µ1 x1 + µ2 (x2 + 1) = 0
−x1 ≤ 0, −x2 − 1 ≤ 0
µ1 ≥ 0, µ2 ≥ 0
From the first two conditions, µ1 = 14x1 and µ2 = 2x2, with
x1 ≥ 0, x2 ≥ −1, µ1 ≥ 0, µ2 ≥ 0.
The solution x2 = −1 would imply µ2 = −2 < 0, which is not feasible;
therefore µ2 = 0 and x2 = 0. On the other hand, µ1 = 0 if and only if
x1 = 0, and hence µ1 = 0 and x1 = 0. The solution is then at x* = (0, 0)
with max_{x∈R^2} f(x) = f(x*) = 0.
However, we can use a shortcut:
• We have f concave.
• We have convex constraints gj for j = 1, 2.
• We have KKT constraints.
Therefore, the KKT are sufficient for optimality and at x∗ = (0, 0)
we have a maximum with value f (x∗ ) = 0.
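A direct check, sketched below, that x* = (0, 0) with µ* = (0, 0) satisfies the KKT system, plus a coarse grid search confirming that no feasible point does better (the grid bounds 0 ≤ x1 < 5, −1 ≤ x2 < 4 are an arbitrary choice):

```python
# KKT check for  max -7x1^2 - x2^2  s.t.  x1 >= 0, x2 >= -1.
def f(x1, x2):
    return -7 * x1 ** 2 - x2 ** 2

x1 = x2 = mu1 = mu2 = 0.0
stationary = (-14 * x1 + mu1 == 0) and (-2 * x2 + mu2 == 0)
comp_slack = mu1 * x1 + mu2 * (x2 + 1) == 0
feasible = x1 >= 0 and x2 >= -1

# Coarse sweep over the feasible region: f is concave, so KKT points are
# global maximisers and nothing on the grid should exceed f(0, 0) = 0.
grid_best = max(f(0.1 * i, 0.1 * j - 1)
                for i in range(50) for j in range(50))
```

The grid maximum is 0, attained at the origin, consistent with the sufficiency of KKT for this concave problem.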
L(x1 , x2 , µ1 , µ2 ) = x1 x2 + µ1 (2 − x1 − x2 ) + µ2 (x1 − x2 ).
(b) The KKT conditions are:
∂L/∂x1 = x2 − µ1 + µ2 = 0
∂L/∂x2 = x1 − µ1 − µ2 = 0
µ1 (2 − x1 − x2) + µ2 (x1 − x2) = 0
2 − x1 − x2 ≤ 0, x1 − x2 ≤ 0
µ1 ≥ 0, µ2 ≥ 0
(c) From the above FONC, assume µ1 > 0 and µ2 = 0; then x2 = µ1, x1 = x2
and µ1 (2 − x1 − x2) = 0. At the boundary (the point at which the inequality
associated with the multiplier becomes an equality) we have µ1 = 1, and
x1 + x2 = 2 then implies x1* = x2* = 1. You can check that all first-order
KKT conditions are satisfied at this point. The same result would be
generated by assuming µ1 = 0 and µ2 = 1. Notice that the non-negativity
condition on the KKT multipliers, together with the conditions ∂L/∂xj = 0
for j = 1, 2, forces one or the other multiplier to be 0.
At the point x* = (1, 1) we also have ∇g1(x1, x2) = (−1, −1) and
∇g2(x1, x2) = (1, −1), so the gradients of the constraint functions
associated with x* are l.i. and the point is regular. Furthermore, being
regular, we also have that T(x*) includes the origin and the SONC are
satisfied with both constraints active. Recall that we are looking for a
minimum of f(x) = x1 x2. However, if we compute the Hessian of the Lagrange
function, we have:
• At the minimum, the constraint x1^2 + x2^2 ≤ 1 is not binding and we are
in the interior of the feasible region, since (2/3)^2 + (1/3)^2 = 5/9 < 1;
the KKT multiplier must then be 0, while the Lagrange multiplier for the
equality constraint in this case is positive.
• At the maximum, on the other hand, the inequality constraint is binding
and the multiplier µ is positive. Here λ is null since the equality
x1 + x2 = 1 is satisfied at the two boundary points x2* and x1*.