
Foundations of Information Systems Engineering

IEE 605

Midterm Exam – March 14th, 2024

1. One note sheet (one page, double-sided) is allowed. Calculators without internet capability are
allowed.

2. Write your answer on this exam paper. DO NOT use your own paper. If you need more
space, write on the back side of this exam paper.

3. The test is worth 100 points. The assigned points for each question are given beside it.
There are an extra 10 bonus points that you can earn.

4. When time is called, DO NOT enter any more answers on your exam. You must have
a photo ID to take an exam. Please place your student ID card on the desk. Failure to
present a photo ID for an exam results in an automatic 10% deduction. In addition, until
a valid photo ID is presented to the instructor, your exam score is “0”.

5. Cell phones, electronic readers, computers, smart watches, and other internet-capable de-
vices are not allowed. If your cell phone or other internet-capable device is seen out in any
way during an exam, you will be given a zero for the exam and reported for a violation of
ASU’s Academic Integrity Policy.

6. Sign the Honor Code below:


I commit to uphold the ideals of honor and integrity by refusing to betray the trust bestowed
on me as a member of the academic community, and I will not violate the ASU Academic Integrity Policy.
Name:
Student ID:
Signature: Date:

Computational Problems. Please solve the following problems. The necessary computation
steps are required, and writing down the equation you use for each subproblem
will help you earn partial credit on each question. However, please try to
keep your answers concise.
1. Logistic Regression
For a logistic regression problem, the following loss function is often derived, where x_i, θ ∈
R^{p×1} and y_i ∈ {0, 1} is the label:

L(θ) = Σ_i [ −y_i log σ(θ^T x_i) − (1 − y_i) log(1 − σ(θ^T x_i)) ]

Here, σ(x) = 1 / (1 + exp(−x)), and its derivative satisfies dσ(x)/dx = σ(x)(1 − σ(x)).
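For intuition, here is a minimal NumPy sketch of evaluating this loss; the matrix X (stacking the x_i as rows) and the function names are our own notation, not part of the problem:

    import numpy as np

    def sigmoid(z):
        """Elementwise logistic function sigma(z) = 1 / (1 + exp(-z))."""
        return 1.0 / (1.0 + np.exp(-z))

    def logistic_loss(theta, X, y):
        """L(theta) = sum_i -y_i log(p_i) - (1 - y_i) log(1 - p_i), with p_i = sigma(theta^T x_i)."""
        p = sigmoid(X @ theta)  # X is (n, p), theta is (p,), so p is (n,)
        return float(-np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))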

(a) (10 points) Please first derive the gradient of L(θ).


(b) (10 points) Please compute the Hessian matrix and answer: is the problem convex?
(c) (7 points) Please answer: is the problem strongly convex? If not, what method would you
use to make it strongly convex?
(d) (7 points) What is the computational complexity of the gradient method?
(e) Suppose that we would like to add an L1 penalty to the logistic regression:

L(θ) = Σ_i [ −y_i log σ(θ^T x_i) − (1 − y_i) log(1 − σ(θ^T x_i)) ] + λ∥θ∥_1

Please write down the proximal gradient algorithm to solve this problem.
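For part (e), a minimal proximal gradient (ISTA) sketch; it assumes the standard gradient X^T(σ(Xθ) − y) of the smooth part, uses soft-thresholding as the prox of the L1 term, and makes arbitrary choices for the step size and iteration count:

    import numpy as np

    def soft_threshold(v, tau):
        """Prox of tau * ||.||_1: shrink each coordinate toward zero by tau."""
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def proximal_gradient_logistic(X, y, lam, step=0.01, iters=1000):
        """ISTA for L1-penalized logistic regression."""
        theta = np.zeros(X.shape[1])
        for _ in range(iters):
            grad = X.T @ (1.0 / (1.0 + np.exp(-(X @ theta))) - y)   # gradient of the smooth part
            theta = soft_threshold(theta - step * grad, step * lam)  # prox step on the L1 term
        return theta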

2. Quadratic Optimization
Quadratic optimization problems are an important class of optimization problems that
appear both in practice and at various stages of some nonlinear optimization algorithms.
In this problem, we will compare different approaches to solving quadratic programming
problems of the form:

x* = argmin_x ( (1/2) x^T Q x + b^T x + c )        (1)

Here, the optimization variable x ∈ R^p, and the matrix Q ∈ R^{p×p}, vector b ∈ R^{p×1}, and scalar
c are assumed to be given.
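For reference, the objective and its gradient in code (a sketch; the gradient ∇f(x) = Qx + b is a standard fact, and the commented closed-form line assumes Q is positive definite):

    import numpy as np

    def quad_objective(x, Q, b, c):
        """f(x) = 0.5 * x^T Q x + b^T x + c."""
        return 0.5 * x @ Q @ x + b @ x + c

    def quad_gradient(x, Q, b):
        """The standard gradient of f: Q x + b."""
        return Q @ x + b

    # If Q is positive definite, the first-order condition Q x* + b = 0 gives:
    # x_star = np.linalg.solve(Q, -b)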

(a) (6 pts) What is the condition under which the problem is strongly convex? Under this con-
dition, how many global optima do we have? For all the following questions, we
assume that this condition holds.
(b) (6 pts) Please derive a closed-form solution for Problem (1). What are the space
complexity and time complexity of the closed-form analytical solution?
(c) (10 pts) Please derive the gradient descent update for Problem (1). Please perform one iteration
of the gradient descent algorithm from an arbitrary initial point x_0 ∈ R^p. What are the
space complexity and time complexity of the gradient descent algorithm?
(d) (7 pts) For a simple linear regression problem with three data points x_1 = 1, x_2 = 2,
x_3 = 4 and y_1 = 0, y_2 = 1, y_3 = 2, the goal is to minimize the least-squares loss
Σ_{i=1}^{3} (y_i − x_i θ_1 − θ_0)^2 with respect to θ_1, θ_0. Please write down the equivalent Q, b, c
in Problem (1).

(e) (7 pts) When applying gradient descent, you plot the loss function vs. epoch (# of
iterations). The curves (labeled #1, 2, 3, 4, 5) in the figure below are plotted
using different but constant learning rates (step sizes). Please rank the learning rates
from smallest to largest (in terms of the value of the learning rate) and
answer which learning rate you would choose.

(f) (Bonus 5 pts) Suppose that the gradient descent solution in the k-th iteration is
given by x^{(k)}. Please derive the range of the step size such that the gradient descent
algorithm will eventually converge to the global optimum.
(g) (Bonus 7 pts) Suppose that you would like to find the best step size α for the gradient
descent algorithm in the k-th iteration. Please derive the optimal step size α_k for
solving Problem (1) in the k-th iteration.
Hint: You can write down the objective value after a gradient descent step with step size α and
solve α_k = argmin_α f(x^{(k)} − α ∇f(x^{(k)})).
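One way to expand the hint in part (g), writing g^{(k)} = ∇f(x^{(k)}) = Q x^{(k)} + b (a sketch of the exact line search for this quadratic objective; verify each step yourself):

φ(α) = f(x^{(k)} − α g^{(k)}),
φ′(α) = −(g^{(k)})^T ∇f(x^{(k)} − α g^{(k)}) = −(g^{(k)})^T (g^{(k)} − α Q g^{(k)}),

so setting φ′(α_k) = 0 gives α_k = ((g^{(k)})^T g^{(k)}) / ((g^{(k)})^T Q g^{(k)}).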

3. Suppose we want to find x* ∈ C_1 ∩ · · · ∩ C_m, i.e., find a point in the intersection of the
closed, convex sets C_1, ..., C_m. First define f_i(x) = dist(x, C_i), i = 1, ..., m, and
f(x) = max_{i=1,...,m} f_i(x). Then the problem is to solve min_x f(x). We will guide you
through the steps:

(a) Recall the distance function dist(x, C) = min_{y∈C} ∥y − x∥_2. Please compute the gra-
dient. Assume that P_C(x) denotes the projection of x onto C.
(b) Recall that f(x) = max_{i=1,...,m} f_i(x) = max_{i=1,...,m} dist(x, C_i). Please compute the subdifferential ∂f(x).
(c) Please derive the subgradient method to solve this problem.
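A minimal sketch of the resulting subgradient method for part (c), assuming box-shaped sets (so each projection is a clip) and the standard diminishing step size 1/k; all names are our own:

    import numpy as np

    def subgradient_intersection(projections, x0, iters=500):
        """Minimize f(x) = max_i dist(x, C_i); `projections` is a list of maps x -> P_{C_i}(x)."""
        x = np.asarray(x0, dtype=float)
        for k in range(1, iters + 1):
            dists = [np.linalg.norm(x - P(x)) for P in projections]
            i = int(np.argmax(dists))                # the farthest set attains the max
            if dists[i] < 1e-12:                     # x is (numerically) in the intersection
                break
            g = (x - projections[i](x)) / dists[i]   # a subgradient of f at x
            x = x - (1.0 / k) * g                    # diminishing step size 1/k
        return x

    # Example: two overlapping boxes in R^2.
    boxes = [lambda x: np.clip(x, [0, 0], [2, 2]), lambda x: np.clip(x, [1, 1], [3, 3])]
    print(subgradient_intersection(boxes, np.array([5.0, -3.0])))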

4. (Coordinate Descent Method) Box-constrained QP: given b ∈ R^n and Q ∈ S^n_+, consider the
box-constrained QP:

min_x (1/2) x^T Q x + b^T x    subject to l ≤ x ≤ u.

(a) Please show that minimizing over x_i, with all x_j, j ≠ i fixed, gives

x_i = T_{[l_i, u_i]}( (b_i − Σ_{j≠i} Q_{ij} x_j) / Q_{ii} ),

where T_{[l_i, u_i]} is the truncation (projection) operator onto [l_i, u_i]:

T_{[l_i, u_i]}(z) = u_i if z > u_i;  z if l_i ≤ z ≤ u_i;  l_i if z < l_i.

(b) Please derive the coordinate descent algorithm for solving this box-constrained QP.
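A minimal sketch of part (b) using the update stated in part (a); the cyclic coordinate order, iteration count, and starting point are our own choices, and each Q_ii is assumed positive:

    import numpy as np

    def coordinate_descent_box_qp(Q, b, l, u, iters=100):
        """Cyclic coordinate descent for min 0.5 x^T Q x + b^T x subject to l <= x <= u."""
        n = len(b)
        x = np.clip(np.zeros(n), l, u)                   # feasible starting point
        for _ in range(iters):
            for i in range(n):
                r = b[i] - (Q[i] @ x - Q[i, i] * x[i])   # b_i - sum_{j != i} Q_ij x_j
                x[i] = np.clip(r / Q[i, i], l[i], u[i])  # truncation T_{[l_i, u_i]}
        return x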

5. The proximal operator (proximal map) is defined as

prox_{h,t}(x) = argmin_z (1/2) ∥z − x∥_2^2 + t·h(z),

which we can think of as minimizing h(z) while respecting a quadratic penalty on the
distance of z from x. Derive the proximal map for each of the following functions:

(a) L2 norm: h(z) = ∥z∥_2
(b) Elastic net penalty: h(z) = α∥z∥_1 + (1 − α)∥z∥_2^2, 0 < α < 1
(c) h(z) = ∥z∥_q^q = Σ_{i=1}^{p} |z_i|^q, 0 < q < 1
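As a reference point (not one of the parts above), the analogous computation for the L1 norm h(z) = ∥z∥_1 yields the well-known soft-thresholding operator, which can serve as a template for parts (a)–(c):

prox_{∥·∥_1, t}(x)_i = sign(x_i) · max(|x_i| − t, 0),   i = 1, ..., p.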

6. The proximal minimization algorithm is:

x^{(k+1)} = prox_{h,t}(x^{(k)})

Write out the proximal minimization algorithm applied to h(x) = (1/2) x^T A x − b^T x, where
A ∈ S^n, and show that this is equivalent to the iterative refinement algorithm:

x^{(k+1)} = x^{(k)} + (A + ϵI)^{−1} (b − A x^{(k)}),

where ϵ > 0 is a constant. Please comment on why this algorithm is helpful for solving
problems with a rank-deficient A.
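A quick numerical sanity check of the claimed equivalence (a sketch; it assumes the relation ϵ = 1/t, which your derivation should produce):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((4, 4)); A = A + A.T      # symmetric test matrix
    b = rng.standard_normal(4)
    x = rng.standard_normal(4)
    eps = 0.5
    t = 1.0 / eps                                     # assumed relation between eps and t

    # Proximal step: argmin_z 0.5||z - x||^2 + t(0.5 z^T A z - b^T z)  =>  (I + tA) z = x + t b
    prox_step = np.linalg.solve(np.eye(4) + t * A, x + t * b)

    # Iterative refinement step from the problem statement
    refine_step = x + np.linalg.solve(A + eps * np.eye(4), b - A @ x)

    print(np.allclose(prox_step, refine_step))        # expected: True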

——— End of Examination ———
