
CSE 597 Spring 2019

Exercise 1
Due Sunday 11:59 PM, February 3rd

Instructions:

• There are four problems in this exercise.

• Please mention if you are auditing the course.

• Using this LaTeX template will be helpful for grading purposes.

• Please write down every mathematical fact you use in a derivation (even if it looks obvious to you).

Problem 1 (25 points). Consider the squared p-norm of a vector x ∈ ℝ^d, defined as

f(x) = ‖x‖_p^2 = ( Σ_{i=1}^d |x_i|^p )^{2/p}.

Prove that f(x) is (p − 1)-smooth for p ∈ [2, ∞].
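As a numerical sanity check (not part of the required proof), the closed-form gradient ∇f(x)_i = 2‖x‖_p^{2−p} sign(x_i)|x_i|^{p−1}, which a smoothness argument typically works with, can be compared against finite differences before being used in the derivation. A minimal sketch, assuming NumPy is available:

```python
import numpy as np

def f(x, p):
    # Squared p-norm: f(x) = (sum_i |x_i|^p)^(2/p)
    return np.sum(np.abs(x) ** p) ** (2.0 / p)

def grad_f(x, p):
    # Candidate closed form: 2 * ||x||_p^(2-p) * sign(x_i) * |x_i|^(p-1)
    norm_p = np.sum(np.abs(x) ** p) ** (1.0 / p)
    return 2.0 * norm_p ** (2.0 - p) * np.sign(x) * np.abs(x) ** (p - 1.0)

def fd_grad(x, p, eps=1e-6):
    # Central finite differences, coordinate by coordinate
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e, p) - f(x - e, p)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
for p in (2.0, 3.0, 6.0):
    # The analytic and numerical gradients should agree away from x = 0
    assert np.allclose(grad_f(x, p), fd_grad(x, p), atol=1e-5)
```

For p = 2 this reduces to f(x) = ‖x‖_2^2 with ∇f(x) = 2x, a quick consistency check on the formula.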

Solution. 

Problem 2 (25 points). Assuming the data matrix is X ∈ ℝ^{n×d}, the standard Lasso problem is given by:

min_{w ∈ ℝ^d} ‖Xw − y‖_2^2 + λ‖w‖_1

Show that the dual problem is:

min_{α ∈ ℝ^n} (1/2)‖y‖_2^2 − (λ^2/2) ‖α − y/λ‖_2^2
subject to |x_i^T α| ≤ 1, i = 1, 2, . . . , d

where x_i, i = 1, 2, . . . , d, are the feature vectors (columns of the data matrix X).

Solution. 

Problem 3 (25 points). The convex envelope of a function f : C → ℝ is defined as the largest (point-wise) convex function g such that g(x) ≤ f(x) for all x ∈ C, meaning that among all convex functions, g is the one that is closest to f (e.g., the ℓ_1 norm is the convex envelope of the ℓ_0 norm).
To obtain the convex envelope of a non-convex function, we can rely on a basic result in convex analysis which states that for a non-convex function f, the biconjugate f** (the conjugate of the conjugate) is the convex envelope of f. Using this fact, show that the convex envelope of the function f(X) = rank(X) on the set

C = { X ∈ ℝ^{n×d} | ‖X‖_2 ≤ 1 }

is the function

g(X) = ‖X‖_* = Σ_{i=1}^{min(n,d)} σ_i(X)

An immediate implication of this result is that the trace norm of a matrix is the tightest convex
relaxation of the rank.
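One direction of the envelope property, g(X) ≤ f(X) on C, is easy to check numerically: when ‖X‖_2 ≤ 1, every singular value satisfies σ_i ≤ 1, so their sum is at most the number of nonzero singular values, i.e., ‖X‖_* ≤ rank(X). A small NumPy sketch illustrating this (an illustration, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(1)
for _ in range(100):
    X = rng.standard_normal((6, 4))
    X /= np.linalg.norm(X, 2)           # rescale so the spectral norm is 1
    s = np.linalg.svd(X, compute_uv=False)
    nuclear = s.sum()                   # g(X) = ||X||_*
    rank = np.linalg.matrix_rank(X)     # f(X) = rank(X)
    assert np.linalg.norm(X, 2) <= 1 + 1e-12   # X lies in C
    assert nuclear <= rank + 1e-9       # the envelope lower-bounds the rank on C
```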
Solution. 

Problem 4 (25 points). Consider the following convex optimization problem over matrices:

min_{X ∈ C} F(X) = f(X) + λ‖X‖_*

where C = { X ∈ ℝ^{n×d} | ‖X‖_F ≤ M }, f(X) is any convex function (not necessarily differentiable), λ > 0 is a regularization parameter, and ‖X‖_* denotes the nuclear (trace) norm of the matrix X, which is the sum, or equivalently the ℓ_1 norm, of the singular values of X.

(a) Show that the projection onto the set C is:

Π_C(X) = min( 1, M/‖X‖_F ) X
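The projection in part (a) rescales X only when it lies outside the Frobenius ball. A minimal NumPy sketch of this operation, useful for experimenting with part (c) and assuming the formula stated in (a):

```python
import numpy as np

def project_frobenius_ball(X, M):
    # Pi_C(X) = min(1, M / ||X||_F) * X: shrink only if X is outside the ball
    nrm = np.linalg.norm(X, 'fro')
    if nrm <= M:
        return X.copy()
    return (M / nrm) * X

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 4))
P = project_frobenius_ball(X, 1.0)
assert np.linalg.norm(P, 'fro') <= 1.0 + 1e-12     # result is feasible
Y = 0.1 * X / np.linalg.norm(X, 'fro')
assert np.array_equal(project_frobenius_ball(Y, 1.0), Y)  # interior points unchanged
```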

(b) What is the subdifferential set ∂F(X) of the objective function? You might need to read [AW].

(c) Consider the projected subgradient descent algorithm for solving the above optimization problem, which iteratively updates the initial solution X^0 = 0 by

X^{t+1} = Π_C( X^t − η_t G^t ),

where G^t ∈ ∂F(X^t). Show that the convergence rate after T iterations is:

(1/T) Σ_{t=1}^T E[F(X^t)] ≤ F(X^*) + ( ‖X^*‖_F^2 + G^2 Σ_{t=1}^T η_t^2 ) / ( 2 Σ_{t=1}^T η_t )

• Determine an optimal value for the learning rate η_t and simplify the convergence rate.
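For experimentation, the iteration in (c) can be sketched for one concrete choice of objective. The problem allows any convex f; here f(X) = ½‖X − Y‖_F^2, along with Y, λ, M, and η_t = η_0/√t, are illustrative choices, and U V^T from the thin SVD of X is used as a subgradient of ‖X‖_*:

```python
import numpy as np

def nuclear_subgrad(X):
    # U V^T is a valid subgradient of the nuclear norm at X (thin SVD)
    U, _, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ Vt

def project(X, M):
    # Projection onto the Frobenius ball of radius M (part (a))
    nrm = np.linalg.norm(X, 'fro')
    return X if nrm <= M else (M / nrm) * X

rng = np.random.default_rng(3)
Y = rng.standard_normal((5, 4))
lam, M, T, eta0 = 0.1, 10.0, 200, 0.1

def F(X):
    # F(X) = f(X) + lam * ||X||_*, with f(X) = 0.5 * ||X - Y||_F^2
    return 0.5 * np.linalg.norm(X - Y, 'fro') ** 2 \
        + lam * np.linalg.svd(X, compute_uv=False).sum()

X = np.zeros_like(Y)                        # X^0 = 0
best = F(X)
for t in range(1, T + 1):
    G = (X - Y) + lam * nuclear_subgrad(X)  # G^t in the subdifferential of F at X^t
    X = project(X - (eta0 / np.sqrt(t)) * G, M)
    best = min(best, F(X))                  # subgradient methods track the best iterate
    assert np.linalg.norm(X, 'fro') <= M + 1e-9  # projection keeps every iterate in C
```

Note that a subgradient step is not necessarily a descent direction, which is why the best iterate (or the average, as in the stated rate) is tracked rather than the last one.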

Solution. 

References
[AW] G. Alistair Watson, Characterization of the subdifferential of some matrix norms, Linear Algebra and its Applications, 1992.
