
Short Notes on Low Rank Recovery

Justin Romberg

Georgia Tech, School of ECE

Tsinghua University
October 18, 2013
Beijing, China
Matrix completion

The following problem has many applications. We are interested in a matrix ...

X = [ X1,1  X1,2  X1,3  X1,4  X1,5 ]
    [ X2,1  X2,2  X2,3  X2,4  X2,5 ]
    [ X3,1  X3,2  X3,3  X3,4  X3,5 ]
    [ X4,1  X4,2  X4,3  X4,4  X4,5 ]
    [ X5,1  X5,2  X5,3  X5,4  X5,5 ]
Matrix completion

The following problem has many applications. We are interested in a matrix ...

X = [ X1,1   -    X1,3   -    X1,5 ]
    [  -    X2,2   -    X2,4   -   ]
    [  -    X3,2  X3,3   -     -   ]
    [ X4,1   -     -    X4,4  X4,5 ]
    [  -     -     -    X5,4  X5,5 ]
... but we only get to observe some of its entries.

Is it possible to “fill in the blanks”?


Yes, if X is low rank and its singular vectors are incoherent
Low rank matrices

(figure: a matrix X whose rows index movies and columns index users)

• How do you fill in the missing data?

   X     =     L   ·   R*
(k × n)     (k × r) (r × n)

kn entries  vs.  r(k + n) entries

(slide courtesy of Benjamin Recht)
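To make the parameter count concrete, here is a minimal numpy sketch (the sizes k, n, r below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, r = 500, 400, 5

# A rank-r matrix stored through its factors L (k x r) and R (n x r): X = L R^T.
L_fac = rng.standard_normal((k, r))
R_fac = rng.standard_normal((n, r))
X = L_fac @ R_fac.T

print("entries in X:       ", k * n)          # kn numbers
print("entries in L and R: ", r * (k + n))    # r(k + n) numbers, far fewer when r is small
print("rank of X:          ", np.linalg.matrix_rank(X))
```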


Applications of matrix completion

• Recommender systems: rank of the data matrix
• Euclidean embedding: rank of the Gram matrix

(slide courtesy of Benjamin Recht)


Low rank matrix recovery

 
X = [ X1,1   -    X1,3   -    X1,5 ]
    [  -    X2,2   -    X2,4   -   ]
    [  -    X3,2  X3,3   -     -   ]
    [ X4,1   -     -    X4,4  X4,5 ]
    [  -     -     -    X5,4  X5,5 ]

How do we fill in the missing data?

min_Z  rank(Z)   subject to   Z(i, j) = X(i, j)  ∀ (i, j) ∈ Ω

where Ω = locations where we have observed the data


This program is intractable
Low rank matrix recovery

 
X = [ X1,1   -    X1,3   -    X1,5 ]
    [  -    X2,2   -    X2,4   -   ]
    [  -    X3,2  X3,3   -     -   ]
    [ X4,1   -     -    X4,4  X4,5 ]
    [  -     -     -    X5,4  X5,5 ]

How do we fill in the missing data?

min_Z  ‖Z‖_*   subject to   Z(i, j) = X(i, j)  ∀ (i, j) ∈ Ω

where ‖Z‖_* = “nuclear norm” = sum of singular values of Z


Low rank matrix recovery

 
X = [ X1,1   -    X1,3   -    X1,5 ]
    [  -    X2,2   -    X2,4   -   ]
    [  -    X3,2  X3,3   -     -   ]
    [ X4,1   -     -    X4,4  X4,5 ]
    [  -     -     -    X5,4  X5,5 ]

How do we fill in the missing data?

min_Z  ‖Z‖_*   subject to   Z(i, j) = X(i, j)  ∀ (i, j) ∈ Ω

where ‖Z‖_* = “nuclear norm” = sum of singular values of Z


The “nuclear norm heuristic” for low rank recovery was developed by
Maryam Fazel in her 2002 PhD Thesis (Stanford)
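As one illustration of this heuristic, here is a minimal sketch using the cvxpy modeling package (this assumes cvxpy and a conic solver that handles the nuclear norm are installed; the test matrix, its rank, and the sampling mask are arbitrary):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
K, N, R = 20, 20, 2
X0 = rng.standard_normal((K, R)) @ rng.standard_normal((R, N))   # low-rank ground truth
mask = (rng.random((K, N)) < 0.5).astype(float)                  # 1 on Omega, 0 elsewhere

# Nuclear-norm heuristic: min ||Z||_* subject to Z agreeing with X0 on the observed entries.
Z = cp.Variable((K, N))
problem = cp.Problem(cp.Minimize(cp.normNuc(Z)),
                     [cp.multiply(mask, Z) == mask * X0])
problem.solve()

err = np.linalg.norm(Z.value - X0, "fro") / np.linalg.norm(X0, "fro")
print("relative recovery error:", err)
```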
Matrix completion theory

This particular result is due to Recht ’09, but there are related results from
Candès, Tao, Keshavan, Montanari, Oh, Plan, and others ...
Suppose that a K × N matrix X0 = UΣV* is rank R with coherence

µ = max{ (K/R) · max_i ‖U*e_i‖_2² ,  (N/R) · max_i ‖V*e_i‖_2² ,  (KN/R) · ‖UV*‖_∞² }

Then we can “complete” X0 with high probability from randomly chosen samples when

#samples ≥ Const · µ · R(K + N) log²(N)
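For concreteness, a minimal numpy sketch of how this coherence can be computed from an SVD (the rank-R test matrix and its dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, R = 30, 40, 3
X0 = rng.standard_normal((K, R)) @ rng.standard_normal((R, N))   # rank-R test matrix

U, s, Vt = np.linalg.svd(X0, full_matrices=False)
U, V = U[:, :R], Vt[:R, :].T            # K x R and N x R singular vector matrices

mu = max(
    (K / R) * np.max(np.sum(U**2, axis=1)),      # (K/R) max_i ||U* e_i||_2^2
    (N / R) * np.max(np.sum(V**2, axis=1)),      # (N/R) max_i ||V* e_i||_2^2
    (K * N / R) * np.max(np.abs(U @ V.T)) ** 2,  # (KN/R) ||U V*||_inf^2
)
print("coherence mu =", mu)
```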


Why do we need incoherence?
Which matrices?
These matrices are low rank, but have high coherence:

• A matrix supported on a single entry: any subset of samples that misses that component reveals nothing!
• A matrix whose nonzero entries all sit in its first r rows (or columns): we still need to observe them in their entirety.
• Ideally, each observed entry should provide nearly the same amount of information.

(courtesy of Benjamin Recht)
General matrix recovery
We can also consider the more general problem of recovering a matrix
from a set of linear measurements.
A single linear measurement of a K × N matrix X0 can be written as the
trace inner product between X0 and another K × N matrix Am :

y_m = ⟨X0, A_m⟩_F = trace(A_m* X0)

This is the same as pointwise multiplying the entries of Am and X0 and


then adding them up.

General matrix recovery

We can also consider the more general problem of recovering a matrix


from a set of linear measurements.
A single linear measurement of a K × N matrix X0 can be written as the
trace inner product between X0 and another K × N matrix Am :

y_m = ⟨X0, A_m⟩_F = trace(A_m* X0)

This is the same as pointwise multiplying the entries of Am and X0 and


then adding them up.

We can collect M measurements together as

y = A(X0 )

where y ∈ ℝ^M and A takes K × N matrices to ℝ^M.
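A minimal numpy sketch of such a measurement operator; the Gaussian measurement matrices A_m here are just placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
K, N, M = 10, 12, 5
X0 = rng.standard_normal((K, N))
A = rng.standard_normal((M, K, N))      # M measurement matrices A_m

# y_m = <X0, A_m>_F: pointwise multiply the entries and add them up.
y = np.array([np.sum(A[m] * X0) for m in range(M)])

# Equivalent trace form from the slide: y_m = trace(A_m^T X0).
y_trace = np.array([np.trace(A[m].T @ X0) for m in range(M)])
assert np.allclose(y, y_trace)
```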


Geometrical structure in ℝ^{NK}

(figure: 2 × 2 symmetric matrices plotted in 3D — the set x² + z² + 2y² = 1, the rank-1 matrices, and their convex hull)

(courtesy of Benjamin Recht)


Low rank matrix recovery

Work by Recht, Fazel, Parrilo, and Mohan has shown that if A obeys the matrix RIP: for a δ < 1,

(1 − δ)‖X‖_F² ≤ ‖A(X)‖_2² ≤ (1 + δ)‖X‖_F²   for all X with rank(X) ≤ 2R,

then we can recover a rank-R matrix X0 from linear measurements y = A(X0) by solving

min_X  ‖X‖_*   subject to   A(X) = y.
Low rank matrix recovery

Work by Recht, Fazel, Parrilo, and Mohan has shown that if A obeys the matrix RIP: for a δ < 1,

(1 − δ)‖X‖_F² ≤ ‖A(X)‖_2² ≤ (1 + δ)‖X‖_F²   for all X with rank(X) ≤ 2R,

then we can recover a rank-R matrix X0 from linear measurements y = A(X0) by solving

min_X  ‖X‖_*   subject to   A(X) = y.

The recovery is stable in the presence of noise and is robust for matrices
that are approximately low rank.
Random linear measurements

If A is a random linear projection (iid Gaussian, for example) then we can show, for a fixed K × N matrix X,

P( | ‖A(X)‖_2² − ‖X‖_F² | > δ‖X‖_F² ) ≤ C e^{−c(δ)M}

Just as in the “sparse RIP” case, you can couple this with standard bounds on the entropy of the space of low rank matrices to see that δ_{2R} < 1 when

M ≳ R(K + N)

Recht, Fazel, and Parrilo had an early result of this nature in 2007; it was
later refined by Candès and Plan in 2010.
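A quick numerical illustration of this concentration, under the assumption that A is built from iid N(0, 1/M) entries acting on the vectorized matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
K, N, M = 8, 8, 2000
X = rng.standard_normal((K, N))

# A(X): apply an M x KN matrix with iid N(0, 1/M) entries to vec(X).
G = rng.standard_normal((M, K * N)) / np.sqrt(M)
AX = G @ X.ravel()

# ||A(X)||_2^2 concentrates around ||X||_F^2 as M grows.
print(np.linalg.norm(AX) ** 2, np.linalg.norm(X, "fro") ** 2)
```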
Bilinear equations

Bilinear equations contain unknown terms multiplied by one another


u1 v1 + 5u1 v2 + 7u2 v3 = −12
u3 v1 − 9u2 v2 + 4u3 v2 = 2
u1 v2 − 6u1 v3 − u3 v3 = 7

Their nonlinearity makes them trickier to solve, and the computational framework is nowhere near as strong as it is for linear equations.
Bilinear equations

Simple (but only recently appreciated) observation:


Systems of bilinear equations, e. g.
u1 v1 + 5u1 v2 + 7u2 v3 = −12
u3 v1 − 9u2 v2 + 4u3 v2 = 2
can be recast as a linear system of equations on a matrix that has rank 1:

uv^T = [ u1v1  u1v2  u1v3  ···  u1vN ]
       [ u2v1  u2v2  u2v3  ···  u2vN ]
       [ u3v1  u3v2  u3v3  ···  u3vN ]
       [  ⋮     ⋮     ⋮          ⋮   ]
       [ uKv1  uKv2  uKv3  ···  uKvN ]
Bilinear equations
Simple (but only recently appreciated) observation:
Systems of bilinear equations, e. g.
u1 v1 + 5u1 v2 + 7u2 v3 = −12
u3 v1 − 9u2 v2 + 4u3 v2 = 2
can be recast as a linear system of equations on a matrix that has rank 1:

uv^T = [ u1v1  u1v2  u1v3  ···  u1vN ]
       [ u2v1  u2v2  u2v3  ···  u2vN ]
       [ u3v1  u3v2  u3v3  ···  u3vN ]
       [  ⋮     ⋮     ⋮          ⋮   ]
       [ uKv1  uKv2  uKv3  ···  uKvN ]

Compressive (low rank) recovery ⇒


“Generic” quadratic systems with cN equations and N unknowns can be
solved using nuclear norm minimization
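A small numpy check of this lifting for the first equation of the example system; the vectors u and v are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.standard_normal(3)
v = rng.standard_normal(3)

# Bilinear expression from the first equation: u1*v1 + 5*u1*v2 + 7*u2*v3.
bilinear = u[0] * v[0] + 5 * u[0] * v[1] + 7 * u[1] * v[2]

# The same quantity as a linear functional <A, u v^T>_F of the rank-1 matrix u v^T.
A = np.zeros((3, 3))
A[0, 0], A[0, 1], A[1, 2] = 1.0, 5.0, 7.0
lifted = np.sum(A * np.outer(u, v))

assert np.isclose(bilinear, lifted)
```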
Recasting quadratic equations

vv^T = [ v1²    v1v2   v1v3   ···  v1vN ]
       [ v2v1   v2²    v2v3   ···  v2vN ]
       [ v3v1   v3v2   v3²    ···  v3vN ]
       [  ⋮      ⋮      ⋮           ⋮   ]
       [ vNv1   vNv2   vNv3   ···  vN²  ]

2v1² + 5v3v1 + 7v2v3 = · · ·

v2v1 + 9v2² + 4v3v2 = · · ·

A quadratic system of equations can be recast as a linear system of


equations on a symmetric matrix that has rank 1.
From quadratic equations to linear equations

Relaxing quadratic equality constraints using SDPs is widespread in optimization:

• MAXCUT
• Stability analysis
• Filter and antenna array design

Recently, Candès, Strohmer, and Voroninski have looked at a stylized version of phase retrieval:

observe  y_ℓ = |⟨a_ℓ, x⟩|² ,   ℓ = 1, . . . , L

and shown that x ∈ ℝ^N can be recovered when

L ∼ Const · N

for random a_ℓ.
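The same lifting in the phase-retrieval setting, as a minimal numpy check (real vectors for simplicity):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 6
x = rng.standard_normal(N)              # unknown signal
a = rng.standard_normal(N)              # one measurement vector

# A phaseless measurement |<a, x>|^2 ...
y = np.abs(a @ x) ** 2

# ... equals a linear functional <a a^T, x x^T>_F of the rank-1 lifted matrix x x^T.
y_lifted = np.sum(np.outer(a, a) * np.outer(x, x))

assert np.isclose(y, y_lifted)
```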
A stylized communications problem:
unknown multipath channel
(figure: multipath propagation; images from B. Davis, UIUC, and ICTE, Aachen)

Channel coding: multipath

(block diagram: a message m is encoded as p = Cm, passes through convolution with an unknown channel H to give y = HCm, and the decoder must discover both the unknown channel h̃ and the message m̃)
Channel coding: multipath

We observe a linear combination of shifts of the coded message:

y = HCm
= h(0)(Cm)↓0 + h(1)(Cm)↓1 + · · · + h(K − 1)(Cm)↓K−1

Bad:
With H unknown, these are nonlinear measurements

Good:
There are two sources of structure we can exploit
• the channel h is “short”
• the coded message Cm lives in a known subspace
Channel coding: multipath

Rearrange as multiple convolutions against the columns of C:

y = m(1) toep(C1) h + m(2) toep(C2) h + · · · + m(N) toep(CN) h

  = [ G1  G2  · · ·  GN ] [ m(1)h ]
                          [ m(2)h ]
                          [   ⋮   ]
                          [ m(N)h ]

where Gn contains the shifts of column n.
Channel coding: multipath

y = [ G1  G2  · · ·  GN ] [ m(1)h ]
                          [ m(2)h ]
                          [   ⋮   ]
                          [ m(N)h ]

where Gn contains the shifts of column n.

We have linear observations of a rank-1 matrix:

y = A(hm^T)

or, entry by entry,

ŷ(ℓ) = ⟨m, d_ℓ⟩ ⟨f_ℓ, h⟩

where
ŷ(ℓ) = Fourier coefficient of y,
d_ℓ = Fourier transform of a row of C,
f_ℓ = Fourier vector.
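A minimal numpy check of this factored structure; the code length, channel length, message length, and coding matrix below are arbitrary, and the Fourier/conjugation conventions may differ slightly from the slides:

```python
import numpy as np

rng = np.random.default_rng(6)
L, K, N = 32, 4, 6                       # code length, channel length, message length
C = rng.standard_normal((L, N))          # known coding matrix
m = rng.standard_normal(N)               # unknown message
h = rng.standard_normal(K)               # unknown short channel

# Observation: circular convolution of the zero-padded channel with the coded message Cm.
h_pad = np.concatenate([h, np.zeros(L - K)])
y = np.real(np.fft.ifft(np.fft.fft(h_pad) * np.fft.fft(C @ m)))

# Each Fourier coefficient of y is (linear functional of m) x (linear functional of h),
# i.e. a linear measurement of the rank-1 matrix h m^T.
F = np.fft.fft(np.eye(L))                # DFT matrix
y_hat = np.fft.fft(y)
for ell in range(L):
    d_ell = F[ell, :] @ C                # row of F C, acting on the message
    f_ell = F[ell, :K]                   # truncated Fourier row, acting on the short channel
    assert np.isclose(y_hat[ell], (d_ell @ m) * (f_ell @ h))
```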
Numerical results
(phase-transition plots, white = 100% success, black = 0% success: one for a sparse channel, one for a short channel)

In both of these cases, it looks like it is sufficient for

L ≈ 3(N + K)
Theoretical results
N = message length
K = channel length
L = code length
µ_h² = channel coherence

(figure: |ĥ(ω)|² plotted as a function of frequency ω)

µ_h² = L · max_ω |ĥ(ω)|²

Always true that 1 ≤ µ_h² ≤ K when ‖h‖_2 = 1
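A minimal numpy sketch of this coherence; it assumes the DFT is normalized so that ∑_ω |ĥ(ω)|² = ‖h‖_2², and the channel and code length are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)
L, K = 64, 8
h = rng.standard_normal(K)
h /= np.linalg.norm(h)                    # normalize so that ||h||_2 = 1

# mu_h^2 = L * max_omega |h_hat(omega)|^2, with h zero-padded to the code length L.
h_hat = np.fft.fft(h, n=L) / np.sqrt(L)   # unit-norm DFT: sum_omega |h_hat|^2 = ||h||_2^2 = 1
mu_h2 = L * np.max(np.abs(h_hat) ** 2)
print(mu_h2)                              # always between 1 and K
```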
Theoretical results

N = message length
K = channel length
L = code length
µ_h² = channel coherence

Ahmed, Recht, R, ’12:


Given y = A(hm^T), we can recover (whp) h and m using nuclear-norm minimization when

L ≳ max(K, µ_h² N) log³(KN)

Recall that the number of degrees of freedom is ∼ K + N


(which we almost match)
Theoretical results

Key technical issue: how well the operator

A = [ G1  G2  · · ·  GN ]

embeds rank-2 matrices with a certain support

Key tool: the matrix Bernstein inequality for sums of random matrices (Tropp ’10; Koltchinskii)
