
Short Notes on Low Rank Recovery

Justin Romberg

Georgia Tech, School of ECE

Tsinghua University
October 18, 2013
Beijing, China
Matrix completion

The following problem has many applications. We are interested in a matrix ...

X = [ X1,1  X1,2  X1,3  X1,4  X1,5 ]
    [ X2,1  X2,2  X2,3  X2,4  X2,5 ]
    [ X3,1  X3,2  X3,3  X3,4  X3,5 ]
    [ X4,1  X4,2  X4,3  X4,4  X4,5 ]
    [ X5,1  X5,2  X5,3  X5,4  X5,5 ]
Matrix completion

The following problem has many applications. We are interested in a matrix ...

X = [ X1,1   -    X1,3   -    X1,5 ]
    [  -    X2,2   -    X2,4   -   ]
    [  -    X3,2  X3,3   -     -   ]
    [ X4,1   -     -    X4,4  X4,5 ]
    [  -     -     -    X5,4  X5,5 ]
... but we only get to observe some of its entries.

Is it possible to “fill in the blanks”?


Yes, if X is low rank and its singular vectors are incoherent
Low rank matrices

(figure: a matrix X whose rows index movies and columns index users)

• How do you fill in the missing data?

   X     =     L   ·   R*
(k × n)     (k × r) (r × n)

kn entries  vs.  r(k + n) entries

(slide courtesy of Benjamin Recht)
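To make the parameter count concrete, here is a minimal numpy sketch (the sizes k, n, r below are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n, r = 500, 400, 5

# A rank-r matrix stored through its factors L (k x r) and R (n x r): X = L R^T.
L_fac = rng.standard_normal((k, r))
R_fac = rng.standard_normal((n, r))
X = L_fac @ R_fac.T

print("entries in X:       ", k * n)          # kn numbers
print("entries in L and R: ", r * (k + n))    # r(k + n) numbers, far fewer when r is small
print("rank of X:          ", np.linalg.matrix_rank(X))
```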


Applications of matrix completion

• Recommender systems: rank of the data matrix
• Euclidean embedding: rank of the Gram matrix

(slide courtesy of Benjamin Recht)


Low rank matrix recovery

 
X = [ X1,1   -    X1,3   -    X1,5 ]
    [  -    X2,2   -    X2,4   -   ]
    [  -    X3,2  X3,3   -     -   ]
    [ X4,1   -     -    X4,4  X4,5 ]
    [  -     -     -    X5,4  X5,5 ]

How do we fill in the missing data?

min_Z  rank(Z)   subject to   Z(i, j) = X(i, j)  ∀ (i, j) ∈ Ω

where Ω = locations where we have observed the data


This program is intractable
Low rank matrix recovery

 
X = [ X1,1   -    X1,3   -    X1,5 ]
    [  -    X2,2   -    X2,4   -   ]
    [  -    X3,2  X3,3   -     -   ]
    [ X4,1   -     -    X4,4  X4,5 ]
    [  -     -     -    X5,4  X5,5 ]

How do we fill in the missing data?

min_Z  ‖Z‖_*   subject to   Z(i, j) = X(i, j)  ∀ (i, j) ∈ Ω

where ‖Z‖_* = “nuclear norm” = sum of singular values of Z


Low rank matrix recovery

 
X = [ X1,1   -    X1,3   -    X1,5 ]
    [  -    X2,2   -    X2,4   -   ]
    [  -    X3,2  X3,3   -     -   ]
    [ X4,1   -     -    X4,4  X4,5 ]
    [  -     -     -    X5,4  X5,5 ]

How do we fill in the missing data?

min_Z  ‖Z‖_*   subject to   Z(i, j) = X(i, j)  ∀ (i, j) ∈ Ω

where ‖Z‖_* = “nuclear norm” = sum of singular values of Z


The “nuclear norm heuristic” for low rank recovery was developed by
Maryam Fazel in her 2002 PhD Thesis (Stanford)
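As one illustration of this heuristic, here is a minimal sketch using the cvxpy modeling package (this assumes cvxpy and a conic solver that handles the nuclear norm are installed; the test matrix, its rank, and the sampling mask are arbitrary):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
K, N, R = 20, 20, 2
X0 = rng.standard_normal((K, R)) @ rng.standard_normal((R, N))   # low-rank ground truth
mask = (rng.random((K, N)) < 0.5).astype(float)                  # 1 on Omega, 0 elsewhere

# Nuclear-norm heuristic: min ||Z||_* subject to Z agreeing with X0 on the observed entries.
Z = cp.Variable((K, N))
problem = cp.Problem(cp.Minimize(cp.normNuc(Z)),
                     [cp.multiply(mask, Z) == mask * X0])
problem.solve()

err = np.linalg.norm(Z.value - X0, "fro") / np.linalg.norm(X0, "fro")
print("relative recovery error:", err)
```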
Matrix completion theory

This particular result is due to Recht ’09, but there are related results from
Candès, Tao, Keshavan, Montanari, Oh, Plan, and others ...
Suppose that a K × N matrix X0 = UΣV* is rank R with coherence

µ = max{ (K/R) · max_i ‖U*e_i‖_2² ,  (N/R) · max_i ‖V*e_i‖_2² ,  (KN/R) · ‖UV*‖_∞² }

Then we can “complete” X0 with high probability from randomly chosen samples when

#samples ≥ Const · µ · R(K + N) log²(N)
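For concreteness, a minimal numpy sketch of how this coherence can be computed from an SVD (the rank-R test matrix and its dimensions are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
K, N, R = 30, 40, 3
X0 = rng.standard_normal((K, R)) @ rng.standard_normal((R, N))   # rank-R test matrix

U, s, Vt = np.linalg.svd(X0, full_matrices=False)
U, V = U[:, :R], Vt[:R, :].T            # K x R and N x R singular vector matrices

mu = max(
    (K / R) * np.max(np.sum(U**2, axis=1)),      # (K/R) max_i ||U* e_i||_2^2
    (N / R) * np.max(np.sum(V**2, axis=1)),      # (N/R) max_i ||V* e_i||_2^2
    (K * N / R) * np.max(np.abs(U @ V.T)) ** 2,  # (KN/R) ||U V*||_inf^2
)
print("coherence mu =", mu)
```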


Why do we need incoherence?
Which matrices?
These matrices are low rank, but have high coherence:

• A matrix supported on a single entry: any subset of samples that misses that component reveals nothing!
• A matrix whose nonzero entries all sit in its first r rows (or columns): we still need to observe them in their entirety.
• Ideally, each observed entry should provide nearly the same amount of information.

(courtesy of Benjamin Recht)
General matrix recovery
We can also consider the more general problem of recovering a matrix
from a set of linear measurements.
A single linear measurement of a K × N matrix X0 can be written as the
trace inner product between X0 and another K × N matrix Am :

y_m = ⟨X0, A_m⟩_F = trace(A_m* X0)

This is the same as pointwise multiplying the entries of Am and X0 and


then adding them up.

General matrix recovery

We can also consider the more general problem of recovering a matrix


from a set of linear measurements.
A single linear measurement of a K × N matrix X0 can be written as the
trace inner product between X0 and another K × N matrix Am :

y_m = ⟨X0, A_m⟩_F = trace(A_m* X0)

This is the same as pointwise multiplying the entries of Am and X0 and


then adding them up.

We can collect M measurements together as

y = A(X0 )

where y ∈ ℝ^M and A takes K × N matrices to ℝ^M.
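A minimal numpy sketch of such a measurement operator; the Gaussian measurement matrices A_m here are just placeholders:

```python
import numpy as np

rng = np.random.default_rng(2)
K, N, M = 10, 12, 5
X0 = rng.standard_normal((K, N))
A = rng.standard_normal((M, K, N))      # M measurement matrices A_m

# y_m = <X0, A_m>_F: pointwise multiply the entries and add them up.
y = np.array([np.sum(A[m] * X0) for m in range(M)])

# Equivalent trace form from the slide: y_m = trace(A_m^T X0).
y_trace = np.array([np.trace(A[m].T @ X0) for m in range(M)])
assert np.allclose(y, y_trace)
```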


Geometrical structure in ℝ^{NK}

(figure: 2 × 2 symmetric matrices plotted in 3D — the set x² + z² + 2y² = 1, the rank-1 matrices, and their convex hull)

(courtesy of Benjamin Recht)


Low rank matrix recovery

Work by Recht, Fazel, Parrilo, and Mohan has shown that if A obeys the matrix RIP: for a δ < 1,

(1 − δ)‖X‖_F² ≤ ‖A(X)‖_2² ≤ (1 + δ)‖X‖_F²   for all X with rank(X) ≤ 2R,

then we can recover a rank-R matrix X0 from linear measurements y = A(X0) by solving

min_X  ‖X‖_*   subject to   A(X) = y.
Low rank matrix recovery

Work by Recht, Fazel, Parrilo, and Mohan has shown that if A obeys the matrix RIP: for a δ < 1,

(1 − δ)‖X‖_F² ≤ ‖A(X)‖_2² ≤ (1 + δ)‖X‖_F²   for all X with rank(X) ≤ 2R,

then we can recover a rank-R matrix X0 from linear measurements y = A(X0) by solving

min_X  ‖X‖_*   subject to   A(X) = y.

The recovery is stable in the presence of noise and is robust for matrices
that are approximately low rank.
Random linear measurements

If A is a random linear projection (iid Gaussian, for example) then we can show, for a fixed K × N matrix X,

P( | ‖A(X)‖_2² − ‖X‖_F² | > δ‖X‖_F² ) ≤ C e^{−c(δ)M}

Just as in the “sparse RIP” case, you can couple this with standard bounds on the entropy of the space of low rank matrices to see that δ_{2R} < 1 when

M ≳ R(K + N)

Recht, Fazel, and Parrilo had an early result of this nature in 2007; it was
later refined by Candès and Plan in 2010.
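A quick numerical illustration of this concentration, under the assumption that A is built from iid N(0, 1/M) entries acting on the vectorized matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
K, N, M = 8, 8, 2000
X = rng.standard_normal((K, N))

# A(X): apply an M x KN matrix with iid N(0, 1/M) entries to vec(X).
G = rng.standard_normal((M, K * N)) / np.sqrt(M)
AX = G @ X.ravel()

# ||A(X)||_2^2 concentrates around ||X||_F^2 as M grows.
print(np.linalg.norm(AX) ** 2, np.linalg.norm(X, "fro") ** 2)
```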
Bilinear equations

Bilinear equations contain unknown terms multiplied by one another


u1 v1 + 5u1 v2 + 7u2 v3 = −12
u3 v1 − 9u2 v2 + 4u3 v2 = 2
u1 v2 − 6u1 v3 − u3 v3 = 7

Their nonlinearity makes them trickier to solve, and the computational framework is nowhere near as strong as it is for linear equations.
Bilinear equations

Simple (but only recently appreciated) observation:


Systems of bilinear equations, e. g.
u1 v1 + 5u1 v2 + 7u2 v3 = −12
u3 v1 − 9u2 v2 + 4u3 v2 = 2
can be recast as a linear system of equations on a matrix that has rank 1:

uv^T = [ u1v1  u1v2  u1v3  ···  u1vN ]
       [ u2v1  u2v2  u2v3  ···  u2vN ]
       [ u3v1  u3v2  u3v3  ···  u3vN ]
       [  ⋮     ⋮     ⋮          ⋮   ]
       [ uKv1  uKv2  uKv3  ···  uKvN ]
Bilinear equations
Simple (but only recently appreciated) observation:
Systems of bilinear equations, e. g.
u1 v1 + 5u1 v2 + 7u2 v3 = −12
u3 v1 − 9u2 v2 + 4u3 v2 = 2
can be recast as a linear system of equations on a matrix that has rank 1:

uv^T = [ u1v1  u1v2  u1v3  ···  u1vN ]
       [ u2v1  u2v2  u2v3  ···  u2vN ]
       [ u3v1  u3v2  u3v3  ···  u3vN ]
       [  ⋮     ⋮     ⋮          ⋮   ]
       [ uKv1  uKv2  uKv3  ···  uKvN ]

Compressive (low rank) recovery ⇒


“Generic” quadratic systems with cN equations and N unknowns can be
solved using nuclear norm minimization
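A small numpy check of this lifting for the first equation of the example system; the vectors u and v are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
u = rng.standard_normal(3)
v = rng.standard_normal(3)

# Bilinear expression from the first equation: u1*v1 + 5*u1*v2 + 7*u2*v3.
bilinear = u[0] * v[0] + 5 * u[0] * v[1] + 7 * u[1] * v[2]

# The same quantity as a linear functional <A, u v^T>_F of the rank-1 matrix u v^T.
A = np.zeros((3, 3))
A[0, 0], A[0, 1], A[1, 2] = 1.0, 5.0, 7.0
lifted = np.sum(A * np.outer(u, v))

assert np.isclose(bilinear, lifted)
```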
Recasting quadratic equations

vv^T = [ v1²    v1v2   v1v3   ···  v1vN ]
       [ v2v1   v2²    v2v3   ···  v2vN ]
       [ v3v1   v3v2   v3²    ···  v3vN ]
       [  ⋮      ⋮      ⋮           ⋮   ]
       [ vNv1   vNv2   vNv3   ···  vN²  ]

2v1² + 5v3v1 + 7v2v3 = · · ·

v2v1 + 9v2² + 4v3v2 = · · ·

A quadratic system of equations can be recast as a linear system of


equations on a symmetric matrix that has rank 1.
From quadratic equations to linear equations

Relaxing quadratic equality constraints using SDPs is widespread in optimization:

• MAXCUT
• Stability analysis
• Filter and antenna array design

Recently, Candès, Strohmer, and Voroninski have looked at a stylized version of phase retrieval:

observe  y_ℓ = |⟨a_ℓ, x⟩|² ,   ℓ = 1, . . . , L

and shown that x ∈ ℝ^N can be recovered when

L ∼ Const · N

for random a_ℓ.
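The same lifting in the phase-retrieval setting, as a minimal numpy check (real vectors for simplicity):

```python
import numpy as np

rng = np.random.default_rng(5)
N = 6
x = rng.standard_normal(N)              # unknown signal
a = rng.standard_normal(N)              # one measurement vector

# A phaseless measurement |<a, x>|^2 ...
y = np.abs(a @ x) ** 2

# ... equals a linear functional <a a^T, x x^T>_F of the rank-1 lifted matrix x x^T.
y_lifted = np.sum(np.outer(a, a) * np.outer(x, x))

assert np.isclose(y, y_lifted)
```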
A stylized communications problem:
unknown multipath channel
(figure: multipath propagation; images from B. Davis, UIUC, and ICTE, Aachen)

Channel coding: multipath

(block diagram: a message m is encoded as p = Cm, passes through convolution with an unknown channel H to give y = HCm, and the decoder must discover both the unknown channel h̃ and the message m̃)
Channel coding: multipath

We observe a linear combination of shifts of the coded message:

y = HCm
= h(0)(Cm)↓0 + h(1)(Cm)↓1 + · · · + h(K − 1)(Cm)↓K−1

Bad:
With H unknown, these are nonlinear measurements

Good:
There are two sources of structure we can exploit
• the channel h is “short”
• the coded message Cm lives in a known subspace
Channel coding: multipath

Rearrange as multiple convolutions against the columns of C:

y = m(1) toep(C1) h + m(2) toep(C2) h + · · · + m(N) toep(CN) h

  = [ G1  G2  · · ·  GN ] [ m(1)h ]
                          [ m(2)h ]
                          [   ⋮   ]
                          [ m(N)h ]

where Gn contains the shifts of column n.
Channel coding: multipath

y = [ G1  G2  · · ·  GN ] [ m(1)h ]
                          [ m(2)h ]
                          [   ⋮   ]
                          [ m(N)h ]

where Gn contains the shifts of column n.

We have linear observations of a rank-1 matrix:

y = A(hm^T)

or, entry by entry,

ŷ(ℓ) = ⟨m, d_ℓ⟩ ⟨f_ℓ, h⟩

where
ŷ(ℓ) = Fourier coefficient of y,
d_ℓ = Fourier transform of a row of C,
f_ℓ = Fourier vector.
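A minimal numpy check of this factored structure; the code length, channel length, message length, and coding matrix below are arbitrary, and the Fourier/conjugation conventions may differ slightly from the slides:

```python
import numpy as np

rng = np.random.default_rng(6)
L, K, N = 32, 4, 6                       # code length, channel length, message length
C = rng.standard_normal((L, N))          # known coding matrix
m = rng.standard_normal(N)               # unknown message
h = rng.standard_normal(K)               # unknown short channel

# Observation: circular convolution of the zero-padded channel with the coded message Cm.
h_pad = np.concatenate([h, np.zeros(L - K)])
y = np.real(np.fft.ifft(np.fft.fft(h_pad) * np.fft.fft(C @ m)))

# Each Fourier coefficient of y is (linear functional of m) x (linear functional of h),
# i.e. a linear measurement of the rank-1 matrix h m^T.
F = np.fft.fft(np.eye(L))                # DFT matrix
y_hat = np.fft.fft(y)
for ell in range(L):
    d_ell = F[ell, :] @ C                # row of F C, acting on the message
    f_ell = F[ell, :K]                   # truncated Fourier row, acting on the short channel
    assert np.isclose(y_hat[ell], (d_ell @ m) * (f_ell @ h))
```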
Numerical results
(phase-transition plots, white = 100% success, black = 0% success: one for a sparse channel, one for a short channel)

In both of these cases, it looks like it is sufficient for

L ≈ 3(N + K)
Theoretical results
N = message length
K = channel length
L = code length
µ_h² = channel coherence

(figure: |ĥ(ω)|² plotted as a function of frequency ω)

µ_h² = L · max_ω |ĥ(ω)|²

Always true that 1 ≤ µ_h² ≤ K when ‖h‖_2 = 1
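A minimal numpy sketch of this coherence; it assumes the DFT is normalized so that ∑_ω |ĥ(ω)|² = ‖h‖_2², and the channel and code length are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(7)
L, K = 64, 8
h = rng.standard_normal(K)
h /= np.linalg.norm(h)                    # normalize so that ||h||_2 = 1

# mu_h^2 = L * max_omega |h_hat(omega)|^2, with h zero-padded to the code length L.
h_hat = np.fft.fft(h, n=L) / np.sqrt(L)   # unit-norm DFT: sum_omega |h_hat|^2 = ||h||_2^2 = 1
mu_h2 = L * np.max(np.abs(h_hat) ** 2)
print(mu_h2)                              # always between 1 and K
```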
Theoretical results

N = message length
K = channel length
L = code length
µ_h² = channel coherence

Ahmed, Recht, R, ’12:


Given y = A(hm^T), we can recover (whp) h and m using nuclear-norm minimization when

L ≳ max(K, µ_h² N) log³(KN)

Recall that the number of degrees of freedom is ∼ K + N


(which we almost match)
Theoretical results

Key technical issue: how well the operator

A = [ G1  G2  · · ·  GN ]

embeds rank-2 matrices with a certain support

Key tool: the matrix Bernstein inequality for sums of random matrices (Tropp ’10; Koltchinskii)
