
Dynamic ℓ1 Reconstruction

Justin Romberg
Georgia Tech, ECE

Collaborators:
M. Salman Asif
Aurèle Balavoine
Chris Rozell

Tsinghua University
October 18, 2013
Beijing, China
Goal: a dynamical framework for sparse recovery

We want to move from: given y and Φ, solve

    min_x λ‖x‖_1 + ½‖Φx − y‖_2^2

to a dynamical system that takes the streaming measurements y(t) (made
through a possibly time-varying Φ(t)) and continuously produces the ℓ1
reconstruction x̂(t).

[Block diagram: y(t) and Φ(t) enter a "min ℓ1" block, which outputs x̂(t).]
Agenda

We will look at dynamical reconstruction in two different contexts:

Fast updating of solutions of ℓ1 optimization programs
(M. Salman Asif)

Systems of nonlinear differential equations that solve ℓ1 (and related)
optimization programs, implemented as continuous-time neural nets
(Aurèle Balavoine, Chris Rozell)


Motivation
Classical: recursive least-squares updating

System model:
    y = Φx
Φ has full column rank, x is arbitrary.

Least-squares estimate:
    min_x ‖Φx − y‖_2^2   ⇒   x̂_0 = (Φ^T Φ)^{-1} Φ^T y

Updating the estimate for a time-varying signal mainly incurs a one-time
factorization cost.
Classical: recursive least-squares

Sequential measurement: a new scalar measurement w = φ^T x arrives,

    [ y ]   [  Φ  ]
    [ w ] = [ φ^T ] x

Recursive LS: compute the new estimate using a rank-one update:

    x̂_1 = (Φ^T Φ + φφ^T)^{-1} (Φ^T y + φ w)
        = x̂_0 + K_1 (w − φ^T x̂_0)

where
    K_1 = (Φ^T Φ)^{-1} φ (1 + φ^T (Φ^T Φ)^{-1} φ)^{-1}

With the previous inverse in hand, the update has the cost of a
few matrix-vector multiplies.
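A minimal numpy sketch of this rank-one update (not from the talk; the sizes,
data, and variable names are illustrative), using the Sherman-Morrison identity
and checking the result against a direct re-solve:

    import numpy as np

    rng = np.random.default_rng(0)
    M, N = 30, 10
    Phi = rng.standard_normal((M, N))
    y = rng.standard_normal(M)

    # One-time factorization cost: form and invert the Gram matrix.
    P = np.linalg.inv(Phi.T @ Phi)               # (Phi^T Phi)^{-1}
    x0 = P @ (Phi.T @ y)                         # initial LS estimate

    # A new scalar measurement w = phi^T x arrives.
    phi = rng.standard_normal(N)
    w = rng.standard_normal()

    # Rank-one update: K1 = P phi / (1 + phi^T P phi), x1 = x0 + K1 (w - phi^T x0)
    K1 = P @ phi / (1.0 + phi @ P @ phi)
    x1 = x0 + K1 * (w - phi @ x0)

    # Sanity check against re-solving the enlarged system from scratch.
    x1_direct = np.linalg.lstsq(np.vstack([Phi, phi]), np.append(y, w), rcond=None)[0]
    print(np.allclose(x1, x1_direct))            # True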
Classical: the Kalman filter

Linear dynamical system for state evolution and measurement:
    y_t = Φ_t x_t + e_t
    x_{t+1} = F_t x_t + d_t

Stacking everything up to the current time gives a growing block system in
the unknowns x_1, x_2, x_3, ...:

    x_1 = F_0 x_0,            Φ_1 x_1 = y_1,
    −F_1 x_1 + x_2 = 0,       Φ_2 x_2 = y_2,
    −F_2 x_2 + x_3 = 0,       Φ_3 x_3 = y_3,   ...

As time marches on, we add both rows and columns.

Least-squares problem:
    min_{x_1,x_2,...} Σ_t ( σ_t ‖Φ_t x_t − y_t‖_2^2 + λ_t ‖x_t − F_{t−1} x_{t−1}‖_2^2 )
Classical: the Kalman filter

Linear dynamical system for state evolution and measurement:
    y_t = Φ_t x_t + e_t
    x_{t+1} = F_t x_t + d_t

Least-squares problem:
    min_{x_1,x_2,...} Σ_t ( σ_t ‖Φ_t x_t − y_t‖_2^2 + λ_t ‖x_t − F_{t−1} x_{t−1}‖_2^2 )

Again, we can use low-rank updating to solve this recursively:

    v_k      = F_k x̂_k
    K_{k+1}  = (F_k P_k F_k^T + I) Φ_{k+1}^T (Φ_{k+1} (F_k P_k F_k^T + I) Φ_{k+1}^T + I)^{-1}
    x̂_{k+1} = v_k + K_{k+1} (y_{k+1} − Φ_{k+1} v_k)
    P_{k+1}  = (I − K_{k+1} Φ_{k+1}) (F_k P_k F_k^T + I)
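A small numpy sketch of this recursion (illustrative only: it assumes the unit
noise covariances implicit in the slide's normalization, and the state model,
sizes, and signals are made up):

    import numpy as np

    rng = np.random.default_rng(1)
    n, m, T = 4, 2, 50
    F = np.eye(n) + 0.01 * rng.standard_normal((n, n))   # state transition (fixed here)
    x = rng.standard_normal(n)                           # true state
    x_hat, P = np.zeros(n), np.eye(n)                    # filter estimate and covariance

    for k in range(T):
        x = F @ x + 0.1 * rng.standard_normal(n)         # state evolution
        Phi_k = rng.standard_normal((m, n))              # new measurement matrix
        y_k = Phi_k @ x + 0.1 * rng.standard_normal(m)   # new measurements

        v = F @ x_hat                                    # predict
        Ppred = F @ P @ F.T + np.eye(n)
        K = Ppred @ Phi_k.T @ np.linalg.inv(Phi_k @ Ppred @ Phi_k.T + np.eye(m))
        x_hat = v + K @ (y_k - Phi_k @ v)                # update
        P = (np.eye(n) - K @ Phi_k) @ Ppred

    print(np.linalg.norm(x - x_hat))                     # tracking error after T steps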
Dynamic sparse recovery: ℓ1 filtering

Goal: efficient updating for optimization programs like

    min_x ‖Wx‖_1 + ½‖Φx − y‖_2^2

We want to dynamically update the solution when
• the underlying signal changes slightly,
• we add/remove measurements,
• the weights change,
• we have streaming measurements for an evolving signal
  (adding/removing columns from Φ).
Optimality conditions for BPDN

    min_x ‖Wx‖_1 + ½‖Φx − y‖_2^2

Conditions for x* (supported on Γ*) to be a solution:

    φ_γ^T (Φx* − y) = −W[γ,γ] z[γ]     for γ ∈ Γ*
    |φ_γ^T (Φx* − y)| ≤ W[γ,γ]          for γ ∈ Γ*^c

where z[γ] = sign(x*[γ]).

Derived simply by computing the subgradient of the functional above.
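A quick numerical check of these conditions (illustrative: the solver here is a
plain ISTA loop standing in for the homotopy methods discussed later, W = λI,
and the sizes, iteration count, and tolerances are arbitrary):

    import numpy as np

    rng = np.random.default_rng(2)
    M, N, lam = 40, 80, 0.1
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)
    x_true = np.zeros(N)
    x_true[rng.choice(N, 5, replace=False)] = rng.standard_normal(5)
    y = Phi @ x_true + 0.01 * rng.standard_normal(M)

    soft = lambda u, t: np.sign(u) * np.maximum(np.abs(u) - t, 0.0)
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the quadratic part
    x = np.zeros(N)
    for _ in range(10000):                   # plain ISTA iterations
        x = soft(x - Phi.T @ (Phi @ x - y) / L, lam / L)

    g = Phi.T @ (Phi @ x - y)                # gradient of the quadratic part at x
    on = np.abs(x) > 1e-8                    # estimated support Gamma
    print(np.allclose(g[on], -lam * np.sign(x[on]), atol=1e-4))   # equality on Gamma
    print(np.all(np.abs(g[~on]) <= lam + 1e-4))                   # inequality off Gamma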


Example: time-varying sparse signal

Initial measurements. Observe
    y_1 = Φ x_1 + e_1
Initial reconstruction. Solve
    min_x λ‖x‖_1 + ½‖Φx − y_1‖_2^2

A new set of measurements arrives:
    y_2 = Φ x_2 + e_2
Reconstruct again using ℓ1-min:
    min_x λ‖x‖_1 + ½‖Φx − y_2‖_2^2

We can gradually move from the first solution to the second solution
using homotopy:
    min_x λ‖x‖_1 + ½‖Φx − (1−ε)y_1 − ε y_2‖_2^2
Take ε from 0 → 1.
Example: time-varying sparse signal

    min_x λ‖x‖_1 + ½‖Φx − (1−ε)y_old − ε y_new‖_2^2,   take ε from 0 → 1

The path from the old solution to the new solution is piecewise linear in ε.

Optimality conditions for fixed ε:
    Φ_Γ^T (Φx − (1−ε)y_old − ε y_new) = −λ sign x_Γ
    ‖Φ_{Γ^c}^T (Φx − (1−ε)y_old − ε y_new)‖_∞ < λ
where Γ = active support.

Update direction:
    ∂x = −(Φ_Γ^T Φ_Γ)^{-1} Φ_Γ^T (y_old − y_new)   on Γ
    ∂x = 0                                          off Γ

Move in this direction until the support changes: either a coefficient on Γ
shrinks to zero, or one of the constraints
    |φ_γ^T (Φ(x + δ ∂x) − (1−ε)y_old − ε y_new)| < λ,   γ ∈ Γ^c
is violated.

[Figure: the path from the old solution x̂_1 to the new solution x̂_2 is
piecewise linear, parameterized by ε : 0 → 1.]
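A sketch of a single homotopy step under these conditions (illustrative, not
the authors' implementation: `homotopy_step` and its arguments are made-up
names, and the bookkeeping that adds or removes the index that triggered the
breakpoint is omitted):

    import numpy as np

    def homotopy_step(Phi, x, y_old, y_new, lam, eps, tol=1e-10):
        """One critical-point-to-critical-point step of the eps-homotopy."""
        gam = np.abs(x) > tol                               # active support Gamma
        dy = y_old - y_new
        dx = np.zeros_like(x)
        # Update direction on the support (zero off the support).
        dx[gam] = -np.linalg.solve(Phi[:, gam].T @ Phi[:, gam], Phi[:, gam].T @ dy)

        r = Phi @ x - (1 - eps) * y_old - eps * y_new       # residual at the current eps
        p = Phi.T @ r                                       # correlations
        d = Phi.T @ (Phi @ dx + dy)                         # their rate of change in eps

        steps = [1.0 - eps]                                 # can always run to eps = 1
        with np.errstate(divide="ignore", invalid="ignore"):
            for s in (+1.0, -1.0):                          # off-support constraint hits +/- lam
                cand = (s * lam - p[~gam]) / d[~gam]
                steps += list(cand[np.isfinite(cand) & (cand > tol)])
            cand = -x[gam] / dx[gam]                        # on-support coefficient hits zero
            steps += list(cand[np.isfinite(cand) & (cand > tol)])

        delta = min(steps)
        # (Support update at the breakpoint is omitted in this sketch.)
        return x + delta * dx, eps + delta

A full solver would call this repeatedly, updating Γ at each breakpoint, until
ε reaches 1.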
Sparse innovations

[Figure: House, Blocks, and Pcw. poly test signals, illustrating sparse
innovations between frames.]
Numerical experiments: time-varying sparse signals

[Table garbled in extraction: for several test signals (Blocks, Pcw. poly,
House, ...) and several solvers, it reports pairs (MatVec, CPU), where
MatVec is the average number of matrix-vector products with Φ and Φ^T and
CPU is the average CPU time to solve.]
[Asif and R. 2009]
Adding a measurement

Analog of recursive least squares for ℓ1-min:

    [ y ]   [  Φ  ]     [ e ]
    [ w ] = [ φ^T ] x + [ d ]   −→   min_x λ‖x‖_1 + ½‖Φx − y‖_2^2 + ½|⟨φ, x⟩ − w|^2

Work in the new measurement slowly:

    min_x λ‖x‖_1 + ½‖Φx − y‖_2^2 + (ε/2)|⟨φ, x⟩ − w|^2

Again, the solution path is piecewise linear in ε.

[Garrigues et al 08, Asif and R 09]


Adding a measurement: updating

Optimality conditions:
    Φ_Γ^T (Φx − y) + ε (⟨φ, x⟩ − w) φ_Γ = −λ sign x_Γ
    ‖Φ_{Γ^c}^T (Φx − y) + ε (⟨φ, x⟩ − w) φ_{Γ^c}‖_∞ < λ

From a critical point x_0 at the current value ε_0, the update direction is
    ∂x = (w − ⟨φ, x_0⟩) · (Φ_Γ^T Φ_Γ + ε_0 φ_Γ φ_Γ^T)^{-1} φ_Γ   on Γ
    ∂x = 0                                                        off Γ
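A short sketch of this update direction (illustrative: the function and its
arguments are made-up names, and eps is the current value of the homotopy
parameter ε_0 above):

    import numpy as np

    def new_measurement_direction(Phi, phi, x0, w, eps, tol=1e-10):
        gam = np.abs(x0) > tol                              # active support Gamma
        A = Phi[:, gam].T @ Phi[:, gam] + eps * np.outer(phi[gam], phi[gam])
        dx = np.zeros_like(x0)
        dx[gam] = (w - phi @ x0) * np.linalg.solve(A, phi[gam])
        return dx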
Numerical experiments: adding a measurement
N=Dynamic `1 numerical
1024, measurements M = experiments
512, sparsity S = 100
Add Initial
P new problem: N = 1024, M = 512, K = M/5 (Gaussian)
measurements
Sequential update: Added P new measurements
Compare the average
Comparison number
for the count of matrix-vector
of matrix-vector productsproducts
per updateper update

4
Reweighted ℓ1

Weighted ℓ1 reconstruction:

    min_x Σ_k w_k |x_k| + ½‖Φx − y‖_2^2  =  min_x ‖Wx‖_1 + ½‖Φx − y‖_2^2

Solve this iteratively, adapting the weights to the previous solution:

    w_k = λ / (|x_k^old| + c)
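An illustrative reweighting loop; the inner solver is a simple weighted ISTA
standing in for the homotopy solver in the talk, and all sizes and constants
are made up:

    import numpy as np

    def weighted_ista(Phi, y, w, iters=3000):
        """min_x sum_k w_k |x_k| + 0.5 ||Phi x - y||^2 by proximal gradient."""
        L = np.linalg.norm(Phi, 2) ** 2
        x = np.zeros(Phi.shape[1])
        for _ in range(iters):
            u = x - Phi.T @ (Phi @ x - y) / L
            x = np.sign(u) * np.maximum(np.abs(u) - w / L, 0.0)   # weighted soft threshold
        return x

    rng = np.random.default_rng(3)
    M, N, lam, c = 40, 100, 0.05, 1e-2
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)
    x_true = np.zeros(N)
    x_true[rng.choice(N, 8, replace=False)] = rng.standard_normal(8)
    y = Phi @ x_true + 0.005 * rng.standard_normal(M)

    w = lam * np.ones(N)                        # first pass: ordinary BPDN weights
    for it in range(4):                         # reweighting passes
        x = weighted_ista(Phi, y, w)
        w = lam / (np.abs(x) + c)               # adapt weights to the previous solution
        print(it, np.linalg.norm(x - x_true))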
[Figure, from Candès, Wakin, and Boyd '08 ("Enhancing Sparsity by Reweighted
ℓ1 Minimization"): (a) original two-pulse signal (blue) and reconstructions
(red) via (b) ℓ1 synthesis, (c) ℓ1 analysis, (d) reweighted ℓ1 analysis.]
Changing the weights

Iterative reweighting: take {w_k} → {w̃_k}

Optimality conditions:
    φ_k^* (y − Φx) = ((1−ε) w_k + ε w̃_k) z_k       on the support, k ∈ Γ
    |φ_k^* (y − Φx)| < (1−ε) w_k + ε w̃_k            off the support, k ∈ Γ^c

Update direction (increasing ε):
    ∂x = (Φ_Γ^* Φ_Γ)^{-1} (W − W̃) z   on Γ
    ∂x = 0                             on Γ^c

[Asif and R 2012]


Numerical experiments: changing the weights

Sparse signal of length N recovered from M Gaussian measurements.
Methods compared:
• ADP-H (adaptive weighting via homotopy)
• IRW-H (iterative reweighting via homotopy)
• SpaRSA [Wright et al 2007]
• YALL1 [Yang et al 2007]
A general, flexible homotopy framework

The formulations above
• depend critically on maintaining optimality,
• are very efficient when the solutions are close.

Streaming measurements for evolving signals require some type of
predict-and-update framework.

Kalman filter:
    v_k = F x̂_k                          (predict)
    x̂_{k+1} = v_k + K(y − Φ_k v_k)       (update)

What program does the prediction v_k solve?

Can we trace the path to the solution from a general "warm start"?
A general, flexible homotopy framework

We want to solve
    min_x ‖Wx‖_1 + ½‖Φx − y‖_2^2

Initial guess/prediction: v

Solve
    min_x ‖Wx‖_1 + ½‖Φx − y‖_2^2 + (1−ε) u^T x
for ε : 0 → 1.

Taking
    u = −Wz − Φ^T (Φv − y)
for some z ∈ ∂(‖v‖_1) makes v optimal for ε = 0.
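A small sketch of constructing u from a warm start v and checking the ε = 0
optimality residual (names and data are illustrative; W is assumed diagonal
with positive weights):

    import numpy as np

    def warm_start_u(Phi, y, W, v, tol=1e-10):
        z = np.sign(v) * (np.abs(v) > tol)     # an element of the subdifferential of ||v||_1
        return -W @ z - Phi.T @ (Phi @ v - y)

    # Quick check on random data: the eps = 0 optimality residual vanishes at v.
    rng = np.random.default_rng(4)
    Phi = rng.standard_normal((20, 50))
    y = rng.standard_normal(20)
    W = 0.1 * np.eye(50)                       # diagonal weights
    v = np.zeros(50)
    v[:5] = rng.standard_normal(5)             # arbitrary warm start
    u = warm_start_u(Phi, y, W, v)
    print(np.linalg.norm(W @ np.sign(v) + Phi.T @ (Phi @ v - y) + u))   # ~ 0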
Moving from the warm-start to the solution

    min_x ‖Wx‖_1 + ½‖Φx − y‖_2^2 + (1−ε) u^T x

The optimality conditions are
    Φ_Γ^T (Φx − y) + (1−ε) u_Γ = −W sign x_Γ
    |φ_γ^T (Φx − y) + (1−ε) u[γ]| ≤ W[γ,γ]      for γ ∈ Γ^c

We move in the direction
    ∂x = (Φ_Γ^T Φ_Γ)^{-1} u_Γ   on Γ
    ∂x = 0                       on Γ^c
until a component shrinks to zero or a constraint is violated, yielding a new Γ.
Streaming basis: Lapped orthogonal transform

[Figure: LOT window functions and an example streaming signal to be
recovered.]
Streaming sparse recovery

Observations:              y_t = Φ_t x_t + e_t
Sparse representation:     x[n] = Σ_{p,k} α_{p,k} ψ_{p,k}[n]

[Figure: the streaming system: measurements, measurement matrices, signal,
LOT basis windows, and the LOT coefficients on the active interval.]
Streaming sparse recovery

Iteratively reconstruct the signal over a sliding (active) interval:
form u from your prediction, then take ε : 0 → 1 in

    min_α ‖Wα‖_1 + ½‖Φ̄Ψ̃α − ỹ‖_2^2 + (1−ε) u^T α

where Ψ̃ and ỹ account for edge effects.

[Figure: the overlapping system matrix, sparse coefficient vector, and error
term, with the system divided into two parts around the active interval.]
Streaming signal recovery: Simulation

[Figure: (top-left) Mishmash signal (zoomed in on the first 2560 samples);
(top-right) error in the reconstruction at R = N/M = 4; (bottom-left) LOT
coefficients; (bottom-right) error in the LOT coefficients.]
Streaming signal recovery: Simulation

[Figure: (left) SER at different R from ±1 random measurements in 35 dB
noise; (middle) count of matrix-vector multiplications; (right) MATLAB
execution time.]
Streaming signal recovery: Dynamic signal

Observation/evolution model:
    y_t = Φ_t x_t + e_t
    x_{t+1} = F_t x_t + d_t

We solve

    min_α Σ_t ( ‖W_t α_t‖_1 + ½‖Φ_t Ψ_t α_t − y_t‖_2^2
                + ½‖F_{t−1} Ψ_{t−1} α_{t−1} − Ψ_t α_t‖_2^2 )

(formulation similar to Vaswani 08, Carmi et al 09, Angelosante et al 09,
Ziniel et al 10, Charles et al 11)

using the homotopy

    min_α ‖Wα‖_1 + ½‖Φ̄Ψ̃α − ỹ‖_2^2 + ½‖F̄Ψ̃α − q̃‖_2^2 + (1−ε) u^T α
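One way to see the role of the prediction term: the two quadratic penalties
can be stacked into a single least-squares term, so the same weighted-ℓ1
homotopy machinery applies. A sketch, assuming the windowed quantities
Phi_bar, Psi_tilde, F_bar, y_tilde, q_tilde (standing in for Φ̄, Ψ̃, F̄, ỹ, q̃)
are already formed for the active interval:

    import numpy as np

    def stack_dynamic_system(Phi_bar, Psi_tilde, F_bar, y_tilde, q_tilde):
        # min_a ||W a||_1 + 0.5 ||A a - b||^2 with the stacked A, b below has
        # the same quadratic part as the two-term objective on this slide.
        A = np.vstack([Phi_bar @ Psi_tilde, F_bar @ Psi_tilde])
        b = np.concatenate([y_tilde, q_tilde])
        return A, b

The (1−ε)u^T α term is then handled by the warm-start homotopy exactly as
before.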
Dynamic signal: Simulation

[Figure: (top-left) Piece-Regular signal (shifted copies) shown as an image;
(top-right) error in the reconstruction at R = N/M = 4; (bottom-left)
reconstructed signal at R = 4; (bottom-right) comparison of SER for the
ℓ1-regularized and ℓ2-regularized problems.]
Dynamic signal: Simulation

[Figure: (left) SER at different R from ±1 random measurements in 35 dB
noise; (middle) count of matrix-vector multiplications; (right) MATLAB
execution time.]
Dynamical systems for sparse recovery
Approximate analog computing

Radical re-think of how computer arithmetic is done: computations use the
physics of the devices (transistors) more directly.
• Use < 1% of the transistors, maybe 1/10,000 of the power, possibly 100×
  faster than a GPU
• Computations are noisy, overall precision ≈ 10^−2
• Small-scale successes (embedded beamforming, adaptive filtering)
• Medium- to large-scale potential:
  – FPAAs
  – specialized circuits for optimization
    (Hopfield networks, neural-net implementations)
  – general (SIMD) computing architectures?
• Much of this work is proprietary; start-ups swallowed up by AD, NI, ...
  (Lyric Semiconductor, GTronix, Singular Computing, ...)
Analog vector-matrix multiply

Compared with a digital multiply-and-accumulate, an analog vector-matrix
multiplier has a small time constant and low power consumption, at the
price of limited accuracy and limited dynamic range.
Dynamical systems for sparse recovery

There are simple systems of nonlinear differential equations that settle to
the solution of
    min_x λ‖x‖_1 + ½‖Φx − y‖_2^2
or, more generally,
    min_x λ Σ_{n=1}^N C(x[n]) + ½‖Φx − y‖_2^2

The Locally Competitive Algorithm (LCA):
    τ u̇(t) = −u(t) − (Φ^T Φ − I) x(t) + Φ^T y
    x(t) = T_λ(u(t))
is a neurologically inspired system (Rozell et al 08) which settles to the
solutions of the programs above.
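A minimal sketch that integrates these equations with forward Euler and the
soft-threshold activation (the ℓ1 case); the step size, problem sizes, and
number of steps are arbitrary choices for illustration:

    import numpy as np

    rng = np.random.default_rng(5)
    M, N, lam, tau, dt = 50, 128, 0.05, 1.0, 0.01
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)
    x_true = np.zeros(N)
    x_true[rng.choice(N, 6, replace=False)] = rng.standard_normal(6)
    y = Phi @ x_true

    soft = lambda u: np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)   # T_lambda for l1
    u = np.zeros(N)                                  # start from rest
    for _ in range(20000):                           # forward Euler in t
        x = soft(u)
        udot = -u - (Phi.T @ (Phi @ x) - x) + Phi.T @ y
        u = u + (dt / tau) * udot

    x = soft(u)
    print(np.linalg.norm(Phi @ x - y), np.count_nonzero(x))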
Locally competitive algorithm

    τ u̇(t) = −u(t) − (Φ^T Φ − I) x(t) + Φ^T y
    x(t) = T_λ(u(t))

[Figure: network diagram. Each node n receives the input φ_n^T y, integrates
its state u_n(t), and applies the activation T(·) to produce the output
x_n(t); the outputs inhibit the other nodes through the weights ⟨φ_n, φ_m⟩.]
Locally competitive algorithm

Cost function:
    V(x) = λ Σ_n C(x_n) + ½‖Φx − y‖_2^2

Dynamics:
    τ u̇(t) = −u(t) − (Φ^T Φ − I) x(t) + Φ^T y
    x_n(t) = T_λ(u_n(t))

The activation and the cost are linked through
    λ (dC/dx)(x_n) = u_n − x_n,   with x_n = T_λ(u_n)

[Figure: the cost C(u_n) and the corresponding activation x_n = T_λ(u_n).]

Key questions
• Uniform convergence
• Convergence speed (general)
• Convergence speed for sparse recovery via ℓ1 minimization

[Figure: trajectory of the outputs from x(0) to a minimizer x* of
min_x Σ_n C(x_n) + ½‖Φx − y‖_2^2.]
LCA convergence

Assumptions:
1. u − a ∈ λ∂C(a) for a = T_λ(u)
2. x = T_λ(u) = 0 for |u| ≤ λ, and x = f(u) for |u| > λ
3. T_λ(·) is odd and continuous, with f′(u) > 0 and f(u) < u
LCA convergence

Global asymptotic convergence: if assumptions 1–3 above hold, then the
outputs eventually stop moving:
    ẋ(t) → 0   as t → ∞

If the critical points of the cost function are isolated, then
    x(t) → x*,   u(t) → u*   as t → ∞
LCA convergence

Assumptions:
1. u − a ∈ λ∂C(a) for a = T_λ(u)
2. x = T_λ(u) = 0 for |u| ≤ λ, and x = f(u) for |u| > λ
3. T_λ(·) is odd and continuous, with f′(u) > 0 and f(u) < u
4. f(·) is subanalytic
5. f′(u) ≤ α
LCA convergence

Global asymptotic convergence: if assumptions 1–5 above hold, then the LCA
is globally asymptotically convergent:
    x(t) → x*,   u(t) → u*   as t → ∞
where x* is a critical point of the functional.
Convergence: support is recovered in finite time

If the LCA converges to a fixed point u* such that
    |u*_γ| ≥ λ + r   for γ ∈ Γ*,    and    |u*_γ| ≤ λ − r   for γ ∈ Γ*^c,
then the support of x* is recovered in finite time.

[Figure: number of activation switches divided by the sparsity, for
Φ = [DCT I], M = 256, N = 512.]
Convergence: exponential (of a sort)

In addition to the conditions for global convergence, if there exists
0 ≤ δ < 1 such that for all t ≥ 0
    (1 − δ)‖x̃(t)‖_2^2 ≤ ‖Φx̃(t)‖_2^2 ≤ (1 + δ)‖x̃(t)‖_2^2,
where x̃(t) = x(t) − x*, and αδ < 1 (recall f′(u) ≤ α), then the LCA
converges exponentially to a unique fixed point:
    ‖u(t) − u*‖_2 ≤ κ_0 e^{−(1−αδ)t/τ}

Of course, this depends on not too many nodes being active at any one
time ...
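A rough empirical check of this behavior (illustrative only: u* is approximated
by a much longer run, the soft-threshold activation is used so α = 1, and the
printed ratios should only be roughly constant once the active set has settled):

    import numpy as np

    rng = np.random.default_rng(6)
    M, N, lam, tau, dt, steps = 60, 120, 0.1, 1.0, 0.01, 3000
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)
    x0 = np.zeros(N)
    x0[rng.choice(N, 5, replace=False)] = rng.standard_normal(5)
    y = Phi @ x0
    soft = lambda u: np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

    def step(u):
        x = soft(u)
        return u + (dt / tau) * (-u - Phi.T @ (Phi @ x) + x + Phi.T @ y)

    u, traj = np.zeros(N), []
    for _ in range(steps):                      # record the trajectory u(t)
        u = step(u)
        traj.append(u.copy())

    for _ in range(20 * steps):                 # keep integrating: surrogate for u*
        u = step(u)
    u_star = u

    d = np.linalg.norm(np.array(traj) - u_star, axis=1)
    for k in (500, 1000, 1500, 2000):           # roughly constant ratios => exponential decay
        print(d[k], d[k + 500] / d[k])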
Activation in proper subsets for ℓ1

If Φ is a "random compressed sensing matrix" and
    M ≥ Const · S^2 log(N/S),
then for reasonably small values of λ and starting from rest,
    Γ(t) ⊂ Γ*
That is, only subsets of the true support are ever active.

Similar results hold for OMP and homotopy algorithms in the CS literature.
Efficient activation for ℓ1

If Φ is a "random compressed sensing matrix" and
    M ≥ Const · S log(N/S),
then for reasonably small values of λ and starting from rest,
    |Γ(t)| ≤ 2|Γ*|

Similar results hold for OMP/ROMP, CoSaMP, etc. in the CS literature.
References

M. S. Asif and J. Romberg, "Dynamic updating for ℓ1 minimization," IEEE Journal of Selected Topics in Signal Processing, April 2010.

M. S. Asif and J. Romberg, "Fast and accurate algorithms for re-weighted ℓ1-norm minimization," submitted to IEEE Transactions on Signal Processing, July 2012.

M. S. Asif and J. Romberg, "Sparse recovery of streaming signals using ℓ1 homotopy," submitted to IEEE Transactions on Signal Processing, June 2013.

A. Balavoine, J. Romberg, and C. Rozell, "Convergence and Rate Analysis of Neural Networks for Sparse Approximation," IEEE Transactions on Neural Networks and Learning Systems, vol. 23, no. 9, pp. 1377–1389, September 2012.

A. Balavoine, J. Romberg, and C. Rozell, "Convergence Speed of a Dynamical System for Sparse Recovery," to appear in IEEE Transactions on Signal Processing, late 2013.

A. Balavoine, J. Romberg, and C. Rozell, "Convergence of a neural network for sparse approximation using the nonsmooth Łojasiewicz inequality," Proc. of Int. Joint Conf. on Neural Networks, August 2013.

http://users.ece.gatech.edu/~justin/Publications.html
