ℓ1-magic: Recovery of Sparse Signals via Convex Programming
Emmanuel Candès and Justin Romberg, Caltech
October 2005
1 Seven problems
A recent series of papers [3–8] develops a theory of signal recovery from highly incomplete
information. The central results state that a sparse vector x_0 ∈ R^N can be recovered from a
small number of linear measurements b = Ax_0 ∈ R^K, K < N (or b = Ax_0 + e when there is
measurement noise) by solving a convex program.
As a companion to these papers, this package includes MATLAB code that implements this
recovery procedure in the seven contexts described below. The code is not meant to be cutting
edge, rather it is a proof-of-concept showing that these recovery procedures are computationally
tractable, even for large-scale problems where the number of data points is in the millions.
The problems fall into two classes: those which can be recast as linear programs (LPs), and
those which can be recast as second-order cone programs (SOCPs). The LPs are solved using
a generic path-following primal-dual method. The SOCPs are solved with a generic log-barrier
algorithm. The implementations follow Chapter 11 of [2].
For maximum computational efficiency, the solvers for each of the seven problems are implemented
separately. They all have the same basic structure, however, with the computational
bottleneck being the calculation of the Newton step (this is discussed in detail below). The code
can be used in either “small scale” mode, where the system is constructed explicitly and solved
exactly, or in “large scale” mode, where an iterative matrix-free algorithm such as conjugate
gradients (CG) is used to approximately solve the system.
Our seven problems of interest are:
• Min-ℓ1 with equality constraints. The program

    (P1)   min ||x||_1   subject to   Ax = b,

also known as basis pursuit, finds the vector with smallest ℓ1 norm

    ||x||_1 := Σ_i |x_i|

that explains the observations b. As the results in [4, 6] show, if a sufficiently sparse x_0
exists such that Ax_0 = b, then (P1) will find it. When x, A, b have real-valued entries, (P1)
can be recast as an LP (this is discussed in detail in [10]).
• Minimum ℓ1 error approximation. Let A be an M × N matrix with full rank. Given
y ∈ R^M, the program

    (P_A)   min_x ||y − Ax||_1

finds the vector x ∈ R^N such that the error y − Ax has minimum ℓ1 norm (i.e. we are
asking that the difference between Ax and y be sparse). This program arises in the context
of channel coding [8].
Suppose we have a channel code that produces a codeword c = Ax for a message x. The
message travels over the channel, and has an unknown number of its entries corrupted.
The decoder observes y = c + e, where e is the corruption. If e is sparse enough, then the
decoder can use (P_A) to recover x exactly. Again, (P_A) can be recast as an LP.
• Min-ℓ1 with quadratic constraints. This program finds the vector with minimum ℓ1
norm that comes close to explaining the observations:

    (P2)   min ||x||_1   subject to   ||Ax − b||_2 ≤ ε,

where ε is a user-specified parameter. As shown in [5], if a sufficiently sparse x_0 exists such
that b = Ax_0 + e for some small error term with ||e||_2 ≤ ε, then the solution x_2 to (P2) will be
close to x_0. That is, ||x_2 − x_0||_2 ≤ C · ε, where C is a small constant. (P2) can be recast
as a SOCP.
• Min-ℓ1 with bounded residual correlation. Also referred to as the Dantzig selector,
the program

    (P_D)   min ||x||_1   subject to   ||A*(Ax − b)||_∞ ≤ γ,

where γ is a user-specified parameter, relaxes the equality constraints of (P1) in a different
way. (P_D) requires that the residual Ax − b of a candidate vector x not be too correlated
with any of the columns of A (the product A*(Ax − b) measures each of these correlations).
If b = Ax_0 + e, where e_i ~ N(0, σ²), then the solution x_D to (P_D) has near-optimal minimax
risk:

    E ||x_D − x_0||_2^2 ≤ C (log N) Σ_i min(x_0(i)², σ²)

(see [7] for details). For real-valued x, A, b, (P_D) can again be recast as an LP; in the
complex case, there is an equivalent SOCP.
It is also true that when x, A, b are complex, the programs (P1), (P_A), (P_D) can be written as
SOCPs, but we will not pursue this here.
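To make the LP recast of the first program concrete, here is a small numpy/scipy sketch that solves (P1) by handing the equivalent linear program (min Σ_i u_i subject to x − u ≤ 0, −x − u ≤ 0, Ax = b; this recast is derived in Appendix A) to a generic solver. The helper name `basis_pursuit` is ours, and scipy's `linprog` stands in for the package's primal-dual solver; this is an illustration, not the `l1eq_pd` implementation.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, b):
    """Solve min ||x||_1 subject to Ax = b via the LP recast in (x, u)."""
    K, N = A.shape
    # objective: sum of the auxiliary variables u
    c = np.concatenate([np.zeros(N), np.ones(N)])
    I = np.eye(N)
    # x - u <= 0 and -x - u <= 0 together encode |x_i| <= u_i
    A_ub = np.block([[I, -I], [-I, -I]])
    b_ub = np.zeros(2 * N)
    # the equality constraints act on x only
    A_eq = np.hstack([A, np.zeros((K, N))])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b,
                  bounds=(None, None))
    return res.x[:N]
```

On random instances with a sufficiently sparse x_0, the minimizer coincides with x_0, which is easy to check numerically.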
If the underlying signal is a 2D image, an alternate recovery model is that the gradient is
sparse [18]. Let x_ij denote the pixel in the ith row and jth column of an n × n image x, and define
the operators

    D_{h;ij} x = { x_{i+1,j} − x_{ij}   i < n
                 { 0                    i = n,

    D_{v;ij} x = { x_{i,j+1} − x_{ij}   j < n
                 { 0                    j = n,

and

    D_ij x = ( D_{h;ij} x , D_{v;ij} x )^T.                    (1)
The 2-vector D_ij x can be interpreted as a kind of discrete gradient of the digital image x. The
total variation of x is simply the sum of the magnitudes of this discrete gradient at every point:

    TV(x) := Σ_ij sqrt( (D_{h;ij} x)² + (D_{v;ij} x)² ) = Σ_ij ||D_ij x||_2.
With these deﬁnitions, we have three programs for image recovery, each of which can be recast
as a SOCP:
• Min-TV with equality constraints.

    (TV1)   min TV(x)   subject to   Ax = b

If there exists a piecewise constant x_0 with sufficiently few edges (i.e. D_ij x_0 is nonzero for
only a small number of indices ij), then (TV1) will recover x_0 exactly — see [4].
• Min-TV with quadratic constraints.

    (TV2)   min TV(x)   subject to   ||Ax − b||_2 ≤ ε

Examples of recovering images from noisy observations using (TV2) were presented in [5].
Note that if A is the identity matrix, (TV2) reduces to the standard Rudin-Osher-Fatemi
image restoration problem [18]. See also [9, 11–13] for SOCP solvers specifically designed
for the total-variation functional.
• Dantzig TV.

    (TV_D)   min TV(x)   subject to   ||A*(Ax − b)||_∞ ≤ γ

This program was used in one of the numerical experiments in [7].
In the next section, we describe how to solve linear and second-order cone programs using modern
interior point methods.
2 Interior point methods
Advances in interior point methods for convex optimization over the past 15 years, led by the
seminal work [14], have made large-scale solvers for the seven problems above feasible. Below we
overview the generic LP and SOCP solvers used in the ℓ1-magic package to solve these problems.
2.1 A primal-dual algorithm for linear programming
In Chapter 11 of [2], Boyd and Vandenberghe outline a relatively simple primal-dual algorithm
for linear programming which we have followed very closely for the implementation of (P1), (P_A),
and (P_D). For the sake of completeness, and to set up the notation, we briefly review their
algorithm here.
The standard-form linear program is

    min_z ⟨c_0, z⟩   subject to   A_0 z = b,
                                  f_i(z) ≤ 0,

where the search vector z ∈ R^N, b ∈ R^K, A_0 is a K × N matrix, and each of the f_i, i = 1, ..., m
is a linear functional:

    f_i(z) = ⟨c_i, z⟩ + d_i,

for some c_i ∈ R^N, d_i ∈ R. At the optimal point z⋆, there will exist dual vectors ν⋆ ∈ R^K, λ⋆ ∈
R^m, λ⋆ ≥ 0 such that the Karush-Kuhn-Tucker conditions are satisfied:

    (KKT)   c_0 + A_0^T ν⋆ + Σ_i λ⋆_i c_i = 0,
            λ⋆_i f_i(z⋆) = 0,   i = 1, ..., m,
            A_0 z⋆ = b,
            f_i(z⋆) ≤ 0,   i = 1, ..., m.
In a nutshell, the primal-dual algorithm finds the optimal z⋆ (along with optimal dual vectors
ν⋆ and λ⋆) by solving this system of nonlinear equations. The solution procedure is the classical
Newton method: at an interior point (z^k, ν^k, λ^k) (by which we mean f_i(z^k) < 0, λ^k > 0), the
system is linearized and solved. However, the step to the new point (z^{k+1}, ν^{k+1}, λ^{k+1}) must be
modified so that we remain in the interior.
In practice, we relax the complementary slackness condition λ_i f_i = 0 to

    λ_i^k f_i(z^k) = −1/τ^k,                                   (2)

where we judiciously increase the parameter τ^k as we progress through the Newton iterations.
This biases the solution of the linearized equations towards the interior, allowing a smooth,
well-defined “central path” from an interior point to the solution on the boundary (see [15, 20] for an
extended discussion).
The primal, dual, and central residuals quantify how close a point (z, ν, λ) is to satisfying (KKT)
with (2) in place of the slackness condition:

    r_dual = c_0 + A_0^T ν + Σ_i λ_i c_i,
    r_cent = −Λf − (1/τ)1,
    r_pri  = A_0 z − b,

where Λ is a diagonal matrix with (Λ)_ii = λ_i, and f = ( f_1(z) ... f_m(z) )^T.
From a point (z, ν, λ), we want to find a step (∆z, ∆ν, ∆λ) such that

    r_τ(z + ∆z, ν + ∆ν, λ + ∆λ) = 0.                           (3)

Linearizing (3) with the Taylor expansion around (z, ν, λ),

    r_τ(z + ∆z, ν + ∆ν, λ + ∆λ) ≈ r_τ(z, ν, λ) + J_{r_τ}(z, ν, λ) (∆z, ∆ν, ∆λ)^T,

where J_{r_τ}(z, ν, λ) is the Jacobian of r_τ, we have the system

    [  0     A_0^T   C^T ] [∆z]        [ c_0 + A_0^T ν + Σ_i λ_i c_i ]
    [ −ΛC    0       −F  ] [∆ν]  =  −  [ −Λf − (1/τ)1                ]
    [  A_0   0       0   ] [∆λ]        [ A_0 z − b                   ],

where the m × N matrix C has the c_i^T as rows, and F is diagonal with (F)_ii = f_i(z). We can
eliminate ∆λ using

    ∆λ = −ΛF^{−1} C ∆z − λ − (1/τ) f^{−1},                     (4)

leaving us with the core system

    [ −C^T F^{−1} ΛC   A_0^T ] [∆z]    [ −c_0 + (1/τ) C^T f^{−1} − A_0^T ν ]
    [  A_0             0     ] [∆ν] =  [  b − A_0 z                        ].   (5)
With the (∆z, ∆ν, ∆λ) we have a step direction. To choose the step length 0 < s ≤ 1, we ask
that it satisfy two criteria:
1. z + s∆z and λ + s∆λ are in the interior, i.e. f_i(z + s∆z) < 0, λ_i > 0 for all i.
2. The norm of the residuals has decreased sufficiently:

    ||r_τ(z + s∆z, ν + s∆ν, λ + s∆λ)||_2 ≤ (1 − αs) · ||r_τ(z, ν, λ)||_2,

where α is a user-specified parameter (in all of our implementations, we have set α = 0.01).
Since the f_i are linear functionals, item 1 is easily addressed. We choose the maximum step size
that just keeps us in the interior. Let

    I_f^+ = {i : ⟨c_i, ∆z⟩ > 0},   I_λ^− = {i : ∆λ_i < 0},

and set

    s_max = 0.99 · min{ 1, { −f_i(z)/⟨c_i, ∆z⟩, i ∈ I_f^+ }, { −λ_i/∆λ_i, i ∈ I_λ^− } }.

Then starting with s = s_max, we check if item 2 above is satisfied; if not, we set s = β s and
try again. We have taken β = 1/2 in all of our implementations.
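In code, this backtracking rule is a short loop; the sketch below is a generic stand-alone version (the closure arguments `interior` and `resid_norm` are our own abstraction, not names from the package).

```python
def backtrack_step(s_max, interior, resid_norm, r0, alpha=0.01, beta=0.5):
    """Shrink s geometrically from s_max until the candidate point is strictly
    interior and the residual norm has dropped by the factor (1 - alpha*s).

    interior(s)   -> bool: is the point at step length s in the interior?
    resid_norm(s) -> float: ||r_tau|| at step length s; r0 is ||r_tau|| at s = 0.
    """
    s = s_max
    while not (interior(s) and resid_norm(s) <= (1.0 - alpha * s) * r0):
        s *= beta
        if s < 1e-12:  # step direction too inaccurate to make progress
            raise RuntimeError("stuck backtracking")
    return s
```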
When r_dual and r_pri are small, the surrogate duality gap η = −f^T λ is an approximation to how
close a certain (z, ν, λ) is to being optimal (i.e. ⟨c_0, z⟩ − ⟨c_0, z⋆⟩ ≈ η). The primal-dual algorithm
repeats the Newton iterations described above until η has decreased below a given tolerance.
Almost all of the computational burden falls on solving (5). When the matrix −C^T F^{−1} ΛC is
easily invertible (as it is for (P1)), or there are no equality constraints (as in (P_A), (P_D)), (5)
can be reduced to a symmetric positive definite set of equations.
When N and K are large, forming the matrix and then solving the linear system of equations
in (5) is infeasible. However, if fast algorithms exist for applying C, C^T and A_0, A_0^T, we can use
a “matrix-free” solver such as conjugate gradients. CG is iterative, requiring a few hundred
applications of the constraint matrices (roughly speaking) to get an accurate solution. A CG
solver (based on the very nice exposition in [19]) is included with the ℓ1-magic package.
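A matrix-free CG iteration needs nothing more than a function that applies the system matrix to a vector. The sketch below is our own minimal version for illustration, not the package's cgsolve.m.

```python
import numpy as np

def cg_solve(apply_A, b, tol=1e-8, maxiter=500):
    """Conjugate gradients for a symmetric positive definite system, with the
    matrix available only through the handle apply_A(v) = A @ v."""
    x = np.zeros_like(b)
    r = b - apply_A(x)        # initial residual
    p = r.copy()              # initial search direction
    rs = r @ r
    bnorm = np.linalg.norm(b)
    for _ in range(maxiter):
        Ap = apply_A(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * bnorm:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

The caller supplies `apply_A` as a closure, so the matrix itself is never stored.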
The implementations of (P1), (P_A), (P_D) are nearly identical, save for the calculation of the
Newton step. In the Appendix, we derive the Newton step for each of these problems using
notation mirroring that used in the actual MATLAB code.
2.2 A log-barrier algorithm for SOCPs
Although primal-dual techniques exist for solving second-order cone programs (see [1, 13]), their
implementation is not quite as straightforward as in the LP case. Instead, we have implemented
each of the SOCP recovery problems using a log-barrier method. The log-barrier method, for
which we will again closely follow the generic (but effective) algorithm described in [2, Chap.
11], is conceptually more straightforward than the primal-dual method described above, but at
its core is again solving for a series of Newton steps.
Each of (P2), (TV1), (TV2), (TV_D) can be written in the form

    min_z ⟨c_0, z⟩   subject to   A_0 z = b,
                                  f_i(z) ≤ 0,   i = 1, ..., m,   (6)

where each f_i describes either a constraint which is linear

    f_i = ⟨c_i, z⟩ + d_i

or second-order conic

    f_i(z) = (1/2) ( ||A_i z||_2^2 − (⟨c_i, z⟩ + d_i)² )

(the A_i are matrices, the c_i are vectors, and the d_i are scalars).
The standard log-barrier method transforms (6) into a series of linearly constrained programs:

    min_z ⟨c_0, z⟩ + (1/τ^k) Σ_i −log(−f_i(z))   subject to   A_0 z = b,   (7)
where τ^k > τ^{k−1}. The inequality constraints have been incorporated into the functional via a
penalty function¹ which is infinite when the constraint is violated (or even met exactly), and
smooth elsewhere. As τ^k gets large, the solution z^k to (7) approaches the solution z⋆ to (6): it
can be shown that ⟨c_0, z^k⟩ − ⟨c_0, z⋆⟩ < m/τ^k, i.e. we are within m/τ^k of the optimal value after
iteration k (m/τ^k is called the duality gap). The idea here is that each of the smooth subproblems
can be solved to fairly high accuracy with just a few iterations of Newton’s method, especially
since we can use the solution z^k as a starting point for subproblem k + 1.
At log-barrier iteration k, Newton’s method (which is again iterative) proceeds by forming a series
of quadratic approximations to (7), and minimizing each by solving a system of equations (again,
we might need to modify the step length to stay in the interior). The quadratic approximation
of the functional

    f_0(z) = ⟨c_0, z⟩ + (1/τ) Σ_i −log(−f_i(z))

in (7) around a point z is given by

    f_0(z + ∆z) ≈ f_0(z) + ⟨g_z, ∆z⟩ + (1/2) ⟨H_z ∆z, ∆z⟩ := q(z + ∆z),

where g_z is the gradient

    g_z = c_0 + (1/τ) Σ_i (1/(−f_i(z))) ∇f_i(z)

and H_z is the Hessian matrix

    H_z = (1/τ) Σ_i (1/f_i(z)²) ∇f_i(z) (∇f_i(z))^T + (1/τ) Σ_i (1/(−f_i(z))) ∇²f_i(z).
Given that z is feasible (that A_0 z = b, in particular), the ∆z that minimizes q(z + ∆z) subject
to A_0 z = b is the solution to the set of linear equations

    [ τH_z   A_0^T ] [∆z]    [ −τ g_z ]
    [ A_0    0     ] [ v ] = [  0     ].   (8)

(The vector v, which can be interpreted as the Lagrange multipliers for the equality constraints
in the quadratic minimization problem, is not directly used.)
In all of the recovery problems below with the exception of (TV1), there are no equality
constraints (A_0 = 0). In these cases, the system (8) is symmetric positive definite, and thus can be
solved using CG when the problem is “large scale”. For the (TV1) problem, we use the SYMMLQ
algorithm (which is similar to CG, and works on symmetric but indefinite systems; see [16]).
With ∆z in hand, we have the Newton step direction. The step length s ≤ 1 is chosen so that
1. f_i(z + s∆z) < 0 for all i = 1, ..., m,
2. The functional has decreased sufficiently:

    f_0(z + s∆z) < f_0(z) + αs ⟨g_z, ∆z⟩,

where α is a user-specified parameter (each of the implementations below uses α = 0.01).
This requirement basically states that the decrease must be within a certain percentage of
that predicted by the linear model at z.
As before, we start with s = 1, and decrease by multiples of β until both these conditions are
satisfied (all implementations use β = 1/2).
The complete log-barrier implementation for each problem follows the outline:

¹The choice of −log(−x) for the barrier function is not arbitrary; it has a property (termed self-concordance)
that is very important for quick convergence of (7) to (6) both in theory and in practice (see the very nice
exposition in [17]).
1. Inputs: a feasible starting point z^0, a tolerance η, and parameters µ and an initial τ^1. Set
k = 1.
2. Solve (7) via Newton’s method (followed by the backtracking line search), using z^{k−1} as
an initial point. Call the solution z^k.
3. If m/τ^k < η, terminate and return z^k.
4. Else, set τ^{k+1} = µτ^k, k = k + 1 and go to step 2.
In fact, we can calculate in advance how many iterations the log-barrier algorithm will need:

    barrier iterations = ⌈ (log m − log η − log τ^1) / log µ ⌉.
The final issue is the selection of τ^1. Our implementation chooses τ^1 conservatively; it is set so
that the duality gap m/τ^1 after the first iteration is equal to ⟨c_0, z^0⟩.
In the Appendix, we explicitly derive the equations for the Newton step for each of (P2), (TV1), (TV2), (TV_D),
again using notation that mirrors the variable names in the code.
3 Examples
To illustrate how to use the code, the ℓ1-magic package includes m-files for solving specific
instances of each of the above problems (these end in “_example.m” in the main directory).

3.1 ℓ1 with equality constraints

We will begin by going through l1eq_example.m in detail. This m-file creates a sparse signal,
takes a limited number of measurements of that signal, and recovers the signal exactly by solving
(P1). The first part of the procedure is for the most part self-explanatory:
% put key subdirectories in path if not already there
path(path, ’./Optimization’);
path(path, ’./Data’);
% load random states for repeatable experiments
load RandomStates
rand(’state’, rand_state);
randn(’state’, randn_state);
% signal length
N = 512;
% number of spikes in the signal
T = 20;
% number of observations to make
K = 120;
% random +/- 1 signal
x = zeros(N,1);
q = randperm(N);
x(q(1:T)) = sign(randn(T,1));
We add the ’Optimization’ directory (where the interior point solvers reside) and the ’Data’
directories to the path. The file RandomStates.m contains two variables: rand_state and
randn_state, which we use to set the states of the random number generators on the next
two lines (we want this to be a “repeatable experiment”). The next few lines set up the problem:
a length-512 signal that contains 20 spikes is created by choosing 20 locations at random
and then putting ±1 at these locations. The original signal is shown in Figure 1(a). The next
few lines:
% measurement matrix
disp(’Creating measurment matrix...’);
A = randn(K,N);
A = orth(A’)’;
disp(’Done.’);
% observations
y = A*x;
% initial guess = min energy
x0 = A’*y;
create a measurement ensemble by first creating a K × N matrix with iid Gaussian entries,
and then orthogonalizing the rows. The measurements y are taken, and the “minimum energy”
solution x0 is calculated (x0, which is shown in Figure 1(b), is the vector in {x : Ax = y} that is
closest to the origin). Finally, we recover the signal with:
% solve the LP
tic
xp = l1eq_pd(x0, A, [], y, 1e-3);
toc
The function l1eq_pd.m (found in the ’Optimization’ subdirectory) implements the primal-dual
algorithm presented in Section 2.1; we are sending it our initial guess x0 for the solution, the
measurement matrix (the third argument, which is used to specify the transpose of the measurement
matrix, is unnecessary here — and hence left empty — since we are providing A explicitly),
the measurements, and the precision to which we want the problem solved (l1eq_pd will terminate
when the surrogate duality gap is below 10^−3). Running the example file at the MATLAB
prompt, we have the following output:
>> l1eq_example
Creating measurment matrix...
Done.
Iteration = 1, tau = 1.921e+02, Primal = 5.272e+01, PDGap = 5.329e+01, Dual res = 9.898e+00, Primal res = 1.466e-14
H11p condition number = 1.122e-02
Iteration = 2, tau = 3.311e+02, Primal = 4.383e+01, PDGap = 3.093e+01, Dual res = 5.009e+00, Primal res = 7.432e-15
H11p condition number = 2.071e-02
Iteration = 3, tau = 5.271e+02, Primal = 3.690e+01, PDGap = 1.943e+01, Dual res = 2.862e+00, Primal res = 1.820e-14
H11p condition number = 2.574e-04
Iteration = 4, tau = 7.488e+02, Primal = 3.272e+01, PDGap = 1.368e+01, Dual res = 1.902e+00, Primal res = 1.524e-14
H11p condition number = 8.140e-05
Iteration = 5, tau = 9.731e+02, Primal = 2.999e+01, PDGap = 1.052e+01, Dual res = 1.409e+00, Primal res = 1.380e-14
H11p condition number = 5.671e-05
Iteration = 6, tau = 1.965e+03, Primal = 2.509e+01, PDGap = 5.210e+00, Dual res = 6.020e-01, Primal res = 4.071e-14
H11p condition number = 2.054e-05
Iteration = 7, tau = 1.583e+04, Primal = 2.064e+01, PDGap = 6.467e-01, Dual res = 6.020e-03, Primal res = 3.126e-13
H11p condition number = 1.333e-06
Iteration = 8, tau = 1.450e+05, Primal = 2.007e+01, PDGap = 7.062e-02, Dual res = 6.020e-05, Primal res = 4.711e-13
H11p condition number = 1.187e-07
Iteration = 9, tau = 1.330e+06, Primal = 2.001e+01, PDGap = 7.697e-03, Dual res = 6.020e-07, Primal res = 2.907e-12
H11p condition number = 3.130e-09
Iteration = 10, tau = 1.220e+07, Primal = 2.000e+01, PDGap = 8.390e-04, Dual res = 6.020e-09, Primal res = 1.947e-11
H11p condition number = 3.979e-11
Elapsed time is 0.141270 seconds.
The recovered signal xp is shown in Figure 1(c). The signal is recovered to fairly high accuracy:

>> norm(xp-x)
(a) Original (b) Minimum energy reconstruction (c) Recovered
Figure 1: 1D recovery experiment for ℓ1 minimization with equality constraints. (a) Original length-512
signal x consisting of 20 spikes. (b) Minimum energy (linear) reconstruction x0. (c) Minimum ℓ1
reconstruction xp.
ans =
   8.9647e-05
3.2 Phantom reconstruction
A large-scale example is given in tveq_phantom_example.m. This file recreates the phantom
reconstruction experiment first published in [4]. The 256 × 256 Shepp-Logan phantom, shown in
Figure 2(a), is measured at K = 5481 locations in the 2D Fourier plane; the sampling pattern is
shown in Figure 2(b). The image is then reconstructed exactly using (TV1).
The starshaped Fourierdomain sampling pattern is created with
% number of radial lines in the Fourier domain
L = 22;
% Fourier samples we are given
[M,Mh,mh,mhi] = LineMask(L,n);
OMEGA = mhi;
The auxiliary function LineMask.m (found in the ‘Measurements’ subdirectory) creates the
star-shaped pattern consisting of 22 lines through the origin. The vector OMEGA contains the locations
of the frequencies used in the sampling pattern.
This example differs from the previous one in that the code operates in large-scale mode. The
measurement matrix in this example is 5481 × 65536, making the system (8) far too large to
solve (or even store) explicitly. (In fact, the measurement matrix itself would require almost 3
gigabytes of memory if stored in double precision.) Instead of creating the measurement matrix
explicitly, we provide function handles that take a vector x, and return Ax. As discussed above,
the Newton steps are solved for using an implicit algorithm.
To create the implicit matrix, we use the function handles
A = @(z) A_fhp(z, OMEGA);
At = @(z) At_fhp(z, OMEGA, n);
(a) Phantom (b) Sampling pattern (c) Min energy (d) minTV reconstruction
Figure 2: Phantom recovery experiment.
The function A_fhp.m takes a length-N vector (we treat n × n images as N := n² vectors), and
returns samples on the K frequencies. (Actually, since the underlying image is real, A_fhp.m
returns the real and imaginary parts of the 2D FFT on the upper half-plane of the domain shown
in Figure 2(b).)
To solve (TV1), we call

xp = tveq_logbarrier(xbp, A, At, y, 1e-1, 2, 1e-8, 600);
The variable xbp is the initial guess (which is again the minimal energy reconstruction shown
in Figure 2(c)), y are the measurements, and 1e-1 is the desired precision. The sixth input is
the value of µ (the amount by which to increase τ^k at each iteration; see Section 2.2). The last
two inputs are parameters for the large-scale solver used to find the Newton step. The solvers
are iterative, with each iteration requiring one application of A and one application of At. The
seventh and eighth arguments above state that we want the solver to iterate until the solution
has precision 10^−8 (that is, it finds a z such that ||Hz − g||_2/||g||_2 ≤ 10^−8), or it has reached 600
iterations.
The recovered phantom is shown in Figure 2(d). We have ||X_TV − X||_2 / ||X||_2 ≈ 8 · 10^−3.
3.3 Optimization routines
We include a brief description of each of the main optimization routines (type help <function>
in MATLAB for details). Each of these m-files is found in the Optimization subdirectory.
cgsolve: Solves Ax = b, where A is symmetric positive definite, using the Conjugate Gradient method.
l1dantzig_pd: Solves (P_D) (the Dantzig selector) using a primal-dual algorithm.
l1decode_pd: Solves the norm approximation problem (P_A) (for decoding via linear programming) using a primal-dual algorithm.
l1eq_pd: Solves the standard Basis Pursuit problem (P1) using a primal-dual algorithm.
l1qc_logbarrier: Barrier (“outer”) iterations for solving quadratically constrained ℓ1 minimization (P2).
l1qc_newton: Newton (“inner”) iterations for solving quadratically constrained ℓ1 minimization (P2).
tvdantzig_logbarrier: Barrier iterations for solving the TV Dantzig selector (TV_D).
tvdantzig_newton: Newton iterations for (TV_D).
tveq_logbarrier: Barrier iterations for equality-constrained TV minimization (TV1).
tveq_newton: Newton iterations for (TV1).
tvqc_logbarrier: Barrier iterations for quadratically constrained TV minimization (TV2).
tvqc_newton: Newton iterations for (TV2).
4 Error messages
Here we briefly discuss each of the error messages that ℓ1-magic may produce.
• Matrix ill-conditioned. Returning previous iterate. This error can occur when
the code is running in small-scale mode; that is, the matrix A is provided explicitly. The
error message is produced when the linear system we need to solve to find the step direction
(i.e. (5) for the linear programs, and (8) for the SOCPs) has an estimated condition number
of less than 10^−14.
This error most commonly occurs during the last iterations of the primal-dual or log-barrier
algorithms. While it means that the solution is not within the tolerance specified (by the
primal-dual gap), in practice it is usually pretty close.
• Cannot solve system. Returning previous iterate. This error is the large-scale
analog to the above. The error message is produced when the residual produced by the
conjugate gradients algorithm was above 1/2; essentially this means that CG has not solved
the system in any meaningful way. Again, this error typically occurs in the late stages of
the optimization algorithm, and is a symptom of the system being ill-conditioned.
• Stuck backtracking, returning last iterate. This error occurs when the algorithm,
after computing the step direction, cannot find a step size small enough that decreases the
objective. It generally occurs in large-scale mode, and is a symptom of CG not solving
for the step direction to sufficient precision (if the system is solved perfectly, a small enough
step size will always be found). Again, this will typically occur in the late stages of the
optimization algorithm.
• Starting point infeasible; using x0 = At*inv(AAt)*y. Each of the optimization
programs expects an initial guess which is feasible (obeys the constraints). If the x0
provided is not, this message is produced, and the algorithm proceeds using the
least-squares starting point x_0 = A^T (AA^T)^{−1} b.
Appendix
A   ℓ1 minimization with equality constraints
When x, A and b are real, then (P1) can be recast as the linear program

    min_{x,u} Σ_i u_i   subject to   x_i − u_i ≤ 0,
                                     −x_i − u_i ≤ 0,
                                     Ax = b,

which can be solved using the standard primal-dual algorithm outlined in Section 2.1 (again,
see [2, Chap. 11] for a full discussion). Set

    f_{u1;i} := x_i − u_i,   f_{u2;i} := −x_i − u_i,
with λ_{u1;i}, λ_{u2;i} the corresponding dual variables, and let f_{u1} be the vector ( f_{u1;1} ... f_{u1;N} )^T
(and likewise for f_{u2}, λ_{u1}, λ_{u2}). Note that

    ∇f_{u1;i} = ( δ_i ; −δ_i ),   ∇f_{u2;i} = ( −δ_i ; −δ_i ),   ∇²f_{u1;i} = 0,   ∇²f_{u2;i} = 0,
where δ_i is the standard basis vector for component i. Thus at a point (x, u; v, λ_{u1}, λ_{u2}), the
central and dual residuals are

    r_cent = ( −Λ_{u1} f_{u1} ; −Λ_{u2} f_{u2} ) − (1/τ)1,

    r_dual = ( λ_{u1} − λ_{u2} + A^T v ; 1 − λ_{u1} − λ_{u2} ),
and the Newton step (5) is given by:

    [ Σ_1   Σ_2   A^T ] [∆x]   [ w_1 ]    [ (−1/τ) (−f_{u1}^{−1} + f_{u2}^{−1}) − A^T v ]
    [ Σ_2   Σ_1   0   ] [∆u] = [ w_2 ] := [ −1 − (1/τ) (f_{u1}^{−1} + f_{u2}^{−1})      ]
    [ A     0     0   ] [∆v]   [ w_3 ]    [ b − Ax                                      ],

with

    Σ_1 = Λ_{u1} F_{u1}^{−1} − Λ_{u2} F_{u2}^{−1},   Σ_2 = Λ_{u1} F_{u1}^{−1} + Λ_{u2} F_{u2}^{−1}.
(The F_•, for example, are diagonal matrices with (F_•)_ii = f_{•;i}, and f_{•;i}^{−1} = 1/f_{•;i}.) Setting

    Σ_x = Σ_1 − Σ_2² Σ_1^{−1},
we can eliminate

    ∆x = Σ_x^{−1} ( w_1 − Σ_2 Σ_1^{−1} w_2 − A^T ∆v ),
    ∆u = Σ_1^{−1} ( w_2 − Σ_2 ∆x ),

and solve

    −A Σ_x^{−1} A^T ∆v = w_3 − A ( Σ_x^{−1} w_1 − Σ_x^{−1} Σ_2 Σ_1^{−1} w_2 ).

This is a K × K positive definite system of equations, and can be solved using conjugate gradients.
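The block elimination above is a standard Schur-complement reduction, and it is easy to sanity-check numerically. The script below (an illustration with random diagonal Σ_1, Σ_2, not code from the package) builds the full three-by-three block system and verifies that the eliminated solution matches a direct solve.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 8, 3
s1 = rng.uniform(1.0, 2.0, N)    # diag of Sigma_1 (kept positive for this demo)
s2 = rng.uniform(-0.5, 0.5, N)   # diag of Sigma_2
A = rng.standard_normal((K, N))
w1, w2, w3 = (rng.standard_normal(N), rng.standard_normal(N),
              rng.standard_normal(K))

# direct solve of the full system [[S1, S2, A^T], [S2, S1, 0], [A, 0, 0]]
S1, S2, Z = np.diag(s1), np.diag(s2), np.zeros
full = np.block([[S1, S2, A.T],
                 [S2, S1, Z((N, K))],
                 [A, Z((K, N)), Z((K, K))]])
direct = np.linalg.solve(full, np.concatenate([w1, w2, w3]))

# Schur-complement elimination with Sigma_x = Sigma_1 - Sigma_2^2 Sigma_1^{-1}
sx = s1 - s2**2 / s1
dv = np.linalg.solve(-A @ np.diag(1.0 / sx) @ A.T,
                     w3 - A @ ((w1 - (s2 / s1) * w2) / sx))
dx = (w1 - (s2 / s1) * w2 - A.T @ dv) / sx
du = (w2 - s2 * dx) / s1
assert np.allclose(np.concatenate([dx, du, dv]), direct)
```

Because Σ_1 and Σ_2 are diagonal, the elimination costs only vector operations plus one K × K solve.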
Given ∆x, ∆u, ∆v, we calculate the change in the inequality dual variables as in (4):

    ∆λ_{u1} = Λ_{u1} F_{u1}^{−1} ( −∆x + ∆u ) − λ_{u1} − (1/τ) f_{u1}^{−1},
    ∆λ_{u2} = Λ_{u2} F_{u2}^{−1} ( ∆x + ∆u ) − λ_{u2} − (1/τ) f_{u2}^{−1}.
B   ℓ1 norm approximation

The ℓ1 norm approximation problem (P_A) can also be recast as a linear program:

    min_{x,u} Σ_{m=1}^{M} u_m   subject to   Ax − u − y ≤ 0,
                                             −Ax − u + y ≤ 0

(recall that unlike the other six problems, here the M × N matrix A has more rows than columns).
For the primal-dual algorithm, we define

    f_{u1} = Ax − u − y,   f_{u2} = −Ax − u + y.
Given a vector of weights σ ∈ R^M,

    Σ_m σ_m ∇f_{u1;m} = ( A^T σ ; −σ ),   Σ_m σ_m ∇f_{u2;m} = ( −A^T σ ; −σ ),

    Σ_m σ_m ∇f_{u1;m} ∇f_{u1;m}^T = [ A^T Σ A   −A^T Σ ]
                                    [ −Σ A       Σ     ],

    Σ_m σ_m ∇f_{u2;m} ∇f_{u2;m}^T = [ A^T Σ A   A^T Σ ]
                                    [ Σ A       Σ     ].
At a point (x, u; λ_{u1}, λ_{u2}), the dual residual is

    r_dual = ( A^T (λ_{u1} − λ_{u2}) ; 1 − λ_{u1} − λ_{u2} ),
and the Newton step is the solution to

    [ A^T Σ_{11} A   A^T Σ_{12} ] [∆x]   [ −(1/τ) A^T ( −f_{u1}^{−1} + f_{u2}^{−1} ) ]    [ w_1 ]
    [ Σ_{12} A       Σ_{11}     ] [∆u] = [ −1 − (1/τ) ( f_{u1}^{−1} + f_{u2}^{−1} )  ] =: [ w_2 ],
where

    Σ_{11} = −Λ_{u1} F_{u1}^{−1} − Λ_{u2} F_{u2}^{−1},
    Σ_{12} = Λ_{u1} F_{u1}^{−1} − Λ_{u2} F_{u2}^{−1}.
Setting

    Σ_x = Σ_{11} − Σ_{12}² Σ_{11}^{−1},

we can eliminate ∆u = Σ_{11}^{−1} ( w_2 − Σ_{12} A ∆x ), and solve

    A^T Σ_x A ∆x = w_1 − A^T Σ_{12} Σ_{11}^{−1} w_2

for ∆x. Again, A^T Σ_x A is an N × N symmetric positive definite matrix (it is straightforward to
verify that each element on the diagonal of Σ_x will be strictly positive), and so the Conjugate
Gradients algorithm can be used for large-scale problems.
Given ∆x, ∆u, the step directions for the inequality dual variables are given by

    ∆λ_{u1} = −Λ_{u1} F_{u1}^{−1} ( A∆x − ∆u ) − λ_{u1} − (1/τ) f_{u1}^{−1},
    ∆λ_{u2} = Λ_{u2} F_{u2}^{−1} ( A∆x + ∆u ) − λ_{u2} − (1/τ) f_{u2}^{−1}.
C   ℓ1 Dantzig selection

An equivalent linear program to (P_D) in the real case is given by:

    min_{x,u} Σ_i u_i   subject to   x − u ≤ 0,
                                     −x − u ≤ 0,
                                     A^T r − ε ≤ 0,
                                     −A^T r − ε ≤ 0,

where r = Ax − b. Taking

    f_{u1} = x − u,   f_{u2} = −x − u,   f_{ε1} = A^T r − ε,   f_{ε2} = −A^T r − ε,
the dual residual at a point (x, u; λ_{u1}, λ_{u2}, λ_{ε1}, λ_{ε2}) is

    r_dual = ( λ_{u1} − λ_{u2} + A^T A (λ_{ε1} − λ_{ε2}) ; 1 − λ_{u1} − λ_{u2} ),
and the Newton step is the solution to

    [ A^T A Σ_a A^T A + Σ_{11}   Σ_{12} ] [∆x]   [ −(1/τ) ( A^T A (−f_{ε1}^{−1} + f_{ε2}^{−1}) − f_{u1}^{−1} + f_{u2}^{−1} ) ]    [ w_1 ]
    [ Σ_{12}                     Σ_{11} ] [∆u] = [ −1 − (1/τ) ( f_{u1}^{−1} + f_{u2}^{−1} )                                  ] =: [ w_2 ],

where

    Σ_{11} = −Λ_{u1} F_{u1}^{−1} − Λ_{u2} F_{u2}^{−1},
    Σ_{12} = Λ_{u1} F_{u1}^{−1} − Λ_{u2} F_{u2}^{−1},
    Σ_a  = −Λ_{ε1} F_{ε1}^{−1} − Λ_{ε2} F_{ε2}^{−1}.
Again setting

    Σ_x = Σ_{11} − Σ_{12}² Σ_{11}^{−1},

we can eliminate

    ∆u = Σ_{11}^{−1} ( w_2 − Σ_{12} ∆x ),

and solve

    ( A^T A Σ_a A^T A + Σ_x ) ∆x = w_1 − Σ_{12} Σ_{11}^{−1} w_2

for ∆x. As before, the system is symmetric positive definite, and the CG algorithm can be used
to solve it.
Given ∆x, ∆u, the step directions for the inequality dual variables are given by

    ∆λ_{u1} = −Λ_{u1} F_{u1}^{−1} ( ∆x − ∆u ) − λ_{u1} − (1/τ) f_{u1}^{−1},
    ∆λ_{u2} = −Λ_{u2} F_{u2}^{−1} ( −∆x − ∆u ) − λ_{u2} − (1/τ) f_{u2}^{−1},
    ∆λ_{ε1} = −Λ_{ε1} F_{ε1}^{−1} ( A^T A ∆x ) − λ_{ε1} − (1/τ) f_{ε1}^{−1},
    ∆λ_{ε2} = −Λ_{ε2} F_{ε2}^{−1} ( −A^T A ∆x ) − λ_{ε2} − (1/τ) f_{ε2}^{−1}.
D   ℓ1 minimization with quadratic constraints

The quadratically constrained ℓ1 minimization problem (P2) can be recast as the second-order
cone program

    min_{x,u} Σ_i u_i   subject to   x − u ≤ 0,
                                     −x − u ≤ 0,
                                     (1/2) ( ||Ax − b||_2^2 − ε² ) ≤ 0.
Taking

    f_{u1} = x − u,   f_{u2} = −x − u,   f_ε = (1/2) ( ||Ax − b||_2^2 − ε² ),
we can write the Newton step (as in (8)) at a point (x, u) for a given τ as

    [ Σ_{11} − f_ε^{−1} A^T A + f_ε^{−2} A^T r r^T A   Σ_{12} ] [∆x]   [ f_{u1}^{−1} − f_{u2}^{−1} + f_ε^{−1} A^T r ]    [ w_1 ]
    [ Σ_{12}                                           Σ_{11} ] [∆u] = [ −τ1 − f_{u1}^{−1} − f_{u2}^{−1}            ] =: [ w_2 ],

where r = Ax − b, and

    Σ_{11} = F_{u1}^{−2} + F_{u2}^{−2},   Σ_{12} = −F_{u1}^{−2} + F_{u2}^{−2}.
As before, we set

    Σ_x = Σ_{11} − Σ_{12}² Σ_{11}^{−1}

and eliminate ∆u:

    ∆u = Σ_{11}^{−1} ( w_2 − Σ_{12} ∆x ),

leaving us with the reduced system

    ( Σ_x − f_ε^{−1} A^T A + f_ε^{−2} A^T r r^T A ) ∆x = w_1 − Σ_{12} Σ_{11}^{−1} w_2,

which is symmetric positive definite and can be solved using CG.
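In large-scale mode a reduced operator of this form is never assembled; CG only needs its action on a vector. A minimal sketch of the matrix-free apply (our own helper, with A available only through function handles, in the spirit of the package's large-scale mode):

```python
import numpy as np

def apply_reduced(dx, apply_A, apply_At, r, f_eps, sig_x):
    """Apply (Sigma_x - f_eps^{-1} A^T A + f_eps^{-2} A^T r r^T A) to dx,
    using only the handles apply_A(v) = A v and apply_At(v) = A^T v.
    sig_x is the diagonal of Sigma_x; f_eps < 0 at an interior point."""
    Adx = apply_A(dx)
    return (sig_x * dx
            - apply_At(Adx) / f_eps
            + apply_At(r * (r @ Adx)) / f_eps**2)   # rank-one term A^T r (r^T A dx)
```

Comparing against a dense construction on a small instance confirms the identity, while the handle version never forms the N × N matrix.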
E   Total variation minimization with equality constraints

The equality-constrained TV minimization problem

    min_x TV(x)   subject to   Ax = b

can be rewritten as the SOCP

    min_{t,x} Σ_ij t_ij   s.t.   ||D_ij x||_2 ≤ t_ij,
                                 Ax = b.

Defining the inequality functions

    f_{tij} = (1/2) ( ||D_ij x||_2^2 − t_ij² ),   i, j = 1, ..., n,   (9)
we have

    ∇f_{tij} = ( D_ij^T D_ij x ; −t_ij δ_ij ),

    ∇f_{tij} ∇f_{tij}^T = [ D_ij^T D_ij x x^T D_ij^T D_ij    −t_ij D_ij^T D_ij x δ_ij^T ]
                          [ −t_ij δ_ij x^T D_ij^T D_ij        t_ij² δ_ij δ_ij^T        ],

    ∇²f_{tij} = [ D_ij^* D_ij   0            ]
                [ 0             −δ_ij δ_ij^T ],

where δ_ij is the Kronecker vector that is 1 in entry ij and zero elsewhere. For future reference:
    Σ_{ij} σ_{ij} ∇f_{tij} = [ D_h^T Σ D_h x + D_v^T Σ D_v x ]
                             [ −σt                           ]

    Σ_{ij} σ_{ij} ∇f_{tij} ∇f_{tij}^T = [ B Σ B^T    −B T Σ ]
                                        [ −Σ T B^T   Σ T^2  ]

    Σ_{ij} σ_{ij} ∇^2 f_{tij} = [ D_h^T Σ D_h + D_v^T Σ D_v   0  ]
                                [ 0                           −Σ ]
where Σ = diag({σ_{ij}}), T = diag(t), D_h has the D_{h;ij} as rows (and likewise for D_v), and
B is a matrix that depends on x:

    B = D_h^T Σ_{∂h} + D_v^T Σ_{∂v},

with Σ_{∂h} = diag(D_h x), Σ_{∂v} = diag(D_v x).
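To make D_h, D_v, and Σ_{∂h}, Σ_{∂v} concrete, here is a small illustration (our own, not part of the package) of the difference operators under the convention of (1) (forward differences, with a zero in the last column/row), together with the TV functional they define:

```python
def dh(x):
    """Horizontal differences: (Dh x)_{ij} = x[i][j+1] - x[i][j], zero on the last column."""
    n = len(x)
    return [[x[i][j + 1] - x[i][j] if j < n - 1 else 0.0 for j in range(n)]
            for i in range(n)]

def dv(x):
    """Vertical differences: (Dv x)_{ij} = x[i+1][j] - x[i][j], zero on the last row."""
    n = len(x)
    return [[x[i + 1][j] - x[i][j] if i < n - 1 else 0.0 for j in range(n)]
            for i in range(n)]

def tv(x):
    """TV(x): sum over ij of ||D_ij x||_2 = sqrt((Dh x)_ij^2 + (Dv x)_ij^2)."""
    gh, gv = dh(x), dv(x)
    n = len(x)
    return sum((gh[i][j] ** 2 + gv[i][j] ** 2) ** 0.5
               for i in range(n) for j in range(n))

# a 3x3 image with a single vertical step edge
x = [[0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
# Sigma_dh = diag(Dh x) would be built from the flattened entries of dh(x)
```

Each pixel contributes at most two nonzero differences, which is why D_h and D_v are extremely sparse and are applied implicitly in the package rather than stored.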
The Newton system (8) for the log-barrier algorithm is then

    [ H_{11}        B Σ_{12}   A^T ] [ Δx ]     [ D_h^T F_t^{-1} D_h x + D_v^T F_t^{-1} D_v x ]     [ w_1 ]
    [ Σ_{12} B^T    Σ_{22}     0   ] [ Δt ]  =  [ −τ1 − F_t^{-1} t                            ]  := [ w_2 ]
    [ A             0          0   ] [ Δv ]     [ 0                                           ]     [ 0   ]
where

    H_{11} = D_h^T (−F_t^{-1}) D_h + D_v^T (−F_t^{-1}) D_v + B F_t^{-2} B^T.
Eliminating Δt,

    Δt = Σ_{22}^{-1} (w_2 − Σ_{12} B^T Δx)
       = Σ_{22}^{-1} (w_2 − Σ_{12} Σ_{∂h} D_h Δx − Σ_{12} Σ_{∂v} D_v Δx),
the reduced (N + K) × (N + K) system is

    [ H'_{11}   A^T ] [ Δx ]     [ w'_1 ]
    [ A         0   ] [ Δv ]  =  [ 0    ]        (10)
with

    H'_{11} = H_{11} − B Σ_{12}^2 Σ_{22}^{-1} B^T
            = D_h^T (Σ_b Σ_{∂h}^2 − F_t^{-1}) D_h + D_v^T (Σ_b Σ_{∂v}^2 − F_t^{-1}) D_v +
              D_h^T (Σ_b Σ_{∂h} Σ_{∂v}) D_v + D_v^T (Σ_b Σ_{∂h} Σ_{∂v}) D_h

    w'_1 = w_1 − B Σ_{12} Σ_{22}^{-1} w_2
         = w_1 − (D_h^T Σ_{∂h} + D_v^T Σ_{∂v}) Σ_{12} Σ_{22}^{-1} w_2

    Σ_b = F_t^{-2} − Σ_{22}^{-1} Σ_{12}^2.
The system of equations (10) is symmetric, but not positive definite. Note that D_h and D_v are
(very) sparse matrices, and hence can be stored and applied very efficiently. This allows us to
again solve the system above using an iterative method such as SYMMLQ [16].
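A toy instance shows the structure of (10) and why the step preserves feasibility: the second block row enforces AΔx = 0, so the iterates stay on {x : Ax = b}. This sketch is ours (a dense Gaussian elimination stands in for SYMMLQ, which matters only at large scale):

```python
def solve_dense(A, b):
    """Gaussian elimination with partial pivoting; a stand-in for SYMMLQ here."""
    n = len(b)
    A = [row[:] for row in A]
    b = b[:]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(A[i][k]))
        A[k], A[p] = A[p], A[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

# toy instance of (10) with N = 2, K = 1
H11 = [[2.0, 0.5], [0.5, 1.0]]    # stands in for the reduced Hessian block
A = [[1.0, 1.0]]                   # a single equality constraint
w1 = [1.0, -1.0]

# assemble the symmetric indefinite KKT matrix [[H11, A^T], [A, 0]]
kkt = [[H11[0][0], H11[0][1], A[0][0]],
       [H11[1][0], H11[1][1], A[0][1]],
       [A[0][0],   A[0][1],   0.0]]
sol = solve_dense(kkt, w1 + [0.0])
dx, delta_v = sol[:2], sol[2]      # step and equality-constraint multiplier
```

The zero diagonal block is what makes (10) indefinite; CG is not guaranteed to converge on such systems, which is the reason a symmetric indefinite solver is used instead.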
F Total variation minimization with quadratic constraints

We can rewrite (TV_2) as the SOCP

    min_{x,t}  Σ_{ij} t_{ij}   subject to  ‖D_{ij} x‖_2 ≤ t_{ij},   i, j = 1, . . . , n
                                           ‖Ax − b‖_2 ≤ ε,
where D_{ij} is as in (1). Taking f_{tij} as in (9) and

    f_ε = (1/2)(‖Ax − b‖_2^2 − ε^2),
with

    ∇f_ε = [ A^T r ]     ∇f_ε ∇f_ε^T = [ A^T r r^T A   0 ]     ∇^2 f_ε = [ A^* A   0 ]
           [ 0     ]                   [ 0             0 ]                [ 0       0 ]

where r = Ax − b.
Also,

    ∇^2 f_{tij} = [ D_{ij}^* D_{ij}   0                 ]     ∇^2 f_ε = [ A^* A   0 ]
                  [ 0                 −δ_{ij} δ_{ij}^T  ]                [ 0       0 ].
The Newton system is similar to that in the equality constrained case:

    [ H_{11}        B Σ_{12} ] [ Δx ]     [ D_h^T F_t^{-1} D_h x + D_v^T F_t^{-1} D_v x + f_ε^{-1} A^T r ]     [ w_1 ]
    [ Σ_{12} B^T    Σ_{22}   ] [ Δt ]  =  [ −τ1 − t f_t^{-1}                                             ]  := [ w_2 ],
where (t f_t^{-1})_{ij} = t_{ij}/f_{tij}, and
    H_{11} = D_h^T (−F_t^{-1}) D_h + D_v^T (−F_t^{-1}) D_v + B F_t^{-2} B^T −
             f_ε^{-1} A^T A + f_ε^{-2} A^T r r^T A,

    Σ_{12} = −T F_t^{-2},

    Σ_{22} = F_t^{-1} + F_t^{-2} T^2.
Again eliminating Δt,

    Δt = Σ_{22}^{-1} (w_2 − Σ_{12} Σ_{∂h} D_h Δx − Σ_{12} Σ_{∂v} D_v Δx),
the key system is

    H'_{11} Δx = w_1 − (D_h^T Σ_{∂h} + D_v^T Σ_{∂v}) Σ_{12} Σ_{22}^{-1} w_2,
where

    H'_{11} = H_{11} − B Σ_{12}^2 Σ_{22}^{-1} B^T
            = D_h^T (Σ_b Σ_{∂h}^2 − F_t^{-1}) D_h + D_v^T (Σ_b Σ_{∂v}^2 − F_t^{-1}) D_v +
              D_h^T (Σ_b Σ_{∂h} Σ_{∂v}) D_v + D_v^T (Σ_b Σ_{∂h} Σ_{∂v}) D_h −
              f_ε^{-1} A^T A + f_ε^{-2} A^T r r^T A,

    Σ_b = F_t^{-2} − Σ_{12}^2 Σ_{22}^{-1}.

The system above is symmetric positive definite, and can be solved with CG.
G Total variation minimization with bounded residual correlation

The TV Dantzig problem has an equivalent SOCP as well:

    min_{x,t}  Σ_{ij} t_{ij}   subject to  ‖D_{ij} x‖_2 ≤ t_{ij},   i, j = 1, . . . , n
                                           A^T (Ax − b) − ε ≤ 0
                                           −A^T (Ax − b) − ε ≤ 0.
The inequality constraint functions are

    f_{tij} = (1/2)(‖D_{ij} x‖_2^2 − t_{ij}^2),   i, j = 1, . . . , n
    f_{ε1} = A^T (Ax − b) − ε,
    f_{ε2} = −A^T (Ax − b) − ε,
with

    Σ_{ij} σ_{ij} ∇f_{ε1;ij} = [ A^T A σ ]     Σ_{ij} σ_{ij} ∇f_{ε2;ij} = [ −A^T A σ ]
                               [ 0       ]                                 [ 0        ]
and

    Σ_{ij} σ_{ij} ∇f_{ε1;ij} ∇f_{ε1;ij}^T = Σ_{ij} σ_{ij} ∇f_{ε2;ij} ∇f_{ε2;ij}^T = [ A^T A Σ A^T A   0 ]
                                                                                     [ 0               0 ].
Thus the log-barrier Newton system is nearly the same as in the quadratically constrained case:

    [ H_{11}        B Σ_{12} ] [ Δx ]     [ D_h^T F_t^{-1} D_h x + D_v^T F_t^{-1} D_v x + A^T A (f_{ε1}^{-1} − f_{ε2}^{-1}) ]     [ w_1 ]
    [ Σ_{12} B^T    Σ_{22}   ] [ Δt ]  =  [ −τ1 − t f_t^{-1}                                                                ]  := [ w_2 ],
where

    H_{11} = D_h^T (−F_t^{-1}) D_h + D_v^T (−F_t^{-1}) D_v + B F_t^{-2} B^T + A^T A Σ_a A^T A,

    Σ_{12} = −T F_t^{-2},

    Σ_{22} = F_t^{-1} + F_t^{-2} T^2,

    Σ_a = F_{ε1}^{-2} + F_{ε2}^{-2}.
Eliminating Δt as before,

    Δt = Σ_{22}^{-1} (w_2 − Σ_{12} Σ_{∂h} D_h Δx − Σ_{12} Σ_{∂v} D_v Δx),
the key system is

    H'_{11} Δx = w_1 − (D_h^T Σ_{∂h} + D_v^T Σ_{∂v}) Σ_{12} Σ_{22}^{-1} w_2,
where

    H'_{11} = D_h^T (Σ_b Σ_{∂h}^2 − F_t^{-1}) D_h + D_v^T (Σ_b Σ_{∂v}^2 − F_t^{-1}) D_v +
              D_h^T (Σ_b Σ_{∂h} Σ_{∂v}) D_v + D_v^T (Σ_b Σ_{∂h} Σ_{∂v}) D_h + A^T A Σ_a A^T A,

    Σ_b = F_t^{-2} − Σ_{12}^2 Σ_{22}^{-1}.
References

[1] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Program., Ser. B,
95:3–51, 2003.

[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[3] E. Candès and J. Romberg. Quantitative robust uncertainty principles and optimally sparse
decompositions. To appear in Foundations of Comput. Math., 2005.

[4] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction
from highly incomplete frequency information. Submitted to IEEE Trans. Inform.
Theory, June 2004. Available on the ArXiv preprint server: math.GM/0409186.

[5] E. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate
measurements. Submitted to Communications on Pure and Applied Mathematics, March
2005.

[6] E. Candès and T. Tao. Near-optimal signal recovery from random projections and universal
encoding strategies. Submitted to IEEE Trans. Inform. Theory, November 2004. Available
on the ArXiv preprint server: math.CA/0410542.

[7] E. Candès and T. Tao. The Dantzig selector: statistical estimation when p is much smaller
than n. Manuscript, May 2005.

[8] E. J. Candès and T. Tao. Decoding by linear programming. To appear in IEEE Trans.
Inform. Theory, December 2005.

[9] T. Chan, G. Golub, and P. Mulet. A nonlinear primal-dual method for total variation-based
image restoration. SIAM J. Sci. Comput., 20:1964–1977, 1999.

[10] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit.
SIAM J. Sci. Comput., 20:33–61, 1999.

[11] D. Goldfarb and W. Yin. Second-order cone programming methods for total variation-based
image restoration. Technical report, Columbia University, 2004.

[12] H. Hintermüller and G. Stadler. An infeasible primal-dual algorithm for TV-based
inf-convolution-type image restoration. To appear in SIAM J. Sci. Comput., 2005.

[13] M. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret. Applications of second-order cone
programming. Linear Algebra and its Applications, 284:193–228, 1998.

[14] Y. E. Nesterov and A. S. Nemirovski. Interior-Point Polynomial Methods in Convex
Programming. SIAM Publications, Philadelphia, 1994.

[15] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, New York, 1999.

[16] C. C. Paige and M. Saunders. Solution of sparse indefinite systems of linear equations.
SIAM J. Numer. Anal., 12(4), September 1975.

[17] J. Renegar. A mathematical view of interior-point methods in convex optimization.
MPS-SIAM Series on Optimization. SIAM, 2001.

[18] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal
algorithms. Physica D, 60:259–268, 1992.

[19] J. R. Shewchuk. An introduction to the conjugate gradient method without the agonizing
pain. Manuscript, August 1994.

[20] S. J. Wright. Primal-Dual Interior-Point Methods. SIAM Publications, 1997.
• Min. If the underlying signal is a 2D image. Again. A. where C is a small constant.ij x . and deﬁne the operators Dh. b. the programs (P1 ). j=n The 2vector Dij x can be interpreted as a kind of discrete gradient of the digital image x.ij x)2 + (Dv. we have three programs for image recovery. 2 i (see [7] for details). where e is the corruption.j+1 − xij 0 j<n . If b = Ax0 +e. then the solution x2 to (P2 ) will be close to x0 . an alternate recovery model is that the gradient is sparse [18].ij x Dv. This program ﬁnds the vector with minimum norm that comes close to explaining the observations: (P2 ) min x 1 1 subject to Ax − b 2 ≤ . With these deﬁnitions. in the complex case. The total variation of x is simply the sum of the magnitudes of this discrete gradient at every point: TV(x) := ij (Dh. and has an unknown number of its entries corrupted. (PD ) requires that the residual Ax − b of a candidate vector x not be too correlated with any of the columns of A (the product A∗ (Ax − b) measures each of these correlations). A. then the solution xD to (PD ) has nearoptimal minimax risk: E xD − x0 2 ≤ C(log N ) · min(x0 (i)2 . each of which can be recast as a SOCP: 2 . (PD ) can again be recast as an LP.j − xij 0 i<n i=n Dv.1 with quadratic constraints. (1) xi+1. where is a user speciﬁed parameter. This program arises in the context of channel coding [8]. where γ is a user speciﬁed parameter. but we will not pursue this here. It is also true that when x. x2 − x0 2 ≤ C · . b are complex. (PA ).ﬁnds the vector x ∈ RN such that the error y − Ax has minimum 1 norm (i. relaxes the equality constraints of (P1 ) in a diﬀerent way. If e is sparse enough. then the decoder can use (PA ) to recover x exactly. if a suﬃciently sparse x0 exists such that b = Ax0 + e.ij x = xi. The decoder observes y = c + e. • Min. Also referred to as the Dantzig Selector.ij x)2 = ij Dij x 2 . Let xij denote the pixel in the ith row and j column of an n × n image x. there is an equivalent SOCP. 
(P2 ) can be recast as a SOCP.ij x = and Dij x = Dh. The message travels over the channel. Suppose we have a channel code that produces a codeword c = Ax for a message x. (PA ) can be recast as an LP.e. That is.1 with bounded residual correlation. For realvalued x. σ 2 ). As shown in [5]. σ 2 ). we are asking that the diﬀerence between Ax and y be sparse). where ei ∼ N (0. (PD ) can be written as SOCPs. for some small error term e 2 ≤ . the program (PD ) min x 1 subject to A∗ (Ax − b) ∞ ≤ γ.
. Dij x0 is nonzero for only a small number of indices ij). . The standardform linear program is min c0 . At the optimal point z . Below we overview the generic LP and SOCP solvers used in the 1 magic package to solve these problems.(PA ). where the search vector z ∈ RN . m is a linear functional: fi (z) = ci . Note that if A is the identity matrix. and each of the fi . . di ∈ R. .e. . and to set up the notation. • MinTV with quadratic constraints. . fi (z ) ≤ 0. . fi (z) ≤ 0. (T V2 ) reduces to the standard RudinOsherFatemi image restoration problem [18]. λi fi (z ) = 0. (T VD ) min TV(x) subject to A∗ (Ax − b) ∞ ≤γ This program was used in one of the numerical experiments in [7]. . for some ci ∈ RN . i = 1. . . 3 . have made largescale solvers for the seven problems above feasible. (T V2 ) min TV(x) subject to Ax − b 2 ≤ Examples of recovering images from noisy observations using (T V2 ) were presented in [5]. i = 1. A0 z = b. there will exist dual vectors ν ∈ RK . . m. (T V1 ) min TV(x) subject to Ax = b If there exists a piecewise constant x0 with suﬃciently few edges (i. led by the seminal work [14]. b ∈ RK . we brieﬂy review their algorithm here. 2 Interior point methods Advances in interior point methods for convex optimization over the past 15 years. we describe how to solve linear and secondorder cone programs using modern interior point methods. For the sake of completeness. In the next section. z + di . . Boyd and Vandenberghe outline a relatively simple primaldual algorithm for linear programming which we have followed very closely for the implementation of (P1 ). and (PD ). m. z z subject to A0 z = b.• MinTV with equality constraints. 11–13] for SOCP solvers speciﬁcally designed for the totalvariational functional. 2. then (T V1 ) will recover x0 exactly — see [4]. i = 1. See also [9. λ ≥ 0 such that the KarushKuhnTucker conditions are satisﬁed: (KKT ) c0 + AT ν + 0 i λi ci = 0. A0 is a K × N matrix. 
• Dantzig TV.1 A primaldual algorithm for linear programming In Chapter 11 of [2]. λ ∈ Rm .
λ). However.e.01). λ + s∆λ) 2 ≤ (1 − αs) · rτ (z. ∆λ where Jrτ (z. we relax the complementary slackness condition λi fi = 0 to λk fi (z k ) = −1/τ k . λk+1 ) must be modiﬁed so that we remain in the interior. 2. We can i eliminate ∆λ using: ∆λ = −ΛF −1 C∆z − λ − (1/τ )f −1 (4) leaving us with the core system −C T F −1 ΛC A0 AT 0 0 ∆z ∆ν = −c0 + (1/τ )C T f −1 − AT ν 0 . the step to new point (z k+1 . ν. the system is linearized and solved. we have the system ∆z 0 AT C T c0 + AT ν + i λi ci 0 0 −ΛC 0 −F ∆v = − −Λf − (1/τ )1 . λk > 0). ∆λ) we have a step direction. b − A0 z (5) (3) With the (∆z. ν + ∆ν. allowing a smooth. z + s∆z and λ + s∆λ are in the interior. ν + ∆ν. νλ) is the Jacobian of rτ . and f = f1 (z) . In practice. fm (z) T where Λ is a diagonal matrix with (Λ)ii = λi . i. λ) 2 . λi > 0 for all i. From a point (z. fi (z + s∆z) < 0. λ) + Jrτ (z. The primal. ∆ν. we want to ﬁnd a step (∆z. . The norm of the residuals has decreased suﬃciently: rτ (z + s∆z. . λ) is to satisfying (KKT ) with (2) in place of the slackness condition: rdual rcent rpri = c0 + AT ν + 0 i λi ci = −Λf − (1/τ )1 = A0 z − b. dual. welldeﬁned “central path” from an interior point to the solution on the boundary (see [15. ν k+1 . and F is diagonal with (F )ii = fi (z). i (2) where we judiciously increase the parameter τ k as we progress through the Newton iterations. we have set α = 0. λ). Linearizing (3) with the Taylor expansion around (z. ∆λ A0 0 0 A0 z − b where m × N matrix C has the cT as rows. 20] for an extended discussion). we ask that it satisfy two criteria: 1. To choose the step length 0 < s ≤ 1. where α is a userspreciﬁed parameter (in all of our implementations.In a nutshell. ν + s∆ν. This biases the solution of the linearized equations towards the interior. 4 . λ + ∆λ) ≈ rτ (z. ∆ν. νλ) ∆ν . ν. ν. ν k . ν. and central residuals quantify how close a point (z. ∆λ) such that rτ (z + ∆z. . ∆z rτ (z + ∆z. λk ) (by which we mean fi (z k ) < 0. 
the primal dual algorithm ﬁnds the optimal z (along with optimal dual vectors ν and λ ) by solving this system of nonlinear equations. λ + ∆λ) = 0. ν. The solution procedure is the classical Newton method: at an interior point (z k .
z − c0 . In the Appendix. but at its core is again solving for a series of Newton steps. for which we will again closely follow the generic (but eﬀective) algorithm described in [2. i = 1. (PD )).Since the fi are linear functionals. Instead.e. The implementations of (P1 ). However. . 11]. 13]). We choose the maximum step size that just keeps us in the interior. we check if item 2 above is satisﬁed. Each of (P2 ). (PD ) are nearly identical. (T VD ) can be written in the form min c0 . z + di )2 2 2 (the Ai are matrices. When the matrix −C T F −1 ΛC is easily invertible (as it is for (P1 )). is conceptually more straightforward than the primaldual method described above. or there are no equality constraints (as in (PA ). the ci are vectors. CG is iterative. {−λi /∆λi . their implementation is not quite as straightforward as in the LP case. we can use 0 a “matrix free” solver such as Conjugate Gradients. z + z or secondorder conic 1 τk − log(−fi (z)) i subject to A0 z = b. and set + − smax = 0. if fast algorithms exist for applying C. When N and K are large.99 · min{1. . Chap. if not. {−fi (z)/ ci . When rdual and rpri are small. λ) is to being opitmal (i. Then starting with s = smax . − Iλ {i : ∆λ < 0}. ν. z z subject to A0 z = b fi (z) ≤ 0. we have implemented each of the SOCP recovery problems using a logbarrier method. The logbarrier method. z ≈ η). (T V2 ). C T and A0 . we set s = β · s and try again. We have taken β = 1/2 in all of our implementations. z + di 1 Ai z 2 − ( ci . (PA ). (T V1 ). m (6) where each fi describes either a constraint which is linear fi = ci .2 A logbarrier algorithm for SOCPs Although primaldual techniques exist for solving secondorder cone programs (see [1. (5) can be reduced to a symmetric positive deﬁnite set of equations. ∆z . requiring a few hundred applications of the constraint matrices (roughly speaking) to get an accurate solution. Let + If = {i : ci . . 
the surrogate duality gap η = −f T λ is an approximation to how close a certain (z. AT . and the di are scalars). (7) 5 . fi (z) = The standard logbarrier method transforms (6) into a series of linearly constrained programs: min c0 . save for the calculation of the Newton step. forming the matrix and then solving the linear system of equations in (5) is infeasible. . item 1 is easily addressed. i ∈ Iλ }}. A CG solver (based on the very nice exposition in [19]) is included with the 1 magic package. Almost all of the computational burden falls on solving (5). ∆z > 0}. 2. i ∈ If }. c0 . we derive the Newton step for each of these problems using notation mirroring that used in the actual MATLAB code. The primaldual algorithm repeats the Newton iterations described above until η has decreased below a given tolerance.
i i Given that z is feasible (that A0 z = b. it has a property (termed selfconcordance) that is very important for quick convergence of (7) to (6) both in theory and in practice (see the very nice exposition in [17]).01). there are no equality constraints (A0 = 0). especially since we can use the solution z k as a starting point for subproblem k + 1. The inequality constraints have been incorporated into the functional via a penalty function1 which is inﬁnite when the constraint is violated (or even met exactly). . and thus can be solved using CG when the problem is “large scale”. we have the Newton step direction. the system (8) is symmetric positive deﬁnite.e.) In all of the recovery problems below with the exception of (T V1 ). The quadratic approximation of the functional 1 − log(−fi (z)) f0 (z) = c0 . ∆z . This requirement basically states that the decrease must be within a certain percentage of that predicted by the linear model at z. Newton’s method (which is again iterative) proceeds by forming a series of quadratic approximations to (7). ∆z + where gz is the gradient gz = c0 + and Hz is the Hessian matrix Hz = 1 τ 1 fi (z)2 fi (z)( fi (z))T + 1 τ 1 −fi (z) 2 1 Hz ∆z. which can be interpreted as the Lagrange multipliers for the quality constraints in the quadratic minimization problem. and smooth elsewhere. see [16]).where τ k > τ k−1 . z < m/τ k . m. With ∆z in hand. The step length s ≤ 1 is chosen so that 1. At logbarrier iteration k. in particular). z + τ i in (7) around a point z is given by f0 (z + ∆z) ≈ z + gz . we start with s = 1. i. In these cases. 2. and minimizing each by solving a system of equations (again. and works on symmetric but indeﬁnite systems. is not directly used. As τ k gets large. the ∆z that minimizes q(z + ∆z) subject to A0 z = b is the solution to the set of linear equations τ Hz A0 AT 0 0 ∆z v = −τ gz . 6 . z k − c0 . fi (z + s∆z) < 0 for all i = 1. ∆z 2 1 fi (z) −fi (z) := q(z + ∆z). 
The complete logbarrier implementation for each problem follows the outline: 1 The choice of − log(−x) for the barrier function is not arbitrary. . the solution z k to (7) approaches the solution z to (6): it can be shown that c0 . we use the SYMMLQ algorithm (which is similar to CG. For the (T V1 ) problem. 1 τ i fi (z). where α is a userspeciﬁed parameter (each of the implementations below uses α = 0. The functional has decreased suﬃently: f0 (z + s∆z) < f0 (z) + αs∆z gz . The idea here is that each of the smooth subproblems can be solved to fairly high accuracy with just a few iterations of Newton’s method. we might need to modify the step length to stay in the interior). . . we are within m/τ k of the optimal value after iteration k (m/τ k is called the duality gap). and decrease by multiples of β until both these conditions are satisﬁed (all implementations use β = 1/2). (8) (The vector v. As before.
Call the solution z k .1. Our implementation chooses τ 1 conservatively. % N % T % K signal length = 512. a tolerance η. 2.1)). randn_state). path(path. In Appendix. takes a limited number of measurements of that signal. number of observations to make = 120. we explicitly derive the equations for the Newton step for each of (P2 ). using z k−1 as an initial point. 7 . ’. 3 Examples To illustrate how to use the code. again using notation that mirrors the variable names in the code. set τ k+1 = µτ k . number of spikes in the signal = 20. ’. % random +/. log µ The ﬁnal issue is the selection of τ 1 . 4.1). 3. (T V1 ). Else. rand_state).m” in the main directory). In fact. we can calculate in advance how many iterations the logbarrier algorithm will need: barrier iterations = log m − log η − log τ 1 . (T V2 ). 3. and recovers the signal exactly by solving (P1 ). Set k = 1. and parameters µ and an initial τ 1 ./Data’). randn(’state’. it is set so that the duality gap m/τ 1 after the ﬁrst iteration is equal to c0 . the 1 magic package includes mﬁles for solving speciﬁc instances of each of the above problems (these end in “ example. q = randperm(N). Solve (7) via Newton’s method (followed by the backtracking line search).m in detail. (T VD ). If m/τ k < η. x(q(1:T)) = sign(randn(T. z 0 . terminate and return z k . k = k + 1 and go to step 2. % load random states for repeatable experiments load RandomStates rand(’state’.1 signal x = zeros(N./Optimization’). Inputs: a feasible starting point z 0 .1 1 with equality constraints We will begin by going through l1eq example. This mﬁle creates a sparse signal. The ﬁrst part of the procedure is for the most part selfexplainatory: % put key subdirectories in path if not already there path(path.
Primal = 3.. tau = 1.We add the ’Optimization’ directory (where the interior point solvers reside) and the ’Data’ directories to the path. tau = 9. Primal res = 2.731e+02. y.062e02. PDGap = 5.126e13 H11p condition number = 1.271e+02. Primal = 2. A.001e+01.820e14 H11p condition number = 2.140e05 Iteration = 5. Primal res = 1. Dual res = 6.m contains two variables: rand state and randn state. we recover the signal with: % solve the LP tic xp = l1eq_pd(x0.272e+01.311e+02. A = randn(K.009e+00. tau = 5. Primal res = 7. Primal = 2.380e14 H11p condition number = 5. The measurements y are taken.000e+01. create a measurement ensemble by ﬁrst creating a K × N matrix with iid Gaussian entries.052e+01. [].524e14 H11p condition number = 8. The next few lines: % measurement matrix disp(’Creating measurment matrix. The next few lines set up the problem: a length 512 signal that contains 20 spikes is created by choosing 20 locations at random and then putting ±1 at these locations.N). Primal res = 3. Dual res = 6.020e07.329e+01. Dual res = 2.697e03. we are sending it our initial guess x0 for the solution.020e09.054e05 Iteration = 7.187e07 Iteration = 9. tau = 7.907e12 H11p condition number = 3.330e+06. PDGap = 1.. and the precision to which we want the problem solved (l1eq pd will terminate when the surrogate duality gap is below 10−3 ). Primal = 5. Finally. PDGap = 1.390e04. Primal = 3.466e14 H11p condition number = 1.093e+01. Primal res = 1. PDGap = 7.071e14 H11p condition number = 2. PDGap = 8.020e03. the measurements. Dual res = 6.071e02 Iteration = 3. Primal = 2.368e+01..272e+01. is unnecessary here — and hence left empty — since we are providing A explicitly). disp(’Done.583e+04. Primal res = 1. Iteration = 1. The recovered signal xp is shown in Figure 1(c). we have the following output: >> l1eq_example Creating measurment matrix. PDGap = 1.862e+00. Primal = 4.220e+07.333e06 Iteration = 8. Primal res = 4. Dual res = 9.020e05.141270 seconds. 
which we use to set the states of the random number generators on the next two lines (we want this to be a “repeatable experiment”).965e+03.383e+01.947e11 H11p condition number = 3.122e02 Iteration = 2. % observations y = A*x. and the “minimum energy” solution x0 is calculated (x0.671e05 Iteration = 6.’). tau = 3.432e15 H11p condition number = 2. The original signal is shown in Figure 1(a). The signal is recovered to fairly high accuracy: >> norm(xpx) 8 .902e+00. which is shown in Figure 1 is the vector in {x : Ax = y} that is closest to the origin). Primal res = 1.409e+00..130e09 Iteration = 10.943e+01.1. toc The function l1eq pd.979e11 Elapsed time is 0.898e+00. Dual res = 1.’). % initial guess = min energy x0 = A’*y.921e+02.007e+01.690e+01. Dual res = 1. PDGap = 5.574e04 Iteration = 4.020e01. Primal = 2. PDGap = 6.467e01.450e+05.509e+01. Dual res = 6. The ﬁle RandomStates. Primal res = 1.999e+01.210e+00. tau = 1.488e+02.711e13 H11p condition number = 1.064e+01. tau = 1. PDGap = 3. Done. Primal res = 4. Primal = 2. 1e3). which is used to specify the transpose of the measurement matrix. and then orthogonalizing the rows. A = orth(A’)’. PDGap = 7. Dual res = 6. tau = 1.m (found in the ’Optimization’ subdirectory) implements the primaldual algorithm presented in Section 2. tau = 1. the measurement matrix (the third argument. Primal = 2. Running the example ﬁle at the MATLAB prompt. Dual res = 5. tau = 1.
2 0 −0. the measurment matrix itself would require almost 3 gigabytes of memory if stored in double precision. This ﬁles recreates the phantom reconstruction experiment ﬁrst published in [4]. (In fact. The 256 × 256 SheppLogan phantom. The auxiliary function LineMask. ans = 8.4 −0.8 −1 0 50 100 150 200 250 300 350 400 450 500 −0. shown in Figure 2(a).) Instead of creating the measurement matrix explicitly. This example diﬀers from the previous one in that the code operates in largescale mode.4 0 50 100 150 200 250 300 350 400 450 500 −0.2 −0. The image is then reconstructed exactly using (T V1 ).8 0.6 −0.mhi] = LineMask(L. The starshaped Fourierdomain sampling pattern is created with % number of radial lines in the Fourier domain L = 22.4 0. % Fourier samples we are given [M.m. the Newton steps are solved for using an implicit algorithm.6 0. The vector OMEGA contains the locations of the frequencies used in the sampling pattern.4 −0. is measured at K = 5481 locations in the 2D Fourier plane. To create the implicit matrix.1 0.0.2 Phantom reconstruction A large scale example is given in tveq phantom example.6 −0.3 1 0.3 (a) Original (b) Minimum energy reconstruction (c) Recovered Figure 1: 1D recovery experiment for 1 minimization with equality constraints.Mh.8 0. (b) Minimum energy (linear) reconstruction x0.4 1 0.2 0.6 0.2 −0.9647e05 3. making the system (8) far too large to solve (or even store) explicitly. (a) Original length 512 signal x consisting of 20 spikes.n). (c) Minimum 1 reconstruction xp. OMEGA. we provide function handles that take a vector x. OMEGA = mhi. and return Ax.2 0 −0.mh. 9 . The measurement matrix in this example is 5481 × 65536.8 −1 0 50 100 150 200 250 300 350 400 450 500 −0. we use the function handles A = @(z) A_fhp(z. As discussed above. n).1 0 0.4 0.2 −0.m (found in the ‘Measurements’ subdirectory) creates the starshaped pattern consisting of 22 lines through the origin. At = @(z) At_fhp(z. the sampling pattern is shown in Figure 2(b). 
OMEGA).
see Section 2. 10 . The solvers are iterative.2). The recovered phantom is shown in Figure 2(d). 1e1. A. (Actually.m takes a length N vector (we treat n × n images as N := n2 vectors). with each iteration requiring one application of A and one application of At.) To solve (T V1 ). The function A fhp. 1e8. and returns samples on the K frequencies. The sixth input is the value of µ (the amount by which to increase τ k at each iteration. 2.3 Optimization routines We include a brief description of each of the main optimization routines (type help <function> in MATLAB for details). Each of these mﬁles is found in the Optimization subdirectory. and1e1 is the desired precision. The seventh and eighth arguments above state that we want the solver to iterate until the solution has precision 10−8 (that is. We have XT V − X 2/ X 2 ≈ 8 · 10−3 . it ﬁnds a z such that Hz − g 2 / g 2 ≤ 10−8 ). The last two inputs are parameters for the largescale solver used to ﬁnd the Newton step. or it has reached 600 iterations.(a) Phantom (b) Sampling pattern (c) Min energy (d) minTV reconstruction Figure 2: Phantom recovery experiment. y. y are the measurements.m return the real and imaginary parts of the 2D FFT on the upper halfplane of the domain shown in Figure 2(b). At. 3. we call xp = tveq_logbarrier(xbp. since the underlying image is real. A fhp. The variable xbp is the initial guess (which is again the minimal energy reconstruction shown in Figure 2(c)). 600).
This error most commonly occurs during the last iterations of the primaldual or logbarrier algorithms. and the algorithm proceeds using the leastsquares starting point x0 = AT (AAT )−1 b. • Cannot solve system. This error is the largescale analog to the above. Newton (“inner”) iterations for solving quadratically constrained 1 minimization (P2 ). a small enough step size will always be found). where A is symmetric positive deﬁnite. this message is produced. It is generally occurs in largescale mode. Returning previous iterate. • Matrix illconditioned. returning last iterate. Each of the optimization programs expects an initial guess which is feasible (obeys the constraints). • Starting point infeasible. Newton iterations for (T VD ). Barrier iterations for equality constrained TV minimizaiton (T V1 ). Solves the norm approximation problem (PA ) (for decoding via linear programming) using a primaldual algorithm. this error typically occurs in the late stages of the optimization algorithm. While it means that the solution is not within the tolerance speciﬁed (by the primaldual gap). • Stuck backtracking. and is a symptom of the system being illconditioned. and (8) for the SOCPs) has an estimated condition number of less than 10−14 . 4 Error messages 1 magic Here we brieﬂy discuss each of the error messages that the may produce. (5) for the linear programs. Solves the standard Basis Pursuit problem (P1 ) using a primaldual algorithm. the matrix A is provided explicitly. using the Conjugate Gradient method. this will typically occur in the late stages of the optimization algorithm. This error can occur when the code is running in smallscale mode. 11 . in practice it is usually pretty close.e. Solves (PD ) (the Dantzig selector) using a primaldual algorithm. cannot ﬁnd a step size small enough that decreases the objective. after computing the step direction. 
The error message is produced when the condition number of the linear system we need to solve to ﬁnd the step direction (i. Newton iterations for (T V1 ). If the x0 provided is not. Newton iterations for (T V2 ). using x0 = At*inv(AAt)*y. The error message is produced when the residual produced by the conjugate gradients algorithm was above 1/2. Again. This error occurs when the algorithm. Again. and is a symptom of CG not solving for the step direction to suﬃcient precision (if the system is solved perfectly. essentially this means that CG has not solved the system in any meaningful way. Returning previous iterate. Barrier (“outer”) iterations for solving quadratically constrained 1 minimization (P2 ).cgsolve l1dantzig pd l1decode pd l1eq pd l1qc logbarrier l1qc newton tvdantzig logbarrier tvdantzig newton tveq logbarrier tveq newton tvqc logbarrier tvqc newton Solves Ax = b. that is. Barrier iterations for quadratically constrained TV minimization (T V2 ). Barrier iterations for solving the TV Dantzig selector (T VD ).
then (P1 ) can be recast as the linear program min x.i the corresponding dual variables. v. we calculate the change in the inequality dual variables as in (4): ∆λu1 ∆λu2 −1 −1 = Λu1 Fu1 (−∆x + ∆u) − λu1 − (1/τ )fu1 −1 −1 = Λu2 Fu2 (∆x + ∆u) − λu2 − (1/τ )fu2 . 2 fu2 . 1 and solve −AΣ−1 AT ∆v = w3 − A(Σ−1 w1 − Σ−1 Σ2 Σ−1 w2 ).i = −δi . the central and dual residuals are rcent rdual = = −Λu1 fu1 −Λu2 fu2 − (1/τ )1. Chap. A and b are real. 2 1 ∆v w3 b − Ax A 0 0 with −1 −1 Σ1 = Λu1 Fu1 − Λu2 Fu2 . −δi 2 fu1 . are diagonal matrices with (F• )ii = f•.i := xi − ui := −xi − ui .) Setting Σx = Σ1 − Σ2 Σ−1 .11] for a full discussion). Set fu1 . x x 1 1 This is a K ×K positive deﬁnite system of equations. . −1 (The F• . Note that fu1 . and let fu1 be the vector (fu1 . λu1 .i . and can be solved using conjugate gradients. −δi fu2 .1 . Thus at a point (x. 12 . with λu1 .Appendix A 1 minimization with equality constraints When x.i . λu2 ). fu1 .N )T (and likewise for fu2 . λu2 ). λu2 .i . see [2. λu1 .i = δi . 1 − λ u1 − λu 2 and the Newton step (5) is given by: −1 −1 ∆x w1 Σ1 Σ2 AT (−1/τ ) · (−fu1 + fu2 ) − AT v −1 −1 Σ2 Σ1 0 ∆u = w2 := −1 − (1/τ ) · (fu + fu ) .i = 0. λu1 − λu2 + AT v . ∆v. for example.i fu2 . . Ax = b which can be solved using the standard primaldual algorithm outlined in Section 2. −1 −1 Σ2 = Λu1 Fu1 + Λu2 Fu2 . and f•. ∆u. 2 1 we can eliminate ∆x = Σ−1 (w1 − Σ2 Σ−1 w2 − AT ∆v) x 1 ∆u = Σ−1 (w2 − Σ2 ∆x). where δi is the standard basis vector for component i.1 (again.u i ui subject to xi − ui ≤ 0 −xi − ui ≤ 0.i = 0.i = 1/f•. u. Given ∆x.
B  ℓ1 norm approximation

The ℓ1 norm approximation problem $(P_A)$ can also be recast as a linear program:
\[
\min_{x,u} \sum_{m=1}^M u_m \quad\text{subject to}\quad Ax - u - y \le 0,\;\; -Ax - u + y \le 0
\]
(recall that unlike the other six problems, here the $M\times N$ matrix $A$ has more rows than columns). Taking
\[
f_{u_1} = Ax - u - y, \qquad f_{u_2} = -Ax - u + y,
\]
and given a vector of weights $\sigma\in\mathbb{R}^M$,
\[
\sum_m \sigma_m \nabla f_{u_1;m} = \begin{pmatrix} A^T\sigma \\ -\sigma \end{pmatrix}, \qquad
\sum_m \sigma_m \nabla f_{u_2;m} = \begin{pmatrix} -A^T\sigma \\ -\sigma \end{pmatrix},
\]
\[
\sum_m \sigma_m \nabla f_{u_1;m}\nabla f_{u_1;m}^T = \begin{pmatrix} A^T\Sigma A & -A^T\Sigma \\ -\Sigma A & \Sigma \end{pmatrix}, \qquad
\sum_m \sigma_m \nabla f_{u_2;m}\nabla f_{u_2;m}^T = \begin{pmatrix} A^T\Sigma A & A^T\Sigma \\ \Sigma A & \Sigma \end{pmatrix},
\]
where $\Sigma = \mathrm{diag}(\sigma)$. For the primal-dual algorithm, at a point $(x,u;\lambda_{u_1},\lambda_{u_2})$ the dual residual is
\[
r_{\mathrm{dual}} = \begin{pmatrix} A^T(\lambda_{u_1} - \lambda_{u_2}) \\ \mathbf{1} - \lambda_{u_1} - \lambda_{u_2} \end{pmatrix},
\]
and the Newton step is the solution to
\[
\begin{pmatrix} A^T\Sigma_{11}A & A^T\Sigma_{12} \\ \Sigma_{12}A & \Sigma_{11} \end{pmatrix}
\begin{pmatrix} \Delta x \\ \Delta u \end{pmatrix}
= \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
:= \begin{pmatrix} -(1/\tau)\cdot A^T(-f_{u_1}^{-1} + f_{u_2}^{-1}) \\ -\mathbf{1} - (1/\tau)\cdot(f_{u_1}^{-1} + f_{u_2}^{-1}) \end{pmatrix}
\]
where
\[
\Sigma_{11} = -\Lambda_{u_1}F_{u_1}^{-1} - \Lambda_{u_2}F_{u_2}^{-1}, \qquad
\Sigma_{12} = \Lambda_{u_1}F_{u_1}^{-1} - \Lambda_{u_2}F_{u_2}^{-1}.
\]
Setting $\Sigma_x = \Sigma_{11} - \Sigma_{12}^2\Sigma_{11}^{-1}$, we can eliminate
\[
\Delta u = \Sigma_{11}^{-1}(w_2 - \Sigma_{12}A\Delta x),
\]
and solve
\[
A^T\Sigma_x A\,\Delta x = w_1 - A^T\Sigma_{12}\Sigma_{11}^{-1}w_2
\]
for $\Delta x$. $A^T\Sigma_x A$ is an $N\times N$ symmetric positive definite matrix (it is straightforward to verify that each element on the diagonal of $\Sigma_x$ will be strictly positive), and so the Conjugate Gradients algorithm can be used for large-scale problems. Given $\Delta x,\Delta u$, the step directions for the inequality dual variables are given by
\[
\Delta\lambda_{u_1} = -\Lambda_{u_1}F_{u_1}^{-1}(A\Delta x - \Delta u) - \lambda_{u_1} - (1/\tau)f_{u_1}^{-1},
\qquad
\Delta\lambda_{u_2} = \Lambda_{u_2}F_{u_2}^{-1}(A\Delta x + \Delta u) - \lambda_{u_2} - (1/\tau)f_{u_2}^{-1}.
\]

C  ℓ1 Dantzig selection

An equivalent linear program to $(P_D)$ in the real case is given by
\[
\min_{x,u} \sum_i u_i \quad\text{subject to}\quad x - u \le 0,\;\; -x - u \le 0,\;\; A^T r - \epsilon \le 0,\;\; -A^T r - \epsilon \le 0,
\]
where $r = Ax - b$. Taking
\[
f_{u_1} = x - u, \quad f_{u_2} = -x - u, \quad f_{\epsilon_1} = A^T r - \epsilon, \quad f_{\epsilon_2} = -A^T r - \epsilon,
\]
the residuals at a point $(x,u;\lambda_{u_1},\lambda_{u_2},\lambda_{\epsilon_1},\lambda_{\epsilon_2})$ include the dual residual
\[
r_{\mathrm{dual}} = \begin{pmatrix} \lambda_{u_1} - \lambda_{u_2} + A^TA(\lambda_{\epsilon_1} - \lambda_{\epsilon_2}) \\ \mathbf{1} - \lambda_{u_1} - \lambda_{u_2} \end{pmatrix},
\]
and the Newton step is the solution to
\[
\begin{pmatrix} A^TA\Sigma_a A^TA + \Sigma_{11} & \Sigma_{12} \\ \Sigma_{12} & \Sigma_{11} \end{pmatrix}
\begin{pmatrix} \Delta x \\ \Delta u \end{pmatrix}
= \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
:= \begin{pmatrix} -(1/\tau)\cdot\bigl(A^TA(-f_{\epsilon_1}^{-1} + f_{\epsilon_2}^{-1}) - f_{u_1}^{-1} + f_{u_2}^{-1}\bigr) \\ -\mathbf{1} - (1/\tau)\cdot(f_{u_1}^{-1} + f_{u_2}^{-1}) \end{pmatrix}
\]
where
\[
\Sigma_{11} = -\Lambda_{u_1}F_{u_1}^{-1} - \Lambda_{u_2}F_{u_2}^{-1}, \quad
\Sigma_{12} = \Lambda_{u_1}F_{u_1}^{-1} - \Lambda_{u_2}F_{u_2}^{-1}, \quad
\Sigma_a = -\Lambda_{\epsilon_1}F_{\epsilon_1}^{-1} - \Lambda_{\epsilon_2}F_{\epsilon_2}^{-1}.
\]
Again setting $\Sigma_x = \Sigma_{11} - \Sigma_{12}^2\Sigma_{11}^{-1}$, we can eliminate
\[
\Delta u = \Sigma_{11}^{-1}(w_2 - \Sigma_{12}\Delta x),
\]
and solve
\[
(A^TA\Sigma_a A^TA + \Sigma_x)\Delta x = w_1 - \Sigma_{12}\Sigma_{11}^{-1}w_2
\]
for $\Delta x$. As before, the system is symmetric positive definite, and the CG algorithm can be used to solve it. Given $\Delta x,\Delta u$, the step directions for the inequality dual variables are given by
\[
\begin{aligned}
\Delta\lambda_{u_1} &= -\Lambda_{u_1}F_{u_1}^{-1}(\Delta x - \Delta u) - \lambda_{u_1} - (1/\tau)f_{u_1}^{-1}, \\
\Delta\lambda_{u_2} &= -\Lambda_{u_2}F_{u_2}^{-1}(-\Delta x - \Delta u) - \lambda_{u_2} - (1/\tau)f_{u_2}^{-1}, \\
\Delta\lambda_{\epsilon_1} &= -\Lambda_{\epsilon_1}F_{\epsilon_1}^{-1}(A^TA\Delta x) - \lambda_{\epsilon_1} - (1/\tau)f_{\epsilon_1}^{-1}, \\
\Delta\lambda_{\epsilon_2} &= -\Lambda_{\epsilon_2}F_{\epsilon_2}^{-1}(-A^TA\Delta x) - \lambda_{\epsilon_2} - (1/\tau)f_{\epsilon_2}^{-1}.
\end{aligned}
\]

D  ℓ1 minimization with quadratic constraints

The quadratically constrained ℓ1 minimization problem $(P_2)$ can be recast as the second-order cone program
\[
\min_{x,u} \sum_i u_i \quad\text{subject to}\quad x - u \le 0,\;\; -x - u \le 0,\;\; \tfrac{1}{2}\bigl(\|Ax - b\|_2^2 - \epsilon^2\bigr) \le 0.
\]
Taking
\[
f_{u_1} = x - u, \qquad f_{u_2} = -x - u, \qquad f_{\epsilon} = \tfrac{1}{2}\bigl(\|Ax - b\|_2^2 - \epsilon^2\bigr),
\]
we can write the Newton step (as in (8)) at a point $(x,u)$ for a given $\tau$ as
\[
\begin{pmatrix} \Sigma_{11} - f_\epsilon^{-1}A^TA + f_\epsilon^{-2}A^Trr^TA & \Sigma_{12} \\ \Sigma_{12} & \Sigma_{11} \end{pmatrix}
\begin{pmatrix} \Delta x \\ \Delta u \end{pmatrix}
= \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
:= \begin{pmatrix} f_{u_1}^{-1} - f_{u_2}^{-1} + f_\epsilon^{-1}A^Tr \\ -\tau\mathbf{1} - f_{u_1}^{-1} - f_{u_2}^{-1} \end{pmatrix}
\]
where $r = Ax - b$, and
\[
\Sigma_{11} = F_{u_1}^{-2} + F_{u_2}^{-2}, \qquad \Sigma_{12} = -F_{u_1}^{-2} + F_{u_2}^{-2}.
\]
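The elimination step used throughout these appendices is mechanical: the second block row of each Newton system involves only diagonal matrices, so the $\Delta u$ variables can be solved out with a Schur complement before calling CG. A generic NumPy sketch of that elimination, with random stand-in matrices and names of our own choosing:

```python
import numpy as np

def solve_by_elimination(Hx, s11, s12, w1, w2):
    """Solve [[Hx + diag(s11), diag(s12)], [diag(s12), diag(s11)]] [dx; du] = [w1; w2]
    by eliminating du, as in the appendices:
        du = (w2 - s12*dx) / s11,
        (Hx + diag(s11 - s12**2/s11)) dx = w1 - (s12/s11)*w2."""
    sx = s11 - s12**2 / s11                  # diagonal Schur complement
    dx = np.linalg.solve(Hx + np.diag(sx), w1 - (s12 / s11) * w2)
    du = (w2 - s12 * dx) / s11               # back-substitute for du
    return dx, du
```

In the package the reduced system is of course handed to CG rather than a dense solve; the point of the sketch is only that the eliminated system agrees exactly with the full block system.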
As before, we set $\Sigma_x = \Sigma_{11} - \Sigma_{12}^2\Sigma_{11}^{-1}$ and eliminate $\Delta u$,
\[
\Delta u = \Sigma_{11}^{-1}(w_2 - \Sigma_{12}\Delta x),
\]
leaving us with the reduced system
\[
(\Sigma_x - f_\epsilon^{-1}A^TA + f_\epsilon^{-2}A^Trr^TA)\Delta x = w_1 - \Sigma_{12}\Sigma_{11}^{-1}w_2,
\]
which is symmetric positive definite and can be solved using CG.

E  Total variation minimization with equality constraints

The equality constrained TV minimization problem
\[
\min_x \mathrm{TV}(x) \quad\text{subject to}\quad Ax = b
\]
can be rewritten as the SOCP
\[
\min_{t,x} \sum_{ij} t_{ij} \quad\text{subject to}\quad \|D_{ij}x\|_2 \le t_{ij},\;\; i,j = 1,\dots,n, \qquad Ax = b,
\]
where $D_{ij}$ is as in (1). Defining the inequality functions
\[
f_{t_{ij}} = \tfrac{1}{2}\bigl(\|D_{ij}x\|_2^2 - t_{ij}^2\bigr), \quad i,j = 1,\dots,n, \tag{9}
\]
we have
\[
\nabla f_{t_{ij}} = \begin{pmatrix} D_{ij}^T D_{ij}x \\ -t_{ij}\delta_{ij} \end{pmatrix},
\qquad
\nabla f_{t_{ij}}\nabla f_{t_{ij}}^T =
\begin{pmatrix} D_{ij}^TD_{ij}xx^TD_{ij}^TD_{ij} & -t_{ij}D_{ij}^TD_{ij}x\,\delta_{ij}^T \\ -t_{ij}\delta_{ij}x^TD_{ij}^TD_{ij} & t_{ij}^2\,\delta_{ij}\delta_{ij}^T \end{pmatrix},
\qquad
\nabla^2 f_{t_{ij}} = \begin{pmatrix} D_{ij}^TD_{ij} & 0 \\ 0 & -\delta_{ij}\delta_{ij}^T \end{pmatrix},
\]
where $\delta_{ij}$ is the Kronecker vector that is 1 in entry $ij$ and zero elsewhere. For future reference:
\[
\sum_{ij}\sigma_{ij}\nabla f_{t_{ij}} = \begin{pmatrix} D_h^T\Sigma D_h x + D_v^T\Sigma D_v x \\ -T\sigma \end{pmatrix},
\]
\[
\sum_{ij}\sigma_{ij}\nabla f_{t_{ij}}\nabla f_{t_{ij}}^T =
\begin{pmatrix} B\Sigma B^T & -B\Sigma T \\ -\Sigma TB^T & \Sigma T^2 \end{pmatrix},
\qquad
\sum_{ij}\sigma_{ij}\nabla^2 f_{t_{ij}} =
\begin{pmatrix} D_h^T\Sigma D_h + D_v^T\Sigma D_v & 0 \\ 0 & -\Sigma \end{pmatrix},
\]
where $\Sigma = \mathrm{diag}(\{\sigma_{ij}\})$, $T = \mathrm{diag}(t)$, $D_h$ has the $D_{h;ij}$ as rows (and likewise for $D_v$), and $B$ is a matrix that depends on $x$:
\[
B = D_h^T\Sigma_{\partial h} + D_v^T\Sigma_{\partial v},
\]
with $\Sigma_{\partial h} = \mathrm{diag}(D_h x)$ and $\Sigma_{\partial v} = \mathrm{diag}(D_v x)$.
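The difference operators behind $\mathrm{TV}(x)$ can be made concrete. The NumPy sketch below (ours) uses forward differences with zeros at the last row and column, which is one common convention and an assumption here, not necessarily the package's exact boundary handling:

```python
import numpy as np

def tv_norm(x):
    """TV(x) = sum_ij ||D_ij x||_2, where D_ij x stacks the horizontal
    and vertical forward differences at pixel ij (zero at the boundary)."""
    dh = np.zeros_like(x)                 # horizontal differences D_h x
    dv = np.zeros_like(x)                 # vertical differences   D_v x
    dh[:, :-1] = x[:, 1:] - x[:, :-1]
    dv[:-1, :] = x[1:, :] - x[:-1, :]
    return np.sum(np.sqrt(dh**2 + dv**2))
```

A piecewise-constant image has small TV: a 4x4 image that jumps from 0 to 1 between two columns has exactly four unit-magnitude differences, so its TV norm is 4.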
The Newton system (8) for the log-barrier algorithm is then
\[
\begin{pmatrix} H_{11} & B\Sigma_{12} & A^T \\ \Sigma_{12}B^T & \Sigma_{22} & 0 \\ A & 0 & 0 \end{pmatrix}
\begin{pmatrix} \Delta x \\ \Delta t \\ \Delta v \end{pmatrix}
= \begin{pmatrix} w_1 \\ w_2 \\ 0 \end{pmatrix}
:= \begin{pmatrix} D_h^TF_t^{-1}D_h x + D_v^TF_t^{-1}D_v x \\ -\tau\mathbf{1} - F_t^{-1}t \\ 0 \end{pmatrix}
\]
where
\[
H_{11} = D_h^T(-F_t^{-1})D_h + D_v^T(-F_t^{-1})D_v + BF_t^{-2}B^T,
\qquad
\Sigma_{12} = -TF_t^{-2}, \qquad \Sigma_{22} = F_t^{-1} + F_t^{-2}T^2.
\]
Eliminating $\Delta t$,
\[
\Delta t = \Sigma_{22}^{-1}(w_2 - \Sigma_{12}B^T\Delta x)
         = \Sigma_{22}^{-1}(w_2 - \Sigma_{12}\Sigma_{\partial h}D_h\Delta x - \Sigma_{12}\Sigma_{\partial v}D_v\Delta x),
\]
the reduced $(N+K)\times(N+K)$ system is
\[
\begin{pmatrix} H_{11}' & A^T \\ A & 0 \end{pmatrix}
\begin{pmatrix} \Delta x \\ \Delta v \end{pmatrix}
= \begin{pmatrix} w_1' \\ 0 \end{pmatrix} \tag{10}
\]
with
\[
\begin{aligned}
H_{11}' &= H_{11} - B\Sigma_{12}^2\Sigma_{22}^{-1}B^T \\
        &= D_h^T(\Sigma_b\Sigma_{\partial h}^2 - F_t^{-1})D_h + D_v^T(\Sigma_b\Sigma_{\partial v}^2 - F_t^{-1})D_v
         + D_h^T(\Sigma_b\Sigma_{\partial h}\Sigma_{\partial v})D_v + D_v^T(\Sigma_b\Sigma_{\partial h}\Sigma_{\partial v})D_h, \\
w_1' &= w_1 - B\Sigma_{12}\Sigma_{22}^{-1}w_2
      = w_1 - (D_h^T\Sigma_{\partial h} + D_v^T\Sigma_{\partial v})\Sigma_{12}\Sigma_{22}^{-1}w_2, \\
\Sigma_b &= F_t^{-2} - \Sigma_{12}^2\Sigma_{22}^{-1}.
\end{aligned}
\]
The system of equations (10) is symmetric, but not positive definite. Note that $D_h$ and $D_v$ are (very) sparse matrices, and hence can be stored and applied very efficiently. This allows us to again solve the system above using an iterative method such as SYMMLQ [16].

F  Total variation minimization with quadratic constraints

We can rewrite $(TV_2)$ as the SOCP
\[
\min_{x,t} \sum_{ij} t_{ij} \quad\text{subject to}\quad \|D_{ij}x\|_2 \le t_{ij},\;\; i,j = 1,\dots,n, \qquad \|Ax - b\|_2 \le \epsilon,
\]
where $D_{ij}$ is as in (1). Taking $f_{t_{ij}}$ as in (9) and
\[
f_\epsilon = \tfrac{1}{2}\bigl(\|Ax - b\|_2^2 - \epsilon^2\bigr),
\]
with
\[
\nabla f_\epsilon = \begin{pmatrix} A^Tr \\ 0 \end{pmatrix}, \qquad
\nabla f_\epsilon\nabla f_\epsilon^T = \begin{pmatrix} A^Trr^TA & 0 \\ 0 & 0 \end{pmatrix}, \qquad
\nabla^2 f_\epsilon = \begin{pmatrix} A^TA & 0 \\ 0 & 0 \end{pmatrix},
\]
where $r = Ax - b$.
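SciPy does not ship SYMMLQ, but its MINRES solver belongs to the same family of Krylov methods for symmetric indefinite systems; the sketch below (ours, with random stand-ins for the blocks) solves a system with the saddle-point shape of (10):

```python
import numpy as np
from scipy.sparse.linalg import minres, LinearOperator

def solve_kkt(H, A, w1):
    """Solve the symmetric indefinite system
        [[H, A.T], [A, 0]] [dx; dv] = [w1; 0]
    with MINRES, applying the blocks matrix-free via a LinearOperator."""
    N, K = H.shape[0], A.shape[0]

    def matvec(z):
        dx, dv = z[:N], z[N:]
        return np.concatenate([H @ dx + A.T @ dv, A @ dx])

    op = LinearOperator((N + K, N + K), matvec=matvec, dtype=float)
    rhs = np.concatenate([w1, np.zeros(K)])
    z, info = minres(op, rhs)        # default tolerance; info == 0 on success
    return z[:N], z[N:], info
```

As in the large-scale mode described earlier, only products with $H$ and $A$ are needed, so sparse or implicitly defined operators work unchanged.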
The Newton system is similar to that in the equality constraints case:
\[
\begin{pmatrix} H_{11} & B\Sigma_{12} \\ \Sigma_{12}B^T & \Sigma_{22} \end{pmatrix}
\begin{pmatrix} \Delta x \\ \Delta t \end{pmatrix}
= \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
:= \begin{pmatrix} D_h^TF_t^{-1}D_h x + D_v^TF_t^{-1}D_v x + f_\epsilon^{-1}A^Tr \\ -\tau\mathbf{1} - t f_t^{-1} \end{pmatrix}
\]
where $(t f_t^{-1})_{ij} = t_{ij}/f_{t_{ij}}$, and
\[
H_{11} = D_h^T(-F_t^{-1})D_h + D_v^T(-F_t^{-1})D_v + BF_t^{-2}B^T - f_\epsilon^{-1}A^TA + f_\epsilon^{-2}A^Trr^TA,
\qquad
\Sigma_{12} = -TF_t^{-2}, \qquad \Sigma_{22} = F_t^{-1} + F_t^{-2}T^2.
\]
Again eliminating $\Delta t$,
\[
\Delta t = \Sigma_{22}^{-1}(w_2 - \Sigma_{12}\Sigma_{\partial h}D_h\Delta x - \Sigma_{12}\Sigma_{\partial v}D_v\Delta x),
\]
the key system is
\[
H_{11}'\Delta x = w_1 - (D_h^T\Sigma_{\partial h} + D_v^T\Sigma_{\partial v})\Sigma_{12}\Sigma_{22}^{-1}w_2
\]
where
\[
\begin{aligned}
H_{11}' &= H_{11} - B\Sigma_{12}^2\Sigma_{22}^{-1}B^T \\
        &= D_h^T(\Sigma_b\Sigma_{\partial h}^2 - F_t^{-1})D_h + D_v^T(\Sigma_b\Sigma_{\partial v}^2 - F_t^{-1})D_v
         + D_h^T(\Sigma_b\Sigma_{\partial h}\Sigma_{\partial v})D_v + D_v^T(\Sigma_b\Sigma_{\partial h}\Sigma_{\partial v})D_h \\
        &\quad - f_\epsilon^{-1}A^TA + f_\epsilon^{-2}A^Trr^TA,
\end{aligned}
\qquad
\Sigma_b = F_t^{-2} - \Sigma_{12}^2\Sigma_{22}^{-1}.
\]
The system above is symmetric positive definite, and can be solved with CG.

G  Total variation minimization with bounded residual correlation

The TV Dantzig problem has an equivalent SOCP as well:
\[
\min_{x,t} \sum_{ij} t_{ij} \quad\text{subject to}\quad \|D_{ij}x\|_2 \le t_{ij},\;\; i,j = 1,\dots,n, \qquad
A^T(Ax - b) - \epsilon \le 0, \qquad -A^T(Ax - b) - \epsilon \le 0.
\]
The inequality constraint functions are
\[
f_{t_{ij}} = \tfrac{1}{2}\bigl(\|D_{ij}x\|_2^2 - t_{ij}^2\bigr), \quad i,j = 1,\dots,n, \qquad
f_{\epsilon_1} = A^T(Ax - b) - \epsilon, \qquad
f_{\epsilon_2} = -A^T(Ax - b) - \epsilon,
\]
with
\[
\sum_{ij}\sigma_{ij}\nabla f_{\epsilon_1;ij} = \begin{pmatrix} A^TA\sigma \\ 0 \end{pmatrix}, \qquad
\sum_{ij}\sigma_{ij}\nabla f_{\epsilon_2;ij} = \begin{pmatrix} -A^TA\sigma \\ 0 \end{pmatrix},
\]
and
\[
\sum_{ij}\sigma_{ij}\nabla f_{\epsilon_1;ij}\nabla f_{\epsilon_1;ij}^T =
\sum_{ij}\sigma_{ij}\nabla f_{\epsilon_2;ij}\nabla f_{\epsilon_2;ij}^T =
\begin{pmatrix} A^TA\Sigma A^TA & 0 \\ 0 & 0 \end{pmatrix}.
\]
Thus the log barrier Newton system is nearly the same as in the quadratically constrained case:
\[
\begin{pmatrix} H_{11} & B\Sigma_{12} \\ \Sigma_{12}B^T & \Sigma_{22} \end{pmatrix}
\begin{pmatrix} \Delta x \\ \Delta t \end{pmatrix}
= \begin{pmatrix} w_1 \\ w_2 \end{pmatrix}
:= \begin{pmatrix} D_h^TF_t^{-1}D_h x + D_v^TF_t^{-1}D_v x + A^TA(f_{\epsilon_1}^{-1} - f_{\epsilon_2}^{-1}) \\ -\tau\mathbf{1} - t f_t^{-1} \end{pmatrix}
\]
where
\[
H_{11} = D_h^T(-F_t^{-1})D_h + D_v^T(-F_t^{-1})D_v + BF_t^{-2}B^T + A^TA\Sigma_a A^TA,
\qquad
\Sigma_a = F_{\epsilon_1}^{-2} + F_{\epsilon_2}^{-2},
\]
\[
\Sigma_{12} = -TF_t^{-2}, \qquad \Sigma_{22} = F_t^{-1} + F_t^{-2}T^2.
\]
Eliminating $\Delta t$ as before,
\[
\Delta t = \Sigma_{22}^{-1}(w_2 - \Sigma_{12}\Sigma_{\partial h}D_h\Delta x - \Sigma_{12}\Sigma_{\partial v}D_v\Delta x),
\]
the key system is
\[
H_{11}'\Delta x = w_1 - (D_h^T\Sigma_{\partial h} + D_v^T\Sigma_{\partial v})\Sigma_{12}\Sigma_{22}^{-1}w_2
\]
where
\[
H_{11}' = D_h^T(\Sigma_b\Sigma_{\partial h}^2 - F_t^{-1})D_h + D_v^T(\Sigma_b\Sigma_{\partial v}^2 - F_t^{-1})D_v
 + D_h^T(\Sigma_b\Sigma_{\partial h}\Sigma_{\partial v})D_v + D_v^T(\Sigma_b\Sigma_{\partial h}\Sigma_{\partial v})D_h + A^TA\Sigma_a A^TA,
\qquad
\Sigma_b = F_t^{-2} - \Sigma_{12}^2\Sigma_{22}^{-1}.
\]

References

[1] F. Alizadeh and D. Goldfarb. Second-order cone programming. Math. Program., Ser. B, 95:3–51, 2003.
[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[3] E. Candès and J. Romberg. Quantitative robust uncertainty principles and optimally sparse decompositions. To appear in Foundations of Comput. Math., 2005.
[4] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. Submitted to IEEE Trans. Inform. Theory, June 2004. Available on the ArXiV preprint server: math.GM/0409186.
[5] E. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Submitted to Communications on Pure and Applied Mathematics, March 2005.
[6] E. Candès and T. Tao. Near-optimal signal recovery from random projections and universal encoding strategies. Submitted to IEEE Trans. Inform. Theory, November 2004. Available on the ArXiV preprint server: math.CA/0410542.
[7] E. Candès and T. Tao. The Dantzig selector: statistical estimation when p is much smaller than n. Manuscript, May 2005.
[8] E. Candès and T. Tao. Decoding by linear programming. To appear in IEEE Trans. Inform. Theory, December 2005.
[9] T. Chan, G. Golub, and P. Mulet. A nonlinear primal-dual method for total variation-based image restoration. SIAM J. Sci. Comput., 20:1964–1977, 1999.
[10] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., 20:33–61, 1999.
[11] D. Goldfarb and W. Yin. Second-order cone programming methods for total variation-based image restoration. Technical report, Columbia University, 2004.
[12] M. Hintermüller and G. Stadler. An infeasible primal-dual algorithm for TV-based inf-convolution-type image restoration. To appear in SIAM J. Sci. Comput., 2005.
[13] M. Lobo, L. Vandenberghe, S. Boyd, and H. Lebret. Applications of second-order cone programming. Linear Algebra and its Applications, 284:193–228, 1998.
[14] Y. Nesterov and A. Nemirovski. Interior Point Polynomial Methods in Convex Programming. SIAM Publications, Philadelphia, 1994.
[15] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, New York, 1999.
[16] C. Paige and M. Saunders. Solution of sparse indefinite systems of linear equations. SIAM J. Numer. Anal., 12(4), September 1975.
[17] J. Renegar. A mathematical view of interior-point methods in convex optimization. MPS-SIAM Series on Optimization. SIAM Publications, 2001.
[18] L. I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation noise removal algorithm. Physica D, 60:259–68, 1992.
[19] J. R. Shewchuk. An introduction to the conjugate gradient method without the agonizing pain. Manuscript, August 1994.
[20] S. J. Wright. Primal-Dual Interior-Point Methods. SIAM Publications, 1997.