ALTERNATING DIRECTION ALGORITHMS FOR CONSTRAINED SPARSE REGRESSION: APPLICATION TO HYPERSPECTRAL UNMIXING

José M. Bioucas-Dias and Mário A. T. Figueiredo

Instituto de Telecomunicações, Instituto Superior Técnico, Lisboa, Portugal
ABSTRACT

Convex optimization problems are common in hyperspectral unmixing. Examples are the constrained least squares (CLS) problem, used to compute the fractional abundances in a linear mixture of known spectra; constrained basis pursuit (CBP), used to find sparse (i.e., with a small number of terms) linear mixtures of spectra selected from large libraries; and constrained basis pursuit denoising (CBPDN), a generalization of CBP that admits modeling errors. In this paper, we introduce two new algorithms to efficiently solve these optimization problems, based on the alternating direction method of multipliers, a method from the augmented Lagrangian family. The algorithms are termed SUnSAL (sparse unmixing by variable splitting and augmented Lagrangian) and C-SUnSAL (constrained SUnSAL). C-SUnSAL solves the CBP and CBPDN problems, while SUnSAL solves CLS as well as a more general version thereof, called constrained sparse regression (CSR). C-SUnSAL and SUnSAL are shown to outperform off-the-shelf methods in terms of speed and accuracy.

1. INTRODUCTION

Hyperspectral unmixing (HU) is a source separation problem with applications in remote sensing, analytical chemistry, and other areas [10, 11, 12]. Given a set of observed mixed hyperspectral vectors, HU aims at estimating the number of reference spectra (the endmembers), their spectral signatures, and their fractional abundances, usually under the assumption that the mixing is linear [10, 12]. Unlike in a canonical source separation problem, the sources in HU (i.e., the fractional abundances of the spectra/materials present in the data) exhibit statistical dependency [15]. This characteristic, together with the high dimensionality of the data, places HU beyond the reach of most standard source separation algorithms and has fostered active research in the field.

Most HU methods can be classified as statistical or geometrical. In the (statistical) Bayesian framework, all inference relies on the posterior probability density of the unknowns, given the observations. According to Bayes' law, the posterior results from two factors: the observation model (the likelihood), which formalizes the assumed data generation process, possibly including random perturbations such as additive noise; and the prior, which may impose natural constraints on the endmembers (e.g., nonnegativity) and on the fractional abundances (e.g., belonging to the probability simplex, since they are relative abundances), as well as model spectral variability [5, 14, 16].

Geometrical approaches exploit the fact that, under the linear mixing model, the observed hyperspectral vectors belong to a simplex whose vertices correspond to the endmembers. Finding the endmembers thus amounts to identifying the vertices of that simplex [18, 16, 3, 13, 20, 1].

Sparse regression is another direction recently explored for HU [9], with connections to both the statistical and the geometrical frameworks. In this approach, the problem is formulated as that of fitting the observed (mixed) hyperspectral vectors with sparse (i.e., containing a small number of terms) linear mixtures of spectral signatures from a large dictionary available a priori. Estimating the endmembers is thus not necessary in this class of methods.
Notice that the sparse regression problems arising in this context are not standard, since the unknown coefficients (the fractional abundances) sum to one (the abundance sum constraint, ASC) and are nonnegative (the abundance nonnegativity constraint, ANC). We thus refer to these problems as constrained sparse regression (CSR). Several variants of the CSR problem can be used for HU; some examples follow. In classical constrained least squares (CLS), the fractional abundances in a linear mixture of known spectra are estimated by minimizing the total squared error under the ANC and the ASC. Although CLS does not explicitly encourage sparseness, it can be shown that, under some conditions (namely positivity of the spectra), its solutions are indeed sparse [2]. Constrained basis pursuit (CBP) is a variant of the well-known basis pursuit (BP) criterion [4] under the ANC and the ASC; as in BP, CBP uses the ℓ1 norm to explicitly encourage sparseness of the fractional abundance vectors. Finally, constrained basis pursuit denoising (CBPDN) is a generalization of CBP that admits modeling errors (e.g., observation noise).

1.1. Contribution

In this paper, we introduce a class of alternating direction algorithms to solve several CSR problems (namely CLS, CBP, and CBPDN). The proposed algorithms are based on the alternating direction method of multipliers (ADMM) [8, 7, 6], which decomposes a difficult problem into a sequence of simpler ones. Since the ADMM can be derived as a variable splitting procedure followed by the adoption of an augmented Lagrangian method to solve the resulting constrained problem, we term our algorithms SUnSAL (sparse unmixing by variable splitting and augmented Lagrangian) and C-SUnSAL (constrained SUnSAL).

The paper is organized as follows. Section 2 introduces notation and formulates the optimization problems. Section 3 reviews the ADMM and the associated convergence theorem. Section 4 introduces the SUnSAL and C-SUnSAL algorithms. Section 5 presents experimental results, and Section 6 ends the paper with a few concluding remarks.

2. PROBLEM FORMULATION: CSR, CLS, CBP, CBPDN

Let A ∈ R^{k×n} denote a matrix containing the n spectral signatures of the endmembers, x ∈ R^n denote the (unknown) fractional abundance vector, and y ∈ R^k denote an (observed) mixed spectral vector.

The general CSR problem is defined as

(P_CSR):  min_x (1/2) ∥Ax − y∥_2^2 + λ ∥x∥_1   subject to:  x ≥ 0,  1^T x = 1,   (1)

where ∥x∥_2 and ∥x∥_1 denote the ℓ2 and ℓ1 norms of x, respectively, λ ≥ 0 is a parameter controlling the relative weight between the ℓ2 and ℓ1 terms, 1 denotes a column vector of 1's, and the inequality x ≥ 0 is to be understood in the componentwise sense. The constraints x ≥ 0 and 1^T x = 1 correspond to the ANC and the ASC, respectively. The CLS problem corresponds to P_CSR with λ = 0.

In this paper, we assume that A is known. In the CSR approach [9], A is a library containing a large number of spectral signatures, thus usually n > k; matrix A can also be the output of an endmember extraction algorithm, in which case usually n ≪ k.

The CBP optimization problem is

(P_CBP):  min_x ∥x∥_1   subject to:  Ax = y,  x ≥ 0,  1^T x = 1.   (2)

The CBPDN optimization problem is

(P_CBPDN):  min_x ∥x∥_1   subject to:  ∥Ax − y∥_2 ≤ δ,  x ≥ 0,  1^T x = 1.   (3)

The CBP and CBPDN problems are also equivalent to particular cases of P_CSR. Problem P_CSR is equivalent to P_CBPDN in the sense that, for any choice of δ for which P_CBPDN is feasible, there is a choice of λ for which the solutions of the two problems coincide [19]. Finally, notice that P_CBP corresponds to P_CBPDN with δ = 0 and, via the above equivalence, to P_CSR in the limit λ → 0⁺.

3. THE ADMM

Consider an unconstrained problem of the form

min_x f_1(x) + f_2(Gx),   (4)

where f_1 : R^n → R̄ and f_2 : R^p → R̄ (with R̄ = R ∪ {+∞}) are proper convex functions, and G ∈ R^{p×n}. The alternating direction method of multipliers (ADMM) [8, 7, 6], the key tool in this paper, generates three sequences {x_k ∈ R^n, k = 0, 1, ...}, {u_k ∈ R^p, k = 0, 1, ...}, and {d_k ∈ R^p, k = 0, 1, ...} satisfying

x_{k+1} = arg min_x f_1(x) + (μ/2) ∥Gx − u_k − d_k∥_2^2   (5)
u_{k+1} = arg min_u f_2(u) + (μ/2) ∥Gx_{k+1} − u − d_k∥_2^2   (6)
d_{k+1} = d_k − (Gx_{k+1} − u_{k+1}).   (7)

The resulting algorithm is shown in Fig. 1.

Algorithm ADMM
1. Set k = 0, choose μ > 0, u_0, and d_0.
2. repeat
3.   x_{k+1} ∈ arg min_x f_1(x) + (μ/2) ∥Gx − u_k − d_k∥_2^2
4.   u_{k+1} ∈ arg min_u f_2(u) + (μ/2) ∥Gx_{k+1} − u − d_k∥_2^2
5.   d_{k+1} ← d_k − (Gx_{k+1} − u_{k+1})
6.   k ← k + 1
7. until stopping criterion is satisfied.

Fig. 1. The ADMM algorithm.

The following is a simplified version of a theorem of Eckstein and Bertsekas stating the convergence of the ADMM.

Theorem 1 ([6]) Let G have full column rank and f_1, f_2 be closed, proper, and convex functions (see [19] for definitions of these convex analysis concepts). Consider arbitrary μ > 0 and u_0, d_0 ∈ R^p. If (4) has a solution, then the sequence {x_k} converges to it; otherwise, at least one of the sequences {u_k} or {d_k} diverges.
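As an illustration (added in this edited version, not part of the original paper), the iterations (5)-(7) amount to a short generic loop; the function names and the fixed iteration count below are ours, and the two subproblem solvers are assumed to be supplied for the particular f_1, f_2, and G at hand. A minimal Python/NumPy sketch:

    import numpy as np

    def admm(solve_x, solve_u, G, u0, d0, n_iter=200):
        # Generic ADMM loop of Fig. 1 (a sketch; assumes n_iter >= 1).
        # solve_x(v) must return  arg min_x f1(x) + (mu/2)||G x - v||^2,
        # solve_u(v) must return  arg min_u f2(u) + (mu/2)||v - u||^2;
        # both are problem-specific and absorb the choice of mu.
        u, d = u0.copy(), d0.copy()
        for _ in range(n_iter):
            x = solve_x(u + d)        # line 3, eq. (5)
            Gx = G @ x
            u = solve_u(Gx - d)       # line 4, eq. (6)
            d = d - (Gx - u)          # line 5, eq. (7): scaled dual update
        return x, u, d

The next two subsections supply exactly these two solvers for the problems of Section 2, in closed form.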
4. APPLICATION OF ADMM

In this section, we specialize the ADMM to each of the optimization problems stated in Section 2.

4.1. ADMM for CSR and CLS: the SUnSAL algorithm

We start by writing the optimization problem P_CSR in the equivalent unconstrained form

min_x (1/2) ∥Ax − y∥_2^2 + λ ∥x∥_1 + ι_{1}(1^T x) + ι_{R_+^n}(x),   (8)

where ι_S is the indicator function of the set S (i.e., ι_S(x) = 0 if x ∈ S and ι_S(x) = +∞ if x ∉ S); the term ι_{1}(1^T x) encodes the ASC and ι_{R_+^n}(x) encodes the ANC. We now apply the ADMM using the following translation table:

f_1(x) ≡ (1/2) ∥Ax − y∥_2^2 + ι_{1}(1^T x)   (9)
f_2(x) ≡ λ ∥x∥_1 + ι_{R_+^n}(x)   (10)
G ≡ I.   (11)

With this setting, line 3 of the ADMM (Fig. 1) requires solving a quadratic problem with a linear equality constraint, the solution of which is

x_{k+1} ← B^{-1} w − C (1^T B^{-1} w − 1),   (12)

where

B ≡ A^T A + μI   (13)
C ≡ B^{-1} 1 (1^T B^{-1} 1)^{-1}   (14)
w ≡ A^T y + μ (u_k + d_k).   (15)

Line 4 of the ADMM becomes

u_{k+1} ← arg min_u (1/2) ∥u − ν_k∥_2^2 + (λ/μ) ∥u∥_1 + ι_{R_+^n}(u),   (16)

where ν_k ≡ x_{k+1} − d_k. Without the term ι_{R_+^n}, the solution of (16) would be the well-known soft threshold [4]:

u_{k+1} ← soft(ν_k, λ/μ).   (17)

A straightforward reasoning leads to the conclusion that the effect of the ANC term ι_{R_+^n} is to project onto the first orthant, thus

u_{k+1} ← max{0, soft(ν_k, λ/μ)},   (18)

where the maximum is to be understood in the componentwise sense. Fig. 2 shows the SUnSAL algorithm, which is obtained by replacing lines 3 and 4 of the ADMM with (12) and (18), respectively.

Algorithm SUnSAL
1. Set k = 0, choose μ > 0, u_0, and d_0.
2. repeat
3.   w ← A^T y + μ (u_k + d_k)
4.   x_{k+1} ← B^{-1} w − C (1^T B^{-1} w − 1)
5.   ν_k ← x_{k+1} − d_k
6.   u_{k+1} ← max{0, soft(ν_k, λ/μ)}
7.   d_{k+1} ← d_k − (x_{k+1} − u_{k+1})
8.   k ← k + 1
9. until stopping criterion is satisfied.

Fig. 2. Sparse unmixing by variable splitting and augmented Lagrangian (SUnSAL).

The objective function (8) is proper, convex, lower semicontinuous, and coercive, thus it has a non-empty set of minimizers. Functions f_1 and f_2 in (9) and (10) are closed, and G ≡ I obviously has full column rank; Theorem 1 can therefore be invoked to ensure the convergence of SUnSAL.

Concerning the computational complexity, the cost per iteration is dominated by the matrix-vector products, and is thus O(n^2). Moreover, in hyperspectral applications the rank of A^T A is no larger than the number of bands k, often of the order of a few hundred, thus B^{-1} can be easily precomputed.

Finally, the ANC and/or the ASC can be deactivated trivially: the ANC (nonnegativity constraint) is turned off by using (17) instead of (18), and the ASC (sum-to-one constraint) is deactivated by setting C = 0 in (12). To solve the CLS problem, we simply run SUnSAL with λ = 0.
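The following minimal NumPy sketch (ours, added for illustration; it is not the authors' released code) implements the SUnSAL iteration of Fig. 2, with zero initialization and a fixed number of iterations in place of a stopping criterion:

    import numpy as np

    def soft(v, tau):
        # Componentwise soft threshold: sign(v) * max(|v| - tau, 0).
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def sunsal(A, y, lam=1e-3, mu=1.0, n_iter=200, asc=True, anc=True):
        # Sketch of SUnSAL (Fig. 2) for problem (1):
        #   min_x 0.5*||Ax - y||^2 + lam*||x||_1  s.t.  x >= 0, 1'x = 1,
        # with the ANC/ASC optionally deactivated as described in the text.
        n = A.shape[1]
        B_inv = np.linalg.inv(A.T @ A + mu * np.eye(n))   # eq. (13), precomputed once
        ones = np.ones(n)
        # eq. (14); setting C = 0 deactivates the ASC
        C = (B_inv @ ones) / (ones @ B_inv @ ones) if asc else np.zeros(n)
        Aty = A.T @ y
        u, d = np.zeros(n), np.zeros(n)
        for _ in range(n_iter):
            w = Aty + mu * (u + d)                        # eq. (15)
            Bw = B_inv @ w
            x = Bw - C * (ones @ Bw - 1.0)                # eq. (12)
            u = soft(x - d, lam / mu)                     # eq. (17)
            if anc:
                u = np.maximum(u, 0.0)                    # eq. (18)
            d = d - (x - u)                               # dual update
        return u   # u coincides with x at convergence and always satisfies the ANC

Setting lam=0 recovers the CLS case discussed above; the values of lam and mu here are placeholders, to be tuned for the problem at hand.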

4.2. ADMM for CBP and CBPDN: the C-SUnSAL algorithm

Given that the CBP problem corresponds to CBPDN with δ = 0, we address only the latter. Problem P_CBPDN is equivalent to

min_x ∥x∥_1 + ι_{B(y,δ)}(Ax) + ι_{1}(1^T x) + ι_{R_+^n}(x),   (19)

where B(y, δ) ≡ {z : ∥z − y∥_2 ≤ δ} is the radius-δ closed ball around y. To apply the ADMM, we use the following definitions:

f_1(x) ≡ ι_{1}(1^T x)   (20)
f_2(u) ≡ ι_{B(y,δ)}(u_1) + λ ∥u_2∥_1 + ι_{R_+^n}(u_2)   (21)
G ≡ [A^T  I]^T,   (22)

where u = [u_1^T  u_2^T]^T. With the above definitions, line 3 of the ADMM (Fig. 1) is again a quadratic problem with a linear equality constraint, the solution of which is

x_{k+1} ← B^{-1} w − C (1^T B^{-1} w − 1),   (23)

where

B ≡ A^T A + I   (24)
C ≡ B^{-1} 1 (1^T B^{-1} 1)^{-1}   (25)
w ≡ A^T (u_{1,k} + d_{1,k}) + (u_{2,k} + d_{2,k}).   (26)

Because the variables u_1 and u_2 are decoupled, line 4 of the ADMM (Fig. 1) consists in solving the two separate problems

u_{1,k+1} ∈ arg min_u (1/2) ∥u − ν_{1,k}∥_2^2 + ι_{B(y,δ)}(u)   (27)
u_{2,k+1} ∈ arg min_u (1/2) ∥u − ν_{2,k}∥_2^2 + (λ/μ) ∥u∥_1 + ι_{R_+^n}(u),   (28)

where ν_{1,k} ≡ A x_{k+1} − d_{1,k} and ν_{2,k} ≡ x_{k+1} − d_{2,k}. The solution of (27) is the projection onto the ball B(y, δ),

u_{1,k+1} ← ψ_{B(y,δ)}(ν_{1,k}) ≡ ν_{1,k},  if ∥ν_{1,k} − y∥_2 ≤ δ;  y + δ (ν_{1,k} − y) / ∥ν_{1,k} − y∥_2,  if ∥ν_{1,k} − y∥_2 > δ,   (29)

and, similarly to (18), the solution of (28) is given by

u_{2,k+1} ← max{0, soft(ν_{2,k}, λ/μ)}.   (30)

Fig. 3 shows the C-SUnSAL algorithm, which results from replacing line 3 of the ADMM (Fig. 1) with (23) and line 4 with (29)-(30).

Algorithm C-SUnSAL
1. Set k = 0, choose μ > 0, u_{1,0}, u_{2,0}, d_{1,0}, and d_{2,0}.
2. repeat
3.   w ← A^T (u_{1,k} + d_{1,k}) + (u_{2,k} + d_{2,k})
4.   x_{k+1} ← B^{-1} w − C (1^T B^{-1} w − 1)
5.   ν_{1,k} ← A x_{k+1} − d_{1,k}
6.   ν_{2,k} ← x_{k+1} − d_{2,k}
7.   u_{1,k+1} ← ψ_{B(y,δ)}(ν_{1,k})
8.   u_{2,k+1} ← max{0, soft(ν_{2,k}, λ/μ)}
9.   d_{1,k+1} ← d_{1,k} − (A x_{k+1} − u_{1,k+1})
10.  d_{2,k+1} ← d_{2,k} − (x_{k+1} − u_{2,k+1})
11.  k ← k + 1
12. until stopping criterion is satisfied.

Fig. 3. Constrained sparse unmixing by variable splitting and augmented Lagrangian (C-SUnSAL).

The objective function (19) is proper, convex, lower semicontinuous, and coercive, thus it has a non-empty set of minimizers. Functions f_1 and f_2 in (20) and (21) are closed, and G in (22) obviously has full column rank; Theorem 1 can therefore be invoked to ensure the convergence of C-SUnSAL.

Concerning the computational complexity, the scenario is similar to that of SUnSAL: B^{-1} can be precomputed, and the cost is dominated by the matrix-vector products, thus the complexity of C-SUnSAL is O(n^2) per iteration. As mentioned above, C-SUnSAL can be used to solve the CBP problem by simply setting δ = 0. As in SUnSAL, the ANC and/or the ASC can be deactivated trivially.
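In the same spirit as the SUnSAL sketch above (and with the same caveats: this is our illustrative code, with zero initialization, a fixed iteration count, and placeholder parameter values), the C-SUnSAL iteration of Fig. 3 can be sketched as:

    import numpy as np

    def soft(v, tau):
        # Componentwise soft threshold, as in the SUnSAL sketch.
        return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

    def project_ball(v, y, delta):
        # psi_{B(y,delta)}: Euclidean projection of v onto {z : ||z - y|| <= delta}, eq. (29).
        r = v - y
        nrm = np.linalg.norm(r)
        return v if nrm <= delta else y + (delta / nrm) * r

    def c_sunsal(A, y, delta, lam=1.0, mu=1.0, n_iter=200):
        # Sketch of C-SUnSAL (Fig. 3) for problem (3); delta = 0 yields CBP.
        k, n = A.shape
        B_inv = np.linalg.inv(A.T @ A + np.eye(n))   # eq. (24), precomputed once
        ones = np.ones(n)
        C = (B_inv @ ones) / (ones @ B_inv @ ones)   # eq. (25)
        u1, d1 = np.zeros(k), np.zeros(k)
        u2, d2 = np.zeros(n), np.zeros(n)
        for _ in range(n_iter):
            w = A.T @ (u1 + d1) + (u2 + d2)          # eq. (26)
            Bw = B_inv @ w
            x = Bw - C * (ones @ Bw - 1.0)           # eq. (23)
            Ax = A @ x
            u1 = project_ball(Ax - d1, y, delta)     # eq. (29)
            u2 = np.maximum(soft(x - d2, lam / mu), 0.0)   # eq. (30)
            d1 = d1 - (Ax - u1)                      # dual updates
            d2 = d2 - (x - u2)
        return u2

Note that, because the splitting variable u stacks Ax and x, two dual variables are carried along, but each per-iteration step remains closed form.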
5. EXPERIMENTS

We now report experimental results obtained with simulated data generated according to y = Ax + n, where n ∈ R^k models additive perturbations. In hyperspectral applications, these perturbations are mostly model errors dominated by low-pass components; for this reason, we generate the noise by low-pass filtering samples of zero-mean i.i.d. Gaussian sequences of random variables. We define the signal-to-noise ratio (SNR) as

SNR ≡ 10 log_10 ( E[∥Ax∥_2^2] / E[∥n∥_2^2] ).

The expectations in the above definition are approximated with sample means over 10 runs.

We considered two libraries (i.e., matrices A): a 200 × 400 matrix with zero-mean, unit-variance i.i.d. Gaussian entries, and a 224 × 498 matrix with a selection of 498 materials (different mineral types) from the USGS library splib06 (available at http://speclab.cr.usgs.gov/spectral.lib06). The original fractional abundance vectors x are generated in the following way: given s, the number of nonzero components of x, we generate a random sample uniformly distributed over the (s − 1)-simplex and randomly distribute these s values among the n components of x.
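To make the setup concrete, the following sketch (ours; the paper does not specify the low-pass filter, so the moving-average filter, the seed, and the example values k = 200, n = 400, s = 5 are assumptions for illustration only) generates one such problem instance and evaluates the reconstruction SNR defined below:

    import numpy as np

    rng = np.random.default_rng(0)

    def random_abundances(n, s):
        # Uniform sample on the (s-1)-simplex, scattered over s random entries of x.
        x = np.zeros(n)
        x[rng.choice(n, size=s, replace=False)] = rng.dirichlet(np.ones(s))
        return x

    def lowpass_noise(k, signal_energy, snr_db, width=9):
        # Zero-mean i.i.d. Gaussian noise, low-pass filtered (moving average: our
        # assumption) and scaled so that 10*log10(signal_energy/||noise||^2) = snr_db.
        w = np.convolve(rng.standard_normal(k), np.ones(width) / width, mode="same")
        return w * np.sqrt(signal_energy / (10 ** (snr_db / 10.0) * np.sum(w ** 2)))

    def rsnr_db(x_true, x_hat):
        # Reconstruction SNR: 10*log10(||x||^2 / ||x - x_hat||^2).
        return 10.0 * np.log10(np.sum(x_true ** 2) / np.sum((x_true - x_hat) ** 2))

    # One problem instance with the Gaussian library at SNR = 40 dB:
    k, n, s = 200, 400, 5
    A = rng.standard_normal((k, n))
    x = random_abundances(n, s)
    Ax = A @ x
    y = Ax + lowpass_noise(k, np.sum(Ax ** 2), snr_db=40.0)
    # x_hat = sunsal(A, y, lam=1e-2, mu=1.0); print(rsnr_db(x, x_hat))  # sketch above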

As far as we know, there are no special-purpose algorithms for solving the CSR, CBP, and CBPDN problems. The CLS problem, which is a particular case of CSR, can be solved with the MATLAB function lsqnonneg, which we use as the baseline in our comparisons. The lsqnonneg function is run with its default options; SUnSAL and C-SUnSAL run 200 iterations, which was found to be more than enough to achieve convergence.

Tables 1 and 2 report the reconstruction SNR (RSNR), defined as

RSNR ≡ 10 log_10 ( E[∥x∥_2^2] / E[∥x − x̂∥_2^2] ),

where x̂ is the estimated fractional abundance vector, together with the execution times, for the two libraries referred to above.

Table 1. RSNR values and execution times for the Gaussian library defined in the text (average over 10 runs). Rows: SNR = 20, 30, 40, 50 dB; columns: RSNR (dB) and time (sec) for SUnSAL, C-SUnSAL, and lsqnonneg.

Table 2. RSNR values and execution times for the USGS library (average over 10 runs). Rows: SNR = 30, 40, 50 dB; columns as in Table 1.

We highlight the following conclusions: (a) the proposed algorithms achieve higher accuracy in about two orders of magnitude less time (SUnSAL and C-SUnSAL take roughly 0.12-0.13 seconds per problem instance, against tens of seconds for lsqnonneg); (b) the lower accuracy obtained with the USGS matrix is due to the fact that its spectral signatures are highly correlated, resulting in a much harder problem than with the Gaussian matrix.

6. CONCLUDING REMARKS

In this paper, we introduced new algorithms to solve a class of optimization problems arising in spectral unmixing, namely the CLS, CBP, and CBPDN problems. Of course, these are canonical convex problems, thus they can be tackled with standard convex optimization techniques; however, speed is a critical issue in imaging applications, where an instance of the problem has to be solved for each pixel. The proposed algorithms are based on the alternating direction method of multipliers, which decomposes a difficult problem into a sequence of simpler ones. We showed that sufficient conditions for convergence are satisfied. In a limited set of experiments, the proposed algorithms were shown to clearly outperform an off-the-shelf optimization tool. Ongoing work includes a comprehensive experimental evaluation of the proposed algorithms.
REFERENCES

[1] J. Bioucas-Dias, "A variable splitting augmented Lagrangian approach to linear spectral unmixing," in First IEEE GRSS Workshop on Hyperspectral Image and Signal Processing (WHISPERS'09), Grenoble, France, 2009.
[2] A. Bruckstein, M. Elad, and M. Zibulevsky, "A non-negative and sparse enough solution of an underdetermined linear system of equations is unique," IEEE Transactions on Information Theory, vol. 54, pp. 4813–4820, 2008.
[3] C.-I. Chang, C.-C. Wu, W. Liu, and Y.-C. Ouyang, "A new growing method for simplex-based endmember extraction algorithm," IEEE Transactions on Geoscience and Remote Sensing, vol. 44, no. 10, pp. 2804–2819, 2006.
[4] S. Chen, D. Donoho, and M. Saunders, "Atomic decomposition by basis pursuit," SIAM Review, vol. 43, no. 1, pp. 129–159, 2001.
[5] N. Dobigeon, J.-Y. Tourneret, and C.-I Chang, "Semi-supervised linear spectral unmixing using a hierarchical Bayesian model for hyperspectral imagery," IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 2684–2695, 2008.
[6] J. Eckstein and D. Bertsekas, "On the Douglas-Rachford splitting method and the proximal point algorithm for maximal monotone operators," Mathematical Programming, vol. 55, pp. 293–318, 1992.
[7] D. Gabay and B. Mercier, "A dual algorithm for the solution of nonlinear variational problems via finite-element approximations," Computers and Mathematics with Applications, vol. 2, pp. 17–40, 1976.
[8] R. Glowinski and A. Marroco, "Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires," Revue Française d'Automatique, Informatique et Recherche Opérationnelle, vol. 9, pp. 41–76, 1975.
[9] M.-D. Iordache, J. Bioucas-Dias, and A. Plaza, "On the use of spectral libraries to perform sparse unmixing of hyperspectral data," in 2nd IEEE GRSS Workshop on Hyperspectral Image and Signal Processing (WHISPERS'10), Reykjavik, Iceland, 2010.
[10] N. Keshava and J. Mustard, "Spectral unmixing," IEEE Signal Processing Magazine, vol. 19, no. 1, pp. 44–57, 2002.
[11] N. Keshava, J. Kerekes, D. Manolakis, and G. Shaw, "Algorithm taxonomy for hyperspectral unmixing," in Proc. SPIE, vol. 4049, Algorithms for Multispectral, Hyperspectral, and Ultraspectral Imagery VI, 2000.
[12] M. Lopes, J.-C. Wolff, J. Bioucas-Dias, and M. Figueiredo, "NIR hyperspectral unmixing based on a minimum volume criterion for fast and accurate chemical characterisation of counterfeit tablets," Analytical Chemistry, vol. 82, no. 4, pp. 1462–1469, 2010.
[13] L. Miao and H. Qi, "Endmember extraction from highly mixed data using minimum volume constrained nonnegative matrix factorization," IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 3, pp. 765–777, 2007.
[14] S. Moussaoui, H. Hauksdóttir, F. Schmidt, C. Jutten, J. Chanussot, D. Brie, S. Douté, and J. Benediktsson, "On the decomposition of Mars hyperspectral data by ICA and Bayesian positive source separation," Neurocomputing, vol. 71, 2008.
[15] J. Nascimento and J. Bioucas-Dias, "Does independent component analysis play a role in unmixing hyperspectral data?" IEEE Transactions on Geoscience and Remote Sensing, vol. 43, no. 1, pp. 175–187, 2005.
[16] J. Nascimento and J. Bioucas-Dias, "Hyperspectral unmixing algorithm via dependent component analysis," in IEEE International Geoscience and Remote Sensing Symposium, pp. 4033–4036, 2007.
[17] A. Plaza, P. Martinez, R. Perez, and J. Plaza, "Spatial/spectral endmember extraction by multidimensional morphological operations," IEEE Transactions on Geoscience and Remote Sensing, vol. 40, no. 9, pp. 2025–2041, 2002.
[18] A. Plaza, P. Martinez, R. Perez, and J. Plaza, "A quantitative and comparative analysis of endmember extraction algorithms from hyperspectral data," IEEE Transactions on Geoscience and Remote Sensing, vol. 42, no. 3, pp. 650–663, 2004.
[19] R. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
[20] A. Zare and P. Gader, "Hyperspectral band selection and endmember detection using sparsity promoting priors," IEEE Geoscience and Remote Sensing Letters, vol. 5, no. 2, pp. 256–260, 2008.