Abstract—We develop a dictionary learning algorithm by minimizing the ℓ1 distortion metric on the data term, which is known to be robust to non-Gaussian noise contamination. The proposed algorithm exploits the idea of iterative minimization of …

… sparsity. The K-SVD uses the ℓ2 distortion as a measure of data fidelity. The dictionary learning problem has also been addressed from the analysis perspective [9]–[11]. Apart from denoising, dictionary-based techniques find applications …
where D̂ denotes the estimate of the dictionary in the current iteration. To solve (2) using IRLS, one has to start with an initial guess xn^(0) and update xn in the k-th iteration as

  xn^(k+1) = ( D̂ᵀ W1n^(k) D̂ + λ W2n^(k) )⁻¹ D̂ᵀ W1n^(k) yn,

with W1n^(0) and W2n^(0) initialized to identity matrices, for all n, and the weights updated in the k-th iteration as

  W1n^(k+1)(j) = 1 / ( |(yn − D̂ xn^(k))_j| + ε ),   W2n^(k+1)(j) = 1 / ( |(xn^(k))_j| + ε ),

where W(j) denotes the j-th diagonal entry of a diagonal matrix W, and (x)_j denotes the j-th entry of the vector x. The small positive constant ε added to the denominators ensures numerical stability.
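As an illustration, the IRLS scheme above can be sketched in a few lines of NumPy. This is a minimal sketch, assuming the penalized form of (2) with regularization parameter λ; the function and variable names are ours, not the authors'.

```python
import numpy as np

def irls_sparse_code(y, D_hat, lam=1.0, eps=1e-6, n_iter=20):
    """Sketch of IRLS for min_x ||y - D x||_1 + lam*||x||_1, i.e., problem (2).

    Each l1 term is replaced by a weighted l2 term whose diagonal weights
    are refreshed from the current residual and coefficients.
    """
    m, K = D_hat.shape
    W1 = np.ones(m)   # diagonal of W1n (data-fidelity weights)
    W2 = np.ones(K)   # diagonal of W2n (sparsity weights)
    x = np.zeros(K)
    for _ in range(n_iter):
        # Weighted least-squares step: (D^T W1 D + lam*W2) x = D^T W1 y
        A = D_hat.T * W1                      # = D^T diag(W1)
        x = np.linalg.solve(A @ D_hat + lam * np.diag(W2), A @ y)
        # Reweighting, cf. the W1n, W2n updates above
        W1 = 1.0 / (np.abs(y - D_hat @ x) + eps)
        W2 = 1.0 / (np.abs(x) + eps)
    return x
```

Because the weight matrices are diagonal, each reweighted subproblem is an ordinary weighted least-squares solve.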
One can also choose to work with an equivalent constrained version of (2) of the form

  min_{xn} ‖yn − D̂ xn‖1   subject to   ‖xn‖1 ≤ τn,   (3)

where τn controls the sparsity of the resulting solution, with smaller values of τn resulting in higher sparsity. Efficient optimization packages, such as CVX [21] and ManOpt [22], are available to solve problems of the form given in (2) and (3). To further reduce the sensitivity of the solution to noise, we apply an appropriate threshold T0 to the entries of X following sparse coding. The optimum value of the threshold is application specific, and the values chosen in our experiments are indicated in Section III. Thus, the proposed approach may also be interpreted as ℓ1 minimization followed by pruning.
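For prototyping, the constrained form (3) can be posed directly in cvxpy, a Python analogue of the CVX package cited above; the pruning step mirrors the hard threshold described in the text, using the rule T0 = 0.03‖X‖F reported in Section III. A sketch, with illustrative names:

```python
import numpy as np
import cvxpy as cp

def sparse_code_constrained(y, D_hat, tau):
    """Solve (3): min_x ||y - D x||_1  s.t.  ||x||_1 <= tau."""
    x = cp.Variable(D_hat.shape[1])
    problem = cp.Problem(cp.Minimize(cp.norm1(y - D_hat @ x)),
                         [cp.norm1(x) <= tau])
    problem.solve()
    return x.value

def prune(X, c=0.03):
    """Hard-threshold the coefficient matrix with T0 = c*||X||_F."""
    T0 = c * np.linalg.norm(X, "fro")
    Xp = X.copy()
    Xp[np.abs(Xp) < T0] = 0.0
    return Xp
```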
B. Dictionary Update

Similar to the K-SVD algorithm, we adopt a sequential updating strategy for the dictionary atoms {di}_{i=1}^{K}. To simultaneously update the j-th atom of the dictionary and the corresponding row of X, we focus on the error matrix Ej = Y − Σ_{i≠j} di x^i, resulting from the absence of dj, where x^i denotes the i-th row of X. In order to obtain a refined estimate of dj and x^j, one is required to solve

  min_{dj, x^j} ‖Ej − dj x^j‖1   s.t.   S(x^j) = Ωj and ‖dj‖2 = 1,   (4)

where ‖A‖1 denotes the sum of the absolute values of the entries of the matrix A. We seek to minimize the cost function subject to the constraint that S(x^j) = Ωj, where S(·) denotes the support operator, defined as S(z) = {i : zi ≠ 0}. Therefore, it suffices to consider only those columns of Ej whose indices are in the support set Ωj; we denote this restriction by Ej,Ωj. Furthermore, to get rid of the scaling ambiguity, we restrict D to have unit-length atoms, that is, ‖di‖2 = 1 for all i. Consequently, solving (4) is equivalent to solving

  min_{u,v} ‖E − u vᵀ‖1 = min_{u,v} Σ_{n=1}^{M} ‖en − vn u‖1   s.t.   ‖u‖2 = 1,   (5)

where E = Ej,Ωj, en is the n-th column of E, and vn denotes the n-th entry of the vector v. The key idea behind the ℓ1-K-SVD approach is to approximate the ℓ1 norm using a reweighted ℓ2 metric:

  Σ_{n=1}^{M} ‖en − vn u‖1 ≈ Σ_{n=1}^{M} (en − vn u)ᵀ Wn (en − vn u).

Differentiating with respect to u and setting the result to zero, we get

  u = ( Σ_{n=1}^{M} vn² Wn )⁻¹ ( Σ_{n=1}^{M} vn Wn en ).
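For completeness, the u-update above follows from setting the gradient of the weighted cost to zero, a step the text leaves implicit:

  ∇u Σ_{n=1}^{M} (en − vn u)ᵀ Wn (en − vn u) = 2 Σ_{n=1}^{M} ( vn² Wn u − vn Wn en ) = 0
  ⟹ ( Σ_{n=1}^{M} vn² Wn ) u = Σ_{n=1}^{M} vn Wn en,

using the symmetry of each diagonal Wn.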
Similarly, for a fixed u, we have vn = (uᵀ Wn en) / (uᵀ Wn u). Once u and v have been updated, the j-th diagonal element of Wn is updated as wn(j) ← 1 / ( |(en − vn u)_j| + ε ). The dictionary update is performed by setting dj = u0 and x^j_{Ωj} = v0ᵀ, where (u0, v0) solves (5). A step-by-step description of the solution of (5) is given in Algorithm 1. Experimentally, we found that J = 10 iterations suffice for convergence. The ℓ1-K-SVD algorithm is computationally more expensive than K-SVD and starts with the K-SVD initialization; however, the quality of the solution obtained using ℓ1-K-SVD is better than that obtained using K-SVD when the training set is contaminated by non-Gaussian, as well as Gaussian, noise.
TABLE II: Comparison of output PSNR (dB) and SSIM (first and second rows of each cell, respectively) for the K-SVD and ℓ1-K-SVD algorithms on the House and Tower images (the two values in each entry; printed in black and red in the original), corrupted by additive Gaussian and Laplacian noise, for different input noise levels. The values reported are averages over five independent realizations; the PSNR and SSIM did not change much from one realization to another.

                        K-SVD                                     ℓ1-K-SVD
                 σ = 15        σ = 25        σ = 35        σ = 15        σ = 25        σ = 35
 Gaussian  PSNR  31.72/31.23   28.66/28.26   26.43/26.18   30.77/30.08   28.58/28.00   25.98/26.22
           SSIM  0.80/0.81     0.68/0.67     0.56/0.54     0.78/0.77     0.73/0.73     0.62/0.61
 Laplacian PSNR  31.50/31.04   28.39/27.98   26.12/25.86   31.70/30.96   28.46/28.30   26.79/26.55
           SSIM  0.79/0.80     0.67/0.66     0.55/0.53     0.81/0.80     0.76/0.75     0.66/0.60
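For reference, the figures of merit in Table II can be reproduced with the standard definitions; this is a sketch, and the authors' exact settings (e.g., the SSIM window) are our assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr_db(clean, noisy, peak=255.0):
    """PSNR in dB for 8-bit images: 20*log10(peak / RMSE)."""
    err = clean.astype(np.float64) - noisy.astype(np.float64)
    return 20.0 * np.log10(peak / np.sqrt(np.mean(err ** 2)))

# SSIM with the library defaults (window settings are our assumption):
# structural_similarity(clean, noisy, data_range=255)
```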
Algorithm 1 (ℓ1-K-SVD): To find u and v such that ‖E − u vᵀ‖1 is minimized, subject to ‖u‖2 = 1.
1. Input: Error matrix E, number of iterations J.
2. Initialization:
   • Set the iteration counter p ← 0, u^(p) ← a, and v^(p) ← σ b, where a and b are the left and right singular vectors, respectively, corresponding to the largest singular value σ of E.
   • Initialize the weight matrices Wn^(p) ← I, for n = 1, 2, ..., M, where I denotes the identity matrix and M is the number of columns of E.
3. Iterate the following steps J times:
   • wn^(p+1)(j) ← 1 / ( |(en − vn^(p) u^(p))_j| + ε )
   • u^(p+1) ← [ Σ_{n=1}^{M} (vn^(p))² Wn^(p+1) ]⁻¹ [ Σ_{n=1}^{M} vn^(p) Wn^(p+1) en ]
   • vn^(p+1) ← ( (u^(p+1))ᵀ Wn^(p+1) en ) / ( (u^(p+1))ᵀ Wn^(p+1) u^(p+1) ), for n = 1, 2, ..., M
   • p ← p + 1
4. Scaling: Set u ← u^(p) / ‖u^(p)‖2 and v ← ‖u^(p)‖2 v^(p).
5. Output: Solution (u, v) to (5).
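A direct NumPy transcription of Algorithm 1 is sketched below (variable names ours; eps stands in for the small stabilizing constant ε). Since every Wn is diagonal, the u-update system is itself diagonal and needs no explicit matrix inversion.

```python
import numpy as np

def l1_rank1_approx(E, J=10, eps=1e-6):
    """Algorithm 1: (u, v) minimizing ||E - u v^T||_1 subject to ||u||_2 = 1."""
    # Initialization (step 2): leading singular pair of E
    U, S, Vt = np.linalg.svd(E, full_matrices=False)
    u = U[:, 0]
    v = S[0] * Vt[0, :]
    for _ in range(J):
        # Weight update: w_n(j) <- 1/(|(e_n - v_n u)_j| + eps);
        # row n of W stores the diagonal of the (diagonal) matrix W_n
        R = E - np.outer(u, v)                 # column n is e_n - v_n u
        W = 1.0 / (np.abs(R.T) + eps)          # shape (M, m)
        # u-update: (sum_n v_n^2 W_n)^(-1) (sum_n v_n W_n e_n);
        # all W_n are diagonal, so the system reduces to elementwise division
        num = ((v[:, None] * W) * E.T).sum(axis=0)
        den = ((v ** 2)[:, None] * W).sum(axis=0)
        u = num / den
        # v-update: v_n <- (u^T W_n e_n) / (u^T W_n u)
        v = (W * (u * E.T)).sum(axis=1) / (W @ (u ** 2))
    # Scaling (step 4): enforce ||u||_2 = 1 without changing u v^T
    nrm = np.linalg.norm(u)
    return u / nrm, nrm * v
```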
III. EXPERIMENTAL RESULTS

1) Synthesized Signal: To validate the ℓ1-K-SVD algorithm, we first conduct experiments on synthesized training data, for Gaussian as well as Laplacian noise. Since the ground-truth dictionary D is known in these experiments, we use two performance metrics, namely the atom detection rate (ADR) and the distance κ(D̂, D) of the estimated dictionary D̂ from the ground truth D. To compute the ADR, we count the number of recovered atoms in D. An atom di in D is considered to be recovered if |diᵀ d̂j| exceeds 0.99 for some j. Since D and D̂ contain unit-length atoms, this criterion ensures near-accurate atom recovery. The ADR is then the ratio of the number of recovered atoms to the total number of atoms. The metric κ measures the closeness of D̂ to D, and is defined as

  κ = (1/K) Σ_{i=1}^{K} min_{1≤j≤K} ( 1 − |diᵀ d̂j| ).

The value of κ satisfies 0 ≤ κ ≤ 1; a κ close to 0 indicates that the recovered dictionary closely matches the ground truth. The experimental setup is identical to that in [4]. The ground-truth dictionary is created by randomly generating K = 50 vectors in R^20, followed by column normalization. Each column of the coefficient matrix X has s = 3 non-zero entries, with their locations chosen uniformly at random and amplitudes drawn from the N(0, 1) distribution. Subsequently, D is combined with X and contaminated by additive noise to generate N = 1500 exemplars. All algorithms are initialized with the training examples and the iterations are repeated 80 times, as suggested in [4]. The optimum choice of the hard threshold T0, found experimentally, is T0 = 0.03 ‖X‖F. The variation of κ and ADR with iterations, averaged over five independent trials, is shown in Fig. 1. To facilitate a fair comparison of ℓ1-K-SVD with K-SVD and RDL [19], the true values of the ℓ1 and ℓ0 norms of the ground-truth coefficient vectors, that is, the true values of τn in (3) and of the parameter s, respectively, are supplied to the algorithms. OMP is used for sparse coding in K-SVD. As Fig. 1 shows, the ℓ1-K-SVD algorithm results in superior performance over RDL and its ℓ2-based counterpart K-SVD, both in terms of κ and ADR. One can observe similar robustness of the ℓ1-K-SVD algorithm to both Gaussian and Laplacian noise.

[Fig. 2: Performance comparison of the ℓ1-K-SVD, K-SVD, and RDL algorithms for varying data size N (panels: N = 600, 300, 200; x-axes: iterations). The values of m, s, and K are kept fixed at m = 20, s = 3, and K = 50. The first and second rows show the variation of the distance metric κ and of the ADR, respectively, with iterations. The dataset is corrupted by additive Laplacian noise at an SNR of 20 dB.]

TABLE I: Values of the regularization parameter λ and the fraction of entries np retained in the thresholding step of ℓ1-K-SVD, for Laplacian and Gaussian noise (the two values in each cell; printed in black and red in the original).

         σ = 15         σ = 25         σ = 35
  λ      1, 1           1, 1           8, 8
  np     0.18, 0.15     0.08, 0.05     0.05, 0.04
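The ADR and κ metrics defined above translate directly into code; the following sketch assumes unit-norm columns in both dictionaries.

```python
import numpy as np

def dictionary_metrics(D_true, D_est, thresh=0.99):
    """ADR and distance kappa, assuming unit-norm columns in both m x K inputs."""
    G = np.abs(D_true.T @ D_est)        # |d_i^T dhat_j|
    best = G.max(axis=1)                # best match for each ground-truth atom
    adr = np.mean(best > thresh)        # fraction of recovered atoms
    kappa = np.mean(1.0 - best)         # (1/K) sum_i min_j (1 - |d_i^T dhat_j|)
    return adr, kappa
```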
R EFERENCES
[1] R. Rubinstein, A. M. Bruckstein, and M. Elad, “Dictionaries for sparse
representation modeling,” Proceedings of the IEEE, vol. 98, no. 6, pp.
1045–1057, Jun. 2010.
[2] I. Tosic and P. Frossard, “Dictionary learning,” IEEE Sig. Proc. Mag.,
vol. 28, no. 2, pp. 27–38, Mar. 2011.
[3] M. Elad and M. Aharon, “Image denoising via sparse and redundant
representations over learned dictionaries,” IEEE Trans. Image Proc., vol.
15, no. 12, pp. 3736–3745, Dec. 2006.
[4] M. Aharon, M. Elad, and A. Bruckstein, “K-SVD: An algorithm for
designing overcomplete dictionaries for sparse representation,” IEEE
Trans. Sig. Proc., vol. 54, no. 11, pp. 4311–4322, Nov. 2006.
[5] S. Arora, R. Ge, and A. Moitra, “New algorithms for learning incoherent
and overcomplete dictionaries,” arXiv:1308.6273v5, May 2014.
[6] K. Schnass, “On the identifiability of overcomplete dictionaries via the
minimisation principle underlying K-SVD,” arXiv:1301.3375v4, Mar.
2013.
[7] W. Dai, T. Xu, and W. Wang, “Simultaneous codeword optimization
(SimCO) for dictionary update and learning,” IEEE Trans. on Sig. Proc.,
vol. 60, no. 12, pp. 6340–6353, Dec. 2012.
[8] K. Engan, S. O. Aase, and J. H. Husoy, “Method of optimal directions
for frame design,” in Proc. IEEE Intl. Conf. on Accoust., Speech, and
Sig. Proc., pp. 2443–2446, Mar. 1999.
[9] R. Rubinstein, T. Peleg, and M. Elad, “Analysis K-SVD: A dictionary-
learning algorithm for the analysis sparse model,” IEEE Trans. Sig.
Proc., vol. 61, no. 3, pp. 661–677, Feb. 2013.
[10] E. M. Eksioglu and O. Bayir, “K-SVD meets transform learning:
Transform K-SVD,” IEEE Sig. Proc. Lett., vol. 21, no. 3, pp. 347–351,
Mar. 2014.
[11] J. Dong, W. Wang, and W. Dai, “Analysis SimCO: A new algorithm
for analysis dictionary learning,” in Proc. IEEE Intl. Conf. on Accoust.,
Speech, and Sig. Proc., pp. 7193–7197, May 2014.
[12] J. Yang, J. Wright, T. S. Huang, and Y. Ma, “Image super-resolution via
sparse representation,” IEEE Trans. Image Proc., vol. 19, no. 11, pp.
2861–2873, Nov. 2010.
[13] S. Sahoo and W. Lu, “Image in-painting using sparse approximation
with adaptive window selection,” in Proc. IEEE Intl. Symposium on
Intelligent Sig. Proc., pp. 1–4, Sep. 2011.
[14] J. Mairal, M. Elad, and G. Sapiro, “Sparse representation for color image
restoration,” IEEE Trans. Image Proc., vol. 17, pp. 53–69, Jan. 2008.
[15] I. Daubechies, R. DeVore, M. Fornasier, and C. S. Güntürk, “Iteratively
re-weighted least-squares minimization for sparse recovery,” Communi-
cations on Pure and Applied Math., vol. LXIII, pp. 1–38, 2010.
[16] C. J. Miosso, R. V. Borries, M. Argáez, L. Velazquez, C. Quintero, and
C. M. Potes, “Compressive sensing reconstruction with prior information
by iteratively reweighted least-squares,” IEEE Trans. on Sig. Proc., vol.
57, no. 6, pp. 2424–2431, Jun. 2009.
[17] O. Endra and Gunawan, “Comparison of `1 -minimization and iteratively
reweighted least squares-`p -minimization for image reconstruction from
compressive sensing,” in Proc. Second Intl. Conf. on Advances in
Computing, Control and Telecom. Tech., pp. 85–88, Dec. 2010.
[18] S. P. Kasiviswanathan, H. Wang, A. Banerjee, and P. Melville, “Online
`1 dictionary learning with application to novel document detection,”
Advances in Neural Info. Proc. Systems, vol. 25, pp. 2267–2275, 2012.
[19] C. Lu, J. Shi, and J. Jia, “Online robust dictionary learning,” in Proc.
IEEE Conf. on Comp. Vis. and Patt. Recog., pp. 415–422, 2013.
[20] J. Mairal, F. Bach, J. Ponce, and G. Sapiro, “Online learning for matrix
factorization and sparse coding,” J. of Mach. Learning Res., vol. 11, pp.
19–60, 2010.
[21] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex
programming,” version 2.0 beta, Sep. 2013.
[22] N. Boumal, B. Mishra, P. A. Absil, and R. Sepulchre, “ManOpt, a
MATLAB toolbox for optimization on manifolds,” J. of Mach. Learning
Res., vol. 15, pp. 1455–1459, 2014.