RESEARCH Open Access
Abstract
The aim of this paper is to develop strategies to estimate the sparsity degree of a signal from compressive projections,
without the burden of recovery. We consider both the noise-free and the noisy settings, and we show how to extend
the proposed framework to the case of non-exactly sparse signals. The proposed method employs γ-sparsified
random matrices and is based on a maximum likelihood (ML) approach, exploiting the property that the acquired
measurements are distributed according to a mixture model whose parameters depend on the signal sparsity. In the
presence of noise, given the complexity of ML estimation, the probability model is approximated with a two-component
Gaussian mixture (2-GMM), which can be easily learned via expectation-maximization.
Besides the design of the method, this paper makes two novel contributions. First, in the absence of noise, sufficient
conditions on the number of measurements are provided for almost sure exact estimation in different regimes of
behavior, defined by the scaling of the measurements sparsity γ and the signal sparsity. In the presence of noise, our
second contribution is to prove that the 2-GMM approximation is accurate in the large system limit for a proper choice of the parameter γ. Simulations validate our predictions and show that the proposed algorithms outperform the
state-of-the-art methods for sparsity estimation. Finally, the estimation strategy is applied to non-exactly sparse
signals. The results are very encouraging, suggesting further extension to more general frameworks.
Keywords: Sparsity recovery, Compressed sensing, High-dimensional statistical inference, Gaussian mixture models,
Maximum likelihood, Sparse random matrices
© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the
Creative Commons license, and indicate if changes were made.
Ravazzi et al. EURASIP Journal on Advances in Signal Processing (2018) 2018:56 Page 2 of 18
measurements. The acquisition can stop as soon as the number of acquired measurements is enough to guarantee the correct reconstruction of a signal based on the estimated sparsity.

Other applications may include the possibility of comparing the support of two sparse signals from their measurements. Due to the linearity of the sparse signal model, the degree of overlap between the supports of two sparse signals can be estimated by measuring the sparsity degree of their sum (or difference) [10]. Finally, sparsity estimation can be used to decide whether a signal can be represented in a sparse way according to a specific basis, and hence to select the most suitable basis allowing the sparsest representation.

1.1 Related work
The problem of estimating the sparsity degree has begun to be recognized as a major gap between theory and practice [11–13], and the literature on the subject is very recent.

In some papers, the joint problem of signal reconstruction and sparsity degree estimation is investigated, in particular for time-varying settings. The following iterative approach is considered: given an initial upper bound for the sparsity degree, at a generic time step t, the signal is reconstructed and its sparsity degree is estimated; this estimate is then used at time t + 1 to assess the number of measurements sufficient for reconstruction. The seminal work [14] investigates the problem in the framework of spectrum sensing for cognitive radios and proposes an iterative method that at each time step performs two operations: (a) the signal is recovered via Lasso, and (b) the sparsity degree is estimated as the number of recovered components with magnitude larger than an empirically set threshold. The efficiency of this procedure is validated via numerical simulations.

Some authors propose sequential acquisition techniques in which the number of measurements is dynamically adapted until a satisfactory reconstruction performance is achieved [15–19]. Even if the reconstruction can take into account the previously recovered signal, these methods require solving a minimization problem for each newly acquired measurement and may prove too complex when the underlying signal is not sparse, or if one is only interested in assessing the sparsity degree of a signal under a certain basis without reconstructing it.

In other papers, only the sparsity degree estimation is considered, which generally requires fewer measurements than signal reconstruction. In [13], the sparsity degree is estimated through an eigenvalue-based method, for wideband cognitive radio applications. In this work, the signal reconstruction is not required, but in practice the number of measurements used was quite large. In [20], the sparsity of the signal is lower-bounded through the numerical sparsity, i.e., the ratio between the $\ell_1$ and $\ell_2$ norms of the signal, where these quantities can be estimated from random projections obtained using Cauchy-distributed and Gaussian-distributed matrices, respectively. A limitation of this approach is that it is not suitable for adaptive acquisition, since measurements taken with Cauchy-distributed matrices cannot be used later for signal reconstruction. In [21], this approach is extended to a family of entropy-based sparsity measures of the kind $(\|x\|_q/\|x\|_1)^{q/(1-q)}$ with $q \in [0, 2]$, for which estimators are designed and theoretically characterized in terms of limiting distributions. In [22], the authors propose to estimate the sparsity of an image before its acquisition, by calculating the image complexity. However, the proposed method is based on the image pixel values and needs a separate estimation that does not depend on the measurements. Further, in [23], the minimum number of measurements needed to recover the sparsity degree was theoretically investigated.

Finally, we notice that the problem of estimating the sparsity degree of a vector is partially connected to the problem of estimating the number of distinct elements in data streams [24, 25], which has been largely studied in the last decades due to its diverse applications. The analogy lies in the fact that the sparsity degree problem can be seen as the estimation of the number of elements distinct from zero. Moreover, many efficient algorithms to estimate the number of distinct elements are based on random hashing (see [25] for a review) to reduce the storage space, which is our concern as well. However, the distinct-elements problem considers vectors $a = (a_1, \ldots, a_n)$ with $a_i \in Q$, where $Q$ is a finite set, which is intrinsically different from our model, where the signal x has real-valued components. Therefore, the strategies conceived for this problem cannot be applied for our purpose.

1.2 Our contribution
In this paper, we propose a technique for directly estimating the sparsity degree of a signal from its linear measurements, without recovery. The method relies on the fact that measurements obtained by projecting the signal according to a γ-sparsified random matrix are distributed according to a mixture density whose parameters depend on k. This is an extension of the algorithm in [26], which works only in the case of noise-free, exactly k-sparse signals. First, we analyze the case of noise-free, exactly k-sparse signals as a special case, and we provide theoretical guarantees regarding the consistency of the proposed estimator and its asymptotic behavior under different regimes of the parameters k and γ. Then, we analyze the more generic case of noise, including non-exactly sparse signals, and we propose to approximate the measurement model by a two-component Gaussian mixture model
(2-GMM), whose parameters can be easily estimated via expectation-maximization (EM) techniques. In this case, we prove that there is a regime of behavior, defined by the scaling of the measurement sparsity γ and the sparsity degree k, where this approximation is accurate. An interesting property of the proposed method is that measurements acquired using a γ-sparsified random matrix also enable signal reconstruction, with only a slight performance degradation with respect to dense matrices [27, 28].

Some preliminary results, limited to the sparsity estimation of noisy, exactly k-sparse ternary signals, have appeared in [29]. In this paper, we extend the results in [29] from both a theoretical and a practical point of view, by considering any k-sparse signal and extending the model to non-exactly sparse signals.

1.3 Outline of the paper
The paper is organized as follows. Section 2 presents the notation and a brief review of CS-related results. The sparsity estimation problem is formally introduced in Section 3, where we discuss the optimal estimator, whereas the main properties of the optimal estimator in the noise-free setting are outlined in Section 4. In Section 5, we introduce the proposed iterative algorithm for dealing with the noisy setting, together with some approximate performance bounds. Finally, the proposed estimators are experimentally validated in Section 6, while concluding remarks are given in Section 7.

2 Preliminaries
In this section, we define some notation, we review the CS fundamentals, and we briefly discuss the use of sparsified matrices in the CS literature.

2.1 Notation
Throughout this paper, we use the following notation. We denote column vectors with small letters, and matrices with capital letters. If $x \in \mathbb{R}^n$, we denote its jth element with $x_j$ and, given $S \subseteq [n] := \{1, \ldots, n\}$, by $x|_S$ the subvector of x corresponding to the indices in S. The support set of x is defined by $\mathrm{supp}(x) = \{i \in [n] : x_i \neq 0\}$ and we use $\|x\|_0 = |\mathrm{supp}(x)|$. Finally, the symbol $\|x\|$ with no subscript has always to be intended as the Euclidean norm of the vector x.

This paper makes frequent use of the following notation for asymptotics of real sequences $(a_n)_{n\in\mathbb{N}}$ and $(b_n)_{n\in\mathbb{N}}$: (i) $a_n = O(b_n)$ for $n \to \infty$ if there exist a positive constant $c \in (0, +\infty)$ and $n_0 \in \mathbb{N}$ such that $a_n \le c b_n$ for all $n > n_0$; (ii) $a_n = \Omega(b_n)$ for $n \to \infty$ if there exist a constant $c' \in (0, +\infty)$ and $n_1 \in \mathbb{N}$ such that $a_n \ge c' b_n$ for all $n > n_1$; (iii) $a_n = \Theta(b_n)$ for $n \to \infty$ if $a_n = O(b_n)$ and $a_n = \Omega(b_n)$; and (iv) $a_n = o(b_n)$ for $n \to \infty$ means that $\lim_{n\to\infty} |a_n/b_n| = 0$.

Given a random variable, we denote its probability density function with f.

2.2 Sparse signal recovery using sparse random projections
Let $x \in \mathbb{R}^n$ be an unknown deterministic signal. CS [30] aims to recover a signal from a small number of non-adaptive linear measurements of the form
$$y = Ax + \eta \quad (1)$$
where $y \in \mathbb{R}^m$ is a vector of observations, $A \in \mathbb{R}^{m\times n}$ is the sensing matrix with $m < n$, $\eta \in \mathbb{R}^m$ is an additive Gaussian noise $\mathcal{N}(0, \sigma^2 I_{m\times m})$, and $I_{m\times m}$ is the identity matrix with m rows and m columns. Since the solution to (1) is not unique, the signal is typically assumed to be sparse, i.e., it can be represented with k non-zero coefficients, or compressible, in the sense that it can be well approximated by a vector having only k non-zero coefficients. In the following, we refer to k as the signal sparsity degree, and we denote the set of signals with at most k non-zero components as $\Sigma_k = \{v \in \mathbb{R}^n : \|v\|_0 \le k\}$.

The literature describes a wide variety of approaches to select the sparsest solution to the affine system in (1). In particular, a large amount of work in CS investigates the performance of $\ell_1$ relaxation for sparse approximation. The problem of recovery can be analyzed in deterministic settings, where the measurement matrix A is fixed, or in random settings in which A is drawn randomly from a sub-Gaussian ensemble. Past work on random designs has focused on matrices drawn from ensembles of dense matrices, i.e., each row of A has n non-zero entries with high probability. However, in various applications, sparse sensing matrices are more desirable [31]. Furthermore, sparse measurement matrices require significantly less storage space, and algorithms adapted to such matrices have lower computational complexity [32, 33]. In [27], the authors study what sparsity degree is permitted in the sensing matrices without increasing the number of observations required for support recovery.

In this paper, we consider γ-sparsified matrices [27], in which the entries of the matrix A are independently and identically distributed according to
$$A_{ij} \sim \begin{cases} \mathcal{N}\left(0, \frac{1}{\gamma}\right) & \text{w.p. } \gamma, \\ \delta_0 & \text{w.p. } 1-\gamma \end{cases} \quad (2)$$
where $\delta_0$ denotes a Dirac delta centered at zero.

Since weak signal entries could be confused with noise, in [27] the support recovery is studied also as a function of the minimum (in magnitude) non-zero value of x:
$$\lambda := \min_{i\in\mathrm{supp}(x)} |x_i|. \quad (3)$$
Consequently, for a fixed $\lambda > 0$, let us define:
$$\mathcal{X}_k(\lambda) = \{x \in \Sigma_k : |x_i| \ge \lambda \ \forall i \in \mathrm{supp}(x)\}. \quad (4)$$
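As a minimal illustration of the sampling model in (2), the following sketch draws a γ-sparsified Gaussian matrix; the function name is ours, not from the paper:

```python
import numpy as np

def gamma_sparsified_matrix(m, n, gamma, rng=None):
    """Draw A with i.i.d. entries: N(0, 1/gamma) w.p. gamma, 0 w.p. 1-gamma (cf. (2))."""
    rng = np.random.default_rng(rng)
    mask = rng.random((m, n)) < gamma                        # active entries, prob. gamma
    gauss = rng.normal(0.0, 1.0 / np.sqrt(gamma), size=(m, n))
    return mask * gauss

A = gamma_sparsified_matrix(500, 2000, gamma=0.05, rng=0)
# Each entry has second moment gamma * (1/gamma) = 1, as for a dense N(0, 1) design.
print(A.shape, float((A != 0).mean()), float((A**2).mean()))
```

Note that, by construction, each row of A has about γn non-zero entries, which is what makes storage and matrix-vector products cheap for small γ.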
For this class of signals, the following result has been proved.

Theorem 1 (Corollary 2 in [27]) Let the measurement matrix $A \in \mathbb{R}^{m\times n}$ be drawn with i.i.d. elements from the γ-sparsified Gaussian ensemble. Then, a necessary condition for asymptotically reliable recovery over the signal class $\mathcal{X}_k(\lambda)$ is
$$m \ge \max\{g_1(n, k, \lambda, \gamma), g_2(n, k, \lambda, \gamma)\} \quad (5)$$
where
$$g_1(n, k, \lambda, \gamma) = \frac{\log\binom{n}{k} - 1}{\frac{\gamma k}{2}\log\left(1 + \frac{\lambda^2 k}{\sigma^2}\right) + \frac{1}{2}\log\left(2\pi e\left(\frac{\gamma k}{12} + 1\right)\right)} \quad (6)$$
$$g_2(n, k, \lambda, \gamma) = \frac{\log(n - k + 1) - 1}{\frac{\gamma}{2}\log\left(1 + \frac{\lambda^2}{\sigma^2\gamma}\right)}. \quad (7)$$

In particular, Theorem 1 says that if $\gamma k \to \infty$ as $n \to \infty$, then the number of measurements is of the same order as that for dense sensing matrices. In sharp contrast, if $\gamma k \to 0$ sufficiently fast as $n \to \infty$, then the number of measurements of any decoder increases dramatically. Finally, if $\gamma k = \Theta(1)$ and $\lambda^2 k = \Theta(1)$, then at least $\max\{\Theta(k\log(n/k)), \Theta(k\log(n-k)/\log k)\}$ measurements are necessary for estimating the support of the signal.

Several recovery algorithms are based on the use of sparse sensing matrices. In particular, count-min sketch algorithms need about 10 to 15 times more measurements than $\ell_1$-decoding, and sparse matching pursuit needs about half of the measurements of count-min sketch. Other sketch algorithms include [34], which can be as accurate as $\ell_1$ decoding with dense matrices under the condition $\gamma k = \Theta(1)$, with the same order of measurements.

We say that the sparsity estimator $\hat{k}$ is asymptotically weakly consistent when $e(\hat{k}, k)$ converges in probability to 0 as $m, n \to \infty$. If we replace convergence in probability with almost sure convergence, then the estimator is said to be strongly consistent.

If the signals are not exactly sparse but compressible, i.e., they admit a representation with few large components in magnitude, then in the CS literature the recovery guarantees are expressed in terms of the sparsity degree of the best-k approximation [30, 35], defined as follows:
$$\hat{x}_k = \underset{z\in\Sigma_k}{\arg\min}\, \|x - z\|. \quad (9)$$
For this reason, the sparsity of a not exactly sparse signal is defined as the number of components containing most of the energy, up to a relative error τ:
$$k_\tau = \min\{s \in [n] : \|x - \hat{x}_s\|^2 \le \tau\|x\|^2\}. \quad (10)$$
Then, defining $e = x - \hat{x}_{k_\tau}$, we write
$$y = Ax = A\hat{x}_{k_\tau} + Ae = A\hat{x}_{k_\tau} + \eta \quad (11)$$
where $\eta = Ae$. It should be noticed that each component $\eta_i$ is distributed as a mixture of Gaussians. We make the following approximation: $\eta_i \sim \mathcal{N}(0, \sigma^2)$ with
$$\sigma^2 = \mathbb{E}\left[\eta_i^2\right] = \|e\|^2 \le \tau\|x\|^2. \quad (12)$$

In the noiseless case, by linearity of expectation, we have
$$\mathbb{E}\left[\frac{1}{m}\sum_{i=1}^m y_i^2\right] = \frac{1}{m}\sum_{i=1}^m \mathbb{E}\left[y_i^2\right] = \frac{1}{m}\sum_{i=1}^m \mathbb{E}\left[\sum_{j=1}^n A_{ij}^2 x_j^2 + 2\sum_{j<k} A_{ij}A_{ik}x_jx_k\right] = \frac{1}{m}\sum_{i=1}^m\sum_{j=1}^n \mathbb{E}\left[A_{ij}^2\right] x_j^2$$
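Since each entry of a γ-sparsified matrix has second moment $\mathbb{E}[A_{ij}^2] = \gamma\cdot(1/\gamma) = 1$ and the cross terms vanish in expectation, the chain of equalities above gives $\mathbb{E}\left[\frac{1}{m}\|y\|^2\right] = \|x\|^2$ in the noiseless case. A quick numerical check of this identity (a sketch with our own variable names, not code from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, gamma = 1000, 200, 0.02
x = np.zeros(n)
x[:50] = rng.normal(size=50)                    # a 50-sparse signal

# Average (1/m) * ||y||^2 over many independent draws of the sensing matrix.
trials = 100
acc = 0.0
for _ in range(trials):
    mask = rng.random((m, n)) < gamma
    A = mask * rng.normal(0.0, 1.0 / np.sqrt(gamma), size=(m, n))
    y = A @ x
    acc += float(np.sum(y**2)) / m
print(acc / trials, float(np.sum(x**2)))        # the two values should be close
```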
However, this optimization problem is NP-hard, and the search for the solution requires a time exponential in the signal length n (one optimization problem for all subsets of [n] of size s and for all s, which amounts to $\sum_{s=1}^{n}\binom{n}{s} = 2^n - 1$ problems).

Given supp(x), if A is chosen from the ensemble of γ-sparsified matrices, any measurement
$$y_i = \sum_{j\in[n]} A_{ij}x_j + \eta_i \quad (14)$$
is a random variable whose density function is a mixture of Gaussians with $2^k$ components. The result follows from the following argument. Let S be the overlap between the support of the ith row of A and supp(x). It is easy to see that, given S, $y_i \sim \mathcal{N}(0, \alpha_S)$ where $\alpha_S = \|x|_S\|^2/\gamma + \sigma^2$. Without any further assumption, taking into account all possible overlaps between the support of the ith row of A and the support of the signal x with cardinality $s \le k$, we can have in principle $\sum_{s\le k}\binom{k}{s} = 2^k$ different types of Gaussians. We conclude that $y_i$ is a Gaussian mixture with $2^k$ components. If the non-zero elements of the signal all have equal values in magnitude, then the number of components of the Gaussian mixture reduces dramatically to k.

Given the set of m independent and identically distributed samples $y = (y_1, \ldots, y_m)$, the sparsity estimation can be recast into the problem of evaluating the number of mixture components and their parameters. However, even in the simple case where k is known, the estimation of the finite mixture density function does not admit a closed-form solution, and the computational complexity is practically unfeasible.

4 Method: noise-free setting
In this section, we show that in the absence of noise (i.e., η = 0), $\|y\|_0$ is a sufficient statistic for the underlying parameter k. We show that the performance of the proposed estimators of the sparsity degree depends on the $\mathrm{SNR} = \lambda^2 k/\sigma^2$ and that the traditional measure $\|x\|^2/\sigma^2$ has no significant effect on the estimation of the sparsity degree.

Even in the absence of noise, since A is chosen from the ensemble of γ-sparsified matrices, any measurement $y_i = \sum_{j\in[n]} A_{ij}x_j$ is still a random variable. The ML solution provides the following estimator of the signal sparsity.

The estimator derived in Proposition 1 has already been proposed in [26] for estimating the degree of sparsity. In the following, we will denote the estimator in (16) as the oracle estimator, since it is equivalent to estimating k in the presence of an oracle who knows which entries in y are only due to noise. In our analysis, we prove that the oracle estimator is asymptotically strongly consistent, i.e., with the property that, as the number of measurements increases indefinitely, the resulting sequence of estimates converges almost surely to the true sparsity degree (see Theorem 2). This means that the density functions of the estimators become more and more concentrated near the true value of the sparsity degree.

Given a sequence of events $\{E_m\}_{m\in\mathbb{N}}$, we denote with $\limsup_{m\to\infty} E_m$ the set of outcomes that occur infinitely many times. More formally,
$$\limsup_{m\to\infty} E_m = \bigcap_{m=1}^{\infty}\bigcup_{\ell\ge m} E_\ell. \quad (17)$$

Theorem 2 Let $p_k = 1 - (1-\gamma)^k$; then
$$\mathbb{P}\left(e(\hat{k}_o, k) > \epsilon\right) \le 2e^{-2m\xi_k\epsilon^2}, \quad (18)$$
where $\xi_k = (1-p_k)\left(1 - e^{\log(1-p_k)}\right)$. Moreover, let $E_m$ be the sequence of events
$$E_m = \left\{e(\hat{k}_o, k) \ge \frac{\log\left(1 - \frac{\rho}{1-p_k}\sqrt{\frac{\log m}{m}}\right)}{\log(1-p_k)}\right\} \quad (19)$$
with $m \in \mathbb{N}$ and $\rho > 1/2$; then
$$\mathbb{P}\left(\limsup_{m\to\infty} E_m\right) = 0. \quad (20)$$

Remark 1 From Theorem 2, we deduce that almost surely (i.e., with probability 1) the relative error between the estimated sparsity and the true value of the sparsity degree is
$$e(\hat{k}_o, k) = O\left(\sqrt{\frac{\log m}{m}}\right) \quad (21)$$
for all but finitely many m.
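In the noise-free case, the oracle estimator of Theorem 2 can be implemented by counting non-zero measurements: each $y_i$ is non-zero exactly when the row support overlaps supp(x), which happens with probability $p_k = 1-(1-\gamma)^k$, so inverting the empirical frequency yields an estimate of k. The sketch below follows this logic; the exact statement of Proposition 1 and (16) is not reproduced in this excerpt, so the function is our own paraphrase consistent with (33):

```python
import numpy as np

def oracle_sparsity_estimate(y, gamma):
    """Invert p_k = 1 - (1-gamma)^k using the fraction of non-zero measurements."""
    p_hat = float(np.mean(np.abs(y) > 0))      # empirical P(y_i != 0)
    return np.log(1.0 - p_hat) / np.log(1.0 - gamma)

rng = np.random.default_rng(0)
n, m, k, gamma = 2000, 4000, 100, 0.01
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.normal(size=k)
mask = rng.random((m, n)) < gamma
A = mask * rng.normal(0.0, 1.0 / np.sqrt(gamma), size=(m, n))
y = A @ x                                       # noise-free measurements
print(round(oracle_sparsity_estimate(y, gamma)))  # close to k = 100
```

In floating point, a measurement is exactly zero precisely when the row support misses supp(x), so the count is unambiguous in the noise-free setting.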
the scaling of the measurement sparsity γ and the signal sparsity k.

Theorem 3 Let $\psi(k) = \gamma k = o(k)$ as $k \to \infty$ and define the function g(k) as follows:

a) if $\psi(k) \to \infty$ as $k \to \infty$, then $g(k) = e^{2\psi(k)}$;
b) if $\psi(k) = \Theta(1)$ as $k \to \infty$, then $g(k) \to \infty$ for $k \to \infty$;
c) if $\psi(k) = o(1)$ as $k \to \infty$, then $g(k) = \psi(k)^{-2(1+\epsilon)}$, for any $\epsilon > 0$.

If the number of measurements is such that $\frac{m}{\log m} \ge g(k)$, then
$$\mathbb{P}\left(e(\hat{k}_o, k) \ge \epsilon_k\right) \xrightarrow{k\to\infty} 0. \quad (22)$$

Remark 3 Theorems 3 and 4 suggest that, in order to obtain a good estimation of the sparsity degree, we need sufficiently sparse matrices, but not too sparse. On the other hand, at the same time, the use of sparser matrices requires more measurements for a successful recovery of the signal. If we combine the results obtained in Theorems 3 and 4 with those provided in Theorem 1, we notice there is a large range for γ (provided by $c \le \psi(k) \le \frac{1}{2}\log\frac{k}{\log k}$ with $c > 0$) where both the sparsity estimation and the recovery can be successful. We will provide more details on how to choose γ for specific applications in Section 6.

The proofs of Theorems 3 and 4 are postponed to the Appendix.
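The condition $m/\log m \ge g(k)$ of Theorem 3 can be turned into a rough measurement-budget check. The helper below is our own illustration: it treats the constant-free asymptotic statements as exact formulas, which is only a heuristic, and uses a crude threshold on ψ(k) to pick a regime:

```python
import math

def g_of_k(k, gamma, eps=0.1):
    """Heuristic g(k) from Theorem 3, selected by the size of psi(k) = gamma*k."""
    psi = gamma * k
    if psi > 1.0:
        return math.exp(2.0 * psi)            # psi -> infinity regime: g = e^{2 psi}
    return psi ** (-2.0 * (1.0 + eps))        # psi = o(1) regime: g = psi^{-2(1+eps)}

def enough_measurements(m, k, gamma):
    """Check the sufficient condition m / log(m) >= g(k)."""
    return m / math.log(m) >= g_of_k(k, gamma)

# With psi(k) = 1/2, g(k) = (1/2)^{-2.2} ~ 4.6, so ~100 measurements easily suffice;
# with psi(k) = 10, g(k) = e^{20} and the same budget is far too small.
print(enough_measurements(100, 1000, 0.0005), enough_measurements(10, 1000, 0.01))
```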
Given a set S, let $\alpha_S = \|x|_S\|^2/\gamma + \sigma^2$ and $p_S = (1-\gamma)^{k-|S|}\gamma^{|S|}$. Let us consider the density functions (the subscript is to emphasize the dependence on the parameter k)
$$f_k(\zeta) = \sum_{S\subseteq\mathrm{supp}(x)} p_S\,\phi(\zeta|\alpha_S) \quad (27)$$
$$f_k^{2\text{-GMM}}(\zeta) = (1-p_k)\,\phi\left(\zeta|\sigma^2\right) + p_k\,\phi\left(\zeta\,\Big|\,\sigma^2 + \frac{\|x\|^2}{p_k}\right). \quad (28)$$
Then, there exists a constant $C \in \mathbb{R}$ such that
$$\left\|f_k - f_k^{2\text{-GMM}}\right\|_K \le \frac{C\gamma^2}{\lambda^4}\left(\frac{\sum_{i=1}^n x_i^4}{\gamma} + \sum_{i\neq j} x_i^2 x_j^2 - \frac{\|x\|^4}{p_k}\right). \quad (29)$$

The proof of Theorem 5 is postponed to the Appendix. As a simple consequence, we obtain that, under suitable conditions, the approximation error depends on ψ(k).

Corollary 1 Let $\psi(k) = \gamma k$ and $f_k$, $f_k^{2\text{-GMM}}$ be the sequence of density functions defined in (27) and (28). Then, there exists a constant $C \in \mathbb{R}$ such that the distance $\|f_k - f_k^{2\text{-GMM}}\|_K$ is bounded in terms of ψ(k).

5.2 Sparsity estimation via EM
Using the approximation in Theorem 5, we recast the problem of inferring the signal sparsity as the problem of estimating the parameters of a two-component Gaussian mixture, whose joint density function of $y \in \mathbb{R}^m$ and hidden class variables $z \in \{0, 1\}^m$ is given by
$$f(y_i, z_i|\alpha, \beta, p) = (1-p)(1-z_i)\phi(y_i|\alpha) + p\,z_i\,\phi(y_i|\beta) \quad (31)$$
with $i \in [m]$. Starting from an initial guess of the mixture parameters α(0), β(0), and p(0), the algorithm that we propose (named EM-Sp and summarized in Algorithm 1) computes, at each iteration $t \in \mathbb{N}$, the posterior distribution
$$\pi_i(t) = \mathbb{P}(z_i = 1|\alpha(t), \beta(t), p(t)) \quad (32)$$
(E-step) and re-estimates the mixture parameters (M-step) until a stopping criterion is satisfied. Finally, the estimation of the signal sparsity is provided by
$$\hat{k} = \log(1 - p_{\mathrm{final}})/\log(1 - \gamma). \quad (33)$$

Algorithm 1 EM-Sp [29]
Require: Measurements $y \in \mathbb{R}^m$, parameter γ
1: Initialization: α(0), β(0), p(0)
2: for t = 0, 1, . . . , StopIter do
3:   E-step: for i = 1, . . . , m, compute the posterior probabilities
$$\pi_i(t+1) = \frac{\frac{p(t)}{\sqrt{\beta(t)}}\,e^{-y_i^2/(2\beta(t))}}{\frac{1-p(t)}{\sqrt{\alpha(t)}}\,e^{-y_i^2/(2\alpha(t))} + \frac{p(t)}{\sqrt{\beta(t)}}\,e^{-y_i^2/(2\beta(t))}}$$
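The E-step above, together with M-step updates for a zero-mean two-component Gaussian mixture, can be sketched as follows. The M-step lines of Algorithm 1 are not reproduced in this excerpt, so the weighted-variance updates below are the textbook EM choice, not necessarily the paper's exact ones:

```python
import numpy as np

def em_sp(y, gamma, iters=50):
    """EM for the 2-GMM (31); returns (alpha, beta, p, k_hat) with k_hat from (33)."""
    y2 = y**2
    alpha, beta, p = 0.1 * np.var(y), 2.0 * np.var(y), 0.5   # rough initial guess
    for _ in range(iters):
        # E-step: posterior probability that y_i comes from the large-variance component
        num = (p / np.sqrt(beta)) * np.exp(-y2 / (2 * beta))
        den = num + ((1 - p) / np.sqrt(alpha)) * np.exp(-y2 / (2 * alpha))
        pi = num / den
        # M-step (standard weighted-variance updates, assumed here)
        p = float(np.mean(pi))
        beta = float(np.sum(pi * y2) / np.sum(pi))
        alpha = float(np.sum((1 - pi) * y2) / np.sum(1 - pi))
    k_hat = np.log(1 - p) / np.log(1 - gamma)
    return alpha, beta, p, k_hat

# Synthetic check: data truly drawn from a 2-GMM with mixture weight p_k.
rng = np.random.default_rng(0)
gamma, k, m = 0.01, 100, 20000
p_true = 1 - (1 - gamma)**k                    # ~0.634
z = rng.random(m) < p_true
y = np.where(z, rng.normal(0, 3.0, m), rng.normal(0, 0.5, m))
_, _, p_est, k_hat = em_sp(y, gamma)
print(p_est, k_hat)                            # p_est near 0.634, k_hat near 100
```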
5.3 The Cramér-Rao bound for 2-GMM
The Cramér-Rao (CR) bound is a popular lower bound on the variance of estimators of deterministic parameters. Given a parameter ξ and an unbiased estimator $\hat{\xi}$, let $f(x; \xi)$ be the likelihood function. The CR bound is given by
$$\mathrm{CR}(\hat{\xi}) = \frac{1}{\mathbb{E}\left[\left(\frac{\partial\log f(x;\xi)}{\partial\xi}\right)^2\right]} \quad (39)$$
that is, the inverse of the Fisher information.

The EM-Sp algorithm, for measurements that can be exactly modeled as a 2-GMM and for a large number of measurements, would be asymptotically optimal and unbiased, and would achieve a performance very close to the CR bound. However, because of the presence of noise in the data and the approximation of the 2-GMM model, we expect that the estimator provided by the EM-Sp algorithm will be biased. A theoretical analysis of the bias in terms of these two factors is hard to carry out. In the following, we analyze the performance of EM-Sp in the estimation of the 2-GMM parameters via the CR bound, which gives us an indication of how much the non-idealities of the model affect the performance of the proposed estimator.

Let us consider a 2-GMM framework, in which two zero-mean Gaussians with known variances α and β are given, and let us call p the mixture parameter. The likelihood function is $f(x; p) = (1-p)\phi(x|\alpha) + p\,\phi(x|\beta)$, and
$$\mathrm{CR}(\hat{p}) = \frac{1}{\mathbb{E}\left[\left(\frac{\phi(x|\beta) - \phi(x|\alpha)}{(1-p)\phi(x|\alpha) + p\,\phi(x|\beta)}\right)^2\right]}. \quad (40)$$

Finally, an application where sparsity estimation improves the efficiency of signal recovery in CS is proposed.

6.1 Noise-free measurements
We start by testing signals that are exactly sparse and measurements that are not affected by additive noise. We evaluate the estimation accuracy in terms of the empirical probability of correct estimation: a run is considered successful if $e(\hat{k}, k) < 5\times10^{-2}$, where $\hat{k}$ is the estimated sparsity. In Fig. 1, we show results averaged over 1000 random instances, obtained by generating different sensing matrices from the γ-sparsified Gaussian ensemble. We underline that the values of the non-zero entries of the signals (which are drawn from a standard Gaussian distribution for this experiment) do not affect the performance in the noise-free case. Similarly, the length n of the signal plays no role in the estimation (see Proposition 1).

The empirical probability of correct estimation is studied as a function of m and k for three different regimes of the parameter ψ(k) defined in Remark 2 (see Fig. 1):

a) $\psi(k) = \frac{1}{2}(\log(k) - \log(\log k))$;
b) $\psi(k) = 1$;
c) $\psi(k) = \sqrt{3\log k/k}$.
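The expectation in (40) has no closed form, but it can be evaluated numerically; the sketch below estimates the per-sample Fisher information by Monte Carlo, mirroring the Monte Carlo evaluation mentioned in Section 6.2 (the helper names are ours):

```python
import numpy as np

def phi(x, var):
    """Zero-mean Gaussian density with variance var."""
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def cr_bound_p(p, alpha, beta, n_mc=200000, rng=None):
    """Monte Carlo evaluation of (40): per-sample CR bound for the mixture weight p."""
    rng = np.random.default_rng(rng)
    z = rng.random(n_mc) < p                   # draw x from the mixture itself
    x = np.where(z, rng.normal(0, np.sqrt(beta), n_mc),
                    rng.normal(0, np.sqrt(alpha), n_mc))
    score = (phi(x, beta) - phi(x, alpha)) / ((1 - p) * phi(x, alpha) + p * phi(x, beta))
    return 1.0 / float(np.mean(score**2))

# With m i.i.d. measurements, any unbiased estimator of p has variance at least
# cr_bound_p(...)/m; the bound can never drop below p(1-p), the variance floor
# one would have even observing the hidden labels z directly.
cr = cr_bound_p(0.6, 0.25, 9.0, rng=0)
print(cr)
```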
Fig. 1 Noise-free setting: empirical probability of successful sparsity estimation as a function of the sparsity degree k and the number of measurements m. a $\psi(k) = \frac{1}{2}\log\frac{k}{\log k}$; b $\psi(k) = 1$; c $\psi(k) = \sqrt{3\log k/k}$
Fig. 2 Experiment 2: MRE (in log scale) of EM-Sp and Lopes's estimator as a function of the sparsity degree k and the number of measurements m, SNR = 10 dB
to have a fair comparison, we perform this test on ternary signals in $\{-\lambda, 0, \lambda\}^n$, for which sparsity and numerical sparsity coincide. We then consider random sparse signals with non-zero entries uniformly chosen in $\{\lambda, -\lambda\}$, $\lambda \in \mathbb{R}$, and $\mathrm{SNR} = \lambda^2 k/\sigma^2$ (see definition in Section 3). Moreover, we set ψ(k) constant in order to focus on the effects of the additive noise on the estimation.

We remark that we compare only to [20] because, as illustrated in Section 1.1, the other proposed algorithms for sparsity degree estimation are based on signal reconstruction [14] (requiring a larger number of measurements and increased complexity, which would give an unfair comparison) or are conceived for very specific applications [13, 22].

In Fig. 2, we show the mean relative error (MRE), defined as
$$\mathrm{MRE} = \mathbb{E}\left[e(\hat{k}, k)\right], \quad (41)$$
for different values of k and m in settings with SNR = 10 dB and ψ(k) = 1/10. We appreciate that, in the considered framework, EM-Sp always outperforms the method based on the numerical sparsity estimation.

In Fig. 3, we set k = 1000, ψ(k) = 1/3, and we vary the SNR from 0 to 40 dB, while $m \in \{800, 1000, 2000, 5000\}$.
Fig. 3 Experiment 2: MRE (in log scale) of EM-Sp and the numerical sparsity estimator as a function of the SNR, for different m's, k = 1000
Again, we see that EM-Sp outperforms [20]. We specify that a few tens of iterations are sufficient for the convergence of EM-Sp.

Finally, we compare the performance of EM-Sp with an oracle estimator designed as follows: we assume to know exactly the variances α and β and we generate measurements $y_i$ distributed according to the 2-GMM $(1-p)\phi(y_i|\alpha) + p\,\phi(y_i|\beta)$, for $i = 1, \ldots, m$; given the sequence y, α, and β, we then compute the ML estimate of p via EM. We name this estimator EM oracle. Comparing the estimates of p of EM-Sp and EM oracle, we can check if our 2-GMM approximation is reliable. We clearly expect that EM oracle performs better, as the measurements are really generated according to a 2-GMM, and also the true α and β are exploited. However, our results show that the 2-GMM approximation is trustworthy.

In Fig. 4, we depict the sample variance of the estimator $\hat{p}$ of p (obtained from 1000 runs) of EM-Sp and EM oracle. We show also the CR bound (see Section 5.3), which represents a performance lower bound for the estimation of p. As explained in Section 5.3, the stochastic mean required in the CR bound for 2-GMM cannot be analytically computed and is here evaluated via Monte Carlo.

In both graphs of Fig. 4, we set k = 1000; ψ(k) = 1/10 in Fig. 4a, and m = k in Fig. 4b. We notice that, in the considered regimes, EM oracle and the CR bound are very close and not really affected by the SNR. Regarding EM-Sp, we observe that (a) keeping k, γ fixed, EM-Sp gets closer
Fig. 4 Noisy setting: comparison with EM oracle and the Cramér-Rao (CR) bound. a k = 1000, ψ(k) = 1/10; b k = m = 1000
to the optimum as the SNR increases; and (b) keeping k, m fixed, we can find an optimal γ that allows us to get very close to the optimum.

6.3 Compressibility of real signals
In this section, we test our EM-Sp algorithm to evaluate the compressibility of real signals. Specifically, we consider images which are approximately sparse in the discrete cosine transform (DCT) domain, that is, they are well represented by few DCT coefficients. Our aim is to estimate the number k of significant DCT coefficients. More precisely, we seek the minimum k such that the best-k approximation $\hat{x}_k$ has a relative error smaller than approximately τ, namely $\|\hat{x}_k - x\|_2^2 \le \tau\|x\|_2^2$. Since DCT coefficients of natural images usually have a power-law decay of the form $x_i \approx c/i$, the following approximation holds
$$\frac{\|\hat{x}_k - x\|_2^2}{\|x\|_2^2} \approx \frac{\int_k^n x^{-2}\,dx}{\int_1^n x^{-2}\,dx} \propto k^{-1} \quad (42)$$
and since according to the theoretical derivation γ ∝ 1/k, we fix γ ∝ τ. Tuning γ proportionally to τ allows us to better adapt the sparsity of the sensing matrix to the expected signal sparsity: for larger τ's, we expect smaller k's, which call for larger γ to have sufficient matrix density to pick the non-zero entries.
In the proposed experiments, we fix γ = cτ with $c = 5\cdot10^{-2}$, and we initialize $\pi_i(0) = \frac{1}{2}$ for all $i = 1, \ldots, m$, while we set β and α respectively as the signal energy and the noise energy (namely, the error of the best-k approximation), evaluated from the measurements: $\beta = \frac{\|y\|^2}{m}$ and $\alpha = \tau\beta$.

In Fig. 5, we depict the results on three n = 256 × 256 images (shown in Fig. 6) represented in the DCT basis (the DCT is performed on 8 × 8 blocks). Specifically, we show the original and estimated sparsity (the y-axis represents the ratio k/n), averaged over 100 random sensing matrices. The images have been chosen with different compressibilities, to test our method in different settings. We appreciate that, for all the images and across different values of τ, we are able to estimate k with a small error. This experiment shows then that EM-Sp can be practically used to estimate the compressibility of real signals.

6.4 Sparsity estimation for signal recovery
We have already remarked that the knowledge of the sparsity degree k is widely used for signal recovery in CS. In this section, we consider CoSaMP [9], an algorithm which can recover a sparse signal (exactly or with a bounded error, in the absence and in the presence of noise, respectively) if a sufficient number of measurements m is provided, assuming the knowledge of an upper bound $k_{\max}$ for k. Our aim is to show that EM-Sp can be used as a pre-processing for CoSaMP when k is not known; specifically, we estimate k to design the number of measurements necessary for CoSaMP recovery. Subsequently, we denote this joint procedure as EM-Sp/CoSaMP.

We compare CoSaMP with EM-Sp/CoSaMP in the following setting. We consider a family S of signals of length
Fig. 5 Experiment 3: compressibility of real signals with non-exactly sparse representations (pattern, Lena, aerial 1): original sparsity vs. EM-Sp estimate of the sparsity of the best-k approximation, as a function of the tolerated error τ. We estimate k such that $\|\hat{x}_k - x\|_2^2/\|x\|_2^2 < \tau$, where x and $\hat{x}_k$ respectively are the non-exactly sparse representation and its best-k approximation
Fig. 6 Non-exactly sparse signals: images with approximately sparse DCT representation: pattern, Lena, and aerial
n = 1600 and (unknown) sparsity k ∈ {20, 200} (thus, k_max = 200). The value of k and the positions of the non-zero coefficients are generated uniformly at random, and the non-zero values are drawn from a standard Gaussian distribution. Since k is not known, the number of measurements needed by CoSaMP has to be dimensioned on k_max: assuming SNR = 30 dB, from the literature, we get that m_C = 4k_max measurements are sufficient to obtain a satisfactory recovery using dense Gaussian sensing matrices. In our specific setting, we always observe a mean relative error MRE_rec = ‖x̂ − x‖₂/‖x‖₂ < 5.5 × 10⁻² (for each k ∈ {20, 200}, 100 random runs have been performed).

We now propose the following procedure.

1 First sensing stage and sparsity estimation: we take m_S ≪ m_C measurements via the γ-sparsified matrix in (2), and we provide an estimate k̂ of k using Algorithm 1.
2 Second sensing stage and recovery: we add a sufficient number of measurements (dimensioned over k̂) and then perform CoSaMP recovery.

Specifically, the following settings have proved to be suitable for our example:

• We estimate k with EM-Sp from m_S = k_max/2 sparsified measurements, with γ = 6/k_max;
• Since underestimates of k are critical for CoSaMP, we set k̂ equal to 2 times the estimate provided by EM-Sp;
• We add m_A = 4k̂ Gaussian measurements, and we run CoSaMP with the so-obtained sensing matrix with m_S + m_A rows. When m_S + m_A > m_C, we reduce the total number of measurements to m_C.
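The measurement budget of this two-stage procedure can be sketched in a few lines. The function below is a hypothetical stand-in (not the authors' code, and EM-Sp itself is abstracted away): it only reproduces the dimensioning rules m_S = k_max/2, k̂ = 2 × (EM-Sp estimate), m_A = 4k̂, capped at m_C = 4k_max.

```python
# Hypothetical sketch of the EM-Sp/CoSaMP measurement budget described above.
def budget(k_est, k_max):
    """Total rows of the sensing matrix used by EM-Sp/CoSaMP.

    k_est: sparsity estimate returned by the first (EM-Sp) stage.
    k_max: known upper bound on the true sparsity k.
    """
    m_C = 4 * k_max             # worst-case CoSaMP budget, sized on k_max
    m_S = k_max // 2            # stage 1: sparsified measurements for EM-Sp
    k_hat = 2 * k_est           # double the estimate: underestimates are critical
    m_A = 4 * k_hat             # stage 2: Gaussian measurements for CoSaMP
    return min(m_S + m_A, m_C)  # never exceed the worst-case budget

# e.g., with k_max = 200 pure CoSaMP always uses 4 * 200 = 800 rows,
# while budget(30, 200) = min(100 + 240, 800) = 340 rows.
print(budget(30, 200))
```

This makes explicit where the savings in Fig. 7 come from: for small estimated sparsity the joint scheme uses far fewer rows, and for large k it gracefully falls back to the worst-case budget.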
[Figure: number of measurements used for recovery vs. sparsity degree k ∈ [20, 200], for EM-Sp/CoSaMP and pure CoSaMP]
Fig. 7 Recovery experiment with unknown k: EM-Sp/CoSaMP saves a significant number of measurements with respect to pure CoSaMP
[Figure: recovery mean relative error MRE_rec vs. sparsity degree k ∈ [20, 200], for EM-Sp/CoSaMP and pure CoSaMP]
Fig. 8 The gain obtained by EM-Sp/CoSaMP in terms of measurements has no price in terms of accuracy: the recovery mean relative error is close to that of pure CoSaMP
We show the results averaged over 100 random experiments. In Fig. 7, we compare the number of measurements used for recovery as a function of the sparsity degree k: a substantial gain is obtained in terms of measurements by EM-Sp/CoSaMP, with no significant accuracy loss. In Fig. 8, we can see that the CoSaMP and EM-Sp/CoSaMP algorithms achieve similar MRE_rec.

7 Conclusions
In this paper, we have proposed an iterative algorithm for the estimation of the signal sparsity starting from compressive and noisy projections obtained via sparse random matrices. As a first theoretical contribution, we have demonstrated that the estimator is consistent in the noise-free setting, and we have characterized its asymptotic behavior in different regimes of the involved parameters, namely the sparsity degree k, the number of measurements m, and the sensing matrix sparsity parameter γ. Then, we have shown that, in the noisy setting, the projections can be approximated using a 2-GMM, for which the EM algorithm provides an asymptotically optimal estimator.

Numerical results confirm that the 2-GMM approach is effective for different signal models and outperforms methods known in the literature, with no substantial
matrix, but only on its sparsity parameter γ. This enables applications in which one is not interested in signal reconstruction, but only in embedding the sparsity degree of the underlying signal in a more compact representation.

Endnote
1. The code to reproduce the simulations proposed in this section is available at https://github.com/sophie27/sparsityestimation

Appendix
Proofs of results in Section 4

Proof of Proposition 1
It should be noticed that P(ω_i = 0) = (1 − γ)^{θn}, where θ ∈ [0, 1] is the parameter to be optimized. Since the rows of the matrix A are independent, so are the ω_i; considering that the event ω_i = 0 is equivalent to the event that the support of the ith row of A is orthogonal to the support of the signal x, the ML estimation computes

$$\theta_o = \operatorname*{arg\,max}_{\theta\in[0,1]} \log f(\omega\,|\,\theta) = \operatorname*{arg\,max}_{\theta\in[0,1]} \sum_{i=1}^{m} \log f(\omega_i\,|\,\theta) \tag{43}$$

$$= \operatorname*{arg\,max}_{\theta\in[0,1]} \sum_{i=1}^{m} \log\left[\left(1-(1-\gamma)^{\theta n}\right)^{\omega_i}\left((1-\gamma)^{\theta n}\right)^{1-\omega_i}\right]$$
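The maximization in (43) admits a closed form that is easy to simulate. The sketch below is illustrative, with hypothetical sizes: in the noise-free setting each ω_i is Bernoulli with P(ω_i = 0) = (1 − γ)^k, so equating the empirical frequency of zeros to (1 − γ)^{θn} gives k̂ = θ̂n = log(1 − mean(ω)) / log(1 − γ).

```python
import numpy as np

# Numerical sketch of the ML estimate in (43); sizes are hypothetical,
# with gamma chosen of order 1/k as suggested by the theory.
rng = np.random.default_rng(0)
k, m = 200, 2_000            # true sparsity and number of measurements
gamma = 1.0 / k

# omega_i = 1 iff the support of the i-th sparsified row hits supp(x);
# each of the k support positions is hit independently with probability gamma
omega = (rng.random((m, k)) < gamma).any(axis=1)

# closed-form maximizer of the Bernoulli log-likelihood in (43)
k_hat = np.log(1.0 - omega.mean()) / np.log(1.0 - gamma)
print(k_hat)                 # concentrates around the true k = 200
```

Note that the estimate is well defined only when mean(ω) < 1, i.e., at least one measurement misses the support, which is another reason why γ should not be chosen too large.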
$$\mathrm{Var}\!\left[\mathrm{Var}(y_i)\,|\,\omega_i = 1\right] \tag{82}$$

$$= \sum_{s=1}^{k} \frac{(1-\gamma)^{k-s}\,\gamma^{s-2}}{p_k}\binom{k-1}{s-1}\sum_{\ell=1}^{n} x_\ell^4 \tag{83}$$

$$\;+\; \sum_{s=2}^{k} \frac{(1-\gamma)^{k-s}\,\gamma^{s-2}}{p_k}\binom{k-2}{s-2}\sum_{i\neq j} x_i^2 x_j^2 \;-\; \frac{\|x\|^4}{p_k^2}. \tag{84}$$

We conclude

$$\mathrm{Var}\!\left[\mathrm{Var}(y_i)\,|\,\omega_i = 1\right] \tag{85}$$

$$= \frac{1}{p_k}\left(\frac{\sum_{\ell=1}^{n} x_\ell^4}{\gamma} + \sum_{i\neq j} x_i^2 x_j^2\right) - \frac{\|x\|^4}{p_k^2}. \tag{86}$$

Proof of Theorem 5
Let ψ(k) = γk and consider the sequence of probability density functions

$$f_k(\zeta) = \sum_{S\subseteq\mathrm{supp}(x)} p_S\,\phi(\zeta\,|\,\alpha_S)$$

$$f_k^{\text{2-GMM}}(\zeta) = (1-p_k)\,\phi\!\left(\zeta\,|\,\sigma^2\right) + p_k\,\phi\!\left(\zeta\,\Big|\,\sigma^2 + \frac{\|x\|^2}{p_k}\right),$$

where α_S = ‖x_S‖²/γ + σ² and p_S = (1 − γ)^{k−|S|} γ^{|S|}. Let ᾱ = E[Var(y_i)|ω_i = 1] = σ² + ‖x‖²/p_k (see Lemma 1) and denote by 𝒮 the set of possible subsets of supp(x); we thus have

$$\left\|f_k - f_k^{\text{2-GMM}}\right\|_K = \sup_{t\in\mathbb{R}}\left|\,\sum_{S\in\mathcal{S}\setminus\{\emptyset\}} p_S\!\int_{-\infty}^{t}\!\phi(\zeta\,|\,\alpha_S)\,\mathrm{d}\zeta \;-\; p_k\!\int_{-\infty}^{t}\!\phi(\zeta\,|\,\bar{\alpha})\,\mathrm{d}\zeta\,\right|$$

$$= \frac{p_k}{2}\,\sup_{t\in\mathbb{R}}\left|\,\sum_{S\in\mathcal{S}\setminus\{\emptyset\}}\frac{p_S}{p_k}\left[\operatorname{erf}\!\left(\frac{t}{\sqrt{2\alpha_S}}\right) - \operatorname{erf}\!\left(\frac{t}{\sqrt{2\bar{\alpha}}}\right)\right]\right| \tag{87}$$

where erf is the Gauss error function. Let g : ℝ² → ℝ be the function

$$g(t,\alpha) := \operatorname{erf}\!\left(\frac{t}{\sqrt{2\alpha}}\right) \tag{88}$$

By the Lagrange form of Taylor's theorem [40], we obtain

$$g(t,\alpha) = g(t,\bar{\alpha}) + \frac{\partial g}{\partial\alpha}(t,\bar{\alpha})(\alpha-\bar{\alpha}) + \frac{1}{2}\frac{\partial^2 g}{\partial\alpha^2}(t,\xi)(\alpha-\bar{\alpha})^2 \tag{89}$$

with ξ(α) ∈ (min{ᾱ, α}, max{ᾱ, α}). It should be noticed that the first-order term in Taylor's expansion of g(t; α) vanishes due to the conditional mean result from Theorem 2:

$$\sum_{S\in\mathcal{S}\setminus\{\emptyset\}}\frac{p_S}{p_k}\,g(t,\alpha_S) \tag{90}$$

$$= g(t,\bar{\alpha}) + \frac{1}{2}\sum_{S\in\mathcal{S}\setminus\{\emptyset\}}\frac{p_S}{p_k}\,\frac{\partial^2 g}{\partial\alpha^2}(t,\xi(\alpha_S))\,(\alpha_S-\bar{\alpha})^2 \tag{91}$$

with ξ(α_S) ∈ (min{ᾱ, α_S}, max{α_S, ᾱ}) ⊆ (λ²/γ, ‖x‖²/γ). Putting this expression into (87), we obtain

$$\left\|f_k - f_k^{\text{2-GMM}}\right\|_K \tag{92}$$

$$\le \frac{p_k}{4}\,\sup_{t\in\mathbb{R}}\left|\,\sum_{S\in\mathcal{S}\setminus\{\emptyset\}}\frac{p_S}{p_k}\,\frac{\partial^2 g}{\partial\alpha^2}(t,\xi(\alpha_S))\,(\alpha_S-\bar{\alpha})^2\right| \tag{93}$$

$$\le \frac{p_k}{4}\sum_{S\in\mathcal{S}\setminus\{\emptyset\}}\frac{p_S}{p_k}\,\sup_{t\in\mathbb{R}}\left|\frac{\partial^2 g}{\partial\alpha^2}(t,\xi(\alpha_S))\right|(\alpha_S-\bar{\alpha})^2 \tag{94}$$

$$\le \frac{p_k}{4}\,G_{\max}\sum_{S\in\mathcal{S}\setminus\{\emptyset\}}\frac{p_S}{p_k}\,(\alpha_S-\bar{\alpha})^2 \tag{95}$$

where

$$G_{\max} = \sup_{\xi\in[\lambda^2/\gamma,\,\|x\|^2/\gamma]}\;\sup_{t\in\mathbb{R}}\left|\frac{\partial^2 g}{\partial\alpha^2}(t,\xi)\right| \tag{96}$$

and

$$\frac{\partial^2 g}{\partial\alpha^2}(t,\xi) = \frac{1}{2\sqrt{2\pi}}\,e^{-\frac{t^2}{2\xi}}\,\frac{t}{\xi^{5/2}}\left(3-\frac{t^2}{\xi}\right). \tag{97}$$

Through standard computations, we see that the maximizing value is obtained for t = √((3 − √6)ξ):

$$G_{\max} = \sup_{\xi\in[\lambda^2/\gamma,\,\|x\|^2/\gamma]}\;\max_{t\in\mathbb{R}}\;\frac{1}{2\sqrt{2\pi}}\,e^{-\frac{t^2}{2\xi}}\,\frac{t}{\xi^{5/2}}\left(3-\frac{t^2}{\xi}\right) \tag{98}$$

$$= C\max_{\xi\in[\lambda^2/\gamma,\,\|x\|^2/\gamma]}\xi^{-2} = C\,\frac{\gamma^2}{\lambda^4} \tag{99}$$

with

$$C = \frac{\sqrt{3-\sqrt{6}}\,\sqrt{3}}{2\sqrt{\pi}}\,e^{-\frac{3-\sqrt{6}}{2}}. \tag{100}$$

Finally, considering

$$\sum_{S\in\mathcal{S}\setminus\{\emptyset\}}\frac{p_S}{p_k}\,(\alpha_S-\bar{\alpha})^2 = \mathrm{Var}\!\left[\mathrm{Var}(y_i)\,|\,\omega_i = 1\right] \tag{101}$$

and using Lemma 1, we conclude

$$\left\|f_k - f_k^{\text{2-GMM}}\right\|_K \tag{102}$$

$$\le \frac{C'\gamma^2}{\lambda^4}\left(\frac{\sum_{i=1}^{n} x_i^4}{\gamma} + \sum_{i\neq j} x_i^2 x_j^2 - \frac{\|x\|^4}{p_k}\right) \tag{103}$$

with C' = C/4.
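The closed forms in (97)–(100) can be sanity-checked numerically. The sketch below evaluates the second derivative of g on a fine grid in t (for a few illustrative, hypothetical values of ξ) and verifies that its maximizer sits at t* = √((3 − √6)ξ) and that its maximum equals C·ξ⁻², with C as in (100).

```python
import numpy as np

# Numeric check of (97)-(100) on a grid in t, for a few values of xi.
def d2g(t, xi):
    # second derivative of g(t, alpha) = erf(t / sqrt(2*alpha)) w.r.t. alpha,
    # evaluated at alpha = xi, as in eq. (97)
    return np.exp(-t**2 / (2 * xi)) * t / xi**2.5 * (3 - t**2 / xi) \
           / (2 * np.sqrt(2 * np.pi))

# constant C of eq. (100), approximately 0.2753
C = np.sqrt(3 - np.sqrt(6)) * np.sqrt(3) / (2 * np.sqrt(np.pi)) \
    * np.exp(-(3 - np.sqrt(6)) / 2)

for xi in (0.5, 1.0, 2.0):
    t = np.linspace(0.0, 5.0 * np.sqrt(xi), 200_001)
    vals = d2g(t, xi)
    t_star = t[np.argmax(vals)]
    print(t_star / np.sqrt((3 - np.sqrt(6)) * xi),   # ratio close to 1
          vals.max() * xi**2 / C)                    # ratio close to 1
```

The ξ⁻² scaling of the maximum is exactly what turns the supremum over ξ ∈ [λ²/γ, ‖x‖²/γ] in (98) into the factor γ²/λ⁴ of (99).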