
Compressed Sensing: A Survey

Shruti Sharma
May 9, 2016
Abstract
Compressed sensing is a new concept in signal processing where one seeks to minimize the number of measurements taken from signals while still retaining the information necessary to approximate them well. Conventional approaches to sampling signals or images follow Shannon's celebrated theorem: the sampling rate must be at least twice the maximum frequency present in the signal. Compressed sensing theory asserts that one can indeed recover certain signals and images from far fewer samples or measurements than traditional methods use, provided the signal has a sparse representation in some transform domain.

1 Introduction

Data acquisition is ubiquitous. A standard data acquisition system first acquires the data at the Nyquist rate and then compresses it. Compression is possible because most real-life signals are compressible or sparse, meaning that only a few coefficients carry the information. This is also the basis of transform coding and the reason behind the success of codecs such as JPEG, JPEG 2000, MPEG, and H.264/AVC, to name a few. This paradigm is sometimes termed the sample-then-compress paradigm. However, in 2006, David Donoho in his seminal work proposed a mathematical framework for combining the acquisition and compression stages [1]. In the phenomenal work of Emmanuel Candès, Justin Romberg, and Terence Tao, it was shown that the original signal can be reconstructed from a few random linear frequency measurements, under the assumption that the signal is sparse in the appropriate domain [2].

Compressed sensing theory is a mathematical framework for acquiring a few information-rich measurements rather than many information-poor measurements. The underlying theory of compressed sensing builds on various branches of mathematics, including linear algebra, real analysis, topology, graph theory, and wavelet theory [3].

2 Basics of Compressed Sensing

The typical paradigm for obtaining a compressed version of a discrete signal represented by a vector $x \in \mathbb{R}^N$ is to choose an appropriate basis, compute the coefficients of $x$ in this basis, and then retain only the $k$ largest of these with $k < N$ [4,5,6].

2.1 Sparse Signals

Let $\Sigma_k$ be the set of all $k$-sparse vectors:

$\Sigma_k = \{x \in \mathbb{R}^N : |\mathrm{supp}(x)| \le k\}$   (2.1.1)

2.2 Compressible Signals

Due to noise during acquisition or due to sensor limitations, data is generally not exactly sparse, so sparsity imposes a stringent condition. With the introduction of weak $\ell_p$ spaces, compressible signals are defined as:
$\Sigma_{\mathrm{compressible}} = \{\, x \in \mathbb{R}^N : \#\{j : |x_j| \ge t\} \le M^p/t^p \ \text{ for all } t > 0 \,\}$   (2.2.1)
Thus, a signal is compressible if its sorted coefficients in some basis decay very rapidly, and it has been empirically found that most real-world signals satisfy this property. This is shown in Fig. 1.
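As a quick numerical illustration (not from the original paper), the following numpy sketch builds a hypothetical signal whose sorted coefficients obey a power-law decay and reports the relative best $k$-term approximation error; the sizes and decay exponent are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1024
# Hypothetical compressible signal: sorted magnitudes decay like j^(-1/p),
# with p = 0.5 (the smaller p, the faster the decay). Signs and order random.
p = 0.5
x = rng.permutation(np.arange(1, N + 1, dtype=float) ** (-1.0 / p))
x *= rng.choice([-1, 1], N)

sorted_mag = np.sort(np.abs(x))[::-1]          # non-increasing rearrangement
for k in (10, 50, 100):
    # Best k-term approximation error: energy left after keeping the
    # k largest-magnitude coefficients.
    err = np.sqrt(np.sum(sorted_mag[k:] ** 2)) / np.linalg.norm(x)
    print(f"k = {k:4d}  relative best-k-term error = {err:.3e}")
```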

2.3 Problem Formulation

The standard compressed sensing problem consists of reconstructing a sparse signal vector $x \in \mathbb{R}^N$ from under-determined measurements $y \in \mathbb{R}^M$ (also called the measurement vector), given a sensing matrix $\Phi \in \mathbb{R}^{M \times N}$. CS tries to design an encoder-decoder pair in such a way that the number of measurements required to capture the information of the underlying process is minimal, without destroying the information present in the signal.

Figure 1: Compressible Signal


2.3.1 Encoder

The encoder tries to find a representation of a sparse signal such that the number of measurements is minimal. Mathematically, it can be written as:

$y = \Phi x$   (2.3.1)

If the signal vector $x$ is not sparse in the canonical basis but does have a sparse representation in some dictionary $\Psi \in \mathbb{R}^{N \times N}$, i.e. $x = \Psi\theta$ with $\theta$ sparse, then the problem becomes:

$y = \Phi \Psi \theta$   (2.3.2)

However, CS requires $\Phi$ (the sensing matrix) to be as incoherent as possible with the dictionary $\Psi$.
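The encoding step (2.3.1) is just a matrix-vector product. A minimal numpy sketch, assuming a random Gaussian sensing matrix and a signal that is sparse in the canonical basis (all dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, k = 256, 64, 8                      # ambient dim, measurements, sparsity

# k-sparse signal in the canonical basis
x = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
x[support] = rng.standard_normal(k)

# Random Gaussian sensing matrix Phi (scaled so columns have unit expected
# norm) and the compressed measurement y = Phi x.
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
y = Phi @ x
print(y.shape)                            # (64,) -- far fewer entries than N
```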
2.3.2 Decoder

The decoder, or recovery algorithm, aims to reconstruct the sparse signal $x$ from the measurements $y$ with minimum reconstruction error. Mathematically,

$\hat{x} = \Delta(y)$   (2.3.3)
$\quad\ = \Delta(\Phi x)$   (2.3.4)

2.4 Nullspace Property (NSP)

Theorem: Given $\Phi \in \mathbb{R}^{M \times N}$, if $M \ge 2K$, $K$ being the sparsity level, then there is a decoder $\Delta$ such that:

$\Delta(\Phi x) = x \quad \forall\, x \in \Sigma_K$   (2.4.1)

This is equivalent to saying

$\Sigma_{2K} \cap \mathcal{N}(\Phi) = \{0\}$   (2.4.2)

which is also known as the Null Space Property. Thus, one of the major conditions for perfect reconstruction of $k$-sparse vectors is that the nullspace of $\Phi$ should contain only dense vectors (besides the zero vector).

2.5 Coherence and Babel Function

The nullspace property is not easily verifiable by direct computation. So the coherence parameter $\mu$, which can be loosely used as a proxy for the NSP, is defined as:

$\mu = \max_{i \ne j} \dfrac{|\phi_i^T \phi_j|}{\|\phi_i\|\,\|\phi_j\|}$   (2.5.1)

Coherence is used to assess the quality of the measurements, and ideally it should be 0 or close to 0.
However, coherence is a blunt penalization measure: even if only a few columns of the sensing matrix are correlated, the coherence will be high. Thus, a more robust measure to assess the quality of the sensing matrix is the Babel function $\mu_1(p)$, defined as:

$\mu_1(p) = \max_{|\Lambda| = p}\ \max_{j \notin \Lambda} \sum_{i \in \Lambda} \dfrac{|\phi_i^T \phi_j|}{\|\phi_i\|\,\|\phi_j\|}$   (2.5.2)
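Both quantities can be computed directly from the Gram matrix of the normalized columns. A minimal numpy sketch, assuming the columns of $\Phi$ are the atoms $\phi_i$ (the helper names `coherence` and `babel` are illustrative, not from the cited literature):

```python
import numpy as np

def _normalized_gram(Phi):
    """Absolute Gram matrix of the column-normalized matrix, zero diagonal."""
    norms = np.linalg.norm(Phi, axis=0)
    G = np.abs(Phi.T @ Phi) / np.outer(norms, norms)
    np.fill_diagonal(G, 0.0)
    return G

def coherence(Phi):
    """Mutual coherence (Eq. 2.5.1): largest normalized inner product
    between two distinct columns."""
    return _normalized_gram(Phi).max()

def babel(Phi, p):
    """Babel function mu_1(p) (Eq. 2.5.2): for each column, sum its p
    largest normalized inner products with other columns, then take the max."""
    G = _normalized_gram(Phi)
    G_desc = -np.sort(-G, axis=0)          # sort each column in descending order
    return G_desc[:p, :].sum(axis=0).max()

rng = np.random.default_rng(2)
Phi = rng.standard_normal((64, 256)) / np.sqrt(64)
print(coherence(Phi), babel(Phi, 5))
```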

2.6 Restricted Isometry Property (RIP)

The Restricted Isometry Property (RIP) ensures robustness of the reconstruction of $k$-sparse vectors. We say $\Phi$ satisfies the RIP of order $K$ if there is a $0 < \delta_K < 1$ such that

$(1 - \delta_K)\,\|x\|_{\ell_2}^2 \ \le\ \|\Phi x\|_{\ell_2}^2 \ \le\ (1 + \delta_K)\,\|x\|_{\ell_2}^2 \qquad \forall\, x \in \Sigma_K$   (2.6.1)

Random matrices satisfy the RIP with overwhelming probability.
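Verifying the RIP exactly is combinatorial, but the near-isometry behaviour of a random Gaussian matrix can be sanity-checked empirically on randomly drawn $K$-sparse vectors. A minimal numpy sketch (it samples random supports only, so it is an illustration, not a certificate of the RIP):

```python
import numpy as np

rng = np.random.default_rng(3)
N, M, K, trials = 512, 128, 10, 2000

Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # so E||Phi x||^2 = ||x||^2

ratios = []
for _ in range(trials):
    x = np.zeros(N)
    idx = rng.choice(N, size=K, replace=False)
    x[idx] = rng.standard_normal(K)
    ratios.append(np.linalg.norm(Phi @ x) ** 2 / np.linalg.norm(x) ** 2)

ratios = np.array(ratios)
# If Phi behaved like a restricted isometry of order K, every ratio would
# lie inside [1 - delta_K, 1 + delta_K] for some small delta_K.
print(f"min ratio = {ratios.min():.3f}, max ratio = {ratios.max():.3f}")
```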

2.7 Recovery Algorithms

For practical purposes, recovering $k$-sparse signals in real time is essential. Basically, we need to solve the following optimization problem:

$\min_x \|x\|_0$   (2.7.1)
$\text{s.t. } y = \Phi x$   (2.7.2)

That is, we search for the sparsest vector consistent with the measured data $y = \Phi x$. However, this problem is NP-hard and cannot be solved in real time for large $N$. Various algorithms have been proposed in the literature to solve this problem, either by convexification of the objective function or by greedy strategies. Nowadays, Bayesian methods exploiting the Sparse Bayesian Learning framework are gaining much attention from the research community. Recovery algorithms mainly fall into the following categories:
- Optimization methods: Basis Pursuit, Basis Pursuit Denoising
- Greedy methods: Orthogonal Matching Pursuit (OMP), Compressive Sampling Matching Pursuit (CoSaMP)
- Thresholding-based methods: Hard Thresholding, Iterative Hard Thresholding (IHT) (a minimal IHT sketch is given right after this list)
- Bayesian algorithms: MAP-based approach, hierarchical or empirical Bayes
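As announced above, here is a minimal numpy sketch of Iterative Hard Thresholding from the thresholding family; the function name, the fixed iteration count, and the step size derived from the spectral norm of $\Phi$ are illustrative choices, not a reference implementation:

```python
import numpy as np

def iht(Phi, y, k, n_iter=200, step=None):
    """Iterative Hard Thresholding: gradient step on ||y - Phi x||^2
    followed by projection onto the set of k-sparse vectors."""
    M, N = Phi.shape
    if step is None:
        # Conservative step size: 1 / ||Phi||_2^2 keeps the iteration stable.
        step = 1.0 / np.linalg.norm(Phi, 2) ** 2
    x = np.zeros(N)
    for _ in range(n_iter):
        x = x + step * Phi.T @ (y - Phi @ x)     # gradient step
        keep = np.argsort(np.abs(x))[-k:]        # indices of k largest entries
        x_thr = np.zeros(N)
        x_thr[keep] = x[keep]
        x = x_thr                                # hard thresholding
    return x
```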
2.7.1 Optimization Methods

In optimization methods (like Basis Pursuit), we convexify the $\ell_0$-norm objective function and replace it by the $\ell_1$ norm as follows:

$\min_x \|x\|_1$   (2.7.3)
$\text{s.t. } y = \Phi x$   (2.7.4)

It can be shown that the sparsest possible solution also has the minimum $\ell_1$ norm, provided the NSP and RIP properties of the matrix $\Phi$ are satisfied.
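Problem (2.7.3)-(2.7.4) can be recast as a linear program by splitting $x = u - v$ with $u, v \ge 0$ and minimizing $\sum_i (u_i + v_i)$. A minimal sketch using scipy's general-purpose `linprog` solver (the helper name `basis_pursuit` and the test dimensions are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(Phi, y):
    """Solve min ||x||_1 s.t. Phi x = y as a linear program by writing
    x = u - v with u, v >= 0 and minimizing sum(u) + sum(v)."""
    M, N = Phi.shape
    c = np.ones(2 * N)                     # objective: sum of u and v entries
    A_eq = np.hstack([Phi, -Phi])          # Phi (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:N], res.x[N:]
    return u - v

rng = np.random.default_rng(4)
N, M, k = 128, 48, 5
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N)
x_true[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
x_hat = basis_pursuit(Phi, Phi @ x_true)
print(np.linalg.norm(x_hat - x_true))      # typically tiny when recovery succeeds
```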

Figure 2: Orthogonal Matching Pursuit


2.7.2 Rationale of Greedy Recovery Algorithms

We solve the following optimization problem directly:

$\min_x \|x\|_0$   (2.7.5)
$\text{s.t. } y = \Phi x$   (2.7.6)

Greedy algorithms involve two main tasks to solve the above optimization problem:

- Identification of the sparse support.
- Estimation of the magnitudes of the values over this support.

For example, Fig. 2 demonstrates the OMP algorithm; a minimal sketch follows.
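The numpy sketch below follows exactly these two tasks: greedy support identification followed by a least-squares estimate on the selected columns. The function name and the stopping rule (a fixed number of iterations equal to the sparsity $k$) are illustrative choices:

```python
import numpy as np

def omp(Phi, y, k):
    """Orthogonal Matching Pursuit: greedily identify the support, then
    estimate the magnitudes by least squares on the selected columns."""
    M, N = Phi.shape
    residual = y.copy()
    support = []
    for _ in range(k):
        # Identification: column most correlated with the current residual.
        j = int(np.argmax(np.abs(Phi.T @ residual)))
        if j not in support:
            support.append(j)
        # Estimation: least-squares fit over the current support.
        coef, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        residual = y - Phi[:, support] @ coef
    x_hat = np.zeros(N)
    x_hat[support] = coef
    return x_hat
```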

3 Adaptive Compressed Sensing [7]

An object of interest can often be represented as a linear combination of a small number of elementary functions. The sparse recovery problem is to extract those elementary functions from various measurements. Most sparse recovery methods deal with non-adaptive measurements:

$y = \begin{bmatrix} \phi_1^T \\ \phi_2^T \\ \vdots \\ \phi_M^T \end{bmatrix} x$

Here, each entry of $y$ is chosen independently of $x$. In the adaptive CS framework, we can choose the entries $y_i$ sequentially, i.e., each entry depends on previously gathered information, and hence emphasize the subspaces that are more likely to contain $x$.

3.1 Bayesian Adaptive Sensing

The Bayesian perspective provides a natural framework for sequential adaptive sensing, wherein information gleaned from previous measurements is used to automatically adjust and focus the sensing.
Let $Q_1$ be a probability measure over all $m \times n$ matrices having unit Frobenius norm in expectation. This still implies that the SNR is independent of $m$, but it also allows for the possibility of distributing the measurement budget more flexibly across the columns. By adaptively adjusting the variances of the distributions used to generate the entries, these sensing matrices can place more or less emphasis on certain components of the signal. Ideally, this adaptive and sequential approach to sensing will tend to focus on $x$, so that sensing energy is allocated to the correct subspace, increasing the SNR of the measurements relative to non-adaptive sensing.

4 Statistical Compressed Sensing [8]

Statistical Compressed Sensing (SCS) aims at efficiently sampling a collection of signals that follow a statistical distribution and achieving accurate reconstruction on average:

$\mathbb{E}_x \|x - \Delta(\Phi x)\|_X = \displaystyle\int \|x - \Delta(\Phi x)\|_X \, f(x)\, dx$   (4.0.1)

Unlike the original compressed sensing problem, where the aim is to reconstruct each signal with minimum error, here the aim is to control the average error over a collection of signals. Thus, modeling the signal with a suitable prior not only provides a statistical framework but can also be extended to structured sparsity models.

5 Dictionary Learning [9]

Representation of signal vectors $x \in \mathbb{R}^m$ in terms of a small number of basis vectors from an over-complete dictionary is the main task of sparse coding. There, we assume knowledge of the sensing matrix or dictionary and find the sparse representation. However, if the dictionary is not known, we try to learn this matrix. Formally, given a set of data $\{x_i\}_{i=1}^T$, we wish to find the dictionary such that the data vectors are expected to have a sparse representation in it.

5.1 K-SVD

K-SVD is the most celebrated algorithm in the literature for dictionary learning. It is an iterative algorithm that alternates between two stages:

- Sparse coding
- Codebook update
5.1.1 Sparse Coding

Sparse coding involves solving the following optimization problem:

$\min_{D,x} \{\|y - Dx\|_F^2\}$   (5.1.1)
$\text{s.t. } \|x_i\|_0 \le T_0 \ \ \forall i$   (5.1.2)

Here, $D$ is the dictionary to be found. With $D$ held fixed, this problem can be solved by any of the methods mentioned in Section 2.7.
5.1.2 Codebook Update

In the codebook update, the steps are the following:

- $x$ and all columns of $D$ except the $k$-th column are held fixed.
- $Dx$ is decomposed into a sum of rank-1 matrices:


$\|y - Dx\|_F^2 = \Big\|\, y - \sum_{j=1}^{K} d_j x_T^j \,\Big\|_F^2$   (5.1.3)
$\qquad\qquad\ \ = \Big\|\, \Big(y - \sum_{j \ne k} d_j x_T^j\Big) - d_k x_T^k \,\Big\|_F^2$   (5.1.4)
$\qquad\qquad\ \ = \|E_k - d_k x_T^k\|_F^2$   (5.1.5)

where $d_j$ is the $j$-th atom (column) of $D$ and $x_T^j$ is the $j$-th row of the coefficient matrix.

Here, $E_k$ is the error for all examples when the $k$-th atom is removed. The SVD then finds the closest rank-1 approximation of $E_k$, thereby minimizing this error.
It should be noted that optimizing over both the dictionary and the coefficients in the expansions results in a non-convex program, even when using $\ell_1$-minimization. Therefore, it is notoriously hard to establish a rigorous mathematical theory of dictionary learning, despite the fact that the algorithms perform well in practice.
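A minimal numpy sketch of the update of a single atom along the lines of Eqs. (5.1.3)-(5.1.5), assuming a data matrix `Y`, dictionary `D`, and coefficient matrix `X`. Restricting the SVD to the signals that actually use atom $k$ preserves the sparsity pattern, as in K-SVD, though this helper is only an illustration:

```python
import numpy as np

def ksvd_atom_update(Y, D, X, k):
    """One codebook-update step for atom k: form the residual E_k without
    atom k, restrict it to the signals that use atom k, and replace
    (d_k, x_T^k) by the best rank-1 approximation of that residual."""
    omega = np.nonzero(X[k, :])[0]           # signals whose code uses atom k
    if omega.size == 0:
        return D, X                          # atom unused; leave it untouched
    # Error with the contribution of atom k removed: Y - sum_{j != k} d_j x_T^j
    E_k = Y - D @ X + np.outer(D[:, k], X[k, :])
    U, s, Vt = np.linalg.svd(E_k[:, omega], full_matrices=False)
    D[:, k] = U[:, 0]                        # new atom: leading left singular vector
    X[k, omega] = s[0] * Vt[0, :]            # new coefficients on the same support
    return D, X
```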

6 Structured Compressed Sensing [10,11]

The measurement bound $M \ge 2K$ has not been achieved to date. To obtain a lower measurement bound, either structure needs to be imposed on the sensing matrix, or the standard sparsity prior needs to be replaced by a more general one. Both of these directions take into account real-time compressed sensing involving hardware implementations. In addition, sensing modalities impose constraints on the type of sensing matrix that can be used. Most of the time, it is not feasible to implement random matrices in real-world applications. So, imposing a specific structure on the sensing matrix without harming its properties (e.g., the RIP) is a holy grail of the CS literature.
Non-linear signal estimation (as in greedy algorithms) based on sparse models enjoys the full degrees of freedom of subspaces over all possible combinations of dimensions, whose number is very large. Also, optimization over general unstructured over-complete dictionaries is often expensive and unstable. Therefore, dictionaries must be replaced by more structured measurement operators, depending on the application of interest, e.g., wireless channels, analog sampling hardware, sensor networks, and optical imaging.
In the CS literature, various structures are imposed on the measurements, namely:

Figure 3: Multiple Measurement Vectors


6.0.3 Structure in finite-dimensional models

- Multiple measurement vectors (MMV)
- Union of subspaces
6.0.4 Structure in infinite-dimensional models

- Shift-invariant spaces of analog signals
- Finite union of infinite-dimensional subspaces
- Infinite union of finite-dimensional subspaces
- Infinite union of infinite-dimensional subspaces

6.1 Multiple Measurement Vectors (MMV) [12,13]

In the multiple measurement vector (MMV) model, the support of every column of the data matrix is identical (the common sparsity assumption), so there are few non-zero rows. This ensures a unique and global solution. Mathematically,

$Y = \Phi X + V$   (6.1.1)

where the measurement matrix $Y = [Y_{\cdot 1}\, Y_{\cdot 2}\, \ldots\, Y_{\cdot L}] \in \mathbb{R}^{N \times L}$, the data matrix $X = [X_{\cdot 1}\, X_{\cdot 2}\, \ldots\, X_{\cdot L}] \in \mathbb{R}^{M \times L}$, the mixing matrix $\Phi \in \mathbb{R}^{N \times M}$, and $V$ is the noise matrix. This is shown in Fig. 3.
Various methods have been proposed in the literature which exploit the correlation among the entries of the data matrix $X$. One such family of methods falls into the category of Model-Based Compressed Sensing [14]. However, it is very difficult to incorporate a correlation structure into the objective function while also ensuring the sparsity of the representation. Bayesian methods have proven to be very useful in this context, because the Sparse Bayesian Learning framework incorporates the correlation term by introducing posterior densities over the priors in a hierarchical manner [15].
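A minimal numpy sketch of the MMV generative model (6.1.1), assuming a row-sparse data matrix with a common support across all $L$ snapshots (the dimensions and noise level are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N, M, L, k = 40, 100, 6, 5      # measurements, ambient dim, snapshots, sparsity
# (MMV convention of Eq. 6.1.1: Phi is N x M with N < M)

Phi = rng.standard_normal((N, M)) / np.sqrt(N)

# Row-sparse data matrix X: the same k rows are non-zero in every column,
# which is exactly the common sparsity assumption.
support = rng.choice(M, size=k, replace=False)
X = np.zeros((M, L))
X[support, :] = rng.standard_normal((k, L))

V = 0.01 * rng.standard_normal((N, L))        # additive noise
Y = Phi @ X + V                               # MMV measurements, Eq. (6.1.1)
print(Y.shape, np.count_nonzero(np.linalg.norm(X, axis=1)))  # (40, 6) and 5
```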

7 Conclusion

Signal acquisition based on compressive sensing can be more efficient than traditional sampling for sparse or compressible signals. We can conclude from this survey that mathematical and computational methods can have an enormous impact in areas where conventional hardware design faces significant limitations.

References
[1] D. Donoho, "Compressed Sensing," IEEE Trans. on Information Theory, Vol. 52, No. 4, pp. 1289-1306, 2006.
[2] E. J. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information," IEEE Trans. on Information Theory, Vol. 52, No. 2, pp. 489-509, 2006.
[3] S. Mallat, A Wavelet Tour of Signal Processing, San Diego, CA: Academic, 1998.
[4] A. Cohen, W. Dahmen, and R. DeVore, "Compressed sensing and best k-term approximation," 2007.
[5] S. Foucart and H. Rauhut, A Mathematical Introduction to Compressive Sensing, Springer Science, 2013.
[6] J. A. Tropp, "Greed is Good: Algorithmic Results for Sparse Approximation," IEEE Trans. on Information Theory, Vol. 50, No. 10, pp. 2231-2242, 2004.
[7] J. Haupt and R. Nowak, "Adaptive Sensing for Sparse Recovery," in Compressed Sensing: Theory and Applications, Cambridge University Press, Chapter 6, 2012.
[8] G. Yu and G. Sapiro, "Statistical Compressed Sensing," IEEE Trans. on Signal Processing, Vol. 59, No. 12, pp. 5842-5858, 2011.
[9] M. Aharon, M. Elad, and A. Bruckstein, "The K-SVD: An algorithm for designing of over-complete dictionaries for sparse representation," IEEE Trans. Signal Process., Vol. 54, No. 11, pp. 4311-4322, 2006.
[10] M. F. Duarte and Y. C. Eldar, "Structured Compressed Sensing: From Theory to Applications," IEEE Trans. on Signal Processing, Vol. 59, No. 9, pp. 4063-4085, 2011.
[11] L. Zelnik-Manor, K. Rosenblum, and Y. C. Eldar, "Sensing Matrix Optimization for Block-Sparse Decoding," IEEE Trans. on Signal Processing, Vol. 59, No. 9, pp. 4300-4312, 2011.
[12] J. A. Palmer, D. P. Wipf, K. Kreutz-Delgado, and B. D. Rao, "Variational EM Algorithms for Non-Gaussian Latent Variable Models," Advances in Neural Information Processing Systems, MIT Press, 2005.
[13] Z. Zhang and B. D. Rao, "Sparse signal recovery with temporally correlated source vectors using sparse Bayesian learning," IEEE Journal of Selected Topics in Signal Processing, Vol. 5, No. 5, pp. 912-926, 2011.
[14] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde, "Model-Based Compressed Sensing," IEEE Trans. on Information Theory, Vol. 56, No. 4, pp. 1982-2001, 2010.
[15] M. E. Tipping, "Sparse Bayesian learning and the relevance vector machine," JMLR, Vol. 1, pp. 211-244, 2001.
