
Robust Signal Recovery: Designing Stable Measurement

Matrices and Random Sensing


Ragib Morshed
Ami Radunskaya
April 3, 2009
Pomona College
Department of Mathematics
Submitted as part of the senior exercise for the degree of
Bachelor of Arts in Mathematics
Acknowledgements
I would like to thank my family for pushing me so far in life, my advisors, Ami Radunskaya and Tzu-Yi Chen, for all their help and motivation, and my friends and everyone else who has made my college life really amazing. This thesis is not just a product of my own effort, but more.
Abstract
In recent years a series of papers have developed a collection of results and theories showing that it is possible to reconstruct a signal $f \in \mathbb{R}^n$ from a limited number of linear measurements of $f$. This broad collection of results forms the basis of the intriguing theory of compressive sensing, and has far-reaching implications in areas of medical imaging, compression, coding theory, and wireless sensor network technology. Suppose $f$ is well approximated by a linear combination of $m$ vectors taken from a known basis $\Psi$. Given that we know nothing in advance about the signal, $f$ can be reconstructed with high probability from a limited number of nonadaptive measurements. The reconstruction technique is concrete and involves solving a simple convex optimization problem.
In this paper, we look at the problem of designing sensing or measurement matrices. We explore a number of properties that such matrices need to satisfy. We prove that verifying one of these properties is an NP-complete problem, and give an approximation algorithm for estimating that property. We also discuss the relation of randomness and random matrix theory to measurement matrices, develop a template based on eigenvalue distributions that can help determine which random matrix ensembles are suitable as measurement matrices, and look at deterministic techniques for designing measurement matrices. We prove the suitability of a new random matrix ensemble using the template. We develop approaches to quantifying randomness in matrices using entropy, and a computational technique to identify good measurement matrices. We also briefly discuss some of the more recent applications of compressive sensing.
Contents

1 Introduction
1.1 Nyquist-Shannon Theorem and Signal Processing
1.2 Compressive Sensing: A Novel Sampling/Sensing Mechanism
1.3 Summary

2 Background
2.1 The Sensing Problem
2.2 Sparsity and Compressible Signals
2.2.1 Transform Coding and its Inefficiencies
2.3 Incoherence
2.4 What is Compressive Sensing Trying to Solve?
2.5 Relation to Classical Linear Algebra
2.6 $\ell_2$ norm minimization
2.7 $\ell_0$ norm minimization
2.8 Basis Pursuit
2.9 Summary

3 Signal Recovery from Incomplete Measurements

4 Robust Compressive Sensing

5 Designing Measurement Matrices
5.1 Randomness in Sensing
5.2 Restricted Isometry Property (RIP)
5.2.1 Statistical Determination of Suitability of Matrices
5.3 Uniform Uncertainty Principle (UUP) and Exact Reconstruction Principle (ERP)
5.4 Random Matrix Theory
5.4.1 Gaussian ensemble
5.4.2 Laguerre ensemble
5.4.3 Jacobi ensemble
5.5 Deterministic Measurement Matrices
5.6 Measurement of Randomness
5.6.1 Entropy Approach
5.6.2 Computational Approach
5.7 Summary

6 Applications of Compressive Sensing

7 Conclusion

List of Figures

1.1 How Shannon's theorem is used.
2.1 Example of sparsity in digital images. The expansion of the image on a wavelet basis is shown to the right of the image.
2.2 $\ell_2$ norm minimization.
2.3 $\ell_0$ norm minimization.
2.4 $\ell_1$ norm minimization.
5.1 Typical measurement matrix dimensions.
5.2 Reduction using Independent Set. Vertices a and c form an independent set of this graph.
5.3 Statistical determination of Restricted Isometry Property [23].
5.4 Eigenvalue distribution of a Wishart matrix. Parameters: n=50, m=100.
5.5 Eigenvalue distribution of a Manova matrix. Parameters: n=50, m1=100, m2=100 (n × m1 and n × m2 are the dimensions of the matrices G forming the two Wishart matrices, respectively).
Chapter 1
Introduction
Signals are mathematical functions of independent variables that carry information. One can also view a signal as an electrical representation of a time-varying or spatially varying physical quantity, the so-called digital signal. Signal processing is the field that deals with the analysis, interpretation, and manipulation of such signals. The signals of interest can be sound, video, images, or biological signals such as those arising in MRIs and wireless sensor networks, among others. Due to the myriad applications of signal processing systems in our day-to-day life, it is an important field of research.
The concept of compression has also enabled us to store and transmit such signals in many modern-day applications. For example, image compression algorithms help reduce data sets by orders of magnitude, enabling the advent of systems that can acquire extremely high-resolution images [3]. Signals and compression are closely interlinked, and this has made it feasible to develop all sorts of modern-day innovations. In order to acquire the information in a signal, the signal needs to be sampled. Conventionally, the sampling of signals is governed by Shannon's celebrated theorem.
1.1 Nyquist-Shannon Theorem and Signal Processing
Theorem 1. Suppose $f$ is a continuous-time signal whose highest frequency is less than $W/2$. Then
$$f(t) = \sum_{n \in \mathbb{Z}} f\!\left(\frac{n}{W}\right) \operatorname{sinc}(Wt - n), \qquad \text{where } \operatorname{sinc}(x) = \frac{\sin(\pi x)}{\pi x}.$$
Here $f$ has a continuous Fourier transform, and the $f(n/W)$ are the samples of $f(t)$ taken at the points $n/W$. We reconstruct the signal $f$ exactly from its samples, weighted with shifted, scaled sinc functions. We deal with the proof of this theorem later in the paper (a proof outline is given in Appendix A). In essence, the theorem states that in order to reconstruct a signal perfectly, the sampling rate must be at least twice the maximum frequency present in the signal.
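To make the interpolation formula concrete, the short Python sketch below reconstructs a bandlimited signal from its samples via the sinc expansion above. The test signal, the bandwidth W, and the finite sampling grid are illustrative assumptions (the theorem itself uses an infinite sum).

import numpy as np

# Bandlimited test signal: highest frequency 3 Hz < W/2 with W = 8 Hz.
W = 8.0
f = lambda t: np.sin(2 * np.pi * 1.5 * t) + 0.5 * np.cos(2 * np.pi * 3.0 * t)

# Samples f(n/W) on a finite grid.
n = np.arange(-200, 201)
samples = f(n / W)

# Shannon reconstruction: f(t) = sum_n f(n/W) * sinc(W t - n),
# where numpy's sinc(x) = sin(pi x) / (pi x).
t = np.linspace(-1, 1, 500)
f_rec = np.array([np.sum(samples * np.sinc(W * ti - n)) for ti in t])

print("max reconstruction error:", np.max(np.abs(f_rec - f(t))))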
Figure 1.1: How Shannon's theorem is used.
The example in Figure 1.1 makes Theorem 1 clearer. The curve is the Fourier transform of the signal that is to be sampled. According to the Nyquist-Shannon theorem, in order to reconstruct the original signal (or its Fourier transform) perfectly, the signal needs to be sampled at twice the maximum frequency present in the signal (the red dots in the figure in this example).
This sampling frequency, W in this case, is known as the Nyquist rate, and it underlies all signal acquisition protocols used in consumer electronics, radio receivers, and medical imaging devices. There are systems and devices that are not naturally bandlimited, but their construction usually involves bandlimiting filters before sampling, and so they too are dictated by Shannon's theorem [1].
The Nyquist rate turns out to be very high in many cases, leading to inefficiencies in the acquisition system. For example, the millions of pixels in a typical digital camera take in a great deal of information about the environment, but the camera then uses compression techniques to represent that information: after transforming the signal into an appropriate basis, it stores only the important coefficients and throws away the rest of the insignificant ones. This becomes a particular problem for MRI devices, which are hardware limited [1]. For MRI devices, obtaining a really good MRI image requires many samples, which translates into keeping the patient in the MRI machine for a very long time. Compared to typical MRI scan times of at most a few minutes, this would require having a patient in the machine for a few hours. In many cases this is practically impossible because of the cost and time constraints involved.
However, it is possible to move away from the traditional method of signal acquisition towards more efficient methods. In fact, it might be possible to build the compression process into the acquisition process itself. In recent years, a novel sensing/sampling paradigm known as compressive sensing or compressive sampling (CS) has emerged that can essentially solve this problem.
1.2 Compressive Sensing: A Novel Sampling/Sensing Mechanism
Compressive sensing theory asserts that it is possible to recover certain signals from far fewer measurements or samples than dictated by the traditional theory. From an information theory point of view, this is equivalent to saying that, given a few measurements of some data which do not seem to be enough to reconstruct the data accurately, it may still be possible to reconstruct the data provided certain structure exists in it. An interesting aspect of compressive sampling is that it touches upon numerous fields in the applied sciences and engineering such as statistics, information theory, coding theory, and theoretical computer science [4]. Currently, a number of research groups are working on different applications of compressive sensing as well. For example, compressive sensing techniques have been used to develop a robust face recognition algorithm [15] and to classify human actions using wearable motion sensor networks [16]. In this paper, we look at this new theory in detail, especially focusing on the design of measurement matrices for signal acquisition.
1.3 Summary
In this section we looked at the traditional theory that guides almost all signal processing systems (for a proof of the theorem see Appendix A). We looked at some of the problems associated with this theory when the measurement process is limited or very expensive. A possible solution to the problem of reconstructing data from seemingly fewer measurements than necessary is offered by the theory of compressive sensing, which asserts that, under certain constraints, it is possible to reconstruct data from far fewer measurements than dictated by the traditional theory.
Chapter 2
Background
Compressive sensing, also known as compressive sampling, emerged as a theory around the year 2000 and exploded in the following years. Compressive sensing goes against the conventional wisdom of signal processing (the Nyquist-Shannon theorem): surprisingly, it is possible to reconstruct a signal (for example an image) quite accurately, or even exactly, from very few samples compared to the original signal dimension. The novelty of compressive sensing lies in its underlying challenge to typical signal processing techniques [4].
Before delving into the theory behind compressive sensing, one needs to understand the sensing problem itself from a mathematical perspective.
2.1 The Sensing Problem
The sensing problem is simply the question of how one obtains information about a signal before any kind of processing is applied to it. Throughout this paper, for all the sensing mechanisms described, information about a signal $f(t)$ is obtained through linear functionals recording the values
$$y_k = \langle f, \varphi_k \rangle, \qquad k = 1, \ldots, m. \qquad (2.1)$$
The idea is to correlate the object/signal one wants to acquire with some sort of waveform $\varphi_k(t)$. For example, the sensing waveforms $\varphi_k$ can be spike functions; in that case, the resulting vector $y$ is a vector of samples of $f$ in the time or space domain. From a more applied perspective, the waveforms can be thought of as indicator functions of pixels; $y$ is then the image data typically collected by the sensors in a digital camera [1]. In this paper, we restrict ourselves to the discrete signal case, $f \in \mathbb{R}^n$. The discrete case is simpler, and the current theory of compressive sensing is far more developed for this case than for the continuous one, although the continuous case is quite similar [1].
The theory of compressive sensing is built upon two very important concepts: sparsity and incoherence. These two ideas form the fundamental premise that underlies all of compressive sensing.
2.2 Sparsity and Compressible Signals
The concept of sparsity is not new in signal processing. A signal $f$ is said to be sparse if the information in the signal can be represented using only a few non-zero coefficients when the signal is expanded in an orthonormal basis $\Psi$. More formally, $f \in \mathbb{R}^n$ (for example, an $n$-pixel image) is expanded in an orthonormal basis, such as a wavelet basis, $\Psi = [\psi_1, \psi_2, \ldots, \psi_n]$, as follows [1]:
Definition 1 (Sparsity).
$$f(t) = \sum_{i=1}^{n} x_i \psi_i(t) \qquad (2.2)$$
where $x$ is the coefficient sequence of $f$, $x_i = \langle f, \psi_i \rangle$.
The signal $f$ is said to be $S$-sparse if only $S$ of the coefficients of $x$ are non-zero. The essential point here is that when a signal has a sparse representation, one can discard the small coefficients without any significant loss of information in the signal.
Real-world signals like $f$ are not exactly sparse in any orthogonal basis. However, a common model termed compressible signals quite accurately approximates the nature of sparse representations [6]. A compressible signal is one whose reordered expansion coefficients decay like a power law; i.e., if the coefficients of $f$ are arranged in decreasing order of magnitude, the $n$th largest entry obeys $|x|_{(n)} \le \mathrm{Const}\cdot n^{-s}$ for some $s \ge 1$. Ideas of compression are already in use in many typical compression algorithms (e.g., JPEG-2000) [2].
Another example of a typical sparse representation basis for signals is the well-known Fourier basis, $\psi_j(t) = n^{-1/2} e^{i 2\pi j t / n}$.
Figure 2.1 shows an example of sparsity and a compressible signal (the picture is taken from [1]). The signal is an image which can be transformed into a one-dimensional vector with the pixel values forming the entries of this vector. The expansion of this signal on a wavelet basis is shown to the right of the picture. Looking at the expansion, it is apparent that this signal is not exactly sparse, since there are many nonzero coefficients, some larger than others. However, this signal can be modeled as compressible. By introducing a threshold level in the plot such that any nonzero value smaller than this threshold is treated as zero, and any nonzero value above the threshold is left as it is, one can model sparsity quite accurately.
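The thresholding idea above can be made concrete with a few lines of Python. The sketch below builds a compressible coefficient vector whose sorted magnitudes decay like a power law, keeps only the S largest coefficients, and reports how little energy is lost; the decay exponent and dimensions are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n, S = 10_000, 100

# Compressible coefficients: magnitudes decay like n^(-1.5),
# with random signs and a random permutation of positions.
mags = np.arange(1, n + 1) ** -1.5
x = rng.permutation(mags * rng.choice([-1.0, 1.0], size=n))

# Keep only the S largest-magnitude coefficients (hard thresholding).
idx = np.argsort(np.abs(x))[-S:]
x_S = np.zeros(n)
x_S[idx] = x[idx]

rel_err = np.linalg.norm(x - x_S) / np.linalg.norm(x)
print(f"relative l2 error keeping {S} of {n} coefficients: {rel_err:.4f}")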
2.2.1 Transform Coding and its Inefficiencies
Many modern-day systems use the concept of sparsity, for example the JPEG-2000 lossy coders for image compression mentioned in Section 2.2. In this section we briefly explore the idea of lossy compressors and look at the inefficiencies inherent in this technique. In general, transform coding works as follows [2]:
- Compute the coefficient vector $x$ from $f$.
- Adaptively encode the locations and values of the significant coefficients.
- Throw away the smaller coefficients.
Figure 2.1: Example of sparsity in digital images. The expansion of the image on a wavelet
basis is shown to the right of the image.
Lossy compressors throw away much of the information initially acquired while encoding it into a lower-dimensional representation: many extra measurements are taken before encoding is done, and a large chunk of the measured information is simply discarded during encoding. Compressive sensing avoids this inefficiency by building the compression into the acquisition process itself. The idea of sparsity has more implications in compressive sensing than just its definition; it helps determine how efficiently one can acquire signals non-adaptively.
2.3 Incoherence
Another important feature that is essential to compressive sensing is the concept of incoherence. The formal definition of incoherence is as follows [1]:
Definition 2 (Incoherence). Given a pair of orthobases $(\Phi, \Psi)$ of $\mathbb{R}^n$, the coherence between the sensing basis $\Phi$ and the representation basis $\Psi$ is
$$\mu(\Phi, \Psi) = \sqrt{n} \, \max_{1 \le k, j \le n} |\langle \varphi_k, \psi_j \rangle|. \qquad (2.3)$$
From linear algebra, $\mu(\Phi, \Psi) \in [1, \sqrt{n}]$.
In plain English, the coherence measures the largest correlation between any two elements of $\Phi$ and $\Psi$. Conceptually, the idea of coherence extends the duality between time and frequency and expresses the idea that an object having a sparse representation in one domain must have a spread-out representation in the domain in which it is acquired.
For example, let $\Phi$ be the spike basis $\varphi_k(t) = \delta(t - k)$ and $\Psi$ the Fourier basis $\psi_j(t) = n^{-1/2} e^{i 2\pi j t / n}$. The coherence between this pair of bases is 1.
In compressive sensing, we are interested in pairs with very low coherence, i.e., a coherence value close to 1. Intuitively this makes sense: the higher the incoherence between the sensing and representation domains, the fewer the coefficients necessary to represent the information in the signal.
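The coherence of Definition 2 is easy to compute numerically. The sketch below evaluates $\mu(\Phi,\Psi)$ for the spike (identity) basis paired with the orthonormal DCT basis, with a random orthonormal basis, and with itself; the dimension and the use of the DCT as a convenient stand-in for a Fourier-like basis are assumptions made for illustration.

import numpy as np
from scipy.fft import dct
from scipy.stats import ortho_group

def coherence(Phi, Psi):
    # mu(Phi, Psi) = sqrt(n) * max_{k,j} |<phi_k, psi_j>| for orthobases (columns).
    n = Phi.shape[0]
    return np.sqrt(n) * np.max(np.abs(Phi.T @ Psi))

n = 64
spike = np.eye(n)                          # spike (identity) basis
dct_basis = dct(np.eye(n), norm="ortho")   # orthonormal DCT-II basis
random_basis = ortho_group.rvs(n, random_state=0)

print("spike vs DCT:   ", coherence(spike, dct_basis))     # ~ sqrt(2): nearly maximally incoherent
print("spike vs random:", coherence(spike, random_basis))  # about sqrt(2 log n)
print("spike vs spike: ", coherence(spike, spike))          # sqrt(n): maximal coherence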
2.4 What is Compressive Sensing Trying to Solve?
Given a signal that has a sparse representation in some basis, compressive sensing theory tells us that it is possible to recover the signal from a more limited set of measurements than deemed necessary by traditional signal processing. The formal definition of the problem that compressive sensing is trying to solve is articulated below:
Definition 3 (Compressive sensing problem). Given a sparse signal $x_0 \in \mathbb{R}^n$ whose support $T_0 = \{t : x_0(t) \ne 0\}$ has small cardinality, suppose all we know about $x_0$ are $m$ linear measurements of the form
$$y_k = \langle x_0, a_k \rangle, \quad k = 1, \ldots, m, \qquad \text{or} \qquad y = A x_0,$$
where the $a_k \in \mathbb{R}^n$ are known test signals. When this system is vastly underdetermined, $m \ll n$, can we recover the signal $x_0$?
2.5 Relation to Classical Linear Algebra
The ideas of compressive sensing resonate closely with those of classical linear algebra. Solving the system $Ax = b$, one of the most common problems in classical linear algebra, is what compressive sensing deals with as well. The sparse signal is represented by $x$, $A$ is the measurement or sensing matrix that extracts information from the signal $x$, and $b$ is the set of measurements, whose dimension is far less than the dimension of $x$.
When the matrix $A$ has more rows than columns, that is, when the classical system is overdetermined, the exact solution $x$ is recoverable. The number of rows corresponds to the number of constraints, and in an overdetermined system these constraints reduce the dimension of the subspace where the solution can lie, eventually pinning down the exact solution $x$.
Compressive sensing is more interested in the second case: the matrix $A$ has more columns than rows, that is, the classical system is underdetermined. According to the theory of linear algebra, such a system has infinitely many solutions $x$, since each row, being a constraint, only reduces the high-dimensional space to a lower-dimensional one (for example a line or a plane) in which infinitely many solutions $x$ lie.
A best approximation can be found classically using the least squares solution (the minimum $\ell_2$ norm solution), in other words the solution with the least energy. This technique gives a solution that satisfies the constraints of the system most closely, and in many cases this least squares approximation is sufficient. However, when this technique is applied to signal recovery in compressive sensing, it fails completely: the sparse signal cannot be recovered. Compressive sensing takes advantage of the sparsity of the signal and, by treating sparsity as an additional constraint on the system $Ax = b$, enables recovery of the sparse solution. New and efficient techniques have been developed for compressive sensing that can solve the recovery problem efficiently. This is a major difference that sets compressive sensing apart from classical linear algebra.
2.6 $\ell_2$ norm minimization
As noted in the previous section, $\ell_2$ minimization fails to recover the sparse solution $x$ of the system $Ax = b$. The mathematical formulation of this minimization problem is given by equation (2.4):
$$x^\# = \operatorname{argmin} \|x\|_2 \quad \text{such that } Ax = y. \qquad (2.4)$$
Figure 2.2: $\ell_2$ norm minimization.
Figure 2.2 shows why this fails. Assume that our space is two-dimensional and that the solution $x$ lies on the lower-dimensional subspace represented by the slanted straight line. Sparse solutions always lie on the coordinate axes. If we create an $\ell_2$ ball, i.e. a circle, and expand it from the origin, the point where the $\ell_2$ ball first touches the subspace containing $x$ is the approximation that is obtained. Clearly, as we can see from Figure 2.2, this solution is not sparse and is therefore incorrect.
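This failure is also easy to see numerically. The sketch below builds an underdetermined system whose true solution is sparse and computes the minimum-energy ($\ell_2$) solution via the pseudoinverse; the dimensions and sparsity level are illustrative. The result satisfies the constraints but is fully dense, not sparse.

import numpy as np

rng = np.random.default_rng(1)
n, m, S = 200, 50, 5          # signal length, number of measurements, sparsity

# Sparse ground truth and a random Gaussian measurement matrix.
x_true = np.zeros(n)
x_true[rng.choice(n, S, replace=False)] = rng.standard_normal(S)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true

# Minimum l2-norm solution of the underdetermined system A x = y.
x_l2 = np.linalg.pinv(A) @ y

print("residual ||Ax - y||:", np.linalg.norm(A @ x_l2 - y))
print("nonzeros in x_true :", np.count_nonzero(x_true))
print("entries of x_l2 above 1e-6:", np.sum(np.abs(x_l2) > 1e-6))  # essentially all n entries
print("recovery error ||x_l2 - x_true||:", np.linalg.norm(x_l2 - x_true))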
2.7 $\ell_0$ norm minimization
The failure of $\ell_2$ norm minimization demands other techniques, for example the $\ell_0$ norm minimization shown in equation (2.5):
$$x^\# = \operatorname{argmin} \|x\|_0 \quad \text{such that } Ax = y, \qquad \text{where } \|x\|_0 = \#\{i : x_i \ne 0\}. \qquad (2.5)$$
The $\ell_0$ norm is a less frequently used norm, but the idea is very simple: $\|x\|_0$ counts the number of non-zero components of the sparse signal $x$.
Figure 2.3: $\ell_0$ norm minimization.
In Figure 2.3, the $\ell_0$ norm is represented by the boxes on the axes, indicating whether a particular component is present or not. As in Figure 2.2, the solution $x$ lies on the subspace represented by the slanted straight line. The $\ell_0$ norm coincides with component(s) of the sparse solution on the subspace. Figure 2.3 also depicts an important issue with $\ell_0$ norm minimization: the solution coincides with two possible locations on the subspace where $x$ lies. In general, one needs to check all possible supports for $x$. For sparse signals of very high dimension, this leads to a combinatorial explosion, and the minimization problem (2.5) becomes exponentially hard to solve. In fact, finding the sparsest solution of a general underdetermined system of equations is NP-hard [8].
2.8 Basis Pursuit
Although $\ell_0$ norm minimization can in principle recover sparse signals exactly, the combinatorial nature of the problem makes it impractical. The basis pursuit method, or $\ell_1$ norm minimization, can recover the sparse solution correctly and does not have the inherent combinatorial complexity of $\ell_0$ norm minimization. A mathematical formulation of the problem is as follows:
$$x^\# = \operatorname{argmin} \|x\|_1 \quad \text{such that } Ax = y. \qquad (2.6)$$
Figure 2.4: $\ell_1$ norm minimization.
The $\ell_1$ ball is a diamond in two dimensions (Figure 2.4) and an octahedron (more generally, a cross-polytope) in higher dimensions, and it is pointed at the corners where it meets the axes. Since our solution $x$ lies on a one-dimensional subspace in the example, expanding the $\ell_1$ ball makes it intersect the subspace at an axis point where the sparse solution lies. Thus, $\ell_1$ norm minimization correctly recovers the sparse solution.
Another great property of the $\ell_1$ norm is that it is convex. This enables reformulating equation (2.6) as a linear programming problem, equation (2.7):
$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{subject to } Ax = y. \qquad (2.7)$$
There are efficient algorithms for solving linear programming problems like equation (2.7). This makes the $\ell_1$ norm minimization technique practical for sparse signal recovery.
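Problem (2.7) can be handed to any LP solver via the standard splitting $x = u - v$ with $u, v \ge 0$, so that $\|x\|_1 = \sum_i (u_i + v_i)$. The sketch below uses scipy.optimize.linprog on a small random instance (the same setup as the $\ell_2$ example above); the problem sizes are illustrative assumptions, not parameters from this thesis.

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
n, m, S = 200, 50, 5

x_true = np.zeros(n)
x_true[rng.choice(n, S, replace=False)] = rng.standard_normal(S)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true

# Basis pursuit as an LP: minimize sum(u + v) subject to A(u - v) = y, u, v >= 0.
c = np.ones(2 * n)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n), method="highs")
x_l1 = res.x[:n] - res.x[n:]

# Typically near machine precision: exact recovery of the sparse signal.
print("recovery error ||x_l1 - x_true||:", np.linalg.norm(x_l1 - x_true))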
2.9 Summary
We have presented the basics of compressive sensing, which rests on two important concepts: sparsity and incoherence. In this section, we presented the formal definition of a sparse signal, and contrasted the ideas of sparsity and compressible signals in compressive sensing with those of the transform coders used for JPEG images. We also presented the formal definition of incoherence, which is important in choosing the sensing basis with respect to the sparse signal's representation basis. The compressive sensing problem has also been formally defined. Next, we turned our attention to sparse signal recovery. We explored the relation of compressive sensing to classical linear algebra, and presented details of why a certain minimization technique ($\ell_1$) works correctly and efficiently for sparse signal recovery.
Chapter 3
Signal Recovery from Incomplete Measurements
Sparse signal recovery is an important area of compressive sensing where a great deal of work has been done, and it remains an active area of research. The basic idea of signal recovery is related to the norm minimizations we have seen in the previous sections, but work continues on refinements of these techniques and on more efficient algorithms. The $\ell_1$ norm minimization approach is well established in the paper by Donoho [8]. Donoho also showed that heuristic techniques for recovering sparse solutions, for example greedy algorithms and thresholding methods, perform inadequately. Candes, Romberg and Tao in [14] address the issues of recovering data in both the noiseless and the noisy setting. They also relate recovery to the importance of well designed sensing matrices. The idea of random projection and its relation to signal reconstruction is addressed by Candes and Romberg in [6]. In that paper, they also highlight the importance of random projection - we will look in detail at the effectiveness of randomness as a sensing mechanism in a later chapter - and discuss its implications in data compression and medical imaging.
Chapter 4
Robust Compressive Sensing
In reality, signals do not have an exact sparse representation. Such signals are modeled as compressible signals (see Section 2.2), with a threshold such that values above the threshold are kept and values below it are treated as zeros; this approximates the sparsity model. Signals also carry inherent noise, in the form of measurement or instrument noise. For example, when a signal is sent as packets over the Internet, some of these packets may be lost or corrupted. Provided the signal has a sparse representation, compressive sensing asserts that the original signal is still recoverable, and any small perturbation of the observed signal induces only a small perturbation of the reconstruction [4]. In this case, the measurements are modeled as
$$y = Ax + e \qquad (4.1)$$
where the vector $e$ is a vector of errors or measurement noise. The noise $e$ can be either stochastic or deterministic with a bounded total energy, i.e. $\|e\|_{\ell_2} \le \varepsilon$. The sparse signal can still be recovered, within the error bound, by solving the $\ell_1$ minimization problem (4.2):
$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{subject to } \|Ax - y\|_{\ell_2} \le \varepsilon. \qquad (4.2)$$
In this case, the reconstruction is consistent with the noise level present in the data. We expect $\|Ax - y\|_{\ell_2}$ to be within the noise level of the data, since clearly we cannot do better than this given the inherent noise. In this paper, we consider only the simplified case where there is no noise in the signal measurements. However, the ideas we discuss can easily be extended to the case of noisy signals.
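For completeness, problem (4.2) is a second-order cone program rather than an LP, so a general convex solver is convenient. The sketch below uses the cvxpy modeling package on a small noisy instance; the dimensions, noise level, and the choice of cvxpy are illustrative assumptions, not part of this thesis.

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
n, m, S, sigma = 200, 80, 5, 0.01

x_true = np.zeros(n)
x_true[rng.choice(n, S, replace=False)] = rng.standard_normal(S)
A = rng.standard_normal((m, n)) / np.sqrt(m)
e = sigma * rng.standard_normal(m)        # measurement noise
y = A @ x_true + e
eps = 1.2 * sigma * np.sqrt(m)            # assumed bound on ||e||_2

# Equation (4.2): minimize ||x||_1 subject to ||Ax - y||_2 <= eps.
x = cp.Variable(n)
prob = cp.Problem(cp.Minimize(cp.norm1(x)), [cp.norm(A @ x - y, 2) <= eps])
prob.solve()

print("recovery error:", np.linalg.norm(x.value - x_true))  # small, on the order of the noise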
Chapter 5
Designing Measurement Matrices
Measurement matrix design is an important problem in compressive sensing. It is also a difficult one. Designing a good measurement matrix is important because data recovery depends on how well the limited measurements capture information about the structure of the signal. An essential quality of a good measurement matrix is that it should ensure that the information from the original signal is mixed thoroughly into the limited measurements; in other words, the limited set of measurements should contain enough representative information about the original signal.
5.1 Randomness in Sensing
The randomness of measurement matrices turns out to be an important aspect of signal measurement. Randomness ensures that the information captured by the limited set of measurements is representative of the total information present in the original signal. Random matrices are largely incoherent with any fixed basis $\Psi$ of $\mathbb{R}^n$. This is important because, as we have seen, the incoherence of the pair determines the number of measurements necessary for any signal. The presence of high randomness in a measurement matrix makes it suitable for use as a sensing matrix with any representation basis; in other words, it can serve as a universal measurement matrix.
Candes and Romberg [5] have shown that for exact reconstruction with high probability, only $m$ random measurements uniformly chosen from the measurement domain are needed, where
$$m \ge C \cdot \mu^2(\Phi, \Psi) \cdot S \cdot \log n \qquad (5.1)$$
where $C$ is a positive constant, $\mu$ is the coherence, and $S$ is the number of nonzero components of the sparse signal $x$.
Theorem 2 ([5]). Let $f \in \mathbb{R}^n$ and suppose the coefficient sequence $x$ of $f$ in the basis $\Psi$ is $S$-sparse. Choose a subset $\Omega$ of the measurement domain of size $|\Omega| = m$ uniformly at random. Then if
$$m \ge C \cdot \mu^2(\Phi, \Psi) \cdot S \cdot \log n \qquad (5.2)$$
for some positive constant $C$, the sparse signal can be recovered exactly from equation (2.7), i.e. by $\ell_1$ minimization, with overwhelming probability.
In [5], a detailed proof of this theorem is given, and it is also shown that the probability of exact reconstruction exceeds $1 - \delta$ if $m \ge C \cdot \mu^2(\Phi, \Psi) \cdot S \cdot \log(n/\delta)$.
A few things are worth noting here. If the two bases are highly incoherent ($\mu$ is close to 1), then only a few samples are needed. There is no information loss in measuring just about any set of $m$ coefficients, and $m$ may be far less than the signal size: if $\mu(\Phi, \Psi)$ is very close to 1, then on the order of $S \log n$ measurements suffice for exact reconstruction. This is also essentially a minimum bound, and we cannot do with substantially fewer samples. Moreover, in order to perform the reconstruction using equation (2.7), no prior knowledge about the number of nonzero coordinates of $x$, their locations, or their amplitudes is needed [1]. This also implies that the measurement basis can serve as an encoding system for signals. In order to reconstruct a signal at least as well as $f_S$, where $f_S$ is the representation of the original signal using only the $S$ largest coefficients of the expansion of $f$, one only needs on the order of $O(S \log(n/S))$ measurements [1].
5.2 Restricted Isometry Property (RIP)
In order to determine how good a given matrix is as a sensing matrix, we check whether it satisfies the Restricted Isometry Property (RIP). First, however, we need to define the isometry constant.
Definition 4 (Isometry constant). For each integer $S = 1, 2, \ldots$, define the isometry constant $\delta_S$ of a matrix $A$ as the smallest number such that
$$(1 - \delta_S)\|x\|_{\ell_2}^2 \le \|Ax\|_{\ell_2}^2 \le (1 + \delta_S)\|x\|_{\ell_2}^2 \qquad (5.3)$$
holds for all $S$-sparse vectors $x$.
A given matrix $A$ satisfies the RIP if $\delta_S$ is not too close to 1; if $\delta_S$ is allowed to be close to 1, the condition becomes nearly vacuous. Candes and Wakin [1] showed that if the measurements are taken by a matrix with $\delta_{2S} < \sqrt{2} - 1$, then the reconstruction is exact for any $S$-sparse signal. There is also an important theorem relating $\delta_S$ and signal recovery.
Theorem 3 ([12], noiseless recovery). Assume that $\delta_{2S} \le \sqrt{2} - 1$. Then the solution $x^*$ to the $\ell_1$-norm minimization obeys
$$\|x^* - x\|_{\ell_1} \le C_0 \|x - x_S\|_{\ell_1} \qquad (5.4)$$
and
$$\|x^* - x\|_{\ell_2} \le C_0 \, S^{-1/2} \|x - x_S\|_{\ell_1} \qquad (5.5)$$
where $C_0$ is some constant and $x_S$ is defined as the vector $x$ with all but the $S$ largest entries set to zero.
Two important assertions emerge from this theorem.
- If $\delta_{2S} < 1$, the $\ell_0$ problem has a unique $S$-sparse solution.
- If $\delta_{2S} \le \sqrt{2} - 1$, then the solution given by $\ell_1$ minimization is exactly the solution given by $\ell_0$ minimization; the convex relaxation is exact.
More details on the proof of Theorem 3 can be found in [12].
The Restricted Isometry Property (RIP) is an elegant way to determine whether an arbitrary matrix is suitable as a sensing matrix. In our case the measurement matrices are short and fat (Figure 5.1). Since the limited measurements we take are linear combinations of some columns of the matrix $A$, linear independence somehow has to be preserved in these matrices. The RIP ensures that some subsets of the columns of the matrix are nearly orthogonal; specifically, any $S$ arbitrarily chosen columns of the matrix $A$ are nearly orthogonal.

Figure 5.1: Typical measurement matrix dimensions
One might imagine taking a large set of matrices of given dimensions and testing the RIP for each one; the matrices that satisfy the property could then be classified as suitable for sensing. There is an inherent problem with this idea. First, one has to start with a large set of matrices, which can be difficult to obtain; but the major problem lies in verifying the RIP for each matrix.
In fact, verifying the RIP for a matrix means checking every size-$S$ subset of its columns for near orthogonality. This is harder than it seems. For a single given subset of the columns, it is easy to test whether that set of columns is orthogonal. We claim that the problem of determining the RIP is NP-complete.
Lemma 1. Determining the Restricted Isometry Property (RIP) is NP-complete.
Proof. Let $A$ be the given matrix with columns $c_i$; that is, $A = [c_1, c_2, \ldots, c_m]$. Define a subset $P$ of the columns of $A$ such that $|P| = S$, where $S$ is the sparsity of the signal. $A$ satisfies the property if there is a set $P$ such that for all $i, j$ in the index set of $P$, $c_i \cdot c_j = 0$ (the columns in this set are mutually orthogonal). We can state the Restricted Isometry Property problem as a formal language (Sparse Orthogonal Vectors):
SOV = {⟨A, S⟩ : A has a subset of S of its columns such that the columns are mutually orthogonal}.
We first show that SOV is in NP. Given a certificate consisting of a subset of columns of $A$, it is easy to verify in polynomial time whether this set of columns is mutually orthogonal: simply check orthogonality for each pair of column vectors in the subset. All of these operations can be done in polynomial time, so SOV ∈ NP.
Next, we choose a known NP-complete problem, Independent Set, and reduce it to SOV. The Independent Set problem is defined as follows:
IS = {⟨G, S⟩ : graph G has an independent set of size S}.
We want to show that IS ≤_p SOV. Label the edges of $G$ with unique integers and create the matrix $A = [c_1, c_2, \ldots, c_m]$, where $m$ equals the number of vertices of $G$ and the number of rows of each column vector equals the size of the edge set of $G$. For each column (vertex $v_i$), the $j$th entry is 1 if $v_i$ is incident to edge $e_j$, and 0 otherwise. For an independent set of $G$, the corresponding columns have no 1 entries in the same location, since the corresponding vertices share no incident edges. This reduction can be done in polynomial time by looking through the set of vertices and checking the edges incident to each vertex.

Figure 5.2: Reduction using Independent Set. Vertices a and c form an independent set of this graph.

Thus, an independent set of size $S$ corresponds to a subset of $S$ mutually orthogonal columns of $A$.
A yes instance of IS implies a yes instance of SOV: for the vertices forming the independent set, the corresponding columns are, by construction, mutually orthogonal, and the subset of columns of $A$ also has size $S$.
A yes instance of SOV implies a yes instance of IS: for some subset of $S$ columns, each column of the subset is orthogonal to every other, that is, no two columns in the subset have a 1 entry in the same location. This translates to vertices of $G$ that share no incident edges; in other words, an independent set of size $S$.
Hence, SOV is NP-complete.
Thus, by Lemma 1, we have shown that determining the Restricted Isometry Property (RIP) is NP-complete. For any given matrix, it becomes exponentially harder (in the size of the matrix) to verify that the RIP is satisfied, and any polynomial time solution would settle the famous P = NP problem.
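The exponential cost is easy to feel even at toy sizes. The sketch below computes the isometry constant $\delta_S$ of Definition 4 exactly by enumerating every size-S column subset; the dimensions are illustrative assumptions chosen so that the enumeration finishes quickly.

import numpy as np
from itertools import combinations

def isometry_constant(A, S):
    # Exact delta_S: enumerate all size-S column subsets T and take the worst
    # deviation of the eigenvalues of A_T^T A_T from 1 (Definition 4).
    delta = 0.0
    for T in combinations(range(A.shape[1]), S):
        eigs = np.linalg.eigvalsh(A[:, T].T @ A[:, T])
        delta = max(delta, abs(eigs[-1] - 1.0), abs(1.0 - eigs[0]))
    return delta

rng = np.random.default_rng(0)
m, n, S = 20, 40, 3
A = rng.standard_normal((m, n)) / np.sqrt(m)   # columns have unit norm in expectation

print("delta_3 for a 20x40 Gaussian matrix:", round(isometry_constant(A, S), 3))
# The number of subsets, C(40, 3) = 9880, is already sizable, and it grows
# exponentially with S; this is why exact verification is intractable.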
5.2.1 Statistical Determination of Suitability of Matrices
Even though exact determination of the Restricted Isometry Property (RIP) is difficult, having some idea of whether a matrix satisfies the property can have practical significance. Instead of checking every size-$S$ subset of the columns of a given matrix for near orthogonality, it may be more efficient to determine statistically how closely a matrix satisfies the RIP. Motivated by [23], we provide an algorithmic approach to statistically determining the RIP. The algorithm runs in polynomial time and can be regarded as an approximation algorithm for estimating the RIP. The pseudocode is given below, followed by a sketch implementation.
1. Pick a base random matrix generation process: Gaussian, Binary, or Fourier.
2. For each sparsity value M, from 1 up to a chosen maximum, and for the chosen base process:
   - Generate column subsets of size M uniformly at random over all such subsets, and scale each column to unit norm.
   - Calculate the maximum and minimum eigenvalues of the Gram matrix of each subset.
3. Repeat step 2 for k samples of the eigenvalues.
4. Calculate the mean and standard deviation of the maximum and minimum eigenvalues for each M.
5. Plot the maximum and minimum eigenvalues versus the sparsity M.
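The following is a minimal Python sketch of the procedure above, assuming a Gaussian base process; the matrix size, sparsity levels, and number of samples are illustrative choices.

import numpy as np

def extreme_eig_stats(A, sparsity_levels, num_samples=200, seed=0):
    # Monte Carlo estimate of the extreme eigenvalues of random size-M column
    # subsets of A (columns rescaled to unit norm), for each sparsity M.
    rng = np.random.default_rng(seed)
    A = A / np.linalg.norm(A, axis=0)
    stats = {}
    for M in sparsity_levels:
        lo, hi = [], []
        for _ in range(num_samples):
            T = rng.choice(A.shape[1], size=M, replace=False)
            eigs = np.linalg.eigvalsh(A[:, T].T @ A[:, T])
            lo.append(eigs[0]); hi.append(eigs[-1])
        stats[M] = (np.mean(lo), np.std(lo), np.mean(hi), np.std(hi))
    return stats

rng = np.random.default_rng(1)
A = rng.standard_normal((64, 256))             # base Gaussian process
for M, (lo_m, lo_s, hi_m, hi_s) in extreme_eig_stats(A, [2, 4, 8, 16]).items():
    print(f"M={M:2d}  min eig {lo_m:.2f} (+/- {lo_s:.2f})  max eig {hi_m:.2f} (+/- {hi_s:.2f})")

Plotting these means against M and comparing the curves with those of a known good base ensemble gives the statistical check illustrated in Figure 5.3.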
Figure 5.3 shows an example of how the extremal eigenvalues of a matrix satisfying an approximate RIP compare with the extremal eigenvalue distribution of a known base matrix that satisfies the RIP. In this particular example, the green (dotted) curve corresponds to the base Gaussian matrix, and the blue (solid) curve corresponds to a test matrix.
Figure 5.3: Statistical determination of Restricted Isometry Property [23]
5.3 Uniform Uncertainty Principle (UUP) and Exact Reconstruction Principle (ERP)
Determining the Restricted Isometry Property (RIP) for a given matrix is difficult. However, it turns out that random matrix theory has important applications in designing measurement matrices with randomness. In order for a random matrix, or more precisely a random matrix generation process, to be suitable as a sensing mechanism, it should satisfy the Uniform Uncertainty Principle (UUP) and the Exact Reconstruction Principle (ERP). The set of matrices generated by a random process is treated as an ensemble of random matrices, and these properties must be satisfied by the ensemble.
Before formally defining UUP and ERP, let us define a few terms. An abstract measurement matrix is defined as $F_\Omega$, a random matrix of size $|\Omega| \times N$ following some probability distribution or random process. $|\Omega|$ is the number of measurements and is treated as a random variable taking values between 1 and $N$; the quantity $K$ is defined as $K := \mathbb{E}(|\Omega|)$, that is, the expected number of measurements. The entries of $F_\Omega$ are also assumed to be real valued [11].
Definition 5 (UUP: Uniform Uncertainty Principle [11]). The measurement matrix $F_\Omega$ obeys the UUP with oversampling factor $\lambda$ if, for every sufficiently small $\alpha > 0$, the following holds with probability at least $1 - O(N^{-\rho/\alpha})$ for some fixed $\rho > 0$: for all subsets $T$ such that $|T| \le \alpha K / \lambda$, the matrix $F_\Omega$ obeys
$$\frac{1}{2}\,\frac{K}{N} \le \lambda_{\min}(F_{\Omega T}^{*} F_{\Omega T}) \le \lambda_{\max}(F_{\Omega T}^{*} F_{\Omega T}) \le \frac{3}{2}\,\frac{K}{N}, \qquad (5.6)$$
which is equivalent to
$$\frac{1}{2}\,\frac{K}{N}\,\|f\|_{\ell_2}^2 \le \|F_\Omega f\|_{\ell_2}^2 \le \frac{3}{2}\,\frac{K}{N}\,\|f\|_{\ell_2}^2 \qquad (5.7)$$
holding for all signals $f$ supported on $T$.
In essence, the Uniform Uncertainty Principle (UUP) is a statement about the maximum and minimum eigenvalues of the (restricted) measurement matrices. The constants $\frac{1}{2}$ and $\frac{3}{2}$ in equation (5.6) are inconsequential; they simply give concreteness to the definition. The important aspect of the definition is the bounded nature of the minimum and maximum eigenvalues of the measurement matrix.
The UUP is similar in nature to many other principles and results related to random projection. For example, the Restricted Isometry Property (RIP) is similar to the UUP. Although the two definitions do not look alike, they are conceptually similar. For matrices that satisfy the RIP, column vectors picked from arbitrary subsets of the columns of the measurement matrix are nearly orthogonal, and the larger these subsets can be, the better the measurement matrix; in essence, the randomness of a matrix ensures that this property holds [1]. Random matrices also satisfy the UUP. In other words, the distribution of the eigenvalues of measurement matrices appears to be related to the near orthogonality of column vectors taken from arbitrary subsets. This apparent duality has useful consequences, since the UUP can come to the rescue in cases where determining the RIP exactly is difficult. Random matrices are one such example, since it is natural to study them through the idea of an abstract measurement matrix rather than by trying to determine the RIP exactly.
The UUP is the crux of proving whether a random matrix ensemble is suitable for sensing. If an ensemble satisfies the UUP, and a matrix from this ensemble is used to take measurements of some data, then the original sparse signal can be reconstructed using equation (2.7) with high probability.
However, there is a second principle, or hypothesis, described by Candes and Tao in [11]. It is defined as follows:
Definition 6 (ERP: Exact Reconstruction Principle [11]). The measurement matrix $F_\Omega$ obeys the ERP with oversampling factor $\lambda$ if, for all sufficiently small $\alpha > 0$, each fixed subset $T$ obeying $|T| \le \alpha K / \lambda$, and each sign vector $\sigma$ defined on $T$ with $|\sigma| = 1$, there exists with overwhelmingly large probability a vector $P \in \mathbb{R}^N$ with the following properties:
1. $P(t) = \sigma(t)$ for all $t \in T$;
2. $P$ is a linear combination of the rows of $F_\Omega$, that is, $P = F_\Omega^{*} V$ for some vector $V$ of length $|\Omega|$;
3. $|P(t)| \le \frac{1}{2}$ for all $t \in T^c = \{0, \ldots, N-1\} \setminus T$.
"Overwhelmingly large" means that the probability is at least $1 - O(N^{-\rho/\alpha})$ for some fixed positive constant $\rho > 0$.
The Exact Reconstruction Principle (ERP) is crucial for determining whether the reconstructed signal $x^*$ is close to the original sparse signal $x$ in the $\ell_1$ norm. The ERP and UUP hypotheses are closely related. We concern ourselves only with the UUP when trying to determine whether a random matrix ensemble is suitable as a sensing matrix, because of an implication of the form
$$\text{UUP} \Rightarrow \text{ERP}$$
for any signal $x$ [11].
In order to prove the Uniform Uncertainty Principle (UUP) for a random matrix ensemble using the definition, one essentially needs to look for ensembles in which the maximum and minimum eigenvalues of the matrices are bounded; that is, the eigenvalue distribution does not tail off to infinity. An abstraction for proving the UUP, distilled from the proofs in the paper by Candes and Tao [11], is given below (note that this abstraction relies on known results on the distribution of the singular values of the random matrix ensemble of interest). For $X^* X$, or equivalently the singular values of $X$:
1. The eigenvalues have a bounded distribution, i.e. the distribution does not tail off to infinity.
2. There exist probabilistic bounds (exponential, for the known results) on the distance of the largest and smallest eigenvalues from the median.
3. The UUP is satisfied if the above two conditions hold.
5.4 Random Matrix Theory
There are three random matrix ensembles known to satisfy the Uniform Uncertainty Principle (UUP) [11]: the Gaussian ensemble, the Binary ensemble, and the Fourier ensemble. The abstraction described in the previous section is based on studying the UUP proofs for these ensembles and extracting the aspects of the proofs most important for analyzing other random matrix ensembles. In this section we briefly present the proof outline for one of these known ensembles in order to show how such a proof proceeds.
5.4.1 Gaussian ensemble
Lemma 2 ( [11]). The Gaussian ensemble obeys the uniform uncertainty principle (UUP)
with oversampling factor = log N.
Proof. Let X be an n by m matrix with n m, and with i.i.d. entries sampled from the
normal distribution with mean zero and variance
1
m
. According to our proof abstraction, we
are interested in the singular of X, that is, the eigenvalues of X

X.
There are known results about the eigenvalues of $X^* X$. One such result states that the eigenvalues have a deterministic limit distribution supported on the interval $[(1 - \sqrt{c})^2, (1 + \sqrt{c})^2]$ as $m, n \to \infty$ with $m/n \to c < 1$. In [11], a known result about the concentration of the extreme singular values of a Gaussian matrix is used:
$$P\Big(\sigma_m(X) > 1 + \sqrt{m/n} + r\Big) \le e^{-n r^2 / 2} \qquad (5.8)$$
$$P\Big(\sigma_1(X) < 1 - \sqrt{m/n} - r\Big) \le e^{-n r^2 / 2} \qquad (5.9)$$
where $\sigma_1(X) \le \cdots \le \sigma_m(X)$ are the singular values of $X$.
It remains to prove that the Gaussian ensemble obeys the UUP and to determine the oversampling factor $\lambda = \log N$. To do so, fix $K \ge \log N$ and let $\Omega := \{1, \ldots, K\}$. Let $T$ be a fixed index set, and define the event $E_T$ as
$$E_T := \Big\{ \lambda_{\min}(F_{\Omega T}^{*} F_{\Omega T}) < K/2N \Big\} \cup \Big\{ \lambda_{\max}(F_{\Omega T}^{*} F_{\Omega T}) > 3K/2N \Big\}.$$
Now, adding equations (5.8) and (5.9), and substituting $n = K$ and $r^2/2 = c$, it is possible to write
$$P(E_T) \le 2 e^{-cK}.$$
Candes and Tao then look at the tightness of the spectrum over all sets $T \in \mathcal{T}_m := \{T : |T| \le m\}$, where it is assumed that $m \le N/2$:
$$P\Big(\bigcup_{T \in \mathcal{T}_m} E_T\Big) \le 2e^{-cK}\,|\mathcal{T}_m| = 2e^{-cK} \sum_{k=1}^{m} \binom{N}{k} \le 2e^{-cK}\, m \binom{N}{m}.$$
Here the known results
$$\sum_{k=1}^{m} \binom{N}{k} \le m \binom{N}{m} \qquad \text{and} \qquad \binom{N}{m} \le e^{N H(m/N)}$$
are used, where $H$ is the binary entropy function (with natural logarithms),
$$H(q) := -q \log q - (1-q)\log(1-q).$$
For the binary entropy function, the inequality $-(1-q)\log(1-q) \le q$ holds since $0 < q < 1$. Now, using the above,
$$P\Big(\bigcup_{\mathcal{T}_m} E_T\Big) \le 2e^{-cK}\, m \binom{N}{m} \qquad (5.10)$$
$$\le 2e^{-cK}\, m\, e^{N H(m/N)} \qquad (5.11)$$
$$= 2e^{-cK}\, m\, e^{N\left(-(m/N)\log(m/N) - (1-m/N)\log(1-m/N)\right)} \qquad (5.12)$$
$$\le 2e^{-cK}\, m\, e^{-m\log m + m\log N + m} \qquad (5.13)$$
$$= 2e^{-cK}\, m\, e^{m\log(N/m) + m} \qquad (5.14)$$
$$= 2e^{-cK}\, e^{\log m}\, e^{m\log(N/m) + m} \qquad (5.15)$$
$$= 2e^{-cK}\, e^{m\log(N/m) + m + \log m}. \qquad (5.16)$$
Inequality (5.13) holds since $-(1 - m/N)\log(1 - m/N) \le m/N$. Taking the logarithm of (5.16),
$$\log P\Big(\bigcup_{\mathcal{T}_m} E_T\Big) \le \log 2 - cK + m\log(N/m) + m + \log m = \log 2 - cK + m\big(\log(N/m) + 1 + m^{-1}\log m\big) \le \log 2 - \rho K,$$
provided that $m\big(\log(N/m) + 1 + m^{-1}\log m\big) \le (c - \rho)K$, which implies $-cK + m\big(\log(N/m) + 1 + m^{-1}\log m\big) \le -\rho K$.
Thus, the Gaussian ensemble satisfies the UUP with oversampling factor proportional to $\lambda = \log(N/K)$.
Details of this proof can be found in [11].
The proof is similar for the Binary ensemble and the Fourier ensemble, although for the Fourier ensemble the bounded nature of the eigenvalue distribution is obtained using ideas from entropy. In general, once a result about the concentration of the largest and smallest eigenvalues of a random matrix ensemble is known, the proof follows a template.
For this paper, we are interested in random matrix ensembles other than the three described above. So far we have been able to identify two more potential random matrix ensembles.
Wishart matrix (or Laguerre ensemble):
Let $G$ be an $N \times M$ random matrix with independent, zero mean, unit variance entries, the entries of $G$ being i.i.d. normally distributed. Then
$$W = \frac{1}{M} G G^{*}.$$
Manova matrix (or Jacobi ensemble):
The Manova matrix $J$ is defined in terms of two Wishart matrices, $W(c_1)$ and $W(c_2)$, as
$$J = \big(W(c_1) + W(c_2)\big)^{-1} W(c_1)$$
where $c_1 = \frac{N}{M_1}$, $c_2 = \frac{N}{M_2}$, and $N \times M_1$, $N \times M_2$ are the dimensions of the matrices $G$ forming the Wishart matrices.
Figures 5.4 and 5.5 show the distributions of the eigenvalues of a Wishart and a Manova matrix, respectively. The reason these two ensembles are of interest to us is the bounded nature of the maximum and minimum eigenvalues of the matrices belonging to them. Matrices of either ensemble are symmetric. In practice, we want matrices that are non-symmetric (with fewer rows than columns), so symmetric matrices are not currently practically useful; however, there may be applications of symmetric matrices in sparse signal processing in the near future.
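The eigenvalue histograms in Figures 5.4 and 5.5 are easy to regenerate. The sketch below draws one Wishart and one Manova matrix with the parameters used in the figures (n = 50, m1 = m2 = 100) and prints the extreme eigenvalues; this is an illustrative reproduction, not the code used to produce the figures.

import numpy as np

rng = np.random.default_rng(0)

def wishart(n, m, rng):
    # W = (1/m) G G^T with G an n x m matrix of i.i.d. standard normal entries.
    G = rng.standard_normal((n, m))
    return G @ G.T / m

n, m1, m2 = 50, 100, 100
W = wishart(n, m1, rng)
eig_w = np.linalg.eigvalsh(W)
print("Wishart eigenvalues in [%.3f, %.3f]" % (eig_w[0], eig_w[-1]))
# Theoretical support for c = n/m = 0.5: [(1-sqrt(c))^2, (1+sqrt(c))^2] ~ [0.086, 2.914]

# Manova (Jacobi) matrix J = (W1 + W2)^{-1} W1 built from two independent Wishart matrices.
W1, W2 = wishart(n, m1, rng), wishart(n, m2, rng)
J = np.linalg.solve(W1 + W2, W1)
eig_j = np.linalg.eigvals(J).real
print("Manova eigenvalues in  [%.3f, %.3f]" % (eig_j.min(), eig_j.max()))  # bounded in [0, 1]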
Figure 5.4: Eigenvalue distribution of a Wishart matrix. Parameters: n=50, m=100
Figure 5.5: Eigenvalue distribution of a Manova matrix. Parameters: n=50, m1=100, m2=100 (n × m1 and n × m2 are the dimensions of the matrices G forming the two Wishart matrices, respectively)
Apart from these two, it would be interesting to determine whether the UUP holds for other random matrix ensembles. This can be done by looking at the distribution of the singular values of matrices of the ensemble and using any known results on the concentration of the largest and smallest singular values for these random matrices.
For the cases described above, the entries of the random matrix are drawn i.i.d. from a normal distribution. Another aspect of the problem that might be of interest is forming a random matrix with entries from a uniform distribution. One technique for generating a sensing matrix from a uniform distribution is to form the sensing matrix $A$ by sampling $n$ column vectors uniformly at random on the unit sphere of $\mathbb{R}^m$, as sketched below.
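A minimal sketch of this construction: columns uniform on the unit sphere of R^m can be obtained by normalizing i.i.d. Gaussian vectors, since the Gaussian distribution is rotationally invariant. The dimensions below are illustrative.

import numpy as np

def uniform_sphere_sensing_matrix(m, n, seed=0):
    # m x n sensing matrix whose n columns are drawn uniformly at random
    # from the unit sphere of R^m (normalized i.i.d. Gaussian vectors).
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n))
    return A / np.linalg.norm(A, axis=0)

A = uniform_sphere_sensing_matrix(m=50, n=200)
print("all columns have unit norm:", np.allclose(np.linalg.norm(A, axis=0), 1.0))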
5.4.2 Laguerre ensemble
Matrices of the Laguerre ensemble are of the form
$$W = \frac{1}{M} G G^{*}$$
where the entries of $G$ are i.i.d. normally distributed. Note that matrices of this class are symmetric.
Lemma 3. The Laguerre ensemble obeys the Uniform Uncertainty Principle (UUP) with oversampling factor $\lambda = \log N$.
Proof. Let us look at the singular values of $W^* W$:
$$W^* W = \Big(\frac{1}{M} G G^{*}\Big)^{*} \Big(\frac{1}{M} G G^{*}\Big) = \frac{1}{M^2}\, G G^{*} G G^{*} = \frac{1}{M^2}\, (G G^{*})^2.$$
We know from [11] that the distributions of the maximum and minimum singular values of $G$ obey
$$P\Big(\sigma_M(G) > 1 + \sqrt{M/N} + r\Big) \le e^{-N r^2 / 2}$$
$$P\Big(\sigma_1(G) < 1 - \sqrt{M/N} - r\Big) \le e^{-N r^2 / 2}$$
where $\sigma_1(G) \le \sigma_2(G) \le \cdots \le \sigma_M(G)$. Then for $W^* W$ we have
$$P\Big(\sigma_M^2(G) > (1 + \sqrt{M/N} + r)^2\Big) \le e^{-N r^2 / 2} \qquad (5.17)$$
$$P\Big(\sigma_1^2(G) < (1 - \sqrt{M/N} - r)^2\Big) \le e^{-N r^2 / 2} \qquad (5.18)$$
Now define
$$E_T := \Big\{\lambda_{\min}(F_{\Omega T}^{*} F_{\Omega T}) < aK/N\Big\} \cup \Big\{\lambda_{\max}(F_{\Omega T}^{*} F_{\Omega T}) > bK/N\Big\}$$
where $a, b$ are constants. Adding equations (5.17) and (5.18),
$$P(E_T) \le 2 e^{-N r^2 / 2} = 2 e^{-cK}$$
where $c = r^2/2$ and $K = N$. The rest of the proof follows the Gaussian ensemble proof exactly. Thus, the oversampling factor is $\lambda = \log N$.
5.4.3 Jacobi ensemble
The Jacobi ensemble is composed of Wishart matrices, as shown in the previous section. In order to prove the UUP, we need to know how the composition of the Wishart matrices affects the distribution of the maximum and minimum eigenvalues. In the worst case, $(W(c_1) + W(c_2))^{-1}$ has a very large upper bound on its largest eigenvalue: if either $W(c_1)$ or $W(c_2)$ has a small eigenvalue, then the upper bound on the eigenvalues of the inverse of the sum is very large. In terms of norms,
$$\|(W(c_1) + W(c_2))^{-1} W(c_1)\| \le \|(W(c_1) + W(c_2))^{-1}\| \cdot \|W(c_1)\|.$$
However, since the Wishart matrices are generated from i.i.d. entries, the upper bound is generally significantly smaller than this worst case. Figure 5.5 shows that this is indeed the case: the histogram shows that it is statistically likely, due to the generation technique used, that the minimum and maximum eigenvalues obey a much more optimistic bound.
This makes proving the UUP for the Jacobi ensemble difficult, since one has to rely on a statistical likelihood bound for the maximum and minimum eigenvalues. Assuming that the minimum eigenvalue is likely to be greater than zero and the maximum eigenvalue less than some bound $\epsilon > 0$, we are trying to work out an expression for the probabilistic bounds under this assumption.
A further refinement of random matrix generation might be to pick entries produced by some chaotic process and to check whether such matrices satisfy the UUP. This interesting connection between random matrices and chaotic systems should be explored in future work. Deterministic matrices are discussed in the next section.
5.5 Deterministic Measurement Matrices
All matrices that are suitable as sensing matrices seem to have some inherent randomness associated with them. However, Candes and Tao conjecture in [11] that the entries of these sensing matrices do not have to be independent or Gaussian, and that completely deterministic classes of matrices obeying the Uniform Uncertainty Principle could be constructed.
One design of deterministic measurement matrices uses chirp sequences [23]. The idea in deterministic design is to go back to the concept of linear combinations: since recovering the sparse signal amounts to finding out which small linear combination of the columns of the measurement matrix forms the vector of measurements, the measurement matrix can be designed to facilitate this. This is essentially the same concept as the Restricted Isometry Property. In [23], the authors form the columns of the measurement matrix using chirp signals, and the algorithm for generating such deterministic matrices is also given in that paper.
A chirp signal is defined as follows:
$$v_{m,r}(l) = e^{\frac{j 2\pi m l}{K} + \frac{j 2\pi r l^2}{K}}, \qquad m, r \in \mathbb{Z}_K,$$
where $m$ is the base frequency, $r$ is the chirp rate, and $K$ is the length of the signal. In a length-$K$ signal, there are $K^2$ possible pairs of $m$ and $r$.
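A minimal sketch of the resulting measurement matrix: for a length K, the K × K² matrix below has one column v_{m,r} for every pair (m, r) in Z_K × Z_K, following the column definition above. The 1/sqrt(K) normalization is an assumption added so that the columns have unit norm.

import numpy as np

def chirp_sensing_matrix(K):
    # K x K^2 matrix with columns v_{m,r}(l) = exp(2*pi*j*(m*l + r*l^2)/K) / sqrt(K).
    l = np.arange(K).reshape(-1, 1)            # row index l = 0, ..., K-1
    cols = []
    for r in range(K):
        for m in range(K):
            phase = (m * l + r * l**2) % K
            cols.append(np.exp(2j * np.pi * phase / K))
    return np.hstack(cols) / np.sqrt(K)

A = chirp_sensing_matrix(K=13)                 # 13 is prime
print("shape:", A.shape)                       # (13, 169)
print("max off-diagonal column correlation:",
      np.max(np.abs(A.conj().T @ A - np.eye(A.shape[1]))))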
Let the measurement vector $y$, indexed by $l$, be formed by a linear combination of a few chirp signals:
$$y(l) = s_1 e^{\frac{j 2\pi m_1 l}{K} + \frac{j 2\pi r_1 l^2}{K}} + s_2 e^{\frac{j 2\pi m_2 l}{K} + \frac{j 2\pi r_2 l^2}{K}} + \cdots = s_1 v_{m_1, r_1}(l) + s_2 v_{m_2, r_2}(l) + \cdots$$
Since the columns of the measurement matrix are formed using chirp signals, the coefficients $s_i$ form the sparse signal of interest.
The chirp rates can now be recovered from $y$ by looking at $f(l) = y(l)\,\overline{y(l+T)}$, where the index $l+T$ is taken modulo $K$ and $T \in \mathbb{Z}_K$, $T \ne 0$:
$$f(l) = y(l)\,\overline{y(l+T)} = |s_1|^2\, e^{-\frac{j2\pi}{K}\left(m_1 T + r_1 T^2\right)} e^{-\frac{j2\pi (2 r_1 T) l}{K}} + |s_2|^2\, e^{-\frac{j2\pi}{K}\left(m_2 T + r_2 T^2\right)} e^{-\frac{j2\pi (2 r_2 T) l}{K}} + \text{cross terms},$$
where the cross terms are themselves chirps.
At the discrete frequencies $2 r_i T \bmod K$, $f(l)$ contains pure sinusoids. If $K$ is a prime number, then there is a bijection between the chirp rates and the FFT bins, while the cross terms in the remainder of the signal have their energy spread across all FFT bins. Thus, when $y$ consists of sufficiently few chirps (that is, when $x$ is sparse), taking the FFT of $f(l)$ produces a spectrum with significant peaks at the locations corresponding to $2 r_i T \bmod K$, and the chirp rates can be extracted from these peaks. The signal $y(l)$ can then be dechirped by multiplying by $e^{-\frac{j 2\pi r_i l^2}{K}}$, converting the chirp with rate $r_i$ into a sinusoid. Once an FFT is performed on the resulting signal, the values of $m_i$ and $s_i$ can be retrieved, and the sparse signal can then be reconstructed.
The authors have proved the Restricted Isometry Property (RIP) approximately, i.e. in a statistical sense, for the sensing matrix formed using chirp signals as its columns; see [23] for details.
There also exist explicit constructions of deterministic matrices whose entries are 0 or 1. However, it has been shown in [13] that such explicitly constructed 0/1 matrices require more rows than $O(S \log(n/S))$ to satisfy the Restricted Isometry Property (RIP), and that such matrices cannot achieve good performance with respect to the RIP. This is articulated in the following theorem:
Theorem 4 ([13]). Let $A$ be an $m \times n$ 0/1-matrix that satisfies $(S, D)$-RIP. Then
$$m \ge \min\!\Big(\frac{S^2}{D^4}, \frac{n}{D^2}\Big)$$
where $(S, D)$-RIP means that
$$c^2 \|Ax\|_2^2 \le \|x\|_2^2 \le c^2 D^2 \|Ax\|_2^2$$
for all $S$-sparse vectors $x$ and some scaling constant $c$.
In order to create a good 0/1 sensing matrix, the number of rows $m$ has to satisfy the minimum condition stated in the theorem; however, generating such a matrix might not be practically efficient.
As we have already seen, the Restricted Isometry Property (RIP) and the Uniform Uncertainty Principle (UUP) are somewhat difficult to prove. There are deterministic methods of generating matrices, for example from chaotic processes, but proving randomness for such systems is hard. With random matrix theory, by contrast, we already know the randomness associated with the process generating the matrices; this is one reason the more prominent candidates for measurement matrices are likely to come from random matrix theory. Designing deterministic measurement matrices that satisfy the UUP or the RIP is still an open problem.
5.6 Measurement of Randomness
Apart from figuring out how to design good measurement matrices for compressive sensing,
it would be convenient to have some measure of the amount of randomness exhibited by the
matrices. As we have seen, high randomness is a desirable feature for measurement matrices,
so a measure of randomness can serve as a measure of how good a matrix is as a sensing
matrix. We consider two ways to measure randomness: an entropy approach and a
computational approach.
5.6.1 Entropy Approach
A common quantitative approach to randomness is to quantify the entropy of the system.
One idea is to look at the eigenvalue distribution for these matrices and quantify the
randomness using the spread of the eigenvalues. Another option is to look at the oversampling
factor λ from the Uniform Uncertainty Principle (UUP). The idea of the oversampling factor
comes from an equation in [11]:

||f − f^#||_{l_2} ≤ C · R · (λ/K)^r     (5.19)

where ||f||_{l_1} ≤ R, r := 1/m − 1/2, and C is some constant. The reconstruction error,
||f − f^#||_{l_2}, depends on the oversampling factor λ.
For the Gaussian and Binary random matrix ensembles, λ = log N, and for the Fourier
random matrix ensemble λ = (log N)^6. Just looking at these values, the reconstruction error
bound is smallest for the Gaussian and Binary ensembles and larger for the Fourier ensemble.
From a practical point of view, this oversampling-factor-based measure suggests that the
Gaussian and Binary ensembles are equally good measurement matrix ensembles, and that
the Fourier ensemble is worse than both.
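To get a feel for the gap between these oversampling factors, the short sketch below evaluates log N and (log N)^6 for a couple of signal lengths; the values of N are arbitrary illustrations, not values from [11].

import math

# Oversampling factors appearing in (5.19): lambda = log N for the Gaussian/Binary
# ensembles and lambda = (log N)^6 for the Fourier ensemble.
for N in (10**4, 10**6):
    lam_gaussian_binary = math.log(N)
    lam_fourier = math.log(N) ** 6
    print(f"N={N}: lambda_Gaussian/Binary ~ {lam_gaussian_binary:.1f}, "
          f"lambda_Fourier ~ {lam_fourier:.2e}")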
Randomness can also be quantified by calculating the entropy of the system from its
probability distribution. In quantifying the randomness of measurement matrices, one can
imagine the mn-dimensional space R^{m×n} in which all m × n matrices of some ensemble reside.
There is a sense of a distribution of these matrices in this high-dimensional space, and we
want to capture the randomness associated with each matrix in this distribution. The Gaussian
and Binary measurement matrices are generated by taking entries from a Gaussian probability
distribution and a Bernoulli probability distribution, respectively. Since the entries of the
matrices are chosen from these probability distributions, intuitively the inherent randomness
of the sensing matrices generated this way should correspond to the amount of randomness
within the probability distributions. In general, entropy is calculated as
H = −Σ_i p_i ln p_i

where p_i is the probability of each event occurring. For a continuous distribution, this
becomes the differential entropy

H = −∫ P(j) ln P(j) dj

for some event j.
It can be shown that the entropy of a matrix in our R^{m×n} space is the sum of the
entropies associated with selecting each of its entries, so the entropy of the matrix grows as
more entries are selected. Since the matrices of interest to us are of size m × n, where
m << n, the entropy of the matrix as a whole scales with the number of entries mn.
The probability of an entry is the probability of that value being drawn from the particular
distribution. For the Gaussian distribution with mean zero and standard deviation σ, the pdf
is given by

P(x) = (1/(σ√(2π))) e^{−x^2/(2σ^2)}

and the differential entropy is

H_gaussian = ln(σ√(2πe)).
Assuming our measurement matrices are of size d × m, where d is the number of measurements
and m is the original signal dimension of size 32,000 (from the images used in Section 5.6.2
for the recognition problem [24]), then

H_gaussian = md · ln(1/(σ√(2πe))) = 120,570 d,

taking σ = 1/√m, the same scaling as the ±1/√m Binary entries considered below.
For a Bernoulli distribution taking the two values ±1/√m, the probability mass function is

P(k) = 1 − p for k = −1/√m,   P(k) = p for k = +1/√m,

and the entropy measurement is

H_binary = md · [−(1 − p) ln(1 − p) − p ln p] = 22,180 d,

assuming that there is a 0.50 probability of selecting either value as an entry of the Binary
measurement matrix. This simple calculation shows that even though the oversampling
factor for these two ensembles is the same, the amount of randomness for the Gaussian
ensemble is significantly greater than for the Binary ensemble.
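The arithmetic behind these two figures can be reproduced in a few lines; this is a sketch under the σ = 1/√m scaling assumed above, with d left symbolic.

import math

m = 32000                      # original signal dimension (from [24])
sigma = 1 / math.sqrt(m)       # assumed scaling of the Gaussian entries

# Per-entry quantities used in the text, scaled by m*d for a d x m matrix.
h_gaussian_per_entry = math.log(1 / (sigma * math.sqrt(2 * math.pi * math.e)))
p = 0.5
h_binary_per_entry = -(1 - p) * math.log(1 - p) - p * math.log(p)

print(f"H_gaussian ~ {m * h_gaussian_per_entry:,.0f} d")  # ~120,570 d
print(f"H_binary   ~ {m * h_binary_per_entry:,.0f} d")    # ~22,180 d (as quoted in the text)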
We have not yet quantified randomness for the two promising matrix ensembles we described
earlier, the Laguerre and Jacobi ensembles.
5.6.2 Computational Approach
For our computational approach, we test the suitability of a matrix by using it as a sensing
matrix in a particular object recognition problem. A baseline for the object recognition
problem has been established using matrices from the Gaussian ensemble, which satisfy the
Uniform Uncertainty Principle (UUP) and are an example of good sensing matrices. Using this
as a benchmark, a random matrix from any other ensemble can be used, and its recognition
rate compared to the benchmark to determine its effectiveness as a sensing matrix. Our code
is available as a MATLAB file.
Recognition test on the Binary ensemble. The Binary ensemble has been proven to
satisfy the Uniform Uncertainty Principle (UUP) by Candes and Tao in [11]. Sensing matrices
of the Binary ensemble have entries taking values in {−1/√n, +1/√n}, where n is the original
size of the signal.
Our recognition problem is an image-based face recognition problem in which the training
and testing sets are projected onto a lower-dimensional space we call the feature space,
and recognition is performed in this lower-dimensional space using sparse recovery
techniques [15].
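As an illustration of the projection step only, the following is a minimal sketch, not the MATLAB code used for the reported experiments; the image dimension 32,000 and the feature-space size 30 are the values quoted in the text, everything else (the random seed, the stand-in images) is assumed.

import numpy as np

def sensing_matrix(d, n, ensemble="gaussian", rng=np.random.default_rng(0)):
    """Draw a d x n sensing matrix from the Gaussian or Binary ensemble."""
    if ensemble == "gaussian":
        return rng.normal(0.0, 1.0 / np.sqrt(n), size=(d, n))
    # Binary ensemble: entries +-1/sqrt(n) with probability 1/2 each.
    return rng.choice([-1.0, 1.0], size=(d, n)) / np.sqrt(n)

n = 32000      # vectorized image dimension (as in the text)
d = 30         # feature-space dimension used for the baseline
images = np.random.rand(100, n)          # stand-in for the training images

Phi = sensing_matrix(d, n, "binary")
features = images @ Phi.T                # each row is a 30-dimensional feature vector
print(features.shape)                    # (100, 30)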
For the baseline with the Gaussian ensemble, using a feature-space size of 30, the average
successful recognition rate obtained is 85.43 ± 0.64%. Using the Binary ensemble as the
sensing matrix, with the same feature-space dimension and the same training and testing
sets, the average successful classification rate is 83.97 ± 0.17%.
As discussed previously, sensing matrices with high randomness are good for sparse signal
recovery. Our results from the recognition problem suggest that the amount of randomness
in the Binary ensemble might be less than that in the Gaussian ensemble. That said, the
difference in recognition rates is not significant enough to deter the use of the Binary
ensemble for practical purposes. The error in sparse signal reconstruction is lower for the
Gaussian ensemble in practice, as suggested by the difference in recognition rates, in contrast
to what is suggested by equation (5.19), where the two ensembles share the same oversampling
factor. This also supports the previous entropy estimation of the Gaussian and Binary
ensembles, where matrices from the Gaussian ensemble had higher entropy than those from
the Binary ensemble.
5.7 Summary
In this section we have looked at designing measurement matrices, our main area of interest
in this paper. We introduced the importance of a good measurement matrix and discussed
the effectiveness of randomness as a design tool. In contrast to randomness, we also
discussed issues related to designing a deterministic measurement matrix. Some of the
important concepts related to designing measurement matrices, such as the Restricted Isometry
Property (RIP), Uniform Uncertainty Property (UUP), and Exact Reconstruction Property
(ERP), were also discussed in this section. We have shown that verifying the Restricted
Isometry Property is a difficult problem, and we have proved that it is NP-Complete. We also
suggested an approximation algorithm, with a polynomial time bound, for estimating the
Restricted Isometry Property (RIP) of any given matrix. We also explored the relation of
random matrix theory to measurement matrices. We have suggested a proof template for proving
the Uniform Uncertainty Principle (UUP) for any suitable random matrix ensemble. In order to
quantify the randomness of different measurement matrices, we suggested techniques based on
an entropy calculation approach and a computational approach that involves a recognition
algorithm.
Chapter 6
Applications of Compressive Sensing
In this section we briefly explore some of the applications of compressive sensing. Applications
of compressive sensing range from coding theory to facial recognition. One research
group has developed a single-pixel camera based on the ideas of compressive sensing [2].
Compressive sensing also looks promising in the field of wireless sensor networks, since
compressive sensing ideas help reduce the complexity of the end sensors deployed in the
field. The processing can be carried out at a remote site with adequate computational power,
and signal reconstruction would still be overwhelmingly exact even if data sent from a number
of sensors is lost. In medical imaging devices such as Magnetic Resonance Imaging (MRI) [14],
compressive sensing carries great promise for better image reconstruction. More recent
applications of compressive sensing are in facial recognition and human action classification
systems [15], [16], respectively. The design techniques for measurement matrices explored in
this paper are equally important and applicable in many of these applications.
Chapter 7
Conclusion
Compressive sensing is a new tool that makes the measurement process efficient and results in
good data reconstruction, given that there is a certain structure to the data of interest.
Being able to design good measurement matrices is imperative to obtain good reconstruction of
the data. Randomness is an effective tool in signal projection and reconstruction. We have
shown that the Restricted Isometry Property (RIP) is a difficult property to determine; in
fact, verifying it is an NP-Complete problem. However, we suggested an approximation
algorithm to estimate RIP. We developed techniques to quantify the entropy of matrices of a
random matrix ensemble by looking at the entropy of the associated probability distribution,
as well as a recognition-algorithm-based technique to quantify randomness. Using our
techniques, one can determine the effectiveness of potential matrices as sensing matrices. In
this paper, we explored different techniques and issues related to the generation and design
of effective sensing matrices. We hope our work will motivate people to look further into the
relations between sensing matrices and random (or even chaotic) processes.
Appendix A: Proof outline of
Shannon's Theorem
We restate the Nyquist-Shannon Theorem as follows (this proof outline follows notes from the
Center for Computer Research in Music and Acoustics (CCRMA), Stanford University):
Theorem. Let x(t) denote any continuous signal having a continuous Fourier transform

X(jω) = ∫_{−∞}^{∞} x(t) e^{−jωt} dt.

Let

x_d(n) = x(nT),   n = ..., −2, −1, 0, 1, 2, ...

denote the samples of x(t) at uniform intervals of T seconds. Then x(t) can be reconstructed
from its samples x_d(n) if X(jω) = 0 for all |ω| ≥ π/T.
Proof. First write the discrete-time spectrum X_d(e^{jω_d T}) in terms of the continuous-time
spectrum X(jω) as

X_d(e^{jω_d T}) = (1/T) Σ_{m=−∞}^{∞} X[ j(ω_d + m ω_s) ],

where ω_s = 2π/T is the sampling frequency. This can be reduced to

X_d(e^{jω_d T}) = (1/T) X(jω_d),   ω_d ∈ (−π/T, π/T).

Now, the spectrum of the sampled signal x(nT) coincides with the nonzero spectrum of
the continuous-time signal x(t). That is, the DTFT of x(nT) is equal to the FT of x(t)
between plus and minus half the sample rate. The spectral information is preserved, which
makes it possible to go back and forth between the continuous waveform and the samples.
To reconstruct x(t) from its samples x(nT), one needs to take the inverse Fourier transform
of the zero-extended DTFT, since

x(t) = IFT_t(X) = (1/2π) ∫_{−∞}^{∞} X(jω) e^{jωt} dω
     = (1/2π) ∫_{−ω_s/2}^{ω_s/2} X_d(e^{jωT}) e^{jωt} dω = IDTFT_t(X_d).
Now,

x(t) = IDTFT_t(X_d)
     = (T/2π) ∫_{−π/T}^{π/T} X_d(e^{jω_d T}) e^{jω_d t} dω_d
     = (T/2π) ∫_{−π/T}^{π/T} [ Σ_{n=−∞}^{∞} x(nT) e^{−jω_d nT} ] e^{jω_d t} dω_d
     = Σ_{n=−∞}^{∞} x(nT) (T/2π) ∫_{−π/T}^{π/T} e^{jω_d (t − nT)} dω_d
     = Σ_{n=−∞}^{∞} x(nT) h(t − nT)
     = (x ∗ h)(t)
where h(t − nT) is defined as follows:

h(t − nT) = (T/2π) ∫_{−π/T}^{π/T} e^{jω_d (t − nT)} dω_d
          = (T/2π) · (1/(j(t − nT))) [ e^{jπ(t − nT)/T} − e^{−jπ(t − nT)/T} ]
          = sin(π(t/T − n)) / (π(t/T − n))
          = sinc((t − nT)/T) = sinc(t/T − n)
Thus,

h(t) = sinc(t/T),   where sinc(t) := sin(πt)/(πt).
This shows that when x(t) is bandlimited to less than half the sampling rate, the samples
obtained can be used to reconstruct the original continuous-time signal x(t).
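As a numerical illustration of this sinc-interpolation formula, the following sketch reconstructs a bandlimited test signal from its samples; the test signal, sample rate, and truncation of the infinite sum are arbitrary choices.

import numpy as np

def sinc_reconstruct(samples, T, t):
    """Evaluate x(t) = sum_n x(nT) * sinc((t - nT)/T) from finitely many samples."""
    n = np.arange(len(samples))
    # np.sinc is the normalized sinc: sinc(u) = sin(pi*u)/(pi*u).
    return np.sum(samples[None, :] * np.sinc((t[:, None] - n * T) / T), axis=1)

# Bandlimited test signal: a 3 Hz sine, sampled at 10 Hz (above the 6 Hz Nyquist rate).
T = 0.1
n = np.arange(200)
samples = np.sin(2 * np.pi * 3 * n * T)

t = np.linspace(5.0, 15.0, 1000)          # stay away from the truncated edges
x_true = np.sin(2 * np.pi * 3 * t)
x_rec = sinc_reconstruct(samples, T, t)
print(np.max(np.abs(x_rec - x_true)))     # small interpolation error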
Bibliography
[1] Emmanuel Candes and Michael Wakin. An introduction to compressive sampling. IEEE
Signal Processing Magazine. 25(2). pp. 21 - 30. March 2008.
[2] Richard Baraniuk. Compressive sensing. IEEE Signal Processing Magazine. 24(4). pp.
118-121. July 2007.
[3] Justin Romberg. Imaging via compressive sampling. IEEE Signal Processing Magazine.
25(2). pp. 14 - 20. March 2008.
[4] Emmanuel Candes. Compressive sampling. Proceedings International Congress of Math-
ematics. 3. pp. 1433-1452. Madrid. Spain. 2006.
[5] Emmanuel Candes and Justin Romberg. Sparsity and incoherence in compressive sam-
pling. Inverse Problems. 23(3). pp. 969-985. 2007.
[6] E. J. Candes and J. Romberg. Practical signal recovery from random projections.
Wavelet Applications in Signal and Image Processing XI. Proc. SPIE Conf. 5914.
[7] Emmanuel Candes and Terence Tao. Decoding by linear programming. IEEE Trans. on
Information Theory. 51(12). pp. 4203 - 4215. December 2005.
[8] David Donoho. For most large underdetermined systems of linear equations, the minimal
l_1-norm solution is also the sparsest solution. Communications on Pure and Applied
Mathematics. 59(6). pp. 797-829. June 2006.
[9] David Donoho. Compressed sensing. IEEE Trans. on Information Theory. 52(4). pp.
1289 - 1306. April 2006.
[10] Emmanuel Candes, Justin Romberg, and Terence Tao. Robust uncertainty principles:
Exact signal reconstruction from highly incomplete frequency information. IEEE Trans.
on Information Theory. 52(2). pp. 489 - 509. February 2006.
[11] Emmanuel Candes and Terence Tao. Near optimal signal recovery from random projec-
tions: Universal encoding strategies? IEEE Trans. on Information Theory. 52(12). pp.
5406 - 5425. December 2006.
[12] Emmanuel Candes. The restricted isometry property and its implications for compressed
sensing. Comptes Rendus de l'Académie des Sciences. Paris. Series I. 346. pp. 589-592.
2008.
[13] Venkat Chandar. A negative result concerning explicit matrices with the restricted isom-
etry property. Preprint. 2008.
[14] Emmanuel Candes, Justin Romberg, and Terence Tao. Stable signal recovery from
incomplete and inaccurate measurements. Communications on Pure and Applied
Mathematics. 59(8). pp. 1207-1223. August 2006.
[15] John Wright, Allen Yang, Arvind Ganesh, Shankar Sastry and Yi Ma. Robust Face
Recognition via Sparse Representation. To appear in IEEE Transactions on Pattern
Analysis and Machine Intelligence (PAMI), 2008.
[16] Allen Yang, Sameer Iyengar, Shankar Sastry, Ruzena Bajcsy, Philip Kuryloski and
Roozbeh Jafari. Distributed Segmentation and Classification of Human Actions Using
a Wearable Motion Sensor Network. Workshop on Human Communicative Behavior
Analysis. IEEE Conference on Computer Vision and Pattern Recognition (CVPR). June
2008.
[17] Jeffrey Ho, Ming-Hsuan Yang, Jongwoo Lim, Kuang-Chih Lee, and David Kriegman.
Clustering Appearances of Objects Under Varying Illumination Conditions. Proceedings
of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR '03).
[18] Ke Huang and Selin Aviyente. Sparse Representation for Signal Classication. Proceed-
ings of Neural Information Processing Systems Conference. 2006.
[19] Dimitri Pissarenko. Eigenface-based facial recognition.
[20] M. Turk and A. Pentland. Eigenfaces for recognition. In Proceedings of IEEE Interna-
tional Conference on Computer Vision and Pattern Recognition. 1991.
[21] P. Belhumeur, J. Hespanda, and D. Kriegman. Eigenfaces vs. Fisherfaces: recognition
using class-specific linear projection. IEEE Transactions on Pattern Analysis and
Machine Intelligence. Vol. 19. No. 7. pp. 711-720. 1997.
[22] X. He, S. Yan, Y. Hu, P. Niyogi, and H. Zhang. Face recognition using Laplacianfaces.
IEEE Transactions on Pattern Analysis and Machine Intelligence. Vol. 27. No. 3. pp.
328-340. 2005.
[23] Lorne Applebaum, Stephen Howard, Stephen Searle, and Robert Calderbank. Chirp
Sensing Codes: Deterministic Compressed Sensing Measurements for Fast Recovery.
Applied and Computational Harmonic Analysis. Vol. 26. Issue 2. pp. 282-290. March
2008.
[24] Ragib Morshed, Tzu-Yi Chen. Senior Thesis in Computer Science on compressive sens-
ing based face recognition. Pomona College. 2009.