
Sparse Representations

in signal processing

Bogdan Dumitrescu

University Politehnica of Bucharest, Romania


bogdan.dumitrescu@acse.pub.ro

Très Petit Larousse

Sparse representation = représentation


creuse
parcimonieuse
éparse
creux = cavity, empty or concave part of a surface; recess, hole
parcimonie = saving, meticulous economy
épars = spread on all sides, scattered (Lat. sparsus)

Contents

1 Why sparsity?

2 Sparse representations

3 Sparse representation algorithms

4 Dictionary learning

5 Applications

Why sparsity?

Many signals or images are sparse in some transform domain


Sometimes we know the transform (see the next examples)
In many other cases:
we do not know the sparsifying transform
we can only guess that a sparse representation exists

Sound
A musical instrument has a harmonic spectrum
The fundamental frequency and a few coefficients are enough to rebuild
the sound

Speech
Speech is not harmonic, especially consonants
However, the spectrum is relatively sparse

Images
The image itself is not at all sparse
In the transform domain, most coefficients are (almost) zero

Images
What is sparsity? Sparsity implies many zeros in a vector or a matrix

[Figure: 2D DFT example. A fingerprint patch is passed through the FFT; most transform coefficients are (almost) zero, giving a sparse representation; the IFFT reconstructs the patch.]

Usage: compression, analysis, denoising, ...
Transforms and representations

y ∈ R^m: (a frame of) a signal

Usual transforms are nice orthogonal matrices: DFT, DCT, wavelets, etc.
Transform: F ∈ R^{m×m}
Transformed signal (analysis): x = Fy
Representation (synthesis): y = F^H x
Inverse transform is also nice

Transforms?

With transforms, all is nice, quick and square

y = F^H · x

. . . but they are not always the best

Contents

1 Why sparsity?

2 Sparse representations

3 Sparse representation algorithms

4 Dictionary learning

5 Applications

Overcomplete transform

Overcomplete representation: y = Dx
D ∈ R^{m×n}, with m < n: fat matrix

y = D · x

Difficulty
Given y and D, the overcomplete system has many solutions x
Sparse solutions

Idea: seek only sparse solutions


Sparsity level s = ||x||_0: the number of nonzeros in x
Caution: s = ||x||_0 is called the 0-norm but it is not a norm
D is called the dictionary; its columns are the atoms

y = D · x

The problem

Sparse representation problem


Given
signal y ∈ R^m
dictionary D ∈ R^{m×n}
sparsity level s
compute a representation x ∈ R^n such that
the representation relation y = Dx holds
the representation is sparse: ||x||_0 = s

Do sparse solutions exist?

In general, there is no guarantee that a sparse solution exists


The set of signals Dx with s-sparse x is a union of subspaces of dimension s
(Illustration at the blackboard)
Usually we are happy with approximate solutions

Approximate sparse representation problem


minimize ||y − Dx|| with ||x||_0 = s

How to compute a sparse solution?

The least-squares solution x = D^T (D D^T)^{-1} y is NOT sparse
If the support I were known, then the system would be overdetermined

y = D_I · x_I

The least-squares solution x_I = (D_I^T D_I)^{-1} D_I^T y is unique
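For concreteness, a minimal NumPy sketch of this least-squares fit on a known support (the dictionary, signal, and support below are invented for illustration):

import numpy as np

rng = np.random.default_rng(0)
m, n, s = 8, 20, 3
D = rng.standard_normal((m, n))            # dictionary, atoms as columns
I = [2, 7, 11]                             # support, assumed known here
x_true = np.zeros(n)
x_true[I] = rng.standard_normal(s)
y = D @ x_true                             # signal exactly represented on I

D_I = D[:, I]                              # restrict the dictionary to the support
x_I, *_ = np.linalg.lstsq(D_I, y, rcond=None)   # x_I = (D_I^T D_I)^{-1} D_I^T y
print(np.allclose(x_I, x_true[I]))         # True: the LS solution on I is unique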

NP-hard

How to find the support?


Exhaustive search: try all combinations of s atoms
Combinatorial complexity ⇒ NP-hard problem
We need better algorithms

Sparsity guarantees

Assume we have found a solution x with s nonzeros
Is this the sparsest one?
spark(D) = smallest number of linearly dependent atoms

Sparsest solution
If s < spark(D)/2, then x is the sparsest solution
That’s good news: generically each group of m atoms has
maximum rank
So, if s < (m + 1)/2, then the solution is the sparsest
There are not that many sparse solutions. . .
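Computing the spark exactly is combinatorial, but for a tiny dictionary a brute-force check makes the definition concrete. An illustrative sketch (the example matrix is mine, not from the slides):

import numpy as np
from itertools import combinations

def spark(D, tol=1e-10):
    """Smallest number of linearly dependent columns of D (brute force)."""
    m, n = D.shape
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            if np.linalg.matrix_rank(D[:, cols], tol=tol) < k:
                return k          # found k linearly dependent atoms
    return n + 1                  # all columns are linearly independent

D = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
print(spark(D))   # 3: atoms 0, 1, 2 are linearly dependent (d2 = d0 + d1)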

Mutual coherence

With no loss of generality, assume that atoms have unit norm


Mutual coherence is µ(D) = max_{i≠j} |d_i^T d_j|
Relation with spark:

spark(D) ≥ 1 + 1/µ(D)

Mutual coherence can be computed more easily than the spark
Conclusion: the best dictionary would have atoms as
”orthogonal” as possible
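A minimal NumPy sketch of the mutual coherence of a dictionary with unit-norm atoms (the random dictionary is only for illustration):

import numpy as np

rng = np.random.default_rng(0)
m, n = 16, 32
D = rng.standard_normal((m, n))
D /= np.linalg.norm(D, axis=0)        # normalize atoms to unit norm

G = np.abs(D.T @ D)                   # |d_i^T d_j| for all pairs (Gram matrix)
np.fill_diagonal(G, 0.0)              # ignore i = j
mu = G.max()                          # mutual coherence µ(D)
print(mu, 1 + 1 / mu)                 # µ(D) and the resulting spark lower bound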

Contents

1 Why sparsity?

2 Sparse representations

3 Sparse representation algorithms

4 Dictionary learning

5 Applications

Sparse representation algorithms

Greedy
orthogonal matching pursuit (OMP)
orthogonal least squares (OLS)
. . . and many others
Optimization based
convex relaxation
min_x ||x||_1
s.t. y = Dx
. . . and many others

Orthogonal Matching Pursuit

Finds the atoms one by one


Assume that at some point the support is I
The representation residual is

e = y − Σ_{j∈I} x_j d_j

New atom: the index k for which |e^T d_k| = max_{j∉I} |e^T d_j|
New support: I ← I ∪ {k}
The new optimal representation is the least-squares minimizer

x_I = (D_I^T D_I)^{-1} D_I^T y
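A compact NumPy sketch of OMP following these steps (the function name, sizes, and toy data are mine, not from the slides):

import numpy as np

def omp(D, y, s):
    """Orthogonal Matching Pursuit: greedy s-sparse representation of y in D."""
    m, n = D.shape
    x = np.zeros(n)
    support = []
    residual = y.copy()
    for _ in range(s):
        # pick the atom most correlated with the current residual
        correlations = np.abs(D.T @ residual)
        correlations[support] = 0.0          # do not reselect atoms
        k = int(np.argmax(correlations))
        support.append(k)
        # least-squares fit on the enlarged support, then update the residual
        D_I = D[:, support]
        x_I, *_ = np.linalg.lstsq(D_I, y, rcond=None)
        residual = y - D_I @ x_I
    x[support] = x_I
    return x, support

# toy check: recover a 3-sparse representation
rng = np.random.default_rng(1)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(50)
x_true[[3, 17, 40]] = [1.0, -2.0, 0.5]
x_hat, I = omp(D, D @ x_true, s=3)
print(sorted(I), np.allclose(x_hat, x_true, atol=1e-8))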

Illustration
[Figure: the first two OMP steps on an image patch x, using a small dictionary d_1, ..., d_5. Step 1: the correlations c_i = d_i^T x are computed for all atoms, the atom with the largest |c_i| (c_2 = 0.577) is selected, and the residual r = x − 0.577·d_2 is formed. Step 2: the correlations c_i = d_i^T r are recomputed on the residual and a second atom (d_5) is selected. After refitting the coefficients by least squares, the reconstructed patch is x̂ = 0.499·d_2 − 0.309·d_5, with error ||x − x̂||_2 = 0.759.]
Why convex relaxation?

Recall the convex relaxation

min_x ||x||_1
s.t. y = Dx

Convex optimization is nice: usually a unique solution and reliable algorithms
The 1-norm is the true norm "nearest" to the 0-norm
Why would the solution be sparse?

p-norms

p-norms are ||x||_p = (Σ_{i=1}^m |x_i|^p)^{1/p}

Shapes of |x_i|^p

[Figure: |x|^p for p = 2, 1, 0.5, 0.1. As p tends to zero, |x|^p approaches the indicator function, which is 0 for x = 0 and 1 elsewhere.]
1-norm relaxation
y = Dx is a hyperplane
Find smallest p-norm ball that still intersects the hyperplane
(is tangent, in fact)
[Figure: the intersection between the ℓ_p-ball and the solution set of y = Dx, shown in 3D for p = 2, 1.5, 1, and 0.7. When p ≤ 1, the intersection takes place at a corner of the ball, which is a sparse point.]
Lasso

Usually the sparse representation is not exact (noise)


Lasso problem

min_x ||y − Dx||_2^2 + λ||x||_1
Again convex optimization
λ > 0 is a sparsifying parameter
λ small ⇒ (almost) full solution
λ big ⇒ (too) sparse solution
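One simple way to solve the Lasso is iterative soft thresholding (ISTA); a minimal sketch under arbitrary choices of λ and problem size (not values from the slides):

import numpy as np

def soft_threshold(v, t):
    """Soft thresholding, the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(D, y, lam, n_iter=500):
    """Minimize ||y - Dx||_2^2 + lam*||x||_1 by ISTA (proximal gradient)."""
    L = 2 * np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2 * D.T @ (D @ x - y)       # gradient of the quadratic term
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(2)
D = rng.standard_normal((30, 80))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(80)
x_true[[5, 20, 60]] = [1.5, -1.0, 2.0]
y = D @ x_true + 0.01 * rng.standard_normal(30)   # noisy observations
x_hat = lasso_ista(D, y, lam=0.1)
print(np.flatnonzero(np.abs(x_hat) > 0.1))        # large coefficients ~ true support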

Contents

1 Why sparsity?

2 Sparse representations

3 Sparse representation algorithms

4 Dictionary learning

5 Applications

What kind of dictionaries?

Preset
Made from the rows of a classic transform
Random
Specially built, e.g. for incoherence

Learned
Learned from training signals for each specific application

Fixed vs learned

DCT vs learned dictionary

[Figure: (left) overcomplete DCT dictionary for 8×8 image patches; (right) sparse dictionary trained over the overcomplete DCT using Sparse K-SVD, with each atom represented by 6 coefficients.]
Why learned dictionaries?

Advantages
Maximize performance for the application at hand
Learning can be done in advance, before the application runs

Drawbacks
No structure, hence no fast algorithms
Learning dictionaries takes time and might be hard

Dictionary learning

Input
Data set Y ∈ R^{m×N}, the training signals
Sparsity level s

Dictionary learning problem


Solve
min_{D,X} ||Y − DX||_F^2
s.t. ||x_ℓ||_0 ≤ s, ℓ = 1 : N          (DL)
     ||d_j||_2 = 1, j = 1 : n

Representation of many signals

The matrix X has s nonzeros on each column

Y ≈ D · X

Representation of a signal

The problem can be solved separately for each signal

y_ℓ ≈ D · x_ℓ, for each column ℓ of Y and X

Short problem analysis

NP-hard due to the sparsity constraint


If sparsity pattern Ω is fixed, the problem is bi-quadratic, hence
still nonconvex
The problem is convex
in D , if X is fixed and normalization ignored
in X , if D and Ω are fixed
Difficulties
Many local minima, at least one for each Ω
Big size, many variables
Example: m = 64, n = 128, N = 10000, s = 6
D is a full 64 × 128 matrix ⇒ 8192 variables
X has 60,000 nonzeros out of 1,280,000 possible positions
Subproblems: sparse coding

Sparse coding (representation)


With fixed dictionary, compute sparse representations

min_X ||Y − DX||_F^2          (SC)
s.t. ||x_i||_0 ≤ s, 1 ≤ i ≤ N

Useful also after learning, when the designed dictionary is used in applications
Preferred algorithm: OMP, mostly due to its speed
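A possible way to sparse-code all columns of Y at once is scikit-learn's OMP solver; a sketch (the library choice and the sizes are mine, not prescribed by the slides):

import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(3)
m, n, N, s = 16, 40, 200, 4
D = rng.standard_normal((m, n))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
Y = rng.standard_normal((m, N))              # training signals as columns

# X has at most s nonzeros per column; (SC) is solved greedily, column by column
X = orthogonal_mp(D, Y, n_nonzero_coefs=s)
print(X.shape, np.count_nonzero(X, axis=0).max())   # (n, N), at most s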

Subproblems: dictionary update

Dictionary update
With fixed sparsity pattern Ω

min_{D,(X)} ||Y − DX||_F^2          (DU)
s.t. X_{Ω^c} = 0

Atoms are always updated


Representation coefficients may be changed or not

Basic DL algorithm

Simple idea: alternate between sparse coding and dictionary update

Initial dictionary
random atoms
random selection of signals

Stopping criteria
Fixed number of iterations
Reason: usually, the error is not monotonically decreasing

Basic algorithm structure

Algorithm 1: DL by alternate optimization


Data: initial dictionary D ∈ R^{m×n}
      signal set Y ∈ R^{m×N}
      sparsity level s
Result: trained dictionary D, sparse representations X
1  for k = 1 to K do
2    Sparse coding: with fixed D, solve (SC) for X
3    Dictionary update: with fixed nonzero pattern Ω, solve (DU) for the new dictionary D and possibly a new X
4    Atom normalization (if not done in DU): d_j ← d_j / ||d_j||, j = 1 : n
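A bare-bones sketch of Algorithm 1 with OMP for the sparse coding step and a MOD-style least-squares dictionary update; MOD is just one possible choice for step 3 (K-SVD, for instance, is another), and all sizes below are illustrative:

import numpy as np
from sklearn.linear_model import orthogonal_mp

def dictionary_learning(Y, n_atoms, s, n_iter=20, seed=0):
    """Alternate sparse coding (OMP) and dictionary update (MOD-like least squares)."""
    rng = np.random.default_rng(seed)
    m, N = Y.shape
    # initialize the dictionary with randomly selected training signals
    D = Y[:, rng.choice(N, n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        # sparse coding: s-sparse representation of every column of Y
        X = orthogonal_mp(D, Y, n_nonzero_coefs=s)
        # dictionary update: least-squares fit D = Y X^+ (nonzero pattern of X kept)
        D = Y @ np.linalg.pinv(X)
        D /= np.linalg.norm(D, axis=0) + 1e-12   # re-normalize the atoms
    return D, X

rng = np.random.default_rng(4)
Y = rng.standard_normal((16, 500))
D, X = dictionary_learning(Y, n_atoms=32, s=4)
print(np.linalg.norm(Y - D @ X, 'fro'))          # residual error after training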

Contents

1 Why sparsity?

2 Sparse representations

3 Sparse representation algorithms

4 Dictionary learning

5 Applications

Applications

Applications
denoising
compression
coding
classification
compressive sampling
. . . and many others
Main idea: sparse representation captures the essential
information in a signal (image)

Denoising
Sparse representation removes noise, which is not well represented by the (learned) atoms

[Figure: denoising example: source image, noisy image (PSNR = 22.1 dB), learned dictionary, denoised result (PSNR = 30.829 dB).]

[M. Elad, Springer 2010]
Compression
Sparse representation contains little information: the positions of the atoms and their coefficients
[Figure: facial image compression at 550 bytes per image, comparing the original with JPEG, JPEG 2000, PCA-based, and dictionary-based coding; the bottom rows give the RMSE values for three test images: 15.81 / 13.89 / 10.66 / 6.60, 14.67 / 12.41 / 9.44 / 5.49, and 15.30 / 12.57 / 10.27 / 6.36.]

[O. Bryt, M. Elad, 2008]

Compressive sampling (compressed sensing)

η: the initial signal, huge (imagine a many-megapixel camera)


If we store the whole signal ⇒ lots of data
Instead of compressing after acquisition, compress AT acquisition
F sparsifying transform (e.g., DCT): x = Fη
D overcomplete transform
Acquired signal: y = Dη = DF^T x, small size
The "computation" Dη is done in hardware in single-pixel cameras
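A small simulation of this encode/decode pipeline, with a random Gaussian D and an l1 decoder; the sizes, λ, and the use of SciPy/scikit-learn are my choices for illustration only:

import numpy as np
from scipy.fft import idct
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
n, m = 256, 64                                   # signal length, number of measurements

# a signal that is sparse in the DCT domain: x = F*eta has only 3 nonzeros
x = np.zeros(n)
x[[3, 40, 100]] = [2.0, -1.5, 1.0]
eta = idct(x, norm='ortho')                      # eta = F^T x (inverse DCT)

# acquisition: y = D*eta = D F^T x, with a random Gaussian D (m << n)
D = rng.standard_normal((m, n)) / np.sqrt(m)
y = D @ eta

# decoding: sparse x from y = (D F^T) x by l1 minimization, then eta = F^T x
A = D @ idct(np.eye(n), axis=0, norm='ortho')    # A = D F^T
x_hat = Lasso(alpha=1e-4, fit_intercept=False, max_iter=50000).fit(A, y).coef_
eta_hat = idct(x_hat, norm='ortho')
print(np.linalg.norm(eta - eta_hat) / np.linalg.norm(eta))   # relative error (should be small)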

Decoder

Decoding: first compute the sparse x by solving the system y = DF^T x,
then recover η = F^T x via the inverse transform
Dumb encoder, smart decoder
Robustness: all elements of y have equal value, unlike those of x
Beats the Nyquist bound

Very short bibliography

A.M. Bruckstein, D.L. Donoho, and M. Elad. “From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images”. In: SIAM Rev. 51.1 (2009), pp. 34–81

R. Rubinstein, A.M. Bruckstein, and M. Elad. “Dictionaries for Sparse Representation Modeling”. In: Proc. IEEE 98.6 (June 2010), pp. 1045–1057

M.F. Duarte et al. “Single-Pixel Imaging via Compressive Sampling”. In: IEEE Signal Proc. Mag. (Mar. 2008), pp. 83–91

