
Sparse Representations

in signal processing

Bogdan Dumitrescu

University Politehnica of Bucharest, Romania


bogdan.dumitrescu@acse.pub.ro

Très Petit Larousse

Sparse representation = représentation


creuse
parcimonieuse
éparse
creux = cavity, empty or concave part of a surface; recess, hole
parcimonie = saving, meticulous economy
épars = spread on all sides, scattered (Lat. sparsus)

Contents

1 Why sparsity?

2 Sparse representations

3 Sparse representation algorithms

4 Dictionary learning

5 Applications

Why sparsity?

Many signals or images are sparse in some transform domain


Sometimes we know the transform (see the next examples)
In many other cases:
we do not know the sparsifying transform
we can only guess that a sparse representation exists

Sound
A musical instrument has a harmonic spectrum
The fundamental frequency and a few coefficients are enough to rebuild
the sound

Speech
Speech is not harmonic, especially consonants
However, the spectrum is relatively sparse

Images
The image itself is not at all sparse
In the transform domain, most coefficients are (almost) zero

Images
What is sparsity? Sparsity implies many zeros in a vector or a matrix

[Figure: 2D DFT example. A fingerprint patch is passed through the FFT; most transform coefficients are (almost) zero, giving a sparse representation; the IFFT reconstructs the patch.]

Usage: compression, analysis, denoising, ...
Transforms and representations

y ∈ R^m: (a frame of) a signal

Usual transforms are nice orthogonal matrices: DFT, DCT, wavelets, etc.
Transform: F ∈ R^{m×m}
Transformed signal (analysis): x = Fy
Representation (synthesis): y = F^H x
Inverse transform is also nice

Transforms?

With transforms, all is nice, quick and square

y = F^H · x

. . . but they are not always the best

Contents

1 Why sparsity?

2 Sparse representations

3 Sparse representation algorithms

4 Dictionary learning

5 Applications

Overcomplete transform

Overcomplete representation: y = Dx
D ∈ R^{m×n}, with m < n: fat matrix

y = D · x

Difficulty
Given y and D, the overcomplete system has many solutions x
Sparse solutions

Idea: seek only sparse solutions


Sparsity level s = ||x||_0: the number of nonzeros in x
Caution: s = ||x||_0 is called the 0-norm but it is not a norm
D is called the dictionary; its columns are the atoms

y = D · x

The problem

Sparse representation problem


Given
signal y ∈ R^m
dictionary D ∈ R^{m×n}
sparsity level s
compute a representation x ∈ R^n such that
the representation relation y = Dx holds
the representation is sparse: ||x||_0 = s

Do sparse solutions exist?

In general, there is no guarantee that a sparse solution exists


The set of signals Dx with s-sparse x is a union of subspaces of dimension s
(Illustration at the blackboard)
Usually we are happy with approximate solutions

Approximate sparse representation problem


minimize ||y − Dx|| with ||x||_0 = s

How to compute a sparse solution?

The least-squares solution x = D^T (D D^T)^{-1} y is NOT sparse
If the support I were known, then the system would be overdetermined

y = D_I · x_I

The least-squares solution x_I = (D_I^T D_I)^{-1} D_I^T y is unique
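For concreteness, a minimal NumPy sketch of this least-squares fit on a known support (the dictionary, signal, and support below are invented for illustration):

import numpy as np

rng = np.random.default_rng(0)
m, n, s = 8, 20, 3
D = rng.standard_normal((m, n))            # dictionary, atoms as columns
I = [2, 7, 11]                             # support, assumed known here
x_true = np.zeros(n)
x_true[I] = rng.standard_normal(s)
y = D @ x_true                             # signal exactly represented on I

D_I = D[:, I]                              # restrict the dictionary to the support
x_I, *_ = np.linalg.lstsq(D_I, y, rcond=None)   # x_I = (D_I^T D_I)^{-1} D_I^T y
print(np.allclose(x_I, x_true[I]))         # True: the LS solution on I is unique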

NP-hard

How to find the support?


Exhaustive search: try all combinations of s atoms
Combinatorial complexity ⇒ NP-hard problem
We need better algorithms

Sparsity guarantees

Assume we have found a solution x with s nonzeros
Is this the sparsest one?
spark(D) = smallest number of linearly dependent atoms

Sparsest solution
If s < spark(D)/2, then x is the sparsest solution
That’s good news: generically each group of m atoms has
maximum rank
So, if s < (m + 1)/2, then the solution is the sparsest
There are not that many sparse solutions. . .
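Computing the spark exactly is combinatorial, but for a tiny dictionary a brute-force check makes the definition concrete. An illustrative sketch (the example matrix is mine, not from the slides):

import numpy as np
from itertools import combinations

def spark(D, tol=1e-10):
    """Smallest number of linearly dependent columns of D (brute force)."""
    m, n = D.shape
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            if np.linalg.matrix_rank(D[:, cols], tol=tol) < k:
                return k          # found k linearly dependent atoms
    return n + 1                  # all columns are linearly independent

D = np.array([[1.0, 0.0, 1.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
print(spark(D))   # 3: atoms 0, 1, 2 are linearly dependent (d2 = d0 + d1)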

Mutual coherence

With no loss of generality, assume that atoms have unit norm


Mutual coherence is µ(D) = max_{i≠j} |d_i^T d_j|
Relation with spark:

spark(D) ≥ 1 + 1/µ(D)

Mutual coherence can be computed more easily than the spark
Conclusion: the best dictionary would have atoms as
”orthogonal” as possible
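A minimal NumPy sketch of the mutual coherence of a dictionary with unit-norm atoms (the random dictionary is only for illustration):

import numpy as np

rng = np.random.default_rng(0)
m, n = 16, 32
D = rng.standard_normal((m, n))
D /= np.linalg.norm(D, axis=0)        # normalize atoms to unit norm

G = np.abs(D.T @ D)                   # |d_i^T d_j| for all pairs (Gram matrix)
np.fill_diagonal(G, 0.0)              # ignore i = j
mu = G.max()                          # mutual coherence µ(D)
print(mu, 1 + 1 / mu)                 # µ(D) and the resulting spark lower bound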

Contents

1 Why sparsity?

2 Sparse representations

3 Sparse representation algorithms

4 Dictionary learning

5 Applications

Sparse representation algorithms

Greedy
orthogonal matching pursuit (OMP)
orthogonal least squares (OLS)
. . . and many others
Optimization based
convex relaxation
min_x ||x||_1
s.t. y = Dx
. . . and many others

Orthogonal Matching Pursuit

Finds the atoms one by one


Assume that at some point the support is I
The representation residual is

e = y − Σ_{j∈I} x_j d_j

New atom: the index k for which |e^T d_k| = max_{j∉I} |e^T d_j|
New support: I ← I ∪ {k}
The new optimal representation is the least-squares minimizer

x_I = (D_I^T D_I)^{-1} D_I^T y
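A compact NumPy sketch of OMP following these steps (the function name, sizes, and toy data are mine, not from the slides):

import numpy as np

def omp(D, y, s):
    """Orthogonal Matching Pursuit: greedy s-sparse representation of y in D."""
    m, n = D.shape
    x = np.zeros(n)
    support = []
    residual = y.copy()
    for _ in range(s):
        # pick the atom most correlated with the current residual
        correlations = np.abs(D.T @ residual)
        correlations[support] = 0.0          # do not reselect atoms
        k = int(np.argmax(correlations))
        support.append(k)
        # least-squares fit on the enlarged support, then update the residual
        D_I = D[:, support]
        x_I, *_ = np.linalg.lstsq(D_I, y, rcond=None)
        residual = y - D_I @ x_I
    x[support] = x_I
    return x, support

# toy check: recover a 3-sparse representation
rng = np.random.default_rng(1)
D = rng.standard_normal((20, 50))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(50)
x_true[[3, 17, 40]] = [1.0, -2.0, 0.5]
x_hat, I = omp(D, D @ x_true, s=3)
print(sorted(I), np.allclose(x_hat, x_true, atol=1e-8))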

Illustration
[Figure: the first two OMP steps on an image patch x, using a small dictionary d_1, ..., d_5. Step 1: the correlations c_i = d_i^T x are computed for all atoms, the atom with the largest |c_i| (c_2 = 0.577) is selected, and the residual r = x − 0.577·d_2 is formed. Step 2: the correlations c_i = d_i^T r are recomputed on the residual and a second atom (d_5) is selected. After refitting the coefficients by least squares, the reconstructed patch is x̂ = 0.499·d_2 − 0.309·d_5, with error ||x − x̂||_2 = 0.759.]
Why convex relaxation?

Recall the convex relaxation

min_x ||x||_1
s.t. y = Dx

Convex optimization is nice: usually a unique solution and reliable algorithms
The 1-norm is the true norm "nearest" to the 0-norm
Why would the solution be sparse?

p-norms

p-norms are ||x||_p = (Σ_{i=1}^m |x_i|^p)^{1/p}

Shapes of |x_i|^p

[Figure: |x|^p for p = 2, 1, 0.5, 0.1. As p tends to zero, |x|^p approaches the indicator function, which is 0 for x = 0 and 1 elsewhere.]
1-norm relaxation
y = Dx is a hyperplane
Find smallest p-norm ball that still intersects the hyperplane
(is tangent, in fact)
[Figure: the intersection between the ℓ_p-ball and the solution set of y = Dx, shown in 3D for p = 2, 1.5, 1, and 0.7. When p ≤ 1, the intersection takes place at a corner of the ball, which is a sparse point.]
Lasso

Usually the sparse representation is not exact (noise)


Lasso problem

min_x ||y − Dx||_2^2 + λ||x||_1
Again convex optimization
λ > 0 is a sparsifying parameter
λ small ⇒ (almost) full solution
λ big ⇒ (too) sparse solution
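One simple way to solve the Lasso is iterative soft thresholding (ISTA); a minimal sketch under arbitrary choices of λ and problem size (not values from the slides):

import numpy as np

def soft_threshold(v, t):
    """Soft thresholding, the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(D, y, lam, n_iter=500):
    """Minimize ||y - Dx||_2^2 + lam*||x||_1 by ISTA (proximal gradient)."""
    L = 2 * np.linalg.norm(D, 2) ** 2      # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = 2 * D.T @ (D @ x - y)       # gradient of the quadratic term
        x = soft_threshold(x - grad / L, lam / L)
    return x

rng = np.random.default_rng(2)
D = rng.standard_normal((30, 80))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(80)
x_true[[5, 20, 60]] = [1.5, -1.0, 2.0]
y = D @ x_true + 0.01 * rng.standard_normal(30)   # noisy observations
x_hat = lasso_ista(D, y, lam=0.1)
print(np.flatnonzero(np.abs(x_hat) > 0.1))        # large coefficients ~ true support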

Contents

1 Why sparsity?

2 Sparse representations

3 Sparse representation algorithms

4 Dictionary learning

5 Applications

What kind of dictionaries?

Preset
Made from the rows of a classic transform
Random
Specially built, e.g. for incoherence

Learned
Learned from training signals for each specific application

Fixed vs learned

DCT vs learned dictionary

[Figure: (left) overcomplete DCT dictionary for 8×8 image patches; (right) sparse dictionary trained over the overcomplete DCT using Sparse K-SVD, with each atom represented by 6 coefficients.]
Why learned dictionaries?

Advantages
Maximize performance for the application at hand
Learning can be done in advance, before the application runs

Drawbacks
No structure, hence no fast algorithms
Learning dictionaries takes time and might be hard

Dictionary learning

Input
Data set Y ∈ R^{m×N}, the training signals
Sparsity level s

Dictionary learning problem


Solve
min_{D,X} ||Y − DX||_F^2
s.t. ||x_ℓ||_0 ≤ s, ℓ = 1 : N          (DL)
     ||d_j||_2 = 1, j = 1 : n

Representation of many signals

The matrix X has s nonzeros on each column

Y ≈ D · X

Representation of a signal

The problem can be solved separately for each signal

y_ℓ ≈ D · x_ℓ, for each column ℓ of Y and X

Short problem analysis

NP-hard due to the sparsity constraint


If sparsity pattern Ω is fixed, the problem is bi-quadratic, hence
still nonconvex
The problem is convex
in D , if X is fixed and normalization ignored
in X , if D and Ω are fixed
Difficulties
Many local minima, at least one for each Ω
Big size, many variables
Example: m = 64, n = 128, N = 10000, s = 6
D is a full 64 × 128 matrix ⇒ 8192 variables
X has 60,000 nonzeros out of 1,280,000 possible positions
Subproblems: sparse coding

Sparse coding (representation)


With fixed dictionary, compute sparse representations

min_X ||Y − DX||_F^2          (SC)
s.t. ||x_i||_0 ≤ s, 1 ≤ i ≤ N

Useful also after learning, when the designed dictionary is used in applications
Preferred algorithm: OMP, mostly due to its speed
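A possible way to sparse-code all columns of Y at once is scikit-learn's OMP solver; a sketch (the library choice and the sizes are mine, not prescribed by the slides):

import numpy as np
from sklearn.linear_model import orthogonal_mp

rng = np.random.default_rng(3)
m, n, N, s = 16, 40, 200, 4
D = rng.standard_normal((m, n))
D /= np.linalg.norm(D, axis=0)               # unit-norm atoms
Y = rng.standard_normal((m, N))              # training signals as columns

# X has at most s nonzeros per column; (SC) is solved greedily, column by column
X = orthogonal_mp(D, Y, n_nonzero_coefs=s)
print(X.shape, np.count_nonzero(X, axis=0).max())   # (n, N), at most s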

Subproblems: dictionary update

Dictionary update
With fixed sparsity pattern Ω

min_{D,(X)} ||Y − DX||_F^2          (DU)
s.t. X_{Ω^c} = 0

Atoms are always updated


Representation coefficients may be changed or not

Basic DL algorithm

Simple idea: alternate between sparse coding and dictionary update

Initial dictionary
random atoms
random selection of signals

Stopping criteria
Fixed number of iterations
Reason: usually, the error is not monotonically decreasing

Basic algorithm structure

Algorithm 1: DL by alternate optimization


Data: initial dictionary D ∈ R^{m×n}
      signal set Y ∈ R^{m×N}
      sparsity level s
Result: trained dictionary D, sparse representations X
1  for k = 1 to K do
2    Sparse coding: with fixed D, solve (SC) for X
3    Dictionary update: with fixed nonzero pattern Ω, solve (DU) for the new dictionary D and possibly a new X
4    Atom normalization (if not done in DU): d_j ← d_j / ||d_j||, j = 1 : n
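A bare-bones sketch of Algorithm 1 with OMP for the sparse coding step and a MOD-style least-squares dictionary update; MOD is just one possible choice for step 3 (K-SVD, for instance, is another), and all sizes below are illustrative:

import numpy as np
from sklearn.linear_model import orthogonal_mp

def dictionary_learning(Y, n_atoms, s, n_iter=20, seed=0):
    """Alternate sparse coding (OMP) and dictionary update (MOD-like least squares)."""
    rng = np.random.default_rng(seed)
    m, N = Y.shape
    # initialize the dictionary with randomly selected training signals
    D = Y[:, rng.choice(N, n_atoms, replace=False)].astype(float)
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        # sparse coding: s-sparse representation of every column of Y
        X = orthogonal_mp(D, Y, n_nonzero_coefs=s)
        # dictionary update: least-squares fit D = Y X^+ (nonzero pattern of X kept)
        D = Y @ np.linalg.pinv(X)
        D /= np.linalg.norm(D, axis=0) + 1e-12   # re-normalize the atoms
    return D, X

rng = np.random.default_rng(4)
Y = rng.standard_normal((16, 500))
D, X = dictionary_learning(Y, n_atoms=32, s=4)
print(np.linalg.norm(Y - D @ X, 'fro'))          # residual error after training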

Contents

1 Why sparsity?

2 Sparse representations

3 Sparse representation algorithms

4 Dictionary learning

5 Applications

Applications

Applications
denoising
compression
coding
classification
compressive sampling
. . . and many others
Main idea: sparse representation captures the essential
information in a signal (image)

Denoising
Sparse representation removes noise, which is not well represented by the (learned) atoms

[Figure: denoising example: source image, noisy image (PSNR = 22.1 dB), learned dictionary, denoised result (PSNR = 30.829 dB).]

[M. Elad, Springer 2010]
Compression
Sparse representation contains little information: the positions of the atoms and their coefficients
[Figure: facial image compression at 550 bytes per image, comparing the original with JPEG, JPEG 2000, PCA-based, and dictionary-based coding; the bottom rows give the RMSE values for three test images: 15.81 / 13.89 / 10.66 / 6.60, 14.67 / 12.41 / 9.44 / 5.49, and 15.30 / 12.57 / 10.27 / 6.36.]

[O. Bryt, M. Elad, 2008]

Compressive sampling (compressed sensing)

η: the initial signal, huge (imagine a many-megapixel camera)


If we store the whole signal ⇒ lots of data
Instead of compressing after acquisition, compress AT acquisition
F sparsifying transform (e.g., DCT): x = Fη
D overcomplete transform
Acquired signal: y = Dη = DF^T x, small size
The "computation" Dη is done in hardware in single-pixel cameras
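A small simulation of this encode/decode pipeline, with a random Gaussian D and an l1 decoder; the sizes, λ, and the use of SciPy/scikit-learn are my choices for illustration only:

import numpy as np
from scipy.fft import idct
from sklearn.linear_model import Lasso

rng = np.random.default_rng(5)
n, m = 256, 64                                   # signal length, number of measurements

# a signal that is sparse in the DCT domain: x = F*eta has only 3 nonzeros
x = np.zeros(n)
x[[3, 40, 100]] = [2.0, -1.5, 1.0]
eta = idct(x, norm='ortho')                      # eta = F^T x (inverse DCT)

# acquisition: y = D*eta = D F^T x, with a random Gaussian D (m << n)
D = rng.standard_normal((m, n)) / np.sqrt(m)
y = D @ eta

# decoding: sparse x from y = (D F^T) x by l1 minimization, then eta = F^T x
A = D @ idct(np.eye(n), axis=0, norm='ortho')    # A = D F^T
x_hat = Lasso(alpha=1e-4, fit_intercept=False, max_iter=50000).fit(A, y).coef_
eta_hat = idct(x_hat, norm='ortho')
print(np.linalg.norm(eta - eta_hat) / np.linalg.norm(eta))   # relative error (should be small)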

Decoder

Decoding: first compute the sparse x by solving the system y = DF^T x,
then recover η = F^T x via the inverse transform
Dumb encoder, smart decoder
Robustness: all elements of y have equal value, unlike those of x
Beats the Nyquist bound

Very short bibliography

A.M. Bruckstein, D.L. Donoho, and M. Elad. “From Sparse Solutions of Systems of Equations to Sparse Modeling of Signals and Images”. In: SIAM Rev. 51.1 (2009), pp. 34–81

R. Rubinstein, A.M. Bruckstein, and M. Elad. “Dictionaries for Sparse Representation Modeling”. In: Proc. IEEE 98.6 (June 2010), pp. 1045–1057

M.F. Duarte et al. “Single-Pixel Imaging via Compressive Sampling”. In: IEEE Signal Proc. Mag. (Mar. 2008), pp. 83–91

