
Sparse & Redundant Representations

and Their Applications in


Signal and Image Processing
Overview of this Field and this Course

Michael Elad
The Computer Science Department
The Technion – Israel Institute of Technology
Haifa 32000, Israel
What This Field is All About?

Take 1: A New Transform



The Answer is Not Trivial

It depends on whom you ask, as the researchers
in this field come from various disciplines:
• Mathematics
• Applied Mathematics
• Statistics
• Signal & Image Processing: CS, EE, Bio-medical, …
• Computer-Science Theory
• Machine-Learning
• Physics (optics)
• Geo-Physics
• Physics - Astronomy
• Psychology (neuroscience)
• …



My Answer (For Now)

A New Transform for Signals


• We are all well aware of the idea of transforming
a signal and changing its representation
• We apply a transform to gain something –
efficiency, simplicity of the processing, speed, …
• There are many known transforms: Fourier, DCT,
Hadamard, Wavelets and their descendants, …
• Our message: there is a new transform in town,
based on sparse and redundant representations



Transforms – The General Picture

[Diagram: the classical transform picture – an invertible matrix
$D \in \mathbb{R}^{n \times n}$ ties the signal to its representation: $x = D\alpha$]

Such transforms are typically linear and invertible, and often
also unitary, separable, and structured


Redundancy?

• In a redundant transform, the representation vector is
longer than the signal: $\alpha \in \mathbb{R}^m$ with $m > n$, and
$D \in \mathbb{R}^{n \times m}$
• This can still be done while preserving the linearity of
the transform: the inverse transform is $x = D\alpha$, and the
forward transform is $\alpha = D^{\dagger} x$, where $D D^{\dagger} = I$, so that
$D\alpha = D D^{\dagger} x = x$



Sparse & Redundant Representation

[Diagram: $x = D\alpha$ with a redundant $D \in \mathbb{R}^{n \times m}$]

• We shall keep the linearity of the inverse transform
• As for the forward transform (computing $\alpha$ from x),
there are infinitely many possible solutions
• We shall seek the sparsest of all solutions – the one
with the fewest non-zeros



Sparse & Redundant Representation

[Diagram: $x = D\alpha$ with a redundant $D \in \mathbb{R}^{n \times m}$]

• This makes the forward transform a highly non-linear
operation
• The field of sparse and redundant representations is all
about defining this transform clearly, solving various
theoretical and numerical issues related to it, and
showing how to use it in practice



Sparse & Redundant Representation

Sounds great!?
No, in fact, it sounds boring!!!
Who cares about a new transform?



What This Field is All About?

Take 2: Modeling Data



Let's Take a Wider Perspective

[Images: voice signal, still image, stock market, CT, radar
imaging, heart signal, traffic information]

• We are surrounded by various sources of massive
information of different nature
• All these sources have some internal structure, which
can be exploited
Model?

Effective removal of noise (and many other applications)
relies on proper modeling of the signal



Which Model to Choose?

• There are many different ways to mathematically model
signals and images, with varying degrees of success
• The following is a partial list of such models (for images):
Principal-Component-Analysis, Anisotropic diffusion,
Markov Random Field, Wiener Filtering, DCT and JPEG,
Wavelet & JPEG-2000, Piece-Wise-Smooth, C2-smoothness,
Besov-Spaces, Total-Variation, Beltrami-Flow
• Good models should be simple while matching the
signals: a trade-off between Simplicity and Reliability



An Example: JPEG and DCT
[Image series: raw data at 178KB, and JPEG compressions at
24KB, 20KB, 12KB, 8KB, and 4KB]

How and why does it work? JPEG applies the Discrete Cosine
Transform (DCT) to each image block

The model assumption: after the DCT, the top-left
coefficients are dominant and the rest are (nearly) zero
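To make the model assumption concrete, here is a minimal sketch
(with a random stand-in patch and an arbitrary 3×3 kept block)
of a DCT-domain approximation using scipy:

```python
# A minimal sketch of the JPEG/DCT model assumption: transform an 8x8 patch,
# keep only a top-left (low-frequency) block of coefficients, reconstruct.
# The patch is random stand-in data; the kept block size (3x3) is arbitrary.
import numpy as np
from scipy.fft import dctn, idctn

patch = np.random.rand(8, 8)
coeffs = dctn(patch, norm="ortho")
mask = np.zeros((8, 8))
mask[:3, :3] = 1                      # keep only the top-left coefficients
approx = idctn(coeffs * mask, norm="ortho")
print("relative error:", np.linalg.norm(patch - approx) / np.linalg.norm(patch))
```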



Research in Signal/Image Processing

[Diagram: Problem (Application) → Model → Signal → Numerical Scheme]

The fields of signal & image processing are essentially built
of an evolution of models and ways to use them for various
tasks – and this is how a new research work (and paper) is born



Again: What This Field is All About?

A Data Model and its Use

• Almost every task in data processing requires a model –
true for denoising, deblurring, super-resolution,
inpainting, compression, anomaly-detection, sampling,
and more
• There is a new model in town – sparse and redundant
representation – we will call it Sparseland
• We will be interested in a flexible model that can adjust
to the signal



A New Emerging Model

[Diagram: Sparseland at the intersection of many fields –
Signal Processing, Machine Learning, Mathematics, Wavelet
Theory, Approximation Theory, Multi-Scale Analysis, Linear
Algebra, Signal Transforms, Optimization Theory – and feeding
many applications: Compression, Inpainting, Blind Source
Separation, Super-Resolution, Denoising, Demosaicing]



A Closer Look at the

Sparseland Model



The Sparseland Model

• Task: model image patches of size 8×8 pixels
• We assume that a dictionary of such image patches is
given, containing 256 atom images

[Illustration: a patch written as a weighted sum Σ of a few
atoms α₁, α₂, α₃ from the dictionary]

• The Sparseland model assumption: every image patch can
be described as a linear combination of a few atoms



The Sparseland Model

Properties of this model: Sparsity and Redundancy

• We start with an 8-by-8 pixel patch and represent it
using 256 numbers
– This is a redundant representation
• However, out of those 256 elements in the
representation, only 3 are non-zero
– This is a sparse representation
• Bottom line: the 64 numbers representing the patch are
replaced by 6 (3 for the indices of the non-zeros, and 3
for their entries)
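As an illustration of these numbers (64-dimensional patches, 256
atoms, 3 non-zeros), here is a toy sketch of generating a
Sparseland patch; the random dictionary is a stand-in for a
learned one:

```python
# Toy Sparseland generator with the slide's numbers: 8x8 patches (64-dim),
# a 64x256 dictionary, and 3 active atoms. The random dictionary is a
# stand-in for a learned one.
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)              # normalize the atoms
alpha = np.zeros(256)
support = rng.choice(256, size=3, replace=False)
alpha[support] = rng.standard_normal(3)     # the sparse representation
x = D @ alpha                               # the resulting patch
print("patch built from", np.count_nonzero(alpha), "atoms")
```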



Chemistry of Data

We could refer to the Sparseland model as the
chemistry of information:
• Our dictionary stands for the Periodic Table,
containing all the elements
• Our model follows a similar rationale: every molecule is
built of a few elements



Model vs. Transform?

• The relation between the signal x and its representation
$\alpha$ is the linear system $x = D\alpha$, just as described earlier
• We shall be interested in seeking sparse solutions to this
system when deploying the sparse and redundant
representation model
• This is EXACTLY the transform we discussed earlier

Bottom line: the transform and the model we described
above are the same thing, and their impact on
signal/image processing is profound and worth studying



Difficulties With Sparseland

• Problem 1: Given an image patch, how can we find
its atom decomposition?
• A simple example:
o There are 2000 atoms in the dictionary
o The signal is known to be built of 15 atoms
o The number of possible supports is
$\binom{2000}{15} \approx 2.4 \times 10^{37}$
• If each test takes 1 nano-second, the whole sweep will
take ~7.5e20 years to finish !!!!!!
• Solution: approximation algorithms
Difficulties With Sparseland

• Various approximation algorithms exist. Their theoretical
analysis guarantees their success if the solution is
sparse enough
• Here is an example – the Iterative Reweighted LS (IRLS):

[Plot: IRLS at iteration 6, recovering a sparse solution
over 2000 entries]
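The following is a minimal IRLS sketch in the spirit of the above
(illustrative sizes and schedule, not the exact algorithm or
parameters behind the plotted experiment):

```python
# A minimal IRLS sketch for approximating the sparsest solution of Ax = b
# (illustrative sizes and annealing schedule, not the slide's experiment).
import numpy as np

def irls(A, b, iters=50, eps=1e-2):
    x = np.linalg.pinv(A) @ b                   # least-squares initialization
    for _ in range(iters):
        W = np.diag(np.abs(x) + eps)            # reweighting matrix
        x = W @ A.T @ np.linalg.solve(A @ W @ A.T, b)
        eps = max(0.5 * eps, 1e-9)              # anneal the smoothing
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 100))
x0 = np.zeros(100)
x0[rng.choice(100, 4, replace=False)] = rng.standard_normal(4)
x_hat = irls(A, A @ x0)
print("recovery error:", np.linalg.norm(x_hat - x0))
```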



Difficulties With Sparseland

• Problem 2: Given a family of signals, how do we
find the dictionary to represent it well?
• Solution: Learn! Gather a large set of signals
(many thousands), and find the dictionary that
sparsifies them
• Such algorithms were developed in the past decade
(e.g., K-SVD) and their performance is surprisingly good
• This is only the beginning of a new era in signal
processing …



Difficulties With Sparseland
• Problem 3: Is this model flexible enough to
describe various sources? E.g., is it good for
images? Audio? Stocks? …
• General answer: Yes, this model is extremely effective
in representing various sources
o Theoretical answer: yet to be given
o Empirical answer: we will see in this course several
image processing applications where this model leads
to the best known results (benchmark tests)



Difficulties With Sparseland?

• Problem 1: Given an image patch, how can we
find its atom decomposition?
• Problem 2: Given a family of signals, how do we
find the dictionary to represent it well?
• Problem 3: Is this model flexible enough to describe
various sources? E.g., is it good for images? Audio? …





Who Works on This
and Who Are We?



Who is Working in This Field?

Donoho, Candès – Stanford
Goyal – MIT
Tropp – CalTech
Mallat – Ecole-Polytec. Paris
Baraniuk, W. Yin – Rice, Texas
Nowak, Willett – Wisconsin
Gilbert, Vershynin, Plan – U-Michigan
Coifman – Yale
Gribonval, Fuchs – INRIA, France
Romberg – GaTech
Starck – CEA, France
Lustig, Wainwright – Berkeley
Frossard, Cevher – EPFL, Switzerland
Sapiro, Daubechies – Duke
Rao, Delgado – UC San-Diego
Friedlander – UBC, Canada
Do, Ma – U-Illinois
Lei Zhang – Hong-Kong PolyU
Davies – Edinburgh, UK
Cohen, Combettes – Paris VI
Elad, Zibulevsky, Bruckstein, Eldar, Segev, Bronstein, Mendelson – Technion



David L. Donoho

• An extremely talented mathematician and statistician
from the Stanford Statistics Department
• He is among the few who founded this field of sparse
and redundant representations and its spin-off topic of
compressed sensing
• In 2013 he won the Shaw Prize (“the Nobel of the East”)



The Course’s Team: Leading Instructor

Prof. Michael Elad, Computer-Science Department, The Technion

• M. Elad published more than 70 journal papers and a
book in the field of sparse & redundant representations,
spanning theory, algorithms, and applications
• In the past 10 years, Michael Elad has been teaching a
course at the Technion, similar to the one offered here
The Course’s Team: Teaching Assistant

Mr. Yaniv Romano, Electrical Engineering Department, The Technion

• Yaniv is a PhD student, supervised by M. Elad
• He has so far published 8 journal papers in the field of
sparse & redundant representations



This Field is Rapidly Growing …

• Searching ISI-Web-of-Science (December 26th, 2016) with
Topic = ((spars* and (represent* or approx* or solution or
estimation) and (dictionary or pursuit or convex)) or
(compres* and sens* and spars*))
led to 6354 papers (it was ~4000 papers a year ago)
• [Chart: how these papers spread over time
(~146178 citations)]



Which Countries?

[Chart: distribution of these papers by country]



Who is Publishing in This Area?

[Chart: leading authors ranked by number of citations. A group
of authors from the Technion, Israel, is marked, and the list
distinguishes many different authors named Zhang (Lei, Li, Long,
Lan, Liang, Liao, Lin, Lu) from different institutions – Xidian,
HK Polytech, Tsinghua, Xi'an Jiaotong, Chongqing, Beijing, and
Capital Medical]



Books in this field

• We already mentioned the book authored by Michael Elad
(2010). It will serve as the textbook for this course
• This is the first published comprehensive book in this
field, serving as a textbook for similar courses worldwide
• In the past several years, many other books followed …





Several Examples:

Applications Leveraging
this Model



Image Separation [Starck, Elad, & Donoho (`04)]

[Image panels: Galaxy SBS 0335-052 as photographed by Gemini –
the original image; the cartoon part, spanned by wavelets;
the texture part, spanned by a global DCT; and the residual,
being additive noise]



Inpainting [Starck, Elad, and Donoho (‘05)]

Source Outcome



Image Denoising (Gray) [Elad & Aharon (`06)]

[Images: source; noisy image (σ = 20); result (30.829dB);
the initial dictionary (overcomplete DCT, 64×256); and the
dictionary obtained after 10 iterations]



Denoising (Color) [Mairal, Elad & Sapiro (‘06)]

Original Noisy (12.77dB) Result (29.87dB)



Denoising (Color) [Mairal, Elad & Sapiro (‘06)]

Original Noisy (12.77dB) Result (29.87dB)


Original Noisy (20.43dB) Result (30.75dB)



Poisson Denoising [Giryes & Elad (‘14)]

Original Noisy (peak=0.2) Result (PSNR=24.16dB)

Original Noisy (peak=2) Result (PSNR=24.76dB)



Deblurring [Elad, Zibulevsky and Matalon (‘07)]

Original (left), measured (middle), and restored (right):
Iteration: 19, ISNR = 7.0322 dB



Blind Deblurring [Shao and Elad (‘14)]



Inpainting (Again!) [Mairal, Elad & Sapiro (‘06)]

Original 80% missing Result



Inpainting (Again!) [Mairal, Elad & Sapiro (‘06)]



Inpainting (Again!) [Mairal, Elad & Sapiro (‘06)]

Original 80% missing Result



Facial Image Compression [Brytt and Elad (`07)]

[Image panels: original faces and their compressed versions.
Results for 550 bytes per file, per method:
JPEG – 15.81, 13.89, 6.60
JPEG-2000 – 14.67, 12.41, 5.49
Sparse-Based – 15.30, 12.57, 6.36]
Facial Image Compression [Brytt and Elad (`07)]

[Image panels: original faces and their compressed versions.
Results for 400 bytes per file, per method:
JPEG – ?, 18.62, 7.61
JPEG-2000 – ?, 16.12, 6.31
Sparse-Based – ?, 16.81, 7.20]



Super-Resolution [Zeyde, Protter & Elad (‘09)]

[Images: the ideal image; the given (low-resolution) image;
bicubic interpolation (PSNR = 14.68dB); and the SR result
(PSNR = 16.95dB)]



Super-Resolution [Zeyde, Protter & Elad (‘09)]

The Original Bicubic Interpolation SR result



To Summarize

Which model to choose? An effective (yet simple) model for
signals/images is key in getting better algorithms for
various applications. Sparse and redundant representations
and other example-based modeling methods are drawing
considerable attention in recent years.

Are they working well? Yes, these methods have been deployed
in a series of applications, leading to state-of-the-art
results. In parallel, theoretical results provide the
backbone for these algorithms’ stability and good performance.



This Course:

Scope and Style



This Course

Will review 15 years of tremendous progress in the field of

Sparse and Redundant Representations

covering theory, numerical problems, and applications
(image processing)



This Course Structure

• We chose to break this course into two separate parts,
each of which is an independent five-week course:
o Part 1: Introduction to the Fundamentals of
Sparse Representations
o Part 2: Sparse Representations – From
Theory to Practice
• While we recommend that you take both these courses,
each of them can be taken independently of the other



This Course Grading

Part 1: Introduction to the Fundamentals of Sparse
Representations
Part 2: Sparse Representations – From Theory to Practice

• Each of these two parts includes
o Knowledge-check questions and discussions
o A series of quizzes, &
o Matlab programming projects
• Each course will be graded separately, using the average
grades of the questions/discussions [K], quizzes [Q], and
projects [P], by
Final-Grade = 0.1K + 0.5Q + 0.4P
Good Luck!
We hope you will enjoy this course



Sparse & Redundant Representations
and Their Applications in
Signal and Image Processing
Mathematical Warm-Up

Michael Elad
The Computer Science Department
The Technion – Israel Institute of Technology
Haifa 32000, Israel
Underdetermined Linear
System of Equations
& Regularization
Underdetermined Linear System

• Our story begins with a classic problem in linear algebra:
we are given a linear system of equations of the form
$Ax = b$ with n equations and $m > n$ unknowns, where A
is full-rank
• Clearly this system has infinitely many solutions
• Oftentimes in engineering we encounter problems
admitting this form …
• Our Problem: How could we suggest a meaningful single
solution in such a case?

[Diagram: $A \in \mathbb{R}^{n \times m}$, $x \in \mathbb{R}^m$, $b \in \mathbb{R}^n$, $Ax = b$]



Underdetermined Linear System

• Here is a practical example: x is an image of size
100×100 pixels, and b is a 4:1 subsampled version of it
• Solving this linear system amounts to an attempt to
obtain x from b, a problem known as interpolation or
Single-Image Super-Resolution
• Clearly, in such problems we desire a single solution
among the infinitely many that exist



The Answer: Regularization

• If we aim to get a unique solution to our problem,
we should “rank” all the possible solutions based on
a chosen quality measure or penalty, and then choose
the best one
• We define this by:
$\min_x J(x) \ \text{s.t.}\ Ax = b$
• J(x) is the regularization term – it assigns a penalty
value to each possible solution. Among all possible
solutions satisfying the linear system, we seek the
“cheapest” one (the lowest level-set of J(x) that touches
the solution set)



Sounds Easy But …

Who is J(x)?
• Clearly, everything depends on the choice of this
function, so what is it? How should we choose it?
• The answer to this question depends on your needs,
beliefs, mathematical proficiency, your field, and more
• The idea of regularization is central in the sciences, and
it has been used in many fields and in many ways
• Why is it called REGULARIZATION? Because it takes a
problem with a severe singularity (a.k.a. an ill-posed
problem) and turns it into a “regular” one



And the Most Popular Regularization is …

L2-based:
$J(x) = \frac{1}{2}\|Bx\|_2^2$ for some chosen matrix B

Why is it so popular? Because of several critical reasons:
o This choice is easy to mathematically analyze
o This choice leads to a closed-form solution
o It leads to a unique solution under clear conditions



Let’s Have a Closer Look

$\min_x \frac{1}{2}\|Bx\|_2^2 \ \text{s.t.}\ Ax = b$

Let’s derive the closed-form solution to this problem:
o First step: define the Lagrangian that takes care of the
constraint:
$L(x, \lambda) = \frac{1}{2}\|Bx\|_2^2 + \lambda^T (Ax - b)$
o Second step: take the gradient of L w.r.t. x and null it:
$\nabla_x L = B^T B x + A^T \lambda = 0 \ \Rightarrow\ x = -(B^T B)^{-1} A^T \lambda$
o Third step: plug the outcome into the constraint:
$Ax = -A (B^T B)^{-1} A^T \lambda = b \ \Rightarrow\ \lambda = -\left(A (B^T B)^{-1} A^T\right)^{-1} b$
$\Rightarrow\ x = (B^T B)^{-1} A^T \left(A (B^T B)^{-1} A^T\right)^{-1} b$



L2 Regularization – Closed-Form

$\min_x \frac{1}{2}\|Bx\|_2^2 \ \text{s.t.}\ Ax = b$

Theorem: Under some conditions, this problem admits a
unique solution, given by
$x_{opt} = (B^T B)^{-1} A^T \left(A (B^T B)^{-1} A^T\right)^{-1} b$

• As promised, this problem is easy to analyze, it has a
closed-form solution, and this solution is unique under a
simple condition on B and A
• This explains the great popularity of this approach in the
sciences in general, and in image processing in particular



The Temptation
of Convexity
Is There Life Beyond L2?

• We have seen that the L2 regularization is pleasant to
work with, but … this does not mean that it serves our
needs
• Indeed, in image processing, the Wiener filter that
emerges from the L2 regularization strategy is known to
perform very poorly
• So, the question that comes to mind is: what else is out
there that could be used, while preserving the good
properties that the L2 term had?
• The answer: convex functions J(x). Why? Because (strict)
convexity of J(x) guarantees a unique solution, which we
so much desire
Recall: Convex Sets

Definition: A set $\Omega$ is convex if $\forall\, x_1, x_2 \in \Omega$ and
$\forall\, t \in [0,1]$, the convex combination
$x = t x_1 + (1-t) x_2$
is also in $\Omega$

[Illustration: a convex set vs. a non-convex set]



Convex Sets: An Example

• Consider the set of all the solutions of a given linear
system: $\Omega = \{x \mid Ax = b\}$
• This set is convex, since $\forall\, x_1, x_2 \in \Omega$ and $t \in [0,1]$
we get (using $A x_1 = b$ and $A x_2 = b$)
$A\left(t x_1 + (1-t) x_2\right) = t A x_1 + (1-t) A x_2 = t b + (1-t) b = b$
• Observe that the above is correct even if t is outside the
interval [0,1]. What does that mean? It means that $\Omega$ is
not merely convex but an affine subspace



Recall: Convex Functions

Definition: A function $f(x): \mathbb{R}^m \to \mathbb{R}$ is convex if $\forall\, x_1, x_2 \in \mathbb{R}^m$
and $\forall\, t \in [0,1]$, the convex combination
$x = t x_1 + (1-t) x_2$
satisfies
$f\left(t x_1 + (1-t) x_2\right) \le t f(x_1) + (1-t) f(x_2)$

Here is an alternative definition that ties convex
functions to convex sets:

Definition: A function $f(x): \mathbb{R}^m \to \mathbb{R}$ is convex if its epigraph
$\{(x, y) \mid y \ge f(x)\}$
is a convex set



Convex Functions: Illustration

tf  x 1   (1  t)f  x 2 
f x
Epigraph

x1 x x2

f  t x 1  (1  t)x 2   tf  x 1   (1  t)f  x 2 

Convex function

x1 x x2

Non-Convex function



Convexity via Derivatives

If f(x) is twice continuously differentiable, we can tie
convexity to the function’s derivatives:

Theorem: A function $f(x): \mathbb{R}^m \to \mathbb{R}$ is convex iff $\forall\, x_1, x_2$
$f(x_2) \ge f(x_1) + \nabla f(x_1)^T (x_2 - x_1)$

i.e., the tangent planes bound the function from below

[Illustration: the tangent at $x_1$ stays below a convex
function; for a non-convex function it crosses the graph]



Theorem: A function $f(x): \mathbb{R}^m \to \mathbb{R}$ (twice continuously
differentiable) is convex iff $\forall\, x \in \mathbb{R}^m$
$\nabla^2 f(x) \succeq 0$
i.e., the Hessian is positive semi-definite (PSD)
everywhere in $\mathbb{R}^m$



Convexity vs. Strict Convexity

In all the definitions for convexity, if we do not allow
equality, the convexity becomes strict:
$f\left(t x_1 + (1-t) x_2\right) < t f(x_1) + (1-t) f(x_2)$
$f(x_2) > f(x_1) + \nabla f(x_1)^T (x_2 - x_1)$
$\nabla^2 f(x) \succ 0$

[Illustration: piecewise-linear and flat-bottomed functions
that are convex but not strictly so]



Convexity vs. Strict Convexity

Example:
Is the following function (strictly) convex?
$J(x) = \frac{1}{2}\|Bx\|_2^2$

Answer:
The first derivative of this expression is given by
$\nabla J(x) = B^T B x$
and the second derivative is $\nabla^2 J(x) = B^T B$

If B has full column-rank, then $B^T B$ is positive definite,
and thus this function is strictly convex

Question: What about the function $J(x) = \|Bx\|_2$?
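A small numerical check of this argument (random matrices as
stand-ins): the Hessian $B^T B$ is positive definite exactly
when B has full column rank:

```python
# Hessian check for J(x) = (1/2)||Bx||_2^2: the Hessian is B^T B, positive
# definite iff B has full column rank (random matrices as stand-ins).
import numpy as np

rng = np.random.default_rng(3)
B_tall = rng.standard_normal((10, 4))   # full column rank (w.p. 1): PD
B_flat = rng.standard_normal((3, 4))    # rank-deficient: only PSD
for B in (B_tall, B_flat):
    print("min eigenvalue of B^T B:", np.linalg.eigvalsh(B.T @ B).min())
```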
And Now to the Punch-Line ...

Theorem: Consider the following constrained problem:
$\min_x f(x) \ \text{s.t.}\ x \in \Omega$
where $f(x): \mathbb{R}^m \to \mathbb{R}$ is a convex function and $\Omega$ is
a convex set. Then:
o Every local minimum point is a global one
o The set of optimal solutions forms a convex set
o If f(x) is strictly convex, the solution is unique

• Returning to the problem $\min_x J(x) \ \text{s.t.}\ Ax = b$,
if J(x) is strictly convex, we get a unique solution,
as wished
• The L2 case is simply a special case of this broader
property
So, Which J(x) to Choose?

There are many options, and we’ll mention one specific &
popular family of such functions:

• The Lp-norm ($p \ge 1$): $\|x\|_p^p = \sum_k |x_k|^p$
• Special cases of interest are
L1: $\|x\|_1 = \sum_k |x_k|$
L∞: $\|x\|_\infty = \max_k |x_k|$

Note 1: L1 is convex but not strictly so
Note 2: For all $1 < p < \infty$, the norm function is convex but
not strictly so. Its p-th power becomes strictly convex
(just like the L2 case)
Note 3: For $p < 1$, the above is not a convex function, and
therefore it is not a formal norm [see next slide]



Norms vs. Convexity

• A norm is a function $\|x\|: \mathbb{R}^m \to \mathbb{R}$ with the properties:
o Non-negativity: $\forall x,\ \|x\| \ge 0$, and $\|x\| = 0 \Leftrightarrow x = 0$
o Homogeneity: $\forall x\ \&\ \forall c,\ \|cx\| = |c| \cdot \|x\|$
o Triangle inequality: $\forall x, y,\ \|x + y\| \le \|x\| + \|y\|$
• For $0 \le t \le 1$ and for any pair of vectors x and y, the
triangle inequality and homogeneity give
$\|t x + (1-t) y\| \le \|t x\| + \|(1-t) y\| = t\|x\| + (1-t)\|y\|$
so any valid norm function is necessarily convex
• Thus $\|x\|_p^p = \sum_k |x_k|^p$ with $p < 1$, being NOT CONVEX,
is not a valid norm


A Closer Look at
the L1 Minimization
Our Goal

Better understanding of the (P1) problem:
$(P_1)\ \min_x \|x\|_1 \ \text{s.t.}\ Ax = b$

Why are we so interested in this choice of J(x)?
Because, as we are about to see, it favors
SPARSE SOLUTIONS



(P1) Solutions form a Convex Set

• Let’s start with a relatively simple-to-prove property
of (P1):

Theorem: The set of optimal solutions of (P1),
S, forms a bounded and convex set

• Recall: L1 is not strictly convex, and thus a multitude
of solutions may exist
• Example: $\min_{\alpha,\beta} |\alpha| + |\beta| \ \text{s.t.}\ \alpha + \beta = 2$
(i.e., $A = [1\ 1]$, $b = [2]$) – every point on the segment
between $[2, 0]^T$ and $[0, 2]^T$ is optimal
• Our goal here is to say that the situation in this case is
not so grave, since all the solutions are “concentrated”



Proof – Part 1: Convexity

• We start with x₁ & x₂ in S, thus
$\|x_1\|_1 = \|x_2\|_1 = J_{min}$ and $A x_1 = A x_2 = b$
• Consider a convex ($0 \le t \le 1$) combination of x₁ and x₂:
$x^* = t x_1 + (1-t) x_2$
o It satisfies the constraint:
$A x^* = A\left(t x_1 + (1-t) x_2\right) = b$
o Due to the convexity of L1,
$\|x^*\|_1 = \|t x_1 + (1-t) x_2\|_1 \le t \|x_1\|_1 + (1-t) \|x_2\|_1 = J_{min}$
o Due to the optimality of x₁ and x₂: $\|x^*\|_1 \ge J_{min}$
$\Rightarrow\ x^* = t x_1 + (1-t) x_2$ is in the optimal solution set



Proof – Part 2: Bounded Set

• If x₁ and x₂ are any two optimal solutions in S, then
$\|x_1\|_1 = \|x_2\|_1 = J_{min}$
• Using the triangle inequality we get
$\|x_1 - x_2\|_1 \le \|x_1\|_1 + \|x_2\|_1 = 2 J_{min}$

The L1-diameter of the set S is at most $2 J_{min}$

To summarize: S – the set of all optimal solutions of
(P1) – is bounded and convex



Tendency to Sparse Solutions

Theorem: The set of optimal solutions of (P1), S, must
contain a solution with at most n non-zeros

Example: $\min_{\alpha,\beta} |\alpha| + |\beta| \ \text{s.t.}\ \alpha + \beta = 2$
(i.e., $A = [1\ 1]$, $b = [2]$, n = 1): among the segment of
optimal solutions, the endpoints $[2, 0]^T$ and $[0, 2]^T$ are
sparse, with a single non-zero each



Proof: (1)

• Assume that $x^* \in S$, with $k > n$ non-zeros
• The term $A x^*$ linearly combines k columns from A to
form b
• Since $k > n$, clearly these columns are linearly
dependent, so (assuming w.l.o.g. that these are the
first k columns in A)
$\exists\, h \ne 0 \mid A h = 0$ and $\text{supp}(h) \subseteq \text{supp}(x^*)$



Proof: (2)

• Consider the vector $x^* + \varepsilon h$ for a very small $\varepsilon$
o Observation 1: $x^* + \varepsilon h$ is a feasible solution:
$A(x^* + \varepsilon h) = A x^* + \varepsilon A h = b$
o Observation 2: For small enough values of $\varepsilon$,
$|\varepsilon| \le \min_{1 \le j \le k} \frac{|x_j^*|}{|h_j|}$,
the signs are preserved: $\text{sign}(x^* + \varepsilon h) = \text{sign}(x^*)$
o Observation 3: For these $\varepsilon$ we have
$\|x^* + \varepsilon h\|_1 = \sum_{j=1}^{k} \text{sign}(x_j^*)\,(x_j^* + \varepsilon h_j)
= \sum_{j=1}^{k} \text{sign}(x_j^*)\, x_j^* + \varepsilon \sum_{j=1}^{k} \text{sign}(x_j^*)\, h_j
= \|x^*\|_1 + \varepsilon\, h^T \text{sign}(x^*)$



Proof: (3)

$\|x^* + \varepsilon h\|_1 = \|x^*\|_1 + \varepsilon\, h^T \text{sign}(x^*)$

o If $h^T \text{sign}(x^*) > 0$: $\|x^* + \varepsilon h\|_1 < \|x^*\|_1$ for $\varepsilon < 0$
o If $h^T \text{sign}(x^*) < 0$: $\|x^* + \varepsilon h\|_1 < \|x^*\|_1$ for $\varepsilon > 0$

Both these cases stand in complete disagreement with the
optimality of x*. Could this term be 0? It must be:
$h^T \text{sign}(x^*) = 0 \ \Rightarrow\ \|x^*\|_1 = \|x^* + \varepsilon h\|_1$



Proof: (4)

Conclusions:
• $x^* + \varepsilon h$ is in S as well (just as optimal as x*)
• We can choose $\varepsilon$ to null one entry in $x^* + \varepsilon h$, getting
a new optimal solution with k−1 non-zeros
• We can proceed this way and null elements in the optimal
solution until we get n non-zeros, as claimed

We cannot guarantee going below n, since then the linear
dependence of the columns in A is not given



Disclaimer

We have just seen that (P1) favors sparse solutions:
$(P_1)\ \min_x \|x\|_1 \ \text{s.t.}\ Ax = b$
namely, a solution with at most $n < m$ non-zeros

While this course is indeed dealing with
SPARSE REPRESENTATIONS
the above is not the sparsity we are aiming for, as we will
be interested in (much) fewer non-zeros in x



Conversion of (P1) to
Linear Programming
Solving (P1)

• We now turn to present a useful connection between our
problem, (P1), and Linear Programming (LP)
• General LP structure:
$(LP)\ \min_x c^T x \ \text{s.t.}\ Ax = b\ \&\ x \ge 0$
• This connection is valuable since there are many
effective LP solvers available, and with this connection
they become relevant for solving our task:
$(P_1)\ \min_x \|x\|_1 \ \text{s.t.}\ Ax = b$
• We shall build this (P1)-LP connection by construction
Splitting the Unknown

• Starting with (P1), we propose to split the unknown x
into its positive and negative entries:
$x = u - v$, where $x, u, v \in \mathbb{R}^m$ and $u \ge 0\ \&\ v \ge 0$
• Example: for $x = [1, 3, 0, -4, -1, 0, 0, 2]^T$ we get
$u = [1, 3, 0, 0, 0, 0, 0, 2]^T$ and $v = [0, 0, 0, 4, 1, 0, 0, 0]^T$
o All the positive entries go to u
o All the negative ones go (negated) to v
o Both vectors are non-negative
o As a by-product of this split, since their supports do
not overlap, we get $u^T v = 0$



Turning to Use u and v

• Let’s plug this split into the (P1) problem:
$(P_1)\ \min_x \|x\|_1 \ \text{s.t.}\ Ax = b$
o The constraint becomes
$b = Ax = A(u - v) = [A, -A]\begin{bmatrix} u \\ v \end{bmatrix}$
o The objective becomes
$\|x\|_1 = \sum_{k=1}^{m} |x_k| = \sum_{k=1}^{m} u_k + \sum_{k=1}^{m} v_k = [\mathbf{1}^T, \mathbf{1}^T]\begin{bmatrix} u \\ v \end{bmatrix}$
o And we add the constraint $\begin{bmatrix} u \\ v \end{bmatrix} \ge 0$
• So, it appears that we could replace x with its new
description and modify (P1) to an equivalent form



From (P1) to LP

• Plugging this split into the (P1) problem gives
$(LP)\ \min_{u,v} [\mathbf{1}^T, \mathbf{1}^T]\begin{bmatrix} u \\ v \end{bmatrix} \ \text{s.t.}\ b = [A, -A]\begin{bmatrix} u \\ v \end{bmatrix}\ \&\ \begin{bmatrix} u \\ v \end{bmatrix} \ge 0$
• Looks like an LP problem, no?
• One problem, though: what about the condition $u^T v = 0$?
Inserting it into the above problem ruins the LP structure!



Is there a Problem?

Theorem: The constraint $u^T v = 0$ is a passive one, and thus
can be omitted from the LP formulation without harm

Proof:
o Assume that the optimal solution violates $u^T v = 0$:
there exists a non-zero entry in the same location in u
and v. Assume that it is $u_1 v_1 > 0$ (e.g., representing
$x_1 = 1$ by $u_1 = 3$ and $v_1 = 2$)
o Replacing $u_1$ by $u_1 - v_1 > 0$, and $v_1$ by 0, we get
• The constraint is not affected: $A(u - v) = b$
• The penalty $\mathbf{1}^T(u + v)$ is reduced by $2 v_1 > 0$
Our initial assumption about a violation of $u^T v = 0$ does
not hold, and the optimal solution does obey this constraint
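With the passivity of $u^T v = 0$ established, (P1) can be handed
to any off-the-shelf LP solver. A minimal sketch using scipy's
linprog (random data for illustration; "highs" is one available
solver choice):

```python
# A sketch of solving (P1) through the LP reformulation x = u - v.
# linprog's default bounds are (0, None), i.e., u, v >= 0, as required.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
n, m = 15, 40
A = rng.standard_normal((n, m))
x0 = np.zeros(m)
x0[rng.choice(m, 3, replace=False)] = rng.standard_normal(3)  # sparse truth
b = A @ x0

c = np.ones(2 * m)                 # objective: 1^T u + 1^T v  (= ||x||_1)
A_eq = np.hstack([A, -A])          # constraint: [A, -A][u; v] = b
res = linprog(c, A_eq=A_eq, b_eq=b, method="highs")
x = res.x[:m] - res.x[m:]
print("non-zeros in the LP solution:", np.sum(np.abs(x) > 1e-8))
```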
Sparse & Redundant Representations
and Their Applications in
Signal and Image Processing
Seeking Sparse Solutions

Michael Elad
The Computer Science Department
The Technion – Israel Institute of Technology
Haifa 32000, Israel
Promoting Sparse
Solutions
What Have We Done So Far?

$Ax = b$ has infinitely many solutions. What to do?
Regularize: choose J(x) and solve $\min_x J(x) \ \text{s.t.}\ Ax = b$

• Option 1: choose $J(x) = \|x\|_2^2$
• Option 2: choose some other convex J(x). Other options?
• Option 3: choose $J(x) = \|x\|_1$ – tendency to sparse
solutions. Properties? So?

We will now try to better understand this behavior and then
discuss its importance for our needs



Why L1 Leads to Sparsity? Answer #1

• In order to understand this tendency better, consider the
following problem: find the shortest x in L1, given that
its L2-norm is normalized:
$\min_x \|x\|_1 \ \text{s.t.}\ \|x\|_2 = 1$
• While an algebraic solution could be applied here, let’s
try a geometric view
• The solution is given at the “corners” of the L1-sphere,
such as $x_{opt} = [1, 0, 0, 0, 0]^T$
• Conclusion: Among all L2-normalized vectors, the
shortest in L1 are also the sparsest possible



Why L1 Leads to Sparsity? Answer #2

• Indeed, let’s consider our more general problem for p = 1
and p = 2, and again consider things geometrically:
$\min_x \|x\|_p^p \ \text{s.t.}\ Ax = b$
• Geometrically, we inflate the ball $\|x\|_p = const$ until it
first touches the affine set $\{x \mid Ax = b\}$: for p = 1 the
touching point is typically a corner of the ball (a sparse
vector), while for p = 2 it is a generic, dense point
• Conclusion: L2 tends to dense solutions, while L1 (again)
promotes sparse ones
So, Why L1?

• If the migration from L2 to L1 promotes sparsity, why
not go further, to Lp with p < 1?
$\min_x \|x\|_p^p \ \text{s.t.}\ Ax = b$
• Clearly, this choice is no longer a valid norm, as it is
not a convex function, but if it serves our needs, why
not use it?
• [Illustration: the Lp-ball with p < 1 is “spiky”, touching
the set $Ax = b$ at sparse points even more decisively]
• Conclusion: As p decreases below 1, we strengthen the
tendency to get sparse solutions
Other Options

• We have to emphasize: there is nothing special about the
Lp-norm expression
$\|x\|_p^p = \sum_k |x_k|^p$
and one could suggest other functions
• More broadly, any function of the form $J(x) = \sum_{k=1}^{m} \rho(x_k)$
could fit, if the scalar function $\rho(\cdot)$ is
o Symmetric: $\rho(-x) = \rho(x)$
o Monotonically non-decreasing for x > 0
o With a monotonically non-increasing derivative for x > 0
• Examples: $\rho(x) = \log(1 + \alpha|x|)$, $\rho(x) = 1 - e^{-\alpha|x|}$,
$\rho(x) = \frac{|x|}{1 + |x|}$
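A small sketch evaluating these example penalties (the value
α = 2 is an arbitrary choice):

```python
# Evaluating the example penalties rho(x); alpha = 2 is an arbitrary choice.
import numpy as np

alpha = 2.0
penalties = {
    "log(1 + a|x|)":   lambda x: np.log(1 + alpha * np.abs(x)),
    "1 - exp(-a|x|)":  lambda x: 1 - np.exp(-alpha * np.abs(x)),
    "|x| / (1 + |x|)": lambda x: np.abs(x) / (1 + np.abs(x)),
}
xs = np.linspace(-2, 2, 5)
for name, rho in penalties.items():
    print(name, np.round(rho(xs), 3))  # symmetric, concave for x > 0
```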



The L0-Norm and the
(P0) Problem
From Lp to L0

• If we are interested in the sparsest solution of Ax = b,
why not push p all the way to zero? What does that mean?
• Consider the “influence functions” $|x|^p$ for varying p:
[Plot: $|x|^p$ for p = 2, 1, 0.5, 0.1 over $-1.5 \le x \le 1.5$]
• As $p \to 0$, $|x|^p$ becomes an indicator function:
$I(x) = \begin{cases} 0 & x = 0 \\ 1 & x \ne 0 \end{cases}$



L0-Norm: Definition

Definition: Given a vector $x \in \mathbb{R}^m$, its L0-norm is defined as
$\|x\|_0 = \lim_{p \to 0} \|x\|_p^p = \sum_{k=1}^{m} I(x_k)$

• This L0-norm simply counts the number of non-zeros in x
• Source of confusion: is it $\|x\|_0$ or $\|x\|_0^0$?
o While we took the limit of $\|x\|_p^p$, the outcome is
considered a “norm” without the 0-th power
o We shall use both notations to mean the same thing:
the number of non-zeros (the cardinality) of x
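A quick numerical illustration of this limit on a toy vector:

```python
# ||x||_p^p -> number of non-zeros as p -> 0 (a toy vector for illustration).
import numpy as np

x = np.array([0.0, 3.0, 0.0, -0.5, 2.0])
for p in [1.0, 0.5, 0.1, 0.01]:
    print(p, np.sum(np.abs(x) ** p))   # approaches 3, the count of non-zeros
print("L0:", np.count_nonzero(x))
```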



L0-Norm: It is Not Really a Norm!

$\|x\|_0 = \lim_{p \to 0} \|x\|_p^p = \sum_{k=1}^{m} I(x_k)$

• Clearly, $\|x\|_0$ is not a valid norm, as it is not convex
• Which of the norm axioms is violated?
o Non-negativity: $\|x\|_0 \ge 0$ – holds
o Homogeneity: $\|cx\|_0 \ne |c| \cdot \|x\|_0$ – violated
(scaling a vector does not change its non-zero count)
o What about the triangle inequality?



L0-Norm and the Triangle Inequality

• The triangle inequality: $\|x + y\|_0 \le \|x\|_0 + \|y\|_0$
• In words: the number of non-zeros in x+y is equal to or
smaller than the sum of the non-zeros in x and y separately

Equality – when the supports do not overlap:
$x = [1, 0, 0, 7, 1, 0]^T$, $y = [0, 1, 0, 0, 0, -2]^T$:
$5 = \|x + y\|_0 = \|x\|_0 + \|y\|_0 = 5$
Inequality – when the supports overlap:
$x = [1, 0, 0, 7, 1, 0]^T$, $y = [3, 1, 0, 0, 0, -2]^T$:
$5 = \|x + y\|_0 \le \|x\|_0 + \|y\|_0 = 6$
or when there is a cancellation:
$x = [1, 0, 0, 7, 1, 0]^T$, $y = [-1, 1, 0, 0, 0, -2]^T$:
$4 = \|x + y\|_0 < \|x\|_0 + \|y\|_0 = 6$



L0-Norm Properties: Summary

$\|x\|_0 = \lim_{p \to 0} \|x\|_p^p = \sum_{k=1}^{m} I(x_k)$

• Clearly, $\|x\|_0$ is not a valid norm, as it is not convex
• Which of the norm axioms is violated?
o Non-negativity: $\|x\|_0 \ge 0$ – holds
o Homogeneity: $\|cx\|_0 \ne |c| \cdot \|x\|_0$ – violated
o Triangle inequality: surprisingly, it holds:
$\|x + y\|_0 \le \|x\|_0 + \|y\|_0$



Defining Our Holy Grail: (P0)

Definition: Given an under-determined linear system of
equations, Ax = b, the (P0) problem seeks its sparsest
solution:
$(P_0)\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$

Observe how similar this problem is to
$(P_2)\ \min_x \|x\|_2^2 \ \text{s.t.}\ Ax = b$

And yet, we know so much about (P2) and so little about (P0)



What Do We Want to Know About (P0)?

$(P_2)\ \min_x \|x\|_2^2 \ \text{s.t.}\ Ax = b$ vs. $(P_0)\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$

Here is what we know about (P2):
• There is a unique solution to this problem
• This solution is easily computed (closed-form)

Our questions about (P0):
• Is there a unique solution? Under which conditions?
• Given a candidate solution, could we test its optimality
easily?
• How can we get this solution in reasonable time?



(P0) is (Really) Hard

• Assume we are given (P0) with
o m = 2000, n = 500
o We know that $\|x\|_0 = 20$
• Our goal: find x using a direct approach that checks all
the possible supports:
$\binom{2000}{20} \approx 3.9 \times 10^{47}$
• If every such test takes 1 nano-second, we will conclude
the whole calculation in ~1.2e31 years (!!!)

Is There Hope? Why Should We Care?
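The arithmetic behind these numbers, for the curious:

```python
# The arithmetic from the slide: testing all supports of size 20 out of 2000
# columns at one test per nano-second.
from math import comb

tests = comb(2000, 20)
print(f"{tests:.1e} possible supports")              # ~3.9e+47
seconds = tests * 1e-9
print(f"{seconds / (3600 * 24 * 365):.1e} years")    # ~1.2e+31
```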



Do Sparse Solutions Exist?

Here is an interesting observation:
• Draw A ($n \times m$) at random, draw b at random, and feed
both to a solver of $(P_0)\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$
• Question: What would the cardinality of the solution be?
• Answer: $\|x\|_0 = n$ with probability 1 – the solution is
not expected to be “deeply” sparse at all
• So, why bother with (P0), when (P1) could have given
this solution easily?

Clearly, this is not the case we are after.
So, what are we trying to achieve?



Our Point of View

• Draw A ($n \times m$) at random, draw an s-sparse x₀ at
random, and multiply: $b = A x_0$
• Obviously, in this case there is a sparse solution to the
linear system, and the solver of
$(P_0)\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$
will give $\|x\|_0 \le s$
• Our field is not about covering all $b \in \mathbb{R}^n$, but rather
about treating only those b that can be explained
“sparsely” – their “dimension” is O(s). In doing so, we
aim to reveal the underlying structure of b, believed to
be a composition of few atoms from A



A Signal Processing
Perspective
Mathematical Riddle?

$(P_0)\ \min_x \|x\|_0 \ \text{s.t.}\ Ax = b$

So far, our discussion has been purely mathematical: we are
interested in better understanding the problem (P0), which
seeks the sparsest solution of a linear system.

We argue that this is more than just a mathematical riddle:
as we will show in this course, (P0) and related topics are
of central importance to data processing in a universal way



Signal Modeling

• The linear system Ax = b describes a signal construction
process
• We consider our signal (the vector b) to be composed as
a linear combination of $s \ll n$ columns from A
• What does this mean? It implies that our n-dimensional
signals are in fact simpler, residing in a union of many
s-dimensional subspaces
• Is it so far-fetched? Clearly, we do know that all the
data we operate on is compressible, suggesting that an
inner structure in it is quite plausible
Signal Modeling: An Example

• Piecewise-constant signals can be described as a sparse
combination of atoms taken from the Heaviside dictionary
(each atom is a step: 0, and then 1 from some index onward)
• [Illustration: a signal with three jumps written as
$b = x_5 h_5 + x_9 h_9 + x_{16} h_{16}$, a combination of three
Heaviside atoms]
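A sketch of this example (the atom indices 5, 9, 16 follow the
illustration, here 0-based; the jump values are arbitrary):

```python
# Heaviside-dictionary sketch: column k of A is a step that is 1 from index k
# onward, so a few atoms sum to a piecewise-constant signal.
import numpy as np

n = 20
A = np.tril(np.ones((n, n)))       # atom k: zeros, then ones from row k on
x = np.zeros(n)
x[[5, 9, 16]] = [1.0, -0.7, 0.5]   # three active atoms, arbitrary values
b = A @ x                          # a signal with exactly three jumps
print(b)
```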



Is This a New Idea?

• JPEG assumed that only a few coefficients of the DCT
capture most of the image content (actually, the
assumption is even more restrictive: the non-zero
coefficients are expected to be the leading ones, as
in PCA)
• JPEG-2000 does the same with the wavelet transform
(allowing any location for the non-zeros)
• Transform (e.g., wavelet) sparsity has been used
effectively for handling problems such as denoising,
deblurring, …
• In our language, A is simply the inverse transform,
since it takes us from the representation-space to the
signal-space
Why Redundancy?

• We propose to use redundant dictionaries (m > n)
• Why? Because
o It enriches the model further, enabling more
combinations for the signal construction
o It enables getting sparser solutions



Sparse & Redundant Representations

• The concept of seeking the sparsest representation for
data has been around, and has been found useful, in many
fields already
• It is now time to make it more clear and more precise,
which is exactly the goal of this field
• … so, welcome to Sparseland

