2012 Fall
Course Introduction
n Data compression topics covered so far (ECE 5546-41):
n Main text: Introduction to Data Compression (3rd Ed.) by K. Sayood
n Huffman Coding
n Arithmetic Coding
n Dictionary Techniques
n Context-Based Compression
n Each student should study thoroughly and present at least one paper.
n The presenter must understand the paper completely before the presentation.
n A list of papers will be provided by the instructor; however, students may suggest a preferred paper of their own.
n Grading Policy
n Attendance 10%
n Project/Presentation 20 %
n Homework 10 %
n Exam (Mid 30 + Final 30) 60 %
Very Brief Introduction to CS
Y = AX
n Solving for X is easy unless the matrix A is non-invertible.
n A nonlinear system can be approximated by a linear system of equations.
n A continuous system can be discretized into a linear system of equations.
Modified from a file by Igor Carron (version 2, draft), at
https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsdGRvbWFpbnxpZ29yY2Fycm9uMnxneDoxYmNkZjU5MWQ2NmJkOGUy
2012 Fall
Data Compression
(ECE 5546-41)
Ch2. Sparse and Compressible Signal Models
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; bjeon@skku.edu
Compressed Sensing
n A compressed sensing system is an instance of an underdetermined system of linear equations:
    Y = AX
n Y ~ compressed measurements (few)
n A ~ sensing operator (a set of linear combinations)
n X ~ original information (what we would like to find)
n The recovery of a sparse solution to an underdetermined system of linear equations is performed using compressed sensing reconstruction techniques/solvers.
n Key Question: "Do all underdetermined systems of linear equations admit a very sparse and unique solution?"
n Answer: Some systems do, under a condition on the sensing matrix (RIP, NSP, ...); see the small sketch below.
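As a small illustration of why sparsity is the needed extra ingredient, here is a minimal NumPy sketch (the sizes M, N, K and the random seed are arbitrary choices, not from the lecture): the classical minimum-norm solution of an underdetermined system is generally dense, so recovering a sparse X calls for a dedicated CS solver.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, K = 10, 30, 2            # fewer measurements than unknowns (M < N)

A = rng.standard_normal((M, N))                                     # sensing matrix
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)    # K-sparse signal

y = A @ x_true                                                      # compressed measurements

# The minimum-L2-norm solution (pseudo-inverse) exists but is dense:
x_l2 = np.linalg.pinv(A) @ y
print("nonzeros in true x  :", np.count_nonzero(x_true))
print("nonzeros in pinv(A)y:", np.count_nonzero(np.abs(x_l2) > 1e-8))
# Recovering the sparse x_true instead requires a CS reconstruction technique
# (L1 minimization, greedy pursuit, ...), discussed later in these notes.
```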
n Vectors in 2-space and 3-space (directed segments from an initial point)
n Vector addition and scalar multiplication
n Dot product: If u and v are two vectors in 2-space (or 3-space), and the angle between them is $\theta$, then the dot product is defined as
    $u \cdot v = \|u\|\,\|v\| \cos\theta$
n It is sometimes called the scalar product or Euclidean inner product.
Extension to N-space (1)
n Definition of n-space: For a given positive integer n, an ordered n-tuple is a sequence of n real numbers denoted by $(a_1, a_2, \ldots, a_n)$. The set of all ordered n-tuples is called n-space and is denoted by $\mathbb{R}^n$.
n It is a natural extension of 2-space and 3-space.
Extension to N-space (3)
n Let's extend the concepts of norm and distance to n-space.
n Definition: For a vector $u = (u_1, u_2, \ldots, u_n) \in \mathbb{R}^n$, the Euclidean norm is defined as
    $\|u\| = \sqrt{u \cdot u} = \sqrt{\sum_{i=1}^{n} u_i^2}$
n Definition: For two vectors $u, v \in \mathbb{R}^n$, the Euclidean distance between the two points indicated by the two vectors is defined as
    $d(u, v) = \|u - v\| = \sqrt{\sum_{i=1}^{n} (u_i - v_i)^2}$
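A minimal NumPy sketch of these two definitions (the vectors are arbitrary examples):

```python
import numpy as np

u = np.array([1.0, -2.0, 3.0, 0.5])
v = np.array([0.0,  1.0, 1.0, 2.0])

norm_u = np.sqrt(u @ u)                  # ||u|| = sqrt(u . u) = sqrt(sum u_i^2)
dist_uv = np.sqrt(np.sum((u - v) ** 2))  # d(u, v) = ||u - v||

# Same quantities via the library routine:
assert np.isclose(norm_u, np.linalg.norm(u))
assert np.isclose(dist_uv, np.linalg.norm(u - v))
print(norm_u, dist_uv)
```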
n Examples
n All norms are seminorms.
n The trivial seminorm, with p(x) = 0 for all x in V.
n The absolute value is a norm on the real numbers.
n Every linear form f on a vector space defines a seminorm by x → |f(x)|.
n The Euclidean norm:
    $\|u\|_2 = \sqrt{\sum_{i=1}^{n} u_i^2}$  ($u \in \mathbb{R}^n$);   $\|u\| = \sqrt{\sum_{i=1}^{n} |u_i|^2}$  ($u \in \mathbb{C}^n$)
n The set of vectors whose 1-norm is a given constant forms the surface
of a cross polytope of dimension equivalent to that of the norm minus 1.
n The Taxicab norm is also called the L1 norm. The distance derived from
this norm is called the Manhattan distance or L1 distance.
Properties of Norms
n The concept of unit circle (the set of all vectors of norm 1) is different
in different norms
n For the 1-norm the unit circle in R2 is a square
n For the 2-norm (Euclidean norm) it is the well-known unit circle
n For the infinity norm it is a different square.
n For any p-norm it is a superellipse (with congruent axes).
n Due to the definition of the norm, the unit circle is always convex and
centrally symmetric (therefore, the unit ball may be a rectangle but
cannot be a triangle).
Linear Independence
n Linear Independence
n A finite set of vectors that contains the zero vector will be linearly
dependent.
n Linear combination of vectors: $\sum_{i} c_i v_i$ (where $c_i$: scalar)
n Theorem: Suppose that the set $S = \{v_1, \ldots, v_n\}$ is a basis for the vector space V; then every vector u from V can be expressed as a linear combination of the vectors from S in exactly one way.
n Theorem: Suppose that $S = \{v_1, \ldots, v_n\}$ is a set of linearly independent vectors; then S is a basis for the vector space V = span(S).
In Matrix Form
n Given a basis set $\{\phi_i\}_{i=1}^{n} \subset \mathbb{R}^n$, any vector x in $\mathbb{R}^n$ is uniquely represented as
    $x = \sum_{i=1}^{n} c_i \phi_i$
n Form an n×n matrix $\Phi$ with columns given by the $\phi_i$'s, and let c denote the length-n vector with entries $c_i$; the matrix representation is
    $x = \Phi c$
[Fig. 1.3 in Compressed Sensing, Y. Eldar et al.]
Ex: Sparse Approximation of a Natural Image
n Sparse approximation of a natural image [Fig. 1.4 in Compressed Sensing, Y. Eldar et al.]
Set of K-Sparse Signals
n Set of all K-sparse signals:
    $\Sigma_K = \{x : \|x\|_0 \le K\}$
n Q: Is the set $\Sigma_K$ a linear space?
n That is, for any pair of vectors x, z in $\Sigma_K$, does x + z also belong to $\Sigma_K$?
[Fig. 1.5 in Compressed Sensing, Y. Eldar et al.]
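A tiny sketch answering the question above (the supports and values are arbitrary): the sum of two K-sparse vectors with disjoint supports has 2K nonzeros, so $\Sigma_K$ is not closed under addition.

```python
import numpy as np

K, N = 2, 8
x = np.zeros(N); x[[0, 3]] = [1.0, -2.0]   # K-sparse
z = np.zeros(N); z[[5, 7]] = [4.0,  0.5]   # K-sparse, different support

print(np.count_nonzero(x), np.count_nonzero(z))   # 2, 2 -> both in Sigma_K
print(np.count_nonzero(x + z))                    # 4    -> x + z is only 2K-sparse
# Sigma_K is a union of K-dimensional subspaces, not a linear space:
# x + z is only guaranteed to lie in Sigma_2K.
```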
Sparseness of Image
n Most natural images are characterized by large smooth or textured
regions with relatively few sharp edges.
n Signals with this structure are known to be very nearly sparse when
represented using a multiscale wavelet approximation.
n K-term approximation
n Need a measure (i.e., an appropriate norm) of the approximation error.
n This kind of approximation is non-linear (since the choice of which coefficients to keep depends on the signal itself).
n Choose a basis set such that the coefficients obey a power-law decay.
Compressibility (1)
n Definition of compressibility: A signal is called compressible if its sorted coefficient magnitudes in the basis $\Psi$ decay rapidly:
    $x = \Psi\alpha$ where $|\alpha_1| \ge |\alpha_2| \ge \ldots \ge |\alpha_n|$
    $\sigma_K(x)_2 \le C_2 K^{-r}$
K-term Approximation
n Only the K largest coefficients are kept (the others are set to zero) to represent the given signal.
n K-term approximation error:
    $\sigma_K(x) = \min_{\alpha \in \Sigma_K} \|x - \Psi\alpha\|_2$
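A minimal sketch of a K-term approximation in a DCT basis (the basis choice, the test signal, and the sizes are illustrative assumptions, not from the lecture):

```python
import numpy as np
from scipy.fft import dct, idct

N, K = 256, 20
t = np.linspace(0, 1, N)
x = np.cos(2 * np.pi * 5 * t) + 0.5 * np.cos(2 * np.pi * 12 * t)   # smooth test signal

alpha = dct(x, norm='ortho')            # coefficients in the (orthonormal) DCT basis
idx = np.argsort(np.abs(alpha))[::-1]   # sort coefficient magnitudes in decreasing order

alpha_K = np.zeros_like(alpha)
alpha_K[idx[:K]] = alpha[idx[:K]]       # keep only the K largest, zero the rest
x_K = idct(alpha_K, norm='ortho')       # K-term approximation of x

err = np.linalg.norm(x - x_K)           # sigma_K(x)_2 for this basis
print(f"K-term approximation error with K={K}: {err:.3e}")
```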
Data Compression
(ECE 5546-41)
Ch 3. Sensing Matrices
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; bjeon@skku.edu
n Sparse representation: $x = \Psi s$ (example: K = 4 nonzero entries in s)
n Sparse x vs. compressible x
n Measurement: $y = \Phi x$, with y: M×1, $\Phi$: M×N, x: N×1
n Combined: $y = \Phi \Psi s$, with $\Phi$: M×N, $\Psi$: N×N, s: N×1
Compressed Sensing (3)
n Measurement process with $\Theta = \Phi\Psi$:
    $y = \Theta s = \Phi\Psi s$, with y: M×1, $\Theta$: M×N, s: N×1
n There is a small number of columns of $\Theta$ corresponding to the nonzero coefficients.
n The measurement vector y is a linear combination of these columns (see the sketch below).
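A minimal sketch of this measurement process (the DCT basis, sizes, and seed are illustrative assumptions):

```python
import numpy as np
from scipy.fft import idct

rng = np.random.default_rng(2)
M, N, K = 40, 128, 5

# Sparse coefficient vector s and signal x = Psi s (Psi = inverse-DCT basis here,
# an illustrative choice; any orthonormal basis works the same way).
s = np.zeros(N)
s[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
Psi = idct(np.eye(N), norm='ortho', axis=0)     # N x N basis matrix
x = Psi @ s

Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # M x N random sensing matrix
Theta = Phi @ Psi                               # effective M x N matrix acting on s

y = Phi @ x                                     # M compressed measurements of x
assert np.allclose(y, Theta @ s)                # y is a combination of the K columns
                                                # of Theta selected by supp(s)
```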
n Q5: When will the L1 convex relaxation solution attain the L0 solution?
n Proof idea: if two distinct signals x gave the same measurement, there would be no way to recover all signals x from the measurements y; hence distinct x must map to distinct measurement vectors.
n Spark
n A term coined by Donoho & Elad (2003).
n It is a way of characterizing the null space of $\Phi$ using the L0 norm.
n It is very complex to obtain (compared to the rank), since it calls for a combinatorial search over all possible subsets of columns of $\Phi$; a small brute-force example follows below.
Spark Condition
n Theorem: For any vector $y \in \mathbb{R}^M$, there exists at most one signal $x \in \Sigma_K$ such that $y = \Phi x$ if and only if $\mathrm{spark}(\Phi) > 2K$.
n Proof (necessity of $M \ge 2K$):
    $2 \le \mathrm{spark}(\Phi) \le M + 1$ and $\mathrm{spark}(\Phi) > 2K$
    $\Rightarrow 2K < M + 1 \Rightarrow 2K \le M$
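A minimal brute-force sketch of the spark (the example matrix is an illustrative assumption; the exhaustive subset search is exactly why spark is hard to compute for realistic sizes):

```python
import numpy as np
from itertools import combinations

def spark(Phi, tol=1e-10):
    """Smallest number of linearly dependent columns of Phi (brute force)."""
    M, N = Phi.shape
    for k in range(1, N + 1):
        for cols in combinations(range(N), k):
            if np.linalg.matrix_rank(Phi[:, cols], tol=tol) < k:  # dependent columns
                return k
    return N + 1    # no dependent subset found (only possible when N <= M)

# Tiny example: the 3rd column equals the sum of the first two -> spark = 3.
Phi = np.array([[1.0, 0.0, 1.0, 2.0],
                [0.0, 1.0, 1.0, 5.0]])
print(spark(Phi))   # 3, so unique recovery is guaranteed only for 2K < 3, i.e. K = 1
```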
n Rf: splitting a vector over an index set $\Lambda$ and its complement, e.g. $\Lambda = \{1, 3\}$:
    $h = (h_1, h_2, h_3, h_4, \ldots, h_n)^T$, $h_\Lambda = (h_1, 0, h_3, 0, \ldots, 0)^T$, $h_{\Lambda^c} = (0, h_2, 0, h_4, \ldots, h_n)^T$, so that $h = h_\Lambda + h_{\Lambda^c}$
n This means that if a matrix $\Phi$ satisfies the NSP, then the only K-sparse vector in $\mathcal{N}(\Phi)$ is h = 0.
n Null Space Property (NSP) of order K: for some constant C > 0,
    $\|h_\Lambda\|_2 \le C \, \frac{\|h_{\Lambda^c}\|_1}{\sqrt{K}}$
  holds for all $h \in \mathcal{N}(\Phi)$ and all index sets $\Lambda$ with $|\Lambda| \le K$.
n K-term approximation error in the $\ell_p$ norm:
    $\sigma_K(x)_p = \min_{\hat{x} \in \Sigma_K} \|x - \hat{x}\|_p$
n Since by construction $x' \in \Sigma_K$, we can apply $\|\Delta(\Phi x) - x\|_2 \le C \, \frac{\sigma_K(x)_1}{\sqrt{K}}$ to obtain $x' = \Delta(\Phi x)$. Moreover, since $h \in \mathcal{N}(\Phi)$, we have
    $\Phi h = \Phi(x - x') = 0$
n If a matrix $\Phi$ satisfies the NSP of order 2K, then the only 2K-sparse vector in $\mathcal{N}(\Phi)$ is h = 0, and the guarantee above holds for all $x \in \Sigma_K = \{x : \|x\|_0 \le K\}$.
n If a matrix $\Phi$ satisfies the RIP of order 2K, then $\Phi$ approximately preserves the distance between any pair of K-sparse vectors.
n This has fundamental implications concerning robustness to noise.
n If a matrix $\Phi$ satisfies the RIP of order K with constant $\delta_K$, then, for any K' < K, $\Phi$ automatically satisfies the RIP of order K' with constant $\delta_{K'} \le \delta_K$.
Data Compression
(ECE 5546-41)
Ch 3. Sensing Matrices
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; bjeon@skku.edu
n Proof:
n Lemma: Suppose that $\Phi$ satisfies the RIP of order 2K, and let $h \in \mathbb{R}^N$, $h \ne 0$, be arbitrary. Let $\Lambda_0$ be any subset of {1, 2, ..., N} such that $|\Lambda_0| \le K$. Define $\Lambda_1$ as the index set corresponding to the K entries of $h_{\Lambda_0^c}$ with largest magnitude, and set $\Lambda = \Lambda_0 \cup \Lambda_1$. Then
    $\|h_\Lambda\|_2 \le \alpha \, \frac{\|h_{\Lambda_0^c}\|_1}{\sqrt{K}} + \beta \, \frac{|\langle \Phi h_\Lambda, \Phi h \rangle|}{\|h_\Lambda\|_2}$
  where
    $\alpha = \frac{\sqrt{2}\,\delta_{2K}}{1 - \delta_{2K}}, \quad \beta = \frac{1}{1 - \delta_{2K}}$
n Methods for constructing matrices satisfying the RIP:
1. Deterministic methods
2. Randomization methods
n Methods without a specified $\delta_{2K}$ (just assume $\delta_{2K} > 0$)
n Methods with a specified $\delta_{2K}$ (a particular value of $\delta_{2K}$ is specified)
n Definition of RIP
n A matrix $\Phi$ satisfies the restricted isometry property (RIP) of order K if there exists a $\delta_K \in (0,1)$ such that, for all $x \in \Sigma_K = \{x : \|x\|_0 \le K\}$,
    $(1 - \delta_K)\|x\|_2^2 \le \|\Phi x\|_2^2 \le (1 + \delta_K)\|x\|_2^2$
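A minimal empirical check of this property for a random Gaussian matrix (sizes and seed are illustrative assumptions). Note that sampling random K-sparse vectors only gives an optimistic estimate of $\delta_K$; the exact constant requires a search over all K-column submatrices.

```python
import numpy as np

rng = np.random.default_rng(3)
M, N, K, trials = 64, 256, 5, 2000

Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # i.i.d. N(0, 1/M) entries

ratios = []
for _ in range(trials):
    x = np.zeros(N)
    x[rng.choice(N, K, replace=False)] = rng.standard_normal(K)   # random K-sparse x
    ratios.append(np.sum((Phi @ x) ** 2) / np.sum(x ** 2))        # ||Phi x||^2 / ||x||^2

# Empirical stand-in for the RIP constant of order K on the sampled vectors:
delta_hat = max(1 - min(ratios), max(ratios) - 1)
print(f"ratios in [{min(ratios):.3f}, {max(ratios):.3f}] -> empirical delta ~ {delta_hat:.3f}")
```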
n Cond 2: The PDF is sub-Gaussian. That is, there exists a constant c > 0 such that
    $E\!\left(e^{\phi_{ij} t}\right) \le e^{c^2 t^2 / 2}$  for all $t \in \mathbb{R}$
n Note that the moment-generating function of the PDF is dominated by that of a Gaussian PDF, which is equivalent to requiring that the tails of the PDF decay at least as fast as the tails of a Gaussian PDF.
n Strictly sub-Gaussian: a PDF satisfying $E\!\left(e^{\phi_{ij} t}\right) = e^{c^2 t^2 / 2}$ for all $t \in \mathbb{R}$, with the constant
    $c^2 = E(\phi_{ij}^2) = \frac{1}{M}$
n Corollary: Suppose that $\Phi$ is an M×N matrix whose entries $\phi_{ij}$ are i.i.d., with $\phi_{ij}$ drawn according to a strictly sub-Gaussian PDF with $c^2 = 1/M$. Let $Y = \Phi x$ for $x \in \mathbb{R}^N$. Then for any $\epsilon > 0$ and any $x \in \mathbb{R}^N$,
    $E\!\left(\|Y\|_2^2\right) = \|x\|_2^2$
  and
    $P\!\left(\left| \|Y\|_2^2 - \|x\|_2^2 \right| \ge \epsilon \|x\|_2^2\right) \le 2 \exp\!\left(-\frac{M\epsilon^2}{\kappa^*}\right)$
  with $\kappa^* = \frac{2}{1 - \log(2)} \approx 6.52$.
n Note that the norm of a sub-Gaussian random vector strongly concentrates about its mean; a small Monte Carlo check follows below.
n Furthermore, for sufficiently large M, $\Phi\Psi$ will satisfy the RIP with high probability.
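A minimal Monte Carlo sketch of the concentration statement above for a Gaussian matrix with variance 1/M (sizes, tolerance $\epsilon$, and trial count are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
M, N, trials, eps = 128, 256, 2000, 0.2
kappa_star = 2.0 / (1.0 - np.log(2.0))           # ~ 6.52

x = rng.standard_normal(N)
x_energy = np.sum(x ** 2)

deviations = 0
for _ in range(trials):
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)   # strictly sub-Gaussian, c^2 = 1/M
    Y = Phi @ x
    if abs(np.sum(Y ** 2) - x_energy) >= eps * x_energy:
        deviations += 1

empirical = deviations / trials
bound = 2.0 * np.exp(-M * eps ** 2 / kappa_star)
print(f"empirical tail prob = {empirical:.4f},  bound = {bound:.4f}")
```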
Practical Situation
n In practical implementations, a fully random matrix design may sometimes be impractical to build in hardware. Therefore it is possible to:
n Use a reduced amount of randomness
n Or model the acquisition architecture via matrices $\Phi$ that have significantly more structure than a fully random matrix
n Ex: random demodulator [192], random filtering [194], modulated wideband converter [147], random convolution [2,166], compressive multiplier [179]
n Although not quite as easy as in the fully random case, one can prove that many such constructions also satisfy the RIP.
n Lemma: For any matrix $\Phi$,
    $\mathrm{spark}(\Phi) \ge 1 + \frac{1}{\mu(\Phi)}$
n The lemma suggests the need for a small coherence $\mu(\Phi)$ for matrices used in CS (see the sketch below).
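A minimal sketch computing the mutual coherence and the resulting spark lower bound (sizes and seed are illustrative assumptions):

```python
import numpy as np

def coherence(Phi):
    """Mutual coherence: largest |<phi_i, phi_j>| between distinct normalized columns."""
    Phin = Phi / np.linalg.norm(Phi, axis=0, keepdims=True)
    G = np.abs(Phin.T @ Phin)
    np.fill_diagonal(G, 0.0)
    return G.max()

rng = np.random.default_rng(5)
Phi = rng.standard_normal((32, 128))
mu = coherence(Phi)
print(f"mu(Phi) = {mu:.3f},  spark lower bound 1 + 1/mu = {1 + 1/mu:.2f}")
# Uniqueness of K-sparse solutions is guaranteed whenever 2K < 1 + 1/mu.
```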
Data Compression
(ECE 5546-41)
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; bjeon@skku.edu
n Recovery error bound:
    $\|h\|_2 \le C_0 \, \frac{\sigma_K(x)_1}{\sqrt{K}} + C_1 \, \frac{|\langle \Phi h_\Lambda, \Phi h \rangle|}{\|h_\Lambda\|_2}$
  where
    $C_0 = 2\,\frac{1 - (1 - \sqrt{2})\delta_{2K}}{1 - (1 + \sqrt{2})\delta_{2K}}, \quad C_1 = \frac{2}{1 - (1 + \sqrt{2})\delta_{2K}}$
n In the exact-measurement case, $\Phi h = 0$, so the inner-product term vanishes and
    $\|h\|_2 \le C_0 \, \frac{\sigma_K(x)_1}{\sqrt{K}}$
--- Q.E.D. ---
n Then
    $M > \left(1 - \sqrt{1 - 1/C^2}\right) N$
n In order to make the bound hold for all signals x with a constant $C \approx 1$, then, regardless of what recovery algorithm is being used, we need to take $M \approx N$ measurements.
n Cf: a guarantee that holds only for some subset of possible signals, such as compressible or sparse signals (the quality of the guarantee adapts to the particular choice of x).
n In that sense, instance-optimality guarantees are also commonly referred to as "uniform guarantees," since they hold uniformly for all x.
    $\kappa_2 = \frac{\delta^2}{2\kappa^*} - \frac{\log(42e/\delta)}{\kappa_1}$
End of Chapter 4
Data Compression
(ECE 5546-41)
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; bjeon@skku.edu
From Chapter 4
    $\hat{x} = \arg\min_{x \in \mathbb{R}^N} \|\Phi x - y\|_p$
Convex Optimization-based Method (2)
n Ex: J(x) = ||x||_p
n p = 0 (L0 norm): directly measures sparsity (but hard to solve)
n p = 1 (L1 norm): gives robustness against outliers
    $\min_x \{J(x) \mid y = \Phi x\} \;\rightarrow\; \min_x \|x\|_0 \text{ subject to } y = \Phi x$
    $\min_x \{J(x) \mid H(\Phi x, y) \le \epsilon\} \;\rightarrow\; \min_x \|x\|_0 \text{ subject to } \|\Phi x - y\|_2 \le \epsilon$
    $\min_x \|\Phi x - y\|_2 \text{ subject to } \|x\|_0 \le K$
    $\min_x \left\{ \tfrac{1}{2}\|\Phi x - y\|_2^2 + \mu\|x\|_0 \right\}, \quad \mu > 0$
(Review: convexity, optimization, etc.)
L0 Approach (1)
n The L0 norm explicitly counts the number of nonzero components of the given data.
n It is directly related to the sparsity of a signal.
n The function card(x): cardinality
n For scalar x:
    card(x) = 0 (x = 0) and 1 (x ≠ 0)
http://en.wikipedia.org/wiki/Quasiconvex_function
Rf: Quasiconvexity (2)
n Def: A function f : S → R defined on a convex subset S of a real vector space is quasiconvex if, for all x, y ∈ S and $\lambda \in [0,1]$,
    $f(\lambda x + (1 - \lambda)y) \le \max\big(f(x), f(y)\big)$
n Note that the points x and y, and the point directly between them, can be points on a line or, more generally, points in n-dimensional space.
n In words, if f is such that a point directly between two other points never gives a higher value of the function than both of the other points, then f is quasiconvex.
n An alternative way of defining a quasiconvex function is to require that each sub-level set $S_\alpha(f)$ is a convex set:
    $S_\alpha(f) = \{x \mid f(x) \le \alpha\}$ ~ convex set
http://en.wikipedia.org/wiki/Quasiconvex_function
L0 Approach (2)
n General convex-cardinality problem
n It refers to a problem that would be convex, except for the appearance of card(·) in the objective or constraints.
n Example: for f, C convex,
    Minimize card(x) subject to x ∈ C
n Solving a convex-cardinality problem: for $x \in \mathbb{R}^n$,
n Fix a sparsity pattern of x (i.e., which entries are zero/nonzero), then solve the resulting convex problem.
n If we solve the $2^n$ convex problems associated with all possible sparsity patterns, the convex-cardinality problem is solved completely.
n However, this is practically possible only for n ≤ 10.
n The general convex-cardinality problem is NP-hard.
n Examples:
    Minimize $\|\Phi x - y\|_2$ subject to $\|x\|_0 \le K$
    Minimize $\|\Phi x - y\|_2 + \lambda \|x\|_0$
n L1-norm Heuristic
n Replace $\|x\|_0$ with $\lambda\|x\|_1$, or add a regularization term $\lambda\|x\|_1$ to the objective function.
n $\lambda$ is a parameter used to achieve the desired sparsity.
L1 Approach (2)
n The L1 minimization problem:
    $\min_{x \in \mathbb{R}^N} \|x\|_1, \quad \Phi \in \mathbb{R}^{M \times N}$
n Variant
n Start with the cardinality-constrained problem (f, C convex):
    Minimize f(x) subject to x ∈ C, card(x) ≤ K
n Apply the heuristic to obtain the L1-norm-constrained problem:
    Minimize f(x) subject to x ∈ C, ||x||_1 ≤ β
n Or the L1-regularized problem:
    Minimize f(x) + λ||x||_1 subject to x ∈ C
n β and λ are adjusted so that card(x) ≤ K.
L1 Approach (4)
n Variant with polishing
n Use the L1 heuristic to find an estimate of x with the required sparsity
n Fix the sparsity pattern of x
n Re-solve the (convex) optimization problem with this sparsity pattern to obtain the final (heuristic) solution.
From Computational Methods for Sparse… 2010 IEEE Proceedings by J.A. Tropp and J. Wright
Equality-constrained Problem
n Equality-constrained problem
n Among all x consistent with the measurements, pick the one with minimum L1 norm:
    $\min_x \|x\|_1 \text{ subject to } y = \Phi x$   (C1)
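A minimal sketch (not from the lecture) of solving (C1) as a linear program with SciPy, using the standard split x = u − v with u, v ≥ 0 so that $\|x\|_1 = \mathbf{1}^T u + \mathbf{1}^T v$; the problem sizes and seed are arbitrary:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)
M, N, K = 20, 60, 3
Phi = rng.standard_normal((M, N))
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
y = Phi @ x_true

# Basis pursuit as an LP: minimize 1^T u + 1^T v  s.t.  Phi u - Phi v = y,  u, v >= 0.
c = np.ones(2 * N)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * N), method="highs")
x_hat = res.x[:N] - res.x[N:]

print("recovery error:", np.linalg.norm(x_hat - x_true))
```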
n How to choose $\mu$?
n Often one needs to solve the problem repeatedly for different choices of $\mu$, or to trace systematically the path of solutions as $\mu$ decreases towards zero.
LASSO
n Least Absolute Shrinkage and Selection Operator (LASSO) method
n It is equivalent to the convex relaxation method (C2) in the sense that the solution path of (C3), parameterized by positive $\beta$, matches the solution path of (C2) as $\mu$ varies.
    $\min_x \|\Phi x - y\|_2^2 \text{ subject to } \|x\|_1 \le \beta$   (C3)
    $\min_x \|x\|_1 \text{ subject to } \|\Phi x - y\|_2 \le \epsilon$   (C4)
    $\min_x \|x\|_1 \text{ subject to } \|\Phi^T(\Phi x - y)\|_\infty \le \epsilon$   (the Dantzig selector)
From the CS Tutorial at ITA 2008 by Baraniuk, Romberg, and Wakin
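A minimal sketch of the LASSO using scikit-learn (an external library, not part of the lecture); note that sklearn's Lasso minimizes $\frac{1}{2M}\|y - \Phi x\|_2^2 + \alpha\|x\|_1$, i.e., the $\mu$-regularized form (C2) up to the 1/M scaling of the data-fit term. Sizes, noise level, and $\alpha$ are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(7)
M, N, K = 50, 200, 5
Phi = rng.standard_normal((M, N))
x_true = np.zeros(N)
x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K) + np.sign(rng.standard_normal(K))
y = Phi @ x_true + 0.01 * rng.standard_normal(M)      # noisy measurements

lasso = Lasso(alpha=0.01, fit_intercept=False, max_iter=10000)
lasso.fit(Phi, y)
x_hat = lasso.coef_

print("true support     :", np.flatnonzero(x_true))
print("recovered support:", np.flatnonzero(np.abs(x_hat) > 1e-3))
```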
• Algorithm:
(1). Initialize: set k = 1.
(2). Iterate: choose $\alpha_k \ge 0$ and compute the candidate vector $x_k^+$ from
    $x_k^+ := \arg\min_z \left\{ (z - x_k)^T \Phi^T(\Phi x_k - y) + \tfrac{1}{2}\alpha_k\|z - x_k\|_2^2 + \mu\|z\|_1 \right\}$
  If an acceptance test on $x_k^+$ is not passed, increase $\alpha_k$ by some factor and repeat.
(3). Line search: choose $\gamma_k \in (0,1]$ and obtain $x_{k+1}$ from
    $x_{k+1} := x_k + \gamma_k (x_k^+ - x_k)$
(4). Test: if the stopping criterion holds, terminate with $x = x_{k+1}$. Otherwise, set k ← k+1 and go to (2).
Gradient Method (3)
n This gradient-based method works well on sparse signals when the dictionary $\Phi$ satisfies the RIP.
n It benefits from warm starting; that is, the work required to identify a solution can be reduced dramatically when the initial estimate of x is close to the solution.
n Continuation strategy
n Solve the optimization problem (C2) for a decreasing sequence of $\mu$ values, using the approximate solution at each value as the starting point for the next sub-problem.
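The scheme above belongs to the SpaRSA family; a closely related and simpler iteration is ISTA (iterative shrinkage-thresholding), sketched below under the assumption of a fixed step size 1/L (sizes, seed, and $\mu$ are illustrative). The continuation strategy corresponds to re-running the solver with decreasing $\mu$, warm-starting from the previous solution.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*||.||_1 (component-wise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(Phi, y, mu, iters=500, x0=None):
    """Minimize (1/2)||Phi x - y||_2^2 + mu*||x||_1 by iterative shrinkage-thresholding."""
    L = np.linalg.norm(Phi, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(Phi.shape[1]) if x0 is None else x0.copy()
    for _ in range(iters):
        grad = Phi.T @ (Phi @ x - y)
        x = soft_threshold(x - grad / L, mu / L)
    return x

rng = np.random.default_rng(8)
M, N, K = 60, 200, 6
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N); x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
y = Phi @ x_true

x_hat = ista(Phi, y, mu=1e-3)
print("error:", np.linalg.norm(x_hat - x_true))
```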
Review:
Convex Optimization
References (2)
n Convex Optimization (EE364a by Prof. Boyd)
n http://www.stanford.edu/class/ee364a/lectures.html
n Video lectures are also available.
n Lecture topics:
Introduction
Convex sets
Convex functions
Convex optimization problems
Duality
Approximation and fitting
Statistical estimation
Geometric problems
Numerical linear algebra background
Unconstrained minimization
Equality constrained minimization
Interior-point methods
Conclusions
Lecture slides in one file.
Additional lecture slides:
Convex optimization examples
Stochastic programming
Chance constrained optimization
Filter design and equalization
Disciplined convex programming and CVX
Two lectures from EE364b:
methods for convex-cardinality problems
methods for convex-cardinality problems, part II
n Def: Convex combination of x1,. . ., xk: any point x of the form x = θ1x1
+ θ2x2 + ··· + θkxk with θ1 + ··· + θk =1, θj ≥ 0.
n Def: Convex hull (conv S): a set of all convex combinations of points
in S
n Def: Convex cone: a set that contains all conic combinations of points
in the set
Convex function
n Def: A function f(x): Ω → R is convex if and only if every convex combination x = θx1 + (1−θ)x2, for all x1, x2 ∈ Ω and θ with 0 ≤ θ ≤ 1, satisfies f(θx1 + (1−θ)x2) ≤ θf(x1) + (1−θ)f(x2).
n Second-order condition: the Hessian $\nabla^2 f(x)$, with entries $(\nabla^2 f(x))_{ij}$, $1 \le i, j \le N$, is positive semidefinite, i.e., $\nabla^2 f(x) \succeq 0$.
Data Compression
(ECE 5546-41)
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; bjeon@skku.edu
Greedy Algorithms
[Figure: can a greedy search starting from point A reach the global maximum?]
    $(P_0): \;\hat{x} = \arg\min_z \|z\|_0 \text{ subject to } y = \Phi z$
n Solving $(P_0)$ directly requires a combinatorial search over $\binom{N}{K}$ possible supports.
n Non-zero components of x over the support: once the support is known, they are found by a least-squares fit restricted to it.
n For a single column j, the minimum residual error is
    $\mathrm{err}(j)_{\min} = \|y\|_2^2 - \frac{(\phi_j^T y)^2}{\|\phi_j\|_2^2} \;\;\Longleftrightarrow\;\; \text{find } j \text{ s.t. } \frac{(\phi_j^T y)^2}{\|\phi_j\|_2^2} \text{ is maximum!}$
n The solution is therefore to choose the column that maximizes $(\phi_j^T y)^2 / \|\phi_j\|_2^2$.
n The theorem tells us that normalizing the columns does not change the solution.
n From now on, assume pre-normalized columns without loss of generality.
n Weak Matching Pursuit
n Rf: "At this point, it is not fully clear what role greedy pursuit algorithms will ultimately play in practice." (From "Computational Methods for Sparse…," 2010 IEEE Proceedings, by J.A. Tropp and J. Wright)
(b). Update support: find the column i of $\Phi$ that is most correlated with the residual:
    $i = \arg\min_{1 \le j \le N} \mathrm{err}(j) = \arg\max_{1 \le j \le N} |\langle r^{(k-1)}, \phi_j \rangle|$, and update the support $\Omega^{(k)} = \Omega^{(k-1)} \cup \{i\}$
(c). Update provisional solution: $x^{(k)} = x^{(k-1)}$ with updated entry $x^{(k)}(i) = x^{(k-1)}(i) + \lambda_i^*$
(d). Update residual: $r^{(k)} = y - \Phi x^{(k)} = r^{(k-1)} - \lambda_i^* \phi_i$
(e). Stopping rule: if $\|r^{(k)}\|_2 < \epsilon_0$, stop. Otherwise, apply another iteration.
n Unlike MP, OMP never re-selects an element already chosen and the
residual at any iteration is always orthogonal to all currently selected
elements.
(b). Update support: find a column i of $\Phi$ that is not already in $\Omega^{(k-1)}$ such that
    $i = \arg\min_{1 \le j \le N,\, j \notin \Omega^{(k-1)}} \mathrm{err}(j) = \arg\max_{1 \le j \le N,\, j \notin \Omega^{(k-1)}} |\langle r^{(k-1)}, \phi_j \rangle|$, and update the support $\Omega^{(k)} = \Omega^{(k-1)} \cup \{i\}$
(c). Update provisional solution: compute $x^{(k)}$ which minimizes $\|y - \Phi x\|_2^2$ subject to support $\Omega^{(k)}$:
    $x^{(k)} = \arg\min_{z:\, \mathrm{supp}(z) = \Omega^{(k)}} \|y - \Phi z\|_2^2$
n Restricting to the columns indexed by $\Omega^{(k)}$, this is the least-squares problem $\min \|y - \Phi_{\Omega^{(k)}} x_{\Omega^{(k)}}\|_2^2$, whose normal equations give
    $\Phi_{\Omega^{(k)}}^T \left( y - \Phi_{\Omega^{(k)}} x_{\Omega^{(k)}} \right) = 0 \;\Rightarrow\; x_{\Omega^{(k)}} = \left( \Phi_{\Omega^{(k)}}^T \Phi_{\Omega^{(k)}} \right)^{-1} \Phi_{\Omega^{(k)}}^T y = \Phi_{\Omega^{(k)}}^{\dagger} y$
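A minimal OMP sketch implementing the steps above in NumPy (sizes, seed, and stopping tolerance are illustrative assumptions):

```python
import numpy as np

def omp(Phi, y, K, tol=1e-10):
    """Orthogonal Matching Pursuit: greedily build a support of size <= K.

    Columns of Phi are assumed pre-normalized, as on the slides above.
    """
    N = Phi.shape[1]
    r = y.copy()
    support = []
    x = np.zeros(N)
    for _ in range(K):
        correlations = np.abs(Phi.T @ r)
        correlations[support] = 0.0                # never re-select a chosen column
        i = int(np.argmax(correlations))
        support.append(i)
        # Least-squares solve restricted to the current support (the normal equations).
        x_s, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
        x = np.zeros(N)
        x[support] = x_s
        r = y - Phi @ x                            # residual orthogonal to selected columns
        if np.linalg.norm(r) < tol:
            break
    return x

rng = np.random.default_rng(9)
M, N, K = 40, 120, 4
Phi = rng.standard_normal((M, N))
Phi /= np.linalg.norm(Phi, axis=0)                 # normalize columns
x_true = np.zeros(N); x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
y = Phi @ x_true

x_hat = omp(Phi, y, K)
print("recovery error:", np.linalg.norm(x_hat - x_true))
```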
n Thus, we can compute $\|r^{(k-1)}\|_2^2$ once at the beginning of the sweep stage, and as we search for the index i that gives the smallest err(i), we choose the first one that gives
    $\frac{(\phi_i^T r^{(k-1)})^2}{\|\phi_i\|_2^2} \;\ge\; t^2 \|r^{(k-1)}\|_2^2 \;\ge\; t^2 \max_{1 \le j \le N} \frac{(\phi_j^T r^{(k-1)})^2}{\|\phi_j\|_2^2}, \quad \text{for a prechosen } t \in (0,1)$
n Drawbacks:
n No reconstruction guarantee
n Moderate memory requirements compared to OMP, since OMP's orthogonalization requires maintaining a Cholesky factorization of the selected dictionary elements.
Thresholding Algorithm
n Idea: choose the K largest inner products as the desired support.
n This implies that the search for the K elements of the support amounts to a simple sorting of the entries of the vector $|\Phi^T y|$.
n If the number K is not known a priori, it can be increased until the error $\|y - \Phi x^{(k)}\|_2^2$ reaches a pre-specified value $\epsilon_0$.
· Update support: find a set of indices Ω of cardinality K that contains the smallest errors, i.e.,
    $\forall j \in \Omega, \;\mathrm{err}(j) \le \min_{i \notin \Omega} \mathrm{err}(i)$
· Update provisional solution: compute $x^{(k)}$ which minimizes $\|y - \Phi x\|_2^2$ subject to support $\Omega^{(k)}$:
    $x^{(k)} = \arg\min_{z:\, \mathrm{supp}(z) = \Omega^{(k)}} \|y - \Phi z\|_2^2$
· Output: the proposed solution is the x which minimizes $\|y - \Phi x\|_2^2$ subject to support Ω.
n Hard-thresholding update (see the sketch below):
    $x^{(k+1)} = \left[ x^{(k)} + \Phi^T r^{(k)} \right]_s$
  where $[\cdot]_s$ keeps only the s components largest in magnitude.
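A minimal sketch of this iterative hard-thresholding update (sizes, seed, and the 1/L step size are illustrative assumptions; a step size, or equivalently a scaled $\Phi$, is commonly used so the gradient step is stable):

```python
import numpy as np

def hard_threshold(v, s):
    """Keep only the s entries of v that are largest in magnitude."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-s:]
    out[idx] = v[idx]
    return out

def iht(Phi, y, s, iters=500):
    """Iterative hard thresholding: x <- [x + step * Phi^T r]_s with r = y - Phi x."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # step size keeps the gradient step stable
    x = np.zeros(Phi.shape[1])
    for _ in range(iters):
        r = y - Phi @ x
        x = hard_threshold(x + step * Phi.T @ r, s)
    return x

rng = np.random.default_rng(10)
M, N, K = 100, 200, 5
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N); x_true[rng.choice(N, K, replace=False)] = rng.standard_normal(K)
y = Phi @ x_true

x_hat = iht(Phi, y, K)
print("recovery error:", np.linalg.norm(x_hat - x_true))
```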
CoSaMP (1)
n Compressive Sampling Matching Pursuit (CoSaMP)
CoSaMP (2)
Inputs: sensing matrix $\Phi$, measurement vector y, target sparsity s, tuning parameter α
Outputs: s-sparse coefficient vector x
· Initialize: set k = 0, initial vector $x^{(0)} = 0$, and residual $r^{(0)} = y$.
· Main Iteration: increment k by 1 and perform the following (a sketch of the standard algorithm is given below):
(a). Identify: find the αs columns of $\Phi$ that are most strongly correlated with the residual:
    $\Omega \in \arg\max_{|T| \le \alpha s} \sum_{j \in T} |\langle r^{(k-1)}, \phi_j \rangle|$
(b). Merge: put the old and new columns into one set $T = \mathrm{supp}(x^{(k-1)}) \cup \Omega$
(c). Estimate: find the best coefficients for approximating the residual with these columns:
    $z^* = \arg\min_z \|r^{(k-1)} - \Phi_T z\|_2^2$
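A compact sketch of textbook CoSaMP (following the usual Needell-Tropp formulation, which differs in minor details from the variant above: it estimates against y on the merged support and then prunes to the s largest entries). Sizes, seed, and iteration count are illustrative assumptions.

```python
import numpy as np

def cosamp(Phi, y, s, iters=30):
    """Textbook CoSaMP sketch: identify 2s candidates, merge, least squares, prune to s."""
    N = Phi.shape[1]
    x = np.zeros(N)
    r = y.copy()
    for _ in range(iters):
        # (a) Identify: 2s columns most correlated with the residual.
        omega = np.argsort(np.abs(Phi.T @ r))[-2 * s:]
        # (b) Merge with the support of the current estimate.
        T = np.union1d(omega, np.flatnonzero(x))
        # (c) Estimate: least squares on the merged support.
        z = np.zeros(N)
        z[T], *_ = np.linalg.lstsq(Phi[:, T], y, rcond=None)
        # (d) Prune: keep the s largest entries, then update the residual.
        x = np.zeros(N)
        keep = np.argsort(np.abs(z))[-s:]
        x[keep] = z[keep]
        r = y - Phi @ x
        if np.linalg.norm(r) < 1e-10:
            break
    return x

rng = np.random.default_rng(11)
M, N, s = 60, 200, 5
Phi = rng.standard_normal((M, N)) / np.sqrt(M)
x_true = np.zeros(N); x_true[rng.choice(N, s, replace=False)] = rng.standard_normal(s)
y = Phi @ x_true

print("recovery error:", np.linalg.norm(cosamp(Phi, y, s) - x_true))
```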
CoSaMP (3)
n Both CoSaMP and SP offer near-optimal performance guarantees under conditions on the RIP.
n CoSaMP was the first greedy method shown to possess performance guarantees similar to those of L1-based methods.
n Complexity of CoSaMP ~ O(MN)
n Note its independence from the sparsity of the original signal. It also represents an improvement over both greedy algorithms and convex methods.
n CoSaMP is faster but usually less effective than algorithms based on convex programming.
n It is also faster and more effective than OMP for compressive sampling problems, except perhaps in the ultra-sparse regime where the number of non-zeros in the representation is very small.
n Drawbacks of CoSaMP
n It needs prior knowledge of the sparsity K of the target signal.
n An incorrect choice of input sparsity may lead to a worse guarantee than the actual error incurred by a weaker algorithm such as OMP.
n The stability bounds accompanying CoSaMP ensure that the error due to an incorrect parameter choice is bounded, but it is not yet known how these bounds translate into practice.
2012 Fall
Data Compression
(ECE 5546-41)
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; bjeon@skku.edu
n Bayes' rule:
    $p(A \mid B) = \frac{p(B \mid A)\, p(A)}{p(B)}, \quad \text{assuming } p(B) \ne 0$
    $p(A_i \mid B) = \frac{p(B \mid A_i)\, p(A_i)}{\sum_j p(B \mid A_j)\, p(A_j)}$
FROM http://en.wikipedia.org/wiki/Prior_probability
Rf: Prior Probability (2)
n Parameters of prior distributions are called hyperparameters, to
distinguish them from parameters of the model of the underlying data.
FROM http://en.wikipedia.org/wiki/Prior_probability
n It is clear that different choices of the prior distribution p(x) may make
the integral more or less difficult to calculate, and the product
p(y|x)p(x) may take one algebraic form or another.
n For certain choices of the prior, the posterior has the same algebraic
form as the prior (generally with different parameter values). Such a
choice is a conjugate prior.
http://en.wikipedia.org/wiki/Conjugate_prior
Rf: Conjugate Prior (3)
n Example: Conjugate priors
http://en.wikipedia.org/wiki/Conjugate_prior
n Gamma PDF:
    $f_X(x) = \frac{c^b}{\Gamma(b)}\, x^{b-1} e^{-cx}\, u(x), \quad b, c > 0$
  where $\Gamma(b)$ is the gamma function given by the integral
    $\Gamma(b) = \int_0^{\infty} y^{b-1} e^{-y}\, dy$
n Chi-square PDF (n degrees of freedom):
    $f_X(x) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, x^{(n/2)-1} e^{-x/2}\, u(x), \quad n \in \mathbb{Z}$
n Erlang PDF:
    $f_X(x) = \frac{c^n}{(n-1)!}\, x^{n-1} e^{-cx}\, u(x), \quad n \in \mathbb{Z}$
n Binomial PMF:
    $P(X = k) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, 1, 2, \ldots, n$
    $y = \Phi(x_S + x_e) = \Phi x_S + n_e, \quad \text{where } n_e = \Phi x_e$
    $\hat{x} = \arg\min_x \left\{ \|y - \Phi x\|_2^2 + \beta \|x\|_1 \right\}$
n However, direct evaluation of the posterior p(x|y) using a Laplace prior is
not tractable since the Laplace prior is not conjugate to the Gaussian
likelihood function, hence the associated Bayesian inference may not be
performed in closed form.
    $p(x \mid a, b) = \int p(x \mid \alpha)\, p(\alpha \mid a, b)\, d\alpha = \prod_{i=1}^{N} \int_0^{\infty} \mathcal{N}(x_i \mid 0, \alpha_i^{-1})\, \Gamma(\alpha_i \mid a, b)\, d\alpha_i$
n The integral can be evaluated analytically, and it corresponds to the Student-t distribution.
n With an appropriate choice of a and b, the Student-t distribution is strongly peaked about $x_i = 0$ → a sparseness prior!
n Similarly, a Gamma prior $\Gamma(\alpha_0 \mid c, d)$ is introduced on the inverse of the noise variance, $\alpha_0 = 1/\sigma^2$.
n Also,
    $\alpha_0^{\mathrm{new}} = \frac{K - \sum_i \gamma_i}{\|y - \Phi\mu\|_2^2}$
< A factor graph depicting the relationship between the variables involved in CS recovery using BP >
(Black: variable node, white: constraint node)
Data Compression
(ECE 5546-41)
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; bjeon@skku.edu
CS Applications
n Linear regression and model selection
n Sparse error correction
n Group testing and data stream algorithms
n Compressive medical imaging
n Analog-to-information conversion
n Single pixel camera
n Hyperspectral Imaging
n Compressive processing of manifold-modeled data
n Inference using compressive measurements
n Compressive sensor networks
n Genomic Sensing
[Figure: linear regression; y is the output variable]
http://en.wikipedia.org/wiki/Linear_regression_model
n In matrix form: $y = X\beta + \epsilon$
http://en.wikipedia.org/wiki/Linear_regression_model
Rf: Linear Regression (3)
n Solution by ordinary least squares (OLS): the simplest and thus most common estimator. It minimizes the sum of squared residuals, which leads to the closed form:
    $\hat{\beta} = \left( X^T X \right)^{-1} X^T y = \left( \frac{1}{M}\sum_i X_i X_i^T \right)^{-1} \left( \frac{1}{M}\sum_i X_i y_i \right)$
http://en.wikipedia.org/wiki/Linear_regression_model
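A minimal sketch of the closed-form OLS estimate (the design matrix, true coefficients, and noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(12)
M, p = 100, 3                              # M observations, p regressors (incl. intercept)
X = np.column_stack([np.ones(M), rng.standard_normal((M, p - 1))])
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.1 * rng.standard_normal(M)

# Closed-form OLS estimate: beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                            # close to beta_true
```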
n In the data stream model, some or all of the input data that are to be
operated on are not available for random access from disk or memory,
but rather arrive as one or more continuous data streams.
n Streams can be denoted as an ordered sequence of points (or "updates")
that must be accessed in order and can be read only once or a small
number of times.
http://en.wikipedia.org/wiki/Streaming_algorithm
Rf: Data Stream
n Data stream means various sequences of information:
n In telecommunications and computing: a sequence of digitally encoded
signals (ex: packets of data)
n In electronics and computer architecture: a data flow determines for
which time which data item is scheduled to enter or leave which port of a
systolic array, a Reconfigurable Data Path Array or similar pipe network,
or other processing unit or block (cf. main article)
Data Compression
(ECE 5546-41)
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; bjeon@skku.edu
CS Applications
n Linear regression and model selection
n Sparse error correction
n Group testing and data stream algorithms
n Compressive medical imaging
n Analog-to-information conversion
n Single pixel camera
n Hyperspectral Imaging
n Compressive processing of manifold-modeled data
n Inference using compressive measurements
n Compressive sensor networks
n Genomic Sensing
Most materials here are from "Compressive Sampling for Analog Time Signals" by Richard Baraniuk (IMA 2007 talk),
http://www.ima.umn.edu/2006-2007/ND6.4-15.07/activities/Baraniuk-Richard/baraniuk-IMA-A2I-june07.pdf
Sensing by Sampling
n Foundation of analog-to-digital conversion (ADC):
n Shannon/Nyquist sampling theorem: “periodically sample at 2x signal
bandwidth” and “perfect reconstruction if the signal is band-limited”
n STFT:
    $S(\tau, \omega) = \sum_{t=-\infty}^{\infty} x(t)\, W(t - \tau)\, e^{-j\omega t}$
n Spectrogram: $|S(\tau, \omega)|^2$
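A minimal sketch computing a spectrogram of a frequency-hopping signal with SciPy's windowed STFT (the sampling rate, hop frequencies, and window length are illustrative assumptions):

```python
import numpy as np
from scipy.signal import stft

fs = 1000                                    # sampling rate in Hz (illustrative)
t = np.arange(0, 1.0, 1 / fs)
x = np.where(t < 0.5,                        # frequency hopper: 50 Hz, then 120 Hz
             np.sin(2 * np.pi * 50 * t),
             np.sin(2 * np.pi * 120 * t))

f, tau, S = stft(x, fs=fs, window='hann', nperseg=128)
spectrogram = np.abs(S) ** 2                 # |S(tau, omega)|^2
print(spectrogram.shape)                     # (frequencies, time frames)
```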
n M measurements, with $M \approx K \ll N$ (K: information rate)
Streaming Measurements
n Streaming applications cannot fit the whole signal into a processing buffer at one time
1. Random Sampling
2. Random Filtering
3. Random Demodulator
n Process measurements
n use samples and time points in iterative algorithm (not the FFT)
n Extract information
n reconstruct most energetic portions of Fourier spectrum (not entire
spectrum)
Sparsogram (2)
n Example: frequency hopper
n Random sampling A2I at 13× sub-Nyquist rate
n ±1 chipping signal $p_c[n]$
n Therefore, $y[1] = \sum_{n} p_c[n]\, x[n]$
    $\Phi = \begin{bmatrix} -1 & +1 & +1 \\ -1 & +1 & -1 \\ +1 & +1 & -1 \\ +1 & -1 & -1 \end{bmatrix}$
n Experimental results:
n Three examples
n Random sampling
n Random Filtering
n Random demodulation
n Open Issues
n New HW design
n New transforms that sparsify natural and man-made signals
n Analysis and optimization under real-world non-idealities such as jitter,
measurement noise, interference, etc.
n Reconstruction/processing algorithms for dealing with large N
Data Compression
(ECE 5546-41)
Byeungwoo Jeon
Digital Media Lab, SKKU, Korea
http://media.skku.ac.kr; bjeon@skku.edu
CS Applications
n Linear regression and model selection
n Sparse error correction
n Group testing and data stream algorithms
n Compressive medical imaging
n Analog-to-information conversion
n Single pixel camera
n Hyperspectral Imaging
n Compressive processing of manifold-modeled data
n Inference using compressive measurements
n Compressive sensor networks
n Genomic Sensing
n Hyperspectral data cube: voxels $f(x, y, \lambda)$
n Spectral signature at a pixel: $f(x, y) = \{f(x, y, \lambda)\}_\lambda$
http://en.wikipedia.org/wiki/Fingerprint
Applications: Life Science & Biotech.
n Hyperspectral imaging is an invaluable analytical technique for life
sciences and biotechnology applications whether used as a
traditional high performance spectral imaging instrument or whether
deployed as a multi-channel spectroscopy instrument.
n Hyperspectral imaging instruments can give the researcher access to
accurate, calibrated, and repeatable spectral analysis.
n When utilized as a multi-channel spectrometer, researchers are able
to conduct high-throughput screening experiments where high
spectral resolution, spatial differentiation, and channel separation are
all critical parameters. Optimized for high-throughput screening, the
Hyperspec instruments are fully-capable of processing at very high
speeds based on selected spectral bands or wavelengths of interest.
n Fluorescence
n High Throughput Screening
n Laboratory Research & Development
n Multi-Channel Spectroscopy
n Nanobead & Quantum Dot Detection
    $\Phi = \begin{bmatrix} \Phi_{x,y} & 0 & \cdots & 0 \\ 0 & \Phi_{x,y} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Phi_{x,y} \end{bmatrix}, \qquad \Psi = \begin{bmatrix} \Psi_{x,y} & 0 & \cdots & 0 \\ 0 & \Psi_{x,y} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \Psi_{x,y} \end{bmatrix}$