Professional Documents
Culture Documents
TSP 2016 2
TSP 2016 2
Abstract—Tensor Compressive Sensing (TCS) is a multidi- Successful CS reconstruction has been achieved by char-
mensional framework of Compressive Sensing (CS), and it is acterizing a number of properties of the sampling operator,
advantageous in terms of reducing the amount of storage, eas- e.g., the Restricted Isometry Property (RIP) [1], the mutual
ing hardware implementations and preserving multidimensional coherence [6] and the null space property [2]. These proper-
structures of signals in comparison to a conventional CS system. ties have been used to provide sufficient conditions on the
In a TCS system, instead of using a random sensing matrix and
a predefined dictionary, the average-case performance can be
equivalent sensing matrix (i.e., the product of the sensing
further improved by employing an optimized multidimensional matrix and the sparsifying basis) and to quantify the worst-
sensing matrix and a learned multilinear sparsifying dictionary. case reconstruction performance [2], [6], [7]. Random matrices
In this paper, we propose an approach that jointly optimizes such as Gaussian or Bernoulli matrices have been shown to
the sensing matrix and dictionary for a TCS system. For the fulfill these conditions for any given sparsifying basis, i.e., a
sensing matrix design in TCS, an extended separable approach dictionary, and hence are widely used as the sensing matrix in
with a closed form solution and a novel iterative non-separable CS applications.
method are proposed when the multilinear dictionary is fixed. In view of the fact that the mainstream view in the signal
In addition, a multidimensional dictionary learning method that processing community considers the average-case performance
takes advantages of the multidimensional structure is derived, rather than the worst-case performance, it is shown in [8]–[12]
and the influence of sensing matrices is taken into account in the
that the average-case reconstruction performance can be en-
learning process. A joint optimization is achieved via alternately
iterating the optimization of the sensing matrix and dictionary. hanced by optimizing the sensing matrix such that the sensing
Numerical experiments using both synthetic data and real images matrix is more incoherent with the given sparsifying basis than
demonstrate the superiority of the proposed approaches. a random sensing matrix. Also, instead of using a fixed signal-
sparsifying basis, e.g., a Discrete Wavelet Transform (DWT),
Keywords—Multidimensional system, compressive sensing, tensor one can further enhance CS performance by employing a
compressive sensing, dictionary learning, sensing matrix optimiza-
tion.
basis to abstract the basic atoms that compose the signal
ensemble. The process of learning such a basis is referred to as
"sparsifying dictionary learning” and it has been widely inves-
I. I NTRODUCTION tigated in the literature [13]–[17]. Considering that the sensing
The traditional signal acquisition-and-compression paradig- matrix and the dictionary jointly affect the performance of
m removes the signal redundancy and preserves the essential a CS system, applying only traditional dictionary learning in
contents of signals to achieve savings on storage and trans- CS applications without considering the sensing matrix may
mission, where the minimum sampling ratio is restricted by lead to a non-optimal equivalent sensing matrix. In [18]–[20],
the Shannon-Nyquist Theorem at the signal sampling stage. joint optimization of the sensing matrix and the dictionary are
The wasteful process of sensing-then-compressing is replaced investigated, demonstrating that it is beneficial to consider this
by directly acquiring the compressed version of signals in joint approach in practical CS systems.
Compressive Sensing (CS) [1]–[3], a new sampling paradigm Multidimensional signals are mapped in vector format in the
that leverages the fact that most signals have sparse repre- process of sensing and reconstruction of a CS system, because
sentations (i.e., there are only a few non-zero coefficients) the conventional CS framework considers only vectorized
in some suitable basis. Successful reconstruction of such signals. At the sensing node, such a vectorization requires the
signals is guaranteed for a sufficient number of randomly taken hardware to be capable of simultaneously multiplexing along
samples that are far fewer in number than that required in all data dimensions, which is hard to achieve especially when
the Shannon-Nyquist Theorem. Therefore CS is very attractive one of the dimensions is along a timeline. Secondly, a real-
for applications such as medical imaging and wireless sensor world vectorized signal requires an enormous sensing matrix
networks where data acquisition is expensive [4], [5]. that has as many columns as the number of signal elements.
Xin Ding and Ian J. Wassell are with the Computer Lab, University of
Consequently such an approach imposes large demands on
Cambridge, UK (e-mail: xd225, ijw24@cam.ac.uk). storage and processing power. In addition, the vectorization
Wei Chen is with the State Key Laboratory of Rail Traffic Control and also results in a loss of structure along the various dimensions
Safety, Beijing Jiaotong University, China (e-mail: weich@bjtu.edu.cn). (e.g., rank deficient along one dimension while sparse along
Corresponding author: Wei Chen. another [21]), the presence of which is beneficial for devel-
This work is supported by EPSRC Research Grant EP/K033700/1; the
Natural Science Foundation of China (61671046,61401018, U1334202); the oping efficient reconstruction algorithms (see [21], [22]). For
State Key Laboratory of Rail Traffic Control and Safety (RCS2016ZT014) of these reasons, applying conventional CS to applications that
Beijing Jiaotong University. involve multidimensional signals is challenging.
2
Extending CS to multidimensional signals has attracted • We propose a multidimensional dictionary learning ap-
growing interest over the past few years. Most of the related proach that couples the optimization of the multidimen-
work in the literature focuses on CS for 2D signals (i.e., matri- sional sensing matrix. This approach extends KSVD [14]
ces), e.g., matrix completion [23], [24], and the reconstruction and coupled-KSVD [18] to take full advantage of the
of sparse and low rank matrices [25]–[27]. multidimensional structure in tensors with a reduced
In [28], Kronecker product matrices are proposed for use in number of iterations required for the update of dictionary
CS systems, which makes it possible to partition the sensing atoms.
process along signal dimensions and paves the way to develop The proposed approaches are demonstrated to enhance the
CS for tensors, i.e., signals with two or more dimensions. In performance of existing TCS systems via the use of extensive
[29], separable imaging has been studied, which demonstrated simulations using both synthetic data and real images. For real-
that the Tensor CS (TCS) model can ease the optical designs, world applications involving multidimensional signals, such as
data storage and computation complexity for 2D images. The 2D imaging, hyperspectral imaging and video acquisition, the
optical hardware design using the TCS model for images has proposed approaches can be employed to design the optimized
been investigated in [30]. There are many other applications multidimensional sensing schemes and sparsifying basis, or ap-
that can benefit from TCS, e.g., hyperspectral imaging, video proximations thereof, to subsample and sparsify these signals.
acquisition and distributed sensing, as detailed in [31]. TCS Specifically, in hyperspectral imaging for instance, one can
reconstruction has been studied in [32]–[35]. To the best design the structure of Digital Micromirror Device (DMD) to
of our knowledge, there is no prior work concerning the achieve optimized sampling in the spatial dimensions, while
enhancement of TCS via optimizing the sensing matrices at designing the coded aperture for the optimized sensing across
various dimensions in a tensor. In addition, although dictionary the spectra [39], and both spatial and spectral dictionaries can
learning techniques have been considered for tensors [36]– be learnt using the proposed method.
[38], it is still not clear how to conduct tensor dictionary The remainder of this paper is organized as follows. Section
learning to incorporate the influence of sensing matrices in II formulates CS and TCS, and introduces the related theory.
TCS. Section III reviews the sensing matrix design approaches for
Following the train of thought in CS, in this paper, we aim CS and presents the proposed methods for TCS sensing matrix
to improve the average-case performance of TCS by optimiz- design. In Section IV, we review the related dictionary learning
ing both the sensing matrix and the dictionary. Furthermore, techniques, followed by the elaboration of the proposed mul-
considering that the sensing matrix and the dictionary are also tidimensional dictionary learning approach and we present the
coupled in TCS systems, we will investigate their joint design. joint optimization algorithm. Experimental results are given in
Unlike the optimization for a conventional CS system where Section V and Section VI concludes the paper.
a single sensing matrix and a sparsifying basis for vectorized
signals are obtained, we estimate multiple sensing matrices and
sparsifying bases, each acting along the various dimensions A. Multilinear Algebra and Notation
of the tensor, thereby maintaining the advantages of TCS.
The Kronecker structure that is involved in a TCS system Boldface lower-case letters, boldface upper-case letters and
makes the optimization problem more challenging than the non-boldface letters denote vectors, matrices and scalars, re-
conventional design problem in a CS system, and it also leads spectively. A mode-n tensor is an n-dimensional array X ∈
to opportunities for us to develop novel design approaches. RN1 ×...×Nn . The mode-i vectors of a tensor are determined
The contributions of this work are as follows: by fixing every index except the one in the mode i and the
• We are the first to consider the optimization of a slices of a tensor are its two dimensional sections determined
multidimensional sensing matrix and dictionary for a by fixing all but two indices. By arranging all the mode-i
TCS system, and we design a joint optimization of the vectors as columns of a matrix, the mode-i unfolding matrix
two, which also includes particular cases of optimizing X(i) ∈ RNi ×N1 ...Ni−1 Ni+1 ...Nn is obtained. The operator
the sensing matrix for a given multilinear dictionary f oldi (·) is defined as the opposite operation of unfolding
and learning the dictionary for a given multidimensional such that: f oldi (X(i) ) = X, where the dimensions of the
sensing matrix. tensor are assumed to be known. The mode-k tensor by matrix
• We propose a separable approach for sensing matrix de- product is defined as: Z = X ×k A, where A ∈ RJ×Nk ,
sign by extending the existing work for conventional CS. Z ∈ RN1 ×...×Nk−1 ×J×Nk+1 ×...×Nn and it is calculated by:
We prove that the resulting optimization is separable, Z = f oldi (Z(i) ) = f oldi (AX(i) ), where Z(i) is the mode
i.e., the sensing matrix along each dimension can be i unfolding of Z and Z(i) = AX(i) . The matrix Kronecker
independently optimized, and the approach has a closed product and vector outer product are denoted by A ⊗ B and
form solution. a ◦ b, respectively. The lp norm of a vector is defined as:
∑n 1
• We put forth a non-separable method for sensing matrix ||x||p = ( i=1 |xi |p ) p . For vectors, matrices and tensors,
design using a combination of the state-of-art measures the l0 norm is given by the number of nonzero entries. IN
for sensing matrix optimization. This approach leads to denotes the N × N identity matrix. The operator (·)−1 , (·)T
the best reconstruction performance in our comparison, and tr(·) represent matrix inverse, matrix transpose and the
but it is iterative and hence needs more computing power trace of a matrix, respectively. The number of elements for a
to implement. vector, matrix or tensor is denoted by len(·).
3
II. C OMPRESSIVE S ENSING (CS) AND T ENSOR a particular optical system to implement the sensing procedure,
C OMPRESSIVE S ENSING (TCS) e.g., a 2D separable imaging scheme has been developed in
A. Sensing Model [30]. Otherwise, if the data along some of the dimensions
are not available at the same time, e.g., as for the time-
Consider a multidimensional signal X ∈ RN1 ×...×Nn . Con- dependent signals, the sensing procedure can still be achieved
ventional CS takes measurements from its vectorized version progressively (as detailed in [28]) due to the partitioned sensing
via: nature of TCS.
y = Φx + e, (1) It is also useful to mention that the TCS model in (6) is
∏
where x ∈ R (N = i Ni ) denotes the vectorized signal,
N equivalent to:
Φ ∈ RM ×N (M < N ) is the sensing matrix, y ∈ RM y = (An ⊗ An−1 ⊗ ... ⊗ A1 )s + e, (7)
represents the measurement vector and e ∈ RM is a noise
term. The vectorized signal is assumed to be sparse in some as derived in [34]. By denoting A = An ⊗ An−1 ⊗ ... ⊗ A1 ,
sparsifying basis Ψ ∈ RN ×N̂ (N ≤ N̂ ), i.e., it becomes a conventional CS model akin to (3), except that
the sensing matrix in (7) has a multilinear structure.
x = Ψs, (2)
where s ∈ RN̂ is the sparse representation of x and it has only B. Signal Reconstruction
K (K ≪ N̂ ) non-zero coefficients. Thus the sensing model
In conventional CS, the problem of reconstructing s from
can be rewritten as:
the measurement vector y captured using (3) is modeled as a
y = ΦΨs + e = As + e, (3) l0 minimization problem as follows:
where A = ΦΨ ∈ R M ×N̂
is the equivalent sensing matrix. min ||s||0 , s.t. ||y − As|| ≤ ε, (8)
s
Even though CS has been successfully applied to practical
sensing systems [40]–[42], the sensing model has a few where ε is a tolerance parameter. Many algorithms have
drawbacks when it comes to tensor signals [28]. First of all, the been developed to approximate the solution of this problem,
multidimensional structure presented in the original signal X is including Basis Pursuit (BP) [1]–[3], [43], i.e., conducting
omitted due to the vectorization, which loses information that convex optimization by relaxing the l0 norm in (8) as the l1
can lead to efficient reconstruction algorithms [21]. Besides, as norm, and greedy algorithms such as Orthogonal Matching
stated by (1), the sensing system is required to operate along Pursuit (OMP) [44] and Iterative Hard Thresholding (IHT)
all dimensions of the signal simultaneously, which is difficult [45]. The reconstruction performance of the l1 minimization
to achieve in practice [29], [30]. Furthermore, the size of Φ approach has been studied in [7], [43], where the well known
associated with the vectorized signal becomes too large to be Restricted Isometry Property (RIP) was introduced to provide
practical for applications involving multidimensional signals. a sufficient condition for successful signal recovery.
TCS tackles these problems by utilizing separable sensing Definition 1: A matrix A satisfies the RIP of order K with
operators along tensor modes and its sensing model is: Restricted Isometry Constant (RIC) δ K being the smallest
number such that
Y = X ×1 Φ1 ×2 Φ2 ... ×n Φn + E, (4)
(1 − δK )||s||22 ≤ ||As||22 ≤ (1 + δK )||s||22 (9)
M1 ×...Mn
where Y ∈ R represents the measurement, E ∈
RM1 ×...Mn denotes the noise term, Φi ∈ RMi ×Ni (i = holds for all s with ||s||0 ≤ K. √
1, ..., n) are sensing matrices and Mi < Ni . The multidimen- Theorem 1: Assume that δ2K < 2 − 1 and ||e||2 ≤ ε.
sional signal is assumed to be sparse in a separable sparsifying Then the solution ŝ to (8) obeys
basis Ψi ∈ RNi ×N̂i (i = 1, ..., n), i.e., ||ŝ − s||2 ≤ C0 K −1/2 ||s − sK ||1 + C1 ε (10)
X = S ×1 Ψ1 ×2 Ψ2 ... ×n Ψn , (5) √ √
where C0 = 2+(2 √
2−2)δ2K
1−( 2+1)δ2K
, C 1 = 4 √1+δ2K
1−( 2+1)δ2K
, δ2K is the
1 ×...N̂n
where S ∈ RN̂∏ is the sparse representation that has RIC of matrix A, sK is an approximation of s with all but the
only K (K ≪ i N̂i ) non-zero coefficients. The equivalent K largest entries set to zero.
sensing model can then be written as: The previous theorem states that for the noiseless case, any
Y = S ×1 A1 ×2 A2 ... ×n An + E, (6) sparse signal with fewer than K non-zero coefficients can be
exactly recovered
√ if the RIC of the equivalent sensing matrix
where Ai = Φi Ψi (i = 1, ..., n) are the equivalent sensing satisfies δ2K < 2 − 1; while for the noisy case and the not
matrices. exactly sparse case, the reconstructed signal is still a good
Using the TCS sensing model in (4), the sensing procedure approximation of the original signal under the same condition.
in (1) is partitioned into a few processes having smaller sensing The theoretical guarantees of successful reconstruction for the
matrices Φi ∈ RMi ×Ni (i = 1, ..., n) and yet it maintains the greedy approaches have also been investigated in [44], [45].
multidimensional structure of the original signal X. Regarding The RIP essentially measures the quality of the equivalent
the hardware implementation of a TCS system, if the data sensing matrix A, which closely relates to the design of Φ and
along all dimensions exists at the same time, one may design Ψ. However, since the RIP condition is not computationally
4
tractable to verify for a given sensing matrix, another mea- A. Sensing Matrix Design for CS
sure is often used for CS projection design, i.e., the mutual We observe that the sufficient conditions on the RIC or
coherence of A [6] and it is defined by: the mutual coherence for successful CS reconstruction, as
reviewed in Section II-B, only describe the worst case bound,
µ(A) = max |aTi aj |, (11)
1≤i, j≤N̂ , i̸=j which means that the average recovery performance is not
reflected. The CS average-case performance has been analyzed
where ai denotes the ith column of A, and it is assumed to in [46]–[50], that considers a distribution over the signals. In
be normalized such that aTi ai = 1. It has been shown that the fact, the most challenging part of CS sensing matrix design lies
reconstruction error of the l1 minimization problem is bounded in deriving a measure that can directly reveal the expected-case
if µ(A) < 1/(4K − 1). reconstruction accuracy when the signals are unknown.
We can see that the mutual coherence reveals the degree In [8], Elad et al. proposed the notion of averaged mutual
of orthogonality between columns of A. A smaller value of coherence, based on which an iterative algorithm is derived for
mutual coherence indicates that the equivalent sensing matrix optimal sensing matrix design. This approach aims to minimize
A is closer to an orthogonal matrix, in which case the signal the largest absolute values of the off-diagonal entries in the
reconstruction becomes easier. In addition, we can see that Gram matrix of A, i.e., GA = AT A. It has been shown
the largest off-diagonal entries of the Gram matrix of A, i.e., to outperform a random Gaussian sensing matrix in terms of
GA = AT A, provides the value of the mutual coherence of reconstruction accuracy, but is time-consuming to construct
A. Therefore, based on this concept, optimal projection design and can ruin the worst case guarantees by increasing off-
approaches are derived, e.g., in [8], [9], [18]. diagonal values that are not the largest in the original Gram
When it comes to TCS, the reconstruction approaches for CS matrix.
can still be utilized owing to the relationship in (7). However, In order to make any subset of columns in A as orthogonal
for the algorithms where explicit usage of A is required, e.g, as possible, Sapiro et al. proposed in [18] to make GA as
OMP, the implementation is restricted by the large dimension close as possible to an identity matrix, i.e., ΨT ΦT ΦΨ ≈ IN̂ .
of A. By extending the CS reconstruction approaches to It is then approximated by minimizing ||Λ − ΛΓT ΓΛ||2F ,
utilize tensor-based operations, TCS reconstruction algorithms where Λ comes from the eigen-decomposition of ΨT Ψ, i.e.,
employing only small matrices Ai (i = 1, ..., n) have been ΨT Ψ = VΛVT , and Γ = ΦV. This approach is also
developed in [29], [30], [34], [35]. These methods maintain iterative, but outperforms Elad’s method. Considering the fact
the theoretical guarantees of conventional CS when A obeys that A has minimum coherence when the magnitudes of all
the condition on the RIC or the mutual coherence, but reduce the off-diagonal entries of GA are equal, Xu et al. proposed
the computational complexity and relax the storage memory an Equiangular Tight Frame (ETF) 1 based method in [9].
requirement. The problem is modeled as: minGt ∈H,Φ ||ΨT ΦT ΦΨ−Gt ||2F ,
Even so, the conditions on A are not intuitive for a prac- where Gt is the target Gram matrix and H is the set of the
tical TCS system, which explicitly utilizes multiple separable ETF Gram matrices. The problem is solved following an alter-
sensing matrices Ai (i = 1, ..., n) instead of a single matrix nating minimization procedure. Once the target Gram matrix
A. Fortunately, the authors of [28] have derived the follow- is determined, the optimized projection matrix is constructed
ing relationships to clarify the corresponding conditions on using a QR factorization with eigenvalue decomposition. Im-
Ai (i = 1, ..., n). proved performance has been observed for the obtained sensing
Theorem 2: Let Ai (i = 1, ..., n) be matrices with RICs matrix.
δK (A1 ), ..., δK (An ), respectively, and their mutual coherence More recently, based on the same idea as Sapiro, the
are µ(A1 ), ..., µ(An ). Then for the matrix A = An ⊗An−1 ⊗ problem of
... ⊗ A1 , we have min ||IN̂ − ΨT ΦT ΦΨ||2F (14)
Φ
µ(A) = max[µ(A1 ), ..., µ(An )], (12) has been considered and an analytical solution has been derived
∏
n
in [11]. Meanwhile, in [10], [51], it has been shown that
δK (A) ≤ (1 + δK (Ai )) − 1. (13) in order to achieve good expected-case Mean Squared Error
i=1 (MSE) performance, the equivalent sensing matrix ought to be
close to a Parseval tight frame, thus leading to the following
In [28], these relationships are then utilized to derive the design approach:
reconstruction error bounds for a TCS system. min ||Φ||2F , s.t. ΦΨΨT ΦT = IM , (15)
Φ
III. O PTIMIZED M ULTILINEAR P ROJECTIONS FOR TCS where ||Φ||2F is the sensing cost that also affects the recon-
struction accuracy (as verified in [10], [51]). A closed form
In this section, we show how to optimize the multilinear
sensing matrix when the dictionaries Ψi (i = 1, ..., n) for each 1 A matrix X ∈ RN ×N̂ is defined to be an Equiangular Tight Frame
dimension are fixed. We first introduce the related design ap- (ETF)
√ if it is column normalized and for any i ̸= j, |XT (:, i)X(:, j)| =
proaches for CS, then present the proposed methods for TCS, (N̂ − N )/[N (N̂ − 1)]. The Gram matrix of an ETF is called the ETF
including a separable and a non-separable design approach. Gram.
5
solution to this problem was also obtained in [10], [51]. These decomposed as Φ2 ⊗ Φ1 when a certain permutation of Φ has
approaches have further improved the average reconstruction rank 1 [52], which is not the case for most sensing strategies.
performance for a CS system that is able to employ the When the term in (18) is minimized to a non-zero value, the
optimized sensing matrix. solution Φ̂1 , Φ̂2 leads to a sensing matrix Φ̂2 ⊗ Φ̂1 , which
On the other hand, using the model of Xu’s method [9], may not satisfy the condition of the sensing matrix Φ for good
Cleju [12] proposed to take Gt = ΨT Ψ so that the equivalent CS recovery (e.g., the requirement on the mutual coherence),
sensing matrix has similar properties to those of Ψ. Bai et al. thereby ruining the reconstruction guarantees. Secondly, al-
[20] proposed combining the ETF Grams and that proposed though one may include the mutual coherence requirement as a
by Cleju to solve: constraint in the problem, to solve (18), explicit storage of Φ is
necessary, which is restrictive for high dimensional problems.
min (1−β)||ΨT Ψ−ΨT ΦT ΦΨ||2F +β||Gt −ΨT ΦT ΦΨ||2F , In addition, when the number of tensor modes increases, the
Gt ∈H̄,Φ
(16) problem becomes more complex to solve.
where Gt is the target Gram matrix, β is a trade-off parameter Therefore, we aim to optimize Φ1 and Φ2 directly without
and H̄ is relaxed from the set of the ETF Grams to: knowing Φ. Extending (14) and (15), we first propose a
method that is shown to be separable as independent sub-
H̄ = {Gt : Gt = GTt , Gt (k, k) = 1, ∀k, max|Gt (i, j)| ≤ µ}, design-problems. Then a non-separable design approach is
i̸=j
√ (17) presented and a gradient based algorithm is derived.
1) A Separable Design Approach: The proposed separable
where µ = (N̂ − N )/[N (N̂ − 1)]. A closed form solution design approach (Approach I) is as follows:
is then derived analytically to solve this problem [20]. The
method is shown to be more general than other existing min ||IN̂1 N̂2 −(ΨT2 ⊗ΨT1 )(ΦT2 ⊗ΦT1 )(Φ2 ⊗Φ1 )(Ψ2 ⊗Ψ1 )||2F ,
Φ1 ,Φ2
methods in the sense of the requirement on Ψ, and promising (19)
results using this method are demonstrated. and it is an extension of (14) to the case when a multilinear
sensing matrix is employed. The solution of (19) is presented
B. Multidimensional Sensing Matrix Design for TCS in Theorem 3 and Approach I is also summarized in Algorithm
In contrast to the aforementioned methods, we consider 1.
optimization of the sensing matrix for TCS. Compared to the Theorem
[ 3: Assume
] for i = 1, 2, N̄i = rank(Ψi ), Ψi =
ΛΨi 0
design process in conventional CS, the main distinction for UΨi VΨi is an SVD of Ψi and ΛΨi ∈ RN̄i ×N̄i .
T
the TCS is that we would like to optimize multiple separable 0 0
sensing matrices Φi (i = 1, ..., n), rather than a single matrix Let Φ̂i ∈ RMi ×Ni (i = 1, 2) be matrices with rank(Φ̂i ) =
Φ. In this section, we first extend the approaches in (14) and Mi and Mi ≤ N̄i is assumed. Then
(15) to the TCS case, where our main contribution is to prove • the following equation is a solution to (19):
that the problem is separable along each of its dimensions and [ T −1 ]
can be solved using the existing approaches. In addition, we V ΛΨi 0
Φ̂i = U [ IMi 0 ] UTΨi , (20)
propose a new approach for TCS sensing matrices design by 0 0
incorporating the state-of-art metrics for the sensing matrix
where i = 1, 2, U ∈ RMi ×Mi and V ∈ RN̄i ×N̄i are
optimization as in [10], [12], [20]. To simplify our exposition,
arbitrary orthonormal matrices;
we elaborate our methods in the following sections for the case
of n = 2, i.e., the tensor signal becomes a matrix, but note that • the resulting equivalent sensing matrices Âi =
the methods can be straightforwardly extended to an n mode Φ̂i Ψi (i = 1, 2) are Parseval tight frames, i.e.,
tensor case (n > 2). ||ÂTi z||2 = ||z||2 , where z ∈ RN̂i is an arbitrary vector.
As reviewed in Section II-B, the performance of existing • the minimum of (19) is N̂1 N̂2 − M1 M2 ;
TCS reconstruction algorithms relies on the quality of A, • separately solving the sub-problems
where A = A2 ⊗ A1 when n = 2. Therefore, when the
min ||IN̂i − ΨTi ΦTi Φi Ψi ||2F (21)
multilinear dictionary Ψ = Ψ2 ⊗Ψ1 is given, one can optimize Φi
Φ (where Φ = Φ2 ⊗ Φ1 ) using the methods for CS as for i = 1, 2 leads to the same solutions as (20) and the
introduced in Section III-A. resulting objective in (19) has the same minimum, i.e.,
However, when implementing a TCS system, it is still
N̂1 N̂2 − M1 M2 .
necessary to obtain the separable matrices, i.e., Φ1 and Φ2 .
One intuitive solution is to design Φ using the aforementioned Proof: The proof is given in Appendix A.
approaches for CS and then to decompose Φ by solving the Clearly, Approach I is separable, which means that we can
following problem: independently design each Φi according to the corresponding
sparsifying dictionary Ψi in mode i. This observation stays
min ||Φ − Φ2 ⊗ Φ1 ||2F , (18) consistent when we consider the situation in an alternative way.
Φ1 ,Φ2
Applying the method in (14) to acquire the optimal Φ1 and Φ2
which has been studied as a Nearest Kronecker Product (NKP) independently, we are actually trying to make any subset of
problem in [52]. But this is not a feasible solution for TCS columns in A1 and A2 , respectively, as orthogonal as possible.
sensing matrix design. First of all, Φ can only be exactly As a result, the matrix A = A2 ⊗ A1 that is obtained will
6
IV. J OINTLY L EARNING T HE M ULTIDIMENSIONAL When it comes to the tensor case, KHOSVD [37], that
D ICTIONARY AND S ENSING M ATRIX is a tensor-based dictionary learning approach, is obtained
In this section, we first propose a sensing-matrix-coupled by extending KSVD. It models the 2D dictionary learning
method for multidimensional sparsifying dictionary learning. problem as:
Then it is combined with the previously introduced optimiza- min ||X−S×1 Ψ1 ×2 Ψ2 ||2F , s.t., ∀i, ||Si ||0 ≤ K, (34)
tion approach for a multilinear sensing matrix to yield a joint Ψ1 ,Ψ2 ,S
optimization algorithm. In the spirit of the coupled KSVD
method [18], our approach for dictionary learning can be where Ψ1 ∈ RN1 ×N̂1 and Ψ2 ∈ RN2 ×N̂2 are the dictionaries
viewed as a sensing-matrix-coupled version of a tensor KSVD to be learnt, X is the tensor containing the training data, S
algorithm. We start by briefly introducing the coupled KSVD is the sparse representation and Si is its mode-i unfolding.
method. Its learning process follows the same train of thought as with
the conventional KSVD method, except that to eliminate the
A. Coupled KSVD and Related Work largest error in each iteration, a Higher Order SVD (HOSVD)
[53], i.e., SVD for tensors, is employed. First, the sparse
The Coupled KSVD (cKSVD) [18] is a dictionary learning representation is estimated using OMP. Then, an atom of Ψ1
approach for vectorized signals. Let X = [ x1 ... xT ] be and an atom of Ψ2 , that correspond to an element of S are
a N × T matrix containing a training sequence of T signals updated, and N̂1 N̂2 iterations are performed to complete the
x1 , ..., xT . The cKSVD aims to solve the following problem, update of all the atoms. This process is then repeated until
i.e., to learn a dictionary Ψ ∈ RN ×N̂ from X: the algorithm converges. Clearly, the inner iterations involve
min γ||X − ΨS||2F + ||Y − ΦΨS||2F , s.t. ∀i, ||si ||0 ≤ K, duplicated updating of the dictionary atoms because there are
Ψ,S only N̂1 + N̂2 atoms in total. In addition, the effect owing to
(29) the sensing matrix is not considered in this approach.
where S = [ s1 ... sT ] is the sparse representation with
size N̂ × T , γ > 0 is a tuning parameter and Y ∈ RM ×T
contains the measurement vectors taken by the sensing matrix B. The cTKSVD Approach
Φ ∈ RM ×N , i.e., Y = [ y1 ... yT ] and Y = ΦX + E In order to learn multidimensional separable dictionaries
with E ∈ RM ×T representing the noise. Then the problem in for high dimensional signals, and eventually to achieve joint
(29) is reformatted as: optimization of the multidimensional dictionary and sensing
matrix, we will derive a coupled-KSVD algorithm for a
min ||Z − DS||2F , s.t. ∀i, ||si ||0 ≤ K, (30) tensor, i.e., cTKSVD, in this section. The proposed cTKSVD
Ψ,S
[ ]T [ ]T algorithm follows the same train of thought as the cKSVD
where Z = γXT YT , D = γIN ΦT Ψ. The method, however, the specific steps are different due to the
problem can then be solved following the conventional KSVD involvement of extra dimensions and dictionaries. We will first
algorithm [14] and conducting proper normalization. provide the problem formulation for cTKSVD, which will then
Specifically, with an initial arbitrary Ψ, it first recovers S be solved by alternately updating the sparse coefficients and the
using some available algorithms, e.g., OMP. Then the objective atoms of various dictionaries. Note that each of the updating
in (30) is rewritten as: steps is also different from that of cKSVD in terms of either
the employed solver or the elements that are updated. Again
min ||R̃p − dp s̃Tp ||2F , (31) for simplicity we will still describe the main flow for 2-D
Ψ,S
signals, i.e., n = 2.
where p is the index of the current atom we aim to update, s̃Tp Consider a training sequence of 2-D signals X1 , ..., XT , we
∑the row Tof S where the zeros have been removed, Rp = Z−
is obtain a tensor X ∈ RN1 ×N2 ×T by stacking them along the
q̸=p dq sq and R̃p denotes the columns of Rp corresponding third dimension. Denoting the stack of the sparse represen-
to the nonzero entries of sTp . Let R̃p = UR ΛR VRT
be a SVD tations Si ∈ RN̂1 ×N̂2 , (i = 1, ..., T ) by S ∈ RN̂1 ×N̂2 ×T ,
of R̃p , then the highest component of the coupled error R̃p we propose the following optimization problem to learn the
can be eliminated by defining: multidimensional dictionary:
[ ]
ψ̂ p = (γ 2 IN + ΦT Φ)−1 γIN ΦT u1R , (32) min ||Z − S ×1 D1 ×2 D2 ||2F , s.t., ∀i, ||Si ||0 ≤ K, (35)
Ψ1 ,Ψ2 ,S
s̃p = ||ψ̂ p ||2 λ1R vR
1
, (33)
in which
[ ]
where λ1R is the largest singular value of R̃p and u1R , vR 1
γ2X γY2
are the corresponding left and right singular vectors. The Z= , Yi = X ×i Φi + Ei , (36)
γY1 Y
update column p of Ψ is obtained after normalization: ψ̂ p = [ ] [ ]
γIN̂1 γIN̂2
ψ̂ p /||ψ̂ p ||2 . The above process is then iterated to update every D1 =
Φ1
Ψ1 , D2 =
Φ2
Ψ2 , (37)
atom of Ψ.
Clearly the sensing matrix has been taken into account and γ > 0 is a tuning parameter.
during the dictionary learning process, which has been shown The problem in (35) aims to minimize the representation
to be beneficial for CS reconstruction performance [18]. error ||X − S ×1 Ψ1 ×2 Ψ2 ||2F and the overall projection error
8
||Y − S ×1 A1 ×2 A2 ||2F with constraints on the sparsity of R̃p2 and the update steps corresponding to (40) - (43) now
each slice of the tensor. In addition, it also takes into account become:
the projection errors induced by Φ1 and Φ2 individually.
(d̂2 )p2 = vR1
, D1 S̃:,p2 ,: = u1R ◦ (λ1R ω 1R ), (45)
Using an available sparse reconstruction algorithm for the
2 T −1
[ T
] 1
TCS, e.g., Tensor OMP (TOMP) [35], and initial dictionaries (ψ̂ 2 )p2 = (γ IN2 + Φ2 Φ2 ) γIN2 Φ2 vR , (46)
Ψ1 , Ψ2 , the sparse representation S can be estimated first. (ψ̂ 2 )p2 = (ψ̂ 2 )p2 /||(ψ̂ 2 )p2 ||2 , (47)
Then we update the multilinear dictionary alternately. We first
update the atoms of Ψ1 with Ψ2 fixed. The objective in (35) D1 S̃:,p2 ,: = ||(ψ̂ 2 )p2 ||2 u1R ◦ (λ1R ω 1R ), (48)
is rewritten as:
∑ in which S̃:,p2 ,: represents the lateral slice at index p2 and
||Rp1 − (d1 )p1 ◦ (d2 )q2 ◦ s(p1 −1)N̂2 +q2 ||2F , (38) its updated elements can also be calculated using LS. The
q2 dictionary Ψ2 is then updated iteratively. The whole process
∑ ∑ of updating S, Ψ1 , Ψ2 is repeated to obtain the final solution
where Rp1 = Z − q1 ̸=p1 q2 (d1 )q1 ◦ (d2 )q2 ◦ s(q1 −1)N̂2 +q2 ; of (35).
p1 is the index of the atom for the current update and q1 , q2 The uncoupled version of the proposed cTKSVD method
denote the indices of the remaining atoms of Ψ1 and all the (denoted by TKSVD) can be easily obtained by modifying the
atoms of Ψ2 , respectively; d1 , d2 are columns of D1 , D2 ; s is problem in (35) to:
the mode-3 vector of S. Then to satisfy the sparsity constraint
min ||X − S ×1 Ψ1 ×2 Ψ2 ||2F , s.t. ∀i, ||Si ||0 ≤ K, (49)
in (35), we only keep the non-zero entries of s(p1 −1)N̂2 +q2 and Ψ1 ,Ψ2 ,S
the corresponding subset of Rp1 to obtain:
and it can be solved following the same procedures as
∑ described previously for cTKSVD except that the steps of
||R̃p1 − (d1 )p1 ◦ (d2 )q2 ◦ s̃(p1 −1)N̂2 +q2 ||2F . (39) pseudo-inverse and normalization are no longer needed.
q2 Compared to the KHOSVD method [37], the proposed
cTKSVD for multidimensional dictionary learning has lower
Assuming that after carrying out a HOSVD [53] for R̃p1 , complexity. As mentioned in Section IV-A, in each iteration
the largest singular value is λ1R and the corresponding singular of the outer loop of KHOSVD, N̂1 N̂2 inner iterations are
vectors are u1R , vR
1
and ω 1R , we eliminate the largest error by: performed - this involves duplicated updating of the dictionary
atoms because there are only N̂1 + N̂2 atoms in total. However,
(d̂1 )p1 = u1R , D2 S̃p1 ,:,: = vR
1
◦ (λ1R ω 1R ), (40) for the proposed cTKSVD approach, the duplications are
removed by fully considering the multidimensional structure.
where S̃p1 ,:,: denotes the horizontal slice of S at index p1 that
contains only non-zero mode-2 vectors. The atom of Ψ1 is Since only N̂1 + N̂2 inner iterations are required for each
then calculated using the pseudo-inverse as: iteration of the outer loop, cTKSVD requires HOSVD to
be executed N̂1 N̂2 − N̂1 − N̂2 fewer times than for the
[ ]
(ψ̂ 1 )p1 = (γ 2 IN1 + ΦT1 Φ1 )−1 γIN1 ΦT1 u1R . (41) KHOSVD method and hence reduces the complexity. In addi-
tion, KHOSVD does not take into account the influence of the
The current update is then obtained after normalization: sensing matrix. The benefit of coupling of the sensing matrices
in cTKSVD will be shown by simulations in Section V-B.
(ψ̂ 1 )p1 For the cases where n > 2, cTKSVD can be formulated
(ψ̂ 1 )p1 = , (42) following a similar strategy as that employed in (35), and the
||(ψ̂ 1 )p1 ||2
problem can then be solved following similar steps to those
D2 S̃p1 ,:,: = ||(ψ̂ 1 )p1 ||2 vR
1
◦ (λ1R ω 1R ). (43) introduced earlier in this section.
We have now derived the method of learning the sparsifying
Since D2 and the support indices of each mode-2 vector dictionaries when the multilinear sensing matrix is fixed.
in S̃p1 ,:,: are known, the updated coefficients S̃p1 ,:,: can be Using an alternative optimization method, we can then jointly
easily calculated by the Least Square (LS) solution. The above optimize Φ1 , Φ2 and Ψ1 , Ψ2 . The overall procedure is
process is repeated for all the atoms to update the dictionary summarized in Algorithm 3. This algorithm works as follows.
Ψ1 . First, the dictionaries Ψ1 , Ψ2 are fixed, and the sensing
The next step is to update Ψ2 with the obtained Ψ1 fixed. matrices Φ1 , Φ2 are optimized using the methods elaborated
It follows a similar procedure to that described previously. in Section III-B. Then the sensing matrices are fixed and the
Specifically, the objective in (35) is rewritten as: dictionaries are learnt using cTKSVD detailed previously in
∑ this section. This process is repeated until convergence.
||R̃p2 − (d1 )q1 ◦ (d2 )p2 ◦ s̃(q1 −1)N̂2 +p2 ||2F , (44)
q1
V. E XPERIMENTAL R ESULTS
where s̃ is the mode-3 vector with only non-zero entries, In this section, we evaluate the proposed approaches via
R̃
∑p2 ∑ is the corresponding subset of Rp2 , Rp2 = Z − simulations using both synthetic data and real images. We first
q1 q2 ̸=p2 (d1 )q1 ◦ (d2 )q2 ◦ s(q1 −1)N̂2 +q2 and p2 is the index test the sensing matrix design approaches proposed in Section
of the atom for current update. A HOSVD is carried out for III-B with the sparsifying dictionaries being given. Then the
9
11
4
Algorithm 3 Joint Optimization 10 3
x 10
(0) (0)
Input: Ψi (i = 1, 2), Φi (i = 1, 2), X, α, β, η, γ, 10
6
iter = 0. 2.5
MSE
MSE
Output: Φ̂i (i = 1, 2), Ψ̂i (i = 1, 2). 10
8
MSE
MSE
(iter+1) 0.7
7: Update (ψ̂ 1 )p1 ,
D2 S̃p1,:,: using (41) - (43) and 10 0.125
cTKSVD approach is evaluated when the sensing matrices are 0) and high noise (σ 2 = 10−2 ) cases, respectively, when α =
fixed. Finally the experiments for the joint optimization of the 1. From both (a) and (c), we can see that when β = 0 or 1, the
two are presented. MSE is larger than that for the other values, which means that
both terms of Approach II that are controlled by β are essential
for obtaining optimal sensing matrices. In addition, we can
A. Optimal Multidimensional Sensing Matrix see that when β becomes larger in the range of [0.1, 0.9],
This section is intended to examine the proposed separable the MSE decreases slightly in (a), but increases slightly in (c).
Approach I and non-separable Approach II for multidimen- This indicates the choice of β under different conditions of
sional sensing matrix design. Before doing so, we first test sensing noise, which is consistent with that observed in [20].
the tuning parameters for Approach II, i.e., the non-separable Even so, owing to the extra term concerning the sensing energy
design approach presented in Section III-B-2. As detailed in involved in our approach, the MSE performance is much less
Section III-B-1, Approach I has a closed form solution and so sensitive to the choice of β than that in [20]. In the remaining
there are no tuning parameters involved. experiments, we take β = 0.8 when sensing noise is low and
We evaluate the Mean Squared Error (MSE) performance β = 0.2 when the noise is high. Fig. 1 (b) and (d) demonstrate
of different sensing matrices generated using Approach II with the MSE results for the tests of parameter α. It is observed that
various parameters and the results are reported by averaging α = 1 is optimal for the noiseless case while α = 0.6 when
over 500 trials. A random 2D signal S ∈ R64×64 with sparsity high noise exists. Therefore a larger α is preferred when low
K = 80 is generated, where the randomly placed non-zero noise is involved, which needs to be reduced accordingly when
elements follow an i.i.d zero-mean unit-variance Gaussian the noise becomes higher.
distribution. Both the dictionaries Ψi ∈ R64×256 (i = 1, 2) We then proceed to examine the performance of both the
and the initial sensing matrices Φi ∈ R40×64 (i = 1, 2) proposed approaches. As this is the first work to optimize the
are generated randomly with i.i.d zero-mean unit-variance multidimensional sensing matrix, we take the i.i.d Gaussian
Gaussian distributions, and the dictionaries are then column sensing matrices that are commonly used in CS problems
normalized
√ while the sensing matrices are normalized by: for comparison. Besides, since Sapiro’s approach [18] has the
Φi = 64Φi /||Φi ||F . When taking measurements, random same spirit to that of Approach I (as reviewed in Section III-A),
additive Gaussian noise with variance σ 2 is induced. A it can be easily extended to the multidimensional case, i.e.,
constant step size η = 1e − 7 is selected empirically for individually generating Φi (i = 1, 2) using the approach in
Approach II and the BP solver SPGL1 [54] is employed for [18]. We hence also include it in the comparisons and denote it
the reconstructions. by Separable Sapiro’s approach (SS). The previously described
Fig. 1 illustrates the results for the parameter tests. In Fig. 1 synthetic data is generated for the experiments and akin to [8]–
(a) and (c), the parameter β is evaluated for the noiseless (σ 2 = [10], [12], BP and OMP are selected as representative of the
10
2
10
Approach I Approach I
0
10 Approach II Approach II
SS SS
Gaussian 0 Gaussian
10
2
MSE
MSE
10
2
10
4
10 4
10
(a) (a)
2 Approach I 2
10 10
Approach II
SS
0
Gaussian 0
10 10
MSE
MSE
2 2
10 10
4 4
10 10 Approach I
Approach II
SS
6 6
10 10 Gaussian
(b) (b)
Fig. 2: MSE performance of different sensing matrices for (a) Fig. 3: MSE performance of different sensing matrices for (a)
BP, (b) OMP when Mi (i = 1, 2) varies. (K = 80, N1 = BP, (b) OMP when K varies. (M1 = M2 = 40, N1 = N2 =
N2 = 64, N̂1 = N̂2 = 256 and σ 2 = 10−4 ) 64, N̂1 = N̂2 = 256 and σ 2 = 10−4 )
2
10 0 .0 4 5
A p p ro a ch I cT K S V D (γ= 1 /2 5 6 ) (M S E : 0 .0 1 8 6 )
A p p ro a ch II cT K S V D (γ= 1 /1 2 8 ) (M S E : 0 .0 1 8 5 )
SS 0 .0 4
0 cT K S V D (γ= 1 /6 4 ) (M S E : 0 .0 1 8 3 )
10 G a u ssia n
cT K S V D (γ= 1 /3 2 ) (M S E : 0 .0 1 8 4 )
0 .0 3 5 cT K S V D (γ= 1 /1 6 ) (M S E : 0 .0 1 8 5 )
cK S V D (γ= 1 /3 2 ) (M S E : 0 .0 2 0 3 )
MSE
ARE
−2
10 0 .0 3
0 .0 2 5
−4
10
0 .0 2
−6
10 0 .0 1 5
10 12 14 16 18 0 20 40 60 80 100
Mi Ite ra tio n
(a) (a)
4
10 0 .0 2
A p p ro a ch I cT K S V D (γ= 1 /2 5 6 ) (M S E : 0 .0 3 3 6 )
2 A p p ro a ch II cT K S V D (γ= 1 /1 2 8 ) (M S E : 0 .0 3 3 5 )
10
SS cT K S V D (γ= 1 /6 4 ) (M S E : 0 .0 3 3 6 )
G a u ssia n cT K S V D (γ= 1 /3 2 ) (M S E : 0 .0 3 3 7 )
0
10 0 .0 1 5 cT K S V D (γ= 1 /1 6 ) (M S E : 0 .0 3 3 7 )
cK S V D (γ= 1 /6 4 ) (M S E : 0 .0 3 5 0 )
MSE
−2
ARE
10
−4
10 0 .0 1
−6
10
−8
10
10 12 14 16 18 0 .0 0 5
0 20 40 60 80 100
M
i Ite ra tio n
(b) (b)
Fig. 4: MSE performance of different sensing matrices (3D) Fig. 5: Convergence behavior of cTKSVD with different values
for (a) BP, (b) OMP when Mi (i = 1, 2, 3) varies. (K = of γ compared to that of cKSVD with its optimal parameter
80, N1 = N2 = N3 = 20, N̂1 = N̂2 = N̂3 = 80 and setting when (a) M1 = M2 = 7; (b) M1 = M2 = 3.
σ 2 = 10−4 )
0 .0 4 40
cT K S V D
cK S V D 35
0 .0 3 5 TKSVD
KHOSVD 30
cK S V D (u n fo ld in g )
25
P S N R (d B )
0 .0 3
MSE
20
0 .0 2 5 15
I+ cT K S V D
10 II+ cT K S V D
0 .0 2
S apiro+ cK S V D
5
S apiro+ cK S V D (unfolding)
0 .0 1 5 0
1000 2000 3000 4000 5000 6000 0 5 10 15 20 25 30
T Iteration
40
performed previously to obtain the results in Fig. 1 and 5, the
parameters are chosen as: α = 3, β = 0.8, γ = 1/8. The
35
step size for II + cTKSVD is set empirically as: η = 1e − 5.
The PSNR performance for different numbers of iterations is
30
illustrated in Fig. 7. Since Sapiro’s approach in [18] also jointly
optimizes the sensing matrix and dictionary, we include it in
P S N R (d B )
25
this figure (denoted by Sapiro’s + cKSVD). The parameter γ
I+ cT K S V D
is optimal at 1/2 for cKSVD under our settings. However,
20 II+ cT K S V D note that Sapiro’s approach is only for vectorized signals in
II+ T K S V D the conventional CS problem, i.e., a single sensing matrix
15
G a u ssia n + cT K S V D
S a p iro + cK S V D
Φ ∈ R36×64 and a dictionary Ψ ∈ R64×256 are obtained. We
SS+KHO SVD also include the Sapiro + cKSVD (unfolding) approach, i.e.,
10
S a p iro + cK S V D (u n fo ld in g ) the implementation of Sapiro + cKSVD for the unfoldings of
3 3 .5 4 4 .5 5 5 .5 6 6 .5 7
M i (i= 1 , 2 ) the data tensor, in the comparison. From Fig. 7, we can see
that the proposed approaches outperform Sapiro’s approach in
(a) both forms in the sense that their PSNR values are higher at
convergence while all the methods converge in less than 10
38 iterations.
I+ cT K S V D
II+ cT K S V D
Then the proposed approaches are compared with various
36
II+ T K S V D other approaches when the number of measurements (Mi (i =
34 G a u ssia n + cT K S V D
S a p iro + cK S V D
1, 2)) and the noise variance (σ 2 ) vary. Specifically, using the
32 SS+KHO SVD notation employed previously and by denoting the method of
S a p iro + cK S V D (u n fo ld in g ) combining sensing matrix design with that of the dictionary
P S N R (dB )
30
learning using a “+”, the methods for comparison are: II
28 + TKSVD, Gaussian + cTKSVD, Sapiro + cKSVD, SS +
26 KHOSVD and Sapiro + cKSVD (unfolding), where the Sapiro
+ cKSVD (unfolding) approach refers to the implementation of
24
Sapiro + cKSVD for the unfoldings of the data tensor. In these
22 approaches, II + TKSVD and SS + KHOSVD are uncoupled
20 methods; Gaussian + cTKSVD does not involve sensing matrix
0 .5 1 1 .5 2 2 .5 3 3 .5 4 4 .5 5
σ
2 optimization; Sapiro + cKSVD is for conventional CS system
only.
(b)
The results are shown in Fig. 8. We can see that the proposed
Fig. 8: PSNR performance of different methods when (a) approaches obtain higher PSNR values than all of the other
Mi (i = 1, 2) varies (σ 2 = 0), (b) σ 2 varies (M1 = M2 = 6). methods and II + cTKSVD performs best. To see the gain
(T = 5000, K = 4, M1 = M2 = 6, N1 = N2 = 8, N̂1 = of coupling sensing matrices during dictionary learning and
N̂2 = 16) optimizing the sensing matrices, respectively, we compare II
+ cTKSVD with II + TKSVD and Gaussian + cTKSVD.
For instance, when Mi = 5, σ 2 = 0, II + cTKSVD has
a gain of about 3dB over II + TKSVD and nearly 9dB
over Gaussian + cTKSVD. Although Sapiro + cKSVD has
of 5000 8 × 8 patches obtained by randomly extracting 25 a similar performance to ours at some specific settings, it is
patches from each of the 200 images in a training set from the not for a TCS system that requires multiple separable sensing
Berkeley segmentation dataset [55]. The test data is obtained matrices. Even though we can implement it for the tensor
by extracting non-overlapping 8 × 8 patches from the other unfoldings, i.e., the Sapiro + cKSVD (unfolding) approach,
100 images (different from the images for training) in the its performance is not superior to that of Sapiro + cKSVD.
dataset. A 2D Discrete Cosine Transform (DCT) is employed Examples of reconstructed images using these methods are
to initialize the dictionaries Ψi ∈ R8×16 (i = 1, 2) and demonstrated in Fig. 9 and 10 with the corresponding PSNR
i.i.d Gaussian matrices are used as the initial sensing matrices values listed. All of the conducted simulations verify that the
Φi ∈ RMi ×8 (i = 1, 2). We employ TOMP for reconstruction proposed methods of multidimensional sensing matrix and
and the Peak Signal to Noise Ratio (PSNR) is used as the dictionary optimization improve the performance of a TCS
evaluation criteria. system.
In the first experiment, we examine the convergence be-
havior of Algorithm 3 when the proposed Approach I and II
VI. C ONCLUSIONS
are utilized for the sensing matrix optimization step (respec-
tively denote by I + cTKSVD and II + cTKSVD). We take In this paper, we propose to jointly optimize the multidimen-
M1 = M2 = 6 and no noise is added to the measurements sional sensing matrix and dictionary for TCS systems. To ob-
at the test stage, i.e., σ 2 = 0. By conducting the simulations tain the optimized sensing matrices, a separable approach with
14
A PPENDIX A
P ROOF OF T HEOREM 3
T
Assume Ai = Φi Ψi = UAi [ ΛAi 0 ] VA i
is an SVD
of Ai for i = 1, 2 and rank(Ai ) = Mi . Then the objective
we want to minimize in (19) can be rewritten as:
Fig. 9: Reconstruction example when M1 = M2 = 6. The
images from left to right, top to bottom and their PSNR [ ] [ ]
Λ2A2 0 Λ2A1 0
(dB) values are: Original, II+cTKSVD (35.41), I+cTKSVD IN̂ − (VA2 T
VA ) ⊗ (VA1 T
VA ) 2
F.
1 N̂2 0 0 2 0 0 1
(34.97), Sapiro+cKSVD (33.64), II+TKSVD (33.57), S- [ ] [ 2 ]
apiro+cKSVD (unfolding) (33.44), SS+KHOSVD (28.62), Λ2A2 0 ΛA1 0
Gaussian+cTKSVD (28.05). Denote Σ = ⊗ = diag(ν A2 ⊗
0[ 0 ] 0 0
2
ΛAi 0
ν A1 ), ν Ai = diag( ), then we have
0 0
∑
M2 ∑
M1
||IN̂1 N̂2 − Σ||2F = N̂1 N̂2 − M1 M2 + (1 − (v2 )p (v1 )q )2 .
p=1 q=1
(51)
Therefore we can obtain that the minimum value of (19) is
N̂1 N̂2 − M1 M2 , and that it is achieved when the entries of ν̂
are all unity.
Clearly ΛAi =IMi for i = 1, 2 is a solution, i.e.,
Ai = UAi [ IMi 0 ] VA T
i
with UAi ∈ RMi ×Mi and
N̂i ×N̂i
VAi ∈ R being arbitrary orthonormal matrices. Then
we would like to find Φi (i = 1, 2) such that Φi Ψi =
T
UAi [ IMi 0 ] VA i
. Following the derivation of Theorem
2 in [11], the solution in (20) can be found.
Fig. 10: Reconstruction example when M1 = M2 = 4. The With this solution, for an arbitrary vector z ∈ RN̂i ,
images from left to right, top to bottom and their PSNR we have ||ATi z||22 = tr(zT Ai ATi z) = tr(zT z) = ||z||22 ,
(dB) values are: Original, II+cTKSVD (29.91), I+cTKSVD which indicates that the resulting equivalent sensing matrices
(29.45), Sapiro+cKSVD (28.72), II+TKSVD (26.60), S- Ai (i = 1, 2) are Parseval tight frames. In addition, we
apiro+cKSVD (unfolding) (23.51), SS+KHOSVD (22.62), observe that the solution in (20) can be obtained by separately
Gaussian+cTKSVD (21.94). solving the sub-problems in (21), of which the solutions have
been derived in [11]. By substituting the solutions of the sub-
problems into (19), we can conclude the minimum remains as
N̂1 N̂2 − M1 M2 .
15
based on real-time in-pixel compressive sensing,” in in Proc. IEEE Wei Chen (M’13) received the B.Eng. degree and
ISCAS, June 2010, pp. 2956 –2959. M.Eng. degree in Communications Engineering from
[43] E. J. Candès and T. Tao, “Decoding by linear programming,” IEEE Beijing University of Posts and Telecommunications,
Trans. Information Theory, vol. 51, no. 12, pp. 4203 – 4215, Dec. 2005. China, in 2006 and 2009, respectively, and the Ph.D.
degree in Computer Science from the University of
[44] J. A. Tropp and A. C. Gilbert, “Signal recovery from random mea-
Cambridge, UK, in 2012. He had been an Research
surements via orthogonal matching pursuit,” IEEE Trans. Information
Associate with the Computer Laboratory, University
Theory, vol. 53, no. 12, pp. 4655 –4666, Dec. 2007. of Cambridge from 2013 to 2016. He is currently a
[45] T. Blumensath and M. E. Davies, “Iterative hard thresholding for Professor with Beijing Jiaotong University, Beijing,
compressed sensing,” Applied and Computational Harmonic Analysis, China. His current research interests include sparse
vol. 27, no. 3, pp. 265–274, 2009. representation, Bayesian inference, image processing
[46] J. A. Tropp, “On the conditioning of random subdicitionaries,” Appl. and wireless sensor networks.
Comput. Harmon. Anal., vol. 25, pp. 1–24, 2008.
[47] J. A. Tropp, “Norms of random submatrices and sparse approximation,”
C. R. Acad. Sci. Paris, vol. 346, no. 23-24, pp. 1271–1274, 2008.
[48] M. A. Herman and T. Strohmer, “High resolution radar via compressed
sensing,” IEEE Trans. Signal Process., vol. 57, no. 6, pp. 2275–2284,
2009.
[49] G. E. Pfander and H. Rauhut, “Sparsity in time-frequency representa-
tions,” J. Fourier Anal. Appl., vol. 16, pp. 233–260, 2010.
[50] E. J. Candès and Y. Plan, “Near-ideal model selection by l1 minimiza-
tion,” Annals of Statistics, vol. 37, no. 5A, pp. 2145–2177, 2009.
[51] W. Chen, M. R. D. Rodrigues, and I. J. Wassell, “On the use of
unit-norm tight frames to improve the average mse performance in
compressive sensing applications,” IEEE Signal Process. Letters, vol.
19, no. 1, pp. 8–11, 2012.
[52] C. F. Van Loan, “The ubiquitous Kronecker product,” J. comput. applied
mathematics, vol. 123, no. 1, pp. 85–100, 2000.
[53] L. De Lathauwer, B. De Moor, and J. Vandewalle, “A multilinear sin-
gular value decomposition,” SIAM J. on Matrix Analysis. Applications.,
vol. 21, no. 4, pp. 1253–1278, 2000.
[54] E. V. D. Berg and M. P. Friedlander, “Probing the pareto frontier for
basis pursuit solutions,” SIAM Journal on Scientific Computing, vol.
31, no. 2, pp. 890–912, 2008.
[55] D. Martin, C. Fowlkes, D. Tal, and J. Malik, “A database of human Ian Wassell received the B.Sc. and B.Eng. degrees
segmented natural images and its application to evaluating segmentation from the University of Loughborough in 1983, and
algorithms and measuring ecological statistics,” in Proc. IEEE ICCV, the Ph.D. degree from the University of Southampton
2001, vol. 2, pp. 416–423. in 1990. He is a Senior Lecturer at the University of
Cambridge Computer Laboratory and has in excess
of 15 years experience in the simulation and design
of radio communication systems gained via a number
of positions in industry and higher education. He
has published more than 180 papers concerning
wireless communication systems and his current re-
search interests include: fixed wireless access, sensor
networks, cooperative networks, propagation modelling, compressive sensing
and cognitive radio. He is a member of the IET and a Chartered Engineer.