Tensor Decompositions for Signal Processing Applications:
From Two-way to Multiway Component Analysis

A. Cichocki, D. Mandic, A.-H. Phan, C. Caiafa, G. Zhou, Q. Zhao, and L. De Lathauwer

Summary

The widespread use of multi-sensor technology and the emergence of big datasets have highlighted the limitations of standard flat-view matrix models and the necessity to move towards more versatile data analysis tools. We show that higher-order tensors (i.e., multiway arrays) enable such a fundamental paradigm shift towards models that are essentially polynomial and whose uniqueness, unlike the matrix methods, is guaranteed under very mild and natural conditions. Benefiting from the power of multilinear algebra as their mathematical backbone, data analysis techniques using tensor decompositions are shown to have great flexibility in the choice of constraints that match data properties, and to find more general latent components in the data than matrix-based methods. A comprehensive introduction to tensor decompositions is provided from a signal processing perspective, starting from the algebraic foundations, via basic Canonical Polyadic and Tucker models, through to advanced cause-effect and multi-view data analysis schemes. We show that tensor decompositions enable natural generalizations of some commonly used signal processing paradigms, such as canonical correlation and subspace techniques, signal separation, linear regression, feature extraction and classification. We also cover computational aspects, and point out how ideas from compressed sensing and scientific computing may be used for addressing the otherwise unmanageable storage and manipulation problems associated with big datasets. The concepts are supported by illustrative real-world case studies illuminating the benefits of the tensor framework, as efficient and promising tools for modern signal processing, data analysis and machine learning applications; these benefits also extend to vector/matrix data through tensorization.

INTRODUCTION

Historical notes. The roots of multiway analysis can be traced back to studies of homogeneous polynomials in the 19th century; contributors include Gauss, Kronecker, Cayley, Weyl and Hilbert — in modern-day interpretation these are fully symmetric tensors. Decompositions of non-symmetric tensors have been studied since the early 20th century [1], whereas the benefits of using more than two matrices in factor analysis [2] became apparent in several communities from the 1960s onwards. The Tucker decomposition for tensors was introduced in psychometrics [3], [4], while the Canonical Polyadic Decomposition (CPD) was independently rediscovered and put into an application context under the names of Canonical Decomposition (CANDECOMP) in psychometrics [5] and Parallel Factor Model (PARAFAC) in linguistics [6]. Tensors were subsequently adopted in diverse branches of data analysis such as chemometrics, the food industry and social sciences [7], [8]. When it comes to Signal Processing, the early 1990s saw considerable interest in Higher-Order Statistics (HOS) [9], and it was soon realized that for the multivariate case HOS are effectively higher-order tensors; indeed, algebraic approaches to Independent Component Analysis (ICA) using HOS [10]–[12] were inherently tensor-based. Around 2000, it was realized that the Tucker decomposition represents a MultiLinear Singular Value Decomposition (MLSVD) [13]. Generalizing the matrix SVD, the workhorse of numerical linear algebra, the MLSVD spurred the interest in tensors in applied mathematics and scientific computing in very high dimensions [14]–[16]. In parallel, CPD was successfully adopted as a tool for sensor array processing and deterministic signal separation in wireless communication [17], [18]. Subsequently, tensors have been used in audio, image and video processing, machine learning and biomedical applications, to name but a few. The significant interest in tensors and their fast emerging applications is reflected in books [7], [8], [12], [19]–[21] and tutorial papers [22]–[29] covering various aspects of multiway analysis.
From a matrix to a tensor. Approaches to two-way (matrix) component analysis are well established, and include Principal Component Analysis (PCA), Independent Component Analysis (ICA), Nonnegative Matrix Factorization (NMF) and Sparse Component Analysis (SCA) [12], [19], [30]. These techniques have become standard tools for, e.g., blind source separation (BSS), feature extraction, or classification. On the other hand, large classes of data arising from modern heterogeneous sensor modalities have a multiway character and are therefore naturally represented by multiway arrays or tensors (see Section Tensorization).

Early multiway data analysis approaches reformatted the data tensor as a matrix and resorted to methods developed for classical two-way analysis. However, such a "flattened" view of the world and the rigid assumptions inherent in two-way analysis are not always a good match for multiway data. It is only through higher-order tensor decomposition that we have the opportunity to develop sophisticated models capturing multiple interactions and couplings, instead of standard pairwise interactions. In other words, we can only discover hidden components within multiway data if the analysis tools account for the intrinsic multi-dimensional patterns present — motivating the development of multilinear techniques.

In this article, we emphasize that tensor decompositions are not just matrix factorizations with additional subscripts — multilinear algebra is structurally much richer than linear algebra. For example, even basic notions such as rank have a more subtle meaning, uniqueness conditions of higher-order tensor decompositions are more relaxed and accommodating than those for matrices [31], [32], while matrices and tensors also have completely different geometric properties [20]. This boils down to matrices representing linear transformations and quadratic forms, while tensors are connected with multilinear mappings and multivariate polynomials [29].

NOTATIONS AND CONVENTIONS

A tensor can be thought of as a multi-index numerical array, whereby the order of a tensor is the number of its "modes" or "dimensions"; these may include space, time, frequency, trials, classes, and dictionaries. A real-valued tensor of order N is denoted by A ∈ R^(I_1 × I_2 × ··· × I_N) and its entries by a_{i_1, i_2, ..., i_N}. Then, an N × 1 vector a is considered a tensor of order one, and an N × M matrix A a tensor of order two. Subtensors are parts of the original data tensor, created when only a fixed subset of indices is used. Vector-valued subtensors are called fibers, defined by fixing every index but one, and matrix-valued subtensors are called slices, obtained by fixing all but two indices (see Table I). Manipulation of tensors often requires their reformatting (reshaping); a particular case of reshaping tensors to matrices is termed matrix unfolding or matricization (see Figure 4 (left)). Note that a mode-n multiplication of a tensor A with a matrix B amounts to the multiplication of all mode-n vector fibers with B, and that in linear algebra the tensor (or outer) product appears in the expression for a rank-1 matrix: a b^T = a ◦ b. Basic tensor notations are summarized in Table I, while Table II outlines several types of products used in this paper.

INTERPRETABLE COMPONENTS IN TWO-WAY DATA ANALYSIS

The aim of blind source separation (BSS), factor analysis (FA) and latent variable analysis (LVA) is to decompose a data matrix X ∈ R^(I × J) into the factor matrices A = [a_1, a_2, ..., a_R] ∈ R^(I × R) and B = [b_1, b_2, ..., b_R] ∈ R^(J × R) as:

    X = A D B^T + E = ∑_{r=1}^R λ_r a_r b_r^T + E = ∑_{r=1}^R λ_r a_r ◦ b_r + E,    (1)

where D = diag(λ_1, λ_2, ..., λ_R) is a scaling (normalizing) matrix, the columns of B represent the unknown source signals (factors or latent variables, depending on the task at hand), the columns of A represent the associated mixing vectors (or factor loadings), while E is noise due to an unmodelled data part or model error. In other words, model (1) assumes that the data matrix X comprises hidden components b_r (r = 1, 2, ..., R) that are mixed together in an unknown manner through the coefficients A, or, equivalently, that the data contain factors that have an associated loading for every data channel.
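To make model (1) concrete, the following minimal NumPy sketch (our own illustrative code; the sizes, seed and variable names are hypothetical, not taken from the paper) builds X = A D B^T + E and verifies that rescaling and permuting the rank-1 terms leaves the fit unchanged, which is precisely the scaling and permutation indeterminacy discussed below.

import numpy as np

rng = np.random.default_rng(0)
I, J, R = 8, 6, 3                      # data size and number of components (hypothetical)

A = rng.standard_normal((I, R))        # mixing vectors (factor loadings)
B = rng.standard_normal((J, R))        # latent components
lam = np.array([3.0, 2.0, 1.0])        # scaling factors lambda_r
E = 0.01 * rng.standard_normal((I, J)) # small noise / model error

X = A @ np.diag(lam) @ B.T + E         # model (1): X = A D B^T + E

# Scaling/permutation indeterminacy: absorb lambda_r into a_r and permute the terms
perm = [2, 0, 1]
A2 = (A * lam)[:, perm]
B2 = B[:, perm]
X2 = A2 @ B2.T + E

print(np.allclose(X, X2))              # True: both factorizations explain X equally well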
TABLE I: Basic notation.

    A, A, a, a                               tensor, matrix, vector, scalar
    A = [a_1, a_2, ..., a_R]                 matrix A with column vectors a_r
    a(:, i_2, i_3, ..., i_N)                 fiber of tensor A obtained by fixing all but one index
    A(:, :, i_3, ..., i_N)                   matrix slice of tensor A obtained by fixing all but two indices
    A(:, :, :, i_4, ..., i_N)                tensor slice of A obtained by fixing some indices
    A(I_1, I_2, ..., I_N)                    subtensor of A obtained by restricting indices to belong to subsets I_n ⊆ {1, 2, ..., I_n}
    A_(n) ∈ R^(I_n × I_1 I_2 ··· I_{n−1} I_{n+1} ··· I_N)    mode-n matricization of tensor A ∈ R^(I_1 × I_2 × ··· × I_N) whose entry at row i_n and column (i_1 − 1) I_2 ··· I_{n−1} I_{n+1} ··· I_N + ··· + (i_{N−1} − 1) I_N + i_N is equal to a_{i_1 i_2 ... i_N}
    vec(A) ∈ R^(I_N I_{N−1} ··· I_1)         vectorization of tensor A ∈ R^(I_1 × I_2 × ··· × I_N) with the entry at position i_1 + ∑_{k=2}^N [(i_k − 1) I_1 I_2 ··· I_{k−1}] equal to a_{i_1 i_2 ... i_N}
    D = diag(λ_1, λ_2, ..., λ_R)             diagonal matrix with d_{rr} = λ_r
    D = diag_N(λ_1, λ_2, ..., λ_R)           diagonal tensor of order N with d_{rr···r} = λ_r
    A^T, A^{−1}, A^†                         transpose, inverse, and Moore-Penrose pseudo-inverse
Figure 2 (top) depicts the model (1) as a dyadic decomposition, whereby the terms a_r ◦ b_r = a_r b_r^T are rank-1 matrices.

The well-known indeterminacies intrinsic to this model are: (i) arbitrary scaling of components, and (ii) permutation of the rank-1 terms. Another indeterminacy is related to the physical meaning of the factors: if the model in (1) is unconstrained, it admits infinitely many combinations of A and B. Standard matrix factorizations in linear algebra, such as the QR-factorization, Eigenvalue Decomposition (EVD), and Singular Value Decomposition (SVD), are only special cases of (1), and owe their uniqueness to hard and restrictive constraints such as triangularity and orthogonality. On the other hand, certain properties of the factors in (1) can be represented by appropriate constraints, making possible unique estimation or extraction of such factors. These constraints include statistical independence, sparsity, nonnegativity, exponential structure, uncorrelatedness, constant modulus, finite alphabet, smoothness and unimodality. Indeed, the first four properties form the basis of Independent Component Analysis (ICA) [12], [33], [34], Sparse Component Analysis (SCA) [30], Nonnegative Matrix Factorization (NMF) [19], and harmonic retrieval [35].

TENSORIZATION — BLESSING OF DIMENSIONALITY

While one-way (vectors) and two-way (matrices) algebraic structures were respectively introduced as natural representations for segments of scalar measurements and measurements on a grid, tensors were initially used purely for the mathematical benefits they provide in data analysis; for instance, it seemed natural to stack excitation-emission spectroscopy matrices in chemometrics into a third-order tensor [7].
TABLE II: Definition of products.

    C = A ×_n B              mode-n product of A ∈ R^(I_1 × I_2 × ··· × I_N) and B ∈ R^(J_n × I_n) yields C ∈ R^(I_1 × ··· × I_{n−1} × J_n × I_{n+1} × ··· × I_N) with entries c_{i_1 ··· i_{n−1} j_n i_{n+1} ··· i_N} = ∑_{i_n=1}^{I_n} a_{i_1 ··· i_{n−1} i_n i_{n+1} ··· i_N} b_{j_n i_n}, and matrix representation C_(n) = B A_(n)
    C = ⟦A; B^(1), B^(2), ..., B^(N)⟧        full multilinear product, C = A ×_1 B^(1) ×_2 B^(2) ··· ×_N B^(N)
    C = A ◦ B                tensor or outer product of A ∈ R^(I_1 × I_2 × ··· × I_N) and B ∈ R^(J_1 × J_2 × ··· × J_M) yields C ∈ R^(I_1 × I_2 × ··· × I_N × J_1 × J_2 × ··· × J_M) with entries c_{i_1 i_2 ··· i_N j_1 j_2 ··· j_M} = a_{i_1 i_2 ··· i_N} b_{j_1 j_2 ··· j_M}
    X = a^(1) ◦ a^(2) ◦ ··· ◦ a^(N)          tensor or outer product of vectors a^(n) ∈ R^(I_n) (n = 1, ..., N) yields a rank-1 tensor X ∈ R^(I_1 × I_2 × ··· × I_N) with entries x_{i_1 i_2 ... i_N} = a^(1)_{i_1} a^(2)_{i_2} ··· a^(N)_{i_N}
    C = A ⊗ B                Kronecker product of A ∈ R^(I_1 × I_2) and B ∈ R^(J_1 × J_2) yields C ∈ R^(I_1 J_1 × I_2 J_2) with entries c_{(i_1−1) J_1 + j_1, (i_2−1) J_2 + j_2} = a_{i_1 i_2} b_{j_1 j_2}
    C = A ⊙ B                Khatri-Rao product of A = [a_1, ..., a_R] ∈ R^(I × R) and B = [b_1, ..., b_R] ∈ R^(J × R) yields C ∈ R^(IJ × R) with columns c_r = a_r ⊗ b_r
The procedure of creating a data tensor from lower-dimensional original data is referred to as tensorization, and we propose the following taxonomy for tensor generation:

1) Rearrangement of lower-dimensional data structures. Large-scale vectors or matrices are readily tensorized to higher-order tensors, and can be compressed through tensor decompositions if they admit a low-rank tensor approximation; this principle facilitates big data analysis [21], [27], [28] (see Figure 1 (top)). For instance, a one-way exponential signal x(k) = a z^k can be rearranged into a rank-1 Hankel matrix or a Hankel tensor [36]:

        H = [ x(0)  x(1)  x(2)  ···
              x(1)  x(2)  x(3)  ···
              x(2)  x(3)  x(4)  ···
               ⋮     ⋮     ⋮       ]  =  a b ◦ b,    (2)

   where b = [1, z, z^2, ···]^T (a short numerical sketch is given after this list). Also, in sensor array processing, tensor structures naturally emerge when combining snapshots from identical subarrays [17].
2) Mathematical construction. Among many such examples, the Nth-order moments (cumulants) of a vector-valued random variable form an Nth-order tensor [9], while in second-order ICA snapshots of data statistics (covariance matrices) are effectively slices of a third-order tensor [12], [37]. Also, a (channel × time) data matrix can be transformed into a (channel × time × frequency) or (channel × time × scale) tensor via time-frequency or wavelet representations, a powerful procedure in multichannel EEG analysis in brain science [19], [38].
3) Experiment design. Multi-faceted data can be naturally stacked into a tensor; for instance, in wireless communications the so-called signal diversity (temporal, spatial, spectral, ...) corresponds to the order of the tensor [18]. In the same spirit, the standard EigenFaces can be generalized to TensorFaces by combining images with different illuminations, poses, and expressions [39], while the common modes in EEG recordings across subjects, trials, and conditions are best analyzed when combined together into a tensor [26].
4) Naturally tensor data. Some data sources are readily generated as tensors (e.g., RGB color images, videos, 3D light field displays) [40]. Also, in scientific computing we often need to evaluate a discretized multivariate function; this is a natural tensor, as illustrated in Figure 1 (bottom) for a trivariate function f(x, y, z) [21], [27], [28].

Figure 1: Construction of tensors. Top: Tensorization of a vector or matrix into the so-called quantized format; in scientific computing this facilitates super-compression of large-scale vectors or matrices. Bottom: Tensor formed through the discretization of a trivariate function f(x, y, z).
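As a small numerical illustration of (2), the sketch below (illustrative only; the values of a, z and the sizes are our own choices) builds the Hankel matrix of an exponential signal and confirms that it is rank-1.

import numpy as np

a, z, K = 2.0, 0.9, 8                       # illustrative scaling factor and generator
x = a * z ** np.arange(2 * K - 1)           # samples x(0), ..., x(2K-2)

# Hankel matrix H[i, j] = x(i + j), cf. Eq. (2)
H = np.array([[x[i + j] for j in range(K)] for i in range(K)])

b = z ** np.arange(K)                       # b = [1, z, z^2, ...]^T
print(np.allclose(H, a * np.outer(b, b)))   # True: H = a (b o b) is rank-1
print(np.linalg.matrix_rank(H))             # 1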
The high dimensionality of the tensor format is associated with blessings — these include possibilities to obtain compact representations, uniqueness of decompositions, flexibility in the choice of constraints, and generality of components that can be identified.

CANONICAL POLYADIC DECOMPOSITION

Definition. A Polyadic Decomposition (PD) represents an Nth-order tensor X ∈ R^(I_1 × I_2 × ··· × I_N) as a linear combination of rank-1 tensors in the form

    X = ∑_{r=1}^R λ_r b_r^(1) ◦ b_r^(2) ◦ ··· ◦ b_r^(N).    (3)

Equivalently, X is expressed as a multilinear product with a diagonal core:

    X = D ×_1 B^(1) ×_2 B^(2) ··· ×_N B^(N) = ⟦D; B^(1), B^(2), ..., B^(N)⟧,    (4)

where D = diag_N(λ_1, λ_2, ..., λ_R) (cf. the matrix case in (1)). Figure 2 (bottom) illustrates these two interpretations for a third-order tensor. The tensor rank is defined as the smallest value of R for which (3) holds exactly; the minimum-rank PD is called canonical (CPD) and is desired in signal separation. The term CPD may also be considered as an abbreviation of CANDECOMP/PARAFAC decomposition, see Historical notes. The matrix/vector form of CPD can be obtained via the Khatri-Rao products as:

    X_(n) = B^(n) D (B^(N) ⊙ ··· ⊙ B^(n+1) ⊙ B^(n−1) ⊙ ··· ⊙ B^(1))^T    (5)
    vec(X) = [B^(N) ⊙ B^(N−1) ⊙ ··· ⊙ B^(1)] d,

where d = (λ_1, λ_2, ..., λ_R)^T.

Rank. As mentioned earlier, rank-related properties are very different for matrices and tensors. For instance, the number of complex-valued rank-1 terms needed to represent a higher-order tensor can be strictly less than the number of real-valued rank-1 terms [20], while the determination of tensor rank is in general NP-hard [41]. Fortunately, in signal processing applications, rank estimation most often corresponds to determining the number of tensor components that can be retrieved with sufficient accuracy, and often there are only a few data components present. A pragmatic first assessment of the number of components may be through the inspection of the multilinear singular value spectrum (see Section Tucker Decomposition), which indicates the size of the core tensor in Figure 2 (bottom-right). Existing techniques for rank estimation include the CORCONDIA algorithm (core consistency diagnostic), which checks whether the core tensor is (approximately) diagonalizable [7], while a number of techniques operate by balancing the approximation error against the number of degrees of freedom for a varying number of rank-1 terms [42]–[44].

Uniqueness. Uniqueness conditions give theoretical bounds for exact tensor decompositions. A classical uniqueness condition is due to Kruskal [31], which states that for third-order tensors the CPD is unique up to unavoidable scaling and permutation ambiguities, provided that k_B(1) + k_B(2) + k_B(3) ≥ 2R + 2, where the Kruskal rank k_B of a matrix B is the maximum value ensuring that any subset of k_B columns is linearly independent. In sparse modeling, the term (k_B + 1) is also known as the spark [30]. A generalization to Nth-order tensors is due to Sidiropoulos and Bro [45], and is given by:

    ∑_{n=1}^N k_B(n) ≥ 2R + N − 1.    (6)

More relaxed uniqueness conditions can be obtained when one factor matrix has full column rank [46]–[48]; for a thorough study of the third-order case, we refer to [32]. This all shows that, compared to matrix decompositions, CPD is unique under more natural and relaxed conditions, which only require the components to be "sufficiently different" and their number not unreasonably large. These conditions do not have a matrix counterpart, and are at the heart of tensor-based signal separation.

Computation. Certain conditions, including Kruskal's, enable explicit computation of the factor matrices in (3) using linear algebra (essentially, by solving sets of linear equations and by computing a (generalized) Eigenvalue Decomposition) [6], [47], [49], [50]. The presence of noise in data means that CPD is rarely exact, and we need to fit a CPD model to the data by minimizing a suitable cost function. This is typically achieved by minimizing the Frobenius norm of the difference between the given data tensor and its CP approximation, or alternatively by least absolute error fitting when the noise is Laplacian [51]. Theoretical Cramér-Rao Lower Bounds (CRLB) and Cramér-Rao Induced Bounds (CRIB) for the assessment of CPD performance were derived in [52] and [53].
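The following NumPy sketch (an illustrative reading of (3) and (5), with hypothetical sizes and names) constructs a third-order tensor of rank R from random factor matrices and verifies the Khatri-Rao unfolding identity X_(1) = B^(1) D (B^(3) ⊙ B^(2))^T; the column ordering of the unfolding follows the CPD matrix representations in Table III.

import numpy as np

rng = np.random.default_rng(1)
I, J, K, R = 5, 4, 3, 2
lam = np.array([2.0, 1.0])
A = rng.standard_normal((I, R))
B = rng.standard_normal((J, R))
C = rng.standard_normal((K, R))

# Build X = sum_r lam_r a_r o b_r o c_r  (Eq. (3), third-order case)
X = np.einsum('r,ir,jr,kr->ijk', lam, A, B, C)

# Khatri-Rao product C ⊙ B (column-wise Kronecker products)
CkrB = np.vstack([np.kron(C[:, r], B[:, r]) for r in range(R)]).T   # shape (K*J, R)

# Mode-1 unfolding with columns ordered so that X_(1) = A D (C ⊙ B)^T, cf. Eq. (5)
X1 = X.transpose(0, 2, 1).reshape(I, K * J)
print(np.allclose(X1, A @ np.diag(lam) @ CkrB.T))    # True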
Figure 2: Analogy between dyadic (top) and polyadic (bottom) decompositions; the Tucker format has a diagonal core. The uniqueness of these decompositions is a prerequisite for blind source separation and latent variable analysis.

given data tensor and its CP approximation, to be typically more robust to overfactoring,
or alternatively by least absolute error fitting but come at a cost of a much higher compu-
when the noise is Laplacian [51]. Theoretical tational load per iteration. More sophisticated
Cramér-Rao Lower Bound (CRLB) and Cramér- versions use the rank-1 structure of the terms
Rao Induced Bound (CRIB) for the assessment within CPD to perform efficient computation
of CPD performance were derived in [52] and and storage of the Jacobian and (approximate)
[53]. Hessian; their complexity is on par with ALS
Since the computation of CPD is intrinsi- while for ill-conditioned cases the performance
cally multilinear, we can arrive at the solution is often superior [60], [61].
through a sequence of linear sub-problems as in An important difference between matrices
the Alternating Least Squares (ALS) framework, and tensors is that the existence of a best rank-R
whereby the LS cost function is optimized for approximation of a tensor of rank greater than
one component matrix at a time, while keeping R is not guaranteed [20], [62] since the set of
the other component matrices fixed [6]. As seen tensors whose rank is at most R is not closed.
from (5), such a conditional update scheme boils As a result, cost functions for computing factor
down to solving overdetermined sets of linear matrices may only have an infimum (instead
equations. of a minimum) so that their minimization will
While the ALS is attractive for its simplicity approach the boundary of that set without ever
and satisfactory performance for a few well reaching the boundary point. This will cause
separated components and at sufficiently high two or more rank-1 terms go to infinity upon
SNR, it also inherits the problems of alternating convergence of an algorithm, however, numeri-
algorithms and is not guaranteed to converge to cally the diverging terms will almost completely
a stationary point. This can be rectified by only cancel one another while the overall cost func-
updating the factor matrix for which the cost tion will still decrease along the iterations [63].
function has most decreased at a given step [54], These diverging terms indicate an inappropriate
but this results in an N-times increase in com- data model: the mismatch between the CPD and
putational cost per iteration. The convergence the original data tensor may arise due to an
of ALS is not yet completely understood — it underestimated number of components, not all
is quasi-linear close to the stationary point [55], tensor components having a rank-1 structure, or
while it becomes rather slow for ill-conditioned data being too noisy.
cases; for more detail we refer to [56], [57]. Constraints. As mentioned earlier, under
Conventional all-at-once algorithms for nu- quite mild conditions the CPD is unique by
merical optimization such as nonlinear conju- itself, without requiring additional constraints.
gate gradients, quasi-Newton or nonlinear least However, in order to enhance the accuracy and
squares [58], [59] have been shown to often robustness with respect to noise, prior knowl-
outperform ALS for ill-conditioned cases and edge of data properties (e.g., statistical inde-
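A bare-bones sketch of the ALS scheme described above, for a third-order CPD, is given below; this is a didactic illustration under simplifying assumptions (no normalization, regularization, or convergence checks), and the function names are our own rather than any library's.

import numpy as np

def khatri_rao(U, V):
    """Column-wise Kronecker product of U (K x R) and V (J x R) -> (K*J x R)."""
    R = U.shape[1]
    return np.vstack([np.kron(U[:, r], V[:, r]) for r in range(R)]).T

def cpd_als(X, R, n_iter=100, seed=0):
    """Plain ALS for a third-order CPD X ~ sum_r a_r o b_r o c_r (didactic sketch)."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.standard_normal((I, R))
    B = rng.standard_normal((J, R))
    C = rng.standard_normal((K, R))
    X1 = X.transpose(0, 2, 1).reshape(I, K * J)   # unfoldings consistent with Table III
    X2 = X.transpose(1, 2, 0).reshape(J, K * I)
    X3 = X.transpose(2, 1, 0).reshape(K, J * I)
    for _ in range(n_iter):                       # conditional (alternating) LS updates
        A = np.linalg.lstsq(khatri_rao(C, B), X1.T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(C, A), X2.T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(B, A), X3.T, rcond=None)[0].T
    return A, B, C

# Quick check on an exact rank-2 tensor
rng = np.random.default_rng(3)
A0, B0, C0 = (rng.standard_normal((n, 2)) for n in (6, 5, 4))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
A, B, C = cpd_als(X, R=2)
print(np.linalg.norm(X - np.einsum('ir,jr,kr->ijk', A, B, C)) / np.linalg.norm(X))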
Applications. The CPD has already been established as an advanced tool for signal separation in vastly diverse branches of signal processing and data analysis, such as audio and speech processing, biomedical engineering, chemometrics, and machine learning [7], [22], [23], [26]. Note that algebraic ICA algorithms are effectively based on the CPD of a tensor of the statistics of recordings; the statistical independence of the sources is reflected in the diagonality of the core tensor in Figure 2, that is, in vanishing cross-statistics [11], [12]. The CPD is also heavily used in exploratory data analysis, where the rank-1 terms capture essential properties of dynamically complex signals [8]. Another example is in wireless communication, where the signals transmitted by different users correspond to rank-1 terms in the case of line-of-sight propagation [17]. Also, in harmonic retrieval and direction-of-arrival type applications, real or complex exponentials have a rank-1 structure, for which the use of CPD is natural [36], [65].

Example 1. Consider a sensor array consisting of K displaced but otherwise identical subarrays of I sensors, with Ĩ = KI sensors in total. For R narrowband sources in the far field, the baseband equivalent model of the array output becomes X = A S^T + E, where A ∈ C^(Ĩ × R) is the global array response, S ∈ C^(J × R) contains J snapshots of the sources, and E is noise. A single source (R = 1) can be obtained from the best rank-1 approximation of the matrix X; however, for R > 1 the decomposition of X is not unique, and hence the separation of sources is not possible without incorporating additional information. Constraints on the sources that may yield a unique solution are, for instance, constant modulus or statistical independence [12], [68].

Consider a row-selection matrix J_k ∈ C^(I × Ĩ) that extracts the rows of X corresponding to the k-th subarray, k = 1, ..., K. For two identical subarrays, the generalized EVD of the matrices J_1 X and J_2 X corresponds to the well-known ESPRIT [69]. For the case K > 2, we shall consider J_k X as slices of the tensor X ∈ C^(I × J × K) (see Section Tensorization). It can be shown that the signal part of X admits a CPD as in (3)–(4), with λ_1 = ··· = λ_R = 1, J_k A = B^(1) diag(b_{k1}^(3), ..., b_{kR}^(3)) and B^(2) = S [17], with consequent source separation under rather mild conditions — its uniqueness does not require constraints such as statistical independence or constant modulus. Moreover, the decomposition is unique even in cases when the number of sources R exceeds the number of subarray sensors I, or even the total number of sensors Ĩ. Notice that particular array geometries, such as linearly and uniformly displaced subarrays, can be converted into a constraint on CPD, yielding a further relaxation of the uniqueness conditions, reduced sensitivity to noise, and often faster computation [65].

TUCKER DECOMPOSITION

Figure 3 illustrates the principle of Tucker decomposition, which treats a tensor X ∈ R^(I_1 × I_2 × ··· × I_N) as a multilinear transformation of a (typically dense but small) core tensor G ∈ R^(R_1 × R_2 × ··· × R_N) by the factor matrices B^(n) = [b_1^(n), b_2^(n), ..., b_{R_n}^(n)] ∈ R^(I_n × R_n), n = 1, 2, ..., N [3], [4], given by

    X = ∑_{r_1=1}^{R_1} ∑_{r_2=1}^{R_2} ··· ∑_{r_N=1}^{R_N} g_{r_1 r_2 ··· r_N} b_{r_1}^(1) ◦ b_{r_2}^(2) ◦ ··· ◦ b_{r_N}^(N)    (7)

or equivalently

    X = G ×_1 B^(1) ×_2 B^(2) ··· ×_N B^(N) = ⟦G; B^(1), B^(2), ..., B^(N)⟧.    (8)

Via the Kronecker products (see Table II), the Tucker decomposition can be expressed in a matrix/vector form as:

    X_(n) = B^(n) G_(n) (B^(N) ⊗ ··· ⊗ B^(n+1) ⊗ B^(n−1) ⊗ ··· ⊗ B^(1))^T
    vec(X) = [B^(N) ⊗ B^(N−1) ⊗ ··· ⊗ B^(1)] vec(G).

Although Tucker initially used orthogonality and ordering constraints on the core tensor and factor matrices [3], [4], we can also employ other meaningful constraints (see below).

Multilinear rank. For a core tensor of minimal size, R_1 is the column rank (the dimension of the subspace spanned by mode-1 fibers), R_2 is the row rank (the dimension of the subspace spanned by mode-2 fibers), and so on. A remarkable difference from matrices is that the values of R_1, R_2, ..., R_N can be different for N ≥ 3. The N-tuple (R_1, R_2, ..., R_N) is consequently called the multilinear rank of the tensor X.
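A minimal sketch of the Tucker model (8) in plain NumPy (illustrative sizes and names; the mode_n_product helper is our own) builds X = G ×_1 A ×_2 B ×_3 C via mode-n products and checks the matricized form X_(1) = A G_(1) (C ⊗ B)^T from Table III.

import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product T x_n M: multiplies all mode-n fibers of T by the matrix M."""
    T = np.moveaxis(T, n, 0)
    shp = T.shape
    out = M @ T.reshape(shp[0], -1)
    return np.moveaxis(out.reshape((M.shape[0],) + shp[1:]), 0, n)

rng = np.random.default_rng(2)
I, J, K = 6, 5, 4
R1, R2, R3 = 3, 2, 2
G = rng.standard_normal((R1, R2, R3))              # dense core tensor
A = rng.standard_normal((I, R1))
B = rng.standard_normal((J, R2))
C = rng.standard_normal((K, R3))

# Tucker model (8): X = G x_1 A x_2 B x_3 C
X = mode_n_product(mode_n_product(mode_n_product(G, A, 0), B, 1), C, 2)

# Matrix form: X_(1) = A G_(1) (C kron B)^T, cf. Table III
X1 = X.transpose(0, 2, 1).reshape(I, K * J)
G1 = G.transpose(0, 2, 1).reshape(R1, R3 * R2)
print(np.allclose(X1, A @ G1 @ np.kron(C, B).T))   # True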
Figure 3: Tucker decomposition of a third-order tensor. The column spaces of A, B, C represent the signal subspaces for the three modes. The core tensor G is nondiagonal, accounting for possibly complex interactions among tensor components.

TABLE III: Different forms of CPD and Tucker representations of a third-order tensor X ∈ R^(I × J × K).

    Tensor representation, outer products:
        CPD:     X = ∑_{r=1}^R λ_r a_r ◦ b_r ◦ c_r
        Tucker:  X = ∑_{r_1=1}^{R_1} ∑_{r_2=1}^{R_2} ∑_{r_3=1}^{R_3} g_{r_1 r_2 r_3} a_{r_1} ◦ b_{r_2} ◦ c_{r_3}

    Tensor representation, multilinear products:
        CPD:     X = D ×_1 A ×_2 B ×_3 C
        Tucker:  X = G ×_1 A ×_2 B ×_3 C

    Matrix representations:
        CPD:     X_(1) = A D (C ⊙ B)^T,   X_(2) = B D (C ⊙ A)^T,   X_(3) = C D (B ⊙ A)^T
        Tucker:  X_(1) = A G_(1) (C ⊗ B)^T,   X_(2) = B G_(2) (C ⊗ A)^T,   X_(3) = C G_(3) (B ⊗ A)^T

    Vector representation:
        CPD:     vec(X) = (C ⊙ B ⊙ A) d
        Tucker:  vec(X) = (C ⊗ B ⊗ A) vec(G)

    Scalar representation:
        CPD:     x_{ijk} = ∑_{r=1}^R λ_r a_{ir} b_{jr} c_{kr}
        Tucker:  x_{ijk} = ∑_{r_1=1}^{R_1} ∑_{r_2=1}^{R_2} ∑_{r_3=1}^{R_3} g_{r_1 r_2 r_3} a_{i r_1} b_{j r_2} c_{k r_3}

    Matrix slices X_k = X(:, :, k):
        CPD:     X_k = A diag(c_{k1}, c_{k2}, ..., c_{kR}) B^T
        Tucker:  X_k = A ( ∑_{r_3=1}^{R_3} c_{k r_3} G(:, :, r_3) ) B^T

Links between CPD and Tucker decomposition. Eq. (7) shows that the Tucker decomposition can be considered as an expansion in rank-1 terms (polyadic but not necessarily canonical), while (4) represents CPD as a multilinear product of a core tensor and factor matrices (but the core is not necessarily minimal); Table III shows various other connections. However, despite the obvious interchangeability of notation, the CP and Tucker decompositions serve different purposes. In general, the Tucker core cannot be diagonalized, while the number of CPD terms may not be bounded by the multilinear rank. Consequently, in signal processing and data analysis, CPD is typically used for factorizing data into easy-to-interpret components (i.e., the rank-1 terms), while the goal of unconstrained Tucker decompositions is most often to compress data into a tensor of smaller size (i.e., the core tensor) or to find the subspaces spanned by the fibers (i.e., the column spaces of the factor matrices).

Uniqueness. The unconstrained Tucker decomposition is in general not unique, that is, the factor matrices B^(n) are rotation invariant. However, physically, the subspaces defined by the factor matrices in a Tucker decomposition are unique, while the bases in these subspaces may be chosen arbitrarily — their choice is compensated for within the core tensor. This becomes clear upon realizing that any factor matrix in (8) can be post-multiplied by any nonsingular (rotation) matrix; in turn, this multiplies the core tensor by its inverse, that is

    X = ⟦G; B^(1), B^(2), ..., B^(N)⟧ = ⟦H; B^(1) R^(1), B^(2) R^(2), ..., B^(N) R^(N)⟧,
    H = ⟦G; R^(1)^{−1}, R^(2)^{−1}, ..., R^(N)^{−1}⟧,    (9)

where the R^(n) are invertible.

Multilinear SVD (MLSVD). Orthonormal bases in a constrained Tucker representation can be obtained via the SVD of the mode-n matricized tensor X_(n) = U_n Σ_n V_n^T (i.e., B^(n) = U_n, n = 1, 2, ..., N). Due to the orthonormality, the corresponding core tensor becomes

    S = X ×_1 U_1^T ×_2 U_2^T ··· ×_N U_N^T.    (10)

Then, the singular values of X_(n) are the Frobenius norms of the corresponding slices of the core tensor S: (Σ_n)_{r_n, r_n} = ‖S(:, :, ..., r_n, :, ..., :)‖, with slices in the same mode being mutually orthogonal, i.e., their inner products are zero. The columns of U_n may thus be seen as multilinear singular vectors, while the norms of the slices of the core are multilinear singular values [13]. As in the matrix case, the multilinear singular values govern the multilinear rank, while the multilinear singular vectors allow, for each mode separately, an interpretation as in PCA [8].
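The MLSVD of (10) can be sketched with a few lines of NumPy (illustrative only): orthonormal mode-n bases are taken from the SVDs of the unfoldings, the core is formed by the multilinear product with their transposes, and a multilinear singular value is recovered as the Frobenius norm of the corresponding core slice.

import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((6, 5, 4))          # toy third-order data tensor

U = []
for n in range(3):
    Xn = np.moveaxis(X, n, 0).reshape(X.shape[n], -1)    # mode-n unfolding
    Un, _, _ = np.linalg.svd(Xn, full_matrices=False)    # orthonormal mode-n basis
    U.append(Un)

# Core tensor (10): S = X x_1 U1^T x_2 U2^T x_3 U3^T
S = np.einsum('ijk,ip,jq,kr->pqr', X, U[0], U[1], U[2])

# With full multilinear rank the MLSVD is exact: X = S x_1 U1 x_2 U2 x_3 U3
X_rec = np.einsum('pqr,ip,jq,kr->ijk', S, U[0], U[1], U[2])
print(np.allclose(X, X_rec))                # True

# First multilinear singular value of mode 1 = Frobenius norm of the first core slice
print(np.isclose(np.linalg.norm(S[0]),
                 np.linalg.svd(X.reshape(6, -1), compute_uv=False)[0]))   # True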
Figure 4: Multiway Component Analysis (MWCA) for a third-order tensor, assuming that the components are: principal and orthogonal in the first mode, nonnegative and sparse in the second mode and statistically independent in the third mode.
Low multilinear rank approximation. Analogous to PCA, a large-scale data tensor X can be approximated by discarding the multilinear singular vectors and slices of the core tensor that correspond to small multilinear singular values, that is, through truncated matrix SVDs. Low multilinear rank approximation is always well-posed; however, the truncation is not necessarily optimal in the LS sense, although a good estimate can often be made as the approximation error corresponds to the degree of truncation. When it comes to finding the best approximation, ALS type algorithms exhibit similar advantages and drawbacks to those used for CPD [8], [70]. Optimization-based algorithms exploiting second-order information have also been proposed [71], [72].

Constraints and Tucker-based multiway component analysis (MWCA). Besides orthogonality, constraints that may help to find unique basis vectors in a Tucker representation include statistical independence, sparsity, smoothness and nonnegativity [19], [73], [74]. Components of a data tensor seldom have the same properties in its modes, and for a physically meaningful representation different constraints may be required in different modes, so as to match the properties of the data at hand. Figure 4 illustrates the concept of MWCA and its flexibility in choosing the mode-wise constraints; a Tucker representation of MWCA naturally accommodates such diversities in different modes.

Other applications. We have shown that Tucker decomposition may be considered as a multilinear extension of PCA [8]; it therefore generalizes signal subspace techniques, with applications including classification, feature extraction, and subspace-based harmonic retrieval [25], [39], [75], [76]. For instance, a low multilinear rank approximation achieved through Tucker decomposition may yield a higher Signal-to-Noise Ratio (SNR) than the SNR in the original raw data tensor, making Tucker decomposition a very natural tool for compression and signal enhancement [7], [8], [24].

BLOCK TERM DECOMPOSITIONS

We have already shown that CPD is unique under quite mild conditions. A further advantage of tensors over matrices is that it is even possible to relax the rank-1 constraint on the terms, thus opening completely new possibilities in, e.g., BSS. For clarity, we shall consider the third-order case, whereby, by replacing the rank-1 matrices b_r^(1) ◦ b_r^(2) = b_r^(1) b_r^(2)T in (3) by low-rank matrices A_r B_r^T, the tensor X can be represented as (Figure 5, top):

    X = ∑_{r=1}^R (A_r B_r^T) ◦ c_r.    (11)

Figure 5 (bottom) shows that we can even use terms that are only required to have a low
multilinear rank (see also Section Tucker Decomposition), to give:

    X = ∑_{r=1}^R G_r ×_1 A_r ×_2 B_r ×_3 C_r.    (12)

These so-called Block Term Decompositions (BTD) admit the modelling of more complex signal components than CPD, and are unique under more restrictive but still fairly natural conditions [77]–[79].

Figure 5: Block Term Decompositions (BTDs) find data components that are structurally more complex than the rank-1 terms in CPD. Top: Decomposition into terms with multilinear rank (L_r, L_r, 1). Bottom: Decomposition into terms with multilinear rank (L_r, M_r, N_r).

Example 3. To compare some standard and tensor approaches for the separation of short-duration correlated sources, BSS was performed on five linear mixtures of the sources s_1(t) = sin(6πt) and s_2(t) = exp(10t) sin(20πt), which were contaminated by white Gaussian noise, to give the mixtures X = AS + E ∈ R^(5×60), where S(t) = [s_1(t), s_2(t)]^T and A ∈ R^(5×2) was a random matrix whose columns (mixing vectors) satisfy a_1^T a_2 = 0.1, ‖a_1‖ = ‖a_2‖ = 1. The 3 Hz sine wave did not complete a full period over the 60 samples, so that the two sources had a correlation degree of |s_1^T s_2| / (‖s_1‖_2 ‖s_2‖_2) = 0.35. The tensor approaches (CPD, Tucker decomposition and BTD) employed a third-order tensor X of size 24 × 37 × 5 generated from five Hankel matrices whose elements obey X(i, j, k) = X(k, i + j − 1) (see Section Tensorization and the code sketch below). The average squared angular error (SAE) was used as the performance measure. Figure 6 shows the simulation results, illustrating that:
• PCA failed since the mixing vectors were not orthogonal and the source signals were correlated, both violating the assumptions for PCA.
• ICA (using the JADE algorithm [10]) failed because the signals were not statistically independent, as assumed in ICA.
• Low-rank tensor approximation: a rank-2 CPD was used to estimate A as the third factor matrix, which was then inverted to yield the sources. The accuracy of CPD was compromised as the components of tensor X cannot be represented by rank-1 terms.
• Low multilinear rank approximation: Tucker decomposition (TKD) for the multilinear rank (4, 4, 2) was able to retrieve the column space of the mixing matrix but could not find the individual mixing vectors due to the non-uniqueness of TKD.
• BTD in multilinear rank-(2, 2, 1) terms matched the data structure [78], and it is remarkable that the sources were recovered using as few as 6 samples in the noise-free case.

Figure 6: Blind separation of the mixture of a pure sine wave and an exponentially modulated sine wave using PCA, ICA, CPD, Tucker decomposition (TKD) and BTD. The sources s_1 and s_2 are correlated and of short duration; the symbols ŝ_1 and ŝ_2 denote the estimated sources. (Panels: estimated waveforms over time in seconds, and SAE in dB versus SNR in dB.)
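To make the tensorization step of Example 3 concrete, the sketch below (illustrative; the time axis, noise level and random mixing matrix are our own choices, and the correlation constraint on the mixing vectors is not enforced) generates noisy mixtures and stacks five Hankel slices X(i, j, k) = X(k, i + j − 1) into a 24 × 37 × 5 tensor; the BTD fitting itself is not reproduced here.

import numpy as np

rng = np.random.default_rng(5)
t = np.linspace(0, 0.2, 60, endpoint=False)        # 60 samples (illustrative time axis)
S = np.stack([np.sin(6 * np.pi * t),
              np.exp(10 * t) * np.sin(20 * np.pi * t)])   # sources s1, s2 (2 x 60)
A = rng.standard_normal((5, 2))                    # random mixing matrix (5 x 2)
X = A @ S + 0.01 * rng.standard_normal((5, 60))    # noisy mixtures (5 x 60)

# Hankel tensorization of Example 3 (0-based indices): T[i, j, k] = X[k, i + j]
I, J = 24, 37
T = np.empty((I, J, 5))
for k in range(5):
    T[:, :, k] = np.array([[X[k, i + j] for j in range(J)] for i in range(I)])

print(T.shape)   # (24, 37, 5): five Hankel slices, one per mixture channel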
HIGHER-ORDER COMPRESSED SENSING

The aim of Compressed Sensing (CS) is to provide faithful reconstruction of a signal of interest when the set of available measurements is (much) smaller than the size of the original signal [80]–[83]. Formally, we have available M (compressive) data samples y ∈ R^M, which are assumed to be linear transformations of the original signal x ∈ R^I (M < I). In other words, y = Φx, where the sensing matrix Φ ∈ R^(M × I) is usually random. Since the projections are of a lower dimension than the original data, the reconstruction is an ill-posed inverse problem whose solution requires knowledge of the physics of the problem converted into constraints. For example, a 2D image X ∈ R^(I_1 × I_2) can be vectorized as a long vector x = vec(X) ∈ R^I (I = I_1 I_2) that admits a sparse representation in a known dictionary B ∈ R^(I × I), so that x = Bg, where the matrix B may be a wavelet or discrete cosine transform (DCT) dictionary. Then, faithful recovery of the original signal x requires finding the sparsest vector g such that:

    y = W g, with ‖g‖_0 ≤ K, W = ΦB,    (13)

where ‖·‖_0 is the ℓ0-norm (number of non-zero entries) and K ≪ I.

Since ℓ0-norm minimization is not practical, alternative solutions involve iterative refinements of the estimates of vector g using greedy algorithms such as the Orthogonal Matching Pursuit (OMP) algorithm, or ℓ1-norm minimization algorithms (‖g‖_1 = ∑_{i=1}^I |g_i|) [83]. Low coherence of the composite dictionary matrix W is a prerequisite for a satisfactory recovery of g (and hence x) — we need to choose Φ and B so that the correlation between the columns of W is minimum [83].

When extending the CS framework to tensor data, we face two obstacles:
• Loss of information, such as spatial and contextual relationships in data, when a tensor X ∈ R^(I_1 × I_2 × ··· × I_N) is vectorized.
• Data handling, since the size of vectorized data and the associated dictionary B ∈ R^(I × I) easily becomes prohibitively large (see Section Curse of Dimensionality), especially for tensors of high order.

Fortunately, tensor data are typically highly structured – a perfect match for compressive sampling – so that the CS framework relaxes data acquisition requirements, enables compact storage, and facilitates data completion (inpainting of missing samples due to a broken sensor or unreliable measurement).

Kronecker-CS for fixed dictionaries. In many applications, the dictionary and the sensing matrix admit a Kronecker structure (Kronecker-CS model), as illustrated in Figure 7 (top) [84]. In this way, the global composite dictionary matrix becomes W = W^(N) ⊗ W^(N−1) ⊗ ··· ⊗ W^(1), where each term W^(n) = Φ^(n) B^(n) has a reduced dimensionality since B^(n) ∈ R^(I_n × I_n) and Φ^(n) ∈ R^(M_n × I_n). Denote M = M_1 M_2 ··· M_N and I = I_1 I_2 ··· I_N; since M_n ≤ I_n, n = 1, 2, ..., N, this reduces storage requirements by a factor of (∑_n I_n M_n)/(I M). The computation of Wg is affordable since g is sparse; computing W^T y is expensive, but can be efficiently implemented through a sequence of products involving much smaller matrices W^(n) [85]. We refer to [84] for links between the coherence of the factors W^(n) and the coherence of the global composite dictionary matrix W.

Figure 7: Compressed sensing with a Kronecker-structured dictionary. Top: Vector representation. Bottom: Tensor representation; Orthogonal Matching Pursuit (OMP) can perform faster if the sparse entries belong to a small subtensor, up to permutation of the columns of W^(1), W^(2), W^(3).

Figure 7 and Table III illustrate that the Kronecker-CS model is effectively a vectorized Tucker decomposition with a sparse core. The tensor equivalent of the CS paradigm in (13) is therefore to find the sparsest core tensor G such that:

    Y ≅ G ×_1 W^(1) ×_2 W^(2) ··· ×_N W^(N),    (14)

with ‖G‖_0 ≤ K, for a given set of mode-wise dictionaries B^(n) and sensing matrices Φ^(n) (n = 1, 2, ..., N). Working with several small dictionary matrices, appearing in a Tucker representation, instead of a large global dictionary matrix is an example of the use of tensor structure for efficient representation; see also Section Curse of Dimensionality.
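The efficiency argument above can be illustrated with a small NumPy sketch (our own, with hypothetical sizes): the measurements in (14) are computed through three small mode-wise products and shown to agree with the huge Kronecker-structured dictionary acting on vec(G).

import numpy as np

rng = np.random.default_rng(6)
I = (8, 7, 6)                      # mode sizes of the (small) illustrative signal tensor
M = (4, 3, 3)                      # numbers of compressive measurements per mode
W = [rng.standard_normal((M[n], I[n])) for n in range(3)]   # W(n) = Phi(n) B(n), here random

G = rng.standard_normal(I)         # stand-in for the (sparse) core / signal tensor

def mode_n_product(T, Mat, n):
    T = np.moveaxis(T, n, 0)
    out = Mat @ T.reshape(T.shape[0], -1)
    return np.moveaxis(out.reshape((Mat.shape[0],) + T.shape[1:]), 0, n)

# Measurements via small mode-wise products, cf. Eq. (14) ...
Y = mode_n_product(mode_n_product(mode_n_product(G, W[0], 0), W[1], 1), W[2], 2)

# ... agree with the huge Kronecker-structured dictionary acting on vec(G)
W_big = np.kron(W[2], np.kron(W[1], W[0]))                  # (M1 M2 M3) x (I1 I2 I3)
print(np.allclose(Y.reshape(-1, order='F'),
                  W_big @ G.reshape(-1, order='F')))        # True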
A higher-order extension of the OMP algorithm, referred to as the Kronecker-OMP algorithm [85], requires K iterations to find the K non-zero entries of the core tensor G. Additional computational advantages can be gained if it can be assumed that the K non-zero entries belong to a small subtensor of G, as shown in Figure 7 (bottom); such a structure is inherent to, e.g., hyperspectral imaging [85], [86] and 3D astrophysical signals. More precisely, if the K = L^N non-zero entries are located within a subtensor of size (L × L × ··· × L), where L ≪ I_n, then the so-called N-way Block OMP algorithm (N-BOMP) requires at most NL iterations, which is linear in N [85]. The Kronecker-CS model has been applied in Magnetic Resonance Imaging (MRI), hyperspectral imaging, and in the inpainting of multiway data [84], [86].

Approaches without fixed dictionaries. In Kronecker-CS the mode-wise dictionaries B^(n) ∈ R^(I_n × I_n) can be chosen so as to best represent physical properties or prior knowledge about the data. They can also be learned from a large ensemble of data tensors, for instance in an ALS-type fashion [86]. Instead of the total number of sparse entries in the core tensor, the size of the core (i.e., the multilinear rank) may be used as a measure of sparsity, so as to obtain a low-complexity representation from compressively sampled data [87], [88]. Alternatively, a PD representation can be used instead of a Tucker representation. Indeed, early work in chemometrics involved excitation-emission data for which part of the entries was unreliable because of scattering; the CPD of the data tensor is then computed by treating such entries as missing [7]. While CS variants of several CPD algorithms exist [59], [89], the "oracle" properties of tensor-based models are still not as well understood as for their standard models; a notable exception is CPD with sparse factors [90].

Example 2. Figure 8 shows an original 3D (1024 × 1024 × 32) hyperspectral image X which contains scene reflectance measured at 32 different frequency channels, acquired by a low-noise Peltier-cooled digital camera in the wavelength range of 400–720 nm [91]. Within the Kronecker-CS setting, the tensor of compressive measurements Y was obtained by multiplying the frontal slices by random Gaussian sensing matrices Φ^(1) ∈ R^(M_1 × 1024) and Φ^(2) ∈ R^(M_2 × 1024) (M_1, M_2 < 1024) in the first and second modes, respectively, while Φ^(3) ∈ R^(32 × 32) was the identity matrix (see Figure 8 (top)). We used Daubechies wavelet factor matrices B^(1) = B^(2) ∈ R^(1024 × 1024) and B^(3) ∈ R^(32 × 32), and employed N-BOMP to recover the small sparse core tensor and, subsequently, reconstruct the original 3D image, as shown in Figure 8 (bottom). For the sampling ratio SP = 33% (M_1 = M_2 = 585) this gave a Peak Signal-to-Noise Ratio (PSNR) of 35.51 dB, while taking 71 minutes to compute the required N_iter = 841 sparse entries. For the same quality of reconstruction (PSNR = 35.51 dB), the more conventional Kronecker-OMP algorithm found 0.1% of the wavelet coefficients to be significant, thus requiring N_iter = K = 0.001 × (1024 × 1024 × 32) = 33,555 iterations and days of computation time.

Figure 8: Multidimensional compressed sensing of a 3D hyperspectral image using a Tucker representation with a small sparse core in wavelet bases. (Panels: the original 1024 × 1024 × 32 hyperspectral image and its reconstruction for SP = 33%, PSNR = 35.51 dB, both shown as RGB displays.)

LARGE-SCALE DATA AND CURSE OF DIMENSIONALITY

The sheer size of tensor data easily exceeds the memory or saturates the processing capability of standard computers; it is therefore natural to ask how tensor decompositions can
be computed if the tensor dimensions in all or some modes are large or, worse still, if the tensor order is high. The term curse of dimensionality was, in a general sense, introduced by Bellman to refer to various computational bottlenecks when dealing with high-dimensional settings. In the context of tensors, the curse of dimensionality refers to the fact that the number of elements of an Nth-order (I × I × ··· × I) tensor, I^N, scales exponentially with the tensor order N. For example, the number of values of a discretized function in Figure 1 (bottom) quickly becomes unmanageable in terms of both computation and storage as N increases. In addition to their standard use (signal separation, enhancement, etc.), tensor decompositions may be elegantly employed in this context as efficient representation tools. The first question is then which type of tensor decomposition is appropriate.

Efficient data handling. If all computations are performed on a CP representation and not on the raw data tensor itself, then instead of the original I^N raw data entries, the number of parameters in a CP representation reduces to NIR, which scales linearly in N (see Table IV). This effectively bypasses the curse of dimensionality, while giving us the freedom to choose the rank R as a function of the desired accuracy [14]; on the other hand, the CP approximation may involve numerical problems (see Section Canonical Polyadic Decomposition).

Compression is also inherent to Tucker decomposition, as it reduces the size of a given data tensor from the original I^N to (NIR + R^N), thus exhibiting an approximate compression ratio of (I/R)^N. We can then benefit from the well understood and reliable approximation by means of the matrix SVD; however, this is only meaningful for low N.

TABLE IV: Storage complexities of tensor models for an Nth-order tensor X ∈ R^(I × I × ··· × I), whose original storage complexity is O(I^N).

    1. CPD     O(NIR)
    2. Tucker  O(NIR + R^N)
    3. TT      O(NIR^2)
    4. QTT     O(NR^2 log_2(I))

Tensor networks. A numerically reliable way to tackle the curse of dimensionality is through a concept from scientific computing and quantum information theory, termed tensor networks, which represents a tensor of a possibly very high order as a set of sparsely interconnected matrices and core tensors of low order (typically, order 3). These low-dimensional cores are interconnected via tensor contractions to provide a highly compressed representation of a data tensor. In addition, existing algorithms for the approximation of a given tensor by a tensor network have good numerical properties, making it possible to control the error and achieve any desired accuracy of approximation. For example, tensor networks allow for the representation of a wide class of discretized multivariate functions even in cases where the number of function values is larger than the number of atoms in the universe [21], [27], [28].

Figure 9: Tensor Train (TT) decomposition of a fifth-order tensor X ∈ R^(I_1 × I_2 × ··· × I_5), consisting of two matrix carriages and three third-order tensor carriages. The five carriages are connected through tensor contractions, which can be expressed in a scalar form as x_{i_1, i_2, i_3, i_4, i_5} = ∑_{r_1=1}^{R_1} ∑_{r_2=1}^{R_2} ··· ∑_{r_4=1}^{R_4} a_{i_1, r_1} g^(1)_{r_1, i_2, r_2} g^(2)_{r_2, i_3, r_3} g^(3)_{r_3, i_4, r_4} b_{r_4, i_5}.

Examples of tensor networks are the hierarchical Tucker (HT) decompositions and Tensor Trains (TT) (see Figure 9) [15], [16]. The TTs are also known as Matrix Product States (MPS) and have been used by physicists for more than two decades (see [92], [93] and references therein). The PARATREE algorithm was developed in signal processing and follows a similar idea; it uses a polyadic representation of a data tensor (in a possibly nonminimal number of terms), whose computation then requires only the matrix SVD [94].

For very large-scale data that exhibit a well-defined structure, an even more radical approach can be employed to achieve a parsimonious representation — through the concept of quantized or quantic tensor networks (QTN) [27], [28]. For example, a huge vector x ∈ R^I with I = 2^L elements can be quantized and tensorized through reshaping into a (2 × 2 × ··· × 2) tensor X of order L, as illustrated in Figure 1 (top). If x is an exponential signal, x(k) = a z^k, then X is a symmetric rank-1 tensor that can be represented by two parameters: the scaling factor a and the generator z (cf. (2) in Section Tensorization). Non-symmetric terms provide further opportunities, beyond the sum-of-exponentials representation by symmetric low-rank tensors. Huge matrices and tensors may be dealt with in the same manner. For instance, an Nth-order tensor X ∈ R^(I_1 × ··· × I_N), with I_n = q^{L_n}, can be quantized in all modes simultaneously to yield a (q × q × ··· × q) quantized tensor of higher order. In QTN, q is small, typically q = 2, 3, 4; for example, the binary encoding (q = 2) reshapes an Nth-order tensor with (2^{L_1} × 2^{L_2} × ··· × 2^{L_N}) elements into a tensor of order (L_1 + L_2 + ··· + L_N) with the same number of elements. The tensor train decomposition applied to quantized tensors is referred to as the quantized TT (QTT); variants for other tensor representations have also been derived [27], [28]. In scientific computing, such formats provide the so-called super-compression — a logarithmic reduction of storage requirements: O(I^N) → O(N log_q(I)).
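A minimal sketch of quantized tensorization (with illustrative values of a, z and L): a length-2^L exponential vector is reshaped into an order-L (2 × 2 × ··· × 2) tensor, which is verified to be rank-1 and hence fully described by the two parameters a and z.

import numpy as np

a, z, L = 1.5, 0.97, 10
x = a * z ** np.arange(2 ** L)                 # vector of length 2^L = 1024

# Quantize: reshape into an order-L (2 x 2 x ... x 2) tensor (cf. Figure 1, top)
X = x.reshape((2,) * L, order='F')             # index of mode 1 varies fastest

# For an exponential signal the quantized tensor is rank-1:
# X(i_1, ..., i_L) = a * prod_n z^(i_n * 2^(n-1)), with i_n in {0, 1}
factors = [np.array([1.0, z ** (2 ** n)]) for n in range(L)]
X_rank1 = a * np.einsum('a,b,c,d,e,f,g,h,i,j->abcdefghij', *factors)
print(np.allclose(X, X_rank1))                 # True: two parameters (a, z) suffice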
Figure 10: Efficient computation of the CP and Tucker decompositions, whereby tensor decompositions are computed in parallel for sampled blocks; these are then merged to obtain the global components A, B, C and a core tensor G.

Computation of the decomposition/representation. Now that we have addressed the possibilities for efficient tensor representation, the question that needs to be answered is how these representations can be computed from the data in an efficient manner. The first approach is to process the data in smaller blocks rather than in a batch manner [95]. In such a "divide-and-conquer" approach, different blocks may be processed in parallel and their decompositions carefully recombined (see Figure 10) [95], [96]. In fact, we may even compute the decomposition through recursive updating, as new data arrive [97]. Such recursive techniques may be used for efficient computation and for tracking decompositions in the case of nonstationary data.

The second approach would be to employ compressed sensing ideas (see Section Higher-Order Compressed Sensing) to fit an algebraic model with a limited number of parameters to possibly large data. In addition to completion, the goal here is a significant reduction of the cost of data acquisition, manipulation and storage — breaking the Curse of Dimensionality being an extreme case.

While algorithms for this purpose are available both for low-rank and low multilinear rank representations [59], [87], an even more drastic approach would be to directly adopt sampled fibers as the bases in a tensor representation. In the Tucker decomposition setting we would choose the columns of the factor matrices B^(n) as mode-n fibers of the tensor, which requires addressing the following two problems: (i) how to find fibers that allow us to best represent the tensor, and (ii) how to compute the corresponding core tensor at a low cost (i.e., with minimal access to the data). The matrix counterpart of this problem (i.e., representation of a large matrix on the basis of a few columns and rows) is referred to as the pseudoskeleton approximation [98], where the optimal representation corresponds to the columns and rows that intersect in the submatrix of maximal volume (maximal absolute value of the determinant). Finding the optimal submatrix is computationally hard, but quasi-optimal submatrices may be found by heuristic so-called "cross-approximation" methods that only require a limited, partial exploration of the data matrix. Tucker variants of this approach have
15

123%&45&6$78&$9:4;<2=&
C(3) !
!"#$%&'()& T R
P
S
>$;<=&#?2@?1&$&A9=3 prT
*'(+&,'(+&
X @ T =
-./+&0&
-
-./+ tr
C(1) r =1
(I ´ N ) (I ´ R ) (R ´ N )
G
Y ∼
=
C(2)
QT R
Figure 11: Tucker representation through fiber sam-
pling and cross-approximation: the columns of factor
Y @ U = S
r =1 ur
qrT

matrices are sampled from the fibers of the original (I ´ M ) (I ´ R) (R ´ M )


data tensor X. Within MWCA the selected fibers may
be further processed using BSS algorithms. Figure 12: The basic PLS model performs joint
sequential low-rank approximation of the matrix of
predictors X and the matrix of responses Y, so as
to share (up to the scaling ambiguity) the latent
been derived in [99]–[101] and are illustrated in components — columns of the score matrices T and
Figure 11, while cross-approximation for the TT U. The matrices P and Q are the loading matrices
for predictors and responses, and E and F are the
format has been derived in [102]. Following a
corresponding residual matrices.
somewhat different idea, a tensor generalization
of the CUR decomposition of matrices samples
fibers on the basis of statistics derived from the
contraction of the data matrix X to the princi-
data [103].
MULTIWAY REGRESSION — HIGHER ORDER PLS (HOPLS)

Multivariate regression. Regression refers to the modelling of one or more dependent variables (responses), Y, by a set of independent data (predictors), X. In the simplest case of conditional MSE estimation, ŷ = E(y|x), the response y is a linear combination of the elements of the vector of predictors x; for multivariate data the Multivariate Linear Regression (MLR) uses a matrix model, Y = XP + E, where P is the matrix of coefficients (loadings) and E the residual matrix. The MLR solution gives P = (X^T X)^{-1} X^T Y, and involves inversion of the moment matrix X^T X. A common technique to stabilize the inverse of the moment matrix X^T X is principal component regression (PCR), which employs a low-rank approximation of X.

Modelling structure in data — the PLS. Notice that in stabilizing multivariate regression PCR uses only information in the X-variables, with no feedback from the Y-variables. The idea behind the Partial Least Squares (PLS) method is to account for structure in data by assuming that the underlying system is governed by a small number, R, of specifically constructed latent variables, called scores, that are shared between the X- and Y-variables; in estimating the number R, PLS compromises between fitting X and predicting Y. Figure 12 illustrates the PLS procedure, which (i) uses eigenanalysis to perform contraction of the data matrix X to the principal eigenvector score matrix T = [t_1, ..., t_R] of rank R, and (ii) ensures that the t_r components are maximally correlated with the u_r components in the approximation of the responses Y; this is achieved when the u_r's are scaled versions of the t_r's. The Y-variables are then regressed on the matrix U = [u_1, ..., u_R]. Therefore, PLS is a multivariate model with inferential ability that aims to find a representation of X (or a part of X) that is relevant for predicting Y, using the model

\mathbf{X} = \mathbf{T}\mathbf{P}^{T} + \mathbf{E} = \sum_{r=1}^{R} \mathbf{t}_{r}\mathbf{p}_{r}^{T} + \mathbf{E},   (15)

\mathbf{Y} = \mathbf{U}\mathbf{Q}^{T} + \mathbf{F} = \sum_{r=1}^{R} \mathbf{u}_{r}\mathbf{q}_{r}^{T} + \mathbf{F}.   (16)

The score vectors t_r provide an LS fit of the X-data, while at the same time the maximum correlation between t- and u-scores ensures a good predictive model for the Y-variables. The predicted responses Y_new are then obtained from new data X_new and the loadings P and Q.

In practice, the score vectors t_r are extracted sequentially, by a series of orthogonal projections followed by the deflation of X. Since the rank of Y is not necessarily decreased with each new t_r, we may continue deflating until the rank of the X-block is exhausted, so as to balance between prediction accuracy and model order.

Figure 12: The basic PLS model performs joint sequential low-rank approximation of the matrix of predictors X and the matrix of responses Y, so as to share (up to the scaling ambiguity) the latent components — columns of the score matrices T and U. The matrices P and Q are the loading matrices for predictors and responses, and E and F are the corresponding residual matrices.
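The sequential extraction-and-deflation procedure can be sketched in a few lines of NumPy. The following NIPALS-style loop is a generic textbook variant, given here only to illustrate the mechanics of scores, loadings and deflation; the variable names and the simple stopping rule are illustrative assumptions, not a reference implementation of any particular toolbox.

```python
import numpy as np

def pls_nipals(X, Y, R, n_iter=100, tol=1e-10):
    """Sequential (NIPALS-style) PLS: extract R score/loading pairs with deflation of X."""
    X, Y = X.copy(), Y.copy()
    T, U, P, Q, W = [], [], [], [], []
    for _ in range(R):
        u = Y[:, [0]]                          # initial guess for the Y-score
        for _ in range(n_iter):
            w = X.T @ u
            w /= np.linalg.norm(w)             # X-weights (unit norm)
            t = X @ w                          # X-score
            q = Y.T @ t
            q /= np.linalg.norm(q)             # Y-weights
            u_new = Y @ q                      # Y-score, maximally correlated with t
            if np.linalg.norm(u_new - u) < tol:
                u = u_new
                break
            u = u_new
        p = X.T @ t / (t.T @ t)                # X-loading
        c = Y.T @ t / (t.T @ t)                # Y-regression loading
        X -= t @ p.T                           # deflation of X
        Y -= t @ c.T                           # remove the part of Y explained by t
        T.append(t); U.append(u); P.append(p); Q.append(q); W.append(w)
    return [np.hstack(M) for M in (T, U, P, Q, W)]

# Toy usage: Y depends on 2 latent components of X
rng = np.random.default_rng(1)
T_true = rng.standard_normal((500, 2))
X = T_true @ rng.standard_normal((2, 10)) + 0.01 * rng.standard_normal((500, 10))
Y = T_true @ rng.standard_normal((2, 3)) + 0.01 * rng.standard_normal((500, 3))
T, U, P, Q, W = pls_nipals(X, Y, R=2)
```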
The PLS concept can be generalized to tensors in the following ways:
1) By unfolding multiway data. For example, X (I × J × K) and Y (I × M × N) can be flattened into long matrices X (I × JK) and Y (I × MN), so as to admit matrix-PLS (see Figure 12 and the sketch following this list). However, the flattening prior to standard bilinear PLS obscures structure in multiway data and compromises the interpretation of latent components.
2) By low rank tensor approximation. The so-called N-PLS attempts to find score vectors having maximal covariance with response variables, under the constraints that tensors X and Y are decomposed as a sum of rank-one tensors [104].
3) By a BTD-type approximation, as in the Higher Order PLS (HOPLS) model shown in Figure 13 [105]. The use of block terms within HOPLS equips it with additional flexibility, together with a more realistic analysis than unfolding-PLS and N-PLS.
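A minimal sketch of option 1), assuming scikit-learn's PLSRegression is available: the tensors are simply reshaped (unfolded) along the shared sample mode, and standard bilinear PLS is applied to the resulting matrices. The shapes below are arbitrary illustrative values.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

# Suppose X has shape (I, J, K) and Y has shape (I, M, N), with the first
# (sample) mode shared. Unfolding-PLS flattens all non-sample modes.
rng = np.random.default_rng(0)
I, J, K, M, N = 100, 8, 16, 3, 4
X = rng.standard_normal((I, J, K))
Y = rng.standard_normal((I, M, N))

X_flat = X.reshape(I, J * K)      # X(I x JK)
Y_flat = Y.reshape(I, M * N)      # Y(I x MN)

pls = PLSRegression(n_components=5)            # standard bilinear PLS on the unfolded data
pls.fit(X_flat, Y_flat)
Y_hat = pls.predict(X_flat).reshape(I, M, N)   # fold the prediction back into a tensor
```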
The principle of HOPLS can be formalized as a set of sequential approximate decompositions of the independent tensor X ∈ R^{I1×I2×···×IN} and the dependent tensor Y ∈ R^{J1×J2×···×JM} (with I1 = J1), so as to ensure maximum similarity (correlation) between the scores t_r and u_r within the loadings matrices T and U, based on

\mathbf{X} \cong \sum_{r=1}^{R} \mathbf{G}_{X}^{(r)} \times_{1} \mathbf{t}_{r} \times_{2} \mathbf{P}_{r}^{(1)} \cdots \times_{N} \mathbf{P}_{r}^{(N-1)},   (17)

\mathbf{Y} \cong \sum_{r=1}^{R} \mathbf{G}_{Y}^{(r)} \times_{1} \mathbf{u}_{r} \times_{2} \mathbf{Q}_{r}^{(1)} \cdots \times_{N} \mathbf{Q}_{r}^{(M-1)}.   (18)

Figure 13: The principle of Higher Order PLS (HOPLS) for third-order tensors. The core tensors G_X and G_Y are block-diagonal. The BTD-type structure allows for the modelling of general components that are highly correlated in the first mode.

A number of data-analytic problems can be reformulated as either regression or "similarity analysis" (ANOVA, ARMA, LDA, CCA), so that both the matrix and tensor PLS solutions can be generalized across exploratory data analysis.
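To make the block-term structure of (17) concrete, the following sketch synthesizes a third-order tensor from R rank-(1, L2, L3) terms using a generic mode-n product. It only illustrates the form of the model; it is not the HOPLS fitting algorithm of [105].

```python
import numpy as np

def mode_n_product(T, M, n):
    """Mode-n product T x_n M: contracts mode n of tensor T with the columns of matrix M."""
    out = np.tensordot(M, T, axes=(1, n))   # the new dimension comes out first
    return np.moveaxis(out, 0, n)           # move it back to position n

# Synthesize a third-order tensor from R = 3 block terms, as in Eq. (17):
# X ~ sum_r G^(r) x_1 t_r x_2 P_r^(1) x_3 P_r^(2), with G^(r) of size (1, L2, L3).
rng = np.random.default_rng(0)
I1, I2, I3, L2, L3, R = 50, 20, 30, 4, 5, 3
X = np.zeros((I1, I2, I3))
for r in range(R):
    G = rng.standard_normal((1, L2, L3))    # core of the r-th block term
    t = rng.standard_normal((I1, 1))        # score vector (mode-1 factor)
    P1 = rng.standard_normal((I2, L2))      # mode-2 loading
    P2 = rng.standard_normal((I3, L3))      # mode-3 loading
    term = mode_n_product(G, t, 0)
    term = mode_n_product(term, P1, 1)
    term = mode_n_product(term, P2, 2)
    X += term
print(X.shape)   # (50, 20, 30)
```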
Example 4: Decoding of a 3D hand movement trajectory from the electrocorticogram (ECoG). The predictive power of tensor-based PLS is illustrated on a real-world example of the prediction of arm movement trajectory from ECoG. Fig. 14 (left) illustrates the experimental setup, whereby 3D arm movement of a monkey was captured by an optical motion capture system with reflective markers affixed to the left shoulder, elbow, wrist, and hand; for full detail see (http://neurotycho.org). The predictors (32 ECoG channels) naturally build a fourth-order tensor X (time × channel no × epoch length × frequency), while the movement trajectories for the four markers (response) can be represented as a third-order tensor Y (time × 3D marker position × marker no). The goal of the training stage is to identify the HOPLS parameters G_X^(r), G_Y^(r), P_r^(n), Q_r^(n), see also Figure 13. In the test stage, the movement trajectories, Y*, for the new ECoG data, X*, are predicted through multilinear projections: (i) the new scores, t_r*, are found from the new data, X*, and the existing model parameters G_X^(r), P_r^(1), P_r^(2), P_r^(3); (ii) the predicted trajectory is calculated as Y* ≈ ∑_{r=1}^{R} G_Y^(r) ×_1 t_r* ×_2 Q_r^(1) ×_3 Q_r^(2) ×_4 Q_r^(3). In the simulations, standard PLS was applied in the same way to the unfolded tensors.

Figure 14: Prediction of arm movement from brain electrical responses. Left: Experiment setup. Middle: Construction of the data and response tensors and training. Right: The new data tensor (bottom) and the predicted 3D arm movement trajectories (X, Y, Z coordinates) obtained by tensor-based HOPLS and standard matrix-based PLS (top).
Figure 14 (right) shows that although the standard PLS was able to predict the movement corresponding to each marker individually, such prediction is quite crude as the two-way PLS does not adequately account for mutual information among the four markers. The enhanced predictive performance of the BTD-based HOPLS (red line in Fig. 14 (right)) is therefore attributed to its ability to model interactions between complex latent components of both predictors and responses.

LINKED MULTIWAY COMPONENT ANALYSIS AND TENSOR DATA FUSION

Data fusion concerns joint analysis of an ensemble of data sets, such as multiple "views" of a particular phenomenon, where some parts of the "scene" may be visible in only one or a few data sets. Examples include fusion of visual and thermal images in low visibility conditions, or the analysis of human electrophysiological signals in response to a certain stimulus but from different subjects and trials; these are naturally analyzed together by means of matrix/tensor factorizations. The "coupled" nature of the analysis of multiple datasets ensures that there may be common factors across the datasets, and that some components are not shared (e.g., processes that are independent of excitations or stimuli/tasks).

The linked multiway component analysis (LMWCA) [106], shown in Figure 15, performs such decomposition into shared and individual factors, and is formulated as a set of approximate joint Tucker decompositions of a set of data tensors X^(k) ∈ R^{I1×I2×···×IN}, (k = 1, 2, ..., K):

\mathbf{X}^{(k)} \cong \mathbf{G}^{(k)} \times_{1} \mathbf{B}^{(1,k)} \times_{2} \mathbf{B}^{(2,k)} \cdots \times_{N} \mathbf{B}^{(N,k)},   (19)

where each factor matrix B^(n,k) = [B_C^(n), B_I^(n,k)] ∈ R^{In×Rn} has: (i) components B_C^(n) ∈ R^{In×Cn} (with 0 ≤ Cn ≤ Rn) that are common (i.e., maximally correlated) to all tensors, and (ii) components B_I^(n,k) ∈ R^{In×(Rn−Cn)} that are tensor-specific. The objective is to estimate the common components B_C^(n), the individual components B_I^(n,k), and, via the core tensors G^(k), their mutual interactions. As in MWCA (see Section Tucker Decomposition), constraints may be imposed to match data properties [73], [76]. This enables a more general and flexible framework than group ICA and Independent Vector Analysis, which also perform linked analysis of multiple data sets but assume that: (i) there exist only common components
and (ii) the corresponding latent variables are statistically independent [107], [108], both quite stringent and limiting assumptions. As an alternative to Tucker decompositions, coupled tensor decompositions may be of a polyadic or even block term type [89], [109].

Figure 15: Coupled Tucker decomposition for linked multiway component analysis (LMWCA). The data tensors have both shared and individual components. Constraints such as orthogonality, statistical independence, sparsity and non-negativity may be imposed where appropriate.
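The following sketch illustrates the notion of shared versus individual mode-1 subspaces across several tensors. The "estimation" step, an SVD of the concatenated mode-1 unfoldings, is a deliberately crude stand-in chosen for brevity; it is not the LMWCA algorithm of [106], and all variable names and sizes are illustrative.

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: the chosen mode becomes the rows of the resulting matrix."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

# K tensors share C1 = 2 mode-1 components and each has 2 individual ones.
rng = np.random.default_rng(0)
I1, I2, I3, K, C1 = 60, 20, 25, 3, 2
B_common = np.linalg.qr(rng.standard_normal((I1, C1)))[0]   # shared mode-1 subspace
tensors = []
for _ in range(K):
    B_indiv = rng.standard_normal((I1, 2))                   # tensor-specific components
    core = rng.standard_normal((C1 + 2, I2, I3))
    B = np.concatenate([B_common, B_indiv], axis=1)
    tensors.append(np.einsum('ir,rjk->ijk', B, core))

# Crude estimate of the shared subspace: leading left singular vectors of the
# concatenated unfoldings (shared directions accumulate energy over all K sets).
stacked = np.concatenate([unfold(T, 0) for T in tensors], axis=1)
U, s, _ = np.linalg.svd(stacked, full_matrices=False)
B_hat = U[:, :C1]

# Cosines of the principal angles between true and estimated shared subspaces
# (typically close to 1 in this toy setting).
print(np.linalg.svd(B_common.T @ B_hat, compute_uv=False))
```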
Example 5: Feature extraction and classification of objects using LMWCA. Classification based on common and distinct features of natural objects from the ETH-80 database (http://www.d2.mpi-inf.mpg.de/Datasets) was performed using LMWCA, whereby the discrimination among objects was performed using only the common features. This dataset consists of 3280 images in 8 categories, each containing 10 objects with 41 views per object. For each category, the training data were organized in two distinct fourth-order (128 × 128 × 3 × I4) tensors, where I4 = 10 × 41 × 0.5p, with p the fraction of training data. LMWCA was applied to these two tensors to find the common and individual features, with the number of common features set to 80% of I4. In this way, eight sets of common features were obtained, one for each category. The test sample label was assigned to the category whose common features matched the new sample best (evaluated by canonical correlations) [110]. Figure 16 shows the results over 50 Monte Carlo runs and compares LMWCA with the standard K-NN and LDA classifiers, the latter using 50 principal components as features. The enhanced classification results for LMWCA are attributed to the fact that the classification only makes use of the common components and is not hindered by components that are not shared across objects or views.

Figure 16: Classification of color objects belonging to different categories. Due to using only common features, LMWCA achieves a high classification rate, even when the training set is small. (The performance comparison plots classification accuracy (%) against the size of the training data (%) for LMWCA, KNN-PCA and LDA-PCA.)
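A sketch of the matching step used in this kind of classification: each class is represented by an orthonormalized basis of its common features, and a test sample is assigned to the class whose subspace is best aligned with it, measured by canonical correlations (cosines of principal angles). The feature-extraction stage and the exact decision rule of [110] are not reproduced; the function names and toy data are assumptions for illustration.

```python
import numpy as np

def canonical_correlations(A, B):
    """Canonical correlations between the column spaces of A and B (cosines of principal angles)."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False)

def classify(test_features, common_features_per_class):
    """Assign the label of the class whose common-feature subspace is best aligned with the test features."""
    scores = [canonical_correlations(test_features, C).max()
              for C in common_features_per_class]
    return int(np.argmax(scores))

# Toy usage with two classes in a 50-dimensional feature space
rng = np.random.default_rng(0)
common = [np.linalg.qr(rng.standard_normal((50, 5)))[0] for _ in range(2)]
test = common[1][:, :3] + 0.05 * rng.standard_normal((50, 3))   # noisy sample aligned with class 1
print(classify(test, common))   # -> 1
```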
SOFTWARE

The currently available software resources for tensor decompositions include:

• The Tensor Toolbox, a versatile framework for basic operations on sparse and dense tensors, including CPD and Tucker formats [111].
• The TDALAB and TENSORBOX, which provide a user-friendly interface and advanced algorithms for CPD, nonnegative Tucker decomposition and MWCA [112], [113].
• The Tensorlab toolbox, which builds upon the complex optimization framework and offers numerical algorithms for computing the CPD, BTD and Tucker decompositions. The toolbox includes a library of constraints (e.g., nonnegativity, orthogonality) and the possibility to combine and jointly factorize dense, sparse and incomplete tensors [89].
• The N-Way Toolbox, which includes (constrained) CPD, Tucker decomposition and PLS in the context of chemometrics applications [114]. Many of these methods can handle constraints (e.g., nonnegativity, orthogonality) and missing elements.
• The TT Toolbox, the Hierarchical Tucker Toolbox and the Tensor Calculus library, which provide tensor tools for scientific computing [115]–[117].
• Code developed for multiway analysis is also available from the Three-Mode Company [118].

CONCLUSIONS AND FUTURE DIRECTIONS

We live in a world overwhelmed by data, from multiple pictures of Big Ben on various social web links to terabytes of data in multiview medical imaging, while we may need to repeat the scientific experiments many times to obtain ground truth. Each snapshot gives us a somewhat incomplete view of the same object, and involves different angles, illumination, lighting conditions, facial expressions, and noise.

We have cast a light on tensor decompositions as a perfect match for exploratory analysis of such multifaceted data sets, and have illustrated their applications in multi-sensor and multi-modal signal processing. Our emphasis has been to show that tensor decompositions and multilinear algebra open completely new possibilities for component analysis, as compared with the "flat view" of standard two-way methods.

Unlike matrices, tensors are multiway arrays of data samples whose representations are typically overdetermined (fewer parameters in the decomposition than the number of data entries). This gives us an enormous flexibility in finding hidden components in data and the ability to enhance both robustness to noise and tolerance to missing data samples and faulty sensors.

We have also discussed multilinear variants of several standard signal processing tools such as multilinear SVD, ICA, NMF and PLS, and have shown that tensor methods can operate in a deterministic way on signals of very short duration.

At present the uniqueness conditions of standard tensor models are relatively well understood and efficient computation algorithms do exist; however, for future applications several challenging problems remain to be addressed in more depth:

• A whole new area emerges when several decompositions which operate on different datasets are coupled, as in multiview data where some details of interest are visible in only one mode. Such techniques need theoretical support in terms of existence, uniqueness, and numerical properties.
• As the complexity of advanced models increases, their computation requires efficient iterative algorithms, extending beyond the ALS class.
• Estimation of the number of components in data, and the assessment of their dimensionality, would benefit from automation, especially in the presence of noise and outliers.
• Both new theory and algorithms are needed to further extend the flexibility of tensor models, e.g., for the constraints to be combined in many ways, and tailored to the particular signal properties in different modes.
• Work on efficient techniques for saving and/or fast processing of ultra large-scale tensors is urgent; these now routinely occupy terabytes, and will soon require petabytes of memory.
• Tools for rigorous performance analysis and rule-of-thumb performance bounds need to be further developed across tensor decomposition models.
• Our discussion has been limited to tensor models in which all entries take values independently of one another. Probabilistic versions of tensor decompositions incorporate prior knowledge about complex variable interaction, various data alphabets, or noise distributions, and so promise to model data more accurately and efficiently [119], [120].

It is fitting to conclude with a quote from Marcel Proust: "The voyage of discovery is not in
seeking new landscapes but in having new eyes". We hope to have helped to bring to the eyes of the Signal Processing Community the multidisciplinary developments in tensor decompositions, and to have shared our enthusiasm about tensors as powerful tools to discover new landscapes. The future computational, visualization and interpretation tools will be important next steps in supporting the different communities working on large-scale and big data analysis problems.

BIOGRAPHICAL NOTES

Andrzej Cichocki received the Ph.D. and Dr.Sc. (habilitation) degrees, all in electrical engineering, from the Warsaw University of Technology (Poland). He is currently a Senior Team Leader of the Laboratory for Advanced Brain Signal Processing at RIKEN Brain Science Institute (Japan) and Professor at the Systems Research Institute, Polish Academy of Science (Poland). He has authored more than 400 publications and 4 monographs in the areas of signal processing and computational neuroscience. He serves as Associate Editor for the IEEE Transactions on Signal Processing and the Journal of Neuroscience Methods.

Danilo P. Mandic is a Professor of signal processing at Imperial College London, London, U.K., and has been working in the area of nonlinear and multidimensional adaptive signal processing and time-frequency analysis. His publication record includes two research monographs titled Recurrent Neural Networks for Prediction (West Sussex, U.K.: Wiley, August 2001) and Complex Valued Nonlinear Adaptive Filters: Noncircularity, Widely Linear and Neural Models, an edited book titled Signal Processing for Information Fusion, and more than 200 publications on signal and image processing.

Anh Huy Phan received the Ph.D. degree from the Kita Kyushu Institute of Technology, Japan, in 2011. He worked as Deputy Head of the Research and Development Department, Broadcast Research and Application Center, Vietnam Television, and is currently a Research Scientist at the Laboratory for Advanced Brain Signal Processing, and a Visiting Research Scientist with the Toyota Collaboration Center, Brain Science Institute, RIKEN. He has served on the Editorial Board of the International Journal of Computational Mathematics. His research interests include multilinear algebra, tensor computation, blind source separation, and brain computer interface.

Cesar F. Caiafa received the Ph.D. degree in engineering from the Faculty of Engineering, University of Buenos Aires, in 2007. He is currently Adjunct Researcher with the Argentinean Radioastronomy Institute (IAR) - CONICET and Assistant Professor with the Faculty of Engineering, University of Buenos Aires. He is also Visiting Scientist at the Lab. for Advanced Brain Signal Processing, BSI - RIKEN, Japan.

Guoxu Zhou received his Ph.D. degree in intelligent signal and information processing from South China University of Technology, Guangzhou, China, in 2010. He is currently a Research Scientist of the Laboratory for Advanced Brain Signal Processing at RIKEN Brain Science Institute, Japan. His research interests include statistical signal processing, tensor analysis, intelligent information processing, and machine learning.

Qibin Zhao received his Ph.D. degree from the Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China, in 2009. He is currently a research scientist at the Laboratory for Advanced Brain Signal Processing in RIKEN Brain Science Institute, Japan, and a visiting research scientist in the BSI TOYOTA Collaboration Center, RIKEN-BSI. His research interests include multiway data analysis, brain computer interface and machine learning.

Lieven De Lathauwer received the Ph.D. degree from the Faculty of Engineering, KU Leuven, Belgium, in 1997. From 2000 to 2007 he was Research Associate with the Centre National de la Recherche Scientifique, France. He is currently Professor with KU Leuven. He is affiliated with both the Group Science, Engineering and Technology of Kulak, with the Stadius Center for Dynamical Systems, Signal Processing and Data Analytics of the Electrical Engineering Department (ESAT) and with the iMinds Future Health Department. He is Associate Editor of the SIAM Journal on Matrix Analysis and Applications and has served as Associate Editor for the IEEE Transactions on Signal Processing. His research concerns the development of tensor tools for engineering applications.

REFERENCES

[1] F. L. Hitchcock, "Multiple invariants and generalized rank of a p-way matrix or tensor," Journal of Mathematics and Physics, vol. 7, pp. 39–79, 1927.
[2] R. Cattell, "Parallel proportional profiles and other principles for determining the choice of factors by rotation," Psychometrika, vol. 9, pp. 267–283, 1944.
[3] L. R. Tucker, "The extension of factor analysis to three-dimensional matrices," in Contributions to Mathematical Psychology, H. Gulliksen and N. Frederiksen, Eds. New York: Holt, Rinehart and Winston, 1964, pp. 110–127.
[4] ——, "Some mathematical notes on three-mode factor analysis," Psychometrika, vol. 31, no. 3, pp. 279–311, September 1966.
[5] J. Carroll and J.-J. Chang, "Analysis of individual differences in multidimensional scaling via an n-way generalization of 'Eckart-Young' decomposition," Psychometrika, vol. 35, no. 3, pp. 283–319, September 1970.
[6] R. A. Harshman, "Foundations of the PARAFAC procedure: Models and conditions for an explanatory multimodal factor analysis," UCLA Working Papers in Phonetics, vol. 16, pp. 1–84, 1970.
[7] A. Smilde, R. Bro, and P. Geladi, Multi-way Analysis: Applications in the Chemical Sciences. New York: John Wiley & Sons Ltd, 2004.
[8] P. Kroonenberg, Applied Multiway Data Analysis. New York: John Wiley & Sons Ltd, 2008.
[9] C. Nikias and A. Petropulu, Higher-Order Spectra Analysis: A Nonlinear Signal Processing Framework. Prentice Hall, 1993.
[10] J.-F. Cardoso and A. Souloumiac, "Blind beamforming for non-Gaussian signals," in IEE Proceedings F (Radar and Signal Processing), vol. 140, no. 6. IET, 1993, pp. 362–370.
[11] P. Comon, "Independent component analysis, a new concept?" Signal Processing, vol. 36, no. 3, pp. 287–314, 1994.
[12] P. Comon and C. Jutten, Eds., Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press, 2010.
[13] L. De Lathauwer, B. De Moor, and J. Vandewalle, "A multilinear singular value decomposition," SIAM Journal of Matrix Analysis and Applications, vol. 24, pp. 1253–1278, 2000.
[14] G. Beylkin and M. Mohlenkamp, "Algorithms for numerical analysis in high dimensions," SIAM J. Scientific Computing, vol. 26, no. 6, pp. 2133–2159, 2005.
[15] J. Ballani, L. Grasedyck, and M. Kluge, "Black box approximation of tensors in hierarchical Tucker format," Linear Algebra and its Applications, vol. 438, no. 2, pp. 639–657, 2013.

[16] I. V. Oseledets, "Tensor-train decomposition," SIAM J. Scientific Computing, vol. 33, no. 5, pp. 2295–2317, 2011.
[17] N. Sidiropoulos, R. Bro, and G. Giannakis, "Parallel factor analysis in sensor array processing," IEEE Transactions on Signal Processing, vol. 48, no. 8, pp. 2377–2388, 2000.
[18] N. Sidiropoulos, G. Giannakis, and R. Bro, "Blind PARAFAC receivers for DS-CDMA systems," IEEE Transactions on Signal Processing, vol. 48, no. 3, pp. 810–823, 2000.
[19] A. Cichocki, R. Zdunek, A.-H. Phan, and S. Amari, Nonnegative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Chichester: Wiley, 2009.
[20] J. Landsberg, Tensors: Geometry and Applications. AMS, 2012.
[21] W. Hackbusch, Tensor Spaces and Numerical Tensor Calculus, ser. Springer series in computational mathematics. Heidelberg: Springer, 2012, vol. 42.
[22] E. Acar and B. Yener, "Unsupervised multiway data analysis: A literature survey," IEEE Transactions on Knowledge and Data Engineering, vol. 21, pp. 6–20, 2009.
[23] T. Kolda and B. Bader, "Tensor decompositions and applications," SIAM Review, vol. 51, no. 3, pp. 455–500, September 2009.
[24] P. Comon, X. Luciani, and A. L. F. de Almeida, "Tensor decompositions, Alternating Least Squares and other Tales," Jour. Chemometrics, vol. 23, pp. 393–405, 2009.
[25] H. Lu, K. Plataniotis, and A. Venetsanopoulos, "A survey of multilinear subspace learning for tensor data," Pattern Recognition, vol. 44, no. 7, pp. 1540–1551, 2011.
[26] M. Mørup, "Applications of tensor (multiway array) factorizations and decompositions in data mining," Wiley Interdisc. Rev.: Data Mining and Knowledge Discovery, vol. 1, no. 1, pp. 24–40, 2011.
[27] B. Khoromskij, "Tensors-structured numerical methods in scientific computing: Survey on recent advances," Chemometrics and Intelligent Laboratory Systems, vol. 110, no. 1, pp. 1–19, 2011.
[28] L. Grasedyck, D. Kessner, and C. Tobler, "A literature survey of low-rank tensor approximation techniques," CGAMM-Mitteilungen, vol. 36, pp. 53–78, 2013.
[29] P. Comon, "Tensors: A brief survey," IEEE Signal Processing Magazine, p. (accepted), 2014.
[30] A. Bruckstein, D. Donoho, and M. Elad, "From sparse solutions of systems of equations to sparse modeling of signals and images," SIAM Review, vol. 51, no. 1, pp. 34–81, 2009.
[31] J. Kruskal, "Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics," Linear Algebra and its Applications, vol. 18, no. 2, pp. 95–138, 1977.
[32] I. Domanov and L. De Lathauwer, "On the uniqueness of the canonical polyadic decomposition of third-order tensors — part i: Basic results and uniqueness of one factor matrix and part ii: Uniqueness of the overall decomposition," SIAM J. Matrix Anal. Appl., vol. 34, no. 3, pp. 855–903, 2013.
[33] A. Cichocki and S. Amari, Adaptive Blind Signal and Image Processing. John Wiley, Chichester, 2003.
[34] A. Hyvärinen, "Independent component analysis: recent advances," Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 371, no. 1984, 2013.
[35] M. Elad, P. Milanfar, and G. H. Golub, "Shape from moments – an estimation theory perspective," Signal Processing, IEEE Transactions on, vol. 52, no. 7, pp. 1814–1829, 2004.
[36] N. Sidiropoulos, "Generalizing Caratheodory's uniqueness of harmonic parameterization to N dimensions," IEEE Trans. Information Theory, vol. 47, no. 4, pp. 1687–1690, 2001.
[37] A. Belouchrani, K. Abed-Meraim, J.-F. Cardoso, and É. Moulines, "A blind source separation technique using second-order statistics," IEEE Trans. Signal Processing, vol. 45, no. 2, pp. 434–444, 1997.
[38] F. Miwakeichi, E. Martínez-Montes, P. Valdés-Sosa, N. Nishiyama, H. Mizuhara, and Y. Yamaguchi, "Decomposing EEG data into space–time–frequency components using parallel factor analysis," NeuroImage, vol. 22, no. 3, pp. 1035–1045, 2004.
[39] M. Vasilescu and D. Terzopoulos, "Multilinear analysis of image ensembles: Tensorfaces," in Proc. European Conf. on Computer Vision (ECCV), vol. 2350, Copenhagen, Denmark, May 2002, pp. 447–460.
[40] M. Hirsch, D. Lanman, G. Wetzstein, and R. Raskar, "Tensor displays," in Int. Conf. on Computer Graphics and Interactive Techniques, SIGGRAPH 2012, Los Angeles, CA, USA, Aug. 5-9, 2012, Emerging Technologies Proceedings, 2012, pp. 24–42.
[41] J. Hastad, "Tensor rank is NP-complete," Journal of Algorithms, vol. 11, no. 4, pp. 644–654, 1990.
[42] M. Timmerman and H. Kiers, "Three mode principal components analysis: Choosing the numbers of components and sensitivity to local optima," British Journal of Mathematical and Statistical Psychology, vol. 53, no. 1, pp. 1–16, 2000.
[43] E. Ceulemans and H. Kiers, "Selecting among three-mode principal component models of different types and complexities: A numerical convex-hull based method," British Journal of Mathematical and Statistical Psychology, vol. 59, no. 1, pp. 133–150, May 2006.
[44] M. Mørup and L. K. Hansen, "Automatic relevance determination for multiway models," Journal of Chemometrics, Special Issue: In Honor of Professor Richard A. Harshman, vol. 23, no. 7-8, pp. 352–363, 2009. [Online]. Available: http://www2.imm.dtu.dk/pubdb/p.php?5806
[45] N. Sidiropoulos and R. Bro, "On the uniqueness of multilinear decomposition of N-way arrays," J. Chemometrics, vol. 14, no. 3, pp. 229–239, 2000.
[46] T. Jiang and N. D. Sidiropoulos, "Kruskal's permutation lemma and the identification of CANDECOMP/PARAFAC and bilinear models," IEEE Trans. Signal Processing, vol. 52, no. 9, pp. 2625–2636, 2004.
[47] L. De Lathauwer, "A link between the canonical decomposition in multilinear algebra and simultaneous matrix diagonalization," SIAM J. Matrix Analysis Applications, vol. 28, no. 3, pp. 642–666, 2006.
[48] A. Stegeman, "On uniqueness conditions for Candecomp/Parafac and Indscal with full column rank in one mode," Linear Algebra and its Applications, vol. 431, no. 1–2, pp. 211–227, 2009.
[49] E. Sanchez and B. Kowalski, "Tensorial resolution: a direct trilinear decomposition," J. Chemometrics, vol. 4, pp. 29–45, 1990.
[50] I. Domanov and L. De Lathauwer, "Canonical polyadic decomposition of third-order tensors: Reduction to generalized eigenvalue decomposition," ESAT, KU Leuven, ESAT-SISTA Internal Report 13-36, 2013.
[51] S. Vorobyov, Y. Rong, N. Sidiropoulos, and A. Gershman, "Robust iterative fitting of multilinear models," IEEE Transactions Signal Processing, vol. 53, no. 8, pp. 2678–2689, 2005.
[52] X. Liu and N. Sidiropoulos, "Cramer-Rao lower bounds for low-rank decomposition of multidimensional arrays," IEEE Trans. on Signal Processing, vol. 49, no. 9, pp. 2074–2086, Sep. 2001.
[53] P. Tichavsky, A. Phan, and Z. Koldovsky, "Cramér-Rao-induced bounds for CANDECOMP/PARAFAC tensor decomposition," IEEE Transactions on Signal Processing, vol. 61, no. 8, pp. 1986–1997, 2013.
[54] B. Chen, S. He, Z. Li, and S. Zhang, "Maximum block improvement and polynomial optimization," SIAM Journal on Optimization, vol. 22, no. 1, pp. 87–107, 2012.
[55] A. Uschmajew, "Local convergence of the alternating least squares algorithm for canonical tensor approximation," SIAM J. Matrix Anal. Appl., vol. 33, no. 2, pp. 639–652, 2012.
[56] M. J. Mohlenkamp, "Musings on multilinear fitting," Linear Algebra and its Applications, vol. 438, no. 2, pp. 834–852, 2013.
[57] M. Razaviyayn, M. Hong, and Z.-Q. Luo, "A unified convergence analysis of block successive minimization methods for nonsmooth optimization," SIAM Journal on Optimization, vol. 23, no. 2, pp. 1126–1153, 2013.
[58] P. Paatero, "The multilinear engine: A table-driven least squares program for solving multilinear problems, including the n-way parallel factor analysis model," Journal of Computational and Graphical Statistics, vol. 8, no. 4, pp. 854–888, Dec. 1999.
[59] E. Acar, D. Dunlavy, T. Kolda, and M. Mørup, "Scalable tensor factorizations for incomplete data," Chemometrics and Intelligent Laboratory Systems, vol. 106 (1), pp. 41–56, 2011. [Online]. Available: http://www2.imm.dtu.dk/pubdb/p.php?5923
[60] A.-H. Phan, P. Tichavsky, and A. Cichocki, "Low complexity Damped Gauss-Newton algorithms for CANDECOMP/PARAFAC," SIAM Journal on Matrix Analysis and Applications (SIMAX), vol. 34, no. 1, pp. 126–147, 2013.
[61] L. Sorber, M. Van Barel, and L. De Lathauwer, "Optimization-based algorithms for tensor decompositions: Canonical Polyadic Decomposition, decomposition in rank-(Lr, Lr, 1) terms and a new generalization," SIAM J. Optimization, vol. 23, no. 2, 2013.
[62] V. de Silva and L.-H. Lim, "Tensor rank and the ill-posedness of the best low-rank approximation problem," SIAM J. Matrix Anal. Appl., vol. 30, pp. 1084–1127, September 2008.
[63] W. Krijnen, T. Dijkstra, and A. Stegeman, "On the non-existence of optimal solutions and the occurrence of "degeneracy" in the Candecomp/Parafac model," Psychometrika, vol. 73, pp. 431–439, 2008.
[64] M. Sørensen, L. De Lathauwer, P. Comon, S. Icart, and L. Deneire, "Canonical Polyadic Decomposition with orthogonality constraints," SIAM J. Matrix Anal. Appl., vol. 33, no. 4, pp. 1190–1213, 2012.
[65] M. Sørensen and L. De Lathauwer, "Blind signal separation via tensor decomposition with Vandermonde factor: Canonical polyadic decomposition," IEEE Trans. Signal Processing, vol. 61, no. 22, pp. 5507–5519, Nov. 2013.
[66] G. Zhou and A. Cichocki, "Canonical Polyadic Decomposition based on a single mode blind source separation," IEEE Signal Processing Letters, vol. 19, no. 8, pp. 523–526, 2012.
[67] L.-H. Lim and P. Comon, "Nonnegative approximations of nonnegative tensors," Journal of Chemometrics, vol. 23, no. 7-8, pp. 432–441, 2009.
[68] A. van der Veen and A. Paulraj, "An analytical constant modulus algorithm," IEEE Transactions Signal Processing, vol. 44, pp. 1136–1155, 1996.
[69] R. Roy and T. Kailath, "ESPRIT — estimation of signal parameters via rotational invariance techniques," Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 37, no. 7, pp. 984–995, 1989.
[70] L. De Lathauwer, B. De Moor, and J. Vandewalle, "On the best rank-1 and rank-(R1, R2, ..., RN) approximation of higher-order tensors," SIAM Journal of Matrix Analysis and Applications, vol. 21, no. 4, pp. 1324–1342, 2000.
[71] B. Savas and L.-H. Lim, "Quasi-Newton methods on Grassmannians and multilinear approximations of tensors," SIAM J. Scientific Computing, vol. 32, no. 6, pp. 3352–3393, 2010.
[72] M. Ishteva, P.-A. Absil, S. Van Huffel, and L. De Lathauwer, "Best low multilinear rank approximation of higher-order tensors, based on the Riemannian trust-region scheme," SIAM J. Matrix Analysis Applications, vol. 32, no. 1, pp. 115–135, 2011.
[73] G. Zhou and A. Cichocki, "Fast and unique Tucker decompositions via multiway blind source separation," Bulletin of Polish Academy of Science, vol. 60, no. 3, pp. 389–407, 2012.
[74] A. Cichocki, "Generalized Component Analysis and Blind Source Separation Methods for Analyzing Multichannel Brain Signals," in Statistical and Process Models for Cognitive Neuroscience and Aging. Lawrence Erlbaum Associates, 2007, pp. 201–272.
[75] M. Haardt, F. Roemer, and G. D. Galdo, "Higher-order SVD based subspace estimation to improve the parameter estimation accuracy in multi-dimensional harmonic retrieval problems," IEEE Trans. Signal Processing, vol. 56, pp. 3198–3213, Jul. 2008.
[76] A. Phan and A. Cichocki, "Tensor decompositions for feature extraction and classification of high dimensional datasets," Nonlinear Theory and Its Applications, IEICE, vol. 1, no. 1, pp. 37–68, 2010.
[77] L. De Lathauwer, "Decompositions of a higher-order tensor in block terms – Part I and II," SIAM Journal on Matrix Analysis and Applications (SIMAX), vol. 30, no. 3, pp. 1022–1066, 2008, Special Issue on Tensor Decompositions and Applications. [Online]. Available: http://publi-etis.ensea.fr/2008/De08e
[78] L. De Lathauwer, "Blind separation of exponential polynomials and the decomposition of a tensor in rank-(Lr, Lr, 1) terms," SIAM J. Matrix Analysis Applications, vol. 32, no. 4, pp. 1451–1474, 2011.
[79] L. De Lathauwer, "Block component analysis, a new concept for blind source separation," in Proc. 10th International Conf. LVA/ICA, Tel Aviv, March 12-15, 2012, pp. 1–8.
[80] E. Candès, J. Romberg, and T. Tao, "Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information," IEEE Transactions on Information Theory, vol. 52, no. 2, pp. 489–509, 2006.
[81] E. J. Candes and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?" Information Theory, IEEE Transactions on, vol. 52, no. 12, pp. 5406–5425, 2006.
[82] D. L. Donoho, "Compressed sensing," Information Theory, IEEE Transactions on, vol. 52, no. 4, pp. 1289–1306, 2006.
[83] Y. Eldar and G. Kutyniok, "Compressed Sensing: Theory and Applications," New York: Cambridge Univ. Press, vol. 20, p. 12, 2012.
[84] M. F. Duarte and R. G. Baraniuk, "Kronecker compressive sensing," IEEE Transactions on Image Processing, vol. 21, no. 2, pp. 494–504, 2012.
[85] C. Caiafa and A. Cichocki, "Computing sparse representations of multidimensional signals using Kronecker bases," Neural Computation, vol. 25, no. 1, pp. 186–220, 2013.
[86] ——, "Multidimensional compressed sensing and their applications," WIREs Data Mining and Knowledge Discovery, 2013 (accepted).
[87] S. Gandy, B. Recht, and I. Yamada, "Tensor completion and low-n-rank tensor recovery via convex optimization," Inverse Problems, vol. 27, no. 2, 2011.
[88] M. Signoretto, Q. T. Dinh, L. De Lathauwer, and J. A. Suykens, "Learning with tensors: a framework based on convex optimization and spectral regularization," Machine Learning, pp. 1–49, 2013.
[89] L. Sorber, M. Van Barel, and L. De Lathauwer, "Tensorlab v1.0," Feb. 2013. [Online]. Available: http://esat.kuleuven.be/sista/tensorlab/
[90] N. Sidiropoulos and A. Kyrillidis, "Multi-way compressed sensing for sparse low-rank tensors," IEEE Signal Processing Letters, vol. 19, no. 11, pp. 757–760, 2012.
[91] D. Foster, K. Amano, S. Nascimento, and M. Foster, "Frequency of metamerism in natural scenes," Journal of the Optical Society of America A, vol. 23, no. 10, pp. 2359–2372, 2006.
[92] A. Cichocki, "Era of big data processing: A new approach via tensor networks and tensor decompositions (invited talk)," in Proc. of Int. Workshop on Smart Info-Media Systems in Asia (SISA2013), Nagoya, Japan, Sept. 30–Oct. 2, 2013.
[93] R. Orus, "A Practical Introduction to Tensor Networks: Matrix Product States and Projected Entangled Pair States," The Journal of Chemical Physics, 2013.
[94] J. Salmi, A. Richter, and V. Koivunen, "Sequential unfolding SVD for tensors with applications in array signal processing," IEEE Transactions on Signal Processing, vol. 57, pp. 4719–4733, 2009.
[95] A.-H. Phan and A. Cichocki, "PARAFAC algorithms for large-scale problems," Neurocomputing, vol. 74, no. 11, pp. 1970–1984, 2011.
[96] S. K. Suter, M. Makhynia, and R. Pajarola, "Tamresh - tensor approximation multiresolution hierarchy for interactive volume visualization," Comput. Graph. Forum, vol. 32, no. 3, pp. 151–160, 2013.
[97] D. Nion and N. Sidiropoulos, "Adaptive algorithms to track the PARAFAC decomposition of a third-order tensor," IEEE Trans. on Signal Processing, vol. 57, no. 6, pp. 2299–2310, Jun. 2009.
[98] S. A. Goreinov, N. L. Zamarashkin, and E. E. Tyrtyshnikov, "Pseudo-skeleton approximations by matrices of maximum volume," Mathematical Notes, vol. 62, no. 4, pp. 515–519, 1997.
[99] C. Caiafa and A. Cichocki, "Generalizing the column-row matrix decomposition to multi-way arrays," Linear Algebra and its Applications, vol. 433, no. 3, pp. 557–573, 2010.
[100] S. A. Goreinov, "On cross approximation of multi-index array," Doklady Math., vol. 420, no. 4, pp. 404–406, 2008.
[101] I. Oseledets, D. V. Savostyanov, and E. Tyrtyshnikov, "Tucker dimensionality reduction of three-dimensional arrays in linear time," SIAM J. Matrix Analysis Applications, vol. 30, no. 3, pp. 939–956, 2008.
[102] I. Oseledets and E. Tyrtyshnikov, "TT-cross approximation for multidimensional arrays," Linear Algebra and its Applications, vol. 432, no. 1, pp. 70–88, 2010.
[103] M. W. Mahoney, M. Maggioni, and P. Drineas, "Tensor-CUR decompositions for tensor-based data," SIAM Journal on Matrix Analysis and Applications, vol. 30, no. 3, pp. 957–987, 2008.
[104] R. Bro, "Multiway calibration. Multilinear PLS," Journal of Chemometrics, vol. 10, pp. 47–61, 1996.
[105] Q. Zhao, C. Caiafa, D. Mandic, Z. Chao, Y. Nagasaka, N. Fujii, L. Zhang, and A. Cichocki, "Higher-order partial least squares (HOPLS): A generalized multilinear regression method," IEEE Trans. on Pattern Analysis and Machine Intelligence (PAMI), vol. 35, no. 7, pp. 1660–1673, 2013.
[106] A. Cichocki, "Tensors decompositions: New concepts for brain data analysis?" Journal of Control, Measurement, and System Integration (SICE), vol. 47, no. 7, pp. 507–517, 2011.
[107] V. Calhoun, J. Liu, and T. Adali, "A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data," Neuroimage, vol. 45, pp. 163–172, 2009.
[108] Y.-O. Li, T. Adali, W. Wang, and V. Calhoun, "Joint blind source separation by multiset canonical correlation analysis," IEEE Transactions on Signal Processing, vol. 57, no. 10, pp. 3918–3929, Oct. 2009.
[109] E. Acar, T. Kolda, and D. Dunlavy, "All-at-once optimization for coupled matrix and tensor factorizations," CoRR, vol. abs/1105.3422, 2011.
[110] G. Zhou, A. Cichocki, S. Xie, and D. Mandic, "Beyond Canonical Correlation Analysis: Common and individual features analysis," IEEE Transactions on PAMI, 2013. [Online]. Available: http://arxiv.org/abs/1212.3913, 2012.
[111] B. Bader, T. G. Kolda et al., "MATLAB tensor toolbox version 2.5," Available online, Feb. 2012. [Online]. Available: http://www.sandia.gov/~tgkolda/TensorToolbox
[112] G. Zhou and A. Cichocki, "TDALAB: Tensor Decomposition Laboratory," LABSP, Wako-shi, Japan, 2013. [Online]. Available: http://bsp.brain.riken.jp/TDALAB/
[113] A.-H. Phan, P. Tichavský, and A. Cichocki, "Tensorbox: a matlab package for tensor decomposition," LABSP, RIKEN, Japan, 2012. [Online]. Available: http://www.bsp.brain.riken.jp/~phan/tensorbox.php
[114] C. Andersson and R. Bro, "The N-way toolbox for MATLAB," Chemometrics Intell. Lab. Systems, vol. 52, no. 1, pp. 1–4, 2000. [Online]. Available: http://www.models.life.ku.dk/nwaytoolbox
[115] I. Oseledets, "TT-toolbox 2.2," 2012. [Online]. Available: https://github.com/oseledets/TT-Toolbox
[116] D. Kressner and C. Tobler, "htucker — A MATLAB toolbox for tensors in hierarchical Tucker format," MATHICSE, EPF Lausanne, 2012. [Online]. Available: http://anchp.epfl.ch/htucker
[117] M. Espig, M. Schuster, A. Killaitis, N. Waldren, P. Wähnert, S. Handschuh, and H. Auer, "Tensor Calculus library," 2012. [Online]. Available: http://gitorious.org/tensorcalculus
[118] P. Kroonenberg, "The three-mode company. A company devoted to creating three-mode software and promoting three-mode data analysis." [Online]. Available: http://three-mode.leidenuniv.nl/
[119] S. Zhe, Y. Qi, Y. Park, I. Molloy, and S. Chari, "DinTucker: Scaling up Gaussian process models on multidimensional arrays with billions of elements," PAMI (in print), arXiv preprint arXiv:1311.2663, 2014.
[120] K. Yilmaz and A. T. Cemgil, "Probabilistic latent tensor factorisation," in Proc. of International Conference on Latent Variable Analysis and Signal Separation, vol. 6365, 2010, pp. 346–353.
