
Introduction to Tensors
8 May 2014
Introduction to Tensors

• What is a … tensor?

• Basic Operations

• CP Decompositions and Tensor Rank

• Matricization and Computing the CP


Dear Tullio,

I admire the elegance of your method of computation; it must be nice to ride through these fields upon the horse of true mathematics while the like of us have to make our way laboriously on foot.

Cheers, Albert

Tullio Levi-Civita
What is a … tensor?
• A tensor is a multi-way extension of a matrix

• A multi-dimensional array

• A multi-linear map

• In particular, the following are all tensors:

• Scalars

• Vectors, e.g. (13, 42, 2011)

• Matrices, e.g. [8 1 6; 3 5 7; 4 9 2]
Why Tensors?
• Tensors can be used when matrices are not enough

• A matrix can represent a binary relation

• A tensor can represent an n-ary relation

• E.g. subject–predicate–object data

• A tensor can represent a set of binary relations

• Or other matrices

• A matrix can represent just a single matrix

• A tensor can represent a series/set of matrices

• But using tensors for time series should be approached with care
Terminology
• A tensor is an N-way array

• E.g. a matrix is a 2-way array

• Other sources use:

• N-dimensional

• But is a 3-dimensional vector a 1-dimensional tensor?

• rank-N

• But we have a different use for the word rank

• A 3-way tensor can be N-by-M-by-K dimensional

• A 3-way tensor has three modes

• Columns, rows, and tubes


Fibres and Slices

The jth column of a matrix A is denoted by a:j, and the ith row by ai:. Alternatively, the jth column, a:j, may be denoted more compactly as aj.

Fibers are the higher-order analogue of matrix rows and columns. A fiber is defined by fixing every index but one. A matrix column is a mode-1 fiber and a matrix row is a mode-2 fiber. Third-order tensors have column, row, and tube fibers, denoted by x:jk, xi:k, and xij:, respectively; see Fig. 2.1. When extracted from the tensor, fibers are always assumed to be oriented as column vectors.


Fig. 2.1: Fibers of a 3rd-order tensor. (a) Mode-1 (column) fibers: x:jk; (b) mode-2 (row) fibers: xi:k; (c) mode-3 (tube) fibers: xij:.

Slices are two-dimensional sections of a tensor, defined by fixing all but two indices. Figure 2.2 shows the horizontal, lateral, and frontal slices of a third-order tensor X, denoted by Xi::, X:j:, and X::k, respectively. Alternatively, the kth frontal slice of a third-order tensor, X::k, may be denoted more compactly as Xk. (In some fields, the order of the tensor is referred to as the rank of the tensor. In much of the literature and this review, however, the term rank means something quite different; see §3.1.)

Fig. 2.2: Slices of a 3rd-order tensor. (a) Horizontal slices: Xi::; (b) lateral slices: X:j:; (c) frontal slices: X::k (or Xk).


Kolda & Bader 2009
Basic Operations

• Tensors require extensions to the standard linear algebra operations for matrices

• But before tensor operations, a recap on vectors and matrices
Basic Operations on Vectors

• The transpose v^T turns a row vector into a column vector and vice versa

• If v, w ∈ ℝ^n, then v + w is the vector with (v + w)i = vi + wi

• For a vector v and a scalar α, (αv)i = αvi

• The dot product of two vectors v, w ∈ ℝ^n is v·w = Σi viwi

• A.k.a. scalar product or inner product

• Alternative notations: ⟨v, w⟩, v^T w (for column vectors), vw^T (for row vectors)
Basic Operations on Matrices

• The matrix transpose A^T has the rows of A as its columns

• If A and B are n-by-m matrices, then A + B is an n-by-m matrix with (A + B)ij = aij + bij

• If A is n-by-k and B is k-by-m, then AB is an n-by-m matrix with (AB)ij = Σl=1..k ail blj

• The vector outer product vw^T (for column vectors) is the matrix product of an n-by-1 and a 1-by-m matrix
Intuition for Matrix Multiplication

• Element (AB)ij is the inner product of row i of A and column j of B
Intuition for Matrix Multiplication

• Column j of AB is the linear combination of the columns of A, with the coefficients coming from column j of B
Intuition for Matrix Multiplication

• Matrix AB is the sum of k rank-1 matrices al bl^T, obtained by multiplying the l-th column of A with the l-th row of B (see the sketch below)
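A quick numerical check of this view (a sketch of my own, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))   # n-by-k
B = rng.normal(size=(4, 5))   # k-by-m

# AB as a sum of k rank-1 matrices: outer product of column l of A and row l of B
S = sum(np.outer(A[:, l], B[l, :]) for l in range(A.shape[1]))

assert np.allclose(S, A @ B)
```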
Tensor Basic Operations
• The multi-way vector outer product is a tensor where each element is the product of the corresponding elements of the vectors: (a ∘ b ∘ c)ijk = ai bj ck

• The tensor sum of two same-sized tensors is their element-wise sum: (X + Y)ijk = xijk + yijk

• The tensor inner product of two same-sized tensors is the sum of the element-wise products of their values: ⟨X, Y⟩ = Σi=1..I Σj=1..J ⋯ Σz=1..Z xij⋯z yij⋯z (see the sketch below)
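A minimal numpy sketch of these three operations (my illustration, not from the slides; einsum is one convenient way to write outer products):

```python
import numpy as np

a = np.array([1., 2.])
b = np.array([3., 4., 5.])
c = np.array([6., 7.])

# three-way outer product: X[i, j, k] = a[i] * b[j] * c[k]
X = np.einsum('i,j,k->ijk', a, b, c)

Y = np.ones_like(X)
S = X + Y           # element-wise tensor sum
ip = np.sum(X * Y)  # tensor inner product <X, Y>
```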
Norms and Distances
• The Frobenius norm of a matrix M is ||M||F = (Σi,j mij²)^1/2

• It can be used as a distance between two matrices: d(M, N) = ||M – N||F

• The analogous Frobenius distance between tensors is d(X, Y) = (Σi,j,k (xijk – yijk)²)^1/2

• Equivalently, d(X, Y) = √⟨X – Y, X – Y⟩ (see the sketch below)
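In numpy terms (again a sketch of my own):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 3, 4))
Y = rng.normal(size=(2, 3, 4))

d = np.sqrt(np.sum((X - Y) ** 2))                          # Frobenius distance
assert np.isclose(d, np.sqrt(np.sum((X - Y) * (X - Y))))   # = sqrt(<X-Y, X-Y>)
```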
CP Decomposition and Tensor Rank
• A matrix decomposition represents the given matrix as a
product of two (or more) factor matrices

• The rank of a matrix M is the

• Number of linearly independent rows (row rank)

• Number of linearly independent columns (column rank)

• Number of rank-1 matrices needed to be summed to get M (Schein rank)

• This is the definition we generalize to tensors

• A rank-1 matrix is an outer product of two vectors

• All three definitions are equivalent
Rank-1 Tensors
• A 3-way rank-1 tensor is the outer product of three vectors a, b, and c:

X = a ∘ b ∘ c
The CP Tensor Decomposition

X ≈ a1 ∘ b1 ∘ c1 + a2 ∘ b2 ∘ c2 + ⋯ + aR ∘ bR ∘ cR

Elementwise: xijk ≈ Σr=1..R air bjr ckr
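To make this concrete, a small numpy sketch (my own, not from the slides) that assembles a size-R CP tensor from factor matrices A, B, C, whose columns are the ar, br, cr:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 3, 5, 2
A = rng.normal(size=(I, R))   # columns a_1, ..., a_R
B = rng.normal(size=(J, R))
C = rng.normal(size=(K, R))

# x_ijk = sum_r a_ir * b_jr * c_kr
X = np.einsum('ir,jr,kr->ijk', A, B, C)
```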
More on CP
• The size of the CP decomposition is the
number of rank-1 tensors involved

• The factorization can also be written using N factor matrices (for an order-N tensor)

• All column vectors are collected in one matrix, all row vectors in another, all tube vectors in a third, etc.
CANDECOMP, PARAFAC, ...

Name Proposed by
Polyadic Form of a Tensor Hitchcock, 1927 [105]
PARAFAC (Parallel Factors) Harshman, 1970 [90]
CANDECOMP or CAND (Canonical decomposition) Carroll and Chang, 1970 [38]
Topographic Components Model Möcks, 1988 [166]
CP (CANDECOMP/PARAFAC) Kiers, 2000 [122]

Table 3.1: Some of the many names for the CP decomposition.

where R is a positive integer, and ar ∈ ℝ^I, br ∈ ℝ^J, and cr ∈ ℝ^K, for r = 1, …, R. Elementwise, this is written as xijk ≈ Σr=1..R air bjr ckr, for i = 1, …, I, j = 1, …, J, k = 1, …, K.

Kolda & Bader 2009
Tensor Rank
• The rank of a tensor is the minimum number of
rank-1 tensors needed to represent the tensor exactly

• I.e., the smallest R for which a size-R CP decomposition is exact

• Generalizes the matrix Schein rank

X = a1 ∘ b1 ∘ c1 + a2 ∘ b2 ∘ c2 + ⋯ + aR ∘ bR ∘ cR
Tensor Rank Oddities #1
• The rank of a (real-valued) tensor is different over the reals and over the complex numbers

• With reals, the rank can be larger than the largest dimension

• rank(X) ≤ min{IJ, IK, JK} for an I-by-J-by-K tensor

• Example: the 2×2×2 tensor X with frontal slices X1 = [1 0; 0 1] and X2 = [0 1; −1 0] has rank 3 over ℝ but rank 2 over ℂ, with factor matrices

A = (1/√2) [1 1; i −i], B = (1/√2) [1 1; −i i], C = [1 1; i −i]
Tensor Rank Oddities #2
• There are tensors of rank R that can be approximated arbitrarily well with tensors of rank R’ for some R’ < R

• That is, there is no best low-rank approximation for such tensors

• The Eckart–Young theorem shows this is impossible with matrices

• The smallest such R’ is called the border rank of the tensor
Tensor Rank Oddities #3
• The rank-R CP decomposition of a rank-R tensor is essentially unique under mild conditions

• Essentially unique = unique up to scaling and permutation of the rank-1 factors

• This does not contradict oddity #2, as it concerns the rank decomposition, not a low-rank decomposition

• Again, not true for matrices (unless orthogonality etc. is required)
Tensor Matricization and New Matrix Products

• Tensor matricization unfolds an N-way tensor into a matrix

• Mode-n matricization arranges the mode-n fibers as columns of a matrix, denoted X(n)

• It has as many rows as the dimensionality of the nth mode

• And as many columns as the product of the dimensions of the other modes
• If X is an N-way tensor of size I1×I2×…×IN, then X(n) maps element (i1, i2, …, iN) to (in, j), where

j = 1 + Σk=1..N, k≠n (ik − 1) Jk, with Jk = Πm=1..k−1, m≠n Im
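The index formula is easier to digest in code. A minimal numpy sketch (the helper name unfold is my own, not from the slides); the Fortran-order reshape reproduces the column ordering above, where earlier modes vary fastest:

```python
import numpy as np

def unfold(X, n):
    """Mode-n matricization X_(n): arrange the mode-n fibers as columns."""
    # Move mode n to the front, then read the remaining modes in
    # Fortran (column-major) order, so that earlier modes vary fastest.
    return np.reshape(np.moveaxis(X, n, 0), (X.shape[n], -1), order='F')
```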
Matricization Example
X is the 2×2×2 tensor with frontal slices X1 = [1 0; 0 1] and X2 = [0 1; −1 0]

X(1) = [1 0 0 1; 0 1 −1 0]

X(2) = [1 0 0 −1; 0 1 1 0]

X(3) = [1 0 0 1; 0 −1 1 0]
Another matricization example

X1 = [1 3; 2 4], X2 = [5 7; 6 8]

X(1) = [1 3 5 7; 2 4 6 8]

X(2) = [1 2 5 6; 3 4 7 8]

X(3) = [1 2 3 4; 5 6 7 8]
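The unfold sketch from above reproduces this example:

```python
import numpy as np

X = np.zeros((2, 2, 2))
X[:, :, 0] = [[1, 3], [2, 4]]   # frontal slice X_1
X[:, :, 1] = [[5, 7], [6, 8]]   # frontal slice X_2

print(unfold(X, 0))   # [[1. 3. 5. 7.] [2. 4. 6. 8.]]
print(unfold(X, 1))   # [[1. 2. 5. 6.] [3. 4. 7. 8.]]
print(unfold(X, 2))   # [[1. 2. 3. 4.] [5. 6. 7. 8.]]
```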
Hadamard Matrix Product
• The element-wise matrix product

• Two matrices of size n-by-m; the resulting matrix is also n-by-m

A ∗ B = [a1,1b1,1  a1,2b1,2  ⋯  a1,mb1,m
         a2,1b2,1  a2,2b2,2  ⋯  a2,mb2,m
         ⋮         ⋮         ⋱  ⋮
         an,1bn,1  an,2bn,2  ⋯  an,mbn,m]
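In numpy the Hadamard product of same-sized arrays is simply the * operator (a trivial sketch):

```python
import numpy as np

A = np.array([[1., 2.], [3., 4.]])
B = np.array([[5., 6.], [7., 8.]])
print(A * B)   # element-wise product: [[ 5. 12.] [21. 32.]]
```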
Kronecker Matrix Product
• Element-per-matrix product

• n-by-m and j-by-k matrices A and B give an nj-by-mk matrix A⊗B

A ⊗ B = [a1,1B  a1,2B  ⋯  a1,mB
         a2,1B  a2,2B  ⋯  a2,mB
         ⋮      ⋮      ⋱  ⋮
         an,1B  an,2B  ⋯  an,mB]
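numpy provides this directly as np.kron; a one-line shape check:

```python
import numpy as np

A = np.ones((2, 3))
B = np.ones((4, 5))
print(np.kron(A, B).shape)   # (8, 15): nj-by-mk
```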
Khatri–Rao Matrix Product
• Element-per-column product

• Number of columns must match

• n-by-m and k-by-m matrices A and B give an nk-by-m matrix A⊙B

A ⊙ B = [a1,1b1  a1,2b2  ⋯  a1,mbm
         a2,1b1  a2,2b2  ⋯  a2,mbm
         ⋮       ⋮       ⋱  ⋮
         an,1b1  an,2b2  ⋯  an,mbm]
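A minimal sketch of my own (SciPy also ships scipy.linalg.khatri_rao); column j is the Kronecker product of the jth columns of A and B:

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker product of n-by-m A and k-by-m B -> nk-by-m."""
    n, m = A.shape
    k, m2 = B.shape
    assert m == m2, "number of columns must match"
    # (n, k, m) array of products a_ij * b_lj, flattened so that
    # column j equals kron(A[:, j], B[:, j])
    return (A[:, None, :] * B[None, :, :]).reshape(n * k, m)
```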
Some identities

(A ⊗ B)(C ⊗ D) = AC ⊗ BD

(A ⊗ B)⁺ = A⁺ ⊗ B⁺

A ⊙ B ⊙ C = (A ⊙ B) ⊙ C = A ⊙ (B ⊙ C)

(A ⊙ B)^T (A ⊙ B) = (A^T A) ∗ (B^T B)

(A ⊙ B)⁺ = ((A^T A) ∗ (B^T B))⁺ (A ⊙ B)^T

A⁺ is the Moore–Penrose pseudo-inverse
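These are easy to sanity-check numerically (my sketch, reusing the khatri_rao helper from above):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 3))
B = rng.normal(size=(5, 3))
C = rng.normal(size=(3, 2))
D = rng.normal(size=(3, 2))

# mixed-product identity for the Kronecker product
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

AB = khatri_rao(A, B)
# Gram matrix of a Khatri-Rao product via Hadamard products
assert np.allclose(AB.T @ AB, (A.T @ A) * (B.T @ B))
# pseudo-inverse identity (generic full-column-rank case)
assert np.allclose(np.linalg.pinv(AB),
                   np.linalg.pinv((A.T @ A) * (B.T @ B)) @ AB.T)
```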
Another View on the CP

• Using matricization and the Khatri–Rao product, we can re-write the CP decomposition

• One equation per mode:

X(1) = A(C ⊙ B)^T

X(2) = B(C ⊙ A)^T

X(3) = C(B ⊙ A)^T
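A quick numerical check of the mode-1 equation (my sketch, reusing the unfold and khatri_rao helpers from above):

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 3, 4, 5, 2
A = rng.normal(size=(I, R))
B = rng.normal(size=(J, R))
C = rng.normal(size=(K, R))

X = np.einsum('ir,jr,kr->ijk', A, B, C)
assert np.allclose(unfold(X, 0), A @ khatri_rao(C, B).T)   # X_(1) = A(C ⊙ B)^T
```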
Solving CP: The ALS Approach

1. Fix B and C and solve for A:

min_A ||X(1) − A(C ⊙ B)^T||F

A = X(1) ((C ⊙ B)^T)⁺ = X(1) (C ⊙ B)(C^T C ∗ B^T B)⁺

where C^T C ∗ B^T B is only an R-by-R matrix

2. Solve B and C similarly

3. Repeat until convergence
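Putting the pieces together, a minimal CP-ALS sketch (my illustration, reusing unfold and khatri_rao from above; it omits normalization, convergence checks, and careful initialization, which a real implementation would need):

```python
import numpy as np

def cp_als(X, R, n_iters=100, seed=0):
    """Very small CP-ALS loop for a 3-way tensor."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A = rng.normal(size=(I, R))
    B = rng.normal(size=(J, R))
    C = rng.normal(size=(K, R))
    for _ in range(n_iters):
        # each update solves one least-squares problem via an R-by-R Gram matrix
        A = unfold(X, 0) @ khatri_rao(C, B) @ np.linalg.pinv((C.T @ C) * (B.T @ B))
        B = unfold(X, 1) @ khatri_rao(C, A) @ np.linalg.pinv((C.T @ C) * (A.T @ A))
        C = unfold(X, 2) @ khatri_rao(B, A) @ np.linalg.pinv((B.T @ B) * (A.T @ A))
    return A, B, C
```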
Wrap-up

• Tensors generalize matrices

• Many matrix concepts generalize as well

• But some don’t

• And some behave very differently

• We’ve only started with the basics of tensors…


Suggested Reading
• Skillicorn, D., 2007. Understanding Complex
Datasets: Data Mining with Matrix
Decompositions, Chapman & Hall/CRC, Boca
Raton. Chapter 9

• Kolda, T.G. & Bader, B.W., 2009. Tensor decompositions and applications. SIAM Review 51(3), pp. 455–500.
