
Singular Value Decomposition (SVD)

Shusen Wang
Orthonormal Basis

Definition (Orthonormal Basis).
A set of vectors 𝐯_1, 𝐯_2, ⋯, 𝐯_n ⊂ ℝ^n forms an orthonormal basis if:
• unit ℓ₂-norm: ‖𝐯_i‖₂ = 1 for all i;
• orthogonal to each other: 𝐯_iᵀ 𝐯_j = 0 for all i ≠ j.
SVD and Truncated SVD
Singular Value Decomposition (SVD)

• Let 𝐀 ∈ ℝ^{m×n} be any matrix.
• Rank: r = rank(𝐀). (r ≤ m, n.)
• SVD: 𝐀 = ∑_{i=1}^{r} σ_i 𝐮_i 𝐯_iᵀ. (Each term σ_i 𝐮_i 𝐯_iᵀ is a rank-1 matrix.)
• Singular values: σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r > 0.
• Left singular vectors: 𝐮_1, 𝐮_2, ⋯, 𝐮_r ⊂ ℝ^m form an orthonormal basis.
• Right singular vectors: 𝐯_1, 𝐯_2, ⋯, 𝐯_r ⊂ ℝ^n form an orthonormal basis.
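These properties can be checked numerically. A small NumPy sketch (the random test matrix and variable names are ours, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 4))

# np.linalg.svd returns U, the singular values, and V^T.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# A equals the sum of the rank-1 terms sigma_i * u_i * v_i^T.
A_rebuilt = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(len(s)))
assert np.allclose(A_rebuilt, A)

# The left/right singular vectors have orthonormal columns.
assert np.allclose(U.T @ U, np.eye(4), atol=1e-10)
assert np.allclose(Vt @ Vt.T, np.eye(4), atol=1e-10)
```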
Truncated SVD

• Let 𝐀 ∈ ℝ^{m×n} be any matrix.
• Rank: r = rank(𝐀). (r ≤ m, n.)
• SVD: 𝐀 = ∑_{i=1}^{r} σ_i 𝐮_i 𝐯_iᵀ.
• Truncated SVD's definition: for 0 < k < r, 𝐀_k = ∑_{i=1}^{k} σ_i 𝐮_i 𝐯_iᵀ.
• Truncated SVD's properties:
  • 𝐀_k is the best rank-k approximation to 𝐀.
  • 𝐀_k = argmin_𝐁 ‖𝐀 − 𝐁‖_F²  s.t. rank(𝐁) ≤ k.
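A minimal sketch of the truncated SVD in NumPy (the helper name `truncated_svd` is ours):

```python
import numpy as np

def truncated_svd(A, k):
    """Return A_k = sum_{i=1}^{k} sigma_i * u_i * v_i^T (rank-k truncation)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
A2 = truncated_svd(A, 2)
assert np.linalg.matrix_rank(A2) == 2  # A_k has rank k
```

Keeping all r terms (here k = 4) recovers 𝐀 exactly.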
Matrix Frobenius Norm

• Let 𝐀 = [a_{ij}] ∈ ℝ^{m×n} be any matrix.
• Frobenius norm: ‖𝐀‖_F = √(∑_{i=1}^{m} ∑_{j=1}^{n} a_{ij}²).
• A generalization of the ℓ₂ vector norm to matrices.
• Property: ‖𝐀‖_F² = ∑_{i=1}^{r} σ_i².
  • σ_i is the i-th singular value of 𝐀.
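A quick numerical check of this property (the random test matrix is ours):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 7))

fro_sq = np.sum(A**2)  # ||A||_F^2: sum of squared entries
sigmas = np.linalg.svd(A, compute_uv=False)

# ||A||_F^2 equals the sum of the squared singular values.
assert np.isclose(fro_sq, np.sum(sigmas**2))
```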


Matrix Rank & Singular Values

• Let 𝐀 ∈ ℝ^{m×n} be any matrix. (Suppose m ≤ n.)
• r = rank(𝐀). (r ≤ m.)
• Singular values:
  σ_1 ≥ σ_2 ≥ ⋯ ≥ σ_r > 0 = σ_{r+1} = ⋯ = σ_m.
• rank(𝐀) equals the number of positive singular values.

Error of Truncated SVD

• SVD: 𝐀 = ∑_{i=1}^{r} σ_i 𝐮_i 𝐯_iᵀ.
• Truncated SVD: for 0 < k < r, 𝐀_k = ∑_{i=1}^{k} σ_i 𝐮_i 𝐯_iᵀ.
• Error of the truncated SVD:
  ‖𝐀 − 𝐀_k‖_F² = ∑_{i=k+1}^{r} σ_i²,
  the sum of the squared bottom (the smallest) singular values.
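The error formula can be verified directly in NumPy (the random test matrix and the choice k = 2 are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k truncated SVD

# ||A - A_k||_F^2 equals the sum of the squared bottom singular values.
err_sq = np.linalg.norm(A - A_k, 'fro')**2
assert np.isclose(err_sq, np.sum(s[k:]**2))
```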


SVD: Example

• Original image: a 1000×1500 matrix.
• [Figure: singular values plotted against their indices; the top-50 singular values dominate, and the rest decay quickly.]
• [Figure: reconstructions of the image from the rank-5, rank-20, rank-50, and rank-100 truncated SVDs; quality improves with rank.]
• The rank-100 truncated SVD is a high-quality approximation using only 10% of the singular values/vectors!
SVD: Example

• The original matrix:
  • Shape: 1000×1500.
  • #Entries: 1.5M.
• The rank-100 truncated SVD:
  • Shapes: 100×1 (singular values), 100×1500 (right singular vectors), and 100×1000 (left singular vectors).
  • #Entries: ≈0.25M.
• Truncated SVD saves about 83% of the storage.
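The storage arithmetic above can be reproduced directly:

```python
# Entry counts for the 1000x1500 example with a rank-100 truncation.
m, n, k = 1000, 1500, 100

original = m * n               # dense matrix: 1,500,000 entries
truncated = k + k * m + k * n  # singular values + left + right vectors

savings = 1 - truncated / original
assert original == 1_500_000
assert truncated == 250_100    # about 0.25M
assert round(savings, 2) == 0.83
```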
Power Iteration for Computing Truncated SVD
A Property

Theorem. If 𝐀 = ∑_{i=1}^{r} σ_i 𝐮_i 𝐯_iᵀ is the SVD of 𝐀, then 𝐀ᵀ𝐀 = ∑_{i=1}^{r} σ_i² 𝐯_i 𝐯_iᵀ.

Proof.
• 𝐀ᵀ𝐀 = (∑_{i=1}^{r} σ_i 𝐮_i 𝐯_iᵀ)ᵀ (∑_{j=1}^{r} σ_j 𝐮_j 𝐯_jᵀ) = (∑_{i=1}^{r} σ_i 𝐯_i 𝐮_iᵀ)(∑_{j=1}^{r} σ_j 𝐮_j 𝐯_jᵀ).
• 𝐀ᵀ𝐀 = ∑_{i=1}^{r} σ_i² 𝐯_i 𝐮_iᵀ 𝐮_i 𝐯_iᵀ + ∑_{i≠j} σ_i σ_j 𝐯_i 𝐮_iᵀ 𝐮_j 𝐯_jᵀ.
  (Using ∑_i 𝐗_iᵀ ∑_j 𝐗_j = ∑_i 𝐗_iᵀ 𝐗_i + ∑_{i≠j} 𝐗_iᵀ 𝐗_j.)
• 𝐀ᵀ𝐀 = ∑_{i=1}^{r} σ_i² 𝐯_i (1) 𝐯_iᵀ + ∑_{i≠j} σ_i σ_j 𝐯_i (0) 𝐯_jᵀ = ∑_{i=1}^{r} σ_i² 𝐯_i 𝐯_iᵀ.
  (Using the properties of an orthonormal basis: 𝐮_iᵀ 𝐮_i = 1 and 𝐮_iᵀ 𝐮_j = 0 for i ≠ j.) ∎

This is the eigenvalue decomposition of 𝐀ᵀ𝐀: the 𝐯_i are eigenvectors with eigenvalues σ_i².

Implication: to compute the top singular values σ_i and right singular vectors 𝐯_i, we can do an eigenvalue decomposition instead of the SVD.
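The implication can be verified with NumPy: the eigenvalues of 𝐀ᵀ𝐀 match the squared singular values of 𝐀 (the random test matrix is ours):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))

# eigvalsh handles the symmetric matrix A^T A; eigenvalues come out ascending.
eigvals = np.linalg.eigvalsh(A.T @ A)
sigmas = np.linalg.svd(A, compute_uv=False)  # descending order

assert np.allclose(eigvals[::-1], sigmas**2)
```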
Power Iteration for Truncated SVD

Goal: Compute the top-1 eigenvalue/eigenvector of 𝐀ᵀ𝐀 = ∑_{i=1}^{r} σ_i² 𝐯_i 𝐯_iᵀ.

Algorithm:
1. Randomly initialize a vector 𝐱_0 (with unit ℓ₂-norm);
2. Repeat the power iteration: 𝐱_t ← (𝐀ᵀ𝐀) 𝐱_{t−1} and 𝐱_t ← 𝐱_t / ‖𝐱_t‖₂.

Each iteration is merely 2 matrix-vector multiplications (first 𝐀𝐱, then 𝐀ᵀ(𝐀𝐱)). Very cheap!

Convergence Analysis (𝐱_t converges to 𝐯_1):
• 𝐱_0 = ∑_{i=1}^{n} α_i 𝐯_i. Here α_1 = Ω(1/n) with high probability.
  • Every vector can be written as a linear combination of the orthonormal basis.
  • Because 𝐱_0 is randomly initialized, α_1 = 𝐱_0ᵀ 𝐯_1 = Ω(1/n) with high probability.
• 𝐱_t ∝ (𝐀ᵀ𝐀)^t 𝐱_0 = (∑_{i=1}^{r} σ_i^{2t} 𝐯_i 𝐯_iᵀ)(∑_{j=1}^{n} α_j 𝐯_j) = ∑_{i=1}^{r} α_i σ_i^{2t} 𝐯_i.
  • It can be proved that (𝐀ᵀ𝐀)^t = ∑_{i=1}^{r} σ_i^{2t} 𝐯_i 𝐯_iᵀ.
• Dividing by σ_1^{2t}: 𝐱_t ∝ ∑_{i=1}^{r} α_i (σ_i/σ_1)^{2t} 𝐯_i = α_1 𝐯_1 + ∑_{i=2}^{r} α_i (σ_i/σ_1)^{2t} 𝐯_i.
• The second term converges to 0 because σ_i/σ_1 < 1 for all i ≥ 2 (assuming σ_1 > σ_2), so 𝐱_t converges to 𝐯_1 (up to sign).
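The algorithm above, sketched in NumPy (the function name, the fixed iteration count, and the test matrix with a known spectrum are ours; 𝐀ᵀ𝐀 𝐱 is applied as two matrix-vector products):

```python
import numpy as np

def power_iteration(A, num_iters=500, seed=0):
    """Estimate the top singular value and right singular vector of A."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[1])
    x /= np.linalg.norm(x)           # unit l2-norm initialization
    for _ in range(num_iters):
        x = A.T @ (A @ x)            # two matrix-vector multiplications
        x /= np.linalg.norm(x)       # re-normalize
    sigma1 = np.linalg.norm(A @ x)   # ||A v_1|| = sigma_1
    return sigma1, x

# Example on a matrix with known singular values 4 > 2 > 1:
rng = np.random.default_rng(4)
U, _ = np.linalg.qr(rng.standard_normal((10, 3)))
V, _ = np.linalg.qr(rng.standard_normal((7, 3)))
A = U @ np.diag([4.0, 2.0, 1.0]) @ V.T
sigma1, v1 = power_iteration(A)
assert np.isclose(sigma1, 4.0)
```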
Power Iteration for Truncated SVD

Goal: Compute the top-k eigenvalues/eigenvectors of 𝐀ᵀ𝐀 = ∑_{i=1}^{r} σ_i² 𝐯_i 𝐯_iᵀ.

Algorithm:
1. Randomly initialize a matrix 𝐗_0 ∈ ℝ^{n×k} (entries are i.i.d. standard Gaussian);
2. Orthogonalize the columns: 𝐗_0 ← orth(𝐗_0);
3. Repeat the power iteration:
   i.  𝐗_t ← (𝐀ᵀ𝐀) 𝐗_{t−1};
   ii. 𝐗_t ← orth(𝐗_t).
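A sketch of the block (top-k) version in NumPy, using a reduced QR factorization for the orth(·) step (the function name and test matrix are ours):

```python
import numpy as np

def block_power_iteration(A, k, num_iters=500, seed=0):
    """Estimate the top-k singular values / right singular vectors of A."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((A.shape[1], k))  # i.i.d. standard Gaussian init
    X, _ = np.linalg.qr(X)                    # orth(X) via reduced QR
    for _ in range(num_iters):
        X = A.T @ (A @ X)                     # power step
        X, _ = np.linalg.qr(X)                # re-orthogonalize the columns
    sigmas = np.linalg.norm(A @ X, axis=0)    # ||A v_i|| = sigma_i
    return sigmas, X

# Example on a matrix with known singular values 5 > 3 > 2 > 1:
rng = np.random.default_rng(5)
U, _ = np.linalg.qr(rng.standard_normal((8, 4)))
V, _ = np.linalg.qr(rng.standard_normal((6, 4)))
A = U @ np.diag([5.0, 3.0, 2.0, 1.0]) @ V.T
sigmas, X = block_power_iteration(A, k=2)
assert np.allclose(sigmas, [5.0, 3.0])
```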
Summary

• SVD: 𝐀 = ∑_{i=1}^{r} σ_i 𝐮_i 𝐯_iᵀ.
• Truncated SVD: abandon the bottom singular values/vectors.
• Power iteration: an algorithm for computing the truncated SVD.
