
Analyse de données multi-dimensionnelles (Multi-dimensional data analysis)

Data reduction, latent variable analysis, and applications

Matthieu Puigt

Université du Littoral Côte d’Opale


Ecole d’Ingénieurs du Littoral Côte d’Opale
Master Ingénierie des Systèmes Complexes
matthieu.puigt[at]univ-littoral.fr
http://www-lisic.univ-littoral.fr/~puigt/

This document is available at:

https://www-lisic.univ-littoral.fr/~puigt/teaching.html

Academic year 2022–2023


Bibliography of the lecture
From Easy-to-follow to Not-so-easy Video Tutorials:
Josh Starmer’s YouTube channel (StatQuest):
https://www.youtube.com/channel/UCtYLUTtgS3k1Fg4y5tAhLbw
Luis Serrano’s YouTube channel (Serrano.Academy):
https://www.youtube.com/channel/UCgBncpylJ1kiVaPyP-PZauQ
Prof. Barry Van Veen’s YouTube channel:
https://www.youtube.com/channel/UCooRZ0pxedi179pBe9aXm5A
Publications:
L. Balzano & R. Nowak, Blind calibration of sensor networks, in Proc. of
IPSN, pp. 79–88, 2007.
P. Comon & C. Jutten (Eds.), Handbook of Blind Source Separation:
Independent Component Analysis and Applications, Academic Press, 2010.
N. Gillis, The Why and How of Nonnegative Matrix Factorization, in
“Regularization, Optimization, Kernels, and Support Vector Machines,” pp.
275–310, Chapman and Hall/CRC, 2014.
N. Guan, D. Tao, Z. Luo, & B. Yuan, NeNMF: An optimal gradient method for
nonnegative matrix factorization, IEEE Trans. on Sig. Proc., 60(6),
2882–2898, 2012.
D.D. Lee & H.S. Seung, Learning the parts of objects by non-negative matrix
factorization, Nature, 401(6755), 788–791, 1999.
C.H. Martin, T.S. Peng, & M.W. Mahoney, Predicting trends in the quality of
state-of-the-art neural networks without access to training or testing data,
Nature Communications, 12(1), 1–13, 2021.
M. Puigt, O. Berné, R. Guidara, Y. Deville, S. Hosseini, & C. Joblin,
Cross-validation of blindly separated interstellar dust spectra, Proc. of
ECMS 2009, pp. 41–48, Mondragon, Spain, July 8–10, 2009.
M. Udell & A. Townsend, Why are big data matrices approximately low
rank?, SIAM Journal on Mathematics of Data Science, 1(1), 144–160, 2019.
Table of Contents

1 Introduction

2 Low-rank matrices

3 Singular Value Decomposition

4 Principal Component Analysis

5 Independent Component Analysis

6 Nonnegative Matrix Factorization

7 Conclusion



Introduction (1)
High dimensionality
Many problems in data science, machine learning, or signal/image
processing involve datasets in high dimensional spaces
In a high-dimensional space, the number d of features exceeds the number
n of observations (d > n)

Examples
In healthcare, each person—aka observation—may be described by their height,
weight, temperature, blood pressure, heart rate, etc. (features)
In social science, a researcher may ask some volunteers—aka
observations—tens to hundreds of questions—aka features—which are used to
derive some properties
In movie recommendation systems, each customer may grade hundreds of
movies or TV shows (at least...)

Introduction (2)
The curse of dimensionality
Recall: we observe n samples, each of them composed of d features
⇒ Curse of dimensionality!
To infer a property, the number n of observations must increase exponentially
when d increases linearly

Analogy
Let’s imagine you play Mario’s enemies and you want to catch him...
... in 1D, then in 2D, then in 3D

⇒ Increasing the dimension in which Mario can move makes your problem
much harder
⇒ Increasing the number of Mario’s enemies makes the task easier (provided it
is possible to have a good AI to help you play all the characters)


Introduction (3)
Additional properties

Fortunately!
Many real problems involve structured data
They tend to be low-rank

Approximate Low-rankness
Key property in signal & image processing and in machine learning
Data matrix / tensor can be explained by a limited number of
hidden/latent variables (with a limited/negligible error of approximation)

Goal of this lecture (6 h)


An introduction to low-rank approximation techniques applied to matrices
Applications through different signal processing or machine learning
problems
Matlab hands-on



Table of Contents

1 Introduction

2 Low-rank matrices

3 Singular Value Decomposition

4 Principal Component Analysis

5 Independent Component Analysis

6 Nonnegative Matrix Factorization

7 Conclusion



Low-rank matrices (1)

We consider n d-dimensional vectors $x_i \in \mathbb{R}^d$ ($i \in \{1, \ldots, n\}$), with d > n
(high dimensional sensing)
We arrange these data as a matrix X of size d × n, i.e.,

$$X = \begin{bmatrix} x_1 & \cdots & x_n \end{bmatrix}$$

X is low-rank iff $\exists k < \min(n, d)$, $\exists W \in \mathbb{R}^{d \times k}$, $H \in \mathbb{R}^{n \times k}$, such that

$$X = W \cdot H^T = \begin{bmatrix} w_1 & \cdots & w_k \end{bmatrix} \cdot \begin{bmatrix} h_1^T \\ \vdots \\ h_k^T \end{bmatrix}$$

where the columns of W and H are linearly independent.

X is rank-$k_0$ if we cannot find any matrices W and H satisfying the above
property for $k < k_0$ while we can for $k = k_0$.
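To make the definition above concrete, here is a minimal Matlab/Octave sketch (the sizes d, n, and k below are arbitrary illustration choices, not taken from the lecture) that builds a rank-2 matrix as a product W·H^T and checks its rank:

d = 100; n = 20; k = 2;     % arbitrary sizes with d > n (high dimensional sensing)
W = randn(d, k);            % k linearly independent columns w_1, ..., w_k (with probability 1)
H = randn(n, k);            % k linearly independent columns h_1, ..., h_k (with probability 1)
X = W * H';                 % d x n matrix whose rank is (at most) k
disp(rank(X))               % displays 2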


Low-rank matrices (2)
Example with a rank-2 matrix

X as a matrix product

$$X = \begin{bmatrix} w_1 & \cdots & w_k \end{bmatrix} \cdot \begin{bmatrix} h_1^T \\ \vdots \\ h_k^T \end{bmatrix}$$

[figure: X drawn as the product of a tall-and-skinny matrix and a short-and-wide matrix]

X as a sum of rank-1 matrices

$$X = \sum_{i=1}^{k} w_i \cdot h_i^T$$

[figure: X drawn as the sum of two rank-1 matrices]


Low-rank matrices (3)
Some applications

Clustering (K-means)
X ≈ W · H^T, where each column of H^T is zero everywhere except for a single entry equal to one
⇒ W contains the centroids and H^T is a cluster membership matrix


Low-rank matrices (3)
Some applications

Nonnegative Matrix Factorization


X ≈ W · H^T with X, W, H ≥ 0 (the part-based decomposition enhances
interpretability)
⇒ In audio processing, W is a matrix of frequency patterns (timbres) and H^T
a matrix of time activations


Low-rank matrices (3)
Some applications

Denoising
Y = X + E, where X contains some dominant patterns (X is low-rank) and E
does not (noise)
Y ≈ W·H^T ⇒ W·H^T is closer to X than to Y


Low-rank matrices (4)
To conclude on low-rankness
Key assumption in many data science problems
⇒ Many data are approximately low-rank (i.e., noisy low-rank data)
Directly applicable to matrices or tensors
Connections with neural networks and deep learning – out of the scope of
this lecture
Any large data matrix tends to be low-rank (Udell & Townsend, 2019)
And we live in a data deluge: in 2021 (source: Statista),
500 h of YouTube video uploaded
695,000 photos posted on Instagram
197 million emails sent

Let us now see some popular low-rank approximation techniques!


Table of Contents

1 Introduction

2 Low-rank matrices

3 Singular Value Decomposition


Definitions & Properties of SVD
Hands on: Blind Sensor Calibration

4 Principal Component Analysis

5 Independent Component Analysis

6 Nonnegative Matrix Factorization

7 Conclusion



Singular Value Decomposition aka SVD (1)
Definition
Any d × n matrix X can be expressed as

$$X = U \cdot \Sigma \cdot V^T$$

where:
U is a d × d matrix with orthonormal columns (left singular vectors)
V is an n × n matrix with orthonormal columns (right singular vectors)
Σ is a d × n diagonal matrix with $\sigma_i \ge 0$ and $\sigma_i \ge \sigma_j$ if i < j (singular
values)

If d > n (high dimensional sensing), X is tall and skinny and
$$\Sigma = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \sigma_n \\ 0 & \cdots & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & \cdots & \cdots & 0 \end{bmatrix}$$

If d < n (e.g., big data), X is short and fat and
$$\Sigma = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 & 0 & \cdots & 0 \\ \vdots & & \ddots & \vdots & \vdots & & \vdots \\ 0 & \cdots & 0 & \sigma_d & 0 & \cdots & 0 \end{bmatrix}$$
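As a minimal Matlab/Octave sketch of the definition (random matrix, arbitrary sizes), the built-in svd function returns the three factors:

d = 8; n = 3;                 % arbitrary sizes, d > n (tall and skinny)
X = randn(d, n);
[U, S, V] = svd(X);           % U: d x d, S: d x n, V: n x n
norm(X - U*S*V', 'fro')       % ~1e-15: the factorization reproduces X
diag(S)'                      % singular values, sorted in decreasing order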
Singular Value Decomposition aka SVD (2)

Let us focus on the tall-and-skinny case $X = U\Sigma V^T = (U\Sigma)V^T$:

$$U\Sigma = \begin{bmatrix} u_{11} & \cdots & u_{1n} & u_{1,n+1} & \cdots & u_{1d} \\ u_{21} & \cdots & u_{2n} & u_{2,n+1} & \cdots & u_{2d} \\ \vdots & & \vdots & \vdots & & \vdots \\ u_{d1} & \cdots & u_{dn} & u_{d,n+1} & \cdots & u_{dd} \end{bmatrix} \cdot \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \sigma_n \\ 0 & \cdots & \cdots & 0 \\ \vdots & & & \vdots \\ 0 & \cdots & \cdots & 0 \end{bmatrix}$$

The last d − n columns of U only multiply zero rows of Σ, so they do not
contribute to the product.

⇒ And thus we can get the economy-size SVD, i.e.,

$$U\Sigma = \begin{bmatrix} u_{11} & \cdots & u_{1n} \\ u_{21} & \cdots & u_{2n} \\ \vdots & & \vdots \\ u_{d1} & \cdots & u_{dn} \end{bmatrix} \cdot \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & \sigma_n \end{bmatrix}$$

That is, U can be reduced to a d × n orthonormal matrix and Σ to an n × n
diagonal matrix.

Similarly, we can truncate V if n > d (to convince yourself, apply the SVD to $X^T$
with d > n).
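A minimal Matlab/Octave sketch of the economy-size SVD (arbitrary sizes; svd(X,'econ') and svd(X,0) both request the reduced factors):

d = 1000; n = 50;             % arbitrary sizes, d > n
X = randn(d, n);
[U, S, V] = svd(X, 'econ');   % U: d x n, S: n x n, V: n x n
size(U), size(S), size(V)
norm(X - U*S*V', 'fro')       % still ~machine precision: nothing useful was discarded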


Singular Value Decomposition aka SVD (3)
Properties

Orthonormality
$UU^T = I$, $U^T U = I$, $VV^T = I$, $V^T V = I$ (full SVD)
Be careful! The identity matrices may differ in dimension; with the
economy-size SVD, only $U^T U = I$ and $V^T V = I$ hold in general.


Singular Value Decomposition aka SVD (3)
Properties

Bases of X
The columns of U and V form orthonormal bases for the column space and the
row space of X, respectively


Singular Value Decomposition aka SVD (3)
Properties

Rank
If rank(X) = k < min(n, d), then

$$\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_k > \sigma_{k+1} = \ldots = \sigma_{\min(n,d)} = 0$$


Singular Value Decomposition aka SVD (3)
Properties

Low-rank approximation of a matrix (Eckart-Young theorem, 1936)

Let X be a rank-k matrix and r < k. Then
$$\min_{\operatorname{rank}(M) \le r} \|X - M\|_F^2 = \sum_{i=r+1}^{k} \sigma_i^2,$$
where $M = \sum_{i=1}^{r} \sigma_i\, u_i \cdot v_i^T$ and $X = U \Sigma V^T$ is the SVD

$$X = \sigma_1 \cdot u_1 \cdot v_1^T + \sigma_2 \cdot u_2 \cdot v_2^T + \ldots + \sigma_k \cdot u_k \cdot v_k^T$$
Patterns: most important, 2nd most important, ..., k-th most important

The singular values provide an ordered ranking of the components (i.e., the
decrease of the $\sigma_i$ allows one to find an optimal rank k)


Singular Value Decomposition aka SVD (3)
Properties

Example
Truncated SVD applied to a real image (see the sketch below)
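A minimal Matlab/Octave sketch of such a truncated SVD, assuming the standard cameraman.tif test image shipped with the Image Processing Toolbox (any grayscale image and any rank r can be substituted):

A = double(imread('cameraman.tif'));       % grayscale test image (assumption, see above)
[U, S, V] = svd(A, 'econ');
r = 20;                                    % arbitrary truncation rank
Ar = U(:, 1:r) * S(1:r, 1:r) * V(:, 1:r)'; % best rank-r approximation (Eckart-Young)
s = diag(S);
[norm(A - Ar, 'fro')^2, sum(s(r+1:end).^2)]  % both values coincide
imshow(uint8(Ar))                          % visual check of the compressed image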


Hands on: Blind Sensor Calibration

We are now going to see an application of SVD.


The content of the next slides is inspired by:
L. Balzano & R. Nowak, Blind calibration of sensor networks, in Proc. of
IPSN, pp. 79–88, 2007.
C. Dorffer, M. Puigt, G. Delmaire, G. Roussel, Outlier-robust calibration
method for sensor networks, in Proc. IEEE ECMSM, 2017.



The "why" of sensor calibration

Sensed phenomenon =⇒ voltage


Voltage =⇒ phenomenon?

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 17


The "why" of sensor calibration

Sensed phenomenon =⇒ voltage


Voltage =⇒ phenomenon?

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 17


The "why" of sensor calibration

Sensed phenomenon =⇒ voltage


Voltage =⇒ phenomenon?
Sensor calibration needed
Not always physically possible
î Blind sensor calibration

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 17


The "why" of sensor calibration

Sensed phenomenon =⇒ voltage


Voltage =⇒ phenomenon?
Sensor calibration needed
Not always physically possible
î Blind sensor calibration
Fixed sensor network
Many methods including
Projection-based calibration (L. Balzano and R. Nowak 2007)

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 17


The "why" of sensor calibration

Sensed phenomenon =⇒ voltage


Voltage =⇒ phenomenon?
Sensor calibration needed
Not always physically possible
î Blind sensor calibration
Fixed sensor network
Many methods including
Projection-based calibration (L. Balzano and R. Nowak 2007)

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 17


The "how" of calibration (1) – Problem Statement and Assumptions
Network composed of N fixed and synchronized sensors observed at
Times t = t1 , . . . , td .

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 18


The "how" of calibration (1) – Problem Statement and Assumptions
Network composed of N fixed and synchronized sensors observed at
Times t = t1 , . . . , td .
Calibrated and uncalibrated sensor readings denoted xn (t) and yn (t)

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 18


The "how" of calibration (1) – Problem Statement and Assumptions
Network composed of N fixed and synchronized sensors observed at
Times t = t1 , . . . , td .
Calibrated and uncalibrated sensor readings denoted xn (t) and yn (t)
Affine response model:

xn (t) ≈ αn yn (t) + βn

where αn (resp. βn ) is the n-th sensor gain (resp. offset) correction

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 18


The "how" of calibration (1) – Problem Statement and Assumptions
Network composed of N fixed and synchronized sensors observed at
Times t = t1 , . . . , td .
Calibrated and uncalibrated sensor readings denoted xn (t) and yn (t)
Affine response model:

xn (t) ≈ αn yn (t) + βn

where αn (resp. βn ) is the n-th sensor gain (resp. offset) correction


No sensor gain is zero
xn (t) − βn
yn (t) ≈
αn

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 18


The "how" of calibration (1) – Problem Statement and Assumptions
Network composed of N fixed and synchronized sensors observed at
Times t = t1 , . . . , td .
Calibrated and uncalibrated sensor readings denoted xn (t) and yn (t)
Affine response model:

xn (t) ≈ αn yn (t) + βn

where αn (resp. βn ) is the n-th sensor gain (resp. offset) correction


No sensor gain is zero
xn (t) − βn
yn (t) ≈
αn
Sensors are pre-calibrated before deployment
î First sensor readings of yi (t) thus correspond to xi (t)

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 18


The "how" of calibration (1) – Problem Statement and Assumptions
Network composed of N fixed and synchronized sensors observed at
Times t = t1 , . . . , td .
Calibrated and uncalibrated sensor readings denoted xn (t) and yn (t)
Affine response model:

xn (t) ≈ αn yn (t) + βn

where αn (resp. βn ) is the n-th sensor gain (resp. offset) correction


No sensor gain is zero
xn (t) − βn
yn (t) ≈
αn
Sensors are pre-calibrated before deployment
î First sensor readings of yi (t) thus correspond to xi (t)
The network is dense enough to oversampling the signal space
î Rank r of the observed signal is known with r  N

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 18


The "how" of calibration (1) – Problem Statement and Assumptions
Network composed of N fixed and synchronized sensors observed at
Times t = t1 , . . . , td .
Calibrated and uncalibrated sensor readings denoted xn (t) and yn (t)
Affine response model:

xn (t) ≈ αn yn (t) + βn

where αn (resp. βn ) is the n-th sensor gain (resp. offset) correction


No sensor gain is zero
xn (t) − βn
yn (t) ≈
αn
Sensors are pre-calibrated before deployment
î First sensor readings of yi (t) thus correspond to xi (t)
The network is dense enough to oversampling the signal space
î Rank r of the observed signal is known with r  N
Observed signal subspace S is known, e.g.,
we can learn the subspace in which lie the "calibrated" data from the first
readings,
the subspace can be provided by experts in case of no pre-calibration.

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 18
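As a minimal Matlab/Octave sketch of this data model (N, d, r, and the gain/offset ranges below are arbitrary choices, not the lab settings), one can simulate uncalibrated readings whose calibrated counterparts lie in a known rank-r subspace:

N = 20; d = 50; r = 3;                  % N sensors, d snapshots, rank r << N
[Us, ~] = qr(randn(N, r), 0);           % orthonormal basis of the known signal subspace S
X = Us * randn(r, d);                   % calibrated readings x_n(t_k), one row per sensor
alpha = 0.5 + rand(N, 1);               % unknown (nonzero) gain corrections
beta  = randn(N, 1);                    % unknown offset corrections
Y = (X - beta) ./ alpha;                % uncalibrated readings y_n(t) = (x_n(t) - beta_n)/alpha_n
                                        % (implicit expansion: Matlab R2016b+ / Octave)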


The "how" of calibration (2) – Example
Example with known rank-1 subspace S
 
x1 (t) = α1 · y1 (t) + β1
∈ S, ∀t
x2 (t) = α2 · y2 (t) + β2

x2

x2 (t2 )− •
x2 (t3 )− •

x2 (t1 )− •

| | |
x1 (t1 ) x1 (t2 ) x1
x1 (t3 )
M. Puigt Analyse de données multi-dimensionnelles 2022-2023 19
The "how" of calibration (2) – Example
Example with known rank-1 subspace S
 
x1 (t) = α1 · y1 (t) + β1
∈ S, ∀t
x2 (t) = α2 · y2 (t) + β2

y2

• S



gain effect

y1

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 19


The "how" of calibration (2) – Example
Example with known rank-1 subspace S
 
x1 (t) = α1 · y1 (t) + β1
∈ S, ∀t
x2 (t) = α2 · y2 (t) + β2

y2 •

• S

• •


gain effect
offset effect

y1

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 19


The "how" of calibration (3) – General strategy
y2 •

S

y1
The "how" of calibration (3) – General strategy

1 Removing the offset contributions
y2
• (by centering the signals)
S
Indeed...

d
1X
yn = yn (tj )
d
i=1
1
Pd
xn (ti ) βn
= d n=1 −
αn αn

y1 and ∀k ∈ {1, . . . , d},


1
Pd
xn (tk ) − d i=1 xn (tk )
yn (tk ) − yn =
αn
The "how" of calibration (3) – General strategy
1 Removing the offset contributions
y2
(by centering the signals)
• S
• Indeed...

d
• 1X
yn = yn (tj )
d
i=1
1
Pd
xn (ti ) βn
= d n=1 −
αn αn

y1 and ∀k ∈ {1, . . . , d},


1
Pd
xn (tk ) − d i=1 xn (tk )
yn (tk ) − yn =
αn
The "how" of calibration (3) – General strategy
1 Removing the offset contributions
y2
(by centering the signals)
• S 2 Projecting data onto S ⊥

y1

S⊥

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 20


The "how" of calibration (3) – General strategy
1 Removing the offset contributions
y2
(by centering the signals)
• S 2 Projecting data onto S ⊥

y1

S⊥

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 20


The "how" of calibration (3) – General strategy
1 Removing the offset contributions
y2
(by centering the signals)
• S 2 Projecting data onto S ⊥

3 Estimating sensor gains by nullspace
projection, i.e.

PΩ · Y1
 
..
 ·α ≈ 0,
 
 .
PΩ · Yd
| {z }
C
y1
where
PΩ is the projection operator onto
S⊥.
Y
k =
y1 (tk ) − y1

S⊥
 . ..

 0 0 
yN (tk ) − yN

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 20


The "how" of calibration (3) – General strategy
1 Removing the offset contributions
y2
(by centering the signals)
• S 2 Projecting data onto S ⊥

3 Estimating sensor gains by nullspace
projection, i.e.

PΩ · Y1
 
..
 ·α ≈ 0,
 
 .
PΩ · Yd
| {z }
C
y1
where
PΩ is the projection operator onto
S⊥.
Y
k =
y1 (tk ) − y1

S⊥
 . ..

 0 0 
yN (tk ) − yN
4 Estimating sensor offsets (e.g., by
least squares if true data are
zero-mean)
M. Puigt Analyse de données multi-dimensionnelles 2022-2023 20
The "how" of calibration (4)

Is that really possible?

Yes! Proof of convergence if the number d of "snapshots" satisfies (Balzano
& Nowak, 2007)
$$d > \frac{N - 1}{N - r} + 1$$
But a scale indeterminacy remains. It can be resolved by, e.g., assuming
that $\alpha_1 = 1$.


The "how" of calibration (4)

How to solve C · α = 0? (Balzano & Nowak, 2007)


1 Least-squares:
Remove α1 from α and obtain α0
Remove the first column c 1 from C and obtain C 0
The problem to solve reads C 0 · α0 = −c 1
α0 = −C 0−† · c 1 where −† denotes the pseudo-inverse (pinv in Matlab)

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 21


The "how" of calibration (4)

How to solve C · α = 0? (Balzano & Nowak, 2007)


1 Least-squares:
Remove α1 from α and obtain α0
Remove the first column c 1 from C and obtain C 0
The problem to solve reads C 0 · α0 = −c 1
α0 = −C 0−† · c 1 where −† denotes the pseudo-inverse (pinv in Matlab)
2 SVD:
Compute the economy-size SVD of C (svd(C,0) in Matlab)
The last singular vector of V is proportional to α
α can be derived by dividing the last SV of V by V (1)

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 21


The "how" of calibration (4)

How to solve C · α = 0? (Balzano & Nowak, 2007)


1 Least-squares:
Remove α1 from α and obtain α0
Remove the first column c 1 from C and obtain C 0
The problem to solve reads C 0 · α0 = −c 1
α0 = −C 0−† · c 1 where −† denotes the pseudo-inverse (pinv in Matlab)
2 SVD:
Compute the economy-size SVD of C (svd(C,0) in Matlab)
The last singular vector of V is proportional to α
α can be derived by dividing the last SV of V by V (1)

Your turn now!


You are given a Matlab code to fill in order to perform calibration. Which
method is the most accurate?

M. Puigt Analyse de données multi-dimensionnelles 2022-2023 21
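A minimal Matlab/Octave sketch of the two options above, assuming the matrix C (built by stacking the projected, centered snapshots) is already available:

% Least squares with the convention alpha_1 = 1
c1     = C(:, 1);
Cprime = C(:, 2:end);
alpha_ls = [1; -pinv(Cprime) * c1];

% Economy-size SVD: the last right singular vector spans the (approximate) nullspace of C
[~, ~, V] = svd(C, 0);
v = V(:, end);                 % proportional to alpha
alpha_svd = v / v(1);          % rescale so that alpha_1 = 1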


Table of Contents

1 Introduction

2 Low-rank matrices

3 Singular Value Decomposition

4 Principal Component Analysis


Definition & Properties of PCA
Application

5 Independent Component Analysis

6 Nonnegative Matrix Factorization

7 Conclusion



Principal Component Analysis (1)
Concept

We assume X is a d × n real-valued centered* data matrix.

PCA aims to find the “principal directions” of X, i.e., a basis (the
directions are orthogonal and their norm is one) which describes X the
best.
The first principal direction is the one which maximizes the variance of
the data (or minimizes the error of fit).
The second principal direction is the one which is orthogonal to the first
principal direction and which maximizes the variance of the data.
And so on...

* Recall: centering an observation consists of removing its mean! If data are arranged as n
observations of d features, the datapoints in each column are centered around the origin.

[illustrations adapted from the StatQuest video on PCA]


Principal Component Analysis (2)
Mathematical formulation

From the centered d × n matrix X, we derive its covariance matrix, which
reads
$$C = \underbrace{E}_{\text{mean}}\left[X \cdot X^T\right] = \frac{1}{n}\, X \cdot X^T$$
Finding the first principal direction consists of solving
$$\max_{\|f\|_2^2 = 1} \frac{1}{n}\, f^T \cdot X \cdot X^T \cdot f$$
By applying an SVD on $X = U \cdot \Sigma \cdot V^T$, we obtain
$$\max_{\|f\|_2^2 = 1} \frac{1}{n}\, f^T \cdot U \cdot \Sigma \cdot V^T \cdot (U \cdot \Sigma \cdot V^T)^T \cdot f
= \max_{\|f\|_2^2 = 1} \frac{1}{n}\, f^T \cdot U \cdot \Sigma \cdot V^T \cdot V \cdot \Sigma \cdot U^T \cdot f
= \max_{\|f\|_2^2 = 1} \frac{1}{n}\, f^T \cdot U \cdot \Sigma^2 \cdot U^T \cdot f$$
... whose solution is $f = u_1$, i.e., the first principal vector is the first
left singular vector
And the associated variance is $\frac{1}{n} \cdot u_1^T \cdot X \cdot X^T \cdot u_1 = \frac{\sigma_1^2}{n}$
Similarly for the other principal vectors!
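A minimal Matlab/Octave sketch of PCA computed through the SVD (synthetic, arbitrarily correlated 2-D data):

d = 2; n = 500;
X = [1 0.9; 0.9 1] * randn(d, n);     % arbitrary mixing to create correlated features
X = X - mean(X, 2);                   % center each feature (row); implicit expansion, R2016b+
[U, S, ~] = svd(X, 'econ');
pc1 = U(:, 1);                        % first principal direction
variances = diag(S).^2 / n;           % variance captured by each principal direction
scores = U' * X;                      % data expressed in the principal basis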


Principal Component Analysis (3)

Some applications of PCA

Denoising, dimensionality reduction
Orthogonal regression (aka Total Least Squares)
Analysis of deep learning networks


Table of Contents

1 Introduction

2 Low-rank matrices

3 Singular Value Decomposition

4 Principal Component Analysis

5 Independent Component Analysis


Let’s talk about linear systems
A kind of magic?
A bit of history
From PCA to ICA
Non-Gaussianity-based ICA

6 Nonnegative Matrix Factorization

7 Conclusion



Independent Component Analysis

Concept highly linked with blind source separation


In the 1950s, the cocktail party problem was formulated.
You are attending a party which is a bit crowded and noisy
Still, you are able to listen to the people near you
However, if you only hear with one ear, you can’t understand anything!
⇒ Modern cocktail party problem: a political debate on TV!
Doing with computers what our ears do is quite challenging!
Let’s introduce the mathematical background first!


Let’s talk about linear systems
All of you know how to solve this kind of system:
$$\begin{cases} 2 \cdot s_1 + 3 \cdot s_2 = 5 \\ 3 \cdot s_1 - 2 \cdot s_2 = 1 \end{cases} \qquad (1)$$
If we respectively define A, s, and x as the matrix and the vectors
$$A = \begin{bmatrix} 2 & 3 \\ 3 & -2 \end{bmatrix}, \quad s = [s_1, s_2]^T, \quad \text{and} \quad x = [5, 1]^T,$$
Eq. (1) becomes
$$x = A \cdot s,$$
whose solution is
$$s = A^{-1} \cdot x = [1, 1]^T.$$
Finding s with respect to x is called an inverse problem, as we must invert
the operator A.
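In Matlab/Octave, such a well-posed system is solved in one line (a minimal sketch using the backslash operator):

A = [2 3; 3 -2];
x = [5; 1];
s = A \ x       % solves A*s = x and returns [1; 1]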


Let’s talk about linear systems
All of you know how to solve this kind of system:
$$\begin{cases} 2 \cdot s_1 + 3 \cdot s_2 + \ldots + 7 \cdot s_5 = 5 \\ 3 \cdot s_1 - 2 \cdot s_2 + \ldots + 2 \cdot s_5 = 1 \end{cases} \qquad (1)$$
If we respectively define A, s, and x as the matrix and the vectors
$$A = \begin{bmatrix} 2 & 3 & \ldots & 7 \\ 3 & -2 & \ldots & 2 \end{bmatrix}, \quad s = [s_1, s_2, \ldots, s_5]^T, \quad \text{and} \quad x = [5, 1]^T,$$
Eq. (1) becomes
$$x = A \cdot s,$$
whose solution is
$$s = \text{???}$$
Finding s with respect to x is called an inverse problem, as we must invert
the operator A.

How to solve this kind of problem if
there are more unknowns than equations (ill-posed inverse problem)?


Let’s talk about linear systems
All of you know how to solve this kind of system:
$$\begin{cases} ? \cdot s_1 + ? \cdot s_2 = 5 \\ ? \cdot s_1 + ? \cdot s_2 = 1 \end{cases} \qquad (1)$$
If we respectively define A, s, and x as the matrix and the vectors
$$A = \begin{bmatrix} ? & ? \\ ? & ? \end{bmatrix}, \quad s = [s_1, s_2]^T, \quad \text{and} \quad x = [5, 1]^T,$$
Eq. (1) becomes
$$x = A \cdot s,$$
whose solution is
$$s = A^{-1} \cdot x = \,?$$
Finding s with respect to x is called an inverse problem, as we must invert
the operator A.

How to solve this kind of problem if
there are more unknowns than equations (ill-posed inverse problem)?
we do not know the operator A (blind source separation)?


A kind of magic?

Denoting
$$A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},$$
Eq. (1) reads, at each sample time t,
$$\begin{cases} a_{11} \cdot s_1(t) + a_{12} \cdot s_2(t) = x_1(t) \\ a_{21} \cdot s_1(t) + a_{22} \cdot s_2(t) = x_2(t) \end{cases}$$
e.g., $(x_1, x_2) = (5, 1)$ at $t_1$, $(0, 7)$ at $t_2$, $(-4.2, 0.7)$ at $t_3$, $(2.1, 7.5)$ at $t_4$, and so on.
Stacking all the samples,
$$\begin{bmatrix} \text{—}\; x_1 \;\text{—} \\ \text{—}\; x_2 \;\text{—} \end{bmatrix} = A \cdot \begin{bmatrix} \text{—}\; s_1 \;\text{—} \\ \text{—}\; s_2 \;\text{—} \end{bmatrix} \quad \text{or} \quad X = A \cdot S$$

Fortunately...
In many signal processing / machine learning / high-dimensional data problems,
we get several samples of the data,
⇒ i.e., we have a series of systems of equations!
We can use some statistical properties to invert A


A bit of history
BSS problem formulated around 1982 by Ans, Hérault, and Jutten for a
biomedical problem; first papers in the mid-1980s
Great interest from the community, mainly in France and later in Europe
and in Japan, and then in the USA
Several special sessions in international conferences (e.g., GRETSI’93,
NOLTA’95)
First workshop in 1999, in Aussois, France. One conference every 18 months
until 2018!
People with different backgrounds: signal processing, statistics, neural
networks, and later machine learning and artificial intelligence
Topic renamed “Latent Variable Analysis” (LVA)
Initially, BSS was addressed for simple mixtures (as in this lecture), but
more complex mixture models (well suited to acoustics) appeared in the
mid-to-late 1990s
Until the end of the 1990s, BSS ≈ ICA
First NMF methods in the mid-1990s, but the famous contribution dates from 1999
Other methods based on sparsity around 2000
Deep learning techniques since the 2010s

A generic problem with many applications, e.g., biomedical, audio processing,
telecommunications, astrophysics, image classification, underwater acoustics,
finance, quantum information processing
From PCA to ICA
Independent ⇒ uncorrelated (but not the converse in general)
Let us see a graphical example with uniform sources, $x = A \cdot s$, with
P = N = 2 and
$$A = \begin{bmatrix} -0.2485 & 0.8352 \\ 0.4627 & -0.6809 \end{bmatrix}$$

[figures: source distributions (red: source directions), mixture distributions
(green: eigenvectors), and output distributions after whitening]

PCA does “half the job” and we need to rotate the data to achieve the
separation!
Principles of ICA

I won’t go through the details of ICA.


Many methods have been proposed since 1984
Some ICA results are famous...

Historical approaches based on non-Gaussianity


Let’s understand why!


ICA based on non-Gaussianity
Mixing sources ⇒ (more) Gaussian observations
Why?
The central limit theorem states that a sum of independent random variables
tends toward a Gaussian distribution


Non-Gaussian ICA concept

If at most one source is Gaussian (i.i.d. sources)

Then, we can find a matrix W such that the signals $y_i(t)$ in

$$Y = W \cdot X$$

are the least Gaussian!

The kurtosis, defined as $\operatorname{kurt}(y) = E\left[y^4\right] - 3\left(E\left[y^2\right]\right)^2$, is null if y is
Gaussian.
In practice, we first apply PCA to the data (whitening)
Then we use the kurtosis to find the rotation angle (see the sketch below)
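Below is a minimal Matlab/Octave sketch of this whitening-plus-rotation idea on the two-source toy example; the grid search over the rotation angle is an illustrative stand-in for the optimization used by FastICA-like algorithms, not the exact historical method:

n = 5000;
S = rand(2, n) - 0.5;                          % two independent uniform (non-Gaussian) sources
A = [-0.2485 0.8352; 0.4627 -0.6809];          % mixing matrix of the toy example
X = A * S;  X = X - mean(X, 2);
[E, D] = eig(cov(X'));                         % PCA of the mixtures
Z = diag(1 ./ sqrt(diag(D))) * E' * X;         % whitened data
kurt = @(Y) mean(Y.^4, 2) - 3 * mean(Y.^2, 2).^2;
angles = linspace(0, pi/2, 180);  crit = zeros(size(angles));
for i = 1:numel(angles)
    R = [cos(angles(i)) -sin(angles(i)); sin(angles(i)) cos(angles(i))];
    crit(i) = sum(abs(kurt(R * Z)));           % total non-Gaussianity after rotation
end
[~, ibest] = max(crit);
R = [cos(angles(ibest)) -sin(angles(ibest)); sin(angles(ibest)) cos(angles(ibest))];
Y = R * Z;                                     % estimated sources (up to scale and permutation)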


Back to our toy example

[figures: source distributions (red: source directions), mixture distributions
(green: eigenvectors), output distributions after whitening, and output
distributions after ICA]


Some ICA applications

source separation (audio signals, images, financial data, etc.)
denoising


Table of Contents

1 Introduction

2 Low-rank matrices

3 Singular Value Decomposition

4 Principal Component Analysis

5 Independent Component Analysis

6 Nonnegative Matrix Factorization


Definition & Properties of NMF
Hands on: Hyperspectral unmixing

7 Conclusion



Nonnegative Matrix Factorization (1)
Why is it so popular?

In many problems, matrices X ≈ G · F are non-negative, e.g.,
chemical concentration analysis
text analysis
(hyperspectral) image analysis
electric power consumption data
⇒ Non-negativity on G and/or F yields better interpretability

[figures: NMF vs. Principal Component Analysis applied to a face dataset
(source: Lee & Seung, 1999)]


Nonnegative Matrix Factorization (2)
Mathematical formulation and some properties

We aim to solve X ≈ G · F with X, G, F ≥ 0

How to define the discrepancy of X wrt G · F?
Frobenius norm $\frac{1}{2}\|X - G \cdot F\|_F^2$
⇒ Similar to the Euclidean norm for matrices
Kullback-Leibler divergence, parametric divergences, etc.
⇒ Out of the scope of this lecture!

We might take into account additional constraints on G or F,
e.g., smoothness, sparsity, etc.
⇒ Out of the scope of this lecture
Issues:
1 NMF is NP-hard in the general case, i.e., convergence to a global minimum
is not guaranteed
2 The NMF solution is not unique
3 The choice of the NMF rank k can be tricky

Proof of the non-uniqueness of the NMF solution

If $G \in \mathbb{R}_+^{d \times k}$ and $F \in \mathbb{R}_+^{k \times n}$ are some NMF solutions,
then for any k × k invertible matrix D such that $G \cdot D \ge 0$ and $D^{-1} \cdot F \ge 0$,
$(G \cdot D)$ and $(D^{-1} \cdot F)$ are also solutions.


Nonnegative Matrix Factorization (3)
How to solve NMF?

General strategy
1 Initialize the iteration number t = 1, and initialize $G_1$ and $F_1$
2 For t = 2 until a maximum number of iterations or a stopping criterion:
1 Update G s.t. $\frac{1}{2}\|X - G_{t+1} F_t\|_F^2 \le \frac{1}{2}\|X - G_t F_t\|_F^2$
2 Update F s.t. $\frac{1}{2}\|X - G_{t+1} F_{t+1}\|_F^2 \le \frac{1}{2}\|X - G_{t+1} F_t\|_F^2$

How to proceed in practice? Some classical algorithms follow.


Nonnegative Matrix Factorization (3)
How to solve NMF?

Alternating least squares

Apply least squares to update G or F,
e.g., $G_{t+1} = (X \cdot F_t^T) \cdot (F_t \cdot F_t^T)^{-1}$
Replace negative entries by zero or a small positive threshold (see the sketch below)

Comments
Very fast
but not accurate!
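A minimal Matlab/Octave sketch of this alternating scheme, assuming a nonnegative data matrix X is already given (rank and iteration count are arbitrary choices):

k = 5;                                  % arbitrary factorization rank
[d, n] = size(X);                       % X: nonnegative data matrix (assumed given)
F = rand(k, n);                         % random nonnegative initialization
for t = 1:100
    G = max((X * F') / (F * F'), 0);    % LS update of G, negative entries set to zero
    F = max((G' * G) \ (G' * X), 0);    % LS update of F, negative entries set to zero
end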


Nonnegative Matrix Factorization (3)
How to solve NMF?

Non-negative alternating least squares

Apply non-negative least squares to update G or F

Comments
Very easy to implement (e.g., in Matlab, lsqnonneg)
but slow!


Nonnegative Matrix Factorization (3)
How to solve NMF?

Gradient descent
Update using gradient descent,
e.g., $G_{t+1} = G_t - \nu \cdot \nabla_G \mathcal{J}(G_t, F_t)$, where
$\mathcal{J}(G, F) = \frac{1}{2}\|X - G F\|_F^2$,
$\nabla_G \mathcal{J}(G, F) = G \cdot F \cdot F^T - X \cdot F^T$,
and ν is a step size
Replace negative entries by zero or a small positive threshold

Comments
Choice of ν?
⇒ Possibility to find an optimal ν (but it takes some time)
Alternative: replace classical gradient descent by extrapolation (Guan et al.,
2012)


Nonnegative Matrix Factorization (3)
How to solve NMF?

Multiplicative updates
Gradient descent with a well-chosen step size so that the additive update is
replaced by a multiplicative one


Nonnegative Matrix Factorization (3)
How to solve NMF?

Proof
Gradient descent:
$$\forall i,j, \quad F_{ij} = F_{ij} + \nu_{ij}\left[(G^T \cdot X)_{ij} - (G^T \cdot G \cdot F)_{ij}\right]$$
We set $\nu_{ij} = \frac{F_{ij}}{(G^T \cdot G \cdot F)_{ij}}$ and obtain
$$\forall i,j, \quad F_{ij} = F_{ij} + \frac{F_{ij}}{(G^T \cdot G \cdot F)_{ij}}\left[(G^T \cdot X)_{ij} - (G^T \cdot G \cdot F)_{ij}\right]
= F_{ij} \cdot \left(1 + \frac{(G^T \cdot X)_{ij}}{(G^T \cdot G \cdot F)_{ij}} - 1\right)
= F_{ij} \cdot \frac{(G^T \cdot X)_{ij}}{(G^T \cdot G \cdot F)_{ij}}$$


Nonnegative Matrix Factorization (3)
How to solve NMF?

Multiplicative updates
Gradient descent with a well-chosen step size so that the additive update is
replaced by a multiplicative one
Principle of the “heuristic” method:
$$G_{t+1} = G_t \circ \frac{\nabla\mathcal{J}^-(G_t, F_t)}{\nabla\mathcal{J}^+(G_t, F_t)}, \quad \text{i.e.,} \quad G_{t+1} = G_t \circ \frac{X \cdot F_t^T}{G_t \cdot F_t \cdot F_t^T}$$
where
$$\nabla_G \mathcal{J}(G, F) = \underbrace{G \cdot F \cdot F^T}_{\nabla\mathcal{J}^+(G,F)} - \underbrace{X \cdot F^T}_{\nabla\mathcal{J}^-(G,F)}$$
and ◦ and the division are elementwise operations (.* and ./ in Matlab)

Comments
Non-negativity of G and F is kept along the iterations
Easy to implement
But very slow when X is large!
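A minimal Matlab/Octave sketch of these multiplicative updates on synthetic nonnegative data (sizes, rank, iteration count, and the small constant eps0 are arbitrary choices):

d = 100; n = 200; k = 5;
X = max(randn(d, k), 0) * max(randn(k, n), 0);   % synthetic nonnegative data
G = rand(d, k);  F = rand(k, n);                 % random nonnegative initialization
eps0 = 1e-12;                                    % avoids divisions by zero
for t = 1:500
    G = G .* (X * F') ./ (G * (F * F') + eps0);
    F = F .* (G' * X) ./ ((G' * G) * F + eps0);
end
relative_error = norm(X - G*F, 'fro') / norm(X, 'fro')
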
Hands on: Hyperspectral unmixing using NMF

We are now going to see an application of NMF. The content of the next
slides is inspired by:
M. Puigt, O. Berné, R. Guidara, Y. Deville, S. Hosseini, C. Joblin:
Cross-validation of blindly separated interstellar dust spectra, Proc. of
ECMS 2009, pp. 41–48, Mondragon, Spain, July 8-10, 2009



Problem Statement (1)
Interstellar medium
Lies between stars in our galaxy
Concentrated in dust clouds which play a major role in the evolution of
galaxies

Interstellar dust
Absorbs UV light and re-emits it in the IR domain
Several grain populations in Photo-Dissociation Regions (PDRs):
Polycyclic Aromatic Hydrocarbons, Very Small Grains, Big Grains
The Spitzer IR spectrograph provides hyperspectral datacubes
$$x_{(n,m)}(\lambda) = \sum_{j=1}^{N} a_{(n,m),j}\, s_j(\lambda)$$
⇒ Blind Source Separation (BSS)

[figure adapted from: http://www.nrao.edu/pr/2006/gbtmolecules/, Bill Saxton, NRAO/AUI/NSF]


Problem Statement (2)

[figure: datacube ⇒ observed spectra ⇒ separation]

How to validate the separation of unknown sources?

Cross-validation of the performance of numerous BSS methods based
on different criteria
Deriving a relevant spatial structure of the emission of the grains in PDRs


Blind Source Separation
Three main classes:
Independent Component Analysis (ICA)
Sparse Component Analysis (SCA)
Non-negative Matrix Factorization (NMF)

Tested ICA methods
1 FastICA:
Maximization of non-Gaussianity
Sources are stationary
2 Guidara et al. ICA method:
Maximum likelihood
Sources are Markovian processes & non-stationary

Tested SCA methods
Low sparsity assumption
Three methods with the same structure:
1 LI-TIFROM-S: based on ratios of TF mixtures
2 LI-TIFCORR-C & -NC: based on TF correlation of mixtures

Tested NMF method
Lee & Seung algorithm:
Estimate both the mixing matrix $\hat{A}$ and the source matrix $\hat{S}$ from the observation
matrix X
Minimization of the divergence between the observations and the estimated matrices:
$$\operatorname{div}\!\left(X \,\middle|\, \hat{A}\hat{S}\right) = \sum_{i,j}\left[ X_{ij} \log\!\left(\frac{X_{ij}}{(\hat{A}\hat{S})_{ij}}\right) - X_{ij} + (\hat{A}\hat{S})_{ij} \right]$$


Pre-processing stage
Additive noise not taken into account in the mixing model
More observations than sources
⇒ Pre-processing stage for reducing the noise & the complexity:

For ICA and SCA methods
1 Sources centered and normalized
2 Principal Component Analysis

For the NMF method
Above pre-processing stage not possible
Presence of some rare negative samples in the observations
⇒ Two scenarios:
1 Negative values are outliers, not taken into account
2 Negativeness due to the pipeline: translation of the observations to positive
values


Estimated spectra from Ced 201 datacube

[figures: spectra estimated by NMF (1st scenario), FastICA, and all other
methods — black: mean values, gray: envelope]
(background image © R. Croman, www.rc-astro.com)


Distribution map of chemical species

$$x_{(n,m)}(\lambda) = \sum_{j=1}^{N} a_{(n,m),j}\, s_j(\lambda), \qquad y_k(\lambda) = \eta_j\, s_j(\lambda)$$

How to compute the distribution maps of the grains?

$$c_{n,m,k} = E\left[x_{(n,m)}(\lambda)\, y_k(\lambda)\right] = a_{(n,m),j}\, \eta_j\, E\left[s_j(\lambda)^2\right]$$

[figures: distribution maps of the chemical species]


Conclusion

Conclusion
1 Cross-validation of separated spectra with various BSS methods
Quite the same results with all BSS methods
Physically relevant
2 Distribution maps provide another validation of the separation step
Spatial distribution not used in the separation step
Physically relevant



Hands on

Your turn!
Joint lab subject
⇒ Lab report to write!


Table of Contents

1 Introduction

2 Low-rank matrices

3 Singular Value Decomposition

4 Principal Component Analysis

5 Independent Component Analysis

6 Nonnegative Matrix Factorization

7 Conclusion



Conclusion of the lecture

Short introduction to high dimensional data analysis

Curse of dimensionality
⇒ Dimensionality reduction
Low-rank approximation techniques (SVD, PCA, NMF)
Applications

Thank you for your attention.


Questions?

