
Chapter 9
PCA, ICA, Blind Source Separation

EEE 485/585 Statistical Learning and Data Analytics

Contents:
Unsupervised Learning
Principal Component Analysis (PCA)
Blind Source Separation Problem
Independent Component Analysis (ICA)
Cem Tekin
Bilkent University

Cannot be distributed outside this class without the permission of the instructor.
Unsupervised learning

Supervised learning: Training data $= \{(x_i, y_i)\}_{i=1}^n$
Unsupervised learning: Training data $= \{x_i\}_{i=1}^n$
Unsupervised learning can be used on its own, but also as a pre-processing step before supervised learning!
Principal component analysis (PCA)

$x_i = [x_{i1}, x_{i2}, \ldots, x_{ip}]^T$
Can we find $z_i = [z_{i1}, z_{i2}, \ldots, z_{ik}]^T$ with $k \ll p$ such that $x_i \approx z_i$ (in some sense)?
Find a $k$-dimensional subspace in which the data approximately lies, i.e., a subspace that captures almost all of the variation in the data.
Normalize

$\bar{x}_j = \frac{1}{n} \sum_{i=1}^n x_{ij}$, for $j = 1, \ldots, p$
$x_{ij} \leftarrow x_{ij} - \bar{x}_j$, for $j = 1, \ldots, p$, $i = 1, \ldots, n$
$\sigma_j^2 = \frac{1}{n} \sum_{i=1}^n x_{ij}^2$, for $j = 1, \ldots, p$ (computed after centering)
$x_{ij} \leftarrow \frac{x_{ij}}{\sigma_j}$, for $j = 1, \ldots, p$, $i = 1, \ldots, n$

Dividing by $\sigma_j$ is not necessary if the attributes are on the same scale, e.g., each corresponds to a value in USD.
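A minimal numpy sketch of this normalization step (the function name is mine; note the $1/n$, i.e., population-variance, convention used above):

```python
import numpy as np

def normalize(X):
    """Center each column, then scale it to unit variance.
    X: (n, p) data matrix; rows are the samples x_i."""
    X = X - X.mean(axis=0)                   # x_ij <- x_ij - xbar_j
    sigma = np.sqrt((X ** 2).mean(axis=0))   # sigma_j with the 1/n convention
    return X / sigma                         # x_ij <- x_ij / sigma_j
```

Skip the final division when the attributes already share a natural scale.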

Blind Source
Separation Problem

Blind Source
Separation Problem

Independent
Component Analysis
(ICA)

9.4
Computing the major axis of variation

Major axis of variation = first principal component.
The projection of $x = [x_1, \ldots, x_p]^T$ onto a unit vector $u = [u_1, \ldots, u_p]^T$ is $z = (x^T u)\, u$.
Variance of the projections:
$$\frac{1}{n} \sum_{i=1}^n (x_i^T u)^2 = \frac{1}{n} \sum_{i=1}^n (x_i^T u)^T (x_i^T u) = \frac{1}{n} \sum_{i=1}^n u^T x_i x_i^T u = u^T \left( \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right) u$$

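A quick numeric sanity check of this identity (the data and direction below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))        # rows are samples x_i
u = np.array([1.0, 2.0, -1.0])
u /= np.linalg.norm(u)               # unit vector

Sigma = X.T @ X / X.shape[0]
lhs = np.mean((X @ u) ** 2)          # (1/n) sum_i (x_i^T u)^2
rhs = u @ Sigma @ u                  # u^T Sigma u
assert np.isclose(lhs, rhs)
```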
Optimization problem

$\Sigma = \frac{1}{n} \sum_{i=1}^n x_i x_i^T$ is the sample covariance matrix of the (centered) data.

Maximize $u^T \Sigma u$ subject to $\|u\| = 1$.

Method of Lagrange multipliers:
$$\text{maximize } u^T \Sigma u \text{ subject to } u^T u = 1$$
$$\Rightarrow \text{maximize } L(u, \lambda) = u^T \Sigma u - \lambda (u^T u - 1)$$
Setting $\nabla_u L = 2\Sigma u - 2\lambda u = 0$ gives
$$\Rightarrow \Sigma u = \lambda u$$
So the maximizer is an eigenvector of $\Sigma$; since $u^T \Sigma u = \lambda$ at any such point, the variance is maximized by the eigenvector with the largest eigenvalue.
Result

The principal eigenvector of $\Sigma$ (the one with the largest eigenvalue) is the first principal component.
The $k$ eigenvectors with the $k$ largest eigenvalues give all $k$ principal components.
PCA algorithm

1. Given $X = [x_1, \ldots, x_n]^T$ (rows are the samples), compute $\Sigma = \frac{1}{n} X^T X$.
2. Compute all eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_p$ of $\Sigma$, and the corresponding eigenvectors $u_1, u_2, \ldots, u_p$.
3. Pick the $k \le p$ eigenvectors with the largest eigenvalues, i.e., $u_1, u_2, \ldots, u_k$.
4. For all $i = 1, \ldots, n$, let
$$z_i = \begin{bmatrix} z_{i1} \\ z_{i2} \\ \vdots \\ z_{ik} \end{bmatrix} = \begin{bmatrix} x_i^T u_1 \\ x_i^T u_2 \\ \vdots \\ x_i^T u_k \end{bmatrix}$$
5. $\{z_i\}_{i=1}^n$ is the $k$-dimensional approximation to $\{x_i\}_{i=1}^n$.

Can we recover $x_i$ exactly from $z_i$? Only when $k = p$; in general the best we can do is the projection of $x_i$ onto the span of $u_1, \ldots, u_k$:
$$\hat{x}_i = z_{i1} u_1 + z_{i2} u_2 + \ldots + z_{ik} u_k$$

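A compact numpy sketch of steps 1-5 (names are mine; `np.linalg.eigh` is used because $\Sigma$ is symmetric):

```python
import numpy as np

def pca(X, k):
    """PCA by eigendecomposition of the sample covariance.
    X: (n, p) matrix of normalized samples (rows are x_i).
    Returns scores Z (n, k) and loading vectors U (p, k)."""
    n = X.shape[0]
    Sigma = X.T @ X / n                        # step 1: sample covariance
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # step 2 (ascending order)
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues descending
    U = eigvecs[:, order[:k]]                  # step 3: top-k eigenvectors
    Z = X @ U                                  # step 4: z_i = [x_i^T u_1, ..., x_i^T u_k]
    return Z, U

# Step 5 / reconstruction: x_hat_i = z_i1 u_1 + ... + z_ik u_k, i.e.
# X_hat = Z @ U.T (equal to X exactly only when k = p).
```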
PCA example - Arrest dataset

Number of arrests per 100,000 residents for the 50 US states ($n = 50$).
$X_1$ = Assault, $X_2$ = Murder, $X_3$ = Rape, $X_4$ = UrbanPop (percent of population living in urban areas), $p = 4$.
Normalize the data, and then apply PCA ($k = 2$).
Principal component loading vectors:
$$u_1 = \begin{bmatrix} 0.54 \\ 0.58 \\ 0.54 \\ 0.28 \end{bmatrix}, \qquad u_2 = \begin{bmatrix} -0.42 \\ -0.19 \\ 0.17 \\ 0.87 \end{bmatrix}$$
For each $x_i = [x_{i1}, x_{i2}, x_{i3}, x_{i4}]^T$, compute the principal component scores $z_i = [z_{i1}, z_{i2}]^T$, where
$$z_{i1} = x_i^T u_1 = 0.54 x_{i1} + 0.58 x_{i2} + 0.54 x_{i3} + 0.28 x_{i4}$$
$$z_{i2} = x_i^T u_2 = -0.42 x_{i1} - 0.19 x_{i2} + 0.17 x_{i3} + 0.87 x_{i4}$$

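For instance, the scores for one (hypothetical, already-normalized) state can be computed directly from these loadings:

```python
import numpy as np

u1 = np.array([0.54, 0.58, 0.54, 0.28])   # loadings from the slide
u2 = np.array([-0.42, -0.19, 0.17, 0.87])
x = np.array([1.1, 0.9, 0.7, -0.3])       # hypothetical normalized [Assault, Murder, Rape, UrbanPop]

z1 = x @ u1   # first principal component score
z2 = x @ u2   # second principal component score
```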
Data visualization using PCA

[Biplot of the 50 states on the first two principal components, with the loading vectors for Murder, Assault, Rape, and UrbanPop overlaid. Figure 10.1 from "An Introduction to Statistical Learning" by James et al.]

$$z_{i1} = x_i^T u_1 = 0.54 x_{i1} + 0.58 x_{i2} + 0.54 x_{i3} + 0.28 x_{i4}$$
$$z_{i2} = x_i^T u_2 = -0.42 x_{i1} - 0.19 x_{i2} + 0.17 x_{i3} + 0.87 x_{i4}$$
Interpretation of the results

Crime-related variables are correlated with each other.
UrbanPop is less correlated with the crime-related variables.
States with a high value in the first component are states with high crime rates.
States with a high value in the second component are states with a high level of urbanization.
Importance of standardization in PCA

[Biplots of the arrest data. Left: variables scaled to unit variance. Right: no scaling, in which case Assault, the variable with the largest variance, dominates the first principal component. Figure 10.3 from "An Introduction to Statistical Learning" by James et al.]
How to choose k?

Proportion of variance explained (PVE).
Total variance:
$$\sum_{j=1}^p \frac{1}{n} \sum_{i=1}^n x_{ij}^2$$
Variance explained by the $m$-th principal component:
$$\frac{1}{n} \sum_{i=1}^n z_{im}^2 = \frac{1}{n} \sum_{i=1}^n (x_i^T u_m)^2 = \frac{1}{n} \sum_{i=1}^n \left( \sum_{j=1}^p x_{ij} u_{mj} \right)^2$$
$$\mathrm{PVE}(m) = \frac{\sum_{i=1}^n \left( \sum_{j=1}^p x_{ij} u_{mj} \right)^2}{\sum_{j=1}^p \sum_{i=1}^n x_{ij}^2}$$
$$\mathrm{PVE}(\text{first } k) = \sum_{m=1}^k \mathrm{PVE}(m)$$

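Since $\frac{1}{n} \sum_i (x_i^T u_m)^2 = u_m^T \Sigma u_m = \lambda_m$ and the total variance equals $\mathrm{trace}(\Sigma) = \sum_j \lambda_j$, PVE can be read off the eigenvalues. A short sketch (assuming X is already normalized):

```python
import numpy as np

def pve(X, k):
    """Proportion of variance explained by the first k principal
    components; X is the (n, p) normalized data matrix."""
    Sigma = X.T @ X / X.shape[0]
    lam = np.sort(np.linalg.eigvalsh(Sigma))[::-1]  # eigenvalues, descending
    return lam[:k].sum() / lam.sum()                 # PVE(first k)
```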
How to choose k?

[Left: scree plot of PVE(m) against the principal component index m. Right: cumulative proportion of variance explained, PVE(first k). Figure 10.4 from "An Introduction to Statistical Learning" by James et al.]
Blind source separation problem

The cocktail party problem: several speakers (sources) talk at the same time, and each microphone records a different mixture of them (observations); the goal is to estimate the original sources.

[Diagram: Sources → Mixing → Observations → Estimation]
Blind source separation problem

Observations:
$$\begin{bmatrix} x_1(1) \\ x_2(1) \\ x_3(1) \end{bmatrix}, \begin{bmatrix} x_1(2) \\ x_2(2) \\ x_3(2) \end{bmatrix}, \ldots, \begin{bmatrix} x_1(T) \\ x_2(T) \\ x_3(T) \end{bmatrix}$$
Source signals (unobserved):
$$\begin{bmatrix} s_1(1) \\ s_2(1) \\ s_3(1) \end{bmatrix}, \begin{bmatrix} s_1(2) \\ s_2(2) \\ s_3(2) \end{bmatrix}, \ldots, \begin{bmatrix} s_1(T) \\ s_2(T) \\ s_3(T) \end{bmatrix}$$
Mixing model (coefficients $a_{ij}$ unknown):
$$x_1(t) = a_{11} s_1(t) + a_{12} s_2(t) + a_{13} s_3(t)$$
$$x_2(t) = a_{21} s_1(t) + a_{22} s_2(t) + a_{23} s_3(t)$$
$$x_3(t) = a_{31} s_1(t) + a_{32} s_2(t) + a_{33} s_3(t)$$
Goal: estimate the source signals $\{s_i(t)\}$.


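A toy simulation of this mixing model (the specific source signals, seed, and dimensions are arbitrary choices of this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
t = np.arange(T)

# Three non-Gaussian sources: a sinusoid, a square wave, uniform noise
S = np.vstack([np.sin(0.03 * t),
               np.sign(np.sin(0.11 * t)),
               rng.uniform(-1.0, 1.0, T)])   # shape (3, T)

A = rng.normal(size=(3, 3))                  # unknown mixing matrix
X = A @ S                                    # observations x(t) = A s(t)

# If A were known, unmixing would be exact: s(t) = A^{-1} x(t).
# The blind problem is to recover S from X alone (see ICA below).
assert np.allclose(np.linalg.inv(A) @ X, S)
```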
Applications

Image denoising
Medical signal processing
Brain-computer interfaces
Time series analysis
Independent Component Analysis (ICA)

Drop the time index $t$ (consider a single time step). Assume the signals are independent over time.
$p$ sources, $p$ observations:
$$x_1 = a_{11} s_1 + a_{12} s_2 + \ldots + a_{1p} s_p$$
$$\vdots$$
$$x_p = a_{p1} s_1 + a_{p2} s_2 + \ldots + a_{pp} s_p$$
$$\Rightarrow x = A s, \quad A \text{ the unknown mixing matrix}$$
Let $W = A^{-1}$ (the unmixing matrix), so that $s = W x$.
How can we learn $W$ from $x(1), x(2), \ldots, x(T)$ only?

Ambiguities in ICA

To what degree can we recover $W$?
1. (Order ambiguity) The recovered signals $[\hat{s}_1(t), \hat{s}_2(t), \ldots, \hat{s}_p(t)]^T$ will be a permutation of $[s_1(t), s_2(t), \ldots, s_p(t)]^T$.
2. (Scaling ambiguity) The recovered signal $\hat{s}_j(t)$ will be $\alpha s_j(t)$ for some non-zero $\alpha$.
3. (Sign ambiguity) The sign of the recovered signal $\hat{s}_j(t)$ can be the same as or different from that of $s_j(t)$.
These ambiguities do not create a problem in most applications.
Ambiguities in ICA

Sources must be non-Gaussian! If the sources are jointly Gaussian, any orthogonal rotation of them yields another vector of independent Gaussian sources, so the mixing matrix is identifiable only up to a rotation and the sources cannot be recovered.
pdf of a linear transformation of a random vector

$p_S(s)$: pdf of the random vector $S = [S_1, \ldots, S_p]^T$ ($s \in \mathbb{R}^p$)
$X = AS$ ($A$ invertible)
By the change-of-variables formula (the Jacobian of $s \mapsto As$ is $A$), we have
$$p_X(x) = p_S(A^{-1} x) \, |\det(A)|^{-1}$$
$$\Rightarrow p_X(x) = p_S(W x) \, |\det(W)|$$
ICA algorithm

Assume that the sources are independent random variables:
$$p_S(s) = \prod_{j=1}^p p_{S_j}(s_j)$$
Independence holds both across sources and over time!
The densities $p_{S_j}$ are assumed known!
Compute the density of the observation:
$$p_X(x) = \prod_{j=1}^p p_{S_j}(w_j^T x) \, |\det(W)|$$
where
$$W = \begin{bmatrix} w_1^T \\ w_2^T \\ \vdots \\ w_p^T \end{bmatrix}$$
ICA algorithm - estimating W

Likelihood:
$$L(W) = \prod_{t=1}^T p_X(x(t)) = \prod_{t=1}^T \left( \prod_{j=1}^p p_{S_j}(w_j^T x(t)) \right) |\det(W)|$$
Log-likelihood:
$$\ell(W) = \sum_{t=1}^T \left( \sum_{j=1}^p \log p_{S_j}(w_j^T x(t)) + \log |\det(W)| \right)$$
MLE:
$$\hat{W} = \arg\max_W \ell(W)$$
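A minimal sketch of this MLE by batch gradient ascent, assuming the logistic density $p_{S_j}(s) = \sigma'(s)$ (with $\sigma$ the sigmoid) as the known source density, a standard non-Gaussian choice; the learning rate and iteration count are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ica_mle(X, lr=0.1, n_iter=2000):
    """Gradient ascent on l(W)/T (same maximizer as l(W)).
    X: (p, T) observation matrix. Returns an estimate of the
    unmixing matrix W, up to the order/scale/sign ambiguities."""
    p, T = X.shape
    W = np.eye(p)
    for _ in range(n_iter):
        # d/ds log p_S(s) = 1 - 2*sigmoid(s) for the logistic density
        G = (1.0 - 2.0 * sigmoid(W @ X)) @ X.T / T   # average over t
        grad = G + np.linalg.inv(W).T                # + d/dW log|det(W)|
        W += lr * grad                               # ascent step
    return W

# S_hat = ica_mle(X) @ X   # estimated sources s(t) = W x(t)
```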
ICA - example

[Figure 12.20 from "Machine Learning: A Probabilistic Perspective" by Kevin Murphy.]
