
Chapter 9
PCA, ICA, Blind Source Separation

EEE 485/585 Statistical Learning and Data Analytics

Contents:
Unsupervised Learning
Principal Component Analysis (PCA)
Blind Source Separation Problem
Independent Component Analysis (ICA)
Cem Tekin
Bilkent University

Cannot be distributed outside this class without the permission of the instructor.
Unsupervised learning

Supervised learning: Training data $= \{(x_i, y_i)\}_{i=1}^n$
Unsupervised learning: Training data $= \{x_i\}_{i=1}^n$
Unsupervised learning can be used on its own, but also as a pre-processing step before supervised learning!
Principal component analysis (PCA)

$x_i = [x_{i1}, x_{i2}, \ldots, x_{ip}]^T$
Can we find $z_i = [z_{i1}, z_{i2}, \ldots, z_{ik}]^T$ with $k \ll p$ such that $x_i \approx z_i$ (in some sense)?
Find a $k$-dimensional subspace in which the data approximately lies, i.e., a subspace that captures almost all of the variation in the data.
Normalize

$\bar{x}_j = \frac{1}{n} \sum_{i=1}^n x_{ij}$, for $j = 1, \ldots, p$
$x_{ij} \leftarrow x_{ij} - \bar{x}_j$, for $j = 1, \ldots, p$, $i = 1, \ldots, n$
$\sigma_j^2 = \frac{1}{n} \sum_{i=1}^n x_{ij}^2$, for $j = 1, \ldots, p$ (computed after centering)
$x_{ij} \leftarrow \frac{x_{ij}}{\sigma_j}$, for $j = 1, \ldots, p$, $i = 1, \ldots, n$

Dividing by $\sigma_j$ is not necessary if the attributes are on the same scale, e.g., each corresponds to a value in USD.
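A minimal numpy sketch of this normalization step (the function name is mine; note the $1/n$, i.e., population-variance, convention used above):

```python
import numpy as np

def normalize(X):
    """Center each column, then scale it to unit variance.
    X: (n, p) data matrix; rows are the samples x_i."""
    X = X - X.mean(axis=0)                   # x_ij <- x_ij - xbar_j
    sigma = np.sqrt((X ** 2).mean(axis=0))   # sigma_j with the 1/n convention
    return X / sigma                         # x_ij <- x_ij / sigma_j
```

Skip the final division when the attributes already share a natural scale.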

Blind Source
Separation Problem

Blind Source
Separation Problem

Independent
Component Analysis
(ICA)

9.4
Computing the major axis of variation

Major axis of variation = first principal component.
The projection of $x = [x_1, \ldots, x_p]^T$ onto a unit vector $u = [u_1, \ldots, u_p]^T$ is $z = (x^T u)\, u$.
Variance of the projections:
$$\frac{1}{n} \sum_{i=1}^n (x_i^T u)^2 = \frac{1}{n} \sum_{i=1}^n (x_i^T u)^T (x_i^T u) = \frac{1}{n} \sum_{i=1}^n u^T x_i x_i^T u = u^T \left( \frac{1}{n} \sum_{i=1}^n x_i x_i^T \right) u$$

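A quick numeric sanity check of this identity (the data and direction below are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))        # rows are samples x_i
u = np.array([1.0, 2.0, -1.0])
u /= np.linalg.norm(u)               # unit vector

Sigma = X.T @ X / X.shape[0]
lhs = np.mean((X @ u) ** 2)          # (1/n) sum_i (x_i^T u)^2
rhs = u @ Sigma @ u                  # u^T Sigma u
assert np.isclose(lhs, rhs)
```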
Optimization problem

$\Sigma = \frac{1}{n} \sum_{i=1}^n x_i x_i^T$ is the sample covariance matrix of the (centered) data.

Maximize $u^T \Sigma u$ subject to $\|u\| = 1$.

Method of Lagrange multipliers:
$$\text{maximize } u^T \Sigma u \text{ subject to } u^T u = 1$$
$$\Rightarrow \text{maximize } L(u, \lambda) = u^T \Sigma u - \lambda (u^T u - 1)$$
Setting $\nabla_u L = 2\Sigma u - 2\lambda u = 0$ gives
$$\Rightarrow \Sigma u = \lambda u$$
So the maximizer is an eigenvector of $\Sigma$; since $u^T \Sigma u = \lambda$ at any such point, the variance is maximized by the eigenvector with the largest eigenvalue.
Result

The principal eigenvector of $\Sigma$ (the one with the largest eigenvalue) is the first principal component.
The $k$ eigenvectors with the $k$ largest eigenvalues give all $k$ principal components.
PCA algorithm

1. Given $X = [x_1, \ldots, x_n]^T$ (rows are the samples), compute $\Sigma = \frac{1}{n} X^T X$.
2. Compute all eigenvalues $\lambda_1 \ge \lambda_2 \ge \ldots \ge \lambda_p$ of $\Sigma$, and the corresponding eigenvectors $u_1, u_2, \ldots, u_p$.
3. Pick the $k \le p$ eigenvectors with the largest eigenvalues, i.e., $u_1, u_2, \ldots, u_k$.
4. For all $i = 1, \ldots, n$, let
$$z_i = \begin{bmatrix} z_{i1} \\ z_{i2} \\ \vdots \\ z_{ik} \end{bmatrix} = \begin{bmatrix} x_i^T u_1 \\ x_i^T u_2 \\ \vdots \\ x_i^T u_k \end{bmatrix}$$
5. $\{z_i\}_{i=1}^n$ is the $k$-dimensional approximation to $\{x_i\}_{i=1}^n$.

Can we recover $x_i$ exactly from $z_i$? Only when $k = p$; in general the best we can do is the projection of $x_i$ onto the span of $u_1, \ldots, u_k$:
$$\hat{x}_i = z_{i1} u_1 + z_{i2} u_2 + \ldots + z_{ik} u_k$$

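A compact numpy sketch of steps 1-5 (names are mine; `np.linalg.eigh` is used because $\Sigma$ is symmetric):

```python
import numpy as np

def pca(X, k):
    """PCA by eigendecomposition of the sample covariance.
    X: (n, p) matrix of normalized samples (rows are x_i).
    Returns scores Z (n, k) and loading vectors U (p, k)."""
    n = X.shape[0]
    Sigma = X.T @ X / n                        # step 1: sample covariance
    eigvals, eigvecs = np.linalg.eigh(Sigma)   # step 2 (ascending order)
    order = np.argsort(eigvals)[::-1]          # sort eigenvalues descending
    U = eigvecs[:, order[:k]]                  # step 3: top-k eigenvectors
    Z = X @ U                                  # step 4: z_i = [x_i^T u_1, ..., x_i^T u_k]
    return Z, U

# Step 5 / reconstruction: x_hat_i = z_i1 u_1 + ... + z_ik u_k, i.e.
# X_hat = Z @ U.T (equal to X exactly only when k = p).
```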
PCA example - Arrest dataset

Number of arrests per 100,000 residents for the 50 US states ($n = 50$).
$X_1$ = Assault, $X_2$ = Murder, $X_3$ = Rape, $X_4$ = UrbanPop (percent of population living in urban areas), $p = 4$.
Normalize the data, and then apply PCA ($k = 2$).
Principal component loading vectors:
$$u_1 = \begin{bmatrix} 0.54 \\ 0.58 \\ 0.54 \\ 0.28 \end{bmatrix}, \qquad u_2 = \begin{bmatrix} -0.42 \\ -0.19 \\ 0.17 \\ 0.87 \end{bmatrix}$$
For each $x_i = [x_{i1}, x_{i2}, x_{i3}, x_{i4}]^T$, compute the principal component scores $z_i = [z_{i1}, z_{i2}]^T$, where
$$z_{i1} = x_i^T u_1 = 0.54 x_{i1} + 0.58 x_{i2} + 0.54 x_{i3} + 0.28 x_{i4}$$
$$z_{i2} = x_i^T u_2 = -0.42 x_{i1} - 0.19 x_{i2} + 0.17 x_{i3} + 0.87 x_{i4}$$

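For instance, the scores for one (hypothetical, already-normalized) state can be computed directly from these loadings:

```python
import numpy as np

u1 = np.array([0.54, 0.58, 0.54, 0.28])   # loadings from the slide
u2 = np.array([-0.42, -0.19, 0.17, 0.87])
x = np.array([1.1, 0.9, 0.7, -0.3])       # hypothetical normalized [Assault, Murder, Rape, UrbanPop]

z1 = x @ u1   # first principal component score
z2 = x @ u2   # second principal component score
```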
Data visualization using PCA

[Biplot of the 50 states on the first two principal components, with the loading vectors for Murder, Assault, Rape, and UrbanPop overlaid. Figure 10.1 from "An Introduction to Statistical Learning" by James et al.]

$$z_{i1} = x_i^T u_1 = 0.54 x_{i1} + 0.58 x_{i2} + 0.54 x_{i3} + 0.28 x_{i4}$$
$$z_{i2} = x_i^T u_2 = -0.42 x_{i1} - 0.19 x_{i2} + 0.17 x_{i3} + 0.87 x_{i4}$$
Interpretation of the results

Crime-related variables are correlated with each other.
UrbanPop is less correlated with the crime-related variables.
States with a high value in the first component are states with high crime rates.
States with a high value in the second component are states with a high level of urbanization.
Importance of standardization in PCA

[Biplots of the arrest data. Left: variables scaled to unit variance. Right: no scaling, in which case Assault, the variable with the largest variance, dominates the first principal component. Figure 10.3 from "An Introduction to Statistical Learning" by James et al.]
How to choose k?

Proportion of variance explained (PVE).
Total variance:
$$\sum_{j=1}^p \frac{1}{n} \sum_{i=1}^n x_{ij}^2$$
Variance explained by the $m$-th principal component:
$$\frac{1}{n} \sum_{i=1}^n z_{im}^2 = \frac{1}{n} \sum_{i=1}^n (x_i^T u_m)^2 = \frac{1}{n} \sum_{i=1}^n \left( \sum_{j=1}^p x_{ij} u_{mj} \right)^2$$
$$\mathrm{PVE}(m) = \frac{\sum_{i=1}^n \left( \sum_{j=1}^p x_{ij} u_{mj} \right)^2}{\sum_{j=1}^p \sum_{i=1}^n x_{ij}^2}$$
$$\mathrm{PVE}(\text{first } k) = \sum_{m=1}^k \mathrm{PVE}(m)$$

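Since $\frac{1}{n} \sum_i (x_i^T u_m)^2 = u_m^T \Sigma u_m = \lambda_m$ and the total variance equals $\mathrm{trace}(\Sigma) = \sum_j \lambda_j$, PVE can be read off the eigenvalues. A short sketch (assuming X is already normalized):

```python
import numpy as np

def pve(X, k):
    """Proportion of variance explained by the first k principal
    components; X is the (n, p) normalized data matrix."""
    Sigma = X.T @ X / X.shape[0]
    lam = np.sort(np.linalg.eigvalsh(Sigma))[::-1]  # eigenvalues, descending
    return lam[:k].sum() / lam.sum()                 # PVE(first k)
```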
How to choose k?

[Left: scree plot of PVE(m) against the principal component index m. Right: cumulative proportion of variance explained, PVE(first k). Figure 10.4 from "An Introduction to Statistical Learning" by James et al.]
Blind source separation problem

The cocktail party problem: several speakers (sources) talk at the same time, and each microphone records a different mixture of them (observations); the goal is to estimate the original sources.

[Diagram: Sources → Mixing → Observations → Estimation]
Blind source separation problem

Observations:
$$\begin{bmatrix} x_1(1) \\ x_2(1) \\ x_3(1) \end{bmatrix}, \begin{bmatrix} x_1(2) \\ x_2(2) \\ x_3(2) \end{bmatrix}, \ldots, \begin{bmatrix} x_1(T) \\ x_2(T) \\ x_3(T) \end{bmatrix}$$
Source signals (unobserved):
$$\begin{bmatrix} s_1(1) \\ s_2(1) \\ s_3(1) \end{bmatrix}, \begin{bmatrix} s_1(2) \\ s_2(2) \\ s_3(2) \end{bmatrix}, \ldots, \begin{bmatrix} s_1(T) \\ s_2(T) \\ s_3(T) \end{bmatrix}$$
Mixing model (coefficients $a_{ij}$ unknown):
$$x_1(t) = a_{11} s_1(t) + a_{12} s_2(t) + a_{13} s_3(t)$$
$$x_2(t) = a_{21} s_1(t) + a_{22} s_2(t) + a_{23} s_3(t)$$
$$x_3(t) = a_{31} s_1(t) + a_{32} s_2(t) + a_{33} s_3(t)$$
Goal: estimate the source signals $\{s_i(t)\}$.


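A toy simulation of this mixing model (the specific source signals, seed, and dimensions are arbitrary choices of this sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
t = np.arange(T)

# Three non-Gaussian sources: a sinusoid, a square wave, uniform noise
S = np.vstack([np.sin(0.03 * t),
               np.sign(np.sin(0.11 * t)),
               rng.uniform(-1.0, 1.0, T)])   # shape (3, T)

A = rng.normal(size=(3, 3))                  # unknown mixing matrix
X = A @ S                                    # observations x(t) = A s(t)

# If A were known, unmixing would be exact: s(t) = A^{-1} x(t).
# The blind problem is to recover S from X alone (see ICA below).
assert np.allclose(np.linalg.inv(A) @ X, S)
```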
Applications

Image denoising
Medical signal processing
Brain-computer interfaces
Time series analysis
Independent Component Analysis (ICA)

Drop the time index $t$ (consider a single time step). Assume the signals are independent over time.
$p$ sources, $p$ observations:
$$x_1 = a_{11} s_1 + a_{12} s_2 + \ldots + a_{1p} s_p$$
$$\vdots$$
$$x_p = a_{p1} s_1 + a_{p2} s_2 + \ldots + a_{pp} s_p$$
$$\Rightarrow x = A s, \quad A \text{ the unknown mixing matrix}$$
Let $W = A^{-1}$ (the unmixing matrix), so that $s = W x$.
How can we learn $W$ from $x(1), x(2), \ldots, x(T)$ only?

Ambiguities in ICA

To what degree can we recover $W$?
1. (Order ambiguity) The recovered signals $[\hat{s}_1(t), \hat{s}_2(t), \ldots, \hat{s}_p(t)]^T$ will be a permutation of $[s_1(t), s_2(t), \ldots, s_p(t)]^T$.
2. (Scaling ambiguity) The recovered signal $\hat{s}_j(t)$ will be $\alpha s_j(t)$ for some non-zero $\alpha$.
3. (Sign ambiguity) The sign of the recovered signal $\hat{s}_j(t)$ can be the same as or different from that of $s_j(t)$.
These ambiguities do not create a problem in most applications.
Ambiguities in ICA

Sources must be non-Gaussian! If the sources are jointly Gaussian, any orthogonal rotation of them yields another vector of independent Gaussian sources, so the mixing matrix is identifiable only up to a rotation and the sources cannot be recovered.
pdf of a linear transformation of a random vector

$p_S(s)$: pdf of the random vector $S = [S_1, \ldots, S_p]^T$ ($s \in \mathbb{R}^p$)
$X = AS$ ($A$ invertible)
By the change-of-variables formula (the Jacobian of $s \mapsto As$ is $A$), we have
$$p_X(x) = p_S(A^{-1} x) \, |\det(A)|^{-1}$$
$$\Rightarrow p_X(x) = p_S(W x) \, |\det(W)|$$
ICA algorithm

Assume that the sources are independent random variables:
$$p_S(s) = \prod_{j=1}^p p_{S_j}(s_j)$$
Independence holds both across sources and over time!
The densities $p_{S_j}$ are assumed known!
Compute the density of the observation:
$$p_X(x) = \prod_{j=1}^p p_{S_j}(w_j^T x) \, |\det(W)|$$
where
$$W = \begin{bmatrix} w_1^T \\ w_2^T \\ \vdots \\ w_p^T \end{bmatrix}$$
ICA algorithm - estimating W

Likelihood:
$$L(W) = \prod_{t=1}^T p_X(x(t)) = \prod_{t=1}^T \left( \prod_{j=1}^p p_{S_j}(w_j^T x(t)) \right) |\det(W)|$$
Log-likelihood:
$$\ell(W) = \sum_{t=1}^T \left( \sum_{j=1}^p \log p_{S_j}(w_j^T x(t)) + \log |\det(W)| \right)$$
MLE:
$$\hat{W} = \arg\max_W \ell(W)$$
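A minimal sketch of this MLE by batch gradient ascent, assuming the logistic density $p_{S_j}(s) = \sigma'(s)$ (with $\sigma$ the sigmoid) as the known source density, a standard non-Gaussian choice; the learning rate and iteration count are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ica_mle(X, lr=0.1, n_iter=2000):
    """Gradient ascent on l(W)/T (same maximizer as l(W)).
    X: (p, T) observation matrix. Returns an estimate of the
    unmixing matrix W, up to the order/scale/sign ambiguities."""
    p, T = X.shape
    W = np.eye(p)
    for _ in range(n_iter):
        # d/ds log p_S(s) = 1 - 2*sigmoid(s) for the logistic density
        G = (1.0 - 2.0 * sigmoid(W @ X)) @ X.T / T   # average over t
        grad = G + np.linalg.inv(W).T                # + d/dW log|det(W)|
        W += lr * grad                               # ascent step
    return W

# S_hat = ica_mle(X) @ X   # estimated sources s(t) = W x(t)
```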
ICA - example

[Figure 12.20 from "Machine Learning: A Probabilistic Perspective" by Kevin Murphy.]
