
A short tutorial on blind source separation

Fabian J. Theis
Institute of Biophysics, University of Regensburg, Germany (fabian@theis.name)

TUAT, Tokyo, 28-3-2005


Why do we need BSS?

e.g. for the analysis of biomedical signals such as EEG, MEG or fMRI recordings

[Figure: multichannel EEG recording over time (electrodes Fp1, Fp2, F3, F4, C3, C4, P3, P4, O1, O2, F7, F8, T3, T4, T5, T6, Fz, Cz, Pz, A1, A2), together with independent signals extracted from it, such as eye movement, sensor noise and heart beat.]


Outline

Motivation

Basics
  Probability theory
  Information theory

Linear blind source separation
  Principal component analysis
  Independent component analysis
  Maximization of non-Gaussianity
  Second-order BSS using time structure

Conclusions


A one-page primer on probability theory

main object: random variable/vector X
  definition: a measurable function on a probability space, determined a.e. by its density $p_X : \mathbb{R}^n \to [0, \infty)$

properties of a probability density function (pdf)
  normalization: $\int_{\mathbb{R}^n} p_X(x)\,dx = 1$
  transformation: $p_{AX}(y) = |\det A|^{-1}\, p_X(A^{-1} y)$

indices derived from densities (probabilistic quantities)
  expectation or mean: $E(X) = \int_{\mathbb{R}^n} x\, p_X(x)\,dx$
  covariance: $\mathrm{Cov}(X) = E\big((X - E(X))(X - E(X))^\top\big)$

decorrelation and independence
  X is decorrelated if Cov(X) is diagonal, and white if Cov(X) = I
  X is independent if its density factorizes: $p_X(x_1, \ldots, x_n) = p_{X_1}(x_1) \cdots p_{X_n}(x_n)$
  independent $\Rightarrow$ decorrelated (but not vice versa in general)
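The last implication can be illustrated numerically; the following is a minimal NumPy sketch (not from the tutorial) of a pair that is decorrelated yet clearly dependent:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# S1 symmetric uniform, S2 a centered deterministic function of S1:
# E(S1 * S2) = E(S1^3) = 0, so the pair is decorrelated, but it is not independent.
s1 = rng.uniform(-1, 1, n)
s2 = s1**2 - np.mean(s1**2)
S = np.vstack([s1, s2])

print("covariance (nearly diagonal):\n", np.cov(S).round(4))
# the dependence shows up in higher-order statistics:
print("E(S1^2 S2) =", np.mean(s1**2 * s2).round(4),
      " vs  E(S1^2) E(S2) =", (np.mean(s1**2) * np.mean(s2)).round(4))
```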


[Figure: plots of three example densities.]

uniform: $p_X = \frac{1}{\mathrm{vol}(K)} \mathbf{1}_K$
Laplacian: $p_X(x) = c \exp\left(-\sum_{i=1}^n |x_i|\right)$
Gaussian: $p_X(x) = c \exp\left(-\tfrac{1}{2}(x - \mu)^\top C^{-1} (x - \mu)\right)$

properties of a Gaussian X
  X decorrelated $\Leftrightarrow$ X independent
  AX is Gaussian
  X independent $\Rightarrow$ AX independent if A is orthogonal
  normalization constant: $c = \frac{1}{\sqrt{(2\pi)^n \det C}}$
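These Gaussian properties are the reason why ICA will later need non-Gaussianity; a small numerical sketch of the last two points (my own illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
theta = np.deg2rad(30)
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])     # orthogonal mixing

sources = {
    "Gaussian": rng.standard_normal((2, n)),                      # white Gaussian
    "uniform":  rng.uniform(-np.sqrt(3), np.sqrt(3), (2, n)),     # white uniform
}
for name, S in sources.items():
    X = A @ S
    cov_off = np.cov(X)[0, 1]                    # stays ~0 in both cases
    # higher-order cross statistic: E(X1^2 X2^2) - 1 vanishes for independent white pairs
    dep = np.mean(X[0]**2 * X[1]**2) - 1.0
    print(f"{name:8s} off-diag cov = {cov_off:+.3f}   E(X1^2 X2^2) - 1 = {dep:+.3f}")
```

For the Gaussian sources the rotated data remain white and independent (the mixing is invisible); for the uniform sources the rotation introduces measurable higher-order dependence.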


The second page...

higher-order moments
  central moment of a random variable X (n = 1): $\mu_j(X) := E\big((X - E(X))^j\big)$
  the first moment $E(X)$ is the mean, and $\mu_2(X) = \mathrm{Cov}(X) =: \mathrm{var}(X)$ is the variance
  $\mu_3(X)$ is called the skewness; it measures asymmetry ($\mu_3(X) = 0$ if X is symmetric)

kurtosis
  the combination of moments $\mathrm{kurt}(X) := E(X^4) - 3\big(E(X^2)\big)^2$ is called the kurtosis of X
  kurt(X) = 0 if X is Gaussian, < 0 if uniform and > 0 if Laplacian

sampling
  in practice the density is unknown; only some samples, i.e. values of the random function, are given
  given independent $(X_i)_{i=1,\ldots,n}$ with the same density p, then $X_1(\omega), \ldots, X_n(\omega)$ for some event $\omega$ are called i.i.d. samples of p
  strong law of large numbers: given a pairwise i.i.d. sequence $(X_i)_{i \in \mathbb{N}}$ in $L^1(\Omega)$, then (for almost all $\omega$) $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^n X_i(\omega) - E(X_1) = 0$
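The claimed signs of the kurtosis are easy to verify on samples; a minimal sketch (assuming unit-variance parametrizations of the three densities):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

def kurt(x):
    """Sample version of kurt(X) = E(X^4) - 3 (E(X^2))^2 for centered data."""
    x = x - x.mean()
    return np.mean(x**4) - 3 * np.mean(x**2) ** 2

samples = {
    "Gaussian":  rng.standard_normal(n),
    "uniform":   rng.uniform(-np.sqrt(3), np.sqrt(3), n),    # unit variance
    "Laplacian": rng.laplace(scale=1 / np.sqrt(2), size=n),  # unit variance
}
for name, x in samples.items():
    print(f"{name:9s} kurtosis = {kurt(x):+.3f}")
# expected signs: ~0 for the Gaussian, ~-1.2 for the uniform, ~+3 for the Laplacian
```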

Kurtosis example

[Figure: densities of three random variables with different kurtosis.]

random variables with different kurtosis
  blue densities: centered white Gaussian (kurt = 0)
  left: Laplacian with $\lambda = \sqrt{2}$
  middle: uniform density in $[-\sqrt{3}, \sqrt{3}]$, kurtosis $-1.2$
  right: subgaussian random variable $X := \cos(Y)$ with Y uniform in $[-\pi, \pi]$, kurtosis $-3/8$

An even shorter introduction to information theory

entropy
  $H(X) := -E_X(\log_2 p_X)$ is called the (differential) entropy of X
  transformation: $H(AX) = H(X) + E_X(\log|\det A|)$
  given X, let $X_{\mathrm{gauss}}$ be the Gaussian with mean E(X) and covariance Cov(X); then $H(X_{\mathrm{gauss}}) \geq H(X)$

negentropy
  the negentropy of X is defined by $J(X) := H(X_{\mathrm{gauss}}) - H(X)$
  transformation: $J(AX) = J(X)$
  approximation in 1d: $J(X) = \frac{1}{12} E(X^3)^2 + \frac{1}{48} \mathrm{kurt}(X)^2 + \ldots$

information
  $I(X) := \sum_{i=1}^n H(X_i) - H(X)$ is called the mutual information of X
  $I(X) \geq 0$, and $I(X) = 0$ if and only if X is independent
  transformation: $I(LPX + c) = I(X)$ for a scaling L, permutation P and translation $c \in \mathbb{R}^n$
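The 1d negentropy approximation can be evaluated directly from samples. A small sketch (the standardization inside the helper is my addition, since the formula assumes centered, unit-variance data):

```python
import numpy as np

def negentropy_approx(x):
    """Moment-based 1d negentropy approximation
    J(X) ~ E(X^3)^2 / 12 + kurt(X)^2 / 48 on standardized samples."""
    x = (x - x.mean()) / x.std()
    skew_term = np.mean(x**3) ** 2 / 12.0
    kurt = np.mean(x**4) - 3.0
    return skew_term + kurt**2 / 48.0

rng = np.random.default_rng(3)
n = 500_000
print("Gaussian :", round(negentropy_approx(rng.standard_normal(n)), 4))   # ~0
print("uniform  :", round(negentropy_approx(rng.uniform(-1, 1, n)), 4))    # > 0
print("Laplacian:", round(negentropy_approx(rng.laplace(size=n)), 4))      # > 0
```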



Example use of ICA for performing BSS

[Figure: two source signals (source 1, source 2), their two mixtures (mixture 1, mixture 2), and the sources recovered by ICA (output 1, output 2).]


Linear BSS

Blind source separation (BSS) problem: X = AS + N
  X: observed m-dimensional random vector
  A: (unknown) full-rank real matrix
  S: (unknown) n-dimensional source signals, $n \leq m$
  N: (unknown) noise (often assumed to be white Gaussian)

goal: recover the unknown A and S given only X
additional assumptions are necessary
  without them, the problem is ill-posed
  depending on the assumptions: FA, PCA, ICA, SCA, NMF

remark: often N = 0, also in the following
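To make the model concrete, here is a minimal sketch (my own toy setup, not data from the slides) that generates the BSS model X = AS + N for two sources:

```python
import numpy as np

rng = np.random.default_rng(4)
n_samples = 10_000
t = np.arange(n_samples)

# two independent, non-Gaussian toy sources (a sine wave and uniform noise)
S = np.vstack([np.sin(2 * np.pi * t / 200),
               rng.uniform(-np.sqrt(3), np.sqrt(3), n_samples)])

A = np.array([[1.0, 0.6],        # some full-rank mixing matrix (unknown in practice)
              [0.4, 1.0]])
N = 0.01 * rng.standard_normal(S.shape)   # small white Gaussian noise

X = A @ S + N                     # the observed mixtures: X = AS + N
print("X shape:", X.shape)
print("Cov(X):\n", np.cov(X).round(3))
```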


Principal component analysis

principal component analysis (PCA)
  also called the Karhunen-Loève transformation
  a very common multivariate data analysis tool
  transforms the data into a feature space, where a few main features (principal components) make up most of the data
  iteratively projects into the directions of maximal variance
  second-order analysis
  main application: prewhitening and dimension reduction

model and algorithm
  assumption: S is decorrelated, without loss of generality white
  construction:
    eigenvalue decomposition of Cov(X): $D = V^\top \mathrm{Cov}(X)\, V$ with diagonal D and orthogonal V
    the PCA matrix W is constructed as $W := D^{-1/2} V^\top$
  indeterminacy: unique up to a right transformation in the orthogonal group

MATLAB example
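The slide refers to a MATLAB demo at this point; the following NumPy sketch implements the same construction, $W = D^{-1/2} V^\top$ from the eigenvalue decomposition of Cov(X):

```python
import numpy as np

def pca_whitening_matrix(X):
    """PCA / whitening matrix W = D^{-1/2} V^T from the EVD of Cov(X).
    X has shape (dimension, number of samples)."""
    Xc = X - X.mean(axis=1, keepdims=True)       # center the data
    d, V = np.linalg.eigh(np.cov(Xc))            # Cov(X) = V diag(d) V^T
    W = np.diag(d ** -0.5) @ V.T                 # W := D^{-1/2} V^T
    return W, Xc

# demo on some correlated 2D data
rng = np.random.default_rng(5)
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ rng.uniform(-1, 1, (2, 10_000))
W, Xc = pca_whitening_matrix(X)
Z = W @ Xc
print("Cov(Z):\n", np.cov(Z).round(3))           # approximately the identity
```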

Independent component analysis

Blind source separation (BSS) problem: X = AS + N
  X: observed m-dimensional random vector
  A: (unknown) full-rank real matrix
  S: (unknown) n-dimensional source signals, $n \leq m$
  N: (unknown) noise (often assumed to be white Gaussian)

now: the ICA assumption
  S is independent, i.e. I(S) = 0
  indeterminacies: only permutation and scaling (!) if S contains at most one Gaussian component


Additional model assumptions

in linear ICA, additional model assumptions are possible
  the sources can be assumed to be centered, i.e. E(S) = 0 (coordinate transformation X := X - E(X))

white sources
  if $A := (a_1 | \ldots | a_n)$, then the scaling indeterminacy means
  $X = AS = \sum_{i=1}^n a_i S_i = \sum_{i=1}^n \tfrac{1}{\lambda_i}\, a_i (\lambda_i S_i)$
  hence a normalization is possible, e.g. $\mathrm{var}(S_i) = 1$

white mixtures (complete case m = n)
  by assumption Cov(S) = I
  let V be a PCA matrix of X
  then Z := VX is white, and an ICA of Z gives an ICA of X

orthogonal A
  by assumption Cov(S) = Cov(X) = I
  hence $I = \mathrm{Cov}(X) = A\, \mathrm{Cov}(S)\, A^\top = A A^\top$
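A quick numerical check of the last point (a sketch under the stated assumptions of white sources and m = n, not from the slides): after whitening with a PCA matrix V, the remaining mixing VA is orthogonal.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000

S = rng.uniform(-np.sqrt(3), np.sqrt(3), (2, n))   # white (unit-variance) sources
A = np.array([[2.0, 0.5],
              [-0.3, 1.0]])                         # arbitrary full-rank mixing
X = A @ S

d, U = np.linalg.eigh(np.cov(X))                    # PCA / whitening matrix V = D^{-1/2} U^T
V = np.diag(d ** -0.5) @ U.T

B = V @ A                                           # the mixing that remains after whitening
print("B B^T:\n", (B @ B.T).round(3))               # approximately the identity, so B is orthogonal
```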


ICA-Algorithms

basic scheme of ICA algorithms (case m = n)
  search for an invertible $W \in \mathrm{Gl}(n)$ that minimizes some dependence measure of WX
  for example, minimize the mutual information I(WX) [Comon, 1994] or maximize the neural network output entropy H(f(WX)) [Bell and Sejnowski, 1995]
  earliest algorithm: extend PCA by performing nonlinear decorrelation [Hérault and Jutten, 1986]

another dependence measure is explained in more detail in the following


Maximization of non-Gaussianity

basic idea
  given X = AS, construct the ICA matrix W, which ideally equals $A^{-1}$
  at first, recover only one source: search for $b \in \mathbb{R}^n$ with $Y = b^\top X = b^\top A S =: q^\top S$
  ideally b is a row of $A^{-1}$, so $q = e_i$
  central limit theorem: gaussianity(sum of indep. RVs) > gaussianity(indep. RVs), so $Y = q^\top S$ is more Gaussian than all source components $S_i$
  at ICA solutions Y equals some $S_i$ (up to scaling), hence the solutions are least Gaussian

Algorithm (FastICA): find b such that $b^\top X$ is maximally non-Gaussian.


Example: sources

[Figure: kurtosis maximization, scatterplots of the sources and of the white mixtures (rotation by 30 degrees).]


Example: histograms

[Figure: kurtosis maximization, histograms of the projection for rotation angles alpha = 0, 10, ..., 90 degrees; the corresponding absolute kurtosis values are 0.73, 0.93, 1.11, 1.19, 1.12, 0.95, 0.75, 0.62, 0.62, 0.75, with the maximum at alpha = 30 degrees.]


Measuring non-gaussianity using kurtosis

  kurtosis was defined as $\mathrm{kurt}(Y) := E(Y^4) - 3(E(Y^2))^2$
  if Y is Gaussian, then $E(Y^4) = 3(E(Y^2))^2$, so kurt(Y) = 0
  hence kurtosis (or squared kurtosis) gives a simple measure for the deviation from Gaussianity
  assumption of unit variance, $E(Y^2) = 1$: so $\mathrm{kurt}(Y) = E(Y^4) - 3$
  two-dimensional example: $q = A^\top b = (q_1, q_2)^\top$
  then $Y = b^\top X = q^\top S = q_1 S_1 + q_2 S_2$
  linearity of kurtosis: $\mathrm{kurt}(Y) = \mathrm{kurt}(q_1 S_1) + \mathrm{kurt}(q_2 S_2) = q_1^4 \mathrm{kurt}(S_1) + q_2^4 \mathrm{kurt}(S_2)$
  normalization: $E(S_1^2) = E(S_2^2) = E(Y^2) = 1$, so $q_1^2 + q_2^2 = 1$, i.e. q lies on the unit circle
  but the maxima of $S^1 \to \mathbb{R},\ q \mapsto |q_1^4 \mathrm{kurt}(S_1) + q_2^4 \mathrm{kurt}(S_2)|$ are exactly the unit vectors $\pm e_i$!

Algorithm
S is not known; after whitening Z = VX, search for $w \in \mathbb{R}^n$ with $w^\top Z$ maximally non-Gaussian
because of $q = (VA)^\top w$ we get $|q|^2 = q^\top q = (w^\top VA)(A^\top V^\top w) = |w|^2$, so if $q \in S^{n-1}$ then also $w \in S^{n-1}$

Algorithm (kurtosis maximization): maximize $w \mapsto |\mathrm{kurt}(w^\top Z)|$ on $S^{n-1}$ after whitening.
[Figure: absolute kurtosis versus angle; plotted is $|\mathrm{kurt}((\cos\alpha, \sin\alpha)\, Z)|$ for the whitened uniform mixtures Z.]
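The curve in the figure can be reproduced in a few lines; a sketch assuming white uniform sources mixed by a 30-degree rotation, as in the earlier example:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100_000

S = rng.uniform(-np.sqrt(3), np.sqrt(3), (2, n))    # white uniform sources
phi = np.deg2rad(30)
A = np.array([[np.cos(phi), -np.sin(phi)],
              [np.sin(phi),  np.cos(phi)]])
Z = A @ S                                            # already white, since A is orthogonal

def kurt(y):
    return np.mean(y**4) - 3 * np.mean(y**2) ** 2

angles = np.deg2rad(np.arange(0, 181))
abs_kurt = [abs(kurt(np.array([np.cos(a), np.sin(a)]) @ Z)) for a in angles]
best = np.rad2deg(angles[int(np.argmax(abs_kurt))])
print("maximal |kurt| at an angle of about", best, "degrees")   # ~30 (or 120), matching the mixing rotation
```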

Maximization

algorithmic maximization by gradient ascent:
  a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ can be maximized by local updates in the direction of its gradient
  for a sufficiently small learning rate $\eta > 0$ and a starting point $x(0) \in \mathbb{R}^n$, local maxima of f can be found by iterating $x(t+1) = x(t) + \eta\, \Delta x(t)$ with $\Delta x(t) = (Df)^\top(x(t)) = \mathrm{grad}\, f(x(t)) = \nabla f(x(t))$, the gradient of f at x(t)

in our case
  $\mathrm{grad}\, |\mathrm{kurt}(w^\top Z)|(w) = 4\, \mathrm{sgn}(\mathrm{kurt}(w^\top Z)) \left( E\big(Z (w^\top Z)^3\big) - 3 |w|^2 w \right)$

Algorithm (gradient ascent kurtosis maximization): choose $\eta > 0$ and $w(0) \in S^{n-1}$. Then iterate
  $\Delta w(t) := \mathrm{sgn}\big(\mathrm{kurt}(w(t)^\top Z)\big)\, E\big(Z (w(t)^\top Z)^3\big)$
  $v(t+1) := w(t) + \eta\, \Delta w(t)$
  $w(t+1) := v(t+1) / |v(t+1)|$
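A compact NumPy sketch of this iteration (an illustrative one-unit implementation of the stated update rule; the learning rate, iteration count and toy data are my choices, and the deflation to further components is omitted):

```python
import numpy as np

def kurtosis_gradient_ascent(Z, eta=0.1, n_iter=500, seed=0):
    """One-unit separation by gradient ascent on |kurt(w^T Z)|.
    Z: whitened data of shape (n, n_samples). Returns a unit vector w."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ Z
        k = np.mean(y**4) - 3 * np.mean(y**2) ** 2          # kurt(w^T Z)
        dw = np.sign(k) * (Z * y**3).mean(axis=1)           # sgn(kurt) E(Z (w^T Z)^3)
        v = w + eta * dw
        w = v / np.linalg.norm(v)                           # project back onto the unit sphere
    return w

# demo: two white uniform sources mixed by a rotation (so the mixtures are already white)
rng = np.random.default_rng(8)
S = rng.uniform(-np.sqrt(3), np.sqrt(3), (2, 50_000))
phi = np.deg2rad(30)
A = np.array([[np.cos(phi), -np.sin(phi)], [np.sin(phi), np.cos(phi)]])
Z = A @ S
w = kurtosis_gradient_ascent(Z)
print("recovered direction:", w.round(3), "(a column of A up to sign)")
print("|corr| with S1, S2:", abs(np.corrcoef(w @ Z, S[0])[0, 1]).round(3),
      abs(np.corrcoef(w @ Z, S[1])[0, 1]).round(3))
```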

PCA versus ICA

[Figure: comparison of the ICA basis and the PCA basis when applied to a transformed independent uniform density.]


Application: ICA analysis of financial data sets

use closing-day values of stock prices
consider a portfolio of stocks

[Figure: historical data of four stocks from the S&P 500 (MSFT, SUNW, INTC, KO), from 1-29-2004 to 1-28-2005.]

Preprocessing

stock prices are non-stationary
use stock returns $\Delta x(t) = x(t+1) - x(t)$ or logarithmic differences $\Delta x(t) = \log x(t+1) - \log x(t)$

[Figure: stock returns of the four stocks (MSFT, SUNW, INTC, KO).]
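In code, both preprocessing variants are one-liners; a sketch on synthetic prices (not the original data):

```python
import numpy as np

rng = np.random.default_rng(9)
prices = 25 + np.cumsum(0.2 * rng.standard_normal(250))   # synthetic closing prices

returns = np.diff(prices)                 # x(t+1) - x(t)
log_diffs = np.diff(np.log(prices))       # log x(t+1) - log x(t)
print(returns[:5].round(3), log_diffs[:5].round(4))
```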



Independent components

perform ICA
first extract 4 sources (complete case)
recombine: $s(t) = \Delta s(t) + s(t-1)$ with $s(0) = 0$

[Figure: the 4 extracted sources (independent components) using ICA.]


Independent components

extract only 2 sources (dimension reduction by PCA)
recover the original mixtures / stocks from the reduced data set

[Figure: the 2 extracted sources (independent components) using ICA.]


[Figure: the recovered stocks (MSFT, SUNW, INTC, KO), reconstructed from the reduced 2-source data set.]


Prediction

use a linear filter to predict the data
simplest model: estimate the parameters by minimizing the LMSE

[Figure: one-step estimation of MSFT using filter length 50.]



[Figure: one-step estimation of MSFT using filter length 200.]



Second-order BSS using time structure

instead of independence, assume here:
  the data possesses additional time structure S(t)
  the sources have diagonal autocovariances $R_S(\tau) := E\big((S(t+\tau) - E(S(t)))(S(t) - E(S(t)))^\top\big)$ for all $\tau$

goal: find A (then estimate S(t) e.g. using regression)
as before, centering and prewhitening (by PCA) allow the assumptions
  zero-mean X(t) and S(t)
  equal source and sensor dimension (m = n)
  orthogonal A

but hard-prewhitening gives a bias...


AMUSE

bilinearity of the autocovariance:
  $R_X(\tau) = E\big(X(t+\tau)X(t)^\top\big) = \begin{cases} A R_S(0) A^\top + \sigma^2 I & \tau = 0 \\ A R_S(\tau) A^\top & \tau \neq 0 \end{cases}$

so the symmetrized autocovariance $\bar{R}_X(\tau) := \frac{1}{2}\big(R_X(\tau) + R_X(\tau)^\top\big)$ fulfills (for $\tau \neq 0$) $\bar{R}_X(\tau) = A\, \bar{R}_S(\tau)\, A^\top$

identifiability:
  A can only be found up to permutation and scaling
  if there exists a $\bar{R}_S(\tau)$ with pairwise different eigenvalues: no more indeterminacies

AMUSE (algorithm for multiple unknown signals extraction)
  [Tong et al., 1991]
  recover A by eigenvalue decomposition of $\bar{R}_X(\tau)$
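A minimal NumPy sketch along these lines (my illustrative implementation, not the reference code): whiten, estimate the symmetrized lag-tau autocovariance, and take its eigenvectors.

```python
import numpy as np

def amuse(X, tau=1):
    """AMUSE-style separation: returns an unmixing matrix W with S_hat = W @ X.
    X: observed signals of shape (n, T)."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, U = np.linalg.eigh(np.cov(Xc))                 # 1) whiten
    V = np.diag(d ** -0.5) @ U.T
    Z = V @ Xc
    C = Z[:, tau:] @ Z[:, :-tau].T / (Z.shape[1] - tau)
    C = (C + C.T) / 2                                 # 2) symmetrized lag-tau autocovariance
    _, E = np.linalg.eigh(C)                          # 3) its eigenvectors give the rotation
    return E.T @ V

# demo: two sources with different time structure, linearly mixed
T = 20_000
t = np.arange(T)
rng = np.random.default_rng(10)
S = np.vstack([np.sin(2 * np.pi * t / 20),
               np.sign(np.sin(2 * np.pi * t / 333))])          # fast sine and slow square wave
X = np.array([[1.0, 0.7], [0.5, 1.0]]) @ S + 0.01 * rng.standard_normal((2, T))
S_hat = amuse(X) @ X
print("|corr(S_hat, S)|:\n", np.abs(np.corrcoef(np.vstack([S_hat, S]))[:2, 2:]).round(3))
```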


SOBI

problems of AMUSE
  choice of $\tau$
  susceptible to noise or bad estimates of $\bar{R}_X(\tau)$

SOBI (second-order blind identification)
  [Belouchrani et al., 1997] (similar: TDSEP)
  identify A by joint diagonalization of a whole set $\{\bar{R}_X(\tau^{(1)}), \ldots, \bar{R}_X(\tau^{(K)})\}$ of autocovariance matrices
  joint diagonalization e.g. by the Jacobi algorithm (iterative Givens rotations in two coordinates)
  more robust against noise and against the choice of $\tau$
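For intuition, here is a deliberately simplified two-dimensional sketch of the joint-diagonalization step (a grid search over the rotation angle rather than the Jacobi algorithm used in practice; all names and parameters are illustrative):

```python
import numpy as np

def sym_autocov(Z, tau):
    C = Z[:, tau:] @ Z[:, :-tau].T / (Z.shape[1] - tau)
    return (C + C.T) / 2

def sobi_2d(X, lags=(1, 2, 3, 5, 8, 13)):
    """Toy 2D SOBI: whiten, then grid-search the rotation angle that jointly
    diagonalizes the symmetrized autocovariances at the given lags."""
    Xc = X - X.mean(axis=1, keepdims=True)
    d, U = np.linalg.eigh(np.cov(Xc))
    V = np.diag(d ** -0.5) @ U.T
    Z = V @ Xc
    Cs = [sym_autocov(Z, tau) for tau in lags]

    def offdiag_cost(theta):
        R = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
        return sum((R.T @ C @ R)[0, 1] ** 2 for C in Cs)

    thetas = np.linspace(0, np.pi / 2, 2000)           # rotations suffice in 2D
    best = thetas[np.argmin([offdiag_cost(th) for th in thetas])]
    R = np.array([[np.cos(best), -np.sin(best)],
                  [np.sin(best),  np.cos(best)]])
    return R.T @ V                                     # unmixing matrix: S_hat = (R^T V) X

t = np.arange(20_000)
S = np.vstack([np.sin(2 * np.pi * t / 20), np.sign(np.sin(2 * np.pi * t / 333))])
X = np.array([[1.0, 0.7], [0.5, 1.0]]) @ S
print(np.abs(np.corrcoef(np.vstack([sobi_2d(X) @ X, S]))[:2, 2:]).round(3))
```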


Data sets with additional structure

goal
  improve the SOBI performance for random processes with a higher-dimensional parametrization

additional structure
  the random processes S and X depend on multiple variables $(z_1, \ldots, z_M)$
  examples: images $(z_1, z_2)$, 3D data sets such as fMRI scans $(z_1, z_2, z_3)$
  use this additional information!


From nD to 1D

usual reduction (example: images)
  transform the vector of images $S(z_1, z_2)$ to S(t)
  fix a mapping from the 2D parameter set to the 1D time parametrization (e.g. row concatenation)

result
  no problem in ICA, because i.i.d. samples are assumed
  but time-structure based algorithms effectively lose information


Multidimensional autocovariance

M-dimensional autocovariance (for centered processes):
  $R_S(\tau_1, \ldots, \tau_M) := E\big(S(z_1 + \tau_1, \ldots, z_M + \tau_M)\, S(z_1, \ldots, z_M)^\top\big)$
estimation from samples as usual; it now depends on the M shifts $\tau_i$

advantage
  the random processes S and X depend on multiple variables $(z_1, \ldots, z_M)$
  examples: images $(z_1, z_2)$, 3D data sets such as fMRI scans $(z_1, z_2, z_3)$
  use this additional information!

[Figure: 1D autocovariance (1dcov) versus 2D autocovariance (2dcov) as a function of |tau|.]
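A sketch of the sample estimator for the two-dimensional case (a single image, with circular shifts assumed for simplicity):

```python
import numpy as np

def autocov_2d(img, tau1, tau2):
    """Sample 2D autocovariance of a single centered image at shift (tau1, tau2),
    using circular shifts for simplicity."""
    img = img - img.mean()
    shifted = np.roll(np.roll(img, -tau1, axis=0), -tau2, axis=1)
    return np.mean(img * shifted)

rng = np.random.default_rng(11)
y, x = np.mgrid[0:128, 0:128]
img = np.sin(2 * np.pi * x / 32) * np.cos(2 * np.pi * y / 32) \
      + 0.1 * rng.standard_normal((128, 128))     # smooth toy "image" plus noise

print("R(0,0)  =", round(autocov_2d(img, 0, 0), 3))
print("R(1,1)  =", round(autocov_2d(img, 1, 1), 3))    # close distance: still large
print("R(16,0) =", round(autocov_2d(img, 16, 0), 3))   # half a period in x: negative
```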


Multidimensional second-order BSS

replace the 1D autocovariances by M-dimensional ones in AMUSE and SOBI

mdAMUSE (multidimensional AMUSE)
  prewhiten $X(z_1, \ldots, z_M)$
  the EVD of the symmetrized multidimensional autocovariance $\bar{R}_X(\tau_1, \ldots, \tau_M)$ detects A
  how to choose the lags $\tau_i$?

mdSOBI (multidimensional SOBI)
  joint diagonalization of a set of symmetrized multidimensional autocovariances
    $\bar{R}_X\big(\tau_1^{(1)}, \ldots, \tau_M^{(1)}\big), \ldots, \bar{R}_X\big(\tau_1^{(K)}, \ldots, \tau_M^{(K)}\big)$
  the joint diagonalizer equals A except for permutation and signs
  in practice: choose $(\tau_1^{(k)}, \ldots, \tau_M^{(k)})$ with increasing modulus for increasing k

SOBI versus mdSOBI


key observation: data sets often do not have long-distance autocorrelations, but they do have quite strong multi-dimensional close-distance correlations

advantages of mdSOBI
exploits the close-distance structure
SOBI weighs each matrix equally, which leads to a strong deterioration of performance for large K
mdSOBI instead uses the stronger close-distance correlations (one possible lag schedule is sketched below)
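One way to realize such close-distance lag sets is to enumerate lag tuples ordered by their modulus; the helper below is a hypothetical illustration of this choice, not the lag schedule actually used in the experiments.

    import numpy as np

    def lags_by_modulus(M, K, max_shift=20):
        # enumerate M-dimensional non-negative lag tuples (excluding the origin),
        # sort them by Euclidean modulus and return the first K
        grids = np.meshgrid(*([np.arange(max_shift + 1)] * M), indexing="ij")
        lags = np.stack([g.ravel() for g in grids], axis=1)
        lags = lags[np.any(lags > 0, axis=1)]            # drop (0, ..., 0)
        order = np.argsort(np.linalg.norm(lags, axis=1), kind="stable")
        return [tuple(int(t) for t in lag) for lag in lags[order][:K]]

    # lags_by_modulus(2, 6) -> [(0, 1), (1, 0), (1, 1), (0, 2), (2, 0), (1, 2)]

With K = 128, SOBI's 1D lags reach up to 128 samples away, whereas 2D lags ordered in this way stay within a modulus of roughly the square root of K, i.e. close to the current grid point.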

Results: Artificial mixtures

artificial example to compare the algorithms
linear mixture of n = 3 images with a randomly chosen 3 × 3 matrix A
noise level increased from 0% to 50%
SOBI and mdSOBI compared for K = 32 and K = 128 autocovariance matrices (a setup sketch follows below)

[figure: the three source images]
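A minimal sketch of how such an artificial benchmark could be set up; the noise scaling relative to the mixtures' standard deviation is my own choice, and the performance index used for the comparison in the slides is not reproduced here.

    import numpy as np

    def make_mixture(sources, noise_level, seed=0):
        # sources: (3, H, W) array of source images; noise_level between 0.0 and 0.5
        rng = np.random.default_rng(seed)
        n = sources.shape[0]
        A = rng.standard_normal((n, n))                   # random 3 x 3 mixing matrix
        X = np.tensordot(A, sources, axes=1)              # linear mixtures A S
        X = X + noise_level * X.std() * rng.standard_normal(X.shape)
        return X, A

The estimated separating matrices of SOBI (run on the row-concatenated data) and of the multidimensional variants (run on the 2D data) can then be compared against A.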

Results: Artificial mixtures

same performance in the low-noise case
mdSOBI outperforms SOBI with increasing noise
reason: natural images do not have any substantial long-distance autocorrelations

[figure: SOBI and mdSOBI performance as a function of the noise level]

Results: fMRI analysis


functional magnetic resonance imaging (fMRI)
noninvasive brain imaging technique
provides information on brain activation patterns
activation maps help to identify task-related brain regions
BSS techniques are applicable to fMRI, see [McKeown et al., 1998]

data set
block design protocol with 5 periods of visual stimulation and 5 periods of rest
100 scans of 3 s duration each
acquired by D. Auer, MPI of Psychiatry, Munich, Germany

preprocessing
motion correction
dimension reduction to the first 8 principal components (a sketch follows below)
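For the dimension-reduction step, a minimal SVD-based sketch; the data layout (voxels × scans) and the centering convention are assumptions of this illustration, not necessarily the preprocessing actually used.

    import numpy as np

    def pca_reduce(X, n_components=8):
        # X: (voxels, scans) fMRI data matrix after motion correction
        Xc = X - X.mean(axis=1, keepdims=True)          # center each voxel time course
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        return U[:, :n_components].T @ Xc               # (n_components, scans) scores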

Results: fMRI results


[figure: component maps and time courses of the eight recovered components; crosscorrelations cc = 0.08, 0.19, 0.11, 0.21, 0.43, 0.21, 0.16, 0.86]

mdSOBI was performed with K = 32
component 8 is the desired stimulus component, active in the visual cortex (cc = 0.86)
poorer SOBI performance: two stimulus components with cc = 0.81 and 0.84 (a sketch of the cc computation follows below)
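Assuming cc denotes the crosscorrelation between a component's time course and the task reference, it could be computed along the following lines; the boxcar reference (10 blocks of 10 scans, matching the block design above) is an illustrative reconstruction.

    import numpy as np

    def stimulus_cc(time_courses, block_len=10, n_blocks=10):
        # time_courses: (n_components, n_scans) recovered component time courses
        # boxcar reference: alternating rest / stimulation blocks
        ref = np.tile(np.r_[np.zeros(block_len), np.ones(block_len)], n_blocks // 2)
        ref = ref[: time_courses.shape[1]]
        return np.array([abs(np.corrcoef(tc, ref)[0, 1]) for tc in time_courses])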

Conclusions

summary
1D autocovariances do not sufficiently describe second-order image statistics
the proposed extension mdSOBI of SOBI uses multi-dimensional autocovariances
performance increase in both simulations and real-world applications

future work
also take higher-order statistics into account
applications to fMRI analysis, also with 3D autocovariances

Conclusions
Summary
data analysis: identifiability and algorithms
independence characterized by diagonality of the Hessian of the logarithmic density / characteristic function
linear ICA by Hessian diagonalization: HessianICA
postnonlinear BSS by nonlinearity detection as a preprocessing step
application to fMRI data sets

Questions:
Solve the Hessian diagonalization differential equation in the postnonlinear model. Other models?
Further applications in biological and medical data analysis?

Details and papers on my website: http://fabian.theis.name
Support by the DFG (graduate college "Nonlinearity and Nonequilibrium in Condensed Matter") and the BMBF (project ModKog) is gratefully acknowledged.

References
A. Bell and T. Sejnowski. An information-maximisation approach to blind separation and blind deconvolution. Neural Computation, 7:1129–1159, 1995.

A. Belouchrani, K. A. Meraim, J.-F. Cardoso, and E. Moulines. A blind source separation technique based on second order statistics. IEEE Transactions on Signal Processing, 45(2):434–444, 1997.

P. Comon. Independent component analysis - a new concept? Signal Processing, 36:287–314, 1994.

J. Hérault and C. Jutten. Space or time adaptive signal processing by neural network models. In J. Denker, editor, Neural Networks for Computing. Proceedings of the AIP Conference, pages 206–211, New York, 1986. American Institute of Physics.

M. McKeown, T. Jung, S. Makeig, G. Brown, S. Kindermann, A. Bell, and T. Sejnowski. Analysis of fMRI data by blind separation into independent spatial components. Human Brain Mapping, 6:160–188, 1998.

L. Tong, R.-W. Liu, V. Soon, and Y.-F. Huang. Indeterminacy and identifiability of blind identification. IEEE Transactions on Circuits and Systems, 38:499–509, 1991.


