
Proceedings of the SYSID'97, 11th IFAC symposium on system identification. pp. 1837-184

STATISTICAL PRINCIPLES OF SOURCE SEPARATION


Jean-François Cardoso
ENST/CNRS, 46 rue Barrault, 75634 Paris, France.

http://sig.enst.fr/~cardoso/stuff.html

Abstract: Blind signal separation (BSS) is an emerging signal processing technique aiming at recovering unobserved signals or 'sources' from observed mixtures (typically, the output of an array of sensors), exploiting only the assumption of mutual independence between the signals. The weakness of the assumptions makes it a powerful approach, but it requires venturing beyond familiar second order statistics. The objective of this paper is to review some of the approaches that have been recently developed to address this exciting problem, to show how they stem from basic principles and how they relate to each other.

Keywords: Blind source separation, independent component analysis, contrast functions.

1. INTRODUCTION

Blind signal separation (BSS) consists in recovering unobserved signals or 'sources' from several observed mixtures. Typically, the observations are obtained at the output of a set of sensors, each sensor receiving a different combination of the 'source signals'. The adjective 'blind' stresses the fact that i) the source signals are not observed and ii) no information is available about the mixture. This is a sound approach when modeling the transfer from the sources to the sensors is too difficult; it is unavoidable when no a priori information is available about the transfer. The lack of a priori knowledge about the mixture is compensated by a statistically strong but often physically plausible assumption of independence between the source signals. The so-called 'blindness' should not be understood negatively: the weakness of the prior information precisely is the strength of the BSS model, making it a versatile tool for exploiting the 'spatial diversity' provided by an array of sensors.

The simplest BSS model assumes the existence of n independent signals s_1(t), ..., s_n(t) and the observation of as many mixtures x_1(t), ..., x_n(t), these mixtures being linear and instantaneous. This is compactly represented by the mixing equation

    x(t) = A s(t)                                  (1)

where s(t) = [s_1(t), ..., s_n(t)]^T is an n×1 column vector collecting the source signals, vector x(t) similarly collects the n observed signals and the square n×n 'mixing matrix' A contains the mixture coefficients.

The BSS problem consists in recovering the source vector s(t) using only the observed data x(t), the assumption of independence between the entries of the input vector s(t) and possibly some a priori information about the probability distribution of the inputs. It can be formulated as the computation of an n×n 'separating matrix' B whose output y(t)

    y(t) = B x(t)                                  (2)

is an estimate of the vector s(t) of the source signals.

The basic BSS model can be extended in several directions. Assuming for instance more sensors than sources, complex signals and mixtures and noisy observations, one obtains the standard narrow band array processing/beamforming model.
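As a concrete illustration of eqs. (1)-(2), the short NumPy sketch below simulates this model; the uniform source distribution and the particular matrix A are arbitrary choices made for the example only. With A known, B = A⁻¹ yields an exact copy of the sources; the whole difficulty of BSS is of course that A is not known.

import numpy as np

rng = np.random.default_rng(0)
T = 10_000
# two independent zero-mean, unit-variance sources (uniform, non-Gaussian)
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, T))
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])       # some invertible mixing matrix
x = A @ s                        # observed mixtures, eq. (1)

B = np.linalg.inv(A)             # oracle separating matrix (unknown to a
y = B @ x                        # blind algorithm), output given by eq. (2)
print(np.allclose(y, s))         # True: y is a copy of the sources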

Another extension is to consider convolutive mixtures: this results in a multichannel blind deconvolution problem. All these extensions are of practical importance, but this paper is restricted to the simplest model (real signals, as many sensors as sources, non convolutive mixtures, noise free observations) because it captures the essence of the BSS problem.

Possible approaches to source separation are to adjust a separating matrix in order to restore a 'distributional property' of the output y: maximal peakedness or minimum entropy; to restore the mutual independence of the outputs; or possibly the joint distribution of all the sources. It is an objective of this paper to show how these heuristic ideas define 'contrast functions', how they can be derived from first principles and how they relate to each other.

2. MODELING AND IDENTIFIABILITY

2.1 Statistical modeling

Source separation exploits primarily the 'spatial diversity', that is the fact that several sensors carry different mixtures of the sources. It often ignores any time structure. In this perspective, the BSS problem reduces to the identification of the probability distribution of a vector x = As given a sample distribution. Thus, the statistical model has two components: the mixing matrix A and the probability distribution of the source vector s.

Mixture. The mixing matrix A is the parameter of interest. Its columns are assumed to be linearly independent. There is something special about having an invertible matrix as the unknown parameter, because the set of all n×n invertible matrices forms a multiplicative group. This simple fact has a profound impact on source separation, because it allows one to define algorithms with uniform performance, i.e. whose behavior is completely independent of the particular mixture.

Source distribution. The probability distribution of each source is a 'nuisance parameter': we are not primarily interested in it, even though knowing or estimating these distributions is necessary to efficiently estimate the parameter of interest. Even if we say nothing about the distribution of each source, we say a lot about their joint distribution by assuming mutual source independence. This is the key assumption. Source separation techniques differ widely by the assumptions explicitly or implicitly made on the individual distributions of the sources. There is a whole range of possible assumptions about the source distributions: they are known in advance; some of their features (moments, ...) are known; they belong to a parametric family; no distribution model is available, ... To the strongest assumptions corresponds a priori the narrowest applicability. However, well designed approaches are in fact surprisingly robust even to gross errors in modeling the source distributions, as shown below. For simplicity, zero-mean sources are assumed throughout: Es = 0.

2.2 Blind identifiability

There are three levels of indetermination in the BSS problem. One level is due to the data themselves: if, for instance, two sources have the same distribution, they can be recovered only up to a permutation. The second level may be due to lack of prior information. If, for instance, the distribution of a source is known only up to a scale factor, this scale cannot be identified because any rescaling of a source could equally well be represented by rescaling the corresponding column of A. More generally, if the source distributions are unknown, the best that can be done is to recover a copy of the signals, that is the source signals up to scales, signs and permutations; see e.g. Tong et al. (1991a).

Achieving signal copy is sufficient as far as signal separation is concerned: the first two levels of indetermination can be considered harmless. The real issue is to avoid any wider indetermination: Gaussian signals can be recovered only up to a rotation, which is unacceptable, of course. A key result is given (for non-deterministic signals) by Comon (1994) after a theorem of Darmois. In essence: if there is at most one Gaussian source, signal copy is equivalent to restoring statistical independence between the signals. Therefore BSS may be implemented by finding a separating matrix such that y = Bx has independent entries.

Using second order information. Even though second order information (decorrelation) is not sufficient to guarantee signal separation, it can be put to use. Assume that the source signals have unit variance so that Ess^T = I; vector s is said to be spatially white. Let W denote a 'whitening matrix' for x, that is, z ≜ Wx is spatially white. The composite transform WA necessarily is a rotation matrix because it relates two spatially white vectors s and z = WAs. Therefore, 'whitening' or 'sphering' the data reduces the mixture to a rotation matrix, leaving n(n-1)/2 unknown (rotation) parameters to be determined by other than second order information among the n² unknown parameters in A.

The prewhitening approach is sensible from an algorithmic point of view but it is not statistically efficient when the source distributions deviate significantly from normality (see sec. 5.3). Actually, enforcing the whiteness constraint amounts to believing that second order statistics are infinitely more reliable than any other kind of statistics. This is, of course, untrue.
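The whitening step itself is elementary. The following sketch (same arbitrary mixture as in the earlier example; the eigendecomposition is one standard way among others to build W) estimates W from the sample covariance and checks that WA is close to a rotation matrix:

import numpy as np

rng = np.random.default_rng(1)
T = 200_000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, T))  # unit-variance sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s

# whitening matrix from the sample covariance of x: W = D^{-1/2} E^T
d, E = np.linalg.eigh(np.cov(x))
W = np.diag(d ** -0.5) @ E.T
z = W @ x                        # spatially white: cov(z) ~ I

U = W @ A                        # the composite transform of the text
print(np.round(U @ U.T, 2))      # ~ I: WA is (up to sample error) a rotation,
                                 # which second order statistics cannot resolve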

3. CONTRAST FUNCTIONS

Minimum contrast estimation is a general technique of statistical inference (Pfanzagl, 1973). It is relevant for blind deconvolution (see the inspiring paper of Donoho (1981)) and has been introduced in the related BSS problem in (Comon, 1994) (even though in a somewhat restricted setting). In both instances, a contrast function is a real function of the probability distribution. To deal with such functions, a special notation will be useful: for x a given random variable, f(x) generically denotes a function of x while f[x] denotes a function of the distribution of x. For instance, the mean of x is the function m[x] ≜ Ex.

Contrast functions for source separation are generically denoted φ[y] and, by definition, verify φ[Cs] ≥ φ[s] with equality only if y = Cs is a copy of the source signals. In other words: mixing necessarily increases a contrast function and separation is achieved by minimizing a contrast function. Since the mixture can be reduced to a rotation matrix by enforcing the whiteness constraint Eyy^T = I (sect. 2.2), one can also consider 'orthogonal contrast functions': these are denoted φ°[y] and must be minimized under the whiteness constraint Eyy^T = I.

3.1 Information theoretic contrasts

The maximum likelihood (ML) principle leads to several contrasts which are expressed via the Kullback divergence. The Kullback divergence between two probability density functions f(s) and g(s) on R^n is defined as

    K(f|g) ≜ ∫ f(s) log( f(s) / g(s) ) ds          (3)

whenever the integral exists. The divergence between the distributions of two random vectors w and z is concisely denoted K[w|z].
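As a numerical illustration of definition (3), the divergence between two zero-mean Gaussian densities can be evaluated by discretizing the integral and compared to the well-known closed form:

import numpy as np

# eq. (3) for two zero-mean Gaussian densities f and g, by discretizing
# the integral; the closed-form Gaussian KL divergence serves as a check
s1, s2 = 1.0, 2.0                         # standard deviations of f and g
u = np.linspace(-12.0, 12.0, 100_001)
du = u[1] - u[0]

def gauss(u, sig):
    return np.exp(-0.5 * (u / sig) ** 2) / (sig * np.sqrt(2.0 * np.pi))

f, g = gauss(u, s1), gauss(u, s2)
K_num = np.sum(f * np.log(f / g)) * du
K_ref = np.log(s2 / s1) + s1 ** 2 / (2 * s2 ** 2) - 0.5
print(K_num, K_ref)                       # both ~ 0.3181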
If q(·) is an hypothesized pdf for the source vector s, then the pdf of x = As is

    p(x; A) = |det A|⁻¹ q(A⁻¹x)                    (4)

If T samples X_{1:T} ≜ [x(1), ..., x(T)] of x are modeled as independent, then p(X_{1:T}) = p(x(1)) ··· p(x(T)). Thus, the probability p(X_{1:T}) of the samples is readily computed as a function of the unknown parameter, namely matrix A. Denoting s a random vector with distribution q, simple calculus shows that

    -(1/T) log p(X_{1:T}; A) → K[A⁻¹x|s] + cst   as T → ∞       (5)

This shows that the maximum likelihood principle is associated with a contrast function

    φ_ML[y] = K[y|s]                               (6)

and the normalized log-likelihood can be seen, via (5), as an estimate of K[y|s]. The ML principle thus says something very simple when applied to the BSS problem: 'find matrix A such that the distribution of A⁻¹x is as close as possible to the hypothesized distribution of the sources'. The contrast function (6) is also arrived at via the infomax principle (see (Bell and Sejnowski, 1995; Nadal and Parga, 1994), and references therein).

The simple likelihood approach described above is based on a fixed hypothesis about the distribution of the sources. A more powerful approach is to model the data by adjusting both the unknown system and the distributions of the sources. In other words, one should minimize the divergence K[y|s] with respect to A (via the distribution of y = A⁻¹x) and with respect to the model distribution of s. This minimization problem has a simple and intuitive theoretical solution. Denote ỹ a random vector with i) independent entries and ii) each entry distributed as the corresponding entry of y. A classic property (see e.g. (Cover and Thomas, 1991)) of ỹ is that

    K[y|s] = K[y|ỹ] + K[ỹ|s]                       (7)

for any vector s with independent entries. Eq. (7) shows that K[y|s] is minimized in s by simply taking s = ỹ. Hence, the contrast function associated to the 'reduced likelihood':

    φ_MI[y] ≜ K[y|ỹ] = min_s K[y|s].               (8)

This is traditionally known as the mutual information (between the entries of y). It verifies φ_MI[y] ≥ 0 with equality if and only if y is distributed as ỹ. By definition of ỹ, this happens when the entries of y are independent. In other words, φ_MI[y] is a measure of independence between the entries of y.

Orthogonal contrasts. If the mixing matrix has been reduced to a rotation matrix by whitening, contrast functions like φ_ML or φ_MI can still be used, but the latter takes an interesting alternative form under the whiteness constraint Eyy^T = I:

    φ°_MI[y] = Σ_i H[y_i] + cst                    (9)

where H[y_i] denotes the (differential) entropy of y_i. Thus, minimizing the mutual information between the entries of y is equivalent to minimizing the sum of the entropies of the entries of y.
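Eq. (9) is easy to probe numerically: starting from already separated, spatially white uniform sources, rotating them away from the identity should only increase the sum of marginal entropies. The histogram entropy estimator in this sketch is a crude, arbitrary choice, made only for illustration:

import numpy as np

rng = np.random.default_rng(2)
T = 100_000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, T))  # white uniform sources

def rot(t):
    c, si = np.cos(t), np.sin(t)
    return np.array([[c, -si], [si, c]])

def entropy(u, bins=100):
    # crude histogram estimate of the differential entropy H[u]
    p, edges = np.histogram(u, bins=bins, density=True)
    w = np.diff(edges)
    m = p > 0
    return -np.sum(p[m] * np.log(p[m]) * w[m])

angles = np.linspace(0.0, np.pi / 2, 91)
crit = [sum(entropy(yi) for yi in rot(t) @ s) for t in angles]
print(angles[np.argmin(crit)])   # ~ 0 or ~ pi/2: rotations that leave
                                 # the sources separated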

There is a simple interpretation: mixing the entries of s 'tends' to increase their entropies; it seems natural to find separated source signals as those with minimum marginal entropies.

3.2 High order approximations

High order statistics can be used to define contrast functions which are simple approximations to those derived from the ML approach. The cumulants of the elements of a given vector y are denoted C_ij[y] ≜ Cum[y_i, y_j] and C_ijkl[y] ≜ Cum[y_i, y_j, y_k, y_l]. Since the source vector s has independent entries, all its cross-cumulants vanish:

    C_ij[s] = σ_i² δ_ij        C_ijkl[s] = k_i δ_ijkl           (10)

where δ is the Kronecker symbol and we have defined the variance σ_i² and the kurtosis k_i of the i-th source as the second and fourth order 'auto-cumulants' of s_i: σ_i² ≜ C_ii[s] = Es_i² and k_i ≜ C_iiii[s] = Es_i⁴ - 3E²s_i². The likelihood contrast φ_ML[y] = K[y|s] measures the mismatch between the output distribution and a model source distribution. Cruder measures can be defined from the quadratic mismatch between the cumulants:

    φ₂[y] ≜ Σ_ij (C_ij[y] - C_ij[s])²
    φ₄[y] ≜ Σ_ijkl (C_ijkl[y] - C_ijkl[s])²

Clearly φ₂ is not a contrast because φ₂[y] = 0 expresses only the decorrelation between the entries of y. On the contrary, one can show that φ₄[y] is a contrast if all the sources have known non zero kurtosis. Even though fourth order information is sufficient by itself to solve the BSS problem, it is interesting to use φ₂ and φ₄ in conjunction because they jointly provide an approximation to the likelihood contrast: if the sources are 'close to normal' and y is close to s, then

    K[y|s] ≈ φ₂₄[y] ≜ (1/48) (12 φ₂[y] + φ₄[y]).                (11)

Room is lacking to discuss the validity of this approximation. The point however is not to determine how closely φ₂₄[y] approximates K[y|s] but rather to follow the suggestion that second and fourth order information could be used jointly.

Orthogonal contrasts. We consider cumulant-based orthogonal contrasts. The orthogonal approach, which forces decorrelation, i.e. φ₂[y] = 0, corresponds to replacing the factor 12 in eq. (11) by an infinite weight (optimal weighting is considered in (Cardoso et al., 1996); see also sec. 4.4) or equivalently to minimizing φ₄[y] under the whiteness constraint φ₂[y] = 0. Simple algebra shows that if φ₂[y] = 0, then φ₄[y] is equal (up to a constant additive term) to

    φ₄°[y] ≜ -2 Σ_{i=1..n} k_i C_iiii[y] = E f₄(y)              (12)

where we have defined f₄(y) ≜ -2 Σ_{i=1..n} k_i (y_i⁴ - 3). This is a pleasant finding: this contrast function being the expectation of a function of y, it is particularly simple to estimate by a sample average.

Comon (1994) obtains another orthogonal contrast by approximating φ_MI:

    φ°_ICA[y] = Σ_{ijkl≠iiii} C_ijkl²[y]                        (13)

Gaeta and Lacoume (1990) arrive at the same contrast by approximating the likelihood by a Gram-Charlier expansion. This contrast actually is similar to φ_MI in the sense that it involves only terms measuring the (4th order) independence between the entries of y.

Independence can also be tested on a smaller subset of cross-cumulants with:

    φ°_JADE[y] ≜ Σ_{ijkl≠ijkk} C_ijkl²[y].                      (14)

The motivation for using this specific subset is that φ°_JADE also is a joint diagonalization criterion, meaning that it can be optimized by an efficient algorithm, similar to the Jacobi technique of diagonalization (Cardoso and Souloumiac, 1993).

Simpler contrasts can be used if the kurtoses of the sources are known. For instance, eq. (12) suggests, for negative kurtosis, a very simple contrast:

    φ°_m[y] = Σ_{i=1..n} Ey_i⁴                                  (15)

This is actually a valid contrast function if k_i + k_j < 0 for all pairs of sources.
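All the cumulants involved in (13)-(15) are estimated by simple sample averages. The sketch below implements the fourth-order cumulant of zero-mean signals and evaluates the sum of squared cross-cumulants (the quantity that contrast (13) drives to zero) on separated and on rotated uniform sources; the sample size and rotation angle are arbitrary:

import numpy as np
from itertools import product

rng = np.random.default_rng(3)
T = 100_000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, T))  # independent sources

def cum4(a, b, c, d):
    # sample fourth-order cumulant of zero-mean signals:
    # Cum[a,b,c,d] = E[abcd] - E[ab]E[cd] - E[ac]E[bd] - E[ad]E[bc]
    m = lambda u, v: np.mean(u * v)
    return (np.mean(a * b * c * d)
            - m(a, b) * m(c, d) - m(a, c) * m(b, d) - m(a, d) * m(b, c))

def sum_sq_cross_cum(y):
    # sum of squared cross-cumulants, i.e. all C_ijkl with ijkl != iiii
    n = y.shape[0]
    return sum(cum4(y[i], y[j], y[k], y[l]) ** 2
               for i, j, k, l in product(range(n), repeat=4)
               if not i == j == k == l)

theta = 0.5
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
print(sum_sq_cross_cum(s), sum_sq_cross_cum(Q @ s))   # ~ 0 versus clearly > 0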

4. ESTIMATING FUNCTIONS

In sect. 3, separating matrices were characterized as the minimizers of contrast functions. In this section, we derive the finite sample version of this principle and the corresponding algorithms.

4.1 A specific gradient for transformation models

An appropriate notion of gradient is needed to describe the first order variation of a contrast function φ[y] under linear transforms of y. By definition, an infinitesimal linear transform of y is close to the identity transformation, i.e. is of the form I + E where E is a 'small' matrix. It transforms y into (I + E)y = y + Ey. If φ is smooth enough, we denote ∇φ[y] the n×n matrix, called the relative (Cardoso and Laheld, 1996) or the natural (Amari, 1996) gradient of φ at [y], such that

    φ[y + Ey] = φ[y] + ⟨∇φ[y] | E⟩ + o(‖E‖)        (16)

where ⟨·|·⟩ is the Euclidean scalar product between matrices: ⟨M|N⟩ = trace(MN^T) = Σ_{ij} M_ij N_ij.

The gradient of the Kullback divergence between the distribution of the output y and the hypothesized distribution of the sources is

    ∇φ_ML[y] ≜ ∇K[y|s] = E H_ϕ(y)                  (17)

where the vector-to-matrix mapping H_ϕ is

    H_ϕ(y) ≜ ϕ(y) y^T - I                          (18)

with ϕ(y) ≜ [ϕ₁(y₁), ..., ϕ_n(y_n)]^T and where the so-called 'score functions' are defined as

    ϕ_i ≜ -(log q_i)′   i.e.   ϕ_i(·) = -q_i′(·)/q_i(·)         (19)

Hence, with the hypothesized pdf q_i for the i-th source is associated a non-linear function governing the associated estimating equation. Note that if s is a zero-mean unit-variance Gaussian variable, the associated score is ϕ(s) = s. In this case, the estimating equation reduces to a condition of empirical whiteness.

4.2 Estimating equations

More generally, an estimating function H for the BSS problem is a vector-to-matrix mapping H: R^n → R^{n×n}. It is associated to the estimating equation

    (1/T) Σ_{t=1..T} H(y(t)) = 0                   (20)

thus called because, H being matrix-valued, equation (20) specifies a priori as many constraints as unknown parameters in the BSS problem, so that an estimate of the mixture (or of its inverse) can be determined by solving eq. (20). Estimating equations and estimating functions provide a unified view of many BSS techniques.

The simplest example of estimating equation stems from the log-likelihood, whose stationary points verify eq. (20) with y = A⁻¹x and H = H_ϕ (eq. (18)). For i ≠ j, the ij-th term of matrix equation (18) expresses the empirical decorrelation between the jth output y_j and a non-linear version ϕ_i(y_i) of the ith output. The diagonal terms i = j fix the scales of the outputs.
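This is readily checked numerically. In the sketch below, the cubic non-linearity ϕ(u) = u³ is an illustrative choice; the diagonal of (18) then fixes the scales through Es_i⁴ = 1, which the uniform support is chosen to satisfy. The sample average (20) nearly vanishes at a separating point and clearly does not at a mixture:

import numpy as np

rng = np.random.default_rng(4)
T = 100_000

def H(y, phi=lambda u: u ** 3):
    # sample average of H_phi(y) = phi(y) y^T - I, eqs. (18) and (20)
    n, nsamp = y.shape
    return (phi(y) @ y.T) / nsamp - np.eye(n)

a = 5 ** 0.25                    # uniform on [-a, a] gives E s^4 = a^4/5 = 1
s = rng.uniform(-a, a, size=(2, T))
print(np.round(H(s), 2))         # ~ 0: eq. (20) holds at a separating point

A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
print(np.round(H(A @ s), 2))     # clearly non-zero on mixed signals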
Similarly, maximizing the likelihood under the whiteness constraint corresponds to solving (20) with H = H°_ϕ defined as

    H°_ϕ(y) ≜ yy^T - I + ϕ(y)y^T - yϕ(y)^T         (21)

Note that H°_ϕ decomposes into two parts: a skew-symmetric part, ϕ(y)y^T - yϕ(y)^T, which is just H_ϕ(y) - H_ϕ(y)^T, and a symmetric part, yy^T - I, which expresses the whiteness condition.

Differentiating other orthogonal contrast functions yields similar estimating equations. For instance, the simple 4th-order contrasts (12) and (15) yield estimating equations of the form (21) with ϕ functions respectively given by

    ϕ_i(y_i) = -k_i y_i³   and   ϕ_i(y_i) = y_i³   (22)

The derivative of a contrast function does not necessarily take the form of the mean value of H(y) for some function H. However, the contrast functions φ°_ICA and φ°_JADE are 'asymptotically associated' to the same estimating equation as φ₄°. This means that solving (20) with this estimating function, or minimizing φ°_ICA, φ°_JADE or φ₄° with the cumulants estimated from T samples, yields estimates which are equivalent for large enough T.

4.3 Gradient algorithms

Relative gradient descent. The steepest descent technique of minimization consists in moving by a small step in a direction opposite to the gradient. To minimize a contrast φ[y] by following the steepest relative gradient, y undergoes a small change into (I + E)y with E = -μ∇φ[y] for a 'small' positive step size μ. Thus, one step of a relative gradient descent can be formally described as

    y ← (I - μ∇φ[y]) y = y - μ∇φ[y] y.             (23)

According to (16), the resulting variation of φ[y] is δφ ≈ ⟨∇φ[y]|E⟩ = ⟨∇φ[y]| -μ∇φ[y]⟩ = -μ‖∇φ[y]‖² < 0. The formal description (23) is readily turned into off-line or on-line algorithms when φ[y] has, like φ_ML, a relative gradient of the form ∇φ[y] = EH(y). Multiplication of y by (I - μ∇φ[y]) as in eq. (23) amounts to changing B into (I - μ∇φ[y])B. In particular, a stochastic gradient algorithm is obtained by deleting the expectation operator in ∇φ[y] = EH(y). Therefore, an on-line stochastic relative gradient algorithm updates a separating matrix B_t when a new sample x(t) is received according to

    B_{t+1} = B_t - μ_t H(y(t)) B_t                (24)

where μ_t is a sequence of adaptation steps.
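A complete on-line separator then fits in a few lines. The sketch below runs update (24) with the orthogonal estimating function (21) and the cubic non-linearity of eq. (22), a stable choice for the sub-Gaussian (negative kurtosis) uniform sources used here; the step size and sample size are arbitrary tuning choices:

import numpy as np

rng = np.random.default_rng(5)
T = 50_000
s = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, T))  # unit-variance sources
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s

B = np.eye(2)                    # initial separating matrix
mu = 2e-3                        # small constant adaptation step
phi = lambda u: u ** 3           # cubic non-linearity, eq. (22)

for t in range(T):
    y = B @ x[:, t]
    # H° of eq. (21): symmetric whiteness part + skew-symmetric score part
    Ht = (np.outer(y, y) - np.eye(2)
          + np.outer(phi(y), y) - np.outer(y, phi(y)))
    B -= mu * Ht @ B             # relative gradient update, eq. (24)

print(np.round(B @ A, 2))        # ~ a signed permutation: separation achieved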

The most remarkable feature of gradient algorithms of the form (24) is their uniform performance property. This is to be understood in the sense that the trajectory of the global system C_t ≜ B_t A is

    C_{t+1} = C_t - μ_t H(C_t s(t)) C_t            (25)

which does not depend on A: the only effect of the mixing matrix is to determine (together with B₀) the initial value C₀ = B₀A of the global system. This is a very nice property because the performance in terms of source separation depends on the global system C_t and not on the particular values of B_t and A. This is true for any estimating function H; however, uniformly good performance can only be expected if function H is correctly selected, for instance by deriving it from a contrast function. On-line algorithms based on functions H of the form (18) are described in (Amari et al., 1995); those based on the form (21) are studied in detail in (Cardoso and Laheld, 1996). The uniform performance property has also been obtained in (Cichocki et al., 1994).

4.4 Adapting to the sources

When the pdf of a given source is assumed to be q(s), this assumption manifests itself in estimating functions like H_ϕ and H°_ϕ via the score function ϕ defined by (19). If the source has in fact a pdf r(s), one should rather use the 'true' score function ψ ≜ -r′/r. Pham (1997) has proposed an elegant approach to easily approximate ψ by a linear combination

    ϕ_ψ(s) ≜ Σ_{l=1..L} α_l f_l(s)                 (26)

of a fixed set of basis functions. Surprisingly, the set of coefficients minimizing the mean square error E(ϕ_ψ(s) - ψ(s))² between the true score and its approximation can be found without knowing ψ explicitly: it is

    ϕ_ψ(s) = E F′(s)^T (E F(s)F(s)^T)⁻¹ F(s)       (27)

where F(s) ≜ [f₁(s), ..., f_L(s)]^T is the L×1 column vector of basis functions and F′(s) is the column vector of their derivatives. This is a very significant result because the expression of ϕ_ψ can be simply estimated by replacing in (27) the expectations by sample averages and the values of s by the estimated source signals.
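A quick numerical check of (27), with expectations replaced by sample averages: the Gaussian case is used below only because its score ψ(s) = s is known exactly (Gaussian sources are of course not separable), so the fitted coefficients on the basis {s, s³} should come out close to (1, 0):

import numpy as np

rng = np.random.default_rng(6)
T = 200_000
s = rng.normal(size=T)           # Gaussian test sample: true score is psi(s) = s

F  = np.vstack([s, s ** 3])      # basis f_1(s) = s, f_2(s) = s^3
dF = np.vstack([np.ones_like(s), 3 * s ** 2])   # their derivatives

# eq. (27) with sample averages: the coefficient vector of eq. (26) is
# alpha = (E[F F^T])^{-1} E[F'], without ever evaluating psi itself
G = (F @ F.T) / T
alpha = np.linalg.solve(G, dF.mean(axis=1))
print(np.round(alpha, 2))        # ~ [1, 0]: recovers psi(s) = s exactly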

5. PERFORMANCE ISSUES

5.1 Equivariance and uniform performance

The performance of a given BSS technique depends a priori on the distribution of the source signals and on the mixing matrix: performance is expected to be poor for nearly Gaussian signals and for nearly singular mixtures. However, this is not true, at least in the high SNR domain: in the limit of noise free observations, the BSS problem is 'uniformly hard in the mixing matrix'. This means that the achievable separation is independent of the matrix A: poorly conditioned mixtures can be recovered just as easily as if A = I (see below). Therefore, it seems possible to design algorithms having 'uniform performance' (with respect to A). This is a very desirable property since such algorithms can be studied and tuned independently of the particular mixture to be separated.

It is not difficult to show that BSS algorithms based on the solution of estimating equations or on the optimization of contrast functions are equivariant. We have already stressed (sec. 4.3) that adaptive algorithms can also be made equivariant. See (Cardoso, 1995) for a general presentation and (Cardoso and Amari, 1997) for an extension to location-scale models.

5.2 Stability

Let B be a separating matrix which is a stationary point of the learning rule (24), i.e. which verifies EH(y) = 0 for a given estimating function H. The question is to decide whether or not it is also a stable stationary point. It can be answered for both the symmetric form H°_ϕ and for the asymmetric form H_ϕ and, in both cases, it depends only on the following non-linear moments

    λ_i ≜ Eϕ_i′(s_i) Es_i² - Eϕ_i(s_i)s_i          (28)

where the normalization of the sources (scales) is determined by the condition EH(s) = 0.

Leaving aside the issue of stability with respect to scale, the analysis yields a pairwise stability condition for the symmetric form (21):

    (1 + λ_i)(1 + λ_j) > 1   for 1 ≤ i < j ≤ n     (29)

while the stability condition for the asymmetric form (18) is that 1 + λ_i > 0 for 1 ≤ i ≤ n and that

    λ_i + λ_j > 0   for 1 ≤ i < j ≤ n.             (30)

Therefore stability appears to depend on pairwise conditions. Note that the stability domain is larger for the symmetric form (21): this is a consequence of letting the second order information (the whiteness constraint) do 'half the job'. In both cases, a sufficient stability condition is λ_i > 0 for all the sources. Another important point is that λ_i = 0 for any ϕ_i if the ith source is normally distributed. Hence, the stability conditions can never be met if there is more than one Gaussian source.

We have considered linear-cubic score functions in sect. 4. If ϕ_i(s_i) = α_i s_i + β_i s_i³ for two constants α_i and β_i, then λ_i = β_i (3E²s_i² - Es_i⁴) = -β_i k_i. Therefore, if one wishes to use cubic non-linearities, it is sufficient to know the sign of the kurtosis of each source to make separating matrices stable. For other than cubic scores, the sign of the kurtosis is not relevant to stability.

5.3 Accuracy of estimating equations

The interference-to-signal ratio (ISR) obtained in rejecting the qth source in the estimate of the pth source by a separating matrix B is

    ρ_pq(B) ≜ (BA)_pq² Es_q² / ((BA)_pp² Es_p²)    p ≠ q        (31)

Let B̂_T be the separating matrix obtained via a particular algorithm using T samples. The limit

    ISR_pq ≜ lim_{T→∞} T E ρ_pq(B̂_T)              (32)

usually exists and provides an asymptotic measure of the separation performance of a given off-line BSS technique. If the procedure is equivariant, this measure does not depend on A. For simplicity, we consider identically distributed signals and identical non-linear functions: ϕ_i(·) = ϕ(·), so that λ_i = λ and γ_i = γ for 1 ≤ i ≤ n, where

    γ_i ≜ Eϕ_i²(s_i) Es_i² - E²[ϕ_i(s_i)s_i] ≥ 0.  (33)

If the score function has been obtained by the technique of Pham described in sec. 4.4 based on T independent samples, then the rejection rates ISR_pq and ISR°_pq obtained with the estimating functions H_ϕ and H°_ϕ respectively are (for p ≠ q):
    ISR_pq = ISR = (1/2) (1/λ + 1/(λ+2))           (34)

    ISR°_pq = ISR° = (1/4) (1 + 2/λ)               (35)
These results teach several things. First, ISR° is lower bounded by 1/4 regardless of the value of λ: this is a general property of orthogonal BSS techniques (Cardoso, 1994) and is the price to pay for blindly trusting second order statistics, i.e. for whitening. Second, ISR and ISR° are both minimized by maximizing λ. This is achieved precisely when ϕ = ψ, i.e. when the non-linear function matches the true score function. The maximum value of λ is κ ≜ Eψ²(s)Es² - E²[ψ(s)s], and expression (34) with λ = κ actually is the best achievable ISR rate with T independent samples (the asymptotic Cramér-Rao bound, see (Yellin and Friedlander, 1996; Pham and Garrat, 1997)). Finally, the achievable performance depending on the size of κ, this moment characterizes the hardness of the BSS problem with respect to the source distribution. It tends to zero when the common source distribution tends to a normal distribution, showing how the achievable performance worsens when the source distributions tend to be normally distributed. It tends to +∞ when the source distributions tend to discrete or to bounded support. In the case of discrete sources, deterministic (error-free) blind identification is possible with a finite number of samples. In the case of sources with bounded support, the MSE of blind identification decreases at a much faster rate than the 1/T rate obtained for finite values of κ.

6. CONCLUSION

In this short review, the focus was on some basic principles underlying the simplest source separation model. Many interesting issues have been left out, like discussing the connections between BSS and blind deconvolution; convergence speed of adaptive algorithms; design of consistent estimators based on noisy observations; detection of the number of sources; etc. Rather than concluding by a summary, we briefly comment on three other issues.

Algebraic approaches. The 4th order cumulants of x have a very regular structure in the BSS model which calls for algebraic approaches. Simple algorithms can be based on the eigenstructure of 'cumulant matrices' built from cumulants (Cardoso, 1989; Tong et al., 1993). An exciting direction of research is to investigate high-order decompositions that would generalize matrix factorizations like SVD or EVD to 4th order cumulants (Lathauwer et al., 1996; Cardoso, 1991; Comon and Mourrain, 1994).

Using temporal correlation. If the source signals are temporally correlated, time structure can also be exploited. It is possible to achieve separation if all the source signals have distinct spectra even if each source signal is a Gaussian process (Tong et al., 1991b). Simple algebraic techniques can be devised (see (Tong et al., 1990; Belouchrani et al., 1997)); the Whittle approximation to the likelihood is investigated in (Pham and Garat, 1993).
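As a minimal sketch of this second-order route, in the spirit of (Tong et al., 1990) though not a faithful reproduction of that algorithm, one can whiten the observations and then rotate onto the eigenbasis of a symmetrized one-lag covariance matrix. The Gaussian AR(1) sources below have distinct spectra, so no higher-order method could separate them, yet second order statistics succeed:

import numpy as np

def ar1(a, T, rng):
    # Gaussian AR(1) signal: s[t] = a s[t-1] + e[t]
    e = rng.normal(size=T)
    out = np.empty(T)
    out[0] = e[0]
    for t in range(1, T):
        out[t] = a * out[t - 1] + e[t]
    return out

rng = np.random.default_rng(7)
T = 20_000
s = np.vstack([ar1(0.9, T, rng), ar1(-0.5, T, rng)])   # distinct spectra
s = (s - s.mean(axis=1, keepdims=True)) / s.std(axis=1, keepdims=True)
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
x = A @ s

# step 1: whiten the observations
d, E = np.linalg.eigh(np.cov(x))
W = np.diag(d ** -0.5) @ E.T
z = W @ x
# step 2: rotate onto the eigenbasis of a symmetrized lag-1 covariance,
# whose eigenvalues (the lag-1 autocorrelations) are distinct here
R1 = z[:, 1:] @ z[:, :-1].T / (T - 1)
_, U = np.linalg.eigh((R1 + R1.T) / 2)
B = U.T @ W
print(np.round(B @ A, 2))        # ~ signed permutation, from Gaussian sources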

Deterministic identification. As indicated in sec. 5.3, sources with discrete support allow for deterministic identification (infinite Fisher information). Specific contrast functions can be devised (Gamboa, 1995) to take advantage of discreteness. There is a rich domain of application with digital communication signals, where information is coded with discrete symbols.

7. REFERENCES

Amari, S. (1996). Neural learning in structured parameter spaces: natural Riemannian gradient. In: Proc. NIPS. pp. 565-609.

Amari, S.-I., A. Cichocki and H.H. Yang (1995). Recurrent neural networks for blind separation of sources. In: Proc. Int. Symp. NOLTA. pp. 37-42.

Bell, A. J. and T. J. Sejnowski (1995). An information-maximisation approach to blind separation and blind deconvolution. Neural Computation 7(6), 1004-1034.

Belouchrani, Adel, Karim Abed Meraim, Jean-François Cardoso and Éric Moulines (1997). A blind source separation technique based on second order statistics. IEEE Trans. on S.P. 45(2), 434-444.

Cardoso, Jean-François (1989). Source separation using higher order moments. In: Proc. ICASSP. pp. 2109-2112.

Cardoso, Jean-François (1991). Super-symmetric decomposition of the fourth-order cumulant tensor. Blind identification of more sources than sensors. In: Proc. ICASSP. pp. 3109-3112.

Cardoso, Jean-François (1994). On the performance of source separation algorithms. In: Proc. EUSIPCO. Edinburgh. pp. 776-779.

Cardoso, Jean-François (1995). The equivariant approach to source separation. In: Proc. NOLTA. pp. 55-60.

Cardoso, Jean-François and Antoine Souloumiac (1993). Blind beamforming for non Gaussian signals. IEE Proceedings-F 140(6), 362-370.

Cardoso, Jean-François and Beate Laheld (1996). Equivariant adaptive source separation. IEEE Trans. on S.P. 44(12), 3017-3030.

Cardoso, Jean-François and Shun-Ichi Amari (1997). Maximum likelihood source separation: equivariance and adaptivity. In: Proc. of SYSID'97. To appear.

Cardoso, Jean-François, Sandip Bose and Benjamin Friedlander (1996). On optimal source separation based on second and fourth order cumulants. In: Proc. IEEE Workshop on SSAP, Corfu, Greece.

Cichocki, A., R. Unbehauen and E. Rummert (1994). Robust learning algorithm for blind separation of signals. Electronics Letters 30(17), 1386-1387.

Comon, P. (1994). Independent component analysis, a new concept?. Signal Processing, Elsevier 36(3), 287-314. Special issue on Higher-Order Statistics.

Comon, P. and B. Mourrain (1994). Decomposition of quantics in sums of powers. In: SPIE Conference on Advanced Signal Processing. San Diego.

Cover, Thomas M. and Joy A. Thomas (1991). Elements of Information Theory. Wiley Series in Telecommunications. John Wiley.

Donoho, D. (1981). On minimum entropy deconvolution. In: Applied Time-Series Analysis II. Academic Press.

Gaeta, Michel and Jean-Louis Lacoume (1990). Source separation without a priori knowledge: the maximum likelihood solution. In: Proc. EUSIPCO. pp. 621-624.

Gamboa, Fabrice (1995). Separation of sources having unknown discrete supports. In: Proc. IEEE SP Workshop on Higher-Order Stat., Aiguablava, Spain. pp. 56-59.

Lathauwer, L. De, B. De Moor and J. Vandewalle (1996). Independent component analysis based on higher-order statistics only. In: Proc. IEEE SSAP Workshop, Corfu. pp. 356-359.

Nadal, J.-P. and N. Parga (1994). Nonlinear neurons in the low-noise limit: a factorial code maximizes information transfer. NETWORK 5, 565-581.

Pfanzagl, J. (1973). Asymptotic expansions related to minimum contrast estimators. The Annals of Statistics 1(6), 993-1026.

Pham, Dinh-Tuan and Philippe Garrat (1997). Blind separation of mixture of independent sources through a quasi-maximum likelihood approach. IEEE Tr. SP. To appear.

Pham, D.T. and P. Garat (1993). Séparation aveugle de sources temporellement corrélées. In: Proc. GRETSI. pp. 317-320.

Tong, L., R. Liu, V.C. Soon and Y. Huang (1991a). Indeterminacy and identifiability of blind identification. IEEE Tr. on CS 38(5), 499-509.

Tong, L., V.C. Soon, Y.F. Huang and R. Liu (1991b). A necessary and sufficient condition for the blind identification of memoryless systems. In: Proc. ISCAS. Vol. 1. Singapore. pp. 1-4.

Tong, L., Y.F. Huang, V.C. Soon and R. Liu (1990). AMUSE: a new blind identification algorithm. In: Proc. ISCAS.

Tong, Lang, Yujiro Inouye and Ruey-wen Liu (1993). Waveform preserving blind estimation of multiple independent sources. IEEE Tr. on SP 41(7), 2461-2470.

Yellin, D. and B. Friedlander (1996). Multichannel system identification and deconvolution: performance bounds. In: Proc. IEEE SSAP Workshop, Corfu. pp. 582-585.
