# An introduction to a new data analysis tool: Independent Component Analysis

Andreas Jung, Regensburg, March 18th 2002
Abstract. A common problem encountered in data analysis and signal processing is finding a suitable representation of multivariate data. For computational and conceptual simplicity, such representations are often sought as a linear transformation of the original data. Well-known linear transformations are, for example, principal component analysis and projection pursuit. A recently developed method, which relies on nonlinear (higher-order) statistics rather than on the covariance alone, is independent component analysis (ICA), in which the components of the desired representation have minimal statistical dependence. Such a representation seems to capture the essential structure of the data in many applications. In this paper, we focus on the theory and methods of ICA in contrast to classical transformations, as well as on applications of this method to biomedical data, for example electroencephalography (EEG). To illustrate the algorithm, we also visualize the unmixing process with a set of images. Finally, we give an outlook on possible future developments of ICA. The main aspects of my future research will be: using time-structure information from the data to enhance the convergence of the algorithm; determining the meaningfulness of the independent components; and treating non-stationary data, as most biomedical systems are not in equilibrium.

## 1 Introduction

A central problem in data analysis, statistics and signal processing is finding a suitable representation of multivariate data by means of a suitable transformation. For any subsequent analysis of the data, whether pattern recognition, de-noising, visualization or anything else, it is important that the data is represented in a manner that facilitates the analysis. Especially when biomedical data is analyzed, the representation presented to the physicians must be as clear as possible and should show only the essential structures hidden in the data. Since in real-world problems mostly continuous-valued parameters are measured, we restrict ourselves in this paper to continuous-valued multidimensional variables. Let us denote by $x = (x_1, x_2, \dots, x_m)^T \in \mathbb{R}^m$ an $m$-dimensional random variable; the problem is to find a transformation $\tau: x \to y$ so that the $n$-dimensional transform $y \in \mathbb{R}^n$ defined by

$$y = f(x), \qquad f: \mathbb{R}^m \to \mathbb{R}^n \tag{1}$$

has some desirable properties. (Note that throughout this paper we use the same notation for random variables and their realizations; the context should make the distinction clear.) Often a linear transformation is used to represent the observed variables, i.e.,

$$y = Wx, \tag{2}$$

where $W$ is an $(n \times m)$-matrix which has to be determined. Using linear transformations makes the problem computationally and conceptually simpler and facilitates the interpretation of the results. Several principles and methods have been developed to find a suitable representation; principal component analysis is just one among them. These methods define a principle that tells which transformation is optimal, in the sense of optimal dimension reduction, "interestingness" of the resulting components, simplicity of the transformation matrix $W$, or any other application-oriented criterion.

Recently a new method has gained widespread attention: independent component analysis (ICA). As the name implies, the basic goal is to find a transformation so that the resulting components $y_i$ are as statistically independent from each other as possible.

One typical application of this method is the blind source separation (BSS) problem, in which the observed variables $x$ correspond to a realization of an $m$-dimensional discrete-time signal $x(t)$, $t = 1, 2, \dots$. Due to some circumstances, only a mixture of some underlying source signals can be observed, as illustrated in figure 1: the observed signals originate from a (non-)linear mixture of underlying source signals $s_i(t)$. The goal is to determine these source signals. Often one can assume that the source signals are statistically independent from each other, and thus independent component analysis can find a transformation so that the transformed signals $y_i(t)$ correspond to the original signals $s_i(t)$, except for a permutation and a scaling factor (including the sign), which cannot be determined by this method. Under some restrictions, the BSS problem can therefore be solved by ICA. Such a recovery of the original sources is demonstrated in figure 2: one can clearly see that the recovered signals are nearly identical to the original sources, up to a scaling factor and a permutation.

Figure 1: An illustration of a blind source separation (BSS) problem (signal amplitude over time).

Figure 2: Comparison between the original source signals (right) used to construct the mixture shown in figure 1 and the signals (left) recovered from the mixture by independent component analysis (amplitude over time). The unmixed signals are very close to the original signals.
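The BSS setting described above can be made concrete with a few lines of NumPy. This is a minimal sketch under made-up assumptions: the two sources (a sine and a square wave) and the mixing matrix `A` are hypothetical, and `A` is treated as known only to illustrate the model; a blind method never sees it. The last part shows why permutation and scaling cannot be identified: a permuted and rescaled pair (`A_alt`, `s_alt`) produces exactly the same observations.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1000)

# Two hypothetical source signals s_1(t), s_2(t)
s = np.vstack([np.sin(2 * np.pi * 5 * t),            # sine wave
               np.sign(np.sin(2 * np.pi * 3 * t))])  # square wave

# Hypothetical mixing matrix; only the mixture x = A s is observed
A = np.array([[1.0, 0.5],
              [0.7, 1.2]])
x = A @ s

# If A were known, W = A^{-1} would recover the sources exactly
W = np.linalg.inv(A)
y = W @ x

# Permutation P and scaling D: (A P D, (P D)^{-1} s) produce the same x,
# so a blind method cannot tell the two explanations apart
P = np.array([[0.0, 1.0],
              [1.0, 0.0]])
D = np.diag([-2.0, 0.5])
A_alt = A @ P @ D
s_alt = np.linalg.inv(P @ D) @ s
```

Since `A_alt @ s_alt` equals `x` exactly, any estimate of the sources is only defined up to order, scale and sign.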

Another prominent example of a BSS problem is the cocktail party problem: a mixture of simultaneous speech signals is recorded by several microphones, and the goal is to separate the original speech signals, which correspond to the voices of the speakers. Since the speech signals from different speakers are statistically independent, ICA can separate them with high quality, provided that at least as many microphones are placed in the room as there are speakers. Other possible applications with multidimensional data sets, where ICA could help reveal interesting structures, include high-dimensional time series of any kind, for example from biomedical applications, multi-satellite missions, financial markets or any other scientific experiment.

The technique of ICA was first introduced in the early 1980s in the context of neural network modelling. In the mid-1990s, several research groups introduced highly successful new algorithms ([Bell and Sejnowski, 1995], [Lee et al., 1999], [Hyvärinen and Oja, 1997] and [Hyvärinen, 1999a]), demonstrated not only on problems like the cocktail party problem but also on real-world problems like biomedical signal processing or the separation of audio signals in telecommunication. In this paper, we only give a short overview of the theory and methods of independent component analysis; for a more detailed survey we refer to the article [Hyvärinen, 1999b]. A complete coverage of ICA and many more references can be found in the book [Hyvärinen et al., 2001], which was the basis of this paper.

## 2 Classical transformations

Several principles have been developed in statistics and signal processing to find a suitable linear representation of some observed data. In this section we discuss classical methods for determining the linear transformation in (2). Two well-known classical linear transformations are principal component analysis, a second-order method, and projection pursuit, a higher-order method; both are discussed in the following. For simplicity, we assume all variables to be centered, which means that the mean is subtracted: the transformed variable can be written as $x = x_0 - E\{x_0\}$, where $x_0$ is the original non-centered variable and $E\{\cdot\}$ denotes the expectation. This is no loss of generality, since the data can always easily be centered.

### 2.1 Second-order methods: Principal Component Analysis

One might roughly characterize second-order methods as ones that try to find a faithful representation of the data, in the sense of the mean-square error of the reconstruction. In contrast, most higher-order methods try to find a meaningful representation. Meaningfulness is of course a task-dependent property, but these methods seem to capture a meaningful representation in a wide variety of applications.

Second-order methods only use information contained in the covariance matrix of the data vector $x$ (and, of course, the mean of the data). The use of second-order techniques can be understood in the context of the classical assumption of Gaussianity: the distribution of a normally (Gaussian) distributed variable $x$ is completely determined by second-order information, so it is unnecessary to include any other information, for example from higher moments. This makes second-order methods very robust and computationally simple, since only classical matrix manipulations are used.

The most widely used second-order technique, and one of the most popular methods for finding a linear transformation of observed data, is principal component analysis (PCA). The basic idea is to find components $s_1, s_2, \dots, s_n$ that explain the maximum amount of variance possible with $n$ linearly transformed components. An intuitive way to explain PCA is a recursive formulation of the method: in the first step, one looks for the direction on which the variance of the projection is maximized. Denoting this direction by $w_1$,

$$w_1 = \arg\max_{\|w\|=1} E\{(w^T x)^2\}. \tag{3}$$
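Equation (3) can be checked numerically. It is a standard result that its maximizer is the eigenvector of the covariance matrix with the largest eigenvalue; the sketch below, using hypothetical correlated Gaussian data, centers the data, takes that eigenvector as `w1`, and compares its projected variance with a brute-force search over unit vectors in the plane. The data, seed and helper names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated 2-D data, one realization x0 per column
x0 = rng.multivariate_normal(mean=[3.0, -1.0],
                             cov=[[3.0, 1.5], [1.5, 1.0]],
                             size=5000).T

# Centering: x = x0 - E{x0} (sample mean)
x = x0 - x0.mean(axis=1, keepdims=True)

# Sample covariance C = E{x x^T}
C = x @ x.T / x.shape[1]

# eigh returns eigenvalues in ascending order; the last eigenvector is w1
eigvals, eigvecs = np.linalg.eigh(C)
w1 = eigvecs[:, -1]


def projected_variance(w):
    """Sample estimate of E{(w^T x)^2} for the unit vector w / ||w||."""
    w = np.asarray(w) / np.linalg.norm(w)
    return np.mean((w @ x) ** 2)


# Brute-force search over unit vectors (cos a, sin a), a in [0, pi)
angles = np.linspace(0.0, np.pi, 1800, endpoint=False)
best_search = max(projected_variance([np.cos(a), np.sin(a)]) for a in angles)
```

The projected variance along `w1` equals the largest eigenvalue and is never beaten by the grid search, which is what (3) asserts.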

Once this direction, the first principal component $s_1 = w_1^T x$, is found, only directions orthogonal to it are allowed for the next principal component. This iterative process is continued until all $n$ principal components are found. In practice, the $w_i$ can simply be computed from the covariance matrix $C = E\{xx^T\}$: the $w_i$ are the eigenvectors of $C$ that correspond to the $n$ largest eigenvalues of $C$.

The basic goal in PCA is the optimal reduction of the dimension of the data. Indeed, it can be proven that the representation given by PCA is an optimal linear dimension reduction in the sense of the mean-square error. A simple illustration of PCA can be found in figure 3, in which the first principal component of a two-dimensional data set is shown; projecting onto it gives an optimal (in the mean-square sense) dimension reduction from two dimensions to one.

Figure 3: A principal component analysis (PCA) of a two-dimensional data set $(x_1, x_2)$, where the line shows the direction of the first principal component.

However, such a transformation does not necessarily produce the best result in the sense of recovering mixed sources in the blind source separation problem. As an example, figure 4 shows the result of a PCA transformation of the mixed signals from figure 1. One can clearly see that the original signals were not recovered, since the method only searches for directions of maximum variance; PCA would thus yield a wrong reconstruction of the original sources. Therefore higher-order methods are necessary to find more meaningful transformations, so that the representations reveal the desired information.

Figure 4: The result of a principal component analysis (PCA) for the blind source separation problem shown in figure 1 (amplitude over time).

### 2.2 Higher-order methods: Projection pursuit

Higher-order methods use information on the distribution of $x$ that is not contained in the covariance matrix. For this to be meaningful, the variables $x$ must be assumed to be non-Gaussian, since the information about the distribution of a (zero-mean) Gaussian variable is fully contained in the covariance matrix. Therefore higher-order statistics must be analyzed and used in these methods.

A technique developed in statistics for finding "interesting" projections of multidimensional data is the so-called projection pursuit. Such projections can be used for optimal visualization of clustering structures in the data. Dimension reduction is also an important objective here, since the aim of this transformation is to visualize only the interesting structures, normally in two or three dimensions. It has been argued that the non-Gaussian distributions are the most interesting ones.

An illustrative example of projection pursuit is shown in figure 5: a two-dimensional data set $(x_1, x_2)$ is clearly separated into two clusters. The first principal component, the direction of maximum variance, would be vertical, so the projection onto it would not separate the clusters. In contrast, a projection pursuit method would yield the horizontal direction and so clearly separate the clusters. The classical second-order method would thus yield an uninteresting and therefore wrong projection.

Figure 5: To illustrate the problems of variance-based methods, one often shows this classical example. A two-dimensional data set $(x_1, x_2)$ is clearly separated into two clusters. The first principal component (the direction of maximum variance) would be vertical, but the projection pursuit method would find the interesting direction, which is the horizontal one.

Independent component analysis (ICA) goes one step further and uses all higher moments of the distribution.

## 3 Independent component analysis

Independent component analysis is a method in which the components of the new representation have minimal statistical dependence. For the definition of statistical independence, let us first recall some basic definitions. Let $y_1, y_2, \dots, y_m$ be random variables (for simplicity with zero mean) with the joint probability density $f(y_1, \dots, y_m)$. The variables are statistically independent if the density function can be factorized [Papoulis, 1991]:

$$f(y_1, y_2, \dots, y_m) = f_1(y_1)\, f_2(y_2) \cdots f_m(y_m), \tag{4}$$

where $f_i(y_i)$ denotes the marginal probability density of $y_i$. Statistical independence must be distinguished from uncorrelatedness, which means that

$$E\{y_i y_j\} - E\{y_i\}\, E\{y_j\} = 0 \quad \text{for } i \neq j. \tag{5}$$

In general, statistical independence is a much stronger requirement than uncorrelatedness. For independence of the $y_i$ it must hold that

$$E\{g_1(y_i)\, g_2(y_j)\} - E\{g_1(y_i)\}\, E\{g_2(y_j)\} = 0 \quad \text{for } i \neq j \tag{6}$$

for any (measurable) functions $g_1$ and $g_2$ [Papoulis, 1991]. This is a much stricter condition than uncorrelatedness, which is the special case of (6) in which $g_1$ and $g_2$ are the identity.

An intuitive example of independent component analysis can be given by a scatter plot of two signals $x_1, x_2$, where every point is given by the pair $(x_1, x_2)$. One can see the "cross-like" structure of the plot shown in figure 6, which corresponds to the (factorized) joint density of two independent signals $x_1$ and $x_2$.

Figure 6: A scatter plot of two independent signals $(x_1, x_2)$. The arrows represent the unit vectors $e_1, e_2$.

The procedure of recovering the original (independent) sources, as in the blind source separation problem, can now be illustrated. Assuming, for simplicity, a linear mixture of these signals, one would get a transformed scatter plot as shown in figure 7. Obviously principal component analysis does not find the correct sources, but only finds orthogonal directions: $u_1$ (the first principal component) represents the direction of maximum variance and $u_2$ is orthogonal to it. PCA can therefore only provide an orthogonal transformation (a rotation) and cannot undo an arbitrary linear mixing. ICA, in contrast, is able to find the transformed unit vectors $a_1, a_2$. These directions correspond to the original sources we were looking for.

Figure 7: Comparison between PCA and ICA: a mixture of the two independent signals from figure 6 is shown in a scatter plot. The vectors represent the unmixing vectors of the PCA solution ($u_1, u_2$) and of the ICA solution ($a_1, a_2$). The ICA solution recovers the transformed unit vectors $a_1, a_2$, while PCA only finds the orthogonal vectors $u_1, u_2$.
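The gap between uncorrelatedness (5) and independence (6) can be seen in a small numerical example, chosen here for illustration and not taken from the paper: $y_1$ uniform on $[-1, 1]$ and $y_2 = y_1^2 - 1/3$. The pair is uncorrelated, yet $y_2$ is a deterministic function of $y_1$; choosing the nonlinear test function $g_1(u) = u^2$ (with $g_2$ the identity) in (6) exposes the dependence. The sample size and seed are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# y1 uniform on [-1, 1]; y2 = y1^2 - 1/3 is completely determined by y1
y1 = rng.uniform(-1.0, 1.0, n)
y2 = y1 ** 2 - 1.0 / 3.0


def cross_moment(a, b):
    """Sample estimate of E{ab} - E{a}E{b}."""
    return np.mean(a * b) - np.mean(a) * np.mean(b)


# Eq. (5), g1 = g2 = identity: approximately zero, i.e. uncorrelated
c_linear = cross_moment(y1, y2)

# Eq. (6) with g1(u) = u^2, g2(u) = u: clearly nonzero, i.e. dependent
# (the exact value is Var(y1^2) = 1/5 - 1/9 = 4/45)
c_nonlinear = cross_moment(y1 ** 2, y2)
```

Checking only second-order statistics would wrongly suggest that $y_1$ and $y_2$ are unrelated.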

### 3.1 Definition of independent component analysis

In this section, we define the problem of independent component analysis. As stated in the introduction, we are looking for a transformation $\tau: x \to y$ so that the $n$-dimensional transform $y \in \mathbb{R}^n$ defined by $y = f(x)$, $f: \mathbb{R}^m \to \mathbb{R}^n$, has minimal mutual statistical dependence. Although the general case of an arbitrary nonlinear function $f(x)$ can be formulated, we only consider the linear case, since the identifiability of ICA can only be shown for the linear case. The first and most general definition is as follows:

Definition 1 (General definition). ICA of the random vector $x$ consists of finding a linear transform $y = Wx$ so that the components $y_i$ are as independent as possible, in the sense of maximizing some function $F(y_1, \dots, y_n)$ that measures statistical independence.

This is the most general definition, since no assumptions are made on the data or the model. Note, however, that a linear transformation is, in general, not able to generate strictly independent components. Also, the function measuring statistical independence must be defined; this is done in a later section, using information-theoretical considerations. A different approach is taken by the following, more estimation-theoretically oriented definition:

Definition 2 (Noisy ICA model). ICA of a random vector $x$ consists of estimating the following generative model for the data:

$$x = As + \mathbf{n}, \tag{7}$$

where the latent variables (components) $s_i$ in the vector $s = (s_1, \dots, s_n)^T$ are assumed to be independent. The matrix $A$ is a constant $m \times n$ "mixing" matrix and $\mathbf{n}$ is an $m$-dimensional random noise vector.

This definition reduces the ICA problem to an ordinary estimation of latent variables. However, estimating an additional noise term makes the problem considerably more complex, and even the noise-free problem has proven to be difficult enough. Therefore, the great majority of ICA research neglects this term, and the following simpler definition can be formulated:

Definition 3 (Noise-free ICA model). ICA of a random vector $x$ consists of estimating the following generative model for the data:

$$x = As, \tag{8}$$

where $A$ and $s$ are as in Definition 2.

The justification for this approximation is that methods using the simpler model are more robust. In the following, we only concentrate on the noise-free model. Usually, one also assumes that $x$ and $s$ are centered, which is in practice no restriction, since the mean of the random variable $x$ can always be subtracted. Note that if $x$ and $s$ are the result of a stochastic process, this process has to be strictly stationary! Especially in real-world problems, one cannot assume to find strictly independent components, but the model can often be considered a good approximation.

### 3.2 Identifiability of the ICA model

The identifiability of the noise-free ICA model has been treated in [Comon, 1994]. By imposing the following fundamental restrictions, the identifiability of the model can be assured:

- statistical independence of the components $s = (s_1, \dots, s_n)^T$,
- all components $s_i$ must be non-Gaussian (with the possible exception of one component),
- the number of observed signals $m$ must be at least as large as the number of independent components $n$, i.e., $m \geq n$,
- the matrix $A$ must be of full column rank.
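Why at most one Gaussian component is allowed can be illustrated as follows. This is a sketch in which the excess kurtosis is chosen here as a simple stand-in for a non-Gaussianity measure (the paper does not prescribe it), and the rotation angle, sample size and seed are arbitrary assumptions. An orthogonal mixture of two independent unit-variance Gaussians again has identity covariance and zero excess kurtosis, so nothing in the data reveals the rotation; for uniform sources, the mixture's kurtosis moves away from the source value of $-1.2$, so the mixing is detectable.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 400_000


def excess_kurtosis(u):
    """Sample excess kurtosis E{u^4} / E{u^2}^2 - 3 (zero for a Gaussian)."""
    u = u - u.mean()
    return np.mean(u ** 4) / np.mean(u ** 2) ** 2 - 3.0


theta = np.pi / 5
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthogonal mixing matrix

s_gauss = rng.standard_normal((2, n))                    # Gaussian sources
s_unif = rng.uniform(-np.sqrt(3), np.sqrt(3), (2, n))    # unit-variance uniform sources

x_gauss = R @ s_gauss
x_unif = R @ s_unif

cov_gauss = np.cov(x_gauss)                              # ~ identity, like the sources
kurt_gauss = [excess_kurtosis(v) for v in x_gauss]      # ~ 0, like the sources
kurt_unif_src = [excess_kurtosis(v) for v in s_unif]    # ~ -1.2
kurt_unif_mix = [excess_kurtosis(v) for v in x_unif]    # closer to 0: mixing is visible
```

For the Gaussian case every rotation `R` gives statistically the same data, which is exactly the non-identifiability the second restriction rules out.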

A basic indeterminacy in the model is that the independent components and the columns of the mixing matrix $A$ can only be estimated up to a scaling factor, since any constant factor multiplying an independent component could be cancelled by dividing the corresponding column of the mixing matrix by the same factor. Another indeterminacy is the ordering of the independent components: since any permutation of the components is again a solution of the ICA model, ICA cannot determine the original ordering of $s$. This is in contrast to PCA, where the components are ordered by their variances. However, a new ordering could be introduced by using a measure of non-Gaussianity or the norm of the columns of the mixing matrix.

### 3.3 "The" ICA method

Having formulated the model and the identifiability conditions of independent component analysis in the previous sections, we discuss in this section what the actual ICA method looks like. The estimation of the data model is usually performed by formulating an objective function and optimizing it, either by minimizing or by maximizing. This objective function is often called the contrast or cost function and acts as an index of how well the estimated data is represented by the model and how independent the estimated sources are. For optimizing the function, an optimization algorithm is necessary. We can formulate the method as

$$\text{ICA method} = \text{contrast function} + \text{optimization algorithm}. \tag{9}$$

For measuring the independence of the sources, one usually uses a contrast function based on information-theoretical considerations (discussed in the next paragraph), but other properties can also be used, for example geometric considerations [C.G. Puntonet et al.], [Puntonet and Prieto, 1998], [Theis et al., 2001], [Jung et al., 2001]. Many algorithms have been developed; some of them are described in the following.

Information theory: Shannon, the first to work on information theory, introduced in the mid-20th century a measure for information and for the bandwidth of channels. He defined the information as a measure depending on the probability density $f(x)$ of the signal $x$:

$$I(x) = -\log f(x). \tag{10}$$

The average of the information is called the Shannon entropy:

$$E\{I(x)\} = -\sum_x f(x) \log f(x). \tag{11}$$

Derived from this entropy, one can define a distance measure between two distributions $f$ and $g$, the so-called Kullback-Leibler divergence:

$$K(f, g) = \sum_x f(x) \log \frac{f(x)}{g(x)}. \tag{12}$$

The Kullback-Leibler divergence is zero if and only if $f$ and $g$ are equal. Two signals or random variables are independent if their joint probability density factorizes into their marginal densities, $f(x, y) = f(x) f(y)$. Using this fact, a measure for independence can be formulated, the mutual information:

$$M(x, y) = K\big(f(x, y),\, f(x) f(y)\big) = \sum_{x, y} f(x, y) \log \frac{f(x, y)}{f(x)\, f(y)}. \tag{13}$$

Since the Kullback-Leibler divergence is zero only if its arguments are equal, the mutual information is zero if and only if the joint probability density of $x$ and $y$ factorizes into their marginal densities. These definitions are easily extendable to higher dimensions and to continuous random variables.
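For discrete distributions, the entropy, the Kullback-Leibler divergence and the mutual information translate directly into code. The helper functions below are a sketch using the usual sign convention $H = -\sum f \log f$, the natural logarithm and the convention $0 \log 0 = 0$ (all assumptions of this illustration); the two example tables are made up. They confirm that the mutual information vanishes for a factorized joint table and equals $\log 2$ for two perfectly coupled binary variables.

```python
import numpy as np


def entropy(p):
    """Shannon entropy H = -sum p log p (natural log), with 0 log 0 := 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))


def kl_divergence(p, q):
    """Kullback-Leibler divergence K(p, q) = sum p log(p / q); assumes q > 0 where p > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))


def mutual_information(joint):
    """M(x, y) = K(f(x, y), f(x) f(y)) for a 2-D table of joint probabilities."""
    joint = np.asarray(joint, dtype=float)
    fx = joint.sum(axis=1)   # marginal density of x
    fy = joint.sum(axis=0)   # marginal density of y
    return kl_divergence(joint.ravel(), np.outer(fx, fy).ravel())


independent = np.outer([0.2, 0.8], [0.5, 0.3, 0.2])   # f(x, y) = f(x) f(y)
dependent = np.array([[0.5, 0.0],
                      [0.0, 0.5]])                    # y = x
```

For continuous signals such as images or EEG channels, these densities would first have to be estimated, which is one reason practical ICA algorithms use approximations of the mutual information.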

In the next paragraph, we mention some of the possible optimization algorithms one could use for this problem.

Optimization algorithm: One should realize that a contrast function such as the mutual information is a highly nonlinear function with many local minima. Optimizing such a function is a difficult task; the optimization algorithm must be chosen carefully, and there is no guarantee of finding the global minimum. Algorithmically, minimizing a contrast function can be done with several methods; the first and most intuitive one is the gradient descent method. To find a minimum, one starts at an arbitrary starting point and descends along the gradient of the contrast function until a minimum is reached. An acceleration can be achieved by using the so-called conjugate gradient method, which is especially advisable when the minimum lies in a narrow valley. Unfortunately, these methods do not necessarily lead to the global minimum, since once the algorithm has reached a (local) minimum, it is stuck in this valley and cannot "jump" over the surrounding hills. A solution can be the combination of gradient descent methods with other approaches such as simulated annealing or genetic algorithms.

### 3.4 A visualized example of unmixing images

To give a better impression of how the ICA algorithm works, we demonstrate it by visualizing the demixing process of three mixed black-and-white images. The original images shown in figure 8 consist of 250 × 250 = 62500 pixels with a resolution of 256 gray values. The first image is a photo of an Airbus A300 Zero-G, the second is an artificial image with the black text "ICA do it!", and the last image contains uniform random noise with gray values between 0 and 255. Let us denote these images by $s$, a matrix with 3 rows (each row corresponding to one black-and-white image) and 62500 columns. We have mixed these images with a fixed $3 \times 3$ mixing matrix $A$, so that the mixed signals are computed by $x = As$. Each row of $x$ represents one mixed image; all three mixed images are shown in figure 9.

Figure 8: The original sources/images consist of 250 × 250 = 62500 pixels with a resolution of 256 gray values.
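The bookkeeping of this example, flattening each 250 × 250 image into one row of $s$, mixing with $x = As$, and reshaping the rows of $x$ back into images, can be sketched as follows. The random placeholder images and the mixing matrix `A` are hypothetical, not the images or the matrix used for figure 9; the rows of `A` are chosen to sum to one so that the mixed gray values stay within 0 to 255.

```python
import numpy as np

rng = np.random.default_rng(5)
height, width = 250, 250

# Three hypothetical 250 x 250 gray-value images (random placeholders)
images = rng.integers(0, 256, size=(3, height, width)).astype(float)

# s: one flattened image per row -> shape (3, 62500)
s = images.reshape(3, -1)

# Hypothetical 3 x 3 mixing matrix (NOT the matrix used in the paper);
# each row sums to one, so mixed gray values remain in [0, 255]
A = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

x = A @ s                                    # each row of x is one mixed image
mixed_images = x.reshape(3, height, width)   # back to image shape for display

# Sanity check of the model: with A known, the sources are recovered exactly
s_rec = np.linalg.solve(A, x)
```

An ICA algorithm receives only `x` (or `mixed_images`) and has to estimate both the unmixing and the sources without access to `A`.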

Figure 9: Resulting signals/images after mixing the images from figure 8 with the mixing matrix $A$ as described in the text.

When these mixed images are presented to an ICA algorithm, the algorithm tries to unmix them by using only the property of independence of the original images/sources. All algorithms work iteratively, so we have the possibility to show the development of the unmixing process, as we have done in figure 10. In every step a new estimated mixing matrix $\tilde{A}$ is calculated, and by using the mutual information $M(y)$ as a measure of independence, where $y = \tilde{A}^{-1} x$ are the unmixed signals, we can optimize the mixing matrix. The mixed images were presented to the FastICA algorithm. It is obvious that the algorithm converges to the original images, except for scaling and permutation.

Figure 10: Visualization of the unmixing process of the FastICA algorithm. Every set of images corresponds to one iteration step. Note that the third iteration seems to be a better solution than the last; details of this phenomenon are given in the text.
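To make the iterative unmixing more concrete, here is a minimal sketch of a FastICA-style symmetric fixed-point iteration, assuming centered and whitened data and the cubic nonlinearity $g(u) = u^3$ (a kurtosis-based contrast). It is not claimed to be the exact variant or the settings used for figure 10; the two sources, the mixing matrix, the seed and the iteration limits are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000
t = np.arange(n)

# Two hypothetical independent, non-Gaussian (sub-Gaussian) sources
s = np.vstack([np.sin(2 * np.pi * t / 200.0),
               rng.uniform(-1.0, 1.0, n)])
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])          # hypothetical mixing matrix
x = A @ s

# Centering and whitening: z = V x has (approximately) identity covariance
x = x - x.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(x))
V = E @ np.diag(d ** -0.5) @ E.T
z = V @ x


def sym_decorrelate(W):
    """Symmetric orthonormalization W <- (W W^T)^(-1/2) W."""
    d, E = np.linalg.eigh(W @ W.T)
    return E @ np.diag(d ** -0.5) @ E.T @ W


W = sym_decorrelate(rng.standard_normal((2, 2)))
for _ in range(200):
    u = W @ z                                    # current component estimates
    # fixed-point step for g(u) = u^3: E{z g(w^T z)} - E{g'(w^T z)} w
    W_new = (u ** 3) @ z.T / n - 3.0 * np.mean(u ** 2, axis=1)[:, None] * W
    W_new = sym_decorrelate(W_new)
    # converged when every row keeps its direction (up to sign)
    done = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0)) < 1e-10
    W = W_new
    if done:
        break

y = W @ z    # estimated sources, up to permutation and sign/scale
```

Comparing `y` with `s` through correlation coefficients shows each estimated component matching one source, up to sign and order, which is the indeterminacy of the ICA model discussed above.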

it shows.6 0. but the expected result of having unmixed the original images can’t be fulﬁlled.1 0. one would have to take a natural image and not a photo of ”man-made” object. the unmixed images are very close to the original ones. s) =  −0. Lang. where speech recognition is an interesting application. A sketch of an analysis of a EEG from a patient without any abnormal neural behavior we will be shown in the following. When using images.2 0. one assumes to have independent sources.0700 −0. 4. In the following. where the aim is to detect abnormal neural behavior of the brain. using this new nonlinear data analysis method. estimated mixing matrix A     0.0021 0.2000 0.0021 1. than the ﬁnal unmixed ones. But when the correlation matrix is calculated. but also in image processing of for example satellite data. while a patient is working on a predeﬁned problem. Brawanski at the University Hospital in Regensburg and in cooperation with the group of Prof. where electrical potentials are measured with electrodes on the surface of the head. Especially the Airbus image has a small fraction of the text image (”ICA do it!”) overlayed.0000 0.0000 −0.5953 0.5 −0.0049  (16) correlation(s. As fundamental restriction for the identiﬁability of the ICA method.0981  .0700 −0. Not only in audio processing. which give a correlation with the text image.0049 1.2313 0.One can clearly see the unmixing of the images towards the three original images. A :=  0.3  . that the images from the third iteration seem to be closer to the original images. where active regions in the brain are detected. Electro-Encephalography (EEG) is a method. Acknowledgements go to Dr. The ICA-algorithm used here correctly converges to a solution. or EEG (Electro-Encephalography). where detailed experience with analysis of EEG data with ICA methods has been gathered in the last years.3923 0. A visual inspection of the images in ﬁgure 8 suggest to have independent signals/images.3 0. 
Some typical examples are fMRI (functional Magnetic Resonance Imaging). that the oﬀ-diagonal elements are not zero   1.8004 −0.2648 0.8 The phenomena of not perfectly unmixing is easily explained and points out nicely the problems with the independent component analysis.0000 and therefore they are not independent! The restriction of having independent signals as sources is a very strict one and can’t always be assumed. who was the contact person for the EEG data at the University Hospital. since these images have more often horizontal and vertical edges. The result of the ICA-Algorithm is the following ˜ – for comparison the original mixing matrix is also given. which can’t be reconstructed by the algorithm. But one major application of ICA is the analysis of bio-medical data. since having knowledge about one image won’t give you any information about the other two images.4997 0. Often one expects that the observed data from biological systems is a superposition of some underlying unknown sources. originated from a possible unknown brain tumor. Except of scaling (sign) and permutation.2997 −0.2 ˜ :=  −0. Therefore one should always keep in mind the fundamental requirements of the ICA method and realize what one expects from such an method! 4 Applications of ICA In many signal processing areas. ICA helps to process high dimensional multivariate data. Schulmeyer. we will give an example of an EEG analysis and how it can improve the analysis by the physicians.1 Electro-Encephalography (EEG) In the Graduiertenkolleg ”Nonlinearity and Nonequilibrium in condensed matter” we are working together with the group of Prof. where the unmixed signals are independent and their correlation matrix is an identity matrix. Exactly this separation of independent sources is possible with ICA.4 −0. this method can help to explore the data. But one can also notice. A (15) 0. Goal is to get a better understanding of the processes in the human 11 .

The EEG-channels are labelled by a shortcut which correspond to a given electrode on the head of the patient. to diagnose brain disease or monitor the depth of anaesthesia. Obviously one can see the mixture of many diﬀerent signals: the alpha-waves of the brain having a main frequency of 8Hz is nearly visible in all channels. in contrast the artefact of an eye-blink is only present in channel ”Fp2” – this electrode is placed close to the right eye. One main problem is the superposition of the signals of the brain it self and the artifacts like eye-blinks. if one assumes. These signals can now be separated by the independent component analysis. we have plotted in ﬁgure 11 the signals from every electrode over a time period of 7 seconds. mathematical spoken their probability densities factorize in to their marginal 12 . Before any further analysis is possible. further the alpha-waves of the brain (8Hz) are nearly visible in all the channels. Fp1 Fp2 F3 F4 C3 C4 P3 P4 EEG-Channel O1 O2 F7 F8 T3 T4 T5 T6 Fz Cz Pz A1 A2 61 62 63 64 65 Time [s] 66 67 68 Figure 11: An electro-encephalography (EEG) measurement of a patient recorded at the University Hospital Regensburg. One clear artefact (an eye-blink) is visible in the ”Fp2” channel.brain.e. i. that the signals we are looking for are independent. one has to extract and separate these signals by a data analysis tool. To give a better impression of the signals measured during an EEG. The plotted lines show the evolution of the electric potentials at each electrode over the time. head movements or the heartbeat.

In figure 12 the independent components, i.e. the independent signals, are plotted for the same time period as in figure 11. Note that the independent components are not sorted by any criterion, since ICA determines the components only up to their order and scale; this is in contrast to PCA, where the components are sorted by their variances. As described in the caption, some signals can easily be identified by their characteristic waveform.

Figure 12: The independent components (ICs) of an ICA of the EEG from figure 11. [Plot of ICs #1 to #21 against time [s], 61 to 68 s.] One can clearly identify the eye-blink in IC #1; the heartbeat (ECG) is separated into IC #5, and a main alpha wave is visible in IC #9. Furthermore, IC #19 could be identified as noise due to fast muscle movement, and IC #20 seems to be the pulse from the heartbeat. All ICs can be localized using the recovered mixing matrix (see figure 13).

The waveform is not the only source of information about the extracted signals: the mixing matrix also contains information about the origin and location of the signals. Each column of the mixing matrix describes how the corresponding source signal was distributed onto the electrodes. This can be projected onto the head as a density plot, as shown in figure 13. Now we can see that independent component #8 has no physical meaning, since its distribution over the electrodes is just random and no particular area can be distinguished. These density plots can therefore further help the physicians to characterize and identify the signals.

Figure 13: Using the information from the recovered mixing matrix, the origin of the independent components (ICs) can be localized and plotted as a density plot. [Panels: mixing-vectors #1, #5, #9, #8, #19 and #20, each drawn over the 21 electrode positions Fp1(1) to A2(21), numbered in the channel order of figure 11, with a colour scale from − through 0 to +.] Mixing-vector #1 corresponds to an eye-blink, #5 to the heartbeat (ECG) recorded mostly at the ears, #9 to the ground activity of the brain, #19 to muscle noise near the left temple and #20 to the heartbeat pulse near the right temple. Note that mixing-vector #8 seems to be an artifact of the ICA algorithm; its density plot gives no physical interpretation.
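Two statements above, that each column of the mixing matrix describes how one source is spread over the electrodes and that the independent components come out unsorted, can both be checked on a toy linear mixture x(t) = A s(t). The sketch below uses a hypothetical 3-electrode, 2-source matrix `A` and a helper `mix` introduced only for illustration; in a real analysis A is not known but recovered by ICA (for a square problem, as the inverse of the estimated unmixing matrix).

```python
def mix(A, sources):
    """Linear mixing x(t) = A s(t): electrode e records sum_j A[e][j] * s_j(t)."""
    n_samples = len(sources[0])
    return [[sum(A[e][j] * sources[j][t] for j in range(len(sources)))
             for t in range(n_samples)]
            for e in range(len(A))]


# Hypothetical mixing matrix: rows are electrodes, columns are sources.
A = [[0.9, 0.1],
     [0.2, 0.7],
     [0.0, 0.5]]

# Source 0 emits a single unit pulse at t = 1 while source 1 is silent:
# the electrodes then read exactly column 0 of A, i.e. its spatial "map".
pulse = [[0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]
x = mix(A, pulse)
column0 = [row[0] for row in A]
print([x[e][1] for e in range(3)], column0)

# Swapping the sources together with the columns of A leaves the recording
# unchanged, so nothing in the data says which component is "first".
s = [[1.0, -2.0, 0.5], [0.3, 0.3, -1.0]]
A_swapped = [[row[1], row[0]] for row in A]
print(mix(A, s) == mix(A_swapped, [s[1], s[0]]))
```

The same argument with a scaled source and an inversely scaled column shows why the amplitude (and sign) of each component is also undetermined, which is why the maps are drawn on a relative colour scale.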

4.2 Neuro-monitoring data

Another biomedical application is the analysis of neuro-monitoring data from patients in the intensive-care unit. Dr. Rupert Faltermeier in the group of Prof. Brawanski provides the data from patients treated in the intensive-care unit of the neurosurgery department. The goal is to identify brain diseases through a better understanding of which parameters and processes trigger the system and what influence they have on the brain. Furthermore, it would be of great interest to understand which measurement categories have the highest information content and how the measured signals are coupled. This seemed to be a good application for independent component analysis. However, first analyses have shown a high non-stationarity of the data, which makes the analysis with independent component analysis very difficult, since stationarity of the data is one basic requirement. Furthermore, the time series are too short for nonlinear time series analysis. Therefore one should rather develop a model of the brain, fit its parameters and analyze the behavior of such a system.
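Because ICA assumes that the statistics of the sources and the mixing stay constant over the recording, a quick first look at neuro-monitoring data is to compare the signal variance in consecutive windows. The helper `variance_ratio` below is hypothetical and deliberately crude: it is not a formal stationarity test and not the analysis performed in the paper, and both series are synthetic.

```python
import random
import statistics


def window_variances(x, n_windows):
    """Split a series into equal consecutive windows and return each window's variance."""
    size = len(x) // n_windows
    return [statistics.pvariance(x[k * size:(k + 1) * size]) for k in range(n_windows)]


def variance_ratio(x, n_windows=4):
    """Crude non-stationarity indicator: largest over smallest window variance.
    Close to 1 for a stationary series, much larger if the variance drifts."""
    v = window_variances(x, n_windows)
    return max(v) / min(v)


rng = random.Random(0)
n = 4000
stationary = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Assumed stand-in for a slowly changing patient state: amplitude drifts from 1 to 4.
drifting = [(1.0 + 3.0 * t / n) * rng.gauss(0.0, 1.0) for t in range(n)]

print(round(variance_ratio(stationary), 2), round(variance_ratio(drifting), 2))
```

With four windows the drifting series should give a ratio of roughly 7 (mean squared amplitude of about 13 in the last window against about 2 in the first), while the stationary one stays near 1.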
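
5 Summary and Outlook

Independent component analysis is a new data analysis tool for processing multivariate data sets. In contrast to classical (linear) transformations, where only second-order statistics are used, the ICA method tries to find a representation in which the transformed components have minimal statistical dependence. Such a representation seems to capture the essential structure of the data in many applications. ICA extracts the main features of interest for the analysis, especially for real-world data, as shown for example for the electroencephalography (EEG). With classical (linear) methods it was not possible to analyze these signals in such a clear way. This method can therefore enhance the analysis by the physicians and give better insight into the mechanisms and structure of the brain.

Still, it is not yet possible to rank the independent components by a confidence level or by their reliability or meaningfulness. The question of how reliable the resulting components are is of great interest, and not only for the physicians. Furthermore, the meaningfulness of the independent components and the "real" number of independent sources are of interest in real-world problems. Until now, time-structure information has not been taken into account in the classical ICA algorithms. My future research will focus on the development of new ICA methods that bring time-series analysis and dynamical aspects into multivariate data analysis tools. Our aim is to enhance the separation quality of the independent components, not only for purely theoretical applications but also for real-world problems like biomedical data analysis.

Especially in biomedical applications, where little is known about the origin of the signals, the (non-)linear and non-stationary nature of the signals from the brain makes nonlinear methods necessary. To treat the non-stationary data from neuro-monitoring, one probably has to develop a model for biomedical systems and fit its parameters, since ICA is not able to treat these data. A main advantage of having a model system is the possibility of studying the system behavior (chaotic behavior from nonlinear differential equations?) and possibly of using prediction methods to forecast the signals. For this purpose we work together with Andreas Kaiser and Thomas Schreiber in the group of Holger Kantz (time series analysis) at the Max Planck Institute for the Physics of Complex Systems (MPI-PKS) in Dresden. In this way, this (non-)linear data analysis method contributes to the Graduiertenkolleg "Nonlinearity and Nonequilibrium in condensed matter".

Our main cooperations within the Graduiertenkolleg are the contacts to the university hospital, especially with Dr. Schulmeyer in the group of Prof. Obermaier and Dr. Rupert Faltermeier (former member of the group of Prof. Brawanski). On the mathematical aspects of ICA we work together with Fabian Theis and Stefan Bechtluft-Sachs (Mathematical Institute). On the application of ICA to fMRI data we work together with Tobias Westenhuber in the group of Prof. Lang.

A totally different application area of the ICA method is space science research. Multi-satellite missions produce data from experiments on many satellites, which have to be analyzed with data analysis tools. The Cluster mission of the European Space Agency (ESA) is a project with four autonomous satellites measuring the variation of the magnetic field around the Earth. Joachim Vogt at the International University in Bremen is working on the data analysis, and we are looking forward to using ICA for these problems. Another experiment is the gravitational wave detector GEO600 in Hannover, where many different channels from seismic instruments are recorded and analyzed for their influence on the experiment itself. The data analysis is mainly done by the Max Planck Institute for Gravitational Physics (Albert Einstein Institute) in Potsdam. It would also be interesting to know how well ICA could help to separate distortions caused by seismic events from real gravitational waves. One can see how versatile this new data analysis method is.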
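The contrast between second-order and higher-order statistics can be illustrated with a minimal two-channel ICA. Whitening, the second-order step shared with PCA, makes the mixtures uncorrelated with unit variance but leaves an arbitrary rotation undetermined; the sketch then chooses that rotation by maximising the sum of absolute excess kurtoses, a fourth-order statistic. This grid search is a simple illustrative contrast function of our own choosing, not FastICA [Hyvärinen and Oja, 1997] and not the geometric algorithms of [Jung et al., 2001] or [Theis et al., 2001]; the sources (an 8 Hz sine and Laplacian noise), the sampling rate and the mixing matrix are assumptions for the demo.

```python
import math
import random


def mean(v):
    return sum(v) / len(v)


def whiten(x1, x2):
    """Second-order step (as in PCA): centre the data and give it identity covariance."""
    m1, m2 = mean(x1), mean(x2)
    a = [u - m1 for u in x1]
    b = [u - m2 for u in x2]
    n = len(a)
    c11 = sum(u * u for u in a) / n
    c22 = sum(u * u for u in b) / n
    c12 = sum(u * v for u, v in zip(a, b)) / n
    # Rotation angle that diagonalizes the symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * c12, c11 - c22)
    c, s = math.cos(theta), math.sin(theta)
    lam1 = c * c * c11 + 2 * c * s * c12 + s * s * c22  # eigenvalues
    lam2 = s * s * c11 - 2 * c * s * c12 + c * c * c22
    z1 = [(c * u + s * v) / math.sqrt(lam1) for u, v in zip(a, b)]
    z2 = [(-s * u + c * v) / math.sqrt(lam2) for u, v in zip(a, b)]
    return z1, z2


def rotate(z1, z2, phi):
    c, s = math.cos(phi), math.sin(phi)
    return ([c * u + s * v for u, v in zip(z1, z2)],
            [-s * u + c * v for u, v in zip(z1, z2)])


def excess_kurtosis(v):
    """Fourth-order statistic; this formula assumes zero-mean, unit-variance data."""
    return mean([u ** 4 for u in v]) - 3.0


def ica_2d(x1, x2, steps=90):
    """Whiten, then pick the rotation in [0, 90) degrees maximizing |kurtosis|."""
    z1, z2 = whiten(x1, x2)
    best_phi, best_score = 0.0, -1.0
    for k in range(steps):
        phi = 0.5 * math.pi * k / steps
        y1, y2 = rotate(z1, z2, phi)
        score = abs(excess_kurtosis(y1)) + abs(excess_kurtosis(y2))
        if score > best_score:
            best_phi, best_score = phi, score
    return rotate(z1, z2, best_phi)


def corr(u, v):
    mu, mv = mean(u), mean(v)
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return num / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))


# Hypothetical demo: an "alpha-like" 8 Hz sine and spiky Laplacian noise,
# mixed onto two "electrodes" by an assumed mixing matrix [[1.0, 0.6], [0.4, 1.0]].
rng = random.Random(1)
n, fs = 4000, 250.0
s1 = [math.sin(2 * math.pi * 8.0 * i / fs) for i in range(n)]
s2 = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]
x1 = [1.0 * a + 0.6 * b for a, b in zip(s1, s2)]
x2 = [0.4 * a + 1.0 * b for a, b in zip(s1, s2)]

y1, y2 = ica_2d(x1, x2)
# ICA recovers sources only up to order and sign: compare absolute correlations.
straight = min(abs(corr(y1, s1)), abs(corr(y2, s2)))
swapped = min(abs(corr(y1, s2)), abs(corr(y2, s1)))
print(round(max(straight, swapped), 3))
```

For independent unit-variance sources with excess kurtoses k1 and k2, a rotation by an angle φ gives a component with kurtosis cos⁴φ·k1 + sin⁴φ·k2, so in the large-sample limit the criterion is maximal exactly when the rotation undoes the mixing. A 1-degree grid is enough for two channels; with many channels, as in the 21-channel EEG, a fixed-point or gradient method is needed instead of a grid.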

References

[Bell and Sejnowski, 1995] Bell, A. J. and Sejnowski, T. J. (1995). An information-maximisation approach to blind separation and blind deconvolution. Neural Computation, 7:1129–1159.
[C.G.Puntonet et al., 1995] Puntonet, C. G., Prieto, A., Jutten, C., Rodriguez-Alvarez, M., and Ortega, J. (1995). Separation of sources: a geometry-based procedure for reconstruction of n-valued signals. Signal Processing, 46:267–284.
[Comon, 1994] Comon, P. (1994). Independent component analysis, a new concept? Signal Processing, 36:287–314.
[Hyvärinen, 1999a] Hyvärinen, A. (1999a). Fast and robust fixed-point algorithms for independent component analysis. IEEE Transactions on Neural Networks, 10(3):626–634.
[Hyvärinen, 1999b] Hyvärinen, A. (1999b). Survey on independent component analysis. Neural Computing Surveys, 2:94–128.
[Hyvärinen et al., 2001] Hyvärinen, A., Karhunen, J., and Oja, E. (2001). Independent Component Analysis. John Wiley & Sons.
[Hyvärinen and Oja, 1997] Hyvärinen, A. and Oja, E. (1997). A fast fixed-point algorithm for independent component analysis. Neural Computation, 9:1483–1492.
[Jung et al., 2001] Jung, A., Theis, F. J., Puntonet, C. G., and Lang, E. W. (2001). FastGeo - a histogram based approach to linear geometric ICA. In ICA 2001 Proceedings, San Diego.
[Lee et al., 1999] Lee, T.-W., Girolami, M., and Sejnowski, T. J. (1999). Independent component analysis using an extended infomax algorithm for mixed sub-gaussian and super-gaussian sources. Neural Computation, 11:417–441.
[Papoulis, 1991] Papoulis, A. (1991). Probability, Random Variables, and Stochastic Processes. McGraw-Hill, Inc., 3rd edition.
[Puntonet and Prieto, 1998] Puntonet, C. G. and Prieto, A. (1998). Neural net approach for blind separation of sources based on geometric properties. Neurocomputing, 18:141–164.
[Theis et al., 2001] Theis, F. J., Jung, A., and Lang, E. W. (2001). A theoretic model for linear geometric ICA. In ICA 2001 Proceedings, San Diego.