SPE 113233-PP An ICA Approach to Purify Components of Spatial Components of Seismic Recordings

Arash Moaddel Haghighi, SPE, Petroleum University of Technology; and Iman Moaddel Haghighi, SPE, University of Tehran, Physics Department

Copyright 2008, Society of Petroleum Engineers This paper was prepared for presentation at the 2008 SPE Annual Technical Conference and Exhibition held in Denver, Colorado, USA, 21–24 September 2008. This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.

Abstract

Decomposing linear mixtures or superpositions into their components is a problem that occurs in many branches of science, such as telecommunications, seismology, and biomedical signal analysis. Blind source separation (BSS) deals with the case where neither the sources nor the mixing matrix (or mixing process) is known; the only available data are the mixed signals. The standard approach to BSS is independent component analysis (ICA), a statistical technique that represents a multidimensional random vector as a linear combination of nongaussian random variables (the "independent components") that are as independent as possible. In the 3D seismic surveys used in exploration, the time series recorded in each spatial dimension are taken to be independent in nature and behavior, which is a direct result of the physical response of the materials through which the seismic waves travel. In practice, however, one dimension is sometimes contaminated by as much as one fifth with information from another dimension, which degrades the signal-to-noise ratio (SNR). Here we apply the FastICA algorithm to a sample 3D seismic record to extract the least dependent recordings for all three spatial dimensions. To test the reliability of the decomposition, we use the mutual information (MI) transfer between signals to confirm that the outputs of the FastICA algorithm are the least dependent components, which should lead to more accurate interpretation of petrophysical parameters.

Introduction

A long-standing problem in statistics and related areas is how to find a suitable representation of multivariate data. Representation here means that we somehow transform the data so that its essential structure is made more visible or accessible. This problem is more critical when we aim to extract information from an observed set of data. An example of such a data configuration is a recorded seismic signal, where we hope to extract geological features from the responses of different layers to an impulsive shock. The key point in interpreting such digital signals is to keep the noise level and/or the component overlap as low as possible. The latter is better understood if we keep in mind that representing multivariate data sets may introduce some uncertainty about the independence of the components, depending on the approach used. A good representation is also a central goal of many techniques in data mining and exploratory data analysis. In signal processing, the same problem can be found in feature extraction, and also in the source separation problem that will be considered below.

Let us assume that the data consist of a number of variables that we have observed together. Let us denote the number of variables by m and the number of observations by T. We can then denote the data by $x_i(t)$, where the indices take the values i = 1, …, m and t = 1, …, T. The dimensions m and T can be very large. A very general formulation of the problem can be stated as follows: what function from an m-dimensional space to an n-dimensional space would make the transformed variables reveal information that is otherwise hidden in the large data set? That is, the transformed variables should be the underlying factors or components that describe the essential structure of the data. It is hoped that these components correspond to some physical causes that were involved in the process that generated the data in the first place. In most cases, we consider linear functions only, because then the interpretation of the representation is simpler, and so is its computation. Thus, every component, say $y_i(t)$, is expressed as a linear combination of the observed variables:

$$y_i(t) = \sum_{j=1}^{m} w_{ij}\, x_j(t), \qquad i = 1, \dots, n \tag{1}$$
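As a quick numerical illustration of Eq. 1, the sketch below evaluates the sum term by term for a small made-up data set and checks that a single matrix product gives the same numbers. The sizes, the values of `X` and `W`, and the helper name `transform_by_sum` are assumptions chosen only for illustration; they do not come from the seismic data discussed in this paper.

```python
import numpy as np

# Hypothetical sizes: m = 3 observed variables, T = 5 observations, n = 2 components.
X = np.array([[1.0, 2.0, 3.0, 4.0, 5.0],
              [0.0, 1.0, 0.0, 1.0, 0.0],
              [2.0, 2.0, 2.0, 2.0, 2.0]])   # shape (m, T); row j holds x_j(t)
W = np.array([[1.0, 0.0, -0.5],
              [0.5, 2.0, 0.0]])             # shape (n, m); entry (i, j) is w_ij


def transform_by_sum(W, X):
    """Evaluate y_i(t) = sum_j w_ij * x_j(t) literally, one term at a time."""
    n, m = W.shape
    T = X.shape[1]
    Y = np.zeros((n, T))
    for i in range(n):
        for j in range(m):
            Y[i] += W[i, j] * X[j]
    return Y


Y_sum = transform_by_sum(W, X)   # Eq. 1 written out as a double loop
Y_mat = W @ X                    # the same transformation as one matrix product
```

The double loop mirrors the indices of Eq. 1 directly, while the matrix product is the compact form that is more convenient in practice.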


Here the $w_{ij}$ are coefficients that define the representation. The problem can then be rephrased as the problem of determining the coefficients $w_{ij}$. Using linear algebra, we can express the linear transformation in Eq. 1 as a matrix multiplication. Collecting the coefficients $w_{ij}$ in a matrix W, the equation becomes:

$$\begin{pmatrix} y_1(t) \\ y_2(t) \\ \vdots \\ y_n(t) \end{pmatrix} = \mathbf{W} \begin{pmatrix} x_1(t) \\ x_2(t) \\ \vdots \\ x_m(t) \end{pmatrix} \tag{2}$$

A basic statistical approach consists of considering the $x_i(t)$ as a set of T realizations of m random variables. Thus each set $x_i(t)$, t = 1, …, T, is a sample of one random variable; let us denote the random variable by $x_i$. In this framework, we could determine the matrix W from the statistical properties of the transformed components $y_i(t)$.

Returning to our discussion of data overlap, we sometimes observe that one component of a multivariate data set is contaminated by another component. This can happen especially when we are uncertain about (1) the instruments and (2) the source of the signals. This is exactly the situation we usually encounter in seismic recordings during a seismic operation. When interpreting a seismic recording to identify probable reservoir features, it is possible to find one spatial component contaminated by another. This is due to the nature of the procedure: an instrument is not accurate enough to pass just one component, so information from other components may overlap with and damage the desired component. The result may look like a new kind of noise, but it is an information-bearing signal that should be handled properly rather than removed with common denoising techniques. As mentioned above, if the overlapping signals are treated as classical noise, we may lose the valuable information they carry. These intruders should therefore be extracted so that the information they carry can be used. Mathematically speaking, we have to make the recorded signals as independent as possible, and the resulting least dependent signals can then be interpreted as if they had never been mixed together. This technique belongs to a group of mathematical algorithms called independent component analysis, which from now on we refer to as ICA.

ICA algorithms try to extract the least dependent components from a mixture of components while we know nothing about the mixing process or the original components. This is done using the statistical properties of independent signals. In other words, our mathematical trick is simply to make the properties of the observed signals as close as possible to those of independent signals, so that we can handle them as independent signals. This is done using geometrical transformations; numerous authors, e.g. Hyvarinen (1999), Stogbauer (2004), and Kraskov (2006), have reported different approaches to this problem.

ICA principles and mutual information concept

ICA belongs to a class of blind source separation (BSS) methods for separating data into underlying informational components, where such data can take the form of images, sounds, telecommunication channels, or stock market prices. The term "blind" is intended to imply that such methods can separate data into source signals even if very little is known about the nature of those source signals.

ICA in a nutshell. If two people speak at the same time in a room containing two microphones, then the output of each microphone is a mixture of two voice signals. Given these two signal mixtures, ICA can recover the two original voices, or source signals. This example uses speech, but ICA can extract source signals from any set of two or more measured signal mixtures, where each signal mixture is assumed to consist of a mixture of source signals.

As an example, imagine there are two people speaking at the same time in a room containing two microphones. If each voice signal is examined at a fine time scale, it becomes apparent that the amplitude of one voice at any given point in time is unrelated to the amplitude of the other voice at that time. The reason that the amplitudes of the two voices are unrelated is that they are generated by two unrelated physical processes (i.e., by two different people).

If we know that the voices are unrelated, then one key strategy for separating voice mixtures into their constituent voice components is to look for unrelated time-varying signals within these mixtures. Using this strategy, the extracted signals are unrelated, just as the voices are unrelated, and it follows that the extracted signals are the voices. So, simply knowing that each voice is unrelated to the others suggests a strategy for separating individual voices from mixtures of voices. This apparently mundane observation is a necessary prerequisite for understanding how ICA exploits the fact that two signals from different physical sources, such as voices, are independent.


This implies that, if the two different source voice signals shown in the top panels of Fig 1 are examined at a fine time scale, then the amplitude of one voice (Fig 1, top left) at any given time provides no information regarding the amplitude of the other voice (top right) at that time. This can be confirmed graphically by plotting the amplitude of one voice at each time point against the corresponding amplitude of the other voice (bottom panel). The resultant distribution of points does not indicate any obvious pattern, suggesting that the two voice signals are independent.

Fig 1 Source signals distribution, Stone (2004)

While it is true that two voice signals are unrelated, this informal notion can be captured in terms of statistical independence. If two or more signals are statistically independent of each other, then the value of one signal provides no information regarding the value of the other signals. Before considering how ICA works, we need to introduce some terminology. As its name suggests, independent component analysis separates a set of signal mixtures into a corresponding set of statistically independent component signals, or source signals. The mixtures can be sounds, electrical signals (e.g., electroencephalographic (EEG) signals), or images (e.g., faces, fMRI data).

The defining feature of the extracted signals is that each extracted signal is statistically independent of all the other extracted signals. To define ICA rigorously (Jones et al. 1987; Jutten et al. 1991), we can use a statistical "latent variables" model. Assume that we observe n linear mixtures $x_1, \dots, x_n$ of n independent components:

$$x_j = a_{j1} s_1 + a_{j2} s_2 + \dots + a_{jn} s_n, \qquad \text{for all } j \tag{3}$$

We have now dropped the time index t; in the ICA model, we assume that each mixture $x_j$, as well as each independent component $s_k$, is a random variable instead of a proper time signal. The observed values $x_j(t)$, e.g., the microphone signals in the cocktail-party problem, are then a sample of this random variable. Without loss of generality, we can assume that both the mixture variables and the independent components have zero mean. If this is not true, then the observable variables $x_i$ can always be centered by subtracting the sample mean, which makes the model zero-mean.

It is convenient to use vector-matrix notation instead of sums like the one in the previous equation. Let us denote by X the random vector whose elements are the mixtures $x_1, \dots, x_n$, and likewise by S the random vector with elements $s_1, \dots, s_n$. Let us denote by A the matrix with elements $a_{ij}$. Generally, bold lower-case letters indicate vectors and bold upper-case letters denote matrices. All vectors are understood as column vectors; thus $\mathbf{X}^T$, the transpose of X, is a row vector. Using this vector-matrix notation, the above mixing model is written as:

$$\mathbf{X} = \mathbf{A}\mathbf{S} \tag{4}$$

Sometimes we need the columns of matrix A; denoting them by $\mathbf{a}_i$, the model can also be written as:

$$\mathbf{X} = \sum_{i=1}^{n} \mathbf{a}_i s_i \tag{5}$$

The statistical model in Eq. 5 is called the independent component analysis, or ICA, model. The ICA model is a generative model, which means that it describes how the observed data are generated by a process of mixing the components $s_i$. The independent components are latent variables, meaning that they cannot be directly observed. The mixing matrix is also assumed to be unknown. All we observe is the random vector X, and we must estimate both A and S from it. This must be done under assumptions that are as general as possible. The starting point for ICA is the very simple assumption that the components are statistically independent. It will be seen below that we must also assume that the independent components have nongaussian distributions. However, in the basic model we do not assume that these distributions are known (if they are known, the problem is considerably simplified). Then, after estimating the matrix A, we can compute its inverse, say W, and obtain the independent components simply by:

$$\mathbf{S} = \mathbf{W}\mathbf{X} \tag{6}$$

ICA is very closely related to the method called blind source separation (BSS) or blind signal separation. A "source" here means an original signal, i.e. an independent component, like a speaker in the cocktail-party problem. "Blind" means that we know very little, if anything, about the mixing matrix, and make few assumptions about the source signals. ICA is one method, perhaps the most widely used, for performing blind source separation.

To define the concept of independence, consider two scalar-valued random variables $y_1$ and $y_2$. Basically, the variables $y_1$ and $y_2$ are said to be independent if information on the value of $y_1$ does not give any information on the value of $y_2$, and vice versa. Above, we noted that this is the case with the variables $s_1$, $s_2$ but not with the mixture variables $x_1$, $x_2$. Technically, independence can be defined by the probability densities. Let us denote by $p(y_1, y_2)$ the joint probability density function (pdf) of $y_1$ and $y_2$. Let us further denote by $p_1(y_1)$ the marginal pdf of $y_1$, i.e. the pdf of $y_1$ when it is considered alone:

$$p_1(y_1) = \int p(y_1, y_2)\, dy_2 \tag{7}$$

and similarly for $y_2$. Then we define that $y_1$ and $y_2$ are independent if and only if the joint pdf is factorizable in the following way:

$$p(y_1, y_2) = p_1(y_1)\, p_2(y_2) \tag{8}$$

This definition extends naturally to any number n of random variables, in which case the joint density must be a product of n terms. The definition can be used to derive a most important property of independent random variables. Given two functions, $h_1$ and $h_2$, we always have

$$E\{h_1(y_1)\, h_2(y_2)\} = E\{h_1(y_1)\}\, E\{h_2(y_2)\} \tag{9}$$

This can be proven as follows:

$$\begin{aligned}
E\{h_1(y_1)\, h_2(y_2)\} &= \iint h_1(y_1)\, h_2(y_2)\, p(y_1, y_2)\, dy_1\, dy_2 \\
&= \iint h_1(y_1)\, p_1(y_1)\, h_2(y_2)\, p_2(y_2)\, dy_1\, dy_2 \\
&= \int h_1(y_1)\, p_1(y_1)\, dy_1 \int h_2(y_2)\, p_2(y_2)\, dy_2 \\
&= E\{h_1(y_1)\}\, E\{h_2(y_2)\}
\end{aligned} \tag{10}$$

Intuitively speaking, the key to estimating the ICA model is nongaussianity. In fact, without nongaussianity the estimation is not possible at all. This is probably also the main reason for the rather late resurgence of ICA research: in most of classical statistical theory, random variables are assumed to have gaussian distributions, which precludes any methods related to ICA. The Central Limit Theorem, a classical result in probability theory, tells us that the distribution of a sum of independent random variables tends toward a gaussian distribution, under certain conditions. Thus, a sum of two independent random variables usually has a distribution that is closer to gaussian than either of the two original random variables.

Let us now assume that the data vector X is distributed according to the ICA data model proposed above, i.e. it is a mixture of independent components. For simplicity, we assume that all the independent components have identical distributions. To estimate one of the independent components, we consider a linear combination of the $x_i$; let us denote this by

$$y = \mathbf{W}^T \mathbf{X} = \sum_i w_i x_i$$

where W is a vector to be determined. If W were one of the rows of the inverse of A, this linear combination would actually equal one of the independent components. The question is now: how could we use the Central Limit Theorem to determine W so that it equals one of the rows of the inverse of A? In practice, we cannot determine such a W exactly, because we have no knowledge of matrix A, but we can find an estimator that gives a good approximation.

To see how this leads to the basic principle of ICA estimation, let us make a change of variables. Defining $\mathbf{z} = \mathbf{A}^T \mathbf{W}$, we have $y = \mathbf{W}^T \mathbf{X} = \mathbf{W}^T \mathbf{A}\mathbf{S} = \mathbf{z}^T \mathbf{S}$. Thus y is a linear combination of the $s_i$, with weights given by the $z_i$. Since a sum of even two independent random variables is more gaussian than the original variables, $\mathbf{z}^T \mathbf{S}$ is more gaussian than any of the $s_i$ and becomes least gaussian when it in fact equals one of the $s_i$. In this case, obviously only one of the elements $z_i$ of z is nonzero. Therefore, we could take as W a vector that maximizes the nongaussianity of $\mathbf{W}^T \mathbf{X}$. Such a vector would necessarily correspond to a z which has only one nonzero component. This means that $\mathbf{W}^T \mathbf{X} = \mathbf{z}^T \mathbf{S}$ equals one of the independent components.

Another approach to ICA estimation, inspired by information theory, is minimization of mutual information. We explain this approach here, and show that it leads to the same principle of finding the most nongaussian directions as was described above. In particular, this approach gives a rigorous justification for the heuristic principles used above. Using the concept of differential entropy, we define the mutual information I between m (scalar) random variables $y_i$, i = 1, …, m, as follows:

$$I(y_1, y_2, \dots, y_m) = \sum_{i=1}^{m} H(y_i) - H(\mathbf{y}) \tag{11}$$

Mutual information is a natural measure of the dependence between random variables. In fact, it is equivalent to the well-known Kullback-Leibler divergence between the joint density f ( y ) and the product of its marginal densities; a very natural measure for independence. It is always non-negative, and zero if and only if the variables are statistically independent. Thus, mutual information takes into account the whole dependence structure of the variables, and not only the covariance, like PCA and related methods. Mutual information can be interpreted by using the interpretation of entropy as code length. The terms H(yi) give the lengths of codes for the yi when these are coded separately, and H(y) gives the code length when y is coded as a random vector, i.e. all the components are coded in the same code. Mutual information thus shows what code length reduction is obtained by coding the whole vector instead of the separate components. In general, better codes can be obtained by coding the whole vector. However, if the yi are independent, they give no information on each other, and one could just as well code the variables separately without increasing code length.
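To make the relation between entropy and mutual information concrete, the following sketch evaluates Eq. 11 for two variables. It is an illustration under simplifying assumptions: it uses discrete probability tables and Shannon entropy in place of the differential entropy of continuous variables, and the three example tables and the function names are made up for this purpose.

```python
import math


def entropy(probabilities):
    """Shannon entropy in nats of a discrete distribution (zero terms skipped)."""
    return -sum(p * math.log(p) for p in probabilities if p > 0.0)


def mutual_information(joint):
    """I(y1, y2) = H(y1) + H(y2) - H(y1, y2) for a joint probability table."""
    p1 = [sum(row) for row in joint]          # marginal of y1 (sum over y2)
    p2 = [sum(col) for col in zip(*joint)]    # marginal of y2 (sum over y1)
    h_joint = entropy([p for row in joint for p in row])
    return entropy(p1) + entropy(p2) - h_joint


# Hypothetical joint distributions of two binary variables.
independent = [[0.25, 0.25], [0.25, 0.25]]   # joint equals product of marginals
dependent = [[0.5, 0.0], [0.0, 0.5]]         # y2 is fully determined by y1
partly = [[0.4, 0.1], [0.1, 0.4]]            # intermediate case

mi_independent = mutual_information(independent)
mi_dependent = mutual_information(dependent)
mi_partly = mutual_information(partly)
```

For the independent table the three entropies cancel and the mutual information is zero; for the fully dependent table it equals log 2, the entropy of one variable; the intermediate table falls in between. This matches the code-length reading above: coding the variables jointly only saves length when they carry information about each other.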


An important property of mutual information (Ristaniemi 1999; Cover 1991) is that, for an invertible linear transformation $\mathbf{y} = \mathbf{W}\mathbf{X}$, we have:

$$I(y_1, y_2, \dots, y_m) = \sum_i H(y_i) - H(\mathbf{X}) - \log \lvert \det \mathbf{W} \rvert \tag{12}$$

Now, let us consider what happens if we constrain the $y_i$ to be uncorrelated and of unit variance. This means $E\{\mathbf{y}\mathbf{y}^T\} = \mathbf{W} E\{\mathbf{X}\mathbf{X}^T\} \mathbf{W}^T = \mathbf{I}$, which implies $\det \mathbf{I} = 1 = (\det \mathbf{W})(\det E\{\mathbf{X}\mathbf{X}^T\})(\det \mathbf{W}^T)$, and this implies that $\det \mathbf{W}$ must be constant. Moreover, for $y_i$ of unit variance, entropy and negentropy differ only by a constant and the sign. Thus we obtain

$$I(y_1, y_2, \dots, y_m) = C - \sum_i J(y_i) \tag{13}$$

Here C is a constant that does not depend on W, and $J(y_i)$ denotes the negentropy of $y_i$. This shows the fundamental relation between negentropy and mutual information. Since mutual information is the natural information-theoretic measure of the independence of random variables, we could use it as the criterion for finding the ICA transform. In this approach, which is an alternative to the model estimation approach, we define the ICA of a random vector X as an invertible transformation, as in Cichocki (1997), where the matrix W is determined so that the mutual information of the transformed components $s_i$ is minimized. It is now clear that finding an invertible transformation W that minimizes the mutual information is roughly equivalent to finding directions in which the negentropy is maximized. More precisely, it is roughly equivalent to finding 1-D subspaces such that the projections onto those subspaces have maximum negentropy. Rigorously speaking, it shows that ICA estimation by minimization of mutual information is equivalent to maximizing the sum of the nongaussianities of the estimates, when the estimates are constrained to be uncorrelated. This is the main idea behind ICA by mutual information.

ICA in seismic signal purification

As discussed earlier, seismic signals are typical candidates for data mixing and overlapping. In practice it is observed (Ghasem-al-Askari 2007) that a spatial component is sometimes contaminated by up to 20% by another spatial component, which leads to a considerable loss of information. The loss stems from the way these unexpected guests are usually removed: they are treated as simple noise and are therefore filtered out with the usual frequency filters. But, as we now understand, these are signals carrying valuable information that are embedded within the desired signal and make it resemble a noisy signal.

What we have to do to recover this information is to perform an ICA on the collection of observed signals to extract the embedded signals.
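The kind of separation meant here can be sketched on synthetic data. The code below is not the FastICA Matlab package used for the results in this paper; it is a minimal NumPy sketch of a symmetric fixed-point ICA iteration with a tanh nonlinearity, applied to three made-up nongaussian sources that are mixed with about 20% cross-contamination. The sources, the mixing matrix `A`, the sample size, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def whiten(X):
    """Center the rows of X and rescale so the sample covariance is the identity."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    V = eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T
    return V @ Xc


def sym_decorrelate(W):
    """Return (W W^T)^(-1/2) W, which makes the rows of W orthonormal."""
    eigvals, eigvecs = np.linalg.eigh(W @ W.T)
    return eigvecs @ np.diag(eigvals ** -0.5) @ eigvecs.T @ W


def fastica_sketch(X, n_iter=200, tol=1e-8, seed=0):
    """Symmetric fixed-point ICA with a tanh nonlinearity; returns estimated components."""
    Z = whiten(X)
    n, T = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((n, n)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: E{g(w^T z) z^T} - E{g'(w^T z)} w for every row w of W.
        W_new = G @ Z.T / T - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        # Converged when every row keeps its direction (up to sign).
        change = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if change < tol:
            break
    return W @ Z


# Synthetic stand-in for three recorded components (not the paper's data).
rng = np.random.default_rng(0)
T = 5000
S = np.vstack([rng.uniform(-1.0, 1.0, T),
               rng.laplace(0.0, 1.0, T),
               rng.uniform(-1.0, 1.0, T)])
A = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.2],
              [0.2, 0.0, 1.0]])      # assumed mixing: 20% leakage between channels
X = A @ S                            # contaminated "recordings"
Y = fastica_sketch(X)                # estimated independent components

# Match each true source to its best-correlated estimate (order and sign are arbitrary).
corr = np.abs(np.corrcoef(np.vstack([S, Y]))[:3, 3:])
best_match = corr.max(axis=1)
```

Because ICA recovers components only up to order, sign, and scale, the comparison with the true sources is made through absolute correlations rather than by direct subtraction.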

To do so, we use an algorithm that maximizes the nongaussianity of the estimated components in order to make them independent. The best candidate for a fast ICA is the FastICA algorithm, Govert (2005), a computationally highly efficient algorithm that is reported to be 10 to 100 times faster than conventional methods for extracting the hidden factors in a set of data. Although we have shown mathematically that ICA gives estimates of the unmixed signals, we still need a tool to test the reliability of the ICA outputs. This can be done by examining the change in mutual information. As mentioned previously, mutual information (MI) has an important property: it is always non-negative, and it is zero if and only if the examined signals in the data set are statistically independent. MI therefore provides a good way of checking the results of ICA: if the MI transfer between the ICA outputs is less than the MI transfer between the original signals, then the outputs are less dependent than the originals. In this paper we have used the algorithm proposed by Astakhov (2004) to compute the MI transfer between signals.

ICA output and verification

We applied ICA to a set of seismic records obtained from the Building & Housing Research Center, Iran, which contains the ground-level acceleration in three spatial directions, namely comp-08, comp-98 and comp-up. Using the FastICA algorithm, we try to make the signals as independent as possible. The three signals obtained from the ICA algorithm are compared with the original signals in Fig 2.
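The logic of this MI check can be sketched as follows. The estimator below is a simple histogram (plug-in) estimate, which is an assumption made for brevity and is not the estimator of Astakhov (2004) used for the results in this paper; the data are synthetic, and the exact inverse of a known mixing matrix stands in for an ICA output. All names and parameter values are hypothetical.

```python
import numpy as np


def binned_mi(a, b, bins=16):
    """Plug-in estimate (in nats) of the mutual information between two 1-D signals."""
    counts, _, _ = np.histogram2d(a, b, bins=bins)
    p_ab = counts / counts.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)     # marginal of a, shape (bins, 1)
    p_b = p_ab.sum(axis=0, keepdims=True)     # marginal of b, shape (1, bins)
    mask = p_ab > 0
    return float(np.sum(p_ab[mask] * np.log(p_ab[mask] / (p_a @ p_b)[mask])))


def pairwise_mi(signals, bins=16):
    """Return {(i, j): MI estimate} for every pair of rows in `signals`."""
    n = signals.shape[0]
    return {(i, j): binned_mi(signals[i], signals[j], bins)
            for i in range(n) for j in range(i + 1, n)}


# Synthetic three-channel example (not the paper's seismic records).
rng = np.random.default_rng(1)
sources = rng.uniform(-1.0, 1.0, size=(3, 20000))
A = np.array([[1.0, 0.2, 0.0],
              [0.0, 1.0, 0.2],
              [0.2, 0.0, 1.0]])               # assumed 20% leakage between channels
mixed = A @ sources                           # plays the role of the original recordings
unmixed = np.linalg.inv(A) @ mixed            # stand-in for the ICA output

mi_mixed = pairwise_mi(mixed)
mi_unmixed = pairwise_mi(unmixed)
sum_mixed = sum(mi_mixed.values())            # sum of pairwise estimates, for comparison only
sum_unmixed = sum(mi_unmixed.values())
```

A histogram estimate is biased upward for finite samples, so even independent signals give a small positive value; what matters for the check is the drop from the mixed to the unmixed signals, not the absolute numbers.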
[Fig 2, panel "up" component: acceleration (cm/sec/sec) versus time (ms); series "up"-original and "up"-independent]
[Fig 2, panel "08": acceleration (cm/sec/sec) versus time (ms); series original and independent]
[Fig 2, panel "98": acceleration (cm/sec/sec) versus time (ms); series "98" original and "98" independent]

Fig 2 ICA (white line) compared to Original Signals (blue line)

At this point we compare the MI transfer of the original signals with that of the output signals. First we compute the pairwise MI transfer for the originals and for the outputs. As we can see in Fig 3, the MI has decreased considerably.

Pairwise MI transfer comparison

Signal pair     Original       Independent
"up"-"98"       0.26153875     0.08329666
"up"-"08"       0.30343479     0.00616777
"08"-"98"       0.3244785      0.00938168

Fig 3 Pairwise MI transfer comparison

The same result is obtained if we compute the total MI transfer. This is shown in Fig 4.

Total MI

Original time series     0.076635
ICA output               0.018007

Fig 4 Total MI transfer

Conclusion

ICA is a method that is widely used in different fields that deal with multivariate signals. ICA can extract the underlying, hidden factors embedded in the originally observed signals. Here we have applied ICA to extract the overlapped signals from the observed signals of seismic recordings. This can help to extract more information from a given set of data. The reliability of ICA is checked using a concept called mutual information transfer. This criterion shows that independent component analysis reduces the degree of dependency between the signals.

Acknowledgments

The authors are grateful to Dr. Astakhov for his help and support. Thanks also to the staff of the mathematics lab at Helsinki University, who kindly provided the software for the FastICA algorithm. The help of the COSMOS data center in providing the seismic data is also appreciated.

Nomenclature

A = mixing matrix
I = mutual information
f(y) = joint density function
p(y) = marginal probability density function
Sij = random vector element
SNR = signal-to-noise ratio
W = separation matrix
Y(t) = matrix of original signals
X(t) = matrix of observed signals

References

1. Astakhov, S.A., Stögbauer, H., Kraskov, A., and Grassberger, P., John-von-Neumann Institute for Computing, Forschungszentrum Jülich, D-52425 Jülich, Germany. Least-dependent-component analysis based on mutual information. Physical Review E 70, 066123 (2004).
2. Cichocki, A., Bogner, R.E., Moszczynski, L., and Pope, K. Modified Herault-Jutten algorithms for blind separation of sources. Digital Signal Processing, 7:80-93, 1997.
3. Cover, T.M. and Thomas, J.A. Elements of Information Theory. John Wiley & Sons, 1991.
4. Ghasem-al-Askari, M.K. Explorational geophysics presentation, Petroleum University of Technology, 2007.
5. Govert, Hugo. FastICA package for Matlab 7.x and 6.x, Version 2.5, October 19, 2005.
6. Hyvärinen, A. Gaussian moments for noisy independent component analysis. IEEE Signal Processing Letters, 6(6):145-147, 1999.
7. Jones, M.C. and Sibson, R. What is projection pursuit? J. of the Royal Statistical Society, ser. A, 150:1-36, 1987.
8. Jutten, V. and Herault, J. Blind separation of sources, part I: An adaptive algorithm based on neuromimetic architecture. Signal Processing, 24:1-10, 1991.
9. Ristaniemi, T. and Joutsensalo, J. On the performance of blind source separation in CDMA downlink. In Proc. Int. Workshop on Independent Component Analysis and Signal Separation (ICA'99), pages 437-441, Aussois, France, 1999.
10. Stögbauer, H., Kraskov, A., and Grassberger, P., John-von-Neumann Institute for Computing, Forschungszentrum Jülich, D-52425 Jülich, Germany. Estimating Mutual Information (May 18, 2006).
11. Stone, J.V. Independent Component Analysis. A Bradford Book, The MIT Press, Cambridge, Massachusetts; London, England, 2004.
