
2017 IEEE INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING, SEPT. 25-28, 2017, TOKYO, JAPAN

NONLINEAR PREDICTION-ERROR FILTERS

H. S. Carvalho, F. Shams, M. H. Gumiero, R. Ferrari and L. Boccato

School of Electrical and Computer Engineering, University of Campinas, Campinas, SP, Brazil.

ABSTRACT

In this work, we investigate the possibility of employing nonlinear structures. The obtained results in the context of synthetic data clearly reveal the potential of this approach.

Index Terms: Deconvolution, seismic multiple removal, prediction-error filter, extreme learning machines, echo state networks

1. INTRODUCTION

Undoubtedly, blind deconvolution constitutes an important task in signal processing, whose objective is to retrieve a specific signal from a collection of measurements that can be seen as the result of the convolution between such signal and an unknown system, possibly in the presence of noise, with a minimum amount of a priori information available about both the signal and the distortion system. This type of problem naturally appears in several applications, such as channel equalization, source separation of convolutive mixtures and seismic exploration [1].

In the context of seismic exploration, the collected data refer to reflection measurements observed by receivers positioned at the surface of the earth or the sea, recording the reflections, in different layers of the subsurface, of the acoustic wave generated by a source [2]. From these measurements, the goal is to estimate the reflectivity function of the subsurface, which identifies the existing geological structures and/or layers.

However, reflections of the acoustic wave also occur in the opposite direction, meaning that part of the energy is reflected back to the subsurface before returning to the surface and, eventually, to the sensors. As a result, multiple reflections associated with the same layer will occur. In this case, the multiples are normally treated as an undesirable noise that complicates the interpretation of the seismic images at a later stage and, therefore, should be removed. This challenge characterizes the problem known as seismic multiple removal (SMR) [3, 2].

A variety of techniques have been developed for SMR, each being more adequate for particular scenarios depending on the expected characteristics of the collected data and the properties of the subsurface at that location. The predictive deconvolution approach, originally proposed by [4], models the observed signal as the result of the convolution between the multiple generating system and a signal associated with the reflectivity function of the subsurface and with the source signature (wavelet). Then, the idea is to design a filter, classically a linear finite impulse response (FIR) system, to predict the multiples, supposedly of a periodic nature, so that the multiple generating system is canceled by subtracting the filter output from the observed signal. This method is particularly effective when the assumption of multiple periodicity is valid, and when the depth of the first layer below the surface is relatively small, where velocity-based methods do not perform well. A typical example of this situation is a shallow water layer in maritime seismic acquisition.

Since the sensors must be positioned at a certain minimum distance (offset) from the source (e.g., 150 meters for maritime acquisition), the multiple reflections are, in fact, not periodic. In this case, in order to employ the predictive approach, it is necessary to restore the periodicity, which can be achieved with the aid of plane-wave decomposition methods []. A relatively simple approximation of the plane-wave decomposition corresponds to the τ-p (linear Radon) transform []. However, the application of the τ-p transform ends up introducing an amplitude distortion in the multiple reflections, which is more severe when seismic traces associated with initial offsets, including zero offset, are not available [], and which certainly poses difficulties for the prediction filter, especially when the adopted structure does not have enough flexibility.

This work deals with the aforementioned scenario and proposes to investigate the application of nonlinear structures for multiple removal according to the predictive approach. The idea we intend to motivate is that, by using a more flexible structure for predicting the multiples, the amplitude distortion introduced by the transform can be counterbalanced in a more effective manner, when compared with a linear filter. To the best of our knowledge, this is the first study of such a perspective.

In this initial study, we shall examine the performance of two neural network models, viz., the extreme learning machines (ELMs) [5] and the echo state networks (ESNs) [6], since these structures offer an elegant balance between processing capability and computational complexity, and have been successfully applied in adaptive filtering tasks [7]. As we shall

Thanks to CNPq for the financial support.

978-1-5090-6341-3/17/$31.00 © 2017 IEEE
observe, the results for synthetic seismic data are promising and encourage the extension of this study.

2. PROBLEM STATEMENT

In the case of zero offset, the seismic trace can be modeled as:

t(k) = rp(k) ∗ w(k) ∗ m(k) + η(k),   (1)

where m(k) denotes the impulse response of the multiple generating system, w(k) is the wavelet and η(k) represents an additive noise.

In order to recover the periodicity of the multiple events, the τ-p (also known as linear Radon) transform is applied to the set of collected traces.

3. ESN AND ELM

Consider the neural network architecture depicted in Figure 1, which represents the basic structure of echo state networks and extreme learning machines.

[Figure 1: the input u(n) feeds the hidden layer (dynamical reservoir) through the input weights Wi; the reservoir state x(n) evolves under the recurrent weights Wr; the output y(n) is produced by the output weights W.]

Fig. 1. Architecture of ESNs and ELMs. The dotted arrows represent the feedback connections, which are present only in ESNs.

The activation of the internal neurons can be computed as follows:

x(n) = f(Wi u(n) + Wr x(n − 1)),   (2)

where Wi ∈ R^(N×K) is the matrix containing the input synaptic weights, u(n) is the input signal vector, Wr ∈ R^(N×N) is the recurrent connection weight matrix, f(·) = [f1(·) . . . fN(·)]^T specifies the activation functions of the internal neurons, K is the number of inputs and N denotes the number of internal neurons. The signals generated by the hidden layer, or dynamical reservoir, as it is usually called in ESNs, are linearly combined to produce the network outputs, so that:

y(n) = W x(n),   (3)

where W ∈ R^(L×N) gives the coefficients of such linear combinations and L is the number of outputs.

Suppose that all the parameters of the network have been predefined, except the output weight matrix W. Having access to a training set {u(n), d(n)}, n = 1, . . . , T, where d(n) represents the reference (target) signal for the network output, the training process is reduced to the problem of adapting the coefficients in W, which can be formulated as a linear regression task, so that a closed-form solution in the least-squares sense can be obtained with the aid of the Moore-Penrose pseudo-inverse.

The possibility of keeping part of the neural structure untrained and fixed simplifies the adaptation process of the network as well as avoids the use of iterative gradient-based algorithms. This attractive idea forms the essence of both ELMs [5] and ESNs [6].

In the case of ELMs, the input weights (Wi) can be randomly created according to any continuous probability density function, since the neural network preserves the universal approximation capability, as shown in [5, 8]. Additionally, a broad class of nonlinear functions can be employed as the activation function of the hidden neurons [9].

The presence of feedback connections in ESNs provides dynamical properties to the network processing, which can be particularly useful when the input signals present temporal dependency, as occurs in the context of adaptive filtering and time series prediction [6, 7]. However, it also brings an additional concern regarding the stability and dynamic behavior of the model. Hence, a fundamental result known as the echo state property (ESP) is explored for an adequate, yet simple, definition of the dynamical reservoir parameters [6, 10].

The ESP ensures that the effect of the initial state x(0) asymptotically vanishes, so that the state vector x(n) reflects the recent history of the input signal [6], thus generating an internal memory of the input signal. Interestingly, there are several simple recipes for creating Wr in order to meet the conditions required by the ESP, as presented by [10]. For example, any matrix Wr with non-negative elements having its maximum absolute eigenvalue inside the unit circle guarantees the existence of echo states. So, following the spirit of the previous discussion, the input weights (Wi) and the recurrent connection weights (Wr) of ESNs can be set in advance and independently of the task.

Therefore, the effective use of ELMs and ESNs involves: (i) the definition of the hidden layer (dynamical reservoir) parameters, which can be arbitrary or, in the case of ESNs, chosen having in view the conditions that guarantee the echo state property; and (ii) the solution of a linear regression task to obtain the output weights.

4. EXPERIMENTAL RESULTS

In this section, we present the procedure used in the generation of the synthetic seismic traces, along with the methodology for selecting the parameters of ELMs and ESNs, and, finally, we analyze the obtained results for both networks in the selected scenarios. All the experiments have been carried out in Matlab, using routines of the SeismicLab package [11].

The selected trace in τ-p domain for the scenario with two primary reflections, considering p = 2.1 × 10^-5 s/m, along with its autocorrelation function, is shown in Figure 3. As we can observe, the second primary event occurs between two multiples associated with the first primary.
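Before turning to the experiments, the reservoir machinery of Section 3 can be sketched in a few lines. The snippet below is an illustrative Python sketch, not the authors' Matlab code: it implements the state update of eq. (2) with an assumed tanh activation, the readout of eq. (3), and the pseudo-inverse training of W; the sinusoidal toy task, the dimensions and the spectral-radius value of 0.8 (borrowed from Section 4.2) are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, T = 1, 30, 500                      # inputs, internal neurons, samples

# fixed (untrained) random weights, as in ELMs/ESNs
Wi = rng.uniform(-1.0, 1.0, size=(N, K))  # input weights
Wr = rng.uniform(0.0, 1.0, size=(N, N))   # recurrent weights (ESNs only)
Wr *= 0.8 / np.max(np.abs(np.linalg.eigvals(Wr)))  # max |eigenvalue| = 0.8 (ESP recipe)

def reservoir_states(u, use_feedback=True):
    """Eq. (2): x(n) = f(Wi u(n) + Wr x(n-1)), with f = tanh.
    use_feedback=False drops Wr, giving an ELM-style static hidden layer."""
    x = np.zeros(N)
    states = np.empty((u.shape[0], N))
    for n in range(u.shape[0]):
        x = np.tanh(Wi @ u[n] + (Wr @ x if use_feedback else 0.0))
        states[n] = x
    return states

# toy task: one-step-ahead prediction of a sinusoid
u = np.sin(0.1 * np.arange(T)).reshape(-1, 1)
d = np.sin(0.1 * (np.arange(T) + 1))      # desired output d(n)
X = reservoir_states(u)                   # T x N state matrix
W = np.linalg.pinv(X) @ d                 # eq. (3) readout, Moore-Penrose solution
y = X @ W                                 # network output
```

Note how the only trained quantity is W, obtained in closed form; re-running with a new seed draws a fresh, equally valid reservoir, which is the property exploited in the averaging of Section 4.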

4.1. Synthetic data generation and scenarios

Shallow water layer

The selected trace t(k) in τ-p domain is normalized and scaled according to the following procedure:

1. tn(k) = (t(k) − μt) / σt,

2. tn(k) = (0.2 / tn,max) tn(k),

where μt and σt denote the mean and the standard deviation of the trace, tn(k) is the normalized trace at instant k and tn,max is the maximum amplitude of the normalized trace.

In the first scenario, the set of seismic traces contains a single primary reflection, which occurs after 0.1 seconds, and the corresponding multiple events. Figure 2 exhibits the normalized trace in τ-p domain associated with the ray parameter p = 2.1 × 10^-5 s/m, as well as the trace autocorrelation function. Albeit being a very simple scenario, it is useful for demonstrating the difficulties faced by a linear deconvolution filter and for motivating the application of nonlinear structures.

Fig. 2. Normalized trace in τ-p domain and the corresponding autocorrelation function considering the presence of a single primary event and its multiples.

In the second scenario, another primary event occurs after 0.3 seconds and its multiples are also present in the seismic trace. In this case, not only does the prediction model have to estimate the multiples, but it must also preserve the second primary event.

Fig. 3. Normalized trace in τ-p domain and the corresponding autocorrelation function considering the presence of two primary reflections and their multiples.

4.2. Parameter setting

Two crucial parameters for the predictive deconvolution approach are the prediction lag (L) and the number of input samples used by the deconvolution filter (K). With respect to the prediction lag, the distance between the event at zero time in the autocorrelation function and the first parallel event serves as an indication of the period of the multiple events and, hence, can be used as an initial guess for L [2]. The number of inputs, on the other hand, is usually selected taking into account the length of the wavelet and the width of the first parallel peak in the autocorrelation. However, when the trace contains several primary events, it is advisable to keep K as small as possible in order to reduce the chance of eliminating these primaries along with the multiples. This concern is particularly important in the second scenario, where two primary events are observed in the seismic trace.

Another important parameter to be defined refers to the number of neurons (N) in the hidden layer of ELMs and ESNs, which also needs to be carefully adjusted, since a relatively large value of N not only may increase the flexibility of the prediction model, but also the possibility of simultaneously attenuating other primaries. On the other hand, a relatively small value of N may not be sufficient for a correct estimation of the multiples.

Having all these considerations in mind and based on preliminary experiments with the trace shown in Figure 3, we adopted the following values for the three aforementioned parameters: L = 17, K = 10 and N = 35 for ELMs; L = 20, K = 5 and N = 40 for ESNs.

With respect to the remaining adjustable parameters of the networks, the input weights of the ELM were selected according to a uniform distribution in the interval [−1, 1]. Analogously, the input weights of the ESN were randomly selected according to a Gaussian distribution with zero mean and unit variance. Moreover, each element in the recurrent weight matrix Wr was generated according to a uniform distribution in the interval [0, 1] and, then, Wr was scaled so that its maximum absolute eigenvalue was equal to 0.8.

Since the intermediate layers of ELMs and ESNs are randomly created, in order to analyze the consistency of the performance of these models, we present the results involving the filtered trace, i.e., the prediction error at the output of the networks, considering an average of NE = 100 independent experiments.

4.3. First Scenario: One Primary

Considering the normalized trace in τ-p domain shown in Figure 2, a FIR filter with K = 70 coefficients was employed for predicting the multiples, using a lag of L = 10. The prediction error obtained at the output of the linear filter, along with its autocorrelation function, is shown in Figure 4.

Fig. 4. Prediction error and the corresponding autocorrelation function for the FIR filter.

As we can perceive, the linear predictor was not capable of adequately estimating the multiples, which, thus, remain visible in the prediction error. In fact, even using an elevated number of coefficients (which, as discussed in Section 4.2, is certainly not recommended when the trace contains more than one primary reflection, since it increases the probability of the filter distorting other primaries), the filter could not eliminate the multiples. This limitation is related to the linear structure of the filter, which is not able to cope with the amplitude distortion promoted by the application of the τ-p transform.

Figures 5 and 6 show the average prediction error observed at the output of the ELM and the ESN, respectively, as well as the autocorrelation function.

Fig. 5. Prediction error and the corresponding autocorrelation function for the ELM.

Fig. 6. Prediction error and the corresponding autocorrelation function for the ESN filter.

By observing Figures 5 and 6, it is possible to affirm that the nonlinear structures were capable of significantly attenuating the multiples. Additionally, we can also notice that the autocorrelation around the lag of 20, which is approximately the periodicity of the multiples, has been strongly attenuated, which is further evidence that the networks were able to adequately estimate the multiples.
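The contrast between the linear and the nonlinear prediction-error filters can be reproduced in miniature. The following Python sketch is illustrative only, not the authors' Matlab/SeismicLab experiment: the toy trace, its spike amplitudes, the parameter values and the tanh hidden layer are all assumptions. The varying amplitude ratios between consecutive multiples mimic the distortion introduced by the τ-p transform; a least-squares FIR gapped predictor cannot cancel multiples whose ratios vary, while an ELM-style predictor with a random hidden layer can.

```python
import numpy as np

def lagged_inputs(t, K, L):
    """Input matrix whose row for index k holds t[k-L], ..., t[k-L-K+1]."""
    k0 = L + K - 1
    U = np.array([t[k - L - K + 1:k - L + 1][::-1] for k in range(k0, len(t))])
    return U, t[k0:], k0

def fir_prediction_error(t, K=10, L=20):
    """Linear gapped predictor: least-squares FIR weights; the filtered
    trace is the prediction error."""
    U, d, k0 = lagged_inputs(t, K, L)
    w, *_ = np.linalg.lstsq(U, d, rcond=None)
    e = t.copy()
    e[k0:] = d - U @ w
    return e

def elm_prediction_error(t, K=10, L=20, N=60, seed=0):
    """Nonlinear predictor: random tanh hidden layer, least-squares readout
    via the Moore-Penrose pseudo-inverse."""
    rng = np.random.default_rng(seed)
    Wi = rng.uniform(-1.0, 1.0, size=(K, N))   # fixed random input weights
    b = rng.uniform(-1.0, 1.0, size=N)         # hidden biases (assumed)
    U, d, k0 = lagged_inputs(t, K, L)
    H = np.tanh(U @ Wi + b)                    # hidden-layer activations
    Wo = np.linalg.pinv(H) @ d                 # output weights, closed form
    e = t.copy()
    e[k0:] = d - H @ Wo
    return e

# toy trace: a primary at k = 50 plus multiples every 20 samples whose
# amplitude ratios vary, mimicking the tau-p amplitude distortion
t = np.zeros(400)
for k, a in [(50, 1.0), (70, -0.5), (90, 0.4), (110, -0.1)]:
    t[k] = a
e_fir = fir_prediction_error(t)
e_elm = elm_prediction_error(t)
```

On this toy trace the FIR filter leaves visible residual multiples while the ELM attenuates them; both leave the unpredictable primary at k = 50 essentially untouched, which is the behavior reported above.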
In order to analyze the robustness of the studied nonlinear approach, Figure 7 displays the standard deviation associated with each sample of the filtered trace considering the set of NE = 100 independent simulations for both the ELM and the ESN.

Fig. 7. Standard deviation as a function of τ considering NE = 100 independent experiments with ELM and ESN.

By comparing the range of values of the standard deviation with that of the amplitude of the normalized trace (Figure 2), it is possible to conclude that the filtered traces do not significantly vary from one experiment to another and, more importantly, that the multiples are significantly attenuated in all repetitions for both networks.

Fig. 8. Prediction error and the corresponding autocorrelation function for the ELM.
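The robustness check above, i.e., the per-sample standard deviation over NE independent draws of the random hidden layer, can be sketched as follows. This is an illustrative Python sketch, not the paper's code; the stand-in filter (identity plus small noise) is a hypothetical placeholder, where in the actual study each run would be an ELM or ESN with freshly drawn weights.

```python
import numpy as np

def robustness_std(filter_fn, t, NE=100):
    """Apply a randomly initialized prediction-error filter NE times and
    return the per-sample mean and standard deviation of the filtered trace."""
    runs = np.stack([filter_fn(t, seed=s) for s in range(NE)])  # NE x len(t)
    return runs.mean(axis=0), runs.std(axis=0)

# hypothetical stand-in filter, only to exercise the procedure: identity
# plus a small seed-dependent perturbation
def dummy_filter(t, seed=0):
    return t + np.random.default_rng(seed).normal(0.0, 0.01, size=t.shape)

trace = np.sin(0.1 * np.arange(300))
mean_trace, std_trace = robustness_std(dummy_filter, trace, NE=100)
```

A small, flat std_trace relative to the trace amplitude is precisely the evidence used above that the filtering result does not depend on the particular random initialization.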
4.4. Second Scenario: Two Primaries

Now, we analyze the case, depicted in Figure 3, in which the seismic trace contains two primaries and the respective multiples. Using the parameter configuration described in Section 4.2, we show in Figures 8 and 9 the average filtered trace produced by the ELM and the ESN, respectively, as well as the autocorrelation function.

As we observe, similarly to the previous scenario, both networks were capable of reducing the multiples and, at the same time, preserving the primary events. By comparing the autocorrelation of the original trace (Figure 3) with those associated with the prediction error at the ELM/ESN output, it is possible to notice that all the side peaks have been strongly attenuated, except the peak occurring around the lag of 50, which is related to the correlation between the primary events.

Fig. 9. Prediction error and the corresponding autocorrelation function for the ESN.

Finally, the results achieved in this study do not explicitly indicate a preference for a particular network. Nonetheless, a potential advantage of ESNs is associated with the use of a smaller number of input samples (K) of the seismic trace, which may be related to the fact that, due to the feedback connections, the ESN creates an internal memory of the input
history and uses it for estimating the multiple events.

5. CONCLUSION

The obtained results clearly motivate the use of nonlinear structures and the continuity of this investigation. The next stages of the ongoing research involve a more detailed evaluation of the performance, including a parameter sensitivity analysis, of ESNs and ELMs, as well as of other nonlinear structures, when applied to real seismic data. Moreover, we intend to investigate the potential benefits of using an adaptive stage for subtracting the estimated multiples from the original trace.

6. REFERENCES

[1] J. M. T. Romano, R. R. de F. Attux, C. C. Cavalcante, and R. Suyama, Unsupervised Signal Processing: Channel Equalization and Source Separation, CRC Press, 2010.

[2] D. J. Verschuur, Seismic Multiple Removal Techniques: Past, Present and Future. Revisited Edition, EAGE Publications, 2006.

[3] E. A. Robinson and S. Treitel, Geophysical Signal Analysis, Society of Exploration Geophysicists, 2000.

[4] E. A. Robinson, Predictive Decomposition of Time Series with Applications to Seismic Exploration, Ph.D. thesis, Department of Geology and Geophysics, MIT, 1954.

[5] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory and applications," Neurocomputing, vol. 70, pp. 489-501, 2006.

[6] H. Jaeger, "The echo state approach to analyzing and training recurrent neural networks," Tech. Rep. 148, German National Research Center for Information Technology, 2001.

[7] L. Boccato, A. Lopes, R. Attux, and F. J. Von Zuben, "An extended echo state network using Volterra filtering and principal component analysis," Neural Networks, vol. 32, pp. 292-302, 2012.

[8] G.-B. Huang, L. Chen, and C.-K. Siew, "Universal approximation using incremental constructive feedforward networks with random hidden nodes," IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 879-892, 2006.

[9] G.-B. Huang, D. H. Wang, and Y. Lan, "Extreme learning machines: A survey," International Journal of Machine Learning and Cybernetics, vol. 2, no. 2, pp. 107-122, 2011.

[10] I. B. Yildiz, H. Jaeger, and S. J. Kiebel, "Re-visiting the echo state property," Neural Networks, vol. 35, pp. 1-9, 2012.

[11] SeismicLab, http://seismic-lab.physics.ualberta.ca/, Accessed: 2017-05-19.