Seismic Data Processing

TA3600 / TG001

G.G. Drijkoningen
D.J. Verschuur
Centre for Technical Geoscience (CTG)
Delft University of Technology
P.O. Box 5028
2600 GA Delft
The Netherlands

September 2003

Copyright © 2003
All rights reserved.
No parts of this publication may be reproduced,
stored in a retrieval system, or transmitted,
in any form or by any means, electronic,
mechanical, photocopying, recording, or otherwise,
without the prior written permission of the
Centre for Technical Geoscience.

Contents

1 Introduction
  1.1 Exploration seismics
  1.2 Structure of lecture notes

2 Basic signal analysis
  2.1 Introduction
  2.2 The Fourier transform
  2.3 The discrete Fourier transform
  2.4 The spatial Fourier transform
  2.5 The two-dimensional Fourier transform (in relation to waves)

3 A basic seismic processing sequence
  3.1 Seismic processing and imaging
  3.2 Sorting of seismic data
  3.3 Normal move-out and velocity analysis
  3.4 Stacking
  3.5 Zero-offset migration

4 Extended Processing
  4.1 Introduction
  4.2 Data formats
  4.3 Surveying information and field geometry
  4.4 Trace editing and balancing
  4.5 Static corrections
  4.6 Deconvolution
    4.6.1 Deterministic deconvolution in the frequency domain
    4.6.2 Deterministic deconvolution in the time domain: Wiener filters
    4.6.3 Statistical deconvolution: minimum phase deconvolution
    4.6.4 Statistical deconvolution: predictive deconvolution
    4.6.5 Spiking deconvolution
  4.7 Filtering in the $(f, k_x)$ domain
  4.8 Dip Move-Out (DMO) / Pre-stack Partial Migration
  4.9 Zero offset (poststack) migration algorithms
  4.10 Conversion from time to depth
  4.11 Prestack migration

5 3D seismic processing
  5.1 Introduction
  5.2 Midpoint oriented processing
  5.3 3D Poststack migration

A Discretisation of Fourier transform
B Derivation of the wave equation
C The definition of SEG-Y
D Traveltime equation for a dipping refracting boundary
E Correlation of signals
F Wiener filters
G Derivation of the DMO-ellipse
H Derivation of the Kirchhoff integral

Chapter 1

Introduction
1.1 Exploration seismics
The object of exploration seismics is to obtain structural information about the subsurface
from seismic data, i.e., data obtained by recording elastic wave motion of the ground. The
main reason for doing this is the exploration for oil or gas fields (hydrocarbons). In
exploration seismics this wave motion is excited by an active source, the seismic source,
e.g. dynamite for land (onshore) seismics. From the source, elastic energy is radiated into
the earth, and the earth reacts to this signal. The energy that returns to the earth's
surface is then studied in order to infer the structure of the subsurface. Conventionally,
three stages are distinguished in obtaining information about the subsurface, namely data
acquisition, processing and interpretation.
In seismic data acquisition, we concern ourselves only with gathering the data in the
field, and with making sure the data are of sufficient quality. In seismic acquisition, an
elastic wavefield is emitted by a seismic source at a certain location at the surface. The
reflected wavefield is measured by receivers that are located along lines (2D seismics) or
on a grid (3D seismics). After each such shot record experiment, the source is moved to
another location and the measurement is repeated. Figure 1.1 gives an illustration of seismic
acquisition in a land (onshore) survey. At sea (in a marine or offshore survey) the source
and receivers are towed behind a vessel. In order to gather the data, many choices have
to be made which are related to the physics of the problem, the local situation and, of
course, to economic considerations. For instance, a choice must be made about the seismic
source being used: on land, one usually has the choice between dynamite and vibroseis;
at sea, air guns are deployed. Also on the sensor side, choices have to be made, mainly
with respect to their frequency characteristics. With respect to the recording equipment,
one usually does not have a choice for each survey but one must be able to exploit its
capabilities as much as possible.

Figure 1.1: Seismic acquisition on land using a dynamite source and a cable of geophones.

Figure 1.2 shows two raw seismic recordings, made on land and at sea. The land shot
record (Figure 1.2a) immediately shows strong noisy events, which are referred to as
ground-roll or surface waves (they propagate along the surface). The reflection events
(i.e. the flatter events at e.g. 0.9 and 1.6 seconds) are hidden by this strong noise. The
marine shot record (Figure 1.2b) is cleaner, as we measure in a water layer, which is a
good conductor for sound. However, shear waves cannot propagate through water, and are
therefore not measured for marine data. Note that the first 0.4 seconds of data are empty
(except for the direct wave from source to receivers), meaning that the water bottom is
at a depth of approximately 300 meters (0.4 s is a two-way travel time and the water
velocity is approximately 1470 m/s). Note also in this shot record the strong multiples of
the water bottom reflection at 0.8, 1.2, 1.6 and 2.0 seconds. This indicates a major
problem in marine data: the surface reflects all energy back into the subsurface (the
reflection coefficient is -1), which produces a large number of multiply reflected events.
In fact almost all events we see below 0.8 seconds are due to multiples.
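The depth estimate and the multiple arrival times quoted above follow from simple arithmetic, sketched here in Python. The function names and the default velocity argument are illustrative choices, not part of these notes. The sketch assumes a flat water bottom, a constant water velocity, and a source and receiver close enough together that 0.4 s can be treated as a vertical two-way travel time.

```python
def water_depth(two_way_time_s, velocity_m_per_s=1470.0):
    """Depth of a flat water bottom from the vertical two-way travel time."""
    # The wave travels down and back up, hence the factor 1/2.
    return 0.5 * velocity_m_per_s * two_way_time_s


def multiple_arrival_times(two_way_time_s, n_multiples):
    """Arrival times of the first n water-bottom multiples (primary excluded)."""
    # Each extra bounce in the water layer adds one more two-way time.
    return [(order + 1) * two_way_time_s for order in range(1, n_multiples + 1)]


depth = water_depth(0.4)                    # about 294 m
multiples = multiple_arrival_times(0.4, 4)  # about 0.8, 1.2, 1.6 and 2.0 s
```

The result of roughly 294 m is consistent with the "approximately 300 meters" read from the shot record.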
So far, the choices concerned the hardware, but for a geophysicist the choice of
the parameters of the survey is of prime importance. Relevant questions are:
where do we want to put our shot positions, and where our sensor positions? How often
do we want to shoot? What are the distances between the sensors themselves? These
questions are considered in seismic survey design.
In seismic processing, we want to manipulate our gathered data such that we obtain an
accurate image of the subsurface. To do this properly, we have to understand the physical
processes that are involved in seismic experiments. For instance, the seismic source puts
a certain signal (i.e. the source wavelet) into the earth, and the structure of the subsurface
does not depend on the signal we put in. Therefore, we have to remove this source signal
from our data before we start the imaging of the subsurface. This process is called signature
deconvolution. Another undesired effect for land data is the surface wave (see also Figure
1.2a). Since surface waves travel along the earth's surface, they do not contain any
information about the deeper interior of the earth. We would like to remove these events by
filtering, often done with two-dimensional filters, so-called $(f, k_x)$ filtering. Again for
land data, surface and near-surface topography can have a tremendous effect on the
total response, as the near-surface conditions can vary strongly from location to location.
This effect is corrected for in a so-called static correction procedure. As mentioned, for
marine data the multiples (related to the water surface) are a major problem, which should
be removed or suppressed in one of the first processing steps.

Figure 1.2: Raw seismic measurements. a) Shot record from a land survey. b) Shot record
from a marine survey. (Axes: offset (m) against time (s).)

So far, we have considered fairly deterministic aspects of the seismic experiment. However,
an important feature of seismic data is that the noise level is too high to obtain an
accurate picture of the subsurface using only one shot record experiment. Therefore,
in seismic acquisition we make a multiple-fold coverage of the subsurface in order to be
able to increase the signal-to-noise ratio. It is because of this noise that already in our
field procedure we take account of the fact that we want to add signals together during
processing (stacking).
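The benefit of adding signals together can be illustrated with a small synthetic experiment, sketched below in Python. This is an illustration under stated assumptions rather than a processing recipe: the noise is taken to be random, zero-mean and uncorrelated from trace to trace, the traces are assumed to be already aligned in time, and all names (`stack`, `noisy_trace`, `fold`) are hypothetical. Under those assumptions the noise in the average of N traces is expected to drop by about the square root of N.

```python
import math
import random


def rms(values):
    """Root-mean-square amplitude of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def stack(traces):
    """Average a list of equally long traces sample by sample."""
    n = len(traces)
    return [sum(samples) / n for samples in zip(*traces)]


random.seed(0)                      # fixed seed so the run is repeatable
n_samples = 2000
signal = [math.sin(2 * math.pi * k / 50) for k in range(n_samples)]


def noisy_trace():
    # Assumption: additive Gaussian noise with unit standard deviation.
    return [s + random.gauss(0.0, 1.0) for s in signal]


fold = 16
traces = [noisy_trace() for _ in range(fold)]
stacked = stack(traces)

# Residual noise before and after stacking (the true signal is known here).
noise_single = rms([x - s for x, s in zip(traces[0], signal)])
noise_stacked = rms([x - s for x, s in zip(stacked, signal)])
improvement = noise_single / noise_stacked   # expected near sqrt(16) = 4
```

With real data the noise is rarely perfectly uncorrelated, so the improvement seen in practice can be smaller than this idealised figure.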
The final stage of the seismic procedure is the seismic interpretation. In general, it
is in this stage that we translate the seismic information into the information we hoped
to obtain when we set out to do our survey. Often, this is geological information,
but it can also be information for a civil engineer, or someone from another field. A
geologist looking at a completely processed section hopes to see structures or features
that can be related to geological phenomena, such as faults, anticlines, etc. From the way
the structures are built up, the geologist hopes to be able to infer with
what kind of rocks we are dealing, and in what kind of environment and at which times the
rocks were formed, which is all a very non-unique process. In general, we cannot infer any
geology from seismics alone: too many different rocks and environments can give rise to
(almost) the same seismic response. We need a lot of external geological information in
order to make the interpretation less non-unique.

1.2 Structure of lecture notes


We start these lecture notes with some basic notions from signal processing, such as the
Fourier transform, Nyquist criterion, aliasing, discretisation, etc. This is done in Chapter
2. We assume the reader has some acquaintance with these concepts such that we can
keep the discussion rather short.
In the next chapter we will discuss a very basic processing sequence, which acquaints the
reader with the backbone of the more complete seismic processing sequence. In
this basic processing sequence we first discuss the common-midpoint concept, why we
use it and how it is obtained. We then discuss what methods can be used to determine
seismic velocities, which are needed when we add traces to increase the signal-to-noise
ratio. As said earlier, this latter process is called stacking. Finally, we discuss the
simplest scheme of migration, which takes account of the fact that we are dealing with a
wave phenomenon. This entails that we have to remove these wave effects in some way
in order to arrive at a seismic section which looks similar to a real vertical slice through
the subsurface. The migration, resulting in a time image, needs to be converted to depth
using a time-to-depth procedure, which is the last section in this chapter.
In Chapter 4 we will deal with more extended parts of the seismic processing sequence,
which can or have to be applied in order to arrive at a decent seismic section.
Amongst these processes are static corrections, deconvolution, $(f, k_x)$ filtering, Dip
Move-Out (DMO) and depth migration. We will make an effort to discuss all these subjects in
a homogeneous manner.

Chapter 2

Basic signal analysis


2.1 Introduction
In this chapter we will discuss some basic notions from signal analysis. As the basis for
signal analysis we use the Fourier transform. We shall define it here. In applications
on the computer we do not deal with a continuous transform but with a discrete one,
which introduces some special effects such as aliasing. We shall not go into much detail
and assume the reader has some acquaintance with signal processing. As an extension of
the one-dimensional Fourier transform we will also discuss the two-dimensional transform,
which in the seismic case is a transformation with respect to time and space. We will put
some more effort into looking at specific features in this domain. As applications we will
look at some filtering techniques, not only in the time-transformed variable, but also in
the space-transformed variable. Already in seismic data acquisition, we apply both kinds
of filters so, when processing, we must be aware of this.

2.2 The Fourier transform


Definitions

The Fourier transform is the transformation of a function into weights of sines and
cosines of certain frequencies. In this transformation the sines and cosines are the basis
functions into which the function is decomposed. Let us consider a function $g(t)$, in
which $t$ denotes time, and transform this function into the transform-variable domain, the
frequency domain, by:

$$G(f) = \int_{-\infty}^{+\infty} g(t) \exp(-2\pi i f t)\, dt, \tag{2.1}$$

in which $i$ is the imaginary unit, i.e. $i = \sqrt{-1}$, and $f$ is the transform variable
frequency. In our notation, we use capital letters to denote the fact that the function is
represented in the frequency domain.
The inverse transform, the reconstruction of the time signal from its frequency
components, can be expressed as:

$$g(t) = \int_{-\infty}^{+\infty} G(f) \exp(2\pi i f t)\, df. \tag{2.2}$$

Convolution theorem

We will not go into detail about all the properties of the Fourier transform; for that
we would like to refer to the standard book of [Bracewell, 1978]. However, we would
like to mention a few properties which will be used in these notes. One very important
one is the convolution theorem, which states that a convolution in the time domain is a
multiplication in the Fourier domain. Mathematically, when we convolve a function h(t)
with another function g(t), we obtain a multiplication of the spectra of h(t) and g(t), i.e.:

$$\mathcal{F}_t\left[\int_{-\infty}^{+\infty} h(t')\, g(t - t')\, dt'\right] = H(f)\, G(f), \tag{2.3}$$

in which $\mathcal{F}_t$ denotes the Fourier transform with respect to the subscript
variable $t$. Similarly, a convolution in the Fourier domain is a multiplication in the time
domain. We shall frequently make use of this property.
To prove this, we first write out the Fourier transform of the convolution of $h(t)$ and
$g(t)$ in full:

$$\mathcal{F}_t[h(t) * g(t)] = \int_{-\infty}^{\infty} \left[\int_{-\infty}^{\infty} g(a)\, h(t - a)\, da\right] \exp(-2\pi i f t)\, dt. \tag{2.4}$$

Then the integrand is multiplied by $1 = \exp(2\pi i f a) \exp(-2\pi i f a)$, and the order
of integration is changed:

$$\mathcal{F}_t[h(t) * g(t)] = \int_{-\infty}^{\infty} g(a) \left[\int_{-\infty}^{\infty} h(t - a) \exp\{-2\pi i f (t - a)\}\, dt\right] \exp(-2\pi i f a)\, da, \tag{2.5}$$

in which the Fourier transform of $h(t)$ may now be recognized in the square brackets, and
thus:

$$\mathcal{F}_t[h(t) * g(t)] = \int_{-\infty}^{\infty} g(a)\, H(f) \exp(-2\pi i f a)\, da. \tag{2.6}$$

$H(f)$ may now be taken outside the integral as it is independent of the integration variable
$a$. Then, the resulting integral can be recognized as the Fourier transform of $g(t)$. This
completes the proof.
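The theorem can also be checked numerically. The Python sketch below uses the discrete, circular analogue of (2.3) on two short sequences, not the continuous integrals of the proof. The helper names `dft` and `circular_convolve` and the sample values are arbitrary illustrative choices.

```python
import cmath
import math


def dft(x):
    """Plain discrete Fourier transform (no scaling factor)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * l / n) for k in range(n))
            for l in range(n)]


def circular_convolve(h, g):
    """Circular convolution: out[k] = sum_j h[j] * g[(k - j) mod n]."""
    n = len(h)
    return [sum(h[j] * g[(k - j) % n] for j in range(n)) for k in range(n)]


# Two arbitrary sequences of equal length (zero-padded at the end).
h = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 0.0, 0.0]
g = [0.5, 0.25, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]

# Left-hand side: transform of the convolution.
lhs = dft(circular_convolve(h, g))
# Right-hand side: product of the two spectra.
rhs = [H * G for H, G in zip(dft(h), dft(g))]

max_err = max(abs(a - b) for a, b in zip(lhs, rhs))  # rounding error only
```

The two sides agree to within floating-point rounding, which is the discrete counterpart of $H(f) G(f)$ in (2.3).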

A simple example: boxcar function

A simple application of this property is windowing, or bandpass filtering. In general,
in the time domain it is called windowing, while in the frequency domain windowing is
called bandpass filtering. Let us consider the window $h(t)$ defined as follows:

$$h(t) = \begin{cases} 0 & \text{if } t \le -a \\ 1 & \text{if } -a < t < a \\ 0 & \text{if } t \ge a. \end{cases} \tag{2.7}$$

When we apply this window in the time domain (i.e. multiply with the window function
in the time domain), we convolve with its Fourier spectrum in the frequency domain. We
therefore have to calculate the Fourier transform of the window function, which is:

$$H(f) = \int_{-a}^{a} \exp(-2\pi i f t)\, dt \tag{2.8}$$

$$= \left[\frac{\exp(-2\pi i f t)}{-2\pi i f}\right]_{t=-a}^{a} \tag{2.9}$$

$$= \frac{\sin(2\pi f a)}{\pi f}. \tag{2.10}$$

Figure 2.1: A window function in the time domain (left) corresponds to a sinc function in
the frequency domain (right); the width of the window is chosen as 2, so $a = 1$.

This function is drawn in figure 2.1. It is a scaled version of the so-called sinc function and
has many side lobes with large amplitudes. When we multiply with the window function
$h(t)$ in the time domain, and thus convolve with the function $H(f)$ in the frequency domain,
these side lobes smear the spectrum of the data, so the result is not optimal. This shows
that when we apply a window, we had better smooth the edges of the window.
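As a quick check on (2.10), the following Python sketch compares the analytic spectrum of the boxcar with a direct numerical evaluation of the integral (2.8). The function names and the midpoint-rule integration with 20000 steps are illustrative choices, not part of these notes.

```python
import cmath
import math


def boxcar_spectrum_analytic(f, a):
    """H(f) = sin(2 pi f a) / (pi f), with the limit 2a at f = 0."""
    if f == 0:
        return 2.0 * a
    return math.sin(2 * math.pi * f * a) / (math.pi * f)


def boxcar_spectrum_numeric(f, a, n_steps=20000):
    """Midpoint-rule approximation of the integral of exp(-2 pi i f t) over [-a, a]."""
    dt = 2.0 * a / n_steps
    total = 0j
    for j in range(n_steps):
        t = -a + (j + 0.5) * dt
        total += cmath.exp(-2j * math.pi * f * t) * dt
    return total


a = 1.0                                   # window width 2, as in figure 2.1
peak = boxcar_spectrum_analytic(0.0, a)   # main lobe height, 2a
side = boxcar_spectrum_analytic(0.75, a)  # inside the first side lobe
relative_side_lobe = abs(side) / peak     # about 0.21
mismatch = abs(boxcar_spectrum_numeric(0.75, a) - side)
```

For $a = 1$ the spectrum at $f = 0.75$, inside the first side lobe, is still about 21 % of the main peak, which illustrates why the side lobes of an untapered window cannot be ignored.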
There are many ways to smooth the sides, such as a cosine roll-off (see figure 2.2):

$$h(t) = \begin{cases} 0 & \text{if } t < t_1 \\ \cos^2\left(\dfrac{t - t_2}{t_2 - t_1} \dfrac{\pi}{2}\right) & \text{if } t_1 < t < t_2 \\ 1 & \text{if } t_2 < t < t_3 \\ \cos^2\left(\dfrac{t - t_3}{t_4 - t_3} \dfrac{\pi}{2}\right) & \text{if } t_3 < t < t_4 \\ 0 & \text{if } t_4 < t. \end{cases} \tag{2.11}$$

Another much used window is the gaussian:

$$h(t) = \exp(-a (t - t_0)^2) \quad \text{for } a > 0, \tag{2.12}$$

for which the Fourier transform is also a gaussian. This is depicted in figure 2.3. These
relations are important in seismic applications where very often windowing is applied, and
thus the effect of the window must be well understood.
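The cosine roll-off of (2.11) is straightforward to code. The Python sketch below is one possible implementation. The function name `cosine_taper` is an illustrative choice, and the values exactly at the break points $t_1, \ldots, t_4$, which (2.11) leaves open, are filled in by continuity.

```python
import math


def cosine_taper(t, t1, t2, t3, t4):
    """Window with cosine-squared roll-on over [t1, t2] and roll-off over [t3, t4]."""
    if t <= t1 or t >= t4:
        return 0.0                       # outside the window
    if t < t2:
        # Rising flank: 0 at t1, 1 at t2.
        return math.cos((t - t2) / (t2 - t1) * math.pi / 2) ** 2
    if t <= t3:
        return 1.0                       # flat part of the window
    # Falling flank: 1 at t3, 0 at t4.
    return math.cos((t - t3) / (t4 - t3) * math.pi / 2) ** 2


# Sample the window for t1, t2, t3, t4 = 0, 1, 2, 3 at steps of 0.25.
times = [0.25 * k for k in range(13)]
window = [cosine_taper(t, 0.0, 1.0, 2.0, 3.0) for t in times]
```

Halfway along each flank the window equals one half, and it joins the flat part and the zero part without a jump, which is what suppresses the side lobes relative to the boxcar.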
In this example, we used a simple function to show the effect of filtering, and were
able to show the effect of the filter in analytical terms. When we deal with the seismic
experiment, the filters can most of the time not be calculated analytically, and we have
to determine them in another way. Of course, filtering takes place in many ways, due to
many different processes. In the seismic experiment, we put a signal into the earth which
has a certain frequency content. A simple example of the spectrum of a dynamite source
signature is given in figure 2.4. When we assume that the earth acts as a convolutional
filter, the output spectrum is just a multiplication of the spectrum of the earth response
with the spectrum of the source signal. But also on the sensor side, we have a sensor
response, filtering the input signal once more. The typical spectrum of a sensor on land, a
geophone, is given in figure 2.5, which has been taken from [Pieuchot, 1984]. As we can see,
the amplitudes below the resonance frequency are damped, due to the coil system of the
geophone. It acts as a filter to the data, and the total signal can be regarded as a
multiplication of the spectra of the source signal, the earth response, and the geophone
response.
In a seismic recording system, we often have other filters available in the system but it
is often a matter of choice whether they are activated or not. There is however one filter
which is mandatory when we deal with discretised data, and that is the anti-alias filter.
This high-cut frequency filter is also an analog filter, but the reason why we use it will
become clear in the next section, when we discuss the discrete Fourier transform.

Figure 2.2: Tapering of a window function in the time domain (left) and its effect in the
frequency domain (right) for a cosine roll-off.

Figure 2.3: Tapering of a window function in the time domain (left) and its effect in the
frequency domain (right) for a gaussian tapering.

Figure 2.4: A dynamite wavelet in the time domain (left) and in the frequency domain
(right). (Axes: amplitude against time (s), and amplitude against frequency (Hz).)

Correlation

Another property, which is related to the convolution theorem, is one with respect to
correlation. Let us first define the time reverse $b_{rev}(t)$ of a signal $b(t)$ by:

$$b_{rev}(t) = b^*(-t), \tag{2.13}$$

where $b(t)$ is allowed to be complex and the asterisk denotes the complex conjugate. Normally,
of course, we deal with real time signals. However, by allowing these signals to be complex
it is easier to see their symmetry properties. When we apply a Fourier transformation to
$b(t)$, and take the complex conjugate of each side, we obtain:

$$\begin{aligned} B^*(f) &= \left[\int_{-\infty}^{\infty} b(t) \exp(-2\pi i f t)\, dt\right]^* \\ &= \int_{-\infty}^{\infty} b^*(t) \exp(2\pi i f t)\, dt \\ &= \int_{-\infty}^{\infty} b^*(-t) \exp(-2\pi i f t)\, dt \\ &= \mathcal{F}_t[b^*(-t)] \\ &= \mathcal{F}_t[b_{rev}(t)], \end{aligned} \tag{2.14}$$

which is the Fourier transform of $b_{rev}(t)$.

Figure 2.5: A typical geophone response curve (from Pieuchot, 1984).

When we define the cross-correlation function $\phi_{ab}$ of two signals $a(t)$ and $b(t)$ as:

$$\phi_{ab}(\tau) = \int_{-\infty}^{\infty} a(t)\, b^*(t - \tau)\, dt, \tag{2.15}$$

then this can be recognized as a convolution of $a(t)$ with the time reverse of $b(t)$:

$$\phi_{ab}(\tau) = \int_{-\infty}^{\infty} a(t)\, b_{rev}(\tau - t)\, dt = a(\tau) * b_{rev}(\tau). \tag{2.16}$$

So then we obtain in the frequency domain:

$$\Phi_{ab}(f) = A(f)\, B^*(f). \tag{2.17}$$

The function $\Phi_{ab}(f)$ is known as the cross-spectrum. It can be seen that the correlation of
$a(t)$ with $b(t)$ is not necessarily the same as the correlation of $b(t)$ with $a(t)$, but we still
have the symmetry that:

$$\phi_{ab}(\tau) = \phi^*_{ba}(-\tau) \tag{2.18}$$

and thus

$$\Phi_{ab}(f) = \Phi^*_{ba}(f). \tag{2.19}$$

We have defined the cross-correlation, but in the same way we can define the autocorrelation,
when we substitute $a(t)$ for $b(t)$ in the above definitions. A special characteristic
of the autocorrelation with respect to the cross-correlation is that the autocorrelation
exhibits symmetry in time when the time signal is real. This is due to the fact that
$\Phi_{aa}(f) = A(f) A^*(f)$ is real and consequently its inverse Fourier transform is symmetric
around $t = 0$.
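The symmetry relations can be verified on short sampled signals. The Python sketch below is a discrete stand-in for the integral (2.15), in which the integral becomes a sum over the available samples. The function name `cross_correlation` and the sample values are hypothetical.

```python
def cross_correlation(a, b):
    """Discrete version of (2.15): phi_ab[tau] = sum_t a[t] * conj(b[t - tau]).

    Returns a dict mapping each lag tau to the correlation value; samples
    outside the recorded range are taken to be zero.
    """
    result = {}
    for tau in range(-(len(b) - 1), len(a)):
        total = 0j
        for t in range(len(a)):
            j = t - tau
            if 0 <= j < len(b):
                total += a[t] * complex(b[j]).conjugate()
        result[tau] = total
    return result


a = [1.0, 2.0, 3.0]          # real signal
b = [1j, 1.0, 0.5]           # complex signal, to exercise the conjugate

phi_ab = cross_correlation(a, b)
phi_ba = cross_correlation(b, a)
phi_aa = cross_correlation(a, a)

# (2.18): phi_ab(tau) equals the complex conjugate of phi_ba(-tau).
symmetry_ok = all(abs(phi_ab[tau] - phi_ba[-tau].conjugate()) < 1e-12
                  for tau in phi_ab)

# Autocorrelation of a real signal is symmetric around zero lag.
auto_symmetric = all(abs(phi_aa[tau] - phi_aa[-tau]) < 1e-12 for tau in phi_aa)
```

At zero lag the autocorrelation equals the sum of the squared samples (here 14), which is also its maximum.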

2.3 The discrete Fourier transform


The discussion in the last section was focussed on the continuous representation of the
Fourier transform, with some applications. What these applications had in common is that
the filters were considered to be analog. Examples of such filters are springs and dashpots
in mechanical terms, or coils, capacitors and resistors in electrical terms. When we do
seismic processing, we work on discretised data right from the start, because we need to
use the computer to work on the data, and computers are digital these days. When using
digital computers, filters can also be defined and are then by definition digital filters. But
before discussing digital filters, we would first like to discuss some basic elements of digital
signal analysis.
Again, we shall not go into much detail on how the discretised version of the Fourier
transform is derived, but summarize the most important results from it. In appendix
A we give the full derivation of the discrete form of the Fourier transform. Discretising
means that we choose certain time instances and certain frequency instances to sample
our continuous signal, thus we take times $t = k \Delta t$ and frequencies $f = n \Delta f$. The most
important consequence of discretising the data in one domain is that the data in the other
domain becomes periodic. Thus discretising in the time domain means that the spectrum
becomes periodic, and vice versa.
Another important aspect is that in real life we cannot sample a signal until times
at infinity. We can only measure the signal with a finite number of samples.
Therefore we always make an error, the error becoming smaller when we take more samples.
This means that we can reconstruct the continuous signal only up to a certain degree but
never exactly. Only when we use an infinite number of samples can we reconstruct our
signal perfectly.
With these consequences in mind, we can write the discrete form of the Fourier transform as:

$$G_l = \Delta t \sum_{k=0}^{K-1} g_k \exp(-2\pi i k l / K) \quad \text{for } l = 0, 1, \ldots, K - 1, \tag{2.20}$$

in which the summation is over the time samples $k$, and the spectral coefficients are given
for frequency samples $l$. Also, $K$ is the number of samples used to sample the continuous
function, and $\Delta t$ is the sampling interval. The inverse Fourier transform is then given by:

$$g_k = \Delta f \sum_{l=0}^{K-1} G_l \exp(2\pi i k l / K) \quad \text{for } k = 0, 1, \ldots, K - 1, \tag{2.21}$$

in which we now sum over the frequency components, and $\Delta f$ is the sampling interval of
the frequencies. A simple relation connected to this Fourier pair is the equality:

$$K \Delta t \Delta f = 1. \tag{2.22}$$

When we sample continuous data, we must make sure we take enough samples, that is, we
must take at least two samples per period of the maximum frequency in the data. Put in
another way, with a chosen $\Delta t$ the maximum frequency which is represented properly is:

$$f_N = \frac{1}{2 \Delta t}, \tag{2.23}$$

where this maximum frequency is called the Nyquist frequency, denoted by $f_N$.
Discretising in such a way does not seem to be a problem, but there is one snag: how do
we know beforehand what the highest frequency in our signal will be? We do not know,
and therefore we have to filter the continuous data before we digitize it. This means that
we must include a high-cut filter that makes sure that the signal level is damped below
the noise level at the Nyquist frequency, and then the data can be digitized properly. This
is an analog filter, commonly called the anti-alias filter. This filter has always already
been applied to the data by the time the data arrive at the processing centre.
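The transform pair (2.20)-(2.21) and the consequence of violating the Nyquist criterion can be tried out directly. The Python sketch below is a deliberately slow, literal implementation of the two sums (a production code would use an FFT). The names `dft` and `idft` and the test signal, a 150 Hz cosine sampled at 4 ms without any anti-alias filter, are assumptions made for the illustration.

```python
import cmath
import math


def dft(g, dt):
    """Forward transform (2.20): G_l = dt * sum_k g_k exp(-2 pi i k l / K)."""
    K = len(g)
    return [dt * sum(g[k] * cmath.exp(-2j * math.pi * k * l / K) for k in range(K))
            for l in range(K)]


def idft(G, df):
    """Inverse transform (2.21): g_k = df * sum_l G_l exp(+2 pi i k l / K)."""
    K = len(G)
    return [df * sum(G[l] * cmath.exp(2j * math.pi * k * l / K) for l in range(K))
            for k in range(K)]


dt = 0.004                      # 4 ms sampling interval
K = 50                          # number of time samples
df = 1.0 / (K * dt)             # from K * dt * df = 1 (2.22): 5 Hz
f_nyquist = 1.0 / (2.0 * dt)    # (2.23): 125 Hz

# A 150 Hz cosine lies above the Nyquist frequency and is therefore aliased.
g = [math.cos(2 * math.pi * 150.0 * k * dt) for k in range(K)]
G = dft(g, dt)

# Locate the strongest component between 0 Hz and the Nyquist frequency.
peak_bin = max(range(K // 2 + 1), key=lambda l: abs(G[l]))
apparent_frequency = peak_bin * df      # 100 Hz instead of 150 Hz

# The transform pair is consistent: the inverse returns the samples.
g_back = idft(G, df)
round_trip_error = max(abs(x - y) for x, y in zip(g, g_back))
```

The 150 Hz component appears at 100 Hz, i.e. mirrored around the 125 Hz Nyquist frequency, and after sampling it cannot be distinguished from a genuine 100 Hz signal. This is why the high-cut filter has to be applied before digitizing.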

2.4 The spatial Fourier transform


For years, seismic recordings have taken place at discrete receiver stations, either with
single geophones or with geophone patterns. However, it has not always been realized by
practicing seismologists that we are sampling in the horizontal direction, where we must
also take account of the Nyquist criterion. This criterion is the same as for sampling in
time; only the parameters must be interpreted differently. For sampling in one spatial
coordinate, say $x$, we define the following transform:

$$\tilde{G}(k_x) = \int_{-\infty}^{+\infty} G(x) \exp(2\pi i k_x x)\, dx, \tag{2.24}$$

where $k_x$ denotes the spatial frequency, and now the forward transform from $x$ to $k_x$ has
the opposite sign in the exponential compared to the time-frequency case. Note also that
we put a tilde on top of a quantity to denote that it is represented in the $k_x$ domain.
The original $x$-domain signal can be synthesized by integrating over all $k_x$ components,
i.e.:

$$G(x) = \int_{-\infty}^{+\infty} \tilde{G}(k_x) \exp(-2\pi i k_x x)\, dk_x. \tag{2.25}$$

This type of transformation can thus be applied to any kind of data sampled in space. For
instance, this transformation can be applied to data which represent the topography of
the surface. This type of transformation is also often applied to gravimetric and
magnetometric data, which represent the gravity and magnetic field at the surface, respectively.
It should therefore be remarked that this transformation can be applied to many kinds of data,
including data that are not at all of a geophysical nature, for instance in image processing
or economics.
Again, we have only defined the spatial Fourier transform for continuous signals, but
in seismics we deal with discrete samples in space, so we have to make the transforms
suitable for discrete signals as well. This discretisation goes in the same manner as for the
temporal Fourier transform (see also appendix A), and we obtain the pair:

$$\tilde{G}_m = \Delta x \sum_{n=0}^{N-1} G_n \exp(2\pi i m n / N) \quad \text{for } m = 0, 1, \ldots, N - 1, \tag{2.26}$$

$$G_n = \Delta k_x \sum_{m=0}^{N-1} \tilde{G}_m \exp(-2\pi i m n / N) \quad \text{for } n = 0, 1, \ldots, N - 1, \tag{2.27}$$

where we have $N$ spatial samples and we again have the relation:

$$N \Delta x \Delta k_x = 1. \tag{2.28}$$

The spatial Nyquist criterion is given by:

$$k_N = \frac{1}{2 \Delta x}, \tag{2.29}$$

i.e., we must take at least two samples per wavelength of the maximum horizontal wavenumber.
The same remarks as for the temporal Fourier transform are valid: discretising in $x$ makes
the $k_x$-spectrum periodic, and vice versa. An example of aliasing in space is given in
figures 2.6 and 2.7; it can be recognized by the "folding" back of the events in the
$(f, k_x)$ domain if the dip in the space-time domain becomes too large. Also, we have to
avoid steep slopes when windowing in order to avoid side-lobe leakage in the other domain.

2.5 The two-dimensional Fourier transform (in relation to waves)

In the previous sections we discussed the one-dimensional Fourier transform, but when
we deal with seismic data, we deal with two-dimensional data. At one sensor position, we
record the signal in time, and we record at many sensor positions, so we sample the time
and the horizontal space coordinate. In obtaining the seismic image of the subsurface,
we heavily depend on the two-dimensional nature of our data. Some operations take
place directly in the $(t, x)$ domain, but other processes are done in the so-called $(f, k_x)$
domain, where $f$ and $k_x$ are the transform variables with respect to time $t$ and space $x$,
respectively. As is well known, $f$ is the temporal frequency, and $k_x$ is the spatial frequency
in the horizontal direction, where the subscript $x$ denotes
that we are sampling in one horizontal Cartesian coordinate direction, namely $x$. The
two-dimensional Fourier transform of a signal $s(x, t)$ is defined as:

$$\tilde{S}(k_x, f) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} s(x, t) \exp[2\pi i (k_x x - f t)]\, dt\, dx. \tag{2.30}$$

For seismic applications, the two-dimensional transformation is not a simple representation
since the quantities on the axes are different: space and time. But space and time are
related, due to the fact that we deal with a wave phenomenon: a wavefront is propagating
through the earth, and we record it at the earth's surface. Therefore, the quantity $k_x$
should also be interpreted in terms of waves. For a sinusoidal signal, the frequency $f$ is
the number of cycles per unit time. In the same way, the spatial frequency $k_x$ is the
number of wavelengths per unit of space, so $k_x$ is related to the horizontal
wavelength $\lambda_x$ by:

$$k_x = \frac{1}{\lambda_x}. \tag{2.31}$$

When we substitute the relation:

$$c_x = f \lambda_x, \tag{2.32}$$

in which $c_x$ denotes the wave speed in the $x$-direction, we obtain for the spatial
wavenumber $k_x$:

$$k_x = \frac{f}{c_x}. \tag{2.33}$$

Let us look at the following example to illustrate some of the above. Suppose we have
a linear event in $x$-$t$ as depicted in figure 2.7. In general, the event can be defined as:

$$s(x, t) = s(t - x/c_x), \tag{2.34}$$
in which $s(t)$ is some characteristic function for the source signal. Applying a
two-dimensional Fourier transform, i.e. a Fourier transformation with respect to time as well
as the space coordinate $x$, gives:

$$\tilde{S}(k_x, f) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} s(t - x/c_x) \exp[2\pi i (k_x x - f t)]\, dt\, dx. \tag{2.35}$$

Putting $\tau = t - x/c_x$, we have that $dt = d\tau$; at $t = -\infty$, $\tau = -\infty$, and at
$t = +\infty$, $\tau = +\infty$. From this it follows that:

$$\tilde{S}(k_x, f) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} s(\tau) \exp[-2\pi i f \tau] \exp[-2\pi i x (f/c_x - k_x)]\, d\tau\, dx. \tag{2.36}$$

The latter expression can be considered as the product of two integrals, the first one being
a Fourier transform with respect to the variable $\tau$ only, i.e.:

$$S(f) = \int_{\tau=-\infty}^{\infty} s(\tau) \exp[-2\pi i f \tau]\, d\tau, \tag{2.37}$$

and the second integral being a Fourier transform with respect to the variable $x$ only, so
that the total expression (2.36) can be written as:

$$\tilde{S}(k_x, f) = S(f) \int_{x=-\infty}^{\infty} \exp[2\pi i x (k_x - f/c_x)]\, dx. \tag{2.38}$$

Now, because the delta function is the inverse transform of the constant value 1, we have:

$$\delta(k_x) = \int_{x=-\infty}^{\infty} \exp[2\pi i x k_x]\, dx. \tag{2.39}$$

The integral in (2.38) can be recognized as a shifted delta function of $k_x$.
Therefore it can be expressed as:

$$\tilde{S}(k_x, f) = S(f)\, \delta(k_x - f/c_x). \tag{2.40}$$

This expression can be recognized as the product of a complex frequency spectrum $S(f)$
(which is a function of the frequency $f$ only!) and a "razor-blade"-like two-dimensional
$\delta$-function, $\delta(k_x - f/c_x)$, which is zero everywhere in the $(k_x, f)$ plane except on
the line $k_x - f/c_x = 0$, where it assumes infinite amplitude. Note also that the dip of the
event in the $(f, k_x)$ domain is reciprocal to the one in the time domain. For the linear event
in the $(t, x)$ domain, we obtain:

$$\frac{\Delta t}{\Delta x} = \frac{1}{c_x}, \tag{2.41}$$

while in the $(f, k_x)$ domain we obtain:

$$\frac{\Delta f}{\Delta k_x} = c_x. \tag{2.42}$$
In the above example we saw that the wave $s(t - x/c_x)$ maps onto the function
$S(f)\,\delta(k_x - f/c_x)$ in the $(f, k_x)$ domain. But as we have seen in the definition of the
spatial wavenumber (eq. (2.33)), a frequency component $f$ is included. This is the reason
why a substitution $k_x = fp$ is often made, in which $p$ is called the horizontal slowness or
ray parameter. The latter name is often used because it relates to the parameter that is
constant across an interface when using Snell's law: $1/c_x = \sin\theta / c$. When using $k_x = fp$,
the forward transformation (eq. (2.35)) reads:

\[ \tilde{S}(fp, f) = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} s(t - x/c_x) \exp[2\pi i f (p x - t)] \, dt \, dx, \qquad (2.43) \]

where we now recognize the linear event $s(t - x/c_x)$ from above in the exponent. The
linear event maps in the $(p, f)$ domain as:

\[ \tilde{S}(fp, f) = S(f) \, \delta(p - 1/c_x) \qquad (f \neq 0). \qquad (2.44) \]

In this domain, the wave $s(t - x/c_x)$ is non-zero only for the constant ray parameter
$p = 1/c_x$. The type of transformation described here is often used in the so-called Radon
transformation. The Radon transformation exploits the wave character of our seismic
data.
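
As a numerical illustration of equation (2.40), the following sketch builds a single linear event and checks where its energy lands in the (f, kx) plane. The Ricker wavelet, its 25 Hz peak frequency, the velocity of 2000 m/s and the grid sizes are assumed values chosen for illustration only. The forward FFT is used over time and the inverse FFT over space, so that the signs agree with the convention exp[2πi(kx x - ft)] of equation (2.35).

```python
import numpy as np

# Assumed illustration parameters (not taken from the notes).
dt, nt = 0.004, 256        # time sampling (s) and number of samples
dx, nx = 10.0, 64          # trace spacing (m) and number of traces
cx = 2000.0                # apparent horizontal velocity (m/s)
f_peak = 25.0              # peak frequency of the Ricker wavelet (Hz)

t = np.arange(nt) * dt
x = np.arange(nx) * dx

def ricker(tau, f0):
    """Ricker wavelet with peak frequency f0, centred at tau = 0."""
    a = (np.pi * f0 * tau) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

# Linear event s(t - x/cx); the wavelet is centred at 0.1 s on the first trace.
data = ricker(t[:, None] - 0.1 - x[None, :] / cx, f_peak)   # shape (nt, nx)

# Forward FFT over time uses exp(-2*pi*i*f*t); inverse FFT over space uses
# exp(+2*pi*i*kx*x), which matches the sign convention of eq. (2.35).
spec = np.fft.ifft(np.fft.fft(data, axis=0), axis=1)
f = np.fft.fftfreq(nt, dt)
kx = np.fft.fftfreq(nx, dx)

# For the frequency bin closest to the wavelet peak, locate the wavenumber
# with maximum amplitude and compare it with the predicted line kx = f/cx.
i_f = np.argmin(np.abs(f - f_peak))
k_max = kx[np.argmax(np.abs(spec[i_f, :]))]
print(f"f = {f[i_f]:.2f} Hz: peak at kx = {k_max:.4f} 1/m, f/cx = {f[i_f] / cx:.4f} 1/m")
```

Because the aperture is finite, the delta function is smeared over a few wavenumber bins, so agreement can only be expected to within the bin width 1/(nx dx).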
In figure 2.6 three plane waves are displayed with increasing ray parameter, the time
and space sampling being 4 ms and 10 m respectively. Clearly, the effect of aliasing can
be observed both in the time domain (positive dip looks like negative dip) and in the
wavenumber domain (wrap-around along the wavenumber axis).
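
The wrap-around can be checked with a few lines of arithmetic. The sketch below uses the frequency of 30 Hz and the 10 m spacing quoted for figure 2.6; the helper name `apparent_wavenumber` is introduced here for illustration and does not come from the notes.

```python
import numpy as np

def apparent_wavenumber(f, p, dx):
    """Wavenumber kx = f*p as seen after sampling with spacing dx.

    Wavenumbers beyond the Nyquist value 1/(2*dx) wrap around into the
    interval [-1/(2*dx), 1/(2*dx)).
    """
    k_nyq = 1.0 / (2.0 * dx)
    kx = f * p
    return (kx + k_nyq) % (2.0 * k_nyq) - k_nyq

f = 30.0     # frequency (Hz), as in the caption of figure 2.6
dx = 10.0    # spatial sampling (m)
for p in (0.0, 0.3e-3, 2.0e-3):   # ray parameters (s/m) of figure 2.6
    k_true = f * p
    k_seen = apparent_wavenumber(f, p, dx)
    print(f"p = {p:.1e} s/m: kx = {k_true:+.3f} 1/m, sampled kx = {k_seen:+.3f} 1/m")
```

For the largest ray parameter the true wavenumber exceeds the Nyquist wavenumber of 0.05 1/m, so the sampled event appears at a negative wavenumber, which corresponds to the apparent negative dip in the time domain.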
In figure 2.7 we have created three linear events with different velocities. When we
transform the time axis to the frequency domain, we obtain an interference pattern from
which we cannot recognize very much. The last picture gives the $(f, k_x)$ spectrum obtained by
Fourier transforming with respect to the horizontal coordinate.
Later on, in chapter 4, we shall look at this domain in more detail and classify events
based on their characteristics in it. There are also other applications of the $(f, k_x)$
domain. For example, differential operators in the $x$ domain may become simple algebraic
operations in the $k_x$ domain for some simple models, making the operation easier to
understand. Also, some operations may be more stable to compute in the $k_x$ domain than
in the $x$ domain, or sometimes the other way around. In the next chapters we will see a
few applications of this.

Figure 2.6: Three discretised plane waves of frequency 30 Hz with ray parameter $p = 0$,
$p = 0.3 \times 10^{-3}$ and $p = 2.0 \times 10^{-3}$ s/m respectively, in the time domain (upper part,
panels a to c; time versus distance) and in the wavenumber domain (lower part, panels d to f;
frequency versus horizontal wavenumber). The time sampling is 4 ms and the spatial
sampling is 10 m.


Figure 2.7: A linear event in $(t, x)$ (upper left), $(f, x)$ (upper right) and $(f, k_x)$ (lower).

Chapter 3

A basic seismic processing sequence

3.1 Seismic processing and imaging
Wave propagation versus signal to noise ratio

In seismic processing we manipulate our measured data such that we obtain an accurate
image of the subsurface. The main problem is that the information we measure at the
surface, which is a function of time, should be mapped to the correct position in depth in
the subsurface. This means that reflection energy has to be repositioned, which is called
migration. In figure 3.1 it is clearly visible that three reflections from totally different
points in the subsurface are received at the same geophone position.

Figure 3.1: Reflections in time (a) and in depth (b).

We can consider two ways of introducing seismic processing to a newcomer.
One is in terms of wave theory. We have to understand the physical processes that
are involved all the way from the seismic source, through the subsurface, to the seismic
recording instrument. We have to try to obtain only those features which are due to the
structure of the subsurface and not related to other features. For instance, we want to
know the source signal we put into the earth because then we can remove it from our data
later: the structure of the subsurface does not depend on the source signal. In this way
we can remove or suppress certain unwanted features in the image we want to obtain.
Another way of introducing seismic processing to a newcomer is more in terms of the
image we obtain: signal-to-noise ratio and resolution. In order to see the image we need
to have at least a moderate signal-to-noise ratio. We would like this ratio to be as large
as possible by trying to suppress unwanted features in the final image. Another aspect
of the final seismic image is the resolution: we would like the image to be as "crisp" as
possible. As you may know, these two aspects cannot be seen separately. Usually, given a
certain data set, an increase in signal-to-noise ratio decreases the resolution (as information
is stacked together), and an increase in resolution (by correctly incorporating wave
theory) normally has the consequence that the signal-to-noise ratio gets worse. In seismic
processing we would like to obtain the optimum between the two: a good, although not
perfect, signal-to-noise ratio with a good resolution.
In these notes we take the view of trying to understand each process in the wave
problem, and try to find ways to cope with them. In this way we hope at least to increase
the signal-to-noise ratio, perhaps at some cost with respect to resolution. This is perhaps
a very important characteristic of raw seismic data: it has a very poor signal-to-noise
ratio, and it needs a lot of cleaning up before the image of the subsurface can be made
visible. It is along this line that we will discuss seismic processing: trying to understand
the physical processes. Sometimes, we will refer to the effect it can have on the total signal
in terms of signal-to-noise ratio and resolution.
With seismic processing, we have many physical processes we have to take into account.
Actually, there are too many and this means that we must make simplifying
assumptions. First, we only look at reflected energy, not at critically refracted waves,
resonances, surface waves, etc. Of course, these types of waves contain much information
about the subsurface (e.g. the surface waves contain information about the upper layers) but
these waves are treated as noise. Critically refracted waves also contain useful information
about the subsurface. That information is indeed used indirectly in reflection seismics
via determining static corrections, but in the seismic processing itself, this information
is thrown away and thus treated as noise. Another important assumption in processing
is that the earth is not elastic, but acoustic. In conventional processing, we mostly look
at P-wave arrivals, and neglect any mode-conversion to S-waves, and even if we consider
S-waves, we do not include any conversions to P-waves. Some elastic-wave processing is
done in research environments, but it is still very rarely used in production. Money is
better spent on 3-D "P-wave" seismics, rather than on 2-D "elastic" seismics; 3-D seismics
with three-component sources and receivers is still prohibitively expensive in seismic data
acquisition.
As said previously, the conventional way of processing is to obtain an image of the
primary P-wave reflectivity, so the image could be called the "primary P-wave reflectivity
image". All other arrivals/signals are treated as noise. As the name "primary P-wave
reflectivity" suggests, multiples are treated as noise (as opposed to "primaries"); S-waves
are treated as noise (as opposed to P-waves); refractions are treated as noise (as opposed
to reflectivity). Therefore, we can define the signal-to-noise ratio as:

\[ \frac{S}{N} = \frac{\text{Signal}}{\text{Noise}} = \frac{\text{Primary P-wave Reflection Energy}}{\text{All but Primary P-wave Reflection Energy}}. \qquad (3.1) \]

It can be seen now that processing of seismic data is to cancel out and/or remove all
the energy which is not primary P-wave reflectivity energy, and to "map" the reflectivity in
depth from the time recordings made at the surface. In terms of the total impulse response of
the earth $G(x, y, t)$, we want to obtain that part of the impulse response of the earth which
is due to primary P-wave reflections:

\[ G(x, y, t) \xrightarrow{\text{Processing}} G_{\text{primary P-wave reflectivity}}(x, y, z). \qquad (3.2) \]

Overview of seismic processing steps

In this chapter, we will look at a basic processing sequence in order to see the backbone
of a larger sequence. The steps which will be considered here are common to seismic
processing of data gathered on land (onshore) as well as at sea (offshore). These are:
CMP sorting, NMO correction and velocity analysis, stacking, migration and time-to-depth
conversion. Although this is a basic processing sequence, it does not mean that this will
always give a good image: on land, statics problems can be the largest problem and have to
be dealt with separately; also on land we have to deal with the surface waves, which are
often removed by $(f, k_x)$ filtering; at sea the source wavelet is not always a clean one and
one has to cancel this effect via signature deconvolution. Another problem for marine data
is the strong multiples that are mainly generated in the water layer. We will certainly
discuss these processes but leave them until the next chapter.

3.2 Sorting of seismic data


Common shot and common receiver gathers

When data is shot in the field, we record the shots sequentially. By a (shot) record
we mean all the recordings from the sensors for a single shot experiment. Normally, the
measurement for one source at one receiver location is called a trace, which is a time
series of reflections. Naturally, for each shot we order these recordings (traces)
by increasing (or decreasing) offset. The offset is defined as the distance from source to
receiver. A simple simulated example of such a shot is given in figure 3.2. In this figure
on the left-hand side the ray paths of the seismic waves from the source to the receivers are
shown. Note that due to the different velocities in the different layers, the ray paths are
bent according to Snell's law. For this record, one shot consists of the explosion of
one charge of dynamite (supposing it is measured on land). The data is stored in the
recording instrument and then put onto a magnetic tape, record by record.
When the next shot is fired, we do the same: record with the instrument and then
write the data onto tape. We say that the data is shot ordered. A section as shown in
figure 3.2 is commonly called a common-shot gather, or common-shot panel: we show the
recorded wavefield for one shot.
It can be guessed that if we talk about shot-ordered data, we could also have receiver-ordered
data. This is indeed the case. One could get all the shots together, of course in
order of increasing shot position, belonging to one receiver position. Such a gather is called
a common-receiver gather (or panel). However, this assumes that during acquisition the
same receiver position is covered by different shots. In practice, we often make use of
reciprocity: interchanging source and receiver will give exactly the same response (if the
directional properties of the source and receiver can be considered identical). In fact figure
3.2 can also be considered as a common-receiver gather, where all ray paths from different
shots come together at one receiver position.
Why should we need these distinctions? A nice feature of a common-shot gather is
that we can see whether a receiver position has a higher elevation than its neighbors and thus
gives an extra time shift in its record. This effect is called "statics". Therefore common-shot
gathers are good for detecting geophone statics. In the same way, we can see on
common-receiver gathers whether a shot was set deeper than the neighboring shot positions, and
therefore common-receiver gathers are good for detecting shot statics (see Chapter 4).

Figure 3.2: Shot gather measurement.

Common midpoint gathers

The way of organizing the data in common-shot gathers is just a consequence of the
logistics in the field, but for some processing steps it is not a convenient way of sorting the data.
A commonly used way of sorting the data is in common-midpoint gathers. A midpoint is
here defined as the point midway between the source and receiver positions. An illustration of the
midpoint is given in figure 3.3. We gather those traces that have a certain midpoint in
common, like in figure 3.3 the record from receiver 3 due to shot 1 and the record from
receiver 1 due to shot 2. Once we have gathered all the traces with a common-midpoint
(CMP) position, we have to decide how to order these records for one CMP, and the logical
choice is to order them by increasing (or decreasing) offset. A gather for one midpoint
position with the traces ordered by increasing (or decreasing) offset is called a common-midpoint
gather (or panel). Figure 3.4 shows a CMP gather for the same subsurface model as figure
3.2.
For what reason is the common-midpoint gather convenient? The most important one
is stacking, which we shall discuss in one of the next sections. Suppose the earth
consists of horizontal layers as depicted in figure 3.5. Then the geometrical arrivals from shot
to receiver all reflect right below the midpoint between the source and receiver, and thus
the reflection points in the subsurface only differ in depth. In other words, all the
reflections measured at the different offsets in a CMP gather carry information on the same
subsurface points (below the midpoint position). If we make a correction for the
offset dependence of the traveltime for each trace, the reflections from the same place
arrive at the same time for all the traces, and thus we can add the traces together to
increase the signal-to-noise ratio. These two steps are called normal move-out (NMO) correction
and stacking respectively, as will be discussed later. This argumentation is not valid for
common-shot gathers since the reflection points in the subsurface do not coincide for each
trace (for a horizontally layered earth). However, for a laterally varying medium, as shown
in figure 3.4, the reflections within a CMP gather still come from a small region, and
the stacking procedure may still give acceptable results. As a result, the resolution of the
final image will be limited by the assumption that all energy in a CMP gather comes from
the same subsurface points. In chapter 4 we will see that corrections can be included for
dipping reflectors, such that this midpoint smear is corrected for (DMO).

Figure 3.3: Midpoint definition in between sources and receivers.

Common offset gathers

As can be expected, we can also form a common-offset gather, a gather in which we
collect all those source-receiver pairs that have a certain offset in common. Usually, we
shoot with fixed distances between source and receivers, and so we will have as many traces
in our common-offset gather as there are shots, thus often quite a large number. For the
model of figure 3.2 and figure 3.4 the zero-offset configuration (i.e. source and receivers
at the same positions) is shown in figure 3.6. Note that in the zero-offset section the
general structures can already be recognized. Common-offset gathers are used in prestack
migration algorithms since they can give a check on velocities. Migrating a common-offset
gather for a small offset should give the same image as a migration of such a gather for
a large offset; otherwise the velocity used in the migration is not the right one. The
zero-offset section takes a special place in seismic processing, as a stacked section is
supposed to resemble a zero-offset section (see section 3.4 on stacking).

Figure 3.4: Common midpoint gather.

Figure 3.5: Common shot and common midpoint gather for horizontally layered earth.

Figure 3.6: Zero offset gather.
A graph combining all this information is given in figure 3.7. Here we assumed we have
recorded along a line in the field, which we call the x-direction. Also, we have assumed
that we have 10 receiver positions with the first receiver at the source location (i.e. at zero
offset). On the horizontal axis we have plotted the x-coordinate of the source ($x_s$), while
on the vertical axis we have put the x-coordinate of the receiver ($x_r$). Then, each grid
point determines where a recording has taken place. In this graph a column represents a
common-shot gather, and a horizontal line a common-receiver gather. A common-midpoint
gather is given by the line $x_s + x_r = \text{constant}$, which is a line at 45 degrees with a negative
slope. A common-offset gather is given by the line $x_s - x_r = \text{constant}$, which is a line at
45 degrees but now with a positive slope.
What can be noticed in the graph is that we started out with 10 receiver positions for
each shot, while the CMP gather contains only 5 traces. Why is that so? This can be seen
in figure 3.3. When we shift one source position to the next, we actually shift two CMPs
because the distance between each CMP is half the source spacing. So a factor of two is
involved. On the other hand there are twice as many CMP gathers, as the total number of traces
in the survey is of course constant in any sorting domain. Because each CMP gather has
half the number of traces, the distance between two traces is twice as large
as between two receivers in a shot gather. In other words, each CMP gather has a spatial
sampling that is twice as coarse as that of a common-shot gather.
In figure 3.7 we assumed the spacing between the shot positions and the receiver
positions to be the same, but this does not need to be so. This also influences the number
of traces in a CMP gather. The number of traces in a CMP gather is called the multiplicity
or the fold. It can be shown easily that the multiplicity $M$ is:

\[ M = \frac{N_r}{2 \, \Delta x_s / \Delta x_r}, \qquad (3.3) \]

in which $N_r$ is the number of receivers per shot, $\Delta x_s$ is the spacing between the shot
positions, and $\Delta x_r$ is the spacing between the receivers.
In the above argumentation there is still one assumption made, and that is that the
earth is horizontally layered. When the earth is not like that, the reflection points do
not coincide any more, see figure 3.4. Still, the results obtained with this assumption are
very good; they only become worse when the dips of the layers of the earth become
steep. We will come to that in section 3.4 when discussing stacking.
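
The fold can also be counted directly from the acquisition geometry. The sketch below assumes an end-on spread with the first receiver at the source position, as in figure 3.7; the spacing of 25 m and the number of shots are assumed values, and the function name `cmp_fold` is introduced here for illustration.

```python
from collections import Counter

def cmp_fold(n_shots, n_receivers, dxs, dxr):
    """Count traces per midpoint for an end-on spread.

    The first receiver is placed at the source position (zero offset), as
    assumed for figure 3.7. Returns a Counter mapping midpoint -> fold.
    """
    fold = Counter()
    for i in range(n_shots):
        xs = i * dxs
        for j in range(n_receivers):
            xr = xs + j * dxr
            midpoint = 0.5 * (xs + xr)
            fold[round(midpoint, 6)] += 1
    return fold

n_receivers, dxs, dxr = 10, 25.0, 25.0   # spacings in m are assumed values
fold = cmp_fold(n_shots=40, n_receivers=n_receivers, dxs=dxs, dxr=dxr)
full_fold = max(fold.values())                 # fold away from the line ends
predicted = n_receivers * dxr / (2.0 * dxs)    # eq. (3.3)
print(full_fold, predicted)
```

Near the two ends of the line the fold tapers off, so the counted maximum is the value to compare with equation (3.3).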
Figure 3.7: Relation between different sortings of seismic data. The recordings are plotted as
crosses on a grid of source coordinate $x_s$ versus receiver coordinate $x_r$; a shot gather, a CMP
gather, a common-receiver gather and a common-offset gather are indicated.

Figure 3.8: a) Distances in a subsurface model with one flat reflector. b) NMO curve
for the geometry of a) with depth 300 m and velocity of 1500 m/s. The dashed line is the
parabolic approximation of the hyperbola.

3.3 Normal move-out and velocity analysis


NMO curve for single interface

The most important physical parameter needed for obtaining an accurate image of the
subsurface is the velocity of the medium. We record our data at the surface in time, and
what we wish to obtain is an image of the subsurface in depth. The link between time and
depth is of course the wave velocity, which varies in the earth from position to position
(i.e. the earth is an inhomogeneous medium). Unfortunately, it is not so easy to obtain a
good velocity model and this is often an iterative process. In this section we will discuss
the effect of the velocity on the obtained data. We will first discuss some simple models
in order to understand the features we can encounter in real data. As a consequence of
this, we will discuss which operations have to be applied to the data in order to obtain
the desired information. We assume here that we deal with a CMP gather.
Let us first consider the reflection from a single interface as depicted in figure 3.8. The
time for the geometrical ray from source to receiver is given by:

\[ T = \frac{R}{c} = \frac{(4d^2 + x^2)^{1/2}}{c}, \qquad (3.4) \]

in which $x$ is the source-receiver distance, $R$ is the total distance traveled by the ray, $d$ is
the thickness of the layer and $c$ is the wave speed. When we write $2d/c$ as $T_0$, we can
rewrite this equation as:

\[ T = T_0 \left( 1 + \frac{x^2}{c^2 T_0^2} \right)^{1/2}. \qquad (3.5) \]

Note that this function describes a hyperbola. We can see that we have an extra time
delay due to the factor $x^2/(c^2 T_0^2)$. The extra time delay is called the Normal Move Out,
abbreviated to NMO. This extra term is solely due to the offset of the receiver with
respect to the source; at coincident source-receiver position this term is zero. Often, the
square-root term in this equation is approximated by its one-term Taylor series expansion,
i.e.:

\[ T \simeq T_0 + \frac{x^2}{2 c^2 T_0}. \qquad (3.6) \]

Figure 3.9: a) Distances in a subsurface model with two flat reflectors. b) NMO curve for the
second reflector with depth 300 m of each layer and velocities of 1500 m/s and 2500 m/s
in the first and second layer respectively. The dashed line is the hyperbolic approximation
of the traveltime curve.
Figure 3.8b shows the traveltime curve for a layer of 300 meter depth and a velocity of
1500 m/s. The dashed line in this figure shows the parabolic approximation according to
equation (3.6).
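
The curves of figure 3.8b can be reproduced in a few lines. The following sketch evaluates equations (3.4) to (3.6) for the depth and velocity quoted above; the list of offsets is an assumed choice.

```python
import numpy as np

d, c = 300.0, 1500.0                  # reflector depth (m) and velocity (m/s), as in figure 3.8b
x = np.linspace(0.0, 2000.0, 11)      # offsets (m), assumed for illustration

t0 = 2.0 * d / c                                   # zero-offset time T0
t_exact = np.sqrt(4.0 * d**2 + x**2) / c           # eq. (3.4)
t_hyper = t0 * np.sqrt(1.0 + x**2 / (c * t0)**2)   # eq. (3.5), identical to (3.4)
t_parab = t0 + x**2 / (2.0 * c**2 * t0)            # eq. (3.6)

for xi, te, tp in zip(x, t_exact, t_parab):
    print(f"x = {xi:6.0f} m: exact T = {te:.3f} s, parabolic T = {tp:.3f} s")
```

The parabola always lies above the hyperbola and is only accurate when the offset is small compared with $c T_0$, which here equals twice the reflector depth.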
In seismic processing we are not interested in the extra time delay due to the receiver
position: the image of the subsurface should be independent of it. The removal of the
extra time delay due to NMO is called the NMO correction.

NMO curve for more than one interface

Let us now move to a model with two interfaces, as depicted in figure 3.9. We call the
source-receiver distance $x$, the horizontal distance the ray has traveled in the second layer
$x_2$, the wave speed in the first layer $c_1$ and in the second $c_2$, the thickness of the first
layer $d_1$ and of the second $d_2$. Then the traveltime from source to receiver is given by:

\[ T = \frac{\left[ (x - x_2)^2 + 4 d_1^2 \right]^{1/2}}{c_1} + \frac{\left[ x_2^2 + 4 d_2^2 \right]^{1/2}}{c_2} \qquad (3.7) \]

\[ = \frac{2 d_1}{c_1} \left( 1 + \frac{(x - x_2)^2}{4 d_1^2} \right)^{1/2} + \frac{2 d_2}{c_2} \left( 1 + \frac{x_2^2}{4 d_2^2} \right)^{1/2} \qquad (3.8) \]

\[ = T_1 \left( 1 + \frac{x_1^2}{T_1^2 c_1^2} \right)^{1/2} + T_2 \left( 1 + \frac{x_2^2}{T_2^2 c_2^2} \right)^{1/2}, \qquad (3.9) \]

in which $T_1$ and $T_2$ are the zero-offset traveltimes through the first and second layer
respectively, and $x_1 = x - x_2$. The problem with this formula is that, even if we assume that $c_1$
and $c_2$ are known, we do not know $x_2$. Therefore we cannot directly use this expression
to describe the move-out behaviour of this two-reflector model.
In order to tackle this, we first expand the square-root terms in equation (3.9) in a
Taylor series expansion as we did for the one-interface case:

\[ T \simeq T_1 + \frac{x_1^2}{2 T_1 c_1^2} + T_2 + \frac{x_2^2}{2 T_2 c_2^2}, \qquad (3.10) \]

and we square this equation in order to obtain:

\[ T^2 = (T_1 + T_2)^2 + (T_1 + T_2) \left( \frac{x_1^2}{T_1 c_1^2} + \frac{x_2^2}{T_2 c_2^2} \right) + O(x^4). \qquad (3.11) \]

In this equation, we still have the distances $x_1$ and $x_2$ present. A relation between $x_1$
and $x_2$ can be found using Snell's law at the interface, being:

\[ \frac{\sin\alpha}{c_1} = \frac{\sin\beta}{c_2}, \qquad (3.12) \]

where $\alpha$ and $\beta$ are the angles of the ray with the normal in layer 1 and 2 respectively,
when crossing the first interface (see also figure 3.9). We make an approximation for small
angles, for which $\sin\alpha \approx \tan\alpha$ and $\sin\beta \approx \tan\beta$, such that equation (3.12) becomes:

\[ \frac{x_1}{2 d_1 c_1} \approx \frac{x_2}{2 d_2 c_2}, \qquad (3.13) \]

or

\[ \frac{x_1}{T_1 c_1^2} \approx \frac{x_2}{T_2 c_2^2}. \qquad (3.14) \]

Writing this as $x_2 = \left[ (T_2 c_2^2)/(T_1 c_1^2) \right] x_1$ and substituting this in $x_1 + x_2 = x$, we have:

\[ x_1 = x \, \frac{T_1 c_1^2}{T_1 c_1^2 + T_2 c_2^2}. \qquad (3.15) \]

Similarly for $x_2$, we obtain:

\[ x_2 = x \, \frac{T_2 c_2^2}{T_1 c_1^2 + T_2 c_2^2}. \qquad (3.16) \]

We can use equations (3.15) and (3.16) in the quadratic form of eq. (3.11) to obtain:

\[ T^2 \simeq (T_1 + T_2)^2 + (T_1 + T_2) \, x^2 \, \frac{T_1 c_1^2 + T_2 c_2^2}{(T_1 c_1^2 + T_2 c_2^2)^2} \qquad (3.17) \]

\[ \simeq (T_1 + T_2)^2 + \frac{T_1 + T_2}{T_1 c_1^2 + T_2 c_2^2} \, x^2. \qquad (3.18) \]

This equation is of the form:

\[ T^2 = T_{tot}(0)^2 + \frac{x^2}{c_{rms}^2}, \qquad (3.19) \]

where $c_{rms}$ is what is called the root-mean-square velocity:

\[ c_{rms}^2 = \frac{1}{T_{tot}(0)} \sum_{i=1}^{N} c_i^2 \, T_i(0), \qquad (3.20) \]

in which $T_i(0)$ denotes the zero-offset traveltime through the $i$-th layer and $T_{tot}(0)$ denotes
the total zero-offset time:

\[ T_{tot}(0) = \sum_{i=1}^{N} T_i(0). \qquad (3.21) \]

Figure 3.10: CMP gather with one reflection before (a) and after NMO correction (b). The
amplitude spectrum in (c) of the NMO corrected gather shows the frequency reduction
due to the stretching effect.


We see here that with the assumptions made, a hyperbolic move-out for the interfaces
below the first one is obtained. The approximation is a very good one at small
and intermediate offsets (for horizontal layers) but becomes worse when the offset becomes
large. This effect can be observed in figure 3.9b, where the hyperbolic approximation of
the second interface reflection is plotted with a dashed line.
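
The quality of the hyperbolic approximation can be checked numerically for the two-layer model of figure 3.9. The sketch below computes the root-mean-square velocity from equations (3.20) and (3.21) and compares equation (3.19) with the exact traveltime. The exact curve is obtained by tracing rays for a range of ray parameters, which is a choice made here for illustration and not a procedure described in the notes.

```python
import numpy as np

# Two-layer model of figure 3.9: thicknesses (m) and velocities (m/s).
d = np.array([300.0, 300.0])
c = np.array([1500.0, 2500.0])

t_layer = 2.0 * d / c                               # zero-offset times T_i(0)
t_tot = t_layer.sum()                               # eq. (3.21)
c_rms = np.sqrt(np.sum(c**2 * t_layer) / t_tot)     # eq. (3.20)

# Exact reflection from the second interface, parametrised by the ray
# parameter p = sin(angle)/c, which is constant along the ray (Snell's law).
p = np.linspace(0.0, 0.9 / c.max(), 50)
cos_i = np.sqrt(1.0 - (p[:, None] * c[None, :])**2)     # shape (50, 2)
x = np.sum(2.0 * d * p[:, None] * c / cos_i, axis=1)    # offset per ray
t_exact = np.sum(2.0 * d / (c * cos_i), axis=1)         # traveltime per ray

t_hyper = np.sqrt(t_tot**2 + x**2 / c_rms**2)           # eq. (3.19)
misfit = t_hyper - t_exact
print(f"c_rms = {c_rms:.1f} m/s, largest offset = {x[-1]:.0f} m, "
      f"largest misfit = {misfit.max() * 1e3:.1f} ms")
```

The misfit is negligible at small offsets and grows with offset, with the hyperbola arriving later than the exact curve.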

Applying NMO correction

Then, how do we apply this NMO correction? First we have to determine the stacking
(i.e. root-mean-square) velocities for each zero-offset time $T_0$ (see next section). Each
sample of the zero-offset trace will then remain in its position. For a trace with offset $x$,
we calculate the position of the reflection according to equation (3.19) and find the sample
nearest to this time $T$. This sample is then time-shifted back by the time difference
between $T$ and $T_0$ (in fact it is mapped from time $T$ to time $T_0$). In this simple scheme
we have taken the sample nearest to the time $T$, but in general we can be much more
accurate by using a better interpolation scheme. It is important to realize that with NMO
we interpolate the data.
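
The mapping from $T$ to $T_0$ described above can be sketched as follows. Linear interpolation is used instead of the nearest sample; the function name `nmo_correct`, the Gaussian test pulse and all numerical values are assumptions made for this illustration.

```python
import numpy as np

def nmo_correct(gather, dt, offsets, v_rms):
    """NMO-correct a CMP gather with linear interpolation.

    gather  : array (nt, nx), one trace per column
    dt      : time sampling (s)
    offsets : array (nx,), source-receiver offsets (m)
    v_rms   : stacking velocity (m/s); a scalar or an array (nt,) giving
              the velocity for each zero-offset time T0
    """
    nt, nx = gather.shape
    t0 = np.arange(nt) * dt
    v = np.broadcast_to(np.asarray(v_rms, dtype=float), (nt,))
    out = np.zeros((nt, nx))
    for ix, x in enumerate(offsets):
        # Time T on the input trace that maps to each output time T0, eq. (3.19).
        t = np.sqrt(t0**2 + (x / v)**2)
        # Interpolate the input trace at T; samples beyond the trace end become zero.
        out[:, ix] = np.interp(t, t0, gather[:, ix], left=0.0, right=0.0)
    return out

# Synthetic test: a smooth pulse placed on the hyperbola T(x) for T0 = 0.4 s.
dt, nt = 0.004, 251
offsets = np.arange(0.0, 1001.0, 100.0)
v_true, t_zero = 1500.0, 0.4
t = np.arange(nt) * dt
t_event = np.sqrt(t_zero**2 + (offsets / v_true)**2)
gather = np.exp(-((t[:, None] - t_event[None, :]) / 0.01)**2)

corrected = nmo_correct(gather, dt, offsets, v_true)
peak_times = t[np.argmax(corrected, axis=0)]   # should all be close to 0.4 s
print(peak_times)
```

With the correct velocity the pulse ends up at the zero-offset time on every trace, to within about one sample because of the interpolation.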
An artifact of the NMO correction is the NMO stretch. An example of this effect is
shown in figure 3.10. How does this occur? We can see that the correction not only
depends on the offset $x$ and the velocity $c_{rms}$, but also on the time $T_0$. So given a certain
stacking velocity and offset, the correction $T - T_0$ becomes smaller when $T_0$ becomes larger.
This is visible in figure 3.11, where for the second event smaller time shifts need to be applied
compared to the first event. Thus, the correction is not constant along a trace, even if we
have a constant offset and constant velocity. Also, we can see from this correction that
the effect will become more prominent when the offset becomes larger. This effect
is called NMO stretching. A measure for this stretch is the quantity $\Delta t_{NMO}/T_0$, with the
applied NMO correction being defined as $\Delta t_{NMO} = T - T_0$. This can be seen by analyzing
the change in NMO correction as a function of $T_0$:

\[ \frac{\partial \Delta t_{NMO}}{\partial T_0} = \frac{\partial}{\partial T_0} \left[ (T_0^2 + x^2/c^2)^{1/2} - T_0 \right] \qquad (3.22) \]

\[ = \frac{T_0}{(T_0^2 + x^2/c^2)^{1/2}} - 1 = \frac{T_0 - (T_0^2 + x^2/c^2)^{1/2}}{(T_0^2 + x^2/c^2)^{1/2}} \qquad (3.23) \]

\[ \simeq -\frac{\Delta t_{NMO}}{T_0}. \qquad (3.24) \]

This quantity relates to the frequency distortion by:

\[ \frac{\Delta t_{NMO}}{T_0} = \frac{\Delta f}{f_{dom}}, \qquad (3.25) \]


where $f_{dom}$ is the dominant frequency and $\Delta f$ is the change in the dominant frequency
due to the stretching, supposing we have a constant velocity. This frequency distortion is
clearly visible if we take the amplitude spectrum of each NMO corrected trace in figure
3.10b, as displayed in figure 3.10c. The frequency content decreases dramatically towards
the large offsets. In this example $T_0 = 0.44$ s and $\Delta t_{NMO} = 0.5$ s at the largest offsets,
giving a stretch of 1.14, meaning that the frequency reduces to less than half the original
bandwidth, which can indeed be observed in figure 3.10c.
When we are processing the data, we do not want too severe a signal distortion,
and therefore the data which is distorted by more than a certain threshold (say 100%)
is zeroed; this is called the mute. The measure for this distortion is as given in equation (3.25).
Figure 3.12 shows an example of a CMP gather with two reflections to which NMO correction
is applied with and without the mute. In figure 3.12c the signal is muted
when more than 50% stretch is present.
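
The stretch measure and the resulting mute are easy to evaluate. In the sketch below the function name `nmo_stretch`, the constant velocity of 1500 m/s and the offset range are assumed for illustration; the 50% threshold is the one quoted for figure 3.12c.

```python
import numpy as np

def nmo_stretch(t0, x, v):
    """Relative stretch dt_NMO / T0 of eq. (3.25) for zero-offset time t0 (s),
    offset x (m) and velocity v (m/s)."""
    dt_nmo = np.sqrt(t0**2 + (x / v)**2) - t0
    return dt_nmo / t0

# The example from the text: T0 = 0.44 s with an NMO correction of 0.5 s.
print(f"stretch = {0.5 / 0.44:.2f}")

# Mute mask for a gather: keep samples whose stretch stays below a threshold.
dt, nt = 0.004, 251
t0 = np.arange(1, nt) * dt                   # skip T0 = 0 to avoid division by zero
offsets = np.arange(0.0, 1001.0, 100.0)
stretch = nmo_stretch(t0[:, None], offsets[None, :], 1500.0)
keep = stretch < 0.5                         # 50% threshold, as in figure 3.12c
print(keep.sum(axis=0))                      # number of surviving samples per trace
```

Because the stretch decreases with $T_0$ and increases with offset, the mute removes the early part of the far-offset traces.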
Figure 3.11: NMO correction is applied as a time and offset dependent time shift of the
data.

Velocity estimation

In the application of the NMO correction, there is of course one big question: which
velocity do we use? Indeed, we do not know the stacking velocity beforehand. Actually,
we use the alignment of a reflection in a CMP gather as a measure for the velocity, since,
if the velocity is right, the reflection will align perfectly. However, when the velocity is
taken too small, the correction is too large and the reflection will not align well; in the
same way, when the velocity is chosen too big, the correction is too small, and again the
reflection will not align. An example of these cases is given in figure 3.13.
As the earth consists of more than one interface, we need to determine the velocities
for each layer, although they may just be root-mean-square velocities. The goal is the
same as in the case of just one interface: we would like all the reflections to be horizontally
aligned. A systematic way of determining these velocities is to make common-midpoint
panels which are each NMO corrected for a constant velocity. Then we can see for which
velocities each reflector aligns; usually the deeper the interface, the higher the
(root-mean-square) velocity. An example of such an analysis is given for a four-reflector
medium (see figure 3.5) in figure 3.14.
Another way of determining velocities is via $t^2 - x^2$ analysis. For this analysis we have
to pick the traveltimes for a certain reflector and plot their squares as a function of $x^2$. As we
have seen with multiple interfaces, the slope of this curve should be $1/c_{rms}^2$, and thus we
know the stacking velocity. This method can be quite accurate, but whether we are able to
pick the reflection times depends on the quality of the data. An example is
given in figure 3.15, where for a two-reflector situation the axes have been stretched such
that the different velocities become visible as lines with different dips.
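
A straight-line fit in the $(x^2, t^2)$ plane gives the stacking velocity directly. The picks in the sketch below are hypothetical: they are generated from a hyperbola with $T_0 = 0.4$ s and 1500 m/s and perturbed with an assumed picking error of 1 ms.

```python
import numpy as np

# Hypothetical picks: traveltimes on a hyperbola with T0 = 0.4 s and
# c_rms = 1500 m/s, perturbed by small picking errors (assumed values).
rng = np.random.default_rng(0)
x = np.arange(100.0, 1001.0, 100.0)                  # offsets (m)
t_pick = np.sqrt(0.4**2 + (x / 1500.0)**2)
t_pick += rng.normal(scale=0.001, size=x.size)       # 1 ms picking noise

# Straight-line fit t^2 = intercept + slope * x^2, cf. eq. (3.19).
slope, intercept = np.polyfit(x**2, t_pick**2, 1)
c_est = 1.0 / np.sqrt(slope)      # slope equals 1 / c_rms^2
t0_est = np.sqrt(intercept)       # intercept equals T0^2
print(f"estimated c_rms = {c_est:.0f} m/s, T0 = {t0_est:.3f} s")
```

The intercept of the fitted line gives the zero-offset time as a by-product.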
The most commonly used way of determining velocities is via the velocity spectrum,
which can be seen as a hyperbolic transformation of the seismic reflections from the
space-time domain to the velocity-time domain. In order to determine the velocity spectrum we
correct the CMP gather for a certain stacking velocity and apply a coherency measure (i.e. a
weighted stack) to this data. This gives us one output trace. Then, for the next velocity,
we do the same. For a complete set of velocities, we plot these results next to each other,
the result being called the velocity spectrum. On the vertical axis we then have the time,
while on the horizontal axis we have the velocity. As an example we consider again the
synthetic CMP gather in the model of figure 3.5, for which we calculate the semblance
for velocities between 1000 m/s and 3000 m/s with a 50 m/s interval, see figure 3.16. The
result we obtain is often displayed in contour mode or color mode.

Figure 3.12: a) CMP gather with two reflections. b) CMP gather after NMO correction
without stretch-mute. c) CMP gather after NMO correction and stretch-mute.

Figure 3.13: CMP gather with one reflection after NMO correction with too low ($v_{nmo} < v_{rms}$),
correct ($v_{nmo} = v_{rms}$) and too high ($v_{nmo} > v_{rms}$) stacking velocities.

Figure 3.14: CMP gather NMO corrected with a range of constant NMO velocities from
1300 to 2700 m/s with steps of 200 m/s.

As a coherency measure, the semblance is most often used. The semblance $S(t, c)$ at
a time $t$ for a velocity $c$ is defined as:

$$ S(t, c) = \frac{1}{M} \, \frac{\left[ \sum_{m=1}^{M} A(x_m, t, c) \right]^2}{\sum_{m=1}^{M} A^2(x_m, t, c)}, \qquad (3.26) $$

in which $M$ is the number of traces in a CMP gather and $A$ is the amplitude of the seismogram
at offset $x_m$ and time $t$ after NMO correction with velocity $c$. For the definition of other
coherency measures, the reader is referred to [Yilmaz, 1987] (pages 169, 173). Note that
if an event is perfectly aligned with constant amplitude for all offsets, the semblance has
value 1. In general, the semblance always has values between 0 and 1.
Normally, the coherency measure is averaged over a few time samples, just to avoid
too much computing time. Say the k-th sample is in the center of the time gate, then
the semblance is:

$$ S(t_k, c) = \frac{1}{M} \, \frac{\sum_{i=k-p}^{k+p} \left[ \sum_{m=1}^{M} A(x_m, t_i, c) \right]^2}{\sum_{i=k-p}^{k+p} \sum_{m=1}^{M} A^2(x_m, t_i, c)}, \qquad (3.27) $$

with $2p + 1$ being the number of samples within the time gate. In practice, the time gate
should be chosen quite small, at most the period of the dominant frequency of the signal
($1/f_{dom}$), usually between 20 and 40 ms, otherwise we lose too much temporal resolution.
The effect of using different time gates is shown in figure 3.17.
For a more extensive discussion on velocity analysis we refer to
[Yilmaz, 1987] (pp. 166-182).
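The velocity spectrum based on the gated semblance of equation (3.27) can be sketched as follows. This is an illustrative sketch only, assuming NumPy; the function names `nmo_correct` and `semblance`, the linear interpolation, the small stabilisation constant in the denominator and the synthetic one-reflector gather (c = 2000 m/s, t0 = 0.5 s) are all assumptions made here, not part of the original notes.

```python
import numpy as np

def nmo_correct(gather, offsets, dt, c):
    """NMO-correct a CMP gather (nt x ntraces) with one constant velocity c."""
    nt = gather.shape[0]
    t0 = np.arange(nt) * dt
    out = np.zeros_like(gather, dtype=float)
    for m, x in enumerate(offsets):
        t_x = np.sqrt(t0 ** 2 + (x / c) ** 2)      # hyperbolic traveltime
        # Linear interpolation; times outside the record are set to zero.
        out[:, m] = np.interp(t_x, t0, gather[:, m], left=0.0, right=0.0)
    return out

def semblance(gather, offsets, dt, velocities, p=2):
    """Velocity spectrum S(t_k, c) with a time gate of 2p+1 samples."""
    nt, ntr = gather.shape
    gate = np.ones(2 * p + 1)
    spectrum = np.zeros((nt, len(velocities)))
    for j, c in enumerate(velocities):
        a = nmo_correct(gather, offsets, dt, c)
        num = np.convolve(a.sum(axis=1) ** 2, gate, mode="same")
        den = np.convolve((a ** 2).sum(axis=1), gate, mode="same")
        spectrum[:, j] = num / (ntr * den + 1e-12)  # small constant avoids 0/0
    return spectrum

# Synthetic gather with one hyperbolic event (assumed parameters).
dt, nt = 0.002, 501
offsets = np.arange(0.0, 1001.0, 50.0)
t = np.arange(nt) * dt
gather = np.zeros((nt, offsets.size))
for m, x in enumerate(offsets):
    t_ev = np.sqrt(0.5 ** 2 + (x / 2000.0) ** 2)
    gather[:, m] = np.exp(-((t - t_ev) / 0.008) ** 2)   # Gaussian pulse

velocities = np.arange(1000.0, 3001.0, 50.0)
spec = semblance(gather, offsets, dt, velocities)
k, j = np.unravel_index(np.argmax(spec), spec.shape)
print("semblance peak at t = %.3f s, c = %.0f m/s" % (t[k], velocities[j]))
```

Plotting `spec` with time on the vertical axis and velocity on the horizontal axis gives a display of the kind described for figure 3.16; increasing `p` lengthens the time gate and smears the peak in time.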

Figure 3.15: CMP gather with two reflections (left) and after the $t^2$, $x^2$ axis stretching (right).
(Axes: offset (m), time (s); offset^2 (m^2), time^2 (s^2).)

Figure 3.16: CMP gather with its velocity spectrum, using a semblance calculation with
window length of 20 ms. (Axes: offset (m) and stacking velocity (m/s), time (s).)

Figure 3.17: Semblance velocity spectrum of a CMP gather with a window length of 40
ms (left) and 60 ms (right). (Axes: stacking velocity (m/s), time (s).)

Figure 3.18: CMP gather with 2 primaries and 1 multiple in the $(x, t)$ and $(f, k_x)$ domain
before (a) and after (b) NMO correction and after stacking (c).
(Axes: offset (m), time (s); horizontal wavenumber (1/m), frequency (Hz).)

3.4 Stacking
A characteristic of seismic data as obtained in the exploration for oil and gas is that
they generally show a poor signal-to-noise ratio, not only due to coherent events such
as surface waves, but also due to uncorrelated noise. Often, only the strong reflectors
show up in raw seismic data. An important goal in seismic processing is to increase the
signal-to-noise ratio, and the most important steps towards this goal are CMP sorting
and stacking. With stacking we add the NMO-corrected traces in a CMP gather to give
one output trace. A better nomenclature is perhaps horizontal stacking because we stack
in the horizontal direction. This is in contrast to vertical stacking, which is recording
the data at the same place from the same shot position several times and adding (i.e.
averaging) these results. With stacking, we average over different angles of incidence of
the waves, even in horizontally layered media. This means that we lose some information
on the reflection coefficient since, as the reader may know, the reflection coefficient of an
interface is angle-dependent. Therefore, the stacked section will contain the average
angle-dependent reflection information. It is important to realize that stacking the traces in the
$x$-$t$ domain is equal to selecting the $k_x = 0$ line in the $(f, k_x)$ domain. This is similar to
the time domain situation: the summation of all time samples yields the d.c. component
(i.e. $f = 0$ Hz) of the signal. In principle, all events in the NMO-corrected CMP gather
that are not present at $k_x = 0$ will disappear in the stack. Of course, an optimally aligned
(i.e. NMO-corrected) event will yield a large contribution at $k_x = 0$. This is visible in
figure 3.18 where a CMP gather with two primaries and one multiple is shown before
and after NMO correction. The NMO correction maps all primary energy around the zero
wavenumber axis, although the multiple energy is still spread out in the $(f, k_x)$ domain.
The resulting stack shows a reduced multiple energy, which is a desired feature of the
stack.
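The statement that stacking selects the $k_x = 0$ line can be checked numerically. The sketch below assumes NumPy and uses random numbers as a stand-in for an NMO-corrected CMP gather; the array sizes are arbitrary choices made here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
gather = rng.standard_normal((256, 32))    # stand-in gather, shape (nt, nx)

# Horizontal stack: add all traces of the gather.
stack = gather.sum(axis=1)

# Two-dimensional Fourier transform to the (f, kx) domain.
fk = np.fft.fft2(gather)

# The kx = 0 column, transformed back to time, reproduces the stack.
stack_from_fk = np.fft.ifft(fk[:, 0]).real

print(np.allclose(stack, stack_from_fk))
```

Energy that sits away from $k_x = 0$ in `fk` (for example a multiple with residual move-out) simply does not enter `stack_from_fk`, which is the multiple suppression described above.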
Although the signal-to-noise ratio is increased by stacking, we will also have introduced
some distortions. We have already discussed the NMO stretch and the approximation with
the root-mean-square velocity. Therefore, when we add traces, we do not do a perfect job
so we lose resolution. The effect of an erroneous velocity for the NMO is shown in
figure 3.19, which shows a stacked section with the correct stacking velocities and with
7% too high stacking velocities for the data generated in the model of figure 3.2. One can
see that the stacked trace is getting a lower frequency content and that the amplitudes
are decreasing in some parts with the erroneous velocities. Note that a stacked section
simulates a zero offset section, but with a much better signal-to-noise ratio. Compare
therefore the stacked result to the zero offset section of figure 3.6, which shows exactly
the same region (1000 - 4000 m) in the model. Note the resemblance of the stack with
the zero offset section. Note also that the stack is twice as densely sampled in the trace
direction, due to the fact that there are twice as many CMP positions as there are shot
positions.

Figure 3.19: Stacked sections with correct and too high stacking velocities.
(Panels: correct stacking velocities; stacking velocities 7% too high; axes: distance (m), time (s).)

Finally, it should be emphasized that, with stacking, we reduce the data volume. The
data reduction factor is the number of traces added in a CMP gather. There are
certain algorithms which are expensive to compute and are therefore applied to stacked
data rather than to pre-stack data. An example of this is migration, as shall be discussed
in the next section.

Figure 3.20: Stacked section (a) and its time migrated version (b) (from Yilmaz, 1987, fig.
4-20).

3.5 Zero-offset migration

Introduction

Although we have removed some timing effects with the NMO correction, this does
not mean that we have removed all wave effects: the move-out is just one of many. Migration deals
with a further removal of wave phenomena in order to arrive at a section which is a better
representation of the subsurface. After the NMO correction and stacking, we can still have
diffractions in our stack, such as shown in figure 3.20. Also, dipping reflectors will not be
positioned at their right place and will not show the true dip, as also shown in figure 3.20.
In migration we are going to deal with this.
We have only briefly summarized the problem, but in order to understand it better,
we have to invoke some concepts and theory, such as the exploding reflector model and
wave theory. But let us first summarize at what stage we have arrived. We sorted our
data in CMP gathers, determined the stacking velocities, applied the NMO correction and
stacked the data. In the NMO correction we removed the offset dependence of the receiver
position with respect to the source. In this step we made our data look as if it were
obtained with coincident source-receiver position, thus zero-offset. For a reflection from
a horizontal reflector, the reflection points on the reflector coincide for all traces in one
CMP gather at the same lateral location as the midpoint at the surface (see figure 3.5).
However, when we have a laterally varying structure, the reflection points on the reflector
do not coincide for the traces in one CMP gather (see figure 3.4). Still, we assume we
have done a good job and assume the reflection in the stacked section is coming from one
point in the subsurface. The main purpose of the stacking process is to reduce the data
volume and to improve the signal-to-noise ratio of the data.

Figure 3.21: Exploding reflector model for zero offset data. A zero offset measurement
can be considered as an exploding reflector response in a medium with half the velocity (c/2).

Exploding reflector model

An important concept in migration is the exploding reflector model. Consider a simple
model with one reflector in the subsurface. When we have a source which emits a signal
at t = 0, the signal will propagate through the medium to the reflector, will be reflected
and will arrive back at the receiver (= shot position for a zero offset experiment). This is
shown in figure 3.21 at the left hand side. Say the wave takes a time T to do this. Apart
from some amplitude differences, the data recorded in such a way would be the same if
we could fire off sources on the reflector at time 0 but assume half the velocity of the
medium in between. Putting the sources on the reflector is called the exploding reflector
model. This is shown in figure 3.21 at the right hand side. To get a good amplitude,
each exploding source should have the strength of the reflection coefficient of the reflector.
When we have recorded the data at time T, and would keep track of the time to get back
from time T to the reflector, we would obtain the image at time t = 0, again assuming we
have taken the right (i.e. half the original) velocity. The condition t = 0 is called the
imaging condition. In fact, from a kinematic point of view, any zero offset section can be
considered as being a record of exploding reflectors in the subsurface (for primary wave
paths).
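As a small worked check of the half-velocity argument (a sketch, assuming a constant velocity $c$; the symbol $R$ for the distance between the reflection point and the coincident source-receiver position is introduced here for illustration), the two-way traveltime of the zero offset experiment equals the one-way traveltime from an exploding source in a medium with velocity $c/2$:

```latex
T_{\text{zero offset}} = \frac{R}{c} + \frac{R}{c} = \frac{2R}{c} = \frac{R}{c/2} = T_{\text{exploding reflector}}
```

For example, with R = 1000 m and c = 2000 m/s both experiments give T = 1 s.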

Diffraction stack

Let us consider the simple example of a point diffractor in the subsurface. When
we excite a source and let the wave travel to the diffractor and let it travel back to the
receiver, we will obtain an image as shown in figure 3.22; it defines a so-called diffraction
hyperbola, according to the following formula, with the apex in $(x_d, T_d)$:

$$ T^2 = \left( \frac{2R}{c} \right)^2 = T_d^2 + \frac{4 (x_r - x_d)^2}{c^2}, \qquad (3.28) $$

where $R$ is the distance in a homogeneous medium with velocity $c$ from the diffractor
at $(x_d, z_d = c T_d / 2)$ to the surface position $x_r$.

Figure 3.22: A diffractor (left) and its seismic response (right), a hyperbola in the zero
offset section.

Before computers were in use, people used to stack along the hyperbola in order
to obtain the diffracted point. Of course, for this procedure to be effective we need to
know the stacking velocity. When we have more than one diffractor, we can do the same
procedure and stack all the hyperbolae. This is called a diffraction stack. In the early
days of computers the diffraction stack was used to apply the migration by calculating the
following formula (assumed to have a discrete number of x's, being the traces in a zero
offset section):

$$ p(x, T) = \sum_{x_s} p(x_s, T(x - x_s; c)), \qquad (3.29) $$

with $T$ defined as the hyperbolic move-out of equation (3.28) as a function of the offset
$x - x_s$ and the stacking velocity $c$. This equation indicates that all seismic energy is
added, or stacked, along a hyperbola with the apex in point $(x, T)$. What we do when
stacking along hyperbolae is actually removing the wave propagation effect from the point
diffractor to the receiver positions. A very nice feature of the diffraction stack is that
it visualizes our intuitive idea of migration, and is very useful in a conceptual sense.
So far, we considered one point diffractor, but we can build up any reflector by putting
all point diffractors on that reflector. When the spacing between the point diffractors
becomes infinitely small, the responses become identical. This concept agrees with Huygens'
principle. As an example, consider four point diffractors, as depicted on the left of figure
3.23. Each diffractor has the behaviour as discussed above, as can be seen on the right
of figure 3.23, but the combination of the time responses shows an apparent dip. The
actual dip goes, of course, through the apices of the hyperbolae.

Figure 3.23: Four point diffractors (left) and their seismic responses (right). Note the
apparent dip from the hyperbolae.
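The diffraction stack of equation (3.29) can be sketched in a few lines. This is a minimal illustration, assuming NumPy, a constant velocity, nearest-sample time picking and no amplitude weighting; the function name `diffraction_stack` and the synthetic point-diffractor section are hypothetical.

```python
import numpy as np

def diffraction_stack(section, dx, dt, c):
    """Sum a zero offset section (nt x nx) along diffraction hyperbolae."""
    nt, nx = section.shape
    x = np.arange(nx) * dx
    t_apex = np.arange(nt) * dt
    image = np.zeros((nt, nx))
    for ix in range(nx):                        # output position x
        for js in range(nx):                    # input trace position x_s
            # Hyperbolic move-out T(x - x_s; c) for all apex times at once.
            t_s = np.sqrt(t_apex ** 2 + 4.0 * (x[ix] - x[js]) ** 2 / c ** 2)
            it = np.rint(t_s / dt).astype(int)  # nearest time sample
            ok = it < nt                        # stay inside the record
            image[ok, ix] += section[it[ok], js]
    return image

# Synthetic zero offset section: one diffraction hyperbola (assumed values).
dx, dt, c = 10.0, 0.004, 2000.0
nt, nx = 251, 101
x = np.arange(nx) * dx
xd, td = 500.0, 100 * dt                        # apex at x = 500 m, T = 0.4 s
it_d = np.rint(np.sqrt(td ** 2 + 4.0 * (x - xd) ** 2 / c ** 2) / dt).astype(int)
section = np.zeros((nt, nx))
section[it_d, np.arange(nx)] = 1.0

image = diffraction_stack(section, dx, dt, c)
iz, ixm = np.unravel_index(np.argmax(image), image.shape)
print("energy focused at x = %.0f m, T = %.3f s" % (x[ixm], iz * dt))
```

Running the same function with a velocity that is too low or too high leaves residual curvature in `image` instead of a focused point, which is the behaviour discussed later for wrong migration velocities.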


Let us now look at a full dipping reflector. Of course, it has some of the characteristics
we saw with the four point diffractors, only with a full reflector we no longer see the
separate hyperbolae. Actually, we will only see the apparent dip. As we saw with the four
point diffractors, we need to bring the reflection energy back to where it came from,
namely the apex of each hyperbola. When connecting all the apices of the hyperbolae,
we get the real dip. This is depicted in figure 3.24.
Figure 3.25 quantifies the effect of migrating the energy to its actual location.
In particular, compare the panels in the middle and on the right: the difference is
a factor $\cos\theta$, where $\theta$ is the dip of the reflector with the horizontal. The zero offset
traveltime at a certain x-value can be specified by $t_{ZO} = (2/c)\, x \sin\theta$, assuming that $x = 0$
corresponds to the point where the reflector hits the surface in figure 3.25a. The slope in
the zero offset section is therefore $dt/dx = (2/c) \sin\theta$, see figure 3.25b. If this zero offset
section is migrated and the result is displayed in vertical two-way time $\tau = 2z/c$, the resulting slope
of the reflector is $d\tau/dx = (2/c) \tan\theta$ (figure 3.25c). Thus, migration increases the time
dip in the section by a factor $1/\cos\theta$, and reflectors in the unmigrated section are moved in
their up-dip direction in the migrated section. At the same time, migration decreases the
apparent signal frequency by the factor $\cos\theta$. The reason that the dip is increased by $1/\cos\theta$
and the frequency decreased by $\cos\theta$ lies in the fact that the horizontal wavenumber is
preserved.
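The relation between the time dip before and after migration can be made concrete with a short calculation. This is a sketch with assumed numbers (c = 2000 m/s, a reflector dipping 30 degrees) and a hypothetical helper `migrated_time_dip`; it only uses the two slope formulas of this paragraph.

```python
import math

def migrated_time_dip(dt_dx, c):
    """Convert the zero offset time dip dt/dx (s/m) into the migrated time dip."""
    sin_theta = 0.5 * c * dt_dx                 # from dt/dx = (2/c) sin(theta)
    if abs(sin_theta) > 1.0:
        raise ValueError("time dip too steep for this velocity")
    theta = math.asin(sin_theta)
    return (2.0 / c) * math.tan(theta), theta   # d(tau)/dx = (2/c) tan(theta)

# Assumed example values.
c = 2000.0
dt_dx = (2.0 / c) * math.sin(math.radians(30.0))   # 0.5 ms/m in the stack
dtau_dx, theta = migrated_time_dip(dt_dx, c)
ratio = dtau_dx / dt_dx                            # equals 1 / cos(theta)
print(math.degrees(theta), ratio)
```

For the 30 degree example the migrated time dip is about 15% steeper than the dip seen in the unmigrated section.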
Another commonly observed phenomenon is the so-called "bow-tie" shaped zero offset
response, due to synclinal structures in the earth. This is shown in figure 3.26, where it
can be observed how in the middle above the syncline multi-valued arrivals are present.
This behaviour can be predicted by considering small portions of the reflected signal, and
increasing the dip of each portion of the reflected signal. Note that in figure 3.20 such
structures are also visible.
When we consider the above configurations, we can well understand the effect of migration
of a real data set, as shown in figure 3.20. We can observe that all the diffractions in
the stacked section are well collapsed after the migration. What is lacking in the approach
of the diffraction stack is sound theory, and the final migrated result may be correct in
position (if the diffraction responses can be assumed to have a hyperbolic shape, i.e. if
the subsurface exhibits moderate velocity variations), but not in amplitude.

Figure 3.24: Relation between the reflection points in depth (a) and the traveltimes in the
zero offset section (b) for a dipping reflector (from Yilmaz, 1987, fig. 4-14).
(Labels in the figure: real dip, apparent dip.)

Figure 3.25: Migration increases the dip in the zero offset section.
(Panels: depth section, dz/dx = tan θ; zero offset section, dt/dx = (2/c) sin θ;
migrated zero offset section, dt/dx = (2/c) tan θ.)

Migration using wave theory (zero offset)

The most elegant way to migrate seismic data in a way which takes wave theory into
account is Kirchhoff migration. Wave theory takes amplitude effects into account, and is
derived from fundamental physical laws in appendix B. Via the wave equation we can then
derive a formulation for the Kirchhoff migration. The theoretical derivation is postponed
until the next chapter (section 4.9). In this section, we are interested in the application of
zero-offset (post-stack) migration. This is described by Kirchhoff's migration formula
[Schneider, 1978]:

$$ p(\mathbf{x}, t) = -\frac{1}{2\pi} \, \partial_z \int_{z_s = 0} \frac{p(\mathbf{x}_s, t + R/c)}{R} \, dA_s. \qquad (3.30) $$

In the Kirchhoff migration formula (3.30), the $p(\mathbf{x}_s, t + R/c)$ in the integrand can be
identified as the zero offset data or the stacked data (as a stack should simulate the zero
offset recording). The data is recorded at the surface; let us call our surface measurement
$p_{zo}$, where the subscript zo stands for zero offset. Then we set $c \to c/2$ in order to correct
for two-way traveltime. Then, we calculate the response for each point in the subsurface
and put t = 0, the imaging condition, which images the exploding reflector which starts
to act at t = 0. In other words, we start with our measurements at the surface and do
a downward continuation (inverse wavefield extrapolation) to all depth levels, and pick
the t = 0 point at each subsurface point. If there was a reflector at a certain point, it
will be imaged with this procedure. If there is no reflector at a certain depth point, no
contribution at t = 0 is expected for that point. So we can obtain a depth section by
integrating over the surface to obtain:

$$ p(\mathbf{x}) = -\frac{1}{2\pi} \, \partial_z \int_{z_s = 0} \frac{p_{zo}(\mathbf{x}_s, 2R/c)}{R} \, dA_s. \qquad (3.31) $$

Remember that $R$ is the distance between the output point on the depth section and the
particular receiver or trace location on the surface z = 0. So as we integrate along the
surface z = 0 we are actually summing along diffraction hyperbolae (in the case of a
constant velocity medium), defined by the time curve $t = 2R/c$, but then in a weighted
fashion. Note indeed the large resemblance with the diffraction stack definition of equation
(3.29). The extra $1/R$ factor takes the spherical divergence of the wave front into account
and the factor $\partial_z$ compensates for the frequency dependent and wave front angle dependent
effects of the lateral summation process. Note that the integral over the surface $A_s$ will
numerically be implemented as a summation over all $(x_s, y_s)$ positions, i.e. a summation
over all traces in the seismic section. Although the diffraction stack of equation (3.29) has
been written as a summation over $x_s$ only, the extension to 3D by adding a summation
over the $y_s$ coordinate is straightforward; in that situation the hyperbola is replaced by a
hyperboloid: $T_s^2 = T^2 + 4[(x_s - x)^2 + (y_s - y)^2]/c^2$.
For inhomogeneous media, the diffraction responses are no longer hyperbolic, and the
concept of the diffraction stack breaks down. The Kirchhoff summation does much better than
the diffraction stack because it includes the wave equation.

Figure 3.26: A syncline reflector (left) yields a "bow-tie" shape in the zero offset section (right).
(Panels: depth section with zero offset ray paths, axes x [m], z [m]; zero offset section, axes x [m], t [s].)

Figure 3.27: Downward continuation step used in migration: the wavefield p(x, y, z = 0, t)
recorded at the surface is extrapolated downward to p(x, y, z, t).
The complete 3D zero offset migration procedure can now be as follows:

• Step 1 : Extract, or simulate by stacking, the zero offset dataset $p(x, y, z = 0, t)$.
Consider this to be measured in a half-velocity medium with exploding reflectors.

• Step 2 : Do a downward continuation (inverse extrapolation) step from the surface
level to a level in the subsurface, according to:

$$ p(x, y, z, t) = -\frac{1}{2\pi} \, \partial_z \iint_{x_s, y_s} \frac{p(x_s, y_s, z_s = 0, t + 2R/c)}{R} \, dx_s \, dy_s. \qquad (3.32) $$

Note that for this extrapolation step we need the velocities in the subsurface. This
extrapolation is visualized in figure 3.27.

• Step 3 : Select at each depth level the zero time component, which yields the migrated section:

$$ p_{mig}(x, y, z) = p(x, y, z, t = 0). \qquad (3.33) $$

Our final result is a depth section, as we would obtain when we would make a geological
cross-section through the subsurface (of course with a limited resolution). However,
migration is not a simple process without any artifacts, and most importantly, we usually
do not exactly know the velocity as a function of x, y and z. Therefore, we would like
to be able to compare our original stacked section with the migrated section directly in
order to see what the migration has done. Especially seismic interpreters need this type
of comparison. To this aim, the depth coordinate $z$ is mapped back onto time $\tau$ via:

$$ \tau = \frac{z}{c} \qquad (3.34) $$

for a constant-velocity medium ([Gazdag and Sguazerro, 1984], equation (43) etc.; [Schneider, 1978]
p.56). For an inhomogeneous subsurface, this mapping is more complicated. For this
purpose ray-trace techniques are often used to locate the reflectors in time.

Time migration using the stacking velocities

To overcome the problem of not knowing the interval velocities in the medium, people
have thought of a work-around, using the stacking velocities. As we have done a stack in
general, the stacking velocities are already known. In equation (3.32) we need to know
the distance $R$ from the subsurface point to the surface (which depends on the velocities in
the subsurface). It is often assumed that this path can be approximated by a straight line
(as in a homogeneous medium) using the stacking velocity. Therefore, $R/c$ is replaced by:

$$ R/c \approx \tau' = \left( \tau^2 + \frac{4 x_s^2 + 4 y_s^2}{c_{rms}^2} \right)^{1/2}. \qquad (3.35) $$

Furthermore, the extrapolated data is considered in migrated time $\tau$ and not in depth,
which transforms equation (3.32) into:

$$ p(x, y, \tau) \approx -\frac{1}{2\pi} \, \partial_z \iint_{x_s, y_s} \frac{p(x_s, y_s, z_s = 0, 2\tau')}{c_{rms} \tau'} \, dx_s \, dy_s, \qquad (3.36) $$

which again describes a diffraction stack (if we also neglect the derivative to depth). In
this type of migration, it is assumed that the structures in the subsurface are simple
enough to use the hyperbolic approximation of the response of an exploding reflector
source.

Effects of wrong migration velocities

The only important parameter we can actually set is the velocity distribution. It is
therefore important to know how a wrong velocity distribution will manifest itself in the
final result. This is shown in figure 3.28 where we see a correctly and incorrectly migrated
V-shaped reflector response. Note again the effect of migration: the increase of the slopes
and the collapsing of the diffraction hyperbola into a point (i.e. the edge of the V-shape).
When we put the velocity too low, the diffraction hyperbolae are not completely collapsed
yet and we keep a hyperbola in our result. Such a section is under-migrated. In the same
way, when the velocity is too high, the diffraction hyperbolae are corrected too much,
and an over-migrated section will arise. As such, migration can also be used to determine
velocities: the correct velocity is the one that images each diffractor in its original point with no
diffraction effects visible anymore. A well-known effect of over-migrated sections is the
creation of so-called "migration smiles" and crossing events, as visible in figure 3.29.

Figure 3.28: Stacked section and its time migrated version with the correct and wrong
velocities. (Panels: zero offset section; migration, correct velocity; migration, too low
velocity; migration, too high velocity; axes: distance (m), time (s).)

Figure 3.29: Stacked section and its time migrated version with the correct and wrong
velocities (from Yilmaz, 1987, fig. 4-54).

Chapter 4

Extended Processing
4.1 Introduction

In this chapter we will deal with more specific subjects which are needed in processing.
In chapter 3 we discussed a basic processing sequence which is common to data acquired
on land as well as at sea. As was mentioned in that chapter, the processes which were
described do not guarantee a good image of the subsurface in either case. On land and at
sea there are specific phenomena which must be dealt with, and they shall be discussed
in this chapter. These extra processing steps are common for land and/or
marine data in seismic production processing. In certain circumstances they may still
not be sufficient to get a good image of the subsurface. In that situation the data is given
to people specialized in particular processing which is not commonly done, and often a
close connection to research groups is kept.
There is perhaps one processing feature which is not dealt with explicitly but is very
important nowadays in processing, namely 3-D. Many processes as described in this chapter
apply to 3-D processing, and do not need any further explanation in their application
to 3-D data sets, but some do. The most important one in that respect is 3-D migration.
The largest improvement in data quality via 3-D shooting has been due to migration of
3-D data in comparison with 2-D migration. We leave the discussion which is of particular
interest to 3-D to a later chapter on special processing.
In this chapter we will not discriminate between marine and land data, unless specifically
mentioned. We will first discuss some data formats as currently being used in
seismic exploration for oil and gas, before saying something about surveying and field
geometry information. Then, in order to correct for amplitude variations, trace balancing
is applied to seismic data as a standard step. A large nuisance in land data are static time shifts
due to the influence of the shallow subsurface. Static corrections aim to correct for those
local deviations and we will discuss them in a separate section. Another process which
will be discussed is deconvolution, which can be used to remove short-time and long-time
multiples, but also to shape the seismic wavelet. Another often applied process is filtering
in the $(f, k_x)$ domain, especially for the removal of surface waves in land data, and removal
of multiples and noise in marine data. The last processes we will discuss in this chapter
are related to migration: first a separate section on Dip Move-Out (DMO), and then a
more extensive discussion on migration techniques.

4.2 Data formats


Seismic data is processed on the computer, and therefore the data must be stored on a
device. The data can be stored in many ways.
Because of the large amount of data we are dealing with in seismics, it is
important to do this in an efficient way. Already early in the use of computers seismic data
was stored in a format specific to seismic data. Fortunately for the seismic industry, the
formats introduced became standard. The standards in the seismic industry are made by
a special standardization committee of the Society of Exploration Geophysicists (SEG),
which is an American society. Quite a few standards came out of this committee, and two
of them became very popular. These are the standards SEG-D and SEG-Y. There is a
special booklet on the most commonly used standards for data stored on tape devices, called Digital
Tape Standards [SEG, 1980]. Since in the early days the disk capacity was rather small
compared to the tape devices, hardly any format existed for seismic data stored on disk.
However, times have changed and some standards have been defined, although they are not
much used (yet). Many processing packages have their own internal format, but lately
the standard defined for tape devices, SEG-Y, is used for disk files as well. However,
the data is written unformatted on disk, so it depends on what kind of internal word
representation is used on the computer. SEG-Y is, by definition, IBM floating point. The
other format for disk storage is SEG-2, which is unfortunately only defined for a PC
environment [Pullan, 1990].

SEG-D

Let us briefly say something about the standard SEG-D. This is a standard which is
mostly used in the field, and field tapes delivered to the processing centre are usually in
this format. The first thing which is usually done when a tape arrives at the processing centre
is that it is converted to a SEG-Y tape, and from then on only SEG-Y is used.
SEG-D has some advantageous features for use in the field. One is that it can deal with
multiplexed as well as demultiplexed data. Multiplexed data means that the data is not stored
trace by trace, but that all the first samples of all traces (receivers) for one shot record are
stored, then the second samples of all the traces, and so on. This way of storing is closely
linked to how the electronics deals with the data, i.e. how the data is sampled. Another
advantage of the SEG-D format is that various word lengths are allowed, which makes it
possible to use less storage space than the conventional 4 bytes. As a final remark
it can be said that the format is an awful standard to implement and understand; the
description of it takes up more than 30 pages. This can be found in the publication by
the SEG ([SEG, 1980]).
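The difference between multiplexed and demultiplexed (trace-sequential) ordering can be illustrated with a small array manipulation. This sketch assumes NumPy and a plain stream of sample values; it does not read real SEG-D records, and the function name `demultiplex` and the sample values are made up for illustration.

```python
import numpy as np

def demultiplex(samples, n_traces):
    """Reorder a multiplexed sample stream into trace-sequential order."""
    samples = np.asarray(samples)
    n_samples = samples.size // n_traces
    # Multiplexed order: the time sample is the slow index, the channel the fast one.
    block = samples[: n_samples * n_traces].reshape(n_samples, n_traces)
    return block.T.copy()                      # one row per trace

# Hypothetical record: 3 channels, 4 time samples, value = 10 * channel + sample.
multiplexed = np.array([0, 10, 20, 1, 11, 21, 2, 12, 22, 3, 13, 23])
traces = demultiplex(multiplexed, n_traces=3)
print(traces)
```

Each row of `traces` now holds the consecutive time samples of one receiver, which is the ordering used from the conversion to SEG-Y onwards.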

SEG-Y

The other tape standard mentioned is SEG-Y. This standard has been published
[Barry et al., 1975] and has been enclosed in appendix C. A very nice feature of the
standard is that it is not only a published standard, but it has also been adopted as
a standard by the oil industry. This has made data communication very simple, and most
data transfer takes place via SEG-Y.
SEG-Y is a tape standard which was defined on top of the at-the-time standard of
IBM floating point; this means that floating-point numbers are stored as such. A SEG-Y
tape starts with two so-called reel headers, i.e. headers which occur only once
on a tape: one is an EBCDIC header of 3200 bytes, and
the other is a binary header of 400 bytes. In the EBCDIC header one can put general
information about the data on the tape, such as where it was shot and when, and by whom,
and so on. How this is filled is completely arbitrary, as long as it is in EBCDIC (which can
easily be converted to ASCII). For the binary reel header this is different. The layout of
this header is prescribed very precisely, as can be seen in appendix C. In this header there are
a few positions which are recommended to be filled (and thus MUST be filled), and are
emphasized by an asterisk in the appendix. These are data such as line number, sampling
rate, number of samples per trace, and so on.
The next blocks on a SEG-Y tape are trace blocks, which consist of a trace header and the
values of the trace itself. This trace header is by definition 240 bytes large, and information
on the trace can be set in this header. Again, some positions are recommended to be filled
and thus must be set, and are emphasized by an asterisk in the table in appendix C. In total
there are six fields which must be set, namely the trace sequence number in the line, the
original field record number, the trace number within the original field record,
a trace identification code, the number of samples, and the sampling interval. Optionally,
other information can be stored, like source and receiver coordinates, water depth, elevation information,
etc. After this header the values of the trace are given in one record, of course as many
as given in the header. After this trace block comes the header of the next trace, and
then the values of that next trace, and so on (see also figure 4.1). For more details the
reader is again referred to appendix C.

Figure 4.1: General set-up of a SEG-Y file: reel header 1 (EBCDIC, 3200 bytes), reel
header 2 (binary, 400 bytes), followed for each trace by a trace header (240 bytes, I2 and I4
integers) and the trace data (nt * 4 bytes).
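The byte layout of figure 4.1 and the IBM floating-point representation can be sketched as follows. This is an illustrative sketch using only the Python standard library; it assumes big-endian byte order, 4-byte samples and the same number of samples nt in every trace, and the function names `trace_offset` and `ibm_to_float` are hypothetical helpers, not part of any SEG-Y library.

```python
import struct

# Header sizes in bytes, as in figure 4.1.
REEL_EBCDIC, REEL_BINARY, TRACE_HEADER = 3200, 400, 240

def trace_offset(i, nt, bytes_per_sample=4):
    """Byte offset of the header of trace i (0-based), all traces having nt samples."""
    return REEL_EBCDIC + REEL_BINARY + i * (TRACE_HEADER + nt * bytes_per_sample)

def ibm_to_float(word):
    """Decode one 4-byte big-endian IBM floating-point number."""
    (u,) = struct.unpack(">I", word)
    sign = -1.0 if u >> 31 else 1.0
    exponent = (u >> 24) & 0x7F                # power of 16, biased by 64
    fraction = (u & 0x00FFFFFF) / float(1 << 24)
    return sign * fraction * 16.0 ** (exponent - 64)

print(trace_offset(0, 1000))                   # first trace header
print(trace_offset(2, 1000))                   # third trace header
print(ibm_to_float(bytes.fromhex("42640000")))
print(ibm_to_float(bytes.fromhex("C276A000")))
```

Because the IBM format uses a base-16 exponent, its words cannot simply be reinterpreted as IEEE floating-point numbers; a conversion like the one above is needed when SEG-Y data is read on most present-day computers.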
Summary
In this section the most commonly used data formats for seismic reflection data are briefly
discussed: SEG-D and SEG-Y.

4.3 Surveying information and field geometry

The next step after converting SEG-D to SEG-Y is incorporating the survey results and
the field geometry. In the first step all the information from the surveyors has to be put
into the SEG-Y reel and trace headers (i.e., coordinates, elevation information, etc.). It
should be realized that the surveying of a seismic survey is an enormous amount of work,
both at sea and on land. Sometimes it can take half a year to process the surveying
data from a 3-D marine seismic survey. It should be remembered that surveying is a basic
need of any seismic method: if one doesn't know where the measured data is located, the
results are not of much use. Also, when the surveying is not done properly, one can choose
the processing parameters as precisely as possible, but the results will always be degraded
by the surveying errors. Normally, the positioning errors that are made vary between 1 and 5
meters.
When we set up the field geometry, we use the surveying data to specify how we have
shot in the field. Setting the field geometry of a marine survey is rather simple compared
to a land survey; on land we often have many obstacles which had to be avoided, or we
had to follow a road which wasn't on a straight line. In setting up the field geometry,
we must then also use the seismic observer's log in order to avoid errors. Many types of
processing problems arise from incorrectly setting the field geometry.

4.4 Trace editing and balancing


Trace editing

The next step is looking at the data itself. The data must be examined for their quality, and the seismic observer's log must be worked through. An observer's log is a report
of all the measurements that have been done in the field, with some comments about the
acquisition circumstances and possible hardware problems. An example of part of such a
log for a marine survey is shown in figure 4.2. In every survey there are bad shots and bad
traces, and they must be deleted from the file or repaired. Bad shots can be shots for which
the instrument was triggered but not the shot itself, or shots that were not recorded at all. Bad
traces can be dead traces (being completely zero), traces that show polarity reversal
due to bad soldering in the field, or noisy traces due to 50 Hz power lines; many other reasons
may exist why a certain shot or trace was bad. At sea, there is no way to repeat a shot at
a certain position, as the boat will have to keep moving. On land, shots may be repeated
at the same position such that bad shots can just be removed from the survey. When the
seismic observer has done his or her job properly, it should be mentioned in the log why
these features occurred. Trace editing is often a tedious and time-consuming job, but if
this is not done properly, the resultant stack can sometimes be awful. In practice, when
people start to process data for the first time, they often do a lot of other processing as
well (e.g. make a raw stack) just to find out that there are many spurious features present
in the stack. When the starting processor finds out what caused them, it often appears
that they included a bad shot or a bad trace.
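Of the bad traces mentioned above, the dead ones are the easiest to find automatically. The fragment below is only a sketch, assuming that a shot record is held as a list of traces and each trace as a list of samples; the function names and the threshold parameter are illustrative. Polarity reversals and noisy traces need more than this and are still best checked against the observer's log.

```python
def find_dead_traces(traces, threshold=0.0):
    """Return the indices of traces whose largest absolute sample is <= threshold."""
    dead = []
    for i, trace in enumerate(traces):
        peak = max((abs(x) for x in trace), default=0.0)
        if peak <= threshold:
            dead.append(i)
    return dead


def remove_traces(traces, bad):
    """Return the record without the traces whose indices are listed in bad."""
    bad = set(bad)
    return [trace for i, trace in enumerate(traces) if i not in bad]
```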

Trace balancing

The next step in a normal procedure is equalizing the amplitudes of samples, traces
and shots. With regard to equalizing amplitudes on one trace, it may be obvious that
reflections late in the section will be of a much smaller amplitude than the ones early in
the section, simply due to energy losses in the subsurface. The most important energy
losses are due to geometrical spreading, absorption in the rocks and transmission losses.
Geometrical spreading is the spreading of the wavefront: the energy of a wave is
proportional to $1/r^2$ (in a homogeneous medium). This means that the amplitude
decays as $1/r$. Although the measured field is from a correct physical experiment,
some aspects cannot be dealt with in standard processing. Therefore, corrections are
made to the raw data to match the assumptions in further data processing. The
first assumption is that the data is processed as if it was measured in a two-dimensional
world, which means a spreading which is proportional to $1/\sqrt{r}$ for the amplitude. A
correction is made on the traces with a spreading function. This geometrical spreading
function is specified by means of the traveltime and a specific average velocity function. The other
(gain) correction is applied to correct for absorption, and this is done with an
exponential gain function, where the factor in the exponential is related to the average
absorption coefficient of the rocks. Then, the last correction is to compensate for the transmission
losses at interfaces in the subsurface, although this correction is usually included in the
exponential gain for the absorption losses.
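To make the two gain corrections concrete, the sketch below builds a gain curve per sample. It rests on assumptions that are not in the text: the travelled distance is approximated as r = v t, with v the average velocity at traveltime t; the spreading correction is taken as r raised to a power (0.5 would turn a 1/r decay into a 1/sqrt(r) decay); and the absorption gain is exp(alpha t) with a user-chosen constant alpha. All names are illustrative.

```python
import math


def gain_function(times, avg_velocity, alpha, power=0.5):
    """Gain per sample: (v * t) ** power for spreading, times exp(alpha * t)."""
    gains = []
    for t, v in zip(times, avg_velocity):
        r = v * t                      # assumed travelled distance
        gains.append((r ** power) * math.exp(alpha * t))
    return gains


def apply_gain(trace, gains):
    """Multiply every sample of the trace by its gain."""
    return [g * x for g, x in zip(gains, trace)]
```

In practice alpha and the velocity function would be chosen per survey; the values have to be tuned such that early and late reflections end up with comparable amplitudes.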
The other amplitude equalization is within the traces of one shot. In one shot it can
happen that a certain trace has a smaller amplitude over the whole trace than any of
the neighboring traces, and this can be due to e.g. a different sensitivity of a geophone,
or perhaps bad coupling of the geophone to the ground. This could be corrected for by
calculating the total energy of a trace and making the energy of each trace in a shot the
same, or at least smoothly varying as a function of offset.


Then the third equalization is equalization of different shots. Shots may generate
different amounts of propagating energy due to different coupling conditions, and this is
most severe for dynamite on land. This equalization becomes important when a different
sorting than source-receiver is used, such as CMP-offset (which is mostly done). In one
CMP, we combine traces from different shots, and if all the shots haven't generated the
same signal energies, the traces in the CMP will show this. It may be possible to equalize
those traces, but it is more reasonable, i.e. more related to the real cause, to make the
energies of all the shots equal before any sorting is done.
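Equalizing on total energy, as described for the traces of one shot, can be sketched as follows. The code assumes that a shot is a list of traces and that all traces are scaled to one common target energy (by default the mean energy of the live traces); a smoothly varying target as a function of offset is not attempted here. The same idea applies to whole shots when the energy is summed over all traces of a shot.

```python
import math


def trace_energy(trace):
    """Total energy of a trace: the sum of its squared samples."""
    return sum(x * x for x in trace)


def equalize_energy(traces, target=None):
    """Scale every trace so that its total energy equals target."""
    energies = [trace_energy(trace) for trace in traces]
    live = [e for e in energies if e > 0.0]
    if not live:
        return [list(trace) for trace in traces]
    if target is None:
        target = sum(live) / len(live)     # mean energy of the live traces
    balanced = []
    for trace, e in zip(traces, energies):
        scale = math.sqrt(target / e) if e > 0.0 else 0.0   # dead traces stay zero
        balanced.append([scale * x for x in trace])
    return balanced
```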

Figure 4.2: Example of an observer's log page for a marine dataset.

A last equalization is purely for plotting purposes and does not serve to correct for
a physical phenomenon, unlike the equalizations discussed in the last three paragraphs. This equalization is
Automatic Gain Control or AGC. With this type of scaling a trace sample is scaled by
the average of the absolute values of the neighbouring samples. The number of time
samples around this sample is the window length $W + 1$. In formula, AGC is applied by
a scaling function $s_{AGC}[t]$ as follows:
$$ x_{out}[t] = s_{AGC}[t]\, x_{in}[t] \tag{4.1} $$
where
$$ s_{AGC}[t] = \frac{1}{\dfrac{1}{W+1} \sum_{\tau=t-W/2}^{t+W/2} |x_{in}[\tau]|} \tag{4.2} $$
With seismic data, it is customary to plot the data with AGC because it makes the amplitudes along the section comparable.
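The AGC scaling of equations (4.1) and (4.2) can be sketched in a few lines of Python. Two choices below are assumptions and not part of the definition: near the start and end of the trace the window is simply truncated (and the average taken over the samples that remain), and a window that contains only zeros gets a scale factor of zero.

```python
def agc(trace, w):
    """Apply AGC with a window of w + 1 samples (w even) centred on each sample."""
    half = w // 2
    n = len(trace)
    out = []
    for t in range(n):
        lo = max(0, t - half)              # truncate the window at the trace edges
        hi = min(n - 1, t + half)
        window = trace[lo:hi + 1]
        mean_abs = sum(abs(x) for x in window) / len(window)
        scale = 1.0 / mean_abs if mean_abs > 0.0 else 0.0
        out.append(scale * trace[t])
    return out
```

A short window makes all amplitudes look alike and destroys relative amplitude information; a long window leaves the trace almost unchanged.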
Summary
In this section some energy equalization methods are briefly discussed: geometrical
spreading, receiver and shot equalization, and Automatic Gain Control.


4.5 Static corrections


Elevation and weathering statics

When we record our seismic data in the field, the ideal case would be that there would
be no topography at all and that the shallow subsurface would not influence our resultant
section at all. In marine seismics, this ideal situation is almost the case,
but the case on land is much different. On land, we usually have topography, and we need
to correct for that topography. In seismic terms, if one geophone were on top of a
small hill and the others not, the sound would need extra time to arrive at the geophone
on the hill. This means that all the reflections from the subsurface would arrive later than
in the neighboring traces. In order to get the timing right, we would have to apply some
time shift to the whole trace. Applying such a time shift is called a static correction; it is
called static because it is one time correction for the complete trace, thus all samples will
be shifted by the same constant amount. More specifically, when we correct for static
shifts due to topography, we call it the elevation correction, the elevation referring to some
datum level.
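Because a static correction shifts all samples of a trace by the same amount, applying it is simple. The sketch below assumes that the shift has been rounded to a whole number of samples, that a positive shift moves the data to later times, and that samples shifted in from outside the trace are set to zero; the names are illustrative.

```python
def static_in_samples(time_shift, dt):
    """Convert a time shift (s) into a whole number of samples of length dt (s)."""
    return int(round(time_shift / dt))


def apply_static(trace, shift):
    """Shift all samples by shift samples (positive = later), padding with zeros."""
    n = len(trace)
    out = [0.0] * n
    for i, x in enumerate(trace):
        j = i + shift
        if 0 <= j < n:
            out[j] = x
    return out
```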
Besides the elevation statics, there is another type of statics: the weathering statics.
These are caused by low-velocity material right below the geophone. This
kind of static shift is classified as a static shift due to the weathered layer, which is a
general term for static shifts due to the top layers of the earth, which are often weathered
and irregular. Although in theory the static shift for both types also depends on the angle of
incidence of the reflected waves, in practice a single time shift often works satisfactorily.
On land, these two types of static shifts, elevation and weathering, are often determined
in a separate refraction survey. This is done because the data from the seismic reflection
survey is not suitable for a good analysis of the shallow subsurface, and also because in
many cases the static shifts can have such a detrimental effect on the stack that it is worth
spending money on a separate survey in order to determine a model which explains these
shifts as well as possible. The approach here is that we can determine the structure of the
shallow subsurface in a deterministic fashion. The static corrections which are determined
via field data, obtained from the refractions in the reflection data or from a separate
refraction survey, are called field statics. The static corrections for surface topography are
the elevation corrections, while corrections for the weathered layer are called weathering
corrections.
The procedure of field statics is, in the order of being applied to the data:

- Weathered-layer correction
- Elevation correction
This is illustrated in figure (4.3). These corrections do a good job in the sense that
reflectors in the subsurface align much better, but it may still not be good enough, and
often an extra static correction is applied which makes the alignment of reflectors even
better. This alignment is not based on any physical measurement, but on maximizing
a correlation between adjacent traces. This is called the residual static correction. This
correction may relate to some physical feature, perhaps due to unexplained features in
the model determined from the refractions, but in essence it is a cosmetic correction. It
should be realized that determining the residual static correction goes much better when
the field statics have been applied; determining the residual statics without
having applied field statics can be very difficult due to the irregular nature of the cross-correlation function.

The weathering corrections

Figure 4.3: Static correction procedure: removing effects of the weathered layer (from top to
middle figure), and of surface topography (from middle to bottom figure). The weathered layer
has velocity c_w, the first consolidated layer velocity c_c.

Weathering corrections are applied to correct for differences in reflection time from
trace to trace, due to variations in the transit time of the weathered layer along the
seismic traverse. The weathered layer is a zone of less compacted young sediments with low
seismic velocities. It is only present in the first few meters below the surface and may show
considerable variations in thickness as well as in seismic velocity. Due to these variations
in the weathered layer, irregular differences in arrival time may occur between corresponding seismic
signals on different seismic traces of one shot record and between signals on corresponding
traces from different shots. As a result of these irregular "delay-time" variations,
the reflection events on the records show an irregular (noisy) character.
The weathering correction is a traveltime correction; the delay time of the weathered
layer is subtracted from the observed traveltime and replaced by a delay time of material
with the velocity $c_c$ of the first consolidated layer, i.e.:
$$ T_w = -\frac{d_w}{c_w} + \frac{d_w}{c_c} \tag{4.3} $$
in which $d_w$ denotes the thickness of the weathered layer and $c_w$ denotes the seismic
velocity of the weathered layer. In fact, the weathered layer is replaced by a layer with
the velocity of the first consolidated layer. In the following we shall derive the necessary
expressions to evaluate this correction, since we do not know $d_w$ beforehand.

The following derivation is one which allows irregular refracting boundaries between
the weathered and first consolidated layer; it is also known as the plus-minus method. For
simplification, we shall assume that the boundary is a dipping boundary; however, the
analysis is also valid for irregular boundaries.
The weathering correction can be computed from separate refraction
surveys. In a refraction survey, the distance between source and receiver is large compared to the depth of investigation: we must make sure the rays are critically refracted
into the refractor and therefore we need large angles of incidence. Assuming the shot is
fired near the surface, the wave first travels in the weathered (upper) layer, then travels in
the first consolidated layer horizontally along the base of the weathered layer, and refracts
critically through the weathered layer to the geophone.
In a refraction survey, it is standard procedure to shoot from both sides in order
to be able to determine possible dips in the refracting boundary. In appendix D, the
traveltime $T$ of the ray for one source and one geophone is derived:
$$ T = \frac{1}{c_c \cos\alpha}\, x + \delta_S + \delta_G \tag{4.4} $$
where $x$ is the distance between source and receiver, $\alpha$ is the angle of the refracting
boundary with the horizontal, $c_c$ is the wave speed in the refracting medium (the first
consolidated layer), and $\delta_S$ and $\delta_G$ are delay times at the source and receiver, respectively. What a delay time means can be seen most simply by assuming a horizontal
refractor ($\alpha = 0$); consider figure 4.4(a). The time of the part of the ray which goes from
the refractor to the geophone G is given by:
$$ T_{AG} = T_{AD} + T_{DG}, \tag{4.5} $$
where A is the point from which the wave is diffracted from the base of the weathered
layer. Now we can write the traveltime $T_{AD}$ as
$$ T_{AD} = \frac{AD}{c_w} = \frac{x_3 \sin\theta_c}{c_w} = \frac{x_3}{c_w}\,\frac{c_w}{c_c} = \frac{x_3}{c_c} \tag{4.6} $$
in which $c_w$ is the wave speed in the weathered layer, $x_3$ is the horizontal distance of the
upgoing ray, and $\theta_c$ is the critical angle. Thus, the time from A to D is converted into
the time for the ray to travel the horizontal distance; this is shown in figure (4.4b). In the
case of a dipping refractor, this term is included in the term $x/(c_c \cos\alpha)$; the delay time
is then taken with respect to the normal of the dipping boundary. The other term $T_{DG}$
can be seen as a delay time, denoted by $\delta_G$, so $T_{AG}$ can be written as:
$$ T_{AG} = \frac{x_3}{c_c} + \delta_G \tag{4.7} $$

Figure 4.4: Horizontal reflector: upgoing ray with critical angle (a), with its equivalent
path (b). Both panels show the geophone G, the weathered layer (velocity c_w, thickness d_w)
on top of the first consolidated layer (velocity c_c), and the horizontal distance x_3.

Let us now consider the situation as drawn in figure (4.5); we have again taken only
one dipping refracting boundary. In this situation we have a geophone placed between
two shots $S_1$ and $S_2$. It is also assumed that a geophone is at position $S_2$ when shot $S_1$
is fired, and vice versa. Let us denote the total distance between shots $S_1$ and $S_2$ by $L$,
and the distance from shot $S_1$ to the geophone G by $x$. Then the traveltimes from shots
$S_1$ and $S_2$ are given by:
$$ T_{S_1 S_2} = \frac{1}{c_c \cos\alpha}\, L + \delta_{S_1} + \delta_{S_2} \tag{4.8} $$
$$ T_{S_1 G} = \frac{1}{c_c \cos\alpha}\, x + \delta_{S_1} + \delta_{G_1} \tag{4.9} $$
$$ T_{S_2 G} = \frac{1}{c_c \cos\alpha}\, (L - x) + \delta_{S_2} + \delta_{G_2} \tag{4.10} $$
where the numbers 1 and 2 in $\delta_{G_1}$ and $\delta_{G_2}$ denote that they are due to shot 1 and 2,
respectively. When we add the last two equations, and substitute the first into the result, the
following equation is obtained:
$$ T_{S_1 G} + T_{S_2 G} = T_{S_1 S_2} + \delta_{G_1} + \delta_{G_2} \tag{4.11} $$
Hence,
$$ \delta_{G_1} + \delta_{G_2} = T_{S_1 G} + T_{S_2 G} - T_{S_1 S_2} \tag{4.12} $$
What can now be seen in this equation is that the delay times of the geophone can be
determined from the times which are observed in the refraction seismograms.
The most important result from this equation emerges when we look at the ray paths of
the different shots. What can be seen is that the delay times are partly obtained from
subtracting the paths which are similar, as indicated by the numbered ray segments 1 to 4.
This means that the delay time which is obtained here is a local traveltime. In the
case of a dipping refractor, the analysis is exact; however, when the refractor is irregular,
the above is an approximation but still applicable as long as the boundaries are not too
irregular.
In the weathering correction which needs to be applied, the depth of the weathered
layer is used; however, we do not know the depth beforehand. To that end, consider
the delay time $\delta_{G_1} + \delta_{G_2}$, i.e. (see also appendix D):
$$ \delta_{G_1} + \delta_{G_2} = d_G \left[ \frac{\cos(\theta_c - \alpha)}{c_w} + \frac{\cos(\theta_c + \alpha)}{c_w} \right] = d_G\, \frac{2 \cos\theta_c \cos\alpha}{c_w} \tag{4.13} $$
Using this equation, the depth at the geophone $d_G$ can be expressed in the delay time:
$$ d_G = \frac{c_w\, (\delta_{G_1} + \delta_{G_2})}{2 \cos\theta_c \cos\alpha} \tag{4.14} $$

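Figure 4.5: Configuration of dipping refractor for determining delay times of geophones,
using shots from both ends. Labels in the figure: shots S1 and S2, distances x and L,
depth dG at the geophone, the critical angle, and the layer velocities c1 and c2.
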
Let us now turn to the weathering correction, as mentioned in the introduction of the
section. The weathering correction is defined as the difference between the actual
traveltime to a point at the surface and the traveltime it would have taken to reach the same
surface point if the weathered layer were replaced by material having the velocity of the
first consolidated layer. We thus obtain (see figure (4.4)):
$$ \begin{aligned}
T_w &= \frac{d_G}{c_c} - \frac{d_G}{c_w} = d_G\, \frac{c_w - c_c}{c_c c_w} \\
    &= \frac{(\delta_{G_1} + \delta_{G_2})}{2 \cos\alpha \cos\theta_c}\, \frac{c_w - c_c}{c_c}
     = \frac{(\delta_{G_1} + \delta_{G_2})}{2 \cos\alpha}\, \frac{1}{(1 - c_w^2/c_c^2)^{1/2}}\, \frac{c_w - c_c}{c_c} \\
    &= -\frac{(\delta_{G_1} + \delta_{G_2})}{2 \cos\alpha} \left( \frac{c_c - c_w}{c_c + c_w} \right)^{1/2}
\end{aligned} \tag{4.15} $$
Now using the delay time as derived earlier (eq. (4.12)), the final expression becomes:
$$ T_w = -\frac{(T_{S_1 G} + T_{S_2 G} - T_{S_1 S_2})}{2 \cos\alpha} \left( \frac{c_c - c_w}{c_c + c_w} \right)^{1/2} \tag{4.16} $$
The factor $\sqrt{(c_c - c_w)/(c_c + c_w)}$ is of the order 0.6 to 0.9 and can often
be assumed to vary slowly along the seismic traverse or to be constant in a particular
region. For the calculation of this factor the weathering velocity $c_w$ must be known; this
may be determined from the short-distance geophones of the refraction spreads.
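The chain from observed traveltimes to the weathering correction can be written out as a small Python sketch. It follows equations (4.12), (4.14) and (4.16) directly; the function names and the argument dip (the dip angle of the refractor in radians) are illustrative assumptions, and the velocities are supposed to be known.

```python
import math


def delay_time_sum(t_s1g, t_s2g, t_s1s2):
    """Sum of the two geophone delay times from observed traveltimes, eq. (4.12)."""
    return t_s1g + t_s2g - t_s1s2


def depth_at_geophone(delay_sum, c_w, c_c, dip=0.0):
    """Depth d_G of the weathered layer below the geophone, eq. (4.14)."""
    cos_crit = math.sqrt(1.0 - (c_w / c_c) ** 2)   # sin(critical angle) = c_w / c_c
    return c_w * delay_sum / (2.0 * cos_crit * math.cos(dip))


def weathering_correction(t_s1g, t_s2g, t_s1s2, c_w, c_c, dip=0.0):
    """Weathering correction T_w from traveltimes only, eq. (4.16)."""
    delay_sum = delay_time_sum(t_s1g, t_s2g, t_s1s2)
    factor = math.sqrt((c_c - c_w) / (c_c + c_w))
    return -delay_sum / (2.0 * math.cos(dip)) * factor
```

As a check, for a horizontal refractor at 10 m depth with c_w = 500 m/s and c_c = 2000 m/s, traveltimes built from equations (4.8) to (4.10) return the depth of 10 m and a correction equal to 10/2000 - 10/500 s, in agreement with equation (4.15).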
When the seismic source is an explosive, buried under the weathered layer, it is also
possible to determine the weathering corrections at the shot-points from the so-called
uphole time: a geophone is planted close to the shot hole in order to determine the
vertical traveltime from the source to the surface. The weathering correction is found by
subtracting the "uphole time" $T_{up}$ from the source depth $Z_{shot}$ divided by the velocity
of the first consolidated layer $c_c$:
$$ T_w = \frac{Z_{shot}}{c_c} - T_{up} \tag{4.17} $$
At the shot holes, the factor with the square root in equation (4.15) can now be checked by
comparing the weathering correction from the uphole times with the weathering corrections
derived from geophones. Increased density of shooting along the seismic traverse, as is the
case in multiple-coverage shooting, increases the number of checkpoints along the seismic
line accordingly.
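The check described above can be sketched as follows: equation (4.17) gives the weathering correction at the shot hole, and equating it to equation (4.15) gives the value of the square-root factor that this correction implies. The code assumes that the delay-time sum at the shot position is available from the refraction data; the function names and the argument dip (in radians) are illustrative.

```python
import math


def uphole_weathering_correction(z_shot, t_up, c_c):
    """Weathering correction from the uphole time, eq. (4.17)."""
    return z_shot / c_c - t_up


def implied_sqrt_factor(t_w_uphole, delay_sum, dip=0.0):
    """Square-root factor of eq. (4.15) implied by an uphole correction."""
    return -2.0 * math.cos(dip) * t_w_uphole / delay_sum
```

The implied factor can then be compared with the value computed from the velocities used for the geophone corrections.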

Elevation corrections

Once the weathering corrections have been applied, elevation corrections have to be
applied for sources and detectors. These elevations are determined by surveyors, who
have determined the coordinates, and thus also the elevation, with their instruments. In the
calculation of the time correction it is assumed that the weathered layer has been replaced
by material having the velocity $c_c$ of the first consolidated layer. The elevation correction
for the shot is:
$$ T_{e,s} = \frac{Z_S - d_S}{c_c} \tag{4.18} $$
in which $Z_S$ is the elevation of the shot and $d_S$ is the shot depth. The elevation correction
for the geophones is:
$$ T_{e,g} = \frac{Z_G}{c_c} \tag{4.19} $$
in which $Z_G$ is the geophone elevation. These corrections have to be subtracted from or added
to the weathering corrections, depending on the chosen datum level (station above or
below datum).
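Equations (4.18) and (4.19) are simple enough to write down directly. In the sketch below it is assumed that the elevations are measured relative to the chosen datum level, so that a station below datum has a negative elevation; the function names are illustrative.

```python
def shot_elevation_correction(z_s, d_s, c_c):
    """Elevation correction for a shot at elevation z_s and depth d_s, eq. (4.18)."""
    return (z_s - d_s) / c_c


def geophone_elevation_correction(z_g, c_c):
    """Elevation correction for a geophone at elevation z_g, eq. (4.19)."""
    return z_g / c_c
```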

Residual statics

In the former discussion of refraction spreads, it should be realized that refraction
surveys have their limitations, such as non-detection of low-velocity zones (no critical
refractions!) and blind zones, and thus refraction statics are not perfect. Therefore, when
these corrections have not cleaned up the data sufficiently, one can apply the so-called
residual static corrections. Normally, the residual statics are in the order of the dominant
seismic wavelength. They can be determined via calculations of time shifts using cross-correlations between pairs of seismic traces belonging to large sets of multiple-coverage
seismic reflection data, being CMP gathers, shot gathers, offset gathers and/or geophone
gathers. These statistically determined static corrections are often applied in a "surface-consistent" way: the same time shift at the same location. So if the multiplicity (or fold)
of the data is $N$, and there are $M$ surface positions, one has to solve $M \times N$ equations
with $M$ unknowns. These residual static corrections can be determined more efficiently
when the field statics have been applied, with fewer iterations and with a more accurate
result.
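The basic building block of residual statics, the time shift between two traces found from their cross-correlation, can be sketched as below. This covers only the pairwise measurement; solving the surface-consistent system of equations for all positions is not attempted here. The search range max_lag (in samples) and the function names are illustrative assumptions.

```python
def cross_correlation(a, b, lag):
    """Sum over t of a[t + lag] * b[t], using only samples inside both traces."""
    total = 0.0
    for t in range(len(b)):
        j = t + lag
        if 0 <= j < len(a):
            total += a[j] * b[t]
    return total


def best_lag(a, b, max_lag):
    """Lag in samples (positive = a is later than b) with maximum cross-correlation."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: cross_correlation(a, b, lag))
```

The irregular nature of the cross-correlation function mentioned earlier shows up here as several peaks of similar height, which is why max_lag should be kept small once the field statics have been applied.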
Summary
The static errors due to a weathered layer and surface topography are derived. The
weathering correction is usually determined from refraction spreads. In this section this
correction is derived, where the final result only contains traveltimes, the velocities of
the layers and a dip, if present; no depths are needed. To seismic reflection data, the
weathering correction is applied first. Next, the elevation correction is applied to correct
for surface topography. Then finally, residual statics are applied to line up the reflections even
better and to correct for errors not determined from the refraction spreads.


4.6 Deconvolution
Deconvolution concerns itself with removing a part of the data which is convolutional. For
instance, we know that a total seismic response consists of the convolution of the seismic
source wavelet with the earth response, convolved with the response of the seismic
detector, convolved with the response of the recording system. If we consider
only the seismic source signature $s(t)$ and the impulse response of the earth $g(t)$, then the
seismic signal can be written as:
$$ x(t) = g(t) * s(t). \tag{4.20} $$
Usually, we are not interested in the responses of the seismic source, detector or recording system, so we want to remove them. The most critical response in this list is usually
that of the seismic source. Removing the seismic source signature from a seismic recording is called signature deconvolution. We can distinguish two types of deconvolution, deterministic and
statistical: the first type describes the deconvolution of a known signature
$s(t)$ from the data, whereas the second type describes the removal of the signature or other
effects from the data based on some statistical assumptions (e.g. "whiteness" of the reflectivity series, minimum-phase assumption for the wavelet).

4.6.1 Deterministic deconvolution in the frequency domain


Let us assume we have a wavelet with a known spectrum, $S(f)$. Neglecting receiver and
recording-system responses, the response is seen as a convolution of only the earth response
and the seismic source in the time domain. Then the convolution becomes a multiplication
in the frequency domain:
$$ X(f) = S(f)\, G(f), \tag{4.21} $$
in which $X(f)$ is the spectrum of the seismic recording, and $G(f)$ is the spectrum of the
earth response. Now if we want to remove the source signature, then we have to divide
each side by $S(f)$, or equivalently apply the inverse operator $F(f) = 1/S(f)$ to each side,
obtaining:
$$ \frac{X(f)}{S(f)} = G(f). \tag{4.22} $$
Of course, this states the problem too simply. For instance, a seismic recording always
contains noise. When the seismic recording is taken as the earth response together with
some noise term, i.e., $X(f) = S(f)G(f) + N(f)$ in which $N(f)$ denotes the noise term,
then the deconvolution in the frequency domain becomes:
$$ \frac{X(f)}{S(f)} = G(f) + \frac{N(f)}{S(f)}. \tag{4.23} $$
The next problem is that due to this division, the noise is blown up outside the bandwidth
of the signal $S(f)$. This effect is shown in figure (4.6).

Figure 4.6: The effect of deconvolution in the frequency domain in the presence of noise:
signal spectrum and noise spectrum as a function of frequency, a) before deconvolution and
b) after deconvolution.

There are two ways to tackle this problem. The first one is that we stabilize the
division. This is done by not applying a filter $F(f) = 1/S(f)$ but first multiplying both
the numerator and the denominator by the complex conjugate of the source spectrum,
$S^*(f)$; since the denominator is now real we can add a small (real) constant $\epsilon^2$ to it.
Thus instead of $1/S(f)$, we apply the filter:
$$ F(f) = \frac{S^*(f)}{S(f) S^*(f) + \epsilon^2}. \tag{4.24} $$
Often we take $\epsilon$ as a fraction of the maximum value of $|S(f)|$, e.g. $\epsilon = \alpha\, \mathrm{MAX}(|S(f)|)$
with $\alpha$ in the order of 0.01 - 0.1. In this way we have controlled the noise, but it can
still be large outside the bandwidth of $S(f)$ (see figure (4.6)). As an example, figure (4.7)
shows the result of this stabilized deconvolution for a noise-free and a noisy wavelet.
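The stabilized inverse filter is easily evaluated per frequency sample. The sketch below assumes that the source spectrum is already available as a list of complex numbers (how it was obtained, e.g. by an FFT, is left open) and that the stabilization constant is taken as a fraction of the maximum amplitude, as described above; the function name and the default fraction are illustrative.

```python
def stabilized_inverse(spectrum, fraction=0.05):
    """Stabilized inverse filter conj(S) / (S conj(S) + eps**2) per frequency sample."""
    eps = fraction * max(abs(s) for s in spectrum)   # eps as a fraction of max |S|
    return [s.conjugate() / (abs(s) ** 2 + eps ** 2) for s in spectrum]
```

Where |S| is large compared to the stabilization constant the filter is close to 1/S; where |S| is small the filter goes to zero instead of blowing up. With fraction = 0 the plain inverse is recovered, which fails for any frequency where S is zero.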

Figure 4.7: Applying stabilized inversion in the frequency domain, left for a noise-free
wavelet, right for a noisy wavelet. a) Spectrum of signal to be inverted. b) Spectra of
inverse operators with 3 stabilization constants (e = 0, 0.05, 0.1). c) Multiplication of
inverse filters with original spectrum of a), i.e. the deconvolution results.

The other way of dealing with the blowing up of the noise is doing the division only
in a certain bandwidth, which is equivalent to shaping the wavelet $s(t)$ into a shorter one,
which we call $d(t)$. In fact the signal $d(t)$ is our desired output wavelet on the seismic
data, which is often a nice-looking signal in the time domain, such that we have optimal
resolution in our seismic sections. In this case we do not apply the filter $1/S(f)$ but instead
we use $D(f)/S(f)$. Then the deconvolution amounts to:
$$ \frac{X(f) D(f)}{S(f)} = G(f) D(f) + \frac{N(f) D(f)}{S(f)}, \tag{4.25} $$
where $|D(f)|$ is approximately equal to $|S(f)|$, i.e.:
$$ a < \frac{|D(f)|}{|S(f)|} < b, \tag{4.26} $$
in which $b/a$ is less than 10, say. As mentioned, we would like to end up with a signal
that is short in the time domain. This means that the spectrum $D(f)$ must be smooth
compared to the true wavelet spectrum $S(f)$. Note that a short signal in time corresponds
to a smooth (i.e. oversampled) signal in frequency, as the major part of the time signal
will be zero. Practically this means that when we know the spectrum we can design some
smooth envelope around the source spectrum $S(f)$, or we can just pick a few significant
points in the spectrum and let a smooth interpolator go through these picked points. An
example of designing such a window is given in figure (4.8).

Figure 4.8: Designing a desired wavelet via smoothing in the frequency domain. The panels
show the signal spectrum S and the smoothed spectrum D as a function of frequency, and the
original signal s(t) and the smoothed signal d(t) as a function of time.

As a last remark on deconvolution in the frequency domain, it can be said that in
practice both ways of controlling the division by $S(f)$ are used. That is, we apply a
filter:
$$ F(f) = \frac{D(f) S^*(f)}{S(f) S^*(f) + \epsilon^2} \tag{4.27} $$
to the data, resulting in:
$$ \frac{X(f) D(f) S^*(f)}{S(f) S^*(f) + \epsilon^2} = \frac{G(f) D(f) S(f) S^*(f)}{S(f) S^*(f) + \epsilon^2} + \frac{N(f) D(f) S^*(f)}{S(f) S^*(f) + \epsilon^2}. \tag{4.28} $$
This is about the best we can do given the constraints of bandwidth and signal-to-noise
ratio.
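The combined filter of equation (4.27) and its application to the data can be sketched per frequency sample. As before, the spectra are assumed to be given as lists of complex numbers of equal length, and the stabilization constant eps is passed in by the user; the function names are illustrative.

```python
def shaping_filter(s_spec, d_spec, eps):
    """Filter D conj(S) / (S conj(S) + eps**2) per frequency sample, eq. (4.27)."""
    return [d * s.conjugate() / (abs(s) ** 2 + eps ** 2)
            for s, d in zip(s_spec, d_spec)]


def deconvolve(x_spec, s_spec, d_spec, eps):
    """Apply the shaping filter to the data spectrum X, as in eq. (4.28)."""
    f_spec = shaping_filter(s_spec, d_spec, eps)
    return [f * x for f, x in zip(f_spec, x_spec)]
```

For noise-free data X = S G and eps = 0 the result equals G D exactly, i.e. the earth response with the desired wavelet on it; a non-zero eps trades some of that accuracy for stability.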

4.6.2 Deterministic deconvolution in the time domain: Wiener filters


So far we have dealt with deconvolution in the frequency domain, which was just a modification of a division of spectra. Let us look specifically at the filter $F(f)$ for the case that
the stabilized division and shaping are applied, i.e.:
$$ (S(f) S^*(f) + \epsilon^2)\, F(f) = D(f) S^*(f), \tag{4.29} $$
in which we have brought the denominator of the division in equation (4.27) to the other
side. When we transform this equation back to the time domain, we obtain:
$$ \{\phi_{ss}(t) + \epsilon^2 \delta(t)\} * f(t) = \phi_{ds}(t). \tag{4.30} $$
These equations are called the normal equations. In words, by calculating the autocorrelation $\phi_{ss}(t)$ of the signal $s(t)$ and the cross-correlation $\phi_{ds}(t)$ of the desired signal $d(t)$ and the actual
signal $s(t)$, the corresponding filter can be determined via the above equations. The length
of the filter can be chosen by the user. Note that the definitions of auto-correlation and
cross-correlation can be found in Chapter 2. In Appendix E the concepts of correlation
are further explained and some examples are shown.
These normal equations can also be derived starting from a different point of view,
namely minimizing the squared error between a desired output and an actual output, as
is common for Wiener filters. In that situation we want to minimize the error between
a desired signal $d(t)$ and the convolution of the input signal $s(t)$ and a filter $f(t)$:
$$ E = \sum_t [d(t) - f(t) * s(t)]^2 = \text{minimum}. \tag{4.31} $$
In this way the length of the filter $f(t)$ can be chosen according to the user's definition.
This may be desired because we do not always need an exact deconvolution filter, but a
short filter that will do 95% of the job. To avoid that the filter coefficients $f(t)$ take
large values, often the energy of the filter coefficients is minimized as well:
$$ E = \sum_t [d(t) - f(t) * s(t)]^2 + \epsilon^2 \sum_t f^2(t) = \text{minimum}. \tag{4.32} $$
The minimum can be found by taking the derivative of the above expression with respect to
the filter coefficients and setting it equal to zero. We shall not do this derivation here
but refer the interested reader to Appendix F. The result of this derivation is the set
of normal equations as described above. There exists a fast scheme for calculating the
filter coefficients for such a system, namely the Levinson recursion scheme (e.g. see
[Robinson and Treitel, 1980]), which makes it possible to determine the coefficients efficiently on a computer. But in order to use this scheme we must rewrite the system in a
slightly different form such that a Toeplitz matrix arises. To this effect, write the normal
equations as a discrete convolution:
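$$ \sum_{n=0}^{N} \phi_{ss}[i - n]\, f[n] + \epsilon^2 f[i] = \phi_{ds}[i] \qquad \text{for } i = 0, 1, 2, \ldots, N, \tag{4.33} $$
where we use square brackets to denote that we deal with discrete data. Note that the filter
$f[i]$ is a causal one here. Writing this in matrix form, using the fact that the autocorrelation
is symmetric around zero correlation time, i.e. $\phi_{ss}[n] = \phi_{ss}[-n]$, we obtain:
$$ \begin{pmatrix}
\phi_{ss}[0] + \epsilon^2 & \phi_{ss}[1] & \phi_{ss}[2] & \cdots & \phi_{ss}[N] \\
\phi_{ss}[1] & \phi_{ss}[0] + \epsilon^2 & \phi_{ss}[1] & \cdots & \phi_{ss}[N-1] \\
\phi_{ss}[2] & \phi_{ss}[1] & \phi_{ss}[0] + \epsilon^2 & \cdots & \phi_{ss}[N-2] \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\phi_{ss}[N] & \phi_{ss}[N-1] & \phi_{ss}[N-2] & \cdots & \phi_{ss}[0] + \epsilon^2
\end{pmatrix}
\begin{pmatrix} f[0] \\ f[1] \\ f[2] \\ \vdots \\ f[N] \end{pmatrix}
=
\begin{pmatrix} \phi_{ds}[0] \\ \phi_{ds}[1] \\ \phi_{ds}[2] \\ \vdots \\ \phi_{ds}[N] \end{pmatrix} \tag{4.34} $$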

Figure 4.9: A mixed-phase wavelet to find the inverse for (amplitude as a function of time in ms).


The matrix with the autocorrelation coefficients has the Toeplitz form, which is exploited
by Levinson's algorithm. This algorithm reduces the computation from $N^3$ operations to $N^2$,
where $N$ is the number of rows or columns in the matrix. Thus, the filter can be computed
efficiently on a computer.
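To illustrate how the normal equations (4.34) can be set up and solved, the sketch below computes the correlations from a wavelet s and a desired output d and solves the Toeplitz system with a Levinson-type recursion. The recursion is written in a commonly used textbook form and is not copied from [Robinson and Treitel, 1980]; the function names, and the choice of a causal filter with n_coef coefficients, are assumptions made for this example.

```python
def levinson(r, b):
    """Solve T x = b, with T the symmetric Toeplitz matrix T[i][j] = r[|i - j|]."""
    x = [b[0] / r[0]]
    a = [1.0]          # prediction-error filter of the current order
    err = r[0]         # its prediction-error energy
    for m in range(1, len(b)):
        # Extend the prediction-error filter by one coefficient.
        k = -sum(a[j] * r[m - j] for j in range(m)) / err
        a_ext = a + [0.0]
        a = [a_ext[j] + k * a_ext[m - j] for j in range(m + 1)]
        err *= 1.0 - k * k
        # Extend the solution so that row m of the system is satisfied as well.
        q = (b[m] - sum(x[j] * r[m - j] for j in range(m))) / err
        x = [xj + q * a[m - j] for j, xj in enumerate(x + [0.0])]
    return x


def correlation(a, b, lag):
    """Correlation of a with b at a given lag: sum over t of a[t + lag] * b[t]."""
    return sum(a[t + lag] * b[t] for t in range(len(b)) if 0 <= t + lag < len(a))


def wiener_filter(s, d, n_coef, eps=0.0):
    """Causal least-squares filter with n_coef coefficients that shapes s into d."""
    r = [correlation(s, s, n) for n in range(n_coef)]    # autocorrelation of s
    r[0] += eps ** 2                                      # stabilization on the diagonal
    g = [correlation(d, s, i) for i in range(n_coef)]     # cross-correlation of d and s
    return levinson(r, g)
```

For the minimum-phase wavelet s = [1, 0.5] and a spike at zero delay as desired output, a three-point filter comes out as approximately [0.988, -0.471, 0.188], close to the first terms 1, -0.5, 0.25 of the exact (infinitely long) inverse.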
What is now the advantage of determining the filter in the time domain? An advantage
is that the formulation makes it easy to take any number of coefficients for the filter $f[n]$.
Also, we can take a few filter coefficients at negative times and many at positive times, or vice versa;
this can be advantageous in many circumstances. When doing the deconvolution in the
frequency domain there is no control on the length of the filter in the time domain.
Moreover, it should be realized that in order to do the deconvolution in the frequency
domain, we have to transform the data to the frequency domain first, which takes up some
computing time. This becomes worthwhile only above a certain number of filter samples, say
about 100. For a small number of filter coefficients, it is more efficient to calculate the
filter coefficients via the Levinson scheme.
Via the time domain, we have a lot of flexibility in the choice of the filter coefficients,
such as length, desired wavelet and the number of negative and positive samples. But how
can we optimize these parameters? In order to address this problem, consider the wavelet
as given in figure (4.9). Is there a good (short) causal filter that converts the wavelet into
a spike? In general, yes, but we should not require the spike to be at zero delay. The
optimal choice is usually to desire a spike with a delay that corresponds to the point
where the wavelet $s[k]$ has most of its energy.
This can be inspected via de ning the partial energy of a signal, i.e.:

p[k] =

k
X
s2[n]

n=,1

(4.35)

and plotting the partial energy as a function of $k$. Note that most signals in seismics are
causal, so the summation will start at $n = 0$. Using the partial energies we can classify
certain wavelets. For instance, the minimum-phase or minimum-delay wavelet has the
fastest energy build-up, thus most of the energy is at the beginning. In the same way,
a maximum-phase or maximum-delay wavelet has the slowest energy build-up, so most
of the energy is at the end. Any case in between these extremes is called a mixed-phase
wavelet. The wavelet of figure (4.9) is a mixed-phase wavelet and its cumulative energy
is shown in figure (4.10). It appears that the main peak is around 12 ms (where the energy
builds up fastest). It is now clear that the optimum delay of a spike for a minimum-phase
wavelet is no delay. For any mixed-phase wavelet, we must delay the spike of the desired
output signal. This delay time is often called the lag.
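
The classification by energy build-up can be checked numerically. The following is a minimal Python sketch, not part of the original notes, that evaluates equation (4.35) for three hypothetical three-point wavelets sharing the same autocorrelation (and therefore the same amplitude spectrum). The function name `partial_energy` and the sample values are assumptions chosen only for illustration.

```python
def partial_energy(s):
    """Partial energy p[k] = sum_{n=0}^{k} s[n]**2 of a causal signal s, cf. (4.35)."""
    p, total = [], 0.0
    for sample in s:
        total += sample * sample
        p.append(total)
    return p


# Hypothetical wavelets built from the dipoles (2 + z) and (1 + 2z),
# with z the unit-delay operator. All three have the same autocorrelation.
minimum_phase = [4.0, 4.0, 1.0]   # (2 + z)(2 + z)
mixed_phase = [2.0, 5.0, 2.0]     # (2 + z)(1 + 2z)
maximum_phase = [1.0, 4.0, 4.0]   # (1 + 2z)(1 + 2z)

for name, wavelet in [("minimum", minimum_phase),
                      ("mixed", mixed_phase),
                      ("maximum", maximum_phase)]:
    print(name, partial_energy(wavelet))
```

At every sample the partial energy of the minimum-phase wavelet is at least that of the mixed-phase wavelet, which in turn is at least that of the maximum-phase wavelet, while the total energies are equal.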
For Wiener filters there is a measure of how well the filter does its work, namely the
error, which is calculated as the square of the difference between the actual output and the
desired output. The performance $P$ of the deconvolution is defined as:

$$
P = 1 - \frac{\sum_n \left[ d[n] - f[n] * s[n] \right]^2}{\sum_n d^2[n]}, \tag{4.36}
$$

which is a number between 0 and 1, $P = 1$ meaning that we have a perfect filter. Let us
consider again the wavelet of figure (4.9) and design deconvolution filters for a fixed filter
length of 20 ms, but with different lag times in the desired output signal. Figure (4.11)
shows the result of the deconvolution for a number of lag times, with the dotted lines
showing the desired output signal, being the delayed pulse. Of the lags shown, the best result
is achieved for a lag of 16 ms. If we repeat this experiment for a range of lag times and plot
the performance as a function of lag, figure (4.12) is the result. There we can see that the
optimum lag for this wavelet is 14 ms.
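
The lag experiment described above can be reproduced in a few lines. What follows is a hedged sketch in plain Python, not the authors' implementation: it sets up the normal equations (4.34) for a spike at a chosen lag, solves them with a textbook Levinson recursion, and evaluates the performance of equation (4.36). The function names, the absolute stabilization parameter `eps2` and the two-point wavelet are all assumptions made for this illustration.

```python
def autocorrelation(s, nlags):
    """phi_ss[k] = sum_n s[n] s[n+k] for k = 0 .. nlags-1."""
    return [sum(s[n] * s[n + k] for n in range(len(s) - k)) for k in range(nlags)]


def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def levinson(r, b):
    """Solve T x = b with T[i][j] = r[|i-j|] (symmetric Toeplitz) in O(N^2) operations."""
    a, err = [1.0], r[0]              # prediction-error operator and its error energy
    x = [b[0] / r[0]]
    for k in range(1, len(b)):
        kappa = -sum(a[j] * r[k - j] for j in range(k)) / err
        a = a + [0.0]
        a = [a[j] + kappa * a[k - j] for j in range(k + 1)]
        err *= 1.0 - kappa * kappa
        q = (b[k] - sum(r[k - j] * x[j] for j in range(k))) / err
        x = x + [0.0]
        x = [x[j] + q * a[k - j] for j in range(k + 1)]
    return x


def spiking_filter(s, nf, lag, eps2=0.0):
    """Least-squares filter of nf samples that shapes s into a spike at sample `lag`."""
    r = autocorrelation(s, nf)
    r[0] += eps2                      # stabilization on the main diagonal
    # Cross-correlation of the delayed spike with the wavelet: phi_ds[i] = s[lag - i].
    rhs = [s[lag - i] if 0 <= lag - i < len(s) else 0.0 for i in range(nf)]
    return levinson(r, rhs)


def performance(s, f, lag):
    """P of equation (4.36) for a unit spike at `lag` (lag must lie inside the output)."""
    y = convolve(f, s)
    err = sum((v - (1.0 if n == lag else 0.0)) ** 2 for n, v in enumerate(y))
    return 1.0 - err                  # the desired spike has unit energy


wavelet = [0.5, 1.0]                  # hypothetical maximum-phase wavelet
nf = 10
scores = [performance(wavelet, spiking_filter(wavelet, nf, lag), lag)
          for lag in range(nf + len(wavelet) - 1)]
best_lag = max(range(len(scores)), key=scores.__getitem__)
print(best_lag, scores[best_lag])
```

For this maximum-phase wavelet a zero-lag spike gives a poor performance, whereas a spike delayed to the end of the output gives a performance close to 1, in line with the discussion of the lag.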
As is obvious, when we increase the filter length, the performance will be better; in the
extreme of infinite filter length, we will find an (almost) perfect inverse filter. However,
with a finite delay of the spike and filter coefficients which start at $t = 0$ (i.e. a causal
filter, as considered in these examples), we cannot obtain the exact inverse filter. We can
also make a curve which gives the performance as a function of the filter length. Such an
example is given in figure (4.13) for our wavelet under consideration. The lag has now been
kept constant at 14 ms. As can be seen in this figure, the asymptotic value does not go up
to 1, but to a lower constant value. This is the best we can do with a finite delay and a
causal filter with an infinite number of samples.
So the two important parameters in optimizing the filter design are the lag (i.e. delay)
in the desired output and the filter length. We can always improve the performance by
increasing the filter length and changing the lag time. Therefore, for design purposes, it is
useful to plot values of $P$ as a function of lag and filter length, as is given in figure (4.14).
Note that the previous two figures showed cross-sections of this two-dimensional function.
There is a rule of thumb for designing filters. If we inspect the wavelet and look where
most of its energy is (via the partial energies), then the optimum lag will be about half the
sum of the filter length and the time at which the energy of the wavelet peaks. So, for
instance, if the wavelet seems to put its energy mostly around 12 ms, and we choose a filter
length of 20 ms, the optimum lag will be around 0.5*(12+20) = 16 ms. This rule of thumb is
found to be quite useful in practice.

Figure 4.10: Partial energy function for the wavelet of figure (4.9). (Axes: time in ms
versus amplitude.)

Figure 4.11: The actual output for spiking deconvolution for the wavelet given in figure
(4.9), for different delays. The dashed line defines the desired wavelet, i.e. a shifted spike.
The length of the filter is 20 ms. (Panels as printed: lag 0 ms, P = 0.00; lag 8 ms,
P = 0.7963; lag 16 ms, P = 0.8892; lag 24 ms, P = 0.5269; lag 32 ms, P = 0.5269; lag 40 ms,
P = 0.1503.)

Figure 4.12: The performance curve as function of delay for spiking deconvolution for the
wavelet given in figure (4.9) for a filter length of 20 ms. (Axes: lag in ms versus
performance.)

Figure 4.13: The performance of the deconvolution filter as a function of filter length for
spiking deconvolution of the wavelet as given in figure (4.9) with a lag of 14 ms. (Axes:
filter length in ms versus performance.)

Figure 4.14: 3d and contour plot of the spiking deconvolution filter performance as a
function of filter length and lag for the wavelet as given in figure (4.9).

There is one parameter which we have left out of the discussion so far, and that is the
stabilization constant $\epsilon^2$. This parameter is needed not only in the deconvolution
via the frequency domain, but also in the deconvolution in the time domain. In order to
solve the matrix system for the filter in a stable fashion, we need to add some noise to the
diagonal of that matrix. Common values for this stabilization factor are 1% to 5% of the
maximum of the power of the wavelet ($\phi_{ss}[0]$).

4.6.3 Statistical deconvolution: minimum phase deconvolution


Up to now we have assumed that the wavelet for which we want to deconvolve is known.
However, in practice this is most of the time not the case. Therefore, common practice
is to make some assumptions on the statistical properties of the wavelet and of the earth
response. One of these approaches leads to the method of minimum phase deconvolution.
For this we go back to the convolution model of the earth response, as given in equation
(4.6):

$$
x(t) = g(t) * s(t). \tag{4.37}
$$

Suppose we now define our desired deconvolved earth response $d(t)$ as the result of applying
a filter $f(t)$ to the measured response $x(t)$, and we want $d(t) = g(t)$. This can be achieved
by solving the following energy equation:

$$
E = \sum_t \left[ g(t) - f(t) * x(t) \right]^2 = \text{minimum}. \tag{4.38}
$$

Note that at this stage $g(t)$ is still unknown, but we assume it for the moment to be known.
As described in the previous subsections, solving this leads to the following
normal equations (omitting the stabilization constant):

$$
\phi_{xx}(\tau) * f(\tau) = \phi_{gx}(\tau), \tag{4.39}
$$

where $\phi_{xx}(\tau)$ is the autocorrelation $x(t) * x(-t)$ and $\phi_{gx}(\tau)$ is the
cross-correlation $g(t) * x(-t)$. The following expressions can now be formulated:

$$
\begin{aligned}
\phi_{xx}(\tau) &= x(\tau) * x(-\tau) \\
&= s(\tau) * g(\tau) * s(-\tau) * g(-\tau) \\
&= g(\tau) * g(-\tau) * s(\tau) * s(-\tau) \\
&= \phi_{gg}(\tau) * \phi_{ss}(\tau)
\end{aligned}
\tag{4.40}
$$

for the autocorrelation function and

$$
\begin{aligned}
\phi_{gx}(\tau) &= g(\tau) * x(-\tau) \\
&= g(\tau) * s(-\tau) * g(-\tau) \\
&= \phi_{gg}(\tau) * s(-\tau)
\end{aligned}
\tag{4.41}
$$

for the cross-correlation function. With these expressions the normal equation can be
written as:

$$
\phi_{gg}(\tau) * \phi_{ss}(\tau) * f(\tau) = \phi_{gg}(\tau) * s(-\tau), \tag{4.42}
$$

i.e. fully in terms of the autocorrelation of the true earth response, the autocorrelation of
the wavelet and the wavelet itself.

Now the assumptions come into play. First, let us assume that the actual earth response
$g(t)$ is statistically "white":

$$
\phi_{gg}(\tau) = G \, \delta(\tau), \tag{4.43}
$$

which means that it consists of a series of uncorrelated reflections. In practice we know
that this is not the case in general, although to some extent this assumption is acceptable
(see also the examples in Appendix E). With this assumption, the normal equation can
be rewritten as:

$$
\phi_{ss}(\tau) * f(\tau) = s(-\tau). \tag{4.44}
$$

The influence of the earth has been fully removed from this equation. In fact we assume
that in the autocorrelation of the seismic signal the earth reflectivity only gives a scaling
factor, so that this autocorrelation can be directly interpreted as the autocorrelation of
the source signal (apart from a scaling factor).
The second assumption is that the wavelet is minimum phase. This means that, given the
power spectrum of the wavelet (equivalently, its autocorrelation $\phi_{ss}(\tau)$), the time
domain response $s(\tau)$ can be constructed by requiring that the wavelet is as short as
possible, i.e. that it has the fastest possible energy build-up. See also [Robinson and
Treitel, 1980] for more details on the minimum phase concept. At this stage it is important
to know that, given the power spectrum of a signal, its minimum phase time domain
representation can be constructed.
In Figure 4.15 an example of the minimum phase equivalent of the mixed-phase wavelet
of Figure 4.9 is given. Note that the amplitude spectrum is the same for both signals (by
definition) and that the minimum phase equivalent of the input signal has most of its
energy concentrated around zero time. In that case the signal $s(-\tau)$ in equation (4.44)
can be determined from the power spectrum and the filter $f(t)$ can be calculated. As the
factor $G$ in equation (4.43) is not known, the (assumed minimum phase) source wavelet
can only be determined up to an absolute scale factor. This means that the filter $f(t)$
will remove this wavelet only up to this absolute scale factor.
In this subsection, we have seen that under two assumptions, the whiteness of the
reflection series and the minimum phase behavior of the source signal, the effect of the
source can be removed from the seismic traces.
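
To make the two assumptions concrete, here is a minimal Python sketch (an illustration under stated assumptions, not the procedure used for the figures in these notes). For a causal minimum-phase wavelet and a causal filter, the right-hand side of equation (4.44) is non-zero only at zero lag, so the filter follows from the trace autocorrelation alone, up to a scale factor. The sketch solves that system with the Durbin recursion. The function names, the relative stabilization `rel_eps2`, the wavelet and the sparse reflectivity are all hypothetical.

```python
def autocorrelation(x, nlags):
    """phi_xx[k] = sum_n x[n] x[n+k] for k = 0 .. nlags-1."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(nlags)]


def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def zero_lag_spiking_filter(r, rel_eps2=0.0):
    """Solve T f = (1, 0, ..., 0)^T with T[i][j] = r[|i-j|] by the Durbin recursion."""
    r = [r[0] * (1.0 + rel_eps2)] + list(r[1:])   # stabilization as a fraction of r[0]
    a, err = [1.0], r[0]                          # prediction-error operator, error energy
    for k in range(1, len(r)):
        kappa = -sum(a[j] * r[k - j] for j in range(k)) / err
        a = a + [0.0]
        a = [a[j] + kappa * a[k - j] for j in range(k + 1)]
        err *= 1.0 - kappa * kappa
    return [c / err for c in a]


wavelet = [1.0, 0.5]                    # hypothetical minimum-phase wavelet
reflectivity = [0.0] * 60               # sparse, well-separated reflectors
reflectivity[0], reflectivity[20], reflectivity[45] = 1.0, -0.5, 0.25
trace = convolve(reflectivity, wavelet)

# Only the autocorrelation of the trace is used; the wavelet itself is never passed in.
f = zero_lag_spiking_filter(autocorrelation(trace, 10))
decon = convolve(f, trace)
print([round(decon[n], 4) for n in (0, 1, 20, 21, 45, 46)])
```

Because the reflectors are further apart than the filter length, the short-lag autocorrelation of the trace is the wavelet autocorrelation times an unknown constant. The deconvolved trace therefore reproduces the relative reflection strengths, but not their absolute scale.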

4.6.4 Statistical deconvolution: predictive deconvolution


We can now assume that the wavelet has been shortened by minimum phase deconvolution
or that the wavelet already has a short duration in time (e.g. a dynamite wavelet). However,
there are other effects in the seismic response that are undesired and can be removed using
the Wiener filter deconvolution method. One of these effects is reverberations of the seismic
wave field between two reflectors, for example reverberations in the water layer. Figure
(4.16) shows the effect of a water layer on the seismic response. Each up-going reflection
will be followed by a train of reverberations, alternating in sign because the surface has a
reflection coefficient of approximately $-1$. These reverberations have a nice property: they
repeat themselves after the reverberation period $\tau = 2\Delta z/c$, with $\Delta z$ the
thickness of the water layer and $c$ the water velocity, and each reverberation is scaled by
$-R_1$, with $R_1$ the reflectivity of the water bottom. Deconvolution via the time domain has
specific applications in the removal of these multiple reflections, and the Wiener filter
designed for this is called predictive deconvolution.

Figure 4.15: A mixed phase wavelet (a) and its minimum phase equivalent (b). The
amplitude spectrum (c) is the same for both signals. (Axes: a) and b) time in ms versus
amplitude; c) frequency in Hz versus amplitude.)

Figure 4.16: Reverberations in a water layer in depth (left) and its effect on the seismic
trace (right).

The main idea of predictive deconvolution is that if we delay the signal by $\alpha$ seconds,
which is chosen in the order of $\tau = 2\Delta z/c$ seconds, and adapt the amplitudes (by
convolving with a filter $f_t$), it will fit onto itself again. This means that a primary
reflection will fit the first-order reverberation, the first-order reverberation will match
the second-order reverberation, etc. So we want to minimize the error $E$ for the filter
coefficients $f_t$ according to the following formulation (see also appendix F):

$$
E = \sum_t \left( x_t - f_t * x_{t-\alpha} \right)^2, \tag{4.45}
$$

or, equivalently,

$$
E = \sum_t \left( x_{t+\alpha} - f_t * x_t \right)^2. \tag{4.46}
$$

The crucial factor in this type of deconvolution is the choice of the desired output: we
choose it as $d_t = x_{t+\alpha}$, i.e. the original signal shifted in time by the prediction
distance $\alpha$. The solution can again be written as a set of normal equations (neglecting
the stabilization):

$$
\phi_{xx}(t) * f(t) = \phi_{xx}(t + \alpha). \tag{4.47}
$$

If we consider a discrete version of this equation, introduce a short filter $f[n]$ and
also take stabilization into account, the normal equations are as follows:

$$
\sum_{n=0}^{N} \phi_{xx}[i-n] f[n] + \epsilon^2 f[i] = \phi_{xx}[i + N_\alpha]
\qquad \text{for } i = 0, 1, 2, \ldots, N, \tag{4.48}
$$

with $N_\alpha$ being the discrete equivalent of the time lag $\alpha$. As in the subsection on
Wiener filtering, this set of normal equations can be written as a matrix-vector equation:

$$
\begin{pmatrix}
\phi_{xx}[0]+\epsilon^2 & \phi_{xx}[1] & \phi_{xx}[2] & \cdots & \phi_{xx}[N] \\
\phi_{xx}[1] & \phi_{xx}[0]+\epsilon^2 & \phi_{xx}[1] & \cdots & \phi_{xx}[N-1] \\
\phi_{xx}[2] & \phi_{xx}[1] & \phi_{xx}[0]+\epsilon^2 & \cdots & \phi_{xx}[N-2] \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\phi_{xx}[N] & \phi_{xx}[N-1] & \phi_{xx}[N-2] & \cdots & \phi_{xx}[0]+\epsilon^2
\end{pmatrix}
\begin{pmatrix} f[0] \\ f[1] \\ f[2] \\ \vdots \\ f[N] \end{pmatrix}
=
\begin{pmatrix}
\phi_{xx}[N_\alpha] \\ \phi_{xx}[N_\alpha + 1] \\ \phi_{xx}[N_\alpha + 2] \\ \vdots \\ \phi_{xx}[N_\alpha + N]
\end{pmatrix}
\tag{4.49}
$$
In these equations only the autocorrelation of the seismic trace, $\phi_{xx}[n]$, and the
shifted autocorrelation, $\phi_{xx}[n + N_\alpha]$, are used to define the filter samples $f[n]$.
Note that once we have found the optimum solution $f[n]$, we have to apply it to the data as
the predictive deconvolution filter $\delta[n] - f[n - N_\alpha]$ to remove the reverberations
from the input.
Figure (4.17) shows the result of a predictive deconvolution filtering procedure applied to a
shot record from a field dataset. In figure (4.18) the autocorrelations of the two shot
records are displayed, showing clearly that predictive deconvolution removes the ringing in
the autocorrelation function of the data.

Figure 4.17: Shot record with reverberations (left) and after predictive deconvolution
(right). (Axes: offset in m versus time in s.)

Figure 4.18: Autocorrelation functions of the shot record with reverberations (top left)
and after predictive deconvolution (top right). (Axes: offset in m versus time in s.)

Analytical example of predictive-error filtering

We will analyze the response:

$$
G(\omega) = \frac{R_1 \exp(i\omega\tau_1)}{1 + R_1 \exp(i\omega\tau_1)}. \tag{4.50}
$$

This is the normal-incidence reflection response from a layer above a half space. The upper
boundary has reflection coefficient $-1$ (like the sea surface) and the lower boundary has
a reflection coefficient $R_1$. The two-way traveltime is given by $\tau_1$ (see also figure
(4.19)). When we expand the denominator, we obtain:

$$
G(\omega) = R_1 \exp(i\omega\tau_1) \left[ 1 - R_1 \exp(i\omega\tau_1) + R_1^2 \exp(2i\omega\tau_1)
- R_1^3 \exp(3i\omega\tau_1) + \cdots \right]. \tag{4.51}
$$

In each of these terms we recognize a multiple. In the time domain this gives the response:

$$
g_t = 0, 0, \ldots, 0, R_1, 0, 0, \ldots, 0, -R_1^2, 0, 0, \ldots, 0, R_1^3, 0, 0, \ldots \tag{4.52}
$$

where we have $N_1 - 1$ zeroes at the beginning, with $N_1 = \tau_1/\Delta t$ and $\Delta t$ being
the discrete time sampling. Also between $R_1$ and $-R_1^2$ we have $N_1 - 1$ zeroes, between
$-R_1^2$ and $R_1^3$ we again have $N_1 - 1$ zeroes, etc. We now assume we have measured this
seismogram (with infinite bandwidth), so for the moment we leave all the earlier
considerations on source, receivers, etc. out. We wish to convert this seismogram to a
seismogram without multiples. For application of the Wiener-Levinson technique we require the
autocorrelation function $\phi_{gg}(\tau)$ of $g_t$:

$$
\phi_{gg}(\tau) =
\begin{cases}
R_1^2 \left( 1 + R_1^2 + R_1^4 + \cdots \right) & \text{for } \tau = 0 \\
0 & \text{for } 0 < \tau < \tau_1 \\
-R_1^3 \left( 1 + R_1^2 + R_1^4 + \cdots \right) = -R_1 \phi_{gg}(0) & \text{for } \tau = \tau_1.
\end{cases}
\tag{4.53}
$$

Thus the discrete version of the autocorrelation of $g(\tau)$ can be written as:

$$
\phi_{gg}[n] = E_g, 0, 0, \ldots, 0, -R_1 E_g, \ldots \tag{4.54}
$$

where $E_g$ denotes the energy of $g_t$, and we have $N_1 - 1$ zeroes. Let the filter length $N_f$
be less than $N_1$, and let the prediction distance be $\alpha = \tau_1$, or $N_\alpha = N_1$. The
normal equations are then:

$$
\begin{pmatrix}
\phi_{gg}[0] & 0 & \cdots & 0 \\
0 & \phi_{gg}[0] & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \phi_{gg}[0]
\end{pmatrix}
\begin{pmatrix} f[0] \\ f[1] \\ \vdots \\ f[N_f - 1] \end{pmatrix}
=
\begin{pmatrix} \phi_{gg}[N_1] \\ 0 \\ \vdots \\ 0 \end{pmatrix}.
\tag{4.55}
$$

The only member of this system whose right-hand side does not vanish is:

$$
\phi_{gg}[0] f[0] = \phi_{gg}[N_1] \tag{4.56}
$$

and thus

$$
f[0] = \frac{\phi_{gg}[N_1]}{\phi_{gg}[0]} = \frac{-R_1 E_g}{E_g} = -R_1. \tag{4.57}
$$

The total prediction-error operator becomes:

$$
1, 0, 0, \ldots, R_1 \tag{4.58}
$$

(note the minus sign disappearing). In practice it is not necessary to set the prediction
distance exactly to $N_1$. The model here allows us to choose $N_\alpha$ to take any value as
long as it is less than or equal to $N_1$. Also, we must have $N_\alpha + N_f > N_1$. So we can
now apply our filter to the seismogram, as depicted in figure (4.19).

Figure 4.19: The input, operator and output for the analytical example of multiples within
one layer. (Panels: input $g(t)$; operator $f(t) = \delta(t) + R_1 \delta(t - \tau_1)$; output
$y(t) = f(t) * g(t)$.)

It is interesting to note that the original seismogram was periodic in time. This is
only the case for normal-incidence data when we consider a plane-layered earth. If we do
not have normal-incidence data, it is not periodic (in the $(t, x)$ domain). In that case, the
predictive deconvolution procedure can only partly solve the reverberation problem, but by
taking a filter length $N_f > 1$ it can still handle the situation to some extent. Another
solution is to go to the so-called linear Radon domain, or $\tau$-$p$ domain, where the data
is mapped into plane waves. There, the periodicity for each angle of incidence is again
constant, and predictive deconvolution can be applied more accurately. A discussion of
the linear Radon transform is beyond the scope of these course notes.

Figure 4.20: Measured airgun wavelet and the result after spiking deconvolution.
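
The analytical example can be verified numerically. The short Python sketch below is an illustration, not part of the original notes: it builds a truncated version of the reverberation train (4.52), evaluates $f[0]$ from equation (4.57), and applies the prediction-error operator (4.58). The values of `R1`, `N1` and `n_events` are hypothetical, and because the series is truncated a very small residual remains at the end of the output.

```python
def convolve(a, b):
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def autocorr_lag(g, k):
    """phi_gg[k] = sum_n g[n] g[n+k]."""
    return sum(g[n] * g[n + k] for n in range(len(g) - k))


R1 = 0.5         # hypothetical reflection coefficient of the lower boundary
N1 = 8           # hypothetical two-way traveltime in samples (tau_1 / dt)
n_events = 20    # primary plus 19 multiples; the infinite series is truncated here

# Reverberation train of equation (4.52): R1, -R1^2, R1^3, ... spaced N1 samples apart.
g = [0.0] * (n_events * N1 + 1)
for m in range(1, n_events + 1):
    g[m * N1] = -((-R1) ** m)

# Equation (4.57): the only non-zero prediction filter coefficient.
f0 = autocorr_lag(g, N1) / autocorr_lag(g, 0)

# Prediction-error operator delta[n] - f[n - N1], equation (4.58).
operator = [1.0] + [0.0] * (N1 - 1) + [-f0]
y = convolve(operator, g)

print(f0, y[N1], max(abs(v) for n, v in enumerate(y) if n != N1))
```

The estimated coefficient is very close to $-R_1$, and after applying the operator only the primary at sample $N_1$ remains, apart from the small truncation residual.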

4.6.5 Spiking deconvolution


One of the most used deconvolution procedures in practice is spiking deconvolution.
In fact, spiking deconvolution is a special case of predictive deconvolution: by taking
the lag to be one sample in predictive deconvolution, a spiking deconvolution is obtained.
Examples of spiking deconvolution are shown in figures (4.20) and (4.21), in which spiking
deconvolution results on a single airgun wavelet and on a synthetic shot record with
this airgun wavelet are shown. Note that although the spiking is not perfect, the temporal
resolution is clearly increased after the spiking deconvolution procedure: events
have sharpened and each event starts with a large positive peak, which is desired in
interpretation. It can be shown that spiking deconvolution performs best when the wavelet
is minimum phase: then the spiking deconvolution filter is stable and appears to be a
perfect inverse filter for the minimum phase signal.

Figure 4.21: Synthetic shot record with measured airgun wavelet and the result after
spiking deconvolution. (Panel titles as printed: "Shot with airgun wavelet with ghost" and
"Shot ghost after spiking decon"; axes: offset in m versus time in s.)

Summary of deconvolution

With a deconvolution process an undesired distortion of the seismic signals is removed,
which is caused either by the acquisition tools (e.g. the airgun bubble effect) or by (the
shallow part of) the earth (e.g. reverberations in a water layer). If this effect can be
considered as a convolution effect on the desired seismic signals, then by convolution with a
filter with the inverse effect this distortion can be removed from the data. In most cases
the objective of deconvolution is to increase the time resolution of the data. Two main types
of deconvolution are considered: 1) deterministic and 2) statistical deconvolution.
In deterministic deconvolution, the convolutional distortion is known and needs to be
removed in an optimal way. For statistical deconvolution the undesired effect is not
precisely known, but it is reduced based on some statistical assumptions on the data.
Assumptions that are often used are uncorrelated reflectors, minimum phase behavior of the
source signal and repeating patterns within the signal.
Deterministic deconvolution can be performed in either the frequency domain or
the time domain. Statistical deconvolution, with the aid of prediction error filtering,
is always performed in the time domain. In the time domain the deconvolution
problems can often be written as a set of so-called normal equations in which auto- and
cross-correlation functions of the input signals are related. This type of equations can be
written as a (Toeplitz) matrix-vector equation and can be solved efficiently with a Levinson
recursion scheme. The advantage of applying the deconvolution in the time domain is that
the user has good control over the filter length and possible instability effects. Often
deconvolution with a short filter in the time domain gives an optimal stable result, but with
a slight loss in accuracy.

Figure 4.22: Ground roll and a reflection in the $(f, k_x)$ domain. (Axes: horizontal wave
number versus frequency; the two lines are labelled GR and c refl.)

4.7 Filtering in the $(f, k_x)$ domain

We have seen in the chapter on basic signal analysis (chapter 2) that a dipping broadband
event in the $(t, x)$ domain gives a dipping event in the $(f, k_x)$ domain. The dip is not the
reciprocal of the velocity as in the $(t, x)$ domain, but the velocity itself. In this section,
we shall discuss two applications which make use of these different slopes in the $(f, k_x)$
domain, together with some other properties in this domain. These applications are the
removal of ground roll in land data, and the removal of multiples in marine data.

Ground roll filtering

Let us first look at ground roll in land data. Ground roll is the general term for surface
waves which travel along the surface. Only if the earth were a homogeneous half space
would the ground roll be a perfect Rayleigh wave, giving one event in the measurements.
Since this is not the case for the real earth, the ground roll is a combination of resonances
which show a very dispersive character due to interference of the different modes.
At the same time that we receive the ground roll, we get the reflections back from
the subsurface, so they overlap. A difference between these two types of arrivals, i.e.
the ground roll and the reflections, is that they travel with a different speed along the
surface. The ground roll travels along the surface with a relatively low speed, while the
reflections from below arrive at nearly the same time for a close group of receivers, with
a slight hyperbolic move-out, because in reflection seismology we are aiming at reflections
from small angles of incidence. In the $(t, x)$ domain, we obtain the response as derived in
chapter 2, i.e.:

$$
p_{GR}(t, x) = s_{GR}\left( t - \frac{x}{c_{GR}} \right), \tag{4.59}
$$

where $s_{GR}$ is the wavelet for the ground roll and the subscript GR stands for ground
roll. For the reflected events, we have the response in the $(t, x)$ domain:

$$
p_{Refl}(t, x) = s_{Refl}\left( t - \frac{x \sin(\theta)}{c_{Refl}} \right), \tag{4.60}
$$

with $s_{Refl}$ the wavelet of the reflection, where we made use of the fact that the apparent
speed along the surface for the reflected events is $c_{Refl}/\sin(\theta)$, where $\theta$
denotes the angle the normal to the wave front makes with the vertical. Since $\theta$ is small,
the apparent velocity is large. In principle the reflection events are located within a
pie-slice in the $(f, k_x)$ domain for apparent velocities between $c_{Refl}$ and $\infty$. We
can now make a sketch of these two arrivals in the $(f, k_x)$ domain, as is given in figure
4.22. A nice feature of this diagram is that the ground roll and reflections are now well
separated.
A shot record with dispersive ground roll, simulated in an earth model with a few thin
layers on top, together with two reflections is displayed in figure 4.23. Note the separation
of the ground roll and reflections in the $(f, k_x)$ domain.

Figure 4.23: Simulated shot record with dispersive ground roll. a) Shot record with ground
roll and two reflections in the $(t, x)$ domain. b) Shot record of a) in the $(f, k_x)$ domain.
(Axes: a) offset in m versus time in s; b) horizontal wavenumber in m^-1 versus frequency
in Hz.)


Figure 4.24: "Pie-slice" filters in the $(t, x)$ and $(f, k_x)$ domain. (Axes: a) offset in m
versus time in s; b) horizontal wavenumber in m^-1 versus frequency in Hz.)


The use of the two-dimensional Fourier transform enables us to carry out filtering
in the $(f, k_x)$ domain. In the case of one-dimensional filtering we are used to multiplying
the complex spectrum of an input signal with the complex spectrum of a filter function,
which is equivalent to convolution in the time domain:

$$
\mathcal{F}_t \left[ h(t) * f(t) \right] = H(f) F(f), \tag{4.61}
$$

in which $h(t)$ is the input time response, and $f(t)$ the filter response. Similarly, in the
$(f, k_x)$ domain, filtering may be carried out through multiplication with some window
function or a more general function of $k_x$ or $f$ only.
However, the two-dimensionality of the $(f, k_x)$ domain provides the extra freedom to design
filters whose transfer characteristics are functions of both $k_x$ and $f$. Of particular
interest are filters with boundaries that are linear functions of $k_x$ and $f$, i.e. filters
with sector-shaped pass-band or rejection-band boundaries (see figure 4.24). These filters,
called "pie-slice" filters, of which the (linear) sector boundaries are specified by apparent
velocities, may be designed in such a way that, say, reflection energy is passed in a sector
covering the $f$-axis (where the apparent velocity is $\infty$) and a range of relatively high
apparent velocities on both sides of the $f$-axis, while energy is suppressed in the regions
further away from the $f$-axis, e.g. in the region between the $+k_x$-axis and a line through
the origin with velocity parameter $c_{GR} = f/k_{x,GR}$ somewhat larger than the highest
apparent velocity of the low-velocity ground roll events.
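
A pie-slice reject filter of this kind can be sketched in a few lines of plain Python. The example below is only an illustration under simplifying assumptions and is not the filter used for the figures: it uses a naive discrete Fourier transform instead of an FFT (to stay self-contained), a hard-edged mask where a tapered one would normally be preferred, and a tiny periodic panel. The function names, the sampling intervals and the cut-off velocity `c_cut` are hypothetical.

```python
import cmath


def dft(values, inverse=False):
    """Naive discrete Fourier transform of a 1-D sequence."""
    n = len(values)
    sign = 1.0 if inverse else -1.0
    out = []
    for k in range(n):
        acc = sum(values[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(acc / n if inverse else acc)
    return out


def dft2(panel, inverse=False):
    """Transform panel[t][x] along t and then along x (or back again)."""
    n0, n1 = len(panel), len(panel[0])
    cols = [dft([panel[i][j] for i in range(n0)], inverse) for j in range(n1)]
    return [dft([cols[j][i] for j in range(n1)], inverse) for i in range(n0)]


def bin_value(i, n, d):
    """Signed frequency (or wavenumber) of DFT bin i for n samples with spacing d."""
    return (i if i <= n // 2 else i - n) / (n * d)


def pie_slice_reject(panel, dt, dx, c_cut):
    """Zero all (f, k_x) components with apparent velocity |f / k_x| below c_cut."""
    nt, nx = len(panel), len(panel[0])
    spec = dft2(panel)
    for fi in range(nt):
        for ki in range(nx):
            if abs(bin_value(fi, nt, dt)) < c_cut * abs(bin_value(ki, nx, dx)):
                spec[fi][ki] = 0.0
    return [[v.real for v in row] for row in dft2(spec, inverse=True)]


def energy(panel):
    return sum(v * v for row in panel for v in row)


nt = nx = 16
dt, dx = 0.004, 10.0          # hypothetical sampling: 4 ms and 10 m
slow = [[1.0 if t == x else 0.0 for x in range(nx)] for t in range(nt)]   # 2500 m/s event
flat = [[1.0 if t == 5 else 0.0 for x in range(nx)] for t in range(nt)]   # infinite velocity

slow_out = pie_slice_reject(slow, dt, dx, c_cut=5000.0)
flat_out = pie_slice_reject(flat, dt, dx, c_cut=5000.0)
print(energy(slow_out) / energy(slow), energy(flat_out) / energy(flat))
```

The flat event, which lies on the $f$-axis, passes unchanged. The slow event is removed apart from its constant (zero-frequency, zero-wavenumber) component, which lies at the origin of the $(f, k_x)$ plane and is therefore not covered by the reject zone of this simple mask.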


Figure 4.25: Simulated shot record with dispersive ground roll after $(f, k_x)$ domain
pie-slice filtering. a) Filtered shot record in the $(t, x)$ domain. b) Filtered shot record
of a) in the $(f, k_x)$ domain. (Axes: a) offset in m versus time in s; b) horizontal
wavenumber in m^-1 versus frequency in Hz.)
This is shown in figure 4.25, in which the example of figure 4.23 has been filtered such
that the ground roll is removed (except for some edge effects).
A field example of a section with ground roll and without ground roll, removed by $(f, k_x)$
filtering, is shown in figure 4.26. Note that the reflection events (e.g. at 0.8 and 1.5
seconds) become better visible after the filtering. However, in the lower part of the data,
some of the ground roll appears to be aliased, and the $(f, k_x)$ filtering procedure will
smear this aliased ground roll. It will be difficult to make a distinction between smeared
ground roll and true reflection events in that region.
So far, we established a separation of the arrivals in the $(f, k_x)$ domain by means of
different apparent velocities, but the separation is accentuated further by the fact that
ground roll usually contains much lower frequencies than the reflection events. It can
sometimes even happen that the separation can be achieved on frequency considerations alone,
but even in that case $(f, k_x)$ "pie-slice" filtering is preferred.


Figure 4.26: Field example of a shot gather (after statics correction) a) with ground roll and
b) after removal of ground roll by dip-filtering. (Axes: offset in m versus time in s.)

Multiple removal using $(f, k_x)$ domain filtering

Another application of $(f, k_x)$ domain filtering is removing multiples from marine data.
We already discussed multiple removal in the context of predictive deconvolution, but
there we depend strongly on the statistical nature of the process and on the plane layering
of the reflectors causing the multiples. Also, for long-period multiples, that filtering
procedure will not yield good results. In other words, the assumption that multiples appear
as a strictly periodic sequence does not always apply in practice, especially for non-zero
offset data. For these situations, $(f, k_x)$ filtering can help. Here again, we make use of
the fact that the multiple has a different apparent velocity than the reflections
coming from deeper down. The reflections from deeper down have generally encountered
a higher wave speed, and so these arrivals will arrive more vertically at the surface (see
also figure 4.27), compared to multiples that arrive at the same traveltime (and therefore
have propagated longer in shallower layers). However, due to structure in the subsurface,
this observation may no longer hold. Therefore, for multiple elimination, the data is
sorted into CMP gathers (see also chapter 3) in order to make the hyperbolic assumption
for the reflection events more valid.
The multiple elimination filtering is applied as follows:

- Sort data into CMP gathers.
- Apply NMO correction with a velocity in between those of the primary and multiple events,
  which will map the upward curved primaries to the negative $k_x$ values and the downward
  curving multiples to the positive $k_x$ plane.
- Apply $(f, k_x)$ filtering by removing one half of the $k_x$ plane, containing the multiple
  events.
- Resort data into shot gathers or continue processing in CMP gathers.

Figure 4.27: The ray path of a multiple and a reflection from deeper down.

A field example for a CMP gather from a marine line in the North Sea is given in figure
4.28. Here we see an enormous amount of multiple reflections, which can be identified after
NMO correction with velocities in between the primary and multiple velocities, as shown in
figure 4.28b. The upward curving events are identified as primaries, whereas the downward
curving events are the multiples. In the $(f, k_x)$ domain, the positive $k_x$ plane is zeroed,
and after inverse NMO correction, figure 4.28d is the result. Note that due to the mute
in the NMO correction, some reflection information is lost. Note also that in the lower
part of the gather we cannot be sure that all the remaining events are primaries: generally
the difference in move-out velocity is not present for all multiples, especially in the deeper
part of the data.
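
Why an intermediate NMO velocity separates primaries from multiples can be seen from the hyperbolic traveltime alone. The following Python sketch is a simplified illustration, not the processing applied to the field data: it assumes purely hyperbolic move-out and uses hypothetical velocities, offsets and function names. It computes where a primary and a multiple with the same zero-offset time end up after NMO correction with a velocity in between.

```python
import math


def reflection_time(t0, offset, velocity):
    """Hyperbolic traveltime t(x) = sqrt(t0^2 + x^2 / v^2)."""
    return math.sqrt(t0 * t0 + (offset / velocity) ** 2)


def nmo_corrected_time(t, offset, v_nmo):
    """Zero-offset time assigned to an arrival at time t after NMO correction with v_nmo."""
    return math.sqrt(max(t * t - (offset / v_nmo) ** 2, 0.0))


t0 = 2.0                                  # common zero-offset time in s (hypothetical)
v_primary, v_multiple = 2500.0, 1500.0    # hypothetical move-out velocities in m/s
v_nmo = 2000.0                            # intermediate NMO velocity
offsets = [0.0, 500.0, 1000.0, 1500.0, 2000.0]

primary = [nmo_corrected_time(reflection_time(t0, x, v_primary), x, v_nmo) for x in offsets]
multiple = [nmo_corrected_time(reflection_time(t0, x, v_multiple), x, v_nmo) for x in offsets]

for x, tp, tm in zip(offsets, primary, multiple):
    print(x, round(tp, 4), round(tm, 4))
```

The primary is over-corrected and moves to earlier times with increasing offset (curving upward on a section with time increasing downward), while the multiple is under-corrected and moves to later times (curving downward). The two therefore have opposite dips and fall in opposite halves of the $k_x$ plane.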

Summary of F-K filtering

When disturbing events need to be removed from the seismic data, and there is a clear
difference in slope between desired and undesired events, a filtering procedure in the
$(f, k_x)$ domain can be applied. In this domain, certain slopes can be rejected from the
seismic data, resulting in the removal of all events related to these slopes. Applications
can be found in the removal of ground roll and other low velocity events (especially in land
data) and the removal of multiples (especially in marine data).

Figure 4.28: Field example of multiple elimination on a marine CMP gather, using NMO
correction. a) Input CMP gather. b) NMO corrected CMP gather with NMO velocities in
between primary and multiple velocities. c) Result of b) after removing positive dipping
events. d) Filtered CMP gather after inverse NMO correction. (Axes: offset in m versus
time in s.)

Figure 4.29: The effect of a dipping reflector on the ray pattern.

4.8 Dip Move-Out (DMO) / Pre-stack Partial Migration


The (t; x){curve for a dipping re ector

When we applied the NMO correction to the CMP gather, we assumed we were dealing with horizontal layers, giving rise to quasi-hyperbolic events. When we have dipping reflectors, the NMO correction still corrects for the hyperbolic move-out, but the velocities we use are no longer the true velocities, since they include the dip of the reflector. In order to obtain the true velocity, an extra term needs to be added, and the extra correction for the dip is called Dip Move-Out, abbreviated to DMO.
Let us consider figure (4.29). When we take the line perpendicular to the reflector at the subsurface reflection point for a finite offset, and intersect this line with the surface ($z = 0$), the intersection point does not lie at the midpoint between source and receiver. This would not be so troublesome if the subsurface reflection point were the same for the neighboring source-receiver pairs in the CMP gather. But, as can be seen in the figure, the reflection points are smeared out over the reflector.
We will now derive the extra term due to the reflection-point smear. To this purpose, consider figure (4.30). We have a source S and a receiver R; the distance between these two is called $2x_h$, where the subscript $h$ stands for half-offset. The depth of the reflector, measured perpendicular to the interface at the receiver location, is called $d_R$; the depth of the reflector at the point H halfway between S and R is called $d_H$; the angle of the reflector with the horizontal is called $\alpha$. When we take the image of the receiver, we can apply the "cosine-rule" to determine the length $r$ of the ray path from source to receiver, i.e.:

$$ r^2 = (2x_h)^2 + (2d_R)^2 - 2(2x_h)(2d_R)\cos\left(\frac{\pi}{2} + \alpha\right) = 4x_h^2 + 4d_R^2 + 8x_h d_R \sin(\alpha), \tag{4.62} $$
so we see an extra term arising in the distance, and thus also in the traveltime. But before writing down the traveltime, we should consider that we want to get the same common reflection point for a CMP gather, so we do not want $d_R$ in the equation but $d_H$. To this effect, consider the extra lines drawn in figure (4.30) to determine the relation between $d_H$ and $d_R$. Hence,

$$ d_H = d_{H,1} + d_{H,2} = d_R + x_h \sin(\alpha). \tag{4.63} $$

Figure 4.30: A model to derive the DMO term.

Now substituting $d_H$ for $d_R$, we obtain:

$$ r^2 = 4x_h^2 + 4(d_H - x_h\sin\alpha)^2 + 8x_h\sin(\alpha)\,(d_H - x_h\sin\alpha) \tag{4.64} $$
$$ \phantom{r^2} = 4d_H^2 + 4x_h^2\cos^2\alpha. \tag{4.65} $$

This is the distance travelled by the ray, so the traveltime becomes:

$$ t = \frac{r}{c} = \left( t_H^2 + \frac{4x_h^2\cos^2(\alpha)}{c^2} \right)^{1/2}, \tag{4.66} $$

in which $t_H$ is given by:

$$ t_H = \frac{2d_H}{c}. \tag{4.67} $$
We see that we have a dip-dependent velocity $c_{dip}$ which is related to the true velocity $c$ by $c_{dip} = c/\cos(\alpha)$.
Here we see that the NMO correction with a velocity of $c_{dip} = c/\cos(\alpha)$ will do a good job, only the velocity used for the correction is not the true velocity: it includes the dip. However, when we have two dips which arrive at the same time, there is a problem: which velocity do we take? This is the problem of conflicting dips and is illustrated in figure (4.31). We have two reflectors, one horizontal and one dipping. The CMP gather looks as given in figure (b). In figures (c) and (d) we have corrected with the right NMO velocity for the horizontal and the dipping reflector, respectively. As can be seen, if we have the right velocity for the one, it is wrong for the other. After correcting for DMO, we take the effect of dip into account and correct for the dip in the right manner, as shown in figure (e).

Figure 4.31: A model of two reflectors, showing the problem of conflicting dips.
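
As a numerical check of equations (4.62) to (4.67), the sketch below builds the ray path with the mirror image of the receiver and compares the resulting traveltime with equation (4.66). The function names, the coordinate choice (midpoint H at $x = 0$, $z$ positive downwards, receiver on the shallow side of the reflector) and the numerical values are assumptions made for this illustration only.

```python
import numpy as np

def path_length_by_image(x_h, d_H, alpha):
    """Ray length source -> reflector -> receiver, using the mirror image of the receiver."""
    normal = np.array([-np.sin(alpha), np.cos(alpha)])  # unit normal of the reflector, pointing down
    source = np.array([x_h, 0.0])
    receiver = np.array([-x_h, 0.0])                     # receiver on the shallow side, as in (4.63)
    d_R = d_H - x_h * np.sin(alpha)                      # perpendicular depth below the receiver
    image = receiver + 2.0 * d_R * normal                # receiver mirrored in the reflector
    return np.linalg.norm(image - source)

def traveltime_dipping(x_h, t_H, c, alpha):
    """Equation (4.66): traveltime for half-offset x_h over a reflector with dip alpha."""
    return np.sqrt(t_H**2 + 4.0 * x_h**2 * np.cos(alpha)**2 / c**2)

c, alpha, d_H, x_h = 2000.0, np.radians(20.0), 1000.0, 500.0   # illustrative values
t_geometry = path_length_by_image(x_h, d_H, alpha) / c
t_formula = traveltime_dipping(x_h, 2.0 * d_H / c, c, alpha)
c_dip = c / np.cos(alpha)                                      # apparent NMO velocity
```

Both traveltimes agree, and the same value follows from an ordinary NMO hyperbola with the velocity `c_dip`, which is why a velocity analysis on a dipping event picks a velocity that is too high.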

The DMO correction

In this subsection we will derive the term needed for correcting for the extra time effect. To that purpose, we apply the correction for velocity and dip in two steps, by writing $\cos^2(\alpha) = 1 - \sin^2(\alpha)$ and splitting the above equation (4.66) as:

$$ t = \left( t_{DMO}^2 + \frac{4x_h^2}{c^2} \right)^{1/2}, \tag{4.68} $$

in which $t_{DMO}$ is defined as:

$$ t_{DMO} = \left( t_H^2 - \frac{4x_h^2\sin^2(\alpha)}{c^2} \right)^{1/2}. \tag{4.69} $$

The first equation can be seen as the NMO correction. The other equation has been termed dip move-out or DMO. It can be seen that the time $t_{DMO}$ is equal to the time $t_H$ when the offset between source and receiver is zero; when there is some offset, the time $t_{DMO}$ will be smaller than $t_H$ (and remember that the NMO correction has already taken place).
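
The two-step split of equations (4.68) and (4.69) can be made concrete with a small sketch. The function names and the numbers are hypothetical; the code only rearranges the two equations and assumes that the true velocity $c$ and the dip $\alpha$ are known, which is not the case in real processing.

```python
import numpy as np

def nmo_step(t, x_h, c):
    """Equation (4.68) solved for t_DMO: NMO correction with the true velocity c."""
    return np.sqrt(t**2 - 4.0 * x_h**2 / c**2)

def dmo_step(t_dmo, x_h, c, alpha):
    """Equation (4.69) solved for t_H: the remaining dip-dependent correction."""
    return np.sqrt(t_dmo**2 + 4.0 * x_h**2 * np.sin(alpha)**2 / c**2)

c, alpha, x_h, t_H = 2000.0, np.radians(30.0), 400.0, 1.0    # illustrative values
t = np.sqrt(t_H**2 + 4.0 * x_h**2 * np.cos(alpha)**2 / c**2)  # recorded time, eq. (4.66)
t_dmo = nmo_step(t, x_h, c)                                    # smaller than t_H for x_h > 0
t_back = dmo_step(t_dmo, x_h, c, alpha)                        # recovers t_H
```

For zero offset both steps leave the time unchanged, in line with the remark that $t_{DMO}$ equals $t_H$ when the offset is zero.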
Let us now return to the figure given in the beginning, figure (4.29). In that figure, we saw the reflection-point smear along the subsurface reflector. We need to put each source-receiver pair in the right CMP gather. This means that for each offset the data has to be shifted to another CMP position. For a single dipping reflector with a constant velocity layer above it, we can derive what the reflection-point smear is. This is derived in appendix G. The derivation involves some quite elaborate algebra, so the full derivation is left out here. From the reflection-point smear together with some extra relations, we can derive the equation which describes the time effect due to dip, as is also shown in appendix G. The result is:

$$ t_I = t_{DMO} \left( 1 - \frac{(x_I - x_h)^2}{x_h^2} \right)^{1/2}, \tag{4.70} $$

where $t_I$ is the true time from the reflection point upwards, perpendicular to the reflector.
This equation is the equation of an ellipse. An example of such an ellipse is given in figure
(4.32).
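
A short sketch of equation (4.70) shows the elliptical operator explicitly. The function name `dmo_ellipse` and the sample values are assumptions for illustration; note that no velocity enters the function.

```python
import numpy as np

def dmo_ellipse(x_I, x_h, t_dmo):
    """Equation (4.70): time t_I along the DMO ellipse for output position x_I."""
    arg = 1.0 - (x_I - x_h) ** 2 / x_h ** 2
    return t_dmo * np.sqrt(np.clip(arg, 0.0, None))   # outside the ellipse the operator is zero

x_h, t_dmo = 500.0, 1.2                      # half-offset (m) and NMO-corrected time (s), illustrative
x_I = np.array([0.0, 250.0, 500.0, 750.0, 1000.0])
t_I = dmo_ellipse(x_I, x_h, t_dmo)           # largest at x_I = x_h, zero at both ends
```

The operator spans one full offset $2x_h$ laterally and its top lies at $t_{DMO}$, so the ellipse widens with offset and collapses to a point for zero offset.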
What is very striking and very nice about the expression above is that the operation does not depend on the velocity of the subsurface! For this configuration, DMO is a velocity-independent process, and can thus be robustly included in any processing scheme. However, when the model consists of more than one layer, the DMO correction in general still depends on the velocities of the layers, but it is not very sensitive to them. If we were to add the traces with only an NMO correction, the resulting image would be of a lower quality. So, we must somehow correct for this, and that is what DMO does. The function of DMO is to migrate to a true zero-offset section.
The DMO operator has to be applied to common-offset sections. Therefore the usual
procedure in applying NMO and DMO is:

• Apply NMO
• Sort the data to common-offset gathers
• Apply DMO
• Sort the data back to CMP gathers
• Inverse NMO with the velocities from the first NMO
• Apply NMO with the true velocities

Figure 4.32: The DMO operator: an ellipse in the common-offset domain. (Note that
$t_{NMO}$ in the figure = $t_{DMO}$ in the text; $x_m$ in the figure = $x_I$ in the text.)
An example is given in figure (4.33). These days, this is a standard procedure in data processing. Following DMO, it is often possible to re-pick velocities on the DMO corrected data and re-apply NMO, thus giving both a better quality stack and better velocities for input to post-stack migration, interpretation and other velocity-dependent techniques.
So far we discussed DMO as a process applied via operators in the space-time domain. Also, we only discussed its kinematic effects (ray theory), without yet including any wave-theoretical aspects. DMO can be given a wave-theoretical basis by way of the Kirchhoff integral. There are still problems with amplitude and phase distortions in this approach, but despite these problems, integral DMO methods are extremely popular today due to their speed and their adaptability to irregular surface sampling, as is common in 3-D seismics. As with many migration algorithms, DMO can be applied via different routes, such as via the $(f, k_x)$ domain. The latter has been done by [Hale, 1984].
When should we use which method? We follow the recommendations given by [Deregowski, 1986]. If you have regularly spaced 2D data and amplitudes are an important feature, then it is best to use an $(f, k_x)$-domain method such as the log-stretch technique. This is especially true if the data contains a range of different dips. If amplitudes are not of prime concern, then $(t, x)$-domain methods are adequate, provided some care has been taken to pass the steeper dips and that the operator is anti-aliased.
If a complete proper image of the subsurface is the goal, pre-stack migration is needed. However, this technique suffers from a large computational cost and the requirement of a detailed velocity model, but for sufficiently complicated structures it is the only method to correctly image the data. DMO is a method which bridges these extremes. DMO increases the costs via computational effort, but is still fast enough to be part of a standard processing sequence.
Finally, we would like to give a field example as shown in figures (4.34) and (4.35), taken from [van der Schoot, 1989].

Figure 4.33: Figure from [Yilmaz, 1987], figure 4-125.

Figure 4.34: The effect of conflicting dips (Schoot, 1989).

Figure 4.35: The better-resolved dipping fault using DMO (Schoot, 1989).

Summary of DMO

In this section, the correction due to the dip of a reflector is given. The most important
feature is that for a single layer the correction does not depend on the velocity model.
When more layers are present, DMO slightly depends on the velocity model. Mostly, DMO
is applied in a separate processing step, using common-offset gathers.

4.9 Zero offset (poststack) migration algorithms


In our basic processing sequence in chapter 3, we discussed Kirchhoff migration because it corresponded with our intuitive notion of collapsing diffraction hyperbolae to their apexes, as was common in the early days with the diffraction stack. In Kirchhoff migration we not only get the timing right, but we also take account of amplitude variations along the diffraction hyperbolae. In practice, Kirchhoff migration is hardly used in post-stack migration; instead finite-difference methods and $f$-$k_x$ and $f$-$x$ algorithms are employed. These techniques are the topic of this section. Again, we will derive these techniques but will not go into all their fine mathematical details. We hope to give the most important properties of these techniques.
With $f$-$k_x$, $f$-$x$ and finite-difference techniques we will focus on both time and depth migration. The basic difference between time migration and depth migration is that in time migration the migration result is expressed in "vertical time", i.e. the time domain variant of depth. After application of time migration, a time-to-depth conversion is required in order to obtain the final depth image (see section 4.10). Choosing time migration rather than depth migration implies that the result depends less on the velocity variations expected in the subsurface; depth migration is used in regions where large lateral velocity variations exist. We will point out some differences between time and depth migration. For each method we will discuss what kind of parameters are needed, and what the influence of these parameters is. Finally, we give a summary of when to use which technique.
In general, migration consists of two steps:
1. Inverse extrapolation of the input data (i.e. the stacked section representing a zero offset section) from surface level to a certain depth (or vertical time) level.
2. Selection of the $t = 0$ component, which is the migrated result for that depth (or vertical time) level.
Step 1 involves the wave equation. Therefore, we will derive the Kirchhoff integral, and show how this is used for migration.

The Kirchhoff integral

These days the theory has developed such that we can describe the migration process much better via wave theory. We will discuss some concepts in wave theory and discuss migration in these terms. The migration methods based on wave theory are referred to as wave-equation migration. Mathematically, there are two ways to solve the wave equation: one is via integral methods, and the other via differential methods. We choose here to find a solution via integral methods, for two reasons. First, it adheres to the intuitive idea of the diffraction stack (which in fact is also an integration along hyperbolic paths). Secondly, the integral method gives the general exact solution, while the differential methods involve some approximations. The classical paper explaining the integral method as applied to seismic migration, i.e. Kirchhoff migration, is the one by [Schneider, 1978]. In appendix H the derivation of the Kirchhoff integral can be found. The Kirchhoff integral looks as follows (with $\rho$ the mass density of the medium):

$$ \chi(\mathbf{x})\, p(\mathbf{x}, t) = \frac{1}{4\pi} \int_{-\infty}^{\infty} \oint_{\partial D} \left( -\rho\, G\, \partial_t \mathbf{v} - p\, \nabla G \right) \cdot \mathbf{n}\; dA_s\, dt_s, \tag{4.71} $$
where $p(\mathbf{x}, t)$ and $\mathbf{v}(\mathbf{x}, t)$ describe the pressure field and the particle velocity vector field of a wave field. $G(\mathbf{x}, t)$ describes a so-called Green's function, which is the solution for a point source in the same medium where the actual wave field is present.

Figure 4.36: A pressure field can be synthesized from the wave fields of a monopole and dipole distribution on a closed surface, using respectively the particle velocity and pressure of the actual wave field at this boundary as their source strengths (after [Berkhout, 1984], figure 5.1).

This equation expresses that if we know the pressure $p$ and the time derivative of the normal component of the particle velocity on a closed surface, the pressure can be computed in every point inside $D$. Also, we recognize that the pressure at a certain position is synthesized by means of a monopole (i.e. $G$) and dipole (i.e. $\nabla G \cdot \mathbf{n}$) distribution on a closed surface $\partial D$. The propagation of the secondary sources at the boundary $\partial D$ to the observation point $(\mathbf{x}, t)$ is described by the Green's function $G$. The same kind of expression can be derived for the particle velocity, see [Berkhout, 1984], chapter 5, from which figure 4.36 has been drawn.

Using the Kirchhoff integral for migration

The configuration of figure 4.36 is not suited to do zero offset migration directly. The seismic measurements are done on the surface of the earth, and only one type of wave field (either pressure or vertical velocity component) is measured. Therefore, the Kirchhoff integral can be rewritten for the special case of a flat surface of the earth into a more convenient shape.
The derivation of this migration integral can also be found in appendix H. Besides choosing a special shape for the boundary $\partial D$, being a flat surface combined with a hemisphere that extends to infinity, also a special type of Green's function is used. As we only have to require that the Green's function is a solution to the wave equation, any combination of two Green's functions into a new one is also valid. In the derivation of the migration formula, the Green's functions are chosen to be two monopoles that are placed just above and below the surface, with opposite signs.
With these two choices, one term in the Kirchhoff integral will disappear and finally the following equation remains:

$$ p(\mathbf{x}, t) = -\frac{1}{2\pi}\, \partial_z \int_{z^s = 0} \frac{p(\mathbf{x}^s, t + r/c)}{r}\; dA_s. \tag{4.72} $$
This is Kirchhoff's migration formula, given by [Schneider, 1978]. In this equation $A_s$ represents the surface, $r$ the distance between the subsurface location point $\mathbf{x}$ and a point $\mathbf{x}^s$ at the surface, and $c$ the propagation velocity. We would like to stress that this result does not involve any approximations; the result only depends on knowledge of the velocity distribution (i.e. vertical derivative of the pressure field) at the surface. In fact it states that the wave field at any point in the subsurface, $p(\mathbf{x}, t)$, can be calculated from the wave field $p(\mathbf{x}^s, t)$ recorded at a plane reference level $z^s$, assuming that we have a recording from $-\infty$ until $+\infty$ at the surface.
Note that if the term $t + r/c$ is replaced by $t - r/c$, the inverse propagation from the surface to the point $(\mathbf{x}, t)$ becomes a forward extrapolation.

A general zero offset migration procedure

With Kirchhoff's migration formula the wave field at a certain depth level can be constructed from the wave field measured at the surface. This can be used in a zero offset migration procedure as follows:

• Consider a stacked section (possibly after DMO) as a zero offset section. This will be correct in traveltime but not in amplitudes.
• Consider a zero offset section to be an exploding reflector measurement, in which all reflectors in the subsurface are considered to be sources that explode at $t = 0$ and whose wave fields travel to the surface with half the medium velocity (see also chapter 3 of these lecture notes). This again is good for explaining the traveltimes in a zero offset section, but not perfect for amplitudes.
• Inverse extrapolate the wave field measured at the surface (i.e. the zero offset section) to a depth level in the earth (by means of the migration formula, equation (4.72)).
• Select the $t = 0$ component of this extrapolated wave field, which will contain the exploding reflector contributions of the depth level under consideration. Save this $t = 0$ component in the migrated output section.
• Repeat the last two steps for all depth levels, such that the complete migration result is constructed for all depth levels.

The procedure is illustrated in figure 4.37 for a wedge-shaped reflector model. The zero offset response (figure 4.37a) contains the dipping contributions from the flanks and a diffraction from the tip of the wedge. For three different depth levels a combined picture of the migrated image (above the $t = 0$ line) and the inverse extrapolated data (below the $t = 0$ line) is shown. Note that at the indicated $t = 0$ line the imaging of the current depth level takes place. Note also that the diffraction from the tip of the wedge collapses towards a point at 1200 m depth and that the dipping sides of the wedge are correctly positioned after imaging.

Overview of the discussed migration procedures

In the following, a number of migration techniques are discussed. Depending on the required accuracy and on the constraints given by the subsurface model, different techniques will be selected in practice.
In chapter 3 the so-called Kirchhoff migration procedure was already discussed: by modeling the operators that describe propagation from each point in the subsurface to the surface level, the wave field (zero offset section) can be extrapolated to each subsurface point. By selecting the $t = 0$ component of each extrapolated wave field the subsurface image is built up. The quality of the modeled operators will define the quality of the image. In its simplest form, the local stacking velocities (i.e. hyperbolic approximations) are used to describe these propagation operators and the result is often called the diffraction stack. Such an approximation is only valid if the medium velocities vary very smoothly in both lateral and vertical direction. For more complex models, raytracing methods are often used to calculate the traveltimes from the surface to each image point.
Figure 4.37: Zero offset migration in action. Above the indicated t=0 line the migrated
image is visible, below the t=0 line the inverse extrapolated result at the current depth
level. At t=0 the actual imaging takes place for that depth. a) Zero offset data of wedge
shaped reflector (z=0). b) Wave field after inverse propagation to 400 m depth. c) Result at 800
m depth. d) Result at 1200 m depth. (Axes in all panels: distance (m), -1500 to 1500;
vertical time (s), 0 to 2.0.)
The Kirchhoff migration procedure is a non-recursive method: for the result at each image point, the extrapolation is done via one operator from the original wave field (i.e. zero offset section). Another approach is recursive migration: the wave field is extrapolated in small depth steps, the output for one depth level being the input for the next extrapolation step. In the following, some well-known recursive migration procedures are discussed in more detail.
The first method discussed is the Gazdag phase shift migration, which is a relatively efficient procedure for a velocity model where the velocity only varies with depth. This means that each extrapolation step (from one depth level to the next) can be achieved by a simple multiplication in the wavenumber-frequency domain. So extrapolation will consist of three steps: (i) forward Fourier transform of the wave field from $(x, y, t)$ to $(k_x, k_y, \omega)$, (ii) multiplication with the so-called phase shift operator, (iii) inverse Fourier transform to $(x, y, t)$ and selection of the $t = 0$ component.
If the velocity medium is completely homogeneous, an even more efficient method can be used: the Stolt migration. By a forward Fourier transform to the frequency-wavenumber domain, an interpolation procedure (axis transformation) and an inverse Fourier transform, the complete image for all vertical times is immediately obtained (no recursion procedure needed). This means that a direct conversion from the zero offset data (as shown in figure 4.37a) to the final image (figure 4.37d) is achieved.
In practice, the subsurface can hardly ever be described by a homogeneous velocity model, and other techniques need to be used that can handle velocity variations to a greater or lesser extent. One of these procedures is the migration using a finite-difference operator, where the extrapolation and imaging is done in small steps, each step considering the local velocity. Depending on whether this migration procedure is carried out in the vertical time domain or in the true depth domain, less or more accurate results can be achieved.
For laterally more complex media, the so-called recursive space-frequency depth migration is a good solution, which also takes small steps in the lateral and depth direction, each time using an exact wave field extrapolation operator belonging to the local velocity.
As can be expected, everything is a matter of efficiency: the cheapest method is the Stolt migration, but it can handle the least model complexity. The more complex the subsurface model, the more expensive the migration method will be that must be selected for an acceptable image.

Time migration via the $f$-$k_x$-$k_y$ domain (Gazdag phase shift migration)

We can use the result from Kirchhoff migration as given in equation (4.72). This is written in a compact form. Because of the form of $r$, we can write this formula symbolically as a three-dimensional convolution, i.e.:

$$ p(x, y, \Delta z, t) = p(x, y, 0, t) * \left[ -\frac{1}{2\pi}\, \partial_z \left( \frac{\delta(t + r/c)}{r} \right) \right], \tag{4.73} $$

in which $r$ is given by:

$$ r^2 = x^2 + y^2 + (\Delta z)^2. \tag{4.74} $$

In this expression we can see that we have a three-dimensional convolution over $x$, $y$ and $t$ of the data with a term describing the inverse propagation through the medium from the surface to depth $\Delta z$. It is assumed that we can consider a homogeneous layer of thickness $\Delta z$, otherwise equation (4.74) would be a more complicated function of the coordinates. For this situation a convolution in $t$, $x$ and $y$ means that in the $f$-$k_x$-$k_y$ domain this is a multiplication of the transformed data with the $f$-$k_x$-$k_y$ transform of the last term in equation (4.73). Note that the transformation is only valid for models for which the velocity $c$ is a function of the depth $z$ only.
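Let us determine what the $f$-$k_x$-$k_y$ transform of this propagation term is. If we call this term $W$, then we can write:

$$ \tilde{W}(k_x, k_y, \Delta z, f) = -\frac{\partial}{\partial z}\, \frac{1}{2\pi} \iint \frac{\exp(+2\pi i f r/c)}{r}\, \exp(2\pi i k_x x + 2\pi i k_y y)\, dx\, dy. \tag{4.75} $$

Now transforming this equation to polar coordinates for $x$ and $y$, via $x = r_{xy}\cos(\theta)$ and $y = r_{xy}\sin(\theta)$, and also for $k_x$ and $k_y$ via $k_x = k_r\cos(\phi)$ and $k_y = k_r\sin(\phi)$, we arrive at:

$$ dx\, dy = r_{xy}\, dr_{xy}\, d\theta, \qquad k_x x + k_y y = k_r r_{xy} \cos(\theta - \phi), \qquad r = \sqrt{r_{xy}^2 + \Delta z^2}, \tag{4.76} $$

where the horizontal radius is written $r_{xy}$ to keep it distinct from the distance $r$ of equation (4.74). Substituting this in the above, we get:

$$ \tilde{W}(k_r, \Delta z, f) = -\frac{\partial}{\partial z} \int_0^{\infty} \frac{\exp(+2\pi i f r/c)}{r}\, r_{xy}\, dr_{xy}\; \frac{1}{2\pi} \int_0^{2\pi} \exp(2\pi i k_r r_{xy} \cos(\theta - \phi))\, d\theta $$
$$ \phantom{\tilde{W}(k_r, \Delta z, f)} = -\frac{\partial}{\partial z} \int_0^{\infty} \frac{\exp(+2\pi i f r/c)}{r}\, J_0(2\pi k_r r_{xy})\, r_{xy}\, dr_{xy}, \tag{4.77} $$

in which we have used the Bessel function $J_0$. The last integral is a standard Fourier-Bessel transform, so we get:

$$ \tilde{W}(k_r, \Delta z, f) = -\frac{\partial}{\partial z} \left( \frac{\exp(+2\pi i k_z \Delta z)}{-2\pi i k_z} \right) = \exp(+2\pi i k_z \Delta z), \tag{4.78} $$

in which $k_z$ is defined as:

$$ k_z = \left( \frac{f^2}{c^2} - k_r^2 \right)^{1/2}. \tag{4.79} $$

So this is a relatively simple function in the $f$-$k_x$-$k_y$ domain. Note that for forward extrapolation, the phase shift is given as:

$$ \tilde{W}(k_r, \Delta z, f) = \exp(-2\pi i k_z \Delta z). \tag{4.80} $$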

Figure 4.38 shows a picture of the amplitude of the (complex valued) $W$ function in the $(k_x, k_y)$ domain. This operator is normally referred to as the phase shift operator. For a homogeneous medium a depth step $\Delta z$ can be taken by a simple multiplication in the wavenumber-frequency domain:

$$ \tilde{P}(k_x, k_y, \Delta z, f) = \tilde{W}(k_x, k_y, \Delta z, f)\, \tilde{P}(k_x, k_y, 0, f), \tag{4.81} $$

in which the sign of the exponent in the phase shift operator will define whether it is a forward ($-$ sign) or inverse ($+$ sign) extrapolation. It appears that in a 2-D medium a similar expression can be found:

$$ \tilde{P}(k_x, \Delta z, f) = \tilde{W}(k_x, \Delta z, f)\, \tilde{P}(k_x, 0, f), \tag{4.82} $$

with the phase shift operator defined as:

$$ \tilde{W}(k_x, \Delta z, f) = \exp(\pm 2\pi i k_z \Delta z) \tag{4.83} $$

and the value of $k_z$ defined as:

$$ k_z = \left( \frac{f^2}{c^2} - k_x^2 \right)^{1/2}. \tag{4.84} $$

In equation (4.83) the sign of the exponent again determines forward or inverse propagation.
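
The 2-D phase shift operator of equations (4.83) and (4.84) can be sketched as follows. The function name and the example values are assumptions; in addition, the sketch sets the operator to zero in the evanescent region ($k_x^2 > f^2/c^2$), where $k_z$ becomes imaginary. That choice is a common practical one but is not taken from the text above.

```python
import numpy as np

def phase_shift_operator(kx, f, dz, c, inverse=True):
    """Equation (4.83) with k_z from (4.84); evanescent wavenumbers are zeroed (assumption)."""
    kz_squared = (f / c) ** 2 - kx ** 2
    propagating = kz_squared >= 0.0
    kz = np.sqrt(np.where(propagating, kz_squared, 0.0))
    sign = 1.0 if inverse else -1.0            # + sign: inverse, - sign: forward extrapolation
    return np.where(propagating, np.exp(sign * 2j * np.pi * kz * dz), 0.0)

kx = np.linspace(-0.05, 0.05, 11)              # horizontal wavenumbers (1/m), illustrative
w_inverse = phase_shift_operator(kx, f=30.0, dz=10.0, c=2000.0)
w_forward = phase_shift_operator(kx, f=30.0, dz=10.0, c=2000.0, inverse=False)
```

Inside the propagating region the operator has unit amplitude and only changes the phase, so a forward step followed by an inverse step returns the original wave field there.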
So far, we considered a general extrapolation scheme for any wave field at the surface that we want to inverse propagate into a medium with velocity $c$. For the case of a zero offset section (i.e. our stacked section), we have shown in chapter 3 that we can consider this as a response of exploding reflectors in the subsurface, considering a medium with half velocity $c/2$. Therefore, from now on we have to use $c/2$ in our expressions. Then equation (4.79) becomes:

$$ k_z = \left( \frac{4f^2}{c^2} - k_x^2 - k_y^2 \right)^{1/2}. \tag{4.85} $$

Figure 4.38: Amplitude of the homogeneous phase shift operator in the wavenumber domain.
As mentioned in the beginning of this section, we are usually interested in results which are mapped into vertical time, and in this expression we have a step $\Delta z$ in depth instead of in time. To convert to time steps, we rewrite the inverse propagation operator as:

$$ \tilde{W}(k_x, k_y, \Delta z, f) = \exp(+2\pi i\, \Omega\, \Delta\tau), \tag{4.86} $$

in which $\Omega$ and $\Delta\tau$ are given by:

$$ \Omega = \left( f^2 - \frac{k_x^2 c^2}{4} - \frac{k_y^2 c^2}{4} \right)^{1/2} \tag{4.87} $$

and

$$ \Delta\tau = \frac{2}{c}\, \Delta z. \tag{4.88} $$

In this expression, we consider a zero offset section along the $x$ and $y$ coordinate, so the horizontal wavenumbers are $k_x$ and $k_y$. Note that $\Omega$ is related to $k_z$ via $k_z = 2\Omega/c$. $\tilde{W}$ represents a simple phase shift which is a wave field extrapolation over the time difference $\Delta\tau$. In a sense, we do not get the data in $z$ but in $\tau$, so it would be better to introduce a $\tilde{W}'$, which is related to $\tilde{W}$ via:

$$ \tilde{W}'(k_x, k_y, \Delta\tau, f) = \tilde{W}(k_x, k_y, \Delta z, f). \tag{4.89} $$

For migration, we forward $f$-$k_x$-$k_y$ transform the data, apply the phase shift as above, inverse $f$-$k_x$-$k_y$ transform the result and gather the data at $t = 0$, to obtain the migrated data. This method is called the phase-shift method, described by [Gazdag, 1978].
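
The phase-shift recipe above can be sketched for a 2-D zero offset section and a constant velocity. This is a minimal illustration under several assumptions that are not in the text: the function name, a vertical-time step $\Delta\tau$ equal to the time sampling, zeroing of the evanescent part, and the convention that $\Omega$ takes the sign of $f$ so that negative frequencies are shifted consistently. Because the operator is even in $k_x$, the sign convention of the spatial transform does not matter here and NumPy's `fft2` is used directly.

```python
import numpy as np

def phase_shift_migration(data, dt, dx, c):
    """2-D zero offset time migration by recursive phase shifts, eqs. (4.86)-(4.88)."""
    nt, nx = data.shape
    spec = np.fft.fft2(data)                           # (t, x) -> (f, k_x)
    f = np.fft.fftfreq(nt, dt)[:, None]
    kx = np.fft.fftfreq(nx, dx)[None, :]
    arg = f ** 2 - (kx * c / 2.0) ** 2                 # Omega squared, eq. (4.87) in 2-D
    propagating = arg >= 0.0
    omega = np.sign(f) * np.sqrt(np.where(propagating, arg, 0.0))
    step = np.where(propagating, np.exp(2j * np.pi * omega * dt), 0.0)   # one step of dtau = dt
    image = np.zeros((nt, nx))
    for j in range(nt):
        # t = 0 component: average over f, then inverse transform over k_x.
        image[j] = np.fft.ifft(spec.sum(axis=0) / nt).real
        spec = spec * step                             # inverse extrapolate to the next tau level
    return image

# Flat reflector: a spike at the same time on every trace (hypothetical test data).
data = np.zeros((64, 16))
data[20, :] = 1.0
image = phase_shift_migration(data, dt=0.004, dx=25.0, c=2000.0)
```

For a flat reflector only $k_x = 0$ is present, so the migrated image equals the input; dipping events and diffractions are moved because $\Omega$ then differs from $f$. For a velocity that varies with depth, `step` would be recomputed inside the loop with the velocity of the current level.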

Time migration by Fourier transform (Stolt migration)

Let us consider the special case that the wave speed is constant. In this situation, the phase shift migration procedure can be very efficiently rewritten as a one-step migration procedure, which is commonly known as Stolt migration. The wave field extrapolation process in this homogeneous velocity case can be written as:

$$ p(x, y, \tau, t) = \iiint \tilde{P}(k_x, k_y, \tau = 0, f)\, \exp(2\pi i\, \Omega \tau)\, \exp(-2\pi i (k_x x + k_y y) + 2\pi i f t)\, dk_x\, dk_y\, df. \tag{4.90} $$

For migration we need the $t = 0$ component of the extrapolated wave field, which results in:

$$ p(x, y, \tau, t = 0) = \iiint \tilde{P}(k_x, k_y, \tau = 0, f)\, \exp(-2\pi i (k_x x + k_y y) + 2\pi i\, \Omega \tau)\, dk_x\, dk_y\, df. \tag{4.91} $$

If we can change the integration over $f$ into an integration over $\Omega$, it describes a simple inverse Fourier transform. Therefore we are going to rewrite $\Omega$ as given in equation (4.87) as:

$$ f = \left( \Omega^2 + \frac{k_x^2 c^2}{4} + \frac{k_y^2 c^2}{4} \right)^{1/2}. \tag{4.92} $$

Changing the integration over $f$ into one over $\Omega$ requires the following relation:

$$ df = \frac{\Omega}{(\Omega^2 + k_x^2 c^2/4 + k_y^2 c^2/4)^{1/2}}\, d\Omega, \tag{4.93} $$

and using this in the migration formula (4.91), we get the result:

$$ p(x, y, \tau, t = 0) = \iiint \left[ \frac{\Omega}{(\Omega^2 + k_x^2 c^2/4 + k_y^2 c^2/4)^{1/2}} \right] \tilde{P}\left( k_x, k_y, \tau = 0, (\Omega^2 + k_x^2 c^2/4 + k_y^2 c^2/4)^{1/2} \right) \tag{4.94} $$
$$ \qquad \times \exp(-2\pi i (k_x x + k_y y) + 2\pi i\, \Omega \tau)\, dk_x\, dk_y\, d\Omega. \tag{4.95} $$

The nice feature of this formulation is that it describes an inverse Fourier transform over the coordinates $k_x$, $k_y$, $\Omega$, yielding the complete migration for all $\tau$ values. The result as given above is the constant-velocity Stolt (time) migration. To summarize this procedure, the following steps have to be taken:

• Forward Fourier transform of the zero offset data from $(x, y, t)$ to $(k_x, k_y, f)$
• Interpolate from $\tilde{P}(k_x, k_y, f)$ to $\tilde{P}(k_x, k_y, \Omega)$
• Scale this result with the Jacobian factor of equation (4.93)
• Inverse Fourier transform from $(k_x, k_y, \Omega)$ to $(x, y, \tau)$

The second step describes a mapping of the energy in the $(k_x, k_y, f)$ domain to another position. Therefore this method is often called $k_z$ mapping.
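
The four steps listed above can be sketched for the 2-D case as follows. The function name, the linear interpolation of the complex spectrum (crude, but short) and the choice to sample $\Omega$ on the same grid as $f$ are assumptions of this sketch, not part of Stolt's method as described here; negative frequencies are mapped with the sign of $\Omega$.

```python
import numpy as np

def stolt_migration(data, dt, dx, c):
    """2-D constant-velocity Stolt migration: FFT, map f -> Omega, Jacobian, inverse FFT."""
    nt, nx = data.shape
    spec = np.fft.fftshift(np.fft.fft2(data), axes=0)    # f axis in ascending order
    f = np.fft.fftshift(np.fft.fftfreq(nt, dt))
    kx = np.fft.fftfreq(nx, dx)
    omega = f.copy()                                     # output axis sampled like f (assumption)
    branch = np.where(omega >= 0.0, 1.0, -1.0)           # keep the sign of Omega
    mapped = np.zeros_like(spec)
    for ix in range(nx):
        root = np.hypot(omega, kx[ix] * c / 2.0)         # equation (4.92) in 2-D
        f_in = branch * root                             # input frequency for each Omega
        column = spec[:, ix]
        value = (np.interp(f_in, f, column.real, left=0.0, right=0.0)
                 + 1j * np.interp(f_in, f, column.imag, left=0.0, right=0.0))
        jacobian = np.divide(np.abs(omega), root, out=np.ones_like(root), where=root > 0.0)
        mapped[:, ix] = jacobian * value                 # equation (4.93)
    return np.fft.ifft2(np.fft.ifftshift(mapped, axes=0)).real

# Flat reflector: a spike at the same time on every trace (hypothetical test data).
data = np.zeros((64, 16))
data[20, :] = 1.0
image = stolt_migration(data, dt=0.004, dx=25.0, c=2000.0)
```

For the flat reflector only $k_x = 0$ carries energy, the mapping reduces to $f = \Omega$ and the image equals the input. For dipping energy the quality of the result depends strongly on the interpolation, which is why production implementations use better interpolators than the linear one assumed here.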
Let us now discuss some practical aspects of the $f$-$k$ migrations, first the Stolt migration. Stolt migration is not much applied in practice, simply because it assumes that the velocity is constant in the earth. Stolt made an extension of his original scheme, and this one should be considered as a separate algorithm. In this extended algorithm he introduced a so-called stretch factor, in which the term "stretch" is used because it stretches the time axis. This stretch factor is, theoretically, a complicated function of velocity and stretch-coordinate variables; in practice it is often set to a scalar. The stretch factor usually varies between 0 and 2, where for a constant velocity medium the stretch factor is exactly 1. Too small a stretch factor gives undermigrated data, while too large a stretch factor overmigrates the data. For a further discussion the reader is referred to [Yilmaz, 1987], page 298 and page 514.
The other migration algorithm we derived is the Gazdag phase-shift method, which is valid for velocity functions that vary with depth only, so it is more general than the Stolt migration. A parameter in the phase-shift method is the 'depth' step size $\Delta\tau$. This step size must be set smaller when the dips become larger. In practice, the 'depth' step size is typically taken between half and the full dominant period of the wave field (depending on the steepness of the dips in the section). Again, for a good discussion on results with this method, the reader is referred to [Yilmaz, 1987], page 301.

Recursive depth migration in the space-frequency domain

A restriction of the above-mentioned migration methods is that they operate in the wavenumber domain, under the assumption that the velocity field is laterally invariant. This can be a serious hurdle in practice, as the earth in general does not behave like that. In such a situation, the so-called recursive depth migration in the $x$-$\omega$ domain can be a solution. For this we go back to the phase shift operator for homogeneous media, as given by equation (4.80). With this operator, we can describe the propagation of a wave field from depth level $z$ over a distance $\Delta z$ (in the 2D domain):

$$ \tilde{P}(k_x, z + \Delta z, f) = \tilde{W}(k_x, \Delta z, f)\, \tilde{P}(k_x, z, f), \tag{4.96} $$

which describes a forward propagation. The inverse propagation, as used in migration, is given by:

$$ \tilde{P}(k_x, z + \Delta z, f) = \tilde{W}^{*}(k_x, \Delta z, f)\, \tilde{P}(k_x, z, f). \tag{4.97} $$

This equation can also be rewritten in the space domain as a convolution:

$$ P(x, z + \Delta z, f) = W^{*}(x, \Delta z, f) * P(x, z, f), \tag{4.98} $$

with $W^{*}(x, \Delta z, f)$ a convolution operator, which is the inverse Fourier transform of the (complex conjugate of the) phase shift operator.
Normally, we are in a medium with varying velocity as a function of $x$ and $z$. However, if we can assume a locally homogeneous medium, i.e. that the medium is homogeneous within an area with the horizontal length of the spatial convolution operator $W^{*}(x, \Delta z, f)$ and depth $\Delta z$, then we can use for each part of the medium a different convolution operator $W^{*}(x, \Delta z, f)$, based on the local velocity, which is in fact a slice from the 3D operator as shown in figure 4.38. Figure 4.39 shows the phase shift operator in the wavenumber domain. Often, the objective for this type of migration is to create operators in the space domain that are as short as possible. The dashed line in figure 4.39 is such an optimized operator, which is identical to the true operator up to a certain wavenumber (i.e. propagation angle), and is short in the space domain (see the lower part of figure 4.39). We see that for the given example the medium should be homogeneous within an area of approximately 200 m.
Figure 4.39: Amplitude and phase of the 2-dimensional homogeneous phase shift operator
in the wavenumber domain. The solid line is the true operator, the dashed line the
optimized short version. Below, the inverse Fourier transform of the optimized operator
to the space domain (the amplitude is shown).

The final migration algorithm consists of the following steps:
• Create short operators for different velocities and frequencies and store them in a
  table.
• Start with the zero-offset data at the surface in the frequency domain, $P(x, z=0, f)$.
• Apply an inverse extrapolation to the next depth level for all frequencies, using
  space-dependent operators:
  $P(x, z+\Delta z, f) = W^{*}(x, \Delta z, f) \ast P(x, z, f)$.
• Select the $t = 0$ component of the inverse extrapolated result, which is the migrated
  result at that depth level:
  $P_{mig}(x, z) = \int P(x, z, f)\,df$.
• Repeat the recursive extrapolation and imaging for each depth level.
Note that this process is easily extended to the 3D case, using short convolutional operators in the $(x, y, f)$ domain.
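The steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the implementation used in practice: the operators are full-length (not optimized short ones), the grid is periodic, the evanescent part is set to zero, only positive frequencies are summed in the imaging step, and the operator is selected by the velocity at the output point. The function names are hypothetical.

```python
import cmath
import math

def space_operator(nx, dx, dz, f, c):
    """Convolution operator W*(x, dz, f): inverse DFT of the conjugate phase shift."""
    w_k = []
    for k in range(nx):
        kx = (k if k <= nx // 2 else k - nx) / (nx * dx)   # cycles per metre
        kz2 = (f / c) ** 2 - kx ** 2
        # Assumption: the evanescent part (kz2 < 0) is simply set to zero.
        w_k.append(cmath.exp(2j * math.pi * math.sqrt(kz2) * dz) if kz2 >= 0.0 else 0j)
    return [sum(w_k[k] * cmath.exp(2j * math.pi * k * m / nx) for k in range(nx)) / nx
            for m in range(nx)]

def recursive_migration(data, freqs, c_half, dx, dz):
    """data[j][i] = P(x_i, z=0, f_j); c_half[iz][i] = local half velocity (m/s)."""
    nx = len(data[0])
    table = {}                                   # operators stored per (velocity, frequency)
    field = [list(row) for row in data]
    image = []
    for velocities in c_half:                    # one pass per depth step
        for j, f in enumerate(freqs):
            stepped = []
            for i in range(nx):
                key = (velocities[i], f)
                if key not in table:
                    table[key] = space_operator(nx, dx, dz, f, velocities[i])
                w = table[key]
                stepped.append(sum(w[m] * field[j][(i - m) % nx] for m in range(nx)))
            field[j] = stepped
        # Imaging: summing over frequency selects the t = 0 sample (up to scaling).
        image.append([sum(field[j][i] for j in range(len(freqs))).real for i in range(nx)])
    return image

# Toy data: a flat reflector at 50 m depth in a medium with half velocity 1000 m/s.
nx, dx, dz, nz, c = 8, 25.0, 10.0, 10, 1000.0
freqs = [5.0 * k for k in range(1, 13)]
z0 = 50.0
data = [[cmath.exp(-2j * math.pi * f * z0 / c)] * nx for f in freqs]
image = recursive_migration(data, freqs, [[c] * nx for _ in range(nz)], dx, dz)
best = max(range(nz), key=lambda iz: image[iz][0])
print("image peaks at depth", (best + 1) * dz, "m")
```

Row `iz` of `image` corresponds to depth `(iz + 1) * dz`, so the strongest image amplitude is expected at the fifth depth step for this toy reflector.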

Finite-difference time migration

The other type of migration we discuss here, and which is often applied in seismic
processing, is finite-difference migration. It is based on a differential approach, rather
than an integral approach (as was the case with the previous methods). It involves the
so-called one-way wave equation, and some approximation to the vertical wavenumber.
Let us return to the homogeneous wave equation which was used as a starting point
for Kirchhoff migration, but now in the 2D situation, i.e.:
$$\nabla^2 p(x, z, t) - \frac{1}{c^2}\frac{\partial^2 p(x, z, t)}{\partial t^2} = 0. \tag{4.99}$$
For models in which the velocities vary only in the vertical direction, we can easily apply
our two-dimensional Fourier transform to this equation to obtain:
$$\frac{d^2 \tilde{P}(k_x, z, f)}{dz^2} + 4\pi^2\left(\frac{f^2}{c^2} - k_x^2\right)\tilde{P}(k_x, z, f) = 0. \tag{4.100}$$
This is an equation which is easy to solve:
$$\tilde{P}(k_x, z, f) = \tilde{P}_0 \exp\left(\mp 2\pi i \int_0^z k_z\,dz\right), \tag{4.101}$$

where $k_z$ is as given earlier (equation (4.79)); the minus sign in the exponent is chosen for
forward extrapolation, and for inverse extrapolation we have to use the plus sign. This solution
is very simple, and is actually a solution of a simpler differential equation than the one we
started with, namely the so-called one-way wave equation:
$$\frac{d}{dz}\tilde{P}(k_x, z, f) = -2\pi i k_z \tilde{P}(k_x, z, f). \tag{4.102}$$
This assumes that we are considering a wavefield propagating in one direction, without
interaction with inhomogeneities (reflection etc.). Again, the phase shift operator as given
by equation (4.80) is a solution of this one-way wave equation. As for the $f$-$k_x$ migration,
we are interested in time migration, so we convert $z$ to $\tau$ via:
$$\tau = 2\int_0^z \frac{1}{c}\,dz, \tag{4.103}$$
so that $d\tau = 2\,dz/c$. Also invoking the property that we deal with zero-offset data in a
half-velocity medium, we obtain:
$$\frac{d}{d\tau}\tilde{P}'(k_x, \tau, f) = -2\pi i\,\Omega\,\tilde{P}'(k_x, \tau, f), \tag{4.104}$$
where we have used $\Omega = (f^2 - k_x^2 c^2/4)^{1/2}$ as before (equation (4.87)), and we have used
$\tilde{P}'$ to denote that it depends on $\tau$ rather than $z$.
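In finite-difference migration, an approximation is made of $\Omega$, by making a Taylor
expansion of the square root in $\Omega$:
$$\left(f^2 - \frac{k_x^2 c^2}{4}\right)^{1/2} = f\left(1 - \frac{k_x^2 c^2}{4 f^2}\right)^{1/2} \simeq f\left(1 - \frac{k_x^2 c^2}{8 f^2}\right) = f - \frac{k_x^2 c^2}{8 f}. \tag{4.105}$$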

With this approximation the one-way wave equation is written as:
$$\frac{d}{d\tau}\tilde{P}'(k_x, \tau, f) \simeq -2\pi i\left[f - \frac{k_x^2 c^2}{8 f}\right]\tilde{P}'(k_x, \tau, f) \tag{4.106}$$
$$\qquad = -2\pi i f\,\tilde{P}'(k_x, \tau, f) + 2\pi i\,\frac{k_x^2 c^2}{8 f}\,\tilde{P}'(k_x, \tau, f). \tag{4.107}$$
For a simpler solution, we assume that we have a wavefield $\tilde{P}'$ that is related to the
extrapolated wavefield $\tilde{Q}$ by:
$$\tilde{P}' = \tilde{Q}\exp(-2\pi i f \tau), \tag{4.108}$$
in which $\tau$ is again the vertical traveltime. The exponential can be seen as the time shift for
vertical propagation, i.e. the solution for $k_x = 0$. The extrapolation process (going from one
depth level to another) has mainly two effects on the wavefield: a reduction of the overall time
by a vertical time shift, and a contraction of the wavefield (e.g. collapsing of diffractions).
The general time shift with $\tau$ compensates for this first effect. The remaining effect is
described by the function $\tilde{Q}$. In other words, the function $\tilde{Q}$ describes a correction
to this solution for angles other than vertical. Using this formulation for $\tilde{P}'$, working
out $d\tilde{P}'/d\tau$ and using the approximation for $\Omega$, the approximated one-way wave equation
becomes:
$$\frac{d\tilde{Q}}{d\tau} = 2\pi i\,\frac{k_x^2 c^2}{8 f}\,\tilde{Q}. \tag{4.109}$$
The factor on the right-hand side can be written as $(c^2/8)\,(-2\pi i k_x)^2/(2\pi i f)$, and we can
recognize $(2\pi i f)$ as a differentiation with respect to $t$ and $(-2\pi i k_x)$ as a differentiation
with respect to $x$. The inverse Fourier transform of this equation then becomes:
$$\frac{\partial^2 Q}{\partial \tau\,\partial t} = \frac{c^2}{8}\frac{\partial^2 Q}{\partial x^2}. \tag{4.110}$$

This is the equation used for finite-difference time migration. The most important approximation made in deriving it is the expansion of $\Omega$.
For implementation in a computer algorithm, the derivatives are written as finite
differences, e.g.:
$$\frac{\partial Q}{\partial t} \simeq \frac{Q(t+\Delta t) - Q(t-\Delta t)}{2\Delta t}. \tag{4.111}$$
For spatial derivatives a similar expression is used. With these approximations,
each derivative involves a small operator in time and space. The output is calculated
by recursive application of these finite-difference operators, computing the result in
small steps in $\tau$.
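As a small illustration of equation (4.111), the sketch below evaluates the central difference for a first derivative and, as an assumed example of the "similar expression" for spatial derivatives, the standard three-point stencil for a second derivative. Both are tested on a sine function, whose derivatives are known; the function names are hypothetical.

```python
import math

def central_difference(q, t, dt):
    """First derivative as in equation (4.111): [Q(t+dt) - Q(t-dt)] / (2 dt)."""
    return (q(t + dt) - q(t - dt)) / (2.0 * dt)

def second_difference(q, x, dx):
    """Assumed three-point stencil for a second derivative such as d2Q/dx2."""
    return (q(x + dx) - 2.0 * q(x) + q(x - dx)) / (dx * dx)

step = 1.0e-3
first = central_difference(math.sin, 0.3, step)    # analytic value: cos(0.3)
second = second_difference(math.sin, 0.3, step)    # analytic value: -sin(0.3)
print(first - math.cos(0.3), second + math.sin(0.3))
```

Both errors shrink with the square of the step size, which is why the step sizes in time and space control the accuracy (and the dispersion) of the scheme.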
Note that the 3D extension is rather straightforward, as the spatial derivative $\partial^2 Q/\partial x^2$ is
replaced by $\partial^2 Q/\partial x^2 + \partial^2 Q/\partial y^2$.
This expression is theoretically valid for velocities which vary only in the vertical
direction, but in practice it is also used for (smooth) lateral velocity variations. Because
of the expansion of $\Omega$, the scheme is only valid for a limited range of $\Omega$, which can be
easily related to the structural dip. Theoretical studies have pointed out that dips up to
15 degrees can be handled accurately enough, which is why this is called the 15-degree
finite-difference scheme. However, in practice, dips up to 35 degrees can be handled.
Higher-order approximations to $\Omega$ can be used, and this is done in the so-called steep-dip
or 45-degree finite-difference algorithm [Claerbout, 1985]. As the name suggests, this
method can handle dips up to 45 degrees to a sufficient degree.
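The dip limitation can be made tangible by comparing the exact square root with its Taylor approximation from equation (4.105). The sketch below assumes that a propagation angle $\theta$ in the half-velocity medium corresponds to $k_x = 2 f \sin\theta / c$, and uses the relative error in $\Omega$ as a rough proxy for accuracy; the frequency and velocity values are arbitrary choices.

```python
import math

def omega_exact(f, kx, c):
    """Exact square root (f^2 - kx^2 c^2 / 4)^(1/2)."""
    return math.sqrt(f * f - kx * kx * c * c / 4.0)

def omega_taylor(f, kx, c):
    """First-order Taylor approximation f - kx^2 c^2 / (8 f) of equation (4.105)."""
    return f - kx * kx * c * c / (8.0 * f)

def relative_error(theta_deg, f=30.0, c=2000.0):
    """Relative error of the approximation at an assumed propagation angle."""
    kx = 2.0 * f * math.sin(math.radians(theta_deg)) / c
    exact = omega_exact(f, kx, c)
    return abs(omega_taylor(f, kx, c) - exact) / exact

for theta in (5, 15, 35, 45, 60):
    print(f"{theta:2d} degrees: relative error {relative_error(theta):.1e}")
```

The error grows rapidly with angle: it stays below 0.1% at 15 degrees but exceeds 5% at 45 degrees, which is in line with the need for higher-order approximations at steep dips.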
Apart from the velocity being input to the finite-difference migration scheme, there
is another important parameter, namely the depth step size, $\Delta\tau$. Since we deal with
a difference scheme, the scheme can be expensive when the depth step size is taken very
small. However, too large a step $\Delta\tau$ causes the algorithm to undermigrate, i.e. not to
migrate enough. Often, the undermigration is accompanied by some dispersive "noise",
which is an effect of approximating differential operators with difference operators. For
a more extensive discussion of practical aspects, the reader is referred to the
appropriate sections in [Yilmaz, 1987], page 277.

Finite-difference depth migration

Before discussing depth migration in more detail, we would like to point out that the word "depth" in it is sometimes used loosely. With depth migration we mean that we map onto depth
instead of onto time. Sometimes depth migration is used for a migration which includes
an extra term which corrects for lateral velocity variations ([Yilmaz, 1987]). However,
this terminology suggests that the extra term only exists when we map onto depth, and
not onto time; this is not true. Depth migration is important when strong lateral velocity
variations exist: the picture in time can then suggest some structure, whereas an image
made in depth would show that there is no structure at all. An example of this
has already been given in the previous chapter in figure 3.26. Of course, the section is
always better in depth because it then corresponds more closely to a geological cross-section,
but this is often a difficult task because of the sensitivity to the velocity. Here another
advantage of migration to a time section becomes important, namely that the algorithm is
not so sensitive to the velocity. The corrections in a migration to vertical traveltime
are mainly corrections for dip, while with a migration to depth we correct for velocities
at the same time. This sensitivity can be compared to determining interval
velocities on a time-migrated section, which is also a very sensitive process. A seismic
interpreter should be well aware of the velocity effect when interpreting a time section.
We will adopt the approach that by depth migration we mean that we map onto depth,
not necessarily including the extra term in the migration procedure.
In the above, we derived the migration in terms of the vertical traveltime $\tau$ by converting the depth $z$ to $\tau$ by means of equation (4.103). We will not do that in this section.
On top of that, we will specifically consider strong lateral velocity variations, because
then it is strongly recommended to map onto depth rather than onto time in order to prevent
misinterpretation.
Let us first introduce a velocity $c_{lat}$ which depends on the horizontal as well as the vertical
coordinate, $c_{lat} = c_{lat}(x, z)$. Using this velocity we define a velocity $c$ which is a horizontal
average of the velocity $c_{lat}$:
$$c(z) = \frac{1}{x_{max} - x_{min}}\int_{x_{min}}^{x_{max}} c_{lat}\,dx. \tag{4.112}$$
This velocity $c$ is treated as being locally constant. Now let us define the vertical traveltime
$\tau$ by:
$$\tau = 2\int_0^z \frac{dz}{c} = \frac{2z}{c}, \tag{4.113}$$
where we have now used the laterally invariant $c$.
We can now follow the same procedure as before when we employed the one-way wave
equation. Since we here deal with depth migration, we will use the one-way wave equation
in $z$ rather than in $\tau$. When we go through the same derivation as for the finite-difference
time migration, we now have that $d\tau/dz = 2/c$, while for $k_z$ we use the same expansion
as for $\Omega$ (equation (4.105)), so:
$$k_z \simeq \frac{2f}{c_{lat}} - \frac{c_{lat}\,k_x^2}{4f}, \tag{4.114}$$
in which we have kept $c_{lat}$. Again we assume a solution of the form:
$$\tilde{P} = \tilde{Q}\exp(-2\pi i f \tau) = \tilde{Q}\exp(-4\pi i f z/c). \tag{4.115}$$
Using these in the one-way wave equation (4.102) yields:
$$\frac{d\tilde{Q}}{dz} \simeq 2\pi i\,\frac{c_{lat}\,k_x^2}{4f}\,\tilde{Q} + 4\pi i f\left(\frac{1}{c} - \frac{1}{c_{lat}}\right)\tilde{Q}. \tag{4.116}$$

We see that if the velocity $c_{lat}$ is only a function of the depth $z$, then the extra term on the
right-hand side vanishes and the equation is equivalent to the one we derived before for
time migration (equation (4.109)). The first term on the right-hand side of this equation
is called the diffraction term, while the second term is called the thin-lens term.
Rewriting this in terms of spatial and temporal derivatives, we obtain in the
$t$-$x$ domain:
$$\frac{\partial^2 Q}{\partial t\,\partial z} \simeq \frac{c_{lat}}{4}\frac{\partial^2 Q}{\partial x^2} + 2\left(\frac{1}{c} - \frac{1}{c_{lat}}\right)\frac{\partial^2 Q}{\partial t^2}. \tag{4.117}$$
Because of the dependence on $z$, this is called depth migration. The extra term accounts
for lateral velocity variations, and is only important when strong lateral velocity variations exist. For slow lateral velocity variations, the approximation discussed under time
migration is sufficiently accurate.
Note that we could have included lateral velocity variations in the time migration as
well. Then we obtain in the $(k_x, f)$ domain, following the same procedure as before,
$$\frac{d\tilde{Q}}{d\tau} \simeq 2\pi i\,\frac{c\,c_{lat}\,k_x^2}{8f}\,\tilde{Q} + 2\pi i f\left(1 - \frac{c}{c_{lat}}\right)\tilde{Q}. \tag{4.118}$$
Note the extra term on the right.

When to use which technique

We have so far discussed the most popular techniques to perform the migration, but all
of them have some advantages and some disadvantages. In practice, the seismic processor
will decide which algorithm to use, based on the situation that is faced. As some (depth)
migration algorithms require interval velocities, which may not be known yet, the available
information can limit the number of possibilities. The various options are listed below:

Kirchhoff time migration
  Advantages
    • Simplicity, based on NMO velocities
    • Can handle steep dips
    • Adaptable to unusual source-receiver geometries
  Disadvantages
    • Cannot handle low signal-to-noise ratios
    • Cannot handle lateral velocity variations
    • Improper amplitudes
    • Possible aliasing

Finite-difference time migration
  Advantages
    • Can handle slow lateral velocity variations
    • Can handle low signal-to-noise ratios
  Disadvantages
    • Slow in computational speed
    • Cannot handle dips above 45 degrees
    • Cannot handle complex media

Migration by Fourier transform (Gazdag, Stolt)
  Advantages
    • Fast in computational speed
    • Able to handle steep dips
    • Can handle low signal-to-noise ratios
  Disadvantages
    • Cannot handle lateral velocity variations

Depth migration (in comparison to time migration)
  Advantages
    • Can handle vertical and lateral velocity variations
    • Can handle steep dips
  Disadvantages
    • Requires accurate interval velocity model
    • Slow in computational speed

Another description of when to use which type of migration is given in [Yilmaz, 1987],
table 4-1, page 246.
The usual goal of a seismic processor is to obtain a section which represents a geological
cross-section as closely as possible (where we only have an impedance map!). So some
conversion to depth has to take place. Here too, it is known which technique to use in
which circumstances. This is given in [Yilmaz, 1987], figure 5-8, p. 361. (A
note must be made about Yilmaz here. He suggests that only a migration mapped onto
depth includes the extra thin-lens term; this is not true. Only when a depth section is
desired can the depth migration directly map onto depth instead of the vertical
intercept time ($\tau$).)

Summary of zero-offset migration

After stacking, the resulting section is considered as a zero-offset experiment. This zero-offset
experiment can again be considered as the result of a so-called exploding reflector
experiment in the half-velocity medium: each reflector point is considered as a source, and all
these sources fire at the same time. By inverse extrapolation of this exploding reflector
experiment through this half-velocity medium and imaging at t = 0 at each depth level, the
resulting migrated image is retrieved.
For this inverse extrapolation the Kirchhoff integral is used. This integral has been
derived from the wave equation and states that the wavefield at each point within a volume
can be determined from the wavefield values at the boundary of this volume and a number
of Green's functions that describe the propagation from the desired point to all boundary
locations.
In migration practice, the boundary is chosen as the earth's surface, and all points
below the surface can be determined. Depending on the complexity of the medium, several
migration approaches can be followed. In the simplest case, in which the medium is homogeneous, a
mapping procedure (Stolt migration) can be used, which achieves a perfect migration result by
a forward and inverse temporal and spatial Fourier transform of the data, with a coordinate
mapping in between. If the medium is inhomogeneous, other procedures need to be followed.
If the velocity distribution only varies as a function of z, then a recursive migration via
the wavenumber-frequency domain can be applied (Gazdag phase shift migration), and in
the case of (local) lateral and depth variations, Kirchhoff migration (based on traveltime
functions), finite-difference methods (in time or depth) or recursive f-x migration can
be used. Each method has its pros and cons, and the seismic processor makes his/her
decision based on the complexity of the model and the available resources (computer power
or financial).

4.10 Conversion from time to depth


In the previous section we have spoken of time and depth migration, referring to whether
the output section is in time or depth, respectively. In time, we do not need to know the
velocities that well; stacking velocities will often do. In depth migration we need to know
the velocities very well, which is often a difficult task. Still, our goal is to obtain a section
which is as close as possible to a geological cross-section; to that effect we want to have
our section in depth. In this section we will briefly discuss the conversion from time to
depth, and especially in which circumstances certain techniques can be used.

Dix formula

Let us first consider a model with plane horizontal layers. We showed in Chapter 3
that we could determine the root-mean-square velocities from the interval velocities via:
$$c_{rms,N}^2 = \frac{1}{T_{tot,N}(0)}\sum_{i=1}^{N} c_i^2\,T_i(0), \tag{4.119}$$
where we have included an extra $N$ in the notation of $c_{rms,N}$ and $T_{tot,N}$. We can invert
this formula, which means that we can determine the interval velocities from the root-mean-square velocities. When we consider the root-mean-square velocities for $N = 2$ and
$N = 3$, we have:
$$c_{rms,2}^2 = \frac{c_1^2 T_1(0) + c_2^2 T_2(0)}{T_1(0) + T_2(0)}, \tag{4.120}$$
$$c_{rms,3}^2 = \frac{c_1^2 T_1(0) + c_2^2 T_2(0) + c_3^2 T_3(0)}{T_1(0) + T_2(0) + T_3(0)}. \tag{4.121}$$
We bring the denominators on the right-hand side to the left-hand side, subtract the first
equation from the second, and obtain:
$$c_{rms,3}^2\,\bigl(T_1(0) + T_2(0) + T_3(0)\bigr) - c_{rms,2}^2\,\bigl(T_1(0) + T_2(0)\bigr) = c_3^2\,T_3(0), \tag{4.122}$$
in which we recall that $T_3(0)$ is the zero-offset traveltime through layer 3, i.e. the
difference between the total time at level 3 and the total time at level 2, so
$T_3(0) = T_{tot,3}(0) - T_{tot,2}(0)$. The interval velocity $c_3$ then becomes:
$$c_3 = \sqrt{\frac{c_{rms,3}^2\,T_{tot,3}(0) - c_{rms,2}^2\,T_{tot,2}(0)}{T_{tot,3}(0) - T_{tot,2}(0)}}. \tag{4.123}$$

In general the interval velocity for the $i$th layer is given by:
$$c_i = \sqrt{\frac{c_{rms,i}^2\,T_{tot,i}(0) - c_{rms,i-1}^2\,T_{tot,i-1}(0)}{T_{tot,i}(0) - T_{tot,i-1}(0)}}. \tag{4.124}$$
The values for $c_{rms,n}$ and $T_{tot,n}$ can be obtained directly from the velocity file as used for
stacking the data. This is Dix' formula [Dix, 1955]. Dix' formula converts RMS velocities
to interval velocities.
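A minimal sketch of equations (4.119) and (4.124) is given below. The layer velocities and times are invented for illustration, and the function names are hypothetical. The example first builds RMS velocities from known interval velocities and then checks that Dix' formula recovers them.

```python
import math

def rms_from_intervals(c_int, t_layer):
    """Equation (4.119): RMS velocity and total time down to the base of each layer."""
    c_rms, t_tot = [], []
    weighted, total = 0.0, 0.0
    for c, t in zip(c_int, t_layer):
        weighted += c * c * t
        total += t
        c_rms.append(math.sqrt(weighted / total))
        t_tot.append(total)
    return c_rms, t_tot

def dix_interval_velocities(c_rms, t_tot):
    """Equation (4.124): interval velocities from RMS velocities and total times."""
    c_int = []
    prev_c, prev_t = 0.0, 0.0
    for c, t in zip(c_rms, t_tot):
        numerator = c * c * t - prev_c * prev_c * prev_t
        thickness = t - prev_t
        if thickness <= 0.0 or numerator < 0.0:
            raise ValueError("velocity picks are inconsistent with Dix' formula")
        c_int.append(math.sqrt(numerator / thickness))
        prev_c, prev_t = c, t
    return c_int

true_c = [1500.0, 2000.0, 3000.0]            # invented interval velocities (m/s)
c_rms, t_tot = rms_from_intervals(true_c, [0.4, 0.3, 0.2])
recovered = dix_interval_velocities(c_rms, t_tot)
print([round(c, 3) for c in recovered])
```

The explicit check on the numerator reflects a practical point: with noisy RMS picks the expression under the square root can become negative, in which case the picks need to be revised.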
In our procedure to get a depth section for a model with horizontal plane layers, we
convert the time axis of our (zero-offset) stacked section to a depth axis using the interval
velocities from this formula.
Although we derived Dix' formula for horizontal layers, the formula will still be good
when we have mild lateral velocity variations. It has been shown that even in the case
of dipping events, the formula will still be good. In that case, however, in order to obtain
a good depth section, we must first time-migrate the data (without the thin-lens term)
before we can convert the time axis to a depth axis.
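The conversion of the time axis to a depth axis along vertical rays can be sketched as follows. The sketch assumes that the times are two-way zero-offset times, so that each layer contributes its interval velocity times half the two-way time spent in it; the layer values and the function name are illustrative assumptions.

```python
def time_to_depth(t, interval_velocities, layer_base_times):
    """Depth reached at two-way vertical time t in a stack of horizontal layers.

    interval_velocities: velocity of each layer (m/s), from top to bottom.
    layer_base_times:    two-way time at the base of each layer (s).
    """
    depth, top = 0.0, 0.0
    for c, base in zip(interval_velocities, layer_base_times):
        if t <= top:
            break
        depth += 0.5 * c * (min(t, base) - top)   # one-way time is half the two-way time
        top = base
    return depth

velocities = [1500.0, 2000.0]      # invented interval velocities (m/s)
base_times = [0.4, 0.7]            # invented two-way times at the layer bases (s)
depth_axis = [time_to_depth(t, velocities, base_times) for t in (0.2, 0.4, 0.55, 0.7)]
print(depth_axis)
```

Times beyond the base of the deepest layer are clipped to that base in this sketch; a real conversion would extend the last velocity or flag such samples.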

Image rays

The above formula breaks down when the lateral velocity variations become larger.
However, the concept of image rays still enables us to convert the time
section to a depth section. What is an image ray? An image ray is a ray which leaves the
surface vertically downwards; this is in contrast to normal-incidence rays, for which the ray is
perpendicular to the reflecting interface. An example is given in figure 4.40, taken from
[Yilmaz, 1987].
Let us now look at a simple model in which we have a point diffractor buried below
a dipping interface between two media which show quite a strong contrast in velocity
(figure 4.41). When we look at the zero-offset section we see a shape which does not look like a
hyperbola any more; it is skewed. It has its lowest time, its apex, at a receiver location
that is not right above the diffractor, but laterally shifted. In the figure, B is the point
above the diffractor, and A is the point of the apex of the traveltime curve. However,
the ray picture shows a very interesting feature, namely that at point A the ray path is
perpendicular to the surface. This was first recognized by [Hubral, 1977]. We can use the
image ray to perform a lateral shift, which is equivalent to applying the thin-lens term
in the finite-difference migration scheme. We should realize that with the image ray, we
first apply the diffraction term in the (time) migration, and only later perform the lateral
shift as predicted by the image ray.
In order to convert the time section to a depth section (note: section instead of axis) for
models in which the lateral velocity variations are not too large, we can use the image-ray
concept rather than full depth migration. Time migration will always position the reflection information at the apex of the migration operator. The procedure to obtain the depth
section is then to do a time migration with only the diffraction term, and then convert to
depth along the image rays. In this way, the reflection information is positioned at the
correct depth and lateral location.

Figure 4.40: The image rays for a model with 3 reflectors. From Yilmaz (1987), fig. 5-7.

Conversion to depth via depth migration

At some point even the image-ray approach breaks down. It was already
said that the image rays are used after a time migration has been done, but sometimes the
migration cannot be split into these two parts, and one has to apply the thin-lens term
alternately with the diffraction term. Whether the image-ray approach breaks down can
be inspected via a plot of the image rays. When more than one image ray is associated
with a subsurface point, we should apply the diffraction and thin-lens terms alternately
and use a full depth migration in order to get the appropriate depth section. Such a case
is shown in figure 4.40, where many parts of the lowest reflector are illuminated by
more than one image ray.

[Figure 4.41. Left panel: ray paths, depth z (m) against x (m), with the surface points A and B marked. Right panel: time section, t (s) against x (m).]
Figure 4.41: The ray paths for a point diffractor in a zero-offset section (left), with its
time section (right). (Adapted from Yilmaz (1987))

All the considerations of this section are summarized in the figure that can be found in
[Yilmaz, 1987], fig. 5-8, p. 361, which is included here for completeness as figure 4.42.

[Figure 4.42, flow chart. Starting from the CMP stack, the route depends on the subsurface:]
• No dip, no lateral velocity variations:
  conversion of time axis to depth axis along vertical rays.
• Dipping events, mild lateral velocity variations:
  time migration (apply diffraction term), then conversion of time axis to depth axis along vertical rays.
• Dipping events, moderate lateral velocity variations:
  time migration (apply diffraction term), then conversion of time axis to depth axis along image rays.
• Dipping events, strong lateral velocity variations:
  depth migration (apply diffraction and thin-lens terms in an alternate manner).

Figure 4.42: Strategy for obtaining a depth section. (Adapted from Yilmaz (1987))

4.11 Prestack migration


In the previous chapters and sections, a general seismic processing procedure has been
discussed that consists of the following major steps:
1. CMP sorting
2. NMO correction
3. DMO correction
4. Stack
5. Poststack time/depth migration
In a number of these steps some assumptions have been made that are not valid for general
inhomogeneous earth models, such as:
1. CMP sorting: if the earth is inhomogeneous and reflectors have complex shapes, the
   reflection events within a CMP gather do not belong to one subsurface reflection
   point (see section 4.8).
2. NMO correction: in complex subsurface media the moveout in a CMP gather is not
   hyperbolic, so a perfect moveout correction cannot be achieved.
3. DMO: if strong lateral velocity and/or reflector geometry variations are present, the
   DMO procedure still will not resolve the reflection point smear within a CMP gather.
4. Stack: as the events within a CMP gather do not belong to the same subsurface
   reflection point, stacking of these events will mix subsurface information.
5. Poststack time/depth migration: given the approximations in the previous steps, a
   stacked section does not represent a true zero-offset section and migration of this
   stack will therefore not result in an exact image. Furthermore, some poststack
   migration algorithms have limitations due to the assumed simple velocity field or
   a limitation in the maximum dip that can be handled.
In fact, this procedure is still used so often because it is robust and very efficient. Furthermore,
as a first indication it still serves a purpose to do some fast and robust CMP-oriented
processing. However, if the results are not satisfactory, a true prestack migration procedure needs to be followed: the original field shot records need to be migrated directly into
a subsurface image. In such a procedure, the exploding reflector model does not work
anymore: the prestack dataset is a two-way wavefield, with a mix of upgoing and downgoing
propagation effects. A prestack migration procedure is more complex, but
can be explained as the following sequence:
1. Put a point source at the shot position of the shot record under consideration and
   apply a forward extrapolation to each subsurface point. For this step the Kirchhoff
   extrapolation procedure can still be used, but now with the full velocity model (not
   half velocity).
2. The wavefield of the shot record (i.e. the measured response at the receivers) is
   inverse extrapolated to each subsurface point, again using the Kirchhoff formula.
3. At each depth level the two wavefields (source and receiver field) are correlated,
   which means that the downward extrapolated receiver wavefield is corrected in time
   by the corresponding source wavefield. The contribution that appears at t = 0 is
   the prestack depth migration contribution for this subsurface point.
4. By repeating this procedure for all shot records in the dataset, and adding all resulting images, the complete subsurface image is obtained.
Note that this procedure is much more expensive, as the stacking procedure is now applied
after migration. However, complex subsurface models with large lateral velocity variations
(e.g. in a salt dome environment) require such an elaborate method in order to correctly
image structures below or near the edges of such a salt dome structure.
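The correlation in step 3 can be illustrated for a single subsurface point. In the frequency domain, correlating the receiver wavefield with the source wavefield and taking the t = 0 sample amounts to summing the product of the conjugated source spectrum and the receiver spectrum over frequency. The sketch below uses idealized single-arrival wavefields with invented traveltimes and an invented reflection amplitude; the function names are hypothetical.

```python
import cmath
import math

def imaging_condition(source_spectrum, receiver_spectrum):
    """Zero-lag cross-correlation: sum over frequency of conj(S) * R (real part)."""
    return sum(s.conjugate() * r for s, r in zip(source_spectrum, receiver_spectrum)).real

freqs = [5.0 * k for k in range(1, 13)]          # 5 to 60 Hz

def arrival(time, amplitude=1.0):
    """Spectrum of an idealized spike arriving at the given time (s)."""
    return [amplitude * cmath.exp(-2j * math.pi * f * time) for f in freqs]

# At a reflector the back-propagated receiver field coincides in time with the source field.
on_reflector = imaging_condition(arrival(0.40), arrival(0.40, amplitude=0.2))
# Away from a reflector the two arrivals do not coincide and the sum largely cancels.
off_reflector = imaging_condition(arrival(0.40), arrival(0.43, amplitude=0.2))
print(on_reflector, off_reflector)
```

Repeating this for every subsurface point and summing the results over all shot records gives the stacked prestack image described in step 4.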
Stacking after migration also has another advantage: as neighboring shot records illuminate largely the same subsurface area, all migrated contributions
from each experiment at a certain reflector location should have the same depth after
migration (if the correct velocity model has been used). Based on this knowledge, prestack-related velocity analysis can be done: if the reflections from the different
seismic experiments do not align after migration, the velocity model needs to be adapted and the migration
recalculated. Although this procedure is computationally much more intensive than the
velocity analysis based on NMO correction, the main advantage again is the accuracy in
complex models, where CMP gathers do not show any hyperbolic moveout behavior, and
the NMO-based velocity analysis does not apply at all. A similar list of migration procedures also exists for prestack depth migration, as the wavefield extrapolation process still
takes a central role. Therefore, Kirchhoff techniques, Gazdag, Stolt, finite-difference and
recursive depth migration algorithms are available to the user. As the prestack migration
procedure is not the main objective of this course, we will not go into these details.

Chapter 5

3D seismic processing
5.1 Introduction
So far we considered most of the seismic processing only in a 2D mode: sources
and receivers were assumed to be positioned along one coordinate, and the earth was assumed to be
invariant in the direction perpendicular to the acquisition line. In reality this is definitely
not the case. Up to the seventies, seismic processing was indeed fully 2D oriented. Acquisition was done along lines. But by combining the results from many parallel lines, a 3D
view of the earth could still be obtained (although the sampling in the cross-line direction
was very coarse). In the eighties, full 3D seismic came into development. For each shot,
several parallel lines of receivers were recorded simultaneously, thus creating a much denser
coverage of the earth. Nowadays, true 2D seismic acquisition is the exception,
and is sometimes only done for certain research purposes. Together with the change in
the acquisition method, the processing algorithms had to follow the transition from 2D
to 3D. In this chapter, an overview of the major consequences of this extra dimension is
given.

5.2 Midpoint oriented processing


Even nowadays, the vast majority of data is still processed in a midpoint-oriented way.
This means that prestack data is sorted into common midpoint (CMP) gathers and stacked
after application of the proper moveout correction. The midpoints are now defined on a
two-dimensional grid at the surface, instead of along one coordinate only. Typically, the
midpoint spacing for a 3D survey is of the order of 25 m in both directions. Sometimes
the in-line spacing is smaller than the cross-line spacing. The major difference with 2D
in this processing approach is just that the amount of data has increased enormously. A
small 3D seismic survey of 10x10 km, with 50 traces in every 25x25 m grid cell (which is
called a bin), thus contains $8\times10^6$ traces, which results in 0.1 Tbyte of data. Typical 3D
surveys can easily consist of several Tbytes.
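The trace count quoted above follows directly from the survey dimensions, as the short calculation below shows. The implied size per trace is derived here from the quoted 0.1 Tbyte and is not stated in the text; the function name is hypothetical.

```python
def trace_count(size_x_m, size_y_m, bin_m, fold):
    """Number of traces for a rectangular survey with square bins and constant fold."""
    bins = round(size_x_m / bin_m) * round(size_y_m / bin_m)
    return bins * fold

traces = trace_count(10_000, 10_000, 25, 50)     # 400 x 400 bins, 50 traces per bin
bytes_per_trace = 0.1e12 / traces                # implied by the quoted 0.1 Tbyte
print(traces, bytes_per_trace)
```

The implied 12.5 kbyte per trace would correspond to roughly 3000 samples of 4 bytes each, which is an assumption about the record length and sample format rather than a figure from the text.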

Figure 5.1: 3D marine seismic acquisition, with multiple streamers towed behind a vessel.
(Picture taken from Veritas DGC website)

A 3D midpoint-oriented processing scheme will typically look like:
1. Geometry assignment
2. Preprocessing: groundroll removal, statics correction, deconvolution
3. CMP sorting
4. NMO velocity analysis and correction
5. 3D DMO
6. Stack
7. 3D poststack time/depth migration
In the following, some of these steps are described further.

3D geometry assignment

An important step in 3D seismic processing is the assignment of the correct geometry
to the seismic measurements. At first this sounds like a trivial task, but in practice
it requires careful work. Before 3D seismic acquisition takes place, the survey is
designed, based on a compromise between costs of acquisition and the required accuracy
in the image. For this, the expected presence of strong noise in the seismic records or
special requirements in the imaging of the subsurface structures can play an important
role.
For 3D marine seismic the design of the survey is limited by the practical possibilities: sources and receivers need to be towed behind one (or more) vessels (see Figure 5.1).
Typically, a 3D marine survey consists of a dual source, alternately shooting into 7
streamers. Each streamer has a length from 3 km up to sometimes 8 km. The streamers
have a typical cross-line spacing of 100 m, such that with the two sources the cross-line
midpoint spacing becomes 25 m. However, during shooting these long cables cannot be
kept in a straight line behind the ship. Cross-line currents will move the cables such that
so-called feathering and snaking effects can occur. This means that the receivers can
easily be shifted 100 m away from their supposed positions. Using GPS systems mounted on
the cables, the exact position of each hydrophone can be determined. All this position
information is stored separately. After all data has been shot, the geometry information
needs to be merged with the seismic measurements.
For 3D land data, the design of a seismic acquisition is not bound by the requirement that
sources and receivers need to be in-line with each other. But there, other aspects play a role,
e.g. the cost of planting geophones in the ground, the conditions of the surface, and obstacles
in the area (water, buildings, mountains etc.). Typically, a 3D acquisition on land has
a cross-spread type geometry: sources are densely sampled in one direction and coarsely
sampled in the perpendicular direction, and the receiver spacing is also dense in one and
coarse in the other direction, but in opposite directions compared to the sources. In this
way the coverage of the subsurface becomes well distributed over the area. Except in
desert areas, the intended acquisition geometry can never be exactly achieved in practice,
and in the end an irregular geometry is obtained. As in the marine case, the position
of each source and receiver is determined, and this information needs to be transferred to the
seismic traces after the acquisition has finished.
All traces will get the correct header information (source and receiver x and y coordinates). Besides that, each trace will get an in-line and cross-line CMP (i.e. bin) number.
Once this geometry information is assigned to the seismic data, all kinds of quality control
plots can be made.
An example of such a plot is a coverage (or fold) plot, as shown in Figure 5.2. For a
3D marine dataset the number of traces that have a midpoint in a certain bin is displayed
in color code. If the source and receiver positions were strictly regular, this coverage
plot would have one color only, as the geometry was designed that way. However, due to
the above-mentioned effects, this fold plot can display a variable coverage. Moreover,
obstacles can also be present in marine acquisition, like an oil platform. This will be observed
in the coverage plot as a low-fold area, as the ship has to manoeuvre around the obstacle.
Besides the fact that in an irregular acquisition geometry not all bins are covered by
the same number of traces, the exact midpoint positions also do not fall in the center of
each bin. In fact, the binning process assigns a bin number to each trace, meaning that the
geometric midpoint of such a trace falls within the boundaries of a grid cell. But the actual
midpoints can also be plotted on a map in order to quantify the midpoint smearing within
each grid cell. Figure 5.3 shows such a plot for a 3D marine dataset that has suffered from
feathering effects. As a result, midpoints are spread out over the grid cells (even moving into
neighbouring cells). Another effect that one is faced with in 3D acquisition is that the
distribution of offsets within a CMP gather can vary from bin to bin. Figure 5.4 shows, for
one fixed cross-line location in Figure 5.3, the distribution of offsets as a function of the
in-line midpoint coordinate. The feathering effects can be clearly observed here. Note that
for some midpoint locations (like the left box at CMP 2100) a very regular distribution of
offsets is visible, whereas other midpoints (e.g. the box around CMP 2115) have a very
irregular offset distribution. An irregular offset distribution has the effect that correlated
noise events, like multiples or ground roll, will not be suppressed as well by stacking. Also, due to
AVO effects in the desired reflection data (AVO = Amplitude Versus Offset), the stacked
amplitude will vary with varying offset distribution. This may result in false amplitude
anomalies on the stacked section.
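
The binning and fold count described above can be illustrated with a minimal sketch. It assumes a regular rectangular bin grid aligned with the coordinate axes; the function name `fold_map`, the grid parameters and the four example traces are hypothetical and only serve to illustrate the idea, they are not taken from any particular processing package.

```python
import numpy as np

def fold_map(sx, sy, rx, ry, x0, y0, dx, dy, nx, ny):
    """Assign each trace to a bin via its midpoint and count traces per bin.

    (x0, y0) is the grid origin, (dx, dy) the bin size and (nx, ny) the number
    of bins; all of these are illustrative assumptions.
    """
    # Midpoint of every source-receiver pair
    mx = 0.5 * (np.asarray(sx, float) + np.asarray(rx, float))
    my = 0.5 * (np.asarray(sy, float) + np.asarray(ry, float))
    # Bin indices of the grid cell that contains each midpoint
    ix = np.floor((mx - x0) / dx).astype(int)
    iy = np.floor((my - y0) / dy).astype(int)
    inside = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    fold = np.zeros((nx, ny), dtype=int)
    # np.add.at accumulates correctly when several traces share a bin
    np.add.at(fold, (ix[inside], iy[inside]), 1)
    return fold, ix, iy

# Four made-up traces; two of them have a midpoint in the same bin
sx = [0.0, 0.0, 100.0, 20.0]
sy = [0.0, 0.0, 0.0, 0.0]
rx = [100.0, 200.0, 300.0, 100.0]
ry = [0.0, 0.0, 10.0, 20.0]
fold, ix, iy = fold_map(sx, sy, rx, ry, x0=0.0, y0=0.0, dx=50.0, dy=50.0, nx=5, ny=1)
```

Displaying `fold` as a color-coded image gives a coverage plot; plotting the midpoints themselves on top of the grid shows the smearing within each cell.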

CMP sorting and velocity analysis

After the geometry has been assigned to the traces, and all midpoints have been
defined (i.e. the binning process), the seismic data can be sorted into CMP gathers. For
the two CMP locations in Figure 5.4 the seismic traces are plotted at their actual offset
location in Figure 5.5. An NMO correction has been applied to facilitate the interpretation
of the events. The events that show a residual curvature in these plots are identified as
multiples or converted waves. Note the irregular offset distribution for CMP 2115 (right-hand
side of Figure 5.5).

Figure 5.2: An example of a coverage plot for a 3D marine survey. Note the holes due to
oil platforms in this area.

The processing per CMP gather is similar to the 2D case. A stacking velocity analysis
is carried out in the CMP domain. The result is a stacking velocity field as a function of
the in-line and cross-line coordinate (i.e. bin number). Then each seismic trace is NMO
corrected with its velocity function and these traces are stacked to simulate the zero-offset
trace, with a high signal-to-noise ratio (SNR), at that particular CMP position.
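
As an illustration of the NMO correction and stack for one CMP gather, the following minimal sketch uses the hyperbolic moveout relation $t(x) = \sqrt{t_0^2 + x^2/v^2}$ and linear interpolation between time samples. The function name `nmo_stack`, the synthetic Gaussian event and all parameter values are assumptions made for the example; a production implementation would also handle NMO stretch muting and irregular fold.

```python
import numpy as np

def nmo_stack(gather, offsets, dt, v_nmo):
    """NMO-correct a CMP gather of shape (n_samples, n_traces) and stack it.

    v_nmo is a scalar or an array with one stacking velocity per time sample.
    """
    n_samples, n_traces = gather.shape
    t0 = np.arange(n_samples) * dt
    v = np.broadcast_to(np.asarray(v_nmo, dtype=float), (n_samples,))
    stack = np.zeros(n_samples)
    for j in range(n_traces):
        # Time at which the event with zero-offset time t0 arrives at this offset
        t_x = np.sqrt(t0**2 + (offsets[j] / v) ** 2)
        # Read the trace at t_x and place the value at t0 (zero outside the trace)
        stack += np.interp(t_x, t0, gather[:, j], left=0.0, right=0.0)
    return stack / n_traces

# Synthetic gather: one event with t0 = 0.4 s and v = 2000 m/s (made-up values)
dt, n_samples, v_true, t_event = 0.004, 256, 2000.0, 0.4
offsets = np.array([0.0, 500.0, 1000.0])
t = np.arange(n_samples) * dt
gather = np.zeros((n_samples, offsets.size))
for j, x in enumerate(offsets):
    t_arrival = np.sqrt(t_event**2 + (x / v_true) ** 2)
    gather[:, j] = np.exp(-(((t - t_arrival) / 0.02) ** 2))
stacked = nmo_stack(gather, offsets, dt, v_true)
```

With the correct velocity the event is flattened to $t_0 = 0.4$ s on all traces, so the stacked trace peaks at that time.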

DMO correction

As in the 2D case, the effect of dipping structures in the subsurface can be corrected
for in a separate DMO step. However, all calculations are now carried out in a 3D sense,
i.e. along the x- and y-coordinate. Because DMO acts as a lateral summation process (i.e. a
partial migration), it is often applied to 3D seismic data even if there is no complex
structural information. It has a tendency to smear out all seismic amplitudes, and thus
reduces the so-called acquisition footprint. This footprint arises because, due to the acquisition, not
all CMP gathers are evenly filled (e.g. varying coverage or offset distribution within each
bin), which generates an amplitude pattern on the stack. Applying 3D DMO will reduce
this effect.

Figure 5.3: An example of a midpoint plot for a 3D marine survey (axes: x coordinate and y coordinate).

Stacking

After the NMO correction, and optionally a DMO correction, all traces from one CMP
gather are stacked such that for each bin location one trace is produced, simulating the
zero-offset trace at this position. Thus, after stacking, a volume of stacked traces is obtained.
This stack is now densely sampled in the in-line and cross-line direction.

5.3 3D Poststack migration


To obtain the final subsurface image after the stacking process, a 3D poststack migration
needs to be carried out. In fact, all migration algorithms discussed in the previous chapter
have an implementation for both 2D and 3D data. Moreover, most migration formulas
have been derived in the 3D sense there. If the earth has variations in all directions, a
full 3D migration process is the only way to accurately reveal its structures (see Figure
5.6). For a homogeneous earth this means that each stacked trace is smeared out along
a sphere. Or, viewed another way, all seismic amplitudes along a hyperboloid are
added together to form the reflection for one subsurface point. Of course, the earth is
inhomogeneous, and depending on the complexity of the earth a choice between various
3D migration algorithms needs to be made.

Figure 5.4: Plot of the distribution of offsets as a function of the in-line midpoint location
for one fixed cross-line position for the data of Figure 5.3 (horizontal axis: midpoint coordinate [km];
vertical axis: offset [m]; marked locations: CDP 2100 and CDP 2115). Note the irregular coverage
of offsets for the various midpoints. The two rectangles indicate the CMP locations for
which the seismic data are shown in Figure 5.5.

Besides the extra dimension involved in the processing steps, a new dimension has also been
added to the interpretation of the seismic images. Small events that look like artifacts
in a 2D cross-section of the earth's subsurface can turn out to be genuine structures when the
image is viewed in a 3D sense.
Examples of such structures are prehistoric riverbeds that appear as channels
throughout a seismic section. In Figure 5.7 one slice at constant depth through a 3D
migrated seismic dataset is displayed. To obtain a better view of the fine details, the
original data (left-hand side) have been filtered such that small edges become better visible
(right-hand side). The ancient rivers that are present at this depth become clearly visible.
Often, river sands are good oil reservoirs. Therefore, the detection of these structures
is of high importance to the oil and gas exploration industry. Such channels are hardly
recognizable on a 2D cross-section of the earth. Note that the circular shapes in the top
of the left-hand image in Figure 5.7 are due to a salt dome; these circular events show the
boundaries of this dome (like cutting through a mountain). Inside this dome, the image
shows a very complicated pattern (see also the same area after filtering). This is a
well-known feature of salt bodies: the salt has a very heterogeneous character.
Furthermore, several interfaces can be picked or tracked in a volumetric way, resulting
in a 3D map of the structures of interest that can be viewed from different angles. An
example of an integrated display of a seismic vertical cross-section, a few well-logs and an
interpreted horizon is shown in Figure 5.8.

Figure 5.5: For the two CMP locations indicated in Figure 5.4 the seismic traces are
plotted at the true offset locations (horizontal axes: offset [m]; vertical axes: time [s]).
Note that the left-hand side CMP has a very regular
and the right-hand side CMP a very irregular distribution of offsets.

Figure 5.6: After 3D processing and migration, a volume image of the earth is obtained
(picture taken from ExxonMobil website).

Figure 5.7: Depth slice through 3D migrated section before (left) and after (right)
edge-detection filtering (axes: x (km) and y (km)).

Figure 5.8: One vertical slice of a 3D migration combined with an interpreted horizon
through the full survey. (picture found on the internet)


Appendix A

Discretisation of Fourier transform


The continuous Fourier integrals are nearly always used for deriving mathematical results,
but, in performing transforms on numerical data, the integrals are always replaced by
summations. The continuous signal $a(t)$ becomes the discrete signal, or time series, $a_k$,
in which $k$ is an integer, and the sampling has taken place at regular intervals $\Delta t$. Thus the
discrete signal corresponds exactly to the continuous signal at times

$$t = k\Delta t \tag{A.1}$$

Consider the evaluation of the Fourier transform (2.2) at the discrete times $k\Delta t$:

$$a_k = \int_{-\infty}^{\infty} A(f)\exp(2\pi i f k\Delta t)\,df, \qquad k = \ldots,-2,-1,0,1,2,\ldots \tag{A.2}$$

where $a_k$ stands for the fact that time is now discrete, so:

$$a_k = a(t) \quad \text{when } t = k\Delta t, \qquad k = \ldots,-2,-1,0,1,2,\ldots \tag{A.3}$$

This integral may be replaced by an infinite sum of pieces of the integral:

$$a_k = \sum_{m=-\infty}^{\infty} \int_{\frac{m}{\Delta t}-\frac{1}{2\Delta t}}^{\frac{m}{\Delta t}+\frac{1}{2\Delta t}} A(f)\exp(2\pi i f k\Delta t)\,df, \qquad k = \ldots,-2,-1,0,1,2,\ldots \tag{A.4}$$

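In order to get the bounds of the integral from $-1/(2\Delta t)$ to $+1/(2\Delta t)$, we change to the
variable $f' = f - m/\Delta t$ to yield:

$$a_k = \sum_{m=-\infty}^{\infty} \int_{-\frac{1}{2\Delta t}}^{\frac{1}{2\Delta t}} A\!\left(f' + \frac{m}{\Delta t}\right)\exp\!\left(2\pi i\left\{f' + \frac{m}{\Delta t}\right\}k\Delta t\right)df', \qquad k = \ldots,-2,-1,0,1,2,\ldots \tag{A.5}$$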
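Changing the order of the integration and summation, and noting that the exponential
becomes periodic (so $\exp(2\pi i m k) = 1$), this becomes

$$a_k = \int_{-\frac{1}{2\Delta t}}^{\frac{1}{2\Delta t}} \left[\sum_{m=-\infty}^{\infty} A\!\left(f' + \frac{m}{\Delta t}\right)\right]\exp(2\pi i f' k\Delta t)\,df', \qquad k = \ldots,-2,-1,0,1,2,\ldots \tag{A.6}$$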

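The Fourier transform of the discrete time series is thus

$$a_k = \int_{-\frac{1}{2\Delta t}}^{\frac{1}{2\Delta t}} A_D(f)\exp(2\pi i f k\Delta t)\,df, \qquad k = \ldots,-2,-1,0,1,2,\ldots \tag{A.7}$$

provided

$$A_D(f') = \sum_{m=-\infty}^{\infty} A\!\left(f' + \frac{m}{\Delta t}\right) \tag{A.8}$$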

So this is an infinite series of shifted spectra, as shown in Figure A.1(b). The discretisation
of the time signal forces the Fourier transform to become periodic. In the discrete case we
get the same spectrum as in the continuous case if we only take the period from $-1/(2\Delta t)$
to $+1/(2\Delta t)$ and the spectrum is zero elsewhere; the signal must be band-limited. This means that
the spectrum of the signal must be zero for frequencies $|f| \geq f_N = 1/(2\Delta t)$. The frequency $f_N$ is
known as the Nyquist frequency.
Let us now look at the other integral of the continuous Fourier-transform pair, i.e.
(2.1). We evaluate the integral by discretisation, so that we obtain for $A_D(f)$:

$$A_D(f) = \Delta t \sum_{k=-\infty}^{\infty} a_k \exp(-2\pi i f k\Delta t) \tag{A.9}$$

In practice the number of samples is always finite since we measure only for a certain time.
Say we have $N$ samples. Then we have to determine the function $B_D(f)$ which closely resembles
the one with infinitely many samples. Say we define $B_D(f)$ as:

$$B_D(f) = \Delta t \sum_{k=0}^{N-1} b_k \exp(-2\pi i f k\Delta t) \tag{A.10}$$

and we have to determine the coefficients $b_k$. For that purpose we use the least-squares
method, which means that we determine the coefficients $b_k$ such that the average quadratic
error:

$$E_N = \int_{-\frac{1}{2\Delta t}}^{\frac{1}{2\Delta t}} \left(A_D(f) - B_D(f)\right)^2 df \tag{A.11}$$

is minimized for a fixed value of $N$. $E_N$ can be viewed as a function of $\{b_n\}$, thus:

$$E_N = E_N(b_0, b_1, b_2, \ldots, b_{N-2}, b_{N-1}) \tag{A.12}$$

In order to minimize $E_N$ we must set the partial derivatives with respect to $b_n$ to zero,
i.e.:

$$\frac{\partial E_N}{\partial b_n} = 0 \qquad \text{for } n = 0, 1, 2, \ldots, N-1 \tag{A.13}$$
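Working this out gives us:

$$\begin{aligned}
\frac{\partial E_N}{\partial b_n} &= 2\int_{-\frac{1}{2\Delta t}}^{\frac{1}{2\Delta t}} \left(A_D(f) - B_D(f)\right)\left\{-\frac{\partial B_D(f)}{\partial b_n}\right\} df \\
&= -2\int_{-\frac{1}{2\Delta t}}^{\frac{1}{2\Delta t}} \left(A_D(f) - B_D(f)\right)\exp(-2\pi i f n\Delta t)\,\Delta t\,df \\
&= -2\Delta t\int_{-\frac{1}{2\Delta t}}^{\frac{1}{2\Delta t}} \Delta t\sum_{k=-\infty}^{\infty} a_k\exp(-2\pi i f k\Delta t)\exp(-2\pi i f n\Delta t)\,df \\
&\quad\; -2\Delta t\int_{-\frac{1}{2\Delta t}}^{\frac{1}{2\Delta t}} (-\Delta t)\sum_{k=0}^{N-1} b_k\exp(-2\pi i f k\Delta t)\exp(-2\pi i f n\Delta t)\,df
\end{aligned} \tag{A.14}$$
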

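Since the integration variable does not depend on the summation variable, we can interchange the summation and the integration:

$$\begin{aligned}
\frac{\partial E_N}{\partial b_n} = &-2(\Delta t)^2\sum_{k=-\infty}^{\infty} a_k\int_{-\frac{1}{2\Delta t}}^{\frac{1}{2\Delta t}} \exp(-2\pi i f k\Delta t)\exp(-2\pi i f n\Delta t)\,df \\
&+2(\Delta t)^2\sum_{k=0}^{N-1} b_k\int_{-\frac{1}{2\Delta t}}^{\frac{1}{2\Delta t}} \exp(-2\pi i f k\Delta t)\exp(-2\pi i f n\Delta t)\,df
\end{aligned} \tag{A.15}$$

Because of the orthogonality relations, the integrals are zero unless $k$ equals $n$; in that
case the integral becomes $1/\Delta t$. We thus obtain:

$$\frac{\partial E_N}{\partial b_n} = -2(\Delta t)^2 a_n\frac{1}{\Delta t} + 2(\Delta t)^2 b_n\frac{1}{\Delta t}, \qquad n = 0, 1, 2, \ldots, N-1 \tag{A.16}$$

Since each of these partial derivatives is set equal to zero, we obtain the nice result that
the coefficients $b_k$ are equal to the coefficients $a_k$, which are the coefficients of the infinite
sum.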
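Combining all this information, we obtain the Fourier pair:

$$A_D(f) = \Delta t\sum_{k=0}^{N-1} a_k\exp(-2\pi i f k\Delta t) \tag{A.17}$$

$$a_k = \int_{-\frac{1}{2\Delta t}}^{\frac{1}{2\Delta t}} A_D(f)\exp(2\pi i f k\Delta t)\,df, \qquad k = 0, 1, 2, \ldots, N-1 \tag{A.18}$$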

This is the transform pair for continuous frequency and discrete time. Notice that the
integral runs from $-1/(2\Delta t)$ to $+1/(2\Delta t)$, i.e. one period in which one spectrum of $A_D(f)$ is
present.
So far, we considered the fact that the values for frequencies above the Nyquist frequency
must be set to zero. This way of looking at it is a frequency-domain consideration.
We can translate this to the time domain by saying that if there is no information in the
continuous time signal $a(t)$ at frequencies above $f_N$, the maximum sampling interval $\Delta t$
is

$$\Delta t_{max} = \frac{1}{2 f_N} \tag{A.19}$$

This is the sampling theorem.
If we choose $\Delta t$ too large, we under-sample the signal and we get aliasing, as shown
in Figure A.2. The original signal appears to have a lower frequency.
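
The aliasing effect can be checked numerically with a minimal sketch. The helper `apparent_frequency` is a hypothetical name, and the sampling interval of 4 ms and the two test frequencies are example values chosen for illustration. It folds a frequency back into the band between zero and the Nyquist frequency, and the sampled cosines confirm that the two frequencies cannot be distinguished after sampling.

```python
import numpy as np

def apparent_frequency(f_true, dt):
    """Frequency (Hz) at which a sinusoid of frequency f_true appears
    after sampling with interval dt (folding around the Nyquist frequency)."""
    fs = 1.0 / dt                      # sampling frequency
    f_folded = abs(f_true) % fs        # use the periodicity of the spectrum
    return min(f_folded, fs - f_folded)

dt = 0.004                             # 4 ms sampling, Nyquist frequency 125 Hz
f_alias = apparent_frequency(150.0, dt)

# The samples of a 150 Hz cosine coincide with those of the aliased frequency
k = np.arange(64)
samples_true = np.cos(2.0 * np.pi * 150.0 * k * dt)
samples_alias = np.cos(2.0 * np.pi * f_alias * k * dt)
```

A signal at 100 Hz, which lies below the Nyquist frequency, is returned unchanged by the same function.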
Figure A.1: Effect of time-discretisation in the frequency domain: (a) continuous spectrum;
(b) properly time-sampled spectra giving rise to periodicity (period $1/\Delta t_1$); (c) too coarse
time sampling $\Delta t_2$ such that spectra overlap (= aliasing in time domain).

Figure A.2: Effect of discretisation in time: (a) properly sampled signal; (b) just under-sampled
signal; (c) fully under-sampled signal.

In practice the number of samples in a time series is always finite. We wish to find
the discrete Fourier transform of a finite-length sequence. We approach the problem by
dividing the definite integral (A.7) into the sum of $N$ pieces of equal frequency interval
$\Delta f$. Because $A_D(f)$ is periodic, with period $1/\Delta t$, we may first rewrite the integral with
different limits, but with the same frequency interval:

$$a_k = \int_{0}^{\frac{1}{\Delta t}} A_D(f)\exp(2\pi i f k\Delta t)\,df, \qquad k = 0, 1, 2, \ldots, N-1 \tag{A.20}$$

Writing the integral as a summation, we obtain

$$a_k = \Delta f\sum_{n=0}^{N-1} A_n\exp(2\pi i n\Delta f\,k\Delta t), \qquad k = 0, 1, 2, \ldots, N-1 \tag{A.21}$$

where

$$A_n = A_D(f) \quad \text{when } f = n\Delta f. \tag{A.22}$$

We now notice that the series $a_k$ is periodic with period $N$:

$$a_{k+N} = \Delta f\sum_{n=0}^{N-1} A_n\exp(2\pi i n\Delta f\{k+N\}\Delta t) \tag{A.23}$$

$$\phantom{a_{k+N}} = \Delta f\sum_{n=0}^{N-1} A_n\exp(2\pi i n\Delta f\,k\Delta t + 2\pi i n\Delta f\,N\Delta t) \tag{A.24}$$

$$\phantom{a_{k+N}} = \Delta f\sum_{n=0}^{N-1} A_n\exp(2\pi i n\Delta f\,k\Delta t) \tag{A.25}$$

$$\phantom{a_{k+N}} = a_k \tag{A.26}$$

since $N\Delta f = 1/\Delta t$ and so $\exp(2\pi i n) = 1$. Thus we arrive at the following discrete Fourier
transform pair for a finite-length time series:

$$A_n = \Delta t\sum_{k=0}^{N-1} a_k\exp(-2\pi i n k/N), \qquad n = 0, 1, 2, \ldots, N-1 \tag{A.27}$$

$$a_k = \Delta f\sum_{n=0}^{N-1} A_n\exp(2\pi i n k/N), \qquad k = 0, 1, 2, \ldots, N-1 \tag{A.28}$$

These two equations are the final discrete-time and discrete-frequency Fourier transform
pair.
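
The pair (A.27) and (A.28) can be written out directly as a minimal sketch. The function names and the random test series are illustrative assumptions; the direct matrix evaluation is chosen for clarity and costs $O(N^2)$ operations. The comparison with `numpy.fft.fft` only serves to show that the pair differs from the unscaled FFT convention by the factor $\Delta t$.

```python
import numpy as np

def dft_forward(a, dt):
    """A_n = dt * sum_k a_k exp(-2 pi i n k / N), equation (A.27)."""
    a = np.asarray(a, dtype=complex)
    N = a.size
    idx = np.arange(N)
    kernel = np.exp(-2j * np.pi * np.outer(idx, idx) / N)
    return dt * (kernel @ a)

def dft_inverse(A, dt):
    """a_k = df * sum_n A_n exp(+2 pi i n k / N) with df = 1/(N dt), equation (A.28)."""
    A = np.asarray(A, dtype=complex)
    N = A.size
    df = 1.0 / (N * dt)
    idx = np.arange(N)
    kernel = np.exp(2j * np.pi * np.outer(idx, idx) / N)
    return df * (kernel @ A)

# Round trip on an arbitrary test series
rng = np.random.default_rng(0)
dt = 0.004
a = rng.standard_normal(32)
A = dft_forward(a, dt)
a_back = dft_inverse(A, dt)
```

Because $\Delta f\,\Delta t = 1/N$, applying the inverse after the forward transform returns the original series.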


Appendix B

Derivation of the wave equation


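In this appendix we will derive the wave equation for homogeneous media, using the
conservation of momentum (Newton's second law) and the conservation of mass. In this
derivation, we will follow [Berkhout, 1984] (appendix C), where we consider a single cube
of mass when it is subjected to a seismic disturbance (see Figure B.1). Such a cube has a
volume $\Delta V$ with sides $\Delta x$, $\Delta y$ and $\Delta z$.

Figure B.1: A cube of mass, used for derivation of the wave equation.

Conservation of mass gives us:

$$\Delta m(t_0) = \Delta m(t_0 + dt) \tag{B.1}$$

where $\Delta m$ is the mass of the volume $\Delta V$, and $t$ denotes time. Using the density $\rho$, the
conservation of mass can be written as:

$$\rho(t_0)\,\Delta V(t_0) = \rho(t_0 + dt)\,\Delta V(t_0 + dt) \tag{B.2}$$

Making this explicit:

$$\rho_0\Delta V = (\rho_0 + d\rho)(\Delta V + dV) = \rho_0\Delta V + \rho_0\,dV + \Delta V\,d\rho + d\rho\,dV \tag{B.3}$$

Ignoring higher-order terms, i.e., $d\rho\,dV$, it follows that

$$\frac{d\rho}{\rho_0} = -\frac{dV}{\Delta V} \tag{B.4}$$

We want to derive an equation with the pressure in it, so we assume there is a linear
relation between the pressure $p$ and the density:

$$dp = \frac{K}{\rho_0}\,d\rho \tag{B.5}$$

where $K$ is called the bulk modulus. Then, we can rewrite the above equation as:

$$dp = -K\frac{dV}{\Delta V} \tag{B.6}$$

which formulates Hooke's law. It shows that for a constant mass the pressure is linearly
related to the relative volume change. Now we can also derive that:

$$\frac{dV}{\Delta V} = \frac{(\Delta x + dx)(\Delta y + dy)(\Delta z + dz) - \Delta x\Delta y\Delta z}{\Delta x\Delta y\Delta z} \simeq \frac{dx}{\Delta x} + \frac{dy}{\Delta y} + \frac{dz}{\Delta z} + O(dx\,dy) + O(dx\,dz) + O(dy\,dz) \tag{B.7}$$

For $dx$ we can write:

$$dx = (v_x\,dt)_{x+\Delta x} - (v_x\,dt)_x = \frac{\partial(v_x\,dt)}{\partial x}\Delta x \tag{B.8}$$

where $v_x$ denotes the particle velocity in the x-direction. We can do the same for the y-
and z-component and obtain:

$$\frac{dV}{\Delta V} \simeq \left(\frac{\partial v_x}{\partial x} + \frac{\partial v_y}{\partial y} + \frac{\partial v_z}{\partial z}\right)dt = (\nabla\cdot\mathbf{v})\,dt \tag{B.9}$$

Substitute this in Hooke's law (equation B.6):

$$dp = -K(\nabla\cdot\mathbf{v})\,dt \tag{B.10}$$

or

$$\frac{1}{K}\frac{dp}{dt} = -\nabla\cdot\mathbf{v} \tag{B.11}$$

The term on the left-hand side can be written as:

$$\frac{1}{K}\frac{dp}{dt} = \frac{1}{K}\left(\frac{\partial p}{\partial t} + \mathbf{v}\cdot\nabla p\right) \tag{B.12}$$

Ignoring the second term in brackets (low-velocity approximation), we obtain for equation
(B.11):

$$\frac{1}{K}\frac{\partial p}{\partial t} = -\nabla\cdot\mathbf{v} \tag{B.13}$$

This is one basic relation needed for the derivation of the wave equation.
The other relation is obtained via Newton's law applied to the volume $\Delta V$:

$$\mathbf{F} = \Delta m\frac{d\mathbf{v}}{dt} \tag{B.14}$$

where $\mathbf{F}$ is the (vectorial) force working on the element $\Delta V$. Consider the force in the
x-direction:

$$F_x = -\Delta p_x\,\Delta S_x = -\left(\frac{\partial p}{\partial x}\Delta x + \frac{\partial p}{\partial t}\Delta t\right)\Delta S_x \simeq -\frac{\partial p}{\partial x}\Delta V \tag{B.15}$$

ignoring the term with $\Delta t$; $\Delta S_x$ is the surface in the x-direction, thus $\Delta y\Delta z$. So we
can write:

$$\mathbf{F} = -\left(\frac{\partial p}{\partial x}, \frac{\partial p}{\partial y}, \frac{\partial p}{\partial z}\right)^T\Delta V = -\Delta V\,(\nabla p) \tag{B.16}$$

Substituting in Newton's law (equation B.14), we obtain:

$$-\Delta V\,(\nabla p) = \Delta m\frac{d\mathbf{v}}{dt} = \rho\,\Delta V\frac{d\mathbf{v}}{dt} \tag{B.17}$$

We can write $d\mathbf{v}/dt$ as $\partial\mathbf{v}/\partial t$; for this we have used a low-velocity approximation:

$$\frac{d\mathbf{v}}{dt} = \frac{\partial\mathbf{v}}{\partial t} + (\mathbf{v}\cdot\nabla)\mathbf{v} \approx \frac{\partial\mathbf{v}}{\partial t} \tag{B.18}$$

We divide by $\Delta V$ and, to first order, replace $\rho$ by $\rho_0$ to give:

$$-\nabla p = \rho_0\frac{\partial\mathbf{v}}{\partial t} \tag{B.19}$$

This equation is called the equation of motion.

We are now going to combine the conservation of mass and the equation of motion.
Therefore we let the operator $(\nabla\cdot)$ work on the equation of motion:

$$-\nabla\cdot(\nabla p) = \nabla\cdot\left(\rho_0\frac{\partial\mathbf{v}}{\partial t}\right) = \rho_0\frac{\partial(\nabla\cdot\mathbf{v})}{\partial t} \tag{B.20}$$

for constant $\rho_0$. Substituting the result of the conservation of mass gives:

$$-\nabla^2 p = \rho_0\frac{\partial}{\partial t}\left(-\frac{1}{K}\frac{\partial p}{\partial t}\right) \tag{B.21}$$

Rewriting gives us the wave equation:

$$\nabla^2 p - \frac{\rho_0}{K}\frac{\partial^2 p}{\partial t^2} = 0 \tag{B.22}$$

or

$$\nabla^2 p - \frac{1}{c^2}\frac{\partial^2 p}{\partial t^2} = 0 \tag{B.23}$$

in which $c$ can be seen as the velocity of sound, for which we have: $c = \sqrt{K/\rho_0}$.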
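
As a small numerical illustration of the relation between the velocity of sound, the bulk modulus and the density, the sketch below evaluates $c = \sqrt{K/\rho_0}$. The function name and the material values (a bulk modulus of about 2.2 GPa and a density of 1000 kg/m³, typical textbook values for water) are assumptions chosen for the example and do not come from these notes.

```python
import math

def sound_speed(bulk_modulus, density):
    """Velocity c = sqrt(K / rho0) following from the wave equation (B.22)-(B.23).

    bulk_modulus in Pa, density in kg/m^3, result in m/s.
    """
    return math.sqrt(bulk_modulus / density)

# Illustrative values for water (assumed, not measured)
c_water = sound_speed(2.2e9, 1000.0)   # roughly 1480 m/s
```

The result is close to the water velocity of about 1500 m/s commonly used in marine seismics.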

Appendix C

The definition of SEG-Y


In the following pages you will find a reprint of the article by [Barry et al., 1975], in
which the SEG-Y seismic data format is defined.

Appendix D

Traveltime equation for a dipping refracting boundary

In this appendix, the traveltime for a refraction on a dipping boundary is derived. In
order to obtain the desired expression, the dipping boundary is defined mathematically,
and Snell's law is rewritten in a suitable form, before the total traveltime is determined.
Consider the configuration given in Figure D.1. The boundary between the first and
second layer makes an angle $\alpha$ with the horizontal. The equation for the dipping boundary
is given by:

$$z = z_0 + x\tan\alpha \tag{D.1}$$

or, using the depth at the end point $z_3$ (with the horizontal distance then measured back from the end point):

$$z = z_3 - x\tan\alpha \tag{D.2}$$

Figure D.1: Configuration for refraction on dipping plane boundary (horizontal distances $x_1$, $x_2$, $x_3$
with total distance $x$; depths $z_0$, $z_1$, $z_2$, $z_3$; velocity $c_1$ in the first layer and $c_2$ in the second layer).

A ray goes to the refractor (downgoing ray), is refracted critically in the second layer
(refracted ray) and goes back to the surface (upgoing ray). Since the ray is critically
refracted in the second layer, we have the following relations:

$$\theta_U = \theta_c + \alpha \tag{D.3}$$

$$\theta_D = \theta_c - \alpha \tag{D.4}$$

in which $\theta_c$ is the critical angle, $\theta_D$ is the angle with the vertical of the downgoing ray and
$\theta_U$ that of the upgoing ray. Since Snell's law holds for all rays, we have:

$$\frac{\sin(\theta_D + \alpha)}{c_1} = \frac{\sin(\theta_U - \alpha)}{c_1} \tag{D.5}$$

$$\frac{\sin(\theta_D + \alpha)}{c_1} = \frac{1}{c_2} \tag{D.6}$$

$$\frac{\sin(\theta_U - \alpha)}{c_1} = \frac{1}{c_2} \tag{D.7}$$

For convenience, the horizontal slowness $p$ and vertical slowness $q$ are introduced:

$$p_D = \frac{\sin\theta_D}{c_1} \tag{D.8}$$

$$p_U = \frac{\sin\theta_U}{c_1} \tag{D.9}$$

$$q_D = \frac{\cos\theta_D}{c_1} \tag{D.10}$$

$$q_U = \frac{\cos\theta_U}{c_1} \tag{D.11}$$

Writing the sines out in equations (D.5)-(D.7), the introduction of $p_D$, $p_U$, $q_D$ and $q_U$
gives the relations:

$$p_D + q_D\tan\alpha = p_U - q_U\tan\alpha \tag{D.12}$$

$$p_D + q_D\tan\alpha = \frac{1}{c_2\cos\alpha} \tag{D.13}$$

$$p_U - q_U\tan\alpha = \frac{1}{c_2\cos\alpha} \tag{D.14}$$

We now have all the necessary formulae to derive the traveltime for the refracted ray
as drawn in Figure D.1. The total distance between source and receiver is denoted by $x$;
the horizontal distances $x_1$, $x_2$ and $x_3$ are the horizontal extents of each part of the ray as drawn
in the figure. The total traveltime $T$ becomes:

$$\begin{aligned}
T &= p_D x_1 + q_D z_1 + \frac{1}{c_2\cos\alpha}x_2 + p_U x_3 + q_U z_2 \\
&= p_D x_1 + q_D(z_0 + x_1\tan\alpha) + \frac{1}{c_2\cos\alpha}x_2 + p_U(x - x_1 - x_2) + q_U\left[z_3 - (x - x_1 - x_2)\tan\alpha\right]
\end{aligned} \tag{D.15}$$

where we used $x = x_1 + x_2 + x_3$ and the equations for the interface (eqs. (D.1) and (D.2)),
where the $z_0$-equation is used for the downgoing ray and the $z_3$-equation for the upgoing
ray. Now grouping together the terms with $x_1$, $x_2$ and $x$, $T$ reads:

$$\begin{aligned}
T &= x_1(p_D + q_D\tan\alpha - p_U + q_U\tan\alpha) + x_2\left(-p_U + q_U\tan\alpha + \frac{1}{c_2\cos\alpha}\right) + x(p_U - q_U\tan\alpha) + q_D z_0 + q_U z_3 \\
&= \frac{1}{c_2\cos\alpha}x + q_D z_0 + q_U z_3
\end{aligned} \tag{D.16}$$

where we used Snell's law as derived above (eqs. (D.12) and (D.14)). Notice that the second
and third terms represent times associated with the begin- and end-point of the ray at the
surface. Therefore it can also be written as:

$$T = \frac{1}{c_2\cos\alpha}x + \tau_0 + \tau_3 \tag{D.17}$$

with $\tau_0 = q_D z_0$ and $\tau_3 = q_U z_3$.
These last two equations ((D.16) and (D.17)) are the desired expressions.
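
The traveltime expression (D.16) can be evaluated with a minimal sketch. It assumes that the receiver lies beyond the critical distance, that the boundary deepens from source to receiver according to (D.1), so that $z_3 = z_0 + x\tan\alpha$, and that $c_2 > c_1$; the function names and all numerical values are hypothetical. For comparison the sketch also evaluates the algebraically equivalent form $T = x\sin(\theta_c + \alpha)/c_1 + 2 z_0\cos\alpha\cos\theta_c/c_1$, which follows from substituting $z_3$ into (D.16).

```python
import math

def refraction_traveltime(x, z0, alpha, c1, c2):
    """Head-wave traveltime from (D.16) for a refractor with dip alpha (radians).

    x is the source-receiver distance and z0 the vertical depth of the boundary
    below the source; valid only beyond the critical distance and for c2 > c1.
    """
    theta_c = math.asin(c1 / c2)                 # critical angle
    q_d = math.cos(theta_c - alpha) / c1         # vertical slowness, downgoing ray
    q_u = math.cos(theta_c + alpha) / c1         # vertical slowness, upgoing ray
    z3 = z0 + x * math.tan(alpha)                # boundary depth below the receiver
    return x / (c2 * math.cos(alpha)) + q_d * z0 + q_u * z3

def refraction_traveltime_slope_intercept(x, z0, alpha, c1, c2):
    """Equivalent slope-intercept form, used here only as a cross-check."""
    theta_c = math.asin(c1 / c2)
    slope = math.sin(theta_c + alpha) / c1
    intercept = 2.0 * z0 * math.cos(alpha) * math.cos(theta_c) / c1
    return slope * x + intercept

# Made-up model: 1500 m/s over 3000 m/s, 5 degrees dip, boundary 200 m below the source
T_a = refraction_traveltime(2000.0, 200.0, math.radians(5.0), 1500.0, 3000.0)
T_b = refraction_traveltime_slope_intercept(2000.0, 200.0, math.radians(5.0), 1500.0, 3000.0)
T_flat = refraction_traveltime(2000.0, 200.0, 0.0, 1500.0, 3000.0)
```

For zero dip the expression reduces to the familiar horizontal-refractor result $T = x/c_2 + 2 z_0\cos\theta_c/c_1$.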


Appendix E

Correlation of signals
In this appendix we will give some more background information on the correlation process.
In Chapter 2 the correlation $\phi_{ab}$ of two signals $a(t)$ and $b(t)$ was defined as follows:

$$\phi_{ab}(\tau) = \int_{-\infty}^{\infty} a(t)\,b^*(t - \tau)\,dt. \tag{E.1}$$

In the frequency domain this becomes:

$$\Phi_{ab}(f) = A(f)\,B^*(f). \tag{E.2}$$

If $a$ and $b$ are the same signal, then $\phi_{ab}$ represents the auto-correlation function. In
the frequency domain the auto-correlation spectrum becomes $A(f)A^*(f)$, which
is a real-valued function, and therefore the auto-correlation function in time is symmetric around
zero time.
The purpose of a correlation function is to find resemblances between two signals
(or within signals). Looking at equation (E.1) we see that the correlation is calculated
by shifting one signal across the other, doing a sample-by-sample multiplication and summing the products. This
summation result will be larger if the two signals look similar to each other (such that
positive values are multiplied by positive and negative by negative, all contributing to a
large correlation value).
Examples of auto-correlation functions for a few signals are displayed in Figure E.1.
The first signal is a first derivative of a Gauss wavelet. Note that the autocorrelation of
each signal is indeed symmetric around zero time (i.e. zero-phase behaviour). Note also
that the noisy signal in Figure E.1e has a very spiky autocorrelation. This can be easily
understood by considering that a noise signal will only correlate with itself if there is no
shift. As soon as the signal is shifted by one or more time samples, there is only accidental
local correlation. That is why we can assume in practice that a random signal has an
autocorrelation function that is a scaled delta pulse. In Figure E.1e noise is added to the signal with the
three shifted Gauss wavelets $y(t)$, resulting in $x(t) = y(t) + n(t)$. The
autocorrelation of this is $\phi_{xx} = \phi_{yy} + \phi_{nn} + \phi_{yn} + \phi_{ny}$. Neglecting the cross terms, we see
indeed that Figure E.1f seems to be the summation of Figure E.1d and a delta function.
In Figure E.1g we convolved a random reflection series with the Gauss wavelet of Figure
E.1a. The autocorrelation of this signal $x(t) = g(t) * s(t)$, with $g(t)$ being the random
reflection series, is given by $\phi_{xx} = \phi_{gg} * \phi_{ss} \approx G\,\phi_{ss}$. This means that Figure E.1h is a
scaled version of Figure E.1b, which appears to be true to some extent.
In Figure E.2 the same four signals are shown, but now accompanied by the cross-correlation
between each signal and the original Gauss wavelet. It can be clearly observed
that these cross-correlation signals look like the original signal, but everywhere the
Gauss wavelet is replaced by its autocorrelation.
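
The properties discussed above can be reproduced with a minimal sketch for real-valued, sampled signals, using `numpy.correlate`. The helper name `correlate_full`, the wavelet length and the shift of 50 samples are assumptions made for the example; the wavelet is a sampled first derivative of a Gaussian, similar in shape to the one in Figure E.1a.

```python
import numpy as np

def correlate_full(a, b):
    """Discrete version of (E.1) for real signals: phi_ab[lag] = sum_t a[t] b[t - lag].

    Returns the lags (in samples) and the correlation values for all lags.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    phi = np.correlate(a, b, mode="full")
    lags = np.arange(-(b.size - 1), a.size)
    return lags, phi

# First derivative of a Gaussian as wavelet (illustrative shape and length)
k = np.arange(-20, 21)
wavelet = -k * np.exp(-((k / 6.0) ** 2))

# Signal containing the same wavelet, starting at sample 50
signal = np.zeros(200)
signal[50:50 + wavelet.size] += wavelet

lags_auto, phi_auto = correlate_full(wavelet, wavelet)
lags_cross, phi_cross = correlate_full(signal, wavelet)
lag_of_peak = lags_cross[np.argmax(phi_cross)]
```

The autocorrelation is symmetric around zero lag and peaks there, and the cross-correlation peaks at the lag where the wavelet sits in the signal, with the shape of the wavelet autocorrelation around that lag.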

Figure E.1: Four signals, based on the first derivative of Gauss wavelets, and their autocorrelation
functions (horizontal axes: time (s)). Panels: a) Signal with one event; b) Auto-correlation function;
c) Signal with three events; d) Auto-correlation function; e) Signal with three events and noise;
f) Auto-correlation function; g) Signal with random events; h) Auto-correlation function.

Figure E.2: Four signals, based on the first derivative of Gauss wavelets, and some cross-correlation
functions (horizontal axes: time (s)). Panels: a) Signal with one event; b) Auto-correlation function;
c) Signal with three events; d) Cross-correlation function of c and a; e) Signal with three events and noise;
f) Cross-correlation function of e and a; g) Signal with random events; h) Cross-correlation function of g and a.

Appendix F

Wiener filters
We assume stationary time series: input $x_t$ and desired output $d_t$ are stationary. A
stationary time series is a time series with statistical properties that do not change with
time. We want to find the best filter such that the actual output $a_t = f_t * x_t$ is close to
our desired signal $d_t$, in which the asterisk stands for convolution. In a Wiener filter designed in
the time domain we minimize the error energy, which is defined as:

$$E = \sum_t (d_t - a_t)^2 = \sum_t (d_t - f_t * x_t)^2 = \text{minimum} \tag{F.1}$$

Least-squares solution

We assume the filter $f_t$ has a finite length of $N + 1$ points. The goal is to minimize
the energy of the error between the desired and the actual output. The filter $f_t$ for which the energy
of the error is minimum is called the least-squares solution of this problem. Minimizing the
error can be achieved by requiring the first derivative with respect to the filter coefficients
$f_i$ to be zero, i.e.:

$$\frac{\partial E}{\partial f_i} = 0 \qquad \text{for } i = 0, 1, 2, \ldots, N. \tag{F.2}$$

Working this out gives:

$$\frac{\partial E}{\partial f_i} = -2\sum_t\left(d_t - \sum_{n=0}^{N} f_n x_{t-n}\right)x_{t-i} = 0 \tag{F.3}$$

or

$$\sum_t d_t x_{t-i} - \sum_t\sum_{n=0}^{N} f_n x_{t-n}x_{t-i} = 0. \tag{F.4}$$

Bringing one term to the other side

$$\sum_t\sum_{n=0}^{N} f_n x_{t-n}x_{t-i} = \sum_t d_t x_{t-i} \tag{F.5}$$

and interchanging the order of summation on the left-hand side gives:

$$\sum_{n=0}^{N} f_n\sum_t x_{t-n}x_{t-i} = \sum_t d_t x_{t-i}. \tag{F.6}$$

Now substituting $s = t - i$, we obtain:

$$\sum_{n=0}^{N} f_n\sum_s x_{s+(i-n)}x_s = \sum_s d_{s+i}x_s \tag{F.7}$$

and we recognize the auto- and cross-correlation function on the left- and right-hand side,
respectively:

$$\sum_{n=0}^{N} f_n\phi_{xx}[i-n] = \phi_{dx}[i] \qquad \text{for } i = 0, 1, 2, \ldots, N, \tag{F.8}$$

where the correlation is denoted by $\phi$ (see also Chapter 2). Using the fact that the signal
is real, so $\phi_{xx}[i] = \phi_{xx}[-i]$, we obtain the matrix system:

$$\begin{pmatrix}
\phi_{xx}[0] & \phi_{xx}[1] & \phi_{xx}[2] & \cdots & \phi_{xx}[N] \\
\phi_{xx}[1] & \phi_{xx}[0] & \phi_{xx}[1] & \cdots & \phi_{xx}[N-1] \\
\phi_{xx}[2] & \phi_{xx}[1] & \phi_{xx}[0] & \cdots & \phi_{xx}[N-2] \\
\vdots & & & \ddots & \vdots \\
\phi_{xx}[N] & \phi_{xx}[N-1] & \phi_{xx}[N-2] & \cdots & \phi_{xx}[0]
\end{pmatrix}
\begin{pmatrix} f[0] \\ f[1] \\ f[2] \\ \vdots \\ f[N] \end{pmatrix}
=
\begin{pmatrix} \phi_{dx}[0] \\ \phi_{dx}[1] \\ \phi_{dx}[2] \\ \vdots \\ \phi_{dx}[N] \end{pmatrix}. \tag{F.9}$$

This is completely equivalent to equation (4.34) in Chapter 4: the matrix has the well-known Toeplitz structure.

Analytic example

Let us take a simple example, namely the wavelet $(x_0, x_1)$ as input signal. We wish to
obtain the wavelet $(d_0, d_1, d_2)$ as output, using a filter of length 2, i.e. $(f_0, f_1)$. The actual
output is:

$$(a_0, a_1, a_2) = (f_0, f_1) * (x_0, x_1) = (f_0x_0,\; f_0x_1 + f_1x_0,\; f_1x_1) \tag{F.10}$$

We want to determine the filter coefficients by minimizing the error energy:

$$E = \sum_{t=0}^{2}(d_t - a_t)^2 = (d_0 - f_0x_0)^2 + (d_1 - f_0x_1 - f_1x_0)^2 + (d_2 - f_1x_1)^2 \tag{F.11}$$

Next we set each partial derivative with respect to the filter coefficients to zero, i.e. for filter
coefficient $f_0$:

$$\frac{\partial E}{\partial f_0} = 0: \quad 2(d_0 - f_0x_0)\cdot(-x_0) + 2(d_1 - f_0x_1 - f_1x_0)\cdot(-x_1) = 0 \tag{F.12}$$

or

$$f_0(x_0^2 + x_1^2) + f_1x_0x_1 = d_0x_0 + d_1x_1. \tag{F.13}$$

The same for filter coefficient $f_1$:

$$\frac{\partial E}{\partial f_1} = 0: \quad 2(d_1 - f_0x_1 - f_1x_0)\cdot(-x_0) + 2(d_2 - f_1x_1)\cdot(-x_1) = 0 \tag{F.14}$$

or

$$f_0x_0x_1 + f_1(x_0^2 + x_1^2) = d_1x_0 + d_2x_1. \tag{F.15}$$

Combining the two equations yields:

$$\begin{aligned}
\phi_{xx}[0]f_0 + \phi_{xx}[1]f_1 &= \phi_{dx}[0] \\
\phi_{xx}[1]f_0 + \phi_{xx}[0]f_1 &= \phi_{dx}[1].
\end{aligned} \tag{F.16}$$

Now let $x$ be $(2, 1)$ and $d$ be $(1, 0, 0)$, so that $\phi_{xx}[0] = 5$, $\phi_{xx}[1] = 2$, $\phi_{dx}[0] = 2$ and $\phi_{dx}[1] = 0$. Then we can solve for $f$:

$$\begin{aligned}
5f_0 + 2f_1 &= 2 \\
2f_0 + 5f_1 &= 0.
\end{aligned} \tag{F.17}$$

It follows then that $f = (10/21, -4/21)$. For this solution the actual output is:

$$a_t = (2, 1) * (10/21, -4/21) = (20/21,\; 2/21,\; -4/21). \tag{F.18}$$

The error energy between the desired output $d = (1, 0, 0)$ and the actual output is:

$$E_{min} = \left(1 - \frac{20}{21}\right)^2 + \left(0 - \frac{2}{21}\right)^2 + \left(0 - \left(-\frac{4}{21}\right)\right)^2 = \frac{1}{21} \approx 0.048. \tag{F.19}$$

Of course, if we allow $f_t$ to be longer, the error energy decreases: a filter of three points gives
$f_t = (42/85, -20/85, 8/85)$ with $E_{min} = 1/85 \approx 0.012$. A zero error energy is only reached for an infinitely
long filter, since the exact inverse of $(2, 1)$ is the infinite series $(1/2, -1/4, 1/8, \ldots)$.
It is convenient to normalize $E$ such that it lies between 0 and 1.
We can achieve this by normalizing with the power of the desired signal ($\phi_{dd}[0]$):

$$E'_{min} = \frac{E_{min}}{\phi_{dd}[0]} = \frac{1/21}{1} = \frac{1}{21}. \tag{F.20}$$

Some more examples with desired outputs for this wavelet and filter lengths are found in
Figure F.1.

Figure F.1: Some simple examples of Wiener filters (from Robinson and Treitel 1980).
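
The analytic example can be reproduced numerically with a minimal sketch that builds the normal equations from the auto- and cross-correlation and solves them with a general linear solver. The function name `wiener_filter` is hypothetical and the dense solve is an assumption made for brevity; dedicated Toeplitz solvers (e.g. the Levinson recursion) are the usual choice for long filters.

```python
import numpy as np

def wiener_filter(x, d, n_filter):
    """Least-squares shaping filter of n_filter points: solve R f = phi_dx.

    Returns the filter, the actual output f * x and the error energy.
    """
    x = np.asarray(x, dtype=float)
    d = np.asarray(d, dtype=float)
    n_out = x.size + n_filter - 1
    d_pad = np.zeros(n_out)                       # desired output, zero-padded
    d_pad[:d.size] = d[:n_out]
    # Autocorrelation phi_xx[lag] = sum_s x[s + lag] x[s]
    phi_xx = np.array([np.dot(x[lag:], x[:x.size - lag]) if lag < x.size else 0.0
                       for lag in range(n_filter)])
    # Cross-correlation phi_dx[lag] = sum_s d[s + lag] x[s]
    phi_dx = np.array([np.dot(d_pad[lag:lag + x.size], x) for lag in range(n_filter)])
    # Symmetric Toeplitz matrix with entries phi_xx[|i - n|]
    i = np.arange(n_filter)
    R = phi_xx[np.abs(i[:, None] - i[None, :])]
    f = np.linalg.solve(R, phi_dx)
    a = np.convolve(f, x)
    return f, a, float(np.sum((d_pad - a) ** 2))

f2, a2, e2 = wiener_filter([2.0, 1.0], [1.0, 0.0, 0.0], n_filter=2)
f3, a3, e3 = wiener_filter([2.0, 1.0], [1.0, 0.0, 0.0], n_filter=3)
```

For the two-point filter this gives the coefficients $(10/21, -4/21)$ and the error energy $1/21$ of the example; allowing three points lowers the error energy further.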

Damped least-squares solution

Sometimes we need an extra constraint in order to prevent the filter coefficients $f_i$
from becoming unstable. This is achieved by also minimizing the energy of the filter
coefficients. Thus, the total energy to be minimized is:

$$E = \sum_t (d_t - f_t * x_t)^2 + \lambda\sum_i f_i^2 = \text{minimum}, \tag{F.21}$$

in which the stabilization parameter $\lambda$ defines to what extent the energy of the filter
coefficients is involved. In a similar way to the unstabilized problem, the derivatives
of the energy with respect to the filter coefficients are set to zero, which yields the following
set of normal equations:

$$\sum_{n=0}^{N} f_n\phi_{xx}[i-n] + \lambda f_i = \phi_{dx}[i] \qquad \text{for } i = 0, 1, 2, \ldots, N. \tag{F.22}$$

Again this can be written as a matrix-vector system:

$$\begin{pmatrix}
\phi_{xx}[0] + \lambda & \phi_{xx}[1] & \phi_{xx}[2] & \cdots & \phi_{xx}[N] \\
\phi_{xx}[1] & \phi_{xx}[0] + \lambda & \phi_{xx}[1] & \cdots & \phi_{xx}[N-1] \\
\phi_{xx}[2] & \phi_{xx}[1] & \phi_{xx}[0] + \lambda & \cdots & \phi_{xx}[N-2] \\
\vdots & & & \ddots & \vdots \\
\phi_{xx}[N] & \phi_{xx}[N-1] & \phi_{xx}[N-2] & \cdots & \phi_{xx}[0] + \lambda
\end{pmatrix}
\begin{pmatrix} f[0] \\ f[1] \\ f[2] \\ \vdots \\ f[N] \end{pmatrix}
=
\begin{pmatrix} \phi_{dx}[0] \\ \phi_{dx}[1] \\ \phi_{dx}[2] \\ \vdots \\ \phi_{dx}[N] \end{pmatrix}. \tag{F.23}$$

As such, stabilization of a linear system of equations is obtained by adding a stabilization
factor to the main diagonal of the autocorrelation matrix. Note that if the stabilization
factor $\lambda$ is taken very large, the estimated filter is just a scaled version of the cross-correlation
$\phi_{dx}$ of the input and the desired signal.
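
The effect of the stabilization factor can be illustrated with a minimal sketch that adds it to the main diagonal of the autocorrelation matrix before solving. The function name `damped_wiener`, the variable `lam` for the stabilization factor and the numerical values (the correlations of the wavelet $(2, 1)$ and desired output $(1, 0, 0)$ used in the analytic example) are assumptions made for the illustration.

```python
import numpy as np

def damped_wiener(phi_xx, phi_dx, lam):
    """Solve (R + lam * I) f = phi_dx, with R the symmetric Toeplitz matrix
    built from the autocorrelation values phi_xx[0..N]."""
    phi_xx = np.asarray(phi_xx, dtype=float)
    phi_dx = np.asarray(phi_dx, dtype=float)
    n = phi_dx.size
    i = np.arange(n)
    R = phi_xx[np.abs(i[:, None] - i[None, :])]
    return np.linalg.solve(R + lam * np.eye(n), phi_dx)

phi_xx = [5.0, 2.0]          # autocorrelation of x = (2, 1)
phi_dx = [2.0, 0.0]          # cross-correlation of d = (1, 0, 0) with x
f_undamped = damped_wiener(phi_xx, phi_dx, lam=0.0)
f_damped = damped_wiener(phi_xx, phi_dx, lam=1.0)
f_heavy = damped_wiener(phi_xx, phi_dx, lam=1.0e6)
```

Without damping the solution of the analytic example is recovered; with a very large stabilization factor the filter approaches the cross-correlation divided by that factor, and the energy of the filter coefficients decreases as the factor grows.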


Appendix G

Derivation of the DMO-ellipse


In this appendix we will derive the DMO-ellipse, which is a the common-re ection point
time tI as a function of the coordinate xI of point I . The derivation is quite elaborate in
the sense that many geometrical quantities have to be determined.
Let us start with gure (G.1). When a ray re ects at a dipping re ector, it does not
re ect at the point H 0 of the line perpendicular to the interface going through the point
H , half-way between the source and geophone. Because of the dip of the re ector, the
re ection point is slightly shifted, as we also showed in the main text as the re ection
smear. We are rst going to determine this shift.
In gure (G.1) we have drawn a ray which is re ected at an interface which has a dip
. The source S is situated at the origin (for convenience), the point H is at the surface
at half-o set xh and the receiver R is at the surface at distance 2xh . The equation for
the interface, and the line perpendicular to the interface going through H are respectively
given by:

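\[
z = -x\tan\alpha + \frac{d_H}{\cos\alpha} + x_h\tan\alpha
\tag{G.1}
\]
\[
z = \frac{x}{\tan\alpha} - \frac{x_h}{\tan\alpha}
\tag{G.2}
\]
in which $\alpha$ is the dip of the interface and $d_H$ is the distance from $H$ to $H'$. The intersection point of these two lines gives
the coordinates of $H'$:
\[
x_{H'} = x_h + d_H\sin\alpha, \qquad z_{H'} = d_H\cos\alpha .
\]
In order to determine the equation of the ray, we put an image source at $S^*$ to obtain:
\[
z = \frac{(d_H + x_h\sin\alpha)\cos\alpha}{d_H\sin\alpha - x_h\cos^2\alpha}\,x
  - \frac{2x_h\,(d_H + x_h\sin\alpha)\cos\alpha}{d_H\sin\alpha - x_h\cos^2\alpha}
\tag{G.3}
\]
The intersection point $I'$ between the ray and the interface is given by:
\[
x_{I'} = x_h + d_H\sin\alpha + \frac{x_h^2}{d_H}\sin\alpha\cos^2\alpha
\tag{G.4}
\]
\[
z_{I'} = d_H\cos\alpha - \frac{x_h^2}{d_H}\sin^2\alpha\cos\alpha
\tag{G.5}
\]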

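Figure G.1: The model for deriving the DMO ellipse. Labelled in the figure: the source $S$ $(x=0, z=0)$, the receiver $R$ $(x=2x_h, z=0)$, the distances $d_S$, $d_H$ and $d_R$, and the image source $S^*$ $(x=2d_S\sin\alpha, z=2d_S\cos\alpha)$.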


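The reflection-point smear can now be evaluated by the distance between the points $I'$
and $H'$, being $(x_h^2/d_H)\sin\alpha\cos\alpha$. Projecting this distance along the normals to the interface onto the surface divides it by $\cos\alpha$. Using the traveltime $t_H$ rather than $d_H$ ($t_H = 2d_H/c$),
gives as final result:
\[
x_I - x_H = \frac{|I'H'|}{\cos\alpha} = \frac{x_h^2\sin\alpha}{(c/2)\,t_H}
\tag{G.6}
\]
where $x_H = x_h$.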
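The geometry up to this point can be checked numerically. The sketch below intersects the interface (G.1) with the ray (G.3) and projects the result back to the surface along the normal to the interface. It assumes NumPy; the dip is called `alpha` (in radians), and the function names `reflection_points` and `surface_x` are hypothetical helpers introduced only for this check.

```python
import numpy as np

def reflection_points(x_h, d_H, alpha):
    """Return H' and I' as (x, z) pairs for dip alpha in radians (z positive down)."""
    sin_a, cos_a, tan_a = np.sin(alpha), np.cos(alpha), np.tan(alpha)
    # H': foot of the normal from the midpoint H = (x_h, 0) onto the interface.
    h_prime = np.array([x_h + d_H * sin_a, d_H * cos_a])
    # Interface (G.1) written as z = a1 * x + b1.
    a1 = -tan_a
    b1 = d_H / cos_a + x_h * tan_a
    # Ray through the image source S* and the receiver R = (2 x_h, 0),
    # equation (G.3), written as z = a2 * x + b2.
    # (Breaks down when d_H sin(alpha) = x_h cos^2(alpha), i.e. a vertical ray.)
    a2 = (d_H + x_h * sin_a) * cos_a / (d_H * sin_a - x_h * cos_a**2)
    b2 = -2.0 * x_h * a2
    # I': intersection of the two straight lines.
    x_i = (b2 - b1) / (a1 - a2)
    i_prime = np.array([x_i, a1 * x_i + b1])
    return h_prime, i_prime

def surface_x(point, alpha):
    """x-coordinate where the normal to the interface through `point` reaches z = 0."""
    return point[0] - point[1] * np.tan(alpha)

if __name__ == "__main__":
    x_h, d_H, alpha = 500.0, 1000.0, np.radians(20.0)
    h_prime, i_prime = reflection_points(x_h, d_H, alpha)
    print("smear along interface:", np.linalg.norm(i_prime - h_prime))
    print("x_I - x_H at surface :", surface_x(i_prime, alpha) - surface_x(h_prime, alpha))
```

The first printed number can be compared with $(x_h^2/d_H)\sin\alpha\cos\alpha$ and the second with the right-hand side of (G.6), using $d_H = (c/2)t_H$.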
The next equation follows straightforwardly from figure (G.1) by looking at the difference in distance travelled perpendicular to the interface, i.e., $t_I = t_{II'I}$ and $t_H = t_{HH'H}$,
which is $(x_I - x_h)\sin\alpha$:
\[
t_I = t_H - \frac{x_I - x_h}{c/2}\sin\alpha
\tag{G.7}
\]
Finally we need the equation as derived in the main text:
\[
t_{DMO}^2 = t_H^2 - \frac{4x_h^2}{c^2}\sin^2\alpha
\tag{G.8}
\]

We are now going to combine the last three equations in order to obtain an expression
of $t_I$ as a function of $x_I$. To that purpose we write equation (G.6) as
\[
\sin\alpha = \frac{x_I - x_h}{x_h^2}\,\frac{c}{2}\,t_H
\tag{G.9}
\]
and substitute this in the other two equations ((G.7) and (G.8)):
\[
t_I = t_H\left(1 - \frac{(x_I - x_h)^2}{x_h^2}\right)
\tag{G.10}
\]
\[
t_H = t_{DMO}\left(1 - \frac{(x_I - x_h)^2}{x_h^2}\right)^{-1/2}
\tag{G.11}
\]

The last step is now to combine these two equations in order to get an expression in which
$t_I$ is a function of $t_{DMO}$ rather than $t_H$. We then obtain the final result:
\[
t_I = t_{DMO}\left(1 - \frac{(x_I - x_h)^2}{x_h^2}\right)^{1/2}
\tag{G.12}
\]
This is the equation of an ellipse in the $(t_I, x_I)$ domain, since it can be written as $(t_I/t_{DMO})^2 + ((x_I - x_h)/x_h)^2 = 1$.
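As a sanity check on the algebra, the following sketch evaluates (G.6), (G.7) and (G.8) for a given dip and confirms that the resulting pair $(x_I, t_I)$ satisfies (G.12). It assumes NumPy; the dip is called `alpha` (radians), and the function names and the numerical values of half-offset, depth and velocity are illustrative only.

```python
import numpy as np

def dmo_quantities(x_h, d_H, alpha, c):
    """Evaluate (G.6)-(G.8) for half-offset x_h, normal distance d_H and dip alpha."""
    t_H = 2.0 * d_H / c
    x_I = x_h + x_h**2 * np.sin(alpha) / (0.5 * c * t_H)              # (G.6), x_H = x_h
    t_I = t_H - (x_I - x_h) * np.sin(alpha) / (0.5 * c)               # (G.7)
    t_dmo = np.sqrt(t_H**2 - 4.0 * x_h**2 * np.sin(alpha)**2 / c**2)  # (G.8)
    return x_I, t_I, t_dmo

def dmo_ellipse(x_I, x_h, t_dmo):
    """Right-hand side of (G.12)."""
    return t_dmo * np.sqrt(1.0 - (x_I - x_h)**2 / x_h**2)

if __name__ == "__main__":
    x_h, d_H, c = 500.0, 1000.0, 2000.0
    for dip_deg in (-30.0, 0.0, 10.0, 30.0):
        x_I, t_I, t_dmo = dmo_quantities(x_h, d_H, np.radians(dip_deg), c)
        print(dip_deg, x_I, t_I, dmo_ellipse(x_I, x_h, t_dmo))
```

For each dip the last two printed columns should coincide. Note that keeping `d_H` fixed while changing the dip changes $t_{DMO}$, so the points printed for different dips lie on different ellipses; the check is that each point lies on the ellipse belonging to its own $t_{DMO}$.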

Appendix H

Derivation of the Kirchhoff integral

The classical paper explaining the integral method as applied to seismic migration, i.e.
Kirchhoff migration, is the one by [Schneider, 1978].

From wave equation to Kirchhoff integral

Let us start with the wave equation for the pressure $p = p(\mathbf{x}, t) = p(x, y, z, t)$:
\[
\nabla^2 p - \frac{1}{c^2}\,\partial_t^2 p = 0,
\tag{H.1}
\]
which is the wave equation for a source-free three-dimensional homogeneous medium with
propagation velocity $c$. This equation combines Newton's second law and Hooke's law,
i.e. the conservation of mass, for a homogeneous, isotropic medium with wave velocity
$c$. This derivation is given in appendix B, which is taken from [Berkhout, 1984]. When
we want to invoke Huygens' principle, any wave field can be thought to be the cumulative
effect of an infinite number of point sources. The solution of the wave equation for a
point source is called a Green's function. Thus a Green's function $G$ is the solution of the
equation:
\[
\nabla^2 G - \frac{1}{c^2}\,\partial_t^2 G = -\delta(\mathbf{x} - \mathbf{x}^s)\,\delta(t - t^s),
\tag{H.2}
\]
in which the superscript $s$ denotes that it pertains to the source, i.e. the point source at
position $\mathbf{x}^s$ "explodes" at time $t^s$.
In the following, we are going to combine the latter two equations, thereby invoking
Huygens' principle: the Green's function will act as a Huygens source that will bring the
measurements from one depth level to the other. When we multiply equation (H.2) with
the pressure $p$, multiply equation (H.1) by $G$, and subtract the two equations from each
other, we obtain:
\[
p\,\delta(\mathbf{x} - \mathbf{x}^s)\,\delta(t - t^s) = G\nabla^2 p - p\nabla^2 G - \frac{1}{c^2}\left(G\,\partial_t^2 p - p\,\partial_t^2 G\right).
\tag{H.3}
\]

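We can rewrite this as:
\[
p\,\delta(\mathbf{x} - \mathbf{x}^s)\,\delta(t - t^s) = \nabla\cdot\left(G\nabla p - p\nabla G\right) - \frac{1}{c^2}\,\partial_t\left(G\,\partial_t p - p\,\partial_t G\right).
\tag{H.4}
\]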
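The step from (H.3) to (H.4) uses only the product rule. Written out as a short check (same symbols as above, nothing new assumed), the cross terms cancel in both the spatial and the temporal part:

```latex
\begin{align*}
\nabla\cdot\left(G\nabla p - p\nabla G\right)
  &= \nabla G\cdot\nabla p + G\nabla^2 p - \nabla p\cdot\nabla G - p\nabla^2 G
   = G\nabla^2 p - p\nabla^2 G, \\
\partial_t\left(G\,\partial_t p - p\,\partial_t G\right)
  &= \partial_t G\,\partial_t p + G\,\partial_t^2 p - \partial_t p\,\partial_t G - p\,\partial_t^2 G
   = G\,\partial_t^2 p - p\,\partial_t^2 G .
\end{align*}
```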

Now we integrate over some volume $D$ and over all time to obtain:
\[
\chi(\mathbf{x})\,p(\mathbf{x}, t) =
\int_{-\infty}^{\infty}\!\int_D \nabla\cdot\left(G\nabla p - p\nabla G\right) dV^s\,dt^s
- \frac{1}{c^2}\int_{-\infty}^{\infty}\!\int_D \partial_t\left(G\,\partial_t p - p\,\partial_t G\right) dV^s\,dt^s,
\tag{H.5}
\]
in which the function $\chi(\mathbf{x})$ is defined as:
\[
\chi(\mathbf{x}) = \{1, \tfrac{1}{2}, 0\} \quad \text{when} \quad \mathbf{x} \in \{D, \partial D, D'\}.
\tag{H.6}
\]
The term on the left-hand side occurs because of integrating out the $\delta$-function, and
it takes the values 1, 1/2 and 0 depending on whether the argument of the $\delta$-function
becomes zero within $D$, at the boundary of $D$, called $\partial D$, or outside $D$, called $D'$. This
integration means physically that we assume there are Green's functions at all positions
within volume $D$ and also at all different times $t^s$. So this integration describes the total
effect of all these Green's functions.
Let us look at the second integral in equation (H.5). Since its integrand is a total time derivative, this integral can be integrated
directly with respect to time, giving a term of the form:
\[
\left[\int_D \left(G\,\partial_t p - p\,\partial_t G\right) dV^s\right]_{t^s = -\infty}^{\infty}
\tag{H.7}
\]
Since the pressure and its time derivative are zero before the Green's sources are fired,
the contribution of the integral at $-\infty$ is zero. At $+\infty$ we assume the radiation condition
(which states that the wave fields die out rapidly enough such that the integration over the
volume $D$ is still negligible), so there the contribution of the integral is zero too. We are
now only left with the first integral in equation (H.5). Applying Gauss' theorem to the
result, we obtain the integral:
\[
\chi(\mathbf{x})\,p(\mathbf{x}, t) = \int_{-\infty}^{\infty}\!\int_{\partial D} \left(G\nabla p - p\nabla G\right)\cdot\mathbf{n}\; dA^s\,dt^s,
\tag{H.8}
\]
where $\mathbf{n}$ is the outward pointing unit normal on $\partial D$. Finally, to obtain a form more
connected to seismic imaging, we substitute the particle velocity $\mathbf{v}$ for the gradient of
the pressure via the equation of motion (equation (B.19) in appendix B). Then, we obtain
the Kirchhoff integral:
\[
\chi(\mathbf{x})\,p(\mathbf{x}, t) = \int_{-\infty}^{\infty}\!\int_{\partial D} \left(-\rho\,G\,\partial_t\mathbf{v} - p\nabla G\right)\cdot\mathbf{n}\; dA^s\,dt^s,
\tag{H.9}
\]
in which $\rho$ is the mass density. This equation expresses that if we know the pressure $p$ and the time derivative of the
normal component of the particle velocity on a closed surface, the pressure can be computed in every point inside $D$. Also, we recognize that the pressure at a certain position
is synthesized by means of a monopole (i.e. $G$) and dipole (i.e. $\nabla G\cdot\mathbf{n}$) distribution on
a closed surface $\partial D$. The propagation from the secondary sources at the boundary $\partial D$ to
the observation point $(\mathbf{x}, t)$ is described by the Green's function $G$. The same kind of
expression can be derived for the particle velocity, see [Berkhout, 1984], chapter 5, from
which figure H.1 has been drawn.

Figure H.1: A pressure field can be synthesized from the wave fields of a monopole and
dipole distribution on a closed surface, using respectively the particle velocity and pressure
of the actual wave field at this boundary as their source strengths (after [Berkhout, 1984],
figure 5.1).

Using the Kirchhoff integral for migration

We will apply our results to seismic migration. To that end we consider data recorded
at $z = 0$, which means that we take $\partial D$ to be this surface, and assume that the contribution
from an arbitrarily large hemisphere to the integral is zero. As a simple example, let us
look at the case of a constant velocity medium, which means that there are no boundaries
in the problem. We can first specify the Green's function, something we have left out
so far in our discussion. Finding the solution for the free-space Green's function is very
standard, and we leave the derivation as an exercise; it can also be found in the literature (e.g.
[Aki and Richards, 1980]):
\[
G(\mathbf{x}, t; \mathbf{x}^s, t^s) = \frac{\delta(t - t^s \mp r/c)}{4\pi r},
\tag{H.10}
\]
in which $r$ is defined as:
\[
r = \sqrt{(x - x^s)^2 + (y - y^s)^2 + (z - z^s)^2}.
\tag{H.11}
\]

For a constant velocity earth, we want to combine two free-space Green's functions such that the result
vanishes on the surface $z^s = 0$. This is because we want to get rid of the first term in
equation (H.9), such that the expression becomes simpler. And we are free to choose our
Green's function as long as it is a solution of the wave equation. Therefore, we consider
also an image source, with the surface $z^s = 0$ as the mirroring surface and with opposite
amplitude:
\[
G(\mathbf{x}, t; \mathbf{x}^s, t^s) = \frac{\delta(t - t^s \mp r/c)}{4\pi r} - \frac{\delta(t - t^s \mp r'/c)}{4\pi r'},
\tag{H.12}
\]
in which $r'$ is defined as:
\[
r' = \sqrt{(x - x^s)^2 + (y - y^s)^2 + (z + z^s)^2}.
\tag{H.13}
\]

Note that there are two solutions each time, one with the $-$ and one with the $+$ sign in
equation (H). The first is for forward and the second for backward (i.e. inverse) propagation. Since we aim to propagate the diffraction hyperbolae back to their origin in
migration, we only use the anti-causal solution for the Green's function, i.e. the $+$ signs.
We can plug this Green's function into Kirchhoff's integral. Making use of the fact that
the Green's function is zero on the boundary $\partial D$, the first term in brackets in Kirchhoff's
integral (H.9) vanishes. Next, we have to apply the gradient of the Green's function, which
in our case is just the vertical derivative since we consider the surface $z^s = 0$. Then, $\nabla G\cdot\mathbf{n}$
becomes (note that the positive $z$-axis is defined as pointing down):
\[
\nabla G\cdot\mathbf{n} = -\partial_{z^s} G = -2\,\partial_{z^s}\!\left(\frac{\delta(t - t^s + r/c)}{4\pi r}\right),
\tag{H.14}
\]
using the fact that $r'(z^s) = r(-z^s)$.
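The two properties used here, that the image-source Green's function vanishes for $z^s = 0$ and that its $z^s$-derivative there is twice that of the free-space term, can be checked numerically. The sketch below is only an illustration: it assumes NumPy, replaces the $\delta$-function by a narrow Gaussian `pulse` (an assumption needed to evaluate it on a computer), uses the anti-causal sign, and the names `g_free`, `g_image`, `C` and `WIDTH` are hypothetical.

```python
import numpy as np

C = 2000.0      # propagation velocity in m/s (illustrative value)
WIDTH = 0.004   # width of the smooth pulse in s (illustrative value)

def pulse(tau):
    """Smooth stand-in for the delta function."""
    return np.exp(-(tau / WIDTH) ** 2)

def g_free(x, xs, t, ts):
    """Anti-causal free-space term pulse(t - ts + r/c) / (4 pi r), cf. (H.10)."""
    r = np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(xs, dtype=float))
    return pulse(t - ts + r / C) / (4.0 * np.pi * r)

def g_image(x, xs, t, ts):
    """Free-space term minus its mirror image in the plane z_s = 0, cf. (H.12)."""
    xs = np.asarray(xs, dtype=float)
    xs_mirror = xs * np.array([1.0, 1.0, -1.0])   # z_s -> -z_s, so r becomes r'
    return g_free(x, xs, t, ts) - g_free(x, xs_mirror, t, ts)

if __name__ == "__main__":
    x = (100.0, 50.0, 300.0)          # observation point in the subsurface
    t, ts, h = 0.5, 0.66, 1.0e-3
    print("G on the surface:", g_image(x, (20.0, -30.0, 0.0), t, ts))
    # Central differences with respect to z_s around z_s = 0.
    d_image = (g_image(x, (20.0, -30.0, h), t, ts)
               - g_image(x, (20.0, -30.0, -h), t, ts)) / (2.0 * h)
    d_free = (g_free(x, (20.0, -30.0, h), t, ts)
              - g_free(x, (20.0, -30.0, -h), t, ts)) / (2.0 * h)
    print("d/dz_s of G:", d_image, " twice the free-space term:", 2.0 * d_free)
```

The factor 2 in (H.14) appears because the mirror term depends on $z^s$ only through $r'(z^s) = r(-z^s)$, so its derivative at $z^s = 0$ has the opposite sign and the two contributions add.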
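Since $z$ appears only in the combination $z - z^s$, we can take $\partial_{z^s} = -\partial_z$ and take the
derivative outside the integral:
\[
p(\mathbf{x}, t) = \frac{1}{2\pi}\,\partial_z \int_{-\infty}^{\infty}\!\int_{z^s = 0}
\left(-p(\mathbf{x}^s, t^s)\,\frac{\delta(t - t^s + r/c)}{r}\right) dA^s\,dt^s .
\tag{H.15}
\]
Evaluating the integral with the $\delta$-function, we obtain the result:
\[
p(\mathbf{x}, t) = -\frac{1}{2\pi}\,\partial_z \int_{z^s = 0} \left(\frac{p(\mathbf{x}^s, t + r/c)}{r}\right) dA^s .
\tag{H.16}
\]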
This is Kirchhoff's migration formula, given by [Schneider, 1978]. We would like to stress
that this result does not involve any approximations; the result is only dependent on the
knowledge of the velocity distribution (i.e. vertical derivative of the pressure field) at the
surface.
In fact it states that the wave field in any point in the subsurface $p(\mathbf{x}, t)$ can be calculated
from the wave field $p(\mathbf{x}^s, t)$ recorded at a plane reference level $z^s$, assuming that we have a
recording from $-\infty$ until $+\infty$ at the surface.
Note that if the term $t + r/c$ is replaced by $t - r/c$ the inverse propagation from the surface
to point $(\mathbf{x}, t)$ becomes a forward extrapolation.
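To make the structure of (H.16) concrete, the following sketch discretizes it for a finite set of receivers. It is an illustration under several assumptions that are not part of the derivation: the surface integral is replaced by a sum over receivers that each represent an area `dA`, the traces are read at $t + r/c$ by linear interpolation (`numpy.interp`), the $z$-derivative is approximated by a central difference with step `dz`, and the aperture and recording time are finite. The function names are hypothetical.

```python
import numpy as np

def surface_integral(data, t_axis, rec_xyz, x, t, c, dA):
    """Approximate the integral of p(x_s, t + r/c) / r over z_s = 0 by a receiver sum.

    data    : array (n_receivers, n_samples), pressure traces recorded at z_s = 0
    t_axis  : array (n_samples,), increasing recording times
    rec_xyz : array (n_receivers, 3), receiver positions (third column zero)
    x       : array (3,), output point in the subsurface (z positive down)
    """
    r = np.linalg.norm(rec_xyz - np.asarray(x, dtype=float), axis=1)
    total = 0.0
    for trace, r_i in zip(data, r):
        # Read each trace at the advanced time t + r/c (zero outside the recording).
        total += np.interp(t + r_i / c, t_axis, trace, left=0.0, right=0.0) / r_i
    return total * dA

def kirchhoff_backpropagate(data, t_axis, rec_xyz, x, t, c, dA, dz=1.0):
    """Equation (H.16) with the z-derivative replaced by a central difference."""
    below = np.array(x, dtype=float)
    above = np.array(x, dtype=float)
    below[2] += dz
    above[2] -= dz
    d_dz = (surface_integral(data, t_axis, rec_xyz, below, t, c, dA)
            - surface_integral(data, t_axis, rec_xyz, above, t, c, dA)) / (2.0 * dz)
    return -d_dz / (2.0 * np.pi)

if __name__ == "__main__":
    gx, gy = np.meshgrid(np.arange(-200.0, 201.0, 10.0), np.arange(-200.0, 201.0, 10.0))
    rec = np.column_stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)])
    t_axis = np.arange(0.0, 1.0, 0.001)
    data = np.zeros((rec.shape[0], t_axis.size))   # replace by recorded traces
    print(kirchhoff_backpropagate(data, t_axis, rec, (0.0, 0.0, 100.0), 0.2, 2000.0, 100.0))
```

Reading each trace at $t + r/c$ is the discrete counterpart of integrating out the $\delta$-function in (H.15): for a fixed output point the samples are collected along a diffraction hyperbola in the recorded data. Replacing `t + r_i / c` by `t - r_i / c` would turn the same sum into a forward extrapolation.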


Bibliography
[Aki and Richards, 1980] Aki, K. and Richards, P. G. (1980). Quantitative Seismology. W. H. Freeman and Co.
[Barry et al., 1975] Barry, K. M., Cavers, D. A., and Kneale, C. W. (1975). Report on recommended standards for digital tape formats. Geophysics, 40(2):344-352.
[Berkhout, 1984] Berkhout, A. J. (1984). Seismic migration, imaging of acoustic energy by wave field extrapolation, B: practical aspects. Elsevier.
[Bracewell, 1978] Bracewell, R. N. (1978). The Fourier transform and its applications. McGraw-Hill Book Company.
[Claerbout, 1985] Claerbout, J. F. (1985). Imaging the earth's interior. Blackwell Scientific Publications.
[Deregowski, 1986] Deregowski, S. M. (1986). What is DMO? First Break, 4(7):7-24.
[Dix, 1955] Dix, C. H. (1955). Seismic velocities from surface measurements. Geophysics, 20(1):68-86.
[Gazdag, 1978] Gazdag, J. (1978). Wave equation migration with the phase shift method. Geophysics, 43:1342-1351.
[Gazdag and Sguazerro, 1984] Gazdag, J. and Sguazerro, P. (1984). Migration of seismic data by phase shift plus interpolation. Geophysics, 49(2):124-131.
[Hale, 1984] Hale, D. (1984). Dip-moveout by Fourier transform. Geophysics, 49(6):741-757.
[Hubral, 1977] Hubral, P. (1977). Time migration - some ray theoretical aspects. Geophys. Prosp., 25(4):738-745.
[Pieuchot, 1984] Pieuchot, M. (1984). Seismic instrumentation (Handbook of Geophysical Exploration, section I: Seismic Exploration). Geophysical Press, Amsterdam. ISBN 0-946631-02-6.
[Pullan, 1990] Pullan, S. E. (1990). Recommended standard for seismic (radar) files in the personal computer environment. Geophysics, 55(9):1260-1271.
[Robinson and Treitel, 1980] Robinson, E. A. and Treitel, S. (1980). Geophysical signal analysis. Prentice Hall, Inc.
[Schneider, 1978] Schneider, W. A. (1978). Integral formulation for migration in two and three dimensions. Geophysics, 43(1):49-76.
[SEG, 1980] SEG (1980). Digital Tape Standards. Society of Exploration Geophysicists.
[van der Schoot, 1989] van der Schoot, A. (1989). Common reflection point stacking, a model driven approach to dip moveout. PhD thesis, Delft University of Technology.
[Yilmaz, 1987] Yilmaz, O. (1987). Seismic data processing. Society of Exploration Geophysicists.
