
Ukrainian Catholic University

Faculty of Applied Sciences


Data Science Master Programme

Singular value decomposition (SVD) in noise attenuation of marine seismic data

Linear Algebra final project report

Authors:
Iaroslav Plutenko
Mykhailo Hodis

24 January 2019
Contents

Introduction
Linear Algebra in modern engineering
Singular Value Decomposition as one of the powerful methods of LA
Seismic Exploration and Math
Description of the industry and underlying principles
Basics of marine operations
Conditions of the survey
Data processing
Combating Noise
Linear Algebra in play
SVD in noise attenuation
Conclusion
List of figures
References
Introduction
Linear Algebra in modern engineering
Organizing and structuring information helps us simplify complex tasks in many spheres of everyday life. Once such structures are defined, a problem can often be handled effortlessly. Linear algebra is, in essence, the study of such structures [5]. Namely,

Linear algebra is the study of linear equations and vectors.

The first recorded procedure for solving systems of linear equations simultaneously appears in the ancient Chinese mathematical text The Nine Chapters on the Mathematical Art [4]. However, the wider use of linear systems began in 1637, after René Descartes introduced coordinates into geometry.
Linear algebra plays a considerable role in almost every area of mathematics. In the modern presentation of geometry, basic objects such as lines, planes and rotations are defined with linear algebra. It is applied in functional analysis to spaces of functions. Because it can model many natural phenomena and allows convenient computations with such models, linear algebra is also used in most sciences and engineering disciplines. Even for nonlinear systems, which cannot be modeled with linear algebra directly, it often provides a useful first-order approximation.
Among the wide range of real-life applications of linear algebra, here are just a few:
● Loads and displacements in structures.
● Compatibility in structures.
● Finite element analysis (mechanical, electrical, and thermodynamic applications).
● Stress and strain in more than one dimension.
● Mechanical vibrations.
● Current and voltage in LCR circuits.
● Small signals in nonlinear circuits (amplifiers).
● Flow in a network of pipes.
● Control theory (governs how state-space systems evolve over time, discrete and continuous).
● Control theory (an optimal controller can be found using simple linear algebra).
● Control theory (Model Predictive Control is heavily reliant on linear algebra).
● Computer vision (used to calibrate cameras, stitch together stereo images).
● Machine learning (Support Vector Machines).
● Machine learning (Principal Component Analysis).
● Many optimization techniques that rely on linear algebra as soon as the dimensionality increases.
● Fitting an arbitrary polynomial to data.

Singular Value Decomposition as one of the powerful methods of LA


Expressing a matrix in terms of constituent factor matrices is called matrix factorization, also known as matrix decomposition.
One of the most used and widely known matrix decomposition methods is the Singular Value Decomposition, or SVD. It is more stable than many other methods because any matrix can be decomposed with SVD.
This technique has a long and somewhat surprising history. It began in the social sciences with intelligence testing. Researchers noted that tests given to measure different aspects of intelligence, such as verbal and spatial, were often closely correlated, so they assumed that there was a common general measure of intelligence, which they called "g" for "general intelligence" and which is closely related to what is popularly known today as I.Q. These researchers set about teasing out the different factors that make up intelligence in order to extract the most important one.
Today SVD is a key topic in Linear Algebra with numerous real-life applications, including compression, denoising, data reduction, signal processing and statistics. SVD factorization is a powerful and indispensable tool for virtually every quantitative aspect of engineering, economics and science.
The idea behind the SVD is that every real or complex matrix A can be factored into the product of three matrices

A = U Σ V^T

where U and V are unitary matrices with real or complex entries whose columns are orthonormal eigenvectors of A A^T and A^T A respectively. The matrix Σ is a rectangular diagonal matrix containing real non-negative numbers (the square roots of the eigenvalues of A A^T and A^T A) on its diagonal, in decreasing order. In general, the components of the SVD have the following structure:
● U is an m×m matrix whose columns are called the left-singular vectors of A.
● Σ is an m×n matrix whose diagonal entries σi are called the singular values of A.
● V^T is an n×n matrix whose rows (the columns of V) are called the right-singular vectors of A.

Figure 1. SVD components shapes.
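To make the factorization concrete, here is a minimal sketch computing and verifying an SVD with the Apache commons-math3 library (the same library used in our implementation below); the matrix entries are arbitrary illustration values:

import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.SingularValueDecomposition;
import java.util.Arrays;

public class SvdDemo {
    public static void main(String[] args) {
        // A small 3x2 matrix to decompose (illustrative values).
        RealMatrix a = new Array2DRowRealMatrix(new double[][] {
                {3, 2}, {2, 3}, {2, -2}
        });
        SingularValueDecomposition svd = new SingularValueDecomposition(a);
        RealMatrix u = svd.getU();    // left-singular vectors (compact form: 3x2)
        RealMatrix s = svd.getS();    // diagonal matrix of singular values (2x2)
        RealMatrix vt = svd.getVT();  // right-singular vectors, transposed (2x2)
        // Multiplying the three factors back should reproduce A (up to rounding).
        RealMatrix reconstructed = u.multiply(s).multiply(vt);
        System.out.println("Singular values: "
                + Arrays.toString(svd.getSingularValues()));
        System.out.println("Reconstruction error: "
                + reconstructed.subtract(a).getNorm());
    }
}

Note that commons-math3 returns the compact form of the decomposition, so for an m×n matrix with m > n the U it produces is m×n rather than m×m; the product of the three factors still reconstructs A.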

We already know that a matrix can be interpreted as a linear transformation of space. This transformation can be decomposed into three sub-transformations: a rotation, a rescaling, and another rotation. These steps correspond to the matrices V^T, Σ and U, applied in that order. For a better understanding, the effect of these sub-transformations on a unit circle is depicted in Figure 2.

Figure 2. SVD decomposition components influence on some unit disc.

SVD has proved to be a computationally viable tool for solving a wide variety of problems arising in many practical applications. Among its uses in other computations are: pseudoinverses, solving homogeneous equations, finding the nearest orthogonal matrix, total least squares minimization, and decomposition of separable models. The common thread is that these applications require knowing the rank of a matrix, approximating a matrix by matrices of lower rank, and computing orthogonal complements and orthogonal projections onto the corresponding subspaces. Such computations must usually be carried out in the presence of impurities in the data – noise – and SVD turns out to be quite effective for them.
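As a small illustration of one of these uses, the pseudoinverse can be read directly off the SVD; a minimal sketch with the commons-math3 library (illustrative values):

import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.SingularValueDecomposition;

public class PseudoinverseDemo {
    public static void main(String[] args) {
        // A non-square matrix has no ordinary inverse, but its Moore-Penrose
        // pseudoinverse always exists, and the SVD provides it.
        RealMatrix a = new Array2DRowRealMatrix(new double[][] {
                {1, 2}, {3, 4}, {5, 6}
        });
        // For the SVD-based solver, getInverse() returns the pseudoinverse.
        RealMatrix pinv = new SingularValueDecomposition(a).getSolver().getInverse();
        System.out.println(pinv);  // a 2x3 matrix A+ satisfying A * A+ * A = A
    }
}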
Seismic Exploration and Math
Description of the industry and underlying principles
One of the important scientific and engineering areas employing extensive processing of input data is seismic surveying.
In seismic surveying, sound waves are mechanically generated and sent into the earth.
Some of this energy is reflected back to recording sensors, measuring devices that record
accurately the strength of this energy and the time it has taken for this energy to travel through
the various layers in the earth's crust and back to the locations of the sensors. These recordings
are then taken and, using specialized seismic data processing, are transformed into visual images
of the subsurface of the earth in the seismic survey area. Just as doctors use x-rays and audio- or
sonograms to “see” into the human body indirectly, geoscientists use seismic surveying to obtain
a picture of the structure and nature of the rock layers indirectly.
Seismic surveys are conducted for a variety of reasons. They are used to check
foundations for roads, buildings and large structures, such as bridges. They can help detect
groundwater. They can be used to assess where coal and minerals are located. One of the most
common uses of seismic data is in connection with the exploration, development, and production
of oil and gas reserves to map potential and known hydrocarbon-bearing formations and the
geologic structures that surround them. Most commercial seismic surveying is conducted for this
purpose. Oil & gas exploration and production is conducted in many places on the earth's
surface, in both the onshore (land) and offshore (marine) domains. Although the principles are
identical, the operational details differ between the two domains. In this overview, only marine
operations will be addressed.
Marine seismic acquisition is the most common method for offshore exploration. In most
marine work, the sensor is a hydrophone that detects the pressure fluctuations in the water caused
by the reflected sound waves. The cable containing the hydrophones, called a streamer, is towed
or ‘streamed’ behind a moving vessel. These streamers are typically 3 to 8 kilometers long, although they can be up to 12 kilometers long depending on the depth of the geophysical target being investigated.
Acquisition takes place with large seismic vessels towing one or more airgun arrays behind the vessel. This equipment produces a seismic signal by releasing shots of highly pressurized air into the seawater. Receiving devices are towed behind the ship along with the airguns, in one or several long streamers that are conventionally 6-9 kilometers long. The figure below shows the schematics of a marine seismic survey with one seismic vessel, two airgun arrays and multiple streamers towed behind the vessel. [6] [10]
Towed streamer operations represent the most significant commercial activity, followed
by ocean bottom seismic survey (including arrays placed on the seafloor and arrays buried a
meter or so below the seafloor).
Figure 3. Marine seismic data acquisition set-up towed by vessel.

When energy from a sound source is released in the marine environment, pressure waves are created in the water column. The magnitude of the pressure is called amplitude, and the excited waves are P-waves, or compressional waves. To a first approximation, water will only propagate P-waves, and the sensors that make accurate measurements of the amplitudes of P-waves are hydrophones. The velocity of sound in seawater and the density of seawater can vary as a result of changes in salinity, temperature, and gas and sediment content; under certain hydrographic conditions, layers can form that reflect P-waves and that can also trap certain frequencies of P-waves. In this latter case, the trapping layer is called a waveguide. Although the terms amplitude (pressure) and energy are often used interchangeably (as in "the P-wave pressure, or the P-wave energy"), energy is proportional to the square of the amplitude.
Rocks underlying the seafloor have rigidity; water does not. When P-waves enter rock, they can be transmitted and reflected as in water, but they can also convert to S-waves, or shear waves. It is impossible for P-waves to propagate in rocks without mode-converting (converting from P-wave mode to S-wave mode) to S-waves, but most seismic surveying is accomplished using pressure sensors in the water column, so no direct S-waves are recorded in that situation. However, S-waves contain information of use to geoscientists that is not contained in P-waves, so it is sometimes advantageous to record S-waves. This can be done by placing sensors on the seafloor and capturing the S-wave energy that has been created by the initial production of P-waves from the marine source. [2]
Within a given exploration zone, the details of a specific survey operation can vary
enormously. There are, however, two principal categories of seismic surveying. These are two-
dimensional (2D) seismic surveys and three-dimensional (3D) seismic surveys. 2D can be
described as a fairly basic survey method, which, although somewhat simplistic in its underlying
assumptions, has been and still is used very effectively to find oil & gas.
A sub-category of 2D is the site survey where ultra-high-resolution data is acquired in the
immediate vicinity of an intended well to identify both seabed and shallow subsurface hazards.
Ultra-high resolution here means that the survey is intended to provide more detailed information
about the seafloor and the conditions of the rock down to a depth of a few hundred metres
beneath the seafloor. 3D surveying is a more complex method of seismic surveying than 2D and
involves greater investment and much more sophisticated equipment than 2D surveying. Until
the beginning of the 1980s, 2D work dominated in oil & gas exploration, but 3D became the
dominant survey technique in the late 80s with the introduction of improved streamer towing and
positioning technologies.
4D surveys (or time-lapse 3D) are simply 3D surveys which are repeated over the same
area, some period of time elapsing between the initial survey and the subsequent surveys. There
might be several repeated surveys, depending on the specific oil or gas field in question. The
purpose of this type of survey is to obtain images of how the hydrocarbon reservoir is changing
over time due to production in order to maximize hydrocarbon recovery from the field. 4D
surveys have become increasingly used since the mid-1990s, and now represent a significant
percentage of overall seismic activity.
More recently, increasingly sophisticated towed streamer acquisition schemes - multi-azimuth, wide-azimuth and rich-azimuth - have been developed to provide improved subsurface imaging in geologically and geophysically challenging environments.
In 1984 the first twin streamer operation was undertaken, which effectively doubled the
data acquisition efficiency of the vessel by generating two subsurface lines per vessel sail line.
By moving to twin source/twin streamer configurations in 1985, the output was increased to four
subsurface lines per vessel sail line or pass. The next logical step of towing three streamers and
two sources behind a single vessel, thus acquiring six lines per pass, was achieved in 1990. The
number of deployed streamers has consistently increased with as many as 16 streamers having
been towed.
Multi-streamer operations require a significant amount of in-sea equipment - the 16-
streamer operation referred to above entailed 72 kilometers of cable being towed behind the
vessel.
Consequently, the back deck of the vessel becomes very busy due to the activity involved
in handling equipment including streamers, sources and the related control devices.
Organizing and operating such a set-up in a safe and efficient manner requires a very
high level of knowledge and skill. [2]

Basics of marine operations


The first stage of normal operations (commonly called mobilization) is supplying the ship with all necessary fuel, water, food, seismic equipment and crew. It then sails to the designated survey area. The vessel will have been provided in advance with all necessary details regarding the survey layout and design, and what and how much equipment will be deployed. The navigators will have information specifying where each data acquisition sail line must start and finish, and the location of each source or shot point. This information will have been fed into the onboard integrated navigation system. On the bridge, the captain ensures that, while the ship is under normal manual control, it is navigated as agreed to the first line-start position. He and the seismic crew (party) manager will be closely monitoring wind, weather conditions and any incoming reports.
As the survey area is approached, the observers will deploy the streamers, attaching depth
monitoring and control devices (birds) at regular spacings as they go. As sea water temperature
and salinity vary by location, considerable care is taken to ensure that the streamers are correctly
“ballasted”, to be neutrally buoyant for the chosen operating depth for the specific survey area.
Ballasting is accomplished by ensuring that the upward lift of the positively buoyant streamer is
exactly counterbalanced by the weight of the streamer electronic module, stress members,
external devices and, if necessary, externally attached weights. The mechanics will start the
compressors and prepare and check the source arrays, which are deployed after the streamers, but
can later be recovered and redeployed when necessary. The navigators will work with the
mechanics and observers to attach the necessary buoys for positioning.
In the instrument room, positioning of all in-sea equipment will be verified and all
equipment will be powered up, tested and checked for trouble free operation. Test records for
background noise will be made. The streamer, source and buoy links will all be tested, and the
whole system confirmed ready for use. As the ship approaches the start of a pre-defined sail line,
it is said to be on the run-in. This is the stage where it is very close to the agreed start position,
the vessel has the correct heading and the streamers are as much in line behind the vessel as
conditions will allow. Now the ship is steered according to the input from the navigation system.
Around the vessel, all involved crew members will be monitoring the ship's position from
information screens in their respective areas. The navigator monitors the approach to start of line
in terms of distance to go, heading and speed to ensure that no positioning problems arise at the
last moment. The mechanics will be closely watching the compressor monitors and will make a
last-minute visual inspection of the source equipment that can be seen from the vessel. The
observers will take any final test records for future reference and will check the source control
system.
Depending on the country of operations and the area-specific environmental controls in
place, a visual watch for marine mammals from the vessel may be ongoing for at least 30 to 60
minutes before the source is first activated. On some surveys, dedicated acoustic monitoring
methods may be used. It is only when the crew has been informed that no marine mammals are present that the source can be activated and data acquisition can proceed.
The source is activated at the first predetermined position and data acquisition and recording commence. This process is repeated at successive, regularly spaced distance intervals (with the source firing every 10 to 12 seconds, depending on vessel speed as determined by the navigation system), until the vessel has reached the pre-defined end of the sail line. Throughout the recording period, all personnel involved perform detailed prescribed
tasks. The navigator monitors the positioning system output, checking for any discrepancies, and
completes the end of line paperwork and prepares plans for the line change (relocating of vessel
to the next sail line). The mechanic monitors the compressor performance, checks the backdeck
towing systems, and is ready to deal with any 'mechanical' problems. The observer monitors the
data recording system operation, changes recording media (typically high-density tape decks),
and fills in the line log as the line progresses.
When the line is complete, all systems stop recording. The ship is now in line-change
mode. The navigator has planned how the vessel should maneuver to start the run-in for the next
line. The line-change time varies according to the layout of the survey and the configuration of
the equipment but is usually between one and three hours. During the changeover period, all the
crew involved work quickly to resolve any problems and make modifications or repairs in
readiness for the next line. The run-in is then started, all equipment is readied, the sources
activated, and the activity cycle is repeated. Infrequently, technical failures occur, and line-starts
are delayed, or lines are terminated early. Operations may also be affected by weather, and
oceanographic conditions or adjacent shipping. [3]
However, multi-azimuth methods such as Coil Shooting are becoming increasingly popular; a Coil Shooting sample is used in our project. The Coil Shooting technique, in which a single vessel acquires full-azimuth 3D seismic data by sailing in circles, delivers more accurate and reliable subsurface images than conventional 3D methods in areas of complex geology. [7]
Figure 4. Coil Shooting schematics.

In general, seismic surveys are planned to be acquired in calm weather, to minimize the
amount of extraneous noise recorded along with the primary signals. This noise increases with
increasing sea state and most companies specify how much measured noise is acceptable during
the acquisition of the data. If the prevailing conditions lead to this level being exceeded, the
acquisition is stopped. If conditions become excessive, then the streamers and source arrays may
have to be recovered. The vessel will “ride out the storm” on location or move to more sheltered
waters, whichever is the safer and better operational option; the vessel crew's safety being the
overriding concern.

Conditions of the survey


If the survey is in an area of high shipping activity, seismic operations can be difficult. A
seismic survey vessel is limited in its maneuverability because of the long streamers (generally
several kilometers, with a maximum length currently of approximately 12 kilometers) deployed
from the stern. The main vessel itself is in little danger, but with many vessels in close proximity,
the streamers may be fouled or cut. Aside from the large financial loss from the value of the
streamers themselves, this can mean reduced revenues through disrupted operations. In difficult
areas, chase or guard boats are employed. These are smaller vessels, usually ex-fishing boats,
which contact potentially threatening shipping traffic and direct them away from possible contact
with the streamers.
The survey vessel is sometimes required to operate in areas of strong currents; shallow
water such as over sandbanks, or in the vicinity of obstructions such as oil platforms. These may,
in many cases, cause problems and affect the rate or quality of data acquisition due to the limited
maneuvering ability of the vessel. Careful planning can mitigate these problems to some extent
in some areas, but the ability of the survey vessel to acquire data efficiently will be severely
hampered.

Data processing
Initially, the processing team's schedule is linked with the acquisition phase of the
operation. At the beginning of the survey, the vessels record a few test lines that either track previous surveys or pass near wells offering check-shot data. These data are transported ashore by helicopter or supply boat and conveyed to the processing team, which uses them to review data quality and select parameters for beginning the process.
After the survey is finished, perhaps eight weeks after it began, acquisition and
processing contractors continue working in parallel. While the processing team begins its
massive numerical manipulations, the acquisition contractor performs a series of no less
important navigational computations. These determine the exact position, for every shot in the
survey, of the survey vessels, airgun arrays and tailbuoys at the end of the streamers, and also the
precise shape of the streamers themselves.
The navigational results are then dispatched to the processing team which merges them
with the seismic data to perform stacking. This is a pivotal moment in the six-month saga.
Throughout its task, the processing team focuses on two main goals: enhancing signal at
the expense of noise, and shifting acoustic reflectors as seen on sections to nearer their true
position. Stacking is a key signal enhancement technique, and unlike some of the other noise
reduction algorithms that processors use, it is intimately tied to how data are acquired.
Stacking is the averaging of many seismic traces that reflect from common points in the subsurface. Each trace is assumed to contain the same signal but different random noise, so averaging them enhances the former while minimizing the latter (for independent zero-mean noise, averaging N traces improves the amplitude signal-to-noise ratio by a factor of √N).
Each trace in a set of traces having a common midpoint, called a CMP gather, is necessarily recorded with a different airgun-hydrophone group spacing, or offset — this varies from about 50 meters for the so-called near trace to several kilometers for the farthest trace. As offset increases, the two-way reflection time also increases because the sound has a longer path to travel. This effect, called moveout, produces a downward-curving hyperbola on the gather. Moveout must be corrected before the traces are stacked, otherwise the reflections will not sum constructively. In the normal moveout (NMO) correction, every trace is converted to zero offset, giving all traces a consistent set of two-way times to the reflectors.
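For reference, the textbook normal moveout relation (a standard result, quoted here for completeness rather than taken from our sources) expresses the reflection time $t(x)$ at offset $x$ through the zero-offset two-way time $t_0$ and the NMO velocity $v$:

$$t(x) = \sqrt{t_0^2 + \frac{x^2}{v^2}}, \qquad \Delta t_{\mathrm{NMO}} = t(x) - t_0$$

The NMO correction shifts every sample of a trace up by its $\Delta t_{\mathrm{NMO}}$, flattening the hyperbola so that the traces sum constructively.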
Stacking, with its accompanying velocity analysis and interpolation, is pivotal because it cuts the amount of data manyfold. It divides the processing into two parts—up to and including
stack, and after stack. Processing before stack takes at least 70 percent of the team’s time and
resources, because of the amount of data and analysis involved. Going back to correct mistakes
and data errors or changing processing parameters and then restacking is expensive. Post-stack
processing, while no less critical to the appearance of the final product, can be less intense
because the data set is reduced, and reprocessing using different processes and parameters costs
less. [2]

Combating Noise
Noise contamination is a common occurrence in seismic reflection surveying, producing very striking features in the seismograms and hindering data processing and interpretation.
The attenuation of seismic noise is a challenging task. Various techniques are utilized on
each level of acquisition, starting from hardware, continuing with real time digital processing
and finishing with high-level comprehensive onshore post-processing.
Frequency filters are commonly employed, but they often do not give good results. The characteristics of the noise depend mainly on the type of data being worked with. In land data, the most common noise is ground roll, which has low frequencies and high amplitudes, whereas in marine data (in shallow-water acquisition) head waves and harmonic modes are linear and dispersive events that mask part of the reflections of interest, affecting the delimitation of lithological layers. In this study we used the Singular Value Decomposition (SVD) method to mitigate some types of noise in seismic reflection data and to create alternatives for interpretation based on the different frequency content present in a seismic section. This approach yields a technique that can identify and mitigate the unwanted events in the seismograms while trying to preserve or enhance the signal of interest.
In general, noise is divided into two categories: coherent and random noise. Coherent noise can be followed and predicted over a number of traces, while random noise is unpredictable. In a general way, seismic data x can be decomposed into the sum of signal s and noise n, as follows:

x = s + n

Figure 5. Noise classification.

We’ll focus on swell noise because it is dominant in Coil Shooting samples.


Swell noise is high amplitude, low frequency (2-15 Hz) noise caused by rough weather
conditions during acquisition. Swell noise cannot be removed by a band pass filter as it would
remove seismic signal that belongs to the same frequency range.
The level of cross-flow-induced noise increases when the data are acquired during turns or along circles (as in Coil Shooting) and when marine currents are strong. Towed marine data suffer from vertical and horizontal flow of water across the streamers. Vertical cross-flow can be induced by wave action and results in so-called swell noise. Horizontal cross-flow is induced by ocean currents and by the vessel turning or sailing along a circular path.
All sources of cross-flow generate vibrations that propagate along the streamer and are recorded as high-amplitude, low-frequency noise.
The cross-flow noise can be more than 20 dB higher in amplitude than the seismic signal (a factor of more than 10) in the frequency range of 0 to 5 Hz, and comparable with the signal from 5 to 10 Hz. [10]

Linear Algebra in play


Methods of linear algebra are widely used in signal processing at both the pre-stack and post-stack stages. In the pre-stack phase, computations are applied to refine the signal and enhance the signal-to-noise ratio; this is where Singular Value Decomposition is employed. In the post-stack phase various techniques are used, such as least squares approximation in the construction of sophisticated 3D maps. Our focus will be on the application of SVD to raw data, where the method can significantly attenuate noise.
The acoustic signal captured by hydrophones along the streamer is digitized and recorded in a special format on tapes. This format is called SEG-Y; it is one of several standards developed by the Society of Exploration Geophysicists (SEG), a non-profit organization, for storing geophysical data. It is an open standard controlled by the SEG Technical Standards Committee. Data in this format can be transferred to other computer media and viewed or analyzed with dedicated software. Typically, in a viewer the raw data can be interpreted like an image. It consists of many vertical lines representing traces; each trace is a sampled waveform of recorded sound, so it contains both the signal captured from the various reflectors and noise. The vertical axis represents the time elapsed since the airgun shot, the horizontal axis the trace number. Traces are associated with hydrophones located along the streamer at specified offsets; they receive the reflected signal with a delay caused by the increasing travel path of the signal – the moveout.
A visualization of the traces is shown below. We can observe strong contamination by low-frequency noise. This representation is very much like an image, and SVD is well known for its use in image processing, so it is no wonder that SVD can be applied to the raw data by treating it as a raster matrix.
Figure 6. Visualisation of raw seismic data sample in SeismiGraphix.

SVD in noise attenuation


SVD can be considered the cornerstone of, and the gateway to, the broader notion of Singular Spectrum Analysis (SSA). The main challenge in the use of SSA for noise attenuation arises from the selection of the number of singular values that recover the data. [8]
The answer depends on how correlated the signal and the noise are. In general, the signal is believed to be represented by the largest singular values, but this can change when the noise is not white or the signal-to-noise ratio is too low. In our example, swell noise is represented by the largest singular values.
The singular values σi are measures of lateral correlation, i.e., how correlated events are from trace to trace. Hence, a laterally coherent event will have a high singular value. Since the singular values are sorted in decreasing order such that σ1 ≥ σ2 ≥ . . . ≥ σr ≥ 0, the first eigenimages consist of laterally coherent signals, while the later ones are associated with random noise, whose degree of lateral correlation is low.
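In eigenimage notation, standard in the SSA literature, the data matrix is a sum of rank-one terms weighted by their singular values:

$$A = U \Sigma V^T = \sum_{i=1}^{r} \sigma_i\, u_i v_i^T$$

where $r$ is the rank of $A$ and each rank-one term $\sigma_i u_i v_i^T$ is called the i-th eigenimage.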
This is best illustrated by an example. The figure below shows a picture of a noisy T that has been decomposed into its eigenimages by SVD. We can see that the first two eigenimages contain the laterally coherent T, while the other eigenimages contain noise. By displaying the image as eigenimages, one can understand why rank reduction is used in noise attenuation: it separates the noise from the signal. [6] [9]
Figure 7. Eigenimages with coherent pattern and random noise.
For swell noise, however, the situation is the opposite – the first eigenimages contain the strong high-amplitude noise, which should be subtracted from the entire picture.
The assumptions are the following:
● Data = Seismic + Noise.
● Noise amplitudes >> signal amplitudes.
● The largest singular values of the matrix Σ correspond to the largest amplitude values, which are associated with the cross-flow streamer noise (formally, see the expression below).
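Under these assumptions, one pass of the denoising step can be written as follows (our notation, with $k$ the number of leading eigenimages treated as noise):

$$\hat{s} = x - \sum_{i=1}^{k} \sigma_i\, u_i v_i^T$$

that is, the estimated noise is the partial sum of the $k$ largest eigenimages, and subtracting it from the data $x$ yields the signal estimate $\hat{s}$.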
The noise is estimated iteratively in a frequency band, typically 0 to 5 Hz or 0 to 10 Hz.
The number k of largest singular values that will be kept in the SVD and the number of iterations are the critical parameters of this method. If these numbers are too high, the signal may be attenuated. In this implementation, the number of singular values and the number of iterations can vary from shot to shot as a function of the noise level.
The process can also be stopped if the difference between the noise estimates in two consecutive iterations is less than a user-defined threshold.
The criterion for detecting noisy traces is based on the calculation of the root-mean-square amplitude in the window where the noise dominates the signal. [7]
Figure 8. Workflow of noise attenuation by SVD.

In our work we determined this criterion by visual inspection, which is less scientific but sufficient for demonstration purposes.
For the current project an unprocessed, noise-contaminated sample from a Coil Shooting survey was obtained (courtesy of Nick Moldoveanu, Global Geophysical Advisor at Schlumberger): a file in SEG-Y format with 15 s record length, 2 ms sampling rate, 2560 traces per shot record, and 3 records. A visualization of this seismogram was shown earlier. The useful signal is a thin tilted streak across the upper part of the image, while the entire image is heavily contaminated by ripple-like noise.
This file of 233 MB contains 7680 traces; each trace has 7681 values (samples) at the 2 ms sampling rate, about 15 s in duration. This division makes the corresponding matrix almost square; however, we will take a smaller fragment to reduce computation time and to demonstrate the results better. As stated earlier, there are three records in the file of 2560 traces each, but taking even fewer traces is enough for the demo.
According to the SEG Technical Standards Committee standards, a SEG-Y file starts with a 3200-byte textual header and a 400-byte binary file header; it may also contain optional extended headers (not present in our file), followed by the trace data. Each trace has a 240-byte header; the length of a trace and the type of its data are specified in the headers. Most often the samples are 4-byte values (equivalent to the float primitive type in Java).

Figure 9. Structure of SEGY file.

Software for viewing SEG-Y files and performing simple manipulations on them is available for free under GNU and other licenses. We used SeismiGraphix, a cross-platform standalone application written in Java that can read and view the SEG-Y format and open and edit headers. Its author and developer, Abel Surace, started his career in the oil and gas exploration industry and then became a software engineer developing and maintaining applications for seismic and well data management used by several oil and gas companies in Canada, the USA and worldwide.
To perform matrix operations on SEG-Y data we first need to retrieve the data from the file and put it into a matrix. There are plenty of dedicated libraries for Python, but for Java they are scarce. In our example we read the file, place the file header(s) in memory, then read the trace data (each trace header is also stored in memory) into an array of float primitives (4 bytes long) and convert it into a matrix structure.
The obtained matrix undergoes Singular Value Decomposition. All vector and matrix types and operations are provided by the org.apache.commons.math3 library. Unfortunately, in our set-up SVD is a resource-consuming operation for large matrices – this was the main reason we did not process the entire file and reduced it to a fragment.
After the decomposition we construct the low-rank matrices corresponding to each singular value. The first rank-one matrix, based on the largest singular value, is subtracted from the initial matrix. As explained earlier, the swell noise corresponds to the largest singular values, so by this subtraction we literally remove it from the source data.
The resulting matrix is converted back into a SEG-Y file. Each column gets back the trace header stored in memory, and the entire matrix is flushed into a file prepended by the file headers. Such a file can be read by SeismiGraphix to visualize the results after each iteration.
In the second cycle, the rank-one matrix for the second singular value is built and subtracted from the matrix obtained after the first cycle. Thus, we have a matrix additionally cleared of swell noise. This matrix is also flushed into the next file to visualize the results.
It was determined empirically that the best results are achieved at around 50 cycles; more iterations lead to signal deterioration. In the practical method, manual operations such as filtering and refining should be performed after each cycle, but we skipped them, using only the rank-one matrix calculations and subtractions. Our goal was to demonstrate the viability of the mathematical method for swell noise attenuation.
In summary, the routine performs the following steps:
1. The original SEG-Y file is opened for reading.
2. The textual file header and the binary file header are read and stored in a memory buffer.
3. A loop is started to read the trace data. Trace headers are copied into byte buffers in array fashion so that they can be retrieved later for writing. Trace data are copied into a two-dimensional array. 1000 traces are taken to constitute the example for further work.
4. Upon completing the loop, the array is converted to a matrix.
5. The matrix is decomposed (SVD).
6. A new loop is started (number of iterations = 51). Each iteration produces a rank-one matrix, starting from the largest singular value. This matrix is subtracted first from the original matrix; the result is written into a file, using the stored trace and file headers to comply with the file format. Each consecutive iteration produces a matrix without the k largest singular values.

Source Java code:

package com.la.marine;

import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.RealVector;
import org.apache.commons.math3.linear.SingularValueDecomposition;
import java.io.*;
import java.nio.ByteBuffer;

public class App {
    public static void main(String[] args) {
        try {
            DataInputStream dataInputStream = new DataInputStream(
                    new FileInputStream("shots_noise_sample.segy"));

            int nTraces = 1000;         // number of traces taken for the demo
            int traceHeaderLen = 240;   // SEG-Y trace header length in bytes
            int nSamples = 7681;        // samples per trace
            int initLen = 3200 + 400;   // textual + binary file headers
            int numIter = 51;           // number of rank-one subtractions

            // Read and keep the file headers so the output files remain valid SEG-Y.
            ByteBuffer initHeaders = ByteBuffer.allocate(initLen);
            dataInputStream.readFully(initHeaders.array());
            ByteBuffer[] traceHeaders = new ByteBuffer[nTraces];

            // Read each trace: a 240-byte header followed by nSamples 4-byte floats.
            double[][] traceData = new double[nTraces][nSamples];
            for (int j = 0; j < nTraces; j++) {
                traceHeaders[j] = ByteBuffer.allocate(traceHeaderLen);
                dataInputStream.readFully(traceHeaders[j].array());
                for (int i = 0; i < nSamples; i++) {
                    traceData[j][i] = (double) dataInputStream.readFloat();
                }
            }
            dataInputStream.close();
            System.out.println("Read completed");

            // Transpose so rows are time samples and columns are traces.
            RealMatrix mx = (new Array2DRowRealMatrix(traceData)).transpose();
            int height = mx.getRowDimension();
            int width = mx.getColumnDimension();
            System.out.println("Matrix has rows m = " + height);
            System.out.println("Matrix has columns n = " + width);

            // SVD magic goes here - thanks to the Apache commons.math3 library.
            SingularValueDecomposition svd = new SingularValueDecomposition(mx);
            RealMatrix uMatrix = svd.getU();
            RealMatrix vMatrix = svd.getV();
            double[] sValues = svd.getSingularValues(); // sorted in decreasing order

            RealVector uk;
            RealVector vk;
            RealMatrix ak;                  // rank-one eigenimage sigma_k * u_k * v_k^T
            RealMatrix akPrev = mx.copy();  // running result after each subtraction
            DataOutputStream dataOutputStream;

            for (int k = 0; k < numIter; k++) {
                // Build the k-th eigenimage and subtract it from the running matrix.
                uk = uMatrix.getColumnVector(k);
                vk = vMatrix.getColumnVector(k);
                ak = uk.outerProduct(vk);
                ak = ak.scalarMultiply(sValues[k]);
                akPrev = akPrev.subtract(ak);

                // Flush the intermediate result into a valid SEG-Y file.
                dataOutputStream = new DataOutputStream(
                        new FileOutputStream("from_matrix_back" + k + ".segy"));
                dataOutputStream.write(initHeaders.array());
                for (int i = 0; i < width; i++) {
                    dataOutputStream.write(traceHeaders[i].array());
                    for (int j = 0; j < height; j++) {
                        dataOutputStream.writeFloat((float) akPrev.getEntry(j, i));
                    }
                }
                dataOutputStream.close();
                System.out.println("Processed #" + k);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

In the output we have 51 SEG-Y files with different degrees of denoising.

A link to an archive containing several of these sample files:
https://drive.google.com/open?id=17lvXftKU-L4c4SrklZNqNAWLtBRyQC3Q
Figure 10. Result after 1st iteration.

Figure 11. Result after 2nd iteration.


Figure 12. Result after 5th iteration.

Figure 13. Result after 10th iteration.


Figure 14. Result after 23rd iteration.

Figure 15. Result after 40th iteration.

In the later iterations normalization is applied (the Norm button in SeismiGraphix), which does not degrade the visualization. When strong swell noise is present and normalization is enabled, the result is a higher contrast that impedes the overall perception – see, for example, normalization applied to the first image.
Figure 16. Normalization applied on the image from 1st iteration.
However, the dead (black) traces are visually removed, so on the later images this is a useful function.

Figure 17. Results of 51st iteration (with normalization).


We see that the swell noise is removed to a great extent and the useful signal stands out from the smooth background.
Results in the trace-examining mode (magnified fragment), original file:

Figure 18. Trace view mode on the raw data image.

Note how the trace record is ‘curved’ by the low-frequency noise.


In the last file, the trace is relatively straight.
Figure 19. Trace view mode on the data of the last iteration.

However, one can notice that the signal is starting to deteriorate – instead of a sine-like pattern it is acquiring sharp edges. So we stopped the loop at 51 iterations, assuming this is the best result we can achieve with pure SVD.
Further enhancement of the signal should be carried out by other methods; in fact, they are supposed to be applied at all steps between iterations.
Conclusion
We applied Singular Value Decomposition to digital seismic data in order to remove or attenuate strong coherent swell noise, which happened to be represented by the largest singular values. We achieved the anticipated results, producing visualized records largely stripped of unwanted ripple and with a distinct signal.
The drawback of this method is that it still requires human supervision to evaluate preliminary and intermediate results and to adjust them with other techniques. For large data sets containing millions of records this is hard toil that calls for automation. No doubt this task will be accomplished in the near future with the application of specific algorithms and neural-network-based solutions, once they become faster and smarter. The seismic acquisition industry will likely evolve towards increased computation power onboard and communication networks for exchanging data between vessels, shore, data storage and computation centers, forming an integrated system for the industry like those being developed for the space industry, the military, etc.
In the short term, SVD for noise attenuation will require highly customized software allowing intermediate results to be stored and manipulated within acceptable timelines; this may require dedicated hardware as well.
Actually, there is no need to subtract each rank-one matrix iteratively – we did it to produce and visualize intermediate results after each iteration. A second reason is that in our circumstances it gave better control over the process, letting us supervise the intermediate steps while working with the raw data.
When the number of discarded singular values is known in advance, the required matrix can be obtained by zeroing the chosen set of large singular values in Σ and re-multiplying A = U Σ V^T. Python libraries can do this relatively quickly, Python being a recognized language for researchers, scientists and engineers.
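A minimal sketch of this direct approach in our Java set-up, assuming svd and the cut-off k are defined as in the main routine above:

// Zero out the k largest singular values and re-multiply once,
// instead of subtracting rank-one matrices iteratively.
RealMatrix sigma = svd.getS().copy();
for (int i = 0; i < k; i++) {
    sigma.setEntry(i, i, 0.0);
}
RealMatrix denoised = svd.getU().multiply(sigma).multiply(svd.getVT());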
List of figures
Figure 1. SVD components shapes.
Figure 2. SVD decomposition components influence on some unit disc.
Figure 3. Marine seismic data acquisition set-up towed by vessel.
Figure 4. Coil Shooting schematics.
Figure 5. Noise classification.
Figure 6. Visualisation of raw seismic data sample in SeismiGraphix.
Figure 7. Eigenimages with coherent pattern and random noise.
Figure 8. Workflow of noise attenuation by SVD.
Figure 9. Structure of SEGY file.
Figure 10. Result after 1st iteration.
Figure 11. Result after 2nd iteration.
Figure 12. Result after 5th iteration.
Figure 13. Result after 10th iteration.
Figure 14. Result after 23rd iteration.
Figure 15. Result after 40th iteration.
Figure 16. Normalization applied on the image from 1st iteration.
Figure 17. Results of 51st iteration (with normalization).
Figure 18. Trace view mode on the raw data image.
Figure 19. Trace view mode on the data of the last iteration.
References
[1] Bekara, M. Local singular value decomposition for signal enhancement of seismic data. Geophysics, 2007.
[2] Borehav, D., Kingston, J., Shaw, P., Zeelst, J. 3D Marine Seismic Data Processing. 1991.
[3] Dondurur, D. Acquisition and Processing of Marine Seismic Data.
[4] Hart, Roger. The Chinese Roots of Linear Algebra. JHU Press, 2007.
[5] Cherney, D., Denton, T., Thomas, R., Waldron, A. Linear Algebra.
[6] Magnussen, F. De-blending of marine seismic hydrophone and multicomponent data. University of Oslo, June 2015.
[7] Moldoveanu, N. Attenuation of high energy marine towed-streamer noise. In: 2011 SEG Annual Meeting. Society of Exploration Geophysicists, 2011.
[8] Oropeza, V. The Singular Spectrum Analysis method and its application to seismic data denoising and reconstruction. University of Alberta, 2010.
[9] Roodaki, A., Bouquard, G., Bouhdiche, O. et al. SVD-based Hydrophone Driven Shear Noise Attenuation for Shallow Water OBS. In: 78th EAGE Conference and Exhibition, 31 May 2016.
[10] Saeed, Baseem S. De-noising seismic data by Empirical Mode Decomposition.
