Niklas Borwell
Bo Lee
Klas Eurenius
Project in Computational Science: Report
January 2012
PROJECT REPORT
Institutionen för informationsteknologi (Department of Information Technology)
Abstract
We have created a tool in C# that looks for geological structures in 2D
seismic images, the main motivation being its large potential for time
and cost savings in the process of finding oil. Today, image screening is
still done manually, while the amount of acquired geological data grows
faster than ever.
Using computer-assisted image analysis to screen seismic images is a
largely unexplored field of research, so we apply well-known image processing
and analysis theory. For image preprocessing this includes intensity
level thresholding, frequency filtering and mathematical morphology; for
analysis and classification we use skeletonization, distance transformation
and property measuring.
Considering that this work is an early-stage prototype, the results are
very promising. The produced software tool succeeds in finding and classifying
objects correctly, except for some problems that occur when different
image objects are tightly connected. Processing a "standard-sized" single
image takes from 0.2 s to 10 s depending on how much information it
contains, noting that this is on an AMD Athlon X2 @ 2 GHz and that the
code is not optimized.
Acknowledgements
We wish to thank supervisor Sergio Courtade from Schlumberger Ltd. for
providing guidance and expertise in geology and, specifically, in seismic
image interpretation. We also thank course coordinator Maya Neytcheva
for making this project possible at all.
Contents
1 Introduction
  1.1 Project description
  1.2 Confusing terminology
2 Background
  2.1 Seismic data format
  2.2 Computer aided search
  2.3 Image interpretation
3 Methodology
4 Theory
  4.1 Objects
  4.2 Threshold
  4.3 Closing operation
  4.4 Bounding box
  4.5 Skeleton representation
  4.6 Constrained distance transform
  4.7 Elongatedness measure
5 Implementation
  5.1 Mark interesting areas
  5.2 Merge areas into objects
  5.3 Classifying objects
  5.4 Vector representation
    5.4.1 Skeletonization
    5.4.2 Path selection
    5.4.3 Path sampling
  5.5 Separation problem
6 Results
  6.1 Main images
  6.2 Other images
  6.3 Speed
7 Further development
  7.1 3D
  7.2 Use more data
  7.3 Complementing segmentation methods
  7.4 Employ an advanced classification tool
  7.5 Parallelization and optimization
8 Conclusion
9 References
1 Introduction
As it becomes increasingly difficult to find profitable deposits of oil to explore,
the accuracy of the information regarding promising geological sites becomes
more and more important. Huge amounts of seismic data are collected every
week, but the manpower and time needed to analyze all the data manually
impose a huge cost. A computerized, automated analysis tool is therefore in
great demand.
Object refers to an image object, meaning a feature in the image that has
been segmented (see Section 4.1). Geological objects are the actual physical
objects represented in the unprocessed images, such as channels, deposits and
similar features that we are interested in locating.
Channel will mainly refer to elongated geological objects, like old riverbeds
or valleys. Sometimes channel will instead refer to an information channel from
the measuring equipment that was used for a particular picture. For example,
the data we have used comes from the acoustic amplitude reflection channel of
the seismological measuring equipment, or channel might refer to the red color
channel in an RGB image. In these cases we clarify what kind of channel we
refer to, and if we don't have a better name for it, we call it an "information
channel".
2 Background
2.1 Seismic data format
The subsurface is mapped to digital 3D volumes using sound waves: onshore
by special vibrator trucks with an array of receiving geophones, offshore by
boats with an air gun generating the vibrations. The transmitted waves are
reflected back when they hit density boundaries caused by, e.g., changes in
rock type or fluid content. With this method, the signal's travel time is closely
related to subsurface depth (Satterfield/Whiteley).
When interpreting, the recorded volume is often sliced vertically, horizontally
or along a "horizon". The latter represents a line or surface corresponding to
a density border. For this project, however, we work solely with horizontal
slices, more commonly known as "time slices". The data is colored red and
blue where high-amplitude reflections occur (Figure 2). In these images we
look for:
• channels,
• deposits,
• structural reflections.
(a) 3D volume illustration with stacked time slices. The time resolution of the original data set is 2 ms.
(b) Time slice where a channel can be seen. We work with this kind of image.
(c) Vertical slice of the 3D volume. Horizons are the red and blue lines running horizontally through the image.
(a) Image with three different object types from our dataset.
(b) Objects outlined. Red: deposit, green: structural reflection, blue: channel.
3 Methodology
This project is not based on or built upon any previous work in this area, so we
build our tool using well-known image analysis algorithms programmed in high-
level languages. In general, our method of development is the following:
1. Use MATLAB and its Image Processing Toolbox to try different image
analysis methods. The Image Processing Toolbox offers a comprehensive
set of image analysis functions and allows for a very quick start-up phase
in the development process.
2. When we feel confident that we have found a working method, the MATLAB
code is migrated to C#. For image processing and analysis we use EmguCV,
a .NET wrapper for the Intel OpenCV image processing library (for C/C++).
While searching for suitable image processing libraries we found that OpenCV
is, arguably, a de facto standard in image processing and computer vision
implementations. EmguCV/OpenCV does not, however, cover everything
MATLAB does, which forces us to implement many algorithms ourselves.
The choice of C# (instead of, e.g., C) is solely because Schlumberger's
software platform Petrel nicely allows for .NET plugins and apps. While our
tool is not a Petrel application, this was considered at an early stage.
Furthermore, writing it in C# will facilitate any attempt to integrate it
with Petrel in the future.
3. Final evaluation of results and computation times. Even though speed is
not part of our goals, the problem formulation has an inherent speed
concern: the tool needs to outperform human screening.
4 Theory
This section aims to explain certain image analysis concepts and methods used
in the implementation.
4.1 Objects
We often describe a binary image (i.e. an image with only two colors: black
and white) in terms of objects and background. An object in this context is
defined as a connected white area, while all black pixels are considered part of
the background. An example of this concept is shown in Figure 4.
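The definition above can be made concrete with a small labeling routine. The following is an illustrative sketch in Python/NumPy (the tool itself is written in C# with EmguCV); it labels white areas by flood fill, and the choice of 4-connectivity is an assumption made for the example:

```python
from collections import deque
import numpy as np

def label_objects(binary):
    """Label 4-connected white (True) areas; background pixels stay 0."""
    labels = np.zeros(binary.shape, dtype=int)
    current = 0
    for start in zip(*np.nonzero(binary)):
        if labels[start]:
            continue  # pixel already belongs to an earlier object
        current += 1
        labels[start] = current
        queue = deque([start])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if (0 <= nr < binary.shape[0] and 0 <= nc < binary.shape[1]
                        and binary[nr, nc] and not labels[nr, nc]):
                    labels[nr, nc] = current
                    queue.append((nr, nc))
    return labels, current

img = np.array([[1, 1, 0, 0],
                [0, 1, 0, 1],
                [0, 0, 0, 1]], dtype=bool)
labels, n = label_objects(img)
# n == 2: the image contains two separate objects
```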
4.2 Threshold
Thresholding is the most common way to select parts of an image. In a normal
8-bit grayscale image, each pixel has a brightness value between 0 and 255.
When we apply a threshold, we simply select those pixels whose brightness
value is larger than a certain threshold value, and create a new binary image
where the selected pixels are set to white and the rest to black. An example
of this technique with different threshold values is shown in Figure 5.
By using color space transformations this can be extended to a more general
concept. By converting the image to the HLS (Hue-Lightness-Saturation) color
space, the saturation and hue are also represented as grayscale images. So by
applying a threshold to, for instance, the saturation channel, we can select
saturated areas.
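As a minimal sketch of the thresholding step (illustrative Python/NumPy rather than the tool's C#; the threshold value 167 echoes the figure example):

```python
import numpy as np

def threshold(gray, t):
    """Return a binary image: True where the pixel value exceeds t."""
    return gray > t

gray = np.array([[  0,  50, 170],
                 [200, 120, 255]], dtype=np.uint8)
mask = threshold(gray, 167)
# pixels with values 170, 200 and 255 are selected
```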
(a) Original image (b) Result of a threshold at 167
(a) Input image with three disconnected white areas.
(b) Image after dilation. The white areas have been connected.
(c) Image after erosion. Notice that the white areas remain connected while
the overall shape of the original areas is preserved.
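The dilation-erosion sequence illustrated above can be sketched in a few lines. This is an illustrative Python/NumPy sketch (the tool itself uses EmguCV/OpenCV), and the 3x3 square structuring element is an assumption made for the example:

```python
import numpy as np

def dilate(img, size=3):
    """Binary dilation with a size x size square structuring element."""
    pad = size // 2
    padded = np.pad(img, pad, constant_values=False)
    out = np.zeros_like(img)
    for dr in range(size):
        for dc in range(size):
            out |= padded[dr:dr + img.shape[0], dc:dc + img.shape[1]]
    return out

def erode(img, size=3):
    """Binary erosion: a pixel survives only if its whole neighborhood is white."""
    pad = size // 2
    padded = np.pad(img, pad, constant_values=False)
    out = np.ones_like(img)
    for dr in range(size):
        for dc in range(size):
            out &= padded[dr:dr + img.shape[0], dc:dc + img.shape[1]]
    return out

def close(img, size=3):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(img, size), size)

# Two white areas one pixel apart are merged by a 3x3 closing.
img = np.zeros((5, 7), dtype=bool)
img[1:4, 1:3] = True
img[1:4, 4:6] = True
closed = close(img)
# closed[2, 3] is now True: the one-pixel gap has been filled
```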
elongatedness = area / (2d)²
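With d taken as the number of erosion steps until the object disappears (i.e. half the object's maximum thickness, following Sonka et al.), the measure can be sketched as follows. This is an illustrative Python/NumPy sketch; the 3x3 square structuring element for the erosion is our assumption:

```python
import numpy as np

def erode(img):
    """One step of binary erosion with a 3x3 square structuring element."""
    padded = np.pad(img, 1, constant_values=False)
    out = np.ones_like(img)
    for dr in range(3):
        for dc in range(3):
            out &= padded[dr:dr + img.shape[0], dc:dc + img.shape[1]]
    return out

def elongatedness(obj):
    """area / (2d)^2, where d is the number of erosions until the object vanishes."""
    area = int(obj.sum())
    d = 0
    while obj.any():
        obj = erode(obj)
        d += 1
    return area / (2 * d) ** 2

# A 3x20 stripe is thin and long, giving a large elongatedness value.
stripe = np.zeros((7, 24), dtype=bool)
stripe[2:5, 2:22] = True
e = elongatedness(stripe)
# area 60, two erosions to vanish: 60 / (2*2)**2 == 3.75
```

A compact 6x6 square scores exactly 1.0 under the same measure, so the value cleanly separates elongated channels from compact deposits.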
(a) Original image. (b) Image with bounding box in red.
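Computing the bounding box of a binary object reduces to taking the minimum and maximum coordinates of its white pixels (illustrative Python/NumPy sketch, not the tool's C# code):

```python
import numpy as np

def bounding_box(obj):
    """Smallest axis-aligned rectangle enclosing all white pixels.
    Returns (row_min, row_max, col_min, col_max), inclusive."""
    rows, cols = np.nonzero(obj)
    return rows.min(), rows.max(), cols.min(), cols.max()

obj = np.zeros((6, 8), dtype=bool)
obj[2, 3] = obj[4, 1] = obj[3, 6] = True
box = bounding_box(obj)
# box == (2, 4, 1, 6)
```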
5 Implementation
The input image used to demonstrate the steps of the program is shown in
Figure 10. The main algorithm consists of the following steps.
(a) Input image with a branching line and a blue pixel as the starting point.
(b) Result of the constrained distance transform.
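The constrained distance transform illustrated above can be sketched as a breadth-first search that only steps between white pixels, so the distance is measured along the object rather than across the background. Illustrative Python sketch, assuming 4-connectivity and unit step cost:

```python
from collections import deque
import numpy as np

def constrained_distance(mask, start):
    """Geodesic distance from `start`, counting 4-connected steps that never
    leave the white region; unreachable and black pixels get -1."""
    dist = np.full(mask.shape, -1, dtype=int)
    dist[start] = 0
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                    and mask[nr, nc] and dist[nr, nc] < 0):
                dist[nr, nc] = dist[r, c] + 1
                queue.append((nr, nc))
    return dist

# An L-shaped line: the straight-line distance between the ends is short,
# but the constrained distance follows the line itself.
mask = np.zeros((5, 5), dtype=bool)
mask[0, 0:5] = True   # horizontal arm
mask[0:5, 4] = True   # vertical arm
dist = constrained_distance(mask, (0, 0))
# dist[4, 4] == 8: four steps right, then four steps down
```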
5. Vector representation.
The algorithm uses four user-defined search parameters, which make it possible
to adapt the algorithm to different kinds of images:
• A merging parameter which defines how dense the interesting areas are.
• A size limit that defines the smallest objects we are interested in.
Figure 10: Original image. We want to find and mark the channel and the
deposit.
same properties (see Section 4.3). The result is therefore often a compromise
between the two. Possible solutions to this problem are discussed in Section 7.
The technique used to merge areas is a closing operation (see Section 4.3).
The size of the closing operation is controlled by a search parameter which
defines the minimum distance between two areas for them to be considered
part of the same object. The output of this step is a binary image consisting
of objects and background (see Section 4.1), see Figure 12.
Figure 11: Output of step 1. Interesting areas are marked in white.
with our own observations. A lot of improvement in this area is possible and
discussed further in Section 7. The output of this step is a list of interesting
objects with measurements and classification result (see Figure 13).
Figure 12: Output of step 2. Interesting areas are merged into objects.
5.4.1 Skeletonization
The first step is to create a skeleton representation of the object (see Section
4.5). The resulting image has a centerline running through the object along
with a number of branches (see Figure 14).
1. One of the endpoints of the centerline touches the bounding box of the
object (see Section 4.4).
2. The centerline is the longest possible line in the skeleton that satisfies
criterion 1.
Figure 13: Output of step 3. Uninteresting objects are removed and remaining
objects are classified.
Figure 14: Skeleton representation of a channel.
3. Select the point with the highest value found in the distance transform
images.
The traced path will be the assumed centerline of the object (see Figure 15).
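The path selection steps above can be sketched by combining a constrained distance transform from one endpoint with a greedy walk back from the farthest reachable point. This is an illustrative Python/NumPy sketch under simplifying assumptions (4-connectivity, a unique farthest point), not the tool's C# implementation:

```python
from collections import deque
import numpy as np

def geodesic(mask, start):
    """BFS distance inside the white region (4-connected); -1 elsewhere."""
    dist = np.full(mask.shape, -1, dtype=int)
    dist[start] = 0
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                    and mask[nr, nc] and dist[nr, nc] < 0):
                dist[nr, nc] = dist[r, c] + 1
                queue.append((nr, nc))
    return dist

def trace_centerline(mask, start):
    """Pick the skeleton pixel farthest from `start` and walk back along
    strictly decreasing distance values: the assumed centerline."""
    dist = geodesic(mask, start)
    end = np.unravel_index(np.argmax(dist), dist.shape)
    path = [end]
    while dist[path[-1]] > 0:
        r, c = path[-1]
        # step to any 4-neighbour exactly one unit closer to the start
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < mask.shape[0] and 0 <= nc < mask.shape[1]
                    and dist[nr, nc] == dist[r, c] - 1):
                path.append((nr, nc))
                break
    return path[::-1]  # ordered start -> end

# A line with a short side branch: the walk follows the longest path
# and never enters the branch.
mask = np.zeros((5, 7), dtype=bool)
mask[2, 0:7] = True   # main line
mask[0:2, 3] = True   # branch
line = trace_centerline(mask, (2, 0))
# line runs from (2, 0) to (2, 6) along the main line
```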
Figure 15: Assumed centerline.
The most difficult problem in the algorithm appears in Section 5.2: merging
areas into objects. The different objects we wish to distinguish between usually
share the same appearance and properties, which makes direct classification
impossible. This is further complicated by the fact that they often overlap, see
Figure 17. Different objects are sometimes more strongly connected to each
other than the parts of a single object are, which also makes a perfect
separation based on connectedness impossible.
The issue comes from the fact that we are looking at 2D slices of a complete
3D volume, which sometimes makes it hard even for a person to distinguish
between different objects. Possible solutions for this problem are discussed in
Section 7.
Figure 16: Channel object with corresponding vector representation.
6 Results
In this section we consider the output of two different types of images. The first
we call "main images" because the program development is almost exclusively
based on this type. The other image type is discussed in "Other images" (Section
6.2), and we use it to demonstrate some of the tool's flexibility. Last, a
discussion about speed is included.
Figure 17: Illustration of separation problem. The channel (orange arrow) is
strongly connected to the deposit (blue arrow) which is further connected to the
structural reflections (green arrows).
coincidental case, because the correct channel path just happens to be slightly
longer than the second-longest endpoint-to-endpoint path. Unfortunately this
cannot be guaranteed to hold in the general case.
6.3 Speed
In the long run, beyond our prototype program, speed is of utmost importance.
Due to time constraints we have not optimized our code with respect to speed,
but we present some figures here to show that automatic segmentation and
classification can become a huge benefit in the future.
The computations are done on an Athlon X2 2.0 GHz (2005) CPU, and the
code is not in any way optimized for this CPU type. A single original-size
(557x475 pixels) image takes up to 10 s to process if there are objects present.
(a) Modified image where the different object types have been separated.
(b) Unmodified image where all objects are connected.
(c) Object 1 (class: channel, elongatedness: 18.4, size: 43·10³, length: 813).
Object 2 (class: deposit, elongatedness: 5.2, size: 14·10³, length: 813).
(d) Channel and deposit are segmented as one image object. Note that the
vector representation is the same as in (a) because of the longest path
between endpoints.
On the other hand, an "empty" image takes only a fraction of a second to
process (see Figure 20).
We note that the images we are working with are high resolution compared
to the amount of information (useful to us) they contain. As the computation
time is linear in the image's pixel count (see Table 1), we can make use of
downscaling to gain speed. As an example, our main image (Figure 18) gives
the same segmentation even when downscaled to 25% of its original size.
Downscaling would, however, have to be a rather dynamic step because the
width of channels varies greatly.
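Since the computation time scales with the pixel count, a 2x2 block average quarters the work while preserving coarse structure. A minimal illustrative sketch in Python/NumPy (a production version would choose the factor adaptively, as noted above; this one assumes even image dimensions for simplicity):

```python
import numpy as np

def downscale2x(gray):
    """Halve each dimension by averaging 2x2 blocks (even dims assumed)."""
    h, w = gray.shape
    blocks = gray[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))

gray = np.arange(16, dtype=float).reshape(4, 4)
small = downscale2x(gray)
# small has shape (2, 2): one quarter of the pixels remain
```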
Finally, we state that these speed results show an obvious advantage of using
machines instead of humans for image screening. There are also many options
to optimize the performance of our implementation even further, e.g.
optimizing for the CPU, optimizing the image processing algorithms, or simply
moving to a more modern computer.
(a) Input image. Alternative image type from Schlum-
berger.
(a) 4.5 s to process this image with objects present.
(b) 0.3 s to process an image with nothing to classify.
Figure 20: Computation times for images with and without objects.
7 Further development
This chapter discusses some ideas we have for further development, some of
the issues we have not yet been able to solve, and our ideas for solving them.
Table 1: Computation times at different resolutions.
Resolution (pixels)   Relative size (%)   Time (s)
278x237               25                  ≈ 1
417x356               50                  ≈ 2
557x475               100                 ≈ 4-5
1114x950              200                 ≈ 15-18
7.1 3D
Our program is developed to work on 2D data slices from a large 3D data set,
handling one image at a time: segmenting and classifying the objects we think
are interesting and marking them before moving on to the next one. This
process is repeated for every slice in the dataset until the whole volume is
analysed.
The same methods that are used to analyse the 2D slices could be generalised
to work on the 3D volume. The reason for doing the analysis in 2D in the first
place is inherited from manual analysis: it is difficult for humans to see through
a 3D volume, so the volume is sliced to allow it to be properly examined. This
limitation does not apply to computers.
We think that analysing the data set in 3D is preferable to 2D for several
reasons. The main reason is that having the whole 3D object gives us much more
information and would render otherwise quite difficult operations unnecessary.
When handling 2D slices we have to take into account that the seismological
objects we are looking for are not flat but stretched over several time slices,
making it necessary to track the objects through the dataset in order to
determine how many different objects are actually present in the volume.
Using 3D, this would become unnecessary as each object would be segmented
as a whole from the start. Having access to full 3D objects could also be
helpful when trying to differentiate between geological objects that, for some
reason, are connected to each other as image objects.
We discussed in Section 6.1 the problem of having two different kinds of
objects closely connected, with no image information that clearly separates
them (Figure 17). This connection, however, is not present in all of the images
in our dataset, meaning that a human analyst can say with relative certainty
that those are two separate objects. If we examine the 3D mock-up (Figure 21),
created by interpolating our nine-time-slice dataset to give it a bit more
volume, we can quite easily see that the deposit, even though it is connected
to the channel, is probably not a part of the channel. More importantly, it
will be easier for the computer to make this discovery. This kind of
interpolation would of course be unnecessary if the whole 3D volume were used.
Structural reflections are a major problem when processing and analysing 2D
slices (see Section 2.3); handling the data volume in 3D will ease this problem.
Figure 21: 3D mock-up of the channel and the deposit. In this image it is
possible to see that the deposit (blue area on the right) is probably not a part
of the channel.
Principal component analysis (PCA) can be used in conjunction with a
suitable classifier to create a powerful classification tool. PCA transforms the
parameter space into an eigenvector space based on maximum variance along
each eigenvector. This transformation reduces the number of dimensions to
consider when classifying a dataset, making it easier and faster to do so (Jolliffe
2002, 1-6).
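The transformation described above can be sketched via an eigendecomposition of the covariance matrix of the measured object properties. Illustrative Python/NumPy sketch with made-up data (the feature names in the comment are examples, not our actual parameter set):

```python
import numpy as np

def pca(X, n_components):
    """Project rows of X onto the top principal components.
    X: (n_samples, n_features); returns (scores, components)."""
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # sort by variance, descending
    components = eigvecs[:, order[:n_components]]
    return Xc @ components, components

# Measurements (e.g. size, elongatedness, length) that vary mostly along
# one direction collapse onto a single component with little loss.
rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, 2 * t + 0.01 * rng.normal(size=100), 3 * t])
scores, comps = pca(X, 1)
# scores.shape == (100, 1): three correlated features reduced to one
```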
We mentioned a "suitable" classifier above instead of naming one. This is
because different methods are suitable for different types of data, and before a
classifier is chosen we need to know very well how the measured data is
distributed. A supervised classifier is probably the best choice, though, as it
can be tailored to the data. This means that the classifier needs to be taught
how to behave, and to that end a large quantity of data is needed.
8 Conclusion
We have met the goals we set at the beginning of this project. We have created
a prototype program that can segment, detect and, to some degree, classify
interesting objects in seismic images. The program is tuned to work on
color-coded acoustic amplitude reflection images but can be adapted to other
images by changing a few parameters.
Two major concerns remain that we have not been able to resolve within the
scope of this project: the separation of geological objects that should not be
connected, and the lack of an advanced classification tool. The program is by
no means a complete analysis tool and there is a lot of room for further
development; we have given a few suggestions to this end in this report.
9 References
Satterfield, Dorothy and Whiteley, Martin. Seismic Interpretation and Subsurface
Mapping. http://www.derby.ac.uk/files/seismic interpretation.ppt (January 12,
2012).
Jolliffe, I.T. 2002. Principal Component Analysis. 2nd Edition. New York:
Springer-Verlag New York Inc.
Sonka, Milan, Hlavac, Vaclav and Boyle, Roger. 2008. Image Processing,
Analysis, and Machine Vision. 3rd Edition. Stamford: Cengage Learning.