2009 Second International Symposium on Knowledge Acquisition and Modeling

Model-Driven Data Mining in the Oil & Gas Exploration and Production
Xiongyan Li1,2, Hongqi Li1,2, Zhuang Wu3
1. State Key Laboratory for Petroleum Resource and Prospecting, Beijing, 102249, China
2. Well Logging Research Center, China University of Petroleum, Beijing, 102249, China
3. Information College, Capital University of Economics and Business, Beijing, 100070, China
e-mail: wangliaoziji@126.com
Abstract: Data mining is not an autonomous, data-driven trial-and-error process, but an interactive knowledge discovery process in which humans and machines cooperate. For this reason, domain-driven data mining has been proposed. In addition, many models exist in oil and gas exploration and production, such as geological models, logging-constrained seismic inversion models and well logging interpretation models, and they contain a wide range of domain knowledge. This paper proposes model-driven data mining for oil and gas exploration and production, with the purpose of mining actionable knowledge that benefits the exploration and production of oil and gas. The main ideas of the model-driven data mining methodology are introduced. Guided by this methodology, we demonstrate some of our work in mining four types of data: petrophysical data, logging data, seismic data and geological data. Practical work on model-driven data mining has shown that our methodology is practical and has potential for in-depth analysis of data in the exploration and production of oil and gas.

Keywords: data mining; model-driven; domain-driven; petrophysical data; logging data; seismic data; geological data

978-0-7695-3888-4/09 $26.00 © 2009 IEEE
DOI 10.1109/KAM.2009.173

1. INTRODUCTION

With the explosive growth in the capacity to generate and collect data, Knowledge Discovery from Data (KDD) arose as a way to gain useful information from the mass of data. KDD is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data [1]. Data mining emerged during the late 1980s and was treated by many people as a synonym for KDD. Others view data mining as simply an essential step in the KDD process [2-3]. The development of data mining over the last two decades can be divided into two phases. In the first phase, researchers were more concerned with data mining methods, algorithms and tools. To this end, some automatic methods, algorithms and tools were produced without human involvement and without the capability to adapt to external environment constraints. Meanwhile, the mined patterns were often impractical for business even though they were interesting. There were huge gaps between academic interest and business reality.

During the second phase, with the development and success of data mining methods, algorithms and tools, many researchers became interested in discovering actionable knowledge to fill the gap between academia and business. Actionable knowledge discovery is significant and also very challenging. It has been nominated as one of the Grand Challenges of KDD in the next ten years [4-5]. At the same time, a question arose: is the perfect data mining tool interactive or automated? Researchers began to discuss the role of human involvement in the data mining process [6]. Subsequently, domain-driven data mining was proposed. In other words, data mining is not only a data-driven trial-and-error process, but also a highly domain-dependent one [7-8]. More specifically, data mining draws on domain expertise and constraints in a human-machine cooperation context. In addition, researchers have started to work out a conceptual framework for the foundations of data mining, in order to understand the nature of data mining and the scope of data mining methods [9-11].

In the 21st century, favorable international and domestic environments have brought rapid and sustained economic growth, and have also set a new goal for the exploration and production of oil and gas. On the one hand, oil and gas resources are relatively rich in China, but the condition of the remaining oil and gas resources is deteriorating, the proportion of low-grade resources is increasing, and the remaining resources are increasingly concealed. It is therefore difficult to explore and produce the remaining oil and gas resources [12-13]. At the same time, this places higher demands on the methods and ideas of oil and gas exploration and production. More specifically, there is an urgent need for new technologies to guide the exploration and production of oil and gas resources. On the other hand, decades of oil and gas exploration and production have led to rapid growth in various types of data. However, these data are not fully utilized; that is to say, the important information has not been mined. Consequently, data mining should be applied to the data from oil and gas exploration and production, which mainly include petrophysical data, logging data, seismic data and geological data. As a result, data mining has been applied to reservoir management and description, improvement of the recovery ratio, fine exploration, and geoscience data processing [14]. A variety of models exist in oil and gas exploration and production, including geological models, logging-constrained seismic inversion models and well logging interpretation models, and they contain abundant knowledge of oil and gas exploration and production. To make better use of data mining in oil and gas exploration and production, domain knowledge and existing models should be absorbed into the mining process; we call this Model-Driven Data Mining (MDDM). This paper introduces the MDDM approach and presents the mining processes and results for a simple data type and for various data types, according to the theories and characteristics of petrophysical data, logging data, seismic data and geological data.

2. MODEL-DRIVEN DATA MINING IN THE OIL & GAS EXPLORATION AND PRODUCTION

2.1 Fundamental Concepts and Mining Ideas

Generally speaking, data mining tasks can be classified into two categories: descriptive and predictive. Descriptive mining tasks characterize the general properties of the data in a database, and predictive mining tasks aim to predict the characteristics and types of unknown data on the basis of the features and types of current data. Wang et al. assert that data mining is a process of knowledge transformation [8]. In addition, existing studies of data mining can be classified broadly under three views: the function-oriented view, the theory-oriented view and the procedure/process-oriented view [9]. Yao et al. also propose a three-layered conceptual framework, consisting of the philosophy layer, the technique layer and the application layer [10]. MDDM is based on data mining theories and techniques, and takes advantage of models derived from domain knowledge, which serve as derived parameters and boundary conditions. The aim is to obtain actionable knowledge more effectively; the specific mining ideas are shown in Fig. 1.

Figure 1. Theories and ideas of MDDM
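
As a rough illustration of the idea behind Fig. 1, the Python sketch below shows one way a domain model could supply a derived parameter and boundary conditions before a mining step searches for cutoffs. Everything in it is an assumption made for illustration and is not taken from the paper: the density-porosity relation is a standard well logging interpretation formula used here as a stand-in for "the model", and the constants, the sample readings, the 0.45 porosity bound and the function names (`density_porosity`, `within_bounds`, `accuracy`, `best_cutoffs`) are hypothetical.

```python
from itertools import product

# Hypothetical constants for a sandstone / fresh-water system (g/cm3).
RHO_MATRIX = 2.65
RHO_FLUID = 1.00


def density_porosity(rho_bulk):
    """Derived parameter from a domain model: density porosity."""
    return (RHO_MATRIX - rho_bulk) / (RHO_MATRIX - RHO_FLUID)


def within_bounds(sample):
    """Boundary condition (assumed): keep only physically plausible samples."""
    return 0.0 <= sample["porosity"] <= 0.45 and sample["resistivity"] > 0.0


def accuracy(samples, phi_cut, rt_cut):
    """Fraction of samples whose label agrees with the cutoff rule."""
    hits = sum(
        (s["porosity"] >= phi_cut and s["resistivity"] >= rt_cut) == s["is_pay"]
        for s in samples
    )
    return hits / len(samples)


def best_cutoffs(samples, phi_grid, rt_grid):
    """Exhaustively test cutoff combinations; return the most accurate one."""
    return max(product(phi_grid, rt_grid), key=lambda c: accuracy(samples, *c))


# Made-up readings: bulk density (g/cm3), resistivity (ohm.m), well test label.
raw = [
    (2.30, 20.0, True), (2.35, 15.0, True), (2.40, 12.0, True),
    (2.55, 30.0, False), (2.60, 25.0, False), (2.32, 3.0, False),
    (2.38, 2.0, False),
    (1.50, 10.0, False),  # implausible density; removed by the boundary check
]

# Step 1: the domain model turns raw data into a derived parameter.
samples = [
    {"porosity": density_porosity(rho), "resistivity": rt, "is_pay": pay}
    for rho, rt, pay in raw
]
# Step 2: boundary conditions from domain knowledge filter the data.
samples = [s for s in samples if within_bounds(s)]

# Step 3: exhaustive search over candidate cutoffs, scored by accuracy.
phi_grid = [0.05, 0.10, 0.15, 0.20]
rt_grid = [5.0, 10.0, 20.0]
phi_cut, rt_cut = best_cutoffs(samples, phi_grid, rt_grid)
score = accuracy(samples, phi_cut, rt_cut)
print(phi_cut, rt_cut, score)
```

The sketch covers only the accuracy index. Judging a model by domain knowledge and by intelligibility needs expert review and is not captured by a numeric score. When several cutoff pairs tie, `max` keeps the first pair in grid order, so the order of the grids acts as an implicit preference.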

In the oil and gas exploration and production domain, the data volume can be divided into two categories. The first category is data of a single type, such as petrophysical data, logging data, seismic data or geological data; the other category is a combination of various types of data. For example, for the same geological body, different types of data should be transformed and integrated in order to yield a uniform data platform, such as the transformation and integration of petrophysical data, logging data and well test data, or of logging data, well test data and seismic data. The aim of integration and transformation is to obtain derived or model parameters. Parameter selection means that, with the purpose of mining in mind, we use cluster analysis, association rule
and feature selection methods to obtain the typical characteristics and intrinsic links from the data. The aim is to obtain the more sensitive and crucial feature parameters as the core parameters for building the model. Model construction uses classification and induction methods, including Decision Trees (DT), Bayesian Networks (BN), Support Vector Machines (SVM), Artificial Neural Networks (ANN), Genetic Algorithms (GA), Rough Sets (RS) and Fuzzy Sets (FS). For the initial model, we can use three indexes, namely domain knowledge, intelligibility and accuracy, to evaluate its effectiveness. Domain knowledge is used to determine how much the model contributes to the mining task. After testing, we can tell clearly whether the model can effectively obtain interesting and actionable knowledge in oil and gas exploration and production. Intelligibility refers to whether the parameters used in the model have a solid physical meaning, or whether the parameter combination is merely an ordinary numeric combination. Accuracy of the model means its degree of optimization and its capacity to describe and predict the characteristics and types of current and future data. Model updating, based on the conclusions of model evaluation, relies on practical reservoir characterization and parameter significance to improve the model. Model application aims at verification and predictive analysis of the model according to the results of practical application. At the same time, well test data are combined to optimize the model during practical application.

2.2 Mining Simple Data Type

In oil and gas exploration and production, the data are mainly composed of petrophysical data, logging data, seismic data and geological data. In addition, well log data, drilling data and well test data are also included. From the perspective of the data, in that order their resolution goes from high to low and their accuracy from accurate to rough; from the point of view of the geological body described, the scale of the information goes from small to large.

1) Petrophysical Data

The research methods of physics, namely observation, experiment, induction and conclusion, are used to acquire petrophysical data, for instance the physical property data of cores, including porosity, permeability, fluid nature and fluid saturation. As conventional reservoirs that are easy to exploit have become depleted and subtle reservoirs gradually become the primary target, the study of core physical properties has developed from single measurements and statistical analysis to the interrelation and conversion among the parameters of core physical properties. Besides the study of how physical properties change under static conditions, the dynamic variation controlled by artificial factors is becoming the main focus. Therefore, the goals and tasks of mining petrophysical data can be summarized in the following three categories: (1) transformation or prediction among the different parameters of physical properties; (2) prediction of the fluid nature; (3) changing rules under static and dynamic conditions.

2) Logging Data

Although logging data are less precise and direct than petrophysical data, they can accurately reflect formation and fluid information from different angles and are of more general significance for formation evaluation and recognition. Consequently, the most prominent aim of mining logging data is to obtain the optimal model for predicting fluid nature and recognizing oil and gas reservoirs in exploration and production. For a long time, the lower limit of a reservoir has been determined with the traditional cross plot, and its effectiveness depends on the analyst's carefulness and comprehensive consideration. Additionally, a fixed mindset and too much reliance on experience may strongly influence the lower limit of the reservoir. What is more, geological conditions are so complicated that the difficulty of determining the lower limit of a reservoir is growing geometrically. It is difficult to obtain the lower limit of a reservoir from a single cross plot, and two or more cross plots are needed. Therefore, the method has some shortcomings, such as heavy workload and large errors. Fortunately, data mining techniques are based on strict mathematical logic: by exhaustively testing different parameter combinations, they pick out the combination with the highest accuracy rate as the lower limit of the reservoir. At the same time, the combination of the lower limits for different fluid natures is the predictive
model.

3) Seismic Data

Compared with petrophysical data and logging data, seismic data are poor in accuracy and resolution, but can provide extensive and rich formation information. With the emergence of three-dimensional and four-dimensional seismic data, seismic data can describe changes in hydrocarbon reservoirs under static and dynamic conditions from a spatial perspective, and have a strong ability to describe the characteristics of hydrocarbon reservoirs. Because of the large volume and complicated structure of seismic data, their mining methods and ideas are very different from those for petrophysical data and logging data. Furthermore, seismic data mining is not a purely mathematical problem; more specifically, simple mining algorithms cannot obtain actionable knowledge and effective results that benefit oil and gas exploration and production. Therefore, it is essential to set up more boundary conditions from the perspective of domain knowledge and the logging-constrained seismic inversion model, and to design the corresponding algorithms according to the special data structure in order to achieve the desired mining objectives.

4) Geological Data

Geological data take the form of symbols, words and graphs that express sedimentary facies, structural features, the distribution of source rock thickness, hydrocarbon generation ability and so on. For the different types of geological data, it is essential to pick out corresponding mining methods and ideas to acquire valid information and achieve the mining objectives. For the symbols and words of geological data, descriptive mining and text mining are applied to reveal the deeper meaning behind them. For the graphs of geological data, graph mining is used, and image fusion techniques are the main methods. Compared with petrophysical data, logging data and seismic data, the mining results of geological data are mainly descriptive. Through the extraction, transformation and integration performed by mining algorithms, geological information is expressed in a more intuitive and more readily accepted form with symbols, words and graphs, which show the sedimentary and structural characteristics of the formation.

We used graph mining methods and image fusion techniques to fuse the isogram images of the thickness and gas generation ability of source rocks; the result is shown in Fig. 2. From Fig. 2 we can obtain the relationship between the thickness and gas generation ability of shale source rocks, which offers more accurate and reliable information for selecting gas accumulation zones and for evaluation and interpretation analysis.

Figure 2. Fusing thickness and gas generation ability of source rocks

2.3 Mining Various Data Types

Petrophysical data, logging data and seismic data are all measurement data, but geological data are mainly empirical, descriptive and diagrammatic. Therefore, mining a combination of different types of data mainly means mining the data volume that fuses petrophysical data, logging data and seismic data. Because of their particular nature, geological data are regarded as the object of reference and validation. The characteristics of the different types of data are combined; they are complementary in precision and extent, and in the horizontal and vertical aspects of the formation. The combination of different types of data benefits the establishment of a precise formation framework. Additionally, the mining process for the data volume that fuses different types of data is carried out in line with the changing rules of the geological body.

3. CONCLUSIONS AND FUTURE WORK

The concept of MDDM in oil and gas exploration and production is proposed in this paper. From the model-driven perspective, the extent to which the model, which incorporates the domain knowledge, takes part in the mining process plays a vital role in determining the reliability and practicability of the mining outcome. If
excessive domain knowledge is involved in the mining process, the mining results will become less accurate, which will affect reliability. However, if too little domain knowledge takes part in the mining process, more useless information will be produced, which will restrict the practicability of the mining results. Therefore, it is essential to combine domain knowledge with mining methods rationally. Only in this way can we obtain more outcomes with higher speed, better quality and lower cost.

Data mining techniques change the status of data from an indirect and auxiliary means to a direct and major method. Furthermore, they can directly guide oil and gas exploration and production, namely “data exploration” and “data production”. The contradiction between the supply of and demand for oil and gas is increasingly prominent, which forces us to find the internal, unknown links and distribution rules of the remaining oil and gas reservoirs. At the same time, this need will stimulate the rapid development of oil and gas data mining and promises a splendid future for oil and gas exploration and production, especially by providing new methods and ideas for the exploration and production of subtle reservoirs.

ACKNOWLEDGEMENT

This work is partially supported by a grant from the National Key Technologies R & D Program of China during the 10th Five-Year Plan Period (No.2001BA605A09). The authors would like to thank Penny for her kind assistance and support.

REFERENCES

[1] U.M. Fayyad, G. Piatetsky-Shapiro and P. Smyth. From Data Mining to Knowledge Discovery: an Overview, in: Advances in Knowledge Discovery and Data Mining, U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth and R. Uthurusamy (eds.), AAAI/MIT, Menlo Park, CA, pp. 1-34, 1996.
[2] Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques, Second Edition, Beijing: China Machine Press, 2006.4.
[3] M.S. Chen, J.W. Han and Philip S. Yu. Data Mining: An Overview from a Database Perspective, IEEE Transactions on Knowledge and Data Engineering, Vol. 8, No. 6, pp. 866-883, December 1996.
[4] L.B. Cao and C.Q. Zhang. Domain-Driven Actionable Knowledge Discovery in the Real World, W.K. Ng, M. Kitsuregawa, and J. Li (eds.): PAKDD 2006, LNAI 3918, pp. 821-830, 2006.
[5] Z.Y. He, X.F. Xu, and S.C. Deng. Data Mining for Actionable Knowledge: A Survey, Technical Report: arXiv:cs/0501079, 2005.
[6] Panel members. The Perfect Data Mining Tool: Automated or Interactive?, in Panel at SIGKDD-2002, Edmonton, Canada, 2002.
[7] L.B. Cao, L. Lin and C.Q. Zhang. Domain-Driven In-Depth Pattern Discovery: A Practical Methodology, Australian Data Mining Conf, 2005.
[8] G.Y. Wang and Y. Wang. Domain-Oriented Data-Driven Data Mining: A New Understanding for Data Mining, Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition): 20(3), pp. 266-271, 2008.
[9] Y.Y. Yao, N. Zhong and Y. Zhao. A Conceptual Framework of Data Mining, Studies in Computational Intelligence (SCI) 118, pp. 501-515, 2008.
[10] Y.Y. Yao, N. Zhong and Y. Zhao. A Three-layered Conceptual Framework of Data Mining, Proceedings of ICDM’04 Workshop of Foundation of Data Mining, pp. 215-221, 2004.
[11] Y.Y. Yao. A Step Towards the Foundations of Data Mining, Data Mining and Knowledge Discovery: Theory, Tools, Technology V, B.V. Dasarathy (ed.), The International Society for Optical Engineering, pp. 254-263, 2003.
[12] H. Qu, W.Z. Zhao and S.Y. Hu. Oil & Gas Resources Status and the Exploration Fields in China, China Petroleum Exploration, 2006(4): pp. 1-5.
[13] J.P. Pan and Z.J. Jin. Potentials of petroleum resources and exploration strategy in China, Acta Petrolei Sinica, 2004, 25(2): pp. 1-6.
[14] M. Stundner and J. S. Al-Thuwaini. How Data-Driven Modeling Methods Like Neural Networks can Help to Integrate Different Types of Data into Reservoir Management, SPE68163, 2001.