
SPE 122186

Selection of Infill Drilling Locations Using Customized Type Curves


A. Al-Kinani, G. Nunez, M. Stundner, G. Zangl, O. Iskandar; SPE, Schlumberger; T. Mata, S. Cottone, J. Cavero;
SPE, YPF

Copyright 2009, Society of Petroleum Engineers

This paper was prepared for presentation at the 2009 SPE Latin American & Caribbean Petroleum Engineering Conference, Cartagena, Colombia, 31 May - 3 June 2009

This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents of the paper have not been
reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect any position of the Society of Petroleum Engineers, its
officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written consent of the Society of Petroleum Engineers is prohibited. Permission to
reproduce in print is restricted to an abstract of not more than 300 words; illustrations may not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.

Abstract

This paper discusses a new workflow to stochastically estimate the performance of infill locations in a mature oil or gas
field. Performance evaluations for infill wells are usually conducted using either highly generalized statistical methods or
numerical simulation. Both approaches have a significant drawback: the former is quick but very often lacks accuracy, while
the latter is very accurate but usually complex to set up and compute.
The presented workflow is a new approach to infill well performance prediction that combines speed and reasonable
accuracy. The workflow generates a set of key performance indicators (KPIs) of existing wells derived from historic dynamic data
(fluid production rates, pressures, etc.), static data (reservoir properties, etc.) and predicted data (simplified production
forecasts). The wells are then grouped according to the similarity of their KPIs. The production profiles of the wells within the
same group are combined into a type curve that is described by the most likely production profile and an associated uncertainty
range.
A data-driven expert system is used to identify and capture the correlations between parameters such as geographic location,
well spacing and reservoir properties and the group membership (equivalent to a type curve). This expert system can then be applied
to any location in the field to determine the most likely group membership of a potential infill well. The classification
of an infill well into a group is not necessarily unique; the expert system might classify an infill well into several groups
and assign a probability of occurrence to each of them. A Monte Carlo routine is then applied to forecast the
performance of the infill locations, honoring the respective probability of occurrence of each type curve.
The presented approach has been successfully applied for infill well selection in a statistical field development study for
YPF in the Argentinean San Jorge Basin.

Introduction

Despite the long production history with more than 20 000 wells, the current decision-making methodology still takes a
long and costly path consisting of testing, plugging and stimulation procedures. The authors believe that, in parallel to the
reservoir modeling studies, YPF can make use of the immense array of well data and the experience acquired over one century of
production history to produce a set of innovative practices that boost the efficiency of its current operations.
A statistical approach based on emerging computing tools for data mining can therefore assist YPF in
recognizing patterns and developing methodologies that are robust enough to tackle the aforementioned technical challenges,
transforming these opportunities into real bottom-line results.
The goal of the study is to generate sound technical arguments for formulating an innovative strategy to accelerate the
exploitation of the oil and gas assets of YPF in the San Jorge Basin, Argentina. The workflows of interest are to
• identify candidates for infill drilling locations,
• propose a field development strategy based on lessons learned from the past in order to know the size of the
business from an economic point of view,
• identify benefits through optimizing infill locations using data mining methodology.
The objective of this paper is to present the methodology and the results obtained during the analysis phase as a value
proposition for field development.

Workflow overview

The objective of the workflow is to select the best infill locations in the study area using an integrated data mining
workflow. The main focus is to systematically investigate past production performance and use identified trends to predict the
future performance of the existing wells and of infill locations.
A dataset of a study area was provided. The dataset contained mainly information about oil, gas and water production as
well as water injection volumes. Very little petrophysical information was available, and it showed no representative areal
trend. The presented workflow takes advantage of the available production data and systematically investigates past
production performance using statistical indicators as well as significant key performance indicators (KPI). The predicted
future performance is determined with a simplified well forecast based on a hyperbolic decline curve, leading to a set of future
performance related KPIs. These two sets of KPIs are combined in an expert system to draw quick decisions about the future
development potential in a certain area of the field. The result of this workflow is a list of infill locations and their predicted
performance. As a concluding point in this workflow a comprehensive reasoning logic will score each infill location according
to its performance considering the expert knowledge from the engineers who operate the field.

Data Screening and Outlier Identification


The first step was to analyze the available production and injection data to identify statistical patterns and outliers. Static key
performance indicators (KPIs) describing the past performance of each well were determined, such as ‘initial oil rate’,
‘cumulative oil production (after a certain production period)’, ‘cumulative liquid production (after a certain production
period)’, etc. The existing wells were subdivided into several peer groups according to the area and the time
interval in which they started production. The KPIs of the wells were hence not only investigated and compared on a fieldwide
level but also categorized in peer groups that should show similar behavior.
The statistical approach of this workflow requires a dataset that is as consistent and free of outliers as possible. Many of the
tasks in this workflow require areally interpolated data, so outliers would significantly distort the interpolation results
and might lead to wrong conclusions. Several statistical tools were used to spot outliers in the dataset. Cross-plots, cumulative
frequency distribution plots and histograms were used for an initial outlier screening. At a later stage data-driven tools
such as self-organizing maps (SOM) and clustering algorithms were used to finalize the outlier search (Zangl, 2003).
Figure 1 shows a histogram of the ‘initial oil rate’ KPI in a certain area of the field. The categories represent the magnitude
of the oil rates (category one represents the lowest, category ten the highest oil rates). The histogram shows that the
wells in that area follow a lognormal distribution of the initial oil rate. The wells in category nine can be identified as outliers
because they are not covered by the lognormal curve.

[Figure 1 plot: histogram titled ‘Initial oil rate’; axes: Frequency (0 to 0.2) vs. Category (1 to 10)]

Figure 1: Histogram of initial oil rates in a certain area of the field. Categories represent oil rates, where category 1
represents lowest oil rates and category 10 the highest.
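
The outlier screening described above relied on cross-plots, cumulative frequency plots and histograms. As a rough numerical counterpart, the following minimal sketch flags wells whose initial oil rate lies far from the bulk of a lognormal population. The robust z-score rule, the threshold of 3 and the rate values are assumptions made for illustration and are not taken from the study.

```python
import numpy as np

def flag_lognormal_outliers(rates, z_max=3.0):
    """Flag rates whose logarithm lies more than z_max robust standard
    deviations from the median of the log-rates (hypothetical rule)."""
    logs = np.log(np.asarray(rates, dtype=float))
    median = np.median(logs)
    # Median absolute deviation, scaled to match the standard deviation
    # of a normal distribution.
    mad = 1.4826 * np.median(np.abs(logs - median))
    if mad == 0.0:
        return np.zeros(logs.shape, dtype=bool)
    return np.abs(logs - median) / mad > z_max

# Hypothetical initial oil rates of the wells in one peer group.
initial_oil_rates = [12.0, 15.0, 9.0, 20.0, 14.0, 11.0, 18.0, 13.0, 400.0]
flags = flag_lognormal_outliers(initial_oil_rates)
print(flags)
```

Working on the logarithm of the rate keeps the rule consistent with the lognormal shape seen in the histogram; flagged wells would still need an engineering review before being excluded.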

Figure 2 shows a map of the study area. The color represents the time interval in which a well came on-stream.
The size of the bubbles represents the initial oil rate (the bigger the bubble, the higher the oil rate). Singularities such
as extremely good wells in regions where performance is rather poor can be identified and further analysis of the production
can be initiated.

[Figure 2 plot: bubble map; axes: Y-coordinate vs. X-coordinate]
Figure 2: Geographic bubble map depicting the time interval (color) and the initial oil rates (size)

Classification of wells
As mentioned before, the performance of each well is described by two sets of KPIs: the historic KPIs, calculated from
historic production information, and the future KPIs, calculated from predicted performance. The predicted performance is
determined using decline curve analysis.
Both KPI sets combined give multiple parameters to describe the performance of every well. After a statistical analysis, a
subset of 9 KPIs was identified that has the biggest influence on the classification of each well. A multidimensional
clustering algorithm (SOM) is used to sort the data according to their features.
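
For orientation, the hyperbolic decline curve used for the predicted KPIs can be written in the form given by Arps (1945), which is listed in the references. The sketch below is a minimal illustration; the parameter names qi (initial rate), di (initial decline rate) and b (hyperbolic exponent) and the example values are assumptions, and time and rate units only need to be consistent with each other.

```python
import numpy as np

def hyperbolic_rate(t, qi, di, b):
    """Arps hyperbolic rate q(t) = qi / (1 + b*di*t)**(1/b), for b > 0."""
    t = np.asarray(t, dtype=float)
    return qi / (1.0 + b * di * t) ** (1.0 / b)

def hyperbolic_cumulative(t, qi, di, b):
    """Cumulative production from time 0 to t, valid for 0 < b < 1."""
    t = np.asarray(t, dtype=float)
    return qi / ((1.0 - b) * di) * (1.0 - (1.0 + b * di * t) ** (1.0 - 1.0 / b))

# Hypothetical well: 100 rate units initially, 10 % initial decline per period.
months = np.arange(0, 37)
rates = hyperbolic_rate(months, qi=100.0, di=0.1, b=0.5)
cumulative = hyperbolic_cumulative(months, qi=100.0, di=0.1, b=0.5)
```

Future KPIs such as the cumulative oil production after a given period can then be read directly from the cumulative curve.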

The 15 clusters detected in this analysis represent groups of similar wells, which means that the
SOM has significantly reduced the amount of data from more than 700 individual wells to 15 groups of similarly behaving
wells. Each of these 15 groups is clearly defined by particular features that vary significantly from one group to the other (e.g.
significantly different initial oil rates, initial water cut, etc.) but vary only very slightly for the wells within a particular
group. Figure 3 shows the result of the clustering process in a cross-plot. The plot shows the hyperbolic exponent from the
decline curve analysis (predicted KPI) vs. the initial oil rate (historic KPI) for four different clusters. The color of the points
represents the cluster membership. As can be seen, the clustering has grouped the wells into bins of similar measurements.

[Figure 3 plot: cross-plot; axes: Hyperbolic Decline Exponent vs. Initial oil rate]

Figure 3: Cross-plot of a predicted KPI vs. a historic KPI; color = cluster membership
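
The idea of grouping wells with a self-organizing map can be illustrated with a very small implementation. The sketch below is not the tool used in the study: the map size, learning schedule, random initialization and the synthetic two-KPI data are all assumptions, and in practice the KPIs would be standardized before training so that no single KPI dominates the distance measure.

```python
import numpy as np

def train_som(data, rows=3, cols=3, n_iter=2000, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM; returns node weights (rows*cols, n_features)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    # Grid coordinates of every node, used by the neighborhood function.
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # Initialize the node weights from randomly chosen samples.
    weights = data[rng.integers(0, len(data), size=rows * cols)].copy()
    for it in range(n_iter):
        frac = it / n_iter
        lr = lr0 * (1.0 - frac)               # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.3   # neighborhood radius shrinks
        x = data[rng.integers(0, len(data))]
        bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))  # best matching unit
        grid_dist2 = np.sum((grid - grid[bmu]) ** 2, axis=1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))
        weights += lr * h[:, None] * (x - weights)
    return weights

def assign_nodes(data, weights):
    """Return the index of the nearest SOM node for every sample."""
    data = np.asarray(data, dtype=float)
    d2 = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

# Synthetic example: two well populations described by two scaled KPIs.
rng = np.random.default_rng(1)
kpis = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
                  rng.normal(10.0, 0.1, size=(50, 2))])
weights = train_som(kpis)
labels = assign_nodes(kpis, weights)
```

Each node index plays the role of a cluster label here; grouping neighboring nodes into a smaller number of clusters, as is common with SOMs, is left out of this sketch.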

This feature generalization calls for a stochastic approach to manage the whole population of wells in a particular group. In
contrast to a conventional approach, where each well is processed individually with its respective deterministic parameters, the
generalized technique implies that the parameters for each well are defined stochastically through the group to which this
particular well belongs.
This requires a statistical analysis of the population of each group. The distribution of the well parameters within each
group was checked for outliers, confidence intervals and consistency, and the mean and standard deviation were determined
(Figure 4). From this point in the workflow onwards the individual well data are no longer considered; only the
probability distribution parameters of the group to which the well belongs are used in the remaining steps.
[Figure 4 plot: axes: Hyperbolic Decline Exponent vs. Initial oil rate]


Figure 4: Statistical analysis of each cluster
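
The per-group statistics that replace the individual well data can be computed in a few lines. The following is a minimal sketch with hypothetical KPI values and cluster labels; it returns the mean, the sample standard deviation and the well count of each cluster for a single KPI.

```python
import numpy as np

def cluster_statistics(values, labels):
    """Return {cluster: (mean, sample_std, count)} for one KPI."""
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    stats = {}
    for cluster in np.unique(labels):
        member = values[labels == cluster]
        # The sample standard deviation needs at least two wells.
        std = float(member.std(ddof=1)) if member.size > 1 else 0.0
        stats[int(cluster)] = (float(member.mean()), std, int(member.size))
    return stats

# Hypothetical initial oil rates and cluster memberships of five wells.
stats = cluster_statistics([10.0, 12.0, 14.0, 50.0, 54.0], [1, 1, 1, 2, 2])
print(stats)
```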

Type curve selection


When the cluster membership values are investigated in a geographic map, it becomes clear that even though cluster
memberships vary across the whole field, there are regions where wells of a certain cluster prevail. It is also
apparent that some clusters appear only in certain areas of the field and do not occur anywhere else. Referring to the statistical
analysis of the clusters, it becomes clear that in certain areas of the field it is more likely for a well to be in a cluster with good
statistical KPIs than in a cluster with bad statistical KPIs.

A technique has to be applied to describe the relation between the cluster membership and the location, the time when a well
was drilled (here referred to as ‘vintage interval’) and the initial spacing. A Bayesian Network is used to approach this task.
Formally, a Bayesian Network is a probabilistic model that represents a set of variables and their probabilistic
interdependencies. The interdependencies can either be entered by an expert, as in various expert or troubleshooting
systems, or a learning algorithm can infer and quantify the interdependencies between the input and output parameters from a
provided training data set. In the discussed workflow the latter approach is used.

Figure 5: Quantification of areal type curve probability using a Bayesian Network approach

As can be seen in Figure 5, a Bayesian Network is initially set up with four input parameters: x-coordinate, y-coordinate,
spacing and vintage interval. The output parameter is the type curve distribution. The training process investigates and
quantifies the causal dependencies between the input parameters and the type curve cluster. The training algorithm modifies the
conditional probability tables of the given Bayesian Network according to the observations in the field. The Bayesian Network
is then able to quantify the probability that a well at a certain location, drilled at a certain time, will perform according to a
certain type curve (e.g. in the depicted example the investigated location has a 43.3 % chance of performing like a
cluster 4 well, a 22.4 % chance of performing like a cluster 8 well, etc.). With the type curve selector in place, it is possible to infer the
most likely characteristics of a new well drilled at any location in the reservoir.
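
How a conditional probability table can be learned from field observations is sketched below in its simplest form. The sketch assumes that all four inputs have been discretized into bins and act as direct parents of the cluster node, and it estimates the table by counting with Laplace smoothing. The network structure, the binning and the learning algorithm actually used in the study are not described in the paper, so every name and value here is hypothetical.

```python
from collections import Counter, defaultdict

def learn_cpt(records, n_clusters, alpha=1.0):
    """Estimate P(cluster | x_bin, y_bin, spacing_bin, vintage) by counting.

    records: iterable of (x_bin, y_bin, spacing_bin, vintage, cluster) tuples.
    alpha:   Laplace smoothing constant (an assumption of this sketch).
    """
    counts = defaultdict(Counter)
    for *parents, cluster in records:
        counts[tuple(parents)][cluster] += 1
    cpt = {}
    for parents, counter in counts.items():
        total = sum(counter.values()) + alpha * n_clusters
        cpt[parents] = {c: (counter[c] + alpha) / total for c in range(n_clusters)}
    return cpt

# Hypothetical training data: four existing wells that share the same input bins.
records = [(0, 0, 1, 2, 1), (0, 0, 1, 2, 1), (0, 0, 1, 2, 1), (0, 0, 1, 2, 2)]
cpt = learn_cpt(records, n_clusters=3)
print(cpt[(0, 0, 1, 2)])
```

Querying the table for the bins of a candidate location returns one probability per cluster, which corresponds to the kind of output quoted in the example above.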

Probabilistic Forecasting using Monte Carlo Loop


The probabilistic information from the Bayesian type curve selector has to be treated probabilistically in all further
workflow steps. Therefore, in order to forecast a well’s production probabilistically, a Monte Carlo loop is set up that processes
the information from the Bayesian Network and the SOM accordingly. In each iteration, a random number generator in the Monte
Carlo loop selects a SOM cluster with a frequency that follows the cluster probabilities delivered by the Bayesian
Network. The parameters describing the normal distribution (mean and standard deviation) of every forecast parameter for the
selected SOM cluster are retrieved. A random value for every forecast parameter is then picked, honoring the
respective distributions: the value has to lie within the range described by the normal probability density function, and the
frequency with which a value is picked has to correspond to that density function as well. The forecast model (in this
case a hyperbolic decline curve) is then used with these three forecast KPIs to calculate the forecast for a particular well. This
loop is repeated multiple times to create multiple realizations of the forecast. The mean and standard deviation of the
forecasted oil rate at every time step are then determined and reported as a result.
Again there is a set of predefined KPIs such as ‘initial oil rate’, ‘estimated ultimate recovery’, etc. that describes the
performance of the predicted location. Because the forecast is created with a Monte Carlo routine, the results are not
deterministic but stochastic, and the spread of the calculated oil rates from the various Monte Carlo realizations contains
information about the forecast uncertainty. This uncertainty is also considered an important KPI in the assessment of an infill
location. A location that has good predicted oil rates and other good performance KPIs can still be a poor drilling target if the
spread of possible oil rate outcomes is not acceptable.
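
The Monte Carlo loop described in this section can be sketched as follows. The sketch assumes that the three forecast KPIs are the initial rate (qi), the initial decline rate (di) and the hyperbolic exponent (b), which the paper does not state explicitly. Sampled values are clipped at a small positive number so that the decline formula stays defined, which is also an assumption. The probabilities 0.433 and 0.224 echo the example quoted for Figure 5; all other numbers are hypothetical.

```python
import numpy as np

def monte_carlo_forecast(cluster_probs, cluster_stats, t, n_runs=1000, seed=0):
    """Return mean and standard deviation of the oil rate per time step, plus all runs."""
    rng = np.random.default_rng(seed)
    t = np.asarray(t, dtype=float)
    clusters = list(cluster_probs)
    probs = np.array([cluster_probs[c] for c in clusters], dtype=float)
    probs = probs / probs.sum()  # guard against rounding in the inputs
    runs = np.empty((n_runs, t.size))
    for i in range(n_runs):
        # Pick a cluster with the frequency given by the type curve selector.
        cluster = clusters[rng.choice(len(clusters), p=probs)]
        params = {}
        for name, (mean, std) in cluster_stats[cluster].items():
            # Draw from the cluster's normal distribution, clipped to stay positive.
            params[name] = max(rng.normal(mean, std), 1e-6)
        qi, di, b = params["qi"], params["di"], params["b"]
        runs[i] = qi / (1.0 + b * di * t) ** (1.0 / b)  # hyperbolic decline
    return runs.mean(axis=0), runs.std(axis=0), runs

# Hypothetical inputs: cluster probabilities and (mean, std) per forecast KPI.
cluster_probs = {4: 0.433, 8: 0.224, 2: 0.343}
cluster_stats = {
    4: {"qi": (120.0, 15.0), "di": (0.08, 0.01), "b": (0.5, 0.05)},
    8: {"qi": (60.0, 10.0), "di": (0.12, 0.02), "b": (0.7, 0.05)},
    2: {"qi": (30.0, 5.0), "di": (0.05, 0.01), "b": (0.4, 0.05)},
}
months = np.arange(0, 60)
mean_rate, std_rate, runs = monte_carlo_forecast(cluster_probs, cluster_stats, months, n_runs=200)
```

The standard deviation per time step is the spread that the text treats as an additional KPI for judging an infill location.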

Model Validation
Once the clustering algorithm and the cluster quantification process are set up, a certain number of wells are picked to
calibrate and blind test the process. These wells have not been part of the model setup. In the model validation step the
production performance at their locations is predicted using the full forecasting workflow. The stochastic results for the
production performance (especially the P30, P50 and P70 lines for oil production rates and cumulative oil production) are
plotted and compared against actual historical values. In case of a general under- or over-performing trend, the model has to be
reviewed. This could result in a different peer grouping of the wells or an adapted forecasting method.

Figure 6: Model validation. Left plots: actual oil rate (green) vs. predicted P30, P50, P70 (blue lines); right plots:
actual cumulative oil production (green) vs. predicted P30, P50, P70 (blue lines)
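
The percentile lines used in the blind test can be derived directly from the Monte Carlo realizations. In the sketch below, P30 is taken as the 30th percentile of the realizations (30 % of the runs lie at or below it); the paper does not say whether this or the exceedance convention was used, so the convention, the function names and the data are assumptions.

```python
import numpy as np

def forecast_percentiles(runs, levels=(30, 50, 70)):
    """Return {'P30': ..., 'P50': ..., 'P70': ...} per time step from (n_runs, n_steps) realizations."""
    runs = np.asarray(runs, dtype=float)
    return {f"P{p}": np.percentile(runs, p, axis=0) for p in levels}

def fraction_inside_band(actual, low, high):
    """Share of time steps at which the actual value lies within [low, high]."""
    actual = np.asarray(actual, dtype=float)
    return float(np.mean((actual >= low) & (actual <= high)))

# Hypothetical realizations: 100 runs, 3 time steps, values 1..100 at every step.
runs = np.arange(1, 101, dtype=float).reshape(100, 1) * np.ones((1, 3))
bands = forecast_percentiles(runs)
coverage = fraction_inside_band([40.0, 80.0, 50.0], bands["P30"], bands["P70"])
```

A coverage far from the share implied by the band, or actual values that sit consistently on one side of the P50 line, would point to the general under- or over-performing trend mentioned above.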

Expert system
The candidate screening is performed using an expert system based on a Bayesian Belief Network (BBN). The BBN can be
applied to describe and reconstruct a complex decision process involving multiple parameters under uncertainty. The various
input parameters in a BBN can either be conditionally independent (hence not having any influence on each other) or
conditionally dependent. In the latter case prior knowledge of the dependency of the various parameters has to be quantified
and entered in the BBN. This is done either using a Bayesian learning approach or manually by an expert (Mitchell 1997,
Korb 2004).
The input parameters in a BBN can either be continuously measured variables or discrete variables. Each parameter can
also be entered stochastically, considering uncertainty in the measurements or reasoning process. The result of a BBN is a
decision, which again is represented stochastically considering the inherent uncertainty of the variable measurements and
decision process.
BBNs are applied in various disciplines such as troubleshooting of computer hardware problems, medical diagnosis,
speech recognition, credit fraud detection or spam mail filtering.
In this study a BBN was set up purely by experts to reproduce their reasoning process, considering the various aspects of
their decision such as economic, logistic and reservoir considerations (Neapolitan 2004). The outcome is a score between 0
and 100 that describes the well’s probability of being a good producer, 100 being the best.

Figure 7: Reasoning system represented in a Bayesian Network

There are 4 decision levels involved in the reasoning process:


1. Level 1 – Current location Production – node group “GoodForecast”: This stage of the process ranks the locations
through the production profile generated by the data mining model. Three indicators were selected: ‘cumulative
oil production’ at the end of the first and third year and the ‘initial oil rate’. The output of the first level is the node
“Good Forecast” in the Bayesian Network, which combines the three production indicators to get a first score of
the locations. This score is then input to the second decision level and the process continues until the fourth and
last level is reached.
2. Level 2 – Current Location Production vs. Existing Well Production – node group “Check_OK”: The idea of this
level is to weight results in terms of how well the production forecast fits the production
performance of the surrounding wells. The ratio of the initial rate and the cumulative oil production at the end of the third
year is computed for locations and existing wells; as in level 1, the P50 forecast was
considered for the new locations.
3. Level 3 – Area Review – Drainage and Water Cut – node group “GoodArea”: The idea of this decision level is to
include knowledge related to drained areas and water production in the ranking matrix. The purpose of this
decision step is to favor locations placed in areas with less risk of producing water and of finding depleted
zones. Indicators in this case were assigned to each location by averaging the last water cut and the total oil
production of the wells in the surrounding area (radius of 400 m).
4. Level 4 – Economics – node group “Score”: Finally the Net Present Value is used as an economic indicator in
the ranking process. A distribution plot similar to the one shown in level 1 was used to establish the
interval limits.
The final result is a score between 0 and 100, 0 meaning that, according to the defined criteria, the given location is not a
feasible candidate, 100 meaning that the current location fulfills all requirements for a good drilling location.
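
The area indicators of level 3 lend themselves to a short illustration. The sketch below averages the last water cut and the total oil production of all existing wells within 400 m of a candidate location, as described in the list above. The coordinates, values and function name are hypothetical, and returning None for a location without neighbors is a choice made for this sketch only.

```python
import numpy as np

def area_indicators(location, well_xy, last_water_cut, cum_oil, radius=400.0):
    """Average last water cut and cumulative oil of wells within `radius` metres.

    Returns None if no existing well lies inside the radius.
    """
    well_xy = np.asarray(well_xy, dtype=float)
    dist = np.hypot(well_xy[:, 0] - location[0], well_xy[:, 1] - location[1])
    near = dist <= radius
    if not near.any():
        return None
    mean_wc = float(np.mean(np.asarray(last_water_cut, dtype=float)[near]))
    mean_oil = float(np.mean(np.asarray(cum_oil, dtype=float)[near]))
    return mean_wc, mean_oil

# Hypothetical wells (coordinates in metres) and one candidate location.
wells = [(0.0, 0.0), (300.0, 0.0), (1000.0, 0.0)]
water_cut = [0.2, 0.4, 0.9]
cum_oil = [1000.0, 3000.0, 9000.0]
indicators = area_indicators((100.0, 0.0), wells, water_cut, cum_oil)
print(indicators)
```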

Case Study

A case study was set up to validate the approach and to estimate the business impact of this forecasting routine. A
study area was picked to investigate how the field development would have been done differently if this forecasting model
had been available earlier. Three cases were investigated and compared:
1. “Base case”: the last seven wells that have been drilled in the study area were determined. These seven wells have
been left out of the model set up process. The forecasting model was set up and the performance of the seven wells
was predicted. The results of the forecasting workflow were cross-checked with the actual performance to validate the
applicability of the model (‘blind test’). Two of the blind test results have been depicted earlier in Figure 6. It was
agreed that the model is accurate enough to proceed.
2. “Same number of wells”: a virtual grid of infill locations was created. The locations fulfilled several requirements
(e.g. minimum spacing to neighbor wells, maximum distance from edge of the field, etc.). The forecasting workflow
was automated in order to predict the performance of every virtual location and the reasoning system was then
applied to score every location. The top seven locations were selected as the picks of the forecasting workflow.
3. “More wells”: this scenario investigates a case where a significantly higher number of wells can be drilled (in this case
23 locations were investigated). Again, a virtual grid of wells was created and every location was forecasted and
scored according to the previously described workflow.
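
The virtual grid and the selection of the top-ranked locations used in cases 2 and 3 can be sketched as follows. Only the minimum-spacing requirement is implemented; the maximum distance from the edge of the field and any other requirement are omitted. The grid step, spacing, coordinates and scores are hypothetical, and in the workflow the scores would come from the reasoning system rather than being typed in.

```python
import numpy as np

def candidate_grid(well_xy, x_range, y_range, step, min_spacing):
    """Regular grid of candidate locations at least `min_spacing` away from every existing well."""
    well_xy = np.asarray(well_xy, dtype=float)
    xs = np.arange(x_range[0], x_range[1] + step / 2, step)
    ys = np.arange(y_range[0], y_range[1] + step / 2, step)
    grid = np.array([(x, y) for x in xs for y in ys], dtype=float)
    # Distance from every grid point to every existing well.
    dist = np.linalg.norm(grid[:, None, :] - well_xy[None, :, :], axis=2)
    return grid[dist.min(axis=1) >= min_spacing]

def top_locations(candidates, scores, n):
    """Return the n best-scoring candidates as ((x, y), score) pairs, best first."""
    order = np.argsort(scores)[::-1][:n]
    return [(tuple(float(v) for v in candidates[i]), float(scores[i])) for i in order]

# Hypothetical field with a single existing well at the origin.
candidates = candidate_grid([(0.0, 0.0)], (0.0, 200.0), (0.0, 200.0), step=100.0, min_spacing=150.0)
scores = [10.0, 50.0, 30.0, 90.0, 20.0]  # hypothetical expert-system scores (0 to 100)
best = top_locations(candidates, scores, n=2)
print(best)
```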

A comparison of the three scenarios can be seen in Figure 8. Even with the same number of wells a
significantly higher cumulative production can be achieved in the first 10 years. A significant increase in the number of wells
drilled logically also leads to a significantly higher recovery. However, it is interesting to note that the performance per well
decreases after a certain number of wells, leaving each well with approx. 5 % less recovery in the “More wells” case
than in the “Same number of wells” case.

Figure 8: Case comparison – cumulative oil production base case (purple), same wells (black) and more wells (orange)

Conclusions
A predictive tool to forecast the performance of infill wells is presented to YPF showing that it can bring numerous
advantages and new opportunities to the asset:
• As an advisory expert system this methodology integrates a degree of expertise that was not available to the asset
team before. The methodology can consistently handle large amounts of data and can be customized to the
standards of the asset team. The turnover of experienced personnel can also be addressed with such an advisory tool
because it captures the knowledge of experts and the lessons learnt from previous decisions.
• Due to the streamlined workflow the time to find a ranked list of infill drilling locations is reduced to
approximately three to four weeks. The conventional approach could take several weeks or even several months
longer. The current approach saves YPF a significant amount of resources (workforce and time).
• Since several hundred infill locations can be forecasted very quickly YPF can now perform sensitivity studies to
optimize their infill drilling campaign in a short time period. YPF can hereby formulate the objective function for
the optimization process to maximize oil recovery or the NPV of the field development plan.

Acknowledgement
The authors would like to thank YPF and Schlumberger for the permission to publish the present paper.

References
Arps, J.J.: “Analysis of Decline Curves,” Trans. AIME (1945), 160, 228-247
Ghoraishy, S.M., Liang, J.T., Green, D.W., Liang, H.C.: “Application of Bayesian Networks for Predicting the Performance of Gel-Treated
Wells in the Arbuckle Formation, Kansas,” paper SPE 113401 prepared for presentation at the 2008 SPE/DOE Improved Oil Recovery
Symposium held in Tulsa, Oklahoma, USA, 19-23 April 2008
Jensen, F.: “Bayesian Networks and Decision Graphs,” Statistics for Engineering and Information Science, Springer, New York, NY,
USA, 2001
Korb, K., Nicholson, A.: “Bayesian Artificial Intelligence,” Chapman & Hall/CRC Press UK, London, United Kingdom, 2004
Mitchell, T.: “Machine Learning,” International Edition, McGraw-Hill, Singapore, 1997
Neapolitan, R.: “Learning Bayesian Networks,” Pearson Prentice Hall, Upper Saddle River, NJ, USA, 2004
Zangl, G., Hannerer, J.: “Data Mining – Applications in the Petroleum Industry,” Round Oak Publishing, Katy, TX, 2003
