Yajing Chen and Zhouyuan Zhu, China University of Petroleum, Beijing; Yangxiao Lu, The University of Texas
at Dallas; Changhao Hu, Fei Gao, Wei Li, Nian Sun, and Tian Feng, E&D Research Institute of Liaohe Oilfield
Company of CNPC
This paper was prepared for presentation at the SPE/IATMI Asia Pacific Oil & Gas Conference and Exhibition held in Bali, Indonesia, 29-31 October 2019.
This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents
of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect
any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written
consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may
not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.
Abstract
Reservoir analogue study, unlike flow-physics-based prediction methods such as reservoir
simulation, relies on the experience and knowledge of skilled reservoir engineers. In this work,
we present a new workflow that replaces human-knowledge-based analogue studies with data analytics
and machine learning techniques.
First, we collect reservoir properties, development parameters, and historical recovery data for 1381
actual U.S. oilfields from the Tertiary Oil Recovery Information System (TORIS) of the U.S. Department of
Energy. We conduct extensive data cleaning to handle outliers and missing values. Then, we identify the
factors that most strongly determine the recovery factor. We further use single-variable and bi-variable analysis to
understand the relationships between the recovery factor and these factors. Finally, we train an Artificial
Neural Network (ANN) model to predict recovery factors.
We have found that the recovery factor depends mostly on 19 principal factors, reduced from a total
of more than 70 properties originally in TORIS. We randomly select 90% of these oilfields as the
training set for machine learning. The predictability and accuracy of the methodology are tested by making
recovery forecasts for the remaining 10% of the oilfields and comparing the forecasts with their actual recovery
factors. The average error in the recovery factor predicted by the trained ANN model is about 10%.
Overall, this methodology performs well for computer-assisted analogue study and requires minimal
human knowledge and hands-on work to study these 1381 oilfields.
This work provides a new workflow for using data analytics and machine learning techniques in reservoir
analogue studies. Reservoir engineering software built systematically on this methodology could serve
as a more efficient and accurate predictive tool for studying reservoir analogues.
Introduction
During the exploration and production of oil and gas assets, a major task for reservoir engineers is to
predict future production and reservoir performance. Two main categories of methods are usually
used for this purpose: fluid-flow-physics-based reservoir modeling and experience-based prediction.
The first type is based on the forward solution of a dynamic porous-media flow model and includes
most analytical reservoir engineering solutions, such as Buckley-Leverett, material balance calculations,
and unsteady-state rate transient analysis, as well as numerical reservoir simulation (Coats, 1982).
The second type relies on human knowledge, experience, and fuzzy logic for future predictions, and
includes reservoir analogue study and decline curve analysis such as the Arps method. In this work, we
focus on the use of data analytics and artificial intelligence for reservoir analogue study.
2 SPE-196487-MS
The origin of systematic scientific reservoir analogue study can be traced back to the 1950s. For example,
the method was proposed and applied to bottom-water coning in different oil fields (Meyer and Searcy, 1956),
analyzing the behavior of two fluid phases segregated by gravity into two regions of the reservoir.
In recent years, the method has also been used to search for analogous reservoirs with the aid
of modern statistics (Martin Rodriguez et al., 2013). Multivariate statistical techniques were used to find a
unique and reproducible list of reservoirs with properties most similar to the selected target, and the
entire procedure was systematic and unbiased.
Data mining is the process of discovering underlying patterns in large datasets through the combined
efforts of modern machine learning, applied statistics, and computer database systems (Han et al., 2011).
Integrated with petroleum engineering knowledge, data mining was used to predict recovery for deepwater
Gulf of Mexico reservoirs (Srivastava et al., 2016) by combining easy-to-implement data mining techniques
with dimensionless numbers. Artificial Neural Networks (ANN) are machine
learning models that are inspired by, but not necessarily identical to, the biological neural networks that
constitute animal brains (Han et al., 2011). Such systems "learn" to perform tasks by considering
examples, generally without being programmed with any task-specific rules. In reservoir engineering, ANN
was used in reservoir management to optimize the injection-production ratio in a Middle East
reservoir (Stundner and Al-Thuwaini, 2001). A large Barnett Shale water-production dataset was analyzed using
statistical methods to uncover hidden structures in well and production data, and a neural-network-based model
was used to predict the potential for water production from wells drilled in the Barnett Shale (Awoleke and Lane,
2010). Furthermore, to overcome the disadvantages of deterministic, cumbersome, and expensive
(manpower- and time-consuming) simulation-based performance evaluation for SAGD operations, ANN has
also been employed as a data-driven modeling alternative to make recovery predictions in Canadian oil sands
reservoirs (Dzurman et al., 2013). In addition, ANN was used to assist well log interpretation
to identify flow units and predict permeability (Aminian et al., 2003): with limited core data
and relevant geological interpretations, statistical techniques and ANN provided
consistent and reliable predictions of permeability and flow units. In petrophysics, ANN was
also used successfully to predict rock facies from carbonate well logs (Tang et al., 2011).
In this study, we explore the potential of using data analytics and artificial intelligence for reservoir
analogue studies. This new approach combines statistical data analysis, ANN training and prediction,
and traditional oilfield data analysis. First, we clean the large volume of reservoir property
and performance data in the TORIS database. We then take several steps to choose the set
of factors we deem most relevant to reservoir recovery. Next, we use single-variable analysis
and bi-variable analysis for an initial exploratory data investigation. From the statistical analysis of
numerous existing oil reservoir performances, we identify underlying patterns and behaviors. Finally, we
train an ANN model to predict recovery factors. This work provides a new workflow
for predicting reservoir recovery through reservoir analogue study using both data mining and
machine learning.
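The first workflow step, cleaning the records for missing values and outliers, could look like the minimal Python sketch below. The paper does not state which outlier rule was applied; the interquartile-range (IQR) fence with k = 1.5, the function name `clean_mask`, and the toy column names are assumptions for illustration only.

```python
import numpy as np


def clean_mask(table, columns, k=1.5):
    """Return a boolean mask of records to keep.

    A record is dropped if any of `columns` is missing (None or NaN) or lies
    outside the fence [Q1 - k*IQR, Q3 + k*IQR] of that column.
    The IQR rule is an assumed choice, not necessarily the one used for TORIS.
    """
    n = len(table[columns[0]])
    x = np.array([[np.nan if table[c][i] is None else float(table[c][i])
                   for c in columns] for i in range(n)])
    complete = ~np.isnan(x).any(axis=1)
    # Quartiles are computed on complete records only.
    q1, q3 = np.percentile(x[complete], [25, 75], axis=0)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    keep = np.zeros(n, dtype=bool)
    keep[complete] = np.all((x[complete] >= low) & (x[complete] <= high), axis=1)
    return keep


# Hypothetical toy records: one missing permeability, one extreme outlier.
toy = {
    "permeability_md": [10, 12, 11, 13, None, 5000, 12],
    "api_gravity": [30, 32, 31, 29, 30, 31, 33],
}
print(clean_mask(toy, ["permeability_md", "api_gravity"]).tolist())
# -> [True, True, True, True, False, False, True]
```

Quartiles are taken from complete records only, so a single missing entry does not distort the fences applied to the rest of the data.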
features and petrophysical properties to the existing ones. In the early stage of exploration and production,
analogue study is almost the only way for reservoir engineers to make predictions, because it is
difficult to obtain enough information, with sufficient certainty in the input data, to build reliable
flow-physics-based predictive models such as numerical reservoir simulation. In analogue study, reservoirs
in the same geological basin, with similar geological characteristics, or with similar petrophysical
properties are used to predict the recovery performance of the target reservoir. The method can
estimate recovery factors, initial production rates, production decline rates, and reservoir drive
mechanisms, and it can also recommend preferred well pattern arrangements. If two reservoirs have been
developed using similar development strategies, comparing them may yield reliable results. However, if
different development strategies were deployed for the two reservoirs, this approach may have difficulty
making accurate predictions with enough confidence. The pros and cons of reservoir analogue study and its
comparison with other types of reservoir prediction methods are shown in Table 1.
Reading through this large dataset, generating recovery estimates, and making corresponding recommendations for
a new greenfield development could take one or two experienced reservoir engineers several months if all
the work for the reservoir analogue study were performed manually. This dataset is therefore an ideal candidate
for testing the use of data analytics and machine learning to assist reservoir analogue studies.
Table 2—Examples of the redundant and irrelevant reservoir properties that are ruled
out in this reservoir analogue study based on reservoir engineering judgment.
Specific names Reference numbers Extensive variables Current conditions Other unrelated variables
Geologic province and play Original oil in place Cumulative oil production Formation temperature
Current producing
Formation name
GOR and injection rate
Table 3—Key factors selected in this reservoir analogue study based on reservoir engineering judgment.
Initial oil saturation (Soi); Initial water saturation (Swi); True vertical depth (TVD); Formation temperature; Oil formation volume factor (FVF)
Permeability; API gravity; Viscosity; Initial gas-oil ratio (GOR); Initial pressure
Single-variable analysis
To gain an overall understanding of the large number of reservoirs contained in this database, we
conduct single-variable analysis on all the key factors. Single-variable histograms of nine selected
key factors are shown in Fig. 2, and they reveal several trends in these reservoirs. Most reservoirs are
developed with well spacing between 80 acres and 5 acres. Oil API gravity covers a wide range, mostly
from 10 to 55 °API, and the distribution indicates that most of the reservoirs in the database contain
light oil. The true vertical depth histogram shows that most reservoirs are at around 5,000 ft, with very
few deeper than 10,000 ft. Reservoir permeability mostly ranges from 2 mD to 2,000 mD, and oil viscosity
mostly ranges from 0.2 cp to 100 cp, which gives mobility ratios that are favorable for waterflooding.
The initial oil saturation is mostly between 60% and 80%. The inter-layer heterogeneity indicator, the
Dykstra-Parsons permeability variation coefficient VDP, mostly ranges from 0.5 to 0.95.
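The range summaries above and the VDP indicator can be reproduced with a few NumPy helpers. In this sketch, "mostly ranges from" is interpreted as the central 90% of the data, and log-spaced bins are used for properties such as permeability and viscosity that span several decades; both choices, the helper names, and the synthetic permeability sample are assumptions. The Dykstra-Parsons coefficient follows its standard definition, VDP = (k50 - k84.1)/k50, where k84.1 is the permeability exceeded by 84.1% of the samples.

```python
import numpy as np


def central_range(values, coverage=0.9):
    """(low, high) percentiles bounding the central `coverage` share of the data."""
    tail = 50.0 * (1.0 - coverage)
    low, high = np.percentile(np.asarray(values, dtype=float), [tail, 100.0 - tail])
    return float(low), float(high)


def log_histogram(values, bins_per_decade=4):
    """Counts on log10-spaced bins, for properties spanning several decades."""
    v = np.asarray(values, dtype=float)
    v = v[v > 0]  # logarithmic bins need strictly positive values
    lo = np.floor(np.log10(v.min()))
    hi = max(np.ceil(np.log10(v.max())), lo + 1)
    edges = np.logspace(lo, hi, int(round((hi - lo) * bins_per_decade)) + 1)
    counts, edges = np.histogram(v, bins=edges)
    return counts, edges


def dykstra_parsons(perm):
    """V_DP = (k50 - k84.1) / k50; k84.1 is exceeded by 84.1% of samples,
    i.e. it is the 15.9th percentile of the permeability distribution."""
    k = np.asarray(perm, dtype=float)
    k50, k841 = np.percentile(k, [50.0, 15.9])
    return float((k50 - k841) / k50)


# Synthetic log-normal permeability (mD), for illustration only.
rng = np.random.default_rng(0)
perm = rng.lognormal(mean=np.log(100.0), sigma=0.8, size=50_000)
print(central_range(perm))     # where most samples fall
print(log_histogram(perm)[0])  # counts per quarter-decade bin
print(dykstra_parsons(perm))   # close to 1 - exp(-0.8) for a log-normal sample
```

For a log-normal permeability distribution with log standard deviation sigma, VDP = 1 - exp(-sigma), which is why the synthetic sample above returns a value near 0.55; a homogeneous layer set gives VDP = 0.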
Figure 2—Histograms of key factors (well spacing, permeability, Soi, Sor, viscosity, API gravity, GOR, initial pressure).
Bi-variable analysis
To determine whether these key factors are correlated with each other, we conduct bi-variable analysis.
The results are shown in Fig. 3. Initial water saturation Swi is negatively correlated with
initial oil saturation Soi (correlation coefficient of −0.94); TVD is correlated with reservoir temperature
(correlation coefficient of 0.77) and initial pressure (correlation coefficient of 0.94); and TVD is also weakly
correlated with API gravity (correlation coefficient of 0.36). These correlations are consistent with physical intuition.
Most waterflooding operations are conducted in reservoirs with minimal gas saturation, so Soi and Swi sum to
nearly one. The geothermal gradient determines temperature, and depth largely determines the initial pressure in most cases.
Heavier oil with lower API gravity tends to occur in shallower formations because of hydrocarbon migration, loss of
light components, and biodegradation at shallow depth. For other variable pairs, the correlations
are relatively small. Therefore, it is also possible to conduct another set of
machine learning predictions with a reduced number of independent variables, excluding Swi, temperature, and
initial pressure from the training process. Sensitivity studies on this option are conducted in the
following sections. Overall, bi-variable analysis reveals the interdependence between variables.
Combined with an understanding of the underlying reservoir engineering physics, it may help further reduce
the number of unknowns in subsequent studies.
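The reasoning behind method B, dropping one variable from each strongly correlated pair, can be sketched as a greedy filter on Pearson correlation coefficients. This is an illustrative sketch, not the authors' procedure: the threshold of 0.7 (chosen to sit between the weak API-TVD coefficient of 0.36 and the strong coefficients of 0.77 and above), the variable order that decides which member of a pair survives, and the synthetic data are all assumptions.

```python
import numpy as np


def drop_correlated(data, order, threshold=0.7):
    """Greedy filter: keep a variable only if |Pearson r| with every variable
    already kept is <= threshold. `order` sets priority within correlated pairs."""
    kept, dropped = [], []
    for name in order:
        ok = all(abs(np.corrcoef(data[name], data[k])[0, 1]) <= threshold
                 for k in kept)
        (kept if ok else dropped).append(name)
    return kept, dropped


def synthetic_fields(n=500, seed=0):
    """Hypothetical data mimicking the correlation structure reported in Fig. 3."""
    rng = np.random.default_rng(seed)
    tvd = rng.uniform(1_000, 10_000, n)  # ft
    soi = rng.uniform(0.5, 0.9, n)
    return {
        "Soi": soi,
        "Swi": 1.0 - soi + rng.normal(0, 0.02, n),               # little free gas
        "TVD": tvd,
        "Temperature": 60 + 0.015 * tvd + rng.normal(0, 10, n),  # geothermal gradient
        "Pressure": 0.45 * tvd + rng.normal(0, 200, n),          # pressure gradient
        "API": 25 + 0.001 * tvd + rng.normal(0, 8, n),           # weak depth trend
    }


kept, dropped = drop_correlated(
    synthetic_fields(), ["Soi", "Swi", "TVD", "Temperature", "Pressure", "API"])
print(kept, dropped)  # -> ['Soi', 'TVD', 'API'] ['Swi', 'Temperature', 'Pressure']
```

Placing Soi and TVD ahead of their correlated partners in `order` is what makes the filter retain them, mirroring the choice described in the text.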
Figure 3—Bi-variable correlation plots of key factors for reservoir recovery (in sequence): well spacing, net pay,
gross pay, Soi, Swi, oil FVF, TVD, temperature, permeability, API gravity, viscosity, initial GOR, initial pressure,
Sor, VDP. The upper triangle shows the absolute values of correlation coefficients between any two variables.
Figure 4—Bi-variable correlation plots of more related key factors in this study: Swi-Soi (upper
left), Temperature-TVD (upper right), Pressure-TVD (lower left), and API gravity-TVD (lower right).
We split our dataset into a training dataset and a validation dataset. The training dataset, randomly selected
from the TORIS database, contains 90% of all the reservoirs; the validation dataset contains the
remaining 10%. Our ANN model is trained on the training dataset.
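A reproducible version of the random 90/10 split can be written in a few lines of NumPy. The seed and the function name are hypothetical, and 1381 (the number of oilfields before cleaning) is used only for illustration; fixing the seed makes the split repeatable, which matters when several feature sets are compared on the same validation reservoirs.

```python
import numpy as np


def split_indices(n_records, train_fraction=0.9, seed=2019):
    """Randomly partition record indices into training and validation sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_records)
    n_train = int(round(train_fraction * n_records))
    return order[:n_train], order[n_train:]


train_idx, val_idx = split_indices(1381)
print(len(train_idx), len(val_idx))  # -> 1243 138
```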
We implemented three sets of training in our work:
1. Using all 19 key controlling factors shown in Table 3 (method A);
2. Using a reduced set of 16 key controlling factors, with initial water saturation Swi, temperature, and
initial pressure ruled out based on the results of the bi-variable analysis (method B);
3. Using another reduced set of 16 controlling factors, with lithology type, depositional environment, and
geological trap type ruled out because they are non-numeric properties (method C).
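The three feature sets differ only in which columns are excluded. The sketch below builds the input matrix for each method from a handful of columns. How the non-numeric properties were encoded for methods A and B is not stated, so the one-hot encoding, the column names, and the toy records are assumptions for illustration; the toy uses three columns rather than the full 19.

```python
import numpy as np

# Hypothetical column names for the excluded groups described in the text.
CORRELATED = {"Swi", "temperature", "initial_pressure"}                # removed in method B
CATEGORICAL = {"lithology", "depositional_environment", "trap_type"}  # removed in method C


def one_hot(values):
    """Encode a list of labels as 0/1 columns, one per distinct label (sorted)."""
    labels = sorted(set(values))
    out = np.zeros((len(values), len(labels)))
    for row, value in enumerate(values):
        out[row, labels.index(value)] = 1.0
    return out, labels


def build_features(records, numeric, categorical, exclude=frozenset()):
    """Stack numeric columns and one-hot blocks, skipping excluded columns."""
    blocks, names = [], []
    for col in numeric:
        if col not in exclude:
            blocks.append(np.asarray(records[col], dtype=float)[:, None])
            names.append(col)
    for col in categorical:
        if col not in exclude:
            encoded, labels = one_hot(records[col])
            blocks.append(encoded)
            names.extend(f"{col}={label}" for label in labels)
    return np.hstack(blocks), names


toy = {
    "Soi": [0.70, 0.60, 0.80],
    "Swi": [0.30, 0.40, 0.20],
    "lithology": ["sandstone", "carbonate", "sandstone"],
}
for method, excluded in [("A", frozenset()), ("B", CORRELATED), ("C", CATEGORICAL)]:
    X, names = build_features(toy, ["Soi", "Swi"], ["lithology"], excluded)
    print(method, X.shape, names)
```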
We then optimize the hyperparameters of this network using the validation data. Hyperparameters
are parameters set before training, including the learning rate, number of iterations, and number of units per layer.
The detailed settings of the different layers, the number of units in each layer, and the corresponding parameters
are shown in Table 4; the number of units is customized for each layer.
During training, we use the mean squared error (MSE) loss function to measure the difference between the model
output and the actual values. We employ the Adam optimizer with a learning rate of 0.001 to update the network
parameters based on the loss. In addition, we set an appropriate number of epochs to avoid underfitting and overfitting.
Figs. 6, 7, and 8 show how the validation mean absolute error (MAE) varies with the number of epochs for method A,
method B, and method C, respectively. After tuning the hyperparameters, we obtain a well-trained ANN
model for our study.
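The training setup described above (MSE loss, the Adam optimizer with a learning rate of 0.001, and choosing the epoch with the lowest validation MAE) can be illustrated without a deep-learning framework. The NumPy sketch below uses a single hidden ReLU layer; the actual layer sizes are those in Table 4, so the hidden width, batch size, epoch count, and synthetic data here are assumptions. Keeping the parameters from the best validation epoch is one common way to guard against overfitting, not necessarily the authors' exact procedure.

```python
import numpy as np


def train_mlp(x_tr, y_tr, x_va, y_va, hidden=32, epochs=200, lr=1e-3, batch=32, seed=0):
    """One-hidden-layer ReLU regressor trained with MSE loss and Adam.

    Returns the parameters from the epoch with the lowest validation MAE and
    the per-epoch validation MAE history (the quantity plotted in Figs. 6-8).
    """
    rng = np.random.default_rng(seed)
    n_in = x_tr.shape[1]
    p = {"W1": rng.normal(0, np.sqrt(2 / n_in), (n_in, hidden)), "b1": np.zeros(hidden),
         "W2": rng.normal(0, np.sqrt(1 / hidden), (hidden, 1)), "b2": np.zeros(1)}
    m1 = {k: np.zeros_like(a) for k, a in p.items()}  # Adam first moments
    m2 = {k: np.zeros_like(a) for k, a in p.items()}  # Adam second moments
    beta1, beta2, eps, step = 0.9, 0.999, 1e-8, 0

    def forward(x):
        h = np.maximum(0.0, x @ p["W1"] + p["b1"])
        return h, (h @ p["W2"] + p["b2"]).ravel()

    history, best_mae, best = [], np.inf, None
    for _ in range(epochs):
        order = rng.permutation(len(x_tr))
        for start in range(0, len(order), batch):
            idx = order[start:start + batch]
            h, pred = forward(x_tr[idx])
            d_out = (2.0 / len(idx)) * (pred - y_tr[idx])[:, None]  # dMSE/dpred
            d_h = (d_out @ p["W2"].T) * (h > 0)                      # back through ReLU
            grads = {"W2": h.T @ d_out, "b2": d_out.sum(axis=0),
                     "W1": x_tr[idx].T @ d_h, "b1": d_h.sum(axis=0)}
            step += 1
            for k in p:  # Adam update with bias correction
                m1[k] = beta1 * m1[k] + (1 - beta1) * grads[k]
                m2[k] = beta2 * m2[k] + (1 - beta2) * grads[k] ** 2
                m_hat = m1[k] / (1 - beta1 ** step)
                v_hat = m2[k] / (1 - beta2 ** step)
                p[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
        val_mae = float(np.mean(np.abs(forward(x_va)[1] - y_va)))
        history.append(val_mae)
        if val_mae < best_mae:
            best_mae, best = val_mae, {k: a.copy() for k, a in p.items()}
    return best, history


# Synthetic stand-in for scaled features and recovery factors (illustration only).
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, (400, 3))
y = 0.3 + 0.2 * x[:, 0] - 0.1 * x[:, 1] ** 2 + 0.05 * x[:, 2]
best_params, mae_history = train_mlp(x[:360], y[:360], x[360:], y[360:])
print(min(mae_history), int(np.argmin(mae_history)) + 1)  # best MAE and its epoch
```

Plotting `mae_history` against the epoch number gives the kind of curve shown in Figs. 6 to 8; in practice the inputs would first be scaled to comparable ranges.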
Figure 6—Validation MAE versus number of epochs (method A).
Figure 7—Validation MAE versus number of epochs (method B).
Figure 8—Validation MAE versus number of epochs (method C).
Figure 9—Cross plot of predicted versus actual recovery factors using method A.
Figure 10—Cross plot of predicted versus actual recovery factors using method B.
Figure 11—Cross plot of predicted versus actual recovery factors using method C.
Conclusions
In conclusion, we use data mining techniques to analyze the TORIS database and make reservoir recovery
predictions using the trained ANN-based model. We demonstrate the integrated process of combining
modern big data technology with traditional oilfield analogue studies to achieve recovery predictions. We
present the following specific conclusions:
1. We perform data cleaning to select the useful dataset from the large number of reservoirs in the TORIS
database. We also successfully screen out the dominant factors influencing recovery from the many
reservoir and geological properties in TORIS. Starting from a database of 70 properties for 1381 oil
reservoirs, we reduced the attribute dimensions from 70 to 19, or even to 16, through data cleaning and
relevant data analysis.
2. We randomly select 90% of the data from the cleaned database to train the ANN model. Then we use the
remaining 10% of the data as test cases to verify the accuracy of predictions made by the ANN model.
Compared with the actual recovery factors, the recovery factors predicted by the trained ANN model have
an error of about 10%.
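The reported error of about 10% does not state which metric was used in this section. The sketch below computes two common candidates for comparing predicted and actual recovery factors: the mean absolute error (MAE, in fraction units) and the mean absolute relative error (MARE). Which of them corresponds to the reported figure is an assumption left open, and the numbers are made up for illustration.

```python
import numpy as np


def recovery_errors(actual, predicted):
    """Return (MAE, mean absolute relative error) for recovery factors as fractions.
    The relative error requires nonzero actual recovery factors."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    abs_err = np.abs(p - a)
    return float(abs_err.mean()), float((abs_err / a).mean())


mae, mare = recovery_errors([0.20, 0.40, 0.50], [0.22, 0.36, 0.50])
print(f"MAE = {mae:.3f}, MARE = {mare:.1%}")  # -> MAE = 0.020, MARE = 6.7%
```

The two metrics can differ substantially for low-recovery fields, where a small absolute miss is a large relative one, so stating which one is meant matters when comparing results.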
Acknowledgement
The authors would like to gratefully acknowledge the financial support from National Natural Science
Foundation of China (Grant No. 51804315). We also want to thank Dr. Xingru Wu from the University of
Oklahoma for insightful discussions on this work.
References
Aminian, K., Ameri, S., Oyerokun, A., Thomas, B. 2003. Prediction of Flow Units and Permeability Using Artificial
Neural Networks. Presented at the SPE Western Regional/AAPG Pacific Section Joint Meeting, Long Beach,
California, 19-24 May. SPE-83586-MS. https://doi.org/10.2118/83586-MS
Awoleke, O., Lane, R. 2010. Analysis of Data from the Barnett Shale Using Conventional Statistical and Virtual
Intelligence Techniques. SPE Reservoir Evaluation & Engineering, 14(05): 48-49. SPE-127919-PA.
https://doi.org/10.2118/127919-PA
Coats, K. 1982. Reservoir Simulation: State of the Art. Journal of Petroleum Technology, 34(08): 1633-1634.
SPE-10020-PA. https://doi.org/10.2118/10020-PA
Dzurman, P. J., Leung, J. W., Zanon, S. J., Amirian, E. 2013. Data-Driven Modeling Approach for Recovery Performance
Prediction in SAGD Operations. Presented at the SPE Heavy Oil Conference-Canada, Calgary, Alberta, Canada, 11-13
June. SPE-165557-MS. https://doi.org/10.2118/165557-MS
Han, J., Kamber, M., Pei, J. 2011. What Is Data Mining? Data Mining: Concepts and Techniques, 3rd Edition,
Morgan Kaufmann Publishers, Waltham, USA.
Martin Rodriguez, H., Escobar, E., Embid, S., Rodriguez, N., Hegazy, M., Lake, L. W. 2013. New Approach to Identify Analogue
Reservoirs. Presented at the SPE Annual Technical Conference and Exhibition, New Orleans, Louisiana, USA, 30
September-2 October. SPE-166449-MS. https://doi.org/10.2118/166449-MS
Meyer, H. I., Searcy, D. F. 1956. Analog Study of Water Coning. Journal of Petroleum Technology, 8(04): 61-64.
SPE-554-G. https://doi.org/10.2118/554-G
Schuetter, J., Mishra, S., Zhong, M., LaFollette, R. 2018. A Data-Analytics Tutorial: Building Predictive Models for
Oil Production in an Unconventional Shale Reservoir. SPE Journal, 23(04): 1075-1089. SPE-189969-PA.
https://doi.org/10.2118/189969-PA
Srivastava, P., Wu, X., Amirlatifi, A., Devegowda, D. 2016. Recovery Factor Prediction for Deepwater Gulf of Mexico
Oilfields by Integration of Dimensionless Numbers with Data Mining Techniques. Presented at the SPE Intelligent
Energy International Conference and Exhibition, Aberdeen, Scotland, UK, 6-8 September. SPE-181024-MS.
https://doi.org/10.2118/181024-MS
Stundner, M., Al-Thuwaini, J. S. 2001. How Data-Driven Modeling Methods like Neural Networks can help to integrate
different Types of Data into Reservoir Management. Presented at the SPE Middle East Oil Show, Manama, Bahrain,
17-20 March. SPE-68163-MS. https://doi.org/10.2118/68163-MS
Tang, H., Meddaugh, W. S., Toomey, N. 2011. Using an Artificial-Neural-Network Method To Predict Carbonate
Well Log Facies Successfully. SPE Reservoir Evaluation & Engineering, 14(01): 35-44. SPE-123988-PA.
https://doi.org/10.2118/123988-PA
TORIS Data Preparation Guidelines for Management and Operating Contract for the Department of Energy's National
Oil and Related Programs. Bartlesville, Oklahoma: BDM-Oklahoma, Inc.
https://www.netl.doe.gov/research/oil-and-gas/software/databases#NPC